Machine Learning, Uncertain Information, and the Inevitability of Negative `Probabilities'
published: Feb. 25, 2007, recorded: September 2004, views: 598
`The only difference between a probabilistic classical world and the equations of the quantum world is that somehow or other it appears as if the probabilities would have to go negative ... that's the fundamental problem. I don't know the answer to it, but I wanted to explain that if I try my best to make the equations look as near as possible to what would be imitable by a classical probabilistic computer, I get into trouble.' These are the words of Richard Feynman in a famous keynote talk on Simulating Physics with Computers. He was pointing out that we face an intrinsic conceptual difficulty if we want to understand the world by mimicking its behaviour with computational systems.
Actually, we do not have to go to anything as esoteric as quantum physics. We see some of the same issues in Machine Learning and in inference from probabilistic estimators in data-driven modelling. And in the same way that Feynman did not know the resolution to his problem, we are only just starting to become aware of some of our own problems in machine intelligence. The principled approach to Machine Intelligence that we have now come to accept is through a probabilistic viewpoint. The Bayesian view of inference is a subjective one, and our knowledge of the universe derives from observation. But I will argue that the use of Machine Learning to represent or simulate the universe only allows generically non-positive probabilities! Of course, we can fudge some of the more uncomfortable aspects that these issues raise, but it should still make us think about whether we have got the correct working framework.
In this talk I want to question parts of the working machinery we use in Machine Learning. At its heart, I want to challenge the assumption that probabilities have to be positive. I want to give several arguments, descriptive and formal, to indicate why the use of positive probabilities is an ideal which is both overly restrictive and unrealisable.
Indeed, I will argue that the use of non-positive `probabilities' is both inevitable and natural. To do this I will need to use some old mathematical ideas from classical statistics and some more modern ideas from information theory. I will use some simple examples and proofs from Machine Learning applied to regression and classification tasks, and draw parallels with some basic ideas from quantum theory.
The core of the argument is that in modelling the universe through Machine Learning, we are obliged to make inferences based on finite and hence typically less-than-complete information. We can never know everything about a situation, and this gives us our link between quantum mechanics and statistical inference through Machine Learning. I will try to make a case that inference through any finite data-driven computation leads to this apparent problem with `probabilities'. So the issue is not just connected with quantum mechanics, but is a more generic problem related to trying to simulate even classical probabilities with Machine Learning ideas.
If we have enough time, I will also discuss the consequences of this for information measures such as Entropy, and make the case that Fisher Information is a more appropriate measure of our state of knowledge about a system.
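As a small illustration of how an estimate built from finite data can fall outside the range of a probability, consider the sketch below. It is not taken from the talk: the toy data, the function name `fit_line`, and the choice of an unconstrained least-squares fit to 0/1 class labels are all assumptions made here for illustration. The fitted values are often read as estimates of the class probability, yet nothing in the fit keeps them between 0 and 1.

```python
# Hypothetical toy example (not from the talk): fit a straight line to
# binary class labels by ordinary least squares and read the fitted
# values as class-probability estimates.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y ~ a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Assumed data: one input feature, class 0 for small x, class 1 for large x.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_line(xs, ys)
estimates = [intercept + slope * x for x in xs]

for x, p in zip(xs, estimates):
    flag = "" if 0.0 <= p <= 1.0 else "  <- outside [0, 1]"
    print(f"x = {x}: estimated 'probability' = {p:+.3f}{flag}")
```

On this data the estimate is about -0.17 at x = 0 and about 1.17 at x = 7, even though both are training points. A common remedy is to clip the output or pass it through a squashing function, which is the kind of fudge the abstract alludes to; the sketch only shows that the unconstrained estimator does not respect positivity by itself.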