Intelligence is the ability to perform well across a wide range of tasks.
Intuition is inexpressible implicit knowledge.
Creativity is synthesizing knowledge to produce novel ideas.
One day my daughter came back from school, very excited. Nothing particular in that: she enjoyed education. But this time it was more than a class discussion, a maths competition won, or the delights of Java programming. She had listened to a talk by an outside speaker and was inspired. So, the speaker became someone we lived with, in the ethereal but instructive sense of hearing her discuss the ideas he had engendered. She managed to get a week with him and his game company as part of work experience later in her education, and we all followed his illustrious career with a sense of identification. Moral for researchers: give at least one talk at a school.
Yesterday, thanks to a recommendation from Dominic Cummings, I listened to the same guy and have come away inspired, despite the contact being through a YouTube recording of an MIT lecture, and not face to face in a small classroom.
In the taped lecture below he discusses how his general intelligence system beat the world champion Go player. That is astounding in itself, but to me the most interesting aspect of his talk is his enthralling enquiry into the nature of thinking and problem solving. Has he found a technique with very powerful and wide application that will change the way we solve difficult problems?
His company employs 200 researchers, and attempts to fuse Silicon Valley with academia: the blue sky thinking of the ivory tower with the focus and energy of a start-up. With commendable enthusiasm and naïve impudence (doesn’t he know that many clever academics find these issues complicated, have studied them, and left them even more complicated?) he frames the problem thus:
Step 1 fundamentally solve intelligence.
Step 2 use it to solve everything else.
Who does he think he is? OK, a master chess player at 13, flourishing boss of a game company that developed Theme Park and Republic, double First in Computing at Cambridge, then a PhD in cognitive neuroscience at UCL, lots of excellent publications, and all this without listening to wise advice that he was setting his sights too high.
He says: Artificial Intelligence is the most powerful technology we will ever invent.
What follows is my considerable simplification of his talk, from which the aphorisms at the very start are also my compressed renditions of his remarks and working principles.
More prosaically, the technology he has developed is based on general purpose learning algorithms which can learn automatically for themselves from raw inputs, and are not pre-programmed; and can operate across a broad range of tasks. Operationally, intelligence is the ability to perform well across a broad range of tasks. This artificial general intelligence is flexible, adaptive and inventive. It is built from the ground up to deal with the unexpected: things it has never seen before. Old style artificial intelligence was narrow: hand-crafted, specialist, single purpose, brittle. Deep Blue beat Kasparov, but could not play simpler games like tic-tac-toe.
Artificial general intelligence is based on a reinforcement learning framework, in which an agent operates in an environment and tries to achieve a goal: it can observe reality and obtain rewards. With only noisy, incomplete observations it must build a statistical model of the environment, and then decide what actions to take from the options available at any particular moment to achieve its goal. A machine that can really think has to be grounded in a rich sensorimotor reality. There should be no cheating, no getting to see the internal game code. (Cheating leaves the system superficial and dull). The thinking machine interacts with the world through its perception. Games are a good platform for developing and testing AI algorithms. There is unlimited training data, no testing bias (one side wins, the other loses), opportunities to carry out parallel testing, and measure progress accurately. End to end learning agents go from the very simplest sensory inputs to concrete actions.
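That agent-environment loop can be sketched in a few lines. The names and the tiny "chain" world below are mine, purely illustrative, and nothing to do with DeepMind's code: the point is only the shape of the framework he describes, an agent that receives noisy observations, chooses actions, and collects rewards.

```python
import random

random.seed(0)

# A toy agent-environment loop (illustrative, not DeepMind's API): the
# agent gets only noisy observations, chooses actions, collects rewards.

class ChainEnv:
    """Five positions in a row; the goal is position 4, starting from 0."""
    def __init__(self):
        self.pos = 0
    def observe(self):
        # Noisy, incomplete observation: true position plus occasional jitter.
        return self.pos + random.choice([0, 0, 0, 1, -1])
    def step(self, action):                # action: -1 (left) or +1 (right)
        self.pos = max(0, min(4, self.pos + action))
        reward = 1.0 if self.pos == 4 else 0.0
        return reward, self.pos == 4

env = ChainEnv()
total_reward = 0.0
for _ in range(20):                        # one episode, at most 20 steps
    obs = env.observe()
    action = 1 if obs < 4 else -1          # a trivial policy: head for the goal
    reward, done = env.step(action)
    total_reward += reward
    if done:
        break
print(total_reward)
```

Even in this toy, the agent never sees the environment's internals, only its observations: the "no cheating" rule in miniature.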
Deep reinforcement learning is the extension of reinforcement learning (conditioning, it used to be called: making actions conditional upon outcomes) so that it works at scale. DeepMind started its learning journey with Atari games from the 1980s. (How Douglas Adams would have loved this! It reminds me of showing him around the technology museum at Karlsruhe, and as I walked past what I assumed he would see as boring Atari kids games, he burbled with pleasure, and named every one of them and their characteristics. I digress.) The learning agents received nothing but the raw pixels (about 30,000 pixels per frame in the game), tried to learn how to maximise their scores, learnt everything from scratch, and developed ONE system to play ALL the different games. Hence, the systems were learning about the games at a very deep level. (Mnih et al., Nature, 2015).
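The table-based ancestor of their method shows what "at scale" means. Classical Q-learning keeps a lookup table of how good each action is in each state; DeepMind's contribution was to replace that table with a deep network reading raw pixels, but the update rule is the same in spirit. A toy sketch (my own illustrative code, not theirs):

```python
import random

random.seed(1)

# Tabular Q-learning on a five-state chain: the table-based ancestor of the
# deep version. DQN swaps this lookup table for a deep network fed raw
# pixels, but the update rule below is the same in spirit.

N, GOAL = 5, 4
ACTIONS = (1, -1)                          # right, left
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1      # learning rate, discount, exploration

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy: mostly exploit the table, occasionally explore.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2 = max(0, min(N - 1, s + a))
        r = 1.0 if s2 == GOAL else 0.0
        # The Q-learning update: nudge Q(s,a) toward r + gamma * max Q(s',·).
        best_next = max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy heads right, towards the reward, from every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)}
print(policy)
```

This is exactly the "actions conditional upon outcomes" of old-style conditioning, made incremental and numerical.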
In a nod to neuroscience, systems can be considered to have a neurology at a very high computational level: algorithms, representations and architectures. Deep-reinforcement-trained machines can now cope with two-dimensional symbolic reasoning, similar to Tower of Hanoi problems, in which a start state is given and the device must follow the rules to reach a specified goal state. This is like (example comes from friends at lunch yesterday) trying to change round the furniture in their house and realising, late in the process, that the correct solution depended entirely on moving the small desk on the top landing.
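For readers who have not met it, the Tower of Hanoi has exactly that shape: a start state, fixed rules, a goal state. The classic recursive solution, given here only to make the problem shape concrete:

```python
# Tower of Hanoi: move a stack of disks from peg A to peg C, one disk at a
# time, never placing a larger disk on a smaller one. A start state, fixed
# rules, a specified goal state.

def hanoi(n, source, target, spare, moves):
    """Move n disks from source to target, recording every move made."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)   # clear the n-1 smaller disks
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # rebuild on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(len(moves))  # 2**3 - 1 = 7, the provable minimum
```

Notice the lunch-table furniture problem has the same logic: the whole plan hinges on one intermediate move that merely clears the way.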
“Go” is the perfect game to test the deep learning machine, previously trained up on the starter problem of all the Atari games. Go has 10 to the power 170 positions, 19 by 19 “squares” (interstices) and only two rules: stones are captured when they have no liberties (no empty points adjacent to the surrounded group remain); and a repeated board position is not allowed. It is the most complex, profound game, requires intuition and calculation, and pattern recognition plus long term planning: the pinnacle of information games. Brute force approaches don’t work, because the search space is really huge (branching factor of 200, compared to 20 in chess) and it is extremely hard to determine who is winning. A tiny change can transform the balance of power, a so called “divine” move can win the game, and change the history of the game. (See the pesky small desk at the top of the stairs).
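The capture rule fits in a few lines of code: a group of connected stones is captured when it has no liberties, that is, no empty points adjacent to any stone in the group. A minimal flood-fill liberty counter, my own sketch on a toy 4x4 board:

```python
# Counting liberties: flood-fill the group of same-coloured stones, then
# count the distinct empty points ('.') touching it. Zero liberties means
# the group is captured.

def liberties(board, row, col):
    """Count the empty points adjacent to the group containing (row, col)."""
    color = board[row][col]
    size = len(board)
    seen, libs, stack = set(), set(), [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for r2, c2 in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= r2 < size and 0 <= c2 < size:
                if board[r2][c2] == '.':
                    libs.add((r2, c2))       # an empty neighbour: a liberty
                elif board[r2][c2] == color:
                    stack.append((r2, c2))   # same colour: part of the group
    return len(libs)

board = ["....",
         ".WB.",
         "WBB.",
         ".W.."]
print(liberties(board, 1, 2))  # the three-stone black group has 4 liberties
```

Two rules this simple generating a search space of 10 to the power 170 is the whole difficulty of the game.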
To deep-learn the game of Go, the team downloaded 100,000 amateur games and trained a supervised learning “policy” network to predict and play the move the human player played. After a lot of work they got to 60% accuracy as to what a human would have done. They then made the system play itself millions of times, and rewarded it for wins, which made it slowly re-evaluate the value of each move. This got the win rate up to 80%. Then the system played itself another 30 million times. That meant for every position they knew the probability of winning the game, which gave them an evaluation function, previously thought an impossible achievement. They called this the value network, which allowed a calculation of who was winning, and by how much.
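The value-network idea can be shown in miniature: play a game against itself many times and record, for every position reached, how often the player to move went on to win. That average is an evaluation function. The game below is a trivial stand-in of my own choosing (take 1 to 3 stones from a pile of 10; whoever takes the last stone wins), used purely to show the shape of the recipe, not anything like AlphaGo's networks:

```python
import random

random.seed(0)

# Self-play value estimation on a toy game: for each position, estimate the
# probability that the player to move eventually wins, by averaging over
# many random self-play games.

def selfplay_values(games=20000):
    stats = {}                         # pile size -> (wins for mover, visits)
    for _ in range(games):
        pile, history = 10, []
        while pile > 0:
            history.append(pile)       # record the position before each move
            pile -= random.choice([a for a in (1, 2, 3) if a <= pile])
        # The player who took the last stone won; walking the game backwards,
        # the player to move won at the final position, lost one earlier, etc.
        won = True
        for p in reversed(history):
            w, n = stats.get(p, (0, 0))
            stats[p] = (w + won, n + 1)
            won = not won
    return {p: w / n for p, (w, n) in stats.items()}

values = selfplay_values()
print(values[1])   # exactly 1.0: facing one stone, you take it and win
print(round(values[4], 2))
```

Even random self-play separates good positions from bad ones; training the players as they go, as DeepMind did, sharpens the estimates further.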
The Policy Network provides the input in terms of the probability of moves arising from a position, and the Value Network provides the game-winning value of a move. All this is great, but you still need a planning function. They used a Monte Carlo tree search, and instead of having to churn through 200 possibilities, they looked at the 2 or 3 moves most played by the amateurs. I have simplified this, but it made the search task manageable: a great breakthrough. Thus trained and maximized, AlphaGo could beat 494 out of 495 computer opponents. It then beat Fan Hui, a professional player, 5-0. (Silver et al., Nature, 2016)
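A bare-bones Monte Carlo tree search shows the skeleton of that planning function: select moves by upper-confidence bounds, play each line to the end, back the results up the tree. The pruning matters because tree size grows as branching factor to the power of depth: at depth 4, 200 moves per position means 200 to the 4th, or 1.6 billion lines, while 3 moves means just 81. AlphaGo adds the policy network to propose candidate moves and the value network to evaluate positions; this sketch of mine, on a toy game (take 1 to 3 stones, last stone wins), has none of that machinery, only the search skeleton.

```python
import math
import random

random.seed(0)

# Bare-bones Monte Carlo tree search: descend the tree picking moves by
# upper-confidence bound, play to the end, back up wins for each mover.

def moves(pile):
    return [a for a in (1, 2, 3) if a <= pile]

def mcts(pile, iters=5000):
    wins, visits = {}, {}                   # statistics per (pile, move) pair
    for _ in range(iters):
        path, p, turn = [], pile, 0         # turn 0 is the root player
        while p > 0:                        # descend, guided by UCB
            total = sum(visits.get((p, a), 0) for a in moves(p)) + 1
            def ucb(a):
                n = visits.get((p, a), 0)
                if n == 0:
                    return float("inf")     # try unvisited moves first
                return wins[(p, a)] / n + math.sqrt(2 * math.log(total) / n)
            a = max(moves(p), key=ucb)
            path.append((p, a, turn))
            p -= a
            turn = 1 - turn
        winner = 1 - turn                   # whoever took the last stone won
        for p, a, t in path:                # back up: credit the mover's wins
            visits[(p, a)] = visits.get((p, a), 0) + 1
            wins[(p, a)] = wins.get((p, a), 0.0) + (1.0 if t == winner else 0.0)
    # The recommended move is the most-visited one at the root.
    return max(moves(pile), key=lambda a: visits.get((pile, a), 0))

print(mcts(10))
```

The search concentrates its visits on the strongest line, which is exactly why feeding it only the policy network's 2 or 3 candidate moves, rather than all 200, made Go tractable.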
Very interestingly, getting more computer power does not help AlphaGo all that much. Between the first match against the professional European Champion Fan Hui and then the test match against World Champion Lee Sedol, AlphaGo improved to a 99% win rate against the 6 month earlier version. Against the world champion Lee Sedol, AlphaGo played a divine move: a move with a human probability of only 1 in 1000, but a value move revealed 50 moves later to have been key to influencing power and territory in the centre of the board. (The team do not yet have techniques to show exactly why it made that move). Originally seen by commentators as a fat finger mis-click, it was the first indication of real creativity. Not a boring machine.
The creative capabilities of the deep knowledge system are only one aspect of this incredible achievement. More impressive is the rate at which it learnt the game, going up the playing hierarchy from nothing, 1 rank a month, to world champion in 18 months, and is nowhere near asymptote yet. It does not require the computer power to compute 200 million positions a second that IBM's Deep Blue required to beat Kasparov. Talk about a mechanical Turk! AlphaGo needed to look at only 100,000 positions a second for a game that was one order of magnitude more complicated than chess. It becomes more human, comparatively, the more you find out about it, yet what it does now is not rigid and handcrafted, but flexible, creative, deep and real.
Further, it is doing things which the creators cannot explain in detail. So intent were they on building a winner, they did not give it the capacity to give a running commentary. Now, post-win, they are going to build visualizers to show what is going on inside the Von Neumann mind. What will the system say? “Same stupid problem as Thursday?” “Don’t interrupt me while I am thinking?” Or just, every time: “comparing the policy network with the value network, considering the 3 most common moves, watching the clock and sometimes, just sometimes, finding a shortcut”.
What about us poor humans, of the squishy sort? Fan Hui found his defeat liberating, and it lifted his game. He has risen from 600th position to 300th position as a consequence of thinking about Go in a different way. Lee Sedol, at the very top of the mountain till he met AlphaGo, rated it the best experience of his life. The one game he won was based on a divine move of his own, another “less than 1 in 1000” move. He will help overturn convention, and take the game to new heights.
All the commentary on the Singularity is that when machines become brighter than us they will take over, reducing us to irrelevant stupidity. I doubt it. They will drive us to new heights.
On that note, the program was created by humans, as shown in the picture at the top of the post. The AlphaGo team, who in my mind must rank high in the annals of creative enterprise, are a snapshot of bright people on whom the rest of us rely for real innovation.
All those years ago, my daughter was right to think that Demis Hassabis showed promise.
Promise me you will give at least one talk at a school.