The Algorithm Behind the Curtain: How DeepMind Built a Machine that Beat a Go Master (1 of 5)

Machine learning’s victory in the game of Go is a major milestone in computer science. In the first article in this series, we’ll explain why, and start dissecting the algorithms that made it happen.

2016-05-18 | By Jake Bennett

In March, an important milestone for machine learning was accomplished: a computer program called AlphaGo beat one of the best Go players in the world—Lee Sedol—four times in a five-game series. At first blush, this win may not seem all that significant. After all, machines have been using their growing computing power for years to beat humans at games, most notably in 1997 when IBM’s Deep Blue beat world champ Garry Kasparov at chess. So why is the AlphaGo victory such a big deal?

The answer is twofold. First, Go is a much harder problem for computers than other games because of the massive number of possible board configurations. Backgammon has roughly 10²⁰ different board configurations, chess has 10⁴³, and Go has a whopping 10¹⁷⁰. 10¹⁷⁰ is an insanely large number, too big for humans to truly comprehend. The most common analogy is that it is far larger than the number of atoms in the observable universe (roughly 10⁸⁰). This magnitude matters because it implies that if machine learning (ML) can outperform the best humans on a problem as large as Go, then ML can tackle a new set of real-world problems far more complex than previously thought possible. In other words, the potential for machine learning to affect our day-to-day lives in the near future just got a lot bigger.
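To get a feel for these magnitudes, here is a back-of-the-envelope sketch. It assumes a commonly cited estimate of about 10⁸⁰ atoms in the observable universe. It also computes a naive upper bound for Go in which each of the 19 × 19 points can be empty, black or white. The 10¹⁷⁰ figure is generally understood to count only legal positions, so it sits somewhat below this naive bound. The variable names are illustrative.

```python
import math

# Approximate board-configuration counts quoted above, stored as base-10 exponents.
CONFIGS_LOG10 = {"backgammon": 20, "chess": 43, "go": 170}

# Assumption: ~10^80 atoms in the observable universe (a commonly cited estimate).
ATOMS_LOG10 = 80

# Naive upper bound for Go: each of the 19 x 19 points is empty, black, or white.
POINTS = 19 * 19
naive_bound = 3 ** POINTS                    # exact integer (Python ints are unbounded)
naive_bound_digits = len(str(naive_bound))   # number of decimal digits in 3^361
naive_bound_log10 = POINTS * math.log10(3)   # the same quantity as a base-10 exponent

# Ratios of huge numbers are easiest to compare as differences of exponents.
go_vs_chess_log10 = CONFIGS_LOG10["go"] - CONFIGS_LOG10["chess"]
go_vs_atoms_log10 = CONFIGS_LOG10["go"] - ATOMS_LOG10

print(f"3^{POINTS} is about 10^{naive_bound_log10:.1f} ({naive_bound_digits} digits)")
print(f"Go has about 10^{go_vs_chess_log10} times as many configurations as chess")
print(f"Go configurations outnumber atoms in the universe by about 10^{go_vs_atoms_log10}")
```

Running it reports a naive bound of about 10^172.2 (a 173-digit number). It also shows that Go has about 10^127 times as many configurations as chess, and about 10^90 times as many as there are atoms in the universe.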

Furthermore, the sheer size of the Go problem means that brute-force computation alone will never be able to solve it; the problem requires a smarter algorithm. This brings us to the second reason the AlphaGo win is such a major milestone: the program was driven by a general-purpose learning algorithm rather than a purpose-built one. That is, the same code used to win at Go can also be used to solve other problems. This approach is distinctly different from earlier game-playing programs like IBM's Deep Blue, which was built specifically for chess and can play nothing else. In contrast, AlphaGo's precursor learned to play 49 different classic Atari games, each with distinctly different rules and game mechanics. In the real world, a general-purpose algorithm means that many different types of problems could potentially be solved using the same codebase.
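Why brute force is hopeless can be shown with simple arithmetic. The sketch below imagines a hypothetical, deliberately generous machine that examines 10¹⁸ positions per second, which is roughly exascale. It then asks how long visiting each of the roughly 10¹⁷⁰ configurations just once would take, compared with the approximate age of the universe (about 13.8 billion years). The rate is an assumption chosen only for illustration.

```python
import math

# Hypothetical, deliberately generous hardware: 10^18 positions examined per second.
POSITIONS_LOG10 = 170  # ~10^170 Go configurations, as quoted above
RATE_LOG10 = 18        # assumed positions per second, as a base-10 exponent

SECONDS_PER_YEAR = 365.25 * 24 * 3600
UNIVERSE_AGE_YEARS = 13.8e9  # approximate age of the universe

# Time to visit every configuration once: 10^170 / 10^18 = 10^152 seconds.
seconds_log10 = POSITIONS_LOG10 - RATE_LOG10
universe_age_seconds_log10 = math.log10(UNIVERSE_AGE_YEARS * SECONDS_PER_YEAR)

# How many universe lifetimes that enumeration would take, as a base-10 exponent.
lifetimes_log10 = seconds_log10 - universe_age_seconds_log10

print(f"Enumeration time: about 10^{seconds_log10} seconds")
print(f"That is roughly 10^{lifetimes_log10:.0f} times the age of the universe")
```

Even under this generous assumption, the enumeration would take about 10^134 lifetimes of the universe. A thousandfold faster machine barely dents that exponent, which is why a smarter algorithm is unavoidable.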

It is the combination of these two factors—the ability to solve very large problems and the design of a general-purpose learning algorithm—that makes the AlphaGo win such a significant milestone. It also explains why the match has caused such a stir in the media. Some people view Lee Sedol’s defeat as the harbinger of machine domination in the labor market. Others suggest that it has ushered in the Golden Age of AI. South Korea—which gave the Go match prime-time coverage—saw it as a wake-up call, pledging to invest $860 million in AI research to remain globally competitive.

Despite all the hype, however, it’s important to remember that AlphaGo is just a computer program, written by humans, to solve a particular set of tasks. In this regard it is no different from a calculator, a CRM system or a search engine. The only difference is that most people have at least a high-level understanding of how these tools work, so they have no reason to fear them. By contrast, very few people other than data scientists have any earthly idea how machine learning works. How the heck do you design an algorithm that learns how to solve a problem? It sounds like dark magic.

Fortunately for us, the underlying algorithms and computing architecture used to build AlphaGo and its precursor (the machine that learned to master Atari games) have been made public in a series of academic papers and video lectures. Rather than shroud the technology in secrecy, Google (the owner of DeepMind, the U.K. company that developed AlphaGo) has been happy to share its knowledge with the world. Why, you might ask? Because Google is clawing its way up from third place among cloud providers (behind Microsoft and Amazon) by building a reputation as a leader in cloud-based machine learning and big data. Its association with DeepMind and AlphaGo gives it serious street cred among the machine learning crowd, and is helping convince Amazon cloud customers like Spotify to jump ship to Google Cloud Platform. Their competition is our gain.

In this series of articles, we’ll use these published papers to peek behind the curtain and understand how DeepMind’s program works. We’ll explain the algorithms at a high level and try to demystify the dark art of machine learning, piecing together a blueprint of how DeepMind achieved such a breakthrough.

The series consists of four articles:

Reinforcement Learning Introduction – This is what you’re reading right now.
Reinforcement Learning Concepts – Reinforcement learning (RL) is the technique that drives DeepMind’s game-playing machine. It uses reward signals to learn and to make long-term-oriented decisions during the game. In this article we’ll review the major components of an RL system.
Reinforcement Learning Algorithms – In this article, we’ll see how the various RL components come together in the Q-learning algorithm.
Function Approximation with Neural Networks – Deep neural networks are performing amazing feats right now in fields like computer vision and speech recognition. DeepMind used deep learning in a completely different way, combining it with reinforcement learning to cope with enormous problem spaces.
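As a small preview of what "reward signals" and "long-term-oriented decisions" can mean in practice, RL commonly scores a sequence of rewards with a discounted return. In that scheme, a reward that arrives t steps in the future is weighted by γᵗ for some discount factor γ between 0 and 1. The sketch below is only an illustration under that assumption. The function name, the value γ = 0.9 and the two toy episodes are hypothetical choices, not details taken from DeepMind's papers. The later articles in this series cover the real machinery.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the reward at step t is weighted by gamma**t.

    gamma (between 0 and 1) controls how much future rewards count
    relative to immediate ones.
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical toy episodes: a Go-like game gives little or no reward until it ends.
grab_early = [0.5, 0, 0, 0, 0]   # small immediate gain, then nothing
win_late = [0, 0, 0, 0, 1.0]     # nothing until a win on the final move

for name, rewards in [("grab_early", grab_early), ("win_late", win_late)]:
    print(name, round(discounted_return(rewards), 4))
```

Here `grab_early` scores 0.5 and `win_late` scores 0.9⁴ ≈ 0.6561. An agent that maximizes this quantity would therefore pass up the quick gain in favor of the eventual win, which is the long-term orientation described above.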

Even though the math behind machine learning can be a little intense, the concepts behind it are fairly intuitive. With a greater understanding of how state-of-the-art machine learning works, hopefully you’ll come to see that ML is just a new way for computers to solve problems, not the beginning of the robot apocalypse.

Read the next installment in this series: Reinforcement Learning Concepts.