RandomAnt


Technology + Management + Innovation
20 May 2016 · Machine Learning

The Algorithm Behind the Curtain: Understanding How Machines Learn with Q-Learning (3 of 5)

by Jake Bennett

Reinforcement Learning (RL) is the driving algorithm behind AlphaGo, the machine that beat a Go master. In this article, we explore how the components of an RL system come together in an algorithm that is able to learn.

Our goal in this series is to gain a better understanding of how DeepMind constructed a learning machine, AlphaGo, that was able to beat a world-class Go master. In the first article, we discussed why AlphaGo’s victory represents a breakthrough in computer science. In the second article, we attempted to demystify machine learning (ML) in general, and reinforcement learning (RL) in particular, by providing a 10,000-foot view of traditional ML and unpacking the main components of an RL system. We discussed how RL agents operate in a flowchart-like world represented by a Markov Decision Process (MDP), and how they seek to optimize their decisions by determining which action in any given state yields the most cumulative future reward. We also defined two important functions, the state-value function (represented mathematically as V) and the action-value function (represented as Q), that RL agents use to guide their actions. In this article, we’ll put all the pieces together to explain how a self-learning algorithm works.

The state-value and action-value functions are the critical bits that make RL tick. These functions quantify how much each state or action is estimated to be worth in terms of its anticipated cumulative future reward. Choosing an action that leads the agent to a state with a high state-value is tantamount to making a decision that maximizes long-term reward, so getting these functions right is essential. The challenge, however, is that figuring out V and Q is difficult. In fact, one of the main areas of focus in the field of reinforcement learning is finding better and faster ways to accomplish this.
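
Written in the usual textbook notation (a sketch, not formulas from the article itself: π is the agent’s policy, γ is a discount factor between 0 and 1 that down-weights rewards far in the future, and r_t is the reward received at step t), the two functions are:

\[
V^{\pi}(s) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s \right]
\qquad
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s,\, a_0 = a \right]
\]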

One challenge in calculating V and Q is that the value of a given state, let’s say state A, depends on the values of other states, and the values of those other states in turn depend on the value of state A. This results in a classic chicken-or-the-egg problem: the value of state A depends on the value of state B, but the value of state B depends on the value of state A. It’s circular logic.
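
In practice, the standard escape from this circularity is iteration: start with arbitrary value estimates and repeatedly update each state’s value from the current estimates of its neighbors until the numbers settle. Here’s a minimal Python sketch of that idea (the two-state MDP, its rewards, and its transition probabilities are invented purely for illustration; this is value iteration, one common way to compute V, not AlphaGo’s actual code):

```python
# Value iteration on a tiny, made-up MDP: start with zeroed value
# estimates and sweep until the circular definitions converge.
GAMMA = 0.9  # discount factor for future rewards

# transitions[state][action] = list of (probability, next_state, reward)
transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 0.0)], "go": [(1.0, "A", 2.0)]},
}

V = {s: 0.0 for s in transitions}  # initial guess: every state is worth 0

for sweep in range(1000):
    delta = 0.0
    for s, actions in transitions.items():
        # Q(s, a) computed from the *current* estimates of V
        q_values = [
            sum(p * (r + GAMMA * V[s2]) for p, s2, r in outcomes)
            for outcomes in actions.values()
        ]
        new_v = max(q_values)  # act greedily with respect to Q
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-6:  # estimates stopped changing: we've converged
        break

print(V)
```

Each sweep computes Q from value estimates that are themselves still being refined, and for discounted MDPs this process is guaranteed to converge, which is why the circularity isn’t fatal.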

more…

18 May 2016 · Machine Learning

The Algorithm Behind the Curtain: Reinforcement Learning Concepts (2 of 5)

by Jake Bennett

Reinforcement Learning (RL) is at the heart of DeepMind’s Go playing machine. In the second article in this series, we’ll explain what RL is, and why it represents a break from mainstream machine learning.

In the first article in this series, we discussed why AlphaGo’s victory over world champ Lee Sedol in Go represented a major breakthrough for machine learning (ML). In this article, we’ll dissect how reinforcement learning (RL) works. RL is one of the main components used in DeepMind’s AlphaGo program.

Reinforcement Learning Overview

Reinforcement learning is a subset of machine learning that has its roots in computer science techniques established in the mid-1950s. Although it has evolved significantly over the years, reinforcement learning received less attention than other types of ML until recently. To understand why RL is unique, it helps to know a bit more about the ML landscape in general.

Most machine learning methods used in business today are predictive in nature. That is, they attempt to understand complex patterns in data — patterns that humans can’t see — in order to predict future outcomes. The term “learning” in this type of machine learning refers to the fact that the more data the algorithm is fed, the better it is at identifying these invisible patterns, and the better it becomes at predicting future outcomes.

This type of predictive machine learning falls into two categories: supervised learning and unsupervised learning. Supervised learning uses large sets of training data that describe observations that have occurred in the past. This training data contains columns that quantitatively describe the observations (these descriptive columns are called “features”), in addition to the final outcome of each observation that the algorithm is trying to predict (this is called the “label”). For example, a spam filter designed to predict whether an incoming email is spam might look at millions of emails that have already been classified as spam or not-spam (this is the label) to learn how to properly classify new emails. The existing emails are the observations (also called “samples”). The features in the dataset might include things like a count of the word “Viagra” in the text of the email, whether or not the email contains a “$” in the subject line, and the number of users who have flagged it as junk email.
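
To make the spam-filter example concrete, here’s a minimal sketch of what training such a classifier might look like (the feature values and labels below are invented, and scikit-learn’s LogisticRegression is just one of many models that could be plugged in; this isn’t the code of any particular production filter):

```python
from sklearn.linear_model import LogisticRegression

# Each row is one observed email, described by the features from the
# example above: [count of "Viagra", "$" in subject (0/1), junk flags].
X_train = [
    [3, 1, 12],   # spammy: mentions Viagra, "$" in subject, many flags
    [0, 0, 0],    # clean newsletter
    [1, 1, 7],
    [0, 0, 1],
]
y_train = [1, 0, 1, 0]  # labels: 1 = spam, 0 = not spam

# "Learning" here means fitting weights that map features to labels.
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict the label for a new, unseen email.
new_email = [[2, 0, 5]]
print(model.predict(new_email))        # e.g. [1] -> classified as spam
print(model.predict_proba(new_email))  # confidence for each class
```

In a real filter the training set would have millions of rows and far more features, but the fit-then-predict workflow is the same.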

more…

18 May 2016 · Machine Learning

The Algorithm Behind the Curtain: How DeepMind Built a Machine that Beat a Go Master (1 of 5)

by Jake Bennett

Machine learning’s victory in the game of Go is a major milestone in computer science. In the first article in this series, we’ll explain why, and start dissecting the algorithms that made it happen.

In March, machine learning reached an important milestone: a computer program called AlphaGo beat one of the best Go players in the world, Lee Sedol, four times in a five-game series. At first blush, this win may not seem all that significant. After all, machines have been using their growing computing power for years to beat humans at games, most notably in 1997 when IBM’s Deep Blue beat world champ Garry Kasparov at chess. So why is the AlphaGo victory such a big deal?

The answer is two-fold. First, Go is a much harder problem for computers to solve than other games due to the massive number of possible board configurations. Backgammon has 10^20 different board configurations, chess has 10^43, and Go has a whopping 10^170. 10^170 is an insanely large number, too big for humans to truly comprehend; the best analogy is that it is larger than the number of atoms in the universe. The magnitude of 10^170 matters because it implies that if machine learning (ML) can perform better than the best humans on a problem as large as Go, then ML can solve a new set of real-world problems that are far more complex than previously thought possible. This means that the potential for machine learning to impact our day-to-day lives in the near future just got a lot bigger.

Furthermore, the sheer size of the Go problem means that pure, brute-force computation alone will never be able to solve it; doing so requires designing a smarter algorithm. This brings us to the second reason the AlphaGo win is such a major milestone: the program was driven by a general-purpose learning algorithm, rather than a purpose-built one. That is, the same code used to win at Go can also be used to solve other problems. This approach is distinctly different from purpose-built programs like IBM’s Deep Blue, which can only play chess. In fact, the precursor to the AlphaGo program had also learned how to play 49 different classic Atari games, each with its own rules and game mechanics. The implication of a general-purpose algorithm in the real world is that many different types of problems could potentially be solved using the same codebase.

It is the combination of these two factors—the ability to solve very large problems and the design of a general-purpose learning algorithm—that makes the AlphaGo win such a significant milestone. It also explains why the match has caused such a stir in the media. Some people view Lee Sedol’s defeat as the harbinger of machine domination in the labor market. Others suggest that it has ushered in the Golden Age of AI. South Korea—which gave the Go match prime-time coverage—saw it as a wake-up call, pledging to invest $860 million in AI research to remain globally competitive.

more…