Technology + Management + Innovation
Machine Learning

The Algorithm Behind the Curtain: Understanding How Machines Learn with Q-Learning (3 of 4)

by Jake Bennett

Reinforcement Learning (RL) is the driving algorithm behind AlphaGo, the machine that beat a Go master. In this article, we explore how the components of an RL system come together in an algorithm that is able to learn.


Our goal in this series is to gain a better understanding of how DeepMind constructed a learning machine — AlphaGo — that was able to beat a world-class Go master. In the first article, we discussed why AlphaGo’s victory represents a breakthrough in computer science. In the second article, we attempted to demystify machine learning (ML) in general, and reinforcement learning (RL) in particular, by providing a 10,000-foot view of traditional ML and unpacking the main components of an RL system. We discussed how RL agents operate in a flowchart-like world represented by a Markov Decision Process (MDP), and how they seek to optimize their decisions by determining which action in any given state yields the most cumulative future reward. We also defined two important functions, the state-value function (represented mathematically as V) and the action-value function (represented as Q), that RL agents use to guide their actions. In this article, we’ll put all the pieces together to explain how a self-learning algorithm works.

The state-value and action-value functions are the critical bits that make RL tick. These functions quantify how much each state or action is estimated to be worth in terms of its anticipated, cumulative, future reward. Choosing an action that leads the agent to a state with a high state-value is tantamount to making a decision that maximizes long-term reward — so it goes without saying that getting these functions right is critical. The challenge, however, is that figuring out V and Q is difficult. In fact, one of the main areas of focus in the field of reinforcement learning is finding better and faster ways to accomplish this.

One challenge faced when calculating V and Q is that the value of a given state, let’s say state A, is dependent on the value of other states, and the values of these other states are in turn dependent on the value of state A. This results in a classic chicken-or-the-egg problem: The value of state A depends on the value of state B, but the value of state B depends on the value of state A. It’s circular logic.
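
Many RL methods, Q-learning among them, break this circularity by bootstrapping: they start with rough guesses for every value and repeatedly nudge each estimate toward the reward just observed plus the current estimate of what comes next, until the numbers stop changing. The sketch below illustrates the idea with a Q-learning update on a made-up four-state maze; the states, rewards and learning parameters are hypothetical, and this is a toy illustration of the general technique rather than AlphaGo’s actual code.

    import random

    # A hypothetical four-state maze: A -> B -> C -> Goal, with a step penalty.
    # Transitions are deterministic, so an action is named after the state it leads to.
    states = ["A", "B", "C", "Goal"]
    actions = {"A": ["B"], "B": ["A", "C"], "C": ["B", "Goal"], "Goal": []}
    reward = {"Goal": 10}   # reaching the goal pays 10; every other move costs 1
    step_penalty = -1
    gamma = 0.9             # discount factor on future reward
    alpha = 0.5             # learning rate

    # Q maps a (state, action) pair to its estimated cumulative future reward.
    Q = {(s, a): 0.0 for s in states for a in actions[s]}

    for episode in range(500):
        s = "A"
        while s != "Goal":
            a = random.choice(actions[s])       # explore by picking a random move
            r = reward.get(a, step_penalty)     # reward for landing in the next state
            best_next = max((Q[(a, a2)] for a2 in actions[a]), default=0.0)
            # The Q-learning update: nudge the old estimate toward the observed
            # reward plus the (discounted) current estimate of what follows.
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = a

    print(Q)  # the estimates settle, even though each one depends on the others

Each pass blends a little new information into the old estimate, so the circular dependencies between states get resolved gradually rather than all at once.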


Machine Learning

The Algorithm Behind the Curtain: Reinforcement Learning Concepts (2 of 4)

by Jake Bennett

Reinforcement Learning (RL) is at the heart of DeepMind’s Go playing machine. In the second article in this series, we’ll explain what RL is, and why it represents a break from mainstream machine learning.


In the first article in this series, we discussed why AlphaGo’s victory over world champ Lee Sedol in Go represented a major breakthrough for machine learning (ML). In this article, we’ll dissect how reinforcement learning (RL) works. RL is one of the main components used in DeepMind’s AlphaGo program.

Reinforcement Learning Overview

Reinforcement learning is a subset of machine learning that has its roots in computer science techniques established in the mid-1950s. Although it has evolved significantly over the years, reinforcement learning didn’t receive as much attention as other types of ML until recently. To understand why RL is unique, it helps to know a bit more about the ML landscape in general.

Most machine learning methods used in business today are predictive in nature. That is, they attempt to understand complex patterns in data — patterns that humans can’t see — in order to predict future outcomes. The term “learning” in this type of machine learning refers to the fact that the more data the algorithm is fed, the better it is at identifying these invisible patterns, and the better it becomes at predicting future outcomes.

This type of predictive machine learning falls into two categories: supervised learning and unsupervised learning. Supervised learning uses large sets of training data that describe observations that have occurred in the past. This training data contains columns that quantitatively describe the observations (these descriptive columns are called “features”), in addition to the final outcome of the observation that the algorithm is trying to predict (this is called the “label”). For example, a spam filter designed to predict if an incoming email is spam might look at millions of emails that have already been classified as spam or not-spam (this is the label) to learn how to properly classify new emails. The existing emails are the observations (also called “samples”). The features in the dataset might include things like a count of the word “Viagra” in the text of the email, whether or not the email contains a “$” in the subject line, and the number of users who have flagged it as junk email.
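
To make the features-and-labels idea concrete, here is a minimal, hypothetical sketch of how such a spam filter might be trained. The numbers and the choice of a simple logistic-regression model (via scikit-learn) are illustrative assumptions, not taken from any real system:

    from sklearn.linear_model import LogisticRegression

    # Each row is one past email (a "sample"); the columns are its "features":
    # [count of "Viagra" in the body, 1 if "$" appears in the subject, junk-flag count]
    X = [
        [4, 1, 37],   # known spam
        [0, 0, 0],    # a note from a coworker
        [1, 1, 12],   # known spam
        [0, 0, 1],    # legitimate newsletter
    ]
    y = [1, 0, 1, 0]  # the "label": 1 = spam, 0 = not spam

    model = LogisticRegression()
    model.fit(X, y)                  # "learning" = finding patterns in the past data

    new_email = [[2, 1, 5]]          # features of a new, unlabeled email
    print(model.predict(new_email))  # the predicted label for the new observation

The more labeled samples the model is fed, the better its predictions tend to become, which is precisely the sense of “learning” described above.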


Machine Learning

The Algorithm Behind the Curtain: How DeepMind Built a Machine that Beat a Go Master (1 of 4)

by Jake Bennett

Machine learning’s victory in the game of Go is a major milestone in computer science. In the first article in this series, we’ll explain why, and start dissecting the algorithms that made it happen.


In March, an important milestone for machine learning was accomplished: a computer program called AlphaGo beat one of the best Go players in the world—Lee Sedol—four times in a five-game series. At first blush, this win may not seem all that significant. After all, machines have been using their growing computing power for years to beat humans at games, most notably in 1997 when IBM’s Deep Blue beat world champ Garry Kasparov at chess. So why is the AlphaGo victory such a big deal?

The answer is two-fold. First, Go is a much harder problem for computers to solve than other games due to the massive number of possible board configurations. Backgammon has 10²⁰ different board configurations, Chess has 10⁴³ and Go has a whopping 10¹⁷⁰ configurations. 10¹⁷⁰ is an insanely large number—too big for humans to truly comprehend. The best analogy used to describe 10¹⁷⁰ is that it is larger than the number of atoms in the universe. The magnitude of 10¹⁷⁰ is so important because it implies that if machine learning (ML) can perform better than the best humans on a problem as large as Go, then ML can solve a new set of real-world problems that are far more complex than previously thought possible. This means that the potential for machine learning to impact our day-to-day lives in the near future just got a lot bigger.

Furthermore, the sheer size of the Go problem means that pure, brute-force computation alone will never be able to solve the problem—it requires designing a smarter algorithm. This brings us to the second reason why the AlphaGo win is such a major milestone: the program was driven by a general-purpose learning algorithm, rather than a purpose-built one. That is, the same code used to win Go can also be used to solve other problems. This approach is distinctly different from that of other game-playing programs like IBM’s Deep Blue, which can only play chess. In contrast, the precursor to the AlphaGo program has also learned how to play 49 different classic Atari games, each with distinctly different rules and game mechanics. The implication of a general-purpose algorithm in the real world is that many different types of problems could potentially be solved using the same codebase.

It is the combination of these two factors—the ability to solve very large problems and the design of a general-purpose learning algorithm—that makes the AlphaGo win such a significant milestone. It also explains why the match has caused such a stir in the media. Some people view Lee Sedol’s defeat as the harbinger of machine domination in the labor market. Others suggest that it has ushered in the Golden Age of AI. South Korea—which gave the Go match prime-time coverage—saw it as a wake-up call, pledging to invest $860 million in AI research to remain globally competitive.


Solutions Architecture

Customer Experience Eats Proximity Technology

by Jake Bennett

Proximity technology alone won’t transform retail—it must be used to address customer needs in the digital age.


Proximity technology is a class of emerging technologies (which includes iBeacon, NFC, RFID and a host of others) that enable marketers to pinpoint the location of a customer at a particular point in time. Although proximity technology holds vast potential for marketers, it raises some legitimate concerns as well. Probably the most famous (or infamous) example of the dark side of proximity marketing was in the movie “Minority Report,” which depicted a world where people are under constant surveillance, allowing governments and businesses to track people continuously via retina scanners. In this futuristic landscape, digital billboards identify customers as they pass by and speak to them with highly personalized marketing messages: “Hello Mr. Yakimoto, welcome back to the Gap. How did those tank tops work out for you?”

Fortunately for us, ubiquitous, government-controlled retina scanners don’t exist in the real world. But, an even more powerful and pervasive tracking device does — the smartphone. When paired with proximity technology, the smartphone provides all the computational horsepower necessary to create sci-fi-inspired personalized marketing experiences, experiences that truly add value for the customer rather than creating a dystopian landscape. So if that’s the case, why hasn’t proximity technology transformed retail?

Indeed, proximity technology is at a critical inflection point. Right now, retailers are dipping their toes in the water: most are focused on experimentation and are still learning the benefits and limitations of the technology. This tentative approach, though, hasn’t created the disruption retailers have been hoping for — a transformation that will lure online shoppers back to the store. Installing iBeacons in a few stores won’t by itself change macro consumer trends. Nonetheless, the question is still hanging out there: Will proximity technology truly enhance retail ROI? Or is it another technology fad that will fail to live up to its promise?

It is too early to tell for sure, but the evidence still strongly suggests that creating a personalized, in-store, digital experience will benefit retail significantly. According to a PricewaterhouseCoopers study, the majority of shoppers in every tracked category, ranging from toys and electronics to clothing, conduct research online before making a retail purchase. What if retail locations could give customers the same level of personalization they receive online?



Seven Practical Technology Leadership Principles

by Jake Bennett

Being a great technologist requires very different skills than being a great technology leader. The key to making the transition is adopting the right mindset.


Technical managers are often promoted to their positions of leadership by rising through the ranks—more so than in most other disciplines. This is a practical move considering that business decisions today increasingly hinge on the nuanced details of the underlying technology. Technology leaders need to assess technical options, align recommendations with business requirements and communicate these decisions to non-technical stakeholders. If technology managers don’t understand the technology at a detailed level, it’s difficult for them to make the right call.

The challenge is that being a great engineer doesn’t automatically translate into being a great leader. Leadership—technical or otherwise—is not something one is born with; it is a skill that is developed over a lifetime. Unfortunately, many companies don’t have management training programs in place to cultivate leaders as they move up the org chart. And for those that do, these trainings are typically generic and conceptual. General management training is an important first step, but it is insufficient by itself to prepare technology leaders for the tactical challenges that await them on a day-to-day basis in their new role.

To help new technical managers through the transition from individual contributor to leader, I often work with them to adopt a new set of non-technical skills. Although everyone is different, I’ve found that the principles outlined below provide a strong foundation for becoming an effective technology leader—that is, one who is able to lead a team, implement change and consistently achieve results.

1. Adopt a Business Mindset and Develop Empathy

As an individual contributor, it is acceptable to view technology through a purely engineering lens. You have the luxury of focusing on the “how” and not the “why.” This means that as a contributor, you can indulge in technology religion, propose solutions without regard to business impact, and leave it to management to sort out the practical considerations of the real world. When you become a leader, however, you no longer have this luxury. You are now “management.” This means you need to make decisions based on the messy realities of the business, which requires considering financial constraints, organizational culture, office politics, human foibles, and business results.

New managers often make the mistake of making the case for their initiatives in technical terms, rather than business terms, and they become frustrated when they fail to receive the proper support. They expect the business to instinctively adopt a technical perspective, instead of realizing that it’s their job to reframe their proposals from the standpoint of the business.

The best way to overcome this mistake is to take the time to understand the business metrics that the company cares about the most, and the pain points felt by other departments. This requires empathy—a critical skill for effective leadership. Technology managers should talk to their colleagues and listen to their challenges. They should unpack the key metrics of the business, and understand the forces that drive them. They must summon the quantitative and analytical skills they have developed as engineers and apply them toward a new set of managerial problems. Once they have done this, they can make their case as a business leader rather than a technologist, and they can start engaging in a constructive dialog with the business.



New Year’s Security Resolutions

by Jake Bennett

Seven steps that will make this year the most secure year yet.


It’s the New Year, which means it’s time for the annual human ritual of making personal promises to give up bad habits and commit to living life better going forward. While most people are focused on renewing their gym memberships or cutting out carbs, my New Year’s resolution is to help make the Internet a safer place. As an industry, our collective security bad habits caught up with us last year, and it’s time for a change. Last year was a very bad year in terms of security. Here is but a small sampling of the headline-grabbing breaches that happened in 2015:

  • Toy manufacturer VTech suffered a breach that exposed 4.8 million customer records, including personal information about kids, as a result of weak password security.
  • Security researchers published a method for hacking a Jeep Cherokee, giving attackers the ability to violently stop the car. Virginia State Police revealed that their police cars could be compromised. Chrysler recalled 1.4 million cars to patch their in-car software to fix a security flaw.
  • CIA director John Brennan’s email was hacked by a teenager through a social engineering attack.
  • Credit agency Experian lost 15 million T-Mobile customer records to hackers, including names, addresses, social security numbers, birth dates and passport numbers.
  • Ransomware became a big deal in 2015. Cyber extortionists started using it more prevalently and more broadly, expanding beyond the desktop to encrypt websites and mobile devices for ransom.
  • Infidelity match-maker Ashley Madison lost customer records on 30+ million registered cheaters. Poor software development practices and bad password management were likely contributors.
  • Italian hacking consultancy Hacking Team was itself hacked, unleashing untold zero-day exploits into the wild and exposing its list of largely government clients.
  • The U.S. Office of Personnel Management suffered a breach affecting 22 million government workers, including the theft of entire background checks, fingerprint data, and other highly sensitive personal information. Great fodder for blackmail.
  • LastPass, a password management provider, was hacked, resulting in the theft of its customers’ passwords. Fortunately, the stolen passwords were encrypted, so it’s unclear if they were ever actually used. Adding to an already bad year for LastPass, security researchers then published multiple methods for compromising LastPass’s security.
  • The biggest hack of 2015, however, was the security breach of healthcare provider Anthem/Bluecross/BlueShield, which resulted in the theft of a whopping 80 million customer records, and an additional 19 million records of rejected customers. That’s about a third of the entire U.S. population.

2015 was abysmal for security—there is no denying it—but there is a silver lining. I’m hopeful that we’ll look back at 2015 as a watershed moment, the year the industry hit rock bottom, motivating us to get off the couch and start working our infosec muscles again. To that end, I’ve drafted a set of New Year’s security resolutions to get the ball rolling.

1. No Patch Left Behind

The bad guys are constantly scanning our networks for older software with known vulnerabilities. Even if a vulnerability shouldn’t be exploitable under normal circumstances, or it only leaks a little information, it is still one piece of the puzzle that hackers are assembling to gain access. I want to take all of these pieces off the board. That means maximizing the capabilities of vulnerability scanning tools (e.g. using agent-based or credentialed scans on every host), scanning every node on the network and getting every host to a “green” status. Think of this as the “broken windows” approach to patch management.

The challenge to patch management in the real world isn’t the time it takes to patch hosts (although that’s still a huge hurdle), it’s creating a process whereby we feel safe installing patches with minimal effort and little risk of breaking production systems. Effective patch management is a math problem:

Cost/time of patch management =
 (# of hosts * time to patch) +
 (# of hosts * time to fix post-patch problems) +
 (lost productivity from post-patch problems)
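
As a quick, back-of-the-envelope illustration (the numbers below are purely hypothetical), the same equation can be expressed in a few lines of Python:

    def patching_cost(num_hosts, time_to_patch, time_to_fix, lost_productivity):
        # Total cost/time of one patch cycle, per the formula above (in hours).
        return (num_hosts * time_to_patch
                + num_hosts * time_to_fix
                + lost_productivity)

    # Hypothetical example: 50 hosts, 30 minutes to patch each, an average of
    # 15 minutes of post-patch cleanup per host, plus 8 hours of lost productivity.
    print(patching_cost(50, 0.5, 0.25, 8))  # 45.5 hours for the cycle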

If you feel a twinge of hesitation to install the latest patch on a mission-critical system, that would be you mentally calculating the last two terms of the equation. You need to reduce this hesitation if you’re going to get to “green.” This can be managed by classifying systems into three buckets: 1) systems that result in a call from the CEO if they break, 2) systems that result in a call from a VP if they break, 3) all others. The number of hosts in bucket #1 is going to be small. Focus your time and energy on these patches by running them on test systems and testing them thoroughly before patching production. Or, alternatively, have a good rollback plan if things go wrong. For the systems in bucket #2, do a quick assessment of the latest patches, and run them if there are no obvious deal-breakers. As for the hosts in bucket #3, put those on autopilot, running patches automatically and asking for forgiveness later if they break.



Three Reasons Why Docker is a Game Changer

by Jake Bennett

Containers represent a fundamental evolution in software development, but not for the reasons most people think.


Docker’s rapid rise to prominence has put it on the radar of almost every technologist today, both IT professionals and developers alike. Docker containers hold the promise of providing the ideal packaging and deployment mechanism for microservices, a concept which has also experienced a growing surge in popularity.

But while the industry loves its sparkling new technologies, it is also deeply skeptical of them. Until a new technology has been battle-tested, it’s just an unproven idea with a hipster logo, so it’s not surprising that Docker is being evaluated with a critical eye—it should be.

To properly assess Docker’s utility, however, it’s necessary to follow container-based architecture to its logical conclusion. The benefits of isolation and portability, which get most of the attention, are reasons enough to adopt Docker containers. But the real game changer, I believe, is the deployment of containers in clusters. Container clusters managed by a framework like Google’s Kubernetes allow for the true separation of application code and infrastructure, and enable highly resilient and elastic architectures.

It is these three benefits in combination—isolation, portability, and container clustering—that are the real reasons why Docker represents such a significant evolution in how we build and deploy software. Containers further advance the paradigm shift in application development brought about by cloud computing by providing a higher layer of abstraction for application deployment, a concept we’ll explore in more detail later.

Is Docker worth it?

However, you don’t get the benefits of containers for free: Docker does add a layer of complexity to application development. The learning curve for Docker itself is relatively small, but it gets steeper when you add clustering. The question then becomes: is the juice worth the squeeze? That is, do containers provide enough tangible benefit to justify the additional complexity? Or are we just wasting our time following the latest fad?

Certainly, Docker is not the right solution for every project. The Ruby/Django/Laravel/NodeJS crowd will be the first to point out that their PaaS-ready frameworks already give them rapid development, continuous delivery and portability. Continuous integration platform provider Circle CI wrote a hilarious post poking fun at Docker, transcribing a fictitious conversation in which a Docker evangelist tries to explain the benefits of containers to a Ruby developer. The resulting confusion about the container ecosystem perfectly captures the situation.



Strengthen Your AWS Security by Protecting App Credentials and Automating EC2 and IAM Key Rotation

by Jake Bennett

Effective information security requires following strong security practices during development. Here are three ways to secure your build pipeline, and the source code to get you started.


One of the biggest headaches faced by developers and DevOps professionals is the problem of keeping the credentials used in application code secure. It’s just a fact of life. We have code that needs to access network resources like servers and databases, and we have to store these credentials somewhere. Even in the best of circumstances this is a difficult problem to solve, but the messy realities of daily life further compound the issue. Looming deadlines, sprawling technology and employee turnover all conspire against us when we try to keep the build pipeline secure. The result is “credential detritus”: passwords and security keys littered across developer workstations, source control repos, build servers and staging environments.

Use EC2 Instance Profiles

A good strategy for minimizing credential detritus is to reduce the number of credentials that need to be managed in the first place. One effective way to do this in AWS is by using EC2 Instance Profiles. An Instance Profile is an IAM Role that is assigned to an EC2 instance when it’s created. Once this is in place, any CLI or SDK calls to AWS resources made by code running on the EC2 instance will be made within the security context of the Instance Profile. This is extremely handy because it means that you don’t need to worry about getting credentials onto the instance when it’s created, and you don’t need to manage them on an ongoing basis—AWS automatically rotates the keys for you. Instead, you can spend your time fine-tuning the security policy for the IAM Role to ensure that it has only the privileges it needs to get its job done.
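
As a minimal sketch of what this looks like in practice (assuming the boto3 SDK and a hypothetical S3 bucket that the role’s policy permits), application code on the instance can call AWS without any keys appearing in the code or on disk:

    import boto3

    # No access keys anywhere: on an EC2 instance with an Instance Profile
    # attached, the SDK automatically retrieves (and refreshes) temporary
    # credentials for the instance's IAM Role.
    s3 = boto3.client("s3")

    # This succeeds only if the role's policy allows it, so scope the policy
    # to the least privilege the application actually needs.
    response = s3.list_objects_v2(Bucket="my-hypothetical-app-bucket")
    for obj in response.get("Contents", []):
        print(obj["Key"])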

Get Credentials Out of Source Control

EC2 Instance Profiles are a big help, but they won’t completely eliminate the need to manage credentials in your application. There are plenty of non-AWS resources that your code requires access to, and EC2 Instance Profiles won’t help you with those. For these credentials, we need another approach. This starts by making sure that credentials are NEVER stored in source control. A good test I’ve heard is this: if you were forced to push your source code to a public repository on GitHub today, then you should be able to sleep well tonight knowing that no secret credentials would be revealed. How well would you sleep if this happened to you?
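
One common way to pass this test (a sketch of the general idea; a dedicated secrets manager is another option) is to keep secrets out of the repository entirely and inject them through the environment at deploy time. The variable name below is hypothetical:

    import os

    # The connection string lives in the environment (set by deploy tooling, a
    # .env file excluded via .gitignore, or a secrets manager), never in Git.
    db_url = os.environ.get("APP_DATABASE_URL")

    if db_url is None:
        raise RuntimeError("APP_DATABASE_URL is not set; refusing to start "
                           "without credentials")

    # ...the application connects using db_url here, and pushing the source
    # code to a public repo would reveal nothing sensitive.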

The main problem is that source code repositories typically have a wide audience, including people who shouldn’t have access to security credentials. And once you check in those credentials, they’re pretty much up there forever. Moreover, if you use a distributed SCM like Git, then those credentials are stored along with your source code on all of your developers’ machines, further increasing your exposure. The more breadcrumbs you leave lying around, the more likely rats will end up infesting your home. This appears to be what happened in the Ashley Madison hack that took place earlier this year. Hard-coded credentials stored in source control were implicated as a key factor in the attack. Apparently, their source code was littered with the stuff. Not good.



Eight Reasons Why Agile Motivates Project Teams

by Jake Bennett

Research proves what software developers already know: Agile projects are more fun and inspiring to work on. In this article, we review the science that explains why Agile fosters greater motivation.


A few weeks ago, I finished conducting a series of video retrospectives with several POP team members who recently completed Agile/Scrum projects. The goal of these one-on-one interviews was to elicit the kinds of critical insights that can only be discovered through in-the-trenches experience. Video recording the conversations allowed me to quickly distribute these Agile learnings to the larger agency in easy-to-digest bites.

It was great listening to the team talk about their Scrum experiences, but what struck me the most was the universal belief among the people I talked to that Agile projects were more fun and motivating than Waterfall projects. I wouldn’t have considered this a pattern if the people I interviewed had all worked on the same project. But the team members I spoke with worked on a variety of different projects, ranging from e-commerce sites, to mobile apps, to frontend-focused work. Furthermore, the participants came from different departments, including design, development, project management and QA. Yet despite these differences, it was clear that everyone I talked to shared one thing in common: they all had a much higher level of satisfaction and motivation when working on Agile projects. So for me the big question was: Why? What was it about Agile that fostered greater motivation and better performance than a typical Waterfall project?

Money certainly didn’t have anything to do with it. None of the team members I spoke with were compensated any more or less based on their participation on an Agile project. But the fact that money wasn’t the answer didn’t come as a surprise. Decades of research have debunked the myth that money motivates employees, despite corporate America’s obsession with performance-based pay. So if not money, then what?

The truth is that there isn’t any one aspect of Scrum that increases motivation. But when you dig into the research behind employee motivation it becomes pretty clear that there are several aspects of Scrum that do. To better understand why, let’s dive into the research.

1. Setting Goals

One of the most powerful motivators for employees is simply setting clear goals. According to Stephen Robbins, professor and author of The Essentials of Organizational Behavior, the research is definitive—setting goals works: “Considerable evidence supports goal-setting theory. This theory states that intentions—expressed as goals—can be a major source of work motivation. We can say with a considerable degree of confidence that specific goals lead to increased performance; that difficult goals, when accepted, result in higher performance than easy goals; and that feedback leads to higher performance than no feedback.” Fortunately, we’ve already been told that setting actionable goals is good, which is why managers focus on goal-setting during the annual employee review process. But this isn’t enough. Clear and challenging project-based goals occur more frequently and are usually more tangible and satisfying than amorphous yearly career goals. So having employees work on Scrum projects throughout the year can help bolster morale between reviews.



Using Vagrant, Chef and IntelliJ to Automate the Creation of the Java Development Environment

by Jake Bennett

The long path to DevOps enlightenment begins with the Developer’s IDE: Here’s how to get started on the journey. In this article we walk through the steps for automating the creation of a virtual development environment.


One of the challenges faced by software developers working on cloud applications and distributed systems today is setting up the developer workstation in a development environment composed of an ever-increasing number of services and technologies. It was already hard enough to configure developer workstations for complex monolithic applications, and it’s even harder now that we’re breaking applications down into multiple micro-services and databases. If you are starting to feel like your developers’ workstations have become fragile beasts that are able to generate builds only by the grace of God and through years of mystery configuration settings, then you may be suffering from a common affliction. Seek medical help immediately if you are experiencing any of the following symptoms:

  • The onboarding of new developers takes days or even weeks because getting a new development machine configured is a time-consuming and error-prone process.
  • The words “But the code works on my machine!” are uttered frequently within your organization.
  • Bugs are often discovered in production that don’t occur in development or staging.
  • The documentation for deploying the application to production is a short text file with a last modified date that’s over a year old.

The good news is that there are technologies and practices to remedy these problems. The long-term cure for this affliction is cultivating a DevOps culture within your organization. DevOps is the hybrid of software development and infrastructure operations. With the rise of virtualization and cloud computing, these two formerly separate departments have found themselves bound together like conjoined twins. In the cloud, hardware is software, and thus software development now includes infrastructure management.
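
To make “hardware is software” concrete, here is a hedged sketch (using the boto3 SDK and a hypothetical AMI ID, rather than the Vagrant and Chef tooling this article focuses on) of provisioning a server the same way you would call any other API:

    import boto3

    ec2 = boto3.resource("ec2")

    # Creating a server is now just a function call: infrastructure expressed as
    # code that can be reviewed, versioned and repeated exactly.
    instances = ec2.create_instances(
        ImageId="ami-0123456789abcdef0",  # hypothetical AMI
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )
    print(instances[0].id)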