‘The Bitter Lesson’ of AI is not too bitter

I found an old bookmark today and it led me to The Bitter Lesson, a 2019 essay by Rich Sutton, a computer science professor and research scientist based in Alberta, Canada. Apparently OpenAI engineers were instructed to memorize the article.

tl;dr: “Leveraging human knowledge” has not been proven effective in significantly advancing artificial intelligence systems. Instead, leveraging computation (computational power, speed, operations per second, parallelization) is the only thing that works. “These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other.” Thus all efforts to encode how we think and what we know are just wasting precious time, damn it! This is the bitter lesson.

Sutton has worked extensively on reinforcement learning, so it’s not surprising that he mentions examples of AI systems playing games. Systems that “learn” by self-play — that is, one copy of the program playing another copy of the same program — leverage computational power, not human knowledge. DeepMind’s AlphaZero demonstrated that self-play can enable a program/system to learn to play not just one game but multiple games (although one copy only “knows” one game type).

“Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research.”

—Richard S. Sutton

Games, of course, are not the only domain in which we can see the advantages of computational power. Sutton notes that breakthroughs in speech recognition and image recognition came from application of statistical methods to huge training datasets.

Trying to make systems “that worked the way the researchers thought their own minds worked” was a waste of time, Sutton wrote — although I think today’s systems are still using layers of units (sometimes called “artificial neurons”) that connect to multiple units in other layers, and that architecture was inspired by what we do know about how brains work. Modifications (governed by algorithms, not adjusted by humans) to the connections between units constitute the “learning” that has proved to be so successful.

“[B]reakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.”

—Richard S. Sutton

I don’t think this lesson is actually bitter, because what Sutton is saying is that human brains (and human minds, and human thinking, and human creativity) are really, really complex, and so we can’t figure out how to make the same things happen in, or with, a machine. We can produce better and more useful outputs thanks to improved computational methods, but we can’t make the machines better by sharing with them what we know — or trying to teach them how we may think.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

AI researchers love playing games

I was catching up today on a couple of new-ish developments in reinforcement learning/game-playing AI models.

Meta (which, we always need to note, is the parent company of Facebook) apparently has an entire team of researchers devoted to training an AI system to play Diplomacy, a war-strategy board game. Unlike in chess or Go, a player in Diplomacy must collaborate with others to succeed. Meta’s program, named Cicero, has passed the bar, as explained in a Gizmodo article from November 2022.

“Players are constantly interacting with each other and each round begins with a series of pre-round negotiations. Crucially, Diplomacy players may attempt to deceive others and may also think the AI is lying. Researchers said Diplomacy is particularly challenging because it requires building trust with others, ‘in an environment that encourages players to not trust anyone,’” according to the article.

We can see the implications for collaborations between humans and AI outside of playing games — but I’m not in love with the idea that the researchers are helping Cicero learn how to gain trust while intentionally working to deceive humans. Of course, Cicero incorporates a large language model (R2C2, further trained on the WebDiplomacy dataset) for NLP tasks; see figures 2 and 3 in the Science article linked below. “Each message in the dialogue training dataset was annotated” to indicate its intent; the dataset contained “12,901,662 messages exchanged between players.”

Cicero was not identified as an AI construct while playing in online games with unsuspecting humans. It “apparently ‘passed as a human player,’ in 40 games of Diplomacy with 82 unique players.” It “ranked in the top 10% of players who played more than one game.”

See also: Human-level play in the game of Diplomacy by combining language models with strategic reasoning (Science, 2022).

Meanwhile, DeepMind was busy conquering another strategy board game, Stratego, with a new AI model named DeepNash. Unlike Diplomacy, Stratego is a two-player game, and unlike chess and Go, the value of each of your opponent’s pieces is unknown to you — you see where each piece is, but its identifying symbol faces away from you, like cards held close to the vest. DeepNash was trained on self-play (5.5 billion games) and does not search the game tree. Playing against humans online, it ascended to the rank of third among all Stratego players on the platform — after 50 matches.

Apparently the key to winning at Stratego is finding a Nash equilibrium, which I read about at Investopedia, which says: “There is not a specific formula to calculate Nash equilibrium. It can be determined by modeling out different scenarios within a given game to determine the payoff of each strategy and which would be the optimal strategy to choose.”
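
For a tiny game with a known payoff table, "modeling out different scenarios" can be done by brute force: look at every pair of strategies and keep the pairs where neither player would gain by switching alone. The sketch below does this for pure strategies only. The function name and the payoff numbers (a prisoner's-dilemma-style table) are made up for illustration, and this is not how DeepNash works; Stratego is far too large to enumerate this way.

```python
from itertools import product

def pure_nash_equilibria(payoffs_a, payoffs_b):
    """Return all (row, col) pairs where neither player gains by switching alone."""
    n_rows, n_cols = len(payoffs_a), len(payoffs_a[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Player A picks the row: is r a best reply to column c?
        a_best = all(payoffs_a[r][c] >= payoffs_a[alt][c] for alt in range(n_rows))
        # Player B picks the column: is c a best reply to row r?
        b_best = all(payoffs_b[r][c] >= payoffs_b[r][alt] for alt in range(n_cols))
        if a_best and b_best:
            equilibria.append((r, c))
    return equilibria

# Hypothetical payoffs: strategy 0 = cooperate, strategy 1 = defect
PAYOFFS_A = [[3, 0], [5, 1]]
PAYOFFS_B = [[3, 5], [0, 1]]
print(pure_nash_equilibria(PAYOFFS_A, PAYOFFS_B))  # [(1, 1)]
```

In this made-up table, the only stable outcome is both players defecting, even though both would score higher by cooperating.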

See: Mastering the game of Stratego with model-free multiagent reinforcement learning (Science, 2022).

See more posts about games at this site.



Using Super Mario to understand neural networks

I doubt I will ever program a neural network, but I’m trying to understand how they work — and how they are trained — well enough to make assumptions about how the systems work. What I want to be able to do is raise questions when I hear about a new-to-me AI system. I don’t want to take it on faith that a system is safe and likely to function well.

Ultimately I want to help my journalism and communications students understand this too.

Last week I discussed here a video about how neural networks work. Some time before I found that video, I had watched this one a couple of times. It’s from 2015 and it’s only 6 minutes long. It’s been viewed on YouTube more than 9 million times. In fact, it’s pretty close to 10 million views!

Video game designer Seth Bling demonstrates a fully trained neural network that plays Mario expertly. Then he shows us how the system looks at the start, when the Mario character just stands in one place and dies every time. This is the untrained neural network, when it “knows” nothing.

Unlike the example in my earlier post — where the input to the neural network was an image of a handwritten number, and the output was the number (thereby “reading” the image) — here the input is the game state, which changes by the split second. The game state is a simplified digital representation of the Mario character, the surfaces he can run on or jump to, and any obstacles or rewards that are present. The output is which button should be pressed — holding down right continuously makes Mario run toward the right without stopping.

So the output layer in this neural network is the set of all possible actions Mario can take. For a human playing the game, these would be the buttons on the game controller.
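
To make the input-to-output idea concrete, here is a minimal sketch of a network with no hidden layers: each button adds up the inputs multiplied by its connection weights and is "pressed" if the total is positive. The four-tile game state, the weight values, and the function name are all assumptions for illustration; they are not taken from Bling's code.

```python
BUTTONS = ["left", "right", "jump"]

def pressed_buttons(game_state, weights):
    """Return the buttons whose weighted sum of inputs is positive."""
    pressed = []
    for button in BUTTONS:
        total = sum(w * x for w, x in zip(weights[button], game_state))
        if total > 0:
            pressed.append(button)
    return pressed

# Hypothetical 4-tile view ahead of Mario: 1 = solid ground, -1 = enemy, 0 = empty
state = [1, 1, -1, 0]
weights = {
    "left":  [0.0, 0.0, 0.0, 0.0],   # no connections yet: never pressed
    "right": [0.5, 0.5, 0.0, 0.0],   # ground ahead -> keep running right
    "jump":  [0.0, 0.0, -1.0, 0.0],  # enemy on the third tile -> jump
}
print(pressed_buttons(state, weights))  # ['right', 'jump']
```

An untrained network would look like the "left" row everywhere: all zeros, so nothing is pressed and Mario stands still.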

In the training, Mario has a “fitness level,” which is a number. When Mario is dying all the time, that number stays around 2. When Mario reaches the end of the level without dying (but without scoring extra points), his fitness is 528. So by “looking at” the fitness level, the neural net assesses success. If the number has increased, then keep doing the same thing.

“The more lines and neurons you have, the more nuanced the decisions can be.”

—Seth Bling

Of course there are more actions than only moving right. Training the neural net to make Mario jump and perform more actions required many generations of neural nets, and only the best-performing ones were selected for the next generation. After 34 generations, the fitness level reached 4,000.
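
The "keep only the best performers" loop can be sketched in a few lines. This is a simplified stand-in, not the method used in the video: the fitness function here is a toy (closeness to an arbitrary target) rather than Mario's progress through a level, and the networks are just fixed-length lists of numbers. All names and parameter values are hypothetical.

```python
import random

def fitness(genome):
    """Toy stand-in for 'how far Mario got': higher is better, 0 is perfect."""
    target = [0.5, -0.2, 0.8]
    return -sum((g - t) ** 2 for g, t in zip(genome, target))

def next_generation(population, rng, keep=5, mutation=0.1):
    """Keep the best performers unchanged; fill the rest with mutated copies."""
    survivors = sorted(population, key=fitness, reverse=True)[:keep]
    children = []
    while len(survivors) + len(children) < len(population):
        parent = rng.choice(survivors)
        children.append([g + rng.gauss(0, mutation) for g in parent])
    return survivors + children

def evolve(generations=34, size=20, seed=0):
    """Return the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(size)]
    best_per_generation = []
    for _ in range(generations):
        best_per_generation.append(max(fitness(g) for g in population))
        population = next_generation(population, rng)
    return best_per_generation

history = evolve()
print(history[0], history[-1])
```

Because the survivors are carried over unchanged, the best fitness in this sketch can never go down from one generation to the next.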

One thing I especially like about this video is the simultaneous visual of real Mario running in the real game level, along with a representation of the neural net showing its pathways in green and red. There is no code and no math in this video, and so while watching it, you are only thinking about how the connections come to be made and reinforced.

The method used is called NeuroEvolution of Augmenting Topologies (NEAT), which I’ve read almost nothing about — but apparently it enables the neural net to grow itself, essentially. Which is kind of mind blowing.

Bling shared his code here; it’s written in the Lua language.


What is called ‘AI’ but really isn’t

Because “artificial intelligence” and “AI” have become such potent buzzwords in business — and so many firms are trying to sell some kind of “AI” system or software or strategy to every business possible — we should all take a step back and evaluate whether there is actual AI operating in some of these systems.

That won’t always be easy to discern. If a company claims there is “AI” in its product, they are not going to divulge exactly how it works. If they want to convince you, their literature or their engineers will likely throw out a tangled net of terms that, while accurate, might not help anyone but another engineer understand what’s inside the black box.

I was thinking about this recently as I worked on assignments for an online computer science course in AI. One of the early projects was to program a tic-tac-toe game in which a human can play against “an AI.” Just like most humans, the AI can force a tie in every tic-tac-toe game unless the human makes a mistake, and then the human will lose. I wrote the code that enables the AI to play — that was the assignment. But I didn’t invent the code from nothing. I was taught in the course to use an algorithm called minimax. Further, I was encouraged to make my program faster by using another algorithm called alpha-beta pruning.

Illustration of alpha-beta pruning (Wikipedia, by Jez9999, GNU license)

There is no machine learning involved in those two algorithms. They are simply a time-tested way to instruct a computer to perform a certain kind of look-ahead in a two-player game (not only tic-tac-toe).

Don’t despair or tune out — look at the diagram and understand that the computer, through instructions in my code, is able to rapidly advance through every possible outcome in tic-tac-toe and see how to: (a) prevent a win for the opponent, and (b) win if a win is possible.
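
For readers who do want to see what such a look-ahead looks like, here is a minimal sketch of minimax with alpha-beta pruning for tic-tac-toe. It is not the code I wrote for the course; the board layout (a list of nine cells) and the function names are assumptions made for this example.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player, alpha=-2, beta=2):
    """Return (score, move): score is +1 if X can force a win, -1 if O can, 0 for a tie."""
    won = winner(board)
    if won is not None:
        return (1 if won == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0, None  # board full, nobody won
    maximizing = player == "X"
    best, best_move = (-2 if maximizing else 2), None
    for m in moves:
        board[m] = player                       # try the move
        score, _ = minimax(board, "O" if maximizing else "X", alpha, beta)
        board[m] = None                         # undo it
        if (maximizing and score > best) or (not maximizing and score < best):
            best, best_move = score, m
        if maximizing:
            alpha = max(alpha, best)
        else:
            beta = min(beta, best)
        if alpha >= beta:
            break  # pruning: the opponent would never allow this branch
    return best, best_move

# X already holds the top-left and top-middle squares; square 2 wins.
board = ["X", "X", None,
         "O", "O", None,
         None, None, None]
print(minimax(board, "X"))  # (1, 2)
```

Run on an empty board, the same function returns a score of 0, which is the "perfect play always ends in a tie" result.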

There is no magic here.

Tic-tac-toe with “AI” playing X, human playing O.

Another assignment in the same course has the students programming “an AI” that plays Minesweeper. This game is quite different from tic-tac-toe in that there is only one player, and there is hidden knowledge: The player doesn’t know where the mines are. One move at a time, the player builds knowledge about the game board.

Completed Minesweeper game, with AI playing all moves.

A human player doesn’t click on a mine, because she chooses squares that are next to a 0 (indicating no mines touch that square) and marks a mine square when it becomes obvious that a mine is hidden there.

The “AI” builds knowledge in a way that it is programmed to do (that is the assignment). In this case, there is no pre-existing algorithm, but there are principles of logic. I programmed “knowledge” that was stored in the program each time the AI clicked a square and a number was revealed. The knowledge is: (a) that number, and (b) the coordinates of all the surrounding squares. Thus the AI “knows” that, for example, among eight specified squares there are two mines.

If among eight specified squares there are zero mines, my code tells the AI to mark all eight of those squares as safe. My code also tells the AI that if there are any safe moves left to be made, then make a safe move. If not, make a random move. That is the only time when the AI can possibly set off a mine.
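
The two rules just described (zero mines means every listed square is safe; make a safe move if one exists, otherwise a random one) can be sketched as follows. The class and function names are hypothetical, and this covers only those two rules, not everything a full Minesweeper player would need.

```python
import random

class Sentence:
    """One piece of knowledge: `count` of the squares in `cells` are mines."""
    def __init__(self, cells, count):
        self.cells = set(cells)
        self.count = count

    def known_safes(self):
        # If zero of these squares are mines, all of them are safe.
        return set(self.cells) if self.count == 0 else set()

def neighbors(cell, height, width):
    """Coordinates of the squares surrounding `cell` that are on the board."""
    r, c = cell
    return {(i, j)
            for i in range(r - 1, r + 2)
            for j in range(c - 1, c + 2)
            if (i, j) != cell and 0 <= i < height and 0 <= j < width}

def choose_move(knowledge, moves_made, height, width, rng=random):
    """Make a safe move if one is known; otherwise pick a random unclicked square."""
    safes = set()
    for sentence in knowledge:
        safes |= sentence.known_safes()
    safe_moves = sorted(safes - moves_made)
    if safe_moves:
        return safe_moves[0]
    remaining = [(i, j) for i in range(height) for j in range(width)
                 if (i, j) not in moves_made]
    return rng.choice(remaining) if remaining else None

# The AI clicked the center of a 3x3 board and a 0 was revealed.
knowledge = [Sentence(neighbors((1, 1), 3, 3), 0)]
print(choose_move(knowledge, {(1, 1)}, 3, 3))  # (0, 0)
```

If the revealed number had been 2 instead of 0, no square would be known safe and the function would fall back to a random guess.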

Once again, there is no magic here.

In contrast to these two simple examples of a computer successfully playing a game, AlphaGo (which I wrote about previously) uses real AI and could not have beaten a human Go master otherwise. Some games can’t be won with only simple algorithms or logic. To win, a program needs something akin to intuition.

Programming a computer to develop and use an approximation of human intuition is what we have in today’s machine learning with deep neural networks. It’s still not magic, but it’s a lot more complicated than the kind of strictly mapped-out processes I wrote for playing tic-tac-toe or Minesweeper.


AI programs that play games

One of the very best media items I’ve found is this feature-length documentary about the program that beat an international master at the game of Go in 2016. It’s excellent as a documentary film — well-paced, sparking curiosity, exciting in some parts, and never pedantic.

You don’t need to understand anything about the game (which is immensely popular in China, Japan, and Korea, but not widely played elsewhere). It’s explained visually so that you can appreciate what’s going on. The film is free to watch on YouTube.

As a resource for learning about AI, or more specifically about machine learning, the film excels at helping us understand the work of the team of humans that created and trained the AlphaGo program. We don’t see a lot of people sitting at computer keyboards, typing. Instead, we see people clustered around a screen, pointing, talking enthusiastically, or saying, “What happened there? Why did it do that?”

Probably my favorite moment in the film is after Lee Se-dol, the human Go master, has played a move that is so great, it was later referred to as “the God move.” The AlphaGo team begins analyzing the program’s responses in real time, watching the graphs of its probability calculations on a large screen in their command center. For all the talk of AI as a black box that makes decisions humans can’t comprehend, this scene demonstrates that AI can be made transparent and accountable.

There’s much, much more to love about this documentary. The director, Greg Kohs, had extraordinary access to the DeepMind team during the months leading up to the five-game match with Lee. In the end, Google financed a general-audience-friendly film. (Google acquired DeepMind in 2014.)

In an interview with CNET, Kohs said the film “had very modest beginnings.”

“A couple members of Google’s creative lab that I’d worked with before gave me a ring and said we’d have access behind the curtain with [DeepMind founder and CEO] Demis Hassabis and his team. So I jumped on board with the expectation we would just film what happens for archival purposes and then put it on a shelf on a hard drive and that would be the end of it.”

Greg Kohs

Another wonderful aspect of the film is its humanity. I’ve seen a fair number of “scare essays” that predict the end of everything as AI gains dominance over its creators — but here we hear a more nuanced and thought-provoking set of views and reactions.

First, there is Lee, possibly the best (human) Go player who has ever lived, in closeup, in the very moment of his realization that the machine has bested him. Then there are the other Go experts, who understand more than you or I what the machine has actually done. Finally, there are the team members of DeepMind, who built the machine. Of course they are happy, ecstatically happy — but they are humbled, and even awed, as well.

At the end of 2019, Lee Se-dol retired as a professional Go player, at age 36. He is the only human who has ever defeated AlphaGo in tournament play.
