‘The Bitter Lesson’ of AI is not too bitter

I found an old bookmark today and it led me to The Bitter Lesson, a 2019 essay by Rich Sutton, a computer science professor and research scientist based in Alberta, Canada. Apparently OpenAI engineers were instructed to memorize the article.

tl;dr: “Leveraging human knowledge” has not been proven effective in significantly advancing artificial intelligence systems. Instead, leveraging computation (computational power, speed, operations per second, parallelization) is the only thing that works. “These two need not run counter to each other, but in practice they tend to. Time spent on one is time not spent on the other.” Thus all efforts to encode how we think and what we know are just wasting precious time, damn it! This is the bitter lesson.

Sutton has worked extensively on reinforcement learning, so it’s not surprising that he mentions examples of AI systems playing games. Systems that “learn” by self-play — that is, one copy of the program playing another copy of the same program — leverage computational power, not human knowledge. DeepMind’s AlphaZero demonstrated that self-play can enable the same system to learn to play not just one game but several (although any one trained copy “knows” only one game).
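Just to make the self-play idea concrete, here is a minimal sketch, not AlphaZero’s actual training loop. The functions play_game and update_policy are hypothetical stand-ins for the game simulator and the learning step a real system would supply.

```python
import copy

def self_play_training(policy, num_games, play_game, update_policy):
    """Toy illustration of self-play: the current policy plays a frozen
    copy of itself, and the results are used to improve it.
    play_game and update_policy are hypothetical placeholders."""
    for _ in range(num_games):
        opponent = copy.deepcopy(policy)              # a second copy of the same program
        game_record = play_game(policy, opponent)     # no human games required
        policy = update_policy(policy, game_record)   # learn only from its own play
    return policy
```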

“Search and learning are the two most important classes of techniques for utilizing massive amounts of computation in AI research.”

—Richard S. Sutton

Games, of course, are not the only domain in which we can see the advantages of computational power. Sutton notes that breakthroughs in speech recognition and image recognition came from application of statistical methods to huge training datasets.

Trying to make systems “that worked the way the researchers thought their own minds worked” was a waste of time, Sutton wrote — although I think today’s systems are still using layers of units (sometimes called “artificial neurons”) that connect to multiple units in other layers, and that architecture was inspired by what we do know about how brains work. Modifications (governed by algorithms, not adjusted by humans) to the connections between units constitute the “learning” that has proved to be so successful.
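To make that concrete, here is a minimal sketch (in NumPy, with layer sizes I made up) of what “layers of units connected by adjustable weights” means. It is not any particular production architecture; the point is only that the forward pass is weighted sums, and “learning” is an algorithm nudging the connection weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 4 inputs -> 3 hidden units -> 2 outputs.
# The weight matrices are the "connections between units."
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(3, 2))

x = rng.normal(size=4)          # some input
target = np.array([1.0, 0.0])   # what we want the output to be

# Forward pass: each unit is just a weighted sum passed through a nonlinearity.
hidden = np.tanh(x @ W1)
output = hidden @ W2

# "Learning": nudge the connection weights so the output moves toward the target.
error = output - target
grad_W2 = np.outer(hidden, error)
grad_W1 = np.outer(x, (error @ W2.T) * (1 - hidden ** 2))  # tanh'(z) = 1 - tanh(z)^2
W2 -= 0.1 * grad_W2
W1 -= 0.1 * grad_W1
```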

“[B]reakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning.”

—Richard S. Sutton

I don’t think this lesson is actually bitter, because what Sutton is saying is that human brains (and human minds, and human thinking, and human creativity) are really, really complex, and so we can’t figure out how to make the same things happen in, or with, a machine. We can produce better and more useful outputs thanks to improved computational methods, but we can’t make the machines better by sharing with them what we know — or trying to teach them how we may think.


AI researchers love playing games

I was catching up today on a couple of new-ish developments in reinforcement learning/game-playing AI models.

Meta (which, we always need to note, is the parent company of Facebook) apparently has an entire team of researchers devoted to training an AI system to play Diplomacy, a war-strategy board game. Unlike in chess or Go, a player in Diplomacy must collaborate with others to succeed. Meta’s program, named Cicero, has cleared that bar, as explained in a Gizmodo article from November 2022.

“Players are constantly interacting with each other and each round begins with a series of pre-round negotiations. Crucially, Diplomacy players may attempt to deceive others and may also think the AI is lying. Researchers said Diplomacy is particularly challenging because it requires building trust with others, ‘in an environment that encourages players to not trust anyone,’” according to the article.

We can see the implications for collaborations between humans and AI outside of playing games — but I’m not in love with the idea that the researchers are helping Cicero learn how to gain trust while intentionally working to deceive humans. Of course, Cicero incorporates a large language model (R2C2, further trained on the WebDiplomacy dataset) for NLP tasks; see figures 2 and 3 in the Science article linked below. “Each message in the dialogue training dataset was annotated” to indicate its intent; the dataset contained “12,901,662 messages exchanged between players.”
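For illustration only, an intent-annotated message might be represented as something like the record below. The field names and the example are my own invention; the actual annotation scheme is the one described in the Science paper.

```python
from dataclasses import dataclass

@dataclass
class AnnotatedMessage:
    """A hypothetical record for one intent-annotated negotiation message.
    The fields are my own; the real schema is described in the Science paper."""
    sender: str      # e.g. "FRANCE"
    recipient: str   # e.g. "ENGLAND"
    text: str        # the natural-language message
    intent: dict     # the planned moves the message is meant to support

example = AnnotatedMessage(
    sender="FRANCE",
    recipient="ENGLAND",
    text="I'll support your move into Belgium this turn.",
    intent={"planned_move": "support England into Belgium"},  # illustrative, not real notation
)
```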

Cicero was not identified as an AI construct while playing in online games with unsuspecting humans. It “apparently ‘passed as a human player,’ in 40 games of Diplomacy with 82 unique players.” It “ranked in the top 10% of players who played more than one game.”

See also: Human-level play in the game of Diplomacy by combining language models with strategic reasoning (Science, 2022).

Meanwhile, DeepMind was busy conquering another strategy board game, Stratego, with a new AI model named DeepNash. Unlike Diplomacy, Stratego is a two-player game, and unlike chess and Go, the value of each of your opponent’s pieces is unknown to you — you see where each piece is, but its identifying symbol faces away from you, like cards held close to the vest. DeepNash was trained on self-play (5.5 billion games) and does not search the game tree. Playing against humans online, it ascended to the rank of third among all Stratego players on the platform — after 50 matches.

Apparently the key to winning at Stratego is finding a Nash equilibrium, a concept I read up on at Investopedia: “There is not a specific formula to calculate Nash equilibrium. It can be determined by modeling out different scenarios within a given game to determine the payoff of each strategy and which would be the optimal strategy to choose.”
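That “model out different scenarios” idea can at least be shown for a toy game. The sketch below brute-forces the pure-strategy Nash equilibria of a tiny two-player payoff matrix (the classic Prisoner’s Dilemma, chosen only as an example). It is nothing like what DeepNash has to do for Stratego, which means approximating an equilibrium in an enormous game with hidden information.

```python
import itertools

# payoffs[i][j] = (row player's payoff, column player's payoff)
# when the row player picks strategy i and the column player picks j.
# These numbers are the classic Prisoner's Dilemma, used purely as an example.
payoffs = [
    [(3, 3), (0, 5)],   # row cooperates
    [(5, 0), (1, 1)],   # row defects
]

def pure_nash_equilibria(payoffs):
    """Brute-force every scenario: a cell is a Nash equilibrium if neither
    player can improve their own payoff by switching strategies while the
    other player's choice stays fixed."""
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for i, j in itertools.product(range(n_rows), range(n_cols)):
        row_best = all(payoffs[i][j][0] >= payoffs[k][j][0] for k in range(n_rows))
        col_best = all(payoffs[i][j][1] >= payoffs[i][k][1] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

print(pure_nash_equilibria(payoffs))   # prints [(1, 1)]: mutual defection
```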

See: Mastering the game of Stratego with model-free multiagent reinforcement learning (Science, 2022).

See more posts about games at this site.




Using Super Mario to understand neural networks

I doubt I will ever program a neural network, but I’m trying to understand how they work — and how they are trained — well enough to make reasonable assumptions about the systems I encounter. What I want to be able to do is raise questions when I hear about a new-to-me AI system. I don’t want to take it on faith that a system is safe and likely to function well.

Ultimately I want to help my journalism and communications students understand this too.

Last week I discussed here a video about how neural networks work. Some time before I found that video, I had watched this one a couple of times. It’s from 2015 and it’s only 6 minutes long. It’s been viewed on YouTube more than 9 million times. In fact, it’s pretty close to 10 million views!

Video game designer Seth Bling demonstrates a fully trained neural network that plays Mario expertly. Then he shows us how the system looks at the start, when the Mario character just stands in one place and dies every time. This is the untrained neural network, when it “knows” nothing.

Unlike the example in my earlier post — where the input to the neural network was an image of a handwritten number, and the output was the number (thereby “reading” the image) — here the input is the game state, which changes by the split second. The game state is a simplified digital representation of the Mario character, the surfaces he can run on or jump to, and any obstacles or rewards that are present. The output is which button should be pressed — holding down right continuously makes Mario run toward the right without stopping.

So the output layer in this neural network is the set of all possible actions Mario can take. For a human playing the game, these would be the buttons on the game controller.
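A drastically simplified, made-up version of that mapping might look like the sketch below: a small patch of tiles around Mario goes in, a score for each button comes out. The grid size, tile codes, and threshold are all my own assumptions, not the encoding Bling actually uses.

```python
import numpy as np

# Output layer: one score per controller button.
BUTTONS = ["left", "right", "up", "down", "jump", "run"]

rng = np.random.default_rng(0)
n_inputs = 13 * 13                                  # a 13x13 patch of tiles, flattened
weights = rng.normal(size=(n_inputs, len(BUTTONS))) # untrained, arbitrary connections

def choose_buttons(game_state_patch, threshold=0.5):
    """game_state_patch: array of tile codes (-1 enemy, 0 empty, 1 solid surface).
    Returns which buttons to press this frame. With untrained weights the
    choices are useless, which is why Mario just stands there and dies."""
    scores = game_state_patch.flatten() @ weights
    return [button for button, score in zip(BUTTONS, scores) if score > threshold]

frame = rng.integers(-1, 2, size=(13, 13))   # stand-in for a real game frame
print(choose_buttons(frame))
```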

In the training, Mario has a “fitness level,” which is a number. When Mario is dying all the time, that number stays around 2. When Mario reaches the end of the level without dying (but without scoring extra points), his fitness is 528. So by “looking at” the fitness level, the training process assesses how successful a given network is. If the number has increased, then keep doing the same thing.

“The more lines and neurons you have, the more nuanced the decisions can be.”

—Seth Bling

Of course there are more actions than only moving right. Training the neural net to make Mario jump and perform more actions required many generations of neural nets, and only the best-performing ones were selected for the next generation. After 34 generations, the fitness level reached 4,000.
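The select-and-mutate cycle he describes might be sketched like this. It is a bare-bones evolutionary loop, not NEAT itself, and evaluate_fitness and mutate are hypothetical placeholders for playing a level with a network and copying a network with small random changes.

```python
import random

def evolve(initial_population, evaluate_fitness, mutate,
           generations=34, keep_fraction=0.2):
    """Bare-bones generational loop: score every network by its fitness,
    keep the best performers, and fill the next generation with mutated
    copies of the survivors."""
    population = list(initial_population)
    for _ in range(generations):
        ranked = sorted(population, key=evaluate_fitness, reverse=True)
        survivors = ranked[:max(1, int(len(ranked) * keep_fraction))]
        population = [mutate(random.choice(survivors))
                      for _ in range(len(population))]
    return max(population, key=evaluate_fitness)
```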

One thing I especially like about this video is the simultaneous visual of real Mario running in the real game level, along with a representation of the neural net showing its pathways in green and red. There is no code and no math in this video, and so while watching it, you are only thinking about how the connections come to be made and reinforced.

The method used is called NeuroEvolution of Augmenting Topologies (NEAT), which I’ve read almost nothing about — but apparently it enables the neural net to grow itself, essentially. Which is kind of mind blowing.
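As I understand it, one of NEAT’s structural mutations grows the network by splitting an existing connection and inserting a new node. A toy rendering of that idea (my own simplified encoding, not Bling’s Lua code) might look like this:

```python
import random

def add_node_mutation(genome, next_node_id):
    """One of NEAT's structural mutations, sketched from memory: pick an
    existing connection, disable it, and insert a new node in its place.
    genome is a list of dicts like
    {"src": 0, "dst": 5, "weight": 0.7, "enabled": True},
    which is my own toy encoding."""
    conn = random.choice([c for c in genome if c["enabled"]])
    conn["enabled"] = False
    genome.append({"src": conn["src"], "dst": next_node_id,
                   "weight": 1.0, "enabled": True})
    genome.append({"src": next_node_id, "dst": conn["dst"],
                   "weight": conn["weight"], "enabled": True})
    return genome
```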

Bling shared his code here; it’s written in the Lua language.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.
