I was catching up today on a couple of new-ish developments in reinforcement learning/game-playing AI models.
Meta (which, we always need to note, is the parent company of Facebook) apparently has an entire team of researchers devoted to training an AI system to play Diplomacy, a war-strategy board game. Unlike in chess or Go, a player in Diplomacy must collaborate with others to succeed. Meta’s program, named Cicero, has passed the bar, as explained in a Gizmodo article from November 2022.
“Players are constantly interacting with each other and each round begins with a series of pre-round negotiations. Crucially, Diplomacy players may attempt to deceive others and may also think the AI is lying. Researchers said Diplomacy is particularly challenging because it requires building trust with others, ‘in an environment that encourages players to not trust anyone,’” according to the article.
We can see the implications for collaborations between humans and AI outside of playing games — but I’m not in love with the idea that the researchers are helping Cicero learn how to gain trust while intentionally working to deceive humans. Of course, Cicero incorporates a large language model (R2C2, further trained on the WebDiplomacy dataset) for NLP tasks; see figures 2 and 3 in the Science article linked below. “Each message in the dialogue training dataset was annotated” to indicate its intent; the dataset contained “12,901,662 messages exchanged between players.”
Cicero was not identified as an AI construct while playing in online games with unsuspecting humans. It “apparently ‘passed as a human player,’ in 40 games of Diplomacy with 82 unique players.” It “ranked in the top 10% of players who played more than one game.”
See also: Human-level play in the game of Diplomacy by combining language models with strategic reasoning (Science, 2022).
Meanwhile, DeepMind was busy conquering another strategy board game, Stratego, with a new AI model named DeepNash. Unlike Diplomacy, Stratego is a two-player game, and unlike chess and Go, the value of each of your opponent’s pieces is unknown to you — you see where each piece is, but its identifying symbol faces away from you, like cards held close to the vest. DeepNash was trained on self-play (5.5 billion games) and does not search the game tree. Playing against humans online, it ascended to the rank of third among all Stratego players on the platform — after 50 matches.
Apparently the key to winning at Stratego is finding a Nash equilibrium, which I read about at Investopedia, which says: “There is not a specific formula to calculate Nash equilibrium. It can be determined by modeling out different scenarios within a given game to determine the payoff of each strategy and which would be the optimal strategy to choose.”
See: Mastering the game of Stratego with model-free multiagent reinforcement learning (Science, 2022).
See more posts about games at this site.
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.