{"id":1305,"date":"2023-04-26T16:27:08","date_gmt":"2023-04-26T20:27:08","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=1305"},"modified":"2023-04-26T16:27:08","modified_gmt":"2023-04-26T20:27:08","slug":"ai-researchers-love-playing-games","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2023\/04\/26\/ai-researchers-love-playing-games\/","title":{"rendered":"AI researchers love playing games"},"content":{"rendered":"\n<p>I was catching up today on a couple of new-ish developments in reinforcement learning\/game-playing AI models. <\/p>\n\n\n\n<p>Meta (which, we always need to note, is the parent company of Facebook) apparently has an entire team of researchers devoted to training an AI system to play <strong>Diplomacy, <\/strong>a war-strategy board game. Unlike in chess or Go, a player in Diplomacy must <em>collaborate with others<\/em> to succeed. Meta&#8217;s program, named Cicero, has passed the bar, as explained in <a rel=\"noreferrer noopener\" href=\"https:\/\/gizmodo.com\/meta-ai-cicero-diplomacy-gaming-1849811840\" target=\"_blank\">a Gizmodo article<\/a> from November 2022. <\/p>\n\n\n\n<p>\u201cPlayers are constantly interacting with each other and each round begins with a series of pre-round negotiations. Crucially, Diplomacy players may attempt to deceive others and may also think the AI is lying. Researchers said Diplomacy is particularly challenging because it requires building trust with others, \u2018in an environment that encourages players to not trust anyone,\u2019\u201d according to the article.<\/p>\n\n\n\n<p>We can see the implications for collaborations between humans and AI <em>outside of playing games<\/em> \u2014 but I&#8217;m not in love with the idea that the researchers are helping Cicero learn <strong>how to gain trust<\/strong> while intentionally <strong>working to deceive<\/strong> humans. 
Of course, Cicero incorporates <strong>a large language model<\/strong> (R2C2, further trained on the WebDiplomacy dataset) for NLP tasks; see figures 2 and 3 in the <em>Science<\/em> article linked below. \u201cEach message in the dialogue training dataset was annotated\u201d to indicate its intent; the dataset contained \u201c12,901,662 messages exchanged between players.\u201d <\/p>\n\n\n\n<p>Cicero was not identified as an AI construct while playing in online games with unsuspecting humans. It \u201capparently \u2018passed as a human player\u2019 in 40 games of Diplomacy with 82 unique players.\u201d It \u201cranked in the top 10% of players who played more than one game.\u201d<\/p>\n\n\n\n<p>See also: <a rel=\"noreferrer noopener\" href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/36413172\/\" target=\"_blank\">Human-level play in the game of Diplomacy by combining language models with strategic reasoning<\/a> (<em>Science,<\/em> 2022).<\/p>\n\n\n\n<p>Meanwhile, DeepMind was busy conquering another strategy board game, <strong>Stratego,<\/strong> with a new AI model named DeepNash. Unlike Diplomacy, Stratego is a two-player game, and unlike chess and Go, the value of each of your opponent&#8217;s pieces is unknown to you \u2014 you see where each piece is, but its identifying symbol faces away from you, like cards held close to the vest. DeepNash was trained on self-play (5.5 billion games) and does not search the game tree. Playing against humans online, it ascended to third place among all Stratego players on the platform \u2014 after 50 matches. <\/p>\n\n\n\n<p>Apparently the key to winning at Stratego is finding a Nash equilibrium, which I read about at <a rel=\"noreferrer noopener\" href=\"https:\/\/www.investopedia.com\/terms\/n\/nash-equilibrium.asp\" target=\"_blank\">Investopedia<\/a>: \u201cThere is not a specific formula to calculate Nash equilibrium. 
It can be determined by <strong>modeling out different scenarios<\/strong> within a given game to determine the payoff of each strategy and which would be the optimal strategy to choose.\u201d <\/p>\n\n\n\n<p>See: <a rel=\"noreferrer noopener\" href=\"https:\/\/www.science.org\/doi\/10.1126\/science.add4679\" target=\"_blank\">Mastering the game of Stratego with model-free multiagent reinforcement learning<\/a> (<em>Science,<\/em> 2022).<\/p>\n\n\n\n<p><a href=\"https:\/\/www.macloo.com\/ai\/category\/games\/\">See more posts about games<\/a> at this site.<\/p>\n\n\n\n<p>.<\/p>\n\n\n\n<p><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\"><\/a><br><small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>Include the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n\n\n\n<p>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was catching up today on a couple of new-ish developments in reinforcement learning\/game-playing AI models. Meta (which, we always need to note, is the parent company of Facebook) apparently has an entire team of researchers devoted to training an AI system to play Diplomacy, a war-strategy board game. 
Unlike in chess or Go, a&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2023\/04\/26\/ai-researchers-love-playing-games\/\">Continue reading <span class=\"screen-reader-text\">AI researchers love playing games<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[7,5,2],"tags":[13,226,228,96,227],"class_list":["post-1305","post","type-post","status-publish","format-standard","hentry","category-games","category-machine-learning","category-nlp","tag-deepmind","tag-diplomacy","tag-meta","tag-reinforcement_learning","tag-stratego"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/1305","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=1305"}],"version-history":[{"count":7,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/1305\/revisions"}],"predecessor-version":[{"id":1312,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/1305\/revisions\/1312"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/media?parent=1305"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=1305"},{"taxonomy":"post_tag","em
beddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=1305"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}