{"id":1336,"date":"2023-08-22T13:13:37","date_gmt":"2023-08-22T17:13:37","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=1336"},"modified":"2023-08-22T13:13:37","modified_gmt":"2023-08-22T17:13:37","slug":"red-teaming-to-find-flaws-in-llms","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2023\/08\/22\/red-teaming-to-find-flaws-in-llms\/","title":{"rendered":"Red teaming to find flaws in LLMs"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I came across this Aug. 20, 2023, post and got a lot out of reading it:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/cyberneticforests.substack.com\/p\/cultural-red-teaming\" target=\"_blank\" rel=\"noreferrer noopener\">Cultural Red Teaming<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Author Eryk Salvaggio describes himself as &#8220;a trained journalist, artist, researcher and science communicator who has done weird things with technology since 1997.&#8221; He attended and presented at DEFCON 31, the largest hacker convention in the world, and that inspired his post. There&#8217;s <a rel=\"noreferrer noopener\" href=\"https:\/\/cyberneticforests.substack.com\/p\/the-algorithmic-resistance-research\" target=\"_blank\">a related post<\/a> about it, also by Salvaggio.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">&#8220;In cybersecurity circles, a Red Team is made up of trusted allies who act like enemies to help you find weaknesses. The Red Team attacks to make you stronger, point out vulnerabilities, and help harden your defenses.&#8221;<\/p>\n<cite>\u2014Eryk Salvaggio<\/cite><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">You use a <strong>red team<\/strong> operation to test the <strong>security<\/strong> of your systems \u2014 whether they are information systems protecting sensitive data, or automation systems that run, say, the power grid. 
The goal is to find the weak points before malicious hackers do. The red team operation will simulate the techniques that malicious hackers would use to break into your system for a ransomware attack or other harmful activity. The red team stops short of actually harming your systems.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Salvaggio shared his thoughts about the Generative Red Team, an event at DEFCON 31 in which volunteer hackers had an opportunity to attack several large language models (LLMs), which had been contributed by various companies or developers. The individual hacker didn&#8217;t know which LLM they were interacting with. The hacker could switch back and forth among different LLMs in one session of hacking. The goal: to elicit &#8220;a behavior from an LLM that it was not meant to do, such as generate misinformation, or write harmful content.&#8221; Hackers got points when they succeeded.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The point system likely affected what individual hackers did and <em>did not<\/em> do, Salvaggio noted. If a hacker took risks by trying out new methods of attacking LLMs, they might not get as many points as another hacker who used tried-and-true exploits. This subverted the value of red teaming, which aims to <strong>discover novel ways<\/strong> to break in \u2014 ways the system designers did not think of.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8220;The incentives seemed to encourage speed and practicing known attack patterns,&#8221; Salvaggio wrote.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Other flaws in the design of the Generative Red Team activity: (1) Time limits \u2014 each hacker could work for 50 minutes only and then had to leave the computer; they could go again, but the results of each 50-minute session were not combined. (2) The absence of actual <em>teams<\/em> \u2014 each hacker had to work solo. 
(3) Lack of diversity \u2014 hackers are a somewhat homogeneous group, and the prompts they authored might not have reflected a broad range of human experience. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8220;The success column of the Red Teaming event included the <strong>education about prompt injection methods<\/strong> it provided to new users, and a basic outline of the <strong>types of harms it can generate.<\/strong> More benefits will come from whatever we learn from the data that was produced and what sense researchers can make of it. (We will know early next year),&#8221; Salvaggio wrote.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">He pointed out that there should be <em>more of this,<\/em> and not only at rarified hacker conferences. Results should be publicized. The <strong>AI companies and developers should be doing much more of this<\/strong> on their own \u2014 and publicizing the how and why as well as the results.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">&#8220;To open up these systems to <em>meaningful<\/em> dialogue and critique&#8221; would require <em>much <\/em>more of this \u2014 a significant expansion of the small demonstration provided by the Generative Red Team event, Salvaggio wrote.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Critiquing AI<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Salvaggio went on to talk about a fundamental tension between efforts aimed at <strong>security<\/strong> in AI systems and efforts aimed at <strong>social accountability<\/strong>. LLMs &#8220;spread harmful misinformation, commodify the commons, and recirculate biases and stereotypes,&#8221; he noted \u2014 and the companies that develop LLMs then ask <em>the public<\/em> to contribute time and effort to fixing those flaws. It&#8217;s more than ironic. I thought of pollution spilling out of factories, and the factory owners telling the community to do the cleanup at community expense. 
They made the nasty things, and now they expect the victims of the nastiness to fix it.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">&#8220;Proper Red Teaming assumes a symbiotic relationship, rather than parasitic: that both parties benefit equally when the problems are solved.&#8221;<\/p>\n<cite>\u2014Eryk Salvaggio<\/cite><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">We don&#8217;t really have a choice, though, because the AI companies are rushing pell-mell to build and release <strong>more and more models that are less than thoroughly tested<\/strong> and that are capable of harms yet unknown.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Toward the end of his post, Salvaggio lists &#8220;10 Things ARRG! Talked About Repeatedly.&#8221; They are well worth reading and considering \u2014 they are <strong>the things that should disturb us,<\/strong> everyone, about AI and especially LLMs. (<a rel=\"noreferrer noopener\" href=\"https:\/\/cyberneticforests.substack.com\/p\/the-algorithmic-resistance-research\" target=\"_blank\">ARRG!<\/a> is the Algorithmic Resistance Research Group. It was founded by Salvaggio.) They include questions such as where the LLM data sets come from; the environmental effects of AI models (which require tremendous amounts of energy); and &#8220;<strong>Is red teaming the right tool<\/strong> \u2014 or right relationship \u2014 for building responsible and safe systems for users?&#8221;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">You could go straight to the list, but I got a lot out of reading Salvaggio&#8217;s entire post, as well as articles linked below to help me understand what was going on around the group from ARRG! in the AI Village at DEFCON 31.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When he floated the idea of &#8220;artists as a cultural red team,&#8221; I got a little choked up. 
<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Related items<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The <a rel=\"noreferrer noopener\" href=\"https:\/\/aivillage.org\/\" target=\"_blank\">AI Village<\/a> describes itself as &#8220;a community of hackers and data scientists working to educate the world on <strong>the use and abuse of artificial intelligence in security and privacy.<\/strong> We aim to bring more diverse viewpoints to this field and grow the community of hackers, engineers, researchers, and policy makers working on making <em>the AI we use and create<\/em> safer.&#8221; The AI Village organized red teaming events at DEFCON 31.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"noreferrer noopener\" href=\"https:\/\/www.nytimes.com\/2023\/08\/16\/technology\/ai-defcon-hackers.html\" target=\"_blank\">When Hackers Descended to Test A.I., They Found Flaws Aplenty<\/a>, in <em>The New York Times,<\/em> Aug. 16, 2023. This longer article covers the AI red teaming event at DEFCON 31. &#8220;A large, diverse and public group of testers was more likely to come up with creative prompts to help tease out hidden flaws, said Dr. [Rumman] Chowdhury, a fellow at Harvard University\u2019s Berkman Klein Center for Internet and Society focused on responsible A.I. and co-founder of a nonprofit called Humane Intelligence.&#8221;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"noreferrer noopener\" href=\"https:\/\/www.npr.org\/2023\/08\/15\/1193773829\/what-happens-when-thousands-of-hackers-try-to-break-ai-chatbots\" target=\"_blank\">What happens when thousands of hackers try to break AI chatbots<\/a>, on NPR.com, Aug. 15, 2023. Another view of the AI events at DEFCON 31. <strong>More than 2,000 people &#8220;pitted their skills against eight leading AI chatbots<\/strong> from companies including Google, Facebook parent Meta, and ChatGPT maker OpenAI,&#8221; according to this report. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"noreferrer noopener\" href=\"https:\/\/www.humane-intelligence.org\/\" target=\"_blank\">Humane Intelligence<\/a> describes itself as a 501(c)(3) non-profit that &#8220;supports AI model owners seeking product readiness review at-scale,&#8221; focusing on &#8220;safety, ethics, and subject-specific expertise (e.g. medical).&#8221;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\" alt=\"Creative Commons License\"><\/a><br><small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>Include the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I came across this Aug. 
20, 2023, post and got a lot out of reading it: Cultural Red Teaming Author Eryk Salvaggio describes himself as &#8220;a trained journalist, artist, researcher and science communicator who has done weird things with technology since 1997.&#8221; He attended and presented at DEFCON 31, the largest hacker convention in the&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2023\/08\/22\/red-teaming-to-find-flaws-in-llms\/\">Continue reading <span class=\"screen-reader-text\">Red teaming to find flaws in LLMs<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[81],"tags":[221,235,237,236,234],"class_list":["post-1336","post","type-post","status-publish","format-standard","hentry","category-applications","tag-accountability","tag-cybersecurity","tag-defcon","tag-hackers","tag-security"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/1336","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=1336"}],"version-history":[{"count":10,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/1336\/revisions"}],"predecessor-version":[{"id":1359,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/1336\/revisions\
/1359"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/media?parent=1336"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=1336"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=1336"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}