{"id":791,"date":"2021-06-02T09:00:00","date_gmt":"2021-06-02T13:00:00","guid":{"rendered":"https:\/\/www.macloo.com\/ai\/?p=791"},"modified":"2021-06-02T09:33:55","modified_gmt":"2021-06-02T13:33:55","slug":"the-trouble-with-large-language-models","status":"publish","type":"post","link":"https:\/\/www.macloo.com\/ai\/2021\/06\/02\/the-trouble-with-large-language-models\/","title":{"rendered":"The trouble with large language models"},"content":{"rendered":"\n<p>Yesterday I <a href=\"https:\/\/www.macloo.com\/ai\/2021\/06\/01\/summary-of-the-challenges-facing-algorithms-ai\/\" target=\"_blank\" rel=\"noreferrer noopener\">summarized<\/a> the first two articles in a series about algorithms and AI by Hayden Field, a technology journalist at <a rel=\"noreferrer noopener\" href=\"https:\/\/www.morningbrew.com\/\" target=\"_blank\">Morning Brew<\/a>. Today I&#8217;ll finish out the series.<\/p>\n\n\n\n<p>The third article, <a rel=\"noreferrer noopener\" href=\"https:\/\/www.morningbrew.com\/emerging-tech\/stories\/2021\/03\/29\/one-biggest-advancements-ai-also-sparked-fierce-debate-heres\" target=\"_blank\">This Powerful AI Technique Led to Clashes at Google and Fierce Debate in Tech. Here&#8217;s Why<\/a>, explores the basis of the volatile situation around the firing of Timnit Gebru and later Margaret Mitchell from Google&#8217;s Ethical AI unit earlier this year. Both women are highly respected and experienced AI researchers. Mitchell founded the team in 2017.<\/p>\n\n\n\n<p>Central to the situation is a criticism of <strong>large language models<\/strong> and a March 2021 paper (<a rel=\"noreferrer noopener\" href=\"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/3442188.3445922\" target=\"_blank\">On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?<\/a>) co-authored by Gebru, Mitchell, and two researchers at the University of Washington. 
The biggest current example is GPT-3, previously covered in <a rel=\"noreferrer noopener\" href=\"https:\/\/www.macloo.com\/ai\/?s=gpt-3\" target=\"_blank\">several posts<\/a> here.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;Models this big require <em>an unthinkable amount of data<\/em>; the entirety of English-language Wikipedia makes up just 0.6% of GPT-3\u2019s training data.&#8221;<\/p><cite>\u2014&#8221;This Powerful AI Technique Led to Clashes at Google and Fierce Debate in Tech. Here&#8217;s Why&#8221;<\/cite><\/blockquote>\n\n\n\n<p>The Morning Brew article sums up the very recent and very big improvements in large language models that have come about thanks to new algorithms and faster computer hardware (GPUs running in parallel). It highlights BERT, &#8220;the model that now underpins Google Search,&#8221; which came out of the research that resulted in the first <a rel=\"noreferrer noopener\" href=\"https:\/\/www.macloo.com\/ai\/2021\/05\/30\/attention-in-machine-learning-and-nlp\/\" target=\"_blank\">Transformer<\/a>. A good at-the-time article about GPT-3&#8217;s release was <a rel=\"noreferrer noopener\" href=\"https:\/\/www.technologyreview.com\/2020\/07\/20\/1005454\/openai-machine-learning-language-generator-gpt-3-nlp\/\" target=\"_blank\">published in July 2020<\/a> in MIT&#8217;s <em>Technology Review<\/em>: &#8220;OpenAI first described GPT-3 in a research paper published in May [2020].&#8221;<\/p>\n\n\n\n<p>One point being \u2014 Google fired Timnit Gebru <em>very soon after<\/em> news and discussion of large language models (GPT-3 especially, but remember Google&#8217;s investment in BERT too) ramped up \u2014 <em>way<\/em> up. Her criticism of a previously obscure AI technology (not obscure among NLP researchers, but in the wider world) might have been seen as increasingly inconvenient for Google. 
Morning Brew summarizes the criticism (not attributed to Gebru): &#8220;Because large language models often scrape data from most of the internet, racism, sexism, homophobia, and other toxic content inevitably filter in.&#8221;<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;Once the barrier to create AI tools and generate text is lower, people could just use it to create misinformation at scale, and having that data coupled with certain other platforms can just be a very disastrous situation.\u201d<\/p><cite>\u2014Sandhini Agarwal, AI policy researcher, OpenAI<\/cite><\/blockquote>\n\n\n\n<p>The Morning Brew article goes well beyond Google&#8217;s dismissal of Gebru and Mitchell, bringing in a lot of clear, easy-to-understand explanation of what large language models require (for example, significant <strong>energy resources<\/strong>), what they&#8217;re being used for, and even the <strong>English-centric nature<\/strong> of such models \u2014 lacking a gigantic corpus of digitized text in a given human language, you can&#8217;t create a large model in that language.<\/p>\n\n\n\n<p>The turmoil in Google&#8217;s Ethical AI unit is covered in more detail in <a rel=\"noreferrer noopener\" href=\"https:\/\/www.morningbrew.com\/emerging-tech\/stories\/amp\/2021\/05\/14\/earth-ai-ethics-experts-react-google-doubling-embattled-ethics-team\" target=\"_blank\">this May 2021 article<\/a>, also by Hayden Field.<\/p>\n\n\n\n<p>It&#8217;s easy to find articles that discuss &#8220;scary things GPT-3 can do and does&#8221; and especially the bias issues; it&#8217;s much harder to find information about some of the other aspects covered here. It&#8217;s also not just about GPT-3. I appreciated insights from an interview with Emily M. Bender, first author on the &#8220;Stochastic Parrots&#8221; article. I also liked the explicit statement that many useful NLP tasks can be done well without a <em>large<\/em> language model. 
In smaller datasets, finding and accounting for toxic content can be more manageable.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\"><p>&#8220;Do we need this at all? What\u2019s the actual value proposition of the technology? &#8230; Who is paying the environmental price for us doing this, and is this fair?\u201d<\/p><cite>\u2014Emily M. Bender, professor and director, Professional MS in Computational Linguistics, University of Washington<\/cite><\/blockquote>\n\n\n\n<p>Finally, in a <a rel=\"noreferrer noopener\" href=\"https:\/\/www.morningbrew.com\/emerging-tech\/stories\/2021\/04\/02\/event-recap-can-algorithm-actually-know\" target=\"_blank\">recap of Morning Brew&#8217;s &#8220;Demystifying Algorithms&#8221; event<\/a>, editor Dan McCarthy summarized two AI researchers&#8217; answers to one of my favorite questions: What can an algorithm actually <em>know<\/em>?<\/p>\n\n\n\n<p>An AI system\u2019s ability to <strong>generalize<\/strong> \u2014 to transfer learning from one domain to another \u2014 is still a wide-open frontier, according to Mark Riedl, a computer science professor at Georgia Tech. This is something I remind my students of over and over \u2014 what&#8217;s called &#8220;general intelligence&#8221; is still a long way off for artificial intelligence. Riedl works on aspects of storytelling to test whether an AI system is able to &#8220;make something new&#8221; out of what it has ingested.<\/p>\n\n\n\n<p>Sa\u0161ka Mojsilovi\u0107, head of Trusted AI Foundations at IBM Research, made a similar point \u2014 and also emphasized that &#8220;<strong>narrow AI<\/strong>&#8221; (which is all the AI we&#8217;ve ever had, up to now <em>and<\/em> for the foreseeable future) is not nothing. 
<\/p>\n\n\n\n<p>She suggested: \u201cWe may want to take a pause from obsessing over artificial general intelligence and maybe think about how we create AI solutions for these kinds of problems\u201d \u2014 for example, narrow domains such as drug discovery (e.g. new antibiotics) and creation of new molecules. These are extraordinary accomplishments within the capabilities of today&#8217;s AI.<\/p>\n\n\n\n<p>This is a half-hour conversation with those two experts:<\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<div class=\"jetpack-video-wrapper\"><iframe loading=\"lazy\" title=\"Explorations: Demystifying Algorithms Virtual Event\" width=\"739\" height=\"416\" src=\"https:\/\/www.youtube.com\/embed\/442rxMm3BDU?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture\" allowfullscreen><\/iframe><\/div>\n<\/div><\/figure>\n\n\n\n<p>Thanks to the video, I learned about the <a rel=\"noreferrer noopener\" href=\"https:\/\/www.i-programmer.info\/news\/105-artificial-intelligence\/7999-lovelace-20-test-an-alternative-turing-test.html\" target=\"_blank\">Lovelace 2.0 Test<\/a>, which Riedl developed in 2014. It&#8217;s an alternative to the Turing Test.<\/p>\n\n\n\n<p>Mojsilovi\u0107 talked about the perceptions that arise when we use the word <em>intelligence<\/em> when talking about machines. &#8220;The reality is that many things that we call AI today are the same old models that we used to call data science maybe five or six years ago,&#8221; she said (at 21:55). 
She also talked about the <strong>need for collaboration<\/strong> between AI researchers and experts in entirely separate fields: &#8220;Because we can&#8217;t create solutions for the problems that we don&#8217;t understand&#8221; (at 29:24).<\/p>\n\n\n\n<p>.<\/p>\n\n\n\n<p><a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\"><img decoding=\"async\" alt=\"Creative Commons License\" style=\"border-width:0\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc-nd\/4.0\/88x31.png\"><\/a><br>\n<small><span xmlns:dct=\"http:\/\/purl.org\/dc\/terms\/\" property=\"dct:title\"><strong>AI in Media and Society<\/strong><\/span> by <span xmlns:cc=\"http:\/\/creativecommons.org\/ns#\" property=\"cc:attributionName\">Mindy McAdams<\/span> is licensed under a <a rel=\"license\" href=\"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/\">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License<\/a>.<br>\nInclude the author&#8217;s name (Mindy McAdams) and a link to the original post in any reuse of this content.<\/small><\/p>\n\n\n\n<p>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday I summarized the first two articles in a series about algorithms and AI by Hayden Field, a technology journalist at Morning Brew. Today I&#8217;ll finish out the series. The third article, This Powerful AI Technique Led to Clashes at Google and Fierce Debate in Tech. 
Here&#8217;s Why, explores the basis of the volatile situation&hellip; <a class=\"more-link\" href=\"https:\/\/www.macloo.com\/ai\/2021\/06\/02\/the-trouble-with-large-language-models\/\">Continue reading <span class=\"screen-reader-text\">The trouble with large language models<\/span> <span class=\"meta-nav\" aria-hidden=\"true\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[6,2],"tags":[137,102,135],"class_list":["post-791","post","type-post","status-publish","format-standard","hentry","category-ethics-and-bias","category-nlp","tag-bert","tag-gpt3","tag-toxicity"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/791","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/comments?post=791"}],"version-history":[{"count":10,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/791\/revisions"}],"predecessor-version":[{"id":815,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/posts\/791\/revisions\/815"}],"wp:attachment":[{"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/media?parent=791"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/categories?post=791"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.macloo.com\/ai\/wp-json\/wp\/v2\/tags?post=791"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}