How does machine learning understand sentiment?

Sometimes I come across a video on YouTube that’s almost too simple — and that’s exactly what makes it great. Andy Kim, a junior at the elite prep school Deerfield Academy in Massachusetts, gave a local TED Talk about sentiment analysis, and I think it’s perfect for anyone who has spent a little time learning about image recognition but has not yet studied much natural language processing.

Your first thought might be that detecting the sentiment of a tweet, a movie review, or a response to customer service is just a matter of word definitions. Love is a positive word; hate is a negative word.

But as Melanie Mitchell wrote in Artificial Intelligence: A Guide for Thinking Humans (2019): “Looking at single words or short sequences in isolation is generally not sufficient to glean the overall sentiment; it’s necessary to capture the semantics of words in the context of the whole sentence” (p. 183; my emphasis).

Kim, in his TED Talk, does a good job of explaining how words are represented as vectors, and how this enables complex associations with similar or related terms. He doesn’t use a diagram of three-dimensional space (which I find helpful for conceptualizing this in my own mind); instead he refers to “an n dimensional space,” which I think my journalism students might not instantly visualize.

“These word vectors can span from 25 up to a thousand components. Now, conveniently, as these vectors are still simply a list of numbers, they can be plotted on an n dimensional space …”

—Andy Kim

In computer programming, a vector is a list of values, which you can think of as points or coordinates. In a two-dimensional space, you might have x and y, with the value of x representing the point’s position on a horizontal line, and the value of y representing the point’s position on a vertical line. Add a third dimension, and you have a third coordinate, z.

To represent more dimensions, we add more values to the list. A single word will have a list of many values, and those values signify its relations to all the other words in the system’s vocabulary.
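If a word as a list of numbers still feels abstract, here is a minimal sketch in Python using made-up three-dimensional vectors (real word vectors, as Kim says, have 25 to 1,000 components). It is not code from the talk; it just shows that a word vector is an ordinary list of numbers and that one common measure, cosine similarity, can tell us which words sit close together in that space.

```python
import math

# Hypothetical 3-dimensional word vectors (invented values for illustration).
vectors = {
    "love":  [0.8, 0.1, 0.6],
    "adore": [0.7, 0.2, 0.5],
    "hate":  [-0.6, 0.9, -0.2],
}

def cosine_similarity(a, b):
    """Closer to 1.0 means the two vectors point in nearly the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["love"], vectors["adore"]))  # roughly 0.99: very close
print(cosine_similarity(vectors["love"], vectors["hate"]))   # roughly -0.46: far apart
```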

At about the middle of his talk, Kim makes it perfectly clear why so many dimensions are needed to represent relationships among terms that have multiple meanings.

Kim goes on to talk about the labeled data for training a system to detect, or recognize, sentiment in text. He used a freely available dataset from Kaggle, probably the Sentiment140 dataset with 1.6 million tweets. (Another widely used dataset for sentiment analysis training is the IMDB Dataset of 50K Movie Reviews.) Kim also demonstrates cleaning the Twitter data so that usernames, hashtags and stop words are eliminated.
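As a rough illustration of that cleaning step (my own sketch in Python, not Kim’s code), here is one way to strip URLs, usernames, hashtags, and stop words out of a tweet before it gets turned into vectors. The stop-word list here is just a tiny sample.

```python
import re

STOP_WORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "in", "it"}  # tiny sample list

def clean_tweet(text):
    """Return lowercase words with URLs, @usernames, #hashtags and stop words removed."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove @usernames
    text = re.sub(r"#\w+", " ", text)           # remove #hashtags
    words = re.findall(r"[a-z']+", text)        # keep only the remaining words
    return [w for w in words if w not in STOP_WORDS]

print(clean_tweet("@delta the flight was GREAT, thanks! #travel https://t.co/xyz"))
# ['flight', 'great', 'thanks']
```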

Kim used the GloVe algorithm to construct vectors for the words in his dataset, but he skips over the details of the training and just tells us that he wasn’t very successful; his model only reached a 60 percent accuracy level. He closes by summarizing some of the uses of sentiment analysis.
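For anyone curious what such a pipeline can look like, here is a hedged sketch of one common approach (not necessarily what Kim built): load pretrained GloVe vectors, average them into a single vector per tweet, and train a simple classifier with scikit-learn. The file glove.6B.50d.txt refers to the 50-dimensional vectors distributed by the Stanford NLP Group; the tweets and labels below are toy placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def load_glove(path):
    """Read a GloVe text file into a dict mapping word -> vector."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.array(parts[1:], dtype=float)
    return vectors

def tweet_vector(words, glove, dim=50):
    """Average the vectors of the words we recognize; all zeros if we recognize none."""
    known = [glove[w] for w in words if w in glove]
    return np.mean(known, axis=0) if known else np.zeros(dim)

glove = load_glove("glove.6B.50d.txt")  # pretrained 50-dimensional GloVe vectors

# Toy training data: cleaned tweets (lists of words) with sentiment labels.
cleaned_tweets = [["flight", "great", "thanks"], ["worst", "service", "ever"]]
labels = [1, 0]  # 1 = positive, 0 = negative

X = np.array([tweet_vector(t, glove) for t in cleaned_tweets])
model = LogisticRegression(max_iter=1000).fit(X, labels)
print(model.predict([tweet_vector(["love", "this", "airline"], glove)]))
```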


Journalists use machine learning to examine medical device records

Some investigations in the public interest require journalists to search through large quantities of official documents. Often the set of documents is very diverse — that is, the format, structure, and even language of the documents might vary greatly.

One of the more impressive investigations I know of is the ongoing Implant Files project, conducted originally by 250 journalists in 36 countries. The purpose: To examine how medical devices (specifically, those implanted into human bodies) are “tested, approved, marketed, and monitored” (source). I’ve heard this project discussed at conferences, and I’m full of admiration for the editors and reporters involved, led by the International Consortium of Investigative Journalists (ICIJ).

At the heart of the investigation, with its first results published in 2018, was “an analysis of more than 8 million device-related health records, including death and injury reports and recalls.”

“The entire process involved text mining, clustering, feature selection, association rules and classification algorithms to identify events not always described consistently in different parts of the data.”

—“How ICIJ Used Machine Learning to Help Find Medical Device Issues,” ICIJ

These implanted devices — hip replacements, defibrillators, breast implants, intraocular lenses, and more — are used all around the world. When something goes wrong and a product recall is issued, however, the news might not spread to all the locations where the devices continue to be used in new surgeries for new patients. Moreover, people who already have a faulty implant might not be notified. This is why a global investigation was sorely needed.

Above: An ICIJ video summarizes how patients who receive implants are left unprotected

In 2018, ICIJ shared “a publicly searchable database of more than 70,000 recalls and safety warnings in 11 countries.” The project has continued since then, and the database now contains “more than 120,000 recalls, safety alerts and field safety notices” for medical devices. Throughout 2019, thousands more records were added.

A December 2018 post details the team’s data methodology for the Implant Files. First, journalists had to get the records — and often, their legitimate requests for public records were denied. Of the 8 million device-related records they managed to obtain, 5.4 million came from the U.S. Food and Drug Administration.

The records “describe cases where a device is suspected to have caused or contributed to a serious injury or death or has experienced a malfunction that would likely lead to harm if it were to recur.”

The value in these records was in the connections — connections among cases, and connections among devices. The ICIJ analysis concluded that “devices that broke, misfired, corroded, ruptured or otherwise malfunctioned after implantation or use were linked to more than 1.7 million injuries and nearly 83,000 deaths” in just one decade.

To identify the records that involved a patient’s death, humans first had to work out the many terms and phrasings used in the documents in place of the word “death.” Eventually they developed “a set of more than 3,400 key phrases” that were used to train the machine learning system. After that model extracted the relevant records, the records were run through another algorithm configured to determine whether the implanted device had contributed to the death.
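To make the first step concrete, here is a deliberately simplified sketch (not ICIJ’s code) that flags reports containing death-related key phrases. ICIJ went much further, using its set of more than 3,400 phrases to train machine learning models rather than relying on plain string matching; the phrases and reports below are invented examples.

```python
# Hypothetical examples of phrases that signal a patient death.
KEY_PHRASES = ["patient expired", "patient passed away", "pronounced dead"]

def mentions_death(report_text):
    """True if the report narrative contains any of the key phrases."""
    text = report_text.lower()
    return any(phrase in text for phrase in KEY_PHRASES)

reports = [
    "Device malfunctioned; the patient expired two days after surgery.",
    "Lead fracture detected during routine follow-up; no injury reported.",
]
flagged = [r for r in reports if mentions_death(r)]
print(len(flagged))  # 1
```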


Using machine learning to uncover racist laws

A common use of machine learning is to train a model to identify a particular kind of document, or a particular characteristic in a document — and then sort a gigantic set of documents. This produces a much-reduced subset of all documents that match the desired criteria. There might be some false positives in the subset, but it still gives researchers or journalists a big jump forward by eliminating thousands of unwanted documents.

This kind of sorting goes well beyond a simple search for keywords.

Above: Screenshot from On the Books at lib.unc.edu

A great example has emerged from the University of North Carolina at Chapel Hill. On the Books: Jim Crow and Algorithms of Resistance is a project that includes a public plain-text collection of North Carolina laws (1866–1967) likely to be Jim Crow laws.

There is a public GitHub repo of the code used in this project. It includes a full walkthrough of the project’s workflow — data acquisition and cleaning, OCR, unsupervised and supervised classification, etc.
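As a generic illustration of the supervised-classification step in a workflow like this (it is not the On the Books code), here is a scikit-learn sketch that trains on a small hand-labeled sample and then scores a much larger corpus; every text string and threshold here is a placeholder.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# A small hand-labeled sample (placeholder text).
labeled_texts = [
    "text of a law section known to be a Jim Crow law ...",
    "text of a law section known not to be one ...",
]
labels = [1, 0]  # 1 = likely Jim Crow, 0 = not

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(labeled_texts)
classifier = LogisticRegression(max_iter=1000).fit(X, labels)

# Score every section in the full corpus and keep the likely matches for human review.
corpus = ["... hundreds of thousands of law sections would go here ..."]
scores = classifier.predict_proba(vectorizer.transform(corpus))[:, 1]
likely_matches = [text for text, score in zip(corpus, scores) if score > 0.8]
print(len(likely_matches))
```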

The base document set (the main corpus) consists of 96 volumes containing 53,515 chapters and 297,790 sections (source).

The project’s title pays homage to Safiya Noble’s 2018 book Algorithms of Oppression: How Search Engines Reinforce Racism.

“State-based racial segregation laws were incredibly inconvenient, irregular, and, most importantly, unconstitutional.”

—William Sturkey, Ph.D.

A historical perspective on this data collection was provided by William Sturkey, a history professor at UNC, in “On the Books”: Machine Learning Jim Crow (September 2020). He says On the Books is “the first and most complete collection of all Jim Crow laws from a single American state.” He points to the difficulty of cataloging and studying all Jim Crow laws from any state “because there were just so many.”


Visual Chatbot: What can AI tell you?

To see for yourself the product, or end results, of an AI system, check out the Visual Chatbot online. It’s free. It’s fun.

Screenshot of dialog with Visual Chatbot

This app invites you to upload any image of your choice. It then generates a caption for that image. As you see above, the caption is not always 100 percent accurate. Yes, there is a dog in the photo, but there is no statue. There is a live person, who happens to be a soldier and a woman.

You can then have a conversation about the photo with the chatbot. The chatbot’s answer to my first question, “What color is the dog?”, was spot-on. Further questions, however, reveal limits that persist in most of today’s image-recognition systems.

The chat is still pretty awesome, though.

A soldier and a dog indoors, probably in an airport, with a “Welcome Home” balloon. U.S. Department of Defense photo, 2015 (public domain).

The image appears in chapter 4 of Artificial Intelligence: A Guide for Thinking Humans, where author Melanie Mitchell uses it to discuss the complexity that we humans can perceive instantly in an image, but which machines are still incapable of “seeing.”

In spite of the mistakes the chatbot makes in its answers to questions about this image, it serves as a nice demonstration of how today’s chatbots do not need to follow a set script. Earlier chatbots were programmed with rules that stepped through a tree or flowchart of choices — if the human’s question contains x, then reply with y.
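Here is a toy version of that older, rule-based approach, just to show how rigid it is. The keywords and canned replies are invented; a real system would have many more rules, but the principle is the same.

```python
# Hand-written rules: if the question contains the keyword, reply with the canned answer.
RULES = [
    ("color", "The dog is brown and white."),
    ("how many", "I see one dog and one person."),
]

def rule_based_reply(question):
    q = question.lower()
    for keyword, reply in RULES:
        if keyword in q:
            return reply
    return "I'm not sure how to answer that."

print(rule_based_reply("What color is the dog?"))     # matches the "color" rule
print(rule_based_reply("Is the dog wearing a hat?"))  # no rule matches
```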

If you’re curious about the data, model, and code behind the Visual Chatbot, you can see more info at Visual Dialog.

Below you can see some more questions I asked, with the answers from Visual Chatbot.

  • Screenshots of dialog with Visual Chatbot (five images)

Some of my favorite wrong answers are on the last two screens. Note that you can ask questions that call for more than a yes-or-no answer.


GPT-3 and automated text generation

GPT-3 has to be the most-hyped AI technology of the past year. Headlines said its predecessor, GPT-2, was “too dangerous” to be released publicly. Then it was released. The world did not end.

Less than a year later, the more advanced (next generation) GPT-3 was released by OpenAI. Why are people so excited about GPT-3? See for yourself in the video below.

GPT-3 is a natural language generation (NLG) system. Given instructions about what you want, it writes original text that — in most (but not all) cases — sounds like a human wrote it. The technology could be used to rapidly write 10,000 fake user comments into a discussion forum, for example. Or 10,000 fake restaurant reviews.

Don’t worry about the first examples in the video showing GPT-3 writing computer code, if that’s not something you’re well acquainted with — it quickly moves on to show the system extracting text from long documents and writing summaries on the fly. The presenter does a good job of demonstrating the breadth and variety of tasks GPT-3 can be used for. You might be flat-out amazed.

Bear in mind that the examples shown in the video are different, separate applications of GPT-3. You don’t just install GPT-3 and it does all of those things.

Developers can apply to gain access to the GPT-3 API. This enables them to create applications that use GPT-3 but not to see or modify the actual code that makes GPT-3 work. You can view more examples of GPT-3 applications at that same link.
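For developers who are granted access, a call to the API looked roughly like this at the time of writing, using OpenAI’s Python package. This is a minimal sketch; the model name, parameters, and interface details may change, and the prompt here is just an example.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # issued by OpenAI once your application is approved

response = openai.Completion.create(
    engine="davinci",                              # one of the GPT-3 models
    prompt="Summarize this in one sentence: ...",  # the instruction plus your text
    max_tokens=60,                                 # cap the length of the generated text
    temperature=0.7,                               # higher values give more varied output
)
print(response.choices[0].text)
```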

Another nice thing about the video above is the explanation of generative pre-training. Instead of training the model only with labeled data (supervised learning), the OpenAI researchers used “a semi-supervised approach for language understanding tasks using a combination of unsupervised pre-training and supervised fine-tuning.” The pre-training for the original GPT model included a dataset of more than 7,000 unpublished books “from a variety of genres including Adventure, Fantasy, and Romance.” Because entire books were used — instead of sentences separated from their context — the model was able to learn long-range structure.

GPT-3 used even more long-form texts for pre-training (described in a technical paper):

Above: Screenshot from “Language Models Are Few-Shot Learners,” Brown et al., July 2020

Once again we can see that tremendous advances in AI capability are made possible precisely because today’s computer hardware has the ability to run through enormous quantities of data very quickly. It’s not only that we now have billions of pages of text in digital form. It’s not just that we can store that Himalayan mountain range of data. It’s very much because processors are able to run multiple calculations simultaneously at lightning speed.

An important point about GPT-3 that’s not covered in the video: None of these applications, or GPT-3 itself, understands the meaning of the text that is being generated.

It’s going to be very easy for people to jump to conclusions about the “intelligence” of a computer system when it’s able to generate responses and explanations that are so human-like. There is no comprehension here. There is no knowledge of the world — there is only knowledge about language itself.

To learn more about how GPT-3 does what it does: GPT-3 Explained in Under 3 Minutes.


Untangling speech recognition

Dealing with language is so complicated! In this post I want to focus on speech, voice, audio — but bear in mind that text is also language, and unlike humans, a machine must be able to process text if it’s going to do anything at all with language.

The speech part of machine learning goes two ways: The machine can “hear” speech as audio (it receives audio and simultaneously creates a digital representation of it) — but to make sense of it, to use it (to find the answer to your question, for example), the machine must convert the audio into text. On the other hand, before the machine can “speak,” it needs text — and that text must be converted into digital audio. For the machine, these are not just one thing and its reverse.
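To make those two directions concrete, here is a small Python sketch that leans on two off-the-shelf packages: SpeechRecognition, which sends audio to an existing recognition service, and pyttsx3, which drives a local text-to-speech engine. The file name question.wav is a placeholder, and this shows only the plumbing, not the science underneath.

```python
import speech_recognition as sr
import pyttsx3

# Speech-to-text: audio in, text out.
recognizer = sr.Recognizer()
with sr.AudioFile("question.wav") as source:
    audio = recognizer.record(source)       # a digital representation of the speech
text = recognizer.recognize_google(audio)   # hand the audio to a recognition service
print(text)

# Text-to-speech: text in, audio out.
engine = pyttsx3.init()
engine.say("Here is the answer to your question.")
engine.runAndWait()
```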

Until I began researching this, I hadn’t given any thought to accents. I had thought about the differences among languages (and I still don’t know whether it’s harder, easier or the same to train a speech-recognition system in tonal languages such as the Chinese languages, or Vietnamese, as compared with a non-tonal language such as English), but I’d never considered that a person speaking English with an accent might not be “understood” by a speech-recognition system.

Behind the Mic: The Science of Talking with Computers (2014)

This breezy video from Google (7 minutes) does a good job of conveying a bit of the actual science behind how Siri, Alexa or Google Assistant “know” what we are saying when we speak to them. Even though it’s from 2014, there’s nothing outdated (as far as I know). You can see how the machine represents the speech it takes in. Like many explanations I found, however, it mushes the text part and the sound part all together, leaving the viewer with a general sense of how it all works but still in the dark as to how the parts work separately. (I don’t like how they show a human brain when they talk about neural networks. That’s very misleading.)

The video provides a quick background on the development of speech recognition, which was pretty awful until just a few years ago when researchers started applying deep neural networks to the acoustics part. Just like image recognition, speech recognition got a tremendous boost from the advances in computer processing hardware that now allow immense quantities of data to be analyzed at super speed.

To get a handle on how the separate parts of a speech-recognition system work, I needed to listen to this podcast from March 2020. It’s a 50-minute interview with Catherine Breslin, a U.K. machine learning scientist who specializes in speech recognition. She worked at Amazon Alexa for four and a half years. There’s a full transcript at the same URL if you’d rather read than listen.

For speech recognition, machine learning is used to train separate models — one for acoustics, and one for language. There’s also a third piece, the lexicon, which indicates the sequence of phones (the tiniest sound segments) that make up a single word. I don’t yet understand how that part is made. (Any program that reads text aloud would need to have a lexicon.)

“So if we put these together, we have an acoustic model, which tells you from some audio which sounds are likely to be spoken at that time; the lexicon tells you how those sounds combine into words, and then the language model tells you how those words combine into sequences of words.”

—Catherine Breslin

The three pieces, Breslin explains, work together in a decoding process that produces text from speech — the most likely representation of what was said. I looked at some further technical explanations of how the decoding is done, and it resembles a system for AI analysis of game moves — giant trees, many layers, lots of nodes. What the system needs to learn is the probabilities for sounds forming words forming sentences.
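Here is a toy illustration of that decoding idea (nothing like a real decoder): each candidate transcription gets a probability from the acoustic model and one from the language model, and the decoder keeps the candidate with the best combined score. All of the numbers are invented.

```python
import math

# Invented scores for two candidate transcriptions of the same audio.
candidates = {
    "recognize speech":   {"acoustic": 0.40, "language": 0.30},
    "wreck a nice beach": {"acoustic": 0.45, "language": 0.02},
}

def combined_score(scores):
    """Multiply the two probabilities by adding their logarithms."""
    return math.log(scores["acoustic"]) + math.log(scores["language"])

best = max(candidates, key=lambda c: combined_score(candidates[c]))
print(best)  # "recognize speech" wins despite a slightly lower acoustic score
```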

Note, all this is just to get to where the machine has the text of what was said. It hasn’t yet done any analysis of what was meant. Whew.

However, apart from voice assistants like Siri and Alexa, this process by itself has tremendous value for transcription. It is used to produce transcripts of radio programs, interviews and meetings, as well as to generate subtitles for movies and videos.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.
