Exploring artificial intelligence and machine learning
Author: Mindy McAdams
The Center for Responsible AI at New York University has published a free online course titled “AI Ethics: Global Perspectives.”
The course consists of a series of videos produced by many different people in countries around the world. The instructors include computer science and engineering professors as well as researchers in various fields, including government, health care, and the humanities. These are the lectures I intend to watch:
Danya Glabau, a medical anthropologist, discusses “AI for Whom?”
It was a challenge for me to figure out how to teach non–computer science students about word vectors. I wanted them to have a clear idea of how words and their meanings are represented for use in an AI system — otherwise, I worried they would assume something like a written dictionary with text and definitions. I also wanted them to know that it wasn’t something simple like “each word has a numerical code assigned to it.” So we spent some time talking about what a vector is and what “n-dimensional space” means.
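To show that a word vector is just a list of numbers whose geometry carries meaning, a tiny sketch like the following may help. It assumes made-up four-dimensional vectors (real embeddings have hundreds of learned dimensions) and uses cosine similarity, one common way to measure how closely two vectors point in the same direction in n-dimensional space.

```python
import math

# Toy 4-dimensional "word vectors". These numbers are invented for
# illustration only; they are not taken from any trained model.
vectors = {
    "cat":    [0.8, 0.1, 0.6, 0.0],
    "kitten": [0.7, 0.2, 0.7, 0.1],
    "truck":  [0.0, 0.9, 0.1, 0.8],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

for other in ("kitten", "truck"):
    print(other, round(cosine_similarity(vectors["cat"], vectors[other]), 3))
```

In this toy space, "kitten" lands much closer to "cat" than "truck" does, which is the intuition behind words with similar contexts ending up at neighboring points.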
Now I need to work out how to teach them about transformers. I found a surprisingly clear article at Orange.com (formerly France Télécom), on their Hello Future website about research and innovation. I’m going to quote a large section from that article:
“Originally, in 2013, word embeddings (such as Word2Vec, Glove, or Fasttext) were able to capture representations of words in the form of vectors taking into account the context of neighboring words in large volumes of text. Two words appearing in similar contexts were ‘embedded’ into N-dimensional space, to neighboring points in this space. This approach has led to significant advances in the field of NLP, but also has its limitations. From 2018 a new way of generating these word vectors emerged. Rather than selecting the vector of a word in a previously learned static ‘dictionary,’ a model is responsible for dynamically generating the vector representation of a word. A word is thus projected to a vector not only according to its prior meaning, but also according to the context in which it appears. The models for effective realization of these contextual projections (BERT, ELMO and derivatives, GPT and its successors) are based on a simple yet powerful architecture called Transformer.” (Spelling and punctuation edited for American English.)
I know that paragraph might not make sense if you haven’t already learned about word vectors. The key is that transformers can improve a machine’s representation of what a word or sentence means by taking into account its context in the current input. So you do have a language model, previously trained on a large corpus, but the transformer analyzes the present text input in a more holistic way, transforming the vectors as it goes.
Again quoting from the Orange.com article: “While previous approaches … could model contextual dependencies, they were always constrained by referencing words by their positions [in the sentence]. Attention is about referencing by content. Instead of looking for relationships with other words in the context at given positions, attention allows you to search for relationships with all words in the context, and through a very effective implementation, it allows you to rely on the most similar words to improve prediction, whatever their position in context.”
The role of the attention module is explained in a 2017 paper that, according to Google Scholar, has been cited more than 20,000 times: Attention Is All You Need. See the PDF for diagrams of the Transformer network architecture.
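The core formula in that paper, scaled dot-product attention, is softmax(QKᵀ/√d_k)V. Below is a minimal pure-Python sketch of it. It deliberately leaves out the learned projection matrices, multiple heads, and positional encodings that a real transformer uses, and the two-dimensional “word vectors” (bank, river, money) are invented. Even so, it shows the idea from the Orange.com quote: every word is compared with every other word by content, and the output vector for “bank” changes depending on which words surround it.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V,
    over lists of vectors (one vector per word)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Compare this word's query with every word's key, whatever its position.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The new vector is a weighted average of all the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Invented toy embeddings; here Q, K, and V are simply the raw vectors.
bank, river, money = [1.0, 1.0], [0.0, 2.0], [2.0, 0.0]
print(attention([bank, river], [bank, river], [bank, river])[0])  # [0.5, 1.5]
print(attention([bank, money], [bank, money], [bank, money])[0])  # [1.5, 0.5]
```

The same input vector for “bank” comes out pulled toward “river” in one sentence and toward “money” in the other, which is the contextual behavior that static word embeddings lack.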
Well-known language models of this new kind include BERT (developed by Google, which uses it in Google Search), ELMo, and GPT-3, although ELMo, which the Orange.com article groups with the others, is actually built on recurrent (LSTM) networks rather than on transformers. These so-called large language models have raised many concerns, particularly around ethics, because their interior processes are a black box and their immense training data has included biased and toxic texts. The Orange.com article includes two charts that illustrate differences among BERT, ELMo, and three generations of GPT.
An important aspect of transformers is that they produce these large language models from unlabeled data, and when developing applications based on transformers and such models, good results can be obtained with only a small amount of additional training data (“few-shot learning”).
Orange — like many other companies — is using large language models for classification and information-extraction tasks such as: “sentiment analysis, personal data detection, detection and identification of named entities, syntactic dependency analysis, semantic parsing, co-reference resolution,” and question answering. These tasks involve customer-service applications as well as internal data analysis.
The basic idea: Immediately detect and remove hateful or dangerous posts in social media and other online forums. With advances in natural language processing (NLP), identification of harmful speech becomes more accurate and more practical.
In this essay published in Scientific American (2021), researchers from the private company Unitary (see their public Detoxify code on GitHub) discuss the challenges in rating the level of toxicity or harmfulness in text content. One aspect is what is considered harmful: profanity is easy to detect; misinformation is complicated. Another aspect: Terms describing gender, race, or ethnicity can be used hatefully or as (non-toxic) self-description.
(I’ve written before about machine learning used in comment moderation, which is a large concern in media companies that permit users to post comments on articles and blog posts.)
Jigsaw, a Google division, “released two public data sets containing over one million toxic and non-toxic comments from Wikipedia and a service called Civil Comments.” Each comment was labeled with a rating such as “Toxic” or “Very Toxic.” The data sets were used as training data in three competitions, hosted by Google, in which AI researchers could enter their trained models and see how they compared to others (and win money). The three “Jigsaw challenges” (one per year):
“We decided to take inspiration from the best Kaggle solutions and train our own algorithms with the specific intent of releasing them publicly.”
— Unitary researchers
The Unitary researchers describe Detoxify, “an open-source, user-friendly comment detection library,” which is intended “to help researchers and practitioners identify potential toxic comments.” The library includes three separate models, one for each Jigsaw challenge. These models can be fine-tuned using additional data sets.
One particular limitation pointed out by the researchers is that a high toxicity score does not always indicate actually toxic content: “As an example, the sentence ‘I am tired of writing this stupid essay’ will give a toxicity score of 99.7 percent, while removing the word ‘stupid’ will change the score to 0.05 percent.”
There’s still a long way to go before harmful comments and social media posts can be instantly removed from platforms.
The system described in this wonderful New Yorker article from March 2021 is NOT a neural network, and that’s one of the things that make it fascinating. I’ve written before about ImageNet and how neural networks, trained on humongous datasets of labeled digital images, are able to very accurately say what is in a photograph that the system has never “seen” before.
This is different.
This system, developed by a small company in Japan, does not require hundreds or thousands of images of each object it must identify, precisely because it doesn’t use a neural network. The technologies it uses can be called good old-fashioned AI (GOFAI). Essentially, it consists of a collection of manually constructed algorithms.
The system also “learns,” but not in the typical black-box sense of today’s machine learning systems. It is widely used in the checkout systems of Japanese bakeries, which offer a bewilderingly large assortment of pastries and small bread items, many of which look quite similar to one another. BakeryScan was released in 2013; it was 15 years in development.
More recently, the bakery system has been adapted to recognize specific types of cancer cells. The new system is able to “look at an entire microscope slide and identify the cells that might be cancerous” (source: The New Yorker article).
Rather than summarizing the article further, I’m just going to urge you to read it. It’s very much worth your time.
Cassie Kozyrkov, who wrote this article, is head of decision intelligence at Google. The article starts out with what looks like a standard explanation of an image-recognition system, which she deprecatingly refers to as “the cat/not-cat task.” But don’t be fooled: Kozyrkov communicates with clear, sharp precision, and very quickly she asks us to consider circumstances in which we would want a tiger to be considered a cat and those in which we would want it to be not-cat.
This leads to a discussion of ground truth. This is “an ideal expected result” — but for whom? Well, for the people who originally built the system. Kozyrkov notes that ground truth is NOT an objective, perfect truth like something studied in a philosophy class (Truth with a capital T). It’s whether a tiger is a cat in your reality or not-cat in mine.
I am reminded of one of my favorite lines in the rock opera Jesus Christ Superstar: “But what is truth? Is truth unchanging law? We both have truths. Are mine the same as yours?”
“When such a dataset is used to train ML/AI systems, systems based on it will inherit and amplify the implicit values of the people who decided what the ideal system behavior looked like to them.”
Say you are using an existing labeled dataset — not one you yourself have created — which is often the case. The labels attached to the data items are the ground truth for that dataset. If it’s a dataset of images, and some labels applied to photos of people are racist, then that’s the ground truth in that dataset. If it’s a dataset for sentiment analysis, and a lot of toxic comments are labeled “not toxic,” then that’s the ground truth you’re adopting.
It’s essential for developers to test systems extensively to uncover these flaws in ground truth.
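One simple way to probe a borrowed dataset’s ground truth, sketched here under the assumption that you can have a sample of items re-labeled by independent human reviewers, is to measure how often the two sets of labels disagree and then read the disagreements yourself. The function name, comment IDs, and labels below are hypothetical.

```python
def audit_labels(dataset_labels, reviewed_labels):
    """Compare a dataset's labels (its ground truth) with labels from an
    independent review of the same items. Returns the disagreement rate
    over the shared items and the sorted IDs of items whose labels differ."""
    shared = dataset_labels.keys() & reviewed_labels.keys()
    disagreements = sorted(i for i in shared if dataset_labels[i] != reviewed_labels[i])
    rate = len(disagreements) / len(shared) if shared else 0.0
    return rate, disagreements

# Hypothetical labels for four comments.
dataset_labels = {"c1": "toxic", "c2": "not toxic", "c3": "not toxic", "c4": "toxic"}
reviewed_labels = {"c1": "toxic", "c2": "toxic", "c3": "not toxic", "c4": "toxic"}

rate, ids = audit_labels(dataset_labels, reviewed_labels)
print(f"{rate:.0%} disagreement; inspect: {ids}")
```

A disagreement does not say which label is right; it only points a human toward the items where the dataset’s ground truth deserves a closer look.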
“You wouldn’t want to fall victim to a myopic fraud detection system with sloppy definitions of what financial fraud looks like, especially if such a system is allowed to falsely accuse people without giving them an easy way to prove their innocence.”
— Cassie Kozyrkov
In a video embedded in the same article, Kozyrkov pithily proclaims: “There are only actually two real lines there. Here’s what they are: This objective. That data set.” (At 9:16.) Of course there’s a ton more code than that (she’s talking about the programming of the system that creates the model), but in terms of what you want the system to be able to do, that’s it in a nutshell: How have you framed your objective? And what’s in your dataset? More important, in many cases, is what’s NOT in your dataset.
She says this is where the core danger in AI lies, because in traditional programming “it might take 10,000 lines of code, a hundred thousand lines of code maybe, and some human being has to worry about every single one of those lines, agonize over it.” With supervised machine learning, you’ve only got the objective and the (gigantic) dataset, and the question is, Have enough people with expertise really agonized over each of those things?
My other favorite bits from the video:
“A system that is built and designed for one purpose may not work for a different purpose.” (6:17)
“Remember that the objective is subjective.” (6:31)
“And if you take those two parts really seriously, that is how you are going to build a safe and effective and kind AI system.” (20:16)
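Kozyrkov’s point that “the objective is subjective” can be made concrete with a small sketch. It assumes some model has already given each labeled transaction a fraud score (the scores and labels below are invented), and it picks the flagging threshold that minimizes a cost. The data never change; only the objective does, yet the resulting system behaves differently.

```python
def best_threshold(scored_examples, fp_cost, fn_cost):
    """Pick the score threshold that minimizes total cost on labeled data.

    scored_examples: list of (score, is_fraud) pairs.
    An example is flagged as fraud when score >= threshold.
    fp_cost: cost of falsely accusing someone (false positive).
    fn_cost: cost of missing real fraud (false negative).
    """
    candidates = sorted({score for score, _ in scored_examples})

    def cost(t):
        fp = sum(1 for s, y in scored_examples if s >= t and not y)
        fn = sum(1 for s, y in scored_examples if s < t and y)
        return fp_cost * fp + fn_cost * fn

    return min(candidates, key=cost)

# Invented (score, is_fraud) pairs.
examples = [(0.1, False), (0.3, False), (0.4, True),
            (0.6, False), (0.7, True), (0.9, True)]

print(best_threshold(examples, fp_cost=1, fn_cost=10))  # missed fraud is costly
print(best_threshold(examples, fp_cost=10, fn_cost=1))  # false accusations are costly
```

With these numbers, the first objective flags everything scoring 0.4 or higher, while the second flags only scores of 0.7 or higher. Which one is “correct” is a human decision, not something the data can settle.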
Sometimes I read something that is like a voice out of my own head:
“Artificial intelligence is a buzzword increasingly being used by companies around the world that seek to project themselves at the forefront of cutting-edge research … As the word loses its meaning, it is important for investors to understand what artificial intelligence is and what companies stand to gain from breakthroughs in the new technology.”
That comes from an article titled “10 Best Artificial Intelligence Stocks to Buy for 2021” (link above), but it’s more than just a list of stock tips. It points out that “technology firms with social media services” (e.g., Facebook) are hot because they have the massive datasets that power machine learning about consumers. Companies that make super-fast computer hardware, particularly graphics processing units (GPUs) that crunch through that data, are also good bets (although I’ve heard about growing hardware shortages due to the pandemic).
The article’s author refers to hedge-fund investments as an indicator, which might make me leery about investing my own hard-earned cash, but the list of companies still interested me. Along with hardware manufacturers such as Micron Technology and Nvidia; Amazon, which is valuable for more than only its growing AI expertise; and Alphabet Inc., the parent of Google and DeepMind — the list also includes:
Adobe, which is “integrating data-based learning into most of its software through Adobe Sensei, a tool that uses artificial intelligence to improve user experiences across a wide range of Adobe products.”
Facebook, which is Yahoo! Finance’s No. 1 pick. With its deep pockets, Facebook is certainly able to acquire some of the best research minds in AI today. Its efforts are grouped under the Facebook AI label, and the breadth of its work is visible on this page.
Microsoft, which “has a separate artificial intelligence unit called Microsoft AI that helps users, organizations, and governments across the world with machine learning, data analytics, robotics, and internet of things products.” Just this week, Microsoft announced a $16 billion cash deal to buy Nuance, which develops AI software including speech-recognition products (Dragon is one). Microsoft pointed to Nuance’s position in the healthcare market as a primary reason for the acquisition.
Pinterest, because it is using AI to sort and categorize the millions of images shared by its users and also to “tailor the experiences” of users. Note that news organizations such as The New York Times are also using AI to determine how content is presented to users.
Salesforce.com, which “provides customer relationship management services and other enterprise solutions on market automation, data analytics, and application development.” The company markets its AI products under the Einstein brand (see AI use cases from the company). Late last year, Salesforce agreed to acquire Slack Technologies.
Notably absent from the list is Apple (although maybe not a great investment, due to its high valuation), which is no newcomer to incorporating AI into its products. Critics might pooh-pooh Apple’s AI clout, but machine learning has been integral to the iPhone, iPad, and Apple Watch for years. Ars Technica published an excellent article about this in mid-2020.
Also absent are the many promising startups, particularly those in the climate arena and those founded by alumni of DeepMind, which to me is the most fantastic incubator of AI talent (see AlphaFold) outside the top universities. Just this week, Google put money into one of those startups, which was founded by a former research engineer at DeepMind and is “focused on reducing greenhouse gas emissions.”
I got my first look at spaCy, a Python library for natural language processing, near the end of 2019. I wanted to learn it but had too many other things to do. Fast-forward to now, almost 14 months into the pandemic, and I recently stumbled across spaCy’s own tutorial for learning to use the library.
The interactive tutorial includes videos, slides, and code exercises, and there is a GitHub repo. It is available in English, Deutsch, Español, Français, Português, 日本語, and 中文. Today I completed chapter 2. If you already know Python at, say, an intermediate level, check it out!
In chapter 1 (there are four chapters), I got a handle on part-of-speech tags, syntactic dependencies, and named entities. I learned that we can search on these, and also on sequences of words (tokens) that match patterns we define. I’ve known about large-scale document searches, where a huge collection of documents is searched programmatically, usually to extract the most meaningful docs for some purpose such as a journalism investigation, and now I was getting a much better idea of how such searches can be designed.
spaCy provides “pre-trained model packages,” meaning someone else has already done the hard work of training the statistical models (and, in the larger packages, the word vectors). There are packages of various sizes and in various languages. Loading a package provides various features, and the bigger packages offer more; for example, the medium and large English packages include word vectors, while the small one does not.
I think I was hooked as soon as I saw this and realized you could ask for all the MONEY entities, or all the ORG entities, in a document and evaluate them:
Then (still in chapter 1) I learned that I can easily define my own entities if the model doesn’t recognize the ones I need to find. I learned that if I don’t know what GPE is, I can enter spacy.explain("GPE") and spaCy will return 'Countries, cities, states' — sweet!
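Here is a minimal sketch of defining your own entities and then pulling out only the ones with a given label, assuming spaCy v3 is installed. To avoid downloading a pretrained package, it uses a blank English pipeline (a tokenizer only) plus spaCy’s entity_ruler component, so every entity comes from the hand-written patterns; the phrases and labels chosen here are illustrative.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Rule-based entities: each pattern string is matched as an exact phrase.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Microsoft"},
    {"label": "ORG", "pattern": "Nuance"},
    {"label": "PRODUCT", "pattern": "Dragon"},
])

doc = nlp("Microsoft plans to buy Nuance, the maker of Dragon.")
for ent in doc.ents:
    # spacy.explain turns a label such as "ORG" into a short description.
    print(ent.text, ent.label_, spacy.explain(ent.label_))

# Keep only the ORG entities, as you might with MONEY or GPE in a real document.
orgs = [ent.text for ent in doc.ents if ent.label_ == "ORG"]
print(orgs)
print(spacy.explain("GPE"))
```

With a pretrained package loaded via spacy.load instead, the model’s own entity predictions would appear alongside (or be adjusted by) these rules.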
Then I learned about rule-based matching, and I thought: “Regular expressions, buh-bye!”
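Rule-based matching describes each token with a dictionary of attributes instead of writing one long regular expression. The sketch below, which assumes spaCy v3 and again uses a blank English pipeline, finds any number-like token followed by the word “percent”; the pattern name PERCENTAGE is arbitrary, and the sentence reuses the figures from the Detoxify example above.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One dict per token: a number-like token, then "percent" in any casing.
pattern = [{"LIKE_NUM": True}, {"LOWER": "percent"}]
matcher.add("PERCENTAGE", [pattern])

doc = nlp("The score fell from 99.7 percent to 0.05 percent.")
# The matcher returns (match_id, start, end) token offsets.
matched = [doc[start:end].text for _match_id, start, end in matcher(doc)]
print(matched)
```

Because the pattern works on tokens rather than characters, it doesn’t have to worry about spacing or how the decimal number is written.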
Chapter 1 didn’t really get deeply into lemmatization, but it offered this:
That was just chapter 1! Chapter 2 went further into creating your own named entities and using parts of speech as part of your search criteria. For example, if you want to find all instances where a particular entity (say, a city) is followed by a verb (any verb), you can do that. Or any part of speech. You can construct a complex pattern, mixing specific words, parts of speech, and selected types of entities. The pattern can include as many tokens as you want. (If you’re familiar with regex, you’ll recognize operators for optional and repeated tokens such as ?, *, and +, and a single token’s text can even be matched against a regular expression.)
You can determine whether phrases or sentences are similar to each other (although imperfectly).
I’m not entirely sure how I would use these, but I’m sure they’re good for something:
.root — the token that decides the category of the phrase
.head — the syntactic “parent” that governs the phrase
There is an exercise in which I matched country names and their root head token (span.root.head), which gave me a bit of a clue as to how useful that might be in some circumstances.
Also in chapter 2, I learned how to use an imported JSON file to add 240 country names as GPE entities — obviously, the imported terms could be any kind of entity.
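One way to approximate that exercise, assuming spaCy v3, is to load the names from JSON, turn each into a phrase pattern with PhraseMatcher, and label every match as GPE. The inline JSON string stands in for the tutorial’s file and is hypothetical (and far shorter than 240 names).

```python
import json

import spacy
from spacy.matcher import PhraseMatcher
from spacy.tokens import Span
from spacy.util import filter_spans

# Stand-in for the contents of a countries JSON file (hypothetical).
countries_json = '["Czech Republic", "New Zealand", "Slovakia"]'
countries = json.loads(countries_json)

nlp = spacy.blank("en")
matcher = PhraseMatcher(nlp.vocab)
# make_doc only tokenizes, which is all a phrase pattern needs.
matcher.add("COUNTRY", [nlp.make_doc(name) for name in countries])

doc = nlp("Officials from New Zealand and the Czech Republic met in Slovakia.")
spans = [Span(doc, start, end, label="GPE") for _match_id, start, end in matcher(doc)]
doc.ents = filter_spans(spans)  # drop any overlapping matches before setting entities
print([(ent.text, ent.label_) for ent in doc.ents])
```

Swapping the label (say, to ORG or PERSON) and the JSON contents is all it would take to import a different kind of entity.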
So, I’m feeling very excited about spaCy! Halfway through the tutorial!
Machine learning systems for image recognition aren’t always perfect — and neither are AI systems marketed for medical use, whether they use image recognition or not. But here’s an example of image recognition used in a medical context where the system appears to have succeeded at something significant — and it’s something humans can’t do, or at least can’t do well.
“Researchers used the AI tool Subtype and Stage Inference (SuStaIn) to scan the MRI brain scans of 6,322 patients with MS, letting SuStaIn train itself unsupervised. The AI identified 3 previously unknown patterns …” (Pharmacy Times). The model was then tested on MRIs from “a separate independent cohort of 3,068 patients” and successfully identified the three new MS subtypes in them.
Subtype and Stage Inference (SuStaIn) was introduced in this 2018 paper. It is an “unsupervised machine-learning technique that identifies population subgroups with common patterns of disease progression” using MRI images. The original researchers were studying dementia.
Why does it matter? Knowing which subtype of multiple sclerosis (MS) a patient has could enable doctors to pursue different treatments for each subtype, which might lead to better results for patients.
“While further clinical studies are needed, there was a clear difference, by subtype, in patients’ response to different treatments and in accumulation of disability over time. This is an important step towards predicting individual responses to therapies,” said Dr. Arman Eshaghi, the lead researcher (EurekAlert).
In the latest JournalismAI newsletter, a list of recommendations called “Reporting on AI Effectively” shares wisdom from several journalists who are reporting about a range of artificial intelligence and machine learning topics. The advice is grouped under these headings:
Build a solid foundation
Beat the hype
Complicate the narrative
Be compassionate, but embrace critical thinking
Karen Hao, senior AI editor at MIT Technology Review (whose articles I read all the time!), points out that to really educate yourself about AI, you’re going to need to read some of the research papers in the field. She also recommends YouTube as a resource for learning about AI, and I have to agree: before I began studying AI, I had never used YouTube so much to learn about a topic.
The post also offers good advice about questions a reporter should ask about AI research and new developments in the field.
The Biden administration is working hard in a wide range of areas, so maybe it’s no surprise that the U.S. Department of Health and Human Services (HHS) released this report, titled Artificial Intelligence (AI) Strategy (PDF), this month.
“HHS recognizes that Artificial Intelligence (AI) will be a critical enabler of its mission in the future,” it says on the first page of the 7-page document. “HHS will leverage AI to solve previously unsolvable problems,” in part by “scaling trustworthy AI adoption across the Department.”
So HHS is going to be buying some AI products. I wonder what they are (will be), and who makes (or will make) them.
“HHS will leverage AI capabilities to solve complex mission challenges and generate AI-enabled insights to inform efficient programmatic and business decisions” — while to some extent this is typical current business jargon, I’d like to know:
Which complex mission challenges? What AI capabilities will be applied, and how?
Which programmatic and business decisions? How will AI-enabled insights be applied?
These are the kinds of questions journalists will need to ask when these AI claims are bandied about. Name the system(s), name the supplier(s), give us the science. Link to the relevant research papers.
I think a major concern would be use of any technologies coming from Amazon, Facebook, or Google — but I am no less concerned about government using so-called solutions peddled by business-serving firms such as Deloitte.
The following executive orders (both from the previous administration) are cited in the HHS document:
The department will set up a new HHS AI Council to identify priorities and “identify and foster relationships with public and private entities aligned to priority AI initiatives.” The council will also establish a Community of Practice consisting of AI practitioners (page 5).
Four key focus areas:
An AI-ready workforce and AI culture (includes “broad, department-wide awareness of the potential of AI”)
AI research and development in health and human services (includes grants)
“Democratize foundational AI tools and resources” — I like that, although implementation is where the rubber meets the road. This sentence indicates good aspirations: “Readily accessible tools, data assets, resources, and best practices will be critical to minimizing duplicative AI efforts, increasing reproducibility, and ensuring successful enterprise-wide AI adoption.”
“Promote ethical, trustworthy AI use and development.” Again, a fine statement, but let’s see how they manage to put this into practice.