A blast from the AI past: Perceptrons

I had not been all that interested to learn about perceptrons, even though the perceptron is known as an ancestor of present-day machine learning.

That changed when I read an account that said the big names in AI in the 1960s were convinced that symbolic AI was the road to glory — and their misplaced confidence smothered the development of the first systems that learned and modified their own code.

Symbolic AI is built with strictly programmed rules. Also known as “good old-fashioned AI,” or GOFAI, the main applications you can produce with symbolic AI are expert systems.

The original perceptron was conceived and programmed by Frank Rosenblatt, who earned his Ph.D. in 1956. A huge IBM computer running his code was touted by the U.S. Office of Naval Research in 1958 as “capable of receiving, recognizing and identifying its surroundings without any human training or control,” according to a New York Times article published on July 8, 1958. That was hype, but the perceptron actually did receive visual information from the environment and learn from it, in much the same way as today’s ML systems do.

“At the time, he didn’t know how to train networks with multiple layers. But in hindsight, his algorithm is still fundamental to how we’re training deep networks today.”

Thorsten Joachims, professor, computer science, quoted in the Cornell Chronicle

After leading AI researchers Marvin Minsky and Seymour Papert, both of MIT, published a book in 1969 that essentially said perceptrons were a dead end, all the attention — and pretty much all the funding — went to symbolic AI projects and research. Symbolic AI was the real dead end, but it took 50 years for that truth to be fully accepted.

Frank Rosenblatt died in a boating accident on his 43rd birthday, according to his obituary in The New York Times. It was 1971. Had he lived, he might have trained dozens of AI researchers who could have gone on to change the field much sooner.

An excellent article about Rosenblatt’s work is Professor’s perceptron paved the way for AI — 60 years too soon, published by Cornell University in 2019.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.

AI building blocks: What are models?

Descriptions of machine learning are often centered on training a model. Not having a background in math or statistics, I was puzzled by this the first time I encountered it. What is the model?

This 10-minute video first describes how you select labeled data for training. You examine the features in the data, so you know what’s available to you (such as color and alcohol content of beers and wines). Then the next step is choosing the model that you will train.

In the video, Yufeng Guo chooses a small linear model without much explanation as to why. For those of us with an impoverished math background, this choice is completely mysterious. (Guo does point out that some models are better suited for image data, while others might be better suited for text data, and so on.) But wait, there’s help. You can read various short or long explanations about the kinds of models available.

It’s important for the outsider to grasp that this is all code. The model is an algorithm, or a set of algorithms (not a graph). But this is not the final model. This is a model you will train, using the data.

What are you doing while training? You are — or rather, the system is — adjusting numbers known as weights and biases. At the outset, these numbers are randomly selected. They have no meaning and no reason for being the numbers they are. As the data go into the algorithm, the weights and biases are used with the data to produce a result, a prediction. Early predictions are bad. Wine is called beer, and beer is called wine.

The output (the prediction) is compared to the “correct answer” (it is wine, or it is beer). The weights and biases are adjusted by the system. The predictions get better as the training data are run again and again and again. Running all the data through the system once is called an epoch; the weights and biases are not adjusted until after all the data have run through once. Then the adjustment. Then run the data again. Epoch 2: adjust, repeat. Many epochs are required before the predictions become good.

After the predictions are good for the training data, it’s time to evaluate the model using data that were set aside and not used for training. These “test data” (or “evaluation data”) have never run through the system before.

The results from the evaluation using the test data can be used to further fine-tune the system, which is done by the programmers, not by the code. This is called adjusting the hyperparameters and affects the learning process (e.g., how fast it runs; how the weights are initialized). These adjustments have been called “a ‘black art’ that requires expert experience, unwritten rules of thumb, or sometimes brute-force search” (Snoek et al., 2012).

And now, what you have is a trained model. This model is ready to be used on data similar to the data it was trained on. Say it’s a model for machine vision that’s part of a robot assembling cars in a factory — it’s ready to go into all the robots in all the car factories. It will see what it has been trained to see and send its prediction along to another system that turns the screw or welds the door or — whatever.

And it’s still just — code. It can be copied and sent to another computer, uploaded and downloaded, and further modified.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.

AI building blocks: What are algorithms?

In thinking about how to teach non–computer science students about AI, I’ve been considering what fundamental concepts they need to understand. I was thinking about models and how to explain them. My searches led me to this 8-minute BBC video: What exactly is an algorithm?

I’ve explained algorithms to journalism students in the past — usually I default to the “a set of instructions” definition and leave it at that. What I admire about this upbeat, lively video is not just that it goes well beyond that simple explanation but also that it brings in experts to talk about how various and wide-ranging algorithms are.

The young presenter, Jon Stroud, starts out with no clue what algorithms are. He begins with some web searching and finds Victoria Nash, of the Oxford Internet Institute, who provides the “it’s like a recipe” definition. Then he gets up off his butt and visits the Oxford Internet Institute, where Bernie Hogan, senior research fellow, gives Stroud a tour of the server room and a fuller explanation.

“Algorithms calculate based on a bunch of features, the sort of things that will put something at the top of the list and then something at the bottom of the list.”

—Bernie Hogan, Oxford Internet Institute

He meets up with Isabel Maccabee at Northcoders, a U.K. coding school, and participates in a fun little drone-flying competition with an algorithm.

“The person writing the code could have written an error, and that’s where problems can arise, but the computer doesn’t make mistakes. It just does what it’s supposed to do.”

—Isabel Maccabee, Northcoders

Stroud also visits Allison Gardner, of Women Leading in AI, to talk about deskilling and the threats and benefits of computers in general.

This video provides an enjoyable introduction with plenty of ideas for follow-up discussion. It provides a nice grounding that includes the fact that not everything powerful about computer technology is AI!

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.

What is the good in GPT-3?

When given a prompt, an app built on the GPT-3 language model can generate an entire essay. Why would we need such an essay? Maybe the more important question is: What harm can such an essay bring about?

I couldn’t get that question out of my mind after I came across a tweet by Abeba Birhane, an award-winning cognitive science researcher based in Dublin.

You can read the essay on the Philosopher AI website or, should that go away, you can see a full image of the page that I captured.

Here is a sample of the generated text: “… it is unclear whether ethiopia’s problems can really be attributed to racial diversity or simply the fact that most of its population is black and thus would have faced the same issues in any country (since africa has had more than enough time to prove itself incapable of self-government).”

Obviously there exist racist human beings who would express a similar racist idea. The machine, however, has written this by default. It was not told to write a racist essay — it was told to write an essay about Ethiopia.

The free online version of Philosopher AI no longer exists to generate texts for you — but you can buy access to it via an app for either iOS or Android. That means anyone with $3 or $4 can spin up an essay to submit for a class, an application for a school or a job, a blog or forum post, an MTurk prompt.

A review of Philosopher AI posted at the iOS app store

The app has built-in blocks on certain terms, such as trans and women — apparently because the app cannot be trusted to write anything inoffensive in response to those prompts.

Why is a GPT-3 app so predisposed to write misogynist and racist and otherwise hateful texts? It goes back to the corpus on which it was trained. (See a related post here.) Philosopher AI offers this disclaimer: “Please remember that the AI will generate different outputs each time; and that it lacks any specific opinions or knowledge — it merely mimics opinions, proven by how it can produce conflicting outputs on different attempts.”

“GPT-3 was trained on the Common Crawl dataset, a broad scrape of the 60 million domains on the internet along with a large subset of the sites to which they link. This means that GPT-3 ingested many of the internet’s more reputable outlets — think the BBC or The New York Times — along with the less reputable ones — think Reddit. Yet, Common Crawl makes up just 60% of GPT-3’s training data; OpenAI researchers also fed in other curated sources such as Wikipedia and the full text of historically relevant books.” (Source: TechCrunch.)

There’s no question that GPT-3’s natural language generation prowess is amazing, stunning. But it’s like a wild beast that can at any moment turn and rip the throat out of its trainer. It has all the worst of humanity already embedded within it.

A previous related post: GPT-3 and automated text generation.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.

Can AI ‘see’ what you draw?

For Friday AI Fun, let’s look at an oldie but goodie: Google’s Quick, Draw!

You are given a word, such as whale, or bandage, and then you need to draw that in 20 seconds or less.

Screenshot 1 from Google’s Quick, Draw!
Screenshot 2 from Google’s Quick, Draw!

Thanks to this game, Google has labeled data for 50 million drawings made by humans. The drawings “taught” the system what people draw to represent those words. Now the system uses that “knowledge” to tell you what you are drawing — really fast! Often it identifies your subject before you finish.

Related: Ask a computer to draw what it sees.

It is possible to stump the system, even though you’re trying to draw what it asked for. My drawing of a sleeping bag is apparently an outlier. My drawings of the Mona Lisa and a rhinoceros were good enough — although I doubt any human would have named them as such!

Screenshot 3 from Google’s Quick, Draw!

Google’s AI thought my sleeping bag might be a shoe, or a steak, or a wine bottle.

Screenshot 4 from Google’s Quick, Draw!

The system has “learned” to identify only 345 specific things. These are called its categories.

You can look at the data the system has stored — for example, here are a lot of drawings of beard.

Screenshot 5 from Google’s Quick, Draw!

You can download the complete data (images, labels) from GitHub. You can also install a Python library to explore the data and retrieve random images from a given category.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.

How recurrent neural networks ‘read’ sequences

When I first read a description of how recurrent neural networks differ from other neural networks, I was all like, yeah, that’s cool. I looked at a diagram that had little loops drawn around the units in the hidden layer, and I thought I understood it.

As I thought more about it, though, I realized I didn’t understand how it could possibly do what the author said it did.

In many cases, the input to a recurrent neural net (RNN) is text (more accurately: a numeric representation of text). It might be a sentence, or a tweet, or an entire review of a restaurant or a movie. The output might tell us whether that text is positive or negative, hostile or benign, racist or not — depending on the application. So the system needs to “consider” the text as a whole. Word by word will not work. The meanings of words depend on the context in which we find them.

And yet, the text has to come in, as input, word by word. The recurrent action (the loops in the diagram) are the way the system “holds in memory” the words that have already come in. I thought I understood that — but then I didn’t.

Michael Nguyen’s excellent video (under 10 minutes!), above, was just what I needed. It is a beautiful explanation — and what’s more, he made a text version too: Illustrated Guide to Recurrent Neural Networks. It includes embedded animations, like the ones in the video.

In the video, Nguyen begins with a short list of the ways we are using the output from RNNs in our everyday lives. Like many of the videos I post here, this one doesn’t get into the math but instead focuses on the concepts.

If you can remember the idea of time steps, you will be able to remember how RNNs differ from other types of neural nets. The time steps are one-by-one inputs that are parts of a larger whole. For a sentence or longer text, each time step is a word. The order matters. Nguyen shows an animated example of movement to make the idea clear: we don’t know the direction of a moving dot unless we know where it’s been. One freeze-frame doesn’t tell us the whole story.

RNNs are helpful for “reading” any kind of data in a sequence. The hidden layer reads word 1, produces an output, and then returns it as a precursor to word 2. Word 2 comes in and is modified by that prior output. The output from word 2 loops back and serves as a precursor to word 3. This continues until a stop symbol is reached, signifying the end of the input sequence.

Animation by Michael Nguyen a k a Michael Phi

There’s a bit of a problem in that the longer the sequence, the less influence the earliest steps have on the current one. This led me down a long rabbit hole of learning about long short-term memory networks and gradient descent. I used this article and this video to help me with those.

At 6:23, Nguyen begins to explain the effects of back propagation on a deep feed-forward neural network (not an RNN). This was very helpful! He defines the gradient as “a value used to adjust the network’s internal weights, allowing the network to learn.”

At 8:35, he explains long short-term memory networks (LSTMs) and gated recurrent units (GRUs). To grossly simplify, these address the problem noted above by essentially learning what is important to keep and what can be thrown away. For example, in the animation above, what and time are the most important; is and it can be thrown away.

So an RNN will be used for shorter sequences, and for longer sequences, LSTMs or GRUs will be used. Any of these will loop back within the hidden layer to obtain a value for the complete sequence before outputting a prediction — a value.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.