ImageNet and labels for data – AI in Media and Society

Supervised learning is a type of machine learning in which a model is trained using labeled data. You begin with a collection of labeled data, usually a very large one. (In the case of ImageNet, the data are all digital images. For the Iris Data Set, the data all refer to individual iris flowers, which can be divided into three related species. For the MNIST dataset, the data are about 70,000 images of handwritten digits, 0 through 9.)

You divide the dataset into two parts, the training data and the test data. The split might be 70/30, or 80/20. You don’t choose which items go into which group; they are assigned at random. Then you run the training data through the code many, many, many times, adjusting certain parameters in the code along the way, until the code consistently returns good results. That is, the thing the code identifies (an object in an image, an iris species, a number) matches the label (which the code does not see when it makes its prediction).
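To make the split concrete, here is a minimal sketch in plain Python. It assumes the labeled data fit in a list of (features, label) pairs. The function name train_test_split, the seed parameter, and the toy data are all made up for illustration; they are not from any particular library or from ImageNet itself.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Randomly split labeled examples into (train, test) lists."""
    shuffled = list(examples)              # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # random order: you don't pick the groups
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Toy labeled data: (features, label) pairs with invented values.
examples = [((i, i * 2), "even" if i % 2 == 0 else "odd") for i in range(10)]

train, test = train_test_split(examples, test_fraction=0.2)  # an 80/20 split
print(len(train), len(test))  # 8 2
```

Fixing the seed is only a convenience so that the same split comes out every time you run the sketch.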

At that point, you have a trained model. You feed the test data set to it and see whether the accuracy rate is also high. (It’s important that none of the test data were used to train the model.) Again, the proof is in the labels.
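Checking the trained model against the test labels can be sketched like this. The toy_model function is a hypothetical stand-in for a real trained model: it guesses an iris species from petal length alone, and both its thresholds and the four held-out measurements are invented for illustration.

```python
def accuracy(predict, test_examples):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(1 for features, label in test_examples
                  if predict(features) == label)
    return correct / len(test_examples)

# Hypothetical "trained model": a hand-written rule, not a real classifier.
def toy_model(petal_length_cm):
    if petal_length_cm < 2.5:
        return "setosa"
    if petal_length_cm < 5.0:
        return "versicolor"
    return "virginica"

# Held-out test data: (petal length in cm, true species). Values are invented.
held_out = [(1.4, "setosa"), (4.5, "versicolor"),
            (5.1, "virginica"), (4.8, "virginica")]

print(accuracy(toy_model, held_out))  # 0.75
```

The model sees only the measurement; the label is used afterward to score the guess. Here the last flower is misclassified, so three of four guesses match their labels.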

In a later post I will discuss how data come to be labeled. (Hint: It’s not elves.) In this post, I will discuss bad labels. Specifically, I want to highlight the work that AI researcher Kate Crawford and artist-researcher Trevor Paglen did around the famous ImageNet dataset.

In the video above, Crawford and Paglen present this work and show a lot of great examples. They also published a long article about the work, if you’d rather read than watch.

ImageNet is a huge collection of labeled images, more than 14 million of them. They were labeled according to a set of categories and synonym groupings from WordNet, an English-language lexical database. The images were labeled by humans.

And that, it seems, is at the root of the problem.

Crawford and Paglen were interested in the ImageNet photos of people. Person is a category in WordNet. Within the category, there are many descriptive terms for people, such as “cheerleaders, scuba divers, welders, Boy Scouts, fire walkers, and flower girls.” So the photos of people in ImageNet are labeled with these terms. However, not all terms are neutral.

“A young man drinking beer is categorized as an ‘alcoholic, alky, dipsomaniac, boozer, lush, soaker, souse.’ A child wearing sunglasses is classified as a ‘failure, loser, non-starter, unsuccessful person.’”
—Crawford and Paglen

You might say, well, where’s the harm? They are only labels in a database, after all.

The ImageNet database has been used to train many convolutional neural networks used in image-recognition software.

When you feed a photo of yourself into an image-recognition application, you might be surprised at the labels that are applied to you. For example, an image of Paglen (a white man with a shaved head) was labeled as “Klansman, Ku Kluxer.”

Paglen built a web app called ImageNet Roulette so that anyone could upload a photo of themselves or a friend and see what labels were applied. (The app is no longer online.) It became clear that perfectly innocuous people in photos were being labeled as criminals or dangerous, or with racist or sexist terms.

About 952,000 of ImageNet’s 14 million images were in the person category as of 2010 (source). Many of those images — with their labels — were removed after the opening of Crawford and Paglen’s art exhibition, Training Humans, in Milan in September 2019.

ImageNet has been used to train countless image-recognition systems since 2010.

Additional information:

Leading online database to remove 600,000 images after art project reveals its racist bias (September 2019), The Art Newspaper.

AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.