Robots, and what’s not AI

Think of a robot. Do you picture a human-looking construct? Does it have a human-like face? Does it have two legs and two arms? Does it have a head? Does it walk?

It’s easy to assume that a robot that walks across a room and picks something up has AI operating inside it. What’s often obscured in viral videos is how much a human controller is directing the actions of the robot.

I am a gigantic fan of the Spot videos from Boston Dynamics. Spot is not the only robot the company makes, but for me it is the most interesting. The video above is only 2 minutes long, and if you’ve never seen Spot in action, it will blow your mind.

But how much “intelligence” is built into Spot?

The answer lies in between “very little” and “Spot is fully autonomous.” To be clear, Spot is not autonomous. You can’t just take him out of the box, turn him on, and say, “Spot, fetch that red object over there.” (I’m not sure Spot can be trained to respond to voice commands at all. But maybe?) Voice commands aside, though, Spot can be programmed to perform certain tasks in certain ways and to walk from one given location to another.

This need for additional programming doesn’t mean that Spot lacks AI, and I think Spot provides a nice opportunity to think about rule-based programming and the more flexible reinforcement-learning type of AI.

This 20-minute video from Adam Savage (of MythBusters fame) gives us a look behind the scenes that clarifies how much of what we see in a video about a robot is caused by a human operator with a joystick in hand. If you pay attention, though, you’ll hear Savage point out what Spot can do that is outside the human’s commands.

Two points in particular stand out for me. The first is that when Spot falls over, or is upside-down, he “knows” how to make himself get right-side-up again. The human doesn’t need to tell Spot he’s upside-down. Spot’s programming recognizes his inoperable position and corrects it. Watching him move his four slender legs to do so, I feel slightly creeped out. I’m also awed by it.

Given the many incorrect positions in which Spot might land, there’s no way to program this get-right-side-up procedure using set, spelled-out rules. Spot must be able to use estimations in this process — just like AlphaGo did when playing a human Go master.

The second point, which Savage demonstrates explicitly, is accounting for non-standard terrain. One of the practical uses for a robot would be to send it somewhere a human cannot safely go, such as inside a bombed-out building — which would require the robot to walk over heaps of rubble and avoid craters. The human operator doesn’t need to tell Spot anything about craters or obstacles. The instruction is “Go to this location,” and Spot’s AI figures out how to go up or down stairs or place its feet between or on uneven surfaces.

The final idea to think about here is how the training of a robot’s AI takes place. Reinforcement learning requires many, many iterations, or attempts. Possibly millions. Possibly more than that. It would take lifetimes to run through all those training episodes with an actual, physical robot.

So, simulations. Here again we see how super-fast computer hardware, with multiple processes running in parallel, must exist for this work to be done. Before Spot — the actual robot — could be tested, he existed as a virtual system inside a machine, learning over nearly endless iterations how not to fall down — and when he did fall, how to stand back up.
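To make the idea of learning by repetition in simulation a little more concrete, here is a minimal sketch in Python. It is not how Boston Dynamics trains Spot (I have no inside knowledge of that). It is a toy version of one reinforcement-learning method, tabular Q-learning, in which a simulated agent on a six-position line learns by trial and error to walk to the goal at the far end. The world, the reward, and every name and parameter below are my own assumptions for illustration.

```python
import random

N_STATES = 6        # hypothetical world: positions 0..5 on a line; 5 is the goal
ACTIONS = (-1, +1)  # step left or step right


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run many simulated episodes and return the learned table of action values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    goal = N_STATES - 1
    for _ in range(episodes):
        state = 0
        for _step in range(100):  # give up on an episode after 100 moves
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore: try a random move
            else:
                best = max(q[state])       # exploit: best known move, ties broken randomly
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            nxt = min(max(state + ACTIONS[action], 0), goal)
            reward = 1.0 if nxt == goal else 0.0
            future = 0.0 if nxt == goal else max(q[nxt])
            # Nudge the estimate toward "reward now plus discounted value later".
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = nxt
            if state == goal:
                break
    return q


def greedy_policy(q):
    """For each non-goal position, return the move with the highest learned value."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]


q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Nobody tells the agent that "right" is the correct answer. It only sees a reward when it stumbles onto the goal, and the preference for stepping right emerges over repeated episodes. Even this tiny world needs hundreds of simulated attempts, which hints at why a physical robot's training has to start inside a computer.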

See more robot videos on Boston Dynamics’ YouTube channel.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.


Racial and gender bias in AI

Different AI systems do different things when they attempt to identify humans. Everyone has heard about face recognition (a.k.a. facial recognition), which you might expect would return a name and other personal data about a person whose face is “seen” with a camera.

No, not always.

A system that analyzes human faces might simply try to return information about the person that you or I would tag in our minds when we see a stranger. The person’s gender, for example. That’s relatively easy to do most of the time for most humans — but it turns out to be tricky for machines.

Machines often get it wrong when trying to identify the gender of a trans person. But machines also misidentify the gender of people of color. In particular, they have a big problem recognizing Black women as women.

A short and good article about this ran in Time magazine in 2019, and the accompanying video is well worth watching. It shows various face recognition software systems at work.

Another serious problem concerns differentiating among people of Asian descent. When apartment buildings and other housing developments have installed face recognition as a security system — to open for residents and stay locked for others — the Asian residents can find themselves locked out of their own home. The doors can also open for Asian people who don’t live there.

You can find a lot of articles about this widespread and very serious problem with AI technology, including the deservedly famous mug shots test by the American Civil Liberties Union.

“While it is usually incorrect to make statements across algorithms, we found empirical evidence for the existence of demographic differentials in the majority of the face recognition algorithms we studied.”

—Patrick Grother, NIST computer scientist

So how does this happen? How do companies with almost infinite resources deploy products that are so seriously — and even dangerously — flawed?

Yesterday I wrote a little about training data for object-detection AI. To identify any image, or any part of an image, an AI system is usually trained on an immense set of images. If you want to identify human faces, you feed the system hundreds of thousands, or even millions, of pictures of human faces. If you’re using supervised learning to train the system, the images are labeled: Man, woman. Black, white. Old, young. Convicted criminal. Sex offender. Psychopath.

Who is in the images? How are those images labeled?

This is part of how the whole thing goes sideways. There’s more to it, though. Before a system is marketed, or released to the public, its developers are going to test it. They’re going to test the hell out of it. This can be compared with when an AI is developed that plays a particular game, like Go, or chess. After the system has been trained, you test it. To test the system, you’re going to have it play, and see if it can win — consistently. So when developers create a face recognition system, and they’ve tested it extensively, and they say, great, now it’s ready for the public, it’s ready for commercial use — ask yourself how they missed these glaring flaws.

Ask yourself how they missed the fact that the system can’t differentiate between various Asian faces.

Ask yourself how they missed the fact that the system identifies Black women as men.
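One way a flaw like this can slip through testing is when results are reported as a single overall accuracy number. I don't know how any particular vendor tested its system, so what follows is only a sketch in Python of the general idea: the same test results, scored once overall and once per group. The records are invented by me for illustration and are not real measurements, and the group names and function names are hypothetical.

```python
from collections import defaultdict

# Invented test records: (group, true label, label the system predicted).
# Note that group_b is also badly under-represented in the test set.
RESULTS = [
    ("group_a", "woman", "woman"),
    ("group_a", "man", "man"),
    ("group_a", "woman", "woman"),
    ("group_a", "man", "man"),
    ("group_a", "woman", "woman"),
    ("group_a", "man", "man"),
    ("group_a", "woman", "woman"),
    ("group_a", "man", "man"),
    ("group_b", "woman", "man"),    # a misidentification
    ("group_b", "woman", "woman"),
]


def overall_accuracy(results):
    """Fraction of all records where the prediction matches the true label."""
    return sum(truth == predicted for _, truth, predicted in results) / len(results)


def accuracy_by_group(results):
    """Fraction of correct predictions, computed separately for each group."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, truth, predicted in results:
        totals[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


print(overall_accuracy(RESULTS))
print(accuracy_by_group(RESULTS))
```

In this made-up example the overall score looks respectable, while the per-group breakdown shows the system is wrong half the time for the smaller group. A test that never breaks results down this way would not surface the problem.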

Fortunately, in just the past year these flaws have received so much attention that a number of large firms (Amazon, IBM, Microsoft) have pulled back on commercial deployments of face recognition technologies. Whether they will be able to build more trustworthy systems remains to be seen.



Ask a computer to draw what it sees

If a computer can correctly identify an object (an apple, a tricycle) or an animal such as a zebra, can it produce a drawing of that object or animal? This is something most people can do, even if their drawing skills are minimal. After all, almost anyone can play Pictionary.

This 8-minute video shows us what happened when a programmer-artist reversed the process of an AI that recognizes objects and animals in digital images. I really admire the deft storytelling here.

Object recognition has improved amazingly in the past 10 years, but that does not mean these AI systems see the same way as a human does. In some cases, that might not matter at all. In other cases, it can mean the difference between life and death.

In yesterday’s post I mentioned the way a convolutional neural network (part of a machine learning system) processes an image through many stacked layers of detection units (sometimes called neurons), identifying edges and shapes that eventually lead to a conclusion that the image is likely to contain such-and-such an object, animal, or person. Today’s video shows a bit more about the training process that an AI goes through before it can perform these identifications.
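For readers who want to see what "identifying edges" can mean in practice, here is a minimal sketch in Python of the sliding-window operation that a convolutional layer is built on (strictly speaking it is cross-correlation, which deep-learning libraries call convolution). The tiny image and the hand-written filter are my own assumptions. A real network learns the numbers in its filters during training and stacks many such layers.

```python
# A tiny grayscale image, invented for illustration: dark on the left, bright on the right.
IMAGE = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

# A hand-written 3x3 filter that responds where brightness rises from left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]


def convolve(image, kernel):
    """Slide the kernel over the image and sum the element-wise products at each position."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


feature_map = convolve(IMAGE, VERTICAL_EDGE)
for row in feature_map:
    print(row)
```

The output is large only where the window straddles the boundary between dark and bright, and zero where the image is uniform. That grid of responses is what gets passed along to the next layer.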

Training is necessary in the type of machine learning called supervised learning. The training data (in this case, digital images of objects and animals) must be labeled in advance. That is, the system receives thousands of images labeled “tiger” before it is able to recognize a tiger in a random photo or video. If a system can identify 20 different animals, that system was trained on thousands of images of each animal.

If the system was never trained on tigers, it cannot recognize a tiger.
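Here is a minimal sketch in Python of why that is so. It uses a very simple classifier (nearest centroid), not a neural network, and the two-number "features" for each animal are invented by me for illustration. The point it shares with the real thing is that a supervised classifier can only answer with one of the labels it was trained on.

```python
import math

# Invented training data: each example is (stripiness, body size), both on a 0-1 scale.
TRAINING = {
    "zebra": [(0.9, 0.6), (0.8, 0.7)],
    "horse": [(0.1, 0.7), (0.2, 0.6)],
}


def centroids(training):
    """Average the labeled examples for each label into one representative point."""
    return {
        label: tuple(sum(column) / len(column) for column in zip(*examples))
        for label, examples in training.items()
    }


def classify(features, cents):
    """Return the training label whose centroid is closest to the given features."""
    return min(cents, key=lambda label: math.dist(features, cents[label]))


cents = centroids(TRAINING)
tiger_like = (0.9, 0.5)  # stripy and fairly large, but not in the training set
print(classify(tiger_like, cents))
```

Shown something tiger-like, this system does not say "I don't know." It picks the closest label it has, which here is the other stripy animal.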

So today’s video gives us a nice glimpse into how and why that training works, and what its limitations are. What’s really fascinating to me, though, are the images produced by programmer-artist Tom White’s system.

“I have created a drawing system that allows neural networks to produce abstract ink prints that reveal their visual concepts. Surprisingly, these prints are recognized not only by the neural networks that created them, but also universally across most AI systems which have been trained to recognize the same objects.”

—Tom White

In the video, you’ll see that humans cannot recognize what the AI drew. The rendering is too abstract, too unlike what we see and what we would draw ourselves. Note what White says, though, about other AI systems: they can recognize the object in these AI-produced drawings.

This is, I think, related to what is called adversarial AI, which I’ll discuss in a future post.



How machines ‘see’

I am fascinated by image recognition. I read about how ImageNet changed the whole universe of machine “vision” in 2009 in the excellent book Artificial Intelligence: A Guide for Thinking Humans, but I’m not going to discuss ImageNet in this post. (I will get to it eventually.)

To think about how a machine sees requires us first to think about human eyes vs. cameras. The machine doesn’t have a biological eyeball and an optic nerve and a brain. The machine might have one or more cameras to allow it to take in visual information.

Whether the machine has cameras or not, the images it receives are the same: digital images, made up entirely of pixels. This is true even if the visual inputs are video. The machine will need to sample that video, taking discrete frames from it to process and analyze.

So the first thing to absorb, as you begin to understand how a machine sees, is that it receives a grid of pixels. If it’s video, then there are a lot of separate grids. If it’s one still image, there is one grid. And how does the machine process that grid? It analyzes the differences between groups of pixels.
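A minimal sketch in Python may help make "a grid of pixels" concrete. The brightness values are invented, the function names are hypothetical, and comparing each pixel with its right-hand neighbor is the simplest stand-in I could think of for "analyzing the differences between groups of pixels." Real systems do something far more elaborate.

```python
# A tiny grayscale "image": each number is one pixel's brightness (0-255).
# These values are invented for illustration.
IMAGE = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]


def horizontal_differences(grid):
    """Return the absolute brightness change between each pixel and its right-hand neighbor."""
    return [
        [abs(row[i + 1] - row[i]) for i in range(len(row) - 1)]
        for row in grid
    ]


def sample_frames(frames, step):
    """Keep every step-th frame of a video, where a video is just a list of grids."""
    return frames[::step]


differences = horizontal_differences(IMAGE)
video = [IMAGE] * 6               # six identical frames standing in for a video clip
sampled = sample_frames(video, 3)  # keeps frames 0 and 3

print(differences)
print(len(sampled))
```

The only large number in the output sits where the dark pixels meet the bright ones. To the machine, that jump in the numbers is all an "edge" is.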

This 4-minute video, from an artist and programmer named Gene Kogan, helped me a lot.

Most people have an idea (possibly vague) of how the human brain works, with neurons kind of “wired together” in a network. When we imagine a computer neural network, most of us probably factor in that mental image of a brain full of neurons. This is both semi-accurate and wildly inaccurate.

In his video, Kogan points out that an image-recognition system uses a convolutional neural network, and this network has many, many layers.

When he’s clicking down the list in his video, Kogan is showing us what the different layers are “paying attention to” as the video is continuously chopped into one-frame segments. The mind-blowing thing (to me) is that the layers feed forward and backward to each other — ultimately producing the result he shows near the end, when he can hold a water bottle in front of his webcam, and the software says it sees a water bottle.

Above: Screenshot from 3:10 in the video, showing a man holding a water bottle and the neural net’s evaluation of the video image

Notice, too, that “water bottle” is the machine’s top guess at that moment. Its number 2 guess is “bow tie.” Its confidence in “water bottle” is not very high, as shown by the red bar to the left of the label. However, the machine’s confidence in “water bottle” is much higher than its confidence in any of the other things it determines it might be seeing in that frame.
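A ranked list of guesses like this is commonly produced by turning the network's raw scores into percentages with a function called softmax, then sorting. I don't know exactly what Kogan's software does internally, so treat the following Python as a sketch under that assumption. The scores are invented, and every label other than "water bottle" and "bow tie" is hypothetical.

```python
import math

# Invented raw scores for one video frame (higher means "looks more like this").
SCORES = {
    "water bottle": 2.0,
    "bow tie": 1.0,
    "microphone": 0.5,
    "lighter": 0.2,
    "pill bottle": 0.0,
}


def softmax(scores):
    """Convert raw scores into confidences that are all positive and sum to 1."""
    highest = max(scores.values())
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def top_guesses(scores, k=2):
    """Return the k labels with the highest confidence, best first."""
    confidences = softmax(scores)
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


guesses = top_guesses(SCORES)
for label, confidence in guesses:
    print(f"{label}: {confidence:.0%}")
```

Because the confidences must add up to 1 across every label the system knows, the top guess can be well short of certain and still be far ahead of everything else, which matches what the screenshot shows.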

After watching this video, I understood why super-fast graphics-processing hardware is so important to image recognition and machine vision.

In tomorrow’s post, I’m going to say a bit more about these ideas and share a completely different video that also helped me a lot in my attempt to understand how machines see.



AI programs that play games

One of the very best media items I’ve found is this feature-length documentary about the program that beat an international master at the game of Go in 2016. It’s excellent as a documentary film — well-paced, sparking curiosity, exciting in some parts, and never pedantic.

You don’t need to understand anything about the game (which is immensely popular in China, Japan, and Korea, but not widely played elsewhere). It’s explained visually so that you can appreciate what’s going on. The film is free to watch on YouTube.

As a resource for learning about AI (or, more specifically, about machine learning), the film excels at helping us understand the work of the team of humans that created and trained the AlphaGo program. We don’t see a lot of people sitting at computer keyboards, typing. Instead, we see people clustered around a screen, pointing, talking enthusiastically, or saying, “What happened there? Why did it do that?”

Probably my favorite moment in the film is after Lee Se-dol, the human Go master, has played a move that is so great, it was later referred to as “the God move.” The AlphaGo team begins analyzing the program’s responses in real time, watching the graphs of its probability calculations on a large screen in their command center. For all the talk of AI as a black box that makes decisions humans can’t comprehend, this scene demonstrates that AI can be made transparent and accountable.

There’s much, much more to love about this documentary. The director, Greg Kohs, had extraordinary access to the DeepMind team during the months leading up to the five-game match with Lee. In the end, Google financed a general-audience-friendly film. (Google acquired DeepMind in 2014.)

In an interview with CNET, Kohs said the film “had very modest beginnings.”

“A couple members of Google’s creative lab that I’d worked with before gave me a ring and said we’d have access behind the curtain with [DeepMind founder and CEO] Demis Hassabis and his team. So I jumped on board with the expectation we would just film what happens for archival purposes and then put it on a shelf on a hard drive and that would be the end of it.”

Greg Kohs

Another wonderful aspect of the film is its humanity. I’ve seen a fair number of “scare essays” that predict the end of everything as AI gains dominance over its creators — but here we hear a more nuanced and thought-provoking set of views and reactions.

First, there is Lee, possibly the best (human) Go player who has ever lived, in closeup, in the very moment of his realization that the machine has bested him. Then there are the other Go experts, who understand more than you or I what the machine has actually done. Finally, there are the team members of DeepMind, who built the machine. Of course they are happy, ecstatically happy — but they are humbled, and even awed, as well.

At the end of 2019, Lee Se-dol retired as a professional Go player, at age 36. He is the only human who has ever defeated AlphaGo in tournament play.



Getting thrown into machine learning

Early in 2018, I had several senior journalism students who wanted to learn about machine learning. I knew nothing about it, and they knew that, and we plowed forward together.

The three student teams chose these topics:

  • Sentiment analysis on subreddits for NBA teams
  • Analysis of county court documents naming our university
  • Analysis of tweets by one news organization for audience reactions and engagement

We quickly learned that knowing Python was a big plus. (Fortunately, we all knew Python.) Each of the teams found a different Python library to work with, and after a few weeks, projects were completed and demonstrated — although desired results were not achieved in all cases.

I crammed information mainly from two sources — a YouTube video series called Machine Learning Recipes with Josh Gordon, and something I’ve lost that explained in detail how a model was trained on the Iris Data Set. These provided a surprisingly solid foundation for beginning to understand how today’s machine learning projects are done.

Above: Histograms and features from the Iris Data Set

Since then, I’ve continued to read casually about AI and machine learning. As more and more articles have appeared in the general press and news reports about face recognition and self-driving cars (among other topics related to AI), it’s become clear to me that journalism students need to know more about these technologies — if for no other reason than to avoid being bamboozled by buzzword-spewing politicians or tech-company flacks.

Since May 2020, I’ve been collecting resources, reading and researching, with an intention to teach a course about AI for communications students in spring 2021. This new blog is going to help me organize and prioritize articles, posts, videos, and more.

If it helps other people get a handle on AI, so much the better!
