What is a neural network and how does it work?

The most wonderful thing about YouTube is you can use it to learn just about anything.

One of the 10,000 annoying things about YouTube is that finding a good, satisfying version of the lesson you want to learn can take hours of searching. This is especially true of videos about technical aspects of machine learning. Of course there are one- and two-hour recordings of course lectures by computer science professors. But I’ve been seeking out shorter videos with more animations and illustrations of concepts.

Understanding what a neural network is and how it processes data is necessary for demystifying machine learning. Data goes in and results come out, but in between is a “black box” consisting of code and hardware. It sort of works like a human brain, and yet, it really doesn’t.

So here at last is a painless, math-free video that walks us through a neural network. The particular example shown uses the MNIST dataset, which consists of 70,000 images of handwritten digits, 0–9. So the task being performed is the recognition of those digits. (This kind of system can be used to sort mail using postal codes, for example.)

What you’ll see is how the first layer (a vertical line of circles on the left side) represents the input. If each of the MNIST images is 28 pixels wide by 28 pixels high, then that first layer has to represent 784 pixels, each with its own brightness value, which is a number. (One image is the input, and only one at a time.)
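To make the input layer concrete, here is a minimal Python sketch. The image is a made-up, mostly black grid standing in for one MNIST digit, and both the 0 to 255 storage range and the scaling to values between 0 and 1 are assumptions made for the illustration.

```python
# A stand-in for one 28 x 28 grayscale image (hypothetical data, not a real MNIST digit).
WIDTH, HEIGHT = 28, 28
image = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
image[14][14] = 255  # one bright pixel in the middle

def to_input_layer(pixels):
    """Flatten rows of 0-255 values into one list of values between 0.0 and 1.0."""
    return [value / 255 for row in pixels for value in row]

input_layer = to_input_layer(image)
print(len(input_layer))  # 784
```

Each of the 784 numbers becomes the value of one unit in the first layer.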

The final vertical layer, all the way to the right side, is the output of the neural network. In this example, the output tells us which digit was in the input: 0, 1, 2, etc. To see the value in this, go back to the mail-sorting idea. If a system can read postal codes, it recognizes several numbers and then transmits them to another system that “knows” which postal code goes to which geographical location. My letter gets sorted into the Florida bin and yours into the bin for your home.
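Reading the answer off the output layer can be sketched in a few lines of Python. The ten values here are invented for illustration, and the sketch assumes the output layer has one unit per digit and that the unit with the highest value is the network's answer.

```python
# Hypothetical values for the ten output units, one per digit 0-9.
output_layer = [0.02, 0.01, 0.05, 0.91, 0.03, 0.10, 0.01, 0.04, 0.12, 0.02]

def predicted_digit(activations):
    """Return the index of the largest value, which is the digit the network chose."""
    return max(range(len(activations)), key=lambda i: activations[i])

print(predicted_digit(output_layer))  # 3
```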

In between the input and the output are the vertical “hidden” layers, and that’s where the real work gets done. In the video you’ll see that the number of circles (often called neurons, but they can also be called just units) in a hidden layer might well be smaller than the number of units in the input layer. The number of units in the output layer can also differ from the numbers in other layers.

When the video describes edge detection, you might recall an earlier post here.

Beautifully, during an animation, our teacher Grant Sanderson explains and shows that the weights exist not in or on the units (the “neurons”) but in fact in or on the connections between the units.

Okay, I lied a little. There is some math shown here. The weight assigned to the connection is multiplied by the value of the unit to the left. The results are all summed, for all left-side units, and that sum is assigned to the unit to the right (meaning the right side of that one connection).

The video bogs down just a bit between the Sigmoid squishification function and applying the bias, but all you really need to grasp is that the value of the right-side unit shows whether or not that little region of the image (in this case, it’s an image) has a significant difference. The math is there to determine if the color, the amount of color, is significant enough to count. And how much it should count.
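For anyone who would rather read code than an equation, the calculation for a single right-side unit can be sketched in Python as follows. The three input values, the weights, and the bias are made-up numbers, and the function names are only illustrative.

```python
import math

def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1 / (1 + math.exp(-x))

def unit_value(left_values, weights, bias):
    """Value of one right-side unit: sigmoid(weighted sum of left-side units + bias)."""
    weighted_sum = sum(w * v for w, v in zip(weights, left_values))
    return sigmoid(weighted_sum + bias)

# Made-up numbers: three left-side units feeding one right-side unit.
left_values = [0.0, 0.5, 1.0]
weights = [2.0, -1.0, 3.0]
bias = -2.5

# The weighted sum is 2.5, the bias cancels it out, and sigmoid(0) is 0.5.
print(unit_value(left_values, weights, bias))  # 0.5
```

The bias acts as a threshold: here the weighted sum has to exceed 2.5 before the unit's value climbs above the halfway mark.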

I know — math, right?

But seriously, watch the video. It’s excellent.

“And that’s a lot to think about! With this hidden layer of 16 neurons, that’s a total of 784 times 16 weights, along with 16 biases. And all of that is just the connections from the first layer to the second.”

—Grant Sanderson, But what is a neural network? (video)
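The arithmetic in that quote is easy to check. This small Python sketch counts one weight per connection and one bias per right-side unit; the function name is hypothetical.

```python
def layer_parameters(units_in, units_out):
    """Count the weights (one per connection) and biases (one per right-side unit)."""
    weights = units_in * units_out
    biases = units_out
    return weights, biases

weights, biases = layer_parameters(784, 16)
print(weights, biases, weights + biases)  # 12544 16 12560
```

That is 12,560 adjustable numbers before the second set of connections is even considered.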

Sanderson doesn’t burden us with the details of the additional layers. Once you’ve seen the animations for that first step — from the input layer through the connections to the first hidden layer — you’ll have a real appreciation for what’s happening under the hood in a neural network.

In the final 6 minutes of this 19-minute video, you’ll also learn how the “learning” takes place in machine learning when a neural net is involved. All those weights and bias values? They are not determined by humans.

“Digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions.”

—Grant Sanderson, But what is a neural network? (video)

I confess it does get rather mathy at the end, but hang on through the parts that are beyond your personal math background and listen to what Sanderson is telling us. You can get a lot out of it even if the equation itself is like hieroglyphics to you.

The video content ends at 16:26, followed by the usual “subscribe to my channel” message. More info about Sanderson and his excellent videos is on his website, 3Blue1Brown.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.


Sorting out a degree in artificial intelligence

Reading course descriptions and degree plans has helped me understand more about the fields of artificial intelligence and data science. I think some universities have whipped up a program in one of these hot fields of study just to put something on the books. It’s quite unfair to students if this is just a collection of existing courses and not a deliberate, well-structured path to learning.

I came across this page from Northeastern University that attempts to explain the “difference” between artificial intelligence and machine learning. (I use those quotation marks because machine learning is a subset of artificial intelligence.) The university has two different master’s degree programs for artificial intelligence; neither one has “machine learning” in its name — but read on!

Illustration by chenspec at Pixabay

One of the two programs does not require a computer science undergraduate degree. It covers data science, robotics, and machine learning.

The other master’s program is for students who do have a background in computer science. It covers “robotic science and systems, natural language processing, machine learning, and special topics in artificial intelligence.”

I noticed that data science is in the program for those without a computer science background, while it’s not mentioned in the other program. This makes sense if we understand that data science and machine learning really go hand in hand nowadays. A data scientist likely will not develop any new machine learning systems, but she will almost certainly use machine learning to solve some problems. Training in statistics is necessary so that one can select the best machine learning algorithm for solving a particular problem.

Graduates of the other program, with their prior experience in computer science, should be ready to break ground with new and original AI work. They are not going to analyze data for firms and organizations. Instead, they are going to develop new systems that handle data in new ways.

The distinction between these two degree programs highlights a point that perhaps a lot of people don’t yet understand: some people (like journalists who have code experience) train models, writing code to control existing machine learning systems, and yet they are not the people who create new machine learning systems.

Separately there are developers who create new AI software systems, and engineers who create new AI hardware systems. In other words, there are many different roles in the AI field.

Finally, there are so-called AI systems sold to banks and insurance companies, and many other types of firms, for which the people using the system do not write code at all. Using them requires data to be entered, and results are generated (such as whose insurance rates will go up next year). The workers who use these systems don’t write code any more than an accountant writes code. Moreover, they can’t explain how the system works — they need only know what goes in and what comes out.


Comment moderation as a machine learning case study

Continuing my summary of the lessons in Introduction to Machine Learning from the Google News Initiative, today I’m looking at Lesson 5 of 8, “Training your Machine Learning model.” Previous lessons were covered here and here.

Now we get into the real “how it works” details — but still without looking at any code or computer languages.

The “lesson” (actually just a text) covers a common case for news organizations: comment moderation. If you permit people to comment on articles on your site, machine learning can be used to identify offensive comments and flag them so that human editors can review them.

With supervised learning (one of three approaches included in machine learning; see previous post here), you need labeled data. In this case, that means complete comments — real ones — that have already been labeled by humans as offensive or not. You need an equally large number of both kinds of comments. Creating this dataset of comments is discussed more fully in the lesson.

You will also need to choose a machine learning algorithm. Comments are text, obviously, so you’ll select among the existing algorithms that process language (rather than those that handle images and video). There are many from which to choose. As the lesson comes from Google, it suggests you use a Google algorithm.

In all AI courses and training modules I’ve looked at, this step is boiled down to “Here, we’ll use this one,” without providing a comparison of the options available. This is something I would expect an experienced ML practitioner to be able to explain — why are they using X algorithm instead of Y algorithm for this particular job? Certainly there are reasons why one text-analysis algorithm might be better for analyzing comments on news articles than another one.

What is the algorithm doing? It is creating and refining a model. The more accurate the final model is, the better it will be at predicting whether a comment is offensive. Note that the model doesn’t actually know anything. It is a computer’s representation of a “world” of comments in which some — with particular features or attributes perceived in the training data — are rated as offensive, and others — which lack a sufficient quantity of those features or attributes — are rated as not likely to be offensive.
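To show what “creating a model” from labeled comments can look like, here is a toy sketch in Python. It uses a bare-bones naive Bayes word-count classifier, which is a swapped-in technique chosen for brevity and not the algorithm the lesson recommends. The four training comments are invented, and a real dataset would need thousands of real, human-labeled comments.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical): 1 = offensive, 0 = not offensive.
labeled_comments = [
    ("you are an idiot", 1),
    ("what a stupid idiot", 1),
    ("thanks for the great article", 0),
    ("great reporting thanks", 0),
]

def train(examples):
    """Build the model: word counts per label, plus how many comments carry each label."""
    word_counts = {0: Counter(), 1: Counter()}
    label_counts = Counter()
    for text, label in examples:
        word_counts[label].update(text.lower().split())
        label_counts[label] += 1
    vocabulary = set(word_counts[0]) | set(word_counts[1])
    return word_counts, label_counts, vocabulary

def is_offensive(model, text):
    """Score the comment under each label and report whether 'offensive' scores higher."""
    word_counts, label_counts, vocabulary = model
    scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return scores[1] > scores[0]

model = train(labeled_comments)
print(is_offensive(model, "what an idiot"))         # True
print(is_offensive(model, "great article thanks"))  # False
```

The model here is nothing more than tables of word counts. It rates a comment as offensive when its words look more like the words in the offensive examples, which is the sense in which it “doesn’t actually know anything.”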

The lesson goes on to discuss false positives and false negatives, which are possibly unavoidable — but the fewer, the better. We especially want to eliminate false negatives, which are offensive comments not flagged by the system.
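Counting false positives and false negatives is simple bookkeeping, sketched here in Python. The two lists of labels are made up for the example: one holds what human reviewers decided, the other what a hypothetical model flagged.

```python
def count_errors(actual, predicted):
    """Tally the four possible outcomes; True means 'offensive'."""
    counts = {"true_positive": 0, "false_positive": 0, "true_negative": 0, "false_negative": 0}
    for truth, guess in zip(actual, predicted):
        if guess and truth:
            counts["true_positive"] += 1
        elif guess and not truth:
            counts["false_positive"] += 1
        elif not guess and truth:
            counts["false_negative"] += 1
        else:
            counts["true_negative"] += 1
    return counts

# Made-up labels for six comments: human judgment vs. what the model flagged.
actual    = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]

results = count_errors(actual, predicted)
print(results)
```

In this made-up run there is one false negative (an offensive comment that slipped through) and one false positive (a harmless comment that was flagged).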

“The most common reason for bias creeping in is when your training data isn’t truly representative of the population that your model is making predictions on.”

—Lesson 6, Bias in Machine Learning

Lesson 6 in the course covers bias in machine learning. A quick way to understand how ML systems come to be biased is to consider the comment-moderation example above. What if the labeled data (real comments) included a lot of comments offensive to women — but all of the labels were created by a team of men, with no women on the team? Surely the men would miss some offensive comments that women team members would have caught. The training data are flawed because a significant number of comments are labeled incorrectly.

There’s a pretty good video attached to this lesson. It’s only 2.5 minutes, and it illustrates interaction bias, latent bias, and selection bias.

Lesson 6 also includes a list of questions you should ask to help you recognize potential bias in your dataset.

It was interesting to me that the lesson omits a discussion of how the accuracy of labels is really just as important as having representative data for training and testing in supervised learning. This issue is covered in ImageNet and labels for data, an earlier post here.


Interrogating the size of AI algorithms

I have watched so many videos in my journey to understand how artificial intelligence and machine learning work, and one of my favorite YouTube channels belongs to Jordan Harrod. She’s a Ph.D. student working on neuroengineering, brain-machine interfaces, and machine learning.

I began learning about convolutional neural networks in my reading about AI. Like most people (?), I had a vague idea of a neural network being modeled after a human brain, with parallel processors wired together like human synapses. When you read about neural nets in AI, though, you are not reading about processors, computer chips, or hardware. Instead, you read about layers and weights. (Among other things.)

A deep neural network has multiple layers. That’s what makes it “deep.” You’ll see these layers in a simple diagram in the 4-minute video below. A convolutional neural network has hidden layers. These are not hidden as in “secret”; they are called hidden because they are sandwiched in between the input layer and the output layer.

The weights are — as with all computer data — numeric. What happens in machine learning is that the weights associated with each node in a layer are adjusted, again and again, during the process of training the AI — with an end result that the neural network’s output is more accurate, or even highly accurate.
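The “again and again” adjustment can be sketched with a single weight in Python. This uses a simple form of gradient descent, a standard technique chosen for the illustration and not something the video is quoted as describing. The training examples, learning rate, and number of rounds are all made-up values.

```python
# Hypothetical training data: the "right answer" is always 2 times the input.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def train(weight, examples, learning_rate=0.05, rounds=50):
    """Nudge the weight, again and again, in the direction that shrinks the error."""
    for _ in range(rounds):
        for x, target in examples:
            error = weight * x - target          # how far off the current output is
            weight -= learning_rate * error * x  # small step that reduces the squared error
    return weight

weight = train(weight=0.0, examples=examples)
print(round(weight, 3))  # 2.0
```

The weight starts at 0, which gives badly wrong outputs, and ends very close to 2. A real neural network does the same kind of nudging for thousands or millions of weights at once.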

As Harrod points out, not all AI systems include a neural network. She says that “training a model will almost always produce a set of values that correspond or are analogous to weights in a neural network.” I need to think more about that.

Now, does Harrod definitively answer the question “How big is an AI algorithm?” Not really. But she provides a nice set of concepts to help us understand why there isn’t just one simple answer to this question. She offers a glimpse at the way AI works under the hood that might make you hungry to learn more.


Face detection without a deep neural network

I was surprised when I watched this video about how most face detection works. Granted, this is not face recognition (identifying the specific person). Face detection looks at an image or video and can almost instantly point out all the human faces. In a consumer camera, this is part of the code that puts a rectangle around each person’s face while you’re framing your shot.

What’s wonderful in the video is how the Viola–Jones object detection framework is illustrated and explained so that even we non-math types can understand it.

Like the game cases I wrote about yesterday, this is a case where tried-and-true algorithms are used, but deep neural networks are not.

As is typical with AI, there is a model. How does the code identify a human face? It “knows” some things about the shape and proportions of human faces. But it knows these attributes (features) not as noses and eyes and mouths — as we humans do. Instead, it knows them as rectangular shapes that map very well to the pixels in a digital image.

Above: Graphic from Viola and Jones (2001) — PDF

Make sure you stay with the video until 3:30, when Mike Pound begins to draw on paper. (This drawing-by-hand is a large part of why I love the videos from Computerphile!) At 8:30 he begins drawing a face to show how the algorithm analyzes that segment of an image.

The one part that might not be clear (depending on how much time you spend thinking about pixels in images) is that the numbers in the grid he draws represent values of lightness or darkness in the image. In all cases, computers require knowledge to be represented as numbers. When dealing with images, numbers represent differences. To compare sections of an image with other sections, the numeric values for one section are added up and compared with the sum of numeric values from another section.
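Here is a minimal Python sketch of that add-and-compare step. The 4 x 4 grid of lightness values is invented, and the function names are hypothetical. The real framework also uses a shortcut called an integral image to make the sums fast, which this sketch leaves out.

```python
def region_sum(grid, top, left, height, width):
    """Add up the lightness values inside one rectangular section of the image."""
    return sum(grid[r][c] for r in range(top, top + height) for c in range(left, left + width))

def two_rectangle_feature(grid, top, left, height, width):
    """Compare an upper rectangle with the equally sized rectangle directly below it."""
    upper = region_sum(grid, top, left, height, width)
    lower = region_sum(grid, top + height, left, height, width)
    return lower - upper

# Made-up 4 x 4 patch: a dark band (low numbers) above a light band (high numbers),
# loosely like dark eyes above lighter cheeks.
patch = [
    [10, 12, 11, 10],
    [9, 11, 10, 12],
    [200, 210, 205, 198],
    [202, 208, 199, 201],
]

print(two_rectangle_feature(patch, top=0, left=0, height=2, width=4))  # 1538
```

A large result means the lower rectangle is much lighter than the upper one, which is the kind of pattern the detector is looking for.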

The animations in the final three minutes of the video provide an awesomely clear explanation of how the regions of the image are assessed and quickly discarded as “not a face” or retained for further examination.
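The discard-early idea can be sketched in Python like this. The two checks, their thresholds, and the feature names are all invented for illustration; the real framework learns its features and thresholds from training images.

```python
def looks_like_face(region, stages):
    """Run the checks in order and stop at the first one the region fails."""
    for check in stages:
        if not check(region):
            return False  # discarded as "not a face"; later checks never run
    return True  # survived every stage, so keep it for further examination

# Hypothetical stages, cheapest first. Each region is a dict of precomputed feature values.
stages = [
    lambda region: region["eyes_darker_than_cheeks"] > 50,
    lambda region: region["nose_bridge_lighter_than_eyes"] > 30,
]

regions = [
    {"eyes_darker_than_cheeks": 5, "nose_bridge_lighter_than_eyes": 80},
    {"eyes_darker_than_cheeks": 120, "nose_bridge_lighter_than_eyes": 10},
    {"eyes_darker_than_cheeks": 130, "nose_bridge_lighter_than_eyes": 60},
]

print([looks_like_face(region, stages) for region in regions])  # [False, False, True]
```

Because most regions of a photo fail the very first check, most of the work is skipped, and that is where the speed comes from.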

Computers are lightning-fast at these kinds of calculations. This method is so efficient, it runs rapidly even on simple hardware — which is why this method of face detection has been in use since 2002.


What is called ‘AI’ but really isn’t

Because “artificial intelligence” and “AI” have become such potent buzzwords in business — and so many firms are trying to sell some kind of “AI” system or software or strategy to every business possible — we should all take a step back and evaluate whether there is actual AI operating in some of these systems.

That won’t always be easy to discern. If a company claims there is “AI” in its product, they are not going to divulge exactly how it works. If they want to convince you, their literature or their engineers will likely throw out a tangled net of terms that, while accurate, might not help anyone but another engineer understand what’s inside the black box.

I was thinking about this recently as I worked on assignments for an online computer science course in AI. One of the early projects was to program a tic-tac-toe game in which a human can play against “an AI.” Just like most humans, the AI can force a tie in every tic-tac-toe game unless the human makes a mistake, and then the human will lose. I wrote the code that enables the AI to play — that was the assignment. But I didn’t invent the code from nothing. I was taught in the course to use an algorithm called minimax. Further, I was encouraged to make my program faster by using another algorithm called alpha-beta pruning.

Illustration of alpha-beta pruning (Wikipedia, by Jez9999, GNU license)

There is no machine learning involved in those two algorithms. They are simply time-tested ways for a computer program to carry out a certain kind of look-ahead in a two-player game (not only tic-tac-toe).

Don’t despair or tune out — look at the diagram and understand that the computer, through instructions in my code, is able to rapidly advance through every possible outcome in tic-tac-toe and see how to: (a) prevent a win for the opponent, and (b) win if a win is possible.
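For the curious, here is a minimal Python sketch of minimax with alpha-beta pruning for tic-tac-toe. It is not the course's code or the code written for the assignment. The board representation (a list of nine cells, with None for an empty square) and the function names are assumptions made for this example.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return "X" or "O" if someone has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player, alpha=-2, beta=2):
    """Return (score, move) for the player to move: +1 if X wins, -1 if O wins, 0 for a tie."""
    won = winner(board)
    if won:
        return (1 if won == "X" else -1), None
    empty = [i for i, cell in enumerate(board) if cell is None]
    if not empty:
        return 0, None
    best_move = None
    best_score = -2 if player == "X" else 2
    for move in empty:
        board[move] = player  # try the move
        score, _ = minimax(board, "O" if player == "X" else "X", alpha, beta)
        board[move] = None    # undo it
        if player == "X" and score > best_score:
            best_score, best_move = score, move
            alpha = max(alpha, score)
        elif player == "O" and score < best_score:
            best_score, best_move = score, move
            beta = min(beta, score)
        if alpha >= beta:
            break  # alpha-beta pruning: the opponent would never allow this branch
    return best_score, best_move

# X to move with two in a row: the search finds the winning square (index 2).
board = ["X", "X", None,
         "O", "O", None,
         None, None, None]
print(minimax(board, "X"))  # (1, 2)
```

Run on an empty board, the same function returns a score of 0, which is the forced tie described above.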

There is no magic here.

Tic-tac-toe with “AI” playing X, human playing O.

Another assignment in the same course has the students programming “an AI” that plays Minesweeper. This game is quite different from tic-tac-toe in that there is only one player, and there is hidden knowledge: The player doesn’t know where the mines are. One move at a time, the player builds knowledge about the game board.

Completed Minesweeper game, with AI playing all moves.

A human player doesn’t click on a mine, because she chooses squares that are next to a 0 (indicating no mines touch that square) and marks a mine square when it becomes obvious that a mine is hidden there.

The “AI” builds knowledge in a way that it is programmed to do (that is the assignment). In this case, there is no pre-existing algorithm, but there are principles of logic. I programmed “knowledge” that was stored in the program each time the AI clicked a square and a number was revealed. The knowledge is: (a) that number, and (b) the coordinates of all the surrounding squares. Thus the AI “knows” that, for example, among eight specified squares there are two mines.

If among eight specified squares there are zero mines, my code tells the AI to mark all eight of those squares as safe. My code also tells the AI that if there are any safe moves left to be made, then make a safe move. If not, make a random move. That is the only time when the AI can possibly set off a mine.
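The logic in the last two paragraphs can be sketched in Python roughly as follows. This is a simplified illustration, not the code written for the assignment: the class and method names are hypothetical, and only the zero-mines rule and the safe-move-or-random-move rule are implemented.

```python
import random

class MinesweeperAI:
    """Keeps 'knowledge' as (set of surrounding squares, number of mines among them)."""

    def __init__(self, height, width):
        self.all_squares = {(r, c) for r in range(height) for c in range(width)}
        self.moves_made = set()
        self.safes = set()
        self.knowledge = []

    def neighbors(self, square):
        """Coordinates of every square touching this one (up to eight)."""
        r, c = square
        return {(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0) and (r + dr, c + dc) in self.all_squares}

    def add_knowledge(self, square, count):
        """Record a revealed number; if it is 0, every surrounding square is safe."""
        self.moves_made.add(square)
        self.safes.add(square)
        around = self.neighbors(square)
        self.knowledge.append((around, count))
        if count == 0:
            self.safes |= around

    def next_move(self):
        """Make a safe move if one is left; otherwise fall back to a random move."""
        safe_choices = self.safes - self.moves_made
        if safe_choices:
            return min(safe_choices)  # any safe square would do; min() keeps the example repeatable
        remaining = self.all_squares - self.moves_made
        return random.choice(sorted(remaining)) if remaining else None

ai = MinesweeperAI(height=3, width=3)
ai.add_knowledge((0, 0), 0)  # the corner square revealed a 0
print(ai.next_move())        # (0, 1)
```

The random fallback in next_move is the only place where this player can hit a mine.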

Once again, there is no magic here.

In contrast to these two simple examples of a computer successfully playing a game, AlphaGo (which I wrote about previously) uses real AI and could not have beaten a human Go master otherwise. Some games can’t be won with only simple algorithms or logic. To win, the program needs something akin to intuition.

Programming a computer to develop and use an approximation of human intuition is what we have in today’s machine learning with deep neural networks. It’s still not magic, but it’s a lot more complicated than the kind of strictly mapped-out processes I wrote for playing tic-tac-toe or Minesweeper.
