Explaining common misconceptions about AI

People sometimes say that an artificial intelligence system is a computer system that learns, or that learns on its own.

That is inaccurate. Machine learning is a subset of artificial intelligence, not the whole field. Machine learning systems are computer systems that learn from data. Other AI systems do not learn at all: they are wholly programmed by humans to follow explicit rules and do not generate any code or instructions on their own.
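To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical (the 30-degree rule, the temperature readings, the function names); it only contrasts a rule that a person writes with a cutoff that a program derives from labeled data.

```python
# Hypothetical illustration: two ways to decide whether a day counts as "hot".

def rule_based_is_hot(temp_c):
    """Explicit rule chosen by a human programmer; nothing is learned."""
    return temp_c >= 30


def learn_threshold(examples):
    """Derive a cutoff from labeled examples: the midpoint between the
    warmest day labeled 'not hot' and the coolest day labeled 'hot'."""
    hot = [temp for temp, label in examples if label]
    not_hot = [temp for temp, label in examples if not label]
    return (max(not_hot) + min(hot)) / 2


# Made-up training data: (temperature in Celsius, did people call it hot?)
labeled_days = [(18, False), (22, False), (25, False),
                (27, True), (31, True), (35, True)]

# The cutoff now comes from the data rather than from the programmer.
threshold = learn_threshold(labeled_days)


def learned_is_hot(temp_c):
    """Same kind of decision, but using the learned cutoff."""
    return temp_c >= threshold
```

With different labeled examples, the learned cutoff would change without anyone editing the code. The hand-written rule stays at 30 until a person rewrites it.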

The error probably arises from the fact that many of the exciting advances in AI since 2012 have involved some form of machine learning.

The recent successes of machine learning have much to do with neural networks, each of which is a system of algorithms that mimics, in some respects only, the way neurons work in the brains of humans and other animals. In other words, a neural network shares some features with a human brain but does not resemble it in all its complexity.

Advances in neural networks have been made possible not only by new algorithms (written by humans) but also by new computer hardware that did not exist in the earlier decades of AI development. The main advance concerns graphics processing units, commonly called GPUs. If you’ve noticed how computer games have evolved from simple flat pixel blocks (e.g. Pac-Man) to vast 3D worlds through which the player can fly or run at high speed, turning in different directions to view new landscapes, you can extrapolate how the advanced hardware has increased the speed of processing of graphical information by many orders of magnitude.

Without today’s GPUs, you can’t create a neural network that runs multiple algorithms in parallel fast enough to achieve the amazing things that AI systems have achieved. To be clear, the GPUs are just engines, powering the code that creates a neural network.

More about the role of GPUs in today’s AI: Computational Power and the Social Impact of Artificial Intelligence (2018), by Tim Hwang.

Another reason why AI has leapt onto the public stage recently is Big Data. Headlines alerted us to the existence and importance of Big Data a few years ago, and it’s tied to AI because how else could we process that ginormous quantity of data? If all we were doing with Big Data was adding sums, well, that’s no big deal. What businesses and governments and the military really want from Big Data, though, is insights. Predictions. They want to analyze very, very large datasets and discover information there that helps them control populations, make greater profits, manage assets, etc.

Big Data became available to businesses, governments, the military, etc., because so much that used to be stored on paper is now digital. As the general population embraced digital devices for everyday use (fitness, driving cars, entertainment, social media), we contributed even more data than we ever had before.

Very large language models (an aspect of AI that contributes to Google Translate, automatic subtitles on YouTube videos, and more) are made possible by very, very large collections of text that are necessary to train those models. Something I read recently that made an impression on me: For languages that do not have such extensive text corpuses, it can be difficult or even impossible to train an effective model. The availability of a sufficiently enormous amount of data is a prerequisite for creating much of the AI we hear and read about today.

If you ever wonder where all the data comes from — don’t forget that a lot of it comes from you and me, as we use our digital devices.

Perhaps the biggest misconception about AI is that machines will soon become as intelligent as humans, or even more intelligent than all of us. As a common feature in science fiction books and movies, the idea of a super-intelligent computer or robot holds a rock-solid place in our minds — but not in the real world. Not a single one of the AI systems that have achieved impressive results is actually intelligent in the way humans (even baby humans!) are intelligent.

The difference is that we learn from experience, and we are driven by curiosity and the satisfaction we get from experiencing new things — from not being bored. Every AI system is programmed to perform particular tasks on the data that is fed to it. No AI system can go and find new kinds of data. No AI system even has a desire to do so. If a system is given a new kind of data — say, we feed all of Wikipedia’s text to a face-recognition AI system — it has no capability to produce meaningful outputs from that new kind of input.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

Intro to Machine Learning course

A couple of days ago, I wrote about Kaggle’s free introductory Python course. Then I started the next free course in the series: Intro to Machine Learning. The course consists of seven modules; the final module, like the last module in the Python course, shows you how to enter a Kaggle competition using the skills from the course.

The first module, “How Models Work,” begins with a simple decision tree, which is nice because (I think) everyone can grasp how that works, and how you add complexity to the tree to get more accurate answers. The dataset is housing data from Melbourne, Australia; it includes the type of housing unit, the number of bedrooms, and most important, the selling price (and other data too). The data have already been cleaned.
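A small decision tree is really just a set of nested questions. The sketch below is a hypothetical two-level tree written as plain Python; the split points and dollar values are invented for illustration and are not taken from the course or from the Melbourne data.

```python
# Hypothetical decision tree for house prices, written out by hand.
# Each "if" is one split in the tree; each return value is a leaf.

def predict_price(bedrooms, lot_size_sqm):
    if bedrooms <= 2:
        # Left branch: smaller homes, split again on lot size.
        if lot_size_sqm <= 500:
            return 450_000
        return 600_000
    # Right branch: larger homes, split again on lot size.
    if lot_size_sqm <= 500:
        return 750_000
    return 1_000_000
```

Adding complexity to the tree means adding more levels of questions, so that each leaf covers a narrower group of houses. In machine learning, the questions and leaf values are worked out from the data instead of being typed in like this.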

In the second module, we load the Python Pandas library and the Melbourne CSV file. We call one basic statistics function that is built into Pandas — describe() — and get a quick explanation of the output: count, mean, std (standard deviation), min, max, and the three quartiles: 25%, 50% (median), 75%.
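To see what those summary numbers mean, here is a rough stand-in for describe() that uses only Python's built-in statistics module. It is a sketch for one column of made-up values, not how Pandas is implemented, and the describe_column name is hypothetical.

```python
import statistics


def describe_column(values):
    """Compute the same kinds of summary numbers for one list of values."""
    # Quartiles via linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Made-up column: number of rooms in five houses.
rooms = [1, 2, 3, 4, 5]
summary = describe_column(rooms)
```

Here the 50% value (the median) is 3: half of the houses have fewer rooms than that and half have more.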

When you do the exercise for the module, you can copy and paste the code from the lesson into the learner’s notebook.

The third module, “Your First Machine Learning Model,” introduces the Pandas columns attribute for the dataframe and shows us how to make a subset of column headings — thus excluding any data we don’t need to analyze. We use the dropna() method to eliminate rows that have missing data (this is not explained). Then we set the prediction target (y) — here it will be the Price column from the housing data. This should make sense to the learner, given the earlier illustration of the small decision tree.

y = df.Price

We use the previously created list of selected column headings (named features) to create X, the features of each house that will go into the decision tree model (such as the number of rooms, and the size of the lot).

X = df[features]

Then we build a model using Python’s scikit-learn library. Up to now, this will all be familiar to anyone who’s had an intro-to-Pandas course, particularly if the focus was data science or data journalism. I do like the list of steps given (building and using a model):

  1. Define: What type of model will it be? A decision tree? Some other type of model? Some other parameters of the model type are specified too.
  2. Fit: Capture patterns from provided data. This is the heart of modeling.
  3. Predict: Just what it sounds like.
  4. Evaluate: Determine how accurate the model’s predictions are. (List quoted from Kaggle course.)

Since fit() and predict() are methods of a scikit-learn model, it begins to look like machine learning is just a walk in the park! And since we are fitting and predicting on the same data, the predictions are perfect! Never fear, that bubble will burst in module 4, “Model Validation,” in which the standard practice of splitting your data into a training set and a test set is explained.
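Why are predictions on the training data so flattering? The sketch below uses a deliberately simple stand-in model (a nearest-neighbor lookup, not the course's decision tree, and not scikit-learn) with made-up houses. The class and its fit() and predict() methods only imitate the shape of the scikit-learn workflow.

```python
def squared_distance(a, b):
    """Squared straight-line distance between two feature lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestNeighborModel:
    """Stand-in model: memorizes the training rows and predicts the
    price of whichever memorized house is most similar."""

    def fit(self, X, y):
        self.rows = list(zip(X, y))
        return self

    def predict(self, X):
        predictions = []
        for features in X:
            closest = min(self.rows,
                          key=lambda row: squared_distance(row[0], features))
            predictions.append(closest[1])
        return predictions


# Made-up data: [rooms, lot size] for each house, and its selling price.
X = [[2, 150], [3, 300], [4, 450], [5, 700]]
y = [400_000, 550_000, 700_000, 950_000]

model = NearestNeighborModel()          # 1. Define
model.fit(X, y)                         # 2. Fit
in_sample = model.predict(X)            # 3. Predict on the same houses
new_house = model.predict([[3, 400]])   # a house the model never saw
```

Every house in X is its own nearest neighbor, so in_sample matches y exactly. That looks perfect, but it says nothing about how good the guess for the unseen house is.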

First, though, we learn about predictive accuracy. Out of all the various metrics for summarizing model quality, we will use one called Mean Absolute Error (MAE). This is explained nicely using the housing prices, which is what we are attempting to predict: If the house sold for $150,000 and we predicted it would sell for $100,000, then the error is $150,000 minus $100,000, or $50,000. The function for MAE takes the absolute value of each error (so that overestimates and underestimates don’t cancel out) and returns the mean of those values.
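The arithmetic is short enough to write out by hand. This is a plain-Python sketch with made-up prices; in the course you would use the ready-made function in scikit-learn (sklearn.metrics.mean_absolute_error) instead.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, ignoring their direction."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Made-up selling prices and predictions for three houses.
actual_prices = [150_000, 200_000, 320_000]
predicted_prices = [100_000, 210_000, 300_000]

# Absolute errors are 50,000, 10,000 and 20,000; MAE is their mean.
mae = mean_absolute_error(actual_prices, predicted_prices)
```

The second prediction was too high and the other two were too low, but all three count as positive errors.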

This is where the lesson says, “Uh-oh! We need to split our data!” We use scikit-learn’s train_test_split() function, and all is well.
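The idea behind the split is simple: shuffle the rows, set some aside, and never let the model see them during fitting. Below is a minimal plain-Python sketch of that idea. The split_rows name, the 25 percent test fraction, and the seed are assumptions made for this example; it is not scikit-learn's implementation.

```python
import random


def split_rows(rows, test_fraction=0.25, seed=1):
    """Shuffle a copy of the rows, then cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split every run
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


# Stand-in for 20 rows of housing data (just row numbers here).
houses = list(range(20))
train, test = split_rows(houses)
```

The model is fitted on train only, and MAE is then calculated on test, so the score reflects houses the model has not already seen.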

MAE shows us our model is pretty much crap, though. In the fifth module, “Underfitting and Overfitting,” we get a good explanation of the title topic and learn how to limit the number of leaf nodes at the end of our decision tree — DecisionTreeRegressor(max_leaf_nodes).

After all that, our model’s predictions are still crap — because a decision tree model is “not very sophisticated by modern machine learning standards,” the module text drolly explains. That leads us to the sixth module, “Random Forests,” which is nice for two reasons: (1) The explanation of a random forest model should make sense to most learners who have worked through the previous modules; and (2) We get to see that using a different model from scikit-learn is as simple as changing

my_model = DecisionTreeRegressor(random_state=1)

to

my_model = RandomForestRegressor(random_state=1)

Overall I found this a helpful course, and I think a lot of beginners could benefit from taking it — depending on their prior level of understanding. I would assume at least a familiarity with datasets as CSV files and a bit more than beginner-level Python knowledge.

Free courses at Kaggle

I recently found out that Kaggle has a set of free courses for learning AI skills.

Screenshot from Kaggle.com

The first course is an introduction to Python, and these are the course modules:

  1. Hello, Python: A quick introduction to Python syntax, variable assignment, and numbers
  2. Functions and Getting Help: Calling functions and defining our own, and using Python’s builtin documentation
  3. Booleans and Conditionals: Using booleans for branching logic
  4. Lists: Lists and the things you can do with them. Includes indexing, slicing and mutating
  5. Loops and List Comprehensions: For and while loops, and a much-loved Python feature: list comprehensions
  6. Strings and Dictionaries: Working with strings and dictionaries, two fundamental Python data types
  7. Working with External Libraries: Imports, operator overloading, and survival tips for venturing into the world of external libraries

Even though I’m an intermediate Python coder, I skimmed all the materials and completed the seven problem sets to see how they are teaching Python. The problems were challenging but reasonable. The module on functions, however, is not going to suffice for anyone who has little prior experience with programming languages. I see this in a lot of so-called introductory materials: functions are glossed over with some ready-made examples, and then learners have no clue how returns work, or arguments, etc.

At the end of the course, the learner is encouraged to join a Kaggle competition using the Titanic passengers dataset. However, the learner is hardly prepared to analyze the Titanic data at this point, so really this is just an introduction to how to use files provided in a competition, name your notebook, save your work, and submit multiple attempts. The tutorial gives you all the code to run a basic model with the data, so it’s really more a demo than a tutorial.

My main interest is in the machine learning course, which I’ll begin looking at today.

What is a neural network and how does it work?

The most wonderful thing about YouTube is you can use it to learn just about anything.

One of the 10,000 annoying things about YouTube is that finding a good, satisfying version of the lesson you want to learn can take hours of searching. This is especially true of videos about technical aspects of machine learning. Of course there are one- and two-hour recordings of course lectures by computer science professors. But I’ve been seeking out shorter videos with more animations and illustrations of concepts.

Understanding what a neural network is and how it processes data is necessary for demystifying machine learning. Data goes in, results come out, but in between is a “black box” consisting of code and hardware. It sort of works like a human brain, and yet, it really doesn’t.

So here at last is a painless, math-free video that walks us through a neural network. The particular example shown uses the MNIST dataset, which consists of 70,000 images of handwritten digits, 0–9. So the task being performed is the recognition of those digits. (This kind of system can be used to sort mail using postal codes, for example.)

What you’ll see is how the first layer (a vertical line of circles on the left side) represents the input. If each of the MNIST images is 28 pixels wide by 28 pixels high, then that first layer has to represent 784 pixels and each of their color values — which is a number. (One image is the input — only one at a time.)
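Here is a small sketch of how one image becomes that first layer. The image below is a made-up blank grid with a single bright pixel, not a real MNIST digit, and the 0.0-to-1.0 brightness scale is an assumption for the example.

```python
WIDTH, HEIGHT = 28, 28

# A made-up image: 28 rows of 28 brightness values (0.0 = dark, 1.0 = bright).
image = [[0.0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
image[14][10] = 1.0  # light up one pixel in row 14, column 10

# Flatten the grid row by row into one long list: one number per input unit.
input_layer = [pixel for row in image for pixel in row]

# 28 * 28 = 784 values, one for each unit in the first layer.
number_of_inputs = len(input_layer)
```

The network never sees the grid as a picture. It sees only the list of 784 numbers.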

The final vertical layer, all the way to the right side, is the output of the neural network. In this example, the output tells us which digit was in the input: 0, 1, 2, etc. To see the value in this, go back to the mail-sorting idea. If a system can read postal codes, it recognizes several numbers and then transmits them to another system that “knows” which postal code goes to which geographical location. My letter gets sorted into the Florida bin and yours into the bin for your home.
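Reading the answer off the output layer can be sketched in two lines. The ten values below are invented; in a real network they would come out of the last set of calculations, one value per digit.

```python
# Made-up values for the ten output units, in order for digits 0 through 9.
output_layer = [0.02, 0.01, 0.05, 0.10, 0.03, 0.01, 0.02, 0.91, 0.04, 0.08]

# The network's answer is the position of the largest value.
predicted_digit = max(range(len(output_layer)), key=lambda i: output_layer[i])
```

In this made-up case the unit for the digit 7 has by far the highest value, so 7 is the answer passed along to the mail-sorting system.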

In between the input and the output are the vertical “hidden” layers, and that’s where the real work gets done. In the video you’ll see that the number of circles (often called neurons, but they can also be called just units) in a hidden layer might well be fewer than the number of units in the input layer. The number of units in the output layer can also differ from the numbers in other layers.

When the video describes edge detection, you might recall an earlier post here.

Beautifully, during an animation, our teacher Grant Sanderson explains and shows that the weights exist not in or on the units (the “neurons”) but in fact in or on the connections between the units.

Okay, I lied a little. There is some math shown here. The weight assigned to the connection is multiplied by the value of the unit to the left. The results are all summed, for all left-side units, and that sum is assigned to the unit to the right (meaning the right side of that one connection).

The video bogs down just a bit between the Sigmoid squishification function and applying the bias, but all you really need to grasp is that the value of the right-side unit shows whether or not that little region of the image (in this case, it’s an image) has a significant difference. The math is there to determine if the color, the amount of color, is significant enough to count. And how much it should count.
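For anyone who wants to see that math as code, here is a minimal sketch for a single right-side unit with just three left-side units. The values, weights, and bias are made up so the arithmetic is easy to follow; a real network learns them.

```python
import math


def sigmoid(x):
    """Squish any number into the range between 0 and 1."""
    return 1 / (1 + math.exp(-x))


def unit_activation(left_values, weights, bias):
    """Multiply each left-side value by the weight on its connection,
    add everything up, add the bias, then squish the result."""
    weighted_sum = sum(w * v for w, v in zip(weights, left_values))
    return sigmoid(weighted_sum + bias)


left_values = [0.0, 0.5, 1.0]   # values of three left-side units
weights = [2.0, -1.0, 3.0]      # one weight per connection
bias = -2.5                     # how high the sum must be to matter

# Weighted sum: 0.0 - 0.5 + 3.0 = 2.5; adding the bias gives 0.0.
activation = unit_activation(left_values, weights, bias)
```

A sum that exactly cancels the bias lands in the middle of the sigmoid's range. Larger sums push the unit's value toward 1 and smaller sums push it toward 0.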

I know — math, right?

But seriously, watch the video. It’s excellent.

“And that’s a lot to think about! With this hidden layer of 16 neurons, that’s a total of 784 times 16 weights, along with 16 biases. And all of that is just the connections from the first layer to the second.”

—Grant Sanderson, But what is a neural network? (video)
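The counting in that quote is easy to check. The sketch below also totals the whole network, assuming the layout is 784 inputs, two hidden layers of 16 units, and 10 outputs. The second hidden layer's size is my assumption here, since the quote mentions only the first.

```python
# Assumed layout: input layer, two hidden layers, output layer (digits 0-9).
layer_sizes = [784, 16, 16, 10]


def count_parameters(sizes):
    """Count weights (one per connection between neighboring layers)
    and biases (one per unit outside the input layer)."""
    weights = sum(left * right for left, right in zip(sizes, sizes[1:]))
    biases = sum(sizes[1:])
    return weights, biases


# Just the first step: input layer to first hidden layer.
first_weights, first_biases = count_parameters(layer_sizes[:2])

# The whole network under the assumed layout.
total_weights, total_biases = count_parameters(layer_sizes)
```

The first step alone accounts for 12,544 weights and 16 biases, which is most of the roughly 13,000 adjustable numbers in a network of this size.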

Sanderson doesn’t burden us with the details of the additional layers. Once you’ve seen the animations for that first step — from the input layer through the connections to the first hidden layer — you’ll have a real appreciation for what’s happening under the hood in a neural network.

In the final 6 minutes of this 19-minute video, you’ll also learn how the “learning” takes place in machine learning when a neural net is involved. All those weights and bias values? They are not determined by humans.

“Digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions.”

—Grant Sanderson, But what is a neural network? (video)

I confess it does get rather mathy at the end, but hang on through the parts that are beyond your personal math background and listen to what Sanderson is telling us. You can get a lot out of it even if the equation itself is like hieroglyphics to you.

The video content ends at 16:26, followed by the usual “subscribe to my channel” message. More info about Sanderson and his excellent videos is on his website, 3Blue1Brown.

Getting thrown into machine learning

Early in 2018, I had several senior journalism students who wanted to learn about machine learning. I knew nothing about it, and they knew that, and we plowed forward together.

The three student teams chose these topics:

  • Sentiment analysis on subreddits for NBA teams
  • Analysis of county court documents naming our university
  • Analysis of tweets by one news organization for audience reactions, engagements

We quickly learned that knowing Python was a big plus. (Fortunately, we all knew Python.) Each of the teams found a different Python library to work with, and after a few weeks, projects were completed and demonstrated — although desired results were not achieved in all cases.

I crammed information mainly from two sources — a YouTube video series called Machine Learning Recipes with Josh Gordon, and something I’ve lost that explained in detail how a model was trained on the Iris Data Set. These provided a surprisingly solid foundation for beginning to understand how today’s machine learning projects are done.

Above: Histograms and features from the Iris Data Set

Since then, I’ve continued to read casually about AI and machine learning. As more and more articles have appeared in the general press and news reports about face recognition and self-driving cars (among other topics related to AI), it’s become clear to me that journalism students need to know more about these technologies — if for no other reason than to avoid being bamboozled by buzzword-spewing politicians or tech-company flacks.

Since May 2020, I’ve been collecting resources, reading and researching, with an intention to teach a course about AI for communications students in spring 2021. This new blog is going to help me organize and prioritize articles, posts, videos, and more.

If it helps other people get a handle on AI, so much the better!
