After writing yesterday’s post, I was thinking about how much students should know about algorithms if they are to have a basic understanding of how AI works. Is it enough to tell them an algorithm is a set of instructions?
So I turned, as I often do, to Khan Academy — a free online learning site that often helps me through my lack of a mathematics background. I found a set of three short lessons, starting with a video.
In the introductory video, “What is an algorithm and why should you care?”, we see various practical uses of algorithms, followed by the statement above, and a brief description of how route finding works — what Google Maps does when it gives you directions. Route finding is often used as an example of accepting a “good enough” output for the sake of speed (that is, efficiency).
Watching the animation, we comprehend that the computer is following a set of instructions to determine a good route for a delivery truck with 25 stops to make. We see the process of the algorithm at work, rather than seeing formulas and equations.
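To make the "good enough" idea concrete, here is a minimal sketch of one common shortcut, the nearest-neighbor heuristic: from wherever the truck is, drive to the closest stop it has not visited yet. This is an assumption for illustration, not necessarily the method shown in the video, and the depot, the stop coordinates, and the function names are all made up.

```python
import math

def route_length(route):
    """Total straight-line distance of visiting the stops in order."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))

def nearest_neighbor_route(start, stops):
    """Greedy heuristic: always drive to the closest unvisited stop.

    Fast, but not guaranteed to find the shortest possible route.
    """
    route = [start]
    remaining = list(stops)
    while remaining:
        here = route[-1]
        nearest = min(remaining, key=lambda stop: math.dist(here, stop))
        remaining.remove(nearest)
        route.append(nearest)
    return route

# Hypothetical depot and delivery stops on a grid (made-up coordinates).
depot = (0, 0)
stops = [(5, 5), (1, 0), (2, 3), (6, 1), (1, 4)]

route = nearest_neighbor_route(depot, stops)
print(route)
print(round(route_length(route), 2))
```

The trade-off is the point: checking every possible ordering of 25 stops is out of reach, while this shortcut looks at each remaining stop only once per step and returns a usable route right away.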
I love that the video also shows us, with animation, how the efficiency of an algorithm is calculated.
The second lesson, “A guessing game,” demonstrates binary search (an algorithm) by allowing you to discover it interactively. Wonderful!
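If you would rather read the guessing game than play it, here is a minimal sketch of binary search. The range of 1 to 100 and the function name are my own choices for illustration, not taken from the lesson.

```python
def guess_number(secret, low=1, high=100):
    """Find `secret` by guessing the middle of the remaining range each time.

    Returns the final guess and how many guesses it took.
    """
    guesses = 0
    while low <= high:
        guess = (low + high) // 2
        guesses += 1
        if guess == secret:
            return guess, guesses
        if guess < secret:
            low = guess + 1   # too low: discard the lower half
        else:
            high = guess - 1  # too high: discard the upper half
    raise ValueError("secret is outside the range")

print(guess_number(37))
```

Because each guess throws away half of the remaining numbers, any secret from 1 to 100 is found in at most 7 guesses.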
The third lesson, “Route-finding,” is much more reading intensive. It explains the algorithm in terms of solving a maze. Without knowing the exact path to solve the maze, the algorithm can “know” which choice for its next step takes it closer to the goal (the center of the maze). I don’t consider this lesson very helpful, but that’s because I saw a much better explanation of maze-solving algorithms here:
I am continually amazed and humbled by the variety of ways in which people teach these concepts. More important, I realize how some ways of explaining a concept are not at all effective — for me, at least — and another way of explaining makes it clear as crystal.
So, how much should students know about algorithms, if they are to have a general understanding of AI? I think a good start would be to watch and discuss the introductory Khan Academy video, and also to see a further visual (probably animated) representation of another kind of algorithm at work.
The first post, “What is an algorithm, anyway?”, addresses the fact that the word algorithm is often bandied about as if it means a mysterious, possibly evil, machine-embedded power.
But an algorithm doesn’t need to have anything to do with computers. An algorithm is a set of instructions for how to solve a problem. A recipe for a cake is an algorithm.
And yes, of course, computer software is full of algorithms. The programs that make machine learning and artificial intelligence work are full of algorithms. So algorithms are not magical, and they are not good or bad by nature. Also, they are not perfect.
We went through a period — maybe five years, maybe more — when there were a ton of articles about algorithms, and the word became almost common in nonfiction book titles. Now I see a shift toward the term AI — or artificial intelligence, or machine learning — substituting for algorithms in provocative headlines.
Too many articles, though, don’t make much of an effort to differentiate, to explain what they’re really talking about. They may as well just say computers, or software.
An algorithm is real. It is constructed by a person, or people, to do a certain task. Algorithms are often combined, so that inside one algorithm, another algorithm is followed. Thus algorithms can be components of other algorithms.
I’m often reminded of a book I read three years ago, Algorithms to Live By: The Computer Science of Human Decisions. It was fun to read, but it was hardly the breezy self-help type of thing the cover blurbs might lead one to believe. The authors describe and explain a number of established algorithms used widely in various fields and applications — and they apply each one to everyday life.
Stories about the people who discovered (authored) many of the algorithms are woven in. I appreciated seeing how someone working on one problem sometimes ended up solving another. I also saw how an algorithm built for one use gets repurposed for other ends. Best of all, I understood what many of the algorithms are meant to do — as well as how they do it.
What I’d like to see in general articles about algorithms is a little more of what Christian and Griffiths managed to do in their book.
On Fridays I try to find something to write about that’s a little less heavy than explanations of neural networks and examinations of embedded biases in AI systems. I call it Friday AI Fun.
The BBC recently wrote about a mobile app that uses AI to help you concoct a meal from the ingredients you already have at home. Plant Jammer is available for both iOS and Android, and it doesn’t merely take your ingredients and find an existing recipe for you — it actually creates a new recipe.
According to BBC journalist Nell Mackenzie, the results are not always delicious. She made some veggie burgers that came out tasting like oatmeal.
I was interested in how the app uses AI, and this is what I found: The team behind Plant Jammer consists of 15 chefs and data scientists, based in Copenhagen, Denmark. They admit that “AI is only a fraction” of what powers the app, framing that as a positive because the app incorporates “gastronomical learnings from chefs.”
The app includes multiple databases, including one of complete recipes. An aspect of the AI is a recommender system, which they compare to Netflix’s. As Plant Jammer learns more about you, it will improve at creating recipes you like, based on “people like you.”
“We asked the chefs which ingredients are umami, and how umami they are. This part reflects the ‘human intelligence’ we used to build our system, a great ‘engine’ that has led to very interesting findings.”
—Michael Haase, CEO, Plant Jammer
My searches led me to an interview with Michael Haase, Plant Jammer’s CEO, in which he described the “gastro-wheel” feature in the app. The wheel encourages you to find balance in your ingredients among a base, something fresh, umami, crunch, sweet-spicy-bitter, and something that ties the ingredients together in harmony.
I’ve downloaded the app but, unlike Mackenzie, I haven’t been brave enough yet to let it create a recipe for me. Exploring some of the recommended recipes in the app, I did find the ability to select any ingredient and instantly see substitutions for it — that could come in handy!
Mackenzie’s article for the BBC also describes other AI–powered food and beverage successes, such as media agency Tiny Giant using AI to help clients “find new combinations of flavors for cupcakes and cocktails.”
I doubt I will ever program a neural network, but I’m trying to understand how they work — and how they are trained — well enough to make assumptions about how the systems work. What I want to be able to do is raise questions when I hear about a new-to-me AI system. I don’t want to take it on faith that a system is safe and likely to function well.
Ultimately I want to help my journalism and communications students understand this too.
Last week I discussed here a video about how neural networks work. Some time before I found that video, I had watched this one a couple of times. It’s from 2015 and it’s only 6 minutes long. It’s been viewed on YouTube more than 9 million times. In fact, it’s pretty close to 10 million views!
Video game designer Seth Bling demonstrates a fully trained neural network that plays Mario expertly. Then he shows us how the system looks at the start, when the Mario character just stands in one place and dies every time. This is the untrained neural network, when it “knows” nothing.
Unlike the example in my earlier post — where the input to the neural network was an image of a handwritten number, and the output was the number (thereby “reading” the image) — here the input is the game state, which changes by the split second. The game state is a simplified digital representation of the Mario character, the surfaces he can run on or jump to, and any obstacles or rewards that are present. The output is which button should be pressed — holding down right continuously makes Mario run toward the right without stopping.
So the output layer in this neural network is the set of all possible actions Mario can take. For a human playing the game, these would be the buttons on the game controller.
In the training, Mario has a “fitness level,” which is a number. When Mario is dying all the time, that number stays around 2. When Mario reaches the end of the level without dying (but without scoring extra points), his fitness is 528. So by “looking at” the fitness level, the neural net assesses success. If the number has increased, then keep doing the same thing.
“The more lines and neurons you have, the more nuanced the decisions can be.”
Of course there are more actions than only moving right. Training the neural net to make Mario jump and perform more actions required many generations of neural nets, and only the best-performing ones were selected for the next generation. After 34 generations, the fitness level reached 4,000.
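The generation-by-generation idea can be sketched in a few lines. This is a toy stand-in, not NEAT and not the code from the video: each "genome" is just a short list of numbers, the fitness function is made up, and the population size and mutation settings are arbitrary assumptions.

```python
import random

def fitness(genome):
    """Made-up score: higher is better, and 0 is the best possible."""
    return -sum((gene - 1.0) ** 2 for gene in genome)

def evolve(generations=34, population_size=20, keep=5, seed=0):
    """Keep the best performers each generation and mutate them to refill the population."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(4)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank everyone by fitness and keep only the top performers.
        population.sort(key=fitness, reverse=True)
        survivors = population[:keep]
        best_per_generation.append(fitness(survivors[0]))
        # Refill the population with slightly mutated copies of survivors.
        children = []
        while len(survivors) + len(children) < population_size:
            parent = rng.choice(survivors)
            children.append([gene + rng.gauss(0, 0.1) for gene in parent])
        population = survivors + children
    return best_per_generation

history = evolve()
print(history[0], history[-1])
```

Because the survivors are carried into the next generation unchanged, the best fitness can never go down from one generation to the next.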
One thing I especially like about this video is the simultaneous visual of real Mario running in the real game level, along with a representation of the neural net showing its pathways in green and red. There is no code and no math in this video, and so while watching it, you are only thinking about how the connections come to be made and reinforced.
The method used is called NeuroEvolution of Augmenting Topologies (NEAT), which I’ve read almost nothing about — but apparently it enables the neural net to grow itself, essentially. Which is kind of mind blowing.
I’m interested in applications of machine learning in journalism. This is natural, as my field is journalism. In the field of computer science, however, accolades and honors tend to favor research on new algorithms or procedures, or new network architectures. Applications are practical uses of algorithms, networks, etc., to solve real-world problems — and developing them often doesn’t garner the acclaim that researchers need to advance their careers.
“The first image of a black hole was produced using machine learning. The most accurate predictions of protein structures, an important step for drug discovery, are made using machine learning.”
Noting that applications of machine learning are making real contributions to science in fields outside computer science, Kerner (who works on machine learning solutions for NASA’s food security and agriculture program) asks how much is lost because of the priorities set by the journals and conferences in the machine learning field.
She also ties this focus on ML research for the sake of advancing ML to the seepage of bias out from widely used datasets into the mainstream — the most famous cases being in face recognition, with systems (machine learning models) built on flawed datasets that disproportionately skew toward white and male faces.
“When studies on real-world applications of machine learning are excluded from the mainstream, it’s difficult for researchers to see the impact of their biased models, making it far less likely that they will work to solve these problems.”
Machine learning is rarely plug-and-play. In creating an application that will be used to perform useful work — to make new discoveries, perhaps, or to make medical diagnoses more accurate — the machine learning researchers will do substantial new work, even when they use existing models. Just think, for a moment, about the data needed to produce an image of a black hole. Then think about the data needed to make predictions of protein structures. You’re not going to handle those in exactly the same way.
I imagine the work is quite demanding when a number of non–ML experts (say, the biologists who work on protein structures) get together with a bunch of ML experts. But either group working separately from the other is unlikely to come up with a robust new ML application. Kerner linked to this 2018 news report about a flawed cancer-detection system — leaked documents said that “instead of feeding real patient data into the software,” the system was trained on data about hypothetical patients. (OMG, I thought — you can’t train a system on fake data and then use it on real people!)
Judging from what Kerner has written, machine learning researchers might be caught in a loop, where they work on pristine and long-used datasets (instead of dirty, chaotic real-world data) to perfect speed and efficiency of algorithms that perhaps become less adaptable in the process.
It’s not that applications aren’t getting made — they are. The difficulty lies in the priorities for research, which might dissuade early-career ML researchers in particular from work on solving interesting and even vital real-world problems — and wrestling with the problems posed by messy real-world data.
I was reminded of something I’ve often heard from data journalists: If you’re taught by a statistics professor, you’ll be given pre-cleaned datasets to work with. (The reason being: She just wants you to learn statistics.) If you’re taught by a journalist, you’ll be given real dirty data, and the first step will be learning how to clean it properly — because that’s what you have to do with real data and a real problem.
So the next time you read about some breakthrough in machine learning, consider whether it is part of a practical application, or instead, more of a laboratory experiment performed in isolation, using a tried-and-true dataset instead of wild data.
Some investigations in the public interest require journalists to search through large quantities of official documents. Often the set of documents is very diverse — that is, the format, structure, and even language of the documents might vary greatly.
One of the more impressive investigations I know of is the ongoing Implant Files project, conducted originally by 250 journalists in 36 countries. The purpose: To examine how medical devices (specifically, those implanted into human bodies) are “tested, approved, marketed, and monitored” (source). I’ve heard this project discussed at conferences, and I’m full of admiration for the editors and reporters involved, led by the International Consortium of Investigative Journalists (ICIJ).
At the heart of the investigation, with its first results published in 2018, was “an analysis of more than 8 million device-related health records, including death and injury reports and recalls.”
“The entire process involved text mining, clustering, feature selection, association rules and classification algorithms to identify events not always described consistently in different parts of the data.”
These implanted devices — hip replacements, defibrillators, breast implants, intraocular lenses, and more — are used all around the world. When something goes wrong and a product recall is issued, however, the news might not spread to all the locations where the devices continue to be used in new surgeries for new patients. Moreover, people who already have a faulty implant might not be notified. This is why a global investigation was sorely needed.
In 2018, ICIJ shared “a publicly searchable database of more than 70,000 recalls and safety warnings in 11 countries.” The project has continued since then, and the database now contains “more than 120,000 recalls, safety alerts and field safety notices” for medical devices. Throughout 2019, thousands more records were added.
A December 2018 post details the team’s data methodology for the Implant Files. First, journalists had to get the records — and often, their legitimate requests for public records were denied. Of the 8 million device-related records they managed to obtain, 5.4 million came from the U.S. Food and Drug Administration.
The records “describe cases where a device is suspected to have caused or contributed to a serious injury or death or has experienced a malfunction that would likely lead to harm if it were to recur.”
The value in these records was in the connections — connections among cases, and connections among devices. The ICIJ analysis concluded that “devices that broke, misfired, corroded, ruptured or otherwise malfunctioned after implantation or use were linked to more than 1.7 million injuries and nearly 83,000 deaths” in just one decade.
To identify the records that involved a patient’s death, it was necessary for humans to determine various terms and phrasing used instead of the word “death” in the documents. Eventually they developed “a set of more than 3,400 key phrases” that were used to train the machine learning system. After using that model to extract the relevant records, it was necessary to run them through another algorithm configured to determine whether the implant device had contributed to the death.
A common use of machine learning is to train a model to identify a particular kind of document, or a particular characteristic in a document — and then sort a gigantic set of documents. This produces a much-reduced subset of all documents that match the desired criteria. There might be some false positives in the subset, but it still gives researchers or journalists a big jump forward by eliminating thousands of unwanted documents.
This kind of sorting goes well beyond a simple search for keywords.
“State-based racial segregation laws were incredibly inconvenient, irregular, and, most importantly, unconstitutional.”
—William Sturkey, Ph.D.
A historical perspective on this data collection was provided by William Sturkey, a history professor at UNC, in “On the Books”: Machine Learning Jim Crow (September 2020). He says On the Books is “the first and most complete collection of all Jim Crow laws from a single American state.” He points to the difficulty of cataloging and studying all Jim Crow laws from any state “because there were just so many.”
MIT has a cool and easy-to-play game (okay, not really a game, but like a game) in which you get to choose what a self-driving car would do when facing an imminent crash situation.
At the end of one round, you get to see how your moral choices measure up to those of other people who have played. Note that all the drawings of people in the game have distinct meanings. People inside the car are also represented. Try it yourself here.
The split-second decision about who lives and who dies is often described as one of the most difficult aspects of training an autonomous vehicle.
Imagine this scenario:
“The car is programmed to sacrifice the driver and the occupants to preserve the lives of bystanders. Would you get into that car with your child?”
—Meredith Broussard, The Atlantic, 2018
In a 2018 article, Self-Driving Cars Still Don’t Know How to See, data journalist and professor Meredith Broussard tackled this question head-on. We find that the way the question is asked elicits different answers. If you say the driver might die, or be injured, if a child in the street is saved, people tend to respond: Save the child! But if someone says, “You are the driver,” the response tends to be: Save me.
You can see the conundrum. When programming the responses into the self-driving car, there’s not a lot of room for fine-grained moral reasoning. The car is going to decide in terms of (a) Is a crash imminent? (b) What options exist? (c) Does any option endanger the car’s occupants? (d) Does any option endanger other humans?
In previous posts, I’ve written a little about the weights and probability calculations used in AI algorithms. For the machine, this all comes down to math. If (a) is True, then what options are possible? Each option has a weight. The largest weight wins. The prediction of the “best outcome” is based on probabilities.
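Here is a minimal sketch of "the largest weight wins." The option names and the weights are invented for illustration; a real vehicle would compute such numbers from sensor data with a far more complicated model.

```python
def choose_option(options):
    """Return the name of the option with the largest weight."""
    return max(options, key=options.get)

# Hypothetical options and weights (made up for this example).
options = {"brake hard": 0.62, "swerve left": 0.25, "swerve right": 0.13}

crash_imminent = True  # question (a) from the list above
if crash_imminent:
    decision = choose_option(options)
    print(decision)
```

Notice that nothing in the code weighs right and wrong. Whatever moral judgment exists went into producing the weights in the first place.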
The most wonderful thing about YouTube is you can use it to learn just about anything.
One of the 10,000 annoying things about YouTube is finding a good, satisfying version of the lesson you want to learn can take hours of searching. This is especially true of videos about technical aspects of machine learning. Of course there are one- and two-hour recordings of course lectures by computer science professors. But I’ve been seeking out shorter videos with more animations and illustrations of concepts.
Understanding what a neural network is and how it processes data is necessary for demystifying machine learning. Data goes in and results come out, but in between is a “black box” consisting of code and hardware. It sort of works like a human brain, and yet, it really doesn’t.
So here at last is a painless, math-free video that walks us through a neural network. The particular example shown uses the MNIST dataset, which consists of 70,000 images of handwritten digits, 0–9. So the task being performed is the recognition of those digits. (This kind of system can be used to sort mail using postal codes, for example.)
What you’ll see is how the first layer (a vertical line of circles on the left side) represents the input. If each of the MNIST images is 28 pixels wide by 28 pixels high, then that first layer has to represent 784 pixels and each of their color values — which is a number. (One image is the input — only one at a time.)
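A minimal sketch of that first step, turning a 28-by-28 grid of pixel values into one long list of 784 inputs. The blank stand-in image and the 0.0-to-1.0 brightness scale are assumptions made for the example.

```python
WIDTH, HEIGHT = 28, 28

# A blank stand-in image: HEIGHT rows of WIDTH brightness values (0.0 = dark).
image = [[0.0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
image[14][14] = 1.0  # one fully bright pixel near the middle

# The input layer is simply every pixel value, row after row.
input_layer = [pixel for row in image for pixel in row]
print(len(input_layer))  # 784
```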
The final vertical layer, all the way to the right side, is the output of the neural network. In this example, the output tells us which digit was in the input: 0, 1, 2, etc. To see the value in this, go back to the mail-sorting idea. If a system can read postal codes, it recognizes several numbers and then transmits them to another system that “knows” which postal code goes to which geographical location. My letter gets sorted into the Florida bin and yours into the bin for your home.
In between the input and the output are the vertical “hidden” layers, and that’s where the real work gets done. In the video you’ll see that the number of circles (often called neurons, but they can also be called just units) in a hidden layer might well be fewer than the number of units in the input layer. The number of units in the output layer can also differ from the numbers in other layers.
Beautifully, during an animation, our teacher Grant Sanderson explains and shows that the weights exist not in or on the units (the “neurons”) but in fact in or on the connections between the units.
Okay, I lied a little. There is some math shown here. The weight assigned to the connection is multiplied by the value of the unit to the left. The results are all summed, for all left-side units, and that sum is assigned to the unit to the right (meaning the right side of that one connection).
The video bogs down just a bit between the Sigmoid squishification function and applying the bias, but all you really need to grasp is that the value of the right-side unit shows whether or not that little region of the image (in this case, it’s an image) has a significant difference. The math is there to determine if the color, the amount of color, is significant enough to count. And how much it should count.
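For anyone who finds code easier to read than equations, here is a minimal sketch of the value computed for one right-side unit: multiply each left-side value by its connection's weight, add everything up, add the bias, and squish the result with the sigmoid function. The three input values, the weights, and the bias are made-up numbers; a real unit in this network would have 784 connections.

```python
import math

def sigmoid(x):
    """Squish any number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_value(left_values, weights, bias):
    """Value of one right-side unit, given the left-side units and connection weights."""
    weighted_sum = sum(w * v for w, v in zip(weights, left_values))
    return sigmoid(weighted_sum + bias)

# Made-up numbers for a unit with only three incoming connections.
left_values = [0.0, 0.5, 1.0]
weights = [0.8, -0.4, 0.3]
bias = -0.1

value = unit_value(left_values, weights, bias)
print(value)
```

A large positive sum pushes the value toward 1, a large negative sum pushes it toward 0, and the bias shifts how large the sum has to be before the unit counts as "on."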
I know — math, right?
But seriously, watch the video. It’s excellent.
“And that’s a lot to think about! With this hidden layer of 16 neurons, that’s a total of 784 times 16 weights, along with 16 biases. And all of that is just the connections from the first layer to the second.”
—Grant Sanderson, But what is a neural network? (video)
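The arithmetic in that quote is easy to check. This sketch counts only the connections from the input layer to the first hidden layer, using the numbers given above; the variable names are mine.

```python
input_units = 28 * 28   # 784 pixels, one input unit each
hidden_units = 16       # units in the first hidden layer

weight_count = input_units * hidden_units  # one weight per connection
bias_count = hidden_units                  # one bias per hidden unit

print(weight_count, bias_count, weight_count + bias_count)
```

That comes to 12,544 weights plus 16 biases, or 12,560 numbers to adjust before the later layers are even counted.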
Sanderson doesn’t burden us with the details of the additional layers. Once you’ve seen the animations for that first step — from the input layer through the connections to the first hidden layer — you’ll have a real appreciation for what’s happening under the hood in a neural network.
In the final 6 minutes of this 19-minute video, you’ll also learn how the “learning” takes place in machine learning when a neural net is involved. All those weights and bias values? They are not determined by humans.
“Digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions.”
—Grant Sanderson, But what is a neural network? (video)
I confess it does get rather mathy at the end, but hang on through the parts that are beyond your personal math background and listen to what Sanderson is telling us. You can get a lot out of it even if the equation itself is like hieroglyphics to you.
The video content ends at 16:26, followed by the usual “subscribe to my channel” message. More info about Sanderson and his excellent videos is on his website, 3Blue1Brown.
Reading course descriptions and degree plans has helped me understand more about the fields of artificial intelligence and data science. I think some universities have whipped up a program in one of these hot fields of study just to put something on the books. It’s quite unfair to students if this is just a collection of existing courses and not a deliberate, well structured path to learning.
I came across this page from Northeastern University that attempts to explain the “difference” between artificial intelligence and machine learning. (I use those quotation marks because machine learning is a subset of artificial intelligence.) The university has two different master’s degree programs for artificial intelligence; neither one has “machine learning” in its name — but read on!
One of the two programs does not require a computer science undergraduate degree. It covers data science, robotics, and machine learning.
The other master’s program is for students who do have a background in computer science. It covers “robotic science and systems, natural language processing, machine learning, and special topics in artificial intelligence.”
I noticed that data science is in the program for those without a computer science background, while it’s not mentioned in the other program. This makes sense if we understand that data science and machine learning really go hand in hand nowadays. A data scientist likely will not develop any new machine learning systems, but she will almost certainly use machine learning to solve some problems. Training in statistics is necessary so that one can select the best algorithm for use in machine learning for solving a particular problem.
Graduates of the other program, with their prior experience in computer science, should be ready to break ground with new and original AI work. They are not going to analyze data for firms and organizations. Instead, they are going to develop new systems that handle data in new ways.
The distinction between these two degree programs highlights a point that perhaps a lot of people don’t yet understand: some people (like journalists who have coding experience) train models, writing code to control existing machine learning systems, and yet they are not the people who create new machine learning systems.
Separately there are developers who create new AI software systems, and engineers who create new AI hardware systems. In other words, there are many different roles in the AI field.
Finally, there are so-called AI systems sold to banks and insurance companies, and many other types of firms, for which the people using the system do not write code at all. Using them requires data to be entered, and results are generated (such as whose insurance rates will go up next year). The workers who use these systems don’t write code any more than an accountant writes code. Moreover, they can’t explain how the system works — they need only know what goes in and what comes out.