Book notes: Hello World, by Hannah Fry

I finished reading this book back in April, and I’d like to revisit it before I read a couple of new books I just got. This was published in 2018, but that’s no detriment. The author, Hannah Fry, is a “mathematician, science presenter and all-round badass,” according to her website. She’s also a professor at University College London. Her bio at UCL says: “She was trained as a mathematician with a first degree in mathematics and theoretical physics, followed by a PhD in fluid dynamics.”

The complete title, Hello World: Being Human in the Age of Algorithms, doesn’t make it sound like a book about artificial intelligence. But Fry refers to control, and “the boundary between controller and controlled,” from the very first pages, and that reflects the link between “just” talking about algorithms and talking about AI. Software is made of algorithms, and AI is made of software, so there we go.

In just over 200 pages and seven chapters simply titled Power, Data, Justice, Medicine, Cars, Crime, and Art, Fry organizes the primary areas of concern around the question “Are we in control?” and provides examples in each area.

Power. I felt disappointed when I saw this chapter starts with Deep Blue beating world chess champion Garry Kasparov in 1997 — but my spirits soon lifted when I saw how she framed the example: the way we perceive a computer system affects how we interact with it (shades of Sherry Turkle and Reeves & Nass). She discusses machine learning and image recognition here, briefly. She talks about people trusting GPS map directions and search engines. She explains a 2012 ACLU lawsuit involving Medicaid assistance, bad code, and unwarranted trust in code. Intuition tells us when something seems “off,” and that’s a critical difference between us and the machines.

Algorithms “are what makes computer science an actual science.”

—Hannah Fry, p. 8

Data. Sensibly, this chapter begins with Facebook and the devil’s bargain most of us have made in giving away our personal information. Fry talks about the first customer loyalty cards at supermarkets. The pregnant teenager/Target story is told. In explaining how data brokers operate, Fry describes how companies buy access to you via your interests and your past behaviors (not only online). She summarizes a 2017 DEFCON presentation that showed how supposedly anonymous browsing data is easily converted into real names, and the dastardly Cambridge Analytica exploit. I especially liked how she explains how small the effects of newsfeed manipulation are likely to be (based on research) and then adds — a small margin might be enough to win an election. This chapter wraps up with China’s citizen rating system (Black Mirror in reality) and the toothlessness of GDPR.

Justice. First up is inequality in sentences for crimes, using two U.K. examples. Fry then surveys studies in which multiple judges ruled on the same hypothetical cases and inconsistencies abounded. Then come the issues with sentencing guidelines (why judges need to be able to exercise discretion). So we arrive at calculating the probability that a person will “re-offend”: the risk assessment. Fry includes a nice, simple decision-tree graphic here. She neatly explains the idea of combining multiple decision trees into an ensemble that averages the results of all the trees (the random forest algorithm is one example). More examples from research follow, including the COMPAS product and the 2016 ProPublica investigation. This leads to a really nice discussion of bias (pp. 65–71 in the U.S. paperback edition).
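To make the ensemble idea a little more concrete, here is a minimal sketch using scikit-learn’s random forest classifier. The feature names, data, and labels are invented for illustration; they are not from COMPAS or from the book.

```python
# A minimal sketch of the "many decision trees, averaged" idea.
# The features, data, and labels below are invented for illustration only.
from sklearn.ensemble import RandomForestClassifier

# Each row: [age, number of prior offenses, age at first arrest (0 = none)]
X_train = [
    [19, 3, 16],
    [45, 0, 0],
    [23, 5, 15],
    [37, 1, 30],
    [29, 2, 21],
    [52, 0, 0],
]
y_train = [1, 0, 1, 0, 1, 0]  # 1 = labeled as re-offended, 0 = did not

# 100 decision trees, each trained on a random sample of the data;
# the forest averages (votes over) their individual predictions.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The output is a probability -- a risk score -- not a certainty.
print(model.predict_proba([[26, 2, 18]]))
```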

Medicine. Although image recognition was mentioned very briefly earlier, here Fry gets more deeply into the topic, starting off with the idea of pattern recognition — and what pattern, exactly, is being recognized? Humans don’t get perfect results when classifying and detecting anomalies in biopsy slides, so this is one of the promising frontiers for machine learning. Fry describes neural networks here. She gets into specifics about a system trained to detect breast cancer. But image recognition is not necessarily the killer app for medical diagnosis. Fry describes a study of 678 nuns (which I’d never heard about before) in which researchers learned that essays the nuns had written before taking their vows could be used to predict which of them would develop dementia later in life. The idea is that an analysis of more data about women (not only their mammograms) could be a better predictor of malignancy.

“Even when our detailed medical histories are stored in a single place (which they often aren’t), the data itself can take so many forms that it’s virtually impossible to connect … in a way that’s useful to an algorithm.”

—Hannah Fry, p. 103

The Medicine chapter also mentions IBM Watson; challenges with labeling data; diabetic retinopathy; the lack of coordination among hospitals, doctors’ offices, etc., that leads to missed clues; and the privacy of medical records. Fry zeroes in on DNA data in particular, noting that all those “find your ancestors” companies now have a goldmine of data to work with. Fry ends with a caution about profit — whatever medical systems might be developed in the future, there will always be people who stand to gain and others who will lose.

Cars. I’m a little burnt out on the topic of self-driving cars, having already read a lot about them. I liked that Fry starts with DARPA and the U.S. military’s longstanding interest in autonomous vehicles. I can’t agree with her that “the future of transportation is driverless” (p. 115). After discussing LiDAR and the flaws of GPS and conflicting signals from different systems in one car, Fry takes a moment to explain Bayes’ theorem, saying it “offers a systematic way to update your belief in a hypothesis on the basis of evidence,” and giving a nice real-world example of probabilistic inference. And of course, the trolley problem. She brings up something I don’t recall seeing before: Humans are going to prank autonomous vehicles. That opens a whole ‘nother box of trouble. Her anecdote under the heading “The company baby” leads to a warning: Always flying on autopilot can have unintended consequences when the time comes to fly manually.
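Fry keeps the theorem in words; as a quick illustration of “updating your belief on the basis of evidence,” here is a toy calculation of my own. The numbers are invented, not from the book.

```python
# Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
# Toy numbers of my own, not from the book.

prior = 0.01            # P(H): prior belief the object ahead is a pedestrian
sensitivity = 0.95      # P(E | H): sensor says "pedestrian" when it is one
false_alarm = 0.05      # P(E | not H): sensor says "pedestrian" when it isn't

# Total probability of seeing the evidence, P(E)
p_evidence = sensitivity * prior + false_alarm * (1 - prior)

posterior = sensitivity * prior / p_evidence
print(round(posterior, 3))  # ~0.161: a stronger belief, but far from certainty
```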

Crime. This chapter begins with a compelling anecdote, followed by a neat historical case from France in the 1820s, and then turns to predictive policing and all its woes. I hadn’t read about the balance between the buffer zone and distance decay in tracking serial criminals, so that was interesting — it’s called the geoprofiling algorithm. I also didn’t know about Jack Maple, a New York City police officer, and his “Charts of the Future” depicting stations of the city’s subway system, which evolved into a data tool named CompStat. I enjoyed learning what burglaries and earthquakes have in common. And then — PredPol. There have been thousands of articles about this since its debut in 2011, as Fry points out. Her summary of the issues related to how police use predictive policing data is quite good, compact and clear. PredPol is one specific product, and not the only one. It is, Fry says, “a proprietary algorithm, so the code isn’t available to the public and no one knows exactly how it works” (p. 157).

“The [PredPol] algorithm can’t actually tell the future. … It can only predict the risk of future events, not the events themselves — and that’s a subtle but important difference.”

—Hannah Fry, p. 153

Face recognition is covered in the Crime chapter, which makes perfect sense. Fry offers a case in which a white man was arrested based on an incorrect identification from CCTV footage of a bank robbery. The consequences of being the person arrested by police can be injury or death, as we all know — not to mention the legal expenses as you try to clear your name after the erroneous arrest. Even though accuracy rates are rising, the chance that you will be matched to a face that isn’t yours remains worrying.

“How do you decide on that trade-off between privacy and protection, fairness and safety?”

—Hannah Fry, p. 172

Art. Here we have “a famous experiment” I’d never heard of — Music Lab, where thousands of music fans logged into a music player app, listened to songs, rated them, and chose what to download (back when we downloaded music). The results showed that for all but the very best and very worst songs, other people’s ratings had a huge influence on what was downloaded in different segments of the app. A song that became a massive hit in one “world” was dead and buried in another. This leads us to recommendation engines such as those used by Netflix and Amazon. Predicting how well movies would do at the box office turned out to be badly unreliable. The trouble is the lack of an objective measure of quality — it’s not “This is cancer/This is not cancer.” Beauty is in the eye of the beholder and all that. A recommendation engine is different because it’s not using a quality score — it’s matching similarity. You liked these 10 movies; I like eight of those; chances are I might like the other two.
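That similarity-matching idea fits in a few lines of code. Here is a toy sketch of my own using Jaccard similarity over sets of liked movies; the users and titles are made up.

```python
# Recommend by similarity of taste, not by any "quality" score.
# Users and titles are invented for illustration.

likes = {
    "you": {"Alien", "Arrival", "Blade Runner", "Brazil", "Contact",
            "Dune", "Gattaca", "Her", "Moon", "Solaris"},
    "me":  {"Alien", "Arrival", "Blade Runner", "Brazil", "Contact",
            "Dune", "Gattaca", "Her"},
}

def jaccard(a, b):
    """Overlap of two sets, from 0.0 (nothing shared) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

print(jaccard(likes["you"], likes["me"]))   # 0.8 -- similar tastes

# Recommend to "me" the titles "you" liked that "me" hasn't seen yet.
print(likes["you"] - likes["me"])           # {'Moon', 'Solaris'}
```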

Fry goes on to discuss programs that create original (or seemingly original) works of art. A system may produce a new musical or visual composition, but it doesn’t come from any emotional basis. It doesn’t indicate a desire to communicate with others, to touch them in any way.

In her Conclusion, Fry returns to the questions about bias, fairness, mistaken identity, privacy — and the idea of the control we give up when we trust the algorithms. People aren’t perfect, and neither are algorithms. Taking the human consequences of machine errors into account at every stage is a step toward accountability. Building in the capability to backtrack and explain decisions, predictions, and outputs is a step toward transparency.

For details about categories of algorithms based on tasks they perform (prioritization, classification, association, filtering; rule-based vs. machine learning), see the Power chapter (pp. 8–13 in the U.S. paperback edition).


Summary of the challenges facing algorithms, AI

Hayden Field, a technology journalist at Morning Brew, published a series of articles about algorithms and AI earlier this year, and they’ve been on my TBR list.

First up was Nine Experts on the Single Biggest Obstacle Facing AI and Algorithms in the Next Five Years. Experts: Drago Anguelov (Waymo); Kathy Baxter (Salesforce); David Cox (IBM Watson); Natasha Crampton (Microsoft); Mark Diaz (Ethical AI at Google); Charles Isbell (professor and dean, College of Computing, Georgia Institute of Technology); Peter Lofgren (Stripe); Andrew Ng (co-founder and former head, Google Brain); Cathy O’Neil (author, Weapons of Math Destruction).

Predictably, ethics was noted as a big challenge — O’Neil asked what we will do about unfairness in decisions made by algorithms. Diaz pointed to the need for involving “experts from a wide range of disciplines, including non-technical disciplines,” in the development process, long before an end product emerges. This intersects with ethics and fairness, as the absence of experts and stakeholders opens the door wide to omissions and errors. Baxter was explicit about systemic racism that is embedded in both training data and models. She listed “medical care decisions, hiring recommendations, access to housing and social programs, visa application approvals, school exam results, hate speech detection, dynamic pricing algorithms for ride hailing services, and even dating apps” — as well as face recognition and predictive policing.

“In essence, problems that are not purely technical require solutions that are not purely technical.”

—Mark Diaz, Ethical AI at Google

Isbell spoke of systematic solutions that can be widely applied. “We cannot treat minority groups as exceptions and edge cases,” he said. Cox highlighted transparency and explainability, as well as ethics and bias. He also alluded to adversarial attacks as well as the non-adversarial errors that surprise researchers (possibly due to overfitting). He grouped all this under trust. Crampton also focused on fairness and referred to diversity in teams, similar to Diaz’s and Isbell’s concerns.

Anguelov explained the need for reliable simulations so that systems can scale up to real-world use. He’s talking about the Long Tail problem: the real world throws up too many unexpected situations. Simulations allow testing in ways that don’t risk human lives (think self-driving cars). Lofgren also talked about scale, but in terms of personalization — his example is detecting credit card fraud in real time, using big data to flag abusive IP addresses and then drilling down to the individual cards being used. Ng talked about the difficulty of making dependable commercial AI products — basically off-the-shelf solutions.

“We will often need to make hard decisions based on competing priorities, including decisions to not build or deploy a system for certain purposes.”

—Natasha Crampton, Microsoft

Second in the series is titled Amex’s Fraud Detection AI Was Ready to Go Live. Then Covid Hit. This article starts with the idea that large AI models in the field will still need adjustments as unforeseen problems crop up. This echoes the concerns about scale raised by Anguelov and Lofgren in the first article in the series.

The challenge thrown by COVID-19 was that all existing models had been developed and adjusted in a non-pandemic world. Then the world changed.

Amex’s fraud-detecting systems are a blend of old-school rule-based systems and newer machine learning techniques. A team of about 30 decision scientists monitors the system round-the-clock and updates it when necessary, at least once a year. The pandemic came at a bad time for Amex, just as they were rolling out a new model.

“Since each generation of a gradient-boosting ML model is typically developed on data from earlier that same year, many of the model’s assumptions no longer made sense” in 2020.

—”Amex’s Fraud Detection AI Was Ready to Go Live. Then Covid Hit”

This is a really interesting article — although I’d read other pieces about problems the pandemic caused for AI models, most of those had to do with either healthcare or travel.

Because of increased online traffic in 2020 — more people online, every day, as the pandemic drove work-from-home and stay-at-home schooling — demands on Amazon Web Services (which provides servers and processing power to millions of commercial clients such as Amex) grew enormously. This “dwindling cloud capacity” meant that testing new solutions for Amex’s model took much longer than usual. The team had to run new simulations that took our new way of life into account, and those simulations required lots of processor juice.

In the end, Amex’s rollout was successful — but it came months later than originally planned. This was a really neat case study and could be discussed in a lot of different contexts.

I’m going to look at the other articles in the series in tomorrow’s post.


Image recognition in medicine: MS subtypes

Machine learning systems for image recognition aren’t always perfect — and neither are AI systems marketed for medical use, whether they use image recognition or not. But here’s an example of image recognition used in a medical context where the system appears to have succeeded at something significant — and it’s something humans can’t do, or at least can’t do well.

“Researchers used the AI tool Subtype and Stage Inference (SuStaIn) to scan the MRI brain scans of 6,322 patients with MS, letting SuStaIn train itself unsupervised. The AI identified 3 previously unknown patterns …” (Pharmacy Times). The model was then tested on MRIs from “a separate independent cohort of 3,068 patients” and successfully identified the three new MS subtypes in them.

Subtype and Stage Inference (SuStaIn) was introduced in this 2018 paper. It is an “unsupervised machine-learning technique that identifies population subgroups with common patterns of disease progression” using MRI images. The original researchers were studying dementia.
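SuStaIn itself models disease stages as well as subtypes and is far more specialized than anything I could sketch here, but the basic “find subgroups without labels” idea can be illustrated with ordinary k-means clustering. This is a stand-in of my own, not SuStaIn, and the numbers below are random.

```python
# A stand-in for the general "find subgroups without labels" idea.
# This is ordinary k-means, NOT SuStaIn, and the data are random numbers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Pretend each row is a patient and each column is an MRI-derived measure
# (lesion load, regional atrophy, etc.) -- purely synthetic here.
features = rng.normal(size=(6322, 12))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
subtype = kmeans.fit_predict(features)      # one cluster label per patient

print(np.bincount(subtype))                 # how many patients per "subtype"
```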

Why does it matter? Identifying which subtype of multiple sclerosis (MS) a patient has enables doctors to pursue different treatments, which might lead to better results for patients.

“While further clinical studies are needed, there was a clear difference, by subtype, in patients’ response to different treatments and in accumulation of disability over time. This is an important step towards predicting individual responses to therapies,” said Dr. Arman Eshaghi, the lead researcher (EurekAlert).

Sources: Artificial Intelligence Weekly newsletter, from The Wall Street Journal; Pharmacy Times; EurekAlert.


AI building blocks: What are algorithms?

In thinking about how to teach non–computer science students about AI, I’ve been considering what fundamental concepts they need to understand. I was thinking about models and how to explain them. My searches led me to this 8-minute BBC video: What exactly is an algorithm?

I’ve explained algorithms to journalism students in the past — usually I default to the “a set of instructions” definition and leave it at that. What I admire about this upbeat, lively video is not just that it goes well beyond that simple explanation but also that it brings in experts to talk about how various and wide-ranging algorithms are.

The young presenter, Jon Stroud, starts out with no clue what algorithms are. He begins with some web searching and finds Victoria Nash, of the Oxford Internet Institute, who provides the “it’s like a recipe” definition. Then he gets up off his butt and visits the Oxford Internet Institute, where Bernie Hogan, senior research fellow, gives Stroud a tour of the server room and a fuller explanation.

“Algorithms calculate based on a bunch of features, the sort of things that will put something at the top of the list and then something at the bottom of the list.”

—Bernie Hogan, Oxford Internet Institute

He meets up with Isabel Maccabee at Northcoders, a U.K. coding school, and participates in a fun little drone-flying competition with an algorithm.

“The person writing the code could have written an error, and that’s where problems can arise, but the computer doesn’t make mistakes. It just does what it’s supposed to do.”

—Isabel Maccabee, Northcoders

Stroud also visits Allison Gardner, of Women Leading in AI, to talk about deskilling and the threats and benefits of computers in general.

This video provides an enjoyable introduction with plenty of ideas for follow-up discussion. It provides a nice grounding that includes the fact that not everything powerful about computer technology is AI!


How to start learning about algorithms

After writing yesterday’s post, I was thinking about how much students should know about algorithms if they are to have a basic understanding of how AI works. Is it enough to tell them an algorithm is a set of instructions?

So I turned, as I often do, to Khan Academy — a free online learning site that often helps me through my lack of a mathematics background. I found a set of three short lessons, starting with a video.

Screenshot from Khan Academy video

In the introductory video, “What is an algorithm and why should you care?”, we see various practical uses of algorithms, followed by the statement shown in the screenshot above, and a brief description of how route finding works — what Google Maps does when it gives you directions. Route finding is often used as an example of accepting a “good enough” output for the sake of speed (that is, efficiency).

Watching the animation, we comprehend that the computer is following a set of instructions to determine a good route for a delivery truck with 25 stops to make. We see the process of the algorithm at work, rather than seeing formulas and equations.
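For anyone curious what a “good enough for the sake of speed” route finder can look like in code, here is a tiny nearest-neighbor sketch of my own. It is a toy, not the algorithm shown in the Khan Academy animation.

```python
# A "good enough" route: always drive to the nearest unvisited stop next.
# This greedy heuristic is fast but not guaranteed to find the best route.
# Toy example of my own, not the algorithm from the Khan Academy video.
import math
import random

random.seed(1)
stops = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(25)]

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

route = [stops[0]]
remaining = set(range(1, len(stops)))
while remaining:
    here = route[-1]
    nearest = min(remaining, key=lambda i: distance(here, stops[i]))
    route.append(stops[nearest])
    remaining.remove(nearest)

total = sum(distance(route[i], route[i + 1]) for i in range(len(route) - 1))
print(f"Visited {len(route)} stops, total distance {total:.1f}")
```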

I love that the video also shows us, with animation, how the efficiency of an algorithm is calculated.

The second lesson, “A guessing game,” demonstrates binary search (an algorithm) by allowing you to discover it interactively. Wonderful!
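Binary search is short enough to show in full. Here is a minimal version of my own (not Khan Academy’s code), framed as the guessing game.

```python
# Guess-the-number as binary search: halve the range on every guess.
def binary_search(items, target):
    """Return the index of target in a sorted list, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1       # target is in the upper half
        else:
            high = mid - 1      # target is in the lower half
    return -1

numbers = list(range(1, 101))          # "guess a number from 1 to 100"
print(binary_search(numbers, 73))      # 72 (the index), in at most 7 guesses
```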

The third lesson, “Route-finding,” is much more reading intensive. It explains the algorithm in terms of solving a maze. Without knowing the exact path to solve the maze, the algorithm can “know” which choice for its next step takes it closer to the goal (the center of the maze). I don’t consider this lesson very helpful, but that’s because I saw a much better explanation of maze-solving algorithms here:

Start video at 54:35 for demo of the greedy best-first search algorithm
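If you’d rather read the maze idea as code, here is a compact greedy best-first sketch of my own on a toy grid; it simply expands whichever open cell looks closest to the goal.

```python
# Greedy best-first search on a small grid "maze": always expand the cell
# that looks closest to the goal (Manhattan distance as the estimate).
# My own toy example, not the Khan Academy lesson's code.
import heapq

maze = [
    "S..#.",
    ".#.#.",
    ".#...",
    ".##.#",
    "...G.",
]
rows, cols = len(maze), len(maze[0])
start = (0, 0)
goal = (4, 3)

def heuristic(cell):
    """Manhattan distance to the goal -- the 'how close am I?' estimate."""
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

frontier = [(heuristic(start), start)]
came_from = {start: None}

while frontier:
    _, current = heapq.heappop(frontier)
    if current == goal:
        break
    r, c = current
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] != "#" \
                and (nr, nc) not in came_from:
            came_from[(nr, nc)] = current
            heapq.heappush(frontier, (heuristic((nr, nc)), (nr, nc)))

# Walk back from the goal to recover the path that was found.
path, cell = [], goal
while cell is not None:
    path.append(cell)
    cell = came_from[cell]
print(list(reversed(path)))
```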

I am continually amazed and humbled by the variety of ways in which people teach these concepts. More important, I realize how some ways of explaining a concept are not at all effective — for me, at least — and another way of explaining makes it clear as crystal.

So, how much should students know about algorithms, if they are to have a general understanding of AI? I think a good start would be to watch and discuss the introductory Khan Academy video, and also to see a further visual (probably animated) representation of another kind of algorithm at work.


What do we talk about when we talk about algorithms?

Mashable recently published a series about algorithms.

  1. What is an algorithm, anyway?
  2. Algorithms control your online life. Here’s how to reduce their influence.
  3. It’s almost impossible to avoid triggering content on TikTok
  4. The algorithms defining sexuality suck. Here’s how to make them better.
  5. Why it’s impossible to forecast the weather too far into the future (The Dominance of Chaos)
  6. 12 unexpected ways algorithms control your life
  7. People are fighting algorithms for a more just and equitable future. You can, too.
  8. How to escape your social media bubble before the election
  9. An open letter to the most disappointing algorithms in my life

The first post, “What is an algorithm, anyway?”, addresses the fact that the word algorithm is often bandied about as if it means a mysterious, possibly evil, machine-embedded power.

But an algorithm doesn’t need to have anything to do with computers. An algorithm is a set of instructions for how to solve a problem. A recipe for a cake is an algorithm.

Image by Gerd Altmann from Pixabay

And yes, of course, computer software is full of algorithms. The programs that make machine learning and artificial intelligence work are full of algorithms. So algorithms are not magical, and they are not good or bad by nature. Also, they are not perfect.

We went through a period — maybe five years, maybe more — when there were a ton of articles about algorithms, and the word became almost common in nonfiction book titles. Now I see a shift toward the term AI — or artificial intelligence, or machine learning — substituting for algorithms in provocative headlines.

Too many articles, though, don’t make much of an effort to differentiate, to explain what they’re really talking about. They may as well just say computers, or software.

An algorithm is real. It is constructed by a person, or people, to do a certain task. Algorithms are often combined, so that inside one algorithm, another algorithm is followed. Thus algorithms can be components of other algorithms.

Photo by Mindy McAdams

I’m often reminded of a book I read three years ago, Algorithms to Live By: The Computer Science of Human Decisions. It was fun to read, but it was hardly the breezy self-help type of thing the cover blurbs might lead one to believe. The authors describe and explain a number of established algorithms used widely in various fields and applications — and they apply each one to everyday life.

Stories about the people who discovered (authored) many of the algorithms are woven in. I appreciated seeing how someone working on one problem sometimes ended up solving another. I also saw how an algorithm built for one use gets repurposed for other ends. Best of all, I understood what many of the algorithms are meant to do — as well as how they do it.

What I’d like to see in general articles about algorithms is a little more of what Christian and Griffiths managed to do in their book.


How would you respond to the trolley problem?

MIT has a cool and easy-to-play game (okay, not really a game, but like a game) in which you get to choose what a self-driving car would do when facing an imminent crash situation.

Above: Results from one round of playing the MoralMachine

At the end of one round, you get to see how your moral choices measure up to those of other people who have played. Note that all the drawings of people in the game have distinct meanings. People inside the car are also represented. Try it yourself here.

The split-second decision about who lives and who dies is often discussed as one of the most difficult aspects of training an autonomous vehicle.

Imagine this scenario:

“The car is programmed to sacrifice the driver and the occupants to preserve the lives of bystanders. Would you get into that car with your child?”

—Meredith Broussard, The Atlantic, 2018

In a 2018 article, Self-Driving Cars Still Don’t Know How to See, data journalist and professor Meredith Broussard tackled this question head-on. We find that the way the question is asked elicits different answers. If you say the driver might die, or be injured, if a child in the street is saved, people tend to respond: Save the child! But if someone says, “You are the driver,” the response tends to be: Save me.

You can see the conundrum. When programming the responses into the self-driving car, there’s not a lot of room for fine-grained moral reasoning. The car is going to decide in terms of (a) Is a crash imminent? (b) What options exist? (c) Does any option endanger the car’s occupants? (d) Does any option endanger other humans?

In previous posts, I’ve written a little about the weights and probability calculations used in AI algorithms. For the machine, this all comes down to math. If (a) is True, then what options are possible? Each option has a weight. The largest weight wins. The prediction of the “best outcome” is based on probabilities.
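As a bare-bones illustration of “each option has a weight; the largest weight wins,” here is a toy sketch. The options, probabilities, and weighting are entirely invented and bear no relation to how any real autonomous-vehicle software is written.

```python
# Toy sketch of "each option has a weight; the largest weight wins."
# The options, probabilities, and weighting are invented for illustration;
# real autonomous-vehicle software is far more complicated than this.

options = {
    # option: (probability of harm to occupants, probability of harm to others)
    "brake_hard":   (0.10, 0.05),
    "swerve_left":  (0.30, 0.02),
    "swerve_right": (0.05, 0.40),
}

def score(option):
    p_occupants, p_others = options[option]
    # One possible weighting -- the ethical choice is hidden in these numbers.
    return 1.0 - (0.5 * p_occupants + 0.5 * p_others)

best = max(options, key=score)
print(best, round(score(best), 3))   # brake_hard 0.925
```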


What is a neural network and how does it work?

The most wonderful thing about YouTube is you can use it to learn just about anything.

One of the 10,000 annoying things about YouTube is that finding a good, satisfying version of the lesson you want to learn can take hours of searching. This is especially true of videos about technical aspects of machine learning. Of course there are one- and two-hour recordings of course lectures by computer science professors. But I’ve been seeking out shorter videos with more animations and illustrations of concepts.

Understanding what a neural network is and how it processes data is necessary to demystifying machine learning. Data goes in, results come out — but in between is a “black box” consisting of code and hardware. It sort of works like a human brain, and yet, it really doesn’t.

So here at last is a painless, math-free video that walks us through a neural network. The particular example shown uses the MNIST dataset, which consists of 70,000 images of handwritten digits, 0–9. So the task being performed is the recognition of those digits. (This kind of system can be used to sort mail using postal codes, for example.)

What you’ll see is how the first layer (a vertical line of circles on the left side) represents the input. If each of the MNIST images is 28 pixels wide by 28 pixels high, then that first layer has to represent 784 pixels, each with its own color value — which is a number. (One image is the input — only one at a time.)

The final vertical layer, all the way to right side, is the output of the neural network. In this example, the output tells us which digit was in the input — 0, 1, 2, etc. To see the value in this, go back to the mail-sorting idea. If a system can read postal codes, it recognizes several numbers and then transmits them to another system that “knows” which postal code goes to which geographical location. My letter gets sorted into the Florida bin and yours into the bin for your home.

In between the input and the output are the vertical “hidden” layers, and that’s where the real work gets done. In the video you’ll see that the number of circles — often called neurons, but they can also be called just units — in a hidden layer might well be less than the number of units in the input layer. The number of units in the output layer can also differ from the numbers in other layers.

When the video describes edge detection, you might recall an earlier post here.

Beautifully, during an animation, our teacher Grant Sanderson explains and shows that the weights exist not in or on the units (the “neurons”) but in fact in or on the connections between the units.

Okay, I lied a little. There is some math shown here. The weight assigned to the connection is multiplied by the value of the unit to the left. The results are all summed, for all left-side units, and that sum is assigned to the unit to the right (meaning the right side of that one connection).

The video bogs down just a bit between the sigmoid “squishification” function and applying the bias, but all you really need to grasp is that the value of the right-side unit shows whether or not that little region of the input (in this case, an image) contains a significant difference. The math is there to determine whether the color, the amount of color, is significant enough to count, and how much it should count.
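To make the arithmetic concrete, here is a small sketch of my own, loosely following what the video describes: one hidden-layer unit computes a weighted sum of all 784 input pixels, adds a bias, and squashes the result with the sigmoid. The numbers are random stand-ins, not learned values.

```python
# One unit of the first hidden layer, as described in the video:
# weighted sum of the 784 input pixels, plus a bias, passed through a sigmoid.
# Random numbers stand in for a real MNIST image and for learned weights.
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random(784)          # one 28x28 image, flattened, values 0..1
weights = rng.normal(size=784)    # one weight per connection into this unit
bias = -2.0                       # shifts how easily the unit "lights up"

def sigmoid(x):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + np.exp(-x))

activation = sigmoid(np.dot(weights, pixels) + bias)
print(activation)                 # the unit's value, somewhere between 0 and 1

# A full layer of 16 such units is just a (16, 784) weight matrix,
# 16 biases, and the same formula applied all at once:
W = rng.normal(size=(16, 784))
b = rng.normal(size=16)
hidden_layer = sigmoid(W @ pixels + b)   # 16 activations
```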

I know — math, right?

But seriously, watch the video. It’s excellent.

“And that’s a lot to think about! With this hidden layer of 16 neurons, that’s a total of 784 times 16 weights, along with 16 biases. And all of that is just the connections from the first layer to the second.”

—Grant Sanderson, But what is a neural network? (video)

Sanderson doesn’t burden us with the details of the additional layers. Once you’ve seen the animations for that first step — from the input layer through the connections to the first hidden layer — you’ll have a real appreciation for what’s happening under the hood in a neural network.

In the final 6 minutes of this 19-minute video, you’ll also learn how the “learning” takes place in machine learning when a neural net is involved. All those weights and bias values? They are not determined by humans.

“Digging into what the weights and biases are doing is a good way to challenge your assumptions and really expose the full space of possible solutions.”

—Grant Sanderson, But what is a neural network? (video)

I confess it does get rather mathy at the end, but hang on through the parts that are beyond your personal math background and listen to what Sanderson is telling us. You can get a lot out of it even if the equation itself is like hieroglyphics to you.

The video content ends at 16:26, followed by the usual “subscribe to my channel” message. More info about Sanderson and his excellent videos is on his website, 3Blue1Brown.


Sorting out a degree in artificial intelligence

Reading course descriptions and degree plans has helped me understand more about the fields of artificial intelligence and data science. I think some universities have whipped up a program in one of these hot fields of study just to put something on the books. It’s quite unfair to students if such a program is just a collection of existing courses and not a deliberate, well-structured path to learning.

I came across this page from Northeastern University that attempts to explain the “difference” between artificial intelligence and machine learning. (I use those quotation marks because machine learning is a subset of artificial intelligence.) The university has two different master’s degree programs for artificial intelligence; neither one has “machine learning” in its name — but read on!

Illustration by chenspec at Pixabay

One of the two programs does not require a computer science undergraduate degree. It covers data science, robotics, and machine learning.

The other master’s program is for students who do have a background in computer science. It covers “robotic science and systems, natural language processing, machine learning, and special topics in artificial intelligence.”

I noticed that data science is in the program for those without a computer science background, while it’s not mentioned in the other program. This makes sense if we understand that data science and machine learning really go hand in hand nowadays. A data scientist likely will not develop any new machine learning systems, but she will almost certainly use machine learning to solve some problems. Training in statistics is necessary so that one can select the best machine learning algorithm for solving a particular problem.

Graduates of the other program, with their prior experience in computer science, should be ready to break ground with new and original AI work. They are not going to analyze data for firms and organizations. Instead, they are going to develop new systems that handle data in new ways.

The distinction between these two degree programs highlights a point that perhaps a lot of people don’t yet understand: some people (like journalists with coding experience) are training models — writing code to control existing machine learning systems — and yet they are not the people who create new machine learning systems.

Separately there are developers who create new AI software systems, and engineers who create new AI hardware systems. In other words, there are many different roles in the AI field.

Finally, there are so-called AI systems sold to banks and insurance companies, and many other types of firms, for which the people using the system do not write code at all. Using them requires data to be entered, and results are generated (such as whose insurance rates will go up next year). The workers who use these systems don’t write code any more than an accountant writes code. Moreover, they can’t explain how the system works — they need only know what goes in and what comes out.


Comment moderation as a machine learning case study

Continuing my summary of the lessons in Introduction to Machine Learning from the Google News Initiative, today I’m looking at Lesson 5 of 8, “Training your Machine Learning model.” Previous lessons were covered here and here.

Now we get into the real “how it works” details — but still without looking at any code or computer languages.

The “lesson” (actually just a text) covers a common case for news organizations: comment moderation. If you permit people to comment on articles on your site, machine learning can be used to identify offensive comments and flag them so that human editors can review them.

With supervised learning (one of three approaches included in machine learning; see previous post here), you need labeled data. In this case, that means complete comments — real ones — that have already been labeled by humans as offensive or not. You need an equally large number of both kinds of comments. Creating this dataset of comments is discussed more fully in the lesson.

You will also need to choose a machine learning algorithm. Comments are text, obviously, so you’ll select among the existing algorithms that process language (rather than those that handle images and video). There are many from which to choose. As the lesson comes from Google, it suggests you use a Google algorithm.

In all AI courses and training modules I’ve looked at, this step is boiled down to “Here, we’ll use this one,” without providing a comparison of the options available. This is something I would expect an experienced ML practitioner to be able to explain — why are they using X algorithm instead of Y algorithm for this particular job? Certainly there are reasons why one text-analysis algorithm might be better for analyzing comments on news articles than another one.

What is the algorithm doing? It is creating and refining a model. The more accurate the final model is, the better it will be at predicting whether a comment is offensive. Note that the model doesn’t actually know anything. It is a computer’s representation of a “world” of comments in which some — with particular features or attributes perceived in the training data — are rated as offensive, and others — which lack a sufficient quantity of those features or attributes — are rated as not likely to be offensive.
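The lesson stays away from code, but the supervised setup it describes can be sketched with a generic text classifier. This is not the Google tool the lesson recommends, and the comments and labels below are invented.

```python
# A minimal sketch of supervised comment moderation -- a generic classifier,
# not the tool the Google lesson points to. The comments and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Great reporting, thank you",
    "This article changed my mind",
    "You people are idiots",
    "Go back where you came from",
    "Interesting point about the budget",
    "Shut up, nobody wants you here",
]
labels = [0, 0, 1, 1, 0, 1]   # 1 = labeled offensive by human moderators

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)

# The model outputs a probability that a new comment is offensive;
# comments above some threshold would be flagged for human review.
print(model.predict_proba(["nobody asked you, idiot"])[:, 1])
```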

The lesson goes on to discuss false positives and false negatives, which are possibly unavoidable — but the fewer, the better. We especially want to eliminate false negatives, which are offensive comments not flagged by the system.
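In code, counting the false positives and false negatives is just a comparison of the model’s flags against the human labels. A toy example with invented numbers:

```python
# Counting false positives and false negatives against human labels.
# 1 = offensive, 0 = not offensive; the lists below are invented.
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
model_flags  = [1, 0, 0, 1, 1, 0, 1, 0]

false_positives = sum(1 for h, m in zip(human_labels, model_flags) if h == 0 and m == 1)
false_negatives = sum(1 for h, m in zip(human_labels, model_flags) if h == 1 and m == 0)

print(false_positives)  # 1: an inoffensive comment flagged by mistake
print(false_negatives)  # 1: an offensive comment the system let through
```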

“The most common reason for bias creeping in is when your training data isn’t truly representative of the population that your model is making predictions on.”

—Lesson 6, Bias in Machine Learning

Lesson 6 in the course covers bias in machine learning. A quick way to understand how ML systems come to be biased is to consider the comment-moderation example above. What if the labeled data (real comments) included a lot of comments offensive to women — but all of the labels were created by a team of men, with no women on the team? Surely the men would miss some offensive comments that women team members would have caught. The training data are flawed because a significant number of comments are labeled incorrectly.

There’s a pretty good video attached to this lesson. It’s only 2.5 minutes, and it illustrates interaction bias, latent bias, and selection bias.

Lesson 6 also includes a list of questions you should ask to help you recognize potential bias in your dataset.

It was interesting to me that the lesson omits a discussion of how the accuracy of labels is really just as important as having representative data for training and testing in supervised learning. This issue is covered in ImageNet and labels for data, an earlier post here.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.
