How might we regulate AI to prevent discrimination?

Discussions about regulation of AI, and algorithms in general, often revolve around privacy and misuse of personal data. Protections against bias and unfair treatment are also part of this conversation.

In a recent article in Harvard Business Review, lawyer Andrew Burt (who might prefer to be called a “legal engineer”) wrote about using existing legal standards to guide efforts at ensuring fairness in AI–based systems. In the United States, these include the Equal Credit Opportunity Act, the Civil Rights Act, and the Fair Housing Act.

Photo by Tingey Injury Law Firm on Unsplash

Burt emphasizes the danger of unintentional discrimination, which can arise from basing the “knowledge” in the system on past data. You might think it would make sense to train an AI to do things the way your business has done things in the past — but if that means denying loans disproportionately to people of color, then you’re baking discrimination right into the system.

Burt linked to a post on the Google AI Blog that in turn links to a GitHub repo for a set of code components called ML-fairness-gym. The resource lets developers build a simulation to explore potential long-term impacts of a machine learning decision system — such as one that would decide who gets a loan and who doesn’t.

In several cases, long-term analysis via simulations showed adverse unintended consequences that arose from decisions made by ML. These are detailed in a paper by Google researchers. We can see that determining the true outcomes of use of AI systems is not just a matter of feeding in the data and getting a reliable model to churn out yes/no decisions for a firm.

It makes me wonder about all the cheerleading and hype around “business solutions” offered by large firms such as Deloitte. Have those systems been tested for their long-term effects? Is there any guarantee of fairness toward the people whose lives will be affected by the AI system’s decisions?

And what is “fair,” anyway? Burt points out that statistical methods used to detect a disparate impact depend on human decisions about “what ‘fairness’ should mean in the context of each specific use case” — and also how to measure fairness.

The same applies to the law — not only in how it is written but also in how it is interpreted. Humans write the laws, and humans sit in judgment. However, legal standards are long established and can be used to place requirements on companies that produce, deploy, and use AI systems, Burt suggests.

  • Companies must “carefully monitor and document all their attempts to reduce algorithmic unfairness.”
  • They must also “generate clear, good faith justifications for using the models” that are at the heart of the AI systems they develop, use, or sell.

If these suggested standards were applied in a legal context, it could be shown whether a company had employed due diligence and acted responsibly. If the standards were written into law, companies that deploy unfair and discriminatory AI systems could be held liable and face penalties.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.

Comment moderation as a machine learning case study

Continuing my summary of the lessons in Introduction to Machine Learning from the Google News Initiative, today I’m looking at Lesson 5 of 8, “Training your Machine Learning model.” Previous lessons were covered here and here.

Now we get into the real “how it works” details — but still without looking at any code or computer languages.

The “lesson” (actually just a text) covers a common case for news organizations: comment moderation. If you permit people to comment on articles on your site, machine learning can be used to identify offensive comments and flag them so that human editors can review them.

With supervised learning (one of three approaches included in machine learning; see previous post here), you need labeled data. In this case, that means complete comments — real ones — that have already been labeled by humans as offensive or not. You need an equally large number of both kinds of comments. Creating this dataset of comments is discussed more fully in the lesson.

You will also need to choose a machine learning algorithm. Comments are text, obviously, so you’ll select among the existing algorithms that process language (rather than those that handle images and video). There are many from which to choose. As the lesson comes from Google, it suggests you use a Google algorithm.

In all AI courses and training modules I’ve looked at, this step is boiled down to “Here, we’ll use this one,” without providing a comparison of the options available. This is something I would expect an experienced ML practitioner to be able to explain — why are they using X algorithm instead of Y algorithm for this particular job? Certainly there are reasons why one text-analysis algorithm might be better for analyzing comments on news articles than another one.

What is the algorithm doing? It is creating and refining a model. The more accurate the final model is, the better it will be at predicting whether a comment is offensive. Note that the model doesn’t actually know anything. It is a computer’s representation of a “world” of comments in which some — with particular features or attributes perceived in the training data — are rated as offensive, and others — which lack a sufficient quantity of those features or attributes — are rated as not likely to be offensive.

The lesson goes on to discuss false positives and false negatives, which are possibly unavoidable — but the fewer, the better. We especially want to eliminate false negatives, which are offensive comments not flagged by the system.

“The most common reason for bias creeping in is when your training data isn’t truly representative of the population that your model is making predictions on.”

—Lesson 6, Bias in Machine Learning

Lesson 6 in the course covers bias in machine learning. A quick way to understand how ML systems come to be biased is to consider the comment-moderation example above. What if the labeled data (real comments) included a lot of comments offensive to women — but all of the labels were created by a team of men, with no women on the team? Surely the men would miss some offensive comments that women team members would have caught. The training data are flawed because a significant number of comments are labeled incorrectly.

There’s a pretty good video attached to this lesson. It’s only 2.5 minutes, and it illustrates interaction bias, latent bias, and selection bias.

Lesson 6 also includes a list of questions you should ask to help you recognize potential bias in your dataset.

It was interesting to me that the lesson omits a discussion of how the accuracy of labels is really just as important as having representative data for training and testing in supervised learning. This issue is covered in ImageNet and labels for data, an earlier post here.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.

Racial and gender bias in AI

Different AI systems do different things when they attempt to identify humans. Everyone has heard about face recognition (a k a facial recognition), which you might expect would return a name and other personal data about a person whose face is “seen” with a camera.

No, not always.

A system that analyzes human faces might simply try to return information about the person that you or I would tag in our minds when we see a stranger. The person’s gender, for example. That’s relatively easy to do most of the time for most humans — but it turns out to be tricky for machines.

Machines often get it wrong when trying to identify the gender of a trans person. But machines also misidentify the gender of people of color. In particular, they have a big problem recognizing Black women as women.

A short and good article about this ran in Time magazine in 2019, and the accompanying video is well worth watching. It shows various face recognition software systems at work.

Another serious problem concerns differentiating among people of Asian descent. When apartment buildings and other housing developments have installed face recognition as a security system — to open for residents and stay locked for others — the Asian residents can find themselves locked out of their own home. The doors can also open for Asian people who don’t live there.

You can find a lot of articles about this widespread and very serious problem with AI technology, including the deservedly famous mug shots test by the American Civil Liberties Union.

“While it is usually incorrect to make statements across algorithms, we found empirical evidence for the existence of demographic differentials in the majority of the face recognition algorithms we studied.”

—Patrick Grother, NIST computer scientist

So how does this happen? How do companies with almost infinite resources deploy products that are so seriously — and even dangerously — flawed?

Yesterday I wrote a little about training data for object-detection AI. To identify any image, or any part of an image, an AI system is usually trained on an immense set of images. If you want to identify human faces, you feed the system hundreds of thousands, or even millions, of pictures of human faces. If you’re using supervised learning to train the system, the images are labeled: Man, woman. Black, white. Old, young. Convicted criminal. Sex offender. Psychopath.

Who is in the images? How are those images labeled?

This is part of how the whole thing goes sideways. There’s more to it, though. Before a system is marketed, or released to the public, its developers are going to test it. They’re going to test the hell out of it. This can be compared with when an AI is developed that plays a particular game, like Go, or chess. After the system has been trained, you test it. To test the system, you’re going to have it play, and see if it can win — consistently. So when developers create a face recognition system, and they’ve tested it extensively, and they say, great, now it’s ready for the public, it’s ready for commercial use — ask yourself how they missed these glaring flaws.

Ask yourself how they missed the fact that the system can’t differentiate between various Asian faces.

Ask yourself how they missed the fact that the system identifies Black women as men.

Fortunately, in just the past year these flaws have received so much attention that a number of large firms (Amazon, IBM, Microsoft) have pulled back on commercial deployments of face recognition technologies. Whether they will be able to build more trustworthy systems remains to be seen.

More about bias in face recognition systems:

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.

.