Using machine learning to uncover racist laws

A common use of machine learning is to train a model to identify a particular kind of document, or a particular characteristic in a document — and then sort a gigantic set of documents. This produces a much-reduced subset of all documents that match the desired criteria. There might be some false positives in the subset, but it still gives researchers or journalists a big jump forward by eliminating thousands of unwanted documents.

This kind of sorting goes well beyond a simple search for keywords.
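To make the difference from keyword search concrete, here is a minimal sketch of a supervised text classifier, a tiny naive Bayes model written in plain Python. It is not the On the Books code (that is in the project's GitHub repo). The training sentences are invented toy examples, not text from the corpus, and the names `TRAINING`, `train`, and `score` are hypothetical.

```python
import math
from collections import Counter

# Invented toy examples, NOT drawn from the On the Books corpus.
# Label 1 = the kind of document we want to find, 0 = everything else.
TRAINING = [
    ("separate schools shall be maintained for each race", 1),
    ("separate waiting rooms shall be provided for each race", 1),
    ("the county shall levy a tax for road repair", 0),
    ("the board shall appoint a clerk for the county", 0),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def score(text, model):
    """Return the log-odds that the text belongs to label 1."""
    word_counts, doc_counts, vocab = model
    totals = {label: sum(word_counts[label].values()) for label in (0, 1)}
    log_odds = math.log(doc_counts[1] / doc_counts[0])
    for word in tokenize(text):
        if word not in vocab:
            continue  # words never seen in training carry no evidence
        # Add-one (Laplace) smoothing avoids dividing by zero.
        p1 = (word_counts[1][word] + 1) / (totals[1] + len(vocab))
        p0 = (word_counts[0][word] + 1) / (totals[0] + len(vocab))
        log_odds += math.log(p1 / p0)
    return log_odds


model = train(TRAINING)

# Sort a (tiny) pile of unlabeled documents into a reduced subset.
unlabeled = [
    "separate cars shall be provided for each race",
    "the county shall levy a tax for bridge repair",
]
flagged = [doc for doc in unlabeled if score(doc, model) > 0]
```

The model weighs every word in a document against what it saw in training, so no single keyword decides the outcome. A real project would need far more labeled examples, and people would still have to review the flagged subset for false positives.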

Above: Screenshot from On the Books at lib.unc.edu

A great example has emerged from the University of North Carolina at Chapel Hill. On the Books: Jim Crow and Algorithms of Resistance is a project that includes a public plain-text collection of North Carolina laws (1866–1967) likely to be Jim Crow laws.

There is a public GitHub repo of the code used in this project. It includes a full walkthrough of the project’s workflow — data acquisition and cleaning, OCR, unsupervised and supervised classification, etc.

The base document set (the main corpus) consists of 96 volumes containing 53,515 chapters and 297,790 sections.

The project’s title pays homage to Safiya Noble’s 2018 book Algorithms of Oppression: How Search Engines Reinforce Racism.

“State-based racial segregation laws were incredibly inconvenient, irregular, and, most importantly, unconstitutional.”

—William Sturkey, Ph.D.

A historical perspective on this data collection was provided by William Sturkey, a history professor at UNC, in “On the Books”: Machine Learning Jim Crow (September 2020). He says On the Books is “the first and most complete collection of all Jim Crow laws from a single American state.” He points to the difficulty of cataloging and studying all Jim Crow laws from any state “because there were just so many.”

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.


How might we regulate AI to prevent discrimination?

Discussions about regulation of AI, and algorithms in general, often revolve around privacy and misuse of personal data. Protections against bias and unfair treatment are also part of this conversation.

In a recent article in Harvard Business Review, lawyer Andrew Burt (who might prefer to be called a “legal engineer”) wrote about using existing legal standards to guide efforts at ensuring fairness in AI-based systems. In the United States, these include the Equal Credit Opportunity Act, the Civil Rights Act, and the Fair Housing Act.


Burt emphasizes the danger of unintentional discrimination, which can arise from basing the “knowledge” in the system on past data. You might think it would make sense to train an AI to do things the way your business has done things in the past — but if that means denying loans disproportionately to people of color, then you’re baking discrimination right into the system.

Burt linked to a post on the Google AI Blog that in turn links to a GitHub repo for a set of code components called ML-fairness-gym. The resource lets developers build a simulation to explore potential long-term impacts of a machine learning decision system — such as one that would decide who gets a loan and who doesn’t.

In several cases, long-term analysis via simulations showed adverse unintended consequences that arose from decisions made by ML. These are detailed in a paper by Google researchers. We can see that determining the true outcomes of using AI systems is not just a matter of feeding in the data and getting a reliable model to churn out yes/no decisions for a firm.
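The idea of simulating long-term effects can be illustrated with a toy model in plain Python. This sketch does not use ML-fairness-gym or its API, and it does not reproduce any result from the Google paper. The function name `simulate`, the update rule, and every number in it are assumptions made up for illustration.

```python
def simulate(start_rate, threshold, steps=20, gain=0.02, loss=0.06):
    """Track a group's average repayment rate under a fixed lending rule.

    All parameters are hypothetical. The lender issues loans only while
    the group's repayment rate is at or above `threshold`. Each round of
    lending nudges the rate up for repaid loans and down, more sharply,
    for defaults (an expected-value update with no randomness).
    """
    rate = start_rate
    history = [rate]
    for _ in range(steps):
        if rate >= threshold:
            rate += rate * gain - (1 - rate) * loss
        rate = min(1.0, max(0.0, rate))  # keep the rate a valid probability
        history.append(rate)
    return history


# Two hypothetical groups facing the same threshold of 0.65.
group_a = simulate(start_rate=0.70, threshold=0.65)
group_b = simulate(start_rate=0.80, threshold=0.65)

# Both groups qualify for loans at the start, but their paths diverge.
print(f"Group A: {group_a[0]:.2f} -> {group_a[-1]:.2f}")
print(f"Group B: {group_b[0]:.2f} -> {group_b[-1]:.2f}")
```

In this toy setup both groups pass the lending threshold on day one, yet the group that starts lower ends up worse off than where it began, because defaults cost more than repayments earn. A decision rule that looks reasonable for a single yes/no decision can still do harm over time, and that harm only shows up when the system is run forward.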

It makes me wonder about all the cheerleading and hype around “business solutions” offered by large firms such as Deloitte. Have those systems been tested for their long-term effects? Is there any guarantee of fairness toward the people whose lives will be affected by the AI system’s decisions?

And what is “fair,” anyway? Burt points out that statistical methods used to detect a disparate impact depend on human decisions about “what ‘fairness’ should mean in the context of each specific use case” — and also how to measure fairness.
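One widely used statistical check for disparate impact is the adverse impact ratio: one group's approval rate divided by another's. The sketch below is not Burt's method, only one measure among many. The loan decisions are invented, the function names are hypothetical, and the 0.8 cutoff is the "four-fifths" rule of thumb from U.S. employment guidelines, used here as an assumption.

```python
def selection_rate(decisions):
    """Share of applicants approved, where 1 = approved and 0 = denied."""
    return sum(decisions) / len(decisions)


def adverse_impact_ratio(group, reference_group):
    """Approval rate of one group relative to a reference group."""
    return selection_rate(group) / selection_rate(reference_group)


# Invented loan decisions for two hypothetical groups of 100 applicants.
group_a = [1] * 30 + [0] * 70   # 30% approved
group_b = [1] * 50 + [0] * 50   # 50% approved

ratio = adverse_impact_ratio(group_a, group_b)

# The 0.8 cutoff is a convention, not a law of nature.
FOUR_FIFTHS = 0.8
needs_review = ratio < FOUR_FIFTHS

print(f"Adverse impact ratio: {ratio:.2f}, needs review: {needs_review}")
```

Even this short calculation involves human choices: which groups to compare, which one serves as the reference, and where to set the cutoff.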

The same applies to the law — not only in how it is written but also in how it is interpreted. Humans write the laws, and humans sit in judgment. However, legal standards are long established and can be used to place requirements on companies that produce, deploy, and use AI systems, Burt suggests.

  • Companies must “carefully monitor and document all their attempts to reduce algorithmic unfairness.”
  • They must also “generate clear, good faith justifications for using the models” that are at the heart of the AI systems they develop, use, or sell.

If these suggested standards were applied in a legal context, it could be shown whether a company had employed due diligence and acted responsibly. If the standards were written into law, companies that deploy unfair and discriminatory AI systems could be held liable and face penalties.

