Book notes: Atlas of AI, by Kate Crawford

Published earlier this year by Yale University Press, Atlas of AI carries the subtitle “Power, Politics, and the Planetary Costs of Artificial Intelligence.” This is a remarkably accurate subtitle — or maybe I should say the book fulfills the promise of the subtitle better than many other books do.

Planetary costs are explained in chapter 1, “Earth,” which discusses not only the environmentally destructive battery production required by both giant data centers and electric cars but also the immense electrical power needed to train large language models and other deep-learning systems. Extraction is a theme Crawford returns to more than once; here it’s the extraction of rare-earth minerals. Right away, the end notes make clear that this is no breezy “technology of the moment” nonfiction book; the wealth of cited works could feed my curiosity for years of reading.

Photo: Book cover and cat on a porch
Photo copyright © 2021 Mindy McAdams

Crawford comes back to the idea of depleting resources in the Coda, titled “Space,” which follows the book’s conclusion. There she discusses the mineral-extraction ambitions of Jeff Bezos (and other billionaires) as they build their own rockets — they don’t want only to fly into space for their own pleasure and amusement; they also want to pillage it like 16th- to 19th-century Europeans pillaged Africa and the Americas.

Politics are a focus in chapter 6, “State,” and in the conclusion, “Power” — politics not of any political party or platform but rather the politics of domination, of capitalism, of the massive financial resources of Bezos and Silicon Valley. Crawford lays the groundwork for these final chapters without rehearsing the same arguments in the earlier ones — a big peeve of mine with many books about the progress of technologies, where the author has told me the same thing so many times before the conclusion that I’m already bored with the ideas. That’s not what happened here.

Chapter 2, “Labor,” focuses on low pay, surveillance of workers, deskilling, and, in particular, time. It’s a bit of “how the sausage gets made,” which is nothing new to me because I’ve been interested for a while in how data gets labeled by a distributed global workforce. I like how Crawford frames it, in part, as not being about robots who will take our skilled jobs — in fact, that tired old trope is ignored in this book. The more real concern is that, like the minerals being extracted to feed the growing AI industrial complex, the labor of many, many humans is required to make that complex function. Workers’ time at work is increasingly monitored down to the second, and using analysis of massive datasets, companies such as Amazon can track and penalize anyone whose output falls below the optimum. The practice of “faking AI” with human labor is likened to Potemkin villages (see Sadowski, 2018), and we should wonder how many of those so-called AI-powered customer service systems (and even decision-support systems) are really “Potemkin AI.” (See also “The Automation Charade”: Taylor, 2018.) Crawford reminds us of the decades of time-and-motion research aimed at extracting more value from workers in factories and fast-food restaurants. This is a particularly rich chapter.

“Ultimately, ‘data’ has become a bloodless word; it disguises both its material origins and its ends.”

—Crawford, p. 113

In “Data,” the third chapter, Crawford looks at where images of faces have come from — the raw material of face recognition systems. Mug shots, of course, but also all those family photos that moms and dads have posted to social media platforms, scraped wholesale. This goes beyond face recognition to all the data about us that is collected, scraped, or bought and sold by the tech firms that build and profit from AI — data used to train systems that in turn can be used to monitor us and our lives. Once again, we’re looking at extraction. Crawford doesn’t discuss ImageNet as much as I expected here (which is fine; it comes around again in the next chapter). She covers the collection of voice data and the quantities of text needed to train large language models, detailing some earlier (1980s–90s) NLP efforts with which I was not familiar. In the section subheaded “The End of Consent,” Crawford covers various cases of the unauthorized capture or collection of people’s faces and images — it got me thinking about how the tech firms never ask permission; there is no informed consent. Another disturbing point about datasets and the AI systems that consume them: researchers might brush off criticism by saying they don’t know how their work will be used. (This and similar ethical concerns were detailed in a wonderful New Yorker article earlier this year.)

I’m not sure whether chapter 3 is the first time she mentions the commons, but she does, and it comes up again later. Even though publicly available data remains publicly available, she argues, its collection, mining, and classification concentrate its value in private hands. It’s not literally enclosure, but it’s as good as, she argues.

“Every dataset … contains a worldview.”

—Crawford, p. 135

Chapter 4, “Classification,” is very much about power. When you name a thing, you have power over it. When you assign labels to the items in a dataset, you exclude possible interpretations at the same time. Labeling images for race, ethnicity, or gender is as dangerous as labeling human skulls for phrenology. The ground truth is constructed, not pristine, and never free of biases. Here Crawford talks more about ImageNet and the language data, WordNet, on which it was built. I made a margin note here: “boundaries, boxes, centers/margins.” At the end of the chapter, Crawford points out that we can examine training datasets when they are made public, like the UTKFace dataset — but the datasets underlying systems being used on us today by Facebook, TikTok, Google, and Baidu are proprietary and therefore not open to scrutiny.
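The chapter’s point about labels excluding interpretations can be made concrete with a toy sketch (my own illustration, not from the book): a classifier built on a closed taxonomy must assign one of its labels to every input, even when none of them fits.

```python
def assign_label(scores, labels):
    """Pick the highest-scoring label -- a closed taxonomy offers
    no 'none of the above' and no way to contest the categories."""
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]

# Hypothetical taxonomy, fixed by the dataset's builders.
LABELS = ["happy", "sad", "angry"]

# The model is essentially unsure among all three classes ...
ambiguous_scores = [0.34, 0.33, 0.33]

# ... yet the output is a single, confident-looking label.
print(assign_label(ambiguous_scores, LABELS))  # happy
```

Whatever worldview the label set encodes, every input is forced into it — which is exactly the boundary-drawing power Crawford describes.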

The chapter I enjoyed most was “Affect,” chapter 5, because it covers lots of unfamiliar territory. A researcher named Paul Ekman (apparently widely known, but unknown to me) figures prominently in the story of how psychologists and others came to believe we can discern a person’s feelings and emotions from the expression on their face. At first you think, yes, that makes sense. But then you learn about how people were asked to “perform” an expression of happiness, or sadness, or fear, etc., and then photographs were made of them pulling those expressions. Based on such photos, machine learning models have been trained. Uh-oh! Yes, you see where this goes. But it gets worse. Based on your facial expression, you might be tagged as a potential shoplifter in a store. Or as a terrorist about to board a plane. “Affect recognition is being built into several facial recognition platforms,” we learn on page 153. Guess where early funding for this research came from? The U.S. Advanced Research Projects Agency (ARPA), back in the 1960s. Now called Defense Advanced Research Projects Agency (DARPA), this agency gets massive funding for research on ways to spy on and undermine the governments of other countries. Classifying types of facial expressions? Just think about it.

In chapter 6, “State,” which I’ve already mentioned, Crawford reminds us that what starts out as expensive, top-secret, high-end military technology later migrates to state and local governments and police, for use against a country’s own citizens. Much of this has to do with surveillance, and of course Edward Snowden and his leaked files are mentioned more than once. The ideas of threats and targets are discussed, recalling the chapter about classification. Crawford also brings up the paradox that huge multinationals (Amazon, Apple, Facebook, Google, IBM, Microsoft) suddenly transform into patriotic all-American firms when it comes to developing top-secret surveillance tech that we would not want to share with China, Iran, or Russia. Riiight. There’s a description of the DoD’s Project Maven (which Wired magazine covered in 2018), anchoring a discussion of drone targets. This chapter alerted me to an article titled “Algorithmic warfare and the reinvention of accuracy” (Suchman, 2020). It also includes a long section about Palantir, one of the creepier data/surveillance/intelligence companies (subject of a long Vox article in 2020). Lots about refugees, ICE, etc., in this chapter. Ring doorbell surveillance. Social credit scores — and not in China! It boils down to domestic eye-in-the-sky stuff: countries tracking their own citizens under the guise of safety and order but in fact setting up ways to deprive the poorest and most vulnerable people even further.

This book is short, only 244 pages before the end notes and reference list — but it’s very well thought-out and well focused. I wish more books about technology topics were this good, with real value in each chapter and a comprehensive conclusion at the end that brings it all together. Also — awesome references! I applaud all the research assistants!


Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.


Rules and ethics for use of AI by governments

The governments of British Columbia and Yukon, in Canada, have jointly issued a report (June 2021) about ethical use of AI in the public sector. It’s interesting to me as it covers issues of privacy and fairness, and in particular, the rights of people to question decisions derived from AI systems. The report notes that the public increasingly expects services provided by governments to be as fast and as personalized as services provided by online platforms such as Amazon — and this leads or will lead to increasing adoption of AI systems to aid in delivery of government services to members of the public.

The report’s concluding recommendations (pages 47–48) cover eight points (edited):

  1. Establish guiding principles for AI use: “Each public authority should make a public commitment to guiding principles for the use of AI that incorporate transparency, accountability, legality, procedural fairness and protection of privacy.”
  2. Inform the public: “If an ADS [automated decision system] is used to make a decision about an individual, public authorities must notify and describe how that system operates to the individual in a way that is understandable.”
  3. Provide human accountability: “Identify individuals within the public authority who are responsible for engineering, maintaining, and overseeing the design, operation, testing and updating of any ADS.”
  4. Ensure that auditing and transparency are possible: “All ADS should include robust and open auditing functionality with enhanced transparency measures for closed-source, proprietary datasets used to develop and update any ADS.”
  5. Protect privacy of individuals: “Wherever possible, public authorities should use synthetic or de-identified data in any ADS.” See synthetic data definition, below.
  6. Build capacity and increase education (for understanding of AI): This point covers “public education initiatives to improve general knowledge of the impact of AI and other emerging technologies on the public, on organizations that serve the public,” etc.; “subject-matter knowledge and expertise on AI across government ministries”; “knowledge sharing and expertise between government and AI developers and vendors”; development of “open-source, high-quality data sets for training and testing ADS”; “ongoing training of ADS administrators” within government agencies.
  7. Amend privacy legislation to include: “an Artificial Intelligence Fairness and Privacy Impact Assessment for all existing and future AI programs”; “the right to notification that ADS is used, an explanation of the reasons and criteria used, and the ability to object to the use of ADS”; “explicit inclusion of service providers to the same obligations as public authorities”; “stronger enforcement powers in both the public and private sector …”; “special rules or restrictions for the processing of highly sensitive information by ADS”; “shorter legislative review periods of 4 years.”
  8. Review legislation to make sure “oversight bodies are able to review AIFPIAs [see item 7 above] and conduct investigations regarding the use of ADS alone or in collaboration with other oversight bodies.”
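Recommendation 2’s notification duty suggests a structured disclosure attached to every automated decision. Here is a minimal sketch of what such a record might contain — the field names and example values are my own invention, not prescribed by the report:

```python
from dataclasses import dataclass

@dataclass
class ADSDecisionNotice:
    """Hypothetical disclosure a public authority might attach to a
    decision made (or assisted) by an automated decision system."""
    decision: str                    # the outcome communicated to the individual
    system_name: str                 # which ADS produced or assisted the decision
    criteria_used: list              # the reasons and criteria (recommendation 7)
    plain_language_explanation: str  # "understandable" account of how it works
    responsible_official: str        # human accountability (recommendation 3)
    how_to_object: str               # the route for contesting the decision

notice = ADSDecisionNotice(
    decision="Benefit application denied",
    system_name="EligibilityScreener v2",
    criteria_used=["declared income", "household size"],
    plain_language_explanation=(
        "The system compares your declared income against the program "
        "threshold for your household size."
    ),
    responsible_official="Program Director, Benefits Branch",
    how_to_object="Request a human review within 30 days.",
)
```

The point of the sketch is that notification, explanation, accountability, and the right to object (recommendations 2, 3, and 7) all become concrete fields that an agency must actually populate.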

Synthetic data is defined (on page 51) as: “A type of anonymized data used as a filter for information that would otherwise compromise the confidentiality of certain aspects of data. Personal information is removed by a process of synthesis, ensuring the data retains its statistical significance. To create synthetic data, techniques from both the fields of cryptography and statistics are used to render data safe against current re-identification attacks.”
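As a rough illustration of the synthesis idea — my own toy sketch, not the report’s method; real synthetic-data tools model joint structure and are tested against re-identification attacks — one can drop identifying fields and sample each numeric column from a distribution fitted to the real records:

```python
import random
import statistics

def synthesize(rows, numeric_cols, n, seed=0):
    """Toy synthesis: discard identifying fields and draw each numeric
    column from a normal distribution fitted to the real records.
    This preserves only per-column mean and spread, nothing more."""
    rng = random.Random(seed)
    fitted = {
        col: (statistics.mean(r[col] for r in rows),
              statistics.stdev(r[col] for r in rows))
        for col in numeric_cols
    }
    return [
        {col: rng.gauss(mu, sd) for col, (mu, sd) in fitted.items()}
        for _ in range(n)
    ]

real = [
    {"name": "A", "age": 34, "income": 52000},
    {"name": "B", "age": 41, "income": 61000},
    {"name": "C", "age": 29, "income": 48000},
]
fake = synthesize(real, ["age", "income"], n=5)
# fake rows carry no "name" field -- only age/income values drawn from
# distributions with roughly the real columns' statistics
```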

The report uses the term automated decision systems (ADS) in view of the Government of Canada’s Directive on Automated Decision Making, which defines them as: “Any technology that either assists or replaces the judgement of human decision-makers.”


New AI strategy from U.S. Department of Health and Human Services

The Biden Administration is working hard in a wide range of areas, so maybe it’s no surprise that HHS released this report, titled Artificial Intelligence (AI) Strategy (PDF), this month.

“HHS recognizes that Artificial Intelligence (AI) will be a critical enabler of its mission in the future,” it says on the first page of the 7-page document. “HHS will leverage AI to solve previously unsolvable problems,” in part by “scaling trustworthy AI adoption across the Department.”

So HHS is going to be buying some AI products. I wonder what they are (will be), and who makes (or will make) them.

“HHS will leverage AI capabilities to solve complex mission challenges and generate AI-enabled insights to inform efficient programmatic and business decisions” — while to some extent this is typical current business jargon, I’d like to know:

  • Which complex mission challenges? What AI capabilities will be applied, and how?
  • Which programmatic and business decisions? How will AI-enabled insights be applied?

These are the kinds of questions journalists will need to ask when these AI claims are bandied about. Name the system(s), name the supplier(s), give us the science. Link to the relevant research papers.

I think a major concern would be use of any technologies coming from Amazon, Facebook, or Google — but I am no less concerned about government using so-called solutions peddled by business-serving firms such as Deloitte.

Two executive orders (both from the previous administration) are cited in the HHS document.

The department will set up a new HHS AI Council to identify priorities and “identify and foster relationships with public and private entities aligned to priority AI initiatives.” The council will also establish a Community of Practice consisting of AI practitioners (page 5).

Four key focus areas:

  1. An AI-ready workforce and AI culture (includes “broad, department-wide awareness of the potential of AI”)
  2. AI research and development in health and human services (includes grants)
  3. “Democratize foundational AI tools and resources” — I like that, although implementation is where the rubber meets the road. This sentence indicates good aspirations: “Readily accessible tools, data assets, resources, and best practices will be critical to minimizing duplicative AI efforts, increasing reproducibility, and ensuring successful enterprise-wide AI adoption.”
  4. “Promote ethical, trustworthy AI use and development.” Again, a fine statement, but let’s see how they manage to put this into practice.

The four focus areas are summarized in a compact chart (image file).
