AI colonizes the world

I began at the beginning with journalist Karen Hao’s “Artificial intelligence is creating a new colonial world order” (April 2022), an introduction to a four-part series that explains the effects of AI with a focus on specific countries.

“The more users a company can acquire for its products, the more subjects it can have for its algorithms, and the more resources — data — it can harvest from their activities, their movements, and even their bodies,” Hao wrote. Humans are also exploited for cheap labor, such as labeling data for AI training sets, “often in the Global South.” The ultimate aim for the series, she said, is “to broaden the view of AI’s impact on society so as to begin to figure out how things could be different.”

Links to fellow travelers on this road (from the article):

South Africa

In “South Africa’s private surveillance machine is fueling a digital apartheid,” Hao and co-author Heidi Swart report on high-speed network infrastructure spreading into areas that lack basic necessities such as clean drinking water. Why? All the better to spy on the citizens 24/7 with cameras connected to AI systems, using tools “like license plate recognition to track population movement and trace individuals.” And face recognition? Maybe. Maybe not yet. (Face recognition is addressed near the end of the article.)

“When AI is ‘developed in Europe and America and all of these places,’ says Kyle Dicks, a Johannesburg-based sales engineer for Axis Communications, ‘often South Africa is the place to put them to the test.’”

An AI system originally developed for military use is trained on video footage of so-called normal behavior in an area and then deemed fit to alert human employees to “unusual” activity. The humans can dismiss the alert or escalate it. This is all taking place within a private company. Clients include “schools, businesses, and residential neighborhoods,” which are patrolled by private security firms.

Tracking cars by their license plates can be done outside any police systems, and the journalists raise the question of transparency: Who reported the car, and why? Once the license plate is in the system, when and how does it ever get removed? (The U.S. already has “a massive network of license plate readers.”)

Crime rates are high in South Africa, but that is associated with an immense wealth gap, which in turn is associated with race. “As a result, it’s predominantly white people who have the means to pay for surveillance, and predominantly Black people who end up without a say about being surveilled.” The choice to increase and invest in surveillance does nothing to address the causes of poverty.

This was news to me: “The likelihood that facial recognition software will make a false identification increases dramatically when footage is recorded outdoors, under uncontrolled conditions …” Although this was not a surprise: “… and that risk is much greater for Black people.” (Murray Hunter is researching Vumacam, the private security firm hosting much of the surveillance apparatus in South Africa: “Vumacam’s model is, in the most literal sense, a tech company privatizing the public space.”)

My main takeaway here was that technologies of oppression will be deployed, tested and perfected in developing countries that are not experiencing war or military actions — and then used everywhere else. Moreover, by allowing private companies unregulated access to footage from a network of cameras they control, we compromise privacy and invite a multitude of risks.

Venezuela

Because of its economic collapse, once-rich Venezuela has become a primary source of workers who label data for use in supervised machine learning. “Most profit-maximizing algorithms, which underpin e-commerce sites, voice assistants, and self-driving cars, are based on” this type of deep learning, which requires correctly labeled data for training a system that then “recognizes” objects, images, phrases, hate speech, sounds, etc. In “How the AI industry profits from catastrophe,” Hao and Andrea Paola Hernández explain how data annotation is just another form of exploitative gig work.

“The Venezuela example made so clear how it’s a mixture of poverty and good infrastructure that makes this type of phenomenon possible. As crises move around, it’s quite likely there will be another country that could fulfill that role.”

—Florian Alexander Schmidt, professor, University of Applied Sciences HTW Dresden

Labeling dashboard-camera video as training data for self-driving cars pushed the business of data annotation to expand in 2017, as it requires not only millions of hours but also “the highest levels of annotation accuracy” because of the life-or-death consequences of errors. Scale AI (founded in 2016) profited from the demand for quality, devising and refining systems that maximize the output of remote contract workers. Other companies capitalized on the crisis in Venezuela sooner, according to this article, but Scale AI was not far behind.

Appen — a firm that recruits data annotators for Google, YouTube, and Facebook — presents the worker with a queue of tasks ranging from “image tagging to content moderation to product categorization.” Tasks are divided into units and labeled with a (very low) payment per unit. As tasks are completed, the payments add up in an electronic wallet. Appen “adjusts its pay per task to the minimum wage of each worker’s locale,” according to the company’s chief technology officer. The workers supply their own laptop and internet service without compensation.

With the pandemic and an increasing number of Venezuelans competing for tasks on Appen, more people signed onto Remotasks Plus, a platform controlled by Scale AI, which was recruiting aggressively on social media. (The article also mentions Hive Micro, “the easiest service to join, [but] it offers the most disturbing work — such as labeling terrorist imagery — for the most pitiful pay.”)

The article describes bait-and-switch tactics, and retaliation against workers who protest, that will be familiar to anyone who has followed labor reporting about Uber and Lyft over the past few years. The Remotasks Plus platform was also plagued with technical problems and finally shut down, leaving some workers unpaid, according to the article. Scale AI continues to operate its standard Remotasks platform, which has its own problems.

The irony is that this poorly paid work done by the data annotators is essential to AI systems that in turn are sold or licensed for very high fees. Of the four articles in this series, this is the one that shows the most similarities to the corvée labor system under colonial regimes, which extracted the wealth from so many places around the world, put it into the hands of Europeans, and shared none of it with the workers who made it all possible.

Indonesia

Gojek, a ride-hailing firm employing drivers of motorbikes as well as cars, is the focus of “The gig workers fighting back against the algorithms,” by Hao and Nadine Freischlad. The motorbikes are everywhere in Jakarta; they deliver food and packages as well as ferrying passengers on the seat behind the driver.

“[A] growing chorus of experts have noted how platform companies have paralleled the practices of colonial empires in using management tools to surveil and exploit a broad base of cheap labor. But the experience of Jakarta’s drivers could reveal a new playbook for resistance” — in part because the drivers always tended to gather in face-to-face groups in between rides, eating and smoking together at roadside food stalls while awaiting the next call. The article calls these gathering places “base camps.”

Gojek driver fist-bumps with ojek driver in front of Universitas Indonesia
Photo by Tommy Wahyu Utomo on Flickr; CC BY-NC 2.0

Informal organization among Gojek drivers has produced communities that, with the help of social media platforms such as Twitter, share information and support drivers outside the structure of the Gojek platform — which is all about squeezing the most work out of them at the lowest cost. The ubiquitous WhatsApp and Telegram groups of Indonesia contribute to the flow of driver-shared information. This trend is being studied by various scholars, including computational social scientist Rida Qadri, who wrote about it for Vice in April 2021. Indonesian scholars have also published articles on the topic.

Beyond sharing tips and tricks, and even responding to drivers’ requests for roadside assistance, the drivers also use unauthorized apps to hack the Gojek system in various ways (at the risk of losing their driver accounts). As the drivers stand up for themselves, Gojek corporate has taken some steps to reach out to them — even visiting base camps in person to seek feedback.

From this article I learned that organizing/uniting makes even gig workers more powerful and better able to combat exploitation by platform companies, and that hacks can be used to subvert the platform’s apps (although the companies are continually finding and plugging the “holes” that make the hacks possible).

New Zealand

In “A new vision of artificial intelligence for the people,” Hao details an attempt to preserve and revive te reo, the Māori language, in New Zealand. As with many indigenous languages, use of te reo declined as the colonizers (in this case, British) forced local people to use the colonizers’ language instead of their own. Languages die out as children grow up not hearing their own language.

A key to the AI language efforts is a Māori radio station, Te Hiku Media, based in the small town of Kaitaia near the northern tip of the North Island. The station has a 20-year archive of te reo audio recordings. By digitizing the audio files, the project can offer access to Māori people anywhere in the world. Beyond that, accurate transcriptions of the audio could eventually make it possible to get good automated transcription of te reo audio. If a large enough corpus of transcribed te reo existed, then a good-quality language model could be created (via the usual AI processes), and good-quality automated translation would be possible.

There was a problem, though: finding enough people who are fluent enough to transcribe the very fluent speech in the Te Hiku recordings. The solution is fabulous: “rather than transcribe existing audio, they would ask people to record themselves reading a series of sentences designed to capture the full range of sounds in the language. … From those thousands of pairs of spoken and written sentences, [an algorithm] would learn to recognize te reo syllables in audio.” A cash prize was offered to whichever group or team submitted the most recordings.

“Within 10 days, Te Hiku amassed 310 hours of speech-text pairs from some 200,000 recordings made by roughly 2,500 people …”

Although thousands of hours would normally be needed, it was a decent start. The group’s first te reo speech-recognition model tested out with an 86 percent accuracy score.
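
To put the quoted figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the three numbers from the article, all of which are approximate (“some,” “roughly”), so the results are rough averages and not anything Te Hiku reported.

```python
# Figures quoted in the article; all three are approximate.
hours_of_speech = 310
recordings = 200_000
contributors = 2_500

# Average length of a single recorded sentence, in seconds.
seconds_per_recording = hours_of_speech * 3600 / recordings

# Average number of recordings each contributor submitted.
recordings_per_person = recordings / contributors

# Average amount of speech each contributor supplied, in minutes.
minutes_per_person = hours_of_speech * 60 / contributors

print(f"{seconds_per_recording:.2f} seconds per recording")
print(f"{recordings_per_person:.0f} recordings per person")
print(f"{minutes_per_person:.2f} minutes of speech per person")
```

On average, then, each recording was a sentence of about five and a half seconds, and each person contributed about 80 of them, or roughly seven and a half minutes of speech.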

This article introduced me to the concept of data sovereignty: when indigenous people own and control their own data (see research by Tahu Kukutai, professor at University of Waikato). If a group like Te Hiku released their language data to an outside party, even without ceding ownership, the data could be used in a manner that goes against Māori values and principles. Te Hiku offers APIs through Papa Reo, “a multilingual language platform grounded in indigenous knowledge and ways of thinking and powered by cutting edge data science” (Papa Reo website). Te Hiku has created a data license in an attempt to ensure that Māori values are respected and any profit is shared back to the Māori people.

For other Pacific Island languages that share common roots with te reo, Te Hiku’s te reo language model can provide a leg up toward training their own unique language models.

This is one of the best AI articles I’ve read lately, as I learned a number of new things from it.


Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.
