Rules and ethics for use of AI by governments

The governments of British Columbia and Yukon, in Canada, have jointly issued a report (June 2021) about the ethical use of AI in the public sector. It’s interesting to me because it covers issues of privacy and fairness, and in particular the rights of people to question decisions derived from AI systems. The report notes that the public increasingly expects services provided by governments to be as fast and as personalized as services provided by online platforms such as Amazon — and that expectation is driving, or will drive, increasing adoption of AI systems to help deliver government services to members of the public.

The report’s concluding recommendations (pages 47–48) cover eight points (edited):

  1. Establish guiding principles for AI use: “Each public authority should make a public commitment to guiding principles for the use of AI that incorporate transparency, accountability, legality, procedural fairness and protection of privacy.”
  2. Inform the public: “If an ADS [automated decision system] is used to make a decision about an individual, public authorities must notify and describe how that system operates to the individual in a way that is understandable.”
  3. Provide human accountability: “Identify individuals within the public authority who are responsible for engineering, maintaining, and overseeing the design, operation, testing and updating of any ADS.”
  4. Ensure that auditing and transparency are possible: “All ADS should include robust and open auditing functionality with enhanced transparency measures for closed-source, proprietary datasets used to develop and update any ADS.”
  5. Protect privacy of individuals: “Wherever possible, public authorities should use synthetic or de-identified data in any ADS.” See synthetic data definition, below.
  6. Build capacity and increase education (for understanding of AI): This point covers “public education initiatives to improve general knowledge of the impact of AI and other emerging technologies on the public, on organizations that serve the public,” etc.; “subject-matter knowledge and expertise on AI across government ministries”; “knowledge sharing and expertise between government and AI developers and vendors”; development of “open-source, high-quality data sets for training and testing ADS”; “ongoing training of ADS administrators” within government agencies.
  7. Amend privacy legislation to include: “an Artificial Intelligence Fairness and Privacy Impact Assessment for all existing and future AI programs”; “the right to notification that ADS is used, an explanation of the reasons and criteria used, and the ability to object to the use of ADS”; “explicit inclusion of service providers to the same obligations as public authorities”; “stronger enforcement powers in both the public and private sector …”; “special rules or restrictions for the processing of highly sensitive information by ADS”; “shorter legislative review periods of 4 years.”
  8. Review legislation to make sure “oversight bodies are able to review AIFPIAs [see item 7 above] and conduct investigations regarding the use of ADS alone or in collaboration with other oversight bodies.”

Synthetic data is defined (on page 51) as: “A type of anonymized data used as a filter for information that would otherwise compromise the confidentiality of certain aspects of data. Personal information is removed by a process of synthesis, ensuring the data retains its statistical significance. To create synthetic data, techniques from both the fields of cryptography and statistics are used to render data safe against current re-identification attacks.”

The report uses the term automated decision systems (ADS) in view of the Government of Canada’s Directive on Automated Decision-Making, which defines an ADS as: “Any technology that either assists or replaces the judgement of human decision-makers.”


Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.


Book notes: Hello World, by Hannah Fry

I finished reading this book back in April, and I’d like to revisit it before I read a couple of new books I just got. This was published in 2018, but that’s no detriment. The author, Hannah Fry, is a “mathematician, science presenter and all-round badass,” according to her website. She’s also a professor at University College London. Her bio at UCL says: “She was trained as a mathematician with a first degree in mathematics and theoretical physics, followed by a PhD in fluid dynamics.”

The complete title, Hello World: Being Human in the Age of Algorithms, doesn’t make it sound like a book about artificial intelligence. She refers to control, and “the boundary between controller and controlled,” from the very first pages, and this reflects the link between “just” talking about algorithms and talking about AI. Software is made of algorithms, and AI is made of software, so there we go.

In just over 200 pages and seven chapters simply titled Power, Data, Justice, Medicine, Cars, Crime, and Art, Fry organizes the primary areas of concern around the question “Are we in control?” and provides examples in each area.

Power. I felt disappointed when I saw this chapter starts with Deep Blue beating world chess champion Garry Kasparov in 1997 — but my spirits soon lifted as I saw how she framed this example to show that the way we perceive a computer system affects how we interact with it (shades of Sherry Turkle and Reeves & Nass). She discusses machine learning and image recognition here, briefly. She talks about people trusting GPS map directions and search engines. She explains a 2012 ACLU lawsuit involving Medicaid assistance, bad code, and unwarranted trust in code. Intuition tells us when something seems “off,” and that’s a critical difference between us and the machines.

Algorithms “are what makes computer science an actual science.”

—Hannah Fry, p. 8

Data. Sensibly, this chapter begins with Facebook and the devil’s bargain most of us have made in giving away our personal information. Fry talks about the first customer loyalty cards at supermarkets. The pregnant teenager/Target story is told. In explaining how data brokers operate, Fry describes how companies buy access to you via your interests and your past behaviors (not only online). She summarizes a 2017 DEFCON presentation that showed how supposedly anonymous browsing data is easily converted into real names, and the dastardly Cambridge Analytica exploit. I especially liked how she explains how small the effects of newsfeed manipulation are likely to be (based on research) and then adds — a small margin might be enough to win an election. This chapter wraps up with China’s citizen rating system (Black Mirror in reality) and the toothlessness of GDPR.

Justice. First up is inequality in sentences for crimes, using two U.K. examples. Fry then surveys studies where multiple judges ruled on the same hypothetical cases and inconsistencies abounded. Then the issues with sentencing guidelines (why judges need to be able to exercise discretion). So we arrive at calculating the probability that a person will “re-offend”: the risk assessment. Fry includes a nice, simple decision-tree graphic here. She neatly explains the idea of combining multiple decision trees into an ensemble that averages the results of all the trees (the random forest algorithm is one example). More examples from research; the COMPAS product and the 2016 ProPublica investigation. This leads to a really nice discussion of bias (pp. 65–71 in the U.S. paperback edition).
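
To make the ensemble idea concrete, here is a minimal sketch in Python (using scikit-learn) of a random forest trained on invented risk-assessment features. The feature names, data, and labels are all hypothetical; this illustrates the technique, not the COMPAS model or anything from the book.

```python
# Toy illustration of the ensemble idea Fry describes: many decision trees,
# each trained on a random slice of the data, averaged into one risk score.
# The features and labels below are invented for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Hypothetical features: age, number of prior offenses, months since last offense
X = np.column_stack([
    rng.integers(18, 70, n),   # age
    rng.poisson(1.5, n),       # prior offenses
    rng.integers(0, 120, n),   # months since last offense
])
# Invented "re-offended within two years" labels, loosely tied to prior offenses
y = (X[:, 1] + rng.normal(0, 1, n) > 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 100 decision trees; each tree sees a random sample of the data,
# and the forest combines their votes into a single probability
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("Accuracy on held-out data:", model.score(X_test, y_test))
print("Risk score for one person:", model.predict_proba(X_test[:1])[0, 1])
```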

Medicine. Although image recognition was mentioned very briefly earlier, here Fry gets more deeply into the topic, starting off with the idea of pattern recognition — and what pattern, exactly, is being recognized? Classifying and detecting anomalies in biopsy slides doesn’t have perfect results when humans do it, so this is one of the promising frontiers for machine learning. Fry describes neural networks here. She gets into specifics about a system trained to detect breast cancer. But image recognition is not necessarily the killer app for medical diagnosis. Fry describes a study of 678 nuns (which previously I’d never heard about) in which it was learned that essays the nuns had written before taking vows could be used to predict which nuns would have dementia later in life. The idea is that an analysis of more data about women (not only their mammograms) could be a better predictor of malignancy.

“Even when our detailed medical histories are stored in a single place (which they often aren’t), the data itself can take so many forms that it’s virtually impossible to connect … in a way that’s useful to an algorithm.”

—Hannah Fry, p. 103

The Medicine chapter also mentions IBM Watson; challenges with labeling data; diabetic retinopathy; the lack of coordination among hospitals, doctors’ offices, etc., that leads to missed clues; and the privacy of medical records. Fry zeroes in on DNA data in particular, noting that all those “find your ancestors” companies now have a goldmine of data to work with. Fry ends with a caution about profit — whatever medical systems might be developed in the future, there will always be people who stand to gain and others who will lose.
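
For readers curious what the image-classifying neural networks described above look like in code, here is a minimal sketch of a small convolutional classifier in Keras. The input size, layers, and two-class setup are arbitrary choices for illustration, not the breast-cancer system Fry discusses.

```python
# A minimal, generic convolutional classifier for two-class image patches.
# Every architectural choice here is a placeholder for illustration.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(128, 128, 3)),        # small RGB image patches
    layers.Conv2D(16, 3, activation="relu"),  # learn local visual patterns
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),    # probability of the positive class
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then be: model.fit(images, labels, epochs=..., validation_split=0.2)
```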

Cars. I’m a little burned out on the topic of self-driving cars, having already read a lot about them. I liked that Fry starts with DARPA and the U.S. military’s longstanding interest in autonomous vehicles. I can’t agree with her that “the future of transportation is driverless” (p. 115). After discussing LiDAR and the flaws of GPS and conflicting signals from different systems in one car, Fry takes a moment to explain Bayes’ theorem, saying it “offers a systematic way to update your belief in a hypothesis on the basis of evidence,” and giving a nice real-world example of probabilistic inference. And of course, the trolley problem. She brings up something I don’t recall seeing before: Humans are going to prank autonomous vehicles. That opens a whole ‘nother box of trouble. Her anecdote under the heading “The company baby” leads to a warning: Always flying on autopilot can have unintended consequences when the time comes to fly manually.
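
Fry’s description of Bayes’ theorem translates directly into a few lines of arithmetic. The sketch below uses invented sensor numbers in the spirit of her probabilistic-inference example; none of the values come from the book.

```python
# Bayes' theorem: update the belief in a hypothesis H after seeing evidence E.
#   P(H | E) = P(E | H) * P(H) / P(E)
# Hypothetical numbers: a car believes there is a 1% chance it has drifted
# out of its lane (H). A sensor that flags lane departure fires (E).
p_h = 0.01              # prior: chance of lane drift before the reading
p_e_given_h = 0.95      # sensor fires when drift is real
p_e_given_not_h = 0.05  # false-alarm rate

# Total probability of seeing the evidence at all
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Posterior belief after one reading
p_h_given_e = p_e_given_h * p_h / p_e
print(f"Belief after one sensor reading: {p_h_given_e:.2%}")   # about 16%

# A second, independent reading updates the belief again
prior = p_h_given_e
p_e2 = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
print(f"Belief after a second reading: {p_e_given_h * prior / p_e2:.2%}")  # about 78%
```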

Crime. This chapter begins with a compelling anecdote, followed by a neat historical case from France in the 1820s, and then turns to predictive policing and all its woes. I hadn’t read about the balance between the buffer zone and distance decay in tracking serial criminals, so that was interesting — it’s called the geoprofiling algorithm. I also didn’t know about Jack Maple, a New York City police officer, and his “Charts of the Future” depicting stations of the city’s subway system, which evolved into a data tool named CompStat. I enjoyed learning what burglaries and earthquakes have in common. And then — PredPol. There have been thousands of articles about this since its debut in 2011, as Fry points out. Her summary of the issues related to how police use predictive policing data is quite good, compact and clear. PredPol is one specific product, and not the only one. It is, Fry says, “a proprietary algorithm, so the code isn’t available to the public and no one knows exactly how it works” (p. 157).

“The [PredPol] algorithm can’t actually tell the future. … It can only predict the risk of future events, not the events themselves — and that’s a subtle but important difference.”

—Hannah Fry, p. 153
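
Returning to the geoprofiling idea mentioned above: the buffer-zone and distance-decay intuition can be shown with a toy scoring function. This is an invented simplification for illustration, not the actual geoprofiling algorithm.

```python
import numpy as np

# Hypothetical crime locations on an arbitrary x/y grid (units are made up)
crimes = np.array([[2.0, 3.0], [4.0, 1.0], [3.5, 4.5]])

def plausibility(home, crimes, buffer_radius=1.0, decay=1.5):
    """Toy score: how plausible is `home` as the offender's base?

    Inside the buffer zone the score is low (offenders rarely strike right
    next to home); beyond the buffer, the score decays with distance.
    """
    d = np.maximum(np.linalg.norm(crimes - home, axis=1), 1e-9)
    inside_buffer = d < buffer_radius
    scores = np.where(inside_buffer,
                      d / buffer_radius,              # rises toward the buffer edge
                      (buffer_radius / d) ** decay)   # decays beyond the buffer
    return scores.sum()

candidates = {
    "central spot, near all three crimes": np.array([3.0, 3.0]),
    "right next to crime #1": np.array([2.1, 3.0]),
    "far edge of town": np.array([0.5, 5.5]),
}
for name, location in candidates.items():
    print(f"{name}: {plausibility(location, crimes):.2f}")
```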

Face recognition is covered in the Crime chapter, which makes perfect sense. Fry offers a case where a white man was arrested based on an incorrect identification from CCTV footage of a bank robbery. The consequences of being the person arrested by police can be injury or death, as we all know — not to mention the legal expenses as you try to clear your name after the erroneous arrest. Even though accuracy rates are rising, the chances that you will match a face that isn’t yours remain worrying.

“How do you decide on that trade-off between privacy and protection, fairness and safety?”

—Hannah Fry, p. 172

Art. Here we have “a famous experiment” I’d never heard of — Music Lab, where thousands of music fans logged into a music player app, listened to songs, rated them, and chose what to download (back when we downloaded music). The results showed that for all but the very best and very worst songs, the ratings by other people had a huge influence on what was downloaded in different segments of the app. A song that became a massive hit in one “world” was dead and buried in another. This leads us to recommendation engines such as those used by Netflix and Amazon. Predicting how well movies would do at the box office turned out to be highly unreliable. The trouble is the lack of an objective measure of quality — it’s not “This is cancer/This is not cancer.” Beauty in the eye of the beholder and all that. A recommendation engine is different because it’s not using a quality score — it’s matching similarity. You liked these 10 movies; I like eight of those; chances are I might like the other two.
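
The similarity matching Fry describes can be sketched in a few lines. The movie titles and “liked” sets below are invented; the point is only that the score measures overlap between tastes, not quality.

```python
# Minimal sketch of similarity-based recommendation: no objective quality score,
# just overlap between what two people liked. Titles and sets are invented.
liked = {
    "me":  {"Alien", "Arrival", "Heat", "Clue", "Her", "Up", "Jaws", "Brazil"},
    "you": {"Alien", "Arrival", "Heat", "Clue", "Her", "Up", "Fargo", "Akira",
            "Rocky", "Tron"},
}

def jaccard(a, b):
    """Similarity = size of the overlap divided by size of the combined set."""
    return len(a & b) / len(a | b)

similarity = jaccard(liked["me"], liked["you"])
print(f"Taste similarity: {similarity:.2f}")   # 6 shared out of 12 total -> 0.50

# Recommend to "me" whatever "you" liked that "me" hasn't seen yet;
# the similarity score is how much weight those suggestions deserve.
suggestions = sorted(liked["you"] - liked["me"])
print("Worth trying:", suggestions, f"(weight {similarity:.2f})")
```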

Fry goes on to discuss programs that create original (or seemingly original) works of art. A system may produce a new musical or visual composition, but it doesn’t come from any emotional basis. It doesn’t indicate a desire to communicate with others, to touch them in any way.

In her Conclusion, Fry returns to the questions about bias, fairness, mistaken identity, privacy — and the idea of the control we give up when we trust the algorithms. People aren’t perfect, and neither are algorithms. Taking the human consequences of machine errors into account at every stage is a step toward accountability. Building in the capability to backtrack and explain decisions, predictions, and outputs is a step toward transparency.

For details about categories of algorithms based on tasks they perform (prioritization, classification, association, filtering; rule-based vs. machine learning), see the Power chapter (pp. 8–13 in the U.S. paperback edition).


Symbolic AI: Good old-fashioned AI

The distinction between symbolic (explicit, rule-based) artificial intelligence and subsymbolic (e.g. neural networks that learn) artificial intelligence was somewhat challenging to convey to non–computer science students. At first I wasn’t sure how much we needed to dwell on it, but as the semester went on and we got deeper into the differences among types of neural networks, it was very useful to keep reminding the students that many of the things neural nets are doing today would simply be impossible with symbolic AI.

The difficulty lies in the shallow math/science background of many communications students. They might have studied logic problems/puzzles, but their memory of how those problems work might be very dim. Most of my students have not learned anything about computer programming, so they don’t come to me with an understanding of how instructions are written in a program.

This post by Ben Dickson at his TechTalks blog offers a very nice summary of symbolic AI, which is sometimes referred to as good old-fashioned AI (or GOFAI, pronounced GO-fie). This is the AI from the early years of AI, and early attempts to explore subsymbolic AI were ridiculed by the stalwart champions of the old school.

The requirement of symbolic AI is that someone — or several someones — must be able to specify all the rules necessary to solve the problem. This isn’t always possible, and even when it is, the result might be too verbose to be practical. As many people have said, things that are easy for humans are hard for computers — like recognizing an oddly shaped chair as a chair, or distinguishing a large upholstered chair from a small couch. Things we do almost without thinking are very hard to encode into rules a computer can follow.

“Symbolic artificial intelligence is very convenient for settings where the rules are very clear cut, and you can easily obtain input and transform it into symbols.”

—Ben Dickson

Subsymbolic AI does not use symbols, or rules that need symbols. It stems from attempts to write software operations that mimic the human brain. Not copy the way the brain works — we still don’t know enough about how the brain works to do that. Mimic is the word usually used because a subsymbolic AI system is going to take in data and form connections on its own, and that’s what our brains do as we live and grow and have experiences.

Dickson uses an image-recognition example: How would you program specific rules to tell a symbolic system to recognize a cat in a photo? You can’t write rules like “Has four legs,” or “Has pointy ears,” because it’s a photo. Your rules would need to be about pixels and edges and clusters of contrasting shades. Your rules would also need to account for infinite variations in photos of cats.

“You can’t define rules for the messy data that exists in the real world.”

—Ben Dickson

Thus “messy” problems such as image recognition are ideally handled by neural networks — subsymbolic AI.

Problems that can be drawn as a flow chart, with every variable accounted for, are well suited to symbolic AI. But scale is always an issue. Dickson mentions expert systems, a classic application of symbolic AI, and notes that “they require a huge amount of effort by domain experts and software engineers and only work in very narrow use cases.” On top of that, the knowledge base is likely to require continual updating.
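
As a toy illustration of this kind of flow-chart logic, here is a minimal forward-chaining rule engine in Python. The rules and facts are invented for illustration and have nothing to do with any real expert system.

```python
# A toy symbolic system: every rule is written explicitly by a person.
# It works only because the inputs are already clean symbols ("has_fever"),
# not messy pixels. Rules and facts are invented for illustration.
RULES = [
    ({"has_fever", "has_cough"}, "suspect respiratory infection"),
    ({"has_fever", "has_rash"}, "suspect measles"),
    ({"suspect respiratory infection", "short_of_breath"}, "recommend chest X-ray"),
]

def infer(facts):
    """Forward chaining: keep firing rules until nothing new can be concluded."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(infer({"has_fever", "has_cough", "short_of_breath"}))
# -> includes "suspect respiratory infection" and "recommend chest X-ray"
```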

An early, much-praised expert system (called MYCIN) was designed to help doctors choose treatments for patients with severe bacterial infections. In spite of years of investment, it remained a research project — an experimental system. It was never sold to hospitals or clinics, and it was never used in day-to-day clinical practice by doctors diagnosing patients.

“I have never done a calculation of the number of man-years of labor that went into the project, so I can’t tell you for sure how much time was involved … it is such a major chore to build up a real-world expert system.”

—Edward H. Shortliffe, principal developer of the MYCIN expert system (source)

Even though expert systems are impractical for the most part, there are other useful applications for symbolic AI. Dickson mentions “efforts to combine neural networks and symbolic AI” near the end of his post. He points out that symbolic systems are not “opaque” the way neural nets are — you can backtrack through a decision or prediction and see how it was made.


The trouble with large language models

Yesterday I summarized the first two articles in a series about algorithms and AI by Hayden Field, a technology journalist at Morning Brew. Today I’ll finish out the series.

The third article, This Powerful AI Technique Led to Clashes at Google and Fierce Debate in Tech. Here’s Why, explores the basis of the volatile situation around the firing of Timnit Gebru and later Margaret Mitchell from Google’s Ethical AI unit earlier this year. Both women are highly respected and experienced AI researchers. Mitchell founded the team in 2017.

Central to the situation is a criticism of large language models and a March 2021 paper (On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?) co-authored by Gebru, Mitchell, and two researchers at the University of Washington. The biggest current example is GPT-3, previously covered in several posts here.

“Models this big require an unthinkable amount of data; the entirety of English-language Wikipedia makes up just 0.6% of GPT-3’s training data.”

—”This Powerful AI Technique Led to Clashes at Google and Fierce Debate in Tech. Here’s Why”

The Morning Brew article sums up the very recent and very big improvements in large language models that have come about thanks to new algorithms and faster computer hardware (GPUs running in parallel). It highlights BERT, “the model that now underpins Google Search,” which builds on the Transformer architecture introduced by Google researchers in 2017. A good at-the-time article about GPT-3’s release was published in July 2020 in MIT’s Technology Review: “OpenAI first described GPT-3 in a research paper published in May [2020].”
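
For anyone who wants to poke at a large language model directly, here is a minimal sketch using the Hugging Face transformers library to ask a pre-trained BERT model to fill in a blank. It assumes the transformers package and a backend such as PyTorch are installed; the example sentence is mine, not from the article.

```python
# Minimal sketch: ask a pre-trained BERT model to fill in a masked word.
# Requires the Hugging Face `transformers` package plus a backend (e.g., PyTorch);
# the model weights download on first use.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The reporter filed the [MASK] before the deadline."):
    print(f'{prediction["token_str"]:>12}  score={prediction["score"]:.3f}')
```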

One point being — Google fired Timnit Gebru very soon after news and discussion of large language models (GPT-3 especially, but remember Google’s investment in BERT too) ramped up — way up. Her criticism of a previously obscure AI technology (not obscure among NLP researchers, but in the wider world) might have been seen as increasingly inconvenient for Google. Morning Brew summarizes the criticism (not attributed to Gebru): “Because large language models often scrape data from most of the internet, racism, sexism, homophobia, and other toxic content inevitably filter in.”

“Once the barrier to create AI tools and generate text is lower, people could just use it to create misinformation at scale, and having that data coupled with certain other platforms can just be a very disastrous situation.”

—Sandhini Agarwal, AI policy researcher, OpenAI

The Morning Brew article goes well beyond Google’s dismissal of Gebru and Mitchell, bringing in a lot of clear, easy-to-understand explanation of what large language models require (for example, significant energy resources), what they’re being used for, and even the English-centric nature of such models — lacking a gigantic corpus of digitized text in a given human language, you can’t create a large model in that language.

The turmoil in Google’s Ethical AI unit is covered in more detail in this May 2021 article, also by Hayden Field.

It’s easy to find articles that discuss “scary things GPT-3 can do and does” and especially the bias issues; it’s much harder to find information about some of the other aspects covered here. It’s also not just about GPT-3. I appreciated insights from an interview with Emily M. Bender, first author on the “Stochastic Parrots” article. I also liked the explicit statement that many useful NLP tasks can be done well without a large language model. In smaller datasets, finding and accounting for toxic content can be more manageable.

“Do we need this at all? What’s the actual value proposition of the technology? … Who is paying the environmental price for us doing this, and is this fair?”

—Emily M. Bender, professor and director, Professional MS in Computational Linguistics, University of Washington

Finally, in a recap of Morning Brew’s “Demystifying Algorithms” event, editor Dan McCarthy summarized two AI researchers’ answers to one of my favorite questions: What can an algorithm actually know?

An AI system’s ability to generalize — to transfer learning from one domain to another — is still a wide-open frontier, according to Mark Riedl, a computer science professor at Georgia Tech. This is something I remind my students of over and over — what’s called “general intelligence” is still a long way off for artificial intelligence. Riedl works on aspects of storytelling to test whether an AI system is able to “make something new” out of what it has ingested.

Saška Mojsilović, head of Trusted AI Foundations at IBM Research, made a similar point — and also emphasized that “narrow AI” (which is all the AI we’ve ever had, up to now and for the foreseeable future) is not nothing.

She suggested: “We may want to take a pause from obsessing over artificial general intelligence and maybe think about how we create AI solutions for these kinds of problems” — for example, narrow domains such as drug discovery (e.g. new antibiotics) and creation of new molecules. These are extraordinary accomplishments within the capabilities of today’s AI.

The embedded video is a half-hour conversation with those two experts.

Thanks to the video, I learned about the Lovelace 2.0 Test, which Riedl developed in 2014. It’s an alternative to the Turing Test.

Mojsilović talked about the perceptions that arise when we use the word intelligence when talking about machines. “The reality is that many things that we call AI today are the same old models that we used to call data science maybe five or six years ago,” she said (at 21:55). She also talked about the need for collaboration between AI researchers and experts in entirely separate fields: “Because we can’t create solutions for the problems that we don’t understand” (at 29:24).


Summary of the challenges facing algorithms, AI

Hayden Field, a technology journalist at Morning Brew, published a series of articles about algorithms and AI earlier this year, and they’ve been on my TBR list.

First up was Nine Experts on the Single Biggest Obstacle Facing AI and Algorithms in the Next Five Years. Experts: Drago Anguelov (Waymo); Kathy Baxter (Salesforce); David Cox (IBM Watson); Natasha Crampton (Microsoft); Mark Diaz (Ethical AI at Google); Charles Isbell (professor and dean, College of Computing, Georgia Institute of Technology); Peter Lofgren (Stripe); Andrew Ng (co-founder and former head, Google Brain); Cathy O’Neil (author, Weapons of Math Destruction).

Predictably, ethics was noted as a big challenge — O’Neil asked what we will do about unfairness in decisions made by algorithms. Diaz pointed to the need for involving “experts from a wide range of disciplines, including non-technical disciplines,” in the development process, long before an end product emerges. This intersects with ethics and fairness, as the absence of experts and stakeholders opens the door wide to omissions and errors. Baxter was explicit about systemic racism that is embedded in both training data and models. She listed “medical care decisions, hiring recommendations, access to housing and social programs, visa application approvals, school exam results, hate speech detection, dynamic pricing algorithms for ride hailing services, and even dating apps” — as well as face recognition and predictive policing.

“In essence, problems that are not purely technical require solutions that are not purely technical.”

—Mark Diaz, Ethical AI at Google

Isbell spoke of systematic solutions that can be widely applied. “We cannot treat minority groups as exceptions and edge cases,” he said. Cox highlighted transparency and explainability, as well as ethics and bias. He also alluded to adversarial attacks as well as the non-adversarial errors that surprise researchers (possibly due to overfitting). He grouped all this under trust. Crampton also focused on fairness and referred to diversity in teams, similar to Diaz’s and Isbell’s concerns.

Anguelov explained the need for reliable simulations so that systems can scale up to real-world use. He’s talking about the Long Tail problem: the real world throws up too many unexpected situations. Simulations allow testing in ways that don’t risk human lives (think self-driving cars). Lofgren also talked about scale, but in terms of personalization — his example is detecting credit card fraud in real time by using Big Data to flag abusive IP addresses and then drilling down to the individual cards being used. Ng talked about the difficulty of making dependable commercial AI products — basically off-the-shelf solutions.

“We will often need to make hard decisions based on competing priorities, including decisions to not build or deploy a system for certain purposes.”

—Natasha Crampton, Microsoft

Second in the series is titled Amex’s Fraud Detection AI Was Ready to Go Live. Then Covid Hit. This article starts with the idea that large AI models in the field will still need adjustments as unforeseen problems crop up. This echoes the concerns about scale raised by Anguelov and Lofgren in the first article in the series.

The challenge thrown by COVID-19 was that all existing models had been developed and adjusted in a non-pandemic world. Then the world changed.

Amex’s fraud-detecting systems are a blend of old-school rule-based systems and newer machine learning techniques. A team of about 30 decision scientists monitors the system round-the-clock and updates it when necessary, at least once a year. The pandemic came at a bad time for Amex, just as they were rolling out a new model.

“Since each generation of a gradient-boosting ML model is typically developed on data from earlier that same year, many of the model’s assumptions no longer made sense” in 2020.

—”Amex’s Fraud Detection AI Was Ready to Go Live. Then Covid Hit”

This is a really interesting article — although I’d read others about issues caused for AI models by pandemic changes, most of those had to do with either healthcare or travel.
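
To picture what a gradient-boosting model of the kind quoted above looks like in skeleton form, here is a minimal sketch with invented transaction features. It is a generic illustration of the technique, not Amex’s model or data.

```python
# Skeleton of a gradient-boosting fraud classifier with invented features;
# a generic illustration of the technique, not Amex's actual system.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 5000

# Hypothetical transaction features: amount, hour of day, distance from home,
# and whether the merchant has been seen before on this card
X = np.column_stack([
    rng.exponential(80, n),   # amount in dollars
    rng.integers(0, 24, n),   # hour of day
    rng.exponential(20, n),   # km from cardholder's home
    rng.integers(0, 2, n),    # known merchant (0/1)
])
# Fake fraud labels, loosely tied to large amounts at unfamiliar merchants
y = ((X[:, 0] > 300) & (X[:, 3] == 0)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Boosting builds trees one after another, each correcting the previous ones
model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

print("Held-out accuracy:", model.score(X_test, y_test))
print("Fraud probability for one transaction:", model.predict_proba(X_test[:1])[0, 1])
```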

Because of increased online traffic in 2020 — more people online, every day, as the pandemic drove work-from-home and stay-at-home schooling — demands on Amazon Web Services (providing servers and processing power to millions of commercial clients such as Amex) grew enormously. This “dwindling cloud capacity” meant testing new solutions for Amex’s model took much longer than usual. The team had to run new simulations that took our new way of life into account, and those simulations required lots of processor juice.

In the end, Amex’s rollout was successful — but it came months later than originally planned. This was a really neat case study and could be discussed in a lot of different contexts.

I’m going to look at the other articles in the series in tomorrow’s post.
