Examples of machine learning in journalism

Following on from yesterday’s post, today I looked at more lessons in Introduction to Machine Learning from the Google News Initiative. (Friday AI Fun posts will return next week.)

The separation of machine learning into three different approaches — supervised learning, unsupervised learning, and reinforcement learning — is standard (Lesson 3). In keeping with the course’s focus on journalism applications of ML, the example given for supervised learning is The Atlanta Journal-Constitution‘s deservedly famous investigative story about sex abuse of patients by doctors. Supervised learning was used to sort more than 100,000 disciplinary reports on doctors.

The example of unsupervised learning is one I hadn’t seen before. It’s an investigation of short-term rentals (such as Airbnb rentals) in Austin, Texas. The investigator used locality-sensitive hashing (LSH) to group property records in a set of about 1 million documents, looking for instances of tax evasion.

The main example given for reinforcement learning is AlphaGo (previously covered in this blog), but an example from The New York TimesHow The New York Times Is Experimenting with Recommendation Algorithms — is also offered. Reinforcement learning is typically applied when a clear “reward” can be identified, which is why it’s useful in training an AI system to play a game (winning the game is a clear reward). It can also be used to train a physical robot to perform specified actions, such as pouring a liquid into a container without spilling any.

Also in Lesson 3, we find a very brief description of deep learning (it doesn’t mention layers and weights). and just a mention of neural networks.

“What you should retain from this lesson is fairly simple: Different problems require different solutions and different ML approaches to be tackled successfully.”

—Lesson 3, Different approaches to Machine Learning

Lesson 4, “How you can use Machine Learning,” might be the most useful in this set of eight lessons. Its content comes (with permission) from work done by Quartz AI Studio — specifically from the post How you’re feeling when machine learning might help, by the super-talented Jeremy B. Merrill.

The examples in this lesson are really good, so maybe you should just read it directly. You’ll learn about a variety of unusual stories that could only be told when journalists used machine learning to augment their reporting.

“Machine learning is not magic. You might even say that it can’t do anything you couldn’t do — if you just had a thousand tireless interns working for you.”

—Lesson 4, How you can use Machine Learning

(The Quartz AI Studio was created with a $250,000 grant from the Knight Foundation in 2018. For a year the group experimented, helped several news organizations produce great work, and ran a number of trainings for journalists. Then it was quietly disbanded in early 2020.)

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.


Google’s machine learning ‘course’ for journalists

I couldn’t resist dipping into this free course from the Google News Initiative, and what I found surprised me: eight short lessons that are available as PDFs.

The good news: The lessons are journalism-focused, and they provide a painless introduction to the subject. The bad news: This is not really a course or a class at all — although there is one quiz at the end. And you can get a certificate, for what it’s worth.

There’s a lot here that many journalists might not be aware of, and that’s a plus. You get a brief, clear description of Reuters’ News Tracer and Lynx Insight tools, both used in-house to help journalists discover new stories using social media or other data (Lesson 1). A report I recall hearing about — how automated real-estate stories brought significant new subscription revenue to a Swedish news publisher — is included in a quick summary of “robot reporting” (also Lesson 1).

Lesson 2 helpfully explains what machine learning is without getting into technical operations of the systems that do the “learning.” They don’t get into what training a model entails, but they make clear that once the model exists, it is used to make predictions. The predictions are not like what some tarot-card reader tells you but rather probability-based results that the model is able to produce, based on its prior training.

Noting that machine learning is a subset of the wider field called artificial intelligence is, of course, accurate. What is inaccurate is the definition “specific applications that use data to train a model to perform a given task independently and learn from experience.” They left out Q-learning, a type of reinforcement learning (a subset of machine learning), which does not use a model. It’s okay that they left it out, but they shouldn’t imply that all machine learning requires a trained model.

The explosion of machine learning and AI in the past 10 years is explained nicely and concisely in Lesson 2. The lesson also touches on misconceptions and confusion surrounding AI:

“The lack of an officially agreed definition, the legacy of science-fiction, and a general low level of literacy on AI-related topics are all contributing factors.”

—Lesson 2, Is Machine Learning the same thing as AI?

I’ll be looking at Lessons 3 and 4 tomorrow.

Creative Commons License
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.