Some investigations in the public interest require journalists to search through large quantities of official documents. Often the set of documents is very diverse — that is, the format, structure, and even language of the documents might vary greatly.
One of the more impressive investigations I know of is the ongoing Implant Files project, originally conducted by 250 journalists in 36 countries. The purpose: to examine how medical devices (specifically, those implanted into human bodies) are “tested, approved, marketed, and monitored” (source). I’ve heard this project discussed at conferences, and I’m full of admiration for the editors and reporters involved, led by the International Consortium of Investigative Journalists (ICIJ).
At the heart of the investigation, with its first results published in 2018, was “an analysis of more than 8 million device-related health records, including death and injury reports and recalls.”
“The entire process involved text mining, clustering, feature selection, association rules and classification algorithms to identify events not always described consistently in different parts of the data.”—How ICIJ Used Machine Learning to Help Find Medical Device Issues
These implanted devices — hip replacements, defibrillators, breast implants, intraocular lenses, and more — are used all around the world. When something goes wrong and a product recall is issued, however, the news might not spread to all the locations where the devices continue to be used in new surgeries for new patients. Moreover, people who already have a faulty implant might not be notified. This is why a global investigation was sorely needed.
In 2018, ICIJ shared “a publicly searchable database of more than 70,000 recalls and safety warnings in 11 countries.” The project has continued since then, and the database now contains “more than 120,000 recalls, safety alerts and field safety notices” for medical devices. Throughout 2019, thousands more records were added.
A December 2018 post details the team’s data methodology for the Implant Files. First, journalists had to get the records — and often, their legitimate requests for public records were denied. Of the 8 million device-related records they managed to obtain, 5.4 million came from the U.S. Food and Drug Administration.
The records “describe cases where a device is suspected to have caused or contributed to a serious injury or death or has experienced a malfunction that would likely lead to harm if it were to recur.”
The value in these records was in the connections — connections among cases, and connections among devices. The ICIJ analysis concluded that “devices that broke, misfired, corroded, ruptured or otherwise malfunctioned after implantation or use were linked to more than 1.7 million injuries and nearly 83,000 deaths” in just one decade.
To identify the records that involved a patient’s death, humans first had to work out the various terms and phrasings used in place of the word “death” in the documents. Eventually they developed “a set of more than 3,400 key phrases” that were used to train the machine learning system. After that model extracted the relevant records, those records were run through another algorithm configured to determine whether the implanted device had contributed to the death.
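To make the key-phrase idea concrete, here is a minimal Python sketch of the first step only: flagging records whose text contains a phrase that may stand in for “death.” It is not ICIJ’s method. Their phrases were used to train a machine learning system, while this sketch uses plain pattern matching. The phrase list, the record layout, and the function names are all assumptions made up for illustration.

```python
import re

# Hypothetical phrases for illustration only; NOT ICIJ's actual list,
# which contained more than 3,400 key phrases.
DEATH_PHRASES = ["passed away", "expired", "deceased", "did not survive"]


def build_pattern(phrases):
    """Compile one case-insensitive pattern that matches any phrase as whole words."""
    # Longer phrases first, so the longest alternative is tried before shorter ones.
    escaped = sorted((re.escape(p) for p in phrases), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_possible_deaths(records, pattern):
    """Return the records whose free-text field contains at least one phrase."""
    return [r for r in records if pattern.search(r["text"])]


# Invented sample records; the "id" and "text" fields are assumed names.
records = [
    {"id": 1, "text": "Patient passed away two days after the procedure."},
    {"id": 2, "text": "Device was explanted; patient recovered fully."},
    {"id": 3, "text": "The patient EXPIRED following lead fracture."},
]

pattern = build_pattern(DEATH_PHRASES)
flagged = flag_possible_deaths(records, pattern)
print([r["id"] for r in flagged])  # prints [1, 3]
```

Simple matching like this would also flag a report that says a device’s warranty “expired,” which suggests why phrase spotting alone is not enough and why a further step was needed to judge whether the device actually contributed to a death.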
AI in Media and Society by Mindy McAdams is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Include the author’s name (Mindy McAdams) and a link to the original post in any reuse of this content.