When fact-checking technologists and journalists gather in Durham for the 2019 Tech & Check Conference this month, they will share new tools intended to optimize and automate fact-checking.
For Dan Schultz, one founder of the Bad Idea Factory software development collective, this will be a chance to debut a “mannequin” version of the Talking Point Tracker. Created in collaboration with the Duke Tech & Check Cooperative, the tracker is intended to “capture the zeitgeist” of television news by identifying trending topics.
Duke journalism professor Bill Adair, who runs Tech & Check, launched the project by asking Schultz how fact-checkers could capture hot topics on TV news as quickly as possible. That is a simple but powerful idea. TV news is a place of vast discourse, where millions of viewers watch traditional, nonpartisan newscasts and partisan broadcasters such as Sean Hannity and Rachel Maddow. Listening in would give insight into what Schultz calls a “driver or predictor of collective consciousness.”
But executing even simple ideas can be difficult. In this case, TV news programs broadcast dense flows of media: audio, video, text and images that are not simple to track. Luckily, network and cable news outlets produce closed-caption subtitles for news shows. Talking Point Tracker scans those subtitles to identify the keywords used most frequently within blocks of time. It also puts the keywords in context by showing sentences and longer passages where the keywords were found. To deepen the context, the tracker shows related keywords that often appear with the trending words.
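The three views described above (counts per block of time, sentences for context, and related keywords) can be sketched in a few lines of Python. This is only an illustration, not the Tracker's code: the sample captions are invented, the function names are hypothetical, the one-hour block size is an assumption, and the keywords are assumed to have been extracted already.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Invented sample rows: (air time, caption sentence, keywords already found in it).
captions = [
    (datetime(2019, 3, 1, 20, 5), "The Virginia legislature met today.", ["Virginia"]),
    (datetime(2019, 3, 1, 20, 40), "Congress and Virginia are at odds.", ["Congress", "Virginia"]),
    (datetime(2019, 3, 1, 21, 10), "Congress returns next week.", ["Congress"]),
]

def trending_by_hour(rows):
    """Count keyword mentions inside each one-hour block (block size is an assumption)."""
    counts = defaultdict(Counter)
    for aired, _sentence, keywords in rows:
        hour = aired.replace(minute=0, second=0, microsecond=0)
        counts[hour].update(keywords)
    return counts

def context_for(rows, keyword):
    """Return the sentences in which a keyword appears."""
    return [sentence for _aired, sentence, keywords in rows if keyword in keywords]

def related_to(rows, keyword):
    """Count the other keywords that share a sentence with the given keyword."""
    related = Counter()
    for _aired, _sentence, keywords in rows:
        if keyword in keywords:
            related.update(k for k in keywords if k != keyword)
    return related
```

Calling `trending_by_hour(captions)` gives one counter per hour, while `context_for` and `related_to` supply the surrounding sentences and the co-occurring keywords for any single term.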
The eventual goal is to group keywords into clusters that better capture emerging conversations. “Our hope is that it will be a useful tool for journalists who want to write in the context of what’s being discussed,” said Schultz, who is collaborating with Justin Reese, a front-end developer with the Bad Idea Factory, on the project.
More technically, Talking Point Tracker runs closed-caption transcripts through a natural language processing pipeline that cleans the text as well as it can. An application programming interface, or API, then uses a separate language processing algorithm to find the most common keywords. These are “named entities,” usually proper nouns that can be sorted into categories such as people, organizations and locations.
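As a rough illustration of that two-step flow, the sketch below cleans a transcript and then counts candidate keywords. It is not the Tracker's pipeline: the regular expression is a crude stand-in for a real named entity model, the function names are hypothetical, and unlike a real model it cannot assign categories and will also pick up ordinary capitalized words at the start of sentences.

```python
import re
from collections import Counter

def clean(text):
    """Collapse the runs of whitespace left over from caption line breaks."""
    return re.sub(r"\s+", " ", text).strip()

# Crude stand-in for named entity recognition: runs of capitalized words.
# A real model would also label each entity with a category.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")

def top_keywords(text, n=3):
    """Return the n most frequent capitalized phrases in the cleaned text."""
    return Counter(CANDIDATE.findall(clean(text))).most_common(n)
```

In a production system the `CANDIDATE` pattern would be replaced by a trained model; the cleaning and counting steps around it would stay much the same.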
Talking Point Tracker’s prototype, to be unveiled at Tech & Check, is dense with information. But the design Reese created for viewing on a computer screen makes it readable. There’s enough white space to be easy on the eyes and a color scheme of red, blue, black and yellow that organizes text.
The most frequent keywords over a specified time period are listed in a column on the left. Next to that is a line graph that highlights their frequency. Sentences in which the keywords appear are listed on the right. If you click there, the tool points you to longer passages of transcripts. On the bottom are related keywords that often appear in the same sentences as a given word.
Moving from a mannequin stage to a living stage for this project will be challenging, Schultz said. As much as natural language processing has evolved over the past decade, algorithms still have trouble understanding aspects of human language. One free, open-source system the Tracker relies on is a library called spaCy. But programs like spaCy don’t always recognize the same thing when it is stated differently, such as the “Virginia legislature” and the “Virginia General Assembly.”
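One common workaround for that kind of mismatch is an alias table that maps known variants to a single canonical name before counting. The sketch below assumes a hand-built table and hypothetical names; the article does not say whether the Tracker takes this approach.

```python
from collections import Counter

# Hypothetical alias table; a real one would have to be curated or learned.
ALIASES = {
    "virginia legislature": "Virginia General Assembly",
    "virginia general assembly": "Virginia General Assembly",
}

def canonical(entity):
    """Map an entity to its canonical name if an alias is known, else return it unchanged."""
    key = " ".join(entity.lower().split())  # ignore case and extra spaces
    return ALIASES.get(key, entity)

def merged_counts(mentions):
    """Count mentions after folding known aliases together."""
    return Counter(canonical(m) for m in mentions)
```

With this in place, mentions of the “Virginia legislature” and the “Virginia General Assembly” are tallied under one name, while unknown entities pass through untouched.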
Another challenge is coping with the quality of news show transcripts, Schultz said. The transcripts can contain many typos, in addition to sometimes being either all caps or all lowercase, which the API can have trouble reading.
And the API doesn’t always know where sentences break. Too often, the system will return sentences that contain just “Mr.” because it concludes that a period signifies the end of the sentence. To get around this, Schultz is using another NLP technology to clean the transcripts he obtains.
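The “Mr.” problem can be illustrated with a simple splitter that refuses to end a sentence on a known abbreviation. This is a minimal sketch under stated assumptions: the abbreviation list and function name are hypothetical, and the article does not identify which technology Schultz actually uses for this step.

```python
# Small, hypothetical abbreviation list; real transcripts would need a longer one.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "sen.", "rep.", "gov."}

def split_sentences(text):
    """Split after tokens ending in ., ! or ?, unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that lacks final punctuation
        sentences.append(" ".join(current))
    return sentences
```

A splitter that breaks on every period would return “Mr.” as a sentence of its own; here the title stays attached to the name that follows it.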
To prepare for the Tech & Check Conference, Schultz is building better searching tools and further cleaning up the Tracker’s design. “It’s always good to have your feet close to the fire,” Schultz said.
The biggest question he hopes to get answered before leaving is whether Talking Point Tracker could be useful for journalists, he said.
“There’s a lot of things we can gain from feedback. If we have the capacity and interest from whoever, we will continue to iterate and build on top of that,” Schultz said.