Three Duke computer science majors spent this summer advancing the quest for what some computer scientists call the Holy Grail of fact-checking.
Caroline Wang, Ethan Holland and Lucas Fagan tackled major challenges to creating an automated system that can both detect factual claims while politicians speak and instantly provide fact-checks.
That required finding and customizing state-of-the-art computing tools that most journalists would not recognize. A collective fondness for that sort of challenge helped a lot.
“We had a lot of fun discussing all the different algorithms out there, and just learning what machine learning techniques had been applied to natural language processing,” said Wang, a junior also majoring in math.
Wang and her partners took on the assignment for a Data+ research project. Part of the Information Initiative at Duke, Data+ invites students and faculty to find data-driven solutions to research challenges confronting scholars on campus.
The fact-checking team convened in a Gross Hall conference room from 9 am to 4 pm every weekday for 10 weeks, working together to figure out how to achieve live fact-checking, a goal of Knight journalism professor Bill Adair and other practitioners of accountability journalism.
Their goal was to produce a “rough cut” of end-to-end automated fact-checking: to convert a political speech to text, identify the most “checkable” sentences in the speech and then match them with previously published fact-checks.
The students concluded that Google Cloud Speech-to-Text API was the best available tool to automate audio transcriptions. They then submitted the sentences to ClaimBuster, a project at the University of Texas at Arlington that the Duke Tech & Check Cooperative uses to identify statements that merit fact-checking. ClaimBuster acted as a helpful filter that reduced the number of claims submitted to the database, which in turn reduced processing time.
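The filtering step described above can be sketched in a few lines of Python. This is a minimal illustration, not the students' code and not ClaimBuster's API: it assumes each sentence already carries a checkability score between 0 and 1, and the `filter_checkable` function, the 0.5 threshold and the sample scores are all hypothetical.

```python
from typing import List, Tuple


def filter_checkable(scored: List[Tuple[str, float]],
                     threshold: float = 0.5) -> List[str]:
    """Keep only sentences whose checkability score meets the threshold.

    The score scale (0 to 1) and the default threshold are assumptions
    made for this sketch.
    """
    return [sentence for sentence, score in scored if score >= threshold]


# Hypothetical sentences and scores, invented for illustration only.
scored_sentences = [
    ("Thank you all for coming tonight.", 0.05),
    ("Unemployment fell to 3.9 percent last year.", 0.82),
    ("We are going to win.", 0.20),
]

checkable = filter_checkable(scored_sentences)
print(checkable)
```

Only the sentences that pass the filter would go on to the database lookup, which is how a filter of this kind cuts processing time.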
They chose Google Cloud Speech-to-Text because it can infer where punctuation belongs, Holland said. That yields text divided into complete thoughts. The service also shares transcription results while it processes the audio, rather than waiting until transcription is done. That gets the new text to the next steps along a fact-checking pipeline faster.
“Google will say: This is my current take and this is my current confidence that take is right. That lets you cut down on the lag,” said Holland, a junior whose second major is statistics.
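A rough sketch shows how acting on interim results can cut lag. It simulates the stream of updates rather than calling Google's service, so the `Update` record, its field names, the `forward_early` function and the 0.9 confidence cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Update:
    """One simulated transcription update (hypothetical shape)."""
    segment_id: int    # which stretch of audio this update covers
    text: str          # the transcriber's current take
    confidence: float  # how sure it is that the take is right
    is_final: bool     # True once the transcriber stops revising


def forward_early(updates: Iterable[Update],
                  min_confidence: float = 0.9) -> Iterator[str]:
    """Pass each segment downstream once, as soon as it is confident or final."""
    already_sent = set()
    for update in updates:
        if update.segment_id in already_sent:
            continue  # this segment has already moved down the pipeline
        if update.is_final or update.confidence >= min_confidence:
            already_sent.add(update.segment_id)
            yield update.text


# Simulated updates, invented for illustration only.
stream = [
    Update(0, "Unemployment fell", 0.60, False),
    Update(0, "Unemployment fell to 3.9 percent last year.", 0.93, False),
    Update(0, "Unemployment fell to 3.9 percent last year.", 0.97, True),
    Update(1, "We are going to win.", 0.70, True),
]

forwarded = list(forward_early(stream))
print(forwarded)
```

The trade-off in a scheme like this is that a lower cutoff moves text along sooner but raises the chance of forwarding a take the transcriber later revises.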
Their next step was finding ways to match the claims from that speech with the database of fact-checks that came from the Lab’s Share the Facts project. (The database contains thousands of articles published by the Washington Post, FactCheck.org and PolitiFact, each checking an individual claim.)
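To make the matching problem concrete, here is a deliberately simple stand-in: it compares word counts with cosine similarity and returns the closest entry. This is not the neural approach the students adapted, and the function names and sample fact-check texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(claim: str, fact_checks: List[str]) -> Tuple[float, str]:
    """Return (similarity, text) for the fact-check closest to the claim."""
    claim_vec = Counter(tokenize(claim))
    scored = [(cosine(claim_vec, Counter(tokenize(fc))), fc) for fc in fact_checks]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical database entries, invented for illustration only.
fact_checks = [
    "The unemployment rate fell to 3.9 percent.",
    "Crime rose in every major city last year.",
]

score, match = best_match("Unemployment fell to 3.9 percent last year.", fact_checks)
print(round(score, 2), match)
```

Word overlap of this kind misses paraphrases that share meaning but few words, which is one reason a language model is attractive for the task.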
To do that, the students adapted an algorithm that the open-source research outfit OpenAI released in June, after the students started working together. The algorithm builds on The Transformer, a new neural network computing architecture that Google researchers published just six months prior.
The architecture changes how computers go about understanding written language. Instead of translating a sentence word by word, The Transformer weighs the importance of each word to the meaning of every other word. Over time that system helps machines discern meaning in more and more sentences more quickly.
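The idea of weighing every word against every other word can be sketched as scaled dot-product attention. This is a toy version under stated assumptions: the word vectors are invented, and the learned projections and multiple attention heads of the real architecture are left out.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors: List[List[float]]) -> List[List[float]]:
    """Row i holds how much word i attends to each word in the sentence."""
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights.append(softmax(scores))
    return weights


def attend(vectors: List[List[float]]) -> List[List[float]]:
    """Rebuild each word as a weighted mix of every word's vector."""
    dim = len(vectors[0])
    return [
        [sum(w * vec[i] for w, vec in zip(row, vectors)) for i in range(dim)]
        for row in attention_weights(vectors)
    ]


# Toy two-dimensional vectors for a three-word sentence (invented values).
sentence = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]

weights = attention_weights(sentence)
mixed = attend(sentence)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1, so every word's new representation is a blend of the whole sentence, tilted toward the words most similar to it.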
“It’s a lot more like learning English. You grow up hearing it and you learn it,” said Fagan, a sophomore also majoring in math.
Work by Wang, Holland and Fagan is expected to help jumpstart a Bass Connections fact-checking team that started this fall. Students on that team will continue the hunt for better strategies to find statements that are good fact-check candidates, produce pop-up fact-checks and create apps to deliver this accountability journalism to more people.
Tech & Check has $1.2 million in funding from the John S. and James L. Knight Foundation, the Facebook Journalism Project and the Craig Newmark Foundation to tackle that job.