Tag: Squash

The lessons of Squash, our groundbreaking automated fact-checking platform

Squash began as a crazy dream.

Soon after I started PolitiFact in 2007, readers began suggesting a cool but far-fetched idea. They wanted to see our fact checks pop up on live TV.

That kind of automated fact-checking wasn’t possible with the technology available back then, but I liked the idea so much that I hacked together a PowerPoint of how it might look. It showed a guy watching a campaign ad when PolitiFact’s Truth-O-Meter suddenly popped up to indicate the ad was false.

Bill Adair’s original depiction of pop-up fact-checking.

It took 12 years, but our team in the Duke University Reporters’ Lab managed to make the dream come true. Today, Squash (our code name for the project, chosen because it is a nutritious vegetable and a good metaphor for stopping falsehoods) has been a remarkable success. It displays fact checks seconds after politicians utter a claim and it largely does what those readers wanted in 2007.

But Squash also makes lots of mistakes. It converts politicians’ speech to the wrong text (often with funny results) and it frequently stays idle because there simply aren’t enough claims that have been checked by the nation’s fact-checking organizations. It isn’t quite ready for prime time.

As we wrap up four years on the project, I wanted to share some of our lessons to help developers and journalists who want to continue our work. There is great potential in automated fact-checking and I’m hopeful that others will build on our success.

When I first came to Duke in 2013 and began exploring the idea, it went nowhere. That’s partly because the technology wasn’t ready and partly because I was focused on the old way that campaign ads were delivered — through conventional TV. That made it difficult to isolate ads the way we needed to.

But the technology changed. Political speeches and ads migrated to the web and my Duke team partnered with Google, Jigsaw and Schema.org to create ClaimReview, a tagging system for fact-check articles. Suddenly we had the key elements that made instant fact-checking possible: accessible video and a big database of fact checks.

I wasn’t smart enough to realize that, but my colleague Mark Stencel, the co-director of the Reporters’ Lab, was. He came into my office one day and said ClaimReview was a game changer. “You realize what you’ve done, right? You’ve created the magic ingredient for your dream of live fact-checking.” Um … yes! That had been my master plan all along!

Fact-checkers use the ClaimReview tagging system to indicate the person and claim being checked, which not only helps Google highlight the articles in search results, it also makes a big database of checks that Squash can tap.
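For readers who have not seen the markup, here is a rough picture of what a ClaimReview record carries. The field names follow the public Schema.org ClaimReview type; every value is an invented placeholder, and the `summarize` helper is a hypothetical illustration, not part of Squash.

```python
import json

# Illustrative ClaimReview-style record. Field names follow the Schema.org
# ClaimReview type; every value below is a made-up placeholder.
claim_review = {
    "@context": "https://schema.org",
    "@type": "ClaimReview",
    "url": "https://example.org/fact-checks/sample",
    "datePublished": "2020-01-01",
    "claimReviewed": "Example claim text goes here.",
    "author": {"@type": "Organization", "name": "Example Fact-Checker"},
    "itemReviewed": {
        "@type": "Claim",
        "author": {"@type": "Person", "name": "Example Politician"},
    },
    "reviewRating": {"@type": "Rating", "alternateName": "False"},
}


def summarize(record):
    """Return (speaker, claim, rating) from a ClaimReview-style dict."""
    speaker = record["itemReviewed"]["author"]["name"]
    claim = record["claimReviewed"]
    rating = record["reviewRating"]["alternateName"]
    return speaker, claim, rating


print(json.dumps(summarize(claim_review)))
```

The person, the claim and the rating are exactly the pieces an automated tool needs in order to look up a fact check and show a short summary.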

It would be difficult to overstate the technical challenge we were facing. No one had attempted this kind of work beyond doing a demo, so there was no template to follow. Fortunately we had a smart technical team and some generous support from the Knight Foundation, Craig Newmark and Facebook.

Christopher Guess, our wicked-smart lead technologist, had to invent new ways to do just about everything, combining open-source tools with software that he built himself. He designed a system to ingest live TV and process the audio for instant fact-checking. It worked so fast that we had to slow down the video.

To reduce the massive amount of computer processing, a team of students led by Duke computer science professor Jun Yang came up with a creative way to filter out sentences that did not contain factual claims. They used ClaimBuster, an algorithm developed at the University of Texas at Arlington, to act like a colander that kept only good factual claims and let the others drain away.
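The colander idea can be sketched in a few lines. This is only an illustration: `toy_claim_score` is a hypothetical stand-in that rewards numbers and comparison words, not the ClaimBuster algorithm, and the threshold and sample sentences are invented.

```python
import re


def toy_claim_score(sentence):
    """Hypothetical stand-in for a claim-scoring model (NOT ClaimBuster).

    Returns a score in [0, 1]; digits and comparison words raise it.
    """
    score = 0.0
    if re.search(r"\d", sentence):
        score += 0.5
    if re.search(r"\b(percent|million|billion|more|fewer|highest|lowest)\b",
                 sentence.lower()):
        score += 0.5
    return score


def colander(sentences, threshold=0.5, scorer=toy_claim_score):
    """Keep only sentences whose score meets the threshold."""
    return [s for s in sentences if scorer(s) >= threshold]


# Invented example sentences.
sentences = [
    "Thank you all for being here tonight.",
    "Unemployment fell to 4 percent last year.",
    "God bless America.",
]
kept = colander(sentences)
print(kept)
```

Only the middle sentence survives, so the more expensive matching step runs once instead of three times.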

Squash works by converting audio to text and then matching the claim against a database of fact-checks.

Today, this is how Squash works: It “listens” to a speech or debate and sends audio clips to Google Cloud, where they are converted to text. That text is then run through ClaimBuster, which identifies sentences the algorithm believes are good claims to check. Those sentences are compared against the database of published fact checks to look for matches. When one is found, a summary of that fact check pops up on the screen.
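The flow described above (listen, transcribe, filter, match, display) can be sketched as a small pipeline. Everything here is an assumption for illustration: the three step functions are toy stand-ins, not Google Cloud, ClaimBuster or Squash's real matcher, and the transcripts and fact-check summary are invented.

```python
def run_pipeline(audio_clips, transcribe, is_checkable, find_match):
    """Yield a pop-up payload for each clip that produces a match.

    transcribe:   clip -> text         (speech-to-text step)
    is_checkable: text -> bool         (claim-detection step)
    find_match:   text -> dict or None (lookup in published fact checks)
    """
    for clip in audio_clips:
        text = transcribe(clip)
        if not is_checkable(text):
            continue  # not a factual claim, so skip the lookup
        match = find_match(text)
        if match is not None:
            yield {"heard": text, "related_fact_check": match["summary"]}


# Toy stand-ins so the sketch runs end to end.
fake_transcripts = {
    "clip1": "good evening everyone",
    "clip2": "we built 500 miles of wall",
}
fact_checks = [
    {"keywords": {"wall", "miles"},
     "summary": "Example summary of a published fact check."},
]


def transcribe(clip):
    return fake_transcripts[clip]


def is_checkable(text):
    return any(ch.isdigit() for ch in text)


def find_match(text):
    words = set(text.split())
    for fc in fact_checks:
        if len(words & fc["keywords"]) >= 2:
            return fc
    return None


results = list(run_pipeline(["clip1", "clip2"], transcribe, is_checkable, find_match))
print(results)
```

Passing the steps in as functions keeps each stage swappable, which matters when any one of them (speech-to-text in particular) may be replaced later.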

The first few times you see the related fact check appear on the screen, it’s amazing. I got chills. I felt I was getting a glimpse of the future. The dream of those PolitiFact readers from 2007 had come true.

But …

Look a little closer and you will quickly realize that Squash isn’t perfect. If you watch in our web mode, which shows Squash’s AI “brain” at work, you will see plenty of mistakes as it converts voice to text. Some are real doozies.

Last summer during the Democratic convention, former Iowa Gov. Tom Vilsack said this: “The powerful storm that swept through Iowa last week has taken a terrible toll on our farmers …”

But Squash (it was really Google Cloud) translated it as “Armpit sweat through the last week is taking a terrible toll on our farmers.”

Squash’s matching algorithm also makes too many mistakes finding the right fact check. Sometimes it is right on the money. It often correctly matched then-President Donald Trump’s statements on China, the economy and the border wall.

But other times it comes up with bizarre matches. Guess and our project manager Erica Ryan, who spends hours analyzing the results of our tests, believe this often happens because Squash mistakenly thinks an individual word or number is important. (Our all-time favorite was in our first test, when it matched a sentence by President Trump about men walking on the moon with a Washington Post fact-check about the bureaucracy for getting a road permit. The match occurred because both included the word “years.”)
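To see how one shared word can drive a bad match, consider a naive matcher that simply counts words in common. This is a hypothetical sketch, not Squash's matching code, and the two sentences are invented paraphrases that only loosely echo the moon and road-permit example.

```python
def shared_words(a, b):
    """Words that appear in both sentences (lowercased, split on whitespace)."""
    return set(a.lower().split()) & set(b.lower().split())


def is_plausible_match(claim, fact_check, min_shared=2):
    """Reject matches that hinge on a single shared word."""
    return len(shared_words(claim, fact_check)) >= min_shared


# Invented sentences for illustration only.
claim = "fifty years ago astronauts walked on the moon"
fact_check = "it takes ten years to get a road permit"

print(shared_words(claim, fact_check))
print(is_plausible_match(claim, fact_check))
```

The only overlap is "years", so a matcher that treats any shared word as meaningful would pair the two, while a guard that demands at least two shared words would not.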

Squash works by detecting politicians’ claims and matching them with related fact-checks. (Screengrab from Democratic debate)

To reduce the problem, Guess built a human editing tool called Gardener that enables us to weed out the bad matches. That helps a lot because the editor can choose the best fact check or reject them all.
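The human-in-the-loop step can be pictured like this. It is a minimal sketch under stated assumptions: `gardener_review` and its `decide` callback are hypothetical names, and the scores and summaries are invented; the real Gardener interface is not described in enough detail here to reproduce.

```python
def gardener_review(candidates, decide):
    """Show machine-ranked candidates to a human and return the approved one.

    candidates: list of (score, summary) pairs from the matcher
    decide:     callable standing in for the human editor; it receives the
                ranked summaries and returns the chosen index, or None to
                reject them all
    """
    ranked = [summary for _, summary in
              sorted(candidates, key=lambda c: c[0], reverse=True)]
    choice = decide(ranked)
    return None if choice is None else ranked[choice]


# Invented candidates for illustration.
candidates = [(0.4, "road permit check"), (0.9, "moon landing check")]

approved = gardener_review(candidates, decide=lambda ranked: 0)
rejected = gardener_review(candidates, decide=lambda ranked: None)
print(approved, rejected)
```

Nothing reaches the screen unless the editor picks it, which is the whole point of the weeding step.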

The most frustrating problem is that a lot of the time, Squash just sits there, idle, even when politicians are spewing sentences packed with factual claims. Squash is working properly, Guess assures us; it just isn’t finding any fact checks that are even close. This happened in our latest test, a news conference by President Joe Biden, when Squash could muster only two matches in more than an hour.

The cause of that problem is simple: There are not enough published fact checks to power Squash (or any other automated app).

We need more fact checks – As I noted in the previous section, this is a major shortcoming that will hinder anyone who wants to draw from the existing corpus of fact checks. Despite the steady growth of fact-checking in the United States and around the world, and despite the boom that occurred in the Trump years, there simply are not enough fact checks of enough politicians to provide enough matches for Squash and similar apps.

We had our greatest success during debates and party conventions, events when Squash could draw from a relatively large database of checks on the candidates from PolitiFact, FactCheck.org and The Washington Post. But we could not use Squash on state and local events because there simply were not enough fact-checks for possible matches.

Ryan and Guess believe we need dozens of fact checks on a single candidate, across a broad range of topics, to have enough to make Squash work.

More armpit sweat is needed to improve voice to text – We all know the limitations of Siri, which still translates a lot of things wrong despite years of tweaks and improvements by Apple. That’s a reminder that improving voice-to-text technology remains a difficult challenge. It’s especially hard in political events when audio can be inconsistent and when candidates sometimes shout at each other. (Identifying speakers in debates is yet another problem.)

As we currently envision Squash and this type of automated fact-checking, we are reliant on voice-to-text translations, but given the difficulty of automated “hearing,” we’ll have to accept a certain error level for the foreseeable future.

Matching algorithms can be improved – This is one area that we’re optimistic about. Most of our tests relied on off-the-shelf search engines to do the matching, until Guess began to experiment with a new approach to improve the matching. That approach relies on subject tags (which unfortunately are not included in ClaimReview) to help the algorithm make smarter choices and avoid irrelevant choices.

The idea is that if Squash knows the claim is about guns, it would find the best matches from published fact checks that have been tagged under the same subject. Guess found this approach promising but did not get a chance to try the approach at scale.
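Here is one way the subject-tag idea might look in code. It is a sketch under assumptions, not Guess's implementation: the `subjects` field, the word-overlap ranking and the sample claims are all invented for illustration.

```python
def best_match(claim_text, claim_subject, fact_checks):
    """Pick the fact check with the most shared words, considering only
    fact checks tagged with the same subject as the claim."""
    claim_words = set(claim_text.lower().split())

    def overlap(fc):
        return len(claim_words & set(fc["claim"].lower().split()))

    candidates = [fc for fc in fact_checks if claim_subject in fc["subjects"]]
    best = max(candidates, key=overlap, default=None)
    if best is None or overlap(best) == 0:
        return None
    return best


# Invented fact checks with hypothetical subject tags.
fact_checks = [
    {"id": "fc-guns",
     "claim": "background checks stop most gun sales to criminals",
     "subjects": {"guns"}},
    {"id": "fc-immigration",
     "claim": "most immigrants pass background checks",
     "subjects": {"immigration"}},
]
claim = "most people pass background checks"

print(best_match(claim, "guns", fact_checks)["id"])
```

On word overlap alone, the immigration fact check would win (four shared words versus three). Restricting the search to the "guns" tag removes it from contention before ranking begins.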

Until the matching improves, we’ve found humans are still needed to monitor and manage anything that gets displayed — as we did with our Gardener tool.

Ugh, UX – The simplest part of my vision, the Truth-O-Meter popping up on the screen, ended up being one of our most complex challenges. Yes, Guess was able to make the meter or the Washington Post Pinocchios pop up, but what were they referring to? This question of user experience was tricky in several ways.

First, we were not providing an instant fact check of the statement that was just said. We were popping up a summary of a related fact check that was previously published. Because politicians repeat the same talking points, the statements were generally similar and in some cases, even identical. But we couldn’t guarantee that, so we labeled the pop-up “Related fact-check.”

Second, the fact check appeared during a live, fast-moving event. So we realized it could be unclear to viewers which previous statement the pop-up referred to. This was especially tricky in a debate when candidates traded competing factual claims. The pop-up could be helpful with either of them. But the visual design that seemed so simple for my PowerPoint a decade earlier didn’t work in real life. Was that “False” Truth-O-Meter for the immigration statement Biden said? Or the one that Trump said?

Another UX problem: To give people time to read all the text (the related fact checks sometimes had lengthy statements), Guess had them linger on the screen for 15 seconds. And our designer Justin Reese made them attractive and readable. But by the end of that time the candidates might have made two more factual claims, further confusing viewers who saw the “False” meter.

So UX wasn’t just a problem, it was a tangle of many problems involving limited space on the screen (What should we display and where? Will readers understand the concept that the previous fact check is only related to what was just said?), time (How long should we display it in relation to when the politician spoke?) and user interaction (Should our web version allow users to pause the speech or debate to read a related fact check?). It’s an enormously complicated challenge.

* * *

Looking back at my PowerPoint vision of how automated fact-checking would work, we came pretty close. We succeeded in using technology to detect political speech and make relevant fact checks automatically pop up on a video screen. That’s a remarkable achievement, a testament to groundbreaking work by Guess and an incredible team.

But there are plenty of barriers that make it difficult for us to realize the dream and will challenge anyone who tries to tackle this in the future. I hope others can build on our successes, learn from our mistakes, and develop better versions in years to come.

Pop-up fact-checking moves online: Lessons from our user experience testing

We initially wanted to build pop-up fact-checking for a TV screen. But for nearly a year, people have told us in surveys and in coffee shops that they like live fact-checking but they need more information than they can get on a TV.

The testing is a key part of our development of Squash, our groundbreaking live fact-checking product. We started by interviewing a handful of users of our FactStream app. We wanted to know how they found out about the app, how they find fact checks about things they hear on TV, and what they would need to trust live fact-checking. As we saw in our “Red Couch Experiments” in 2018, they were excited about the concept but they wanted more than a TV screen allowed. 

We supplemented those interviews with conversations in coffee shops – “guerilla research” in user experience (UX) terms. And again, the people we spoke with were excited about the concept but wanted more information than a 1740×90 pixel display could accommodate.

The most common request was the ability to access the full published fact-check. Some wanted to know if more than one fact-checker had vetted the claim, and if so, did they all reach the same conclusion? Some just wanted to be able to pause the video. 

Since those things weren’t possible with a conventional TV display, we pivoted and began to imagine what live fact-checking would look like on the web. 

Bringing Pop-Up Fact-Checking to the Web

In an online whiteboard session, our Duke Tech and Check Cooperative team discussed many possibilities for bringing live fact-checking online. Then our UX team (students Javan Jiang and Dora Pekec, and I) designed a new interface for live fact-checking and tested it in a series of simple open-ended preference surveys.

In total, 100 people responded to these surveys, in addition to the eight interviews above and a large experiment with 1,500 participants we did late last year about whether users want ratings in on-screen displays (they do). 

A common theme emerged in the new research: Make live fact-checking as non-disruptive to the viewing experience as possible. More specifically, we found three things that users want and need from the live fact-checking experience.

  • Users prefer a fact-checking display beneath the video. In our initial survey, users could choose if they liked a display beside or beneath the video. About three-quarters of respondents said that a display beneath the video was less disruptive to their viewing, with several telling us that this placement was similar to existing video platforms such as YouTube. 
  •  Users need “persistent onboarding” to make use of the content they get from live fact-checking. A user guide or FAQ is not enough. Squash can’t yet provide real-time fact-checking. It is a system that matches claims made during a televised event to claims previously checked. But users need to be reminded that they are seeing a “related fact-check,” not necessarily a perfect match to the claim they just heard. “Persistent onboarding” means providing users with subtle reminders in the display. For example, when a user hovers over the label “Related Fact Check,” a small box could explain that this is not a real-time fact check but an already published fact check about a similar claim made in the past. This was one of the features users liked most because it kept them from having to find the information themselves.
  • Users prefer all the information that is available on the initial screen. Our first test allowed users to expand the display to see more information about the fact check, such as the publisher of the fact check and an explanation of what statement triggered the system to display a fact check. But users said that having to toggle the display to see this information was disruptive. 
Users told us they wanted more on-screen explanations, sometimes called “persistent onboarding.”

More to Learn

Though we’ve learned a lot, some big questions remain. We still don’t know what live fact-checking looks like under less-than-ideal conditions. For example, how would users react to a fact check when the spoken claim is true but the relevant fact check is about a claim that was false? 

And we need to figure out timing, particularly for multi-speaker events such as debates. When is the right time to display a fact-check after a politician has spoken? And what if the screen is now showing another politician?

And how can we appeal to audiences that are skeptical of fact-checking? One respondent specifically said he’d want to be able to turn off the display because “none of the fact-checkers are credible.” What strategies or content would help make such audiences more receptive to live fact-checking? 

As we wrestle with those questions, moving live fact-checking to the web still opens up new possibilities, such as the ability to pause content (we call that “DVR mode”), read fact-checks, and return to the event. We are hopeful this shift in platform will ultimately bring automated fact-checking to larger audiences.

Squash report card: Improvements during State of the Union … and how humans will make our AI smarter

Squash, the experimental pop-up fact-checking product of the Reporters’ Lab, is getting better.

Our live test during the State of the Union address on Feb. 4 showed significant improvement over our inaugural test last year. Squash popped up 14 relevant fact-checks on the screen, up from just six last year.

That improvement matches a general trend we’ve seen in our testing. We’ve had a higher rate of relevant matches when we use Squash on videos of debates and speeches.

But we still have a long way to go. This month’s State of the Union speech also had 20 non-relevant matches, which means Squash displayed fact-checks that weren’t related to what the president said. If you’d been watching at that moment, you probably would have thought, “What is Squash thinking?”
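Taken together, the two counts from this year's speech give a rough hit rate: the share of everything Squash displayed that was actually relevant. The calculation below uses only those two figures; the variable names are ours.

```python
# Counts reported for this year's State of the Union test.
relevant = 14
non_relevant = 20

# Share of displayed pop-ups that were relevant.
hit_rate = relevant / (relevant + non_relevant)
print(f"{hit_rate:.0%}")  # prints 41%
```

In other words, roughly two of every five pop-ups were on target.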

We’re now going to try two ways to make Squash smarter: a new subject tagging system that will be based on a wonderfully addictive game developed by our lead technologist Chris Guess; and a new interface that will bring humans into the live decision-making. Squash will recommend fact-checks to display, but an editor will make the final judgment.

Some background in case you’re new to our project: Squash, part of the Lab’s Tech & Check Cooperative, is a revolutionary new product that displays fact-checks on a video screen during a debate or political speech. Squash “hears” what politicians say, converts their speech to text and then searches a database of previously published fact-checks for one that’s related. When Squash finds one, it displays a summary on the screen.

For our latest tests, we’ve been using Elasticsearch, a tool for building search engines that we’ve made smarter with two filters: ClaimBuster, an algorithm that identifies factual claims, and a large set of common synonyms. ClaimBuster helps Squash avoid wasting time and effort on sentences that aren’t factual claims, and the synonyms help it make better matches.
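To show why a synonym list helps, here is a plain-Python sketch of the idea. It does not use Elasticsearch or its configuration, and the synonym table and sentences are invented; Squash's actual list is not reproduced here.

```python
# Hypothetical synonym table: each variant maps to one canonical word.
SYNONYMS = {
    "jobless": "unemployment",
    "joblessness": "unemployment",
    "firearms": "guns",
}


def normalize(text):
    """Lowercase, split on whitespace, and map synonyms to a canonical word."""
    return {SYNONYMS.get(word, word) for word in text.lower().split()}


def overlap(a, b, use_synonyms=True):
    """Count the words two sentences share, with or without synonym mapping."""
    if use_synonyms:
        return len(normalize(a) & normalize(b))
    return len(set(a.lower().split()) & set(b.lower().split()))


# Invented sentences for illustration.
spoken = "the jobless rate fell"
published = "the unemployment rate fell"

print(overlap(spoken, published, use_synonyms=False))  # prints 3
print(overlap(spoken, published, use_synonyms=True))   # prints 4
```

Without the synonym table, the one word that carries the topic is the one word that fails to match.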

Guess, assisted by project manager Erica Ryan and student developers Jack Proudfoot and Sanha Lim, will soon be testing a new way of matching that uses natural language processing based on the subject of the fact-check. We believe that we’ll get more relevant matches if the matching is based on subjects rather than just the words in the politicians’ claims.

But to make that possible, we have to put subject tags on thousands of fact-checks in our ClaimReview database. So Guess has created a game called Caucus that displays a fact-check on your phone and then asks you to assign subject tags to it. The game is oddly addictive. Every time you submit one, you want to do another…and another. Guess has a leaderboard so we can keep track of who is tagging the most fact-checks. We’re testing the game with our students and staff, but hope to make it public soon.
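A tagging game like this boils down to two tallies: who has tagged the most, and which tag each fact check ends up with. The sketch below is a guess at that bookkeeping, not Caucus's code. The player names and submissions are invented, and picking the most common tag as the "consensus" is our assumption.

```python
from collections import Counter, defaultdict

# Hypothetical submissions: (player, fact_check_id, subject_tag).
submissions = [
    ("ana", "fc1", "guns"),
    ("ben", "fc1", "guns"),
    ("cy", "fc1", "crime"),
    ("ana", "fc2", "economy"),
]

# Leaderboard: how many tags each player has submitted, highest first.
leaderboard = Counter(player for player, _, _ in submissions).most_common()

# Consensus (an assumption): the tag chosen most often for each fact check.
votes = defaultdict(Counter)
for _, fc_id, tag in submissions:
    votes[fc_id][tag] += 1
consensus = {fc_id: tags.most_common(1)[0][0] for fc_id, tags in votes.items()}

print(leaderboard)
print(consensus)
```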

We’ve also decided that Squash needs a little human help. Guess, working with our student developer Matt O’Boyle, is building an interface for human editors to control which matches actually pop up on users’ screens.

The new interface would let them review the fact-check that Squash recommends and decide whether to let it pop up on the screen, which should help us filter out most of the unrelated matches.

That should eliminate the slightly embarrassing problem when Squash makes a match that is comically bad. (My favorite: one from last year’s State of the Union when Squash matched the president’s line about men walking on the moon with a fact-check on how long it takes to get a permit to build a road.)

Assuming the new interface works relatively well, we’ll try to do a public demo of Squash this summer. 

Slowly but steadily, we are making progress. Watch for more improvements soon.

Beyond the Red Couch: Bringing UX Testing to Squash

Fact-checkers have a problem.

They want to use technology to hold politicians accountable by getting fact-checks in front of the public as quickly as possible. But they don’t yet know the best ways to make their content understood. At the Duke Reporters’ Lab, that’s where Jessica Mahone comes in.

Jessica Mahone is designing tests to help Duke Reporters’ Lab researchers figure out how to clearly share fact-checks live during broadcasts. Photo by Andrew Donohue

The Lab is developing Squash, a tool built to bring live fact-checking of politicians to TV. Mahone, a social scientist, was brought on board to design experiments and conduct user experience (UX) tests for Squash. 

UX design is the discipline focused on making new products easy to use. A clear UX design means that a product is intuitive and new users get it without a steep learning curve. 

“If people can’t understand your product or find it hard to use, then you are doomed from the start. With Squash, this means that we want people to comprehend the information and be able to quickly determine whether a claim is true or not,” Mahone said.

For Squash, fact-check content that pops up on screens needs to be instantly understood since it will only be visible for a few seconds. So what’s the best way?

Bill Adair, the director of the Duke Tech & Check Cooperative, organized some preliminary testing last year that he dubbed the red couch experiments. The tests revealed more research was needed to understand the best way to inform viewers. 

“I originally thought that all it would take is a Truth-O-Meter popping up on screen,” Adair said. “Turns out it’s much more complicated than that.”

Sixteen people watched videos of Barack Obama and Donald Trump delivering State of the Union speeches while fact-checks of some of what they said appeared on the screen. Ratings were true, false or something in between. Blink, a company specializing in UX testing, found that participants loved the concept of real-time fact-checking and would welcome it on TV broadcasts. But the design of the pop-up fact-checks often confused them.

It’s not just the quality of content that counts. Viewers must understand what they see very quickly. Squash may one day share fact-checks during live events, including State of the Union addresses.

Some viewers didn’t understand the fact-check ratings such as true or false when they were displayed. Others assumed the presidents’ statements must be true if no fact-check was shown. That’s a problem because Squash doesn’t fact-check all claims in speeches. It displays previously published fact-checks only for the claims that Squash’s finicky search algorithm can match.

The red couch experiments were “a very basic test of the concept,” Mahone said. “What they found mainly is that there was a need to do more diving in and digging into some questions about how people respond to this. Because it’s actually quite complex.”

Mahone has developed a new round of tests scheduled to begin this week. These tests will use Amazon Mechanical Turk, an online platform that relies on people who sign up to be paid research subjects.

“One thing that came out of the initial testing was that people don’t like to see a rating of a fact-check,” Mahone said. “I was a little skeptical of that. Most of the social science research says that people do prefer things like that because it makes it a lot easier for them to make decisions.”

In this next phase, Mahone will recruit about 500 subjects. A third will see a summary of a fact-check with a PolitiFact TRUE icon. Another third will see a summary with just the label TRUE. The rest will see just a summary text of a fact-check.

Each viewer will rank how interested they are in using an automated fact-checking tool after viewing the different displays. Mahone will compare the results.
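Splitting subjects into three display groups is usually done by random assignment. The sketch below shows one common way to do it; the condition names, the fixed seed and the round-robin split are our assumptions, not a description of how the Mechanical Turk study was actually set up.

```python
import random
from collections import Counter

# Hypothetical labels for the three displays described above.
CONDITIONS = ["icon_and_summary", "label_and_summary", "summary_only"]


def assign(subject_ids, seed=0):
    """Shuffle subjects, then deal them round-robin into the conditions."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {sid: CONDITIONS[i % len(CONDITIONS)]
            for i, sid in enumerate(shuffled)}


groups = assign(range(500))
counts = Counter(groups.values())
print(counts)
```

Because 500 does not divide evenly by three, two groups get 167 subjects and one gets 166.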

After finding out if including ratings works, Mahone and three undergraduate students, Dora Pekec, Javan Jiang and Jia Dua, will look at the bigger picture of Squash’s user experience. They will use a company to find about 20 people to talk to, ideally individuals who consistently watch TV news and are familiar with fact-checking.

Participants will be asked what features they would want in real-time fact-checking.

“The whole idea is to ask people ‘Hey, if you had access to a tool that could tell you if what someone on TV is saying is true or false, what would you want to see in that tool?’ ” Mahone said. “We want to figure out what people want and need out of Squash.”

Figuring out how to make Squash intuitive is critical to its success, according to Chris Guess, the Lab’s lead technologist. Part of the challenge is that Squash is something new and viewers have no experience with similar products.

“These days, people do a lot more than just watch a debate. They’re cooking dinner, playing on their phone, watching over the kids,” Guess said. “We want people to be able to tune in, see what’s going on, check out the automated fact-checks and then be able to tune out without missing anything.”

Reporters’ Lab researchers hope to have Squash up and running for the homestretch of the 2020 presidential campaign. Adair, Knight Professor of the Practice of Journalism and Public Policy at Duke, has begun reaching out to television executives to gauge their interest in an automated fact-checking tool. 

“TV networks are interested, but they want to wait and see a product that is more developed,” Adair said.

 
