What exactly is the Fact-Check Insights dataset?

Get details about data that can aid your misinformation research

By Erica Ryan - June 20, 2024

Since its launch in December, the Fact-Check Insights dataset has been downloaded hundreds of times by researchers who are studying misinformation and developing technologies to boost fact-checking.

But what should you expect if you want to use the dataset for your work?

First, you will need to register. The Duke Reporters’ Lab, which maintains the dataset with support from the Google News Initiative, generally approves applications within a week. The dataset is intended for academics, researchers, journalists and/or fact-checkers.

Once you are approved, you will be able to download the dataset in either CSV or JSON format.

Those files include the metadata for more than 200,000 fact-checks that have been tagged with ClaimReview and/or MediaReview markup.

The two tagging systems — ClaimReview for text-based claims, MediaReview for images and videos — are used by fact-checking organizations across the globe. ClaimReview summarizes a fact-check, noting the person and claim being checked and a conclusion about its accuracy. MediaReview allows fact-checkers to share their assessment of whether a given image, video, meme or other piece of media has been manipulated.

The Reporters’ Lab collects ClaimReview and MediaReview data when it is submitted by fact-checkers. We filter the data to include only reputable fact-checking organizations that have qualified to be listed in our database, which we have been publishing and updating for a decade. We also work to reduce duplicate entries and to standardize the names of fact-checking organizations. For the most part, however, the data is presented in its original form as submitted by fact-checking organizations.

Here are the fields that you can expect to be included in the dataset, along with examples:

ClaimReview

CSV Key	Description	Example Value
id	Unique ID for each ClaimReview entry	6c4f3a30-2ec1-4e2e-9b57-41ad876223e5
@context	Link to schema.org, the home of ClaimReview	https://schema.org
@type	Type of schema being used	ClaimReview
claimReviewed	The claim/statement that was assessed by the fact-checker	Marsha Blackburn “voted against the Reauthorization of the Violence Against Women Act, which attempts to protect women from domestic violence, stalking, and date rape.”
datePublished	The date the fact-check article was published	10/9/18
url	The URL of the fact-check article	https://www.politifact.com/truth-o-meter/statements/2018/oct/09/taylor-swift/taylor-swift-marsha-blackburn-voted-against-reauth/
author.@type	Type of author	Organization
author.name	The name of the fact-checking organization that submitted the fact-check	PolitiFact
author.url	The main URL of the fact-checking organization	http://www.politifact.com
itemReviewed.@type	Type of item reviewed	Claim
itemReviewed.author.name	The person or group that made the claim that was assessed by the fact-checker	Taylor Swift
itemReviewed.author.@type	Type of speaker	Person
itemReviewed.author.sameAs	URLs that help establish the identity of the person or group that made the claim, such as a Wikipedia page (rarely used)	https://www.taylorswift.com/
reviewRating.@type	Type of review	Rating
reviewRating.ratingValue	An optional numerical value assigned to a fact-checker’s rating. Not standardized. Notes: (1) The ClaimReview schema specifies the use of an integer for the ratingValue, worstRating and bestRating fields. (2) For organizations that use rating scales (such as PolitiFact), if the rating chosen falls on the scale, the numerical rating will appear in the ratingValue field. (3) If the rating isn’t on the scale (ratings that use custom text, or special categories like Flip Flops), the ratingValue field will be empty, but worstRating and bestRating will still appear. (4) For organizations that don’t use ratings that fall on a numerical scale, all three fields will be blank.	8
reviewRating.alternateName	The fact-checker’s conclusion about the accuracy of the claim in text form — either a rating, like “Half True,” or a short summary, like “No evidence”	Mostly True
author.image	The logo of the fact-checking organization	https://d10r9aj6omusou.cloudfront.net/factstream-logo-image-61554e34-b525-4723-b7ae-d1860eaa2296.png
itemReviewed.name	The location where the claim was made	in an Instagram post
itemReviewed.datePublished	The date the claim was made	10/7/18
itemReviewed.firstAppearance.url	The URL of the first known appearance of the claim	https://www.instagram.com/p/BopoXpYnCes/?hl=en
itemReviewed.firstAppearance.type	Type of content being referenced	Creative Work
itemReviewed.author.image	An image of the person or group that made the claim	https://static.politifact.com/CACHE/images/politifact/mugs/taylor_swift_mug/03dfe1b483ec8a57b6fe18297ce7f9fd.jpg
reviewRating.ratingExplanation	One to two short sentences providing context and information that led to the fact-checker’s conclusion	Blackburn voted in favor of a Republican alternative that lacked discrimination protections based on sexual orientation and gender identity. But Blackburn did vote no on the final version that became law.
itemReviewed.author.jobTitle	A title or description of the person or group that made the claim	Mega pop star
reviewRating.bestRating	An optional numerical value representing what rating a fact-checker would assign to the most accurate content it assesses. See note on “reviewRating.ratingValue” field above.	10
reviewRating.worstRating	An optional numerical value representing what rating a fact-checker would assign to the least accurate content it assesses. See note on “reviewRating.ratingValue” field above.	0
reviewRating.image	An image representing the fact-checker’s rating, such as the Truth-O-Meter	https://static.politifact.com/politifact/rulings/meter-mostly-true.jpg
itemReviewed.appearance.1.url to itemReviewed.appearance.15.url	A URL where the claim appeared. This field has been limited to the first 15 URLs submitted for the stability of the CSV. See the JSON download for complete “appearance” data.	https://www.instagram.com/p/BopoXpYnCes/?hl=en
itemReviewed.appearance.1.@type to itemReviewed.appearance.15.@type	Type of content being referenced	CreativeWork
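
As a rough illustration of how the rating and appearance fields above might be used, here is a minimal Python sketch. It assumes the CSV columns are named exactly as in the table; the sample rows, the example.com URLs and the helper names normalized_rating and appearance_urls are made up for this example and are not part of the dataset or any official tooling. Because rating values are not standardized, a scaled score is only meaningful within a single organization's scale.

```python
import csv
import io

# Hypothetical two-row sample that reuses column names from the table above.
SAMPLE_CSV = (
    "id,reviewRating.ratingValue,reviewRating.worstRating,"
    "reviewRating.bestRating,itemReviewed.appearance.1.url,"
    "itemReviewed.appearance.2.url\n"
    "a1,8,0,10,https://example.com/post-1,https://example.com/post-2\n"
    "b2,,0,10,https://example.com/post-3,\n"
)


def normalized_rating(row):
    """Scale ratingValue to 0..1; return None if any of the three fields is unusable."""
    try:
        value = float(row.get("reviewRating.ratingValue") or "")
        worst = float(row.get("reviewRating.worstRating") or "")
        best = float(row.get("reviewRating.bestRating") or "")
    except ValueError:
        # Blank or non-numeric field: the rating is off the scale or has no scale.
        return None
    if best == worst:
        return None
    return (value - worst) / (best - worst)


def appearance_urls(row, limit=15):
    """Collect the non-empty appearance URLs (the CSV holds at most 15 per row)."""
    urls = []
    for i in range(1, limit + 1):
        url = (row.get(f"itemReviewed.appearance.{i}.url") or "").strip()
        if url:
            urls.append(url)
    return urls


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for row in rows:
    print(row["id"], normalized_rating(row), appearance_urls(row))
```

The second sample row mirrors the case described in the ratingValue note: worstRating and bestRating are present but ratingValue is empty, so the helper returns None rather than guessing.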

MediaReview

CSV Key	Description	Example Value
id	Unique ID for each MediaReview entry	2bfe531d-ff53-40f5-8114-a819db22ca8b
@context	Link to schema.org, the home of MediaReview	https://schema.org
@type	Type of schema being used	MediaReview
datePublished	The date the fact-check article was published	2020-07-02
mediaAuthenticityCategory	The fact-checker’s conclusion about whether the media was manipulated, ranging from “Original” to “Transformed”	Transformed
originalMediaContextDescription	A short sentence explaining the original context if media is used out of context	In this case, there was no original context. But this is a text field.
originalMediaLink	Link to the original, non-manipulated version of the media (if available)	https://example.com/
url	The URL of the fact-check article that assesses a piece of media	https://www.politifact.com/factchecks/2020/jul/02/facebook-posts/no-taylor-swift-didnt-say-we-should-remove-statue-/
author.@type	Type of author	Organization
author.name	The name of the fact-checking organization	PolitiFact
author.url	The URL of the fact-checking organization	http://www.politifact.com
itemReviewed.contentUrl	The URL of the post containing the media that was fact-checked	https://www.facebook.com/photo.php?fbid=10223714143346243&set=a.3020234149519&type=3&theater
itemReviewed.startTime	Starting timestamp of video edit (in HH:MM:SS format)	0:01:00
itemReviewed.endTime	Ending timestamp of video edit, if applicable (in HH:MM:SS format)	0:02:00
itemReviewed.@type	Type of media being reviewed	ImageObject / VideoObject / AudioObject
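
The startTime and endTime fields above can be turned into seconds with a few lines of Python. This is only a sketch: it assumes the values follow the hours:minutes:seconds pattern shown in the examples, and the helper names timestamp_to_seconds and clip_length are invented for illustration. Since the data is largely presented as submitted, real values may deviate from this pattern, so the helpers return None instead of raising an error.

```python
def timestamp_to_seconds(value):
    """Convert 'H:MM:SS' or 'HH:MM:SS' to seconds; None for blank or malformed input."""
    parts = (value or "").strip().split(":")
    if len(parts) != 3:
        return None
    try:
        hours, minutes, seconds = (int(p) for p in parts)
    except ValueError:
        return None
    if hours < 0 or not (0 <= minutes < 60) or not (0 <= seconds < 60):
        return None
    return hours * 3600 + minutes * 60 + seconds


def clip_length(row):
    """Length in seconds of the edited segment, or None if it cannot be computed."""
    start = timestamp_to_seconds(row.get("itemReviewed.startTime"))
    end = timestamp_to_seconds(row.get("itemReviewed.endTime"))
    if start is None or end is None or end < start:
        return None
    return end - start


# The example values from the table above.
example = {"itemReviewed.startTime": "0:01:00", "itemReviewed.endTime": "0:02:00"}
print(clip_length(example))  # 60
```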

Please note that not every fact-check will contain data for every field.
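
One way to see how often a field is actually filled in is to compute its coverage before relying on it. The Python sketch below does this for a small, made-up sample; the rows and the helper name field_coverage are hypothetical, and only the column names come from the ClaimReview table above.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in which some fields are left blank.
SAMPLE_CSV = (
    "id,claimReviewed,itemReviewed.author.name,itemReviewed.author.jobTitle\n"
    "a1,First claim,Speaker One,Senator\n"
    "b2,Second claim,Speaker Two,\n"
    "c3,Third claim,,\n"
)


def field_coverage(rows, fieldnames):
    """Return the share of rows (0..1) with a non-blank value for each field."""
    filled = Counter()
    total = 0
    for row in rows:
        total += 1
        for name in fieldnames:
            if (row.get(name) or "").strip():
                filled[name] += 1
    return {name: (filled[name] / total if total else 0.0) for name in fieldnames}


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
coverage = field_coverage(reader, reader.fieldnames)
for name, share in coverage.items():
    print(f"{name}: {share:.0%}")
```

To run the same check on a real download, replace the in-memory sample with an open file handle for the CSV you received.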

For the JSON version of the table above, please see the “What you can expect when you download the data” section of the Guide on the Fact-Check Insights website. The Guide page also contains tips for working with the ClaimReview and MediaReview data.

If you continue to have questions about the Fact-Check Insights dataset, please reach out to hello@factcheckinsights.org.