CareSet and data journalism

Our mission is to encourage an ecosystem of innovators to collaborate and share tools and research methodologies around open healthcare datasets.

The DocGraph Journal was formed as a movement to promote transparency through open health data. The efforts of the DocGraph community, guided by Fred Trotter, led to the first national Provider Referral data release by the US government. The original “DocGraph Data” has helped researchers, journalists, and companies around the nation provide data-backed healthcare solutions.

The DocGraph Journal is now a subsidiary of CareSet. Our commercial arm helps fund our data journalism efforts, and our journalistic principles guide our commercial operations.

We continue to work with federal, state, private, non-profit, and public entities to create and open healthcare datasets. Our community includes academics, journalists, doctors, entrepreneurs, statisticians, and more. Our members have used our datasets to restructure provider networks, teach classes, start companies, and report on quality metrics. We welcome anyone with a passion for healthcare improvement to join us.

Our community also includes VRDC seat holders. The VRDC Users Group is for seat holders who have access to the CMS Virtual Research Data Center (VRDC). It is a support and collaboration group for data scientists who have access to approved data files and conduct their analyses within the CMS secure environment.

Purchases of CareSet datasets support the community of data scientists focused on improving transparency in healthcare. Many datasets are provided at no cost.



MrPUP Data Set

The Medicare Referring Provider Utilization for Procedures (MrPUP) dataset details the healthcare procedures that Medicare providers referred in the outpatient setting in 2014. It is particularly useful for understanding how specific doctors use blood tests, CT scans, and MRIs to diagnose patients.

NPI to Contact Domain FOIA Data

This is a list of NPIs connected to the contact email domains that providers submitted to NPPES. For example, when Jane Smith applied for an NPI, she provided a contact email address. In the data, her NPI is listed alongside the domain portion of that address.
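To make the NPI-to-domain pairing concrete, here is a minimal sketch of how a contact email could be reduced to the domain that appears next to an NPI. The function names, the NPI, and the email address are all hypothetical and made up for illustration; this is not how CMS or CareSet actually produced the file.

```python
# Hypothetical illustration only: the NPI and email below are invented,
# and this is not the actual process used to build the FOIA release.


def email_domain(email: str) -> str:
    """Return the lowercase domain part of an email address."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {email!r}")
    return domain.lower()


def npi_domain_rows(submissions: dict[str, str]) -> list[tuple[str, str]]:
    """Turn {npi: contact_email} into sorted (npi, domain) rows."""
    return sorted((npi, email_domain(email)) for npi, email in submissions.items())


if __name__ == "__main__":
    example = {"1234567890": "Jane.Smith@Example-Clinic.org"}
    for npi, domain in npi_domain_rows(example):
        print(npi, domain)  # prints: 1234567890 example-clinic.org
```

Keeping only the domain means each row says where a provider's contact address is hosted (for example, a hospital system or practice group) without exposing the full address.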

This data comes from the NPPES database, but is NOT currently included in the public dissemination file provided by CMS.

This free release contains 203,939 rows of data. We will make refresh FOIA requests in the future, and we expect the number of records to increase substantially over time.

PaPR Data Set

The PaPR (Providers and Prescribing Records) data set is an important new open data set that describes the adoption of e-prescribing as measured in Medicare Part D data. It details how individual prescribers deliver prescriptions to pharmacies: by e-prescribing, fax, or paper. The PaPR (pronounced “paper”) data set aggregates this information on a per-prescriber basis.
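The per-prescriber aggregation described above can be sketched as a simple group-by. This is a minimal illustration under stated assumptions: the input is a list of hypothetical (prescriber NPI, delivery method) pairs, and the method labels `electronic`, `fax`, and `paper` and the `e_rx_share` field are invented for the example, not the actual PaPR schema.

```python
from collections import Counter, defaultdict

# Hypothetical delivery-method labels; the real PaPR file layout may differ.
METHODS = ("electronic", "fax", "paper")


def aggregate_by_prescriber(records):
    """Roll (prescriber_npi, method) pairs up into per-prescriber counts."""
    counts = defaultdict(Counter)
    for npi, method in records:
        if method not in METHODS:
            raise ValueError(f"unknown delivery method: {method!r}")
        counts[npi][method] += 1

    summary = {}
    for npi, c in counts.items():
        total = sum(c.values())  # always > 0: an entry exists only if a record was seen
        summary[npi] = {
            **{m: c[m] for m in METHODS},  # missing methods count as 0
            "total": total,
            "e_rx_share": c["electronic"] / total,  # fraction sent electronically
        }
    return summary


if __name__ == "__main__":
    records = [  # invented example records
        ("1111111111", "electronic"),
        ("1111111111", "electronic"),
        ("1111111111", "fax"),
        ("2222222222", "paper"),
    ]
    for npi, row in sorted(aggregate_by_prescriber(records).items()):
        print(npi, row)
```

Summarizing to one row per prescriber makes it easy to compare adoption across providers, for example by ranking prescribers on their share of electronic prescriptions.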

DocGraph Hop Teaming

DocGraph Hop Teaming is the latest version of the classic dataset that shows how healthcare providers in the United States work together.

Please see the extensive documentation for an explanation of why the HOP dataset is an improvement.

DocGraph FOIA Data (deprecated)

The physician referral data was provided as a response to a Freedom of Information Act (FOIA) request that CareSet made years ago.

The current CMS link to this data is here:

Note that this dataset has been deprecated in favor of the new DocGraph HOP teaming dataset (above). For all new projects, please use that dataset.

Cancer Moonshot Data

This dataset details how cancer patients enter and exit hospitals. It aggregates Medicare data from 2010 through 2015, six years in total.