Datasets from CareSet Journal

We believe the release of reliable and current data is vital to the improvement of the healthcare system.

CareSet Journal is a data journalism organization.

Our mission is to encourage an ecosystem of innovators to collaborate and share tools and research methodologies around open healthcare datasets. 

The DocGraph Journal was formed as a movement to create open health data transparency. The efforts of the DocGraph community, guided by Fred Trotter, led to the first national Provider Referral data release by the US government. The original “DocGraph Data” has helped researchers, journalists, and companies around the nation to provide data-backed healthcare solutions.

The DocGraph Journal is now doing business as CareSet Journal, and is a subsidiary of CareSet Systems. Our commercial arm helps fund our data journalism efforts, as our journalistic principles guide our commercial operations.

We continue to work with federal, state, private, non-profit, and public entities to create and open healthcare datasets. Our community includes academics, journalists, doctors, entrepreneurs, statisticians, and more. Our members have used our datasets to restructure provider networks, teach classes, start companies, and report on quality metrics. We welcome anyone with a passion for healthcare improvement to join us.

Our community also includes VRDC seat holders. The VRDC Group is for seat holders who have access to the CMS Virtual Research Data Center provided by CMS. This is a support and collaboration users group for data scientists who have access to approved data files and conduct their analysis within the CMS secure environment.

Purchases of CareSet Journal datasets support the community of data scientists focused on improving transparency in healthcare. Many datasets are provided at no cost.


DocGraph Hop Teaming

DocGraph Hop Teaming is the latest version of the classic dataset that shows how health care providers in the United States work together.

Please see the extensive documentation for the reasons behind the improved HOP dataset.

Data Documentation: DocGraph Hop Teaming Dataset Documentation

License: Open Source Eventually / Commercial License


MrPUP

The Medicare Referring Provider Utilization for Procedures (MrPUP) dataset details the healthcare procedures that Medicare providers referred in the outpatient setting in 2014, and is particularly useful in understanding how specific doctors use blood tests, CT scans, and MRIs to diagnose patients.

Data Documentation: CareSet MrPup Dataset Documentation

License: Open Source Eventually / Commercial License


PaPR Data Set

The PaPR data set is an important new open data set that describes the adoption of e-prescribing as measured in Part D Medicare data. The dataset, called PaPR (Providers and Prescribing Records), details how individual prescribers deliver prescriptions to pharmacies: by e-prescribing, fax, or paper. The PaPR (pronounced “paper”) data set aggregates this information on a per-prescriber basis.
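As a rough illustration of what per-prescriber aggregation means, the sketch below counts prescriptions by delivery method for each prescriber. The record layout, the NPI values, and the function name `aggregate_by_prescriber` are all hypothetical and are not taken from the PaPR documentation; the actual file layout may differ.

```python
from collections import Counter, defaultdict

# Hypothetical input: one (prescriber NPI, delivery method) pair per prescription.
# These NPIs are placeholders, not real providers.
records = [
    ("1111111111", "e-prescribing"),
    ("1111111111", "e-prescribing"),
    ("1111111111", "fax"),
    ("2222222222", "paper"),
]


def aggregate_by_prescriber(records):
    """Count prescriptions per delivery method for each prescriber."""
    counts = defaultdict(Counter)
    for npi, method in records:
        counts[npi][method] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {npi: dict(methods) for npi, methods in counts.items()}


summary = aggregate_by_prescriber(records)
for npi, methods in summary.items():
    print(npi, methods)
```

Each output row then describes one prescriber, which is the level of detail the paragraph above refers to.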

Data Documentation: Data Documentation

License: Non-Commercial, Commercial


Cancer Moonshot Data

This dataset details how cancer patients enter and exit hospitals. It is aggregated Medicare data covering the six years from 2010 through 2015.

Data Documentation: CareSet Cancer Moonshot Dataset Documentation

License: Creative Commons 3.0

DocGraph FOIA Data

The physician referral data linked below was provided as a response to a Freedom of Information Act (FOIA) request that CareSet made more than 5 years ago.

These datasets can be difficult to find on the CMS website, and sometimes the CMS resource is down. For this reason, we mirror this dataset here.

These files represent the number of encounters a single beneficiary has had across physicians at intervals of 30, 60, 90 and 180 days. For more details about the file contents for years 2009 – 2015, please see the Technical Requirements document.

Note that this dataset has been deprecated in favor of the new DocGraph Hop Teaming dataset. For all new projects, please use that dataset. We still maintain these datasets because they go back farther than HOP, which could be useful for long-term analysis projects.

Data Source: Physician Shared Patient Patterns

License: Open Database License (ODbL)


NPI to Contact Domain FOIA Data

This is a list of NPIs connected to contact email domains as submitted by providers to NPPES. For example, when Jane Smith applied for an NPI, she provided a contact email of “jane.smith@examplehospital.com”. In the data, her NPI will be listed alongside the domain of “examplehospital.com”.
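The mapping in the example above can be sketched in a few lines of Python. This is only an illustration of how a contact email reduces to a domain; the function name `contact_domain` and the placeholder NPI are assumptions, and nothing here describes how the released file was actually produced.

```python
def contact_domain(email: str) -> str:
    """Return the lowercased domain part of an email address."""
    # Split on the last "@" so the domain is everything after it.
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {email!r}")
    return domain.lower()


# Hypothetical record: this NPI is a placeholder, not a real provider.
npi = "1234567890"
row = (npi, contact_domain("jane.smith@examplehospital.com"))
print(row)
```

Only the domain is kept next to the NPI; the local part of the address ("jane.smith") does not appear in the row.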

This data comes from the NPPES database, but is NOT currently included in the public dissemination file provided by CMS.

203,939 rows of data are in this free release. We will be making refresh FOIA requests in the future, and we expect the number of records to increase substantially over time.