CareSet Dataset: PaPR

PaPR Dataset Documentation

The PaPR data set is an important new open data set that describes the adoption of e-prescribing as measured in Medicare Part D data. The dataset, called PaPR (Providers and Prescribing Records), details how individual prescribers deliver prescriptions to pharmacies: by e-prescribing, fax, paper or other means. The PaPR (pronounced “paper”) data set aggregates this information on a per-prescriber basis.

To purchase or download the PaPR dataset, please go to the CareSet Data Store.

Medicare Part D data shows how each prescription written for a Medicare patient is delivered to a pharmacy. The PaPR (Providers and Prescribing Records) data set, released by CareSet Labs, aggregates this information on a per-prescriber basis.

Policy makers tend to agree that electronic prescribing methods allow for more consistent and safer management of medications. EHR-borne prescriptions should ensure that a single patient is not given combinations of medications that might be dangerous. By delivering the right medications more consistently, it should be possible to improve patient adherence to medication regimens. The extensive use of automated formulary checking should ensure that patients are prescribed medications that their insurance covers and that less expensive drugs are tried first. All of these benefits rely on a robust infrastructure for e-prescribing.

As a result, CMS has created several programs specifically to incentivize providers to adopt e-prescribing, including the Meaningful Use and MACRA programs, which provide payments for EHR systems that are capable of e-prescribing.

E-prescribing adoption is an important techno-policy issue, and this data set is intended to provide a means for policy makers, data scientists and entrepreneurs to understand how adoption is shifting, on a per-provider basis.

There are summary datasets for each year from 2010 to 2013. Provider-level datasets are available for 2014 and 2015.

Each Part D prescription carries one of 6 possible origin codes (0 through 5), or a blank, describing how it was delivered to a pharmacy:

Blank = null value = no data.
0 = Not specified
1 = Written
2 = Telephone
3 = Electronic
4 = Facsimile
5 = Pharmacy

These data categories are defined in the following AHRQ data documentation:

Most of them are what you might expect, but #5 (Pharmacy) is complex and worth taking a closer look at.

The provider-aggregated data set begins with the following field:
prescriber_npi – NPI of the prescriber

Both datasets (aggregated on a per-NPI basis or nationally) have the following additional fields:

origin_code – one of the values listed above that represents a mechanism to relay prescription information
patient_count – the number of patients who had scripts sent this way in a given year
pde_count – the count of individual Part D Drug Events (PDE) for these patients
prescription_count – the count of individual prescriptions, estimated by the count of distinct prescription reference numbers

The patient count is the number of patients that this prescriber wrote scripts for, using this origin code, when that number was 11 or more patients.
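As an illustration of how these fields fit together, the following is a minimal Python sketch that parses PaPR rows and computes the share of Part D drug events delivered electronically. It assumes the data arrives as a CSV file with a header row whose column names match the field names above; that layout, the function names parse_papr and electronic_share, and the sample rows are assumptions made for this example, not part of the published data set.

```python
import csv
import io

# Origin codes as listed in this documentation; None stands for a blank value.
ORIGIN_LABELS = {
    None: "No data",
    0: "Not specified",
    1: "Written",
    2: "Telephone",
    3: "Electronic",
    4: "Facsimile",
    5: "Pharmacy",
}


def parse_papr(file_obj):
    """Parse PaPR rows from a CSV file object with a header row (assumed layout)."""
    rows = []
    for rec in csv.DictReader(file_obj):
        raw = (rec.get("origin_code") or "").strip()
        code = int(raw) if raw else None  # blank origin code means no data
        rows.append({
            "prescriber_npi": rec.get("prescriber_npi"),  # absent in national files
            "origin_code": code,
            "origin_label": ORIGIN_LABELS.get(code, "Unknown"),
            "patient_count": int(rec["patient_count"]),
            "pde_count": int(rec["pde_count"]),
            "prescription_count": int(rec["prescription_count"]),
        })
    return rows


def electronic_share(rows):
    """Fraction of Part D drug events (pde_count) with origin code 3 (Electronic)."""
    total = sum(r["pde_count"] for r in rows)
    electronic = sum(r["pde_count"] for r in rows if r["origin_code"] == 3)
    return electronic / total if total else None


# Hypothetical sample rows, for illustration only.
SAMPLE = """prescriber_npi,origin_code,patient_count,pde_count,prescription_count
8888888881,1,20,60,30
8888888881,3,40,140,70
"""
sample_rows = parse_papr(io.StringIO(SAMPLE))
print(electronic_share(sample_rows))
```

Because redacted rows are missing from the public files, a share computed this way reflects only the published rows for a prescriber.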

The prescribers are coded using the NPI, which is available as public data in the NPPES data set. NPPES is available here:

Organizational NPIs are very rare in the data set, since they do not appear frequently as the “prescriber” in Part D data, but they do occasionally occur.

When interpreting the data, it is important to remember that script writing that falls below the 11-patient threshold is redacted. So if a prescriber with an NPI of 8888888881 had the following data pattern:

8888888881, 0, 11, 11
8888888881, 1, 11, 11
8888888881, 2, 11, 11
8888888881, 3, 11, 11
8888888881, 4, 11, 11
8888888881, 5, 11, 11

You can be certain that you have all of the data for provider 8888888881 and that there are at least 11 patients in total who received prescriptions (assuming that 8888888881 provided each patient with one script delivered by each of the 6 pathways). You also know that no more than 66 patients were given scripts by 8888888881 (with each patient given scripts by one and only one pathway).

Things get more complicated for providers with only a few rows of data. Consider a hypothetical provider 7777777772:

7777777772, 3,11,11

Perhaps there are six redacted rows of data for this provider. Each of those rows could hold zero patients. Or each redacted row could hold as many as 10 patients, the largest count that still falls below the threshold.
So when calculating statistics about 7777777772 you must use ranges. As with 8888888881, you can be certain that there are at least 11 patients who have been prescribed medication by 7777777772. But it is also possible that there are as many as 60 patients for whom data has been redacted.
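The range reasoning above can be sketched in a few lines of Python. This is a hedged illustration, not an official method: the function name patient_bounds and the parameter n_origin_codes are introduced here for the example. The number of possible categories is left as a parameter because the two worked examples count differently: the 66-patient ceiling for 8888888881 treats the six codes 0 through 5 as the full set, while six redacted rows for 7777777772 imply seven categories (the six codes plus blank).

```python
REDACTION_THRESHOLD = 11  # rows with fewer patients than this are not published


def patient_bounds(visible_patient_counts, n_origin_codes):
    """Return (lower, upper) bounds on a prescriber's distinct patient count.

    visible_patient_counts: patient_count values from the published rows.
    n_origin_codes: how many origin categories could exist (an assumption
    the caller must make: 6 for codes 0-5, 7 if blank is also counted).
    """
    if not visible_patient_counts:
        raise ValueError("need at least one published row")
    hidden_rows = n_origin_codes - len(visible_patient_counts)
    if hidden_rows < 0:
        raise ValueError("more published rows than possible origin codes")
    # Lower bound: every patient may appear under every pathway, so the
    # largest single row is the fewest distinct patients possible.
    lower = max(visible_patient_counts)
    # Upper bound: no patient overlaps between pathways, and every hidden
    # row holds the largest redactable count (threshold minus one).
    upper = sum(visible_patient_counts) + hidden_rows * (REDACTION_THRESHOLD - 1)
    return lower, upper


print(patient_bounds([11] * 6, n_origin_codes=6))  # provider 8888888881
print(patient_bounds([11], n_origin_codes=7))      # provider 7777777772
```

Under these assumptions the first call gives a range of 11 to 66 patients, and the second gives 11 to 71, that is, the 11 published patients plus up to 60 redacted ones.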

For most prescribers, the redactions do damage the quality of estimates of the degree to which they use one prescription delivery mechanism over another. Depending on the type of analysis you want to perform, it might be simpler to exclude providers who have fewer than 100 patients or fewer than 2 or 3 rows of data. This would help ensure that your analysis only includes providers for whom the public data is largely complete.
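One way to apply such an exclusion rule is sketched below in Python. The function name well_covered_npis and its parameters are hypothetical, and the sketch makes one simplifying assumption that should be checked against your own analysis: it uses the sum of published patient_count values as a stand-in for a provider's patient total, which overcounts any patient who received scripts by more than one pathway.

```python
from collections import defaultdict


def well_covered_npis(rows, min_patients=100, min_rows=2):
    """Return the set of NPIs that pass both exclusion thresholds.

    rows: iterable of (prescriber_npi, patient_count) pairs, one per
    published row. The summed patient_count is only a proxy for the
    number of distinct patients (assumption, see the text above).
    """
    patients = defaultdict(int)
    row_counts = defaultdict(int)
    for npi, patient_count in rows:
        patients[npi] += patient_count
        row_counts[npi] += 1
    return {
        npi
        for npi in patients
        if patients[npi] >= min_patients and row_counts[npi] >= min_rows
    }


# Hypothetical rows, for illustration only.
example_rows = [
    ("8888888881", 60),
    ("8888888881", 50),
    ("7777777772", 11),
    ("6666666663", 150),
]
print(well_covered_npis(example_rows))
```

In this made-up example only 8888888881 is kept: 7777777772 has too few patients, and 6666666663 has only one published row.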