CareSet Dataset: DocGraph Hop Teaming Documentation


Data Structure

The data has the following columns:

  • from_npi – The provider seen first in sequence, coded by NPI
  • to_npi – The provider seen second in sequence, coded by NPI
  • patient_count – The total number of patients shared between the two providers over the entire time period (the time period is typically one year)
  • transaction_count – The count of times that a patient switched between the two providers, in the from-to direction.
  • average_day_wait – The average number of days it took for a “HOP” to occur, that is, the time, in days, it took a patient to see the second provider after having seen the first.
  • std_day_wait – The standard deviation of days it took for a HOP to occur.
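A minimal Python sketch of reading rows with these columns into typed records, using only the standard library. It assumes the CSV has a header row with exactly these column names, which is worth checking against your copy of the file. NPIs are kept as strings so that malformed or cruft values survive intact instead of failing an integer conversion.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass(frozen=True)
class HopEdge:
    """One directed provider pair (from -> to) from a Hop Teaming file."""
    from_npi: str            # provider seen first; kept as text, not int
    to_npi: str              # provider seen second
    patient_count: int       # distinct patients shared over the period
    transaction_count: int   # from -> to switches
    average_day_wait: float  # mean days between seeing from_npi and to_npi
    std_day_wait: float      # standard deviation of those waits


def read_hop_edges(f: TextIO) -> Iterator[HopEdge]:
    """Stream HopEdge records from an open CSV file that has a header row."""
    for row in csv.DictReader(f):
        yield HopEdge(
            from_npi=row["from_npi"].strip(),
            to_npi=row["to_npi"].strip(),
            patient_count=int(row["patient_count"]),
            transaction_count=int(row["transaction_count"]),
            average_day_wait=float(row["average_day_wait"]),
            std_day_wait=float(row["std_day_wait"]),
        )


if __name__ == "__main__":
    # Made-up row for illustration only.
    sample = io.StringIO(
        "from_npi,to_npi,patient_count,transaction_count,"
        "average_day_wait,std_day_wait\n"
        "1111111111,2222222222,15,22,12.5,4.0\n"
    )
    for edge in read_hop_edges(sample):
        print(edge)
```

Because read_hop_edges is a generator, it can walk the full file without holding it in memory; open the real file with newline="" as the csv module documentation recommends.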


For additional information about the Hop Teaming dataset, please refer to our blog post here. We are not copying the contents of the blog post here, so it is critically important for you to read that content before continuing.

“NPI” stands for National Provider Identifier, a unique identifier assigned to a specific person or institution that bills for services in the United States. CMS maintains these identifiers in its NPPES (National Plan and Provider Enumeration System) and regularly releases information about which provider has which NPI here:


How do I look up information about the NPIs?

The NPPES data records what type of provider each NPI is (e.g., physician, hospital, nurse practitioner). NPPES comes with an extensive README, which should be read carefully. Using this information, it is possible to link each NPI to a real name, location, contact information, and provider type.
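A minimal sketch of pulling names and provider types out of NPPES for just the NPIs you care about. It streams the NPPES CSV once and keeps only the requested NPIs, since the full file is very large. The column headers used here ("NPI", "Entity Type Code", "Provider Organization Name (Legal Business Name)", "Provider Last Name (Legal Name)", "Provider First Name", "Healthcare Provider Taxonomy Code_1") are assumptions based on recent NPPES full-replacement files; confirm them against the README for the release you download.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Assumed NPPES column headers; confirm them against your release's README.
NPI_COL = "NPI"
ENTITY_COL = "Entity Type Code"  # "1" = individual, "2" = organization
ORG_COL = "Provider Organization Name (Legal Business Name)"
LAST_COL = "Provider Last Name (Legal Name)"
FIRST_COL = "Provider First Name"
TAXONOMY_COL = "Healthcare Provider Taxonomy Code_1"


def load_nppes_subset(f: TextIO, wanted: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Stream an NPPES CSV once and return {npi: {"name", "taxonomy"}}
    for only the NPIs in `wanted`, keeping memory use small."""
    wanted_set = set(wanted)
    found: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(f):
        npi = row[NPI_COL]
        if npi not in wanted_set:
            continue
        if row[ENTITY_COL] == "2":
            name = row[ORG_COL]  # organizations carry a legal business name
        else:
            name = f"{row[FIRST_COL]} {row[LAST_COL]}".strip()
        found[npi] = {"name": name, "taxonomy": row[TAXONOMY_COL]}
    return found


if __name__ == "__main__":
    # Tiny made-up extract using the assumed headers.
    sample = io.StringIO(
        f'"{NPI_COL}","{ENTITY_COL}","{ORG_COL}","{LAST_COL}","{FIRST_COL}","{TAXONOMY_COL}"\n'
        '"1111111111","1","","Smith","Jane","207Q00000X"\n'
        '"2222222222","2","Example Hospital","","","282N00000X"\n'
    )
    print(load_nppes_subset(sample, ["1111111111", "2222222222"]))
```

Each from_npi and to_npi in a Hop row can then be looked up in the returned dictionary. Collecting the wanted NPIs first (for example, from a filtered Hop extract) keeps the lookup table small.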

This is a “shared patient” dataset. How do I calculate referrals?

Strictly speaking, you cannot calculate this perfectly. Just because providers A and B shared a set of patients does not mean that this sharing was the result of a referral between A and B. Other explanations include that the patients were referred to both A and B by a third provider, C, or that A and B are simply part of a non-optional chain of care (many primary care physicians connect with emergency departments through this mechanism).

However, here are some assumptions that can help you to estimate which connections are explicit referrals.

  • Use MrPUP in combination with HOP. MrPUP shows the portion of referrals where the referral was explicitly noted on the claim. Not all referrals are explicitly listed in claims, but most lab and imaging referrals will appear there. For services where CMS requires the referral field to be populated in order to pay the claim, MrPUP is very reliable.
  • You can combine the average and standard deviation “day wait” fields to estimate how quickly the bulk of patients were seen by provider B after leaving provider A. Some people use closeness in time to infer that a referral has taken place. Standard deviations are very useful for this, but explaining them properly (especially for time-series data, which is not normally distributed) is well beyond the scope of this documentation.
  • Use the NPPES provider type information to estimate which types of pairings are likely referral relationships and which are not. For instance, a primary care to cardiologist connection is likely a referral, but a primary care to emergency department connection is unlikely to be one.
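The last two heuristics can be combined into a rough classifier. This is a minimal sketch under several assumptions. The taxonomy map covers only a handful of NUCC taxonomy codes (207Q00000X Family Medicine, 207R00000X Internal Medicine, 207RC0000X Cardiovascular Disease, 207P00000X Emergency Medicine) as an illustration, and the pair rules and the 30-day window are hypothetical choices, not part of HOP. Because wait times are not normally distributed, the sketch does not assume a bell curve; it uses Cantelli's one-sided inequality, which guarantees that, for any distribution, at least 1 − 1/(1 + k²) of hops took less than average_day_wait + k·std_day_wait days (80% for k = 2). That reading assumes the two fields summarize the individual hops.

```python
from typing import Dict, Optional, Tuple

# Illustrative subset of NUCC taxonomy codes mapped to coarse categories.
# The grouping is hypothetical; extend it for real analysis.
TAXONOMY_CATEGORY: Dict[str, str] = {
    "207Q00000X": "primary_care",  # Family Medicine
    "207R00000X": "primary_care",  # Internal Medicine
    "207RC0000X": "cardiology",    # Cardiovascular Disease
    "207P00000X": "emergency",     # Emergency Medicine
}

# Hypothetical judgments about directed (from, to) category pairings.
PAIR_LOOKS_LIKE_REFERRAL: Dict[Tuple[str, str], bool] = {
    ("primary_care", "cardiology"): True,
    ("primary_care", "emergency"): False,
}


def wait_bound(avg_days: float, std_days: float, k: float = 2.0) -> Tuple[float, float]:
    """Return (day_bound, min_share). By Cantelli's one-sided inequality, at
    least min_share of hops took fewer than day_bound days, for any distribution."""
    return avg_days + k * std_days, 1.0 - 1.0 / (1.0 + k * k)


def classify_pair(from_taxonomy: str, to_taxonomy: str, avg_days: float,
                  std_days: float, max_days: float = 30.0,
                  k: float = 2.0) -> Optional[bool]:
    """True: plausible referral with prompt follow-up. False: pairing judged
    unlikely to be a referral. None: the heuristics cannot tell."""
    pair = (TAXONOMY_CATEGORY.get(from_taxonomy), TAXONOMY_CATEGORY.get(to_taxonomy))
    verdict = PAIR_LOOKS_LIKE_REFERRAL.get(pair)
    if verdict is False:
        return False
    if verdict is None:
        return None  # pairing not covered by the rules
    day_bound, _ = wait_bound(avg_days, std_days, k)
    # A slow follow-up does not rule out a referral, so it stays "unknown".
    return True if day_bound <= max_days else None
```

Returning None rather than False for slow follow-ups reflects that timing is only supporting evidence; the provider-type pairing does most of the work.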

Can you calculate referral counts from this data?

As for the remaining question of “how many referrals?”, there are two things you could mean:
1. How many patients are referred between A and B?
When the relationship between A and B is a referral-style relationship, the patient_count field answers this question, since it is unlikely that a patient saw B after seeing A without a referral occurring.
2. How many transactions are referred between A and B?
You should use the transaction_count field to estimate this number.

HOP and the original FOIA DocGraph data count transactions differently. The HOP algorithm gives a more conservative estimate, but it could underestimate the number of appointments that are “inspired” by the “referral”. The original FOIA algorithm likely over-estimates with longer sliding windows and under-estimates with shorter ones.

Generally, however, it is a very good assumption that when the transaction_count and patient_count fields are nearly the same, the number of inspired transactions per patient approaches one. When the transaction count is much larger, it requires interpretation.
You will need to study how HOP calculates the transaction_count field in order to use this field effectively.
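A minimal sketch of that comparison. The ratio transaction_count / patient_count cannot fall below 1 for a released pair (the privacy rules guarantee at least one transaction per patient), so values close to 1 suggest roughly one inspired transaction per patient. The 1.1 cutoff is a hypothetical choice for illustration, not a documented threshold.

```python
from typing import List, Tuple

Row = Tuple[str, str, int, int]  # (from_npi, to_npi, patient_count, transaction_count)


def transactions_per_patient(patient_count: int, transaction_count: int) -> float:
    """Average number of from -> to transactions per shared patient."""
    if patient_count <= 0:
        raise ValueError("patient_count must be positive")
    return transaction_count / patient_count


def split_by_ratio(rows: List[Row], near_one: float = 1.1):
    """Split rows into pairs whose ratio is close to 1 and pairs whose
    larger ratio needs closer interpretation. Each output item is
    (from_npi, to_npi, ratio)."""
    simple, needs_review = [], []
    for from_npi, to_npi, patients, transactions in rows:
        ratio = transactions_per_patient(patients, transactions)
        target = simple if ratio <= near_one else needs_review
        target.append((from_npi, to_npi, ratio))
    return simple, needs_review
```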

I cannot load this file in Excel. What do I do?

Working with this file in its entirety will require database experience. However, it is possible to use Excel on smaller files derived from this large CSV file. To do this without loading the file into a database and connecting it to NPPES, you can use csvkit, which allows you to perform some SQL-like operations on CSV files. Start by choosing a provider you want to study using the NPPES file, and then use csvgrep to find all mentions of that NPI in a specific field of the file.
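csvgrep handles this from the command line (its -c option selects the column and -m gives the value to match). For readers who prefer Python, here is a minimal sketch of the same idea with the standard csv module: it streams the large file row by row and writes only the rows where a chosen NPI appears in from_npi or to_npi, producing an extract small enough for Excel. It assumes the file has a header row using the column names listed earlier.

```python
import csv
import io
from typing import Iterable, TextIO


def extract_npi_rows(src: TextIO, dst: TextIO, npi: str,
                     columns: Iterable[str] = ("from_npi", "to_npi")) -> int:
    """Copy the header plus every row where any of `columns` equals `npi`
    exactly. Returns the number of data rows written."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    cols = tuple(columns)
    written = 0
    for row in reader:
        if any(row[c] == npi for c in cols):
            writer.writerow(row)
            written += 1
    return written


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = io.StringIO(
        "from_npi,to_npi,patient_count,transaction_count,"
        "average_day_wait,std_day_wait\n"
        "1111111111,2222222222,15,22,12.5,4.0\n"
        "3333333333,4444444444,12,12,3.0,1.0\n"
    )
    out = io.StringIO()
    print(extract_npi_rows(sample, out, "2222222222"))  # number of rows kept
    print(out.getvalue())
```

For the real file, open both the input and the output with newline="" and pass the file objects in; the output CSV can then be opened directly in Excel.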

Has the data been pre-processed or filtered?

Previous versions of this data were not filtered and included invalid NPI values present in the source data. In later versions, we have started to remove some cruft NPIs from the dataset. It’s easy to tell which generation of the file you have: just grep for 99999999. That is a very common cruft NPI value in the messy CMS source data. For the most part, values like this must be ignored because it is impossible to work out what they mean.
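Beyond grepping for 99999999, you can also screen for values that cannot be valid NPIs at all. Under the CMS NPI specification, the tenth digit is a Luhn check digit computed over the first nine digits with the prefix 80840 prepended (equivalently, adding a constant 24 to the Luhn sum). Below is a minimal sketch of that check. Passing it only means a value is well formed, not that it belongs to an active provider in NPPES, and it will not catch every cruft value.

```python
from typing import Iterable, List

DIGITS = set("0123456789")


def npi_check_digit_ok(npi: str) -> bool:
    """True if npi is 10 ASCII digits and its last digit is the Luhn check
    digit computed over '80840' + the first nine digits."""
    if len(npi) != 10 or not set(npi) <= DIGITS:
        return False
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, starting with the one
    # just left of the check digit, and subtract 9 from doubled values over 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def suspicious_npis(npis: Iterable[str]) -> List[str]:
    """Sorted distinct values that fail the format or check-digit test."""
    return sorted({n for n in npis if not npi_check_digit_ok(n)})
```

Running suspicious_npis over the from_npi and to_npi columns gives a quick list of values worth excluding or investigating.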

How does the privacy threshold work?

Per CMS privacy policy, provider pairs that shared fewer than 11 distinct patients in total during the given time period are not included in the release. As a result, every included pair has at least 11 transactions, at least one for each patient.
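Those two guarantees make a cheap integrity check when loading a file. A minimal sketch: the threshold of 11 comes from the policy described above, and the choice to report violations rather than silently drop rows is ours.

```python
from typing import Dict, List

MIN_PATIENTS = 11  # CMS small-cell threshold described above


def invariant_problems(row: Dict[str, str]) -> List[str]:
    """List the privacy-threshold invariants this Hop row violates (empty if none)."""
    problems = []
    patients = int(row["patient_count"])
    transactions = int(row["transaction_count"])
    if patients < MIN_PATIENTS:
        problems.append(f"patient_count {patients} < {MIN_PATIENTS}")
    if transactions < patients:
        # Each patient contributes at least one transaction.
        problems.append(f"transaction_count {transactions} < patient_count {patients}")
    return problems
```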

Does DocGraph HOP include data from referral fields?

No. To avoid double counting the relationships between any two NPIs, no version of DocGraph (including HOP) includes data from the referring NPI field. Typically, an NPI that appears in a referral field will also appear as a performing provider on a different claim. This is why it can help to combine HOP with MrPUP, which partially addresses this gap by detailing which NPIs are referring.
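One way to combine the two is to mark HOP pairs that also appear as explicit referrals. This is a minimal sketch that deliberately avoids assuming MrPUP's file layout: it expects you to have already extracted (referring NPI, performing NPI) pairs from MrPUP yourself, and it treats HOP's from_npi as the candidate referrer, which is an assumption rather than something either dataset guarantees.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def tag_explicit_referrals(hop_pairs: Iterable[Pair],
                           mrpup_pairs: Set[Pair]) -> List[Tuple[str, str, bool]]:
    """For each (from_npi, to_npi) HOP pair, report whether the same
    (referring, performing) pair was seen explicitly in MrPUP."""
    return [(a, b, (a, b) in mrpup_pairs) for a, b in hop_pairs]


def explicit_share(tagged: List[Tuple[str, str, bool]]) -> float:
    """Fraction of tagged HOP pairs confirmed as explicit referrals."""
    if not tagged:
        return 0.0
    return sum(flag for _, _, flag in tagged) / len(tagged)
```

A low share does not mean the other pairs are not referrals; as noted above, many referrals are never written into the claim's referral field.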

Why has the row count of HOP Teaming increased over the years?

HOP considers every provider associated with a claim in a non-referring role. Recently, CMS has begun adding new provider fields to claims, which has increased the number of providers that can be attached to each claim. The granularity of the data has also improved: in previous years, fewer of the provider fields were correctly filled out. So in current releases, there are more provider fields per claim, and they are more often filled out. The output of the HOP algorithm grows combinatorially as more providers are associated with each claim. This has produced much more granular results, and many more rows of data, in recent years.
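A small illustration of why row counts can grow so quickly. This is not the HOP algorithm itself; for the sake of the arithmetic it only assumes that every ordered pair of distinct providers attached to a claim could become a directed from/to pair. Under that assumption, n providers yield n(n − 1) ordered pairs, so going from 2 to 4 providers per claim turns 2 pairs into 12.

```python
from itertools import permutations
from typing import List


def directed_pairs(providers: List[str]) -> int:
    """Count ordered (from, to) pairs of distinct providers: n * (n - 1)."""
    return sum(1 for _ in permutations(set(providers), 2))


if __name__ == "__main__":
    for n in range(2, 7):
        providers = [f"provider_{i}" for i in range(n)]
        # Prints n and n*(n-1): 2 2, 3 6, 4 12, 5 20, 6 30
        print(n, directed_pairs(providers))
```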