HHS fighting silos

HHS finally used the “s”-word (silos) talking about themselves.

The Context:

When anyone seeks to understand what is happening within HHS, a “search” must be conducted. In the era of paper records, this was a literal, physical search. An HHS employee would go to filing cabinets somewhere and find resources.

Now a “search” means figuring out who has a database. Once a database is located, it is queried and the “search” is over. Sounds reasonable, right?

This whole process grinds to a halt when the question being asked requires components from more than one database. In any large organization, this can be a problem, but HHS is one of the largest government agencies in the world.

For a data journalist like me, this means that I can wait years to get relatively simple data query problems solved. In the most frustrating data project that I have been working on with HHS, I have been waiting to get the right answer to an important question since 2008.

This problem does not just impact the researchers and data journalists who sit outside HHS. For every silo problem that I have, the leadership at HHS must have 10-20. Very frequently, the data silo problem is “the problem behind the problem”.

Do you care about Women’s health? Obesity? Health Equity? Opioid addiction? Neonatal health? Birth complication rates?

Perhaps you care about controversial topics like Abortion? Or Euthanasia? Or things that are controversial, but should not be, like Vaccines?

If you care about any of those topics, then you should care about these internal data silos at HHS. Because very frequently, when we discuss the important decisions about critical healthcare policy issues, we need to ask hard and specific questions. The Chief Data Officer of HHS, Mona Siddiqui, says in her blog post on the report, “To collect information and not use it to its fullest potential, however, is not just inefficient, it defies common sense”.

And too often, the answer is “we don’t know and we cannot figure it out, because Database A does not talk to Database B.” And whether the person running HHS is a Trump Republican or an Obama Democrat, that answer is going to frustrate them. And if you work in or around healthcare, or if you or a loved one is impacted directly by one of these topics… then the answer “we don’t know and we cannot figure it out” because of data silos is not acceptable.

This is why the news that HHS is taking formal steps to resolve its internal silo problem matters. Here is the list of significant thresholds that this report crossed:

  • First and foremost, HHS is formally admitting that it has a problem. This report sounds like OIG could have written it, but it was written by the CTO's office. This is important because it means that this bureaucracy is organically facing its own issues. That is huge.
  • The report does not pull punches in its assessment. It talks about its problems in frank terms and does not gloss over the depth of the problem or its impact.
  • It specifically lists what the next steps should be to start working on solutions to the problem.

You will not hear about it from Colbert, and I doubt that Trump is going to tweet anything, but this is still a hugely significant step for healthcare in this country. If you are a wonk, I had you at “HHS is working to solve its silo problem” but even if you are not a wonk, I hope I have illustrated why people are excited about this! Even though the only ones who will be truly enthused will be wonks, patient advocates, and transparency watchdogs!

To the data team at HHS, specifically Mona Siddiqui, Ed Simcox (HHS CTO), and the HHS IDEA Lab, thank you for your (not-so-common) common sense and hard work.

But enough about why this news is important… let’s review what the news was!

The News:

Yesterday, HHS released a report detailing its intra-agency data-sharing obstacles.

HHS first got lots of perspective from agency leaders and personnel from CMS, FDA, NIH, AHRQ and more about high-level data issues. Then they had lengthy interviews with employees who are most familiar with specific data assets. They cast a wide net, and this shows in the quality of the report.

The outline of the report is below, including HHS’ five data collaboration “Challenges” and their summaries of those challenges, which are informative. I’ve also sprinkled in some of my impressions in italics (you might have seen some of these already on my Twitter feed).

Report Outline:

Introduction / Methods

On pg 6 – What is non-public data? If HHS does not understand what “non-public data” means, no one does. One thing that they did not emphasize enough here is the possibility that by linking non-public data, they might be able to generate new releasable data. Obviously we have an ox in that ditch, so I am slightly biased about the importance of that point…  

Challenge 1 – Process for Data Access

HHS lacks consistent and standardized processes for one agency to request data from another agency. Agencies are not accountable for their responses to requests for access to internal data. If access is inappropriately denied or if access is significantly and inappropriately delayed, there are no consequences.

Challenge 2 – Technology for Data Access & Analysis

The technical formats and approaches to sharing restricted and nonpublic data across agencies vary widely. The analytical tools to interpret data can be redundant. Finally, agencies are tracking who has access to restricted and nonpublic data but can be challenged in auditing analyses for misinterpretation and misuse.

Challenge 3 – Regulatory Environment

Each data collection effort has statutes, regulations, and policies that govern the collection of and access to the data. Some statutes limit access to data and its use. In order to increase access or broaden use, changes to the relevant statutes may be required.

This is the most comprehensive high-level regulatory overview of the topic that I have ever read. Only a handful of people that I know have those issues top of mind (for me it is only about 50/50). Very valuable information. I will refer to this section of the report as the canonical all-in-one-place summary of these issues from now on.

Challenge 4 – Disclosure Risk Management

The risk of identifying geographic areas or violating individual privacy increases as more variables and more granular data are collected and shared, often leading to an increase in limits on microdata access.
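To make this challenge concrete, here is a minimal sketch of why disclosure risk grows with granularity. It is not from the report: the records, the variable names, and the helper `records_in_small_cells` are all made up for illustration, and the threshold `k=2` is an arbitrary assumption. The idea is simply that each additional variable splits people into smaller groups, and people in very small groups are easier to re-identify.

```python
from collections import Counter

# Entirely made-up records for illustration; not drawn from any HHS dataset.
RECORDS = [
    {"state": "TX", "age_band": "30-39", "sex": "F", "zip3": "770"},
    {"state": "TX", "age_band": "30-39", "sex": "F", "zip3": "770"},
    {"state": "TX", "age_band": "30-39", "sex": "M", "zip3": "770"},
    {"state": "TX", "age_band": "40-49", "sex": "F", "zip3": "773"},
    {"state": "TX", "age_band": "40-49", "sex": "F", "zip3": "787"},
    {"state": "CA", "age_band": "30-39", "sex": "M", "zip3": "900"},
    {"state": "CA", "age_band": "30-39", "sex": "M", "zip3": "941"},
    {"state": "CA", "age_band": "40-49", "sex": "F", "zip3": "900"},
]


def records_in_small_cells(records, variables, k=2):
    """Count records whose combination of `variables` is shared by fewer than k records."""
    # Each distinct combination of values is one "cell" of the released table.
    cells = Counter(tuple(r[v] for v in variables) for r in records)
    # Records sitting in cells smaller than k are the ones at elevated risk.
    return sum(n for n in cells.values() if n < k)


# Release progressively more variables and watch the at-risk count grow.
released = []
for variable in ["state", "age_band", "sex", "zip3"]:
    released.append(variable)
    at_risk = records_in_small_cells(RECORDS, released)
    print(f"{', '.join(released)}: {at_risk} of {len(RECORDS)} records in cells smaller than 2")
```

With these eight made-up rows, the number of records in a cell of size one climbs from 0 (state only) to 1, then 2, then 6 once the three-digit ZIP prefix is added. That is the pattern that pushes data owners toward tighter limits on microdata access.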

Challenge 5 – Norms & Resource Constraints

Data representatives do not see the demand for sharing restricted and nonpublic data; view the public use files as sufficient for the majority of analyses; and, for certain data programs, view data sharing requests as ad-hoc or special. Strained resources, fear of misrepresentation of the data, and reluctance to critique a sister agency for unsatisfactory data sharing practices all contribute to maintaining the status quo.

This is the best description of the “data hoarding mentality” that I have ever read.

Also love this quote from a staff member: “Data is like manure. It is best when it is spread around.”

Next Steps

Excited to see that “Efforts are underway to construct an enterprise-wide data sharing framework, through validation and collaboration with agencies and using an agile development approach.” (pg 26)

Beyond that, specific next steps include:

  • Understand the landscape
  • Build workforce capacity
  • Identify use cases, demonstrate value
  • Create environment for data analysis, workflow management, and streamline data acquisition


You have to be a real data geek if your favorite part of a paper is an appendix. They list the data sources that they cover in detail.

The end of the report includes a list of 25 different HHS datasets. Of course, we converted this to a CSV file. Direct message us on Twitter or LinkedIn and we will send you the data file.


Fred Trotter

Fred shapes our software development and data gathering strategies, which doesn't stop him from getting elbow-deep in the code on a regular basis. He is co-author of the first Health IT O’Reilly book Hacking Healthcare, and co-creator of the DIRECT protocol mandated in Meaningful Use. Fred’s technical commentary and data journalism work has been featured in several online and print journals including Wired, Forbes, U.S. News, NPR, Government Health IT, and Modern Healthcare.