Smooth our data pipeline by extending and improving automation. You have experience writing, testing, and reviewing SQL statements and otherwise managing the quality of multi-stage SQL data pipelines. You are very comfortable with MariaDB and MySQL. You know how to write Python code (preferably with Great Expectations) to ensure SQL commands work properly. You know how to apply code reviews, QA processes, and testing infrastructure to pay down data pipeline debt, and how to work at the Linux command line generally. You find the notion of being on a team that builds and maintains healthcare data ETL for the long haul exciting.
This role would be a fit for a senior Python engineer who is interested in repurposing a developer career into data engineering and/or data science, or for someone who is already a mid-level data engineer or data scientist with MySQL experience. A background in data processing quality assurance is a must, and SAS experience is invaluable.
● Understand the underlying data structures in our data pipeline, including our data sources and how they are managed in our data warehouse, within 1 month
● Contribute to the quality of our ETL pipeline system within 1 month
● Write automated tests for our ETL pipeline within 2 months
● Collaborate with other employees in improving the quality of SQL within 3 months
● Evaluate new SQL-based data ingestion pipelines to expand the contents of our data warehouse, where the SQL is encased in lightweight PHP/Python scripts that exist only for this purpose, within 3 months
● Develop and use custom command-line-based data verification tools within 6 months
● Within 6 months, become familiar with Python if you know SAS, or with SAS if you know Python
● Experience working with healthcare claims data and/or EHR systems, X12/HL7 standards, etc.
● At least 2 years of experience using SQL
● At least 1 year of experience using SQL for ETL
● At least 2 years of experience working with MySQL or PostgreSQL
● Bachelor's degree in IT/Computer Science or equivalent experience
● Understand how to use SQL to do ETL tasks, and which ETL tasks are better performed outside of SQL
● Experience with automated test frameworks, especially Great Expectations.
● Experience with GitHub-based code review processes and continuous integration functionality.
● Communicate technical subjects clearly in writing.
● Operate in a fully pun-compliant environment.
● Document code so that others can easily understand it.
● CLI scripting experience in Python.
● SQL generally, MariaDB and SAS PROC SQL specifically. You will be tested on the differences between types of JOINs, a basic understanding of the different permutations of the “CREATE TABLE” syntax, etc.
● Capable of breaking complex data transformations into several distinct SQL statements that can be
● R, Stata or SPSS and/or statistical methods
● Agile software development principles
Location: Flexible, US-based.
Position Type: Full-time, permanent
About CareSet Systems
CareSet (https://careset.com) is transforming the way biomedical companies go to market. We believe in getting the best treatments to the right patients quickly and efficiently. We do that by analyzing government data sources, such as Medicare claims data. With CareSet, biomedical companies become better at serving the patient community. To apply, contact firstname.lastname@example.org.