Smooth our data pipeline by extending and improving automation. You have experience writing, testing, and reviewing SQL statements and otherwise managing multi-stage SQL data pipelines. You are very comfortable with MariaDB and MySQL. You know how to write PHP/Python code that executes SQL commands, and you are generally at home on the Linux command line. You find the notion of being on a team that builds and maintains healthcare data ETL for the long haul exciting.
This role would be a fit for a senior PHP/Python engineer who is interested in repurposing a developer career into data engineering and/or data science, or for someone who is already a mid-level data engineer or data scientist with MySQL experience.
● Understand the underlying data structures in our data pipeline, including our data sources and how they are managed in our data warehouse, within 1 month
● Contribute to the maintenance of our ETL pipeline system within 1 month
● Write reports to assist data analysis within 2 months
● Collaborate with other employees in SQL report-writing within 3 months
● Write new SQL-based data ingestion pipelines to expand the contents of our data warehouse, where the SQL is encased in lightweight PHP/Python scripts that exist only for this purpose, within 3 months
● Create and maintain multiple-stage raw-SQL ETL transformation pipelines, including complicated data fixing and repair processes, within 3 months
● Develop and use custom command-line-based data munging tools within 6 months
● If you know PHP, become familiar with Python; if you know Python, become familiar with PHP, within 6 months
● At least 2 years of experience using SQL
● At least 1 year of experience using SQL for ETL
● At least 2 years of experience working with MySQL or PostgreSQL
● Bachelor's degree in IT/Comp Sci or equivalent experience
● Understand how to use SQL to do ETL tasks. Understand which ETL tasks are better performed outside of SQL.
● Understand advanced indexing techniques in MariaDB as they apply to various DB engines (InnoDB vs. MyISAM, etc.).
● Communicate technical subjects clearly in writing.
● Operate in a fully pun-compliant environment.
● Document code so that others can easily understand it.
● CLI scripting experience in either PHP or Python.
● SQL generally, MariaDB specifically. You will be tested on the differences between types of JOINs, on your basic understanding of the different permutations of the “CREATE TABLE” syntax, etc.
● Capable of breaking complex data transformations into several distinct SQL statements that run one after another.
● Python for data analysis (Pandas, Jupyter, etc)
● X12/HL7, claims data and/or other clinical data standards
● SAS (the statistical programming language)
● R, Stata or SPSS and/or statistical methods
● Reporting engines
● Agile software development principles
● Unit testing and/or other test-driven development methods
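To give a feel for the work, here is a minimal sketch of a multi-stage raw-SQL pipeline wrapped in a lightweight Python script, with a complex transformation split into distinct SQL statements that run one after another. It is an illustration only, not our production code: the table and column names (raw_claims, clean_claims, provider_totals, npi, paid_amount) and the run_pipeline function are hypothetical, and Python's built-in sqlite3 module stands in for MariaDB so the example runs anywhere.

```python
import sqlite3

# Hypothetical landing table: everything arrives as text.
CREATE_RAW = "CREATE TABLE raw_claims (claim_id TEXT, npi TEXT, paid_amount TEXT)"

# Each transformation stage is one plain SQL statement, run in order.
TRANSFORM_STAGES = [
    # Stage 1: repair the raw data. Trim identifiers, cast amounts to numbers,
    # and drop rows whose provider identifier is blank.
    """CREATE TABLE clean_claims AS
       SELECT claim_id,
              CAST(TRIM(npi) AS TEXT) AS npi,
              CAST(paid_amount AS REAL) AS paid_amount
       FROM raw_claims
       WHERE TRIM(npi) <> ''""",
    # Stage 2: build a per-provider summary from the cleaned table.
    """CREATE TABLE provider_totals AS
       SELECT npi, COUNT(*) AS claim_count, SUM(paid_amount) AS total_paid
       FROM clean_claims
       GROUP BY npi""",
]


def run_pipeline(conn, rows):
    """Load raw rows, run every stage in order, and return the summary rows."""
    conn.execute(CREATE_RAW)
    conn.executemany("INSERT INTO raw_claims VALUES (?, ?, ?)", rows)
    for sql in TRANSFORM_STAGES:
        conn.execute(sql)
    conn.commit()
    query = "SELECT npi, claim_count, total_paid FROM provider_totals ORDER BY npi"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Invented sample data: one padded identifier and one blank identifier.
    sample = [
        ("c1", " 111 ", "10.50"),
        ("c2", "111", "4.50"),
        ("c3", "", "99"),
        ("c4", "222", "7"),
    ]
    conn = sqlite3.connect(":memory:")
    for row in run_pipeline(conn, sample):
        print(row)
    conn.close()
```

The script itself does almost nothing beyond sequencing the statements; the logic lives in the SQL, so each stage can be reviewed and rerun on its own.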
Location: Houston, TX preferred but remote candidates considered
Position Type: Full-time permanent
About CareSet Systems
CareSet (https://careset.com) is transforming the way biomedical companies go to market. We believe in getting the best treatments to the right patients quickly and efficiently. We do that by analyzing government data sources, such as Medicare claims data. With CareSet, biomedical companies become better at serving the patient community. To apply, contact email@example.com.