Job Description
The Informatics Data Engineer will be responsible for developing and maintaining highly scalable and reliable data management pipelines, tools, and centralized databases for conducting analyses involving clinical and phenotypic data with an overarching goal of improving clinical phenotyping. You will design, implement, automate, and maintain the ETL pipelines that facilitate approaches for extracting and analyzing large-scale phenotypic datasets, including de-identified EHR data from external collaborators, targeted clinical datasets in selected cohorts, and internal datasets from clinical trials and other human subject research. The engineer will work with analysts, clinical scientists, software developers, and programmers to provide the best data management technology solutions to store, standardize, structure and mine the clinical and phenotypic data sets.
As a Clinical Informatics Engineer, your typical day may include:
Develop tools and pipelines, and optimize internal data processes for extraction, curation, processing, and storage of clinical data.
Develop code and automate production data ETL and quality assurance pipelines.
Develop, update, and maintain standards and procedures for data and database access, storage, versioning, and maintenance.
Use a customer-focused approach to provide data extraction and storage solutions driven by scientific use cases.
Function as a "super user" of data reporting, analysis, and management processes and tools.
Build and maintain data standardization and optimization solutions to support the use of AI/ML for advanced phenotyping.
Perform basic data analysis, including mining and curating phenotypic datasets, to facilitate downstream genomic analysis and workflows pertaining to reporting and dashboarding.
Maintain close collaboration and coordination with external health system collaborators and informatics teams mining EHR and phenotypic data sets. Work with these collaborators to structure data and develop algorithms, rules engines, and querying tools to access and curate phenotypic datasets.
This role might be for you if:
You are a data steward.
You are interested in data management, mining, clinical databases, and hospital health informatics databases.
You are proficient in developing Python-based data processing pipelines and are familiar with ETL tools such as AWS Glue.
You understand AI/ML concepts and architecture well enough to support the activities of the clinical informatics ML team.
You can multitask and manage simultaneous projects to meet deadlines with strong attention to detail.
You can interpret and communicate analytical information clearly and concisely.
You have exceptional analytical, organizational, and quantitative problem-solving skills and a willingness to learn and acquire new skills.
You excel at managing relationships and projects involving diverse partners.
You communicate findings clearly and document work for training and replication purposes.
To be considered for this role, you must have a bachelor's or master's (preferred) degree in Computer Science, Information Science, Informatics, or another relevant data engineering field, and a minimum of 3 years of working experience in data engineering and management, including ETL pipeline development, automation, and management. Healthcare and EHR data management experience is preferred. Familiarity with data mining, clinical databases, and hospital health informatics databases, including EHR data structures. Familiarity with clinical data standards such as ICD, SNOMED, LOINC, and OMOP, and with database architecture and administration. Proven experience with Hadoop. Demonstrated understanding of relational database concepts and querying tools. Experience with CI/CD frameworks and data flow orchestration. Working knowledge of programming languages such as Python and R. Experience with cloud computing services such as AWS and GCP. Experience with agile methods (Scrum, Kanban) and tools such as Atlassian JIRA and Microsoft Team Foundation Server. Experience with machine learning is not necessary but is certainly a plus. However, a general understanding of AI/ML concepts sufficient to support the deployment and orchestration of datasets as part of a pipeline is expected. The level is commensurate with education and experience.
Does this sound like you? Apply now to take your first steps toward living the Regeneron Way! We have an inclusive and diverse culture that provides comprehensive benefits including health and wellness programs, fitness centers and equity awards, annual bonuses, and paid time off for eligible employees at all levels!
Regeneron is an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion or belief (or lack thereof), sex, nationality, national or ethnic origin, civil status, age, citizenship status, membership of the Traveler community, sexual orientation, disability, genetic information, familial status, marital or registered civil partnership status, pregnancy or parental status, gender identity, gender reassignment, military or veteran status, or any other protected characteristic in accordance with applicable laws and regulations. We will ensure that individuals with disabilities are provided reasonable accommodations to participate in the job application process. Please contact us to discuss any accommodations you think you may need.
The salary ranges provided are shown in accordance with U.S. law and apply to U.S.-based positions, where the hired candidate will be located in the U.S. If you are outside the U.S., please speak with your recruiter about salaries and benefits in your location.
Salary Range (annually)