Lab Data Engineer
Turbine
Join our team as a Lab Data Engineer and help virtualize biological experiments to accelerate discovery!
The Lab Data Engineer is responsible for building data pipelines that effectively process our high-throughput screening results. Your goal is to continuously develop and maintain our existing data pipelines for use cases ranging from plate reader-based readouts and microscopy images to, ideally, NGS-derived and omics data. You will be responsible for creating dashboards and QC reports for our lab scientists, proactively communicating with them about their needs, exploring new processing algorithms, and working with the database owners on long-term storage.
Responsibilities
- Data handling: Back up and handle data according to standard practices established for data pipelines
- Data pipeline development: Program and develop data pipelines for quality control and downstream handover to the modeling teams
- Troubleshooting: Support lab scientists during daily operations with troubleshooting and insights
- Cross-team data analysis support: Work together with other team members and lab teams to assess their needs and provide insights from the data generated
- QC report generation: Deliver QC reports for lab scientists for evaluation of screen results
- Long-term database storage: Coordinate persistence of the data in a centralized location with the database owners
- Data model development: Propose ideas on how to develop and sustain data models and schemas for various biological data types
- Documentation: Document the processing workflows and logic
Key Expectations
- Design, build, maintain, and continuously improve data pipelines for high-throughput screening (HTS) data.
- Ensure reliable processing of diverse biological data types (plate reader data, microscopy images, NGS/omics data).
- Enable smooth handover of processed data to modeling teams.
- Implement and maintain QC workflows.
- Generate clear and actionable QC reports for lab scientists.
- Work closely with lab scientists to understand experimental needs.
- Partner with modeling/data science teams for downstream analysis.
- Coordinate with database owners for long-term storage and architecture alignment.
- Maintain clear documentation of data workflows.
- Standardize processes for reproducibility and onboarding.
- Communicate technical solutions effectively to non-technical stakeholders.
Skills
- Strong proficiency in Python (preferred), R, or similar languages.
- Experience building ETL/ELT pipelines.
- Experience with workflow orchestration tools (e.g., Airflow, Nextflow, Snakemake).
- Version control (Git).
- Experience handling high-throughput screening data.
- Processing plate reader outputs and assay readouts.
- Image processing (e.g., microscopy image analysis tools).
- Familiarity with NGS pipelines and omics data processing.
- Data cleaning, normalization, and transformation techniques.
- QC metrics implementation and automation.
- Dashboard development (e.g., Dash, Streamlit, Tableau, Power BI).
- Automated report generation.
- SQL and relational databases (PostgreSQL, MySQL).
- Experience with data modeling and schema design.
- Data warehousing concepts.
- Data backup and archiving best practices.
- Familiarity with cloud storage solutions (Azure is a plus).
- Understanding of downstream modeling workflows.
- Knowledge of statistical analysis for biological datasets.
- Familiarity with biological data formats.
Soft Skills
- Ability to translate technical concepts into clear insights for lab scientists.
- Proactive communication regarding pipeline issues and improvements.
- Clear documentation writing skills.
- Cross-functional teamwork with scientists, data scientists, and database teams.
- Stakeholder management across technical and non-technical teams.
- Takes initiative in improving systems and processes.
- Suggests scalable, long-term solutions rather than short-term fixes.
- Comfortable working with evolving biological data types.
- Willingness to learn new algorithms and technologies.
- Able to operate in dynamic research environments.
You are a Turbiner if these personality traits apply to you
- You have self-awareness and a strong commitment to personal growth
- You seek out feedback, learn from mistakes, and apply what you have learned
- You take responsibility and focus on outcomes over tasks
- You focus on shared success and are open to challenging others respectfully to get the best possible outcome
- You are in it for the long haul and have a passion for our mission