Senior Data Platform / Data Engineer

Indeed

Full-time

Onsite

No experience limit

No degree limit

C. Sol, 1, 28950 Moraleja de Enmedio, Madrid, Spain


Description

Summary: Join our ML Platform team as a Senior Data Platform / Data Engineer to build and scale data infrastructure for AI products, focusing on Data Lakehouse and dataset management.

Highlights:

1. Shape the next generation of our data platform and AI products
2. Collaborate closely with ML researchers and engineers
3. Improve data quality, reproducibility, and traceability

**Straumann Group**

At Straumann Group we’re on an exciting journey of growth, innovation, and impact, driven by our mission to improve oral health and transform millions of lives worldwide. United by purpose, we bring our best selves to work every day, embracing a high-performance, player-learner culture that inspires collaboration, curiosity, and ambition. Here, you’ll have the opportunity to take charge of your own career, harnessing your skills, passion, and enthusiasm for learning to continually grow and progress. Together, we’re not just shaping brighter smiles; we’re unlocking the potential of people everywhere, including our own.

**About the role**

We are looking for a Senior Data Platform / Data Engineer to join our ML Platform team and help build and scale the data infrastructure that powers our AI products in dentistry. Our platform supports the full AI development lifecycle, from raw data ingestion and annotation workflows to dataset versioning and model training pipelines. You will work closely with Machine Learning Researchers (MLRs), MLOps engineers, and product teams to ensure our data infrastructure is reliable, scalable, and easy to use.

A key focus of the role is improving our Data Lakehouse (DLH) and dataset management workflows. This includes dataset versioning with DVC and the way data is prepared, extracted, and consumed across research and production systems.

**What you will work on:**

You will play a key role in shaping the next generation of our data platform.
**Typical responsibilities include:**

Data platform ownership

* Design and evolve the Data Lakehouse (DLH) architecture used across our ML teams.
* Improve the reliability and structure of data ingestion, extraction, and transformation pipelines.
* Ensure datasets used for training and evaluation are consistent, reproducible, and well documented.

Dataset lifecycle management

* Improve workflows for dataset versioning and reproducibility using tools such as DVC.
* Design solutions for managing multiple versions of datasets and annotations across experiments and models.
* Make it easier for researchers to reliably retrieve the correct dataset versions.

Data pipelines and infrastructure

* Build and maintain scalable data pipelines in Python.
* Improve metadata management, dataset validation, and data quality monitoring.
* Optimize data workflows across AWS-based infrastructure.

Collaboration with ML teams

* Work closely with ML researchers and ML engineers to understand their data needs.
* Support research workflows with reliable and efficient data access patterns.
* Help translate research requirements into robust platform capabilities.

Data governance and quality

* Implement practices for data quality, reproducibility, and traceability across the ML lifecycle.
* Ensure our data infrastructure meets the requirements of regulated AI development.
**What we’re looking for:**

**Must have:**

* Strong Python engineering skills
* Experience building data pipelines or data platforms
* Experience working with AWS
* Experience working with large datasets used in ML workflows
* Strong software engineering practices (testing, CI/CD, documentation)
* Experience collaborating with ML teams or working in AI environments

**Nice to have:**

* Experience with dataset versioning tools such as DVC
* Experience with Kubernetes
* Experience with data lakehouse architectures
* Experience working with annotation pipelines or ML training datasets
* Experience with PostgreSQL, Metabase, or similar data tooling
* Experience working in regulated environments (medical / healthcare AI)

**Our stack**

* AWS
* Python
* Kubernetes
* PostgreSQL
* Metabase
* DVC for dataset versioning
* Internal Data Lakehouse infrastructure

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or disability.

**Employment Type:** Full Time

**Alternative Locations:** Spain : Madrid

**Travel Percentage:** 0-10%

**Requisition ID:** 20071

Source: Indeed