




Job Summary: We are seeking a DevOps/SRE/MLOps Engineer with AWS experience to design, optimize, and manage cloud-native architectures, automate processes, and ensure security.

Key Highlights:

1. Design and optimization of CI/CD processes and deployment automation.
2. Management of scalable cloud-native architectures on AWS and observability.
3. Cross-functional collaboration and technical leadership in DevOps and SRE.

**Responsibilities**

* Design and optimize development and deployment processes using CI/CD pipelines, automated testing, monitoring, and continuous improvement to increase deployment frequency and reliability.
* Participate in the design and management of scalable cloud-native architectures on AWS, ensuring high availability, redundancy, and adherence to SLA objectives.
* Manage and evolve infrastructure as code, leading migrations and architectural improvements when necessary.
* Implement and enhance observability services, defining metrics, SLIs, SLOs, alerts, and traceability to detect and resolve performance and availability issues.
* Develop the internal development platform (IDP), improving team experience through tools, templates, scaffolds, and test environments.
* Automate data and machine learning operations, collaborating with the data team on DataOps and MLOps practices across the model lifecycle.
* Ensure infrastructure security, proper identity and secrets management, and regulatory compliance in regulated environments.
* Collaborate cross-functionally with product, data, security, and business teams, serving as a technical reference for DevOps and SRE best practices.
* Drive cost-optimization (FinOps) and operational efficiency initiatives.

**Requirements**

* Minimum of 5 years’ proven experience in DevOps, SRE, MLOps, or similar roles in production environments.
* Advanced knowledge of AWS cloud infrastructure, Docker containers, networking, and security.
* Proficiency with CI/CD tools and infrastructure-as-code pipelines (Terraform, CDK, or others).
* Experience with observability tools such as Prometheus, Grafana, Elastic, or APM solutions.
* Knowledge of MLOps, including pipeline orchestration, model versioning, and machine learning model monitoring.
* Programming skills in Python and/or Bash for building reusable automations.
* Experience with microservices architecture and APIs.
* Security and regulatory compliance orientation, with familiarity with standards such as PCI DSS or ISO 27001.
* Communication and leadership skills to coordinate teams and foster DevOps and SRE culture.
* Strong motivation for continuous improvement and technical excellence.


