




Summary: Seeking an MLOps/AIOps/LLMOps/AgentOps Engineer to design, operate, and continuously evolve our AIOps platform, ensuring reliable, scalable, and cost-efficient AI product operations.

Highlights:

1. Design, operate, and evolve the AIOps platform for reliable, scalable AI products
2. Focus on platform, infrastructure, automation, observability, and operations
3. Collaborate with Data Scientists, Data Engineers, and Product teams

We are looking for an **MLOps / AIOps / LLMOps / AgentOps Engineer** to join a multidisciplinary Data & AI team.

The main mission of this role is to **design, operate, and continuously evolve our AIOps platform**, ensuring that our AI products run in a **reliable, scalable, and cost‑efficient** way. This position is **strongly focused on platform, infrastructure, automation, observability, and operations** rather than on building ML models or AI products themselves.

You will work with modern cloud technologies (mainly **AWS**, with some **Azure** exposure) and collaborate closely with **Data Scientists, Data Engineers, and Product teams** to bring AI solutions into production and keep them running smoothly.

We are open to candidates with **strong expertise in at least one core area** (e.g. cloud, DevOps, platform engineering, or ML operations) and **solid foundational knowledge in the others**, with motivation to grow across the full AI operations stack.
**Key Responsibilities**

* **Design, maintain, and evolve the AIOps platform** supporting:
  + Traditional machine learning models in production
  + LLM‑based solutions such as **RAG pipelines and AI Agents**
  + **Speech Analytics** use cases (ASR, conversation analysis, NLP)
* **Build and operate ML and LLM pipelines** with a strong focus on:
  + Reliability, automation, and observability
  + Model and LLM quality, performance, and drift monitoring
  + Cloud cost control and optimization
* **Implement LLMOps / AgentOps practices**, including:
  + LLM evaluation and observability
  + Prompt management, traceability, and specialized logging
  + Agent integration, orchestration, and lifecycle management
* **Ensure continuous operation of AI products**, including:
  + Alerts, dashboards, SLOs / SLIs
  + Scalability strategies and basic auto‑remediation mechanisms
* **Manage deployments in cloud environments** (AWS / Azure) and container platforms (Docker / Kubernetes)
* **Collaborate closely with Data Scientists and Data Engineers** to productionize robust, scalable AI solutions
* **Contribute to internal standards, automation, and best practices** across the AI and data ecosystem

**Required Skills (Must Have)**

* Hands‑on experience in **MLOps, AIOps, or operating ML systems in production**
* Solid understanding of **LLMOps and AgentOps concepts** (RAGs, agents, evaluation, monitoring)
* Experience working with **AWS and/or Azure** in production environments
* Practical knowledge of **containers and Kubernetes** (Docker, basic Helm usage, etc.)
* Experience with **CI/CD pipelines** (GitHub Actions, GitLab CI, Azure DevOps, Jenkins, or similar)
* Familiarity with **observability and monitoring concepts** (CloudWatch, OpenTelemetry, Prometheus, etc.)
* Experience managing infrastructure as code (**Terraform, Bicep, CDK, or similar**)
* **Python** experience and familiarity with the ML ecosystem (e.g. scikit‑learn, PyTorch), even if not a Data Scientist
* Good understanding of the **ML / LLM lifecycle**, from development to production and monitoring
* **Fluent English** to work in an international environment

**Nice to Have (Not Required, but Valuable)**

* Experience with ML/AI platforms such as **SageMaker, Azure ML, MLflow, Kubeflow**
* Exposure to **Speech Analytics technologies** (ASR, diarization, conversational NLP)
* Experience with **cloud cost optimization / FinOps**, especially for AI workloads
* Experience building or operating **AI agents, copilots, or conversational systems**
* Familiarity with **LLM frameworks** (LangChain, LlamaIndex, Semantic Kernel, etc.)
* Experience with **workflow and orchestration tools** (Airflow, Argo, Step Functions, Durable Functions)

**Professional Skills & Mindset**

* Strong focus on **reliability, automation, and scalability**
* Ability to collaborate effectively in **multidisciplinary teams**
* Clear communication and documentation‑oriented mindset
* **Platform mindset**: building reusable, maintainable, and robust solutions
* Proactive, analytical, and continuous‑improvement driven
* Strong sense of **ownership and end‑to‑end responsibility**
* Motivation to **learn and grow across the AI operations stack**

**Technology Environment**

* **Cloud**: AWS, Azure
* **Orchestration & Containers**: Kubernetes, Docker
* **CI/CD**: GitHub Actions, GitLab CI, Azure DevOps
* **Observability**: Prometheus, Grafana, ELK/EFK, OpenTelemetry
* **Infrastructure as Code**: Terraform, Bicep, CloudFormation
* **AI / ML Tools**: MLflow, Azure ML, SageMaker, LangChain, LlamaIndex, Semantic Kernel
* **Primary Language**: Python


