FDM is a global business and technology consultancy seeking an AI / ML Platform Engineer to work for our client within the Sports sector. This is initially a 6-month contract with the potential to extend. It is a hybrid role, 3 days a week, based in Paddington, London.
Our client is seeking an experienced AI / ML Platform Engineer to design, build, and operate the infrastructure and pipelines that power their AI solutions. You will engineer the interactions between their front-end applications and AI services through API management, while managing the compute environment that underpins their AI workloads. Your responsibilities will span the full lifecycle: development, maintenance, and continuous improvement of existing AI systems, with reliability, security, and efficient resource utilisation as your guiding principles.
This is a hands-on engineering role within a fast-moving AI landscape. You will work across ML pipeline creation, resilience pattern design, infrastructure scaling, and model operations, including support for domain-specific behavioural logic (e.g. in-season, off-season, pre-game, post-game, and live game contexts).
Responsibilities:
AI Platform & Infrastructure
- Engineer and operate the integration layer between front-end applications and AI solutions via API management and compute provisioning.
- Design, implement, and maintain scalable, resilient infrastructure for AI/ML workloads, including performance testing and capacity planning.
- Evaluate capacity and load patterns to optimise reliability of services within a Microsoft Services Architecture.
- Manage authentication, identity, and role-based access controls across the platform.
- Add, remove, and configure service components using Infrastructure as Code (IaC), following networking best practices.
ML Pipelines & Model Operations
- Create and manage end-to-end ML pipelines for training, scoring, and deployment, including logic for domain-specific AI system behaviours (in-season, off-season, pre-game, post-game, live game, etc.).
- Support model development workflows including pre-training, fine-tuning, prompt engineering, and Retrieval-Augmented Generation (RAG).
- Analyse data to identify patterns, trends, and insights that inform model development and tuning.
- Enable continuous experimentation and comparison against baseline models.
Monitoring, Reliability & Data Operations
- Monitor incoming data to detect data drift; trigger model retraining and configure rollback procedures for disaster recovery.
- Monitor operational and ML-related issues by comparing model inputs, exploring model-specific metrics, and managing alerts on ML/AI platform components.
- Manage data pipelines for training and scoring workloads.
- Design and implement resilience patterns to ensure high availability and fault tolerance.