Location
London

About The Role

FDM is a global business and technology consultancy seeking a Senior Site Reliability Engineer (SRE) to work for our client within the aviation sector. This is initially a 12-month contract with the potential to extend. It is a hybrid role based at London Heathrow.

Our client is seeking an experienced Senior SRE to help ensure the reliability, availability, and performance of their cloud-based services. You will play a key role in automating, monitoring, and optimising their infrastructure while maintaining high uptime and seamless user experiences. Your expertise in systems engineering and software development will be critical, as you will work with cross-functional teams to build and improve their production systems.

Responsibilities

  • Act as a senior SRE lead, drawing on experience of establishing an SRE capability from scratch, designing its resource and operating models, and delivering SRE as a service
  • Be the first point of contact for P1/P2 incidents on the platform
  • Ensure that manual restoration steps are available, then progressively convert manual restoration into manually triggered auto-healing scripts, and those scripts into fully automated healing
  • Responsible for root cause analysis: identifying patterns and anomalous deviations, creating auto-healing scripts, publishing knowledge articles, developing and executing scripts, monitoring performance and availability, working with product teams to prioritise defects, and rolling back changes
  • Identify opportunities to train the ML models, and optimise them to reduce the gap between real-time agent actions and ML findings
  • Responsible for configuring and using the platform's log analytics and metric intelligence for their services
  • Responsible for ensuring that the CMDB is accurate and service mapping is up to date, and for fixing any glitches
  • Optimise the task intelligence for auto-restoration
  • Responsible for problem management and defect management, and for defining permanent fixes

About You

Requirements

  • Extensive experience working as a Site Reliability Engineer.
  • Demonstrable experience and expertise in a relevant domain (e.g. cloud (AWS/Azure), Global Data platform, GenAI products or business applications), sufficient to understand how the product/platform works
  • Technical ability to write restoration scripts
  • Ability to work with product/platform teams to fix defects
  • Willingness to participate in sprint meetings to understand new releases and develop new restoration scripts
  • Confident with scripting and programming languages such as Python, Go and Bash.
  • Familiarity with networking, security practices, and distributed systems.
  • Ability to troubleshoot and resolve issues at scale, working under pressure in a fast-paced environment.
  • Excellent communication and collaboration skills to work with cross-functional teams.

About Us

Why join us?

  • Career coaching and access to upskilling throughout your entire FDM career
  • Initial upskilling pre-assignment that has been accredited by TechSkills
  • Assignments with global companies and opportunities to work abroad
  • Opportunity to obtain certifications from Microsoft, Salesforce, Cisco and more
  • Access to the Buy As You Earn share scheme

About FDM

We are a business and technology consultancy and one of the UK's leading employers, recruiting the brightest talent to become the innovators of tomorrow. We have centres across Europe, North America and Asia-Pacific, and a global workforce of over 4,000 Consultants. FDM has shown exponential growth over the years, firmly establishing itself as an award-winning employer, and is listed on the FTSE4Good Index.

Diversity and Inclusion

FDM Group is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, colour, religion, sex, sexual orientation, national origin, age, disability, veteran status or any other status protected by federal, provincial or local laws.