AI/ML - Site Reliability Manager, Siri Knowledge Platforms

  • Apple
  • London, UK
  • 23/04/2021
Full time
Data Science, Machine Learning, Data Analytics, Big Data, Data Management, Statistics

Job Description

Play a meaningful role in revolutionizing how people use their computers and mobile devices, build groundbreaking technology for algorithmic search, machine learning, natural language processing, and artificial intelligence, and work with the teams building the most scalable big-data systems in existence.

Key Qualifications

  • Experience building and managing small, highly agile teams.
  • Active participation in maintaining the day-to-day stability of a 24/7 global service.
  • Sophisticated knowledge of Kubernetes, containerization systems, and public cloud infrastructure.
  • Proficiency programming in Go, Python, or a similar language to automate tasks.
  • Experience with monitoring tools such as Prometheus.

Description

As part of this team, you will be the point person for hiring several people to grow the team, and the front-line manager for the AI/ML SRE function in London. As a working manager, you will monitor production and staging environments for a myriad of applications in an agile and dynamic organization. While striving to improve the stability, security, efficiency, and scalability of all production systems, you will use strong troubleshooting skills daily; a successful engineer will isolate issues and resolve their root causes through investigative analysis. The role also requires building and maintaining accurate, up-to-date documentation reflecting configuration, providing code reviews, training and mentoring staff, writing status reports, and interacting with other Apple employees and management. The ideal candidate is a focused, independent problem-solver who can deftly handle multiple competing priorities and deliver solutions in a timely manner.

Education & Experience

Bachelor’s degree in engineering, computer science, or a related field, or equivalent work experience.

Additional Requirements

    • Working knowledge of multi-tier applications and their dependencies including load balancing, TCP/IP networking, web services, LDAP and DNS.
    • Demonstrated history of developing and maintaining automation for infrastructure and application management with configuration tools such as Puppet, Ansible, or Chef.
    • Proficiency with web server administration including Apache and Nginx.
    • Knowledge of database design, support, and administration, including Postgres, MySQL, and HBase.
    • Network administration and troubleshooting.
    • Good interpersonal skills shown through previous projects or assignments.