Exscientia Oxford, UK
We are currently seeking a highly motivated data engineer with a strong background in software development and experience in the design and implementation of automated data pipelines. You will be an integral member of a world-class team of engineers, researchers and drug hunters focused on applying AI to achieve unprecedented productivity gains in small molecule drug discovery.

As a data engineer at Exscientia, you will be based at our new Oxford office, working as part of an agile team on greenfield projects using modern tools and cloud-based technologies. You will collaborate closely with data scientists, software developers and research engineers to develop robust data pipelines capable of extracting, transforming and loading datasets for use throughout the AI-driven drug discovery process. Your experience working with database systems, designing data models and optimising data flow will give you real impact and influence on the overall design of the data architecture.

Key responsibilities
Design and implement automated data processing pipelines, ETL logic and data models.
Deliver robust, extensible and maintainable software according to best practices.
Work with minimal supervision on the full software project lifecycle, from requirements capture through to planning and execution.
Develop the knowledge necessary to carry out data engineering tasks involving datasets specific to the drug discovery domain.
Provide mentorship to more junior members of the team.
Stay informed on new technology developments, analyse them for potential benefits to our platform and devise plans for their implementation.

Qualifications
A Bachelor's, Master's or PhD in Computer Science, Mathematics, Physics, Engineering or a related field.
Strong analytical and programming skills in Python; familiarity with numpy and pandas is desirable.
Excellent knowledge of database systems, both relational and non-relational, including SQL syntax, schema design and query optimisation.
Proven track record of developing and implementing data pipelines, ideally using cloud-based technologies.
Excellent communication skills; organised and motivated.
Strong team player with an inclusive mindset who is willing to listen to and learn from others.
Passionate about discovering novel therapeutics through the application of technology.

Beneficial skills and experience
Experience with cloud infrastructure platforms such as Amazon Web Services.
Experience with workflow orchestration systems such as Airflow and Luigi.
Experience with containerization and container orchestration tools such as Docker and Kubernetes.
Experience with continuous integration and deployment frameworks such as Jenkins and Travis.
Experience with machine learning and major machine learning frameworks such as scikit-learn and PyTorch.
Experience using algorithms and data structures to solve real-world problems.
Experience with other programming languages (C++/Java/Scala).
Open source contributions demonstrating experience in scientific software development.