Data Lake Lead

  • AstraZeneca
  • Melbourn, Royston SG8 6HB, UK
  • 17/08/2020
Full time, Data Science, Data Analytics, Big Data, Statistics

Job Description

AstraZeneca is a global biopharmaceutical business that focuses on the discovery, development and commercialisation of prescription medicines for some of the world's most serious diseases. At AstraZeneca, we're proud to have a workplace culture that inspires innovation and collaboration. Here, you would express diverse perspectives, contribute to an energised environment and provide creative ideas - and be rewarded for this.

We are recruiting for a Data Lake Lead to be based in one of our hub sites (Cambridge UK, Gaithersburg MD, Gothenburg Sweden).

As part of AstraZeneca’s Science Data Foundation (SDF) program we are continuing the development of a Data Lake for our Research & Development teams. SDF exists to give our scientists access to data and tools at pace, accelerating their work on life-saving medicines.

Through SDF we are making our data Findable, Accessible, Interoperable and Re-usable (FAIR). This is being achieved through the creation of a distributed data architecture, of which the Data Lake is a critical component. It is our desire to create a scalable solution to support the development of data communities and valuable data products. To make this solution scalable we need to provide the services and tools to our partners in R&D so that they can ingest data into the Data Lake and access the data, all in a secure and compliant way.

As the Data Lake Lead you will lead the effort to bring about a step-change in how we develop and run the R&D Data Lake as a capability. This will involve building out the people, processes and technology to meet product owner requirements and architectural strategy. You will build the Data Lake operating model and work with solution architecture to build a technology roadmap. You will be accountable for the efficient, secure and compliant running of the Data Lake. Our technology is focused around the AWS stack, but we do have an on-prem object store (SwiftStack). We wish to see the cloud and on-prem services united to form a single Data Lake capability.

You will work closely with the Data Catalogue Lead to manage high-quality metadata and metadata flows to ensure that our data is findable and accessible. We are seeking to digitise our access governance processes, so you’ll be working to integrate the Data Lake with the data catalogue and data science environments.

You will join a team that has delivered cloud solutions, such as the development of auto-scaling containerised ETL. Similarly, we have built an automated ETL test harness which integrates with our evolving CI/CD processes.

A collaborative delivery approach will be crucial to success. We prefer to use Agile but choose the appropriate approach for each project, so experience of a variety of delivery management methodologies will be useful. You will provide technical leadership throughout our software development lifecycle, from the initial development of a technical design based on a blueprint, right through to hypercare. Do you have a real passion for delivering well-engineered data and analytics solutions that can help improve patient lives? If you do, this will make you stand out from other applicants.

Essential skills and experience

  • You will have experience of building or developing a team,
  • Technical leadership in a data domain,
  • Experience developing and managing a cloud data lake,
  • You will be able to demonstrate an ability to understand business needs and translate them into a solution,
  • You will be able to craft and document development best practices,
  • You will need great interpersonal skills & a collaborative approach to delivery.

Desirable skills and experience

  • Experience managing metadata and metadata loading into a data catalogue,
  • Experience building data-centric APIs,
  • You are very likely to have experience of Agile practices, especially as a Scrum Master,
  • Experience of big data, ETL & cloud techniques and tools (we currently use Talend, Redshift (inc. Spectrum), Glue, EMR, Hive, Pig, Spark, S3, SQS, SNS),
  • Experience working in a large organisation,
  • Experience building and managing off-shore teams,
  • Experience building and supporting GxP systems,
  • Experience supporting a 24x7 system,
  • Experience in the pharmaceutical or life sciences industries.

If you are interested, apply now!

At AstraZeneca we’ll make the most of your skills and passion by actively supporting you. To do this, we’ve built an extraordinary international working environment with outstanding opportunities for collaboration and innovation. If you’re passionate about making a difference and ready to discover what you can do – join us.

AstraZeneca is an equal opportunity employer. AstraZeneca will consider all qualified applicants for employment without discrimination on grounds of disability, sex or sexual orientation, pregnancy or maternity leave status, race or national or ethnic origin, age, religion or belief, gender identity or re-assignment, marital or civil partnership status, protected veteran status (if applicable) or any other characteristic protected by law.