Data Catalogue Lead

  • AstraZeneca
  • Melbourn, Royston SG8 6HB, UK
  • 17/08/2020
  • Full time
  • Data Science, Data Engineering, Data Analytics, Big Data, Data Management, Statistics

Job Description

AstraZeneca is a global biopharmaceutical business that focuses on the discovery, development and commercialisation of prescription medicines for some of the world's most serious diseases. At AstraZeneca, we're proud to have a workplace culture that inspires innovation and collaboration. Here, you would express diverse perspectives, contribute to an energised environment and provide creative ideas - and be rewarded for this.

We are recruiting for a Data Catalogue Lead to be based in one of our hub sites (Cambridge UK, Gaithersburg MD, Gothenburg Sweden).

As part of AstraZeneca’s Science Data Foundation (SDF) program we are building out a data catalogue for our Research & Development teams. SDF exists to give our scientists access to data and tools at pace, accelerating their work on life-saving medicines. Through SDF we are making our data Findable, Accessible, Interoperable and Re-usable (FAIR). This is being achieved through the creation of a distributed data architecture, in which our data catalogue will be the unifying architectural component. The data catalogue will be a registry of internal and external data products, digitise access governance and automate access provision. As such, the data catalogue will be foundational in making our data products findable, accessible and re-usable.

We want to create a scalable solution to support the development of data communities and valuable data products. To make this solution scalable we need to provide services and tools to our partners in R&D in a secure, compliant, stable and sustainable way.

As the Data Catalogue Lead you will own the build and run of the data catalogue as a capability for R&D and IT. This will involve building out the people, processes and technology to meet R&D product owner requirements and IT’s architectural strategy. You will build the data catalogue operating model and work with solution architecture to create a technology roadmap. You will be accountable for the efficient, secure and compliant running of the catalogue. The team you lead will be distributed across our hub sites (UK, US and Sweden) and Chennai.

We have chosen Collibra as our data catalogue technology. Our teams will be making use of a range of data engineering products (including Talend and AWS Glue) to acquire, ingest and curate metadata into the data catalogue. Your team will be supporting the cataloguing of a wide variety of data sources including: omics, imaging, clinical study, DMTA cycle systems, AI/ML model outputs, literature, sensor data, and external data sources. This will include implementing metadata models, building governance workflows, automating the granting of access and building out APIs. You will have a close working relationship with our Metadata Lead and Information Architects, as well as the Data Lake Lead.

You will join a team that has delivered cloud solutions, such as the development of auto-scaling containerised ETL. Similarly, we have built an automated ETL test harness which integrates with our evolving CI/CD processes. You will need a collaborative delivery approach to be successful. We prefer to use Agile but choose the appropriate approach for each project, so experience of a variety of delivery management methodologies will be useful. You will provide technical leadership throughout our software development lifecycle, from the initial development of a technical design based on a blueprint, right through to hypercare. Do you have a real passion for delivering well-engineered data and analytics solutions that can help improve patient lives? If you do, this will make you stand out from other applicants.

Essential skills and experience

  • You will have experience of building or developing a team,
  • Technical leadership in a data domain,
  • You will be able to demonstrate an ability to understand business needs and translate them into a solution,
  • You will be able to design and document development best practices,
  • You will need great interpersonal skills & a collaborative approach to delivery.

Desirable skills and experience

  • It is highly desirable that you have experience developing and managing a data catalogue or similar,
  • Experience configuring and managing a SaaS system,
  • Experience of running a highly available system,
  • Knowledge of metadata best practices and design principles,
  • Awareness of legal issues surrounding data re-use, especially in a pharmaceutical organisation (e.g. PII, GxP, primary & secondary use of data),
  • Experience of big data, ETL & cloud techniques and tools (we currently use Talend, Redshift (inc. Spectrum), Glue, EMR, Hive, Pig, Spark, S3, SQS, SNS),
  • You have experience of technical leadership in data and analytics,
  • Building and maintaining APIs over data services,
  • Experience working with systems integrators,
  • You are likely to have experience of Agile practices, potentially having been a Scrum Master.

If you are interested, apply now!

At AstraZeneca we’ll make the most of your skills and passion by actively supporting you. To do this, we’ve built an extraordinary international working environment with outstanding opportunities for collaboration and innovation. If you’re passionate about making a difference and ready to discover what you can do – join us.

AstraZeneca is an equal opportunity employer. AstraZeneca will consider all qualified applicants for employment without discrimination on grounds of disability, sex or sexual orientation, pregnancy or maternity leave status, race or national or ethnic origin, age, religion or belief, gender identity or re-assignment, marital or civil partnership status, protected veteran status (if applicable) or any other characteristic protected by law.
