Data Engineer

  • Collinson
  • London, UK
  • 07/02/2020
Full time · Data Science · Data Engineering · Big Data · Statistics

Job Description

Purpose of the role:

The Data Lake Engineer role focuses on maintaining and monitoring an in-house data lake built on AWS and the Hadoop ecosystem. The Data Lake Engineer will be responsible for administering all the tools that make up the data lake and for maintaining a transparent, secure environment. They will also liaise with the Cloud and Network teams to understand the systems those teams use and help the Data team understand them well.

Key Responsibilities:

As a Data Lake Engineer, you will have extensive knowledge of AWS and should be able to work with CloudFormation templates (CFT), Terraform and source control (GitHub). You will have extensive knowledge and experience, act as an SME for cloud technologies (AWS), and advocate the standards and best practices to be applied.
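
As an illustration of the infrastructure-as-code work this implies, here is a minimal sketch that launches a CloudFormation stack with boto3. The stack name, template file and capabilities are assumptions for illustration, not details taken from the role:

    import boto3

    cfn = boto3.client("cloudformation")

    # Hypothetical template file and stack name, for illustration only.
    with open("data_lake_stack.yaml") as f:
        template_body = f.read()

    cfn.create_stack(
        StackName="data-lake-ingest",           # assumed stack name
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],  # required if the template creates named IAM resources
    )

In practice the same template would usually be applied through a CI pipeline from GitHub rather than by hand, which is where the source-control requirement comes in.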

Integration Specification, Design & Deployment

  • Good understanding of data security, data lineage and data governance
  • Solid grounding in data administration, either in relational database systems or in the cloud
  • Experienced with tools for creating user and resource policies to govern data access (e.g. AWS IAM, Apache Ranger, Apache Atlas); see the sketch after this list
  • A solid grounding in DevOps principles, and keen to advocate and educate on these principles, feeding into team discussions across Collinson to drive a more collaborative approach and break down silos where they exist
  • Responsible for performing root cause analysis of underlying issues and incidents that affect system performance and availability, including proactively identifying and resolving known errors and problems to minimise incident recurrence
  • Provide technical input into the planning and implementation of complex upgrades, installations and releases
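
As a hedged sketch of the policy work mentioned above, the snippet below uses boto3 to create an IAM policy granting read-only access to one zone of a data lake. The bucket name, prefix and policy name are hypothetical:

    import json

    import boto3

    BUCKET = "example-data-lake"  # assumed bucket name, for illustration only

    # Read-only access scoped to the curated zone of the lake.
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{BUCKET}",
                    f"arn:aws:s3:::{BUCKET}/curated/*",
                ],
            }
        ],
    }

    iam = boto3.client("iam")
    policy = iam.create_policy(
        PolicyName="DataLakeCuratedReadOnly",  # assumed policy name
        PolicyDocument=json.dumps(policy_document),
        Description="Read-only access to the curated zone of the data lake",
    )
    print(policy["Policy"]["Arn"])

The same intent could equally be expressed as an Apache Ranger policy against Hive tables; IAM is used here only because it needs no cluster to run against.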

Knowledge, skills & experience required:

  • Core skills: Python, AWS, SQL and Hive; strong communication; deep exposure to data-heavy workflows
  • Data engineering: Python, Bash, Hive, SQL (preferably PostgreSQL and SQL Server); building and maintaining APIs; building and maintaining data lakes
  • Minimum one year of experience building AWS workloads
  • Knowledge of the Hadoop ecosystem and related tools
  • Experience in data management, administration or data-integration roles
  • Experience with AWS in development and production
  • Experience with Apache Atlas and Apache Ranger
  • Experience with data lake governance
  • Experience maintaining production-grade workflows
  • Extraction, Transformation & Loading (see the sketch after this list)
  • Database performance tuning and query writing
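
As referenced in the list above, here is a minimal ETL sketch: it extracts rows from PostgreSQL, applies a small transformation, and loads the result into S3. The connection details, table, bucket and key are all assumptions for illustration:

    import csv
    import io

    import boto3
    import psycopg2

    # Hypothetical connection details, for illustration only.
    conn = psycopg2.connect(host="localhost", dbname="sales", user="etl")

    # Extract: pull yesterday's orders from an assumed orders table.
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT order_id, amount FROM orders "
            "WHERE created_at >= current_date - 1"
        )
        rows = cur.fetchall()

    # Transform: drop non-positive amounts and round to two decimal places.
    cleaned = [(oid, round(float(amt), 2)) for oid, amt in rows if amt and amt > 0]

    # Load: write a CSV into the raw zone of an assumed data-lake bucket.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["order_id", "amount"])
    writer.writerows(cleaned)

    boto3.client("s3").put_object(
        Bucket="example-data-lake",   # assumed bucket
        Key="raw/orders/daily.csv",   # assumed key
        Body=buf.getvalue().encode("utf-8"),
    )

A production-grade version of this workflow would run under a scheduler with retries and monitoring, which is what the "maintaining production-grade workflows" requirement above points to.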

Experience of the full software development lifecycle, utilising both Agile and Waterfall project delivery.