Job Overview
- Design, build and productionize modular and scalable data pipelines and data infrastructure leveraging the wide range of data sources across the organization.
- Implement curated common data models that offer an integrated, business-centric single source of truth for business intelligence, analytics, artificial intelligence, and other downstream system use.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, etc.
- Work with tools in the Microsoft stack: Azure Data Factory, Azure Data Lake, Azure SQL Databases, Azure Data Warehouse, Azure Synapse Analytics Services, Azure Databricks, Microsoft Purview, and Power BI.
- Work within an agile work management framework in delivery of products and services, including contributing to feature & user story backlog item development.
- Develop optimized, performant data pipelines and models at scale using technologies such as Python, Spark, and SQL, consuming data sources in XML, CSV, JSON, REST APIs, or other formats.
- Implement orchestration of data pipeline execution to ensure data products meet customer latency expectations, dependencies are managed, and datasets are as up to date as possible, with minimal disruption to end-customer use.
- Create tooling to help with day-to-day tasks and reduce toil via automation wherever possible.
- Build continuous integration / continuous delivery (CI / CD) pipelines to automate testing and deployment of infrastructure and code.
- Monitor the ongoing operation of in-production solutions, assist in troubleshooting issues, and provide Tier 2 support for datasets produced by the team, on an as-required basis.
- Write and perform automated unit and regression testing for data product builds, assist with user acceptance testing and system integration testing as required, and assist in design of relevant test cases.
- Participate in code review as both a submitter and reviewer.
Qualifications
- A 4-year university degree in computer science, computer / software engineering, or another relevant program within data engineering, data analysis, artificial intelligence, or machine learning.
- More than 4 and up to 6 years of experience in data modeling, data warehouse design, and data solution architecture in a Big Data environment.
- Experience as a Data Engineer in a Big Data environment.
- Experience with integrating structured and unstructured data across various platforms and sources.
- Knowledge of content fragmentation, partitioning, query parallelism, and query execution plans.
- Experience with implementing event-driven (pub / sub), near-real-time, or streaming data solutions.
- Strong knowledge of programming methodologies (source / version control, continuous integration / continuous delivery, automated testing, quality assurance) and agile development methodologies.
- Fluency with SQL, Python, and Spark / PySpark is required.
- Experience with Airflow and DBT.
- More than 4 and up to and including 6 years in data ingestion, data modelling, data engineering, and software development.