Talent.com
Data Engineer - Remote - Canada

Zortech Solutions, AB, Canada
1 day ago
Job type
  • Full-time
  • Remote
Job description

Role : Data Engineer with expertise in Apache Spark, PySpark, Python, and AWS services (particularly Glue)

Location : Remote - Canada

Duration : 6+ Months

Job Description :

We are looking for an experienced Data Engineer with expertise in Apache Spark, PySpark, Python, and AWS services (particularly Glue) to join our team.

The ideal candidate will have hands-on experience with ETL processes in the cloud, a deep understanding of data pipelines, and the ability to work with large datasets efficiently.

This role will focus on designing, building, and optimizing data workflows on AWS Cloud using Spark-based frameworks and Python.

Mandatory Skill Sets :

Proficient in using Apache Spark and PySpark for big data processing and transformation.

Hands-on experience with AWS Glue for building ETL workflows in the cloud.

Strong programming skills in Python, particularly for data manipulation, automation, and integration with Spark and Glue.

Solid understanding of ETL principles and data pipeline design, including optimization techniques.

Experience working with AWS services, particularly those related to data processing (e.g., S3, Glue, Lambda, Redshift).

Proficiency in writing optimized SQL, including performance tuning.

Ability to translate complex business requirements into technical solutions.

Experience with Apache Airflow for orchestrating data workflows.

Knowledge of data warehousing concepts and cloud-native analytics tools.

Key Responsibilities :

Spark & PySpark Development :

Design and implement scalable data processing pipelines using Apache Spark and PySpark for large-scale data transformations.

ETL Pipeline Development :

Develop, maintain, and optimize ETL processes to efficiently extract, transform, and load data across various data sources and destinations.

AWS Glue Integration :

Use AWS Glue to build serverless ETL jobs, including creating, running, and monitoring jobs for data transformations and integrations.

Python Scripting :

Write efficient and reusable Python code to support data manipulation, analysis, and transformation in Spark and Glue environments.

Data Pipeline Optimization :

Ensure that data workflows are optimized for performance, scalability, and cost-efficiency on AWS Cloud.

Collaboration :

Work closely with data analysts, data scientists, and other engineering teams to build reliable data solutions that support business analytics and decision-making.

Documentation & Best Practices :

Document processes, workflows, and code, while adhering to best practices in data engineering, cloud architecture, and ETL design.