Data Engineer
Brainhunter
Ottawa, Ontario
$56K-$60K a year (estimated)
Full-time
Data Engineer - #61261
About the Job:
Mindwire is currently seeking a Data Engineer to work for our valued Crown corporation client.
The position is remote, with the option of a hybrid work environment (Ottawa / Toronto).
Responsibilities:
- Converts historical data from SAS databases to Parquet.
- Works closely with teams to develop and transition the ongoing production process for analytical assets (currently coded in SAS and stored as SAS datasets) to Python (leveraging PySpark), with outputs stored in Parquet.
- Recommends improvements and enhancements to conversion processes and ongoing production of analytical assets.
- Prepares the documentation required for the results of the conversion (e.g., differences in converted data that will affect use by analysts).
- Coaches and mentors staff in data conversion techniques.
Requirements and Qualifications:
- 5 years’ experience with version control using Git, including branching, merging, and collaborating with teams to manage code repositories effectively.
- Extensive knowledge of SQL for querying and manipulating databases, with a solid understanding of database design and optimization (5 years' experience).
- High level of expertise in data engineering processes, including data extraction, transformation, and loading (ETL), utilizing Informatica tools to create efficient and robust data pipelines.
- Extensive programming experience in Python, with the ability to develop complex applications, scripts, and data processing tasks using a wide range of libraries and frameworks.
- Proficiency with Jupyter Notebooks as an integrated development environment for data analysis, data visualization, and interactive code execution, fostering efficient and reproducible workflows.
- Extensive hands-on experience with Apache Spark, including the ability to design, optimize, and manage large-scale data processing and analytics pipelines.
- In-depth knowledge of Apache Parquet file format for columnar storage, including its benefits and optimal usage in big data processing scenarios.
30+ days ago