Job Description: Data Engineer (Python | SQL | Semi‑Structured Data | ES APIs)
Experience Required: 6–8 years
Skills: Python, SQL, Data Modeling, ETL / ELT, DBeaver, SQLite / Postgres / Dremio, API Integration
Role Summary
We are seeking a highly skilled Data Engineer with strong Python and SQL expertise to build reliable, scalable data pipelines that transform semi‑structured data from Elasticsearch (ES) URLs / APIs into clean, analytics‑ready datasets. You will work primarily in a local environment (Python, DBeaver, SQLite / Postgres / Dremio), establish database connections, flatten and normalize JSON / Elasticsearch topics, and prepare datasets for downstream Power BI reporting. This role requires deep hands‑on engineering ability, strong data modeling skills, and clear communication with business stakeholders.
Key Responsibilities
1. Data Ingestion & Transformation
Extract semi‑structured data from ES URLs, API endpoints, and Elasticsearch topics (JSON-based).
Flatten, normalize, and structure nested JSON into relational tables suitable for analytics.
Build reproducible ETL / ELT workflows using Python (pandas, NumPy, SQLAlchemy, requests).
Implement transformation logic, incremental loads, and schema alignment for downstream use.
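As a concrete illustration of the flattening step above, the sketch below normalizes a nested, ES‑style JSON payload into a flat table with pandas. The payload shape and field names are hypothetical; in practice the data would arrive from an API call (e.g., via requests) rather than an inline dict.

```python
import pandas as pd

# Hypothetical payload shaped like a typical ES-style API response.
payload = {
    "hits": [
        {"_id": "1", "_source": {"user": {"name": "Ana", "dept": "Sales"}, "amount": 120.5}},
        {"_id": "2", "_source": {"user": {"name": "Ben", "dept": "Ops"}, "amount": 75.0}},
    ]
}

# Flatten nested JSON into a relational table: one row per hit,
# nested keys joined into column names with "_".
df = pd.json_normalize(payload["hits"], sep="_")

# Strip the "_source_" prefix so columns map cleanly to a target schema.
df = df.rename(columns=lambda c: c.replace("_source_", ""))

print(sorted(df.columns.tolist()))  # ['_id', 'amount', 'user_dept', 'user_name']
```

From here the frame can be written to SQLite / Postgres with `to_sql`, keeping column names aligned with the downstream schema.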
2. Database Engineering
Design, create, and maintain database schemas in SQLite, Postgres, and Dremio.
Configure and manage local DB connections through DBeaver.
Optimize queries using indexing strategies, caching, and partitioning.
Implement performance tuning for Python data jobs and SQL queries.
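A minimal sketch of the schema and indexing work described above, using an in‑memory SQLite database (a file path, or a Postgres connection, in practice). Table and index names are illustrative; EXPLAIN QUERY PLAN is used to confirm the index is picked up.

```python
import sqlite3

# Create a local SQLite schema with an index (in-memory here for the example).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        amount     REAL,
        created_at TEXT
    );
    CREATE INDEX idx_orders_customer ON orders (customer);
""")
conn.executemany(
    "INSERT INTO orders (customer, amount, created_at) VALUES (?, ?, ?)",
    [("acme", 120.5, "2024-01-01"), ("globex", 75.0, "2024-01-02")],
)

# EXPLAIN QUERY PLAN shows whether customer lookups use the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", ("acme",)
).fetchall()
print(plan)  # detail column mentions idx_orders_customer
conn.close()
```

The same pattern (create schema, index the filter columns, verify with the engine's plan output) carries over to Postgres via EXPLAIN.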
3. Data Quality & Governance
Build and maintain validation rules, deduplication logic, and anomaly detection.
Establish dataset versioning, lineage tracking, and data contract / documentation.
Ensure secure handling of API credentials, tokens, and data source endpoints.
Use Git for version control, perform code reviews, write unit tests, and support CI checks.
Produce clear documentation, runbooks, and support materials for ad‑hoc data requests.
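The validation and deduplication rules above can be sketched in a few lines of pandas; the records, business key, and rule (non‑positive amounts are suspect) are all hypothetical stand‑ins.

```python
import pandas as pd

# Hypothetical extracted records: one duplicate key and one bad amount.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": [120.5, 75.0, 75.0, -10.0],
})

# Deduplicate on the business key, keeping the first occurrence.
deduped = raw.drop_duplicates(subset="order_id", keep="first")

# Simple validation rule: flag non-positive amounts for review.
invalid = deduped[deduped["amount"] <= 0]

print(len(deduped), len(invalid))  # 3 1
```

In a real pipeline, flagged rows would be routed to a quarantine table and surfaced in run logs rather than silently dropped.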
4. Reporting & Downstream Enablement
Prepare clean, analytics‑ready datasets for use in Power BI dashboards and business reporting.
Collaborate with stakeholders to translate business requirements into technical data solutions.
Ensure accurate, complete, and timely delivery of data to reporting teams.
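A small sketch of the handoff to reporting: writing an analytics‑ready frame into a local table that Power BI (or DBeaver) can then read through its SQLite / Postgres connector. The dataset and table name are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical analytics-ready dataset for a dashboard.
df = pd.DataFrame({
    "dept": ["Sales", "Ops"],
    "total_amount": [195.5, 75.0],
})

conn = sqlite3.connect(":memory:")  # a file path in practice
df.to_sql("sales_summary", conn, index=False, if_exists="replace")

rows = conn.execute(
    "SELECT dept, total_amount FROM sales_summary ORDER BY dept"
).fetchall()
conn.close()
```

`if_exists="replace"` keeps reruns idempotent; for incremental refreshes, an append-plus-merge pattern would be used instead.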
Required Skills & Experience
Programming & Data Engineering
Strong hands‑on experience with Python (pandas, NumPy, SQLAlchemy, requests).
Ability to work with and transform semi‑structured JSON / ES data.
Experience integrating with REST APIs, ES endpoints, or similar data sources.
SQL & Databases
Advanced SQL proficiency across SQLite, Postgres, and Dremio.
Understanding of dimensional modeling, normalization, and modeling nested / semi‑structured data.
Experience with query tuning, indexing, and performance optimization.
Tools & Pipelines
Proficient in DBeaver (database connections, schema management).
Experience building ETL / ELT pipelines with error handling, logging, and recoverability.
Familiarity with dataset preparation for Power BI.
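The error handling and recoverability expected of these pipelines can be sketched as a retry wrapper around an extraction step. The flaky extractor below is a stub standing in for a real API call; names and retry counts are illustrative.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("etl")

def with_retries(fn, attempts=3, delay=0.0):
    """Run an extraction step, logging and retrying transient failures."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:  # in practice, catch requests exceptions specifically
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)

# Stub extraction step that fails twice before succeeding
# (stands in for a requests.get against an ES endpoint).
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return [{"id": 1}]

records = with_retries(flaky_extract)
```

The log lines give a recoverability trail; in production the delay would use exponential backoff and the final failure would mark the run for replay.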
Collaboration & Delivery
Strong communication skills; ability to work closely with business stakeholders.
Experience translating requirements into technical specifications and deliverables.
Preferred / Bonus Skills
Experience with Elasticsearch, ES endpoints, scroll APIs, or schema‑on‑read engines (e.g., Dremio).
Familiarity with Docker for reproducing local environments.
Experience with schedulers such as Airflow, Prefect, or similar orchestration tools.
Knowledge of performance profiling tools (EXPLAIN plans, indexing strategies, caching).
Data Engineer (Python | SQL) • Toronto, ON, CA