Job Description
The person in this role works closely with the product team and the data scientist to deliver data and AI solutions that make processes and systems smarter. They do so primarily by developing data pipelines aligned with the requirements of predictive data models.
They are also responsible for the accessibility and integrity of the required datasets.
Responsibilities
- Lead the design, implementation, and maintenance of data transport processes required by the data scientist.
- Perform data transformation operations to feed predictive models.
- Lead data architecture initiatives to ensure reliable pipelines for both structured and unstructured data.
- Design scalable data models required for building predictive models.
- Ensure a seamless transition between the data pipeline and the data model it feeds.
- Once predictive models are trained, ensure their integration into applications, systems, and operations.
- Integrate technologies to develop solutions.
- Monitor and track data quality (reliability, consistency, integrity) and data flow dynamics.
- Track and report on performance to inform necessary infrastructure adjustments.
- Use distributed computing to train models.
- Contribute to a wide range of projects related to implementing systems, solutions, and both existing and new processes.
- Perform any other related tasks.
Qualifications
Education
University degree in Computer Science, Engineering, or a related field.
Relevant Experience
- Minimum five years of industry experience working with data, coding, scripting, and design.
- Minimum five years developing and administering large-scale data systems.
- Minimum three years in data modeling and administering NoSQL and SQL databases.
Skills
- Ability to design processes based on data flow concepts and architectures.
- Ability to extract, transform, and load data to and from systems.
- Ability to configure, use, and develop data management systems.
- Ability to develop large, structured software using software engineering methodologies.
- Ability to use various tools and languages (e.g., Python, SQL, Bash) for system integration.
- Understanding of machine learning.
- Ability to conduct independent research to solve complex problems.
- Knowledge of Agile methodologies.
Knowledge
- Experience with cloud environments (Databricks, Snowflake, AWS).
- Experience with Iceberg and/or Delta data formats.
- Experience with object-oriented conceptual models.
- Experience with Spark, Scala, and PySpark.
- Experience with Flume, NiFi, Kafka, or other data pipeline tools.
- Experience with REST APIs.
- Experience with C++, C#, Java, and .NET.
- Strong knowledge of computer science principles, algorithms, and data structures.
- Experience with data-oriented principles and architectures, data flow analysis, and data flow diagrams.
- Experience in data tuning, cleaning, transport, and integrity.