Role: AI Data Architecture & Engineering
Type: Remote
- Design and Optimization: Design, build, and optimize high-performance ETL/ELT data pipelines specifically tailored to support AI model training, RAG systems, and real-time inference workflows.
- Generative AI Pipelines: Collaborate with data scientists and application teams to implement specialized data flows for document processing, annotation pipelines, and LLM-enriched data.
- Vectorization & Search: Operationalize semantic search and document embeddings using modern data technologies, including vector databases (e.g., Pinecone, Weaviate), PostgreSQL, and cloud storage solutions.
- Scalability & Governance: Implement scalable, modular, and reusable components for data ingestion, transformation, and loading, while ensuring strict data quality, security, and privacy compliance (governance) throughout the AI pipeline lifecycle.
- Tool Adoption: Drive the adoption and integration of modern MLOps/AI tools such as LangChain, Docker, FastAPI/Flask, and enterprise integration patterns to accelerate development and deployment.
Data Science & Analytics Support
- Model Implementation: Work alongside data scientists to develop and implement advanced data science and machine learning models that power analytics solutions.
- Big Data Analysis: Utilize big data platforms and tools to effectively analyze and process large, complex datasets.
- Reporting & Insights: Turn data insights into impactful visual reports and dashboards using tools like Streamlit, Gradio, or Dash, aligning data delivery with key project objectives.
- Collaboration: Serve as a subject matter expert, collaborating cross-functionally to ensure accurate data delivery and alignment with business requirements.
- Troubleshooting: Provide critical support for data management processes and swiftly troubleshoot data quality and pipeline issues.
Required Skills and Experience
Technical Expertise
- Proven hands-on experience architecting and engineering data solutions for Generative AI or AI/ML systems.
- Expertise in Python and its ecosystem for data manipulation and engineering.
- Deep practical experience with vector databases, RAG architectures, and document processing.
- Strong knowledge of database systems, particularly PostgreSQL, and large-scale cloud data storage.
- Familiarity with data visualization and reporting tools (Streamlit, Gradio, Dash).
- Experience with deployment platforms and MLOps principles (Docker, model deployment platforms).
Critical Skills (Must-Haves)
- 10+ years of data engineering experience, with proven hands-on experience architecting and engineering data solutions for Generative AI or AI/ML systems.
- LLM knowledge and applied prompt engineering techniques.
- Hands-on experience with LangChain or similar Generative AI frameworks.
- Proficiency in AI/ML model building and/or model fine-tuning.
- Understanding of LLM evaluation techniques.
- Proven expertise in big data and analytics, data management, and data governance/compliance.