Overview
We're seeking a talented Data Platform Developer to join our Platform Engineering team and architect the data foundation that will power Newforma's next generation of AI-driven capabilities and analytics. You'll design and implement modern data architectures including medallion / lakehouse patterns, build event-driven data pipelines that process billions of project documents and communications in real-time, and create the analytics infrastructure that enables both business intelligence and AI / ML initiatives. This is a foundational role at an exciting time—as we migrate to AWS and invest heavily in AI, you'll establish the data practices and infrastructure that will serve the company for years to come.
Newforma manages billions of emails, documents, RFIs, submittals, drawings, and project files for thousands of construction projects worldwide. This rich dataset represents an incredible opportunity for AI-powered insights, intelligent automation, and advanced analytics. You'll build the data infrastructure to unlock this potential, creating pipelines that transform raw project data into clean, structured, and AI-ready datasets while also enabling real-time analytics and business intelligence. Working closely with our Director of AI Engineering and Platform Engineering team, you'll establish data architecture patterns that support everything from semantic search and RAG systems to executive dashboards and predictive analytics.
In this role, your responsibilities will include:
Data Architecture & Strategy
- Design and implement medallion architecture (bronze, silver, gold layers) or lakehouse patterns on AWS to organize and transform data at scale
- Establish data modeling standards, governance practices, and quality frameworks across the organization
- Define data retention, archival, and lifecycle management policies for massive volumes of project data
- Create reference architectures and best practices for data engineering across teams
- Partner with the Director of AI Engineering to design data pipelines optimized for AI / ML workloads including vector embeddings and model training
- Work with the Lead Software Architect to ensure data architecture aligns with overall platform strategy
- Design data schemas and structures that support both analytical queries and AI applications
- Design and implement event-driven data architectures using AWS EventBridge, Kinesis, MSK (Kafka), SNS, and SQS
- Build real-time data streaming pipelines that capture, process, and route project events across the platform
- Architect event schemas and patterns for domain events (document uploads, email filing, RFI submissions, etc.)
- Implement change data capture (CDC) patterns to stream database changes to data lakes and analytics systems
- Design event-driven workflows that trigger AI processing, notifications, and downstream system updates
- Establish event governance including versioning, documentation, and monitoring
- Optimize event processing for low latency and high throughput at scale
- Build robust, scalable ETL / ELT pipelines using AWS Glue, Step Functions, Lambda, and EMR
- Develop data transformation jobs that cleanse, enrich, and structure unstructured project data
- Implement data quality checks, validation rules, and monitoring throughout pipelines
- Create reusable pipeline components and frameworks that teams can leverage
- Optimize pipeline performance and cost efficiency for processing billions of documents
- Handle diverse data formats including emails, PDFs, CAD drawings, images, and structured databases
- Implement data lineage tracking and metadata management
- Design and build data warehouses and data marts using Amazon Redshift, Athena, or similar technologies
- Create dimensional models and star schemas optimized for analytical queries
- Build datasets and aggregations that power executive dashboards and operational reports
- Implement BI solutions using tools like QuickSight, Tableau, Power BI, or similar platforms
- Partner with product and business teams to understand analytics requirements and deliver insights
- Create self-service analytics capabilities that empower teams to explore data independently
- Establish KPIs, metrics, and reporting frameworks for product and business analytics
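The medallion flow described in the bullets above (raw "bronze" events cleansed into "silver", then aggregated into "gold" for dashboards) can be sketched in plain Python. This is a toy illustration only: the CDC record shape, field names (`rfi_id`, `project`, `status`), and quality rule are hypothetical stand-ins for a real AWS Glue / Kinesis implementation.

```python
from datetime import datetime

# Bronze: raw CDC-style events land as-is (hypothetical records from a project DB).
bronze = [
    {"op": "insert", "table": "rfi", "ts": "2024-03-01T12:00:00Z",
     "row": {"rfi_id": "R-101", "project": "bridge-7", "status": "OPEN "}},
    {"op": "insert", "table": "rfi", "ts": "not-a-date",
     "row": {"rfi_id": "R-102", "project": "bridge-7", "status": "open"}},
]

def to_silver(record):
    """Cleanse and validate one bronze record; return None if it fails quality checks."""
    try:
        ts = datetime.fromisoformat(record["ts"].replace("Z", "+00:00"))
    except ValueError:
        return None  # quarantine malformed timestamps rather than passing them downstream
    row = record["row"]
    return {
        "rfi_id": row["rfi_id"],
        "project": row["project"],
        "status": row["status"].strip().upper(),  # normalize inconsistent casing/whitespace
        "event_time": ts.isoformat(),
    }

# Silver: only records that pass validation survive.
silver = [s for r in bronze if (s := to_silver(r)) is not None]

# Gold: an aggregate ready for a dashboard (open RFI count per project).
gold = {}
for s in silver:
    if s["status"] == "OPEN":
        gold[s["project"]] = gold.get(s["project"], 0) + 1
```

In production, each layer would be a separate storage location (e.g. S3 prefixes or Delta/Iceberg tables), with the quarantine path feeding a data-quality dashboard rather than silently dropping records.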
AI / ML Data Infrastructure
- Prepare and structure data to support AI initiatives including document classification, semantic search, and intelligent agents
- Build pipelines for generating and storing vector embeddings for RAG (Retrieval-Augmented Generation) systems
- Create training datasets and feature stores for machine learning models
- Implement data versioning and experiment tracking for AI / ML workflows
- Design scalable inference pipelines that serve AI models with fresh, contextualized data
- Collaborate with the AI Engineering team to optimize data formats and access patterns for LLM applications
Data Operations & Monitoring
- Implement comprehensive monitoring, alerting, and observability for data pipelines and systems
- Build data quality dashboards and anomaly detection systems
- Create operational runbooks and documentation for data platform components
- Optimize costs across data storage, processing, and querying
- Ensure data security, encryption, and compliance with privacy regulations
- Participate in on-call rotation to support production data systems
- Collaborate with other platform engineering team members to accomplish tasks
- Participate in agile ceremonies including daily stand-ups, sprint planning, and retrospectives
- Work closely with development teams and with the software architect to establish good data engineering practices for newly developed features
Requirements
- 5+ years of experience in data engineering, analytics engineering, or related roles
- Strong hands-on experience with AWS data services including S3, Glue, Athena, Redshift, Kinesis, EventBridge, Lambda, and EMR
- Proven expertise designing and implementing event-driven architectures using streaming technologies (Kafka / MSK, Kinesis, EventBridge)
- Experience building medallion architectures, lakehouse platforms, or similar modern data architectures (bronze / silver / gold patterns, Delta Lake, Iceberg)
- Proficiency with SQL and database design, including both relational (PostgreSQL, MySQL) and analytical databases (Redshift, Snowflake)
- Strong programming skills in Python for data processing, transformation, and automation
- Experience with data orchestration tools such as Apache Airflow, AWS Step Functions, or Prefect
- Knowledge of data modeling techniques including dimensional modeling, star schemas, and data vault
- Experience with analytics and BI tools (QuickSight, Tableau, Power BI, Looker) and building reports / dashboards
- Understanding of data quality, data governance, and master data management principles
- Familiarity with infrastructure-as-code (Pulumi, Terraform, CloudFormation) for managing data infrastructure
- Strong problem-solving skills and ability to optimize complex data workflows
- Excellent communication skills with ability to explain technical concepts to diverse audiences
- Team player who collaborates effectively across engineering, product, and business teams
Nice-to-have qualifications
- AWS certifications (AWS Data Analytics - Specialty, AWS Solutions Architect, or similar)
- Experience with Azure data services (Data Factory, Synapse, Event Hubs) and Azure-to-AWS data migrations
- Knowledge of real-time stream processing frameworks (Apache Spark Streaming, Flink, Kafka Streams)
- Experience preparing data for AI / ML applications including vector databases (Pinecone, Weaviate, pgvector)
- Familiarity with document processing, OCR, and unstructured data extraction techniques
- Experience with data catalog and metadata management tools (AWS Glue Data Catalog, Alation, Collibra)
- Knowledge of .NET / C# and integrating data pipelines with .NET applications
- Understanding of SaaS multi-tenancy patterns in data architecture
- Experience with data privacy and compliance frameworks (GDPR, SOC 2, CCPA)
- Background in the AECO industry or project management domain
- Familiarity with graph databases (Neptune, Neo4j) for relationship modeling
- Experience with serverless data architectures and cost optimization strategies
- Knowledge of dbt (data build tool) or similar transformation frameworks
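For candidates curious about the vector-embedding and semantic-search work mentioned under AI / ML Data Infrastructure, here is a minimal, library-free sketch of the idea. The hash-based `embed` function is a toy stand-in for a real embedding model, and the in-memory list stands in for a vector database such as pgvector or Pinecone; all document IDs and texts are made up.

```python
import hashlib
import math

def embed(text, dim=16):
    """Toy embedding: hash word tokens into a fixed-size, L2-normalized vector.
    (A real pipeline would call an embedding model instead.)"""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are already normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# In-memory "vector store" standing in for a real vector database.
store = []
for doc_id, text in [
    ("rfi-101", "steel beam clarification for north elevation"),
    ("sub-204", "concrete mix design submittal review"),
]:
    store.append({"id": doc_id, "vector": embed(text), "text": text})

def search(query, k=1):
    """Rank stored documents by similarity to the query and return the top-k IDs."""
    q = embed(query)
    ranked = sorted(store, key=lambda d: cosine(q, d["vector"]), reverse=True)
    return [d["id"] for d in ranked[:k]]
```

The production version replaces `embed` with model inference, batches embedding generation in the data pipeline, and writes vectors to a dedicated store with metadata for filtering, but the retrieval shape (embed, compare, rank) is the same.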