Job description

We are seeking a Senior Consultant in Site Reliability Engineering (Network SRE) to lead network-centric reliability practices across the Shared Platform ecosystem. This role focuses on ensuring resilience, scalability, and operational excellence for all Shared Platform-hosted applications and their interfacing systems, including network and messaging dependencies such as Solace and colocation integrations. The ideal candidate will bring an SRE mindset to early-stage design, embedding reliability, observability, governance, and operational readiness into network architecture and integrations.

Key Responsibilities
Lead the network SRE perspective during early design phases for application integrations and end-to-end network architecture.
Define and enforce reliability standards across Shared Platform-hosted and interfacing applications.
Own and govern the Data Flow Diagram (DFD) lifecycle, ensuring accuracy, quality, and alignment with architecture and operations.
Establish and drive network reliability controls as part of onboarding and governance processes for new and existing integrations.
Collaborate with application, platform, and operations teams to define and implement monitoring, alerting, and capacity planning standards.
Identify risks, failure modes, and reliability gaps in network components and proactively drive improvements.
Ensure operational readiness through SRE best practices, including observability, incident prevention, and continuous improvement.
Promote consistent adoption of SRE-aligned network practices across cross-functional teams.

Required Skills & Experience
Strong experience in Site Reliability Engineering (SRE) with a focus on network infrastructure and distributed systems.
Deep understanding of network architecture, integration patterns, and messaging systems (experience with Solace is a plus).
Proven experience in designing and implementing monitoring, alerting, and observability frameworks.
Hands-on expertise in capacity planning, performance tuning, and reliability engineering.
Experience with Data Flow Diagrams (DFDs), architecture documentation, and governance practices.
Strong knowledge of failure analysis, risk mitigation, and operational readiness frameworks.
Ability to work across teams in a matrix organization, influencing stakeholders and driving alignment.
Excellent communication skills with the ability to translate technical concepts into actionable strategies.