Talent.com
Director, Live Operations / Systems Reliability

Go REcruitment
BC, Canada
26 days ago
Job type
  • Full-time
Job description

Our client, Netskrt.io, is looking for a Director of Live Operations & Systems Reliability to oversee its managed service. Netskrt’s eCDN service comprises three major components: intelligent content collection, staging and distribution; adaptive networking, leveraging connectivity as and when available; and an edge cache that allows users to access the content they want locally, using the apps and subscriptions that they already have.

Your prime responsibility and priority is to ensure customer excellence. You are passionate about system reliability and will influence and drive the strategic Systems Reliability Engineering mission. As the leader of Live Operations / Systems Reliability, you are responsible for monitoring and maintaining the health of the system.

We are a highly motivated team, dedicated to delivering products and services that improve the customer experience when accessing internet video at the edges of the network. You are somebody who enjoys solving problems and has a customer-centric mindset. You should be passionate not only about learning new technologies, but also about running systems and software in the real world. You must enjoy a close-knit team environment of shared responsibility, and be a team player and a self-starter. You have exceptional technical skills and enjoy solving challenging problems. You are a quick learner, you adapt easily, and you have great interpersonal and communication skills.

Netskrt offers the opportunity to obtain hands-on experience with storage, networking, security, and cloud technologies. As part of the Netskrt team, you will have the opportunity to design and implement solutions to challenging problems in a startup environment, working with accomplished engineers and a leadership team with a proven track record of success.

Key Responsibilities:

  • Monitor, manage and maintain Netskrt’s managed service
  • Manage availability, latency, scalability and efficiency by instilling engineering reliability into our deployed systems, with a focus on fault-tolerant approaches
  • Drive quality accountability within the organization with well-defined processes, metrics, and goals for process quality. This includes leading effective postmortems and ensuring action items are followed up
  • Drive capacity planning, performance analysis, instrumentation and other nonfunctional systems requirements
  • Define and report progress on strategic initiatives and project-level tasks to all stakeholders, including senior executives and clients, using effective communication approaches for each constituency
  • Implement metrics-driven processes to ensure service quality targets are met
  • Engage and influence development, operations, and product groups, evangelizing SRE practices to align technology service / solution delivery

Required Qualifications, Skills, Experience:

  • Degree in Computer Science or related technical field
  • Accomplished leader with 5+ years managing regional and global teams and systems
  • Expert knowledge in all aspects of designing, developing, and managing large real-time systems
  • Project and process management
  • Prior successful experience as a systems performance or systems reliability engineer
  • Mastery of Linux / Unix
  • Mastery of coding / scripting languages (e.g., C++, PHP, Python, Perl)
  • Mastery of fault-tolerant approaches in large-scale distributed environments and high-performance systems
  • Demonstrated experience working in large, complex systems environments
  • Deep understanding of internet and networking protocols
  • Analytical mind with excellent problem-solving skills
  • Excellent time management, communication, decision-making, presentation, leadership, and organizational skills
  • Ability to lead across functions and motivate a matrix staff

Desired Qualifications:

  • Proven leader of technology solutions in a high-volume transaction environment
  • Excellent written and verbal communication with clients, employees, and the management chain, including status reports, project plans, and presentations
  • Familiarity with security frameworks and risk management methodologies
  • Knowledge of patch management, intrusion detection / prevention systems
  • Cloud computing and cloud technologies (AWS, OpenStack)
  • Experience with caching and CDN (content delivery network) technologies (Netflix, Amazon, Google, Limelight, Akamai, Fastly)
  • Knowledge of data protection operations and legislation (e.g. GDPR)
  • Experience with securing IoT and / or autonomous remote devices.

For any questions about the company or to apply: [email protected] or [email protected]
