Director, Live Operations / Systems Reliability

Go REcruitment
BC, Canada

26 days ago

Job type

Full-time

Job description

Our client, Netskrt.io, is looking for a Director of Live Operations & Systems Reliability to oversee its managed service. Netskrt’s eCDN service comprises three major components: intelligent content collection, staging and distribution; adaptive networking that leverages connectivity as and when it is available; and an edge cache that lets users access the content they want locally, using the apps and subscriptions they already have.

Your prime responsibility and priority is to ensure customer excellence. You are passionate about system reliability and use that passion to influence and drive the strategic Systems Reliability Engineering mission. As the leader of Live Operations / Systems Reliability, you are responsible for monitoring and maintaining the health of the system.

We are a highly motivated team, dedicated to delivering products and services that improve the customer experience when accessing internet video at the edges of the network. You enjoy solving challenging problems and have a customer-centric mindset. You should be passionate not only about learning new technologies, but also about running systems and software in the real world. You must enjoy a close-knit team environment of shared responsibility, and be a team player and a self-starter. You have exceptional technical skills. You are a quick learner, you adapt easily, and you have great interpersonal and communication skills.

Netskrt offers the opportunity to obtain hands-on experience with storage, networking, security, and cloud technologies. As part of the Netskrt team, you will have the opportunity to design and implement solutions to challenging problems in a startup environment, working with accomplished engineers and a leadership team with a proven track record of success.

Key Responsibilities:

Monitor, manage and maintain Netskrt’s managed service
Manage availability, latency, scalability and efficiency by instilling engineering reliability into our deployed systems, with a focus on fault-tolerant approaches
Drive quality accountability within the organization with well-defined processes, metrics, and goals for process quality. This includes leading effective postmortems and ensuring actions are followed up.
Drive capacity planning, performance analysis, instrumentation and other nonfunctional systems requirements
Define and report progress on strategic initiatives and project-level tasks to all stakeholders, including senior executives and clients, using communication approaches suited to each constituency.
Implement metrics-driven processes to ensure service quality targets are met
Engage and influence development, operations and product groups, and evangelize SRE practices, to align technology service / solution delivery.

Required Qualifications, Skills, Experience:

Degree in Computer Science or related technical field

Accomplished leader with 5+ years managing regional and global teams and systems

Expert knowledge of all aspects of designing, developing and managing large real-time systems

Project and process management

Prior successful experience as a systems performance or systems reliability engineer

Mastery of Linux / Unix

Mastery of coding / scripting languages (e.g., C++, PHP, Python, Perl)

Mastery of fault-tolerant approaches in large-scale distributed environments and high-performance systems

Demonstrated experience working in large, complex systems environments

Deep understanding of internet and networking protocols

Analytical mind with excellent problem-solving skills

Excellent time management, communication, decision-making, presentation, leadership and organizational skills

Ability to lead across functions and motivate matrixed staff

Desired Qualifications:

Proven leader of technology solutions in a high-volume transaction environment

Excellent written and verbal communication with clients, employees, and the management chain, including status reports, project plans and presentations

Familiarity with security frameworks and risk management methodologies

Knowledge of patch management and intrusion detection / prevention systems

Cloud computing and cloud technologies (AWS, OpenStack)

Experience with caching and CDN (content delivery network) technologies (Netflix, Amazon, Google, Limelight, Akamai, Fastly)

Knowledge of data protection operations and legislation (e.g., GDPR)

Experience with securing IoT and / or autonomous remote devices

For any questions about the company, or to apply: [email protected] or [email protected]
