This job listing has expired and may no longer be relevant!
28 Sep 2023

Service Reliability Engineering Lead at Safaricom Kenya


Job Description

Safaricom is the leading provider of converged communication solutions in Kenya. In addition to providing a broad range of first-class products and services for Telephony, Broadband Internet and Financial services, Safaricom seeks to uplift the welfare of Kenyans through value-added services and support for community projects.

Summary

Reporting to the Senior Manager – Systems Engineering, the SRE team lead will champion operational excellence by driving the adoption of SRE best practices across system availability, performance, efficiency, change management, monitoring, emergency response, security and capacity planning.

Key Responsibilities:

  • Oversee and lead the implementation of the SRE frameworks and practices within the organization using the systems operations tool chain. Foster a collaborative and inclusive team culture that emphasizes reliability, innovation, and continuous improvement.
  • Team Management: Manage team performance while fostering an environment of trust, learning and collaboration, and cultivate a culture of high performance.
  • Build, recruit, retain, manage and develop a world class SRE team.
  • Operational Excellence – Define, measure, monitor and report key SRE performance indicators and escalate breaches and violations. These indicators show the maturity level of the team and inform the backlog and related decisions. Collaborate with cross-functional teams to identify, prioritize, and address reliability issues.
  • Stakeholder Engagement: Engage the business teams and promote a culture of participation and collaboration to support effective, informed decision making.
  • Define, measure, monitor and report key systems reliability performance indicators and escalate breaches and violations.
  • Problem and Incident management – lead incident response efforts, ensuring that incidents are resolved quickly and effectively while minimizing downtime and customer impact. Conduct post-incident reviews to identify root causes and implement preventive measures.
  • Capacity Planning – Monitor system resource utilization and plan for capacity upgrades as needed to support business growth. Optimize resource allocation and cost-efficiency.
  • Security and Compliance: Collaborate with security teams to ensure the reliability and security of systems and applications. Ensure compliance with relevant industry standards and regulations.
  • Drive continuous improvement of applications through planned chaos simulations, AIOps, automation and proactive alerting strategies.
  • Document “tribal” knowledge and keep playbooks and runbooks up to date so that teams get the information they need right when they need it.
  • Champion and lead the implementation of machine learning and self-healing, and drive the organization towards a no-ops model.

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field (Master’s degree preferred).
  • Several years of experience in SRE or a related field, with a proven track record of improving system reliability.
  • Strong leadership and team management skills.
  • Proficiency in programming/scripting languages (e.g., Python, Go, Ruby).
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Knowledge of cloud computing platforms (e.g., AWS, Azure, Google Cloud).
  • Familiarity with monitoring and alerting tools (e.g., Prometheus, Grafana, ELK Stack).
  • Excellent problem-solving and communication skills.
  • Ability to work in a fast-paced, dynamic environment and handle high-pressure situations effectively.


Method of Application

Submit your CV and application on the company website.

Closing Date: 5 October 2023
