Site Reliability Engineering Manager (REMOTE)

Job Description

We are the makers of possible

BD is one of the largest global medical technology companies in the world. Advancing the world of health is our Purpose, and it's no small feat. It takes the imagination and passion of all of us, from design and engineering to the manufacturing and marketing of our billions of MedTech products per year, to look at the impossible and find transformative solutions that turn dreams into possibilities.

We believe that the human element, across our global teams, is what allows us to continually evolve. Join us and discover an environment in which you'll be supported to learn, grow and become your best self. Become a maker of possible with us.

Position Summary

A Site Reliability Engineering (SRE) Manager is responsible for ensuring that systems and services run smoothly, reliably, and efficiently at scale. They manage a team of SREs to maintain infrastructure, handle incident response, and improve the system's reliability and performance.

The SRE Manager will lead a team of Site Reliability Engineers responsible for the availability, scalability, and performance of critical systems. This role involves working with cross-functional teams to improve system reliability and empower engineers through automation, monitoring, and incident management processes. As an SRE Manager, you will ensure your team delivers high-quality services by focusing on system resilience, performance optimization, and operational excellence.

Key Responsibilities:

Leadership & Strategy:

  • Lead and mentor a team of SREs, fostering a culture of ownership, reliability, and accountability.
  • Collaborate with development, operations, and product teams to define and drive reliability strategies and initiatives.
  • Establish and monitor key performance indicators (KPIs) for system reliability and performance.
  • Provide guidance and prioritize efforts to prevent and mitigate incidents, improve incident response times, and enhance post-incident processes.

Operational Excellence:

  • Ensure 24/7 availability of critical systems by overseeing and improving the incident management process, including on-call rotations.
  • Develop and implement disaster recovery and business continuity strategies.
  • Drive automation efforts to reduce manual work, improve operational efficiency, and enhance system performance.
  • Lead postmortem analysis after major incidents and drive continuous improvements through root cause analysis.

System Reliability & Performance:

  • Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to monitor and maintain the system's reliability.
  • Build and maintain automated monitoring, alerting, and self-healing systems.
  • Work closely with software engineering teams to design and implement infrastructure that is scalable and resilient.
  • Ensure systems can handle capacity and performance requirements through load testing and scaling solutions.

Collaboration & Communication:

  • Act as a point of escalation for incidents and outages, ensuring timely resolution and communication.
  • Promote a culture of proactive collaboration with development and operations teams to improve systems before failures occur.
  • Communicate effectively with stakeholders regarding the status of ongoing projects and incidents.
  • Foster a culture of continuous improvement and learning within the team.

Qualifications:

Technical Expertise:

  • 5+ years of experience in Site Reliability Engineering or DevOps, with at least 2 years in a servant leadership role.
  • Ability to help personnel reach their maximum potential in both service and personal growth.
  • Strong understanding of cloud environments (AWS, Azure, Google Cloud) and orchestration tools (Kubernetes, Docker).
  • Proficient in infrastructure-as-code (Terraform, AWS CDK, and CloudFormation) and scripting languages (TypeScript, PowerShell, or Go).
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, etc.).
  • Strong knowledge of CI/CD pipelines, version control systems (Git), and configuration management tools.
  • Experience with Agile methodologies and a good understanding of services and microservices.

Leadership & Management:

  • Proven experience in managing on-call rotations, incident management processes, and operational readiness.
  • Experience in mentoring and growing engineering teams.
  • Strong problem-solving and decision-making skills, with the ability to navigate complex system challenges.
  • Ability to work under pressure during high-stress incidents while maintaining composure.

Soft Skills:

  • Excellent communication and collaboration skills.
  • Ability to balance short-term fixes with long-term improvements.
  • Strong organizational and time-management skills.

Preferred Qualifications:

  • Experience with multi-cloud environments.
  • Familiarity with compliance requirements (e.g., SOC 2, ePHI, ISO 27001).
  • Experience working in a fast-paced startup environment or with large-scale enterprise systems.

Education Qualifications & Previous Experience:

  • Bachelor's degree in a related field, or a minimum of 10 years of relevant IT experience

Desired/Additional Skills & Knowledge:

  • Knowledge of Microsoft Azure virtual network appliances
  • Knowledge of network protocols such as DNS, SMTP, SNMP, SSH, and SFTP
  • Knowledge of networking and TCP/IP routing and subnetting
  • Working knowledge of VPN connectivity
  • Knowledge of backup and disaster recovery processes
  • Knowledge of DevOps, Agile, and infrastructure as code strongly desired
  • Knowledge of Azure SaaS, PaaS, and IaaS offerings and services in Azure commercial and DoD regions

Certifications

  • Microsoft or other cloud provider certifications preferred
  • Management or other employee management training or certification desired.

Any Additional Information

  • Experience working in a servant leadership environment
  • Able to build strong partnerships with business partners and project teams
  • Strong analytical and decision-making abilities
  • Takes responsibility for delivering superior value and client service
  • Works well with people who have diverse abilities, experiences, and perspectives
  • Influences others without direct authority
  • Approaches opportunities and issues in an optimistic, action-oriented, and solution-based way
  • Good writing skills to document plans and processes

For certain roles at BD, employment is contingent upon the Company's receipt of sufficient proof that you are fully vaccinated against COVID-19. In some locations, testing for COVID-19 may be available and/or required. Consistent with BD's Workplace Accommodations Policy, requests for accommodation will be considered pursuant to applicable law.

Why Join Us?

A career at BD means being part of a team that values your opinions and contributions and that encourages you to bring your authentic self to work. It's also a place where we help each other be great, we do what's right, we hold each other accountable, and learn and improve every day.

To find purpose in the possibilities, we need people who can see the bigger picture, who understand the human story that underpins everything we do. We welcome people with the imagination and drive to help us reinvent the future of health. At BD, you'll discover a culture in which you can learn, grow, and thrive. And find satisfaction in doing your part to make the world a better place.

To learn more about BD visit https://bd.com/careers

Becton, Dickinson and Company is an Equal Opportunity/Affirmative Action Employer. We do not unlawfully discriminate on the basis of race, color, religion, age, sex, creed, national origin, ancestry, citizenship status, marital or domestic or civil union status, familial status, affectional or sexual orientation, gender identity or expression, genetics, disability, military eligibility or veteran status, or any other protected status.

Primary Work Location: USA CA - San Diego Bldg A&B

At BD, we are strongly committed to investing in our associates, their well-being and development, and in providing rewards and recognition opportunities that promote a performance-based culture. We demonstrate this commitment by offering a valuable, competitive package of compensation and benefits programs which you can learn more about on our Careers Site under Our Commitment to You.

Salary or hourly rate ranges have been implemented to reward associates fairly and competitively, as well as to support recognition of associates' progress, ranging from entry level to experts in their field, and talent mobility. There are many factors, such as location, that contribute to the range displayed. The salary or hourly rate offered to a successful candidate is based on experience, education, skills, and any step rate pay system of the actual work location, as applicable to the role or position. Salary or hourly pay ranges may vary for Field-based and Remote roles.

Salary Range Information

 
