Senior Site Reliability Engineer

WHY WE NEED YOU

PagerDuty is the leading digital operations management platform for businesses. Our global teams work together, iterate constantly, and solve complex problems to help our 10,000+ customers deliver beautiful customer experiences powered by happy teams and healthy software. PagerDuty is a place where you can do some of the most interesting, impactful, challenging, and exciting work of your career.

At PagerDuty, we need to be up when our customers are down. The stability, performance, and resilience of our infrastructure are of paramount importance. We rely on the Site Reliability Engineering team to maintain the platforms and services that our development teams count on to deliver a four-nines (99.99% availability) experience. Whether it's provisioning, continuous integration/deployment, monitoring, or cloud platform management, SREs provide the foundation upon which the PagerDuty product is built. As a member of the SRE team, you will maintain, optimize, and troubleshoot the PagerDuty infrastructure of today while designing and architecting the platform of tomorrow.
 

HOW YOU CONTRIBUTE TO OUR VISION  

  • You architect, build, and automate the cloud production infrastructure on which PagerDuty runs.
  • You partner with Engineering stakeholders to design and deliver a reliable, scalable, secure, and performant platform.
  • You continuously strive to improve the customer experience through full lifecycle support (creation, development, deployment, retirement), observability, flexible connectivity, and monitoring.
  • You stay current on technical trends in order to suggest innovative tools and approaches to interesting problems.
  • You share your expertise with the entire Engineering organization.
  • You participate in a 24/7 on-call rotation. And yes, we use PagerDuty to manage our on-call schedules.
 
ABOUT YOU

  • You are a leader and an influencer. You oversee large-scale solutions and clarify complex problems. You propose and champion improvements in processes and technology choices. You make your team and your teammates better.
  • You have solved multiple problems by writing code to automate your way out of them. You have replaced manual processes time and time again with your code.
  • You have been responsible for running critical services that multiple customers depend upon. You understand the importance and impact that operational optimization can have on a product and the positive ripple effects that it can have across an entire engineering organization.
  • You believe CI servers, push-button deploys, time-series datastores, metrics dashboards, and centralized logging are not just "nice to haves," they are critical pieces of infrastructure that rapidly pay for themselves. You are familiar with the tool-space and can suggest products in each of these areas.
  • You are empathetic: You take others' opinions into account and clearly communicate your thoughts to reach technical solutions quickly.
  • You consider it important to understand and appreciate your customers, and enjoy seeing your work improve the work of others.
 
MINIMUM QUALIFICATIONS

  • Excellent knowledge of at least one domain-relevant language, such as Ruby, Python, or Go
  • Experience managing an AWS-based, cloud-native infrastructure and its foundational services, including EC2, S3 and other storage options, VPCs, IAM, and more
  • Experience with Docker in a production environment, including container orchestration (Kubernetes a plus)
  • Knowledge of at least one configuration management system (e.g. Chef, Puppet, or Ansible)
  • Applicants must be currently authorized to work in the United States on a full-time basis
 
PREFERRED QUALIFICATIONS

  • Experience with infrastructure as code (Terraform or CloudFormation)
  • Experience with Splunk or other log analysis platforms
 
BENEFITS TO GET EXCITED ABOUT

  • Competitive salaries and company equity
  • Comprehensive benefits package including medical, dental, and vision plans for you, your spouse, and family; 401(k), pre-tax commuter benefits, corporate discounts, cell phone allowance, and more!
  • Generous parental leave and paid vacation (3 weeks your first year, 4 weeks afterwards), in addition to 12 paid holidays and ample sick leave
  • Monthly company-wide hack days

How We Work
PagerDuty Engineering teams are set up to be mini innovation pods. We practice what we preach, and believe that every engineer can build great products to delight our thousands of customers.

Teams are set up to succeed autonomously while remaining accountable for results. Every team has full vertical ownership of its own services and can release as frequently as it wants. We practice the mantra of 'Code It. Ship It. Own It.' and believe that teams are most successful when they own every decision involved in running their software. Every team gets to be a part of our growth by building highly resilient and durable software that scales from our startup customers to Fortune 100 companies.

We deploy over 1,000 times a month, and every engineer is able to ship high-quality software to production on their own. Teams own their own tests and, yes, we use PagerDuty to manage incidents. Teams also own their way of working and can use the agile practices of their choice to work collaboratively via incremental delivery.

We encourage engineers to explore ideas via monthly Hack Days, actively attack our own infrastructure weekly to learn and get better, host an annual internal technical conference called PagerCon, ask our engineers to represent PagerDuty at industry events, and contribute to the open source community.

About Us
PagerDuty is the leading digital operations management platform for organizations. Over 10,000 enterprises and small to mid-size organizations globally trust PagerDuty to improve digital operations, drive revenue, mitigate threats, protect assets, and delight customers. We were named to the 2017 Deloitte Technology Fast 500 for the second year in a row, as well as the Inc. 500 and Forbes Cloud 100 lists and the 2018 Best Places to Work in the Bay Area.

PagerDuty does not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, parental status, veteran status, or disability status.

Our stewardship of the data of many thousands of customers means that a background check is required to join PagerDuty.  We will, nonetheless, consider qualified applicants with arrest and conviction records in accord with applicable law.

PagerDuty uses the E-Verify employment verification program.
Senior Level