SRE Foundation

The Site Reliability Engineering (SRE) Foundation course teaches the principles and practices that allow an organization to scale critical services reliably and economically. SRE is an approach to operations that applies software engineering and automation to ensure that continuously delivered applications run efficiently and reliably. The course traces the evolution of SRE within modern software engineering and its future direction. It equips learners with the methods, practices, and tools to engage people across the organization who are involved in reliability and stability, using real-life scenarios and case studies.

Prerequisites:

  • 6 months of working experience with cloud or DevOps
  • Linux fundamentals

This course introduces the concepts underpinning SRE.

Introduction to SRE

  • What’s the difference between DevOps and SRE? – Intro
  • What’s the difference between DevOps and SRE?
  • Now SRE Everyone Else with CRE! – Intro
  • Now SRE Everyone Else with CRE!
  • CRE’s Three Reliability Principles
  • Reliability in the Cloud
  • How SLOs help your business make decisions
  • How SLOs help you build features faster
  • How SLOs help you balance operational and project work
  • Making SLOs work for your organization

Targeting Reliability

  • SLOs vs SLAs
  • The happiness test
  • How do we measure reliability?
  • Edge cases
  • 100% is the wrong target
  • Iterating

Operating for Reliability

  • Introduction
  • Error budgets
  • Everything is a trade-off
  • Error budgets: advanced concepts
  • Axes of improvement
  • Operational approach to increasing reliability
  • Module summary

Choosing a Good SLI

  • Introduction
  • User happiness in metric form
  • The properties of good SLI metrics
  • Ways of measuring SLIs
  • The SLI menu
  • The SLI equation
  • Request / Response SLIs
  • Data processing SLIs
  • But my system is really complex!
  • Managing complexity with aggregation
  • Managing complexity with bucketing
  • Achievable SLOs
  • Aspirational SLOs
  • Continuous improvement

Developing SLOs and SLIs

  • Introduction
  • The 4 step process
  • Our example game
  • Loading the profile page
  • Refining SLI specifications
  • Looking for observability gaps
  • Failure modes

Quantifying Risks to SLOs

  • Introduction
  • Is your error budget realistic?
  • Modeling risks in our spreadsheet
  • Analyzing risk

Consequences of SLO Misses

  • Introduction
  • No surprises
  • A dashboard example
  • Why an error budget policy?
  • Fundamentals of an error budget policy
  • How to draft an error budget policy
  • Example policy thresholds
  • A hypothetical policy scenario
  • Course conclusion and video wrap up