At my last job, a product manager asked me if I had any reading recommendations for someone who wanted to learn more about what an SRE (Site Reliability Engineer) does. I wrote up a list; here it is.
These are not sorted by topic or theme. In my experience, technical and business issues tend to intertwine whether you’re an IC or a manager: every book in this list has something to say to anyone in a technical organization.
Accelerate, by Nicole Forsgren, Jez Humble, and Gene Kim
This book is what happens when you apply scientific survey techniques to the discipline of software delivery. The four key metrics for performance (and employee satisfaction) are not what I assumed they were. It’s also really good for technical operations people who want to learn more about the software engineering side of their work.
The Phoenix Project, by Gene Kim, Kevin Behr, and George Spafford
Probably the clunkiest novel I’ve read in a while, considered purely as a work of fiction – but I can’t deny that it was tremendously influential on my field and I enjoyed it very much. Read this if you want to understand why devops is effective as a practice. It’s also a useful reminder that the challenges of software development, delivery, and operations are closely related to challenges the manufacturing world has been dealing with for ages.
There’s a sequel, The Unicorn Project, which is also good.
Swarmwise, by Rick Falkvinge
An activist’s practical lessons in how to build flat organizational structures that are highly effective. In corporate life, we (mostly) aren’t working with volunteers, but this is still great material on how to empower people to act independently while still providing effective guidance. Pair this with Team of Teams, mentioned below.
This book is free.
The Code of Trust, by Robin Dreeke
The lessons learned from a lifetime of counterintelligence work. Although Dreeke’s career sometimes involved gaining the trust of people who shouldn’t necessarily have trusted him, I found this to be a strong primer on how to gain trust under more normal conditions. Also really interesting if you’re fond of espionage stories.
Site Reliability Engineering and The Site Reliability Workbook
Google’s foundational books on their SRE practice. Every company will implement these concepts differently and one size does not fit all, but they’re tremendously valuable reading. Both of these are free.
Team of Teams, by Stanley McChrystal
Excellent overview of the changes General McChrystal made to a fairly traditionally-minded organization when he needed to adapt to circumstances. It starts with a very informative history of management techniques as a bonus. This book is the counterpoint to Swarmwise: same ideas, different implementations.
The Checklist Manifesto, by Atul Gawande
You can skip this book if you internalize one sentence: “Make reusable checklists for everything you do.” It’s short and well written, though.
Team Topologies, by Matthew Skelton and Manuel Pais
This book informs my philosophy of treating infrastructure as a platform. In the Team Topologies model, the classic SRE team is a platform team with a healthy side order of enablement. The core of the book is summarized in this infographic.
The Practice of Cloud System Administration, by Thomas Limoncelli, Strata Chalup, and Christina Hogan
The best practical guide to managing complex systems in the cloud I’ve ever read. Appendix A, on assessments, is worth its weight in gold all by itself.
The Field Guide to Understanding Human Error, by Sidney Dekker
Very dense (but short) book approaching the question of human error from an industrial perspective. Lots to learn from people who’ve been studying this question for a long time. You will never blame an incident on human error again.
As an alternative to reading the book, you could watch his video summaries: Part 1, Part 2, Part 4, and Part 5.
To Read
I haven’t read these yet, but they’re on my list. If you have comments about any of them, I’d be interested to hear them!
Continuous Delivery, by Jez Humble and David Farley
Remote Team Interactions Workbook, by Matthew Skelton and Manuel Pais (sequel to Team Topologies)
The First 90 Days, by Michael D. Watkins
Disclaimer
I like books written by opinionated people; this reading list reflects that. While I generally find the core ideas of these books useful, I don’t necessarily concur with other ideological or philosophical concepts therein.