Introducing Playbooks automation
We're rolling out Playbooks, our latest step toward fully automating the incident response process. Imagine that every action you, as an incident responder, used to take manually is now automated with Playbooks. Steps like initiating a war room (video conference), logging incidents, sending out alerts, and running diagnostic scripts are now executed with precision, every single time, without you lifting a finger.
It’s been quite the journey to bring Playbooks to life, fueled by hard work and collaboration. We're quite proud of what we’ve accomplished and are even more focused on the practical benefits this automation brings to your incident response strategy.
The core of Playbooks: High Deduction and Automation
Playbooks automate the manual steps you’d typically take when an incident triggers, so tasks are completed quickly and accurately without anyone's constant oversight. They run the same way every time, with no need for user intervention.
Pro tip: Playbooks are designed to run automatically, but you can also run them manually from the dashboard or the incident details page.
List of all actions to automate
Below is a list of all the actions you can automate:
- Add Responders: Automatically bring in the right team members to address an incident.
- Set Priority: Assign priority from P1 to P5.
- Set Severity: Assign severity from SEV1 to SEV3.
- Acknowledge Incidents
- Resolve Incidents
- Set up Conference calls (war room): Automatically set up war rooms and invite one or more members.
- Execute Scripts (Outbound webhooks): Run diagnostic or remediation scripts automatically.
- Improve incident title (with Title Remapper): Automatically updates the title of your incident, making it clearer with added context.
- Update Status Pages: Automatically creates and resolves incidents on Status Page keeping your users and stakeholders informed.
- Create Tickets: Log incidents in project management tools like JIRA, Linear, and ClickUp (more coming soon).
- Create Customer Support Tickets: Open a new customer support ticket on Freshdesk, Zendesk, or SupportPal.
Sequence of actions in automation will be preserved.
You can add any number of actions to your Playbooks (there are no limits), and we ensure they're executed in the exact sequence you specify.
For example, if you set up a Playbook to first resolve an incident, then create a support ticket, and finally trigger an outbound webhook, the automation will run in this precise order every time.
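To make the ordering guarantee concrete, here is a minimal Python sketch of the example above. It is only an illustration: the `Playbook` class, the action functions, and the incident fields are hypothetical stand-ins, not Spike's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model for illustration only; not Spike's actual API.
Action = Callable[[dict], None]


@dataclass
class Playbook:
    name: str
    actions: list[Action] = field(default_factory=list)

    def run(self, incident: dict) -> list[str]:
        """Run every action in the order it was added; return the names in run order."""
        executed = []
        for action in self.actions:
            action(incident)
            executed.append(action.__name__)
        return executed


def resolve_incident(incident: dict) -> None:
    incident["status"] = "resolved"


def create_support_ticket(incident: dict) -> None:
    # Stand-in for a call to a support tool such as Freshdesk or Zendesk.
    incident["ticket"] = f"Support ticket for: {incident['title']}"


def trigger_outbound_webhook(incident: dict) -> None:
    # Stand-in for an HTTP POST to your own endpoint.
    incident["webhook_sent"] = True


playbook = Playbook(
    name="resolve-and-notify",
    actions=[resolve_incident, create_support_ticket, trigger_outbound_webhook],
)
incident = {"title": "Disk usage warning", "status": "triggered"}
order = playbook.run(incident)
```

Because the actions live in a plain list, the run order is simply the order in which you added them.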
Maximizing Incident Management Efficiency with Playbooks
Playbooks speed up your incident response process through automation. They cover both ends of the severity spectrum, from urgent crises to routine alerts, and extend the same ease of management to external communications via automated status pages.
1. Automating Critical Incident Actions 🔥
Critical incidents demand immediate action, and every minute saved can significantly reduce potential damage (and MTTR). Playbooks will automate these critical response actions:
- Immediate Team Alerts: Ensures that all relevant team members are informed instantly (triggered from your escalation policy)
- War Room Setup: A virtual meeting space is automatically set up, gathering all necessary team members.
- Diagnostic Scripts: Run automatically to provide an initial assessment, scale up resources, take backups, and so on.
Use these actions:
- Adding Responders: Automatically bringing in senior engineering team members.
- Setting Priority: Assigning the incident a P1 urgency level for immediate attention.
- Setting Severity: Marking the incident as SEV1 to highlight its critical nature.
- War Room Setup: Creating a Google Meet for immediate, focused discussion.
- JIRA Integration: Logging the incident for comprehensive tracking and action.
For instance, in the event of a network breach, a Playbook can instantly gather the security team in a war room, initiate predefined security protocols, and alert stakeholders, significantly reducing the breach's impact. What is your most critical incident?
2. Streamlining Low Severity Incident Management
On the flip side, not every incident requires a red-alert response. Playbooks efficiently manage these lower-severity incidents by automating routine tasks:
- Define Priority & Severity: Automatically sets a low priority and severity. This helps if the incident triggers again, since other Playbooks can catch it and resolve it.
- Direct Resolution: Known non-critical issues can be resolved automatically based on predefined criteria.
Use these actions:
- Severity Setting: Marking the incident as SEV3, indicating lower severity.
- Priority Assignment: Assigning a P5 priority to reflect its lower urgency.
- Automatic Resolution: Marking the incident as resolved, allowing teams to focus on more critical issues without distraction.
This automation makes sure that the team’s focus remains on high-priority incidents without being sidetracked by minor issues.
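As a rough sketch of this flow, the snippet below matches an incident title against a list of known non-critical patterns and, on a match, applies the three actions above. The keyword matching and the field names (`title`, `severity`, `priority`, `status`) are assumptions made for illustration; your real conditions are configured in the Playbooks UI.

```python
def matches_keywords(incident: dict, keywords: list[str]) -> bool:
    """Return True if any keyword appears in the incident title (case-insensitive)."""
    title = incident["title"].lower()
    return any(keyword.lower() in title for keyword in keywords)


def auto_resolve_low_severity(incident: dict, keywords: list[str]) -> dict:
    """Return a copy marked SEV3, P5, and resolved on a match; an unchanged copy otherwise."""
    if not matches_keywords(incident, keywords):
        return dict(incident)
    return {**incident, "severity": "SEV3", "priority": "P5", "status": "resolved"}


# Hypothetical list of known, non-critical patterns.
known_noise = ["certificate expiring in 30 days", "disk usage above 60%"]

routine = auto_resolve_low_severity(
    {"title": "Disk usage above 60% on cache-2", "status": "triggered"}, known_noise
)
urgent = auto_resolve_low_severity(
    {"title": "Checkout API returning 500s", "status": "triggered"}, known_noise
)
```

Here `routine` comes back resolved with SEV3 and P5, while `urgent` is left untouched for a human to look at.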
3. Automated Status Pages to Enhance External Communication
Beyond internal incident management, clear and consistent communication with customers and stakeholders is paramount for transparency. Automated status pages ftw:
- Match conditions, create incident: Automatically updates the status page at the onset of an incident when your predefined conditions match. Users can also do this with the click of a button from any incident.
- Resolution Notifications: Informs subscribers immediately once the issue is resolved. Set up another Playbook to do this when the incident is resolved.
This will uplift your transparency game and reduce the workload on teams to manually update subscribers so they can focus on resolving the incident at hand.
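The two-Playbook pattern described above (one that opens a status page incident, one that closes it) can be sketched like this. The `StatusPage` class is an in-memory stand-in and the event names are assumptions; none of this is the real Status Page API.

```python
from dataclasses import dataclass, field


@dataclass
class StatusPage:
    """In-memory stand-in for a status page; not the real Status Page API."""
    incidents: dict[str, str] = field(default_factory=dict)
    notifications: list[str] = field(default_factory=list)

    def create_incident(self, incident_id: str, title: str) -> None:
        self.incidents[incident_id] = "investigating"
        self.notifications.append(f"Investigating: {title}")

    def resolve_incident(self, incident_id: str, title: str) -> None:
        self.incidents[incident_id] = "resolved"
        self.notifications.append(f"Resolved: {title}")


def on_incident_event(page: StatusPage, event: str, incident: dict, keyword: str) -> None:
    """One playbook reacts to 'triggered', a second one to 'resolved'."""
    if keyword.lower() not in incident["title"].lower():
        return  # Conditions did not match, so the status page is left alone.
    if event == "triggered":
        page.create_incident(incident["id"], incident["title"])
    elif event == "resolved" and incident["id"] in page.incidents:
        page.resolve_incident(incident["id"], incident["title"])


page = StatusPage()
api_down = {"id": "INC-1", "title": "Public API outage"}
internal = {"id": "INC-2", "title": "Staging deploy failed"}

on_incident_event(page, "triggered", api_down, keyword="outage")
on_incident_event(page, "triggered", internal, keyword="outage")
on_incident_event(page, "resolved", api_down, keyword="outage")
```

Only the incident that matches the condition ever reaches the status page, and subscribers get one message when it opens and one when it resolves.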
4. Support team sync
Beyond the above, Playbooks can also keep your support teams in sync:
- Incident Resolution: A known low-priority incident? Resolve it instantly.
- Support Ticket Creation: Auto-generates detailed tickets for customer support in platforms like Freshdesk and others.
5. Investigating a potentially critical incident
Since Playbooks can be run manually, responders can spot an unknown incident and run more actions to raise a warning:
- Incident Creation in Project Management Tools: Creates the incident in tools like Jira or Linear so your work is accounted for and the incident is tracked in sprints.
- Adding key members for collab: Notifies members of your team who can help investigate further.
- Dump logs, create backups: External scripts triggered by outbound webhooks can securely run to, say, create backups or dump logs for better investigation.
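On the receiving end of an outbound webhook, one cautious pattern is to accept only an allow-listed set of tasks rather than running whatever the request asks for. The sketch below shows that idea in Python. The payload fields (`task`, `incident_id`) and the script paths are hypothetical; check the outbound webhook docs for the real payload shape.

```python
import json

# Hypothetical allow-list: task name -> command your own service would run.
ALLOWED_TASKS = {
    "dump_logs": ["./scripts/dump_logs.sh"],
    "create_backup": ["./scripts/create_backup.sh"],
}


def plan_command(raw_body: bytes) -> list[str]:
    """Turn a webhook body into a command, refusing anything not on the allow-list."""
    payload = json.loads(raw_body)
    task = payload.get("task")
    if task not in ALLOWED_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    # The incident id is passed as a separate argument, never interpolated into a shell string.
    return ALLOWED_TASKS[task] + [str(payload["incident_id"])]


command = plan_command(b'{"task": "dump_logs", "incident_id": "INC-42"}')
```

The returned list can be handed to something like `subprocess.run(command)` in your own service; anything outside the allow-list raises before a command is ever built.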
By integrating Playbooks into their incident response strategy, teams can not only respond to incidents more efficiently but also communicate more effectively with their user base.
Impact of Automation in Incident Response
Time is an important currency of operational integrity. Traditionally, even the most adept teams faced a daunting challenge: managing response times effectively while juggling the complexities of various incident severities. Playbooks turn the tide by automating critical aspects of incident response, which significantly reduces the Mean Time to Resolution (MTTR), eases the burden on responders, and cuts through the noise of alert fatigue.
Drastic Reduction in Response Times
With Playbooks, what used to take expert teams about 10 minutes to 2 hours now takes mere seconds.
For an intermediate team, navigating incident responses without Playbooks could stretch to a stressful 30 minutes to 2 hours. Automation brings this down to an astounding 2 seconds or less for initial actions. This reduction in MTTR is not just about speed; it's about the ripple effect of minimizing operational disruptions, safeguarding customer trust, and ultimately, preserving revenue.
Reducing Responder Toll
High-pressure environments can take a significant toll on responders.
The mental load of rapidly prioritizing incidents, executing manual response steps, and communicating with stakeholders can be overwhelming. Automation through Playbooks lifts much of this burden, allowing responders to focus on critical thinking and strategic decisions rather than mechanical tasks. This is a huge +1 to the well-being of responders.
Targeting Alert Fatigue with Precision
Say this with me - Alert Fatigue is real. Teams inundated with constant notifications, especially from low-severity incidents, risk becoming desensitized, which can lead to slower responses or overlooked alerts when real crises strike. By automating the handling of routine, low-severity incidents, Playbooks ensure that teams are alerted only to high-severity incidents that genuinely require their attention. This targeted approach not only sharpens focus but also preserves the alert system's integrity as a tool for urgent communication.
Quantifiable Benefits: Stats and Numbers
While the qualitative impacts of automation in incident response are clear, the quantitative benefits further underscore its value. Organizations leveraging Playbooks have reported:
- Up to a 40% reduction in MTTR for critical incidents, significantly limiting potential damage and downtime.
- An 80% decrease in manual tasks for responders
- A notable 50% reduction in alert volume reaching responders, directly combating alert fatigue and also enhancing response quality to critical alerts.
By drastically reducing response times, reducing the toll on responders, and targeting alert fatigue with precision, Playbooks can redefine what's possible in incident response. This isn't just about responding faster; it's also about aligning to sustainable, effective ways of modern digital operations.
The internals of Playbooks
Alright, let’s dig deeper into how Spike's Playbooks work.
When an incident reaches us, it's packed with data—a payload that's essentially the incident's DNA. While I can't spill the beans on our secret sauce for making this payload human-friendly (think of it as our little bit of wizardry), it's this data that sets the stage for our automation.
The Nitty-Gritty of Playbook Operations:
- Cracking the Incident Code: Each incident gets analyzed for unique markers—keywords, how often it’s popping up, you name it. Based on what we find and the automation conditions you’ve set, we decide if any Playbook has to be run.
- Unleashing the Playbooks: If an incident is a match, your Playbooks spring into action. It’s all about getting the right response rolling, stat.
- Alert Rules Tango: Post-Playbook action, our alert rules take the incident for a spin. Say a Playbook just cranked an incident up to SEV1; our alert system then knows it’s time to route alerts to the critical incidents crew. It’s a dynamic duo of precision and smarts.
- Round Two with Playbooks: Think of this as the encore. After the alert rules have had their say, Playbooks spring up again to check whether any other conditions now match and need to run.
- Curtain Call - Incident Creation: With our backstage operations wrapped up, the incident officially takes the spotlight. This whole process ensures that from the moment an incident drops in to the final bow, everything runs like clockwork.
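The five stages above can be sketched as a small pipeline. This is a simplified illustration of the order of operations only, not Spike's implementation: the tuple shapes, the field names, and the choice to run each Playbook at most once per incident are all assumptions.

```python
from typing import Callable

Condition = Callable[[dict], bool]
Action = Callable[[dict], None]


def process_incident(
    payload: dict,
    playbooks: list[tuple[str, Condition, Action]],
    alert_rules: list[tuple[str, Condition, str]],
) -> tuple[dict, list[str]]:
    """Run playbooks, then alert rules, then playbooks again, then create the incident."""
    incident = dict(payload)
    trace: list[str] = []
    already_ran: set[str] = set()  # Assumption: a playbook runs at most once per incident.

    def run_matching_playbooks(stage: str) -> None:
        for name, condition, action in playbooks:
            if name not in already_ran and condition(incident):
                action(incident)
                already_ran.add(name)
                trace.append(f"{stage}:{name}")

    run_matching_playbooks("playbooks-1")
    for rule_name, condition, team in alert_rules:
        if condition(incident):
            incident["routed_to"] = team
            trace.append(f"alert-rule:{rule_name}")
    run_matching_playbooks("playbooks-2")
    incident["created"] = True
    trace.append("incident-created")
    return incident, trace


playbooks = [
    ("escalate-db", lambda i: "database" in i["title"].lower(),
     lambda i: i.update(severity="SEV1")),
    ("open-war-room", lambda i: i.get("routed_to") == "critical-incidents",
     lambda i: i.update(war_room=True)),
]
alert_rules = [
    ("sev1-routing", lambda i: i.get("severity") == "SEV1", "critical-incidents"),
]

incident, trace = process_incident({"title": "Database unreachable"}, playbooks, alert_rules)
```

In this run, the first pass raises the incident to SEV1, the alert rule routes it to the critical incidents crew, and only then does the second pass find a match for the war room Playbook.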
This is not about automating for the sake of it; you are creating a seamless, smart system that’s always on, always alert, and always ready to respond with the precision of a scalpel.
The Genesis of Playbooks at Spike.sh
Launching Playbooks was a venture into uncharted territory, marked by a clear goal: to automate every step of the incident response process. Our challenge lay in architecting a system flexible enough to host an unlimited array of actions and integrations, all while keeping the UI intuitive. We envisioned a tool where limitations were non-existent, mirroring our escalation policies.
Inspiration for Playbooks came from direct conversations with the people at the heart of engineering—those who dealt firsthand with the daily obstacles of managing incidents. It became clear that much of the incident handling extended beyond Spike.sh, involving various external tools and actions. This steered us last year to create a more inclusive platform, one that beams incidents to tools like Linear, Jira, Freshdesk, etc.
Feedback from our users has been instrumental in shaping Playbooks. Automating status page updates turned out to be a common theme (which we had not anticipated). We went back to our boards and moved a few things around, like releasing the Status Page API and then marrying it with Playbooks. Similarly, almost all larger entities in Spike are now domain APIs that plug into Playbooks. This structure allows us to extend and add more actions fast.
Future plans
Future plans for Playbooks are rooted in the idea of extending most of Spike's features into Actions, such as links in incidents, resolution notes after resolving incidents, outbound webhooks for every shift rotation, custom prompts for users, and user inputs.
We will also keep our ears to the ground on what you have to say and what you want to accomplish with automation. Let us know!
We are quite excited and proud to release this and look forward to hearing your thoughts and how you use it.
To get started, on your dashboard, visit Automation -> Playbooks. Find the docs detailing playbooks here.
Thanks for reading!