Introduction to On-Call Schedules

On-call refers to the practice of having team members available to respond quickly to incidents. It routes alerts to designated members during specific time slots. Having someone on call helps keep systems and services running smoothly, regardless of the time of day. On-call responders play a crucial role in addressing and resolving issues to limit the impact on business operations.

Setting up an on-call schedule

Creating a well-structured on-call schedule is the first important step.

Here are the key considerations when creating and managing on-call schedules:

  1. Set Availability and Response Time: Establish a well-defined schedule for on-call duties. Indicate when you can fully dedicate your attention to these responsibilities by specifying the days and times you are available each week.
💡 Spike Suggests: To get started with on-call, use on-call schedule templates based on your availability.
  2. Enhance on-call scheduling with layers: Adding multiple layers to your on-call schedule makes sure that primary and secondary responders are available at all times. Primary responders act as the first line of defence, promptly acknowledging and resolving incidents. Secondary responders provide backup support and step in if additional assistance is required. This layered approach helps ensure swift incident handling.
  3. Collaborate with Team Members: Collaboration is key to a successful on-call schedule. Work closely with your team members to handle incidents and alerts; adding team members to the schedule helps manage incidents efficiently. Make sure that all team members clearly understand their roles and responsibilities during on-call shifts. This collaborative approach fosters unity and provides seamless coverage during on-call shifts.
  4. Add the on-call schedule to your calendar
💡 Spike Suggests: To stay organized and not miss shifts, add your on-call schedule to your official calendar.

By doing so, you can easily keep track of your upcoming shifts and plan your personal and work commitments accordingly. This simple step helps you stay on top of your on-call responsibilities and make sure that you are always available when needed.

  5. Use on-call overrides: Sometimes you need to modify the on-call schedule to accommodate specific circumstances. An on-call override lets you temporarily assign a team member to cover a shift without making permanent changes to the entire schedule. This flexibility lets you adapt quickly to unexpected situations or staff shortages and helps maintain uninterrupted on-call coverage.
  6. Configure different modes: Set a mode when you are not at your best or need to concentrate deeply. Maintaining a healthy work-life balance matters for everyone's well-being, and you can support it with modes such as:
  • Deep Work Mode: Activate deep work mode to temporarily silence unnecessary notifications. You will only be alerted for critical or high-priority incidents during this time.
  • Cooldown Mode: Having a tough day? Relax with Cooldown mode. You can delegate your duties, including on-call responsibilities, to a colleague.
  • Vacation Mode: If you are on vacation, you can schedule or instantly delegate your duties, including on-call responsibilities, to a colleague. Whether you are engrossed in deep work or enjoying a well-deserved vacation, set a phone call alert mode to keep your team updated.

By implementing these strategies, you can establish a robust on-call system that enhances the availability and reliability of your systems and services.
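
The layering and override rules described above can be sketched in a few lines of Python. This is a minimal illustration and not Spike's implementation: the `Shift` and `Override` classes, the `who_is_on_call` function, and the responder names are all hypothetical, and shifts are assumed not to cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional


@dataclass
class Shift:
    """A recurring weekly slot for one responder (hypothetical model)."""
    responder: str
    weekdays: set[int]  # 0 = Monday ... 6 = Sunday
    start: time
    end: time  # exclusive; assumed to be on the same day as start

    def covers(self, moment: datetime) -> bool:
        return moment.weekday() in self.weekdays and self.start <= moment.time() < self.end


@dataclass
class Override:
    """A temporary reassignment that does not change the recurring schedule."""
    responder: str
    start: datetime
    end: datetime  # exclusive

    def covers(self, moment: datetime) -> bool:
        return self.start <= moment < self.end


def who_is_on_call(layer: list[Shift], overrides: list[Override], moment: datetime) -> Optional[str]:
    """Return the responder for one layer at a given moment, or None if there is a gap."""
    for override in overrides:  # an active override wins over the regular shift
        if override.covers(moment):
            return override.responder
    for shift in layer:
        if shift.covers(moment):
            return shift.responder
    return None


# Example data: two layers covering weekdays from 09:00 to 17:00.
WEEKDAYS = {0, 1, 2, 3, 4}
primary_layer = [Shift("alice", WEEKDAYS, time(9), time(17))]
secondary_layer = [Shift("bob", WEEKDAYS, time(9), time(17))]
# Carol covers Alice's primary shift on Wednesday morning, 3 January 2024.
primary_overrides = [Override("carol", datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 13))]
```

With this data, `who_is_on_call(primary_layer, primary_overrides, datetime(2024, 1, 3, 10))` returns `"carol"`, while the same call at 14:00 returns `"alice"` because the override has ended. A `None` result signals a coverage gap, which is worth checking for when the schedule is first drawn up.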

Prepare for On-Call

Once your on-call schedule is in place, the next step is to familiarize yourself with the preparations required for going on call.

Here's a structured approach to get you ready:

  1. Choose the Right Escalation Policy: Once the on-call schedule is set up, the next step is to align it with specific escalation policies related to the role. If you are new to on-call, start with secondary on-call responsibilities. This will give you insights into how alerts and incidents are managed by the primary on-call responder and other team members, both for critical and non-critical incidents.

<aside> 💡 On-call doesn't mean receiving all alerts. Responders can choose specific escalation policy alerts they want to receive while being on-call, allowing them to focus on their areas of expertise.

</aside>

  2. Set Up Shift Notifications: Being punctual and attentive during on-call shifts is crucial. Configure notification channels such as phone calls, WhatsApp, Telegram, SMS, email, and Slack to remind you when a shift is about to start and when it is ending.
  3. Get Familiar with Incident Severity Levels: As an on-call responder, you need a solid understanding of how incidents are categorized by severity, for example as critical or non-critical. A clear grasp of this classification is essential for managing incidents effectively.
  4. Learn about Incident Handling Timeframes: Familiarize yourself with the organization's defined timeframes for acknowledging incidents, working towards a resolution, and escalating when necessary. Knowing these timeframes helps you respond in time and within acceptable windows.
  5. Understand the Escalation Process: Most organizations have a multi-tiered escalation process in place. Understand when and how to escalate incidents to higher-level responders, such as managers and team leads. Knowing the escalation hierarchy helps ensure that incidents are handled with the appropriate level of urgency and expertise.
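
A multi-tiered escalation process with acknowledgement timeframes can be pictured as an ordered list of steps, each with a waiting period. The sketch below is only an illustration under assumed values: the `EscalationStep` class, the `current_step` function, the tier names, and the waiting times are hypothetical and should be replaced with your organization's own policy.

```python
from dataclasses import dataclass


@dataclass
class EscalationStep:
    notify: str
    wait_minutes: int  # time allowed for an acknowledgement before escalating further


def current_step(policy: list[EscalationStep], minutes_unacknowledged: int) -> EscalationStep:
    """Return the tier that should be handling an incident left unacknowledged this long.

    Assumes the policy has at least one step.
    """
    elapsed = 0
    for step in policy:
        elapsed += step.wait_minutes
        if minutes_unacknowledged < elapsed:
            return step
    return policy[-1]  # the policy is exhausted, so the last tier keeps the incident


# Hypothetical three-tier policy.
policy = [
    EscalationStep("primary on-call", 10),
    EscalationStep("secondary on-call", 15),
    EscalationStep("team lead", 30),
]
```

Here `current_step(policy, 5).notify` is `"primary on-call"`, and after 10 minutes without acknowledgement the incident moves to `"secondary on-call"`. From 25 minutes onward it sits with `"team lead"`.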

Handle On-Call Responsibilities

When it comes to handling on-call responsibilities, the key is to respond quickly to alerts and incidents.

  1. Prioritize Incidents: When faced with multiple incidents occurring simultaneously, it's important to prioritize them. Take into consideration factors such as severity and priority when determining which incidents to address first. Start by focusing on critical incidents that can have a significant impact on the organization's services, and then systematically address lower-priority incidents. By doing so, you can ensure that resources are allocated effectively and efficiently.
  2. Follow the Incident Response Process: Most organizations have a predefined incident response process, and it is crucial to follow it diligently. The process typically involves steps such as incident identification, response, resolution, automation, and post-incident review. Adhering to this structured process promotes a consistent and effective response to incidents.
  3. Collaborate with Team Members: Remember that you are not alone in managing incidents. Tackling and resolving incidents is a team responsibility. By drawing on the strengths and expertise of your team, you can address incidents more effectively and ensure a smoother resolution process.
  4. Root Cause Analysis: Resolving an incident isn't the end of the process. It is important to conduct a thorough root cause analysis of the incident to gain deeper insights. By identifying the underlying reasons and contributing factors that led to the incident, you can implement preventive measures to avoid similar incidents in the future. This understanding is invaluable in enhancing overall system reliability and minimizing the chances of recurrence.
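
The prioritization step above can be expressed as a simple sort. The sketch below assumes the two severity classes mentioned earlier (critical and non-critical) and, as an added assumption, breaks ties by handling the oldest incident first. The `Severity`, `Incident`, and `triage_order` names and the sample incidents are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Severity(IntEnum):
    CRITICAL = 0      # lower value sorts first
    NON_CRITICAL = 1


@dataclass
class Incident:
    title: str
    severity: Severity
    opened_at: datetime


def triage_order(incidents: list[Incident]) -> list[Incident]:
    """Order incidents by severity first, then oldest first within a severity."""
    return sorted(incidents, key=lambda incident: (incident.severity, incident.opened_at))


# Hypothetical incidents that are open at the same time.
open_incidents = [
    Incident("Slow dashboard", Severity.NON_CRITICAL, datetime(2024, 1, 3, 8, 0)),
    Incident("Checkout down", Severity.CRITICAL, datetime(2024, 1, 3, 8, 30)),
    Incident("Database unreachable", Severity.CRITICAL, datetime(2024, 1, 3, 8, 10)),
]

for incident in triage_order(open_incidents):
    print(incident.severity.name, incident.title)
```

The two critical incidents come first, with the older "Database unreachable" ahead of "Checkout down", and "Slow dashboard" is handled last even though it was opened earliest.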

Learn from incidents

To ensure the availability and reliability of the system and services, it is important to conduct post-incident reviews and implement preventive changes.

Remember, on-call is a collective effort. By following the on-call practices and continuously improving incident response processes, organizations can ensure that their systems remain stable and their services meet customer expectations.