A Coordinated Response Culture for Incident Management

By Christopher Tozzi

An organizational culture that prioritizes coordinated response to incidents is vital for monitoring and managing an IT infrastructure. Incident management won’t go smoothly if teams are unwilling to coordinate their response to alerts, or don’t know how.

What is a Coordinated Response?
Put simply, a coordinated response in the IT world generally involves three things: notifying the right people and mobilizing teams immediately, giving responders access to contextual information so they are aligned from the start, and getting the team onto the right conference bridge or communication channel of choice. A well-coordinated response enables organizations to jump in and resolve incidents quickly and efficiently.

Let’s take a look at some of the challenges to response coordination, then examine strategies and tools that allow organizations to overcome these challenges and optimize incident management.

Coordination Challenges
A coordinated response culture rarely develops on its own within an organization. By default, there are obstacles in place that make coordinated response difficult. The biggest challenges include:

  • RECRUITMENT: The difficulty of recruiting additional people to help manage an incident. This challenge arises when you need to bring others in to help put out a fire, but the additional people you need didn’t receive the original alert. Unless you have an incident management platform like PagerDuty, there is no efficient way to request help from others when an incident occurs. You could email or call, of course, and hope people respond quickly, but emails and calls are not always the fastest way to get someone’s attention, especially outside of normal business hours.
  • COMMUNICATION: Too many communication channels create too many options. Multiple communication channels are available for incident management, from email to video chats to Slack. Depending on the type of incident at hand, one channel may make more sense than another. What you don’t want to do is waste time in the midst of an incident figuring out which channel to use and making sure all your team members are on it.
  • TOOLS: Most organizations already have some mode of coordination and collaboration. To create an effective coordination culture, it’s best to work with the tools already on hand and integrate them with a central incident management platform.

Increasing Coordination
Fortunately, these challenges can be addressed by taking advantage of features recently released in PagerDuty. With Response Mobilizer and Response Bridge, organizations can:

  • RECRUIT: Recruit additional teammates to help solve a particular incident. The best way to do this is to build the recruitment of additional help into your incident management workflow. That’s better than relying on manual, ad hoc ways of asking for help in the midst of an incident, and you’ll know you have the experts you need on hand.
  • COMMUNICATE: Integrate your existing, preferred communication tools with your incident management workflow, whether that is email, SMS, Slack, or an existing conference bridge from WebEx, GoToMeeting or Skype. Trying to impose a new communication tool on the team will often disrupt the established workflow and incur resistance from staff. The better approach is to integrate your existing communication tools into your incident management solution.
  • HAVE CONTEXT: Include the right contextual information to address business-critical issues in real time. Providing rich contextual information about the incident, along with a brief message to responders detailing why they are needed, enables responders to prepare and be aligned immediately.
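
To make the three points above concrete, here is a minimal Python sketch of a request that recruits extra responders, points them at an existing channel, and carries context. Everything in it (ResponderRequest, build_notifications, the field names) is hypothetical and chosen for illustration; it is not PagerDuty’s API and does not describe how Response Mobilizer or Response Bridge are implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResponderRequest:
    """Hypothetical request asking extra responders to join an incident."""
    incident_id: str
    responders: List[str]   # people to recruit beyond the original alert
    message: str            # brief note on why they are needed
    bridge: str             # existing channel or conference bridge to join
    context: Dict[str, str] = field(default_factory=dict)

    def render(self, responder: str) -> str:
        """Build the notification text for a single responder."""
        details = "; ".join(
            f"{key}: {value}" for key, value in sorted(self.context.items())
        )
        return (
            f"{responder}: you are needed on incident {self.incident_id}. "
            f"{self.message} Join: {self.bridge}. Context: {details}"
        )


def build_notifications(request: ResponderRequest) -> Dict[str, str]:
    """Return one notification per recruited responder, keyed by name."""
    return {name: request.render(name) for name in request.responders}
```

The point of the sketch is that the reason for the request, the place to meet, and the incident context travel together in one message, so a recruited responder does not have to ask which channel to use or why they were called.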

If you’re seeking to build a better-coordinated response culture for incident management within your organization, take advantage of the new features to make the most of the collaboration and coordination resources available to you, without forcing your team to overhaul its communication workflow.

The post A Coordinated Response Culture for Incident Management appeared first on PagerDuty.

PagerDuty’s operations performance platform helps companies increase reliability. By connecting people, systems and data in a single view, PagerDuty delivers visibility and actionable intelligence across global operations for effective incident resolution management. PagerDuty has over 100 platform partners, and is trusted by Fortune 500 companies and startups alike, including Microsoft, National Instruments, Electronic Arts, Adobe, Rackspace, Etsy, Square and GitHub.