
PagerDuty Blog



Latest Blogs from PagerDuty Blog
Zayna Shahzad is a Software Engineer at PagerDuty on the Mobile Team. She works on the Android and iOS PagerDuty apps offered through the App Store and Play Store. In this post, she shares her experience shadowing our Customer Support team. Find her on GitHub and...
Here at PagerDuty, our engineering teams are committed to Agile development principles that favor rapid iteration over lengthy periods of design, and favor direct communication between team members over reams of written specifications. There are countless articles that dictate how Agil...
In a recent blog post, Managing a Tier Zero Service Doesn’t Have to Be Scary, PagerDuty’s SVP of Product Development Tim Armandpour discussed several important best practices that minimize chaos during incident resolution. According to Tim, in today’s always-on world, guaranteeing reli...
Here at PagerDuty, we’re pretty focused on being involved in the DevOps community by providing perspectives on where we’ve been, where we are and where we’re headed as a community — and of course hearing from the community as well! And, if you follow this blog...
Do you have what it takes to win the PagerDuty Innovation and Transformation awards? PagerDuty is excited to recognize the achievements of the most successful organizations with these awards at PagerDuty Summit 2017. Presented each year at PagerDuty Summit, these awards honor both the ...
I recently had the privilege of spending a full day with a small group of our customers. The attendees were leaders in their development and IT operations organizations and spanned a wide variety of industries, including technology, media, finance, retail, healthcare, and more. Every s...
The PagerDuty and HipChat extension empowers responders to collaborate to resolve issues directly from their chat window. Several weeks ago, we released our updated HipChat extension. Our team is excited to have built the v2 HipChat extension from scratch to support great new functional...
Here at PagerDuty, we’re committed to helping our customers get as much out of the platform as possible. We’ve long shared best practices and knowledge via resources such as our Support Knowledge Base. But over in Customer Support and Success, we’ve been hearing your frequently...
“Incident lifecycle management? If we manage to stay alive from one incident to the next, it’s a good day. On a bad day, it’s all panic mode.” Unfortunately, that’s the reality of incident lifecycle management for far too many software and IT companies — b...
While a major incident is ongoing, all of your focus is on restoring service: watch the smoke, figure out where the fire is, and put it out. But after service has been restored — the incident is resolved, the adrenaline has drained, and it’s peace-time...
Today, we’re excited to announce a suite of new functionality to power even faster resolution and accelerate learning from major business-impacting incidents with the definitive Incident Resolution Lifecycle. With this release, we help you to differentiate major incidents from other da...
Your high school history teacher no doubt delivered to you some variation on George Santayana’s famous remark that “those who cannot remember the past are condemned to repeat it.” I’m pretty sure Santayana wasn’t thinking about incident management when he...
We are very pleased to announce that PagerDuty and Atlassian are continuing to collaborate on improving best practices around the incident resolution lifecycle and making the world of unexpected chaos a little less frantic. In April we announced the best-in-class PagerDuty HipChat Extensi...
Get face to face. One of the great things about being a platform is that your users can take your product in a different direction than you might. We’ve had the ability to integrate your preferred conference call tool into the incident...
In today’s integrated digital economy, the IT infrastructures at most corporations can no longer exist in silos. The overwhelming benefit of integration is the rapid development of new ideas and solutions. The unfortunate downside is that increased integration and connectivity also pla...
Now that you’re well-equipped to fast-track your career and survive the high-growth startup stage, in this post I’ll share my advice on how to make time for professional development and lay the groundwork for reaching your career goals, something many...
In the run up to our latest release of capabilities for developers, I sat down with David Yang, a senior engineer here at PagerDuty who’s seen our internal architecture evolve from a single monolithic codebase to dozens of microservices. He’s the technical lead for our Incident Managem...
Code reviews are an important part of the modern software lifecycle. Unfortunately, a lot of cycles are burned and morale is damaged because there are few guidelines given to reviewers (and reviewees) on constructive feedback and effective written communication. Below are some tips for...
We’re excited to share that we’re open-sourcing the tool we use to gather and transform the metrics from our managed DNS providers. We use DNSmetrics to supply data in a standard format to our SRE team’s monitoring and alerting systems, and we hope it can...
For many of our customers, reducing alert noise is a difficult, yet rewarding task. Cleaning up your alerting means fewer late night pages and happier team members. But this task can feel a lot like yak shaving if you don’t have the proper tools. In...
Here at PagerDuty, we understand that being on-call can be a lot of work. In addition to triaging, assessing, and resolving incidents as they arise, you need to do one thing first — find out when you’re on-call! Oftentimes, users with complex on-call shifts find...
PagerDuty is excited to be working closely with Atlassian to deliver incident management solutions that support the developer community. With HipChat, Atlassian has created a powerful chat tool to enable teams to work more efficiently. HipChat allows teams to increase productivity thro...
The role of the software developer has changed tremendously since the craft of creating software began. In recent years, this change has accelerated dramatically, with the developer’s role expanding beyond creating and running code in local environments. In today’s world, the developer...
Designed For The Developer. The role of the software developer has been rapidly changing. As a developer, you already know that your involvement doesn’t end when you deploy a service to production. Now it extends into managing that service and being on-call for production issues...
We’ve heard you! We take your feedback seriously and have noted your desire for a more formal community among your brethren, who are joined together by involuntary wake-up calls at 3 a.m. Now presenting the PagerDuty “OnCallClub” to translate all of those...
The fear of failure can be a massive hurdle for many development and ops team members. This fear can be so overbearing that morale across the board drops significantly, hurting employee productivity and advancement. Having appropriate incident management and monitoring in place can, th...
Incident management is paramount to the success of any modern ITOps team. However, much like growing a business, scaling incident management can also trigger growing pains. As the landscape of devices, applications, and systems grows, each requiring monitoring, so too does the alert...
Last week, I shared my best practices for fast-tracking a career. This week I’m sharing my top pieces of advice for companies on the high-growth fast track. As the term suggests, companies at this stage are characterized by a rapid increase in regional and international sales, global...
If technical debt were like monetary debt, it would be hard to keep track of it unless you checked in manually. The only way many people find out their checking account is running out of funds is by logging in and checking the balance — or, worse, having a check bounce or a debit card ...
Incident response bottlenecks – you know they’re real and you know that your incident response system probably has a few, but they must be minimized as they hurt your on-call teams and your customers. Let’s take a look at some of the most critical bottlenecks and how....
Joe Sexton recently joined PagerDuty’s Executive Advisory Board. Because he is an experienced leader in scaling high-growth SaaS companies, we asked him to share his thoughts on how others can scale their careers in today’s workplace. These skills can apply to any field, technical or not...
It’s critical to have the right tools in place before a firefight happens. A lack of proper tooling makes it significantly more difficult to recognize, organize, fight, and resolve a major outage. This is especially true when teams are busy fighting rather than communicating to i...
Here at PagerDuty, we spend a lot of time thinking about how we can help the DevOps community and IT professionals succeed. We’re particularly interested in the “hows and whys” of evolving DevOps practices, how to deliver value to our practitioners, and how to better serve the communit...
The on-call engineer has a critical role to play in incident management. They can mean the difference between an incident turning critical or being managed and resolved quickly. Startups may not have many choices around who should be on call, but as the organization grows...
Avoiding Noise in Incident Management. Suppression. According to the thesaurus, this word is synonymous with terms like deletion, elimination, and annihilation. Yet within the context of incident management, suppression means something quite different. It’s not about getting rid of data...
Smart devices require smart monitoring. That’s not a platitude. It’s an imperative. In fact, the smarter the device, the smarter you need to be about monitoring it. As headlines have shown, unmonitored, unprotected smart devices may be a disaster (or a DDoS attack) just waiting to hap...
Thanks to the DevOps movement, we now understand why software delivery chains that consist of a series of silos are bad. They complicate communication between different teams, leading to delivery delays, backtracking, and bugs. When it comes to incident management, there is another ty...
According to a roundup by Gartner, the average cost of downtime for an enterprise is $5,600 per minute. While the data collected was from incredibly large companies, the cost of downtime for even small startups is no laughing matter. Let’s assume, for the sake of...
International Women’s Day is a global day celebrating the social, economic, cultural and political achievements of women. It’s about unity, celebration, reflection, advocacy and action. In my career, I have had the opportunity to work with some inspiring women who have helped sha...
A memo from our CEO, Jennifer Tejada. Today we celebrate International Women’s Day, which is more than a Hallmark-card holiday. It’s a celebration across the US and the tech industry, with a grassroots movement, “A Day Without Women,” which calls for women to go...