Outages
Overview
Use this document to learn the procedures and SLAs for Party Bus outages.
Planned Outages
Planned outages are primarily infrastructure upgrades or patches and are usually managed by Party Bus Operations, but the MDO and CollabTools Team can also schedule them.
Communication Process
Party Bus announces any planned update that could cause an outage or downtime for customers at least 24 hours in advance. These announcements are sent through Mattermost bot notifications.
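The 24-hour notice rule above can be expressed as a simple time check. The following is a minimal sketch, not Party Bus tooling; the function names and the `MIN_NOTICE` constant are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta

# Minimum advance notice for planned outages, taken from the policy text.
MIN_NOTICE = timedelta(hours=24)


def latest_announcement_time(outage_start: datetime) -> datetime:
    """Return the latest moment an announcement can go out and still meet the rule."""
    return outage_start - MIN_NOTICE


def has_sufficient_notice(announced_at: datetime, outage_start: datetime) -> bool:
    """Return True if the announcement precedes the outage by at least 24 hours."""
    return outage_start - announced_at >= MIN_NOTICE
```

For example, an outage starting at 09:00 on a given day would need to be announced by 09:00 the previous day to satisfy this check.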
Level of Effort Classification
For planning purposes, Party Bus groups the total level of effort (LOE) for a planned outage into the following bins. These LOE values are not outage durations; expected downtime is stated in the SLAs.
- Low: 30 minutes
- Medium: 60 minutes
- High: 2 business hours
- Extra High (rare): 4 business hours
Service Level Agreements
Party Bus aims to minimize downtime and meet the following SLAs for planned outages.
Maximum expected downtime during planned outages per LOE:
| LOE | Maximum Expected Downtime |
|---|---|
| Low | < 15 minutes |
| Medium | 16-45 minutes |
| High | 46 minutes - 1.5 business hours |
| Extra High (very rare) | 1.5+ business hours |
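To make the relationship between the LOE bins and the downtime SLAs concrete, the two can be combined into one lookup table. This is an illustrative sketch only: the names `LoeBin`, `LOE_BINS`, and `within_planned_sla` are hypothetical, a business hour is assumed to be 60 minutes, and each upper bound is treated as inclusive (the table states "< 15 minutes" for Low, so the exact boundary handling is an assumption).

```python
from typing import NamedTuple, Optional


class LoeBin(NamedTuple):
    planned_effort_minutes: int          # planning bin from the LOE classification
    max_downtime_minutes: Optional[int]  # SLA upper bound; None means no stated bound


# Values copied from the LOE list and the SLA table; hours converted to minutes.
LOE_BINS = {
    "Low": LoeBin(30, 15),
    "Medium": LoeBin(60, 45),
    "High": LoeBin(120, 90),
    "Extra High": LoeBin(240, None),
}


def within_planned_sla(loe: str, downtime_minutes: float) -> bool:
    """Return True if the observed downtime is within the stated maximum for the LOE."""
    limit = LOE_BINS[loe].max_downtime_minutes
    return limit is None or downtime_minutes <= limit
```

Keeping planned effort and maximum downtime as separate fields reflects the point that an LOE is not the same as an outage duration.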
Request to Have an Outage Rescheduled Due to Mission Impact
Customers can request that a planned outage be rescheduled in either of two ways:
- Submit a request to reschedule a planned outage with the help desk
- Communicate your request in the Party Bus Value Stream Support Mattermost channel
Unplanned Outages
Upon learning of an unplanned outage, the Party Bus team will immediately triage the event. Note that working hours are normally limited to those listed in Party Bus's Terms and Conditions, unless otherwise coordinated directly with a customer.
Communication Process
Party Bus uses the following communication plan for unplanned outages.
| Outage Type | Primary | Alternate | Contingency |
|---|---|---|---|
| Pipeline Outages | GitLab | Mattermost | Email via Odoo |
| All Other Outages | Mattermost | Email via Odoo | TBD |
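The communication plan above is an ordered fallback: use the primary channel if it is reachable, otherwise the alternate, then the contingency. A minimal sketch of that selection logic follows. The dictionary keys, the `pick_channel` function, and the idea of passing in a set of currently available channels are all assumptions made for illustration; the "TBD" contingency for other outages is left out rather than guessed.

```python
from typing import Optional, Set

# Channels in priority order (primary, alternate, contingency), from the table.
COMMUNICATION_PLAN = {
    "pipeline": ["GitLab", "Mattermost", "Email via Odoo"],
    "other": ["Mattermost", "Email via Odoo"],  # contingency is TBD in the table
}


def pick_channel(outage_type: str, available: Set[str]) -> Optional[str]:
    """Return the highest-priority channel that is currently available, if any."""
    for channel in COMMUNICATION_PLAN[outage_type]:
        if channel in available:
            return channel
    return None  # no listed channel is reachable
```

For instance, during a pipeline outage where GitLab itself is down, this sketch would fall back to Mattermost.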
Outage Severity Determination and Service Level Agreements
Party Bus's DRP determines the severity of the unplanned outage.
| Severity | Response Time SLAs |
|---|---|
| Low | < 2 business hours |
| Medium | 2 - 4 business hours |
| High | 4 - 8 business hours (requires AAR) |
| Critical | 8+ business hours (1 day, requires AAR) |
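The severity table can also be read as a lookup from severity to a response-time window and an AAR requirement. The sketch below encodes only what the table states. The names `SeveritySla`, `SEVERITY_SLAS`, and `requires_aar` are hypothetical, and how the DRP assigns a severity is outside the scope of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeveritySla:
    response_window: str  # response time SLA, as written in the table
    requires_aar: bool    # whether the table marks the severity as requiring an AAR


# Values copied from the severity table above.
SEVERITY_SLAS = {
    "Low": SeveritySla("< 2 business hours", False),
    "Medium": SeveritySla("2 - 4 business hours", False),
    "High": SeveritySla("4 - 8 business hours", True),
    "Critical": SeveritySla("8+ business hours", True),
}


def requires_aar(severity: str) -> bool:
    """Return True if the table marks this severity as requiring an AAR."""
    return SEVERITY_SLAS[severity].requires_aar
```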
Resolution Process for Unplanned Outages
The DRP provides additional information about this process.
For unplanned outages, Party Bus will publish an AAR following the event (pending sensitivity and classification) that details the following:
- An overview of what happened
- How it was solved, including technical details
- What steps Party Bus is taking to prevent a similar outage from happening again
After an outage is resolved, customers can request the AAR through a link provided in a Mattermost bot notification. Archived AARs are available on the Party Bus internal IL4 Confluence site.
Related Content/References
Submit a Help Request