Red Hat's Continuous Kernel Integration (CKI) project provides CI-as-a-service to Red Hat kernel developers, kernel maintainers, and QE engineers.
These customers' perception of CKI's service quality is heavily influenced by how incidents (unplanned service interruptions or decreases in service quality) are handled.
In this talk, we will present how the CKI project detects, tracks, fixes and prevents incidents.
We will discuss:
- the logging, monitoring and alerting infrastructure deployed by CKI
- the various incident workflows the CKI project tried to use in the past, and the incident workflow it uses at the moment
- the social aspects that need to be considered when implementing such a workflow
Attendees will gain an understanding of the key factors that contribute to successful incident handling. The CKI project's approach to incident management can serve as a blueprint for other small service teams.