There are six design principles for operational excellence in the cloud:
✅ 1. Perform operations as code: In the cloud, you can apply the same engineering discipline that you use for application code to your entire environment. You can define your entire workload (applications, infrastructure) as code and update it with code. You can implement your operations procedures as code and automate their execution by triggering them in response to events. By performing operations as code, you limit human error and enable consistent responses to events.
✅ 2. Annotate documentation: In an on-premises environment, documentation is created by hand, used by people, and hard to keep in sync with the pace of change. In the cloud, you can automate the creation of annotated documentation after every build (or automatically annotate hand-crafted documentation). Annotated documentation can be used by people and systems. Use annotations as an input to your operations code.
✅ 3. Make frequent, small, reversible changes: Design workloads to allow components to be updated regularly. Make changes in small increments that can be reversed if they fail (without affecting customers when possible).
✅ 4. Refine operations procedures frequently: As you use operations procedures, look for opportunities to improve them. Set up regular game days to review and validate that all procedures are effective and that teams are familiar with them.
✅ 5. Anticipate failure: Perform “pre-mortem” exercises to identify potential sources of failure so that they can be removed or mitigated. Test your failure scenarios and validate your understanding of their impact. Test your response procedures to ensure that they are effective and that teams are familiar with their execution. Set up regular game days to test workloads and team responses to simulated events.
✅ 6. Learn from all operational failures: Drive improvement through lessons learned from all operational events and failures. Share what is learned across teams and through the entire organization.
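
To make principles 1 and 3 more concrete, here is a minimal Python sketch, not tied to any particular cloud service. It shows an operations procedure written as a function and triggered by an event, plus a small change that is kept only if a health check passes. All names in it (`Event`, `Runbooks`, `rotate_logs`, `apply_change`, the `disk_full` event kind) are hypothetical and chosen only for illustration; a real setup would use the event and automation services of your cloud provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """Hypothetical event shape; real cloud events carry far more detail."""
    kind: str
    resource: str


@dataclass
class Runbooks:
    """Maps event kinds to operations procedures written as code."""
    handlers: Dict[str, Callable[[Event], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def on(self, kind: str):
        """Decorator that registers a procedure for one event kind."""
        def register(fn: Callable[[Event], str]):
            self.handlers[kind] = fn
            return fn
        return register

    def dispatch(self, event: Event) -> str:
        """Run the matching procedure and record the outcome."""
        handler = self.handlers.get(event.kind)
        if handler is None:
            result = f"no runbook for {event.kind}"
        else:
            result = handler(event)
        self.log.append(result)
        return result


runbooks = Runbooks()


@runbooks.on("disk_full")
def rotate_logs(event: Event) -> str:
    # Placeholder action: a real procedure would call provider APIs here.
    return f"rotated logs on {event.resource}"


def apply_change(config: dict, key: str, value,
                 healthy: Callable[[dict], bool]) -> dict:
    """Apply one small change; keep the old config if the health check fails."""
    candidate = {**config, key: value}
    return candidate if healthy(candidate) else config
```

Because each procedure is an ordinary function, it can be reviewed, versioned, and exercised during a game day like any other code, and the `log` list gives a simple record of what ran in response to which event.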