Effective preparation is required to drive operational excellence. Business success is enabled by shared goals and understanding across the business, development, and operations. Common standards simplify workload design and management, enabling operational success. Design workloads with mechanisms to monitor and gain insight into application, platform, and infrastructure components, as well as customer experience and behavior.
Create mechanisms to validate that workloads, or changes, are ready to be moved into production and supported by operations. Operational readiness is validated through checklists to ensure a workload meets defined standards and that required procedures are adequately captured in runbooks and playbooks. Validate that there are sufficient trained personnel to effectively support the workload. Prior to transition, test responses to operational events and failures. Practice responses in supported environments through failure injection and game day events.
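A readiness checklist of the kind described above can be kept as data and evaluated automatically. The following is a minimal sketch, not a prescribed mechanism: the `ChecklistItem` type, the `evaluate_readiness` function, and the item wording are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One readiness criterion. Names and wording are illustrative assumptions."""
    name: str
    passed: bool
    required: bool = True


def evaluate_readiness(items):
    """Return (ready, blockers).

    ready is True only when every required item has passed;
    blockers lists the names of required items that have not.
    """
    blockers = [item.name for item in items if item.required and not item.passed]
    return (not blockers, blockers)


if __name__ == "__main__":
    checklist = [
        ChecklistItem("runbooks captured for routine procedures", True),
        ChecklistItem("playbooks captured for known failure modes", True),
        ChecklistItem("trained personnel available for support", False),
        ChecklistItem("game day completed", False, required=False),
    ]
    ready, blockers = evaluate_readiness(checklist)
    print("ready:", ready)
    for name in blockers:
        print("blocker:", name)
```

Separating required from optional items lets a team record aspirational practices without letting them block a transition to production.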
AWS enables operations as code in the cloud and the ability to safely experiment, develop operations procedures, and practice failure. Using AWS CloudFormation enables you to have consistent, templated, sandbox development, test, and production environments with increasing levels of operations control. AWS enables visibility into your workloads at all layers through various log collection and monitoring features. Data on use of resources, application programming interfaces (APIs), and network flow logs can be collected using Amazon CloudWatch, AWS CloudTrail, and VPC Flow Logs. You can use the collectd plugin, or the CloudWatch Logs agent, to aggregate information about the operating system into CloudWatch.
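One way to keep environments consistent is to generate each environment's CloudFormation template from a single definition. The sketch below builds a small template as a Python dictionary and prints it as JSON. The `build_template` function, the log group name, and the retention values are assumptions made for illustration; only the template keys and the `AWS::Logs::LogGroup` resource type come from CloudFormation itself.

```python
import json

# Hypothetical per-environment settings; the retention periods are assumptions.
LOG_RETENTION_DAYS = {"sandbox": 7, "test": 30, "production": 90}


def build_template(environment):
    """Build a CloudFormation template (as a dict) for one environment.

    Every environment gets the same resources; only the settings differ.
    """
    if environment not in LOG_RETENTION_DAYS:
        raise ValueError(f"unknown environment: {environment}")
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"Log group for the {environment} environment",
        "Resources": {
            "WorkloadLogGroup": {
                "Type": "AWS::Logs::LogGroup",
                "Properties": {
                    # Illustrative name, not taken from the source text.
                    "LogGroupName": f"/example-workload/{environment}",
                    "RetentionInDays": LOG_RETENTION_DAYS[environment],
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_template("test"), indent=2))
```

Because all environments share one definition, a change reviewed in sandbox has the same shape when it reaches production.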
OPS 1: How do you determine what your priorities are?
Everyone needs to understand their part in enabling business success. Have shared goals in order to set priorities for resources. This will maximize the benefits of your efforts.
OPS 2: How do you design your workload so that you can understand its state?
Design your workload so that it provides the information necessary for you to understand its internal state (for example, metrics, logs, and traces) across all components. This enables you to provide effective responses when appropriate.
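As an illustration of emitting that information, a component can write structured log lines that carry a trace identifier and metric values alongside the message. This is a minimal sketch using only the Python standard library. The `JsonFormatter` and `make_logger` names and the `trace_id`, `metric_name`, and `metric_value` fields are assumptions, not a required schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "timestamp": record.created,
            "level": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        }
        # Optional fields supplied through `extra=`; field names are assumptions.
        for key in ("trace_id", "metric_name", "metric_value"):
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        return json.dumps(payload)


def make_logger(component):
    """Return a logger for one component that writes JSON lines to stderr."""
    logger = logging.getLogger(component)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger("orders")
    log.info("request started", extra={"trace_id": "example-trace-1"})
    log.info(
        "request finished",
        extra={
            "trace_id": "example-trace-1",
            "metric_name": "latency_ms",
            "metric_value": 12.5,
        },
    )
```

Sharing one trace identifier across log lines lets a log aggregator correlate the events of a single request across components.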
OPS 3: How do you reduce defects, ease remediation, and improve flow into production?
Adopt approaches that improve flow of changes into production, that enable refactoring, fast feedback on quality, and bug fixing. These accelerate beneficial changes entering production, limit issues deployed, and enable rapid identification and remediation of issues introduced through deployment activities.
OPS 4: How do you mitigate deployment risks?
Adopt approaches that provide fast feedback on quality and enable rapid recovery from changes that do not have desired outcomes. Using these practices mitigates the impact of issues introduced through the deployment of changes.
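Fast feedback and rapid recovery can be combined in an automated check that compares a health signal before and after a change and rolls back when it degrades. The sketch below is one possible shape for such a check, not a specific AWS feature. The function names, the choice of error rate as the signal, and the default tolerance are all assumptions.

```python
def should_roll_back(baseline_error_rate, observed_error_rate, tolerance=0.01):
    """Return True when the error rate rose by more than `tolerance`."""
    return observed_error_rate - baseline_error_rate > tolerance


def deploy_with_rollback(deploy, roll_back, measure_error_rate, tolerance=0.01):
    """Deploy a change, then roll it back if the error rate degrades.

    `deploy`, `roll_back`, and `measure_error_rate` are caller-supplied
    callables (hypothetical hooks into your own tooling).
    """
    baseline = measure_error_rate()
    deploy()
    observed = measure_error_rate()
    if should_roll_back(baseline, observed, tolerance):
        roll_back()
        return "rolled_back"
    return "deployed"


if __name__ == "__main__":
    # Simulated measurements: 1% errors before the change, 5% after it.
    samples = iter([0.01, 0.05])
    outcome = deploy_with_rollback(
        deploy=lambda: print("deploying"),
        roll_back=lambda: print("rolling back"),
        measure_error_rate=lambda: next(samples),
    )
    print("outcome:", outcome)
```

Passing the deployment steps in as callables keeps the decision logic testable without touching a real environment.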
OPS 5: How do you know that you are ready to support a workload?
Evaluate the operational readiness of your workload, processes and procedures, and personnel to understand the operational risks related to your workload.