Learning from mistakes


Everyone can make a mistake once; that’s totally fine. Repeating mistakes is unacceptable. Correction of Errors (COE) is the name Amazon gives its mechanism for correcting errors. Microsoft uses the term root cause analysis (RCA) for the same concept. This mechanism originated with service outages but can be applied to any “error”, like missing your son’s kindergarten graduation or getting fired.

Why COEs?

Sections and questions

What was the impact?

What was the timeline?

How did you notice it?

How did you mitigate it?

What was the root cause?

5 Whys

What did you learn?

What are you going to do so it doesn’t happen again?


Is it worth the cost?

How to ensure they are happening?

Per company differences?

Should I use this beyond service outages?