
Nine Meals to Anarchy

Those of us who are well fed (…) ought not to forget that necessity makes the root of crime (…) the only barrier between us and anarchy is the last nine meals we’ve had.

Alfred Henry Lewis, 1906


The 9th meal: The Boeing Drama

Fearing that American Airlines would switch its fleet to Airbus, Boeing rushed out a 5-year manufacturing plan for its new aircraft, the 737 Max, in 2011. Because the “normal” certification process could take up to 9 years, Boeing opted for an easier type of certification: the company claimed its new aircraft was not really new, but almost a copy of the original 737.

However, the 737 Max used bigger, more fuel-efficient engines, which required a longer and heavier airframe and a wider wingspan. Because visible design changes could have jeopardised that certification, the engines were mounted further forward on the wings, shifting the centre of gravity and changing how the plane handled. To correct the problem, Boeing developed a software patch, the Maneuvering Characteristics Augmentation System (MCAS), which read the plane’s angle-of-attack sensors (relying on just one of the two at a time) and pushed the nose down when the angle of attack got too high. With the issue “solved”, the Max flew for the first time in January 2016. MCAS was not disclosed to pilots or airlines, and it was not mentioned in the flight manual.

In October 2018, faulty sensor readings on Lion Air flight 610, which had taken off from Jakarta, repeatedly triggered MCAS: the airplane dove sharply 22 times during its short 13-minute flight before crashing. Less than half a year later, Ethiopian Airlines flight 302 crashed 6 minutes after takeoff. A total of 346 people died. Investigations after the crashes revealed that engineers had warned about this danger, and some even admitted to being “hesitant about putting my family on a Boeing airplane”.

Crashes Everywhere 💥

In today’s world, it is common to be pushed to get things done at light speed. Planning and extensive testing are often deprioritized, which increases the likelihood of preventable mistakes. Even with the best intentions, pressure can take its toll. In 1983, sunlight reflecting off clouds fooled a Soviet early-warning satellite into reporting a US nuclear attack, which almost started a nuclear war. In 2021, the 5-hour outage of all Facebook apps was caused by a command run by mistake that disconnected all of the company’s data centres, costing it an estimated $65M, a 4.8% drop in its stock price, and $6B of Zuckerberg’s fortune. And in 2020, COVID-19 became the textbook example of how unexpected events can shake our systems to the point of anarchy (remember not finding toilet paper at the supermarket?).

While some unexpected events will always lead to product failures, there are several things product managers can do to reduce their impact. What, you ask?

How do you create a failure-proof product?

Product Failures

(1) Analyze the Unhappy Path

When your product works as expected, everyone is happy - that’s your happy path. However, as everyone knows, this rarely happens. The bumps in the road, the crashes, the need to reset by unplugging a device… that’s normal - that’s your unhappy path.

Analyzing the unhappy path starts with listing every step a customer goes through (their journey) and all the background processes of your product. For each step, brainstorm every failure that might occur. You can enrich the brainstorm by observing how customers use your product (or a prototype), especially noting the “weird” things people do that you did not expect. Finally, evaluate the impact of each problem based on its likelihood and consequences.
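The last step can be made concrete with a simple risk matrix. The minimal Python sketch below assumes each failure is scored from 1 to 5 for likelihood and for consequence, and that the two scores are multiplied into a single risk value used for ranking. The scale, the `FailureMode` and `rank_failures` names, and the smartwatch failures are hypothetical examples, not a prescribed method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailureMode:
    step: str          # journey step or background process
    failure: str       # what can go wrong at this step
    likelihood: int    # assumed scale: 1 (rare) to 5 (almost certain)
    consequence: int   # assumed scale: 1 (minor) to 5 (catastrophic)

    @property
    def risk(self) -> int:
        # Risk-matrix score: likelihood multiplied by consequence
        return self.likelihood * self.consequence


def rank_failures(modes: List[FailureMode]) -> List[FailureMode]:
    """Return failure modes ordered from highest to lowest risk."""
    return sorted(modes, key=lambda m: m.risk, reverse=True)


# Hypothetical unhappy-path brainstorm for a smartwatch
modes = [
    FailureMode("Pairing", "Bluetooth connection drops", likelihood=4, consequence=2),
    FailureMode("Swimming", "Water gets into the case", likelihood=2, consequence=5),
    FailureMode("Firmware update", "Battery dies mid-update", likelihood=2, consequence=4),
    FailureMode("Daily use", "Display freezes and needs a restart", likelihood=3, consequence=2),
]

for m in rank_failures(modes):
    print(f"{m.risk:>2}  {m.step}: {m.failure}")
```

Multiplying the two scores puts a rare but serious failure (water in the case) ahead of frequent but minor ones. Any scale works as long as the team can score it consistently.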

(2) Define what to do

Based on the above analysis, define how to handle each issue:

  • Eliminate - Problems with high likelihood and/or high impact might need to be addressed immediately. This might mean you need to redesign or redevelop parts of the product.
  • Protect - When you cannot solve the problem, you might want to protect the core:
    • Increase robustness - Make your product adapt to variations: while life is better at 25-30°C (personal preference 🙂), humans regulate their temperature to adapt to colder and warmer environments.
    • Add redundancy - If a plane engine fails, there is another. We have two kidneys. Databases are commonly backed up.
    • Add barriers - Our internal organs are not exposed - they are covered by skin and protected by an immune system.
  • Monitor - Track the performance of the product and alert the user when something is not right. This buys extra time to react properly.
  • Announce - Some features will be out of scope. And that’s ok. However, a customer might still take your non-waterproof watch into the water. Warn the user explicitly not to do it 😊
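Redundancy and monitoring work best together. The minimal Python sketch below assumes a product with two redundant sensors: it only trusts a reading when both sensors agree within a tolerance, and when they disagree it refuses to act automatically and alerts a person instead. The `Monitor` class, the `read_redundant` function, the tolerance and the averaging rule are hypothetical illustrations, not how any particular system (MCAS included) is built.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Monitor:
    """Collects alerts so that a person (user, pilot, on-call engineer) can react."""
    alerts: List[str] = field(default_factory=list)

    def alert(self, message: str) -> None:
        self.alerts.append(message)


def read_redundant(primary: float, backup: float,
                   tolerance: float, monitor: Monitor) -> Optional[float]:
    """Cross-check two redundant sensors before trusting either of them.

    Returns the average of both readings when they agree within `tolerance`.
    Otherwise returns None, meaning "do not act automatically", and raises an alert.
    """
    if abs(primary - backup) <= tolerance:
        return (primary + backup) / 2
    monitor.alert(f"Sensors disagree ({primary} vs {backup}): automatic action disabled")
    return None


monitor = Monitor()
print(read_redundant(10.0, 10.5, tolerance=1.0, monitor=monitor))  # 10.25
print(read_redundant(10.0, 25.0, tolerance=1.0, monitor=monitor))  # None
print(monitor.alerts)
```

The key design choice is what happens on disagreement: rather than guessing which sensor is right, the sketch falls back to doing nothing automatically and hands the decision to a human.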

(3) Prepare for failure

Despite all the work, failures still happen, and you should be ready for them.

  • Start emergency procedures - If there is a serious issue, you might want to put your product into a coma. Define which areas are critical, and lock them down or cut off access to them. A backup system also ensures the product can regain its normal functions after the storm.
  • Implement post-traumatic growth mechanisms - Traumas can be handled in two ways - stress or growth. Growth in products can be implemented through a series of mechanisms that detect the “weak” and “strong” components of the system (the ones that led to failure and the ones that maintained operations or excelled). Kill the weak links and strengthen the good ones (if done automatically by the system, even better).
  • Empower customer support and on-call teams - Ritz-Carlton employees are empowered to spend up to $2000 to solve customer problems without requesting permission. This demonstrates trust, speeds problem solving, and delights customers.
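In software, one common pattern that combines the “coma” idea with isolating weak links is the circuit breaker. The minimal Python sketch below is a swapped-in illustration, not something the tips above prescribe: after a configurable number of consecutive failures it stops calling the failing component and serves a fallback (for example, a backup), then, after a cool-down, lets the next call through as a trial to check whether the component has recovered. All class, function and parameter names are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Stops calling a failing component and retries it after a cool-down.

    closed    -> calls pass through; consecutive failures are counted
    open      -> calls are rejected immediately and the fallback is used
    half-open -> after `reset_timeout` seconds, the next call is a trial
    """

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # keep the failing part isolated
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial, or too many failures in a row, trips the breaker
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # Success: the component works, so close the breaker again
        self.failures = 0
        self.opened_at = None
        return result


now = [0.0]  # fake clock for the example
breaker = CircuitBreaker(max_failures=2, reset_timeout=10.0, clock=lambda: now[0])


def flaky_service() -> str:
    raise ConnectionError("payment provider unreachable")


def backup_answer() -> str:
    return "served from backup"


for _ in range(2):
    print(breaker.call(flaky_service, backup_answer))  # served from backup
print(breaker.state)  # open
now[0] = 11.0
print(breaker.state)  # half-open
print(breaker.call(lambda: "live answer", backup_answer))  # live answer
print(breaker.state)  # closed
```

This version is single-threaded; a production breaker would also need locking and would usually limit how many trial calls run while half-open.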

Final thoughts 

All products are different, and the tips above might apply perfectly to some products and fail for others. It is important to evaluate which strategies make sense and adapt them to the product in question.


Catarina Pinto @catarinappinto