Squishy Human Fails At Task

6 Oct 2017

People don't fail. Systems fail.

Actually, that's not quite right. People fail all the time. They fail so often and so consistently that it's not fair, or even useful, to call it failure. When I left my lunch in the fridge and cycled to work, I failed. When I didn't notice that one of the checkpoints had code R128 but the sticker said R129, I failed. When I didn't realise that we'd double-booked a crewman and we ended up running the event one person short, I failed.

But acknowledging that failure is inevitable doesn't mean accepting it. Fire Hazard maintains a relentlessly high standard - and we really care about everything we do, so every time we drop the ball, even in minor ways that players might never notice, it hurts.

As a policy, saying "I will make fewer mistakes", or, worse, "You! Make fewer mistakes!" doesn't work. We're all working really hard, and promising to work harder or check more carefully means nothing. That dial's already at 11, and has nowhere further to go.

Humans are wonderful, creative, adaptable creatures. But even the best are also squishy, forgetful, and prone to mistakes. We have to deal with that.

That's where the robots come in. Fire Hazard makes a three-pronged attack on human frailty:

1) Automate

Save the humans for the jobs humans are actually good at. Automate the rest. Our internal systems do as much of the heavy lifting as possible, from crew scheduling to generating stickers, player IDs, and even missions. Any task that can be automated should be; relentless, dead-eyed metal monsters never forget, never tire, never get distracted.
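Our internal tools aren't public, so here's a minimal sketch of the flavour, using the double-booked crewman from earlier as the example. Every name in it (Booking, find_double_bookings, the crew and event names) is invented for illustration - this is not our real scheduler:

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Booking:
    """One crew member assigned to one event slot."""
    crew_member: str
    event: str
    start: datetime
    end: datetime


def overlaps(a: Booking, b: Booking) -> bool:
    """Two bookings clash if their time windows intersect."""
    return a.start < b.end and b.start < a.end


def find_double_bookings(bookings: list[Booking]) -> list[tuple[Booking, Booking]]:
    """Return every pair of clashing bookings for the same person.

    A tired human scanning a spreadsheet misses these; a script never does.
    """
    by_person: defaultdict[str, list[Booking]] = defaultdict(list)
    for booking in bookings:
        by_person[booking.crew_member].append(booking)

    clashes = []
    for person_bookings in by_person.values():
        for i, earlier in enumerate(person_bookings):
            for later in person_bookings[i + 1:]:
                if overlaps(earlier, later):
                    clashes.append((earlier, later))
    return clashes


if __name__ == "__main__":
    day = datetime(2017, 10, 6)
    bookings = [
        Booking("Alex", "Game A", day.replace(hour=10), day.replace(hour=14)),
        Booking("Alex", "Game B", day.replace(hour=13), day.replace(hour=17)),
    ]
    for a, b in find_double_bookings(bookings):
        print(f"CLASH: {a.crew_member} is booked on {a.event} and {b.event}")
```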

2) Systematise

If we can't automate a task away completely, we build a system around it, like a powered exoskeleton for the human to wear. It could be as simple as a checklist on the door, or as complex as the auto-generated warning messages in our stage management system. A system's job is not only to prevent failures, but also to catch them before they propagate. Planes don't crash when something goes wrong; planes crash when three things go wrong.
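As a concrete (and entirely hypothetical) sketch of one such catch-layer: a pre-show check that compares the checkpoint codes in the plan against the codes actually on the stickers. It would have flagged the R128/R129 mix-up from earlier. The function name and data shapes are assumptions, not a description of our real stage management system:

```python
def preflight_warnings(planned: dict[str, str],
                       stickered: dict[str, str]) -> list[str]:
    """Compare the checkpoint codes in the plan against the codes
    actually recorded on the stickers (both map location -> code),
    and report every mismatch or gap.
    """
    warnings = []
    for location, expected in planned.items():
        actual = stickered.get(location)
        if actual is None:
            warnings.append(f"{location}: no sticker recorded (expected {expected})")
        elif actual != expected:
            warnings.append(f"{location}: sticker says {actual}, plan says {expected}")
    return warnings


# The human still places the stickers; the exoskeleton just refuses
# to let a mismatch through silently. Run before doors open:
for warning in preflight_warnings({"stairwell": "R128"},
                                  {"stairwell": "R129"}):
    print("WARNING:", warning)
```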

3) Investigate

"We don't tolerate screwups" is a very different statement to "we don't tolerate people who screw up". Every incident or near-miss triggers an investigation. It could be as simple as a two-minute cross-desk conversation ("Wait, how did that just happen?") or as involved as a whole-team-hour, but one way or another we need to know how the ball was dropped, and what robots or systems will stop it from getting dropped again.

The crucial bit is that we don't care who dropped the ball. And not in a touchy-feely "it's all OK, we're all friends" way, but in a hard-edged, rational, "that is not relevant information" way. Squishy Human Fails At Task is not news. Bad System Fails To Protect Us From The Consequences, that's news.

In the short term, this process means that seemingly inconsequential screwups slow us down ("Sure, the briefing was a bit late. It's out now, players are fine, why are we having a meeting about this?"). In the medium and long term, it's why we have a reputation for steadfast quality in an industry that doesn't - and why we don't have to remember to send the briefings ever again.

Because, now, steely-eyed machines send them for us.