Don’t Call It a Glitch
Last Sunday, news outlets reported that a computer glitch in the air traffic control system had caused major slowdowns the day before at airports in New York and Washington, DC. While flight schedules had returned to normal, the Federal Aviation Administration (FAA) was still investigating as of Monday morning.
The use of the word glitch really infuriated me. To find out whether I was alone in this, I did what I’ve done a fair amount lately: I posted an update to Facebook. The update read as follows:
“Is it just because I’m a former IT person that I find the term ‘glitch’ so offensive? Apparently there was a ‘glitch’ in the air traffic control system that closed down some of the east coast airports today. ‘Glitch’ makes it sound like some kind of unavoidable thing that happened on its own. That’s not right. Somebody messed up. Sloppy code. Bad testing. Inadequate backup. Somehow it minimizes the incompetent part of it.
Am I the only one? Maybe I am!”
I wasn’t alone. And I got some really fascinating answers that helped me work through the why of it.
First of all, it’s pretty easy to see how the word glitch has been co-opted. Merriam-Webster, a traditional, older dictionary, defined the word in the way I expected: an unexpected and usually minor problem; especially a minor problem with a machine or device (such as a computer).
The more modern Dictionary.com and Urban Dictionary had more evolved, or co-opted, definitions:
Dictionary.com: A defect or malfunction in a machine or plan. Computers: any error, malfunction, or problem.
Urban Dictionary: An error in a structured system. It is usually applied to electrical and computer systems. A mistake, or bug, usually in a computer program or video game.
But I think the original definition still hangs in the air. It was minor, unavoidable, and inconsequential. Except it wasn’t, it wasn’t and it wasn’t.
Another old friend and former colleague, Gerry Meisler, got me to the core of my issue with one three-letter word: MIC. So much time had passed that I’d forgotten what the acronym stood for, but he reminded me it was Merchandise Information Control, a consultant-built merchandise operations management system that cost many millions of dollars and did not work. In today’s parlance, it was filled with glitches.
As many of you know, I was in the IT industry for many years, and tended to focus on two different areas: innovation and mess-cleaning. MIC was one of the messes I cleaned up at now-dead shoe retailer Morse Shoe (Fayva!). I thought I was there to implement a planning system (did it!) but realized I could not until we got the base of data cleaned up. And that meant getting MIC to work.
It took a small staff and me about six months to get the thing working, and it had nothing to do with glitches. It was all avoidable, there was nothing minor about it, and since the company had spent eight figures on it (this is going back more than 20 years, mind you), calling it inconsequential could have gotten you fired.
What was wrong?
- No integration or bridges built to existing systems
- No single in-house IT manager in charge
- Plausible deniability among users (“I wasn’t here when the design was done”), in-house IT staff (“the consultants never got us involved”), and the consultants (“We’ve been begging them to please just test the thing”)
- No full-on system or parallel testing
- No clear documentation of the as-is state so that the impact could be quantified
- No prioritization of new reports/screens to minimize culture shock
This wasn’t a glitch. It was a lack of leadership and buy-in, poor coding, awful change management and the cultural equivalent of an ostrich.
Today, we’d call it different things depending on our political persuasion. A glitch, a political problem…think of all the things you heard about the ACA. Yet it was morally pretty close to the same.
And when you really look at it, the dumbing down of computer problems to glitches is more than just irritating; it’s downright dangerous.
As partner Brian Kilcourse said this morning: “I find it incomprehensible that on the one hand, our society is talking about drones, robots, driverless cars, IoT, and all manner of automation without direct human hands-on-the-wheel control, and at the same time casually dismissing alarming ‘glitches,’ security breaches, and unprotected and sometimes failing infrastructure, with a shrug of shoulder.”
With regard to the specific issue at hand on Saturday, I got some additional valuable input. This has nothing to do with retail, and everything to do with the ostrich syndrome.
Early in my IT career I worked with a former air traffic controller who’d been summarily fired as part of Ronald Reagan’s reaction to the 1981 strike by the controllers’ union, PATCO. He was rebuilding his life with a determination one rarely sees today: putting himself through programming school while supporting his family as a wedding photographer and spending weekends working in the National Guard. He remains an inspiration to me. We found each other on Facebook about six months ago, and yesterday he said some really important things that go right to Brian’s point above.
“Government employees [are] running a very complicated ATC system on a marginal budget. Why are you surprised? And yes sloppy code and back-dated equipment probably had something to do with it…
I can’t believe they still fail to address the bad management issues and lack of funding for reasonable maintenance. The problem is that the Federal Government is more prone to being reactive rather than proactive. In fact, the actual development of the ATC system and most of [its] improvements have come from major accidents or incidents. The Feds had convened a study way back when that said rotating shifts were detrimental to a person’s health and ability to maintain clear focus. Yet, they persisted in having the controllers work rotating shifts and up to 10 hours a day and 60 hours a week. The idea of having one person man a tower all night in a low density facility is yet another example of bad management. How are they supposed to relieve themselves and who would blame them for falling asleep at 3 am?”
Glitches Indeed
This is a very long way around a core point: Glitches are rare. Computer and technology problems are not. Sunspots can create glitches. But make no mistake: most computer problems are avoidable. We are venturing into the brave new world of an automated universe. If I ruled the world, we’d eliminate the term glitch from our computer vocabulary and go straight to accountability. Otherwise, as Brian said, we’re looking at some major trouble ahead.
Facebook has brought me some real benefits this year…different from just seeing how my friends are doing, or hearing the endless chattering about the political situation. I’ve gotten some valuable input on real topics like this.