Epic winter fail and those elusive “edge cases”

Those of you in the Midwest will be all too familiar with the brief interlude of truly awful winter weather that arrived this week. Heavy snow and below-zero temperatures caused massive problems at United Airlines’ hub in Chicago. Starting on Saturday, January 4th, and continuing right through Monday, January 6th, thousands of flights were cancelled. O’Hare usually handles these kinds of events with great verve and aplomb, but this was shaping up to be a special kind of storm — snow AND frigid temperatures. Those of us who live in Chicago know that summer thunderstorms cause more flight cancellations and delays than the typical winter storm. But this was not your typical winter weather. I was half-expecting to see Christopher McDonald exclaiming that this was the perfect storm. Heck, I would’ve settled for hearing Shooter McGavin calling it the perfect storm. (Google it.)

But I digress. It’s not United’s fault that we’re having a storm of epic proportions. Still, this mess gives us a chance to see just how these “chaos”-style events expose cracks in United’s scheduling and operations systems (i.e., its software).

The week of January 5th was supposed to be a typical week for me, which means multiple cities over multiple days. I was scheduled to fly from Tampa to Chicago on Sunday morning, then from Chicago to Columbus, Ohio, on Wednesday night, then back from Columbus through Chicago to Tampa on Friday night.

At about 1:45 AM on Sunday, my iPhone buzzed with a message from United telling me that my 11 AM flight from Tampa to Chicago was cancelled. I checked the website, saw that the earlier flight was also cancelled, shut off my alarm, and went back to sleep.

When I woke up again at 8:30, the goodly United automatons had rescheduled my flights for me. Since I have lots of status on United, I was given the first available slot to get from Tampa to Chicago, via Houston — arriving at 1:05 PM on Thursday. Wait for it… Yes, by then I was supposed to have already left Chicago for Columbus on Wednesday night.

A customer service rep looking at this reservation would probably have spotted the problem in a New York minute. Clearly, the fault lies somewhere in the code that handles the automatic rescheduling of reservations. If you looked carefully at the itinerary, you would notice that ALL of the flights were on United jets — no regional carriers. However, and I’m just guessing here, the NEW segments from Tampa to Houston and Houston to Chicago were likely booked on Continental’s reservation system. Yes, the two systems are now one, publicly speaking, but are they truly integrated under the covers? Once I noticed the problem, I logged into United’s online reservation system, but every attempt to change this mess bombed out with the familiar “call 1-800-UNITED1” message.

Fear not, my loyal followers: the nice customer service representative I spoke with (albeit 24 hours later) was able to straighten out the whole mess by cancelling and refunding the entire reservation.

So what’s the moral of this story? There are two. The first is that writing really good enterprise software is hard. The second is that integrating software from two different companies can be a lot more challenging than it looks.

My best guess is that under the covers, United and Continental are still using a mix of their original scheduling and operations systems. After I was finally able to cancel the flights, I started getting error emails from United’s automated boarding-pass system, which is further evidence that the boarding-pass application uses yet ANOTHER back-end system. The problem with my flights is the result of what software developers like to call “an edge case”: something that could happen, but isn’t very likely to. United did not have any flights available on its own system, so the software probably kicked the TPA-ORD leg over to the Continental system to see what it had available. The data package that United’s engine passed on to Continental’s engine probably did not include any information about my OTHER flight segments. So Continental’s engine happily offered me transport from Tampa to Chicago via Houston at the first available opportunity, namely, Thursday morning. It solved one problem while creating a completely different one.
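To make that guess a little more concrete, here is a minimal Python sketch of the difference between rebooking a single cancelled leg in isolation and rebooking it with the rest of the itinerary in view. Everything in it is hypothetical: the `Segment` type, the `rebook_leg_only` and `rebook_with_itinerary` functions, and all of the times are illustrative assumptions, not United’s or Continental’s actual systems or data. The Houston connection is flattened into a single TPA-ORD trip for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """One booked trip between two airports (hypothetical; connections flattened)."""
    origin: str
    dest: str
    depart: datetime
    arrive: datetime


def rebook_leg_only(cancelled: Segment, options: List[Segment]) -> Optional[Segment]:
    """Naive rebooking: sees only the cancelled leg, so it returns the
    earliest-arriving replacement on the same route, whatever else is booked."""
    candidates = [
        s for s in options
        if s.origin == cancelled.origin
        and s.dest == cancelled.dest
        and s.depart >= cancelled.depart
    ]
    return min(candidates, key=lambda s: s.arrive, default=None)


def rebook_with_itinerary(
    cancelled: Segment, later: List[Segment], options: List[Segment]
) -> Optional[Segment]:
    """Itinerary-aware rebooking: a replacement must arrive before the next
    booked segment leaves the destination city. Returns None when nothing
    fits, so the case can be handed to a human agent instead."""
    onward = [s.depart for s in later if s.origin == cancelled.dest]
    deadline = min(onward, default=None)
    fitting = [s for s in options if deadline is None or s.arrive <= deadline]
    return rebook_leg_only(cancelled, fitting)


# Hypothetical times loosely based on the trip described above (January 2014).
cancelled = Segment("TPA", "ORD", datetime(2014, 1, 5, 11, 0), datetime(2014, 1, 5, 13, 0))
later = [
    Segment("ORD", "CMH", datetime(2014, 1, 8, 19, 0), datetime(2014, 1, 8, 21, 0)),
    Segment("CMH", "ORD", datetime(2014, 1, 10, 17, 0), datetime(2014, 1, 10, 18, 0)),
    Segment("ORD", "TPA", datetime(2014, 1, 10, 19, 0), datetime(2014, 1, 10, 23, 0)),
]
# The only seat left: Tampa to Chicago via Houston, arriving Thursday at 1:05 PM.
options = [Segment("TPA", "ORD", datetime(2014, 1, 9, 7, 0), datetime(2014, 1, 9, 13, 5))]

print(rebook_leg_only(cancelled, options))               # books the Thursday trip
print(rebook_with_itinerary(cancelled, later, options))  # None: escalate to an agent
```

The interesting design choice is the `None` return. When no replacement arrives before the next departure, an itinerary-aware engine has no good automatic answer, and flagging the reservation for a human is arguably better than silently booking a trip that strands every later segment.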

It’s a lesson we all need to keep in mind when integrating several different systems. This is exactly the kind of problem you will have to solve if you are a mid-market business owner looking to acquire one of your competitors, and exactly what a strategic acquirer will have to deal with when it buys your business. Expect lots of edge cases in getting their systems and your systems to cohabit. What can you do to minimize this problem? First, try to stick with mainstream platforms, programming languages, databases and packaged applications. Choose products that have a large user base, so it will be easier to find people who can work with the technology. Second, adequately document how all of your systems work. You don’t need to write the next War & Peace, but you will need up-to-date storyboards, data models and data flows. Third, do your best to document the edge cases in your system — the things that don’t work the way people might expect them to. Last, but certainly not least, give us a call — we love working with edge cases!