Old Image Server

Once housing nearly 600,000 files, the old image server now sits as a desk ornament, thanks to Maui’s power grid.

My day had started innocuously enough: I plodded into the studio with a travel mug of coffee in one hand and a list of orders in the other. I plopped myself into the desk chair, selected a batch of images to work on, took a swig of joe, and as I was reading over the day’s to-do list, I realized that something wasn’t right. The lights started to dim, there was a strange electronic buzzing, and I heard the clicking of several uninterruptible power supplies trying to kick in. In less than a minute, one of Maui’s famous brownouts was over, but this one left me with an unusual feeling.

Usually, there’s a distinct hum in the room, an electronic white noise that comes from a blend of servers, network storage devices, and other electronic doo-dads. It’s comforting: you know everything’s working like a well-oiled machine. After this morning’s power dip, the tone had changed. It wasn’t as deep, and there was something unfamiliar in its cadence.

I shrugged it off, thinking it was just a lack of caffeine in my system. I clicked on our main image server, got the Microsoft spinning blue disc of death, and the hairs on the back of my neck stood at attention. A few seconds later, my heart dropped into my right pinky toe when my screen displayed “CANNOT FIND //SMP-IMAGESERVER.”

It wasn’t the first time our server had crashed. In fact, we had lost two hard drives a couple of months earlier, and it took eight days to rebuild the RAID and restore all the data. So I wasn’t very worried at first, as we’d built triple redundancy into my workflow: save a file to the server, and it’s automatically copied to two other storage devices. Should the server fail, I have two other copies I can instantly fall back to. Except this time, every time I clicked on another drive, I got “THE FILE YOU REQUESTED IS MISSING OR CORRUPT…”

Oh $#!%!!!

In the world of computers and data, I’d just experienced “The Perfect Storm,” an epic meltdown of sorts, and it introduced me to a new term I never wanted to learn: Cascade Failure. Sixteen hard drives across three devices, carrying 17 terabytes of images (nearly 600,000 files), gone into an electronic abyss.

Hooking up one of the redundant Network Attached Storage (NAS) drives, I tried to retrieve the files with data recovery software, the same kind I use whenever I accidentally format a memory card I thought I had already downloaded. After several hours with abysmal results, I pointed my browser to Google and found another data recovery program. I called the company and was assured the software would retrieve everything. Fifty bucks and 20 minutes later, the software was downloaded and the recovery process began. Several days later, it had found 14TB of data. I clicked on one of the recovered images and saw a splotchy gibberish of color on the screen. I clicked on another and got random numbers and letters, the gobbledygook of raw ASCII data. Sure, it had found data, but it was all corrupt and unusable.

Fortunately, my saving grace came in the form of an off-site backup. Call it my version of doomsday prepping: in the Islands, it’s not a matter of if a hurricane or tsunami will hit, it’s a matter of when (much like a hard drive failure). When I went to the Mainland for the holidays, I took a small portable hard drive with me, code-named “Bacon”: a single 4TB backup of everything “mission critical” I’d need to rebuild my company after a disaster.

After I made the frantic phone call to have the drive retrieved and sent to Maui, a new worry set in, and with it another new term: Volatile Data. Volatile data essentially means that there’s no backup, and anything can happen to it if you’re not careful. In this case, that meant shipping the hard drive 2,500 miles, and I’ve had my fair share of items lost, damaged or destroyed in transit. Receiving it in anything less than good working order was not an option, so the drive was sealed in an anti-static bag, encased in several layers of bubble wrap, then placed in a 12x12x12-inch box (the drive itself is only 3x4x1 inches) with a ton of eco-friendly packing peanuts.

You know the old adage about how a watched pot never boils? The same holds true for tracking a package. It was three anxiety-filled days of waiting for a UPS update on the whereabouts of my business’s life-or-death cardboard box… and the tracking never seemed to update until the box was delivered. Cue the long sigh of relief.

With the drive plugged in, I was able to recover a good portion of my images, documents, spreadsheets, presentations and such: the mission-critical files. But 181,000 images (my “B”- and “C”-grade shots), plus all the family photos, were toast. The loss came down to complacency, resting in the comfort that with my triple-redundancy scheme, I could never lose a file. My naivety, in turn, bit me in the ass big time.

It’s a loss that, in some ways, is immeasurable. To think that some 20-plus years of images are gone, along with family moments and history that can never be replaced, borders on the unimaginable, even the absurd. I never thought it could happen, but it did… in spades. All you can do is recognize it for what it is, a wake-up call, and move forward. Besides, dwelling on it is only good for developing an ulcer.

In the aftermath of the disaster, I turned to fellow photographers and IT professionals for advice on building a series of redundancies/safety nets/Emergency Plan Alphas, while mixing in the lessons learned from this week’s “KABLAMMO!” Everyone agreed on one basic tactic: file mirroring. The premise is simple enough: when you hit “save,” the file is saved to one primary location, then automatically saved again to a second (or third, or fourth) location. With this structure, should one device fail, you’ve got an immediate copy you can revert to and keep working.
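For the technically curious, here is a minimal Python sketch of the save-then-mirror idea, assuming each storage device is simply mounted as a folder. The function names `save_mirrored` and `sha256_of`, the folder names, and the checksum check are illustrative assumptions, not how any particular NAS or sync product actually does it.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_mirrored(data: bytes, name: str, primary: Path, mirrors: list[Path]) -> list[Path]:
    """Hypothetical helper: save to the primary folder, then copy to every mirror.

    Each mirror copy is compared against the primary's checksum, so a copy
    that exists but is corrupt raises an error instead of passing as a backup.
    """
    primary.mkdir(parents=True, exist_ok=True)
    original = primary / name
    original.write_bytes(data)
    expected = sha256_of(original)

    copies = [original]
    for mirror in mirrors:
        mirror.mkdir(parents=True, exist_ok=True)
        target = mirror / name
        shutil.copy2(original, target)  # copies contents plus metadata such as timestamps
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch on mirror copy {target}")
        copies.append(target)
    return copies
```

The checksum step is there because a copy that merely exists isn’t necessarily a copy you can open, as the recovered gibberish showed. And mirrors sitting on the same power circuit can still go down together, which is why the off-site layers matter.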

Synology DS1515+ DiskStations

The Heart of the New System: Two Synology DS1515+ DiskStations.

My friend Paul (who just so happens to be a data guru) and I have developed a near-foolproof system to ensure data will NEVER be lost again. The scheme uses two mirrored servers (one active, the other passive) with an attached incremental backup (via Drobo); an off-site mirrored server (our own “cloud” server, set to only receive new files or update existing ones, with no deletions allowed) with its own attached backup drive; a secondary cloud mirror at a data farm; plus a separate external backup drive that rotates monthly to our safe deposit box.
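The “new or updated files only, no deletions” rule on the off-site server is what keeps a wipe or accidental delete at the studio from spreading to the backup. As an illustration only (this is not Paul’s actual configuration, and `additive_sync` is a hypothetical name), here is a minimal Python sketch of that rule, assuming both sides are plain folders and treating a file as changed when its size or modification time differs:

```python
import shutil
from pathlib import Path


def additive_sync(source: Path, dest: Path) -> dict[str, int]:
    """Copy new files and update changed ones; never delete anything in dest.

    "Changed" is a cheap heuristic here: a different size or a modification
    time more than 2 seconds apart. A checksum comparison would be stricter.
    """
    stats = {"added": 0, "updated": 0, "unchanged": 0}
    for src in source.rglob("*"):
        if not src.is_file():
            continue  # folders are recreated on demand below
        target = dest / src.relative_to(source)
        if not target.exists():
            stats["added"] += 1
        else:
            s, t = src.stat(), target.stat()
            if s.st_size == t.st_size and abs(s.st_mtime - t.st_mtime) <= 2:
                stats["unchanged"] += 1
                continue
            stats["updated"] += 1
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # preserves mtime so the next run sees it as unchanged
    # Files that exist only in dest (e.g. deleted at the studio) are left alone.
    return stats
```

One caveat of this design: an “update” still overwrites the off-site copy, so a file that gets corrupted and re-saved at the studio would be copied over too. Keeping older versions (the incremental Drobo backup and the rotating drive) covers that case.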

While it’s big-time overkill, I now know that even under the most adverse conditions, I have six levels of redundancy to rely on. I won’t get hit by a cascade failure or multiple failures again, and I can sleep at night knowing that my images will never be lost, so long as the Earth doesn’t swap its axis.

So, folks, take the time to back up your images to at least two different drives. Keep one on hand, and put the other off-site in a safe place. No one likes an epic meltdown.