There’s always a reason why things break in data centers, and the powers that be can usually find someone to blame — whether that someone is an IT operations staff member, an OEM, a systems integrator or a third-party service provider.

Often, the offender leaves clear fingerprints, such as a mislabeled component or a process that wasn’t updated. Some incidents are clearly due to the oversights of multiple parties.

Uptime Institute, a professional group whose membership includes data center managers from many industries, has collected and studied incident data for almost 20 years and concludes that the majority of problems are caused by outside parties like contractors or vendors, with a smaller but still sizeable percentage being the fault of internal IT staffers.

Since 1994, Uptime has collected data on some 5,000 abnormal incidents, which it defines as events in which a piece of equipment or infrastructure component did not perform as expected. The incident reports are submitted voluntarily by Uptime members.

Uptime said its analysis found that the percentage of abnormal incidents attributable to operations staff was 34% in 2009, 41% in 2010 and 40% in 2011.

Third-party operators that work on a customer’s data center or that supply equipment to it, such as manufacturers, vendors, factory representatives, installers and integrators, were found responsible for 50% to 60% of the incidents reported from 2009 to 2011.

The analysis will likely face criticism from all sides, because neither internal IT operations personnel nor data center vendors accept blame easily, except when refusing it could hurt the bottom line.

For instance, Ahmad Moshiri, director of power technical support at Emerson Network Power’s Liebert Services unit, said that vendors in some cases do accept blame for a problem they believe was caused by an oversight by internal IT operations staff.

“The vendor gets caught up in a sensitive spot,” he said. “It doesn’t want to put the client — a facilities manager — in a difficult position. It’s very touchy.”

Uptime said it also found that internal IT operations staffers are responsible for the majority (60%) of the worst abnormal incidents — those that resulted in a system or data center outage.

Hank Seader, managing principal for research and education at Uptime, said those results could be misleading as well. Often, it’s “the design, manufacturing, installation processes that leave banana peels behind, and [it’s] the operators who slip and fall on them,” Seader said.

David Filas, a data center engineer at Novi, Mich.-based healthcare provider Trinity Health, added, “The designs and actions of engineers, architects, and installation contractors can have latent effects on operations long after construction. Outside forces can make or break the data center just as easily as internal forces.”

He noted that Trinity Health suffered through a data center outage because an emergency power-off bypass circuit was not built to spec when the facility was constructed years earlier.

Filas suggested that IT’s increased reliance on contractors to build or update data centers exacerbates the risk of problems.

Electrical contractors, for example, may not understand the specific needs of a data center, he said, adding, “We are frequently questioned on why we provide redundant power to racks.”

Emerson’s Moshiri cited process and procedural issues as leading causes of data center problems, particularly when multiple vendors are involved and a high degree of coordination is needed.

– In ComputerWorld

 
