By Larry Smith, President, ABR Consulting Group, Inc.


One of the most troublesome parts of producing a budget for the design and relocation of a data center is budgeting for the construction of the computer room and supporting equipment yard portion of the project. The primary reasons this area is so difficult to budget are (1) facilities groups and IT groups rarely design and build data centers, and (2)

The Uptime Institute® (http://upsite.com/TUIpages/whitepapers/tuitiers.html) has developed a tiered classification approach to site infrastructure functionality that addresses the need for a common benchmarking standard. The Institute’s system has been under development for several years and includes measured availability figures ranging from 99.67% to more than 99.99%. It is important to note that this range of availability is substantially less than the current Information Technology (IT) expectation of “Five Nines” (99.999%).

Over the last forty years, data center designs have evolved through at least four distinct stages, which are captured in the Institute’s classification system. Tier I first appeared in the early sixties, Tier II in the seventies, Tier III in the late eighties and early nineties, and Tier IV in 1994 with the United Parcel Service Windward project, which was the first site to assume the availability of dual-powered computer equipment. The Uptime Institute® participated in the development of Tier III concepts and pioneered the creation of Tier IV.

Invention of Tier IV was made possible by Ken Brill, Executive Director of The Uptime Institute, who envisioned a future in which all computer hardware would come with dual power inputs. During construction of the $50 million Windward project, United Parcel Service worked with IBM and other computer hardware manufacturers to provide dual-powered computer hardware. Dual power technology requires at least two completely independent electrical systems. These systems supply power to the computer load over diverse paths, which moves the last point of electrical redundancy from within the Uninterruptible Power System (UPS) down to within the computer hardware itself. Brill’s intuitive conclusion has since been confirmed by Uptime Institute research, which found that 95% of all site infrastructure failures occur between the UPS and the computer load. Since completion of the Windward project in 1994, Tier IV electrical designs have become common and the number of computer hardware products with dual inputs has grown.

The advent of dual-powered computer hardware in tandem with Tier IV electrical infrastructure is an example of site infrastructure design and computer hardware design advancing together to achieve higher availability. Because computer hardware design continues to improve significantly, many data centers constructed even in the last five years, which offer only Tier I, II, or III functionality, fall far behind in their ability to match the availability of the Information Technology they support.

Defining the Tiers
The tier classification system involves several definitions. A site that can sustain at least one “unplanned” worst-case site infrastructure failure with no critical load impact is considered fault tolerant. A site that is able to perform planned site infrastructure activity without shutting down critical load is concurrently maintainable (fault tolerance level may be reduced during concurrent maintenance). It is important to remember that a typical data center site is composed of at least twenty major mechanical, electrical, fire protection, security and other systems, each of which has additional subsystems and components. All of these must be concurrently maintainable and/or fault tolerant for the entire site to be considered concurrently maintainable and/or fault tolerant.
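The rule that every system must meet a tier for the whole site to meet it can be read as “the site is only as good as its weakest system.” The sketch below is a minimal Python illustration under that assumption; `SUBSYSTEM_TIERS` and `site_tier` are hypothetical names, and the two sample ratings (Tier IV electrical, Tier II mechanical) are illustrative rather than drawn from a real assessment.

```python
from __future__ import annotations

# Hypothetical ratings for two of a site's many systems. A real assessment
# would cover all twenty or more mechanical, electrical, fire protection,
# security, and other systems and their subsystems.
SUBSYSTEM_TIERS = {
    "electrical": 4,  # System+System electrical design
    "mechanical": 2,  # single mechanical path with redundant components
}


def site_tier(subsystem_tiers: dict[str, int]) -> int:
    """Rate the site no higher than its weakest system (assumed rule)."""
    if not subsystem_tiers:
        raise ValueError("at least one system rating is required")
    for name, tier in subsystem_tiers.items():
        if tier not in (1, 2, 3, 4):
            raise ValueError(f"{name}: tier must be 1-4, got {tier}")
    return min(subsystem_tiers.values())


if __name__ == "__main__":
    weakest = min(SUBSYSTEM_TIERS, key=SUBSYSTEM_TIERS.get)
    print(f"Site tier: {site_tier(SUBSYSTEM_TIERS)} (limited by {weakest})")
```

Run as a script, this prints `Site tier: 2 (limited by mechanical)`: upgrading the electrical design alone does not raise the site’s rating.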

Some sites built with fault-tolerant System+System electrical concepts failed to apply the same concept to their mechanical plant, which would require dual mechanical systems. Such sites are classified Tier IV electrically but achieve only a Tier II level mechanically. The following list summarizes the characteristics of each Tier.

+ Tier I
Single path for power and cooling distribution, no redundant components, 99.671% availability.

+ Tier II
Single path for power and cooling distribution, redundant components, 99.749% availability.

+ Tier III
Multiple power and cooling distribution paths, but only one path active, redundant components, concurrently maintainable, 99.982% availability.

+ Tier IV
Multiple active power and cooling distribution paths, redundant components, fault tolerant, 99.995% availability.

The availability numbers are drawn from industry benchmarking conducted by The Uptime Institute and reflect sites at or above the 90th percentile (that is, only the top 10% of all sites performed at this level). The quality of human-factors management is the most significant element separating top sites from all others.
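Percentages this close to 100% are easier to compare as hours of downtime. The sketch below converts the benchmark figures listed above into expected downtime per year, assuming a 365-day (8,760-hour) year; `TIER_AVAILABILITY` and `annual_downtime_hours` are illustrative names, not part of any Uptime Institute tool.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

# Benchmark availability per tier, as fractions (from the list above).
TIER_AVAILABILITY = {
    "Tier I": 0.99671,
    "Tier II": 0.99749,
    "Tier III": 0.99982,
    "Tier IV": 0.99995,
}


def annual_downtime_hours(availability: float) -> float:
    """Return the expected hours of downtime per year for an availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for tier, availability in TIER_AVAILABILITY.items():
        hours = annual_downtime_hours(availability)
        print(f"{tier}: {availability:.3%} -> {hours:.1f} h/year")
```

The results range from roughly 28.8 hours a year for Tier I down to about 0.4 hours (26 minutes) for Tier IV. These are averages for top-performing sites, not guarantees for any individual site.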

Tier I Data Center

A Tier I data center is susceptible to disruptions from both planned and unplanned activity. It has computer power distribution and cooling, but it may or may not have a raised floor, a UPS, or an engine generator. If it does have UPS or generators, they are single-module systems and have many single points of failure. The infrastructure should be completely shut down on an annual basis to perform preventive maintenance and repair work. Urgent situations may require more frequent shutdowns. Operation errors or spontaneous failures of site infrastructure components will cause a data center disruption.

Tier II Data Center
Redundant Components

Tier II facilities with redundant components are slightly less susceptible to disruptions from both planned and unplanned activity than a basic (Tier I) data center. They have a raised floor, UPS, and engine generators, and their capacity design is “Need plus One” (N+1), but they have a single-threaded distribution path throughout. Maintenance of the critical power path and other parts of the site infrastructure will require a processing shutdown.
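“Need plus One” sizing can be shown with simple arithmetic: N is the fewest modules whose combined rating carries the load, and N+1 adds one spare. The sketch below assumes identical modules rated in kW; the function names and the 750 kW load with 300 kW modules are hypothetical examples, not figures from the text.

```python
import math


def modules_needed(load_kw: float, module_kw: float) -> int:
    """N: the fewest identical modules whose combined rating carries the load."""
    if load_kw <= 0 or module_kw <= 0:
        raise ValueError("load and module rating must be positive")
    return math.ceil(load_kw / module_kw)


def n_plus_one(load_kw: float, module_kw: float) -> int:
    """N+1: one redundant module beyond what the load requires."""
    return modules_needed(load_kw, module_kw) + 1


def survives_one_module_out(load_kw: float, module_kw: float, installed: int) -> bool:
    """True if the load is still carried with any one module out of service.

    This covers module failures only. In Tier II every module still feeds
    the same single distribution path, so work on that path still means
    a processing shutdown.
    """
    return (installed - 1) * module_kw >= load_kw


if __name__ == "__main__":
    load, rating = 750.0, 300.0  # hypothetical kW figures
    count = n_plus_one(load, rating)
    print(f"N+1 = {count} modules; one out OK: {survives_one_module_out(load, rating, count)}")
```

With these numbers N is 3 and N+1 is 4, and losing any one module leaves 900 kW for a 750 kW load. The redundancy protects against a module failure, not against a fault in the shared distribution path.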

Tier III Data Center
Concurrently Maintainable

Tier III level capability allows for any planned site infrastructure activity without disrupting the computer hardware operation in any way. Planned activities include preventive and programmable maintenance, repair and replacement of components, addition or removal of capacity components, testing of components and systems, and more. For large sites using chilled water, this means two independent sets of pipes. Sufficient capacity and distribution must be available to simultaneously carry the load on one path while performing maintenance or testing on the other path. Unplanned activities such as errors in operation or spontaneous failures of facility infrastructure components will still cause a data center disruption. Tier III sites are often designed to be upgraded to Tier IV when the client’s business case justifies the cost of additional protection.
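The requirement that one path carry the whole load while the other is maintained can be expressed as a capacity check. The sketch below assumes only one path is active at a time, as in Tier III, so at least two paths must each be able to carry the full load on their own. `DistributionPath` and `concurrently_maintainable` are hypothetical names, and the check covers capacity only, not controls, valving, or the other systems a real site review would include.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DistributionPath:
    """A hypothetical power or chilled-water distribution path."""
    name: str
    capacity_kw: float


def concurrently_maintainable(paths: list[DistributionPath], load_kw: float) -> bool:
    """True if any single path can be taken out of service for planned work.

    Only one path is active at a time, so at least two paths must each be
    able to carry the full load alone: whichever path is being maintained,
    another one can take over.
    """
    full_load_paths = [p for p in paths if p.capacity_kw >= load_kw]
    return len(full_load_paths) >= 2


if __name__ == "__main__":
    paths = [DistributionPath("A", 800.0), DistributionPath("B", 800.0)]
    print(concurrently_maintainable(paths, load_kw=750.0))
```

Note that passing this check says nothing about unplanned failures; as the text states, an error or spontaneous failure on the active path will still disrupt a Tier III site.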

Tier IV Data Center
Fault Tolerant

Tier IV provides site infrastructure capacity and capability to permit any planned activity without disruption to the critical load. Fault-tolerant functionality also provides the ability of the site infrastructure to sustain at least one worst-case unplanned failure or event with no critical load impact. This requires simultaneously active distribution paths, typically in a System+System configuration. Electrically, this means two separate UPS systems in which each system has N+1 redundancy. Because of fire and electrical safety codes, there will still be downtime exposure due to fire alarms or people initiating an Emergency Power Off (EPO). Tier IV requires all computer hardware to have dual power inputs as defined by The Uptime Institute’s Fault Tolerant Power Compliance Specification Version 1.2 (www.uptimeinstitute.org/spec.html).
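Two separate UPS systems, each with N+1 redundancy, is often written 2(N+1). The sketch below counts modules for such a design and checks the worst case modeled here, the loss of one entire UPS system. It assumes identical modules and dual-powered hardware that can draw its full load from either system; the function names and the 750 kW / 300 kW figures are hypothetical.

```python
from __future__ import annotations

import math


def system_plus_system(load_kw: float, module_kw: float) -> tuple[int, int]:
    """Return (modules per UPS system, total modules) for a 2(N+1) design."""
    if load_kw <= 0 or module_kw <= 0:
        raise ValueError("load and module rating must be positive")
    per_system = math.ceil(load_kw / module_kw) + 1  # N+1 within each system
    return per_system, 2 * per_system


def carries_load_after_system_loss(load_kw: float, module_kw: float, per_system: int) -> bool:
    """Assume one entire UPS system is lost.

    The surviving system must carry the full load by itself, and because it
    is N+1 it should still do so with one of its own modules out as well.
    """
    return (per_system - 1) * module_kw >= load_kw


if __name__ == "__main__":
    per_system, total = system_plus_system(750.0, 300.0)  # hypothetical kW
    print(per_system, total, carries_load_after_system_loss(750.0, 300.0, per_system))
```

Here each system needs 4 modules (8 in total), and the surviving system still carries 900 kW against a 750 kW load. The doubling is what buys tolerance of a whole-system failure; the EPO and fire-alarm exposure described above is outside what this arithmetic can address.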

Tier IV site infrastructures are the most compatible with high availability IT concepts that employ CPU clustering, RAID DASD, and redundant communications to achieve reliability, availability, and serviceability. The accompanying chart shows how these IT ideas relate to site infrastructure concepts.

Solving Incompatible “Five Nines”

Even a fault-tolerant and concurrently maintainable Tier IV site will not satisfy an IT requirement of “Five Nines” (99.999%) uptime. The best a Tier IV site can deliver over time is 99.995%, and this assumes that a site outage occurs only as a result of a fire alarm or EPO, and that such an event occurs no more than once every five years. Only Tier IV sites at or above the 90th percentile (the top 10%) will achieve this level of performance. Unless human activity issues are continually and rigorously addressed, at least one additional failure is likely over five years. Even if the site outage is restored instantaneously (which requires 24 x “forever” staffing), it can still take up to four hours for IT to recover information availability.

Tier IV’s 99.995% uptime is an average over five years. An alternative calculation using the same underlying data is 100% uptime for four years and 99.954% for the year in which the downtime event occurs.
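The size of the Five Nines gap becomes clearer as a downtime budget. The sketch below computes how many minutes of downtime each target allows over five years and compares that with a single four-hour IT recovery, assuming a 365-day year; `downtime_budget_minutes` is an illustrative name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_budget_minutes(availability: float, years: float = 1.0) -> float:
    """Total minutes of downtime an availability target allows over `years`."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR * years


if __name__ == "__main__":
    five_nines = downtime_budget_minutes(0.99999, years=5)
    tier_iv = downtime_budget_minutes(0.99995, years=5)
    it_recovery = 4 * 60  # "up to four hours" for IT to restore information availability
    print(f"Five Nines, 5 years: {five_nines:.1f} min")
    print(f"Tier IV (99.995%), 5 years: {tier_iv:.1f} min")
    print(f"One IT recovery: {it_recovery} min ({it_recovery / five_nines:.1f}x the Five Nines budget)")
```

Five Nines allows about 26.3 minutes of downtime in five years, so a single four-hour IT recovery consumes roughly nine times that allowance, even before any site outage time is counted.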

Contact us at www.abrconsulting.com   Phone:  925.872.5523  Fax:  916.478.2814