Booklet: Operations
Section: Risk Mitigation and Control Implementation
Subsection: Controls Implementation

ENVIRONMENTAL CONTROLS
Many financial institutions rely on IT operations that are complex, sensitive, or critical to daily functioning. Disruptions to the IT operations environment can pose significant operational, strategic, transaction, and reputation risks. Consequently, management should control and monitor environmental factors whether at the business line or the consolidated data center. Management should carefully assess the IT operations environment and implement relevant controls.

Computing equipment should have a continuous, uninterrupted power source. Independent electrical feeds drawing on separate power grids are the most reliable power source; however, they may be cost prohibitive and may not be feasible in many geographic locations. Management should take reasonable action to protect computing equipment power sources. Where dual feeds or back-up power generators are used, wiring should support automatic switching in the event one power source is disrupted. Power surges can also damage computer equipment. Consequently, management should monitor and condition or stabilize the voltage of electricity sources to prevent power fluctuations.

IT operations centers should have an alternative power source independent of local power grids. Typically, this is provided by a combination of a battery-based uninterruptible power supply (UPS) and a generator powered by gasoline, kerosene, natural gas, or diesel fuel. Management should configure the UPS to provide sufficient electricity within milliseconds to power equipment until there is an orderly shutdown or transition to the back-up generator. The back-up generator should generate sufficient power to meet the requirements of mission critical technology and environmental support systems. The institution should have sufficient fuel in storage or readily available to sustain operations for at least two or three business days. In addition, management should make arrangements to replenish the fuel supply in the event of an extended outage. Gasoline becomes stale after an extended period of time; where the generator runs on gasoline, the tank should be drained and refilled with new gasoline at least annually. A lower cost alternative to installing a permanent generator is to configure the operations center with an exterior electrical box to connect a temporary generator. Under this scenario, management should also establish reliable arrangements for the availability and delivery of a generator and fuel within a required time frame.
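The fuel guidance above reduces to simple arithmetic that management can use as a first check on storage capacity. The following is a minimal sketch, not a sizing standard. The function names, the burn rate, the tank size, and the reserve margin are hypothetical, and it assumes the generator runs around the clock at a constant burn rate for the full period.

```python
def fuel_required_gallons(burn_rate_gph, runtime_hours, reserve_fraction=0.25):
    """Fuel needed to run a generator for runtime_hours, plus a reserve margin.

    burn_rate_gph is the generator's fuel consumption in gallons per hour at
    the expected load (an assumed figure; take it from the manufacturer's data).
    """
    if burn_rate_gph < 0 or runtime_hours < 0 or reserve_fraction < 0:
        raise ValueError("inputs must be non-negative")
    return burn_rate_gph * runtime_hours * (1 + reserve_fraction)


def hours_of_autonomy(fuel_on_hand_gallons, burn_rate_gph):
    """Hours the generator can run on the fuel currently in storage."""
    if burn_rate_gph <= 0:
        raise ValueError("burn rate must be positive")
    return fuel_on_hand_gallons / burn_rate_gph


# Hypothetical example: three days of continuous operation (72 hours).
target_hours = 3 * 24
needed = fuel_required_gallons(burn_rate_gph=7, runtime_hours=target_hours)
available_hours = hours_of_autonomy(fuel_on_hand_gallons=400, burn_rate_gph=8)
print(f"Fuel needed for {target_hours} h: {needed} gallons")
print(f"Autonomy on current stock: {available_hours} h")
if available_hours < target_hours:
    print("Shortfall: arrange additional storage or a delivery contract.")
```

A result below the target period indicates that the replenishment arrangements described above carry more of the risk and deserve closer review.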

Similarly, IT operations centers should have independent telecommunication feeds from different vendors. Wiring configurations should support rapid switching from one provider to another without burdensome rerouting or rewiring. Because vendors often share or sublease the same common cabling or are routed through the same central office, management should have the vendors perform line traces to ensure that there is no single point of failure and that the paths are truly redundant.
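Once vendors return their line traces, comparing them is a matter of looking for any facility or cable segment that appears in more than one provider's path. The sketch below illustrates that comparison under stated assumptions: the provider names, segment labels, and the `shared_path_elements` function are hypothetical, and it assumes each trace has been recorded as an ordered list of consistently named elements.

```python
from typing import Dict, List, Set


def shared_path_elements(traces: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return each path element used by more than one provider.

    traces maps a provider name to the list of facilities and cable
    segments reported in that provider's line trace.
    """
    usage: Dict[str, Set[str]] = {}
    for provider, elements in traces.items():
        for element in elements:
            usage.setdefault(element, set()).add(provider)
    # An element shared by two or more providers is a potential
    # single point of failure.
    return {elem: users for elem, users in usage.items() if len(users) > 1}


# Hypothetical line traces for two providers.
traces = {
    "carrier_a": ["north-entrance", "conduit-7", "central-office-12"],
    "carrier_b": ["south-entrance", "conduit-3", "central-office-12"],
}
for element, providers in shared_path_elements(traces).items():
    print(f"Shared element: {element} used by {sorted(providers)}")
```

An empty result suggests the traced paths are diverse, although it is only as reliable as the traces the vendors supply.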

Even small IT operations centers with modest computer equipment can contain a significant amount of computer cabling. Management should physically secure these cables to avoid accidental or malicious disconnection or severing. In addition, management should document wiring strategies and organize cables with labels or color-codes to facilitate easy troubleshooting, repair, and upgrade.

Every operations center should have adequate heating, ventilation, and air conditioning (HVAC) systems in order for personnel and equipment to function properly. Older computer equipment produces a significant amount of heat, requiring cooling capacity exceeding that of a standard office building. Some newer models do not produce as much heat and thus do not require as much air conditioning. Organizations should plan their HVAC systems with the requirements of their computer systems in mind. Back-up sources of electricity should be able to sustain HVAC systems, because inadequate cooling could render computer equipment inoperable in a short period of time. Also, operations personnel should be familiar with written emergency procedures in the event of HVAC system disruption.
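A rough cooling estimate can help management relate HVAC capacity to the equipment actually installed. The sketch below uses the standard conversions of about 3.412 BTU per hour per watt and 12,000 BTU per hour per ton of cooling. It assumes that essentially all electrical power drawn by the equipment ends up as heat, and it ignores lighting, personnel, and building heat gain. The equipment wattages, the headroom margin, and the function names are hypothetical.

```python
BTU_PER_HR_PER_WATT = 3.412     # standard conversion, rounded
BTU_PER_HR_PER_TON = 12_000     # one ton of refrigeration


def cooling_load_btu_per_hr(equipment_watts):
    """Approximate heat load, assuming all drawn power becomes heat."""
    return sum(equipment_watts) * BTU_PER_HR_PER_WATT


def cooling_tons(load_btu_per_hr, headroom=0.2):
    """Cooling capacity in tons, with a margin for growth and hot days."""
    return load_btu_per_hr * (1 + headroom) / BTU_PER_HR_PER_TON


# Hypothetical equipment inventory, in watts of measured draw.
inventory_watts = [4000, 2500, 1500]
load = cooling_load_btu_per_hr(inventory_watts)
print(f"Estimated heat load: {load:.0f} BTU/hr")
print(f"Suggested capacity: {cooling_tons(load):.2f} tons")
```

The same load figure can inform back-up generator sizing, since the generator must carry the HVAC system as well as the computer equipment.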

Personnel should also be able to function in the event utility service is interrupted. Therefore, management should keep a one- to two-day supply of bottled water and non-perishable food on the premises.

All operations centers should have heat and smoke detectors installed in the ceiling, in exhaust ducts, and under raised flooring. Detectors should not be situated near air conditioning vents or intake ducts that can disperse smoke and prevent the triggering of alarms. Some large and complex operations centers are beginning to use very early smoke detection apparatus (VESDA) systems in place of conventional smoke detectors. VESDA systems sample the air on a continuous basis and are far more sensitive. They are capable of detecting a fire at the pre-combustion stage. Although more expensive than conventional systems, a VESDA system can detect a smoldering wire and alert management before a fire starts. The early notice may also prevent suppression equipment from deploying water or foam that can damage computer equipment.

A variety of strategies are available for fire suppression. One of the more widely used systems was a halon gas system that deprived a fire of oxygen. The government phased out production of halon because it determined that halon causes ozone depletion. The phase-out deadline was December 31, 2003. Once existing reserves are depleted, additional halon may not be purchased. Institutions still using halon systems should be prepared to switch to another fire suppression system. Newer systems rely on the same theory, but use gaseous agents such as Inergen, FM-200, FE-13, and carbon dioxide. Many facilities continue to rely on water as a fire suppressant, choosing a wet-pipe or dry-pipe configuration. In the wet-pipe configuration, the pipes are filled with water and may be subject to leakage. In the dry-pipe configuration, the pipes are empty until a fire is detected, minimizing the risk of water damage from burst or leaking pipes. Ideally, the fire suppression system should allow operators time to shut down computer equipment and cover it with waterproof covers before releasing the suppressant. Many facilities store waterproof covers throughout the data center to cover sensitive equipment quickly if sprinklers are activated.

Water leaks can cause serious damage to computer equipment and cabling under raised floors. For this reason, operations centers should be equipped with water detectors under raised flooring to alert management to leaks that may not be readily visible. Management should also consider installing floor drains to prevent water from collecting beneath raised floors or under valuable computer equipment. Furthermore, management should weigh the trade-offs in protecting cables from water damage: running cables under raised flooring risks flooding from below, but suspending cables overhead risks water damage from leaks in the roof or floors above.

PREVENTIVE MAINTENANCE
Preventive maintenance on equipment minimizes equipment failure and can lead to early detection of potential problems. This includes minor maintenance such as cleaning peripheral equipment as well as more extensive maintenance provided by the manufacturer, vendor, or maintenance contractor. Preventive maintenance also includes general housekeeping to keep the operations center clean and orderly.

Unless specifically authorized by management, computer operators should not repair equipment or perform other than the most routine maintenance. Even if operators have the requisite knowledge and experience, many hardware and software warranties disclaim liability for unauthorized maintenance or alteration. Maintenance by computer operators should be performed according to manufacturers' recommendations. As a general rule, these duties include:

• Cleaning tape heads each shift;
• Cleaning printers daily;
• Checking and cleaning the magnetic ink character recognition (MICR) reader/sorter at the end of each shift; and
• Periodically checking and cleaning the area under raised flooring.

Maintenance schedules may vary considerably depending on the number and variety of technology systems and the volume of work processed. All maintenance should follow a predetermined schedule. Employees should document maintenance in logs or other records. Management review of these records will aid in monitoring employee and vendor performance.
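A predetermined schedule and a maintenance log together make it possible to identify overdue tasks mechanically. The following is a minimal sketch of such a check, assuming the schedule is expressed as a maximum number of days between completions and the log records the date each task was last performed. The task names, intervals, dates, and the `overdue_tasks` function are hypothetical.

```python
from datetime import date, timedelta


def overdue_tasks(interval_days, last_done, today):
    """Return the tasks whose scheduled interval has elapsed.

    interval_days maps task name to the maximum days allowed between runs.
    last_done maps task name to the date it was last logged as completed.
    A task with no log entry at all is treated as overdue.
    """
    overdue = []
    for task, days in interval_days.items():
        last = last_done.get(task)
        if last is None or today - last > timedelta(days=days):
            overdue.append(task)
    return sorted(overdue)


# Hypothetical schedule and log entries.
schedule = {
    "clean printers": 1,
    "clean MICR reader/sorter": 1,
    "inspect under raised flooring": 30,
}
log = {
    "clean printers": date(2024, 3, 4),
    "inspect under raised flooring": date(2024, 1, 15),
}
for task in overdue_tasks(schedule, log, today=date(2024, 3, 5)):
    print(f"Overdue: {task}")
```

Treating a missing log entry as overdue is a deliberate choice here, since undocumented maintenance cannot be verified during management review.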

The manufacturer or vendor will usually perform maintenance under contract. For leased equipment, maintenance may be part of the lease arrangement. When equipment is owned or leased from a third party, management should obtain a separate maintenance or service agreement between the operations center and the equipment manufacturer. The service or maintenance agreement should provide repair services, detail the preventive maintenance, and include a schedule for both. When an operations center uses hardware from more than one manufacturer, it may be desirable to enter into an arrangement whereby one vendor takes responsibility for all repair maintenance. Under this arrangement, the operations center would contact the designated vendor to determine the source of the problem and to make all the necessary repairs. In any event, management should ensure maintenance contracts guarantee timely performance.

Management should schedule time and resources for preventive maintenance and coordinate that schedule with production. During scheduled maintenance, the computer operators should dismount all program and data files and work packs, leaving only the minimum software required for the specific maintenance task on the system. If this is impractical, management should review system activity logs to monitor access to programs or data during maintenance. Also, at least one computer operator should be present at all times when the service representative is in the computer room.

Some vendors can perform computer maintenance online. Operators should be aware of the online maintenance schedule so that it does not interfere with normal operations and processing. Operators and information security personnel should adhere to established security procedures to ensure they grant remote access only to authorized maintenance personnel at predetermined times to perform specific tasks.
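The three conditions above (authorized personnel, predetermined times, specific tasks) can be expressed as a single check before remote access is granted. The sketch below is illustrative only and is not tied to any particular remote access product. The `MaintenanceWindow` record, the `is_access_allowed` function, and the account and task names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class MaintenanceWindow:
    """One approved remote maintenance session (hypothetical record)."""
    vendor_user: str
    start: datetime
    end: datetime
    allowed_tasks: FrozenSet[str]


def is_access_allowed(windows: Iterable[MaintenanceWindow],
                      vendor_user: str, task: str, when: datetime) -> bool:
    """Grant access only if user, time, and task all match an approved window."""
    return any(
        w.vendor_user == vendor_user
        and w.start <= when < w.end
        and task in w.allowed_tasks
        for w in windows
    )


# Hypothetical approved window: two hours on a Saturday morning.
approved = [
    MaintenanceWindow(
        vendor_user="vendor_tech_01",
        start=datetime(2024, 3, 9, 1, 0),
        end=datetime(2024, 3, 9, 3, 0),
        allowed_tasks=frozenset({"firmware update"}),
    )
]
request_time = datetime(2024, 3, 9, 2, 15)
print(is_access_allowed(approved, "vendor_tech_01", "firmware update", request_time))
```

Denying by default, so that any request outside an approved window fails, keeps the check consistent with the principle that remote access is the exception rather than the rule.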

Operators should maintain a written log of all hardware problems and downtime encountered between maintenance sessions. A periodic report on the nature and frequency of those problems is a necessary management tool and can be valuable for vendor selection, equipment benchmarking, replacement decisions, or planning increased equipment capacity.
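Where the problem log is kept in electronic form, the periodic report can be produced by tallying entries by device and problem type. The sketch below shows one way to do this, assuming each log entry records the device, a short problem description, and the downtime in minutes. The entry layout, device names, and the `summarize_incidents` function are hypothetical.

```python
from collections import Counter, defaultdict


def summarize_incidents(incidents):
    """Summarize a hardware problem log.

    incidents is an iterable of (device, problem, downtime_minutes) tuples.
    Returns a Counter of occurrences per (device, problem) pair and a dict
    of total downtime minutes per device.
    """
    frequency = Counter()
    downtime = defaultdict(int)
    for device, problem, minutes in incidents:
        frequency[(device, problem)] += 1
        downtime[device] += minutes
    return frequency, dict(downtime)


# Hypothetical log entries recorded between maintenance sessions.
problem_log = [
    ("tape-drive-1", "read error", 30),
    ("tape-drive-1", "read error", 45),
    ("printer-2", "paper jam", 10),
]
frequency, downtime = summarize_incidents(problem_log)
for (device, problem), count in frequency.most_common():
    print(f"{device}: {problem} x{count}")
for device, minutes in downtime.items():
    print(f"{device}: {minutes} minutes of downtime")
```

Devices that recur at the top of such a report are natural candidates for the vendor performance and replacement discussions described above.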