Automating the Data Center

Organizations have invested millions in the data center to make the transition from mainframe and client/server architectures to the new distributed computing paradigm. This seismic shift in how data center infrastructure is deployed and used has delivered substantial cost and efficiency benefits to IT organizations. However, to make a truly meaningful impact on the business, organizations need a state where business policies and service-level agreements drive dynamic and automatic optimization of the IT infrastructure, creating a highly agile, business-driven IT environment. The hard reality is that, while this is a noble goal, a number of significant obstacles must be addressed before it can be attained. These obstacles fall into three areas: exploding complexity and cost, inconsistent quality of service, and escalating security risks.



Exploding Complexity and Cost

Inherent to distributed computing architectures is an exponential increase in the number of devices, applications, and configuration elements, or "knobs," that need to be managed compared to the traditional client/server platform.

  1. More devices to manage - A single distributed application can span multiple types of servers, such as web, application, and database servers. Consequently, with the ability to develop new applications far more rapidly, the total number of devices in the data center has increased dramatically.
  2. More complex configurations - Hundreds if not thousands of discrete configuration elements, such as configuration files, vendor- and OS-specific packages, and processes, must be tracked and managed on an ongoing basis. Moreover, since an application relies on multiple types of servers, making a change requires not only an understanding of the configuration elements and their dependencies within and across each server tier, but also of the sequence in which changes must be made to maintain application integrity.
  3. More specialization required - Distributed applications can extend over different devices with different operating systems. For example, a supply-chain application may have web servers running on Windows/Intel-based devices and database servers running on larger UNIX machines. Managing these applications requires not only application-specific skills but also specialists in both the Windows and UNIX operating systems. Consequently, an administrator today can easily cost an organization over $100K/yr in fully loaded salary and benefits. A major Wall Street financial institution estimates that the operating costs of managing a server are eight to nine times the associated capital costs of that device.
  4. Strict regulatory requirements - Regulatory requirements such as Sarbanes-Oxley, SAS 70, and HIPAA mandate strict management controls, placing a significant burden on IT organizations to allocate resources to document and track what changes are made in the data center, and when.

Inconsistent Quality of Service

There are two strongly held axioms in the data center. One, there is an inverse correlation between the rate of change and the level of stability of data center infrastructure, and two, there is an inverse correlation between the number of people who touch the infrastructure and the availability and integrity of that infrastructure. Given the attributes of the distributed computing model, a key challenge that IT managers face today is how to turn those axioms upside down.

  1. High rate of change causes instability - In the client/server world, application and infrastructure changes happen once or twice a quarter. By contrast, changes to distributed applications can occur as frequently as several times a week. Studies by Gartner have shown that 80% of all downtime is due to misconfigurations or operator error. Hence, distributed applications are by nature far more prone to instability.
  2. Many groups involved - Because of the many technologies that support distributed applications, multiple IT groups (e.g., Help Desk, UNIX, Windows, Application, Security, and Networking), each with its own expertise in a particular technology, must be involved when changes are made. This requires extensive planning and coordination between groups. Because it is so difficult and time-consuming to execute all required changes, invariably not all tasks can be completed within regular maintenance windows.
  3. Poor documentation of server configurations - Due to the dynamic nature of distributed applications and the number of people involved in making changes, it is extremely difficult to track the current state of configurations and to identify deviations from the "gold" configuration standard, if one even exists. For example, when new application updates are migrated from development to QA to production, each environment has configuration differences that often cause unintended and very negative consequences.
  4. Hard to align IT to the business - IT groups are organized by specific technology domains, not by the services they offer. As a result, these groups do not have complete visibility into the needs of the business. For example, if an organization needs to expand the capacity of its e-commerce web site due to seasonality in its business, this requirement initiates a cumbersome process which ultimately gets translated into complex, low-level operational tasks executed by many different IT groups.
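
The drift problem described in item 3 above can be made concrete with a short sketch. The server names, configuration keys, and "gold" values below are hypothetical, and a real automation tool would collect configurations from live servers rather than from in-memory dictionaries:

```python
# Minimal sketch of configuration-drift detection against a "gold" standard.
# All keys, values, and server names are invented for illustration.

gold_standard = {
    "httpd.conf:MaxClients": "256",
    "httpd.conf:KeepAlive": "On",
    "java.version": "1.4.2_05",
}

def find_drift(server_name, server_config):
    """Return a list of (key, expected, actual) deviations from the gold standard."""
    drift = []
    for key, expected in gold_standard.items():
        actual = server_config.get(key, "<missing>")
        if actual != expected:
            drift.append((key, expected, actual))
    return drift

# A QA server whose config silently diverged from the production standard.
qa_server = {
    "httpd.conf:MaxClients": "150",   # lowered during a load test, never reset
    "httpd.conf:KeepAlive": "On",
    "java.version": "1.4.2_05",
}

for key, expected, actual in find_drift("qa-web-01", qa_server):
    print(f"qa-web-01: {key} expected {expected!r}, found {actual!r}")
```

Run continuously against every tier, a scan like this turns "if a gold standard even exists" into an enforced baseline rather than tribal knowledge.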

Escalating Security Risks

The number of security breaches increases every year, due mostly to flawed software with security holes that are easily exploited and the difficulties associated with tracking changes and identifying compromised servers. As a result, IT organizations are under tremendous pressure to secure their data center infrastructure and keep up with the latest security recommendations.

  1. Hard to balance security versus responsiveness - The dramatic increase in servers, applications and users requires IT organizations to monitor and manage many more security-related configuration elements. The conventional approach has been to "lock down" the data center, limiting the amount of change and the number of personnel involved in managing change. As a result, organizations are faced with trading off responsiveness for security. Unfortunately, the dichotomy between traditional automation tools, which only focus on change, and security tools, which only focus on compliance, has compounded this problem. For this reason, IT organizations struggle to secure their data center infrastructure in an environment where a high rate of change is the norm, not the exception.
  2. Difficult to identify and fix compromised servers - Security holes in OS and application software increase every year, requiring organizations to be very vigilant about patching servers. Nonetheless, it is estimated that 90% of all security breaches exploit existing vulnerabilities and could be prevented if servers were patched on time. The problem is not that organizations pay only lip service to security but that it is very hard to identify which security patches have been issued and which specific servers are affected. Even once identified, it is difficult to patch the appropriate servers quickly without impacting application stability.
  3. Poor access controls for data center staff - Many different administrators from different IT teams touch servers on a daily or weekly basis. Administrators typically get full "root" access, which has significant ramifications for security. The lack of a centralized access control mechanism significantly impedes the ability to secure the data center. Such a mechanism would provide a rich audit trail of actions taken and would limit what administrators can do based on their skills and privileges.
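
As an illustration of item 3, the sketch below shows one way role-based permissions and an audit trail could replace blanket root access. The roles, commands, and log fields are invented for the example:

```python
# Hypothetical sketch: centralized role-based access control with an audit
# trail, instead of giving every administrator full "root" access.
import datetime

# Map each role to the commands it may run (illustrative values).
ROLE_PERMISSIONS = {
    "web_admin": {"restart_httpd", "deploy_content"},
    "dba":       {"restart_db", "run_backup"},
}

audit_log = []  # every attempt, allowed or not, is recorded here

def execute(user, role, command):
    """Run a command only if the role permits it; log every attempt."""
    allowed = command in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "command": command,
        "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{user} ({role}) may not run {command}")
    return f"executed {command}"

execute("alice", "web_admin", "restart_httpd")    # permitted
try:
    execute("alice", "web_admin", "run_backup")   # denied, but still logged
except PermissionError:
    pass
```

Because every attempt lands in the audit trail, the same mechanism that limits privileges also produces the change documentation that regulatory requirements demand.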

The Solution: Holistic Data Center Automation

Given these shortcomings, forward-looking IT organizations now recognize that they must take a holistic approach to managing infrastructure by employing a comprehensive data center automation solution that provides the foundation for a truly responsive IT environment where business policies drive the allocation and optimization of IT resources.

Data center automation solutions not only address the automation requirements of today's complex server and application infrastructure, but also better align IT operations to the needs of the business. These types of solutions provide one platform for provisioning, change, administration and compliance, and offer a wide range of functionality, including:

  1. Modeling and Management of Configuration Items - The ability to treat all types of configuration items, such as files, vendor packages, specific parameters in configuration files, Windows registry settings, and .Net and J2EE components, across all major operating systems as objects that can be manipulated and managed in one consistent, secure, and seamless manner;
  2. Transaction-Safe Provisioning and Change - The ability to easily simulate complex distributed changes to catch problems up front, and to roll changes back quickly to recover from unforeseen problems when changes are made;
  3. Continuous Compliance Management - The ability to define reference configurations (i.e., gold standards, security, regulatory policies, etc.) and to scan and remediate changes against these reference configurations to ensure a high level of infrastructure and application consistency on an ongoing basis;
  4. Service-Oriented Computing - The ability to simplify the complexity of managing a large number of configuration items by modeling services, so that provisioning of new servers and applications or scanning and repair of existing non-compliant servers and applications can occur based on these service models. This enables business requests to be easily translated into operational tasks.
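
The "transaction-safe" behavior in item 2 can be sketched as apply-or-roll-back semantics over a sequence of change steps. The steps below are hypothetical stand-ins for real configuration changes:

```python
# Sketch of transaction-safe change: apply steps in order, and if any step
# fails, undo the completed steps in reverse order to restore prior state.

def apply_transaction(steps):
    """Each step is (name, do, undo). Returns True if all steps commit."""
    completed = []
    for name, do, undo in steps:
        try:
            do()
            completed.append((name, undo))
        except Exception:
            for _done_name, undo_fn in reversed(completed):
                undo_fn()   # roll back a completed step
            return False    # transaction rolled back
    return True             # all steps committed

# Hypothetical two-step change: upgrade an application, then migrate its schema.
state = {"app_version": "1.0", "db_schema": "v1"}

def failing_migration():
    raise RuntimeError("schema migration failed")

steps = [
    ("upgrade app", lambda: state.update(app_version="2.0"),
                    lambda: state.update(app_version="1.0")),
    ("migrate schema", failing_migration,
                       lambda: state.update(db_schema="v1")),
]

ok = apply_transaction(steps)
# The failed migration triggers rollback of the app upgrade, leaving
# `state` at its original values and the application intact.
```

In a real automation platform the do/undo pairs would be package installs, file pushes, and service restarts, executed in the dependency order that multi-tier applications require.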

Fundamentally, the foundation for a highly agile, business-driven IT environment starts with the implementation of a data center automation solution. By doing so, IT organizations can:

  • use one platform to provision, configure and manage all types of servers and applications in the data center;
  • foster strong collaboration between IT teams, with different technology and functional expertise, to accomplish key data center tasks;
  • increase staff productivity and improve cost structure while supporting a large and fast growing data center environment;
  • reduce complexity and better align with the business through the abstraction of IT configuration components into IT services, where management decisions are made at the IT service level;
  • and create an environment that supports an extremely high rate of change while ensuring consistency and application stability, and where data center resources are easily repurposed and reallocated based on changing business needs.

About the Author

Vijay is a co-founder and a member of the Board of Directors of BladeLogic, and is responsible for the company’s overall product strategy and direction. Previously, Vijay led all phases of the company’s development efforts which resulted in BladeLogic’s current product leadership position. Before BladeLogic, Vijay was an entrepreneur-in-residence at Battery Ventures where he spent the bulk of his time working to launch BladeLogic. Earlier, Vijay was the CTO at Breakaway Solutions where he was responsible for all technology initiatives in the ASP and eBusiness lines of businesses. Prior to Breakaway, Vijay was the CTO and co-founder of Eggrock Partners, an ASP/eBusiness-consulting firm that was acquired by Breakaway Solutions.

Vijay is recognized as one of the most thoughtful and experienced software technologists in the industry, with a long track record of success. In his 17-year career, Vijay has held technology positions of increasing responsibility at Unisys, TCI, and Cambridge Technology Partners. He holds a B.S. degree in Mechanical Engineering from Pune University, India. In addition, he serves on several technical advisory boards, holds two patents, and has co-authored a book on eCommerce with two Harvard Business School professors.

About BladeLogic

As the data center has evolved from the client/server to the distributed computing paradigm, solutions for provisioning, change, and configuration management have not kept pace. Management processes are mired in a craftsmanship era, where highly skilled personnel use a collection of scripts and point tools to effect change manually. The result: data center infrastructure is essentially hard-wired because it is so complicated and costly to change. Consequently, server utilization rates are exceedingly low, IT resources cannot be quickly repurposed to respond to changing business requirements, and management and support costs account for up to 80% of data center budgets.

BladeLogic was founded by industry veterans who understand these problems, having managed complex, globally distributed computing environments, including eleven data centers on four continents. Through that experience they identified the pressing need for a new data center automation platform to help IT organizations more efficiently provision, configure, and manage today’s highly sophisticated data center environments.

By providing the industry’s most comprehensive data center automation solution, BladeLogic has more Global 2000 customers using it to dramatically cut data center operating costs, reduce security risks, and increase IT service quality than any other solution on the market. BladeLogic’s data center automation solutions enable a state where business policies and service-level agreements drive the dynamic and automatic optimization of the IT infrastructure, creating a highly agile, business-driven IT environment.