
The Performance Principle

Russell Rothstein

After Amazon: 5 Ways IT Can Protect Itself from the Next Cloud Outage


Follow Russell on twitter at @RussRothsteinIT

The Amazon cloud outage is a wake-up call for IT staff who are not adequately prepared for the journey to the cloud. Planning the migration of applications to any type of cloud - public or private, on-premise or off-premise - requires appropriate service management processes and infrastructure. Otherwise, you risk being unable to manage, or even understand, the business impact of future cloud outages.

When talking about business services in the cloud, it's almost impossible to avoid the obvious play on words: when you move to the cloud, you lose visibility. In order to meet SLAs, maintain a quality user experience, and resolve problems quickly, you need a clear picture of your services as they traverse each hop of the infrastructure. But in the cloud, where resources are virtualized and allocated dynamically, you often have little idea where services are running.

The Amazon cloud outage demonstrates the point. When the outage occurred, the EC2 dashboard could not tell customers how their applications and services were performing. It did not provide round-trip transaction times or report on the user experience. Instead, it reported various problems with latency and errors that were eventually linked to the cloud storage service. Those KPIs did not tell EC2 customers how the outage was affecting their business. In fact, according to Amazon, the outage was not even a violation of customer SLAs - even though many sites went down completely.

Cloud computing requires a sophisticated approach to Business Service Management that enables you to track services from the data center and into the cloud. This post looks at 5 key capabilities that organizations must have in order to maintain visibility and control in the cloud:

1. INTEGRATED, END-TO-END SERVICE VIEW

In the cloud more than ever, you need a top-down view of your business services, end-to-end. The service cannot be a black box; instead, you need a topological map that shows the execution of each service - also called a business transaction - as it traverses every server in the private and public cloud. As we saw last week, it is critical to build redundancy and not to rely on a single cloud provider for all of your needs, so you need a solution that can track complex hybrid architectures, even between clouds.

You need to see the performance not only round-trip, but on each leg of the journey. This is the only way to assure SLAs on the one hand, and to quickly identify the source of performance degradation on the other. Ideally, your solution will also provide some deep-dive capabilities so that in addition to identifying the problem tier, it will also lead you to the source of the problem.
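As a rough illustration of the difference between a round-trip number and a per-leg view, here is a minimal Python sketch. The tier names and timings are invented for the example, and the helper functions are hypothetical; they do not come from any particular monitoring product.

```python
# Hypothetical per-leg timings (in milliseconds) for one business transaction.
# Tier names and values are made up for illustration only.
legs = [
    ("web", 12.0),
    ("app-server", 30.0),
    ("cloud-storage", 410.0),
    ("database", 25.0),
]

def round_trip_ms(legs):
    """Total time across all legs: the single number an SLA usually tracks."""
    return sum(duration for _, duration in legs)

def slowest_tier(legs):
    """Name of the leg that contributed the most time to the transaction."""
    return max(legs, key=lambda leg: leg[1])[0]

print(f"round trip: {round_trip_ms(legs)} ms")
print(f"slowest tier: {slowest_tier(legs)}")
```

The round-trip figure alone says only that the transaction is slow; the per-leg breakdown points to the tier where the time is actually being spent.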

2. DYNAMIC SERVICE DISCOVERY

Since dynamic resource allocation is a cornerstone of the cloud ROI model, the path of a service or transaction in the cloud changes continually. If your monitoring solution requires manual definition of services, it is very likely that it will not work properly in this type of environment.

To ensure accuracy and to save valuable time, it is important to choose a solution that automatically identifies business services and maintains a dynamic picture of service delivery.

3. REAL END-USER EXPERIENCE MONITORING

One of the most important indicators of application health is the experience of real end-users. Synthetic transactions can provide an important indicator during quiet times, but they cannot tell you what all of your users are experiencing, all of the time. Setting up a real-user monitoring solution in the cloud can be complicated since you do not necessarily control the point on the network between the application and your users. You should make sure that your monitoring solution can track real-user transactions in any cloud configuration. This is a crucial piece of information that puts the technical information from your cloud services provider into business context.

4. CHANGE MANAGEMENT

Even in the datacenter, change is probably the greatest risk to service stability. That risk is magnified in the cloud, where any change to code, hardware, or configuration can affect the behavior and performance of business services in unpredictable ways. Again, the Amazon outage shows us that even in the cloud, you may have to make some fast decisions and changes in order to keep your critical services online.

To mitigate the danger, you need a monitoring solution that can baseline service performance and analyze the impact of change on a wide variety of parameters. It's important to choose a solution that captures all transaction instances - and does not rely on sampling - so that you can accurately analyze problems and find root causes that occurred before a service level alarm would have been triggered.
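To make the idea of baselining concrete, here is a minimal Python sketch that compares latencies recorded after a change against a baseline taken before it. The function names, the sample values, and the three-standard-deviation threshold are all assumptions chosen for illustration; a real solution would track many more parameters than a single latency figure.

```python
import statistics

def baseline(latencies_ms):
    """Mean and population standard deviation of pre-change latencies."""
    return statistics.mean(latencies_ms), statistics.pstdev(latencies_ms)

def regressed(before_ms, after_ms, k=3.0):
    """True if the post-change mean exceeds the baseline mean by more than
    k standard deviations. The threshold k is an assumed value."""
    mean, stdev = baseline(before_ms)
    return statistics.mean(after_ms) > mean + k * stdev

# Hypothetical latencies (ms) for every transaction instance, not a sample.
before = [100, 102, 98, 101, 99]
after_bad_change = [120, 118, 125, 122, 119]
after_safe_change = [101, 99, 100, 102, 100]

print(regressed(before, after_bad_change))
print(regressed(before, after_safe_change))
```

Feeding the comparison every transaction instance, rather than a sample, is what lets a shift like this show up before a service level alarm would fire.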

5. EFFECTIVE COMMUNICATIONS

One of the biggest obstacles to the cloud is the - understandable - fear of business owners that performance and usability will decline. Many application owners are concerned about the risks of sharing resources and are reluctant to accept the standardization and loss of control inherent in the cloud model. Unfortunately, well-publicized events such as the Amazon outage will only exacerbate those fears.

Yet the benefits of the cloud are real, and IT must be able to not only mitigate the risks of outages, but also to demonstrate the benefits to a business audience. You need a solution that measures performance and user experience, and can communicate them in a robust and intuitive fashion.


Russell Rothstein blogs about cloud computing, performance management, business service management and related topics, examining how new technologies and business models impact the dynamic IT service management market.

Russell Rothstein

Russell Rothstein has spent his 20+ year career in the enterprise technology industry at the crossroads between technology and business. He has spoken at industry events including Interop, CloudConnect, CMG, Red Herring, and TeleManagement World. Russell is currently Founder and CEO of IT Central Station, a B2B social networking site that provides user reviews and ratings of enterprise software, hardware and services. Previously, Russell was Vice President of Product Marketing at OpTier, a vendor of application performance management (APM) solutions. Before joining OpTier, Russell was AVP Product Marketing at OPNET Technologies (Nasdaq: OPNT) where he helped lead the company’s focus into APM. He was co-founder and CEO of Zettapoint, a venture-backed enterprise software startup that was acquired by EMC, and ran marketing for Open Sesame, a Web 1.0 startup that was acquired by Bowne/RR Donnelley (NYSE:BNE). Russell began his career at Oracle, deploying Oracle Applications for Fortune 1000 companies. Russell received a BA in Computer Science from Harvard University, an MS in Technology and Policy from MIT and an MS in Management from the MIT Sloan School of Management.
