Enabling Enterprises for Cloud-Scale Deployments
Gary Orenstein, Vice President, Technical Solutions, MaxiScale
Mark Balch, Director of Product Management, MaxiScale
Within the last five years, enterprises have adopted LAMP (Linux, Apache, MySQL,
PHP) stacks to deploy both internal and external Web applications. This new approach
has dramatically reduced the costs of application management and deployment, not
only for these organizations but also for their customers, employees, and partners.
However, an underlying issue with this framework is the lack of an adequate storage
infrastructure to support exploding data requirements. Traditional file servers
and storage systems are simply not built to handle the high data demands and
scale required by enterprise Web applications. Consider the retail operations
of Walmart.com handling more than half a million unique visitors daily, the
customer-facing tracking options for FedEx with more than a quarter million
daily visitors, or the collaboration and intranet requirements of a company
as large as General Electric. These types of applications drive unprecedented
needs for fast, economical and scalable file serving.
Enterprise Web applications require a new and innovative approach to file system
infrastructures to enable simple scalability and reliable application performance.
This article will outline the enterprise challenges inherent in legacy storage
and file systems and give real-world examples of the key benefits of distributed
file systems built to support cloud-scale deployments.
Challenges and Requirements: Enterprise Scale-Out Deployments
Enterprises continually struggle to find file serving and storage solutions whose
costs, performance, and feature sets can keep up with unpredictable and rapidly
expanding workloads. Because today's applications differ significantly from those
of just a few years ago, enterprises face a new set of challenges and requirements.
In 2009, Enterprise Strategy Group conducted a study on scale-out network attached
storage solutions. The results showed that the most frequently mentioned considerations were:
- Faster storage provisioning times
- Improved scalability
- Easier to manage
- Improved data availability
Rapid provisioning and scale
Whereas applications that served only a small set of users had moderate needs
for rapid provisioning and scale, today's Web applications reach unprecedented
levels of data growth, both in terms of overall capacity and the number of files
or objects managed. This causes significant pain for IT administrators, who
must constantly provision new capacity and performance. If provisioning requires
the deployment of new systems, followed by manual oversight to load balance
across those systems, it will be impossible to keep up.
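To see why, consider a toy placement scheme. The following sketch (a generic illustration with assumed node and object counts, not any particular vendor's algorithm) shows that when objects are placed by a naive hash-modulo rule, adding even one node forces roughly 90% of them to relocate:

    # Illustrative sketch: the rebalancing cost of growing a naively
    # partitioned storage tier. Node and object counts are assumptions.

    def placement(obj_id: int, num_nodes: int) -> int:
        """Naive placement: an object lives on hash(id) mod node count."""
        return hash(obj_id) % num_nodes

    NUM_OBJECTS = 1_000_000
    OLD_NODES, NEW_NODES = 10, 11  # grow the cluster by a single node

    moved = sum(
        1 for obj in range(NUM_OBJECTS)
        if placement(obj, OLD_NODES) != placement(obj, NEW_NODES)
    )
    print(f"{moved / NUM_OBJECTS:.0%} of objects must move")  # about 91%

Every relocated object represents I/O and administrator attention that a manually balanced system must absorb on each expansion.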
In addition, many file serving and storage systems have caps on the number
of nodes, overall capacity, or maximum file delivery performance they can achieve.
These inherent limitations can quickly cause troublesome and costly roadblocks
to effective scaling.
Ease of management
As with provisioning and scale, IT professionals must be able to manage a growing
installation without adding management tasks. Conventional systems often operate
as "islands of storage," requiring individual mount points as well as individual
management. The explosion of managed entities easily outpaces the ability of a
single administrator to keep up, limiting application growth.
Continuous availability
Scale-out applications typically support a large number of end users who require
continuous uptime. Solutions that require offline software upgrades, or that
cannot undergo online hardware refreshes, negatively impact the application.
Basics of a Cloud-Scale Solution
Cloud-scale file serving and storage solutions support Web companies, service
providers, and enterprises architecting scale-out, Web-based applications that
may in turn be offered as services to their customers.
These solutions must scale in a manner that decreases management cost per unit
to meet the economic challenges of serving large user bases with potentially
explosive amounts of data growth.
True scale-out -- commodity hardware + software, no bottlenecks
As evidenced by Internet giants such as Amazon, Facebook, Google, and Yahoo!,
the models to support scale-out applications involve large numbers of commodity
hardware server nodes throughout the compute and storage layers. These solutions
are architected to handle millions of simultaneous requests across billions
of files, and therefore distribute metadata operations across many inexpensive
hardware nodes to achieve high throughput at low cost. This eliminates the
centralized choke points a storage system would otherwise rely on to locate and
retrieve requested data, and removes the crippling performance bottlenecks
typical of conventional systems.
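One widely used technique for decentralizing placement in this way is consistent hashing, which lets any node compute an object's location independently, with no central lookup table. The article does not detail MaxiScale's internal design, so the sketch below is generic; the node names, virtual-node count, and digest choice are illustrative assumptions:

    import bisect
    import hashlib

    class HashRing:
        """Minimal consistent-hash ring for spreading files across nodes."""
        VNODES = 100  # virtual nodes per server smooth out the distribution

        def __init__(self, nodes):
            # Place VNODES points per server on the ring, sorted by hash.
            self._ring = sorted(
                (self._digest(f"{node}#{v}"), node)
                for node in nodes for v in range(self.VNODES)
            )
            self._keys = [h for h, _ in self._ring]

        @staticmethod
        def _digest(key: str) -> int:
            return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

        def node_for(self, path: str) -> str:
            """Walk clockwise from the path's hash to the next ring point."""
            i = bisect.bisect(self._keys, self._digest(path)) % len(self._keys)
            return self._ring[i][1]

    ring = HashRing([f"node{n}" for n in range(10)])
    print(ring.node_for("/users/alice/photo.jpg"))  # any client computes this

Because only the ring points belonging to a new node change hands when it joins, expansion moves a small, bounded fraction of data instead of nearly all of it.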
Single management point, single namespace
Scaling performance and capacity seamlessly requires unified management where
any node in the system can act as a management node of the entire cluster. This
simplifies operations and keeps management overhead constant while accommodating
increasing application loads and datasets. By making the entire system available
within a single namespace, administrators can manage a single mount point for
applications instead of having to manually load balance and reallocate among
a set of independent devices.
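The contrast is easy to see in miniature. In this sketch (the mount points, share names, and file names are invented for illustration), an application spanning islands of storage must track which mount holds which dataset, while a single namespace collapses that bookkeeping into one root:

    # Conventional islands: one mount point per filer, tracked by the app.
    ISLANDS = {
        "/mnt/filer1": ["users_a_to_m"],
        "/mnt/filer2": ["users_n_to_z"],
        "/mnt/filer3": ["logs", "images"],
    }

    def island_path(dataset: str, name: str) -> str:
        """Application-side lookup: which island holds this dataset?"""
        for mount, datasets in ISLANDS.items():
            if dataset in datasets:
                return f"{mount}/{dataset}/{name}"
        raise FileNotFoundError(dataset)

    def namespace_path(dataset: str, name: str) -> str:
        """Single namespace: one mount point; placement is the system's job."""
        return f"/mnt/cluster/{dataset}/{name}"

    print(island_path("logs", "web.log"))     # /mnt/filer3/logs/web.log
    print(namespace_path("logs", "web.log"))  # /mnt/cluster/logs/web.log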
Withstand drive and node failures automatically
Cloud-scale solutions must withstand the drive and node failures that inevitably
occur, without disrupting the application. Commodity hardware is inexpensive, but it is not
immune to physical failures. Self-healing systems automatically detect hardware
failures and repair the system without loss of data or access to the affected
data. Data replication often plays a key role in enabling this capability, ensuring
that data is always available and eliminating single points of failure.
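A minimal sketch of that self-healing behavior follows; the replica count, node names, and repair policy are assumptions chosen for illustration, not a description of any specific product:

    import random

    REPLICAS = 3  # assumed replication factor

    class Cluster:
        def __init__(self, nodes):
            self.nodes = set(nodes)
            self.replicas = {}  # file path -> set of nodes holding a copy

        def write(self, path: str) -> None:
            """Place REPLICAS copies on distinct nodes; no one node is fatal."""
            self.replicas[path] = set(random.sample(sorted(self.nodes), REPLICAS))

        def fail(self, node: str) -> None:
            """On detecting a failed node, re-replicate its data elsewhere."""
            self.nodes.discard(node)
            for holders in self.replicas.values():
                holders.discard(node)
                spares = self.nodes - holders
                while len(holders) < REPLICAS and spares:
                    holders.add(spares.pop())

    cluster = Cluster([f"node{n}" for n in range(6)])
    cluster.write("/users/alice/photo.jpg")
    cluster.fail("node3")  # data stays fully replicated on surviving nodes
    assert all(len(h) == REPLICAS for h in cluster.replicas.values())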
Scale-out cloud applications serve users 24 hours a day, making downtime
for upgrades an unaffordable luxury. Successful cloud applications deploy flexible
storage systems that can be upgraded and expanded seamlessly, while applications
continue to access data. Administrators adjust the storage system as the business
requires without worrying about scheduling an outage window simply to add capacity
or upgrade hardware and software components.
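In practice this usually takes the form of a rolling upgrade: because every file has replicas elsewhere, nodes can be drained and refreshed one at a time while the rest of the cluster keeps serving. The skeleton below is a generic sketch; the drain, upgrade, and health_check hooks stand in for whatever a given system actually provides:

    import time

    def rolling_upgrade(nodes, drain, upgrade, health_check, poll=5.0):
        """Upgrade a cluster with no outage window, one node at a time."""
        for node in nodes:
            drain(node)    # stop routing new requests to this node
            upgrade(node)  # apply the software or hardware change
            while not health_check(node):
                time.sleep(poll)  # wait for the node to rejoin and re-sync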
The overall strategy involves an investment in software resiliency that ultimately
allows applications to transcend hardware implementations. One can envision
an application that remains online through numerous hardware refreshes during
its lifetime.
Cloud-scale solutions can serve large numbers of end users with explosive data
requirements, keep the cost of service delivery to a minimum, and provide the
availability that keeps users happy and administrators nimble. Web companies,
service providers, and enterprises can keep these guidelines in mind when
evaluating current and future solutions.
About the Authors
Gary Orenstein is vice president, technical solutions at MaxiScale. Orenstein, who has extensive data center infrastructure and network storage experience, has served in leadership marketing roles at numerous networking and storage companies. In addition to being a regular contributor to GigaOM, Orenstein hosts the podcast The Cloud Computing Show. Orenstein is the author of IP Storage Networking: Straight to the Core. He holds an MBA from the Wharton School at the University of Pennsylvania, and a BA from Dartmouth College.
Mark Balch has served in leadership roles for market-winning storage, networking and data center automation products at early stage ventures and established technology firms. Prior to MaxiScale, he led the flagship Server Automation product line at Opsware, which was acquired by Hewlett Packard. He has held product management and development positions at Topspin Communications (acquired by Cisco), Nishan Systems (acquired by McData/Brocade) and C-Cube Microsystems (acquired by Harmonic). Mr. Balch earned a bachelor's degree in electrical engineering from The Cooper Union.