Photo sharing has become one of the most popular Internet applications with
sites such as Flickr, Shutterfly and SmugMug making it a "snap" to
upload and share pictures. Millions of people now access and post billions of
photos every day. Facebook, for example, reports that its users upload more
than 10 billion photos each month.
Photo sharing is also working its way into other kinds of sites, including
recommendations, e-commerce product reviews, and user-contributed news. So the
challenge of handling an ever-increasing number of photos now applies to a
larger range of web properties.
To support the unpredictable yet ever-increasing number of uploads and views,
photo hosting applications generate each photo in several sizes, such as
thumbnails and previews, to speed up viewing. Add this burden to the lack of
control over user uploads, and the underlying infrastructure is severely taxed.
It has to store and retain multiple sizes of each photo while also serving
millions to billions of photo files to a worldwide audience.
For web companies on the rise, preparing for this data growth can be challenging.
Innovative architectures are imperative to ensure peak performance, as old systems
and approaches simply cannot handle the workloads. New distributed systems
and optimized small-file serving resolve these challenges for any web property
relying on photos to enrich its application.
Storage challenges and solutions for large scale photo sharing
Let's take a closer look at a few of the key challenges in building and scaling
large photo sharing applications, along with potential solutions.
Managing small files (and lots of them)
The sheer number of contributed photos, multiplied by the different sizes kept
on hand, quickly mushrooms into a small-file handling mess. Many file systems
and storage architectures developed just a few years ago are not capable of
managing small files efficiently.
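To make the multiplication concrete, here is a minimal Python sketch of the arithmetic. The function names, the photo count, and the per-size byte figures are illustrative assumptions, not numbers from any real site.

```python
def total_files(photo_count: int, variants_per_photo: int) -> int:
    """Number of files stored when every photo is kept in several sizes."""
    return photo_count * variants_per_photo


def total_bytes(photo_count: int, variant_sizes_bytes: dict) -> int:
    """Total storage if every photo keeps one file per listed variant."""
    return photo_count * sum(variant_sizes_bytes.values())


# Hypothetical per-variant sizes in bytes (assumptions for illustration only).
ASSUMED_SIZES = {"thumbnail": 5_000, "preview": 50_000, "original": 2_000_000}

if __name__ == "__main__":
    photos = 1_000_000  # assumed number of uploaded photos
    print(total_files(photos, len(ASSUMED_SIZES)), "files")
    print(total_bytes(photos, ASSUMED_SIZES), "bytes")
```

Under these assumed figures, most of the files are small thumbnails and previews, while the originals account for most of the bytes. That combination is what makes file count, rather than raw capacity, the harder problem.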
Two major sources of trouble are the delay in scanning a directory that holds
many files and the time it takes to traverse many directory levels. When
retrieving a JPEG preview, these taxing operations create system delays.
Traditional systems also tried to brute-force the small-file problem with many
small, fast drives, RAID striping, and memory over-provisioning, but this
quickly leads to excessive hardware costs.
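The trade-off between files per directory and directory depth can be sketched in Python. The example assumes a hash-sharded directory layout, which is a common convention and not something this article prescribes. The function names, the `.jpg` naming scheme, and the default fan-out are all hypothetical.

```python
import hashlib


def sharded_path(photo_id: str, variant: str, levels: int = 2, width: int = 2) -> str:
    """Build a relative path like 'ab/cd/<photo_id>_<variant>.jpg'.

    Directory names come from the leading hex characters of a SHA-1 digest
    of the photo ID, so files spread roughly evenly across directories.
    """
    digest = hashlib.sha1(photo_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts + [f"{photo_id}_{variant}.jpg"])


def avg_files_per_directory(total_files: int, levels: int = 2, width: int = 2) -> float:
    """Average files per leaf directory for the layout above."""
    leaf_dirs = (16 ** width) ** levels  # each hex character has 16 values
    return total_files / leaf_dirs


if __name__ == "__main__":
    print(sharded_path("12345", "thumb"))
    # More levels shrink each directory but add lookups per file access.
    for levels in (1, 2, 3):
        print(levels, avg_files_per_directory(3_000_000, levels=levels))
```

Adding levels reduces the number of files in each directory but adds a directory lookup to every access, which mirrors the two sources of delay described above.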