As the amount and variety of data that constitute an organization's information
grow exponentially, it becomes more difficult to manage and protect that
data. Data deduplication technology, which is maturing in the backup space and
emerging in nearline and primary storage solutions, is becoming a necessity for
any organization managing large amounts of data. But why is data deduplication
so critical? Simple: it has the potential to change the economics of data management
and data protection solutions by greatly reducing the costs of storing and transferring
data.
An IT manager drawn to the promise of deduplication will want to know how data
deduplication delivers disruptive benefits, how it compares to other reduction
technologies, what its limitations are and what customers must know when deploying
deduplication solutions.
Deduplication compared to other data reduction technologies
First-generation data reduction technologies such as data compression and file
single instancing have been available for some time and have been widely adopted.
Although these technologies can reduce data footprint and transfer costs, the
gains are modest, usually a reduction in the range of 2-8X.
The goal of most compression schemes is to reduce the footprint of a single
file by removing redundancies and efficiently encoding the results. Little is
done to address duplicate or versioned files. Emerging application-aware compression
technologies can deliver significantly improved reduction rates for narrow classes
of data, but often with significant performance penalties and little or no improvement
in handling cross-file redundancies.
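The limitation is easy to see in a minimal Python sketch. It assumes a hypothetical, highly repetitive sample file and uses the standard zlib module as a stand-in for a generic per-file compression scheme. Each file is compressed on its own, so a single file shrinks well, but storing three identical copies still costs three times as much.

```python
import zlib

# Hypothetical sample "file": repetitive text, so it compresses well on its own.
document = b"quarterly report: revenue up, costs down. " * 500


def stored_size_per_file(files):
    """Total bytes stored when every file is compressed independently."""
    return sum(len(zlib.compress(data)) for data in files)


one_copy = stored_size_per_file([document])
three_copies = stored_size_per_file([document, document, document])

print("raw size of one file:       ", len(document))
print("compressed, one copy:       ", one_copy)
print("compressed, three copies:   ", three_copies)
# Per-file compression has no view across files, so duplicates are paid for in full.
print("three copies cost 3x one copy:", three_copies == 3 * one_copy)
```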
The goal of file single instance storage (SIS) technologies is to reduce the
data footprint of a repository by eliminating redundant files and replacing
them with references. But file level SIS does little to reduce the footprint
of a single file and doesn't handle sub-file redundancies. Block level SIS technologies
improve reduction ratios by eliminating duplicate fixed length blocks across
files in a repository, but results are still limited. File level changes that
affect data alignment (e.g. byte insertions) typically defeat fixed block schemes.
Similarly, redundant data in dissimilar files is rarely aligned on block boundaries,
which further limits block level SIS technologies and makes them a poor choice for
versioned environments.
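The behavior of both SIS variants can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the 4 KB block size, the SHA-256 content hash, the in-memory dictionary standing in for a repository, and the pseudo-random sample data. It does not describe any particular product.

```python
import hashlib
import random

BLOCK_SIZE = 4096  # assumed fixed block length, chosen only for illustration


def file_sis(files):
    """File level SIS: keep one copy of each distinct file, keyed by content hash."""
    store = {}
    for data in files:
        store.setdefault(hashlib.sha256(data).hexdigest(), data)
    return store


def block_sis(files, block_size=BLOCK_SIZE):
    """Block level SIS: keep one copy of each distinct fixed length block."""
    store = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return store


def stored_bytes(store):
    """Bytes actually kept in the store (references are ignored)."""
    return sum(len(chunk) for chunk in store.values())


# Hypothetical 64 KiB file of pseudo-random bytes (seeded for repeatability).
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))

# Version with one byte overwritten in place: alignment is preserved.
edited = original[:100] + bytes([original[100] ^ 0xFF]) + original[101:]

# Version with one byte inserted at the front: every block boundary shifts.
shifted = b"\x00" + original

print("file SIS, exact duplicate:  ", stored_bytes(file_sis([original, original])))
print("file SIS, one byte changed: ", stored_bytes(file_sis([original, edited])))
print("block SIS, one byte changed:", stored_bytes(block_sis([original, edited])))
print("block SIS, one byte inserted:", stored_bytes(block_sis([original, shifted])))
```

Under these assumptions, file level SIS saves space only for the exact duplicate. Block level SIS absorbs the in-place overwrite at the cost of one extra block, but the single inserted byte shifts every block boundary, so no block of the second version matches the first and both versions are stored almost in full.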