The growth of the Internet, the rising number of consumers conducting business
and purchasing goods online, and the increase in the number of systems deployed
by companies to manage customer data have all significantly increased the volume
of customer data collected by businesses. Dramatic decreases in the cost of storing
and processing data mean that many businesses are retaining customer data much
longer and are using it to create more complete, historical views of customers.
The Internet has also impacted consumer expectations. Consumers now expect
the companies they deal with to be able to access complete, up-to-date information
about their accounts and provide that information at all points of service,
regardless of whether the information is one year or one minute old. These higher
expectations have increased the pressure on businesses to collect, integrate and understand
all customer information housed within their organization.
Most enterprises today appreciate that, unless they have a firm grasp on customers
and how customers fit into the overall "big picture," they could be
missing huge business opportunities to increase revenue, growth and customer
satisfaction. Companies also have increasing concerns about regulatory compliance
issues and new data privacy and security requirements for this growing quantity
of customer data.
To create a complete, single view of each customer, many organizations know
they need to accurately match customer data from multiple sources. They also
know that implementing a customer-centric master data management (MDM) solution
is usually the best approach to achieve these goals. What they are often unsure
about is the long-standing technology question: build or buy?
The answer depends on an organization's own circumstances, goals and objectives,
which need to be considered before making a decision about whether to build
or buy an MDM solution.
To Build or Buy? It Depends
Before deciding on an MDM strategy, an organization
must first answer the following questions about its data and how it will be used:
1. What is the organization's current data volume and how will this volume
increase in the future?
2. How will data be used now and in the future? Will additional types of data
be collected over time that may change matching criteria? Will outside data
be leveraged to augment internal data?
3. What level of data accuracy and completeness is required to support business
goals and will these levels vary by business user?
4. Is real-time access to a complete view of data a requirement? If not, what
level of data latency is acceptable?
5. What resources and budgets, now and in the future, are available to support
an MDM initiative?
6. How much time does the organization have to deploy an MDM solution?
7. How much impact do regulatory compliance issues have on the organization?
Once an organization has answered the questions above, it is ready to take
a closer look at whether to build or buy.
A Closer Look at Build
Most companies will not actually "build" an MDM solution from scratch,
but rather "assemble" different systems to create a multi-step approach
to MDM. These steps usually include the following:
1. ETL (extract, transform and load) tools pull, at a minimum, initial data
loads out of existing repositories and put them into a common format and location.
2. Data quality tools clean up, standardize and fix data, such as inaccuracies
in address and phone number information.
3. Matching and integration tools create a single customer view from multiple
records by resolving those that contain different information about the same
person. Examples include records that use formal names, nicknames, maiden
names, misspelled names or reversed first and last names.
4. Centralized data stores house the clean, integrated data. Depending on an
organization's needs, that data will be used for back-end data analytics, real-time
transactions or both.
5. Message-based integration (web services or message buses) synchronizes data
changes throughout the MDM ecosystem.
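The five assembled steps can be pictured as a chain of separate tools, each handing its output to the next. Below is a minimal Python sketch of that chain under loudly stated assumptions: the two in-memory "source systems," their field names, the nickname table and the exact-match rule are all hypothetical, and a plain callback stands in for a real message bus. It illustrates the pattern only; it is not any vendor's tooling.

```python
import re
from dataclasses import dataclass

# Hypothetical nickname table; a real data quality tool would use a much larger one.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


@dataclass
class Customer:
    source: str
    first: str
    last: str
    phone: str


def extract(crm_rows, billing_rows):
    """Step 1 (ETL): pull rows from two hypothetical sources into one common format."""
    records = [Customer("crm", r["first_name"], r["last_name"], r["phone"]) for r in crm_rows]
    for r in billing_rows:
        first, *_, last = r["name"].split()  # assumes "First [Middle] Last"
        records.append(Customer("billing", first, last, r["tel"]))
    return records


def standardize(c):
    """Step 2 (data quality): normalize case, expand nicknames, keep only phone digits."""
    first = c.first.strip().lower()
    first = NICKNAMES.get(first, first)
    digits = re.sub(r"\D", "", c.phone)[-10:]  # assumes 10-digit North American numbers
    return Customer(c.source, first, c.last.strip().lower(), digits)


def match_key(c):
    """Step 3 (matching): a deliberately simple rule, same first name, last name and phone."""
    return (c.first, c.last, c.phone)


def load(records, on_change=None):
    """Step 4 (central store): group matching records into one entity each.
    Step 5 (sync): call on_change for every entity, standing in for a message bus."""
    store = {}
    for rec in records:
        store.setdefault(match_key(rec), []).append(rec)
    if on_change is not None:
        for key, members in store.items():
            on_change(key, members)
    return store


crm = [{"first_name": "Robert", "last_name": "Smith", "phone": "(555) 123-4567"}]
billing = [{"name": "Bob Smith", "tel": "555.123.4567"},
           {"name": "Ann Lee", "tel": "555-987-6543"}]
published = []
store = load([standardize(c) for c in extract(crm, billing)],
             on_change=lambda key, members: published.append(key))
print(len(store))  # 2: "Robert Smith" and "Bob Smith" become one entity
```

Even in this toy version, each stage depends on the exact output format of the stage before it, which hints at the integration burden a build approach places on the organization assembling it.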
A build solution is usually best for companies with smaller data volumes (fewer
than one million records), straightforward matching criteria and a limited
number of data sources. It is also best for companies that want to use
customer data for back-end analytics or can otherwise accept some data
processing latency.
For most companies, the answer to the question about current and future data
volumes is perhaps the most critical factor when deciding whether to build or buy.
Internal "build" scenarios often fail because the componentized structure does
not scale to handle large data volumes. To match data, a business employing a
"build" strategy must process it through the ETL, quality and matching tools
and then deposit it into a centralized repository. As data volumes grow,
processing time increases faster than linearly, forcing businesses to make
significant investments in additional hardware and staff for data processing.
For example, even a small data set (fewer than one million records) could take
four to five hours a day to process. The resulting data latency would ultimately
require an organization to run the update process continually, which would
increase the resources required and, ultimately, the total cost of ownership.
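One reason for the faster-than-linear growth is easy to quantify. If the matching step naively compares every record with every other record, it performs n(n-1)/2 comparisons, so doubling the data roughly quadruples the work. The sketch below counts comparisons under that all-pairs assumption and, for contrast, under "blocking," a common technique (not described in the original) in which records are compared only within small groups that share a key. The block size of 100 is an arbitrary assumption, and no timings are implied.

```python
def all_pairs_comparisons(n: int) -> int:
    """Comparisons needed if every record is compared with every other record."""
    return n * (n - 1) // 2


def blocked_comparisons(n: int, block_size: int) -> int:
    """Comparisons if records are split into equal-sized blocks and only compared within a block."""
    full_blocks, remainder = divmod(n, block_size)
    return full_blocks * all_pairs_comparisons(block_size) + all_pairs_comparisons(remainder)


for n in (250_000, 500_000, 1_000_000):
    print(f"{n:>9,} records: {all_pairs_comparisons(n):>15,} all-pairs, "
          f"{blocked_comparisons(n, 100):>12,} with blocks of 100")
```

At one million records, the all-pairs approach needs roughly 500 billion comparisons, while blocks of 100 need under 50 million; real blocking schemes produce uneven block sizes, so actual savings vary.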
The components of a build solution are essentially batch processing tools that
have been available to the market for some time and are usually reasonably priced.
However, the responsibility for ensuring the tools work together, which can
be an extremely complicated process, lies squarely on the shoulders of the organization
assembling them. These organizations run the risk of coupling systems too tightly,
which complicates future maintenance. Those that choose to build their
own matching logic introduce yet another layer of risk. In addition, organizations
need to put mechanisms in place to manage data quality as data flow through each component.
Finally, build strategies can significantly increase the exposure and risk
of companies subject to regulatory compliance requirements, such as the Health Insurance
Portability and Accountability Act (HIPAA), the Sarbanes-Oxley Act or the Gramm-Leach-Bliley
Act, among others. A build scenario involves multiple vendors and systems, none
of which bears full responsibility for ensuring that the MDM solution meets compliance
standards, so it is most likely to suit businesses with less compliance risk.
Organizations must take their compliance needs into account when preparing
financial models for MDM, since reduced risk translates to less chance
of being assessed a fine for noncompliance.
When to Buy
While a "build" approach to MDM is appropriate for some businesses,
many others are better served by purchasing an MDM offering from a vendor specializing
in the technology. A buy scenario provides the complete MDM function in a single
package designed to efficiently manage large volumes of customer data and to
reduce or eliminate the latency that results from a build scenario.
In a buy scenario, organizations purchase an MDM solution, install it behind their firewalls
and connect it to the existing systems where customer data is housed. The software
matches and links data, using either deterministic or probabilistic algorithms,
and then provides the integrated customer view back out to consuming applications
(e.g. via web services). MDM solutions that use probabilistic algorithms provide
a higher level of data accuracy and require significantly less time and expense
to deploy than solutions that use deterministic algorithms or those deployed
under the build scenario.
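The difference between the two matching styles can be sketched in a few lines. A deterministic rule requires the listed fields to agree exactly; a probabilistic matcher, here in the style of the classic Fellegi-Sunter model, adds a log-likelihood weight for each agreeing field, subtracts one for each disagreeing field, and compares the total with a threshold. The m/u probabilities, fields and threshold below are invented for illustration (real systems estimate them from data), and this is not a description of any particular vendor's algorithm.

```python
import math

# Hypothetical per-field probabilities:
# m = P(field agrees | same person), u = P(field agrees | different people)
FIELD_PROBS = {
    "last": (0.95, 0.01),
    "first": (0.90, 0.05),
    "birth_date": (0.97, 0.003),
    "zip": (0.85, 0.02),
}
THRESHOLD = 8.0  # hypothetical match cutoff, in bits


def deterministic_match(a, b, fields=("first", "last", "birth_date")):
    """Exact agreement on every listed field, or no match."""
    return all(a[f] == b[f] for f in fields)


def probabilistic_score(a, b):
    """Sum of Fellegi-Sunter style log2 weights over the compared fields."""
    score = 0.0
    for f, (m, u) in FIELD_PROBS.items():
        if a[f] == b[f]:
            score += math.log2(m / u)              # agreement weight
        else:
            score += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return score


a = {"first": "katherine", "last": "jones", "birth_date": "1980-04-02", "zip": "60601"}
b = {"first": "kathryn", "last": "jones", "birth_date": "1980-04-02", "zip": "60601"}

print(deterministic_match(a, b))              # False: first names differ
print(probabilistic_score(a, b) > THRESHOLD)  # True: the other fields outweigh it
```

The toy example shows only that a weighted score can tolerate a single disagreeing field that an exact rule cannot; it does not by itself establish the accuracy advantage claimed above.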
Some purchased MDM solutions can be configured to perform their functions in
real time so that, when a customer presents themselves at a point-of-service
location, data about them are immediately available throughout the organization.
In addition, MDM solutions can come packaged with data steward applications,
which provide additional data management functions.
A "buy" scenario is most appropriate for large enterprises (or those
that aspire to become large enterprises) that need to handle high volumes of
data and require a solution to conduct real-time or near real-time transaction
processing with sub-second response times. As a business grows, either organically
or through acquisition, it continues to add more data from new customers or
existing customers with whom it is now doing more business. An MDM solution
from a specialty vendor is able to manage and work with the influx of data being
collected and processed without bogging down. Many large enterprises have learned
the hard way that they are better served by purchasing a complete MDM solution
and continually finding new ways to use it than by perpetually investing in "build"
scenarios that ultimately prove either too complex or unable to scale.
For example, a large consumer technology organization that lets customers
buy software over the phone and then download it from the web
needs an MDM solution that updates customer data immediately, links it with
other records and makes this data available in less than a second for the next
transaction, regardless of where that transaction takes place.
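A real-time matcher cannot wait for a nightly batch; it has to link each new record as it arrives. One common way to keep that fast is an in-memory index keyed on a blocking value, so that only a handful of candidates are compared per insert. The sketch below assumes a hypothetical blocking key (last name plus ZIP code) and a trivial comparison rule (shared non-empty e-mail or phone); it illustrates the incremental pattern, not any product's internals.

```python
import itertools
from collections import defaultdict


class IncrementalLinker:
    """Links each incoming record to an entity at insert time (hypothetical design)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._blocks = defaultdict(list)  # blocking key -> [(record, entity_id)]
        self.entities = {}                # entity_id -> list of linked records

    @staticmethod
    def _block_key(rec):
        # Hypothetical blocking key: lower-cased last name plus ZIP code.
        return (rec["last"].lower(), rec["zip"])

    @staticmethod
    def _same_person(a, b):
        # Hypothetical rule: within a block, a shared non-empty e-mail or phone is a match.
        email_match = bool(a["email"]) and a["email"].lower() == b["email"].lower()
        phone_match = bool(a["phone"]) and a["phone"] == b["phone"]
        return email_match or phone_match

    def add(self, rec):
        """Insert one record and return its entity id immediately."""
        key = self._block_key(rec)
        for other, entity_id in self._blocks[key]:
            if self._same_person(rec, other):
                break  # reuse the matching record's entity id
        else:
            entity_id = next(self._ids)  # no match in the block: new entity
            self.entities[entity_id] = []
        self._blocks[key].append((rec, entity_id))
        self.entities[entity_id].append(rec)
        return entity_id


linker = IncrementalLinker()
first = linker.add({"last": "Garcia", "zip": "94105", "email": "", "phone": "4155550100"})
second = linker.add({"last": "garcia", "zip": "94105",
                     "email": "m.garcia@example.com", "phone": "4155550100"})
third = linker.add({"last": "Garcia", "zip": "94105", "email": "", "phone": "4155550199"})
print(first, second, third)  # 1 1 2: the web download links to the phone order on arrival
```

Because each record is compared only with the records in its own block, the cost of an insert depends roughly on block size rather than on the total number of records, which is what makes per-transaction linking feasible.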
While a commercial MDM solution may cost more upfront, the total cost of ownership
may be significantly less than with the build alternative. Real-time processing
and the ability to keep up with data influxes reduce the need to add IT staff
to manage the process, which ultimately reduces long-term costs. Creating a
solution from scratch also requires building complex matching algorithms, a
skill set that few developers or companies have. Also, purchasing
an MDM solution from one vendor eliminates the worries created in a "build"
scenario where individual components are upgraded at different times and may
not be compatible with other components.
Additionally, a "buy" scenario is attractive to large enterprises
that are bound by regulatory compliance requirements. By working with one vendor
that is responsible for the full MDM lifecycle, and ultimately for meeting
and maintaining compliance standards, enterprises can reduce their
potential exposure for noncompliance.
And lastly, businesses that are currently using or planning to implement a
service-oriented architecture (SOA) can easily plug in customer identification
and information as a data service. Since SOA environments have the same low-latency
and high-accuracy requirements as a real-time MDM solution, it makes
sense for businesses leveraging an SOA to buy an MDM solution
that already has this structure built in and can serve customer data as
part of the SOA.
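Exposing the integrated view "as a data service" usually means a small, stable interface that other services call by identifier. The following is a minimal sketch using Python's standard WSGI interface; the /customers/<id> route, the in-memory GOLDEN_RECORDS dictionary and the JSON field names are hypothetical stand-ins for whatever an MDM product actually publishes.

```python
import json

# Hypothetical integrated ("golden") records produced by the matching step.
GOLDEN_RECORDS = {
    "1": {"entity_id": "1", "name": "Robert Smith", "phone": "5551234567",
          "sources": ["crm", "billing"]},
}


def customer_service(environ, start_response):
    """WSGI app: GET /customers/<id> returns the integrated customer view as JSON."""
    parts = environ.get("PATH_INFO", "").strip("/").split("/")
    if environ.get("REQUEST_METHOD") == "GET" and len(parts) == 2 and parts[0] == "customers":
        record = GOLDEN_RECORDS.get(parts[1])
        if record is not None:
            body = json.dumps(record).encode("utf-8")
            start_response("200 OK", [("Content-Type", "application/json"),
                                      ("Content-Length", str(len(body)))])
            return [body]
    body = b'{"error": "not found"}'
    start_response("404 Not Found", [("Content-Type", "application/json"),
                                     ("Content-Length", str(len(body)))])
    return [body]


# In-process check without starting a server:
captured = []
body = customer_service({"REQUEST_METHOD": "GET", "PATH_INFO": "/customers/1"},
                        lambda status, headers: captured.append(status))
print(captured[0], json.loads(b"".join(body))["name"])  # 200 OK Robert Smith
```

To try it over HTTP, the standard library's `wsgiref.simple_server.make_server("", 8000, customer_service)` followed by `serve_forever()` is enough for local testing; a production deployment would run behind a full WSGI server.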
The case can be made for both "build" and "buy" MDM scenarios,
depending on the organization making the choice and its requirements. For a
business to determine its appropriate course, it must take into consideration
business goals such as growth, acceptable accuracy and latency levels, how data
will be used, availability of financial and human resources, and how heavily
regulatory compliance governs its business.
Businesses with a small number of stable data sets that will use customer
information for back-end analytics, and can therefore tolerate some latency
and lower accuracy rates, might be well served by a build scenario. On the flip
side, enterprises that have a large number of data sets (or plan to grow
this number), use data for real-time transactions, cannot tolerate latency
and require highly accurate results would be better served by working with a single
vendor and buying an MDM solution.
About the Author
Marty Moseley serves as chief technology officer at Initiate Systems where he is responsible for the company’s strategic technology direction, development and future product evolution. Initiate Systems, Inc. is the leading provider of customer-centric master data management software for companies and government agencies that want to create the most complete, real-time views of people, households and organizations from data dispersed across multiple application systems and databases. He can be reached at email@example.com and additional information on Initiate Systems is available at www.initiatesystems.com.