Master Data Management Systems: Build or Buy?

The growth of the Internet, the rising number of consumers conducting business and purchasing goods online, and the increase in the number of systems deployed by companies to manage customer data have all significantly increased the volume of customer data collected by businesses. Dramatic decreases in the costs to store and process data mean that many businesses are retaining customer data much longer and are using it to create more complete, historical views of customers.

The Internet has also impacted consumer expectations. Consumers now expect the companies they deal with to be able to access complete, up-to-date information about their accounts and provide that information at all points of service, regardless of whether the information is one year or one minute old. New expectation levels have increased pressure on businesses to collect, integrate and understand all customer information housed within their organization.

Most enterprises today appreciate that, unless they have a firm grasp on customers and how customers fit into the overall "big picture," they could be missing huge business opportunities to increase revenue, growth and customer satisfaction. Companies also have increasing concerns about regulatory compliance issues and new data privacy and security requirements for this growing quantity of customer data.

To create a complete, single view of each customer, many organizations know they need to accurately match customer data from multiple sources. They also know that implementing a customer-centric master data management (MDM) solution is usually the best approach to achieve these goals. What they are often confused about is that long-standing question regarding technology: to build or to buy? The answer depends on an organization's own circumstances, goals and objectives, which need to be considered before making a decision about whether to build or buy an MDM solution.

To Build Or Buy? It Depends
Before deciding on an MDM strategy, an organization must first answer the following questions about its data and how it will be used:

1. What is the organization's current data volume and how will this volume increase in the future?
2. How will data be used now and in the future? Will additional types of data be collected over time that may change matching criteria? Will outside data be leveraged to augment internal data?
3. What level of data accuracy and completeness is required to support business goals and will these levels vary by business user?
4. Is real-time access to a complete view of data a requirement? If not, what level of data latency is acceptable?
5. What resources and budgets, now and in the future, are available to support an MDM initiative?
6. How much time does the organization have to deploy an MDM solution?
7. How much impact do regulatory compliance issues have on the organization?

Once an organization has answered the questions above, it is ready to take a closer look at whether to build or buy.

A Closer Look at Build
Most companies will not actually "build" an MDM solution from scratch, but rather "assemble" different systems to create a multi-step approach to MDM. These steps usually include the following:

1. ETL (extract, transform and load) tools pull, at a minimum, initial data loads out of existing repositories and put them into a common format and location.
2. Data quality tools clean up, standardize and fix data, such as inaccuracies in address and phone number information.
3. Matching and integration tools create a single customer view from multiple records by resolving those that contain different information about the same person, such as records that include formal names, nicknames, maiden names, misspelled names or name reversals.
4. Centralized data stores house the clean, integrated data. Depending on an organization's needs, that data will be used for back-end data analytics, real-time transactions or both.
5. Message-based integration (web services or message buses) synchronizes data changes throughout the MDM ecosystem.
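
To make the assembled steps concrete, the following is a minimal sketch in Python of the quality, matching and storage stages working on three sample records. Everything in it is an assumption made for illustration: the field names, the tiny NICKNAMES table, the sample data and the rule that two records match when their names agree in either order and their phone digits agree. Real tools use far richer rules.

```python
import re

# Hypothetical nickname table; a real deployment would use a much larger one.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def standardize(record):
    """Data quality step: normalize case, nicknames and phone formatting."""
    first = record["first"].strip().lower()
    last = record["last"].strip().lower()
    phone = re.sub(r"\D", "", record["phone"])  # keep digits only
    return {
        "first": NICKNAMES.get(first, first),
        "last": last,
        "phone": phone,
        "source": record["source"],
    }


def match_key(record):
    """Matching step: an order-insensitive name key tolerates name reversals."""
    return (frozenset([record["first"], record["last"]]), record["phone"])


def build_master(records):
    """Central store: one entry per matched customer, listing its sources."""
    master = {}
    for rec in map(standardize, records):
        entry = master.setdefault(
            match_key(rec),
            {"name": (rec["first"], rec["last"]), "phone": rec["phone"], "sources": []},
        )
        entry["sources"].append(rec["source"])
    return list(master.values())


# Records as they might arrive from three source systems (the ETL output).
records = [
    {"first": "Bill", "last": "Smith", "phone": "(512) 555-0100", "source": "crm"},
    {"first": "Smith", "last": "William", "phone": "512.555.0100", "source": "billing"},
    {"first": "Liz", "last": "Jones", "phone": "512-555-0199", "source": "web"},
]
master = build_master(records)
for customer in master:
    print(customer)
```

In this sketch the first two records collapse into one customer because the nickname and the reversed name are resolved before matching, while the third record stays separate. Misspellings and maiden names would need additional logic that is not shown here.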

A build solution is usually best for companies with smaller data volumes (fewer than one million records), straightforward matching criteria and a limited number of data sources. It is also best for companies that want to use customer data for back-end analytics or can otherwise accept certain levels of data processing latency.

For most companies, the answer to the question about current and future data volumes is perhaps the most critical when deciding whether to build or buy. Internal "build" scenarios often fail because the componentized structure does not scale to handle large data volumes. To match data, a business employing a "build" strategy must process it through the ETL, quality and matching tools and then deposit it into a centralized repository. As data volumes grow, processing time increases faster than linearly, forcing businesses to invest heavily in additional hardware and staff for data processing. For example, even a small data set (fewer than one million records) could take four to five hours a day to process, creating data latency that would ultimately require an organization to run the update process continually. The need for constant updates would increase the resources required and, ultimately, the total cost of ownership.
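
One plausible reason for faster-than-linear growth, offered here as an assumption rather than something the article states, is that a naive matching step compares every record with every other record. The short Python sketch below counts those pairwise comparisons for a few illustrative data set sizes.

```python
def pairwise_comparisons(n):
    """Number of distinct record pairs a naive all-against-all match must check."""
    return n * (n - 1) // 2


# Illustrative data set sizes; each is ten times larger than the previous one.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} records -> {pairwise_comparisons(n):>15,} comparisons")
```

Under this assumption, ten times as many records means roughly one hundred times as many comparisons. Practical matching tools reduce the work by comparing only records that share some key, but the underlying trend is the same.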

The components of a build solution are essentially batch processing tools that have been available to the market for some time and are usually reasonably priced. However, the responsibility for ensuring the tools work together, which can be an extremely complicated process, lies squarely on the shoulders of the organization assembling them. These organizations run the risk of too tightly coupling systems together, which complicates future maintenance. Those that choose to build their own matching logic introduce yet another layer of risk. In addition, organizations need to put mechanisms in place to manage data quality as data flow through the system.

Finally, build strategies can significantly increase the exposure and risk of companies impacted by regulatory compliance issues, such as the Health Insurance Portability and Accountability Act (HIPAA), the Sarbanes-Oxley Act or the Gramm-Leach-Bliley Act, among others. Businesses with less compliance risk are most likely to choose a build scenario, where they work with multiple vendors and systems, none of which bears full responsibility for ensuring the MDM solution meets compliance standards. Organizations must take their compliance needs into account when preparing financial models for MDM, since reduced risk translates to less chance of being assessed a fine for noncompliance.

When to Buy
While a "build" approach to MDM is appropriate for some businesses, many others are better served by purchasing an MDM offering from a vendor specializing in the technology. A purchased solution provides the complete MDM function in a single package and is designed to efficiently manage large volumes of customer data and reduce or eliminate the latency that results from a build scenario.

In a buy scenario, customers purchase MDM solutions, install them behind firewalls and connect them to existing systems where customer data is housed. The software matches and links data, using either deterministic or probabilistic algorithms, and then provides the integrated customer view back out to consuming applications (e.g. via web services). MDM solutions that use probabilistic algorithms provide a higher level of data accuracy and require significantly less time and expense to deploy than solutions that use deterministic algorithms or those deployed under the build scenario.
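
The difference between the two algorithm families can be sketched in a few lines of Python. The deterministic rule below requires every field to agree exactly. The probabilistic rule sums per-field agreement and disagreement weights in the style of the Fellegi-Sunter record linkage model and compares the total with a threshold. The article does not say which model any vendor uses, and the field names, the m and u probabilities and the threshold are all illustrative assumptions.

```python
import math

# Assumed (m, u) values per field:
#   m = probability the field agrees when two records are the same person
#   u = probability the field agrees when they are different people
FIELD_PROBS = {
    "last": (0.95, 0.01),
    "first": (0.90, 0.05),
    "zip": (0.85, 0.10),
    "birth_year": (0.98, 0.02),
}


def deterministic_match(a, b):
    """Match only if every field agrees exactly."""
    return all(a[field] == b[field] for field in FIELD_PROBS)


def match_score(a, b):
    """Sum log2 weights: positive for agreement, negative for disagreement."""
    score = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if a[field] == b[field]:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score


def probabilistic_match(a, b, threshold=5.0):
    """Match if the total weight reaches an assumed threshold."""
    return match_score(a, b) >= threshold


a = {"first": "william", "last": "smith", "zip": "78701", "birth_year": 1970}
b = {"first": "bill", "last": "smith", "zip": "78701", "birth_year": 1970}
c = {"first": "william", "last": "jones", "zip": "73301", "birth_year": 1985}

print(deterministic_match(a, b), probabilistic_match(a, b))
print(deterministic_match(a, c), probabilistic_match(a, c))
```

With these assumed weights, records a and b fail the exact rule because of the nickname but still clear the probabilistic threshold, since the other three fields agree. Records a and c share only a first name and are rejected by both rules.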

Some purchased MDM solutions can be configured to perform their functions in real time so that, when a customer presents themselves at a point-of-service location, data about them are immediately available throughout the organization. In addition, MDM solutions can come packaged with data steward applications, which provide additional data management functions.

A "buy" scenario is most appropriate for large enterprises (or those that aspire to become large enterprises) that need to handle high volumes of data and require a solution to conduct real-time or near real-time transaction processing with sub-second response times. As a business grows, either organically or through acquisition, it continues to add more data from new customers or existing customers with whom it is now doing more business. An MDM solution from a specialty vendor is able to manage and work with the influx of data being collected and processed without bogging down. Many large enterprises have learned the hard way that they are better served purchasing a complete MDM solution and continually finding ways to use it than perpetually investing in "build" scenarios that are ultimately either too complex or do not scale.

For example, a large consumer technology company that lets customers buy software over the phone and then download it from the web needs an MDM solution that updates customer data immediately, links it with other records and makes this data available in less than a second for the next transaction, regardless of where that transaction takes place.

While a commercial MDM solution may cost more upfront, the total cost of ownership may be significantly less than with the build alternative. Real-time processing and the ability to keep up with data influxes reduce the need to add IT staff to manage the process, which ultimately reduces long-term costs. Creating a solution from scratch also requires building complex algorithms, a skill set that few developers or companies have. Also, purchasing an MDM solution from one vendor eliminates the worries created in a "build" scenario where individual components are upgraded at different times and may not be compatible with other components.

Additionally, a "buy" scenario is attractive to large enterprises that are bound by regulatory compliance requirements. By working with one vendor that is responsible for the full MDM lifecycle, and ultimately for meeting and maintaining compliance standards, enterprises can reduce their potential exposure to noncompliance.

Lastly, businesses that are currently using or planning to implement a service-oriented architecture (SOA) can easily plug in customer identification and information as a data service. Since SOA environments have the same low latency and high accuracy requirements as a real-time MDM solution, it makes sense that businesses leveraging SOA would buy an MDM solution that already has this structure built in and can serve customer data up as a part of the SOA.

Closing Thoughts
The case can be made for both "build" and "buy" MDM scenarios, depending on the organization making the choice and its requirements. For a business to determine its appropriate course, it must take into consideration business goals such as growth, acceptable accuracy and latency levels, how data will be used, availability of financial and human resources, and how heavily regulatory compliance governs its business.

Businesses with a smaller number of stable data sets that will use customer information for back-end analytics, and can therefore tolerate some latency and lower accuracy rates, might be well served by a build scenario. On the flip side, enterprises that have a large number of data sets or plan to grow this number, use data for real-time transactions, cannot tolerate latency and require highly accurate results would be better served working with a single vendor and buying an MDM solution.

About the Author

Marty Moseley serves as chief technology officer at Initiate Systems where he is responsible for the company’s strategic technology direction, development and future product evolution. Initiate Systems, Inc. is the leading provider of customer-centric master data management software for companies and government agencies that want to create the most complete, real-time views of people, households and organizations from data dispersed across multiple application systems and databases. He can be reached at and additional information on Initiate Systems is available at
