Data Governance for SOA Success


Data is any company's lifeblood. If data can't be accessed, is slow to arrive, or is of poor quality when it does, the company pays the price. SOA provides access points for common functions so that they can be reused in multiple business processes throughout an enterprise, but the essence of what those processes share is data. One of the key benefits of embarking on SOA is that you can treat data sources, and the applications that store and act on data, as services and combine them into composite applications. This gives the company unparalleled data access, efficiency and resiliency to change.



The problem: you are then dependent on the quality of the source data, and may have limited insight into how that data has been defined and what its limitations are. While services that participate in SOA are supposed to be self-describing, there are no standards for how deeply the real meaning of the data is described. For example, if a customer's name is entered into the system and an address requested, that data could easily reside in a dozen different data silos, each with a slightly different view of what a customer means. One might return a company's Texas location as the address for that customer, another the company's California address, and still another the CEO's home address. Data governance, including management of metadata descriptions, is the key to knowing which address is the right one to return for this particular business process requestor, and for hundreds of other similar situations.

Even that example assumes all three addresses are complete and correct. Another aspect of data governance is data quality monitoring. Data quality is of extreme importance to every business process and to a company's success as a whole. In a SOA, data quality is even more essential: any errors in the data will be visible globally, across the enterprise, to any consumer that uses the service pulling information from the faulty data source. To continue the example above, if invoices, bills or products are consistently sent to the wrong address, the company will lose a lot of business. Data quality evaluations to find anomalous data, and manual or automated remediation of that data, must be an integral part of any useful SOA plan. The consumer needs to be able to trust that the data returned by the service will be both correct and relevant to the current need.
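As a concrete illustration of the kind of evaluation meant here, the sketch below flags address records with missing or malformed fields. It is a minimal, hypothetical example, not any particular product's checks; the field names and validation rules are assumptions:

```python
# Minimal data-quality evaluation: flag anomalous address records so they
# can be remediated before a data service exposes them enterprise-wide.
# Field names and rules are illustrative assumptions.
REQUIRED_FIELDS = ("street", "city", "state", "zip")

def find_anomalies(records):
    """Return (record_id, problem) pairs for addresses that fail basic checks."""
    problems = []
    for rec in records:
        for field in REQUIRED_FIELDS:
            if not rec.get(field, "").strip():
                problems.append((rec["id"], f"missing {field}"))
        zip_code = rec.get("zip", "")
        if zip_code and not (zip_code.isdigit() and len(zip_code) == 5):
            problems.append((rec["id"], f"malformed zip: {zip_code!r}"))
    return problems

records = [
    {"id": 1, "street": "100 Main St", "city": "Austin", "state": "TX", "zip": "78701"},
    {"id": 2, "street": "", "city": "San Jose", "state": "CA", "zip": "9513"},
]
print(find_anomalies(records))  # record 2 fails two checks
```

In practice such rules would run on a schedule against each governed source, with failures routed to automated cleansing or a data steward.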

Data as a Service

The need for the data to be relevant to the consumer might encourage a SOA designer to tightly couple the source of the data with the specific service that uses it. This creates a brittle situation: services dependent on a particular data source cannot adapt when that source changes over time, and they fail to give a global view of the data. A better way is for the data itself to become a service. Encapsulating the data into a service used by multiple processes helps with standardization and prevents data duplication. It also allows the data to slip seamlessly into composite business processes and applications as just another service. This concept of Data as a Service provides data access to any business process in any part of the company, giving the most efficient possible data flow as well as resiliency over time.
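The decoupling that Data as a Service provides can be sketched in a few lines: consumers call a service interface rather than the underlying store, so the backing source can be swapped without breaking them. The class and method names here are illustrative assumptions, not a real product API:

```python
# Consumers depend only on the service interface, never on a specific store,
# so the backend (mainframe, SaaS CRM, warehouse...) can change freely.
class CustomerDataService:
    def __init__(self, backend):
        self._backend = backend          # swappable data source

    def get_customer(self, customer_id):
        return self._backend.fetch(customer_id)

class InMemoryBackend:
    """Stand-in for any real data source behind the service."""
    def __init__(self, rows):
        self._rows = rows

    def fetch(self, customer_id):
        return self._rows[customer_id]

service = CustomerDataService(InMemoryBackend({1: {"name": "XYZ Corp"}}))
print(service.get_customer(1))  # consumer never touches the backend directly
```

Replacing `InMemoryBackend` with a connector to a different source requires no change to any consumer of the service, which is the resiliency the article describes.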

However, delivering data as a service is a challenge in its own right. To participate in SOA flows, data must be exposed through a delivery mechanism that is both consumable as a service and reliable, and you must be able to combine data from multiple sources for it to be useful in a SOA context. If conventional code is used to connect the hundreds or even thousands of data sources in an enterprise, the SOA will likely deliver limited benefit in the long run because of the challenge and cost of maintaining those brittle connections.

To avoid that, data services should be built on a middleware platform that connects to many sources, preferably all of the data sources in the enterprise, from the legacy mainframe COBOL application in the basement to the new SaaS CRM application in the cloud. Yes, it is possible to incorporate SaaS applications into an integrated SOA initiative. There will be less internal control over a SaaS application's metadata and content, but it can still be accommodated by a flexible integration platform. That integration layer should also adapt easily to change, since it will be the touch point for volatile information sources. It will have to accommodate complex processes as a unit, and be compatible with standard SOA service technologies like SOAP, XML and WSDL. That may seem like a lot to ask, but integration platforms with that level of flexibility and power exist, and the cost of custom coding brittle end points, which must then be constantly repaired, far outweighs the license cost of a good tool.

Metadata Governance

Once that's addressed, you need to think about how to expose and manage relevant metadata. The essential task of metadata is to clarify, for everyone using the Data as a Service, what the data actually means. To return to the previous example, instead of simply defining the record as a "customer," it would make the data far more reliable and relevant to define it as, for instance, a person, with various fields or metadata properties identifying that person as, among other things, a customer with the job title of CEO at corporation XYZ, which has two locations, one in Texas and one in California. Then the issue of defining a location becomes relevant: the Texas branch might have multiple buildings and be the manufacturing branch of the company, while the California branch is the service and support branch. This level of metadata gives the consumer what is needed to know which address to return to the requestor for an invoice, a support request or a letter to the CEO.
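A rough sketch of what such a metadata-rich record might look like, and how a consumer could use it to pick the right address for a given business purpose. The field names, company details and the purpose-to-location mapping are purely illustrative assumptions:

```python
# A "person" record enriched with role and location metadata, rather than
# a bare "customer" row. All names and values are hypothetical.
person = {
    "type": "person",
    "name": "Jane Doe",
    "roles": [{"role": "customer"}, {"role": "CEO", "of": "XYZ Corp"}],
    "organization": {
        "name": "XYZ Corp",
        "locations": [
            {"state": "TX", "function": "manufacturing",
             "address": "100 Plant Rd, Dallas, TX"},
            {"state": "CA", "function": "service_and_support",
             "address": "200 Help Ave, San Jose, CA"},
        ],
    },
}

# Governance policy: which branch function serves which business purpose.
PURPOSE_TO_FUNCTION = {
    "invoice": "manufacturing",
    "support_request": "service_and_support",
}

def address_for(record, purpose):
    """Use location metadata to return the right address for a purpose."""
    wanted = PURPOSE_TO_FUNCTION[purpose]
    for loc in record["organization"]["locations"]:
        if loc["function"] == wanted:
            return loc["address"]
    raise LookupError(f"no location serves purpose {purpose!r}")

print(address_for(person, "support_request"))
```

Without the `function` metadata on each location, every consumer would be left guessing which of the two addresses applies, which is exactly the ambiguity described earlier.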

A related issue is granularity. If you present very fine-grained, detailed services, users must contend with building their own complex flows. Adding a layer of more complex combined services, themselves packaged as a service, can give the consumer a robust, easy-to-use application. For example, serving up a complete order processing application would be far more useful than serving up each element separately. Similarly, consolidating data services will give a more global view of data and prevent duplication and conflicts between sources. From a data governance perspective, the more granular the data service, the more problematic governance becomes: people must understand more individual data elements, and are therefore more liable to make unintentional updates, misuse data or insert incorrect values. With a less granular approach, appropriate integration technology can be leveraged to create more sophisticated, robust services that are more reliable in a SOA environment.
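The granularity trade-off can be sketched as follows: instead of asking the consumer to orchestrate several fine-grained calls, one coarse-grained composite wraps them. The individual services here are hypothetical stand-ins:

```python
# Fine-grained operations, stand-ins for individual data services the
# consumer would otherwise have to call and combine correctly themselves.
def get_customer(customer_id):
    return {"id": customer_id, "name": "XYZ Corp",
            "address": "100 Plant Rd, Dallas, TX"}

def get_price(sku):
    return {"WIDGET-1": 9.99}[sku]

def check_stock(sku, qty):
    return qty <= 500

# One coarse-grained composite service replaces three fine-grained calls.
def process_order(customer_id, sku, qty):
    customer = get_customer(customer_id)
    if not check_stock(sku, qty):
        return {"status": "backordered"}
    return {"status": "accepted",
            "ship_to": customer["address"],
            "total": round(get_price(sku) * qty, 2)}

print(process_order(42, "WIDGET-1", 3))
```

Consumers see only `process_order`, so the sequencing, validation and data-access rules stay inside the governed composite rather than being re-implemented, possibly incorrectly, by each consumer.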

Data governance of the service metadata is also crucial, as that will be the main means of managing the plethora of existing services and any new services or composites that come along. A good metadata management tool, preferably one compatible with the integration middleware used to tie in the end points, will help with two of the most integral aspects of change management for data services, and for other services as well: lineage and impact analysis. Knowing how a particular value came to be, where it originated, and what impact altering that data source would have on other systems can give businesses the edge they need to keep running smoothly.
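Impact analysis over lineage metadata amounts to walking a dependency graph from the changed source to everything downstream. A minimal sketch, assuming lineage is recorded as a simple source-to-consumer map (the service names are illustrative):

```python
from collections import deque

# Hypothetical lineage metadata: each node lists the services that read it.
CONSUMERS = {
    "customer_db": ["customer_service", "billing_service"],
    "customer_service": ["order_app"],
    "billing_service": ["finance_dashboard"],
    "order_app": [],
    "finance_dashboard": [],
}

def impact_of(source):
    """Return every downstream service affected by changing `source`."""
    seen, queue = set(), deque(CONSUMERS.get(source, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(CONSUMERS.get(node, []))
    return sorted(seen)

print(impact_of("customer_db"))  # everything that transitively reads it
```

A real metadata management tool maintains this graph automatically from the integration definitions; the traversal itself is this simple, which is why keeping the lineage metadata accurate is the hard, and governed, part.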

Think Globally, Act Pragmatically

To bring all the needed elements of good data governance and solid infrastructure together, a SOA initiative should start with a good road map: a global plan for the desired enterprise-level result, including authoritative data sources, flows, policies and governance. But trying to build all of the integration end points, service wrappers and messaging systems for an entire enterprise simultaneously isn't practical. Global big-bang projects have a historically high rate of failure.

The way to get immediate buy-in and immediate ROI is to show results quickly. Start with a single pressing business issue and solve it with the overall road map in mind. Leverage the integration middleware to create microflows that pull existing, functional stop-gap integration measures together with the more robust new technologies into cohesive processes, and then expose those as services. This avoids spending large amounts of time replacing technologies that are working, and focuses effort on areas that genuinely need help. As time permits, the older point-to-point interfaces, stored procedures and other interim measures can be replaced as needed, and other business processes can be pulled into microflows and exposed as services. Taking one step at a time, with each step reaping benefit in its own right, will get the enterprise to the goal of an integrated SOA environment in the shortest possible time with the fastest return on investment. Just make certain that governance of that indispensable resource, the company's data, is front of mind from day one.

About the Authors

David Inbar is Director of Marketing and International Alliances for Pervasive Integration Products. He has more than 20 years of sales and marketing experience as a consultant and senior executive in the software industry in the U.S. and Europe. Inbar has extensive expertise in DBMS, application development and process management, and holds an MBA and a Master's in Electrical Engineering.


Paige Roberts is Technical Content Developer for Pervasive Integration Products. She has worked in the data integration industry for the past twelve years as a support technician, technical writer, trainer, software developer, and consultant.


About Pervasive Software

Pervasive Software (NASDAQ: PVSW) helps companies get the most out of their data investments through embeddable data management and agile data integration software. Pervasive's multi-purpose data integration platform accelerates the sharing of information between multiple data stores, applications, and hosted business systems and allows customers to re-use the same software for diverse integration scenarios. The embeddable PSQL database engine allows organizations to successfully embrace new technologies while maintaining application compatibility and robust database reliability in a near-zero database administration environment. For more than two decades, Pervasive products have delivered value to tens of thousands of customers in more than 150 countries with a compelling combination of performance, flexibility, reliability and low total cost of ownership. Through Pervasive Innovation Labs, the company also invests in exploring and creating cutting edge solutions for the toughest data analysis and data delivery challenges. For additional information, go to www.pervasive.com.