Enterprise CIOs are increasingly concerned with the need to manage metadata – the information that describes the structure and content of their corporate data. Properly managed metadata enables rapid, even automatic, development of new application interfaces. Without it, interface development involves costly, handcrafted code, and enterprise agility through Service-Oriented Architectures is impossible. So the CIO must worry about the question of how to collect and manage metadata. For several years there was an accepted answer: to use a metadata registry as defined in ISO 11179. Latterly, this has been challenged by supporters of ontologies, with a claim that theirs is a superior approach. How accurate is this, and what strategy should a CIO follow?
Analysts agree on the growing importance of metadata management. Forrester’s Galen Schreck says that metadata has come to light as a critical element of automated information life-cycle management systems. According to Gartner’s Michael Blechar, the move to Java 2 Platform, Enterprise Edition and .NET service-oriented architectures means that organizations must understand model-based business processes and data architectures, and developers will need processes and tools that reuse and manage interrelated metadata across suites and environments. Blechar estimates that renewed interest in metadata management could result in the number of new repository purchases doubling from 2005 to 2010.
The established approach to metadata management is based on ISO 11179, a six-part International Standard for metadata registries. It describes what a metadata registry is, how to classify data, and how to store and manage descriptions of data. It assumes the established entity-relationship model that is associated with relational databases, the traditional data storage paradigm that is expected to continue to dominate for at least the next decade. Its basic data classification unit is the data element. Data elements can readily be identified in bottom-up fashion from enterprise database schemas and documentation. So, where there is a customer database with a “name” field, “customer name” is a data element. This is a straightforward way of capturing and managing metadata for existing and new applications within an enterprise.
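The bottom-up identification of data elements can be illustrated with a minimal Python sketch. It loosely follows the ISO 11179 view of a data element as an object class, a property, and a value domain; the class name, field names, the harvest function, and the sample schema are all illustrative assumptions, not definitions taken from the standard or from any registry product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Illustrative data element; field names are assumptions, not ISO 11179 terms of art."""
    object_class: str   # the thing being described, e.g. "customer"
    property_name: str  # the characteristic recorded, e.g. "name"
    value_domain: str   # how values are represented, here simply the column type

    @property
    def name(self) -> str:
        # "customer" + "name" gives the data element "customer name"
        return f"{self.object_class} {self.property_name}"


def harvest(schema: dict[str, dict[str, str]]) -> list[DataElement]:
    """Derive one data element per column, working bottom-up from a schema."""
    return [
        DataElement(table, column, sql_type)
        for table, columns in schema.items()
        for column, sql_type in columns.items()
    ]


# Hypothetical schema: one table with two columns.
schema = {"customer": {"name": "VARCHAR(100)", "postcode": "VARCHAR(10)"}}
registry = {element.name: element for element in harvest(schema)}
print(sorted(registry))  # ['customer name', 'customer postcode']
```

A real registry would also record definitions, stewardship, and administrative status for each element; the sketch shows only how element names fall out of an existing schema.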
Ontologies, by contrast, provide a top-down approach, in which concepts are identified and refined, and the relations between them are described. So the “customer” concept might be derived as a refinement of the “person” concept, and might inherit the “name” property from it (every person has a name, so every customer has a name). Ontologies are seen as providing a competing approach to that of ISO 11179 for metadata management. The ontology approach is newer, but it is rapidly gaining ground. By starting from the subject matter, rather than the implementation, it leads to metadata that is less product-specific and can more easily be understood across organizations.
There are established registry products based on ISO 11179, and at least one open source implementation. However, these products will not necessarily interoperate. Although they are based on a standard, that standard does not define interface formats and protocols. The closest thing to an accepted standard registry interface is the Java API for XML Registries (JAXR), which provides a uniform application programming interface (API) for accessing registries of different kinds. This is not part of ISO 11179, and it does not define information formats or interface protocols. It gives uniform access to different registries in a Java environment, but does not enable interoperability between heterogeneous registry implementations.
The ontology approach does not rely on registries. Ontology definitions can be made available as part of the Semantic Web, using the Resource Description Framework Schema (RDFS) or the richer Web Ontology Language (OWL). RDFS enables concepts and properties such as “person”, “name”, and “customer” to be identified, and basic relationships between them to be conveyed (such as that “customer” is a refinement of “person,” and “name” is a property of “person”). OWL goes further, and enables the detailed definition of some concepts in terms of others (for example that “customer” is “the subclass of person that has made a purchase”). RDFS and OWL do enable interoperation between different collections of metadata. OWL is particularly powerful in this respect, providing for global distribution of metadata, just as the ordinary Web provides for global distribution of data.
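The “customer” and “person” example can be sketched in a few lines of Python. This is a deliberately simplified reading of RDFS, not a full implementation of its entailment rules: it treats a property whose declared domain is a class as applying to that class and to its subclasses. The "ex:" identifiers and both function names are assumptions made for illustration.

```python
SUBCLASS_OF = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"

# Statements as (subject, predicate, object) triples; the "ex:" names are hypothetical.
triples = {
    ("ex:Customer", SUBCLASS_OF, "ex:Person"),  # "customer" refines "person"
    ("ex:name", DOMAIN, "ex:Person"),           # "name" is a property of "person"
}


def superclasses(cls, triples):
    """Return cls together with every class reachable via rdfs:subClassOf."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p == SUBCLASS_OF)
    return seen


def properties_of(cls, triples):
    """Return properties declared on cls or on any of its superclasses."""
    ancestors = superclasses(cls, triples)
    return {s for s, p, o in triples if p == DOMAIN and o in ancestors}


print(properties_of("ex:Customer", triples))  # {'ex:name'}
```

Because the statements are plain triples, two collections of metadata published separately can be merged by taking the union of their triple sets, which is the kind of interoperation the paragraph above describes.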
So, will we see ontologies and the Semantic Web displace the traditional ISO 11179 registries? Not necessarily, because the two approaches are more compatible than many people think. Indeed, they can be complementary.
The Cancer Data Standards Repository of the US National Cancer Institute (NCI) demonstrates how to combine the two approaches. The NCI has created an ISO 11179 metadata repository to facilitate the harmonization and exchange of data in cancer-related domains such as clinical trials. It is an excellent, and publicly accessible, example of a thoroughly thought-through metadata management project. The data element definitions are driven by vocabularies and taxonomies developed by subject matter experts. This builds on the strengths of both approaches, with ISO 11179 providing the systematic classification of the data elements, and ontologies relating that classification to the subject matter clearly and precisely.
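The general idea of tying registry entries to expert-defined concepts can be sketched as follows. This does not reproduce the NCI repository's actual data model or interfaces: the entry fields, the "vocab:" concept identifiers, the source paths, and the group_by_concept function are all hypothetical, chosen only to show how a shared concept lets differently named data elements be recognized as equivalent.

```python
from collections import defaultdict

# Hypothetical registry entries. Each data element records where it was found
# and which subject-matter concept it represents.
registry = [
    {"element": "patient birth date", "source": "trials_db.patient.dob",
     "concept": "vocab:DateOfBirth"},
    {"element": "subject date of birth", "source": "lab_db.subject.birth_dt",
     "concept": "vocab:DateOfBirth"},
    {"element": "patient name", "source": "trials_db.patient.name",
     "concept": "vocab:PersonName"},
]


def group_by_concept(entries):
    """Group data element names under the concept each one is linked to."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["concept"]].append(entry["element"])
    return dict(groups)


harmonized = group_by_concept(registry)
print(harmonized["vocab:DateOfBirth"])
# ['patient birth date', 'subject date of birth']
```

The registry still classifies elements at the implementation level, while the concept link supplies the meaning that makes harmonization possible.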
This will work within a single domain, but the lack of interface specifications within the ISO 11179 standard means that there is no guarantee that different domains will interoperate at the data element level. And interoperability is vital. As Schreck puts it, many products for archiving and content management create and use their own metadata today, but these databases must become part of a federated system because, if they do not, firms will be faced with a fragmented portfolio of information life-cycle management tools that have no common policies, management, or audit trail. And this is within the context of a single enterprise. The need for enterprises to share metadata to support Boundaryless Information Flow™ within a collaborative industry or supply chain, for example, magnifies the problem. The requirement for interoperability between metadata stores within and between enterprises is absolute.
ISO 11179 and ontologies are compatible approaches to metadata management, each with different strengths. It is possible to leverage the strengths of both, with implementation-level definitions organized in ISO 11179 registries and linked to subject-matter ontologies that make them meaningful across implementation domains. But this is not enough.
The Semantic Web standards can deliver federation and interoperability for ontologies. But interoperability is needed at the implementation level too, because this is the level at which data communication takes place. Including ontologies in the enterprise metadata management strategy will help but, until there is standards-based metadata interoperability at the implementation level, the problem will not be solved. Unfortunately, the CIO must continue to worry.
For more information, please contact Dr. Chris Harding at email@example.com.
About the Author
Dr. Chris Harding leads the SOA Working Group at The Open Group, an open forum of customers and suppliers of IT products and services. In addition, he is a Director of UDEF Forum, and manages The Open Group's work on semantic interoperability. He has been with The Open Group for over ten years.
Dr. Harding began his career in communications software research and development. He then spent nine years as a consultant, specializing in voice and data communications, before moving to his current role.
Recognizing the importance of giving enterprises quality information at the point of use, Dr. Harding sees information interoperability as the next major challenge, and frequently speaks or writes on this topic. He is a regular contributor to ebizQ.
Dr. Harding has a PhD in mathematical logic, and is a member of the British Computer Society (BCS) and of the Institute of Electrical and Electronics Engineers (IEEE).
The Open Group is a vendor-neutral and technology-neutral consortium, whose vision of Boundaryless Information Flow will enable access to integrated information within and between enterprises based on open standards and global interoperability. The Open Group works with customers, suppliers, consortia and other standards bodies. Its role is to capture, understand and address current and emerging requirements, establish policies and share best practices; to facilitate interoperability, develop consensus, and evolve and integrate specifications and open source technologies; to offer a comprehensive set of services to enhance the operational efficiency of consortia; and to operate the industry’s premier certification service. Further information on The Open Group can be found at http://www.opengroup.org.