What is Metadata?
Metadata is data about data: documentation that users need in order to understand and work with the information they are given. In data warehousing, metadata is one of the essential aspects.
Metadata includes the following:
- The location and descriptions of warehouse systems and components.
- Names, definitions, structures, and content of the data warehouse and end-user views.
- Identification of authoritative data sources.
- Integration and transformation rules used to populate data.
- Integration and transformation rules used to deliver information to end-user analytical tools.
- Subscription information for information delivery to analysis subscribers.
- Metrics used to analyze warehouse usage and performance.
- Security authorizations, access control lists, etc.
Metadata is used for building, maintaining, managing, and using the data warehouse. It helps users understand the content of the warehouse and find the data they need.
Several examples of metadata are:
- A library catalog may be considered metadata. The catalog metadata consists of several predefined elements representing specific attributes of a resource, and each element can have one or more values. These elements could be the name of the author, the name of the document, the publisher’s name, the publication date, and the categories to which it belongs.
- The table of contents and the index in a book may be treated as metadata for the book.
- Suppose we say that a data item about a person is 80. This must be defined by noting that it is the person’s weight and that the unit is kilograms. Therefore, (weight, kilograms) is the metadata about the data item 80.
- Other examples of metadata are data about the tables and figures in a report or book. A table has a name (e.g., the table title), and its column names may also be treated as metadata. The figures likewise have titles or names.
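The weight example above can be sketched in a few lines of Python. This is only an illustration: the class name `Described` and its fields are assumptions made for the sketch, not part of any warehouse product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Described:
    """A raw value paired with the metadata needed to interpret it (hypothetical)."""
    value: float
    attribute: str  # what the value measures, e.g. "weight"
    unit: str       # the unit of measurement, e.g. "kilograms"

    def describe(self) -> str:
        # Combine the data item with its metadata into a readable statement.
        return f"{self.attribute} = {self.value:g} {self.unit}"


# The bare number 80 is ambiguous; the metadata (weight, kilograms) resolves it.
person_weight = Described(value=80, attribute="weight", unit="kilograms")
print(person_weight.describe())  # prints "weight = 80 kilograms"
```

Without the `attribute` and `unit` fields, the value 80 could just as easily be an age or a height in inches, which is exactly the ambiguity metadata removes.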
Why is metadata necessary in a data warehouse?
- First, it acts as the glue that links all parts of the data warehouse.
- Next, it provides information about the contents and structures to the developers.
- Finally, it opens the doors to the end-users and makes the contents recognizable in their terms.
Metadata is like a nerve center. Various processes during the building and administering of the data warehouse generate parts of the data warehouse metadata, and parts of the metadata generated by one process are used by another. In the data warehouse, metadata assumes a key position and enables communication among the various processes.
Figure shows the location of metadata within the data warehouse.
Types of Metadata
Metadata in a data warehouse falls into three major categories:
- Operational Metadata
- Extraction and Transformation Metadata
- End-User Metadata
Operational Metadata
As we know, data for the data warehouse comes from various operational systems of the enterprise. These source systems contain different data structures. The data elements selected for the data warehouse have various field lengths and data types.
In selecting data from the source systems for the data warehouse, we split records, combine parts of records from different source files, and deal with multiple coding schemes and field lengths. When we deliver information to the end-users, we must be able to tie it back to the source data sets. Operational metadata contains all of this information about the operational data sources.
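As a rough illustration of tying warehouse data back to its sources, the sketch below keeps a small lineage catalog in memory. All system, file, and field names are hypothetical, and a real warehouse would hold this information in its metadata repository rather than in a Python dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceField:
    """Operational metadata for one field in a source system (hypothetical)."""
    system: str     # name of the operational system
    file: str       # source file or table
    field: str      # field name in the source
    data_type: str  # data type in the source
    length: int     # field length in the source


# Hypothetical lineage catalog: warehouse column -> contributing source fields.
LINEAGE = {
    "customer.full_name": [
        SourceField("billing", "CUSTMAST", "CUST_NM", "CHAR", 40),
        SourceField("crm", "contacts", "display_name", "VARCHAR", 120),
    ],
    "customer.gender_code": [
        SourceField("billing", "CUSTMAST", "SEX", "CHAR", 1),     # coded M/F
        SourceField("crm", "contacts", "gender", "SMALLINT", 2),  # coded 1/2
    ],
}


def trace_to_source(warehouse_column: str) -> list[str]:
    """Return a description of every source field feeding a warehouse column."""
    return [
        f"{s.system}.{s.file}.{s.field} ({s.data_type}({s.length}))"
        for s in LINEAGE.get(warehouse_column, [])
    ]


for origin in trace_to_source("customer.gender_code"):
    print(origin)
```

The two entries for `customer.gender_code` show the situation the text describes: the same business fact arrives with different coding schemes, data types, and field lengths, and the metadata records each of them.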
Extraction and Transformation Metadata
Extraction and transformation metadata contains data about the extraction of data from the source systems, namely, the extraction frequencies, extraction methods, and business rules for the data extraction. This category of metadata also contains information about all the data transformations that take place in the data staging area.
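One way to picture this category is as a record that stores the extraction frequency, method, and business rule next to an ordered list of transformation rules. The sketch below is a minimal example under that assumption; `ExtractionSpec`, `TransformationRule`, the source name, and the status codes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict[str, str]


@dataclass
class TransformationRule:
    description: str             # human-readable rule, stored as metadata
    apply: Callable[[Row], Row]  # executable form of the same rule


@dataclass
class ExtractionSpec:
    source: str         # hypothetical source system
    frequency: str      # how often data is extracted
    method: str         # e.g. "full" or "incremental"
    business_rule: str  # which records qualify for extraction
    transformations: list[TransformationRule] = field(default_factory=list)

    def run_staging(self, row: Row) -> Row:
        """Apply each recorded transformation in order to one extracted row."""
        for rule in self.transformations:
            row = rule.apply(row)
        return row


STATUS_CODES = {"A": "active", "C": "closed"}  # hypothetical coding scheme

spec = ExtractionSpec(
    source="orders_db",
    frequency="daily",
    method="incremental",
    business_rule="extract only orders changed since the previous run",
    transformations=[
        TransformationRule(
            "trim whitespace from customer name",
            lambda r: {**r, "customer": r["customer"].strip()},
        ),
        TransformationRule(
            "decode status code to a readable label",
            lambda r: {**r, "status": STATUS_CODES[r["status"]]},
        ),
    ],
)

staged = spec.run_staging({"customer": "  Ada Lovelace ", "status": "A"})
print(staged)  # prints {'customer': 'Ada Lovelace', 'status': 'active'}
```

Keeping the description and the executable rule together means the same record can both drive the staging step and document it.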
End-User Metadata
The end-user metadata is the navigational map of the data warehouse. It enables the end-users to find data in the data warehouse. The end-user metadata allows the end-users to use their own business terminology and to look for information in the ways in which they usually think about the business.
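A business glossary is one simple form this navigational map could take. The sketch below assumes a hypothetical glossary that maps business terms to warehouse tables and columns; the table and column names are invented for illustration.

```python
# Hypothetical end-user metadata: business terms mapped to warehouse locations.
BUSINESS_GLOSSARY = {
    "revenue": {
        "table": "fact_sales",
        "column": "net_amt",
        "definition": "Invoiced amount after discounts, before tax",
    },
    "customer region": {
        "table": "dim_customer",
        "column": "rgn_cd",
        "definition": "Sales region the customer is assigned to",
    },
}


def find_data(business_term: str) -> str:
    """Translate a business term into the warehouse column that holds it."""
    entry = BUSINESS_GLOSSARY.get(business_term.strip().lower())
    if entry is None:
        return f"No warehouse data is registered for '{business_term}'"
    return f"{entry['table']}.{entry['column']}: {entry['definition']}"


print(find_data("Revenue"))
# prints "fact_sales.net_amt: Invoiced amount after discounts, before tax"
```

The user asks for "Revenue" and never needs to know that the physical column is called `net_amt`.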
Metadata Interchange Initiative
The metadata interchange initiative was proposed to bring industry vendors and users together to address a variety of difficult problems and issues concerning the exchange, sharing, and management of metadata. The goal of the metadata interchange standard is to define an extensible mechanism that will allow vendors to exchange standard metadata as well as carry along “proprietary” metadata. The founding members agreed on the following initial goals:
- Creating vendor-independent, industry-defined, and industry-maintained standard access mechanisms and application programming interfaces (APIs) for metadata.
- Enabling users to control and manage the access and manipulation of metadata in their unique environment through the use of interchange standards-compliant tools.
- Allowing users to build tool configurations that meet their needs and to adjust those configurations as required.
- Allowing individual tools to satisfy their metadata requirements freely and efficiently within the context of an interchange model.
- Describing a simple, clean implementation infrastructure that will facilitate compliance and speed up adoption by minimizing the amount of modification required.
- Creating a process and procedures not only for establishing and maintaining the interchange standard specification but also for updating and extending it over time.
Metadata Interchange Standard Framework
An implementation of the interchange standard metadata model must assume that the metadata itself may be stored in any type of storage format: ASCII files, relational tables, fixed or customized formats, etc.
The access methodology is therefore based on a framework that translates an access request into the interchange standard's syntax and format.
Several approaches have been proposed by the metadata interchange coalition:
- Procedural Approach
- ASCII Batch Approach
- Hybrid Approach
In the procedural approach, the communication with the API is built into the tool. It enables the highest degree of flexibility.
The ASCII batch approach relies instead on an ASCII file format that contains descriptions of the various metadata items and the standardized access requirements that make up the interchange standard metadata model.
The hybrid approach follows a data-driven model.
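To give a feel for the ASCII batch idea, the sketch below exchanges metadata through a plain text file. The pipe-delimited layout is purely hypothetical and is not the format defined by the interchange standard; it only shows one tool writing metadata as ASCII text and another reading it back.

```python
import io

# Hypothetical ASCII layout: one metadata item per line, fields separated by "|".
ASCII_EXPORT = """\
TABLE|customer|Customer master data
COLUMN|customer.full_name|VARCHAR(120)
COLUMN|customer.gender_code|CHAR(1)
"""


def load_metadata(stream) -> list[dict[str, str]]:
    """Parse every metadata item from an ASCII stream in a single batch."""
    items = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        kind, name, detail = line.split("|", 2)
        items.append({"kind": kind, "name": name, "detail": detail})
    return items


def dump_metadata(items: list[dict[str, str]]) -> str:
    """Write metadata items back out in the same ASCII layout."""
    return "".join(f"{i['kind']}|{i['name']}|{i['detail']}\n" for i in items)


items = load_metadata(io.StringIO(ASCII_EXPORT))
print(len(items), "items loaded")         # prints "3 items loaded"
print(dump_metadata(items) == ASCII_EXPORT)  # prints "True"
```

In a procedural approach, by contrast, the importing tool would call API functions directly instead of reading a file.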
Components of the Metadata Interchange Standard Framework
1) Standard Metadata Model: It refers to the ASCII file format used to represent the metadata that is being exchanged.
2) Standard Access Framework: It describes the minimum number of API functions.
3) Tool Profile: It is provided by each tool vendor.
4) User Configuration: It is a file explaining the legal interchange paths for metadata in the user’s environment.
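The user configuration can be thought of as a list of permitted source-to-target paths between tools. The sketch below assumes a hypothetical in-memory form of such a configuration; the tool names and the pair-based layout are illustrative and do not reflect the standard's actual file format.

```python
# Hypothetical user configuration: each (source tool, target tool) pair is a
# legal interchange path for metadata in this user's environment.
USER_CONFIGURATION = {
    ("modeling_tool", "etl_tool"),
    ("etl_tool", "repository"),
    ("repository", "query_tool"),
}


def is_legal_interchange(source_tool: str, target_tool: str) -> bool:
    """Return True if metadata may flow directly from source_tool to target_tool."""
    return (source_tool, target_tool) in USER_CONFIGURATION


print(is_legal_interchange("etl_tool", "repository"))  # prints "True"
print(is_legal_interchange("query_tool", "etl_tool"))  # prints "False"
```

Paths are directional in this sketch, so allowing metadata to flow from the repository to the query tool does not imply the reverse.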
Metadata Repository
The metadata itself is housed in and controlled by the metadata repository. Metadata repository management software can be used to map the source data to the target database, integrate and transform the data, generate code for data transformation, and move data to the warehouse.
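As a minimal sketch of generating transformation code from source-to-target mappings, the example below turns a list of column mappings into a single SQL load statement. The `ColumnMapping` class, the table names, and the `{src}` template convention are assumptions made for illustration, not features of any particular repository product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """One source-to-target mapping held in the repository (hypothetical)."""
    source_column: str
    target_column: str
    expression: str = "{src}"  # SQL template; {src} is replaced by the source column


def generate_load_sql(
    source_table: str, target_table: str, mappings: list[ColumnMapping]
) -> str:
    """Generate an INSERT ... SELECT statement from the column mappings."""
    targets = ", ".join(m.target_column for m in mappings)
    selects = ", ".join(m.expression.format(src=m.source_column) for m in mappings)
    return f"INSERT INTO {target_table} ({targets}) SELECT {selects} FROM {source_table};"


mappings = [
    ColumnMapping("CUST_NM", "full_name", "TRIM({src})"),  # transform while loading
    ColumnMapping("CUST_ID", "customer_id"),               # copy unchanged
]

print(generate_load_sql("staging.custmast", "dw.customer", mappings))
```

Because the statement is derived from the mappings, changing a mapping in the repository changes the generated code without anyone editing SQL by hand.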
Benefits of Metadata Repository
- It provides a set of tools for enterprise-wide metadata management.
- It reduces or eliminates inconsistency, redundancy, and underutilization.
- It improves organizational control and simplifies the management and accounting of information assets.
- It increases coordination, understanding, identification, and utilization of information assets.
- It enforces CASE development standards with the ability to share and reuse metadata.
- It leverages investment in legacy systems and utilizes existing applications.
- It provides a relational model for heterogeneous RDBMS to share information.
- It provides a useful data administration tool for managing corporate information assets with the data dictionary.
- It increases reliability, control, and flexibility of the application development process.