Types of Data Warehouses
There are different types of data warehouses, which are as follows:
Host-Based Data Warehouses
There are two types of host-based data warehouses which can be implemented:
- Host-Based mainframe warehouses which reside on a high volume database. Supported by robust and reliable high capacity structure such as IBM system/390, UNISYS and Data General sequent systems, and databases such as Sybase, Oracle, Informix, and DB2.
- Host-Based LAN data warehouses, where data delivery can be handled either centrally or from the workgroup environment. The size of the data warehouses of the database depends on the platform.
Data Extraction and transformation tools allow the automated extraction and cleaning of data from production systems. It is not applicable to enable direct access by query tools to these categories of methods for the following reasons:
- A huge load of complex warehousing queries would possibly have too much of a harmful impact upon the mission-critical transaction processing (TP)-oriented application.
- These TP systems have been developing in their database design for transaction throughput. In all methods, a database is designed for optimal query or transaction processing. A complex business query needed the joining of many normalized tables, and as result performance will usually be poor and the query constructs largely complex.
- There is no assurance that data in two or more production methods will be consistent.
Host-Based (MVS) Data Warehouses
Those data warehouse uses that reside on large volume databases on MVS are the host-based types of data warehouses. Often the DBMS is DB2 with a huge variety of original source for legacy information, including VSAM, DB2, flat files, and Information Management System (IMS).
Before embarking on designing, building and implementing such a warehouse, some further considerations must be given because
- Such databases generally have very high volumes of data storage.
- Such warehouses may require support for both MVS and customer-based report and query facilities.
- These warehouses have complicated source systems.
- Such systems needed continuous maintenance since these must also be used for mission-critical objectives.
To make such data warehouses building successful, the following phases are generally followed:
- Unload Phase: It contains selecting and scrubbing the operation data.
- Transform Phase: For translating it into an appropriate form and describing the rules for accessing and storing it.
- Load Phase: For moving the record directly into DB2 tables or a particular file for moving it into another database or non-MVS warehouse.
An integrated Metadata repository is central to any data warehouse environment. Such a facility is required for documenting data sources, data translation rules, and user areas to the warehouse. It provides a dynamic network between the multiple data source databases and the DB2 of the conditional data warehouses.
A metadata repository is necessary to design, build, and maintain data warehouse processes. It should be capable of providing data as to what data exists in both the operational system and data warehouse, where the data is located. The mapping of the operational data to the warehouse fields and end-user access techniques. Query, reporting, and maintenance are another indispensable method of such a data warehouse. An MVS-based query and reporting tool for DB2.
Host-Based (UNIX) Data Warehouses
Oracle and Informix RDBMSs support the facilities for such data warehouses. Both of these databases can extract information from MVS¬ based databases as well as a higher number of other UNIX¬ based databases. These types of warehouses follow the same stage as the host-based MVS data warehouses. Also, the data from different network servers can be created. Since file attribute consistency is frequent across the inter-network.
LAN-Based Workgroup Data Warehouses
A LAN based workgroup warehouse is an integrated structure for building and maintaining a data warehouse in a LAN environment. In this warehouse, we can extract information from a variety of sources and support multiple LAN based warehouses, generally chosen warehouse databases to include DB2 family, Oracle, Sybase, and Informix. Other databases that can also be contained through infrequently are IMS, VSAM, Flat File, MVS, and VH.
Designed for the workgroup environment, a LAN based workgroup warehouse is optimal for any business organization that wants to build a data warehouse often called a data mart. This type of data warehouse generally requires a minimal initial investment and technical training.
Data Delivery: With a LAN based workgroup warehouse, customer needs minimal technical knowledge to create and maintain a store of data that customized for use at the department, business unit, or workgroup level. A LAN based workgroup warehouse ensures the delivery of information from corporate resources by providing transport access to the data in the warehouse.
Host-Based Single Stage (LAN) Data Warehouses
Within a LAN based data warehouse, data delivery can be handled either centrally or from the workgroup environment so business groups can meet process their data needed without burdening centralized IT resources, enjoying the autonomy of their data mart without comprising overall data integrity and security in the enterprise.
Limitations
Both DBMS and hardware scalability methods generally limit LANÂ based warehousing solutions.
Many LAN based enterprises have not implemented adequate job scheduling, recovery management, organized maintenance, and performance monitoring methods to provide robust warehousing solutions.
Often these warehouses are dependent on other platforms for source record. Building an environment that has data integrity, recoverability, and security require careful design, planning, and implementation. Otherwise, synchronization of transformation and loads from sources to the server could cause innumerable problems.
A LAN based warehouse provides data from many sources requiring a minimal initial investment and technical knowledge. A LAN based warehouse can also work replication tools for populating and updating the data warehouse. This type of warehouse can include business views, histories, aggregation, versions in, and heterogeneous source support, such as
- DB2 Family
- IMS, VSAM, Flat File [MVS and VM]
A single store frequently drives a LAN based warehouse and provides existing DSS applications, enabling the business user to locate data in their data warehouse. The LAN based warehouse can support business users with complete data to information solution. The LAN based warehouse can also share metadata with the ability to catalog business data and make it feasible for anyone who needs it.
Multi-Stage Data Warehouses
It refers to multiple stages in transforming methods for analyzing data through aggregations. In other words, staging of the data multiple times before the loading operation into the data warehouse, data gets extracted form source systems to staging area first, then gets loaded to data warehouse after the change and then finally to departmentalized data marts.
This configuration is well suitable to environments where end-clients in numerous capacities require access to both summarized information for up to the minute tactical decisions as well as summarized, a commutative record for long-term strategic decisions. Both the Operational Data Store (ODS) and the data warehouse may reside on host-based or LAN Based databases, depending on volume and custom requirements. These contain DB2, Oracle, Informix, IMS, Flat Files, and Sybase.
Usually, the ODS stores only the most up-to-date records. The data warehouse stores the historical calculation of the files. At first, the information in both databases will be very similar. For example, the records for a new client will look the same. As changes to the user record occur, the ODs will be refreshed to reflect only the most current data, whereas the data warehouse will contain both the historical data and the new information. Thus the volume requirement of the data warehouse will exceed the volume requirements of the ODS overtime. It is not familiar to reach a ratio of 4 to 1 in practice.
Stationary Data Warehouses
In this type of data warehouses, the data is not changed from the sources, as shown in fig:
Instead, the customer is given direct access to the data. For many organizations, infrequent access, volume issues, or corporate necessities dictate such as approach. This schema does generate several problems for the customer such as
- Identifying the location of the information for the users
- Providing clients the ability to query different DBMSs as is they were all a single DBMS with a single API.
- Impacting performance since the customer will be competing with the production data stores.
Such a warehouse will need highly specialized and sophisticated ‘middleware’ possibly with a single interaction with the client. This may also be essential for a facility to display the extracted record for the user before report generation. An integrated metadata repository becomes an absolute essential under this environment.
Distributed Data Warehouses
The concept of a distributed data warehouse suggests that there are two types of distributed data warehouses and their modifications for the local enterprise warehouses which are distributed throughout the enterprise and a global warehouses as shown in fig:
Characteristics of Local data warehouses
- Activity appears at the local level
- Bulk of the operational processing
- Local site is autonomous
- Each local data warehouse has its unique architecture and contents of data
- The data is unique and of prime essential to that locality only
- Majority of the record is local and not replicated
- Any intersection of data between local data warehouses is circumstantial
- Local warehouse serves different technical communities
- The scope of the local data warehouses is finite to the local site
- Local warehouses also include historical data and are integrated only within the local site.
Virtual Data Warehouses
Virtual Data Warehouses is created in the following stages:
- Installing a set of data approach, data dictionary, and process management facilities.
- Training end-clients.
- Monitoring how DW facilities will be used
- Based upon actual usage, physically Data Warehouse is created to provide the high-frequency results
This strategy defines that end users are allowed to get at operational databases directly using whatever tools are implemented to the data access network. This method provides ultimate flexibility as well as the minimum amount of redundant information that must be loaded and maintained. The data warehouse is a great idea, but it is difficult to build and requires investment. Why not use a cheap and fast method by eliminating the transformation phase of repositories for metadata and another database. This method is termed the ‘virtual data warehouse.’
To accomplish this, there is a need to define four kinds of data:
- A data dictionary including the definitions of the various databases.
- A description of the relationship between the data components.
- The description of the method user will interface with the system.
- The algorithms and business rules that describe what to do and how to do it.
Disadvantages
- Since queries compete with production record transactions, performance can be degraded.
- There is no metadata, no summary record, or no individual DSS (Decision Support System) integration or history. All queries must be copied, causing an additional burden on the system.
- There is no refreshing process, causing the queries to be very complex.