Data Warehouse Design
A data warehouse is a single data repository where a record from multiple data sources is integrated for online business analytical processing (OLAP). This implies a data warehouse needs to meet the requirements from all the business stages within the entire organization. Thus, data warehouse design is a hugely complex, lengthy, and hence error-prone process. Furthermore, business analytical functions change over time, which results in changes in the requirements for the systems. Therefore, data warehouse and OLAP systems are dynamic, and the design process is continuous.
Data warehouse design takes a method different from view materialization in the industries. It sees data warehouses as database systems with particular needs such as answering management related queries. The target of the design becomes how the record from multiple data sources should be extracted, transformed, and loaded (ETL) to be organized in a database as the data warehouse.
There are two approaches
- “top-down” approach
- “bottom-up” approach
Top-down Design Approach
In the “Top-Down” design approach, a data warehouse is described as a subject-oriented, time-variant, non-volatile and integrated data repository for the entire enterprise data from different sources are validated, reformatted and saved in a normalized (up to 3NF) database as the data warehouse. The data warehouse stores “atomic” information, the data at the lowest level of granularity, from where dimensional data marts can be built by selecting the data required for specific business subjects or particular departments. An approach is a data-driven approach as the information is gathered and integrated first and then business requirements by subjects for building data marts are formulated. The advantage of this method is which it supports a single integrated data source. Thus data marts built from it will have consistency when they overlap.
Advantages of top-down design
Data Marts are loaded from the data warehouses.
Developing new data mart from the data warehouse is very easy.
Disadvantages of top-down design
This technique is inflexible to changing departmental needs.
The cost of implementing the project is high.
Bottom-Up Design Approach
In the “Bottom-Up” approach, a data warehouse is described as “a copy of transaction data specifical architecture for query and analysis,” term the star schema. In this approach, a data mart is created first to necessary reporting and analytical capabilities for particular business processes (or subjects). Thus it is needed to be a business-driven approach in contrast to Inmon’s data-driven approach.
Data marts include the lowest grain data and, if needed, aggregated data too. Instead of a normalized database for the data warehouse, a denormalized dimensional database is adapted to meet the data delivery requirements of data warehouses. Using this method, to use the set of data marts as the enterprise data warehouse, data marts should be built with conformed dimensions in mind, defining that ordinary objects are represented the same in different data marts. The conformed dimensions connected the data marts to form a data warehouse, which is generally called a virtual data warehouse.
The advantage of the “bottom-up” design approach is that it has quick ROI, as developing a data mart, a data warehouse for a single subject, takes far less time and effort than developing an enterprise-wide data warehouse. Also, the risk of failure is even less. This method is inherently incremental. This method allows the project team to learn and grow.
Advantages of bottom-up design
Documents can be generated quickly.
The data warehouse can be extended to accommodate new business units.
It is just developing new data marts and then integrating with other data marts.
Disadvantages of bottom-up design
the locations of the data warehouse and the data marts are reversed in the bottom-up approach design.
Differentiate between Top-Down Design Approach and Bottom-Up Design Approach
Top-Down Design Approach | Bottom-Up Design Approach |
---|---|
Breaks the vast problem into smaller subproblems. | Solves the essential low-level problem and integrates them into a higher one. |
Inherently architected- not a union of several data marts. | Inherently incremental; can schedule essential data marts first. |
Single, central storage of information about the content. | Departmental information stored. |
Centralized rules and control. | Departmental rules and control. |
It includes redundant information. | Redundancy can be removed. |
It may see quick results if implemented with repetitions. | Less risk of failure, favorable return on investment, and proof of techniques. |