ETL Testing Interview Questions
A list of frequently asked ETL testing interview questions and answers is given below.
1) What is ETL? Explain it.
ETL stands for Extraction, Transformation, and Loading. It is an essential concept in data warehousing systems, and these are the three basic steps in the data integration process. Extraction means extracting the data from different data sources such as transactional systems or applications. Transformation means applying conversion rules to the data so that it becomes suitable for analytical reporting. Loading means moving the data into the target system, i.e., the data warehouse.
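The three steps can be sketched in a few lines of Python. This is only an illustration under assumed names: the `SOURCE_ROWS` data, the `orders` table, and the conversion rules are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for a transactional system.
SOURCE_ROWS = [
    {"order_id": 1, "amount": "19.99", "country": "us"},
    {"order_id": 2, "amount": "5.00", "country": "de"},
]


def extract():
    """Extraction: read raw rows from the source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transformation: cast amounts to numbers and normalize country codes."""
    return [(r["order_id"], float(r["amount"]), r["country"].upper()) for r in rows]


def load(conn, rows):
    """Loading: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT order_id, amount, country FROM orders ORDER BY order_id"
).fetchall()
```

Keeping each step in its own function makes it possible to test extraction, transformation, and loading separately.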
2) Explain the concept of Extraction, Transformation, and Loading.
Extraction
Extract the data from an external source and move it to the data warehouse pre-processor database.
Transformation
The transform data task allows point-to-point generation, modification, and transformation of the data.
Loading
In this task, the data is added to the database table in a warehouse.
3) What is the three-layer architecture of an ETL cycle?
The three layers in the ETL are:
- Staging Layer: The staging layer stores the data extracted from the different source systems.
- Data Integration Layer: The integration layer transforms the data from the staging layer and moves it to a database. In the database, the data is arranged into hierarchical groups, often called dimensions, and into facts and aggregate facts. The combination of fact and dimension tables in a data warehouse system is called a schema.
- Access Layer: The access layer is used by end-users to retrieve the data for analytical reporting.
4) What is BI?
Business Intelligence (BI) is the process of collecting raw business data and transforming it into meaningful insight that is more useful for the business.
5) What are the differences between ETL and BI tools?
ETL TOOLS | BI TOOLS |
---|---|
ETL tools are used to extract data from different data sources, transform the data, and load it into a data warehouse system. | BI tools are used to generate interactive and ad-hoc reports for end-users, and data visualizations for monthly, quarterly, and annual board meetings. |
The most common ETL tools are Informatica, SAP BO Data Services, Microsoft SSIS, Oracle Data Integrator (ODI), the open-source Clover ETL, etc. | The most common BI tools are SAP Lumira, IBM Cognos, Microsoft BI platform, Tableau, Oracle Business Intelligence Enterprise Edition, etc. |
6) What are the ETL tools available in the market?
The popular ETL tools available in the market are:
- IBM: WebSphere DataStage
- Informatica: PowerCenter
- SAP: BusinessObjects Data Services (BODS)
- SAS: Data Integration Studio
- Oracle: Warehouse Builder
- Open source: Clover ETL
7) When do we need the staging area in the ETL process?
The staging area is a central area that sits between the data sources and the data warehouse or data mart systems. It is a place where data is stored temporarily during the data integration process. In the staging area, data is cleansed and checked for duplication. The staging area provides many benefits, but its primary goals are to increase efficiency, ensure data integrity, and support data quality operations.
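The cleansing and duplicate check done in the staging area can be illustrated with a small Python sketch. The function name `cleanse_staging` and the sample rows are hypothetical, and real staging areas usually do this inside a database rather than in memory.

```python
def cleanse_staging(rows, key):
    """Trim string fields and drop rows whose key was already seen."""
    seen = set()
    clean = []
    for row in rows:
        # Cleansing: strip stray whitespace from every string field.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if row[key] in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(row[key])
        clean.append(row)
    return clean


staged = [
    {"cust_id": 101, "name": " Sana "},
    {"cust_id": 102, "name": "Jass"},
    {"cust_id": 101, "name": "Sana"},
]
cleaned = cleanse_staging(staged, key="cust_id")
```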
8) What is the difference between the data warehouse and data mining?
Data warehousing is a broader concept than data mining. Data mining involves extracting hidden information from the data and interpreting it for future forecasting. In contrast, data warehousing includes operations such as analytical reporting, to generate detailed and ad-hoc reports, and information processing, to generate interactive dashboards and charts.
9) What are the differences between OLTP and OLAP?
OLTP | OLAP |
---|---|
OLTP stands for Online Transactional Processing. | OLAP stands for Online Analytical Processing. |
OLTP systems typically run on a relational database and are used to manage day-to-day transactions. | OLAP is a multidimensional system used for analysis, and it is commonly associated with the data warehouse. |
10) What is a dimension table, and how is it different from the fact table?
Here, we take an example to describe how the dimension table differs from the fact table.
Suppose a company sells its products to its customers. Every sale is a fact that occurs within the company, and the fact table is used to record these facts. Each fact table stores the keys that join it to the dimension tables, along with the measures (facts).
Example: Fact Units
Cust_ID | Prod_ID | Time_ID | No. of units sold |
---|---|---|---|
101 | 24 | 1 | 25 |
102 | 25 | 2 | 15 |
103 | 26 | 3 | 30 |
A dimension table stores the attributes, or dimensions, that describe the objects in a fact table. Dimension tables are a set of companion tables to a fact table.
Cust_ID | Cust_Name | Gender |
---|---|---|
101 | Sana | F |
102 | Jass | M |
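The relationship between the two tables can be sketched with an in-memory SQLite database. The table names `fact_units` and `dim_customer` and the column names are assumptions chosen to mirror the sample tables above; the rows are the sample rows themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_customer (cust_id INTEGER PRIMARY KEY, cust_name TEXT, gender TEXT);
    CREATE TABLE fact_units (
        cust_id INTEGER REFERENCES dim_customer (cust_id),
        prod_id INTEGER,
        time_id INTEGER,
        units_sold INTEGER
    );
    INSERT INTO dim_customer VALUES (101, 'Sana', 'F'), (102, 'Jass', 'M');
    INSERT INTO fact_units VALUES (101, 24, 1, 25), (102, 25, 2, 15), (103, 26, 3, 30);
    """
)

# Join the measure (units_sold) to its descriptive attribute (cust_name).
units_by_customer = conn.execute(
    """
    SELECT d.cust_name, SUM(f.units_sold)
    FROM fact_units AS f
    JOIN dim_customer AS d ON d.cust_id = f.cust_id
    GROUP BY d.cust_name
    ORDER BY d.cust_name
    """
).fetchall()
```

Customer 103 has no row in the sample dimension table, so the inner join leaves it out. Finding such unmatched keys is a common ETL test.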
11) What is a Data Mart?
A data mart is a simple form of data warehouse that is focused on a single functional area. It gets its data from only a few sources.
For example: In an organization, data marts may exist for marketing, finance, human resources, and other individual departments, each storing the data related to its specific function.
12) What is the difference between Manual Testing and ETL Testing?
The difference between Manual testing and ETL testing is:
- Manual testing focuses on the functionality of the program, while ETL testing is concerned with the database and its record counts.
- ETL testing is an automated process that needs comparatively little technical knowledge. It is very fast and systematic, and it assures the results the business requires.
- Manual testing is a time-consuming process that needs technical knowledge to write the test cases and scripts. It is slow, highly prone to errors, and effort-intensive.
13) What is ETL Testing?
ETL stands for Extraction, Transformation, and Loading. ETL testing is done to ensure that the data is loaded from the different sources to the destination accurately after the business transformation. It involves data verification at the multiple stages between the source and the destination.
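Two of the most basic verifications, a row-count comparison and a check for records missing from the target, might look like the following sketch. The tables `src_sales` and `tgt_sales` and their contents are hypothetical, and SQLite is used only so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE src_sales (id INTEGER, amount REAL);
    CREATE TABLE tgt_sales (id INTEGER, amount REAL);
    INSERT INTO src_sales VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_sales VALUES (1, 10.0), (2, 20.0);
    """
)


def row_count(conn, table):
    # Table names come from the test itself, never from user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def missing_in_target(conn):
    """Return source ids that never reached the target."""
    rows = conn.execute(
        "SELECT id FROM src_sales EXCEPT SELECT id FROM tgt_sales ORDER BY 1"
    ).fetchall()
    return [r[0] for r in rows]


counts_match = row_count(conn, "src_sales") == row_count(conn, "tgt_sales")
missing = missing_in_target(conn)
```

Here the counts differ and id 3 is reported as missing, which is the kind of defect an ETL tester would raise.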
14) What are the responsibilities of an ETL tester?
The responsibility of ETL Tester is divided into three major categories:
- Stage Tables
- Business Logic Transformation
- Target table loading from the staging table, once we apply the transformation.
The responsibilities of an ETL tester are:
- The tester tests the ETL software thoroughly.
- The tester checks the test components of the ETL data warehouse.
- The tester executes data-driven tests in the backend.
- The tester designs and executes test cases, test plans, test harnesses, etc.
- The tester identifies problems and suggests the best solution.
- The tester approves the requirements and design specifications.
- The tester transfers data from flat files.
- The tester writes SQL queries for the different test scenarios.
15) What is the need for ETL Testing?
Today, many systems are being migrated from old technology to new technology. During a migration, the data must also be moved from the old DBMS to the new one, so there is a strong need to test whether the data on the target side is correct.
Here are some important points where the need for ETL testing arises:
- ETL testing is used to keep an eye on the data being transferred from one system to another.
- ETL testing keeps track of the efficiency and speed of the process.
- ETL testing helps us become familiar with the ETL process before we implement it in our business and production.
16) Where do users apply ETL concepts? Explain.
- Before ETL tools, users wrote extensive code for everything from data transformation to data loading.
- ETL makes life easier, since one tool manages all the scenarios of transforming and loading the data.
Here are some examples of where ETL is used:
Example: Data Warehousing
ETL is used in data warehousing. Here, we need to fetch the data from multiple different systems and load it into the data warehouse database. The ETL concept is used to extract the data from the source, transform it, and load it into the target system.
Example: Data Migration
Data migration is a difficult task if we use PL/SQL. If we want to migrate the data in a simpler way, we use ETL tools.
Example: Mergers and Acquisitions
Today, many companies are merging into different MNCs. The need for ETL arises when the data has to be moved from one company to another.
17) How do we use ETL in third-party management?
Big organizations usually give different application development work to different vendors, because a single vendor cannot manage everything. Take the example of a telecommunication project where one company handles billing and another manages CRM. If the CRM company needs data from the company that manages billing, it receives a data feed from that company. The ETL process is used to load the data from that feed.
18) How do we use ETL in Data Warehousing?
ETL is most commonly used in data warehousing. Users fetch historical as well as current data to develop the data warehouse, so the data in the warehouse is a combination of historical and transactional data. The data sources of a data warehouse may differ. We need to fetch the data from multiple different systems and load it into a single target system, the data warehouse.
19) What is the difference between ETL Testing and Database Testing?
The differences between the ETL testing and Database testing are:
ETL Testing | Database Testing |
---|---|
In ETL testing, the goal is the reporting of business intelligence. | In DB testing, the goal is to integrate the data. |
The flow of the business environment is based on the data used earlier. | Database testing applies to business flow systems only. |
Tools such as Informatica, QuerySurge, and Cognos can be used. | In DB testing, the QTP and Selenium tools are used. |
In ETL testing, the dimensional model is used. | In DB testing, the relational model is used. |
In ETL testing, analytics are processed. | In DB testing, transactions are processed. |
Denormalized data is used in ETL testing. | Normalized data is used in DB testing. |
20) What are the characteristics of Data Warehouse?
- Data Warehouse is a database which is different from the operational database and stores the historical data.
- Data Warehouse Database contains the analytical as well as transactional data.
- Data Warehouse is used for data analysis and reporting purpose.
- Data Warehouse helps the higher management to take strategic and tactical decisions using historical or current data.
- Data Warehouse helps business users see the current trends needed to run the business.
21) What are the types of Data Warehouse systems?
- Online Analytical Processing (OLAP)
- Predictive Analysis
- Online Transactional Processing
- Data Mart
22) What are the steps followed in ETL testing process?
The different steps followed in ETL testing process are:
Step 1. Requirement Analysis
In this step, we understand the business structure and the requirements.
Step 2. Validation and Test Estimation
In this step, the time and expertise required are estimated.
Step 3. Test Planning and Designing the Testing Environment
This step is based on the validation and test estimation. The ETL environment is planned according to the inputs used in the test estimation, and the work proceeds according to that plan.
Step 4. Test Data Preparation and Execution
Test data is prepared and the tests are executed as per the requirements.
Step 5. Summary Report
On completion of the test run, a summary report is prepared for drawing conclusions and making improvements.
23) How is ETL used in Data Migration Project? Explain it.
ETL tools are generally used in data migration projects. Suppose an organization previously managed its data in Oracle 10g and now wants to use a SQL Server cloud database; the data must then be moved from the source to the target. ETL tools are very useful for this kind of movement. Writing the ETL code by hand is a very time-consuming process. To make this simpler, we use an ETL tool instead of hand-coding in PL/SQL or T-SQL. So the ETL process is useful in data migration projects.
24) What are the steps followed to choose an ETL tool?
Choosing an ETL tool is a difficult task. To select the right one, we need to consider many factors according to the project. Choosing the ETL tool for a specific project is a strategic move, even for a small project.
Here are some points which will help us to choose the ETL tool.
- Data Connectivity: The ETL tool should be able to communicate with any source of data, no matter where the data comes from. Data connectivity is critical.
- Performance: Moving and changing data requires serious processing power, so we need to check the performance factor.
- Transformation Flexibility: Merging, matching, and changing the data is critical. The ETL tool should provide all of these operations along with many transformation packages, and should allow the data to be modified in the transformation phase with simple drag and drop.
- Data Quality: We can take advantage of the data only when it is clean and consistent.
- Flexible data action options: When the ETL is ready, we need to check that it works on previous data as well as newly arriving data.
- Committed ETL vendor: We work with the organization's data during the ETL process, so we have to choose a vendor who knows the industry and whose support will be beneficial.
25) What are the ETL bugs?
The common types of ETL bugs are:
- Source Bugs
- Load Condition Bugs
- Calculation Bugs
- ECP related Bugs
- User-Interface Bugs
26) What is an Operational Data Store?
- ODS stands for Operational Data Store.
- The ODS works between the staging area and the data warehouse. The data in the ODS is held at a detailed level of granularity.
- Once data is inserted into the ODS, it is loaded into the EDW (enterprise data warehouse) through the ODS.
27) What is the data extraction phase in ETL?
Data extraction is simply the process of extracting data from multiple different sources using ETL tools.
There are two types of data extraction:
- Full Extraction: All the data is extracted from an operational or source system and loaded into the staging area.
- Partial Extraction: Sometimes the source system notifies us that specific data has been updated, and only that data is extracted. This is called a delta load.
Source System Performance: The data extraction strategy should not affect the performance of the source system.
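The difference between the two extraction types can be sketched as follows. Note the assumption: the text describes a notification from the source system, whereas this sketch uses a last-run timestamp (a "watermark") on a hypothetical `updated_at` column, which is another common way to find changed rows.

```python
from datetime import datetime

# Hypothetical source rows; `updated_at` is an assumed change-tracking column.
SOURCE = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]


def full_extract(source):
    """Full extraction: take every row."""
    return list(source)


def delta_extract(source, last_run):
    """Partial (delta) extraction: take only rows changed after the last run."""
    return [row for row in source if row["updated_at"] > last_run]


full_ids = [r["id"] for r in full_extract(SOURCE)]
delta_ids = [r["id"] for r in delta_extract(SOURCE, last_run=datetime(2024, 1, 4))]
```

A delta extraction reads fewer rows, which also helps keep the load on the source system low.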
28) What are the ETL Tools?
The popular tools are:
1. Enterprise ETL tools
- Informatica
- Talend
- IBM DataStage
- Ab Initio
- Microsoft SQL Server Integration Services (SSIS)
- Clover ETL
2. Open Source ETL tools
- Pentaho
- Kettle
29) What is partitioning in ETL?
For better performance, transactions often need to be subdivided. This process is known as partitioning. It ensures that the server can access the sources directly through multiple connections.
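The idea of partitioning can be illustrated with a generic Python sketch; it is not how any particular ETL tool implements it. The helper names are hypothetical, and `load_partition` only sums its rows as a stand-in for a worker that would process them over its own connection.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split rows into n roughly equal partitions (round-robin)."""
    return [rows[i::n] for i in range(n)]


def load_partition(rows):
    """Stand-in for a worker that would load its rows over its own connection."""
    return sum(rows)


rows = list(range(1, 11))
parts = partition(rows, 3)

# Each partition is handled by a separate worker, in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_sums = list(pool.map(load_partition, parts))
total = sum(partial_sums)
```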
30) What is ETL Pipeline?
ETL Pipeline refers to a set of processes that extract the data from one system, transform it, and load it into a database or data warehouse. ETL pipelines are built for data warehousing applications, which include both enterprise data warehouses and subject-specific data marts. ETL pipelines are also used for data migration solutions. Data warehouse/business intelligence engineers build ETL pipelines.
31) What is the Data Pipeline?
Data Pipeline refers to any set of processing elements that move data from one system to another. A data pipeline can be built for any kind of application that uses data to deliver value. It can be used for integrating data across applications, building data-driven web products, and carrying out data mining activities. Data engineers build data pipelines.
32) What is the staging place in the ETL Testing?
Staging place is the temporary storage area that is used during the data integration process. In this place, data is analyzed carefully for redundancy and duplication.
33) What is ETL mapping sheet? Define its significance.
An ETL mapping sheet contains all the necessary information from the source file and stores the details in rows and columns. Mapping sheets help in writing the SQL queries, which speeds up the testing process.
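As a sketch of how a mapping sheet can drive test queries, the example below builds a validation query from mapping rows. The sheet layout, the column and table names, and the `rule` templates are all hypothetical. The generated SQL uses `EXCEPT`, which some databases spell differently (older Oracle versions use `MINUS`).

```python
# Hypothetical mapping sheet rows: source column, target column, transformation rule.
MAPPING_SHEET = [
    {"source": "cust_nm", "target": "cust_name", "rule": "UPPER({col})"},
    {"source": "gndr", "target": "gender", "rule": "{col}"},
]


def build_validation_query(mapping, source_table, target_table):
    """Build a query listing transformed source rows that are absent from the target."""
    source_exprs = ", ".join(m["rule"].format(col=m["source"]) for m in mapping)
    target_cols = ", ".join(m["target"] for m in mapping)
    return (
        f"SELECT {source_exprs} FROM {source_table} "
        f"EXCEPT SELECT {target_cols} FROM {target_table}"
    )


query = build_validation_query(MAPPING_SHEET, "stg_customer", "dim_customer")
```

If the query returns no rows when run, every transformed source row was found in the target.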
34) What is the transformation in ETL Testing?
A transformation is defined as a repository object that generates, modifies, or passes data. A transformation can be active or passive. Transformations are beneficial in many ways:
- They help in getting values very quickly.
- They can update a slowly changing dimension table.
- They check or verify whether a record exists in a table.
35) What is the use of dynamic cache and static cache in transformation?
A dynamic cache is used to update a master table or a slowly changing dimension table. A static cache is used with flat files.
36) What are a Mapping, Workflow, Mapplet, Worklet, and Session?
- Mapping: A mapping represents the flow of data from source to target.
- Workflow: A workflow is a set of instructions that tells the Informatica server how to execute the tasks.
- Mapplet: A mapplet creates or configures a set of transformations.
- Worklet: It is an object that represents a set of tasks.
- Session: Session is a set of instructions that describe how and when to move the data from sources to target.
37) What are full load and incremental (refresh) load?
Full Load: A full load completely erases the contents of one or more tables and reloads them with fresh data.
Incremental Load: An incremental load applies the ongoing changes to one or more tables on a predefined schedule.
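The two load types can be sketched with SQLite as a stand-in target. The `target` table and the rows are hypothetical. `INSERT OR REPLACE` is SQLite-specific; other databases typically use `MERGE` or a similar upsert statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT)")


def full_load(conn, rows):
    """Full load: erase the table, then reload it with fresh data."""
    conn.execute("DELETE FROM target")
    conn.executemany("INSERT INTO target VALUES (?, ?)", rows)
    conn.commit()


def incremental_load(conn, changed_rows):
    """Incremental load: apply only new or changed rows."""
    # INSERT OR REPLACE updates an existing id and inserts a new one.
    conn.executemany("INSERT OR REPLACE INTO target VALUES (?, ?)", changed_rows)
    conn.commit()


full_load(conn, [(1, "a"), (2, "b")])
incremental_load(conn, [(2, "b2"), (3, "c")])
final_rows = conn.execute("SELECT id, value FROM target ORDER BY id").fetchall()
```

After the incremental load, row 1 is untouched, row 2 is updated, and row 3 is new.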
38) What are joiner and lookup?
The joiner is used to join two or more tables and retrieve data from them.
The lookup is used to check and compare the source table against the target table.
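The difference can be illustrated in plain Python. This is a conceptual sketch with hypothetical data, not the behavior of any specific tool's Joiner or Lookup transformation.

```python
orders = [{"order_id": 1, "cust_id": 101}, {"order_id": 2, "cust_id": 999}]
customers = [{"cust_id": 101, "name": "Sana"}, {"cust_id": 102, "name": "Jass"}]

# Lookup: index the reference table once, then check each source row against it.
customer_by_id = {c["cust_id"]: c for c in customers}


def lookup_name(order):
    """Return the matching customer name, or None when no match exists."""
    match = customer_by_id.get(order["cust_id"])
    return match["name"] if match else None


# Joiner: combine matching rows from both inputs into one output row (inner join).
joined = [
    {**o, "name": customer_by_id[o["cust_id"]]["name"]}
    for o in orders
    if o["cust_id"] in customer_by_id
]
looked_up = [lookup_name(o) for o in orders]
```

The join drops the unmatched order, while the lookup keeps every source row and reports the missing match as `None`.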
39) What is data purging?
Data purging is the process of removing and permanently erasing data from a storage space; in a data warehouse, it means deleting data from the warehouse. Usually, this involves cleaning up junk data, such as rows that contain null values or only spaces.
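A minimal purge of such junk rows might look like this, again using SQLite and a hypothetical `customers` table so that the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Sana"), (2, None), (3, "   "), (4, "Jass")],
)

# Purge junk rows: names that are NULL or contain only spaces.
cursor = conn.execute("DELETE FROM customers WHERE name IS NULL OR TRIM(name) = ''")
purged = cursor.rowcount  # number of rows deleted
conn.commit()
remaining = conn.execute("SELECT id FROM customers ORDER BY id").fetchall()
```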
40) What is the difference between ETL tools and OLAP tools?
ETL Tools: They are meant for extracting data from legacy systems and loading it into the specified database, with some data cleansing along the way.
Example: Informatica, DataStage, etc.
OLAP Tools: They are used for reporting on OLAP data, which is available in a multidimensional model. We can write a simple query to extract the data from the database.
Example: BusinessObjects, Cognos, etc.