Informatica IDQ
Informatica Data Quality is a suite of applications and components that we can integrate with Informatica PowerCenter to deliver enterprise-strength data quality capability in a wide range of scenarios.
IDQ has two core components:
- Data Quality Workbench
- Data Quality Server
Data Quality Workbench: It is used to design, test, and deploy data quality processes. Workbench allows testing and executing plans as needed, enabling rapid data investigation and testing of data quality methodologies.
Data Quality Server: It is used to enable plan and file sharing and to run programs in a networked environment. The Data Quality Server supports networking through service domains and communicates with Workbench over TCP/IP.
Both Workbench and Server install with a Data Quality engine and a Data Quality repository. Users cannot create or edit plans with Server, although they can run a plan on any Data Quality engine independently of Workbench, either through runtime commands or from PowerCenter.
Users can apply parameter files, which modify plan operations, to runtime commands when running data quality plans on a Data Quality engine. Informatica also provides a Data Quality Integration plug-in for PowerCenter.
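To make the parameter-file idea concrete, the sketch below shows how key/value overrides from a parameter file can modify a plan's default settings at run time. The KEY=VALUE layout and the setting names are illustrative assumptions, not Informatica's actual parameter-file format.

```python
# Illustrative sketch: apply key=value overrides from a parameter file
# to a plan's default settings. Format and names are assumptions, not
# the real Informatica parameter-file specification.

def parse_parameter_file(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blanks and # comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params

def apply_parameters(plan_defaults: dict, param_text: str) -> dict:
    """Return plan settings with parameter-file values overriding defaults."""
    settings = dict(plan_defaults)
    settings.update(parse_parameter_file(param_text))
    return settings

defaults = {"SOURCE_FILE": "customers.csv", "LOCALE": "en_US"}
overrides = "SOURCE_FILE=customers_emea.csv\n# switch locale\nLOCALE=en_GB"
print(apply_parameters(defaults, overrides))
# {'SOURCE_FILE': 'customers_emea.csv', 'LOCALE': 'en_GB'}
```

The point is that the same plan can be re-run against different sources or settings without editing the plan itself.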
In Data Quality, a project is a self-contained set of data analysis or data enhancement processes.
A project is composed of one or more of the following types of component:
- Data sources provide the input data for the plan.
- Data sinks collect the data output from the plan.
- Operational components perform the data analysis or data enhancement actions on the data they receive.
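The source-to-sink flow described above can be sketched as a simple pipeline. The function names and the standardization step are illustrative stand-ins, not IDQ component APIs.

```python
# Conceptual sketch of a Data Quality plan: a data source feeds records
# through an operational component into a data sink. All names here are
# illustrative, not IDQ's actual component API.

def source(rows):
    """Data source: yields the input records."""
    yield from rows

def standardize_name(record):
    """Operational component: trim and title-case the name field."""
    record = dict(record)
    record["name"] = record["name"].strip().title()
    return record

def sink(records):
    """Data sink: collect the processed output."""
    return list(records)

rows = [{"name": "  jane DOE "}, {"name": "JOHN smith"}]
output = sink(standardize_name(r) for r in source(rows))
print(output)  # [{'name': 'Jane Doe'}, {'name': 'John Smith'}]
```

A real plan chains many operational components (parsing, standardization, matching) between the source and the sink in the same way.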
IDQ has been a front runner in the Data Quality (DQ) tools market. This section provides a glance at the features these tools offer.
IDQ has two variants:
- Informatica Analyst
- Informatica Developer
Informatica Analyst: It is a web-based tool that can be used by business analysts and developers to analyze, profile, cleanse, standardize, and scorecard data in an enterprise.
Informatica Developer: It is a client-based tool where developers can create mappings to implement data quality transformations or services. This tool offers an editor where objects can be built with a wide range of data quality transformations such as Parser, Standardizer, Address Validator, Match, and Merge.
Develop once & deploy anywhere: Both tools can be used to create DQ rules or mappings, which can be deployed as web services. Once the DQ transformations are deployed as services, they can be used across the enterprise and across platforms.
Role of Dictionaries
Projects can make use of reference dictionaries to identify, repair, or remove inaccurate or duplicate data values. Informatica Data Quality projects can make use of three types of reference data.
Standard dictionary files: These files are installed with Informatica Data Quality and can be used by various kinds of components in Workbench. All dictionaries installed with Data Quality are text dictionaries: plain-text files saved in the .DIC file format. They can be created and edited manually.
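The sketch below illustrates how a plain-text reference dictionary can drive standardization: each entry maps variant spellings to a standard value. The semicolon-delimited layout used here is an assumption for demonstration, not the exact .DIC file specification.

```python
# Illustrative sketch of a plain-text reference dictionary used for
# standardization. The layout (standard value followed by its variants,
# one entry per line, semicolon-delimited) is an assumption for this
# example, not the actual .DIC specification.
import io

DICTIONARY = """\
Street;St;Str;Strt
Avenue;Ave;Av
Road;Rd
"""

def load_dictionary(fp) -> dict:
    """Map each variant (and the standard form itself) to the standard form."""
    lookup = {}
    for line in fp:
        values = [v.strip() for v in line.strip().split(";") if v.strip()]
        if values:
            standard = values[0]
            for variant in values:
                lookup[variant.lower()] = standard
    return lookup

lookup = load_dictionary(io.StringIO(DICTIONARY))
print(lookup["st"])   # Street
print(lookup["ave"])  # Avenue
```

Because the file is plain text, analysts can extend it with new variants without touching the plan that consumes it.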
Database dictionaries: Informatica Data Quality users with database expertise can design and specify dictionaries that are linked to database tables, which can be updated dynamically when the underlying data changes.
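The advantage of a database-backed dictionary is that updating the table immediately updates the dictionary, with no file redeployment. A minimal sketch using SQLite, with illustrative table and column names:

```python
# Sketch of a database-backed dictionary: lookups run against a table,
# so updating the table updates the dictionary dynamically. Table and
# column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ref_country (variant TEXT PRIMARY KEY, standard TEXT)")
conn.executemany(
    "INSERT INTO ref_country VALUES (?, ?)",
    [("usa", "United States"), ("u.s.", "United States"), ("uk", "United Kingdom")],
)

def standardize(conn, value: str) -> str:
    """Return the standard form, or the input unchanged if no match."""
    row = conn.execute(
        "SELECT standard FROM ref_country WHERE variant = ?", (value.lower(),)
    ).fetchone()
    return row[0] if row else value

print(standardize(conn, "USA"))   # United States
# Inserting into the underlying table immediately changes lookup behavior:
conn.execute("INSERT INTO ref_country VALUES ('gb', 'United Kingdom')")
print(standardize(conn, "GB"))    # United Kingdom
```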
Third-party reference data: These data files are provided by third-party vendors and are offered to Informatica customers as premium product options. The reference data provided by third-party vendors is typically in database format.
How to Integrate IDQ with MDM
Data cleansing and standardization is an essential aspect of any MDM project. Informatica MDM Multi-Domain Edition (MDE) provides a reasonable number of cleanse functions out of the box. However, there are scenarios where the out-of-the-box (OOTB) cleanse functions are not enough, and more comprehensive functions are needed to achieve data cleansing and standardization, e.g., address validation or sequence generation. Informatica Data Quality (IDQ) provides an extensive array of cleansing and standardization options and can easily be used along with Informatica MDM.
There are three methods to integrate IDQ with Informatica MDM.
- Informatica Platform staging
- IDQ Cleanse Library
- Informatica MDM as target
1. Informatica Platform Staging
Starting with Informatica MDM’s Multi-Domain Edition (MDE) version 10.x, Informatica has introduced a new feature called “Informatica Platform Staging” within MDM to integrate with IDQ (Developer tool). This feature enables staging or cleansing data directly into MDM’s Stage tables using IDQ mappings, bypassing the Landing tables.
Advantages
- Stage tables are immediately available to use in the Developer tool after synchronization, eliminating the need to manually create physical data objects.
- Changes to the synchronized structures are reflected in the Developer tool automatically.
- Enables loading data into Informatica MDM’s staging tables, bypassing the landing tables.
Disadvantages
- Creating a connection for each Base Object folder in the Developer tool can be inconvenient to maintain.
- Hub Stage options such as delta detection, hard delete detection, and audit trails are not available.
- System generated columns need to be populated manually.
- Rejected records are not captured in the _REJ table of the corresponding stage table but are instead captured in a .bad file.
- Invalid lookup values are not rejected while data is loaded to stage, unlike in the Hub Stage process; records with invalid values are instead rejected and captured later, by the Hub Load process.
2. IDQ Cleanse Library
IDQ allows us to create cleanse functions as operation mappings and deploy them as web services, which can then be imported into an Informatica MDM Hub implementation as a new type of cleanse library, the IDQ cleanse library. This functionality allows the imported IDQ cleanse functions to be used just like any other out-of-the-box cleanse function. Informatica MDM Hub acts as a web service client application that consumes IDQ’s web services.
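The request/response contract between the Hub (client) and the deployed cleanse service can be sketched as follows. The JSON payload shape and the cleanse logic are illustrative stand-ins, not the actual IDQ web-service schema.

```python
# Conceptual sketch of the web-service contract: the MDM Hub sends a
# batch of records to a deployed IDQ cleanse service and receives
# cleansed values back. The payload shape and the service logic are
# illustrative stand-ins, not the real IDQ web-service schema.
import json

def idq_cleanse_service(request_body: str) -> str:
    """Stand-in for a deployed IDQ cleanse operation mapping."""
    records = json.loads(request_body)["records"]
    cleansed = [{"name": r["name"].strip().title()} for r in records]
    return json.dumps({"records": cleansed})

# The MDM Hub acting as the web service client:
request = json.dumps({"records": [{"name": " alice JONES "}]})
response = json.loads(idq_cleanse_service(request))
print(response["records"])  # [{'name': 'Alice Jones'}]
```

In the real integration the call crosses the network (the Hub invokes the service over SOAP/HTTP), but the batch-in, batch-out contract is the same.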
Advantages
- Quickly build transformations in IDQ’s Informatica Developer tool rather than writing complex Java functions.
- Unlike Informatica Platform Staging, Hub Stage process options such as delta detection, hard delete detection, and audit trails are available for use.
Disadvantages
- Physical data objects need to be manually created for each staging table and manually updated for any changes to the table.
- The IDQ function must contain all transformation logic to leverage the batching of records. If any transformation logic is additionally defined in the MDM map, then the IDQ web service is called one record at a time, leading to performance issues.
- Web service invocations are synchronous only, which can be a concern for large data volumes.
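The batching concern above can be made concrete with a back-of-the-envelope model: each invocation carries a fixed overhead, so one call per record multiplies that overhead by the record count. The cost figures below are assumed for illustration only.

```python
# Sketch of why batching matters: each web-service invocation carries a
# fixed overhead, so sending one record per call multiplies that overhead
# by the record count. Both cost figures are illustrative assumptions.

CALL_OVERHEAD_MS = 50   # assumed fixed cost per synchronous invocation
PER_RECORD_MS = 1       # assumed processing cost per record

def total_time_ms(num_records: int, batch_size: int) -> int:
    calls = -(-num_records // batch_size)  # ceiling division
    return calls * CALL_OVERHEAD_MS + num_records * PER_RECORD_MS

print(total_time_ms(10_000, 1))      # 510000 ms: one call per record
print(total_time_ms(10_000, 1000))   # 10500 ms: batched calls
```

This is why the transformation logic should live entirely in the IDQ function: it keeps the Hub sending full batches instead of single records.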
3. Informatica MDM as target
3.1 Loading data into landing tables
Informatica MDM can be used as a target for loading data into its landing tables.
Advantages
- The single connection created in the Developer tool for Informatica MDM is less cumbersome when compared to creating multiple connections with Informatica platform staging.
- No need to standardize data again in the Hub Stage process, since the IDQ mappings standardize it before loading.
- Unlike Informatica Platform Staging, Hub Stage process options (delta detection, hard delete detection, audit trails) are available for use.
Disadvantages
- Physical data objects need to be manually created for each landing table and manually updated for any changes to the table.
- Mappings need to be developed at two levels: (i) source to landing and (ii) landing to staging (direct mapping).
3.2 Loading data into staging tables (bypassing landing tables)
Informatica MDM can be used as a target for loading data directly into staging tables in Informatica MDM, bypassing the landing tables.
Advantages
- The single connection created in the Developer tool for Informatica MDM is less cumbersome when compared to creating multiple connections with Informatica platform staging.
- It can be used with lower versions of Informatica MDM, where the Informatica Platform Staging option is not available.
Disadvantages
- Physical data objects need to be manually created for each staging table and manually updated for any changes to the table.
- Hub Stage options such as delta detection, hard delete detection, and audit trails are not available.
- System generated columns need to be populated manually.
- Rejected records are not captured in the _REJ table of the corresponding stage table but are instead captured in a .bad file.
- Invalid lookup values are not rejected while data is loaded to stage, unlike in the Hub Stage process; records with invalid values are instead rejected and captured later, by the Hub Load process.