Starting Sqoop
Sqoop is a command-line tool with the following structure:
sqoop TOOL PROPERTY_ARGS SQOOP_ARGS [-- EXTRA_ARGS]
- TOOL names the operation to perform, e.g. "import" or "export".
- PROPERTY_ARGS are Java properties in the format "-Dname=value".
- SQOOP_ARGS are the Sqoop parameters for the chosen tool.
- EXTRA_ARGS are for specialized connectors and are separated from the SQOOP_ARGS with "--".
Example:
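The sketch below shows how the four parts fit together in one command. The host `db.example.com`, database `shop`, table `orders`, user `sqoop_user`, and target directory are hypothetical placeholders, and the job name is only an illustrative property. The snippet assembles the command and prints it rather than running it, so it can be inspected on a machine without Sqoop.

```shell
# General form: sqoop TOOL PROPERTY_ARGS SQOOP_ARGS [-- EXTRA_ARGS]
# Host, database, table, user, and paths are hypothetical placeholders.

TOOL="import"

# Java/Hadoop properties, written as -Dname=value
PROPERTY_ARGS="-Dmapreduce.job.name=orders_import"

# Sqoop parameters: -P prompts for the password, --direct selects the
# MySQL direct connector
SQOOP_ARGS="--connect jdbc:mysql://db.example.com/shop \
--username sqoop_user -P \
--table orders --target-dir /user/hadoop/orders --direct"

# Connector-specific arguments, passed through after the "--" separator
EXTRA_ARGS="--default-character-set=latin1"

CMD="sqoop $TOOL $PROPERTY_ARGS $SQOOP_ARGS -- $EXTRA_ARGS"

# Print the assembled command; paste it into a shell where Sqoop is installed.
echo "$CMD"
```

Note that the `-D` properties have to come directly after the tool name and before the tool-specific parameters.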
Type "sqoop help" to list all the available tools.
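As a small sketch, the help commands can be wrapped in a guard so the snippet does nothing harmful on a machine where Sqoop is not installed. The function name `show_sqoop_help` is made up for this illustration.

```shell
# List every available tool, then show the options of a single tool.
# show_sqoop_help is a hypothetical helper name used only in this sketch.
show_sqoop_help() {
  if command -v sqoop >/dev/null 2>&1; then
    sqoop help            # all tools
    sqoop help import     # options of one tool; "sqoop import --help" also works
  else
    echo "sqoop not found on PATH"
  fi
}

show_sqoop_help
```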
What happens in the backend
When you need to move data from an RDBMS into HDFS, Apache Sqoop is usually the first tool to reach for. When you request an import into HDFS, the following things happen:
- Sqoop asks the relational database for metadata about the table.
- The relational database returns the requested metadata.
- Based on that metadata, Sqoop generates Java classes.
- The table is partitioned on its primary key so that multiple mappers can import data at the same time.
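The partitioning step can be illustrated with a little arithmetic. By default Sqoop looks up the minimum and maximum values of the split column (the primary key, unless `--split-by` names another column) and divides that range among the mappers. The sketch below assumes a hypothetical numeric key `id` with values from 1 to 1000 and four mappers. It only mimics the idea, and Sqoop's own splitter may place the boundaries slightly differently.

```shell
# Sketch of range splitting on a numeric primary key.
# The bounds stand in for the result of a query such as
#   SELECT MIN(id), MAX(id) FROM orders   (hypothetical table and column)
split_ranges() {
  min_id=$1
  max_id=$2
  mappers=$3

  total=$((max_id - min_id + 1))
  size=$(( (total + mappers - 1) / mappers ))   # ceiling division

  lo=$min_id
  i=0
  while [ "$i" -lt "$mappers" ] && [ "$lo" -le "$max_id" ]; do
    hi=$((lo + size - 1))
    if [ "$hi" -gt "$max_id" ]; then
      hi=$max_id
    fi
    # Each mapper imports only the rows inside its own range.
    echo "mapper $i: WHERE id >= $lo AND id <= $hi"
    lo=$((hi + 1))
    i=$((i + 1))
  done
}

split_ranges 1 1000 4
```

On the command line, the split column and the degree of parallelism are set with `--split-by` and `--num-mappers` (short form `-m`), for example `--split-by id --num-mappers 4`.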