Oracle Loader for Hadoop

Oracle Loader for Hadoop is a high-performance loader that moves data quickly from a Hadoop cluster into a table in an Oracle database. It prepares data for loading into a database table, pre-partitioning the data if necessary and transforming it into an Oracle-ready format. It optionally sorts records before loading the data or creating output files. Oracle Loader for Hadoop is a MapReduce application that is invoked as a command-line utility and accepts the generic command-line options supported by the org.apache.hadoop.util.Tool interface.
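The pre-partitioning and optional sorting described above can be pictured with a small sketch. This is only an illustration and not Oracle Loader for Hadoop's own logic: the function name partition_and_sort, the modulo rule on an integer key, and the sample records are all assumptions made for the example. The real loader runs as a distributed MapReduce job and follows the partitioning scheme of the target table.

```python
from collections import defaultdict


def partition_and_sort(records, num_partitions, partition_key, sort_key):
    """Group records into partitions, then sort the rows inside each one.

    Hypothetical stand-in for a pre-partition-and-sort step; the modulo
    rule below is an assumption, not the target table's real scheme.
    """
    partitions = defaultdict(list)
    for record in records:
        # Assign each record to a partition from its integer key.
        partition_id = record[partition_key] % num_partitions
        partitions[partition_id].append(record)
    # Sort rows within each partition so they arrive in order at load time.
    return {
        pid: sorted(rows, key=lambda r: r[sort_key])
        for pid, rows in sorted(partitions.items())
    }


# Made-up sample data for the illustration.
records = [
    {"id": 7, "name": "gamma"},
    {"id": 2, "name": "alpha"},
    {"id": 4, "name": "delta"},
    {"id": 1, "name": "beta"},
]
result = partition_and_sort(records, 2, "id", "name")
```

In this sketch each partition could be handed to a separate reducer, which is the property that lets the load proceed in parallel.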

After the pre-partitioning and transforming steps, there are two modes for loading the data into an Oracle database from a Hadoop cluster:

Online database mode: The data is loaded into the database using either a JDBC output format or an OCI Direct Path output format. The OCI Direct Path output format performs a high-performance direct path load of the target table. The JDBC output format performs a conventional path load. In both cases, the reducer tasks connect to the database in parallel.

Offline database mode: The reducer tasks create binary or text format output files. The Data Pump output format creates binary format files that are ready to be loaded into an Oracle database using Oracle Direct Connector for HDFS. The Delimited Text output format creates text files in delimited record format. (This is usually called comma-separated value (CSV) format when the delimiter is a comma.) These text files are also ready to be loaded into an Oracle database using Oracle Direct Connector for HDFS. Alternatively, the output files can be copied to the database system and loaded manually. For Data Pump files, Oracle Loader for Hadoop produces a SQL script that contains the commands to create an external table that can be used to load the Data Pump files. Delimited text files can be loaded manually using either SQL*Loader or external tables. For each delimited text file, Oracle Loader for Hadoop produces a SQL*Loader control file that can be used to load that file. It also produces a single SQL script to load the delimited text file(s) into the target external table.
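To make the delimited record format mentioned above concrete, here is a minimal sketch that writes rows as delimited text with Python's standard csv module. It does not reproduce the exact files that the Delimited Text output format creates: the helper name to_delimited_text, the sample rows, and the quoting behavior (the csv module's default minimal quoting) are assumptions chosen for illustration.

```python
import csv
import io


def to_delimited_text(rows, delimiter=","):
    """Render rows as delimited records, one record per line.

    Hypothetical helper: quoting follows the csv module's defaults,
    which may differ from what Oracle Loader for Hadoop emits.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows; the second one contains a comma inside a field.
rows = [
    (1, "alpha", "2012-01-15"),
    (2, "beta, inc", "2012-02-01"),
]

csv_text = to_delimited_text(rows)                   # comma delimiter (CSV)
pipe_text = to_delimited_text(rows, delimiter="|")   # same data, pipe delimiter
```

With a comma delimiter, the field that itself contains a comma has to be quoted. With a pipe delimiter it does not, which is one reason a loader lets you choose the delimiter to suit the data.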

Oracle Loader for Hadoop is installed and runs on the Hadoop cluster. It resides on a node from which you submit MapReduce jobs.

Oracle Big Data Connectors must be licensed separately from Oracle Big Data Appliance. If Oracle Big Data Connectors are licensed and you have chosen the option to install the connectors in the configuration script, then the Mammoth utility installs Oracle Loader for Hadoop on all nodes of the non-primary racks on the Oracle Big Data Appliance.
