Saturday, December 19, 2015

Study Notes - CCDH-410 Hadoop Developer Study Notes

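Sqoop: imports and exports data between an RDBMS and HDFS
Pig: data flow language (Pig Latin) for transforming large data sets
Hive: SQL-like query language (HiveQL), accessible over JDBC/ODBC
Oozie: workflow scheduler for Hadoop jobs
Flume: collects and moves streaming data (e.g. log events) into HDFS
HBase: column-family-oriented store on top of HDFS that gives high-throughput random reads and writes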

NameNode: holds the metadata for HDFS (which blocks make up each file and where they are stored)
Secondary NameNode: performs housekeeping (checkpointing) for the NameNode; it is not a hot standby
DataNode: stores the actual HDFS data blocks
JobTracker: manages MapReduce jobs and distributes individual tasks to machines
TaskTracker: instantiates and monitors individual map and reduce tasks
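
To make the NameNode/DataNode split easier to remember, here is a toy Python model. It is only a sketch and not how HDFS is implemented: the class names, `put_file`, `read_file`, the 4-byte block size, the replication factor of 2 and the round-robin placement are all made up for illustration. The point it shows is that the NameNode keeps metadata only, while the block contents live on the DataNodes.

```python
BLOCK_SIZE = 4    # toy value; real HDFS blocks are far larger
REPLICATION = 2   # toy replication factor (assumption for this sketch)


class DataNode:
    """Stores actual block contents, keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes


class NameNode:
    """Holds metadata only: file -> block ids, block id -> DataNode names."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.file_blocks = {}      # path -> [block_id, ...]
        self.block_locations = {}  # block_id -> [datanode name, ...]
        self._next_id = 0

    def allocate_block(self, path):
        """Record a new block for `path` and pick the DataNodes that hold it."""
        block_id = self._next_id
        self._next_id += 1
        # Hypothetical round-robin placement, not the real HDFS policy.
        targets = [
            self.datanodes[(block_id + i) % len(self.datanodes)]
            for i in range(REPLICATION)
        ]
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = [dn.name for dn in targets]
        return block_id, targets


def put_file(namenode, path, data):
    """Client side: ask the NameNode where to write, send bytes to DataNodes."""
    for offset in range(0, len(data), BLOCK_SIZE):
        block_id, targets = namenode.allocate_block(path)
        for dn in targets:
            dn.blocks[block_id] = data[offset:offset + BLOCK_SIZE]


def read_file(namenode, path):
    """Client side: look up block locations, then fetch from the first replica."""
    by_name = {dn.name: dn for dn in namenode.datanodes}
    parts = []
    for block_id in namenode.file_blocks[path]:
        first_replica = namenode.block_locations[block_id][0]
        parts.append(by_name[first_replica].blocks[block_id])
    return b"".join(parts)


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode(nodes)
put_file(nn, "/notes.txt", b"hello hadoop")
print(nn.file_blocks["/notes.txt"])   # block ids recorded by the NameNode
print(read_file(nn, "/notes.txt"))    # bytes reassembled from the DataNodes
```

In this sketch the file bytes never pass through the `NameNode` object; the client-side helpers talk to the DataNodes directly after getting block locations, which mirrors the division of roles in the notes above.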