Saturday, December 19, 2015

Study Notes - CCDH-410 Hadoop Developer Study Notes

Sqoop: bulk import/export of data between relational databases and HDFS
Pig: high-level dataflow language (Pig Latin) for transforming large data sets
Hive: SQL-like query language (HiveQL), accessible via JDBC/ODBC
Oozie: workflow scheduler for Hadoop jobs
Flume: collects and moves streaming data (e.g. log files) into HDFS
HBase: column-family NoSQL database that provides high-throughput random access

NameNode: holds metadata for HDFS
Secondary NameNode: performs housekeeping functions for the NameNode (periodic checkpointing of metadata); it is not a failover standby
DataNode: stores actual HDFS data blocks
JobTracker: manages MapReduce jobs and distributes individual tasks to machines

TaskTracker: instantiates and monitors individual map and reduce tasks

Thursday, August 6, 2015

Project Notes- Install Single Node Hadoop on Amazon Ubuntu AMI & verify with wordcount MR.

This blog entry documents how we set up a single-node Hadoop instance on an Amazon Ubuntu AMI and verify it by running a wordcount MapReduce job.
Note that this instance is for testing/demonstration purposes only. This kind of security configuration should by no means be used for a real-life deployment.
Step-By-Step Procedure:
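Once the single-node cluster is running, the wordcount job used for verification can be sketched as a Hadoop Streaming mapper/reducer pair. This is a minimal Python sketch, not the stock Java wordcount that ships with Hadoop (which serves the same purpose); it assumes Python is available on the node.

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit (word, 1) for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Sum counts per word; pairs must arrive sorted by word,
    as the Hadoop shuffle/sort phase guarantees."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    # When run as a streaming mapper: read stdin, write tab-separated pairs.
    for word, count in mapper(sys.stdin):
        print(f"{word}\t{count}")
```

The two halves would be wired together with the hadoop-streaming jar bundled with the install (the jar path varies by Hadoop version), passing this script as both `-mapper` and, with the reduce logic on stdin, `-reducer`.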

Wednesday, July 29, 2015

Project Notes -Parse JSON log file using MapReduce on AWS AMI

This blog entry discusses how we parse a JSON log file using MapReduce.
We will be using the AWS AMI we set up earlier to perform the MapReduce task.
1. We use a simple JSON log file generator to produce the following JSON log file, saved as demo.txt:

{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID964", "location": {"y": 156, "x": 292}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID442", "location": {"y": 135, "x": 323}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID314", "location": {"y": 153, "x": 316}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID711", "location": {"y": 131, "x": 310}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID397", "location": {"y": 170, "x": 347}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID120", "location": {"y": 122, "x": 355}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID591", "location": {"y": 117, "x": 213}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID164", "location": {"y": 125, "x": 341}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID187", "location": {"y": 135, "x": 382}}
{"time_stamp": "2015-07-01 19:19:13", "user_id": "UID623", "location": {"y": 137, "x": 359}}

Saturday, March 21, 2015

Study Notes - Amazon Web Services -EC2

What I learned about Amazon Web Services (AWS) from infinitieSkills Training

Introduction: 

Amazon Web Services (AWS) is a collection of cloud computing and infrastructure resources available from Amazon.com.

From this training, I learned about EC2 (Elastic Compute Cloud), which provides virtual servers in the cloud, along with storage and content delivery, database administration and security, and deployment and management.

I also practiced building EC2 Linux/Windows instances, monitoring and security reporting, and provisioning.

Saturday, March 14, 2015

Project Notes- Create a Web Application using Apache Cassandra

Purpose: 
Creating a web application using Cassandra. 

Description: 
Use one Cassandra cluster object, one session object, and 4 Cassandra nodes.

Pre-Requisites:
1. Acquired VMware Fusion (to be able to create a virtual machine).
2. Acquired Ubuntu ISO disk image (to have an operating system as a base for creating a virtual machine).
3. Used VMware Fusion to create a virtual machine from the disk image.
4. Installed vim (a text editor) on the virtual machine.
5. Installed curl (for referencing URLs on the command line).
6. Installed Oracle JDK (to be able to create and run Java applications).
7. Installed Apache Maven (for managing dependencies when developing an application).
8. Installed Eclipse (to have a development environment for writing a Java application).
9. Installed Tomcat (to be able to host and serve a Java application).
10. Downloaded Apache Cassandra to the virtual machine.
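The one-cluster-object, one-session-object structure described above can be sketched with the DataStax Python driver (the actual application is a Java web app served from Tomcat, but the driver calls are analogous; the contact points, keyspace, and table names below are hypothetical, and real code should prefer prepared statements over string-built CQL):

```python
def insert_location_cql(table, user_id, x, y):
    """Build a CQL INSERT for one location event (illustration only;
    prepared statements with bind values are preferable in real code)."""
    return (f"INSERT INTO {table} (user_id, x, y) "
            f"VALUES ('{user_id}', {x}, {y})")

if __name__ == "__main__":
    # Driver usage sketch: requires `pip install cassandra-driver`
    # and a running 4-node cluster; addresses below are hypothetical.
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
    session = cluster.connect("demo_keyspace")  # one session, reused app-wide
    session.execute(insert_location_cql("locations", "UID964", 292, 156))
    cluster.shutdown()
```

Keeping a single Cluster and a single Session for the whole application is the recommended pattern: the driver pools connections to all nodes internally, so creating a session per request would waste connections.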