Thursday, August 6, 2015

Project Notes: Install Single-Node Hadoop on an Amazon Ubuntu AMI & Verify with a WordCount MR Job

This blog entry documents how we set up a single-node Hadoop instance on an Amazon Ubuntu AMI and verify it by running a wordcount MapReduce job.
Note that this instance is for testing/demonstration purposes only. This kind of security configuration should by no means be used for a real-life deployment.
Step-By-Step Procedure:

1. Create an Amazon EC2 Instance
  • http://aws.amazon.com/ec2/
  • Select Launch instance
    • Step 1: Choose an Amazon Machine Image (AMI): select Ubuntu 14.04
    • Step 2: Choose an Instance Type: select t2.large
    • Step 3: Configure Instance Details: leave default
    • Step 4: Add Storage: put 30GB
    • Step 5: Tag Instance: leave default
    • Step 6: Configure Security Group: custom TCP rule, port range 0-36600, source anywhere (wide open for demo convenience; see the security note above)
    • Step 7: Review Instance Launch
2. SSH into the EC2 instance
  • Get the public DNS/IP address of the EC2 instance
  • Use PuTTY to SSH into the instance (EC2 uses key-based SSH, not telnet)
  • Follow this blog for instructions on PuTTY: http://www.hkitblog.com/?p=24492
3. Once connected over SSH, install Java
  • k@laptop:~$ cd ~
    
    # Update the source list
    k@laptop:~$ sudo apt-get update
    
    # The OpenJDK project is the default version of Java 
    # that is provided from a supported Ubuntu repository.
    k@laptop:~$ sudo apt-get install default-jdk
    
    k@laptop:~$ java -version
    java version "1.7.0_65"
    OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.14.04.1)
    OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
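Later on, hadoop-env.sh needs JAVA_HOME to point at this JDK. With the default-jdk package the path can be discovered from the `java` binary itself — a sketch (the exact directory varies by JDK release):

```shell
# Skip gracefully if java is not on PATH.
command -v java >/dev/null || { echo "java not installed; skipping"; exit 0; }
# Resolve symlinks to the real binary, e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
JAVA_PATH=$(readlink -f "$(command -v java)")
# Strip the trailing /jre/bin/java or /bin/java to get the JDK home.
JAVA_HOME=${JAVA_PATH%/jre/bin/java}
JAVA_HOME=${JAVA_HOME%/bin/java}
export JAVA_HOME
echo "$JAVA_HOME"
```

The resulting path is what goes into the `export JAVA_HOME=...` line of hadoop-env.sh in step 7.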
4. Add a Hadoop user
  • k@laptop:~$ sudo addgroup hadoop
    Adding group `hadoop' (GID 1002) ...
    Done.
    
    k@laptop:~$ sudo adduser --ingroup hadoop hduser
    Adding user `hduser' ...
    Adding new user `hduser' (1001) with group `hadoop' ...
    Creating home directory `/home/hduser' ...
    Copying files from `/etc/skel' ...
    Enter new UNIX password: 
    Retype new UNIX password: 
    passwd: password updated successfully
    Changing the user information for hduser
    Enter the new value, or press ENTER for the default
     Full Name []: 
     Room Number []: 
     Work Phone []: 
     Home Phone []: 
     Other []: 
    Is the information correct? [Y/n] Y
5. Give SU privilege to Hadoop User:
  • hduser@laptop:~/hadoop-2.6.0$ su k
    Password: 
    
    k@laptop:/home/hduser$ sudo adduser hduser sudo
    [sudo] password for k: 
    Adding user `hduser' to group `sudo' ...
    Adding user hduser to group sudo
    Done.
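One step these notes skip: Hadoop's start-all.sh launches the daemons over `ssh localhost`, so hduser normally needs passwordless SSH to the local machine. A common way to set that up (assuming openssh-server is installed, as it is on the Ubuntu AMI):

```shell
# Run as hduser. Skip gracefully if ssh-keygen is unavailable.
command -v ssh-keygen >/dev/null || { echo "ssh-keygen not found; skipping"; exit 0; }
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# Create a passphrase-less key only if one does not already exist (-P "" = empty passphrase).
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa" -q
# Authorize the key for logins to this machine.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

Verify with `ssh localhost`; it should log in without prompting for a password.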
6. Install Hadoop
  • hduser@laptop:~$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
    hduser@laptop:~$ tar xvzf hadoop-2.6.0.tar.gz
  • Move the Hadoop installation to the /usr/local/hadoop directory (the target directory must exist before moving multiple files into it):
  • hduser@laptop:~/hadoop-2.6.0$ sudo mkdir -p /usr/local/hadoop
    hduser@laptop:~/hadoop-2.6.0$ sudo mv * /usr/local/hadoop
  • hduser@laptop:~/hadoop-2.6.0$ sudo chown -R hduser:hadoop /usr/local/hadoop
7. Setup Hadoop configuration files
The following files will have to be modified to complete the Hadoop setup:
  1. ~/.bashrc
  2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  3. /usr/local/hadoop/etc/hadoop/core-site.xml
  4. /usr/local/hadoop/etc/hadoop/mapred-site.xml (created by copying mapred-site.xml.template)
  5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Follow this post for the settings:
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
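For reference, the single-node values these files end up with look roughly like this — a sketch following the linked guide, with typical default ports and paths. In ~/.bashrc one exports JAVA_HOME and HADOOP_HOME=/usr/local/hadoop and adds $HADOOP_HOME/bin and $HADOOP_HOME/sbin to PATH; the mapred-site.xml settings depend on whether MRv1 or YARN is used, so follow the linked post for that file. The two core XML files get:

```xml
<!-- core-site.xml: where clients find the default filesystem -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

<!-- hdfs-site.xml: single node, so keep one replica of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
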
8. Format the Hadoop filesystem (in Hadoop 2.x, `hadoop namenode -format` still works but is deprecated in favor of `hdfs namenode -format`)
  • hduser@laptop:~$ hdfs namenode -format
9. Start Hadoop 
  • k@laptop:/usr/local/hadoop/sbin$ sudo su hduser
    
    hduser@laptop:/usr/local/hadoop/sbin$ start-all.sh
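After start-all.sh returns, a quick sanity check (not in the original notes) is `jps`, which lists the running Java processes:

```shell
# Skip gracefully if the JDK's jps tool is not on PATH.
command -v jps >/dev/null || { echo "jps not on PATH (install a JDK)"; exit 0; }
# A healthy Hadoop 2.x single node typically shows NameNode, DataNode,
# SecondaryNameNode, ResourceManager and NodeManager alongside Jps itself.
jps
```

If any of the daemons are missing, check the logs under /usr/local/hadoop/logs.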
10. Export the Path
  • export PATH=$PATH:/usr/local/hadoop/bin/
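The export above lasts only for the current shell session; to make it stick across logins, one can append it to hduser's ~/.bashrc (a sketch; HADOOP_HOME matches the install location above):

```shell
# Append the Hadoop PATH setup once; the grep guard avoids duplicate entries.
grep -q 'HADOOP_HOME=/usr/local/hadoop' "$HOME/.bashrc" 2>/dev/null || cat >> "$HOME/.bashrc" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
```
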
11. Run MapReduce 
  • get the wordcount jar from s3:
    • hduser@ip-172-31-6-97:/home$ wget https://s3-us-west-1.amazonaws.com/hadoopclass1/WordCountPartitioner.jar
  • get the text file from s3:
    • wget https://s3-us-west-1.amazonaws.com/hadoopclass1/datasets/shakespeare/comedies
  • Put the file into HDFS:
    • hadoop fs -mkdir /input-file
    • hadoop fs -put comedies /input-file/comedies
  • Run wordcount (writing to the same output directory we read back below):
    • hadoop jar WordCountPartitioner.jar WordCount /input-file/comedies /output-file/wordcount
  • Read the output files:
    • hadoop fs -cat /output-file/wordcount/*
