Thursday, August 6, 2015

Project Notes: Install Single-Node Hadoop on an Amazon Ubuntu AMI & Verify with a WordCount MR Job

This blog entry documents how we set up a single-node Hadoop instance on an Amazon Ubuntu AMI and verify it by running a wordcount MapReduce job.
Note that this instance is for testing/demonstration purposes only. This kind of security configuration should by no means be used for a real-life deployment.
Step-By-Step Procedure:

1. Create an Amazon EC2 Instance
  • http://aws.amazon.com/ec2/
  • Select Launch instance
    • Step 1: Choose an Amazon Machine Image (AMI): select Ubuntu 14.04
    • Step 2: Choose an Instance Type: select t2.large
    • Step 3: Configure Instance Details: leave default
    • Step 4: Add Storage: put 30GB
    • Step 5: Tag Instance: leave default
    • Step 6: Configure Security Group: custom TCP rule, port range 0-36600, source anywhere (wide open for demo convenience; see the security note above)
    • Step 7: Review Instance Launch
2. SSH into the EC2 instance
  • Get the public DNS/IP address of the EC2 instance
  • Use PuTTY to SSH into the instance (EC2 uses key-based SSH, not telnet)
  • Follow this blog for instructions on PuTTY: http://www.hkitblog.com/?p=24492
3. Once connected over SSH, install Java
  • k@laptop:~$ cd ~
    
    # Update the source list
    k@laptop:~$ sudo apt-get update
    
    # The OpenJDK project is the default version of Java 
    # that is provided from a supported Ubuntu repository.
    k@laptop:~$ sudo apt-get install default-jdk
    
    k@laptop:~$ java -version
    java version "1.7.0_65"
    OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.14.04.1)
    OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
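Later on, hadoop-env.sh needs JAVA_HOME to point at this JDK. With the default-jdk package the path can be discovered from the `java` binary itself — a sketch (the exact directory varies by JDK release):

```shell
# Skip gracefully if java is not on PATH.
command -v java >/dev/null || { echo "java not installed; skipping"; exit 0; }
# Resolve symlinks to the real binary, e.g. /usr/lib/jvm/java-7-openjdk-amd64/jre/bin/java
JAVA_PATH=$(readlink -f "$(command -v java)")
# Strip the trailing /jre/bin/java or /bin/java to get the JDK home.
JAVA_HOME=${JAVA_PATH%/jre/bin/java}
JAVA_HOME=${JAVA_HOME%/bin/java}
export JAVA_HOME
echo "$JAVA_HOME"
```

The resulting path is what goes into the `export JAVA_HOME=...` line of hadoop-env.sh in step 7.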
4. Add a Hadoop user
  • k@laptop:~$ sudo addgroup hadoop
    Adding group `hadoop' (GID 1002) ...
    Done.
    
    k@laptop:~$ sudo adduser --ingroup hadoop hduser
    Adding user `hduser' ...
    Adding new user `hduser' (1001) with group `hadoop' ...
    Creating home directory `/home/hduser' ...
    Copying files from `/etc/skel' ...
    Enter new UNIX password: 
    Retype new UNIX password: 
    passwd: password updated successfully
    Changing the user information for hduser
    Enter the new value, or press ENTER for the default
     Full Name []: 
     Room Number []: 
     Work Phone []: 
     Home Phone []: 
     Other []: 
    Is the information correct? [Y/n] Y
5. Give SU privilege to Hadoop User:
  • hduser@laptop:~/hadoop-2.6.0$ su k
    Password: 
    
    k@laptop:/home/hduser$ sudo adduser hduser sudo
    [sudo] password for k: 
    Adding user `hduser' to group `sudo' ...
    Adding user hduser to group sudo
    Done.
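One step these notes skip: Hadoop's start-all.sh launches the daemons over `ssh localhost`, so hduser normally needs passwordless SSH to the local machine. A common way to set that up (assuming openssh-server is installed, as it is on the Ubuntu AMI):

```shell
# Run as hduser. Skip gracefully if ssh-keygen is unavailable.
command -v ssh-keygen >/dev/null || { echo "ssh-keygen not found; skipping"; exit 0; }
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# Create a passphrase-less key only if one does not already exist (-P "" = empty passphrase).
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa" -q
# Authorize the key for logins to this machine.
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

Verify with `ssh localhost`; it should log in without prompting for a password.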
6. Install Hadoop
  • hduser@laptop:~$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
    hduser@laptop:~$ tar xvzf hadoop-2.6.0.tar.gz
  • Move the Hadoop installation to the /usr/local/hadoop directory (the target directory must exist before moving multiple files into it):
  • hduser@laptop:~/hadoop-2.6.0$ sudo mkdir -p /usr/local/hadoop
    hduser@laptop:~/hadoop-2.6.0$ sudo mv * /usr/local/hadoop
  • hduser@laptop:~/hadoop-2.6.0$ sudo chown -R hduser:hadoop /usr/local/hadoop
7. Setup Hadoop configuration files
The following files will have to be modified to complete the Hadoop setup:
  1. ~/.bashrc
  2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
  3. /usr/local/hadoop/etc/hadoop/core-site.xml
  4. /usr/local/hadoop/etc/hadoop/mapred-site.xml (created by copying mapred-site.xml.template)
  5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Follow this post for the settings:
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
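For reference, the single-node values these files end up with look roughly like this — a sketch following the linked guide, with typical default ports and paths. In ~/.bashrc one exports JAVA_HOME and HADOOP_HOME=/usr/local/hadoop and adds $HADOOP_HOME/bin and $HADOOP_HOME/sbin to PATH; the mapred-site.xml settings depend on whether MRv1 or YARN is used, so follow the linked post for that file. The two core XML files get:

```xml
<!-- core-site.xml: where clients find the default filesystem -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>

<!-- hdfs-site.xml: single node, so keep one replica of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```
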
8. Format the Hadoop filesystem (in Hadoop 2.x, `hadoop namenode -format` still works but is deprecated in favor of `hdfs namenode -format`)
  • hduser@laptop:~$ hdfs namenode -format
9. Start Hadoop 
  • k@laptop:/usr/local/hadoop/sbin$ sudo su hduser
    
    hduser@laptop:/usr/local/hadoop/sbin$ start-all.sh
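After start-all.sh returns, a quick sanity check (not in the original notes) is `jps`, which lists the running Java processes:

```shell
# Skip gracefully if the JDK's jps tool is not on PATH.
command -v jps >/dev/null || { echo "jps not on PATH (install a JDK)"; exit 0; }
# A healthy Hadoop 2.x single node typically shows NameNode, DataNode,
# SecondaryNameNode, ResourceManager and NodeManager alongside Jps itself.
jps
```

If any of the daemons are missing, check the logs under /usr/local/hadoop/logs.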
10. Export the Path
  • export PATH=$PATH:/usr/local/hadoop/bin/
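The export above lasts only for the current shell session; to make it stick across logins, one can append it to hduser's ~/.bashrc (a sketch; HADOOP_HOME matches the install location above):

```shell
# Append the Hadoop PATH setup once; the grep guard avoids duplicate entries.
grep -q 'HADOOP_HOME=/usr/local/hadoop' "$HOME/.bashrc" 2>/dev/null || cat >> "$HOME/.bashrc" <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
```
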
11. Run MapReduce 
  • get the wordcount jar from s3:
    • hduser@ip-172-31-6-97:/home$ wget https://s3-us-west-1.amazonaws.com/hadoopclass1/WordCountPartitioner.jar
  • get the text file from s3:
    • wget https://s3-us-west-1.amazonaws.com/hadoopclass1/datasets/shakespeare/comedies
  • Put the file into HDFS:
    • hadoop fs -mkdir /input-file
    • hadoop fs -put comedies /input-file/comedies
  • Run wordcount (writing to the same output directory we read back below):
    • hadoop jar WordCountPartitioner.jar WordCount /input-file/comedies /output-file/wordcount
  • Read the output files:
    • hadoop fs -cat /output-file/wordcount/*
