This blog entry documents how we set up a single-node Hadoop instance on an Amazon Ubuntu AMI and verify it by running a WordCount MapReduce job.
Note that this instance is for testing/demonstration purposes only. Do not use this kind of security configuration for a real-life deployment.
1. Create an Amazon EC2 Instance
- http://aws.amazon.com/ec2/
- Select Launch Instance
- Step 1: Choose an Amazon Machine Image (AMI): select Ubuntu 14.04
- Step 2: Choose an Instance Type: select t2.large
- Step 3: Configure Instance Details: leave the defaults
- Step 4: Add Storage: enter 30 GB
- Step 5: Tag Instance: leave the defaults
- Step 6: Configure Security Group: add a custom TCP rule, port range 0-36600, source Anywhere
- Step 7: Review Instance Launch
2. SSH into the EC2 instance
- Get the public address of the EC2 instance
- Use PuTTY to SSH into the EC2 instance (PuTTY connects over SSH, not Telnet)
- Follow this blog for instructions on PuTTY: http://www.hkitblog.com/?p=24492
3. Once connected, install Java
k@laptop:~$ cd ~
# Update the source list
k@laptop:~$ sudo apt-get update
# The OpenJDK project is the default version of Java
# that is provided from a supported Ubuntu repository.
k@laptop:~$ sudo apt-get install default-jdk
k@laptop:~$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
4. Add a Hadoop user
k@laptop:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1002) ...
Done.
k@laptop:~$ sudo adduser --ingroup hadoop hduser
Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
    Full Name []:
    Room Number []:
    Work Phone []:
    Home Phone []:
    Other []:
Is the information correct? [Y/n] Y
5. Give sudo privileges to the Hadoop user:
hduser@laptop:~/hadoop-2.6.0$ su k
Password:
k@laptop:/home/hduser$ sudo adduser hduser sudo
[sudo] password for k:
Adding user `hduser' to group `sudo' ...
Adding user hduser to group sudo
Done.
6. Install Hadoop
hduser@laptop:~$ wget http://mirrors.sonic.net/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
hduser@laptop:~$ tar xvzf hadoop-2.6.0.tar.gz
- Move the Hadoop installation to the /usr/local/hadoop directory and give the Hadoop user ownership of it:
hduser@laptop:~/hadoop-2.6.0$ sudo mkdir -p /usr/local/hadoop
hduser@laptop:~/hadoop-2.6.0$ sudo mv * /usr/local/hadoop
hduser@laptop:~/hadoop-2.6.0$ sudo chown -R hduser:hadoop /usr/local/hadoop
7. Setup Hadoop configuration files
The following files will have to be modified to complete the Hadoop setup:
- ~/.bashrc
- /usr/local/hadoop/etc/hadoop/hadoop-env.sh
- /usr/local/hadoop/etc/hadoop/core-site.xml
- /usr/local/hadoop/etc/hadoop/mapred-site.xml (created by copying mapred-site.xml.template)
- /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Follow this post for the settings:
http://www.bogotobogo.com/Hadoop/BigData_hadoop_Install_on_ubuntu_single_node_cluster.php
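As a rough sketch of what the linked post walks through, the two key single-node properties are fs.defaultFS (pointing HDFS at localhost) and dfs.replication (set to 1, since there is only one DataNode). The heredocs below write minimal versions of the two XML files; the port and paths are the usual defaults for this kind of setup, so follow the linked post for the full configuration:

```shell
# Minimal core-site.xml: HDFS served from localhost:9000 (a common
# default for single-node tutorials; adjust if your setup differs).
cat <<'EOF' | sudo tee /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

# Minimal hdfs-site.xml: replication factor 1, because a single
# node cannot hold multiple replicas of each block.
cat <<'EOF' | sudo tee /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF
```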
8. Format the Hadoop filesystem
hduser@laptop:~$ hdfs namenode -format
9. Start Hadoop
k@laptop:/usr/local/hadoop/sbin$ sudo su hduser
hduser@laptop:/usr/local/hadoop/sbin$ ./start-all.sh
10. Export the PATH
- export PATH=$PATH:/usr/local/hadoop/bin
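The export above only lasts for the current session. To make it survive new logins, the exports can go into hduser's ~/.bashrc; the paths below assume the /usr/local/hadoop install location used earlier, and the JAVA_HOME path is the usual Ubuntu 14.04 OpenJDK 7 location (verify yours with `readlink -f $(which java)`):

```shell
# Append to ~/.bashrc as hduser; paths assume the layout above.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
# Put both bin (hadoop, hdfs) and sbin (start-all.sh) on the PATH.
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```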
11. Run MapReduce
- Get the WordCount jar from S3:
- hduser@ip-172-31-6-97:/home$ wget https://s3-us-west-1.amazonaws.com/hadoopclass1/WordCountPartitioner.jar
- Get the text file from S3:
- wget https://s3-us-west-1.amazonaws.com/hadoopclass1/datasets/shakespeare/comedies
- Put the file into HDFS:
- hadoop fs -mkdir /input-file
- hadoop fs -put comedies /input-file/comedies
- Run wordcount:
- hadoop jar WordCountPartitioner.jar WordCount /input-file/comedies output
- Read the output file:
- hadoop fs -cat output/part-r-*
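As a quick sanity check of the output shape WordCount should produce, the same counting can be approximated locally with standard Unix tools on a small sample. This only mimics the job's word counts; it is not the MapReduce job itself:

```shell
# Split on whitespace (one word per line), sort so duplicates are
# adjacent, then count occurrences of each word with uniq -c.
printf 'to be or not to be\n' | tr -s ' ' '\n' | sort | uniq -c
# e.g. "2 be", "1 not", "1 or", "2 to" (alphabetical order)
```

Spot-checking a few common words this way against `hadoop fs -cat` output is a cheap way to confirm the job ran correctly.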