Wednesday, 6 May 2015

Hadoop Single Node Cluster Installation

This tutorial has been tested with the following software versions:
·  Ubuntu Linux 12.04 
·  Hadoop 1.0.4
·  jdk7

1.1 Install Ubuntu 12.04 on your system, either alongside Windows (dual boot) or as the only operating system.

1.2 Configuring SSH
Hadoop uses SSH to control its daemons, so your user must be able to log in to localhost without a password.
-generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P ""
-append the public key to the list of authorized keys
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
-test the passwordless login
ssh localhost
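On the first connection you will be asked to confirm the host fingerprint; answer yes. Later logins to localhost should not prompt for a password. Type exit to leave the test session and return to your own shell:
exit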

1.3 Install Hadoop 1.0.4 on your system.
Download the hadoop-1.0.4 tar file, untar it, and place the extracted folder in your home directory.
sudo tar xzf hadoop-1.0.4.tar.gz
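Because the archive was extracted with sudo (or if it landed somewhere else, e.g. Downloads), one way to move it into place and take ownership; harshit is the user name used in the paths later in this tutorial, so substitute your own:
sudo mv hadoop-1.0.4 /home/harshit/
sudo chown -R harshit:harshit /home/harshit/hadoop-1.0.4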

-to install Java on the system (Hadoop needs a JDK; OpenJDK 7 works)
sudo apt-get install openjdk-7-jdk

-to check that Java is installed and on the PATH
java -version
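To find the exact JDK install path for the JAVA_HOME settings used below, one quick check (the reported path typically ends in java-7-openjdk-i386 on 32-bit Ubuntu or java-7-openjdk-amd64 on 64-bit; JAVA_HOME is the part before /jre/bin/java):
readlink -f $(which java)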

Next, set up Hadoop's configuration.
Open the conf directory inside hadoop-1.0.4 and edit the following files.

1.3.1 core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

1.3.2 hadoop-env.sh
The JAVA_HOME line in this file ships commented out; uncomment it and point it at the JDK installed above (on 64-bit systems the path is java-7-openjdk-amd64 instead).
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386

1.3.3 hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>
</configuration>

1.3.4 mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
 </description>
</property>
</configuration>

1.3.5 masters
This file names the host that runs the secondary namenode; for a single-node cluster it is just the local machine.
localhost

1.3.6 slaves
This file lists the hosts that run the datanode and tasktracker; again just the local machine here.
localhost
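Both files live in the conf directory; a quick way to write them from inside hadoop-1.0.4:
echo localhost > conf/masters
echo localhost > conf/slaves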

1.3.7 .bashrc
Add these lines at the end of ~/.bashrc, adjusting the paths to your user name and install locations:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_HOME=/home/harshit/hadoop-1.0.4
export TOMCAT_HOME=/home/harshit/apache-tomcat-7.0.39
export PATH=$PATH:$JAVA_HOME/bin
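Reload .bashrc so the new variables take effect in the current shell, then verify:
source ~/.bashrc
echo $HADOOP_HOME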

1.4 Start the Hadoop cluster
--change to the hadoop-1.0.4 directory
cd hadoop-1.0.4
--format the namenode (first run only; formatting erases everything in HDFS)
bin/hadoop namenode -format
--start namenode
bin/hadoop-daemon.sh start namenode
--start datanode
bin/hadoop-daemon.sh start datanode
--start secondarynamenode
bin/hadoop-daemon.sh start secondarynamenode
--start jobtracker
bin/hadoop-daemon.sh start jobtracker
--start tasktracker (it registers with the jobtracker, so start the jobtracker first)
bin/hadoop-daemon.sh start tasktracker
--check all processes are running
jps
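If everything started, jps should list the five Hadoop daemons plus jps itself. The process IDs below are only placeholders; yours will differ:
2345 NameNode
2456 DataNode
2567 SecondaryNameNode
2678 JobTracker
2789 TaskTracker
2890 Jps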
or
--to start all five daemons at once instead
bin/start-all.sh
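Once the daemons are up, you can also check the built-in web interfaces: the namenode UI at http://localhost:50070 and the jobtracker UI at http://localhost:50030.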

--to stop all the processes
bin/stop-all.sh
or
--stop the daemons individually by changing start to stop in the hadoop-daemon.sh commands above.

1.5 Copy the data from local disk to HDFS.
--check whether any data is already present in HDFS (on a fresh cluster this may report that your HDFS home directory does not exist yet)
cd hadoop-1.0.4
bin/hadoop fs -ls
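If you do not have a file to upload yet, a quick way to create a small test input (filename.txt is just a placeholder name, reused in the commands below):
echo "hello hadoop hello world" > filename.txt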
--transfer the file from the local disk to HDFS
bin/hadoop fs -put filename.txt filename.txt
--check that the file is now listed
bin/hadoop fs -ls

1.6 Run the sample wordcount program from hadoop-examples-1.0.4.jar. The last argument is an output directory that must not already exist; Hadoop creates it and writes the results inside it.
bin/hadoop jar hadoop-examples-1.0.4.jar wordcount filename.txt filenameop
--copy the generated output from HDFS back to the local disk
bin/hadoop fs -get filenameop filenameop
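The job writes its results as part files inside the output directory; to print the word counts straight from HDFS (assuming the wordcount job above finished successfully):
bin/hadoop fs -cat filenameop/part-*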
