This tutorial has been tested with the following software versions:
· Ubuntu Linux 12.04
· Hadoop 1.0.4
· OpenJDK 7
1.1 Install Ubuntu 12.04 on your system, either alongside an existing Windows installation (dual boot) or as the only operating system.
1.2 Configuring SSH
--generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P ""
--add the public key to the authorized keys so SSH stops prompting for a password
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
--verify that passwordless login works
ssh localhost
1.3 Install Hadoop-1.0.4 on your system.
Download the Hadoop-1.0.4 tar file, untar it, and place it in your home directory.
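If you do not already have the tarball, it can be fetched from the Apache archive (the URL below assumes the standard archive layout for old Hadoop releases):
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz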
sudo tar xzf hadoop-1.0.4.tar.gz
--to install Java on the system
sudo apt-get install openjdk-7-jdk
--to check which Java version is currently installed
java -version
Next, set the Hadoop configuration. Open the conf directory inside hadoop-1.0.4 and edit the following files.
1.3.1 core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
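Once the NameNode is running (section 1.4), you can confirm that something is actually listening on the port configured above; this is a generic Linux check, not a Hadoop command:
netstat -nlt | grep 9000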
1.3.2 hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
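The exact path varies by machine (for example, java-7-openjdk-amd64 on 64-bit systems). If you are unsure, the following prints where the java binary really lives; drop the trailing /jre/bin/java to get the JDK home:
readlink -f $(which java)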
1.3.3 hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
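As the description says, the replication factor can also be changed per file once it is in HDFS (see section 1.5); a quick sketch, using the example file from the later sections:
bin/hadoop fs -setrep 3 filename.txt
On a single-node cluster there is only one DataNode to hold replicas, which is why 1 is the sensible value here.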
1.3.4 mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
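Once the daemons are up (section 1.4), a quick way to confirm that the JobTracker is reachable at the address configured above:
bin/hadoop job -list
The JobTracker also serves a web UI, by default at http://localhost:50030, and the NameNode at http://localhost:50070.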
1.3.5 masters
master
1.3.6 slaves
master
(On a single-node setup the masters and slaves files both name the same machine; localhost also works if your hostname is not master.)
1.3.7 .bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_HOME=/home/harshit/hadoop-1.0.4
export TOMCAT_HOME=/home/harshit/apache-tomcat-7.0.39
export PATH=$PATH:$JAVA_HOME/bin
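Reload .bashrc so the variables take effect in the current shell (appending $HADOOP_HOME/bin to PATH as well is a common convenience, though the commands below spell out the bin/ prefix explicitly):
source ~/.bashrc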
1.4 Start the Hadoop cluster
--change to the hadoop-1.0.4 directory
cd hadoop-1.0.4
--format the namenode (first run only; this erases any existing HDFS data)
bin/hadoop namenode -format
--start namenode
bin/hadoop-daemon.sh start namenode
--start datanode
bin/hadoop-daemon.sh start datanode
--start secondarynamenode
bin/hadoop-daemon.sh start secondarynamenode
--start jobtracker
bin/hadoop-daemon.sh start jobtracker
--start tasktracker
bin/hadoop-daemon.sh start tasktracker
--check all processes are running
jps
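If everything came up, jps should list the five Hadoop daemons plus itself; the output looks roughly like this (PIDs will differ):
4851 NameNode
5053 DataNode
5256 SecondaryNameNode
5439 JobTracker
5642 TaskTracker
5810 Jps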
or
--to run all processes at once
bin/start-all.sh
--to stop all processes at once
bin/stop-all.sh
or
--stop each process individually by changing start to stop in the commands above.
1.5 Copy the data from the local filesystem to HDFS.
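If you have no input file handy, any text file will do; here is a throwaway sample, assuming Hadoop was untarred in your home directory as above (filename.txt is just the name used in the commands below):
echo "hello hadoop hello world" > ~/hadoop-1.0.4/filename.txt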
--check whether any data is already present in HDFS
cd hadoop-1.0.4
bin/hadoop fs -ls
--transfer a file from the local filesystem to HDFS
bin/hadoop fs -put filename.txt filename.txt
--check that the file is present
bin/hadoop fs -ls
1.6 Run the sample wordcount program from hadoop-examples-1.0.4.jar
--run wordcount; despite the .txt suffix, filenameop.txt names the output directory the job creates in HDFS
bin/hadoop jar hadoop-examples-1.0.4.jar wordcount filename.txt filenameop.txt
--copy the generated output back to the local filesystem
bin/hadoop fs -get filenameop.txt filenameop.txt
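You can also inspect the result directly in HDFS without copying it out; the job writes its word counts into part files inside the output directory (the glob sidesteps the exact part-file naming):
bin/hadoop fs -cat filenameop.txt/part-*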