This tutorial has been tested with the following software versions:
· Ubuntu Linux 12.04
· Hadoop 1.0.4
· OpenJDK 7
1.1 Install Ubuntu 12.04 on your system, either alongside an existing Windows installation (dual boot) or as the only operating system.
1.2 Configuring SSH
--generate an RSA key pair with an empty passphrase
ssh-keygen -t rsa -P ""
--add the public key to the authorized keys so SSH stops prompting for a password
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
--verify that passwordless login works
ssh localhost
1.3 Install Hadoop-1.0.4 on your system.
Download the Hadoop-1.0.4 tar file, untar it, and place it in your home directory.
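If you do not already have the tarball, it can be fetched from the Apache archive (the URL below assumes the standard archive layout for old Hadoop releases):
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4.tar.gz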
sudo tar xzf hadoop-1.0.4.tar.gz
--to install Java on the system
sudo apt-get install openjdk-7-jdk
--to check which Java version is currently installed
java -version
Next, set the Hadoop configuration. Open the conf directory inside hadoop-1.0.4 and edit the following files.
1.3.1 core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
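Once the NameNode is running (section 1.4), you can confirm that something is actually listening on the port configured above; this is a generic Linux check, not a Hadoop command:
netstat -nlt | grep 9000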
1.3.2 hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
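The exact path varies by machine (for example, java-7-openjdk-amd64 on 64-bit systems). If you are unsure, the following prints where the java binary really lives; drop the trailing /jre/bin/java to get the JDK home:
readlink -f $(which java)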
1.3.3 hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
</configuration>
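As the description says, the replication factor can also be changed per file once it is in HDFS (see section 1.5); a quick sketch, using the example file from the later sections:
bin/hadoop fs -setrep 3 filename.txt
On a single-node cluster there is only one DataNode to hold replicas, which is why 1 is the sensible value here.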
1.3.4 mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
</configuration>
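Once the daemons are up (section 1.4), a quick way to confirm that the JobTracker is reachable at the address configured above:
bin/hadoop job -list
The JobTracker also serves a web UI, by default at http://localhost:50030, and the NameNode at http://localhost:50070.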
1.3.5 masters
master
1.3.6 slaves
master
(On a single-node setup the masters and slaves files both name the same machine; localhost also works if your hostname is not master.)
1.3.7 .bashrc
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_HOME=/home/harshit/hadoop-1.0.4
export TOMCAT_HOME=/home/harshit/apache-tomcat-7.0.39
export PATH=$PATH:$JAVA_HOME/bin
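Reload .bashrc so the variables take effect in the current shell (appending $HADOOP_HOME/bin to PATH as well is a common convenience, though the commands below spell out the bin/ prefix explicitly):
source ~/.bashrc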
1.4 Start the Hadoop cluster
--change to the hadoop-1.0.4 directory
cd hadoop-1.0.4
--format the namenode (first run only; this erases any existing HDFS data)
bin/hadoop namenode -format
--start namenode
bin/hadoop-daemon.sh start namenode
--start datanode
bin/hadoop-daemon.sh start datanode
--start secondarynamenode
bin/hadoop-daemon.sh start secondarynamenode
--start jobtracker
bin/hadoop-daemon.sh start jobtracker
--start tasktracker
bin/hadoop-daemon.sh start tasktracker
--check all processes are running
jps
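If everything came up, jps should list the five Hadoop daemons plus itself; the output looks roughly like this (PIDs will differ):
4851 NameNode
5053 DataNode
5256 SecondaryNameNode
5439 JobTracker
5642 TaskTracker
5810 Jps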
or
--to run all processes at once
bin/start-all.sh
--to stop all processes at once
bin/stop-all.sh
or
--stop each process individually by changing start to stop in the commands above.
1.5 Copy the data from the local filesystem to HDFS.
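If you have no input file handy, any text file will do; here is a throwaway sample, assuming Hadoop was untarred in your home directory as above (filename.txt is just the name used in the commands below):
echo "hello hadoop hello world" > ~/hadoop-1.0.4/filename.txt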
--check whether any data is already present in HDFS
cd hadoop-1.0.4
bin/hadoop fs -ls
--transfer a file from the local filesystem to HDFS
bin/hadoop fs -put filename.txt filename.txt
--check that the file is present
bin/hadoop fs -ls
1.6 Run the sample wordcount program from hadoop-examples-1.0.4.jar
--run wordcount; despite the .txt suffix, filenameop.txt names the output directory the job creates in HDFS
bin/hadoop jar hadoop-examples-1.0.4.jar wordcount filename.txt filenameop.txt
--copy the generated output back to the local filesystem
bin/hadoop fs -get filenameop.txt filenameop.txt
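You can also inspect the result directly in HDFS without copying it out; the job writes its word counts into part files inside the output directory (the glob sidesteps the exact part-file naming):
bin/hadoop fs -cat filenameop.txt/part-*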