Single-Node Hadoop Cluster on VirtualBox

By widyawan


This tutorial describes installing a single-node Hadoop 2.x cluster on Ubuntu 14.04, running in VirtualBox on a Mac OS 10.10 laptop. It is partly inspired by this article. Installing Hadoop inside VirtualBox makes it easy to learn, with both single-node and multi-node clusters.

VirtualBox and Java

Configure Ubuntu and install Java. This installation (see here for details) is required because MapReduce programming on Hadoop primarily uses Java.

  • Log in to the Ubuntu server and update the package index
sudo apt-get update
  • Check whether Java is already installed
java -version
  • If it is not yet installed, install the Java runtime
sudo apt-get install default-jre
  • Install the JDK (needed to compile Java applications)
sudo apt-get install default-jdk
  • Update the 'JAVA_HOME' environment variable (required by some programs); a quick sanity check follows the commands below
sudo update-alternatives --config java
sudo nano /etc/environment
JAVA_HOME="/usr/lib/jvm/java-7-openjdk-i386"
source /etc/environment 
echo $JAVA_HOME
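
To double-check that the JDK is in place and that JAVA_HOME points at the right JVM, the following quick checks can help (not part of the original steps, and assuming the openjdk-7 i386 package used above):

# the compiler should report the same version as java -version
javac -version
# should resolve to a path under /usr/lib/jvm/java-7-openjdk-i386
readlink -f "$(which java)"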

Installing Hadoop

  • Download Hadoop, for example from a mirror site such as the one below. In the example, Hadoop is downloaded directly to the Ubuntu guest; an optional integrity check follows the commands below.
wget http://mirrors.advancedhosters.com/apache/hadoop/common/stable/hadoop-2.2.0.tar.gz
tar -xvzf hadoop-2.2.0.tar.gz --owner root --group root --no-same-owner > tar2.log 2>&1
sudo mv hadoop-2.2.0 /usr/local/hadoop
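
Optionally, verify the integrity of the archive before using it. Apache publishes checksums for every release (usually on the main distribution site rather than on the mirrors); compare the published value with a locally computed hash:

# compute the local hash and compare it with the published checksum for hadoop-2.2.0.tar.gz
sha256sum hadoop-2.2.0.tar.gz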
  • Create a hadoop group and user (to avoid security issues)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo su hduser
  • Suppress the warnings from start-dfs.sh (see the note below on making this permanent)
export HADOOP_OPTS="$HADOOP_OPTS -XX:-PrintWarnings -Djava.net.preferIPv4Stack=true"
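
Note that this export only lasts for the current shell session. One way to make it permanent (an assumption; the original article does not say where the line should live) is to append it to Hadoop's own environment file once /usr/local/hadoop is in place:

# append to /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export HADOOP_OPTS="$HADOOP_OPTS -XX:-PrintWarnings -Djava.net.preferIPv4Stack=true"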
  • Create an SSH key pair
ssh-keygen -t rsa -P ""

(Accept the default settings.) Your identification is saved in /home/hduser/.ssh/id_rsa, and the public key in /home/hduser/.ssh/id_rsa.pub. Then authorize the key and test the connection:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
ssh localhost
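
If the key setup worked, the ssh localhost command above logs in without prompting for a password (answer 'yes' once to accept the host key). A quick non-interactive check, not part of the original steps:

# should print 'ok' with no password prompt
ssh localhost 'echo ok'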
  • Change the owner of the Hadoop files (run this as your admin user if hduser has no sudo rights)
cd /usr/local
sudo chown -R hduser:hadoop hadoop
  • Set up the environment variables
 
cd ~
nano .bashrc

Paste the following at the end of the file:

###Hadoop variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
###end of paste
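
After saving, reload the file so the variables take effect in the current session (new shells will pick them up automatically):

source ~/.bashrc
echo $HADOOP_INSTALL   # should print /usr/local/hadoop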
  • Create directories for the HDFS data
cd ~
mkdir -p mydata/hdfs/namenode
mkdir -p mydata/hdfs/datanode

Configuring Hadoop

  • Update JAVA_HOME
cd /usr/local/hadoop/etc/hadoop
nano hadoop-env.sh
export JAVA_HOME=${JAVA_HOME}
change it to
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
  • Verify the Hadoop version
hadoop version

If there is no error message, Hadoop has been installed successfully. Before it can be run, however, Hadoop first needs to be configured.

 nano core-site.xml

Paste the following setting between the <configuration> tags:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
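
fs.default.name tells every Hadoop client where the HDFS namenode listens. Once the environment variables from the previous section are loaded, the value can be read back as a quick check (not part of the original steps; fs.defaultFS is the newer name for this key, so a deprecation warning may appear):

hdfs getconf -confKey fs.default.name
# should print hdfs://localhost:9000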
 nano yarn-site.xml

Paste the following settings between the <configuration> tags:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Save and close the file.

mv mapred-site.xml.template mapred-site.xml
nano mapred-site.xml

Paste the following setting between the <configuration> tags:

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

nano hdfs-site.xml

Paste the following settings between the <configuration> tags:

<property>
   <name>dfs.replication</name>
   <value>1</value>
</property>
<property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>

Save and close the file.
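
A malformed XML file is a common cause of startup failures. If xmllint is available (on Ubuntu it ships in the libxml2-utils package, so it may need installing first), the four edited files can be checked for well-formedness:

cd /usr/local/hadoop/etc/hadoop
xmllint --noout core-site.xml yarn-site.xml mapred-site.xml hdfs-site.xml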

Formatting the NameNode

hdfs namenode -format
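
Formatting initializes the directory configured as dfs.namenode.name.dir in hdfs-site.xml. As a quick sanity check (not part of the original steps), confirm that the metadata files were created:

ls ~/mydata/hdfs/namenode/current
# should list VERSION, an fsimage file, and related metadata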

Starting the Hadoop Services

start-dfs.sh
....
start-yarn.sh
....
jps

If everything went smoothly, the following services should be running:

DataNode
ResourceManager
Jps
NodeManager
NameNode
SecondaryNameNode
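
The daemons also expose web interfaces on the stock Hadoop 2.x ports, which is another way to confirm they are up (the port numbers below are the defaults; adjust if you changed them):

# NameNode web UI
wget -qO- http://localhost:50070 > /dev/null && echo "NameNode UI up"
# ResourceManager web UI
wget -qO- http://localhost:8088 > /dev/null && echo "ResourceManager UI up"

Alternatively, open http://localhost:50070 and http://localhost:8088 in a browser inside the VM.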

Testing Hadoop (estimating the value of pi)

$ cd /usr/local/hadoop
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5

Number of Maps  = 2
Samples per Map = 5
13/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process : 2
13/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
13/10/21 18:41:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
...

The test output should end with something similar to the following:

Job Finished in 24.815 seconds (your exact timing will differ)
Estimated value of Pi is 3.60000000000 (or a similar number; with only 2 maps and 5 samples per map the estimate is coarse)
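
As a second smoke test, the same examples jar contains the classic wordcount job, which can be run against the Hadoop configuration files from /usr/local/hadoop as above (a sketch; the HDFS paths below are arbitrary choices, not from the original article):

$ hdfs dfs -mkdir -p /user/hduser/input
$ hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml /user/hduser/input
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hduser/input /user/hduser/output
$ hdfs dfs -cat /user/hduser/output/part-r-00000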

Good luck, and happy experimenting!
