This tutorial explains the installation of a single-node Hadoop 2.x cluster on Ubuntu 14.04 running in VirtualBox on a Mac OS 10.10 laptop. The tutorial is partly inspired by this article. Installing Hadoop on VirtualBox makes learning easier, for both single-node and multi-node clusters.
VirtualBox and Java
- First install VirtualBox; it can be downloaded from https://www.virtualbox.org/wiki/Downloads
- Download Ubuntu from a repository, for example http://repo.ugm.ac.id/iso/ubuntu/releases/. The version used in this tutorial is 14.04 LTS, server edition.
- Install Ubuntu on VirtualBox (see here for a tutorial)
Next, configure Ubuntu and install Java. This installation (see here for details) is required because MapReduce programming on Hadoop primarily uses Java.
- Log in to the Ubuntu server and update the package index
sudo apt-get update
- Check whether Java is already installed
java -version
- If it is not installed yet, install the Java runtime
sudo apt-get install default-jre
- Install the JDK (needed to compile Java applications)
sudo apt-get install default-jdk
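Optionally, verify that the compiler came with the JDK package:
javac -version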
- Update the JAVA_HOME environment variable (required by some programs). Find the Java path, add the JAVA_HOME line to /etc/environment, then reload and check:
sudo update-alternatives --config java
sudo nano /etc/environment
JAVA_HOME="/usr/lib/jvm/java-7-openjdk-i386/"
source /etc/environment
echo $JAVA_HOME
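Note that the exact JVM path depends on your architecture; on 64-bit Ubuntu it is typically /usr/lib/jvm/java-7-openjdk-amd64 rather than the i386 path used here. To list the paths actually installed on your machine:
update-alternatives --list java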
Installing Hadoop
- Download Hadoop, for example from a mirror site as shown below. In this example, Hadoop is downloaded directly onto the Ubuntu machine.
wget http://mirrors.advancedhosters.com/apache/hadoop/common/stable/hadoop-2.2.0.tar.gz
- Extract the Hadoop archive
tar -xvzf hadoop-2.2.0.tar.gz --owner root --group root --no-same-owner > tar2.log 2>&1
- Create a hadoop group and user (to avoid security issues)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo su hduser
- Suppress the start-dfs.sh warning
export HADOOP_OPTS="$HADOOP_OPTS -XX:-PrintWarnings -Djava.net.preferIPv4Stack=true"
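An export on its own only lasts for the current shell. As an optional convenience (not part of the original steps), you can persist it by appending the same line to hduser's ~/.bashrc:
echo 'export HADOOP_OPTS="$HADOOP_OPTS -XX:-PrintWarnings -Djava.net.preferIPv4Stack=true"' >> ~/.bashrc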
- Create an SSH key pair
ssh-keygen -t rsa -P ""
(accept the default settings) Your identification will be saved in /home/hduser/.ssh/id_rsa and the public key in /home/hduser/.ssh/id_rsa.pub.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
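If ssh still prompts for a password, the usual cause is overly permissive file modes on the key files; tightening them is a safe first fix:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys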
- Move the extracted Hadoop directory to /usr/local and change the owner of the Hadoop files (this assumes the archive was extracted in your current directory):
sudo mv hadoop-2.2.0 /usr/local/hadoop
cd /usr/local
sudo chown -R hduser:hadoop hadoop
- Set up the environment variables
cd ~
nano .bashrc
Paste the following at the end of the file:
###Hadoop variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
###end of paste
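Reload the file so the variables take effect in the current shell:
source ~/.bashrc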
- Create the directories for HDFS data
cd ~
mkdir -p mydata/hdfs/namenode
mkdir -p mydata/hdfs/datanode
Configuring Hadoop
- Update JAVA_HOME
cd /usr/local/hadoop/etc/hadoop
nano hadoop-env.sh
Change the line export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
- Verify the Hadoop version
hadoop version
If there is no error message, Hadoop has been installed successfully. Before it can be run, however, Hadoop first needs to be configured.
nano core-site.xml
Paste the following settings between the <configuration> tags:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
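As a side note, in Hadoop 2.x the key fs.default.name is deprecated in favor of fs.defaultFS; the old name still works but produces a deprecation warning in the logs. The equivalent modern setting would be:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>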
nano yarn-site.xml
Paste the following settings between the <configuration> tags:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save and close the file.
mv mapred-site.xml.template mapred-site.xml
nano mapred-site.xml
Paste the following settings between the <configuration> tags:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
nano hdfs-site.xml
Paste the following settings between the <configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
Save and close the file.
Formatting the Namenode
hdfs namenode -format
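If the format succeeds, the output should end with a message indicating that the storage directory /home/hduser/mydata/hdfs/namenode has been successfully formatted.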
Starting the Hadoop Services
start-dfs.sh
start-yarn.sh
jps
If everything goes smoothly, the services below will be running:
DataNode
ResourceManager
Jps
NodeManager
NameNode
SecondaryNameNode
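If one of the services is missing, its log file under /usr/local/hadoop/logs is the first place to look. The web interfaces are also a handy sanity check; the Hadoop 2.x defaults are:
http://localhost:50070 (NameNode)
http://localhost:8088 (ResourceManager)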
Testing Hadoop (computing the value of pi)
(adjust the jar's version number below to match your Hadoop release)
$ cd /usr/local/hadoop
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
Number of Maps = 2
Samples per Map = 5
13/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process : 2
13/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
13/10/21 18:41:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
...
The output of the test should look similar to this:
Job Finished in 24.815 seconds
Estimated value of Pi is 3.60000000000
(the exact timing and estimate will vary; with only 2 maps and 5 samples the pi estimate is very rough)
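As a final sanity check you can exercise HDFS directly; the commands below are standard HDFS shell usage, not part of the original test:
hdfs dfs -mkdir -p /user/hduser
hdfs dfs -ls /user
When you are done, the services can be stopped with:
stop-yarn.sh
stop-dfs.sh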
Happy experimenting!