This tutorial explains how to install a single-cluster Hadoop 2.0 on Ubuntu 14.04 inside VirtualBox on a Mac OS 10.10 laptop. It is partly inspired by this article. Installing Hadoop on top of VirtualBox makes it easy to experiment, with both single- and multi-node clusters.
VirtualBox and Java
Configure Ubuntu and install Java. This installation (see here for details) is required because MapReduce programming on Hadoop is done primarily in Java.
- Log in to the Ubuntu server and update the package index
sudo apt-get update
- Check whether Java is already installed
java -version
- If it is not yet installed, install Java
sudo apt-get install default-jre
- Install the JDK (needed to compile Java applications)
sudo apt-get install default-jdk
- Update the 'JAVA_HOME' environment variable (required by several programs)
sudo update-alternatives --config java
sudo nano /etc/environment
JAVA_HOME="/usr/lib/jvm/java-7-openjdk-i386"
source /etc/environment
echo $JAVA_HOME
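A frequent mistake at this step is pasting the value with curly quotes, or pointing JAVA_HOME at a directory that contains no JDK. As a sanity check, here is a small sketch (the `check_java_home` helper is hypothetical, not a standard tool) that validates a candidate path before it goes into /etc/environment:

```shell
#!/bin/sh
# Hypothetical helper: accept a JAVA_HOME candidate only if it
# actually contains an executable bin/java underneath it.
check_java_home() {
    dir=${1%/}               # tolerate a trailing slash
    [ -x "$dir/bin/java" ]
}

# Demo against a throwaway directory laid out like a JDK
demo=$(mktemp -d)
mkdir -p "$demo/bin"
touch "$demo/bin/java"
chmod +x "$demo/bin/java"

if check_java_home "$demo/"; then
    echo "JAVA_HOME candidate looks valid"
fi
rm -rf "$demo"
```

On the VM itself, the equivalent check would be `check_java_home /usr/lib/jvm/java-7-openjdk-i386 && echo ok`.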
Installing Hadoop
- Download Hadoop, for example from a mirror site as shown below. In this example the archive is downloaded directly on the Ubuntu machine.
wget http://mirrors.advancedhosters.com/apache/hadoop/common/stable/hadoop-2.2.0.tar.gz
tar -xvzf hadoop-2.2.0.tar.gz --owner root --group root --no-same-owner > tar2.log 2>&1
sudo mv hadoop-2.2.0 /usr/local/hadoop
- Create a dedicated hadoop group and user (avoids security issues)
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo su hduser
- Suppress the start-dfs.sh warning
export HADOOP_OPTS="$HADOOP_OPTS -XX:-PrintWarnings -Djava.net.preferIPv4Stack=true"
ssh-keygen -t rsa -P ""
(accept the default prompts) Your identification is saved in /home/hduser/.ssh/id_rsa. The public key is saved in /home/hduser/.ssh/id_rsa.pub
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
- Change the owner of the Hadoop files
cd /usr/local
sudo chown -R hduser:hadoop hadoop
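To confirm the recursive chown took effect, a quick spot-check with GNU stat (standard on Ubuntu) can help; this snippet is only a convenience, not a required step, and simply says so if the directory is absent:

```shell
#!/bin/sh
# Spot-check ownership after `sudo chown -R hduser:hadoop hadoop`.
# /usr/local/hadoop is the install path used throughout this tutorial.
dir=/usr/local/hadoop
if [ -d "$dir" ]; then
    echo "$dir is owned by $(stat -c %U:%G "$dir")"
else
    echo "skipped: $dir not found"
fi
```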
- Set up environment variables
cd ~
nano .bashrc
Paste the following at the end of the file
###Hadoop variables
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
###end of paste
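The new variables only take effect in a fresh shell, or after `source ~/.bashrc`. A quick sanity check (assuming Hadoop was unpacked to /usr/local/hadoop as above):

```shell
#!/bin/sh
# Set the variables the same way .bashrc does, then verify that the
# Hadoop bin directory actually ended up on PATH.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin

case ":$PATH:" in
    *":$HADOOP_INSTALL/bin:"*) echo "hadoop bin is on PATH" ;;
    *) echo "hadoop bin missing from PATH" ;;
esac
```

In the real setup, running `source ~/.bashrc` (or opening a new terminal) achieves the same effect.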
- Create directories for the data
cd ~
mkdir -p mydata/hdfs/namenode
mkdir -p mydata/hdfs/datanode
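The same two steps can be run with an explicit existence check, so a typo surfaces immediately; these paths must match the values that go into hdfs-site.xml:

```shell
#!/bin/sh
# Create the NameNode and DataNode directories and confirm both exist.
# $HOME is /home/hduser when logged in as hduser.
base="$HOME/mydata/hdfs"
mkdir -p "$base/namenode" "$base/datanode"

for d in namenode datanode; do
    if [ -d "$base/$d" ]; then
        echo "ok: $base/$d"
    else
        echo "missing: $base/$d"
    fi
done
```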
Configuring Hadoop
cd /usr/local/hadoop/etc/hadoop
nano hadoop-env.sh
export JAVA_HOME=${JAVA_HOME}
change it to
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
hadoop version
If no error message appears, Hadoop has been installed successfully. Before it can be started, however, it still needs to be configured.
nano core-site.xml
Paste the following between the <configuration> tags:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
nano yarn-site.xml
Paste the following between the <configuration> tags:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
Save and close the file.
mv mapred-site.xml.template mapred-site.xml
nano mapred-site.xml
Paste the following between the <configuration> tags:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
nano hdfs-site.xml
Paste the following between the <configuration> tags:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/mydata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/mydata/hdfs/datanode</value>
</property>
Save and close the file.
Format the NameNode
hdfs namenode -format
Starting the Hadoop Services
start-dfs.sh
....
start-yarn.sh
....
jps
If everything went smoothly, the services below will be running
DataNode
ResourceManager
Jps
NodeManager
NameNode
SecondaryNameNode
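Eyeballing the jps output works, but a small script can confirm that all five daemons are present. The `check_daemons` helper below is a hypothetical convenience, and the sample listing uses illustrative PIDs rather than real output:

```shell
#!/bin/sh
# Hypothetical helper: verify that a captured `jps` listing shows
# every daemon a healthy single-node cluster should run.
check_daemons() {
    listing=$1
    missing=0
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        echo "$listing" | grep -q "$d" || { echo "missing: $d"; missing=1; }
    done
    if [ "$missing" -eq 0 ]; then
        echo "all daemons running"
    fi
}

# Demo against a sample listing (PIDs are illustrative)
sample="2312 NameNode
2458 DataNode
2661 SecondaryNameNode
2787 ResourceManager
2912 NodeManager
3101 Jps"
check_daemons "$sample"
```

On the VM itself one would run `check_daemons "$(jps)"` after start-yarn.sh.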
Testing Hadoop (computing the value of pi)
$ cd /usr/local/hadoop
$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 2 5
Number of Maps = 2
Samples per Map = 5
13/10/21 18:41:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform...
using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/10/21 18:41:04 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
13/10/21 18:41:04 INFO input.FileInputFormat: Total input paths to process : 2
13/10/21 18:41:04 INFO mapreduce.JobSubmitter: number of splits:2
13/10/21 18:41:04 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
...
The test output should look similar to the following:
Job Finished in 24.815 seconds (rounded to the nearest thousandth; your timing will differ)
Estimated value of Pi is 3.60000000000 (or similar number)
Happy experimenting!