CLOUD COMPUTING LAB SETUP USING HADOOP & OPEN NEBULA
Two-Day Hands-on Workshop on
“CLOUD COMPUTING LAB SETUP USING HADOOP & OPEN NEBULA” Organized By Department of Computer Science and Engineering
TRP Engineering College NH 45, Mannachanallur Taluk, Tiruchirappalli District, Irungalur, Tamil Nadu 621105 16-09-2016 & 17-09-2016
Resource Person
Mr. D. Kesavaraja, M.E., MBA, (Ph.D.), MISTE, Assistant Professor, Department of Computer Science and Engineering, Dr. Sivanthi Aditanar College of Engineering, Tiruchendur - 628215
Day – 01: Big Data Analytics with Hadoop
Introduction to Big Data
Big Data Analogy
Big Data Analytics
Installing CentOS 7
Hadoop Installation – Single Node
Hadoop Distributed File System
Set up the one-node Hadoop cluster
Mount the one-node Hadoop cluster using FUSE
Java APIs of Hadoop
Map and Reduce tasks
Java word-count program to demonstrate the use of Map and Reduce tasks
Big Data Analytics Job Opportunities
Day – 02: Cloud Computing – OpenNebula IaaS
Cloud Computing
Virtualization – VMware Demo
IaaS, PaaS, SaaS, XaaS
What is OpenNebula? Ecosystem
OpenNebula Setup
OpenNebula Installation
Procedure to configure IaaS
Virtual Machine Creation
Virtual Block Storage Controller
Virtual Machine Migration
Hands on - Live Experiments
E-Resources, Forums and Groups
Discussion and Clarifications
"Knowing is not enough; we must apply. Willing is not enough; we must do."
More Details Visit: www.k7cloud.in | http://k7training.blogspot.in
Set Up the One-Node Apache Hadoop Cluster in CentOS 7
Aim:
To set up the one-node Hadoop cluster.

Introduction:
Apache Hadoop is an open-source framework built for distributed Big Data storage and processing across computer clusters. The project is based on the following components:
1. Hadoop Common – contains the Java libraries and utilities needed by other Hadoop modules.
2. HDFS (Hadoop Distributed File System) – a Java-based, scalable file system distributed across multiple nodes.
3. MapReduce – a YARN-based framework for parallel Big Data processing.
4. Hadoop YARN – a framework for cluster resource management.

Procedure:
Step 1: Install Java on CentOS 7
1. Before proceeding with the Java installation, log in as root (or a user with root privileges) and set your machine hostname with the following command:
# hostnamectl set-hostname master
Set Hostname in CentOS 7
Also, add a new record in the hosts file with your machine FQDN pointing to your system IP address.
# vi /etc/hosts
Add the line below:
192.168.1.41 master.hadoop.lan
Set Hostname in /etc/hosts File
Replace the above hostname and FQDN records with your own settings.
2. Next, go to the Oracle Java download page and grab the latest version of the Java SE Development Kit 8 for your system with the help of the curl command:
# curl -LO -H "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u92-b14/jdk-8u92-linux-x64.rpm"
Download Java SE Development Kit 8
3. After the Java binary download finishes, install the package by issuing the command below:
# rpm -Uvh jdk-8u92-linux-x64.rpm
Install Java in CentOS 7
Step 2: Install Hadoop Framework in CentOS 7
4. Next, create a new user account on your system without root powers, which we'll use for the Hadoop installation path and working environment. The new account home directory will reside in the /opt/hadoop directory.
# useradd -d /opt/hadoop hadoop
# passwd hadoop
5. In the next step, visit the Apache Hadoop page in order to get the link to the latest stable version and download the archive on your system.
# curl -O http://apache.javapipe.com/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Download Hadoop Package
6. Extract the archive and copy the directory content to the hadoop account home path. Also, make sure you change the copied files' permissions accordingly.
# tar xfz hadoop-2.7.2.tar.gz
# cp -rf hadoop-2.7.2/* /opt/hadoop/
# chown -R hadoop:hadoop /opt/hadoop/
Extract-and Set Permissions on Hadoop
7. Next, log in as the hadoop user and configure the Hadoop and Java environment variables on your system by editing the .bash_profile file.
# su - hadoop
$ vi .bash_profile
Append the following lines at the end of the file:
## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Configure Hadoop and Java Environment Variables
8. Now, initialize the environment variables and check their status by issuing the commands below:
$ source .bash_profile
$ echo $HADOOP_HOME
$ echo $JAVA_HOME
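With the paths used in this guide, the two echo commands would simply print the values exported above:
/opt/hadoop
/usr/java/default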
Initialize Linux Environment Variables
9. Finally, configure SSH key-based authentication for the hadoop account by running the commands below (replace the hostname or FQDN against the ssh-copy-id command accordingly). Also, leave the passphrase field blank in order to automatically log in via SSH.
$ ssh-keygen -t rsa
$ ssh-copy-id master.hadoop.lan
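To confirm that password-less login works before moving on, a quick check (using the hostname from this guide) would be:
$ ssh master.hadoop.lan
$ exit
The first command should drop you into a shell without prompting for a password.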
Configure SSH Key Based Authentication
Step 3: Configure Hadoop in CentOS 7
10. Now it's time to set up the Hadoop cluster on a single node in pseudo-distributed mode by editing its configuration files.
The location of the Hadoop configuration files is $HADOOP_HOME/etc/hadoop/, which in this tutorial resolves under the hadoop account home directory (/opt/hadoop/).
Once you're logged in as user hadoop you can start editing the following configuration files.
The first one to edit is the core-site.xml file. This file contains information about the port number used by the Hadoop instance, file system allocated memory, data store memory limit and the size of read/write buffers.
$ vi etc/hadoop/core-site.xml
Add the following properties between the <configuration> ... </configuration> tags. Use localhost or your machine FQDN for the Hadoop instance.
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master.hadoop.lan:9000/</value>
    </property>
</configuration>
Configure Hadoop Cluster

11. Next, open and edit the hdfs-site.xml file. The file contains information about the value of replication data, the namenode path and the datanode path on the local file system.
$ vi etc/hadoop/hdfs-site.xml
Here, add the following properties between the <configuration> ... </configuration> tags. In this guide we'll use the /opt/volume/ directory to store our Hadoop file system. Replace the dfs.data.dir and dfs.name.dir values accordingly.
<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>file:///opt/volume/datanode</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>file:///opt/volume/namenode</value>
    </property>
</configuration>
Configure Hadoop Storage
12. Because we've specified /opt/volume/ as our Hadoop file system storage, we need to create those two directories (datanode and namenode) from the root account and grant all permissions to the hadoop account by executing the commands below.
$ su root
# mkdir -p /opt/volume/namenode
# mkdir -p /opt/volume/datanode
# chown -R hadoop:hadoop /opt/volume/
# ls -al /opt/    # Verify permissions
# exit            # Exit root account to turn back to hadoop user
Configure Hadoop System Storage

13. Next, create the mapred-site.xml file to specify that we are using the YARN MapReduce framework.
$ vi etc/hadoop/mapred-site.xml
Add the following excerpt to the mapred-site.xml file:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Set Yarn MapReduce Framework

14. Now, edit the yarn-site.xml file with the below statements enclosed between the <configuration> ... </configuration> tags:
$ vi etc/hadoop/yarn-site.xml
Add the following excerpt to the yarn-site.xml file:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Add Yarn Configuration
15. Finally, set the Java home variable for the Hadoop environment by editing the below line in the hadoop-env.sh file.
$ vi etc/hadoop/hadoop-env.sh
Edit the following line to point to your Java system path:
export JAVA_HOME=/usr/java/default/
Set Java Home Variable for Hadoop
16. Also, replace the localhost value in the slaves file with the machine hostname set up at the beginning of this tutorial.
$ vi etc/hadoop/slaves
Step 4: Format Hadoop Namenode
17. Once the Hadoop single-node cluster has been set up, it's time to initialize the HDFS file system by formatting the /opt/volume/namenode storage directory with the following command:
$ hdfs namenode -format
Format Hadoop Namenode
Hadoop Namenode Formatting Process
Step 5: Start and Test Hadoop Cluster
18. The Hadoop commands are located in the $HADOOP_HOME/sbin directory. In order to start the Hadoop services, run the commands below on your console:
$ start-dfs.sh
$ start-yarn.sh
Check the services status with the following command:
$ jps
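On a healthy single-node setup, jps would typically list the five Hadoop daemons started above plus jps itself; the process IDs below are illustrative only:
3472 NameNode
3604 DataNode
3781 SecondaryNameNode
3945 ResourceManager
4067 NodeManager
4310 Jps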
Start and Test Hadoop Cluster
Alternatively, you can view a list of all open sockets for Apache Hadoop on your system using the ss command:
$ ss -tul
$ ss -tuln    # Numerical output
Check Apache Hadoop Sockets
19. To test the Hadoop file system cluster, create a directory in the HDFS file system and copy a file from the local file system to HDFS storage (i.e. insert data into HDFS).
$ hdfs dfs -mkdir /my_storage
$ hdfs dfs -put LICENSE.txt /my_storage
Check Hadoop Filesystem Cluster
To view a file's contents or list a directory inside the HDFS file system, issue the commands below:
$ hdfs dfs -cat /my_storage/LICENSE.txt
$ hdfs dfs -ls /my_storage/
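For reference, the directory listing would look roughly like the following (replication factor, size and timestamp are illustrative):
Found 1 items
-rw-r--r--   1 hadoop supergroup      15429 2016-09-16 10:15 /my_storage/LICENSE.txt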
List Hadoop Filesystem Content
Check Hadoop Filesystem Directory
To retrieve data from HDFS to our local file system, use the command below:
$ hdfs dfs -get /my_storage/ ./
Copy Hadoop Filesystem Data to Local System
Get the full list of HDFS command options by issuing: $ hdfs dfs -help
Step 6: Browse Hadoop Services
20. In order to access Hadoop services from a remote browser, visit the following links (replace the IP address or FQDN accordingly). Also, make sure the ports below are open in your system firewall; a sample firewalld command is shown after the service list.
For the Hadoop NameNode overview service:
http://192.168.1.41:50070
Access Hadoop Services
For Hadoop file system browsing (Directory Browse). http://192.168.1.41:50070/explorer.html
Hadoop Filesystem Directory Browsing
For Cluster and Apps Information (ResourceManager). http://192.168.1.41:8088
Hadoop Cluster Applications
For NodeManager Information. http://192.168.1.41:8042
Hadoop NodeManager
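The web UIs above are served on TCP ports 50070, 8088 and 8042. Assuming firewalld is the active firewall on this CentOS 7 machine (adapt to your own firewall setup), the ports could be opened with something like:
# firewall-cmd --permanent --add-port=50070/tcp
# firewall-cmd --permanent --add-port=8088/tcp
# firewall-cmd --permanent --add-port=8042/tcp
# firewall-cmd --reload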
Step 7: Manage Hadoop Services
21. To stop all Hadoop instances, run the commands below:
$ stop-yarn.sh
$ stop-dfs.sh
Stop Hadoop Services
22. In order to enable the Hadoop daemons system-wide, log in as the root user, open the /etc/rc.local file for editing and add the lines below:
$ su - root
# vi /etc/rc.local
Add this excerpt to the rc.local file:
su - hadoop -c "/opt/hadoop/sbin/start-dfs.sh"
su - hadoop -c "/opt/hadoop/sbin/start-yarn.sh"
exit 0
Enable Hadoop Services at System-Boot
Then, add executable permissions to the rc.local file and enable, start and check the service status by issuing the commands below:
$ chmod +x /etc/rc.d/rc.local
$ systemctl enable rc-local
$ systemctl start rc-local
$ systemctl status rc-local
Enable and Check Hadoop Services
That's it! The next time you reboot your machine the Hadoop services will be started automatically for you!

Hadoop FUSE Installation and Configuration on CentOS
What is FUSE?
FUSE lets you write a normal user-space application as a bridge to a conventional file system interface. The hadoop-hdfs-fuse package lets you use your HDFS cluster as if it were a conventional file system on Linux. It is assumed that you have a working HDFS cluster and know the hostname and port that your NameNode exposes.
The Hadoop FUSE installation and configuration, with mounting of HDFS using FUSE, is done by following the steps below:
Step 1 : Required Dependencies
Step 2 : Download and Install FUSE
Step 3 : Install RPM Packages
Step 4 : Modify HDFS FUSE
Step 5 : Check HADOOP Services
Step 6 : Create a Directory to Mount HADOOP
Step 7 : Modify HDFS-MOUNT Script
Step 8 : Create softlinks of LIBHDFS.SO
Step 9 : Check Memory Details
To start the Hadoop FUSE installation and configuration, follow the steps:
Step 1 : Required Dependencies
Hadoop single / multi-node cluster (in started mode)
JDK (preinstalled)
FUSE mount
This installation and configuration guide was prepared on the following platform and services:
Operating System : CentOS release 6.4 (Final) 32-bit
hadoop : hadoop-1.2.1
mysql-server : 5.1.71
JDK : java version "1.7.0_45" 32-bit (jdk-7u45-linux-i586.rpm)
fuse : hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
fuse RPMs : fuse-libs-2.8.3-4.el6.i686, fuse-2.8.3-4.el6.i686, fuse-devel-2.8.3-4.el6.i686
Step 2 : Download and install fuse
Log in as the hadoop user to a node in the Hadoop cluster (master / datanode) and download hdfs-fuse from the following location:
[hadoop@hadoop ~]# wget https://hdfs-fuse.googlecode.com/files/hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
Extract hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz:
[hadoop@hadoop ~]# tar -zxvf hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
Step 3 : Install rpm packages
Switch to the root user to install the following rpm packages:
fuse-libs-2.8.3-4.el6.i686
fuse-2.8.3-4.el6.i686
fuse-devel-2.8.3-4.el6.i686
[hadoop@hadoop ~]# su - root
[root@hadoop ~]# yum install fuse*
[root@hadoop ~]# chmod +x /usr/bin/fusermount
Step 4 : Modify hdfs fuse
After installation of the rpm packages, switch back to the hadoop user:
[root@hadoop ~]# su - hadoop
Modify the hdfs-fuse configuration / environment variables:
[hadoop@hadoop ~]$ cd hdfs-fuse/conf/
Add the following lines in hdfs-fuse.conf:
[hadoop@hadoop conf]$ vi hdfs-fuse.conf
export JAVA_HOME=/usr/java/jdk1.7.0_45              # JAVA HOME path
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1        # hadoop installation home path
export FUSE_HOME=/home/hadoop                       # fuse installation path
export HDFS_FUSE_HOME=/home/hadoop/hdfs-fuse        # fuse home path
export HDFS_FUSE_CONF=/home/hadoop/hdfs-fuse/conf   # fuse configuration path
LogDir /tmp
LogLevel LOG_DEBUG
Hostname 192.168.1.52                               # hadoop master node IP
Port 9099                                           # hadoop port number
Step 5 : Check hadoop services
[hadoop@hadoop conf]$ cd ..
Verify the Hadoop instance is running:
[hadoop@hadoop hdfs-fuse]$ jps
2643 TaskTracker
4704 Jps
2206 NameNode
2516 JobTracker
2432 SecondaryNameNode
2316 DataNode
Step 6 : Create a directory to mount hadoop
Create a folder to mount the Hadoop file system to:
[hadoop@hadoop hdfs-fuse]$ mkdir /home/hadoop/hdfsmount
[hadoop@hadoop hdfs-fuse]$ cd
[hadoop@hadoop ~]$ pwd
Step 7 : Modify hdfs-mount script
Switch to the hdfs-fuse binary folder in order to run the mount script:
[hadoop@hadoop ~]$ cd hdfs-fuse/bin/
Modify the hdfs-mount script to set the JVM path location and other environment settings; in this installation guide the JVM location is /usr/java/jdk1.7.0_45/jre/lib/i386/server.
[hadoop@hadoop bin]$ vi hdfs-mount
JAVA_JVM_DIR=/usr/java/jdk1.7.0_45/jre/lib/i386/server
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export FUSE_HOME=/home/hadoop
export HDFS_FUSE_HOME=/home/hadoop/hdfs-fuse
export HDFS_FUSE_CONF=/home/hadoop/hdfs-fuse/conf
Step 8 : Create softlinks of libhdfs.so
Create softlinks of libhdfs.so, which is located in /home/hadoop/hadoop-1.2.1/c++/Linux-i386-32/lib/libhdfs.so:
[root@hadoop ~]# cd /home/hadoop/hdfs-fuse/lib/
[root@hadoop lib]# ln -s /home/hadoop/hadoop-1.2.1/c++/Linux-i386-32/lib/libhdfs.so .
Mount the HDFS file system to /home/hadoop/hdfsmount:
[hadoop@hadoop bin]$ ./hdfs-mount /home/hadoop/hdfsmount
or
[hadoop@hadoop bin]$ ./hdfs-mount -d /home/hadoop/hdfsmount    (-d option to enable debug)
Step 9 : Check memory details
[hadoop@hadoop bin]$ df -h
Filesystem                     Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_hadoop-lv_root   50G  1.4G    46G    3%  /
tmpfs                          504M     0   504M    0%  /dev/shm
/dev/sda1                      485M   30M   430M    7%  /boot
/dev/mapper/vg_hadoop-lv_home   29G  1.2G    27G    5%  /home
hdfs-fuse                      768M   64M   704M    9%  /home/hadoop/hdfsmount
[hadoop@hadoop bin]$ ls /home/hadoop/hdfsmount/
tmp  user
Use the fusermount command below to unmount the Hadoop file system:
[hadoop@hadoop bin]$ fusermount -u /home/hadoop/hdfsmount
The FUSE mount is now ready to use as a local file system.

Using the FileSystem API to read and write data to HDFS
Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in many ways. Let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application to read a file from HDFS and write it back to the local file system.
Step 1: Once you have downloaded a test dataset, we can write an application to read a file from the local file system and write the contents to the Hadoop Distributed File System.
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ToolRunner;

public class HdfsWriter extends Configured implements Tool {
    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("HdfsWriter [local input path] [hdfs output path]");
            return 1;
        }
        String localInputPath = args[0];
        Path outputPath = new Path(args[1]);
        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outputPath)) {
            System.err.println("output path exists");
            return 1;
        }
        OutputStream os = fs.create(outputPath);
        InputStream is = new BufferedInputStream(new FileInputStream(localInputPath));
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsWriter(), args);
        System.exit(returnCode);
    }
}
Step 2: Export the jar file and run the code from the terminal to write a sample file to HDFS.
[root@localhost student]# vi HdfsWriter.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/javac HdfsWriter.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/jar cvfe HdfsWriter.jar HdfsWriter HdfsWriter.class
[root@localhost student]# hadoop jar HdfsWriter.jar a.txt kkk.txt
configured filesystem = hdfs://localhost:9000/
[root@localhost student]# hadoop jar HdfsReader.jar /user/root/kkk.txt kesava.txt
configured filesystem = hdfs://localhost:9000/
Step 3: Verify whether the file is written into HDFS and check the contents of the file.
[root@localhost student]# hdfs dfs -cat /user/root/kkk.txt
Step 4: Next, we write an application to read the file we just created in the Hadoop Distributed File System and write its contents back to the local file system.
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HdfsReader extends Configured implements Tool {
    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("HdfsReader [hdfs input path] [local output path]");
            return 1;
        }
        Path inputPath = new Path(args[0]);
        String localOutputPath = args[1];
        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        FileSystem fs = FileSystem.get(conf);
        InputStream is = fs.open(inputPath);
        OutputStream os = new BufferedOutputStream(new FileOutputStream(localOutputPath));
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsReader(), args);
        System.exit(returnCode);
    }
}
Step 5: Export the jar file and run the code from the terminal to read the file back from HDFS.
[root@localhost student]# vi HdfsReader.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/javac HdfsReader.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/jar cvfe HdfsReader.jar HdfsReader HdfsReader.class
[root@localhost student]# hadoop jar HdfsReader.jar /user/root/kkk.txt sample.txt
configured filesystem = hdfs://localhost:9000/
Step 6: Verify whether the file is written back into the local file system.
[root@localhost student]# cat sample.txt
MAP REDUCE
[student@localhost ~]$ su
Password:
[root@localhost student]# su - hadoop
Last login: Wed Aug 31 10:14:26 IST 2016 on pts/1
[hadoop@localhost ~]$ mkdir mapreduce
[hadoop@localhost ~]$ cd mapreduce
[hadoop@localhost mapreduce]$ vi WordCountMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // emit each word with a count of one
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
[hadoop@localhost mapreduce]$ vi WordCountReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reduce method: sums all the counts emitted by the mapper for each word
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // iterate through all the values available for a key, add them together
        // and emit the key together with the sum of its values
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
[hadoop@localhost mapreduce]$ vi WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // get the configuration object and set the job name
        Configuration conf = getConf();
        Job job = new Job(conf, "Word Count hadoop-0.20");
        // set the class names
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // set the output data type classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // accept the HDFS input and output directories at run time
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
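The handout does not show how this job is compiled and launched; a possible sequence, assuming the same JDK path used earlier and a text file to count already available locally (class names come from the code above, while file and directory names are illustrative only):
[hadoop@localhost mapreduce]$ /usr/java/jdk1.8.0_91/bin/javac -cp $(hadoop classpath) WordCountMapper.java WordCountReducer.java WordCount.java
[hadoop@localhost mapreduce]$ /usr/java/jdk1.8.0_91/bin/jar cvfe WordCount.jar WordCount *.class
[hadoop@localhost mapreduce]$ hdfs dfs -mkdir /wc_input
[hadoop@localhost mapreduce]$ hdfs dfs -put a.txt /wc_input
[hadoop@localhost mapreduce]$ hadoop jar WordCount.jar /wc_input /wc_output
[hadoop@localhost mapreduce]$ hdfs dfs -cat /wc_output/part-r-00000
The last command prints each word followed by its total count, which is exactly what the reducer above emits.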
Download CentOS
As you download and use CentOS Linux, the CentOS Project invites you to be a part of the community as a contributor. There are many ways to contribute to the project, from documentation, QA, and testing to coding changes for SIGs, providing mirroring or hosting, and helping other users.
The CentOS images are available as DVD ISO, Everything ISO and Minimal ISO, and are also available via torrent. The release notes are continuously updated to include issues and incorporate feedback from users.

Front-end Installation
This page shows you how to install OpenNebula from the binary packages. Using the packages provided on the OpenNebula site is the recommended method, to ensure the installation of the latest version and to avoid possible package divergences between distributions. There are two alternatives here: you can add the OpenNebula package repositories to your system, or visit the software menu to download the latest package for your Linux distribution. If there are no packages for your distribution, head to the Building from Source Code guide.
Step 1. Disable SELinux in CentOS/RHEL 7
SELinux can cause some problems, like not trusting the oneadmin user's SSH credentials. You can disable it by changing the following line in the file /etc/selinux/config:
SELINUX=disabled
After this file is changed, reboot the machine.
Step 2. Add OpenNebula Repositories
CentOS/RHEL 7
To add the OpenNebula repository, execute the following as root:
cat << EOT > /etc/yum.repos.d/opennebula.repo
[opennebula]
name=opennebula
baseurl=http://downloads.opennebula.org/repo/5.0/CentOS/7/x86_64
enabled=1
gpgcheck=0
EOT
Debian/Ubuntu
To add the OpenNebula repository on Debian/Ubuntu, execute as root:
wget -q -O- http://downloads.opennebula.org/repo/Debian/repo.key | apt-key add -
Debian 8
echo "deb http://downloads.opennebula.org/repo/5.0/Debian/8 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 14.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/14.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 16.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/16.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Step 3. Installing the Software
Installing on CentOS/RHEL 7
Before installing, activate the EPEL repo. In CentOS this can be done with the following command:
yum install epel-release
There are packages for the Front-end, distributed in the various components that make up OpenNebula, and packages for the virtualization host. To install a CentOS/RHEL OpenNebula Front-end with packages from the OpenNebula repository, execute the following as root:
yum install opennebula-server opennebula-sunstone opennebula-ruby opennebula-gate opennebula-flow
CentOS/RHEL Package Description
These are the packages available for this distribution:
opennebula: Command Line Interface.
opennebula-server: Main OpenNebula daemon, scheduler, etc.
opennebula-sunstone: Sunstone (the GUI) and the EC2 API.
opennebula-ruby: Ruby Bindings.
opennebula-java: Java Bindings.
opennebula-gate: OneGate server that enables communication between VMs and OpenNebula.
opennebula-flow: OneFlow manages services and elasticity.
opennebula-node-kvm: Meta-package that installs the oneadmin user, libvirt and kvm.
opennebula-common: Common files for OpenNebula packages.
Note: The files located in /etc/one and /var/lib/one/remotes are marked as configuration files.
Installing on Debian/Ubuntu
To install OpenNebula on a Debian/Ubuntu Front-end using packages from the OpenNebula repositories, execute as root:
apt-get update
apt-get install opennebula opennebula-sunstone opennebula-gate opennebula-flow
Debian/Ubuntu Package Description
These are the packages available for these distributions:
opennebula-common: Provides the user and common files.
ruby-opennebula: Ruby API.
libopennebula-java: Java API.
libopennebula-java-doc: Java API Documentation.
opennebula-node: Prepares a node as an opennebula-node.
opennebula-sunstone: Sunstone (the GUI).
opennebula-tools: Command Line Interface.
opennebula-gate: OneGate server that enables communication between VMs and OpenNebula.
opennebula-flow: OneFlow manages services and elasticity.
opennebula: OpenNebula Daemon.
Note: Besides /etc/one, the following files are marked as configuration files:
/var/lib/one/remotes/datastore/ceph/ceph.conf
/var/lib/one/remotes/vnm/OpenNebulaNetwork.conf
Step 4. Ruby Runtime Installation
Some OpenNebula components need Ruby libraries. OpenNebula provides a script that installs the required gems as well as some development library packages that are needed. As root execute:
/usr/share/one/install_gems
The previous script is prepared to detect common Linux distributions and install the required libraries. If it fails to find the packages needed on your system, manually install these packages:
sqlite3 development library
mysql client development library
curl development library
libxml2 and libxslt development libraries
ruby development library
gcc and g++
make
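On CentOS 7, the list above would typically map to something like the following yum packages (the exact package names are an assumption and can differ between distributions and releases):
yum install sqlite-devel mysql-devel curl-devel libxml2-devel libxslt-devel ruby-devel gcc gcc-c++ make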
If you want to install only a set of gems for a specific component, read the Building from Source Code guide, where this is explained in more depth.
Step 5. Enabling MySQL/MariaDB (Optional)
You can skip this step if you just want to deploy OpenNebula as quickly as possible. However, if you are deploying this for production or in a more serious environment, make sure you read the MySQL Setup section. Note that it is possible to switch from SQLite to MySQL, but since it's more cumbersome to migrate databases, we suggest that, if in doubt, you use MySQL from the start.
Step 6. Starting OpenNebula
Log in as the oneadmin user and follow these steps:
The /var/lib/one/.one/one_auth file will have been created with a randomly-generated password. It should contain the following: oneadmin:<password>. Feel free to change the password before starting OpenNebula. For example:
echo "oneadmin:mypassword" > ~/.one/one_auth
Warning: This will set the oneadmin password on the first boot. From that point, you must use the oneuser passwd command to change oneadmin's password.
You are ready to start the OpenNebula daemons:
service opennebula start
service opennebula-sunstone start
Step 7. Verifying the Installation
After OpenNebula is started for the first time, you should check that the commands can connect to the OpenNebula daemon. You can do this in the Linux CLI or in the graphical user interface: Sunstone.
Linux CLI
In the Front-end, run the following command as oneadmin:
oneuser show
USER 0 INFORMATION
ID              : 0
NAME            : oneadmin
GROUP           : oneadmin
PASSWORD        : 3bc15c8aae3e4124dd409035f32ea2fd6835efc9
AUTH_DRIVER     : core
ENABLED         : Yes
USER TEMPLATE
TOKEN_PASSWORD="ec21d27e2fe4f9ed08a396cbd47b08b8e0a4ca3c"
RESOURCE USAGE & QUOTAS
If you get an error message, then the OpenNebula daemon could not be started properly:
oneuser show
Failed to open TCP connection to localhost:2633 (Connection refused - connect(2) for "localhost" port 2633)
The OpenNebula logs are located in /var/log/one; you should have at least the files oned.log and sched.log, the core and scheduler logs. Check oned.log for any error messages, marked with [E].
Sunstone
Now you can try to log in to the Sunstone web interface. To do this, point your browser to http://<frontend-address>:9869. If everything is OK you will be greeted with a login page. The user is oneadmin and the password is the one in the file /var/lib/one/.one/one_auth in your Front-end.
If the page does not load, make sure you check /var/log/one/sunstone.log and /var/log/one/sunstone.error. Also, make sure TCP port 9869 is allowed through the firewall.
Directory Structure
The following table lists some notable paths that are available in your Front-end after the installation:
Path                                 Description
/etc/one/                            Configuration files
/var/log/one/                        Log files, notably: oned.log, sched.log, sunstone.log and <vmid>.log
/var/lib/one/                        oneadmin home directory
/var/lib/one/datastores/<dsid>/      Storage for the datastores
/var/lib/one/vms/<vmid>/             Action files for VMs (deployment file, transfer manager scripts, etc.)
/var/lib/one/.one/one_auth           oneadmin credentials
/var/lib/one/remotes/                Probes and scripts that will be synced to the Hosts
/var/lib/one/remotes/hooks/          Hook scripts
/var/lib/one/remotes/vmm/            Virtual Machine Manager Driver scripts
/var/lib/one/remotes/auth/           Authentication Driver scripts
/var/lib/one/remotes/im/             Information Manager (monitoring) Driver scripts
/var/lib/one/remotes/market/         MarketPlace Driver scripts
/var/lib/one/remotes/datastore/      Datastore Driver scripts
/var/lib/one/remotes/vnm/            Networking Driver scripts
/var/lib/one/remotes/tm/             Transfer Manager Driver scripts
Step 8. Next steps
Now that you have successfully started your OpenNebula service, head over to the Node Installation chapter in order to add hypervisors to your cloud.

KVM Node Installation
This page shows you how to install OpenNebula from the binary packages. Using the packages provided on the OpenNebula site is the recommended method, to ensure the installation of the latest version and to avoid possible package divergences between distributions. There are two alternatives here: you can add the OpenNebula package repositories to your system, or visit the software menu to download the latest package for your Linux distribution.
Step 1. Add OpenNebula Repositories
CentOS/RHEL 7
To add the OpenNebula repository, execute the following as root:
cat << EOT > /etc/yum.repos.d/opennebula.repo
[opennebula]
name=opennebula
baseurl=http://downloads.opennebula.org/repo/5.0/CentOS/7/x86_64
enabled=1
gpgcheck=0
EOT
Debian/Ubuntu
To add the OpenNebula repository on Debian/Ubuntu, execute as root:
wget -q -O- http://downloads.opennebula.org/repo/Debian/repo.key | apt-key add -
Debian 8
echo "deb http://downloads.opennebula.org/repo/5.0/Debian/8 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 14.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/14.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 16.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/16.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Step 2. Installing the Software
Installing on CentOS/RHEL
Execute the following commands to install the node package and restart libvirt to use the OpenNebula-provided configuration file:
sudo yum install opennebula-node-kvm
sudo service libvirtd restart
For further configuration, check the specific guide: KVM.
Installing on Debian/Ubuntu
Execute the following commands to install the node package and restart libvirt to use the OpenNebula-provided configuration file:
sudo apt-get install opennebula-node
sudo service libvirtd restart      # debian
sudo service libvirt-bin restart   # ubuntu
For further configuration, check the specific guide: KVM.
Step 3. Disable SELinux in CentOS/RHEL 7
SELinux can cause some problems, like not trusting the oneadmin user's SSH credentials. You can disable it by changing the following line in the file /etc/selinux/config:
SELINUX=disabled
After this file is changed, reboot the machine.
Step 4. Configure Passwordless SSH
The OpenNebula Front-end connects to the hypervisor Hosts using SSH. You must distribute the public key of the oneadmin user from all machines to the file /var/lib/one/.ssh/authorized_keys in all the machines. There are many methods to achieve the distribution of the SSH keys; ultimately the administrator should choose a method (the recommendation is to use a configuration management system). In this guide we are going to manually scp the SSH keys.
When the package was installed in the Front-end, an SSH key was generated and the authorized_keys file populated. We will sync the id_rsa, id_rsa.pub and authorized_keys from the Front-end to the nodes. Additionally, we need to create a known_hosts file and sync it as well to the nodes. To create the known_hosts file, we have to execute this command as user oneadmin in the Front-end, with all the node names as parameters (the <node*> placeholders stand for your own host names):
ssh-keyscan <node1> <node2> <node3> ... >> /var/lib/one/.ssh/known_hosts
Now we need to copy the directory /var/lib/one/.ssh to all the nodes. The easiest way is to set a temporary password for oneadmin in all the hosts and copy the directory from the Front-end:
scp -rp /var/lib/one/.ssh <node1>:/var/lib/one/
scp -rp /var/lib/one/.ssh <node2>:/var/lib/one/
scp -rp /var/lib/one/.ssh <node3>:/var/lib/one/
...
You should verify that connecting from the Front-end, as user oneadmin, to the nodes, and from the nodes to the Front-end, does not ask for a password:
ssh <node1>
ssh <frontend>
exit
exit
ssh <node2>
ssh <frontend>
exit
exit
ssh <node3>
ssh <frontend>
exit
exit
Step 5. Networking Configuration
A network connection is needed by the OpenNebula Front-end daemons to access, manage and monitor the Hosts, and to transfer the Image files. It is highly recommended to use a dedicated network for this purpose.
There are various network models (please check the Networking chapter to find out the networking technologies supported by OpenNebula). You may want to use the simplest network model, which corresponds to the bridged drivers. For this driver you will need to set up a Linux bridge and include a physical device in the bridge. Later on, when defining the network in OpenNebula, you will specify the name of this bridge and OpenNebula will know that it should connect the VM to this bridge, thus giving it connectivity with the physical network device connected to the bridge. For example, a typical host with two physical networks, one for public IP addresses (attached to an eth0 NIC, for example) and the other for private virtual LANs (NIC eth1, for example) should have two bridges:
brctl show
bridge name   bridge id           STP enabled   interfaces
br0           8000.001e682f02ac   no            eth0
br1           8000.001e682f02ad   no            eth1
Note: Remember that this is only required in the Hosts, not in the Front-end. Also remember that the exact names of the resources (br0, br1, etc.) are not important; what matters is that the bridges and NICs have the same names in all the Hosts.
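The handout does not show how such a bridge is defined. On a CentOS 7 node, one possible way is a pair of ifcfg files; the device names follow the br0/eth0 example above, while the addresses are purely illustrative and must be adapted to your own network:
# /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.10
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
BRIDGE=br0
Restarting the network service (systemctl restart network) would then bring the bridge up.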
Step 6. Storage Configuration
You can skip this step entirely if you just want to try out OpenNebula, as it will come configured by default in such a way that it uses the local storage of the Front-end to store Images, and the local storage of the hypervisors as storage for the running VMs. However, if you want to set up another storage configuration at this stage, like Ceph, NFS, LVM, etc., you should read the Open Cloud Storage chapter.
Step 7. Adding a Host to OpenNebula
In this step we will register the node we have installed in the OpenNebula Front-end, so OpenNebula can launch VMs in it. This step can be done in the CLI or in Sunstone, the graphical user interface. Follow just one method, not both, as they accomplish the same. To learn more about the host subsystem, read this guide.
Adding a Host through Sunstone
Open Sunstone as documented here. In the left side menu go to Infrastructure > Hosts. Click on the + button.
Then fill in the FQDN of the node in the Hostname field.
Finally, return to the Hosts list, and check that the Host switches to ON status. It should take somewhere between 20 s and 1 m. Try clicking on the refresh button to check the status more frequently.
If the host turns to err state instead of on, check /var/log/one/oned.log. Chances are it's a problem with the SSH!
Adding a Host through the CLI
To add a node to the cloud, run this command as oneadmin in the Front-end (replace <node01> with your node's hostname):
onehost create <node01> -i kvm -v kvm
onehost list
  ID NAME       CLUSTER  RVM  ALLOCATED_CPU  ALLOCATED_MEM  STAT
   1 localhost  default    0              -              -  init

# After some time (20s - 1m)
onehost list
  ID NAME       CLUSTER  RVM  ALLOCATED_CPU  ALLOCATED_MEM  STAT
   0 node01     default    0   0 / 400 (0%)  0K / 7.7G (0%)  on
If the host turns to err state instead of on , check the /var/log/one/oned.log . Chances are it’s a problem with the SSH!
Step 8. Import Currently Running VMs (Optional)
You can skip this step, as importing VMs can be done at any moment; however, if you wish to see your previously deployed VMs in OpenNebula you can use the import VM functionality.
Step 9. Next steps
You can now jump to the optional Verify your Installation section in order to launch a test VM. Otherwise, you are ready to start using your cloud, or you could configure more components:
Authentication. (Optional) For integrating OpenNebula with LDAP/AD, or securing it further with other authentication technologies.
Sunstone. The OpenNebula GUI should be working and accessible at this stage, but by reading this guide you will learn about specific enhanced configurations for Sunstone.
If your cloud is KVM based you should also follow:
Open Cloud Host Setup.
Open Cloud Storage Setup.
Open Cloud Networking Setup.

Hands-on Demo
Step 1: Click Infrastructure and select Hosts to create a new host.
Step 2: Click Infrastructure and select Clusters to create a new cluster.
Step 3: Click Network and select Virtual Network to create a new virtual network.
Step 4: Click Storage and select Datastores to create two new datastores, one for images and one for the system.
Step 5: Click Templates and select VMs to create a virtual machine template.
Step 6: Select Storage and proceed with the following.
Step 7: Select Network and choose your virtual network.
Step 8: Select OS Booting and do the following.
Step 9: Select Scheduling and choose your host and clusters.
Step 10: Select the VM template and click Instantiate.
Step 11: Click Instances and select the VMs option; the VM status is shown below.
Step 12: Click the refresh icon.
Step 13: Now the output will be as follows.
EX.NO:2
Click Instances and select your VM to migrate.
Select the target host for the migration.
EX.NO:4