CLOUD COMPUTING LAB SETUP USING HADOOP & OPEN NEBULA
Two-Day Hands-on Workshop on
“CLOUD COMPUTING LAB SETUP USING HADOOP & OPEN NEBULA” Organized By Department of Computer Science and Engineering
TRP Engineering College NH 45, Mannachanallur Taluk, Tiruchirappalli District, Irungalur, Tamil Nadu 621105 16-09-2016 & 17-09-2016
Resource Person
Mr. D. Kesavaraja, M.E., MBA, (Ph.D.), MISTE, Assistant Professor, Department of Computer Science and Engineering, Dr. Sivanthi Aditanar College of Engineering, Tiruchendur - 628215
Day – 01: Big Data Analytics with Hadoop
Introduction to Big Data
Big Data Analogy
Big Data Analytics
Installing CentOS 7
Hadoop Installation – Single Node
Hadoop Distributed File System
Set up the one-node Hadoop cluster
Mount the one-node Hadoop cluster using FUSE
Java APIs of Hadoop
Map and Reduce tasks
Java word-count program to demonstrate the use of Map and Reduce tasks
Big Data Analytics Job Opportunities
Day – 02: Cloud Computing – OpenNebula IaaS
Cloud Computing
Virtualization – VMware Demo
IaaS, PaaS, SaaS, XaaS
What is OpenNebula? Ecosystem
OpenNebula Setup
OpenNebula Installation
Procedure to configure IaaS
Virtual Machine Creation
Virtual Block Storage Controller
Virtual Machine Migration
Hands on - Live Experiments
E-Resources, Forums and Groups
Discussion and Clarifications
"Knowing is not enough; we must apply. Willing is not enough; we must do."
More Details Visit: www.k7cloud.in | http://k7training.blogspot.in
Set Up the One-Node Apache Hadoop Cluster in CentOS 7
Aim:
To set up the one-node Hadoop cluster.

Introduction:
Apache Hadoop is an open-source framework built for distributed Big Data storage and processing across computer clusters. The project is based on the following components:
1. Hadoop Common – contains the Java libraries and utilities needed by other Hadoop modules.
2. HDFS (Hadoop Distributed File System) – a Java-based, scalable file system distributed across multiple nodes.
3. MapReduce – a YARN-based framework for parallel Big Data processing.
4. Hadoop YARN – a framework for cluster resource management.

Procedure:
Step 1: Install Java on CentOS 7
1. Before proceeding with the Java installation, log in as root (or a user with root privileges) and set your machine hostname with the following command:
# hostnamectl set-hostname master
Set Hostname in CentOS 7
Also, add a new record in the hosts file with your machine FQDN pointing to your system IP address.
# vi /etc/hosts
Add the line below:
192.168.1.41 master.hadoop.lan
Set Hostname in /etc/hosts File
Replace the above hostname and FQDN records with your own settings.
2. Next, go to the Oracle Java download page and grab the latest version of the Java SE Development Kit 8 for your system with the help of the curl command:
# curl -LO -H "Cookie: oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u92-b14/jdk-8u92-linux-x64.rpm"
Download Java SE Development Kit 8
3. After the Java binary download finishes, install the package by issuing the command below:
# rpm -Uvh jdk-8u92-linux-x64.rpm
Install Java in CentOS 7
Step 2: Install Hadoop Framework in CentOS 7
4. Next, create a new user account on your system without root powers, which we'll use for the Hadoop installation path and working environment. The new account home directory will reside in the /opt/hadoop directory.
# useradd -d /opt/hadoop hadoop
# passwd hadoop
5. In the next step, visit the Apache Hadoop page in order to get the link to the latest stable version and download the archive on your system.
# curl -O http://apache.javapipe.com/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Download Hadoop Package
6. Extract the archive and copy the directory content to the hadoop account home path. Also, make sure you change the copied files' permissions accordingly.
# tar xfz hadoop-2.7.2.tar.gz
# cp -rf hadoop-2.7.2/* /opt/hadoop/
# chown -R hadoop:hadoop /opt/hadoop/
Extract-and Set Permissions on Hadoop
7. Next, log in as the hadoop user and configure the Hadoop and Java environment variables on your system by editing the .bash_profile file.
# su - hadoop
$ vi .bash_profile
Append the following lines at the end of the file:
## JAVA env variables
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
## HADOOP env variables
export HADOOP_HOME=/opt/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Configure Hadoop and Java Environment Variables
8. Now, initialize the environment variables and check their status by issuing the commands below:
$ source .bash_profile
$ echo $HADOOP_HOME
$ echo $JAVA_HOME
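With the paths used in this guide, the two echo commands would simply print the values exported above:
/opt/hadoop
/usr/java/default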
Initialize Linux Environment Variables
9. Finally, configure SSH key-based authentication for the hadoop account by running the commands below (replace the hostname or FQDN against the ssh-copy-id command accordingly). Also, leave the passphrase field blank in order to automatically log in via SSH.
$ ssh-keygen -t rsa
$ ssh-copy-id master.hadoop.lan
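To confirm that password-less login works before moving on, a quick check (using the hostname from this guide) would be:
$ ssh master.hadoop.lan
$ exit
The first command should drop you into a shell without prompting for a password.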
Configure SSH Key Based Authentication
Step 3: Configure Hadoop in CentOS 7
10. Now it's time to set up the Hadoop cluster on a single node in pseudo-distributed mode by editing its configuration files.
The location of the Hadoop configuration files is $HADOOP_HOME/etc/hadoop/, which in this tutorial resolves under the hadoop account home directory (/opt/hadoop/).
Once you're logged in as user hadoop you can start editing the following configuration files.
The first one to edit is the core-site.xml file. This file contains information about the port number used by the Hadoop instance, file system allocated memory, data store memory limit and the size of read/write buffers.
$ vi etc/hadoop/core-site.xml
Add the following properties between the <configuration> ... </configuration> tags. Use localhost or your machine FQDN for the Hadoop instance.
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master.hadoop.lan:9000/</value>
    </property>
</configuration>
Configure Hadoop Cluster

11. Next, open and edit the hdfs-site.xml file. The file contains information about the value of replication data, the namenode path and the datanode path on the local file system.
$ vi etc/hadoop/hdfs-site.xml
Here, add the following properties between the <configuration> ... </configuration> tags. In this guide we'll use the /opt/volume/ directory to store our Hadoop file system. Replace the dfs.data.dir and dfs.name.dir values accordingly.
<configuration>
    <property>
        <name>dfs.data.dir</name>
        <value>file:///opt/volume/datanode</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>file:///opt/volume/namenode</value>
    </property>
</configuration>
Configure Hadoop Storage
12. Because we've specified /opt/volume/ as our Hadoop file system storage, we need to create those two directories (datanode and namenode) from the root account and grant all permissions to the hadoop account by executing the commands below.
$ su root
# mkdir -p /opt/volume/namenode
# mkdir -p /opt/volume/datanode
# chown -R hadoop:hadoop /opt/volume/
# ls -al /opt/    # Verify permissions
# exit            # Exit root account to turn back to hadoop user
Configure Hadoop System Storage

13. Next, create the mapred-site.xml file to specify that we are using the YARN MapReduce framework.
$ vi etc/hadoop/mapred-site.xml
Add the following excerpt to the mapred-site.xml file:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Set Yarn MapReduce Framework

14. Now, edit the yarn-site.xml file with the below statements enclosed between the <configuration> ... </configuration> tags:
$ vi etc/hadoop/yarn-site.xml
Add the following excerpt to the yarn-site.xml file:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Add Yarn Configuration
15. Finally, set the Java home variable for the Hadoop environment by editing the below line in the hadoop-env.sh file.
$ vi etc/hadoop/hadoop-env.sh
Edit the following line to point to your Java system path:
export JAVA_HOME=/usr/java/default/
Set Java Home Variable for Hadoop
16. Also, replace the localhost value in the slaves file with the machine hostname set up at the beginning of this tutorial.
$ vi etc/hadoop/slaves
Step 4: Format Hadoop Namenode
17. Once the Hadoop single-node cluster has been set up, it's time to initialize the HDFS file system by formatting the /opt/volume/namenode storage directory with the following command:
$ hdfs namenode -format
Format Hadoop Namenode
Hadoop Namenode Formatting Process
Step 5: Start and Test Hadoop Cluster
18. The Hadoop commands are located in the $HADOOP_HOME/sbin directory. In order to start the Hadoop services, run the commands below on your console:
$ start-dfs.sh
$ start-yarn.sh
Check the services status with the following command:
$ jps
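On a healthy single-node setup, jps would typically list the five Hadoop daemons started above plus jps itself; the process IDs below are illustrative only:
3472 NameNode
3604 DataNode
3781 SecondaryNameNode
3945 ResourceManager
4067 NodeManager
4310 Jps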
Start and Test Hadoop Cluster
Alternatively, you can view a list of all open sockets for Apache Hadoop on your system using the ss command:
$ ss -tul
$ ss -tuln    # Numerical output
Check Apache Hadoop Sockets
19. To test the Hadoop file system cluster, create a directory in the HDFS file system and copy a file from the local file system to HDFS storage (i.e. insert data into HDFS).
$ hdfs dfs -mkdir /my_storage
$ hdfs dfs -put LICENSE.txt /my_storage
Check Hadoop Filesystem Cluster
To view a file's contents or list a directory inside the HDFS file system, issue the commands below:
$ hdfs dfs -cat /my_storage/LICENSE.txt
$ hdfs dfs -ls /my_storage/
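For reference, the directory listing would look roughly like the following (replication factor, size and timestamp are illustrative):
Found 1 items
-rw-r--r--   1 hadoop supergroup      15429 2016-09-16 10:15 /my_storage/LICENSE.txt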
List Hadoop Filesystem Content
Check Hadoop Filesystem Directory
To retrieve data from HDFS to our local file system, use the command below:
$ hdfs dfs -get /my_storage/ ./
Copy Hadoop Filesystem Data to Local System
Get the full list of HDFS command options by issuing: $ hdfs dfs -help
Step 6: Browse Hadoop Services
20. In order to access Hadoop services from a remote browser, visit the following links (replace the IP address or FQDN accordingly). Also, make sure the ports below are open in your system firewall; a sample firewalld command is shown after the service list.
For the Hadoop NameNode overview service:
http://192.168.1.41:50070
Access Hadoop Services
For Hadoop file system browsing (Directory Browse). http://192.168.1.41:50070/explorer.html
Hadoop Filesystem Directory Browsing
For Cluster and Apps Information (ResourceManager). http://192.168.1.41:8088
Hadoop Cluster Applications
For NodeManager Information. http://192.168.1.41:8042
Hadoop NodeManager
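The web UIs above are served on TCP ports 50070, 8088 and 8042. Assuming firewalld is the active firewall on this CentOS 7 machine (adapt to your own firewall setup), the ports could be opened with something like:
# firewall-cmd --permanent --add-port=50070/tcp
# firewall-cmd --permanent --add-port=8088/tcp
# firewall-cmd --permanent --add-port=8042/tcp
# firewall-cmd --reload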
Step 7: Manage Hadoop Services
21. To stop all Hadoop instances, run the commands below:
$ stop-yarn.sh
$ stop-dfs.sh
Stop Hadoop Services
22. In order to enable the Hadoop daemons system-wide, log in as the root user, open the /etc/rc.local file for editing and add the lines below:
$ su - root
# vi /etc/rc.local
Add this excerpt to the rc.local file:
su - hadoop -c "/opt/hadoop/sbin/start-dfs.sh"
su - hadoop -c "/opt/hadoop/sbin/start-yarn.sh"
exit 0
Enable Hadoop Services at System-Boot
Then, add executable permissions to the rc.local file and enable, start and check the service status by issuing the commands below:
$ chmod +x /etc/rc.d/rc.local
$ systemctl enable rc-local
$ systemctl start rc-local
$ systemctl status rc-local
Enable and Check Hadoop Services
That's it! The next time you reboot your machine the Hadoop services will be started automatically for you!

Hadoop FUSE Installation and Configuration on CentOS
What is FUSE?
FUSE lets you write a normal user-space application as a bridge to a conventional file system interface. The hadoop-hdfs-fuse package lets you use your HDFS cluster as if it were a conventional file system on Linux. It is assumed that you have a working HDFS cluster and know the hostname and port that your NameNode exposes.
The Hadoop FUSE installation and configuration, with mounting of HDFS using FUSE, is done by following the steps below:
Step 1 : Required Dependencies
Step 2 : Download and Install FUSE
Step 3 : Install RPM Packages
Step 4 : Modify HDFS FUSE
Step 5 : Check HADOOP Services
Step 6 : Create a Directory to Mount HADOOP
Step 7 : Modify HDFS-MOUNT Script
Step 8 : Create softlinks of LIBHDFS.SO
Step 9 : Check Memory Details
To start the Hadoop FUSE installation and configuration, follow the steps:
Step 1 : Required Dependencies
Hadoop single / multi-node cluster (in started mode)
JDK (preinstalled)
FUSE mount
This installation and configuration guide was prepared on the following platform and services:
Operating System : CentOS release 6.4 (Final) 32-bit
hadoop : hadoop-1.2.1
mysql-server : 5.1.71
JDK : java version "1.7.0_45" 32-bit (jdk-7u45-linux-i586.rpm)
fuse : hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
fuse RPMs : fuse-libs-2.8.3-4.el6.i686, fuse-2.8.3-4.el6.i686, fuse-devel-2.8.3-4.el6.i686
Step 2 : Download and install fuse
Log in as the hadoop user to a node in the Hadoop cluster (master / datanode) and download hdfs-fuse from the following location:
[hadoop@hadoop ~]# wget https://hdfs-fuse.googlecode.com/files/hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
Extract hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz:
[hadoop@hadoop ~]# tar -zxvf hdfs-fuse-0.2.linux2.6-gcc4.1-x86.tar.gz
Step 3 : Install rpm packages
Switch to the root user to install the following rpm packages:
fuse-libs-2.8.3-4.el6.i686
fuse-2.8.3-4.el6.i686
fuse-devel-2.8.3-4.el6.i686
[hadoop@hadoop ~]# su - root
[root@hadoop ~]# yum install fuse*
[root@hadoop ~]# chmod +x /usr/bin/fusermount
Step 4 : Modify hdfs fuse
After installation of the rpm packages, switch back to the hadoop user:
[root@hadoop ~]# su - hadoop
Modify the hdfs-fuse configuration / environment variables:
[hadoop@hadoop ~]$ cd hdfs-fuse/conf/
Add the following lines in hdfs-fuse.conf:
[hadoop@hadoop conf]$ vi hdfs-fuse.conf
export JAVA_HOME=/usr/java/jdk1.7.0_45              # JAVA HOME path
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1        # hadoop installation home path
export FUSE_HOME=/home/hadoop                       # fuse installation path
export HDFS_FUSE_HOME=/home/hadoop/hdfs-fuse        # fuse home path
export HDFS_FUSE_CONF=/home/hadoop/hdfs-fuse/conf   # fuse configuration path
LogDir /tmp
LogLevel LOG_DEBUG
Hostname 192.168.1.52                               # hadoop master node IP
Port 9099                                           # hadoop port number
Step 5 : Check hadoop services
[hadoop@hadoop conf]$ cd ..
Verify the Hadoop instance is running:
[hadoop@hadoop hdfs-fuse]$ jps
2643 TaskTracker
4704 Jps
2206 NameNode
2516 JobTracker
2432 SecondaryNameNode
2316 DataNode
Step 6 : Create a directory to mount hadoop
Create a folder to mount the Hadoop file system to:
[hadoop@hadoop hdfs-fuse]$ mkdir /home/hadoop/hdfsmount
[hadoop@hadoop hdfs-fuse]$ cd
[hadoop@hadoop ~]$ pwd
Step 7 : Modify hdfs-mount script
Switch to the hdfs-fuse binary folder in order to run the mount script:
[hadoop@hadoop ~]$ cd hdfs-fuse/bin/
Modify the hdfs-mount script to set the JVM path location and other environment settings; in this installation guide the JVM location is /usr/java/jdk1.7.0_45/jre/lib/i386/server.
[hadoop@hadoop bin]$ vi hdfs-mount
JAVA_JVM_DIR=/usr/java/jdk1.7.0_45/jre/lib/i386/server
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HOME=/home/hadoop/hadoop-1.2.1
export FUSE_HOME=/home/hadoop
export HDFS_FUSE_HOME=/home/hadoop/hdfs-fuse
export HDFS_FUSE_CONF=/home/hadoop/hdfs-fuse/conf
Step 8 : Create softlinks of libhdfs.so
Create softlinks of libhdfs.so, which is located in /home/hadoop/hadoop-1.2.1/c++/Linux-i386-32/lib/libhdfs.so:
[root@hadoop ~]# cd /home/hadoop/hdfs-fuse/lib/
[root@hadoop lib]# ln -s /home/hadoop/hadoop-1.2.1/c++/Linux-i386-32/lib/libhdfs.so .
Mount the HDFS file system to /home/hadoop/hdfsmount:
[hadoop@hadoop bin]$ ./hdfs-mount /home/hadoop/hdfsmount
or
[hadoop@hadoop bin]$ ./hdfs-mount -d /home/hadoop/hdfsmount    (-d option to enable debug)
Step 9 : Check memory details
[hadoop@hadoop bin]$ df -h
Filesystem                     Size  Used  Avail  Use%  Mounted on
/dev/mapper/vg_hadoop-lv_root   50G  1.4G    46G    3%  /
tmpfs                          504M     0   504M    0%  /dev/shm
/dev/sda1                      485M   30M   430M    7%  /boot
/dev/mapper/vg_hadoop-lv_home   29G  1.2G    27G    5%  /home
hdfs-fuse                      768M   64M   704M    9%  /home/hadoop/hdfsmount
[hadoop@hadoop bin]$ ls /home/hadoop/hdfsmount/
tmp  user
Use the fusermount command below to unmount the Hadoop file system:
[hadoop@hadoop bin]$ fusermount -u /home/hadoop/hdfsmount
The FUSE mount is now ready to use as a local file system.

Using the FileSystem API to read and write data to HDFS
Reading data from and writing data to the Hadoop Distributed File System (HDFS) can be done in many ways. Let us start by using the FileSystem API to create and write to a file in HDFS, followed by an application to read a file from HDFS and write it back to the local file system.
Step 1: Once you have downloaded a test dataset, we can write an application to read a file from the local file system and write the contents to the Hadoop Distributed File System.
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.ToolRunner;

public class HdfsWriter extends Configured implements Tool {
    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("HdfsWriter [local input path] [hdfs output path]");
            return 1;
        }
        String localInputPath = args[0];
        Path outputPath = new Path(args[1]);
        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(outputPath)) {
            System.err.println("output path exists");
            return 1;
        }
        OutputStream os = fs.create(outputPath);
        InputStream is = new BufferedInputStream(new FileInputStream(localInputPath));
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsWriter(), args);
        System.exit(returnCode);
    }
}
Step 2: Export the jar file and run the code from the terminal to write a sample file to HDFS.
[root@localhost student]# vi HdfsWriter.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/javac HdfsWriter.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/jar cvfe HdfsWriter.jar HdfsWriter HdfsWriter.class
[root@localhost student]# hadoop jar HdfsWriter.jar a.txt kkk.txt
configured filesystem = hdfs://localhost:9000/
[root@localhost student]# hadoop jar HdfsReader.jar /user/root/kkk.txt kesava.txt
configured filesystem = hdfs://localhost:9000/
Step 3: Verify whether the file is written into HDFS and check the contents of the file.
[root@localhost student]# hdfs dfs -cat /user/root/kkk.txt
Step 4: Next, we write an application to read the file we just created in the Hadoop Distributed File System and write its contents back to the local file system.
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HdfsReader extends Configured implements Tool {
    public static final String FS_PARAM_NAME = "fs.defaultFS";

    public int run(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("HdfsReader [hdfs input path] [local output path]");
            return 1;
        }
        Path inputPath = new Path(args[0]);
        String localOutputPath = args[1];
        Configuration conf = getConf();
        System.out.println("configured filesystem = " + conf.get(FS_PARAM_NAME));
        FileSystem fs = FileSystem.get(conf);
        InputStream is = fs.open(inputPath);
        OutputStream os = new BufferedOutputStream(new FileOutputStream(localOutputPath));
        IOUtils.copyBytes(is, os, conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int returnCode = ToolRunner.run(new HdfsReader(), args);
        System.exit(returnCode);
    }
}
Step 5: Export the jar file and run the code from the terminal to read the file back from HDFS.
[root@localhost student]# vi HdfsReader.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/javac HdfsReader.java
[root@localhost student]# /usr/java/jdk1.8.0_91/bin/jar cvfe HdfsReader.jar HdfsReader HdfsReader.class
[root@localhost student]# hadoop jar HdfsReader.jar /user/root/kkk.txt sample.txt
configured filesystem = hdfs://localhost:9000/
Step 6: Verify whether the file is written back into the local file system.
[root@localhost student]# cat sample.txt
MAP REDUCE
[student@localhost ~]$ su
Password:
[root@localhost student]# su - hadoop
Last login: Wed Aug 31 10:14:26 IST 2016 on pts/1
[hadoop@localhost ~]$ mkdir mapreduce
[hadoop@localhost ~]$ cd mapreduce
[hadoop@localhost mapreduce]$ vi WordCountMapper.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // emit each word with a count of one
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
[hadoop@localhost mapreduce]$ vi WordCountReducer.java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Reduce method: sums all the counts emitted by the mapper for each word
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // iterate through all the values available for a key, add them together
        // and emit the key together with the sum of its values
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
[hadoop@localhost mapreduce]$ vi WordCount.java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // get the configuration object and set the job name
        Configuration conf = getConf();
        Job job = new Job(conf, "Word Count hadoop-0.20");
        // set the class names
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // set the output data type classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // accept the HDFS input and output directories at run time
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new WordCount(), args);
        System.exit(res);
    }
}
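The handout does not show how this job is compiled and launched; a possible sequence, assuming the same JDK path used earlier and a text file to count already available locally (class names come from the code above, while file and directory names are illustrative only):
[hadoop@localhost mapreduce]$ /usr/java/jdk1.8.0_91/bin/javac -cp $(hadoop classpath) WordCountMapper.java WordCountReducer.java WordCount.java
[hadoop@localhost mapreduce]$ /usr/java/jdk1.8.0_91/bin/jar cvfe WordCount.jar WordCount *.class
[hadoop@localhost mapreduce]$ hdfs dfs -mkdir /wc_input
[hadoop@localhost mapreduce]$ hdfs dfs -put a.txt /wc_input
[hadoop@localhost mapreduce]$ hadoop jar WordCount.jar /wc_input /wc_output
[hadoop@localhost mapreduce]$ hdfs dfs -cat /wc_output/part-r-00000
The last command prints each word followed by its total count, which is exactly what the reducer above emits.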
Download CentOS
As you download and use CentOS Linux, the CentOS Project invites you to be a part of the community as a contributor. There are many ways to contribute to the project, from documentation, QA, and testing to coding changes for SIGs, providing mirroring or hosting, and helping other users.
The CentOS images are available as DVD ISO, Everything ISO and Minimal ISO, and are also available via torrent. The release notes are continuously updated to include issues and incorporate feedback from users.

Front-end Installation
This page shows you how to install OpenNebula from the binary packages. Using the packages provided on the OpenNebula site is the recommended method, to ensure the installation of the latest version and to avoid possible package divergences between distributions. There are two alternatives here: you can add the OpenNebula package repositories to your system, or visit the software menu to download the latest package for your Linux distribution. If there are no packages for your distribution, head to the Building from Source Code guide.
Step 1. Disable SELinux in CentOS/RHEL 7
SELinux can cause some problems, like not trusting the oneadmin user's SSH credentials. You can disable it by changing the following line in the file /etc/selinux/config:
SELINUX=disabled
After this file is changed, reboot the machine.
Step 2. Add OpenNebula Repositories
CentOS/RHEL 7
To add the OpenNebula repository, execute the following as root:
cat << EOT > /etc/yum.repos.d/opennebula.repo
[opennebula]
name=opennebula
baseurl=http://downloads.opennebula.org/repo/5.0/CentOS/7/x86_64
enabled=1
gpgcheck=0
EOT
Debian/Ubuntu
To add the OpenNebula repository on Debian/Ubuntu, execute as root:
wget -q -O- http://downloads.opennebula.org/repo/Debian/repo.key | apt-key add -
Debian 8
echo "deb http://downloads.opennebula.org/repo/5.0/Debian/8 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 14.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/14.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 16.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/16.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Step 3. Installing the Software
Installing on CentOS/RHEL 7
Before installing, activate the EPEL repo. In CentOS this can be done with the following command:
yum install epel-release
There are packages for the Front-end, distributed in the various components that make up OpenNebula, and packages for the virtualization host. To install a CentOS/RHEL OpenNebula Front-end with packages from the OpenNebula repository, execute the following as root:
yum install opennebula-server opennebula-sunstone opennebula-ruby opennebula-gate opennebula-flow
CentOS/RHEL Package Description
These are the packages available for this distribution:
opennebula: Command Line Interface.
opennebula-server: Main OpenNebula daemon, scheduler, etc.
opennebula-sunstone: Sunstone (the GUI) and the EC2 API.
opennebula-ruby: Ruby Bindings.
opennebula-java: Java Bindings.
opennebula-gate: OneGate server that enables communication between VMs and OpenNebula.
opennebula-flow: OneFlow manages services and elasticity.
opennebula-node-kvm: Meta-package that installs the oneadmin user, libvirt and kvm.
opennebula-common: Common files for OpenNebula packages.
Note: The files located in /etc/one and /var/lib/one/remotes are marked as configuration files.
Installing on Debian/Ubuntu
To install OpenNebula on a Debian/Ubuntu Front-end using packages from the OpenNebula repositories, execute as root:
apt-get update
apt-get install opennebula opennebula-sunstone opennebula-gate opennebula-flow
Debian/Ubuntu Package Description
These are the packages available for these distributions:
opennebula-common: Provides the user and common files.
ruby-opennebula: Ruby API.
libopennebula-java: Java API.
libopennebula-java-doc: Java API Documentation.
opennebula-node: Prepares a node as an opennebula-node.
opennebula-sunstone: Sunstone (the GUI).
opennebula-tools: Command Line Interface.
opennebula-gate: OneGate server that enables communication between VMs and OpenNebula.
opennebula-flow: OneFlow manages services and elasticity.
opennebula: OpenNebula Daemon.
Note: Besides /etc/one, the following files are marked as configuration files:
/var/lib/one/remotes/datastore/ceph/ceph.conf
/var/lib/one/remotes/vnm/OpenNebulaNetwork.conf
Step 4. Ruby Runtime Installation
Some OpenNebula components need Ruby libraries. OpenNebula provides a script that installs the required gems as well as some development library packages that are needed. As root execute:
/usr/share/one/install_gems
The previous script is prepared to detect common Linux distributions and install the required libraries. If it fails to find the packages needed on your system, manually install these packages:
sqlite3 development library
mysql client development library
curl development library
libxml2 and libxslt development libraries
ruby development library
gcc and g++
make
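On CentOS 7, the list above would typically map to something like the following yum packages (the exact package names are an assumption and can differ between distributions and releases):
yum install sqlite-devel mysql-devel curl-devel libxml2-devel libxslt-devel ruby-devel gcc gcc-c++ make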
If you want to install only a set of gems for a specific component, read the Building from Source Code guide, where this is explained in more depth.
Step 5. Enabling MySQL/MariaDB (Optional)
You can skip this step if you just want to deploy OpenNebula as quickly as possible. However, if you are deploying this for production or in a more serious environment, make sure you read the MySQL Setup section. Note that it is possible to switch from SQLite to MySQL, but since it's more cumbersome to migrate databases, we suggest that, if in doubt, you use MySQL from the start.
Step 6. Starting OpenNebula
Log in as the oneadmin user and follow these steps:
The /var/lib/one/.one/one_auth file will have been created with a randomly-generated password. It should contain the following: oneadmin:<password>. Feel free to change the password before starting OpenNebula. For example:
echo "oneadmin:mypassword" > ~/.one/one_auth
Warning: This will set the oneadmin password on the first boot. From that point, you must use the oneuser passwd command to change oneadmin's password.
You are ready to start the OpenNebula daemons:
service opennebula start
service opennebula-sunstone start
Step 7. Verifying the Installation
After OpenNebula is started for the first time, you should check that the commands can connect to the OpenNebula daemon. You can do this in the Linux CLI or in the graphical user interface: Sunstone.
Linux CLI
In the Front-end, run the following command as oneadmin:
oneuser show
USER 0 INFORMATION
ID              : 0
NAME            : oneadmin
GROUP           : oneadmin
PASSWORD        : 3bc15c8aae3e4124dd409035f32ea2fd6835efc9
AUTH_DRIVER     : core
ENABLED         : Yes
USER TEMPLATE
TOKEN_PASSWORD="ec21d27e2fe4f9ed08a396cbd47b08b8e0a4ca3c"
RESOURCE USAGE & QUOTAS
If you get an error message, then the OpenNebula daemon could not be started properly:
oneuser show
Failed to open TCP connection to localhost:2633 (Connection refused - connect(2) for "localhost" port 2633)
The OpenNebula logs are located in /var/log/one; you should have at least the files oned.log and sched.log, the core and scheduler logs. Check oned.log for any error messages, marked with [E].
Sunstone
Now you can try to log in to the Sunstone web interface. To do this, point your browser to http://<frontend-address>:9869. If everything is OK you will be greeted with a login page. The user is oneadmin and the password is the one in the file /var/lib/one/.one/one_auth in your Front-end.
If the page does not load, make sure you check /var/log/one/sunstone.log and /var/log/one/sunstone.error. Also, make sure TCP port 9869 is allowed through the firewall.
Directory Structure
The following table lists some notable paths that are available in your Front-end after the installation:
Path                                 Description
/etc/one/                            Configuration files
/var/log/one/                        Log files, notably: oned.log, sched.log, sunstone.log and <vmid>.log
/var/lib/one/                        oneadmin home directory
/var/lib/one/datastores/<dsid>/      Storage for the datastores
/var/lib/one/vms/<vmid>/             Action files for VMs (deployment file, transfer manager scripts, etc.)
/var/lib/one/.one/one_auth           oneadmin credentials
/var/lib/one/remotes/                Probes and scripts that will be synced to the Hosts
/var/lib/one/remotes/hooks/          Hook scripts
/var/lib/one/remotes/vmm/            Virtual Machine Manager Driver scripts
/var/lib/one/remotes/auth/           Authentication Driver scripts
/var/lib/one/remotes/im/             Information Manager (monitoring) Driver scripts
/var/lib/one/remotes/market/         MarketPlace Driver scripts
/var/lib/one/remotes/datastore/      Datastore Driver scripts
/var/lib/one/remotes/vnm/            Networking Driver scripts
/var/lib/one/remotes/tm/             Transfer Manager Driver scripts
Step 8. Next steps
Now that you have successfully started your OpenNebula service, head over to the Node Installation chapter in order to add hypervisors to your cloud.

KVM Node Installation
This page shows you how to install OpenNebula from the binary packages. Using the packages provided on the OpenNebula site is the recommended method, to ensure the installation of the latest version and to avoid possible package divergences between distributions. There are two alternatives here: you can add the OpenNebula package repositories to your system, or visit the software menu to download the latest package for your Linux distribution.
Step 1. Add OpenNebula Repositories
CentOS/RHEL 7
To add the OpenNebula repository, execute the following as root:
cat << EOT > /etc/yum.repos.d/opennebula.repo
[opennebula]
name=opennebula
baseurl=http://downloads.opennebula.org/repo/5.0/CentOS/7/x86_64
enabled=1
gpgcheck=0
EOT
Debian/Ubuntu
To add the OpenNebula repository on Debian/Ubuntu, execute as root:
wget -q -O- http://downloads.opennebula.org/repo/Debian/repo.key | apt-key add -
Debian 8
echo "deb http://downloads.opennebula.org/repo/5.0/Debian/8 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 14.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/14.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Ubuntu 16.04
echo "deb http://downloads.opennebula.org/repo/5.0/Ubuntu/16.04 stable opennebula" > /etc/apt/sources.list.d/opennebula.list
Step 2. Installing the Software
Installing on CentOS/RHEL
Execute the following commands to install the node package and restart libvirt to use the OpenNebula-provided configuration file:
sudo yum install opennebula-node-kvm
sudo service libvirtd restart
For further configuration, check the specific guide: KVM.
Installing on Debian/Ubuntu
Execute the following commands to install the node package and restart libvirt to use the OpenNebula-provided configuration file:
sudo apt-get install opennebula-node
sudo service libvirtd restart      # debian
sudo service libvirt-bin restart   # ubuntu
For further configuration, check the specific guide: KVM.
Step 3. Disable SELinux in CentOS/RHEL 7
SELinux can cause some problems, like not trusting the oneadmin user's SSH credentials. You can disable it by changing the following line in the file /etc/selinux/config:
SELINUX=disabled
After this file is changed, reboot the machine.
Step 4. Configure Passwordless SSH
The OpenNebula Front-end connects to the hypervisor Hosts using SSH. You must distribute the public key of the oneadmin user from all machines to the file /var/lib/one/.ssh/authorized_keys in all the machines. There are many methods to achieve the distribution of the SSH keys; ultimately the administrator should choose a method (the recommendation is to use a configuration management system). In this guide we are going to manually scp the SSH keys.
When the package was installed in the Front-end, an SSH key was generated and the authorized_keys file populated. We will sync the id_rsa, id_rsa.pub and authorized_keys from the Front-end to the nodes. Additionally, we need to create a known_hosts file and sync it as well to the nodes. To create the known_hosts file, we have to execute this command as user oneadmin in the Front-end, with all the node names as parameters (the <node*> placeholders stand for your own host names):
ssh-keyscan <node1> <node2> <node3> ... >> /var/lib/one/.ssh/known_hosts
Now we need to copy the directory /var/lib/one/.ssh to all the nodes. The easiest way is to set a temporary password for oneadmin in all the hosts and copy the directory from the Front-end:
scp -rp /var/lib/one/.ssh <node1>:/var/lib/one/
scp -rp /var/lib/one/.ssh <node2>:/var/lib/one/
scp -rp /var/lib/one/.ssh <node3>:/var/lib/one/
...
You should verify that connecting from the Front-end, as user oneadmin, to the nodes, and from the nodes to the Front-end, does not ask for a password:
ssh <node1>
ssh <frontend>
exit
exit
ssh <node2>
ssh <frontend>
exit
exit
ssh <node3>
ssh <frontend>
exit
exit
Step 5. Networking Configuration
A network connection is needed by the OpenNebula Front-end daemons to access, manage and monitor the Hosts, and to transfer the Image files. It is highly recommended to use a dedicated network for this purpose.
There are various network models (please check the Networking chapter to find out the networking technologies supported by OpenNebula). You may want to use the simplest network model, which corresponds to the bridged drivers. For this driver you will need to set up a Linux bridge and include a physical device in the bridge. Later on, when defining the network in OpenNebula, you will specify the name of this bridge and OpenNebula will know that it should connect the VM to this bridge, thus giving it connectivity with the physical network device connected to the bridge. For example, a typical host with two physical networks, one for public IP addresses (attached to an eth0 NIC, for example) and the other for private virtual LANs (NIC eth1, for example) should have two bridges:
brctl show
bridge name   bridge id           STP enabled   interfaces
br0           8000.001e682f02ac   no            eth0
br1           8000.001e682f02ad   no            eth1
Note: Remember that this is only required in the Hosts, not in the Front-end. Also remember that the exact names of the resources (br0, br1, etc.) are not important; what matters is that the bridges and NICs have the same names in all the Hosts.
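The handout does not show how such a bridge is defined. On a CentOS 7 node, one possible way is a pair of ifcfg files; the device names follow the br0/eth0 example above, while the addresses are purely illustrative and must be adapted to your own network:
# /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.1.10
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
BRIDGE=br0
Restarting the network service (systemctl restart network) would then bring the bridge up.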
Step 6. Storage Configuration
You can skip this step entirely if you just want to try out OpenNebula, as it will come configured by default in such a way that it uses the local storage of the Front-end to store Images, and the local storage of the hypervisors as storage for the running VMs. However, if you want to set up another storage configuration at this stage, like Ceph, NFS, LVM, etc., you should read the Open Cloud Storage chapter.
Step 7. Adding a Host to OpenNebula
In this step we will register the node we have installed in the OpenNebula Front-end, so OpenNebula can launch VMs in it. This step can be done in the CLI or in Sunstone, the graphical user interface. Follow just one method, not both, as they accomplish the same. To learn more about the host subsystem, read this guide.
Adding a Host through Sunstone
Open Sunstone as documented here. In the left side menu go to Infrastructure > Hosts. Click on the + button.
Then fill in the FQDN of the node in the Hostname field.
Finally, return to the Hosts list, and check that the Host switches to ON status. It should take somewhere between 20 s and 1 m. Try clicking on the refresh button to check the status more frequently.
If the host turns to err state instead of on, check /var/log/one/oned.log. Chances are it's a problem with the SSH!
Adding a Host through the CLI
To add a node to the cloud, run this command as oneadmin in the Front-end (replace <node01> with your node's hostname):
onehost create <node01> -i kvm -v kvm
onehost list
  ID NAME       CLUSTER  RVM  ALLOCATED_CPU  ALLOCATED_MEM  STAT
   1 localhost  default    0              -              -  init

# After some time (20s - 1m)
onehost list
  ID NAME       CLUSTER  RVM  ALLOCATED_CPU  ALLOCATED_MEM  STAT
   0 node01     default    0   0 / 400 (0%)  0K / 7.7G (0%)  on
If the host turns to err state instead of on , check the /var/log/one/oned.log . Chances are it’s a problem with the SSH!
Step 8. Import Currently Running VMs (Optional)
You can skip this step, as importing VMs can be done at any moment; however, if you wish to see your previously deployed VMs in OpenNebula you can use the import VM functionality.
Step 9. Next steps
You can now jump to the optional Verify your Installation section in order to launch a test VM. Otherwise, you are ready to start using your cloud, or you could configure more components:
Authentication. (Optional) For integrating OpenNebula with LDAP/AD, or securing it further with other authentication technologies.
Sunstone. The OpenNebula GUI should be working and accessible at this stage, but by reading this guide you will learn about specific enhanced configurations for Sunstone.
If your cloud is KVM based you should also follow:
Open Cloud Host Setup.
Open Cloud Storage Setup.
Open Cloud Networking Setup.

Hands-on Demo
Step 1: Click Infrastructure and select Hosts to create a new host.
Step 2: Click Infrastructure and select Clusters to create a new cluster.
Step 3: Click Network and select Virtual Network to create a new virtual network.
Step 4: Click Storage and select Datastores to create two new datastores, one for images and one for the system.
Step 5: Click Templates and select VMs to create a virtual machine template.
Step 6: Select Storage and proceed with the following.
Step 7: Select Network and choose your virtual network.
Step 8: Select OS Booting and do the following.
Step 9: Select Scheduling and choose your host and clusters.
Step 10: Select the VM template and click Instantiate.
Step 11: Click Instances and select the VMs option; the VM status is shown below.
Step 12: Click the refresh icon.
Step 13: Now the output will be as follows.
EX.NO:2
Click Instances and select your VM to migrate.
Select the target host for the migration.
EX.NO:4