This post covers configuring and running Hadoop after [installing](ODPi-Hadoop-Installation.md) the ODPi components built with Apache BigTop. These steps apply to configuring and running Hadoop on a single node only.

# Add Hadoop User
We need to create a dedicated user (`hduser`) for running Hadoop. This user needs to be added to the `hadoop` group:

```shell
sudo useradd -m -G hadoop hduser
```

Set a password for `hduser`:

```shell
sudo passwd hduser
```

Add `hduser` to the sudoers list.
On Debian:

```shell
sudo adduser hduser sudo
```

On CentOS:

```shell
sudo usermod -aG wheel hduser
```

Switch to `hduser`:

```shell
sudo su - hduser
```

# Generate ssh key for hduser

```shell
ssh-keygen -t rsa -P ""
```

Press `Enter` to accept the default file name.

Enable ssh access to the local machine:

```shell
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
```

Test the ssh setup, as `hduser`:

```shell
ssh localhost
```

# Disabling IPv6

```shell
sudo nano /etc/sysctl.conf
```

Add the lines below to the end and save:

```shell
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
```

Prefer IPv4 in Hadoop:

```shell
sudo nano /etc/hadoop/conf/hadoop-env.sh
```

Uncomment the following line (note that the correct spelling of the JVM flag is `preferIPv4Stack`):

```shell
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
```

Run sysctl to apply the changes:

```shell
sudo sysctl -p
```

# Configuring the app environment
Configure the app environment with the following steps:

```shell
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
sudo chown hduser:hadoop /usr/lib/hadoop
sudo chmod 750 /usr/lib/hadoop
```

# Setting up Environment Variables
Follow the steps below to set up the environment variables in `hduser`'s `.bashrc`:

```shell
sudo su - hduser
nano .bashrc
```

Add the following to the end and save:

```shell
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
export HADOOP_LIBEXEC_DIR=/usr/lib/hadoop/libexec
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export YARN_HOME=/usr/lib/hadoop-yarn
export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn/
export CLASSPATH=$CLASSPATH:.
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/hadoop-common-2.6.0.jar
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/client/hadoop-hdfs-2.6.0.jar
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export PATH=/usr/lib/hadoop/libexec:/etc/hadoop/conf:$HADOOP_HOME/bin/:$PATH
```

Reload the environment (run `bash` or `source ~/.bashrc`), or simply log out and switch to `hduser` again.
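As a quick sanity check (optional, and not part of the original steps), a fresh `hduser` shell should now be able to locate the Hadoop binaries; the expected output assumes the BigTop package layout used throughout this post:

```shell
echo $HADOOP_HOME   # should print /usr/lib/hadoop
hadoop version      # should print the Hadoop version bundled by BigTop
```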
# Modifying config files
## core-site.xml

```shell
sudo nano /etc/hadoop/conf/core-site.xml
```

Add/modify the following settings.
Look for the property `fs.defaultFS` and modify it as below:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
```

Add this to the bottom, before the closing `</configuration>` tag:

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
```

## mapred-site.xml

```shell
sudo nano /etc/hadoop/conf/mapred-site.xml
```

Modify the existing properties as follows.
Look for the property tag with the name `mapred.job.tracker` and modify it as below:

```xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>
```

## hdfs-site.xml

```shell
sudo nano /etc/hadoop/conf/hdfs-site.xml
```

Modify the existing property as below:

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.</description>
</property>
```

# Format Namenode
This step is needed only the first time. Doing it again later will destroy all content on HDFS.

```shell
sudo /etc/init.d/hadoop-hdfs-namenode init
```

# Start the HDFS and YARN daemons

```shell
for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; do sudo service $i start ; done
sudo /etc/init.d/hadoop-yarn-resourcemanager start
sudo /etc/init.d/hadoop-yarn-nodemanager start
```

# Validating Hadoop
Check whether Hadoop is running. The `jps` command should list the NameNode, DataNode, ResourceManager, and NodeManager; alternatively, use `ps aux`:

```shell
sudo jps
```
or

```shell
ps aux | grep java
```

Alternatively, check whether the YARN managers are running:

```shell
sudo /etc/init.d/hadoop-yarn-resourcemanager status
sudo /etc/init.d/hadoop-yarn-nodemanager status
```

You should see output like the following:

```shell
● hadoop-yarn-nodemanager.service - LSB: Hadoop nodemanager
   Loaded: loaded (/etc/init.d/hadoop-yarn-nodemanager)
   Active: active (running) since Tue 2015-12-22 18:25:03 UTC; 1h 24min ago
   CGroup: /system.slice/hadoop-yarn-nodemanager.service
           └─10366 /usr/lib/jvm/java-1.7.0-openjdk-arm64/bin/java -Dproc_node...

Dec 22 18:24:57 debian su[10348]: Successful su for yarn by root
Dec 22 18:24:57 debian su[10348]: + ??? root:yarn
Dec 22 18:24:57 debian su[10348]: pam_unix(su:session): session opened for ...0)
Dec 22 18:24:57 debian hadoop-yarn-nodemanager[10305]: starting nodemanager, ...
Dec 22 18:24:58 debian su[10348]: pam_unix(su:session): session closed for ...rn
Dec 22 18:25:03 debian hadoop-yarn-nodemanager[10305]: Started Hadoop nodeman...
```
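The daemons' web UIs offer another quick check. The ports below are not mentioned in the original steps; they are the stock Hadoop 2.x defaults and may differ if your configuration overrides them:

```shell
curl -s http://localhost:50070/ | head   # NameNode web UI
curl -s http://localhost:8088/ | head    # YARN ResourceManager web UI
```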
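Before running the MapReduce examples below as `hduser`, the user generally needs a home directory on HDFS. This preparatory step is an addition to the original write-up and assumes the BigTop packages created the usual `hdfs` superuser account:

```shell
# Create /user/hduser as the HDFS superuser and hand it over to hduser
sudo -u hdfs hadoop fs -mkdir -p /user/hduser
sudo -u hdfs hadoop fs -chown hduser:hadoop /user/hduser
```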
## Run teragen, terasort and teravalidate ##

```shell
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen 1000000 terainput

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar terasort terainput teraoutput

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teravalidate -D mapred.reduce.tasks=8 teraoutput teravalidate
```

## Stop the Hadoop services ##

```shell
sudo /etc/init.d/hadoop-yarn-nodemanager stop

sudo /etc/init.d/hadoop-yarn-resourcemanager stop

for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; do sudo service $i stop; done
```

## Potential Errors / Issues and Resolutions ##
* If teragen, terasort, or teravalidate errors out with a 'permission denied' exception, the following steps can be done:

```shell
sudo groupadd supergroup

sudo usermod -g supergroup hduser
```

* If, for some reason, you notice that the config files (core-site.xml, hdfs-site.xml, etc.) are empty, you may have to delete all the packages and re-run the installation steps from scratch.

* Error while formatting the namenode with the following command:

```shell
sudo /etc/init.d/hadoop-hdfs-namenode init
```

If you see the following error:

```shell
WARN net.DNS: Unable to determine local hostname -falling back to "localhost"
java.net.UnknownHostException: centos: centos
    at java.net.InetAddress.getLocalHost(InetAddress.java:1496)
    at org.apache.hadoop.net.DNS.resolveLocalHostname(DNS.java:264)
    at org.apache.hadoop.net.DNS.<clinit>(DNS.java:57)
```

something is wrong in the network setup. Check the /etc/hosts file:

```shell
sudo nano /etc/hosts
```

The hosts file should look like below:

```shell
127.0.0.1 localhost localhost.localdomain   # this line should also list the output of `hostname`
::1 localhost
```

Also try the following steps:

```shell
sudo rm -Rf /app/hadoop/tmp

hadoop namenode -format
```
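If the temp directory was removed as above, it will likely need to be recreated with the correct ownership (repeating the earlier app-environment step) before the format can succeed, and the HDFS daemons restarted afterwards. A suggested sequence, assuming the same paths and services as above:

```shell
# Recreate the Hadoop temp directory with the ownership set earlier
sudo mkdir -p /app/hadoop/tmp
sudo chown hduser:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

# Restart the HDFS daemons before retrying the jobs
for i in hadoop-hdfs-namenode hadoop-hdfs-datanode ; do sudo service $i start ; done
```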