Hadoop Installation On Windows 7


Install Hadoop 2.5.1 on Windows 7 - 64-Bit Operating System

This post is about installing a Single Node Cluster of Hadoop 2.5.1 (the latest stable version) on the Windows 7 Operating System. Hadoop was primarily designed for the Linux platform. Hadoop has supported Windows since version 2.2, but we need to prepare the platform binaries ourselves. The official Hadoop website recommends that Windows developers use this build for development environments only, not in production, since it has not been completely tested on the Windows platform. This post describes the procedure for generating the Hadoop build for the Windows platform.
Generating Hadoop Build For Windows Platform
Step 1: Install Microsoft Windows SDK 7.1
  • In my case, I have used the Windows 7 64-bit Operating System. Download Microsoft Windows SDK 7.1 from the official Microsoft website and install it.
  • While installing the Windows SDK, I faced a problem where the C++ 2010 Redistributable was reported as already installed. This happens only if a higher version of the C++ 2010 Redistributable is already installed than the one shipped with the Windows SDK.
  • We can solve this issue either by not installing the C++ 2010 Redistributable (uncheck it in the Windows SDK custom component selection) or by uninstalling it from the Control Panel and letting the Windows SDK reinstall it.
Step 2: Install Oracle Java JDK 1.7
  • I recommend downloading Oracle Java JDK 7 and installing the JDK at C:\Java instead of the default path C:\Program Files\Java, since the default path contains a space character between "Program" and "Files" that causes problems for Hadoop scripts.
  • Now we need to configure the JAVA_HOME environment variable with the value "C:\Java\jdk1.7.0_51". If Java is already installed at its default path (C:\Program Files\Java), we need to find its 8.3 pathname with the help of the "dir /X" command run from its parent directory. The resulting 8.3 pathname will look like "C:\PROGRA~1\Java\jdk1.7.0_51".
  • Finally, we need to add the Java bin path to the PATH environment variable as "%JAVA_HOME%\bin", as shown in the example after this list.
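As a quick sketch, the variables can be set for the current Command Prompt session as follows (the JDK path C:\Java\jdk1.7.0_51 is an example and should match your installation; for a permanent setting use System Properties > Environment Variables):

    rem Point JAVA_HOME at the JDK and put its bin directory on the PATH
    set JAVA_HOME=C:\Java\jdk1.7.0_51
    set PATH=%PATH%;%JAVA_HOME%\bin

    rem If Java sits under "C:\Program Files", list the 8.3 short names first
    dir /X C:\
    rem ...and then use something like: set JAVA_HOME=C:\PROGRA~1\Java\jdk1.7.0_51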
Step 3: Install Maven 3.2.1
  • Download the latest Apache Maven from its official website and extract it to C:\maven. Configure the M2_HOME environment variable with the Maven home directory path "C:\maven".
  • Finally, add the Maven bin path to the PATH environment variable as "%M2_HOME%\bin".
Step 4: Install Protocol buffer 2.5.0
  • Download the binary version of Protocol Buffer from its official website, extract it to the "C:\protobuf" directory, and add this path to the PATH environment variable.
Step 5: Install Cygwin
  • Download the latest version of Cygwin from its official website and install it at "C:\cygwin64" with the ssh and sh packages.
  • Finally, add the Cygwin bin path to the PATH environment variable.
Step 6: Install cmake 3.0.2
  • Download the latest cmake from its official website and install it normally.
Step 7: Configure the "Platform" Environment Variable
  • Add the "Platform" environment variable with the value "x64" for building on a 64-bit system or "Win32" for a 32-bit system. The value is case-sensitive, as shown below.
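For example, in the Windows SDK Command Prompt used for the build on a 64-bit machine (the variable name and value must keep exactly this capitalization):

    set Platform=x64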
Step 8: Create Hadoop Build
  • Download the latest stable version of the Hadoop source from its official website and extract it to "C:\hdc". Now we can generate the Hadoop Windows build by executing the following command in the Windows SDK Command Prompt.
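The build command for a Windows distribution is along the following lines; treat it as a sketch and check BUILDING.txt in the source tree for the exact profile and flags of your version:

    cd C:\hdc
    mvn package -Pdist,native-win -DskipTests -Dtar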
  • The above command will run for approximately 30 minutes and output the Hadoop Windows build in the "C:\hdc\hadoop-dist\target" directory.
Configuring Hadoop for a Single Node (Pseudo-Distributed) Cluster
Step 1: Extract Hadoop
  • Copy the Hadoop Windows build tar.gz file from "C:\hdc\hadoop-dist\target" and extract it at "C:\hadoop".
Step 2: Configure hadoop-env.cmd
  • Edit the "C:\hadoop\etc\hadoop\hadoop-env.cmd" file and add the following lines at the end of the file. These lines configure the Hadoop and YARN configuration directories.
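A minimal sketch of the lines to append, assuming Hadoop was extracted at C:\hadoop (adjust HADOOP_PREFIX to your actual layout):

    set HADOOP_PREFIX=C:\hadoop
    set HADOOP_CONF_DIR=%HADOOP_PREFIX%\etc\hadoop
    set YARN_CONF_DIR=%HADOOP_CONF_DIR%
    set PATH=%PATH%;%HADOOP_PREFIX%\bin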
Step 3: Configure core-site.xml
  • Edit the "C:\hadoop\etc\hadoop\core-site.xml" file and configure the following property.
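A typical single-node setting points the default file system at a local NameNode; the host and port below (hdfs://localhost:9000) are an illustrative choice, not a value fixed by this post:

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>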
Step 4: Configure hdfs-site.xml
  • Edit the "C:\hadoop\etc\hadoop\hdfs-site.xml" file and configure the following property.
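For a pseudo-distributed cluster the replication factor is usually set to 1; a sketch (other HDFS directories are left at their defaults under the tmp directory created later):

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>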
Step 5: Configure mapred-site.xml
  • Edit the "C:\hadoop\etc\hadoop\mapred-site.xml" file and configure the following property.
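A common single-node choice is to run MapReduce on YARN; the property below is that usual setting, shown here as an assumption rather than this post's original value:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>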
Step 6: Create tmp Directory
  • Create a tmp directory at "C:\tmp", which is the default temporary directory for Hadoop.
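From a Command Prompt:

    mkdir C:\tmp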
Step 7: Execute hadoop-env.cmd
  • Execute the "C:\hadoop\etc\hadoop\hadoop-env.cmd" file from the Command Prompt to set the environment variables.
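For example, from the same Command Prompt session that will be used to start Hadoop:

    C:\hadoop\etc\hadoop\hadoop-env.cmd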
Step 8: Format File System
  • Format the file system by executing the following command before first-time usage.
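The standard HDFS format command, assuming hadoop-env.cmd above has put the Hadoop bin directory on the PATH:

    hdfs namenode -format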
Step 9: Start HDFS
  • Execute the following command to start HDFS.
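HDFS daemons on Windows are typically started with the start-dfs.cmd script from the sbin directory (the path assumes the C:\hadoop layout used above); it launches the NameNode and DataNode:

    C:\hadoop\sbin\start-dfs.cmd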
Step 10: Check via Web Browser
  • Open a browser at http://localhost:50070. This page displays the currently running nodes, and we can also browse the HDFS through this portal.