Our setup of HortonWorks HDP 1.1 on Windows VMs

Many of our customers have started talking about Big Data, and about how they can best make sense of the data they have collected over many years. Apache Hadoop is one of the tools we suggest to them.

Being Windows guys with only basic hands-on skills in Linux administration and management, we found setting up Hadoop on Linux challenging.

HortonWorks changed this with their easy-to-use Hadoop releases, which made it much easier to use Hadoop on Windows.

We started off by playing with the HortonWorks Sandbox and liked what we saw. It is very simple to set up (just install VMware or VirtualBox) and has a web GUI for everything. We sailed through their very comprehensive tutorials on Hive and Pig.

Our next step was to ssh into the virtual machine (for the Windows guys: the Linux counterpart of a Remote Desktop session) and play around with the CLI. However, we did not notice much difference when working from the CLI.

However, the Sandbox is limited to a single node, so we were unable to test the limits of Hadoop or any of its components. (Since we had admin access on the machine, we did think about modifying the setup to build a cluster, but decided against it.)

We went through the excellent article on setting up Hadoop on Linux machines (http://blogs.msdn.com/b/benjguin/archive/2013/04/05/how-to-install-hadoop-on-windows-azure-linux-virtual-machines.aspx), but it did not meet our requirement that the cluster be simple for Windows users to manage and administer. We also read about the beta release of HDInsight (http://blogs.msdn.com/b/windowsazurefrance/archive/2013/03/21/tuto-hadoop-arrive-dans-le-portail-windows-azure-lan-231-ons-des-jobs-java-pig-et-hive-pour-voir.aspx) and signed up for the beta.

Then we focused on the HortonWorks Hadoop release for Windows and decided to set up a cluster. Rather than requisitioning our own hardware, we decided to instantiate multiple Windows VMs on Windows Azure. (An advantage we realized later: we captured an image of the Hadoop slave, so we can now instantiate slaves on demand.)

We decided on a two-node setup: one master and one slave, with the secondary NameNode also running on the master.

The installation steps are clearly documented at http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.1/bk_installing_hdp_for_windows/content/win-chap1.html

We deviated from the documentation in a few places and realized that we should not have. All in all, following the steps at the link above is the best way to install HDP.

These are the points where we deviated; we suggest you follow the documentation on each of them:

  • We thought that installing the JRE would be good enough, but it turns out that Oozie depends on the JDK and does not work with just the JRE.
  • After setting up and running into the problems with Oozie, we uninstalled the JRE and installed the JDK instead. Assuming that correcting the JAVA_HOME environment variable would be enough, we tried, in vain, to restart the Hadoop services. We finally solved the problem with some further modifications (http://blog.sachinnayak.info/2013/04/changing-javahome-on-hdp-11-hadoop-on.html).
  • We had not noticed the fourth column in the table at http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.1/bk_installing_hdp_for_windows/content/win-install-chap2-2.html and had assumed that the third column held "default values", so we left many of the variables out of our configuration. Point to note: all variables are mandatory.
  • If the MSI installation fails, the MSI does not report the failure. So remember to check the installation log file and the installation folder to confirm that HDP was actually installed.
  • If the MSI does not install correctly, remember to go to Control Panel and uninstall "HortonWorks Data Platform 1.1.0 for Windows" before running the MSI again; otherwise the installation will keep failing. (Judging from the logs, the MSI first tries to uninstall the product, but since the product is not installed, the uninstall fails: a chicken-and-egg problem.)
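
The JDK and "all variables are mandatory" points can be checked before running the MSI. Below is a minimal Python sketch, not something HortonWorks ships. It assumes the cluster configuration file (clusterproperties.txt in the HDP docs; treat the name and format as assumptions) uses simple KEY=value lines, and it treats JAVA_HOME as a JDK when a javac executable exists under its bin folder. All function names are hypothetical helpers.

```python
import os
from pathlib import Path


def empty_properties(text):
    """Return keys with no value in KEY=value text (assumed file format).

    Blank lines and '#' comments are skipped; lines without '=' are
    reported as well, since they cannot supply a value.
    """
    empty = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not value.strip():
            empty.append(key.strip())
    return empty


def java_home_is_jdk(java_home=None):
    """Heuristic: JAVA_HOME is a JDK if bin/javac(.exe) exists; a JRE lacks it."""
    if java_home is None:
        java_home = os.environ.get("JAVA_HOME", "")
    if not java_home:
        return False
    bin_dir = Path(java_home) / "bin"
    return (bin_dir / "javac.exe").is_file() or (bin_dir / "javac").is_file()


def preinstall_problems(props_path, java_home=None):
    """Collect human-readable problems to fix before running the HDP MSI."""
    text = Path(props_path).read_text(encoding="utf-8", errors="replace")
    problems = [f"{key} has no value" for key in empty_properties(text)]
    if not java_home_is_jdk(java_home):
        problems.append("JAVA_HOME does not point to a JDK (no javac found)")
    return problems
```

For example, `preinstall_problems(r"C:\hdp\clusterproperties.txt")` (hypothetical path) returns an empty list when every variable has a value and JAVA_HOME points to a JDK. Checking for javac is only a heuristic, but it is exactly the JRE-versus-JDK difference that tripped up Oozie for us.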

Keep the above points in mind before installing HDP, and setting it up will be a breeze.
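
Because the MSI does not report its own failures, the log check can be scripted after each install attempt. This is a minimal Python sketch that assumes the MSI was run with a verbose log (for example through msiexec's /lv option) and that the log contains Windows Installer's usual "Installation success or error status: N" or "MainEngineThread is returning N" line, where 0 means success, 3010 means success with a reboot required, and 1603 is the generic fatal error. The function names and the "non-empty install folder" rule are our own assumptions.

```python
import re
from pathlib import Path

# Status lines Windows Installer typically writes near the end of a verbose log.
_STATUS_PATTERNS = (
    re.compile(r"Installation success or error status: (\d+)"),
    re.compile(r"MainEngineThread is returning (\d+)"),
)


def read_log_text(path):
    """Read an MSI log that may be UTF-16 (with a BOM) or single-byte text."""
    data = Path(path).read_bytes()
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return data.decode("utf-16", errors="replace")  # codec strips the BOM
    return data.decode("utf-8", errors="replace")


def msi_exit_status(log_text):
    """Return the last installer status code found in the log, or None."""
    for pattern in _STATUS_PATTERNS:
        codes = pattern.findall(log_text)
        if codes:
            return int(codes[-1])
    return None


def hdp_install_ok(log_path, install_dir):
    """True only if the log reports success and the install folder has content."""
    status = msi_exit_status(read_log_text(log_path))
    folder = Path(install_dir)
    has_files = folder.is_dir() and any(folder.iterdir())
    return status in (0, 3010) and has_files
```

Requiring both conditions reflects our experience: the log and the installation folder should agree before you move on to the next node. If either check fails, uninstall the product from Control Panel before retrying, as described above.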

Now that our cluster is set up, more posts about our experiences will be coming soon.

This entry was posted in Azure, Big data.