Our setup of HortonWorks HDP 1.1 on Windows VMs

 

Many of our customers have started talking about Big Data, and how they can best make sense of the data they have collected over many years. Apache Hadoop is one of the tools we suggest to them.

Being Windows guys with only very basic hands-on skills in Linux administration and management, we faced challenges setting up Hadoop on Linux.

HortonWorks changed this and made Hadoop on Windows much cooler and easier to use with their easy-to-use Hadoop releases.

We started off by playing with the HortonWorks Sandbox and liked what we saw. It is very simple to set up (just install VMware or VirtualBox) and has a web GUI for everything. We sailed through their very comprehensive tutorials on Hive and Pig.

Our next step was to ssh into the virtual machine (for the Windows guys: “remote desktop” into it) and play around with the CLI. But we did not see much difference while using the CLI.

But the sandbox limits us to a single node, so we were unable to test the limits of Hadoop or any of its components. (Having admin access on the machine, we did think about modifying the setup to build a cluster, but decided not to.)

We went through the excellent article on setting up Hadoop on Linux machines (http://blogs.msdn.com/b/benjguin/archive/2013/04/05/how-to-install-hadoop-on-windows-azure-linux-virtual-machines.aspx), but it did not meet our requirement (the setup should be simple for Windows users to manage and administer). We read about the beta release of HDInsight (http://blogs.msdn.com/b/windowsazurefrance/archive/2013/03/21/tuto-hadoop-arrive-dans-le-portail-windows-azure-lan-231-ons-des-jobs-java-pig-et-hive-pour-voir.aspx), and signed up for the beta.

Then we focused on the HortonWorks Hadoop release for Windows and decided to set up a cluster. Rather than requisitioning our own hardware, we decided to instantiate multiple Windows VMs on Windows Azure. (An advantage we realized later: we captured the image of the Hadoop slave, and now we can instantiate slaves on demand.)

We decided to use a 2-node setup (1 master, 1 slave; the master also hosts the secondary name node).

The steps to install are clearly documented at http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.1/bk_installing_hdp_for_windows/content/win-chap1.html

We tried deviating from the documentation, and realized that we should not have. All in all, following the steps in the link above is the best way to install it.

These are the points where we deviated, and where we suggest you follow the documentation instead:

  • We thought that installing the JRE would be good enough. But it looks like Oozie depends on the JDK, and does not work with just the JRE.
  • After setting up and realizing the problems with Oozie, we uninstalled the JRE and installed the JDK instead. Assuming that correcting the JAVA_HOME environment variable would be enough, we tried in vain to restart the Hadoop services. Finally we solved the problem with some modifications (http://blog.sachinnayak.info/2013/04/changing-javahome-on-hdp-11-hadoop-on.html).
  • We had not noticed the fourth column in the table at (http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-Win-1.1/bk_installing_hdp_for_windows/content/win-install-chap2-2.html), and had assumed that the third column held “default values”, so we ignored many of the variables in our configuration. Point to note: all variables are mandatory.
  • If installation of the MSI fails, the MSI does not report any failure. So remember to check the installation log file and the installation folder to confirm that it is installed.
  • If by chance the MSI does not install correctly, remember to go to Control Panel and uninstall the “HortonWorks Data Platform 1.1.0 for Windows” before trying the MSI again, else the installation will keep failing (judging by the logs, the MSI tries to uninstall the product first, but since the product is not fully installed, the uninstall fails: a chicken and egg problem).
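
The first three points can be checked before running the installer. Here is a minimal Python sketch of such a pre-flight check. It is not part of the HDP tooling. It assumes that a JDK can be told apart from a JRE by the presence of the Java compiler (javac) in its bin folder, and that the configuration is a plain text file of KEY=VALUE lines with # comments. The helper names is_jdk and empty_properties are our own, made up for illustration.

```python
import os
from pathlib import Path


def is_jdk(java_home):
    """Return True if java_home contains a Java compiler, i.e. looks like a JDK."""
    bin_dir = Path(java_home) / "bin"
    # A JRE ships only the java launcher; a JDK also ships javac.
    return any((bin_dir / name).is_file() for name in ("javac.exe", "javac"))


def empty_properties(lines):
    """Return the keys of KEY=VALUE lines whose value is empty.

    Assumption: blank lines and lines starting with '#' are comments.
    A line without '=' is reported too, since it carries no value.
    """
    missing = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if not value.strip():
            missing.append(key.strip())
    return missing


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME", "")
    if java_home and is_jdk(java_home):
        print("JAVA_HOME looks like a JDK:", java_home)
    else:
        print("JAVA_HOME does not point to a JDK:", java_home or "(not set)")
```

To check a configuration file, pass its lines to empty_properties, for example empty_properties(open(path).read().splitlines()), where path is wherever you keep your own file. Any key it returns still needs a value.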

Keep the above points in mind before installing HDP, and setting it up will be a breeze.

Now that we have our cluster set up, more posts about our experiences will be coming soon.
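
Since the MSI does not report failures by itself, the check of the log file and installation folder can be scripted. Below is a minimal Python sketch, assuming the log is a verbose Windows Installer log, where a failed action is normally recorded as "Return value 3". The marker strings, the function names and the paths you pass in are our assumptions, not something taken from the HDP documentation, so adjust them to what your log actually contains.

```python
from pathlib import Path


def read_log(path):
    """Read an installer log; decode as UTF-16 if it has a BOM, else UTF-8."""
    data = Path(path).read_bytes()
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return data.decode("utf-16")
    return data.decode("utf-8", errors="replace")


def failure_lines(log_text, markers=("return value 3", "error")):
    """Return the log lines that contain any marker (case-insensitive)."""
    lowered = tuple(m.lower() for m in markers)
    return [
        line
        for line in log_text.splitlines()
        if any(m in line.lower() for m in lowered)
    ]


def install_looks_ok(install_dir, log_path):
    """Return (ok, suspicious_lines) for an installation folder and its log.

    ok is True only if the folder exists, is not empty, and the log
    contains none of the failure markers.
    """
    install_dir = Path(install_dir)
    has_files = install_dir.is_dir() and any(install_dir.iterdir())
    problems = failure_lines(read_log(log_path))
    return has_files and not problems, problems
```

Call install_looks_ok with your own installation folder and log file path. A verbose log can mention "error" in harmless lines, so treat the returned lines as places to read, not as proof of failure.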

Posted in Azure, Big data

The Rise of SaaS

The Information Technology world has seen many revolutions: the early calculators, telephones, printers, scanners, fax machines, the Mac, the PC and so on. Each advancement in this evolution depended on work and knowledge from previous inventions.

[Image: cloud and devices]

There are some, like Marc Benioff (the salesforce.com founder and CEO), who invent the future. They start early, before the real infrastructure falls into place. I mean, when Salesforce.com started in 1999, the internet was not fast enough, there were not so many devices, and people were not so mobile. Salesforce.com in many ways grew with the internet and transitioned (teenage and then into its twenties) with the advent of cloud and the devices revolution. It’s definitely arguable that they wouldn’t be so successful without the reliability of the internet and its bandwidth. And of course without Sony there wouldn’t be an Apple, and without Amazon there wouldn’t be a Salesforce (arguably so). Steve Jobs kept saying early on that Apple should be like Sony, and eventually he said Apple should be like Apple. The same goes for Salesforce. Marc Benioff wanted enterprise apps to be as easy to use as amazon.com.

If you have really been watching what has happened, and keeps happening, then from an app-dev perspective we are certainly experiencing a major step in this evolutionary process. History is unfolding right here, right now. The line between consumer and enterprise apps is really getting blurred. Werner Vogels, the CTO at Amazon.com, joked the other day that dropbox.com is so simple and easy to use that people don’t want to classify Dropbox as an enterprise product. Enterprises are used to doing it the hard way, the whole nine yards.

Just as TV killed the radio star, YouTube killed the TV, and Twitter killed the blogger star, cloud+devices will kill the enterprise star apps. And that’s essentially the SaaS in this context. In fact, it’s *aaS: everything as a service. I want to sum this up with the diagram below, where the three gears represent the current state of affairs from a provider/enabler point of view. You are either engineering the device or the cloud, or building on top of them. For anything that falls outside these three, it’s just a matter of time.

[Diagram: device, cloud and development]

And yes, one more thing, and something very contextual: isn’t this whole thing reminiscent of the very first Mac commercial by Steve Jobs?

Enjoy!

-Phani | @phanimt

Posted in Azure SDK

VM Depot comes to Windows Azure Portal

Before 4th Feb 2013, the way to deploy a virtual machine from VM Depot was by using a command line tool. Check the VM Depot help documentation for the detailed steps involved. VM Depot was launched by Microsoft Open Technologies on Jan 8th 2013 as a public preview. Woohoo!! As of 4th Feb 2013, VM Depot is integrated into the Windows Azure Portal.

[Image: Windows Azure virtual machine]

Once you log in to the Windows Azure Portal:

  1. Click on “Virtual Machines” in the left menu.
  2. Then click on “IMAGES” on the right. You will see a new link called “Browse VMDepot”.
  3. Click the “Browse VMDepot” link to open the gallery of VM images available from VM Depot.

There you go: you now have the option to choose LAMP, Ruby, Redmine (and many more) application stacks with a few clicks, directly from within the Windows Azure Portal.

-Phani | @phanimt

Posted in Windows Azure Portal

Central Jersey Azure User Group

We are excited to announce today the launch of the “Central Jersey Azure User Group”. This is our meetup.com link. Please join in. This meetup is all about Windows Azure, Windows Azure and Windows Azure. The first session will be happening on Monday, 18th Feb, 6 to 8 PM. This first meetup is sponsored by Pluralsight and Hanu Software.
Jim Priestley, Azure Technical Specialist from Microsoft, will be here for the first meetup and, guess what, he will also be giving a session: Introduction to the Cloud and Windows Azure. Thanks Jim. After a quick break I will be doing a talk on Windows Azure Service Bus. This is totally a COMMUNITY group. OF the people, BY the people, FOR the people, with the focus being cloud in general and Windows Azure in particular.

Speakers are welcome. Ideas for topics and future meetups are welcome.

Yeah! Let’s meet up and learn Windows Azure.

Snacks and coffee will be provided at the venue.
Check venue details here-

Looking forward to meeting some Windows Azure enthusiasts!! Let’s cloud IT!

cheers,
-Phani | @phanimt

Posted in Azure SDK