Sunday, December 4, 2011

Server Virtualization

In my last blog I talked about Cloud Computing. Those who read it will know that a cloud provides computing resources on the fly. These resources can be software resources, like a CRM solution, an email solution or a BI solution, or they can be hardware resources like storage. They can also be complete machines/servers with everything set up, which is what Amazon EC2 does. What this means to you as a user is that you can now get a server machine of your choice on demand. So you can say that you want a machine with 2 processors, 360G of storage space and 4G of RAM. You can also say that you want Linux with MySQL, or Windows with Oracle and Tomcat installed.
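To make this on-demand idea concrete, here is a minimal sketch of requesting such a machine programmatically with boto3, the AWS SDK for Python. The AMI ID, instance type and region below are illustrative placeholders of my own, not values from this post; treat it as a sketch rather than a recipe.

# Request one EC2 instance on demand. The AMI (disk image) decides the
# OS and pre-installed software; the instance type decides CPU and RAM.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-12345678",    # placeholder image, e.g. Linux with MySQL baked in
    InstanceType="m1.large",   # placeholder size: number of vCPUs, amount of RAM
    MinCount=1,
    MaxCount=1,
)
instance_id = response["Instances"][0]["InstanceId"]
print("Provisioned instance:", instance_id)

# When the machine is no longer needed, release it and stop paying for it.
ec2.terminate_instances(InstanceIds=[instance_id])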

Now obviously it is not possible that, when you put in a request for a ready-made machine, somebody will be sitting there to react to it, set up your machine, install the required software and then hand it over to you. This process needs to be on the fly and automated. So what can be the way?

One way is to keep a disk image ready which includes the standard operating system and some standard software. You remember that Norton Ghost image? Ghost is disk cloning software which allows you to create an image of a disk that you can restore onto another blank machine. This type of provisioning is called Bare Metal Provisioning. Now when you ask for a machine, this image can be loaded onto a physical machine and that's it.

But would this approach work with Cloud Computing? I would say NO. Why? Many reasons.
First of all, this is slow. Imagine a ghost image getting retrieved every single time. Moreover, what do we do when the user says they don't need the machine anymore? Do we format the disk?
On top of that, if we start giving physical machines to users on the fly, just imagine how many physical machines it would need. Too many, resulting in too much power consumption, too much heat and too much physical space.

Also, we don't know what the users' software might do to the machine. The system might crash; one erroneous program might halt the machine, as there is no controlling entity.
Above all, this is not the best utilization of the resources either. Most of the time a powerful machine given to a user might be sitting idle, or its full power might never be used. If there were a way to share a machine among many users, it would be a better utilization of resources. So what is the solution?

Why not have multiple smaller machines on a single machine and allocate these smaller machines to the users? This is called server virtualization, or just virtualization.

Virtualization lets you run multiple machines, or rather multiple virtual machines, on a single physical machine, with each virtual machine sharing the resources of that one physical machine. Different virtual machines can run different operating systems and applications on the same physical machine. So what is a virtual machine?

A virtual machine is a software container that can run its own operating system on a physical machine. A virtual machine, just like a physical machine, has its own virtual processor, storage, memory and NIC. Typically these virtual resources are implemented over the actual physical resources with a software layer on top. So, for example, a virtual processor might map to a core of a physical processor, and virtual storage or memory might map to a part of the physical storage or memory. Normally the physical machine is called the host and the virtual machine is called the guest. Neither the software running on the virtual machine nor the users using it can really tell it apart from a physical machine. These virtual machines are physically files, meta and data files, on the disk. You can collectively call those files an image, and when we say that we are launching a virtual machine, what we are doing is launching a virtual instance from a physical image.

Now here you need to understand an important concept, or benefit, of virtualization. What it allows you to do is separate the OS from the hardware. Virtualization creates a disconnect between the OS or apps and the physical hardware, and your virtual machine now sits on top of the physical hardware. Without virtualization, if you had to move your server to a better hardware machine, it could take almost 24 hours: you take a backup of your data, set up a new machine, install the same OS and other applications, and then deploy your backed-up data. With virtualization it is as simple as copying the image and launching an instance out of it on the desired hardware. This happens through virtualization software.

Let us try to understand virtualization software. There are two types of virtualization software:

1. Client based
Here you install an OS over your hardware, and over that OS you install a client. Now you can deploy or install multiple virtual machines/OSs over that client (a small sketch follows the examples below).

Examples:
Oracle/Sun VirtualBox
VMware Fusion for Mac
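As a rough illustration of the client-based approach, here is a sketch that drives Oracle VirtualBox through its VBoxManage command-line tool from Python. The VM name, OS type, memory and disk size are my own assumptions; attaching the disk to a storage controller and to an install medium is left out for brevity.

# Create and boot a guest by calling VirtualBox's VBoxManage CLI.
import subprocess

def vbox(*args):
    # Run one VBoxManage subcommand and fail loudly if it errors.
    subprocess.run(["VBoxManage", *args], check=True)

# Register a new, empty virtual machine (the guest).
vbox("createvm", "--name", "demo-vm", "--ostype", "Ubuntu_64", "--register")

# Give the guest its virtual resources: RAM and virtual CPUs.
vbox("modifyvm", "demo-vm", "--memory", "2048", "--cpus", "2")

# Create a virtual disk; this is one of the image files on the host
# that together make up the virtual machine.
vbox("createhd", "--filename", "demo-vm.vdi", "--size", "20480")

# Boot the guest without opening a GUI window.
vbox("startvm", "demo-vm", "--type", "headless")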

2. Hypervisor based

Typically in this case you have two components:

Hypervisor or Virtual Machine Monitor:

This is just like an OS that you install directly over the hardware. So you don't install any regular OS; you put this component directly on the hardware. In the VMware world this hypervisor is called ESXi. The hypervisor doesn't do much alone.

Management Software:

For creating VMs you use another piece of software, called management software, which in the VMware world is vSphere. You install vSphere on a machine which is connected to the server/hypervisor through the network, and you administer your server through that machine. Through vSphere you can create VMs: you specify the hard disk and RAM per your requirements and then install the file system and OS. Through the management software you can also transfer this image to any hardware which has the hypervisor installed. VMware even offers a feature called VMotion, which enables live migration of running virtual machines from one physical server to another with zero downtime.



Nowadays you also get P2V tools allowing physical-to-virtual migration, so they can literally create a virtual machine image from a physical machine.
Examples:
VMware Converter


Acknowledgements:


http://www.youtube.com/watch?v=QYzJl0Zrc4M&feature=related

by Eli The Computer Guy

Sunday, October 9, 2011

Cloud Computing

Today the condition of the economy is such that almost every organization wants to cut down cost and increase profit. For non-IT organizations like banks and insurance companies, one area to focus on to cut down cost is IT. Of course they understand the significance of IT in automating their business functions. But that's not their primary business, and they would definitely prefer to focus on their core business goals rather than on IT or software development.

Furthermore, they are also realizing that maintaining an in-house IT infrastructure along with the software is an expensive affair. Hardware has its own cost, which is recurring: as soon as an employee joins, you have to get a new machine. Software, be it custom or third party, adds to the cost. Maintaining it is also a headache. And then it's not only about cost, it's about time too. Every time an employee joins, you have to arrange a new machine with the software installed.

What you would want is to make these services available to your employees quickly and cost effectively. The solution?

Look to the cloud...

Instead of getting a powerful machine and installing a suite of software on it, you'd buy a basic machine with a simple application on it. That application would let employees log into a Web-based service which hosts all the resources they need.

The service provides all the software the employee needs.
This is called cloud computing, and it is gaining more and more popularity because of the lower cost and quick provisioning.

Let’s try to understand the basic concept of the cloud and cloud computing further. So you know what, I have been doing a lot of googling trying to find the real definition of the cloud, and after a year of study I derived the best definition of the cloud, which I would like to share with you.

“Cloud is the cloudiest term which is used by anybody in any way.”

Almost everybody on earth seems to have a different understanding of the cloud. So finally I thought of understanding the meaning behind all the definitions and deriving my own. I will share it in a moment. For now, let’s understand some basic concepts.

Let’s begin with a cluster. A cluster is normally a small group of computers connected over a LAN. You deploy your web applications across this cluster, typically to get performance and availability, as there is no single point of failure. This is generally what you form yourself, at your own premises.

Then there comes a grid, which in simple terms is a bigger cluster. It involves many interconnected computers which are loosely coupled and geographically dispersed. Here the network connecting them can also be the Internet. Typically you use a grid for processing complex jobs, like satellite signals. If we talk about web hosting, then maybe you deploy a portal for thousands of users on a grid. If you have a big amount of money you can form a grid for your own needs; otherwise, there may be a grid available to the public and you typically rent it, or a part of it.

When computing resources like a grid, or a grid with some required pre-installed software, are made available to the public like a metered utility, it is called utility computing.

Now some of you might be getting furious at me, thinking: what is cloud computing then? Cloud computing is a little more. It uses a grid as infrastructure and also provides computing over it as a utility. What differentiates it from grid computing and utility computing is the on-demand, automated provisioning of the computing resources over a network as an abstract service. So I define cloud computing as:

“On-Demand, Automated Provisioning of the Computing Resources over a Network as an Abstract Service.”

Let’s try to understand this definition word by word.

By computing resources I mean both hardware and software.

By on-demand, automated provisioning over a network as an abstract service, I mean that you ask for these resources and you get them in real time, and there is an automated system which makes this provisioning possible. Moreover, all the resources are provisioned as a web-based service over a network, typically the Internet, and the service is very abstract, which means you never know how the service has been provided. You don't know what's happening inside the infrastructure to make it possible. The internal details are always hidden.
So, for example, when you get a resource you don't need to worry about its maintenance or management. You also don't know the physical location of the resources.

“The grid used for cloud computing is called the cloud.”

Clouds typically follow a pay-per-use or pay-as-you-go model, which means you pay for exactly what you use. This is interesting. As I mentioned, cloud computing is provisioning resources on demand, so when you want a resource, just grab it, use it and then release it, and pay for the usage. If you use, you pay; if you don't use, you don't pay, as opposed to grids where you pay even if you don't use them.
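As a back-of-the-envelope illustration of the pay-as-you-go point, the tiny sketch below compares paying only for the hours you actually use against keeping a machine on for a whole month. The hourly rate and the usage pattern are made-up numbers, purely for illustration.

# Compare an always-on machine with pay-per-use for the same workload.
HOURLY_RATE = 0.50            # assumed price per instance-hour
HOURS_IN_MONTH = 30 * 24      # 720 hours

hours_actually_used = 8 * 22  # e.g. an 8-hour job on 22 working days

always_on_cost = HOURLY_RATE * HOURS_IN_MONTH
pay_per_use_cost = HOURLY_RATE * hours_actually_used

print(f"Always-on for the month: ${always_on_cost:.2f}")    # $360.00
print(f"Pay only for usage     : ${pay_per_use_cost:.2f}")  # $88.00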

Clouds are considered to be primarily of two types:

Public clouds
These are available to all: organizations and individual users.

Private clouds
These are used within an organization, and the organization's IT people manage them.

Hybrid Clouds & Cloud bursting
Cloud bursting is an application deployment model in which an application uses both public and private clouds. Normally the application runs in a private cloud or data centre, but when the demand for computing capacity spikes, it bursts into the public cloud. The cloud formed by the combination of a private and a public cloud is called a hybrid cloud. The advantage of such a hybrid cloud deployment is that an organization only pays for extra compute resources when they are needed.
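To make the bursting behaviour concrete, here is a hypothetical sketch of the decision logic: serve from the private cloud by default and spill over into the public cloud only when utilization spikes. The helper functions and the threshold are placeholders I made up; a real deployment would call the provider's provisioning API instead.

# Hypothetical cloud-bursting controller: burst into the public cloud
# when the private cloud is nearly full, retreat when demand drops.
BURST_THRESHOLD = 0.80  # assumed: burst above 80% private utilization

def provision_public_instance():
    # Placeholder: request an extra server from the public cloud.
    print("Bursting: provisioning a public cloud instance")

def release_public_instance():
    # Placeholder: release the rented public cloud server.
    print("Demand dropped: releasing the public cloud instance")

def handle_load(private_utilization, burst_active):
    # Decide whether to burst into, or retreat from, the public cloud.
    if private_utilization > BURST_THRESHOLD and not burst_active:
        provision_public_instance()
        return True
    if private_utilization <= BURST_THRESHOLD and burst_active:
        release_public_instance()
        return False
    return burst_active

# Example: utilization samples over time.
burst = False
for utilization in [0.55, 0.72, 0.91, 0.95, 0.60]:
    burst = handle_load(utilization, burst)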

Sunday, July 3, 2011

Big Data stretching the scope of BI

Sometimes I wonder, looking at how “Business Intelligence” is moving today. Experts in the field are trying their best to stretch the scope of this domain as much as possible. When it comes to storing the data at the back end, we are trying to move as far backward as possible, while when it comes to displaying the information, we are moving as far forward as possible.

Remember those days when you used to store the data in flat files, possibly on your local system. Then you faced the problems of data management and size. You moved further backwards and started to store the data in an RDBMS set up on remote machines and distributed across multiple nodes. Now you see even bigger data. You find it even more difficult to manage at your own premises. Guess what, now you decide to go further backwards, possibly out of your premises. You start looking for big data storage options located remotely, either in the form of data centres or as clouds. Clouds sound like an attractive option, as we can find many cheaper offerings which provide gigantic storage space with big data processing frameworks already set up; Amazon Elastic MapReduce is a good example. Even if we want to use some other commercial solution for big data processing, setting it up on the cloud should not be a big problem. The safety and security challenges associated with clouds still remain, though. We can still argue and discuss for hours whether it is a good strategic decision to move to clouds, just like people do today in the enterprise. Leaving behind all these arguments and challenges, clouds are gaining more and more popularity day by day. So don't be surprised if your kid modifies his or her understanding of the cloud. Gone are the days when clouds were found only in the sky.

While on one side we see data moving further backwards, on the other side we can see the information moving further forward. Earlier you used to get the information in the form of reports on paper: somebody used to prepare the reports for you, get them printed and finally bring them to you. Then you started getting reports on your computer screen by connecting a thick desktop-based viewer on your terminal to the reporting solution. Then you got ad hoc analytics over browsers that allowed you to play with your data, and that too from any location over the web. Now you want real-time, interactive ad hoc analytics on handheld devices: mobiles and tablets. It's amazing to see BI solutions in the market today allowing you to do real-time ad hoc analytics over your big data, stored in some cloud, on your iPad. It feels great to see the important yet horrible big data appearing in the form of really pretty charts, widgets and dashboards, and that too on devices like iPads. So now you don't need to be worried. Go wherever you want to go; you are still never far from making important strategic decisions instantly.

Saturday, May 21, 2011

OLAP Over Hadoop

In the last few years Hadoop has really come forward as a massively scalable distributed computing platform. Most of us are aware that it uses MapReduce jobs to perform computation over Big Data, which is mostly unstructured. Of course such a platform cannot be compared with a relational database storing structured data with a defined schema. While Hadoop allows you to perform deep analytics with complex computations, when it comes to performing multidimensional analytics over data, Hadoop seems to lag. You might argue that Hadoop was not even built for such uses. But when users start putting their historical data in Hadoop, they also start expecting multidimensional analytics over it in real time. Here “real time” is really important.

Some of you might think that you can define an OLAP-friendly warehousing star schema using Hive for your data in Hadoop and use a ROLAP tool. But here comes the catch. Even on partially aggregated data, the ROLAP queries will be too slow to make it real-time OLAP. As Hive structures the data at read time, the fixed initial time taken by each Hive query makes Hadoop practically unusable for real-time multidimensional analytics.
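To make the Hive/ROLAP idea concrete, here is what a typical star-schema roll-up (a measure aggregated over a couple of dimensions) might look like when submitted through the hive command line from Python. The fact and dimension table names are made up for illustration.

# Submit a star-schema aggregation to Hive; every such query spins up
# MapReduce jobs, and that fixed start-up latency is exactly what makes
# interactive, real-time OLAP over Hive so hard.
import subprocess

query = """
SELECT d.year, p.category, SUM(f.amount) AS total_sales
FROM   sales_fact f
JOIN   date_dim d    ON f.date_key = d.date_key
JOIN   product_dim p ON f.product_key = p.product_key
GROUP BY d.year, p.category
"""

subprocess.run(["hive", "-e", query], check=True)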

One option left to you is to aggregate the data in Hadoop and bring the partially aggregated data into an RDBMS. You can then use any standard OLAP tool to connect to your RDBMS and perform multidimensional analytics using ROLAP or MOLAP. While ROLAP will fire the queries directly against the database, MOLAP will further summarize and aggregate the multidimensional data in the form of cuboids for a cube.
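As a sketch of the "pre-aggregate inside Hadoop, then load into an RDBMS" option, here is a minimal Hadoop Streaming reducer in Python. It assumes tab-separated records of the form category<TAB>amount; the job invocation in the comment is illustrative only.

# Reducer for a Hadoop Streaming job that sums a measure per key, e.g.:
#   hadoop jar hadoop-streaming.jar \
#     -mapper cat \
#     -reducer "python aggregate_reducer.py" \
#     -input /sales/raw -output /sales/aggregated
import sys

def emit(key, total):
    print(f"{key}\t{total}")

def main():
    current_key, total = None, 0.0
    # Streaming hands the reducer its input sorted by key, so a running
    # total per key is enough to produce the partial aggregates.
    for line in sys.stdin:
        key, amount = line.rstrip("\n").split("\t")
        if key != current_key and current_key is not None:
            emit(current_key, total)
            total = 0.0
        current_key = key
        total += float(amount)
    if current_key is not None:
        emit(current_key, total)

if __name__ == "__main__":
    main()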

The other option is to use a MOLAP tool that can compute the aggregates for the data in Hadoop and fetch the computed cube locally. This will allow you to do really real-time OLAP. Moreover, if the aggregates can be computed in Hadoop itself, that will make cube computations scalable and fast.
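And to show what "cuboids for a cube" means in the MOLAP option, here is a tiny, self-contained sketch that pre-aggregates a measure at every combination of two dimensions. The sample records and dimension names are invented for illustration; a real MOLAP engine would do this at scale, ideally inside Hadoop itself.

# Compute every cuboid of a (year, category) cube over toy data:
# (year, category), (year), (category) and the grand-total cuboid.
from itertools import combinations
from collections import defaultdict

records = [
    {"year": 2010, "category": "books", "amount": 120.0},
    {"year": 2010, "category": "music", "amount": 80.0},
    {"year": 2011, "category": "books", "amount": 200.0},
]
dimensions = ("year", "category")

cube = {}
for r in range(len(dimensions) + 1):
    for dims in combinations(dimensions, r):
        cuboid = defaultdict(float)
        for rec in records:
            key = tuple(rec[d] for d in dims)
            cuboid[key] += rec["amount"]
        cube[dims] = dict(cuboid)

for dims, cuboid in cube.items():
    print(dims or ("<all>",), cuboid)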

There can be a big fight over the point that Hadoop is not a DBMS, but when Hadoop reaches users and organizations who look to use it just because it is a buzzword, they expect almost anything out of it that a DBMS can do. You should see such solutions growing in the near future.