Stateless desktops, hotels and floating rooms …


Eyes glaze over, the meaningless automatic nodding starts and you can feel the person’s mind is miles away … yes, I admit, I have had several such fruitless attempts at explaining the concept and benefits of ‘stateless versus dedicated desktops’.
The inconsistency of today’s “VDI” terminology doesn’t help, and that includes the description of the relationship between the user and the desktop image.
Stateless desktops are often also referred to as ‘non-persistent’, ‘pooled’ and ‘dynamic’, and dedicated desktops as “persistent” or “private”.
The image deployment model has a fundamental impact on arguably the most important metric for VDI – cost per user – so getting your point across, even to the potentially less technical folks, is imperative.
… I hope your eyes haven’t glazed over yet …
Actually it’s simple – a stateless desktop is to the IT administrator what a hotel room is to a property manager.
Assume a property manager (IT administrator) has been tasked to provide and maintain accommodation (virtual desktops) on a tight budget to a large number of tenants (VDI users … you get the drift …). He knows that unless housing is considered functional and homely by the tenants (user experience) the project will be considered a failure.
He evaluates two approaches:
  • Hotel apartments (stateless desktops) 
  • Residential area with private properties (dedicated desktops)
Let’s see how these approaches compare …


Hotel: (stateless desktop)

  • The tenant checks into the hotel and gets any available apartment (desktop) allocated. 
  • On check-in the tenant typically brings their suitcase (user profile) that they use to populate or customize the apartment with personal things they need or like while in it. 
  • They use it for a period of time and check out when it is no longer needed, which means all personal belongings will be taken away and stored in their suitcase for the next visit. 
  • The apartment will be cleaned (“reset”) so that the next tenant finds it “as new” (changes to the desktop will be discarded, making the desktop itself “stateless”)
  • The apartment will then be made available to any other potential tenant.
  • The next time our tenant checks into the hotel he/she will (most likely) get a totally different apartment (remember … desktop) and won’t care as long as it provides the same functionality.
Tenant’s View (user):
  • Functionality: Good, as the apartments are equipped with all the facilities and appliances commonly required, e.g. kitchen, bathroom etc. (MS Office, mail applications etc. built into the golden image)
  • Personalization: Typically adequate, though the level of personalization varies. It depends on how big your suitcase is and what you are allowed to bring; the hotel chain would provide centralized storage for permanent personal items outside your apartment or hotel (network drives, folder redirection), and some even provide the equivalent of a personal designer service that allows for advanced customization of your apartment to make it really feel like yours (advanced profile management software like AppSense, RES etc.). 
  • Major functionality upgrades: The hotel will obviously not allow you to buy a personal home cinema system (your favourite PC game) and permanently install it in the hotel apartment. You could … but be assured that it won’t be there next time you check in (remember, you’ll check into a different room and changes you make to the apartment get cleaned up anyway). The hotel could however provide custom services (applications) through alternative methods if required; think of it as ‘room service’ without having to install a kitchen (e.g. application streaming or XenApp publishing).
Property Manager’s View (IT administrator):
  • Build and Maintenance Effort: Low – a collection of standard “cookie cutter” apartments from a common blueprint (golden image). NB: I’ll avoid the delta disk/linked clone analogy. A common set of furniture and appliances (apps) can be maintained across all apartments. Any custom services like room service that complement the base functionality require additional facilities (cost) but can be handled centrally (e.g. streamed apps).
  • Availability Requirements: Low. If an apartment becomes unavailable due to scheduled maintenance or unforeseen problems (flooded bath = image corrupted through user error) the tenant can simply check out and check in to another apartment (connect to another desktop). There is no dependency between the tenant and an apartment (user and desktop image). Even if the entire hotel experiences a power cut (host failure) with all apartments becoming unusable, the tenant can simply check into an apartment in another hotel as long as total capacity across all hotels is sufficient.
  • Utilization: The apartments can be oversubscribed (you only need to accommodate the number of concurrent tenants)
So the stateless desktop provides the user with a set of common capabilities and applications, plus a mechanism to personalize it and to use and store personal data permanently in a way that is accessible from any desktop – but natively it does not allow you to install personal applications into the image. You will never own the desktop, but the user experience is close to that of a privately owned one, giving a suitable experience for most users.
The stateless desktop allows the administrator to build desktops from a common image base that is easily deployed and maintained. The stateless desktop itself does not need to be made highly available and can easily be replaced with another available desktop if the desktop (or host) becomes unavailable. 
The critical personal data is logically separated from the common image (using the ‘suitcase’ and ‘centralized hotel storage space’). This results in greatly reduced storage, availability and backup requirements, allowing the use of cheaper local storage for the common desktop image files as described in detail in this post here.

 

In Contrast – Private Properties (Dedicated Desktops)

  • Properties are built from a common blueprint, with custom built “executive” houses also on offer
  • Even if they are built from a common blueprint, tenants can and will customize them in any way they want, so over time they will become unique. 
Tenant’s View (user):
  • Functionality and major upgrades: “Unlimited” – the properties are already equipped with common facilities (applications) and the tenant can install any additional ones they would like. 
  • Personalization: Great. The tenant owns the property, so they will personalize every aspect of the house and permanently store personal items anywhere in the property (personal data anywhere in the image)
Property Manager’s View (IT administrator):
  • Build and Maintenance Effort: High – any custom build will require additional design effort (image). Even if the initial build is from a common blueprint, the property becomes unique over time anyway. Maintaining and supporting these additional facilities (applications) as well as controlling compliance with property regulations increases cost significantly.
  • Utilization: The property is yours; if you are not using it, no one else can, and it will remain empty (desktop unused) – no oversubscription is possible.
  • Availability Requirements: Very High. If your property becomes unavailable due to scheduled maintenance or unforeseen issues, the tenant is ‘homeless’. In the event that your home itself (your desktop) gets destroyed, it would have to be rebuilt from scratch (assuming your property manager maintains updated “build plans” of your ever changing property (image backups)) – all driving up the maintenance cost for your property significantly. If the infrastructure (e.g. electricity) running your private property fails (host failure) the property will be unusable unless it has been built with redundant/shared facilities that can take over and run your property instead (host level failover using shared storage) – again, driving up cost massively. There is an inherent dependency between the tenant and the property (user and his/her desktop image).

 … Reality is that the tenant will probably check into a ‘hotel’ at this point ;)

So while the tenant, or rather the desktop user, will appreciate the potentially unlimited level of personalization and functionality upgrades, this scenario is a nightmare for the IT organization.
Maintenance of a large number of unique images requires careful backup and availability planning, and maintaining the additional applications (or correcting issues they can cause) will result in significant administrative overhead compared to stateless images. 
The infrastructure required to run these highly available images will drive up cost significantly – specifically through a drastic increase in shared storage requirements.
The private property approach is however the one we are used to (who wants to live in a hotel?) … and for VDI users with specific requirements, or simply executives who want maximum functionality whatever the cost, a dedicated desktop has its place. We often see hybrid deployments, and the key to success (reducing cost) is careful user categorization and analysis of functional requirements to increase the share of stateless desktops in your environment.

 

Future:

We have seen technologies becoming mainstream that blend the two approaches. They have been around for some time as point solutions like Unidesk but are increasingly integrated into the vendor products, with Citrix’s personal vDisk being a great example.

Imagine you are in a hotel that provides the futuristic feature of a “floating” personal room that can be detached and magically attached to any of the apartments.

The tenant is allowed to store any personal items and even install the above mentioned home cinema system (or any other personal applications) in this “floating personal room” (personal vDisk).
When the user moves between apartments the personal room will be detached and reattached to the new apartment retaining the personalization and functionality that it provides over and above the standard apartment even if the apartment was cleaned or refurbished (image reset or recomposed). 

If you are familiar with VMware View’s “persistent disk” or Verde’s “user disk” implementation, you know that this personal “room” exists today but can only be used to store your suitcase items (profile) and items you’d have put into hotel storage (my documents etc.), surviving a clean of the apartment or even a refurbishment (reset or recomposing of the image). If, however, you decided to install the above home cinema system (personal application) in this room, it would still be there after the apartment was cleaned/refurbished but it would not function anymore.


Why?
Well, the installation of these applications also makes changes to the base image (think of it as installing a power junction in the standard hotel apartment – not your personal room – to power your home cinema system). There is no intelligence that tracks the dependencies and changes, so when you try to reattach the magic floating room to a new apartment the required power junction is simply not there. The home cinema system is still physically in your room but won’t function.

This is where the beauty of the personal vDisk comes into play.

When using the personal vDisk a filter driver in the image will track all changes and ensure that they are routed to your “personal room” and more importantly that they continue to exist in isolation from the base image (think of it as installing a duplicate power supply in the floating room rather than utilizing the existing one in the apartment).
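
If a concrete illustration helps, below is a toy sketch of the layering idea in Python – writes land in a personal overlay, reads fall through to the shared base unless the overlay has captured a change. This is purely conceptual and not how the actual filter driver is implemented:

    # Toy model of the personal vDisk layering concept (illustration only)
    class LayeredDesktop:
        def __init__(self, base_image):
            self.base = base_image   # shared golden image, never modified
            self.personal = {}       # the "floating room": per-user overlay

        def write(self, path, data):
            # All changes are routed to the personal layer, keeping the base stateless
            self.personal[path] = data

        def read(self, path):
            # Personal changes win; everything else falls through to the base image
            return self.personal.get(path, self.base.get(path))

    base = {"C:/apps/office.exe": "standard app"}
    desktop = LayeredDesktop(base)
    desktop.write("C:/apps/cinema.exe", "home cinema system")  # personal app
    print(desktop.read("C:/apps/cinema.exe"))  # served from the personal layer
    print(desktop.read("C:/apps/office.exe"))  # falls through to the base image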

The result is a model that preserves the best of both approaches in the VDI world: a stateless base image (with all the associated benefits) combined with a ‘layer’ (room) of personal applications and customizations (requiring only those to be highly available and backed up rather than the entire image).
Even the personal vDisk has limitations today: in reality it does not float automatically between desktops (following the user) but is associated with the desktop and needs to be manually reattached to the new “hotel apartment” by the administrator in recovery situations – but we are halfway there, and other vendors are working on similar functionality (e.g. VMware’s Mirage).
It’s just a question of time … and … well, inventing floating hotel rooms …
Andy



Maximums and sizing guidelines for VMware View, XenDesktop (on vSphere) and Virtual Bridges VERDE


When we introduced a building block approach to our reference architecture many questions from the wider team revolved around the scaling maximums and limitations of the respective desktop virtualization solution in order to create valid configurations and correctly sized building blocks.

It quickly became apparent that while, for example, in VMware’s case the vSphere maximums are well documented, the virtual desktop specific guidelines are scattered across different documents; some are not listed at all and others (e.g. storage related) are based on 3rd party vendor recommendations rather than limits specified by VMware.

How many systems per “cluster”, how many VMs per replica/base image or LUN, how many broker or management servers per building block?

So I thought it’d be worthwhile to summarize the (high-level) guidelines and assumptions we used for View, XenDesktop and Verde (verified by the respective vendors) in this post.

In a nutshell, understanding the scaling limitations allows you to “assemble” systems into clusters, building blocks and larger constructs (PODs) by combining them with management and other peripheral components.

I’ve created a few graphics to show you the approach conceptually in the example below.

Figure 1 – Step 1: Assembling hosts into a cluster and adding storage and management components to create a building block for XenDesktop on vSphere

Figure 2 – Step 2: Adding broker and other access components to the building block for user access

Figure 3 – Step 3: Create a larger scale user environment by combining building blocks (10 000 user example for XenDesktop on vSphere)

VMware View 5 and 5.1

General
  • Maximum 8 hosts per cluster (including N+1) – the limit is caused by View Composer’s concurrent file access in conjunction with the VMFS file system (the maximum without View Composer is 32)
    • New: 32 hosts per cluster with View 5.1 when using NFS volumes (no VMFS)
  • Each desktop pool can contain a maximum of one ESX/ESXi cluster
  • Up to 16 VMs per CPU core (recommended, not a hard limit, as up to 25 are supported on vSphere)
  • Max 512 VMs per host (documented vSphere maximum)
  • Max 1,000 VMs per View desktop pool
  • Max 2,000 VMs per vCenter (primarily determined by the number of concurrent activities from View components against the vCenter server)
  • Maximum pod size (construct of VMware View building blocks): 10,000 users (tested by VMware)
Storage:

Storage maximums with VMware View are less clearly defined (mainly best practices have been published)

  • Typically the recommended maximum number is 64-128 linked clones (VMs) per VMFS datastore
  • VAAI storage systems typically allow for numbers higher than 128
    • (the limit without VAAI is primarily driven by SCSI reservations on the VMFS file system during metadata updates – with VAAI enabled LUNs, ESX uses the atomic test and set (ATS) algorithm to lock the LUN, reducing the impact greatly)
    • Storage vendors have shown that an NFS based datastore (NFS is not using VMFS) can support 256 and more linked clones (the actual theoretical maxima of VMFS and NFS for vSphere are higher).
  • Max 64TB per LUN (VMFS); the max size of NFS LUNs is often determined by the NFS storage array (check with your vendor)
    • The maximum size listed above is not intended to be a “recommended” number as this will be determined by various factors (performance per LUN, VM sizes, operational concerns like backup etc.)
  • Max 1,000 VMs per replica and View pool
Maximum Number of Connections:
  • 1 Connection Server with direct connection, RDP or PCoIP: 2,000
  • 7 Connection Servers (5 + 2 spares) with direct connection, RDP or PCoIP: 10,000
  • 1 Connection Server with tunneled connection: 2,000
  • 1 Connection Server with PCoIP Secure Gateway: 2,000
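
To make the interplay of these maximums concrete, here is a small Python sketch that derives a View building block from the numbers above. Treat it as an illustration of the arithmetic only – the per-host core count is an assumption, so plug in your own values:

    import math

    # Illustrative sizing only, based on the maximums listed above
    users = 10000                # target pod size (maximum tested by VMware)
    vms_per_vcenter = 2000       # max VMs per vCenter (= per building block)
    hosts_per_cluster = 8        # View Composer limit on VMFS (including N+1)
    cores_per_host = 12          # assumption: dual-socket 6-core hosts
    vms_per_core = 16            # recommended density (25 is the vSphere cap)
    vms_per_pool = 1000          # max VMs per View desktop pool

    building_blocks = math.ceil(users / vms_per_vcenter)
    vms_per_host = min(cores_per_host * vms_per_core, 512)  # 512 = vSphere max
    hosts_per_block = math.ceil(vms_per_vcenter / vms_per_host)
    clusters_per_block = math.ceil(hosts_per_block / hosts_per_cluster)
    pools_per_block = math.ceil(vms_per_vcenter / vms_per_pool)

    print(f"{building_blocks} building blocks, each with {hosts_per_block} hosts "
          f"in {clusters_per_block} clusters and {pools_per_block} pools")
    # -> 5 building blocks, each with 11 hosts in 2 clusters and 2 pools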


Citrix XenDesktop 5.6, PVS 6.1 and vSphere 5

When utilizing XenDesktop with a VMware vSphere backend infrastructure (managed by VMware vCenter) many scaling considerations are determined by the VMware environment.

General
  • Maximum 8 hosts per cluster (including 1 hot spare)
  • Maximum 8 hosts per cluster when using Citrix Machine Creation Services (including N+1) – the limit is caused by concurrent file access in conjunction with the VMFS file system
    • For NFS this limit does not apply (Citrix support statement pending)
  • Up to 16 VMs per CPU core (recommended, not a hard limit, as up to 25 are supported on vSphere)
  • Max 512 VMs per host (documented vSphere maximum)
  • Max 2,000 VMs per vCenter (primarily determined by the number of concurrent activities from XenDesktop components against the vCenter server)
Storage:
  • Typically the recommended maximum number is 64-128 XenDesktop Machine Creation Services (MCS) difference disks (or VMs) per VMFS datastore.
  • VAAI storage systems typically allow for numbers higher than 128 VMs
    • (the limit without VAAI is primarily driven by SCSI reservations on the VMFS file system during metadata updates – with VAAI enabled LUNs, ESX uses the atomic test and set (ATS) algorithm to lock the LUN, reducing the impact greatly)
    • Storage vendors have shown that an NFS based datastore (NFS datastores are not using VMFS) can support 256 and more linked clones (the actual theoretical maxima of VMFS and NFS for vSphere are higher).
  • Max 1,000 VMs per MCS base image (the equivalent of a View replica disk)
Management Server and Maximum Number of Connections:

The number of supported users depends heavily on the actual load generated by registrations or logins per minute, so the numbers below are only a high level guideline.

  • 1 XenDesktop Controller: Can handle 10,000+ users; the configuration listed in our reference architecture (virtual machine with 4 vCPU, 4GB RAM) should be able to handle 5,000+ users.
    Always use N+1 servers for redundancy.
  • Provisioning Server: A single virtual server (4 vCPU and 32GB RAM) will support approximately 1,000 users.
    Always use N+1 servers for redundancy.
  • License Server: A single Citrix License server (2 vCPU, 4GB RAM) can issue approximately 170 licenses per second, or over 300,000 licenses every 30 minutes.
    Because of this scalability, a single virtual license server with VM level HA can be implemented (if the license server is down, a grace period of 30 days is available).
  • A single Web Interface server has the capacity to handle more than 30,000 connections per hour.
    Two Web Interface servers should always be configured and load balanced through NetScaler to provide redundancy and balance load (NetScaler design is beyond the scope of this paper).

For environments with smaller numbers of users (e.g. <500; actual numbers will depend on user activity) the Web Interface service as well as the license server can reside on the XenDesktop controller instance rather than on a dedicated server.

Citrix NetScaler Access Gateway: Provides secure remote access and single sign-on capabilities (e.g. from outside the corporate network). A single Access Gateway can provide 10,000+ concurrent ICA connections and should be deployed in N+1 configurations.
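
As a quick worked example of the guideline numbers above, the following Python sketch estimates the management server counts for a given user population. Again, this is a rough illustration under the stated assumptions, not an official calculator:

    import math

    # Rough XenDesktop management sizing from the guideline numbers above
    users = 10000

    controllers = math.ceil(users / 5000) + 1       # ~5,000+ users per 4 vCPU/4GB VM, plus N+1
    pvs_servers = math.ceil(users / 1000) + 1       # ~1,000 users per PVS VM, plus N+1
    license_servers = 1                             # single VM with HA (30-day grace period)
    web_interface_servers = 2                       # always two, load balanced via NetScaler
    access_gateways = math.ceil(users / 10000) + 1  # 10,000+ ICA connections each, plus N+1

    print(f"{controllers} controllers, {pvs_servers} PVS servers, "
          f"{license_servers} license server, {web_interface_servers} Web Interface servers, "
          f"{access_gateways} Access Gateways")
    # -> 3 controllers, 11 PVS servers, 1 license server, 2 Web Interface servers, 2 Access Gateways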

 

Virtual Bridges VERDE 6.0

One of the aspects I really like about VERDE is the fact that its scaling limitations are actually far simpler to deal with, as the product inherently provides an architecture that allows for easy horizontal scaling.

Rather than having to use dedicated management servers (potentially with multiple UIs) each VERDE Server comes with an integrated connection broker, a hypervisor to run VDI sessions, and is managed from a single Management Console.

Servers can be clustered together using Virtual Bridges’ stateless cluster algorithm; in addition, the Distributed Connection Brokering architecture eliminates any single choke point and therefore increases the scalability and availability of the VDI solution.

  • Max 10,000 hosts per VERDE cluster
  • Max 1,000,000 VMs per VERDE cluster
  • Max 2,000 users per connection broker (typically each hypervisor host will also act as a broker, so the alignment of “users per server” will automatically take care of that)
  • Cloud Branch Servers are sized like regular clusters
Storage
  • Shared storage needs to be NFS based
    • Shared storage typically only contains the golden image and the VERDE persistent disks for the VERDE user profile management (plus the small cluster metadata) greatly reducing the shared storage requirements!

Yes, it can be that simple … ;)

PS I have included more details on creating building blocks and the surrounding design considerations in the upcoming Redbook.
Please always check the latest vendor documentation for official (and updated) numbers where available.

Andy



How to configure 3D and Aero for VMware View 5 – and what is the overhead of doing so? (IBM VDI RA Part 3)


As you probably know, View 5 (in conjunction with vSphere 5) introduced a software-based GPU function that gives users basic DirectX and OpenGL capability without the use of a physical GPU (unlike e.g. Citrix’s HDX 3D, which requires GPU passthrough). Typical target use cases include Aero and low-end 3D animations, not “high-end” 3D engineering applications.

One of the questions I’ve been regularly asked is “what amount of overhead does this create?”

As we have no physical GPU in this scenario to execute the graphics related work (3D rendering etc.), the (general purpose) system CPU will have to perform this task. As the CPU is the most likely bottleneck in the majority of VDI deployments, enabling 3D should have a direct impact on user density – but how much?

So I wanted to qualify and quantify the impact of enabling 3D as part of our reference architecture testing. Using LoginVSI we investigated the maximum user density (VSImax) when:

  1. Configuring 3D capability for all desktops in a View 5 pool without enabling an Aero theme (this configuration would give users the ability to run e.g. Google Earth but have no Aero animations configured)
  2. Configuring 3D capability AND enabling an Aero theme (with animated effects) for all users.

Both results will be compared to the baseline (PCoIP connections to the same pool without any 3D capability enabled).

As the steps to enable 3D capability are actually not well covered in the View documentation, I’ll give a step-by-step log of what we did … in case you are less interested in the “how” – here are the high-level results:

Conclusion:

  1. Enabling 3D capability for the pool and image causes essentially zero overhead. This means that you can enable 3D capabilities for a pool, even if you are not planning to use this capability in the near term, without impact on user density. That will allow users to run 3D applications like e.g. Google Earth if needed without you having to reconfigure the pool (these 3D applications will of course create additional load if used).
  2. Enabling Aero capabilities should be done with care. Only enable it for users that really require this level of user experience, as the impact on overall user density is significant.
    As enabling Aero is done on a user level, this can be done easily with the needed level of granularity.
As a side comment, I do like the View 5 capability to enable basic 3D functionality for users without a physical GPU, regardless of the (expected) overhead. I expect increased interest in the VDI/3D Graphics area over the next 12 months given e.g. the product announcements regarding physical GPU support with NVIDIA at VMworld 2011.
But we’ll also keep a close eye on Citrix which is currently arguably the market leader in the high-end space with HDX3D Pro capabilities. The recent NVIDIA announcements around the “Kepler” GPU with virtualization capabilities are very promising as this approach will overcome the crippling 1:1 (1 vm per physical GPU) relationship with traditional GPU passthrough.

If you want to understand how to enable 3D/Aero on a View 5 pool simply read on …

Pool capabilities without 3D

We have an existing View 5 pool with 80 virtual desktops. The LoginVSI benchmark measured that we can support up to 74 multimedia users (VSImax) on our server (dual socket, 12 cores) when connecting with PCoIP.

Let’s test first what happens if we try to run a 3D app in a non-3D enabled vm …

  • We connect with the View client (PCoIP) to our existing View pool (dedicated pool, linked clones, non-3D enabled)
  • We connect to the Internet and try to install Google Earth – it fails with:
  •  Applying any of the above suggested actions will not resolve the issue
  • Additionally when we try to configure Aero in the vm it fails (as expected).


The above confirms that you can neither run 3D (DirectX/OpenGL) applications nor enable Aero effects for users in a non-3D enabled pool.

Configuring 3D

Before you start, remember that you will need both vSphere 5 AND View 5 in order to configure 3D. With virtual hardware version 8, the VMware Tools install the 3D capable graphics adapter as the new default adapter, so ensure that you have the Tools updated. For this article we assume that you have already installed and configured View 5.

Preparing the Master Image for 3D

First step for enabling 3D is to prepare the master image (for the linked clones). The View user documentation is a bit loose here and seems to omit that you need to prepare the master image before configuring the pool settings.

Note: In order to prepare the image for e.g. Aero use, you will have to manually perform the following operations on the master image before creating the pool: perform the configuration changes and then take a snapshot for the linked clones.
As Windows was initially installed on a non-3D capable system, Aero is not enabled, nor has Windows established the User Experience Index (required) or attempted to enable the required Aero related services.
Therefore, if you only enable 3D for the View pool (without the following image changes) the updates required to enable Aero would be missing, and any changes you perform as a user in the individual VMs will be discarded on refresh (e.g. logoff). The result is that you would e.g. be able to run Google Earth (if installed) but not Aero.

  • We took a clone of our existing master image to have an independent image stream (snapshots) for 3D
  • Open a console to the cloned master VM (referred to as “3D Master” from now on)
  • (optional) Verify in Device Manager that the 3D video adapter has been configured (min. virtual HW v8 required and VMware Tools installed)

  • Enable 3D on the 3D Master virtual machine (> vm settings > tick box “enable 3D support”)

Important: Shut down or power off the VM after the reconfiguration – do not just “reset” the VM. (If you prefer to script this step, see the sketch below.)
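
The following pyVmomi sketch shows how this per-VM change could be scripted rather than clicked through the UI. It is a sketch only, assuming network access to vCenter and the pyVmomi library installed; the host, credentials and VM name are placeholders:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()  # lab only: skips certificate checks
    si = SmartConnect(host="vcenter.example.com", user="administrator",
                      pwd="password", sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "3D-Master")  # placeholder name

    # Find the VM's video card and set the "enable 3D support" flag on it
    video = next(d for d in vm.config.hardware.device
                 if isinstance(d, vim.vm.device.VirtualVideoCard))
    video.enable3DSupport = True
    spec = vim.vm.ConfigSpec(deviceChange=[vim.vm.device.VirtualDeviceSpec(
        operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=video)])
    vm.ReconfigVM_Task(spec)  # wait for the task to complete in vCenter
    Disconnect(si)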

  • We then tried to install Google Earth again – this time it installed OK and worked as expected!

  • At this point we took a snapshot “Google Earth without Aero enabled” which we will use later to test the supported number of users with LoginVSI.
Enabling Aero

We went on to prepare the image for Aero. As you might know from personal experience, enabling Aero (after installing Win 7 on a non-3D capable system) feels slightly random, but here are the steps that worked for us:

  • Most VDI optimization procedures will disable services required for Aero functionality (including the official VMware optimization script).
  • We therefore enabled the “Desktop Window Manager Session Manager” service (service name: UxSms) manually:

  • The easiest way to enable Aero is to go to > Control Panel > Troubleshooting > Display Aero Desktop Effects > and follow the wizard …
  • This time (as 3D was enabled in the vm settings) the wizard was able to fix most issues; even though it wrongly indicated that the Desktop Window Manager service was disabled, the following actions were successful.

  • Go to > System Properties > Advanced and apply “best appearance” – at this stage we still weren’t able to select advanced Aero options (e.g. Aero Peek)
  • The important step is to run the “Windows Experience Index” for the system to confirm the appropriate 3D capability

  • Confirm that the 3D capability was recognized
  • Now we can enable advanced settings like the Aero Peek appearance

  • At this stage we created a second snapshot “Aero Enabled”

We now have two snapshots of the same master image: one with Google Earth installed but no Aero enabled, and a second one with 3D and Aero capabilities enabled. This allows us to test the overhead of enabling general 3D capability as well as the delta of running Aero for all users.

Note: The Aero theme is a user setting, not a computer setting, so even with our second snapshot we will still have to enable an Aero theme for users of this image to get Aero working in the linked clone desktops.

Enable 3D Capability for the View 5 Pool (without enabling an Aero theme)
  • Create a new pool, or recompose the existing pool (edit the pool settings) with the desired snapshot.
  • Enable 3D as seen below; this setting will automatically enable 3D for all images of the pool (so you don’t have to edit the individual virtual machine settings)
  • For our test we selected the maximum amount of VRAM
  • You will see a “reconfigure virtual machine” task in vCenter – after the reconfiguration you can verify that the tick box “enable 3D support” has been set for all the virtual machines in the pool.

You are not done yet. In order for Aero to be effective, ensure that you have enabled Aero for the user (e.g. using GPOs as shown in the below screenshot). The default for the user is a non-Aero theme when installing Win 7 on a non-3D capable VM. So while you might assume that at this point you have enabled Aero for all users of the pool, a test will show you that users logging in will not have Aero capabilities.

Enable an Aero theme as shown for all users of the pool:

Determining the overhead (impact on user density)

We simply used the respective snapshot to recompose our View pool and ran another series of LoginVSI tests.

The first test was done using the 3D enabled snapshot but Aero was not configured nor an Aero theme enabled for the users:

As you can see from the above, the test shows that there is basically no overhead when enabling the 3D capability on the pool; the user number stayed easily within the 5% bracket that we allow for VSImax fluctuation when running tests (we took the averages of 3 tests).

  • We then recomposed the pool with the image that had 3D enabled and Aero configured, and enabled an Aero theme for all 80 users of the pool.
  • After the first test run we reduced the number of VMs to 60 in order to avoid a skewed result due to many idling virtual machines.
  • The result is shown below:

The Aero-enabled test clearly shows the overhead created by enabling the additional Aero graphics workload. The impact on the CPU of emulating the 3D functionality causes the number of supported users to drop by over 35%.

OK, hopefully that gave you some insight into a) how to enable 3D and Aero and b) what type of overhead you should expect when doing so. The actual overhead will of course depend on the individual 3D workloads or animations you decide to run in your virtual desktops.



IBM VDI Reference Architecture – Shared or Local Storage for your VDI deployment? (IBM VDI RA Part 2)


Link to Introduction: A first peek into the new IBM VDI Reference Architectures (IBM VDI RA Part 1)

It is well understood that a key inhibitor to VDI solutions (amongst general complexity, technical limitations and migration effort) is the upfront capital cost. As I’ve been leading this project architecturally I want to elaborate a little on the importance of our storage design approach.
Just this month, two of my (larger, financial) customers estimated 40% and 50% respectively of the projected VDI project CAPEX cost to be related to enterprise storage, primarily due to the specific IOPS requirements of VDI.
So let’s have a closer look at the storage approach for our RA … As you can tell from the above, our desired (but of course not only) use case will be the pooled “non-persistent/stateless” desktop that enables users to connect to a new/different desktop image every time they log in while keeping aspects of the user experience persistent (profiles).
This allows the usage of local storage instead of shared storage, as no user-associated data will reside persistently in the image; in (the unlikely) case of host failures, users can simply reconnect to a desktop hosted on another system without the need for e.g. VMware HA.
I will not discuss each of the architectural approaches in detail again here (e.g. persistent v non-persistent, positioning VDI v SBC etc.) and I ask for forgiveness for brushing over important alternatives discussed in previous articles on this blog.

Importance of in-depth performance analysis

As stated in the overview, the key design principle of our RA approach is to radically reduce the cost of VDI by utilizing local SSD storage instead of shared storage where possible.
Without going into great detail (see the final publication for details), I want to assure you that we have performed extensive analysis, particularly of the storage related aspects, in order to validate this approach. I ensured we measured and documented all IOPS performance aspects and monitored latency on all storage components. The collected data not only validates the local SSD architecture but also gives us unmatched insight into the IOPS distribution, and allows us to create sizing models for local and shared storage approaches which we will feed into new sizing tools. The above example illustrates the detail of the storage related data collected for every test (IOPS and latency measurements of a single test on each storage tier).

So, local instead of shared storage for VDI …

This approach is not new but unfortunately still not promoted widely enough.
Why?
Ok, allow me to be slightly controversial … review the majority of VDI reference architecture documents out there yourself and you’ll see that they are primarily created with/by major storage vendors … now, would it be in your interest to promote local storage if your goal is to sell enterprise SAN/NAS? … I’ll let you be the judge …
And yes you could argue “what about you IBM – you are a storage vendor, no?” – let me say that common sense does sometimes prevail  ;)

So why have we decided to make the ‘local storage’ architecture the core of our approach?

  • To gain maximum return on investment, non-persistent/stateless virtual desktops should be the default approach in any VDI deployment (reduce storage requirements, minimise size & number of images, enable pooled images etc.). To be blunt – if one argues “I can’t do it with non-persistent, I need all to be dedicated desktops” then VDI is most likely the inappropriate approach anyway (e.g. high-end user requirements across the board) or the capability of “stateless” is misunderstood.
  • Cost: Shifting the IOPS load from shared storage to local storage allows you to significantly reduce cost – get a quote from any storage vendor for the same capacity/performance configured on their enterprise SAN/NAS and compare it with the equivalent local storage cost, and you will get a feel for the massive delta (a rough worked example follows this list)!
  • Building blocks (servers) with local storage allow for simple linear capacity scaling – add another system and you will get a linear capacity gain – no complexity in estimating the impact on shared storage.
  • So please approach VDI with non-persistent and treat persistent as the “exception”! Local storage goes architecturally hand-in-hand with stateless desktops.
    Of course reality is that there will be “exceptions” in most deployments and we will absolutely cater for those, but let’s be clear: a deployment with 100% persistent desktops has little chance of (financial) success.
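
To illustrate the cost argument with numbers, here is a back-of-envelope sketch in Python. The prices and IOPS figures are purely illustrative placeholders – substitute real quotes and your own measured workload:

    # Back-of-envelope comparison with PURELY ILLUSTRATIVE numbers - use real quotes.
    # Ignores per-host SSD distribution, redundancy and capacity; IOPS-driven only.
    users = 1000
    iops_per_user = 15           # assumed steady-state IOPS per desktop (mostly writes)
    san_cost_per_iops = 3.0      # assumption: $ per IOPS on an enterprise SAN/NAS
    ssd_cost = 800               # assumption: $ per local SSD
    iops_per_ssd = 20000         # assumption: sustained IOPS per SSD

    required_iops = users * iops_per_user
    san_total = required_iops * san_cost_per_iops
    ssds_needed = -(-required_iops // iops_per_ssd)   # ceiling division
    print(f"SAN: ${san_total:,.0f} vs local SSD: ${ssds_needed * ssd_cost:,.0f}")
    # e.g. SAN: $45,000 vs local SSD: $800 - this delta drives the architecture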

So what about those “exceptions”?

We all know that in most deployments you will be asked to provide persistent desktops. I’m sure I made clear that you should validate any “demands” for persistent desktops – do NOT assume that the requestor has already done that! However, if persistent desktops are indeed required, our architecture will provide a hybrid of persistent and non-persistent desktops with the same building blocks (local SSD removed) simply through the addition of external storage – win/win.

One more comment – you will probably be familiar with interesting 3rd party ‘SAN caching/optimization’ appliances like Atlantis’ ILIO (with their latest diskless feature) that try to address the storage cost issue. We have tested Atlantis in the past and seen very efficient offload, so I am absolutely not discounting solutions like that – there is a place for them, specifically if primarily persistent desktops are required, and we have worked with e.g. Atlantis in the past to provide ILIO based solutions.
So why have we not included SAN caching appliances (at least at this stage)?

  • I’m a believer in simplicity – most VDI solutions today are clearly already too complex and non-integrated.
    Introducing an additional layer (of 3rd party components) should only be done if the return absolutely justifies it.
    From my experience, the additional (licensing) cost and the additional support layer make the simple local SSD storage approach the preferred model where appropriate.
    Cost for SSDs decreases rapidly while capacity and durability go up – SSDs are arguably becoming a primary storage technology.
  • Most major VDI vendors are increasingly integrating caching algorithms; you will be familiar with e.g. XenServer’s IntelliCache, VMware’s announced Storage Accelerator (CBRC) and Verde’s Storage Optimiser.
    No, they are functionally not identical to e.g. ILIO (too big a topic to go into detail) – they address the issue in varying ways and primarily only the “read” IOPS – which is great for ‘boot storms’ but less so for the majority of “working state” IOPS (which are writes). However they are/will be vendor-integrated, provide at least a subset of the functionality natively as part of the product, are fully compatible with our local SSD approach, and I personally prefer to have one throat to choke in case of any issues.

I suppose the summary of this is that I have yet to see a SAN or SAN optimization appliance based building block that will flat out beat “local SSD” on price and simplicity for non-persistent desktops …
Again, let me be clear, there is no “one-size-fits-all” approach and I am by no means implying that there won’t be cases where primarily persistent desktops, SAN + SAN optimization appliances or of course Terminal Services like solutions are appropriate (or in TS’s case potentially even more appropriate) – I have made my view absolutely clear on this before.

“So what about shared storage then, are you telling me I can get away without it completely?”

Ehhm, I’d be a fool to claim that … so let me be clear. There are types of data, even in a non-persistent desktop environment, that you need to keep available from any host/desktop and that therefore need to be hosted on shared storage … primarily the bits that give the user the feeling of persistency, i.e. the user profile (desktop settings etc.) and any persistent user data (stored documents etc.).
This is nothing new and has been achieved through various methods like roaming profiles, folder redirection etc. for ages, and is increasingly enhanced through product features like VMware Persona Management, Citrix personal vDisk and Microsoft’s UE-V.
Bottom line is that you will need some shared storage …

Our architectural approach on this is clear and should (hopefully) make you happy ..

  • As explained above – we absolutely minimize shared storage requirements by placing the heaviest load on local SSD and only use shared storage for persistent user data and profiles (these are typically already on shared storage for physical desktop environments – so most likely no additional investment at all)
  • We understand that most of you already have a preferred storage platform – continue to use your own if you want to – our building block systems provide IP, FCoE and iSCSI based storage capabilities.
  • In order to further minimize shared storage cost and allow maximum flexibility we suggest a file based (not block based) storage system (NFS or CIFS) – again, this will depend on your environment

In the next post we’ll move on to share more of the preliminary results – continuing with “enabling 3D and Aero capabilities in View 5 – impact on user density and step-by step instructions” – coming soon …



The New IBM VDI Reference Architectures – VMware View, Citrix XenDesktop and Virtual Bridges Verde (IBM VDI RA Part 1)


As promised before, I want to start sharing some of our experiences with our on-going (IBM) VDI reference architecture (RA) work.
I’ve discussed VDI patterns and inhibitors (based on real-world client engagements) in various previous articles so I’ll cut to the chase …

What have we been doing?
We have set up three industry-leading VDI solutions in our labs (US and UK) and are performing architecture verification and LoginVSI performance tests on different sets of IBM hardware: IBM blades, IBM rack systems and the recently announced IBM PureSystems (which have great potential to become the ideal platform for VDI – I’ll go into more detail in another post).

A key design principle of our Reference Architecture is to address the arguably most common inhibitor to VDI – storage cost. Our approach will radically reduce the storage requirements for your VDI deployment by favouring local SSD storage instead of large-scale shared storage arrays.

We have been working closely with each individual vendor during the project and I’m grateful for their help (since our initial architectural workshop with the vendors last year).
It is also important to point out that the purpose of this document is NOT to compare the vendor solutions AGAINST each other but to demonstrate the suitability of our architectural approach on IBM hardware for each solution.

A key objective for the project is to determine the supported user density for individual workloads in order to create scalable building blocks and sizing models – essentially make sure our approach works, scales and gets you to the best price point.

The three VDI solutions implemented and tested are:

  • VMware View 5 (with a 5.1 update to follow) with ESX 5
  • Citrix XD 5.6 (ESX 5 host – optional Hyper-V hosts)
  • Virtual Bridges Verde 5.5 (KVM based hypervisor with “Storage Optimizer”)
That’s not all though – since I have the privilege of leading this effort architecturally, I also wanted to provide additional value for the VDI enthusiasts – or indeed you sceptics ;) – by investigating specific aspects of performance and user density.
I am frequently asked “will using PCoIP instead of RDP create any overhead?” or “does filling up memory in my server help or harm user density?” …

So what is the impact on user density (and therefore on the all-important metric of cost/user) when:

  • Connecting with PCoIP (instead of RDP)
  • Connecting with SPICE (instead of RDP)
  • Enabling 3D capabilities on a View 5 desktop pool and image
  • Configuring an Aero theme for users (in a 3D enabled View 5 pool)
  • Running with decreased memory frequency (when using larger memory configurations)
  • Enabling View 5 Persona Management (compared to Microsoft roaming profiles and local profiles on persistent disks)

So we set out to determine the following values (example View 5 / dual socket Intel Westmere 6-core / 192GB RAM):  

You can see that I’ve already listed some of the results as “teasers”. Pay attention, for instance, to the decrease in supported users when using PCoIP instead of RDP for all users. 20% fewer users is significant, but of course PCoIP provides e.g. advanced graphics and redirection capabilities. Also, the results are measured using the default settings (e.g. BTL enabled etc.) so tuning is absolutely possible.

We’ll discuss and share the other results in the following posts. Of course all the results (and more) will also be officially published (IBM Redpaper) as they become available (starting with VMware View); I will post the link(s) to any documents here as well.

Throughout our testing we have performed extensive performance analysis on all aspects (user density, IOPS, latency, network etc) and the findings will be fed into new VDI sizing tools that we’ll make available in due course.

So … a VDI approach that will allow you to reduce your cost per user, allow you to add scalable building blocks as you grow and all bundled with first-hand technical insight and sizing guides based on our testing … sounds interesting? If yes, then feel free to check out the following posts and upcoming publications …

PS A massive credit to the extended IBM team and the VDI vendor teams for their help with this project!



Tip – Enable IBM SmartCloud Entry for email notifications (to e.g. Gmail)


Why would I want to do this?

Just a quick tip, as we’ve received a few questions on enabling email notifications for IBM SmartCloud Entry (Starter Kit for Cloud).

Possible Symptom: SCE fails to send email notifications to Gmail (or other mail systems requiring SSL based authentication)

Background:

IBM SmartCloud Entry can notify users about important activities on the cloud (e.g. new project created, virtual machine deployed, request approved etc.). Google (and other common email systems) require SSL based authentication.
IBM SmartCloud Entry currently uses non-SSL based email for its email notifications. You can use a relay server to forward non-SSL based mail (generated by SCE) to Gmail or other external mail systems by following this article.

In the process we will install a free Windows based mail server that will establish an SSL mail “proxy” relationship to Gmail for you. Gmail will then send mail to any Gmail user directly or relay it to other mail systems.

Pre-requisites:

  • Note: You will need a valid Gmail account that is used to authenticate any mail requests (we suggest creating a dedicated mail account)
  • The hMailServer needs to be installed on a computer (or VM) that has access to the SmartCloud Entry environment as well as the Internet

1 – Install hMailServer as Email Relay Server

You can use other products with a similar function – the purpose of this article is not to endorse the product but rather to explain how to enable the function for IBM SmartCloud Entry.

  • Download hMailServer from http://www.hmailserver.com/
  • Accept license agreement
  • Specify installation destination
  • Select to use the integrated DB (or specify an external one)
  • Select full install

  • Start the hmail admin console (enter the password specified above)
  • Add a name for your local domain, e.g. local.yourdomain.com, and save it
  • Go to > Settings > Protocols > SMTP > Delivery of e-mail:
    • Local host name: enter “localhost” or full host name (irrelevant)
    • Remote host name: “smtp.gmail.com”
    • Remote TCP/IP port: “465”
    • Important: Server requires authentication: yes (checked)
    • User name = user@gmail.com (enter a valid gmail account that will be used to authenticate the relay requests)
    • Password: enter your gmail account password
    •  Important: Use SSL: Yes (checked)

    • Click “save”
  • Go to > Settings > Advanced > IP Ranges > Internet
    • Lower IP – Upper IP = leave as is (all access)
    • Other >
      • Anti-Spam: No (Cleared)
      • Anti-Virus: No (Cleared)
    • Require SMTP Authentication:
      • Local to local e-mail addresses: No (Cleared)
      • Local to external e-mail addresses: No (Cleared)
      • External to local e-mail addresses: No (Cleared)
      • External to external e-mail addresses: No (Cleared)

    • Save
    • Exit

Please note that in order to properly secure your mail server in production environments you should limit the scope of IP addresses, the type of mail traffic and enable spam and anti-virus functionality.

 

2 – Configure SKC to use the hMailServer Relay Server

We have configured the mail relay server, now we need to point SCE to forward mail to it (instead of directly attempting to send it to Gmail)

  • Locate the email.properties file on the SKC system (user directory) as seen below
  • Change the IP address of the relay host to that of the system where you installed hMailServer (this can be the SKC system itself – not a statement of official support though)
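
For orientation, the change amounts to pointing the SMTP host entry at the relay. The exact property names depend on your SCE version, so treat the snippet below purely as an illustration (the keys shown follow common JavaMail-style conventions and the IP is a placeholder):

    # email.properties (illustrative key names - verify against your SCE version)
    # Point SCE at the hMailServer relay; the relay handles the SSL hop to Gmail
    mail.smtp.host=192.168.1.50
    mail.smtp.port=25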

 

3 – Enable the user to receive email notifications and test the setup

In SCE, ensure that the user has email notifications enabled as seen below.

  • To test, add the user (ensure that the email address specified here is a valid external email address – it does not have to be a Gmail account)
  • Tick the “Send notifications …” box
  • Save the new user and verify that an email has been sent to the address specified.
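
Independently of SCE, you can sanity-check the relay itself with a few lines of Python (assuming the relay accepts unauthenticated SMTP on port 25 from your IP range, as configured above; the addresses and IP are placeholders):

    # Quick end-to-end test of the hMailServer relay path (placeholders used)
    import smtplib
    from email.mime.text import MIMEText

    msg = MIMEText("Test message via the hMailServer relay")
    msg["Subject"] = "SCE relay test"
    msg["From"] = "sce@local.yourdomain.com"  # local domain created earlier
    msg["To"] = "you@example.com"             # any valid external address

    with smtplib.SMTP("192.168.1.50", 25) as smtp:  # IP of the relay server
        smtp.sendmail(msg["From"], [msg["To"]], msg.as_string())
    print("Accepted by the relay - check the destination mailbox and hMailServer logs")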

Tip: In case of problems you can enable logging on the hMailServer as shown below



From Virtualization Management to Private Cloud – Lab Test Log: Upgrading Hyper-V cluster on IBM BladeCenter to SCVMM 2012


IMHO the only way to provide relevant coverage of vendor capabilities on Virtualizationmatrix.com is through hands-on experience with the products in client projects or lab tests – finding the time to always document this in detail is however a challenge.
So initially this “test log” was not intended to be a blog post, but I decided to publish it after running a few colleagues through my experience with VMM 2012 and they asked me to share it with others.

So here it goes …

Scenario:

Migrate one of our Hyper-V lab clusters from VMM 2008 R2 to VMM 2012 and evolve the managed virtualization environment into a private cloud that facilitates controlled Self-Service access for visitors (remote demo users and developers).

  • Upgrade the VMM management of our existing 2-node Hyper-V cluster “HypVCluster” to VMM 2012 (RC) by installing a NEW instance of VMM (not an upgrade of the existing instance)
  • The original HypVCluster consists of 2 x W2K8R2 hosts “Hyperv1” and “Hyperv2” on IBM HS21XM blades in an IBM BladeCenter-H chassis connected to IBM Nseries 7800 storage (NetApp FAS6070)
Architectural Comments:
  • The VMM appliance (downloadable vhd) will be used. (It contains the following components: eval of Windows Server 2008 R2 Standard SP1, eval of SQL Server 2008 R2, eval of VMM for System Center 2012, Windows Automated Installation Kit (AIK) for Windows 7, Microsoft Storage Management Service, Web Deployment Tool.)
  • The new SCVMM instance will reside in a highly available virtual machine hosted on this Hyper-V cluster. The new high availability feature for SCVMM (where you can install SCVMM as a “cluster aware application” on each node) will not be used (no “hard” technical reason, mainly to keep VMM “portable” as a vhd in our test environment).
  • As the VMM appliance is based on an evaluation version of W2K8R2 Standard and an evaluation version of SQL 2008 R2, we will activate Windows to make it permanent and use an existing (full) version of SQL 2008 R2 for the DB (installed on the old SCVMM server: SCVMMR2.eebc.dom) to avoid any expirations.

Preparation:

  • Downloads: Download VMM 2012 appliance from HERE
  • Accounts: Even if you normally may not bother with dedicated accounts in test environments – DO create a dedicated VMM “service account” in your domain (do not use your “default” admin account as various steps will check that you don’t use it and hard stops and problems can occur if you do – more details are covered in the VMM documentation http://technet.microsoft.com/en-us/library/gg697600.aspx)
    • Ensure to make the VMM service domain account a member of local admins group on your new VMM Server (vhd)
    • If you use an external SQL server, ensure you have an authorized account for the DB creation
    • Add the VMM service account (using SQL studio) as e.g. “sysadmin” (otherwise DB creation during setup will fail)
Get Started:
  • Extract the downloaded vhd to the existing SCVMM 2008 library
  • Create virtual machine using the extracted vhd file
  • Start virtual machine, run through the initial Windows setup, adjust domain and network settings to integrate the vm into your environment.
  • Start VMM setup by clicking the existing icon on the desktop of the VMM server
  • Ensure to specify the created “VMM service account” (I’d suggest adding it as a “run as” account in VMM to allow for easy future re-use)
  • Specify the SQL instance you want to use (in our case “SCVMMR2”, but in your case it could be the bundled evaluation version)
  • Check that the setup finished without any issues – you will now have the VMM start icon on your desktop – start the VMM console.
  • Get a feel for the GUI, explore the new Office style “ribbon” menu but do not start adding hosts or clusters yet

Configure the Fabric

Before we upgrade and add our hosts we will prepare the “fabric” environment we want to add the hosts to.
Fabric is the new collective term for servers, network and storage managed by VMM.

Host Groups:

Host groups are hierarchical folder structures to group virtual machine hosts in meaningful ways, often based on physical site location and resource allocation.

  • From the Fabric pane start off by creating a hierarchical folder structure that reflects your logical infrastructure layout (e.g. by datacentre locations and sub-locations).
  • Review the available properties you can set on the host group level, like “Placement Rules”, “Network” and “Storage” allocations
    Note: Resources typically need to be allocated on the host group level before they can be assigned to hosts and clusters so familiarize yourself with these options (right-click on the host group and select properties)
Library:

If unfamiliar with the VMM library, simply think of it as a repository for any resources you might need to access, such as virtual hard disks, virtual floppy disks, ISO images and application packages (new in SCVMM 2012), as well as virtual machine and service templates and profiles that reside in the VMM database.
The library in VMM 2012 has been enhanced to support additional resource types and cloud structures. You can also store driver files that are used during the bare-metal deployment of a Hyper-V host and custom resources that would normally not be recognized as VMM resources (such as scripts).

  • Explore the library view of the default library (installed during setup), including the templates and profiles as well as “equivalent objects” (you can now mark identical objects as equivalent – that allows VMM to have multiple sources for the same object in order to decrease dependency on physical resources and consider locality)
    Note: VMM has no integrated mechanism to synchronise changes to the equivalent objects (you need to ensure e.g. replication or manual copy after changes)
  • Optional: Install a second library server (just to test some of the new functions like marking objects as equivalent for location sensitive deployments) – we installed a second library server: “w2k8r2-trial-0.eebc.dom”
    Note: If you plan to implement a private cloud I strongly suggest reviewing the topic “Implementing a private cloud” in the VMM documentation before creating your final library structure, as organisations might require dedicated library resources (to ensure access to THEIR resources)

Observed problem: Adding the second library server did not reliably add all the default resources (it skipped some of the default categories); we circumvented this by manually copying the remaining files from one library to the new one.

Comment: There is no integrated functionality to view/monitor capacity information on the library servers (i.e. to avoid running out of space or to make decisions on where to store images).

Set up your Logical Networks
  • Review the global network settings -> Settings Pane -> General -> Network Settings (they e.g. determine the behaviour of default creation and association of virtual and logical networks when none exist on certain components)
  • Create logical networks – think of them as descriptions of your external network structure. This is a new feature; previously you could only create virtual networks (i.e. virtual switches on the hosts).

The structure is obviously entirely depending on your environment. I started out creating a new set of logical networks:

  • EEBC Lab Network (DHCP on 192.168.x.x)
  • Test Network for a static IP pool (managed by VMM): 192.168.90.1-10 (with 192.168.90.9 and 192.168.90.10 reserved as VIPs for a load balancer for future use)

Note: We will later verify the assignment of the logical networks to hosts by mapping them to the physical adapters (once the hosts have been added). As part of logical network creation, you can create network sites to define the VLANs, IP subnets that are associated with the logical network in each physical location.
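For those who prefer scripting, here is a hedged sketch of the equivalent cmdlet flow for the static-pool network above (the names, host group and VLAN 0 are assumptions from this lab; check Get-Help for the exact parameter sets in your build):

    # Create the logical network and a network site scoped to a host group
    $ln     = New-SCLogicalNetwork -Name "Test Network"
    $hg     = Get-SCVMHostGroup -Name "ATS Lab"
    $subnet = New-SCSubnetVLan -Subnet "192.168.90.0/24" -VLanID 0
    $site   = New-SCLogicalNetworkDefinition -Name "Test Network Site" `
                  -LogicalNetwork $ln -VMHostGroup $hg -SubnetVLan $subnet

    # VMM-managed static IP pool, with .9/.10 reserved as VIPs for a load balancer
    New-SCStaticIPAddressPool -Name "Test Pool" -LogicalNetworkDefinition $site `
        -Subnet "192.168.90.0/24" `
        -IPAddressRangeStart "192.168.90.1" -IPAddressRangeEnd "192.168.90.10" `
        -VIPAddressSet "192.168.90.9-192.168.90.10"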

Comments:

  • Reviewing the structure of the logical networks and their relationship to the hierarchical structure (e.g. host groups) is not easy; there is no good view in VMM to get an “at a glance” picture of the architectural network structure.
  • Try to remember the difference between logical and virtual networks. VIRTUAL networks are “virtual switches” on the host, providing connections to vms (typically vms to physical host NICs, or private inter-vm connectivity), while logical networks are descriptions of external networks (“switches” connecting the host NICs to external networks) with properties like DHCP vs. IP pools, VLANs etc …
Setting up Storage

I will deploy the new SMI-S based storage management that allows the admin to perform common storage activities from the VMM GUI (e.g. create LUNs, assign storage etc) and also allows VMM to offload certain storage functions directly to the storage array (if you are familiar with vSphere then this is a similar – but not identical – approach to VMware’s VAAI/VASA).

Note: You need a supported storage array to integrate VMM via SMI-S. You can of course still use standard storage with non-SMI-S based storage allocation, but you won’t be able to manage it through VMM.

Comment: You can work with existing disk resources or create new ones. Depending on the storage you might have to perform some actions using the native array GUI.

  • In our case I created a new “SCVMM” aggregate on the Nseries using the array GUI
  • Downloaded and installed the SMI-S provider – in our case I installed the Nseries (Netapp) SMI-S provider on the SCVMM server (it could obviously be on another server)
  • Added the hostname of the SMI-S provider system to Providers under Storage (without SSL); the array was discovered OK and the new aggregate “SCVMM” was listed

Observation: Please note that there can be delays in updating the array status in VMM after VMM-driven configuration updates; ensure you “refresh” before performing new actions if problems occur (Fabric Pane -> Storage -> Providers).

  • Select the disk resources you want VMM to manage; I selected the “SCVMM” aggregate to be managed by VMM
  • I also tested the array interaction by creating and deleting a test LUN through SCVMM and verified the activities through the Nseries GUI – all successful.
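The provider registration can also be sketched in PowerShell (the Run As account and provider hostname are placeholders, and parameter names may differ slightly between builds, so verify with Get-Help Add-SCStorageProvider):

    # Run As account holding the SMI-S provider credentials
    $ra = Get-SCRunAsAccount -Name "StorageAdmin"

    # Register the SMI-S provider (no SSL in our case)
    Add-SCStorageProvider -Name "Nseries SMI-S" -ComputerName "smis-host.eebc.dom" -RunAsAccount $ra

    # Refresh the provider and list the discovered pools (aggregates)
    Get-SCStorageProvider | Read-SCStorageProvider
    Get-SCStoragePool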

OK, we have prepared the fabric environment: we now have our hierarchical folder structure, have added library servers to store images, created logical networks and prepared the storage.
Let’s add the hosts.

Adding Host Resources

Preparing Hosts
  • If you have not already done so, add the storage multi-path (MPIO) feature to each host before adding the host/cluster to VMM; MPIO will then be configured automatically when the hosts are added to VMM (a scripted sketch follows after this list).
  • As the IBM blades are configured with Broadcom NICs I installed and configured the BASP failover driver with the defaults.
    Note: If the hosts are already configured for Hyper-V (as in our case) you will have to un-associate the NIC from the Hyper-V virtual switch, as the BASP installation will otherwise fail with the following error: “The selected Adapter is bound to Hyper-V Virtual Network …”

Re-associated the virtual network (Hyper-V switch) with the team (rather than a physical adapter) as shown

Note: If you receive warning 26179 when adding hosts/clusters (“Couldn’t enable multi-path i/o for known storage arrays xxx”) you have either not configured or incorrectly configured multi-path on the hosts before adding them. VMM will attempt to configure MPIO when adding hosts. Correct the MPIO settings before continuing.

  • If the hosts were part of a SCVMM 2008 cluster remove the existing SCVMM agents from the host before adding the host to the new SCVMM instance
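The MPIO feature itself can be pushed to all hosts in one go from any management box – a minimal sketch using PowerShell remoting (host names are examples; the hosts need WinRM enabled and will typically want a reboot afterwards):

    # Enable the MPIO feature on each Hyper-V host before adding it to VMM
    $nodes = "hv-node1", "hv-node2"   # hypothetical host names
    Invoke-Command -ComputerName $nodes -ScriptBlock {
        Import-Module ServerManager
        Add-WindowsFeature Multipath-IO
    }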
Add Hosts/Cluster
  • From the Fabric Pane, select the appropriate host group and add the cluster (you can specify a single cluster node and it will pick up the existing cluster) – a scripted sketch follows below.
  • Your cluster should now be imported into the new VMM instance and any existing virtual machines should be visible and operational.
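Scripted, the import is essentially one cmdlet plus lookups – a sketch with placeholder names (the Run As account must have admin rights on the cluster nodes):

    $ra = Get-SCRunAsAccount -Name "HostAdmin"
    $hg = Get-SCVMHostGroup -Name "ATS Lab"

    # Point VMM at the cluster (or any one node); the full cluster is discovered
    Add-SCVMHostCluster -Name "hv-cluster1.eebc.dom" -VMHostGroup $hg -Credential $ra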
Adding Storage to the Cluster

The storage allocation to hosts can be slightly confusing to the new user.

  1. Be sure that you can see the storage array, and any resources on the array you want to use, from VMM
  2. You first need to select to manage the relevant “storage pools” (aggregates in our case): Storage -> Arrays, select your array -> properties, select the storage pool(s).
     As part of this you should create storage classifications to describe the properties (e.g. if you have different storage tiers)
  3. Then “allocate” storage to a host group (a folder containing hosts or clusters): properties -> storage
     You can allocate existing storage pools, existing (unmapped) LUNs or create new LUNs (from free space on an existing pool) and allocate them (a PowerShell sketch of this flow follows after this list)
  4. Then (and only then) can you “assign” storage to the cluster: select the cluster -> properties
     Note: you can add (assign) LUNs as “available storage” (think “normal” LUN) or “shared volumes” (think “cluster shared volumes”) – for what it’s worth – I like neither the naming convention here nor the way of allocation
  5. Feel free to convert between CSV and “normal” LUNs – I selected all shared storage as CSV for the obvious advantages (there aren’t many reasons why you’d want “normal” LUNs in a cluster scenario)
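A hedged PowerShell sketch of the classification / LUN-creation / assignment flow described above (pool, LUN and cluster names are examples; the pool must already be under VMM management and allocated to the host group):

    # Classify the pool and create a new 100GB LUN in it
    New-SCStorageClassification -Name "Gold" -Description "30-spindle aggregate"
    $pool = Get-SCStoragePool -Name "SCVMM"
    $lun  = New-SCStorageLogicalUnit -StoragePool $pool -Name "CSV2" -DiskSizeMB 102400

    # Register (assign) the LUN to the cluster
    $cluster = Get-SCVMHostCluster -Name "hv-cluster1.eebc.dom"
    Register-SCStorageLogicalUnit -StorageLogicalUnit $lun -VMHostCluster $cluster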

In our case we had two existing cluster LUNs (Quorum + 1x CSV) on an existing aggregate (30 spindles) from the initial SCVMM 2008 R2 managed cluster, and as mentioned above I added an aggregate “SCVMM” with 2 additional LUNs (5 spindles) on the Nseries.

  • We then selected both aggregates (original + new) to be managed by VMM and created 2 VMM storage classifications to reflect the performance differences (spindles) as shown below

Comment: There is no feature to exclude LUNs of a managed storage pool from management (in our case we added an aggregate that also contains LUNs not used for the VMM environment). This distorts the capacity information (as unrelated LUNs are included) and introduces potential admin errors (e.g. one can delete unrelated LUNs).

Observation: I ensured that the managed disk pool was allocated to the host group, but any attempt to add the new storage pool (or LUNs within the pool) to the cluster failed with error 26184: “The Storage Group existing for xxx doesn’t match storage group setting at array xxx”

Resolution: As VMM will create relevant LUN-to-host mappings at this point, any existing conflicting configurations may cause problems. Use the native array GUI to remove invalid old mappings for the HBAs/hosts (in our case in the “initiator” section of the Nseries GUI). After deleting invalid old mappings the process worked.

  • As expected, after fixing the “ghost mappings” the assignment of available storage automatically created the respective LUN-to-host (initiator) mappings on the Nseries storage.

  • When the storage was assigned to the cluster, the corresponding cluster resources were also created automatically (as seen in Failover Cluster Manager)

  • I then converted the volumes to CSVs – no problems – the CSVs were created automatically and made available to the cluster nodes.
Verify Host Network Config
  • Again, verify that MPIO is configured correctly: Admin Tools -> MPIO; if you added MPIO before adding the hosts (or configured MPIO manually and correctly) you should see something like the below
  • Perform the association of the logical networks with the hosts: Select host -> properties -> Hardware -> select NIC -> check that the relevant logical networks are connected

Observation: Adding an additional host to an existing cluster fails with error 25343: “No network adapter found on host xxx that matches cluster virtual network xxx”. The error refers to a mismatch with the VIRTUAL network; however, the recommended action points out that you should set the LOGICAL network on the NIC.

 Therefore do NOT just try to create a matching VIRTUAL network like below on the host:

Instead, as described above, select the host before adding it to the cluster -> properties -> Hardware -> NIC and ensure that the associated logical network is connected correctly.

Configuring Dynamic Optimization and Power Optimization

Dynamic Optimisation for Hyper-V (again, if you are familiar with vSphere think “DRS”) is now very easy to set up. Forget the extremely awkward SCOM/PRO dependency for even basic optimization in SCVMM 2008.

  • In the properties for the host group containing the cluster, enable Dynamic Optimization with the appropriate settings – literally nothing else is required at this stage …
  • 10 minutes later the first “optimisation” took place:

Note: Power optimization requires direct out-of-band BMC access for IPMI control (i.e. try to ping the BMC IP address from the VMM server). Since the BladeCenter chassis uses central management of the blades through its management module, it will not work on this setup.

VMM Updates (WSUS)

VMM now supports compliance scanning and remediation of the fabric servers (again, think “VMware Update Manager” in vSphere). VMM supports orchestrated updates of Hyper-V host clusters (VMM places one cluster node at a time in maintenance mode and then installs updates) while vms are live migrated. If the cluster does not support live migration, VMM saves state for the virtual machines.

We will install a dedicated WSUS server for VMM (installed on the VMM server itself). You can also use an existing WSUS server in conjunction with SCCM.

  • Downloaded WSUS (Windows Server Update Services 3.0 SP2)
  • Installed the prerequisites
  • Installed WSUS with the following options:
    • Full server installation including Administration Console
    • Create a Windows Server Update Services SP2 Web site
    • Selected the relevant settings regarding updates (limited languages and relevant W2k8R2 updates only)
  • The WSUS console showed that the initial sync was successful

  • Added the WSUS server to the VMM server (Fabric -> Add Resources) – port 8530 – no problems
  • Reviewed the default baselines and created a new test baseline – added the critical and security baselines to the “all hosts” host group

Comment: There seems to be no intuitive method of filtering/selecting updates at this stage, and the baselines are not continuously maintained (e.g. if you sorted all updates by “critical” and created an “all critical updates” baseline, critical updates released in the future are not automatically added to this baseline).

  • Scanned all hosts for compliance:

Remediated the non-compliant server (if I had had a non-compliant cluster, remediation would have put the hosts into maintenance mode in round-robin fashion before applying updates) – this is what you should see after the remediation:

Comments: This all works and is straightforward, but …

  • There is no integrated WSUS synchronization schedule (to download new updates) – only “on-demand” (a marketing term for “manual”)
  • There are no dynamic updates of baselines to include new updates (i.e. by category, e.g. “all critical”)

So in order to stay updated one needs to (see the sketch below):

1) Manually sync the WSUS server (to download new updates)

2) Manually update the baselines to include the new (synced) updates
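Both manual steps can at least be scripted and scheduled – a sketch along these lines (the baseline name is an example, and the classification filter is from memory, so verify the property name on your build):

    # 1) Kick off a WSUS synchronization from VMM
    $wsus = Get-SCUpdateServer
    Start-SCUpdateServerSynchronization -UpdateServer $wsus

    # 2) Add any newly synced critical updates to an existing baseline
    $baseline = Get-SCBaseline -Name "All Critical Updates"
    $critical = Get-SCUpdate | Where-Object { $_.UpdateClassification -match "Critical" }
    Set-SCBaseline -Baseline $baseline -AddUpdates $critical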

OK, so now we have added our hosts and associated them with storage and logical networks, enabled Dynamic Optimization and configured updates for the hosts.

Our virtualization environment is basically configured and we could go ahead and create vms and templates and deploy workloads. However, what we really want is to create a private cloud ….

Private Cloud

I will assume that the reader is familiar with the concept of a private cloud. Essentially we want to create an environment that allows us not only to pool our underlying resources (which we have essentially done) but to enable shared self-service access for users from different organisations and to delegate management, without requiring users to ask the private cloud provider for administrative changes beyond increasing capacity and quotas as their needs change. While you can create a private cloud from Hyper-V, VMware ESX or Citrix XenServer hosts, we will only use our Hyper-V hosts here.

Scenario:

We want to make the resources in the host group “ATS Lab” available through two private clouds:

  • Private Clouds:
    1. Cloud 1: ATS Department
    2. Cloud 2: Visitors and Test/Dev
  • Capacity:
    1. The ATS Cloud will have unlimited capacity quotas on the underlying resources
    2. The Visitor and Dev Cloud will have limitations on memory, storage and the number of virtual machines
  • Network:
    1. All will have access to the same logical network (DHCP)
    2. Only ATS will additionally be given a dedicated IP pool (fixed IPs)
  • Storage Tiers:
    1. ATS: Gold
    2. Visitors: Silver
  • Library:
    1. ATS: Both Library shares on SCVMMLibrary1
    2. Visitors: SCVMMLibrary2
Prepare Cloud Libraries 

Please spend some time properly planning the library structure to accommodate multiple organisations/departments.

  • Distinguish between read-only “catalogue” resources and writeable “repository” resources (which store virtual machines)
  • Create read-only library folder structures (not shares) on the library server(s) that allow dedicated folders (with unique paths for each “organisation”) to store vms. You can see below that we created dedicated “write” folders on the same library server as the “read-only” library share, but not within the share! (I suggest reviewing the impact of user rights and folder structures in the documentation)

Note that (just as a test) in this example we have selected separate folders on the same server (Library1) for both orgs to store vms (while the read-only shares are dedicated to Library1 and Library2 respectively). This is by no means intended to be a “best practices” library setup.

Comment: A “reference library layout” in the GA VMM documentation would be useful – the library structure can be confusing given the different types of folders, shares and access requirements for the cloud libraries (in addition to the standard libraries)

Creating the “ATS Cloud”

From the “VMs and Services” Pane select “Create Cloud”, then specify the cloud properties.

Creating the “Visitors and Dev_Test” Cloud

  • Verify that the clouds were successfully created from the VMs and Services Pane
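The cloud creation itself is trivially scriptable – a minimal sketch (capacity, networks, classifications and library shares would then be set via the cloud’s properties, e.g. with Set-SCCloudCapacity, which I have not scripted here):

    # Create both clouds on top of the "ATS Lab" host group
    $hg = Get-SCVMHostGroup -Name "ATS Lab"
    New-SCCloud -Name "ATS Department" -VMHostGroup $hg
    New-SCCloud -Name "Visitors and Test/Dev" -VMHostGroup $hg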

Comment: The capacity settings have some inconsistencies and limitations:

  • Danger of “over-committing” capacity
    • There is no way to guarantee resources – you can only limit/cap usage
    • There is no warning when “over-committing”, i.e. you might only have 36GB of physical RAM combined in the resource pool shared by two clouds but you can set a “limit” of e.g. 64GB on each cloud – with no warning or visibility of how much of the resource has been “committed” (strictly speaking it’s not committed as such, as it’s a “limit”)
  • Values are shown as “unlimited” – which is strictly speaking correct but meaningless, i.e. how much is “unlimited”?
 

Configuring Self Service

Self-service users can deploy their virtual machines and services to private clouds.

  • Role-level quotas on the self-service user role are used to allocate computing and storage capacity within the cloud.
  • Member-level quotas set individual limits for self-service user role members.

Self-service users can also create their own templates and profiles. The Author action for a self-service user role grants self-service users authoring rights. Users with authoring rights can create hardware profiles, guest operating system profiles, application profiles, SQL Server profiles, virtual machine templates, and service templates.

Preparation:

You typically create security group(s) in Active Directory and associate Self Service User Roles with these groups.

  • In AD:
    • Created Security Groups “SelfService_ATS” and “SelfService_visitors”
    • Added “ATS1” and “Visitor1” as new test users to the respective groups
Creating the Self Service User roles in VMM

We will create two Self Service User Roles in order to test different levels of entitlements to the cloud environments.

    1. “ATS Self Service User” (“unrestricted” access to both clouds)
    2. “Visitors Self Service User” (restricted access to the visitors cloud only, without “Author” rights and with a limited quota of max 2 vms per user)
       (The Author right determines whether a user can create their own templates)

Create ATS Self Service User Role:

    • From the Settings Pane -> Create User Role
    • Added the “SelfService_ATS” role
    • Gave access to both private clouds (ATS and Visitors)
    • Granted all Self Service rights
    • Created and shared a folder for the user role data path (where SS users will be able to upload and share the physical resources that they use to create service templates and deploy services)

Create “Visitors Self Service User” Role:

  • From the Settings Pane -> Create User Role
  • Added the “SelfService_visitor” role
  • Gave access only to the visitors cloud
  • Granted all Self Service rights except Author
  • Limited quotas as shown below
  • Created and shared a folder for the user role data path (see the above example)
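For reference, a hedged sketch of creating the restricted role in PowerShell (the domain name “EEBC” matches our lab; quotas and the data path are easier to set via the GUI or further Set-SCUserRole parameters):

    # Create the self-service role and scope it to the visitors cloud only
    $role  = New-SCUserRole -Name "Visitors Self Service User" -UserRoleProfile SelfServiceUser
    $cloud = Get-SCCloud -Name "Visitors and Test/Dev"
    Set-SCUserRole -UserRole $role -AddMember "EEBC\SelfService_visitors" -AddScope $cloud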

Note: In order to test assignment and sharing of resources between user roles I subsequently created and added a vm template and a guest OS profile to the library and added them as available resources to the ATS Self Service User role only!

Observation: Capacity and quota assignment is straightforward in VMM, but viewing the (effective) allocations is not intuitive, as the cloud overview does not seem to correctly reflect the assigned values. Example:

  • The “visitors” user role restricts the member to the following quotas:
    • the role (group) level is unrestricted
    • the member level is restricted (e.g. to 2 vms only) as shown below

  • However, logging in as the self service user “visitor1” (who is a member of the security group that is associated with the “visitor” user role) does not display any limitations in “Quota for visitor1” – see below:


Changing the role-level quota to a restricted amount is however correctly reflected, so the user is only able to see group-level quotas, NOT user-level quotas (which would be more appropriate for Self Service Users in order to understand what is available to the particular user when logged in).

Logging in as Self Service User

We are now logging in as the respective Self Service Users to verify the correct resource assignment.
Note that you can concurrently log in as administrator and self service user(s) from the same system as shown below

  • Log in as user “ATS1”
  • Note that there is no Fabric Pane
  • Library Pane: As expected we can see all cloud library resources (not the physical library servers)
  • The assigned resources (guest profile as an example) are visible
    Note: Pay attention to the context menu options – ATS1 can create new templates as we have given the user role “Author” rights.

  • Now log in as Self Service user “visitor1”
  • As expected we can see only the cloud library resources associated with the visitor’s cloud but not the other library resources.
  • No resources (e.g. guest profiles as shown) are available yet as we have not assigned any to the visitors user role
    Note: Pay attention to the context menus – as we have NOT given the author right to the visitor user role there is no option to create a template (see limited menu options) – all working as expected …

Sharing Resources between Self Service Users

Finally we want to test the ability of VMM 2012 to share resources between Self Service Users. SS users can be entitled to resources either through their user role or through object-based sharing of resources, if the user role “rights” (“Actions”, as defined above in the user role) allow it.
As mentioned above, the ATS Self Service users already have 2 resources allocated (one guest profile and one vm template) – the Visitor SS users have not been allocated any.

In order to share resources between ATS and visitors, the ATS SS user role must have the “share” action enabled and the Visitors SS user role the “receive” action enabled (we did this when we created the user roles).

Also note that the SS user must be the owner of the resource in order to share it (i.e. must have created the resource or have been made the owner by an admin).

  • We logged in as user “ats1” and created a test guest profile “Shared Guest Profile”
  • In the properties of the resource (library view) ats1 can now share the resource with other user roles (that have the “receive” action enabled), see below:
    Note: Ensure that the logged-in user is the owner of the resource and add other user roles for access as desired (a scripted equivalent is sketched below)
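The scripted equivalent of that sharing step is essentially one cmdlet – a sketch using the names from this test:

    # ats1 (or an admin on their behalf) grants the visitors role access
    $gp = Get-SCGuestOSProfile -Name "Shared Guest Profile"
    Grant-SCResource -Resource $gp -UserRoleName "Visitors Self Service User"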

After performing the above action and logging in with “visitor1” we can now see the shared profile being available to “visitor1” as expected.

 

Deploying a Resource to the Cloud

Finally let’s deploy a test virtual machine to the cloud using the “visitors” user role.
After specifying Source, virtual machine name and virtual hardware you are asked to specify whether you want to deploy to the cloud.
  • As expected we are able to (only) specify the visitors cloud as target
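For completeness, the wizard boils down to something like the following (a sketch only – New-SCVirtualMachine has fairly involved parameter sets, and the wizard’s “View Script” button shows the authoritative version; template and vm names are examples):

    # Deploy a new vm from a template into the visitors cloud
    $template = Get-SCVMTemplate -Name "W2K8R2 Template"
    $cloud    = Get-SCCloud -Name "Visitors and Test/Dev"
    New-SCVirtualMachine -Name "visitor-vm1" -VMTemplate $template -Cloud $cloud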

 
OK, that was my test log from the initial upgrade to SCVMM 2012, covering the required steps to configure the fabric resources including storage, network and compute, updates to the fabric servers, dynamic optimisation and finally the creation of private clouds and Self Service User roles.
 
I still have to write up the Service Profiles section and will test the App Controller hybrid cloud functionality. As I’ve also tested the new bare-metal deploy function I might add another blog on this.
 
Due to its nature this was clearly a less “opinionated” article – but don’t get your hopes up too high – the next one might just be the opposite ;)
 
Andy



Is there a market for entry-level cloud management software?

by +

So why would anyone want an entry-level cloud solution?

No time to set up SKC yourself? Simply watch this short video – it will be worth your time!  


“Angry Boss” – Starter Kit for Cloud (demo by Andreas Groth and Kenny Bain )  

  

Let me rewind … many many years ago I was sent some test code for a very basic web interface allowing self-service requests for virtual machines – developed by a single VMware employee in his spare time – looking back, this was the first time I actually “did cloud”. And I liked it because it was exactly what I wanted at the time – a simple way to enable, control and streamline resource requests.  

With marketing engines blazing today we seem to have forgotten what drove these initial efforts and often it feels that vendor capabilities drive our (perceived) cloud requirements rather than the other way around (as it should be). It seems that everyone today is brainwashed into thinking they are a public cloud service provider (and I understand that some IT departments indeed become some sort of “service providers” for internal divisions but typically with totally different security requirements).  

“With marketing engines blazing it seems that everyone today is brainwashed into thinking they are a public cloud service provider”  

So it’s not surprising that security concerns, compliance and business process integration challenges often spring to mind first when listing cloud adoption inhibitors.

However, on a more practical level, in my experience the upfront implementation effort for many smaller private cloud projects – with the associated cost, complexity and lack of in-house skills – is the first (and still often final) hurdle.

A study covered in August 2011 underlines this pattern.  

Cloud Inhibitors

And yes, this doesn’t come as a surprise; the first time we installed vCloud Director last year it took us the best part of 4 days – a far cry from the “next, next, next” experience many got so used to with e.g. vCenter (and don’t get me wrong – vCloud Director has a great UI and this is by no means a “VMware only” issue).

So why is it that even vendors like VMware who are known for intuitive management UIs struggle to deliver a simple, “end-user” installable cloud management suite?
To a certain extent it’s the nature of the beast … a full multi-tenant cloud management stack is vastly more complex, as it touches and incorporates not only layers of the classical server infrastructure but the extended network, security and, more importantly, the interfacing business support systems. When combined with the inherent requirement for system-wide orchestration and extensibility through comprehensive APIs for each individual customer environment, it is clear that by its very nature it will not be “simple mainstream” for some time.

Efforts like VMware’s vCloud Director Appliance (for evaluation purposes) show that this is a recognised problem.  

So while there is no easy short-term solution, the question you should ask is whether YOUR environment really needs all the bells and whistles of a “full-blown” cloud management stack … and I’m by no means implying that the answer will always be “no” …
The question you should ask is whether YOUR environment really needs all the bells and whistles of a “full-blown” cloud management stack
So – you might argue – nothing new here … one needs to understand the specific functional and operational requirements of the environment and translate that into one’s custom solution – that’s what we (architects) do, right…?
Well, unfortunately today’s reality is that you are likely to be presented with a “one-size-fits-all” approach by most vendors when it comes to “your” cloud solution – it’s typically “take that or nothing” – unless I’ve just missed the e.g. vCD “light” version? ;)

   

What is Starter Kit for Cloud (SKC)?

When we showed off “IBM SmartCloud Entry delivered by IBM Starter Kit for Cloud” at VMworld 2011 I was seriously taken by surprise by how much interest it generated, but retrospectively it clearly makes sense.
OK, so what is the IBM Starter Kit for Cloud? In a simplified way it is a browser-based orchestration layer that is installed on your existing virtualization environment to provide cloud-like functionality. Take for instance your existing vSphere infrastructure, install SKC and point it to your vCenter server. It will automatically surface your existing vSphere workloads and templates and add extended self service portal functionality.
Take your existing vSphere infrastructure, install SKC and point it to your vCenter server. It will automatically surface your existing vSphere workloads and templates and add extended self service portal functionality
So what’s good about it? (and I am conscious of the fact that I’m an IBM employee covering an IBM product, so please bear with me before shouting “fix”)
  1. It has an extremely intuitive user interface. Yes, I can hear some of you … “An intuitive user interface from IBM??”. I am probably the first to admit that our UIs can sometimes be intimidating to the novice user, but if you are e.g. familiar with the IBM Storwize interface http://youtu.be/aHC5X_-gzw0 then you know that great attention is being paid internally to user experience, and the SKC UI clearly reflects that.
  2. It installs in minutes … Now, I really mean that – I have put together a short “Our Angry Boss Wants a Cloud” video that captured the entire install process in our lab environment. It also gives you an overview of the interface and overall functionality. If you have a few minutes then have a look above. And yes – humor is intended, but bear in mind that I’m German ;)
  3. It provides the core functionality for private cloud portals. A web-based self service user portal, project-based workload entitlement, request and approval management with email notifications, and basic metering and billing for the deployed workloads.
  4. Multi-virtualization vendor support. OK, so today it only “cloudifies” VMware vSphere and IBM System p Unix systems (separate editions), but given IBM’s publicly stated policy of open choice it would seem logical that SKC would be extended to support other x86 hypervisors from a single SKC instance in the future (I am not making any official forward-looking statements, but think of e.g. KVM as an additional virtualization platform)
  5. Attractive price point and easily extensible. SKC is priced per server (so independent of the number of virtual machines!) and can be purchased for under $2K with 1 year of S&S.
     SKC has a documented REST API that allows for integration and customization of SKC in your environment.

      

As always I’ll be straight on this blog (even if I talk about one of “our” products) …

 … So what is SKC (not) … ?

It is what it says on the tin, an entry-level cloud solution – it is e.g. not intended to be a fully fledged multi-tenant cloud solution for service providers – IBM has other products in the portfolio addressing this space – see our Cloud Service Provider (CSP2) offering or our SmartCloud Portfolio.

SKC is not intended to be a full multi-tenant public cloud solution for Service Providers – there are other products in IBM’s portfolio to address this space  

To give you an example: while you can e.g. create virtual networks in SKC, it does not have secure network isolation à la vShield with VMware vCD. And if you look for all the advanced functions IBM’s ISDM or VMware’s vCloud Director + (fee-based) extensions can provide, then don’t be disappointed not to find them all in SKC. Also be aware that the currently supported vSphere version is 4.1 (with support for v5 coming early next year).

I really do like SKC for what it is (otherwise I wouldn’t cover it here) – so if the core of what you need is delegation of resource provisioning, controlling vm sprawl through request and approval management, and basic metering and billing, and you feel that other offerings are too complex, too costly and simply overkill for what you need, then I can only suggest evaluating SKC (see HERE for details and contact).

SKC is by no means the “one-size-fits-all” answer to every cloud scenario – it is simply another option in your architectural toolbox – determining whether it fits your needs is still required …
If it does fit, I believe it can simplify your job greatly and give you quick time to value on your journey to the white fluffy thing … .


Has VMware won the Hypervisor war? … and who cares anyway… (Hyper-V 3 & SCVMM 2012, RHEV 3, XenServer)

by +

I’ve made it a habit to kick off every larger presentation by asking the audience to indicate adoption of virtualization in their environment. Suffice to say that even just 2-3 years back barely 10% would indicate any serious evaluation or large-scale production use outwith the VMware realm.
And when I talked enthusiastically about Virtuozzo, XenServer, VirtualIron and Hyper-V the eyes of many glazed over until we came back to VMware. And it is difficult to describe but there was (is) a distinct pride, an admiration, almost the feeling of a “cult following” among the VMware users.

Just 2-3 years back barely 10% would indicate use of hypervisors other than VMware – forward to 2011 and over 50% are actively evaluating alternatives …

Forward to end-2011 and the landscape has seriously changed.
Today I barely run a session where less than 50% of the audience lifts their hands, and while indication of large-scale production use is still limited, the appetite for virtualization alternatives is incredible.
A survey in spring 2010 showed that 78 percent of the 243 respondents said they were using VMware while 38 percent said Microsoft Hyper-V. In the 2011 survey of 250, the gap closed to 59 percent for VMware compared to 53 percent for Hyper-V. Citrix XenServer users doubled year over year from 9 percent to 18 percent of respondents. (statistics on market share vary greatly and are often skewed due to free downloads which don’t result in production use).

2012 will be the first real year of a “hypervisor war” and it will decide the battle in the private IaaS cloud market and its interconnected cloud layers (PaaS, SaaS etc) …

Just two weeks ago I had a client exclaiming that VMware had won the hypervisor war years ago and had moved on to fight the cloud war. What war? Let’s face it, there were merely a few “uprisings” without real alternatives…
At the risk of sounding over-dramatic, 2012 is shaping up to be the first real year of a “hypervisor war”, and it will directly impact the battle in the private IaaS cloud market and therefore shape the entire ecosystem of the interconnected cloud layers (PaaS, SaaS etc).

What has changed?

  • VMware has evolved from the cool underdog that couldn’t get it wrong even had they tried, to one of the “big players” now subject to the same scrutiny when releasing a new product or changing licensing.
  • Partner loyalty in some areas is exposed; there is increasing portfolio overlap (competition) with their large OEM partners (a critical route to market) and an unsettled technology-partner community that breathes a sigh of relief every time the latest VMware product has not erased its ecosystem by branching out into their (management, backup, storage etc) territory. All open to alternatives …
  • Most importantly – the arrival of competitive products will seriously challenge VMware’s position. Most will not challenge VMware’s technology leadership, but they will aim to provide (at least) “good enough” alternatives or vertical solutions to gain market share. They will target the cost-conscious customer segment (and who doesn’t fall into that category today?) as well as customers desperately looking for a multi-supplier strategy to enable price-negotiation leverage when dealing with VMware.

So who are the contenders? (and don’t mistake this for an anti-VMware rant, given my genuine respect for the company)

The Contenders

Microsoft
VMware’s biggest threat will undoubtedly be Microsoft with its new System Center 2012 release in the first half of 2012 (possibly April). Do not make the mistake of discounting this release as one of those minor “updates”. Especially the new Virtual Machine Manager is clearly an altogether different animal, and having followed it since the initial beta in March I can only advise spending some time evaluating it.

… imagine all the annoying shortcomings SCVMM has today had been magically addressed and add some impressive cloud functionality on top … now you are pretty close to what SCVMM 2012 will look like …

How best to summarize the improvements without going into detail …?
Well, think of all the shortcomings that annoy you with the current version of SCVMM (whether the awkward dependency on SCOM when implementing DRS-like functions, the non-integrated high availability, the lack of vApp-like constructs and power management, or the inflexible virtual networking etc etc … ).
Got it? Now imagine most of those shortcomings had been eliminated/improved … then add a fundamentally different fabric management and service template approach, bare-metal deployment capability and put impressive private and hybrid cloud functionality with App Controller (Concero) on top … done …? Now you are pretty close to what SCVMM 2012 will look like (more details in a previous post or here).

It’s important to understand that the SCVMM release will address the hypervisor management layer, not the Hyper-V layer itself. However, the end of 2012 will also see the release of Windows 8 and with it Hyper-V version 3, which will address today’s underlying scalability and functionality limitations. A good summary of what’s to come is here.

HyperV3


Microsoft will have a strong contender by YE 2012, even taking into account the meanwhile ongoing evolution of VMware’s product portfolio, and many of my recent clients have confirmed that they are actively evaluating it.

Red Hat
One of the biggest surprises at VMworld Europe for me was the interest in our setup of the RHEV 3 beta (Red Hat Enterprise Virtualization). The feedback made it clear that the improvements in this release finally allow Red Hat to provide a credible “good enough” alternative – and not only for clients specifically pursuing an open source based alternative.
My initial impression was “wow, the GUI is very intuitive” (yes, I have to admit, vCenter-like in some respects … and that is a good thing).

RHEV3

Red Hat has finally ported its RHEV Manager to Java, JBoss and PostgreSQL, removing the dependency on Windows (except for the client which still has some dependency on .Net in this release). RHEV 3.0 also has a nice little “power user” portal that allows users to create and manage their VMs.
There are still a couple of basic things Red Hat needs to (and will) address in the next releases, the lack of live snapshots and live storage migration among them.
Besides the user experience, one of my major concerns with RHEV has always been the lack of a consistent API and the challenges that poses for the much-needed build-up of an ecosystem around RHEV (by enabling ISVs and OEMs to easily create extensions) – therefore the addition of a RESTful API for integration with RHEV Manager is extremely welcome.
RHEV already had the advantage of a unified console for server and desktop virtualization and the graphical user experience with the SPICE protocol on our setup in Copenhagen was seriously impressive.

Red Hat is unlikely to challenge VMware head-on but will gain market share by identifying vertical stacks and targeting suitable customer segments that are craving for alternatives or open source based solutions

RHEV 3 (currently in public beta) will be released early next year, and in conjunction with the previously discussed cloud announcements Red Hat has an opportunity to make an impact – not by broadly challenging VMware head-on but by identifying customer segments and workloads that are craving alternatives or open source based solutions.

Citrix
Now I really wish I could confidently add XenServer/XenCenter to the list of contenders for 2012 … I have never made a secret of my admiration for Citrix for creating (for a long time the only) credible alternative to VMware, for continuing to innovate again and again, and for providing a free version in a space totally dominated by VMware – all with a Microsoft relationship that makes clear positioning tricky. And it’s not the lack of functionality that makes me hesitate – XenServer/XenCenter 6 is looking strong – but the lack of a well-communicated vision for the role and integration of XenServer in Citrix’s cloud plans.
After announcing project Olympus (OpenStack + XenServer), which reaffirmed the importance of XenServer, the acquisition of cloud.com (managing multiple hypervisors with the CloudStack platform) makes the future role of XenServer unclear. Combine that with the fact that Citrix happily collaborates with Microsoft to allow System Center 2012 to manage XenServer – essentially exposing its own future virtual infrastructure control point – and the picture is not reassuring.
Can XenServer seriously survive a Hyper-V 3 onslaught, even if Citrix decided to (seriously) take on Microsoft (which is unlikely anyway)?

There is a lack of a well communicated vision for the role of XenServer in Citrix’s Cloud plans – can XenServer carve out a niche as VDI Hypervisor?

What’s the alternative ..? Citrix is likely to find its vertical “niche” in the cloud business (and e.g. the end-user computing one is not a small niche). XenServer could help fulfil that very role if Citrix focused further on the creation of a true VDI hypervisor. Currently we still see the majority of XenDesktop deployed on vSphere, but with more vendor-specific optimizations like IntelliCache and GPU passthrough Citrix could carve out a niche for XenServer.

Why should we care?

But why should we even care about the boring old hypervisor in the age of Cloud Computing?
Many increasingly (and mistakenly) describe the hypervisor as a “replaceable commodity” with little vendor stack dependency. And yes, if you looked at this thin abstraction layer in isolation you would have a case, but that would also imply that seamless multi-hypervisor management (to enable “rip and replace” of hypervisors) is a reality today.

Seamless multi-hypervisor management is not yet a reality – today’s solutions are merely marketing tick boxes and strategic directions rather than usable alternatives ..

It isn’t … efforts like Microsoft SCVMM’s capability to manage VMware environments and VMware’s XVP to manage Hyper-V are today merely marketing tick boxes and strategic directions rather than usable alternatives. The mammoth effort involved in creating full replacements providing equivalent functionality to the native vendor management app, and the challenge of keeping up with the constant flow of new (competitive) vendor releases, cast doubt that this will become a reality soon.

So be wary of such messaging – feedback from my clients almost unanimously shows that – while all would love to have this “single pane of glass” management for heterogeneous hypervisor environments – they prefer to use multiple native vendor tools (with full functionality) over a single tool (with only a subset of functionality).

Without today’s vSphere footprint there would be no case to use vCloud Director – any hypervisor market share loss will directly impact VMware’s private cloud market share

Make no mistake, VMware’s strength is its foothold in the virtual infrastructure space, directly influencing the higher management layers – without the vSphere footprint there would be no case to use e.g. vCloud Director (which manages exclusively vSphere environments). Any hypervisor market share loss will directly impact VMware’s private cloud market share. So yes, we should absolutely care about these dynamics.

It is important to understand that this dependency does not necessarily apply to the higher cloud management layers. We have already seen many examples of 3rd-party cloud management platforms successfully driving underlying vSphere or KVM-based infrastructures as independent target resource pools.
More on this topic in one of my next rambles … ;)

So it’s undoubtedly going to be an interesting year but with VMware continuing to innovate at incredible speed the main question remaining is how much of this “appetite” for alternatives will also result in “consumption” … ?



VDI – Success or Failure? – or – Why “VDI-bashing” is popular….

by +

I have to admit that it is slightly irritating to fly home from ‘yet another’ client meeting and stumble across ‘yet another’ discussion (OK, let’s call it rant) about VDI on twitter or blogs. Especially when the meeting was actually a positive one that included VDI – so yes – my opinion upfront – there is such a thing as “successful VDI” (and note that I’m absolutely not saying all projects are).

There are good and even great (critical) discussions and articles with absolutely valid points like this one (discussing VDI’s obvious challenges and shortcomings) – and then there are others: rants driven by frustration, inability to position, feeling misled and – IMHO the most dangerous ones – the geniuses and visionaries who already live in 2015 and seem to easily dismiss everything that is today’s (boring) reality.
Don’t get me wrong, we (the industry) need thought leaders – and the best of us all have a visionary in them – but don’t assume genius = flawless/always right. Genius actually = high risk – yes, with potentially high returns. Dangerous, because thought leaders create followers – unfortunately often following blindly, amplifying and distorting originally educated, valid points until they become false generalisations.

Ultimately it’s our own (IT vendors’) fault: we are hyping things up, creating bubbles, “re-inventing” our portfolios (translate: “re-badging”) and pushing our own agendas – so I completely understand where the backlash is coming from. With VDI we have passed the peak of the hype curve and face adoption, and with it a level of disillusion. And suddenly VDI-bashing becomes … well … fashionable.

But enough of my rant. As you know, I am not associated with a VDI vendor, I agree with many of the raised technical concerns, and I work with server virtualization and cloud environments (so have nothing to “lose” if VDI quietly faded away), but here is my honest view – and to make it as short as possible I’ll just paste, without changes, the comments I posted on Simon Crosby’s blog (also check out Tal Klein’s great comments here):

==== paste ====

“To me this boils down to: (and I’ll keep it short as I have the tendency to drivel on)

- Are we trying to solve a problem which does not exists? No. (so I slightly disagree with how you portrait the current state of physical endpoints – time spent on image management is unsustainable for many of my clients)

- Is the answer to the problem always VDI? No.

- Can the answer be VDI? Yes, as a tactical solution for specific use cases where TS does not fit.

- Can VDI be a strategic approach – yes, as a stepping stone (and I am aware of the inherent contradiction here) to enable logical separation of OS and app, mobile & user-centric app delivery.

- Why VDI as the “one fit all”: 1) because the VDI “vendor says so” – 2) because the clients prefers the simplicity of one architecture that can (potentially) cover all use cases (so both dubious but reality)

Bottom line – I can’t fail VDI completely – VDI has a space as tactical building block alongside TS (and others) – combined as strategic hybrid approach to facilitate future app delivery models. (all simplified in the interest of time)

=== paste end ====

This really sums it up for me – unless I’m working in an under-developed/left-behind part of the world, today’s reality for most is not a single “Nirvana-like” solution with a unified GUI and integrated modular sub-components, driving all use cases of end-user computing (or ideally all IT services) from the same platform.
If that were the case, quite frankly, architects would be out of a job. That’s what we do: match use cases with suitable technologies.
Clients have unsustainable problems today – the requirement for tactical solutions in the search of strategic ones is today’s reality, not everyone can “continue and wait”.

And yes, we absolutely should do that with a vision of how to enable emerging approaches. But too often have I come recently into client meetings instructed and prepared to talk about “cloud”, vSphere 5, SCVMM 2012 or RHEV3 and the client kicked off with e.g. “let’s fix today’s problem first – we need to migrate off vSphere 3.5 – so we want a vSphere 4.x (not 5) architecture from you …” (that particular one was one of the largest retailers in the UK just 2 weeks ago).

What’s my point ..? We have a tendency to get ahead of ourselves and forget what today’s problems in the accounts are; we sell one idea then happily move off to the next “fancy” thing, sometimes leaving the client with the mess and giving them a kick in the teeth by turning around and saying things like “yes, this VDI thing is rubbish – I could have told you that in the beginning”, “you should really go with TS” or “you should wait for HTML5” – replacing the same mistaken “one-size-fits-all” approach with another label.
The question is not “Does VDI work?” (of course it “does”) but rather “Where does VDI fit?” and “Does it work for you?”
It should be one of our tools in a set of possible solutions to a client-specific problem, neither the de facto answer to everything nor discounted by default.

If we sold VDI as the only answer then we were wrong (and not acting as architects or consultants) to begin with.

It is easy to criticise – the art is to make it work …