F3 Technology Partners

Better VMware Management with Veeam

Virtualization has solved or simplified many IT tasks and made many of them safer.  However, like all things in the universe, this must be balanced by new issues and tasks that need to be confronted.  In a previous post I addressed the challenges of backing up virtual environments.  I would now like to discuss managing them, as this has become an important topic for many IT shops.

Currently there is a set of products designed to manage and maintain your infrastructure, including Microsoft SCOM, HP OpenView, Nagios and Zenoss.  I have seen these systems successfully implemented at many of the sites I have visited.  However, they all have one thing in common: their monitoring, alerting and reporting capabilities are focused on a system as an individual asset.  That scope makes perfect sense if the server is running a single OS supporting a few applications.  It is far less effective in a virtualized environment with many operating systems and many applications running on a few hosts.

To help you widen the scope of your monitoring, alerting and reporting in a virtual environment, Veeam has created two products that bring your whole environment into view.

Monitoring
Veeam Monitor is the tool that actively monitors and alerts on your environment and lets you view and customize the performance data you see from the VMware environment.  Your view can be as high as the whole environment or as low as a single process running in a Windows 2008 guest, and everything in between.  Simple things like a list of your highest and lowest utilized ESX hosts, or your top 5 resource-consuming VMs, can give a more complete understanding of your environment.

When analyzing a system it can be difficult to see the root cause of an issue if your view is limited to the guest VM.  For instance, if the ESX host a guest is living on is critically low on memory and swapping, that host’s guests will experience performance problems.  However, you cannot determine that directly from the guests themselves, as they are unaware of the hypervisor.  The Veeam nWorks management pack for your existing Microsoft SCOM or HP OpenView management tools solves this visibility issue.  You can now have a complete view of your system, from the virtualization layer down to the guest OS and applications, from one pane of glass.

Reporting
Reporting on the health and status of your virtual infrastructure is important because a virtual infrastructure is fluid: many different workloads operate in the same compute environment.  Virtual environments are usually in a state of growth, whether by adding new guests or by adding or upgrading hosts.  Veeam Reporter is designed to take the mountain of data that vCenter collects and allow you to compile and view it in a way that makes sense, whether you want a quick list of the VMs on a certain datastore or an inventory of your entire virtual environment.

To help manage the fluid nature of VMware, Veeam Reporter can give you change reports.  These reports are composed from the differences in your vCenter database over time.  Running a job on a schedule will produce a change report at the granularity of the job schedule.  This change report shows what has been added, modified and deleted, who did it, and when.  A report like that is immensely useful for tracking what is going on in the environment.
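
The idea behind a change report can be sketched as a diff between two inventory snapshots.  The Python below is only an illustration of that idea, not how Veeam Reporter is implemented; the snapshot layout (a dict of VM name to settings) and the VM names are assumptions made up for the example, and it leaves out the "who" and "when" that the real reports include.

```python
def change_report(old, new):
    """Compare two inventory snapshots (dicts of VM name -> settings)."""
    added = sorted(set(new) - set(old))
    deleted = sorted(set(old) - set(new))
    modified = sorted(
        name for name in set(old) & set(new) if old[name] != new[name]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


# Hypothetical snapshots taken by two consecutive job runs.
monday = {
    "web01": {"cpus": 2, "memory_mb": 4096, "datastore": "ds1"},
    "db01": {"cpus": 4, "memory_mb": 8192, "datastore": "ds2"},
}
tuesday = {
    "web01": {"cpus": 2, "memory_mb": 8192, "datastore": "ds1"},
    "app01": {"cpus": 2, "memory_mb": 4096, "datastore": "ds1"},
}

report = change_report(monday, tuesday)
print(report)
```

Anything that changes and changes back between two runs is invisible to a diff like this, which is why the job schedule sets the granularity of the report.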

Both of these products are very feature rich.  They can be downloaded for free for 30 days to test in your environment from [http://www.veeam.com].  F3 Technology Partners would be happy to assist with setting up a POC in your environment.

Solaris Swap Q&A

Swap space in the Solaris 10 OS is misunderstood or poorly understood by most SAs, myself included.  Here is my shot at explaining what I think it means from a systems standpoint (as opposed to a programming or design standpoint).

Q:  When I run 'swap -s', what does the output mean?

A:

# swap -s

total: 61541352k bytes allocated + 10156312k reserved = 71697664k used, 5965344k available

'allocated' is the sum total of all the user process address spaces (including shared memory), plus all the data in /tmp.

'reserved' is swap space which has been claimed by memory mappings for possible future use but which is not yet allocated.  No pages back it yet; Solaris only sets the space aside so that the pages can be backed if a process ever touches them.

'used' + 'available' should equal all of the RAM on the host, minus anything used by the kernel.
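
To make the arithmetic explicit, here is a small Python sketch that parses the sample line above and adds the figures up.  The regular expression is an assumption based on the single line of output shown in this post; I have not checked it against other Solaris releases.

```python
import re

SWAP_LINE = (
    "total: 61541352k bytes allocated + 10156312k reserved = "
    "71697664k used, 5965344k available"
)


def parse_swap_s(line):
    """Return the four 'swap -s' figures in kilobytes."""
    pattern = (
        r"total: (\d+)k bytes allocated \+ (\d+)k reserved = "
        r"(\d+)k used, (\d+)k available"
    )
    match = re.match(pattern, line.strip())
    if match is None:
        raise ValueError("unrecognized swap -s output")
    allocated, reserved, used, available = (int(g) for g in match.groups())
    return {"allocated": allocated, "reserved": reserved,
            "used": used, "available": available}


swap = parse_swap_s(SWAP_LINE)
total_kb = swap["used"] + swap["available"]
# 'used' is simply allocated + reserved.
print(swap["allocated"] + swap["reserved"] == swap["used"])
# This total is the figure to compare against RAM minus kernel memory.
print(total_kb, "KB used + available")
```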

Q: Why does 'used' + 'available' not add up to the RAM on my server?
A: Kernel memory pages are not counted in the output of the swap command because they are not directly accessible by processes.

Q: How can I see how much memory the kernel is occupying?
A: Here is one way:

# echo "::memstat" | mdb -k

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1483266             11588    9%
ZFS File Data             4445637             34731   27%
Anon                      7034973             54960   43%
Exec and libs              249329              1947    2%
Page cache                 787194              6149    5%
Free (cachelist)          2036135             15907   12%
Free (freelist)            416189              3251    3%

Total                    16452723            128536
Physical                 16430557            128363

Q: OK, I see that the kernel is occupying 11GB of memory.  So why does the total of used + available only equal about 77GB on a 128GB host?  Shouldn't it equal 115GB or 117GB?

A: ZFS has played a sneaky trick on you by using kernel pages for its Adaptive Replacement Cache (ARC).  On this system, the ARC has grown large and is occupying 34GB of space.  'swap -s' does not display kernel memory, and therefore your total available memory will shrink as the ARC grows.  But the pages in the ARC are readily freed and should not have a great impact on user-level memory allocation unless memory becomes fragmented and user processes are requesting large memory pages.
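
As a rough cross-check, this Python sketch parses the ::memstat sample above and subtracts the Kernel and ZFS File Data rows from physical memory.  The parsing is an assumption based on the layout shown in this post; other Solaris releases may print different rows.

```python
MEMSTAT = """\
Kernel                    1483266             11588    9%
ZFS File Data             4445637             34731   27%
Anon                      7034973             54960   43%
Exec and libs              249329              1947    2%
Page cache                 787194              6149    5%
Free (cachelist)          2036135             15907   12%
Free (freelist)            416189              3251    3%
"""
PHYSICAL_MB = 128363  # from the "Physical" row of the same output


def parse_memstat(text):
    """Map each row label to (pages, megabytes)."""
    rows = {}
    for line in text.splitlines():
        # Split from the right so multi-word labels stay intact.
        label, pages, mb, _pct = line.rsplit(None, 3)
        rows[label] = (int(pages), int(mb))
    return rows


rows = parse_memstat(MEMSTAT)
total_pages = sum(pages for pages, _mb in rows.values())
non_kernel_mb = PHYSICAL_MB - rows["Kernel"][1] - rows["ZFS File Data"][1]
print(total_pages)    # sum of the Pages column
print(non_kernel_mb)  # MB left once kernel and ARC pages are excluded
```

The remainder, roughly 80GB, is much closer to the 'swap -s' total than the raw 128GB figure is.  The two tools do not account for memory identically, so I would not expect an exact match.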

Q: What if I am running Solaris Containers?

A: The swap usage of containers can be limited by defining the zone.max-swap resource control (rctl).  In a container which has a zone.max-swap rctl defined, 'swap -s' used + available should add up to the value of the rctl.  For example, in a container with zone.max-swap = 4GB:

zone# swap -s

total: 3624128k bytes allocated + 0k reserved = 3624128k used, 570176k available

Note that the zone.max-swap rctl also limits the amount of /tmp space which a container can use.
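
The container figures above can be checked in a couple of lines of Python.  The variable names are made up for the example; the numbers are copied from the 'swap -s' output in the zone.

```python
used_kb = 3624128           # "used" from the zone's swap -s output
available_kb = 570176       # "available" from the same line
cap_kb = 4 * 1024 * 1024    # zone.max-swap of 4GB expressed in KB

# used + available lands exactly on the rctl value.
print(used_kb + available_kb == cap_kb)
```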

JVM issue in Solaris Branded Container

A big part of our work here is the provisioning and management of a Solaris Container Farm for a large customer.  In the course of migrating an older Solaris 9 server to a Branded container, users began reporting that their Oracle 11g utilities (namely, opatch and dbua) were not working inside the container. When invoking them, they would receive the error message,

"Could not reserve enough space for code cache"

This issue was observed only in branded containers, not in Solaris 10 Native containers.

After performing some research and contacting the vendor for support, we determined that there is a workaround for the problem:  add the option "-XX:-UseLargePages" to the JVM invocation.

The permanent fix should be to apply Solaris patch ID 143357-03 or greater to the global zone.  We have not tested this yet.

The Cost of BlackBerry

We recently had an event with our BlackBerry Enterprise Server (BES) where we were down for the better part of a day. It turned out to be a corrupt MAPI profile, which left the BES unable to interact with the Exchange mailbox store. The fix was simple (delete some registry entries and recreate the profile), but a considerable amount of time was spent tracking down a very uncommon and obscure problem.

This got me thinking: what was the cost of this outage to our business? For a sales organization, this could range from nothing more than an inconvenience up to losing a major deal or major account. Of course this question is impossible to answer exactly, and very difficult to approximate, but it's not hard to imagine a situation where this could have been a very costly outage.

So why do we have a BES? It's more expensive, it's an additional point of failure, it takes additional time to manage, and we don't use any of the features it provides beyond what ActiveSync does (Remote wipe, Enterprise apps, IT policies, etc). While the BlackBerry was once far and away the best device to receive enterprise email on, its competitors have come a long way in the past few years. I have a BlackBerry and I like the device, but I don't think it's a better device than what Apple has with the new iPhone, or some of the new Android phones. So I decided to find out how much more it costs to support a BlackBerry device than an ActiveSync-compatible phone.

Most of our users are on Verizon, so I used numbers from Verizon for comparison. This is based on the phones having a two year refresh cycle and a new two year contract.

 

Item                                            | BlackBerry                          | ActiveSync Phone
Basic Phone                                     | $20 (8830)                          | $0 (Palm Pixi or Samsung Saga)
Plan (450 min, Unlimited Data, Exchange Access) | $85/mo * 24 mo = $2,040             | $70/mo * 24 mo = $1,680
BlackBerry Enterprise Server support            | $500/year / 8 users = $62/user/year | $0
Two-year recurring cost per user                | $2,184                              | $1,680

 

By my quick calculations that's roughly a $500 per user premium for two years of service, or $250 per year per user. Now that's not much money for a business, but I believe that's only a small part of the total cost. When you factor in how much time our administrator has spent managing the BES (updates, service packs, version upgrades, troubleshooting) and the additional burden in related tasks (upgrading Exchange), plus the additional downtime we've experienced on the BES, I would imagine the cost per user is significantly higher. This also does not include MS Windows licensing, hardware, power and cooling (our BES lives in a VM, reducing our costs).
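
For anyone who wants to plug in their own numbers, here is the same calculation as a short Python sketch. The function and variable names are made up for the example. It keeps the unrounded $62.50 per user per year for BES support, so the BlackBerry total comes out a dollar higher than the comparison above, which rounds that figure to $62.

```python
MONTHS = 24
BES_SUPPORT_PER_YEAR = 500
BES_USERS = 8


def two_year_cost(phone, plan_per_month, support_per_user_year=0.0):
    """Phone price plus 24 months of plan plus two years of support."""
    return phone + plan_per_month * MONTHS + support_per_user_year * 2


bes_per_user_year = BES_SUPPORT_PER_YEAR / BES_USERS  # 62.5 before rounding
blackberry = two_year_cost(20, 85, bes_per_user_year)
activesync = two_year_cost(0, 70)
premium = blackberry - activesync

# Two-year totals, the two-year premium, and the premium per year.
print(blackberry, activesync, premium, premium / 2)
```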

Of course the situation can vary dramatically from one organization to another, especially if the enterprise features of BES are leveraged. I don't think the BlackBerry makes our users any more productive than another smartphone would, and it certainly adds hard costs and soft costs to our IT infrastructure.  My recommendation to management will be to begin to phase out the BES as employee contracts expire.

Disagree with me or my numbers? Please post below.

Cloud Services for Small and Medium Business

Buzzwords like SaaS and cloud computing  can seem like daunting technologies that only big companies can afford to invest in and migrate to. In my experience, this couldn’t be further from reality.

We were recently approached by a small law firm that had been experiencing problems with email. They ran Microsoft Exchange in house and had problems with their Exchange server crashing, with losing their internet connection, and with poor power service to their building. As for most modern businesses, email is a business-critical service for them, and the downtime was hurting their ability to communicate with their clients in a timely fashion. They needed help fast, without a large capital investment.

We discussed the options with them and settled on a hosted exchange model. They liked the following advantages of a hosted solution:

  • Offsite in a professional datacenter with redundant power, cooling, network and tight security.
  • Managed by professional Exchange administrators
  • Low capital investment (only the Professional services we charged to migrate them to the service)
  • Fixed operating costs: $5 per user, per month. No hardware or software upgrades, ever.

To give an idea of how quick and easy this process was, we went from first meeting to complete migration in less than a week. Most of the time was spent discussing what solution was best for them, and how we would migrate to the new service. The migration itself took place over one weekend.

The client is very happy with the new solution, and since moving they have not had any downtime. If you’re interested in this for your business, contact us and we can discuss the best options for you.

Better Virtualized Backups with Veeam

Virtualization has made a lot of IT tasks and processes simpler and easier to manage.  It has brought the ability to quickly deploy a server or workstation with the click of a mouse, move machines around without causing an outage or bring High Availability and Fault Tolerance to systems that otherwise would not be able to benefit from those features.  However like every piece of technology that makes our lives easier it invariably creates new issues that need to be solved.

One of these problems is how to efficiently back up a virtualized environment.

Backing up a virtual environment should NOT be treated like backing up a physical one.  Doing so will create performance and management headaches; remember, you are sharing resources.  Backing up VMware VMs used to rely on VCB (VMware Consolidated Backup).  VCB was a band-aid for the problem, and VMware is getting rid of it, so don’t use it.  With the vStorage APIs in vSphere, the backup process is streamlined.  The APIs provide well-designed methods for more efficient access to a VM’s data.

Unlike conventional backup systems, there are no agents to install.  Veeam utilizes VMware Tools, which you already have on your VMs.  Through VMware Tools, Microsoft VSS is supported, so your Active Directory, Exchange, and SQL servers can be quiesced and their databases are consistent on disk.  A snapshot is then taken, and the now read-only disk is mounted to the Veeam backup server using the hot disk add feature and backed up to a compressed and deduplicated file.  The file can live anywhere that VMware ESX or the Veeam backup server can write to (even a USB drive), which gives you great flexibility.  From this point, data replication or a conventional tape backup system can be used to send your backup off site.

With Veeam, restoring a single file or a whole VM is simplified.  You can restore a VM from a backup to anywhere in your VMware environment accessible to Veeam.  This gives you the ability not just to restore a broken VM but also to deploy it as a clone.  You can then test patches and software on a VM that is identical to the one in production.

The other feature that is part of the Veeam backup product is replication.  The advantage here is that you can replicate at the VM level.  Replicating only the systems that are needed for DR can save you bandwidth and complexity in your DR environment.  Veeam also allows for near CDP (continuous data protection).  Near CDP is accomplished through a combination of changed block tracking and FastSCP.  Together, these allow you to reach lower recovery point objectives.

F3 Storage Methodology

(Or "How we help our clients pick the best storage for their needs")

 

This being my first blog post, I thought I'd take a moment to introduce myself. I'm Will Usher and I've been working for F3 Technology Partners for the past two years. Here at F3, my time is split pretty evenly between pre-sales and post-sales engineering work. My two main focuses are VMware Virtualization and Storage. This blog post is going to be about my methodology around helping a client understand their storage options and picking the solution that best fits their needs.

I classify any potential storage solution along two axes: frame-based or frame-less, and block-based or file-based. Every storage array on the market can be placed into one category on each axis. So we're all on the same page, let me give a quick background on what each category means.

Frame-based arrays are your traditional SANs. You buy a controller (or two) and however much disk you need, generally added in trays. You can continue to add disk until you have reached the limit of disk that your controller can support.

Frame-less arrays are similar in concept to grid computing systems; every time a node is added, CPU, memory, network and disk are added. You're not locked into one or two controllers the way you are with frame-based arrays.

Block Based arrays are traditionally what was considered a SAN. They don't run traditional file systems like Windows' NTFS or Solaris' ZFS. They carve up raw storage into LUNs and serve that out via block based protocols (FC, iSCSI, FCoIP, etc.)

File Based arrays use an internal filesystem, like ZFS, WAFL, or NTFS, and are able to serve files to clients via file-based protocols (CIFS/SMB, NFS, HTTP/S, FTP). They have traditionally been called NAS devices, but this is changing as many are now able to serve out block-based protocols as well.

So now we know how to classify storage arrays, but how does this help my clients find the correct storage solution? The answer is that each of these architectures has specific advantages and disadvantages.


Frame based arrays

A frame-based array will be easier to design and potentially easier to manage, thus lowering costs. When designing a frame-based array you make an assumption that there will be at most two controllers. This makes inter-controller communication easier, and it makes adding features easier. This can lead to faster development and richer feature sets.

Frame-less arrays

A frame-less array has a few advantages over its frame-based counterparts, and they come down to what I call "linear growth". With a frame-less architecture your array is composed of nodes. Generally each node has compute resources (CPU), network resources (FC, Ethernet, etc.), cache (DRAM, NVRAM, SSD), and storage (disk). This means that every time you add capacity, you're also adding compute, network, and cache resources, so you maintain a balanced system as you scale the capacity of the array. Contrast this with a frame-based array, where you must anticipate your future growth and buy for it upfront. If you underestimate your growth, you're going to end up doing a rip-and-replace to move up to the next larger controller. If you overestimate your growth, you've just wasted money on a controller that is too large for your environment.  A storage engineer (like me :) ) can help mitigate some of this risk based on experience with other clients, but unless you have a crystal ball that can tell us how much storage you're going to need in the next three to five years, it's still going to be an approximation.
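
The scaling difference can be made concrete with a toy model in Python. Every figure below (node size, controller limit, capacity targets) is a hypothetical number picked for illustration, not a vendor specification.

```python
import math

NODE_TB = 20            # hypothetical capacity added by one frame-less node
NODE_CONTROLLERS = 1    # each node brings its own compute
FRAME_LIMIT_TB = 100    # hypothetical maximum behind one controller pair


def frameless(capacity_tb):
    """Nodes needed, and the controllers that arrive with them."""
    nodes = math.ceil(capacity_tb / NODE_TB)
    return nodes, nodes * NODE_CONTROLLERS


def frame_based(capacity_tb):
    """Controllers stay fixed at two; report whether the frame is outgrown."""
    return 2, capacity_tb > FRAME_LIMIT_TB


for need in (40, 100, 160):
    nodes, node_ctrl = frameless(need)
    ctrl, outgrown = frame_based(need)
    print(need, "TB:", nodes, "nodes /", node_ctrl, "controllers;",
          "frame-based:", ctrl, "controllers, rip-and-replace:", outgrown)
```

In this model the frame-less array's compute grows in step with capacity, while the frame-based array keeps the same two controllers until the day the capacity target crosses the limit.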

Block Based arrays

Traditional SANs use a block-based design where disks are assigned to RAID groups and carved up into LUNs, which are then presented to servers as block devices. The server formats the LUN with its native filesystem and proceeds to use it like a local disk. This is easier to design, as it has fewer moving parts than a file-based array. Block-based arrays are generally less expensive in the low and mid range, and scale larger at the high end than file-based solutions. People generally think of block-based arrays as faster than file-based arrays. This does not necessarily hold true today with high-performance file-based products like the NetApp FAS and the Oracle 7000. Keep reading to find out why.

File Based arrays

Once relegated to mundane file sharing tasks, file-based arrays have improved by leaps and bounds over the past five or so years, thanks in no small part to Moore's law. File-based arrays require more CPU power to do the same amount of work as their block-based counterparts. This was an issue when we had Xeons running at 800 MHz, but now that we have six-core Xeons running at 3+ GHz we have more CPU power than we know what to do with. File-based arrays can leverage this abundance of CPU power to meet and in some cases exceed the performance of block-based arrays. They do this in several ways, including advanced caching algorithms that prefetch blocks from disk into cache before they're needed by the client.

Of course, having equal performance certainly isn't a good reason to go file-based over block-based. So why are file-based arrays gaining rapidly in popularity? Feature set and TCO. Because a file-based array has a local filesystem on the array, it unlocks a huge number of features not possible with traditional block storage, for example deduplication and encryption. One only has to look at the (admittedly dizzying) selection of software features on the NetApp website to understand my point.

The other big feature of file-based storage is what's been coined Unified Storage. A unified storage device can serve both block-based data (iSCSI, FC) and file-level data (NFS, CIFS, HTTP, FTP, etc). This means a single device can take the place of a block-based array and consolidate all of your file servers. This can save a lot of time otherwise spent patching and administering Windows file servers. Of course there are disadvantages to file-based arrays; they tend to be more expensive and they don't scale much over a petabyte.

 

When I do this for a client I have the benefit of being much more interactive. I can take their needs and pain points into account and tailor the discussion around them. By the end of the meeting a client has a much better idea of what they want, and then I can offer several different solutions that fit their needs. We can then drill down on those products and the client can weigh the pros and cons of each one. My clients appreciate that we're vendor agnostic and that we can give them solutions and let them pick the best one, rather than trying to force a specific technology on them.

Do you agree with me? Think I've got it all wrong? Please let me know in the comments.

T5220 Hardware Configuration Issue

In the process of testing LDOMs on a T5220, we came to a point where we decided to place the T5220 under the control of Oracle Enterprise Manager Ops Center.  We set up an OS provisioning job which would wipe out any LDOMS and OSes on the host and start fresh.

 During the course of the rebuild, Ops Center attempted to reset the hypervisor configuration to its factory default.  This operation encountered some kind of problem which placed the T5220 in an unusable state.  The host itself was powered off.  The ILOM showed the following conditions:

-> show /SYS

  Properties:

type = Host System
ipmi_name = /SYS
keyswitch_state = Normal
product_name = SPARC-Enterprise-T5220
product_part_number = 602-3822-09
product_serial_number = xxxxxxx
product_manufacturer = SUN MICROSYSTEMS
fault_state = Faulted
power_state = Off
 

-> show faulty

Target                   | Property          | Value
/SP/faultmgmt/0          | fru               | /SYS
/SP/faultmgmt/0/faults/0 | timestamp         | Apr 26 xxx
/SP/faultmgmt/0/faults/0 | sp_detected_fault | Apr 26 xxx ERROR: Unsupported memory configuration

Following some procedures in the T5220 documentation, I attempted the following to try and get the host back in a running state:

-> cd /HOST/bootmode

-> set config=factory-default

-> start /SYS

No luck; same problem.

It turns out that the documentation was missing a critical step: after the 'set config' above, the next step must be to reset the ILOM itself:

-> reset /SP

Following the ILOM reset, the fault was cleared. Resetting or power-cycling the host itself does not fix the problem.

Server Consolidation: Solaris, Zones and....

Solaris Zones are an increasingly popular technology for performing server consolidation in large datacenters.  Zones are part and parcel of Solaris 10 and will run on any hardware platform (SPARC or x86) where Solaris 10 is supported.  On its own, Zones technology addresses the security and isolation aspects of consolidating multiple applications or workloads on a physical host.  Combined with other products and technologies, Zones provides the basis for a complete managed solution.  Here are some examples of how the zones concept can be expanded and complemented:
 
Resource Management:  Sure, you can cram 20 or more zones onto a single host, but how do you ensure that they all get their fair share of CPU and memory resources?  Solaris 10 not only provides free, built-in resource management, but also gives you the choice of how to implement it.  Want to dedicate CPUs to a workload?  Want to limit how much real memory and swap space each zone occupies?  Or how about allowing any zone to grab as much CPU and memory as it can but relinquish some of that resource when required by other workloads?  All built into Solaris 10.
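
The last option, where a zone may use idle CPU but yields it under contention, is what share-based scheduling provides.  The Python below is only a toy model of proportional shares, not the Solaris scheduler itself; the zone names and share values are hypothetical.

```python
def cpu_entitlement(shares, busy):
    """Split 100% of CPU among busy zones in proportion to their shares."""
    active = {zone: shares[zone] for zone in busy}
    total = sum(active.values())
    return {zone: 100.0 * share / total for zone, share in active.items()}


shares = {"web": 10, "db": 30, "batch": 10}  # hypothetical zones and shares

# Only one zone is busy, so it may use the whole machine.
print(cpu_entitlement(shares, ["batch"]))
# All zones are busy, so each falls back to its proportional share.
print(cpu_entitlement(shares, ["web", "db", "batch"]))
```

In this model a zone's share only limits it when other zones are competing for the CPU; an idle host is never left unused because of a cap.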
 
High Availability:  Putting all those eggs into one basket could be risky.  Both Solaris Cluster 3.x and Symantec's Veritas Cluster Server (VCS) 5.0 provide full support for monitoring and failover of zones between hosts in a cluster.  Solaris Cluster takes the HA concept one step further, by enabling not only the failover of a zone with all of its workloads, but also a 'virtual host' clustering mode, where the applications are monitored and can be moved between two or more zones located on different hosts in the cluster.
 
Branded Containers: Solaris 8 and Solaris 9 branded containers allow you to retire old hardware by taking a flash image of the physical host and installing it into a container on a Solaris 10 host.  The branded container presents its applications with system call interfaces exactly as Solaris 8 or Solaris 9 would; in fact, because the branded container is installed with an image taken directly from a physical host, all binaries and libraries are carried over.  The magic occurs at the system call layer, where Solaris 8 or 9 system calls are translated, executed in the Solaris 10 kernel, and the results sent back to the caller in native format.  But branded containers are intended as a means to an end: enabling migration off old hardware and operating systems while planning for migration to native Solaris 10 is underway.  Vendor support for a given brand follows the support schedule for its corresponding operating system. When Solaris 8 enters its EOSL (end of support life) in March 2012, there will no longer be patches or support for the Solaris 8 container brand.
 
Capacity planning and management:  A major advantage of shared / consolidated environments is their ability to make much more efficient use of compute resources such as CPU and memory.  To fully exploit this advantage, it is important to have historical capacity data upon which to base decisions regarding capacity.  Most data used in Solaris capacity planning originates with the kernel's extended accounting facility.  The collection tool's local agent will take samples of this data and store it in a centralized database for analysis and reporting.  Major capacity management solutions (BMC Perform/Predict, Teamquest, Sun Ops Center) are all container-aware and can give planners a graphical, unified view of where their environments have excess capacity and where capacity is short.

Zones: zoneadm detach is failing with a strange error

In the process of configuring zones on Solaris 10 update 8 (10/09) in a clustered environment, F3 engineers came across the following problem:

# zoneadm -z zone_1 detach
zoneadm: zone 'zone_1': These file-systems are mounted on subdirectories of /tech/zones/zone_1.
zoneadm: zone 'zone_1':   /tech/zones/zone_1/dev/.devfsadm_synch_door

We have not yet determined what causes this; it seems to be sporadic.  It appears that .devfsadm_synch_door is a hidden mount.  To fix it, we have done:

# umount -f  /tech/zones/zone_1/dev/.devfsadm_synch_door

We can then cleanly detach the zone from the host.
