Archive

Archive for the ‘Geek’ Category

Starting at the Top…

October 8, 2009

To back up a couple steps and frame these entries a bit… There are three basic categories of efficiency in the datacenter: Supply, Demand, and Business (or Process). All three, and all of the subcategories under them, should be tracked.

Obviously, supply components have cost. Reducing the consumption and waste of the supply components is often the focus of efficiency efforts. We see tons of marketing and geek-journal coverage of new, super-efficient power distribution systems and transformer-less datacenter designs, and there is much effort in the industry to make these pieces less wasteful. For those who haven't really dabbled in the field, you would be amazed at how much power is wasted between "the pole and the plug". UPS systems, PDUs, power conversions, etc. all mean loss. Loss means power that you are paying for that never makes it to any useful processing.

The downside: the supply categories are difficult and usually expensive to change (including the labor and asset categories). The real efficiency gains that are often overlooked or given less priority are in demand and business. Demand includes the workloads and software running in your IT environment. Can you imagine the cost savings if you were able to "compress" your consumption by 30%? Maybe 50%? Capacity management, virtualization, and less obvious things (to most people) like fixing bad and under-performing code can be huge wins.

In my experience, customers (and "efficiency consultants") rarely look at the business processes, goals, and overall flow that drive the processing in the first place. Often the flow and scoping of demands from the business can have a huge impact on consumption. Do you really need 50,000 batch jobs that represent every report that has ever been run on the system? Do you really need to save those results? Do you really need to run and distribute all those reports, and to how many people? How many manual processes are involved in your IT management where the answer is always "yes", or the process is mostly repeatable and could be automated?

Examining supply, demand and business in a structured fashion, and asking “why are we doing that anyway?” can have huge returns with minimal investment. There is always “low hanging fruit” in efficiency. It is just plain dumb to keep throwing money away for the sake of tradition, habit, and legacy operations.

bill.

Categories: Geek

Sometimes it is the Obvious Things…

October 1, 2009

Slight tangent, but still on the efficiency theme. Last year, I was doing some work with a large Internet retailer, and came across a few epiphanies that apply to these monologues.

We were addressing storage, with a "private storage cloud" as the intended goal. This customer was very forward-thinking, with excellent processes in place for most of their IT environment. Some pieces seemed to be stuck in the 90s though, and were based on "traditional operating practices" rather than the realities of their business.

Simple example: What happens when the tablespaces of your database are running out of room? Traditionally (and in this customer's environment), a monitoring agent on the server sent an alarm to the service desk, and opened a trouble ticket. The ticket was then assigned (by hand) to one of the DBAs, who could go into the database management interface and "add tablespace" using some available disk space. What happens when you run out of available disk space? A trouble ticket is opened for the storage group to allocate another chunk of storage to the system. What happens when the storage array starts running short of available chunks? A trouble ticket is opened for the manager of the storage group to either add capacity through a "Capacity on Demand" contract, order an upgrade, or purchase an additional storage array.

What's wrong with this picture? Nothing, according to the classic flow of IT management. In reality, there are way too many people doing manual processes in this flow. Simple business question: If the system taking orders from your customers is running out of tablespace to store the orders, are you really ever going to say no to adding more disk space? No? Then why do we have a series of binary decisions and checkpoints on the way to satisfying the demand?

Automation and self-service are key components of the cloud “ilities”, but can also stand alone as an efficiency play in almost every IT environment. We often execute traditional processes and practices rather than mapping the real business needs and constraints against the technology capabilities. In this example, a few scripts to automate adding tablespaces, creating new tablespaces, adding filesystems, adding and assigning storage chunks, and pre-provisioning those storage chunks in a standardized fashion saved countless hours of human intervention. Each of those activities is simple and repeatable, and the scripting reduces the opportunities for error. Throw in informational alerts to the interested parties, instead of trouble tickets requiring action, and the efficiency of this little piece of the puzzle is greatly improved.
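To make that concrete, here is the flavor of script I am talking about. This is a minimal sketch, assuming an Oracle-style database; every name in it (the ORDERS_DATA tablespace, the datafile path, the mail address) is made up for the example:

#!/bin/ksh
# Hypothetical auto-extend check: act and inform, instead of opening a trouble ticket.
# All names (ORDERS_DATA, paths, mail addresses) are illustrative.
THRESHOLD=90
PCT_USED=$(sqlplus -s "/ as sysdba" <<EOF
set heading off feedback off pagesize 0
select round(used_percent) from dba_tablespace_usage_metrics
 where tablespace_name = 'ORDERS_DATA';
exit
EOF
)
PCT_USED=$(echo $PCT_USED)   # trim whitespace from the sqlplus output
if [ "$PCT_USED" -ge "$THRESHOLD" ]; then
    # Add the space now; the business answer to "do we add space?" is always yes.
    sqlplus -s "/ as sysdba" <<EOF
alter tablespace ORDERS_DATA
  add datafile '/u02/oradata/orders/orders_auto01.dbf' size 4g autoextend on;
exit
EOF
    # Informational alert to the interested parties, not a ticket requiring action.
    echo "ORDERS_DATA was extended automatically" | mailx -s "ORDERS_DATA extended" dba-team@example.com
fi

Each piece is trivial on its own; the win is taking humans out of the loop for decisions that only ever have one answer.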

Some parts do remain mostly "traditional", such as the procurement of new storage assets, but even those business functions are streamlined as part of this evolution. Once IT realizes that the business needs follow a predictable pattern of simple actions and reactions, automation becomes a simple task. After all, what Internet retailer wants a potential customer to click on the "check out now" button and get some weird error message because of a storage shortage, when they just spent an hour filling the cart with all of those items that they need? It isn't just a lost transaction; it could be a lost customer.

Well, back to my day job for now.

bill.

Categories: Geek

Private Clouds, the Evolutionary Approach

September 30, 2009

Continuing with the ramblings of my last entry, since I am up late with children and their dozens of excuses as to why they are not asleep…

Now that we have defined the "ilities" that we want from our private cloud efforts, we can examine each of them and look for obvious opportunities with high returns: people costs, IT CAPEX, IT OPEX, energy costs, operational complexity, service levels, risk, and any other opportunities that we can target and quantify. One major rule here is that when we pick a target and an approach, we must also have a "SMART" set of goals in place.

For the .21% of readers who have never heard the SMART acronym before, it stands for Specific, Measurable, Attainable, Realistic, and Timely. In other words, for every action that we plan to take, or every improvement that we want to deploy, we must have a measurable set of criteria for success. It amazes me how many IT managers do not know the "average utilization of server systems in the data layer during peak shift". Yeah, that is pretty darn specific, but ask yourself: do you know what your company's utilization is during the prime workday cycles? Bingo. We need a baseline for whatever metrics we choose to measure success against, for each project and change to our IT operations.

Sidenote: If the answer to the previous question was “Yes”, and the utilization is anywhere above 30% during workday peak shift hours, I am impressed.
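For the record, a quick-and-dirty way to pull that baseline on a Solaris box, assuming the stock sar accounting (the sys crontab entries) is enabled and calling 08:00 to 18:00 the peak shift:

# Average CPU utilization (100 - %idle) for today's peak shift, from sar data:
sar -u -s 08:00 -e 18:00 | awk '$1 ~ /:/ && $NF ~ /^[0-9]/ { idle += $NF; n++ }
    END { if (n) printf("peak-shift average utilization: %.1f%%\n", 100 - idle/n) }'

Not perfect, but it beats "I have no idea".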

So where are the obvious targets? I have already hit on one of them: system utilization and idle processing cycles. Systems consume electricity and generate heat (those servers are actually very efficient space heaters), which drives cooling and air circulation requirements, and odds are that a majority of the processing potential is not being used for processing.

Consolidation? Maybe. Capacity Planning? Definitely. Capacity Management? Absolutely! Consolidation is a valid target project, but is usually approached as a one-time, isolated event. Consolidation does not necessarily change the behavior that caused over-sizing to begin with, or help when workloads are seasonal or sporadic. These variable workloads most often result in systems that are sized for "peak load", with lots of idle cycles during off-hours and off-days (and sometimes off-months).

The first step to a consolidation is capacity planning, including the key step of generating a baseline of capacity and consumption. If, instead of treating this as a one-time event, we start monitoring, reporting, and trending on capacity and consumption, we have now stepped into the realm of Capacity Management. We can watch business cycles, transactional trends, traffic patterns, and system loads, and project the processing needs in advance of growth and demands. What a concept.

Now imagine a world where we could dynamically allocate CPU resources on demand, juggle workloads between systems with little or no downtime, and use systems of differing capacity to service workloads with differing demands. Wow. That sounds like one of those "ilities" that we were promised with that "Cloud" concept. Dynamic resource allocation and resource sharing, possibly with multi-tenancy to maximize utilization of compute resources. Yep. Sure is. Ignoring the "Cloud" word, let's look at how we can build this "Cloud-like capability" into our existing IT environment without bringing in a forklift to replace all of our systems and networks, and without spending billions.

Breaking down those technology pieces necessary to execute against that plan, we need Capacity Management (TeamQuest, BMC, Tivoli, pick your tool that does capacity and service level management). The tool doesn’t matter. The process, the knowledge generated, and the proactive view of the business matter. Caveat: Define your needs and goals *before* buying tools that you will never fully implement or utilize!

So now we know what our hour-by-hour, day-by-day needs are, and can recognize and trend consumption. We can even start to predict consumption and run some "what if" scenarios. The next step is dynamic capacity, which, in this context, includes Resource Sharing, Dynamic Allocation, Physical Abstraction (maybe), Automation (hopefully, to some degree), and Multi-Tenancy from the right-hand "Business Drivers" column in my last blahg entry. Sure, we can juggle and migrate these workloads and systems by hand, but the complexity and risk of moving those applications around is ridiculous. We need a layer of physical abstraction in order to move workloads around, and to stop thinking of a "system" as a box running an application.

There are many ways to do this, so pick the solution and products that best fit your IT world. You can create “application containers”, or standard operating environments for your applications, and juggle the “personalities” running in the physical machines. Not easy. Most apps will likely not move in easily. Still a good goal to reduce variance and complexity in your environment. In this case, not a quick hit, as you will end up touching and changing most of your applications.

The obvious answer (to me and 99.6% of the geeks reading this) is to employ virtualization to de-couple the application from the operating environment, and the operating environment from the physical hardware (and network, and storage). Solaris Containers, LDOMs, VMware, Xen, xVM software in OpenSolaris, Citrix, fast deployment and management tools: the options and combinations are all over the map. The deciding factors will be cost, capabilities, management tools (monitoring, reporting, and intelligence), and support of your operational and application needs. The right answer is very often a combination of several technology pieces, with a unifying strategy to accomplish the technical and business goals within the constraints of your business. There are many of us geeky types who can help to define the technology pieces to accomplish business goals. Defining those business goals, drivers, and constraints is the hard part, and must be done in IT, in "the business", and across the corporate organization that will be impacted and serviced.

There, we have some significant pieces of the "private cloud" puzzle in place. If the server systems were severely under-utilized, and we were able to move a significant number of them into our new "managed, dynamic capacity" environment, we should be able to realize power, cooling, and perhaps even license cost savings to balance the cost of implementation. One interesting note here: if I have "too many servers with too many idle cycles" in my datacenter, why should a vendor come in leading with a rack full of new servers? Just wondering. Personally, I would prefer to invest in a strategy, develop a plan, identify my needs and the metrics that I would like improved, and then, maybe, invest in technology towards those goals.

Just the late night ramblings of an old IT guy.

Next entry will likely talk more about the metrics of “how much are we saving”, and get back to those SMART goals.

bill.

Categories: Geek

Unicorns, Leprechauns, and Private Clouds…

September 30, 2009

Numerous conversations and customer projects over the past few weeks have motivated me to exit from the travelogue and world adventures entries here, and get back to geeky writing for a bit.

Yes, cloud. We all “get it”. We all see Amazon and the others in the public cloud space doing really cool things. Somewhere along the way, some of the message got scrambled a bit though. With any luck, I’ll clear up some of the confusion, or at least plant some seeds of thought and maybe even some debate with a couple monologues.

Non-controversial: Let’s talk about three kinds of clouds. Most folks in the industry agree that there are Public clouds like Amazon’s AWS, Joyent’s Public Cloud, and GoGrid. That one is easy. In theory, there are “private clouds”, where the cloud exists within the IT organization of a customer (note that I did not say “within the four walls of the datacenter”), and “hybrid clouds” that allow a private compute infrastructure to “spill over” to a public cloud, as expandable capacity, disaster recovery, or dynamic infrastructure.

No hate mail so far? Good, I’m on a roll.

So do private clouds exist? Maybe. If we hop into the WayBack Machine, John Gage said it best: "The Network is the Computer." Let's dive a little deeper into what makes a computing infrastructure a "cloud":

[Table: technical details and components | goals of implementing a private cloud | business drivers and control points]

Unfortunately, most people start on the left side, with the technical details. This rarely results in a productive discussion, unless the center and right columns are agreed on first. We put a man on the moon; I think we can solve content structure and locking/concurrency in distributed and flexible applications. We don't want to start a cloud discussion with APIs, protocols, and data formats. We want to start with business drivers, and justifiable benefits to the business in costs, value, and security.

The center column describes, at a very high level, the goals of implementing a "private cloud", while the right column lists the control points where a private cloud architecture would create efficiencies and other benefits. If we can agree that the center column is full of good things that we would all like to see improved, we can apply the business drivers to our IT environment and business environment to start looking for change. Every change will be a cost/benefit decision, and many will surface conflicts between business process and technical implementation. For example, is your business ready to give all IT assets to "the cloud", and start doing charge-backs? In many corporations, the business unit or application owner acquires and maintains computing and storage assets. In order for a shared environment to work, everyone must "share" the compute resources, and generally pay for the usage of them on a consumption basis. This is just one example of where the business could conflict with the nirvana of a private cloud. You can imagine trying to tell a business application owner who just spent $5M on IT assets that those assets now belong to the IT department, and that they will now be charged for using them.

So is private cloud impossible? No. Is private cloud achievable? Probably. Does private cloud fit your current business processes and needs? Probably not. Do the benefits of trying to get there outweigh the hassles and heartaches of trying to fit this square peg into the octagon-shaped hole without using a large hammer? Most definitely.

Some attributes and motivators for cloud computing have huge benefits, especially on the financial side of the equation. Virtualization in particular brings great benefits: lower migration downtime requirements, live migration capabilities, new and innovative disaster recovery options, and workloads that can "balance" more dynamically than ever before. Capacity planning has always been a black art in the datacenter, and every model only survives as long as the workloads are predictable and stable. How many applications have you seen in your datacenter that don't bloat and grow? How many businesses actually want their number of customers and transactions to remain stable? Not many, at least not many that survive very long.

So, to wrap up this piece (leaving much detail and drama for future blahg entries)… Private clouds probably don't come in a box, or have a list of part numbers. Business processes and profiles need to adjust to accommodate the introduction of cloud-enabling technologies and processes. Everyone, from the CIO, to IT, to legal, to business application owners, has to buy into the vision and work together to "get more cloud-like". And finally, the business discussion, business drivers, and evolutionary plan must be reasonably solid before any pile of hardware, software, and magic cloud devices is ordered. The datacenter will go through an evolution to become more "private cloud", while a revolution will likely mean huge up-front costs, indeterminate complexity and implementation drama, and questionable real results.

bill.

Categories: Geek

More from Beijing…

April 24, 2009

A couple more pics from my week in Beijing that might be interesting. Since I mentioned our new, fancy caffeinated beverage machine here in the Solution Center, here is a picture of the beast:


[Image: the new caffeinated beverage machine in the Solution Center]

And here is another little item that I found interesting. This is a picture that I took of a sign posted in my flex office space. Apparently if I leave my bicycle in the flex office space, it will be removed, along with any decor that happens to clash with the office themes and decoration. I will have to keep my strange and loud ties out of view! Just to put this in context, the “swanky” flex spaces are about 1.5m of desk space with power and network. Unless you put the bicycle on the desk, I don’t see how storing one in the flex space would be possible. Maybe we can put some bike racks hanging from the ceiling?


[Image: the sign posted in the flex office space]

bill.

Categories: Geek

SunOS Rises Again, Better than Ever!

November 7, 2008

After 15 months of bozos whacking away with power tools…

YAY!! And yes, my 2008 Hybrid Mercury Mariner has “Solaris” license plates. And yes, that is a 1975 Bricklin SV1 Gullwing in the background, and no, I haven’t touched it in 3 years, and yes, if you’d like to take it off my hands, I would entertain offers.

bill.

Categories: Geek

Back to the real world…

November 7, 2008

I spent the past couple of months working on a project that had way too many lawyers involved. I didn’t want to blog about it, as dealing with lawyers and “content review” for everything I decided to write during the project would have made my head explode.

Now that the project is finished, and the intellectual property sharks are done reviewing and blessing things, I feel a bit more open about sharing my experiences. I did get to work with a great geek who also happens to be an actor. I learned a ton about storage and cloud-like things using virtualization layers.

Since the project finished, I have been working with the xVM Server folks on the Early Access Program, details here and here.

I'll throw some screen shots and info up soon. For now, I have a borrowed single-CPU, dual-core, 8GB-memory x2200 M2 sitting in a datacenter 1500 miles away to run my tests on. As of today, it is running the xVM Server software (EA2), with Solaris 10 update 6, a Solaris Express Community Edition release (Nevada 101a), Windows Server 2003 x64 Enterprise Edition, and Windows Server 2008 x64 Datacenter Edition. All on the same machine. All managed from a single desktop filled with console windows. My Windows Server guest systems even have remote console working, with decent performance to my desktop at home over VPN.

Yay!

bill.

Categories: Geek

Wikis for Dummies…

August 22, 2008

The folks at The CommonCraft Show have produced a bunch of interesting videos in a series called "in Plain English", explaining how technology (and Zombies) work. Very cool stuff.

Enjoy…

bill.

Categories: Geek

Tupperware comes in sets…

August 13, 2008

Continuing where I left off, the previous blahg entries addressed installation of the Solaris 8 branded container. Those pieces covered the mechanics of the container itself. One of the key architectural decisions in this process was "where do we put the stuff?". Not just mountpoints and filesystems (we already covered that), but which pieces go on local disk storage, and which pieces go on the shared SAN storage?

Since the objective is to eventually integrate into a failover scenario, we looked at two options here. Each one has benefits and can supply a capability to our final solution. In the first case, we want to fail a container over to an alternate host system. In the second case, we want to fail a container over to an alternate datacenter. Think of these two as “Business Continuity” and “Disaster Recovery”.

In the Business Continuity case, the capability to do "rolling upgrades" as part of the solution would be a huge added bonus. We decided to put the zone itself on local disk storage, and the application data on the shared SAN storage. This allows us to "upgrade" a container, roll the application in, and still maintain a "fallback" configuration in case the upgrade causes problems, with minimal downtime. Accomplishing this requires two copies of the container. Application data "rollback" and "fallback" scenarios are satisfied with the shared SAN storage itself, through snapshots and point-in-time copies.

Similar to a cluster failover pair, both zones have their own patch levels and configurations, and a shared IP address can be used for accessing application services. Only one zone can be “live” at any time as these two zones are actually copies of the same zone “system”.

To migrate the branded container to another host system, the zone must be halted, and the shared SAN storage volumes must be detached and unmounted from the original host system:
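The commands look something like this; the zone and disk group names are made up for illustration, and the storage commands assume VxVM (swap in your own volume manager as needed):

# On the source host: halt and detach the zone, then release the shared storage
zoneadm -z [zonename] halt
zoneadm -z [zonename] detach
umount /z/[zonename]_app
vxdg deport [zonename]dg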

The detach operation saves information about the container and its configuration in an XML file in the ZONEPATH (/zones/[zonename] in our configuration). This will allow the container to be created on the target system with minimal manual configuration through zonecfg.

The detached container's filesystem can now be safely copied to the new target system. The filesystem will be backed up and then restored onto the target system. There are many utilities that can create and extract backup images; this example uses the "pax" utility. A pax archive can preserve information about the filesystem, including ACLs, permissions, creation, access, and modification times, and most types of "special files" that are persistent. Make sure that there is enough space on both the source system and the target system to hold the pax archive (/path/to/[zonename].pax in the example), as the image may be several gigabytes in size. Some warnings may be seen during the pax archive creation: some transient special files cannot be archived, but they will be re-created on the target system when the zone boots.
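Something along these lines, with placeholder paths; the -@ and -p e options tell Solaris pax to preserve extended attributes, permissions, and times:

# On the source host, from inside the detached ZONEPATH:
cd /zones/[zonename]
pax -w@f /path/to/[zonename].pax -p e *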

On the target system, the zone filesystem space must be configured and mounted, and have “700” permissions with owner root. The /zones loopback mount must also be in place, just as in the source system.

Since the zone filesystem is not on shared storage, and will remain local to the target system, the “mount at boot” option can be set to “yes”.
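One plausible way to lay that out on the target host (device, disk group, and path names are illustrative, following the /z plus loopback arrangement used in this configuration):

# Local zone filesystem on the target host
mkdir -p /z/[zonename]_zone /zones/[zonename]
mount -F vxfs /dev/vx/dsk/localdg/[zonename]_vol /z/[zonename]_zone
mount -F lofs /z/[zonename]_zone /zones/[zonename]
chown root:root /zones/[zonename]
chmod 700 /zones/[zonename]
# /etc/vfstab entries, with "mount at boot" set to yes since this storage stays local:
# /dev/vx/dsk/localdg/[zonename]_vol /dev/vx/rdsk/localdg/[zonename]_vol /z/[zonename]_zone vxfs 2 yes -
# /z/[zonename]_zone - /zones/[zonename] lofs - yes -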

Storage for the applications and data should now be imported and mounted on the target system to replicate the configuration of the source system. All mountpoints, loopback filesystems, and targets of the "add fs" components of the zone must be replicated.

Once the filesystems are mounted into the global zone, the zone pax archive can be extracted. Again, care must be taken to make sure that there is sufficient space on the zone filesystem for the extraction:
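The extraction is the mirror image of the archive step (paths again are placeholders):

# On the target host, extract into the new, empty zone path:
cd /zones/[zonename]
pax -r@f /path/to/[zonename].pax -p e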

The filesystem of the zone is now in place, but the zone is not yet configured on the target system. The zone must be created, modified as necessary (e.g. different network adapter hardware or device naming), and "attached" to the new host system. As a sanity check, it is highly recommended that the /usr/lib/brand/solaris8/s8_p2v command be run against the new zone to make sure that the new system "accepts" the attach of a zone created elsewhere:
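One plausible sequence looks like this; exactly what you change in zonecfg, and where s8_p2v fits relative to the attach, will depend on the hardware differences between your hosts:

# On the target host: rebuild the zone configuration from the detached XML
zonecfg -z [zonename] "create -a /zones/[zonename]"
# Adjust anything host-specific (NIC names, device paths, "add fs" entries) with zonecfg, then:
zoneadm -z [zonename] attach
# Sanity-check the imported image against the new host before the first boot:
/usr/lib/brand/solaris8/s8_p2v [zonename]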

The “attach” command may fail with messages about patch version conflicts, as well as extra or missing patches. Even though this is a full root zone, the detach/attach functionality makes sure that the host systems are equivalent. Some patches will be missing or extra in some cases, especially where the machine types or CPU types are different (sun4u, sun4v, HBA types and models, Ethernet adapter hardware differences, etc.). It is possible to normalize all patch versions and instances across systems of different configurations and architectures, but this involves significant effort and planning, and has no real effect on the operation of the hosting systems or the hosted zones (patching software that will never run on a given machine).

Once all errors and warnings are accounted for as “accepted deltas” or resolved, a failed attach can be forced:
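With the force flag, that is simply:

zoneadm -z [zonename] attach -F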

Zone migration can be toggled between the machines by halting the zone, detaching the zone, moving the shared SAN storage to the target system, attaching the zone, and booting the zone. Once the zone has been installed, configured, and booted on both systems, there is no need to use the s8_p2v function for migration. Strictly speaking, the "detach/attach" function is not necessary, since the zone itself resides locally and is not actually migrating, but it does provide an extra layer of protection on the non-active machine to keep the halted zone from being booted while the shared storage is not active. With the zone state set to "detached", the zone will not boot unless the "attach" command is executed first, which provides a check for the shared SAN storage configured with the "add fs" zone configuration.
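Put together, the steady-state "toggle" looks roughly like this (names illustrative, VxVM assumed for the shared storage):

# On the currently active host:
zoneadm -z [zonename] halt
zoneadm -z [zonename] detach
umount /z/[zonename]_app
vxdg deport [zonename]dg
# On the standby host:
vxdg import [zonename]dg
vxvol -g [zonename]dg startall
mount -F vxfs /dev/vx/dsk/[zonename]dg/appvol /z/[zonename]_app
zoneadm -z [zonename] attach
zoneadm -z [zonename] boot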

Pretty simple, huh? In fact, if you step back and look at the flow above, it looks mysteriously like the functionality of a cluster failover. Once we modeled and tested these actions by hand, we integrated the pair of containers into Veritas Cluster Server and managed the zones through the VCS GUI. Online, offline, failover… It all just works. Very cool stuff.

bill.

Categories: Geek

Tupperware footnote…

August 12, 2008

A couple folks have emailed me asking about my use of VxVM and VxFS in this physical-to-virtual conversion. As I have blahg'd before, separation is a good thing. In this case, we had Veritas Volume Manager and Veritas Filesystem on the source machine and on the target machine. Volume Management and Filesystem Management should live within the Global Zone, not in local zones (or non-global zones, as they are officially called). Mixing "system" activities within zones is a BadThing[tm], especially when the zone is a branded container. Trying to use Solaris 8 filesystem and volume management binaries (even through the branded container software) against a Solaris 10 system and kernel, and possibly against a hardware architecture (sun4v) unknown to Solaris 8, is a dangerous path to walk.

Definitely too much risk in there for me to even attempt to whack it into working. 🙂

We installed our Volume Management (VxVM) and Filesystem (VxFS) on the Solaris 10 target host system using the Veritas Foundation Suite (5.0 Maintenance Pack 1 Rolling Patch 4). All of the storage goodies were installed and configured as local objects on the Solaris 10 host system, and mounted under the /z/[zonename]_[function] pathnames as described earlier. The lofs loopback mounts and zonecfg "add fs" pieces mapped them into the places that we wanted them to be, just providing "disk space" to the Solaris 8 branded containers. We did use zonecfg "add fs" with a type of vxfs, and it worked as advertised. In the end, we decided that the VxFS pieces are a "system function" and should be mounted in the Global Zone under /z for simplicity and consistency.
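For illustration, the lofs mapping for that final arrangement ends up looking something like this in zonecfg (the /data and /z/[zonename]_data paths are placeholders):

zonecfg -z [zonename]
zonecfg:[zonename]> add fs
zonecfg:[zonename]:fs> set dir=/data
zonecfg:[zonename]:fs> set special=/z/[zonename]_data
zonecfg:[zonename]:fs> set type=lofs
zonecfg:[zonename]:fs> end
zonecfg:[zonename]> commit
zonecfg:[zonename]> exit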

Who knows, at some point we might even use ZFS instead of VxFS in this configuration (more on that in a later blahg entry), and this approach keeps the "zone space" agnostic to the underlying filesystem and storage choices.

bill.

Categories: Geek