I haven’t written any entries for a while, and with pretty good reason. I left Oracle (or SunOracle) and joined Huawei in Shenzhen, China. That’s right, I moved to China, wife, kids, luggage, the whole pile.
A very cool, and very big Chinese company that not many people outside of the telco business have heard of. Over 100,000 employees, and a main campus measured in square kilometers instead of acres. Crazy big, crazy innovative, and with a portfolio of tons of new toys and gadgets. Yeah, I am having fun.
Woohoo! Let your jacket flap in the wind…
I won’t really miss the word “Microsystems” all that much, and I do like the change of font. Sun Microsystems is dead, long live Sun (and SPARC, and Solaris, and Oracle).
A great trip, amazing weather, decent food, and fun projects. Nothing beats sitting on a beach at 10pm with a nice 20C breeze coming in off of the Mediterranean, eating hummus and a kabob by the light of a candle in a paper bag. In January.
The primary purpose of the trip was to look at some modernization opportunities, mostly older mainframes. It has been many years since I have heard acronyms like ADS/O, IDMS, CICS, and BMS used exclusively for hours of conversation. Fun stuff. The Clerity guys have some amazing tools and software for rehosting, and BluePhoenix makes the data and application conversions relatively pain-free. These guys definitely make my job easier, as I can concentrate on the IT services side of the rehosted system, running natively on Oracle under Solaris, with proper job and resource control, fully integrated into the “Open Systems” side of the customer’s IT operations.
On the downside, I had to give a lesson in the three steps of customer service to a flight crew after the AC’s air exchanger dripped on me for the first hour of my 12+ hour flight home. Step one, fix my pain (hand me paper towels to dry off my stuff and my shoulder). Step two, make sure it doesn’t continue to impact the customer (move me to a new seat). Step three, make sure it doesn’t happen again (put the issue on the maintenance log for the aircraft to make sure it gets fixed before the next customer gets rained on). I was amazed that the crew didn’t understand these three basic steps to happy customers. Of course, that would probably explain why United flights hopping through Frankfurt are full, while Continental direct flights have plenty of empty seats. Customer service and customer experience have a direct impact on customer loyalty. (sigh.) And yes, I am 6’5″ (192cm) tall, and am not willing to move back to the “no legroom section” of the plane, from my aisle seat to a middle seat against a bulkhead, which will not recline. Moving a customer because you have inconvenienced them should be a move to an equal or better situation, not a punishment or additional inconvenience. (double sigh.)
But mostly, I’m just waiting for the exciting days ahead. No matter what happens, life will surely be “interesting” to say the least. Buckle up, it’s going to be a wild ride.
To back up a couple steps and frame these entries a bit… There are three basic categories of efficiency in the datacenter: Supply, Demand, and Business (or Process). All three, and all of the subcategories under them, should be tracked.
Obviously, supply components have cost. Reducing the consumption and the waste of the supply components is often the focus of efficiency efforts. We see tons of marketing and geek-journal information about new, super-efficient power distribution systems and transformer-less datacenter designs, and there is much effort in the industry to make these pieces less wasteful. If you haven’t really dabbled in the field, you would be amazed at how much power is wasted between “the pole and the plug”. UPS systems, PDUs, power conversions, etc. all mean loss. Loss means power that you are paying for that never makes it to a useful function of processing.
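To put a rough number on “the pole and the plug”, here is a minimal Python sketch. The losses at each stage multiply, so a few individually decent stages still add up to real waste. The stage names and efficiency figures below are made-up illustrations, not measurements from any real datacenter; plug in your own.

```python
from math import prod

def delivered_fraction(stage_efficiencies):
    """Fraction of utility power that survives every stage in the chain."""
    return prod(stage_efficiencies)

def wasted_kw(utility_kw, stage_efficiencies):
    """Power paid for at the pole that never reaches the plug."""
    return utility_kw * (1.0 - delivered_fraction(stage_efficiencies))

# Hypothetical per-stage efficiencies (assumptions, not vendor data).
chain = {"transformer": 0.98, "ups": 0.90, "pdu": 0.97}

if __name__ == "__main__":
    fraction = delivered_fraction(chain.values())
    print(f"Delivered to the plug: {fraction:.1%}")
    print(f"Wasted out of 1000 kW: {wasted_kw(1000, chain.values()):.1f} kW")
```

With these assumed figures, roughly one kilowatt in seven is paid for and never does any processing.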
The downside: the supply categories are difficult and usually expensive to change (including labor and asset categories). The real efficiency gains that are often overlooked or given less priority are in demand and business. Demand includes the workloads and software running in your IT environment. Can you imagine the cost savings if you were able to “compress” your consumption by 30%? Maybe 50%? Capacity management, virtualization, and less obvious things (to most people) like fixing bad and under-performing code can be huge wins.
In my experience, customers (and “efficiency consultants”) rarely look at the business processes, goals, and overall flow that drive the processing in the first place. Often the flow of demands, and the scoping of demands from the business, can have a huge impact on consumption. Do you really need 50,000 batch jobs that represent every report that has ever been run on the system? Do you really need to save those results? Do you really need to run and distribute all those reports, and to how many people? How many manual processes are involved in your IT management where the answer is always “yes”, or the process is mostly repeatable and could be automated?
Examining supply, demand and business in a structured fashion, and asking “why are we doing that anyway?” can have huge returns with minimal investment. There is always “low hanging fruit” in efficiency. It is just plain dumb to keep throwing money away for the sake of tradition, habit, and legacy operations.
Slight tangent, but still on the efficiency theme. Last year, I was doing some work with a large Internet retailer, and came across a few epiphanies that apply to these monologues.
We were addressing storage, with a view toward a “private storage cloud” as the intended goal. This customer was very forward-thinking in much of their environment, with excellent processes in place for most of their IT environment. Some pieces seemed to be stuck in the 90’s though, and were based on “traditional operating practices” rather than the realities of their business.
Simple example: What happens when the tablespaces of your database are running out of room? Traditionally (and in this customer’s environment), a monitoring agent on the server sent an alarm to the service desk, and opened a trouble ticket. The ticket was then assigned (by hand) to one of the DBAs, who could go into the database management interface and “add tablespace” using some available disk space. What happens when you run out of available disk space? A trouble ticket is opened for the storage group to allocate another chunk of storage to the system. What happens when the storage array starts running short of available chunks? A trouble ticket is opened for the manager of the storage group to either add capacity through a “Capacity on Demand” contract, order an upgrade, or purchase an additional storage array.
What’s wrong with this picture? Nothing according to the classic flow of IT management. In reality, there are way too many people doing manual processes in this flow. Simple business question: If the system taking orders from your customers is running out of tablespace to store the orders, are you really ever going to say no to adding more disk space? No? Then why do we have a series of binary decisions and checkpoints on the way to satisfying the demand?
Automation and self-service are key components of the cloud “ilities”, but can also stand alone as an efficiency play in almost every IT environment. We often execute traditional processes and practices rather than mapping the real business needs and constraints against the technology capabilities. In this example, a few scripts to automate adding tablespaces, creating new tablespaces, adding filesystems, adding and assigning storage chunks, and pre-provisioning those storage chunks in a standardized fashion saved countless hours of human intervention. Each of those activities is simple and repeatable, and the scripting reduces the opportunities for error. Throw in informational alerts to the interested parties, instead of trouble tickets requiring action, and the efficiency of this little piece of the puzzle is greatly improved.
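For the geeks who want to see the shape of it, here is a small Python sketch of that escalation chain with the tickets replaced by automatic actions and informational notices. This is not the customer’s actual scripting. The `Capacity` class, the thresholds, and the chunk size are all hypothetical, and the real work (the database and storage commands) is reduced to simple arithmetic so the logic stays visible.

```python
from dataclasses import dataclass, field

@dataclass
class Capacity:
    """Toy model of one database server's storage (all names hypothetical)."""
    tablespace_free_gb: float
    filesystem_free_gb: float
    array_free_chunks: int
    chunk_gb: float = 100.0
    notices: list = field(default_factory=list)

def ensure_tablespace(cap, min_free_gb=20.0, grow_gb=50.0, min_chunks=2):
    """Grow the tablespace when it runs low, pulling storage up the chain."""
    if cap.tablespace_free_gb >= min_free_gb:
        return cap  # nothing to do, nobody to bother

    # Not enough filesystem space to grow into: take a chunk from the array.
    if cap.filesystem_free_gb < grow_gb:
        if cap.array_free_chunks == 0:
            # The one step that still needs a human: buying more storage.
            cap.notices.append("ACTION: array exhausted, procurement needed")
            return cap
        cap.array_free_chunks -= 1
        cap.filesystem_free_gb += cap.chunk_gb
        cap.notices.append("INFO: assigned one storage chunk to the filesystem")

    cap.filesystem_free_gb -= grow_gb
    cap.tablespace_free_gb += grow_gb
    cap.notices.append(f"INFO: extended tablespace by {grow_gb:g} GB")

    # Warn early so procurement is planned instead of panicked.
    if cap.array_free_chunks < min_chunks:
        cap.notices.append("INFO: array running low, plan a capacity order")
    return cap
```

Notice that only one branch asks a person to do anything. Everything else is a notice that somebody can read with their morning coffee.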
Some parts do remain mostly “traditional”, such as the procurement of new storage assets, but even those business functions are streamlined as part of this evolution. Once IT realizes that the business needs follow a predictable pattern of simple actions and reactions, automation becomes a simple task. After all, what Internet retailer wants a potential customer to click on the “check out now” button and get some weird error message because of a storage shortage, when they just spent an hour filling the cart with all of those items that they need? It isn’t just a lost transaction, it could be a lost customer.
Well, back to my day job for now.
Continuing with the ramblings of my last entry, since I am up late with children and their dozens of excuses as to why they are not asleep…
Now that we have defined the “ilities” that we want from our private cloud efforts, we can examine each of them and look for obvious opportunities with high returns. People cost, IT CAPEX, IT OPEX, energy costs, reducing operational complexities, improving service levels, reducing risk, and any other opportunities that we can target and quantify. One major rule here is that when we pick a target and an approach, we must also have a “SMART” set of goals in place.
For the .21% of the readers who have never heard the SMART acronym before, it stands for Specific, Measurable, Attainable, Realistic, and Timely. In other words, for every action that we plan to take, or every improvement that we want to deploy, we must have a measurable set of criteria for success. It amazes me how many IT managers do not know the “average utilization of server systems in the data layer during peak shift”. Yeah, that is pretty darn specific, but ask yourself, do you know what your company’s utilization is during the prime workday cycles? Bingo. We need a baseline for whatever metrics we choose to measure success for each project and change to our IT operations.
Sidenote: If the answer to the previous question was “Yes”, and the utilization is anywhere above 30% during workday peak shift hours, I am impressed.
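If you want to generate that baseline yourself, the arithmetic is not hard. Here is a minimal Python sketch, assuming you can export utilization samples as (timestamp, CPU percent) pairs from whatever monitoring tool you have. The 9-to-5 weekday definition of “peak shift” is my assumption for the example; use your own business hours.

```python
from datetime import datetime
from statistics import mean

def peak_shift_average(samples, start_hour=9, end_hour=17):
    """Average utilization over weekday samples inside the peak shift.

    samples: iterable of (datetime, cpu_percent) pairs.
    Returns None when no sample falls inside the window.
    """
    peak = [
        pct
        for ts, pct in samples
        if ts.weekday() < 5 and start_hour <= ts.hour < end_hour
    ]
    return mean(peak) if peak else None

if __name__ == "__main__":
    # Made-up samples: two peak-shift readings, one at night, one on Saturday.
    samples = [
        (datetime(2024, 1, 8, 10, 0), 20),
        (datetime(2024, 1, 8, 14, 0), 30),
        (datetime(2024, 1, 8, 22, 0), 90),
        (datetime(2024, 1, 13, 10, 0), 80),
    ]
    print(peak_shift_average(samples))
```

Only the two weekday daytime readings count here, so the nightly batch spike and the weekend sample do not flatter the number.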
So where are the obvious targets? I have already hit on one of them, system utilization and idle processing cycles. Systems consume electricity and generate heat (those servers are actually very efficient space heaters), resulting in cooling requirements and air circulation requirements, and odds are that a majority of the processing potential is not being used for processing.
Consolidation? Maybe. Capacity Planning? Definitely. Capacity Management? Absolutely! Consolidation is a valid target project, but is usually approached as a one-time, isolated event. Consolidation does not necessarily change the behavior that caused over-sizing to begin with, or help when workloads are seasonal or sporadic. These variable workloads most often result in systems that are sized for “peak load”, with lots of idle cycles during off hours and off-days (and sometimes off-months).
The first step to a consolidation is capacity planning, including the key step of generating a baseline of capacity and consumption. If, instead of treating this as a one time event, we start monitoring, reporting, and trending on capacity and consumption, we have now stepped into the realm of Capacity Management. We can watch business cycles, transactional trends, traffic patterns, and system loads and project the processing needs in advance of growth and demands. What a concept.
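As a toy illustration of the trending idea, the Python sketch below fits a straight line to a series of utilization readings and estimates how many more periods pass before the trend crosses a ceiling. A straight line is my simplifying assumption. Real capacity tools model seasonality and business cycles, and the function names here are just for the example.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at x = 0, 1, 2, ...

    Needs at least two observations.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def periods_until(values, limit):
    """Periods after the last observation until the trend line reaches limit.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(values)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))

if __name__ == "__main__":
    monthly_peak_pct = [40, 45, 50, 55]  # made-up monthly readings
    print(periods_until(monthly_peak_pct, limit=80))
```

Even something this crude turns “we are out of capacity” from a surprise into a line item in next quarter’s plan.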
Now imagine a world where we could dynamically allocate CPU resources on-demand, juggle workloads between systems with little or no downtime, and use systems of differing capacity to service workloads with differing demands. Wow. That sounds like one of those “ilities” that we were promised with that “Cloud” concept. Dynamic resource allocation and resource sharing, possibly with multi-tenancy to maximize utilization of compute resources. Yep. Sure is. Ignoring the “Cloud” word, let’s look at how we can implement this “Cloud-like capability” into our existing IT environment without bringing in a forklift to replace all of our systems and networks, and spending billions.
Breaking down those technology pieces necessary to execute against that plan, we need Capacity Management (TeamQuest, BMC, Tivoli, pick your tool that does capacity and service level management). The tool doesn’t matter. The process, the knowledge generated, and the proactive view of the business matter. Caveat: Define your needs and goals *before* buying tools that you will never fully implement or utilize!
So now we know what our hour-by-hour, day-by-day needs are, and can recognize and trend consumption. We can even start to predict consumption and run some “what if” scenarios. The next step is dynamic capacity, which, in this context, includes “Resource Sharing, Dynamic Allocation, Physical Abstraction (maybe), Automation (hopefully, to some degree), and Multi-Tenancy” from the right-hand “Business Drivers” column in my last blahg entry. Sure, we can juggle and migrate these workloads and systems by hand, but the complexity and risk of moving those applications around is ridiculous. We need a layer of physical abstraction in order to move workloads around, and stop thinking of “systems” as a box running an application.
There are many ways to do this, so pick the solution and products that best fit your IT world. You can create “application containers”, or standard operating environments for your applications, and juggle the “personalities” running in the physical machines. Not easy. Most apps will likely not move in easily. Still a good goal to reduce variance and complexity in your environment. In this case, not a quick hit, as you will end up touching and changing most of your applications.
The obvious answer (to me and 99.6% of the geeks reading this) is to employ virtualization to de-couple the application from the operating environment, and the operating environment from the physical hardware (and network, and storage). Solaris Containers, LDOMs, VMware, Xen, xVM software in OpenSolaris, Citrix, fast deployment and management tools, the options and combinations are all over the map. The deciding factors will be cost, capabilities, management tools (monitoring, reporting, and intelligence), and support of your operational and application needs. The right answer is very often a combination of several technology pieces, with a unifying strategy to accomplish the technical and business goals within the constraints of your business. There are many of us geeky types that can help to define the technology pieces to accomplish business goals. Defining those business goals, drivers, and constraints is the hard part, and must be done in IT, “the business”, and across the corporate organization that will be impacted and serviced.
There, we have some significant pieces of the “private cloud” puzzle in place, and if the server systems were severely under-utilized, and we were able to move a significant number of them into our new “managed, dynamic capacity” environment, we should be able to realize power, cooling, and perhaps even license cost savings to balance the cost of implementation. One interesting note here, if I have “too many servers with too many idle cycles” in my datacenter, why should a vendor come in leading with a new rack full of new servers? Just wondering. Personally, I would prefer to invest in a strategy, develop a plan, identify my needs and the metrics that I would like improved, and then, maybe, invest in technology towards those goals.
Just the late night ramblings of an old IT guy.
Next entry will likely talk more about the metrics of “how much are we saving”, and get back to those SMART goals.
Numerous conversations and customer projects over the past few weeks have motivated me to exit from the travelogue and world adventures entries here, and get back to geeky writing for a bit.
Yes, cloud. We all “get it”. We all see Amazon and the others in the public cloud space doing really cool things. Somewhere along the way, some of the message got scrambled a bit though. With any luck, I’ll clear up some of the confusion, or at least plant some seeds of thought and maybe even some debate with a couple monologues.
Non-controversial: Let’s talk about three kinds of clouds. Most folks in the industry agree that there are Public clouds like Amazon’s AWS, Joyent’s Public Cloud, and GoGrid. That one is easy. In theory, there are “private clouds”, where the cloud exists within the IT organization of a customer (note that I did not say “within the four walls of the datacenter”), and “hybrid clouds” that allow a private compute infrastructure to “spill over” to a public cloud, as expandable capacity, disaster recovery, or dynamic infrastructure.
No hate mail so far? Good, I’m on a roll.
So do private clouds exist? Maybe. If we hop into the WayBack Machine, John Gage said it best: “The Network is the Computer.” Let’s dive a little deeper into what makes a computing infrastructure a “cloud”:
Unfortunately, most people start on the left side, with the technical details. This rarely results in a productive discussion, unless the center and right columns are agreed on first. We put a man on the moon, I think we can solve content structure and locking/concurrency in distributed and flexible applications. We don’t want to start a cloud discussion with APIs, protocols, and data formats. We want to start with business drivers, and justifiable benefits to the business in costs, value, and security.
The center column describes at a very high level, the goals of implementing a “private cloud”, while the right column lists the control points where a private cloud architecture would create efficiencies and other benefits. If we can agree that the center column is full of good things that we would all like to see improved, we can apply the business drivers to our IT environment and business environment to start looking for change. All changes will be cost/benefit, and many will be business process versus technical implementation conflicts. For example, is your business ready to give all IT assets to “the cloud”, and start doing charge-backs? In many corporations, the business unit or application owner acquires and maintains computing and storage assets. In order for a shared environment to work, everyone must “share” the compute resources, and generally pay for the usage of them on a consumption basis. This is just one example of where the business could conflict with the nirvana of a private cloud. You can imagine trying to tell a business application owner who just spent $5M on IT assets that those assets now belong to the IT department, and that they will be charged for usage of those assets now.
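The mechanics of a consumption-based charge-back are the easy part, which is rather the point. Here is a minimal Python sketch, assuming the shared environment has one total cost for the period and some metered usage figure per business unit. The unit names and the “CPU-hours” measure are hypothetical.

```python
def chargeback(total_cost, usage_by_unit):
    """Split a shared cost across business units in proportion to usage."""
    total_usage = sum(usage_by_unit.values())
    if total_usage == 0:
        return {unit: 0.0 for unit in usage_by_unit}
    return {
        unit: total_cost * used / total_usage
        for unit, used in usage_by_unit.items()
    }

if __name__ == "__main__":
    # Made-up CPU-hours per business unit for one billing period.
    usage = {"retail": 300, "finance": 100}
    print(chargeback(1000.0, usage))
```

Five lines of arithmetic. The hard part is the conversation with the application owner who used to own the hardware.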
So is private cloud impossible? No. Is private cloud achievable? Probably. Does private cloud fit your current business processes and needs? Probably not. Do the benefits of trying to get there outweigh the hassles and heartaches of trying to fit this square peg into the octagon shaped hole without using a large hammer? Most definitely.
Some attributes and motivators for cloud computing have huge benefits, especially on the financial side of the equation. Virtualization definitely has some great benefits, lowering migration downtime requirements, offering live migration capabilities, enabling new and innovative disaster recovery capabilities, and allowing workloads to “balance” more dynamically than ever before. Capacity planning has always been a black art in the datacenter, and every model only survives as long as the workloads are predictable and stable. How many applications have you seen in your datacenter that don’t bloat and grow? How many businesses actually want their number of customers and transactions to remain stable? Not many, at least not many that survive very long.
So, to wrap up this piece (leaving much detail and drama for future blahg entries)… Private clouds probably don’t come in a box, or have a list of part numbers. Business processes and profiles need to adjust to accommodate the introduction of cloud enabling technologies and processes. Everyone, from the CIO, to IT, to legal, to business application owners, has to buy into the vision and work together to “get more cloud-like”. And finally, the business discussion, business drivers, and evolutionary plan must be reasonably solid before any pile of hardware, software, and magic cloud devices is ordered. The datacenter will go through an evolution to become more “private cloud”, while a revolution will likely mean huge up front costs, indeterminate complexity and implementation drama, and questionable real results.