Performance personas, part 1: introduction

Reflecting on the question of how engineering practices can best be embedded into Agile and Design Thinking methods, a thought crept into my mind that would not go away. It is key to have skilled professionals on the team. If only the right set of personas were identified and elaborated, that would raise awareness and provide the justification to attract these professionals. As a result, quality-of-service (non-functional) aspects could be better positioned in Agile methods.

Agile and Design Thinking methods are based on a deep understanding of end user groups. Through the creation of imaginary archetypical ‘personas’, these methods offer a structured approach to extensively research and document the behaviors, experiences, beliefs and pain points of groups of end users. The goal is to come up with a solution that is usable and meaningful for the target user groups.

As these archetypical end users mostly represent consumer roles, or roles that directly support the business, it is only natural that their behaviors, experiences and pain points relate to business functionalities. These business functionalities are captured in the WHAT part of user stories.

To illustrate my point I will re-introduce the famous AMGRO (Amalgamated Grocers) case study, well known to generations of IBM architects. Let’s assume that after having successfully run their web portal on traditional infrastructure, AMGRO is now moving their shopping portal to the cloud, that they are making this move in an Agile manner, and that they have just completed a series of Design Thinking workshops.

An example of a user story created in the AMGRO cloud migration program is given below. As you can see, it adheres to the WHO, WHAT, WOW paradigm:

“It should take an AMGRO online shopper no more than 10 minutes to complete a purchase and to receive confirmation of their purchase through the new cloud-based web portal” 

  • WHO = an AMGRO online shopper
  • WHAT = complete a purchase and receive a confirmation of their purchase through the new cloud-based web portal
  • WOW = no more than 10 minutes

The only place in WHO-WHAT-WOW user stories to document quality-of-service aspects, such as performance, is in the WOW part. Note that to make this particular user story ‘SMART’, the meaning of ‘a purchase’ will have to be clearly defined! A good addition would be ‘consisting of 10 items’. And the user story can be made even more compelling by adding ‘and shoppers must be able to access the portal 24×7 from PC as well as mobile devices’.
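To show what a SMART-ified WOW can look like in practice, here is a minimal Python sketch that captures the story as a measurable check. It is purely illustrative: the class name, the 600-second budget (10 minutes) and the 10-item basket are assumptions taken from the story above, not part of any Design Thinking toolkit.

```python
from dataclasses import dataclass

@dataclass
class PurchaseStory:
    """The AMGRO user story made SMART: WHO, WHAT and a measurable WOW."""
    who: str = "an AMGRO online shopper"
    what: str = "complete a purchase of 10 items and receive a confirmation"
    wow_budget_s: float = 600.0          # 'no more than 10 minutes'

    def wow_met(self, measured_duration_s: float) -> bool:
        """Check a measured purchase duration against the WOW budget."""
        return measured_duration_s <= self.wow_budget_s

story = PurchaseStory()
print(story.wow_met(480.0))   # True: an 8-minute purchase meets the target
print(story.wow_met(720.0))   # False: a 12-minute purchase breaches it
```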

So Design Thinking at least offers a way to document end user requirements for performance. It does not, however, give any guidance on how to achieve the WOW effect that these requirements aim at. Thus, to ensure that performance (and the same is true for other quality-of-service aspects) is given the attention it deserves in an Agile program, we will need the Agile PEMMX approach that Phil Hunt introduced in his blog. But WHO in the Agile program will be accountable and responsible for driving that approach?

Let’s see if we can identify personas for the stakeholder groups that are directly affected by failing quality-of-service and for the professionals who are responsible for preventing that from happening. The relevant questions are:

  • Who, apart from the online shopper, will have a problem when purchases cannot be completed and confirmed within 10 minutes? In other words ‘who feels the pain when the web portal does not perform?’ The answer is that there must be a product owner for the web portal and that they will probably be called when there is an outage or performance is unacceptably slow.
  • Whose responsibility is it to design the web portal in such a way that it meets its performance targets? In other words ‘who will be called by the product owner to fix the performance issues?’ The answer is that the performance engineer is the most likely person to be woken up in case of problems. But the performance engineer cannot do this on her own. She needs a team of skilled specialists to come up with tuning options.
  • Whose responsibility is it to test that the design meets the performance requirements? In other words ‘who will check that the tuning options that the performance engineer came up with together with her team actually work?’ The answer is that the performance test manager will take care of that, probably with help from one or two testers.
  • Who looks after the product’s performance once it has gone live? In other words ‘who will see to it that the optimized solution stays optimized in production?’ The answer is that the capacity manager will monitor performance and capacity in production.

We have now defined four key ‘performance roles’. In large programmes there could be additional supporting roles, but these four are sufficiently instructive for our purpose. In subsequent blogs we will dive deeper into these performance personas and give them a name and a face. So stay tuned!

What to manage: Performance, Capacity or both… integrated?

A transition & transformation program that I am currently involved in distinguishes performance management, capacity management and capacity planning as three different IT service management processes. The three ITSM processes are described separately, which results in a lot of design and documentation work, performed in isolation by different architects. The project manager asked me to look into the rationale for this distinction and to assess its feasibility.

The three concepts can best be explained by an analogy that everybody can relate to: a supermarket. The resources in the supermarket are the carts and baskets, the self-scanners, the tills and the cashiers. The users are the shoppers. The transactions are ‘fetch a product and put it in the cart’, ‘scan all products from the cart’ and ‘pay for all scanned products’.

Performance management is defined as the process responsible for monitoring, analysing and reporting on response times and throughput. This process monitors the speed of the system and its services. In the supermarket analogy, this process keeps track of the time each shopper spends in the shop and at the till, and of the number of shoppers that can be served per hour given a certain number of available carts, a certain number of open tills and the “processor speed” of the cashiers.

Capacity management is defined as the process responsible for monitoring, analysing and reporting on available resources and the utilisation of those resources. In the supermarket analogy, this process keeps track of the number of available carts, the number of open tills and the queues per till. The “processor speed” of the cashiers and the number of purchases they have to process have an influence too.

Capacity planning is defined as the process responsible for producing the capacity plan. In the supermarket analogy, the supermarket manager collects and analyses data about all the parameters discussed above on a weekly or monthly basis and uses that information to decide whether to install extra manned tills or to invest in a self-service scanning system that relieves the burden on the manned tills.
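To make the three definitions concrete, here is a minimal Python sketch that derives all three views from the same (invented) supermarket measurements. The numbers and the 80% utilisation threshold are assumptions for illustration only, but the sketch already hints at the conclusion below: all three views feed on the same data.

```python
from statistics import mean

# Hypothetical measurements for one opening hour of the shop.
visits = [  # (minutes in shop, minutes at till) per shopper
    (22, 4), (35, 6), (18, 3), (41, 8), (27, 5),
]
n_tills, open_tills = 6, 4
till_busy_minutes = sum(v[1] for v in visits)   # total serving time at tills

# Performance management: response times and throughput.
print("avg time in shop:", mean(v[0] for v in visits), "min")
print("throughput:", len(visits), "shoppers/hour")

# Capacity management: resources and their utilisation.
utilisation = till_busy_minutes / (open_tills * 60)
print("till utilisation:", round(utilisation, 2))

# Capacity planning: combine both views to decide on extra resources.
if utilisation > 0.8 and open_tills < n_tills:
    print("capacity plan: open an extra till")
else:
    print("capacity plan: no change needed")
```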

Although the ITIL capacity management process could theoretically be split into the three separate processes outlined above, doing so is not recommended from a practical perspective. The many similarities and dependencies between the three “processes” point towards combining them into one “umbrella” process rather than implementing them separately:

  • The process flow for performance management is very similar to the process flow for capacity management. The sub-processes “analyse and model requirements”, “monitor, analyse and report” and “supervise tuning and performance / capacity delivery” are relevant in both cases.
  • Performance management and capacity management both feed into capacity planning; capacity planning is entirely dependent on the other two. In ITUP, “produce capacity plan” is therefore described as a sub-process of capacity management.
  • Performance management, capacity management and capacity planning require similar skills. The roles involved in the three “processes” are mostly described as the capacity / performance manager, the capacity / performance administrator and the capacity / performance analyst. The first two roles are responsible for setting up and maintaining the capacity / performance framework and for streamlining the process as a whole; they are technology independent. The analyst role is technology dependent and can be implemented as a dual role for technology specialists, in which case it is filled by a multi-disciplinary team.
  • Although the tools to monitor application and service performance are different from the tools to monitor infrastructure resource utilisation, it is recommended to correlate the metrics in the analysis of system performance. This is enabled by setting up a performance / capacity data warehouse in which all collected information is stored for analysis and reporting; a sketch of such a correlation follows after this list.
  • In the case under discussion, the team responsible for application performance reports into a different department than the team responsible for infrastructure performance and capacity, but in order to meet the agreed service levels both teams will have to interface and cooperate closely in one process.
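As a small illustration of the correlation argument above, the Python sketch below joins invented application response-time samples with invented infrastructure CPU samples on their time bucket, using pandas. The column names and values are assumptions, not extracts from any real data warehouse.

```python
import pandas as pd

# Hypothetical extracts from the performance / capacity data warehouse.
app = pd.DataFrame({
    "minute": [0, 1, 2, 3],
    "avg_response_s": [0.4, 0.5, 1.9, 2.3],   # application-level metric
})
infra = pd.DataFrame({
    "minute": [0, 1, 2, 3],
    "cpu_util": [0.55, 0.60, 0.93, 0.97],     # infrastructure-level metric
})

# Joining the two sources in one analysis shows that slow responses
# coincide with high CPU utilisation -- two monitoring tools, one picture.
combined = app.merge(infra, on="minute")
print(combined)
print("correlation:", combined["avg_response_s"].corr(combined["cpu_util"]))
```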

Now let’s try to design the ideal performance / capacity management process for a retailer. First of all, the demand as well as the service level requirements for performance have to be clarified.

Retailers know, for each city where they have a shop, how many customers they have on weekdays and in the weekend; they know the peaks across the year and within the day. Through their loyalty programs they know which customers buy which products. This is called the demand, and the process that keeps track of it is demand management. Demand management focuses on the business trends and patterns that influence performance and is defined in ITIL as a separate yet related process.

The service level requirements for performance may differ per customer. For the supermarket model used in the PEMMX classes, the assumption was that all customers want to minimize their time in the supermarket, and especially at the till, but this is not true for every customer. My mother-in-law, for instance, regards a visit to the nearby supermarket as a nice pastime. For her, service and a friendly chat with the shop employees are more important than speed.

The simple supermarket simulation model that we used in our PEMMX classes allowed us to specify variable input values for parameters such as the shopper arrival pattern, the number of available carts and the number of open tills. By playing with those parameters one can determine the optimal configuration of the shop for a given shopper arrival pattern. This approach helps to forecast the effects of future demand and feeds into the capacity plan.
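The original classroom model is not reproduced here, but a bare-bones stand-in is easy to write. The Python sketch below simulates only the tills (no carts or self-scanners), assuming exponential arrivals and service times; all parameter values are invented for illustration.

```python
import heapq
import random
from statistics import mean

def simulate_tills(n_shoppers, arrivals_per_min, open_tills, mean_service_min, seed=1):
    """Return the average wait at the tills for a given shop configuration."""
    rng = random.Random(seed)
    free_at = [0.0] * open_tills        # time at which each till becomes free
    heapq.heapify(free_at)
    now, waits = 0.0, []
    for _ in range(n_shoppers):
        now += rng.expovariate(arrivals_per_min)    # next shopper arrives
        start = max(now, heapq.heappop(free_at))    # earliest free till
        waits.append(start - now)
        heapq.heappush(free_at, start + rng.expovariate(1.0 / mean_service_min))
    return mean(waits)

# Play with the number of open tills to find the optimal configuration
# for a given shopper arrival pattern.
for tills in range(3, 7):
    avg_wait = simulate_tills(500, 1.5, tills, 2.0)
    print(f"{tills} open tills -> average wait {avg_wait:.2f} min")
```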

But the model needs to be calibrated with real-life metrics. So we need a performance / capacity monitoring system in the shop that keeps track of the arrival pattern of the shoppers, the available carts and baskets, the available self-scanners, the open tills and the queues at each till. Let’s assume that such a measurement system is possible – I don’t want to dive into the details of internet-of-things implementations in retail today – and that all measurements are collected and stored in a shop performance data warehouse.

I think everybody will agree that in the retail example it does not make much sense to implement three isolated processes: one concerned with optimizing the time spent in the shop, one with optimizing the number, size and speed of resources such as carts, baskets, self-scanners, tills and cashiers, and one with planning for extra resources. It is more efficient to integrate the activities of these processes under one umbrella.

The performance themes explained

Visitors to this blog may have noticed the logo at the top of the pages and wondered what it means. Zooming in reveals a typical project timeline with eight ovals beneath it. These ovals represent the “performance themes” that underpin the IBM performance engineering and management method (IBM PEMM ©).

This method was conceived in the nineties as an end-to-end engineering method focusing on the performance aspects of solution design. It was integrated with methods for application design and for infrastructure design. The so-called work product dependency diagram (WPDD) visualised these dependencies. The WPDD was a very popular tool during the countless method adoption workshops (MAWs) conducted at that time: it helped to decide which deliverables should be created in a consultancy engagement. The central deliverable for performance engineering was the performance model.

The IBM PEMM © logo shows that the method originated at a time when the prevalent approach towards solution design was the “waterfall” approach. It also shows that the focus was on design and development rather than on operations. Only one of the ovals or “performance themes” relates to performance management. All the other themes refer to design and development.

A lot has changed since the nineties. The traditional waterfall approach has been replaced by more iterative, agile ways of working. Design, development and operations methods have become more integrated. More content has been developed while using the method in practice. The method was rebranded to the IBM performance engineering and management method eXtended (PEMMX ©), and more extensions can be expected.

Amidst all those changes the performance themes proved to be unexpectedly stable. The themes together cover all focus areas that have to be addressed to integrate performance aspects into the solution lifecycle. And with some small changes the same themes can also be applied to other non-functional aspects such as availability and security.

Throughout the solution life cycle, the performance (or availability, or security) risk must be managed: if the risk is high, more time and effort should be spent. At the start of each project or sprint, the performance requirements and volumetrics must be analysed. While the design takes shape, modeling and estimation activities can be started. Technology research into bleeding-edge products feeds into the model, as do design patterns and test results. A proper design for performance lays the foundation for professional capacity management and tuning.

All these themes were invented for waterfall programs, but they apply to Agile programs too; the cycles are shorter and the activities triggered by a theme are likely to recur. In waterfall projects it is normal (though not recommended) to plan two weeks of performance testing at the end of the project. An Agile program all but requires that the performance test capability be set up as early as possible, so that each performance-sensitive sprint can be performance tested. This is a recurring activity for which automation now becomes a key factor.
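As an illustration of that automation (a minimal sketch, not an endorsement of any particular tool), the Python script below could run as a pipeline step after each performance-sensitive sprint. The endpoint URL, the sample count and the 95th-percentile budget are assumptions.

```python
import statistics
import sys
import time
import urllib.request

URL = "https://portal.example.com/checkout/health"  # hypothetical endpoint
P95_BUDGET_S = 0.5                                  # assumed latency budget
SAMPLES = 50

latencies = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    urllib.request.urlopen(URL, timeout=5).read()   # one timed request
    latencies.append(time.perf_counter() - start)

p95 = statistics.quantiles(latencies, n=20)[-1]     # 95th percentile
print(f"p95 latency: {p95:.3f}s (budget {P95_BUDGET_S}s)")
sys.exit(0 if p95 <= P95_BUDGET_S else 1)           # non-zero exit fails the build
```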

To summarize: the beauty of the method is that at its core it is technology independent and timeless, and that it can be applied in any situation in which failing performance is a business risk!

For further reading on the performance themes see Dave Jewell’s paper “Performance Engineering and Management Method – a holistic approach to performance engineering” here.