Agile! Read and reflect…!

You have not heard from me for a long time – I know: I broke all the rules for effective blogging… but it took an exceptional book to break the silence. “Agile! The good, the hype and the ugly” by Bertrand Meyer is such a book. This book is a must-read for all IT professionals who call themselves Agilists or are bewilderedly trying to understand and adapt to agile ways of working.

What makes this book so great? It is a razor-sharp analysis of agile principles, practices, techniques and artefacts. Having been actively engaged in software engineering since the early days of object orientation, the author has a deep understanding of and experience with both old and new software engineering methods. His classification of agile practices into the four categories good-and-new, good-and-not-new, not-good-and-new, not-good-and-not-new has helped me understand much better what Agile really is about, what to keep and what to avoid.

Obviously only the practices in the two ‘good’ categories need to be remembered!

  • Good but not new: iterative development in short iterations, the recognition that change plays an important role in software engineering, and the central role of code.
  • Good and new: team empowerment, the daily meeting, freezing requirements during iterations, time-boxed iterations, and the practical importance of testing.

Meyer convincingly exposes a number of rhetorical traps that the more evangelistic Agile texts are guilty of. The examples he gives of ‘proof by anecdote’, ‘slander by association’ and other tricks are sometimes quite funny.

Stating that a principle must be both abstract and falsifiable, he dissects the ‘principles’ from the Agile Manifesto, rejects some of them as ‘not really a principle’ and derives the (in his view) real underlying principles.

This leads to his “usable list” of principles.

In his agile-sceptical, ‘strict but fair’ manner he walks through all the principles on his list and discusses their pros and cons. Regarding the principle of putting the customer at the center, he points out that the best end users are probably also the busiest. As it is unlikely that any software development team can get their full-time attention, clever ways must be devised to make optimal use of the expert users’ scarce time.

But a fundamental danger of basing requirements on user stories alone is that they are limited to the views of one or a few users and do not necessarily uncover the underlying capabilities that the software needs to deliver. It takes deep thinking to do this type of fundamental requirements analysis!

Similar disadvantages apply to the principle of building no more than the bare working minimum and adding on to that so-called ‘minimum viable product’ (MVP) until it is complete. How would you like it if the building company that you hired to build your new house started by building a shed without any foundations, just to be able to show you something that remotely looked like a house? I guess the answer is clear.

Meyer points out that there is a basic difference between two types of complexity and illustrates that difference neatly with pictures of his favourite pasta: lasagne for additive complexity and linguine for multiplicative complexity. Unfortunately the MVP approach does not really help to solve problems characterized by multiplicative complexity.

Meyer argues that adding on functionality only works provided that the core application architecture is sound. So, fellow application architects, don’t despair! There is still hope for us. The architect profession has not yet become entirely obsolete because most business problems that we try to solve by IT suffer from multiplicative complexity and it takes a sound architecture to tackle these problems.

Meyer makes short work of agilists’ preference for open spaces. He argues that all programmers are different and that some are more productive when they can quietly focus on the job at hand.

And there is more… One by one Meyer discusses the agile roles and the agile artefacts and gives his learned view on them.

The book concludes with the assessment of what is good, hype and ugly. As that assessment deserves my and your full attention I will come back to it in a follow-on blog!

“Agile! The good, the hype and the ugly” by Bertrand Meyer. https://www.springer.com/kr/book/9783319051543


Innovation in IT, there’s got to be a better way

Last year I installed Kerberized Kafka with Ranger for authorisation and Solr for auditing, by manually installing hand-picked versions of the components it needs.

Working through the documentation took me a few months. And yes, I also did installations of Kerberized Kafka via open-source Ambari, which were not to my full satisfaction.

It took quite some time to get it all sorted out.

Before you have something running at a customer site, there are also the product selection and license negotiations to get through.

Though it is possible to innovate with such an approach, it takes the speed out of innovation, not to mention the effort of scaling an installation up or down.

It was not the first time that I worked through the documentation on how to install a product. New products come out frequently and the rate accelerates each year.

Before you know it, another quarter has passed and what if your customer does not want to adopt the product that you have prepared for?

Then, early in 2018, IBM offered me a training course on IBM Cloud Private (with a focus on Kubernetes), where I learned how to install containerized IBM middleware.

Installing containerized middleware can now be done in minutes. And with a catalogue of about 50 containerized products in IBM Cloud Private at this moment, and more coming, it was clear to me that this is the route to go.

If you have IBM Cloud Private you can have a new middleware platform in an afternoon even with your own customized container images if you want that. Good job, IBM!

And so, I started my Kubernetes journey somewhere during my preparations for the IBM Cloud Private boot camp early this year.

I bought Marko Lukša’s ‘Kubernetes in Action’, which is a very good buy.

And therefore, I needed a Kubernetes environment.

 

What do you need?

For those who are pressed for time and understand that time is money: 64 GB of RAM, 16 cores and 1 TB of SSD will do fine for starters. But please check the hardware prerequisites.

I started off by installing minikube, as well as a regular Kubernetes cluster with 1 master node and 3 worker nodes, in VirtualBox. It did work, although such an installation is very basic. For example, when you want a dashboard you need to install it yourself. I had set it up with bridged NICs on my home network and all was fine and dandy, with very limited resource requirements.

During the ICP boot camp we installed a standard ICP installation consisting of 1 master node, 1 proxy node, 3 worker nodes and an NFS server in an afternoon.

After the training I performed an installation of IBM Cloud Private Community Edition via the vagrant approach on my Lenovo W5210 with 16 GB of RAM in VirtualBox, which did not work out for me.

Next, I performed a single-node install, which succeeded. For this I used a VirtualBox guest with a NAT NIC. It turned out that I could only access the console from inside the image; port forwarding to the console did not work. Also, the 10 GB VM was severely short of memory.

After re-doing the install with a bridged NIC I had something up and running, but it left little room on my 16 GB laptop with 8 cores. You might be luckier if your laptop has 32 GB.

I then managed to acquire an old computer with a 4-core i3 CPU, on which I installed a single-node ICP 2.1.0.2 cluster directly on bare metal to get the most out of the 16 GB of RAM it had. The machine thus ran the master, the proxy and a single worker node. I had to exclude the management and vulnerability advisor nodes from the installation because of the lack of compute resources.

I did manage to install a Jenkins pipeline on it to deploy the BlueCompute shop. It worked, but the amount of surplus memory was not a lot.

When ICP 2.1.0.3 came out I managed to get a new computer with an 8-core i7 CPU. I decided to set up a VM containing the master and the proxy on the i3, and 2 VMs containing worker nodes on the i7. I used RHEL’s KVM instead of VirtualBox on Windows 10 to run the Ubuntu guests, and I must say that that was a very positive experience in terms of start-up times. I can recommend it.

The installation took almost 3.5 hours, and in retrospect I believe that this was because the i3 machine has an old-fashioned HDD. After all, the installation in the IBM Skytap environment took about 20 minutes or so, if I recall correctly.

I performed the “pod autoscaling” exercise from chapter 15 of Marko’s book, using my 8-core i7 laptop to send calls to the i3 running the master and the proxy, and then I discovered that the autoscaling sometimes did not work. The machine containing the master (controller) and the proxy (ingress) was overloaded during the test, as evidenced by the Linux utility atop. When I throttled the rate down to 250 calls per second, the master had sufficient compute power to scale the deployment up.
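For readers who want to try something similar: such an autoscaler can also be created programmatically. Below is a minimal sketch using the official Kubernetes Python client; the deployment name (‘kubia’, the example app from the book) and the namespace are assumptions on my part, and the 5% CPU target and ceiling of 6 replicas match the values used in my test (see below).

```python
# Minimal sketch (not the exact setup from the exercise): create a
# HorizontalPodAutoscaler with the official Kubernetes Python client.
# The deployment name "kubia" and namespace "default" are assumptions.
from kubernetes import client, config

config.load_kube_config()  # uses the current kubectl context

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="kubia"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="kubia"),
        min_replicas=1,
        max_replicas=6,                       # the upper bound used in my test
        target_cpu_utilization_percentage=5,  # deliberately low to force scaling
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa)
```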

Now you understand why IBM has chosen a different topology for real-life situations. IBM has service offerings to get you on the right track with adopting ICP from the start, which makes sense in a professional engagement.

I looked at my laptop and asked myself: where do I get a new computer to replace the i3? Oh, … well, … I bought a Lenovo from the X series (light to carry), moved the KVM guest running the master/proxy to my laptop and repeated the test, this time using the Lenovo X as the load driver.

The throughput increased, but although the horizontal pod autoscaler allowed scaling up to 6 replicas (with a target CPU utilization of 5%), it did not scale the number of pods higher than 4. None of the i7 nodes was the constraint, … well, you have guessed it: now the X had become the bottleneck.
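For completeness, the load driver itself need not be anything fancy; a throttled loop like the sketch below is enough to reproduce the behaviour. The URL is a placeholder (not my actual service address), and a real driver would typically spread the calls over several workers.

```python
# Minimal sketch of a throttled load driver. The URL below is a placeholder;
# a real driver would use multiple workers, this just shows the throttling idea.
import time
import requests

TARGET_URL = "http://proxy-node:31000/"   # hypothetical NodePort of the test app
RATE = 250                                # calls per second, as in the test
INTERVAL = 1.0 / RATE

def drive_load(duration_seconds=60):
    deadline = time.time() + duration_seconds
    sent = errors = 0
    while time.time() < deadline:
        start = time.time()
        try:
            requests.get(TARGET_URL, timeout=2)
        except requests.RequestException:
            errors += 1
        sent += 1
        # sleep whatever is left of this call's time slot to hold the target rate
        time.sleep(max(0.0, INTERVAL - (time.time() - start)))
    print(f"sent={sent} errors={errors}")

if __name__ == "__main__":
    drive_load()
```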

The current setup looks as follows:

Figure: infra-setup (the current lab topology)

In September 2018 ICP 3.1.0 comes out. I have a cunning plan, …

My DEVOPS journey [3] start an initiative

After having attended a class and having read a couple of books, it was time to start an initiative. In addition, the focus needed to shift back to the core subject of this blog – IT performance – and the question had to be asked: “How does DEVOPS affect IT performance (and other qualities of service such as availability and security), and vice versa?” That question led to a number of derived questions…

  1. Should we focus on the IT that supports the DEVOPS processes or on the IT that supports the target solution?
  2. Should we look into performance engineering, performance testing or performance management and should we do so in parallel or in sequence?
  3. Should we write a point of view paper, do a proof of technology (PoT), produce education materials, blog or all of the aforementioned?

An IBM Academy of Technology initiative is an excellent way to work with a group of colleagues (business partners and/or customers can also be invited if and when interested) on some innovative technical topic outside the boundaries of one’s day job.

In 2017 such an initiative led to the publication of a whitepaper on ‘Agile performance engineering’ in which we folded proven performance engineering and management practices (PEMMX) into Agile ways of working. Many practitioners who reviewed that work have asked for an extension to DEVOPS with more ‘practical’ guidance on tools and techniques.

The recently started 2018 initiative will therefore dive into a number of practical questions related to DEVOPS and non-functional aspects. That means that we will not just produce a white paper but have also stood up an (albeit limited) PoT environment, in which we installed a Kubernetes master and two worker nodes with which we can do some experiments.

Stay tuned for updates on our progress and findings!

My DEVOPS journey [2] Read a book (or two)

In my previous blog I shared my experiences from the DEVOPS workshop at IBM Hursley. I now invite you to stay with me as my journey continues…

I was pleasantly surprised by being given a book at the end of the DEVOPS class. The title of the book ‘The Phoenix project’ by Gene Kim et al. did not suggest any direct connection to the hands-on learning experiences in the class.

Intrigued, I started reading and was immediately caught by the accessible style of writing and the recognizability of the story.

The main character of the book is Bill, an operations manager who is given an unexpected promotion and then sees himself faced with the impossible task of saving his department from being outsourced. I am not going to give away the storyline – you can (and should!) read the book for yourself!

The short summary is that the project and the department are saved by simplifying and improving processes and by implementing much more efficient collaborations between teams.

This brings me to the point that I made at the end of my previous blog. For IT professionals it is tempting to focus on the technology dimension. It is fun, it inspires innovation, it does not talk back, it does not get angry or shout and it does not protect its job! However, without a shift in mindset and serious changes in processes and organisational culture, DEVOPS is never going to be successful!

In ‘The Phoenix project’ this is explained by means of the three ways. The three ways are introduced in a playful manner in the novel and summarized in the epilogue at the end of the book.

The first way optimizes the left-to-right flow of work from DEV to OPS by introducing small batch sizes and short intervals of work. The key practices are continuous build, integration and deployment. The second way focuses on a constant feedback loop from right to left, from OPS to DEV. The key practices are automated testing, failing fast and rolling back when quality goals are not met. The third way aims at creating a culture of continuous experimentation and risk taking, based on trust.

Although the technology dimension is important to support the three ways of DEVOPS, the process and people dimensions are critical to make them succeed! Particularly the third way is a challenge, especially in large organisations with an established company culture.

Another important insight that is highlighted by the authors is the existence of four types of work that compete for the same scarce resources in IT organisations. The four types of work are business projects, internal projects, operational change and unplanned work. All these types of work are important in their own right; they have different stakeholders; and priorities are often unclear, making it hard to strike the right balance.

All these insights derive from process optimization methods developed in the 80s, in particular the Theory of Constraints, which is explained in another famous book, “The Goal” by Eli Goldratt.

The key message is that IT can learn a lot from optimization practices developed for manufacturing. In IT as well as in manufacturing we have to look for the “constraints”, the work centers that have limited capacity but are on the critical path to production. The batches and sequence of work have to be adapted to the constraints to achieve a continuous workflow and as little ‘work in process’ (stock) as possible.

It requires a thorough analysis of IT design-to-delivery value streams to identify the weak spots and improvement points. And buy-in from professionals throughout these value streams must be obtained before changes can be made.

This probably explains why many organisations that claim to practice DEVOPS in reality often do so only in small experimental teams working on isolated projects. Implementing DEVOPS practices in a complete IT shop is not at all simple…!

My DEVOPS journey [1] take a class

In February of this year I finally managed to attend a DEVOPS class that I had never found the time for before. The joining instructions contained a large PDF with a detailed step-by-step description of how to install a virtual machine and download a very large image file that needed to be deployed in that virtual machine.

Fortunately I had just acquired a new and completely empty laptop, so disk space was no issue. The process took me one Sunday afternoon of waiting, hitting keys and waiting again. It has been a while since I installed machines and compiled programs on a day-to-day basis, hence I was truly proud to see my image work on Sunday evening!

Having successfully passed that first step, I headed for Hursley with three other colleagues from the Netherlands. My first impression of the teachers was that they are millennials, not much older than my two sons, and I prepared myself for three days of hard work!

We started by agreeing on social contracts in our team and by drawing up the value stream map for a process that we are all familiar with. [Our millennial teacher commented that a process containing more than 10 steps is far too complex and that one should aim for no more than 5 steps, which is easier said than done…!]

On the back wall of the classroom the teachers started to build up three lists: the TO-DO (backlog), the DOING and the DONE list. At the start of day one all the coloured post-its were stuck onto the TO-DO space, and in the course of three days they gradually moved from TO-DO to DOING to DONE! [Agile backlog management is put into practice very effectively in this class – to get rid of your backlog you just have to wait until it drops off the wall as the glue wears off the post-its…]

After the introductions the hard work began – the first hands-on exercises in our virtual machines had to be completed.

In the meantime a colourful landscape of open-source tools, called THE BIG PICTURE, unfolded on another part of the back wall of the classroom as we plodded along. With nostalgia I remembered the days when operating systems, programming languages and software products were given uninspiring but meaningful acronyms. After having deciphered the acronym, one had a good chance of figuring out what the software was supposed to do. Not so in the Agile and DEVOPS era! Nerdy names seem to be standard for open-source tools (‘Bower’, ‘Puppet’, ‘Jenkins’) and there is no way to guess the purpose of each tool – one has to learn by doing.

And learning by doing is what we did for three days… we wrote simple code, deployed it and tested it automatically to get a feel for DEVOPS. The experience reminded me of the programming labs that I took during my computer science education, just with different tools.

Being so deeply immersed in the code, builds, deployments and test runs, it became more and more difficult to maintain the necessary helicopter view and keep track of the generic structure behind the BIG PICTURE. I therefore felt the need to take a step back and reflect on it here.

The BIG PICTURE unfolding on the wall was obviously very focused on the technology dimension of DEVOPS.

Needless to say, however important good technological support is, to make DEVOPS successful the process and people dimensions have to be addressed as well!

Slightly rearranging THE BIG PICTURE and adding my own thoughts to it, there clearly is a need for DEVelopment tools on the one hand and OPerationS tools on the other, as the name DEVOPS suggests. Additionally, test and collaboration tools need to be included to provide the glue between the teams and to ensure a seamless automated process. And last but not least there is the target system solution stack with its technology footprint.

Summarizing, the architecture of the DEVOPS technology dimension could look like the mind map included below, in which ideally all tools interconnect nicely without any overlaps or gaps [!]

Increasing load by a factor of four? No problem with performance engineering!

Have you ever wondered how performance engineering works and how it can help you? In this write-up I will show an example of how proper performance engineering allowed us to improve the performance of importing documents into a Document Management System (DMS) by a factor of four, actually hitting the performance goal on the first try.

Project Situation

The project is a large DMS system at a public-sector client. It went live in 2012 and has evolved and grown since. We start from the situation where the system is rolled out to two tenants and imports about 1 million documents per night, as per the non-functional requirements.

In 2014 a requirement was stated to migrate all data from a legacy system into the system – in its original form that would have been 1.2 petabytes of PDF data in almost 2 billion documents. Later these requirements were reduced with respect to document size, so it ended up being about 450 TB of PDF. The migration process was to run within 18 months with varying throughput, so the peak required import throughput was four million documents per day – an increase by a factor of four!
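To make the numbers concrete, here is a quick back-of-the-envelope calculation; the length of the nightly import window is my assumption for illustration, not a project figure.

```python
# Back-of-the-envelope check of the required import throughput.
# The 8-hour nightly window is an assumption for illustration only.
current_docs_per_night = 1_000_000
peak_docs_per_day      = 4_000_000
scaling_factor = peak_docs_per_day / current_docs_per_night
print(f"required scaling factor: {scaling_factor:.0f}x")      # -> 4x

window_hours = 8                                   # assumed nightly window
docs_per_second = peak_docs_per_day / (window_hours * 3600)
print(f"peak rate: ~{docs_per_second:.0f} documents/second")  # -> ~139/s

total_pdf_tb = 450
migration_days = 18 * 30                           # roughly 18 months
avg_tb_per_day = total_pdf_tb / migration_days
print(f"average data volume: ~{avg_tb_per_day:.2f} TB/day")   # -> ~0.83 TB/day
```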

Performance Engineering

The performance engineering process optimized the standard import process. Performance engineering was also used to optimize other parts of the system, but that is not part of this write-up. In the performance engineering process no specific tools were used, just an understanding of the architecture of the system and some spreadsheets.

Standard Import process

The standard import process involves a number of steps. The document files and their metadata are stored in a transfer directory. Then a pre-processing step checks for viruses, and uploads the documents to a transfer database. This step is required due to network zoning requirements.

In the application, the documents are then functionally checked – e.g. the target mailbox is determined – and the document is stored in the DMS. This process runs highly parallelized, controlled by a custom batch framework.

After the documents have been worked with, their status is set to archived, so that the PDFs are moved into (slightly slower) archive storage. The metadata stays in the eAkte (the electronic records application), so that documents can still be accessed transparently by the user.


Figure 1: The import process (simplified)

Engineering process

To investigate whether the system was able to handle the estimated amount of import data during the migration process, we looked at each component and each network link between the components:

  • For each step in the import, we determined the required scaling factor, i.e., depending on how much time was available for the step, how much we could stretch the time interval compared to the current situation (e.g. the pre-processing could run all day and thus did not need scaling) and how much scaling was still needed.
  • For each server, we determined the CPU and memory load used during the current import process, and extrapolated to the estimated load for the migration process.
  • For each network connection, we looked at the amount of data to be transferred and in what time interval (as the main import had to happen at night).

For the components, a simple spreadsheet sufficed. For the network connections I created a more sophisticated spreadsheet that, for each phase of the import process, calculated the maximum amount of data per logical connection and then aggregated all logical connections over the actual physical connections. For example, the pre-processing file system is used during pre-processing, but also when moving the data into the next step, and so on.
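The sketch below shows, in code rather than in a spreadsheet, the kind of calculation involved: per phase, the data volume on each logical connection is turned into a bandwidth demand and then aggregated onto the physical link that carries it. All connection names and numbers are invented for illustration; the real figures came from the project.

```python
# Illustration only: aggregate the bandwidth demand of logical connections
# onto the physical links that carry them, per import phase.
from collections import defaultdict

# (phase, logical connection, GB to transfer, hours available) - invented values
logical_loads = [
    ("pre-processing", "transfer-dir -> preproc-fs", 800, 24),
    ("pre-processing", "preproc-fs -> transfer-db",  800, 24),
    ("import",         "transfer-db -> app-server",  800,  8),
    ("import",         "app-server -> dms-storage",  800,  8),
    ("archiving",      "dms-storage -> archive",     800, 12),
]

# which physical link each logical connection runs over (hypothetical mapping)
physical_link = {
    "transfer-dir -> preproc-fs": "preproc-host 1GbE",
    "preproc-fs -> transfer-db":  "preproc-host 1GbE",
    "transfer-db -> app-server":  "app-host 10GbE",
    "app-server -> dms-storage":  "san-fabric",
    "dms-storage -> archive":     "san-fabric",
}

demand = defaultdict(float)  # (phase, physical link) -> Mbit/s
for phase, conn, gb, hours in logical_loads:
    mbit_per_s = gb * 8_000 / (hours * 3600)  # GB -> Mbit, spread over the window
    demand[(phase, physical_link[conn])] += mbit_per_s

for (phase, link), mbit_per_s in sorted(demand.items()):
    print(f"{phase:15s} {link:20s} ~{mbit_per_s:6.1f} Mbit/s")
```

The resulting per-phase demand per link can then be compared against the capacity of each physical link, which is exactly how the 1 Gbit/s bottleneck described below showed up.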

The main issues that we found were:

  1. The pre-processing server had – just in a previous release – been migrated from physical hardware to virtualized hardware, as it did not create as much load as originally expected. However, for the import we found we would need a considerable amount of resources; in particular, we would basically use all the network capacity of the host server. In the end the network interface of the host server was replaced, from a 1 Gbit/s to a 10 Gbit/s Ethernet card. This sufficed to mitigate that issue.
  2. The import step performs the functional check, which uses other backend systems. It was already running with about 100 tasks in parallel, but as the functional check could not be made faster, we estimated having to scale to about 400 tasks in parallel. Unfortunately, when testing this, our (custom-made) batch framework that controlled this process was creating a lot of database load and finally broke down due to database locking. As our custom-made batch framework ensured transactional integrity across our import steps, we could not replace it with simpler solutions that relax transactional integrity but rely on the application to handle the processing state and retries. So we rewrote the batch framework to handle a larger number of parallel tasks, with even much less database load (a generic sketch of one contention-reducing pattern follows after this list).
  3. The documents to be migrated were actually “older” in the sense that they could be archived right away. So we investigated the possibility of moving the PDFs directly from the pre-processing file system to the eArchive. For this we actually had to ask for an exception from the client’s security department. We changed the import process so that only the metadata was processed and put into the main eAkte server, while the PDFs were, during the import process, transferred directly from the file system (which was shared with the import server for this exception) to the eArchive. This significantly reduced the import load on the servers and the database.
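The batch framework rewrite itself is project-specific, but for readers wondering what “more parallel tasks with less database load” can look like in practice, below is a generic sketch of one widely used pattern: workers claim tasks with SELECT … FOR UPDATE SKIP LOCKED so they never queue up behind each other’s row locks. It uses PostgreSQL and psycopg2 as an example backend and is explicitly not the project’s actual implementation.

```python
# Generic pattern, not the project's implementation: many parallel workers
# claim pending tasks from a shared table without blocking each other.
# Example backend: PostgreSQL accessed via psycopg2.
import psycopg2

CLAIM_SQL = """
    UPDATE import_tasks
       SET status = 'in_progress'
     WHERE id = (SELECT id
                   FROM import_tasks
                  WHERE status = 'pending'
                  ORDER BY id
                  LIMIT 1
                    FOR UPDATE SKIP LOCKED)
 RETURNING id, document_ref;
"""

def process_document(document_ref):
    """Placeholder for the real work (functional checks, store in the DMS)."""
    pass

def claim_and_process_one(conn):
    """Claim a single pending task and process it; returns False when idle."""
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        if row is None:
            conn.commit()
            return False
        task_id, document_ref = row
        process_document(document_ref)
        cur.execute("UPDATE import_tasks SET status = 'done' WHERE id = %s",
                    (task_id,))
    conn.commit()
    return True

if __name__ == "__main__":
    # placeholder connection string; a worker loops until no tasks remain
    conn = psycopg2.connect("dbname=import user=worker")
    while claim_and_process_one(conn):
        pass
    conn.close()
```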

Optimization Result

When we were first integrating with the source system, we had already done some preliminary performance measurements and were confident that the system would work, but of course there always is a risk. We had “preliminary” settings for e.g. the number of parallel import tasks, but there still was room for optimization if needed.

The source system had prepared two days’ worth of import data in advance, which we imported, to the great surprise of the source system project, in exactly the required amount of time! No further optimization was needed!

In fact the system was basically operated in this mode, with maybe minor adjustments, throughout the import process, until all the documents were imported and the exceptional file share was removed.

The result of this was very good feedback from the client, and exceptional client satisfaction.

Benefits of Performance Engineering

For the import optimization we used performance engineering to estimate the effects of the increased import load on an existing system. The result was exceptional – the performance requirements were met from day one, without further optimization needed.

The system had one advantage – it is an existing system, so some baseline numbers were already known from the production environment, such as CPU load and memory usage, which we could extrapolate. On the other hand, a performance test can establish such baselines as well. The network usage analysis did not even use any existing baseline information, just the base requirements.

Performance engineering helps a project to meet its performance requirements. There are no magical techniques or tools involved, just good understanding of technology and some spreadsheet operations. Performance engineering can and should be done in any project!

 

Agile architecture….the future, just another hype or a very bad idea…?

After having survived many previous paradigm shifts, such as “structured programming” in the 80s, “object orientation” in the 90s and “service orientation” in the 00s, IT professionals now find themselves faced with yet another (r)evolution. The “Agile era” has commenced and we are told to forget everything that we learned before. This faces the community of architects with a dilemma. Is there a future for architecture, is that future “Agile”, or is “Agile architecture” a contradiction in terms and is architecture therefore doomed as Agile methods thrive?

Let’s start by asking the question ‘What is agile architecture?’

Scaled Agile Framework (SAFe) promotes “Agile architecture” and explains the concept by stating that “[….] the architecture of a system [….] evolves over time [….]”. Unfortunately there is not much more to be found on this topic on their summary pages.

Having seen the architectures, or rather the historically evolved system landscapes, of various enterprises, I wonder to what extent this is a new concept… In my view evolving over time is not a new property of IT solutions; however, evolving over time while maintaining their intrinsic architecture could be an interesting attribute of IT solutions and worth exploring. The underlying assumption in this case is that IT solutions can be classified into families with similar intrinsic architectures.

The distinction between systems of engagement, systems of record and systems of insight comes to mind. It seems worthwhile to explore that classification. Systems of engagement directly engage with end users and therefore have to be flexible and easy to adapt. They also have to be highly available, scalable and performant. Systems of record, on the other hand, contain the company’s core data. They need to be secure, reliable and robust. Systems of insight serve to support decision making. They need to be able to answer all possible queries, accessing all relevant information, but sub-second performance is not a key criterion for these systems. To make the story even more complex I would like to introduce a fourth class of systems, namely the class of systems of support (or systems of integration). These are the systems that nobody ever talks about but everybody just expects to work, such as service buses, batch schedulers and other shared services.

The hypothesis is that systems of engagement and systems of insight are better suited for agile development than systems of record or systems of support. This leads to another popular concept, namely dual (or multiple) speeds of IT. Systems of engagement and of insight need faster development, test and deployment cycles than systems of record or support.

But the transactions and queries started in the systems of engagement and insight reach out to the information stored in the systems of record and are supported by the systems of support. Therefore the ‘slow’ systems cannot be slow at all levels. The exposure layers of these systems have to adapt to the speed of their consumers. They have to be capable of integrating in an ‘agile’ manner with agile consumers.

Systems of record and support need to expose their services in a non-intrusive manner, slightly decoupled from their core architectures. This is usually done by designing APIs that expose the services needed by the consumers. To enable this we need what used to be called ‘modular design’ in the 80s.
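As a toy illustration of that kind of decoupling (all names and fields are invented): a thin exposure layer offers a small, consumer-shaped API while hiding how the system of record stores its data. Consumers depend only on the narrow interface, so the core can evolve at its own speed.

```python
# Toy illustration of an exposure layer over a system of record.
# All names are invented; the point is that consumers depend only on the
# narrow CustomerAPI, not on the record system's internal formats.
from dataclasses import dataclass

@dataclass
class CustomerSummary:
    customer_id: str
    name: str
    status: str

class LegacyCustomerStore:
    """Stand-in for the system of record (e.g. a core database or mainframe)."""
    def fetch_raw_record(self, customer_id: str) -> dict:
        # In reality this would call the core system; here we fake a record.
        return {"CUST_ID": customer_id, "CUST_NM": "J. Doe", "STAT_CD": "A"}

class CustomerAPI:
    """The exposure layer: a stable, consumer-shaped contract."""
    _STATUS = {"A": "active", "I": "inactive"}

    def __init__(self, store: LegacyCustomerStore):
        self._store = store

    def get_customer(self, customer_id: str) -> CustomerSummary:
        raw = self._store.fetch_raw_record(customer_id)
        return CustomerSummary(
            customer_id=raw["CUST_ID"],
            name=raw["CUST_NM"],
            status=self._STATUS.get(raw["STAT_CD"], "unknown"),
        )

if __name__ == "__main__":
    api = CustomerAPI(LegacyCustomerStore())
    print(api.get_customer("42"))
```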

Architecture is the discipline that seeks to understand the essence and the structure of IT systems. Modular design leans on sound and proven architectural principles. My conclusion is that in the agile world there is still room for architects. But the definition of agile architecture needs to be rebranded into ‘the skill to (re-)design IT systems of multiple speeds in such a way that they can seamlessly interact with each other while maintaining their intrinsic architecture and properties’.

This insight requires attention in IT shops – architecture is not dead, it is very much alive! Another consequence is that systems of record and support need to be given the funding they need to expose themselves to their agile environments while safeguarding the company’s assets as they used to do before the agile era. This approach to modernizing systems of record and support will eventually prove to be more efficient and cheaper than a complete replacement of the company’s legacy.

More on systems of engagement can be found here

Don’t methods work….?

In his LinkedIn post “Methods don’t work”, Daan Rijsenbrij tries to convince the IT solution architect population to abandon methods. He seems to imply that a solid methodological background negatively impacts creativity and blocks the ability to clearly identify the customer’s needs and translate these into a viable solution.

I don’t agree with Daan’s point of view. A method can never be blamed for creating an uninspiring, incomplete or incorrect solution. The responsibility to come up with a viable end product lies entirely with the users of the method, the architects – not with the method itself! The method is just a means to an end. When used well, a method helps the skilled architect to do a better job. But in the hands of an inexperienced person it can do real harm. The proverbial fool with a tool will always remain a fool.

When used properly, a method, together with a description standard, enables unambiguous communication with audiences that are trained in the method. A method encourages the reuse of successful practices. A method provides memory support – it reminds its user of the necessary steps, artefacts and attributes.

But a method cannot replace professional experience, communication skills or industry knowledge. All these qualities need to be in place, and when they are, the method must be fitted to the problem, not vice versa. Its user should be sufficiently skilled to play with it.

To master a method to the point that one can play with it and creatively use it takes time and lots of practice. It also requires a thorough understanding of the problems to be solved and the possible solutions.

In a recent project, an assessment of the available architecture artefacts uncovered, among other things, the absence of a complete overview and descriptions of the deployment units that were placed onto the nodes in the operational model. When uttered to an audience of skilled and trained architects, this sentence is a clear instruction about a gap that needs closing. But the task to add overviews and lists of deployment units to their high-level designs was given to three untrained application architects. Instead of asking “What do you mean by deployment units and what attributes should we capture about them?”, they assumed that they understood what was asked and started working. The result was dramatic!

This is a perfect example of how a method should not be used. What should have been explained to the application architects is that they should document with which user groups and systems the application interfaces and how, what data it persists and how, what it executes and how, and what needs to be installed and how. After having been given that explanation, they should have been given examples of how to document this information and voilà – deployment units come to life!

I have been teaching architecture and design methods for almost two decades. My best students have always been the ones who came with a positive attitude, dared to ask questions and tried to understand what was behind the method, why it was crafted the way it was. These students have all tremendously helped to improve the education materials with their sharp views. After all, methods too depend on their practitioners and will improve by being used in real-life situations!

Performance personas – meet Grant, Gwen, Petra, Chris and Becky

In my previous blog I introduced five ‘performance personas’. The performance personas are the people in the Agile program who are accountable or responsible for acceptable performance of the ‘minimum viable product’ and all its extensions. In addition to the end user, four other performance personas were introduced: the product owner, the performance engineer, the performance test manager and the capacity manager. Obviously this list is not exhaustive – in practice the list of stakeholders can be longer.

In this blog I will make an attempt to sketch the perspectives of some of the performance personas based on a case study.

AMGRO is an imaginary retailer that went live with an e-business home shopping solution in 2003 and is now embarking on a project to rebuild its home shopping portal into a multi-site, always-on cloud solution. Note that this is not a one-to-one cloud migration! During the migration the portal will be rebuilt using an architecture based on microservices and API connectivity to the back-end systems, and this will be done in an Agile way.

Grant Hawker, the AMGRO program manager and product owner for the home shopping service, is accountable for end user satisfaction. Grant was involved with AMGRO’s previous e-business projects as IBM program manager and accepted an executive position at AMGRO ten years ago. He has learned the hard way that performance deserves special attention. Especially the software built by the small software house “Hackett & Runn”, which AMGRO contracted around 2006, has caused him many headaches – the first releases of the home shopping application had many stability and performance issues and he wants to avoid this situation at all costs now.

Grant has therefore appointed Petra Quicksilver as performance engineer on the AMGRO cloud migration program, and he has urged Gwen Imreddy, the scrum master, to invite Petra to the Design Thinking workshops and the daily scrum meetings. Petra has worked with AMGRO over the past five years on various projects and she knows their environment pretty well. Petra is responsible for ensuring that the migrated system supports the workload and that response times are maintained.

The end user of the AMGRO home shopping portal could be anybody who occasionally shops online with AMGRO. Real end users have been invited to a series of Design Thinking workshops. The empathy maps and day-in-the-life chronologies for the end user have been extensively discussed and described in these workshops, which were led by Gwen Imreddy.

Petra has attended some of the workshops and is now asked by Grant to give feedback on the results. The first question that Petra is trying to answer is: “Have the user stories correctly captured the end user’s expectations with respect to performance? And are these user expectations documented in a SMART enough way to base a scalable design upon?” (SMART=Specific, Measurable, Achievable, Relevant, Time-bound)

During the Design Thinking workshops the end users literally said: “I expect the AMGRO home shopping service to be as fast as G**gle”. And they added: “Whatever you do to change the portal, we expect the home shopping experience to be at least as fast as before the cloud migration”. This was documented in the WOW section of some of the user stories. Petra has just attended a web lecture on how to write SMART performance requirements and she notices that these requirements obviously do not fall into that category.

She checks with Chris Rocksolid, the AMGRO capacity manager. Chris is responsible for the smooth running of the system in production. He monitors business activity, the performance experienced by end users and the daily schedule. Chris confirms her suspicion that it will be very difficult to monitor and measure these end user requirements in production. But they also conclude that it probably does not make sense to go back to the end user group and ask them to be more specific. They will have to be creative and come up with a better plan. Petra informs Chris that nothing will happen in the Agile program unless it is prioritized on the backlog or documented in the definition of done. She learned this at an Agile boot camp that she attended recently.

Petra schedules a meeting with Grant, Gwen and Chris to discuss the proposal that she and Chris have made. It consists of the following points:

  1. Petra and Chris will organise a ‘performance awareness’ session for the scrum teams. This will help the teams design to the performance targets. Gwen agrees that this is a good plan.
  2. Grant will only accept user stories that have SMART performance requirements. This criterion will be added to the DoD. This also adds to the monitoring requirements.
  3. If a user story copies existing functionality, the existing SMART performance requirements remain valid. Chris keeps track of production monitoring statistics that reveal the percentage of user transactions that meet or fail the criteria (a minimal sketch of such a check follows after this list). This information will be used to fine-tune the performance requirements for the user stories.
  4. If a user story refers to new functionality, the scrum team, with help from Petra and Chris, will analyse the user story in more detail and come up with a new SMART performance requirement. The feasibility of the requirement will be tested in the mandatory unit performance test.
  5. Petra advises hiring Becky Czeckitt, an experienced performance test manager, to set up a ‘continuous performance testing capability’ for the program. All sprints that deliver performance-sensitive functionality will have to be tested with a production workload to check whether the software can meet the SMART performance requirements.
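A SMART response-time requirement only has teeth if it can be checked against production measurements. As announced in point 3, here is a minimal sketch of such a check; the threshold, target percentage and sample data are invented for illustration.

```python
# Minimal sketch of the production check behind point 3: what percentage of
# measured transactions meets a SMART response-time requirement?
# The threshold, target percentage and sample data are invented.

def compliance(response_times_s, threshold_s):
    """Return the percentage of transactions at or under the threshold."""
    if not response_times_s:
        return 0.0
    ok = sum(1 for t in response_times_s if t <= threshold_s)
    return 100.0 * ok / len(response_times_s)

if __name__ == "__main__":
    measured = [0.8, 1.2, 0.9, 3.5, 1.1, 0.7, 2.2, 1.0]   # seconds, invented
    threshold = 2.0                                        # e.g. "95% under 2 s"
    pct = compliance(measured, threshold)
    print(f"{pct:.1f}% of transactions met the {threshold}s target "
          f"({'PASS' if pct >= 95 else 'FAIL'} against a 95% goal)")
```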

The proposal is accepted and the AMGRO team now embarks on a journey to structurally include performance aspects in the Agile program!

Performance personas, part 1 introduction

Reflecting on the question of how engineering practices can best be embedded into Agile and Design Thinking methods, a thought crept into my mind that would not go away. It is key to have skilled professionals on the team. If only the right set of personas were identified and elaborated, that would create awareness and provide the justification to attract these professionals. And as a result quality-of-service (non-functional) aspects could be better positioned in Agile methods.

Agile and Design Thinking methods are based on a deep understanding of end user groups. Through the creation of imaginary archetypal ‘personas’ these methods offer a structured approach to extensively research and document the behaviors, experiences, beliefs and pain points of groups of end users. The goal is to come up with a solution that is usable and meaningful for the target user groups.

As these archetypal end users mostly represent consumer roles or roles that directly support the business, it is only natural that their behaviors, experiences and pain points relate to business functionalities. These business functionalities are captured in the WHAT part of user stories.

To illustrate my point I will re-introduce the famous AMGRO (Amalgamated Grocers) case study, well known to generations of IBM architects. Let’s assume that, after having successfully exploited their web portal on traditional infrastructure, AMGRO is now moving its shopping portal to the cloud, that it is making this move in an Agile manner and that it has just completed a series of Design Thinking workshops.

An example of a user story created in the AMGRO cloud migration program is given below. As you can see, it adheres to the WHO, WHAT, WOW paradigm:

“It should take an AMGRO online shopper no more than 10 minutes to complete a purchase and to receive confirmation of their purchase through the new cloud-based web portal” 

  • WHO = an AMGRO online shopper
  • WHAT = complete a purchase and receive a confirmation of their purchase through the new cloud-based web portal
  • WOW = no more than 10 minutes

The only place in WHO-WHAT-WOW user stories to document quality-of-service aspects, such as performance, is in the WOW part. Note that to make this particular user story ‘SMART’ the meaning of ‘a purchase’ will have to be clearly defined! A good addition could be ‘consisting of 10 items’. And this user story can be made even more compelling by adding ‘and shoppers must be able to access the portal 24×7 from PC as well as mobile devices’.

So Design Thinking at least offers a way to document end user requirements for performance. It does not give any guidance on how to achieve the WOW effect aimed at by these requirements. Thus, to ensure that performance (and the same is true for other quality-of-service aspects) is given the attention it deserves in an Agile program, we will need the Agile PEMMX approach that Phil Hunt has introduced in his blog. But WHO in the Agile program will be accountable and responsible for driving that approach?

Let’s see if we can identify personas for those stakeholder groups that are directly affected by failing quality of service and for the professionals who are responsible for preventing that from happening. The relevant questions are:

  • Who, apart from the online shopper, will have a problem when purchases cannot be completed and confirmed within 10 minutes? In other words, ‘who feels the pain when the web portal does not perform?’ The answer is that there must be a product owner for the web portal and that they will probably be called when there is an outage or performance is unacceptably slow.
  • Whose responsibility is it to design the web portal in such a way that it meets its performance targets? In other words, ‘who will be called by the product owner to fix the performance issues?’ The answer is that the performance engineer is the most likely person to be woken up in case of problems. But the performance engineer cannot do this on her own. She needs a team of skilled specialists to come up with tuning options.
  • Whose responsibility is it to test that the design meets the performance requirements? In other words, ‘who will check that the tuning options that the performance engineer came up with together with her team actually work?’ The answer is that the performance test manager will take care of that. And probably he gets help from one or two testers.
  • Who looks after the product’s performance once it has gone live? In other words, ‘who will see to it that the optimized solution stays optimized once it is in production?’ The answer is that the capacity manager will monitor performance and capacity in production.

We have now defined four key ‘performance roles’. In large programmes there could be additional supporting roles, but these four roles are sufficiently instructive to serve our purpose. In subsequent blogs we will dive deeper into these performance personas and give them a name and a face. So stay tuned!