Agile On The Beach Talk

Ciarán and I had a wonderful time at the Agile on the Beach conference last week. We did the first full version of our talk: “The ‘Just Do It’ approach to change management”. I did an earlier version of the talk at the DARE conference in Antwerp earlier this year, but this longer version has gone through quite a few changes in the meantime.

[Image: Agile on the Beach]

The conference was set-up very well, and it was great to talk to so many people working on Agile in the UK.

The slides for the talk are up on slideshare:

We got some really nice responses, including:
The next chance to catch us is at the Lean and Kanban Netherlands conference (“Modern Management Methods: Making Better Decisions”) in Maarssen on 7-8 October. We’ll have a new iteration of the talk, of course. Always on the move :-)

[Image: LKNL ‘I’m a speaker’ badge]
UPDATE: The video of the talk was just released, and can be found on the conference website. Our talk can also be viewed directly on YouTube:
 Next year, Agile on the Beach will be on 4-5 September, and you can register your interest.

DevOps and Continuous Delivery

If you want to go fast and have high quality, communication has to be instant, and you need to automate everything. Structure the organisation to make this possible, and learn to use the tools to do the automation.

There’s a lot going on about DevOps and Continuous Delivery. Great buzzwords, and actually great concepts, but not altogether new. For many organisations, though, they’re an introduction to agile concepts, and that sometimes means some of the background people have when they arrive at these things the natural way, through Agile process improvement, is missing. So what are we talking about?

DevOps: The combination of software developers and infrastructure engineers in the same team with shared responsibility for the delivered software

Continuous Delivery: The practice of being able to deliver software to (production) environments in a completely automated way. With VM technology this includes the roll-out of the environments.

Both of these are simply logical extensions of Agile and Lean software development practices. DevOps is one particular instance of the Agile multi-functional team. Continuous Delivery is the result of Agile’s practice of automating any repeating process, and in particular enabled by automated tests and continuous integration. And both of those underlying practices are the result of optimizing your process to take any delays out of it, a common Lean practice.

In Practice

DevOps is an organisational construct. The responsibility for deployment is integrated into the multi-functional agile team in the same way that requirements analysis, testing and coding already were. This means an extension of the necessary skills in the teams: system administration skills, but also a fairly new set of skills for controlling the infrastructure as if it were code, with versioning, testing, and continuous integration.

Continuous Delivery is a term for the whole of the process that a DevOps team performs. A Continuous Delivery (CD) process consists of developing software, automating testing, automating deployment, automating infrastructure deployment, and linking those elements so that a pipeline is created that automatically moves developed software through the normal DTAP stages.

So both of these concepts have practices and tools attached, which we’ll discuss briefly.

Practices and Tools

DevOps

Let’s start with DevOps. There are many standard practices aimed at integrating skills and improving communication in a team. Agile development teams have been doing this for a while now, using:

  • Co-located team
  • Whole team (all necessary skills are available in the team)
  • Pairing
  • Working in short iterations
  • Shared (code, but also product) ownership
  • (Acceptance) Test Driven Development

DevOps teams need to do the same, bringing the operations skill set into the team.

One question that often comes up is: “Does the entire team suddenly need to have this skill?”. The answer to that is, of course, “No”. But in the same way that Agile teams have made testing a whole-team effort, operations becomes a whole-team effort. The people in the team with deep skills in this area will work together with some of the other team members in the execution of tasks. Those others will learn something about this work, and become able to handle at least the simpler items independently. The ops person, in turn, can learn from the developers how to structure his scripts for re-use, or from the testers how to better test and monitor the product.

An important thing to notice is that these practices we use to work well together as a team are mutually reinforcing: each strengthens the effectiveness of the others. That means it’s much harder to become effective as a team if you only adopt one or two of them.

Continuous Delivery

Continuous Delivery is all about decreasing the feedback cycle of software development. And feedback comes from different places. Mostly testing and user feedback. Testing happens at different levels (unit, service, integration, acceptance, …) and on different environments (dev, test, acceptance, production). The main focus for CD is to get the feedback for each of those to come as fast as possible.

To do that, we need to have our tests run at every code change, on every environment, as reliably and quickly as possible. And to do that, we need to be able to completely control deployment of and to those environments, automatically, and for the full software stack.

And to be able to do that, there are a number of tools available. Some have been around for a long time, while others are relatively new. Especially the tools that can control full (virtualised) environments are still relatively fresh. Some of the testing tooling is not exactly new, but still seems fairly unknown in the industry.

What do we use that for?

You’re already familiar with Continuous Integration, so you know about checking in code to version control, about unit tests, about branching strategies (basically: try not to), about CI servers.

If you have a well constructed CI solution, it will include building the code, running unit tests, creating a deployment package, and deploying to a test environment. The deployment package will be usable on different environments, with configuration provided separately. You might use tools such as the Cargo plugin for deployment to test (and further?), and keep a versioned history of all your deployment artefacts in a repository.
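As a rough sketch of that chain (the script names, stages and ordering here are hypothetical, not the API of any particular CI server), the whole thing boils down to a series of gating steps where a failure stops the line and gives fast feedback:

```python
# Conceptual sketch of a build pipeline: each stage gates the next, and the
# same versioned artefact moves on towards the DTAP environments.
# All script names and environments are made-up examples.
import subprocess

STAGES = [
    ("build",            "./build.sh"),
    ("unit tests",       "./run-unit-tests.sh"),
    ("package",          "./package.sh"),
    ("deploy to test",   "./deploy.sh --env test --config config/test.properties"),
    ("acceptance tests", "./run-acceptance-tests.sh --env test"),
]

def run_pipeline():
    for name, command in STAGES:
        print(f"=== {name} ===")
        # check=True makes a failing stage stop the pipeline immediately,
        # which is exactly what gives us the short feedback loop.
        subprocess.run(command, shell=True, check=True)

if __name__ == "__main__":
    run_pipeline()
```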

So what is added to that when we talk about Continuous Delivery? First of all, there’s the process of automated promotion of code to subsequent environments: the deployment pipeline.

[Image: Deployment pipeline]

This involves deciding which tests to run at what stage (based on their dependency on the environment, and their runtime) to optimize for a short feedback loop with as detailed a detection of errors as possible. It also requires deciding which parts of the pipeline to run fully automatically, and where to still assume human intervention is necessary.

Another thing that we are newly interested in for the DevOps/CD situation is infrastructure as code. This has been enabled by the emergence of virtualisation, and has become manageable with tools such as Puppet and Chef. These tools make the definition of an environment into code, including hardware specs, OS, installed software, networking, and the deployment of our own artefacts. That means that a test environment can be a completely controlled system, whether it runs on a developer’s laptop or on a hosted server environment. And that kind of control removes many common error situations from the software delivery equation.
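To make the idea a little more concrete, here is a deliberately simplified sketch of the concept. This is not Puppet or Chef syntax; it only illustrates the two essential ingredients: an environment description that lives in version control, and an idempotent ‘apply’ step that brings a machine into that state.

```python
# Illustration of infrastructure-as-code as a concept (not real Puppet/Chef):
# a versionable description of an environment plus an idempotent apply step.

TEST_ENVIRONMENT = {
    "packages": ["openjdk-7-jre", "nginx"],
    "services": ["nginx"],
    "artefact": {"name": "webshop", "version": "1.42.0"},
    "config":   {"db_host": "db.test.internal", "log_level": "DEBUG"},
}

class Node:
    """Stand-in for a real machine; here it only reports what it would do."""
    def ensure_installed(self, package):
        print(f"ensure package installed: {package}")

    def ensure_running(self, service):
        print(f"ensure service running: {service}")

    def ensure_deployed(self, artefact, config):
        print(f"ensure {artefact['name']} {artefact['version']} deployed with {config}")

def apply(environment, node):
    """Bring the node into the described state; running it twice changes nothing."""
    for package in environment["packages"]:
        node.ensure_installed(package)
    for service in environment["services"]:
        node.ensure_running(service)
    node.ensure_deployed(environment["artefact"], environment["config"])

if __name__ == "__main__":
    apply(TEST_ENVIRONMENT, Node())
```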

Scaling Agile?

There’s a lot of discussion in the Agile community on the matter of scaling agile. Should we all adopt Dean Leffingwell’s Scaled Agile Framework? Do the Spotify tribe/squad thing? Or just roll our own? Or is Ron Jeffries’ intuition right, and do the terms scaling and agile simply not mix?

Ron’s stance seems to be that many of Agile’s principles simply don’t apply at scale, or apply in exactly the same way, so why act differently at scale? That might be true, but it might also be a little too abstract to be of much use to most people who run into questions when they start working with more than one team on a codebase.

Time and relative dimension in space

When Ron and Chet came around to our office last week, Chet mentioned that he was playing around with the analogy of coordination in time (as opposed to cross-team) when thinking about scaling. This immediately brought things into a new perspective for me, and I thought I’d share that here.

If we have a single team that will be working on a product/project for five years, how are they going to ensure that the team working on it now communicates what is important to the team that is working on it three, four or five years from now?

Now that is a question we can easily understand. We know what it takes to write software that is maintainable, changeable, self-documenting. We know how to write requirements that become executable, living documentation. We know how to write tests that run through continuous integration. We even know how to write deployment manifests that control the whole production environment to give us continuous deployment.

So why would this be any different when instead of one team working five years on the same product, we have five teams working for one year?

This break in this post is intentionally left blank to allow you to think that over.

Simple Design

[Image: Scrum TARDIS]

Scrum really is bigger on the inside!

This way of looking at the problem simplifies the matter considerably, doesn’t it? I have found repeatedly that there are more technical problems in scaling (and agile adoption in general) than organizational ones. Of course, very often the technical problems are caused by the organizational ones, but putting them central to the question of scaling might actually help re-frame the discussions on a management level in a very positive way.

But getting back to the question: what would be the difference?

Let’s imagine a well constructed Agile project. We have an inception where the purpose of the product is clearly communicated by the customer/PO. We sketch a rough idea of the architecture and features together. We make sure we understand enough of the most important features to split off a minimum viable version, perhaps using a story map. We start the first sprint with a walking skeleton of the product. We build up the product by starting with minimal versions of a couple of features. We continue working on the different features later, extending them to more luxurious versions based on customer preference.

As long as the product is still fairly well contained, this would be exactly the same with a few teams. We’d come to a general agreement on design early on, and would talk whenever a larger change came up. Continuous integration takes care of much of the lower-level coordination, with our customer tests and unit tests providing context.

One area does become more explicit: dependencies. Where the single team would automatically handle dependencies in time by influencing prioritization, the multiple teams would need to have a commonly agreed (and preferably commonly built) interface in existence before they could be working on some features in parallel. This isn’t really different from the single-team version above, where the walking skeleton/minimal viable feature version would also happen before further work. But it would be felt as something needing some special attention, and cooperation between teams.

If we put these technical considerations central, that resolves a number of issues in scaling. It could also allow for much better risk/profit trade-offs by integrating this approach with a set-based approach to projects. But I’ll leave that for a future post.

Not Estimating At Scale

Estimation is a sensitive subject in Agile. Should we estimate? Do we avoid estimation in days or other time-based units? If we use relative estimation like story points, do we standardize across teams? What do we use estimation for? Are we explicit enough in emphasizing the distinction between estimations and commitments? How do we prevent abuse?

I’m not going to provide an answer to these questions.

If you want to get a good treatment on estimation with regards to agile, I suggest you read Ron Jeffries’ excellent articles in the Pragmatic Programmer’s magazine: Estimation is Evil: Overcoming the Estimation Obsession, and Estimation: The Best We Can Do

I’m just going to describe a fairly simple and effective way to not (well, kind of not) estimate with multiple teams. This is based on a recent experiment doing this exercise in a running project.

Not Estimating

In this project we had not been estimating at all up until this point, which was nine months in. And when I say we did not estimate, I am of course lying. But we did not estimate individual user stories (or larger items: Epics, Features, whatever you want to call them).

We started this project introducing Scrum, and going into one-week sprints. None of the participants were used to working in an iterative way. And the requirements for the project were completely unclear. So even if we wanted to do estimations, there was very little to estimate! We decided to forgo estimation, and simply asked the teams to split the user stories into small enough slices that a story could be finished in two to three days.

Of course, that means that there was estimation involved. Otherwise they couldn’t know whether the story was small enough. So I wasn’t lying about the lying. But the effort was limited, and only done for stories that were being groomed for the next sprint(s).

Delivery after nine months?

Fast forward nine months into this project, and we are in a somewhat different state. For one, we are no longer doing Scrum, but have moved to a Kanban approach. We have a two-step Kanban configuration. Stories are prepared through a READY board, whose process steps’ Explicit Policies reflect the Definition of Ready we had as a Scrum team. One of the policies is the ‘takes less than 2–3 days to get Done’ rule. One of the three development teams is involved in the grooming of an individual story and usually (but not necessarily) picks up that story later by moving it to their Build Board.

At nine months, people traditionally get interested in the concept of delivery. Of course, our project had already delivered; production figures were being produced. But the project was supposed to ramp down around the 12-month mark. That meant there was interest in finding out what part of the features still on the wishlist could be delivered by that time. And that required some estimation.

What to estimate

At this point, there are a bunch of high-level areas of interest. These haven’t been looked at or communicated with the teams yet, and haven’t been prioritized. In fact, one of the main reasons we need the estimations is to help with prioritization. We do not want to spend a lot of time estimating these things. We should focus on actually delivering software, not on talking about things we might never work on.

We also don’t want to give the impression that the estimations we come up with are very accurate. A good estimation makes its uncertainty explicit. We decide to estimate the incoming feature requests in ‘T-shirt sizes’, each categorized by a range of the number of stories expected to be necessary to implement the feature:

T-Shirt Sizes

  • Small: 1–5 stories
  • Medium: 6–10 stories
  • Large: 11–15 stories
  • Parked: not clear enough to size, or simply too large; requires splitting and clarifying

To make sure we wouldn’t spend a lot of time in detailed discussion, we decided to use Silent Grouping as the method of estimation. To make it work across our teams, we added a little diverge and merge to the mix.

Estimation process

We arranged for a number of sessions, each 90 minutes long. Each session would be dedicated to one main functional area. The Product Owner (or in this case the functional area PO helper) would introduce the functional area, explain the overall goals, and run through the features that had been identified in that area. This introduction would be half an hour, and would allow the teams to ask any questions that came up on the subject discussed.

Then we split into four different corners of our room, each attracting 8-10 people, and performed the silent grouping exercise there. This simply meant that everyone that showed up got a feature token (in this case simply the title of the feature printed on a piece of paper) and was invited to put it onto a board in one of four columns (the categories described above). Then people were allowed to change the position of any paper if they didn’t agree with its placement. And all of this happened in complete (well, we tried…) silence.

After a few minutes, things stopped moving, and we then went for the ‘merge’: on a central board, we called out each feature title and, based on the placements on the four separate boards, determined the final position for the story. We did a few iterations of this, but our final set of rules, sketched in code below, seemed to work quite well:

  • Stories that have the same estimation on all boards, obviously go into that category on the main board
  • Stories that have two different, but adjacent, estimations go into the larger category
  • Stories that have three or more different estimations go into ‘parked’
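Expressed as a small code sketch (the handling of non-adjacent pairs and of boards that park a story is my own assumption, since the rules above don’t spell those cases out), the merge looks like this:

```python
# Sketch of the merge rules for the silent-grouping boards.
SIZE_ORDER = ["Small", "Medium", "Large"]

def merge_estimates(board_placements):
    """Combine one feature's placements from the separate boards into a final size."""
    distinct = set(board_placements)

    # Rule 1: all boards agree.
    if len(distinct) == 1:
        return distinct.pop()

    # Rule 2: two different but adjacent categories -> take the larger one.
    if len(distinct) == 2 and "Parked" not in distinct:
        small, large = sorted(distinct, key=SIZE_ORDER.index)
        if SIZE_ORDER.index(large) - SIZE_ORDER.index(small) == 1:
            return large

    # Rule 3 (and everything else, by assumption): park it.
    return "Parked"

print(merge_estimates(["Small", "Small", "Medium", "Small"]))  # -> Medium
print(merge_estimates(["Small", "Large", "Medium", "Small"]))  # -> Parked
```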

We found that we regularly had some discussion on whether something was parked because of uncertainty, or size. But when we tried those out as separate categories, most turned out to be uncertain and the added value was limited.

[Image: Sizing board]

Results

We had four 90-minute sessions to estimate what turned out (of course :-) to be quite a bit more than three months of work. We found that quite a large part of the features ended up in ‘parked’, simply because they were not clear enough for the development teams to give even this kind of ball-park estimate. To get these features clearer, ‘three amigos’ grooming sessions were set up. This brought the number of parked features down considerably, and got us to a fair idea of the total amount of work. Since those sessions did not include the entire team, this did break our intent to have everyone involved in estimation, but we haven’t found a better way of doing this yet.

A second, and maybe even more important, effect was that the whole team was instantly up to date on what we had planned for the next time period. A number of additional features and technical improvements were brought up during and after these sessions as people realized they might be needed, or would fit the direction the project was going in.

And the estimates gave the more distant business owners the required feedback to consider the priorities and decide what they could do without. For now.

Actionable Metrics at Organizational Scale

I recently chaired a session on ‘Going from company vision to Actionable Metrics’ at the Stoos Stampede conference in Amsterdam. In that session I tried to show some ideas on making the link from an overall company vision, through different approaches to achieve that vision, to concrete actionable metrics that allow teams within a company to autonomously pursue steps towards making that vision a reality. I’m not sure I succeeded in all of that in the session, so I’m trying again in this post…

Autonomy

A goal of a lean enterprise is to ensure that the people doing the work have all the information, knowledge and skills necessary to make decisions in their day-to-day work. For a lean knowledge organisation that means that people don’t just need to know their own work field well, they need to be able to relate the decisions they make every day to the longer-term goals and vision of the organisation.

Much has been said about supporting high levels of motivation and customer focus within companies. Especially in larger companies this is quite hard to sustain, which is not surprising given that works such as Dan Pink’s Drive emphasise the importance of autonomy for the knowledge worker. Ensuring the right information, and a quick feedback loop, for knowledge workers is key to motivated, high-performing people.

Networks

Such autonomy can’t easily be achieved in a classically structured hierarchical organisation. The silos inherent in that type of structure are natural barriers. Barriers to the autonomy of action, where the distribution of the necessary skills and knowledge over separate departments is an impediment to producing work and serving the customer. Barriers as well to the autonomy of reaction, where the feedback loop telling you whether an action was in any way effective in reaching the goals of the organisation is too long, or absent.

An organizational structure much more compatible with that goal of autonomy is that of a network organisation. The basic concept of a network organisation is that of independently working cross-functional teams that gather each other’s support where necessary, but are generally able to make their own decisions. Enabling them to make their own decisions is the subject of this post. These are the type of organizations that the Stoos Network considers the preferred replacement for today’s dysfunctions.

Actionable Metrics

The Lean Startup concept of Actionable Metrics (in order to create Validated Learning) is a great way to give a team the necessary autonomy to work independently towards the right goals. In a startup those metrics can be very directly linked with the goals of the company. In larger organisations there is a need for a clear link between the overall company vision and Actionable Metrics that are usable at the team level.

An actionable metric is one that ties specific and repeatable actions to observed results. — Ash Maurya, http://www.ashmaurya.com/2010/07/3-rules-to-actionable-metrics/

In this post I’ll be using an Effect Map as the method to link the vision to specific metrics, but other methods exist, of course. During the session, Catherine Louis mentioned GQM as a method designed to determine which metrics to use. This paper gives some more background on GQM. The GQM method seems mostly concerned with determining the right metric for any given goal or problem, and can as such be very useful within the type of context I’m talking about. Another approach to determining the metrics you need is the A3 method.

The nice thing about Effect Maps is that they are very inclusive, and involve different roles and functions in their creation. This fits well with the multi-functional teams in our target organisation. They also scale easily, using a diverge and merge facilitation process, so you can work on this with larger groups with full participation.

We’re on a mission from…

The first point of order is determining why we’re here. Not in a metaphysical way. I don’t really have the patience for that. In a ‘What are we trying to do as a company?’ way. A company’s vision and mission statements should provide us with a good starting point here. A vision statement could be “A literate future”, with a mission statement of “More readers, more books.”

This is of course very generic, and a subsequently generated Effect Map could go all over the place:

[Image: Effect Map example]

One thing we always need to add to the ‘Why?’ part of the Effect Map is a concrete, measurable goal. In this case that could be encouraging people to read more books, going from a current estimate of 100 books in a ‘lifetime’ (30 years, apparently, in the poll we got that figure from) to 1000.

Our company could encourage people to read more books in many different ways. The Effect Map shows various directions: working through publishers, changing business models, working with public libraries, promoting reading in schools, making books cheaper, working with writers directly instead of through publishers, and some ways of helping people find the right books through technology.

Since we are a technology company, those last options seem the most relevant to start with. A larger company would probably start exploring some of the other possibilities as well, and perhaps be able to integrate those with the technology work. That could mean incorporating different sales models into the e-reader software. Or creating a second-hand e-book market in there. Or something. Plenty of opportunities!

Getting to actionable metrics

How do we go from such a generic goal (people read 10x more books in their lifetime!) to some actionable metrics that can be used by the multi-functional teams that make up our network organization? These teams need to be able to use those metrics in their day-to-day decision making. They need to be able to devise experiments, prioritise work, and navigate towards products and solutions without the type of top-down supervision that characterizes the more traditional organization.

First of all you need a baseline. Say we have a product through which people can read books: e-reader software (I told you we were a tech company). From that software we could gather statistics on the number of books people read. To do this well, we’d probably need to track this relative to how long customers have been using our software, so we don’t get skewed figures from early enthusiasm (for instance). The term to look for is cohort testing. In our example, it turns out people are buying, on average, one book every three months. To get to the goal of 10x more books, we should then improve this to three books a month! This is already a shorter-term, and thus more helpful, goal.
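As an illustration of what such a cohort view could look like (the data, dates and numbers below are entirely made up), grouping purchases by the month a customer signed up keeps that early enthusiasm from skewing the average:

```python
# Made-up sketch: books bought per customer per month, grouped by signup cohort.
from collections import defaultdict
from datetime import date

# (customer_id, signup_date, purchase_date) -- invented sample data
purchases = [
    (1, date(2013, 1, 10), date(2013, 1, 12)),
    (1, date(2013, 1, 10), date(2013, 4, 2)),
    (2, date(2013, 3, 5), date(2013, 3, 20)),
    (2, date(2013, 3, 5), date(2013, 6, 18)),
]

def months_between(start, end):
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

def books_per_customer_month(purchases, today):
    cohorts = defaultdict(lambda: {"customers": set(), "books": 0})
    for customer, signup, _bought in purchases:
        cohort = signup.strftime("%Y-%m")
        cohorts[cohort]["books"] += 1
        cohorts[cohort]["customers"].add((customer, signup))
    return {
        cohort: data["books"] / sum(months_between(s, today) for _, s in data["customers"])
        for cohort, data in cohorts.items()
    }

print(books_per_customer_month(purchases, today=date(2013, 7, 1)))
# -> roughly 0.29 books/customer/month for the January cohort, 0.4 for March
```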

Pirates

To get to more useful figures, we need to turn to Dave McClure’s Pirate Metrics. Pirate Metrics are all about the funnel of attracting customer interest, keeping customers, and selling to them: Acquisition, Activation, Retention, Referral and Revenue, or AARRR. Just looking at customers through this lens gives a useful perspective. Our goal is phrased as getting people to read (on average) 10x more books. This could be approached as a matter of increasing Retention (more books per customer), but also as one of Acquisition/Activation (getting more customers). That last one only works if we don’t take them away from other sources of books, of course. Can you think of a way to measure that? Certainly combined with increasing retention it would still give a net positive effect.

This gives us two main variables to pay attention to: Retention and Acquisition. We should, as a matter of course, be paying attention to at least the first three (AAR) of these metrics, and most companies will have a natural tendency to also track the last R… But tracking what the results of specific actions are on Retention and Acquisition should be our focus for now.

[Image: Pirate Metrics]

Splitting Metrics

But wait! In the Effect Map we had come up with two high-level feature ideas that would help us reach our goals: ‘Social Reading’ and ‘Better Book Recommendations’. Should both these ideas work with the same metrics?

Interesting question. On the one hand, I’d expect to be tracking all the pirate metrics in a well-established application. But. The whole idea here is that you focus. So while we should keep a global eye on the whole (I’ll get back to that later), the experiments we’re conducting should focus on changes in a particular variable (or set of variables).

For our examples:

  • Social Reading – This is mostly about existing readers getting each other interested in other books. That would be Retention. Secondary would be getting new customers in by sharing outside the app, which would be Referral. It’s important to note that distinction, as it has a direct influence on the priority of hypotheses to try.
  • Recommendations – This is also mostly about Retention. Existing readers should get more relevant recommendations, and thus buy more books. The second level would be Activation. People who visit our shop already, but haven’t bought anything yet, should also get better recommendations and thus be prompted to buy.

This is consistent with the way we defined our goals, focusing on existing readers. That means it gives a decided focus to our development work. Phrasing our goals a little differently might increase our attention to new customer acquisition, but we’re not doing that. Consciously diving down into our metrics makes these kinds of choices explicit, and that’s A Good Thing.

Absolutely Relative

So how would our teams take these metrics towards specific hypotheses? First, we’d establish a baseline for retention. That could be:

  • When people buy a book, the average time between this purchase and the previous one is 92 days

Then we can start measuring this over time. A nice, always visible chart on a big screen in the development teams’ rooms would be a great idea.

This is a useful metric, as we can measure it day by day. It can also be calculated in time-based cohorts, as well as feature-based cohorts, so we can compare normal changes over time with changes caused directly by our new features.
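A minimal sketch of that metric, again with invented data: take each customer’s consecutive purchases, compute the days in between, and average per cohort, whether that cohort is ‘signed up in March’ or ‘has the sharing feature switched on’.

```python
# Invented example: average days between consecutive purchases, per cohort.
from datetime import date
from statistics import mean

# cohort label -> {customer: [purchase dates]}
cohorts = {
    "2013-03 signups": {
        "a": [date(2013, 3, 1), date(2013, 6, 3), date(2013, 9, 1)],
        "b": [date(2013, 3, 15), date(2013, 7, 1)],
    },
    "sharing feature on": {
        "c": [date(2013, 6, 1), date(2013, 8, 10)],
    },
}

def avg_days_between_purchases(history):
    gaps = []
    for dates in history.values():
        dates = sorted(dates)
        gaps += [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    return mean(gaps) if gaps else None

for label, history in cohorts.items():
    print(label, avg_days_between_purchases(history))
```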

Hypothetically Speaking…

Ok, now we can get started. “Social Reading” is quite a broad concept. Our imaginative team of developers and product people can brainstorm up quite a large cloud of ideas that fall within that scope, and they might have a collective gut feeling about which of those would be most effective. They might have used another Effect Mapping exercise to generate ideas, and dot-voted on the most plausible ones. Or not.

The question they should be asking themselves is:

  • What would be the simplest way, costing the least effort, to show that this idea can indeed be effective in decreasing the average time between purchases?

If that’s not what they’re asking, they might as well be asking their company if it was feeling lucky, inc.

So for any idea they generated, they should be thinking about this question: how can it help disprove (or prove) that the “Social Reading” idea is plausible?

From the long list (or effect map) of ideas they generated (sharing quotes, sharing notes, rating books, publishing ‘reading lists’, embedding shared things on blogs, embedding on Facebook or Twitter, etc.) they pick one item. In this case that item might be a very basic “If a user can easily share on Twitter that he’s reading a book, this will trigger a shorter time between purchases”.

Now there are some problems with this one. Most important of all is that we don’t limit our audience, so we don’t know whether people receiving the tweet will be existing customers. That’s OK, though. It simply means we’re also testing for Referral. Having an ‘internal’ audience might be more effective, but it would probably require a much larger up-front investment to create a communication channel between just our customers, and as such would be a less efficient way to test the hypothesis.

Another problem might be that we’re not helping the customer to share parts of the book, or anything, so the content of the tweet will probably be unspecific. We want more!

Stop!

Hold on. Take it easy. Hold your horses. We were looking for the simplest way to validate our hypothesis. How did we get into a discussion on all the cool features that should be in there? This feature, that feature, estimations (of both effort and expected value), discussions about opinions about hypotheticals…

If you want to know whether some tweets about a specific book, sent to an audience that probably includes some existing customers, have some impact on sales, then what you should do is write a few tweets. About some books. With an account that’s probably already there, from one of the people in the team. One that probably already has other users of the service among its followers.

We all know that this is what should be done, that this is what the whole Lean Startup idea professes: Do The Simplest Thing. But even (or particularly?) in a bigger enterprise we need to put our money where our mouth is. And more importantly, avoid putting too much money where our mouth is and focus on getting that (in)validation of our most important hypothesis.
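Checking the result of such a cheap experiment can be just as cheap. A hypothetical example (none of this is statistically rigorous, and at this stage it doesn’t need to be; we’re looking for a signal worth investigating further):

```python
# Made-up check of a "just tweet about it" experiment: daily sales of the
# tweeted-about books in the week before and the week after the tweets.
from statistics import mean

daily_sales_before = [3, 2, 4, 3, 2, 3, 4]   # invented numbers
daily_sales_after = [5, 6, 4, 7, 5, 6, 5]    # invented numbers

before, after = mean(daily_sales_before), mean(daily_sales_after)
print(f"{before:.1f}/day before, {after:.1f}/day after ({(after - before) / before:+.0%})")

# A visible jump is a reason to invest in a real sharing feature and a proper
# cohort comparison; no jump, and we just saved ourselves a lot of work.
```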

Giving the reins to the team

Taking those minimal steps is an important part of the overall process. It also seems to be one of the most difficult parts. Like developers needing time and practice to get used to working in the small steps of Test Driven Development. Like the Product Owners needing practice to split their requirements up into small enough chunks to be practical within a short sprint. Doing the absolute minimum work required to invalidate a hypothesis is probably the most difficult skill (or discipline?) to master from the Lean Startup mindset.

You can’t make it work without it, though.

Especially in larger organisations, where, simply because of the size of the organisation, the people setting the overall direction are far less involved in individual projects, products and teams than they would be in a small startup!

The collaborative construction of Effect Maps ties our organisation together with a common vision and goal. Our carefully crafted and continuously tuned set of actionable metrics gives teams a clear direction to pursue within their own sphere of influence.

To ensure that the organisational leadership doesn’t need to feel nervous about progress towards their goals, it is crucial that we fail as fast as is possible. And adjust. And try the next idea.

All Together Now

So organizational leadership can comfortably sleep at night in the knowledge that the full intellect and energy of their entire company is being put to work in the pursuit of truth, happiness and organizational goals while continuously self-correcting by the application of validated learning.

What more could they want?

There is one step still missing in this particular example, though. The metrics gathered for the specific experiments provide the very specific data needed for validated learning on the team level. The broader metrics that those are built on are still necessary for the bigger picture.

In our example that means that the targeted cohort testing done in each team is only one slice of the whole. The same (pirate) numbers are being gathered for a much broader cohort over longer periods of time to check whether the organisation as a whole is on the right track. Since that broader cohort would include the entire customer base, it will capture the combined results of all the teams.

[Image: Combining cohorts]

Summary

In this article I’ve tried to illustrate, using a simple example, how longer term organizational goals can be made measurable in the short term, and can be used to provide the direction and purpose for teams to work independently and with full autonomy towards a shared organizational purpose.

Can you capture your organization’s vision in goals? What end-result metrics will you introduce? Can you refrain from cost metrics and focus on new value delivery? Go on. Do it.

On Effect Mapping and Pirate Metrics

During the Specification by Example training I talked about recently, Gojko Adzic introduced me to Effect Mapping. He’s writing a more extensive booklet on the subject, of which he’s released a beta here. I think this is an excellent tool for exploring goals, opportunities and possible features. It can be used as a tool to generate a backlog of features, as a way to explore possible business hypotheses, and perhaps even as a light-weight way to do strategic management of a company.

But let’s start with a short description (see Gojko’s site or beta booklet for the longer one) of what effect mapping is.

Effect Mapping basics

The basic structure of an effect map is that of a structured Mind-Map. A mind-map is a somewhat hierarchical way to note down ideas related to a central theme.

[Image: A mind map]

The effect map is a mind-map with a specific structure. The different levels of the mind-map are based around the answers to four questions:

  • Why? (The Goal)
  • Who? (Who can have a role in reaching that goal; Or preventing it)
  • How? (In what way can they help, business activities)
  • What? (What are the concrete software features to make; Or non-software actions to take)

Additional levels can be specific User Stories, tasks, or actions, but that depends more on how you want to organise your backlog. The important thing is that this provides an uninterrupted flow from high-level goal or vision to concrete work.

In this way, effect maps can provide one of the important missing steps in the Agile software development world: how to determine what features provide value supporting specific business goals.

The goal used in the centre of the effect map is supposed to be a measurable goal: we need to know unambiguously when it has been reached! Gojko gives a nice overview of how this can be done, using a lighter-weight version of the approach Tom Gilb prescribes for making goals measurable. This involves the scale (the thing we’re measuring), meter (the way we’ll measure it), benchmark (current state), constraint (minimum acceptable value, the break-even point), and target (what we want it to be).
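As a made-up example, using the book-reading goal from the earlier post in this collection, such a goal definition could look something like this (the constraint value is my own assumption):

```python
# Invented example of a Gilb-style measurable goal.
goal = {
    "scale":      "books bought per customer per month",
    "meter":      "purchase events from the e-reader software, per signup cohort",
    "benchmark":  1 / 3,   # today: about one book every three months
    "constraint": 1 / 2,   # assumed minimum improvement to keep investing
    "target":     3,       # roughly ten times the benchmark: three books a month
}
```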

Effect mapping is not just about what ends up on the diagram, it’s also the process of generating the map. This is, of course, a collaborative approach: getting the people involved together, and creating and discussing the goal and the ways to get there. It is important that both business people with decision power and the subject matter experts and technical people who know possible solution directions are present. And yes, they do have time for this. Using diverge and merge, as discussed in my previous post, can be very useful again. There’s more (about iterating, prioritising, etc.), but it’s not relevant to the rest of my post. So just go and read the booklet, already.

[Image: Our effect map]

In his booklet, Gojko also links this process to the Lean Startup process of customer development. I think this is a great combination, but I would like to see some tweaks in the type of measurements we use for goals in that case.

Actionable Metrics for Pirates

In his book The Lean Startup, Eric Ries talks extensively about the importance of Actionable Metrics, as opposed to Vanity Metrics. An actionable metric is one that ties specific and repeatable actions to observed results. A Vanity Metric is usually a more generic metric (such as the total number of hits on a website, or the total number of customers) that is not tied to specific (let alone repeatable) actions. There can be many reasons why those change, and isolating the reason is one part of making metrics actionable.

Dave McClure uses the term ‘Pirate Metrics’ to talk about the most important metrics he sees for organisations:

  • A – Acquisition – User is directed to your site;
  • A – Activation – User signs up or is otherwise engaged;
  • R – Retention – User keeps coming back, i.e., is engaged over time;
  • R – Referral – User invites others;
  • R – Revenue – User pays or is otherwise monetized.

These are also the familiar ‘funnel’ metrics, and the link to Ash Maurya’s site above has much more background on them. When using these metrics, it is recommended to do cohort testing, so that you can see the different results for different groups of users (again, see Ash Maurya’s site). Doing this in such a way that the source of the (new) users is trackable allows you to identify the best ways of increasing the number of users, without deluding yourself by extrapolating from great growth figures when they’re the result of a single marketing action.
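A sketch of what tracking those funnel metrics per cohort might look like; the field names and counts are invented for illustration:

```python
# Invented AARRR funnel figures per signup cohort.
cohorts = {
    "2012-05": {"acquired": 4000, "activated": 1400, "retained": 600, "referred": 120, "paying": 250},
    "2012-06": {"acquired": 5200, "activated": 2100, "retained": 1050, "referred": 300, "paying": 480},
}

def conversion_rates(funnel):
    acquired = funnel["acquired"]
    return {step: count / acquired for step, count in funnel.items() if step != "acquired"}

for month, funnel in cohorts.items():
    rates = ", ".join(f"{step} {rate:.0%}" for step, rate in conversion_rates(funnel).items())
    print(month, rates)

# Comparing cohorts month over month shows whether structural changes in the
# product (rather than a one-off marketing push) are improving the funnel.
```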

This is probably the only spot where Gojko’s booklet could use some tuning. The Gilbian metrics he uses for his example are functional metrics (SMART, and all that), but not the type of actionable metrics that would fit the lean startup mold. In this example, we’re talking about reaching a certain number of players (one million), in a certain space of time (6 months), while keeping costs and retention rates within certain limits. Gojko does of course explain how to do this iteratively, taking a partial goal (fewer players than in the end goal) and checking at a defined milestone whether it was reached. And because we’ve only done one ‘branch’ of the effect map, we do have a specific action to link to the result.

If we were to re-cast this more along the lines of our pirate metrics, we could rephrase the goal as increasing the Acquisition and Activation rates. To be clear: this is a whole other goal! This goal is about a change that will ensure structural growth in the longer term. The goal of 1M extra users could (perhaps) be reached by increasing marketing spending (note that only operational costs are taken into account in the original example). In that case the goal would be reached, but given a typical Retention rate of (for example) six months, there would be an equivalent exodus of users after that time. If the company had reacted based on the total number of users, this could lead to incorrect actions, such as hiring extra people.

If the CEO comes down with the message ‘We want 1 million users!’, and it turns out he wants that in about 6 months’ time, we can then say, “Ok, that’s about 5500 new users per day, or a growth rate of 1.6%” (per day, again). Then we can start creating and testing hypotheses in much smaller steps than those six (or three) months. What’s more, by using these metrics (and goals) it should become possible to use the same metrics further out into the effect map.
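Those numbers are easy to sanity-check. The starting user base below is a pure assumption, only there to show how a compound daily growth rate follows from it:

```python
# Back-of-the-envelope check for "1 million extra users in about 6 months".
target_new_users = 1_000_000
days = 6 * 30

print(target_new_users / days)  # ~5,555 new users per day, taken linearly

# As a compound daily growth rate the answer depends on where you start.
# Assuming (purely for illustration) an existing base of 60,000 users:
existing_users = 60_000
daily_rate = ((existing_users + target_new_users) / existing_users) ** (1 / days) - 1
print(f"{daily_rate:.1%} growth per day")  # in the order of 1.6%
```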

And if this is done at a larger scale, teams can take some of the higher-level hypotheses / business activities and use their own domain knowledge to devise more experiments to find a way to reach those goals. Since churn is included in the figure, this also allows experiments based on increasing retention rates. And since we’re doing cohort testing on pirate metrics, we can know day-to-day and week-to-week whether what we’re doing has the expected results. This extends the set-based design paradigm used in effect mapping to a broader organisation.

Strategy and organisational structure

So by using effect mapping, we can make the relation between high-level goals, stakeholders and intermediate-level goals visible, using a collaborative process involving different parts and levels of the organisation. By using a consistent set of (value / end-result focused) metrics that can be applied in both the short and the longer term, throughout the organisation, we enable all levels of the organisation to apply their knowledge and skills to reach those goals. And that allows for more self-organisation (and experimentation) at all levels…

I recently came across the system of Hoshin Kanri (via Bob Marshall). It has some remarkable similarities to what I’ve been talking about. It’s also a method to ensure policy and goals can be distributed throughout the organisation, with involvement of multiple levels and stakeholders at each step. I’ve not studied it extensively, but to me it does feel like a strictly hierarchical system, and one that is mostly used in large companies with a fairly slow (yearly and half-yearly) cycle time. It is used by Toyota, and is part of Total Quality Control, which is supposed to be “designed to use the collective thinking power of all employees to make their organization the best in its field.” It’s nice to see that everything has already been thought of, and we’re just repeating the progress of the past :-)

The difference with effect mapping is its light-weight focus, and the ease with which it can be used in less hierarchical organisations. In fact, I think such a system might well be a prerequisite for the type of organisations we’re talking about in light of the Stoos network: “learning networks of individuals creating value”. My recent proposal for a session at the Stoos Stampede was precisely about finding out how we can link an organisation’s vision/mission to goals specific enough that teams can work towards them independently, but open enough that they do not suffocate and entrap those teams. I think this might be one solution to that problem.

[Image: Stoos Stampede]

Scrum Gathering London 2011 – day 3

The third day of the London Scrum Gathering (day 1, day 2) was reserved for an Open Space session led by Rachel Davies and a closing keynote by James Grenning.

We started with the Open Space. I’d done Open Spaces before, but this one was definitely on a much larger scale than anything I’d seen before. After the introduction, everyone who had a subject for a session wrote their subject down, and got in line to announce the session (and have it scheduled). With so many attendees, you can imagine that there would be many sessions. Indeed, there were so many that extra spaces had to be added to handle them all. The subject for the Open Space was to be the Scrum Alliance tag-line: “Changing the world of work”.

I initially intended to go to a session about Agile and Embedded, as the organiser mentioned that if there weren’t enough people to talk about the embedded angle, he was OK with widening the subject to ‘difficult technical circumstances’. I haven’t done much real embedded work, but was interested in the broader subject. It turned out, though, that there were plenty of people interested in the real deal, so the talk quickly started drifting towards FPGAs and other esoterica, and I used the law of two feet to find a different talk.

My second choice for that first period was a session about getting the involvement of higher management. This session, proposed by Joe Justice and Peter Stevens (people with overlapping subjects were merging their sessions), turned out to be very interesting and useful. The group shared experiences of both successfully (and less successfully) engaging higher management (CxOs, VPs, etc.) in an Agile change process. Peter has since posted a nice summary of this session on his blog.

My own session was about applying Agile and Lean-Startup ideas to the context of setting up a consultancy and/or training business. If we’re really talking about ‘transforming the world of work’, then we should start with our own work. My intention was to discuss how things like transparency, early feedback, working iteratively and incrementally could be applied for an Agile Coach’s work. My colleague and I have been working to try and approach our work in this fashion, and are starting to get the hang of this whole ‘fail early’ thing. We’ve also been changing our approach based on feedback of customers, and more importantly, not-customers. During the session we talked a little about this, but we also quickly branched off into some related subjects. Some explanation of Lean-Startup ideas was needed, as not everyone had heard of that. We didn’t get far on any discussion on using some of the customer/product development ideas from that side of things, though.

There was some discussion on contracts, and how those can fit with an agile approach. Most coaches are working on a time-and-materials basis, it seems. We have a ‘Money for nothing and your changes for free’ type of contract (see http://jeffsutherland.com/Agile2008MoneyforNothing.pdf, sheets 29-38) going for consultancy at the moment, but it’s less of a fit than with a software development project. Time and materials is safest for the coach, of course, but it also doesn’t reward the coach for doing their work better than the competition. How should we do this? Jeff’s ‘money back guarantee’ if you don’t double your velocity is a nice marketing gimmick, but a big risk for us lesser gods: Is velocity a good measure for productivity and results? How do we measure it? How do we determine whether advice was followed?

Using freebies or discounts to test new training material on customers was more generally in use. This has really helped us quickly improve our workshop materials, not to mention hone our training skills…

One later session was Nigel Baker’s. He had done a session on the second day called ‘ScrumBrella’, on how to scale Scrum, and was doing a second one on this third afternoon. I hadn’t made it to the earlier one, but had heard enthusiastic stories about it, so I decided to go and see what it was about. Nigel didn’t disappoint, and had a dynamic and entertaining story to tell. He made the talk come alive by the way he drew the organisational structures on sheets of paper and moved those around on the floor during his talk, often getting the audience to provide new drawings.

There is no way I can do Nigel’s presentation style justice here. There were a number of people filming him in action on their cellphones, but I haven’t seen any of those movies surface yet. For now, you’ll have to make do with his blog-post on the subject, and some slides (which he obviously didn’t use during this session). I can, however, show you what the final umbrella looked like:

All my other pictures, including some more from the ScrumBrella session, and from the Open Space closing, can be found on flickr.

Closing Keynote by James Grenning on ‘Changing the world of work through Technical Excellence’

(slides are here)

The final event of the conference was the closing keynote by James Grenning. His talk dealt with ‘Technical Excellence’, and as such was very near my heart.

He started off with a little story, for which the slides are unfortunately not in the slide deck linked above, about how people sometimes come up to him in a bar (a conference bar, I assume, or pick-up lines have really changed in the past few years) and tell him: “Yeah, that agile thing, we tried that, it didn’t work”.

He would then ask them some innocent questions (paraphrased; I don’t have those slides, nor a perfect memory):

So you were doing TDD?

No.

Ah, but you were doing ATDD?

No.

But surely you were doing unit testing?

Not really.

Pair programming?

No.

Continuous Integration?

Sometimes.

Refactoring?

No!

At least deliver working software at the end of the sprint?

No…

If you don’t look around and realise that to do Agile, you’ll actually have to improve your quality, you’re going to fail. And if you insist on ignoring the experience of so many experienced people, then maybe you deserve to fail.

After this great intro, we were treated to a backstage account of how the Agile Manifesto meeting at Snowbird went, and then to the subjects that came up at this year’s reunion meeting. James showed the top two things coming out of that meeting:

We believe the agile community must:

  1. Demand Technical Excellence
  2. Promote individual [change] and lead organizational change

The rest of his talk was a lesson in doing those things. He first went into more detail on Test Driven Development, and how it’s the basis for improving quality. To do this, he explained why Debug-Later Programming (DLP) will, in practice, always be slower than Test-First/TDD.

The Physics of Debug-Later-Programming

[Image: DLP vs TDD]

Mr. Grenning went on to describe the difference between System Level Tests and Unit Tests, saying that System Level Tests suffer from having to cover the combinatorial total of all contained elements, while Unit Tests can be tailored by the programmer to directly test the use of the code as it is intended, and only that use. This means that, even though System Level Tests can be useful, they can never be sufficient.
Of course, the chances are small that you’ll write sufficient and complete Unit Tests if you don’t do Test Driven Development, as Test-After always becomes Release-First. And depending on manual testing alone is a recipe for unmaintainability.
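To make that difference concrete with a trivial, invented example: a unit test pins down one intended behaviour of one unit directly, while a system-level test of the same rule has to reach it through every layer in between, multiplied by all the other paths it needs to cover.

```python
# Trivial invented example: unit tests exercise the intended use of one unit.
import unittest

def shipping_cost(order_total):
    """Orders of 25.00 or more ship for free; below that, shipping costs 4.95."""
    return 0.0 if order_total >= 25.00 else 4.95

class ShippingCostTest(unittest.TestCase):
    def test_free_shipping_from_25(self):
        self.assertEqual(0.0, shipping_cost(25.00))

    def test_small_orders_pay_shipping(self):
        self.assertEqual(4.95, shipping_cost(24.99))

if __name__ == "__main__":
    unittest.main()
```

A system-level test of that same rule would have to go through the UI, the services and the database, repeated for every combination of the other checkout options it touches: useful, but never sufficient on its own.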

[Image: The untested code gap]

The keynote went on to talk about individual and organisational change, and what Developers, Scrum Masters and Managers can do to improve things. Developers should learn, and create tested and maintainable code. Scrum Masters should encourage people to be Problem Solvers, not Dogma Followers. He illustrated this with the example of Planning Poker. Since he invented Planning Poker, his saying that you shouldn’t always use it is a strong message. For instance, if you want to estimate a large number of stories, other systems can work much better. Managers got the advice to Grow Great Teams, Avoid De-Motivators, and Stop Motivating Your Team!

[Image: DeMotivators]

It was very nice to be there for this talk: it validated my own stance on Technical Excellence, and taught me new ways of talking to people to help them see the advantages of improving their technical practices. Oh, and it gave some support for my own discipline in strictly sticking to TDD…