March 25, 2008
Over-staffing
Following my last post, on Hyper-productivity, I thought I should add some remarks about staff levels.
Budget and resource constraints often mean that, even if too many faults are appearing in a software project, it is not always possible just to add staff. However, even if it is possible, it is often not a good idea.
Adding staff works on the seemingly common-sense principle that if 10 developers can fix n faults per week, 20 developers can fix something not too far from 2n faults per week. Software project managers often seem to believe that having more people working on the job will inevitably lead to an improved delivery rate.
Unfortunately, however, this is not true. In fact, adding staff may well lead to a reduced delivery rate.
Many readers will be familiar with Fred Brooks' seminal book on software project management, The Mythical Man-Month: Essays on Software Engineering. The concept of "The Mythical Man-Month" refers to his principle that "Adding manpower to a late software project makes it later". Brooks says this is because adding people to a project has 2 primary knock-on impacts.
First, the newcomers have to be trained in the skills they need, and familiarized with the project itself. This takes up not only their time but the time of other people - often the people who know most, and who are hence most productive. Induction also takes up other resources of various kinds that can then bottleneck the overall work being done by the project.
Second, the number of communication channels increases dramatically with the number of staff. Brooks' formula is that for n developers, there are n(n - 1) / 2 communication channels within the group. For example, 50 developers implies 50(50 - 1) / 2 = 1225 channels of communication.
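To get a feel for how quickly this grows, here is a minimal Python sketch of the formula quoted above. The function name and the sample team sizes are chosen purely for illustration.

```python
def communication_channels(n: int) -> int:
    """Pairwise communication channels among n developers: n(n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # n * (n - 1) is always even, so integer division is exact.
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Illustrative team sizes only.
    for team_size in (10, 20, 50, 100):
        print(team_size, communication_channels(team_size))
```

Going from 10 to 20 developers takes the count from 45 to 190 channels, so doubling the team more than quadruples the potential lines of communication.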
This second point is a simplification, of course. Many people would argue that careful management can reduce the number of communication channels. However, from my experience, Brooks' formula is useful whether or not it is exactly correct, in that it indicates the scale of the problem. If you observe any large project from the floor you will see that there is a great deal of human interaction taking place, of all different kinds - not just formal meetings, but emails raising issues, questions asked over the cubicle wall, casual discussions by the coffee machine, and so on. It is very hard indeed to quantify this interaction and even harder to assess its value.
TAKE AWAY
It is possible to add staff to a late-running software development project and gain value. However, it must be done very carefully. To be specific, first you must design the collaborative work processes in which the new staff will engage, using the techniques of Human Interaction Management.
Only by this means can you find out what bang you will get for your buck - what improvement in delivery rate you can expect from purchasing extra staff. In fact, only by this means can you be sure you will actually get an improvement, rather than the deterioration expected by Brooks.
Further, a moment's thought will show that designing collaborative work processes for new staff is no use unless you have already designed the collaborative work processes for current staff - and ensured that current staff are actually engaging in these processes. Otherwise the structured interactions of the new workers will dissipate into chaos as soon as they start working with people already on the project.
It is possible to improve a late-running project by adding staff, but you can't do it by building a house on sand.
Posted by keithhb in
Management
March 11, 2008
Hyper-productivity
In the last few months I have been approached several times for advice about large software projects in which the fault rate is out of control. This is not as simple as saying that faults are appearing much faster than they can be fixed - at certain stages of a project, one expects this. The real problem is more complex, and relates to the nature of large software projects.
It is important to understand in what ways a large software project is unlike a small one. As I often suggest, one can gain understanding here by comparing software engineering with traditional heavy engineering. For instance, engineers who deal with extremely large and complex systems in fields such as aerospace and the military have always been used to projects that go live with a high number of faults, many of which will remain open throughout the entire lifetime of a system. Fault control in such systems is as much about managing faults as about fixing them.
However, this is not to say that one can take a relaxed attitude towards fault control. There are well-established metrics available for the "defect density" one can expect at each stage of large software projects of certain types in certain fields. These metrics allow software project managers to use the current number of faults, the current fault rate, and the current fix rate to predict where their project will be in the future - and understand whether or not these predictions are worrying.
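As a rough illustration of this kind of prediction, the Python sketch below projects the open fault count forward from the three inputs mentioned above. It assumes constant weekly arrival and fix rates, which is a deliberate simplification: published defect-density models vary by project stage and field, and none of the names or figures here come from such a model.

```python
def project_open_faults(open_now, arrivals_per_week, fixes_per_week, weeks):
    """Week-by-week open fault counts, assuming constant rates (a simplification)."""
    projection = []
    open_faults = open_now
    for _ in range(weeks):
        # The backlog cannot drop below zero.
        open_faults = max(0, open_faults + arrivals_per_week - fixes_per_week)
        projection.append(open_faults)
    return projection


def weeks_until_limit(open_now, arrivals_per_week, fixes_per_week, limit):
    """Whole weeks until open faults exceed `limit`; None if the backlog is not growing.

    `limit` is a hypothetical threshold for what the managers can control.
    Integer inputs are assumed.
    """
    if open_now > limit:
        return 0
    net_growth = arrivals_per_week - fixes_per_week
    if net_growth <= 0:
        return None
    # Smallest whole number of weeks w with open_now + net_growth * w > limit.
    return (limit - open_now) // net_growth + 1


if __name__ == "__main__":
    # Hypothetical figures: 200 open faults, 60 new and 40 fixed per week.
    print(project_open_faults(200, 60, 40, 4))
    print(weeks_until_limit(200, 60, 40, limit=500))
```

Even a crude projection like this makes it easy to see whether the fix rate needs to rise, the arrival rate needs to fall, or both.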
The situations about which I was asked were worrying. Predictions showed that build deadlines would not be met, and that by a certain point in the future, the number of faults would escalate beyond the managers' ability to control. What should one do in such circumstances?
Taking another leaf out of the military's book, there are 3 approaches that should be used in combination: operational, tactical and strategic. If you prefer, think of these approaches as short-, medium- and long-term. Taken together, they provide a means for even large software projects to become "hyper-productive" - a phenomenon well-recognized in certain small teams, but rarely achieved on the biggest projects.
Operational
One needs to do something in the short-term, not only to stop the situation getting further out of hand, but also to raise morale. Developers do not like to work on failing projects, and once your developers start leaving (or just giving up), you are really in trouble. Even if management somehow conceal the true situation from developers - which is never easy - there is the morale of managers to consider as well.
So in the short-term, it is advisable to set up an emergency room - a lab environment in which some of the best developers can work free from disturbance to knock some of the worst faults on the head as soon as possible. It is often surprising how much progress can be made by reducing the amount of interruption people are subject to. People should not be allowed to enter the emergency room without permission, and if the lab fault fixers need to talk to someone from outside, that person should be brought into the room specially.
A manager should be appointed for the lab, a person who is able to create a team atmosphere as well as maintain a sense of urgency. A key part of the manager's role is to ensure that work is allocated in the most efficient manner possible from hour to hour - sometimes from minute to minute. Everyone should be fully loaded, the whole time, and distractions such as checking email discouraged. The aim is to maintain a sense of flow.
Lab work of this kind is exhausting, so the participants should be rewarded with overtime payments and pampered with free, luxury food and drink. However, the work is also rewarding due to the awareness of visible progress and the team spirit that should be engendered. Further, such a lab is a powerful means of knowledge sharing - after working in the lab, a developer will not only have more system knowledge but may well have acquired new ideas and techniques that will be useful going forward.
Nevertheless, this approach is not intended to be a normal part of the project. If it is necessary to carry it on for more than a month, the participants should change regularly - weekly, for example - to avoid burn-out. Lessons can be learnt by both developers and managers from such a lab, but the emphasis should be on carrying these lessons into regular daily practice rather than on sustaining the lab itself.
Tactical
In the medium-term, there are various means to reduce pressure on a project, with which most project managers are familiar. Deadlines and the content of deliverables can be renegotiated with the client. Workarounds can be developed for areas of fault. End-user expectations can be managed, such as when to expect the introduction of particular aspects of system functionality. Training can be planned so that end-users acquire system knowledge gradually, in order to allow time for the corresponding functionality to be released. Easy-to-use techniques can be developed for end-user fault reporting and control.
When proposing such measures to the client(s), one should take care to present a positive picture. Don't emphasize the trouble the project is in! Rather, explain how your advanced project management techniques have identified potential problem areas well in advance, and prepared a range of compensating solutions. The client(s) should come away from such a meeting feeling glad that you are in charge, not someone who might have let the problems go unnoticed until it was too late.
Strategic
However, it is only fair to admit that, if the project has got to this state, someone has slipped up - even if it was your predecessors. Clearly the scope of the work was misjudged, inadequate development techniques have been used, project management has not been optimal, and so on. So it is necessary to take remedial action to get the project properly back on track in the long-term.
There is a wealth of advice available here, particularly with regard to "Agile" techniques. I recommend Jim Coplien's book Organizational Patterns of Agile Software Development. If you can't face a long read, summaries of "the ten patterns that research has found most strongly correlate to business success" are available in PDF and HTML form.
One should take the time to understand what Agile project management is all about. Unfortunately, though, Agile techniques work much better with small projects than with large ones. In my experience, Agile techniques typically have most success in projects with fewer than 50 staff, and projects with over 100 staff need something else. So here are some techniques that I have found useful in getting large projects back on track.
- Food
Provide ample, high-quality food free to all project staff daily. A small investment in pampering of (say) £10 per person per day pays huge dividends in morale and commitment. Don't be mean! The aim is to make people feel valued.
- Team ownership of code
You can't act responsibly if you're not responsible for anything. Coders must own their bugs and work as a team to reduce them. Develop continuous integration practices and put up highly visible flags in each team area every morning: green, amber, red to indicate code quality for that team based on automated testing of last night's build. Institute a daily trawl through all open faults by each Work Package Manager with a view to escalating or delegating faults or actions as appropriate. Introduce a weekly quota of fault fixes based on size of team. Careful synchronization with the emergency room is clearly necessary here while it is running.
- Knowledge management
Build system knowledge. Black box: give developers a user's perspective - terminology, operation, concerns, etc. White box: make sure they understand what all the different parts of the system do. Do both these things as early and as widely as possible to get maximum benefit. It doesn't take as long as you might think - lunchtime seminars could be the way to go, for instance.
- Testing
Give developers a means to test the system as a whole (not just to run unit tests). In particular, they need a framework with which they can construct and run end-to-end tests themselves, for example to reproduce informal user fault reports or check particular behaviour relevant to a fault. Helping develop such a framework may be one very useful output from the emergency room.
- Pair programming
Pairs of roughly equal ability but mixed knowledge work best. Change the pairs often. Check to ensure that both people are contributing rather than having one driver and one passenger. Convince people to give it a go - nearly everyone finds they like it. In general it is more than twice as productive - i.e., worth doing.
- Systematic fault fixing
Most developers in the industry as a whole spend over 50% of their time fault fixing. Yet they never learn how to do it properly - probably because there is no standard methodology, or even standard practice. I have compiled a systematic fault diagnosis procedure based on the research papers and books that are available, and found that for faults that take more than 2 hours to diagnose, it makes a dramatic difference. Faults on which experienced developers had previously spent weeks and got nowhere have been solved in hours.
- Reduce distraction
The BBC reported on 13 August 2007 that "Workers are 'stressed out' by e-mails". Regular readers of this blog will be aware of my view that this is a pernicious problem across the board in the modern workplace, and that the solution lies in new Human Interaction Management tools such as HumanEdj.
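The green, amber and red flags and the weekly fix quota described under "Team ownership of code" above can be derived mechanically from the nightly test results. The Python sketch below shows one possible way to do it. The 2% amber threshold and the quota of 3 fixes per developer are hypothetical values for illustration, not recommendations, and would need tuning for a real project.

```python
def build_flag(tests_run, tests_failed, amber_threshold=0.02):
    """Return 'green', 'amber' or 'red' for a team's nightly build.

    amber_threshold is a hypothetical cut-off: the largest failure rate
    that still counts as amber rather than red.
    """
    if tests_run <= 0:
        # Assumption: a build with no test evidence is treated as red.
        return "red"
    if tests_failed == 0:
        return "green"
    failure_rate = tests_failed / tests_run
    return "amber" if failure_rate <= amber_threshold else "red"


def weekly_fix_quota(team_size, fixes_per_developer=3):
    """Weekly fault-fix quota scaled by team size (per-developer rate is hypothetical)."""
    return team_size * fixes_per_developer


if __name__ == "__main__":
    # Hypothetical nightly results for three teams: (tests run, tests failed).
    results = {"Team A": (500, 0), "Team B": (500, 5), "Team C": (500, 50)}
    for team, (run, failed) in results.items():
        print(team, build_flag(run, failed))
    print("Quota for a team of 8:", weekly_fix_quota(8))
```

The point of keeping the rule this simple is that everyone in the team area can see at a glance why the flag is the colour it is.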
TAKE AWAY
It is often tempting to sweep problems under the carpet when they seem too huge to solve. Large software projects that have gone out of control can seem like a many-headed hydra - as soon as you solve one problem, ten more problems appear. However, there is always a way through the labyrinth.
The key is to be systematic - in particular, divide remedial measures into operational, tactical and strategic.
Further, take care to explain what you are doing to others and get them on board - this is key. Developers as well as managers should understand what is going on and what you are doing about it. A typical but very unhelpful approach to managing crisis situations is to take everything behind closed doors. Resist this temptation! People are surprisingly good-natured, even in times of stress, if they are made to feel a valued part of something - as opposed to being excluded from important discussions.
There is a route to hyper-productivity, and it is not hard. You just have to take it in stages.
Posted by keithhb in
Management
March 04, 2008
ROI on improving human collaboration
Regular readers of this blog will know of my view that the next significant step in both IT and management is to improve the way that people do collaborative knowledge work (what I call "interaction work"). You can find all sorts of evidence for this view on the Human Interaction Management Web site. However, most people know from their own experience that action is needed, and the sooner the better. In particular, working hours are getting out of control, a trend that is not helped by the current expectation that everyone is "always on" via their mobile phone.
A typical factor in people's inability to control their workload is the necessity to use inefficient workplace software, the main culprit being email. Across the board in industry there are major problems resulting from use of email. It is not at all clear when emails should be sent, what they should contain, who should be included, the validity of requesting actions from people, who has committed to do what, the correlation between different email streams, the accuracy of data included, etc. Almost daily, this lack of clarity causes material problems in organizations of every kind.
Problems such as email usage are complex to resolve, relating as they do to process awareness, new approaches to management, new forms of software, and more. Hence solutions for such problems can be expensive to implement in large organizations. How can one justify the expense? In other words, what is the basis of a business case for the introduction of more efficient human collaboration?
It is necessary to find a quantitative means of evaluating the impact. One can list ad nauseam aspects of organizational life that will be improved by rationalizing the way people work together, but the board have a duty to consider the bottom line in any major decisions they make. However swayed they may be personally - e.g., from experience of mobile phone calls at ungodly hours - they will not be acting with due diligence if they approve a programme that has no means to demonstrate Return On Investment (ROI). So it is necessary to find a metric that is applicable to human collaboration.
This is non-trivial, since there is a difference between efficiency and effectiveness.
When measuring the impact of process improvement in mechanistic areas such as transaction handling or manufacturing, the outputs of the process are probably going to remain similar after the changes have been implemented. The aim is simply to produce these outputs quicker and cheaper. In other words, one is aiming for increased efficiency.
The same is not at all true for human collaboration. Free people up by helping them work better, and they will deliver more value to the organization. Once people are not struggling to keep afloat, they can help steer. In other words, improving interaction work delivers increased effectiveness. However, in advance one cannot always predict what form the increase will take. Take sales, for example. One might expect a salesperson who works better to make more sales, but it is quite possible that the actual improvement will be customer relationships that are longer-lasting, something that cannot be measured until enough time has passed.
So what metric should one use when proposing a programme to improve human collaboration in the organization?
TAKE AWAY
The simplest metric for human collaboration is very simple indeed. Work out how much you pay people per hour. Then count how many hours they are spending at work, taking care to include work done out of the office. If your staff start taking less time to do their work, you are getting better value for money.
This applies even if you don't pay people overtime. There is a huge hidden cost to working long hours. Tiredness and stress not only make people miserable, but reduce their contribution to the organization and increase "churn" (the frequency at which staff leave the organization altogether). These negative impacts are at least partially offset by paying people decent overtime - and if you don't pay people overtime directly, you can be sure that your organization is paying the price somehow.
So a first step for a programme to improve interaction work in your organization is to find out how much time in each day people are taking to do their work. Cost this time based on salaries, and aim to reduce the total by a specific amount - say 1 hour per day per person. This effectively gives you a budget for the interaction work improvement programme - spend any less than this in total, and you will have delivered ROI.
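The arithmetic behind this budget is simple enough to sketch in a few lines of Python. All the figures below (head count, hourly cost, working days per year, programme cost) are hypothetical, and the function names are chosen for illustration only.

```python
def time_saving_budget(staff, hourly_cost, hours_saved_per_day=1.0,
                       working_days_per_year=220):
    """Annual value of the time saved, which doubles as the programme budget.

    working_days_per_year is an assumed figure; adjust for local practice.
    """
    return staff * hourly_cost * hours_saved_per_day * working_days_per_year


def simple_roi(budget, programme_cost):
    """ROI as a fraction: positive when the programme costs less than the budget."""
    if programme_cost <= 0:
        raise ValueError("programme_cost must be positive")
    return (budget - programme_cost) / programme_cost


if __name__ == "__main__":
    # Hypothetical organization: 200 staff costing 40 per hour each.
    budget = time_saving_budget(staff=200, hourly_cost=40)
    print(budget)                         # 200 * 40 * 1 * 220
    print(simple_roi(budget, 1_000_000))  # positive, so ROI is delivered
```

With these made-up numbers, saving 1 hour per day per person is worth 1,760,000 per year, so a programme costing 1,000,000 would clear the bar on this narrowest of measures.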
Of course, this is using the narrowest of measures - a measure that takes no account of the significant improvements you can expect in effectiveness, which as discussed above will be the main benefit of the programme. However, the aim here is simply to get the programme approved by the board. Once you get going, material impacts of all kinds will become evident. It will be much easier to get approval for your second interaction work improvement programme than for your first!
Posted by keithhb in
Business Process Management