March 11, 2008
Hyper-productivity
In the last few months I have been approached several times for advice about large software projects in which the fault rate is out of control. This is not as simple as saying that faults are appearing much faster than they can be fixed - at certain stages of a project, one expects this. The real problem is more complex, and relates to the nature of large software projects.
It is important to understand in what ways a large software project is unlike a small one. As I often suggest, one can gain understanding here by comparing software engineering with traditional heavy engineering. For instance, engineers who deal with extremely large and complex systems in fields such as aerospace and the military have always been used to projects that go live with a high number of faults, many of which will remain open throughout the entire lifetime of a system. Fault control in such systems is as much about managing faults as about fixing them.
However, this is not to say that one can take a relaxed attitude towards fault control. There are well-established metrics available for the "defect density" one can expect at each stage of large software projects of certain types in certain fields. These metrics allow software project managers to use the current number of faults, the current fault rate, and the current fix rate to predict where their project will be in the future - and understand whether or not these predictions are worrying.
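As a rough illustration of this kind of prediction, here is a minimal sketch in Python. It assumes that faults are found and fixed at constant weekly rates, which is a simplification and not one of the published defect-density metrics. The function names and the figures in the usage note are hypothetical.

```python
import math
from typing import List, Optional


def project_open_faults(open_now: int, found_per_week: float,
                        fixed_per_week: float, weeks: int) -> List[float]:
    """Project the open-fault count week by week, assuming constant rates."""
    net = float(found_per_week - fixed_per_week)
    # The open count cannot drop below zero, however fast faults are fixed.
    return [max(0.0, open_now + net * week) for week in range(weeks + 1)]


def weeks_until_limit(open_now: int, found_per_week: float,
                      fixed_per_week: float, limit: int) -> Optional[int]:
    """Whole weeks until the open count reaches `limit`.

    Returns 0 if the limit is already reached, and None if the backlog
    is flat or shrinking (so the limit is never reached).
    """
    if open_now >= limit:
        return 0
    net = found_per_week - fixed_per_week
    if net <= 0:
        return None
    return math.ceil((limit - open_now) / net)
```

For example, with 400 open faults, 60 found and 45 fixed per week, `weeks_until_limit(400, 60, 45, 700)` gives 20 weeks before the backlog reaches a (hypothetical) manageable limit of 700. A real project would replace the constant rates with the metrics appropriate to its field and stage.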
The situations about which I was asked were worrying. Predictions showed that build deadlines would not be met, and that by a certain point in the future, the number of faults would escalate beyond the managers' ability to control. What should one do in such circumstances?
Taking another leaf out of the military's book, there are three approaches that should be used in combination: operational, tactical and strategic. If you prefer, think of these approaches as short-, medium- and long-term. Taken together, they provide a means for even large software projects to become "hyper-productive" - a phenomenon well-recognized in certain small teams, but rarely achieved on the biggest projects.
Operational
One needs to do something in the short-term, not only to stop the situation getting further out of hand, but also to raise morale. Developers do not like to work on failing projects, and once your developers start leaving (or just giving up), you are really in trouble. Even if management somehow conceal the true situation from developers - which is never easy - there is the morale of managers to consider as well.
So in the short-term, it is advisable to set up an emergency room - a lab environment in which some of the best developers can work free from disturbance to knock some of the worst faults on the head as soon as possible. It is often surprising how much progress can be made by reducing the amount of interruption people are subject to. People should not be allowed to enter the emergency room without permission, and if the lab fault fixers need to talk to someone from outside, that person should be brought into the room specially.
A manager should be appointed for the lab, a person who is able to create a team atmosphere as well as maintain a sense of urgency. A key part of the manager's role is to ensure that work is allocated in the most efficient manner possible from hour to hour - sometimes from minute to minute. Everyone should be fully loaded, the whole time, and distractions such as checking email discouraged. The aim is to maintain a sense of flow.
Lab work of this kind is exhausting, so the participants should be rewarded with overtime payments and pampered with free, luxury food and drink. However, the work is also rewarding due to the awareness of visible progress and the team spirit that should be engendered. Further, such a lab is a powerful means of knowledge sharing - after working in the lab, a developer will not only have more system knowledge but may well have acquired new ideas and techniques that will be useful going forward.
Nevertheless, this approach is not intended to be a normal part of the project. If it is necessary to carry it on for more than a month, the participants should change regularly - weekly, for example - to avoid burn-out. Lessons can be learnt by both developers and managers from such a lab, but the emphasis should be on carrying these lessons into regular daily practice rather than on sustaining the lab itself.
Tactical
In the medium-term, there are various means to reduce pressure on a project, with which most project managers are familiar. Deadlines and the content of deliverables can be renegotiated with the client. Workarounds can be developed for areas of fault. End-user expectations can be managed, such as when to expect the introduction of particular aspects of system functionality. Training can be planned so that end-users acquire system knowledge gradually, in order to allow time for the corresponding functionality to be released. Easy-to-use techniques can be developed for end-user fault reporting and control.
When proposing such measures to the client(s), one should take care to present a positive picture. Don't emphasize the trouble the project is in! Rather, explain how your advanced project management techniques have identified potential problem areas well in advance, and prepared a range of compensating solutions. The client(s) should come away from such a meeting feeling glad that you are in charge, not someone who might have let the problems go unnoticed until it was too late.
Strategic
However, it is only fair to admit that, if the project has got to this state, someone has slipped up - even if it was your predecessors. Clearly the scope of the work was misjudged, inadequate development techniques have been used, project management has not been optimal, and so on. So it is necessary to take remedial action to get the project properly back on track in the long-term.
There is a wealth of advice available here, particularly with regard to "Agile" techniques. I recommend Jim Coplien's book Organizational Patterns of Agile Software Development. If you can't face a long read, here are some summaries of "the ten patterns that research has found most strongly correlate to business success": PDF HTML.
One should take the time to understand what Agile project management is all about. Unfortunately, though, Agile techniques work much better with small projects than with large ones. In my experience, Agile techniques typically have most success in projects with fewer than 50 staff, and projects with over 100 staff need something else. So here are some techniques that I have found useful in getting large projects back on track.
- Food
Provide ample, high-quality food free to all project staff daily. A small investment in pampering of (say) £10 per person per day pays huge dividends in morale and commitment. Don't be mean! The aim is to make people feel valued.
- Team ownership of code
You can't act responsibly if you're not responsible for anything. Coders must own their bugs and work as a team to reduce them. Develop continuous integration practices and put up highly visible flags in each team area every morning: green, amber, red to indicate code quality for that team based on automated testing of last night's build. Institute a daily trawl through all open faults by each Work Package Manager with a view to escalating or delegating faults or actions as appropriate. Introduce a weekly quota of fault fixes based on size of team.
Careful synchronization with the emergency room is clearly necessary here while it is running.
- Knowledge management
Build system knowledge. Black box: give developers a user's perspective - terminology, operation, concerns, etc. White box: make sure they understand what all the different parts of the system do. Do both these things as early and as widely as possible to get maximum benefit. It doesn't take as long as you might think - lunchtime seminars could be the way to go, for instance.
- Testing
Give developers a means to test the system as a whole (not just to run unit tests). In particular, they need a framework with which they can construct and run end-to-end tests themselves, for example to reproduce informal user fault reports or check particular behaviour relevant to a fault. Helping develop such a framework may be one very useful output from the emergency room.
- Pair programming
Pairs of roughly equal ability but mixed knowledge work best. Change the pairs often. Check to ensure that both people are contributing rather than having one driver and one passenger. Convince people to give it a go - nearly everyone finds they like it. In general a pair is more than twice as productive as one developer working alone - i.e., it is worth doing.
- Systematic fault fixing
Most developers in the industry as a whole spend over 50% of their time fault fixing. Yet they never learn how to do it properly - probably because there is no standard methodology or even set of practices. I have compiled a systematic fault diagnosis procedure based on the research papers and books that are available, and found that for faults that take more than 2 hours to diagnose, it makes a dramatic difference. Faults on which experienced developers had previously spent weeks and got nowhere have been solved in hours.
- Reduce distraction
The BBC reported on 13 August 2007 that "Workers are 'stressed out' by e-mails". Regular readers of this blog will be aware of my view that this is a pernicious problem across the board in the modern workplace, and that the solution lies in new Human Interaction Management tools such as HumanEdj.
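The green, amber and red flags and the weekly fix quota described under "Team ownership of code" could be derived from the nightly build along the following lines. This is only a sketch: the thresholds (green only when every automated test passes, amber at a 95% pass rate or better), the two-fixes-per-developer quota, and all the names are assumptions chosen for illustration, not figures from any particular project.

```python
from dataclasses import dataclass


@dataclass
class NightlyResult:
    """Automated test outcome of last night's build for one team (hypothetical shape)."""
    team: str
    tests_run: int
    tests_failed: int


def team_flag(result: NightlyResult, amber_floor: float = 0.95) -> str:
    """Return 'green', 'amber' or 'red' for the team's morning flag."""
    if result.tests_run == 0:
        # No evidence of quality at all is treated as the worst case.
        return "red"
    if result.tests_failed == 0:
        return "green"
    pass_rate = (result.tests_run - result.tests_failed) / result.tests_run
    return "amber" if pass_rate >= amber_floor else "red"


def weekly_fix_quota(team_size: int, fixes_per_developer: int = 2) -> int:
    """Weekly fault-fix quota scaled by team size (the rate is an assumed value)."""
    return team_size * fixes_per_developer
```

A team with 5 failures out of 200 tests would show amber under these thresholds, and a team of six would have a quota of 12 fixes a week. The point is less the exact numbers than that the rule is simple, automatic, and visible to everyone each morning.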
TAKE AWAY
It is often tempting to push problems under the carpet when they seem too huge to solve. Large software projects that have gone out of control can seem like a many-headed hydra - as soon as you solve one problem, ten more problems appear. However, there is always a way through the labyrinth.
The key is to be systematic - in particular, divide remedial measures into operational, tactical and strategic.
Further, take care to explain what you are doing to others and get them on board - this is key. Developers as well as managers should understand what is going on and what you are doing about it. A typical but very unhelpful approach to managing crisis situations is to take everything behind closed doors. Resist this temptation! People are surprisingly good-natured, even in times of stress, if they are made to feel a valued part of something - as opposed to being excluded from important discussions.
There is a route to hyper-productivity, and it is not hard. You just have to take it in stages.
Posted by keithhb in Management


IT Directions
