Keith Harrison-Broninski
IT Directions
Keith Harrison-Broninski cuts through the hype in his hands-on guide to where enterprise technology is really going.


June 25, 2007
BPM and bugs

In my role as a consultant, I was recently asked to help a large-scale safety-critical project with their fault management. Despite the use of design-by-contract techniques, which on smaller systems typically ensure a very low defect density, the sheer size of this system meant that the number of bugs was far higher than expected. As a result, the project has been struggling in various ways.

My recommendations had three aspects:


  1. Strategic: the introduction of new code delivery processes based on Human Interaction Management principles;

  2. Tactical: emergency measures to ensure that upcoming deadlines are met;

  3. Operational: rationalization of how they are going about fault diagnosis.

The last of these aspects is what I want to discuss here - and in particular, I want to draw out how it relates to BPM.

When I say "fault diagnosis" I mean the engineering work that takes place between a test failing and the start of coding to implement a solution, i.e., the detection of the corresponding bugs in the code and the design of a solution. There are two related problems with the way my client is going about fault diagnosis on this project.

First, all the complex faults are handed to a small number of key individuals, who possess not only a high degree of skill but also extensive knowledge of the system. This means that they get better and better at fault fixing, while others do not improve. The project team is polarizing.

Second, even among this small cadre of experts there is no consistency imposed on the way they go about fault diagnosis. As a result, it seems likely that many faults are being mis-diagnosed: attributed to the wrong causes, only partially fixed, or fixed in such a way that new errors are introduced in other parts of the system.

As part of the solution to this operational problem, I created a generic checklist for "Software Fault Diagnosis". This consists of 4 stages (Symptoms, Scenarios, Sources, Solutions), each containing a number of steps. We are now customizing this checklist for the system in question, with the intention of giving it to each developer to put on their wall. It has 2 purposes: as an artefact capturing the knowledge currently in key individuals' heads (which can be maintained along with the system itself), and as a means of improving how everyone does fault fixing.
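To make the shape of such a checklist concrete, here is a minimal Python sketch of how it might be held as data rather than only on paper. Only the four stage names come from the checklist described above. Every individual step, the class names, and the rule that stages must be worked through in order are hypothetical placeholders for illustration, not the contents of the actual checklist.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Stage names are taken from the post. The steps are HYPOTHETICAL placeholders.
PLACEHOLDER_STEPS = {
    "Symptoms": ["Record what the failing test reported"],
    "Scenarios": ["Write down how to reproduce the failure"],
    "Sources": ["List the candidate causes in the code"],
    "Solutions": ["Describe the proposed fix and its side effects"],
}


@dataclass
class Stage:
    name: str
    steps: list[str]
    done: set[int] = field(default_factory=set)

    def complete(self) -> bool:
        return len(self.done) == len(self.steps)


@dataclass
class FaultDiagnosis:
    fault_id: str
    stages: list[Stage]

    def tick(self, stage_name: str, step_index: int) -> None:
        """Mark one step done. Assumption: earlier stages must be finished first."""
        for stage in self.stages:
            if stage.name == stage_name:
                if not 0 <= step_index < len(stage.steps):
                    raise IndexError(f"no step {step_index} in {stage_name}")
                stage.done.add(step_index)
                return
            if not stage.complete():
                raise ValueError(f"finish {stage.name} before {stage_name}")
        raise KeyError(stage_name)

    def current_stage(self) -> str | None:
        """Name of the first unfinished stage, or None when all are complete."""
        for stage in self.stages:
            if not stage.complete():
                return stage.name
        return None


def new_checklist(fault_id: str) -> FaultDiagnosis:
    """Build a fresh checklist for one fault from the placeholder steps."""
    stages = [Stage(name, list(steps)) for name, steps in PLACEHOLDER_STEPS.items()]
    return FaultDiagnosis(fault_id, stages)
```

Held in this form, the checklist could be versioned alongside the system itself, and a record of which steps were completed for each fault would give some evidence of whether diagnosis is being done consistently.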

Interestingly, when I came to draw up the generic checklist, I assumed that it would be possible to find such an item for download somewhere on the Web. Yet a wide trawl of the research literature revealed nothing suitable - certainly nothing applicable to a large-scale, highly distributed system. I had to combine elements from many different sources to put it together.

What has all this to do with BPM?

In my last post I touched on the problems with verification of large processes designed using mainstream BPM applications. Put simply, there does not currently seem to be an equivalent in the BPM world for the various forms of testing deemed essential when software is constructed by any other means. Following this train of thought, there is a logical next question to ask.

TAKE AWAY

Suppose you have just introduced a large-scale process, built using a BPM suite, and the process is not working as expected. What tools do you have for finding the fault?

I will be very interested to hear how readers of this blog are going about it. From my own observations, I suspect that fault diagnosis in BPM-based systems is currently very immature. This may not be surprising when you consider that, even after over 50 years of commercial software, fault diagnosis in conventional software is not yet standardized - as evinced by my inability to find a standard procedure for it in the research literature.

However, at least with conventional software, there are well-known and proven techniques (such as binary search) and tools (such as interactive debuggers) for fault fixing. For fault diagnosis, as for testing, it seems that BPM is not yet delivering enterprise-strength solutions. This represents a serious business risk of which any current or prospective BPM user needs at least to be aware.
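As an illustration of the binary search technique mentioned above, here is a minimal Python sketch of bisecting an ordered series of changes to find the first one at which a test starts failing. The function name `first_bad` and the `is_bad` callback are hypothetical. The sketch assumes that the last change is known to fail and that, once the fault appears, every later change also fails.

```python
from typing import Callable


def first_bad(n: int, is_bad: Callable[[int], bool]) -> int:
    """Return the index of the first failing change among changes 0..n-1.

    is_bad(i) is a caller-supplied (hypothetical) check that builds or replays
    the system with changes 0..i applied and reports whether the test fails.
    Assumes the failure, once introduced, persists in all later changes.
    """
    if n <= 0:
        raise ValueError("need at least one change to search")
    lo, hi = 0, n - 1
    if not is_bad(hi):
        raise ValueError("the last change does not fail, so there is nothing to find")
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid        # the fault was introduced at mid or earlier
        else:
            lo = mid + 1    # the fault was introduced after mid
    return lo


if __name__ == "__main__":
    # Toy stand-in for a real test run: the fault arrives with change 6 of 10.
    print(first_bad(10, lambda i: i >= 6))
```

The number of test runs grows with the logarithm of the number of changes, which is what makes the technique practical on large systems. Applying the same idea to a process built in a BPM suite would require some way to replay the process up to a chosen step and check its state, which is an assumption here rather than a known feature of any particular product.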

Posted by keithhb in Business Process Management

Comments

Interesting post.

Two thoughts.

1. A thing that always surprises me is the complexity and confusion that arise when two correlated defects occur at the same time. It would be nice if the diagnosis model included a warning about this, and appropriate ways to detect it early.

2. Structuring your defect analysis process is a great step. Various process improvement frameworks, and I think Lean is the best example, have this concept. I have seen software development companies that do this and also create a measuring system around it, e.g. what types of defects occur? This information can help you improve the rest of the coding in a structured way.

Why you don't find these types of diagnosis procedures on the Internet is an interesting question. From a supplier perspective, I guess your defect processes are a bit hidden. But the Open Source community would greatly benefit from more common, proven methods for defect analysis and measuring!

The painful fact is of course that every 5 years we revamp our development technology, throwing us back to code statements for debugging purposes ("Am now on Point XYZ, variable A has value...."), where we had full-blown code debug systems... But that's tools - the process stays the same...

Regards,
Roeland

Posted by: Roeland at June 29, 2007 04:20 AM
