I just finished reading the anniversary edition of The Mythical Man-Month: Essays on Software Engineering by Frederick P. Brooks Jr., a classic on the difficulties of project management in the software engineering industry. It's one of those great books that makes you step back and think things through from a new perspective. A few short notes follow, where I drew an analogy to the over-budget experiences in process engineering and just managing large projects in general.
The book was written in 1975 as a from-the-trenches meditation on the hazards of managing huge software engineering projects, such as writing operating systems. Although the book deals with software engineering, and some of the challenges described are long gone thanks to advances in technology, there is still enough wisdom in it that it is read to this day. And Brooks is a good writer: frank, wide-ranging in interests and quotations, practical, and interesting. It’s one of those “work books” that you won’t mind reading in your spare time.
Brooks’ Law: Adding Manpower to a Late Project Will Only Make it Later
This is one of the core ideas of the book, so let's step back and understand it. First, Brooks points out that large projects grow disproportionately harder as they grow larger. One of the reasons is the inter-communication factor: it becomes harder and harder to keep large groups of people aligned and to make their pieces of work link up.
Some work, like picking cotton or sewing shirts, does not have this problem. In such jobs, people can work independently and with minimal interaction. Adding more people does not add to the communication burden, so you can always add people onto a job to reduce the time it takes. Men and months are interchangeable: add more of one to reduce the other.
But in software engineering (and many modern projects) you need people communicating, because their changes can directly impact on or interface with the work of other people.
If each part of the task must be separately coordinated with each other part the effort increases as n(n-1)/2. Three workers require three times as much pairwise intercommunication as two; four require six times as much as two. If, moreover, there need to be conferences among three, four, etc. workers to resolve things jointly, matters get worse yet.
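The n(n-1)/2 figure is easy to check for yourself. Here is a minimal Python sketch; the function name pairwise_channels is my own choice for illustration, not something from the book.

```python
def pairwise_channels(n: int) -> int:
    """Number of distinct pairs among n workers: n(n-1)/2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


# Compare each team size against a two-person team (which has one channel).
for n in (2, 3, 4, 10):
    ratio = pairwise_channels(n) / pairwise_channels(2)
    print(f"{n} workers: {pairwise_channels(n)} channels, {ratio:.0f}x a pair")
```

Three workers give 3 channels and four give 6, which matches the "three times" and "six times" figures above. A team of ten already has 45.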
This can lead to perverse outcomes. For example, suppose a project is estimated to take 12 man-months of effort, and the deadline is in four months. So you hire 3 people and divide the project into four “stages” or “milestones,” with each stage taking a month for your team to complete. (3 people working for 1 month deliver 3 man-months of effort, and one quarter of 12 man-months of work is 3 man-months.) But say you under-estimated the first stage and it takes two months. Now what?
You have 2 months to do 9 man-months of work. You need 9/2 = 4.5 people, so grow the team to 5 people? Well…except that there is a cost to bringing new people in. Having HR find and hire new people could take longer than the project is budgeted for! And even if you do have qualified workers sitting on the bench within your own organization, you still have to bring them up to speed on the project, on your team’s approach, what custom programs you’ve built to tackle the problem, how to interface with other people’s sub-programs, what the user experience must be, etc. And there is work involved in re-dividing the programming work into smaller chunks that can be worked on independently, and then teaching the new people about their roles. So realistically you may need more than 5…maybe you need 10 people? But 10 is way more than 5, so now the training load and the work spent dividing up the project get even higher! Is it even possible to divide the work into chunks so small that 10 people can feasibly work at the same time? Better figure that out fast!
So even if you do bring in 10 people and manage to divide up the work, you will probably get to the end of month 3 having just finished training everyone and dividing the work. Even though you’re now on track, how does it look to upper management outside of your project? You have one month left to complete 75% of the job! The temptation will be for management to add more people at this point, slipping the schedule further.
Obviously a good schedule and good schedule monitoring are meant to avoid this scenario. But if you DO get stuck in this situation, it may be more realistic to just let the schedule slip a month, or to cut some features from the project, rather than try to add more people. The farther you get into a project, the less feasible it is to save it with additional manpower.
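To see how quickly the head count runs away, here is a rough Python sketch of the arithmetic. The cost model is my own simplification, not Brooks's: each new hire is assumed to be unproductive for ramp_up_months and to consume mentoring_cost man-months of the existing team's time. Both parameters, and the function name, are hypothetical.

```python
from typing import Optional


def hires_needed(
    remaining_work: float,   # man-months still to do
    months_left: float,      # calendar time until the deadline
    current_team: int,       # people already on the project
    ramp_up_months: float,   # assumed unproductive time per new hire
    mentoring_cost: float,   # assumed man-months of team time lost per hire
    max_hires: int = 100,
) -> Optional[int]:
    """Smallest number of hires that covers the remaining work, or None."""
    # A new hire only contributes after ramping up (never a negative amount).
    productive_months = max(months_left - ramp_up_months, 0.0)
    for hires in range(max_hires + 1):
        capacity = (
            current_team * months_left
            + hires * productive_months
            - hires * mentoring_cost
        )
        if capacity >= remaining_work:
            return hires
    return None  # no feasible head count within max_hires
```

With no onboarding cost, hires_needed(9, 2, 3, 0, 0) gives 2 hires, for a team of 5, which is the 4.5 people above rounded up. Assume a one-month ramp-up and a quarter man-month of mentoring per hire and it becomes 4 hires. Raise mentoring to half a man-month and it is 6. If ramp-up takes the full two months, the function returns None: no number of hires saves the schedule.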
Job Architecture is key
One of the difficulties is designing an overall plan or solution infrastructure, and then splitting up the work. You need to keep an overall vision, or specification, of how the program will work for the user (i.e. how the project will look to the client). You also need to break this down into chunks of work that can proceed semi-independently of other chunks. Brooks recommends having a single person acting as the “architect” for the job, who plans out how the program will look and how to divide the work at a high level. Coherence in the design, where the group is steered by one mind championing the user’s experience with the product, is key. If you don’t keep people working to a single vision, the end product can be bloated and/or unfocused.
An analogue in process design is dividing a plant into operating units, freezing a design basis for each individual unit, and carefully defining the inter-unit communication that is required. In this way you may design a hydrotreater for 6000 barrels per day, even though the final results show that the crude unit will only produce 5683 barrels per day to feed it. You might leave the feed hydraulics loose and unfinished until the crude unit hydraulics are complete.
Brooks recommends that the architect, when dividing the work, have the skill to propose a solution method for each piece of work they give out; that is, the architect should never create an impossible assignment. However, implementing the work lies in the hands of the technical people assigned to that chunk. As long as it won’t hurt the rest of the project, let them solve their problem their own way. Let them use their creativity and own the solutions.
“Surgical Team” Approach
People love to day-dream about using a small, super-star team of programmers to do their work. Studies have shown that star performers produce better code and can end up being 10x more productive than standard workers. However, for really huge projects that must ship in a timely manner, the super-star approach is impossible. If you do the math (Brooks provides a good example), it really does require large teams to get the huge projects out the door in time. No one will want your program a decade after you’ve started it. Technology and business move too fast. Really big projects really do require really big teams, and they cannot all be super stars.
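The math is simple enough to sketch in Python. The inputs below (a 5000 man-year job, a 10-person team, a 7x skill multiplier and a further 7x gain from reduced communication) follow my recollection of Brooks's example and should be treated as illustrative assumptions. The function name is my own.

```python
def years_to_deliver(
    total_man_years: float,
    team_size: int,
    skill_factor: float = 1.0,          # assumed productivity multiple of the stars
    communication_factor: float = 1.0,  # assumed gain from less coordination
) -> float:
    """Calendar years needed if effort divides evenly across the team."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    return total_man_years / (team_size * skill_factor * communication_factor)


# A small all-star team, given very generous multipliers (illustrative only).
small_team_years = years_to_deliver(5000, 10, skill_factor=7, communication_factor=7)
print(f"10 stars: about {small_team_years:.1f} years")
```

Even with both multipliers set generously, the ten-person team needs roughly a decade, which is the point of the paragraph above: the small team is efficient but far too slow for a job of that size.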
To help capture some of the power of star performers, Brooks suggests the metaphor of a surgical team. In surgical teams normally one very qualified doctor does the surgery – the actual cutting. The surgeon also leads the operation and dictates what happens. But he has many members of his team around helping: they prepare the tools, hold the light, apply the bandage, etc. They take care of the minor tasks and let the surgeon just focus on the surgery with minimal time spent on distracting or low-value work.
In a similar way, Brooks suggests the chunks defined by the architect be given to “surgical teams” of programmers and support staff. In these teams a star programmer leads the effort on a chunk of work, doing most of the coding. A “co-surgeon” assists with the coding, provides ideas, and the two basically work simultaneously as a pair. The co-surgeon can represent the pair in low-level meetings, and take over if the surgeon gets sick or leaves the project. But in disputes the surgeon can pull rank and decide the way forward.
The team also includes junior programmers and admin staff, who take on supporting work such as keeping meeting minutes and getting bug tests ready. Some support staff can be shared between two teams.
The design engineering analogue is putting large project pieces into the care of experienced engineering leads, and letting them delegate to and get support from more junior engineers. Ideally you have one junior good enough to step in for vacations and emergencies.
Brooks identifies the rising availability of off-the-shelf software. In some cases, it makes more sense to buy new software as a replacement or a component of your program. Getting a proven solution quickly with no manpower requirement can justify very large up front costs.
An analogue in engineering is hiring consultants, or using “package units” that bundle equipment together into a unified package that can be purchased and installed as one.
A second analogue is to take this at face value: why pay your team for hours of work to design a calculation spreadsheet or program if you can buy one? (Although buying and validating a program has its own costs.)
Separate Business and Technical Leadership
A project needs technical leadership, to divide the work and provide expertise for tricky technical issues. It also needs business/human leadership, to interface with the rest of the company, get people, get resources, approve expenses, approve vacations, pair mentors to junior workers, etc. etc.
It can be hard to find people who are excellent at both aspects of leadership, and in large projects these jobs can both become huge burdens. Consider separating the two and providing equal prestige and pay for both career paths. Providing equal prestige is actually very hard, but necessary.
Brooks wrote a second book, The Design of Design: Essays from a Computer Scientist, which continued these themes and is also an interesting read. He uses examples from home renovation projects, which are easy for anyone to identify with. I love the example where he got around zoning rules that said “the home must be XX ft from the property line” by buying an unused 5 ft strip of property from his neighbor, rather than fighting for an exception or compromising the design of the house.
A complementary text, or a counter-text?
In a similar vein, I hope to soon read Rework by 37 Signals, a popular “less is more” web application company. I expect they will half agree, holding that a tight architect’s vision is key and that trimming scope is the best way to fight deadlines, and half disagree, pushing for smaller teams, less pre-planning, and less in-depth bug checking.
Partly they can get away with this because of the unique nature of web applications, which allow a much faster development cycle than most forms of engineering and are much more open to fixing in the field (and a failure is unlikely to kill anyone). By focusing on smaller scopes they can cut a lot of fat and live the small-teams dream.
Photo credit: Sarah McD