The Infrastructure Needed for an Open-Source Project

Every open-source project provides a public code archive to access the source code, documentation on both how to use and how to modify the code, mailing lists and newsgroups to discuss issues, a database to record bugs, and a website to provide access to the preceding facilities. These core features are the skeleton on which a healthy open-source project is built.

Public Code Archive

A prime requirement for an open-source project is that the source code be publicly available.
Any developer, whether inside or outside the company, should be able to get the latest version of the code anytime.
A developer who is in charge of maintaining a module--the module owner--should be able to make changes directly to the source code.
Contributions and bug fixes from developers who have not yet been granted write-access to the source code need to be integrated into the source code archive in a timely manner. This can be a major task and will most likely be done by the various module owners and other core developers.
Builds of the source code must be done frequently (daily where possible) and made available for download by developers and users. Usually a less recent build that was fairly stable is also kept available for download. Users should always be able to download a working version of the code.

This is very different from the usual proprietary development model where the internal developers have their own private copy of the official source code that they periodically release to external developers. With everyone sharing the same source code archive, when any developer fixes a bug or makes a change, it is immediately available to all of the other developers; internal developers do not get special access. You do not want to discourage potential contributors by having them spend time tracking down a bug and fixing it, only to find that someone else has already done so.

Most open-source projects use CVS (Concurrent Versions System) to maintain the repository of shared code. CVS provides a way for multiple developers to edit the source code without stepping on each other's changes. It also supports defining branches to create multiple versions of the code and makes it possible to roll back to an earlier version. CVS is itself an open-source project and is available for free.[1]

Whatever source control management (SCM) system your project uses, it is important that it be freely available to your project's developers. Someone working part-time on your project will not be willing to spend lots of money to purchase a proprietary SCM. Your current development team members may believe that whatever proprietary SCM your company uses is superior to CVS and that they will have to give up useful features if they are forced to switch. They need to understand that it is important that everyone use the same SCM system and that the features they lose by using CVS will be made up in other ways by working with outside developers. It is hoped that the missing features will eventually be added either to CVS or to its replacement, Subversion. The Subversion project[2] is working to build a version control system that will be a compelling replacement for CVS in the open-source community.

Bonsai[3] is a tool used by the Mozilla project that lets developers perform queries on the contents of a CVS archive to determine recent changes and who was responsible for changing a particular line of code.

One job that must be assigned to a team member is to set up and maintain the CVS code server. A second job is to do the daily build and make sure it works--and if it doesn't, to find out why and to communicate the problem to the relevant developers. A third job, already mentioned, is to integrate bug fixes and contributions into the source code archive.
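As a rough illustration of the daily-build job, the sketch below chooses which builds to offer for download: the newest build that succeeded, plus the most recent one marked stable. It also flags when the latest build failed so that someone can look into it. The `Build` record, its `stable` flag, and both function names are assumptions made for this example; they do not come from any particular build tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Build:
    day: date
    succeeded: bool  # did the build complete and pass its basic checks?
    stable: bool     # hypothetical flag, set once a build has proven reliable


def builds_to_publish(history: List[Build]) -> List[Build]:
    """Return the newest successful build and the newest stable build."""
    ok = sorted((b for b in history if b.succeeded),
                key=lambda b: b.day, reverse=True)
    latest = ok[0] if ok else None
    last_stable = next((b for b in ok if b.stable), None)
    picks: List[Build] = []
    for build in (latest, last_stable):
        # Skip missing entries and avoid listing the same build twice.
        if build is not None and build not in picks:
            picks.append(build)
    return picks


def needs_attention(history: List[Build]) -> bool:
    """True when the most recent build failed and should be investigated."""
    if not history:
        return False
    newest = max(history, key=lambda b: b.day)
    return not newest.succeeded
```

Keeping the selection logic separate from the build itself means users always have something that works to download, even on a day when the latest build is broken.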

Project Documentation

In addition to the normal end-user documentation required for any software product, an open-source project needs to have good internal documentation for developers. You want to make it as easy as possible for new developers to learn their way around the source code. The easier it is to learn enough to get started, the more developers you'll attract. Conversely, if the internal documentation is poor or nonexistent, many potential developers will become frustrated and give up.

Those developers who are working on the project as part of their normal day job will be prepared to devote whatever time is needed to come up to speed on how the code is organized and how it works. For them, poor documentation is just business as usual, but for people with only a few hours of free time in their off-hours who just want to fix an annoying bug or make a minor enhancement, the quality of the documentation can make the difference between success and failure. Initially new developers need to be able to find the relevant places in the source code and learn enough about how that code works to be able to modify it. If they are successful, then they are likely to do more work on the code. One web-based tool that can help developers search the source code is lxr, the Linux Cross-Reference tool.[4]

This means that all developers who contribute to the source code need to be encouraged to document their code. Someone also needs to write design documents and keep them up-to-date--especially needed is a high-level description of the software architecture. This could be an additional job for any technical writers that are available on your project. Mailing-list archives are also a good place for new (and old) developers to learn about various design decisions.

Another important document is a project road map that describes the current development plans for the overall project and the individual modules. The road map reflects what the core developers and module owners plan to work on based on discussions on the community mailing lists. The road map allows developers and users to get a sense of what changes they can expect and when they might happen. They can join in the official work then or, if their particular needs are for other changes, they know that they need to organize additional efforts. Many developers decide what they will work on by consulting the project road map, so it is vital that the road map be kept current. The road map, and the community discussions about what features are part of it, helps to focus your project and give it direction.

You will also want to create wish lists for the overall project and for each individual module. The wish lists are a good place for developers to look when they want to find something interesting to do that will help the project. The process of creating the wish list encourages users to speak up and participate in the design. Involving users as designers is essential if the project is to be really successful.

Your users will also need documentation. It is likely that some of them will be willing to help write it. The public user mailing list is an excellent source of information; organizing information in it into a list of frequently asked questions (FAQ) is a good first step.

If you have professional technical writers working on your project, they will probably be the module owners for the documentation. Just as with code, others can send them suggestions and corrections, but they are the ones who decide what goes into the documentation. Of course, as with a code module, a volunteer who makes a number of good suggestions can earn the right to edit a document directly.

Be sure that the documentation you provide is in a format that people can easily read, such as plain text or HTML. Don't require that your users and developers have special software to read the documentation--especially software they need to pay for! This rule also applies to the software needed to create the documentation. A number of open-source projects have discussed standardizing on DocBook/XML as the canonical format for open-source documentation.[5]

Bug Database

Bugs happen. Being able to keep track of outstanding bugs is a must. The bug-tracking tool you choose should be as easy as possible for developers to use; otherwise they may not use it. Some developers prefer a bug tool that they can use via email so that bug reports are mailed to them and they can reply via email. Other developers prefer a web-based bug database.

Because users are a prime source of bug reports, it should be easy for them to report bugs. Keep in mind that they have already suffered by discovering bugs--they may have lost their work and undoubtedly lost time--so don't make it painful for them to submit bug reports too. It makes sense to have a special, easy way for users to report bugs, different from the one developers use--it may be necessary for someone in the project to then check whether the bug has already been reported and, if not, to add it to the bug database. Also, the more encouragement you can give to users to report bugs well, the more testers you will effectively have.

Another good practice is to have a developer who is responsible for reading the user mailing list and filing bug reports for problems reported there. It may make sense to have a separate mailing list just for bug reports.

The bug database may also be a good place to keep track of suggested enhancements. Any developer should be able to record an idea for a future improvement. Module owners should make sure that good ideas that come up in the mailing list are also recorded. Periodically, the suggestions recorded in the bug database should be used to update the wish lists for the overall project and for the individual modules.
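One way to picture the periodic update described above is a small script that pulls open enhancement requests out of the bug database and groups them by module. This is a minimal sketch: the `Entry` record, its fields, and the `wish_lists` function are hypothetical and do not reflect the schema of any real bug tracker.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Entry:
    id: int
    module: str
    summary: str
    kind: str          # assumed values: "bug" or "enhancement"
    open: bool = True  # closed entries are left off the wish lists


def wish_lists(entries: Iterable[Entry]) -> Dict[str, List[str]]:
    """Group the summaries of open enhancement requests by module."""
    lists: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        if entry.kind == "enhancement" and entry.open:
            lists[entry.module].append(entry.summary)
    return dict(lists)
```

The wish list for the overall project could then be assembled by a module owner from the per-module lists, since deciding which items matter most is a judgment call rather than something a script can do.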

An example of a bug-tracking system used by various open-source projects is Bugzilla.[6] It was originally created for the Mozilla project and the source code for it is freely available. The NetBeans open-source project is currently using Issuezilla, which is based on Bugzilla. Another open-source bug system being developed is Scarab.[7] The work to maintain the bug database is another infrastructure job.

Open Mailing Lists or Newsgroups

It is important that all the discussion about an open-source project be done in the open. All the users and developers should be using a public mailing list or newsgroup for their discussions. These discussions include announcements, bug reporting, problems and how to solve them, design issues, and proposals for future work. Your internal developers should also be participating there and not just using internal mailing lists. Note: For brevity's sake, in the following discussion, when we talk about using a public mailing list we also include using a public newsgroup or a web-based discussion forum.

Let Everyone Know What is Happening

It is vital that community members be involved in discussions with your internal developers. If it appears that your internal developers are doing their work in closed internal meetings and exchanges of private email, then outside developers will feel like they are being treated as second-class citizens and will not participate as much as they might. This is not to say that your internal developers must communicate only through public email lists or newsgroups, but when you have a meeting be sure to post notes from it to the list and consider inviting outside developers to attend the meeting, either in person or over the phone. Also, if you post the agenda beforehand, outside developers can express their viewpoints via email and these views can be considered at the meeting. Note that the more design work done via email, the easier it is to preserve it in a mailing list archive; often design decisions are never documented and this can create problems down the road when the assumptions underlying the design change or new people join the project and need to understand why certain decisions were made the way they were.

There are other reasons to let everyone know what is being considered. It is important to alert the community about any plans to make major changes to the code. The worst situation is when a module owner makes large changes to a module with no notification whatsoever. If other developers find out about the change because it breaks something they are working on or because when they go to do something with the module they find the code totally different, then they are not going to be happy. They are going to feel that they cannot depend on that module, because to them it has changed in an arbitrary manner. It is slightly better if the module owner announces that the code has been changed; the other developers may be just as badly off, but at least they know why sooner. The best thing to do is to announce in advance which changes are being considered, engage in open discussion with anyone concerned, and then announce what has been changed when the new code is checked in. Outside developers will feel they are part of the process because they really are part of it--plus the quality of the design will likely be improved because the discussions included more viewpoints.

On an Apache project mailing list in July 2000 there was a heated exchange of messages caused by private discussions. Two groups, one from Sun and the other from IBM, had each been discussing a particular major component primarily among themselves instead of using the public email list. When one group announced a new project to redo the component's architecture, the other--which had been developing that component--basically said "Hey, who are you to tell us what to do? That's our code and we'll take care of it ourselves, thank you very much." Lots of flames then flooded the mailing list. This case eventually had a satisfactory resolution, but the upset never needed to happen. It turns out that the group members who were originally developing the component had concluded in their private discussion that it should be redesigned, but that they didn't have the manpower internally to do so. Had they posted their discussion, they would have gotten lots of volunteers and retained their leadership position. Meanwhile, the other group of developers had privately discussed their problems with the current design, but didn't make them public until they announced the new project. Had they posted their comments earlier, everyone would have come out for a redesign, and they wouldn't have created a rift with the original development group.[8]

Posting Etiquette

In general, your internal developers have to be more careful about what they post than random, outside developers do. Even though they may think they are speaking as individuals, everyone else will take what they say as your company's official policy. If an internal developer flames someone, it will not be seen as the action of an individual but rather as the action of "one of those arrogant guys" at your company. However, it is really important that your developers do participate, so you need to encourage them to do so. Some folks will never trust your company, but they may trust Ken, Yarda, and Stefan--employees of your company that the community comes to know as individuals.

As a rule, when sending a reply to a public mailing list or newsgroup remember that it will be read by everyone, taken out of context, and viewed suspiciously by folks trying to determine what your company is up to now. If you have a strong emotional feeling when writing the message, it might be best not to send it right away; instead, wait a day and see how it looks then. Often the first draft will let you get any angry feelings out of your system, and the second draft is the one that you should actually send.

One final point about messages: Don't be defensive. If someone attacks your company's actions, it is much better to wait a bit and see if, as often happens, another outsider will come to your company's defense. This will carry more weight than if the same message came from an employee of your company. Of course, if the original message points out an actual mistake that you have made, the sooner you admit that it is a mistake and explain how you plan to correct it, the better.

Types and Number of Mailing Lists/Newsgroups

Different mailing lists or newsgroups can be used for different purposes. A special moderated list for major announcements that receives only infrequent posts will be subscribed to by many users and developers. An unmoderated list with frequent postings on technical issues will have a smaller audience. In general, it is better to have too few mailing lists than to have too many.

Which would you prefer: to arrive at a party where the first room you see is comfortably filled with people talking animatedly, where the host greets you and starts introductions, and where groups move into nearby rooms when appropriate; or to arrive at a party where some rooms have only one or two people doing and saying nothing while the rest are empty? Following this analogy, you should strive to make your email lists like parties, each with a comfortable number of active conversations and an attentive host. To start this off, you should try to create one email list that is alive with activity, creating new lists only as needed. A host has an important job that must be done well because it helps set the tone for the community. Part of the host's job is to actively discourage flaming in order to make the list a safe place for people to post to.

A small project may need only a single mailing list for all project-related discussions. For a large, active project, each module might have several mailing lists. When the number of messages sent out each day grows too large, the people on the list might call for splitting it into several lists. Before you do so, make sure that it's not just a temporary increase in traffic. A single thread that lots of folks chime in on can easily double the message rate.
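To check whether a rise in traffic is lasting rather than the result of one busy thread, a simple heuristic is to look at several weeks of daily message counts and require every recent week to be busy. The sketch below is only one possible rule of thumb: the function name, the three-week window, and the use of the weekly median are all assumptions made for illustration.

```python
from statistics import median
from typing import Sequence


def sustained_increase(daily_counts: Sequence[int], limit: int,
                       weeks: int = 3) -> bool:
    """Return True only if each of the last `weeks` weeks was busy.

    daily_counts holds messages per day, oldest first. A week counts as
    busy when its median daily count exceeds `limit`; the median keeps a
    short burst from one thread from making a whole week look overloaded.
    """
    days_needed = 7 * weeks
    if len(daily_counts) < days_needed:
        return False  # not enough history to call it a trend
    recent = list(daily_counts[-days_needed:])
    weekly = [recent[i:i + 7] for i in range(0, days_needed, 7)]
    return all(median(week) > limit for week in weekly)
```

A result of `False` does not mean the list is fine; it only means the numbers alone do not yet justify a split, and the people on the list remain the better judges.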

You will need a mailing list for users. Free software generally does not have a customer support hotline that users can call when they have a problem. They need a mailing list or newsgroup where they can post their questions. Developers or other users can then post replies to help them out.

It is important to keep an archive of each list. This is useful for new developers or new users so they can see if a particular issue has already been discussed. It's also useful as a group memory. Be sure to make searching the archives easy.

In order to encourage people to answer other folks' questions, the Jini team had a policy of waiting 2 days before answering a general question. (Those that could be answered only by the core team, mostly "why was this done?" questions, were answered immediately.) This worked very well and has resulted in a mailing list where community members naturally answer most posted questions, with the core developers answering only the more difficult, technical ones.
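The waiting policy can be written down as a small decision rule. The sketch below is a hypothetical rendering of it, not anything the Jini team published: the function and parameter names are invented, and it assumes that once a community member has replied to a general question the core team no longer needs to step in.

```python
from datetime import datetime, timedelta

WAIT = timedelta(days=2)  # the two-day delay described above


def core_team_should_answer(posted_at: datetime, now: datetime,
                            needs_core_team: bool,
                            community_replied: bool) -> bool:
    """Decide whether a core developer should reply to a question now."""
    if needs_core_team:
        # Questions only the core team can answer get an immediate reply.
        return True
    if community_replied:
        # Assumption: a community answer makes a core reply unnecessary.
        return False
    # General questions wait so the community has a chance to respond.
    return now - posted_at >= WAIT
```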

Mailing lists and archives serve another purpose: dispelling suspicions of an insider cabal. The health of any community is likely to become poisoned when its members believe that there is a group of people who are secretly making the important decisions. This suspicion can sap the morale of any organization, and it can happen even if it is well known and expected that decisions take place behind closed doors.

In an open-source project, the developers may feel that their efforts are being exploited when the project does not seem to embrace transparency. One of the best and easiest ways to avoid this potential problem is to have an archived mailing list (or lists) that has a clear influence on decision making.

Spam Concerns

With the continuing increase in the amount of unsolicited and unwanted email messages (spam) that flood the Internet, you want to do everything you can to keep spam from appearing on your project's mailing lists. If only a few spam messages make it through to your mailing lists, it's just a minor inconvenience--but if a large proportion of the messages are spam, then the mailing lists can become useless.

There are two types of spam to worry about in open-source projects: conventional spam, which is typically advertising sent in bulk, and email sent by project members that is inappropriate for the list. An example of the second type of spam is a flame war on a topic unrelated to the project. Sometimes this can happen when politics or world events are mentioned.

Each project will have a slightly different definition of which messages are really spam. Is it okay for a company to post a product announcement that relates to the project? What about job offers? What about posting the same message to several of the project's mailing lists? What about continuing to argue after a decision has already been made? What about a humorous political cartoon?

Some open-source projects install spam filters to stop conventional spam from infecting their mailing lists. Others avoid conventional spam by restricting posts only to subscribers. Some allow anyone to post, but if the sender is not on an approved list then the message is first sent to a moderator. If the moderator confirms that it is a valid message, then it is forwarded on to the mailing list and the sender is added to the approved list. Sometimes messages can be delayed due to the need for moderation because it can take hours or even days before the moderator gets around to approving the message. Having several moderators located in different time zones around the globe can help.
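The moderated-posting scheme just described can be sketched in a few lines. This is an illustrative model only, assuming an in-memory list of messages; the `ModeratedList` class and its method names are hypothetical and do not correspond to any real mailing list software.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Message = Tuple[str, str]  # (sender address, message body)


@dataclass
class ModeratedList:
    approved: Set[str] = field(default_factory=set)
    pending: List[Message] = field(default_factory=list)
    delivered: List[Message] = field(default_factory=list)

    def post(self, sender: str, body: str) -> str:
        """Deliver mail from approved senders; hold everything else."""
        if sender in self.approved:
            self.delivered.append((sender, body))
            return "delivered"
        self.pending.append((sender, body))
        return "held"

    def approve(self, index: int) -> None:
        """Moderator accepts a held message and approves its sender."""
        sender, body = self.pending.pop(index)
        self.approved.add(sender)
        self.delivered.append((sender, body))

    def reject(self, index: int) -> None:
        """Moderator discards a held message; the sender stays unapproved."""
        self.pending.pop(index)
```

Because an approved sender skips the queue from then on, each legitimate poster costs the moderators only one review, which keeps the delay described above a one-time event for most people.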

For inappropriate messages, a moderator or other community leader can send a private note to the sender suggesting either that the email is not appropriate or that the email should be sent to individuals rather than to the whole list. For companies or individuals advertising related products or talking about possible jobs, a moderator or community leader should determine how appropriate the message is and either notify the sender if it's obviously out of line or bring it up with the community if it is a close call.

Whatever spam policy you decide on, be sure to make it clear to everyone. People need to have confidence that the mailing list software works well. It's generally better that a few spams make it through to the list than that legitimate messages be classified as spam and dropped. What constitutes spam depends on the culture of the community. A savvy community designer will post guidelines on the community website.

Project Website

In addition to mailing lists, each project needs a website where potential users and developers can find out about the project and where news about the project can be posted. The website should be the portal to all aspects of the project. The site can have user guides, tutorials, archives of the mailing lists, and other documentation. And of course there should be a download page where folks can obtain the latest version of both the source code and ready-to-use binaries. There may also be pointers to web pages for commercial products associated with the project, although the main site should not have a strong commercial feel to it.

When people first hear of your project, the project's website is where they will go to find out about it. The website will help set their expectations about your project: What is the current project status? How can they download executables or source code? Does it seem professional? Are new developers welcomed? Is it a real community where they can fully participate, or is it more like a user group where their participation is limited? Is it an active effort or does it look dead? Why should they get involved? Are help and tutorials available for beginners? Their experience of your project through the website will dictate whether they want to get involved as either users or developers, or whether they just leave.

For a small project, the website might be just a few pages. For a large project, the site might be quite large, with each module having its own set of pages. Maintaining the website and making sure the content on it is kept current is another infrastructure job that must have a person assigned to it. For large projects, an additional person is needed to be the editor-in-chief.

We say more about the project website in the section on creating a community of developers.

1. ftp://ftp.gnu.org/gnu/non-gnu/cvs or http://www.nongnu.org/cvs

2. http://subversion.tigris.org

3. http://mozilla.org/bonsai.html

4. http://lxr.linux.no

5. http://www.oasis-open.org/docbook

6. http://www.mozilla.org/bugs

7. http://scarab.tigris.org

8. A short article on the exchange is available at http://www.xml.com/pub/a/2000/07/19/deviant/index.html and reproduced in Appendix D. The actual message can be seen starting at http://marc.theaimsgroup.com/?l=xml-apache-general&m=96942807201829&w=2.

Innovation Happens Elsewhere
Ron Goldman & Richard P. Gabriel
Send your comments to us at IHE at dreamsongs.com.
