We use the term Global Species Database when describing how the Catalogue of Life is put together. Global Species Databases, or GSDs, are ideally monographic databases with global coverage of all known species in one taxon. They can be integrated side by side with other GSDs without any overlap. In an ideal world, the Catalogue of Life would assemble GSD after GSD to the point where all known species are covered. Unfortunately, GSDs do not yet exist for every taxon, and the Catalogue of Life has had to find different ways to increase coverage where gaps remain.
One could describe the Catalogue of Life as a 'bottom-up' approach to assembling the world's biodiversity, in which species only make it into the Catalogue when they are part of an existing checklist that has been taxonomically scrutinised. This leaves gaps in taxon coverage, because there are not specialists working in all areas and, where there are specialists, they are not necessarily building electronic resources. A different approach is to take all the species concepts available from all the different sources and try to assemble them automatically. This would be considered a 'top-down' approach and can be seen in other existing biodiversity aggregation portals. It is certainly much faster, but it still leaves gaps and introduces high levels of error.
The Catalogue of Life employs a third way, which enables it to incorporate, in addition to GSDs, large existing regional checklists holding many thousands of names and taxonomic concepts. It creates what we call a 'proto-GSD'. A proto-GSD combines multiple regional checklists to try to achieve greater coverage for a particular taxon. One would think this would be quite straightforward: after all, isn't it just a case of combining datasets and removing the duplicates? Not exactly. Combining regional datasets can lead to all sorts of taxonomic issues, because species names may be duplicated and because the species concepts and classification systems of the sources may conflict. Take for example the family Gentianaceae in the plant kingdom. This family has an estimated 1650+ species worldwide. We currently have two suppliers of Gentianaceae to the Catalogue of Life: the ITIS Regional database and Catalogue of Life China. Together they supply the Catalogue of Life with 552 species and 82 infraspecific taxa. The species Gentianella acuta (Michx.) Hultén appears in both checklists: it is a synonym in ITIS Regional and an accepted name in Catalogue of Life China. This happens because some species of Gentianaceae are cosmopolitan (i.e. present in both North America and China) and the two checklists treat the same name differently (i.e. as an accepted name or as a synonym). To combine the datasets, the Catalogue of Life editors had to resolve such conflicts before publishing the result in the Catalogue of Life. Gentianaceae is now part of a 'proto-GSD'.
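The kind of conflict described for Gentianella acuta can be illustrated with a minimal Python sketch. It is not the procedure used by the Catalogue of Life editors, only an illustration under stated assumptions: each checklist is a simple list of (name, status, source) records, names are matched by exact string comparison, and the "Placeholderia" entries are hypothetical names invented for the example. Only the Gentianella acuta rows reflect the case described above.

```python
from collections import defaultdict

# Each record: (scientific name, taxonomic status, source checklist).
# Only the Gentianella acuta rows reflect the case described in the text;
# the "Placeholderia" names are hypothetical and exist only for illustration.
ITIS_REGIONAL = [
    ("Gentianella acuta (Michx.) Hultén", "synonym", "ITIS Regional"),
    ("Placeholderia prima", "accepted", "ITIS Regional"),
]
COL_CHINA = [
    ("Gentianella acuta (Michx.) Hultén", "accepted", "Catalogue of Life China"),
    ("Placeholderia secunda", "accepted", "Catalogue of Life China"),
]


def find_status_conflicts(*checklists):
    """Return the names whose status differs between source checklists."""
    statuses_by_name = defaultdict(dict)
    for checklist in checklists:
        for name, status, source in checklist:
            statuses_by_name[name][source] = status
    # Keep only names that carry more than one distinct status.
    return {
        name: by_source
        for name, by_source in statuses_by_name.items()
        if len(set(by_source.values())) > 1
    }


conflicts = find_status_conflicts(ITIS_REGIONAL, COL_CHINA)
for name, by_source in conflicts.items():
    print(name, by_source)
```

A script like this can only flag the names that need attention. Deciding which treatment to follow remains a taxonomic judgement, and real checklists would also need names to be normalised (for example, variant spellings of the author citation) before exact matching becomes reliable.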
Hopefully, one day a taxonomic expert on Gentianaceae will produce a Global Species Database for the family. If the Catalogue of Life considers it to be of good enough quality, it will replace the regional databases for all Gentianaceae species.
The map below shows an overview of the worldwide distribution of Gentianaceae (Struwe, 2002), together with the approximate coverage of the proto-GSD. However, many species listed in the regional checklists are also indigenous elsewhere, so the actual coverage may extend beyond the regions shown.