Thursday, 22 August 2013

i4Life Part 2: Global Species Databases

GSDs in the i4Life dataflow

Previous posts in this series:
Part 1: Improving the world's taxonomic data indexing

Global Species Databases are central to the success and quality of the Catalogue of Life, as they form the core knowledge upon which it is built. They are therefore an integral part of the workings of the i4Life project and the first stop in our data flow blog series (see introductory post). There are currently around 130 contributing Global Species Databases in the Catalogue of Life. But what is a Global Species Database, and how does one come about?

A Global Species Database is quite simply the Catalogue of Life term for a global species checklist that is held in a database. The database software used is not significant (although it can help); what matters is the content, and the way that content has been structured to support the essential parts of a taxonomic checklist and to supply the Catalogue of Life. In the context of the Catalogue of Life, a global species checklist is a list of all known names worldwide, each with an associated taxonomic concept (i.e. accepted name or synonym). Checklists are usually produced by taxonomists, although those taxonomists do not necessarily need to be working with primary data (i.e. specimens). A checklist can be created by pulling together published names and concepts from elsewhere (e.g. existing taxonomic publications, floras and faunas, or existing regional checklists) and producing a single authoritative global view for a particular taxonomic group. Once a database is recommended to the Catalogue of Life by independent peer reviewers and accepted by the Catalogue of Life editors, the taxonomist's view is not questioned or altered. However, the structure and consistency of the data need to be checked to ensure they meet certain data standards for inclusion in the Catalogue of Life. To enable this, the submitted checklist is required to be in the form of a Standard Dataset. Receiving the checklist in this format makes it easier for the Catalogue of Life to run a series of quality control checks on the database's integrity and suitability for inclusion.
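To make the idea of a structural check concrete, here is a minimal sketch in Python. It is not the Catalogue of Life's actual quality control suite, and the field names (`name_id`, `status`, `accepted_id`) are hypothetical rather than the real Standard Dataset layout. It only illustrates the kind of consistency rule described above: every name is either accepted or a synonym, and every synonym must point to an accepted name in the same checklist.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NameRecord:
    # Hypothetical fields for illustration; not the real Standard Dataset layout.
    name_id: str
    scientific_name: str
    status: str  # expected to be "accepted" or "synonym"
    accepted_id: Optional[str] = None  # set only for synonyms


def check_integrity(records: List[NameRecord]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Every record needs a unique identifier.
    seen = set()
    for r in records:
        if r.name_id in seen:
            problems.append(f"{r.name_id}: duplicate identifier")
        seen.add(r.name_id)

    accepted_ids = {r.name_id for r in records if r.status == "accepted"}

    for r in records:
        if r.status not in ("accepted", "synonym"):
            problems.append(f"{r.name_id}: unknown status {r.status!r}")
        elif r.status == "synonym" and r.accepted_id not in accepted_ids:
            problems.append(f"{r.name_id}: synonym does not point to an accepted name")
        elif r.status == "accepted" and r.accepted_id is not None:
            problems.append(f"{r.name_id}: accepted name should not point to another name")

    return problems


# Placeholder names, not real taxa.
example = [
    NameRecord("n1", "Aus bus", "accepted"),
    NameRecord("n2", "Aus cus", "synonym", accepted_id="n1"),
    NameRecord("n3", "Aus dus", "synonym", accepted_id="n9"),  # dangling link
]

for problem in check_integrity(example):
    print(problem)
```

Note that a check like this looks only at structure and consistency. It never judges whether the taxonomist was right to treat a name as a synonym, which matches the principle that the taxonomist's view is not questioned or altered.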

The taxon and its underlying taxa do not have to be at any particular rank in the classification. For example, a global species checklist could cover one particular genus, family or order, or a taxon at any other rank.

The Catalogue of Life is often criticised for presenting 'one view' of a taxon, where the classification and species names follow a single taxonomic hypothesis rather than presenting two or more alternative views. The Catalogue of Life does this purposely for simplicity, as the majority of our users are not taxonomists and so do not have the expertise to decide which view to use. If more than one checklist is available, the Catalogue of Life will always choose the one that it judges offers the most, considering conditions such as checklist stability, the taxonomist's credentials and global coverage. If another checklist is submitted that reviewers consider better for our users, it will replace the existing checklist or part of that checklist. For example, imagine the Catalogue of Life accepts a checklist that covers three families with only 85% global coverage of taxa in each. If a new checklist is submitted for just one of those families that has 100% coverage, is judged to be authoritative and meets data standards, it will replace the original checklist for that family. Existing gaps in the Catalogue of Life are the biggest problem for its users, so it tries to achieve as much coverage as possible without compromising quality. That does not mean that once a checklist is accepted the Catalogue of Life stops reviewing the quality of the names it holds; that review continues over time.
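The three-family example can be sketched as a small piece of Python. This is only an illustration of the replacement rule in that example, not how the Catalogue of Life actually decides: in practice reviewers also weigh stability and the taxonomist's credentials, which are collapsed here into a single hypothetical `approved` flag, and the `Checklist` class and `assign` function are invented for this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Checklist:
    name: str
    coverage: Dict[str, float]  # family -> fraction of known taxa covered
    approved: bool  # stands in for peer review and data standard checks


def assign(current: Dict[str, Checklist], candidate: Checklist) -> Dict[str, Checklist]:
    """Return a new family -> checklist mapping after considering the candidate."""
    result = dict(current)
    if not candidate.approved:
        return result  # a checklist that fails review replaces nothing
    for family, cov in candidate.coverage.items():
        holder = result.get(family)
        # Take over a family only if it is a gap or the candidate covers more of it.
        if holder is None or cov > holder.coverage[family]:
            result[family] = candidate
    return result


# Made-up family labels mirroring the example in the text.
original = Checklist("original", {"Family A": 0.85, "Family B": 0.85, "Family C": 0.85}, True)
newer = Checklist("newer", {"Family B": 1.0}, True)

catalogue = assign({}, original)
catalogue = assign(catalogue, newer)

for family, checklist in sorted(catalogue.items()):
    print(family, "->", checklist.name)
```

The point of keying the mapping by family is that a replacement is partial: the newer checklist takes over only the family it covers better, and the original checklist continues to supply the other two.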

Why do taxonomists contribute their global species checklists to the Catalogue of Life? The Catalogue of Life acts like a shop window for Global Species Databases, promoting them through the website and annual DVD publication, as well as through outreach activities around the world, whilst always ensuring that their authors receive full credit for their work. There is still no stable inventory of all known species, but Global Species Databases in partnership with the Catalogue of Life have done more than any other initiative towards achieving this goal, creating a unique taxonomic resource in the process.

Next up: i4Life Part 3: The Catalogue of Life
