Wednesday, 20 November 2013

The Catalogue of Life exceeds 1.5 million species

On 19th November 2013 the Catalogue of Life listed in excess of 1.5 million species names for the first time in its history.  This release of the Catalogue of Life, Dynamic edition, contains contributions from 142 databases with information on 1,516,879 species and 128,030 infraspecific taxa, and also includes 1,162,766 synonyms and 403,928 common names.  Adding the synonyms to the species names means the index contains over 2.5 million names recoverable by searching.  The Catalogue of Life has grown steadily since its launch, although data cleaning and the change in Global Species Database contributors caused a short-term dip in the number of accepted species in the 2013 Annual Checklist compared with the 2012 checklist.  The Annual Checklists are published in April each year.

As well as an increase in overall content, the community behind the Catalogue of Life continues to grow through the addition of new Global Species Databases.  Each database has one or more editors who are experts in taxonomy and are able to convert a mountain of taxonomic literature and experience into a simple checklist of species that they consider should be accepted in a particular group.

Catalogue of Life's daughter project, i4Life, has been helping build the infrastructure underneath the Catalogue and has directly funded the inclusion of new names via grants to GSD owners.  The i4Life project took the estimate of 1.9 million species globally as its benchmark for a complete catalogue, based on the review by Arthur Chapman, Numbers of Living Species in Australia and the World (2nd edition). On that basis we consider current coverage to be almost 80% of our original target. However, new species continue to be discovered and estimates are now close to 2 million species, meaning we have nearer 75% of the new target. This is still a phenomenal achievement.
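For readers who want to check the percentages, here is a minimal sketch of the arithmetic using only the figures quoted in this post. The variable and function names are ours, chosen for illustration, and "close to 2 million" is taken as exactly 2,000,000.

```python
# Figures quoted in this post for the 19 November 2013 Dynamic edition.
species = 1_516_879
synonyms = 1_162_766

# Names recoverable by searching: accepted species names plus synonyms.
searchable_names = species + synonyms


def coverage(accepted: int, estimated_total: int) -> float:
    """Return accepted species as a percentage of an estimated global total."""
    return 100 * accepted / estimated_total


original_target = coverage(species, 1_900_000)  # Chapman's estimate
revised_target = coverage(species, 2_000_000)   # assumed value for "close to 2 million"

print(f"searchable names: {searchable_names:,}")
print(f"coverage of 1.9m target: {original_target:.1f}%")
print(f"coverage of 2.0m target: {revised_target:.1f}%")
```

This gives 2,679,645 searchable names and coverage of roughly 79.8% and 75.8%, in line with the "almost 80%" and "nearer 75%" quoted above.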

Wednesday, 30 October 2013

Example uses of the Catalogue of Life: Media libraries

Anyone who has ever searched an online image database looking for a particular species by scientific name will know it can be a frustrating exercise. Often the name you enter brings back a completely different species from the one you are looking for, or else it returns nothing at all. Yet this may not be due to the species identification skills of the photographer, or the online library's lack of images, but rather a result of the limitations of the controlled vocabularies that index these media libraries. For specialists that want to find images by scientific name, and for nature photographers with identification expertise that want to sell them, the integration of the Catalogue of Life taxonomy into these systems could certainly help yield better results. The Catalogue of Life uniquely offers a simplified and unified hierarchical classification across all organisms plus one accepted name for each species, and as such, has the potential to be used as an indexing mechanism for this kind of file management.

Commercial (and social) online contributor-based media libraries (such as iStockphoto, Shutterstock, Corbis Images, Getty Images, Alamy, Dreamstime) have had, and continue to experience, exponential growth. Indexing and retrieval effectiveness across an international market are key to their usability and profitability.  Nature is arguably the most photographed and videoed area of life, and keywording or tagging such images for later retrieval ideally includes identifying key organisms present in each shot. So why is it that managers of these media libraries find it difficult to index nature-related information in an efficient and accurate way for their specialist users and contributors?

Fig 1: Just some of the international common names held in the Catalogue of Life for this well-known and widespread species of conifer, Pseudotsuga menziesii

花旗松 (hua qi song) - China
British Columbia fir - Canada, France
Douglas spruce - Canada, USA
Douglas-fir - UK, USA
Oregon pine - UK, USA
Douglas d'Orégon - France
sapin de Douglas - Canada, France
Douglasfichte - Germany
Amerikai duglászfenyő - Hungary
Abete di Douglas - Italy
Douglasgran - Norway

The naming of organisms, both scientific and common, is rife with synonymy (different name, same species) and homonymy (same name, different species). Common-use names are both language and location dependent (see Fig 1), and scientific names, while internationally recognised, can change periodically as competing academic views take precedence, as was shown in a previous blog post on elephants. In addition, the ability to name a subject is dependent on the expertise of the contributor or the index creator, leading to varying levels of specificity: what is an animal to one person may be a pig to another, and Sus scrofa domestica to someone else! The sheer number and complexity of names can cause a great deal of confusion, as is shown in the example at the end of this post.

For contributor-based media libraries, controlled vocabularies are one of the most effective ways to control synonyms, arrange terms into hierarchies (to broaden or narrow search terms based on level of expertise), and determine other related or associated terms. As museum collection managers have shown for centuries, the Linnaean taxonomy of binomials in a rank-based classification is the most effective controlled vocabulary to deal with these issues in relation to species. An accepted scientific name offers a unique and universal code for every species and can act as the indexing tag for all other names - for example with plants, it can index common names, horticultural cultivar names, food ingredients and natural products. Yet the current controlled vocabularies and tagging systems used by media libraries are not adequately curating species, leading to missing, limited or incorrect search results. More names need to be added to these vocabularies, existing content re-tagged and an expert-curated taxonomy decided upon, before they can hope to service expert users and handle the predicted continued expansion of content. Unfortunately no single, complete, electronic list of accepted species names exists anywhere, let alone with associated common names and synonyms. But adding the most comprehensive list of accepted species and rank names available, and manoeuvring contributors through the controlled vocabulary to ultimately choose one (as the defining tag), would be a step forward.

The Catalogue of Life is working to complete an inventory of life on earth where all known species (~1.9m plants, animals, fungi and micro-organisms) are named, documented and made available on the web.  This global quality-assured checklist currently holds over 1.4m accepted names, 1m synonyms and 0.5m common names (in multiple languages) and is expanding every month.  The Catalogue of Life is already used as an indexing mechanism by the world’s largest online biodiversity providers (European Nucleotide Archive, Encyclopedia of Life, IUCN Red List, GBIF) and as a synonym expansion search tool (i.e. type in one name and it will find resources that include all known synonyms too) for text-based resources such as the Biodiversity Heritage Library and the Dictionary of Natural Products. While science has different motivations and methods for updating and curating its datasets from those of the commercial media library, both desire the same end product - a quality controlled, up-to-date, sustainable, multilingual and internationally relevant taxonomy. Utilising the expertise of the curators that supply the Catalogue of Life is the best option for the rapid enhancement of controlled vocabularies for nature-related collections.
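To make the idea of synonym expansion concrete, here is a minimal sketch in Python. It is not how the Catalogue of Life or any partner implements it: the dictionary layout and the function names `build_lookup` and `expand_query` are assumptions made for illustration, and the data is just a handful of the common names from Fig 1.

```python
# Toy index: accepted scientific name -> other names recorded for the same
# species. The entries are a few of the common names listed in Fig 1.
NAMES_BY_ACCEPTED = {
    "Pseudotsuga menziesii": [
        "Douglas-fir",
        "Oregon pine",
        "sapin de Douglas",
        "Douglasgran",
    ],
}


def build_lookup(names_by_accepted):
    """Map every known name (case-insensitively) to its accepted name."""
    lookup = {}
    for accepted, others in names_by_accepted.items():
        for name in [accepted, *others]:
            lookup[name.casefold()] = accepted
    return lookup


def expand_query(term, names_by_accepted, lookup):
    """Return every name to search for; an unknown term is returned as-is."""
    accepted = lookup.get(term.casefold())
    if accepted is None:
        return [term]
    return [accepted, *names_by_accepted[accepted]]


LOOKUP = build_lookup(NAMES_BY_ACCEPTED)
print(expand_query("oregon pine", NAMES_BY_ACCEPTED, LOOKUP))
```

A search for any one of the names is widened to the accepted name plus all of the others. A real index would also have to cope with homonyms, where the same name points to more than one species; this toy lookup keeps only one accepted name per term.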

The example of 

Fig 2: Gentianella amarella
When submitting a species image, such as the plant in Fig 2, to the iStockphoto image library, you are asked to tag it with appropriate keywords. For this image it seemed logical to include the following:

  • Gentianella amarella -  the scientific name
  • Autumn Gentian - the UK common name
  • Gentianaceae - the plant family
  • Gentian - the common name for the family, and lastly
  • Northern Gentian - another known common name for this plant from Canada. 

What is returned from the image manager is as follows:

  • Northern Gentian is 'unknown'
  • Autumn Gentian is 'unknown'
  • Gentianella amarella is 'unknown'
  • Gentianaceae is recognised but as a synonym of Lisianthus 
  • Gentian is recognised and can be included

This example shows the current limitations of the controlled vocabulary of iStockphoto in not adequately dealing with the indexing of this species. Apart from not recognising the common names of a relatively well known wildflower in the UK and Canada, it is also unable to recognise the scientific species name. Furthermore, it is erroneously matching the whole plant family Gentianaceae to the name Lisianthus. Lisianthus is a commonly used name for the cultivars of one species of Gentianaceae in the small genus Eustoma. However, in science Lisianthus is the name of a different genus in Gentianaceae, and Gentianaceae has 78 possible genera, of which Lisianthus is just one.

What this means for the Gentianella amarella image seeker is that they will probably experience the frustration noted at the start of this post. Only a broader search term will help find their species (in this case 'Gentian'), and that will return many unwanted images to wade through before they find one that they want.  If Gentianella amarella had been in the vocabulary, both image contributor and end user would have had a greater chance of success.
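As a rough illustration of what a taxonomy-backed vocabulary could do with the five keywords above, here is a small Python sketch. The `VOCABULARY` table and the `resolve_tags` function are hypothetical, built only from the names mentioned in this post; they do not describe iStockphoto's system or any Catalogue of Life service.

```python
# Hypothetical vocabulary: each known tag (lower-cased) points to a rank and
# the accepted name it should be indexed under.
VOCABULARY = {
    "gentianella amarella": ("species", "Gentianella amarella"),
    "autumn gentian": ("species", "Gentianella amarella"),
    "northern gentian": ("species", "Gentianella amarella"),
    "gentianaceae": ("family", "Gentianaceae"),
    "gentian": ("family", "Gentianaceae"),
}


def resolve_tags(tags):
    """Split submitted tags into recognised entries and unknown tags."""
    recognised, unknown = {}, []
    for tag in tags:
        entry = VOCABULARY.get(tag.strip().casefold())
        if entry is None:
            unknown.append(tag)
        else:
            recognised[tag] = entry
    return recognised, unknown


submitted = [
    "Gentianella amarella",
    "Autumn Gentian",
    "Gentianaceae",
    "Gentian",
    "Northern Gentian",
]
recognised, unknown = resolve_tags(submitted)
for tag, (rank, accepted) in recognised.items():
    print(f"{tag!r} -> {rank}: {accepted}")
print("unknown:", unknown)
```

With the names in place, all three species-level tags resolve to the single accepted name Gentianella amarella, so a search on any one of them would find the same image.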

Tuesday, 29 October 2013

Taxon of the Day: Dillwynella voightae

Dillwynella voightae

Today's Taxon of the Day has been produced by Thomas Kunze. He writes:

Dillwynella voightae Kunze, 2011 is a small snail from the marine gastropod family Skeneidae and is in the Catalogue of Life courtesy of the WoRMS Mollusca database. The shell is whitish, porcelain-like, with not much sculpture and has a maximum diameter of only 5.8 mm. It lives at a depth of around 600 meters on sunken wood. Its diet includes both wood fiber and the bacteria that live on the wood. It is named in honour of Dr. Janet R. Voight, who on a research expedition collected 18 specimens from a piece of wood dredged from the ocean floor. This type locality, off the coast of Louisiana in the Gulf of Mexico, is the only place it has so far been found.

Whale falls, sunken wood, algae, fish bones, tortoiseshells or squid jaws form a special habitat that is widespread in the oceans. Pieces of wood or complete trees are transported into the sea and then at some point sink. In the deep parts of the ocean nutrients are rare, and a whole range of living things like bacteria, snails and bivalves will settle on the fallen organic matter and slowly biodegrade it.

Empty shell
As you will see in the author string, this species was described by me. You might ask how species are found and described nowadays. Well, in this particular case, I was visiting the Field Museum in Chicago looking for members of the Skeneidae family for my PhD, and while looking through their collection I found a jar labelled Dillwynella sp. I knew this meant that the museum had done a preliminary assessment of family, but as yet had not identified the specimens down to species level. As is procedure when involved in scientific research, I was able to get 4 of the 18 specimens shipped from the US to Sweden, where I was working at the time. After close examination it became obvious to me that these specimens were rather different to all other known species of Dillwynella, and so a description of a new species seemed appropriate and was published in 2011 in the Nautilus.

There is only one other species of Dillwynella known from the Atlantic, D. modesta (Dall, 1889) with the other 7 known species found off the coast of New Zealand and Japan. All marine gastropods (like slugs, snails, winkles, tingles and so on) are contributed by WoRMS Mollusca to the Catalogue of Life.

CoL Annual Checklist: Dillwynella voightae
CoL contributor: WoRMS Mollusca
Image copyright: T Kunze

Wednesday, 23 October 2013

Taxon of the Day: Menura

Lyrebird - Menura novaehollandiae
Today's Taxon of the Day has once again been produced by Thomas Kunze. He writes:

Many birds are well known for their beautiful song, for example, the blackbird and nightingale are held in high musical regard here in Europe. Today’s Taxon of the Day has one of the most extraordinary birdsongs ever heard on earth, with David Attenborough describing it as “possibly the most elaborate, complex and beautiful”. The genus Menura from Australia contains two species commonly known as lyrebirds, both of which are listed in the Catalogue of Life. Not only can these birds mimic the sound of a whole range of other forest birds such as the kookaburra, but human-made noises too, such as camera clicks, street works, chainsaws and car alarms, all of which are recorded in the BBC video. Who else but the Australian lyrebirds can do this?!

These sounds are made by the male lyrebird when trying to attract a partner (why else?). He forms a hill of earth to stand on, or places himself on a branch and starts to sing. The females will be attracted to this unique vocal performance and snatch a view of the male's astonishing tail feathers before deciding if he is appropriate for mating. The lyrebird's special capability of adopting all kinds of noises into its song is what makes it unique. Once there were just forest noises in the birdsong, but now, because all sorts of sounds have encroached into the bird’s habitat, they too are included. To be able to produce this variety of sound the lyrebird has developed one of the most elaborate syrinxes (vocal organs) of all birds.

There are two species in the genus Menura: Menura alberti Bonaparte, 1850 and Menura novaehollandiae Latham, 1802. Both species form the family Menuridae, a member of the order Passeriformes (aka perching birds). The original distribution of both species is the mountain forests of South-Eastern Australia.

The species Menura alberti has been assessed by the Catalogue of Life's partner IUCN Red List as Near Threatened.

CoL Annual Checklist: Menura alberti  
CoL contributor: ITIS Global
Image copyright: By Melburnian (Own work) [GFDL, CC-BY-SA-3.0 or CC-BY-2.5], via Wikimedia Commons

Tuesday, 22 October 2013

i4Life Part 5: Global Biodiversity Partners

Global Biodiversity Programmes

Previous posts in this series:
Part 1: Improving the world's taxonomic data indexing
Part 2: Global Species Databases
Part 3: The Catalogue of Life

In the last post we looked at how the Catalogue of Life shares its data with partners and collaborators through the Download Tool and Web Services. So who are the Catalogue of Life's partners and collaborators? Well, the Catalogue of Life is itself a partnership between the officiating ITIS and Species 2000 organisations and the confederation of 139 contributing expert-curated taxonomic databases. But now, as a result of the i4Life project, the collaborative reach has grown even further to include leading global biodiversity programmes and international research groups.

Today the Catalogue of Life is used as a common index for taxa in the catalogues of five global biodiversity programmes - IUCN Red List, Global Biodiversity Information Facility, European Nucleotide Archive, Barcoding initiatives and Encyclopedia of Life. This index has acted as a backbone for a growing harmonisation between these catalogues, making it easier to share names that are present and identify those missing in each. This process also includes the recognition of data stored under synonymic names, because in addition to the 1.4M plus species names held, the Catalogue of Life currently contains over a million synonyms too. This allows partners to enhance their own catalogues by including names from the Catalogue of Life that they do not have, meaning a more comprehensive taxon search can be conducted by users in each of their own data portals. Through the Piping Tool (our next post!) the Catalogue of Life can receive names from partners that it doesn't hold, but can only include them once they have been assessed by taxonomic experts of the contributing Global Species Database (GSD) for that taxon. Once placed (or not), updated GSD checklists are then sent back to the Catalogue of Life for inclusion in the next edition of the Dynamic checklist, before once again being made available to partners through the Download and Web Services. Many of the additional names that are circulating may be synonyms or even misspellings, but until this i4Life data flow is complete it is not possible to know exactly how many valid species names they represent. Names that the Catalogue of Life is missing, while probably a small fraction of the total, are highly significant. While the Catalogue of Life is the largest expert-curated species indexing mechanism currently out there, if it cannot index all the names that global biodiversity programmes hold, its taxonomy is not as useful to partner programmes as it otherwise would be.
However, the lack of Global Species Databases for some taxa means this is an incremental rather than an exhaustive process. Through i4Life the e-infrastructure is now in place to keep this data flow moving. In the meantime, the Catalogue of Life, global biodiversity programmes and Global Species Databases are all enhancing the quality of their data and for the end user this will mean more agreement across data portals and less confusion.
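The first step of the exchange described above, working out which names each side holds and which it lacks, can be sketched with simple set operations. This is only an illustration under our own assumptions: the function `compare_name_lists` and the two short name lists are invented for the example and do not reflect the actual Piping Tool or any partner's data.

```python
def compare_name_lists(catalogue_names, partner_names):
    """Report names shared by both lists and names missing from each side."""
    catalogue = set(catalogue_names)
    partner = set(partner_names)
    return {
        "shared": sorted(catalogue & partner),
        "missing_from_partner": sorted(catalogue - partner),
        "missing_from_catalogue": sorted(partner - catalogue),
    }


# Illustrative lists only, not real exports from either system.
catalogue_names = [
    "Panthera leo",
    "Panthera tigris",
    "Panthera onca",
    "Panthera pardus",
]
partner_names = ["Panthera leo", "Panthera tigris", "Panthera uncia"]

report = compare_name_lists(catalogue_names, partner_names)
for key, names in report.items():
    print(f"{key}: {names}")
```

Names in `missing_from_partner` are ones the partner can adopt directly, while names in `missing_from_catalogue` are only candidates: as described above, they must be assessed by the experts of the relevant Global Species Database before they can enter the Catalogue.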

It has been no easy task establishing this exchange, both setting up these global partnerships and agreeing appropriate methods to make sharing species information as painless as possible. That is in addition to the actual development of the e-infrastructure and making it operational and sustainable. But today the Catalogue of Life delivers a refreshed instance of its taxonomy in an internationally recognised data exchange format (a key achievement of i4Life) on a monthly basis for use in the heterogeneous catalogues and portals of global biodiversity programme partners. This level of networking, cooperation and integration demonstrates the level of commitment in the biodiversity community to shared goals and a desire to achieve them collectively.

Below is a brief overview of what the global biodiversity programmes are that form i4Life's Global Biodiversity Partners, and how they are exploring or aggregating different aspects of biodiversity knowledge that includes global species distribution modelling, genome and sequence diversity, species identification using DNA Barcodes and conservation status. What is common among them all is the need for a taxonomic index from which all their other data can radiate - this is where the Catalogue of Life comes in. For more information please find a link to each of their data portals.

IUCN Red List
The IUCN Red List is a database of information related to a species' risk of extinction and conservation needs. Information is presented at the species level and therefore the Red List has, at its core, a taxonomic backbone. It is now widely recognised as one of the fundamental tools to support conservation planning, management, monitoring, and decision making, with, among other things, growing value for broadening and strengthening our understanding of human impact on biodiversity.


Global Biodiversity Information Facility (GBIF)
GBIF is a distributed and digital infrastructure which builds upon the collective efforts and contributions of thousands of scientists in hundreds of institutions across the world through aggregation of their data. It also serves many different communities. Its rich and important biodiversity data, in particular its distribution data, are widely used by different organisations in science and society. The Convention on Biological Diversity and other international conventions, land-use planners and the agricultural sector are all asking for new services which GBIF can help to deliver. The Catalogue of Life taxonomy feeds into the GBIF infrastructure.


European Nucleotide Archive (ENA)
The European Nucleotide Archive provides a comprehensive, accessible and publicly available repository for nucleotide sequence data. Nucleotide sequence information is crucial to our understanding of biology, from genetics and molecular interactions through to organism-wide processes. Free access to nucleotide sequence data is therefore essential for life science research. As large-scale sequencing becomes faster and cheaper, the need to deposit, search and analyse information in a central archive that is publicly available and easily accessible continues to grow.


Barcoding Initiatives
The various “barcoding of life” initiatives like BOLD, CBOL, ECBOL, iBOL or QBOL are currently some of the major sources of new species. BOL projects and principles are helping scientists to discover substantial numbers of cryptic species. New tools have been created to help identify existing taxa and new tools are currently under development to discover the vast majority of the species biodiversity that remains unknown. In 2004, CBS-KNAW launched the Mycobank initiative, introducing new standards and methods for the deposit and the registration of new species names and associated data. Unlike existing species registration systems, it can handle nomenclatural, taxonomical, geographical, bibliographical, morphological, physiological, chemical, electrophoretic and other molecular data.

Website (Mycobank)

Encyclopedia of Life (EoL)
The goal of the Encyclopedia of Life is to compile and make available over the Internet as much information as possible about the world’s species of plants, animals and microorganisms. It started as a collaborative effort involving several of the world’s leading science institutions - Harvard University, the Field Museum, the Marine Biological Laboratory, the Smithsonian Institution, the Biodiversity Heritage Library, and the Missouri Botanical Garden - and includes a role for the general public and other international partners too.


Next up: Piping Tool

Wednesday, 16 October 2013

Taxon of the Day: Smeagol manneringi

Unidentified species of Smeagol

Today's Taxon of the Day has been produced by Thomas Kunze for all the Tolkienists out there. He writes:

Smeagol manneringi Climo 1980 is a marine slug originally found on a limestone gravel beach in New Zealand. It lives in the intertidal zone, well hidden under rocks and has a strange appearance and quite drab exterior. A recent phylogenetic analysis allowed Climo to place it in Pulmonata, an unranked and informal group of gastropods. In 2010 Neusser et al. placed it in the Eupulmonata, which includes other significant gastropod species like the members of the slug genus Arion, which includes the Spanish slug, one of a number of species that we commonly find eating our gardens, plus the highly appreciated culinary Burgundy snail.

“A slug to find them ...”
Climo named it after the famous character Smeagol from Tolkien’s notable novels The Lord of the Rings and The Hobbit. After finding the Ring (“My Precious”) the hobbit Smeagol becomes Gollum, living most of the time underground, as the slug Smeagol does. As a further reason for this name Climo mentioned that the slug Smeagol looks quite strange from the outside, but after studying its internal anatomy the pulmonate relation could be revealed rather easily.

Up to now four further species of the formerly monotypic genus Smeagol have been described in the family Smeagolidae.

CoL Annual Checklist page: Smeagol manneringi
CoL contributor: WoRMS
Image copyright: 
Katharina M. Jörger, Isabella Stöger, Yasunori Kano, Hiroshi Fukuda, Thomas Knebelsberger & Michael Schrödl (top)  Link

Guillermogp Guillermo García-Pimentel Ruiz [Public domain], via Wikimedia Commons (bottom)

Monday, 14 October 2013

Taxon of the Day: Hamamelis

Hamamelis virginiana
Last month’s Catalogue of Life Dynamic Checklist saw the arrival of many new Global Species Databases including one that contains the genus Hamamelis in the plant family Hamamelidaceae. This taxon includes the species H. virginiana L. (Witch-hazel), H. vernalis Sarg. (Springtime Witch-hazel) and H. ovalis S.W. Leonard (Big Leaved Witch-hazel), all found in North America, H. japonica Sieb. & Zucc. (Japanese Witch-hazel) from Japan and H. mollis Oliv. (Chinese Witch-hazel) from China. As a result of the new Global Species Database (GSD) the Catalogue of Life Dynamic Edition now lists all the above species. However, the Annual Checklist, updated once per year, currently lists just four. This is because prior to the new GSD's arrival Hamamelis was formed using a proto-GSD that combined the ITIS Regional database (covering North America) with the Catalogue of Life China database. This meant H. japonica, not found in either of these locations, was unfortunately excluded. Next year all five species will be in the Annual Checklist in addition to the Dynamic Checklist as a result of the new GSD.
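The gap left by the proto-GSD is easy to see if the two regional sources are treated as sets and merged. The sketch below is an illustration only: which species each regional database actually supplied is our assumption, based on the distributions given above.

```python
# The five species named in this post.
all_species = {
    "Hamamelis virginiana",
    "Hamamelis vernalis",
    "Hamamelis ovalis",
    "Hamamelis japonica",
    "Hamamelis mollis",
}

# Assumed contents of the two regional sources, inferred from distribution.
itis_regional_north_america = {
    "Hamamelis virginiana",
    "Hamamelis vernalis",
    "Hamamelis ovalis",
}
col_china = {"Hamamelis mollis"}

# The proto-GSD is the union of the regional lists.
proto_gsd = itis_regional_north_america | col_china

# Anything found in neither region falls through the gap.
missing = all_species - proto_gsd

print(len(proto_gsd), "species in the proto-GSD")
print("missing:", sorted(missing))
```

Merging regional lists can only ever be as complete as the regions covered, which is why a single global database for the genus closes the gap.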

A deciduous shrub or small tree with a short trunk, Witch-hazel bears many spreading, twisted branches. Because of its ability to flower at a time when other plants are dormant, it is a widely grown garden plant with many known cultivars. It reproduces mainly by seed and has capsules that burst open explosively when mature, launching their contents a fair distance from the parent plant.

The species we have here at Reading University in the Harris Garden is H. virginiana, which is now showing some beautiful autumnal colour. Soon the red leaves will drop off leaving the branches bare as if ready for winter, but before long they will burst out into bright yellow, twisted ribbon-like flowers. If you can’t wait or are unable to see it, the time-lapse video below does a good job of recording this event, albeit with a Hamamelis cultivar elsewhere.

The species epithets describe different aspects of each plant's appearance, distribution or flowering time. The epithet japonica means from Japan and virginiana is Latin for Virginia, probably a result of its native eastern North American distribution. Then mollis in Latin means "soft", referring to the felted leaves of this species, and vernalis translates to spring in Latin, referring to the later flowering time of this species.  Finally, ovalis means oval, a likely reference to the shape of the leaves. The common name comes from the historical use of the twisted branches as ‘witching sticks’ used as dowsers in the search for water. Hazel describes the resemblance of the leaf shape to that of the hazelnut (i.e. Corylus).

Hamamelis or Witch-hazel has a well-known eponymously named homeopathic remedy, where extracts, lotions, salves are produced from the bark, twigs and leaves of the plant. For centuries it was used to cure a whole range of bodily ills but mainly these days is used for minor problems such as bruises, sores and inflammations. This is because the used parts of the plant contain compounds which reportedly have astringent, anti-irritant, antioxidant, and anti-inflammatory properties.

The species H. ovalis  was quite recently described to science (2004), which at that time in the plant world was almost the equivalent of finding a new species of mammal. For a detailed account of its discovery see the following web page.

CoL Dynamic Checklist: Search on Hamamelis 
CoL contributor: World Plants
Image copyright: By Jason Hollinger [CC-BY-2.0], via Wikimedia Commons

Thursday, 10 October 2013

Taxon of the Day: Panthera

Today's post has been produced by Thomas Kunze. He writes:

Previously on Taxon of the Day we featured two members of the cat family Felidae – the cheetah and the Scottish wild cat. Today we turn our attention to the genus Panthera also in Felidae, which includes just four species, all described by Linnaeus, and all extremely well known. They are:

From top down: Tiger, Lion, Jaguar, Leopard
Lion, Panthera leo (Linnaeus, 1758)
Tiger, Panthera tigris (Linnaeus, 1758)
Jaguar, Panthera onca (Linnaeus, 1758)
Leopard, Panthera pardus (Linnaeus, 1758)

These four species, often referred to as the big cats, are easily recognisable and can be seen in many zoos around the world. They have roaring as one of their common traits, in addition to all being top predators in their habitats. In the wild, the adults of all species are solitary and meet up only for mating, except for lions, which live in prides. Lions are mostly associated with sub-Saharan savannah landscapes in Africa, although there is also a small population living in western India.

The tiger is the world’s biggest living cat, distributed across different Asian habitats from the tropics in South East Asia to Russia in the north. Its range was once much larger than it is today.

The smallest member of the big cats is the leopard which was once present all across Africa, the Arabian Peninsula and the Far East Caucasus. Nowadays, it has a very scattered distribution but still maintains a wide range in sub-Saharan Africa.

The jaguar is the only big cat living in the New World with a range from Central America, down to the Amazon Basin and northern Argentina. There have also been a few recorded in the very south of the United States. 

Sightings of black and white offspring of certain Panthera species are highly sought-after. Melanism, a black pigmentation of the hairs, creates the black coloured fur which can occur in jaguars and leopards. In both, the rosette pattern is still visible, especially in good light (see image below). These variants are commonly referred to as Black Panthers. White tigers have a white basic colouring of the fur but are not considered true albinos, because of their blue eye colour and black stripes.

A black Panthera onca 

Recent taxonomic approaches based on molecular data, like that used by the Catalogue of Life’s partner IUCN Red List, also include the snow leopard, Uncia uncia (Schreber, 1775), in the genus Panthera as Panthera uncia. All species are listed by the IUCN Red List, with the tiger as Endangered, the lion as Vulnerable and the leopard and jaguar as Near Threatened. Worse still, some subspecies of tiger and leopard are Critically Endangered. Like other mammals, these species are provided by ITIS Global to the Catalogue of Life.

CoL Annual Checklist: Panthera 
CoL contributor: ITIS Global
Image copyright: See page for author [CC-BY-SA-3.0], via Wikimedia Commons (top), Public Domain (bottom)

The Catalogue of Life is Moving!

The idea for the Catalogue of Life developed in the early 1990s shortly after Frank Bisby’s arrival at Reading University. Initial funding led to the first release of the Catalogue in 2000 with over 200 thousand species.  The initial aim was to have substantially completed the Catalogue of Life by this date, but it became clear that far less taxonomic data was available in a readily accessible and electronic form than was first expected.  However, this first publication, containing around 10% of known species, proved an important step in realising future grants and projects that have now built the Catalogue to over 1.4m species. The steady growth of the Catalogue accompanied the increasing use of and dependence on the internet, not just by scientists but by the general public. This made a web-accessible list of all living things an extremely timely and welcome project for both individual and institutional users.  Over the past five years the Catalogue has been supported by two substantial Framework 7 e-infrastructure grants: 4D4Life and i4Life. These grants have allowed the Catalogue to continue its steady growth towards the target of 1.9m known species, despite it becoming increasingly hard to identify sources of high quality data to fill the steadily reducing number of taxonomic gaps.

Catalogue of Life continues to grow every year
However, while grant-based investment in the e-infrastructure for Catalogue of Life has been steady and substantial, it remains difficult to find funding to generate the underlying data, especially because funding has a regional basis and the Catalogue is a truly global collaboration. The achievement of exceeding 70% coverage of all species means that the Catalogue has moved from a research project to a product that is sufficiently complete to be of value to individual and project-based users. There is now a steady flow of requests to use the Catalogue of Life as a complete list of species for reference in other projects. It is now used by the major biological data portals ENA, GBIF and IUCN to provide a reference taxonomy to which their data can be linked. It is used by commercial publishers and some search engines as well as providing a species index for the EDIT platform. Through these partners the Catalogue of Life is providing unique reference material on which biologists can assess the current state of global biodiversity.

4D4Life project meeting at Reading

The Catalogue of Life at Reading University was the major research activity of the late Prof Frank Bisby, the last academic to hold the established Chair of Botany at Reading. The project developed from one person on one computer to a dedicated laboratory filled with active staff developing both the content and the infrastructure for the Catalogue. Content was developed in close collaboration with ITIS, who continue to provide the taxonomic backbone of CoL as well as species level datasets. The electronic infrastructure was developed in collaboration with Cardiff University and ETI in the Netherlands. Frank Bisby’s drive to complete this project led to many long days in the lab, a huge international telephone bill and the close identity of Reading University and Catalogue of Life generated by Frank’s frequent speeches at international conferences, where he tirelessly persuaded other scientists that they should join this project. The sudden death of Frank during the 4Life projects led to a more distributed management of the Catalogue of Life, with the secretariat remaining active at Reading University but an increasingly important role for the international team of directors for Species 2000 and for the Catalogue of Life Global Team, who oversee content and policy for the Catalogue.
Fern checklist
Filling gaps - ferns have recently been added to the Catalogue

The link with ITIS established in the first days of Catalogue of Life provided strong support throughout this period of change. Alastair Culham, project leader for i4Life, stepped in to manage the completion of the 4D4Life project and, supported by the excellent i4Life team, has converted the Catalogue of Life into a product with international presence. The editorial continuity of the Catalogue of Life has been ensured by the steady work of its Executive Editor, Dr Yuri Roskov, who has now been with the project for more than a decade. Yuri continues to bring ideas that help to complete the Catalogue yet remains strict about the quality of content. Currently the i4Life team at Reading spans six nationalities, each bringing their personal views to the development of Catalogue of Life. At the end of the i4Life project the day-to-day running and management of Catalogue of Life will be transferred to Naturalis in the Netherlands, who have committed salaries and resources to running Catalogue of Life for the next five years, as the first host in a rolling five year programme allowing all appropriate organisations to have the opportunity to care for and build this indispensable resource. The process of building the Catalogue of Life will never be completed because thousands of new species of life are discovered and named every year. However, we expect the original target of 1.9m species to be reached by the end of the decade if we continue to add species at the current rate.

Friday, 4 October 2013

Catalogue of Life in Munich

Posters at the conference
Last month Yuri Roskov and Thomas Kunze attended the 106th Annual Meeting of the German Zoological Society 2013 in Munich, Germany. Here is their account:

Almost every year since 1890 the Annual Meeting of the German Zoological Society has brought together zoologists from all specialisms - neurobiologists, ecologists, physiologists and taxonomists to name but a few. This year in southern Germany, over five hundred zoologists came for four days to the main building of the Ludwig-Maximilians-University Munich to exchange their ideas in numerous lectures and poster presentations. Of course that is where we had to be as well.

On behalf of the Catalogue of Life we presented two posters: The Catalogue of Life: plant species for zoologists and Towards a Global Inventory of Animal Species. The aim was to show how the Catalogue of Life can be a good source of taxonomic data for groups in which the user does not have expertise, so a conference with a wide range of participants was an ideal place to test this. The posters showed potential users how complete the Catalogue currently is in both plants and animals and how they can access and use this data for their own work. For example, checking species names, concepts and the classification of related taxa in the Catalogue might be useful in habitat mapping or food chain analysis. For zoologists, the Catalogue of Life is an easy-to-use source of primary taxonomic knowledge on plants, fungi, microorganisms, bacteria and viruses, enabling them to link their own taxa with hosts, parasites, food sources, symbionts and other members of ecological associations.

Furthermore, we met contributors to our datasets and looked for new partners, especially from different insect groups where the Catalogue has gap areas. Overall this meeting gave us a great opportunity to reach many biologists and explain our product to them directly.

 Posters can be viewed on the i4Life events page.

Thursday, 3 October 2013

i4Life Part 4: Download and Web Services

Download and Web Services 

Previous posts in this series:
Part 1: Improving the world's taxonomic data indexing (includes full data flow diagram)
Part 2: Global Species Databases
Part 3: The Catalogue of Life

There are a number of ways to access the Catalogue of Life. For checking species names or classifications you can do it online through the search and browse interface. You can also download the results of that search using the export option on any results page. However, if you are a user who needs access to all of the Catalogue, like partners in the i4Life project, Download and Web Services are the best method to transfer large quantities of data.

The Download Service has a graphical user interface that is accessed through a password-protected page on the i4Life website. Web Services are accessed through a URL from anywhere. The instructions on how to do this are found on the Web Services page of the Catalogue of Life website. Both methods enable access to the Catalogue of Life database to allow transfer of its contents in DarwinCore Archive format, an essential step in the i4Life data flow shown in the diagram above. Anyone can activate an export of the Catalogue data using the Download Service (see image below) once they have registered and been given a password. It is a case of selecting the data that is required and pressing a button; the data is then automatically downloaded to your computer as a zip file. If you do not want the whole Catalogue, the form allows you to narrow down the export to a specific taxon by selecting it from the drop down box for each rank. So if you want to limit your export to a specific order, family, genus etc., you can. You can also download just the classification without the species names, or alternatively the species names without the classification, for any chosen taxon. It is not necessary to have any programming skills to operate the Download Service, but you may need to understand relational data tables and software to be able to do anything useful with the data once you have exported it. As soon as the Dynamic Checklist is updated each month, users can download all or part of this refreshed instance of the Catalogue of Life using the Download Service in this way.

i4Life Download Service interface
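
For readers wondering what "doing something useful" with an exported zip file might look like, here is a minimal Python sketch. It is only an illustration under stated assumptions: the core file name taxa.txt and the presence of a header row are guesses made for the example (a real DarwinCore Archive declares its file names and columns in its meta.xml), and the tiny demo archive is built in memory rather than downloaded. The column names scientificName and family are standard Darwin Core terms.

```python
import csv
import io
import zipfile


def names_in_family(archive_bytes, family, core_file="taxa.txt"):
    """Return scientificName values for rows whose family column matches.

    Assumes a tab-delimited core file with a header row; a real archive
    declares its actual file names and columns in meta.xml.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(core_file) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text, delimiter="\t")
            return [row["scientificName"] for row in reader
                    if row.get("family") == family]


def build_demo_archive(core_file="taxa.txt"):
    """Build a tiny in-memory zip standing in for a downloaded export."""
    rows = [
        "taxonID\tscientificName\tfamily\ttaxonRank",
        "1\tOrcinus orca\tDelphinidae\tspecies",
        "2\tCephalorhynchus hectori\tDelphinidae\tspecies",
        "3\tPanthera leo\tFelidae\tspecies",
    ]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(core_file, "\n".join(rows) + "\n")
    return buffer.getvalue()


if __name__ == "__main__":
    print(names_in_family(build_demo_archive(), "Delphinidae"))
```

With a real export you would read the bytes of the downloaded zip from disk and pass them to names_in_family in place of the demo archive.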

This process of obtaining the Catalogue of Life using the Download Service requires human involvement through clicking buttons and selecting options. Our partners and collaborators generally want a more automated way of getting the Catalogue of Life data into their systems, preferring to control the activation of this process at their end. This is where Web Services come in. Web Services are the Catalogue of Life’s equivalent to APIs (Application Programming Interfaces). What is an API? Well, it is a method that allows one person’s website to plug into another. The instructions that the Catalogue of Life supplies on its Web Services information page enable a programmer to set up this exchange. If someone else builds something using this method they may call it a Catalogue of Life application (or widget or tool) and it can become a fixed part of their own e-infrastructure or website. Our partners use Web Services to do different things. For example, IUCN use them as a link-out service (see image below) from their Red List website. What this means is that when someone searches the IUCN Red List website for a species that has not yet had its conservation status assessed, a search that would previously have returned ‘not found’, the user's taxon name (in the form of a text string) is now used to dynamically query the Catalogue of Life database using Web Services. If there is a match, the IUCN Red List website will use the information returned by Web Services. What is returned is in a format that computers can transfer and interpret (either XML or PHP). More code on the IUCN Red List website then displays it in a user-friendly way - a name with a hyperlink back to the Catalogue of Life taxon record. This lets the IUCN Red List user know that a species with this name does exist (ie they haven’t spelt it wrong and here it is in the Catalogue of Life!), but that it is not yet assessed.

IUCN Red List link-out to Catalogue of Life
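
To give a flavour of what such a link-out could look like in code, here is a rough Python sketch. Please treat every specific in it as an assumption made for illustration: the base URL is a placeholder, the name and format query parameters and the result, name and url elements of the response are hypothetical, and the real details are the ones given on the Web Services information page. The fetch step is passed in as a function so that the example runs without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Placeholder address, not the real Web Services endpoint.
BASE_URL = "https://example.org/webservice"


def build_query(name):
    """Turn the user's text string into a query URL (hypothetical parameters)."""
    return BASE_URL + "?" + urlencode({"name": name, "format": "xml"})


def link_out(name, fetch):
    """Return (matched name, record URL) pairs, or an empty list if no match.

    fetch is any function that takes a URL and returns the response text.
    """
    root = ET.fromstring(fetch(build_query(name)))
    return [(result.findtext("name"), result.findtext("url"))
            for result in root.iter("result")]


def fake_fetch(url):
    """Stand-in for an HTTP request, returning an invented XML response."""
    return ("<results><result><name>Ursinia nana</name>"
            "<url>https://example.org/taxon/123</url></result></results>")


if __name__ == "__main__":
    for matched_name, record_url in link_out("Ursinia nana", fake_fetch):
        print(matched_name, "->", record_url)
```

A partner website would replace fake_fetch with a real HTTP request and render each pair as a name with a hyperlink back to the taxon record.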

This is one use of Web Services, but they can be used in many different ways to build different applications on other websites. The advantage for IUCN Red List and other partners is clear: Web Services enable a real-time display of data from the current updated version of the Dynamic Checklist. For the Catalogue of Life the benefit is that it increases our user base, promoting the data via large, high profile biodiversity websites. It is not only partner websites that use Web Services; many individual users and commercial users are accessing the Catalogue of Life this way, leading to a satellite distribution of the Catalogue worldwide. Any non-commercial user is free to use the latest edition of the Dynamic and Annual Checklists but is required, when using it in another system, to abide by the Terms of Use and notify the Species 2000 Secretariat. The reason for this is the Catalogue of Life's commitment to ensure proper credit and attribution goes to Global Species Databases, the knowledge-base upon which the Catalogue is built. By tracking users we can make sure that we are fulfilling our requirements as suppliers of this data.

Next up: i4Life Part 5: Global Biodiversity Partners

Tuesday, 1 October 2013

Taxon of the Day: Delphinidae

Dolphins are a group of species that get a lot of positive attention, with some believing they will bring healing and psychic power to us humans who interact with them. We don't like to go for the popularity vote on Taxon of the Day, but these mammals are currently newsworthy for being the main subject of the extraordinary winning shot in this year's Wildlife Photographer of the Year competition here in the UK. Like all 1.4+ million species found in the Catalogue of Life, they are of course special.

The Catalogue of Life lists 37 species in the family Delphinidae, the taxon that holds all oceanic dolphins (fresh-water dolphins are found elsewhere). The common names of some species in this group can be misleading, with a handful of them often referred to as whales, including the well-known Orca or Killer whale. However, although dolphins, porpoises and whales all belong to the order Cetacea, the Delphinidae are united by a number of shared characteristics including a single blowhole, streamlined bodies (ie wide in the middle and narrow at each end) and a beak-like nose.

The majority of species have been assessed for conservation status by the Catalogue of Life's partner, the IUCN Red List, with a few currently classified as Vulnerable or Near Threatened and the species Cephalorhynchus hectori listed as Endangered.

Did you know that dolphins, because of their excellent hearing, sonar capabilities and underwater vision, have been used by the US Navy for decades to locate things in the sea? Once they have undergone two years of training they are ready to embark on a mission such as mine-sweeping off the coast of Croatia.

The etymology of the name dolphin is interesting, having very little variation in most languages. To hear an in-depth, inspired and entertaining overview of its origins, listen to this podcast from the lively Dolphin Communication Project.

CoL Annual Checklist: Delphinidae
CoL contributor: ITIS Global
Image copyright: Public Domain

Monday, 23 September 2013

Taxon of the Day: Escherichia coli

E. coli magnified 10,000 times

Taxon of the Day ventures into the kingdom of bacteria with today's well-known subject. Escherichia coli is a bacterium that inhabits the gut of both animals and people and these days seems to be everywhere and in everything, from your handbag to your watercress. Although only one species, it includes hundreds of strains and serotypes - some good, some bad and some potentially life-threatening. E. coli found in human intestines are mostly harmless and help us to digest food, but others, like the strain officially known as VTEC O157 Phage type 2 VT2, are not so pleasant. VTEC is the abbreviation used for verocytotoxin-producing E. coli, a cause of gastrointestinal disease, of which O157 is the most common strain in the UK. Just recently a leading supermarket chain (in the UK) came to nationwide attention for recalling all of its own brand of watercress due to an outbreak thought to trace back to its produce. As yet no proof of contamination has been found, but as a precautionary measure, given the number of people who fell ill and were treated in hospital, it was felt necessary to act. It is reported that approximately 1000 cases of E. coli per year are treated in the UK. Transmission can occur in direct and indirect ways; in the above case (if in fact true) it most likely came through eating watercress that had been contaminated, via the soil, by the faeces of infected animals.

The etymology of Escherichia coli relates to the German physician Theodor Escherich (1857-1911) who discovered it in 1885. He did not originally name it after himself, which we all know is a no-no in taxonomy etiquette; the name was given by a subsequent taxonomist who reclassified his original taxon. coli is the Latin genitive of colon, referring to the intestine that this bacterium inhabits.

While E. coli may be considered the bane of consumers, supermarkets and health workers alike, it is frequently a friend to researchers because of its genetic simplicity and/or fast growth rates. It has been used as an aid in completing the Human Genome Project and as a potential new biofuel among many other positive uses.  It has also been the inspiration, along with other pathogens, for artist Luke Jerram to produce some dramatic art, proving that the agents of our pain can have the most striking and beautiful form when blown in glass!

CoL Annual Checklist: Escherichia coli
CoL contributor: Bacteriology Insight Orienting System
Image copyright: Public Domain

Friday, 20 September 2013

i4Life Part 3: The Catalogue of Life

Catalogue of Life in i4Life data flow

Previous posts in this series:
Part 1: Improving the world's taxonomic data indexing
Part 2: Global Species Databases

The Catalogue of Life provides a unified taxonomic index of living species on Earth. The first Annual Checklist was produced in 2001 on CD and had 204,216 species. Today it is on DVD and also online and contains over 1.4 million species from over 135 contributing taxonomic databases. The Annual Checklist, as the name suggests, is a once per year fixed edition and so is a referenceable version of the entire Catalogue. Additionally, the Catalogue of Life today produces a second edition - the now monthly updated Dynamic Checklist. This differs in not being a fixed edition; instead it reflects updates to the supplier databases or inclusion of new databases as and when the Catalogue of Life receives them. Accessing the Catalogue of Life is easy: you can do it through the easy-to-use browse and search interface online (shown by the 'End User' in the diagram above) or, for larger data consumers who want all or part of the Catalogue, it can be done programmatically. Whichever way you choose to do it, the Catalogue of Life tries to make data sharing as easy as possible while ensuring that our contributors are fully credited for their work.

The Catalogue of Life has many users from different fields, including students, nature enthusiasts, ecologists, museum collection managers, publishers, commercial natural product manufacturers and policy makers to name but a few. It is unlikely that a taxonomist would use the Catalogue of Life for the taxa they have expertise in, but taxonomists do use it to investigate related taxa outside of their own area. The Catalogue of Life also provides a taxonomic backbone for large biodiversity data suppliers (i4Life Global Biodiversity Programme partners) where the 1.4 million species names act as an indexing mechanism within their databases, from which all other biodiversity data can radiate. It is this important user group that is central to the goals of i4Life and the tools and processes that it has created to achieve them. Using the Download, Piping and Cross-mapping tools (which will be covered in later posts in this series) developed during i4Life, the Catalogue of Life and its global partners have been able to identify the differences and similarities in their taxonomic catalogues. What follows is a process of harmonisation between all catalogues. Once this process is fully completed, it will also give a clearer indication of the remaining gaps in the world's understanding of biodiversity that need to be filled.

Today, we look at what the Catalogue of Life team do (referred to as workflows) to create the monthly Dynamic Checklist before making it available online and to Global Biodiversity Partners for use in their own data portals.

As outlined in the previous post in this series, the Catalogue does not produce data itself; instead it acts like a publisher, assembling expert taxonomists' global species checklists side-by-side into a unified and simplified whole. Keeping this huge global taxonomic checklist well organised and up-to-date is a complex task. To achieve this the Catalogue of Life team carries out a monthly process of taxonomic data integrity checks, editing and aggregation, known internally as 'data assembly'. Some of this assembly is automated, some semi-automated and some manual; whichever method is used, the system in place (called the Workbench) allows the flexibility to move between the three as and when required. The editor has overall content control, the data assembly team carry out the updates, and the whole process of production and roll-out is overseen by the CoL Systems Manager.

Each month a number of steps or workflows occur: Firstly, files are collected or received from GSDs. These are then extracted and transformed into the Catalogue of Life Standard Dataset in text delimited files. Some come this way already, whereas others need more work to get them into a usable format. This data is then transferred into MySQL, the database software that houses the Catalogue of Life. The next step is running checks for data integrity and consistency. Some are carried out and resolved automatically, whereas others, for example ones that deal with nomenclatural and taxonomic issues, may need manual input. After all editorial decisions have been made and updates completed, checklists are merged to form the complete Catalogue and this is then converted to a production schema and deployed on the University of Reading servers.
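
The load-and-merge part of these steps can be pictured with a short Python sketch. It is a simplified stand-in rather than the actual Workbench: it uses SQLite (from Python's standard library) in place of MySQL, and the three-column layout of name, status and source is an assumption made for the example, not the Catalogue of Life Standard Dataset.

```python
import csv
import io
import sqlite3


def assemble(gsd_files):
    """Load each GSD's tab-delimited text into one table, keeping its source."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE taxon (source TEXT, name TEXT, status TEXT)")
    for source, text in gsd_files.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        db.executemany(
            "INSERT INTO taxon VALUES (?, ?, ?)",
            [(source, row["name"], row["status"]) for row in reader],
        )
    db.commit()
    return db


def accepted_counts(db):
    """Count accepted names per contributing database in the merged table."""
    query = ("SELECT source, COUNT(*) FROM taxon "
             "WHERE status = 'accepted' GROUP BY source")
    return dict(db.execute(query))


if __name__ == "__main__":
    files = {
        "ITIS Global": "name\tstatus\nPanthera leo\taccepted\nFelis leo\tsynonym\n",
        "Global Compositae Checklist": "name\tstatus\nUrsinia nana\taccepted\n",
    }
    print(accepted_counts(assemble(files)))
```

Keeping the source column on every row is what lets the merged whole still credit each contributor.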

The following team members have particular responsibility for different stages of this process:

Executive Editor (Dr Yuri Roskov) and Editorial Assistant (Dr Thomas Kunze)
The usability of the Catalogue of Life is dependent on the underlying management classification for unification and simplification. The biological classification systems of different kingdoms follow different Codes and nomenclatural practices; in addition, some have alternative classifications within kingdoms. As noted above, the end user of the Catalogue of Life is generally not an expert in taxonomy, so presenting multiple taxonomies from which to choose one would decrease the usability of the Catalogue for one very large user group (although it may improve it for another!). So the management classification, by using a single taxonomy, brings all taxa across all kingdoms into a coherent master view, and where possible enforces consistent nomenclature. To place a global species checklist within the Catalogue of Life, the editors may need to decide on a specific set of adjustments governing where and how to insert it, to make it as consistent as possible while not losing the essential taxonomic information it has been created to provide.

The Catalogue of Life retains the GSD's own classification below entry point and uses the management classification above.
When a simplification occurs - for example using the management classification above a GSD insertion point (see picture above) or removal of ranks not recognised by the Catalogue of Life - it is done with the knowledge that the Catalogue of Life links every species to its source database, where a full classification and often extra, associated data can be found by the user.
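
The idea of an insertion point can be sketched in a few lines of Python. This is an illustration only: the list of recognised ranks and the example lineage are assumptions chosen for the sketch and are not taken from the actual management classification.

```python
# Ranks kept in the unified view; an assumed list for this sketch only.
RECOGNISED_RANKS = ("kingdom", "phylum", "class", "order",
                    "family", "genus", "species")


def unified_lineage(management_path, gsd_lineage, entry_rank):
    """Join management ranks above the entry point with GSD ranks below it.

    Both lineages are lists of (rank, name) pairs ordered from the top down.
    GSD ranks that are not recognised are dropped from the result.
    """
    ranks = [rank for rank, _ in gsd_lineage]
    if entry_rank not in ranks:
        raise ValueError(f"{entry_rank!r} is not a rank in the GSD lineage")
    below = gsd_lineage[ranks.index(entry_rank):]
    kept = [(rank, name) for rank, name in below if rank in RECOGNISED_RANKS]
    return list(management_path) + kept


if __name__ == "__main__":
    management = [("kingdom", "Plantae"), ("order", "Asterales")]
    gsd = [("family", "Asteraceae"), ("tribe", "Anthemideae"),
           ("genus", "Ursinia"), ("species", "Ursinia calenduliflora")]
    for rank, name in unified_lineage(management, gsd, "family"):
        print(rank, name)
```

In this example the tribe is dropped from the unified view, which is exactly the kind of detail a user would recover by following the link to the source database.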

The Executive Editor is also responsible for continually searching for and identifying new taxonomic data sources. If one is found, the Editor then also facilitates the independent Peer Review process that a source must pass before being accepted into the Catalogue of Life.

Data Assembly (Luvie Paglinawan until July 2013 and Luisa Abucay)
Some data integrity checking does not need taxonomic input. For example, running a query to check that all synonyms in a checklist have an associated accepted species name, or making sure that abbreviations of taxonomic ranks are consistent within a checklist, can be carried out automatically. The results are passed back to the database supplier to consider for inclusion in the next update. The data assembly team does not just run automated checks; it also carries out data transformations as instructed by the editors and does most of the initial data gathering and combining phase of the process.
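
The two automated checks mentioned above could look something like this in Python. It is a sketch under assumptions: the record fields (id, name, status, accepted_id, marker) and the list of allowed rank abbreviations are invented for the example and do not describe the Workbench's real schema.

```python
def orphan_synonyms(records):
    """Return names of synonyms that do not point at an accepted name."""
    accepted_ids = {r["id"] for r in records if r["status"] == "accepted"}
    return [r["name"] for r in records
            if r["status"] == "synonym"
            and r.get("accepted_id") not in accepted_ids]


def unexpected_rank_markers(records, allowed=("subsp.", "var.", "f.")):
    """Return the rank abbreviations in use that are not on the allowed list."""
    return sorted({r["marker"] for r in records
                   if r.get("marker") and r["marker"] not in allowed})


if __name__ == "__main__":
    checklist = [
        {"id": 1, "name": "Panthera leo", "status": "accepted"},
        {"id": 2, "name": "Felis leo", "status": "synonym", "accepted_id": 1},
        {"id": 3, "name": "Felis dubia", "status": "synonym", "accepted_id": 99},
        {"id": 4, "name": "Aus bus ssp. cus", "status": "accepted", "marker": "ssp."},
        {"id": 5, "name": "Aus bus var. dus", "status": "accepted", "marker": "var."},
    ]
    print(orphan_synonyms(checklist))
    print(unexpected_rank_markers(checklist))
```

Reports like these are the sort of result that would be passed back to the database supplier.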

Systems Manager (Viktoras Didziulis)
The informatics processes are overseen by the Systems Manager and once all the updates and assembly have been completed by the data assembly team and editors the Systems Manager will oversee the deployment of the new version of the Dynamic Checklist to the servers. There is more than one server, both for security (ie if one server goes down there is another one running) and for updates, where one can be updated, whilst the other is still running and then vice-versa, so there is no interruption of service to the online Catalogue of Life users.
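
The one-at-a-time update described above can be sketched in Python as follows. The server records here are plain dictionaries invented for the example; this says nothing about how the real servers are managed.

```python
def rolling_update(servers, new_version):
    """Update servers one at a time and record how many stayed online.

    servers maps a server name to a dict with 'version' and 'online' keys.
    Raises RuntimeError if an update would leave no server running.
    """
    online_during_update = []
    for name in servers:
        servers[name]["online"] = False          # take this server out of service
        still_up = sum(1 for s in servers.values() if s["online"])
        if still_up == 0:
            servers[name]["online"] = True       # put it back before giving up
            raise RuntimeError("no server left to answer requests")
        online_during_update.append(still_up)
        servers[name]["version"] = new_version   # deploy the new checklist
        servers[name]["online"] = True           # return it to service
    return online_during_update


if __name__ == "__main__":
    pair = {
        "server-a": {"version": "2013-08", "online": True},
        "server-b": {"version": "2013-08", "online": True},
    }
    print(rolling_update(pair, "2013-09"))
    print(pair)
```

With two servers, one is always left running while the other is updated, so users never see an interruption.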

Global Biodiversity Programmes receive an updated Catalogue of Life database via Web Services and the i4Life Download tool, the topic of our next post in this series.

Next up: i4Life Part 4: Download and Web Services

Wednesday, 18 September 2013

i4Life at the Catalogue of Life Global Team meeting

The Catalogue of Life Global Team

This week Alastair Culham and Yuri Roskov from the Reading i4Life team travelled to Leiden, Holland for the annual Catalogue of Life Global Team meeting. The Catalogue of Life Global Team are a group of taxonomic experts from around the world who advise and decide upon scientific policy for the Catalogue of Life. They are made up of Global Species Database custodians, ITIS representatives and Species2000 (the body which governs the Catalogue of Life in financial and legal matters) directors.

On the agenda this time were: 
  • a report on the move of the Species2000 secretariat from Reading University to Naturalis in November this year; 
  • a proposal for updating the Catalogue of Life Management Classification;
  • a presentation from Alastair Culham, the i4Life Coordinator, on the soon to be completed project's progress; and 
  • a discussion on how to include fossil taxa in the Catalogue of Life, something that has been presented as a priority by certain members of the Global Team. 
Global Team meetings are always a good chance to connect with parts of the confederation that is the Catalogue of Life.

Monday, 16 September 2013

The 106th Annual Meeting of the German Zoological Society, Munich, Germany, 13-16/09/2013

Thomas Kunze and Yuri Roskov attended the 106th Annual Meeting of the German Zoological Society, Munich, Germany, 13-16/09/2013

Taxon of the Day: Ursinia calenduliflora

Ursinia calenduliflora

Taxon of the Day moves to South Africa for an annual, endemic herb found throughout the region known as Namaqualand. Ursinia calenduliflora (DC.) N.E.Br. is just one of hundreds of flowering species found in this unique floral region. In peak springtime (usually mid-August to mid-September) the most spectacular flower displays can be found, where multi-coloured carpets span as far as the eye can see. Duration, timing and intensity of displays are all determined by weather, most notably the amount and arrival time of rain. A good year can never be accurately predicted, although many try, as it can help with marketing the region as an annual tourist destination. For most of the year, this area in the north west of southern Africa is a barren desert, with very little green let alone other colours, which makes the short period of flowering, seemingly unimaginable beforehand, even more incredible.

Different floras and online resources, unsurprisingly considering its wide distribution over a region inhabited by indigenous and settler communities, cite different common names. Le Roux (2005) uses the Afrikaans name Berggousblom, which translates to mountain daisy; others use English names including Namaqua Parachute Daisy (Manning, 2009) and Springbok rock ursinia. TotD would like to know what the Nama people, native inhabitants of this region for thousands of years, whose language contains distinctive click sounds, have named this plant. Something different to the above for sure.

A small dancing part of a big carpet
The genus Ursinia, in the family Asteraceae, is named after the botanist Johann Ursinus (1608-1666) and contains 43 species in the Catalogue of Life, supplied by the Global Compositae Checklist. All species, collectively referred to as parachute daisies, are native to southern Africa, with the highest concentration in the Western Cape. One species is the exception: U. nana is reportedly also found in Ethiopia.

CoL Annual Checklist: Ursinia calenduliflora
CoL contributor: Global Compositae Checklist
Image copyright: RLF Matthias

Wednesday, 11 September 2013

Catalogue of Life in Galway

Thomas Kunze and Viktoras Didziulis with their posters

The Catalogue of Life was represented at the 48th Annual European Marine Biology Symposium in Galway, Ireland on 19-23 August 2013, by Thomas Kunze (CoL Editorial Assistant), Viktoras Didziulis (CoL Systems Manager) and Yuri Roskov (CoL Executive Editor). Below is their account of the event:

The conference brought together over 200 academic practitioners in marine biology (biologists, oceanologists, geologists, ecologists, taxonomists and project managers) from over 20 countries for networking and dissemination of their research. The conference highlighted the need for sustainable management of oceanic resources against a backdrop of climate change and ocean acidification.

The Catalogue of Life presented the following three posters, which summarised i4Life activity in Workpackage 3 (2012-2013):

  • Overview of marine taxa suppliers in the Catalogue of Life by Yuri Roskov
  • Proto-GSD in the Catalogue of Life – a case study on Mollusca and Platyhelminthes by Thomas Kunze
  • Master data management and user services of the Catalogue of Life by Viktoras Didziulis 

With these three posters different aspects of the Catalogue were highlighted - the technical side, editorial work and the general composition of marine data. 

By visiting the conference, we were able to present the product - the Catalogue of Life - and gain access to both users and partners. The Editorial team is always on the lookout for new contributors in gap areas to improve species coverage. One of the outcomes of attending this meeting in Galway was that we were able to meet, among others, marine protistologists (people working with Protista and Chromista) and discuss the possibilities for additional Global Species Checklists in these groups. From the partner side of things, we were happy to meet several members of WoRMS, a major contributor to the Catalogue of Life. The Catalogue of Life has a wide user base across many different fields, ecologists being one such group; with many of them in attendance, this conference also gave us the opportunity to target new users in this area.