BAGS: Barcode, Audit & Grade System



OVERVIEW

BAGS emerged as a response to the growing awareness that DNA barcode reference libraries are susceptible to several types of errors and inconsistencies. These can arise at any stage of the barcoding pipeline, from the collection and identification of the specimen, through DNA sequencing, to the upload of the data to DNA sequence repositories. Such errors become potential liabilities for scientific studies that rely on DNA barcodes, such as metabarcoding.

BAGS enables the user to generate reference libraries that flag incongruences between the species names and the sequences clustered in BINs, making it easier to select the most reliable specimen records and species to work with, according to the available data. The application is also meant to facilitate the revision and curation of reference libraries. Indeed, we encourage users of BAGS and of publicly available reference libraries to give back to the community, either by contributing DNA barcodes to further expand the libraries or by reviewing and curating data.

Given one or more taxonomic groups present in the BOLD database, or a user-provided species list (in the form of a tsv file), BAGS mines and subsequently performs post-barcoding auditing and annotation of a DNA barcode library of COI-5P sequences in an automated way. BAGS features the following tools and options:


  1. User library selection - taxa search or user-provided species list.

  2. Library compilation - application of quality filters to the sequences and specimen data.

  3. Optional marine taxa selection/exclusion filter through the WoRMS database.

  4. Auditing and annotation - implementation of the grade ranking system.

  5. Output and annotation-based file sorting - fasta compilation according to grades and auditing report.








WORKFLOW


First, before using the application, make sure that the taxonomic group or groups you want to annotate are present in the BOLD Systems database, bearing in mind that intermediate taxa are the most likely to be absent. Additionally, the spelling of each taxon must be identical to the spelling used at BOLD.

The app has three main options: download a tsv library for every species belonging to the taxa, download a tsv library for only marine species belonging to the taxa or download a tsv library for only non-marine species belonging to the taxa.

Then, after you enter the name of the taxonomic group/groups or enter a file with a list of species names, a data set will be created and curated following these steps:

  1. Downloading the data set in tsv file format, consisting of specimen data and its respective COI-5P sequence belonging to the chosen taxa, from the BOLD Public Data Portal.

  2. Filtering out the following from the data set:

    • Specimens with sequences of length below the threshold chosen by the user

    • Specimens without data on species name, BIN, latitude or country of origin

    • Ambiguous characters occasionally present in the species name and COI-5P sequences

    • Specimens with sequences consisting of > 1% Ns, which are usually the most common ambiguous character

  3. In the case of the marine or non-marine taxa options, the data set is filtered once again, either retaining only the species known to occur in marine or brackish habitats or excluding them, using the species name as reference. This is achieved by querying the WoRMS database, so the download and annotation will take longer.

  4. Lastly, according to the quality and availability of the data of each specimen, qualitative grades from A-E are assigned to each species present in the data set. Then, several reference libraries in fasta format are created, which can be downloaded individually.
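
The record-level filters in step 2 can be pictured with the minimal Python sketch below. It is an illustration only, not the code BAGS runs: the field names (species_name, bin_uri, lat, country, nucleotides), the default length threshold of 500 and the choice to strip only gaps and whitespace from sequences are all assumptions made for the example. Cleaning of species names is not shown.

```python
import re

# Hypothetical field names; the columns of the real BOLD tsv may differ.
REQUIRED_FIELDS = ("species_name", "bin_uri", "lat", "country")


def clean_sequence(seq):
    """Remove gap characters and whitespace, then upper-case (assumed cleaning rule)."""
    return re.sub(r"[-\s]", "", seq).upper()


def keep_record(record, min_length=500, max_n_fraction=0.01):
    """Return True if a specimen record passes the filters of step 2."""
    # Drop specimens lacking species name, BIN, latitude or country.
    if any(not (record.get(field) or "").strip() for field in REQUIRED_FIELDS):
        return False
    seq = clean_sequence(record.get("nucleotides") or "")
    # Drop sequences shorter than the user-chosen threshold.
    if not seq or len(seq) < min_length:
        return False
    # Drop sequences made up of more than 1% Ns.
    return seq.count("N") / len(seq) <= max_n_fraction
```

A data set would then be reduced with something like `[r for r in records if keep_record(r, min_length=500)]`, where the threshold is whatever value you chose in the app.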

GRADES


The assignment of each grade is based on the quality, availability and replicability of the data and metadata for each species, as well as the quality and congruence of the COI-5P sequences, evaluated in accordance with their Barcode Index Number (BIN).


The BIN System is an online framework at BOLD that generates Operational Taxonomic Units (OTUs) by clustering barcode sequences algorithmically, grouping them in a manner that, ideally, mirrors the morphological identification of the respective specimens.


The grades are attributed to each species according to the following criteria:


  • Grade A - Consolidated concordance: The morphospecies is assigned a unique BIN, which is also assigned uniquely to that species, and the species has more than 10 specimens present in the reference library

  • Grade B - Basal concordance: The morphospecies is assigned a unique BIN, which is also assigned uniquely to that species, and the species has 10 or fewer specimens present in the reference library

  • Grade C - Multiple BINs: The morphospecies is assigned more than one BIN, but each of those BINs is assigned exclusively to that species

  • Grade D - Insufficient data: The species is not assigned discordantly, but it has fewer than 3 specimens available in the reference library

  • Grade E - Discordant species assignment: The species is assigned to a BIN that is also assigned to one or more other species. The specimen may match a different species or display paraphyly or polyphyly
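
As an illustration of how these criteria combine, here is a minimal Python sketch that grades species from a list of (species name, BIN) pairs, one pair per specimen. It is not the implementation used by BAGS. In particular, the order in which the criteria are checked (E first, then D, then C, then A or B) is an assumption made to resolve cases that satisfy more than one criterion, such as a species with two specimens in two exclusive BINs.

```python
from collections import defaultdict


def grade_species(specimens):
    """Map each species to a grade A-E from (species, bin_id) pairs."""
    bins_of = defaultdict(set)      # species -> BINs it occurs in
    species_in = defaultdict(set)   # BIN -> species it contains
    count = defaultdict(int)        # species -> number of specimens
    for species, bin_id in specimens:
        bins_of[species].add(bin_id)
        species_in[bin_id].add(species)
        count[species] += 1

    grades = {}
    for species, bins in bins_of.items():
        if any(len(species_in[b]) > 1 for b in bins):
            grades[species] = "E"   # shares a BIN with another species
        elif count[species] < 3:
            grades[species] = "D"   # concordant, but fewer than 3 specimens
        elif len(bins) > 1:
            grades[species] = "C"   # several BINs, all exclusive to the species
        elif count[species] > 10:
            grades[species] = "A"   # one exclusive BIN, more than 10 specimens
        else:
            grades[species] = "B"   # one exclusive BIN, 10 or fewer specimens
    return grades
```

Checking for discordance first means that a single shared BIN is enough to grade a species E, regardless of how many specimens it has.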





The grades were adapted from the following studies:

  • Costa, F. O., Landi, M., Martins, R., Costa, M. H., Costa, M. E., Carneiro, M., ... Carvalho, G. R. (2012). A ranking system for reference libraries of DNA barcodes: application to marine fish species from Portugal. PLoS ONE, 7(4), 1-9. doi: 10.1371/journal.pone.0035858

  • Oliveira, L. M., Knebelsberger, T., Landi, M., Soares, P., Raupach, M. J., & Costa, F. O. (2016). Assembling and auditing a comprehensive DNA barcode reference library for European marine fishes. Journal of Fish Biology, 89(6), 2741-2754. doi: 10.1111/jfb.13169


Download, audit and annotate library for all species


Download

Download, audit and annotate library for marine species


NOTE: This option selects only species that are listed as marine and/or brackish at WoRMS.
Download

Download, audit and annotate library for non-marine species


NOTE: This option excludes all species that are assigned exclusively to marine and/or brackish habitats at WoRMS.
Download
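
The difference between the marine and non-marine options can be sketched in Python as follows. This is a hedged illustration, not the app's code: each species is represented by a plain dictionary of habitat flags named after the isMarine, isBrackish, isFreshwater and isTerrestrial fields of a WoRMS record, and the treatment of species with no WoRMS match (an empty dictionary here) is an assumption.

```python
def is_marine(flags):
    """True if the species is listed as marine and/or brackish."""
    return bool(flags.get("isMarine")) or bool(flags.get("isBrackish"))


def is_exclusively_marine(flags):
    """True if the species is marine/brackish and has no other habitat flag."""
    other = bool(flags.get("isFreshwater")) or bool(flags.get("isTerrestrial"))
    return is_marine(flags) and not other


def in_marine_library(flags):
    # Marine option: keep only species listed as marine and/or brackish.
    return is_marine(flags)


def in_non_marine_library(flags):
    # Non-marine option: drop only species that are exclusively marine/brackish.
    return not is_exclusively_marine(flags)
```

Under these rules a species flagged as both marine and freshwater would appear in both libraries, which follows from the wording of the two notes above.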

Download, audit and annotate library for species list


NOTE: Untick the header checkbox if your file does not have a header for the species column.
Download
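
To show what the header checkbox changes, here is a minimal Python sketch of reading a species list from a tsv file. It assumes that the species names sit in the first column, which is a choice made for this example and not something the app guarantees; the function name is likewise hypothetical.

```python
import csv


def read_species_list(handle, has_header=True):
    """Return the species names found in the first column of a tsv file."""
    rows = csv.reader(handle, delimiter="\t")
    # Skip empty lines and rows whose first cell is blank.
    names = [row[0].strip() for row in rows if row and row[0].strip()]
    # With a header, the first non-empty row is a column title, not a species.
    return names[1:] if has_header else names
```

If the file has no header and the checkbox is left ticked, the first species would be silently dropped, which is why the note above matters.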



NOTE: Since the download process includes the auditing and annotation of the library, the report is ready once the download is concluded.
Make sure to refresh the page every time you are about to download a new library.

Download graded libraries in fasta format


Choose which grades to include in your library:

NOTE: Make sure the data set download is already completed.

Graded library including only species with grade A
Download A library

Graded library including only species with grade B
Download B library

Graded library including only species with grade C
Download C library

Graded library including only species with grade D
Download D library

Graded library including only species with grade E
Download E library





Graded library including only species with grades A and B
Download AB library

Graded library including only species with grades A, B and C
Download ABC library

Graded library including only species with grades A, B, C and D
Download ABCD library

Graded library including species with all grades assigned
Download ABCDE library
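
Each graded library is essentially the audited data set restricted to the species holding the chosen grades and written out in fasta format. The Python sketch below illustrates the idea. The record field names and the layout of the fasta header line are hypothetical, chosen only for this example.

```python
import io


def write_graded_fasta(records, grades, wanted, handle):
    """Write records whose species grade is in `wanted`; return how many were written."""
    wanted = set(wanted)  # accepts "AB", ["A", "B"], etc.
    written = 0
    for rec in records:
        grade = grades.get(rec["species_name"])
        if grade in wanted:
            # Hypothetical header layout: process ID | species | BIN | grade
            header = f">{rec['processid']}|{rec['species_name']}|{rec['bin_uri']}|{grade}"
            handle.write(f"{header}\n{rec['nucleotides']}\n")
            written += 1
    return written


# Usage: build an "AB" library in memory from two toy records.
records = [
    {"processid": "P1", "species_name": "Sp a", "bin_uri": "BOLD:AAA0001", "nucleotides": "ACGT"},
    {"processid": "P2", "species_name": "Sp e", "bin_uri": "BOLD:AAA0002", "nucleotides": "GGCC"},
]
grades = {"Sp a": "A", "Sp e": "E"}
buffer = io.StringIO()
n_written = write_graded_fasta(records, grades, "AB", buffer)
```

Passing "ABC" or "ABCDE" as `wanted` would correspond to the cumulative libraries listed above, and a single letter to the single-grade ones.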

Library auditing report



Barplots display



Citing:

João Tadeu Fontes, Pedro Vieira, Torbjørn Ekrem, Pedro Soares, Filipe O Costa

BAGS: An automated Barcode, Audit & Grade System for DNA barcode reference libraries


Useful links:

BOLD

WoRMS

iBOL

DNAqua-Net

CBMA

IB-S

ME-Barcode







Disclaimer

Although we have taken the utmost care to guarantee the effectiveness and reliability of the web application, it is provided without warranty of any kind, express or implied. In no event shall the authors be liable for any damages of any type.