Updated: 28-11-2023

Providing data

The actual images used for model training can be provided in two ways:

  • (Preferred): Images are accessible via HTTP and the URL of each image is provided in the Images file;
  • If this is not possible (e.g. for data security or copyright reasons), images can be supplied on a disk containing one subfolder per species (or subspecies), where each folder name is the correct accepted scientific name according to a valid reference (such as …).
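For the disk option, the folder-per-taxon layout can be read back into (image, taxon) pairs as sketched below. This is a minimal sketch, not Naturalis' own tooling: the function name `collect_images` is hypothetical, the allowed extensions are taken from the constraints listed further down, and the folder name is used verbatim as the scientific name without checking it against any reference.

```python
from pathlib import Path

# Allowed formats, taken from the image constraints in this document.
ALLOWED_SUFFIXES = {".jpg", ".png", ".bmp"}


def collect_images(root: str) -> list[tuple[str, str]]:
    """Return (image path, scientific name) pairs for a disk laid out as
    <root>/<scientific name>/<image file>.

    Hypothetical helper: the folder name is taken verbatim as the taxon
    name; validating it against a reference taxonomy is out of scope.
    """
    pairs = []
    for taxon_dir in sorted(Path(root).iterdir()):
        if not taxon_dir.is_dir():
            continue  # ignore loose files at the top level
        for image in sorted(taxon_dir.iterdir()):
            # Compare extensions case-insensitively (.JPG counts as .jpg).
            if image.is_file() and image.suffix.lower() in ALLOWED_SUFFIXES:
                pairs.append((str(image), taxon_dir.name))
    return pairs
```

Files with other extensions inside a taxon folder are silently skipped here; a stricter check might report them instead.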

Further constraints on images include:

  • Allowed image formats: .jpg, .png, .bmp;
  • Minimum resolution: 500x500 pixels;
  • Images must be in colour.
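A pre-submission check against these three constraints might look like the sketch below. It is an assumption-laden illustration, not the validator Naturalis runs: `check_image` is a hypothetical helper that takes the width, height and a Pillow-style mode string as inputs, so it runs without any imaging library.

```python
import os

# Constraints from the list above.
ALLOWED_SUFFIXES = {".jpg", ".png", ".bmp"}
MIN_SIDE = 500
# Pillow mode strings that carry no colour information.
GRAYSCALE_MODES = {"1", "L", "LA", "I", "I;16", "F"}


def check_image(filename: str, width: int, height: int, mode: str) -> list[str]:
    """Return a list of constraint violations (empty if the image passes).

    Hypothetical helper; `mode` is assumed to be a Pillow mode string.
    """
    problems = []
    suffix = os.path.splitext(filename)[1].lower()  # "" if no extension
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"format {suffix or '(none)'} not allowed")
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append(f"resolution {width}x{height} below {MIN_SIDE}x{MIN_SIDE}")
    if mode in GRAYSCALE_MODES:
        problems.append(f"mode {mode} is not colour")
    return problems
```

With Pillow installed, `Image.open(path).size` and `Image.open(path).mode` supply the inputs. Note that the mode check only catches images stored as grayscale; a grayscale picture saved in RGB mode would pass it.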

Please supply all images you have available for each taxon, however few: even a single image can be helpful, because we pool all images per taxon.

Metadata input files

In addition to the actual images, some metadata is required to build the models. This includes, of course, the correct identification of the specimen in each image and other required and optional data, such as the taxonomic classification of the specimen.

The data supplier should provide two input files:

  • Images file: A .csv file (see example below) listing information for all images;
  • Taxa file: A .csv file (see example below) providing taxonomies for all taxa present in the images file.

Example files are available for both the images file and the taxa file.
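Because the taxa file must cover every taxon that occurs in the images file, a supplier can check this before submitting. The sketch below assumes, hypothetically, that both files share a column named `taxon_name`; the real header names are given on the respective file pages and should be substituted.

```python
import csv
import io


def missing_taxa(images_csv: str, taxa_csv: str, column: str = "taxon_name") -> set[str]:
    """Return taxa referenced in the images file but absent from the taxa file.

    `column` is a hypothetical header name shared by both files.
    """
    in_images = {row[column] for row in csv.DictReader(io.StringIO(images_csv))}
    in_taxa = {row[column] for row in csv.DictReader(io.StringIO(taxa_csv))}
    return in_images - in_taxa
```

An empty result means every taxon in the images file has a taxonomy entry.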

File format

Input files must be comma-separated (,) files. Fields whose values contain commas must be enclosed in double quotes ("). See also … for details.
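Writing the files with a CSV library rather than by string concatenation takes care of the quoting rule automatically. In this Python sketch the column names are hypothetical; the standard `csv` writer's default minimal quoting wraps only fields that need it, such as a value containing a comma.

```python
import csv
import io

rows = [
    ["taxon_name", "remarks"],       # hypothetical header
    ["Parus major", "adult, male"],  # value contains a comma
]
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
print(buf.getvalue())
# taxon_name,remarks
# Parus major,"adult, male"
```

The writer also escapes a double quote inside a value by doubling it (`""`), which is the usual CSV convention.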

Details and examples of the required columns and allowed values for both the images- and taxa files are provided on their respective pages. The order of the columns does not matter, but the column headers must match the ones given there.

Please make sure that the input images- and taxa files are encoded in UTF-8.
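The encoding and header requirements can both be checked when reading a file. This is a minimal sketch: `read_checked` and the `REQUIRED` header set are hypothetical, and the real required columns must be taken from the images- and taxa file pages. Headers are compared as a set, so column order is ignored, as described above.

```python
import csv

# Hypothetical required headers; replace with the documented ones.
REQUIRED = {"image_url", "taxon_name"}


def read_checked(path: str, required: set[str] = REQUIRED) -> list[dict[str, str]]:
    """Read a UTF-8 CSV file and verify that all required headers are present."""
    # encoding="utf-8": a file that is not valid UTF-8 fails with
    # UnicodeDecodeError; newline="" is what the csv module expects.
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        missing = required - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing column(s): {sorted(missing)}")
        return list(reader)
```

Some spreadsheet programs prepend a byte-order mark when saving as UTF-8; with `encoding="utf-8"` that mark ends up in the first header name, whereas `encoding="utf-8-sig"` strips it.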


The images- and taxa input files are validated automatically at Naturalis before model training starts. Validation covers all constraints on the fields in both files, and any validation errors are reported to the data supplier.

Qualified ID

In many source systems, IDs are mere integers. To avoid confusing, e.g., observation records, taxa, and morphologies, we combine the source system (source_id) and the original ID (uid), separated by a colon (:).

A qualified ID thus has the form <source_id>:<uid>. For example:

  • WRN:312313 for …;
  • COL:9849028342 for Catalogue of Life.
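Building and splitting qualified IDs is simple string handling, sketched below. Two assumptions are made here, not stated in the specification: a source_id never contains a colon, and a uid might, so splitting happens at the first colon only. Both function names are hypothetical.

```python
def make_qualified_id(source_id: str, uid: str) -> str:
    """Build <source_id>:<uid> (hypothetical helper)."""
    if not source_id or ":" in source_id:
        raise ValueError("source_id must be non-empty and contain no colon")
    return f"{source_id}:{uid}"


def split_qualified_id(qid: str) -> tuple[str, str]:
    """Split at the first colon, so a uid containing colons stays intact."""
    source_id, sep, uid = qid.partition(":")
    if not sep or not source_id or not uid:
        raise ValueError(f"not a qualified ID: {qid!r}")
    return source_id, uid
```

For example, `make_qualified_id("COL", "9849028342")` gives `"COL:9849028342"`, and `split_qualified_id("WRN:312313")` gives `("WRN", "312313")`.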

Data suppliers may use their own source_id, provided that it does not conflict with any existing source_id.

Existing source_ids:

source_id    Source system
NIA          Naturalis Identification API
COL          Catalogue of Life
NSR          Nederlands Soortenregister
INAT         iNaturalist
GBIF         Global Biodiversity Information Facility
WRN          …
NBIC         Artsdatabanken Norway
UK_iRecord   UKSI
FINBIF       Finland
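Before choosing a custom source_id, a supplier can check it against the registered ones. The sketch below copies the source_ids from the table above; the helper name is hypothetical, and the case-insensitive comparison is a cautious assumption, since the specification does not say whether, e.g., "inat" and "INAT" would count as distinct.

```python
# source_ids from the table above.
EXISTING_SOURCE_IDS = {
    "NIA", "COL", "NSR", "INAT", "GBIF", "WRN", "NBIC", "UK_iRecord", "FINBIF",
}


def conflicts(candidate: str, existing: set[str] = EXISTING_SOURCE_IDS) -> bool:
    """Return True if a proposed source_id clashes with an existing one.

    Case-insensitive as a precaution (hypothetical helper).
    """
    folded = candidate.casefold()
    return any(folded == s.casefold() for s in existing)
```

For example, `conflicts("inat")` returns `True`, while a new identifier not in the table returns `False`.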