Images
Updated: 28-11-2023
Providing data
The actual images used for model training can be provided in two ways:
- (Preferred): Images are accessible via http and the url of each image is provided in the Images file;
- If the former option is impossible (e.g. for data security/copyright reasons), images can be supplied on a disk containing subfolders for each species (or subspecies), where the folder name is the correct accepted scientific name according to a valid reference (such as waarneming.nl).
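For the disk-based option, the folder layout can be checked before shipping. The following is a minimal sketch, assuming one subfolder per taxon directly under a root directory; the function name `index_image_folders` and the demo taxon and file names are hypothetical placeholders, not part of any Naturalis tooling.

```python
import tempfile
from pathlib import Path


def index_image_folders(root):
    """Map each subfolder name (taken as the scientific name) to its file names."""
    index = {}
    for folder in sorted(Path(root).iterdir()):
        if folder.is_dir():
            index[folder.name] = sorted(
                p.name for p in folder.iterdir() if p.is_file()
            )
    return index


# Demo on a throwaway directory; taxon and file names are placeholders.
demo_layout = {"Parus major": ["a.jpg", "b.png"], "Pica pica": ["c.jpg"]}
with tempfile.TemporaryDirectory() as root:
    for taxon, files in demo_layout.items():
        folder = Path(root) / taxon
        folder.mkdir()
        for name in files:
            (folder / name).touch()
    index = index_image_folders(root)

print(index)
```

Listing the folder names this way makes it easy to compare them against the accepted scientific names in the chosen reference before the disk is sent.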
Further constraints on images include:
- Allowed image formats: `.jpg`, `.png`, `.bmp`;
- Minimum resolution: 500x500 pixels;
- Images must be in colour.
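The constraints above can be sketched as a simple pre-check. This is only an illustration: the function `check_image` is hypothetical, it assumes the width, height, and colour information have already been read with an image library of your choice, and it interprets "500x500" as both sides being at least 500 pixels.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".jpg", ".png", ".bmp"}
MIN_SIDE = 500  # assumption: both width and height must be at least 500 px


def check_image(filename, width, height, is_colour):
    """Return a list of constraint violations; an empty list means the image passes."""
    problems = []
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if width < MIN_SIDE or height < MIN_SIDE:
        problems.append("resolution below 500x500")
    if not is_colour:
        problems.append("not a colour image")
    return problems


print(check_image("bird.jpg", 1024, 768, True))
print(check_image("scan.tiff", 400, 900, False))
```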
Please supply all the images you have available for a taxon, however few. Even a single image can be helpful because we pool all images per taxon.
Metadata input files
In addition to the actual images, some metadata is required to build the models. This includes, of course, the correct identification of the specimen in each image and other required and optional data, such as the taxonomic classification of the specimen.
The data supplier should give two input files:
- Images file: A `.csv` file (see example below) listing information for all images;
- Taxa file: A `.csv` file (see example below) providing taxonomies for all taxa present in the images file.
Get images file example Get taxa file example
File format
Input files must be comma-separated (`,`) files. Fields with commas in their value should be enclosed in double quotes (`"`). See also https://tools.ietf.org/html/rfc4180 for details.
Details and examples of the required columns and allowed values for both the images and taxa files are provided on their respective pages. While the order of the columns is not important, please make sure that the column headers match the ones given.
Please make sure that the input images and taxa files are encoded in `UTF-8`.
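As an illustration of the quoting and encoding rules, here is a minimal sketch using Python's standard `csv` module, which quotes fields containing commas by default. The column headers (`image_id`, `taxon_name`, `remarks`) and the row values are made up for this example; the real headers are the ones listed on the images and taxa file pages.

```python
import csv
import io

# Hypothetical headers and data, for illustration only.
FIELDNAMES = ["image_id", "taxon_name", "remarks"]
rows = [
    {
        "image_id": "WRN:312313",
        "taxon_name": "Parus major",
        "remarks": "adult, photographed in Zürich",
    }
]

# Write to an in-memory buffer; for a real file use
# open(path, "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
csv_bytes = csv_text.encode("utf-8")  # the bytes that would end up on disk

# Round trip: decode as UTF-8 and parse again.
parsed = list(csv.DictReader(io.StringIO(csv_bytes.decode("utf-8"), newline="")))
print(csv_text)
```

Only the `remarks` value is wrapped in double quotes, because it is the only field that contains a comma.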
Validation
The images and taxa input files are validated automatically at Naturalis before model training starts. Validation errors are reported to the data supplier. Validation covers all constraints on the fields in the images and taxa files.
Qualified ID
In many source systems, IDs are mere integers. To avoid confusing e.g. observation records, taxa, and morphologies, we work with a combination of the source system (source_id) and the original ID (uid), separated by a colon (`:`).
A qualified ID is thus `<source_id>:<uid>`. For example:
- `WRN:312313` for Waarneming.nl
- `COL:9849028342` for Catalogue of life.
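Building and splitting qualified IDs can be sketched as follows. The helper names are hypothetical, and the sketch assumes that the source_id itself never contains a colon, so the string is split on the first colon only.

```python
def make_qualified_id(source_id, uid):
    """Join a source system code and an original ID with a colon."""
    return f"{source_id}:{uid}"


def parse_qualified_id(qualified_id):
    """Split '<source_id>:<uid>' on the first colon; reject malformed values."""
    source_id, sep, uid = qualified_id.partition(":")
    if not sep or not source_id or not uid:
        raise ValueError(f"not a qualified ID: {qualified_id!r}")
    return source_id, uid


print(make_qualified_id("WRN", 312313))
print(parse_qualified_id("COL:9849028342"))
```

The uid is kept as a string after parsing, since not every source system necessarily uses integer IDs.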
Data suppliers can use their own `source_id`, on the condition that it does not conflict with any existing `source_id`.
Existing `source_id`s:
source_id | source system |
---|---|
NIA | Naturalis Identification API |
COL | Catalogue of life |
NSR | Nederlands soortenregister |
INAT | iNaturalist |
GBIF | Global Biodiversity Information Facility |
WRN | Waarneming.nl / Waarnemingen.be / Observation.org |
NBIC | Artsdatabanken Norway |
UK_iRecord | UKSI |
DK_ART | Arter.dk |
APSE | SLU Sweden |
FINBIF | Laji.fi Finland |
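
A supplier choosing a new source_id could check it against the table above before use. This is a sketch only: the function name is hypothetical, and comparing case-insensitively is an assumption made here to avoid near-duplicates such as `wrn` and `WRN`; the source does not state how conflicts are determined.

```python
# source_id values copied from the table of existing source systems.
EXISTING_SOURCE_IDS = {
    "NIA", "COL", "NSR", "INAT", "GBIF", "WRN",
    "NBIC", "UK_iRecord", "DK_ART", "APSE", "FINBIF",
}


def is_source_id_available(candidate):
    """Return True if the candidate does not clash with an existing source_id.

    Assumption: the comparison ignores case.
    """
    taken = {source_id.upper() for source_id in EXISTING_SOURCE_IDS}
    return candidate.upper() not in taken


print(is_source_id_available("WRN"))
print(is_source_id_available("MYMUSEUM"))
```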