Taxon namespaces and filtering

Updated: 28-6-2023

Results can be returned in different taxon namespaces, selected with the parameter taxon_namespace (or defined by the token). This parameter does two things:

  • It maps taxon names and taxon IDs to a specific taxon namespace.
  • It filters out results unknown to the namespace.
Info

Note that for models fine-tuned for a specific data provider, the taxon namespace is used only to map the taxa, not to filter them. This section does not apply to those models.

A namespace contains only the taxa provided by the respective data partner. Therefore, the same image can return different results depending on the namespace used. In most cases this is desirable, because taxa that are absent from a namespace, e.g. because they are not native to that data partner’s area, are filtered out. In some cases, e.g. for exotic species, the filtered result might not be accurate.

To overcome this issue, we implemented a rule: a taxon is not filtered out if the model assigns it a probability of 99% or higher. This prevents an empty output in some cases. It also means that, in specific cases, the output can contain taxa unknown to the namespace.
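
The filtering itself happens inside the API. The sketch below only illustrates the rule as described above and is not the actual implementation. The function name filter_to_namespace, the shared_to_namespace lookup table, and the rescaling by the sum of the kept probabilities are all assumptions made for illustration. The sample values are taken from Example 1 further down.

```python
from typing import Dict, List


def filter_to_namespace(
    items: List[dict],
    shared_to_namespace: Dict[str, str],
    keep_threshold: float = 0.99,
) -> List[dict]:
    """Keep taxa known to the namespace, plus any taxon at or above the threshold.

    `shared_to_namespace` is a hypothetical lookup from shared IDs to
    namespace-specific IDs; a taxon missing from it counts as unknown.
    """
    kept = []
    for item in items:
        known = item["scientific_name_id"] in shared_to_namespace
        if known or item["probability"] >= keep_threshold:
            kept.append(dict(item))  # copy, so the input list is not modified

    # Assumption: probabilities are rescaled so the kept items sum to 1.
    total = sum(item["probability"] for item in kept)
    if total > 0:
        for item in kept:
            item["probability"] /= total
    return kept


# First three unfiltered items of Example 1.
unfiltered_items = [
    {
        "scientific_name": "Eristalis arbustorum",
        "scientific_name_id": "GBIF:1541146",
        "probability": 0.856912,
    },
    {
        "scientific_name": "Eristalis arbustorum/abusiva",
        "scientific_name_id": "NIA:fc6d4a4a5c59cff2bd2cf0ff25f3fee50b4c7f17a108e50da08f8710",
        "probability": 0.143082,
    },
    {
        "scientific_name": "Eristalis",
        "scientific_name_id": "GBIF:1491190",
        "probability": 4e-6,
    },
]
# Hypothetical lookup: only taxa that NBIC knows appear here.
nbic_mapping = {"GBIF:1541146": "NBIC:22870", "GBIF:1491190": "NBIC:22866"}

filtered_items = filter_to_namespace(unfiltered_items, nbic_mapping)
for item in filtered_items:
    print(item["scientific_name"], item["probability"])
```

In this sketch the taxon pair is dropped because it is unknown to the namespace and below the threshold, and the remaining probabilities are rescaled. An unknown taxon with a probability of 0.99 or higher would be kept.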

Specifically, the first entry of items in a prediction can contain a taxon whose ID is in the GBIF namespace or the NIA namespace. The shared (GBIF) namespace is used if the taxon unknown to the namespace was matched to a taxon in GBIF. The NIA (Nature Identification API) namespace is used if no match was made with GBIF but a match between different data partners was determined.
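
A client may want to tell these cases apart by looking at the prefix of a taxon ID. The sketch below assumes that IDs always have the form NAMESPACE:identifier, as they do in the examples in this section. The helper names split_taxon_id and describe_shared_id are hypothetical.

```python
from typing import Tuple


def split_taxon_id(taxon_id: str) -> Tuple[str, str]:
    """Split an ID such as 'GBIF:1541146' into ('GBIF', '1541146')."""
    namespace, separator, local_id = taxon_id.partition(":")
    if not separator or not namespace or not local_id:
        raise ValueError(f"unexpected taxon ID format: {taxon_id!r}")
    return namespace, local_id


def describe_shared_id(taxon_id: str) -> str:
    """Describe how a shared ID was matched, based on its namespace prefix."""
    namespace, _ = split_taxon_id(taxon_id)
    if namespace == "GBIF":
        return "matched to a GBIF taxon"
    if namespace == "NIA":
        return "matched between data partners, no GBIF match"
    return "namespace-specific ID"


print(describe_shared_id("GBIF:1541146"))
print(describe_shared_id("NBIC:22870"))
```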

In addition, each region group includes a separate output, taxa_unfiltered, to which no filtering has been applied. NB: taxa_unfiltered is only part of the output when filtering has actually taken place. Data providers and/or implementers can use the extra information in taxa_unfiltered to enrich their own taxonomy. Implementers do not have to process taxa_unfiltered in their applications.

Example 1

The example below shows a result for taxon_namespace=NBIC. Note that Eristalis arbustorum/abusiva is present in taxa_unfiltered but not in taxa, because this taxon is not defined in the NBIC namespace. Probabilities are rescaled after filtering, so the API (correctly) predicts Eristalis arbustorum with high probability. Also note that Eristalis arbustorum/abusiva has a (shared) NIA taxon ID because GBIF does not support taxon pairs (or complexes in general).

{
  "predictions": [
    {
      "region_group_id": "image0?region=full",
      "taxa": {
        "items": [
          {
            "probability": 0.999992,
            "scientific_name": "Eristalis arbustorum",
            "scientific_name_id": "NBIC:22870",
            "scientific_name_id_shared": "GBIF:1541146",
            "scientific_name_shared": "Eristalis arbustorum"
          },
          {
            "probability": 5e-6,
            "scientific_name": "Eristalis",
            "scientific_name_id": "NBIC:22866",
            "scientific_name_id_shared": "GBIF:1491190",
            "scientific_name_shared": "Eristalis"
          },
          {
            "probability": 2e-6,
            "scientific_name": "Eristalis abusiva",
            "scientific_name_id": "NBIC:22867",
            "scientific_name_id_shared": "NIA:b22169aa63fe9f0d97587ae27d3072bf1dc4eb0d1abc014469213016",
            "scientific_name_shared": "Eristalis abusiva"
          },
          {
            "probability": 0.0,
            "scientific_name": "Eristalinus aeneus",
            "scientific_name_id": "NBIC:22864",
            "scientific_name_id_shared": "GBIF:1542830",
            "scientific_name_shared": "Eristalinus aeneus"
          },
          {
            "probability": 0.0,
            "scientific_name": "Eristalis nemorum",
            "scientific_name_id": "NBIC:186828",
            "scientific_name_id_shared": "GBIF:6098383",
            "scientific_name_shared": "Eristalis nemorum"
          }
        ],
        "type": "multiclass"
      },
      "taxa_unfiltered": {
        "items": [
          {
            "probability": 0.856912,
            "scientific_name": "Eristalis arbustorum",
            "scientific_name_id": "GBIF:1541146"
          },
          {
            "probability": 0.143082,
            "scientific_name": "Eristalis arbustorum/abusiva",
            "scientific_name_id": "NIA:fc6d4a4a5c59cff2bd2cf0ff25f3fee50b4c7f17a108e50da08f8710"
          },
          {
            "probability": 4e-6,
            "scientific_name": "Eristalis",
            "scientific_name_id": "GBIF:1491190"
          },
          {
            "probability": 2e-6,
            "scientific_name": "Eristalis abusiva",
            "scientific_name_id": "NIA:b22169aa63fe9f0d97587ae27d3072bf1dc4eb0d1abc014469213016"
          },
          {
            "probability": 0.0,
            "scientific_name": "Eristalinus aeneus",
            "scientific_name_id": "GBIF:1542830"
          }
        ],
        "type": "multiclass"
      }
    }
  ]
}
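
An implementer who wants to see which taxa were removed by filtering could compare the two lists. The sketch below assumes that scientific_name_id in taxa_unfiltered corresponds to scientific_name_id_shared in taxa, which holds in the examples in this section but is not stated as a guarantee. The function name taxa_removed_by_filtering is hypothetical, and the prediction is an abbreviated copy of the response above.

```python
from typing import List


def taxa_removed_by_filtering(prediction: dict) -> List[dict]:
    """Return items of taxa_unfiltered whose shared ID does not occur in taxa."""
    unfiltered = prediction.get("taxa_unfiltered")
    if unfiltered is None:
        # taxa_unfiltered is only present when filtering has taken place.
        return []
    kept_shared_ids = {
        item.get("scientific_name_id_shared")
        for item in prediction["taxa"]["items"]
    }
    return [
        item
        for item in unfiltered["items"]
        if item["scientific_name_id"] not in kept_shared_ids
    ]


# Abbreviated version of the Example 1 prediction.
prediction = {
    "region_group_id": "image0?region=full",
    "taxa": {
        "items": [
            {
                "probability": 0.999992,
                "scientific_name": "Eristalis arbustorum",
                "scientific_name_id": "NBIC:22870",
                "scientific_name_id_shared": "GBIF:1541146",
            },
            {
                "probability": 5e-6,
                "scientific_name": "Eristalis",
                "scientific_name_id": "NBIC:22866",
                "scientific_name_id_shared": "GBIF:1491190",
            },
        ]
    },
    "taxa_unfiltered": {
        "items": [
            {
                "probability": 0.856912,
                "scientific_name": "Eristalis arbustorum",
                "scientific_name_id": "GBIF:1541146",
            },
            {
                "probability": 0.143082,
                "scientific_name": "Eristalis arbustorum/abusiva",
                "scientific_name_id": "NIA:fc6d4a4a5c59cff2bd2cf0ff25f3fee50b4c7f17a108e50da08f8710",
            },
            {
                "probability": 4e-6,
                "scientific_name": "Eristalis",
                "scientific_name_id": "GBIF:1491190",
            },
        ]
    },
}

removed = taxa_removed_by_filtering(prediction)
print([item["scientific_name"] for item in removed])
```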

Example 2

taxa_unfiltered is also returned when the model predicts an infraspecies (subspecies, forma, variety, etc.) but the namespace only knows the corresponding species. In the example below, the (shared) model predicts Arion ater var. ater, while NBIC only knows Arion ater. In this case the relevant taxon is marked with infra_species_mapped_to_species.

{
  "predictions": [
    {
      "region_group_id": "individual0",
      "taxa": {
        "items": [
          {
            "infra_species_mapped_to_species": true,
            "probability": 0.51212,
            "scientific_name": "Arion ater",
            "scientific_name_id": "NBIC:121255",
            "scientific_name_id_shared": "GBIF:10842166",
            "scientific_name_shared": "Arion ater var. ater"
          },
          {
            "probability": 0.051124,
            "scientific_name": "Pseudohydnum gelatinosum",
            "scientific_name_id": "NBIC:55907",
            "scientific_name_id_shared": "GBIF:5249353",
            "scientific_name_shared": "Pseudohydnum gelatinosum"
          },
          {
            "probability": 0.038855,
            "scientific_name": "Auricularia auricula-judae",
            "scientific_name_id": "NBIC:55837",
            "scientific_name_id_shared": "GBIF:5249271",
            "scientific_name_shared": "Auricularia auricula-judae"
          },
          {
            "probability": 0.035644,
            "scientific_name": "Phallus impudicus",
            "scientific_name_id": "NBIC:56785",
            "scientific_name_id_shared": "GBIF:3314876",
            "scientific_name_shared": "Phallus impudicus"
          },
          {
            "probability": 0.02273,
            "scientific_name": "Daedaleopsis confragosa",
            "scientific_name_id": "NBIC:63560",
            "scientific_name_id_shared": "GBIF:2545670",
            "scientific_name_shared": "Daedaleopsis confragosa"
          }
        ],
        "type": "multiclass"
      },
      "taxa_unfiltered": {
        "items": [
          {
            "probability": 0.5,
            "scientific_name": "Arion ater var. ater",
            "scientific_name_id": "GBIF:10842166"
          },
          {
            "probability": 0.049915,
            "scientific_name": "Pseudohydnum gelatinosum",
            "scientific_name_id": "GBIF:5249353"
          },
          {
            "probability": 0.037936,
            "scientific_name": "Auricularia auricula-judae",
            "scientific_name_id": "GBIF:5249271"
          },
          {
            "probability": 0.0348,
            "scientific_name": "Phallus impudicus",
            "scientific_name_id": "GBIF:3314876"
          },
          {
            "probability": 0.022192,
            "scientific_name": "Daedaleopsis confragosa",
            "scientific_name_id": "GBIF:2545670"
          }
        ],
        "type": "multiclass"
      }
    }
  ]
}
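
A client can detect such mappings by checking the flag on each item. The sketch below assumes that infra_species_mapped_to_species is simply absent when no mapping took place, as in the examples above. The function name infraspecies_mappings is hypothetical, and the prediction is an abbreviated copy of the Example 2 response.

```python
from typing import List, Tuple


def infraspecies_mappings(prediction: dict) -> List[Tuple[str, str]]:
    """Return (predicted infraspecies, namespace species) pairs for flagged items."""
    return [
        (item["scientific_name_shared"], item["scientific_name"])
        for item in prediction["taxa"]["items"]
        if item.get("infra_species_mapped_to_species", False)
    ]


# Abbreviated version of the Example 2 prediction.
prediction = {
    "region_group_id": "individual0",
    "taxa": {
        "items": [
            {
                "infra_species_mapped_to_species": True,
                "probability": 0.51212,
                "scientific_name": "Arion ater",
                "scientific_name_id": "NBIC:121255",
                "scientific_name_id_shared": "GBIF:10842166",
                "scientific_name_shared": "Arion ater var. ater",
            },
            {
                "probability": 0.051124,
                "scientific_name": "Pseudohydnum gelatinosum",
                "scientific_name_id": "NBIC:55907",
                "scientific_name_id_shared": "GBIF:5249353",
                "scientific_name_shared": "Pseudohydnum gelatinosum",
            },
        ]
    },
}

for predicted, mapped in infraspecies_mappings(prediction):
    print(f"{predicted} was mapped to {mapped}")
```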