DiSSCo MVP is ready. What next?

https://dissco.pageflow.io/dissco-mvp-is-ready-what-next

Reading time: 14mins

As you may know, we are just this close to having a minimum viable product (MVP) of DiSSCo's core digital specimen infrastructure. That leaves us with two options. The first is to beat our chests gorilla-style (Nah...). The second is to start figuring out where to go next, i.e. to create a roadmap for building DiSSCo according to our priorities.

This binnacle is precisely about that. It aims to give you a better understanding of our MVP and the options that lie ahead. Ready? Scroll down!

Chances are that this binnacle will mention concepts such as Digital Specimens (DS), FAIR digital objects (FDOs) and Persistent Identifiers (PIDs), or standards such as Darwin Core (DwC) and MIDS.
If you are not familiar with these, or if you need to refresh your memory, we suggest you take a look at this other binnacle, where all these notions are explained.

Ok, and now let's go!

The MVP has five main components:

- Digital Specimen Repository (basically storage space)
- PID infrastructure (where we mint DOIs)
- Data processing and publishing
- Authorisation/Authentication infrastructure
- Indexing and API

Now, DON'T FREAK OUT if you are already lost. These are the components of the MVP but, fortunately for you, we will not go into much detail with them. What really matters is that you understand how the MVP works.

So no excuses, buddy! Next slide, chop, chop!

1. You have the (specimen) data providers, which are mostly the collection management systems (CMSs) of natural history collections across Europe.

2. Then you have DiSSCo's infrastructure. It is here that we curate and enhance the specimen information to optimise it for data consumers.

3. And then you have the data consumers, which include aggregators such as GBIF or GeoCASe. 

Needless to say, step 2 is where DiSSCo's magic happens. Check this out: DiSSCo's core infrastructure will allow humans to work on the data, but also machines, i.e. AI systems or any other system able to collect information and/or add it to the specimen (e.g. machine annotations, optical character recognition, georeferencing). Keep scrolling to learn more.

Considering the size of DiSSCo, which brings together hundreds of institutions from all over Europe, it is easy to imagine that getting, handling and delivering data could become quite chaotic.

Indeed, to keep things in line we need solid foundations. That is precisely what the DiSSCo core infrastructure is for. It lies at the base of all our services, providing a number of functionalities that make them work together as one coherent platform.

Let's just go over these functionalities really quickly.

HARMONISATION OF DATA TO OPEN DS

What does this mean? Basically, it means that DiSSCo collects all the data from CMSs and harmonises it to its open DS data standard. The open DS standard is our thing, but it builds on TDWG standards and also draws inspiration from GBIF's unified model.

Only if data from different sources is structured according to one solid standard will we be able to create services that work for all the data, regardless of which DiSSCo institution it comes from. Makes sense, right?
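
To make the idea a bit more tangible, here is a minimal Python sketch of what harmonising could look like. Everything in it is an assumption made for illustration: the two CMS names, their field names and the sample values are invented, and the target field names are Darwin Core terms standing in for the actual open DS schema, which is richer than this.

```python
# Hypothetical per-source mappings: the left-hand names are invented,
# the right-hand names are Darwin Core terms used as a stand-in target.
FIELD_MAPS = {
    "cms_a": {"sci_name": "scientificName", "cat_no": "catalogNumber", "inst": "institutionCode"},
    "cms_b": {"Taxon": "scientificName", "Inventory number": "catalogNumber", "Holder": "institutionCode"},
}


def harmonise(record: dict, source: str) -> dict:
    """Rename the fields of one CMS record to the shared vocabulary."""
    mapping = FIELD_MAPS[source]
    # Fields without a mapping are dropped in this sketch.
    return {target: record[name] for name, target in mapping.items() if name in record}


# Two made-up records describing specimens held in different systems.
a = harmonise({"sci_name": "Papilio machaon", "cat_no": "A-001", "inst": "Museum A"}, "cms_a")
b = harmonise({"Taxon": "Papilio machaon", "Inventory number": "B-042", "Holder": "Museum B"}, "cms_b")

# After harmonisation both records expose the same fields,
# so one service can work with either of them.
print(sorted(a) == sorted(b))
```

The point of the sketch is only the shape of the problem: many source vocabularies go in, one vocabulary comes out.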
MINTING PIDS

The second functionality of DiSSCo's infrastructure has to do with... yeah, of course, FAIR data!

As you know (because we've told you a hundred times), all data in DiSSCo will be FAIR. Minting persistent identifiers (PIDs) is part of that effort (at this point, we will not get into what FAIR means but you can learn it all from this other binnacle). 

For each specimen we will mint a Digital Object Identifier (DOI) that will stay the same, unique and resolvable regardless of changes in the data. 

We have created our own PID infrastructure for DOIs in collaboration with DataCite.
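
The key property here is that the identifier stays put while the data behind it changes. A small Python sketch of that idea follows. It is not DiSSCo's or DataCite's actual API: the `mint` and `resolve` functions, the in-memory registry and the placeholder identifier `10.0000/EXAMPLE-1` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalSpecimen:
    doi: str                                      # assigned once, never changed
    versions: list = field(default_factory=list)  # every state the data has had

    def update(self, data: dict) -> None:
        """Store a new version of the data under the same DOI."""
        self.versions.append(dict(data))

    @property
    def current(self) -> dict:
        return self.versions[-1]


registry: dict = {}  # hypothetical in-memory stand-in for the PID infrastructure


def mint(doi: str, data: dict) -> DigitalSpecimen:
    """Register a new specimen; an identifier can only be minted once."""
    if doi in registry:
        raise ValueError(f"{doi} has already been minted")
    specimen = DigitalSpecimen(doi)
    specimen.update(data)
    registry[doi] = specimen
    return specimen


def resolve(doi: str) -> dict:
    """Return the latest data for an identifier."""
    return registry[doi].current


spec = mint("10.0000/EXAMPLE-1", {"scientificName": "Papilio sp."})
spec.update({"scientificName": "Papilio machaon"})  # data changes, DOI does not
```

Resolving `10.0000/EXAMPLE-1` after the update returns the corrected name, while the identifier itself is exactly the one that was minted.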


PROVENANCE OF DATA

The third functionality is yet another important component of DiSSCo's FAIRness. Very simply put, we want to know where our data comes from. For that, the core infrastructure of DiSSCo creates provenance data for every object. This means recording the changes made to the object, when they happened, who made them, etc. All this information is stored with the PID of the agent, and all of it is traceable back to the source.
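
As a rough illustration of what "provenance data" means in practice, the Python sketch below appends one log entry per change, recording what changed, when, and the PID of the agent responsible. The function names, the entry layout and the agent identifiers are hypothetical; the real infrastructure may model provenance quite differently.

```python
from datetime import datetime, timezone


def apply_change(specimen: dict, log: list, agent_pid: str, field: str, new_value) -> None:
    """Change one field and append a provenance entry describing the change."""
    log.append({
        "agent": agent_pid,                              # who made the change
        "when": datetime.now(timezone.utc).isoformat(),  # when it happened
        "field": field,                                  # what was changed
        "old": specimen.get(field),
        "new": new_value,
    })
    specimen[field] = new_value


def history(log: list, field: str) -> list:
    """Trace one field back to its source, oldest change first."""
    return [(e["agent"], e["old"], e["new"]) for e in log if e["field"] == field]


specimen, log = {}, []
# Made-up agent PIDs: first the original import, then a human expert.
apply_change(specimen, log, "agent/cms-import", "scientificName", "Papilio sp.")
apply_change(specimen, log, "agent/expert-1", "scientificName", "Papilio machaon")

for agent, old, new in history(log, "scientificName"):
    print(f"{agent}: {old!r} -> {new!r}")
```

Because entries are only ever appended, the full chain from the current value back to the original import remains available.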

Remember: All these functionalities apply to all DiSSCo services!

The next bunch of functionalities are meant to allow agents, be they humans or machines, to make the most of the data.

There are four of these functionalities that you should get familiar with.

CURATION OF DATA: The DiSSCo core infrastructure makes sure that agents can curate data and, if something is wrong, flag it.

EXTENDING DATA: DiSSCo also makes sure that agents can add to the data.

LINKING DATA: Let's use an example for this one. Let's say you find a DNA sequence in a third-party infrastructure that is relevant for your specimen. The DiSSCo core infrastructure allows you to link the specimen to the infrastructure that holds the DNA data you need.

DATA EXPOSURE: This one is about letting agents harvest and use DiSSCo data to create their own data. Our core infrastructure allows for that too.
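
One way to picture these four functionalities together is as annotations attached to a specimen by different agents. The Python sketch below is only that, a picture: the `Annotation` class, the three kinds ("flag", "addition", "link"), the agent names and the example.org URL are all invented for illustration and do not describe DiSSCo's actual annotation model.

```python
from dataclasses import dataclass

KINDS = {"flag", "addition", "link"}  # names chosen for this sketch only


@dataclass(frozen=True)
class Annotation:
    target: str  # PID of the specimen being annotated
    agent: str   # PID of the human or machine making the annotation
    kind: str    # one of KINDS
    body: str


class AnnotationStore:
    def __init__(self) -> None:
        self._items: list = []

    def add(self, annotation: Annotation) -> None:
        if annotation.kind not in KINDS:
            raise ValueError(f"unknown kind: {annotation.kind}")
        self._items.append(annotation)

    def harvest(self, target: str, kind=None) -> list:
        """Data exposure: return the annotations on a specimen, optionally by kind."""
        return [a for a in self._items
                if a.target == target and (kind is None or a.kind == kind)]


store = AnnotationStore()
# Curation: a human flags something that looks wrong.
store.add(Annotation("specimen-1", "human-expert", "flag", "Collection date looks wrong"))
# Extension: a machine adds text read from the label.
store.add(Annotation("specimen-1", "ocr-service", "addition", "Label text: 'Leiden, 1902'"))
# Linking: a pointer to DNA data held in a third-party infrastructure.
store.add(Annotation("specimen-1", "human-expert", "link", "https://example.org/sequences/XYZ"))

links = store.harvest("specimen-1", kind="link")
```

Note that humans and machines go through exactly the same door here, which is the idea behind treating both as agents.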
Turning up the heat!

We know what you are wondering right now: how much of this have we already achieved in our MVP?

Well, all our functionalities are in the MVP, but it is really a first setup. We figured that, rather than working on one thing until it was 100% done and then moving on to the next, it would be better to work on all fronts at once, making all functionalities workable step by step.

DiSSCo's core infrastructure provides the APIs and all the features needed to build DiSSCo services. Take DiSSCover, for example, which uses many of the functionalities that the core infrastructure provides.

You haven't seen DiSSCover yet?? Go here for a demo!
Ok, so we have our MVP, which looks more or less as we just explained to you. Where to go from here? 

Well, that is an easy one - wherever our users tell us to go. In other words, it is crucial that every development of our core infrastructure meet the demands and priorities of DiSSCo's stakeholders. Part 2 of this binnacle is precisely about this: the demands of DiSSCo stakeholders and how we are meeting them.

In this second part, we'll have a quick look at:

1. The user demands that we gathered during ICEDIG and DiSSCo Prepare; 

2. The demands from DiSSCo's "early-adopters";

3. Options for further development that we discussed during our recent workshop on the topic in October 2024.

This is hardly the first time we have paid attention to the demands of DiSSCo's stakeholders. In fact, we dealt with this in quite some depth during ICEDIG and DiSSCo Prepare (DPP). The result was a huge number of user stories, which translated into two deliverables during DPP: one for life sciences and another one for Earth sciences. (By the way, there are significant differences in the priorities of the two strands!)

The next slide provides a snapshot of what has been developed on the basis of DiSSCo Prepare's preliminary studies.  

TIP: If the list that you have just seen is not enough and you want to get a more in-depth explanation, watch the recording of DiSSCo's recent workshop on the core infrastructure MVP.

Go directly to 00:22:12. Our colleague Wouter Addink will be delighted to tell you all you need to know about user demands.

Now, let's find out what our early adopters need from us...

DiSSCo's so-called early adopters are a group of fearless and audacious people who showed a genuine interest in using our core infrastructure. They also have demands and we also take them very seriously. This is the group:

● Papillotten project (Naturalis)
● Virtual reference collections (TETTRIs, Luomus)
● MIDS level improvement in digitisation strategy (Museum of Vienna)
● Citizen science platforms DoeDat and Herbonauten (TETTRIs)
● CESP project AI 4 Labels (GBIF ECA Nodes)
● MAS for mass digitisation & trait description (Recolnat)


We don't have the time to review all of them now, but let us take two or three of them as examples, so you get an idea of what these initiatives are asking us to prioritise as we develop DiSSCo's core infrastructure.

The Papillotten project from Naturalis wants to use DiSSCo’s infrastructure to add identifications to their specimens. They are involved in digitising specimens of butterflies but some names are still missing, so the information cannot be published yet.

They need annotation tools and, for that, they first need to be able to put their data in the infrastructure. They also want to bring in the AI-based species recognition developed at Naturalis, alongside identifications by human experts.

In the context of the TETTRIs project, Luomus is working on building reference collections. They want a functionality that allows anyone to select a number of specimens and make a virtual collection with them. For that, they need to be able to select individual specimens and group them.

They also need to be able to add information and images of the specimens that can be used for identification, so that their specimens can serve as a reference for other projects.
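
In data terms, a virtual collection can be as simple as a named set of specimen PIDs plus whatever identification material people attach to its members. The Python sketch below shows that shape under stated assumptions: the `VirtualCollection` class, its methods and the sample identifiers are hypothetical, not the functionality Luomus or DiSSCo will actually build.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCollection:
    name: str
    specimens: set = field(default_factory=set)  # PIDs only; the specimens stay where they are
    notes: dict = field(default_factory=dict)    # PID -> identification material

    def add(self, *pids: str) -> None:
        """Select individual specimens and group them under this collection."""
        self.specimens.update(pids)

    def describe(self, pid: str, note: str) -> None:
        """Attach information (e.g. a diagnostic character or an image link) to a member."""
        if pid not in self.specimens:
            raise KeyError(pid)
        self.notes.setdefault(pid, []).append(note)


# Made-up identifiers and notes, for illustration only.
reference = VirtualCollection("Butterfly reference set")
reference.add("specimen-1", "specimen-7", "specimen-1")  # duplicates are ignored
reference.describe("specimen-7", "Image of wing underside, useful for identification")
```

Storing only PIDs means the same specimen can belong to any number of virtual collections without being copied.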
Our colleagues from the Museum of Vienna want to use DiSSCo's infrastructure as part of their digitisation strategy. They intend to supply MIDS level 0 specimens (i.e. empty records) and then use DiSSCo's infrastructure to enrich the information, digitise the label info, etc.

What they need is a reliable way to integrate the DiSSCo infrastructure with their digitisation workflow.





As you can see, the user stories are many and diverse. Time to ask the hard questions, then!

Having learned the lessons from the user demands we just saw, we are at the point where we need to decide where to go from here. The options are many. Just a few examples:

  • What about extending the data model with missing information, supporting materials, identification for geological specimens and so on?
  • Why don't we focus on the quality of our online data?
  • Should we perhaps work on implementing a community trust model and voting functionality so that we can integrate the annotations made into the DS data?
  • Or even better, shall we focus on integration with digitisation and publication workflows (CMS + infrastructures like GBIF)?
  • Improved access, anyone? (advanced search, SEO optimisation, mobile access...)
  • Or what about working on reporting, stats, visualisation and so on?
In a nutshell, where shall we go now?

During our recent workshop on DiSSCo's core infrastructure, we did a little bit of "Mentimetering" so we could gather as much feedback from the attendees as possible.

The next slides show the Mentimeter results, but if you're ok with the short version, here it is: some areas got a lot of priority, such as connectivity with local institutional systems (CMSs); support for digitisation workflows; publication of improved data to GBIF; uptake of the DOIs that we mint; and virtual collections, among others.

If you want the specifics, continue scrolling!


We will use all this information to further discuss our roadmap for developing DiSSCo's infrastructure beyond the MVP.

We don't have dates yet, but our technical team should have something to show around May 2025. In the meantime, you are welcome to keep joining us at our bi-monthly technical demos and workshops.

Check our Knowledge Area for more information, videos and materials on all things technical from DiSSCo.




You have reached the end of this binnacle. Care to share some feedback? Contact Jose Alonso (jose.alonso@naturalis.nl)


