DiSSCo's MVP is almost ready. What comes next?
Priorities for further development of the DiSSCo core DS infrastructure beyond the Minimum Viable Product
As you may know, we are just this close to getting a minimum viable product (MVP) of DiSSCo's core digital specimen infrastructure. That leaves us with two options: the first is to beat our chests gorilla-style (Nah...). The second is to start figuring out where to go next, i.e. to create a roadmap for continuing to build DiSSCo according to our priorities.
This binnacle is precisely about that. It aims to give you a better understanding of our MVP and the options that lie ahead. Ready? Scroll down!
--- Sort of a disclaimer
If you are not familiar with these concepts or if you need to refresh your memory, we suggest you take a look at this other binnacle, where all these notions are explained.
Ok, and now let's go!
PART 1: THE MVP OF DISSCO'S CORE DIGITAL SPECIMEN INFRASTRUCTURE
- Digital Specimen Repository (basically storage space)
- PID infrastructure (where we mint DOIs)
- Data processing and publishing
- Authorisation/authentication infrastructure
- Indexing and API
Now, DON'T FREAK OUT if you are already lost. These are the components of the MVP but, fortunately for you, we will not go into much detail with them. What really matters is that you understand how the MVP works.
So no excuses, buddy! Next slide, chop, chop!
You can think of the MVP as the central piece in a 3-step process:
1. First you have the data providers: the DiSSCo institutions and the collection management systems (CMSs) where their specimen data lives.
2. Then you have DiSSCo's infrastructure. It is here that we curate and enhance the specimen information to optimise it for data consumers.
3. And then you have the data consumers, which include aggregators such as GBIF or GeoCASe.
Needless to say, step 2 is where DiSSCo's magic happens. Check this out: DiSSCo's core infrastructure will allow not only humans to work on the data, but also machines, i.e. AI systems or any other system able to collect information and/or add it to the specimen (e.g. machine annotations, optical character recognition, georeferencing). Keep scrolling to know more.
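To make the machine part a bit more tangible, here is a minimal sketch (in Python) of what a machine-generated annotation could look like. Every field name and identifier below is an illustrative assumption, not the actual openDS annotation schema:

```python
import json

# Sketch of a machine-generated annotation: a georeferencing service
# proposes coordinates for a specimen identified by its DOI. All field
# names and identifiers are illustrative placeholders.
machine_annotation = {
    "target": "https://doi.org/10.XXXX/example-specimen",   # placeholder specimen DOI
    "motivation": "georeferencing",                          # what the machine did
    "creator": {
        "id": "https://hdl.handle.net/XXXX/georef-service",  # placeholder PID of the service
        "type": "machine",
    },
    "created": "2024-10-15T09:30:00Z",
    "body": {                                                # the proposed enrichment
        "decimalLatitude": 52.1653,
        "decimalLongitude": 4.4743,
        "coordinateUncertaintyInMeters": 250,
    },
}

print(json.dumps(machine_annotation, indent=2))
```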
Indeed, to keep things in line we need solid foundations. That is precisely what the DiSSCo core infrastructure is for. It lies at the basis of all our services, providing a number of functionalities that make them form one coherent platform.
Let's just go over these functionalities really quick.
What does this mean? Basically, it means that DiSSCo collects all the data from CMSs and harmonises it to its openDS data standard. The openDS standard is our own, but it builds on TDWG standards and also draws inspiration from GBIF's unified data model.
Only if we have data from different sources structured according to one solid standard will we be able to create services that work for all the data, regardless of which DiSSCo institution it comes from. Makes sense, right?
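To illustrate the idea (and only the idea: the target field names below are assumptions for the example, not the actual openDS mapping), harmonisation boils down to translating records from source standards such as Darwin Core into one common structure:

```python
# Minimal sketch of harmonisation: mapping a Darwin Core record coming
# from a CMS onto a digital-specimen-style object. The "ods:" field names
# are illustrative assumptions, not the actual openDS schema.

def harmonise(dwc_record: dict) -> dict:
    return {
        "ods:specimenName": dwc_record.get("dwc:scientificName"),
        "ods:physicalSpecimenId": dwc_record.get("dwc:catalogNumber"),
        "ods:organisationName": dwc_record.get("dwc:institutionCode"),
    }

cms_record = {
    "dwc:scientificName": "Quercus robur L.",
    "dwc:catalogNumber": "L.1234567",
    "dwc:institutionCode": "NHMUK",
}
print(harmonise(cms_record))
```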
The second functionality of DiSSCo's infrastructure has to do with... yeah, of course, FAIR data!
As you know (because we've told you a hundred times), all data in DiSSCo will be FAIR. Minting persistent identifiers (PIDs) is part of that effort (at this point, we will not get into what FAIR means but you can learn it all from this other binnacle).
For each specimen we will mint a Digital Object Identifier (DOI) that will stay the same, unique and resolvable regardless of changes in the data.
We have created our own PID infrastructure for DOIs in collaboration with DataCite.
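To give a feel for what "resolvable" means in practice, here is a minimal sketch that looks up a specimen DOI through DataCite's public REST API. The DOI below is a placeholder; substitute a real one before running:

```python
import requests

# Placeholder DOI: substitute a real DiSSCo specimen DOI before running.
doi = "10.XXXX/example-specimen"

# The DataCite REST API exposes metadata for DOIs registered with it.
resp = requests.get(f"https://api.datacite.org/dois/{doi}")
resp.raise_for_status()

attributes = resp.json()["data"]["attributes"]
print(attributes["url"])       # where the DOI currently resolves to
print(attributes["created"])   # when the DOI was minted
```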
The third functionality is yet another important component of DiSSCo's FAIRness. Very simply put, we want to know where our data comes from. For that, the core infrastructure of DiSSCo creates provenance data for every object. This entails listing the changes made to the object, when they happened, who made them, etc. All this information is stored together with the PID of the agent who made the change, and all of it is traceable back to the source.
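To picture it, here is a minimal sketch of a single provenance entry, assuming a loosely PROV-inspired structure. All field names and identifiers are illustrative, not DiSSCo's actual provenance schema:

```python
# Sketch of one provenance entry for a change to a digital specimen.
# Structure and field names are illustrative assumptions only.
provenance_entry = {
    "object": "https://doi.org/10.XXXX/example-specimen",  # placeholder DOI
    "action": "update",
    "timestamp": "2024-10-15T09:30:00Z",
    "agent": "https://orcid.org/0000-0000-0000-0000",      # placeholder PID of the agent
    "change": {"field": "dwc:country", "from": None, "to": "Netherlands"},
}
```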
Remember: All these functionalities apply to all DiSSCo services!
There are four of these functionalities that you should get familiar with.
CURATION OF DATA: The DiSSCo core infrastructure makes sure that agents can curate data and, if something is wrong, flag it.
EXTENDING DATA: DiSSCo also makes sure that agents can add to the data.
LINKING DATA: Let's use an example for this one. Let's say you find a DNA sequence in a third-party infrastructure that is relevant for your specimen. The DiSSCo core infrastructure allows you to link the specimen to the infrastructure where the DNA data you need lies.
DATA EXPOSURE: This one is about letting agents harvest DiSSCo data and use it to create their own data products. Our core infrastructure allows for that too (see the sketch below).
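As a rough sketch of the data-exposure idea, harvesting could look like querying a search API and paging through the results. The base URL, path and parameters below are placeholders, not the documented DiSSCo API:

```python
import requests

# Sketch of harvesting digital specimens through a search API.
# Endpoint and parameter names are placeholders, not the real DiSSCo API.
BASE_URL = "https://api.dissco.example"  # placeholder endpoint

resp = requests.get(
    f"{BASE_URL}/digital-specimens",
    params={"q": "Quercus robur", "pageSize": 10},
)
resp.raise_for_status()

for specimen in resp.json()["data"]:
    print(specimen["id"], specimen["attributes"]["specimenName"])
```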
We know what you are wondering right now: how much of this have we already achieved in our MVP?
Well, all these functionalities are in the MVP, but as a first setup. We figured that, rather than working on one thing until it was 100% done and then moving on to the next, it would be better to advance on all fronts at once, making every functionality progressively more workable.
DiSSCo's core infrastructure provides the APIs and all the features needed to build DiSSCo services. Take DiSSCover, for example, which uses many of the functionalities that the core infrastructure provides.
You haven't seen DiSSCover yet?? Go here for a demo!
PART 2: NEXT STEPS IN DEVELOPING OUR MVP
Well, that is an easy one: wherever our users tell us to go. In other words, it is crucial that every development of our core infrastructure meets the demands and priorities of DiSSCo's stakeholders. Part 2 of this binnacle is precisely about this: the demands of DiSSCo stakeholders and how we are meeting them.
In this second part, we'll have a quick look at:
1. The user demands that we gathered during ICEDIG and DiSSCo Prepare;
2. The demands from DiSSCo's "early-adopters";
3. Options for further development that we discussed during our recent workshop on the topic in October 2024.
2a. User demands compiled during ICEDIG and DPP
The next slide provides a snapshot of what has been developed on the basis of DiSSCo Prepare's preliminary studies.
2b. Demands from Early Adopters
● Papillotten project (Naturalis)
● Virtual reference collections (TETTRIs, Luomus)
● MIDS level improvement in digitisation strategy (Museum of Vienna)
● Citizen science platforms DoeDat and Herbonauten (TETTRIs)
● CESP project AI 4 Labels (GBIF ECA Nodes)
● MAS for mass digitisation & trait description (Recolnat)
We don't have the time to review all of them now, but let us take two or three as examples, so you get an idea of what these initiatives are asking us to prioritise as we develop DiSSCo's core infrastructure.
They need annotation tools and, for that, they first need to be able to get their data into the infrastructure. They also want to bring in the AI-based species recognition developed at Naturalis, alongside identifications by human experts.
They also need to be able to add information and images of the specimens that can be used for identification, so that their specimens can serve as a reference for other projects.
What they need is a reliable way to integrate the DiSSCo infrastructure with their digitisation workflow.
As you can see, the user stories are many and diverse. Time to ask the hard questions, then!
2c. Options for further development
- What about extending the data model with missing information, supporting materials, identification for geological specimens and so on?
- Why don't we focus on the quality of our online data?
- Should we perhaps work on implementing a community trust model and voting functionality, so that the annotations that are made can be integrated into the DS data?
- Or even better, shall we focus on integration with digitisation and publication workflows (CMS + infrastructures like GBIF)?
- Improved access, anyone? (advanced search, SEO optimisation, mobile access...)
- Or what about working on reporting, stats, visualisation and so on?
During our recent workshop on DiSSCo's core infrastructure, we did a bit of "Mentimetering" so we could gather as much feedback from the attendees as possible.
The next slides show the Mentimeter results, but if you're ok with the short version, here it is: high priority went to areas such as connectivity with local institutional systems (CMSs); support for digitisation workflows; publication of improved data to GBIF; uptake of the DOIs that we mint; and virtual collections, among others.
If you want the specifics, continue scrolling!
We don't have dates yet, but our technical team should have something to show around May 2025. In the meantime, you are welcome to keep joining us at our bi-monthly technical demos and workshops.
Check our Knowledge Area for more information, videos and materials on all things technical at DiSSCo.
You have reached the end of this binnacle. Care to share some feedback? Contact Jose Alonso (jose.alonso@naturalis.nl)