A "POP CULTURE" APPROACH TO DIGITAL OBJECTS, PIDS, FAIR PRINCIPLES, DATA STANDARDS AND MORE...
DISSCO TECHNICAL ARCHITECTURE
Everything you want to know but never dared to ask, explained in simple terms.
Contents by: DiSSCo CSO (August 2022)
This is what you know
□ You also know that the data contained in the billions of specimens hosted in European Natural Science Collections (NSCs) are a fundamental basis of knowledge to tackle these challenges.
□ And you also know that by putting all that data together and making it accessible, DiSSCo will help the scientific community work together and do their job a zillion times more efficiently.
‖ Now, DiSSCo is more than just an effort to digitally bring together the data from a couple hundred NSCs, of course… And this is where we get to what you might or might not know:
"DiSSCo aims at turning static records about specimens into dynamic, actionable objects that will evolve with science itself."
‖ Turning static into dynamic... You'll hear this mantra more as you scroll down...
Let's get started: DSArch
‖ Explaining DSArch is not easy, but hopefully this guide will help you understand DiSSCo’s data architecture better and shed some light on a few of the most important concepts in our discussions about technical matters: the ones you hear often but have trouble understanding.
Let’s get to it!
THE THREE PILLARS OF DSArch
① Evolutionary Architecture with Protected Characteristics
② The FAIR guiding principles
③ The Digital Object Architecture
Evolutionary Architecture with Protected Characteristics in short
‖ A Research Infrastructure is as good as the reliability and solidity of the data it provides.
‖ The evolutionary architecture approach gives us a way of eating the cake and having it too, so to speak. It acknowledges the inevitable evolution of things but at the same time shields some essential components of the architecture by granting them protected status. Those protected components, being “futureproof”, will stay the same in the long term.
We will deal with some of those components, such as the FAIRness of data or the centrality of the Digital Specimen, later. If you want to take a look at the whole list, go here.
The FAIR principles in short
‖ You must have heard/seen/read the term FAIR more than a thousand times by now, so let’s make it quick...
‖ The FAIR principles are the best way of ensuring proper stewardship of data but let’s admit it: they are a bit abstract, so you might wonder: How is data made FAIR? Is it about writing code or what? The answer is not difficult, but it takes a while to explain, so let’s look at a couple of examples that will probably help you grasp all this.
□ Example 1: If you want your data to be Findable, the FAIR guiding principles recommend that, among other things, you give your data a globally unique and persistent identifier, that is, a sort of ID code that will belong to your data -and your data only- forever. We’ll get to that later, don’t worry.
□ Example 2: If you want your data to be Reusable, the FAIR guiding principles recommend that, among other things, your data and metadata be released with a clear and accessible data usage license.
‖ There are a bunch of other criteria that you can apply to make your data FAIR. Find them and much more about the FAIR Guiding Principles here and here.
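‖ To make the two examples above a little more concrete, here is a minimal sketch in Python of what "checking" a metadata record against them could look like. The field names ("identifier", "license") and the record itself are assumptions made up for illustration; the FAIR guiding principles do not prescribe these keys, and a real FAIR assessment covers many more criteria.

```python
# Hypothetical field names chosen for this sketch; FAIR does not mandate them.
REQUIRED_FIELDS = {
    "identifier": "Findable: a globally unique, persistent identifier",
    "license": "Reusable: a clear and accessible data usage license",
}


def missing_fair_fields(record: dict) -> list:
    """Return the names of the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative record: it has an identifier but no usage license yet.
record = {
    "identifier": "https://doi.org/10.15468/w6ubjx",
    "title": "Mollusc collection dataset",
}

for name in missing_fair_fields(record):
    print(f"Missing '{name}' ({REQUIRED_FIELDS[name]})")
```

So no, making data FAIR is not mainly about writing code: it is about agreeing on what information must travel with the data. Code like this can only check that the agreement is being followed.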
Understanding the Digital Object happy family
Digital Specimens (DS) and Digital Collections (DC) are both specific types of Digital Objects (DO). Simple as that.
‖ Ok, in the case of DiSSCo, we should rather say that they are specific types of FAIR Digital Objects, given DiSSCo’s alignment to the FAIR principles. DS and DC are not the only types of digital objects, of course, just the two that are more closely related to NSCs.
‖ “FAIR Digital Object” and “Digital Specimen” are the concepts that you will probably hear more often, so let’s give each of them a paragraph.
‖ In essence, a FAIR Digital Object (FDO) is a digital object that follows the FAIR principles. If a Digital Object is a sequence (or sequences) of bits “structured in a way that is interpretable by one or more of the computational facilities and having as an essential element an associated unique persistent identifier” (DONA Foundation), then an FDO is the same, only FAIR.
‖ And why do FDOs make sense? At the end of the day, FAIR was not part of the original DO idea, right? Well, it turns out that there are fundamental elements in the nature of DOs that make them compatible with the FAIR principles. In fact, DOs make FAIR implementation with other systems possible in a much more granular and interoperable fashion. So they do make sense.
‖ As stated above, a Digital Specimen is a specific type of digital object. You will usually find it described in DiSSCo documents as a “surrogate” or a “digital twin” of a physical specimen. You can have a specimen of a butterfly in your hand and its digital twin on the screen of your computer.
‖ But the butterfly on the screen is not just a visual representation of the one in your hand. That digital image is just the “cover photo” of an online package that brings together FAIR data from different sources (taxonomic, genomic, biochemical, you name it), all referring to the same physical specimen.
‖ Besides -and this is the best bit- this online package that contains all the data related to the butterfly is not static in the way that the information written on the tag of a physical specimen is. Instead, the data anchored by the digital specimen is dynamic, actionable. In other words: you can work on it and transform it (e.g. by annotating it or applying DiSSCo services to it). Remember the mantra: “DiSSCo aims at turning static records about specimens into dynamic, actionable objects that will evolve with science itself”? Here it is.
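‖ If you think in code, the idea of a "package" that anchors data and can be acted upon might look roughly like the toy model below. This is only a sketch under loud assumptions: the class, its fields, the identifiers and the method names are all invented for illustration and are not DiSSCo's actual data model or the openDS specification.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalSpecimen:
    """Toy model of a digital specimen (hypothetical, NOT the openDS spec)."""

    pid: str                    # placeholder identifier, made up for this sketch
    physical_specimen_id: str   # catalogue ID of the specimen in the collection
    linked_data: dict = field(default_factory=dict)   # kind of data -> reference
    annotations: list = field(default_factory=list)   # notes added over time

    def link(self, kind: str, reference: str) -> None:
        """Attach a reference to data from another source (taxonomic, genomic...)."""
        self.linked_data[kind] = reference

    def annotate(self, author: str, note: str) -> None:
        """Record an annotation: this is what makes the object 'dynamic'."""
        self.annotations.append({"author": author, "note": note})


# All values below are invented examples.
butterfly = DigitalSpecimen(pid="example-pid-0001", physical_specimen_id="COLL-12345")
butterfly.link("taxonomic", "example-taxon-record")
butterfly.link("genomic", "example-sequence-record")
butterfly.annotate("a.researcher", "Wing pattern matches the original description.")

print(sorted(butterfly.linked_data))
print(len(butterfly.annotations))
```

The paper tag on the physical butterfly stays the same; the object above keeps growing as people link data to it and annotate it.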
IDENTIFYING DIGITAL OBJECTS (or "Be yourself" for NSC specimens)
‖ A bit of History: As NSCs started implementing mass digitisation programmes and mobilising their data for others to use, some changes became more and more pressing. The way of referencing specimens was one of them. Sure, each specimen in a collection normally has its own catalogue ID that is unique within that collection, but the moment collections start working with other collections, there are potential problems. For example, if a specimen in your botanical garden happens to share the same reference number with a totally unrelated specimen in a museum of geology, that might lead to confusion, so no bueno.
‖ A DOI is an alphanumeric code that looks like this:
10.prefix/suffix
‖ For example, if you type this in your browser...
https://doi.org/10.15468/w6ubjx
... You will visit the Royal Belgian Institute of Natural Sciences Mollusc collection dataset, accessed through GBIF, and this specific PID will never point at any other object, only this particular mollusc dataset. It will never change even though the content or the metadata related to the object might be altered in the future.
‖ Not only that: think about it and you will realise that PIDs also contribute to making data more FAIR, or at least more “FA”, because they make data easier to Find and Access (see the FAIR principles above).
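‖ The anatomy of a DOI described above can be sketched in a few lines of Python. The function and class names are hypothetical helpers written for this page, not part of any official DOI library. One detail: in DOI usage the "prefix" includes the leading "10.", so for the mollusc dataset the prefix is 10.15468 and the suffix is w6ubjx.

```python
from typing import NamedTuple

RESOLVER = "https://doi.org/"  # the resolver used in the example above


class Doi(NamedTuple):
    prefix: str  # includes the leading "10.", e.g. "10.15468"
    suffix: str  # everything after the first "/", e.g. "w6ubjx"


def parse_doi(text: str) -> Doi:
    """Split a DOI (bare, or written as a doi.org link) into prefix and suffix."""
    doi = text.strip()
    if doi.lower().startswith(RESOLVER):
        doi = doi[len(RESOLVER):]
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI of the form 10.prefix/suffix: {text!r}")
    return Doi(prefix, suffix)


def resolver_url(doi: Doi) -> str:
    """Build the link that a browser can follow to reach the object."""
    return f"{RESOLVER}{doi.prefix}/{doi.suffix}"


doi = parse_doi("https://doi.org/10.15468/w6ubjx")
print(doi.prefix)
print(doi.suffix)
print(resolver_url(doi))
```

Note that the code never needs to know where the dataset is actually stored: the identifier stays the same and the resolver takes care of finding the object.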
‖ Now that you know what a PID is and why PIDs matter, you should know that DiSSCo and other international scientific infrastructures are working to create a DOI specifically for the concept of Digital Specimen. In the same way that a DS brings together all relevant data about a specimen, this "expanded" DOI is meant to bring together the relevant PIDs of all the data related to a digital specimen. A PID for connecting PIDs, so to speak.
Ꙭ Wanna know more? Then go here or here. Our colleagues from DiSSCo's technical team will be glad to give you more details.
DESCRIBING DIGITAL OBJECTS (or "What's in the basket?")
‖ We have just seen that NSCs realised the need for finding a way to reference specimens so that each of them had a unique, persistent identifier accepted by the scientific community. Well, something similar happens with the way NSCs record, describe and exchange their data. NSCs need their own sort of "Esperanto" (only this time working!) that all can understand.
‖ The answer to that need was the development of the data exchange standards, basically a set of rules agreed upon by the scientific community so that everyone records, describes and exchanges data the same way. The main data standards for collections are Darwin Core and ABCD (Access to Biological Collection Data, which turns into ABCDEFG when Extended For Geosciences). For digitisation status, we use MIDS (Minimum Information about a Digital Specimen).
‖ Standards do a great job for supporting exchange and integration across data structures, but DiSSCo wants to take one further step…
‖ This further step, meant to harmonise DiSSCo’s universe, goes by the name of the open Digital Specimen (openDS) data specification. Put really simply: openDS explains what the structure and content of a digital specimen should be, which operations can act upon it and, generally, how to handle and transfer it. The ultimate goal: making the most of the digital transformation of NSCs that DiSSCo will bring about.
Ꙭ Wanna know more? Let our own Alex Hardisty give you details here.
USING DIGITAL OBJECTS (or "Stronger together...!")
THE DIGITAL EXTENDED SPECIMEN
‖ Is there much difference between the European and the American approaches? Generally speaking, you might say that the European approach takes the perspective of the NSCs (at the end of the day, DiSSCo is all about collections) while the American one sides rather with the view of the researcher, but all in all, the Digital Specimen and the Extended Specimen have the potential to converge. It is not for nothing that, following TDWG 2020, more than 35 organisations worldwide and many individuals decided to work collaboratively towards a global specification and interoperability for the Digital Specimen and Extended Specimen concepts. Guess what new term they came up with to unite both:
Yeah...! The Digital Extended Specimen (DES).
□ The information in a DES is richer and denser than what a physical specimen alone can offer, and that will ultimately result in more reliable science.
□ As DOs, DESes will make possible a wide range of practices that were little less than unthinkable with physical specimens (think simulation and prediction capabilities, for example).
□ Co-analysis across scientific disciplines will be made possible in an unprecedented way.
‖ As you know, we continue to work on developing many of the areas explained above, from that new expanded DOI to identify Digital Specimens, to the openDS specification or that brand-new concept of Digital Extended Specimen.
‖ We will continue uploading content about DiSSCo as we reach new milestones. In the meantime, please do not hesitate to contact us if you need further information.
We need your feedback
Please follow this link and give us some feedback. It will take two minutes of your time. Thanks!