Linking schema.org and instruments #225

Open
huberrob opened this issue Jun 22, 2022 · 17 comments

@huberrob

Dear all,

I have seen that the ESIP group has already discussed ways to link samples and instruments, so you might be interested to know that the RDA PIDINST group is now trying to identify a way to include references to instruments/sensors in schema.org/Dataset here: https://github.com/rdawg-pidinst/usage

Robert

@rebeccaringuette

The first option in that list seems difficult and non-intuitive. The second option, linking through schema:measurementTechnique, could work, but doesn't allow us to include a link and description with the name of the instrument.
One thing we found in DataCite for this is the "IsCollectedBy" relation type under datacite:relatedIdentifier and datacite:relatedItem. However, I don't see an equivalent term in PROV-O.
Would it be a good solution to use the same method for linking instruments as for linking observing networks and missions?
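For comparison, the DataCite route mentioned above might look roughly like this in the DataCite REST API JSON format. This is a sketch only: both DOIs (`10.xxxx/example-dataset`, `10.xxxx/example-instrument`) are hypothetical placeholders, and only the part of the record relevant to the instrument link is shown.

```json
{
  "data": {
    "type": "dois",
    "attributes": {
      "doi": "10.xxxx/example-dataset",
      "relatedIdentifiers": [
        {
          "relatedIdentifier": "10.xxxx/example-instrument",
          "relatedIdentifierType": "DOI",
          "relationType": "IsCollectedBy"
        }
      ]
    }
  }
}
```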

@dr-shorthair (Collaborator)

Do you mean 'the instrument used to create the sample' or at some other point in the lifecycle?

Note that SSN/SOSA has a fairly solid framework for this - see https://w3c.github.io/sdw-sosa-ssn/ssn/#Samplings-overview

@huberrob (Author)

huberrob commented Mar 3, 2025

Great to see you discovered this thread!
The idea is to use schema.org to indicate the instrument which was used to measure the values listed in a distinct dataset.
This probably won't work in a puristic, schema.org-only way. So using additional types from SSN/SOSA might be an option but would someone really expect this within a schema.org/Dataset record/graph?

@rebeccaringuette

In heliophysics and related fields, we rarely have samples, just data from in-situ or remote observations. In our case, I am talking about the instrument used to observe a phenomenon and produce the data. At this point, a detailed implementation like the one linked is too much. I am just interested in including the basic information needed to link the instrument to the dataset it produced, in a way that is aligned with what others are doing.
We are interested in including the instrument link, name, and description. Later, we will work towards adding DOIs to this information to improve the linking quality, since websites go out of date. Is that possible with the schema:measurementTechnique method, or is something else needed?
We are also interested in doing the same for missions/observatories, linking them to the dataset records they produce.
Both types of these linkages will help assess the impact of the given instrument and mission/observatory on science in the community and how the resources we host are being used in research.

@huberrob (Author)

huberrob commented Mar 4, 2025

@rebeccaringuette, this is exactly the use case PIDINST was designed for, and we would like to find a way to expose this information using schema.org.

@Kurokio

Kurokio commented Mar 4, 2025

To follow up on @rebeccaringuette's two ideas, here is an example of the second option that uses "measurementTechnique" to link to the instrument.

```json
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@id": "https://doi.org/10.concept/doi",
  "@type": "Dataset",
  ....
  "measurementTechnique": [
    {"@type": "DefinedTerm",
     "identifier": "spase://SMWG/Instrument/MMS/4/FIELDS/FGM",
     "name": "MMS 4 FIELDS Suite, Fluxgate Magnetometer (FGM) Instrument",
     "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html"},
    {"@type": "DefinedTerm",
     "identifier": "spase://SMWG/Instrument/MMS/4/HotPlasmaCompositionAnalyzer",
     "name": "MMS 4 Hot Plasma Composition Analyzer (HPCA) Instrument",
     "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html"}
  ]
  ....
}
```

Note that this example passes the schema.org validator. It also lets us map the instrument name, URL, and identifier, since DefinedTerm inherits these properties from the "Thing" type. Let me know what you all think. Thanks!

@dr-shorthair (Collaborator)

> The idea is to use schema.org to indicate the instrument which was used to measure the values listed in a distinct dataset. This probably won't work in a puristic, schema.org-only way. So using additional types from SSN/SOSA might be an option but would someone really expect this within a schema.org/Dataset record/graph?

In SSN/SOSA we distinguish Sensor, which is used in observations, from Sampler, which is used in samplings.
(Actuator is used in actuations.)
These are all sub-types of sosa:System.
However, apart from the name of the type/class, the content model for these is pretty thin!

I note that the range of schema:instrument is schema:Thing, which is totally generic.
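To make the Sensor/observation distinction concrete, here is a minimal SOSA sketch of an observation made by a sensor. The observation IRI `https://example.org/observation/fgm-1` is hypothetical, and the instrument IRI reuses the hpde.io form that appears elsewhere in this thread. The Sampler side works the same way, using sosa:Sampling and sosa:madeBySampler.

```json
{
  "@context": {
    "sosa": "http://www.w3.org/ns/sosa/"
  },
  "@graph": [
    {
      "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
      "@type": "sosa:Sensor"
    },
    {
      "@id": "https://example.org/observation/fgm-1",
      "@type": "sosa:Observation",
      "sosa:madeBySensor": {
        "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM"
      }
    }
  ]
}
```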

@huberrob (Author)

huberrob commented Mar 5, 2025


This would work well for instrument types, but maybe not so well for instrument instances? Also, sdo:measurementTechnique does not necessarily refer to an instrument, so it might be hard for (machine) clients to tell whether an sdo:DefinedTerm describes an instrument or some other method. But maybe one could additionally use sdo:additionalType to indicate, e.g., sosa:Sensor or similar?
By the way, spase:// identifiers may be hard to resolve for clients that are not aware of this identifier type. That is of course fine when they are given as sdo:identifier, but in @id (which is missing in the example) the URL representation should probably be used (https://spase.info/etc..)?
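The two suggestions above could be combined roughly as follows. This is only a sketch: it assumes that the hpde.io URL form used later in this thread is an acceptable resolvable equivalent of the spase:// ID, keeps the spase:// value in identifier, and gives additionalType the full sosa:Sensor IRI, since schema.org expects a URL or text there.

```json
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@id": "https://doi.org/10.concept/doi",
  "@type": "Dataset",
  "measurementTechnique": [
    {
      "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
      "@type": "DefinedTerm",
      "additionalType": "http://www.w3.org/ns/sosa/Sensor",
      "identifier": "spase://SMWG/Instrument/MMS/4/FIELDS/FGM",
      "name": "MMS 4 FIELDS Suite, Fluxgate Magnetometer (FGM) Instrument",
      "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html"
    }
  ]
}
```

A client could then pick out the instrument entries by checking additionalType instead of guessing from the name.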

@huberrob (Author)

huberrob commented Mar 5, 2025

schema:instrument

Using schema:instrument would be great, but it would require creating an sdo:Action (e.g. sdo:CreateAction) that refers to the created dataset via sdo:result, which seems too complicated to me.

...something like:

```json
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@graph": [
    {
      "@id": "https://doi.org/10.concept/doi",
      "@type": "Dataset"
      ...
    },
    {
      "@id": "someactionuri",
      "@type": "CreateAction",
      "instrument": {"@id": "someinstrumenturi"},
      "result": {"@id": "https://doi.org/10.concept/doi"}
    }
  ]
}
```

But it would be far easier if sdo:instrument could be used within sdo:measurementTechnique. @danbri?
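If the Dataset should remain the root node, JSON-LD's `@reverse` keyword can express the same CreateAction link from the Dataset's side: the nested action is the subject of schema:result, pointing back at the Dataset. This is a sketch under assumptions: the action IRI `https://example.org/actions/mms4-fgm-processing` is hypothetical, the instrument IRI is the hpde.io form used elsewhere in this thread, and it has not been checked how the schema.org validator or search engines treat `@reverse`.

```json
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@id": "https://doi.org/10.concept/doi",
  "@type": "Dataset",
  "@reverse": {
    "result": {
      "@id": "https://example.org/actions/mms4-fgm-processing",
      "@type": "CreateAction",
      "instrument": {
        "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM"
      }
    }
  }
}
```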

@rebeccaringuette

We are looking for something simple to link the instrument into the dataset record, preferably aligned with what is already common practice. Thanks for all the discussion; we are interested to see where this goes.

@rebeccaringuette

@huberrob
"Btw. spase:// identifiers maybe hard to resolve for clients which are not aware of this identifier type - which is of course OK if listed as sdo:identifier but in @id (which is missing in the example) probably the URL representation should be used (https://spase.info/etc..)?"
Is it acceptable practice to use DOIs in the "@id" field when available, or should it always be the URL?

Hopefully @danbri can get back to us soon on the instrument question.
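On the @id question, one pattern (a sketch, not something agreed in this thread) is to use the https://doi.org/ resolver URL of the DOI as @id, so the value is both a DOI and a dereferenceable URL, and to repeat the bare DOI in identifier. The instrument DOI `10.xxxx/example-instrument` is hypothetical; the remaining values come from the examples in this thread.

```json
{
  "@context": {
    "@vocab": "https://schema.org/"
  },
  "@id": "https://doi.org/10.xxxx/example-instrument",
  "@type": "IndividualProduct",
  "name": "MMS 4 FIELDS Suite, Fluxgate Magnetometer (FGM) Instrument",
  "identifier": [
    {
      "@type": "PropertyValue",
      "propertyID": "DOI",
      "value": "10.xxxx/example-instrument"
    },
    "spase://SMWG/Instrument/MMS/4/FIELDS/FGM"
  ],
  "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html"
}
```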

@rebeccaringuette

What about using an approach similar to that being discussed in issue 258? If I (perhaps badly) incorporate sosa into that approach, I think it becomes:

```json
{
   "@context": [
      "https://schema.org/",
      {
         "prov": "http://www.w3.org/ns/prov#",
         "sosa": "http://www.w3.org/ns/sosa/"
      }
   ],
   "@id": "https://doi.org/10.concept/doi",
   "@type": "Dataset",
   ...
   "prov:wasGeneratedBy": [
      {
         "@type": ["ResearchProject", "prov:Activity"],
         "prov:used": {
             "@type": ["Instrument", "prov:Entity", "sosa:sensor"],
             "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
             "name": "MMS 4 FIELDS Suite, Fluxgate Magnetometer (FGM) Instrument"
         },
         "identifier": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
         "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html",
         "name": "MMS 4 FIELDS Suite, Fluxgate Magnetometer (FGM) Instrument"
      },
      ...
   ],
   ...
}
```

What are the thoughts about this approach for instruments? It would be useful to have the observatories and instruments represented with similar approaches for programmatic use.

@rebeccaringuette

rebeccaringuette commented Mar 11, 2025

This validates in schema.org and uses the sosa sensor term. Thoughts?
@Kurokio helped with this.

```json
"prov:wasGeneratedBy": [
   {
      "@type": ["ResearchProject", "prov:Activity"],
      "prov:used": {
         "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
         "@type": ["IndividualProduct", "prov:Entity", "sosa:sensor"],
         "name": "MMS 4 FIELDS Suite, Fluxgate Magnetometer (FGM) Instrument",
         "identifier": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
         "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html"
      }
   },
   {
      "@type": ["ResearchProject", "prov:Activity"],
      "prov:used": {
         "@id": "https://hpde.io/SMWG/Instrument/MMS/4/HotPlasmaCompositionAnalyzer",
         "@type": ["IndividualProduct", "prov:Entity", "sosa:sensor"],
         "name": "MMS 4 Hot Plasma Composition Analyzer (HPCA) Instrument",
         "identifier": "https://hpde.io/SMWG/Instrument/MMS/4/HotPlasmaCompositionAnalyzer",
         "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html"
      }
   }
]
```

@dr-shorthair (Collaborator)

URIs are case-sensitive, so the correct form is sosa:Sensor; see https://www.w3.org/TR/vocab-ssn/#SOSASensor

Note that the draft update has more examples and diagrams:
https://w3c.github.io/sdw-sosa-ssn/ssn/#Systems-and-their-Deployment-overview
https://w3c.github.io/sdw-sosa-ssn/ssn/#Observations-overview
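Applying that correction, and declaring the sosa and prov prefixes so that the compact terms expand to the intended namespaces (http://www.w3.org/ns/sosa/ and http://www.w3.org/ns/prov#), the first entry of the earlier snippet might read as follows. This is a sketch: it keeps the ResearchProject/IndividualProduct typing proposed in the thread and has not been re-checked against the schema.org validator.

```json
{
  "@context": [
    "https://schema.org/",
    {
      "prov": "http://www.w3.org/ns/prov#",
      "sosa": "http://www.w3.org/ns/sosa/"
    }
  ],
  "@id": "https://doi.org/10.concept/doi",
  "@type": "Dataset",
  "prov:wasGeneratedBy": [
    {
      "@type": ["ResearchProject", "prov:Activity"],
      "prov:used": {
        "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
        "@type": ["IndividualProduct", "prov:Entity", "sosa:Sensor"],
        "name": "MMS 4 FIELDS Suite, Fluxgate Magnetometer (FGM) Instrument",
        "identifier": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM",
        "url": "https://www.nasa.gov/mission_pages/mms/spacecraft/mms-instruments.html"
      }
    }
  ]
}
```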

@huberrob (Author)

@rebeccaringuette I would rather expect this in a DCAT graph. I fear that in a schema.org graph, prov:wasGeneratedBy should not be used unless your schema:Dataset instance is also an instance of dcat:Dataset.
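One way to act on this, sketched here as an assumption rather than an agreed pattern, is to type the node as both schema:Dataset and dcat:Dataset, declaring the dcat and prov namespaces (http://www.w3.org/ns/dcat# and http://www.w3.org/ns/prov#). Whether schema.org-oriented consumers make any use of the PROV part is not established here.

```json
{
  "@context": [
    "https://schema.org/",
    {
      "dcat": "http://www.w3.org/ns/dcat#",
      "prov": "http://www.w3.org/ns/prov#"
    }
  ],
  "@id": "https://doi.org/10.concept/doi",
  "@type": ["Dataset", "dcat:Dataset"],
  "prov:wasGeneratedBy": {
    "@type": "prov:Activity",
    "prov:used": {
      "@id": "https://hpde.io/SMWG/Instrument/MMS/4/FIELDS/FGM"
    }
  }
}
```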

@danbri

danbri commented Mar 12, 2025

Most things schema considers a Dataset are going to be considered a dataset by DCAT too. Earlier DCAT specs implied that datasets only lived in organized repositories, but perhaps that has been softened by now?

@rebeccaringuette

> URIs are case-sensitive, so the correct form is sosa:Sensor; see https://www.w3.org/TR/vocab-ssn/#SOSASensor
>
> Note that the draft update has more examples and diagrams: https://w3c.github.io/sdw-sosa-ssn/ssn/#Systems-and-their-Deployment-overview https://w3c.github.io/sdw-sosa-ssn/ssn/#Observations-overview

If that is the only error, then that is easy to correct. Thanks!
I don't think it is useful to compare definitions of datasets here. I am, however, interested in comments on how to improve the code snippet I posted. We are working with Colin Smith to incorporate this into the soso software package and would like it to reflect the knowledge of this group.


6 participants