I was comparing the outputs (mostly the JSON-LD) of credreg for CTDL, CTDL-ASN, and QData, and I noticed that some terms show different information depending on which one of those JSON-LD files you're looking at. This is usually caused by the way the schema manager currently handles "borrowing" terms from one schema to show in the context of another schema.
A term is "borrowed" so that it shows up in a schema other than its own (e.g., so that the QData terms page has a table for ceterms:name), and by default it is borrowed as-is. Borrowing works by adding an entry for the term to another schema's list of terms and setting that entry's "BorrowedFromSchemaId" value to the ID of the original schema. If the entry contains only the term's URI and the BorrowedFromSchemaId value, every other attribute is read from the original schema's entry for that URI.
However, the schema manager also lets the borrowing schema overwrite one or more attributes of the term within its own context (so that, for example, ceterms:name can have a different domain depending on whether you're looking at the CTDL terms page or the QData terms page). There may once have been a use case for that, but I don't think there is one anymore. In other words, the schema manager is doing what it was built to do; it's the data that needs correcting.
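To make the borrowing and overwriting behavior concrete, here is a minimal sketch of how a borrowed entry might resolve. It assumes a hypothetical in-memory model: only `URI` and `BorrowedFromSchemaId` come from the description above, while `Definition`, `Domain`, `SCHEMAS`, `resolve_term`, and all the values are illustrative assumptions, not the schema manager's actual data format or code. It also assumes overwrites apply per attribute.

```python
from typing import Any

# Hypothetical in-memory model of two schemas' term lists (assumption).
# "URI" and "BorrowedFromSchemaId" come from the description; "Definition"
# and "Domain" are assumed attribute names, and the values are illustrative.
SCHEMAS: dict[str, dict[str, dict[str, Any]]] = {
    "ctdl": {
        "ceterms:name": {
            "URI": "ceterms:name",
            "Definition": "Name or title of the resource.",
            "Domain": ["ceterms:Credential", "ceterms:CredentialOrganization"],
        },
    },
    "qdata": {
        # Borrowed entry that ALSO overwrites "Domain".
        "ceterms:name": {
            "URI": "ceterms:name",
            "BorrowedFromSchemaId": "ctdl",
            "Domain": ["qdata:DataSetProfile"],
        },
    },
}


def resolve_term(schema_id: str, uri: str) -> dict[str, Any]:
    """Return the term as it would appear on schema_id's terms page."""
    entry = SCHEMAS[schema_id][uri]
    source_id = entry.get("BorrowedFromSchemaId")
    if source_id is None:
        return dict(entry)  # this schema owns the term
    # Start from the original schema's entry...
    resolved = dict(SCHEMAS[source_id][uri])
    # ...then let every attribute on the borrowed entry replace the original's.
    for key, value in entry.items():
        if key != "BorrowedFromSchemaId":
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    print(resolve_term("ctdl", "ceterms:name")["Domain"])
    print(resolve_term("qdata", "ceterms:name")["Domain"])
```

Under these assumptions, both pages share the same definition, but the QData page shows a different domain for ceterms:name, which is the kind of discrepancy described here.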
Borrowing exists mostly for the user's convenience, so that they don't have to click between multiple terms pages as often. It was introduced when very few terms were borrowed and keeping them in sync was manageable. Now that borrowing is common practice, it might be better to remove borrowed entries entirely once the discrepancies listed below are fixed: consumers would have to navigate between pages more, but the output would be less confusing.
Regardless of whether you remove borrowed entries or not, you'll need to fix the discrepancies listed in the table below. The process/solution is basically the same for each of them:
- Ensure the canonical representation of the term in the appropriate schema[1] has the right data.
- Check the term history in each schema's representation of the term, in case an overwrite made in a non-original schema also created a history entry there.
- Remove any overwriting data from the representation of that term in other schemas.
- Apply the same fix to both the current release and the pending release, so that the problem doesn't reappear when the pending release becomes current.
[1] Since we don't have (or use) a standalone representation of schema.org terms, the solutions below assume that the CTDL schema (in the schema manager sense, not the RDF sense) fills that role. The schema manager does support adding such a representation, but that is probably more complexity than is needed here.
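The "remove any overwriting data" and "apply to both releases" steps could be scripted roughly as follows. This is a sketch under assumptions: it treats a term list as a plain dict keyed by URI, treats every key other than `URI` and `BorrowedFromSchemaId` as an overwrite, and `strip_overwrites` plus the release layout are hypothetical names, not part of the schema manager. It reports what it removed so the term history can still be checked by hand.

```python
from typing import Any

# Keys a "pure" borrowed entry keeps; everything else is treated as an
# overwrite (assumption based on the description of borrowed entries).
ALLOWED_KEYS = {"URI", "BorrowedFromSchemaId"}


def strip_overwrites(terms: dict[str, dict[str, Any]]) -> dict[str, list[str]]:
    """Remove overwriting attributes from every borrowed entry, in place.

    Returns {term URI: [removed attribute names]} so the removals can be
    reviewed (and the term history checked) before anything is saved.
    """
    removed: dict[str, list[str]] = {}
    for uri, entry in terms.items():
        if "BorrowedFromSchemaId" not in entry:
            continue  # this schema owns the term; leave it alone
        extra = sorted(key for key in entry if key not in ALLOWED_KEYS)
        for key in extra:
            del entry[key]
        if extra:
            removed[uri] = extra
    return removed


if __name__ == "__main__":
    # Hypothetical data: the same borrowed term exists in both releases,
    # so the fix has to run against each of them.
    releases = {
        release: {
            "qdata": {
                "ceterms:endDate": {
                    "URI": "ceterms:endDate",
                    "BorrowedFromSchemaId": "ctdl",
                    "Domain": ["qdata:DataSetProfile"],
                },
            },
        }
        for release in ("current", "pending")
    }
    for release, schemas in releases.items():
        for schema_id, terms in schemas.items():
            print(release, schema_id, strip_overwrites(terms))
```

Returning the removed attributes instead of silently deleting them is a deliberate choice: it keeps the "check the term history" step possible for each affected term.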
These are the issues I found when comparing the 3 JSON-LD documents linked above:
| # | Issue | Fix | JSON-LD Term Links | History Links |
|---|---|---|---|---|
| 1 | QData overwrites the domain of ceterms:alternateName | Add any valid domains to the CTDL representation of ceterms:alternateName and remove the entire domain value from the QData representation of ceterms:alternateName | CTDL, QData | CTDL, CTDLASN, QData |
| 2 | QData overwrites the domain of ceterms:name | Add any valid domains to the CTDL representation of ceterms:name and remove the entire domain value from the QData representation of ceterms:name | CTDL, QData | CTDL, CTDLASN, QData |
| 3 | QData overwrites the domain of ceterms:description (and has an invalid value, "qdata:EarningsProfile") | Add any valid domains to the CTDL representation of ceterms:description and remove the entire domain value from the QData representation of ceterms:description | CTDL, QData | CTDL, CTDLASN, QData |
| 4 | QData overwrites the domain of ceterms:endDate | Add any valid domains to the CTDL representation of ceterms:endDate and remove the entire domain value from the QData representation of ceterms:endDate | CTDL, QData | CTDL, CTDLASN, QData |
| 5 | QData overwrites the domain of ceterms:startDate | Add any valid domains to the CTDL representation of ceterms:startDate and remove the entire domain value from the QData representation of ceterms:startDate | CTDL, QData | CTDL, CTDLASN, QData |
| 6 | QData overwrites the domain of ceterms:jurisdiction | Add any valid domains to the CTDL representation of ceterms:jurisdiction and remove the entire domain value from the QData representation of ceterms:jurisdiction | CTDL, QData | CTDL, CTDLASN, QData |
| 7 | QData overwrites the range of ceterms:processingAgent | Remove the entire range value from the QData representation of ceterms:processingAgent (the CTDL representation already points at the 3 organization classes, and ceterms:Agent shouldn't be used as a range since that would be inconsistent with other properties) | CTDL, QData | CTDL, CTDLASN, QData |
| 8 | CTDL-ASN overwrites the domain of ceterms:industryType | Add any valid domains to the CTDL representation of ceterms:industryType and remove the entire domain value from the CTDL-ASN representation of ceterms:industryType | CTDL, CTDLASN | CTDL, CTDLASN, QData |
| 9 | CTDL-ASN overwrites the domain of ceterms:occupationType | Add any valid domains to the CTDL representation of ceterms:occupationType and remove the entire domain value from the CTDL-ASN representation of ceterms:occupationType | CTDL, CTDLASN | CTDL, CTDLASN, QData |
| 10 | Both CTDL-ASN and QData overwrite the domain of ceterms:instructionalProgramType | Add any valid domains to the CTDL representation of ceterms:instructionalProgramType and remove the entire domain value from the CTDL-ASN and QData representations of ceterms:instructionalProgramType | CTDL, CTDLASN, QData | CTDL, CTDLASN, QData |
| 11 | The definition and usage note for schema:QuantitativeValue are mismatched between CTDL and QData | Determine what the correct definition and usage note (if any) should be, update them in the CTDL representation, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 12 | The domain for schema:description is mismatched between CTDL and QData | Determine the correct domain, update it in the CTDL representation, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | Not available (conflicts with the URL for ceterms:description); find it in the full schema JSON-LD files linked above | CTDL, CTDLASN, QData |
| 13 | The usage note and domain for schema:maxValue are mismatched between CTDL and QData | Determine what the correct values for the domain and usage note (if applicable) should be, update the CTDL representation to match, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 14 | The usage note and domain for schema:minValue are mismatched between CTDL and QData | Determine what the correct values for the domain and usage note (if applicable) should be, update the CTDL representation to match, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 15 | The usage note and domain for schema:value are mismatched between CTDL and QData | Determine what the correct values for the domain and usage note (if applicable) should be, update the CTDL representation to match, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 16 | The definition and range for schema:unitText are mismatched between CTDL and QData | Determine what the correct values for the definition and range should be, update the CTDL representation to match, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. See also this comment/issue. | CTDL, QData | CTDL, CTDLASN, QData |
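Discrepancies like the ones in this table can be found by loading each schema's JSON-LD `@graph`, grouping nodes by `@id`, and comparing selected properties across schemas. The sketch below does that for in-memory graphs. The property names in `COMPARED_KEYS` are assumptions about how the published files encode domain, range, definition, and usage note, and `find_discrepancies` and `load_graph` are hypothetical helpers; adjust the keys to match the actual files.

```python
import json
from collections import defaultdict
from typing import Any

# Assumed property names for domain, range, definition, and usage note.
COMPARED_KEYS = (
    "schema:domainIncludes",
    "schema:rangeIncludes",
    "rdfs:comment",
    "vann:usageNote",
)


def load_graph(path: str) -> list[dict[str, Any]]:
    """Read the "@graph" array from a JSON-LD file (layout assumed)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["@graph"]


def normalize(value: Any) -> tuple[str, ...]:
    """Order-insensitive form; a single value equals a one-item list."""
    if value is None:
        return ()
    items = value if isinstance(value, list) else [value]
    return tuple(sorted(json.dumps(item, sort_keys=True) for item in items))


def find_discrepancies(
    graphs: dict[str, list[dict[str, Any]]],
) -> dict[str, dict[str, dict[str, tuple[str, ...]]]]:
    """Return {term @id: {property: {schema: value}}} for terms that appear
    in more than one schema with differing values for a compared property."""
    by_term: dict[str, dict[str, dict[str, Any]]] = defaultdict(dict)
    for schema, graph in graphs.items():
        for node in graph:
            if "@id" in node:
                by_term[node["@id"]][schema] = node
    report = {}
    for term_id, nodes in by_term.items():
        if len(nodes) < 2:
            continue  # only one schema shows this term
        diffs = {}
        for key in COMPARED_KEYS:
            values = {s: normalize(n.get(key)) for s, n in nodes.items()}
            if len(set(values.values())) > 1:
                diffs[key] = values
        if diffs:
            report[term_id] = diffs
    return report
```

With real data, the `graphs` argument would come from something like `{"CTDL": load_graph(...), "CTDL-ASN": load_graph(...), "QData": load_graph(...)}`. A missing property counts as different from a present one, so a borrowed entry with an overwritten domain and an entry with no domain at all both get reported.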