Discrepancies in the data for some terms in the context of different schema representations #985

Description

@siuc-nate

I was comparing the outputs (mostly the JSON-LD) of credreg for CTDL, CTDL-ASN, and QData, and I noticed that some terms show different information depending on which one of those JSON-LD files you're looking at. This is usually caused by the way the schema manager currently handles "borrowing" terms from one schema to show in the context of another schema.
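The comparison described above can be roughly automated. The following is a minimal sketch, assuming each JSON-LD file has a top-level `@graph` array whose nodes carry an `@id`; the function names are hypothetical and not part of credreg or the schema manager.

```python
import json


def index_terms(doc):
    """Map each term's @id to its node (assumes a top-level @graph array)."""
    return {node["@id"]: node for node in doc.get("@graph", []) if "@id" in node}


def canonical(value):
    """Serialize a value so that list order and key order do not matter."""
    if isinstance(value, list):
        return json.dumps(sorted(json.dumps(v, sort_keys=True) for v in value))
    return json.dumps(value, sort_keys=True)


def find_discrepancies(docs):
    """docs maps a schema name to its parsed JSON-LD document.

    Returns {term_id: {key: {schema_name: value}}} for every key whose value
    differs between at least two documents that both contain the term.
    """
    indexes = {name: index_terms(doc) for name, doc in docs.items()}
    all_ids = set().union(*(idx.keys() for idx in indexes.values()))
    report = {}
    for term_id in sorted(all_ids):
        present = {name: idx[term_id] for name, idx in indexes.items() if term_id in idx}
        if len(present) < 2:
            continue  # the term appears in only one document, nothing to compare
        keys = set().union(*(node.keys() for node in present.values()))
        for key in sorted(keys):
            values = {name: node.get(key) for name, node in present.items()}
            if len({canonical(v) for v in values.values()}) > 1:
                report.setdefault(term_id, {})[key] = values
    return report
```

Each document would be loaded with `json.load` and passed in under a label such as "CTDL" or "QData". A key that is missing from one document is reported as `None` for that document.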

A term is "borrowed" so that it will show up in a schema other than its own (e.g. so there will be a table for ceterms:name on the QData terms page), and by default it just gets borrowed as-is. This is done by creating an entry for a term from one schema in the list of terms for another schema and setting that entry's "BorrowedFromSchemaId" value to be the schema ID of the original schema. If the only values in that entry are the term's URI and the value for BorrowedFromSchemaId, then everything else will be read from the original schema's entry for that URI.
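As a rough illustration of that lookup, here is a minimal sketch. It assumes the schemas are held in memory as nested dictionaries; only `BorrowedFromSchemaId` comes from the schema manager, while the `URI`, `Definition`, and `Domain` keys, the schema IDs, and the function name are hypothetical.

```python
def resolve_term(uri, schema_id, schemas):
    """Return the effective attributes of a term as seen from one schema.

    schemas: {schema_id: {uri: entry}} (hypothetical in-memory layout).
    """
    entry = schemas[schema_id][uri]
    origin_id = entry.get("BorrowedFromSchemaId")
    if origin_id is None:
        # Not borrowed: the entry is the term's own representation.
        return dict(entry)
    # Borrowed: start from the original schema's entry for the same URI.
    resolved = dict(schemas[origin_id][uri])
    # Assumption: anything else set on the borrowed entry takes precedence.
    for key, value in entry.items():
        if key not in ("URI", "BorrowedFromSchemaId"):
            resolved[key] = value
    return resolved


# Illustrative data only; these values are made up.
schemas = {
    "ctdl": {"ceterms:name": {"URI": "ceterms:name",
                              "Definition": "Example definition.",
                              "Domain": ["ceterms:Credential"]}},
    "qdata": {"ceterms:name": {"URI": "ceterms:name",
                               "BorrowedFromSchemaId": "ctdl"}},
}
borrowed_view = resolve_term("ceterms:name", "qdata", schemas)
```

With an entry that holds only the URI and `BorrowedFromSchemaId`, the resolved view is identical to the original schema's entry.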

But the schema manager allows overwriting one or more attributes of a term within the context of that schema (so that, for example, ceterms:name could have a different domain depending on whether you were looking at the CTDL terms page or the QData terms page). That may have made sense at some point in the past (maybe there was a use case for it at the time), but I don't think it still does. In other words, the system is working as intended, and the data needs correction.
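One way to spot such overwrites is to list every attribute a borrowed entry sets beyond its bookkeeping fields. This is a minimal sketch, assuming entries are plain dictionaries; `BorrowedFromSchemaId` is the real field name, while `URI`, `Domain`, and the function name are hypothetical.

```python
# Fields a clean borrowed entry is expected to carry (the URI key name is assumed).
BOOKKEEPING_KEYS = {"URI", "BorrowedFromSchemaId"}


def find_overwrites(borrowed_entry, original_entry):
    """Return {attribute: (original_value, overwriting_value)}.

    Every attribute on the borrowed entry other than the bookkeeping keys
    counts as an overwrite, even if it happens to equal the original value.
    """
    return {
        key: (original_entry.get(key), value)
        for key, value in borrowed_entry.items()
        if key not in BOOKKEEPING_KEYS
    }
```

An empty result means the entry is borrowed as-is. Any non-empty result is a candidate for the cleanup described in this issue.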

Borrowing exists mostly for the convenience of the user, so that the user doesn't have to click around between multiple terms pages as much. This behavior was created back when there was minimal borrowing of terms and it was manageable to keep up with it. Since borrowing is such a common practice now, it might be better to remove borrowed entries completely after fixing the discrepancies listed below. That would mean more navigating on the consuming end, but it would also be less confusing.

Regardless of whether you remove borrowed entries or not, you'll need to fix the discrepancies listed in the table below. The process/solution is basically the same for each of them:

  1. Ensure the canonical representation of the term in the appropriate schema[1] has the correct data.
    • Check the term history from each schema's representation of that term, in case an overwrite made in a non-original schema left its history entry there instead of in the original schema.
  2. Remove any overwriting data from the representation of that term in other schemas.
  3. Ensure the same fix gets applied to both the current release and the pending release so that the problem doesn't reappear when the pending release becomes current.
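The steps above could be sketched roughly as follows. This is a minimal sketch, not the schema manager's actual API. It assumes each release is a nested dictionary of `{schema_id: {uri: entry}}`, that `BorrowedFromSchemaId` is the only real field name, and that every other name is hypothetical. The history check in step 1 is a manual review and is not modeled here.

```python
# Fields that remain on a borrowed entry once overwrites are removed.
KEEP_KEYS = ("URI", "BorrowedFromSchemaId")


def strip_overwrites(entry):
    """Return a copy of a borrowed entry with only its bookkeeping fields."""
    return {key: entry[key] for key in KEEP_KEYS if key in entry}


def fix_term(uri, canonical_values, origin_id, borrower_ids, releases):
    """Apply the three-step fix to one term.

    releases: {"current": schemas, "pending": schemas} (hypothetical layout).
    """
    # Step 3: repeat the same fix for every release.
    for schemas in releases.values():
        # Step 1: put the agreed-upon data on the canonical representation.
        schemas[origin_id][uri].update(canonical_values)
        # Step 2: reduce each other schema's entry to a plain borrow.
        for borrower_id in borrower_ids:
            stripped = strip_overwrites(schemas[borrower_id][uri])
            stripped["BorrowedFromSchemaId"] = origin_id
            schemas[borrower_id][uri] = stripped
```

Setting `BorrowedFromSchemaId` explicitly also covers entries that were never marked as borrowed, such as the schema.org terms discussed in the footnote.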

[1] Since we don't have/use a standalone representation of schema.org terms, the below solutions assume that the CTDL schema (in the schema manager sense, not in an RDF sense) would fill that role. The schema manager does support adding such a representation, but that is probably more complexity than is needed here.

These are the issues I found when comparing the 3 JSON-LD documents linked above:

| # | Issue | Fix | JSON-LD Term Links | History Links |
| --- | --- | --- | --- | --- |
| 1 | QData overwrites the domain of ceterms:alternateName | Add any valid domains to the CTDL representation of ceterms:alternateName and remove the entire domain value from the QData representation of ceterms:alternateName | CTDL, QData | CTDL, CTDLASN, QData |
| 2 | QData overwrites the domain of ceterms:name | Add any valid domains to the CTDL representation of ceterms:name and remove the entire domain value from the QData representation of ceterms:name | CTDL, QData | CTDL, CTDLASN, QData |
| 3 | QData overwrites the domain of ceterms:description (and has an invalid value, "qdata:EarningsProfile") | Add any valid domains to the CTDL representation of ceterms:description and remove the entire domain value from the QData representation of ceterms:description | CTDL, QData | CTDL, CTDLASN, QData |
| 4 | QData overwrites the domain of ceterms:endDate | Add any valid domains to the CTDL representation of ceterms:endDate and remove the entire domain value from the QData representation of ceterms:endDate | CTDL, QData | CTDL, CTDLASN, QData |
| 5 | QData overwrites the domain of ceterms:startDate | Add any valid domains to the CTDL representation of ceterms:startDate and remove the entire domain value from the QData representation of ceterms:startDate | CTDL, QData | CTDL, CTDLASN, QData |
| 6 | QData overwrites the domain of ceterms:jurisdiction | Add any valid domains to the CTDL representation of ceterms:jurisdiction and remove the entire domain value from the QData representation of ceterms:jurisdiction | CTDL, QData | CTDL, CTDLASN, QData |
| 7 | QData overwrites the range of ceterms:processingAgent | Remove the entire range value from the QData representation of ceterms:processingAgent (the CTDL representation already points at the 3 organization classes, and ceterms:Agent shouldn't be used as a range since that would be inconsistent with other properties) | CTDL, QData | CTDL, CTDLASN, QData |
| 8 | CTDL-ASN overwrites the domain of ceterms:industryType | Add any valid domains to the CTDL representation of ceterms:industryType and remove the entire domain value from the CTDL-ASN representation of ceterms:industryType | CTDL, CTDLASN | CTDL, CTDLASN, QData |
| 9 | CTDL-ASN overwrites the domain of ceterms:occupationType | Add any valid domains to the CTDL representation of ceterms:occupationType and remove the entire domain value from the CTDL-ASN representation of ceterms:occupationType | CTDL, CTDLASN | CTDL, CTDLASN, QData |
| 10 | Both CTDL-ASN and QData overwrite the domain of ceterms:instructionalProgramType | Add any valid domains to the CTDL representation of ceterms:instructionalProgramType and remove the entire domain value from the CTDL-ASN and QData representations of ceterms:instructionalProgramType | CTDL, CTDLASN, QData | CTDL, CTDLASN, QData |
| 11 | The definition and usage note for schema:QuantitativeValue are mismatched between CTDL and QData | Determine what the correct definition and usage note (if any) should be, update it in the CTDL representation, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 12 | The domain for schema:description is mismatched between CTDL and QData | Determine the correct domain, update it in the CTDL representation, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | Not available (conflicts with the URL for ceterms:description); find it in the full schema JSON-LD files linked above | CTDL, CTDLASN, QData |
| 13 | The usage note and domain for schema:maxValue are mismatched between CTDL and QData | Determine what the correct values for the domain and usage note (if applicable) should be, update the CTDL representation to match that, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 14 | The usage note and domain for schema:minValue are mismatched between CTDL and QData | Determine what the correct values for the domain and usage note (if applicable) should be, update the CTDL representation to match that, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 15 | The usage note and domain for schema:value are mismatched between CTDL and QData | Determine what the correct values for the domain and usage note (if applicable) should be, update the CTDL representation to match that, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. | CTDL, QData | CTDL, CTDLASN, QData |
| 16 | The definition and range for schema:unitText are mismatched between CTDL and QData | Determine what the correct values for the definition and range should be, update the CTDL representation to match that, set QData to borrow from that, and remove anything from the QData representation that would overwrite the CTDL representation. See also this comment/issue. | CTDL, QData | CTDL, CTDLASN, QData |
