Dataset Examples

A Dataset encapsulates compounds and their features. To model higher-order relationships, two classes have been introduced in the OpenTox resource ontology: FeatureValue and DataEntry. FeatureValue encapsulates the relationship Feature - hasValue - Value. DataEntry encapsulates the relationship Compound - has values for specific Features. A dataset consists of multiple DataEntries. Instances of these classes can be represented as anonymous (blank) nodes in RDF notations, as in the examples below. Triple stores will generate separate triples for all involved binary relationships.
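As a rough illustration of how the nested Dataset / DataEntry / FeatureValue structure breaks down into binary relationships, here is a minimal Python sketch. The class names follow the ontology and the resource names are borrowed from the benzene example on this page. The `to_triples` helper, the plain-string representation of URIs and literals, and the `_:bN` blank-node labels are assumptions made for illustration; they are not part of the OpenTox API.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class FeatureValue:
    feature: str  # name of the ot:Feature
    value: str    # literal, already serialized (e.g. '"true"^^xsd:boolean')


@dataclass
class DataEntry:
    compound: str  # name of the ot:Compound
    values: List[FeatureValue] = field(default_factory=list)


@dataclass
class Dataset:
    uri: str
    entries: List[DataEntry] = field(default_factory=list)


def to_triples(dataset: Dataset) -> Iterator[Triple]:
    """Flatten the nested structure into (subject, predicate, object) triples."""
    blank = count(1)  # source of blank-node labels; the labels are arbitrary
    yield (dataset.uri, "rdf:type", "ot:Dataset")
    for entry in dataset.entries:
        entry_node = f"_:b{next(blank)}"  # anonymous DataEntry
        yield (dataset.uri, "ot:dataEntry", entry_node)
        yield (entry_node, "rdf:type", "ot:DataEntry")
        yield (entry_node, "ot:compound", entry.compound)
        for fv in entry.values:
            fv_node = f"_:b{next(blank)}"  # anonymous FeatureValue
            yield (entry_node, "ot:values", fv_node)
            yield (fv_node, "rdf:type", "ot:FeatureValue")
            yield (fv_node, "ot:feature", fv.feature)
            yield (fv_node, "ot:value", fv.value)


dataset = Dataset(
    uri="example:DatasetPredicted",
    entries=[
        DataEntry(
            compound="example:benzene",
            values=[
                FeatureValue("example:MultiCellCallPredicted", '"true"^^xsd:boolean'),
                FeatureValue("example:MultiCellCall", '"true"^^xsd:boolean'),
            ],
        )
    ],
)

triples = list(to_triples(dataset))
for triple in triples:
    print(" ".join(triple), ".")
```

Each anonymous node receives its own label, so one DataEntry with two FeatureValues expands into several separate statements, which is roughly what a triple store would hold after loading the nested notation.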

  • Dataset URI is defined by dc:identifier
    <dc:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >http://myservice/dataset/{datasetid}</dc:identifier>
  • Dataset name is defined by dc:title
    <dc:title rdf:datatype="http://www.w3.org/2001/XMLSchema#string">CPDBAS</dc:title>
  • Dataset source is defined by dc:source
    <dc:source rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
        >http://www.epa.gov/ncct/dsstox</dc:source>
  • Other Dublin Core elements can be used for more detailed meta information.
  • An example Dataset with values for two Features for a Compound:
    <ot:Dataset rdf:ID="DatasetPredicted">
        <dc:identifier rdf:datatype="&xsd;string"
            >http://myservice/dataset/{datasetid}</dc:identifier>
        <dc:title rdf:datatype="&xsd;string"
            >Multi Cell Call prediction from J48</dc:title>
        <ot:dataEntry>
            <rdf:Description>
                <rdf:type rdf:resource="&ot;DataEntry"/>
                <ot:values>
                    <rdf:Description>
                        <rdf:type rdf:resource="&ot;FeatureValue"/>
                        <ot:value rdf:datatype="&xsd;boolean">true</ot:value>
                        <ot:feature rdf:resource="#MultiCellCallPredicted"/>
                    </rdf:Description>
                </ot:values>
                <ot:values>
                    <rdf:Description>
                        <rdf:type rdf:resource="&ot;FeatureValue"/>
                        <ot:value rdf:datatype="&xsd;boolean">true</ot:value>
                        <ot:feature rdf:resource="#MultiCellCall"/>
                    </rdf:Description>
                </ot:values>
                <ot:compound rdf:resource="#benzene"/>
            </rdf:Description>
        </ot:dataEntry>
    </ot:Dataset> 
    <ot:Compound rdf:ID="benzene">
        <dc:identifier rdf:datatype="&xsd;string"
            >http://myservice/compound/{compoundid1}</dc:identifier>
    </ot:Compound>
    <ot:Feature rdf:ID="MultiCellCallPredicted">
        <dc:identifier rdf:datatype="&xsd;string"
            >http://myservice/feature/{featureid3}</dc:identifier>
        <dc:title rdf:datatype="&xsd;string">MultiCellCall</dc:title>
        <ot:hasSource rdf:resource="#WekaJ48"/>
    </ot:Feature>

The same in N3 notation:

example:DatasetPredicted
      a       ot:Dataset ;
      dc:identifier "http://myservice/dataset/{datasetid}"^^xsd:string ;
      dc:title "Multi Cell Call prediction from J48"^^xsd:string ;
      ot:dataEntry
              [ a       ot:DataEntry ;
                ot:compound example:benzene ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature example:MultiCellCallPredicted ;
                          ot:value "true"^^xsd:boolean
                        ] ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature example:MultiCellCall ;
                          ot:value "true"^^xsd:boolean
                        ]
              ] .

example:benzene
      a       ot:Compound ;
      dc:identifier "http://myservice/compound/{compoundid1}"^^xsd:string .

example:MultiCellCallPredicted
      a       ot:Feature ;
      dc:identifier "http://myservice/feature/{featureid3}"^^xsd:string ;
      dc:title "MultiCellCall"^^xsd:string ;
      ot:hasSource example:WekaJ48 .

An example Dataset used as a Test Dataset in Validation:

    <ot:Dataset rdf:ID="DatasetTest">
        <dc:identifier rdf:datatype="&xsd;string"
            >http://myservice/dataset/{datasetid}</dc:identifier>
        <dc:title rdf:datatype="&xsd;string"
            >Test dataset for Model M1</dc:title>
        <ot:dataEntry>
            <rdf:Description>
                <rdf:type rdf:resource="&ot;DataEntry"/>
                <ot:values>
                    <rdf:Description>
                        <rdf:type rdf:resource="&ot;FeatureValue"/>
                        <ot:value rdf:datatype="&xsd;boolean">false</ot:value>
                        <ot:feature rdf:resource="#MultiCellCall"/>
                    </rdf:Description>
                </ot:values>
                <ot:values>
                    <rdf:Description>
                        <rdf:type rdf:resource="&ot;FeatureValue"/>
                        <ot:value rdf:datatype="&xsd;boolean">false</ot:value>
                        <ot:feature rdf:resource="#MultiCellCallPredicted"/>
                    </rdf:Description>
                </ot:values>
                <ot:compound rdf:resource="#phenol"/>
            </rdf:Description>
        </ot:dataEntry>
    </ot:Dataset>

The same in N3 notation:

default:DatasetTest
      a       ot:Dataset ;
      dc:identifier "http://myservice/dataset/{datasetid}"^^xsd:string ;
      dc:title "Test dataset for Model M1"^^xsd:string ;
      ot:dataEntry
              [ a       ot:DataEntry ;
                ot:compound default:phenol ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature default:MultiCellCall ;
                          ot:value "false"^^xsd:boolean
                        ] ;
                ot:values
                        [ a       ot:FeatureValue ;
                          ot:feature default:MultiCellCallPredicted ;
                          ot:value "false"^^xsd:boolean
                        ]
              ] .
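
The test dataset carries both the actual feature (MultiCellCall) and the predicted one (MultiCellCallPredicted) for each compound, so the two can be compared entry by entry. The following Python sketch shows one way such a comparison might look. The `agreement` function and the dictionary representation of data entries are hypothetical and only illustrate the idea; the source does not specify how a validation service evaluates the dataset.

```python
from typing import Dict, List

# Each data entry maps a feature name to its boolean value for one compound.
# The single entry mirrors the phenol example above; a real test dataset
# would hold one entry per compound.
entries: List[Dict[str, bool]] = [
    {"MultiCellCall": False, "MultiCellCallPredicted": False},
]


def agreement(entries: List[Dict[str, bool]], actual: str, predicted: str) -> float:
    """Fraction of entries whose predicted value equals the actual value.

    Entries that lack either feature are skipped.
    """
    paired = [
        (entry[actual], entry[predicted])
        for entry in entries
        if actual in entry and predicted in entry
    ]
    if not paired:
        raise ValueError("no entry carries both features")
    return sum(a == p for a, p in paired) / len(paired)


score = agreement(entries, "MultiCellCall", "MultiCellCallPredicted")
print(score)
```

Keeping the actual and predicted values as two FeatureValues of the same DataEntry means the comparison needs no join across datasets; both values are already attached to the same compound.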