Overview
Krextor, the KWARC RDF Extractor, is an extensible XSLT-based framework for extracting RDF from XML, supporting multiple input languages as well as multiple output RDF notations. Krextor provides convenience templates that try to do “the right thing”™ in many common cases, so as to reduce the need for manually writing repetitive code. The Publications provide further background on the design, requirements, and use cases behind Krextor.
Semantics
The extracted RDF graph will in most cases be an outline of the semantic structure of an XML document, abstracting from the concrete syntax. It can be used for more easily exchanging or interlinking knowledge contained in XML documents on the semantic web. There are many tools that support querying RDF, using languages like SPARQL. If the extracted RDF is backed by an expressive ontology, a reasoner can be used to infer additional knowledge from it.
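To make the idea of an "outline of the semantic structure" concrete, here is a minimal sketch in Python. It is not how Krextor works internally (Krextor is written in XSLT); the input document, the `example.org` base URI and vocabulary, and the helper names `extract_triples` and `Literal` are all assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple

# Hypothetical input; not one of the formats Krextor ships a module for.
XML = """<library id="lib">
  <book id="b1"><title>RDF Primer</title></book>
  <book id="b2"><title>XSLT Basics</title></book>
</library>"""

BASE = "http://example.org/doc#"      # assumed base URI of the document
VOCAB = "http://example.org/vocab#"   # assumed vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Literal(NamedTuple):
    """Marks an object as a literal value rather than a URI."""
    value: str


def extract_triples(xml_text):
    """Return (subject, predicate, object) tuples outlining the document."""
    triples = []

    def visit(elem, parent_uri):
        # Only called for elements that carry an id attribute.
        uri = BASE + elem.get("id")
        triples.append((uri, RDF_TYPE, VOCAB + elem.tag.capitalize()))
        if parent_uri is not None:
            triples.append((parent_uri, VOCAB + "contains", uri))
        for child in elem:
            if child.get("id") is not None:
                # Identified children become resources of their own.
                visit(child, uri)
            elif child.text and child.text.strip():
                # Anonymous children with text become literal-valued properties.
                triples.append((uri, VOCAB + child.tag, Literal(child.text.strip())))

    visit(ET.fromstring(xml_text), None)
    return triples


for triple in extract_triples(XML):
    print(triple)
```

The resulting triples say which resources exist, what type each has, and how they relate, without reference to the element nesting or attribute syntax of the source document.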
Supported Formats
Krextor comes with a number of extraction and output modules. Support for additional formats is easy to add. Please let us know if you have written any extraction or output module, test case, or documentation that you would like us to make a part of the Krextor default distribution.
Input Formats (Extraction Modules; varying stability)
The following input formats are already supported. Others are easy to add. Just copy an existing extraction module to get started.
- omdoc (largely stable): OMDoc (source)
- in terms of this ontology (ontology sources)
- ocd (stable): OpenMath Content Dictionaries (source)
- in terms of this ontology
- feature overview
- xbel (stable but incomplete): XBEL (XML Bookmark Exchange Language) (source)
- in terms of the Shared Desktop Ontologies
- suitable for use with the Nepomuk/KDE semantic desktop
- xhtml-rdfa (experimental): XHTML+RDFa (source)
- omdoc-owl (experimental): OMDoc, interpreted as OWL ontologies (source)
- hcalendar (incomplete): the hCalendar microformat (experimental; source)
- YourOwnExtraction
Output Formats (all stable)
- sequence of triples:
- rxr: RXR (Regular XML RDF), schema (source)
- ntriples: N-Triples
- grouped triples (first by common subject, then by common predicate; implemented as post-processing of RXR for now):
- java: Java callback for every triple (source)
- none: no output; for testing (source)
- YourOwnOutput
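The difference between a flat sequence of triples and grouped triples can be sketched as follows. This is an illustrative Python sketch only: Krextor itself does the grouping in XSLT as post-processing of RXR, and the function names `to_ntriples` and `group_triples` and the `example.org` URIs are assumptions. For simplicity, all objects are URIs, so no literal escaping is needed.

```python
def to_ntriples(triples):
    """Serialize URI-only (s, p, o) triples, one N-Triples statement per line."""
    return "".join("<%s> <%s> <%s> .\n" % (s, p, o) for s, p, o in triples)


def group_triples(triples):
    """Group first by common subject, then by common predicate.

    Relies on dicts preserving insertion order (Python 3.7+), so subjects
    and predicates appear in the order in which they were first seen.
    """
    grouped = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    return grouped


EX = "http://example.org/"  # assumed namespace, for illustration only
triples = [
    (EX + "a", EX + "knows", EX + "b"),
    (EX + "a", EX + "knows", EX + "c"),
    (EX + "a", EX + "likes", EX + "b"),
    (EX + "b", EX + "knows", EX + "c"),
]

print(to_ntriples(triples), end="")
print(group_triples(triples))
```

Grouped output lets a serializer write each subject once and each predicate once per subject, which is what the more compact RDF notations take advantage of.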
Usage
See Usage
Source code documentation
(generated using XSLTdoc)
External documentation
- A report by Tim Lebo on how he got started with Krextor and hacked it to fit into his application, covering:
- a brief review of the documentation provided here – focusing on those aspects that were relevant to him
- a ready-to-copy XSLT created from the “Simpsons” example on the YourOwnExtraction page
- having extraction modules outside of the extract directory (see also #109)
- an extraction module for the NEMSIS XML language for tabular data
