Document Processing Workflow

Below we sketch the document processing workflow; for details, see the poster. See also janta-components.

Implementation Roadmap

See milestones & tickets for up-to-date information and details.

Rendering

Rendering: Notation Adaptation

  • Purpose: Testing of the NTN Framework of the JOMDoc library
  • Retrieve rendered documents from the file system (later TNTBase) and adapt notations; see ticket:533

Rendering: XML to XML

  • Purpose: Allow integration with various interfaces (most frontends require a specific XML input format, e.g. see wyzbook's format or WordML)
  • Optionally, requested documents are converted to specific XML formats
  • For demo purposes: XSLT-based conversion from OMDoc to XHTML

Initialization of the document commons: Segmentation and Integration

  • Purpose: Identification, creation, and maintenance of an in-memory document commons
  • On initial retrieval, documents are segmented and integrated into the document commons
  • Segmentation preserves the original document
  • Segmentation splits the document into knowledge items (suitable for reuse to create new documents) with globally unique identifiers
  • Integration: We do not identify or construct new interrelations between knowledge items but simply draw on explicit cross-references in and between documents
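
The segmentation and integration steps above can be sketched as follows. This is only an illustration under stated assumptions, not the JOMDoc implementation: the element names in ITEM_TAGS, the <ref target="..."/> cross-reference element, and the functions segment and integrate are all hypothetical.

```python
import copy
import uuid
import xml.etree.ElementTree as ET

# Hypothetical: element names that count as knowledge items.
ITEM_TAGS = {"definition", "example", "theorem"}


def segment(doc_root, commons):
    """Copy each knowledge item into the commons under a fresh global id.

    The original tree is left untouched. Returns the list of item ids in
    document order, so the document can be rebuilt later.
    """
    path = []
    for elem in doc_root.iter():
        if elem.tag in ITEM_TAGS:
            item_id = uuid.uuid4().hex  # globally unique identifier
            commons[item_id] = copy.deepcopy(elem)
            path.append(item_id)
    return path


def integrate(commons):
    """Record only explicit cross-references; no new relations are inferred."""
    local_to_global = {
        item.get("id"): gid for gid, item in commons.items() if item.get("id")
    }
    links = set()
    for gid, item in commons.items():
        for ref in item.iter("ref"):  # hypothetical <ref target="..."/>
            target = local_to_global.get(ref.get("target"))
            if target is not None:
                links.add((gid, target))
    return links


SOURCE = """<doc>
  <definition id="d1">A group is a set with an associative operation.</definition>
  <example id="e1">The integers under addition, see <ref target="d1"/>.</example>
</doc>"""

root = ET.fromstring(SOURCE)
commons = {}
path = segment(root, commons)
links = integrate(commons)
```

Copying the items instead of moving them is one simple way to honor the requirement that segmentation preserves the original document.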

Extraction of documents

  • Purpose: Testing the document commons
  • Extract a (previously integrated) document from the document commons
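
A minimal sketch of extraction, assuming the document commons is a mapping from item id to XML element and that each integrated document is remembered as an ordered list of item ids. The function extract, the ids k1 and k2, and the root element name are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def extract(path, commons, root_tag="doc"):
    """Rebuild a document by copying the items on `path` out of the commons."""
    root = ET.Element(root_tag)
    for item_id in path:
        # Copy, so that editing the extracted document never alters the commons.
        root.append(copy.deepcopy(commons[item_id]))
    return root


# Hypothetical commons content.
commons = {
    "k1": ET.fromstring("<definition>A group is a set with an operation.</definition>"),
    "k2": ET.fromstring("<example>The integers under addition.</example>"),
}
document = extract(["k1", "k2"], commons)
```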

Contextualization

  • Contextualization of the rendering of documents is completed
  • TODO: Specify context annotations for items in the document commons
  • User-specific parameters: language
  • System-specific parameters: font, font size, line height, margins (these define the amount of information on a document's page)

Contextualize Doc Extraction

  • Purpose: Extending the NTN Framework towards Variants
  • Adapt extraction of initial documents to the user according to context parameters (exchange text fragments of the original document with appropriate variants, cache the new narrative path in the narrative commons)
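
The adaptation step could look roughly like the sketch below. It assumes that variants are stored per item id and language and that the narrative commons is a simple cache keyed by document and language. All names here (contextualize, variants, narrative_commons, the item ids) are hypothetical.

```python
def contextualize(path, variants, context, default_language="en"):
    """Return a new narrative path in which each item is replaced by the
    variant matching the context language, if such a variant exists."""
    language = context.get("language", default_language)
    new_path = []
    for item_id in path:
        by_language = variants.get(item_id, {})
        # Fall back to the original item when no variant is available.
        new_path.append(by_language.get(language, item_id))
    return new_path


# Hypothetical data: one item has a German variant.
variants = {"def-group": {"de": "def-group-de"}}
original_path = ["def-group", "ex-integers"]
narrative_commons = {}  # cache of already adapted narrative paths

context = {"language": "de"}
cache_key = ("doc-1", context["language"])
if cache_key not in narrative_commons:
    narrative_commons[cache_key] = contextualize(original_path, variants, context)
adapted = narrative_commons[cache_key]
```

Here the original path is never modified; the adapted path is a separate entry in the cache, so the initial document stays available.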

Document creation

  • Selection and arrangement of knowledge items in a new document structure based on strategies and templates

Template-based Path Generation

  • Purpose: Towards Generating new documents
  • Generating a new path for an initial document, e.g. a slide presentation, based on templates/strategies
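
As a rough illustration, a template can be read as an ordered list of item kinds, and a new path is produced by picking the matching items of the initial document. This is only one possible interpretation; the function generate_path, the template format, and the item kinds are assumptions.

```python
def generate_path(template, items):
    """Build a new path by walking the template and, for each requested kind,
    appending the matching items in their original order."""
    path = []
    for wanted_kind in template:
        path.extend(item_id for item_id, kind in items if kind == wanted_kind)
    return path


# Hypothetical initial document: (item id, kind) pairs in document order.
items = [
    ("intro", "text"),
    ("def-group", "definition"),
    ("ex-integers", "example"),
    ("proof-1", "proof"),
]

# Hypothetical slide template: definitions first, then examples, no proofs.
slide_template = ["definition", "example"]
slide_path = generate_path(slide_template, items)
```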

Template-based Document Generation

  • Purpose: Towards Generating new documents
  • Creating templates to generate new documents by selecting and arranging knowledge items
  • Example: A guided tour for a text fragment includes a definition and examples for each symbol in the text fragment
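
The guided-tour example can be sketched as below, assuming the commons offers lookups from a symbol to its definition and to its examples. The function guided_tour and both lookup tables are hypothetical.

```python
def guided_tour(fragment_symbols, definitions, examples):
    """For each symbol in the fragment, emit its definition followed by its
    examples. Symbols without a known definition contribute examples only."""
    tour = []
    for symbol in fragment_symbols:
        if symbol in definitions:
            tour.append(definitions[symbol])
        tour.extend(examples.get(symbol, []))
    return tour


# Hypothetical lookups from symbol to item ids in the document commons.
definitions = {"group": "def-group", "identity": "def-identity"}
examples = {"group": ["ex-integers"]}

tour = guided_tour(["group", "identity"], definitions, examples)
```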

Semantic Document Generation

  • Purpose: Towards Generating new documents
  • Processing the content commons, semantic path generation along theoretical dependencies
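
One way to read "path generation along theoretical dependencies" is as a topological ordering: every item appears after the items it depends on. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the dependency table and the function semantic_path are hypothetical.

```python
from graphlib import TopologicalSorter


def semantic_path(targets, depends_on):
    """Return the targets and everything they transitively depend on,
    ordered so that prerequisites come first."""
    needed = {}
    stack = list(targets)
    while stack:
        item = stack.pop()
        if item in needed:
            continue
        needed[item] = set(depends_on.get(item, ()))
        stack.extend(needed[item])
    # TopologicalSorter expects a mapping from node to its predecessors.
    return list(TopologicalSorter(needed).static_order())


# Hypothetical theory dependencies in the content commons.
depends_on = {
    "field": ["ring"],
    "ring": ["group"],
    "group": ["monoid"],
    "monoid": [],
}

path = semantic_path(["ring"], depends_on)
```

Only the items reachable from the requested targets are included, so "field" does not appear in the path for "ring".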

How to assure that a user receives the same document?

  • User modeling is not part of the document processor but sits one level up: the user model generates requests to the document processor (this may even be an interface task)
