Revision [160]

Last edited on 2012-07-31 21:32:05 by DavidLee
Additions:
- Provide a human readable and editable format of XDM data
- [[XDMSerializeUseCase5 Use Case 5]] Provide a human readable and editable format of XDM data
Deletions:
- Provide a human readable and editable version of XDM data
- [[XDMSerializeUseCase5 Use Case 5]] Provide a human readable and editable version of XDM data


Revision [159]

Edited on 2012-07-31 21:31:44 by DavidLee
Additions:
- Provide a human readable and editable version of XDM data
- [[XDMSerializeUseCase5 Use Case 5]] Provide a human readable and editable version of XDM data
Deletions:
- Provide a human readable output of XDM data
- [[XDMSerializeUseCase5 Use Case 5]] Provide a human readable output of XDM data


Revision [158]

Edited on 2012-07-31 21:30:53 by DavidLee
Additions:
- Sequences should **not** be normalized. Sequences should preserve the individuality, count and type of items. Adjacent atomic item should not be concatenated (normalized).
Deletions:
- Sequences should **not** be normalized. Sequences should preserve the individuality, count and type of items. Adjacent atomic item should not be contented (normalized).


Revision [157]

Edited on 2012-07-28 18:06:05 by DavidLee
Additions:
Processing instructions are serialized identically to the XML Serialization. [ TBD Reference to PI serialization spec]
Deletions:
Processing instructions are serialzed identically to the XML Serialization. [ TBD Reference to PI serialization spec]


Revision [156]

Edited on 2012-07-28 17:59:50 by DavidLee
Additions:
Document nodes start with and serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]
====function items===
Functions are not serialized
Deletions:
Document nodes wrapped with a single root element ("") and then serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]


Revision [153]

Edited on 2011-07-27 05:52:14 by DavidLee
Additions:
- An XDM Stream //must be// able to represent all unicode codepoints expressed in XDM (and by inference XML).
Deletions:
- An XDM Stream //must be// be able to represent all unicode codepoints expressed in XDM (and by inference XML).


Revision [151]

Edited on 2011-07-27 05:31:10 by DavidLee
Additions:
This proposal is to define a common format for serializing XDM data for purposes of data interchange and interoperability preserving more of the XDM information.
The intent of this specification is to provide a common format for interchange of data between XML tools which produce and consume XDM data. Examples of XML tools which produce or consume XDM data include XPath (2.0), XSLT , XQuery, but also include other tools such as XProc, xmlsh and countless custom written programs using the XDM data model.
- A common text representation of XDM data preserving as much of the XDM model as reasonable including
Some purposes for which this format could be used include
- A format for use in XML Pipeline Processors so that steps can be implemented by different vendors or in different languages.
- [[XDMSerializeUseCase7 Use Case 7]] A format for use in XML Pipeline Processors so that steps can be implemented by different vendors or in different languages.
Deletions:
This proposal is to define a standard format for serializing XDM data for purposes of data interchange and interoperability preserving more of the XDM information.
The intent of this specification is to provide a standard format for interchange of data between XML tools which produce and consume XDM data. Examples of XML tools which produce or consume XDM data include XPath (2.0), XSLT , XQuery, but also include other tools such as XProc, xmlsh and countless custom written programs using the XDM data model.
- A standardized text representation of XDM data preserving as much of the XDM model as reasonable including
Some purposes for which this standard could be used include
- Standardization of a format for use in XML Pipeline Processors so that steps can be implemented by different vendors or in different languages.
- [[XDMSerializeUseCase7 Use Case 7]] Standardization of a format for use in XML Pipeline Processors so that steps can be implemented by different vendors or in different languages.


Revision [126]

Edited on 2011-01-14 04:44:24 by DavidLee
Additions:
['{' namespace uri '}'] [ prefix ] ':' name WHITESPACE value
Deletions:
['{' namespace uri '}'] [ prefix ] ':' name value


Revision [125]

Edited on 2011-01-14 04:39:12 by DavidLee
Additions:
- Type information of parentless atomic values is discarded.
- Atomic Types and values for parentless atomic types
- Node values are preserved for each of the 7 XDM node types
- Type annotation is NOT preserved explicitly for child nodes; however, by associating a schema with elements or documents, type annotation can be reconstructed using a schema-aware parser.
- KIND is the item kind as an enumeration.
Document nodes wrapped with a single root element ("") and then serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]
['{' namespace uri '}'] [ prefix ] ':' name value
Deletions:
- Type information of atomic values is discarded.
- Atomic Types and values
- Nodes values are preserved for each 8 of the XDM Node types
- KIND is the item kind
Document nodes are serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]
['{' namespace uri '}'] [ prefix ] ':' name=value


Revision [124]

Edited on 2011-01-11 05:00:41 by DavidLee
Additions:
**//NOTE: This page is currently a work in progress, it is not complete and is available currently for collaboration purposes only.//**


Revision [123]

Edited on 2011-01-11 04:59:42 by DavidLee
Additions:
The XQuery 1.0 and XPath 2.0 Data Model (XDM) references a [[http://www.w3.org/TR/xslt-xquery-serialization/ Serialization]] format for XDM. This format was intended to serialize either full XML Documents, or Parsed External Entities. If given arbitrary XDM, the result can not be reconstructed into the original XDM. In particular, for purposes of data interchange, the losses are severe; sequences are "normalized" into a single rooted XML tree, adjacent atomic values are concatenated and type information is discarded.
Deletions:
The XQuery 1.0 and XPath 2.0 Data Model (XDM) references a [[http://www.w3.org/TR/xslt-xquery-serialization/ Serialization]] format for XDM. This format is "lossy" and does not reconstruct into the original XDM. In particular, for purposes of data interchange, the losses are severe; sequences are "normalized" into a single rooted XML tree, adjacent atomic values are concatenated and type information is discarded.


Revision [122]

Edited on 2011-01-10 06:07:36 by DavidLee
Additions:
- "value" is the value of the attribute, quoted as per the specifications of XML Serialization [TBD: where is attribute serialization defined?]
Example
{http://www.test.com}test:attr="value"
====text node====
Text nodes are serialized as quoted strings [TBD define quoting mechanism, is it the same as the attribute values ?]
Example:
"Some text in a text node"
TBD: Is support for namespace nodes optional? I cannot create or generate a namespace node in XQuery even though it is part of XDM.
====processing instruction node====
Processing instructions are serialzed identically to the XML Serialization. [ TBD Reference to PI serialization spec]
Example

====comment nodes====
Comment nodes are serialized identically to the XML Serialization for comments. [TBD Reference to comment serialization spec]
Example

====atomic values====
Atomic values are serialized in the following form
[ '{' type uri '}' ] prefix ':' typename WHITESPACE value
- For XML Schema built-in types no "type uri" is required
- For user defined types a 'type uri' is required
- Prefix is required. The "xs" prefix is predeclared to be "http://www.w3.org/2001/XMLSchema"
- "value" is the string representation of the atomic value as defined by [[http://www.w3.org/TR/xpath-functions/#casting]], as if cast to "xs:string". The resulting value is then quoted. [TBD reference to quoting spec]
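
The atomic value form above can be sketched in Python. This is only an illustration under stated assumptions: the quoting mechanism is still TBD in this proposal, so the sketch borrows XML attribute-style quoting via ##xml.sax.saxutils.quoteattr##, and the function names ##serialize_atomic## and ##serialize_python_value##, the ##my:price## type and its URI are hypothetical.

```python
from xml.sax.saxutils import quoteattr

# The "xs" prefix is predeclared to this namespace, so built-in types need no type uri.
XS_URI = "http://www.w3.org/2001/XMLSchema"


def serialize_atomic(lexical, typename, prefix="xs", type_uri=None):
    """Build: [ '{' type uri '}' ] prefix ':' typename WHITESPACE value.

    `lexical` is the value already cast to its xs:string form.
    Quoting with quoteattr is an assumption; the proposal leaves it TBD.
    """
    uri_part = "{" + type_uri + "}" if type_uri else ""
    return f"{uri_part}{prefix}:{typename} {quoteattr(lexical)}"


def serialize_python_value(value):
    """Map a few Python values onto XML Schema built-in types (hypothetical helper)."""
    if isinstance(value, bool):  # must be tested before int: bool is a subclass of int
        return serialize_atomic("true" if value else "false", "boolean")
    if isinstance(value, int):
        return serialize_atomic(str(value), "integer")
    if isinstance(value, str):
        return serialize_atomic(value, "string")
    raise TypeError(f"no mapping sketched for {type(value).__name__}")


print(serialize_python_value(42))
print(serialize_atomic("12.50", "price", prefix="my",
                       type_uri="http://example.com/types"))
```

Because the type name travels with each value, a consumer could tell the integer 42 from the string "42", which the existing XML serialization cannot do.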
Deletions:
TBD: support for namespace nodes optional ?
- "value" is the value of the attribute, quoted as per the specifications of XML Serialization


Revision [121]

Edited on 2011-01-10 05:53:50 by DavidLee
Additions:
====document node====
Document nodes are serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]
====element node====
Element nodes are serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]
====namespace node====
TBD: support for namespace nodes optional ?
====attribute node====
Attribute nodes are serialized as
['{' namespace uri '}'] [ prefix ] ':' name=value
- If there is no namespace uri then it is omitted
- If there is no prefix then it is omitted
- "value" is the value of the attribute, quoted as per the specifications of XML Serialization
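
A minimal Python sketch of the parentless attribute form above, under some assumptions: the function name ##serialize_attribute## is hypothetical, quoting is delegated to ##xml.sax.saxutils.quoteattr## as a stand-in for "quoted as per XML Serialization", and the ':' is always emitted because the production shows only the namespace uri and the prefix as optional.

```python
from xml.sax.saxutils import quoteattr


def serialize_attribute(name, value, namespace_uri=None, prefix=None):
    """Build: ['{' namespace uri '}'] [ prefix ] ':' name=value."""
    # Omit the namespace uri part when the attribute has no namespace.
    uri_part = "{" + namespace_uri + "}" if namespace_uri else ""
    # Omit the prefix when there is none; the ':' stays (literal reading of the form).
    prefix_part = prefix or ""
    return f"{uri_part}{prefix_part}:{name}={quoteattr(value)}"


print(serialize_attribute("attr", "value",
                          namespace_uri="http://www.test.com", prefix="test"))
print(serialize_attribute("id", "x"))
```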
Deletions:
====document====
Documents are serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]


Revision [120]

Edited on 2011-01-10 05:35:43 by DavidLee
Additions:
======Serialization Format======
The serialization format is described in terms of a stream of Unicode characters (not bytes). The conversion between bytes and characters is an Encoding property. In-memory representations likely need no encoding. File formats //should be// stored in UTF-8 format with no leading Byte Order Mark (BOM).
=====Abstract Form=====
- ITEM is the serialized form of the item. Within ITEM (the serialized form of the XDM Item) any occurrence of the START character sequence must be entity escaped.
====START====
The START marker is a character sequence which represents the beginning of a single XDM Item. This sequence is not allowed to occur anywhere else in the XDM stream. This allows an XDM Consumer to perform some kinds of processing of XDM streams without having to parse the ITEM values. For example, a large XDM stream can be split into multiple streams with only needing to recognize a single character sequence.
====WHITESPACE====
WHITESPACE is any whitespace character (space, tab, CR, NL). Unlike in XML serialization, normalization of CR/NL sequences to NL is not required outside of ITEM serialization: wherever WHITESPACE occurs in the production syntax, multiple adjacent occurrences have the same meaning as one, so it is not necessary to distinguish CR/NL from NL.
====KIND====
KIND is a character sequence which identifies one of the 8 XDM item kinds: the seven types of XDM nodes (document, element, attribute, text, namespace, processing instruction, and comment) and atomic types.
====ITEM====
ITEM is the serialized form of a single XDM item. (see Item Serialization)
=====Item Serialization=====
Each of the 8 XDM kinds (7 node types and atomic types) is serialized into a character sequence as follows.
====document====
Documents are serialized as specified in [[http://www.w3.org/TR/xslt-xquery-serialization/#xml-output XML Output]]
Deletions:
====Serialization Format====
===Abstract Form===
- ITEM is the serialized form of the item
Within ITEM (the serialized form of the XDM Item) any occurring of the START character sequence must be entity escaped.


Revision [119]

Edited on 2011-01-10 05:13:07 by DavidLee
Additions:
The Serialization Format Properties suggest a possible //Abstract Form// composed of a sequence of **zero or more** of the following markup (where "+" means one or more and "*" means zero or more).
** START WHITESPACE+ KIND WHITESPACE+ ITEM WHITESPACE* **
- START is the start marker character sequence
- WHITESPACE is any whitespace character (space, tab , CR , NL)
- KIND is the item kind
- ITEM is the serialized form of the item
Within ITEM (the serialized form of the XDM Item) any occurrence of the START character sequence must be entity escaped.
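
The abstract form can be illustrated with a small Python sketch. Everything concrete here is an assumption made for illustration: this proposal does not define the START sequence or the KIND names, so the marker ##"#!XDM"##, its escaped form, the kind names and the functions ##write_stream## and ##split_stream## are all hypothetical.

```python
START = "#!XDM"               # hypothetical start marker; not defined by the proposal
START_ESCAPED = "&#x23;!XDM"  # first character replaced by a character reference


def write_stream(items):
    """Serialize (kind, item) pairs as START WHITESPACE+ KIND WHITESPACE+ ITEM WHITESPACE*."""
    parts = []
    for kind, item in items:
        # START must never occur inside ITEM, so escape any occurrence.
        safe_item = item.replace(START, START_ESCAPED)
        parts.append(f"{START} {kind} {safe_item}\n")
    return "".join(parts)


def split_stream(stream):
    """Split a stream into (kind, item) pairs by recognizing only START.

    ITEM is returned still escaped; interpreting it is left to an
    item-level parser. Items are assumed to be non-empty.
    """
    result = []
    for chunk in stream.split(START)[1:]:   # text before the first START is empty
        kind, item = chunk.strip().split(None, 1)
        result.append((kind, item))
    return result


stream = write_stream([
    ("element", "<a>1</a>"),
    ("atomic", 'xs:integer "2"'),
    ("text", '"see #!XDM marker"'),
])
for kind, item in split_stream(stream):
    print(kind, item)
```

In this sketch an empty stream yields an empty list, and concatenating two streams yields a stream whose items are those of the first followed by those of the second, since every item carries its own START marker.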
Deletions:
The Serialization Format Properties suggest a possible //Abstract Form// comprised of a sequence of **zero 0r more** of the following markup.
** ITEM END_MARKER [WHITESPACE] **
Where "ITEM" is the serialized form of a single XDM Item, and "END_MARKER" is a character sequence which is not allowed within the serialized form of "ITEM". The END_MARKER is an item terminator not a separator.
"WHITESPACE" is zero or more whitespace characters ( space, newline, tab , carriage-return).
This allows concatenation of zero or more XDM Streams to be an XDM Stream without having to insert data [[XDMSerializeUseCase8 Use Case 8]].


Revision [118]

Edited on 2011-01-07 07:51:19 by DavidLee
Additions:
- XDM Consumer An XDM Tool which can consume XDM data (e.g. as context item, parameters, external variables).
- XDM Producer An XDM Tool which can produce XDM data (e.g. as a query result, result document, return value).
- Individuality of sequence items.
- Types of atomic items.
- Parentless attributes.
- Support for all seven types of XDM nodes [document, element, attribute, text, namespace, processing instruction, and comment.]
- A representation that can be easily implemented using existing vendors XML technology.
Deletions:
- XDM Consumer An XDM Tool which can consume (allow as input, arguments) XDM Data.
- XDM Producer An XDM Tool which can produce XDM data (on output, return or output variables)
-Individuality of sequence items
- Types of atomic items
- Parentless attributes
-
- A representation that can be easily implemented using existing vendors XML technology.


Revision [117]

Edited on 2011-01-07 07:34:50 by DavidLee
Additions:
The XQuery 1.0 and XPath 2.0 Data Model (XDM) references a [[http://www.w3.org/TR/xslt-xquery-serialization/ Serialization]] format for XDM. This format is "lossy" and does not reconstruct into the original XDM. In particular, for purposes of data interchange, the losses are severe; sequences are "normalized" into a single rooted XML tree, adjacent atomic values are concatenated and type information is discarded.
In current implementations there is no standard model for XDM data either within the same environment and language, or across languages and environments. For example, if an XQuery operation produces a sequence and it is desired to provide that sequence as a parameter to an XSLT transformation, there is no standardized way to exchange the data. In practice, to accomplish this, either the same vendor's tools must be used within the same language and process, or the results must be serialized in a proprietary format and reconstituted in the target using the same proprietary format. Even with the same vendor's implementations, interchange is not always easy due to differences in internal data formats, languages, or transferring data across process or machine boundaries.
The existing text serialization for XDM proposal [[http://www.w3.org/TR/xpath-datamodel/ XDM]] is inadequate for data interchange.
- Sequences are transformed into a single rooted XML tree.
- Adjacent atomic values are converted to text and concatenated.
- Type information of atomic values is discarded.
- Parentless attributes are not serialized.
- The distinction between element and document is lost.
- The empty sequence () and the empty string "" are serialized identically.
These and other limitations make the existing XDM serialization proposal unsuitable for data interchange.

This proposal provides for a standard serialization format so that XDM data can be interchanged across tools, vendors, languages, environments and machines while maintaining most of the original XDM information.
- XDM [[http://www.w3.org/TR/xpath-datamodel/ XQuery 1.0 and XPath 2.0 Data Model]]
- XDM data An instance of the XDM Data Model
- XDM Stream A representation of a single instance of XDM data (a sequence of zero or more items) as a sequence of characters (Unicode codepoints).
- A standardized text representation of XDM data preserving as much of the XDM model as reasonable including
- Parentless attributes
-
- Exchange of XDM data between XDM Tools and tools which are not XDM capable, or with limited XDM capability.
Deletions:
The XQuery 1.0 and XPath 2.0 Data Model (XDM) references a [[http://www.w3.org/TR/xslt-xquery-serialization/ Serialization]] format for XDM. This format is "lossy" and does not reconstruct into the original XDM. In particular, for purposes of data interchange, the losses are severe; sequences are "normalized" and type information is discarded.
In current implementations there is no standard model for XDM data either within the same environment and language, or across languages and environements. For example, suppose an XQuery operation produces a sequence and it is desired to provide that sequence as a parameter to XSLT transformation, there is no standardized way to exchange the data. In practice in order to accomplish this, either the same vendor tools must be used within the same language and process, or the results must be serialized in a proprietary format and reconstituted in the target using the same proprietary format. Even with the same vendors implementations interchange is not always easy due to differences in API layers, languages, or transferring data across process or machine boundaries.
This proposal provides for a standard serialization format so that XDM data can be interchanged across tools, vendors, languages, environments and machines.
-XDM Stream An in-memory or on-"disk" (file) representation of a single instance of XDM (a sequence zero or more items)
- A standardized text representation of XDM data preserving as much of the XDM model as reasonable.
- Exchange of XDM Data between XDM Tools and tools which are not XDM capable, or with limited XDM capability.


Revision [112]

Edited on 2011-01-06 07:00:43 by DavidLee
Additions:
** ITEM END_MARKER [WHITESPACE] **
Where "ITEM" is the serialized form of a single XDM Item, and "END_MARKER" is a character sequence which is not allowed within the serialized form of "ITEM". The END_MARKER is an item terminator not a separator.
"WHITESPACE" is zero or more whitespace characters ( space, newline, tab , carriage-return).
This allows concatenation of zero or more XDM Streams to be an XDM Stream without having to insert data [[XDMSerializeUseCase8 Use Case 8]].
Deletions:
** ITEM END_MARKER **
Where "ITEM" is the serialized form of a single XDM Item, and "END_MARKER" is a character sequence which is not allowed within the serialized form of "ITEM". The END_MARKER is an item terminator not a separator. This allows concatenation of zero or more XDM Streams to be an XDM Stream without having to insert data [[XDMSerializeUseCase8 Use Case 8]].


Revision [111]

Edited on 2011-01-06 06:57:37 by DavidLee
Additions:
The Serialization Format Properties suggest a possible //Abstract Form// composed of a sequence of **zero or more** of the following markup.
** ITEM END_MARKER **

The empty sequence being represented by an empty stream.
Where "ITEM" is the serialized form of a single XDM Item, and "END_MARKER" is a character sequence which is not allowed within the serialized form of "ITEM". The END_MARKER is an item terminator not a separator. This allows concatenation of zero or more XDM Streams to be an XDM Stream without having to insert data [[XDMSerializeUseCase8 Use Case 8]].
Deletions:
The Serialization Format Properties suggest a possible //Abstract Form// comprised of a sequence of zero 0r more of the following markup. The empty sequence being represented by an empty stream.
** [ITEM] [SEPARATOR]**
Where "[ITEM]" is the serialized form of a single XDM Item, and "[SEPARATOR]" is a charactor sequence which is not allowed within the serialized form of "[ITEM]".


Revision [110]

The oldest known version of this page was created on 2011-01-06 06:46:57 by DavidLee