======XDM Text Serialization======
A proposal for text serialization of [[http://www.w3.org/TR/xpath-datamodel/ XDM]] for purposes of data interchange and interoperability.
=====Abstract=====
The XQuery 1.0 and XPath 2.0 Data Model (XDM) references a [[http://www.w3.org/TR/xslt-xquery-serialization/ Serialization]] format for XDM. This format is "lossy" and does not reconstruct into the original XDM. In particular, for purposes of data interchange, the losses are severe; sequences are "normalized" and type information is discarded.
This proposal aims to define a standard format for serializing XDM data for purposes of data interchange and interoperability, preserving more of the XDM information than the existing serialization does.
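As a rough illustration of the loss described above, the following Python sketch mimics the effect of sequence normalization on atomic values: each value is reduced to its string form and adjacent values are joined with a single space. The ##Atomic## class and ##normalize## function are hypothetical names used only for this illustration, and only the atomic-value part of normalization is modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Atomic:
    """Hypothetical stand-in for an XDM atomic item."""
    type_name: str  # e.g. "xs:integer"
    value: str      # string form of the value


def normalize(seq: List[Atomic]) -> str:
    """Mimic the atomic-value part of sequence normalization:
    every item becomes its string value and adjacent values are
    joined with a single space, so item boundaries and types are lost."""
    return " ".join(item.value for item in seq)


# Two different sequences ...
a = [Atomic("xs:integer", "1"), Atomic("xs:string", "2 3")]
b = [Atomic("xs:string", "1 2"), Atomic("xs:integer", "3")]

# ... collapse to the same text, so neither original can be reconstructed.
print(normalize(a))                  # 1 2 3
print(normalize(a) == normalize(b))  # True
```

Both sequences contain two items of different types, yet the normalized text gives no hint of how many items there were or what types they had.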
=====Intent=====
The intent of this specification is to provide a standard format for interchange of data between XML tools which produce and consume XDM data. Examples of such tools include XPath 2.0, XSLT and XQuery, as well as other tools such as XProc, xmlsh and countless custom-written programs using the XDM data model.
In current implementations there is no standard model for XDM data, either within the same environment and language or across languages and environments. For example, suppose an XQuery operation produces a sequence that is to be passed as a parameter to an XSLT transformation: there is no standardized way to exchange the data. In practice, either tools from the same vendor must be used within the same language and process, or the results must be serialized in a proprietary format and reconstituted in the target using the same proprietary format. Even with a single vendor's implementations, interchange is not always easy due to differences in API layers and languages, or the need to transfer data across process or machine boundaries.
This proposal provides for a standard serialization format so that XDM data can be interchanged across tools, vendors, languages, environments and machines.
====Definitions====
For the purposes of this document the following definitions apply:
- **XDM Tool**: A program, module, function or API which can produce or consume XDM data in some form.
- **XDM Consumer**: An XDM Tool which can consume XDM data (as input or arguments).
- **XDM Producer**: An XDM Tool which can produce XDM data (as output, return values or output variables).
- **Environment**: An instance of the runtime of a single language with native language types, typically a single process.
====Goals====
This proposal expresses multiple goals, not all of which may be possible to achieve. The use cases describe concrete examples of many of the goals, while this summary provides the intent.
- A standardized text representation of XDM data preserving as much of the XDM model as reasonable, in particular:
  - Individuality of sequence items
  - Types of atomic items
- A representation that can be easily implemented using existing vendors' XML technology.
Some purposes for which this standard could be used include:
- Exchange of XDM data between XDM Tools in different environments
- Exchange of XDM data between XDM Tools from different vendors in the same environment
- Exchange of XDM data between XDM Tools from the same vendor in the same environment where it is difficult to preserve the vendor's native data structure
- Exchange of XDM data between XDM Tools and tools which are not XDM capable or have only limited XDM capability
- Provide a human readable output of XDM data
- Output of XDM data from test cases using XDM Tools for the purposes of validation and comparison
- Standardization of a format for use in XML Pipeline Processors so that steps can be implemented by different vendors or in different languages.
====XDM Information Preserved and Lost====
Preserving all of the information in the XDM is very difficult, which is likely why a serialization model for XDM has not been specified. This proposal recognizes that not all XDM information is equally important. In the context of the Use Cases, and with the goal of reasonable implementation using existing vendor libraries, this proposal aims at preserving some XDM information at the expense of other information.
==XDM Information preserved==
The XDM Model defines values as a sequence of zero or more items. Each item is one of the following types:
- Atomic type
- Node type
  - document, element, attribute, text, namespace, processing instruction, comment.
Each type has a value. Atomic types have string values, and node types have XML values.
An XDM serialization format should preserve the following attributes:
- Sequences
  - Sequences should **not** be normalized. Sequences should preserve the individuality, count and type of items. Adjacent atomic items should not be concatenated (normalized).
- Atomic types and values
  - Atomic types are preserved with the expanded QName for the type and the string value.
- Nodes
  - Node values are preserved for each of the seven XDM node kinds.
  - Serialized XDM will retain information about the descendants of nodes in the sequence being serialized, but it will not retain information about their ancestors.
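The properties listed above can be pictured with a small in-memory sketch. This is not the proposal's serialization format, which is not yet specified: the ##Item## record, the ##dump## and ##load## helpers, the use of JSON as a placeholder carrier, and the Clark-notation spelling of expanded QNames are all assumptions made for illustration. The sketch only shows that a round trip can keep item count, order, type and string value intact.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

XS = "http://www.w3.org/2001/XMLSchema"


@dataclass(frozen=True)
class Item:
    """Hypothetical record for one sequence item (not the proposal's format)."""
    kind: str       # "atomic", or a node kind such as "element"
    type_name: str  # expanded QName for atomic items, "" for nodes (assumption)
    value: str      # string value for atomic items, XML text for nodes


def dump(seq: List[Item]) -> str:
    # JSON is only a placeholder carrier for this sketch.
    return json.dumps([asdict(item) for item in seq])


def load(text: str) -> List[Item]:
    return [Item(**record) for record in json.loads(text)]


seq = [
    Item("atomic", "{%s}integer" % XS, "1"),
    Item("atomic", "{%s}string" % XS, "1"),
    Item("element", "", "<a><b/></a>"),
]

restored = load(dump(seq))
assert restored == seq              # count, order, type and value all survive
assert restored[0] != restored[1]   # same string value, different atomic type
```

The first two items have identical string values and differ only in type, which is exactly the distinction that normalization discards.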
==XDM Information NOT preserved==
- Ancestor information. A serialized XDM node will NOT maintain information about its ancestors. For example, if a node $a is serialized then $a/.. is NOT maintained.
- Serialized XDM will not retain information about node identity: that is, the recipient of the serialized XDM will not be able to determine whether two serialized elements originated from the same node or merely from two nodes that were deep-equal to each other.
- Schema and type information. XDM Serialization does NOT transfer type **definitions**. The consumer of the serialized XDM is assumed to have access to the same schema as the producer of the serialized XDM: that is, a QName identifying a type is assumed to have the same meaning to both the producer and consumer.
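The first two losses can be demonstrated with any XML library. The sketch below uses Python's standard ##xml.etree.ElementTree## as a stand-in; it is not an XDM implementation, and serializing each item independently is an assumption about how a producer might behave, not something this proposal specifies.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<root><a><b/></a></root>")
a = doc.find("a")

# A sequence that holds the very same node twice.
seq = [a, a]
assert seq[0] is seq[1]

# Serialize each item on its own, then rebuild the sequence from the text.
texts = [ET.tostring(node, encoding="unicode") for node in seq]
restored = [ET.fromstring(text) for text in texts]

# Descendants survive the round trip ...
assert restored[0].find("b") is not None

# ... but the ancestor does not: "root" never appears in the serialized text.
assert "root" not in texts[0]

# Node identity is lost: the recipient sees two separate nodes that merely
# serialize identically, and cannot tell they came from one original node.
assert restored[0] is not restored[1]
assert texts[0] == texts[1]
```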
=====Use Cases=====
Use cases are concrete examples that demonstrate the goals.
- [[XDMSerializeUseCase1 Use Case 1]] Exchange of XDM data between XDM Tools in different environments
- [[XDMSerializeUseCase2 Use Case 2]] Exchange of XDM data between XDM Tools from different vendors in the same environment
- [[XDMSerializeUseCase3 Use Case 3]] Exchange of XDM data between XDM Tools from the same vendor in the same environment where it is difficult to preserve the vendor's native data structure
- [[XDMSerializeUseCase4 Use Case 4]] Exchange of XDM data between XDM Tools and tools which are not XDM capable.
- [[XDMSerializeUseCase5 Use Case 5]] Provide a human readable output of XDM data
- [[XDMSerializeUseCase6 Use Case 6]] Output of XDM data from test cases using XDM Tools for the purposes of validation and comparison
- [[XDMSerializeUseCase7 Use Case 7]] Standardization of a format for use in XML Pipeline Processors so that steps can be implemented by different vendors or in different languages.
====Serialization Format====
Real Soon Now !