Revision [104]
Last edited on 2011-01-01 08:08:16 by DavidLee
Additions:
==Expected Results==
Assume that write operations are atomic at the level of items (by file locking, buffering, or other mechanisms).
In the presence of any supported file operations (performed by the actors "A file manager" and "An XDM Producer"):
- Producers should be able to write or append one or more items without knowledge of the state of the data.
- Consumers should be able to read one or more items from any file without knowledge of the relative position of that file in the overall sequence.
- Some operations MAY change the ordering of items in the sequence. The application determines whether this is acceptable and, if desired, how to restrict the operations which change ordering.
- With the exception of item ordering in sequences, all operations result in individual files (or in-memory streams) which are themselves valid in the XDM serialization format.
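The atomicity assumption above could be approximated by buffering a whole item and handing it to the operating system in a single append-mode write. The following is a minimal sketch under that assumption; the function name `append_atomically` is hypothetical, the item is taken to be already serialized to bytes, and whether a single write is truly atomic depends on the operating system and file system.

```python
import os
import tempfile

def append_atomically(path: str, item: bytes) -> None:
    """Buffer a whole item, then pass it to the OS in one append-mode write."""
    # O_BINARY only exists on Windows; it is 0 elsewhere.
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o644)
    try:
        written = os.write(fd, item)
        if written != len(item):
            # Assumption: a short write means the item is incomplete on disk.
            raise OSError("short write; item may be incomplete")
    finally:
        os.close(fd)

# Usage: the call is the same for a missing, empty, or non-empty file.
with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "items.dat")
    append_atomically(target, b"first item\n")
    append_atomically(target, b"second item\n")
    with open(target, "rb") as f:
        contents = f.read()
```

Opening the file for every item is slow but keeps the producer independent of file rotation: if the file was moved between two calls, the next call simply creates a new one.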
Revision [103]
Edited on 2011-01-01 07:43:05 by DavidLee
Additions:
======XDM Serialization Use Case 8======
=====Creating and reading expanding files, such as logging data.=====
Some processes require storing data incrementally over time. For example, a web server stores access logs for each HTTP request. Long-running processes may run indefinitely, so at any time (between writing each item) the file should be syntactically valid so that it can be read without waiting for the process to terminate. For simplicity and efficiency, these files should be able to have new items written without rewriting any data. This implies there should be no "end markers" which need to be removed in order to append new data.
Files generated by long-lived processes may become indefinitely large, so typically a file rotation scheme is used. The side effect is that the producer is not in control of when a file is moved and needs to be able to append to empty files. A file or stream should be able to be appended to with no knowledge of its state; appending to an existing file or stream should be the same as writing to an empty file or stream.
Parsing these files requires that the format is valid and consistent regardless of the number of items present and without knowledge of which item in the original sequence is the first item in the file. The first item (or file) may not be available at all.
These uses imply the following constraints on a serialization format:
- No file end markers. There should not be any markers or tokens which are required to indicate the end of the sequence. A file EOF or a stream content-length should be sufficient to indicate when to stop processing. For in-memory storage, a stream or string format which allows for incremental appending and which provides a length method should be sufficient.
- No file start markers. There should be no markers, tokens, or other file headers required. The file or stream should be parseable at the start of any item as long as an entire item is present.
- No position-dependent format. The format for serializing a single item should not differ depending on the position of that item, that is, whether it is the first, last, or a middle item.
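The three constraints above can be illustrated with a small sketch. The framing used here is purely hypothetical and is not defined by this use case: each item is written as an ASCII byte count, a newline, the payload, and a trailing newline. The names `append_item` and `read_items` are likewise assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical framing (an assumption): b"<byte count>\n<payload>\n" per item.
# There is no file header and no end marker, and every item looks the same
# regardless of its position.

def append_item(path, payload: bytes) -> None:
    """Append one item; identical whether the file is new, empty, or not."""
    with open(path, "ab") as f:
        f.write(str(len(payload)).encode("ascii") + b"\n" + payload + b"\n")

def read_items(path):
    """Yield every complete item; stop at EOF or at a partial trailing item."""
    with open(path, "rb") as f:
        while True:
            header = f.readline()
            if not header.endswith(b"\n"):
                return  # EOF, or a header that is still being written
            size = int(header.decode("ascii"))
            payload = f.read(size)
            if len(payload) < size or f.read(1) != b"\n":
                return  # the last item is still being written
            yield payload

# Usage: the file is rotated without telling the producer, which keeps
# appending with exactly the same call.
with tempfile.TemporaryDirectory() as directory:
    log = os.path.join(directory, "access.log")
    append_item(log, b"<hit n='1'/>")
    append_item(log, b"<hit n='2'/>")
    before_rotation = list(read_items(log))
    os.rename(log, log + ".1")
    append_item(log, b"<hit n='3'/>")
    after_rotation = list(read_items(log))
```

The consumer relies only on EOF to know when to stop, and the rotated file and the new file are each readable on their own.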
==An XDM Producer==
A process (producer) which produces XDM data incrementally. The process may be long-lived, and the data may be output at any time.
==An XDM Consumer==
A process (consumer) which reads the data output by the producer. The consumer may wish to read the data at any time and should be able to read every complete item in the sequence. The consumer may not know the state of the data, that is, whether or not it begins at the original start of the data.
==A file manager==
Data stored in files (or in memory) from a long-lived process may grow indefinitely without management.
A file manager may be used to provide file (or memory) manipulations to manage the size and number of files.
These operations may include:
- Moving a file. A file may be moved at any time without cooperation from the producer.
- Truncating a file. A file may be reduced in size by either truncating from the end or the beginning.
- Deleting files. Either the current active file or previously moved files may be deleted and become unavailable.
- Splitting files. A large file may be split into smaller files. Splitting should only occur between items in a sequence.
- Concatenating files. Multiple files, whether from a single producer, from multiple producers, or from non-adjacent output of the same producer, may be concatenated, and the result should remain a syntactically correct file.
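As an illustration of the splitting and concatenating operations listed above, here is a minimal sketch. It assumes a hypothetical framing that this use case does not define: each item is an ASCII byte count, a newline, the payload, and a trailing newline. The function names are also assumptions.

```python
import io

# Hypothetical framing (an assumption): b"<byte count>\n<payload>\n" per item,
# with no header or footer.

def frame(payload: bytes) -> bytes:
    """Serialize one item in the hypothetical framing."""
    return str(len(payload)).encode("ascii") + b"\n" + payload + b"\n"

def parse(data: bytes) -> list:
    """Return the payloads of all complete items in data."""
    items, stream = [], io.BytesIO(data)
    while True:
        header = stream.readline()
        if not header.endswith(b"\n"):
            return items
        size = int(header.decode("ascii"))
        payload = stream.read(size)
        if len(payload) < size or stream.read(1) != b"\n":
            return items
        items.append(payload)

def concatenate(*files: bytes) -> bytes:
    """Plain byte concatenation; there are no markers to strip or merge."""
    return b"".join(files)

def split_after(data: bytes, count: int):
    """Split between items. Assumes data holds at least `count` complete items."""
    stream = io.BytesIO(data)
    for _ in range(count):
        size = int(stream.readline().decode("ascii"))
        stream.seek(size + 1, io.SEEK_CUR)  # skip payload and its newline
    cut = stream.tell()
    return data[:cut], data[cut:]

# Usage: output from two producers is merged, then split again elsewhere.
producer_a = frame(b"<a/>") + frame(b"<b/>")
producer_b = frame(b"<c/>")
merged = concatenate(producer_a, producer_b)
head, tail = split_after(merged, 1)
```

Truncating from the beginning at an item boundary is the same as keeping only `tail`. Every result (`merged`, `head`, `tail`) can be read by `parse` on its own.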
===Nice to Have===
The consumer may wish to skip over items without incurring the penalty of fully parsing them.
A file manager needs to be able to identify items so that files may be split and concatenated. Ideally this should be possible without having to parse each item. For example, sequences of large document nodes should be able to be split into individual files without having to do a full XML parsing of each document.
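One way item boundaries could be found without parsing each item is a length prefix. The sketch below assumes a hypothetical framing that this use case does not specify (an ASCII byte count, a newline, the payload, a trailing newline); `item_spans` and `read_nth` are illustrative names only.

```python
import io

# Hypothetical framing (an assumption): b"<byte count>\n<payload>\n" per item.

def frame(payload: bytes) -> bytes:
    """Serialize one item in the hypothetical framing."""
    return str(len(payload)).encode("ascii") + b"\n" + payload + b"\n"

def item_spans(stream):
    """Yield (offset, size) of each complete payload, reading only headers."""
    end_of_data = stream.seek(0, io.SEEK_END)
    stream.seek(0)
    while True:
        header = stream.readline()
        if not header.endswith(b"\n"):
            return
        size = int(header.decode("ascii"))
        start = stream.tell()
        if start + size + 1 > end_of_data:
            return  # the trailing item is incomplete
        yield start, size
        stream.seek(start + size + 1)  # jump over payload and its newline

def read_nth(stream, n: int) -> bytes:
    """Return payload number n (0-based) without reading earlier payloads."""
    for index, (start, size) in enumerate(item_spans(stream)):
        if index == n:
            stream.seek(start)
            return stream.read(size)
    raise IndexError("fewer than %d complete items" % (n + 1))

# Usage: a large document node is skipped to reach a small item behind it.
large_document = b"<doc>" + b"x" * 1000 + b"</doc>"
data = frame(large_document) + frame(b"<small/>")
second = read_nth(io.BytesIO(data), 1)
spans = list(item_spans(io.BytesIO(data)))
```

The payload bytes of the large document are never examined, so no XML parsing is needed to locate, skip, or split items. A file manager could cut the file at any offset derived from `spans`.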
Deletions:
=====Standardization of a format for use in XML Pipeline Processors so that steps can be implemented by different vendors or in different languages.=====
XML Pipeline Processors such as [[http://xproc.org/ XProc]] and [[http://www.xmlsh.org xmlsh]] pass XDM values between 'steps' in the pipeline. XProc, for example, passes 'sequence of documents' between steps, as well as input and output to the XProc pipeline as a whole.
Since there is no standardized format for representing 'sequence of documents' (which is a subset of XDM) as well as XDM in general, implementations of XProc must decide on proprietary formats for these. The result is that there is neither a standard format for inputting or outputting data or for how the data is formatted between steps. This means that there is no vendor compatibility for either using pipeline processes or for implementing steps in pipeline processors.
If a vendor wishes to implement an extension step (or even an implementation to be used for a predefined step) for an XProc or other pipeline processor, they need to implement vendor-specific interfaces in order to be integrated into the pipeline processor.
If a developer or user wishes to integrate with XProc (or another XML pipeline processor), there is no standard way of supplying input or consuming output, so each vendor's implementation must be integrated differently.
Note that integrating *to* an XML Pipeline processor is equivalent to [[XDMSerializeUseCase1 Use Case 1]] and [[XDMSerializeUseCase2 Use Case 2]], so this use case is specific to implementing and integrating "steps" within a pipeline processor.
==XML Pipeline Processor==
An XML Pipeline Processor coordinates an XML Transformation between "Steps". Each step can have XDM values (or a subset of XDM Values) input and output from that step.
==XML Pipeline Processor Extension Step==
Developers of an XML Pipeline Processor Extension Step need to be able to consume and produce XDM types (or a subset of XDM Types). If these are represented in a standard format then step developers could produce steps which work in multiple implementations of Pipeline processors.
For example a producer of 'Validate with Schematron' should be able to write an XProc step that is usable within multiple vendors' XProc pipeline implementations.
====Expected Result====
A developer of an XML Pipeline Processor Extension Step should be able to write the step using non-proprietary interfaces so it can be reused in multiple vendors' implementations of the XML Pipeline Processor.