XDM Serialization Use Case 8
Creating and reading expanding files, such as logging data.
Some processes need to store data incrementally over time. For example, a web server writes an access log entry for each HTTP request. A long-running process may run indefinitely, so at any time (between writing items) the file should be syntactically valid and readable without waiting for the process to terminate. For simplicity and efficiency, it should be possible to append new items without rewriting any existing data. This implies that there should be no "end markers" which must be removed before new data can be appended.
Files generated by long-lived processes may grow indefinitely large, so a file rotation scheme is typically used. A side effect is that the producer does not control when a file is moved and must be able to append to an empty file. It should be possible to append to a file or stream with no knowledge of its state; appending to an existing file or stream should be the same as writing to an empty file or stream.
Parsing these files requires that the format be valid and consistent regardless of the number of items present, and without knowledge of which item in the original sequence is the first item in the file. The first item (or file) may not be available at all.
These uses imply the following constraints on a serialization format:
- No file end markers. There should not be any markers or tokens which are required to indicate the end of the sequence. A file EOF or a stream content-length should be sufficient to indicate when to stop processing. For in-memory storage, a stream or string format which allows for incremental appending and which provides a length method should be sufficient.
- No file start markers. There should be no markers, tokens, or other file headers required. The file or stream should be parseable at the start of any item as long as an entire item is present.
- The format for serializing a single item should not differ depending on the position of that item, that is whether it is the first, last, or a middle item.
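One way to picture these constraints is a minimal sketch in Python. It assumes a hypothetical framing in which every item is written as a 4-byte big-endian length followed by the item's bytes. This framing is an illustration only and is not a proposed XDM serialization; the names `append_item`, `read_items`, and `demo`, and the sample payloads, are invented for the example.

```python
import os
import struct
import tempfile

# Hypothetical framing (an assumption for this sketch): 4-byte big-endian length prefix.
HEADER = struct.Struct(">I")


def append_item(path, payload: bytes) -> None:
    """Append one item. The bytes written are the same whether the file is
    missing, empty, or already holds items: no start or end marker is used."""
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload)) + payload)


def read_items(path):
    """Yield every complete item. End of file is the only end-of-sequence signal."""
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                return  # clean EOF, or an incomplete trailing header
            (length,) = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length:
                return  # trailing item not completely written yet
            yield payload


def demo():
    """Read the file between two appends; it is valid at both points."""
    path = os.path.join(tempfile.mkdtemp(), "access.log")
    append_item(path, b"<hit url='/a'/>")
    first = list(read_items(path))
    append_item(path, b"<hit url='/b'/>")
    return first, list(read_items(path))
```

Because each item is encoded the same way regardless of position, a reader can start at the beginning of any item, and the producer never needs to inspect the file before appending.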
Actors
An XDM Producer
A process (producer) which produces XDM data incrementally. The process may be long lived and may output data at any time.
An XDM Consumer
A process (consumer) which reads the data output by the producer. The consumer may wish to read the data at any time and should be able to read every complete item in the sequence. The consumer may not know the state of the data, that is, whether or not it begins at the original start of the data.
A file manager
Data stored in files (or in memory) by a long-lived process may grow indefinitely without management. A file manager may be used to perform file (or memory) manipulations that manage the size and number of files.
These operations may include:
- Moving a file. A file may be moved at any time without cooperation from the producer.
- Truncating a file. A file may be reduced in size by either truncating from the end or the beginning.
- Deleting files. Either the current active file or previously moved files may be deleted and become unavailable.
- Splitting files. A large file may be split into smaller files. Splitting should only occur between items in a sequence.
- Concatenating files. Multiple files, whether from a single producer, from multiple producers, or from non-adjacent output of the same producer, may be concatenated, and the result should remain a syntactically correct file.
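The splitting and concatenating operations can be sketched under the same kind of assumption: a hypothetical framing where each item is a 4-byte big-endian length followed by its bytes. This is not an XDM serialization, and the helper names `frame`, `item_offsets`, `split`, and `decode` are invented for illustration.

```python
import struct

# Hypothetical framing (an assumption for this sketch): 4-byte big-endian length prefix.
HEADER = struct.Struct(">I")


def frame(payload: bytes) -> bytes:
    """Encode one item; the encoding does not depend on the item's position."""
    return HEADER.pack(len(payload)) + payload


def item_offsets(data: bytes):
    """Return the start offset of each complete item, followed by the end offset
    of the last complete item. Only length prefixes are inspected."""
    offsets, pos = [], 0
    while pos + HEADER.size <= len(data):
        (length,) = HEADER.unpack_from(data, pos)
        end = pos + HEADER.size + length
        if end > len(data):
            break  # incomplete trailing item
        offsets.append(pos)
        pos = end
    offsets.append(pos)
    return offsets


def split(data: bytes, items_per_file: int):
    """Split only at item boundaries, at most items_per_file items per piece."""
    offsets = item_offsets(data)
    cuts = offsets[:-1][::items_per_file] + [offsets[-1]]
    return [data[a:b] for a, b in zip(cuts, cuts[1:])]


def decode(data: bytes):
    """Return the payloads of all complete items in data."""
    offsets = item_offsets(data)
    return [data[a + HEADER.size:b] for a, b in zip(offsets, offsets[1:])]
```

With such a framing, concatenation is plain byte concatenation (`parts[2] + parts[0]` is itself a valid sequence, although the item order changes), and truncating from the beginning amounts to slicing at one of the offsets returned by `item_offsets`.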
Nice to Have
The consumer may wish to skip over items without incurring the penalty of fully parsing them.
A file manager needs to be able to identify items so that files may be split and concatenated. Ideally this should be possible without having to parse each item. For example, it should be possible to split a sequence of large document nodes into individual files without a full XML parse of each document.
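As a rough illustration of skipping without parsing, the sketch below assumes a hypothetical framing in which each item carries a 4-byte big-endian length prefix, so a reader can seek past an item's bytes without looking at them. The framing and the names `skip_items` and `read_item` are assumptions made for this example, not part of any XDM serialization.

```python
import io
import struct

# Hypothetical framing (an assumption for this sketch): 4-byte big-endian length prefix.
HEADER = struct.Struct(">I")


def skip_items(stream, count: int) -> int:
    """Advance past up to count items, reading only their length prefixes.
    Returns the number of items skipped. Assumes items are written atomically,
    so every length prefix is followed by a complete item."""
    skipped = 0
    while skipped < count:
        start = stream.tell()
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            stream.seek(start)  # no further item; restore the position
            break
        (length,) = HEADER.unpack(header)
        stream.seek(length, io.SEEK_CUR)  # jump over the payload unread
        skipped += 1
    return skipped


def read_item(stream):
    """Return the next item's bytes, or None when no complete item remains."""
    header = stream.read(HEADER.size)
    if len(header) < HEADER.size:
        return None
    (length,) = HEADER.unpack(header)
    payload = stream.read(length)
    return payload if len(payload) == length else None
```

A consumer interested only in the third item of a seekable stream could call `skip_items(stream, 2)` and then `read_item(stream)`; the first two payloads, however large, are never read or parsed.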
Expected Results
Assume that write operations are atomic at the level of items (through file locking, buffering, or other mechanisms).
In the presence of any supported file operations (performed by the actors "A file manager" and "An XDM Producer"):
- Producers should be able to write or append one or more items without knowledge of the state of the data.
- Consumers should be able to read one or more items from any file without knowledge of the relative position of that file in the overall sequence.
- Some operations MAY change the ordering of items in the sequence. The application determines whether this is acceptable and, if desired, how to restrict the operations which change ordering.
- With the exception of item ordering in sequences, all operations result in individual files (or in-memory streams) which are themselves valid in the XDM serialization format.