README
======

Much of this content I learned from
http://www.ibm.com/developerworks/xml/library/x-matters17.html, and
Dave Mertz is also the one who introduced me to PYX. The authoritative
source, and inventor of this format, is Sean McGrath. His article is
here: http://www.xml.com/pub/a/2000/03/15/feature/index.html. You may
also want to browse a copy of "McGrath - XML Processing with Python,
Prentice Hall, 2000".

The idea behind PYX is to make xml files easily digestible by common
Unix textutils like grep, sed, awk, diff etc. The format itself owes
ideas to SGML's ESIS standard.

This is how the format works: each line has a prefix character that
determines the content type of the line. Start tags, end tags, and
individual attributes each get their own line, to facilitate easy
line-based processing. The prefix characters are:

(   start-tag
)   end-tag
A   attribute
-   character data (content)
?   processing instruction

So this xml snippet

blah

looks like this in PYX:

({urn:oasis:names:tc:opendocument:xmlns:office:1.0}document-content
A{urn:oasis:names:tc:opendocument:xmlns:office:1.0}version 1.2
A{http://www.w3.org/2003/g/data-view#}transformation http://docs.oasis-open.org/office/1.2/xslt/odf2rdf.xsl
-\n
- 
({urn:oasis:names:tc:opendocument:xmlns:office:1.0}body
-\n
- 
({urn:oasis:names:tc:opendocument:xmlns:office:1.0}presentation
-\n
- 
({urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}page
-\n
- blah
-\n
- 
){urn:oasis:names:tc:opendocument:xmlns:drawing:1.0}page
-\n
- 
){urn:oasis:names:tc:opendocument:xmlns:office:1.0}presentation
-\n
- 
){urn:oasis:names:tc:opendocument:xmlns:office:1.0}body
-\n
){urn:oasis:names:tc:opendocument:xmlns:office:1.0}document-content

Converting that back to xml via "pyx2xml -ns" yields this:

blah

You will notice that the conversion is not lossless in terms of octets
produced, but it should preserve the xml info set reasonably
faithfully.

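To make the line format described above more concrete, here is a
minimal sketch of an xml-to-PYX converter built on Python's xml.sax in
namespace mode. It is not the code of xml2pyx.py, only an illustration:
the names PyxWriter, clark, escape and xml_to_pyx are made up for this
example, and escaping backslashes and tabs (in addition to newlines) is
an assumption about how the records are kept on one line.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


def clark(name):
    """Format a SAX (uri, localname) pair as {uri}localname."""
    uri, local = name
    return "{%s}%s" % (uri, local) if uri else local


def escape(text):
    """Keep each record on one line (assumed escaping rules)."""
    return (text.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("\t", "\\t"))


class PyxWriter(ContentHandler):
    """Hypothetical handler that writes one PYX line per SAX event."""

    def __init__(self, out):
        super().__init__()
        self.out = out

    def startElementNS(self, name, qname, attrs):
        # "(" line for the start tag, then one "A" line per attribute.
        self.out.write("(%s\n" % clark(name))
        for attr_name, value in attrs.items():
            self.out.write("A%s %s\n" % (clark(attr_name), escape(value)))

    def endElementNS(self, name, qname):
        self.out.write(")%s\n" % clark(name))

    def characters(self, content):
        # The parser may deliver text in several chunks; each chunk
        # becomes its own "-" line.
        self.out.write("-%s\n" % escape(content))

    def processingInstruction(self, target, data):
        self.out.write("?%s %s\n" % (target, escape(data)))


def xml_to_pyx(xml_text):
    """Return the PYX rendering of an xml string."""
    out = io.StringIO()
    parser = xml.sax.make_parser()
    # Namespace mode, so names arrive as (uri, localname) pairs.
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(PyxWriter(out))
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

For example, print(xml_to_pyx('<a xmlns="urn:x" k="v">hi</a>'), end="")
writes a "(" line for the element, an "A" line for the attribute, one
or more "-" lines for the text, and a ")" line for the end tag. Because
the output is purely line-based, it can be piped straight into grep or
diff.
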
One noticeable gap is that conforming SAX parsers do not report the
xml prologue, so it is not converted to PYX. Instead, pyx2xml.py
regenerates it from scratch, thus losing e.g. extra standalone
attributes.

INVOKING
========

xml2pyx.py : will convert xml file to pyx, on stdout

pyx2xml.py [-ns]: will read pyx data from stdin, and output xml to
stdout. Without the -ns flag, it operates as a pure filter and outputs
lines as they come in. *With* the -ns flag, it stores the entire xml
file in memory, to later add all used namespaces to the first element.

MISC
====

Feedback to

Have fun hacking,

Thorsten