[xsd-users] Enumerated strings parser generation

Chris Gagneraud chgans at googlemail.com
Wed Nov 2 17:52:55 EDT 2016


On 3 November 2016 at 04:38, Boris Kolpackov <boris at codesynthesis.com> wrote:
> Hi Chris,
>
> Chris Gagneraud <chgans at googlemail.com> writes:
>
>> I understand the difference b/w cxx-tree and cxx-parser, but in this
>> case, is it possible to hit a middle ground?
>

Hi Boris,

> The intended use of C++/Parser is (a) simple XML vocabularies and (b)
> existing object models to parse them into. Your case doesn't fit either
> of the two. Why not use C++/Tree?

C++/Tree does in-memory parsing of the whole document, which is a
no-go for me as some XML files can be really huge (I got some segfaults
on Windows due to that), plus I liked the idea of having my own object
model. Although right now my object model mimics the XML structure
very closely, I'm planning to change that partially once I get
accustomed to the domain model.
To get started, I might switch back to C++/Tree and limit myself to
"not-too-big" files, and then implement a second stage "translator"
that would convert the XSD/CXX structures to a more
domain-friendly model...
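
A minimal sketch of what such a second-stage "translator" could look
like. Everything here is hypothetical: the xml_model structs only stand
in for generated classes (real C++/Tree classes expose accessor
functions rather than public fields), and the domain types, the
"input"/"output" strings and translate() are invented for illustration.

```cpp
#include <string>
#include <vector>

// Stage 1 stand-ins: hypothetical structs shaped like the XML. Plain
// structs are used only to keep the sketch self-contained.
namespace xml_model {
struct Pin {
    std::string name;
    std::string direction;  // raw enumerated string from the document
};
struct Component {
    std::string ref;
    std::vector<Pin> pins;
};
}  // namespace xml_model

// Stage 2: a hypothetical domain-friendly model, free to diverge from
// the XML layout (here pins are grouped by direction).
namespace domain {
struct Component {
    std::string reference;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};
}  // namespace domain

// The "translator": a pure function from one model to the other.
inline domain::Component translate(const xml_model::Component& src) {
    domain::Component dst;
    dst.reference = src.ref;
    for (const auto& pin : src.pins) {
        if (pin.direction == "input")
            dst.inputs.push_back(pin.name);
        else if (pin.direction == "output")
            dst.outputs.push_back(pin.name);
        // Any other direction is silently dropped in this sketch.
    }
    return dst;
}
```

Keeping the translation in free functions like this means the domain
model never depends on the generated code, so the first stage can be
swapped for a streaming parser later without touching it.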

After re-reading the introductory paragraph of
http://www.codesynthesis.com/products/xsd/c++/parser/, it's not
obvious at all that the intended use is for simple vocabularies; the
text emphasizes memory usage and general performance.
Well, a definite hint is the type-map, which in my case would get
quite big (haven't started yet, but just looking at my object model,
it's clear that I will have to provide a lot of hints in that file).

To give you a bit more context, another problem I have to face is that
the existing software that generates these XML files is not always XSD
compliant; not by much, but still. My workaround so far has been to
modify the XSD to relax it a bit (based on parsing failures as they
come), and then implement a second pass to detect minor compliance
errors and issue warnings/errors.
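
A minimal sketch of that second pass for one enumerated string,
assuming the relaxed XSD now types the value as a plain string. The
Polarity enum, the Diagnostic struct and parse_polarity() are
hypothetical names, not part of XSD/CXX: an exact match is accepted
silently, a match after trimming and case folding yields a warning,
and anything else yields an error.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Hypothetical enum standing in for one of the schema's enumerations.
enum class Polarity { Positive, Negative };

// Collected by the second pass instead of aborting the parse.
struct Diagnostic {
    enum class Severity { Warning, Error };
    Severity severity;
    std::string message;
};

// Strip surrounding whitespace and upper-case the text.
inline std::string normalize(std::string s) {
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    s.erase(s.begin(), std::find_if(s.begin(), s.end(), not_space));
    s.erase(std::find_if(s.rbegin(), s.rend(), not_space).base(), s.end());
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return s;
}

// Returns true and sets `out` when the value is usable; appends a
// warning for near misses and an error for unknown values.
inline bool parse_polarity(const std::string& raw, Polarity& out,
                           std::vector<Diagnostic>& diags) {
    static const std::map<std::string, Polarity> table = {
        {"POSITIVE", Polarity::Positive},
        {"NEGATIVE", Polarity::Negative}};

    auto it = table.find(raw);  // strictly compliant spelling
    if (it != table.end()) {
        out = it->second;
        return true;
    }
    it = table.find(normalize(raw));  // minor compliance issue
    if (it != table.end()) {
        diags.push_back({Diagnostic::Severity::Warning,
                         "non-canonical value '" + raw + "'"});
        out = it->second;
        return true;
    }
    diags.push_back({Diagnostic::Severity::Error,
                     "unknown value '" + raw + "'"});
    return false;
}
```

With 100+ enums the lookup table is the only part that differs per
type, so the function body could be turned into a template or
generated from the schema.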

So to summarise my requirements:
- 1. Big XSD (100+ enums, hundreds of complex types, XSD/Parser
generates 80k SLOC of skeleton code)
- 2. Can handle big/huge XML files (250+ MB)
- 3. Can cope with minor XSD compliance issues
- 4. Ideally Qt friendly (QString, QList, QVariant, QByteArray, ...)
- 5. Support for partial parsing (I will want at some point to do
partial parsing, as the XML document contains a huge amount of data
for the full spectrum of use cases, and the end user might want to use
the XML file for just a sub-domain.)

'1' is the reason I turned to XSD/CXX instead of my own parser (I was
using Qt/XML: too much work, and so too much risk of a buggy
implementation).
'2' is a definite blocker: as of 2016 a 250MB XML file is a "big guy",
but the trend is for even bigger files as technology keeps evolving.
'3' as already stated could be implemented by careful XSD
modification; maybe I could catch exceptions too, but I don't really
like that, so I have to think again about that one.
'4' is not a big deal really, just a nice-to-have.
'5' My understanding is that it is easily doable with XSD/CXX.

Maybe I should give XSD/e a go?

Thanks for writing this great piece of SW and for making it available
to the open-source community.
Chris

>
> Boris


