From decoincy.thibault at gmail.com Thu Sep 8 09:32:44 2022
From: decoincy.thibault at gmail.com (Thibault de COINCY)
Date: Fri Sep 9 03:48:05 2022
Subject: [studxml-users] Fastest way to count the number of elements with a specific name or at a specific depth?
Message-ID:

Hi

What is the fastest way to count the number of elements with a specific
name or at a specific depth?

I have an XML document like this and I'm trying to count the number of
brands:

text1 text4 ... text99 text100
text1 text5 ... text49 text50
...
text1 text2 ... text124 text125

I use something like this, but it is very slow:

int BrandCount = 0;
xml::parser parser(xml_stream, xml_path, xml::parser::receive_elements);
for (xml::parser::event_type event (parser.next());
     event != xml::parser::eof;
     event = parser.next()) {
  if (event == xml::parser::start_element) {
    // name() is a member function and the comparison needs ==, not =.
    if (parser.name() == "brand") {++BrandCount;}
  }
}

Is there a way to speed up the process?

Currently, the parser goes through every single element. Is it possible
to skip elements that are not "brand"?
Given that I'm not interested in the content of the elements but only in
the count, is there a way to optimize the parser somehow?

Many thanks!!
Thibault

From boris at codesynthesis.com Sun Sep 11 08:04:10 2022
From: boris at codesynthesis.com (Boris Kolpackov)
Date: Sun Sep 11 07:58:46 2022
Subject: [studxml-users] Fastest way to count the number of elements with a specific name or at a specific depth?
In-Reply-To:
References:
Message-ID:

Thibault de COINCY writes:

> What is the fastest way to count the number of elements with a specific
> name or at a specific depth?
>
> I use something like this, but it is very slow:
>
> int BrandCount = 0;
> xml::parser parser(xml_stream, xml_path, xml::parser::receive_elements);
> for (xml::parser::event_type event (parser.next()); event !=
> xml::parser::eof; event = parser.next()){
>   if (event == xml::parser::start_element) {
>     if (parser.name() == "brand") {++BrandCount;}
>   }
> }

Yes, this is more or less how I would do it.

> Is there a way to speed up the process?

Have you tried building everything with optimization enabled? I can see
how this would be slow in comparison with, say, a substring scan, but
unless you are parsing huge documents, this shouldn't matter much on
modern hardware.

> Currently, the parser goes through every single element, is it possible to
> skip elements that are not "brand"?
> Given I'm not interested in getting the content of the elements but only
> the count, is there a way to optimize the parser somehow?

Generally, it's impossible to "jump" over chunks of unparsed XML due to
its lexical structure, so any conforming parser will have to parse the
entire document. One could probably write a specialized parser which
will not bother with extracting and returning any data for the
uninteresting parts. But general-purpose XML parsers (like libstudxml)
normally assume that everything is of interest.
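
For illustration only, here is a minimal sketch of the "specialized
parser" idea mentioned above. It is not part of libstudxml and it is not
a conforming XML parser. The names count_tags, tag_counts and skip are
hypothetical. It assumes C++17, a well-formed document that is already
in memory, and a DOCTYPE without an internal subset. It compares element
names exactly as written (no namespace resolution) and extracts no
content. It still has to look at every tag, because depth can only be
known by tracking every start and end tag.

```cpp
#include <cstddef>
#include <string_view>

// How many start tags matched the name, and how many elements were
// opened at the requested depth (the root element is depth 1).
struct tag_counts
{
  std::size_t by_name = 0;
  std::size_t at_depth = 0;
};

// Markup that may contain '<' or '>' which must not be read as tags.
struct skip
{
  std::string_view open, close;
};

// Hypothetical helper: assumes well-formed, in-memory input.
tag_counts
count_tags (std::string_view xml, std::string_view name, std::size_t depth)
{
  const std::size_t npos = std::string_view::npos;
  const skip skips[] = {{"<!--", "-->"}, {"<![CDATA[", "]]>"}, {"<?", "?>"}};

  tag_counts r;
  std::size_t level = 0; // Number of currently open elements.
  std::size_t i = 0;

  while ((i = xml.find ('<', i)) != npos)
  {
    std::string_view rest (xml.substr (i));

    // Skip comments, CDATA sections and processing instructions whole.
    bool skipped = false;
    for (const skip& s : skips)
    {
      if (rest.substr (0, s.open.size ()) == s.open)
      {
        std::size_t e = xml.find (s.close, i + s.open.size ());
        if (e == npos) return r; // Truncated input: stop.
        i = e + s.close.size ();
        skipped = true;
        break;
      }
    }
    if (skipped) continue;

    // End tag or declaration such as DOCTYPE: no name to inspect.
    if (rest.size () > 1 && (rest[1] == '/' || rest[1] == '!'))
    {
      if (rest[1] == '/' && level != 0) --level;
      if ((i = xml.find ('>', i)) == npos) break;
      continue;
    }

    // Start tag: the name runs until whitespace, '/' or '>'.
    std::size_t b = i + 1, e = b;
    while (e < xml.size () && xml[e] != ' ' && xml[e] != '\t' &&
           xml[e] != '\n' && xml[e] != '\r' && xml[e] != '/' &&
           xml[e] != '>')
      ++e;

    // Find the closing '>', ignoring any '>' inside attribute values.
    char quote = 0;
    std::size_t j = e;
    for (; j < xml.size (); ++j)
    {
      char c = xml[j];
      if (quote != 0) { if (c == quote) quote = 0; }
      else if (c == '"' || c == '\'') quote = c;
      else if (c == '>') break;
    }
    if (j == xml.size ()) break; // Truncated input: stop.

    ++level;
    if (xml.substr (b, e - b) == name) ++r.by_name;
    if (level == depth) ++r.at_depth;
    if (xml[j - 1] == '/') --level; // Empty element, e.g. <brand/>.
    i = j;
  }

  return r;
}
```

Calling count_tags (doc, "brand", 2) returns both counts in one pass.
The same depth bookkeeping can be added to the libstudxml loop quoted
above by incrementing a counter on start_element and decrementing it on
end_element, provided the parser is set up to report end_element events.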