[xsde-users] Re: Error during serialization of the character 'ü'

Tue Dec 15 10:55:18 EST 2009

Hi Boris,

Thanks for your fast reply.

Quote:
Putting the  
Unicode value for 'ü' in a UTF-8 string (which is what you are doing)  
results in an invalid encoding since in UTF-8 such a value is only  
expected as part of a multi-byte sequence.  

Does this mean I would not have the problem if I build my project without the UNICODE preprocessor definition? Or, if I build my project with UNICODE, would it work if I convert the string to multibyte format using the C runtime function wcstombs?
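
A note on the wcstombs question: wcstombs converts to the multibyte encoding of the current C locale, which is not necessarily UTF-8, so its output may still be rejected. Below is a minimal sketch of a direct wchar_t-to-UTF-8 conversion. The names append_utf8, wide_to_utf8 and check_wide_to_utf8 are hypothetical helpers, not part of XSD/e or the Windows API, and the sketch assumes the wide string holds UTF-16 (as on Windows) or UTF-32 code units. It does not validate unpaired surrogates.

```cpp
#include <cassert>
#include <string>

// Append one Unicode code point to 'out' as UTF-8 (1 to 4 bytes).
static void append_utf8(std::string& out, unsigned long cp)
{
  if (cp < 0x80)
    out += static_cast<char>(cp);
  else if (cp < 0x800)
  {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else if (cp < 0x10000)
  {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  else
  {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical helper: convert a wide string to UTF-8.
// Surrogate pairs (16-bit wchar_t) are combined into one code point.
std::string wide_to_utf8(const std::wstring& in)
{
  std::string out;
  for (std::wstring::size_type i = 0; i < in.size(); ++i)
  {
    unsigned long cp = static_cast<unsigned long>(in[i]);
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size())
    {
      unsigned long lo = static_cast<unsigned long>(in[i + 1]);
      if (lo >= 0xDC00 && lo <= 0xDFFF)
      {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        ++i; // the low surrogate has been consumed
      }
    }
    append_utf8(out, cp);
  }
  return out;
}

// Self-check: U+00FC ('ü') must become the two bytes C3 BC.
void check_wide_to_utf8()
{
  assert(wide_to_utf8(L"\u00FC") == "\xC3\xBC");
}
```

On Windows, WideCharToMultiByte with the CP_UTF8 code page is another way to get the same result without hand-written conversion code.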

Quote:
We are planning to add support for ISO-8859-1 (where 'ü' is represented  
as its Unicode value) in addition to UTF-8 as the object model encoding   
(you will still be able to serialize in UTF-8). If you only need to   
support Western-European languages then this encoding might be a better  
choice since it will be easier to work with. Let me know if you would  
like to try it.  

Yes, I only need support for Western European languages.

Regards, Tom

  _____  

From: Boris Kolpackov [mailto:boris at codesynthesis.com]
To: Thomas Frenzel (TomSun) [mailto:tftomsun at streamteam.de]
Cc: xsde-users at codesynthesis.com
Sent: Tue, 15 Dec 2009 13:57:48 +0100
Subject: Re: Error during serialization of the character 'ü'

Hi Thomas,

  Thomas Frenzel (TomSun) <tftomsun at streamteam.de> writes:

  > When I try to log a message with the character 'ü' I get an 
  > xml_schema::serializer_exception with the text "illegal UTF-8 character" 
  > and what() returning "xml error".

  XSD/e expects all the text that you supply to it (e.g., in the object model
  or with the C++/Serializer mapping) to be in UTF-8. The proper encoding
  for the letter 'ü' in UTF-8 is the two-byte sequence "\xC3\xBC". Putting the
  Unicode value for 'ü' in a UTF-8 string (which is what you are doing)
  results in an invalid encoding since in UTF-8 such a value is only
  expected as part of a multi-byte sequence.
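
To illustrate the point above: in ISO-8859-1 each byte is its own Unicode value, so 'ü' is the single byte 0xFC, while UTF-8 needs two bytes for every value from 0x80 to 0xFF. The following is a minimal sketch of that conversion, assuming the input really is ISO-8859-1. The names latin1_to_utf8 and check_latin1_to_utf8 are hypothetical and not part of XSD/e.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: convert an ISO-8859-1 byte string to UTF-8.
// Bytes below 0x80 are copied unchanged; bytes 0x80-0xFF become a
// two-byte sequence (110000xx 10xxxxxx).
std::string latin1_to_utf8(const std::string& in)
{
  std::string out;
  out.reserve(in.size() * 2); // worst case: every byte expands to two
  for (std::string::size_type i = 0; i < in.size(); ++i)
  {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80)
      out += static_cast<char>(c);
    else
    {
      out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
      out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
    }
  }
  return out;
}

// Self-check: the single byte 0xFC must become the two bytes C3 BC.
void check_latin1_to_utf8()
{
  assert(latin1_to_utf8("\xFC") == "\xC3\xBC");
}
```

A string converted this way before it is handed to the serializer should no longer contain a lone 0xFC byte, which is the kind of byte the "illegal UTF-8 character" error points at.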

  We are planning to add support for ISO-8859-1 (where 'ü' is represented
  as its Unicode value) in addition to UTF-8 as the object model encoding 
  (you will still be able to serialize in UTF-8). If you only need to 
  support Western-European languages then this encoding might be a better
  choice since it will be easier to work with. Let me know if you would
  like to try it.

  Boris