[xsd-users] Re: Reg: Problem with xsd tool while using unicode...

Boris Kolpackov boris at codesynthesis.com
Fri Sep 29 08:43:13 EDT 2006


Hi Subhash,

In the future please send technical questions like this to the
xsd-users mailing list (which I've CC'ed) instead of to me directly.
This way (a) other developers who may have experienced a similar
problem can provide you with a solution and (b) questions and answers
will be archived and available to others with similar problems.


subhash.dixit at tcs.com <subhash.dixit at tcs.com> writes:

> Hi Boris,
> I am using the XSD tool for XML parsing on Solaris. It is working fine with
> UTF-8 encoding. It really helped us to develop our product. Now I want to
> integrate it with Unicode, but I am not quite sure how to go about it. I
> want to know whether the same XSD tool will work for UTF-16 encoding or
> not.
> I changed the encoding value from utf-8 to utf-16 in the xsd file
> manually and then tried to create the .cxx and .hxx files.
> The files were created with the warning:
>                   "CustDetails.xsd:1:41: warning: Encoding (utf-16, from
> XMLDecl or manually set) contradicts the auto-sensed encoding, ignoring
> it"
> After that, when I tried to create the auto pointer, I got a
> compilation error saying that it is not accessible from main.
> Can you please help me in this regard? Is there any solution for this
> problem? Does XSD support Unicode or UTF-16?
> I am sending you the xsd file.

I think you are confusing three things here that are not related in the
way you seem to expect them to be:

1. Encoding used in XML instance documents

   This is specified in the "XML declaration" at the beginning of
   XML documents, e.g.:

   <?xml version="1.0" encoding="UTF-16" ?>

   XSD can parse XML documents in a variety of encodings, including
   ASCII, UTF-8, UTF-16 and UTF-32, *regardless* (and this is important)
   of the encodings used for (2) and (3).


2. Encoding used in your XML Schema file

   This is specified in the XML declaration of your .xsd files (since
   an .xsd file is an XML file) and prescribes the encoding of the
   XML Schema document itself. Note that this encoding does not affect
   either (1) or (3), as you seem to assume.


3. Encoding used in your application

   This is specified via the --char-type option when compiling your
   schemas with XSD and, in some cases, by setting the appropriate
   default locale in your application.

   The two supported values for this option are 'char' and 'wchar_t'.
   The encoding for 'char' can be any encoding with 8-bit code units,
   such as UTF-8. For 'wchar_t' it is a little more complicated.
   Different platforms have different wchar_t sizes: some (e.g., VC++
   on Windows) have a 16-bit wchar_t and others (e.g., most UNIX C++
   compilers) have a 32-bit one. Where wchar_t is 16 bits wide it makes
   sense to use UTF-16. On systems with a 32-bit wchar_t you can choose
   to use either UTF-16 or UTF-32 (also known as UCS-4). If you choose
   UTF-16 then two bytes of every wchar_t element will be unused. The
   nice thing about UTF-32/UCS-4 is that it is a fixed-width encoding,
   i.e., each code point occupies exactly one 4-byte element, which is
   not the case with UTF-8 or UTF-16.
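
To make the size trade-off in (3) concrete, here is a minimal sketch in
plain C++ that counts how many code units a single code point needs in
each encoding and how many bytes go unused when UTF-16 code units are
stored in wchar_t. It is not part of XSD; the function names are made up
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if cp is a Unicode scalar value: at most U+10FFFF and not in
// the surrogate range U+D800..U+DFFF.
bool is_scalar_value (std::uint32_t cp)
{
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Number of 8-bit code units (bytes) cp occupies in UTF-8.
std::size_t utf8_units (std::uint32_t cp)
{
  assert (is_scalar_value (cp));
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Number of 16-bit code units cp occupies in UTF-16. Code points
// above U+FFFF are stored as a surrogate pair.
std::size_t utf16_units (std::uint32_t cp)
{
  assert (is_scalar_value (cp));
  return cp < 0x10000 ? 1 : 2;
}

// UTF-32 is fixed-width: always one 32-bit code unit.
std::size_t utf32_units (std::uint32_t cp)
{
  assert (is_scalar_value (cp));
  return 1;
}

// Bytes left unused in each wchar_t element when it holds a UTF-16
// code unit: 0 where wchar_t is 16 bits wide, 2 where it is 32 bits.
std::size_t unused_bytes_per_utf16_unit ()
{
  return sizeof (wchar_t) > 2 ? sizeof (wchar_t) - 2 : 0;
}
```

For example, U+20AC (the euro sign) takes three code units in UTF-8 and
one each in UTF-16 and UTF-32, while U+1F600 takes four, two and one
respectively.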


Now, let's consider an example. It does not really matter which encoding
we choose for (2) so we will ignore it here (I suggest that you stick to
ASCII or UTF-8).

Let's assume we've compiled our schemas with '--char-type char' and set
the default locale in our application to UTF-8:

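   #include <locale.h>

   int
   main (int argc, char* argv[])
   {
     // Select a UTF-8 locale for character handling. Locale names are
     // platform-specific; many UNIX systems expect a name such as
     // "en_US.UTF-8" instead.
     setlocale (LC_CTYPE, "UTF-8");

     ...
   }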

If we feed our application a UTF-16 encoded XML document then all the
content will be transcoded from UTF-16 to UTF-8 and our application
will be presented with the data in UTF-8. The same will happen for
any other encoding that an XML instance may use. In other words,
no matter what encoding an XML instance has, it will be transcoded
to the application-local encoding (UTF-8 in this example).
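
To illustrate what this transcoding step amounts to, here is a minimal
sketch of a UTF-16 to UTF-8 conversion in plain C++. It is only an
illustration of the idea, not the code XSD uses internally, and the
names utf16_to_utf8 and append_utf8 are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Append the UTF-8 encoding of one code point to out.
void append_utf8 (std::string& out, std::uint32_t cp)
{
  if (cp < 0x80)
    out += static_cast<char> (cp);
  else if (cp < 0x800)
  {
    out += static_cast<char> (0xC0 | (cp >> 6));
    out += static_cast<char> (0x80 | (cp & 0x3F));
  }
  else if (cp < 0x10000)
  {
    out += static_cast<char> (0xE0 | (cp >> 12));
    out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char> (0x80 | (cp & 0x3F));
  }
  else
  {
    out += static_cast<char> (0xF0 | (cp >> 18));
    out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char> (0x80 | (cp & 0x3F));
  }
}

// Transcode a UTF-16 string to UTF-8. Throws std::invalid_argument
// if the input contains an unpaired surrogate.
std::string utf16_to_utf8 (const std::u16string& in)
{
  std::string out;

  for (std::size_t i = 0; i < in.size (); ++i)
  {
    std::uint32_t cp = in[i];

    if (cp >= 0xD800 && cp <= 0xDBFF) // High surrogate.
    {
      if (i + 1 == in.size () || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
        throw std::invalid_argument ("unpaired high surrogate");

      // Combine the pair into a single code point above U+FFFF.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    }
    else if (cp >= 0xDC00 && cp <= 0xDFFF)
      throw std::invalid_argument ("unpaired low surrogate");

    append_utf8 (out, cp);
  }

  return out;
}
```

With this sketch, utf16_to_utf8 (u"\u20AC") yields the three bytes
E2 82 AC, which is what a 'char'-based application would see for a euro
sign that was stored as a single 16-bit unit in the UTF-16 document.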

It is also possible to write out XML documents in an encoding that
differs from the application-local one. For this, all serialization
functions have the 'encoding' argument. To put this all together, as
an example, you can read a UTF-16 XML document, manipulate it in UTF-8
in your application and write it out in UTF-32.
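
As a rough picture of the last leg of that pipeline, here is a minimal
sketch in plain C++ that turns application-local UTF-8 data into UTF-32
code points. In a real application the 'encoding' argument of the
serialization functions does this for you; the function name
utf8_to_utf32 is made up for this example and the code is not taken
from XSD.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 into UTF-32 code points. This checks only the byte
// structure (lead and continuation bytes); it does not reject
// overlong forms or encoded surrogates.
std::u32string utf8_to_utf32 (const std::string& in)
{
  std::u32string out;

  for (std::size_t i = 0; i < in.size (); )
  {
    unsigned char lead = static_cast<unsigned char> (in[i]);
    std::size_t len;
    char32_t cp;

    // The lead byte determines the sequence length and carries the
    // most significant bits of the code point.
    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else
      throw std::invalid_argument ("invalid UTF-8 lead byte");

    if (i + len > in.size ())
      throw std::invalid_argument ("truncated UTF-8 sequence");

    // Each continuation byte contributes six more bits.
    for (std::size_t j = 1; j < len; ++j)
    {
      unsigned char c = static_cast<unsigned char> (in[i + j]);

      if ((c & 0xC0) != 0x80)
        throw std::invalid_argument ("invalid UTF-8 continuation byte");

      cp = (cp << 6) | (c & 0x3F);
    }

    out += cp;
    i += len;
  }

  return out;
}
```

Each element of the result is one code point, which is the fixed-width
property of UTF-32 mentioned in (3).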

hth,
-boris


>
> <?xml version="1.0" encoding="utf-16" ?>
> <xs:schema id="CustInfo" elementFormDefault="qualified"
> xmlns="http://tempuri.org/CustInfo.xsd"
> xmlns:mstns="http://tempuri.org/CustInfo.xsd"
> xmlns:xs="http://www.w3.org/2001/XMLSchema">
>
>         <xs:element name="custEntry" >
>         <xs:complexType>
>         <xs:sequence>
>                 <xs:element name="msgID" type="xs:string" />
>                 <xs:element name="custId" type="xs:string"/>
>
>                 <xs:element name="custDetails" >
>                 <xs:complexType>
>                 <xs:sequence>
>                         <xs:element name="custFname" type="xs:string" />
>                         <xs:element name="custLname" type="xs:string" />
>                         <xs:element name="gender" type="xs:string" minOccurs="0"/>
>                         <xs:element name="addSeqNum" type="xs:string" />
>                         <xs:element name="country" type="xs:string" />
>                         <xs:element name="idSeqNum" type="xs:string" />
>                         <xs:element name="idNum" type="xs:string" />
>                         <xs:element name="idIssPlace" type="xs:string" />
>                 </xs:sequence>
>                 </xs:complexType>
>                 </xs:element>
>
>         </xs:sequence>
>         </xs:complexType>
>         </xs:element>
>
> </xs:schema>