Discussion:
UTF-8 and Windows
Robert Westlund
16 years ago
Hello,

I have a Linux program that creates a pspp input file containing UTF-8
characters. I then run that file through pspp 0.6.1 (also on Linux) to
create a SAV file. When I open this file in psppire on my Mac (US
English), it displays the Unicode characters perfectly, both in the
data and variables. However, opening this same file in psppire or
SPSS 16-17 on Windows XP or Vista displays gibberish for those
characters, regardless of the region I choose in the Regional control
panel. I've also tried adding the UTF-8 BOM to the beginning of the
file, with no success. With a BOM, the same data as CSV or Excel XLS
also works fine. If the "gibberish" is pasted into a text file and
saved as UTF-8, it displays correctly, which leads me to believe that
there is something that is causing Windows/SPSS/pspp to not interpret
the characters as UTF-8.
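
For readers who want to see the symptom in isolation, here is a minimal sketch of what happens when UTF-8 bytes are decoded with a Windows code page. It assumes the wrong code page is CP1252, which is only a guess about what Windows applies here; the function names are made up for illustration and are not part of pspp or SPSS.

```python
def garble(text: str, wrong_encoding: str = "cp1252") -> str:
    """Show how `text` looks when its UTF-8 bytes are read as `wrong_encoding`."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Undo garble(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "café"
garbled = garble(original)
print(garbled)          # cafÃ©
print(repair(garbled))  # café
```

If the gibberish in psppire looks like this (two odd characters per accented letter), the bytes in the file are probably intact UTF-8 and only the decoding step is wrong.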

I've searched the pspp archives and found that other people have had
issues with UTF-8. The answer is invariably that it depends on your
system locale, which, as I mention above, I tried changing. However,
Windows is not my specialty.

Can anyone tell me what I'm doing wrong, or why the same SAV file
works fine on the Mac but not Windows?

Thanks for any help you can provide.

Rob
michel
16 years ago
Hello Rob,

I suppose the Mac version of PSPP is 0.6.1, which is known to have problems
with non-ASCII characters.
As far as I remember, it doesn't work with such characters. Are you sure
that your file uses UTF-8?

You could try saving this file using another charset, such as ISO-8859-1
(used for Brazilian Portuguese)
or CP437 (which I think is the default charset on Windows).
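
A minimal sketch of such a re-encoding step, assuming the input really is valid UTF-8. The function name is hypothetical, and characters that the target charset cannot represent are replaced with "?", so check the result before relying on it.

```python
def reencode(data: bytes, target: str = "iso-8859-1") -> bytes:
    """Convert UTF-8 bytes to `target`, replacing unrepresentable characters with '?'."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError if the input is not UTF-8
    return text.encode(target, errors="replace")


utf8_bytes = "São Paulo".encode("utf-8")
latin1_bytes = reencode(utf8_bytes)
print(latin1_bytes)  # b'S\xe3o Paulo'
```

The decode step doubles as a check that the file is UTF-8 in the first place: it fails loudly if it is not.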

Also, as I remember, pspp 0.6.1 doesn't recognize the LANG variable, which
is used to change the encoding. At
least not on Windows, which uses relocatable.

You could try the development version of pspp on both Linux and
Windows, and see if this works
for you.

Best regards,

Michel
John Darrington
16 years ago
0.6.1 doesn't do a very good job of dealing with UTF-8 characters.
In particular, .SAV files aren't given the record which identifies
the encoding of the data.
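
As a rough illustration of what that record looks like, here is a sketch that scans a run of extension records for one naming the encoding. It rests on several assumptions that are not stated in this thread: that extension records have record type 7 followed by a subtype, an element size and an element count (all 32-bit integers), that the encoding record uses subtype 20, that the file is little-endian, and that the caller has already skipped the file header and variable records. The function names are hypothetical.

```python
import struct
from typing import Optional

EXTENSION_RECORD = 7    # assumed record type for extension records
ENCODING_SUBTYPE = 20   # assumed subtype of the character encoding record


def find_encoding(ext: bytes) -> Optional[str]:
    """Scan consecutive extension records; return the encoding name, or None."""
    pos = 0
    while pos + 16 <= len(ext):
        rec_type, subtype, size, count = struct.unpack_from("<4i", ext, pos)
        if rec_type != EXTENSION_RECORD:
            break  # reached some other record type; stop scanning
        pos += 16
        payload = ext[pos:pos + size * count]
        if subtype == ENCODING_SUBTYPE:
            return payload.decode("ascii", errors="replace").strip()
        pos += size * count
    return None


def make_record(subtype: int, payload: bytes) -> bytes:
    """Build a synthetic extension record with 1-byte elements (for testing only)."""
    return struct.pack("<4i", EXTENSION_RECORD, subtype, 1, len(payload)) + payload


with_encoding = make_record(3, b"\x00" * 32) + make_record(ENCODING_SUBTYPE, b"UTF-8")
without_encoding = make_record(3, b"\x00" * 32)
print(find_encoding(with_encoding))     # UTF-8
print(find_encoding(without_encoding))  # None
```

A file for which this returns None gives the reader nothing to go on, so it falls back to whatever the local system assumes.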

I suggest that you try the latest "master snapshot" from
http://pspp.benpfaff.org/~blp/ which stores strings internally as UTF-8
and correctly writes .SAV files with the encoding of the file.
Psppire should also display all characters correctly in that version.

J'
...
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.