Re: What is the character encoding of .nwctxt files?
Reply #3 –
Dear Rick and Bob,
Thanks for the help. Character encoding does indeed concern how all the characters, including the £ sign, are coded in bits. Not all the characters we want to encode these days fit in a single byte (0-255). Here's a link to an excellent Wikipedia article on the subject: http://en.wikipedia.org/wiki/Character_encoding
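To make the single-byte point concrete, here is a minimal Python 3 sketch. It is only an illustration and has nothing to do with the NWC tools themselves; the variable names are made up. It shows that the £ sign is stored as one byte in Latin-1 but as two bytes in UTF-8, and that decoding with the wrong codec does not give back the original character.

```python
# Illustration only: how the pound sign is stored under two common encodings.
pound = "\u00a3"  # the £ character

latin1_bytes = pound.encode("latin-1")  # a single byte
utf8_bytes = pound.encode("utf-8")      # two bytes

print(latin1_bytes)  # b'\xa3'
print(utf8_bytes)    # b'\xc2\xa3'

# Decoding UTF-8 bytes as Latin-1 yields two characters, not the original one.
garbled = utf8_bytes.decode("latin-1")
print(garbled == pound)  # False
```

This is the kind of mismatch that can make a file differ from the original after a read and write cycle.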
My purpose in asking the question was to track down a bug I was getting while testing a pair of Python-based programs for NWC2, nwctxt2xml.py and xml2nwctxt.py.
I recently converted from Python 2.5 to Python 3, which uses Unicode UTF-8 encoding. I have run a dozen "round trip" tests on the nwctxt files listed below, and all tested OK. Some of these tested bad using Python 2.6. My possibly mistaken conclusion is that nwctxt files use UTF-8 encoding.
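One way to probe the UTF-8 conclusion more directly is to check whether a file's raw bytes decode strictly as UTF-8. The sketch below is an assumption-laden illustration: the helper names decodes_as_utf8 and is_pure_ascii are hypothetical and are not part of nwctxt2xml.py. Note that a file containing only ASCII bytes decodes cleanly under UTF-8 and under most other encodings too, so only files with non-ASCII characters (such as £) really test the assumption.

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8 under strict decoding."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is below 128, so the encoding question is moot."""
    return all(b < 128 for b in data)


# A lone 0xA3 byte is how Latin-1 stores the pound sign; it is not valid UTF-8.
print(decodes_as_utf8(b"\xa3"))                  # False
print(decodes_as_utf8("\u00a3".encode("utf-8")))  # True
print(is_pure_ascii(b"plain text"))              # True
```

To apply this to a real file, read it in binary mode, for example with open(path, "rb"), and pass the bytes to these helpers.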
Here is a round trip test:
1. nwctxt2xml.py converts mySong.nwctxt to mySong.xml
2. xml2nwctxt.py converts mySong.xml to mySong-out.nwctxt
3. fileCompare mySong.nwctxt with mySong-out.nwctxt; if the result is "files are identical" -> OK
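The three steps above can be sketched as a small Python 3 harness. This is only a sketch under stated assumptions: the post does not show how nwctxt2xml.py and xml2nwctxt.py are invoked, so the converters are passed in as hypothetical callables (to_xml and to_nwctxt) that take an input path and an output path. The names round_trip_ok and demo are also made up, and the demo uses stand-in converters that simply copy the file.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def round_trip_ok(source, to_xml, to_nwctxt):
    """Run steps 1-3 and return True if the output matches the input byte for byte."""
    source = Path(source)
    xml_path = source.with_suffix(".xml")                     # mySong.xml
    out_path = source.with_name(source.stem + "-out.nwctxt")  # mySong-out.nwctxt
    to_xml(source, xml_path)        # step 1 (hypothetical converter interface)
    to_nwctxt(xml_path, out_path)   # step 2 (hypothetical converter interface)
    # step 3: shallow=False forces a comparison of the file contents
    return filecmp.cmp(source, out_path, shallow=False)


def demo():
    """Exercise the harness with stand-in converters that just copy the file."""
    with tempfile.TemporaryDirectory() as tmp:
        song = Path(tmp) / "mySong.nwctxt"
        song.write_bytes(b"placeholder content\n")
        return round_trip_ok(song, shutil.copyfile, shutil.copyfile)


print("OK" if demo() else "FAILED")
```

Comparing bytes rather than decoded text means an encoding mistake anywhere in the round trip shows up as a failed comparison.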
Here is the output log of my dozen tests:
test: 001 time: Fri Aug 07 13:48:51 2009 file: ambeaut.nwctxt -> OK
test: 002 time: Fri Aug 07 13:48:54 2009 file: bosgbaa.nwctxt -> OK
test: 003 time: Fri Aug 07 13:48:55 2009 file: hillhbr.nwctxt -> OK
test: 004 time: Fri Aug 07 13:48:56 2009 file: hussaut.nwctxt -> OK
test: 005 time: Fri Aug 07 13:49:39 2009 file: JesusLM.nwctxt -> OK
test: 006 time: Fri Aug 07 13:49:39 2009 file: mozhall.nwctxt -> OK
test: 007 time: Fri Aug 07 14:01:02 2009 file: ocanada.nwctxt -> OK
test: 008 time: Fri Aug 07 14:01:02 2009 file: wamr01-intrt.nwctxt -> OK
test: 009 time: Fri Aug 07 14:01:22 2009 file: wamr02s-dies.nwctxt -> OK
test: 010 time: Fri Aug 07 14:01:43 2009 file: wamr07-lacri.nwctxt -> OK
test: 011 time: Fri Aug 07 14:01:49 2009 file: wamr10-sanct.nwctxt -> OK
test: 012 time: Fri Aug 07 14:01:55 2009 file: wamr12-agnus.nwctxt -> OK
I am going to post my code as open source in another thread so folks can use it. If I made a mistake in my UTF-8 assumption, they'll find it and hopefully let me know so I can fix the code. Thanks for the help.
Joe