7/28/07

Convert MP3 ID3 tags to Unicode.

The problem: playlists in the media player become unreadable when the tags contain Thai characters. This is caused by the different encodings used by the ID3 tags and by the player. The ID3 tags store Thai text in cp874 (I don't know why), but players in current Linux distributions expect Unicode. This is not a problem for English characters, because cp874 and Unicode give them the same value (e.g. 'A' is 0x41 in cp874 and U+0041 in Unicode). It is a problem for Thai characters, whose values differ (e.g. 'ก' is 0xA1 in cp874 but U+0E01 in Unicode).
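To see the mismatch concretely, here is a small Python sketch (my own illustration, not part of any tool mentioned here). It assumes the common case where the player decodes the tag bytes as Latin-1 (ISO-8859-1); other players may use a different fallback. Because Latin-1 maps every byte to exactly one character, encoding the garbled string back to Latin-1 recovers the original bytes, which can then be decoded as cp874.

```python
def repair_latin1_mojibake(garbled: str, charset: str = "cp874") -> str:
    """Undo a Latin-1 misreading of text that was really stored in `charset`."""
    # Latin-1 maps bytes 0x00-0xFF to U+0000-U+00FF one-to-one,
    # so encoding back to Latin-1 returns exactly the original tag bytes.
    original_bytes = garbled.encode("latin-1")
    return original_bytes.decode(charset)


# What a Thai tagger wrote: the word "เพลง" (song) as cp874 bytes.
raw = "เพลง".encode("cp874")
print(raw)                              # b'\xe0\xbe\xc5\xa7'

# What a player shows when it assumes Latin-1.
garbled = raw.decode("latin-1")
print(garbled)                          # à¾Å§

# Reinterpreting the same bytes as cp874 gives the Thai text back.
print(repair_latin1_mojibake(garbled))  # เพลง
```

English text survives either way, since ASCII bytes mean the same thing in Latin-1, cp874 and Unicode.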

To fix the Thai characters in the playlist, there is a script by Kopats Andrei that converts tags in 1-byte charsets to Unicode.
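A 1-byte charset gives each byte value at most one character, so converting it to Unicode is a per-byte table lookup. TIS-620 is an especially simple case: its Thai characters sit at bytes 0xA1-0xDA and 0xDF-0xFB, and Unicode's Thai block (U+0E01-U+0E5B) keeps the same layout, so each code point is just the byte value plus 0x0D60. The sketch below is my own illustration of that idea (not code from the script), and its output can be checked against Python's built-in tis-620 codec.

```python
THAI_OFFSET = 0x0E00 - 0xA0  # = 0x0D60, so TIS-620 0xA1 (ก) -> U+0E01

def tis620_to_unicode(data: bytes) -> str:
    """Decode TIS-620 bytes by hand: ASCII passes through, Thai bytes are shifted."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))                # ASCII is identical in both
        elif 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB:
            chars.append(chr(b + THAI_OFFSET))  # consonants, vowels, tone marks, digits
        else:
            raise ValueError(f"byte 0x{b:02X} is outside the ASCII/Thai ranges handled here")
    return "".join(chars)


print(tis620_to_unicode(b"A\xa1"))  # Aก
```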
Here's what to do:
1. Download the script. You'll get 'tag2utf.py'.
2. Install the required software:

sudo apt-get install python-eyed3

3. In the script, look for charsets = {'cp1251':'c','koi8-r':'k' } and replace it with your language's encoding (for Thai, change it to charsets = {'tis-620':'t' }).
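I haven't traced how tag2utf.py uses the charsets dict internally; from its shape, each entry pairs a codec name with a one-letter key. The hypothetical helper below (the name preview_decodings and its behaviour are mine, not the script's) decodes a raw tag with every codec in a dict of the same shape, which is a quick way to check that the codec name you put in step 3 gives readable text. Python accepts both 'tis-620' and 'cp874'; cp874 is Microsoft's superset of TIS-620 and adds a few symbols such as the euro sign at 0x80.

```python
# Same shape as the dict edited in step 3; the keys are Python codec names.
charsets = {"tis-620": "t", "cp874": "c"}

def preview_decodings(raw: bytes, charsets: dict) -> dict:
    """Hypothetical helper: decode `raw` with every codec in `charsets`.

    Returns {key_letter: decoded_text}; a codec that cannot decode
    the bytes is reported as None instead of raising.
    """
    results = {}
    for codec, key in charsets.items():
        try:
            results[key] = raw.decode(codec)
        except UnicodeDecodeError:
            results[key] = None
    return results


raw_title = "เพลง".encode("tis-620")
for key, text in preview_decodings(raw_title, charsets).items():
    print(key, text)
# t เพลง
# c เพลง
```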
Now we're ready to convert the MP3 tags. Run the script as 'tag2utf.py <mp3 dir>'; it converts the ID3 tags of all files in the specified directory, including subdirectories. Below is the result.
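The script does the directory walk and the tag rewriting itself (through eyeD3). As a rough picture of those two pieces, here is a self-contained sketch that (1) finds .mp3 files in a directory and all its subdirectories, and (2) converts the payload of one ID3v2 text frame to Unicode. It assumes the frame is marked with text-encoding byte 0x00 (ISO-8859-1) but actually holds tis-620 bytes, and rewrites it with encoding byte 0x01 (UTF-16 with BOM), which both ID3v2.3 and ID3v2.4 accept. The function names are hypothetical, and reading and writing frames inside the MP3 file is left to a tag library such as eyeD3.

```python
import os

ID3_LATIN1 = b"\x00"     # ID3v2 text encoding byte: ISO-8859-1
ID3_UTF16_BOM = b"\x01"  # ID3v2 text encoding byte: UTF-16 with byte-order mark

def find_mp3_files(top: str) -> list:
    """Return paths of all .mp3 files under `top`, including subdirectories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.lower().endswith(".mp3"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

def convert_text_frame(payload: bytes, charset: str = "tis-620") -> bytes:
    """Re-encode one ID3v2 text-frame payload (encoding byte + text) as UTF-16.

    Only payloads marked ISO-8859-1 are converted, since that is where
    legacy 1-byte Thai text ends up; anything else is returned unchanged.
    A trailing 0x00 terminator becomes a UTF-16 null, which is still valid.
    """
    if payload[:1] != ID3_LATIN1:
        return payload
    text = payload[1:].decode(charset)
    # Python's "utf-16" codec prepends a BOM, as encoding byte 0x01 requires.
    return ID3_UTF16_BOM + text.encode("utf-16")


old = ID3_LATIN1 + "เพลง".encode("tis-620")
new = convert_text_frame(old)
print(new[1:].decode("utf-16"))  # เพลง
```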


Resources (in Thai):
http://wiki.ubuntuclub.com/wiki/Tag2utf
http://linuxtip.blogspot.com/2007/02/id3-tag-part-ii.html

1 comment:

lenik said...

Tried this myself, got tired of wrong encodings and wrote an automatic converter that can deduce the original encoding by itself. The supported encodings include Chinese, Japanese, Russian/Cyrillic, Hebrew and many others.

http://code.google.com/p/id3-to-unicode/