
Convert MP3 ID3 tags to Unicode

Playlists in a media player are unreadable when the MP3 tags contain Thai characters. The cause is an encoding mismatch between the ID3 tags and the player: the tags are stored in cp874 [I don't know why], while players on current Linux distributions expect Unicode. This is not a problem for English characters, which have the same value in both encodings, e.g. 'A' is 0x41 in cp874 and 0x0041 in Unicode. Thai characters do not: for example, 'ก' is 0xA1 in cp874 but 0x0E01 in Unicode.
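To make the mismatch concrete, here is a small Python 3 sketch. It assumes the player misreads the tag bytes as latin-1 (that is my assumption, a player may fall back to something else), and the helper name fix_mojibake is made up for this example.

```python
# Two bytes as a Thai tagger would store them in cp874:
# 0xA1 is 'ก' (U+0E01) and 0xD2 is 'า' (U+0E32).
raw = b"\xa1\xd2"

# ASCII bytes decode to the same characters either way, so English tags look fine.
ascii_ok = b"A".decode("cp874") == b"A".decode("latin-1")

# A player that assumes latin-1 (assumption) shows unrelated Western characters.
garbled = raw.decode("latin-1")


def fix_mojibake(text, real_encoding="cp874", assumed_encoding="latin-1"):
    """Undo a wrong decode: recover the original bytes, then decode them properly.

    Hypothetical helper for illustration; it only works if the wrong decode
    was lossless, which is the case for latin-1.
    """
    return text.encode(assumed_encoding).decode(real_encoding)


fixed = fix_mojibake(garbled)
```

Here garbled holds two Western characters that have nothing to do with Thai, and fixed holds the intended Thai text again. This round trip is the basic idea behind converting 1-byte charsets to Unicode.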

To fix the Thai characters in the playlist, there is a script by Kopats Andrei that converts 1-byte charsets to Unicode [download].
What to do:
1. Download the script. You'll get 'tag2utf.py'.
2. Install the required software:
sudo apt-get install python-eyed3

3. In the script, look for charsets = {'cp1251':'c','koi8-r':'k' } and replace it with your language's encoding [for Thai, change it to charsets = {'tis-620':'t' }; tis-620 is the Thai standard encoding that cp874 is based on].
Now we are ready to convert the MP3 tags. Run the script with 'tag2utf.py <mp3 dir>'. It converts the ID3 tags of all files in the specified directory, including subdirectories.

[Screenshot: the result after conversion]
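Before converting a whole collection, it can help to preview what the tags would look like once decoded as Thai. The sketch below is not how tag2utf.py works (that script uses eyeD3). It is a read-only, standard-library illustration that only looks at ID3v1 tags, the fixed 128-byte block at the end of a file. The function names read_id3v1 and preview_tree are made up for this example, and cp874 as the source encoding is an assumption.

```python
import os


def read_id3v1(path, encoding="cp874"):
    """Return the ID3v1 text fields decoded with `encoding`, or None if no tag."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 128:
            return None  # too small to hold an ID3v1 tag
        f.seek(-128, os.SEEK_END)
        block = f.read(128)
    if block[:3] != b"TAG":
        return None

    def field(start, length):
        # Fields are fixed width, padded with NUL bytes or spaces.
        raw = block[start:start + length].split(b"\x00", 1)[0].rstrip(b" ")
        return raw.decode(encoding, errors="replace")

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
        "year": field(93, 4),
    }


def preview_tree(root, encoding="cp874"):
    """Walk `root` and its subdirectories; map each tagged .mp3 path to its fields."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".mp3"):
                path = os.path.join(dirpath, name)
                tag = read_id3v1(path, encoding)
                if tag is not None:
                    results[path] = tag
    return results
```

Calling preview_tree on your MP3 directory returns a dictionary you can print and inspect. If the titles come out as readable Thai, the encoding guess is right and the same encoding name should work in the charsets setting. Nothing is written to the files.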


Resources [in Thai]
http://wiki.ubuntuclub.com/wiki/Tag2utf
http://linuxtip.blogspot.com/2007/02/id3-tag-part-ii.html

Comments

lenik said…
Tried this myself, got tired with wrong encodings and wrote automatic converter, which can deduce the original encoding by itself. The supported encodings include Chinese, Japanese, Russian/Cyrillic, Hebrew and many others.

http://code.google.com/p/id3-to-unicode/
