UTF-8 and Your Music
A heads-up on something new in Amarok SVN (and coming in 2.2 for those of you not living on the bleeding edge):
We've had various bug reports over the years relating to character sets and collation, where searches failed to match music or items were sorted incorrectly. Well, hopefully no longer.
When you update to 2.2 (recent SVN users, see the note at the end of this post), your Amarok database and tables will be converted to use the 'utf8' character set and 'utf8_unicode_ci' collation as the default for any table or column created from that point on. Every single text/varchar field will also be converted, through a two-step process, to use 'utf8' as its character set. The data inside was always UTF-8, but if your MySQL wasn't built to use 'utf8' by default, there could be a mismatch between what the data was and what the database thought it was. In addition, the character set used when talking to the embedded server (the protocol in the socket) will be 'utf8'.
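To illustrate why a conversion like this needs care, here is a minimal Python sketch of the mismatch. It is not Amarok's actual update code, and the function names are made up for the example. It assumes the column was labelled Latin-1 while really holding UTF-8 bytes, and shows that converting directly from the wrong label double-encodes the text, whereas first treating the data as raw bytes and then labelling it 'utf8' leaves it intact. That is my guess at why two steps are useful; a common way to do this in MySQL is to go through a binary type.

```python
# Sketch only: models a column whose bytes are UTF-8 but whose label is Latin-1.
# convert_directly and convert_via_binary are hypothetical helper names.

def convert_directly(stored: bytes, assumed_charset: str = "latin-1") -> bytes:
    """Trust the (wrong) label, then re-encode as UTF-8: double-encodes the text."""
    return stored.decode(assumed_charset).encode("utf-8")


def convert_via_binary(stored: bytes) -> bytes:
    """Step 1: drop the label and keep the raw bytes. Step 2: relabel as UTF-8."""
    raw = bytes(stored)   # step 1: no character set attached, bytes untouched
    raw.decode("utf-8")   # step 2: raises UnicodeDecodeError if not valid UTF-8
    return raw


title = "Café del Mar"
stored = title.encode("utf-8")          # what was always in the database

direct = convert_directly(stored)       # mangled
two_step = convert_via_binary(stored)   # preserved

print(direct.decode("utf-8"))
print(two_step.decode("utf-8"))
```

The first print shows the familiar mojibake, the second shows the original title.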
Fixing this mismatch between the character set/collation the server might have been using and the data we're putting in there should hopefully ensure that sorting and tags work well for our users who have files with non-Latin1 tags (probably just about everybody these days).
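As a rough picture of what the collation buys you, the Python sketch below compares plain byte-order sorting with a case- and accent-insensitive key. The key function is a crude stand-in I made up for illustration, not the real 'utf8_unicode_ci' rules (which follow the Unicode Collation Algorithm), and the artist names are just sample data.

```python
import unicodedata

# Sketch only: approx_ci_key is a hypothetical approximation of a
# case-insensitive, accent-insensitive collation, not MySQL's implementation.

def approx_ci_key(text: str) -> str:
    """Strip combining accents and fold case so 'É' compares like 'e'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_accents.casefold()


artists = ["Zebra", "Édith Piaf", "abba", "Eagles"]

# Sorting on the raw UTF-8 bytes puts uppercase first and accented names last.
byte_order = sorted(artists, key=lambda name: name.encode("utf-8"))

# Sorting with the approximate collation key gives the order a person expects.
collated = sorted(artists, key=approx_ci_key)

# The same key lets a plain-ASCII search term match an accented tag.
search_matches = approx_ci_key("edith piaf") == approx_ci_key("Édith Piaf")

print(byte_order)
print(collated)
print(search_matches)
```

The same idea covers both complaints from the bug reports: mis-sorted items come from comparing bytes instead of letters, and missed searches come from comparing strings without a shared notion of which characters are equal.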
* Recent SVN users: if your build date is earlier than this post, I'd recommend wiping your mysqle directory (not just doing a full rescan), as the initial commit of the updating code contained a bug that could possibly cause trouble down the line with user playlists...but you bleeding-edge users should be expecting database wipes every now and then.
June 21st, 2009 - 09:32
Nice, could I add a little bug report/request? Too lazy to fill out a bug report right now.
Internet radio streams are often not read correctly, I mean the tags. For example, on French radio stations letters like é and è are shown as ?, which is then a problem for the lyrics applet because it can’t find the lyrics.
June 21st, 2009 - 13:57
Not only does your comment have nothing to do with the blog post, but it’s not the forum for requesting feature enhancements or bug fixes. Stop being lazy and fill out a bug report, and please don’t muddle comments to a post with totally unrelated items.
June 21st, 2009 - 14:06
I’m sorry in that case, I was thinking that the radio tag problem had something to do with the encoding (utf8, whatever), so I quickly added it here. I’m sorry that I offended you, even though I’m surprised by such a hard reaction. But don’t worry, I will add a bug report; my post here was more to get information than to really make a bug report (I have been helping in Plasma for some time now and know how and why to use Bugzilla).
thanks for your hard work
June 21st, 2009 - 14:33
You haven’t offended me, but I’m not sure why my reaction surprises you. You post a totally unrelated problem on a blog post because you, in your own words, are “too lazy” to fill out a bug report.
If you’d read the post at all carefully, you’d see that the charset handling was between the app and the database. I’m not sure why you would think this has anything to do with streaming audio sending tags on demand.
June 21st, 2009 - 17:02
have a nice weekend!
June 22nd, 2009 - 16:41
Yay! I was hoping for this to happen soon. Now I am tempted to try the svn version… let’s see.
Keep up the great work!