More Goodies from OSCON
Rasmus isn’t the only person making presentations at OSCON. Andrei (another Yahoo! working on the guts of PHP) gave his PHP 6 presentation. One of the major features coming in PHP 6 is built-in Unicode support. PHP 5 already has some Unicode support, but it was never baked into the platform. In PHP 6, Unicode will be tied very strongly into the core of everything.
That’s good news. One of the best things about Java is knowing that when you’re dealing with a string, you’re always dealing in Unicode. No matter where that string came from and no matter where it’s going, for right now it’s Unicode.
Check out Andrei’s slides from the presentation. My favorites are way down on slides 74 and 75 (I wonder how long his presentation was), the slides about transliteration in PHP 6. The short of it is, you can take a Unicode string in one script and transliterate it to a string in another script. The example Andrei gives in the slide transliterates the Japanese (I think) string “たけだ, まさゆき” to Latin characters, where it comes out as “Takeda, Masayuki”. He goes on to show you how to use transliteration to get a pronunciation of your name in another language. Very cool stuff. Just imagine if your mail reader could take mail sent to you from your Japanese penpal and transliterate their name in the “From” field to something pronounceable in English.
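To make the idea concrete, here is a minimal sketch of table-driven transliteration. It is written in Python rather than PHP 6, and the `HIRAGANA_TO_LATIN` table and `transliterate` function are my own hypothetical names covering only the characters in the slide's example. It is not the PHP 6 API or the ICU rule set, just an illustration of mapping one script onto another.

```python
# Toy kana-to-Latin table: only the hiragana that appear in the example.
# This is an illustrative assumption, not the ICU rules or a PHP 6 API.
HIRAGANA_TO_LATIN = {
    "た": "ta", "け": "ke", "だ": "da",
    "ま": "ma", "さ": "sa", "ゆ": "yu", "き": "ki",
}


def transliterate(text):
    """Replace each known hiragana with its Latin spelling; pass the rest through."""
    latin = "".join(HIRAGANA_TO_LATIN.get(ch, ch) for ch in text)
    # Capitalize each space-separated word so the result reads like a name.
    return " ".join(word.capitalize() for word in latin.split(" "))


print(transliterate("たけだ, まさゆき"))  # Takeda, Masayuki
```

A real transliterator needs far more than a lookup table (context-dependent rules, long vowels, and so on), which is exactly why having it built into the platform is appealing.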
I guess I’m going to have to get a PHP 6 install set up on my laptop so I can play around a little.
Addition: Andrei also mentions Powell’s bookstore in Portland. I have to agree: take a bit of time and go get lost in that store (literally). You’ll have an awesome time.
Addition #2: The ICU project has a transliteration demo page up. Type some text into “Input”, select a source character set in “Source 1” (for English, pick “Latin”) and then select the desired output character set in “Target 1”. The result is, you find out how to pronounce the input in another language. For instance, my full name in Cyrillic is “Рыан Цхристопхер Кеннеды”. In Arabic, my full name is “ريَن كهرِستُپهِر كِننِدي”. Hopefully I can actually trust the demo…for all I know it’s spitting out “this guy’s impotent” in another language.
July 30th, 2006 at 8:43 am
Ryan,
While transliteration from non-Latin to Latin works fairly well for pronunciation purposes, going the other way is more problematic because... well, because we all know that English has very complex rules for pronunciation and those rules have many exceptions. What we really want here is transcription, not transliteration:
http://en.wikipedia.org/wiki/Transcription_%28linguistics%29
Transliteration attempts to be lossless, so that the result can be converted back to the original script without any changes. Transcription, on the other hand, may lose some information in order to better represent the -sounds- of the language. Your name in Cyrillic would most likely be represented as “Райен Кристофер Кеннеди”.
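The lossless property can be sketched with a one-to-one letter table. This is a rough illustration in Python, assuming a hypothetical `CYRILLIC_TO_LATIN` table that covers only the letters in the demo output quoted in the post. It is not the actual ICU rule set.

```python
# Hypothetical one-to-one letter table, limited to the letters in the example.
CYRILLIC_TO_LATIN = {
    "р": "r", "ы": "y", "а": "a", "н": "n", "ц": "c", "х": "h", "и": "i",
    "с": "s", "т": "t", "о": "o", "п": "p", "е": "e", "к": "k", "д": "d",
}
# Add the uppercase pairs so capital letters survive the trip too.
CYRILLIC_TO_LATIN.update(
    {cyr.upper(): lat.upper() for cyr, lat in list(CYRILLIC_TO_LATIN.items())}
)
# Because the table is one-to-one, it can simply be inverted.
LATIN_TO_CYRILLIC = {lat: cyr for cyr, lat in CYRILLIC_TO_LATIN.items()}


def convert(text, table):
    """Map each character through the table; unknown characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


original = "Рыан Цхристопхер Кеннеды"
latin = convert(original, CYRILLIC_TO_LATIN)
round_trip = convert(latin, LATIN_TO_CYRILLIC)

print(latin)                   # Ryan Christopher Kennedy
print(round_trip == original)  # True
```

The transcribed form “Райен Кристофер Кеннеди” cannot be reversed this way: it follows the sounds rather than the letters, so there is no one-to-one table that would recover the original spelling.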
Note to self: I should write a blog entry about this.
October 6th, 2008 at 4:26 pm
[...] Kennedy commented on the presentation I gave at OSCON; specifically, about the transliteration support in PHP 6. I [...]