Unicode has overtaken ASCII as the most popular character encoding scheme on the World Wide Web. Also vanquished at almost exactly the same time was the Western European encoding.
Unicode is a character encoding standard that accommodates dozens of languages as well as Roman characters with diacritical marks. ASCII, a tried-and-true, decades-old standard, is limited to 128 characters (256 in its 8-bit extensions) and has a hard time extending beyond the range of a century-old Remington typewriter.
Mark Davis, Google's senior international software architect, said in a blog post that Unicode vanquished ASCII and Western European within 10 days in December.
"What's more impressive than simply overtaking them is the speed with which this happened," he added, pointing to a graph showing the meteoric rise of Unicode.
Google's a fan of Unicode Web sites. When it processes data from a Web site, it first converts the data to Unicode if it isn't already encoded that way. That improves international search abilities.
"The continued rise in use of Unicode makes it even easier to do the processing for the many languages that we cover," he said.
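To make the "convert to Unicode first" step concrete, here is a minimal Python sketch. It is not Google's pipeline, which the post does not describe. The function name to_unicode, the fallback to Latin-1 (the Western European encoding) and the sample text are assumptions chosen purely for illustration.

```python
def to_unicode(raw: bytes, declared_encoding: str = "utf-8") -> str:
    """Decode raw page bytes into a Unicode string.

    Hypothetical helper: tries the declared encoding first and falls back
    to Latin-1 if the bytes are invalid or the encoding name is unknown.
    """
    try:
        return raw.decode(declared_encoding)
    except (UnicodeDecodeError, LookupError):
        # Latin-1 maps every possible byte to a code point, so this cannot fail.
        return raw.decode("latin-1")


# The same word as it might arrive from two differently encoded pages.
legacy_page = "café".encode("latin-1")  # one byte for the accented letter
utf8_page = "café".encode("utf-8")      # two bytes for the accented letter

# Once both are decoded, the text compares equal regardless of source encoding.
print(to_unicode(legacy_page, "latin-1") == to_unicode(utf8_page, "utf-8"))
```

After decoding, text from both pages is the same sequence of Unicode code points, so a single search index or comparison routine can handle it.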
Google just converted to Unicode 5.1, he added, "so people speaking languages such as Malayalam can now search for words containing the new characters."
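One disadvantage Unicode can have compared with ASCII, though, is storage: in the UTF-16 encoding form a Roman alphabet character takes two bytes, twice what ASCII needs, because more bits are required to enumerate Unicode's vastly larger range of symbols. UTF-8, the form most widely used on the Web, avoids that cost by storing the ASCII range in one byte per character.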
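How much storage a character needs depends on which Unicode encoding form is used. The short Python sketch below counts the bytes for a few sample strings in ASCII, UTF-8, UTF-16 and UTF-32. The helper name encoded_size and the sample strings are illustrative choices, not anything from Davis's post.

```python
# Sample strings: a Roman letter, a Roman-alphabet word,
# an accented letter, and a Malayalam letter.
samples = ["A", "Unicode", "é", "മ"]


def encoded_size(text, encoding):
    """Return the number of bytes text occupies in encoding,
    or None if the encoding cannot represent it."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for text in samples:
    # The "-le" variants are used so no byte-order mark is added to the count.
    sizes = {
        enc: encoded_size(text, enc)
        for enc in ("ascii", "utf-8", "utf-16-le", "utf-32-le")
    }
    print(repr(text), sizes)
```

For plain Roman letters, UTF-8 matches ASCII at one byte each, UTF-16 doubles that and UTF-32 quadruples it. Characters outside the ASCII range cannot be stored in ASCII at all, which the helper reports as None.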
This article was originally a blog post on CNET News.com.







