Microsoft aims to build a better thesaurus

By Ina Fried, CNET News.com
Monday, February 23, 2009 10:47 AM

A team of researchers at Microsoft is looking to beat Roget at his own game.

Aiming to build a better thesaurus, the Writing Assistance project within Microsoft's research unit is tapping techniques developed to translate from one language to another.

Although thesauri are good at finding lots and lots of synonyms, they require the user to pick the right one because they aren't very good at understanding the context of what is being said. That's where the experience from doing machine translations comes in.

"We've taken the actual translation tables...and what we've done is we've taken those and said if a word in Chinese maps to two different English words maybe those two words are synonyms with some probability," said Christopher Brockett, a computational linguist and one of the Microsoft researchers leading the project.
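The idea Brockett describes can be illustrated with a small sketch. It is not Microsoft's implementation: it uses a common "pivot" formulation in which two English words that translate to the same foreign word are scored by multiplying the two translation probabilities. The tables, the romanized Chinese entries, the probabilities and the function name `pivot_synonyms` are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy translation tables; every probability here is made up.
# EN_TO_ZH[e][f] stands for P(foreign word f | English word e).
EN_TO_ZH = {
    "big": {"da": 0.9, "zhongyao": 0.1},
    "large": {"da": 1.0},
    "important": {"zhongyao": 1.0},
}
# ZH_TO_EN[f][e] stands for P(English word e | foreign word f).
ZH_TO_EN = {
    "da": {"big": 0.5, "large": 0.5},
    "zhongyao": {"important": 0.8, "big": 0.2},
}


def pivot_synonyms(word, en_to_f, f_to_en):
    """Score candidate synonyms of `word` by pivoting through foreign words.

    The score of a candidate is the sum, over every shared foreign word f,
    of P(f | word) * P(candidate | f).
    """
    scores = defaultdict(float)
    for foreign, p_foreign in en_to_f.get(word, {}).items():
        for candidate, p_candidate in f_to_en.get(foreign, {}).items():
            if candidate != word:  # a word is not its own synonym
                scores[candidate] += p_foreign * p_candidate
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for candidate, score in pivot_synonyms("big", EN_TO_ZH, ZH_TO_EN):
        print(f"{candidate}: {score:.2f}")
```

In this toy data, "large" scores well above "important" as a synonym for "big" because both map strongly to the same foreign word.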

The approach has two key benefits over a static thesaurus. First, it can handle phrases rather than just single words. Second, it can draw on the context in which a phrase is used.

Brockett plans to show off a prototype of the tool at TechFest, Microsoft's annual internal science fair. It's just one of dozens of projects that will be shown as part of an effort to expose Microsoft's business units to the work being done in Microsoft's research labs.

TechFest is sort of like "The Dating Game" for Microsoft's research and product development arms. Research teams at Microsoft set up booths, somewhat like a high-school science fair, while product teams shuffle through looking for something that might give their efforts a leg up on the competition.

For the public, TechFest can also offer a glimpse at future product directions. For example, researcher Andy Wilson showed off a number of surface computing projects in the years leading up to the debut of Microsoft's Surface product.

As is the case with most of the projects, the thesaurus effort is still in its infancy.

"We're still working on the algorithms and how much work we give to the language pairs," Brockett said. "We have to get the quality up. There are usability issues that have to be looked into."

Over time, though, Brockett hopes the technique could be used to effectively translate whole sentences. Microsoft has a demonstration of that up on its Web site, but Brockett acknowledges such a treatment shows both the potential and the current limitations of the technology.

But would-be high-school plagiarists beware. Yes, the technology could someday translate the whole Wikipedia article for you, but it would likely translate the article the same way for all your classmates as well. And plagiarism detection software is evolving along with the science of machine translation.

As for the thesaurus itself, the technology would be a natural fit for Word, which already has a built-in traditional thesaurus. But the technology could also help Microsoft in another key area: search.

That's because while search engines are good at finding things like names, which have just one form, they have a harder time finding expressions that can be phrased in multiple ways.

That's less of an issue when searching across the whole Web. For example, searches for "Who shot Abraham Lincoln?", "Who killed Abraham Lincoln?" and "Who assassinated Abraham Lincoln?" all direct you to a page with John Wilkes Booth.

However, when it comes to searching smaller universes, such as a company's intranet, that might not be the case.

"You might not find it if the words are different," Brockett said. In such cases, automatically searching using similar phrases might boost the likelihood of finding a result.
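A rough sketch of what "searching using similar phrases" could look like on a small document collection follows. It assumes a hand-written paraphrase table and simple substring matching; the table, the sample documents and the function names `expand_query` and `search` are hypothetical and say nothing about how Microsoft's search products work.

```python
# Hypothetical paraphrase table; a real system would learn these automatically.
PARAPHRASES = {
    "shot": ["killed", "assassinated"],
    "killed": ["shot", "assassinated"],
}


def expand_query(query, paraphrases):
    """Return the query plus variants with one word swapped for a paraphrase."""
    words = query.split()
    variants = [query]
    for i, word in enumerate(words):
        for alternative in paraphrases.get(word, []):
            variants.append(" ".join(words[:i] + [alternative] + words[i + 1:]))
    return variants


def search(query, documents, paraphrases):
    """Return documents containing the query or any of its variants."""
    variants = expand_query(query.lower(), paraphrases)
    return [doc for doc in documents
            if any(variant in doc.lower() for variant in variants)]


if __name__ == "__main__":
    intranet = [
        "History memo: who assassinated Abraham Lincoln",
        "Cafeteria lunch menu",
    ]
    # Without expansion, the exact phrase "who shot abraham lincoln" matches nothing.
    print(search("who shot Abraham Lincoln", intranet, {}))
    # With expansion, the "assassinated" variant finds the memo.
    print(search("who shot Abraham Lincoln", intranet, PARAPHRASES))
```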

This article was first published as a blog post on CNET News.

