Google dips toes into 'deep Web' search

By Stephen Shankland, CNET News.com
Tuesday, April 15, 2008 07:00 AM

Google's ever-busy search bots, which scour the Web constantly for new pages, have entered a new, more active phase of their indexing work.

In a blog post Friday, Jayant Madhavan and Alon Halevy of Google's crawling and indexing team said the company has begun an experiment in which its indexing software enters text into Web site forms to see what previously undiscovered pages appear.

"In the past few months, we have been exploring some HTML forms to try to discover new Web pages and URLs that we otherwise couldn't find and index for users who search on Google," they wrote. "This experiment is part of Google's broader effort to increase its coverage of the Web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines."

The new Google indexing practice applies only to "high quality" Web sites, and it honors "robots.txt" files and other standard mechanisms for warding off indexing software.
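
As a rough illustration of how a crawler can consult robots.txt rules before probing a form, here is a minimal Python sketch using the standard library's urllib.robotparser. It is not Google's code; the function name, the "ExampleBot" user agent, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def may_probe_form(robots_txt: str, user_agent: str, form_url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch form_url."""
    parser = RobotFileParser()
    # Parse rules from a string so the sketch needs no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, form_url)


# Hypothetical robots.txt that puts the site's search form off limits.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search
"""

if __name__ == "__main__":
    for url in ("https://example.com/search?q=deep", "https://example.com/about"):
        print(url, may_probe_form(EXAMPLE_ROBOTS, "ExampleBot", url))
```

A real crawler would fetch each site's robots.txt over the network and would also check page-level signals such as noindex and nofollow, which this sketch leaves out.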

To decide what words to "type" into the forms, the indexing software samples from among words on the Web page with the form, Google said.
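
The word-sampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's method: the form is assumed to submit by GET with a single text field, and the names candidate_queries, action_url, field_name, and the fixed random seed are hypothetical.

```python
import random
import re
from html.parser import HTMLParser
from urllib.parse import urlencode


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML page (script/style text included)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def candidate_queries(page_html, action_url, field_name, count, seed=0):
    """Sample words from the page and build one GET URL per sampled word."""
    extractor = TextExtractor()
    extractor.feed(page_html)
    text = " ".join(extractor.chunks).lower()
    # Keep distinct alphabetic words of four or more letters as candidates.
    words = sorted(set(re.findall(r"[a-z]{4,}", text)))
    rng = random.Random(seed)  # seeded so repeated runs pick the same words
    chosen = rng.sample(words, min(count, len(words)))
    return [f"{action_url}?{urlencode({field_name: word})}" for word in chosen]


if __name__ == "__main__":
    page = (
        "<html><body><h1>Used car listings</h1>"
        "<form action='/find'><input name='q'></form>"
        "<p>Browse sedans, trucks and hybrids.</p></body></html>"
    )
    for url in candidate_queries(page, "https://example.com/find", "q", 3):
        print(url)
```

Each generated URL is a candidate page that a crawler could fetch and then decide whether to index.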

The technology appears to be related to Transformic, a company Google acquired, according to a blog post by Anand Rajaraman, who was involved with the technology earlier in his career while working for Halevy.

This article was first published as a blog on CNET News.com.



Talkback: 1 comment

Econ Stimulus Package for SEO
As I looked at this on the 13th, I thought it would become an economic stimulus package for SEO/SEM firms: pbarnhart.wordpress.com...
Posted by Phillip Barnhart on Tuesday, April 15 2008 08:16 PM


