Google dips toes into 'deep Web' search

By Stephen Shankland, CNET News.com
Tuesday, April 15, 2008 07:00 AM

Google's search bots, which constantly scour the Web for new pages, have begun a new, more active phase of their indexing work.

In a blog post Friday, Jayant Madhavan and Alon Halevy of Google's crawling and indexing team said the company has begun an experiment in which its indexing software enters text into Web site forms to see what previously undiscovered pages may appear.

"In the past few months, we have been exploring some HTML forms to try to discover new Web pages and URLs that we otherwise couldn't find and index for users who search on Google," they wrote. "This experiment is part of Google's broader effort to increase its coverage of the Web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines."

The new Google indexing practice involves only "high quality" Web sites and doesn't run on sites that use "robots.txt" files or other standard mechanisms to ward off indexing software.
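For site owners wondering how such an opt-out check might work, here is a minimal sketch in Python using the standard library's robots.txt parser. It is not Google's code: the function name `may_probe_form`, the user-agent string "ExampleBot" and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def may_probe_form(robots_txt: str, form_url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch form_url.

    Hypothetical helper: "ExampleBot" is an invented user agent, not a real crawler name.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, form_url)


# An illustrative robots.txt that blocks every crawler from the /search path.
ROBOTS = """\
User-agent: *
Disallow: /search
"""

print(may_probe_form(ROBOTS, "https://example.com/search?q=test"))
print(may_probe_form(ROBOTS, "https://example.com/catalog"))
```

Under these assumptions, a crawler would skip the form behind /search and remain free to fetch /catalog.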

To decide what words to "type" into the forms, the indexing software samples from among words on the Web page with the form, Google said.
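The word-sampling idea can be sketched roughly as follows. This is an assumption-laden illustration, not Google's method: the function `candidate_queries`, the four-letter minimum and the uniform random sampling are all invented here, since Google did not describe how it picks among the words on a page.

```python
import random
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def candidate_queries(page_html, k=3, seed=None):
    """Pick up to k distinct words from the page to try in its form (hypothetical)."""
    collector = _TextCollector()
    collector.feed(page_html)
    collector.close()
    text = " ".join(collector.chunks)
    # Assumption: keep only alphabetic words of four or more letters.
    words = sorted({w.lower() for w in re.findall(r"[A-Za-z]{4,}", text)})
    rng = random.Random(seed)
    return rng.sample(words, min(k, len(words)))


PAGE = (
    "<html><body><h1>Used car listings</h1>"
    "<form><input name='q'></form>"
    "<script>var hidden = 1;</script></body></html>"
)
print(candidate_queries(PAGE, k=2, seed=0))
```

A real system would presumably weight words by how well they characterize the site rather than sampling uniformly; the sketch only shows the basic step of drawing form inputs from the page's own vocabulary.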

The technology appears to be related to Transformic, a company that Google acquired, according to a blog post by Anand Rajaraman, who was involved with the technology earlier in his career while working for Halevy.

This article was first published as a blog on CNET News.com.



Talkback 1 comments

Econ Stimulus Package for SEO
As I looked at this on the 13th, I thought it would become an economic stimulus package for SEO/SEM firms: (web link)
Posted by Phillip Barnhart on Tuesday, April 15 2008 08:16 PM
