Bienvenido! - Willkommen! - Welcome!

Technical Log of Tux&Cía., Santa Cruz de la Sierra, BO
Main Log: Tux&Cía.
Advanced Information Log: Tux&Cía.-Información
May the source be with you!

Saturday, April 18, 2009

How do Search Engines Work?

Source
Google has one of the largest databases of Web pages, including many other types of web documents (blog posts, wiki pages, group discussion threads) and document formats (e.g., PDFs, Word or Excel documents, and PowerPoint files). Despite the presence of all these formats, Google's popularity ranking often places worthwhile pages near the top of search results. Our web searching workshop reflects the fact that Google is currently the most used search engine.
Google alone is not always sufficient, however. Not everything on the Web is fully searchable in Google.
Overlap studies show that more than 80% of the pages in a major search engine's database exist only in that database. For this reason, getting a "second opinion" can be worth your time. For this purpose, we recommend Yahoo! Search or Exalead. We do not recommend using meta-search engines as your primary search tool.
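To make the overlap figure concrete, here is a minimal sketch of how such a percentage could be computed. The two page sets and the function name `unique_share` are invented for illustration; they are not data from any overlap study.

```python
def unique_share(database, others):
    """Return the fraction of pages in `database` found in none of `others`."""
    if not database:
        return 0.0
    # Pool every page held by the other engines (hypothetical data).
    seen_elsewhere = set().union(*others) if others else set()
    unique = database - seen_elsewhere
    return len(unique) / len(database)


# Toy databases: only page "p5" is held by both engines.
engine_a = {"p1", "p2", "p3", "p4", "p5"}
engine_b = {"p5", "p6", "p7", "p8", "p9"}

share = unique_share(engine_a, [engine_b])
print(f"{share:.0%} of engine A's pages exist only in engine A")
```

With so little shared material, a query run against only one of the two toy engines would miss most of what the other holds, which is the case for a "second opinion."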
Search engines do not really search the World Wide Web directly. Each one searches a database of web pages that it has harvested and cached. When you use a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's search results, you retrieve the current version of the page. Search engine databases are selected and built by computer robot programs called spiders.
Spiders "crawl" the web, finding pages for potential inclusion by following the links in the pages already in their database. They cannot use imagination or enter terms in the search boxes they find on the web. If a web page is never linked from any other page, search engine spiders cannot find it. The only way a brand new page can get into a search engine is for other pages to link to it, or for a human to submit its URL for inclusion. All major search engines offer ways to do this.
After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in each page and stores it in the search engine database's files so that the database can be searched by keyword and by whatever more advanced approaches are offered. A page will be found if your search matches its content.
Many web pages are excluded from most search engines by policy. The contents of most of the searchable databases mounted on the web, such as library catalogs and article databases, are excluded because search engine spiders cannot access them. All this material is referred to as the "Invisible Web" -- what you don't see in search engine results.
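The crawl-then-index process described above can be sketched in a few lines. This is a toy model under stated assumptions, not how any real search engine is implemented: the "web" is a hypothetical in-memory dictionary (`WEB`), the URLs are made up, and the names `crawl`, `build_index`, and `search` are illustrative only.

```python
from collections import deque

# Hypothetical miniature "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/home": ("welcome to the library home page",
                       ["a.example/catalog", "b.example/blog"]),
    "a.example/catalog": ("search the library catalog here", []),
    "b.example/blog": ("blog post about search engines", ["a.example/home"]),
    "c.example/orphan": ("nobody links to this page", []),
}


def crawl(seeds):
    """Follow links breadth-first from the seed URLs; return every URL found."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        _text, links = WEB[url]
        for link in links:
            if link in WEB and link not in found:
                found.add(link)
                queue.append(link)
    return found


def build_index(urls):
    """Map each word to the set of crawled URLs whose text contains it."""
    index = {}
    for url in urls:
        text, _links = WEB[url]
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the URLs whose text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results


crawled = crawl(["a.example/home"])
index = build_index(crawled)
print(sorted(crawled))
print(search(index, "library catalog"))
print(search(index, "nobody"))
```

Starting from the home page, the spider reaches the catalog and the blog but never the orphan page, so a search for "nobody" returns nothing even though the page exists. Passing the orphan's URL as an extra seed, the equivalent of a human submitting it for inclusion, is the only way it enters the index in this model.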
What Makes a Search Engine Good?
