Technicus stultissimus: How do Search Engines Work?

Saturday, April 18, 2009

How do Search Engines Work?

Google has one of the largest databases of Web pages, including many other types of web documents: blog posts, wiki pages, group discussion threads, and files in formats such as PDF, Word, Excel, and PowerPoint. Despite the volume and variety of this material, Google's popularity ranking often places worthwhile pages near the top of search results. Our web searching workshop reflects the fact that Google is currently the most used search engine.
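Google's actual ranking combines many signals and is not public. One well-known idea behind link-based "popularity ranking" is PageRank: a page is popular if popular pages link to it. Below is a minimal sketch of that classic formulation using power iteration. The function name `pagerank`, the damping factor 0.85 (the commonly cited value) and the four-page link graph are illustrative assumptions, not a description of how Google ranks results today.

```python
# Minimal PageRank sketch (classic link-based idea, NOT Google's current ranking).
# The link graph used below is hypothetical.

def pagerank(links, damping=0.85, iterations=50):
    """Return a popularity score per page via power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {
        "home": ["about", "blog"],
        "about": ["home"],
        "blog": ["home", "about", "post"],
        "post": ["blog"],
    }
    scores = pagerank(web)
    for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{page:6s} {score:.3f}")
```

In this toy graph "home" ends up on top because every other page links to it, directly or through "blog", while "post" is reachable only from "blog" and scores lowest.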
Google alone is not always sufficient, however. Not everything on the Web is fully searchable in Google.
Overlap studies show that more than 80% of the pages in a major search engine's database exist only in that database, so getting a "second opinion" from another engine can be worth your time. For this, we recommend Yahoo! Search or Exalead. We do not recommend meta-search engines as your primary search tool.
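To make the overlap statistic concrete: a page "exists only in that database" when no other engine's database holds it. The minimal sketch below computes that share with Python sets. The function name `unique_fraction` and the page sets are invented for illustration and are not data from any overlap study.

```python
def unique_fraction(engine, others):
    """Share of an engine's pages that appear in no other engine's database.

    Assumes `engine` is a non-empty set of page identifiers.
    """
    elsewhere = set().union(*others)   # every page any other engine holds
    only_here = engine - elsewhere     # pages this engine alone has
    return len(only_here) / len(engine)


# Invented toy page sets, not data from any overlap study.
engine_a = {"a.html", "b.html", "c.html", "d.html"}
engine_b = {"a.html", "x.html"}
engine_c = {"y.html", "z.html"}

share = unique_fraction(engine_a, [engine_b, engine_c])
print(f"{share:.0%} of engine A's pages are found only in engine A")
```

Here three of engine A's four pages are held by no other engine, which is why a second engine can turn up results the first one never had.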
Search engines do not really search the World Wide Web directly. Each one searches a database of web pages that it has harvested and cached. When you use a search engine, you are always searching a somewhat stale copy of the real web page; when you click a link in the search results, you retrieve the current version of the page. Search engine databases are selected and built by robot programs called spiders.
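The gap between the cached copy and the live page can be sketched in a few lines. This is a toy model, assuming a dictionary stands in for the web (URL to page text) and a copy of it stands in for the engine's database; the URLs and the `search` function are hypothetical.

```python
def search(cached_pages, term):
    """Keyword search over the engine's cached copy, not the live web."""
    term = term.lower()
    return [url for url, text in cached_pages.items() if term in text.lower()]


# Hypothetical "live web": URL -> current page text.
live_web = {
    "example.org/news": "Election results are in",
    "example.org/about": "About this site",
}

# At crawl time the engine stores its own copy of each page.
cache = dict(live_web)

# Later the live page changes; the cache keeps the old text until the next crawl.
live_web["example.org/news"] = "Weather forecast for the weekend"

for url in search(cache, "election"):
    # "Clicking" the result retrieves the current version, which no longer matches.
    print(url, "->", live_web[url])
```

The search still returns the news page because the cached copy mentions the election, but following the link shows the page's current, different content.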
These spiders "crawl" the web, finding pages for potential inclusion by following the links in the pages already in their database. They cannot use imagination or type terms into the search boxes they find on the web. If a web page is never linked from any other page, search engine spiders cannot find it. The only ways a brand-new page can get into a search engine are for other pages to link to it or for a human to submit its URL for inclusion; all major search engines offer a way to do this.

After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in each page and stores it in the search engine's database files, so that the database can be searched by keyword (and by whatever more advanced methods the engine offers) and the page will be found when your search matches its content.

Many web pages are excluded from most search engines by policy. The contents of most searchable databases mounted on the web, such as library catalogs and article databases, are also excluded, because search engine spiders cannot reach them. All this material is referred to as the "Invisible Web": what you don't see in search engine results.
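The crawl-then-index pipeline described above can be sketched on a toy web. This is a minimal illustration, not how any real engine is built: the `WEB` dictionary, the `a.example` URLs, and the functions `crawl`, `build_index` and `search` are all hypothetical. The crawler only follows links, breadth-first, from seed URLs; the indexer builds an inverted index (word to set of pages) so keyword queries can be answered; and a page nothing links to never gets indexed.

```python
from collections import defaultdict, deque
import re

# Hypothetical toy web: URL -> (page text, list of outgoing links).
WEB = {
    "a.example/": ("Welcome to the home page", ["a.example/search", "a.example/blog"]),
    "a.example/blog": ("Notes on spiders and indexing", ["a.example/"]),
    "a.example/search": ("Search form for the library catalog", []),
    "a.example/orphan": ("A page that nothing links to", []),
}


def crawl(seeds):
    """Breadth-first crawl: only pages reachable by following links are found."""
    found, queue = set(), deque(seeds)
    while queue:
        url = queue.popleft()
        if url in found or url not in WEB:
            continue
        found.add(url)
        _text, links = WEB[url]
        # A spider only follows links; it never types queries into search forms,
        # so records behind the catalog's search box stay out of reach.
        queue.extend(links)
    return found


def build_index(urls):
    """Inverted index: each lowercase word -> set of URLs containing it."""
    index = defaultdict(set)
    for url in urls:
        text, _links = WEB[url]
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every word of the query (boolean AND)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


if __name__ == "__main__":
    pages = crawl(["a.example/"])
    index = build_index(pages)
    print(sorted(pages))                    # the orphan page is missing
    print(search(index, "spiders"))         # found via the link from the home page
    print(search(index, "nothing links"))   # empty: the orphan page is invisible
```

Adding `"a.example/orphan"` to the seed list models a human submitting its URL: the crawler then fetches it and the indexer makes it searchable like any other page.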
