Included here are selected services that search very large indexes of Internet resources. Each service consists of three components: a "robot" of some sort that automatically collects links, titles, and text from millions of Internet sites; a database where this resource information is stored; and a search engine that lets the user query the database for sites of interest. Some services also provide a limited browsable subject catalog, but the primary goal of each tool in this section is to provide a large, searchable database of Internet resources. Alongside these large indexes are meta searchers, which take relatively simple input and search many indexes at once.
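To make the three components concrete, here is a minimal Python sketch of the database and search engine halves. It assumes the robot's output is simply a dictionary mapping each URL to its page text. The inverted index (each word mapped to the set of URLs containing it) is a standard textbook structure, not the documented design of any service listed below, and the sample URLs are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it.

    `pages` stands in for what the robot collected (URL -> page text).
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return, sorted, the URLs whose text contains every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)

# Hypothetical robot output.
pages = {
    "http://example.com/a": "Internet search engines index the Web",
    "http://example.com/b": "A browsable subject catalog of Internet sites",
}
index = build_index(pages)
print(search(index, "internet index"))  # ['http://example.com/a']
```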
Alta Vista
Provider: Digital Equipment Corporation
Browsable index: No
Search capabilities: Yes
Item submission method: Robot, also takes human submissions
Editorial function: No
Annotated: Yes, from the page itself
Notable features: Size of the index, speed of retrieval
FAQ: No, for the product. Yes, for the search engine, on the home page under "Help," for simple query mode, and on the home page under "Advanced Query," then "Help," for advanced query mode.
Alta Vista indexes millions of Web pages, including the full text of those pages. It also provides access to thousands of Usenet newsgroups. It supports Boolean AND/OR/NOT searching as well as phrase, proximity, truncation, and field searching. Alta Vista returns results in order of relevance but does not display relevance scores. Its two detailed searching FAQs are essential reading for anyone who wants to exploit the full power of the system. It is an extremely powerful search tool, particularly for very specific searches, and the Digital hardware that runs the system delivers very fast retrievals.
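The Boolean, phrase, and truncation searching described above can be sketched as set operations over a tiny in-memory corpus. This is an illustrative Python sketch, not Alta Vista's query syntax: the function names `term`, `phrase`, `and_`, `or_`, and `not_`, the trailing `*` used for truncation, and the sample documents are all assumptions.

```python
import re

# Hypothetical mini-corpus standing in for indexed Web pages.
DOCS = {
    1: "Digital hardware runs the search system",
    2: "Boolean searching of Usenet newsgroups",
    3: "Search engines support Boolean and phrase searching",
}
ALL = set(DOCS)

def words(text):
    """Lowercased alphanumeric words of a text, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term(t):
    """Documents containing t; a trailing '*' requests truncation (prefix match)."""
    t = t.lower()
    if t.endswith("*"):
        prefix = t[:-1]
        return {d for d, text in DOCS.items()
                if any(w.startswith(prefix) for w in words(text))}
    return {d for d, text in DOCS.items() if t in words(text)}

def phrase(p):
    """Documents containing the words of p consecutively and in order."""
    target = words(p)
    n = len(target)
    hits = set()
    if n == 0:
        return hits
    for d, text in DOCS.items():
        ws = words(text)
        if any(ws[i:i + n] == target for i in range(len(ws) - n + 1)):
            hits.add(d)
    return hits

# Boolean operators are plain set operations on result sets.
def and_(a, b):
    return a & b

def or_(a, b):
    return a | b

def not_(a):
    return ALL - a

print(sorted(and_(term("boolean"), not_(term("usenet")))))  # [3]
print(sorted(term("search*")))                               # [1, 2, 3]
print(sorted(phrase("boolean searching")))                   # [2]
```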
excite
Provider: Architext Software
Browsable index: Yes
Search capabilities: Yes
Item submission method: Robot, also takes human submissions
Editorial function: Yes, for the "NetDirectory" section
Annotated: Yes, from the page itself for the searchable index, and by excite editors for the "NetDirectory"
Notable feature: Size of index
FAQ: Yes, for the product, on the home page under "Excite Search." Yes, for the search engine, on the home page under "Help."
excite indexes millions of Web pages and the Usenet news articles posted during the previous two weeks. Its Intelligent Concept Extraction (ICE) search engine is built around concept searching but also supports Boolean AND/OR/NOT searching, so users may search either by "concept" or by keyword. Concept-based searching is easier and can retrieve more sites, but it can be confusing because it sometimes returns sites that are unrelated to the query. excite ranks results by relevance, marking each item with a colored icon that reflects its relevance level, and provides a percentage confidence rating. Clicking the relevance icon next to an item returns other, similar items (query by example). Users can also choose to view results by site, with sites listed hierarchically. Users should read the searching FAQs carefully to get the most out of the system.
HotBot
Provider: Inktomi and the HotWired Network
Browsable index: No
Search capabilities: Yes
Item submission method: Robot, also takes human submissions
Editorial function: No
Annotated: Yes, from the page itself
Notable features: Size of the index
FAQ: Yes, for both the product and the search engine, on the home page under "Help."
HotBot claims to index more than 50 million documents at present. Through a forms-based interface, it supports Boolean AND/OR/NOT and phrase searching. In "expert" mode, it also supports field searching by date, media type (audio, images, Java, Acrobat, etc.), and geographic location. Items are returned in order of relevance, with relevance given as a percentage.
Lycos
Provider: Lycos, Inc.
Browsable index: Yes, the Lycos "a2z" Directory.
Search capabilities: Yes
Item submission method: Robot, also takes human submissions
Editorial function: No
Annotated: Yes, from the page itself
Notable feature: Size of index
FAQ: Yes, for both the product and the search engine, on the home page under "Help."
Originally developed at Carnegie Mellon University, Lycos is now maintained by Lycos, Inc. It supports Boolean AND/OR searching (through menu picks) and ranks results by relevance; the user can control how many results are returned and how much detail is shown for each, as well as the minimum level of relevance. Each result includes an annotation drawn from the page itself. Lycos also maintains the "a2z" directory of sites in over 15 subject categories, which is based on the number of links that other sites on the Internet maintain to each site.
Open Text
Provider: Open Text Corporation
Browsable index: No
Search capabilities: Yes
Item submission method: Robot, also takes human submissions
Editorial function: No
Annotated: Yes, from the page itself
FAQ: Yes, for the product. Yes, for the search engine.
Open Text's engine indexes every word of over a million Web pages. In addition to Boolean AND/OR/NOT searching, it supports phrase, proximity, and field searching, and it returns results ranked by relevance.
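Proximity searching needs to know where each word occurs on a page, not just which pages contain it. A common way to support it is a positional index; the sketch below assumes a NEAR-style test in which two words must fall within `k` word positions of each other. The window size `k`, the function names, and the sample documents are hypothetical, and this is not a description of Open Text's internals.

```python
import re
from collections import defaultdict

def positional_index(docs):
    """Map word -> {doc_id: [positions]} so distances between words can be measured."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc_id].append(pos)
    return index

def near(index, a, b, k):
    """Return the set of documents where words a and b occur within k positions."""
    pa, pb = index.get(a.lower(), {}), index.get(b.lower(), {})
    hits = set()
    for doc_id in pa.keys() & pb.keys():  # documents containing both words
        if any(abs(i - j) <= k for i in pa[doc_id] for j in pb[doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical sample pages.
docs = {
    1: "Full text of Web pages is indexed",
    2: "Pages about the Web and its text",
}
idx = positional_index(docs)
print(sorted(near(idx, "web", "pages", 1)))  # [1]
print(sorted(near(idx, "web", "pages", 3)))  # [1, 2]
```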
infoseek ultra
Provider: Infoseek Corporation
Browsable index: No
Search capabilities: Yes
Item submission method: Robot, also takes human submissions
Editorial function: No
Annotated: Yes, from the page itself
Notable features: Size of index, speed of retrieval
FAQ: Yes, for the product, on the home page under "about Ultra."
Yes, for the search engine, on the home page under "help."
Infoseek's Ultra is currently in beta testing. It claims to index the full text of over 50 million pages. It supports Boolean AND/OR/NOT and phrase searching, as well as field searching in four categories (link, site, url, and title). Items are returned in order of relevance, with relevance given as a percentage. There is a link from Ultra to the Infoseek Guide, Infoseek's browsable Internet directory.
WebCrawler
Provider: America Online
Browsable index: Yes
Search capabilities: Yes
Item submission method: Robot, also takes human submissions
Editorial function: No
Annotated: Yes, from the page itself
Notable feature: Quick and easy location of pages
FAQ: Yes, for the product and the search engine, on the home page under "Help."
Originally developed at the University of Washington, WebCrawler is now maintained by America Online, although you don't need an America Online account to use it. It supports phrase, Boolean AND/OR/NOT, and proximity searching. Relevance scores are shown (if you select "Show Summaries"), as is a short summary taken from the page itself. It is a good basic tool for a "quick and dirty" search. WebCrawler also contains a browsable index in 15 subject categories.
MetaCrawler
Provider: Erik Selberg and Oren Etzioni
Browsable index: No
Search capabilities: Yes
Item submission method: None
Editorial function: No
Annotated: Yes, from the page itself
Notable features: Verifies results when using "+" or phrase searching.
FAQ: Yes, for the product, on the home page under "About." Yes, for the search engine, on the home page under "Examples."
With a single search request, MetaCrawler queries nine search engines: Open Text, Lycos, WebCrawler, InfoSeek, Excite, Inktomi, Alta Vista, Yahoo, and Galaxy. It supports Boolean AND/OR and phrase searching. MetaCrawler can verify that each reference returned by the search is accessible and contains "valid data." To activate verification, use phrase searching or place a "+" in front of a keyword. This verification feature is unique among "robot"-type search engines.
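The verification step can be sketched as follows: fetch each returned URL, drop it if the fetch fails, and keep it only if the page still looks relevant. MetaCrawler's exact definition of "valid data" is not given here, so this sketch assumes it means the page contains every query word. The `verify` and `fetch` names are hypothetical, and the demo injects a stub fetcher so it runs without network access. Because every result must be downloaded, this step is inherently slow.

```python
import urllib.request

def fetch(url, timeout=10):
    """Download a page and return its text; raises OSError on network/HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def verify(urls, query_words, fetcher=fetch):
    """Keep URLs that can be fetched and whose text contains every query word."""
    verified = []
    for url in urls:
        try:
            text = fetcher(url).lower()
        except (OSError, ValueError):
            continue  # unreachable or malformed link: drop it
        # Assumed test for "valid data": every query word appears in the page.
        if all(word.lower() in text for word in query_words):
            verified.append(url)
    return verified

# Offline demo with a stub fetcher (hypothetical pages, no network access).
stub_pages = {
    "http://example.com/live": "MetaCrawler searches several engines",
    "http://example.com/stale": "This page has moved",
}

def stub_fetch(url):
    if url not in stub_pages:
        raise OSError("404 Not Found")
    return stub_pages[url]

print(verify(["http://example.com/live", "http://example.com/stale",
              "http://example.com/gone"], ["metacrawler"], stub_fetch))
# ['http://example.com/live']
```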
MetaCrawler collects confidence scores from each of the search engines used, combines them, and provides the search results in order of relevance based on the combined confidence score. It does not, however, return individual confidence scores.
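One way to combine confidence scores from engines that use different scales is to rescale each engine's scores by its top score and add them up per URL, returning only the resulting order. The source does not say how MetaCrawler combines scores, so this CombSUM-style rule is an assumption, as are the engine names, URLs, and scores in the example.

```python
from collections import defaultdict

def combine(results_by_engine):
    """Merge per-engine (url, score) lists into one list of URLs, best first.

    Each engine's scores are divided by that engine's top score so that
    engines using different scales become comparable; a URL's rescaled
    scores are then summed across engines. Only the ranking is returned,
    not the combined scores.
    """
    combined = defaultdict(float)
    for results in results_by_engine.values():
        if not results:
            continue
        top = max(score for _, score in results) or 1.0  # avoid dividing by zero
        for url, score in results:
            combined[url] += score / top
    # Highest combined score first; ties broken alphabetically by URL.
    return sorted(combined, key=lambda url: (-combined[url], url))

# Hypothetical engines reporting scores on different scales.
print(combine({
    "engine_a": [("u1", 1000), ("u2", 500)],
    "engine_b": [("u2", 0.9), ("u3", 0.45)],
}))  # ['u2', 'u1', 'u3']
```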
MetaCrawler lets the user focus the search by geographic region and by Internet domain type (e.g., "com", "edu"). It also lets the user set time and relevance performance parameters. Note that when verification is on, searches can take 3 to 5 minutes to complete. The user may nonetheless consider this time well spent, since every reference returned has been confirmed to be accessible.
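Focusing a search by domain type amounts to filtering result URLs on the last label of their host name. The sketch below uses Python's standard `urllib.parse.urlparse`; the function name and sample URLs are hypothetical, and geographic filtering is not shown.

```python
from urllib.parse import urlparse

def filter_by_domain(urls, domain_types):
    """Keep URLs whose host name ends in one of the given top-level domains."""
    wanted = {d.lower().lstrip(".") for d in domain_types}
    kept = []
    for url in urls:
        host = urlparse(url).hostname or ""  # hostname is already lowercased
        if host.rsplit(".", 1)[-1] in wanted:
            kept.append(url)
    return kept

# Hypothetical search results.
results = [
    "http://www.cs.washington.edu/research/",
    "http://www.example.com/",
    "http://www.example.co.uk/",
]
print(filter_by_domain(results, ["edu"]))  # ['http://www.cs.washington.edu/research/']
```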
SavvySearch
Provider: Daniel Dreilinger
Browsable index: No
Search capabilities: Yes
Item submission method: None
Editorial function: No
Annotated: Yes, from the page itself
Notable features: Queries over 20 search engines with one command. Groups results by sets of search engines. Context-sensitive help is available. Query form is available in multiple languages.
FAQ: Yes, for the product, on the home page under "FAQ." Yes, for the search engine, on the home page under "Help." Context-sensitive help is available by clicking on any blue box with an "i" in it.
SavvySearch allows the user to enter a single query to search over 20 search engines. (The list of engines can be found on the SavvySearch front page.) Search results can include Web sites, software, email addresses, and even movies. Boolean AND/OR and phrase searching are supported.
Results are returned in a "search plan," with the best matches and the search engines that produced them listed first. Alternatively, the user can request that the results be integrated into a single list of references, in which case duplicates are removed and the source engines are not shown. The user has some control over how many hits are gathered from each engine, as well as how much information is displayed. The short searching FAQ should be read carefully in order to fully exploit the system.
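An integrated result list like the one described above can be sketched as a round-robin merge of each engine's ranked list that skips URLs already seen. The interleaving order is an assumption (SavvySearch's actual merging method is not described here), duplicates are detected by exact URL match only, and the engine and URL names are placeholders.

```python
def integrate(results_by_engine):
    """Interleave per-engine ranked URL lists into one list without duplicates.

    Engine names are discarded, so the merged list does not say which
    engine found each URL.
    """
    merged, seen = [], set()
    lists = list(results_by_engine.values())
    depth = max((len(ranked) for ranked in lists), default=0)
    for rank in range(depth):          # take every engine's 1st hit, then 2nd, ...
        for ranked in lists:
            if rank < len(ranked):
                url = ranked[rank]
                if url not in seen:    # keep only the first occurrence
                    seen.add(url)
                    merged.append(url)
    return merged

print(integrate({"engine_a": ["u1", "u2", "u3"], "engine_b": ["u2", "u4"]}))
# ['u1', 'u2', 'u4', 'u3']
```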
search.com
Provider: c|net, inc.
Browsable index: Yes, of search engines
Search capabilities: Yes
Item submission method: None
Editorial function: Yes
Annotated: Yes
Notable features: The number of search engines available. Users can customize a page of favorite search engines. Brief annotations and searching tips for each search engine are provided.
FAQ: Yes, for the product. Yes, for the search engine, on the home page under "Help." A Boolean searching primer is also provided.
Search.com provides the user with direct access to over 250 search engines. The engines are organized into 20 subject categories, allowing the user to narrow a search by selecting engines specializing in general topics such as art, science, health, news, sports, or entertainment. Each engine is accompanied by a short annotation, as well as one or two searching tips for that engine. The best engines, as determined by search.com, are indicated by a "top pick" icon.
The search.com service lets the user create a personalized page of useful engines that appears each time that user starts search.com. The user can switch back to the default search.com listing by clicking the "search.com" icon at the top of the page, or choose from any of the main subject categories. To help users determine the most appropriate engines for specific queries, search.com also lets them search the titles of its 250 engines. Although search.com doesn't allow searching across engines with a single command, the number of engines available and the organization of the site make it a valuable Internet searching tool.