Alan E. Mann, AG
Accredited Genealogist
fhfair@alanmann.com
www.alanmann.com/articles
Prepared January 2005
Search Engines for Genealogists
I often tell people the Internet is the richest source of genealogical
information available today. The amount, scope, and availability of data are
staggering, even incomprehensible. It is virtually certain that there is valid
information about your ancestors on the Internet that you don’t have.
Information that you would probably want if you only knew it was there. So how can you find it? With a lot of searching. This
session looks at tools to help your searching.
Generally, a genealogist’s Internet searching has two phases. This
presentation focuses on the first: finding the website.
While “search engine” can correctly be used in several different ways, its
most common meaning is a web tool for finding web pages on a specific topic. An
Internet search engine is like a catalog of the Internet. Companies and
individuals notify the search engines about their web sites because they want
people to find them. Search engines also use spiders or robots: programs that
roam the Internet, capturing pages and then indexing them. Beware: not all
search engines are equal. Some index only the first few sentences of a web
page; others index every word. No search engine indexes all of the web. Some
have billions of pages and others only have a few hundred million…
Search engines are not designed specifically for genealogy; they simply
search for whatever words you enter. There are thousands of search engines; one
source claims to list over 809,000 of them. Basically, a search engine visits
web sites and indexes their content. While a search engine can index the name
Richard Poor, it cannot distinguish between a person by that name, Poor
Richard’s Almanac, and a play containing the line “Alas, poor Richard…” Once,
when I was searching for wills left by my Brooks family ancestors, a search
engine confidently directed me to a page containing the sentence “Garth Brooks
will be appearing…” There are some tricks to using search engines. Use unusual
names whenever possible (see tip #6, above). When searching for a common family
name, add the word genealogy, the phrase “family history,” or the phrase “was
born” after the name to narrow your search.
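To picture why a search engine cannot tell Richard Poor the person from Poor Richard’s Almanac, it helps to think of its index as a list of every word paired with the pages that contain it (an inverted index). The short Python sketch below is only an illustration: the three page snippets are made up, and real search engines rank and filter results in far more elaborate ways. It shows that a search matches words, not meanings, and that adding a word such as genealogy narrows the results.

```python
import re
from collections import defaultdict

# Made-up page snippets, for illustration only.
PAGES = {
    "page1": "Richard Poor was born in 1790; genealogy of the Poor family",
    "page2": "Poor Richard's Almanac by Benjamin Franklin",
    "page3": "Alas, poor Richard, the play's most quoted line",
}

def tokenize(text):
    """Lowercase the text and split it into words (letters only)."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Map each word to the set of pages that contain it (an inverted index)."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index

def search(index, query):
    """Return the pages that contain every word in the query (an AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "Richard Poor")))
print(sorted(search(index, "Richard Poor genealogy")))
```

On these made-up pages, “Richard Poor” matches all three, while “Richard Poor genealogy” matches only the first.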
Don’t forget that you need to know what you are searching: What is its scope?
What is its source? How do you use it? Here are a few pointers that apply to
genealogical searches on the Internet.
What is metasearch? The term does not yet appear in most dictionaries, but it
is common on the web, where it is used most often to describe an Internet
metasearch engine. The general idea is that you type keywords into its search
box, and it transmits your search to several individual search engines
simultaneously. Within a few seconds, you get back results from all of them.
Metasearch engines do not have their own index or database of web pages; they
send your search terms to the indexes kept by search engine companies and then
combine the results.
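The following Python sketch illustrates the idea under simple assumptions: the three “engines” are hypothetical functions returning made-up page addresses, standing in for real search engines that a metasearch would contact over the Internet. The query goes to all of them at the same time, and the results are combined by taking each engine’s first result, then each engine’s second result, and so on, skipping pages already listed. Real metasearch engines use their own, usually undisclosed, ways of merging and ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three search engines; the addresses are made up
# and these stubs ignore the query. A real metasearch engine would send the
# query over the Internet to each engine's own index instead.
def engine_a(query):
    return ["example.org/brooks-wills", "example.org/brooks-family"]

def engine_b(query):
    return ["example.org/brooks-family", "example.net/garth-brooks-tour"]

def engine_c(query):
    return ["example.com/brooks-genealogy"]

ENGINES = [engine_a, engine_b, engine_c]

def metasearch(query, engines=ENGINES):
    """Send the query to every engine at once, then merge their result lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    longest = max((len(results) for results in result_lists), default=0)
    # Round-robin merge: every engine's 1st result, then every 2nd, and so on,
    # skipping addresses already in the merged list (removing duplicates).
    for rank in range(longest):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged

print(metasearch("Brooks will"))
```

Here the family page returned by two engines appears only once, and the Garth Brooks page still slips in, since a metasearch inherits the weaknesses of the engines it queries.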
What you need to know about metasearching is that the quality of a metasearch
engine’s results depends on which engines it searches and how it organizes the
results. A metasearch cannot be better than the sum of the individual databases
it queries.
There are some good general web metasearch engines which are not designed as
genealogical search tools, but which can be used to search for genealogy or
genealogically-related topics.
This class extends the idea to genealogy web pages. Generally speaking, a genealogy
metasearch tool would be something that searches several databases or several
web sites. Using this definition, metasearches can either be those that search
several databases on a single site or tools that search several web sites and
combine the results. This class will look at examples of both types of
metasearches.
Single-Site Metasearches
Here, we can talk about Ancestry, Heritage Quest Online, FamilySearch,
RootsWeb, or a variety of other web sites. Each of these sites holds many
databases but offers a (meta)search that looks through all of them and
presents the combined results.
USGenWeb. The simplest
would be the USGenWeb Archive. Here, there are
hundreds of thousands of files representing extracts, transcriptions,
abstracts, and indexes to many millions of names. The site search engine allows
you to search all of their files at once or all of the files for any one state.
However, the search options are extremely limited. Basically, you can search
for any word in any of the files selected, but you cannot specify whether the
word is a name, place, relationship, or something else. This is called a
freeform, general, or unfielded search. While it does
offer the advantage of searching many things at once, it doesn’t give much
flexibility to limit or narrow the search results.
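The difference between a freeform (unfielded) search and a fielded search can be sketched in a few lines of Python. The records below are invented for illustration and do not reflect how the USGenWeb Archive actually stores its files; the point is only that a freeform search for Brooks matches the word wherever it appears, while a fielded search can insist that Brooks be the surname.

```python
# Made-up records for illustration; each value sits in a named field.
RECORDS = [
    {"given": "Mary", "surname": "Brooks", "place": "Boston", "relationship": "wife"},
    {"given": "John", "surname": "Taylor", "place": "Brooks", "relationship": "son"},
    {"given": "Brooks", "surname": "Adams", "place": "Salem", "relationship": "father"},
]

def freeform_search(records, word):
    """Unfielded search: the word may match a name, place, relationship, or anything else."""
    word = word.lower()
    return [r for r in records if any(value.lower() == word for value in r.values())]

def fielded_search(records, **criteria):
    """Fielded search: each word must match the field it was entered in."""
    return [
        r for r in records
        if all(r.get(field, "").lower() == value.lower() for field, value in criteria.items())
    ]

print(len(freeform_search(RECORDS, "Brooks")))
print(len(fielded_search(RECORDS, surname="Brooks")))
```

On these made-up records, the freeform search finds all three, including a place named Brooks and a man whose given name is Brooks, while the surname search finds only Mary Brooks.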
Ancestry. This site has many different databases, and its default search
covers all of them: census, wills, family history books, obituaries, and more.
You usually search by name, but you can add country, state or province, year
range, keyword, or record type. You can also specify whether to use Soundex or
exact spelling. Specifying a record type in the search form does not change the
search template, but selecting a record type from the list at the right does.
You will then probably be able to specify only name and keyword. You will also
get a list of databases so that you can further restrict your search.
Ancestry’s general policy is to search each database for the items specified
but to ignore any input fields that don’t apply to that database. For example,
if you specify a range of years but the database being searched doesn’t record
years, Ancestry’s metasearch simply ignores the year range and displays any
results from that database.
FamilySearch. This site lists the databases it
searches along the left. The default is “all resources,” a metasearch. You can
limit your search to a specific database to get additional search options.
FamilySearch’s general policy is to restrict your search fields to those common
to all the databases being searched. Thus, an “all resources” search offers
only the fields shared by all of its databases. When you select census records,
you get different search fields. When you select one census year to search, you
get yet more options unique to that census. One exception is the web site
search, which disregards everything you enter except surname (this search is
nearly useless, except for unusual surnames).
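FamilySearch’s policy of offering only the fields the selected databases share amounts to taking the intersection of their field lists. The Python sketch below uses made-up database names and fields to show why selecting fewer databases yields more search options; it is not FamilySearch’s actual configuration. Compare it with the Ancestry policy described above, which keeps every field and ignores those a database lacks.

```python
# Hypothetical databases and the search fields each one supports.
DATABASE_FIELDS = {
    "census_1850": {"given_name", "surname", "year", "place"},
    "census_1880": {"given_name", "surname", "year", "place", "relationship_to_head"},
    "vital_records": {"given_name", "surname", "year", "place", "spouse"},
    "family_trees": {"given_name", "surname", "spouse"},
}

def search_fields(selected):
    """Offer only the fields that every selected database has in common."""
    field_sets = [DATABASE_FIELDS[name] for name in selected]
    return set.intersection(*field_sets) if field_sets else set()

print(sorted(search_fields(DATABASE_FIELDS)))                 # all resources
print(sorted(search_fields(["census_1850", "census_1880"])))  # census records
print(sorted(search_fields(["census_1880"])))                 # one census year
```

With all four databases selected, only given name and surname remain; narrowing to census records adds year and place, and a single census year adds a field unique to that census.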
Heritage Quest Online. This site has some very useful ways of grouping
results and actually has the most flexible census searches. It is less of a
metasearch than the other sites listed here because its databases are divided
into three categories and no single search covers all three. Nonetheless, a
search within one category covers thousands of databases.
Genealogical Metasearches
Perhaps the best example of a multiple-site metasearch is Internet Family
Finder (www.genealogy.com/ifftop.html). It searches over 300,000 separate
family history databases. Unfortunately, the search has no true fields other
than first and last name, but those fields have been well identified, making it
much more useful than USGenWeb’s text-only search.
Another example is more of a tool than a search: Culman’s MultiGen. Found at
http://ourworld.compuserve.com/homepages/CACulman/MultiGen.htm, this site has
you enter a name once, then submits a search request to ten genealogy sites at
once. If you select the “open new window” option and then click on “Search them
all,” you will get ten windows with the separate results from each of the ten
sites. While not a true metasearch because it doesn’t combine the results, it
can save time by conducting several searches at once…
Clustering Metasearches
Clustering metasearch engines find results and group them by common terms
found on the resulting pages. This can be very helpful, because it suggests
other terms that you may recognize and use to narrow your search results. Three
examples of clustering metasearches are Clusty (www.clusty.com), Kartoo
(www.kartoo.com), and Vivisimo (www.vivisimo.com).
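A rough idea of clustering can be given in Python: count which words (other than the search terms themselves) appear on more than one result page, then group the pages under the most common of those words. This is a simplified illustration with made-up results; it is not how Clusty, Kartoo, or Vivisimo actually build their clusters.

```python
import re
from collections import Counter

# Made-up search results for the query "Brooks": address -> page text.
RESULTS = {
    "example.org/1": "Brooks wills, Virginia probate records",
    "example.org/2": "Brooks family wills from Virginia",
    "example.org/3": "Brooks wills abstracted, Kentucky",
    "example.org/4": "Garth Brooks concert tour",
    "example.org/5": "Garth Brooks will be appearing",
}

STOPWORDS = {"a", "an", "and", "be", "by", "for", "from", "in", "of", "the", "to"}

def terms(text):
    """Return the set of distinct, non-stopword words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def cluster(results, query, max_clusters=3):
    """Group result pages under the words most often shared between them."""
    query_terms = terms(query)
    page_terms = {url: terms(text) - query_terms for url, text in results.items()}
    counts = Counter(term for words in page_terms.values() for term in words)
    # Most common terms first; ties broken alphabetically so output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    clusters = {}
    for term, count in ranked:
        if count < 2 or len(clusters) == max_clusters:
            break
        clusters[term] = sorted(url for url, words in page_terms.items() if term in words)
    return clusters

for term, pages in cluster(RESULTS, "Brooks").items():
    print(term, pages)
```

On these made-up results for Brooks, the wills pages, the Garth Brooks pages, and the Virginia pages each form their own group, which suggests words to add to, or pages to skip in, your next search.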
Yet More Search Engine Information
A good Internet metasearch engine searches good databases, accepts complex
searches, integrates results well, eliminates duplicates, and offers additional
features such as clustering by subject within your search results.
Yet another metasearch engine with some extra helpful features is ZapMeta
(www.zapmeta.com). Try turning snapshots on, and try out the other features!
For more general Internet search tools and information on search engines, see http://searchenginewatch.com/links/article.php/2156241,
http://www.searchenginewatch.com/facts/index.php,
and http://www.netstrider.com/search/.
©Copyright 1997-2005 by Alan E. Mann. All rights reserved.
Written permission to reproduce all or part of this syllabus material in any
format, including photocopying, data retrieval, or the Internet, must be
secured in advance from the copyright holder.