Alan E. Mann, A.G. (Accredited Genealogist)
alan.familyhistory@gmail.com
www.alanmann.com/articles
BYU 2007 Annual Family History & Genealogy Conference
Friday, 3 August 2007, 11:00 am - 12:00
Beyond Google:
Using Internet Search Engines
The Internet is the richest source of genealogical information available
today. The amount, scope, and availability of data are staggering, even
incomprehensible. It is virtually certain that the Internet holds valid
information about your ancestors that you don’t have, information you would
want if only you knew it was there. So how can you find it? That’s the topic
of this session.
What is a Search Engine?
While the term search engine can correctly be used in several different ways,
the most common usage is a web tool for finding web pages on a specific topic.
It is important to understand what a search engine is, what it includes, and
how best to use it. The answers to these questions vary from one search engine
to another. Generally, a search engine indexes the web sites it has been able
to identify. Some search engines index every word, some index only the first
page, and a few index just the first few sentences.
Companies and individuals with web sites notify the search engines about their
sites because they want people to find them. Search engines also use spiders
or robots, tools that roam the Internet capturing pages and then indexing
them. Beware: not all search engines are equal. Some index only the first few
sentences of a web page. Others index every word. No search engine indexes all
of the web. Some have billions of pages and others have only a few hundred
million.
Search engines are not designed specifically for genealogy; they simply look
for whatever words you enter. Originally, while a search engine could index
the name Richard Poor, it couldn’t distinguish between a person by that name,
Poor Richard’s almanac, and a play that had the line “Alas, poor Richard…”
Once, when I was searching for wills left by my Brooks family ancestors, a
search engine confidently directed me to a page where I found the sentence
“Garth Brooks will be appearing…” Search engines are becoming more
sophisticated, and some ability to make such distinctions is being designed
into them. There are some tricks to using search engines. For example, use
unusual names whenever possible (see sidebar).
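To see why plain word matching produces hits like the Garth Brooks one, here is a minimal Python sketch of an every-word index. The three page texts and the names `PAGES`, `tokenize`, `build_index`, and `search` are invented for illustration; real search engines are far more elaborate, and this is not how any particular engine works.

```python
import re
from collections import defaultdict

# Hypothetical page texts, loosely based on the examples in the paragraph above.
PAGES = {
    "page_a": "Will of Richard Poor, proved 1790 in Essex County.",
    "page_b": "Poor Richard's Almanack was first published in 1732.",
    "page_c": "Garth Brooks will be appearing in concert on Friday.",
}


def tokenize(text):
    """Lowercase the text and split it into plain alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(pages):
    """Map every word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(PAGES)
print(search(index, "Richard Poor"))  # matches the will AND the almanac page
print(search(index, "Brooks will"))   # matches the concert announcement
```

The index knows only which words appear on which page. It has no idea whether "will" is a legal document or a verb, which is exactly the weakness described above.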
The best known search engine today is Google. Google indexes every word on the
sites that it has copied. It is estimated that Google’s 4 billion+ indexed
pages represent somewhere in the area of 15-20% of the web. That means that
80% or more of the web remains unrepresented in Google! Consider using more
than just Google, for three reasons. First, no search engine indexes more than
20% of the web. Second, no two of them index the same 20% or less, so each
lists sites that the others may not. Third, search engines differ in their
methods of searching, that is, in the ways they apply your search terms. You
need to read the help page and search tips and experiment with each search
engine to find the best way to use it.
In addition, there are nearly a million other
search engines. Some are specialized, some are general. The major four are
Google, Yahoo, Ask, and Live (formerly MSN). General information with links to
search tips can be found at www.geocities.com/familyhistory.geo/search2006.htm.
The lines between different types of search
engines are blurring. The various types have been adopting the features and
some of the functions of the other types, making distinctions almost
nonexistent. Nonetheless, it would be worthwhile to look at four special types
or characteristics of search engines, namely metasearch,
clustering, federated, and custom searching.
Metasearching
In its original sense, a metasearch engine is one in which you submit keywords
in its search box, and it then transmits your search to several individual
search engines simultaneously. Within a few seconds, you get back results that
came from several search engines. Metasearch engines do not have their own
index or database of Web pages; they send your search terms to the indexes
kept by search engine companies, then combine the results.
Merriam Webster defines meta as “more comprehensive : transcending
<metapsychological> -- usually used with the name of a discipline to designate
a new but related discipline designed to deal critically with the original
one.” Therefore, a metasearch would be a comprehensive or transcending search,
or, in other words, a search which includes more than one search.
Single Site MetaSearch. Here, we can talk about Ancestry, Heritage Quest Online, FamilySearch,
RootsWeb, USGenWeb Archive,
or a variety of other web sites. These are sites that have a lot of databases,
but have a tool that will search through and present results from all of the
different databases.
Multiple Site MetaSearch. What you need to know about metasearch engines is
that the quality of their results depends on what they search and how they
organize the results. A metasearch cannot be better than the sum of the
individual databases it queries. A good Internet metasearch engine searches
good databases, accepts complex searches, integrates results well, eliminates
duplicates, and offers additional features such as clustering by subjects
within your search results.
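The core of a metasearch (send one query to several engines, integrate the results, eliminate duplicates) can be sketched in a few lines of Python. Everything here is hypothetical: `engine_one` and `engine_two` are stand-ins that return canned URLs, and the ranking rule (pages found by more engines first) is just one plausible choice, not the method of any real metasearch engine.

```python
def engine_one(query):
    """Stand-in for a real search engine; returns a ranked list of URLs."""
    return ["http://example.org/stubbs", "http://example.org/iron-county"]


def engine_two(query):
    """A second stand-in engine with partly overlapping results."""
    return ["http://example.com/utah", "http://example.org/stubbs/"]


def metasearch(query, engines):
    """Send one query to every engine and merge the results into one list.

    `engines` maps an engine name to a function that takes the query and
    returns a ranked list of URLs.
    """
    merged = {}
    for name, engine_search in engines.items():
        for rank, url in enumerate(engine_search(query), start=1):
            # Crude duplicate detection: ignore a trailing slash.
            key = url.rstrip("/")
            entry = merged.setdefault(
                key, {"url": key, "engines": [], "best_rank": rank}
            )
            entry["engines"].append(name)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Pages reported by more engines come first, then pages ranked higher.
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["engines"]), e["best_rank"]),
    )


for result in metasearch("stubbs", {"one": engine_one, "two": engine_two}):
    print(result["url"], result["engines"])
```

Note that the merged list can never contain a page that none of the underlying engines returned, which is the point made above: a metasearch is only as good as what it queries.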
While there are many metasearch engines, I would like to show you one with
some extra helpful features: ZapMeta (www.zapmeta.com). Try turning snapshots
on. The past versions can be helpful, and I love the preview panes! Three
companies have tried to apply the broad metasearch concept to genealogy. The
two still in business are Internet Family Finder and MultiGen. Perhaps the
best example of a multiple-site metasearch is Internet Family Finder
(www.genealogy.com/ifftop.html), which searches over 300,000 separate family
history databases. Unfortunately, the search has no true fields other than
first and last name, but those have been well identified, making it much more
useful than text-only searches.
Clustering
Clustering metasearch engines find results and group them by common terms
found on the resulting pages. At first, the difference between a regular
metasearch and a clustering metasearch is difficult to see. Both allow
searching for specific terms within a set of results. The difference is that
clustering tools suggest terms rather than searching only for what you input.
This can be very helpful, because you may recognize a suggested term and use
it to narrow down your search results. The leader in this field is
www.clusty.com. Two unique examples of clustering metasearches are Kartoo
(www.kartoo.com) and Vivisimo (www.vivisimo.com). The strategy in using these
tools is to search for a name, record type, or concept and then use the words
in common on the left to focus in on what you are looking for.
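As a rough illustration of how a tool might come up with those suggested terms, the Python sketch below groups result snippets under the words they share. The snippets, the stop-word list, and the function name `cluster_by_terms` are all made up for this example; Clusty, Kartoo, and Vivisimo use their own, more sophisticated methods.

```python
import re
from collections import defaultdict

# A tiny, assumed list of words too common to be useful as cluster labels.
STOP_WORDS = {"the", "of", "in", "and", "a", "for", "to", "on"}

# Hypothetical result snippets for the query "stubbs".
SNIPPETS = [
    "Stubbs family cemetery records, Iron County, Utah",
    "Stubbs census index for Iron County",
    "Stubbs cemetery photographs",
]


def cluster_by_terms(query, snippets, min_results=2):
    """Group snippets under terms they share, ignoring the query words.

    Returns a dict mapping each suggested term to the positions of the
    snippets that contain it. Terms found in fewer than `min_results`
    snippets are dropped.
    """
    skip = STOP_WORDS | set(re.findall(r"[a-z]+", query.lower()))
    clusters = defaultdict(list)
    for position, snippet in enumerate(snippets):
        terms = set(re.findall(r"[a-z]+", snippet.lower())) - skip
        for term in terms:
            clusters[term].append(position)
    return {
        term: positions
        for term, positions in clusters.items()
        if len(positions) >= min_results
    }


for term, positions in sorted(cluster_by_terms("stubbs", SNIPPETS).items()):
    print(term, positions)
```

Here the searcher typed only "stubbs", yet the tool can offer "cemetery", "county", and "iron" as ways to narrow the list.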
Federated
A federated search is a type of metasearch. Rather than the freeform searching
typical of most search engines, a federated search uses organized or fielded
data and combines two or more sites with similar data structures into a single
results list. For example, both Ancestry.com and FamilySearch.org allow you to
search by first name, last name, birth year, and place. A federated search
might allow you to input each of those four pieces of data, then search both
FamilySearch and Ancestry, then present the results in a single list. The
leader in federated searching is WebFeat. See the Family History Library
desktop combined search or www.myheritage.com for examples. BYU Idaho has a
tool that federates searching of biography databases, but it only works on
campus. See the list of databases that are federated at
http://abish.byui.edu/library/r_biographies.cfm.
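The four-field example above can be sketched in Python as follows. This is only an illustration under stated assumptions: `site_a` and `site_b` are in-memory stand-ins, not the real Ancestry or FamilySearch services, and the names `PersonQuery`, `make_source`, and `federated_search` are invented here. Sorting by closeness of birth year is one possible way to order the combined list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonQuery:
    """The four fields both hypothetical sites understand."""
    first_name: str
    last_name: str
    birth_year: int
    place: str


def make_source(records):
    """Build a stand-in search function over an in-memory list of records."""
    def search_fn(query):
        return [
            r for r in records
            if r["first_name"].lower() == query.first_name.lower()
            and r["last_name"].lower() == query.last_name.lower()
        ]
    return search_fn


def federated_search(query, sources):
    """Run the same fielded query on every source and return one list."""
    combined = []
    for source_name, search_fn in sources.items():
        for record in search_fn(query):
            combined.append({"source": source_name, **record})
    # Records whose birth year is closest to the one requested come first.
    combined.sort(key=lambda r: abs(r["birth_year"] - query.birth_year))
    return combined


SOURCES = {
    "site_a": make_source([
        {"first_name": "Mary", "last_name": "Taylor", "birth_year": 1820, "place": "Ohio"},
        {"first_name": "Mary", "last_name": "Taylor", "birth_year": 1851, "place": "Kent"},
    ]),
    "site_b": make_source([
        {"first_name": "Mary", "last_name": "Taylor", "birth_year": 1822, "place": "Ohio"},
        {"first_name": "John", "last_name": "Taylor", "birth_year": 1820, "place": "Ohio"},
    ]),
}

query = PersonQuery("Mary", "Taylor", 1821, "Ohio")
for row in federated_search(query, SOURCES):
    print(row["source"], row["birth_year"], row["place"])
```

Because both sources share the same fields, their records can be compared and sorted together, which is what separates a federated search from a freeform metasearch.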
Custom Searching
Google has made it
possible for those wanting to try a little programming to design customized
Google searches. For a few examples of such searches, see:
Yet More Search Engine Information
Your web browser has
a default search engine, which is where it goes when you type what you want to
find into the address bar of your web browser. Try it and see what happens. My
wife loves this! If you think your research might be in
For more information
about search engines, see www.searchengineshowdown.com
and searchenginewatch.com/links/article.php/2156241
(metasearching), searchenginewatch.com/resources/index.php
(facts, tutorials, explanations), and www.searchenginewatch.com/facts/index.php.
A couple more concepts to cover: personalization and intelligent
searching.
Something new for several search engines is personalization. A great example
is A9 (a9.com). You can lay out your search screen and results screen, change
column widths or remove columns altogether, and much more. You can store your
favorites on the site (accessible from any computer), and you can personally
annotate any web page using “my diary.” With only a few refinements, this
could be a major improvement in your search experience.
A surprising newcomer to the search engine marketplace is GigaBlast. Or try a
slider. One early slider implementation is labs.google.com/personalized, which
lets you slant your search results towards your personal preferences.
Soon-to-be-released sliders will allow you to specify how relevance ranking is
weighted.
Internet Search Tips
1. Start your searches broad and narrow them down when necessary. Too much
detail may cause you to miss something useful. For example, when visiting
FamilySearch.org, we can enter first name, last name, birth year, birth place,
spouse’s name, father’s full name, and mother’s full name. Enter just the
first and last name unless the name is fairly common (you probably shouldn’t
search for just Thomas Walker, William Jones, or Mary Taylor, but add some
place or time period when searching for common names). If the surname is
unusual, enter surname only for your search. To start, leave the place blank;
you never know when family members might be found in an unexpected part of the
world. If you get too many hits, then add some detail to a new search, but
only as much detail as necessary to reduce the search results to a manageable
level.
2. Be aware of what is being searched. You may want to search at either a
higher or lower level (e.g., IGI only vs. “all resources”). Sometimes,
restricting your search to a single database gives you additional search
options or effectively narrows your search down to something manageable.
3. Check out search help: search help often actually helps. This may be hidden
in a link to “advanced search.” There may be options that look appealing but
that you shouldn’t use, like exact spelling in FamilySearch.
4. Find out what options you have in searching. You may be able to use
partial, truncated, or wildcard searches.
5. The most important tip of all is the simplest. READ THE SCREEN. If you take
time to do that, you can avoid many rookie mistakes.
6. Consider searching for uncommon names first. If John Smith married Hortense
Frinzwilter, don’t search for John Smith; search for Hortense Frinzwilter. If
David Brown had a brother Eliphalet Brown, search for Eliphalet. Then David
may appear on the same page or in the same source.
7. If you enter a year, ALWAYS select “range of years” or + or – x years.
Years are often estimated, approximated, or incorrectly reported.
8. Don’t stop just because you succeed! Success can be a barrier to greater
success. One common mistake is to stop searching when we find something. It’s
great to find something about our ancestor, but we should continue our search.
Finish the list of search results or “hits”, then continue with the other
search aspects (see tip 9) even after finding what you were seeking.
9. Alter your search approach to find information not just by name, but also
by places your ancestors lived, topic (e.g., ethnicity, religion, society),
time period, event (e.g., war, famine), characteristic, record type needed, or
keyword. Another approach may lead to additional information.
10. The Internet is dynamic. If what you wanted isn’t there today, it might be
tomorrow. If you find something today, you may be able to find yet more later.
This means that our searches may need to be repeated from time to time to
locate new information or newly indexed or categorized information.
11. Did you know that you can use site: or inurl: to focus in on the most
likely results? For example, I want to see if any of my Iron County, Utah
STUBBS family appears on the USGenWeb page for Iron County, but the site
doesn’t have a search engine. I just type site:rootsweb.com inurl:utiron
stubbs into my Google search, and I’ve cut hours of searching to seconds.
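If you run the same kind of site-restricted search for many surnames or counties, the query from tip 11 can be assembled by a short Python script. The helper names `build_query` and `google_search_url` are made up for this sketch; the only things taken from the tip are the `site:` and `inurl:` operators and the sample values.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, inurl=None):
    """Combine search terms with optional site: and inurl: restrictions."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    parts.extend(terms)
    return " ".join(parts)


def google_search_url(query):
    """Turn a query string into a Google search address for a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(["stubbs"], site="rootsweb.com", inurl="utiron")
print(query)
print(google_search_url(query))
```

Swapping in another surname or another county's abbreviation produces a new query without retyping the operators.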
Intelligent Searching. In an intelligent search, the computer gathers the
results from many web sites and evaluates whether any of them could be the
ancestor you want. A semantic web would mark up records, transcripts,
abstracts, or indexes with the context of the records. Thus, not only name but
also place, time period, and record type would be identified within the
meta-tags embedded in the page itself. If you asked a search tool for help
identifying the birth date of James Peterson in Holbaek, Denmark in 1786, it
would flag church records from that time period in that area which contained
the name Jens Pedersen. It would include a marriage in 1810, but ignore a
marriage record in 1790 (he would have been only 4 years old). It would look
at death records from the mid 1800s, but would ignore a death record from the
1900s. Believe it or not, this is not only possible, but is the future of
genealogical research. The only question is how long it will take.
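The kind of reasoning described above can be sketched in Python. This is a toy illustration only: the name-variant table, the age limits (16 as the youngest marriage age, 110 as the longest lifespan), and the sample records are all assumptions made for the example, not rules used by any existing tool.

```python
# Hypothetical table of name variants; a real tool would need a far larger one.
NAME_VARIANTS = {
    "james peterson": {"james peterson", "jens pedersen"},
}

# Assumed age limits in years for each record type: (youngest, oldest).
AGE_LIMITS = {
    "birth": (0, 0),
    "marriage": (16, 110),
    "death": (0, 110),
}

# Made-up records tagged with name, record type, and year.
RECORDS = [
    {"name": "Jens Pedersen", "type": "birth", "year": 1786},
    {"name": "Jens Pedersen", "type": "marriage", "year": 1790},
    {"name": "Jens Pedersen", "type": "marriage", "year": 1810},
    {"name": "Jens Pedersen", "type": "death", "year": 1855},
    {"name": "Jens Pedersen", "type": "death", "year": 1905},
    {"name": "Hans Nielsen", "type": "death", "year": 1855},
]


def is_plausible(record, name, birth_year):
    """Return True if the record could belong to the person searched for."""
    variants = NAME_VARIANTS.get(name.lower(), {name.lower()})
    if record["name"].lower() not in variants:
        return False
    limits = AGE_LIMITS.get(record["type"])
    if limits is None:
        return False  # unknown record type: do not guess
    age = record["year"] - birth_year
    return limits[0] <= age <= limits[1]


matches = [r for r in RECORDS if is_plausible(r, "James Peterson", 1786)]
for record in matches:
    print(record["type"], record["year"])
```

The 1790 marriage is dropped because the groom would have been 4, and the 1905 death is dropped because he would have been 119. This only works because each record carries its type and year as tagged data, which is the role the meta-tags play in the semantic web idea.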
©Copyright 2006-7 by Alan E. Mann. All rights reserved. Written permission
to reproduce all or part of this syllabus material in any format, including photocopying,
data retrieval, or the Internet, must be secured in advance from the copyright
holder.