Alan E. Mann, AG (Accredited Genealogist)
alan.familyhistory@gmail.com
www.alanmann.com/articles
BYU 2008 Computerized Genealogy conference
Saturday, 15 March 2008, 4:15 pm
Search Engines: Getting More from Google and Using More than Google
The Internet is the richest source of genealogical information available today. The amount, scope, and availability of data are staggering, even incomprehensible. It is likely there is valid information about your ancestors on the Internet that you don’t have. Information you would probably want if only you could find it. So how can you find it? That’s the topic of this session.
What is a Search Engine?
While the term “search engine” can correctly be used in several ways, the most common usage is a web tool for finding web pages on a specific topic. It is important to understand what a search engine is, what it includes, and how best to use it. The answers to these questions vary from one search engine to another. Generally, a search engine indexes the web sites it has been able to identify. Some search engines index every word, some index only the first page, and a few index just the first few sentences.
Companies and individuals with web sites notify the search engines about their sites because they want people to find them. Search engines also use crawlers, spiders, or robots: tools that roam the Internet, capturing pages and then indexing them. Beware: not all search engines are equal. Some index only the first few sentences of a web page; others index every word. No search engine indexes all of the web. Some have billions of pages and others have only a few hundred million…
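To make the difference between indexing every word and indexing only the top of a page concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular search engine works: the two sample pages, the `build_index` function, and the `max_words` cutoff are all invented for this example.

```python
from collections import defaultdict

def build_index(pages, max_words=None):
    """Map each word to the set of page names that contain it.

    pages: dict of page name -> page text (made-up sample data).
    max_words: if set, only the first max_words words of each page are
    indexed, imitating an engine that reads just the top of a page.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        words = text.lower().split()
        if max_words is not None:
            words = words[:max_words]
        for word in words:
            # Strip surrounding punctuation so "utah." is indexed as "utah".
            index[word.strip(".,;:!?\"'")].add(name)
    return index

# Hypothetical pages, invented for illustration only.
pages = {
    "stubbs.html": "Stubbs family of Iron County, Utah. "
                   "Richard Stubbs married Hortense Frinzwilter in 1851.",
    "brooks.html": "Concert news. Garth Brooks will be appearing next month.",
}

full_index = build_index(pages)                   # every word
partial_index = build_index(pages, max_words=6)   # top of the page only

print(sorted(full_index["frinzwilter"]))     # the name is found
print(sorted(partial_index["frinzwilter"]))  # the name is missed
```

The unusual surname sits in the second sentence of the first page, so the engine that indexes every word finds it and the engine that reads only the first six words does not. That is one reason the same search can succeed on one engine and fail on another.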
Search engines are not designed specifically for genealogy; they simply search for whatever words you enter. Originally, while a search engine could index the name Richard Poor, it wouldn’t be able to distinguish between a person by that name, Poor Richard’s almanac, and a play that had the line “Alas, poor Richard…” Once when searching for wills left by my Brooks family ancestors, a search engine confidently directed me to a page where I found the sentence “Garth Brooks will be appearing…” Search engines are becoming more sophisticated, and some ability to distinguish among such meanings is being designed into their results. There are some tricks to using search engines. Use unusual names whenever possible (see sidebar below).
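The Richard Poor problem comes from matching words without regard to order or meaning. The sketch below contrasts a plain all-words match with an exact-phrase match. Both functions and the second sample sentence are invented for illustration; real engines are far more elaborate, although many (Google among them) treat a query in quotation marks as a phrase.

```python
def contains_all_words(query, text):
    """True if every query word appears somewhere in the text, in any order."""
    text_words = {w.strip(".,;:!?\"'").lower() for w in text.split()}
    return all(word.lower() in text_words for word in query.split())

def contains_phrase(query, text):
    """True if the query words appear together and in the same order."""
    return query.lower() in text.lower()

# The first line is from the example above; the second is made up.
lines = ["Alas, poor Richard", "Will of Richard Poor, 1790"]

for line in lines:
    print(line, "|",
          contains_all_words("Richard Poor", line),
          contains_phrase("Richard Poor", line))
```

Both lines pass the all-words test, but only the second passes the phrase test. Neither test knows whether “will” is a legal document or a verb, which is why the Garth Brooks page still turned up.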
The best known search engine today is Google. Google indexes every word on the sites that it has copied. It is estimated that Google’s 20 billion+ indexed pages represent somewhere around 20% of the web. That means that roughly 80% of the web remains unrepresented in Google! Consider using more than just Google, both because no search engine indexes more than about 20% of the web and because no two of them index the SAME 20% or less. Each engine lists sites that may not be listed on the others. Another reason is that there are different methods of searching, that is, different ways to apply your search terms. Read each search engine’s help page and advanced search tips, and experiment to find the best way to use that search engine.
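The argument for using more than one engine is really an argument about overlapping sets. Here is a toy sketch with invented numbers: the “web” is 20 numbered pages and each hypothetical engine indexes a different 20% of it. The page numbers and the `coverage` function are assumptions made up for the example.

```python
# Hypothetical page identifiers; real indexes hold billions of pages.
web = set(range(1, 21))    # pretend the whole web is 20 pages
engine_a = {1, 2, 3, 4}    # each engine indexes 20% of it...
engine_b = {3, 4, 5, 6}    # ...but not the same 20%

def coverage(indexed, universe):
    """Fraction of the universe that appears in the indexed set."""
    return len(indexed & universe) / len(universe)

print(coverage(engine_a, web))             # one engine alone
print(coverage(engine_a | engine_b, web))  # both engines together
print(sorted(engine_b - engine_a))         # pages only engine B can find
```

One engine alone covers 0.2 of this toy web and the two together cover 0.3, so the second engine adds pages even though it overlaps with the first.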
In addition, there are nearly a million other search engines. Some are specialized, some are general. The major four are Google, Yahoo, Ask, and Live (formerly MSN). General information with links to search tips can be found at www.geocities.com/familyhistory.geo/search2006.htm.
The lines between different types of search engines are blurring. The various types have been adopting the features and some of the functions of the other types, making distinctions almost nonexistent. Nonetheless, it would be worthwhile to look at four special types or characteristics of search engines, namely metasearch, clustering, federated, and custom searching.
Metasearching
Originally, a metasearch engine was one that took the keywords you submitted in its search box and transmitted your search to several individual search engines simultaneously. Within a few seconds, you get back results that came from several search engines. Metasearch engines do not have their own index or database of Web pages; they send your search terms to the indexes kept by search engine companies, then combine the results. Merriam-Webster defines meta as “more comprehensive : transcending <metapsychological> -- usually used with the name of a discipline to designate a new but related discipline designed to deal critically with the original one.” Therefore, a metasearch would be a comprehensive or transcending search, or, in other words, a search which includes more than one search.
Single Site MetaSearch. Here, we can talk about Ancestry, FamilySearch, RootsWeb, Heritage Quest Online, USGenWeb Archive, or a variety of other web sites. These sites have many databases, with a tool that will search through and present results from all of their databases.
Multiple Site MetaSearch. What you need to know about metasearch engines is that the quality of their results depends on what they search and how they organize the results. A metasearch cannot be better than the sum of the individual databases it queries. A good Internet metasearch engine searches good databases, accepts complex searches, integrates results well, eliminates duplicates, and offers additional features such as clustering by subjects within your search results.
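The merge-and-deduplicate step described above can be sketched in a few lines of Python. This is a simplified illustration, not the method of ZapMeta or any other product: the two “engines” are stand-in functions returning made-up URLs, and `normalize` and `metasearch` are hypothetical names. Lowercasing the whole URL is a shortcut that a real tool would handle more carefully.

```python
def normalize(url):
    """Reduce trivial URL differences so duplicates can be spotted."""
    url = url.lower().rstrip("/")
    for prefix in ("http://", "https://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[4:]
    return url

def metasearch(query, engines):
    """Send one query to every engine and merge the ranked result lists.

    engines: dict of engine name -> function(query) returning a ranked
    list of URLs. The functions are stand-ins, not real engine APIs.
    """
    merged = {}
    for name, search in engines.items():
        for rank, url in enumerate(search(query), start=1):
            entry = merged.setdefault(
                normalize(url), {"url": url, "found_by": [], "best_rank": rank}
            )
            entry["found_by"].append(name)
            entry["best_rank"] = min(entry["best_rank"], rank)
    # Pages found by more engines come first, then pages with a better rank.
    return sorted(merged.values(),
                  key=lambda e: (-len(e["found_by"]), e["best_rank"]))

# Stand-in engines with invented results.
def engine_one(query):
    return ["http://www.example.org/stubbs", "http://example.com/utah"]

def engine_two(query):
    return ["http://example.net/iron-county", "http://example.org/stubbs/"]

results = metasearch("stubbs iron county", {"one": engine_one, "two": engine_two})
for r in results:
    print(r["url"], r["found_by"])
```

The page that both engines returned appears once, at the top of the combined list, with a note of which engines found it. The other two pages follow in rank order.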
While there are many metasearch engines, I would like to show you one with some extra helpful features -- ZapMeta (www.zapmeta.com). Try turning snapshots on. The past versions can be helpful, and I love the preview panes! Three companies have tried to apply the broad metasearch concept to genealogy. The two still in business are Internet Family Finder and MyHeritage. Perhaps the best example of a multiple-site metasearch is Internet Family Finder (www.genealogy.com/ifftop.html). This searches over 300,000 separate family history databases. Unfortunately, the search has no true fields other than first and last name, but those have been well identified—making it much more useful than text only searches.
Clustering
Clustering metasearch engines find results and group them by common terms found on the resulting pages. At first, the difference between a regular metasearch and a clustering metasearch is difficult to see. They both allow searching for specific terms within a set of results. The difference is that clustering tools suggest terms rather than just searching for what you input. This can be very helpful, because the suggested terms may be ones you recognize and can use to narrow down your search results. The leader in this field is www.clusty.com. Two unique examples of clustering metasearches are Kartoo (www.kartoo.com) and Vivisimo (www.vivisimo.com). The strategy in using these tools is to search for a name, record type, or concept and then use the words in common on the left to focus in on what you are looking for.
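A rough idea of how a clustering tool might come up with its suggested terms: count which words turn up in the most result snippets, offer those as suggestions, and filter the results when you pick one. The sketch below is a deliberately naive illustration. The snippets, the stop-word list, and the function names are invented, and the actual products mentioned above certainly use more sophisticated methods.

```python
from collections import Counter

STOP_WORDS = {"the", "of", "in", "and", "a", "for", "to"}

def suggest_terms(query, snippets, top_n=3):
    """Suggest the terms that appear in the most result snippets."""
    query_words = set(query.lower().split())
    counts = Counter()
    for snippet in snippets:
        # A set, so each snippet counts a word only once.
        words = {w.strip(".,;:!?").lower() for w in snippet.split()}
        counts.update(words - query_words - STOP_WORDS - {""})
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]

def cluster(snippets, term):
    """Return only the snippets that mention the chosen term."""
    return [s for s in snippets if term in s.lower()]

# Made-up result snippets for a search on "stubbs".
snippets = [
    "Stubbs cemetery records, Iron County",
    "Stubbs family cemetery in Parowan",
    "Stubbs census 1880",
    "Cemetery index for Stubbs burials",
]

top = suggest_terms("stubbs", snippets, top_n=1)
print(top)
print(cluster(snippets, top[0]))
```

Here the tool would suggest “cemetery,” a term I never typed, and choosing it narrows four results to the three that mention a cemetery.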
Federated Searching
A federated search is a type of metasearch. Rather than the freeform searching typical of most search engines, a federated search uses organized or fielded data and combines two or more sites with a similar data structure into a single results list. For example, both Ancestry.com and FamilySearch.org allow you to search by first name, last name, birth year, and place. A federated search might allow you to input each of those four pieces of data, then search both FamilySearch and Ancestry, then present the results in a single list. The leader in federated searching is WebFeat. See the Family History Library desktop combined search or MyHeritage as examples. MyHeritage (www.myheritage.com/research) combines many sites with search tools. BYU Idaho has a tool that federates searching of biography databases, but it works only on campus. See the list of databases that are federated at http://abish.byui.edu/library/r_biographies.cfm.
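The fielded nature of a federated search is easier to see in code. The sketch below runs one four-field query (first name, last name, birth year, place) against two sources and returns a single list. Everything in it is hypothetical: “Site A” and “Site B” are made-up record sets, not the actual interfaces or data of Ancestry or FamilySearch, and the `Record` class and function names are invented for the example. The place is optional and the year matches within a tolerance, both assumptions in the spirit of the search tips in the sidebar.

```python
from dataclasses import dataclass

@dataclass
class Record:
    source: str
    first: str
    last: str
    birth_year: int
    place: str

def search_source(records, first, last, birth_year, place=None, tolerance=2):
    """Fielded search of one source; years match within +/- tolerance."""
    return [
        r for r in records
        if r.first.lower() == first.lower()
        and r.last.lower() == last.lower()
        and abs(r.birth_year - birth_year) <= tolerance
        and (place is None or place.lower() in r.place.lower())
    ]

def federated_search(sources, first, last, birth_year, place=None, tolerance=2):
    """Run the same fielded query against every source and merge the hits."""
    hits = []
    for records in sources:
        hits.extend(search_source(records, first, last, birth_year, place, tolerance))
    return sorted(hits, key=lambda r: (r.birth_year, r.source))

# Invented sample records for two hypothetical sites.
site_a = [
    Record("Site A", "Eliphalet", "Brown", 1821, "Vermont"),
    Record("Site A", "David", "Brown", 1819, "Vermont"),
]
site_b = [
    Record("Site B", "Eliphalet", "Brown", 1820, "New York"),
    Record("Site B", "Eliphalet", "Brown", 1850, "Ohio"),
]

for hit in federated_search([site_a, site_b], "Eliphalet", "Brown", 1820):
    print(hit.source, hit.birth_year, hit.place)
```

With the place left blank, the query returns one Eliphalet Brown from each site in a single list; the 1850 record falls outside the year range and is dropped. Adding `place="Vermont"` would narrow the list to the Site A record alone.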
Custom Searching
Google has made it possible for those wanting to try a little programming to design customized Google searches. For a few examples of such searches, see:
For several years, we have heard lectures about proper searching, from trying to help individuals use search techniques more effectively to designing better search engines. For example, see www.myheritage.com/research. There has been considerable progress on both fronts, but we have yet to see either a true semantic genealogy search or a search devoted to genealogy which learns from the user. There are many personalized searches (which look at your personal searching habits and rearrange search results based on your perceived preferences) and some social searches (which rearrange search results based on your friends’ searching preferences). Both of these fail to directly address the need to refine searches within a particular discipline. One tool which claims to be able to learn from a community of searchers in a particular discipline is www.toppersearch.com.
Yet More Search Engine Information
Your web browser has a default search engine, which is where it goes when you type what you want to find into the address bar of your web browser. Try it and see what happens. My wife loves this! If you think your research might be in
For more information about search engines, see www.searchengineshowdown.com, searchenginewatch.com/links/article.php/2156241 (metasearching), searchenginewatch.com/resources/index.php (facts, tutorials, explanations), and www.searchenginewatch.com/facts/index.php. A couple more concepts to cover: personalization and intelligent searching.
Several search engines now have personalization. An example is A9 (a9.com). You can lay out your search and results screens, change column width or remove columns altogether, and much more. You can store your favorites on the site (accessible from any computer), and you can personally annotate any web page using “my diary.” With only a few refinements, this could be a major improvement in your search experience.
A newly relaunched search engine is GigaBlast. Or try a slider. One early slider implementation is http://mindset.research.yahoo.com/, which lets you slant your search results towards shopping or research. Other sliders will allow you to specify how relevance ranking is weighted.
Intelligent Searching
In an intelligent search, the computer gathers the results from the many web sites and evaluates whether any of them could be the ancestor you want. A semantic web would mark up records, transcripts, abstracts, or indexes with the context of the records. Thus, not only name, but place, time period, and record type would be identified within the meta-tags embedded in the page itself. This is the future of genealogical research. The only question is how long it will take.
Internet Search Tips
1. Start your searches broad and narrow them down when necessary. Too much detail may cause you to miss something useful. For example, when visiting FamilySearch.org, we can enter first name, last name, birth year, birth place, spouse’s name, father’s full name, and mother’s full name. Enter just the first and last name unless the name is fairly common (you probably shouldn’t search for just Thomas Walker, William Jones, or Mary Taylor, but add some place or time period when searching for common names). If the surname is unusual, enter surname only for your search. To start, leave the place blank; you never know when family members might be found in an unexpected part of the world. If you get too many hits, then add some detail to a new search, but only as much detail as necessary to reduce the search results to a manageable level.
2. Be aware of what is being searched. You may want to search at a higher or lower level (e.g., IGI only vs. “all resources”). Sometimes, restricting your search to a single database gives you additional search options or effectively narrows your search down to something manageable.
3. Check out search help; it often actually helps. This may be hidden in a link to “advanced search.” There may be options that look appealing but that you shouldn’t use, like exact spelling in FamilySearch.
4. Find out what options you have in searching. You may be able to use partial, truncated, or wildcard searches.
5. The most important tip of all is the simplest. READ THE SCREEN. If you take time to do that, you can avoid many rookie mistakes.
6. Consider searching for uncommon names first. If John Smith married Hortense Frinzwilter, don’t search for John Smith; search for Hortense Frinzwilter. If David Brown had a brother Eliphalet Brown, search for Eliphalet. David may appear on the same page or in the same source.
7. If you enter a year, ALWAYS select “range of years” or + or – x years. Years are often estimated, approximated, or incorrectly reported.
8. Don’t stop just because you succeed! Success can be a barrier to greater success. One common mistake is to stop searching when we find something. It’s great to find something about our ancestor, but we should continue our search. Finish the list of search results or “hits”, then continue with the other search aspects (see tip 9) even after finding what you were seeking.
9. Alter your search approach to find information not just by name, but also by places your ancestors lived, topic (e.g., ethnicity, religion, society), time period, event (e.g., war, famine), characteristic, record type needed, or keyword. Another approach may lead to additional information.
10. The Internet is dynamic. If you don’t find it today, try again later. If you find something today, you may be able to find yet more later. This means that our searches may need to be repeated from time to time to locate new information or newly indexed or categorized information.
11. Did you know you can use site: or inurl: to focus on the best results? For example, if I want to see whether any of my Iron County, Utah STUBBS family appears on the USGenWeb page for Iron County, but the site doesn’t have a search engine, I just type site:rootsweb.com inurl:utiron stubbs into my Google search, and I’ve cut hours of searching to seconds.
©Copyright 2006-8 by Alan E. Mann. All rights reserved. Written permission to reproduce all or part of this syllabus material in any format, including photocopying, data retrieval, or the Internet, must be secured in advance from the copyright holder.
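If you run the same kind of site: and inurl: search often (tip 11), a few lines of Python can assemble the query for you. This is only a convenience sketch: `build_query` is a name invented here, and the search address shown is the ordinary Google results URL with the query passed in the `q` parameter.

```python
from urllib.parse import urlencode

def build_query(terms, site=None, inurl=None):
    """Assemble a query string with optional site: and inurl: operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    parts.extend(terms)
    return " ".join(parts)

# The Iron County example from tip 11.
query = build_query(["stubbs"], site="rootsweb.com", inurl="utiron")
url = "https://www.google.com/search?" + urlencode({"q": query})

print(query)  # site:rootsweb.com inurl:utiron stubbs
print(url)
```

The printed query is exactly what tip 11 says to type, and the URL can be pasted into a browser or saved as a bookmark so the search is easy to repeat later (tip 10).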