Alan E. Mann, A.G.

alan.familyhistory@gmail.com                                                                     Accredited Genealogist

BYU 2007 Annual Family History & Genealogy conference         www.alanmann.com/articles  

Friday, 3 August 2007                                                                                         11:00 am - 12:00

                                 

Beyond Google:

Using Internet Search Engines

 

The Internet is the richest source of genealogical information available today. The amount, scope, and availability of data are staggering, even incomprehensible. It is virtually certain that there is valid information about your ancestors on the Internet that you don’t have: information you would want if only you knew it was there. So how can you find it? That’s the topic of this session.

What is a Search Engine?

While the term “search engine” can correctly be used in several ways, the most common meaning is a web tool for finding web pages on a specific topic. It is important to understand what a search engine is, what it includes, and how best to use it. The answers vary from one search engine to another. Generally, a search engine indexes the web sites it has been able to identify. Some search engines index every word, some index only the first page, and a few index just the first few sentences.

Companies and individuals with web sites notify the search engines about their sites because they want people to find them. Search engines also use spiders or robots, which are tools that roam the Internet, capturing pages and then indexing them. Beware: not all search engines are equal. Besides differing in how much of each page they index, they differ in how much of the web they cover. No search engine indexes all of the web. Some have billions of pages and others have only a few hundred million…

Search engines are not designed specifically for genealogy; they simply look for whatever words you enter. Originally, while a search engine could index the name Richard Poor, it couldn’t distinguish between a person by that name, Poor Richard’s Almanack, and a play containing the line “Alas, poor Richard…” Once, when I was searching for wills left by my Brooks family ancestors, a search engine confidently directed me to a page containing the sentence “Garth Brooks will be appearing…” Search engines are becoming more sophisticated, and some ability to make such distinctions is being designed into them. There are some tricks to using search engines. Use unusual names whenever possible (see sidebar).
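To see why a plain word index cannot tell Richard Poor from “poor Richard,” here is a minimal sketch in Python. The three page snippets and all function names are invented for illustration; no real search engine is claimed to work exactly this way. It compares matching on words in any order with matching on an exact phrase, which many search engines offer when you put the name in quotation marks.

```python
import re
from collections import defaultdict

# Three made-up page snippets echoing the examples in the text.
PAGES = {
    "page1": "Richard Poor was born in 1802.",
    "page2": "Poor Richard's Almanack was published yearly.",
    "page3": "Alas, poor Richard, he is gone.",
}

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Map each word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return index

def search_all_words(index, query):
    """Return pages containing every query word, in any order or sense."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

def search_phrase(pages, phrase):
    """Return pages where the query words appear side by side, in order."""
    target = tokenize(phrase)
    hits = set()
    for page_id, text in pages.items():
        words = tokenize(text)
        for i in range(len(words) - len(target) + 1):
            if words[i:i + len(target)] == target:
                hits.add(page_id)
                break
    return hits

index = build_index(PAGES)
all_word_hits = search_all_words(index, "Richard Poor")   # all three pages
phrase_hits = search_phrase(PAGES, "Richard Poor")        # only page1
```

The word-only search returns the person, the almanac, and the play alike; the phrase search keeps only the page where the two words appear together in that order.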

The best-known search engine today is Google. Google indexes every word on the sites that it has copied. It is estimated that Google’s 4 billion + indexed pages represent somewhere in the area of 15-20% of the web. That means that over 80% of the web remains unrepresented in Google! Consider using more than just Google, for three reasons. First, no search engine indexes more than 20% of the web. Second, no two of them index the same 20% or less, so each lists sites that the others may not. Third, search engines differ in their methods of searching, that is, in how they apply your search terms. Read the help page and search tips for each search engine, and experiment to find the best way to use it.
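The arithmetic behind those percentages can be checked in a few lines. This sketch simply takes the estimates quoted above (4 billion pages, 15-20% of the web) at face value; the variable names are mine and the figures are not independently verified.

```python
indexed_pages = 4_000_000_000        # the "4 billion +" estimate quoted above
low_share, high_share = 0.15, 0.20   # estimated share of the web that is indexed

# If 4 billion pages are 20% of the web, the web is 4 / 0.20 = 20 billion pages;
# if they are only 15%, the web is 4 / 0.15, or about 26.7 billion pages.
smallest_web = indexed_pages / high_share
largest_web = indexed_pages / low_share

print(f"Implied web size: {smallest_web / 1e9:.1f} to {largest_web / 1e9:.1f} billion pages")
print(f"Share missing from the index: {1 - high_share:.0%} to {1 - low_share:.0%}")
```

Either way, at least four pages out of five are outside that one index, which is the practical reason to try a second and third search engine.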

In addition, there are nearly a million other search engines. Some are specialized, some are general. The major four are Google, Yahoo, Ask, and Live (formerly MSN). General information with links to search tips can be found at www.geocities.com/familyhistory.geo/search2006.htm.

The lines between different types of search engines are blurring. The various types have been adopting the features and some of the functions of the other types, making distinctions almost nonexistent. Nonetheless, it would be worthwhile to look at four special types or characteristics of search engines, namely metasearch, clustering, federated, and custom searching.

Metasearching

Originally, a metasearch engine was one in which you entered keywords in its search box and it transmitted your search to several individual search engines simultaneously. Within a few seconds, you got back results from all of them. Metasearch engines do not have their own index or database of web pages; they send your search terms to the indexes kept by search engine companies, then combine the results.

Merriam-Webster defines meta as “more comprehensive : transcending <metapsychological> -- usually used with the name of a discipline to designate a new but related discipline designed to deal critically with the original one.” Therefore, a metasearch would be a comprehensive or transcending search or, in other words, a search that includes more than one search.

Single Site MetaSearch. Here, we can talk about Ancestry, Heritage Quest Online, FamilySearch, RootsWeb, USGenWeb Archive, or a variety of other web sites. These sites hold many databases and provide a tool that searches through all of them and presents the combined results.

Multiple Site MetaSearch. What you need to know about metasearch engines is that the quality of their results depends on what they search and how they organize the results. A metasearch cannot be better than the sum of the individual databases it queries. A good Internet metasearch engine searches good databases, accepts complex searches, integrates results well, eliminates duplicates, and offers additional features such as clustering by subject within your search results.
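The merging and duplicate elimination described above can be sketched as follows. This is only an illustration under stated assumptions: the three “engines,” their result lists, and the function names are invented, and a real metasearch engine would fetch live results and use more careful ranking.

```python
from urllib.parse import urlsplit

# Hypothetical result lists standing in for what separate engines might return.
ENGINE_RESULTS = {
    "engine_a": [
        "http://www.example.org/stubbs/",
        "http://example.com/iron-county.html",
    ],
    "engine_b": [
        "http://EXAMPLE.com/iron-county.html",
        "http://example.net/cemetery.html",
    ],
    "engine_c": [
        "http://www.example.org/stubbs",
        "http://example.net/cemetery.html",
    ],
}

def normalize(url):
    """Build a comparison key: lowercase host without 'www.', plus the path
    without a trailing slash. The scheme and any query string are ignored."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def metasearch(results_by_engine):
    """Merge per-engine lists, drop duplicates, and rank pages found by more
    engines first, then by their best position in any single list."""
    merged = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls):
            entry = merged.setdefault(
                normalize(url), {"url": url, "engines": set(), "best_rank": rank}
            )
            entry["engines"].add(engine)
            entry["best_rank"] = min(entry["best_rank"], rank)
    return sorted(
        merged.values(),
        key=lambda e: (-len(e["engines"]), e["best_rank"], e["url"]),
    )

combined = metasearch(ENGINE_RESULTS)
for entry in combined:
    print(entry["url"], "found by", sorted(entry["engines"]))
```

Six raw results collapse to three distinct pages. Note that the combined list contains nothing the individual engines did not already have, which is why a metasearch cannot be better than the databases it queries.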

While there are many metasearch engines, I would like to show you one with some especially helpful features: ZapMeta (www.zapmeta.com). Try turning snapshots on. The past versions can be helpful, and I love the preview panes!

Three companies have tried to apply the broad metasearch concept to genealogy. The two still in business are Internet Family Finder and MultiGen. Perhaps the best example of a multiple-site metasearch is Internet Family Finder (www.genealogy.com/ifftop.html), which searches over 300,000 separate family history databases. Unfortunately, the search has no true fields other than first and last name, but those have been well identified, which makes it much more useful than text-only searches.

Clustering

Clustering metasearch engines find results and group them by common terms found on the resulting pages. At first, the difference between a regular metasearch and a clustering metasearch is difficult to see. Both allow searching for specific terms within a set of results. The difference is that clustering tools suggest terms rather than just searching for what you enter. This can be very helpful, because the suggested terms may be ones you recognize and can use to narrow down your search results. The leader in this field is www.clusty.com. Two unique examples of clustering metasearches are Kartoo (www.kartoo.com) and Vivisimo (www.vivisimo.com). The strategy in using these tools is to search for a name, record type, or concept and then use the common words listed on the left to focus in on what you are looking for.
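As a rough picture of how a clustering tool might suggest terms, the sketch below counts the words that recur across a set of result titles and groups the results under them. The titles, the stopword list, and the function names are all invented for illustration; the commercial tools named above use their own, far more refined methods.

```python
import re
from collections import Counter, defaultdict

# Made-up result titles for a surname search; not real search output.
RESULTS = [
    "Stubbs family cemetery records Iron County",
    "Stubbs pottery and ceramics history",
    "Iron County cemetery transcriptions Stubbs",
    "George Stubbs horse paintings",
    "Stubbs census records Utah",
    "Paintings by George Stubbs at the gallery",
]

STOPWORDS = {"and", "by", "at", "the", "of", "in"}

def content_words(title, query_words):
    """Distinct words of a title, minus stopwords and the query itself."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    return words - STOPWORDS - query_words

def cluster(results, query, min_shared=2):
    """Group results under every term shared by at least min_shared results."""
    query_words = set(query.lower().split())
    counts = Counter()
    for title in results:
        counts.update(content_words(title, query_words))
    # Sort the labels so the suggestions come out in a stable order.
    labels = sorted(term for term, n in counts.items() if n >= min_shared)
    clusters = defaultdict(list)
    for title in results:
        words = content_words(title, query_words)
        for label in labels:
            if label in words:
                clusters[label].append(title)
    return dict(clusters)

suggested = cluster(RESULTS, "stubbs")
for label, titles in sorted(suggested.items()):
    print(label, "->", len(titles), "results")
```

Here the tool, not the searcher, proposes “cemetery,” “county,” and “records” on one side and “george” and “paintings” on the other, so a genealogist can click toward the Iron County records and away from the painter.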

Federated

A federated search is a type of metasearch. Rather than the freeform searching typical of most search engines, a federated search uses organized or fielded data and combines two or more sites with similar data structures into a single results list. For example, both Ancestry.com and FamilySearch.org allow you to search by first name, last name, birth year, and place. A federated search might allow you to input each of those four pieces of data, then search both FamilySearch and Ancestry, then present the results in a single list. The leader in federated searching is WebFeat. See the Family History Library desktop combined search or www.myheritage.com as an example. BYU Idaho has a tool that federates searching of biography databases, but it only works on campus. See the list of databases that are federated at http://abish.byui.edu/library/r_biographies.cfm.
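The four-field example above might look something like this in miniature. Everything here is hypothetical: the two “sites” are small in-memory lists with invented rows, and the function names and the plus-or-minus two year default are my assumptions, not how Ancestry, FamilySearch, or WebFeat actually work.

```python
# Two hypothetical sites that share the same four fields; the rows are invented.
SITE_A = [
    {"source": "site_a", "first": "Jens", "last": "Pedersen", "born": 1786, "place": "Holbaek, Denmark"},
    {"source": "site_a", "first": "Mary", "last": "Taylor", "born": 1840, "place": "Leeds, England"},
]
SITE_B = [
    {"source": "site_b", "first": "Jens", "last": "Pedersen", "born": 1787, "place": "Holbaek, Denmark"},
    {"source": "site_b", "first": "Jens", "last": "Pedersen", "born": 1850, "place": "Odense, Denmark"},
]

def search_site(records, first, last, born, place, year_range=2):
    """Run one fielded query against one site's records."""
    hits = []
    for rec in records:
        if (rec["first"].lower() == first.lower()
                and rec["last"].lower() == last.lower()
                and abs(rec["born"] - born) <= year_range   # allow a range of years
                and place.lower() in rec["place"].lower()):
            hits.append(rec)
    return hits

def federated_search(sites, **fields):
    """Send the same fielded query to every site and return one combined list."""
    combined = []
    for records in sites:
        combined.extend(search_site(records, **fields))
    return sorted(combined, key=lambda r: (r["born"], r["source"]))

results = federated_search(
    [SITE_A, SITE_B], first="Jens", last="Pedersen", born=1786, place="Holbaek"
)
for rec in results:
    print(rec["source"], rec["first"], rec["last"], rec["born"], rec["place"])
```

Because both sites store the same named fields, one query form can drive both, and the two matching records come back in a single list. That shared structure is what separates a federated search from a freeform metasearch.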

Custom Searching

Google has made it possible for those wanting to try a little programming to design customized Google searches. For a few examples of such searches, see:

 

Yet More Search Engine Information

Your web browser has a default search engine, which is where it goes when you type what you want to find into the address bar. Try it and see what happens. My wife loves this! If you think your research might be in Europe, www.euroseek.com finds European sites and lets you search in foreign languages.

 

For more information about search engines, see www.searchengineshowdown.com and searchenginewatch.com/links/article.php/2156241 (metasearching), searchenginewatch.com/resources/index.php (facts, tutorials, explanations), and www.searchenginewatch.com/facts/index.php. Two more concepts remain to cover: personalization and intelligent searching.

 

Something new for several search engines is personalization. A great example is A9 (a9.com). You can lay out your search screen and results screen, change column widths or remove columns altogether, and much more. You can store your favorites on the site (accessible from any computer), and you can personally annotate any web page using “my diary.” With only a few refinements, this could be a major improvement in your search experience.

 

A surprising newcomer to the search engine marketplace is GigaBlast. Or try a slider. One early slider implementation is labs.google.com/personalized, which lets you slant your search results toward your personal preferences. Soon-to-be-released sliders will allow you to specify how relevance ranking is weighted.

 

Internet Search Tips

1.        Start your searches broad and narrow them down when necessary. Too much detail may cause you to miss something useful. For example, when visiting FamilySearch.org, you can enter first name, last name, birth year, birth place, spouse’s name, father’s full name, and mother’s full name. Enter just the first and last name unless the name is fairly common (you probably shouldn’t search for just Thomas Walker, William Jones, or Mary Taylor, but add some place or time period when searching for common names). If the surname is unusual, enter the surname only. To start, leave the place blank, because you never know when family members might be found in an unexpected part of the world. If you get too many hits, then add some detail to a new search, but only as much detail as necessary to reduce the search results to a manageable level.

2.        Be aware of what is being searched. You may want to search at either a higher or lower level (e.g., IGI only vs. “all resources”). Sometimes, restricting your search to a single database gives you additional search options or effectively narrows your search down to something manageable.

3.        Check out search help—search help often actually helps. This may be hidden in a link to “advanced search.” There may be options that look appealing but that you shouldn’t use—like exact spelling in FamilySearch.

4.        Find out what options you have in searching. You may be able to use partial, truncated, or wildcard searches.

5.        The most important tip of all is the simplest. READ THE SCREEN. If you take time to do that, you can avoid many rookie mistakes.

6.        Consider searching for uncommon names first. If John Smith married Hortense Frinzwilter, don’t search for John Smith—search for Hortense Frinzwilter. If David Brown had a brother Eliphalet Brown, search for Eliphalet. Then David may appear on the same page or in the same source.

7.        If you enter a year, ALWAYS select “range of years” or + or – x years. Years are often estimated, approximated, or incorrectly reported.

8.        Don’t stop just because you succeed! Success can be a barrier to greater success. One common mistake is to stop searching when we find something. It’s great to find something about our ancestor, but we should continue our search. Finish the list of search results or “hits”, then continue with the other search aspects (see tip 9) even after finding what you were seeking.

9.        Alter your search approach to find information not just by name, but also by places your ancestors lived, topic (e.g., ethnicity, religion, society), time period, event (e.g., war, famine), characteristic, record type needed, or keyword. Another approach may lead to additional information.

10.     The Internet is dynamic. If what you wanted isn’t there today, it might be tomorrow.  If you find something today, you may be able to find yet more later. This means that our searches may need to be repeated from time to time to locate new information or newly indexed or categorized information.

11.     Did you know that you can use site: or inurl: to focus in on the most likely results? For example, I want to see whether any of my Iron County, Utah STUBBS family appears on the USGenWeb page for Iron County, but the site doesn’t have a search engine. I just type site:rootsweb.com inurl:utiron stubbs into my Google search, and I’ve cut hours of searching to seconds.
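If you use the site: and inurl: operators from tip 11 often, a tiny helper can assemble the search for you. This is a sketch only: the function names are mine, and the search address is an assumption based on the familiar form of a Google results URL.

```python
from urllib.parse import urlencode

def build_query(terms, site=None, inurl=None):
    """Assemble a query string using the site: and inurl: operators."""
    parts = []
    if site:
        parts.append(f"site:{site}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    parts.extend(terms)
    return " ".join(parts)

def search_url(query, base="https://www.google.com/search"):
    """Turn the query into an address you could paste into a browser.
    The base address is an assumption; change it for another search engine."""
    return base + "?" + urlencode({"q": query})

query = build_query(["stubbs"], site="rootsweb.com", inurl="utiron")
print(query)              # site:rootsweb.com inurl:utiron stubbs
print(search_url(query))
```

Swapping in a different surname or county code then takes one changed word rather than retyping the whole search.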

Intelligent Searching. In an intelligent search, the computer gathers the results from many web sites and evaluates whether any of them could be the ancestor you want. A semantic web would mark up records, transcripts, abstracts, or indexes with the context of the records. Thus, not only name, but also place, time period, and record type would be identified within the meta-tags embedded in the page itself. When asked for help identifying the birth date of James Peterson in Holbaek, Denmark in 1786, the search tool would flag church records from that time period in that area that contained the name Jens Pedersen. It would include a marriage in 1810, but ignore a marriage record in 1790 (he would have been only 4 years old). It would look at death records from the mid-1800s, but would ignore a death record from the 1900s. Believe it or not, this is not only possible, but is the future of genealogical research. The only question is how long it will take.
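The kind of reasoning described above can be imitated with a simple plausibility check. In this sketch the records, the name variants, and especially the age limits for each record type are assumptions I chose for illustration; a real intelligent search would draw on tagged data and far better rules.

```python
# The person sought, with assumed name variants (English and Danish forms).
PERSON = {
    "given": {"james", "jens"},
    "surname": {"peterson", "pedersen"},
    "birth_year": 1786,
}

# Invented records, already tagged with record type and year.
RECORDS = [
    {"type": "marriage", "year": 1810, "name": "Jens Pedersen"},
    {"type": "marriage", "year": 1790, "name": "Jens Pedersen"},
    {"type": "death", "year": 1852, "name": "Jens Pedersen"},
    {"type": "death", "year": 1905, "name": "Jens Pedersen"},
    {"type": "christening", "year": 1786, "name": "Jens Pedersen"},
]

# Assumed (minimum, maximum) age at each kind of event; illustrative only.
AGE_LIMITS = {"christening": (0, 2), "marriage": (16, 80), "death": (0, 100)}

def name_matches(name, person):
    """Compare the first and last words of a name against the known variants."""
    words = name.lower().split()
    return words[0] in person["given"] and words[-1] in person["surname"]

def is_plausible(record, person):
    """Keep a record only if the name fits and the implied age is reasonable."""
    if not name_matches(record["name"], person):
        return False
    limits = AGE_LIMITS.get(record["type"])
    if limits is None:
        return False
    age = record["year"] - person["birth_year"]
    return limits[0] <= age <= limits[1]

plausible = [r for r in RECORDS if is_plausible(r, PERSON)]
for r in plausible:
    print(r["type"], r["year"])
```

The 1810 marriage, the 1852 death, and the 1786 christening survive; the 1790 marriage (age 4) and the 1905 death (age 119) are set aside, just as in the example.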


©Copyright 2006-7 by Alan E. Mann. All rights reserved. Written permission to reproduce all or part of this syllabus material in any format, including photocopying, data retrieval, or the Internet, must be secured in advance from the copyright holder.