Assignment 10, Part 1: INFORMATION RETRIEVAL
To test the various methods of indexing used in this course, we will retrieve documents to answer three questions. In the LIS 536 INMAGIC database, search each of the following requests:
- 1. I want to know everything about health fairs.
- 2. What problems are typically encountered in family planning counseling?
- 3. Does health promotion have any effect on the involvement of people in their own health care?
Search each request three times, using each of these three methods:
- (1) "Automatic" free-text searching: title and abstract fields only -- not descriptor fields;
- (2) Human-assigned indexing: descriptors only (in DA and/or DB fields only); and
- (3) A combination of both free-text (in titles and abstracts) and descriptors (in DA/DB fields).
Within these parameters, the search strategy formulation is entirely up to you. Try various search formulations for each request and method. Plan your searches carefully. Searching procedures on INMAGIC may be different from other systems you know; do NOT assume you can use "standard" syntax. Use the online Help buttons and commands to learn about specific search techniques, e.g. truncation and wildcard characters, combining search fields, and combining Boolean operators within one field.
Write down your final search formulations for each search method for each request (9 altogether). If you change or refine your search strategies, jot down notes about why you did so. If appropriate, add your comments about any problems you encountered in searching, or anything unexpected that you found.
Examine the articles retrieved for each request. Write down the article numbers of the final sets you retrieved for each of the three search methods for each of the three requests (9 altogether). (The last set you retrieved for each search question is the "final" set.)
On Your Honour: do your searches first (Part 1); THEN AND ONLY THEN, look at Part 2.
Assignment 10, Part 2: RETRIEVAL PERFORMANCE
In Part 2, we will see whether the indexing method (automatic free text or manually-assigned descriptors, or both) affects retrieval performance, using precision and recall as our retrieval effectiveness measurements:
Precision = (Number of relevant documents retrieved) / (Total number of documents retrieved)
Recall = (Number of relevant documents retrieved) / (Total number of relevant documents in the collection)
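As a rough illustration of the two ratios, here is a minimal Python sketch that computes them from two sets of article numbers. The function name `precision_recall` and the article numbers are hypothetical, chosen only for this example; they are not taken from the database or the key.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    A ratio is reported as None when its denominator is zero
    (nothing retrieved, or no relevant articles in the collection).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else None
    recall = len(hits) / len(relevant) if relevant else None
    return precision, recall


# Hypothetical article numbers, for illustration only.
retrieved = [3, 8, 12, 15, 21]   # final set from one search
relevant = [3, 12, 21, 30]       # relevant articles for that request

precision, recall = precision_recall(retrieved, relevant)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints: precision = 0.60, recall = 0.75
```

Here three of the five retrieved articles are relevant (precision 3/5), and three of the four relevant articles were found (recall 3/4).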
For Assignment 10 (both parts), briefly write up and hand in the following:
- Write up your search formulations and the final sets of all article numbers retrieved for each of the 3 requests, using each of the 3 search methods. For each request and method, list your articles in ascending numeric order. (You should have 9 sets of articles; you may find it helpful to write it up as a table). You can refer to requests and methods by number (1, 2, 3). Write up any pertinent notes about why you selected each search strategy, whether and why you refined any of your search strategies, and any problems or comments you had about the search process.
- For each of the 3 requests and 3 methods, determine which articles retrieved were relevant, and which were not, using the attached key and your own judgement. If you eliminate or add a "relevant" article from the list given in the key, explain why you did so. Compute and write up the precision and recall ratios for each request and each method (you should have 9 pairs of ratios).
- Examine the ratios to find out whether any patterns or errors emerge. If you are able to identify a pattern for one request, check whether similar patterns exist for the other requests. Write up your observations. To find out why some articles were missed (low recall) and others retrieved erroneously (low precision), check the indexing and abstracting of those articles and compare them with your search formulations. Try to determine why they were falsely retrieved or omitted, whether it was the fault of the indexer or the searcher or just unavoidable, and write up your observations.
- Consider your results, and write up your comments about the retrieval performance (as measured by precision and recall ratios). Briefly address each of these four questions:
- Did the indexing and searching methods (free text vs. descriptors, or both) affect retrieval performance? Which performed best? Did the type of request (specific vs. general) affect the retrieval performance?
- How good is the "quality" of our indexing? Is it accurate? Precise? Specific? Exhaustive?
- Did any of our indexing practices, or our indexing language (controlled vocabulary), or specific errors by an indexer, cause any "mistakes" in retrieval?
- Do you have any suggestions about how to "improve" indexing, and/or advice you could give a new indexer, and/or any other comments on what you have learned about indexing and abstracting in this course?
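When checking why articles were missed or falsely retrieved, it may help to list them explicitly for each search method before looking at their indexing and abstracts. The sketch below is one possible way to do that for a single request; the function name `failure_analysis`, the method labels, and all article numbers are hypothetical placeholders, not results from the database or the key.

```python
def failure_analysis(retrieved, relevant):
    """List the articles behind low recall and low precision for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    return {
        "missed": sorted(relevant - retrieved),      # relevant but not retrieved
        "false_hits": sorted(retrieved - relevant),  # retrieved but not relevant
    }


# Hypothetical data for ONE request: relevant articles, and the final set
# retrieved by each of the three search methods.
relevant = {3, 12, 21, 30}
final_sets = {
    "(1) free text": {3, 8, 12, 15, 21},
    "(2) descriptors": {3, 12},
    "(3) combined": {3, 8, 12, 15, 21, 30},
}

for method, retrieved in final_sets.items():
    result = failure_analysis(retrieved, relevant)
    print(f"{method}: missed {result['missed']}, false hits {result['false_hits']}")
```

Each article listed under "missed" or "false hits" is a candidate for a closer look at its descriptors and abstract against the search formulation that was used.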