VA-ROOTS Archives

August 2013

VA-ROOTS@LISTLVA.LIB.VA.US

Subject:
From:
"Somay, Errol (LVA)" <[log in to unmask]>
Reply To:
"Research and writing about Virginia genealogy and family history." <[log in to unmask]>
Date:
Tue, 13 Aug 2013 19:20:19 +0000
The Virginia Newspaper Project has received a number of comments and questions about Virginia Chronicle <http://virginiachronicle.com/>, the Library of Virginia's online newspaper repository and database. We want to respond to each question, and we hope the answers help improve the experience of using Virginia Chronicle.



We asked the software developer to respond to the query about Boolean searches. The response appears directly below, followed by the original query. In short, the answer to Mr. Dunn's question lies in the way we "zone" the historical newspapers. The Library of Virginia generally provides page-level zoning rather than article-level zoning, in accordance with the specifications developed by the National Digital Newspaper Program. Other real-world factors also come into play. -Errol S.



Wayne isn't doing anything wrong here, although the Boolean "AND" operators are unnecessary because the query terms are ANDed anyway. Veridian/Solr isn't doing anything wrong either -- all 56 search results do indeed contain all 4 of the query terms. Wayne is expecting the Alexandria Gazette result to be near the top because the 4 query terms appear close together, but that's not how the Solr relevancy measure works. The Solr relevancy calculation is complicated, but it is based on the size of the text in the search result and the frequency of the query terms within that text. So while the Alexandria Gazette result is found, Solr does not consider it as good a match as the previous 45 matches -- presumably because those contain more occurrences of the query terms (even though they might be scattered around the page). This is another reason why article-level data works better -- the search results are more relevant.
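
The effect described above can be illustrated with a small Python sketch. It is a toy model only, not Solr's or Veridian's actual relevancy formula: the function names (toy_score, smallest_span), the scoring rule (square root of each term's frequency, divided by the square root of the page length), and both sample "pages" are assumptions made up for illustration. Only the Alsop/Dunn sentence comes from the newspaper item discussed in this thread.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_all(query_terms, tokens):
    """True if every query term occurs somewhere in the tokens (implicit AND)."""
    present = set(tokens)
    return all(term in present for term in query_terms)


def toy_score(query_terms, tokens):
    """Hypothetical frequency-and-length score; ignores where the terms occur."""
    counts = Counter(tokens)
    frequency_part = sum(math.sqrt(counts[term]) for term in query_terms)
    return frequency_part / math.sqrt(len(tokens))


def smallest_span(query_terms, tokens):
    """Length in tokens of the shortest run containing every query term."""
    needed = set(query_terms)
    last_seen = {}
    best = None
    for index, token in enumerate(tokens):
        if token in needed:
            last_seen[token] = index
            if len(last_seen) == len(needed):
                span = index - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best


query = ["grace", "alsop", "frank", "dunn"]

# Synthetic page 1: the four terms appear once each, close together,
# inside a page that is otherwise unrelated text (page-level zoning).
filler = "news " * 200
adjacent_page = tokenize(
    filler + "Miss Grace L. Alsop and Mr. Frank L. Dunn eloped " + filler
)

# Synthetic page 2: the terms are far apart but each occurs several times.
gap = "news " * 100
scattered_page = tokenize(
    "grace " * 3 + gap + "alsop " * 2 + gap + "frank " * 4 + gap + "dunn " * 4 + gap
)

for name, page in [("adjacent", adjacent_page), ("scattered", scattered_page)]:
    print(name, matches_all(query, page), toy_score(query, page),
          smallest_span(query, page))
```

Both pages satisfy the implicit AND, so both would be returned. Under this toy score the scattered page ranks higher because it repeats the terms more often, even though the terms sit much closer together on the adjacent page. With article-level zoning, the short elopement notice would be its own small document, so the same terms would make up a much larger share of its text.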




>>Hello,



I love the fact that many old newspapers have been scanned, OCR'd and posted on the website.



Although I saved the link and have spent a good deal of time searching and correcting text, I deleted the original email from the person who notified us of its availability.



My question is how the advanced search is 'supposed' to work versus how it 'actually' works. It appears that the Boolean 'AND' is ignored.



I started off doing some simple searches like "Dunn" and "Dunkum" (no quotes) and then correcting the text that was found around those names.

Next, I tried things like "Frank L Dunn" (in quotes) and found the following (in which I have already corrected the spellings):

Alexandria Gazette, Volume 97, Number 297, 12 December 1896: Miss Grace L. Alsop and Mr. Frank L. Dunn, both of Petersburg, eloped on Thursday and were married in Weldon, N. C.



But now if I search for "Grace AND Alsop AND Frank AND Dunn" (without the quotes), there are 56 results returned -- and the entry I am interested in is like number 46!



Yes, I have selected the option to sort by "Best Match First".



If I use Google and search simply for "grace alsop frank dunn" (without quotes), then it is the first entry returned. Naturally, if I click the link, I am taken to the correct page on the LVA website, but since I didn't use LVA's search engine, the text is not highlighted.



Am I doing something wrong with the LVA search?



Thanks,



Wayne Dunn>>




_______________________
Errol S. Somay
Director, Virginia Newspaper Project
Library of Virginia | Richmond, VA 23219
http://virginiachronicle.com
http://virginiamemory.com


To subscribe, change options, or unsubscribe, please see the instructions at
http://listlva.lib.va.us/archives/va-roots.html
