Thanks Kathy!
P Smith
----- Original Message -----
From: "Kathy Jordan" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Friday, November 17, 2006 11:06 AM
Subject: FW: Web archiving -- Library of Virginia
Dear state agency records officers:
Below is a message I sent to the Virginia government webmasters listserv
regarding Web archiving activities at the Library of Virginia. I would
appreciate your assistance in forwarding this message to the appropriate
people in your agency in the event that they are not on the webmaster
listserv.
Please let me know if you have any questions. The Library will continue
to keep you all informed as new policies and procedures for archiving
Web content are created at the Library of Virginia.
Thanks very much,
Kathy Jordan
________________________________
From: Kathy Jordan
Sent: Friday, November 17, 2006 10:59 AM
To: [log in to unmask]
Cc: Conley Edwards
Subject: Web archiving -- Library of Virginia
Dear state government webmasters:
The Library of Virginia has begun a test project to archive all state
agency Web sites. In order for this new project to work well, we need
your help with two things: installing a robots text file on your site
root and providing us with a complete list of all your site URLs.
You already may be familiar with our earlier web archiving project,
wherein we successfully crawled, indexed and made available publicly the
entire stable of Web sites in Mark Warner's administration. (You can
access this project here:
http://www.lva.lib.va.us/whatwehave/webarchive/ ).
Although we learned a LOT from the Warner pilot, we realize that Web
archiving is a new area of collecting that still requires much testing
and analysis. And in order for us to best test our processes, we ask
that you help us by doing the following:
Provide a Complete List of Your Site URLs
The Library begins crawls of your site with what is called a "seed."
Most often this is the basic URL that directs users to your home page,
which is easily identifiable for Library staff.
However, it is difficult to identify URLs that differ from your
main home page address -- for example, aliases, subdomains, and older
URLs still in use (such as old www.agencyname.state.va.us addresses
that still work).
If you email me a list of all your aliases, old working URLs, etc., we
will add them to our crawls in order to better capture all of your
active content.
Place a Robots Text File on Your Site Root
Many of you are familiar with robots.txt exclusions. These files outline
for various crawlers the parameters of access to the files of your Web
sites.
All we are asking is that you add the following content to your existing
robots.txt file to allow the Library's crawler full access to your site:
User-agent: archive.org_bot
Disallow:
If you are unfamiliar with robots files or would like to see an example
of one that already includes permissions for the Library's bot, please
see Governor Kaine's Web site: www.governor.virginia.gov/robots.txt
If you need assistance with the robots text file, please contact either
Emily Lockhart at [log in to unmask] or Rose Schooff at
[log in to unmask].
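If you would like to confirm that the two lines above have the intended
effect before you publish them, something like the following Python
sketch may help. It is only an illustration, not a tool the Library
provides: the sample robots.txt contents, the /private/ path, the
"SomeOtherBot" name, and the helper function crawler_allowed are all
assumptions made up for this example. It uses the standard library's
urllib.robotparser to check whether a given user agent may fetch a URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an existing rule for all crawlers, followed by
# the two lines the Library asks agencies to add.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: archive.org_bot
Disallow:
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder address, reusing the example host from this message.
    url = "http://www.agencyname.state.va.us/private/report.html"
    print("archive.org_bot:", crawler_allowed(ROBOTS_TXT, "archive.org_bot", url))
    print("SomeOtherBot:", crawler_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

An empty "Disallow:" line means nothing is off limits for the named
crawler, so in this sample the Library's bot is allowed everywhere while
other crawlers remain bound by the existing rules.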
Please pass this message along to other interested parties responsible
for state agency Web sites who may not have received it via this list.
Of course, you may contact me with any other questions or comments.
Thanks in advance for your cooperation and assistance!
Regards,
Kathy Jordan
Kathy Jordan
Electronic Resources Manager
The Library of Virginia
800 East Broad Street
Richmond, VA 23219
804-225-3699
[log in to unmask] -- www.lva.lib.va.us
To UNSUBSCRIBE, change options, or subscribe, please see the instructions
at http://listlva.lib.va.us/archives/va-rol.html
(If using Netscape, must have version 6.1 or higher to view the above page)