VA-ROL Archives

November 2006

VA-ROL@LISTLVA.LIB.VA.US

Subject:
From:
Peggy M Smith <[log in to unmask]>
Reply To:
Virginia Records Officer's Listserv <[log in to unmask]>
Date:
Mon, 27 Nov 2006 10:12:40 -0500
Thanks Kathy!

P Smith
----- Original Message -----
From: "Kathy Jordan" <[log in to unmask]>
To: <[log in to unmask]>
Sent: Friday, November 17, 2006 11:06 AM
Subject: FW: Web archiving -- Library of Virginia


Dear state agency records officers:

Below is a message I sent to the Virginia government webmasters listserv
regarding Web archiving activities at the Library of Virginia. I would
appreciate your help in forwarding this message to the appropriate
people in your agency in case they are not on the webmaster listserv.

Please let me know if you have any questions. The Library will continue
to keep you all informed as new policies and procedures for archiving
Web content are created at the Library of Virginia.

Thanks very much,

Kathy Jordan

________________________________

From: Kathy Jordan
Sent: Friday, November 17, 2006 10:59 AM
To: [log in to unmask]
Cc: Conley Edwards
Subject: Web archiving -- Library of Virginia



Dear state government webmasters:



The Library of Virginia has begun a test project to archive all state
agency Web sites.  In order for this new project to work well, we need
your help with two things: installing a robots text file on your site
root and providing us with a complete list of all your site URLs.



You may already be familiar with our earlier Web archiving project, in
which we successfully crawled, indexed, and made publicly available the
entire stable of Web sites in Mark Warner's administration. (You can
access this project here:
http://www.lva.lib.va.us/whatwehave/webarchive/ ).



Although we learned a LOT from the Warner pilot, we realize that Web
archiving is a new area of collecting that still requires much testing
and analysis.  And in order for us to best test our processes, we ask
that you help us by doing the following:


Provide a Complete List of Your Site URLs




The Library begins crawls of your site with what is called a "seed."
Most often this is the basic URL that directs users to your home page,
which is easily identifiable for Library staff.



However, it is difficult for us to identify URLs that differ from your
main home page address -- for example, aliases, subdomains, and older
URLs still in use (such as old www.agencyname.state.va.us addresses
that still work).



If you email me a list of all your aliases, old working URLs, etc., we
will add them to our crawls in order to better capture all of your
active content.


Place a Robots Text File on Your Site Root




Many of you are familiar with robots.txt exclusions. These files tell
crawlers which parts of your Web site they may and may not access.



All we are asking is that you add the following content to your existing
robots.txt file to allow the Library's crawler full access to your site:



User-agent: archive.org_bot
Disallow:
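
An empty Disallow line means nothing is excluded for the named crawler.
If you would like to confirm that your file behaves this way before
publishing it, the sketch below shows one possible check using the
robotparser module from Python's standard library. The sample file
contents, the /private/ rule, the "SomeOtherBot" name, and the
agencyname URL are placeholders invented for illustration, not taken
from any real agency site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an existing rule for all crawlers, plus the
# two lines requested in this message for the Library's crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: archive.org_bot
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; substitute a page from your own site.
    page = "http://www.agencyname.state.va.us/private/report.html"
    for agent in ("archive.org_bot", "SomeOtherBot"):
        print(agent, is_allowed(SAMPLE_ROBOTS_TXT, agent, page))
```

Pasting the text of your own robots.txt in place of the sample should
show whether archive.org_bot is permitted while your existing rules
continue to apply to other crawlers.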



If you are unfamiliar with robots files or would like to see an example
of one that already includes permissions for the Library's bot, please
see Governor Kaine's Web site: www.governor.virginia.gov/robots.txt



If you need assistance with the robots text file, please contact either
Emily Lockhart at [log in to unmask] or Rose Schooff at
[log in to unmask].



Please pass this message along to other interested parties responsible
for state agency Web sites who may not have received it via this list.



Of course, you may contact me with any other questions or comments.
Thanks in advance for your cooperation and assistance!



Regards,

Kathy Jordan



Kathy Jordan
Electronic Resources Manager
The Library of Virginia
800 East Broad Street
Richmond, VA 23219
804-225-3699
[log in to unmask] -- www.lva.lib.va.us


To UNSUBSCRIBE, change options, or subscribe, please see the instructions
at http://listlva.lib.va.us/archives/va-rol.html
(If using Netscape, must have version 6.1 or higher to view the above page)
