Rachel's Blog

Friday, November 14, 2008

Dewey Meets Turing

digital + library = librarians, computer scientists, and publishers
--DLI: Digital Libraries Initiative
--considered a "matchmaking" of computer scientists and librarians
--was it successful?
--In 1994 the WWW threw crazy stuff into the picture, which really blurred the lines between computer scientists and librarians (and undermined the common ground that brought the two groups together in the first place)
--bigger web equalled more heuristic approaches to organizing info
--librarians wanted a clear connection to traditional librarian functions
--no matter what the technology, the CORE function of librarians is still relevant
--collections are re-emerging
--more opportunities for connections between scholarly authors/works and librarians
--the two groups simply need to figure out what each one needs while still working together

digital libraries: challenges and influential work

--the scholarly info landscape is highly distributed, which makes search and discovery of ideas difficult and taxing
--federated search diagram
--seamless federation of resources = "the holy grail" as author states
--several federally supported projects, incl. UC Berkeley, Stanford, Michigan, etc.
--computer and networking technology changed over the last decade
--digital world rapidly evolving, this affects 1. publishers, 2. publisher consortiums, 3. bibliographic utilities, 4. academic consortia, and so on.
--several university studies focused on the issues of 'search interoperability' and 'federated searching'

--the goal is to extend services in the next few years to provide better-quality access: better searching and more efficient access to information
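
To make the 'federated searching' idea above more concrete for myself, here is a minimal sketch in Python. It is only an illustration and not how any of the projects in the article work: the two sources, their records, the identifiers, and the function names are all made up. It sends one query to every source and merges the answers, dropping records that share an identifier.

```python
# Toy record collections standing in for two separate systems (made-up data).
CATALOG = [
    {"id": "doi:10.1/abc", "title": "Digital library federation"},
    {"id": "isbn:123", "title": "Cataloging basics"},
]
REPOSITORY = [
    {"id": "doi:10.1/abc", "title": "Digital library federation"},
    {"id": "oai:repo:42", "title": "Federation of digital archives"},
]


def make_source(records):
    """Wrap a record list in a simple case-insensitive title search."""
    return lambda query: [r for r in records if query.lower() in r["title"].lower()]


def federated_search(query, sources):
    """Send the query to every source and merge results, skipping duplicate ids."""
    merged, seen = [], set()
    for name, search_fn in sources.items():
        for record in search_fn(query):
            if record["id"] not in seen:
                seen.add(record["id"])
                merged.append({**record, "source": name})
    return merged


sources = {"catalog": make_source(CATALOG), "repository": make_source(REPOSITORY)}
results = federated_search("federation", sources)
for r in results:
    print(r["source"], r["id"], r["title"])
```

The hard part in real life is exactly what this sketch assumes away: every source here speaks the same record format and shares identifiers, which is the interoperability problem the studies were about.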

Tuesday, November 11, 2008

link to website

Here's the link to my website!

www.pitt.edu/~rdr22/

Sunday, November 9, 2008

my comments :)

Here are my comments on Week 10 readings:

https://www.blogger.com/comment.g?blogID=7533952523781723717&postID=8280044869916472597&page=1

https://www.blogger.com/comment.g?blogID=5162573700267662965&postID=7208911322336128681&page=1

Current developments and future trends for the OAI protocol for metadata harvesting.

Open Archives Initiative (OAI):
--so far has been fairly successful
--widely adopted since 2001
--its purpose is defined as: "to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content"
--NSDL provides access to science-based learning objects
--problems with the registries are completeness and sparse records
--ongoing challenges:
--metadata variation
--metadata formats
--OAI Data Provider Implementation Practices
--Communication Issues
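
To see what a harvesting request looks like, here is a small Python sketch that builds an OAI-PMH `ListRecords` URL. The verb and the `metadataPrefix`, `set`, and `from` parameters come from the protocol, and `oai_dc` is the Dublin Core format every data provider has to support. The base URL and the helper function name are my own assumptions, just for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build a ListRecords request URL (helper name is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: harvest only one set
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# The repository address below is a placeholder, not a real data provider.
url = list_records_url("http://repository.example.org/oai", from_date="2008-01-01")
print(url)
```

The "metadata formats" challenge shows up right in the `metadataPrefix` parameter: a harvester has to ask for a format the data provider actually offers.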

--this article was great for informing me about the OAI. I wasn't deeply familiar with it, but it brought up many good challenges and a lot of key points about what the OAI has done and will be doing for archives.

Friday, November 7, 2008

the deep web: surfacing hidden value

--traditional search engines are not effective for content located within the deep web
--interesting stat:
fully 95% of the deep web is available to the public without cost/subscription requirements!
--search engines give "indiscriminate crawls" that do not enable access to the full breadth of pages/information out there
--surface web likened to boats on top of water, the deeper you go into the body of water, the more that's down there (the deep web)
--original deep content EXCEEDS all printed global content! whoa!
--serious information searchers must acknowledge the amount and quality of information available through the "deep web" and learn to access it

web search engines, parts 1 and 2

Part 1:
--went from the belief that webpages couldn't be indexed (1995) to very reliable search engines, such as Google, Yahoo, etc.
--generic search engine infrastructure--multiple, geographically distributed data centers
--crawling algorithms take URLs from a queue, fetch each page, add newly found links to the queue, and continue until the queue is empty
--real crawlers must address: speed, politeness, excluded content, continuous crawling, spam rejection, and duplicate content
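
The queue idea above can be sketched in a few lines of Python. This is only a toy under my own assumptions: the "web" is a made-up dictionary of pages and links instead of real HTTP fetches, and only two of the real-crawler concerns (duplicate content and excluded content) are handled. Speed, politeness, continuous crawling, and spam rejection are left out.

```python
from collections import deque

# Made-up in-memory "web": URL -> list of outgoing links.
TOY_WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://a.example/", "http://b.example/private/x"],
    "http://b.example/private/x": [],
}


def crawl(seeds, fetch_links, excluded_prefixes=()):
    """Pop a URL, fetch its links, queue the unseen ones, stop when the queue is empty."""
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue:
        url = queue.popleft()
        if any(url.startswith(prefix) for prefix in excluded_prefixes):
            continue  # excluded content: never fetched
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:  # duplicate check: each URL is queued once
                seen.add(link)
                queue.append(link)
    return visited


visited = crawl(
    ["http://a.example/"],
    lambda url: TOY_WEB.get(url, []),
    excluded_prefixes=("http://b.example/private/",),
)
print(visited)
```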

Part 2:
Indexing Algorithms:
--uses an inverted file: two-step process including 1) scanning and 2) inversion
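
Here is a minimal Python sketch of the two steps, assuming two tiny made-up documents and plain whitespace splitting for tokens (a real indexer does far more). Scanning writes out (term, document, position) entries, and inversion sorts them so all the entries for one term sit together.

```python
from collections import defaultdict


def scan(docs):
    """Step 1 (scanning): emit a (term, doc_id, position) tuple for every word."""
    postings = []
    for doc_id, text in docs.items():
        for position, term in enumerate(text.lower().split()):
            postings.append((term, doc_id, position))
    return postings


def invert(postings):
    """Step 2 (inversion): sort by term and group into term -> [(doc_id, position), ...]."""
    index = defaultdict(list)
    for term, doc_id, position in sorted(postings):
        index[term].append((doc_id, position))
    return dict(index)


docs = {1: "web search engines", 2: "deep web search"}
index = invert(scan(docs))
print(index["web"])
```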
Issues with real indexers:
--scaling up: simply too many entries
--term lookup: search terms extend beyond the basic English dictionary to include numbers, characters, email addresses, etc.
--compression
--phrases
--anchor text
--link popularity score
--query-independent score

query processing algorithms:
--most common = queries that don't include operator words
Speeding up queries:
--skipping items
--early termination: keep the information sorted so the search can stop as soon as it has enough results
--caching
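
A rough Python sketch of two of these speed-ups, under my own assumptions: the posting lists are made up and are assumed to be stored best-document-first, so the search can stop as soon as it has k results (early termination), and `functools.lru_cache` stands in for a results cache. Skipping is not shown here.

```python
from functools import lru_cache

# Made-up postings: term -> doc ids, assumed sorted best-scoring document first.
POSTINGS = {
    "web": (7, 3, 9, 1, 4),
    "search": (3, 9, 4, 8),
    "deep": (9, 2, 4),
}


@lru_cache(maxsize=1024)  # caching: a repeated query is answered without any work
def search(terms, k=2):
    """Return up to k docs containing every term (plain query words, no operators)."""
    lists = sorted((POSTINGS.get(term, ()) for term in terms), key=len)
    if not lists:
        return ()
    shortest, others = lists[0], [set(postings) for postings in lists[1:]]
    hits = []
    for doc in shortest:  # walk the shortest list in best-first order
        if all(doc in other for other in others):
            hits.append(doc)
            if len(hits) == k:
                break  # early termination: enough good results already
    return tuple(hits)


print(search(("web", "search")))
print(search(("web", "search")))  # second call is served from the cache
print(search.cache_info().hits)
```

The terms are passed as a tuple so the query can be used as a cache key.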