© Tefko Saracevic 1Principles of Searching
Searching the web & the invisible web
Searching for what is hard to find

Dictionary definitions
World Wide Web: files connected through the Internet
  An enormous collection of linked documents and other files located on computers connected through the Internet and used to access, manipulate, and download data and programs
Invisible, dictionary definition:
  That cannot be seen
Invisible web: not in the dictionary

What is the “invisible web”?
Materials that general search engines do not include, or do not want to include, in their collections of web pages (indexes)
  Cannot be found through general search engines
Contains an enormous amount of information resources
  Much of it is of higher quality and authority than the visible web
  Quality is its main characteristic
Specialized
Part of it flows, is streamed, or is in real time
  “You cannot step twice into the same river”
Part of it is free
Much larger than the visible web

In other words… there is much more on the web than [image] or [image]

Why don't search engines cover everything?
Size: the web is enormous; they cannot cover it all
Economic factors: the associated costs are high
  Search engines live on advertising
  A number of search engines offer paid results
Technical factors: limited capacity
  Some file formats are difficult to handle
Spam: they remove the bad, but the good can be lost with it
Restrictions: some sites do not allow search engines access
Deep structure: some sites are complex
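
One common way a site restricts search engine access is a robots.txt file. The slide does not name a mechanism, so treat this as an illustrative assumption. The sketch below uses Python's standard urllib.robotparser to check whether a crawler may fetch a page; the robots.txt content, the example.org URLs, and the "ExampleBot" agent name are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is barred from /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file as a list of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="ExampleBot"):
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


blocked = may_crawl("http://example.org/private/report.html")
allowed = may_crawl("http://example.org/index.html")
```

Pages under a disallowed path never reach a well-behaved engine's index, which is one reason such material ends up in the invisible web.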

How do search engines work?
Crawlers, spiders: look for content
  Locate new or modified sites
  Periodically, not at every search
  No search engine works in real time (except Gnews)
Content organization: labeling, ordering
  Indexing for searching, or classification in directories
Databases, caches: storage of content
Search engine: responds to the queries submitted
Interface: search request, display of results
All of it is based on various algorithms, usually hidden or secret
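
The pipeline above (crawled content, an index, a query engine) can be sketched with a toy inverted index. This is a minimal illustration under stated assumptions, not how any real engine works: the pages are hypothetical, and real ranking algorithms are, as the slide says, usually secret.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every query term (boolean AND), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)


# Hypothetical pages that a crawler collected on its last periodic pass.
crawled_pages = {
    "http://example.org/a": "Invisible web resources for searchers",
    "http://example.org/b": "Search engines index the visible web",
    "http://example.org/c": "Libraries and the invisible web",
}

index = build_index(crawled_pages)
results = search(index, "invisible web")
```

The query runs against the stored index, not the live web, so a page changed since the last crawl is searched in its old form.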

Search engine coverage
They cover no more than 20%
  Difficult (impossible) to discern & compare coverage
National search engines
  Their own coverage and orientation
Many specialized search engines
  Their own coverage, aimed at a topic of interest
Independent sources of useful material
  Compilations of evaluated resources

Search engines differ
There are very important differences among the types of search engines mentioned
  Need to know how they work and how they differ
Information about search engines:
  Search Engine Watch: ratings, news, statistics, charts, explanations, tutorials
  Search Engine Showdown: “The users’ guide to web searching”, run by a librarian; news links, ratings

Searching the invisible web: basics
First step: know clearly what you are looking for
Limit the search to resources and tools appropriate to the type of information you are looking for
Know the sources
Know how to find sources

Specialized sources on the invisible web
1. Metasearch engines
2. Specialized search engines and catalogs
3. Subject search engines and catalogs
4. Reference sources
5. Libraries
6. Virtual libraries
7. Specialized databases
8. Societies, organizations
9. Books!!!

Metasearch engines
Metasearch engines search other search engines, combining the results
Where to find search engines or metasearch engines:
  SearchEngines.com: search for engines by topic, geography, reference
  Search Engine Guide: engines categorized by topic; other engine information
  Search Engine Colossus: international directory of search engines by country and topic, from 198 countries and 61 territories; engines in a choice of languages
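
The slide says metasearch engines combine results from other engines but not how. As one possible sketch, the code below merges ranked lists with reciprocal rank fusion, a technique swapped in here for illustration. The engine names, URLs, and the constant k=60 are assumptions.

```python
from collections import defaultdict


def merge_results(results_by_engine, k=60):
    """Merge ranked URL lists from several engines.

    Each URL scores 1 / (k + rank) per engine that returned it, so URLs
    found by several engines, or ranked high by one, rise to the top.
    Returns (url, engines_that_returned_it) pairs, best first.
    """
    scores = defaultdict(float)
    sources = defaultdict(list)
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / (k + rank)
            sources[url].append(engine)
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, sources[url]) for url in ranked]


# Hypothetical result lists from two underlying engines.
results_by_engine = {
    "engine_a": ["http://example.org/1", "http://example.org/2"],
    "engine_b": ["http://example.org/2", "http://example.org/3"],
}

merged = merge_results(results_by_engine)
```

Keeping the list of source engines next to each URL makes the overlap between engines visible, which is useful when comparing coverage.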

Sample of meta engines
Some meta engines provide organized results:
  Dogpile: results from a number of leading search engines; gives source, so overlap can be compared (also has a (bad) joke of the day)
  Surfwax: gives statistics and text sources & linking to sources; for some terms gives related terms to focus
  Teoma: results with suggestions for narrowing; links resources derived; originated at Rutgers
  Turbo10: provides results in clusters; engines searched can be edited

meta search engines (cont.)
Large directory
  Complete Planet: directory of over 70,000 databases & specialty engines
Results with graphical displays
  Vivisimo: clusters results; innovative
  Webbrain: results in tree structure; fun to use
  Kartoo: results in display by topics of query

Domain engines & catalogs
Cover general & specific subjects
  Open Directory Project: large edited catalog of the web; global, run by volunteers
  BUBL LINK: selected Internet resources covering all academic subject areas; organized by Dewey Decimal System; from UK
  Profusion: search in categories for resources & search engines
  Resource Discovery Network (UK): “UK's free national gateway to Internet resources for the learning, teaching and research community”

domain engines …
Available in variety of domains & subjects: rich!
  Think Quest (Oracle Education Foundation): education resources, programs; web sites created by students
  All Music Guide: resource about musicians, albums, and songs
  Internet Movie Database: treasure trove of American and British movies
  Genealogy links and surname search engines: well.. that is getting really specialized (and popular)

domain engines …
Scholarship, science
  Psychcrawler (American Psychological Association): web index for psychology
  Entrez PubMed (National Library of Medicine): biomedical literature from MEDLINE & health journals
  CiteSeer (NEC Research Center): scientific literature, citations index; strong in computer science
  Scholar Google: searches for scholarly articles & resources
  Infomine: scholarly internet research collections

Reference services
Reference services: several models
  Q&A, directories, email answers etc.
  Ask Jeeves!: most popular, commercial
  Information Please: almanac type questions
  RefDesk: access to a number of reference tools
  Wikipedia: web encyclopedia in many languages

reference …
Digital reference: new service area for libraries
  QuestionPoint (Library of Congress & OCLC): project for a global reference network
  Virtual Reference Desk (Library of Congress): large compilation of web reference sites
  LiveRef (maintained at Iowa State U): a registry of real time digital reference services

Libraries as web sources
Academic, national libraries providing open collections & services; models vary
  Rutgers libraries: big long term effort
  University of California, Berkeley: a most elaborate effort together with Sun Corporation
  LibWeb (U California, Berkeley): “lists currently over 7200 pages from libraries in over 125 countries”
  Bibliothèque Nationale de France: includes virtual exhibitions, among others

Virtual libraries on the Web
Libraries emerging only on the Web
  Virtual Library (Switzerland, US, UK & other countries): ‘oldest virtual library on the Web’
  Internet Public Library (U of Michigan): also a long term effort
  Librarians Index to the Internet: very popular and comprehensive
  Digital librarian: “a librarian's choice of the best of the Web”; compiled and annotated by a librarian

virtual libraries …
  Academic Info Digital Library: many links to digital collections & resources in various subjects
  Gabriel: Gateway to European National Libraries
  Museum of online museums: a delight
  Stanford Encyclopedia of Philosophy: a comprehensive encyclopedia and library
  The historical New York Times Project: universal library; ongoing digitization

Subject resources
Many subject specific sites
  rich & often unique coverage & services
  different approaches & requirements
Examples in health related domains:
  WebMD Health: news, medical information
  Rxlist: The Internet Drug Index
  Mayo Clinic HealthOasis: health advice
  Kidshealth: sites for parents, kids, teens

Subject resources …
Scholarship, humanities, government
  KIRKE (Katalog der Internetressourcen für die Klassische Philologie aus Erlangen): German; a variety of resources for classics
  Perseus Digital Library (Tufts University): covers antiquity to renaissance; one of the best subject sites on the web; affected the whole field
  Sch of Slavonic & East European Studies, University College London: includes country resources, e.g. Croatia
  U Mich Document Center: official documents from all over the world

Subject resources …
Growing number of resources in arts, museums
  MuseumStuff.com: “We have 1000's of museums, zoos, historical societies and related organizations in our database”
  The State Hermitage Museum: one of the greatest museums in the world, and one of the best museum sites; developed with IBM help
  National Museum of Science and Technology Leonardo da Vinci: Guess where those pictures came from. A delight!

subject resources …
  Diotima: Materials for study of women and gender in the Ancient World
  Moving Images Collections: “MIC documents moving image collections around the world.” Part particularly oriented toward science educators. Now at Library of Congress, but developed at Rutgers.
And, of course …
  Snoopy: The Official Peanuts Website

Societies, organizations
Many societies, agencies developed their sites
  great many rich sources for searching & resources
  differences in requirements, depth, richness
  Assoc. for Computing Machinery: Digital Library; subscription or registration or through RUL
  US State Department: about the U.S. & other countries
  FirstGov: the US government official web portal
  Ocean Planet (NASA): presentation of earth & its vast oceans
  ArXiv (Cornell U, National Science Foundation): e-print service in the fields of physics, mathematics, non-linear science, computer science, and quantitative biology

Archiving, books on the web
Internet Archive: a large undertaking
  includes web archive & lots more
  publicly available & free
  10 billion web pages archived from 1996 to a few months ago
  Wayback Machine: search to look at old versions of web pages
Books on the web
  Million Book Project: digitizing books and providing free access
  International Children’s Digital Library: online children's books
  Digital books Index: “links to more than 105,000 title records from more than 1800 commercial and non-commercial publishers, universities, and various private sites”

Language barriers on the Web
English still the major language but declining, now slightly over 50%
Multilingual retrieval search engines
  Euroseek: searches in a number of languages
  All the Web: results in 45 languages

Web news; keeping up
What is going on on the Web? Some major sources of news and evaluations:
  Free Pint: newsletter, articles, links; nice & sometimes quirky
  Internet Resources Newsletter: UK based; monthly newsletter for “academics, students, engineers, scientists and social scientists”
  ResearchBuzz: daily updates; many aspects; “Collection of items on search engines, online databases, and other information resources”
  About.com Web Search: tools, Web Search Forum

keeping up …
  Information Today: trade & professional monthly newspaper & web site; industry news; searcher columns; general analyses of trends
Keeping up through blogosphere:
  Resource Shelf: blogger about internet (and some other stuff) with archive; it has really good and really bad exchanges & threads
  New York Times blogrunner (The annotated NYT): blog tracking of NYT articles, topics, authors; thread into discussion of many other weblogs; includes net & web topics

Finding links & listings: back to good old books with a new twist
A number of books on web searching also have sites with the links in the book, updates, news
  Extreme Searcher (Randolph Hock): update of a popular book; links by chapter topics
  The web library (Nicholas G. Tomaiuolo): spotlights free resources, links by chapter and new topics; done by a librarian
  The invisible web (Chris Sherman & Gary Price): original book on the topic, links organized by subject
p.s. most, but not all, of the sites in this lecture can be found on those sites, and much, much more

Evaluations, ratings
Evaluating web sites: a prime responsibility of searchers & all information professionals
Many sources evaluate web sites:
  The Scout Report: librarians’ BIBLE! Annotations. Comprehensive.
  Medical Library Association: ten most useful sites for consumer health
  MLA: user guide for finding & evaluating health information on the web
  Web 100: commercial, user ranking & evaluation of web sites
  Evaluating web pages (UC Berkeley): tutorial and guide

Needed for Web searching
Knowledge & competencies on
  variety of web sources & their organization
  search engines
  web search strategies
  search dynamics, feedback
Keeping up & up & up
Why? many reasons, such as:
  constant updates, changes, innovations
  many domain/subject specific
  fluidity very high

Needed for web searching by professionals
Knowledge of SOURCES in area of interest
  search engines not enough
    not too helpful in finding these other sources; structure hard to discern
  find & use specialized sources
Evaluation of sources
  a key professional skill!
  application of standard criteria & web criteria:
    authority; accuracy; currency (timeliness); objectivity; coverage; persistence; usability

Needed competencies …
Knowledge of users & use
Knowledge of searching
Use of technology
Adaptability, flexibility
Integration with other resources
Teaching others
Constant learning & update
  again: keeping up, keeping up, keeping up
  and again: keeping up, keeping up, keeping up
information
WWW
But now really: How to do it?

Images from the invisible web
images …
images …
and of course…

P.S. a nice site
Poem by Emily Dickinson:
  In a library
Who will write a poem:
  In a digital library?

P.S. a few weird sites…
  SelectSmart.com: all kinds of quizzes for you
  James Dean: official web site
  Deaducated: Dead Librarians’ Society
  Livejournal: blogs & authoring tools; and many pathetic entries

Sources
About.com Web Search http://websearch.about.com
Academic Info Digital Library http://www.academicinfo.net/digital.html
All the Web http://www.alltheweb.com/
Ask Jeeves! http://www.ask.com/
Assoc. for Computing Machinery http://www.acm.org/
Bibliothèque Nationale de France http://www.bnf.fr/
BUBL LINK http://bubl.ac.uk/link/
CDNET Search.com http://www.search.com/
CiteSeer http://citeseer.nj.nec.com/
CompletePlanet http://completeplanet.com
Deaducated http://www.geocities.com/deadlibrarians/
Digital book index http://www.digitalbookindex.org/about.htm
Digital librarian http://www.digital-librarian.com/
Diotima http://turbo10.com/
Dogpile http://www.dogpile.com/
Entrez PubMed http://www.ncbi.nlm.nih.gov/PubMed/
Extreme Searcher http://www.extremesearcher.com/
Free Pint http://www.freepint.com/
Gabriel http://www.kb.nl/gabriel/
Genealogy http://darcisplace.com/darci/search.htm

sources …
Hermitage http://www.hermitagemuseum.org/html_En/index.html
Information Please http://www.infoplease.com/
International Children’s Digital Library http://www.icdlbooks.org/
Internet Archive http://www.archive.org/
Internet Public Library, Michigan http://www.ipl.org/
Internet Resources Newsletter http://www.hw.ac.uk/libwww/irn/
James Dean http://www.jamesdean.com/
Kartoo http://www.kartoo.com/
KIRKE http://www.phil.uni-erlangen.de/~p2latein/ressourc/ressourc.html
Leonardo da Vinci Museum http://www.museoscienza.org/english/
Librarians Index to the Internet http://lii.org/
Live Journal http://www.livejournal.com/
LiveRef http://www.public.iastate.edu/~CYBERSTACKS/LiveRef.htm
Mayo Clinic http://www.mayohealth.org/
Medical Library Assoc. ten top sites http://www.mlanet.org/resources/medspeak/topten.html
Medical Library Assoc. user guide for health inf. http://www.mlanet.org/resources/userguide.html
Medscape http://www.medscape.com/

sources …
Million Book Project http://www.archive.org/texts/collection.php?collection=millionbooks
Museum of online museums http://www.coudal.com/moom.php
MuseumStuff http://www.museumstuff.com/
NYT blogrunner http://nytimes.blogrunner.com/
NYT historical project http://www.nyt.ulib.org/
OCLC Web Characterization Project http://wcp.oclc.org/
Open Directory Project http://dmoz.org
Perseus Digital Library http://www.perseus.tufts.edu/
Profusion http://www.profusion.com/
Psychcrawler http://www.psychcrawler.com/
QuestionPoint http://www.questionpoint.org/
ResearchBuzz http://www.researchbuzz.com/index.shtml
Resource Shelf http://resourceshelf.blogspot.com/
Rutgers Libraries http://www.libraries.rutgers.edu/
RxList http://www.rxlist.com/
Sch of East Eur & Slavonic Studies http://www.ssees.ac.uk/dirctory.htm
Search Engine Colossus http://www.searchenginecolossus.com/
Search Engine Guide http://www.searchengineguide.com/
Search Engine Showdown http://searchengineshowdown.com/

sources …
Search Engine Watch http://searchenginewatch.com/
Select Smart.com http://www.selectsmart.com/home.html
Snoopy http://www.snoopy.com/
Stanford Encyclopedia of Philosophy http://plato.stanford.edu//
Surfwax http://www.surfwax.com/
Teoma http://teoma.com/
The invisible Web http://www.invisible-web.net/
The Scout Report http://scout.cs.wisc.edu/
The Web Library http://www.ccsu.edu/library/tomaiuolon/theweblibrary.htm
Think Quest http://www.thinkquest.org/
Turbo10 http://turbo10.com/
U California Berkeley http://sunsite.berkeley.edu/
U Mich Documents Center http://www.lib.umich.edu/govdocs/
US State department http://www.state.gov/
Virtual Library http://vlib.org
Virtual Reference Desk http://www.loc.gov/rr/askalib/virtualref.html
Vivisimo http://vivisimo.com
Web 100 http://www.web100.com
Webbrain http://www.webbrain.com/html/default_win.html
WebMD http://my.webmd.com/webmd_today/home/default
Wikipedia http://www.wikipedia.org/