Reporting with the Internet - October 1999
This tip sheet is packed with plain-English details and links.
Nora Paul, one of the world's leading trainers of journalists,
explains The Hidden Web: What Search Engines Won't Find (and
how to find them yourself). A companion file to this is "Web
Chart" - an Acrobat Reader file that displays a search engine
comparison chart Nora originally created in Excel. Nora has
moved on from being known as the Web guru, based at Poynter,
and is now the director of the Institute for New Media Studies,
School of Journalism and Mass Communication at the University
of Minnesota.
The Hidden Web: What Search Engines Won't Find
(and how to find them yourself)
The major search sites
on the World Wide Web (like Yahoo!, AltaVista, Go, HotBot, Northern Light,
etc.) are incredible resources. But even the best of them index less
than ¼ of the web pages available. There is a “hidden”
net that can hold some of the best resources and most helpful information.
This presentation goes into the secret info stashes to be aware of, how
you can locate them, and how to use them.
What’s NOT in the search engine database?
Information recently added to a web site: It can sometimes take months before
the spider takes another pass at a web site. In the meantime, a lot
of valuable information may have been added that you won’t find in your
web search. Here are some techniques that can help:
Update Agents
A service that monitors the web sites or pages you designate.
When there is a change or addition to that site, a notice about the
change is sent to you. Useful for: Keeping track of releases
for an agency you report on when they don’t have an automatic alert
service.
- NetMind: http://www.netmind.com
“NetMind lets users track any Web page at any level of detail -- including
text, numbers, images, forms, links or keywords -- and be proactively
notified via e-mail, pager, cell phone or PDA when the information changes.”
There is a free registration.
- Informant: http://informant.dartmouth.edu/
Saves search engine queries and web sites (like a company or court's
page), checks them periodically, and sends you e-mail whenever there
are new or updated web pages. The Informant searches Alta Vista, Excite,
Lycos and Infoseek.
- Deja: Thread tracker: http://www.deja.com
If you have signed up for “My Deja”, you can have new messages that
have been added to a message thread on a newsgroup you are tracking
sent to you via e-mail.
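The basic idea behind an update agent can be sketched in a few lines of Python. This is only an illustration, not how NetMind, Informant or Deja actually work: the function names `fingerprint` and `check_for_update` are hypothetical, and the page text is passed in directly rather than downloaded. The sketch stores a digest of each page and reports a change when the digest differs on the next check.

```python
import hashlib


def fingerprint(page_text):
    """Return a digest of the page text; any edit to the text changes it."""
    # Collapse runs of whitespace so layout-only changes are ignored.
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_for_update(url, page_text, seen):
    """Compare a page with its stored fingerprint, then store the new one.

    `seen` is a dict mapping url -> last fingerprint (hypothetical storage).
    Returns True if the page changed since the previous check. The first
    check of a URL only records a baseline and returns False.
    """
    new = fingerprint(page_text)
    old = seen.get(url)
    seen[url] = new
    return old is not None and old != new
```

In a real agent the page text would be downloaded on a schedule, and a True result would trigger the e-mail notice described above.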
Alert Services
Alert services are found on individual web sites and are variously known as “watch
lists”, “distribution lists”, or “current awareness services”. They use
discussion list (listserv) software to maintain a list of people interested
in updates and news releases that are then sent to their e-mail address.
Everyone who subscribes to the alert service receives the same documents
or news releases. Many government agencies and non-profit organizations
have these sorts of services on their sites. Useful for: Getting story
ideas, keeping up on your beat.
Filters
Filters provide customization. They can be used to get news stories
from wire services or messages from newsgroups that fit the subject profile
you set up. You sign up and type in words about the types of topics
you want the filter to catch. The software stores your interest profile
and uses it to check the stories or messages it filters. When it finds
a story or message containing the words you have in your profile, the
program snags it and posts it to your e-mail box (or an area on the service
you can check). Useful for: Getting the latest information about
a company or topic that you are tracking.
- PR-Newswire: http://www.prnmedia.com/prnemail
On PRN, you create a profile that allows you to schedule e-mail delivery
times, select the full-text, headline or abstract of press releases,
select particular states, industries, subject areas or companies, or
use the advanced profile for more specific terms. Press Line (http://www.pressline.de/email-service/index.us.phtml)
is a similar tool.
- NewsIndex: http://newsindex.com/delivered.html
Set up a profile using subject keywords and have news stories from over
250 news sources around the world delivered to your email. Stories that
appear to match your keywords are returned to you daily via email.
- Quickbrowse: http://www.quickbrowse.com
As its title suggests, the service is meant to provide an easy and quick
way of browsing the Net. It was developed by a journalist who wanted to view the sites of
all major US papers on a daily basis. Through Quickbrowse,
you can combine multiple sites (say, the feature sections of five newspapers)
and then have those sites delivered to your e-mail at a particular time.
- NewsTracker: http://www.newstracker.com
“NewsTracker collects and filters thousands of late-breaking articles
from a wide variety of online newspapers and magazines including the
Los Angeles Times, Chicago Tribune, Forbes Digital, Advertising Age,
and Russia Today.”
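The profile matching that filters perform can be illustrated with a short Python sketch. It is a simplified assumption about how such a service might work, not the method of any service listed above; `matches_profile` and `filter_stories` are hypothetical names. Each story is checked for the words or phrases in your interest profile, and only the stories with at least one hit are kept.

```python
import re


def matches_profile(story, profile):
    """Return the profile terms found in the story as whole words or phrases."""
    text = story.lower()
    hits = []
    for term in profile:
        # \b keeps "zoning" from matching inside "rezoning".
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, text):
            hits.append(term)
    return hits


def filter_stories(stories, profile):
    """Keep only the stories that contain at least one profile term."""
    return [story for story in stories if matches_profile(story, profile)]
```

For example, a profile of `["zoning", "city council"]` would catch a story headed "City Council votes on zoning plan" and skip one headed "Weather today". Real services likely add refinements such as word stemming or relevance ranking.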
Databases internal to a website
Spiders can’t get into databases on a web site, or anything that is retrieved on
the fly from a search query on a site. There are thousands of documents,
articles, reports, and speeches found in these secluded stashes. Useful
for: Finding background information and in-depth reports. Here are
a few ways to find them.
- Invisible Web: http://www.invisibleweb.com
Yahoo-like directory of databases found on websites. Some items
have the search box on the page, others link you to the web page.
- Direct Search: http://www.freepint.com/gary/direct.htm
Over 1,000 links to web site databases, organized by archives, books,
government, humanities, news sources, social sciences, legal, ready
reference, business, science, subject-specific.
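Why a spider misses these databases can be shown with a small Python sketch. It assumes a very simple spider that only follows ordinary links. The class `SpiderView` and the function `spider_view` are hypothetical names for illustration. The sketch lists the links such a spider could follow and, separately, the search forms it would see but never fill in.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect what a link-following spider sees on one page."""

    def __init__(self):
        super().__init__()
        self.links = []  # pages the spider can reach by following links
        self.forms = []  # search forms: the door to the site's database

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))


def spider_view(html):
    """Return (links, forms) found in a page's HTML."""
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    return parser.links, parser.forms
```

A page with one "About" link and one search box yields one entry in each list. The spider can index the "About" page, but everything behind the search box appears only after someone types a query, so it never reaches the search engine's database.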
News archives
Many newspaper web sites have an archive of stories, usually ones that
had been in the print product, not necessarily on the web site. Spiders
might index stories that were on a web page, but they can’t get into the
database archive of stories. Many of these will let you search for
free, but will charge a fee for receipt of a full-text copy of the story.
Useful for: Getting background information, finding out if something happening
locally has happened in other places, locating experts, seeing if a story
idea you have is fresh or has been done.
- NewsTrawler: http://www.newstrawler.com
A parallel search engine for news archives. Click the papers you
want trawled and it will retrieve references to stories that fit
the search you entered.
Commercial Information Services on the Web
The stand-alone information services have migrated to the web and been joined
by competitive web start-ups. These huge information stores provide
one-search shopping in archives of newspapers, magazines, transcripts of
television and radio programs. Their material dates back to the early
1980s and sometimes earlier. The contents of these services don’t get indexed by web
search engines.
- Electric Library: http://www.elibrary.com
Search the text of articles from magazines, newspapers, books and transcripts
from around the world. Set fee allows unlimited searching and
article downloading.
- Northern Light: http://www.northernlight.com
This is a combination spider search of web pages and a “special collections”
database with articles from publications. Abstracts are free but
there is a fee of $1 - $4 for full downloads of selected articles.
- DIALOG: http://www.dialogweb.com
(if you have an account; for info: http://www.dialog.com) 500
databases covering business, news, patents, trademarks, science and
government. More than 100 U.S. papers. 221 unique files that don't appear
anywhere else.
- DOW JONES: http://www.dj.com
80 million articles from 6,000 publications, plus market research, analyst
reports and historical market data. Data can be output in a variety of
formats, including spreadsheets.
- LEXIS-NEXIS: http://www.lexis-nexis.com
1.4 billion news stories, legal documents, financial and market reports,
legislative materials and more from 22,000 sources arranged into nearly
10,000 databases. Adds 4.6 million documents a week.
“People Finder” Files
Put a name into a search engine and you’ll find web pages with that person’s name
mentioned. Sometimes, though, you want to locate the person (find
an address, e-mail address or phone number). People finder sites on
the web can help you do that, but you have to go to them to do the search.
Useful for: Locating people you need to contact.
- Telephone Directories on the Web: http://www.teldir.com
“The Internet's original and most detailed index of online phone books,
with links to Yellow Pages, White Pages, Business Directories, Email
Addresses and Fax Listings from all around the world.”
- PeopleSearch: http://peoplesearch.net
A meta-search site for people-finder databases. Put in a name
and it will go out and do the search in 10 different people-finder databases.
Each search will spawn a new browser window. To close browser
windows quickly, press ALT-F4.
- AnyWho, WhoWhere, InfoSpace, InfoUSA, PC411: These are some of the largest phone
and e-mail lookup sites on the web.
Library Catalogs
Libraries, those original collectors and compilers of information, have great resources
on their web sites, much of it sitting in their online catalogs. These
contents won’t be picked up by a spider. Useful for: Locating experts
by searching for the authors of books. Verifying information.
- LibWeb: http://sunsite.berkeley.edu/Libweb/
Over 2,700 library web pages from libraries in over 70 countries. There
is a searchable database to find libraries by type or location.
Things search engines might find but you can’t get
The spider goes out, sees a web page, indexes it and
puts information about the page in its database. You come along
and do a search. The search results have what sounds like the perfect
page for you. You click on the link with great anticipation and get
a “404 File not found” message. Because of the time lag involved
in the process of scanning, indexing and entry into the database, pages
that were there when the spider came through might have been pulled by
the time you do a search. Tough luck….
Except when you search the web using Google: http://www.google.com
Do a search in Google and if you come to a link that is no longer there,
click the “cached” link at the end of the entry. Google will retrieve
a copy of the page as indexed from its cached page archive!
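The time lag and the cached-copy fallback described above can be sketched in Python. This is a toy model under stated assumptions, not a description of how Google actually works: `TinyIndex` is a hypothetical class, and a plain dictionary called `live_web` stands in for the web itself. The index keeps a snapshot of each page at indexing time, so a search can still return a page that has since been removed.

```python
class TinyIndex:
    """Toy search index that keeps a snapshot of every page it indexes."""

    def __init__(self):
        self.snapshots = {}  # url -> page text as seen when the spider visited

    def index(self, url, page_text):
        """Record the page as it looked at indexing time."""
        self.snapshots[url] = page_text

    def search(self, word):
        """Return URLs whose snapshot contains the word (case-insensitive)."""
        word = word.lower()
        return sorted(
            url for url, text in self.snapshots.items() if word in text.lower()
        )

    def open(self, url, live_web):
        """Return ("live", text) if the page still exists, else the cached copy."""
        if url in live_web:
            return "live", live_web[url]
        return "cached", self.snapshots[url]
```

If a page is indexed and then deleted from `live_web`, `search` still lists it, because the index has not been refreshed. That is the "404" situation. Calling `open` then returns the snapshot, which plays the role of the "cached" link.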
(Nora Paul, Poynter Institute, Oct. 17, 1999)