by Frederick Rustam
Part Four, ORDINARY CITIZENS AS SCHOLARS
_____________________________________________________________________
Questor
Institute is a new, experimental technical school where
bright-but-poor
high-school graduates on full scholarships spend
two
years seeking to become wizards of Internet sorcery by studying
the
science and philosophy of information retrieval from textual
databases
such as the World Wide Web. In Part Three, "The Nearness
of
You," Kevin and Marylou studied the tricky process of subject
qualification,
learned the greater effectiveness of proximity
searching
with the NEAR logical operator, and began a search
for
an inspiring mountain.
_____________________________________________________________________
Information Availability
Kevin
and Marylou's studies at Questor Institute were centered on the
technology
of Internet textword searching. The Institute also taught
them
the historical and philosophical foundations of this primary
medium
from which they retrieved information. Most of their exposure
to
the Institute's Internet philosophy occurred in the course,
History
and Nature of Computerized Information Systems.
"Today,
we're gonna get orientated 'bout the 'Net," joked Kevin.
"Oriented,"
corrected Marylou, not allowing for his humor. She pointed
to
the course outline displayed on their workstation screens. "Today's
lecture
is supposed to be a brief introduction to Internet history
and
philosophy."
"The
universal Internet is only a few years old," began the teacher,
who
was the Institute's elderly Rector. "Yet it's already become
the
most significant instrumentality for the storage and retrieval
of
publicly-available information since the advent of the public
library.
We all assume that technology drives and steers society
to
some degree. Printing helped to change medieval society and bring
about
the Renaissance by making more-diverse literature more-widely
available.
Internet technology is changing today's global society by
making
public information from a wide variety of sources and viewpoints
universally-available,
and by making that information more-easily
retrievable.
"The
Internet was not, as our media often say, devised by the Dept.
of
Defense for military telecommunication after a nuclear strike on
our
nation. The TCP/IP packet-switching standard---Transmission Control
Protocol/Internet Protocol---and the ARPAnet to make use of it were
devised
so that universities doing defense research sponsored by the
Advanced
Research Projects Agency could quickly share their technical
papers
and computer programs.
"Later,
other government agencies and commercial outfits built similar
national
data transmission networks, and for a while it was uncertain
which
technical networking standard would triumph over the others.
When
the Europeans saw that TCP/IP might become the de facto
standard
for an international network, they rushed to create their
own
version and got that adopted by the International Standards
Organization---but
too late. TCP/IP became the world's Internet
protocol
because people wanted an international computer network
right
now, and
they chose to use the functioning ARPA
standard,
instead of waiting for something better to develop.
"It
wasn't until the National Science Foundation built a civilian
network
on the ARPAnet standard that a national network for all our
citizens
became possible. When the NSF relinquished operation of its
NSFnet
to a three-level combination of long-distance telecom companies,
regional
telecom companies, and local, commercial Internet service
providers,
the national citizens' computer network we know as the
Internet
was born. It's important for us to appreciate this fact: it
wasn't
until the government transferred one of its taxpayer-financed,
civilian
computer networks to free-enterprise operation that a true
Internet
became available for all our citizens. Our current Internet
wasn't
built by the Defense Department as a gift for the general
public,
although that's what some civilian visionaries in ARPA
hoped
their network would eventually become."
The Rector tapped his head as if to elicit thought from his audience.
"But
what is our Internet, really?... It's much more than a bunch of
telephone
lines and routers. It's a bunch of agreements, agreements
between
millions of information holders to use the same methods of
formatting
and transmitting information and to make that information
available to everyone who can access it. The earliest of the info-sharing
technical agreements were email and 'FTP.' File Transfer
Protocol
allowed users anywhere on the network to transfer their
data
files and computer programs to others by using the same
simple
procedures.
"Then
came 'telnet,' a protocol which allowed all Internet users
to
retrieve data by remotely manipulating the databases of other
users.
Such a database might be the computerized catalog of a big
university
library, a place where bibliographic data about rare
books
might be found. This general database-sharing protocol opened
a
big door to information retrieval: the acquisition of information,
routinely
and without formal request. We didn't have to ask somebody
for
their information; we just connected to them on the network and
took
what we wanted---and we did so because the holders of that info
wanted
us to have it. It was this new, free-information mode which
established
the spirit of today's Internet.
"To
retrieve information by telnet, however, we have to know the
technical
procedures by which individual info-holders operate their
databases.
These operating procedures vary, often in slightly-but-annoyingly-different
ways; yet they have to be employed correctly,
or
we can't retrieve the information."
Universal Compatibility
"Then
a researcher had a terrific idea. A British physicist, Tim
Berners-Lee---while
working at CERN, a European research facility---
came
up with a superprotocol for information-sharing on the Internet.
He
called it 'HTTP, HyperText Transfer Protocol.' It's an agreement
by
all Internet users to format much of their Internet information in a
standard
fashion and allow access to it by a commonly-employed means
of
retrieval, transmission, and display. This protocol allowed the
establishment
of the World Wide Web.
"HTTP
also provided for site-to-site hyperlinks embedded within the
text
of webpages. These links offer Web users an important way to
discover
more information about a subject than they can find on the
pages
of the website they're currently viewing. The importance of
hyperlinks
can't be overstressed. Following them can be like following
guidemarks
on the walls of a dimly-lighted tunnel system onward to the
liberating
sunlight.
"The
advantages of a single Web-document protocol have since been
undermined
by the proliferation of webpages in formats requiring
specialized
software to view them. But the HTML format does have
limitations,
and these can now be sidestepped by those document
authors
who prefer more-versatile formats for their works while
making
these works publicly-available via standard Web browsers.
"In
March 1995, a milestone was reached when Internet Web traffic
first
exceeded FTP traffic. HTTP was a quantum jump for the Internet
and
a giant leap for humankind---one far-greater than traveling to
the
moon, I assure you. It was HTTP that changed the Internet from
a
techno-tool for an elite to an information system for the masses.
When
HTTP spawned the World Wide Web, and commercial firms offered
the
common folk access to it, the whole world went online to post
information
and to retrieve the information of others."
"Wow,"
scoffed Kevin, too loudly. Marylou elbowed him. "Shhhhh."
"The
Web is, however, a relatively 'freeform' database, in that its
only
organization---if we can term it that---is that it exists on
webpages.
Its information isn't naturally classified or grouped like
books
in a library or want ads in a newspaper, where similar items
are
gathered together for retrieval convenience. Nor are the Web's
webpages
placed into neatly-packaged information containers like
those
of structured databases, where categorized tables of data
elements
can be retrieved individually or in combination.... But
this
doesn't mean that Web information is almost-irretrievable,
as
some critics have recklessly claimed.
"Millions
of the world's people are accessing the Web, and they're
delighted---but
frustrated, as well---by that vast information system.
Consider
why this is so. In a library, when you want information about
a
person, you can consult a general encyclopedia, a biographical
reference
work, or a specific biography---depending upon the depth of
info
you require. When you want in-depth information about any specific
subject,
you can go to a place on the shelves where books about that
subject
are located. Use of the library's subject catalog is desirable
but
not always necessary.
"But
on the Internet---unless you can directly access or follow a
hyperlink
to another webpage of known relevance---you face the
terabytes
of a vast, distributed megacorpus of text, universal in
its
subject coverage, in which your desired subject is completely
scattered
among billions of webpages in no order, and which is
retrievable
only by means of textword indexes which more-or-less
tell
you where subjects can be found---if you can successfully
search
those indexes.
"There
are some specialized search engines available which index
only
webpages within a narrow area of specialization. But most Web
users
prefer to use general search engines. These engines index
many
more pages, they're updated more frequently, and they usually
offer
more search features. This preference for general engines,
however,
virtually guarantees users more retrieval frustration.
"The
daunting challenge which the World Wide Web and the Usenet offer
information
seekers is the reason for the establishment of Questor
Institute,
and it's why you're here. In meeting this challenge
successfully,
you'll find that you've often retrieved useful info
from
those of your fellow citizens often labeled as 'amateurs.'"
Citizen Scholars
"There
are many Usenet newsgroups; and there are, in effect, several
Webs.
There is the publicized and well-known Commercial Web, where
people
are trying to make money. The Academic Web now educates many
more
people than registered college students by making non-confidential
academic
documents universally available. The Government Web is where
Big
Brother interacts with the Internet---not to control it but to
merely
use it as the rest of us do."
As
he intoned this reality, the Rector rolled his eyes skyward and
clasped
his hands in mock-prayer. "Thank you, Lord, for this blessed
decentralization
of the People's Medium." The audience applauded.
"There's
the Media Web, a means for our news/feature media to extend
their
influence over us, even into a place where we like to think that
we've
escaped their traditional, one-way communication. The Media Web
is
a part of the Commercial Web, although it hemorrhages money at an
enormous
rate. Nonetheless, the media strongly feel that they have to
have
a Web presence. And that's good for information seekers because
media
websites are a valuable source of timely information---but not
necessarily
'authoritative' information. Media error seems to be
amplified
by the modalities of the Digital Age.
"But
most of the Web---nobody can say for sure how much---is the
province
of ordinary citizens. You've noticed this, I trust. The
Citizen
Web is composed of millions of websites created by ordinary
people
like you and me. After the government surrendered the data
transmission
infrastructure of the Internet to commercial telecom
firms,
citizens who were not 'empowered' by colleges, governments,
or
corporations---or favored by traditional publishers---could
more-easily
contribute their information to society's info-pool.
And
they did. The first trickle of citizen participation quickly
became
a flood which is quietly sweeping our old informational
hierarchies
into the drainpipe of history."
The
Rector assumed a conspiratorial stance. "But don't tell this
to
the Information Elite. They still believe themselves to be the
sole
creators and disseminators of authoritative information."
Kevin
whispered to Marylou, "I don't even have a homepage with an
authoritative
picture of my cat."
"You
will when you get a cat," she purred. "They're so lovable
we
just have to tell everybody about them."
As
if he'd heard that conversation, the teacher continued, "What
do
ordinary citizens have to offer the Web that isn't offered by
academia,
government, and commerce?... The same thing, really---
information---but
without the customary authority those societal
institutions
claim for themselves. Citizens have put up on the Web
billions
of pages of text, sound, and graphics in which they have
an
interest. I'm not just talking about homely 'homepages,' even
though
personal websites can be useful sources of information.
"Many
citizen websters are hobbyists who pursue their hobbies online
as
well as offline. Hobbyist websites are often quite elaborate, and
their
pages speak with an authority which most media pages lack because
the
info comes from the heart as well as the head. 'Interest websites,'
those
which reflect the interests of their entrepreneurs, are also
info-valuable.
Even 'fansites,' where citizen fans exhibit their
enthusiasm
for celebrities, can be useful sources of information.
"I
once heard a popular radio musicologist introduce a medieval
selection
as what sounded to me like 'Lo Mar May.' I was curious
enough
to access the musicologist's website to find this title in his
program
playlist. Although most of his numbered programs were listed,
a
few were not, and #109 was one of those missing! For months, again
and
again, I accessed the page to see if Program #109 had been added.
It
hadn't been. Finally, I decided to 'go general,' that is, to make
a
general search of the Web for another source of this guy's program
playlists.
I did, and I quickly found a citizen's fansite about the
musicologist
which had them. Checking Program #109, I found that the
title
the musicologist had spoken was the French phrase, 'l'Homme
Arme'---'a-r-m-e'
with an acute accent on the e---'The Armed Man.'
If
that citizen website hadn't existed, I might still be waiting to
find
out how 'Lo Mar May' was spelled. Such redundancy on the Web
is
a good thing for information seekers.
"Our
media, both offline and online, rarely refer to the Citizen Web.
When
they do, it's usually to denigrate some part of it. Media folks
seem
to feel they have to label 'amateur' Web information as being
unreliable.
For a few years, the media relentlessly reported the
Internet
as unreliable and dangerous. They did this because they
feared
the competition this new medium presented. Newspapers feared
losing
readers, television feared losing viewers, and these media
tried
hard to stigmatize the new People's Medium---a medium which
they
barely understood, and which they still know poorly today."
The
Rector's hard, calculated look of contempt changed into a
wicked
grin of pleasure.
"But
when the media rushed onto the Web to gain a presence there,
their
scare stories about the Internet greatly diminished. When
editors
and reporters realized the value of the People's Medium
for
their purposes, they became part of the Web community---but,
thank
goodness, not a controlling part. Then, they realized they
couldn't
write or say bad things about the Internet which might
discourage
the computer-owning, educated class from accessing
media
websites. But the Media Elite still keeps its collective
nose
elevated well above the Citizen Web, even as they utilize
its
information freely and often without attribution.
"So
why do citizens spend their money to create informational
websites
with elaborate pages of content? Ultimately, I guess,
they
do it for personal accomplishment. But many feel strongly
about
their hobbies, interests, and causes, and they desire to
express
themselves on the Web to further these. In the end, who
cares
why they do it?... Strong motives can produce information
distortion,
but let's be thankful that so many citizens have
put
their information on the People's Medium. As professional
information
retrievers, you'll make use of data from citizen
webpages
because it's handy and relevant.
"Academia,
government, and commerce have no monopoly on scholarship.
You
don't have to go to college or have an intellectual vocation
to
become a scholar. Every citizen can be an armchair scholar or
even
a deadly-serious one. You'll see a good example of this when
you
discover the 'Witchfinder General' webpage in your homework
assignment."
The Rector smiled benignly.
"If
you publish 'original' information---that's what you write by
your
own efforts, not what you copy from others---then in Questor
Institute's
view, you're a citizen scholar. Citizens now have the
People's
Medium to display their scholarship, but they could enjoy
some
recognition of their works by the intellectual hierarchs of
our
society. In that vein, we hope to establish at Questor a Website
Excellence
award program to recognize the outstanding contributions
of
the citizenry to society's online information pool. Yes, I know
that
there are a multitude of questionable Web awards out there in
cyberspace,
but as Questor Institute becomes better-known, its Web
awards
will, I trust, be the most-highly valued by their recipients.
"Today's
homework assignment is this, then: locate a citizen webpage
which
provides a short-but-good biography of the 17th-century English
Puritan
witch hunter who styled himself as 'The Witchfinder General'---and
which provides a complete list of his many victims and their
'disposal.'
In your report about this site, answer these questions:
    (1) Does the Englishman who put up this page seem to have a college degree?
    (2) How does his 'authority'---or seeming lack of it---matter to you?
"I'd
like to end today's session with an anecdote about scholarship.
In
California, there's an academic who's a self-appointed Internet
'contrarian'---gadfly.
In 1995, he wrote this in a national magazine:
    What the Internet hucksters won't tell you is that the Internet is an
    ocean of unedited data without any pretense of completeness. Lacking
    editors, reviewers, or critics, the Internet has become a wasteland of
    unfiltered data. You don't know what to ignore and what's worth reading.
    Logged onto the World Wide Web, I hunt for the date of the Battle of
    Trafalgar. Hundreds of files show up, and it takes 15 minutes to unravel
    them---one's a biography written by an eighth grader, the second is a
    computer game that doesn't work, and the third is an image of a London
    monument. None answers my question...
"Recently,
I read that passage reprinted in a book on search engines.
So
I logged onto the Web, and in 15 seconds---not minutes---
I
retrieved a search-results page on which six of the eight items
clearly
stated the date of the Battle of Trafalgar. I didn't have
to
click on those items; I found the answer in their annotations.
How
can it be that I was so quickly successful, and that academic
'expert'
was not?... I don't know how he searched for the date of
the
Battle of Trafalgar, but I searched like a Questor would."
The Rector wrote on the whiteboard:
 <-------------------ASSUMED SENTENCE------------------>
"The battle of Trafalgar was fought on October 21, 1805."
 <-REDUNDANT-> <-UNIQUE SEARCHPHRASE-> <---UNKNOWN---->
SEARCH INPUT: <"trafalgar was fought on">
"As
you already know from your retrieval class, searching for
natural-language
text requires that we employ natural-language
searchwords.
Unlike Dr. Contrarian, I visualized a relevant text,
subject-analyzed
it, and searched appropriately. How many of you
Questors
would have searched this way?" Suddenly, the classroom was
filled
with waving hands, not all of them honestly flourished, but
all
of them strongly motivated.
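The Rector's whiteboard method can be sketched in a few lines of Python. This is only an illustration, not anything the story specifies: the function name build_phrase_query, its parameters, and the <DATE> placeholder are all assumptions made for the example. The idea is to write out the sentence you expect a relevant page to contain, drop the wording judged redundant and the fact you don't yet know, and search for what remains as an exact phrase.

```python
def build_phrase_query(assumed_sentence, unknown, redundant=""):
    """Turn an assumed sentence into a quoted, lowercased phrase query.

    `unknown` is a placeholder for the fact being sought, and `redundant`
    is leading wording judged too common to help. Both names are
    hypothetical; they exist only for this sketch.
    """
    phrase = assumed_sentence.strip().rstrip(".")
    # Drop the redundant lead-in, if the sentence starts with it.
    if redundant and phrase.lower().startswith(redundant.lower()):
        phrase = phrase[len(redundant):]
    # Drop the unknown part; the search is supposed to supply it.
    phrase = phrase.replace(unknown, "")
    words = phrase.lower().split()
    return '"' + " ".join(words) + '"'


query = build_phrase_query(
    "The battle of Trafalgar was fought on <DATE>.",
    unknown="<DATE>",
    redundant="The battle of",
)
print(query)  # prints "trafalgar was fought on" (with the quotation marks)
```

The quotation marks in the result are what ask a search engine for the words as an adjacent phrase rather than as scattered terms.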
"Of
course, without examining the six search-return items I couldn't
establish
an 'authority' for the date each offered. But I can assume
an
'inherent authority.' When six webpages agree on the date of a
historical
event, this amounts to a practical authoritativeness for
that
date. It seems unlikely that all six webpages would have gotten
it
wrong, especially when there was no other date in the results.
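The 'inherent authority' argument amounts to counting agreement among the returned items. A minimal Python sketch of that tally follows, under loud assumptions: the function name consensus_date is hypothetical, the date pattern only recognizes dates written like "October 21, 1805", and the three sample annotations are invented for the example rather than taken from any real search.

```python
import re
from collections import Counter

# Matches dates written as "Month D, YYYY" (an assumption of this sketch).
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def consensus_date(snippets):
    """Return (most common date, pages stating it, pages stating any date)."""
    counts = Counter()
    pages_with_date = 0
    for text in snippets:
        # Use a set so one page counts at most once per distinct date.
        found = set(DATE_PATTERN.findall(text))
        if found:
            pages_with_date += 1
        counts.update(found)
    if not counts:
        return None, 0, 0
    date, votes = counts.most_common(1)[0]
    return date, votes, pages_with_date


# Hypothetical search-result annotations, invented for this sketch.
snippets = [
    "The battle of Trafalgar was fought on October 21, 1805.",
    "Trafalgar was fought on October 21, 1805, off the coast of Spain.",
    "A page about London monuments.",
]
date, votes, dated = consensus_date(snippets)
print(date, votes, dated)  # October 21, 1805 2 2
```

When the pages stating the top date equal the pages stating any date, no result disagrees, which is the situation the Rector describes. Agreement of this kind is practical evidence, not proof, since pages can copy one another.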
"Many
who take it upon themselves to harshly criticize the Internet
simply
don't know the medium as well as they should. As Questors,
you'll
study the Internet until it becomes 'second nature' to you.
You'll
learn more than just how to retrieve from it---you'll learn
to
respect it. Without that respect, the Internet surrenders its
secrets
reluctantly.... When you approach a dog with love in your
heart,
isn't it less-likely to bite you?"
"That
was a powerful message," whispered Kevin. "I'm inspired."
Ignoring
his sophomoric cynicism, Marylou replied, "So am I.
And
it makes me feel good to know that I'm studying to enter the
latter-day
equivalent of a medieval craft guild. When our class
graduates
and shows the world what the word 'Questor' now means,
students'll
be thronging to get in here. And we'll be remembered
as
the Institute's pioneer class."
"Yippee!
Circle the wagons!"
"That's
not what I meant, Kevin."
"Hey
sister, don't get me wrong. I can dig the guild thing."
THE END OF PART FOUR
Next: Part Five, "What Does Information Want?"
__________________________________________________________________
© 2002 by Frederick Rustam. Frederick Rustam is a retired civil servant.
He formerly indexed technical reports for the Department of Defense.
He writes science fiction for Web ezines as a hobby. He studies and
enjoys the Internet as a hobby.