
How Do You Grab an Explosion? Making the World Wide Web Usable


<p>The Internet, or World Wide Web (WWW), is as useful as it is daunting. Its growth has been

both exponential and explosive. Learning to use it requires time, practice and a plan. Otherwise,

you will be lost in the deluge of available information.

</p><p>If the Web was incomprehensibly large in December 1997, it more than doubled in size in little

more than one year. According to a survey by NEC's Research Institute,<small><sup><a href="#1" name="1a">1</a></sup></small> in February 1999 the

Web had more than 800 million searchable pages with more than 6 trillion characters. Only 15

months before, the same authors estimated that the Web had approximately 320 million pages of

information. By contrast, the Library of Congress had an estimated 20 trillion characters.
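</p><p>To put those figures in perspective, the growth rate can be worked out with a few lines of arithmetic. The sketch below is only an illustration: it uses the page counts and the 15-month interval quoted above, and it assumes the Web grew at a steady exponential rate between the two estimates, which the survey itself does not claim.

</p>

```python
import math


def doubling_time(start, end, months):
    """Months needed to double in size, assuming steady exponential growth."""
    return months * math.log(2) / math.log(end / start)


# Figures quoted in the article.
pages_dec_1997 = 320e6    # estimated pages in December 1997
pages_feb_1999 = 800e6    # estimated searchable pages in February 1999
months_between = 15       # interval as stated in the article

web_characters = 6e12       # characters on the Web
library_characters = 20e12  # characters in the Library of Congress

growth_factor = pages_feb_1999 / pages_dec_1997
months_to_double = doubling_time(pages_dec_1997, pages_feb_1999, months_between)
web_vs_library = web_characters / library_characters

print(f"Growth factor: {growth_factor:.1f}x")
print(f"Doubling time: {months_to_double:.1f} months")
print(f"Web text as a share of the Library of Congress: {web_vs_library:.0%}")
```

<p>Under those assumptions the Web grew two and a half times over, which works out to a doubling time of a little over 11 months, while its text was still only about 30 percent of the Library of Congress estimate.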

</p><p>Staying even with this growth is difficult. Getting ahead of it is like trying to surf a tsunami.

Just when you think you have a part of it down pat, it doubles in size and becomes even more

incomprehensible. So, how can you manage this ever-growing monster to put its huge research

library to work for you?

</p><h3>Eating an Elephant</h3>

<p>The answer to the age-old question of "How do you eat an elephant?" applies equally to the

Web—"one <i>byte</i> at a time."<small><sup><a href="#2" name="2a">2</a></sup></small> You have to start somewhere, get familiar with one part of the

Internet and branch out from there. If you have yet to jump on the "information

superhighway," now is the time because you will only get further behind as the Internet

continues its exponential growth.

</p><p>Since you already have an interest in bankruptcy and a growing interest in the WWW, the

logical place to jump in is <b>http://www.abiworld.org</b>, ABI's own corner of the world (wide

web). Starting each day with a look at Today's Headlines, followed by the discussion boards,

gives you both a crash course on using the WWW and the current events in the bankruptcy

world. The current events portion is essential, since there is the possibility of a major

overhaul coming out of Congress during this session.

</p><p>Once you become familiar with ABI World, you are ready to branch out to the other areas of the

Web. But where do you start? How do you find the needle in one of 10,000 haystacks? This is

where the search engines come into play.

</p><h3>The Little Engines That (Almost) Could</h3>

<p>Simply put, a search engine is a site on the WWW that helps you find other sites with the

information you want. They come in different styles with their own strengths and weaknesses

(which will be discussed below). Now we learn that perhaps the greatest weakness is their

coverage.

</p><p>The NEC Research Institute study included a statistic that was both surprising and completely

anticipated—search engines do not really cover the Internet. That should not come as a surprise

with the explosive growth it experienced in the past few years. How could anything stay

absolutely current? The real surprise came from the report that even the best search engines

cover barely more than 15 percent of the Web, leaving more than 80 percent virtually

untouched. If the search engines have not indexed a page, the only other way to find it is

purely by accident.

</p><h3>Searching Web Sites</h3>

<p>Northern Light was found to have the best coverage with 16 percent, with Snap and AltaVista

close behind at 15.5 percent each.<small><sup><a href="#3" name="3a">3</a></sup></small> Yahoo!, one of the most popular sites, was reported to cover

only 7.4 percent of the Web. Understanding how these search engines work explains both their

incredible scope and unbelievable lack of reach.

</p><p>Northern Light, Snap and AltaVista all use programs called "spiders" to continuously search

the Web, find sites and index the locations they find by creating an enormous database. The

programs work all of the time to update the information they find. When you run a search, the

engine does not search all of the web sites; it simply searches its database and returns a list

of sites that match your search criteria. An example of the

power of the search engine (and sheer magnitude of the database) is a recent search on AltaVista

for "bankruptcy." In a search in July, AltaVista took only two seconds to generate a list of

497,890 sites on the WWW where it found the word "bankruptcy." Obviously, these engines

should be used for very specific searches of narrow topics—the true "needle in thousands of

haystacks" searches.
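</p><p>The idea of searching a prepared database instead of the Web itself can be sketched in a few lines of code. This is a toy illustration only, not how AltaVista or any other engine actually works: the pages, their text and the example.com addresses are made up, and a real spider would fetch the pages over the network instead of reading them from a table.

</p>

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages containing every word in the query.

    Only the index is consulted; the pages themselves are never reread.
    """
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical pages standing in for what a spider would have collected.
pages = {
    "http://example.com/chapter11": "Bankruptcy filings under Chapter 11",
    "http://example.com/recipes": "Elephant ear pastry recipes",
    "http://example.com/trustee": "Duties of a bankruptcy trustee",
}

index = build_index(pages)
broad = search(index, "bankruptcy")
narrow = search(index, "bankruptcy trustee")
print(broad)
print(narrow)
```

<p>The one-word search returns both bankruptcy pages, while adding a second word cuts the list to one. That is the same reason a specific, narrow query is the practical way to use these engines.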

</p><h3>Searching for Types of Web Sites</h3>

<p>By contrast, Yahoo! is an outline (created by real humans) of sites on the Web. Yahoo!

categorizes the sites by area, then topic, and you finally get down to a list of specific sites that

meet your criteria. You can search it using only a mouse (with no "keyword" searching) or by

giving it specific criteria to search for. When you use the keyword method, Yahoo! creates a list of

categories for you to use to narrow and continue your search. It will usually guide you to the

front page of the web sites instead of the specific pages deeper in the site that use the specific

terms from your search. In short, Yahoo! will drop you at the front door of a site and then let

you do the exploring from there to find what you want. This is the type of search engine you want

for general searches of broad topics.<small><sup><a href="#4" name="4a">4</a></sup></small>
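</p><p>The difference from a spider-built index can be pictured as a simple outline held in code. The sketch below is an assumption-laden illustration, not Yahoo!'s actual structure: the category names, the placement of each site and the example.com address are invented for the example.

</p>

```python
# A hand-built outline: areas contain topics, and topics list front pages.
# Category names and site placements here are hypothetical.
directory = {
    "Business": {
        "Law": {
            "Bankruptcy": ["http://www.abiworld.org"],
        },
        "Finance": {
            "Calculators": ["http://example.com/amortize"],
        },
    },
    "Government": {
        "Agencies": ["http://www.sec.gov", "http://www.irs.treas.gov"],
    },
}


def browse(tree, path):
    """Follow a chain of categories, as if clicking down with a mouse."""
    node = tree
    for name in path:
        node = node[name]
    # A category shows its subcategories; a final topic shows its sites.
    return sorted(node) if isinstance(node, dict) else node


def find_categories(tree, keyword, trail=()):
    """Keyword method: list the categories whose names match the keyword."""
    hits = []
    for name, child in tree.items():
        here = trail + (name,)
        if keyword.lower() in name.lower():
            hits.append(" > ".join(here))
        if isinstance(child, dict):
            hits.extend(find_categories(child, keyword, here))
    return hits


top_level = browse(directory, [])
business = browse(directory, ["Business"])
sites = browse(directory, ["Business", "Law", "Bankruptcy"])
matches = find_categories(directory, "bankruptcy")
print(top_level, business, sites, matches)
```

<p>Note that both methods end at a site's front page. Nothing in the outline knows what is written on the pages deeper in the site.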

</p><p>Another way to surf the Web is to find what I will call "compilation sites." These sites compile

and categorize lists of sites meeting their criteria. Two popular sites are CEO Express and

Virtual Reference Desk.<small><sup><a href="#5" name="5a">5</a></sup></small> CEO Express categorizes web sites that provide information a CEO

would use. Its areas include news, financial information (including the financial calculators that

can amortize note payments), medical, travel, legal (including ABI World) and similar sites. It

is a quick way to get to these sites. The Virtual Reference Desk includes (almost) everything

that a reference librarian could find for you (including many of the sites listed by CEO

Express). A personal favorite for finding obscure information is a feature called "Homework

Helper." It found an obscure Shakespearean quote by knowing where to find the web site where

the entire body of his work was available.

</p><p>For a more detailed review of the strengths and weaknesses of the search engines, go to the link

for "How to best use a search engine" at CEO Express.<small><sup><a href="#6" name="6a">6</a></sup></small> This link describes the features of the

most popular search engines and gives specific examples of the searches where each search

engine excels and fails to reach the mark. Having this information in something other than

"technogeek speak" is invaluable.<small><sup><a href="#7" name="7a">7</a></sup></small>

</p><h3>So What's Really Out There?</h3>

<p>The NEC Research Institute report also described the types of sites on the Web. Not

surprisingly, the main type of site was described as "commercial"—businesses with web sites

that inform the world about what they do (and what they have to sell). The share was strikingly large:

82.1 percent of the sites were described as "commercial." By contrast, "government" and

"religion" sites brought up the rear with 1.2 percent and 0.8 percent, respectively.<small><sup><a href="#8" name="8a">8</a></sup></small>

Businesses have learned that the way to draw traffic to their sites is to entice you with their

products. We apparently need to be bribed (with discounts and the like) to surf to those pages.

</p><p>On the government front, some of the most helpful information comes from the Securities and

Exchange Commission (SEC) and Internal Revenue Service (IRS)—two agencies that you always

hear about. The SEC's EDGAR database includes the recent filings by public companies. The IRS

site includes some tax information but also includes forms you can download and use (instead of

going to the post office or your local IRS office and waiting in line).<small><sup><a href="#9" name="9a">9</a></sup></small>

</p><h3>Summary</h3>

<p>There is no doubt that the WWW will continue to grow. At its current size, you could argue that

its growth rate is irrelevant. If it grows more slowly, it will still take years before the search

engines catch up to track all of it. If, on the other hand, it grows even faster, we will only be

able to use a small part of all that is available. Either way, the only way to make any use of it is

by going one step at a time. Get familiar with a corner of the Web and move out from there.

</p><hr>

<h3>Footnotes</h3>

<p><small><sup><a name="1">1</a></sup></small> Published in the July 8, 1999 edition of the scientific journal <i>Nature</i> and reported widely in newspapers around the country. <a href="#1a">Return to article</a>

</p><p><small><sup><a name="2">2</a></sup></small> The original answer of "one bite at a time" needed some updating for our increasingly digitized world. <a href="#2a">Return to article</a>

</p><p><small><sup><a name="3">3</a></sup></small> These search engines are found at <a href="http://www.northernlight.com" target="window2">www.northernlight.com</a>, <a href="http://www.snap.com" target="window2">www.snap.com</a> and <a href="http://www.altavista.com" target="window2">www.altavista.com</a>. <a href="#3a">Return to article</a>

</p><p><small><sup><a name="4">4</a></sup></small> A new search engine called "Ask Jeeves" mixes the two types of engines by accepting "natural language" requests and presenting a number of possible sites for the information. It is found at <a href="http://www.askjeeves.com" target="window2">www.askjeeves.com</a>. <a href="#4a">Return to article</a>

</p><p><small><sup><a name="5">5</a></sup></small> These are found at <a href="http://www.ceoexpress.com" target="window2">www.ceoexpress.com</a> and <a href="http://www.refdesk.com" target="window2">www.refdesk.com</a>. <a href="#5a">Return to article</a>

</p><p><small><sup><a name="6">6</a></sup></small> Look under the section titled "Internet Search." <a href="#6a">Return to article</a>

</p><p><small><sup><a name="7">7</a></sup></small> The direct address to this page is too lengthy to copy here. <a href="#7a">Return to article</a>

</p><p><small><sup><a name="8">8</a></sup></small> Those listed as "pornography" accounted for only 1.5 percent of the sites. This is far smaller than some might believe, given the news reports about those sites and the privacy

and First Amendment issues they present. <a href="#8a">Return to article</a>

</p><p><small><sup><a name="9">9</a></sup></small> These sites are found at <a href="http://www.sec.gov" target="window2">www.sec.gov</a> and <a href="http://www.irs.treas.gov" target="window2">www.irs.treas.gov</a>. The far more usable site for SEC documents is maintained by the Technology Centre of

PricewaterhouseCoopers and is found at <a href="http://www.edgarscan.pwcglobal.com/EdgarScan/index.html" target="window2">www.edgarscan.pwcglobal.com/EdgarScan/index.html</a>. <a href="#9a">Return to article</a></p>
