There is currently no standardised method of sampling content featured on the World Wide Web. In the absence of standardised protocols, we employed a search strategy similar to those used in studies of Ewing's sarcoma and herbal remedy web sites. Both of these studies identified their samples by entering multiple keywords into multiple search engines and assessing either the first 100 or the first 250 hits from each search to determine web site eligibility.
We used multiple keywords and search engines for several reasons: (1) approximately 85% of internet users rely on a search engine to locate information; (2) industry studies show that users employ approximately three keywords in a given search session; and (3) the best single search engine covers only about 16% of the web, whereas combining the results of multiple search engines raises coverage to nearly 42%.
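The pooling step described above (several keywords, several engines, a fixed number of hits per search) can be sketched in Python. This is a minimal illustration, not the procedure used in the cited studies: the `results` dictionary keyed by (keyword, engine) pairs, the placeholder engine names and URLs, and the decision to treat each host name as one site are all assumptions made for the example.

```python
from urllib.parse import urlparse


def pool_hits(results, max_hits=100):
    """Pool the first `max_hits` URLs from every (keyword, engine) search and
    return the set of unique sites (host names) to screen for eligibility.

    Assumption: one host name counts as one site; a leading "www." is dropped
    so that www.example.com and example.com are not counted twice.
    """
    sites = set()
    for urls in results.values():
        for url in urls[:max_hits]:  # only the first max_hits hits are assessed
            host = urlparse(url).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            sites.add(host)
    return sites


# Hypothetical search results; keys are (keyword, engine) pairs.
results = {
    ("discount cigarette", "engine_a"): [
        "http://www.example-smokes.com/",
        "http://vendor-b.example/order",
    ],
    ("cheap cigarette", "engine_b"): [
        "http://example-smokes.com/cheap",
        "http://vendor-c.example/",
    ],
}

print(sorted(pool_hits(results)))
```

Because the same vendor often appears under several keywords and engines, deduplicating before screening means each candidate site is assessed once; the `max_hits` parameter corresponds to the 100- or 250-hit cutoff.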
To identify the best keywords, we first located approximately a dozen ICVs and examined the words featured on each site's main page and in its metatags. Metatags provide keywords that help search engines find the site, and most browsers have a feature that allows users to view a page's underlying source code, including its metatags. These approaches yielded a pool of 11 potential search keywords (cheap cigarettes, cheap smokes, cheap tobacco, discount cigarettes, discount smokes, discount tobacco, low price cigarette, inexpensive cigarette, mail order cigarette, online cigarette, and tax free cigarette). All 11 potential keywords were then entered into the six major search engines used in an earlier study of search engine coverage to determine which keywords were most efficient in identifying ICVs.
“Discount cigarette”, “cheap cigarette”, “mail order cigarette”, and “tax free cigarette” were selected as the four best keywords because they were the most efficient in locating ICVs, and at least one of these four keywords appeared in nearly all of the metatags for sites that were found using the other seven potential keywords.
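The metatag check behind the keyword selection can be sketched with Python's standard-library `html.parser`. This is a minimal sketch under stated assumptions, not a description of how the authors inspected the sites: the sample pages are hypothetical, keywords are assumed to be comma-separated inside a `<meta name="keywords">` tag, and a substring match is used so that plural forms such as “discount cigarettes” count as a match for “discount cigarette”.

```python
from html.parser import HTMLParser

# The four keywords selected in the study.
SELECTED = (
    "discount cigarette",
    "cheap cigarette",
    "mail order cigarette",
    "tax free cigarette",
)


class MetaKeywordParser(HTMLParser):
    """Collect keywords from <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # HTMLParser lowercases attribute names but not values, so normalise.
        if (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Assumption: keywords are separated by commas.
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )


def meta_keywords(html):
    parser = MetaKeywordParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


def matches_selected(keywords, selected=SELECTED):
    # Substring match so that "discount cigarettes" matches "discount cigarette".
    return any(s in k for k in keywords for s in selected)


def share_covered(pages):
    """Fraction of pages whose metatags contain at least one selected keyword."""
    hits = [matches_selected(meta_keywords(p)) for p in pages]
    return sum(hits) / len(hits)


# Hypothetical pages found with one of the other seven keywords.
page_a = ('<html><head><meta name="keywords" '
          'content="Cheap Smokes, discount cigarettes, tobacco"></head></html>')
page_b = '<meta name="Keywords" content="cigars, pipes">'

print(share_covered([page_a, page_b]))  # → 0.5
```

A share close to 1.0 over the sites found with the other seven keywords would correspond to the observation that at least one of the four selected keywords appeared in nearly all of their metatags.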
To identify the search engines, we relied on popularity rankings. Media Metrix (www.mediametrix.com) ratings for August 1999 (the most current ratings at the time of data collection) were used to identify the five most widely used internet search engines. The four keywords were entered into each of these search engines except Yahoo! (http://www.yahoo.com), which is a category-based internet search catalogue that required a slightly different strategy. Web sites on Yahoo! are organised hierarchically within categories. To identify the Yahoo! categories for cigarette vendors, we searched for two keywords: “smoking” and “cigarette”. The keyword “smoking” yielded 36 category matches and the keyword “cigarette” yielded five. Of these 41 category matches, only one unique category listed cigarette vendor sites: Business and Economy > Companies > Hobbies > Smoking.