Wednesday, August 21, 2013

WEB SITE IDENTIFICATION PROCEDURES

There is currently no standardised method of sampling content featured on the World Wide Web. In the absence of standardised protocols, we employed a search strategy similar to those utilised in studies of Ewing's sarcoma and herbal remedy web sites. Both of these studies identified their study sample by entering multiple keywords into multiple search engines and by assessing either the first 100 or the first 250 hits from each search to determine web site eligibility for the study.
We used multiple keywords and search engines for several reasons: (1) approximately 85% of internet users rely on a search engine to locate information; (2) industry studies show that users employ approximately three keywords in a given search session; and (3) the best single search engine covers only about 16% of the web, but combining the results of multiple search engines raises coverage to nearly 42%.
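The pooling step behind this strategy can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the study: the function names `site_key` and `pool_hits`, the 100-hit cut-off, and the rule of treating two URLs on the same host as one web site are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def site_key(url):
    """Reduce a URL to a lowercase host name without a leading 'www.'.

    Assumption: two URLs on the same host count as the same web site.
    """
    # urlsplit only recognises the host when a scheme or '//' is present.
    host = urlsplit(url if "://" in url else "//" + url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def pool_hits(results, max_hits=100):
    """Union the first max_hits results of every (engine, keyword) search.

    results maps (engine, keyword) -> ranked list of result URLs.
    Returns a dict mapping each unique site to the set of searches that
    found it, so duplicates across engines and keywords collapse.
    """
    pooled = {}
    for (engine, keyword), urls in results.items():
        for url in urls[:max_hits]:
            pooled.setdefault(site_key(url), set()).add((engine, keyword))
    return pooled
```

Each site in the pooled dictionary would then be screened once for eligibility, however many searches returned it.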
To identify the best keywords, we first located approximately a dozen internet cigarette vendors (ICVs) and examined the words featured on each site's main page and in its metatags. Metatags provide keywords to help search engines find the site. Most browsers have a feature that allows users to view the underlying source code, including metatags. These approaches yielded a pool of 11 potential search keywords (cheap cigarettes, cheap smokes, cheap tobacco, discount cigarettes, discount smokes, discount tobacco, low price cigarette, inexpensive cigarette, mail order cigarette, online cigarette, and tax free cigarette). All 11 potential keywords were then typed into six major search engines used in an earlier study of search engine coverage to determine which keywords were most efficient in identifying ICVs. “Discount cigarette”, “cheap cigarette”, “mail order cigarette”, and “tax free cigarette” were selected as the four best keywords because they were the most efficient in locating ICVs, and at least one of these four keywords appeared in nearly all of the metatags for sites that were found using the other seven potential keywords.
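The metatag check described above can be illustrated with Python's standard `html.parser` module. This is a sketch under stated assumptions rather than the study's procedure: the names `MetaKeywordParser`, `meta_keywords`, and `matches_selected` are hypothetical, keywords are assumed to be comma separated inside a `<meta name="keywords">` tag, and a substring match is used so that plural forms such as "discount cigarettes" count as a match.

```python
from html.parser import HTMLParser

# The four keywords selected in the text above.
SELECTED = (
    "discount cigarette",
    "cheap cigarette",
    "mail order cigarette",
    "tax free cigarette",
)


class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated terms of every <meta name="keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "keywords":
            terms = attr.get("content", "").split(",")
            self.keywords += [t.strip().lower() for t in terms if t.strip()]


def meta_keywords(html):
    """Return the lowercased meta keywords found in an HTML page."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


def matches_selected(html, selected=SELECTED):
    """True if any selected phrase occurs inside any meta keyword."""
    return any(phrase in kw for kw in meta_keywords(html) for phrase in selected)
```

Applied to the source of each site found with the other seven keywords, `matches_selected` would indicate whether the site could also have been reached through one of the four selected keywords.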
To identify the search engines, we relied on popularity rankings. Media Metrix (www.mediametrix.com) ratings for August 1999 (the most recent ratings at the time of data collection) were used to identify the five most widely used internet search engines. The four keywords were entered into each of these search engines except Yahoo! (http://www.yahoo.com), a category-based internet search catalogue that required a slightly different strategy. Web sites on Yahoo! are organised hierarchically within categories. To identify the Yahoo! categories for cigarette vendors, two keywords, “smoking” and “cigarette”, were searched. The keyword “smoking” yielded 36 category matches and the keyword “cigarette” yielded five category matches. Of these 41 total category matches, only one unique category listed cigarette vendor sites: business and economy > companies > hobbies > smoking.