ISU Greenlee researchers recommend going "Wayback" for lost Web sites

AMES, Iowa -- It's an experience nearly everyone has had -- a Web site relied on for important reference information suddenly goes "missing." That sends most people to Google to "find it."

But two researchers from Iowa State's Greenlee School of Journalism and Communication report that a search through www.thewaybackmachine.com recovers almost twice as many lost pages as Google. Their finding comes from a study of inaccessible online citations in six professional journals.

Assistant Professor Daniela Dimitrova and Greenlee School Director and Professor Michael Bugeja co-authored a paper on their study titled "Raising the Dead: Recovery of Decayed Online Citations in Leading Mass Communication Journals," which will be presented at the Association for Education in Journalism and Mass Communication (AEJMC) Annual Meeting, Aug. 2-5, in San Francisco.

Dimitrova and Bugeja examined the online citations in six leading mass communication journals: Human Communication Research, Journal of Communication, Journalism & Mass Communication Quarterly, Internet Research, Journal of Broadcasting & Electronic Media, and New Media & Society. Of the 1,600 online citations they found, 733 (45.8 percent) were inaccessible, pointing to "dead" URLs. They then tried two methods of recovering the cited material -- first Google, the most popular search engine, and then the Wayback Machine, the most popular online archive.
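For readers who want to try the archive step themselves, the lookup can be sketched in a few lines of Python. This is only an illustration, not the researchers' procedure: it assumes the Internet Archive's public availability endpoint at archive.org, and the function names and the sample response are hypothetical. The sketch only builds the query address and reads a response; it does not contact the network.

```python
import json
from urllib.parse import urlencode

# Assumption: the Internet Archive's public availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_lookup_url(dead_url, timestamp=None):
    """Build the query address that asks the archive for a saved copy."""
    params = {"url": dead_url}
    if timestamp:
        # Optional YYYYMMDD hint: ask for the copy closest to this date.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's address from a JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hypothetical reply for illustration only; not data from the study.
sample_reply = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20050101000000/http://example.com/page",
            "timestamp": "20050101000000",
            "status": "200",
        }
    }
})

lookup_url = build_lookup_url("http://example.com/page", timestamp="20050101")
archived_copy = closest_snapshot(sample_reply)
```

A citation with no saved copy yields an empty "archived_snapshots" entry, in which case closest_snapshot returns None.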

The Wayback Machine outperformed Google in retrieving the 733 dead citations -- finding 392 (53.4 percent) compared to only 201 (27.4 percent) for Google.

"Whenever I need to search for something, Google has always been my first strategy -- so I was a little surprised too," said Dimitrova. "It's not that Google is bad. It just looks for actively existing Web sites. If the information or the Web site has been removed, for whatever reason, Google won't find it.

"Our results show that there are better ways to track this lost information and find it. The Wayback Machine archives the contents of a Web site -- creating a copy or a mirror of that site at a particular point in time -- and that's very promising for finding lost information or Web sites."

The researchers found little overlap between the two methods: only 129 citations were recovered by both. Of the 392 citations found on the Wayback Machine, 263 (67 percent) were not found on Google. By contrast, only 72 of the 201 (35.8 percent) citations found on Google were not found on the Wayback Machine.
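The overlap figures can be checked with simple arithmetic, sketched here in Python. The five input counts come from the figures reported above; the variable names are illustrative, and the last two totals (citations recovered by at least one method, and citations recovered by neither) are derived here by subtraction rather than reported by the researchers.

```python
# Counts reported in the release.
total_citations = 1600
dead_citations = 733
wayback_found = 392
google_found = 201
found_by_both = 129


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


# Citations each method recovered that the other missed.
wayback_only = wayback_found - found_by_both
google_only = google_found - found_by_both

# Derived, not reported: recovered by at least one method, and by neither.
recovered_by_either = wayback_only + google_only + found_by_both
still_missing = dead_citations - recovered_by_either

print("Dead citations:", dead_citations, f"({pct(dead_citations, total_citations)} percent)")
print("Wayback only:", wayback_only, f"({pct(wayback_only, wayback_found)} percent of Wayback finds)")
print("Google only:", google_only, f"({pct(google_only, google_found)} percent of Google finds)")
print("Recovered by either method:", recovered_by_either)
print("Recovered by neither method:", still_missing)
```

The two "only" counts reproduce the 263 and 72 cited above, which confirms that the reported totals and the overlap of 129 are consistent with one another.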

"The general findings of this content analysis suggest that authors trying to revive vanished online citations are better off using an online archive rather than a search engine at this point in time," said Bugeja, author of the book "Interpersonal Divide: The Search for Community in a Technological Age" (Oxford University Press, 2005).

"The convenience of Google, combined with its immense storage capacity and popularity, are troublesome reminders that scholars may continue to use less reliable methods to retrieve lapsed citations. Moreover, unless graduate programs emphasize the importance of archives in their introductory research classes, the first impulse of newer researchers, trained in the digital rather than physical library, might be to check via search engine rather than archive."

"Most people may still go to Google first to try and find something, and that's fine. The lesson of this research is that if you can't find it on Google, go to a web archive to find it," said Dimitrova.

This study -- which builds on the researchers' earlier work on the use and misuse of online citations in journalism and mass communication -- is one of 10 that faculty and graduate students from the Greenlee School of Journalism and Communication will present at the AEJMC Annual Meeting.