On Broken Links and Lost Knowledge

Scholarly and scientific research today depends in large part on our access to previous research, primary sources, and databases.¹ As many of these resources are to a significant extent – if not primarily – consulted through computers, it is no wonder that many learned references point to a web address linking the reader with a location in cyberspace.² This is not without its drawbacks, of course. Books and journals smell infinitely better, for instance, than a toiling hard drive. And have you ever tried jotting down notes on a computer screen? You run out of space almost instantly! But those are not the problems we are discussing today – however compelling they might be. Instead, we are going to talk about the potential loss of knowledge when a link changes after a reference has already been published or, even worse, when the information referenced is no longer available online.

And this subject touches on a question that I sometimes get: why do you mostly reference pages in actual books and articles, instead of linking to an alternative online source? As I explain in the FAQ section of Bildungblocks, this is not mere intellectual posturing! It is of paramount importance to me that my sources can be verified by the readers of these blogs. In order to achieve that, those sources have to be available and as easy to find as possible. I therefore privilege those books and articles that are relatively readily accessible online, but I still would not choose a merely digital source over an equally suitable printed one. The pages of a book or article do not change, even if their location online shifts over time, but web addresses do. In addition, articles and books will for the foreseeable future remain consultable in dedicated institutional repositories, while there is no such back-up system for many digital resources.³ But this does not mean, of course, that we should hesitate to reference sources that are only available digitally, if they provide the best information available. Just be aware of the risks! Risks we will not so coincidentally discuss below.

This blog is also available in Dutch.

Broken Links

I have acquired my preference for printed sources – but with an eye towards their online availability – the hard way. Both while conducting research in a previous life and now while writing these blogs, I have encountered more than my fair share of dead links. All too regularly my search to verify referenced sources passed through several graveyards full of them. And even in books and articles from reputable publishers, a footnote could point me to a website that no longer existed or that had changed its web address. If such a reference does not also mention the author and the title of the (article on the) web page – or if that title has also been changed on the site itself in the meantime – it can be nearly impossible to locate the digital source in question. Especially since internet search engines have somehow become less perceptive.⁴ All the while, it is the references to secondary literature, primary sources, or relevant data that make a work authoritative.⁵ Only in this way can we ascertain that the research is properly substantiated.

“But are there not modern solutions to these modern problems?” you may ask. And it is true that there have been attempts to solve this conundrum, the best known being the use of what are called Digital Object Identifiers or DOIs.⁶ These are links which are meant to persistently, well, link to a digital source regardless of whether it moves somewhere else in cyberspace. This is done through a central directory which is updated when needed. In this way the actual content remains available throughout any changes to its digital location. Especially in combination with Open Uniform Resource Locator (OpenURL) technology, which provides referencing sensitive to the context of the one doing the search, a DOI-link – characterized by those same three letters – can navigate a researcher specifically to the version of a source they can access most easily.⁷ But even though such links are supposed to survive all conceivable calamities – like the demise of the original publisher of the resource – many researchers can attest that even DOI-links can turn out to be broken when you unsuspectingly click them.⁸

That even persistent links like DOIs can be killed, in case you wondered, is often related to the metadata involved.⁹ That is the identifying information which singles out the source that should persistently be linked to. Sometimes the metadata provided to the relevant directory by the content provider is deficient. But the link itself can also contain the wrong metadata. Finally, the website to which the link leads can be confusing to the reader, for example when material that was published in an older journal has been subsumed under a new title. As such, even systems designed to make knowledge permanently accessible – despite experts’ best efforts – intermittently and unintentionally contribute to the loss thereof.¹⁰
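
For readers who like to see the mechanism spelled out, the idea of a persistent link backed by a central directory can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not how any real directory is implemented: the class `Directory`, the helper `is_reachable`, the identifier `10.1234/example.1`, and both web addresses are hypothetical and invented purely for illustration.

```python
# Toy model of a persistent-identifier directory.
# All names, identifiers, and URLs below are hypothetical.

class Directory:
    """Maps a persistent identifier to the current location of a source."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # The content provider supplies (or updates) the current location.
        self._locations[identifier] = url

    def resolve(self, identifier):
        # Returns the registered location, or None if nothing is registered.
        return self._locations.get(identifier)


def is_reachable(url, live_pages):
    # Stand-in for visiting the page: a link works only if the page exists.
    return url in live_pages


directory = Directory()
live_pages = {"https://old-publisher.example/article-1"}
directory.register("10.1234/example.1", "https://old-publisher.example/article-1")

# The article moves to a new host; the old page disappears.
live_pages = {"https://new-publisher.example/articles/1"}

# As long as the directory record is stale, the persistent link is broken...
stale = directory.resolve("10.1234/example.1")
broken_before_update = not is_reachable(stale, live_pages)

# ...and once the provider updates its record, the same identifier works again.
directory.register("10.1234/example.1", "https://new-publisher.example/articles/1")
fresh = directory.resolve("10.1234/example.1")
works_after_update = is_reachable(fresh, live_pages)

print(broken_before_update, works_after_update)
```

The point of the sketch is that the identifier printed in a footnote never changes; whether it still leads anywhere depends entirely on someone keeping the directory record up to date.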

Lost Knowledge

The mortality of linked references is a symptom of a larger problem with continuing access to digital sources: entire chunks of the World Wide Web are disappearing and, as a result, the internet as we know it is gradually being lost. In an interesting piece from late 2024 about the withering of our informal digital library, Susan Smith writes that a quarter of the web pages that existed in 2013 are no longer available today.¹¹ This also includes, among other resources, specialist blogs on all kinds of important topics for which there has, as of yet, been no suitable replacement – digital or otherwise.¹²

But it is not only our informal library that inhabits the proverbial danger zone. In an article for Nature in the spring of that same year, Sarah Wild concludes that the digital longevity of humankind’s learned literature is also not sufficiently guaranteed.¹³ This presents another terrifying possibility when one encounters a broken link: that the resource may have simply disappeared from the internet and we have to cross our fingers and toes that a digital or hard copy has been kept somewhere. Preservation of scholarly and scientific knowledge is indeed, as noted by Mary Case, one of the grand challenges of our era.¹⁴

Paywalls and Online Reading Programs

The topics discussed above are naturally not the only current problems with the accessibility of online information for research purposes. Two other prominent hurdles that I think are also worth mentioning in this context are publishers’ paywalls and the presentation of digital resources.

To start with the former, you may have wondered why I implied in the foregoing that access to digital resources differs from person to person and why we need technologies like OpenURL to point us towards the digital location of the desired information that is most suitable for us specifically. This has to do with the paywalls that guard much of the secondary literature, primary sources, and databases which are theoretically made available online. Without the means to pay the oftentimes substantial sums necessary, your access frequently depends on whether you study at, work at, or otherwise have access to an institution that has already paid the relevant publishers in order to bypass those paywalls – like a school, university or (academic) library. And some paywalls may still remain in place even then, depending on the policies and, perhaps more importantly, the funding of your institution.¹⁵

Once you have bypassed these paywalls, you may encounter another complication that I already hinted at above: the online presentation of information can be bewildering. And this goes beyond the example of subsumed journals that I briefly mentioned. Many times, articles or books are presented on publishers’ web pages or other kinds of digital repositories through an online reading program instead of a simple file or plain text. Notwithstanding the extra functionality this approach brings with it, one cannot fail to notice that, through their sometimes opaque design, these programs can inhibit the reader’s ability to find the resources they need and to save (parts of) the information they require as a back-up to consult later. They also frequently fail to provide a universally accessible environment in which to even study the material provided.¹⁶ Consequently, when such services are no longer licensed by your institution of choice, you can no longer check your own and others’ references to the resources hosted through such online reading programs.¹⁷

In their own way, paywalls and a less than ideal presentation of knowledge can mean that online information might as well be lost, as it is inaccessible to many researchers in a practical sense. They also bring with them the dilemma that I explained earlier: whether one should select sources merely for their fit with one’s research, or also with an eye for their (continuing) availability. And this is another respect in which those who want to write responsibly on science and scholarship have to be careful if they want their readers to be able to verify the sources that substantiate their work.

Conclusion: No Guarantees

The trials and tribulations I laid out in the foregoing frankly shouldn’t exist. References are too important to have even the most carefully crafted link possibly prove mortal. And the knowledge humans have accumulated is too precious to have informal and formal digital resources be relatively inaccessible or at risk of disappearing in the unfathomable depths of cyberspace. But as long as the possibilities of broken links and lost digital knowledge continue to exist, we have to take them into account. Especially because, as I alluded to above, even hard copies are regularly hard to come by and can become less widely available over time.

So even with my preference for books or articles that appeared originally in print, I have to contend with the fact that these are not always readily and indefinitely available to everyone. As such, I have made it part of my writing process to document and store my sources in case I have to hold myself accountable towards an inquiring or simply curious reader. So even if the locations that host my sources disappear or become untraceable, the references in my footnotes live on in the ridiculously meticulously organized folder system on my personal computer. But I won’t be here forever and – if nothing unforeseeable happens – my hard drive will probably enjoy an even shorter life span! So if you are so inclined, continue to feel free to ask me any and all questions on my sources for the topics discussed on Bildungblocks. That is, quite literally, what I am here for.

References

¹ Sarah Wild, “Millions of Research Papers at Risk of Disappearing from the Internet”, Nature 2024, 627 (8003), p. 256; Martin P. Eve, “Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles”, Journal of Librarianship and Scholarly Communication 2024, 12 (1), p. 2. For the humanities specifically, see: Patrik Svensson, Big Digital Humanities: Imagining a Meeting Place for the Humanities and the Digital (Ann Arbor: University of Michigan Press, 2016), p. 1-4.
² Eve, “Digital Scholarly Journals Are Poorly Preserved”, p. 3.
³ Though, as we will discuss below, such repositories are not created equal everywhere, see: Joe Karaganis, “Introduction: Access from Above, Access from Below”, in: Joe Karaganis (ed.), Shadow Libraries: Access to Knowledge in Global Higher Education (Cambridge: The MIT Press, 2018), p. 4-6.
⁴ Jordi Pérez Colomé, “El Día en que Google Empezó a Empeorar: ‘Nos Acercamos Demasiado al Dinero’”, El País May 3rd 2024; Christos Ziakis, “Important Factors for Improving Google Search Rank”, in: Andreas Veglis & Dimitrios Giomelakis (eds.), Search Engine Optimization (Basel: Multidisciplinary Digital Publishing Institute, 2021), p. 3-4.
⁵ Wild, “Millions of Research Papers at Risk of Disappearing from the Internet”, p. 256; Anthony Grafton, The Footnote: A Curious History (Cambridge: Harvard University Press, 1999), p. 233.
⁶ Sarah Glassner, “Broken Links and Failed Access: How KBART, IOTA, and PIE-J Can Help”, Library Resources & Technical Services 2012, 56 (1), p. 15; Cindi Trainor & Jason Price, Rethinking Library Linking: Breathing New Life into OpenURL (Chicago: American Library Association, 2010), p. 29; Eve, “Digital Scholarly Journals Are Poorly Preserved”, p. 3.
⁷ Glassner, “Broken Links and Failed Access”, p. 15.
⁸ Trainor & Price, Rethinking Library Linking, p. 29; Wild, “Millions of Research Papers at Risk of Disappearing from the Internet”, p. 256.
⁹ Glassner, “Broken Links and Failed Access”, p. 16-18; Trainor & Price, Rethinking Library Linking, p. 16.
¹⁰ Glassner, “Broken Links and Failed Access”, p. 18-22.
¹¹ This article was published online on The Verge in full acknowledgement that this piece – like many of her previous digital efforts – may not stand the test of time, see: Susan E. Smith, “How to Disappear Completely”, The Verge December 16th 2024, TheVerge.com (retrieved on February 16th 2024).
¹² And initiatives to archive digital knowledge – which are often at least partly run by valiant volunteers – are as yet not comprehensive. For examples like the Internet Archive and Wayback Machine, see: Anat Ben-David & Adam Amram, “The Internet Archive and the Socio-Technical Construction of Historical Facts”, Internet Histories 2018, 2 (1-2), p. 180-181; Nattiya Kanhabua et al., “How to Search the Internet Archive Without Indexing It”, in: Norbert Fuhr et al. (eds.), Research and Advanced Technology for Digital Libraries (Cham: Springer, 2016), p. 147.
¹³ Wild, “Millions of Research Papers at Risk of Disappearing from the Internet”, p. 256.
¹⁴ Mary Case, “Preservation and Scholarly Communication: The Grand Challenges of Our Time”, Technicalities 2016, 36 (5), p. 3-6; Eve, “Digital Scholarly Journals Are Poorly Preserved”, p. 3.
¹⁵ Karaganis, “Introduction”, p. 4-6. This inequality also includes the costs for researchers to publish their findings as open access, that is, publications that are made available without a paywall because the publisher was already reimbursed, see: Toby Green, “Is Open Access Affordable? Why Current Models Do Not Work and Why We Need Internet-era Transformation of Scholarly Communications”, Learned Publishing 2019, 32 (1), p. 13-25.
¹⁶ Glassner, “Broken Links and Failed Access”, p. 14; Heather Hill, “Disability and Accessibility in the Library and Information Science Literature: A Content Analysis”, Library & Information Science Research 2013, 35 (2), p. 137-138. Some of the mentioned shortcomings have already been discussed for a surprisingly long time, see: David R. Majka, “Remote Host Databases: Issues and Content”, Reference Services Review 1997, 25 (3/4), p. 23-24.
¹⁷ Mirosław Filiciak & Alek Tarkowski, “Poland: Where the State Ends, the Hamster Begins”, in: Joe Karaganis (ed.), Shadow Libraries: Access to Knowledge in Global Higher Education (Cambridge: The MIT Press, 2018), p. 163.