One surprising aspect of running a digital library is the power search engines have over your work. It seems to be a daily battle to ensure the library is fully exposed to search engines and that they are aware of constant content updates. Like many digital libraries, we have many returning visitors. But a site well optimized for search engines promises to increase those counts manyfold, simply by letting random searchers find it.
For better or worse, a good majority of our content is PDFs, which we can't force search engines to crawl. We didn't create the content, so we don't know how well its authors followed basic strategies for optimizing PDFs: making them text-based, providing good metadata, reducing file size, etc. So we rely on the power of our metadata to make sure people can find the content.
I found one answer to this in a paper from the National Library of Singapore, "Libraries without Borders: Content Delivery, Singapore Style" [pdf], by Chan Ping Wah and Ngian Lek Choh.
The upshot is that these librarians created subject guides on a variety of topics, mostly pertaining to Singapore’s history. Generally, the pages received a few thousand hits per month until librarians placed guides on individual web pages and let Google (and others) crawl the content. The guides, called Singapore Infopedia, had the simple look and feel of a Wikipedia page (the authors' words), but were fully sourced, vetted and reviewed.
Opening the material up to search engines meant that not only library users but anyone looking for information on Singapore would find the pages. Hits went from a few thousand a month to hundreds of thousands per month.
It’s something to think about for our sites. From a general user's perspective, finding information on our sites can be difficult. User experience folks have long argued that users don’t like too many clicks, and they often don’t like PDFs appearing where webpages should be. What I learned from Singapore Infopedia is that people want a concise, up-to-date write-up on a topic. Instead of linking to, say, the latest toolkit on behavioral communication strategies, why not write a few paragraphs on it, cite the sources and provide easy access to the source material?
What I also learned is that if libraries are in the business of delivering content, they should have more control over how that content is presented. The Singapore Infopedia method, I think, may cut down on some information overload. Its stripped-down pages are easier to read in low-bandwidth countries (and thus make it easier to transfer knowledge), and they are easier for regular internet searchers to find. Others may also find it easier to link to these pages than to a PDF, which could bring more people to the site. The Singapore paper also pointed out that clean write-ups make it much easier to build an API-like service and share the information with other like-minded sites.
Chan and Ngian argue that researchers often don't visit the library portal. It would be better if the library could figure out a way to get its content onto a researcher's institutional portal:
To make our resources available to [researchers], a better way would be to enable our services to be easily incorporated into their own portal, via some sort of API services (e.g. Web Services). With this, researchers at their respective institution will not need to navigate between different portals to request or access content. Instead, within their own portal, they will be able to search and access an external library, for example, the national library’s content.
Of course, creating good content is time-consuming and quite difficult. Worrying about search engine results can also be a fool's errand. But I think finding a way to unbundle some of the materials we place on our knowledge sites may go a long way toward getting them read, disseminated and acted upon.