About this project
Cross Assistant Reading for the Open Library (CAROL) is a personal research project of Go Sugimoto of Studio G4I, who works for the Austrian Centre for Digital Humanities. It was created to experiment with the (re)use of freely available Open Data via Application Programming Interfaces (APIs) (see the Technology section). He has developed a very simple application to explore books from the Open Library/Internet Archive in a new way. All you have to do is find an interesting book, search inside it, and use a varied set of data as a reading aid. Hopefully it opens up new research questions, or you can simply enjoy a basket of contextual information.
The Scope of the Project
Contextualisation: combination of full-text search and metadata search
There is often a gap between metadata and the content (i.e. data that the metadata describes).
Firstly, metadata plays an extremely important role in discovering valuable resources. A typical example is the history of metadata and system development in the library domain. Library catalogues and their management systems have benefited from conceptual progress in the metadata that describes library resources, including books, manuscripts, newspapers, audio, and video. However, fewer attempts have been made to explore how the content of a resource can be studied effectively by taking advantage of the metadata. In general, metadata is used to search for, identify, and locate the resources of interest. Once the resources are discovered, users have to examine them manually, for example page by page, without any further help from the metadata. CAROL is an attempt to change the roles of metadata and content by mixing metadata search (Dandelion, Wikipedia, DBpedia, Open Library) with full-text search (Open Library). Thanks to this dual mash-up of data and metadata, CAROL achieves a certain level of contextualisation for use in Digital Humanities and cultural heritage. This is important because data is often too fragmented to be useful, partly due to our (web) environment, where data is dispersed, and partly due to the "database-ation" of the Web.
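The mash-up described above can be illustrated with a minimal sketch. It is not CAROL's actual code: the record shapes (`page`, `snippet`, `label`, `uri`) and the function name `contextualise` are hypothetical, chosen only to show how full-text hits and entity metadata might be joined into one reading aid.

```python
from dataclasses import dataclass, field


@dataclass
class PageHit:
    """One full-text search hit, enriched with related metadata."""
    page: int
    snippet: str
    entities: list = field(default_factory=list)


def contextualise(hits, annotations):
    """Attach to each full-text hit the annotations whose label occurs in its snippet.

    hits:        list of dicts with hypothetical keys "page" and "snippet"
    annotations: list of dicts with hypothetical keys "label" and "uri"
    """
    enriched = []
    for hit in hits:
        text = hit["snippet"].lower()
        # Simple case-insensitive substring match; a real system would
        # rely on the entity extractor's own offsets instead.
        matched = [a for a in annotations if a["label"].lower() in text]
        enriched.append(PageHit(hit["page"], hit["snippet"], matched))
    return enriched
```

In this sketch the content (the snippets) and the metadata (the annotations) stay in separate sources until the moment of display, which is the kind of contextualisation the paragraph describes.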
Distributed research: data-centric research without (owning) data
Secondly, CAROL is a small endeavour to showcase distributed data-driven research. It only uses data from external APIs, meaning that CAROL does not store any data on its server. When you think about why so many people have opened up and shared data on the web, you soon understand that the data is there for other people to use and do something with. That implies that it is natural to borrow data. In practice, however, such data borrowing still seems rather limited, at least in Digital Humanities. Many projects use external data for data enrichment and/or provide individual hyperlinks, but CAROL provides an example that makes data-centric research more interesting by integrating different sets of data on the fly, without owning any data at all. Even if your research focuses on your own data, we believe that hybrid data research (mixing your own data with data from third parties) will be practised more in the near future.
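As a rough illustration of borrowing data on the fly, the sketch below queries an external API at request time and keeps nothing on the local server. It assumes the Open Library's public search endpoint (`/search.json` with `q` and `limit` parameters, returning a `docs` list); whether CAROL calls this particular endpoint is an assumption, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Open Library's public book search API.
OPEN_LIBRARY_SEARCH = "https://openlibrary.org/search.json"


def build_search_url(query, limit=5):
    """Build the request URL for a book search; no data is stored locally."""
    return OPEN_LIBRARY_SEARCH + "?" + urlencode({"q": query, "limit": limit})


def fetch_titles(query, limit=5, timeout=10):
    """Fetch matching book titles at request time (requires network access)."""
    with urlopen(build_search_url(query, limit), timeout=timeout) as response:
        payload = json.load(response)
    # "docs" and "title" are the field names assumed for the response.
    return [doc.get("title", "") for doc in payload.get("docs", [])]
```

Calling `fetch_titles("james cook journal")` would retrieve the titles each time they are needed, so the application itself never owns the data.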
Thirdly, we try to explore the potential of interdisciplinary study in the context of Digital Humanities. There are several reasons why interdisciplinary research is not easy, but CAROL demonstrates how different types of data can be integrated to assist users in exploring books when knowledge outside their expertise is needed. CAROL achieves this by automatically generating hyperlinks to one of the best-known sources of knowledge, Wikipedia. This is especially effective when users encounter specialised terms such as specific places, persons, technologies, and species.
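The idea of automatically generated Wikipedia hyperlinks can be sketched as follows. This is only an illustration under stated assumptions: the list of terms is supplied by hand here, whereas a real application would obtain it from an entity extraction service, and the function names `wikipedia_url` and `link_terms` are hypothetical.

```python
import re
from urllib.parse import quote

WIKI_BASE = "https://en.wikipedia.org/wiki/"


def wikipedia_url(title):
    """Build an English Wikipedia article URL from a page title."""
    return WIKI_BASE + quote(title.replace(" ", "_"))


def link_terms(text, terms):
    """Wrap each known term found in the text in a hyperlink to Wikipedia."""
    if not terms:
        return text
    # Longest terms first, so that "James Cook" is preferred over "Cook".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b")

    def to_link(match):
        term = match.group(1)
        return f'<a href="{wikipedia_url(term)}">{term}</a>'

    return pattern.sub(to_link, text)
```

For example, `link_terms("James Cook reached Tahiti", ["Tahiti", "James Cook"])` turns both names into links, so a reader unfamiliar with either can look them up without leaving the page.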
Evaluation of API implementations for Open Data
Lastly, CAROL contributes to the discourse on Open Data, which at the moment is still not easy for many researchers to use. In particular, APIs are widely used as a means of accessing Open Data by technology-savvy researchers and software developers; however, they can be a barrier that keeps many humanities researchers from using a large amount of valuable data on their own. CAROL proposes the practice of Easy Data: standardising APIs (especially JSON) and creating more user-friendly GUI tools to handle APIs, in order to reduce the risk of a digital divide.
CAROL started as a spin-off application. Initially, it was called James Cook Dynamic Journal (JCDJ). As the code was highly portable, it was decided to extend the scope of the application, making it more global and flexible. The reading aid with contextualisation is the same, but now users can freely search for books in the Open Library and search inside one of them. Usability has also been improved with an accordion function to expand and collapse the reading aid section.
If you would like to know more about the original project, the outcomes can be found in:
- Sugimoto G. (2017) Battle Without FAIR and Easy Data in Digital Humanities. In: Garoufallou E., Virkus S., Siatri R., Koutsomiha D. (eds) Metadata and Semantic Research. MTSR 2017. Communications in Computer and Information Science, vol 755. Springer, Cham
- Sugimoto, G. (2018) ‘Who is open data for and why could it be hard to use it in the digital humanities? Federated application programming interfaces for interdisciplinary research’, in International Journal of Metadata, Semantics and Ontologies, 2017 Vol.12, No.4, pp.204-218.
From Prototype to Production
As this project is experimental, there are many shortcomings as well as a wish list for future development. For example, it would be nice if we could:
- add more Natural Language Processing functions (tokeniser, lemmatiser, part-of-speech tagging, etc.)
- output the search results in various formats: file download, REST API etc
- allow users to select which reading aids are included in the results
- create a better User Interface and User Experience
It is hoped that the project will grow slowly but continuously. Your help/contribution/collaboration is also welcome!
Use of CAROL
General copyright of Cross Assistant Reading for the Open Library (CAROL)
© 2019 Go Sugimoto
The website and the application of Cross Assistant Reading for the Open Library (CAROL) are licensed under a Creative Commons Attribution 4.0 International License.