Here comes Technology
This simple DIY project consists of only a few PHP files. It is written by hand, without a PHP framework!
What are Application Programming Interfaces (APIs)?
Of course, you can read the Wikipedia article, but if you are not a technical person, it may not mean much to you. I don't know the precise definition either, and it does not matter; what matters is whether we can use it. To me, at least, an API is a way to access data (and/or a service or software). Normally, people visit a website and browse its content. Sometimes we also interact with it by submitting information: we upload photos to Facebook, send a name and email address to buy a ticket, or simply type keywords to search on Google. What happens behind the scenes is a series of data transactions, and a database is often used to manage them efficiently. For example, Google can be seen as a kind of database, which is why we can find websites by typing keywords.
Now, instead of going through a normal website, an API allows you to access such a database (and/or service or software) directly. It is an extra service that the website provider offers in order to share the data (and/or service or software) it has. Why? The data is offered in a standardised form, so other people can use it quickly and efficiently. In particular, software developers can write code to process it automatically. This is important because, on a website, users normally have to type something and click buttons. Once the code is written, a computer can process the data without us. In this way, APIs are useful for building new services and software. For instance, if you have access to a weather API and a map API (e.g. Google Maps), you can show the temperatures of different locations on a map. This can be a totally new service that is useful to somebody. APIs allow us to reuse somebody else's data (and/or service or software). Why not use such a nice idea for Digital Humanities? That is what CAROL tries to do.
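To make the weather-and-map idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the two JSON strings are made-up sample responses (real weather and map APIs have their own formats and would be reached over HTTP), and `build_marker` is a hypothetical helper name, not part of CAROL.

```python
import json

# Hypothetical sample responses, shaped like what a weather API and a
# map/geocoding API might return. The field names are assumptions.
WEATHER_RESPONSE = '{"city": "Vienna", "temperature_c": 21.5}'
LOCATION_RESPONSE = '{"city": "Vienna", "lat": 48.2082, "lng": 16.3738}'


def build_marker(weather_json: str, location_json: str) -> dict:
    """Merge two standardised API responses into one map marker."""
    weather = json.loads(weather_json)
    location = json.loads(location_json)
    # Refuse to combine answers that describe different places.
    if weather["city"] != location["city"]:
        raise ValueError("responses describe different cities")
    return {
        "label": f'{weather["city"]}: {weather["temperature_c"]} C',
        "lat": location["lat"],
        "lng": location["lng"],
    }


if __name__ == "__main__":
    print(build_marker(WEATHER_RESPONSE, LOCATION_RESPONSE))
```

Because both responses arrive in a predictable, machine-readable form, a few lines of code can combine them with no typing or clicking involved.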
Because CAROL combines full-text search with a chain of API calls made on the fly, query performance is rather slow. Unfortunately, there is not much we can do about it, as this application depends entirely on external APIs. In particular, running named entity recognition on the results of a full-text search makes good performance hard to achieve. On the other hand, we have also shown that current API set-ups could be a bottleneck for any serious implementation of distributed, data-centric research in Digital Humanities. We need to wait for faster Internet infrastructure. In fact, I make this point in my academic paper. Let's hope the situation improves in the near future!
No scalable service
As this project is largely experimental, CAROL only uses free API services. Therefore, a scalable and stable service cannot be offered at the moment. In particular, it uses the free versions of two commercial APIs (the Dandelion API and the Yandex.Translate API), which are limited to 1,000 units per day (entity recognition) and to 1,000,000 characters per day and 10,000,000 characters per month (translation), respectively. If there is high demand for CAROL, we will consider how to offer a more robust service. We are also investigating other Open Source solutions such as Ambiverse Natural Language Understanding.
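To show what such limits mean in practice, here is a rough Python sketch of a counter for the translation quota. The two limits are the ones quoted above; everything else is an assumption for illustration. `TranslationQuota` is a hypothetical helper, and counting characters with `len()` may not match how the provider actually counts them.

```python
from dataclasses import dataclass

# Free-tier translation limits quoted in the text.
TRANSLATE_CHARS_PER_DAY = 1_000_000
TRANSLATE_CHARS_PER_MONTH = 10_000_000


@dataclass
class TranslationQuota:
    """Tracks characters sent for translation (hypothetical helper)."""
    used_today: int = 0
    used_this_month: int = 0

    def can_send(self, text: str) -> bool:
        # Assumption: one Python character counts as one quota character.
        n = len(text)
        return (self.used_today + n <= TRANSLATE_CHARS_PER_DAY
                and self.used_this_month + n <= TRANSLATE_CHARS_PER_MONTH)

    def record(self, text: str) -> None:
        # Refuse to go over either limit instead of failing at the provider.
        if not self.can_send(text):
            raise RuntimeError("translation quota exhausted")
        self.used_today += len(text)
        self.used_this_month += len(text)

    def start_new_day(self) -> None:
        # Only the daily counter resets; the monthly one keeps growing.
        self.used_today = 0
```

Note that the two limits interact: using the full daily allowance every day would exhaust the monthly allowance after ten days (10,000,000 / 1,000,000 = 10).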
CAROL relies on the following external APIs:
- The Open Library to search inside the journal
- Dandelion to extract entities
- Yandex.Translate to translate the snippet into English
- Wikipedia/Wikimedia to extract metadata of thumbnails
- Google Maps to display maps
- DBpedia to fetch coordinates of places
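The way these services feed into one another can be sketched roughly as follows, in Python. This is not CAROL's actual code (which is written in PHP): the order of the steps is my assumption based on the list above, `run_chain` is a hypothetical name, and the four `fake_*` functions are stand-ins that return canned data instead of calling the real services.

```python
def run_chain(query, search, extract_entities, translate, fetch_coordinates):
    """Pass each search snippet through the chain of services in turn."""
    results = []
    for snippet in search(query):
        # Each step has to wait for the answer of the previous one.
        entities = extract_entities(snippet)
        results.append({
            "snippet": snippet,
            "translation": translate(snippet),
            "places": {name: fetch_coordinates(name) for name in entities},
        })
    return results


# Stand-ins for the real APIs; they return canned, made-up data.
def fake_search(query):
    return [f"{query} reiste nach Wien"]


def fake_extract_entities(snippet):
    return ["Wien"] if "Wien" in snippet else []


def fake_translate(snippet):
    return snippet.replace("reiste nach Wien", "travelled to Vienna")


def fake_fetch_coordinates(name):
    return (48.2082, 16.3738)


if __name__ == "__main__":
    for row in run_chain("Mozart", fake_search, fake_extract_entities,
                         fake_translate, fake_fetch_coordinates):
        print(row)
```

Since every call in the chain goes out to a different external service, the waiting times add up for each snippet.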
Data are only displayed; no data are stored on our servers.