Having access to enormous information resources is not the same as being able to use that information efficiently. The key issue is whether a search interface is available and whether it can retrieve the required data. Search engines such as Google Custom Search have made it considerably easier to extract relevant information from local repositories and the Internet. However, presenting search results as snippets significantly limits the use of search mechanisms in the many apps that need to answer specific questions, such as where, when, or who.
Question answering (QA) systems are information search systems developed specifically to answer such questions, asked in natural language rather than through ready-made templates. The best-known QA system is IBM Watson, designed by the IBM DeepQA development group. It gained wide recognition and proved its efficiency and operability by winning Jeopardy!, a popular TV game show. Today IBM Watson is frequently applied in healthcare and many other domains. A number of alternative approaches to engineering QA systems have recently emerged, such as LSTM neural networks. However, this approach still requires in-depth research. In the meantime, QA systems continue to be built on the technologies designed by IBM DeepQA.
One example is YodaQA, a fully open source system introduced by Petr Baudis in his Master’s thesis, published in several editions and presented as a service on the web. Its success comes from its sound architecture and the proven IBM DeepQA technologies.
However, although numerous articles describe YodaQA as multilingual, it gives rather poor results when answering questions in East Slavic languages. We have therefore decided to design our own system based on the YodaQA architecture, one that can find satisfactory answers to naturally phrased questions in East Slavic languages (for example, Russian) and that focuses on specific domains. We expect this to give us the experience needed to build specialized QA systems in the future, either as stand-alone apps or as intelligent interfaces for other apps.
To be continued…