In the KnowCIT project we extend the abilities of the conversational agent MAX by giving him access to collaboratively constructed knowledge drawn from the online encyclopedia Wikipedia. Using this crowd-sourced knowledge resource, the agent, acting as the interlocutor of a human dialog partner, is able to identify, label, track, and continue the topic of a dialog. This allows him to answer questions, to detect topic changes, and to react meaningfully to the dynamics of dialog.
More generally, the KnowCIT project aims to build interactive technology that enables artificial agents to explore crowd-sourced knowledge resources generated by large communities of web users. From a theoretical point of view, we aim to tackle the grounding problem studied in cognitive science by interfacing artificial cognitive agents with social ontologies. In this way, artificial agents benefit from crowdsourcing, and their human users in turn benefit from the agents' increased communicative competence. The project is in line with efforts to utilize social tagging systems such as Wikipedia, wikimanuals, and other specialized wikis, which provide large resources of encyclopedic knowledge. In this context, we plan to exploit object knowledge as well as linguistic and metalinguistic knowledge (for example, from so-called wiktionaries) so that virtual agents can identify, label, track, and continue the topic of a dialog in which they participate as the interlocutor of a human user.
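The project description does not specify how topics are identified or how topic changes are detected. The following is therefore only a hypothetical toy sketch of the general idea, not the KnowCIT method: each utterance is matched against word profiles of (invented) article entries standing in for Wikipedia articles, the best-overlapping article serves as the topic label, and a change of label is flagged as a topic change. The names `ARTICLES`, `identify_topic`, and `track_topics` are illustrative assumptions.

```python
import re
from typing import Optional

# Hypothetical toy "knowledge base": article title -> short description.
# Invented for illustration; a real system would derive profiles from Wikipedia articles.
ARTICLES = {
    "Bielefeld": "city in north rhine-westphalia germany university teutoburg forest",
    "Football": "sport team ball goal match league player",
    "Robot": "machine agent artificial intelligence sensor embodied",
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (incl. German umlauts)."""
    return re.findall(r"[a-zäöüß]+", text.lower())

# Set-of-words profile per article, including the title words.
PROFILES = {t: set(tokenize(t + " " + d)) for t, d in ARTICLES.items()}

def identify_topic(utterance: str) -> Optional[str]:
    """Return the article sharing the most tokens with the utterance, or None if nothing matches."""
    tokens = tokenize(utterance)
    best_title, best_score = None, 0
    for title, profile in PROFILES.items():
        score = sum(1 for tok in tokens if tok in profile)
        if score > best_score:  # on ties, the earlier article is kept
            best_title, best_score = title, score
    return best_title

def track_topics(utterances: list[str]) -> list[tuple[str, Optional[str], bool]]:
    """Label each utterance with a topic and flag topic changes.

    An utterance without any match continues the previous topic.
    """
    current: Optional[str] = None
    labeled = []
    for utt in utterances:
        topic = identify_topic(utt) or current
        changed = current is not None and topic != current
        labeled.append((utt, topic, changed))
        current = topic
    return labeled

if __name__ == "__main__":
    dialog = [
        "Have you ever been to Bielefeld?",
        "Yes, I studied at the university there.",
        "Do you like football?",
        "Which team do you support?",
    ]
    for utt, topic, changed in track_topics(dialog):
        print(f"{str(topic):<10} change={changed}  {utt}")
```

Here the second utterance stays on "Bielefeld" via the word "university", while the third switches to "Football" and is flagged as a topic change. Plain word overlap is chosen only for brevity; it ignores morphology (e.g. "robots" would not match "robot") and would be far too crude for a real encyclopedic resource.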
For the evaluation of the WikiQA system we used 200 questions from the CLEF-2007 monolingual QA task, with German as the target language. Note that we evaluated the answers manually, on the basis of their sentence representation only. That is, the exact answer was not extracted; instead, it had to be contained in the answer sentence returned by the system.
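The evaluation described here was done manually. Purely as an illustration of the sentence-level criterion, the sketch below approximates it automatically, assuming that a reference ("gold") answer string is available for each question; the `QAItem` structure, its field names, and the sample questions are hypothetical and are not CLEF-2007 data.

```python
from dataclasses import dataclass

@dataclass
class QAItem:
    question: str
    gold_answer: str      # reference answer string (hypothetical; assumed available)
    answer_sentence: str  # sentence returned by the QA system

def contains_answer(item: QAItem) -> bool:
    """Count an answer as correct if the gold answer occurs in the returned sentence (case-insensitive)."""
    return item.gold_answer.casefold() in item.answer_sentence.casefold()

def sentence_level_accuracy(items: list[QAItem]) -> float:
    """Fraction of questions whose returned sentence contains the gold answer."""
    if not items:
        return 0.0
    return sum(contains_answer(it) for it in items) / len(items)

# Invented toy examples (not CLEF-2007 data).
SAMPLE = [
    QAItem("Wer schrieb Faust?", "Goethe",
           "Faust ist eine Tragödie von Johann Wolfgang von Goethe."),
    QAItem("Wie heißt die Hauptstadt von Frankreich?", "Paris",
           "Frankreich liegt in Westeuropa."),
]

if __name__ == "__main__":
    print(sentence_level_accuracy(SAMPLE))
```

For a set of 200 questions, the score is simply the number of sentences containing the answer divided by 200. Substring matching is only a rough proxy for a human judgment: it cannot tell whether the answer occurs in a context that actually answers the question, which is one reason to evaluate manually.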
The knowledge base of WikiQA is built from the German Wikipedia dump (version 10/2010). More precisely, it comprises 1,063,772 articles and 88,883 categories. The entire corpus was linguistically analyzed and segmented into 30,890,452 sentences.
Artificial Intelligence Group
KnowCIT: Knowledge Enhanced Embodied Cognitive Interaction Technology
CITEC: Center of Excellence Cognitive Interaction Technology
Universität Bielefeld
Universitätsstrasse 25
D-33615 Bielefeld
Tel: 0521 106-2924
Fax: 0521 106-2962
Prof. Dr. Ipke Wachsmuth
Head of Project
ipke@techfak.uni-bielefeld.de
http://www.techfak.uni-bielefeld.de/~ipke/
Dr. Ulli Waltinger
uwalting@techfak.uni-bielefeld.de
http://www.techfak.uni-bielefeld.de/~uwalting/
Alexa Breuing
abreuing@techfak.uni-bielefeld.de
http://www.techfak.uni-bielefeld.de/~abreuing/