If the proof of the pudding is in the eating, then the ultimate test of understanding an instruction is its proper execution. This view greatly expands the scope of natural language understanding beyond the usual syntactic and semantic analysis. In this part of the MUHAI project, we seek to operationalize the basic principles of human-centric AI so that machines can understand how to perform everyday actions in the cooking domain. This involves moving away from executing fully explicit, standardised instructions towards understanding instructions conveyed through natural language dialogues. The key challenge here is integrating world knowledge and pragmatic inference into the understanding process, at the level of both language processing and task execution. For example, a recipe does not explicitly mention that chopping a cucumber involves a cutting board and a knife, and presupposes a specific orientation of the cucumber as well as a conventional slice thickness; yet this knowledge is essential to carrying out the task and must therefore be inferred from common-sense knowledge. Building up knowledge that generalises across recipes and ingredients is equally important, as it is a precondition for adapting existing recipes to given constraints and, ultimately, for the creative design of novel recipes.
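The cucumber example can be made concrete with a minimal sketch of how an underspecified instruction might be completed from common-sense defaults before execution. Everything here is a hypothetical illustration rather than the MUHAI implementation: the `COMMON_SENSE` table, the `Action` class, the `complete` function, and in particular the 5 mm default thickness are assumptions chosen only to show the idea.

```python
from dataclasses import dataclass, field

# Hypothetical common-sense defaults per action verb (illustrative values only).
COMMON_SENSE = {
    "chop": {
        "tool": "knife",
        "surface": "cutting board",
        "orientation": "lying flat on the surface",
        "slice_thickness_mm": 5,  # assumed placeholder, not a project value
    },
}


@dataclass
class Action:
    """An action as extracted from a recipe: a verb, an object, explicit parameters."""
    verb: str
    obj: str
    params: dict = field(default_factory=dict)


def complete(action: Action) -> Action:
    """Return a copy with implicit parameters filled in from the defaults.

    Parameters stated explicitly in the recipe take precedence over defaults.
    """
    defaults = COMMON_SENSE.get(action.verb, {})
    return Action(action.verb, action.obj, {**defaults, **action.params})


# "Chop the cucumber into 3 mm slices": only the thickness is explicit.
explicit = Action("chop", "cucumber", {"slice_thickness_mm": 3})
completed = complete(explicit)
print(completed.params)
```

In this sketch the recipe's explicit thickness overrides the default, while the tool, surface and orientation are supplied from background knowledge. A real system would need to derive such defaults from context rather than from a fixed table.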
In order to achieve these goals we will define two kinds of benchmarks:
As in all parts of the MUHAI project, the notion of meaning-based and human-centric narratives also applies in the cooking domain. These narratives give meaning to collections of experiences of a virtual agent (object perceptions, body postures, force dynamics, visual processing) and to structured data collections (recipes, images and procedures). Building narratives requires the integration of multimodal sources of input (text, image, sound) and pattern detection in a model of constructional language processing. Constructions will be used as the basic representational unit in which all of these sources are combined. The outcome of constructional language processing is a semantic analysis, including the identification of goals, plans, actions, objects, time and causation. The set of analyses makes up the starting point for narratives in the domain. These narratives can be integrated with the personal dynamic memory in order to truly understand them, in the sense that they can be mapped to a series of low-level actions that a simulated agent can then execute in the VR kitchen environment.
To demonstrate the potential of this approach MUHAI will develop two applications for recipe execution and design: