Understanding Everyday Activities

Robert Porzel, UHB.

If the proof of the pudding is in the eating, then the ultimate test for understanding an instruction is its proper execution. This view greatly expands the scope of natural language understanding beyond the usual syntactic and semantic analysis. In this part of the MUHAI project we seek to operationalise the basic principles of human-centric AI so that machines will be able to understand how to perform everyday actions in the cooking domain. This involves moving away from executing fully explicit, standardised instructions towards understanding instructions conveyed through natural language dialogues. The key challenge here is the integration of world knowledge and pragmatic inference into the understanding process, both at the level of language processing and at the level of task execution. For example, a recipe does not explicitly mention that chopping a cucumber involves a cutting board and a knife, and presupposes a specific orientation of the cucumber as well as a conventional slice thickness. This knowledge is nevertheless essential to carrying out the task and must therefore be inferred from commonsense knowledge. Equally important is the build-up of knowledge that generalises across recipes and ingredients, as it is a precondition for adapting existing recipes to given constraints, and ultimately for the creative design of novel recipes.
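To make the cucumber example concrete, here is a minimal sketch of filling in what a recipe leaves unsaid. It is only an illustration and not MUHAI's implementation: the `DEFAULTS` table, the `Instruction` type, the `fill_implicit` function and all values in them (including the slice thickness) are hypothetical stand-ins for a far richer commonsense knowledge source.

```python
from dataclasses import dataclass, field

# Hypothetical commonsense defaults per action. The values are illustrative
# assumptions, not data from the MUHAI project.
DEFAULTS = {
    "chop": {
        "tool": "knife",
        "surface": "cutting board",
        "orientation": "lying flat",
        "slice_thickness_mm": 5,
    },
}


@dataclass
class Instruction:
    action: str                                  # e.g. "chop"
    obj: str                                     # e.g. "cucumber"
    params: dict = field(default_factory=dict)   # what the recipe states explicitly


def fill_implicit(instr: Instruction) -> Instruction:
    """Return a copy of instr with unstated parameters filled from DEFAULTS.

    Anything the recipe states explicitly takes precedence over a default.
    """
    defaults = DEFAULTS.get(instr.action, {})
    return Instruction(instr.action, instr.obj, {**defaults, **instr.params})


# The recipe only says "chop the cucumber into 3 mm slices".
step = fill_implicit(Instruction("chop", "cucumber", {"slice_thickness_mm": 3}))
print(step.params)
```

In this sketch the explicit 3 mm thickness overrides the default, while the knife, cutting board and orientation are supplied from the lookup table. A flat table is the simplest possible choice here; it says nothing about how such knowledge would be acquired or generalised across ingredients.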

In order to achieve these goals we will define two kinds of benchmarks:

  1. one that consists in mapping between existing recipes formulated in natural language and actions executed in the VR world
  2. one that allows us to evaluate a new recipe design or variant proposal

As in all parts of the MUHAI project, the notion of meaning-based and human-centric narratives also applies in the cooking domain. These narratives give meaning to collections of experiences of a virtual agent (object perceptions, body postures, force dynamics, visual processing) and to structured data collections (recipes, images and procedures). Building narratives requires the integration of multimodal sources of input (text, image, sound) and pattern detection in a model of constructional language processing. Constructions will be used as the basic representational unit in which all of these sources are combined. The outcome of constructional language processing is a semantic analysis, including the identification of goals, plans, actions, objects, time and causation. Together, these analyses form the starting point for narratives in the domain. Integrating these narratives with the personal dynamic memory is what allows them to be truly understood, in the sense that they can be mapped to a series of low-level actions that a simulated agent can then execute in the VR kitchen environment.

To demonstrate the potential of this approach, MUHAI will develop two applications for recipe execution and design:

  1. Recipe execution - This application consists of executing recipes expressed in natural language in a VR kitchen environment. This requires mapping between a recipe (i.e. a sequence of instructions) and a sequence of low-level actions to be executed. The application will involve constructional language processing, consultation of the personal dynamic memory for pragmatic inference, and planning the execution of the concrete cooking actions. The application will be evaluated on the benchmarks described above.
  2. Recipe design - This application is situated in the domain of professional recipe design. In the first part of the project MUHAI will focus on the challenge of building a virtual agent that can act as an assistant chef. This digital assistant needs to integrate technical cooking knowledge with a considerable prior memory of recipes, previously successful and unsuccessful variants, cooking procedures, and cultural context. Most importantly, it needs to do so in an explicable, transparent manner. In a second phase, the focus will shift to a more challenging task that embraces even more aspects of human-centric AI, namely that of recipe design. This is a capacity that goes beyond skill and knowledge and introduces creativity.
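The mapping from a recipe to a sequence of low-level actions, as described for the recipe execution application, can be pictured with a toy example. The sketch below assumes that language processing and pragmatic inference have already produced fully specified steps; the primitive action names (`fetch`, `place`) and the `expand` and `plan` functions are hypothetical and chosen only for illustration.

```python
def expand(action, obj, tool, surface):
    """Expand one fully specified step into hypothetical primitive actions."""
    return [
        ("fetch", tool),
        ("fetch", obj),
        ("place", obj, surface),
        (action, obj, tool),
    ]


def plan(recipe):
    """Concatenate the primitive actions of every step, in recipe order."""
    actions = []
    for step in recipe:
        actions.extend(expand(**step))
    return actions


# A one-step "recipe" whose implicit details have already been filled in.
recipe = [
    {"action": "chop", "obj": "cucumber", "tool": "knife", "surface": "cutting board"},
]

for primitive in plan(recipe):
    print(primitive)
```

A realistic planner would also have to track the state of the kitchen, for instance to avoid fetching a knife that is already in hand. The sketch ignores this and simply expands each step independently.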


This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 951846