Word Concept Model for Intelligent Dialogue Agents

Yang Li, Tong Zhang and Stephen E. Levinson

Beckman Institute, 405 North Mathews, Urbana, IL 61801, USA

{yangli, tzhang1, sel}@ifp.uiuc.edu

Abstract.  Information extraction is a key component of dialogue systems.  Knowledge about the world, as well as knowledge specific to each word, should be used for robust semantic processing.  An intelligent agent is necessary for a dialogue system when meanings are strictly defined using a world state model.  An extended concept structure is proposed to represent the knowledge associated with each word in a “speech-friendly” way.  By considering the knowledge stored in the word concept model together with the knowledge base of the world model, the meaning of a given sentence can be correctly identified.  This paper describes the extended concept structure and how knowledge about words can be stored in this concept model.

1   Introduction

With the recent progress in speech recognition techniques, it is now practical to build spoken dialogue systems that accept relatively large vocabularies.  Researchers have built many spoken dialogue systems [1][2][3].  With the increased complexity of tasks, however, the methods currently in use show several limitations.  Building a spoken dialogue system requires a large amount of manually transcribed data, and the resulting systems are usually not robust or flexible enough to deal with unexpected sentence styles.  More importantly, information extraction has to be built for a specific domain.  There is no systematic approach for building spoken dialogue systems quickly and automatically.  Although there are many aspects to be considered, we claim that the meaning of each word in an input sentence has not been given enough emphasis when semantic processing is done using current methods.

 

For a human, there is knowledge associated with every word he or she knows.  We know that a book is something composed of pages and made of paper.  We also know that there are words printed in a book.  There may be other information.  All this knowledge composes the concept of “book”.  Humans use the knowledge behind the concept to interpret the meaning of sentences.  There may be techniques that let a computer extract the information embedded in a sentence without implementing the knowledge associated with each word, but their limitations are clear from currently built systems.  We shall have a much better chance of success if we explicitly implement in our system the knowledge associated with each word, and with each word phrase whose meaning differs from that of its component words.  This knowledge will help us extract information with more robustness and flexibility if we build powerful reasoning tools at the same time.

 

The knowledge associated with words is actually not knowledge about the “word”; it is about the “world”.  Each sentence has a meaning, and that meaning is about the “world”.  According to Schank’s Conceptual Dependency Theory [4], there is a unique representation for the meaning behind any sentences that convey the same message, regardless of language.  This is absolutely true.  However, Schank’s scheme of meaning representation does not define meaning in a strict way.  Further, Schank’s conceptualization does not employ the knowledge behind each word and therefore does not cover the meaning of a given sentence accurately.

 

We claim that strict meaning can only be defined in a given “world”.  Meanings in different “worlds” may have different representations.  Therefore, it is necessary to have a “world model” before any meaning can be defined.  As Woods suggested, an intelligent system will need to have some internalized “notation” or “representation” for expressing believed and hypothesized facts about the world [5].  Once we have a world model, the representation of meaning is clear and unique: descriptions of the world states are the stative conceptualizations of Schank, and changes of the world states are the active conceptualizations.  This meaning representation is consistent with the world model and has nothing to do with any specific language.  We can easily describe the world in formal languages; the difficult task is to describe it in natural languages and to understand the natural languages used to describe it.
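
The distinction between descriptions of a world state and changes of a world state can be illustrated with a minimal sketch.  It assumes, purely for illustration, that the world model is a table of (entity, attribute) facts; the names WorldState, describe and apply_change are hypothetical and are not part of the system described in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class WorldState:
    """Hypothetical world model: a mapping (entity, attribute) -> value."""
    facts: dict = field(default_factory=dict)

    def describe(self, entity, attribute):
        # Stative conceptualization: a description of the current state.
        return self.facts.get((entity, attribute))

    def apply_change(self, entity, attribute, new_value):
        # Active conceptualization: a change from one state to another.
        old_value = self.facts.get((entity, attribute))
        self.facts[(entity, attribute)] = new_value
        return (entity, attribute, old_value, new_value)


world = WorldState()
world.apply_change("Tom", "position", "school")
change = world.apply_change("Tom", "position", "home")
```

In this sketch the meaning of a sentence is either a lookup in the table or a recorded transition between two table states, and neither depends on the language in which the sentence was spoken.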

 

If a world model is used in a dialogue system, it is convenient to use an intelligent agent that responds to changes in the world state and acts to achieve some ideal world state.  An intelligent agent is a piece of software that has its own goals and can reason and plan according to its world state, or simply its environmental variables.  Intelligent agents are capable of completing complicated tasks.
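
As a rough sketch of such goal-directed behaviour, assume that both the world state and the ideal state are flat dictionaries of environmental variables.  The functions plan and act below are hypothetical and deliberately trivial; a real agent would need actual reasoning and planning.

```python
def plan(state, goal):
    """Return the variable changes needed to move `state` towards `goal`."""
    return [(key, value) for key, value in goal.items() if state.get(key) != value]


def act(state, goal):
    """Apply each planned change and return the updated state (a copy)."""
    new_state = dict(state)
    for key, value in plan(state, goal):
        new_state[key] = value
    return new_state


current = {"light": "off", "door": "closed"}
ideal = {"light": "on"}
updated = act(current, ideal)
```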

 

As we mentioned earlier, the knowledge behind each word, that is, its concept, is important for semantic processing.  However, no current knowledge representation method is well suited to storing the information associated with each word.  The procedural semantics suggested by Woods cannot accurately describe all the knowledge associated with each word.  Researchers have tried to build universal methods for knowledge representation, but these methods represent knowledge only through highly abstract symbolic languages and are not suitable for storing information related to a specific word.  In order to build a structure that can store the concept associated with each word, we propose a word concept model that represents knowledge in a “speech-friendly” way.  Our goal is to establish a universal knowledge representation method so that a system can use the knowledge to extract information from a sentence, and can learn new words and knowledge simply by filling newly acquired information into existing structures.  The motivation is that we want to build a system that can grow automatically through interaction with its environment and users.  In Section 2, we propose an Extended Concept Structure that can be used to store the information related to each word.  In Section 3, we describe how knowledge about words can be stored in the proposed concept structure.  Discussion is given in Section 4.

2   Extended Concept Structure (ECS)

Currently, a concept can be formally defined as a specific region in an attribute space.  Each dimension of an attribute space is a specific attribute that can take continuous or discrete, finite or infinite values.  This formal concept description requires an attribute space first, and the selection of that attribute space is determined by the specific application.  However, for a speech-capable intelligent agent, the attribute space should be universal if we want the agent to have the potential to grow.  Therefore, no specific domain should be defined, though we may start from certain applications.  Given the tremendous variety of everyday life, it is impossible to select an attribute space on which all possible words can be defined.  Furthermore, new types of words may not be incorporated into this attribute space properly.  On the other hand, concepts in an intelligent agent should be layered, and concepts in different layers may have different attributes.  For a fixed-task intelligent agent, we can manually set the attributes for different layers.  But for a learning agent, such a structure will fundamentally limit the agent’s abilities.
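
For reference, the conventional formal description can be sketched as follows.  The attribute names and numeric ranges are made up for illustration only; the point is that every concept must be expressed over the same fixed tuple of dimensions, which is the limitation discussed above.

```python
# Hypothetical fixed attribute space: every concept must be expressed
# in terms of exactly these dimensions.
ATTRIBUTE_SPACE = ("weight_kg", "has_pages")

# A concept is a region: one allowed interval or value set per dimension.
BOOK_REGION = {
    "weight_kg": (0.05, 5.0),   # continuous interval, illustrative values
    "has_pages": {True},        # discrete set of allowed values
}


def in_region(instance, region):
    """Return True if the instance falls inside the concept's region."""
    for attribute in ATTRIBUTE_SPACE:
        allowed = region[attribute]
        value = instance[attribute]
        if isinstance(allowed, tuple):
            low, high = allowed
            if not (low <= value <= high):
                return False
        elif value not in allowed:
            return False
    return True
```

A new word whose meaning depends on a dimension outside ATTRIBUTE_SPACE cannot be represented without redesigning the space itself.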

 

In order to overcome the fixed-attributes problem, we propose a new type of concept description: the extended concept structure (ECS).  The key features of ECS are: an attribute can be another concept; an attribute and the concept it belongs to are linked by “relations”; and there is no fixed attribute list, so concepts are described by attributes on demand.
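
A minimal sketch of these three features is given below.  The class Concept, its methods and the relation labels are assumptions made for illustration; the paper does not prescribe a concrete data layout.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical ECS node: a name plus links that are added on demand."""
    name: str
    # Each link is (relation, attribute_concept); there is no fixed list.
    links: list = field(default_factory=list)

    def relate(self, relation, other):
        """Attach another concept as an attribute through a named relation."""
        self.links.append((relation, other))

    def attributes(self, relation):
        """Names of the concepts linked to this one by `relation`."""
        return [c.name for r, c in self.links if r == relation]


paper = Concept("paper")
text = Concept("text")
book = Concept("book")
book.relate("made of", paper)   # an attribute is itself a concept
book.relate("has", text)
```

Here “book” starts with no attributes at all and acquires links only when the agent learns that a connection exists.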

 

The key idea here is not to limit a concept to only a few possible attributes.  We may view all possible concepts as potential attributes of a specific concept.  The problem is that the concepts of everyday life form an almost infinitely complex network.  Evidently, concepts are layered.  An explanation of “book” requires knowledge of “paper”, “text”, etc.  Ultimately, all concepts are explained by a primitive concept set that is directly explained by input signal patterns from the sensory motors.  Therefore, we can view the perceptual space as the fixed part of the attribute space of our concept model.  Primitive concepts built directly on this fixed attribute space can be used as attributes of other concepts.  For example, the concept of “color” can be built directly on the perceptual space, while the concept of “Sun” requires “color” as one of its attributes.  It is not necessary to list all the possible attributes the concept of “Sun” might have, because all concepts are possible attributes.  We only list an attribute when the agent knows there is a connection between the concept and that attribute.

 

In order to correctly reflect the relation between a concept and an attribute or another concept, a set of basic relations should be defined.  “Sun” “has” the “color” of “red”, while “door” is “part of” “room”.  Here, “has” and “part of” are different relations.  We must distinguish such relations so that we know “color” applies to the whole of “Sun”, while “door” does not appear everywhere in a “room”.  Fortunately, the number of basic relations is limited.  We can fix the basic relation set without losing too much of the intelligent agent’s ability to understand.  Ways to learn basic relations automatically may also be explored, but we shall not pursue that at this stage.
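
One way to picture a fixed relation set is as an enumeration whose members carry the distinction drawn above.  The enumeration Relation, the table APPLIES_TO_WHOLE and the function covers_whole are hypothetical, and only the two relations named in the text are included.

```python
from enum import Enum


class Relation(Enum):
    """Hypothetical fixed set of basic relations."""
    HAS = "has"
    PART_OF = "part of"


# Assumed encoding of the distinction in the text: does the link hold
# over the whole of the larger concept, or only somewhere within it?
APPLIES_TO_WHOLE = {Relation.HAS: True, Relation.PART_OF: False}

LINKS = [
    ("Sun", Relation.HAS, "color"),
    ("door", Relation.PART_OF, "room"),
]


def covers_whole(subject, obj):
    """Look up the link between two concepts and report its extent."""
    for s, relation, o in LINKS:
        if s == subject and o == obj:
            return APPLIES_TO_WHOLE[relation]
    return None  # the agent knows of no link between the two
```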

 

By extending the ideas of concept and attribute, ECS views all concepts as possible attributes of a specific concept.  Layered concepts are therefore linked by the basic relations, and the base attributes come from the perceptual space.  In this way, ECS is able to store knowledge universally, in a way similar to that used by human minds.

 

Obviously, many problems arise when we want to use ECS in an intelligent agent.  Problems from the formal concept description, such as learning new concepts from examples or deriving new conclusions from a knowledge base, may need to be reinvestigated here.  For now, we focus on building the word concept model using ECS and will come back to these problems when necessary.

3   Word Concept Model

In the human mind, complex concepts are explained by simple concepts.  We need to understand the concepts of “door” and “window” before we understand the concept of “room”.  The concept of “window” is further explained by “frame”, “glass” and so on.  Ultimately, all concepts are explained based on the input signal patterns perceived by the human sensory motors.  This layered abstraction is the key to fast processing of information.  When reasoning is being done, only related information is pulled out, and lower-layer information may not appear in the reasoning process at all.  Our word concept model should use the same layered concept abstraction: complex concepts should be built from simple concepts.
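
The idea that reasoning pulls out only related information can be sketched as a depth-limited lookup over the layered definitions.  The table EXPLAINED_BY and the function related are illustrative assumptions built from the “room” example above.

```python
# Hypothetical layered definitions: each concept lists the simpler
# concepts that explain it.
EXPLAINED_BY = {
    "room": ["door", "window"],
    "window": ["frame", "glass"],
    "door": [],
    "frame": [],
    "glass": [],
}


def related(concept, depth):
    """Collect explaining concepts down to `depth` layers below `concept`."""
    if depth == 0:
        return set()
    found = set()
    for child in EXPLAINED_BY.get(concept, []):
        found.add(child)
        found |= related(child, depth - 1)
    return found
```

With a depth of one, reasoning about “room” touches only “door” and “window”; “frame” and “glass” stay out of the process unless a deeper explanation is requested.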

 

As we start to map words into this concept structure, it is necessary to treat different types of words differently.  Since the most basic type of concept in the world is physical matter, we should first map nouns and pronouns into the concept space.  We call this type of word a solid word.  For solid words, we can find the mapping onto the concept space quite easily, since they refer to specific types of things in the physical world.  There may be multiple definitions for one word, because one word can have many meanings.  The agent can decide which definition to use by taking context information into account.  Solid words are the basic building blocks for word concept modeling.  Each solid word can be stored in an extended concept structure (ECS).
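
Choosing among several definitions of a solid word might look like the sketch below.  The word “bank”, its two senses, the cue sets and the overlap-counting rule are all assumptions chosen for illustration; the paper does not specify how context is combined.

```python
# Hypothetical lexicon: each solid word maps to one or more senses,
# and each sense lists concepts that tend to appear with it.
LEXICON = {
    "bank": [
        {"sense": "financial institution", "cues": {"money", "account"}},
        {"sense": "river edge", "cues": {"river", "water"}},
    ],
}


def choose_sense(word, context_words):
    """Pick the sense whose cues overlap most with the context words."""
    context = set(context_words)
    senses = LEXICON[word]
    best = max(senses, key=lambda s: len(s["cues"] & context))
    return best["sense"]
```

When no cue matches, this sketch falls back to the first listed sense, which is only one of many possible tie-breaking choices.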

 

After we have defined the solid words, other types of words can be defined around solid words instead of directly in the concept space.  Their definitions are nevertheless closely coupled with the concept space.

 

For verbs, we can first define the positions where solid words can appear around the verb.  For example, a solid word can appear both before and after the verb “eat”.  For some verbs, more than one solid word can appear after the verb.  In all cases, the positions where solid words can appear are numbered.  The definition of the verb is then described by how the attributes of each solid word should change.  For example, if “Tom” is the solid word before “go” and “home” is the solid word after “go”, then the “position” attribute of “Tom” should be set to “home” or the location of “home”.  Of course, there are complex situations where this simple scheme does not fit, but more techniques can be developed using the same idea.
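
The “go” example can be written out as a small sketch.  The table layout (numbered slots plus a list of effects) and the function apply_verb are hypothetical; the initial position “school” is invented so that the change is visible.

```python
# Hypothetical world state: attributes of solid words.
world = {
    "Tom": {"position": "school"},
    "home": {"position": "home"},
}

# A verb definition: numbered slots for solid words, plus effects that say
# which attribute of which slot is set from which other slot.
VERBS = {
    "go": {
        "slots": {1: "before", 2: "after"},
        # (target slot, attribute to change, source slot)
        "effects": [(1, "position", 2)],
    },
}


def apply_verb(verb, fillers, state):
    """Apply a verb's attribute changes; `fillers` maps slot number -> word."""
    for target, attribute, source in VERBS[verb]["effects"]:
        # Use the location of the source word, as in the "go home" example.
        state[fillers[target]][attribute] = state[fillers[source]]["position"]


apply_verb("go", {1: "Tom", 2: "home"}, world)
```

After the call, the “position” attribute of “Tom” holds the location of “home”, which is the whole content of the verb's definition in this scheme.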

 

For adjectives, definitions are described as values of attributes.  “Big” may be defined as “more than 2000 square feet” for the “size” attribute if it is used for houses.  Obviously, multiple definitions are possible and should be resolved by context.  Also, since there is a link between a concept and an attribute, we can put an extended field in the link of ECS to limit the relation locally.  There are many options for solving this ambiguity problem, and we can come back and add them when necessary.  Adverbs can be defined in a similar fashion.  Though adverbs modify adjectives and verbs, their effects are ultimately reflected in the attributes of solid words.  For example, “quickly” may mean that the position of some object changes by a large amount in a short time.  If we define basic relations to reflect changes in time, then “quickly” can be modeled.
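
A context-dependent adjective definition can be sketched by keying each definition on both the adjective and the concept it modifies.  The table ADJECTIVES, the attribute name size_sq_ft and the function satisfies are assumptions; the 2000 square feet threshold is the example from the text.

```python
# Hypothetical adjective definitions, keyed by (adjective, concept).
# Each entry constrains one attribute of the solid word it modifies.
ADJECTIVES = {
    ("big", "house"): ("size_sq_ft", lambda value: value > 2000),
}


def satisfies(adjective, concept, instance):
    """Check whether an instance of `concept` fits the adjective."""
    attribute, test = ADJECTIVES[(adjective, concept)]
    return test(instance[attribute])
```

A second entry such as ("big", "dog") could then carry a different attribute and threshold without affecting the definition used for houses.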

 

Prepositions should be defined using the basic set of relations.  These include “in”, “on”, “in front of”, “at left of”, “include”, “belong to”, etc.  Each preposition is then defined in terms of these basic relations.
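
As a sketch, each preposition can be mapped onto one of the basic relations listed above.  The particular prepositions chosen and the function interpret are illustrative assumptions.

```python
# The basic relations named in the text (the original list ends with "etc.").
BASIC_RELATIONS = {"in", "on", "in front of", "at left of", "include", "belong to"}

# Hypothetical mapping from prepositions to basic relations.
PREPOSITIONS = {
    "inside": "in",
    "within": "in",
    "upon": "on",
    "before": "in front of",
}


def interpret(subject, preposition, landmark):
    """Turn a prepositional phrase into a (subject, relation, landmark) fact."""
    relation = PREPOSITIONS.get(preposition, preposition)
    if relation not in BASIC_RELATIONS:
        raise ValueError(f"no basic relation for {preposition!r}")
    return (subject, relation, landmark)
```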

 

There are other types of words, but most of them are grammar words and can be dealt with by a language parser.  Many miscellaneous tasks will also need to be solved as the system is built.

4   Discussion

An intelligent system can be built in two ways.  One is to build all parts of the system manually: we put all the necessary algorithms and knowledge into the system and fine-tune it when we see problems.  The other is to build a suitable structure and then let the system learn from the environment and grow on its own.  The first approach may fail when the task is so complex that the designers are unable to control so many details.  The second approach is attractive but may be difficult to start.  We hope that we can design a proper structure and implement some core knowledge in the system for learning, so that the system can grow from that point.  The word concept model is one small step towards building a universal learning structure for intelligent systems, as we believe language capability plays a key role in intelligence.  There are already excellent theories in the field of artificial intelligence.  The lack of speech capability in current intelligent agents may be the crucial reason why real applications of artificial intelligence have not been more fruitful.  We hope that giving intelligent agents natural language capability can help to provide better solutions.  Given the scope of the work involved, the authors were unable to cover all possible references at this time, and we ask the reader’s forgiveness for this.

 

Acknowledgement

This work was supported in part by Yamaha Motors Corporation and in part by National Science Foundation Grant CDA 96-24396.

References

1. Zue, V., et al., JUPITER: A Telephone-Based Conversational Interface for Weather Information, IEEE Transactions on Speech and Audio Processing, 8 (1), 85-96, 2000.

2. Baggia, P., G. Castagneri, and M. Danieli, Field Trials of the Italian ARISE train timetable system, in Proceedings of the 1998 IEEE 4th Workshop on Interactive Voice Technology for Telecommunications Applications, New York, NY 1998.

3. Souvignier, B., et al., The Thoughtful Elephant: Strategies for Spoken Dialog Systems, IEEE Transactions on Speech and Audio Processing, 8 (1), 51-62, 2000.

4. Schank, R. C., Conceptual Information Processing. North Holland, Amsterdam. 1975.

5. Woods, W. A., Procedural Semantics as a Theory of Meaning, in Elements of Discourse Understanding, ed. by A. K. Joshi, B. L. Webber and I. A. Sag, Cambridge University Press, 1981.