Dialog system
A dialog system or conversational agent (CA) is a computer system intended to converse with a human, with a coherent structure. Dialog systems have employed text, speech, graphics, haptics, gestures and other modes for communication on both the input and output channels.
What does and does not constitute a dialog system may be debatable. The typical GUI wizard does engage in some sort of dialog, but it includes very few of the common dialog system components, and dialog state is trivial.
Components
There are many different architectures for dialog systems. Which components a dialog system includes, and how those components divide up responsibilities, differ from system to system. Principal to any dialog system is the dialog manager, a component that manages the state of the dialog and the dialog strategy. A typical activity cycle in a dialog system contains the following phases:
- The user speaks, and the input is converted to plain text by the system's input recognizer/decoder, which may include:
  - automatic speech recognizer (ASR)
  - gesture recognizer
  - handwriting recognizer
- The text is analyzed by a natural language understanding (NLU) unit, which may include:
  - proper name identification
  - part-of-speech tagging
  - syntactic/semantic parser
- The semantic information is analyzed by the dialog manager (see the Dialog manager section below), along with a task manager that has knowledge of the specific task domain.
- The dialog manager produces output using an output generator, which may include:
  - natural language generator
  - gesture generator
  - layout engine
- Finally, the output is rendered using an output renderer, which may include:
  - text-to-speech engine (TTS)
  - talking head
  - robot or avatar
Dialog systems that are based on a text-only interface (e.g. text-based chat) contain only stages 2-4: language understanding, dialog management, and output generation.
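As an illustration of the text-only case, the following is a minimal sketch of one turn passing through stages 2-4. The function names, intents, and reply templates are all hypothetical and chosen only for this example; a real NLU unit would be far more capable than the keyword matching assumed here.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DialogState:
    """Minimal dialog state: just the history of (user, system) turns."""
    history: list = field(default_factory=list)


def understand(text: str) -> dict:
    """Stage 2 (NLU): map raw text to a crude semantic frame (assumed intents)."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    if {"hello", "hi"} & tokens:
        return {"intent": "greet"}
    if "weather" in tokens:
        return {"intent": "ask_weather"}
    return {"intent": "unknown"}


def manage(frame: dict, state: DialogState) -> str:
    """Stage 3 (dialog manager): choose a system action from the frame."""
    actions = {"greet": "say_hello", "ask_weather": "report_weather"}
    return actions.get(frame["intent"], "ask_rephrase")


def generate(action: str) -> str:
    """Stage 4 (output generator): turn the chosen action into text."""
    templates = {
        "say_hello": "Hello! How can I help?",
        "report_weather": "I do not have live weather data in this sketch.",
        "ask_rephrase": "Sorry, could you rephrase that?",
    }
    return templates[action]


def respond(text: str, state: DialogState) -> str:
    """Run stages 2-4 for one user turn and record the turn in the history."""
    reply = generate(manage(understand(text), state))
    state.history.append((text, reply))
    return reply
```

A spoken system would wrap `respond` with a recognizer before it and a renderer after it; here the text goes in and comes out directly.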
Dialog manager
The dialog manager is the core component of the dialog system. It maintains the history of the dialog, adopts a dialog strategy (see below), retrieves content (stored in files or databases), and decides on the best response to the user. The dialog manager also maintains the dialog flow.
The design of the dialog manager has evolved over time. Designs include:
- finite-state machine
- frame-based: The system has several slots to be filled. The slots can be filled in any order. This supports a mixed-initiative dialog strategy.
- information-state based
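The frame-based design can be sketched as follows. This is only an illustration under stated assumptions: the travel-booking slots, the prompts, and the regular-expression extractors are hypothetical, and the extractors are deliberately crude (for instance, "to fly" would be misread as a destination).

```python
import re

# Hypothetical slots for a travel-booking frame. Each slot has a crude
# extractor and the question the system asks while the slot is still empty.
SLOTS = {
    "origin": (re.compile(r"\bfrom (\w+)"), "Where are you leaving from?"),
    "destination": (re.compile(r"\bto (\w+)"), "Where are you going?"),
    "day": (re.compile(r"\bon (\w+)"), "Which day do you want to travel?"),
}


def fill_slots(frame: dict, utterance: str) -> dict:
    """Fill every slot the utterance mentions, in whatever order it appears."""
    updated = dict(frame)
    lowered = utterance.lower()
    for name, (pattern, _prompt) in SLOTS.items():
        match = pattern.search(lowered)
        if match:
            updated[name] = match.group(1)
    return updated


def next_prompt(frame: dict):
    """Return the question for the first empty slot, or None when complete."""
    for name, (_pattern, prompt) in SLOTS.items():
        if name not in frame:
            return prompt
    return None
```

Because `fill_slots` accepts any slot at any turn, the user can volunteer the destination and day before the system has asked for them, and `next_prompt` then asks only for what is still missing.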
The dialog flow can follow one of the following strategies:
- System-initiative dialog: The system is in control and guides the dialog at each step.
- Mixed-initiative dialog: Users can barge in and change the dialog direction. The system follows the user request, but tries to direct the user back to the original course. This is the most commonly used dialog strategy in today's dialog systems.
- User-initiative dialog: The user takes the lead, and the system responds to whatever the user directs.
- Learned strategy: The system's next dialog action is chosen based on an optimisation method such as reinforcement learning.
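To make the system-initiative strategy concrete, here is a minimal sketch that drives it with a small state table. The pizza-order states and prompts are hypothetical; the point is only that the system, not the user, decides what is asked next.

```python
# Hypothetical states for a pizza-order dialog. Each state maps to the prompt
# the system asks and the state that follows once the user has answered.
STATES = {
    "ask_size": ("What size would you like?", "ask_topping"),
    "ask_topping": ("Which topping would you like?", "confirm"),
    "confirm": ("Shall I place the order?", "done"),
}


def run_system_initiative(answers):
    """Walk the states in their fixed order, pairing each prompt with an answer."""
    state = "ask_size"
    transcript = []
    remaining = iter(answers)
    while state != "done":
        prompt, next_state = STATES[state]
        # The user's answer never changes the path: the system keeps control.
        transcript.append((prompt, next(remaining)))
        state = next_state
    return transcript
```

In this sketch, supplying fewer answers than there are states raises `StopIteration`. A mixed-initiative variant would instead inspect each answer and allow it to redirect `state`.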
The dialog manager can be connected to an expert system, giving it the ability to respond with specific expertise.
Types of systems
Dialog systems fall into the following categories, which are listed here along a few dimensions. Many of the categories overlap and the distinctions may not be well established.
- by modality
  - text-based
  - spoken dialog system
  - graphical user interface
  - multi-modal
- by device
  - telephone-based systems
  - PDA systems
  - in-car systems
  - robot systems
  - desktop/laptop systems
    - native
    - in-browser systems
    - in-virtual machine
  - in-virtual environment
  - robots
- by style
  - command-based
  - menu-driven
  - natural language
  - speech graffiti
- by initiative
  - system initiative
  - user initiative
  - mixed initiative
Applications
Dialog systems can support a broad range of applications in business enterprises, education, government, healthcare, and entertainment. For example:
- Responding to customers' questions about products and services via a company's website or intranet portal
- Customer service agent knowledge base: Allows agents to type in a customer's question and guides them with a response
- Guided selling: Facilitating transactions by providing answers and guidance in the sales process, particularly for complex products being sold to novice customers
- Help desk: Responding to internal employee questions, e.g., responding to HR questions
- Website navigation: Guiding customers to relevant portions of complex websites (a website concierge)
- Technical support: Responding to technical problems, such as diagnosing a problem with a product or device
- Personalized service: Conversational agents can leverage internal and external databases to personalize interactions, such as answering questions about account balances, providing portfolio information, or delivering frequent flier or membership information
- Training or education: They can provide problem-solving advice while the user learns
- Simple dialog systems are widely used to decrease human workload in call centres. In this and other industrial telephony applications, the functionality provided by dialog systems is known as interactive voice response, or IVR.
In some cases, conversational agents can interact with users using artificial characters. These agents are then referred to as embodied agents.
Toolkits and architectures
A survey of current frameworks, languages and technologies for defining dialog systems.

Name & Links | System Type | Description | Affiliation[s] | Environment[s] | Comments |
---|---|---|---|---|---|
AIML | Chatterbot language | XML dialect for creating natural language software agents | Richard Wallace | | |
CSLU Toolkit | | a state-based speech interface prototyping environment | OGI School of Science and Engineering; M. McTear; Ron Cole | | publications are from 1999. |
VXML (Voice XML) | Spoken dialog | multimodal dialog markup language | developed initially by AT&T, then administered by an industry consortium and finally a W3C specification | Example | primarily for telephony. |
SALT (Speech Application Language Tags) | markup language | multimodal dialog markup language | Microsoft | | "has not reached the level of maturity of VoiceXML in the standards process". |
Quack.com - QXML | Development Environment | | company bought by AOL | | |