Multimodal interaction
Multimodal interaction provides the user with multiple modes of interfacing with a system. A multimodal interface provides several distinct tools for input and output of data.
Multimodal input
Two major groups of multimodal interfaces have emerged, one concerned with alternate input methods and the other with combined input/output. The first group of interfaces combines various user input modes beyond the traditional keyboard and mouse input/output, such as speech, pen, touch, manual gestures, gaze, and head and body movements. The most common such interface combines a visual modality (e.g. a display, keyboard, and mouse) with a voice modality (speech recognition for input, speech synthesis and recorded audio for output). However, other modalities, such as pen-based input or haptic input/output, may be used. Multimodal user interfaces are a research area in human-computer interaction (HCI).
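A central problem for this first group is fusing the separate input streams into a single command, for example resolving the spoken words "delete that" against where the user is pointing. The sketch below is a minimal, hypothetical illustration of one simple late-fusion strategy; the data structures and the 1.5-second alignment window are assumptions made for the example, not part of any system named in this article.

```python
from dataclasses import dataclass

@dataclass
class PointingEvent:
    timestamp: float   # seconds since session start
    target: str        # object or location under the pointer

def fuse(utterance: str, speech_time: float,
         pointing: list[PointingEvent], window: float = 1.5) -> str:
    """Resolve deictic words ("this", "that", "there", "here") in a spoken
    command against the pointing event closest in time to the utterance."""
    deictics = {"this", "that", "there", "here"}
    words = utterance.split()
    if not any(w.lower().strip(".,!?") in deictics for w in words):
        return utterance  # nothing to resolve
    # Consider only pointing events close in time to the spoken command.
    candidates = [p for p in pointing
                  if abs(p.timestamp - speech_time) <= window]
    if not candidates:
        return utterance  # unresolved; a real system would ask the user
    nearest = min(candidates, key=lambda p: abs(p.timestamp - speech_time))
    return " ".join(nearest.target if w.lower().strip(".,!?") in deictics else w
                    for w in words)

# Speech "delete that" at t = 10.2 s; the user pointed at "report.pdf" at t = 10.0 s.
events = [PointingEvent(timestamp=10.0, target="report.pdf")]
print(fuse("delete that", 10.2, events))  # -> "delete report.pdf"
```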
The advantage of multiple input modalities is increased usability: the weaknesses of one modality are offset by the strengths of another. On a mobile device with a small visual interface and keypad, a word may be quite difficult to type but very easy to say (e.g. Poughkeepsie). Consider how you would access and search through digital media catalogs from these same devices or set-top boxes. In one real-world example, patient information in an operating-room environment is accessed verbally by members of the surgical team to maintain an antiseptic environment, and presented in near real time aurally and visually to maximize comprehension.
Multimodal input user interfaces have implications for accessibility. A well-designed multimodal application can be used by people with a wide variety of impairments. Visually impaired users rely on the voice modality with some keypad input. Hearing-impaired users rely on the visual modality with some speech input. Other users will be "situationally impaired" (e.g. wearing gloves in a very noisy environment, driving, or needing to enter a credit card number in a public place) and will simply use the appropriate modalities as desired. On the other hand, a multimodal application that requires users to be able to operate all modalities is very poorly designed.
The most common form of input multimodality in the market makes use of the XHTML+Voice (aka X+V) Web markup language, an open specification developed by IBM, Motorola, and Opera Software. X+V is currently under consideration by the W3C and combines several W3C Recommendations, including XHTML for visual markup, VoiceXML for voice markup, and XML Events, a standard for integrating XML languages. Multimodal browsers supporting X+V include IBM WebSphere Everyplace Multimodal Environment, Opera for Embedded Linux and Windows, and ACCESS Systems' NetFront for Windows Mobile. To develop multimodal applications, software developers may use a software development kit, such as the IBM WebSphere Multimodal Toolkit, based on the open-source Eclipse framework, which includes an X+V debugger, editor, and simulator.
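As a concrete illustration, an X+V page embeds VoiceXML dialog fragments in the head of an XHTML document and wires them to visual elements through XML Events attributes. The snippet below is a minimal sketch in the style of published X+V samples; the namespaces and attribute names follow the X+V profile, while the field and form names (and the omitted grammar) are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>X+V sketch: speak or type a city name</title>
    <!-- VoiceXML form: prompts the user and fills the visual field. -->
    <vxml:form id="askCity">
      <vxml:field name="city">
        <vxml:prompt>Which city?</vxml:prompt>
        <vxml:grammar><!-- a speech grammar would go here --></vxml:grammar>
        <vxml:filled>
          <!-- Copy the recognized value into the XHTML text field. -->
          <vxml:assign name="document.getElementById('cityField').value"
                       expr="city"/>
        </vxml:filled>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <p>City:
      <!-- XML Events: focusing the field activates the voice dialog. -->
      <input type="text" id="cityField"
             ev:event="focus" ev:handler="#askCity"/>
    </p>
  </body>
</html>
```

Running such a page requires one of the X+V-capable browsers listed above; a purely visual browser would typically ignore the voice markup and present an ordinary form.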
Multimodal input and output
The second group of multimodal systems presents users with multimedia displays and multimodal output, primarily in the form of visual and auditory cues. Interface designers have also started to make use of other modalities, such as touch and olfaction. Proposed benefits of multimodal output systems include synergy and redundancy. Information presented via several modalities is merged and refers to various aspects of the same process, while using several modalities to present exactly the same information increases the bandwidth of information transfer. Currently, multimodal output is used mainly to improve the mapping between communication medium and content and to support attention management in data-rich environments where operators face considerable visual attention demands.
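The redundancy idea can be made concrete with a small sketch: the same message is dispatched to every available output channel, so that an alert still gets through when, say, the user is looking away. The channel functions below are hypothetical stand-ins for real display, text-to-speech, and haptic drivers.

```python
from typing import Callable

# Hypothetical renderers, one per output modality (print stands in for
# real display, text-to-speech, and haptic back ends).
def visual(message: str) -> None:
    print(f"[screen]  {message}")

def auditory(message: str) -> None:
    print(f"[speaker] {message}")

def tactile(message: str) -> None:
    print(f"[haptics] pulse pattern for: {message}")

def present_redundantly(message: str,
                        channels: list[Callable[[str], None]]) -> None:
    """Send the same message through every available modality, so each
    channel's weaknesses are covered by another's strengths."""
    for render in channels:
        render(message)

present_redundantly("Blind-spot vehicle detected", [visual, auditory, tactile])
```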
An important step in multimodal interface design is the creation of natural mappings between modalities and the information and tasks. The auditory channel differs from vision in several respects: it is omnidirectional, transient, and always open. Speech output, one form of auditory information, has received considerable attention, and several guidelines have been developed for its use. Michaelis and Wiggins (1982) suggested that speech output should be used for simple, short messages that will not be referred to later, and that such messages should deal with events in time and call for an immediate response.
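Encoded as a rule, these guidelines amount to a simple modality-selection test. The sketch below is one hypothetical reading of them; the ten-word threshold is an assumption made for illustration, not part of the original guidelines.

```python
def choose_output_modality(message: str, time_critical: bool,
                           referred_to_later: bool) -> str:
    """Apply the speech-output guidelines described above: speech suits
    simple, short, time-critical messages that will not be consulted
    again; everything else defaults to visual presentation."""
    short_and_simple = len(message.split()) <= 10  # illustrative threshold
    if short_and_simple and time_critical and not referred_to_later:
        return "speech"
    return "visual"

print(choose_output_modality("Low battery", True, False))    # -> speech
print(choose_output_modality("Quarterly sales figures by region",
                             False, True))                   # -> visual
```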
The sense of touch was first utilized as a medium for communication in the late 1950s. It is not only a promising but also a unique communication channel. In contrast to vision and hearing, the two traditional senses employed in HCI, the sense of touch is proximal: it senses objects that are in contact with the body. It is also bidirectional, in that it supports both perception and acting on the environment.
Examples of auditory feedback include auditory icons in computer operating systems indicating users' actions (e.g. deleting a file, opening a folder, an error), speech output presenting navigational guidance in vehicles, and speech output warning pilots in modern airplane cockpits. Examples of tactile signals include vibration of the turn-signal lever to warn drivers of a car in their blind spot, vibration of the auto seat as a warning to drivers, and the stick shaker on modern aircraft alerting pilots to an impending stall.
Invisible interface spaces became available with sensor technology; infrared, ultrasound, and cameras are all now commonly used. Transparency of interfacing with content is enhanced when an immediate and direct link via meaningful mapping is in place: the user then has direct and immediate feedback on input, and the content's response becomes an interface affordance (Gibson 1979).
See also
- Modality (human–computer interaction)
- W3C's Multimodal Interaction Activity – an initiative from the W3C aiming to provide means (mostly XML) to support multimodal interaction scenarios on the Web
- NCCR IM2 (Interactive Multimodal Information Management) – Swiss project on multimodal interaction
- Device independence
- Speech recognition
- Web accessibility
- Wired glove
- XHTML+Voice
External links
- W3C Multimodal Interaction Activity
- XHTML+Voice Profile 1.0, W3C Note 21 December 2001
- Hoste, Lode, Dumas, Bruno and Signer, Beat: Mudra: A Unified Multimodal Interaction Framework, In Proceedings of the 13th International Conference on Multimodal Interaction (ICMI 2011), Alicante, Spain, November 2011.