This article re-introduces multimodal interaction by separating ten myths (empirical knowledge) from reality.
(What are multimodal systems? – Multimodal systems process combined natural input modes – such as speech, pen, touch, hand gestures, eye gaze, and head and body movements – in a coordinated manner with multimedia system output.)
More Sentences & Key Points
Myth 1: If you build a multimodal system, users will interact multimodally.
Users like being able to interact multimodally, but they don’t always do so. Their natural communication patterns involve mixing unimodal and multimodal expressions, with the multimodal ones being predictable based on the type of action being performed.
Myth 2: Speech and pointing is the dominant multimodal integration pattern.
… any multimodal system designed exclusively to process speak-and-point will fail to provide users with much useful functionality.
Myth 3: Multimodal input involves simultaneous signals.
Although speech and gesture are highly interdependent and synchronized during multimodal interaction, synchrony does not imply simultaneity.
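To make the synchrony-versus-simultaneity point concrete, here is a minimal sketch of how a system might decide whether two signals belong to one multimodal command. Everything in it is an assumption for illustration: the `Signal` type, the `integration_pattern` function, and especially the `max_lag_s` value of 2 seconds, which does not come from the article.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    mode: str     # e.g. "speech" or "pen"
    start: float  # seconds
    end: float    # seconds


def integration_pattern(a: Signal, b: Signal, max_lag_s: float = 2.0) -> str:
    """Label a pair of signals from two different modes.

    max_lag_s is a hypothetical tolerance, not a value from the article.
    """
    if a.start < b.end and b.start < a.end:
        return "simultaneous"  # the two signals overlap in time
    # No overlap: measure the silence between the earlier end and the later start.
    gap = max(a.start, b.start) - min(a.end, b.end)
    if gap <= max_lag_s:
        return "sequential"    # synchronized, but not simultaneous
    return "separate"          # treat as two unrelated unimodal inputs


pen = Signal("pen", start=0.0, end=1.0)
speech = Signal("speech", start=1.5, end=3.0)
print(integration_pattern(pen, speech))  # sequential
```

A system that only accepted the "simultaneous" case would miss commands like the one above, where the pen stroke finishes before the user starts speaking.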
Myth 4: Speech is the primary input mode in any multimodal system that includes it.
Speech is neither the exclusive carrier of important content, nor does it have temporal precedence over other input modes.
Myth 5: Multimodal language does not differ linguistically from unimodal language.
… it recently has been demonstrated that multimodal pen/voice language is briefer, syntactically simpler, and less disfluent than users’ unimodal speech.
Myth 6: Multimodal integration involves redundancy of content between modes.
… actual data highlights the importance of complementarity as a major organizational theme during multimodal communication.
Myth 7: Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.
Due to mutual disambiguation, the parallel recognition and semantic interpretation that occurs in a multimodal architecture can yield a higher likelihood of correct interpretation than recognition based on either single input mode.
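A toy sketch of mutual disambiguation, as I understand it: each recognizer returns an n-best list, and only pairs that make sense together are kept. The function name `fuse_nbest`, the compatibility table, and all the probabilities are made up for illustration, and multiplying the scores assumes the two recognizers err independently.

```python
from itertools import product


def fuse_nbest(speech_nbest, gesture_nbest, compatible_pairs):
    """Return the best jointly plausible (speech, gesture, score), or None.

    speech_nbest / gesture_nbest: lists of (hypothesis, probability).
    compatible_pairs: hypothetical set of (speech, gesture) pairs that
    form a meaningful command together.
    """
    best = None
    for (s, p_s), (g, p_g) in product(speech_nbest, gesture_nbest):
        if (s, g) not in compatible_pairs:
            continue  # semantically incompatible: rule this pair out
        score = p_s * p_g  # assumes the recognizers err independently
        if best is None or score > best[2]:
            best = (s, g, score)
    return best


# Invented n-best lists: speech alone would pick "delete".
speech = [("delete", 0.45), ("select", 0.40), ("repeat", 0.15)]
gesture = [("circle", 0.60), ("cross-out", 0.40)]
compatible = {("select", "circle"), ("delete", "cross-out")}

print(fuse_nbest(speech, gesture, compatible)[:2])  # ('select', 'circle')
```

Here the speech recognizer's top guess loses because the gesture evidence favors a circle, which only pairs with "select". That is the sense in which two error-prone recognizers can be more reliable together than either is alone.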
Myth 8: All users’ multimodal commands are integrated in a uniform way.
(No.) … multimodal systems that can detect and adapt to a user’s dominant integration pattern could lead to considerably improved recognition rates.
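One way such adaptation could start is by tallying how a user integrated their first few commands. The sketch below is only an assumption about how that might look: the pattern labels, `min_samples`, and `min_share` are hypothetical and do not come from the article.

```python
from collections import Counter


def dominant_pattern(observed, min_samples=5, min_share=0.6):
    """Return the user's dominant integration pattern, or None if unclear.

    observed: pattern labels for the user's past multimodal commands,
    e.g. "simultaneous" or "sequential" (hypothetical labels).
    min_samples / min_share: illustrative thresholds, not from the article.
    """
    if len(observed) < min_samples:
        return None  # too little evidence to commit to a pattern
    label, count = Counter(observed).most_common(1)[0]
    return label if count / len(observed) >= min_share else None


history = ["sequential", "sequential", "simultaneous", "sequential", "sequential"]
print(dominant_pattern(history))  # sequential
```

Once a dominant pattern is known, a recognizer could, for example, wait a little longer for a second signal from a user who tends to integrate sequentially.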
Myth 9: Different input modes are capable of transmitting comparable content.
Different modes basically vary in the degree to which they are capable of transmitting similar information.
Myth 10: Enhanced efficiency is the main advantage of multimodal systems.
(No. Other benefits include) … task-critical errors and disfluent language can drop… Users’ strong and nearly universal preference… physical overexertion is avoided… substantial error avoidance and easier error recovery… accommodate a wide range of users, tasks, and environments.
- Do multimodal systems require multimodal input all the time? Or is multimodal input only useful at particular moments, while unimodal input constitutes the base case of interaction? Mixed-modal?
- When adding input techniques, the total is not guaranteed to equal the sum of the parts (it could be more, it could be less).
- How about multi-user-modal interaction – imagine someone telling you on the phone how to use an interface: can both you and the interface listen to your friend’s instructions?