Pen Computing Magazine: Multimodal Data Input


Natural Input Solutions
A very promising approach to multimodal digital editing and correction

(Original article November 2001; updated and expanded in February of 2009)

The story goes that the current QWERTY keyboard layout was designed some 150 years ago specifically to slow down typists enough to keep the mechanical keys of 19th century typewriters from jamming. Whether this is true or not, the fact is that the keyboard represents the bottleneck between the human mind and the computer. With computers gaining more and more processing power and becoming ubiquitous in our society, the keyboard bottleneck becomes ever less acceptable. You could argue that this ancient method of communicating with machines is one of the greatest detriments to productivity we are facing today.

As a result, there have been many efforts to develop alternate ways of communicating with computers. The mouse was invented to provide an easier way to navigate a new generation of graphical user interfaces. Handwriting recognition was pursued to provide a more natural way to enter data for people not familiar with computers and keyboards. Speech recognition was seen as a way to simply speak to a computer instead of typing or writing.

Well, the keyboard is still here. And that is because making handwriting and speech recognition work turned out to be more complex than anticipated. However, both handwriting and speech recognition technologies have come of age and are now ready to be used in place of the keyboard.

The primary problem is that people working on these new technologies have spent most of their time getting the basics to work. With handwriting recognition, that means a set of algorithms that yields reasonably accurate recognition under a variety of circumstances. The same applies to speech recognition. The latest speech and handwriting recognition products running on the latest hardware will yield good results for a person willing to learn how to use the product and play by its rules.

Where the picture breaks down is in how these new technologies are used and applied. Recognizing handwriting alone is not enough. The software must also work in the environments we generally use, and it must provide easy and natural ways to edit and correct our work. The same applies to speech recognition. Reasonable accuracy in interpreting commands or even dictation is of little value when corrections and editing still require a mouse and a keyboard.

In addition, these alternate input/editing technologies have generally been treated as separate projects. Handwriting recognition tries to do it all: recognition, editing, and correction. Voice recognition likewise tries to do it all with voice. Thus, in addition to lacking truly functional editing tools, each technology also seeks to be the be-all and end-all replacement for the keyboard.

As far as I am concerned, each of these alternative input technologies has some advantages and some limitations. Dictating to a computer can result in faster data entry than typing or writing. But using verbal commands to edit, tap, point, and drag is not optimal. In contrast, the pen represents a clearly superior mobile pointing solution for performing editing commands by drawing gestures or symbols. What we have here are two technologies with complementary strengths and weaknesses that have the potential to be combined to form a revolutionary way of interacting with computers.

One such approach I have seen comes from Natural Input Solutions Inc. of Canada. They recognized this dilemma and developed the prototype of a hybrid solution. After analyzing each aspect of existing handwriting and speech recognition technologies, they isolated the respective advantages and disadvantages of each and created their Natural Input Software project, a solution that combines the best of pen and speech recognition technologies into a system that is more than the sum of its parts.

It is not easy to describe Natural Input in a sentence or two, or even in a full review. What Natural Input does is merge the respective strengths of two alternate input technologies and augment them with a number of pen-based editing and correction features that are more intuitive than anything I have seen to date.

The common "active modal shift" approach

Most pen-based editing and correction interfaces on the market today use what is called an active modal shift process to switch temporarily from the default text mode to either the gesture-only mode or the punctuation-only mode. An active modal shift process requires additional steps beyond the absolute minimum needed to perform a given editing function. For instance, inserting one or more punctuation symbols requires at minimum two steps: clicking on the document window to position the text insertion point, and then drawing the desired punctuation symbols. A commonly used active modal shift process instead involves the following steps to insert one or more punctuation symbols:

Click on the document window to position the text insertion point.
Draw a mode shift gesture to temporarily switch the context from the default text mode to the punctuation-only mode.
Draw one or more punctuation symbols.

After a brief timeout, the ink is interpreted as punctuation symbols, which are inserted at the location of the text insertion point. To apply an editing function to a block of text, the user performs a similar sequence of steps:

First, select a block of text.
Second, draw a mode shift gesture to temporarily switch the context from the default text mode to the gesture-only mode.
Finally, draw the gesture associated with the function you want to perform on the selected block of text.

The additional time it takes to perform the extra step of drawing the mode shift gesture involved in the active modal shift process results in a decrease in the overall efficiency of the editing process. Furthermore, being forced to draw an arbitrary mode shift gesture, before you are allowed to draw one or more punctuation symbols or the desired gesture, is simply not as intuitive as being able to directly draw the punctuation symbol or gesture in question.
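The active modal shift sequence can be sketched as a toy state machine. This is only an illustration of the step count described above, not code from any shipping product; the class, method, and mode names are all assumptions made for the example.

```python
from enum import Enum


class Mode(Enum):
    TEXT = "text"
    PUNCTUATION = "punctuation"
    GESTURE = "gesture"


class ActiveModalEditor:
    """Toy model: ink is read as text unless a mode shift gesture came first."""

    def __init__(self):
        self.mode = Mode.TEXT
        self.steps = 0  # number of user actions performed so far

    def position_caret(self):
        self.steps += 1

    def mode_shift(self, target):
        # The extra step: an explicit gesture whose only job is to change mode.
        self.steps += 1
        self.mode = target

    def draw(self, ink):
        self.steps += 1
        interpreted_as = self.mode
        self.mode = Mode.TEXT  # the shift is temporary, so fall back to text
        return interpreted_as, ink


editor = ActiveModalEditor()
editor.position_caret()
editor.mode_shift(Mode.PUNCTUATION)
mode, ink = editor.draw(",")
print(editor.steps, mode.value)  # 3 punctuation
```

Three user actions are needed where two would do; skipping `mode_shift` in this model leaves the comma to be interpreted in the default text mode.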

Natural Input's "passive modal shift" approach

In contrast, Natural Input Software uses what is called a passive modal shift process to switch between the default text mode and the gesture-only mode or the punctuation-only mode. A passive modal shift process involves only the absolute minimum number of steps required to perform a given editing function. For example, when a user wants to insert one or more punctuation symbols using Natural Input Software, all they have to do is click on the document window to position the text insertion point, then pen down on or very near the text insertion point and draw the punctuation symbols they want to insert. After a brief timeout, the punctuation symbols drawn by the user are interpreted with 100% accuracy, because they are interpreted in a context limited to a set of modified and regularly drawn punctuation symbols, each with its own unique shape and orientation. Natural Input Software's passive modal shift process thus achieves the desired 100% accuracy without sacrificing the overall efficiency of the editing process, as happens with the active modal shift process.
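One way to picture the passive shift is that the starting point of the stroke, rather than a separate gesture, selects the interpretation context. The sketch below is a guess at that idea only; the function name, the pixel threshold, and the use of a simple distance test are assumptions, not details taken from Natural Input Software.

```python
import math

# Hypothetical threshold for "on or very near" the insertion point, in pixels.
NEAR_CARET_RADIUS = 12.0


def mode_for_pen_down(pen_xy, caret_xy, radius=NEAR_CARET_RADIUS):
    """Pick the interpretation context from where the stroke starts."""
    distance = math.dist(pen_xy, caret_xy)
    return "punctuation" if distance <= radius else "text"


caret = (100, 200)
print(mode_for_pen_down((103, 204), caret))  # punctuation (distance is 5)
print(mode_for_pen_down((300, 40), caret))   # text
```

In this model the user performs two actions, positioning the caret and drawing, and the mode follows from the second action without any extra step.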

The Natural Input Software gesture window process is also a passive modal shift process: the pen-up step that ends the selection of a block of text is used to open a gesture window. The user can then perform a variety of editing functions by drawing one of the thirteen available single-stroke gestures in the gesture window. A gesture drawn in the gesture window is immediately interpreted with 100% accuracy, because the context of the gesture window is limited to a set of single-stroke gestures, each with its own unique shape and orientation.
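The gesture window behavior can be sketched in the same spirit. The article does not list the thirteen gestures, so the three table entries below are placeholders, and the rule that the window closes after one gesture is an assumption made for this example.

```python
# Hypothetical gesture table keyed by (shape, orientation). The real product
# is said to have thirteen single-stroke gestures; these three are placeholders.
GESTURE_TABLE = {
    ("line", "left"): "delete",
    ("line", "up"): "capitalize",
    ("caret", "up"): "insert",
}


class SelectionSession:
    """Toy model of a gesture window opened by the pen-up that ends a selection."""

    def __init__(self):
        self.selection = None
        self.gesture_window_open = False

    def pen_up_after_selection(self, selected_text):
        # Ending the selection is the only trigger; there is no mode shift step.
        self.selection = selected_text
        self.gesture_window_open = bool(selected_text)

    def stroke(self, shape, orientation):
        if not self.gesture_window_open:
            return None  # outside the window, ink is not treated as a gesture
        self.gesture_window_open = False  # assumption: one gesture per window
        # Only gestures are candidates here, so lookup is an exact match.
        return GESTURE_TABLE.get((shape, orientation))


session = SelectionSession()
session.pen_up_after_selection("some selected words")
print(session.stroke("line", "left"))  # delete
```

Because each key combines a distinct shape and orientation, no two entries can be confused with each other, which is the property the restricted context relies on.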

Natural Input Software's correction box has a unique, patented one-to-one association between each word attribute icon and the corresponding word in the primary word list, which allows users to correct speech, handwriting, and keyboard errors in one unified correction box. In contrast, alternative pen-based editing and correction interfaces on the market today force the user to open separate handwriting or speech correction boxes. The primary word list is a list of the words selected for correction, each representing the best guess at what the user dictated, wrote down, or keyed in. A word attribute icon indicates the method of input (speech, handwriting, or keyboard) that was used to create the corresponding word in the primary word list.
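The one-to-one pairing of icon and word can be pictured as a simple data structure. This is a minimal sketch under stated assumptions: the type and function names are invented for illustration and say nothing about how the patented correction box is actually implemented.

```python
from dataclasses import dataclass
from enum import Enum


class InputMethod(Enum):
    SPEECH = "speech"
    HANDWRITING = "handwriting"
    KEYBOARD = "keyboard"


@dataclass
class WordEntry:
    best_guess: str       # the word shown in the primary word list
    method: InputMethod   # determines which word attribute icon is shown


def correction_box_rows(entries):
    """Pair every word with exactly one icon, preserving list order."""
    return [(entry.method.value, entry.best_guess) for entry in entries]


entries = [
    WordEntry("their", InputMethod.SPEECH),
    WordEntry("pen", InputMethod.HANDWRITING),
    WordEntry("tablet", InputMethod.KEYBOARD),
]
for icon, word in correction_box_rows(entries):
    print(f"[{icon}] {word}")
```

Keeping the input method on each entry is what lets a single box hold words from all three sources, rather than one box per recognizer.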

According to Sean Maxted, President and CEO of Natural Input Solutions, future versions of Natural Input Software will also have a word attribute icon dedicated to words created by OCR (Optical Character Recognition), in which case the image of the scanned word selected in the primary word list will be displayed in the writing window.

At present a prototype version of Natural Input Software for the Vista operating system is available for download at http://www.naturalinputsolutions.com/download.html

Note: The Vista version of Natural Input Software is also compatible with Windows XP Tablet PC Edition, provided that the user first downloads and installs the .NET Framework 2.0 Service Pack.

Contact: Sean Maxted, President and CEO of Natural Input Solutions, Inc., at nis@sympatico.ca

Website: http://www.naturalinputsolutions.com