Voice
Voice recognition technology converts human speech into electrical
signals and transforms these signals into coding patterns with
assigned meanings. Voice terminals shine as automated input
devices in applications where an operator's hands and eyes are
occupied, enabling source data capture in real time.
Workers typically wear a microphone/speaker headset connected
to a recognition unit. The microphone converts spoken words into
analog electrical signals, which are then converted
to digital patterns and decoded or "recognized"
by template matching or feature analysis. The data output may
be entered into a program, or it may activate a range of computer-based
equipment such as scales, programmable logic controllers, or
printers.
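As a rough illustration of template matching, the sketch below compares a digitized pattern against stored word templates and picks the closest one. The feature vectors, the vocabulary, and the rejection threshold are all invented for the example; a real recognizer works on far richer acoustic features.

```python
import math

# Hypothetical templates: word -> feature vector (illustrative values only).
TEMPLATES = {
    "yes": [0.9, 0.1, 0.3],
    "no": [0.2, 0.8, 0.5],
    "stop": [0.4, 0.4, 0.9],
}


def recognize(pattern, templates=TEMPLATES, max_distance=0.5):
    """Return the word whose template is nearest to pattern.

    Returns None when even the nearest template is farther away than
    max_distance, so unknown sounds are rejected instead of being
    forced onto a vocabulary word.
    """
    best_word, best_dist = None, float("inf")
    for word, template in templates.items():
        dist = math.dist(pattern, template)  # Euclidean distance
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None
```

The rejection threshold is one simple way to keep stray noise from being reported as a word; where to set it is a tuning decision.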
In "dialog" voice recognition systems, the unit recognizes
human speech and then synthesizes a spoken response (or plays
back a digitized response) to verify input and/or prompt the
operator through a series of tasks.
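The prompt-and-verify loop of a dialog system might look something like the sketch below. It is only a simulation: the list of heard words stands in for recognizer output, the returned transcript stands in for synthesized speech, and the function and field names are hypothetical.

```python
def run_dialog(prompts, heard_words, confirm_words=("yes",)):
    """Prompt the operator through a series of fields, echoing each
    input back for verification before it is accepted.

    prompts      -- list of (field_name, prompt_text) pairs
    heard_words  -- already-recognized words, in the order spoken
                    (stand-in for a recognizer; must supply enough
                    words, otherwise StopIteration is raised)
    Returns (captured values, list of everything the system "said").
    """
    heard = iter(heard_words)
    captured, spoken = {}, []
    for field, prompt in prompts:
        while True:
            spoken.append(prompt)                # prompt the operator
            value = next(heard)                  # operator's answer
            spoken.append(f"{value}, correct?")  # echo for verification
            if next(heard) in confirm_words:     # operator confirms
                captured[field] = value
                break                            # otherwise re-prompt
    return captured, spoken
```

For example, with prompts for a part number and a quantity, an operator who says "three", rejects the echo, and then says "four" ends up with "four" recorded as the quantity.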
Most voice systems are speaker-dependent, trained to recognize
an individual voice that has previously read a vocabulary into
the system. Speaker-trained systems recognize accents, dialects,
and work-specific vocabulary, and offer the highest accuracy
rates (under ideal conditions, error rates equal about 1 percent).
Speaker-independent systems understand words prerecorded by
a representative pool of speakers; the system "remembers"
these words and attempts to match its limited vocabulary against
words spoken by any new user.
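One simple way to picture speaker training is that the operator repeats each vocabulary word several times and the system averages the repetitions into a single template per word. The sketch below assumes each repetition has already been reduced to a fixed-length feature vector; the function name and data layout are hypothetical, and actual products may train quite differently.

```python
def enroll(samples_by_word):
    """Build one template per word by averaging the operator's repetitions.

    samples_by_word maps each word to a list of equal-length feature
    vectors, one vector per repetition spoken during training.
    """
    templates = {}
    for word, samples in samples_by_word.items():
        count = len(samples)
        # zip(*samples) groups the i-th feature of every repetition.
        templates[word] = [sum(column) / count for column in zip(*samples)]
    return templates
```

Because the templates come from the operator's own voice, the stored patterns already reflect that person's accent and pronunciation.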
Discrete speech processing is the most commonly used
speech recognition technology. The operator speaks only one
word at a time, or pauses briefly after each word of a phrase.
Chances for false recognition are minimized, verification is
easier, and accuracy rates are consequently high. Discrete speech
is preferable when a large vocabulary is required or when there
is considerable background noise in the environment.
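The pauses are what make discrete speech easy to segment. A minimal sketch, assuming the audio has already been reduced to one energy value per short frame, splits the stream wherever enough consecutive quiet frames occur. The threshold and the minimum pause length are made-up tuning values.

```python
def split_on_pauses(energies, silence_threshold=0.1, min_pause_frames=3):
    """Return (start, end) frame ranges, end exclusive, one per spoken word.

    A word ends once min_pause_frames consecutive frames fall at or
    below silence_threshold; shorter dips are treated as part of the word.
    """
    segments = []
    start = None  # frame where the current word began, if any
    quiet = 0     # consecutive quiet frames seen since the last loud frame
    for i, energy in enumerate(energies):
        if energy > silence_threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause_frames:
                # The word ended at the first of these quiet frames.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:
        # Input ended mid-word or during a pause that was too short.
        segments.append((start, len(energies) - quiet))
    return segments
```

Each resulting range can then be handed to the recognizer as a single word, which is why false recognition is less likely than with run-together speech.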
Continuous speech processing is more natural and less
tedious because it allows users to speak at a normal speech
rate. However, continuous speech systems are more susceptible
to false recognition and are less tolerant of background noise
than discrete processing systems. Continuous speech was once
significantly more expensive, but prices have dropped dramatically
in recent years. Ongoing advances in speech recognition software,
together with leaps in hardware development, have established speech
recognition as an up-and-coming automatic identification and data
capture (AIDC) technology in a range of industries.
Voice Identification
The spoken word is the way most people communicate.
Voice Data Collection (also called Voice Data Entry) requires
no special printed or encoded symbols, no exotic-looking equipment,
nothing much more intimidating than a telephone headset. It
is also the only technology that is generally trained to the
way a human works rather than requiring the human to learn the
machine's way of doing things. And because speaking doesn't
require the use of hands, it is ideal for jobs requiring the
worker's hands to be free. Inspection and baggage handling are
two common applications.
Types of Systems
There are two ways voice data entry systems can be differentiated:
speaker-dependent or speaker-independent, and discrete or continuous
recognition.
Typically, speaker-independent systems are preprogrammed to
recognize a limited vocabulary, such as the digits 0 through
9. Because human speech is so varied, it is not economically
feasible, at this time, to create a speaker-independent system
with a very large vocabulary.
Speaker-dependent systems rely on the operator to train the
system in the words it is to recognize. This training makes
the system less sensitive to external noises and other voices.
It also allows non-English-speaking employees to recite the
list of words in their native language and have those words
recognized as their English equivalents. Discrete recognition
systems require the speaker to pause between words and to break
numbers down into individual digits. On the other hand, continuous
recognition systems don't require such precision in speech.
People tend to run words together, as in "serialnumber."
Continuous recognition systems can be trained to recognize this,
as well as the number "nineteen seventy-seven oh forty-three."
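Once a continuous recognizer has produced the individual words, turning a spoken number into a digit string is a small parsing step. The sketch below assumes that a tens word followed by a units word forms one two-digit group (so "seventy-seven" yields 77, not 707) and that "oh" means zero; under those assumptions the example phrase above comes out as 1977043. The function name is hypothetical.

```python
UNITS = {"oh": 0, "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def spoken_number_to_digits(phrase):
    """Convert a phrase of spoken number words into a string of digits."""
    tokens = phrase.lower().replace("-", " ").split()
    digits = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in TENS:
            value = TENS[token]
            following = tokens[i + 1] if i + 1 < len(tokens) else None
            # Assumption: tens + nonzero unit is one group ("forty three" -> 43).
            if following in UNITS and UNITS[following] != 0:
                value += UNITS[following]
                i += 1
            digits.append(str(value))
        elif token in TEENS:
            digits.append(str(TEENS[token]))
        elif token in UNITS:
            digits.append(str(UNITS[token]))
        else:
            raise ValueError(f"unrecognized number word: {token!r}")
        i += 1
    return "".join(digits)
```

A discrete system avoids this ambiguity altogether by requiring the operator to speak each digit separately.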
Grammars
To provide greater system flexibility without placing undue
burdens on acceptable vocabulary, some systems use grammars
to allow different meanings for the same word or to help differentiate
between similar-sounding words.
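A grammar can be pictured as a table of which words are valid in each context and what each one means there. In the sketch below, the recognizer is assumed to return a ranked list of candidate words, and the grammar picks the first candidate that is legal in the current context. The contexts, vocabulary, and meanings are invented for illustration.

```python
# Hypothetical grammar: context -> {allowed word: meaning in that context}.
GRAMMAR = {
    "quantity": {"one": 1, "two": 2, "four": 4, "eight": 8},
    "command": {"to": "MOVE_TO", "for": "ASSIGN_FOR", "wait": "PAUSE"},
    "grade": {"one": "GRADE_A", "two": "GRADE_B"},
}


def interpret(candidates, context, grammar=GRAMMAR):
    """Return the meaning of the best candidate allowed in this context.

    candidates is the recognizer's ranked guess list, best first.
    Returns None if no candidate is valid in the given context.
    """
    allowed = grammar[context]
    for word in candidates:
        if word in allowed:
            return allowed[word]
    return None
```

Here "to" and "two" sound alike, but only "two" is legal when a quantity is expected, and the same word "one" means a count in one context and a grade in another.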
Portable Systems
Portable systems, sometimes including bar code or other ADC
technologies, take Voice Data Collection onto the factory floor,
into storage yards, or to other locations where a hardwired system just
can't go.
Many systems offer Radio Frequency Data Communications interfaces
to provide real-time entry and interactive prompts from the
host. As processing speed and memory capacity inevitably increase,
voice recognition's ability to translate the spoken word into
electronically transmittable data will likewise grow in breadth
and accuracy, enabling the technology to better capture the
easiest and most intuitive form of input there is, the human
voice.
Mobility is key to many industrial voice applications, and
the combination of voice recognition and RFDC is enabling rapid
growth of systems that are maximizing productivity where wired,
optical-based systems simply cannot effectively operate. With
decreasing technology costs and increasing accuracy rates, voice
recognition is poised for widespread data capture use, from
the desktop to the factory floor.
Common Applications
Voice recognition is commonly used in the automotive industry
for various manufacturing and inspection applications. It is
also used in warehousing and distribution to track material
movement in real time, in the transportation industry for receiving
and transporting shipments, in laboratory work, and in inspection
and quality control applications across all industries.
Currently, voice recognition engines provide better accuracy
rates than keyboard input, but they are not as accurate as barcode
input. Audio feedback improves application accuracy, but waiting
for confirmation slows input, eroding the speed that is
one of voice recognition's key advantages. Voice recognition
serves the need for speedy, hands-free, eyes-free, real-time
input very well. Where both speed and near-100 percent accuracy
are critical, voice recognition should probably be combined
with barcode.
Reprinted with permission from AIM, Inc.
www.aimglobal.org