24 May 2013 by MacGregor Campbell
OK computer (Image: Adam Mitchinson/Getty)
FAMILY lore has it that my wife's
grandmother Cleo once had a problem with a mouse. When she first sat
down to learn how to use a computer, her four sons crowded around,
peppering her with advice. "Move the pointer to the left," one said. She
moved the mouse to the left. "Now move it to the right." No problem.
"Now move the pointer up." Cleo lifted the mouse off the table,
surprised that the cursor refused to follow.
It sounds silly, but her instinct was
natural. To make our computers understand us, we've had to squish our
intuitive body movements into the two-dimensional plane of the mouse or
touchscreen. That's about to change.
From public kiosks to your living
room, this year everyday computers will begin to understand our gestural
vocabulary with unprecedented precision, right down to ultra-fine
finger movements. How will this change the way we interact with the
digital world? Some advocates claim the mouse and keyboard will become
obsolete. That's just hype. Gestural computing's more interesting impact
will be how it changes us – from the new language of fist bumps, finger
shapes and hand signals we may be asked to learn, to an affliction
called "gorilla arm". In turn, our behaviour will shape the technology's
evolution. It's time to wave goodbye to our old notions of how we
navigate digital space.
The first computers able to recognise human gestures
emerged in the 1970s, when researchers equipped people with batons or
wearable accelerometers. The crude resolution of these technologies
stopped them taking off. Still, limited bodily gestures in two
dimensions were incorporated into personal computers: using a mouse to
drag a scroll bar or double-click on a desktop icon required a physical
motion with the hand and arm, rather than typing code. Multi-touch
screens added extra moves to our gestural lexicon: we learned that
spreading apart a pinched finger and thumb on a screen, for example,
zoomed into a photo or a map.
Until very recently, however, most
hand and body language was invisible to computers. While arm, leg, and
torso positions can be used to control video games – thanks to
depth-sensing technology like Microsoft's Kinect – our computers,
televisions and other devices have largely stuck to more traditional
means.
Yet this July, many are anticipating big things from a device made by Leap Motion,
a company based in San Francisco. It will launch an $80 box that can be
plugged into most computers, with the ability to track ultra-fine hand
and finger movements. The company has not disclosed exactly how it
works, but by using a combination of infrared and optical cameras with
clever software, the Leap can detect gestures to a resolution of less than a millimetre,
inside a half-metre-cubed region of air. Leap Motion's app store, Airspace, will launch at the same time, offering a host of gesture-controlled software ranging from music to painting programs.
Read the signals
Many think that it won't be long before high-precision gesture detection can span entire rooms. For example, Jan Zizka and Alex Olwal of the Massachusetts Institute of Technology's Media Lab have developed SpeckleSense,
a device that uses laser-speckle – subtle patterns caused by light
waves of the same frequency interfering with one another – to track
motion with far greater precision and range than the likes of Kinect.
All of a sudden, then, the fidelity of
gestural language that computers can understand is poised to expand
significantly. "The hands have become free of instruments, so the
gestures that were always there now emerge into the daylight," says Jacob Wobbrock,
a human-computer interaction researcher at the University of Washington
in Seattle. "There is no question that a gesture vocabulary of sorts
will enter further into the psyche of today's and future computer
users."
So what kinds of 3D gestures might we
pick up in the next few years? After all, the reverse-pinch gesture for
touchscreens had to be learned – Apple even patented the move (see "Patently absurd?")
– and if you demonstrated the movement to somebody only a decade ago,
they would have had no idea what it meant. Might there be similar hand
movements that we will use to trigger specific commands?
The Leap comes equipped with the
ability to recognise a few basic gestures like "key tap", a
single-finger tapping movement which might be used to bring up a
keyboard on screen, for instance. And independent app developers are
training it to recognise their own gestures, such as "thumbs-up", which some have used as a command to "like" a Facebook post.
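To give a rough flavour of how software might consume such gestures, here is a minimal sketch in Python. It does not use the real Leap Motion SDK (which had yet to ship at the time of writing); the GestureSensor class, its poll() method and the event names are hypothetical stand-ins for whatever a vendor's API actually provides.

# Hypothetical sketch: dispatching recognised gestures to application commands.
# GestureSensor and the event names are invented for illustration; a real
# SDK (Leap, Kinect and so on) would supply its own equivalents.
import time

class GestureSensor:
    """Stand-in for a gesture-tracking device driver."""
    def poll(self):
        # A real driver would decode camera frames and return events such as
        # ("key_tap", x, y, z); this placeholder simply returns nothing.
        return []

def show_keyboard():
    print("Showing on-screen keyboard")

def like_post():
    print("Liking the current post")

# Map gesture names to commands, mirroring the article's examples:
# a "key tap" brings up a keyboard, a "thumbs up" likes a post.
COMMANDS = {
    "key_tap": show_keyboard,
    "thumbs_up": like_post,
}

def run(sensor, duration_s=5.0):
    """Poll the sensor and dispatch any recognised gestures."""
    end = time.time() + duration_s
    while time.time() < end:
        for name, *_coords in sensor.poll():
            action = COMMANDS.get(name)
            if action:
                action()
        time.sleep(0.01)  # roughly 100 Hz polling

if __name__ == "__main__":
    run(GestureSensor())

The point of the dispatch table is that the hard part – recognising a gesture from raw camera data – stays inside the sensor's driver, while an app only has to map a small, named vocabulary onto its own commands.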
More clues about our future body language
can be found by looking at gesture-recognition prototypes developed
in the lab over the past few years. Designers have come up with a range
of hand and arm commands (see diagram).
These efforts suggest that the best
gestural commands are novel moves that the user must do quite
deliberately – otherwise they risk accidentally triggering something on
screen. "We look for gestures that are easy to do, but that aren't used
in normal communication," says Hrvoje Benko
at Microsoft Research in Redmond, Washington. One such gesture is pinching all four fingers together with the thumb to grab the air; this motion might let you drag an object, such as a file, across a screen.
Other researchers have experimented
with gestures for resizing or rotating. Spreading two raised fists apart
can be used to zoom into the screen with some versions of Kinect, for example, while turning a flat hand clockwise or anticlockwise rotates an image.
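As a rough illustration of how such moves could translate into on-screen transformations, the sketch below derives a zoom factor from the changing distance between two tracked fists and a rotation from the changing roll angle of a flat hand. The Hand structure and the numbers are invented for illustration; a real depth sensor would supply per-frame hand positions and orientations.

# Hypothetical sketch: turning tracked hand data into zoom and rotate commands.
# The Hand values are invented; a real sensor would report positions (in mm)
# and a palm roll angle (in radians) for each frame.
import math
from dataclasses import dataclass

@dataclass
class Hand:
    x: float      # horizontal position, mm
    y: float      # vertical position, mm
    roll: float   # rotation of the palm about the forearm axis, radians

def zoom_factor(prev_left, prev_right, left, right):
    """Spreading two fists apart zooms in; bringing them together zooms out."""
    def dist(a, b):
        return math.hypot(a.x - b.x, a.y - b.y)
    before = dist(prev_left, prev_right)
    after = dist(left, right)
    return after / before if before else 1.0

def rotation_delta(prev_hand, hand):
    """Turning a flat hand clockwise or anticlockwise rotates the image."""
    return hand.roll - prev_hand.roll

# Example: the fists move 50 mm further apart and the flat hand turns
# 15 degrees clockwise between two frames.
z = zoom_factor(Hand(-100, 0, 0), Hand(100, 0, 0),
                Hand(-125, 0, 0), Hand(125, 0, 0))
r = rotation_delta(Hand(0, 0, 0.0), Hand(0, 0, math.radians(-15)))
print(f"zoom x{z:.2f}, rotate {math.degrees(r):.0f} degrees")

Working from relative changes between frames, rather than absolute hand poses, is what keeps such controls tolerant of where exactly the hands happen to be in the tracking volume.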
Memory games
Which of these myriad gestures will
catch on is unclear, but we can be confident that there is a limit to
how many we'll be able to usefully remember. Gesture sets of 10 or more
might place a prohibitive strain on memory. "Once you go past a few
basic gestures, it gets really confusing," says Chris Harrison of the
Human-Computer Interaction Institute at Carnegie Mellon University in
Pittsburgh, Pennsylvania.
Finding a way to design a gesture
vocabulary that is useful for complex interaction, but simple enough to
remember, is an open challenge, says Jamie Zigelbaum, a designer based in Cambridge, Massachusetts. In 2009, Zigelbaum and colleagues at the MIT Media Lab designed a catalogue of gestures
for a hand-recognition device called g-speak, which is built by Oblong
Industries in Los Angeles. Their gesture set allowed users to browse, view, and organise video clips using 20 commands.
Some moves were fairly straightforward, such as holding the hand in the
shape of a gun to point and select, but many involved positioning the
arms in strange arrangements, and led to criticism that the commands
were too difficult to learn.
One way around this memory issue is to
train people as they go, rather than ask them to learn gestures from
diagrams or videos. One of Benko's projects, called LightGuide,
helps with this problem by using a ceiling-mounted projector to beam
visual instructions onto a person's body. The system displays arrows
directly onto the hands, say, guiding them to the correct positions.
Or you could let people customise
their own gestures – be it for switching off a computer or turning down
the volume on a television. In a recent experiment, Miguel Nacenta and
colleagues at the University of St Andrews, UK, taught one group to use
16 predefined gestures, while another group got the opportunity to
invent 16 hand movements themselves. The following day, participants who
had designed their own gestures were able to recall up to 44 per cent more of them.
Another human constraint that will
shape the development of our gestural vocabularies is the physicality
required. The movements will have to be something that people can do
repeatedly, over long chunks of time.
In the early days of human-computer
interaction using touchscreens, researchers identified an affliction
that they dubbed "gorilla arm" – in which one's arm feels heavy after
waving it about for too long. It is no problem for phones or tablets
sitting on your lap, but the ache soon strikes for any device that
requires you to reach your arms out continually – a wall-mounted screen, for instance. Gestural systems that require expressive hand movements in the air, then, are likely to cause many more cases of gorilla arm, so subtler moves may well come to rule.
The same physicality that can make
gestures tiring can also make them inappropriate in certain settings.
For example, in a 2010 experiment conducted by Adam Fourney at the University of Waterloo, in Ontario, Canada, presenters used a gesture-based slide-show system
in a classroom for two weeks. They could use gestures both to navigate
back and forth through the slides, and interact with slide content by,
for example, zooming into figures, and highlighting and expanding bullet
points. Yet students said they preferred the presenters to use a remote
control to change slides, not gestures, because it was distracting.
They did, however, approve of gestures to interact with the slide
content itself – pointing at a diagram, for example – possibly because
these gestures aren't too removed from movements that presenters already
make, Fourney suggests.
Another factor shaping our nascent
gestural vocabulary will be that we tend to look silly waving our hands
around in the air. One sociological study of how families use Kinect
games in the home, by Richard Harper
and Helena Mentis of Microsoft Research in Cambridge, UK, suggests that
the fun comes from participants laughing at one another as they contort
their bodies. While technology has changed social norms before, having
to perform a similar dance routine might not be so desirable in settings
like the workplace. "It would force us to use our bodies like a
ballerina uses hers. With exceptional control, strength and discipline,"
says Harper. "That would be exhausting to the spirit."
So rather than killing off the
keyboard or mouse – which remain hard to beat for some tasks – the
gestures that catch on will become incorporated into the multifaceted
language we use to communicate with computers. Writing an essay? Use a
keyboard. Moulding an object for your 3D printer, or sorting through
files? Fingers and hands may be better. Human-computer interaction
researchers nowadays call this "multi-modal" interaction. "When a new
mode of interaction comes to life, it doesn't kill off the other ones.
It extends the possibilities, makes new interactions possible," says
Benko.
The true impact of gestural computing,
then, will be that it adds a channel of communication we've never been
able to use before. We have always had myriad ways to convey meaning to
fellow human beings – be it voice, text or body language – but until
now, our computers have been blind to many of these cues. When Grandma
Cleo lifted her mouse off the desk, it made perfect sense. If she had
lived to see it, she might well have appreciated this moment in time in
which machines are finally coming to understand our language, instead of
us struggling to understand theirs.
Correction: Since this article was first published on 22 May 2013, the price of the Leap Motion box has been updated.
The elevator pitch
In the lobby of Microsoft Research in Redmond, Washington, there's an elevator that reads you like a book.
It is equipped with a camera that peers at people in front of its doors. When someone approaches, it will open – but only when it senses that the person is looking to use it. The system has processed many hours of video footage of people mingling in the lobby and has learned to distinguish between someone intending to use it and someone just walking by.
As computers come to recognise ever-more-detailed gestures (see main story), they will be able to infer more about us. Other researchers have programmed computers to use body language to infer a person's mood, be it happy, angry or sad. Such "emotionally intelligent" machines would better respond to our needs.
So if you are slouching in front of a screen, bear in mind that a computer may soon be watching.
Patently absurd?
It's hard to believe that waving our arms, hands or fingers could spark heated patent litigation, but if the history of two-dimensional gestures on touchscreens is any precedent, such a fate awaits 3D gesture interfaces, too.
A battle over 2D gestures began after a Silicon Valley party in the early 2000s, when Apple's CEO Steve Jobs got riled by a Microsoft engineer boasting about the company's stylus-controlled touchscreen Tablet PC. Jobs ordered his engineers to build their own touch interface, but he was adamant that it would use only hand gestures. They came up with specific moves like pinch-to-zoom, tap-to-zoom and swipe-to-unlock – which Apple quickly filed patents to protect.
This sparked a gestural-patent landgrab. For instance, Google applied for a patent on a text-recognition gesture involving underlining words in a picture with a swipe, while Nokia's gesture patent applications included circular or oval swipes, with the size of the circle or oval dictating the degree to which the screen zooms in on an image.
Attempts to patent 2D gestures ultimately failed. Apple recently took Samsung to court over the latter's use of the pinch-to-zoom and swipe-to-unlock movements. In the end, the US Patent and Trademark Office ruled Apple's patents invalid on the grounds that earlier inventions used the ideas.
An optimist might think that this would discourage a similar landgrab over 3D gestures. Alas, it hasn't. Microsoft holds patents for Kinect that cover flicks of the hand to scroll on screen, or gestures that call up a search box. And Intellectual Ventures of Bellevue, Washington, has filed a patent on a way to control a television that includes a raised "flat-hand" gesture to get its attention. The company is notorious for aggressively protecting its intellectual property too.
Litigation that tied the tech industry in knots for the past few years looks set to be repeated and, as usual, the only winners will be the lawyers.