The People's Gaze: Co-Designing and Refining Gaze Gestures with General Users and Gaze Interaction Experts

This paper presents a two-phase methodology that combines co-design workshops with non-expert users and expert refinement to develop a grounded, intuitive set of 32 gaze gestures and design principles for hands-free interaction on gaze-enabled devices.

Yaxiong Lei, Xinya Gong, Shijing He, Yafei Wang, Mohamed Khamis, Juan Ye

Published Mon, 09 Ma

Imagine you are holding a smartphone, but you want to control it without touching the screen. Maybe your hands are full of groceries, or perhaps you have a disability that makes using your fingers difficult. You want to use your eyes to tap, swipe, and click.

This is the dream of "gaze interaction." But there's a big problem: Your eyes are always looking at things. If the phone thinks every time you look at a button you want to press it, you'd accidentally delete your photos or close your apps constantly. This is known as the "Midas Touch" problem (everything you look at turns to gold/gets clicked).

This paper, titled "The People's Gaze," is a story about how the researchers solved this problem by mixing two very different groups of people: regular users (like you and me) and eye-tracking experts.

Here is the story of how they built a new language for your eyes.

🎨 Part 1: The "People's" Idea (The Co-Design Workshop)

The researchers gathered 20 regular people who had never designed a gaze gesture before. They gave them a simple task: "If you could control your phone with your eyes, what would you do?"

What the regular people came up with:

  • The "Finger" Metaphor: Most people treated their eyes like a finger. They wanted to "swipe" left to go back or "tap" an icon to open it. They wanted the eye to act exactly like a touchscreen.
  • The "Pause" and the "Blink": Realizing that just looking isn't enough, the regular users intuitively invented a two-step system.
    • Step 1: Stare at something for a moment (like a "hover" on a mouse).
    • Step 2: Blink to confirm the action (like a "click").
    • Analogy: It's like walking up to a door. You don't just walk through it; you pause, check if it's open, and then push. The "stare" is the check, and the "blink" is the push.
  • The SOS Idea: Some people thought about emergencies. "If I'm in trouble and can't use my hands, I should be able to blink five times fast to call for help."

The Result: They came up with 102 different ideas. Many were creative, but some were physically impossible for human eyes to do without getting tired.
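The "stare, then blink" scheme the participants converged on can be sketched as a tiny state machine: a target only fires if you dwell on it long enough to arm it, and then blink. This is an illustrative sketch, not code from the paper; the `GazeConfirm` class, event names, and the 0.6-second threshold are all invented for the example.

```python
DWELL_THRESHOLD = 0.6  # hypothetical: seconds of steady gaze before a target "arms"

class GazeConfirm:
    """Illustrative sketch of the participants' two-step "stare + blink" idea."""

    def __init__(self, dwell_threshold=DWELL_THRESHOLD):
        self.dwell_threshold = dwell_threshold
        self.target = None       # what the eyes are currently resting on
        self.dwell_start = None  # when the current dwell began
        self.armed = False       # True once the dwell threshold is met (the "hover")

    def on_gaze(self, target, timestamp):
        """Called for each gaze sample: the target under gaze and its time."""
        if target != self.target:
            # Eyes moved to something new: restart the dwell timer.
            self.target = target
            self.dwell_start = timestamp
            self.armed = False
        elif not self.armed and target is not None:
            if timestamp - self.dwell_start >= self.dwell_threshold:
                self.armed = True  # Step 1 complete: the target is armed

    def on_blink(self):
        """Called when a blink is detected; returns the confirmed target, if any."""
        if self.armed:
            confirmed = self.target
            self.armed = False
            return confirmed  # Step 2 complete: the "click"
        return None  # A stray blink with nothing armed is ignored: no Midas Touch

# Usage: stare at a button for 0.7 s, then blink to confirm.
gc = GazeConfirm()
gc.on_gaze("open_photos", 0.0)
gc.on_gaze("open_photos", 0.7)  # dwell threshold reached: target arms
print(gc.on_blink())            # -> "open_photos"
print(gc.on_blink())            # -> None (a second blink does nothing)
```

Note how merely glancing at a target never triggers it: without the dwell, a blink returns `None`, which is exactly the separation of "looking" from "doing" that the two-step system is after.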

🔬 Part 2: The "Expert" Filter (The Peer Review)

Next, the researchers brought in 4 scientists who study how eyes move. They looked at the 102 ideas and said, "Hold on a minute. Let's talk about how your eyes actually work."

The Experts' Reality Check:

  • Eyes are not fingers: Your eyes are built for jumping (saccades), not drawing. You can easily look from point A to point B in a straight line. But trying to draw a perfect circle or a spiral with your eyes? That's like trying to write a perfect cursive 'S' with a pencil while someone is jostling your arm. It's exhausting and inaccurate.
  • The "Reading" Problem: If you ask someone to swipe their eyes from left to right, their eyes might just do that naturally because they are reading a sentence. The phone can't tell the difference between "reading" and "commanding."
  • The Solution: The experts kept the best ideas but tweaked them.
    • They kept the "Stare + Blink" system because it solves the "Midas Touch" problem.
    • They threw away the complex circles and spirals.
    • They kept simple, straight-line movements (like a quick flick left or right) because that's what eyes do best.
    • They added a rule: The more dangerous the command, the harder the gesture.
      • Analogy: Changing the volume should be easy (a quick look). Turning off the phone or deleting a file should be hard (a complex sequence) so you don't do it by accident.

🏆 The Final Result: 32 New Gestures

After mixing the "People's" intuition with the "Experts'" science, they ended up with a refined set of 32 gestures.

The Big Takeaways (The "Rules of the Road"):

  1. The "Activate-Then-Confirm" Grammar:
    Don't just look and click. Look, wait a split second (activate), then do something else (blink or move) to confirm. This separates "looking" from "doing."

    • Analogy: It's like a safety switch on a power tool. You have to hold a trigger and press a button to make it work.
  2. Respect the Eye's Anatomy:
    Use straight lines and corners. Don't ask the eye to draw loops.

    • Analogy: Ask a horse to run in a straight line, not to dance a waltz.
  3. Match the Effort to the Risk:
    Simple gestures for simple tasks. Complex gestures for dangerous tasks.

    • Analogy: You don't need a heavy, complicated key to open a garden shed, but you need a complex security code to open a bank vault.
  4. Use Familiar Landmarks:
    Start gestures from the edges of the screen (like swiping from the side on a phone) because that feels natural.
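Rule 3 (match the effort to the risk) can be made concrete as a lookup from each command to the gesture sequence it demands. This is a hypothetical sketch: the command names, gesture labels, and risk tiers below are invented for illustration and are not the paper's actual 32 gestures.

```python
# Hypothetical sketch of "match the effort to the risk": riskier commands
# require longer, more deliberate gesture sequences. All names invented.
REQUIRED_GESTURE = {
    # Low-risk, easily reversible commands: a single quick flick
    "scroll_down": ["flick_down"],
    "volume_up":   ["flick_right"],
    # Medium-risk commands: dwell, then blink to confirm
    "open_app":    ["dwell", "blink"],
    # High-risk, destructive commands: a long deliberate sequence
    "delete_file": ["dwell", "blink", "flick_left", "blink"],
    "power_off":   ["dwell", "blink", "flick_up", "flick_down", "blink"],
}

def is_authorized(command, performed_gestures):
    """A command fires only if the user performed its full required sequence."""
    return performed_gestures == REQUIRED_GESTURE.get(command)

# A quick flick is enough to scroll...
print(is_authorized("scroll_down", ["flick_down"]))  # -> True
# ...but that same flick can never delete a file by accident.
print(is_authorized("delete_file", ["flick_down"]))  # -> False
```

The design choice here is the vault-vs-shed analogy in code form: the cost of performing a gesture scales with the cost of getting the command wrong.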

🌟 Why This Matters

This paper shows that you don't need to be a scientist to design good technology. Regular people have great instincts about what feels natural. But they need experts to tell them what is physically possible.

By combining User Creativity (the "People's Gaze") with Scientific Reality (the "Expert Filter"), the researchers created a set of eye-gestures that are:

  • Intuitive: You can learn them quickly.
  • Safe: You won't accidentally delete your photos.
  • Comfortable: Your eyes won't get tired.

This is a huge step forward for making phones and computers accessible to everyone, especially those who cannot use their hands. It turns the "Midas Touch" problem into a "Midas Control" solution.