by Chancey Fleet
From the Editor: Many of us who are blind have learned to do most tasks in our life without the aid of vision. We know that there are alternative techniques, we are proud to use them, and we treasure the independence they give us. Even so, there are times when we find the services of someone who can see very helpful. Maybe we need that vision for five seconds or for two minutes or for two hours.
Traditionally we have met this need in several ways. Sometimes we wait until we have accumulated enough tasks in which sight would be helpful and then pay someone or find a volunteer to help us. That waiting can seem long when we want what we want, or actually need it, sooner than the next scheduled visit from a person who can see. Even if we have someone with sight living in our house, we get tired of asking, and they get tired of all the questions, so some mutually acceptable pattern of asking and granting must be agreed upon.
Chancey Fleet offers an answer to this dilemma, which she calls visual interpretation. She is a tech educator for a New York City library and understands as much about technology for the blind as anyone I know. She knows most of the apps that can help us and has a gift for explaining what they are and what they do. Here is her brief overview of some of them:
If you’re a blind person with a smartphone or tablet, you can use it to get visual information on demand. This genre of service is relatively new and can go by many names: you might hear it called remote visual assistance or crowdsourced vision. Personally, I prefer the phrase “visual interpretation” because it precisely names the process of turning visual information into something more useful and because the concept of an interpreter is familiar to people in many walks of life.
Working with a remote visual interpreter can be liberating: you decide what your interpreter can see, when the interaction begins and ends, and whether you need a second (or third, or tenth) opinion. A virtual interpreter can’t touch anything in your environment, so you can’t be tempted to abandon a task that is “too visual” to someone else’s hands. Remote visual interpretation can be an empowering option when you’d rather limit the extent of your interactions with the public, when you want to avoid turning friends and colleagues into de facto describers, or when no one around you is available to give you the information you need.
A variety of apps provide remote visual interpretation. Although they vary in price, functionality, and whose eyes are on the other end of the connection, there are some things you’ll want to consider when you use any of them.
Know your camera: it’s important to understand how lighting conditions, glare, angle, orientation, and distance affect your camera. If you’re not familiar with these concepts, Judy Dixon’s National Braille Press book Getting the Picture is an excellent introduction. You can even use remote visual interpretation apps themselves to get feedback about how effectively you’re using your camera.
Think about how you want to listen: these apps are going to talk to you, whether using voice-to-voice connection or text and audio messages. If you plan on using them in public, invest in comfortable headphones. Unless you’re an EMT or a trucker, two-way speakerphone interactions are almost never OK in shared spaces, and you do not get a pass on this article of the social contract just because you’re borrowing some eyes. Besides, headphones will help you hear more clearly. If you’re traveling outdoors, attending an event or tour, or otherwise engaging with these apps in a context where it makes sense to stay maximally aware of the soundscape around you, you can use a single Bluetooth earpiece or go with the timeless budget strategy of just wearing one earbud. If you like balanced sound and prefer nothing blocking your ears, check out the AfterShokz line of bone-conduction headphones (available in wired and Bluetooth models). They rest on the bones just in front of your ears and let you hear your phone or tablet’s audio without blocking what’s going on around you. Whatever you do, read some online reviews before you buy, or take a friend’s headphones for a test drive: people tend to feel deeply about their audio gear, and no one choice is right for everybody.
Don't run out of gas: live video connections and multimedia messaging will drain your battery and deplete your data plan. If you’re counting on all-day access to your tablet or phone and plan on using these apps, carry a backup battery. Once you’ve started using apps that involve multimedia messaging or live video connections, check your phone or tablet’s data usage statistics on a regular basis to make sure that you’re not approaching your data limit; and, when you can, save your data by connecting to a wireless network.
Free your hands: if you’ll be sorting things, assembling something, or taking your own picture, check out your environment to see if you can use a box, a ledge, or some other stable resting place to set up your phone so that its camera covers the area you need. Depending on your typical workflow, you may consider investing in a document camera stand, clip case or tripod case to keep your phone where you want it.
Protect your privacy: even premium apps that rely on paid interpreters might be subject to unsecured networks, data breaches, or human error. Think critically before exposing sensitive personal information to any internet-connected camera, ever.
Work smarter, not harder: take a moment to reflect on what you want to accomplish before you start, and make a little bit of a plan. If you’re looking out for some spices or sorting the mail, having a Braille labeler or some other system on hand will help you capture the information you receive so you won’t have to ask for it again. If you’re learning something complicated (like what button does what on the office copier, the layout of a new neighborhood, or a thirty-step origami project), taking notes or making a recording will empower you to read or hear the information whenever you like until you have it by heart.
Look past the marketing: blind and sighted people are still learning how to talk about these apps, and you are guaranteed to come across marketing materials and news stories that don’t exactly strike the chimes of freedom. When that happens, I’d recommend constructively engaging with the content, whether by posting a comment, dropping a line to the developer, or using social media to tell your own story about visual interpretation. Don’t abandon the tool just because you found it in a tacky package.
As with any technology, apps for visual interpretation come and go. Here are my top four sighted sources right now.
TapTapSee
How it works: snap a picture or upload one from your camera roll, and a combination of machine vision and crowdsourced web workers will send you a quick description. Typically, your answer arrives within twenty seconds and is short enough to fit on a fortune cookie.
When it shines: for the simple things. TapTapSee is great at identifying products and describing photos in brief. I use it on a daily basis to sort and label mystery items in my home and office, get real-time feedback about the photos I’m taking, and double-check that my pen has ink and my handwriting is legible. TapTapSee descriptions are text-based messages that can be read with magnification, speech, or Braille.
BeSpecular
How it works: take one or more pictures, or upload them from your camera roll. Type or record a question, and listen for text and audio replies to come rolling in from sighted volunteers over the course of twenty minutes or so.
When it shines: for rich detail, diverse opinions, and a nuanced understanding of what different people notice when they look at an image. I use BeSpecular to ask for detailed descriptions of clothing and jewelry, ideas about what to wear with what, guidance in picking the “best” photo from a set, and impressions of photos and objects that are important to me. Once I’ve heard five or six different takes on the same image and question, I can find the patterns of consensus and divergence among the responses and arrive at my own informed understanding of the image. BeSpecular finds a happy medium between the brevity of TapTapSee and the live connection used by other apps. There’s something special about BeSpecular’s format of long-form questions and answers. Outside the rhythm of a live conversation, BeSpecular answers almost feel like postcards from a sighted correspondent passing briefly through your life. They’re often full of detail, personality, and emotions like surprise and humor. Once, while delayed on a train at Union Station in Washington, DC, I asked BeSpecular to relieve my boredom by describing the scene outside my window. One respondent sent me an audio reply that explained, in a tone that was equal parts delighted and chagrined, that I had unfortunately sent her the most boring view she had ever seen. It was one train car, an empty John Deere forklift, and a cloudy sky.
BeMyEyes
How it works: connect to a sighted volunteer who speaks your language and have a conversation about what they see through the lens of your camera.
When it shines: for exploring, sorting, and troubleshooting. Every time I arrive at a new hotel, I check in with BeMyEyes to take the decaf coffee pods out of play, sort the identical little bottles in the bathroom, and learn the thermostat and media controls. I also use it to find out which food trucks are parked on the streets near my office, decipher mystery messages on computer screens, and grab what I need from my local bodega. Since BeMyEyes is powered by volunteers, I try to make the interaction upbeat and fun and let the person I’m working with decide whether they’d like to bow out of a long task after a certain amount of time. There are just over half a million sighted volunteers and about 35,000 blind users currently registered with the service, so you can call as often as you like without fear of bothering the same person over and over. The system will always connect you to someone for whom it is a reasonable hour, so Americans calling late at night or early in the morning will be connected to wide-awake people in Europe and Australia. Since the volunteer base is so large, you’re likely to get through to someone quickly even when lots of other blind users are connecting.
Aira
How to pronounce it: it’s a hard I, so pronounce it as “Ira.”
How it works: use your phone’s camera or a Google Glass wearable camera to connect with a live agent. Agents can access the view from your camera, your location on Google Maps, the internet at large, and your “Dashboard,” which contains any additional information you’d like placed there.
When it shines: for tasks that are long, context-dependent, or complex. An Aira agent can start from any address, use Google Streetview to find a nearby restaurant, glance at online photos to clue you in to whether it’s upscale or casual, suggest and explain the best walking directions to get there, read the daily specials when you arrive, and show you where to sign and tip on the check when you’re ready to leave. Agents have watched and described completely silent YouTube videos with me so that I could learn origami models, counted heads in my local NFB chapter meeting, described 20 minutes of nothing but socks until I found the perfect sock souvenir, read online guitar tabs for me so I could write them down in my own notation, helped me pick out nail polish, and taken spectacular photos through my camera for my personal and professional social media accounts. Aira agents are great at reading handwriting, diagrams and product manuals that seem to have as many pictures and icons as words. When I can’t read something with OCR, Aira can almost always help.
Aira agents are paid, trained professionals. Most of them are unflappable, effective describers who are up for any challenge. Since you pay for their time, you should feel comfortable asking for what you need, being assertive about the type of descriptive language that works for you, and calling whenever the need arises.
Like any new technology, remote visual interpretation solves old problems and creates new ones. To use it well, we need to understand what it requires in terms of power, data, planning, and effective communication. We must employ it with sensitivity to our own privacy and to the legitimate concerns that people sharing space with us may have about cameras. Just as each of us makes different decisions about when and how to use a screen reader, the descriptive track of a movie, or a sighted assistant in daily life, each of us will have our own ideas and preferences about how visual interpreters fit into our lives. Blind and sighted people working together are just beginning to discover how to use language, software, and hardware in ways that employ visual interpretation to our best advantage. Collectively, we still have a lot to learn. The journey is long, but the view is phenomenal.