Google and Aira announce an AI visual interpretation pilot, and accessible kitchen appliances

Welcome to the twenty-seventh episode of Access On, the National Federation of the Blind's Technology podcast.

Episode

Listen to the twenty-seventh episode of the Access On podcast (Browser).

Or listen on your preferred podcast platform.

Timestamps

The show is segmented by chapter, making it easy to move between segments of the podcast if you have an app or player that supports chapters. Below is what's on the show this week, and when you can hear it.

  • Register for our four-hour podcasting seminar 0:00
  • Troy Otillio discusses the AI partnership between Aira and Google 2:12
  • More on accessible appliances 54:37
  • Tech tip, Victor Reader Stream 58:15
  • Closing and contact info 1:00:52

Transcript

Speaker 1: Live the life you want.

Speaker 2: Access On.

Jonathan Mosen: Welcome to Access On, the technology podcast of the National Federation of the Blind. At its Google I/O keynote, Google announced a visual interpretation AI partnership with Aira. Aira's CEO Troy Otillio describes the genesis of the partnership and what blind people can expect. There's more listener feedback on the complex task of choosing accessible kitchen appliances. And our tech tip features working with suspend mode in Humanware's latest Victor Reader Stream.

It's Jonathan Mosen at the Jernigan Institute in Baltimore, Maryland. Welcoming you to episode 27 of the podcast. And speaking of podcasts, we've got something for you if you're interested in learning about podcasting. Have you ever thought that you'd love to give podcasting a try, but just don't have a clue about how to get started? Is there a subject you're passionate and knowledgeable about that you want to share with the world? Are you a gifted orator who can capture people's attention with a monologue?

Are you interested in interviewing people either in the same room or across the world? We have a four-hour podcasting seminar. It's on June the 11th from 1:00 until 5:00 PM Eastern. I'm leading this as somebody with over 20 years of podcasting experience now, and it covers every aspect of creating a successful podcast. From technical considerations to crafting compelling content. And there'll be some other familiar NFB voices as part of this podcasting seminar as well.

If you'd like to register, it is absolutely free, and you can go to nfb.org/cena. That's nfb.org/C-E-N-A for our podcasting seminar registration. We look forward to you attending. We will of course serialize this and make it available on Access On, but the beauty of attending it live is that you are able to ask questions of the presenters. And I've no doubt that with a subject like podcasting, there'll be plenty of technical questions to be asked.

When any company gets precious airtime at a big keynote like Google I/O, it is a very big deal for the company concerned. So it must have been a proud day for Aira when Google devoted some time to showcasing a new partnership with Aira that leverages the latest AI technology to offer AI-based visual interpretation. To talk with me about this partnership and what we as Aira users can expect going forward, I'm joined by Aira's CEO, Troy Otillio. Troy, congratulations. This is a bit of a feather in your cap, right?

Troy Otillio: I think it's definitely a feather in my personal cap, and I actually think I've said it before, AI will disproportionately and positively impact people seeking accessibility tools, people with disabilities. Especially when you talk about visual information. It's good at condensing information and presenting it in a way that is personalized in certain ways.

But it is a feather in my cap, in that as a technologist, I look to exploit technology for the benefit of my customers, in this case the blind and the vision community and those who want to provide accommodations and services. And there we were. There I was at Shoreline Amphitheater, and knowing that we'd be part of the announcements and one of the few third-party companies to be included and also knowing privately what's been going on behind the scenes. So it's a long way of saying it was a great day and equally I look forward to the future. So it's a small feather that's going to turn into a massive benefit to everyone.

Jonathan Mosen: Five or six years ago, Aira was dreaming of a moment like this. There were presentations at CSUN, there were all sorts of discussions about how Aira was collecting a lot of data about the way that blind people live their lives and the areas where visual interpretation could be useful. And if you could establish sufficient patterns and you could have the machines that were powerful enough to work with those patterns, all sorts of things were possible. And it's like technology has now caught up with the dream.

Troy Otillio: Let's be real, we were early, and a lot of credit goes to Suman, the founder, Suman Kanuganti, who established Aira. It stands for Artificial Intelligence Remote Assistance. A lot of the money we raised from venture capital was based on the premise that AI would play a role. As some may recall, we attempted to develop, just like you said, AI that was based on the patterns and the data that we were collecting. Yet, it wasn't sufficient. The AI that we could produce at that time, mostly on our own, did not meet the needs, did not even come close. And then obviously generative AI has changed the economics and the efficiencies across the board. And here we are being able to leverage our data.

One thing I didn't think about or realize and what we changed is all of the data we collected prior to our Build AI program isn't used, frankly, cannot be used because of our privacy policy, we promised not to share with third parties. And if we are not going to develop AI on our own, which very few companies have the ability to build their own AI, they're going to have to build in partnership with the bigger players, and in our case, Google or DeepMind.

And so I think the vision and even the learning experience we had still remained inspiring because we saw the promise. But now I think Aira and other companies frankly, are now in a position to make this idea a reality. Of course, we have our own thought on how we're going to do it, but I know other companies even outside of accessibility, are all looking to see how they can harness AI to improve outcomes for their customers.

Jonathan Mosen: When you get something announced at a significant event like Google I/O, which is one of the big tech events on the planet every year, there's a lot that goes into this and there would have been a lot of relationship building. Can you share a bit of the story behind Aira's collaboration with Google and what got you to this point?

Troy Otillio: I don't think I've told anyone this publicly, so this is maybe a first. You might go, "Well, gosh, how did that relationship come to take place?" And it really comes down to individuals with passion and interest. And so one place this got kicked off was from a former employee of Aira, Sarah Conrad. She was in marketing at Aira in 2019, and Gregory Wayne, who works at DeepMind, one of the AI leaders, thought leaders, and someone who is really behind the Astra project, they met and they shared information.

And Gregory kind of understood what we did and understood the potential for creating a specific model for visual interpreting. Because Astra in general is based on multimodal, which is to say interactive multiple media types and a proactive agentic agent, which is to say not just reacting to what you ask, but also understanding larger goals and being able as an AI to participate and be proactive.

So anyway, those two get together and Gregory ends up reaching out to me to say, "Hey, what if we do something together?" And that really goes to the credit of DeepMind. If you look at what DeepMind has invested in over the years, there's a lot of projects, whether that's improving the way scientists work or protein folding, they're much aligned with Aira. They're looking to not just serve commercial outcomes. Certainly there's commercial outcomes, but especially as a research organization, they're there to improve the lives of everyone.

And so Gregory reached out to me, and that started the exploration and that was well over a year ago, and that evolved into a close working partnership that continues today, that up until Tuesday has been hidden. If people know about Build AI, and we talked about, "Hey, we're building AI with you, the community, and if you want to opt in and provide your sessions, we're providing it to an unnamed third party."

So now you know who that was, it was DeepMind, it was Google. And it's been a great collaboration and continues to be a great collaboration and they're really an awesome company to work with overall because they're patient and they're appropriately cautious, and at the same time, they think about the mission and they're in it for the longer term perspective. We have a very strong partnership, and I'm also reminded, it's easy for me to know this, but I remind others, we actually leverage quite a bit of the Google ecosystem.

Our interpreters are using a Google map to see where people are when you're navigating. We happen to use a platform called Flutter, which allows us to write software once and deploy everywhere. We touch a lot of parts of Google, but this is a very unique and special relationship because of what AI represents for Aira, for the people who use it, and to fulfill this vision of having an additional choice on how you get your information, whether that's an AI or a human.

We're not there yet today. We're not announcing that this is commercially ready. We're in a testing phase and an evolution phase, but it is the start of a journey.

Jonathan Mosen: Let's talk about that journey specifically now and get into the details of what users of this technology can expect, and we'll talk about rollout in a bit. What actually is the product? What can you do with this today?

Troy Otillio: First off, today we've created a wait list, a sign-up list. You can just Google Aira and wait list and I think you'll actually hit the first link. But there's a wait list, there's five questions we ask you and we put you on a wait list. And now you're asking, okay, let's say you're taken off the wait list and you're put into what Google calls a trusted tester program.

They have trusted tester for all kinds of products that they produce, and there's a trusted tester program around the AI visual interpreter powered by Astra, which is their kind of AI model that is this proactive interactive experience. And so now you're asking, "Okay, let's say I'm on the wait list and I'm pulled off. What is it going to feel like? What is it going to look like?" And I'll do my best to describe it.

But simply put, your Aira app, because you're now in the program, will have an additional button, if you will, below the call button. So you can still call Aira and whether you're paid or free, you can still hit that button and be connected with an agent. No AI is involved. You still have access to the image AI, which we've been calling Access AI, but it's that same AI that Be My Eyes, Aira, Seeing AI have, where you can upload an image and have it described by the AI. That's still in play.

But now you have this new button that says call the AI visual interpreter, and you click that button and lo and behold, with a very consistent wait time because it is powered by compute, after a second, you're going to be greeted by an AI voice saying, "Hey Jonathan, what do you want to do today?" And you're going to talk to it just like you would a human visual interpreter.

You're going to say, "I'm looking to have something identified," or "I'm trying to assemble some Ikea furniture," or the list goes on. And you're going to start interacting with that agent and it's going to sound and feel a lot like a visual interpreter. Of course, it's different. It's an AI. And so there's a bunch of nuances that at least I observe right now that are different.

But the collective goal we have together with DeepMind is to produce something that is on par with the ability of a professional human interpreter across tasks that it's been trained on. So you're going to be interacting. You'll also know that through the trusted tester program, there's a one-to-one Aira visual interpreter silently listening and watching and observing. And why is that visual interpreter on the call with the AI? Well, it's a couple of reasons. One is we are providing feedback alongside this interaction so that we can improve the AI.

So we're looking for weaknesses or places where it could be better. And we're also there to intercept or allow the individual to escalate. So if there's some kind of safety concern, like the information the AI is giving, it's hallucinating and it's perceived to be a problem, the agent will escalate. If the individual is struggling with the AI and it's becoming frustrating, we certainly don't want that and we certainly want the tester to come back again.

So the agent may escalate, or if we can't perceive that the frustration is there, the user or the tester or the explorer can also say, "I'd like to escalate," or I actually don't know what the verbal command is. So that's what the experience is. You're going to start with the AI, you're going to do what you would do with the visual interpreter up until the point that the human needs to take over if that in fact is needed, and off you go and you complete your task.

I should note one other thing, which is the AI is not intended at this point to support outdoor navigation. We're asking testers not to use it for outdoor navigation, even though it can perform that task because of its general training, but explicitly we're saying not to engage in outdoor navigation at this time. And you complete your call, you rate the call, and there might be some additional follow up, some additional inquiry from the combined team. And that's going to take place starting now and through the foreseeable future as we expand the pool of trusted testers.

Jonathan Mosen: Often an Aira agent will go out onto the web and do some additional research. If there's a user guide or something like that, can the AI do that?

Troy Otillio: Yeah, let's talk through all the known limitations, and I talked about navigation. Well, I'm going to mention now, we collectively do not support computer-based tasks. So a large portion of our calls are computer-based, and I guess I need to refine it. The AI agent is not capable of taking action on the web on your behalf. It certainly can do research in the sense that as an AI model, as the Astra model even, was demonstrated yesterday, it has the full worldwide knowledge of what it's been trained on.

And in addition, built into the AI, and this is not unique to the visual interpreter model. It is a property of most AIs including GPT, but I think I saw yesterday that Gemini looks like it has some strengths now that I didn't until I went to that presentation yesterday, I didn't realize how far ahead they are in some places.

So it does have access to online information, so it can do the research. Maybe not in the way an Aira agent might do the research interactively. Frankly, part of that remains to be seen because the AI is deep and complex and part of the testing is to see how it performs. And then finally, we do know you can share your screen with the AI and the AI doesn't differentiate between the camera feed and the screen sharing. They're both just video feeds.

So it works that way. And frankly, that was a surprise. We hadn't thought about that use case. And lo and behold, it just works. It works to the extent the AI can interpret and proactively engage you, achieve the task at hand. But outdoor navigation and remoting into your computer is not supported, and acting on your behalf on the web, which some people do, is not supported.

Jonathan Mosen: You did expressly say outdoor navigation was not something you wanted to support at the moment.

Troy Otillio: Yes.

Jonathan Mosen: Does that mean that if you're in an airport or a hotel and you're looking for something specific, that you could try the AI in that situation?

Troy Otillio: Yes, you can. And there's an important point to make here, and I want to be clear on it and it's nuanced, which is Google is especially conscious and concerned always about safety. So if you read anything about their approach to AI, as with a lot of companies, but I've learned a lot about Google and DeepMind's view on AI, they very much want to ensure safety, which goes to trust.

There's even a point of view I think could be unique at DeepMind, which is to not even name an AI with a human name, not to try and misrepresent AI as a human. And so safety and being conservative about what AI can do led us to both decide that outdoor navigation is not the use case we want to attempt or try or start with.

Indoor, I think represents less of a safety risk in a more controlled environment. So indoor navigation would be on the menu. I should also point out that in this testing phase, we may change or influence what types of tasks we want people to take on because we want to focus and improve as quickly as possible.

And sometimes by narrowing on certain use cases, we might find more success versus trying to solve everything at once. But at the moment, we're in that early, early phase of deciding where are the biggest weaknesses and where do we need to invest? And it's all just starting today.

Jonathan Mosen: On the one hand, it could be perceived as commendable that Google's taking a cautious approach, but on the other hand, it could be perceived as patronizing, because blind people are ultimately responsible for their own safety. And using their blindness skills beyond visual interpretation, they have ears that hear and a white cane that is in front of them. We could be held back if these big tech companies take a conservative approach to what they think blind people need to be protected from.

Troy Otillio: I think that's true, and I think I can bridge the two parts, right? Large organizations, they have so many areas of focus and they have general policies. And I think as those general policies are applied, this very nuance, like we've talked about, Jonathan, you and I, independent of Google and others, like the censorship if you will, or the limitation of what images can be described.

While logical at some level, it doesn't recognize, as you said, the needs and the independence of people who are blind and low vision. And so part of Aira's job is to help raise the awareness. And there's plenty of people within Google who are connected or even part of the community, yet some of these approaches and policies are a general form of an approach, right?

Start with safety in mind. And yes, as you know with Aira, we are not a safety device and it's really you have all the skills you need to be safe and do the things you want. At the same time, translating that to a larger program with principles around core concepts, you get into conflict about that. Outdoor navigation is one of the most complex things that even Aira does.

So forgetting a safety concern, we are also narrowing on I think simpler use cases to make sure... There's just some basic stuff. Like we're having a two-way conversation, you instinctively know when you can interrupt based on my pause, based on a lot of things, just even that two-way dialogue, tuning that, tuning how proactive the AI should be, tuning even some of the language before we get to more complex things like outdoor navigation. But certainly your point is taken as something that's been shared, and I just don't want to suggest that Google isn't listening or hearing, it's just there's a bigger policy in play there that extends beyond just this use case.

Jonathan Mosen: Right. And the National Federation of the Blind has a good cordial relationship with Google and will have a discussion with them about this as well. But I mean it's somewhat arbitrary, right? Let me give you a scenario. Let's say that you are perusing a farmer's market and you've got lots of produce there and you want to go from table to table and have the AI tell you about the different things that are at each table so you can decide what you want to buy. If it's outdoors, presumably you're not allowed to use it, but if it gets rained out and they move it indoors, you can.

Troy Otillio: Yeah, and I think you'll see also I'm talking about the big picture goals. We're going to be very intimate with each user and how they work. You have a visual interpreter on every call. So I do think we'll be able to support nuanced exploration of the limits. And as you described that one, that's one that could be supported.

Where the caution comes in is crossing the street. Whether or not it can be communicated that again, someone who's blind or low vision has complete capacity to be safe on their own, it's just where do you start and where do you expand to? So I know and working with the DeepMind team that it's going to be a very deliberate hands-on approach to exploring use cases, starting with the simplest, making sure we nail the basic interaction and expanding to the more complex. But yeah, you bring up a very good example.

Jonathan Mosen: Just pursuing this a little bit in terms of the farmer's market scenario with another angle I want to take. There is some technology now built into ChatGPT, which is of course the Gemini competitor. Where you can run your camera and it will describe what's going on around you. But one thing it will not do is you can't ask it to look for something, and the ChatGPT as it currently stands, will not keep on scanning the view through the camera and when it sees something saying, "Ah, I found it now."

So let's say I'm looking for the bananas and I'm at this farmer's market and there's table upon table, row upon row of produce, and I just want to walk around and have the AI tell me, "Okay, I see the bananas now." My understanding is that the Google implementation can actually do that. Is that correct?

Troy Otillio: That is correct. That's a very exciting use case. I can reveal that, Tim Elder, he's the president of NFB. I know you know him. Him and Sean Dougherty from SF LightHouse were at the event yesterday with me and they got a sneak preview, and that is literally one of the tests. That we just happened to be at lunch and Tim wanted to know where the recycling was because he's a Californian and he really wants to recycle his LaCroix can, which we also used the AI to describe, and that is absolutely a use case. So it really gets into this general concept, among other things, of memory.

And I'm not an AI expert, I'll just tell you what I understand, but the ability for AI to remember things and then separately for you to give it a task so that it's monitoring without you constantly prompting, like you said, is a design goal and something that even the outside visual interpreter model, which is again the model that is trained directly on the Aira sessions that many of the explorers have contributed, is absolutely part of that use case.

And then that gets into some nuance of what is that interaction look like? How do you as an individual get confident that it's continually scanning? So you said, "Hey, look for bananas," and you were walking for a while and you didn't hear anything. You might ask AI, "Are you still looking for bananas?" And it would probably say, "Yes, I'm still scanning, I have seen none." And you might ask yourself and compare that to a visual interpreter.

Well, how often would a visual interpreter kind of remind you, I'm still looking for bananas? So those are some fine points of tuning and learning that I know will get better over time. But the property of memory, of remembering what you asked, the property of remembering where things are. In fact like we demonstrate that if you put your house keys down somewhere, the AI will remember that, so such that later you might say, "I can't remember where I put my keys." And it can not just respond, "It's on a table." It can respond, "It's on a table in the kitchen."

How did it know it's in the kitchen? Because AI is good at detecting and categorizing a scene. So that kind of gives you a hint at some of the use cases. But the truth is, we don't really know yet what the strengths and weaknesses are going to be, because to date there's been mostly model development and not model testing.

So it's why I think you'll hear Aira and DeepMind at the same time, call it a conservative or cautious or sobriety, that it's not going to come out of the box and be this perfect replica of a visual interpreter. It's going to come out and maybe there's some use cases that are going to be remarkable and some that are going to be less than remarkable. And is it going to take months or is it going to take years? That we don't know. But it is a starting place and as every AI scientist I know likes to say, this is the worst it's ever going to be.

Jonathan Mosen: I spend a lot of time talking to companies about AI in my role and we spend a lot of time doing ethical discussions, sort of deep philosophical discussions as well as technical. Let's say that we're going to a reception and you are there and Everett from Aira is there and Jenine from Aira is there. Can I give pictures of you to the AI and say, "This is Troy, this is Everett and this is Jenine. And when I'm in this reception and I'm walking around with the visual interpreter running, I want you to tell me when you see any of them."

Troy Otillio: I'm smiling if you were to look at me, I love this question. So there's what Aira believes and then there's the policies of any company-

Jonathan Mosen: Right. Right.

Troy Otillio: ... and what have you. So I believe, what I know you believe, what I like talking to Anil Lewis from NFB on this as well, if a sighted person can have a memory and use it to identify things, so should someone who doesn't have vision, right, can't do that. And so the AI should mitigate that, right? I think that's a right.

I don't think anyone would disagree with that if you narrow it to especially that use case. What the challenge of course is that in a broad picture, there's lots of concerns about privacy and I know you know this story, which is there's a lot of fear around facial recognition and what that could mean for individual rights and privacy. And so those two come to conflict in the policy that the AI vendor has in general.

I don't know the answer to that one frankly. It's really literally that new, Jonathan. I'm going to try that after we hang up. I don't know if it's going to memorize a name just like it could memorize a set of keys on a desk. It's a really great question. I do know, in general, that part of the challenge with any AI is extending the memory window. What is the length of time? How good is that memory?

Which is kind of funny to talk about because as humans we have limited memory. We can't remember everything except for a few select people who are uniquely endowed. But with AI, it actually has another limitation at some level from a memory perspective at the AI.

Can you have a bank of photographs that it's always referencing and therefore always memorizing? That would be a more explicit feature versus general AI. But all to say, it's my position and I think it's influenced or comes from working in this industry, that is a right that should be possible. But I don't know the answer as it relates to the current version of the visual interpreter AI today, but I can get back to you on that.

Jonathan Mosen: And what's the pricing model going to be for this?

Troy Otillio: You're way ahead. I don't know. We're in a testing phase. Certainly all compute costs money, right? Yes, Amazon, Google, everyone has free tiers of compute. If you're a developer, they do give some of it away. It's a little bit like five minutes free with Aira, and then you've got to pay because obviously Google has costs, Amazon has costs, and we haven't decided how to price this and I look forward to engaging the community and even the access partners.

There's a whole question about how will access partners look at AI? Will they accept it? Our job is to make it as secure, private, trustworthy as visual interpreting. And we could talk a little bit about how our plan is to always have an integration of the human professional interpreter and the AI just like we do with the access AI, but the pricing will be a function of what the internal costs are, and Google has not yet released pricing and we're very much in that early phase, and I don't think we'll even have an idea in the next six months on that and maybe longer.

It all depends on how quickly this evolves and obviously Google DeepMind has to decide what it costs and then what they need to charge to be sustainable, make a profit that they need. And so it's a long way of saying, I have no idea. I have some theories about what we do, but it just equates to, we would charge whatever we need to make sure we are sustainable and have enough profit to grow just like anything else. And I do expect it to be far less expensive than human labor.

Human labor obviously more skilled overall at this day and age. There's a cost there that we pass on and we'll likely do the same. But how we package it together, do you get a plan as an individual that includes AI and humans and is it minutes-based and whatever? Have no idea. And I look forward to engaging folks maybe even at NFB this summer and some early questions on that. How do they perceive the value, assuming it's working very well. And so that's an open question.

Jonathan Mosen: These things are using a voluminous amount of computing power to do what they do. And I did have a laugh. I read somewhere recently Sam Altman saying that, "If people would just stop thanking the AI." Just all the thank-yous that get sent, that would save a massive amount of computing power. So it's funny the way people respond to these things. What is the hallucination factor like with this? Or is it too early to say because-

Troy Otillio: It's too early to say.

Jonathan Mosen: Because sometimes AI just tells you something so convincingly. I'll tell you an interesting story actually. I was going on the road a couple of weeks ago and I have one of those cool Anker power banks that I can charge all sorts of devices with and I carry it in my technology backpack, and I wanted to make sure that it was fully charged. So I put the little screen that it has in front of my iPhone and I started off with one of the AIs and it very chirpily said, "Congratulations, you're good to go. It's fully charged at 100%."

I had a sighted person in the room with me and she said, "No, it isn't. It's at 82%." So then I thought, all right, I'm going to go on a bit of an experiment here. I tried all the different AI services to ask it what percentage the battery level was at. The only one in the interest of full disclosure that got it right was the Envision Ally AI. The other ones all got it wrong, and of course I could have gone to the Aira agent for confirmation, and I would've if I wasn't trying that experiment. So what I'm getting at is, these things are still at a point where they can sound incredibly convincing and they're just flat out giving you false information.

Troy Otillio: I have two comments on that. One is I expect the hallucination rate, maybe expand. Hallucination is I've got a very fixed question based on a very fixed image and I've gotten back information that's either a false positive or a false negative. And as you know, we did a white paper because we validate sessions with professionals, blah, blah, blah. We know what the hallucination rates are, and depending on the image, it varies. And depending on what you would consider hallucination, obviously you're asking for what is the percentage. If it gets that wrong, it hallucinated. If it told you, and I'm making this up, your blue power bank is at 80% and it was a black power bank. Did it hallucinate? Well, it did. Is it material? No. There's a lot of nuance in there.

But now you're talking about a multimodal AI that is scanning and using the ongoing video input. It's thinking about your task, it's also advising you or interacting, sometimes proactively, "Please rotate the can left," because you told it to look for ingredients, and the sighted person might have said, "Well, it should have rotated the other direction because I could see that the ingredients were if you just would've rotated it counterclockwise."

So there's a much broader range of what the definition of hallucination will be. That said, I think the hallucination rate will be, this is just Troy's opinion, might actually be higher to start in certain circumstances because it is more complex. I've been using it and I see mistakes and hence why we're testing. We're not even saying it's ready for prime time or we're not saying it's ready for any general use.

So I think it's going to start off here in the tester program of being, quote, "high." And then the goal is to reduce that as quickly as we can. Whether that's description, whether that's providing proactive feedback, whether that's finding your bananas, Jonathan, right? There's got to be situations where it's not going to spot the bananas. That's a false negative.

So we'll find out, and one thing I'm confident about, uniquely confident about Google I suppose, is they have arguably the best and brightest working on, I learned yesterday that the Gemini has started winning in lots of benchmarks relative to overall generative AI, and I think it's great to see that the industry's competing there, right? That's always a winning formula for all of us. But I think the hallucination rate will be definitely high to start and will decrease based on success and testing.

Jonathan Mosen: When testers start to run this, will they be able to use the Ray-Ban Meta integration?

Troy Otillio: No. And we might segue at some point into the announcement Google made yesterday about their XR platform, which is their-

Jonathan Mosen: Right.

Troy Otillio: Yeah. Meta, as we all know right now, is a closed platform. So congratulations to Mike and Be My Eyes. They are deeply integrated into the Meta Ray-Ban. We've requested similar access and at this time we don't have it. And it's frustrating. It's frustrating because explorers are frustrated. They want that first-class experience. We've worked around it. You can make a call, but you're running over the WhatsApp platform and because you're running over WhatsApp, that video feed, we can't feed it to anything.

We can feed it to a human because they're on the WhatsApp on their computer and they're seeing the video coming through the WhatsApp application, but that video stream can't be sent to the AI because we don't have the programmatic integration. We're not part of the Meta platform, which is why I'm really excited about Google's approach with XR.

They're building an open platform and they will be working with companies like Aira in the coming quarters and not just Aira, it'll be anyone and everyone who wants to integrate with their, it's not just their glasses as you've heard or you maybe heard, they're working with one or more glasses manufacturers who can produce this smart glass experience on hardware that they produce. So Google is producing as a reference platform so that others who are presumably and are expert in actually the construction of frames and all the nuance that goes into the construction of the glass, which you know, there's a lot to it.

There's the width of your face and your nose. I've got a big Italian nose, so glasses fit nicely on top of my nose, but there's form factors, there's fashion, there's style, and what I'm excited about is Google announced that XR will support both an AR version with a display, which is great for people who want a display, but for those who don't need a display or want the cost or the battery impact of powering a display, they also have a design that will be implemented without a display. And so it's a long way of saying no, the Meta is not compatible with the Aira AI-powered visual interpreter and until Meta opens their platform, that's not going to be possible.

Jonathan Mosen: Do you have some concerns as CEO of Aira about being boxed in though? Because Ray-Ban Meta has become a phenomenon. There will be another version of that I'm sure at some point that's even more capable. And then there are consistent rumors now coming through the tech press that Apple is about to launch in the next year or so, a glasses product because they see what has happened with Ray-Ban Meta as well, and you can imagine that there will be very deep accessibility integration and Apple's own AI efforts involved in those glasses. You don't presumably want to be pigeonholed into being a Google only product with this.

Troy Otillio: Especially on glasses. Look, we want Aira everywhere. Google's a great partner. We don't have an exclusive relationship when it comes to glasses, and you know this from the accessibility field, we are largely dependent on the policies and go-to-market mindsets from these larger companies. And Apple's got a great brand regarding accessibility, yet, and this isn't about Aira, it's very much a closed ecosystem in a lot of ways, right? It's easier to work with Apple products on iOS than it is often with other products. So that's not changed.

Our job and to the extent others, and the audience wants to help, our job is to make sure Apple is aware of the benefits of enabling a company like Aira or any other company to integrate and provide accessible experiences on their platform. And I think Apple is a great pioneer of accessibility and I'm going to trust and hope that as they evolve their wearables and glasses, that we can be included in that and provide the service that we have for people who share the need or the interest, I suppose, in Aira and the interest in using Apple products. So it's a long way of saying, yeah, I worry about being pigeonholed all the time, and I in part depend on the voices of the community to remind these vendors about what open platforms mean for the community.

Jonathan Mosen: You are familiar, no doubt, with the Apple expression about being Sherlocked, where essentially Apple lets a product category mature and then they come along and they do something that completely upsets the third-party ecosystem. And I guess it is possible one day that Apple introduces a wearable with very good AI in real time describing the world around you, and they may well be able to afford to forward those calls on to a human when you ask.

Troy Otillio: Every company faces similar challenges. I think we could have even a deeper philosophical question. People ask me, "Troy, aren't you afraid that AI is going to eliminate the need for, quote-unquote, 'Aira'?" And if that happens, I mean certainly we can't stop it nor would we or can we control it.

Yet, our vision, our mission is to provide the very best visual interpreting that is trusted, is efficient, meets your needs, whether that's through AI, whether that's through humans, and as long as there's a need and room for someone like Aira to tailor and integrate technology, because in a lot of ways, if you look at we're not the ones building this visual interpreting model. DeepMind is in this case, but what we're doing is we're integrating it into our app and we are synthesizing it with humans. We are integrators of emerging technology and building with the community, right? Not presupposing that we know what to do and how to build it.

So in general, all companies like, gosh, another tangent. I was sitting in the audience and I didn't realize how far AI and AI at Google has come to do things like create movies. Literally, I was sitting there watching one of their demos showing how easy it is to go literally from a hand sketch plus a dialogue, plus some descriptions of characters, to a generated movie that visually, I'm going to tell you, looked photorealistic and the voices were realistic.

And so gosh, I was thinking about creators, do they feel threatened or what's their role? And I don't have the answer to that. I mean, I have my thoughts. But certainly the future is undefined with the emergence of AI and my goal is to have a tiny hand on the steering wheel and do the best for the community, which includes explorers, it includes businesses and organizations who deploy visual interpreting.

Jonathan Mosen: I think the cool thing is too, that in this case, the blind community has donated, if you will, in exchange for minutes, real world data about how blind people engage with the world, and that's very valuable.

Troy Otillio: It's very valuable. And what I've seen at DeepMind, I think it's the same thing that's happened at Aira when people join and they start working here, you have assumptions as a sighted person and DeepMind is primarily sighted folks there, but you have assumptions about what it means to be blind and then you immerse yourself and you talk to folks, but there is no substitute in part to seeing literally and hearing what happens. So it's not just the AI that I think has been trained here. I think the folks working on this project, new to this experience, have grown quite a bit of understanding from that observation.

But it is, I'm very proud of this because it's quite a contribution the community made and my goal was to make sure they were very conscious about what they were doing and they were opting in and it was very above board and so that we can do more of it too, right? I don't think this is the end of gathering data for the benefit of building better tools, but certainly it puts us in the game of now deploying it and testing it and refining it.

Jonathan Mosen: Just to be clear, I take it there is a material difference between what I would get if I use this tool and what I'd get if I fired my camera up on Gemini Live?

Troy Otillio: It's a different experience. Admittedly, I have not pulled up Gemini Live and worked with it a lot, but some of it is the subjectivity versus objectivity. As you know with Aira, visual interpreting is by default objective. And the descriptions and the way that things are described might be different language that you would use with someone who's sighted, right? Whether they're using the term plaid or in the far left, or there's lots of ways that descriptions have been trained to be uniquely suited for the explorer.

Equally, the techniques for either operating in a physical space and other use cases, I'm drawing a blank, but all that training data is different than General Astra Gemini Live training data, and that has created a truly proactive interactive design with and for explorers or people who are blind. And in part that's what we're going to learn, Jonathan, is how much different is that model and experience than I'd say the model that's ultimately trained on general purpose information. But it is different. I mean, yeah, we can see it especially in the proactive part.

Jonathan Mosen: When will actual blind users start to get a taste of this and what sort of agreement will they be under when they do? Will they be able to talk, for example, freely about this on social media initially?

Troy Otillio: No, they can't do that to start. I think it's in Aira's interest. It might even be in the community's interest. We can debate that. But we're looking to gather information privately, discreetly in that trusted tester program. We don't have a defined end date because we literally don't know. Literally we're at the dawn of exposing this model to real world examples.

Certainly we've done some in-house testing with people who are blind, even some Aira users, Google employees or Aira employees. But in a real world setting, the testing starts today, so there's been some people pulled off the wait-lists, very few, a handful. Got to iron out a lot of the kinks and it'll expand.

So what are you signing? You'll have to read the agreement to be sure, but it's very much the same pattern as the Build AI. We're telling you that we're sharing this data with a third party and you consent to that. You also are able to opt out of sessions as the application allows you to opt out. If you did a session with the AI and for whatever reason you're like, "Hey, I don't want to share that."

You can do that just like you did with Build AI. And then as you surmise, you're also agreeing not to share or talk about your experiences at this time with others, while we evolve and get to a level of confidence that we can share this and preserve kind of the momentum. And so that's what you'd be agreeing to.

Another question you might ask is, where's the trusted tester program geographically supported? I'm going to say you should get on the wait list no matter where you are. I don't care what country, it's five questions to fill out. But what you'll see is that as of today, it's US only except in Illinois and oh my God, is it Illinois or Indiana? See, I'm, I'm going to get confused. One of the two. Or Texas is excluded and thus not Canada, not Australia, not New Zealand, but this is how Build AI started, and you might imagine we're eager to expand to get more diversity overall and eventually expand globally.

I think an advantage that AI has is that you can localize it in different languages much more efficiently than you can by hiring professional visual interpreters, because there you have to find people, and likely in country, to be visual interpreters. So I think AI has an advantage that its ability to communicate in multiple languages is kind of built into the nature of software and language. But hopefully that gives some ideas about what it looks like. But I again encourage anyone to sign up for the wait list in part because that also tells us where the demand is and will help us decide where to prioritize outside of the US.

Jonathan Mosen: For those in those excluded US states who don't know why they're excluded, what is up with that?

Troy Otillio: I don't know. You might imagine a lot of it has to do with the privacy laws in certain regions.

Jonathan Mosen: Yeah, I do remember Illinois coming up in a Be My Eyes discussion where essentially they were having a great deal of difficulty getting past laws relating to describing people.

Troy Otillio: I can't speak for Google. I can tell you that in looking at privacy, there's always this challenge of privacy versus innovation, right? And even like you talked about before, should the AI support outdoor navigation? Or we talked about should it describe naked people? Is that pornography and why shouldn't it describe it? There's a bunch of those kind of questions. Privacy is always well intended and I'm a big privacy advocate and I think people should have control over their data and that's reflected in our privacy policy.

But sometimes the laws are written in a way well-intended, that make for extra challenge to implement. This is my personal opinion on GDPR. I love GDPR. Think it advanced privacy and it's great for the individual. As a vendor of technology, it puts extra burden as you have to deploy often your service in country, in region, and at the end of the day, that costs a lot of money and it adds to operational overhead.

So sometimes there's just a practical decision of to support countries where the privacy laws require you to do more, you have to make that business trade off. And as a company that lives and dies on its own revenue and profit, then you get into that decision, which doesn't really help the person who's in the region that's excluded.

I get that and my best answer is, we'll get there, got to give us some time, and we have to be able to scale to that. But I believe at the core is likely the privacy law as written creates some undue extra challenge, that if we were to try and meet that we'd have to delay the overall program. So you have to trade off, well, do you delay for the few? And we're making the decision like, no, we want to get started because it's a long journey to test and improve the AI. So let's start where it's the most practical.

Jonathan Mosen: We look forward to seeing what happens next with this. Some people can go and search for that link. I will try and put it in the show notes to make it easy for people who review the show notes. And we-

Troy Otillio: Remember our wait list.

Jonathan Mosen: Yeah, yeah. We'll keep in touch on this and congratulations on getting this done, and I'll be very interested to see how it goes.

Troy Otillio: Well, I look forward to seeing you later. In 55 days, I think, the NFB Convention takes place, per an email I just got this morning, which in some ways is frightening. That's just right around the corner, but I'm excited to see where we'll be then and we'll be in touch. And I want to thank all the explorers out there. Honestly, without the advocacy and the usage of Aira Visual Interpreting, we're not being presented on the main stage of Google as a partner, right? They really are asking the question of who should they partner with and why?

And the answer came back as we have great support from the community and among other companies, we're a company of integrity and it's demonstrated in the support from the explorers. So thank you for having me on the show, Jonathan, and I look forward to sharing more as it uncovers. You can see we're at a very early stage and there's a ton of unknown, and sometimes that's the most exciting time.

Jonathan Mosen: Always a pleasure. Thanks Troy. That's Troy Otillio, CEO of Aira. We have some time for some listener contributions before we go, but before we do that, I want to remind you that you can make a difference with the National Federation of the Blind's Lead and Drive Give 25 in '25. When you give 25 dollars or more between May 15 and July the 1st, you're entered into the Give 25 drawing. Each $25 increment is a chance to win.

Your support helps us continue to lead courageously and drive lasting change for blind people across America. You could win prizes like round-trip transportation for two to the 2026 NFB National Convention, hotel accommodations, registration, banquet tickets, or 2,025 dollars cash. Oh, do you want a chance to announce our Give 25 winner at the Convention Banquet? Become a Federation challenger, ask friends and family to make donations and indicate that you prompted their giving.

We'll have drawings for prizes at Convention for our challengers, and if you are the challenger who prompted the most gifts, you can announce our Give 25 winner at the banquet. But that's not all. Be one of the first 100 people to give 100 dollars or more and you'll receive a pair of AfterShokz headphones. I know that'll be of interest to a lot of Access On listeners. And thanks to an anonymous donor, up to 25,000 dollars will be doubled. The annual Give 25 Drive supports the Kenneth Jernigan Fund, SUN Fund, tenBroek Memorial Fund and the White Cane Fund. You can choose a fund when you donate. To enter, visit nfb.org/give25donate. That's nfb.org/give25donate. You can call 410 659 9314 extension 2430. That's 410 659 9314 extension 2430, or you can send a check to National Federation of the Blind and mention Give 25 and the fund in the memo. The winner will be announced July 13th, 2025. Thank you for your generosity.

This one is from Pappy Skutchan. Now, I don't know whether that's Larry or somebody else, but it's a good email and it says I am responding to Rick Roderick's inquiry about an accessible dishwasher. While I'm certainly no expert with accessibility in appliances, the pain and frustration are certainly something we have all felt when just trying to get on with the daily business of living. A few experiences with kitchen remodeling and a move to a new home plunged me into this unenviable circumstance twice recently.

In the first instance, we selected Samsung appliances, thinking they would work with the SmartThings app and for the fact that the induction cooktop used a cool physical round puck that magnetically attracted itself to one of four places on the control part of the surface. You could then twist the puck to raise or lower the power level for the corresponding burner. The problem was that in addition to being unpredictable about how much to turn it or how fast or slow to twist, the app did not update the power level indication in a timely manner.

Sure, you could drop a little water on the pan or feel how hot it was, but it was still very annoying to use. Of course, you cannot try these things out in the store before you buy. The Samsung dishwasher did not even connect with the SmartThings app. So we used the stickers to mark power and start. A few months ago, we purchased new appliances for our home in Florida, and this time we selected LG.

One of the main reasons for this choice was the presence of old-fashioned knobs on the cooktop that actually have a tactile mark and that start and stop at the same predictable position. I am super happy with this method of controlling induction elements because a large percentage of element controls are on a touch screen on other models. The only other minor challenge is finding exactly where the burner is, but the element turns off and annoyingly beeps if you have the pan positioned incorrectly.

The oven is another story. Reasonably, I think, to use the app with the oven controls, you must set remote on the touch screen. Once I figured out a simple, reliable method of finding that part of the screen, it was then easy to control using the app, and it is evident that LG spent some time with accessibility. I wrote to LG and suggested they put such controls like remote and off in places near the edge of the screen where a blind person has a decent chance of finding them. The dishwasher among the LG appliances also works with the app, although I found that I prefer using the stick-on bumps to start the machine most often.

The app, however, lets you set modes, change sounds and all that other fun stuff sighted people get to do with their appliances. I would caution to ensure you get a dishwasher with Wi-Fi connectivity, if you want to be able to control it effectively for anything other than starting a pre-selected cycle.

Unfortunately, there is not a good way to make sure an appliance is at all usable before getting it into your home and they change so often. It is almost impossible to keep track of what works. I know that current LG appliances work with the iPhone app, but I have not tried them on Android. I wish I could be of more help and I wish even more fervently, there were an easy way to keep up with all this.

Rich Yamamoto: Hey everyone, this is Rich Yamamoto here. Last time with the Victor Reader Stream, I showed you how to shut down the device when the power button operation is set to suspend mode. Today I'm going to show you how to change the power button operation just in case you may have done this before and forgot how to revert it back, or you want to change it to suspend mode from the default of power off, because that's obviously what the power button will do. So I have my Stream here and I'm going to press the number seven key. This will take us to our menu.

Speaker 6: Menu. Import configuration.

Rich Yamamoto: So we're currently in the NLS BARD menu right now, so we'll press seven again.

Speaker 6: General settings.

Rich Yamamoto: And we're going to go to general settings. I'm going to press confirm.

Speaker 6: Language.

Rich Yamamoto: Which is the pound key, and I'm going to press the number four, also known as the left arrow.

Speaker 6: Power off options. Suspend.

Rich Yamamoto: And we have power off options. We have suspend. And if I hit confirm.

Speaker 6: Power off options. Power off.

Rich Yamamoto: It is now set to power off. So if I were to shut down the Stream now, it would shut down completely as opposed to going to suspend mode. If I set this back to suspend.

Speaker 6: Power off options. Suspend.

Rich Yamamoto: With the confirm key, and I press star, which is also the back/cancel key.

Speaker 6: General settings.

Rich Yamamoto: And I press four.

Speaker 6: Shut down now.

Rich Yamamoto: There's the shutdown now option that I showed you previously. So let's go back to general.

Speaker 6: General settings. Language. Power off options. Suspend. Power off options. Power off.

Rich Yamamoto: Press confirm to switch that back. And if I back out now.

Speaker 6: General settings.

Rich Yamamoto: And I go left.

Speaker 6: Online settings.

Rich Yamamoto: There's no shutdown option here, nor is there a suspend option. So there's not really a way to suspend from the menu. You can only suspend from the power button as long as the power off option is set to suspend, as opposed to power off. I will mention that when it is in suspend mode like mine was before, it will be slowly draining the battery.

You can get it to last for about six days, I think with this mode on, as opposed to a full shutdown where that isn't happening, where the battery is draining way slower than on suspend. But I will say in suspend, it still drains, but it's slower than if you were to just leave it on all the time.

Jonathan Mosen: That concludes this episode of Access On, the technology podcast of the National Federation of the Blind. To send in a contribution for a future episode, email us, attach an audio clip, or just write it down and send it to [email protected]. That's [email protected]. To keep up to date with Access On, follow us on Mastodon. [email protected]. That's [email protected] on Mastodon. To subscribe to an announcement-only email list about upcoming episodes, send a blank message to [email protected]. That's [email protected]. To learn more about the National Federation of the Blind, visit our website, nfb.org, or phone us 410-659-9314. That's 410-659-9314. And be sure to check out the Nation's Blind podcast right from where you heard this podcast.