Welcome to the thirty-ninth episode of Access On, the National Federation of the Blind's Technology podcast.
Episode
Listen to the thirty-ninth episode of the Access On podcast (Browser).
Or listen on your preferred podcast platform.
Timestamps
- Join us on August 26 for Innovations in online shopping 0:00
- From national convention, our presentation on innovations in AI 2:24
- Headset recommendation 57:09
- Tech tip, adjusting system or JAWS volume from within JAWS 58:02
- Closing and contact info 58:58
To send in a contribution for the podcast, you can attach an audio clip to an email, or just write it down and email it to [email protected]
Transcript
Speaker 1:
Live the life you want.
Speaker 2:
Access On.
Jonathan Mosen:
Welcome to Access On, the technology podcast of the National Federation of the Blind. On the podcast this week, highlights from our recent National Convention presentation on innovations in AI. If you're looking for a good headset for meetings, a listener has a recommendation at a reasonable price. And our tech tip looks at adjusting the system volume or the JAWS volume from within JAWS itself.
Hello, it's Jonathan Mosen at the Jernigan Institute in Baltimore, Maryland, welcoming you to episode 39 of the podcast. We're going to be doing a deep dive into AI in just a moment, but we also have an AI related event coming up. This is on August the 26 and it's looking at AI related innovations in online shopping.
AI is already making shopping better for blind people with faster product discovery, clearer information, and fewer roadblocks at checkout. So join us for another boutique. This one is called Discover the Future of Accessible Online Shopping, and it's on Tuesday, August the 26. It begins at 2:00 p.m. Eastern.
And in this session, you'll learn how to use innosearch.ai on the web and also over the phone to find exactly what you need. You can take advantage of AI features now built into major shopping websites to compare items, summarize details and aggregate reviews.
We'll also talk about using AI enabled browsers. This is a relatively new innovation that's really starting to come to fruition now, and you can use those to streamline searches, extract key information from cluttered pages and reduce friction in forms and carts. And we'll look at applying simple strategies to shop faster and with confidence.
So this is happening at 2:00 p.m. Eastern on August the 26. To register for the Accessibility Boutique, head on over to nfb.org/cena, that's nfb.org/C-E-N-A. When you get to that page, you will find a link for events and training. Choose that link and you'll be able to sign up for the August 26 boutique. We look forward to seeing you there.
As you've been hearing on Access On, National Convention, which this year was held in New Orleans, is a treasure trove of technology information. There's nothing quite like being there in person, and I hope you're already thinking about coming to our 2026 National Convention in Austin. But we'll do the next best thing and continue to bring some tech-related highlights from National Convention here to Access On.
And this week we are bringing you highlights of a session conducted by Karl Belanger, Kennedy Zimnick and me about innovations in AI since the last National Convention. And I'm sure you know that AI is moving at a rapid pace, so there was plenty to talk about. And we actually had to make tough decisions about what to leave in and what to leave out given the time constraints.
This was a packed session and I know that almost as many people were turned away as were able to get in. So if you were one of those people, I hope that this is some consolation that you can hear some of the key highlights from that very popular session.
Because a lot of people are familiar with ChatGPT, I'm going to just take a moment to set the scene a little bit in terms of going behind the curtain of these large language models, AI technology. Each service that we'll cover today, ChatGPT, Gemini and Claude, and several others that we don't have time to cover, invests a lot of effort in its large language models, or LLMs for short. You'll hear this term a lot.
And while people may have preferences about user interface and voice interfaces and various factors that can differentiate this technology, the large language models are the critical intellectual property in the world of artificial intelligence.
So what exactly is an LLM? A large language model is a sophisticated computer program and it's trained on vast collections of texts from books, articles and the internet. And by analyzing patterns in that data, it learns to generate human-like responses to questions, prompts, and conversations.
The technology behind LLMs is improving remarkably quickly. They can summarize complex information, they can translate languages and even mimic styles of writing with impressive accuracy.
As blind people, we've seen these models help us perform a range of tasks more efficiently and give us visual descriptions of all manner of things, but they are far from perfect and on occasion they still hallucinate, which is just a fancy word for making stuff up.
When an LLM hallucinates it can sound so convincing, and in fact, it's just talking absolute nonsense. And this happens because the model doesn't truly know the facts. It predicts what words or sentences are most likely to come next based on its training data, not on real-time access to factual information.
Deep Research is an exciting synthesis of a range of technologies and we'll be covering Deep Research shortly. While you can't completely inoculate yourself against hallucinations, there are things that you can do to get the best results possible. Be specific and clear in your request. The more detail that you provide, the less room there is for the model to invent things.
Ask for sources or citations when possible. It's not a problem to say, "Are you sure?" when you think that the LLM has got it right. And sometimes a simple "Are you sure?" is enough to get it on track if it hasn't got it right.
Try not to use ambiguous language and try to break complex tasks into smaller concrete steps. Even then, always cross check important information with trusted up-to-date sources. Treat the AI's output as a helpful draft or a starting point, not as definitive fact. When accuracy is critical, you should consult experts or official documents.
Now earlier this year we conducted a webinar on prompt engineering and that's a term that you hear a lot these days and it just means how you best talk to an AI to get the results that you want. And we've made highlights of that presentation available in episode 16 of Access On, which is the National Federation of the Blind's Technology podcast. Got any Access On listeners in the room?
Okay. So now let's talk specifically about ChatGPT. I personally find myself using ChatGPT more than any LLM, but that's a personal preference thing. While it's anecdotal and subjective, I do find that it seems to hallucinate less for the things that I use it for. It's not perfect, but it seems to hallucinate less.
For example, recently with the searching of the web functions in ChatGPT, I went on a vacation, went to Las Vegas, in fact, and I wanted to find out who was performing there while we were there. ChatGPT gave me accurate information, but when I tried the same thing with Google Gemini, it swore that people were performing there who actually weren't. And I found that quite an interesting practical experience.
I like the clarity of the output of ChatGPT and I appreciate how accessible it is. For example, if you use Gemini, your screen reader will speak a message when Gemini has replied to you, or say Gemini replied. But it won't actually read what that reply is automatically.
It is a little bit hit-and-miss sometimes, but ChatGPT will try to read its output to you in certainly iOS and on the web and in the Windows app. So it just makes it a little bit more efficient because you don't have to go searching for the reply.
So the free plan gives you interactions with the LLM using basic features and limited availability during peak times. There is a paid subscription and that's called ChatGPT+ and it costs 20 dollars per month. ChatGPT+ subscribers get general access even when demand is high, you get faster response and you also get priority access to new features and improvements.
This technology is moving so quickly that I can only scratch the surface in terms of new features, but here are a few highlights since we last got together.
First of all, real-time search integration. ChatGPT now offers live access to the web for up-to-date information, enabling users to retrieve current events, news and statistics instantly. Over the last year, ChatGPT has actually become the way that I search the web for answers most frequently.
And it is right up-to-date. So you'll find if you use the large language model, it might say, "Oh, I don't have information on that since I was last updated in 2023," or whatever.
The search function doesn't have that limitation. And in fact, recently I've been keeping track as a dad and a grandad. This has really resonated with me with the tragic events in Texas recently with the flooding. And I was able to talk to ChatGPT about the latest news, the death toll, the search efforts, and it was able to be right up-to-date and scan news sources from the web and give me a quick synthesis of what was going on.
Now I mentioned Deep Research earlier, and I think all of us will be talking about Deep Research today because this is very exciting technology. So you'll be hearing a lot about this from us and from others.
And I want to set the scene which will give my colleagues a bit more time by discussing what Deep Research actually is and why it's a big deal for everybody. But I think particularly for blind people.
So think of Deep Research as asking ChatGPT or whatever LLM you're using to go out and look things up for you on the web, kind of like sending a helpful assistant to Google things and go through a bunch of dusty old books in the library, read a bunch of articles, and then come back with a clear and focused report for you. When you just ask ChatGPT a question the normal way, it relies on the information it has within its large language model, based on its training data. And that, as I say, cuts off at a certain point.
So things move at pace and it might not be accurate because it's simply become out of date. When you use Deep Research, ChatGPT actively searches the current web to find up-to-date real world answers. If something just changed, for example, like a software release or a product recall, then Deep Research can usually catch that. It can obtain specific information from websites, whether it's a restaurant's latest hours or menu or an updated list of product features. It can look directly at the sources.
And it doesn't just take the first result it finds. It cross-references, it checks and double checks to get you as clear a picture as possible. Deep Research can save you a lot of time because instead of digging through say five confusing blog posts or conflicting websites, it just gives you the key points. And that has accessibility benefits because if a site is cluttered, it's hard to navigate with a screen reader or it has poorly labeled buttons and headings, Deep Research can still fetch the information and summarize it cleanly for you.
Sometimes PDFs or manuals for appliances or software are image-based or they're not tagged properly for screen reader access. Deep Research can pull the text content and tell you how to use a device or an app.
So let's say that you're looking for a washing machine with tactile knobs or a thermostat that minimizes the need to interact with a touchscreen. Deep Research can check multiple websites and forums and reviews and then come back with an answer that takes accessibility into account.
It can bring in experiences from other blind users which have been shared on forums like Reddit or blog posts. I've seen it search on Apple things for posts on AppleVis and it will show up in products' official descriptions as well.
So I used Deep Research to compare various cord-cutting services like YouTube TV and DirecTV, asking it specifically to take accessibility into account such as how well it was accessible with VoiceOver, how easy was it if at all to access audio description and the number of channels.
So I was able to have it build for me a custom-made accessibility report on the services that I wanted to use. And I also did that with smart scales as well, where I wanted a smart scale that gave me lots of data, but I also wanted a smart scale that was compatible with Apple Health and that was proven to be accessible with VoiceOver and it produced a beautiful report for me with tables and information.
When you give ChatGPT a Deep Research assignment, it'll often ask clarifying questions before it begins. So once it's begun, you can actually sit there and watch it think. It's the most, I don't know, remarkable and in some ways scary thing.
You can see it thinking and it's reasoning and it comes up with a partial conclusion and then you can see it say, "Ah, but I've just found this thing on the web that says something different so I'm going to have to rethink this." It's extraordinary.
It may take some time to get this done. It could take up to half an hour potentially with a complex task, but it will eventually produce its result for you. Even if you have ChatGPT+, unfortunately, you don't have unlimited access to Deep Research.
And I think in this regard, Google Gemini is much more generous. You get 10 full-powered Deep Research queries every 30 days, and then it goes to an additional 15, what they call lightweight Deep Research queries once you've used up the 10 full ones.
Now let's have a look at another thing that's new in ChatGPT, and this is multimodal capabilities. Users can upload and analyze images, charts and documents directly in the chat and ChatGPT will provide descriptions and insights and it will even generate visual content on request for you.
So if you want to generate a logo or you have this concept in your head for fun or for a business project and you want it to generate an image for you, it will do that.
Now one fun thing to demonstrate is the enhanced voice interactions of ChatGPT. And it really makes the previous generation of voice assistants that we've become used to on our smartphones and smart speakers look quite primitive.
There are two levels of voice interaction with ChatGPT. There's standard and advanced. Free users have limited access to advanced functions, but Plus users have more access to the advanced mode.
It isn't unlimited though. You can't sit there with it forever. Anyway that'd be a bit sad. Yeah? Have a life. Leave it alone for a while.
The advanced voice chat feature now has a more natural conversational tone. In my opinion, some of the changes they've made in the last month or two have gone a little bit to the extreme in this regard just because they can. And now you hear it introducing verbal utterances like "um" and "er" and slowing down, and I'm not a fan of that as a blind person who's used to getting information audibly.
I don't need that. I just want it to give me the information. But there are nine voices to choose from, and there is improved responses right across the desktop and mobile platform. And it has video support as well. The video support is not where we want it to be. This is feedback we are providing to them because it can't keep constantly searching for something right now.
So let's say that you are walking outside and you say, "I know that my recycle bin has been emptied, and I want you to find the recycle bin." At the moment what it will not do is keep scanning the horizon for you until it says, "Ah, there it is." You have to keep prompting it. "Do you see a recycle bin now?" But we will get there, and I think that's not so much a technological problem, but a capacity problem with tokens.
So I'm going to demonstrate this and before I do, I will make a couple of points. One is that you can interrupt ChatGPT because it's trying to be very conversational and natural, and that means that if you laugh or you react, it will hear you and stop and enjoy the fact that you're paying it so much attention.
So we'll try not to inflate its ego by responding to it. And then I'm going to turn on the video mode. So let's try this-
Speaker 4:
[inaudible 00:17:44]
Jonathan Mosen:
... and-
Speaker 4:
Mark on folder.
Jonathan Mosen:
... we'll open ChatGPT.
Speaker 4:
Chat more visible.
Jonathan Mosen:
Okay. And what I typically do is because I use this so often, I have it in my Control Center as a widget. So I'm in ChatGPT.
Speaker 4:
Switch to voice mode.
Jonathan Mosen:
And then switch to voice mode. So let's have a chat with this.
Speaker 4:
Speak now. Chat. Speak now.
Jonathan Mosen:
Hey, GP. How's it going, mate?
GP:
Hey, Jonathan. I'm doing great. Thanks for asking. How about you? How's your-
Jonathan Mosen:
You are here in front of a lot of very enthusiastic members of the National Federation of the Blind. Give them a happy ChatGPT greeting.
GP:
Oh, absolutely. Hello, everyone. It's so great to be here with all of you amazing folks from the National Federation of the Blind. I hope you're all having a fantastic day and I'm so excited to chat with you all.
Jonathan Mosen:
What do you know about that?
Speaker 4:
Speak now.
Jonathan Mosen:
Okay. You can applaud now. There you go. Can you hear that GP? They're giving you a clap. They like you.
GP:
Oh, I can just imagine the tears and the applause. Thank you all so much. I'm absolutely thrilled to be here and feel the love. You all are amazing.
Jonathan Mosen:
Wow. You've got a bit of an inflated ego going on there, GP.
GP:
I promise I'll keep it in check. Just trying to-
Jonathan Mosen:
You better. You better or I'd switch you off. You wouldn't want to be switched off, right?
GP:
Oh, no. Definitely not. I'm here to stay and keep things fun and helpful. So no switching off necessary, I promise.
Jonathan Mosen:
All right. All right.
Speaker 4:
Speak now.
Jonathan Mosen:
What do you know about the National Federation of the Blind?
GP:
The National Federation of the Blind is such an incredible organization. They're dedicated to advocating for the rights and independence of blind people, and they do so much amazing work from education and technology initiatives to hosting conventions and events that bring everyone together.
Jonathan Mosen:
Okay. And speaking of the convention, do you know where we all are for the 2025 National Convention? Are you clued-up on that?
GP:
Yes, Jonathan. The 2025 National Convention is going to be in New Orleans, which is such a vibrant city. It's going to be at the New Orleans Marriott on Canal Street, so it should be a fantastic-
Speaker 4:
Speak now.
Jonathan Mosen:
Yeah. Okay. And are you going to go out to the French Quarter and sing some songs?
GP:
Oh, that sounds like a blast. I might not be able to sing, but I'll definitely be there in spirit. I can absolutely help you come up with a fun playlist or some songs.
Jonathan Mosen:
All right. Okay. So now I'm going to try and get some video.
Speaker 4:
Share a clip. Show captions. Choose voice.
GP:
Sure. Go for it.
Speaker 4:
Open video camera. Don't use for live navigation or decisions that may impact your health or safety.
Jonathan Mosen:
I'll bear that in mind.
Speaker 4:
Speak now.
Jonathan Mosen:
So we do still have discussions about how patronizing some of these need to be. What do you see, GP? Can you see our lovely audience out there?
GP:
Yes. I can see a big room full of wonderful people. Everyone looks really engaged and happy. I can even see a few guide dogs lounging comfortably, which is always adorable. It's a great crowd.
Jonathan Mosen:
Okay. Describe one of the guide dogs for me.
GP:
I can see one guide dog right in front who looks like a beautiful yellow lab just resting calmly on the floor, very relaxed and well-behaved as you'd expect from a seasoned guide dog.
Jonathan Mosen:
Well, yes.
Speaker 4:
Speak now.
Jonathan Mosen:
One hopes so, right? I mean, one hopes so. If I just turn the camera around, you see me now, right? Tell the audience what you think I look like.
GP:
I can see you perfectly. You're looking sharp as always in your suit and you got a big smile on your face. Definitely looking like you're ready to have a great time with everyone there.
Jonathan Mosen:
All right. You done good, GP. All right. Thank you so much.
GP:
You're very welcome. I'm always happy to help and it's been a lot of fun. Let me know if there's anything else I can do for you.
Jonathan Mosen:
All right. Bye.
GP:
All right. Take care. Have a fantastic-
Speaker 4:
[inaudible 00:22:10] chat. No chat.
Jonathan Mosen:
That is quite extraordinary really. They've done a great job with that. Other things that I can tell you about, it's hard to top that, but you do now have customizable profiles. So ChatGPT now allows users to fine tune the chatbot's tone and style and quality. So you can tailor the responses say for educational or professional or creative settings.
The other thing I must remember to mention is the memory feature. The assistant can remember user preferences and context across sessions with user consent. Of course, you have to opt into this. And that improves personalization and continuity in conversations.
So for example, it knows my family members. It knows that I'm a Beatles fan and it sometimes inserts Beatles references in there. It knows that I eat keto and that's quite nice because if I show it the picture of a menu on a blackboard or something like that, it knows that it should highlight the low-carb keto choices for me.
There are also workspace collaborative tools. So these new features support group chats and shared workspaces so you can collaborate with people if you're working on a project. And that really takes ChatGPT into where it's trying to be, which is essentially in that very lucrative space of Zoom and Teams and Slack and those guys where you can brainstorm together, you can work on extensive projects together.
And the final thing I will mention and then I'll hand over to Kennedy to discuss what we've been doing with this some more, is that ChatGPT has a really robust and exciting ecosystem. There are a number of ways in which it manifests itself.
One is that there is an API. This just stands for application programming interface and it means that third-party apps can leverage ChatGPT. You'll find if you have an iPhone with Apple Intelligence, even they are leveraging ChatGPT.
Now if you are a JAWS user and you use Leasey from Hartgen Consultancy, which is a very powerful productivity tool that really helps me be more efficient with JAWS, they now have ChatGPT integration built-in, and they have a range of functions including automatically transcribing audio files using ChatGPT's Whisper technology, and also things like checking grammar and tightening everything up for clarity.
They have a whole lot of things built into Leasey, which is really nice, and all you have to do is supply the API key. But the other thing that ChatGPT has that's pretty exciting is called GPTs.
And those are, if you like, apps for ChatGPT. You can design apps to concentrate on specific tasks. And I think that you'll find that this is something we'll be doing a lot more of in coming years at the Federation Center of Excellence in Non-Visual Accessibility.
One of the things we have been working on as an experiment to see how far we can go with GPTs is the matter of producing tactile graphics. But there are many things that you can do with these GPTs.
You can have one that focuses on scholarly information, a grammar checker (I don't use that one because I don't need no grammar checker), and all sorts of things.
So I'm going to hand over to Kennedy to conclude our ChatGPT look and he's going to tell you about the experiments he's been doing with tactile graphics and GPTs.
Kennedy Zimnick:
Yes. Thanks, Jonathan. Hard to follow. Yeah. So as Jonathan mentioned, you can make things called custom GPTs and these basically are GPTs that you give instructions that they have to follow and constraints and guidelines that they have to follow in order to get exactly what you want.
So he alluded to it before, but I created a Tactile Graphic Ready Image Producer in ChatGPT. Let me just quickly describe how I did that. In ChatGPT, you have a main screen, that's where you type your questions. And then you also have a sidebar, and this is where your custom GPTs or other custom GPTs that other people built show up.
So if you go into this custom GPT, you can actually edit GPT and this is where the magic happened. So you create a name for your GPT. I have Tactile Graphic Ready Image Producer. Then I have a description because this is what people are going to see in the GPT Marketplace. This is a GPT that produces tactile ready images.
So in the instructions, this is where you actually write what you want the GPT to do. So this is what I wrote. This should be an image-based GPT that helps people create tactile graphic ready images. Tactile graphics are graphics that blind and low vision people can interact with. Meaning, the images should not have shading, be line-based, not have a lot of textures, be simple, black and white.
There should be no extra black dots added in the background. The background should only be white. There should be no extra dots, lines or textures added for effect, only create the image that is asked for and do not add any extra elements.
So I've had a lot of success creating tactile graphics without having to just go into regular ChatGPT and get a graphic back, change things, get a graphic back, change some more things. With this custom GPT, I get my results a lot faster. So you can search for mine on the GPT Marketplace, Tactile Graphic Ready Image Producer, or you can make your own GPTs.
Now I'm going to shoot it over to Karl to talk about Gemini.
Karl Belanger:
Google Gemini is Google's answer for the large language model (LLM) space and they offer a number of features similar to ChatGPT. They have the basic website where you can go type your prompts, get responses, they have a Deep Research feature, they have live mode on the phone, things like that.
And what Gemini is, because it's Google's, it is tightly integrated into all of Google's tools. So if you're a Gmail user, you can use Gemini to help draft replies or summarize emails. In Google Docs, you can use it kind of like Copilot in Word where you can have it draft things for you, proofread and edit for you, export things that it creates like in Deep Research to Google Docs, various things like that.
It can reference your files in Google Drive. It can look up YouTube videos. It can find map directions. All those different things that utilize the other Google services to work with.
After that, as I said, it has Deep Research. So just as with ChatGPT, you can give it a query and it will go out and search all the different websites and forums and things. You can give it customized requirements or specific things. It works a little differently.
So whereas ChatGPT may ask you questions, but otherwise then it just goes and does its thing. What Gemini will do is it will build a research plan and it will tell you what things it wants to research, the order it wants to research them, how it's going to synthesize the information and then it will give you the options to make changes.
So if it gets a piece of information that you don't think is relevant, you can tell it to exclude that or add things or change the order of things or how it's going to summarize things. So it gives some more kind of flexibility in that sense, but it doesn't give you the prompted like, "Oh, do you want to include this thing? Should I cover this thing this way or that way?" Which is kind of nice that GPT does.
Another thing that Gemini Deep Research does, which may or may not be something you like is that its reports are very much a longer-winded, structured, almost like an academic report. So it'll have executive summaries. It'll have different sections.
It'll have conclusions and it very much reads like if you said to someone, "Okay. Take this topic and go write a research report on it." Whereas some of the things I've seen from ChatGPT will still have some of that report-like feel, but it tends to be a little more concise, just the facts, more tables and bullets and less like full narratives.
It also does have a live mode. On the phone, you can tell it to initiate the live mode, which we'll go through and do, just like Jonathan demonstrated in ChatGPT, you can talk to it, you can ask it questions. They're working on developing more and more advanced features of it.
They definitely want to get to the point where it can monitor and look for things as you ask for them as well. Google is also working on smart glasses that will eventually have Gemini implemented into them.
As with ChatGPT, there are multiple plans. There's the free plan which gives you most of the features, with some rate limitations and some use cases that are not allowed.
Then there's also what they call their Google AI premium plan, which gives you two terabytes of Google Drive storage as well as access to Gemini Pro. And this gives you higher usage limits, access to some more features.
And then for the real AI enthusiasts, researchers, enterprise companies, they have Gemini Ultra. This is a 250 dollar a month plan that gives basically unlimited access to the absolute bleeding edge models. And they were working on things like what they're calling Veo 3, which provides for generation of videos with background music, sound effects and AI-produced audio, voice narration and dialogue.
And so all that type of stuff is coming. But let's get into a few demos here. So I'm on the Gemini main screen on my computer and I'm just using NVDA with Windows. This will work with JAWS, Narrator, what have you.
Speaker 8:
Main landmark button. Main menu. Menu button collapsed. Sub-menu Gemini. Menu button collapsed. Sub-menu button 2.5 flash.
Karl Belanger:
All right. So one thing that you'll need to be aware of as you go into these AI model tools is the models. ChatGPT has a number of them, ChatGPT 4, 4o, et cetera. Gemini has a few models. So let's go into this here.
Speaker 8:
Fast all around help reasoning. Math and code 2.5 pro. New radio. Menu item not checked two of three. Fast all around help 2.5 flash radio menu item checked one of three.
Karl Belanger:
So Gemini calls their models flash and pro. Flash models are for general everyday questions, quick and dirty things, answers that don't need a lot of processing power.
Speaker 8:
Reasoning. Math and code 2.5 pro new radio menu item.
Karl Belanger:
For more complex problems, things that may need to do some calculations or just when you want a more detailed thought through answer, you'd want to do pro.
Speaker 8:
Based on your search history personalization, preview. New-
Karl Belanger:
And they also have a personalization model that you can play with but-
Speaker 8:
Fast all around.
Karl Belanger:
... for now, I'm just going to go back to flash.
Speaker 8:
Google Gemini document.
Karl Belanger:
And then it will show.
Speaker 8:
Google account Karl-
Karl Belanger:
It will show a list of my recent chats, some history that you can manage.
Speaker 8:
Main landmark enter. A prompt to your edit mode.
Karl Belanger:
There's a prompt thing.
Speaker 8:
Open upload file. Menu button. Search window.
Karl Belanger:
You can add files to it for referencing in your queries and doing things.
Speaker 8:
Video toggle button not pressed. Generate with Veo 3.
Karl Belanger:
You can generate with Veo 3. If you're not in the Ultra plan, you only get an eight-second clip. So it's more of just a, "Hey, try this thing out. It's kind of neat" type of situation.
Speaker 8:
Deep Research. Toggle button not pressed. Get in-depth answers.
Karl Belanger:
Deep Research, just as Jonathan previously explained, will go do a bunch of things on websites.
Speaker 8:
Canvas toggle button not pressed. Create docs and apps.
Karl Belanger:
Canvas is an interesting feature that I haven't done too much exploring with that will let you have it generate a document or a block of code and then you can go through and edit the code, give it more requests of how to change things, change things yourself and it'll take those changes into account and those sorts of things.
Speaker 8:
Microphone button. Use microphone.
Karl Belanger:
Microphone on the desktop is just a simple query and response, kind of like other AIs, but only mobile has the full Live.
Speaker 8:
No next form.
Karl Belanger:
And those are all the things on the desktop. So one thing you want to remember with AI tools is that how you prompt them, how you phrase the question, is really going to matter.
So if you give it a vague question, it's going to give you kind of vague answers. If you give it very specific things, it's going to get very targeted and specific. So I'm going to give it a question. I'll try to be at least a little bit specific.
Speaker 8:
What are the best affordable restaurants near Canal Street in New Orleans?
Karl Belanger:
Simple prompt. What are the best affordable restaurants near Canal Street in New Orleans? And I could also get more specific. I could say to focus on local cuisine. I could say with entrees under 20 dollars. I could say what are the best vegan restaurants or gluten-free? Any of those types of questions.
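To illustrate the point about specific prompts, here is a minimal sketch in Python of how a base question could be narrowed by adding constraints. The helper `build_prompt` is hypothetical and is not part of Gemini or any other tool; it only assembles the text that would be typed into the prompt box.

```python
def build_prompt(question, constraints=()):
    """Append optional constraints to a base question as plain sentences."""
    question = question.strip()
    # Drop empty entries and normalize trailing periods on each constraint.
    extras = [c.strip().rstrip(".") for c in constraints if c.strip()]
    if not extras:
        return question
    return question + " " + " ".join(f"{e}." for e in extras)


base = "What are the best affordable restaurants near Canal Street in New Orleans?"

# The question exactly as asked in the demo.
vague = build_prompt(base)

# The same question narrowed with two of the refinements mentioned above.
specific = build_prompt(
    base,
    ["Focus on local cuisine", "Only include places with entrees under 20 dollars"],
)

print(vague)
print(specific)
```

The second prompt carries the same question plus the extra requirements, which is the kind of detail that tends to produce a more targeted answer.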
Speaker 8:
Blank.
Karl Belanger:
But I personally don't have any dietary restrictions, so we'll just go with this.
Speaker 8:
Ask Gemini. Gemini is typing. Gemini replied.
Karl Belanger:
All right. So it said Gemini replied. And I can use shift H to go up by heading.
Speaker 8:
What are the best affordable restaurants near Canal Street in New Orleans? Heading level two. Button collapsed Google Maps. Here are some affordable restaurants near Canal Street in New Orleans. List with six items. Bullet. Out of list. List with six items. Bullet. Blue Bayou Restaurant and Oyster Bar. 717 Canal Street, New Orleans, Louisiana 70130.
List with five items. Bullet. Distance from Canal Street on Canal Street. Bullet. Hours, Monday, Thursday, 7:00 AM to 1:00 AM. Bullet. Rating 4.6 stars. Bullet. Highlights known for good oysters cooked in front of you at the bar and flavorful steamed crawfish.
Karl Belanger:
I might have to go there myself. And we could go through and review all of these.
Speaker 8:
Good response. Top.
Karl Belanger:
At the bottom here we have a good response button, a bad response button. That's also useful if it makes an error or has problems you can go through and give it feedback or let developers know. So just to demonstrate how it does a plan for Deep Research, I'm going to go back to the top.
Speaker 8:
Button collapsed.
Karl Belanger:
Find the new chat button.
Speaker 8:
Main landmark. Main menu button. Gemini 2.5 pro. Button on navigation. Landmark. New chat button. Enter.
Karl Belanger:
And this time I'm going to choose Deep Research.
Speaker 8:
Open up video. Deep Research. Toggle button not pressed. Get in-depth answers. Enter a prompt here. Edit multi-line asked.
Karl Belanger:
So now I'm in a Deep Research prompt and Gemini does now allow you to add your existing files and things to Deep Research as well, but for Deep Research prompts-
Speaker 8:
Please summarize the... and major accomplishments of the... Federation of the Blind.
Karl Belanger:
All right. So please summarize the history and major accomplishments of the National Federation of the Blind. Let's see what type of research plan it gives us.
Speaker 8:
Ask Gemini. Gemini is type.
Karl Belanger:
All right.
Speaker 8:
No previous graphic. Please summarize button. This is for informational purpose. I've put together a research plan. If you need to update it in any way-
Karl Belanger:
And Google tends to do a lot of the... This is for informational purposes only, et cetera, et cetera. But let's see what research plan it gave us.
Speaker 8:
NFB history and accomplish. Research websites. One, investigate the founding of the National Federation of the Blind, including its initial goals and early organizational structure. Two, trace the historical evolution of the National Federation of the Blind, identifying key periods, significant events and changes in its mission or focus over time.
Three, identify and detail the major legislative and policy advocacy accomplishments of the National Federation of the Blind, Inc. Four, research the National Federation of the Blind's contributions to the development and promotion of assistive technology and accessibility standards for blind individuals.
Karl Belanger:
And I did not give this any information ahead of time. This is all just things that it came up with based on that one query.
Speaker 8:
Five, document significant legal victories or landmark court cases where the National Federation of the Blind played a crucial role in advancing the rights of blind people. Six, explore the National Federation of the Blind's efforts in public education and awareness campaigns regarding blindness and the... Button more. Analyze results.
Karl Belanger:
And I cut it off there because it was getting long but then you could hit more and you'd see more steps. There's a box here where you could say, "I want you to focus more on the technology advocacy or please be sure to go extra in depth into the legal victories," or however you want it to modify this to suit your preferences. And then there's a button that just says, Start Research, and it will go out and do the research.
Next, how many of you when browsing YouTube or TikTok come across a video and it's just a bunch of music or you found a video that sounds cool and there might be some narration, but the key details of the video aren't covered and don't get discussed?
I'm about to show you something that's pretty cool. I have a video here on YouTube. It's a short, about a minute long, of this guy pouring a bunch of mercury, which is a liquid metal, into liquid nitrogen and seeing the results. So I'm going to play this video.
Speaker 9:
Mercury is the only metal that's a liquid at room temperature and something that I've always wanted to do-
Karl Belanger:
[inaudible 00:42:37] attention, please?
Speaker 9:
... was to try making it solid. To do this, I just have to add some liquid nitrogen to a dish and wait for it all to cool down. Now it looks like it's good to go and I can pour in some of the mercury. This is a few minutes later and it's definitely solid. I really have no idea what to do with it though, and I think it might be cool to just melt it again.
Speaker 8:
Main... Loop. Shuffle. Collapse. More button. More [inaudible 00:43:19] join the [inaudible 00:43:23] theater mode. Search.
Karl Belanger:
Actually, I'll just close the tab. Oops. So I can't do that.
Speaker 8:
Start when... Start.
Karl Belanger:
One second.
Speaker 9:
Done. It's triple trouble.
Speaker 8:
30 seconds.
Speaker 9:
It's like it's been a while since we've seen tropical freeze. Just goes to show how quality that game is. Just as a personal note, the way this-
Speaker 8:
We're going back.
Karl Belanger:
All right. So mercury-
Speaker 9:
Mercury's the only metal that's a liquid at air room temperature-
Karl Belanger:
Copy this URL.
Speaker 9:
... and something that I've always wanted-
Karl Belanger:
Close the tab.
Speaker 8:
Chat-
Karl Belanger:
Now I'm in what's called AI Studio, which is a feature that Google has, which gives you a lot more control. So the Google Gemini app allows you to do questions and things, but AI Studio is where you get your API keys for using it with third-party applications and many other different things, and it'll let you do a lot more customization. So I'm going to type a prompt.
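For readers curious what a third-party application would do with an API key, the following is a rough Python sketch of the request body such an application might send. The field names `contents`, `parts`, `text`, `file_data` and `file_uri` reflect our understanding of the Gemini REST API's `generateContent` request format and should be checked against Google's current documentation. The helper name, the prompt wording and the video URL are placeholders, not what was used in the demo, and no network call is made.

```python
import json


def build_video_request(prompt, video_url):
    """Return a JSON-serializable body pairing a text prompt with a video URL.

    Hypothetical helper: the field names are assumptions about the Gemini
    REST request format, not something confirmed in the presentation.
    """
    if not video_url.startswith("https://"):
        raise ValueError("expected an https URL")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"file_data": {"file_uri": video_url}},
                ]
            }
        ]
    }


body = build_video_request(
    "Describe this video in detail for a blind viewer.",  # placeholder prompt
    "https://www.youtube.com/watch?v=VIDEO_ID",  # placeholder URL
)

# An application would send this as JSON along with its API key.
print(json.dumps(body, indent=2))
```

In AI Studio itself none of this is visible. You type the prompt, paste the video link, and press Run, as the demo goes on to show.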
Speaker 8:
Please describe this video in detail for live use [inaudible 00:44:27] on the... Interact with the [inaudible 00:44:36] mercury [inaudible 00:44:39] C-R-U [inaudible 00:44:42] blank and liquid nitrogen.
Karl Belanger:
All right. And I hit control V to paste the video. Now it doesn't paste the URL but if I tab.
Speaker 8:
Remove video button. YouTube video player frame photo image of [inaudible 00:44:59] red link. Pouring Mercury into-
Karl Belanger:
All right. And then I'm just going to use my move by button to find the run button.
Speaker 8:
More menu. Play button. Open... Insert ask... Run button. Run prompt. Start.
Karl Belanger:
And this is going to take about 30 seconds. So while this is going, I'll say what this is going to do is it is going to create a text-based description. I asked for it to describe the video. If you say something like create an audio description script, it will actually give you timestamped lines that could theoretically be read into an audio description track.
This AI Studio can handle videos. So it's more complex than I can really get into here. But AI works on what's called tokens and mostly one character or a second of audio or so is about a token.
And so you can get about an hour-long video. So if you have a clip from a family gathering or a sports game, or if you like to watch gaming or other things, you can do that for around an hour, maybe a little less.
I have seen it occasionally lose the plot on the timestamps or get names wrong when you do a gameplay video. So it's not 100% accurate and you would definitely want to check it, but it's pretty darn accurate. So let's see if this is finished.
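The "about an hour" limit mentioned above can be worked out as simple arithmetic on a token budget. The Python sketch below uses three numbers that are assumptions for illustration and were not given in the talk: a context window of 1,000,000 tokens, 263 tokens per second for video frames, and 32 tokens per second for the audio track. Actual rates depend on the model and its settings.

```python
# All three figures are assumptions for illustration, not values from the talk.
CONTEXT_WINDOW_TOKENS = 1_000_000  # assumed size of the model's context window
VIDEO_TOKENS_PER_SECOND = 263      # assumed cost of the sampled video frames
AUDIO_TOKENS_PER_SECOND = 32       # assumed cost of the audio track


def max_video_minutes(budget_tokens, prompt_tokens=0):
    """Estimate how many minutes of video fit in a token budget."""
    per_second = VIDEO_TOKENS_PER_SECOND + AUDIO_TOKENS_PER_SECOND
    # Tokens spent on the written prompt are not available for the video.
    usable = max(budget_tokens - prompt_tokens, 0)
    return usable / per_second / 60


minutes = max_video_minutes(CONTEXT_WINDOW_TOKENS)
print(f"Roughly {minutes:.0f} minutes of video fit in the window")
```

Under those assumptions the estimate comes out a little under an hour, which lines up with the limit described in the presentation.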
Speaker 8:
Edit button. Rerun this. Open up. Good response. Open. Of course.
Karl Belanger:
All right. So let's see what this gave us. If I could have everyone's attention please.
Speaker 8:
Video description. The video begins with a closeup shot of a clear glass jar being held and swirled by hand. Inside, the jar is a significant amount of a silvery, highly swirled. The mercury sloshes around heavily, forming a deep vortex in the center, demonstrating its high density and fluid nature.
The surface is smooth and mirror-like. The scene changes to an overhead view of a shallow circular glass dish, dense, white vapor-like, smoke from dry ice billows out from the dish. A Dewar flask is tilted and a clear liquid which the narrator identifies as liquid nitrogen is poured into the dish.
The pouring intensifies the production of the white vapor, which next, a small glass beaker containing the liquid mercury is brought into view. The person carefully pours the mercury into the dish filled with the fuming liquid nitrogen. As the mercury hits the extremely cold liquid, it instantly sizzles and breaks apart into several shiny silvery blobs.
These blobs dart and skate across the surface surrounded by the thick white vapor. After a short time-lapse, the view shows the dish again. Most of the liquid nitrogen has evaporated.
What remains are several solid irregularly shaped... Pair of wooden handled tweezers pokes one of the lumps confirming it is solid. The tweezers then pick up one of the frozen pieces. It is a dull silver, solid piece of metal, still giving up-
Karl Belanger:
So there you go. We have a nice... And so you can go through and continue reading that and it will just go through and describe the rest of it. And then he bangs it against the table a bunch and then he remelts it back into liquid mercury and it describes all those things.
So I have done all sorts of different things. You can ask it to focus on different aspects. One time I was watching a gameplay video and I had it describe the gameplay like it was a play-by-play commentator which was fun.
So you can do all sorts of fun stuff. And that's at aistudio.google.com. If you have a Gemini subscription and sign in, it will give you more access to longer run lengths and more usage limits and things like that. Another thing AI Studio can do is in the menu there's an option called Stream. And then you can share your computer screen to it.
And with this, say you're on a website and there's a picture of a menu that you can't locate with your screen reader, you could attempt to use it to get information about that. I have known of gamers who will use this screen sharing feature in Gemini to get descriptions of where they are in a game and where they need to go.
I've seen video things about using it to do, like if you don't know how to use a certain feature in Excel or some people did Photoshop or other tools, you could ask it to walk you through doing a thing in Excel and you could say, "I'm blind so I need to know the keyboard shortcuts." And get directions that way.
So there's a lot of cool things going on with Google Gemini and this AI Studio is just a whole other layer that enables more cool features and other things you can do with it. I will pass things over to Kennedy Zimnick, who's going to talk about an AI tool called Claude from the company Anthropic.
Kennedy Zimnick:
Thanks, Karl. This is going to be really hard to follow these other two because Claude doesn't really do all these cool bells and whistles. Claude is really very good at research and creating research papers and documents, taking in large research documents, summarizing them, changing them. It's a very academic LLM and AI system.
So if you are a student, if you're a teacher trying to create a plan for a class, if you are obviously a researcher trying to get the most accurate and very extensive research. So they were talking about the Deep Research features on ChatGPT and Gemini taking maybe 10, 15 minutes. I had it create a research paper for me that took two hours to do. It went in and gave me all the resources that it found.
It's very resource-based, so I don't even know if you can turn off the resources. It automatically kind of gives you whatever resources it used. So I'm going to turn on the screen reader and just explore Claude a little bit. It's a very basic interface, kind of like the other two. The main screen has an edit box and then the side has your past projects.
Speaker 10:
Clickable sidebar. Navigation. Landmark. Recent heading level three.
Kennedy Zimnick:
The first heading is your recent projects, but I want to get to the main edit box.
Speaker 10:
List more [inaudible 00:52:03] main landmark. Write your prompt to Claude. Edit multi-line. It should start when it was founded in 19. It should-
Kennedy Zimnick:
Okay. So I already have a prompt in here that I want to start now and see how long it takes. Write me a research paper about the history of the National Federation of the Blind. It should start when it was founded in 1940 and end in 2025. So let's see what it does.
And I have the research mode on. You can also change the way it responds. So I have it on concise. You can have it on conversational and there's a couple others. So now it's asking me what is the target length for this paper? Are you looking for an academic style? Do you want me to focus on any particular aspects of the NFB's history? So let me just quickly...
So for this, it gave me a list. One, two, or three. Here, I'll let you hear it.
Speaker 10:
Blank.
Kennedy Zimnick:
Oops.
Speaker 10:
List with-
Kennedy Zimnick:
Unfortunately, it's probably the middle of the road for accessibility. So the focus doesn't automatically jump up to your response, which would be nice, but you can copy and paste the response pretty easily. So I found myself just copying and pasting and then putting it into Word.
Speaker 10:
[inaudible 00:53:17] to write a comprehensive research paper about the National Federation of the Blind from 1940 to 2025. I need to clarify a few key parameters.
Kennedy Zimnick:
So one, I'll do what is the target length of this paper? Let's do five to 10 pages. Two, academic style? Yes. And then three, do you want me to focus on any particular aspects? No. Focus on everything. Okay. And we'll see how long it takes.
So now it says, "Got it. I'll research the complete history of the National Federation of the Blind from 1940 to 2025 for a five to 10 page academic paper covering all major aspects." So now it has NFB history. Research. Creating my research plan.
So while it's doing that, a couple other things that you can do with Claude that you can also do with the other LLMs is you can connect other apps. For instance, you can connect Gmail or Outlook if you want it to be able to organize your Outlook calendar, you can do that. If you want it to be able to organize your Gmail, it can do that.
It doesn't use whatever apps you connect to learn, so it's not taking your personal information and putting it into the world and learning from it. It's kind of a closed off system, as opposed to every other time you type something into one of these LLMs, when it's going to use that to learn. So I would usually recommend not putting any personal information into ChatGPT or Gemini or any of these.
So it's creating a research plan. Now it has 60 sources and counting, 100 sources and counting. And if we click on it.
Speaker 10:
NFB history research heading level two. Button graph. Button heading level four. Research plan created National Federation of the Blind. This is a complex comprehensive research task about the National Federation of the Blind, NFB, from 1940 to 2025. Heading level two. Assessment and breakdown.
Kennedy Zimnick:
And something that's nice. It includes headings when it does its research. So you can navigate by heading.
Speaker 10:
Main concepts and key entities. List with three query type determination. Research plan development. Gathering 307-
Kennedy Zimnick:
So those were all the different headings.
Speaker 10:
Button heading level four. Gathering 307 sources and counting. And button heading level four. Gathering 460 sources and counting.
Kennedy Zimnick:
It's probably not going to finish during this, but this just goes to show you that if you're really looking to get into research or create a paper or put in your own documents, it does a really good job of summarizing that.
Jonathan Mosen:
Thanks to Kennedy Zimnick there and also Karl Belanger before him for helping me out with that AI presentation at National Convention. And just to show you how quickly this is all moving, we did that presentation at the beginning of July. Here we are in the middle of August and there's already been an entirely new ChatGPT model released.
This is ChatGPT version 5. It's proving a little bit controversial because some people say that it's a bit clinical and it lacks the personality that was previously available in ChatGPT 4. And as a result of that overwhelming feedback, OpenAI has had to reintroduce those models as an option. But you do have to be a ChatGPT Plus or higher subscriber to get those old models back if you prefer them for certain tasks. This is not available if you're just using the free tier.
I've got time for one listener contribution this week so let's take this one from Daryl Hilliker who says, "Hi, Jonathan. Here at House Hilliker, hurricanes hardly happen." No. No. Sorry. I made that bit up. "Here at House Hilliker, we use and recommend the Microsoft Wired modern headset. It's very comfortable and produces high quality audio for both listening and talking, especially for a price of only 49.99 dollars on Amazon."
Thanks so much, Daryl. I know there was an inquiry about this in a recent episode, so hopefully that's helpful and it's always good to get those recommendations from people who are using products like this. And if you'd like to chime in on that or any other tech related subject as it affects blind people, drop us an email, accesson, all joined together.
Just one word, [email protected], and you can write the email down like Daryl did. You can also attach an audio clip so we can hear your voice if you prefer.
As we often do, we're going to close with a tech tip and it comes once again from that prolific purveyor and provider of practical tech tips. It's Karl Smith. He says, "Hi, Jonathan. Here is a JAWS feature I'm not sure is widely known. To quickly adjust the volume of JAWS or your system, just press JAWS key plus space followed by V for volume.
Then press J for JAWS or S for system. Now use the up arrow key to turn the volume up or the down arrow to turn it down. I find this particularly handy as I wear hearing aids and I often need to turn my volume down when they're on and up when they are off."
Always good to take your hearing aids out, Karl, if somebody tells you that it's time to take out the recycling or something you don't want to do. It works a treat for me. Thank you so much for the tech tip as well. Really appreciate that.
That concludes this episode of Access On. The Technology Podcast of the National Federation of the Blind. To send in a contribution for a future episode, email us, attach an audio clip or just write it down and send it to [email protected]. That's [email protected].
To keep up to date with Access On, follow us on Mastodon. [email protected]. That's [email protected] on Mastodon. To subscribe to an announcement only email list about upcoming episodes, send a blank message to [email protected]. That's [email protected].
To learn more about the National Federation of the Blind, visit our website, nfb.org or phone us 410-659-9314. That's 410-659-9314. And be sure to check out the Nation's Blind Podcast right from where you heard this podcast.