Welcome to the sixteenth episode of Access On, the National Federation of the Blind's Technology podcast.
Episode
Listen to the sixteenth episode of the Access On podcast (Browser).
Or listen on your preferred podcast platform.
Timestamps
The show is segmented by chapter, making it easy to move between segments of the podcast if you have an app or player that supports chapters. Below is what's on the show this week, and when you can hear it.
- Introduction 0:00
- Overview and general examples of prompt engineering 1:47
- Using prompt engineering to get the visual information you want 24:37
- Custom GPTs and tactile graphics 47:02
- Closing and contact info 58:44
Transcript
Speaker 1: Live the life you want.
Speaker 9: Access On.
Jonathan Mosen: Welcome to Access On, the technology podcast of the National Federation of the Blind. This week we are bringing you an abridged version of our recent boutique on AI prompt engineering. Karl Belanger gives us a general overview of how to craft AI prompts to get the results you want. I'll discuss working with specialist and mainstream tools to obtain visual information from AI. And Kennedy Zimnik discusses custom GPTs and generating tactile graphics.
Welcome to episode 16 of Access On, it's Jonathan Mosen at the Jernigan Institute here in Baltimore, Maryland. The weather's warming up, it's actually feeling positively spring-like out there as I put this together on a Monday just before heading off to the CSUN conference. Looking forward to seeing many of you there. By the time that you hear this, maybe you are heading home and you're using this podcast to while away the hours as you fly.
We have had plenty of requests to bring the recent boutique that we did on prompt engineering, which is a fancy way really of saying, how do you talk to AI effectively, to this podcast. So we're bringing you an abridged version of it today. We have quite a catalog of existing boutiques that we can draw from, which have a lot of interesting information. And given the number of requests that we've received to do this, it's likely something that we will do going forward.
So here is this abridged version of the boutique on AI prompt engineering. We begin with Karl Belanger who tells us what prompt engineering is.
Karl Belanger: AI prompting, also called prompt engineering, is the art of asking AI for things in a way that makes sure that it knows what you want and can provide the response in the best way for you. And this can take a number of factors and there's a number of ways to do things, and that we'll get into.
So the AI prompt is when you get to ChatGPT or Gemini or Copilot or whatever and you have that box, it might say ask something, it might say enter a prompt, whatever it is, it's all the same. You type in what you want the AI to do, hit enter or click the send button, and off it goes to do its thing. And the prompt can be a question, directions on how to do a task or you can even give it instructions to how you want it to behave going forward.
A prompt will have a number of some kind of key components to it. First, clearly ask a question or state what you want to do and define a task. Then give it guidelines or constraints or even specific requirements as to what you want. And I'll explain all these later. Provide background or context to what you're doing. And optionally, you can specify the length or type of input that you're looking for. Also, strike a balance between being too vague or overly specific, clearly ask a question or define a task.
So if you go to an AI, let's say you're thinking of recipes, you might just ask it a simple question of what should I do for dinner tonight? It's a valid question, but it's also very broad. A slightly better way might be if you have a pound of pork that you need to use up, say I have a pound of ground pork that I need to use, please give me three straightforward recipes that I can make with ground pork for dinner tonight. That's a much better question, it clearly states what's going on, tells it what you want and provides information.
Next is provide guidelines, constraints or specific requirements. This can be as simple as, instead of just saying give me recipes, give me three recipes, give me five recipes, give me four marketing ideas, recommend three book series. Simply giving it a number of things you want. In the book series example, give me three fantasy book series that I might like to read that are similar to Lord of the Rings. That gives it a specific requirement, three books. It gives it guidelines, fantasy books similar to the Lord of the Rings.
Constraints can be things like, that are less than 500 pages or that do not contain some other things that you don't want to read. And you can continue to add these guidelines or restraints more or less to specify how you want to have this work.
Provide background or context. This can be things like telling it, for example, if you're crafting a marketing e-mail, you can say, I run a small tech repair startup in Baltimore, Maryland. And then when you say, please give me three marketing e-mail ideas to reach the local community to market my tech repair company, because it knows that you're in Baltimore, Maryland, it might give you some additional recommendations or resources that may be more beneficial to this area.
Or if you're asking for dinner recipes again, if you say, I have kids who generally don't like vegetables, please give me five recipes that you can build in vegetables that are generally kid-friendly. That gives the AI context and background of that you're not just looking for vegetarian or other healthy recipes that you may want, you can also discuss that it's kid-friendly, various other things like that.
Specify the length or type of output. So if you just tell it a question, you'll probably get a narrative back, you might get lists, you might get tables. But you can say, give me the three fantasy book series, provide the first book title, author and a brief description of each series in a table. That's specifying the output. Then it will give you a nice formatted table of what you're looking for. How it can be formatted, what you want in the table, things like that.
Or you might say, generate a list of at least five different elements that contain blah, blah, blah or whatever your input needs. And this can be, you don't have to include this in your prompt, but it can certainly be a very useful thing to make sure the output is what is most useful to you.
Another big piece is finding the right balance. Prompts very much depend on what you're saying, what you're trying to get at. If it's a simple question like how does snow form? That could be fine. You're just asking for a brief description of how does snow form, simple question. But now if you're wanting to know something more in depth, you might say something like, describe how humidity levels and cloud density affect the formation of snow crystals in such and such an area or at certain altitudes. Summarize the different types of crystals in a table. Much more descriptive prompt gives you a lot more to work with and can help you build on the initial prompt.
Striking that balance is definitely hard. So how can we help to work through this better? One is just try to think objectively. If you're trying to convey to an AI what you want it to do, just think how can I clearly articulate what I want? Also, trial and error is okay. So if you do a prompt, ask a question and it gives you some response out of left field, just hit that new chat button and try again.
Or if it got it close, you can go through and say, okay, great, but I want more detail on this area. Or can you reformat it like this? Or I don't think this is quite what I was looking for, can you try redoing it this way?
Other things that I'll get into later are AI will have a regenerate button or rewrite response or various ways to word it. What this will do is it'll essentially redo the prompt but have it try to generate the response in a slightly different format or with slightly different things. That's another way to do it.
And then you can continue to work through your prompts, ask it something, give it more info, back and forth you go until you get to a point where you feel you have the input you want.
Another thing you can do is you can ask AI to help you. So for example, please help me craft a marketing template for my tech repair business. Please ask any further questions that will help narrow down the scope or provide better information to provide the best output. And then that may prompt the AI to ask you questions about where are you located? How big is your company? What is your target audience? Do you target specific types of technology? And things like that. And you can give it answers.
You could also just simply say, I want to write a marketing email for a business but am having trouble, please help me formulate a prompt to best get marketing emails for my business. And it will give you some examples. It may give you other things to think about or questions to answer, things like that.
A new and upcoming tool you could look into are prompt generators. These are tools that will enable you to put in, again, things like marketing email for a small business. And whereas a generative AI like Gemini or Copilot or ChatGPT or any of those could certainly provide some help in doing a prompt creation. These prompt generators are designed and purpose-built to take your prompts, give you suggestions, and generate and refine prompts until you have a prompt you like that you can then copy and paste into your AI model of choice.
The last piece to consider is there are now more and more types of AI models. So if you go on to ChatGPT or Google Gemini or Perplexity, many of these other things, you will see a number of different models. For example, in Gemini you have flash, flash thinking, pro, deep research. And they don't really explain these well. Generally it's okay to just default to whatever the default it's giving you or choose the pro model. But for certain types of questions you may want to consider either a thinking model or a deep research model.
What a thinking or reasoning model does is instead of just going through the calculation once based on whatever prompt you give it, it will go through multiple steps, iterate on itself, try to think through your question and spit out responses. This is overkill for general like what recipes can I make tonight type of questions. But if your question is a very technical or math-based one or if you're looking for some computer code, because that's another thing AI can do is write programs for you, having these reasoning or thinking models can get better, more consistent output than just the basic model would.
Another one is deep research. What these will do is you give it a research topic, for example, as a test, I did one on the marketing trends and claims in accessibility companies. And what it will do is it will go out, search the web, do recursive searches, find sources, and then generate a whole report based on those sources with the information being cited throughout. If you need to do some research on a topic for work, this could be a great way to get you started.
Now with any AI, verifying is always key. Even though these are using websites, they're still not guaranteed to be accurate. And for any of these really any prompt information you ask it for, always trust but verify. That is the most important aspect that I can recommend to you is to be not wholeheartedly skeptical, but at least take everything that AI gives you with that slight bit of, I may want to check this later. Because while they're frequently and generally most of the time very good, AI models are still somewhat prone to errors and can always make mistakes. So definitely check what it's giving you, do some of your own research and build from there.
Now we have Gemini up, I'll be demonstrating a little bit of Gemini and a little bit of another AI called Perplexity. I just want to show you at the top of your screen.
AI: Button collapsed Google account main landmark main menu button, Gemini advanced 2.0 pull experimental menu.
Karl Belanger: At the top, you'll have a menu button, which generally it'll say-
AI: 2.0 slash thinking, 2.0 flash for everyday tasks plus more features, one of five.
Karl Belanger: 2.0 flash. Now I am using the paid version of Gemini and Perplexity for these demos. There are free versions as well, which if you're just getting started will likely be fine, but you may want to consider. So we have the flash for everyday tasks, which is generally the one you'll want to use for most of the time. But just to show you, we have a few different things to choose from.
AI: 2.0 flash thinking, experimental best for multi-step reasoning, two of five.
Karl Belanger: There's the thinking or reasoning one that I mentioned earlier.
AI: 2.0, flash 2.0 pro experimental best for complex.
Karl Belanger: There's the pro which is a more advanced, has more data, knows more things.
AI: 1.5 pro of deep research, you get in depth answers, five of five.
Karl Belanger: And then there's the deep research that I talked about earlier.
AI: 2.0 flash.
Karl Belanger: For now, we'll just use the 2.0 flash. And so for our first prompt just to illustrate, we'll do something vague, let's just do write a marketing email.
AI: Marketing email for a tech repair. Project core team. Write a marketing email for a tech repair startup.
Karl Belanger: Write a marketing email for a tech repair startup. I'll hit enter.
AI: Ask Gemini, Gemini reply. Write a marketing ... button, listen, subject, tired of tech troubles, we've got you covered. Hey name, is your phone screen cracked? Disrupt your day. At your startup name. We understand how frustrating it can be ...
Karl Belanger: And it gives us a fairly generic email about your phone screen.
AI: The best in your city/area. What makes your ...
Karl Belanger: So we have kind of a generic marketing email here. Again here it provides blanks for you to fill in.
AI: List with five items, bullet, fast and convenient.
Karl Belanger: Your product name and startup name, things like that, provide some bullet points. But now let's-
AI: Enter a prompt here, micro, enter.
Karl Belanger: ... try again and we'll make some specifications, I'll tell you my prompt in a second.
AI: Write a marketing email for my tech repair startup called Cool Tech Upgrades space focusing on PC upgrades and repair in Baltimore, Maryland.
Karl Belanger: All right, so this is my new prompt now.
AI: Write a marketing email for my tech repair startup called Cool Tech Upgrades, focusing on PC upgrades and repairs in Baltimore, Maryland. Blank.
Karl Belanger: And I'll say the email should be a hundred words or less.
AI: The email should be 100 KDS or less as [inaudible 00:18:03] project core team. Wait a mark heading level two button expand. Button listen subject level up your PC Baltimore. Hey Baltimore is your PC lagging, Cool Tech upgrades specializes in PC upgrades and repairs right here in town. We offer fast reliable service for everything from slow startups to full system overhauls. Need RAM boost GPU upgrade, we've got you covered. Get a free diagnostic and quote today, visit your website link or call us at your phone number to schedule your upgrade, make your PC cool again.
Karl Belanger: And so there you go. Much more targeted, upgraded, it references in Baltimore right here in town. I didn't provide a website or a phone number, so it still left those things blank that I could fill in later. But there you go. We have a much more targeted email about my business and giving specifications.
So let's do a new chat.
AI: Google main lamp, Gemini main menu button navigation tech repair, new chat button enter.
Karl Belanger: All right. So we have a blank prompt. So again, I'll do one more example and then I'll show you quickly in Perplexity Pro how this works as well.
AI: Give me three recipe ideas for dinner tonight. Give me three recipe ideas for dinner tonight.
Karl Belanger: So that's my prompt.
AI: Ask Gemini. Gemini is typing. Gemini.
Karl Belanger: All right, it has replied. Let's see what we got.
AI: Give three. Button. All right, let's get some dinner inspiration going, taste and time constraints. All right, let's get some dinner inspiration going. Here are three recipe ideas that cater to different taste and time constraints.
Karl Belanger: And so right off the bat here we see because we didn't specify anything, it gave us things and it immediately told us, hey, these are three ideas that cater to different tastes and time constraints.
AI: List with three items, bullet easy, sheet pan, lemon, herb, chicken and veggies, list with three items. Bullet. This is a fantastic option for chicken thighs with bullet drizzle with olive oil, lemon juice and herbs like rosemary and thyme, season with salt.
Karl Belanger: And so now it's giving me fairly generic recipes with ... it's not giving me specific quantities or other things like that. It's all just kind of general suggestions. So again, we can make this a lot better.
AI: Enter a prompt here and multi-line ask Gemini.
Karl Belanger: Let's see, give me three recipes.
AI: Give me three recipes for dinner tonight that are kid friendly and quick to prepare while using healthy ingredients. Give me three recipes for dinner tonight that are kid-friendly and quick to prepare while using healthy ingredients.
Karl Belanger: That's my response.
AI: Gemini replies, give me three recipes, heading left, great, kid-friendly, quick and healthy is the perfect trifecta. Here are three ideas. List with three items. One. Pizza quesadillas. List with four items. Bullet. Why it works. Kids love pizza. Bullet ingredients. Wholewheat tortillas, low sodium marinara sauce, hard skin mozzarella cheese and any kid friendly veggies you have, bell peppers, spinach, mushrooms. Bullet instructions. Spread a thin layer of marinara on a tortilla, sprinkle with cheese and veggies, top with another tortilla. Cook in a skillet or griddle until golden brown and the cheese is melted. Cut into wedges. Bullet healthy angle. Wholewheat tortillas add fiber and you can control the cheese. Out of list two, chicken and veggies.
Karl Belanger: And there we go. We have three different recipes that are much more targeted and that can do what we're looking for. Just real quick, I'm almost out of time.
AI: Perplexity.
Karl Belanger: I do want to show you Perplexity because this has a real cool thing about it, it can do sources and things like that.
AI: Ask anything. No next edit field. Out of edit. What do you want to know? Ask anything. Edit multi-line. Ask anything. Please provide three top tip for effective AI prompting. [inaudible 00:22:28] Please provide three top tips for effective AI prompting and why each is useful. Perplexity document selected.
Karl Belanger: All right, so Perplexity is not quite as accessible as Gemini.
AI: Please provide-
Karl Belanger: It doesn't give me a feedback when it's done, but I just did the search, now if I down arrow-
AI: Pro search.
Karl Belanger: I just did a shift H to find heading and-
AI: Please provide 37 sources.
Karl Belanger: ... it found 37 sources for my simple request of why, find three AI prompting things, tips. So let's just read this real quick.
AI: Blank link AI prompting best practices, link graphic, link code, link AI help, link graphic, link, link, link graphic, link graphic.
Karl Belanger: And so provides links to the top three sources.
AI: Graphic, graphic plus blank plus 34 sources.
Karl Belanger: Plus 34 sources, it's not called a button, but we could hit enter on that to expand it.
AI: Answer, here are three top tips for effective AI prompting and why each is useful. List with three items. One, be specific and provide context. Link AI prompting, best practices. [inaudible 00:23:28] Link getting started with prompts. Link for text-based generative AI tools, link GPT knowledge called prompts include relevant details, background information and the purpose of your query helps.
Instead of asking [inaudible 00:23:41] assign a role and persona to the AI link getting started with prompts for text-based. Link out a prompt of expert or persona. You can enhance the quality. [inaudible 00:23:47]
Karl Belanger: And this is one piece that I hadn't mentioned in my previous piece, but you can, and there is some differing opinion on whether this is viable or not or it's still necessary, but you can say, as a dietitian for recipes or as a marketing expert or acting as a whatever your role is, create a document that will do such and such. And that is another good tip as well that you can use strategically to help when you want it.
AI: You're seeking, question, link of using specific three, use natural language and be conversational, link the ultimate-
Karl Belanger: Yep, and that's again phrasing it just in a way that makes sense like you're talking to a person and be direct. At this point I'm going to turn it over to
Jonathan to talk about how we can use some of these techniques in blindness specific AI tools.
Jonathan Mosen: Thanks very much Karl. Well, in my section of this webinar I'm going to discuss making the most of tools that can give you access to visual information. Broadly speaking, there are two categories of apps that can perform this function. First, there are specific tools that have been designed to perform what we might call visual interpretation. And that is to say you take a picture or you provide an existing image and you interact with the tool to learn about the image.
These tools are designed specifically for that purpose. And the intended audience is blind people. Now your second option is to use mainstream tools for this sort of thing, for image recognition; and you'll need to work a little harder on your prompt engineering skills to get the results that you want.
So let's talk first about the blindness specific apps. It's not an exhaustive list because we now have a lot of options, but some tools like this include Be My AI, which is a part of the Be My Eyes app; AccessAI, which is part of the Aira Explorer app and is also on the Aira Explorer website. You've got Seeing AI from Microsoft, which is available for Android and iOS. Google Lookout, which is Android specific. There's the new Ally from Envision which is about to come out of beta and it's available on the web and smartphone platforms. And let's not forget Pixxiebot, which describes videos.
The tool that you choose to use can often come down to personal preference in terms of the way that the user interface works, whether you find it easy to use or not or maybe the style of the descriptions that you prefer. Some tools are now becoming more aware of user preferences and they give you some choice over how verbose the descriptions are. In a way, this is actually a form of prompt engineering made easy for you. Through selections on a settings screen, you're giving instructions to the AI about how you would like things described you, how you would like it to respond. So prompt engineering made simple.
Regardless of whether configuration options exist or not, in the context of today's discussion on prompt engineering, it's important to emphasize that all these companies have done their own prompt engineering behind the scenes. And these prompts that you don't get to see are part of the app's secret sauce.
Another factor that will influence the output that you get is the large language model that they use, for example, ChatGPT, Google, Gemini or Claude. Some of them have access to several large language models from different companies and they may determine which LLM to send your image to based on the kind of image that you're sending.
Now Picture Smart, which is built into JAWS from Vispero, does allow you to specify which LLM you use, so you can try them both if you want to. Some of these apps are more transparent than others about exactly which large language model they're using and when. If the services give you the chance to rate their responses, you can be sure that those ratings are being used to tweak the services over time. To work on the background prompt engineering that these companies are doing, they will go through that data. So if you feel strongly, either positively or negatively, about a result that you've got back, it's a really good idea to rate it if that option is offered to you. Because I know from first-hand experience that that data is examined.
All of the smartphone apps designed specifically for visual interpretation have a mode that'll give you a description of a picture the moment that you take it. I mean, once it's come back from the cloud. Some, like the Access AI in the Aira Explorer app, have an advanced mode that let's you ask a question right away as you send the picture. And Picture Smart from Vispero does this as well. Usually the automated responses are easier to access because they're what most people use most of the time.
I can't independently verify this of course, but I do believe that usually AI is doing a good job of describing scenes for me, such as if I'm at a restaurant and I want to get a feel for the visual ambience and the decor. If I'm standing on a balcony overlooking something picturesque. Or if I'm looking at something in a museum.
Over the weekend, for example, I snapped some pictures with some of the popular AI tools outside the Jernigan Institute while I was waiting for an Uber. By simply taking the picture, I was given a broad response about the scene, referring to bushes that could be seen outside and some cars that were in the image. And I like to think of this concept as audibly zooming in on parts of the image.
Using good prompt engineering skills, you can expand on the parts of the overview that interest you. For example, rather than saying tell me about the cars you can see, which gives the AI a lot of discretion about how it interprets the request, I was specific and I said give me a detailed description of each car you can see starting with the one closest to me and then moving further away, provide as much detail as possible. And the descriptions I got of the silver sedan and the black SUV and some other cars were excellent. And I presume they were accurate.
I have sometimes found this technique to be helpful if I'm looking for a ride-share that has pulled up. And we've all seen those examples where they tell you that someone's pulling up, they describe the color and the vehicle manufacturer and even the license plate. I haven't had too much luck with license plates, but I have been able to say, find the white Toyota Prius for example.
Describing people is still a controversial topic in the world of AI. But it's generally acknowledged that a blind person should have the same information that a sighted person can simply see. The blindness specific AI tools will attempt to describe someone's appearance and what they're wearing. The details that the AI can provide depend of course on what's in the image. But if, as is common on social media, someone takes a full length selfie with the help of a mirror, those with an interest in fashion choices can ask detailed questions.
Where I've had much less consistent results is in the area of identifying controls on appliances or remote controls. I think what's happening here is that sometimes an AI is taking a look at a device and making a wrong conclusion about what device it is. If the manufacturer of the device is clearly visible in the picture, that will help. But even then it may pick the wrong device to describe to you.
Recently before leaving New Zealand, I took a picture of the remote control of our Samsung TV, which is still in New Zealand because it wouldn't work here, some of the AI tools I used clearly identified that I was looking at the remote control of a Samsung TV, but when I asked the AI to read me the controls from left to right top to bottom and provide a detailed explanation of their function, some of them described a remote control that I knew wasn't mine.
If this happens, sometimes you can give the AI a bit of a gentle correction. And this is a form of prompt engineering you might want to get into from time to time, not just with visual things. Even saying, are you sure, would you mind taking another look at this is sometimes enough to get it on track again with an apology and the correct information. At other times it'll still get it wrong or insist that it was right in the first place.
It can be very frustrating to think that you have a description and the order of the controls only to press the buttons and find that they're not performing the functions the AI say they should. And in fact it can get you into a real pickle if you drill deep down into menus on an inaccessible device. The quality of the image can have a real bearing on this, so be sure you have good natural lighting whenever that's possible, and if it's not possible, enable the flashlight for your camera.
This morning, I conducted an experiment and I took a picture of the Honeywell thermostat in my office here at the Jernigan Institute, because this is a really common task for many of us, right? When we perform this, we go to a hotel, we want to know how to adjust the temperature, finding the thermostat in the first place and then getting it to the temperature that we like.
All the apps that I tried identified the manufacturer of the thermostat. And most of them without prompting told me the temperature on the display. And I think this is an example of the internal prompt engineering at work. An AI for a sighted person is less likely to speak or display that temperature information because it concludes that a sighted person can already see it. For example, when I asked ChatGPT's real-time video mode to tell me about the thermostat, it didn't tell me the temperature, although it did when I told it that I was a blind person and required a detailed description.
The fun began when I asked each AI tool to tell me the controls left to right because there's a row of buttons on the bottom of this thermostat. I had a bit of a head start here because I knew that the leftmost control had a minus sign. It was a tactually distinguishable minus sign on it. So that would be for decreasing the temperature. And the far right control had a plus sign on it, so that would be for increasing the temperature. I didn't know what the middle buttons did.
Envision Ally got it right the first time. This was the one that's coming out from Envision in the very near future. But other tools gave me erroneous results. Sometimes I was able to get them on track by giving them a gentle, are you sure kind of nudge if I said, are you sure about that, the thermostat left and rightmost buttons seem like temperature controls to me. That sometimes offered enough correction for it to be helpful.
The bottom line here though is that I have found for questions relating to controls on appliances, human verification is advisable. And it's what Karl said before, trust but verify. In the Be My AI portion of the Be My Eyes app, there's an easy way to connect with a volunteer if you want to do that. And the Aira Explorer app allows you to seek human verification from a professionally trained agent. Now there's no cost for that.
The response can take some time when Aira is busy, but I find this to be very useful in a situation like the thermostat example. Within about three minutes I received a response from an Aira agent giving me the correct order of the controls. And since it's provided by text, it's easy to copy somewhere, I can copy it to the clipboard and paste it somewhere if I'm building a little cheat sheet on the device in question.
You can perform vision interpretation functions with mainstream AI applications as well that offer a camera function and with hardware such as the very popular Ray-Ban Meta smart glasses. For mainstream applications and devices, your prompt engineering skills are even more important. Some mainstream AI tools let you tell them to remember things. We talked about this with the question that we had earlier. So you could for example say, remember I am blind, so image descriptions should be as detailed as possible.
But I don't believe this sort of memory is making a difference when it comes to describing images. If I take a picture with ChatGPT or use its live video mode, which I'll come to in just a bit, I find that if I tell it that I'm blind and require a detailed description, I get much more detail than I normally do.
Now in the past, telling an AI you're a blind person may result in an unwanted expression of sympathy about you being blind. Many of us who were early adopters of AI are familiar with this. You tell it that and it will say I'm sorry you're blind. But the National Federation of the Blind has been actively working on that and it isn't something that I've seen now in quite some time.
So when using a mainstream tool, I do disclose my blindness now and I get much better results because of it. This is particularly notable with the Ray-Ban Meta smart glasses. By saying I'm blind, give me a detailed description. It makes a huge difference. Now, it may be that saying give a detailed description is enough, but when it comes to AI, I always find that context and detail can make a difference. And as I say, as long as it's not going to apologize to me for my blindness, I am quite happy to tell it.
These tools are not necessarily designed to perform optical character recognition of documents. What they are designed to do is summarize. But in particular Ray-Ban Meta is working on this. We meet with them and they are aware of the glasses having gone so viral in the blind community. And they're already responding to that by introducing Be My Eye support directly into the user interface.
I found that if you get a print document and you ask the Ray-Ban smart glasses to read the document word-for-word, it now sometimes will. I've seen other times when it won't, particularly where it may conclude that there is too much personal information in the document or there's some disclosure of address details, that sort of thing seems to trigger it. But in general with Ray-Ban Meta smart glasses, the more precise you can be about what you want Meta AI to give you, the better results you're likely to get.
Because for most people, for people who are sighted, this is an augmentation of the world they can see around them. So actually telling it you're a blind person or that you need detailed descriptions makes an enormous difference. This product is particularly sensitive to prompt engineering.
Where summarization can be very helpful, whether you use a blindness specific or a mainstream tool, is when you're reading a document where you want to extract key bits of information. For example, you can take a picture of a menu at a restaurant and ask for the lowest price choice on the menu or what steaks are on the menu.
Ally from Envision lets you streamline this process even further by giving you the opportunity to have the AI remember a whole bunch of things about you, including your dietary preferences. When you provide a restaurant menu to it, Ally will then volunteer choices that seem to match your dietary preferences.
Since these tools aren't designed to do OCR, at least in most cases, if you want the most accurate reading of an important document, you may be better using traditional optical character recognition. Seeing AI and Lookout offer this, and there are other apps such as VDScan on the iPhone and the KNFB Reader functions built in to the NFB Newsline app.
I recently heard from an NFB member who was annoyed that their state ID had their last name spelled incorrectly on it. And they were about to call an Uber and get down there and castigate the issuing authority and demand that they get their ID corrected. When they decided they should get some human verification, which said that the ID was spelt correctly all along. The AI had just decided that the name was a misspelling and took it upon itself to correct it. So as we always say, it's about using the right tool in the toolbox.
ChatGPT and Google now both offer mainstream real-time video AI. Google's is a little more difficult to find because it's not yet in their Gemini app. I suspect it'll get there at some point. But ChatGPT's is available to Plus subscribers via the ChatGPT app. This can be very useful because it doesn't require you to take pictures manually, so you don't have to keep fumbling and asking for texts because you can talk to this thing in real time. You don't have to wait for a response to be returned, it is very quick. And I'll show you this in just a second.
But this technology is not quite yet where we at the NFB would like it to be. Although we are confident it will evolve quickly as AI tends to do. At the moment, you can ask a question of the AI and it'll give you a very prompt response based on what it sees at the moment you ask the question. What you can't yet get it to do is to ask it to keep watch for something. For example, you can't be walking down the street and ask it to tell you when it sees a McDonald's. You'll have to keep asking, do you see a McDonald's in the picture? And it will give you an immediate response, but it just won't watch out for you.
What I'm going to do now is just show you this in real time because it is a fairly new feature and because it does cost. So I am going to turn speech on on my iPhone.
AI: Speech on.
Jonathan Mosen: And I have ChatGPT voice in my control center because I use it quite a bit. So I'm using an iPhone. I'll go to control center.
AI: Explorer, control center, create event, open chat voice.
Jonathan Mosen: And there's open ChatGPT voice. So I'm going to double tap that.
AI: Open chat button.
Jonathan Mosen: Hello, GP, are you there?
GP: Hi, Jonathan, I'm here, how's it going?
Jonathan Mosen: Mate, would you like to say hello to all the attendees of the NFB boutique on how to engineer an AI prompt.
GP: Hello everyone at the NFB boutique, it's fantastic to be here with you all. Let's dive into the art of engineering effective AI prompts. Do you have any-
Jonathan Mosen: Yes, I do, yes, sorry to cut you off, but I guess that proves that I can cut you off. Now you have a video feature that is quite new, correct?
GP: Yes, I do have a video feature. If you'd like to use it just let me know.
Jonathan Mosen: Okay. I want to use it. I thought I had to enable the camera myself, can you enable the camera for me?
GP: I can't enable your camera for you.
Jonathan Mosen: Exactly. Okay. So I'm going to do that now.
AI: Choose open video camera. Switch button off.
Jonathan Mosen: I'll double tap that. Okay. You got eyes now, GP?
GP: I can see a bit of what's in front of you. What would you like me to help with?
Jonathan Mosen: Just describe the room if you would.
GP: From what I can see, there's a wooden panel wall behind you and a laptop in front. If you'd like more detailed information about anything specific, just let me know.
Jonathan Mosen: Tell me about the laptop, can you tell what the manufacturer is?
GP: Yes, it looks like a Lenovo ThinkPad, identifiable by the red track point in the center of the keyboard. If you'd like more details or need help with anything specific, just let me know.
Jonathan Mosen: Now we're doing a webinar, as I said, so we've got some audio equipment. What did you think about this audio interface, can you tell me what that audio interface is?
GP: The audio interface looks like a Focusrite device, possibly a Scarlett model. It's compact with several input and output options. If you need any details on its features or setup, I'm happy to help.
Jonathan Mosen: It is a Focusrite, but it's not a Scarlett. Can you maybe take another look and see if you can get it right?
GP: My mistake. Based on the design and layout, it looks like a Focusrite Vocaster, possibly the Vocaster II. It has distinct controls for microphones and headphones. If you need more details about its features or setup, let me know.
Jonathan Mosen: That is correct, that it's a Vocaster II. All right, you get a chocolate fish. Now, here's the microphone and I'm going to try and position the logo in the view of the camera. Can you tell me what microphone this is?
GP: Certainly, the microphone appears to be a Shure SM7B.
Jonathan Mosen: No, it isn't, have another go.
GP: Ah, I see a distinctive logo, it looks like a Heil Sound PR 40 with its gold finish. It's also a great microphone for voice work and podcasting.
Jonathan Mosen: So why did you think it was a Shure first and now you know it's a Heil?
GP: The initial angle made it hard to see the distinctive logo. Now that I have a clear view, I can see the unique shape and details that indicate it's a Heil PR 40. Thanks for adjusting the camera.
Jonathan Mosen: Oh, anytime, mate, anytime. All right, well, thank you for being a part of this webinar. It's been an absolute blast. And I'm sure that people found it helpful.
GP: You're very welcome, Jonathan, I'm glad I could help. If there's anything else you need, just let me know. Enjoy the rest of your webinar.
Jonathan Mosen: Yeah, well, you too. So that is a quick look at some of the AI tools and the new real-time mode. You will find that this becomes more and more common. It is quite remarkable in terms of just the flow of that. And we will start to see Siri responding in kind as well in the next little while. And I believe in the next 24 hours you will find another pretty prominent manufacturer entering into this space as well.
Kennedy Zimnik: I am going to be talking about creating images with ChatGPT specifically, but a lot of this stuff applies to other AIs as well. So I'm going to be describing what's going on on screen. I'll also turn on my screen reader and let you guys hear what it sounds like.
So like I said, I am using ChatGPT, and I'm using ChatGPT Plus. Like Jonathan just said, this is a 20 dollars a month subscription. It gives you access to all the ChatGPT features. And the really cool part about it that I'm going to be showing today is the image creation feature. And then I'm also going to be showing how to create your own custom GPT. I'm going to show you a tactile graphic-ready image creator that I created using this custom GPT. And then I'll show you how to get that image from ChatGPT into a Word document and you can save it as a PDF if you'd like.
I'm just on ChatGPT.com. There's also apps that you can download. I just saw that you can download a desktop application. I don't have it downloaded now, but I might do that. There's also smartphone applications for Android and iPhone of course.
The biggest thing about image creation, just like we've been talking about today, is being specific for what you want and then also being specific in the style that you want it. So AI images kind of have this look to them that you can tell that they're AI. If you're not specific with what you ask it for, then it's going return an image that has kind of the shading and the quality and kind of the elements that you'd expect to find in an AI image. If you want to get away from that, then you have to be a little bit more specific with what you want. And then usually ask it for more of a flat image or less shading, things like that.
So I'm going to start today with just, this is just regular ChatGPT, this isn't any custom GPTs or anything like that, so this is just the base GPT. And I'm using GPT 4o, this is the GPT that's automatically selected. I'm not going to go through all of them, there's a good amount and they always ... kind of different ways that you can interact with it. So I'm just going to be using 4o for today.
So to create an image, you just basically ask it to create that image. So create an image, let's just do a person ice skating on a sunny winter day. We'll see what that does. So I just said create an image of a person ice skating on a sunny winter day. So it's creating this image. So we created this image, it popped up in the main chat and it has some text underneath. Here's an image of a person ice skating on a sunny winter day. Let me know if you want any adjustments.
So I'm going to ask it, describe this image as if I were a blind person. This image to me as if I were a blind person. Okay. I'm going to just let this write out. In my findings asking for it to describe it as if I were a blind person or if I had low vision was a really easy way to get really detailed descriptions of the image. So I've just been using that. And of course you can always ask it for more details. If you have a detail that it describes and you want some more information on that, you can always do that. I'm going to go ahead and press the read aloud button.
Speaker 10: The image shows a peaceful winter scene where a person is ice skating on a frozen lake under a bright sunny sky. The skater is dressed warmly in a thick winter coat, scarf, gloves and boots, gliding gracefully across the smooth ice. The sun is shining, casting long, crisp shadows and reflecting brightly off the icy surface. In the background snow-covered trees line the edges of the lake, their branches heavy with fresh snow. The sky is a clear brilliant blue with no clouds in sight adding to the serene and joyful atmosphere. The ice itself is glossy and reflective with faint skate marks visible from previous glides. The scene feels vibrant yet tranquil, capturing the crispness of winter air and the simple pleasure of skating outdoors on a perfect winter day.
Kennedy Zimnik: So that was the read aloud feature. You can change the voice and settings. I think it sometimes sounds better than just a screen reader reading it, so that's why I did that. If you wanted to change something about this image, you can always ask it to modify the image somewhere. OS, you can modify the person to wearing a different color hat, you can have them without a jacket on, with a jacket on, things like that. So that is all possible with image creation.
What we're interested in here is tactile graphics. So tactile graphics are simple graphics, the digital version, that don't have a lot of different colors and shadings and textures. Tactile graphics, if you don't know are graphics that you create in different ways, whether that is on an embosser, on a different thermoform machine, there's a lot of different ways to create tactile graphics. But in the end, you'll have a graphic that you can touch and feel to interact with.
So when we're creating tactile graphic ready images, we're just looking for simple images that are black and white. A lot of the images are line images, so just use lines. And not a lot of different textures and shading like I was saying. And you might have some labels as well. We've not been able to do Braille labels yet, I don't think the technology is quite at creating good Braille, if that makes sense. But eventually that would be kind of the goal to have maybe a STEM graphic that has Braille labels labeling the different parts of a cell, for example.
So I'm going to switch over now to my custom GPT that I made. On the left side of the screen you have ChatGPT and all the custom GPTs that you've been using. Before I go there, let me just show you the Explore GPT window. So the Explore GPT allows you to explore different custom GPTs that are out there. So let me just show you how it works first and then maybe you'll kind of get what a custom GPT is.
So this is the one that I created, tactile graphic ready image producer. I'm going to go ahead and edit this GPT. Anybody with GPT Plus can do this. What a custom GPT is is a custom chatbot that produces responses within a certain framework that you've set out. So for example, this is my tactile graphic ready image producer.
So these are the instructions I gave it. This should be an image-based GPT that helps people create tactile, graphic ready images. Tactile graphics are graphics that blind and low vision people can interact with. Meaning the images should not have extra shading, it should be line-based, not have a lot of textures, be simple black and white. There should be no extra black dots added in the background. The background should only be white. There should be no extra dots, lines or texture added for effect. Only create the image that it has asked for and do not add any extra elements. So these were instructions that I went into and configured this custom ChatGPT with.
Now that this is created, I'm going to go back. Let's do update. And I'm going to do view GPT. So I'm going to turn on the screen reader now and kind of go through how a blind person would create an image and then get that description read back to them. So we're in my tactile graphic ready image producer. I'm going to turn on NVDA.
NVDA: Loading NVDA. Welcome to NVDA. Welcome to ... welcome ChatGPT tactile graphic ready image producer document, main landmark, section, multi-line editable ...
Kennedy Zimnik: Because it's already this custom GPT, I don't have to be too specific with wanting a tactile graphic. So I'm going to do create an image of a school bus.
NVDA: ChatGPT is generating a response. From my ... ChatGPT says a simple black ... here is a simple black and white line drawing of a school bus. Let me know from ...
Kennedy Zimnik: Okay. And it created a simple black and white line drawing of a school bus. If we wanted to get a description of this.
NVDA: [inaudible 00:56:20].
Kennedy Zimnik: So describe the image as if I were a blind person.
NVDA: ChatGPT is generating a response. ChatGPT is still generating a response. ChatGPT is still generating a response.
Kennedy Zimnik: Okay, thank you for telling me that.
NVDA: Out of ... clickable. The image is a simple black and white line drawing of a school bus viewed from the side. The bus has a long rectangular body with a slightly rounded front where the driver's seat would be. Beside there are multiple evenly spaced rectangles.
Kennedy Zimnik: So I'm not going to go through the whole thing, but it describes the image. And I could also do that read aloud feature as well. If I wanted to save this, what we're going to do is we're going to navigate using G to get to the previous graphic. And then we're going to do shift function, F10, to simulate a right click, do a save image as, save that image. And then we could open it in a Word document or a PDF document and insert it in.
NVDA: A simple black and white line drawing of a school bus. The bus is depicted from a side view showing its rectangle-
Kennedy Zimnik: It started giving me a description of the image. When this focus is on the image, and I got there just by doing shift G, I'm going to do shift function F10.
NVDA: Menu, sub menu, open image in new ... save image as ChatGPT, save as dialogue. B-U-S, dot, J-P-G, tool bar, save button, save, download complete, bus.jpg.
Kennedy Zimnik: Great.
NVDA: Photos. [inaudible 00:58:06]
Kennedy Zimnik: So now I could open this file in Word and then insert it as a picture that I had.
Jonathan Mosen: Kennedy Zimnik concluding our AI prompt engineering boutique. One of many boutiques that we've done and there are many to come. And if you'd like to find out more about what's coming up, head on over to nfb.org/cena. That's nfb.org/cena, which is our center of excellence in non-visual accessibility here at the National Federation of the Blind. You'll find the training page there where you can register your interest to find out about our boutiques and also longer form training coming up.
That concludes this episode of Access On, the technology podcast of the National Federation of the Blind. To send in a contribution for a future episode, email us, attach an audio clip or just write it down and send it to [email protected]. That's [email protected].
To keep up to date with Access On, follow us on Mastodon, [email protected]. That's [email protected] on Mastodon. To subscribe to an announcement only email list about upcoming episodes, send a blank message to [email protected]. That's [email protected]. To learn more about the National Federation of the Blind, visit our website, nfb.org, or phone us 410-659-9314. That's 410-659-9314. And be sure to check out the Nation's Blind podcast right from where you heard this podcast.