Future Reflections        Special Issue: Technology


From Handouts to Digital Files

by Marshall Flax

From the Editor: Marshall Flax is a computer programmer and the father of two blind children.

Elsewhere in this issue you will learn about the ever-growing options for accessing electronic text: screen readers, refreshable Braille displays, embossers, iPads, and commercial ebooks. Ideally, you should be home free once you have computer-readable text. But how do you get computer-readable text when your kid's teacher sends home a photocopied handout? In this short article I will offer some useful techniques.

It's a pretty safe assumption that the teacher's handout came from an online source. Somewhere an electronic version exists, if only you can find it. Google, of course, should be your first stop. Go to the middle of the handout and pick out a string of words, perhaps five to ten words long. Phrases containing unusual words or unusual juxtapositions of words work best; phrases with lots of hyphenation or punctuation do not work as well.

Once you have chosen a phrase to look for, surround the exact words in double quotes and do a Google search. The double quotes will tell Google that you want those exact words in that exact order, rather than just a page containing them anywhere. If you choose your phrase carefully, you're likely to find only the article in question. (And since you chose a phrase from the middle, you're not going to get sidetracked by a site that only has the first or last pages of the article you need.)
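For the more technical reader, here is a minimal sketch of the two steps above in Python. It assumes you have typed your chosen phrase in by hand. The function names are hypothetical, and the punctuation cutoff of two characters is my own assumption, not a rule Google publishes. One function roughly checks the advice about length and punctuation; the other wraps the phrase in double quotes and builds an ordinary Google search address from it.

```python
import string
from urllib.parse import quote_plus


def looks_like_good_phrase(phrase: str, min_words: int = 5, max_words: int = 10) -> bool:
    """Rough check: five to ten words, with little hyphenation or punctuation."""
    words = phrase.split()
    if not (min_words <= len(words) <= max_words):
        return False
    # Count punctuation characters (hyphens included). The cutoff of 2 is an assumption.
    punctuation_count = sum(1 for ch in phrase if ch in string.punctuation)
    return punctuation_count <= 2


def exact_phrase_search_url(phrase: str) -> str:
    """Wrap the phrase in double quotes and URL-encode it as a Google search query."""
    # Collapse stray spaces or line breaks left over from typing the phrase in.
    query = '"' + " ".join(phrase.split()) + '"'
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    phrase = "unusual words or juxtapositions of words are best"
    if looks_like_good_phrase(phrase):
        print(exact_phrase_search_url(phrase))
```

Pasting the printed address into a browser gives the same results as typing the quoted phrase into Google's search box; the quotes simply show up as `%22` in the address.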

When you have found the right webpage, you might have trouble separating the text from the surrounding graphics. In this case, you have a few options. Sometimes the cached copy linked from Google's search results works better than the page itself. Sometimes cutting and pasting into an application that doesn't know about formatting and graphics (Notepad or, if you're more technical, gVim from <http://www.vim.org>) works wonders.
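If you've saved the webpage to your computer, the same "plain text only" idea can be sketched in Python using the standard library's HTML parser. This is an illustration under my own assumptions, not what Notepad or gVim does internally. The class and function names are hypothetical. The sketch keeps the visible text, drops tags, images, scripts, and styles, and starts a new line at common block elements such as paragraphs and headings.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect a page's visible text, ignoring tags, images, scripts, and styles."""

    SKIP = {"script", "style"}  # elements whose contents are never shown as text
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &amp; etc. into characters
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")  # block elements start on a fresh line

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the page's text with extra whitespace and blank lines removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `html_to_text(open("handout.html", encoding="utf-8").read())` would return the text of a saved page named `handout.html` (a hypothetical file name).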

If the article isn't online, your child's teacher may be able to send it to you as a PDF file. Sometimes cutting and pasting from Adobe Reader is all that's necessary, though you may have to play a bit with the accessibility options. Instead of playing with Adobe Reader, it's usually faster and easier to email the PDF to your Gmail account and use the preview feature from your Gmail inbox page. (Don't have a Gmail account? Get one, even if you never tell anyone about it!) This technique also works well for DOC and DOCX files, and probably for many other file formats as well.

What if the PDF was created by scanning a printed page, so it contains no actual text, only a picture of the text? In this case, upload the PDF to Google Docs. (Don't have a Gmail account? See the previous paragraph. If you do have one, go to <docs.google.com> and you'll find your Google Docs account.) When you upload a scanned PDF into Google Docs, it will perform Optical Character Recognition (OCR), guessing what the letters are.

Google's OCR technology is free and good, and often it will do exactly what you need. However, it limits the size of the files it will process. To split a large document into smaller, manageable pieces, you'll need more specialized techniques (detailed at <http://sites.google.com/site/marshallflax/advanced-pdf-ocr>).
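Before splitting a long scan, it helps to decide which pages go into which piece. The sketch below is a hypothetical helper under a loudly stated assumption: it uses a page count as a stand-in for file size, and `max_pages` is whatever limit you choose, not Google's actual limit. It only plans the page ranges; the splitting itself still needs a PDF tool, such as those described at the link above.

```python
def page_ranges(total_pages: int, max_pages: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into consecutive inclusive ranges of at most max_pages.

    Hypothetical helper: max_pages is an assumed stand-in for the real size limit.
    """
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages must be >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]
```

For a 45-page scan and an assumed limit of 20 pages per piece, `page_ranges(45, 20)` returns `[(1, 20), (21, 40), (41, 45)]`: three pieces, each small enough to upload separately.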

Google isn't your only friend. Bookshare.org is a wonderful resource, and, for good reasons, it doesn't seem to be indexed by Google. (If you're a kid, be sure to log out before searching; Bookshare silently censors results if you're logged in as a child user.)

Most of all, be fearless! In civilized countries, people rarely die from poorly executed web searches. Enjoy yourself, and with a little luck you'll save someone (perhaps yourself) from having to type in entire articles by hand.

And vote for legislators who support the freedom of the web and promote laws that encourage academic publishers to make their publications easily available to our kids. Thank you!
