Artificial Intelligence tools have made huge strides in transcribing handwritten text in recent years. I’ve already written about FamilySearch’s use of AI to transcribe thousands of deed and probate images here: AI-Powered Full-Text Search of Handwritten Text at FamilySearch.
In this post, I’ll share how I’ve been using ChatGPT 4.0 (the paid version) and Claude (free version) to upload images (.jpg and .png) and transcribe them quickly. These large language models (LLMs) do a pretty good job of reading clear handwriting. I often use them for simple tasks that would take me 3-4 minutes by hand, which cuts the time to about 30 seconds.
ChatGPT 4.0 and Claude
The paid version of ChatGPT by OpenAI (https://chatgpt.com/) costs $20 per month and allows you to upload files. The free version won’t let you do this. Claude (https://claude.ai/) is made by Anthropic, a company founded by former OpenAI employees. It is similar in its capabilities, but it allows free users to upload files.
Update: On May 13, 2024, OpenAI released ChatGPT 4o (o for omni), which is available to free users and allows the upload of files.
Document Images
The images that I ask the LLMs to transcribe for me are usually a mix of typed or printed text and handwritten text, most often cursive. Often they are short documents like marriage bonds, birth certificates, bills of sale, pension applications, etc.
Here’s an example of a document I recently transcribed with the help of ChatGPT 4.0:
Source: Compiled service record, William T. Dyer, Sergeant, Co. D, 39 Tennessee Mounted Infantry (Confederate), parole document, 9 July 1863; “Compiled Service Records of Confederate Soldiers Who Served in Organizations from the State of Tennessee,” database with images, Fold3 (https://www.fold3.com/image/78183425 : accessed 16 Apr 2024); citing Carded Records Showing Military Service of Soldiers Who Fought in Confederate Organizations, compiled 1903 – 1927, documenting the period 1861 – 1865, Record Group 109, The National Archives.
I had already transcribed the other cards in the compiled service record into my Airtable research log, but I decided to save some time and have this one transcribed by ChatGPT.
Chatting with ChatGPT
Before giving ChatGPT the job to transcribe the image you copy/paste or upload into the chat box, it helps to provide it with some context. Here’s the prompt I wrote for the parole document:
You are an expert genealogist. Transcribe this page from the Civil War compiled military service record of William T. Dyer from Hawkins County, Tennessee. It says W.T. Dyer, a 4 Serg. of Co. D. 31 Reg’t Tenn Vols, C.S.A…. and is signed William T. Dyer. The paroling officer appears to be illegible, but could be [N.? Pullen?] of the 20th reg’t of Illinois vols.
I already knew the name of the research subject from transcribing the other carded images, and it’s relatively easy for me to guess at the name of the paroling officer. But to fully capture the essence of this document, I’d rather have a full transcription than an abstract or summary. That’s why I asked ChatGPT to transcribe the full thing.
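If you write this kind of prompt often, the pattern above (role, document context, details already read, uncertain readings, and a request for a full transcription) can be kept as a small template. The following is only a sketch of one way to do that. The function name `build_transcription_prompt` and its parameters are hypothetical, and the wording it produces is my own arrangement rather than a required format.

```python
def build_transcription_prompt(document_type, subject, known_details, uncertain_readings=()):
    """Assemble a transcription request that gives the chatbot context first.

    All names here are illustrative; adjust the wording to suit the record.
    """
    lines = [
        "You are an expert genealogist.",
        f"Transcribe this {document_type} of {subject}.",
    ]
    if known_details:
        # Details already read from the image or from related records.
        lines.append("Details I have already read: " + "; ".join(known_details) + ".")
    for reading in uncertain_readings:
        # Square brackets with a question mark flag a guess, as in the prompt above.
        lines.append(f"Uncertain reading, please check: [{reading}?]")
    lines.append("Give a full, line-by-line transcription, not a summary or abstract.")
    lines.append("Put anything you cannot read in square brackets as [illegible].")
    return "\n".join(lines)


prompt = build_transcription_prompt(
    document_type="page from the Civil War compiled military service record",
    subject="William T. Dyer from Hawkins County, Tennessee",
    known_details=[
        "W.T. Dyer, a 4 Serg. of Co. D. 31 Reg't Tenn Vols, C.S.A.",
        "signed William T. Dyer",
    ],
    uncertain_readings=["N. Pullen, 20th reg't of Illinois vols., paroling officer"],
)
print(prompt)
```

The printed text can be pasted into the chat box along with the image. Asking explicitly for a full transcription keeps the chatbot from returning a summary instead.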
Fact Checking
An important next step in using LLMs is comparing the information from the generated transcription with the original. I noticed the date at the top was transcribed as July 5th, but the document actually said July 9th. The date at the bottom was transcribed as [illegible]. I prompted the chatbot to fix this and update a couple of things:
The date at the top is July 9th. Then at the bottom it’s July 10th. Add N. Pullen in square brackets in the place of the illegible signature, followed by 20th reg’t Illiinois Vols, Captain, and paroling officer.
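Dates and bracketed placeholders were exactly what needed fixing here, so it can help to pull those items out of a transcription as a checklist before comparing it with the image. The sketch below is one possible approach. The function name `items_to_verify` is hypothetical, the patterns only cover a couple of common English date styles, and the sample text is an invented stand-in rather than the actual parole document.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|August|"
    "September|October|November|December"
)
# Matches dates such as "July 5th, 1863" or "9 July 1863" (not every format).
DATE_PATTERN = re.compile(
    rf"\b(?:(?:{MONTHS})\s+\d{{1,2}}(?:st|nd|rd|th)?(?:,?\s+\d{{4}})?"
    rf"|\d{{1,2}}\s+(?:{MONTHS})(?:\s+\d{{4}})?)"
)
# Matches anything in square brackets, e.g. "[illegible]" or "[N. Pullen?]".
BRACKET_PATTERN = re.compile(r"\[[^\]]*\]")


def items_to_verify(transcription):
    """Return (line number, kind, text) for each date or bracketed item."""
    items = []
    for number, line in enumerate(transcription.splitlines(), start=1):
        for match in DATE_PATTERN.finditer(line):
            items.append((number, "date", match.group()))
        for match in BRACKET_PATTERN.finditer(line):
            items.append((number, "bracketed", match.group()))
    return items


# Invented sample text that mimics the kinds of problems described above.
sample = (
    "Dated July 5th, 1863\n"
    "Signed William T. Dyer\n"
    "Sworn before me this [illegible]\n"
    "[illegible], paroling officer"
)
for line_number, kind, text in items_to_verify(sample):
    print(f"line {line_number}: check {kind} {text!r} against the image")
```

A list like this does not say what is wrong. It only points to the spots most worth a second look at the original.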
I then copied and pasted the fixed transcription into my research log! Here’s what my entry for this source looked like when I was finished:
Transcribing with Claude
Now I’ll show how to transcribe with Claude.ai, an LLM by Anthropic. In this example, I simply uploaded a marriage license. I asked the chatbot to play the role of an expert genealogist. A role prompt like this nudges the model toward the vocabulary and conventions of genealogical records.
Claude started by giving some key details, instead of transcribing the image. It had a major hallucination – adding a made-up detail that William T. Dyer was the son of Susanna Dyer. However, the actual transcription it gave after the key details was correct.
Transcribing Newspaper Articles Quickly
Snagit’s Grab Text Feature
In Steve Little’s NGS Course, Empowering Genealogists with AI, I learned about a tool called Snagit (https://www.techsmith.com/screen-capture.html). This tool takes screenshots and then allows you to “grab text” from them. This isn’t an AI tool, but it can use OCR to grab text from typed (non-handwritten) materials. I have used that a bit over the last few months, and appreciate it for quickly grabbing text to paste into my research log. However, it doesn’t add nice line breaks or formatting like ChatGPT and Claude. One of the strengths of the chatbots is their ability to format text nicely and use language well.
Here’s how the transcription came out with Snagit’s grab text feature:
ALL READY FOR OPENING
EVERYTHING IN PERFECT READ:
INESS FOR TOMORROW.
Strings of New Horseflesh Still Ar
: rivingā-Attraction Companies are
. Pouring in and Exhibits of all
Kinds Will Abound.
Indications are that an Immense crowd will attend the opening of the first annual exhibition tomorrow after- I noon. President Wall and Secretary Stuart stated today that everything will be In perfect readiness by to- morrew. Exposition hall s being tastily decorated and every lady who hasexibiās is requested to send them in. Tckadies who have charge of this feature of the work state that it is. very Important that exhibits be sent out at once to be arranged for the opening. The exhibits will be taken care of in good shape. Ladies sendins in exhibts will not be required to purchase s asoh tickets.
When ine flag falls tomorrow the throng wili witness the best string of horses ever brought here. The races will be one of the big features of the J meeting and everyone is looking for- ; ward to tismn with keen Interest.
Among .ii new horses that have ar- , rived recently may be included Miss . Steadman, Blue Rock, The Captain ; and others.
Jockey Jchnny Dunbar arrived yes , terday wlh six runners. Mr. Cole of Plttsburg, Kas., came in from Kansas with five harness horses.
This is helpful, but it has a lot of errors.
Using ChatGPT for Newspaper Clippings
I’ve been using ChatGPT to transcribe newspaper clippings quickly. I like to have the full text of the article in my research log. This is saving a lot of time! Here’s how the transcription of the same newspaper article turned out with ChatGPT 4.0:
ALL READY FOR OPENING
EVERYTHING IN PERFECT READINESS FOR TOMORROW.
Strings of New Horseflesh Still Arriving--Attraction Companies are Pouring In and Exhibits of all Kinds Will Abound.
Indications are that an immense crowd will attend the opening of the first annual exhibition tomorrow afternoon. President Wall and Secretary Stuart stated today that everything will be in perfect readiness by tomorrow. Exposition hall is being tastily decorated and every lady who has exhibits is requested to send them in. The ladies who have charge of this feature of the work state that it is very important that exhibits be sent out at once to be arranged for the opening. The exhibits will be taken care of in good shape. Ladies sending in exhibits will not be required to purchase season tickets.
When the flag falls tomorrow the throng will witness the best string of horses ever brought here. The races will be one of the big features of the meeting and everyone is looking forward to it with keen interest.
Among the new horses that have arrived recently may be included Miss Steadman, Blue Rock, The Captain and others.
Jockey Johnny Dunbar arrived yesterday with six runners. Mr. Cole of Pittsburg, Kas., came in from Kansas with five horses.
It definitely saves time to have the transcription come out so beautifully the first time!
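Even a clean-looking transcription still needs checking, and having two independent versions of the same text makes that easier. The sketch below uses Python's standard `difflib` module to list the words where two transcriptions disagree, using the last sentence of the article from the two versions above. The function name `word_differences` is hypothetical. A difference only shows where to look at the clipping again, not which version is right.

```python
import difflib


def word_differences(first, second):
    """List (words in first, words in second) wherever the two texts differ."""
    a, b = first.split(), second.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side means the words appear only in the other.
            differences.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return differences


# The final sentence as it appears in the Snagit and ChatGPT transcriptions above.
ocr_text = "Mr. Cole of Plttsburg, Kas., came in from Kansas with five harness horses."
chatbot_text = "Mr. Cole of Pittsburg, Kas., came in from Kansas with five horses."

for ocr_words, chatbot_words in word_differences(ocr_text, chatbot_text):
    print(f"OCR: {ocr_words!r}  chatbot: {chatbot_words!r}")
```

Here the comparison flags the spelling of Pittsburg and the word "harness," which appears in only one of the two versions. Both are worth checking against the original clipping.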
More Examples of Transcription with ChatGPT
Below are two additional records that I’ve asked ChatGPT 4.0 to transcribe. I hope this gives you some ideas for using ChatGPT and Claude to help make your transcription tasks more efficient. In my next article, I will discuss using Transkribus (https://readcoop.eu/transkribus/) to transcribe longer, more complex documents.
Learn more about Using AI Tools in our 4-day workshop, Research Like a Pro with AI.