In this episode of the Research Like a Pro Genealogy Podcast, Nicole and Diana interview Steve Little, the National Genealogical Society Artificial Intelligence Program Director. Steve explains what Large Language Models are and their strengths, including summarization, extraction, generation, and translation. He discusses how AI can be used in genealogy to extract names, dates, and relationships from text, assist with genealogical writing and translating documents, transcribe printed and handwritten text, and transform text to different comprehension levels or modern standard English.
Listeners will learn about the potential of AI in the field of genealogy, as well as its limitations. Steve emphasizes the importance of verifying AI-generated results. He also touches on the future of AI in genealogy, predicting new use cases, more competitors, and expanding context windows.
This summary was generated by Google Gemini.
Transcript
Nicole (0s):
This is Research Like a Pro, episode 319, Uses of AI in Genealogy with Steve Little. Welcome to Research Like a Pro, a genealogy podcast about taking your research to the next level, hosted by Nicole Dyer and Diana Elder, accredited genealogy professional. Diana and Nicole are the mother-daughter team at FamilyLocket.com and the authors of Research Like a Pro: A Genealogist's Guide. With Robin Wirthlin they also co-authored the companion volume, Research Like a Pro with DNA. Join Diana and Nicole as they discuss how to stay organized, make progress in their research, and solve difficult cases. Let's go.
Nicole (42s):
Today’s episode is sponsored by Newspapers.com. Hi everyone. Welcome to Research Like a Pro
Diana (48s):
Hi Nicole. How are you doing today?
Nicole (50s):
I am awesome and I have discovered a very fun new thing I’m excited to tell you about it.
Diana (56s):
Spill the beans.
Nicole (58s):
Well, as you know, in our Research Like a Pro with AI workshop last week, we were talking a little bit about proof arguments and DNA proof arguments, and using Claude Projects to upload research reports and get some ideas for how to organize a proof argument. And I thought, well, I want to add DNA evidence to this proof argument that I'm working on, and maybe Claude can help me draw a descendancy diagram for my DNA matches. And sure enough, it was able to do so, and that was exciting. Although it couldn't really figure out what the relationships were in the diagram, it could successfully generate code that generated an SVG file, an image. Well, I decided to write a quick blog post about that yesterday because it was such a fun discovery, and at the end I said, I think Lucidchart will be adding AI soon.
Nicole (1m 46s):
And then I thought, well, maybe they already have. So I went and checked, and sure enough, Lucidchart now has a generate-diagrams tool. It's been there since the spring, so it's been around for a couple of months; I just hadn't noticed it. And I tried it out, and it worked amazingly well. It generated a diagram where I can edit and move around the boxes if I want to, or I can continue to prompt it to create additional diagram pieces. So that was so exciting, and I included that in my blog post.
Diana (2m 14s):
That is so great. And as soon as you told me about that, I had to go check it out. So for our listeners who are wondering how to find that: when you open up Lucidchart, over on the left you look down for the little icon that all the different places seem to be using to show their AI. So it's that. How would you even describe that little icon, Nicole?
Nicole (2m 35s):
Two four-point stars, where one is smaller and to the upper right of the other?
Diana (2m 40s):
Yeah, that's a good description. And then you click on it, and you can see what you can do with AI. So I am super excited to try that out, because we all know how much time it takes to make those diagrams. And as we were chatting before we came on today, Nicole mentioned that you could just take the description that you've put in your AncestryDNA note, that this is your, you know, third cousin once removed, and then you have the specific relationship written out. You know, I write the DNA test taker and then I have the line going back. So if you just copy and paste that into the prompt with some instructions, it can create that for you.
Diana (3m 21s):
So I am…that’s another reason to have good notes, right?
Nicole (3m 25s):
Right, now you can just use all of your notes, copy and paste, into a diagram.
Diana (3m 30s):
Yeah, that’s exciting. It’s gonna be so fun to try that out.
Nicole (3m 32s):
Well, we wanted to share that fun discovery because we have a fun AI guest with us today. Hi Steve.
Steve (3m 39s):
Morning.
Nicole (3m 39s):
We’re so happy to have Steve Little here. He’s gonna be talking with us today about uses of AI for genealogy. So that will be fun.
Steve Little (3m 47s):
I’m glad to be here with y’all and I’m glad to learn about Lucidchart. I became a Lucidchart aficionado on your recommendation many years ago and have kept a current subscription. And I didn’t know that that was there either. It’s breaking out all over.
Nicole (4m 4s):
It really is. Well, it just so happens that Steve is also presenting in the Research Like a Pro Webinar Series this month, so we get to feature the description of his webinar now in our announcements segment. This is coming up on August 20th at 11:00 AM. If you're a webinar member for the Research Like a Pro Webinar Series, you'll be able to watch that, and if you're not, you can still register. The title of Steve's webinar is Who's Eli's Daddy: A Civil War-era Open Secret – A DNA Case Study, and he's gonna talk about his ancestor James Eli "Bawly" Bower, who was born in 1863 in Ashe County, North Carolina, during the Civil War.
Nicole (4m 46s):
And this is a fun one, because oral history suggests that Bawly's father was a Confederate soldier who, while on leave in 1862, allegedly returned home not to his wife and children, but to another woman, Margaret Riley Bower. Nine months later, Bawly Bower was born, and shortly after, the soldier, William McMillan, was dead. So this case study aims to determine if documentary evidence and DNA analysis, both autosomal and Y-DNA, can confirm or refute the family legend that William McMillan is the father of James Eli "Bawly" Bower, born to Margaret. So this will be a great discussion of how we can use DNA evidence to help with these oral histories that come down to us, especially since this one concerns a non-paternity event, where you really do need DNA evidence to confirm it.
Nicole (5m 37s):
So we’re excited, Steve.
Steve Little (5m 39s):
I'm excited too. Because of the mildly endogamous community, it really took Y-DNA to cut through the mess and clarify this. And so this was the case that I used to learn more about how to use Y-DNA and how it can really supplement your autosomal work.
Nicole (6m 0s):
Fantastic. I think that will be really useful for so many of us. A few more announcements: our next study group begins very soon. This is Research Like a Pro, not with DNA, but just focusing on documentary evidence. It begins August 28th, so we hope that you can join us. Also, we have a few upcoming conferences to mention. The Association of Professional Genealogists Professional Management Conference (PMC) is September 19th through the 21st, and it is a virtual conference this year. I'm excited to be teaching about Canva; my topic is creating a brand kit with Canva.
Nicole (6m 41s):
So if you're a professional genealogist and this is something you want to learn more about, then you should definitely come. And one thing that I've discovered recently, as I made the syllabus for this class, is that Canva has many AI tools, actually a ton of AI tools, and some of them are really exciting, especially if you've done any amateur graphic design and been frustrated with, like, the pictures you have not being the right size or orientation, or not having a large enough background. There's a background expander with AI; you can generate images within Canva using DALL-E with the paid subscription; Canva has its own Magic Studio image generator. So it has a lot of really useful tools now that AI has become such a big thing.
Nicole (7m 25s):
So we're excited about the PMC conference. And then the other conference coming up is the East Coast Genetic Genealogy Conference; that is October 4th through 6th, and it's in Maryland and online. And actually all three of us are presenting: myself, Diana, and Steve. I saw you're on the schedule too, Steve; that's gonna be so fun.
Steve Little (7m 47s):
I’m excited.
Nicole (7m 48s):
My lectures are going to be about organizing DNA results with research logs and diagrams, as well as a lecture about tracking DNA matches with Airtable. And then Diana's talking about DNA reports: you've made a DNA discovery, now how do you share it? And then Steve is talking about the power of AI in education for genetic genealogists. So this is gonna be a fun conference. If you can't make it to Maryland, I hope you'll join us virtually.
Diana (8m 15s):
Wow, thanks for going through all those. And it's fun to think of conference season starting up again in the fall after having the summer off. It will be fun to start seeing how people are using AI in their work processes. I know, looking at that title about DNA reports, I'm thinking I'm definitely going to have to add some tips on using AI in writing reports. It's kind of fun for all of us as presenters to freshen up our presentations with something about AI.
Nicole (8m 45s):
Well, that's true. It's something that we can integrate into a lot of the work products that we create in genealogy. And that's actually today's topic, so that's a perfect segue into what we wanna talk about. And we should probably do a little bit more of an introduction for Steve, since we just assume everybody knows Steve because he's done such an amazing job this year and last year teaching about AI. And we are the beneficiaries of that, because he has done so much to help us learn how to incorporate artificial intelligence tools into our everyday work. So let me read a little bit about Steve. All 32 of Steve's third great-grandparents had settled into one Appalachian county by 1820, many earlier, and 60 of his most recent ancestors were born, lived, and died there in Ashe County, North Carolina.
Nicole (9m 34s):
Steve is interested in genetic genealogy, especially endogamy, pedigree collapse, and teasing apart multiple relationships using DNA segment triangulation. He is also a husband, dad, birder, chess dilettante, film & TV fan, genetic genealogist, Methodist pastor, photographer, reader, writer, regex script hacker, skygazer, and Virginian. And he is the National Genogical Society Artificial Intelligence Program Director. And this background that I just read doesn't really have much of Steve's background information on artificial intelligence, so I'm gonna ask him to tell us a little bit more about that. Steve, can you tell us how you came into this role of the person to talk to about AI and genealogy?
Steve Little (10m 22s):
A love of language. I have always loved language. I graduated high school in 1985, and after an adventure in engineering and flunking out of a nuclear engineering degree, the pendulum swung to poetry and literature. I got an undergraduate degree in English literature and then did my master's work in a field called applied linguistics. And this was 40 years ago. And my interest in applied linguistics happened to be in a field called computational linguistics and natural language processing.
Steve Little (11m 3s):
And that's just using computers to study how language works. And 40 years ago, or thereabouts, I had no idea it was gonna come in so useful. And that is not where I spent most of my professional life. I spent 20 years in libraries, and then most of the past 20 years in a second career in community service. But a third career presented itself when that background in computational linguistics and natural language processing turned out to be two of the pillars of the type of artificial intelligence that burst on the scene about two years ago, called large language models.
Steve Little (11m 46s):
So between a love of technology and a love of genealogy and a love of language, those three things came together in just perfect focus about two years ago. And it’s just been the most exciting rollercoaster ride of my professional life.
Nicole (12m 4s):
Well, I love that. And we got to meet you in a Research Like a Pro with DNA study group. And we were just very impressed with all the work you were doing with pedigree collapse and endogamy, which is such a challenge with DNA work. And so that was a really fun thing to learn about with your project as well.
Steve Little (12m 23s):
I should have written a special biography for y'all, because very important in my genealogical education was being part of Research Like a Pro with DNA Study Group 4. I cannot overstate how big a deal that was. So that was three or four years ago now, and yes, that's when I first met y'all. And one of the things that I'm most proud of is being a Research Like a Pro alumnus. And I often will tease y'all that I am always glad to be your East Coast evangelist. Yes, I love the Research Like a Pro method, and I am a proud alumnus.
Nicole (13m 2s):
We love that. You’re amazing
Diana (13m 4s):
And it was fun to meet you in person. I met you first in Virginia, in Richmond, for the NGS conference; that was not this year but a year ago. And then you came out to Utah, and we were able to meet up at RootsTech. So, so fun to meet our alumni; we love it. Well, Steve, you mentioned large language models, and we often use that term, or we start saying just LLMs. But for those listeners who are really not that familiar with this, can you tell us a little bit about them? What are their strengths, maybe their weaknesses? You know, just let us know a little bit more about these large language models.
Steve Little (13m 47s):
Yes, let's maybe zoom out a bit and start from the most general and get a bit more specific, because we've used the phrase artificial intelligence a great deal in the past two years as if it was something new. But it's been around a long time; I'm 57 and it's older than me. The way we've used the phrase artificial intelligence in the popular culture over the past two years, the way you see it in the headlines, they're talking about something newer. And although large language models have been around for more than two years, they really burst into the public consciousness in November of 2022.
Steve Little (14m 33s):
And when I'm introducing the idea of large language models to folks for the first time, the analogy I use is the spreadsheet, Excel and Google Sheets. Everybody knows what a spreadsheet is. It makes it really easy to work with numbers. You can do addition, subtraction, multiplication, and division with a spreadsheet, and once you know how to do that, you can do anything you can imagine with numbers. From addition, subtraction, multiplication, and division, you can go on to algebra and geometry and trigonometry and calculus and all kinds of interesting things with math. Well, large language models are to language what a spreadsheet is to numbers.
Steve Little (15m 20s):
And so a large language model lets us work with words, sentences, paragraphs, pages of text, chapters of books, and whole books at a time. And it lets us slice and dice, add, subtract, multiply, and divide words the way that a spreadsheet does with numbers. And that's impressive enough in itself. But what really is mind-blowing is that words are more than just words. If you have the word chair, it can represent any chair, and it represents the idea of chairs.
Steve Little (16m 2s):
And so these large language models also let you manipulate ideas and concepts and abstractions, like freedom and democracy and genealogy and second cousin once removed; you can start to work with those ideas. It's not as precise as a spreadsheet or a calculator, but we're getting there. And so, being able to work with language in a way that we've never been able to before, you can't overstate how transformative this is going to be.
Diana (16m 41s):
Right. And I think we have seen how it has improved dramatically from the beginning, where you could get all sorts of crazy answers to your questions, to now, where it's getting more and more accurate, or it's getting easier to have it form what you actually want. So I think, you know, one of its strengths is that it seems to be always learning and getting better, or the programmers are making it better. What do you think are the strengths of these LLMs?
Steve Little (17m 11s):
That's a great way to put it. And strengths is a good question, 'cause it is stronger at doing some things than others. And you mentioned something else: this is changing. They're getting better all the time, and that's a little bit different than the spreadsheet. There have been lots of bells and whistles added to spreadsheets since the eighties or nineties, but not too many major developments. That's quite different than large language models. The pace of acceleration, and what we can do with them, grows shockingly fast.
Steve Little (17m 53s):
And so we're coming up on the one-year anniversary of the first big talk I gave. And just in the past 12 months, I've started to review the things we could do and couldn't do, or a better way to phrase it, what it was strong at and weak at, and that changes rapidly. A year and a half or two years ago, the mantra was "words, not facts," because, as you mentioned, they're not very good, or they were, two years ago, a year and a half ago, very poor at research. They're very good at making plausible-sounding sentences, but large language models are disconnected from reality.
Steve Little (18m 39s):
They don't know what's true from what's false. And so two years ago, a year and a half ago, even as recently as six months ago, we strongly discouraged people from using them for research; that is a very weak use of large language models. There are much better, stronger uses of large language models, and that's when you keep them closer to manipulating language. There are four key strengths, the things they do very, very well, that genealogists and family historians, and anybody else sitting at a computer, can use them for, all of them ways of manipulating words and sentences and paragraphs. The first is summarization, which is taking a lot of text and condensing it into something smaller. I'd liken that to turning a lump of coal into a diamond.
Steve Little (19m 46s):
And then the second key strength is extraction. If you give it a bunch of text, a page or a chapter of a book, and have it search for information and pull that information out, that's called extraction, and that's like finding a needle in a haystack. The third strength is generation, and that's where you take a little bit of text and spin it out into something bigger. For example, you could take a short list of names, dates, places, relationships, and events, stuff that genealogists have a lot of, and that can quickly become a report or a narrative or anything else you could do with language.
Steve Little (20m 33s):
And finally, the fourth great strength is translation. And that means not just translating from one human language to another, from Spanish into English or German into French, which it is good at, though you have to be careful with it, as with all these tasks. Translation also means you can translate from Elizabethan English. If you have a deed that was written in the 1700s, in language that might be tricky to understand, it will not only translate that older style of language into modern, contemporary English, but it will also translate from legalese into plain English.
Steve Little (21m 19s):
It will take scholarly, academic language and simplify it so that a high school student or an elementary school student can understand it; that's quite a powerful educational tool. You can take a peer-reviewed academic study from a scholarly journal and ask it to translate it so that a fifth grader could understand it, and it's going to do amazingly well. So those four things: if you stick mostly to summarization, extraction, generation, and translation, those are what they're very, very strong at.
Steve Little (22m 0s):
There's another class of perhaps sexier, more exciting things that it is weaker at. They're the ones that grab everybody's attention, the ones folks are drawn to. I feel like I spend 90% of my time scolding people, slapping them on the wrist, and saying, please learn the basics first: learn summarization, extraction, generation, and translation first. But what they want to do first is research, and that gets 'em into trouble. Or they want to use it for handwriting recognition, which it can do.
Steve Little (22m 40s):
But it's one of those weaker categories that I call "here, but not yet." You can see that it's gonna get really good at that in the fullness of time, but we're not there yet. It's those sexy things that people want to do first, but if they would focus on the simpler tasks first, they would actually save much more time. But we all have to make our own mistakes and learn for ourselves. And so I just try to be patient with folks and say, oh yeah, you learned that it's not as good at research as we hope it one day will be. And actually it's getting much better; we can do things now that we couldn't do six months ago, and so many things we can do now that we couldn't do 12 months ago.
Steve Little (23m 30s):
And that pace of acceleration is going to continue for a couple years. So much so that the transformation is going to be disruptive and perhaps even scary to some folks that these tools are gonna become so smart so fast.
Diana (23m 49s):
I love that. Well, as you were talking, I just kept thinking about the idea that these LLMs are tools and we are the users, so they're only as good as we know how to use them. And I think of, like, a canvas and a paintbrush and paints. You know, I'm not that good with those tools, but a skilled artist is fabulous and creates great things. So the better we get with using these tools, the better products we're going to have to help us with our genealogy.
Steve Little (24m 16s):
I love that image of a painting. I used that quite a bit this summer; I evoked the name of Bob Ross, the popular guy with the fuzzy hair who taught us to paint. When you're working with these machines, we usually call them large language models, but more popularly people call them chatbots. And when you chat with your chatbot, if you think of it as a continuing conversation, that you're almost creating a painting with words through your conversation, you actually become much more expert. If you imagine your chat with your chatbot as painting a conversation, it becomes very effective, because that's how it works.
Steve Little (25m 7s):
Each time you have a discussion with a chatbot, it's not alive, it doesn't have a brain, it's not thinking. Each time it tries to put a sentence together, it's just rereading everything from your previous conversation. So as you're chatting with it, you're giving it more to work with. Bob Ross used to start by painting a background, and then he would add some trees, and then he would finish with the little birds. You know, that's how our conversations develop with a chatbot, and we get much more effective results when we think of these short conversations as a way to paint a picture, to elicit a response from these talking tools.
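Steve's point that the model "rereads everything from your previous conversation" on every turn can be sketched in code. The message shape below mirrors common chat APIs but is purely illustrative, not any particular vendor's client library:

```python
# Sketch of why "painting a conversation" works: chat models keep no memory
# between turns, so the full message history is resent with every request.
# The {"role": ..., "content": ...} shape mirrors common chat APIs but is
# an assumption for illustration, not a specific vendor's API.

def next_turn(history, user_message):
    """Return the full message list the model would reread for this turn."""
    return history + [{"role": "user", "content": user_message}]

conversation = [
    {"role": "system", "content": "You are an expert genealogist."},
]

# Turn 1: paint the background by supplying the source text.
conversation = next_turn(conversation, "Here is an obituary: ...")
conversation.append(
    {"role": "assistant", "content": "Received. What would you like extracted?"}
)

# Turn 2: add the trees. The model rereads all four messages above,
# not just this newest one.
conversation = next_turn(conversation, "Extract every name, date, and relationship.")

print(len(conversation))
```

Each new request carries the whole painting so far, which is why an early, detailed setup turn improves every later answer.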
Nicole (25m 54s):
That was such a wonderful overview of large language models. And the four strengths that you mentioned, ever since you taught that in the Empowering Genealogists with AI class, it resonated with me so much; I've always remembered that. And now when I'm doing different tasks with ChatGPT or Claude, I think, which category does this fall into? And it's just so useful to think about the different things that we can do in those categories. Well, another thing you taught us in the Empowering Genealogists with AI class at NGS was to spend a certain amount of time just playing with the large language models. So I wanted to play off of that idea and ask you, what is the best way to familiarize ourselves with large language models and what they can do to help us with our work?
Steve Little (26m 42s):
Such a good point, and this isn't my observation; they've done peer-reviewed studies now that show it takes about 20 hours for a new user before something clicks in our minds. These tools are so different from any other kind of technology we've used before. Most of us have been using Google for 20 years, and it kind of shrunk our brains and made us think that the way to use the computer is to type in a simple question and get a response. These tools are so much different that it takes about 20 hours. But I do not think that 20 hours needs to be spent like a piano lesson, like you're gonna sit down and practice for an hour a day for three weeks.
Steve Little (27m 34s):
Playing with these tools is the way most folks find that they get good and start to understand. And don't be afraid to make mistakes, so that you can see how it makes mistakes. You know, if there's an area of genealogy that you are really well versed in, or sports, or history, or any other category, you can quiz it and have conversations with it about a topic you're really well versed in, and you can see how it confidently makes mistakes. It will tell you things that aren't true, because it doesn't know what's real and what's not, what's true and what's false. And so it is just by playing with it that we start to develop a sense of what it's good for and where we start to find ourselves on thin ice.
Nicole (28m 32s):
That’s such a good idea. And I remember when it started to click with me what I could do and also the limitations and that was very liberating to realize some of the amazing strengths that the chatbot had and then some of the things I needed to watch out for. So I’d highly recommend that idea of 20 hours of playing.
Diana (28m 56s):
And I agree. I love what I learned in your institute course: that every time we do a task, we should just ask ourselves, can AI help us with this? So that brings me to my next question. Often in our research we have something like a county history, or a census, or a deed, and we want to get all the information out of it. We want the names, the dates, the relationships; this is what is important to us in genealogy, and so that would be a task AI could help us with. So what are your thoughts on that? What are our best ideas for trying to use this idea of extraction?
Steve Little (29m 35s):
This is a good use of AI, because one of the dangers is we don't want the chatbot to make things up. There's a funny word that has become popularized over the past two years for when a large language model makes a mistake or states an incorrect fact. For example, if it said that Thomas Jefferson lived at Mount Vernon and George Washington lived at Monticello, well, those are wrong; it got those two things backwards. That's a mistake it's probably not likely to make, but that's the kind of thing it could confidently make when you start to quiz it about more obscure information.
Steve Little (30m 20s):
The way to mitigate hallucinations is to supply the information that you wanna work with. You bring the text, the documents, the information you want to work with to the large language model. We use them for analysis, not for research. We want to use them to process information, not gather information. And one of the ways we do that is by extracting information, names, dates, places, relationships, and events, from a document that we bring it, for example a will or an obituary or a deed. For example, if you had an obituary, you get the best results if you just talk to them in plain, natural language.
Steve Little (31m 10s):
And so you might say something like this: you are an expert genealogist, your goal is to assist your user to extract information; find below an obituary; extract every name, date, and relationship from the obituary. And it will do that. And because you've given it the obituary to work with, it will do a very good job of working only with the information you've given it. But we can actually go a step further and improve our results.
Steve Little (31m 53s):
And this is something I'm actually quite proud of; this was an innovation that I discovered about 15 months ago. When you ask it to extract information from a text, you can also ask it for a final piece of information. You can say, and show me the exact quotation from the obituary that leads you to conclude that Mary is the daughter of Bob. And it will add a column to your spreadsheet that includes the sentence from the obituary that explicitly states that Mary was the daughter of Bob.
Steve Little (32m 34s):
And so that makes your verification, your validation, much, much easier. And it reduces the chance of hallucination to almost zero. Now, there is a downside; the opposite concern from hallucination is omission. It may skip somebody. And so whenever you work with these tools, you're the human in the loop. It's your job not to fall asleep at the wheel. You have to do the verification and the validation. And so not only do you have to go back and double-check its work, just as if you were working with an intern, checking it for mistakes or hallucinations, but you also have to check it for omissions.
Steve Little (33m 22s):
These are the kinds of things that it gets better at every month, but it’s not perfect. And so we don’t let the perfect be the enemy of the good. It doesn’t have to be perfect to be extremely useful, but we don’t take everything as if it’s a final draft the first time we get a result.
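The extraction-plus-citation technique Steve describes can be sketched as a small prompt builder. The prompt wording and the sample obituary below are illustrative placeholders, not a tested or official prompt:

```python
# Sketch of extraction with a citation column: ask for names, dates, and
# relationships AND the exact quotation supporting each row, so every row
# can be verified against the source text. The wording and the sample
# obituary are assumptions for illustration only.

def extraction_prompt(obituary_text):
    """Build a plain-language extraction prompt that requests supporting quotes."""
    return (
        "You are an expert genealogist. Your goal is to assist your user "
        "to extract information.\n"
        "Find below an obituary. Extract every name, date, and relationship "
        "as a table.\n"
        "For each row, add a final column showing the exact quotation from "
        "the obituary that leads you to that conclusion.\n\n"
        "OBITUARY:\n" + obituary_text
    )

prompt = extraction_prompt(
    "Mary Smith, daughter of Bob and Jane Smith, died 4 May 1902 in ..."
)
print(prompt)
```

Even with the citation column, the output still needs a human check, for omissions as well as hallucinations, as Steve notes.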
Diana (33m 45s):
You are so right, and I love your example, and I have used that: asking it to tell me, where did you get this information in the text, so that we can double-check easily. And one of the great things about using large language models is that if it does make a mistake, you just say, you know, you have that name and that date wrong; go fix that, this is the correct one. And it will fix it for you. So you don't have to take that table it created for you and make the corrections yourself. You just tell it what's wrong, and then it magically creates that table again for you. I love correcting it and getting, like we talked about with painting the picture, what I actually wanted after the conversation.
Diana (34m 29s):
So, it really is fun to use once you get the hang of it.
Steve Little (34m 33s):
One of the things that will surprise folks is it will apologize for making a mistake, too. And, you know, we're not used to that from our calculators. You can tell a calculator to figure out the cube root of 27 and it doesn't make mistakes, but even if it did ever make a mistake, it wouldn't apologize. But these tools will apologize to us, and that's what scared everybody a year and a half ago: they'd never had a computer that talked to them as if it had an emotional state. And it doesn't really; that is just a side effect of the way it uses language.
Steve Little (35m 13s):
It doesn’t have feelings or anything like that, but because it’s trained on 6,000 years of human language and that’s how humans talk to each other, when we hear it simulate that, it is funny.
Diana (35m 27s):
It’s very interesting, and I find myself talking back to it, saying things like please and thank you. Yes, yes. Even though it’s not alive. So it is fun.
Nicole (35m 37s):
Well let’s talk about one of the big strengths of AI and that is writing. So how can large language models help us with our genealogical writing?
Steve Little (35m 47s):
This is an area of strength. There’s another expert here that I would probably defer to; Nicole, you’ve gotten really good at this. Of the four strengths that we talk about, summarization, extraction, generation and translation, this is the generation part, where you have a list of information, a list of names, dates, places, relationships and events, and you can ask it to quickly get that into a format, into a narrative text or into some sort of report. This is how Genealogists are starting to do this in a way that many folks are finding exciting and useful, while other folks are a bit concerned. For many Genealogists, the enjoyment is the scavenger hunt, the endorphin rush of finding a piece of information.
Steve Little (36m 48s):
And so they have these lists of information that never get transformed into something that will be useful to other people: a genealogical report or even just a story. And this is a strength, especially if you’re just looking for a first draft. These tools are very good at taking a list of names, dates, places, relationships and events and transforming that into a narrative report, into a story. And one of the things that makes it really helpful is if you show it a model of what you want your result to look like.
Steve Little (37m 32s):
There’s some funny terminology that the experts use in this field. They’ll say, I was using the large language model and I used a “zero shot prompt” or a “two shot prompt”. A zero shot prompt just means with no examples, and if you give it no examples, it’s going to just do whatever it wants to do, whatever statistically it thinks you most likely want. But if you give it one example or three examples or five examples, then its results will mirror your examples almost perfectly. So if you were to give it a genealogical report or three, and then give it a list of your data and say, now take this list of genealogical information and transform it into a report that matches these other three reports, it’s going to do really well at that.
Steve Little (38m 39s):
Now you’re still gonna have to go back, and you just consider it a first draft, but the more examples you show it, the better it’s gonna do.
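[Editor’s note: the few-shot prompting Steve describes can be sketched in code. This is a minimal illustration of assembling such a prompt as a string; the example reports, facts, and prompt wording below are invented placeholders, not a prescribed formula.]

```python
def build_few_shot_prompt(example_reports, facts):
    """Combine sample reports (the "shots") and a fact list into one prompt."""
    parts = ["Here are example genealogical reports showing the format I want:"]
    # Each example the model sees nudges its output toward that format.
    for i, report in enumerate(example_reports, start=1):
        parts.append(f"--- Example {i} ---\n{report}")
    parts.append(
        "Now take this list of genealogical information and transform it "
        "into a narrative report matching the examples above:"
    )
    parts.append("\n".join(f"- {fact}" for fact in facts))
    return "\n\n".join(parts)

# Hypothetical inputs for illustration only.
examples = [
    "John Smith was born in 1820 in Ohio. He married Sarah Brown in 1843...",
    "Mary Jones, daughter of Elias Jones, appears in the 1850 census of...",
]
facts = ["Name: William Brown", "Born: 12 Mar 1831, Augusta Co., Virginia"]
prompt = build_few_shot_prompt(examples, facts)
print(prompt)
```

With zero examples the list at the end would stand alone (a "zero shot" prompt); adding one, three, or five examples makes this a one-, three-, or five-shot prompt.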
Nicole (38m 47s):
I love that so much. And when you taught us about that in the class, I thought, I have so many lists of dates and events from timelines and research logs that I have, and it would be fun to try to turn those into a report. And so, as you mentioned, I’ve been doing that and practicing with that, and it’s been so interesting to see the prompt that I need to use. And you’re right, the examples that I’ve given the chatbot have been one of the most useful parts of my prompts for getting it to turn out the way I want. But it is very wonderful to be able to transform old research logs where I never wrote anything into, you know, that descriptive biographical narrative about our ancestors that we can then share and use for other purposes.
Nicole (39m 31s):
It’s exciting.
Steve Little (39m 32s):
One of our favorite teachers has a phrase about just getting nouns and verbs onto paper. And for the folks who really struggle with that, this breaks that surface tension, this breaks that friction, and they can get to a first draft really quick.
Diana (39m 49s):
Yes, I have seen so many people that do our study group and do great research, but when they get to the writing, that is so difficult, and if there’s just something that can get you started like this, that’s a wonderful thing. Well, let’s do a word from our Sponsor, Newspapers.com. Today’s episode is sponsored by Newspapers.com, your go-to resource for unlocking the stories of your ancestors. Dive into the newspapers where your family’s history unfolds as you search nearly a billion pages in seconds. Newspapers.com offers an unparalleled treasure trove of historical newspapers, providing a window into the past with papers from the 17th century to today. Newspapers.com is the largest online newspaper archive.
Diana (40m 30s):
It’s a gold mine for anyone seeking to uncover stories from the past. Whether you’re a seasoned genealogist or just starting your journey, Newspapers.com makes it easy to search for obituaries, birth announcements and the everyday stories that shaped your family. It’s like having a time machine at your fingertips. And here’s the best part, our listeners get an exclusive offer. Use promo code FamilyLocket for a 20% discount on your subscription. That’s FamilyLocket at Newspapers.com. Sign up today at Newspapers.com and embark on a journey of discovery. Well Steve, we’ve talked about doing some extraction, doing some writing. What about translation? You mentioned earlier this is one of the large strengths.
Diana (41m 13s):
So what if we have a document and it’s in German and we need it to be in English? How can AI help us with that?
Steve Little (41m 21s):
Well, I should probably clarify that among the strengths, this is the one that we start to be kind of cautious about, because one of our responsibilities is being the human in the loop, meaning that we have to do verification and validation. This is where we have to recognize our limits and the limits of the tools we’re using. Every time I use a large language model for any serious work, and by that I mean if it’s gonna have a footnote attached to it, a serious scholastic level where you’re doing peer reviewed work and other people are going to be looking at your work, and other serious ways like that, we have to start to be much more careful, because if I do not read or speak Spanish or German or French, I cannot verify or validate the work that it’s doing.
Steve Little (42m 27s):
If I’m just noodling around or I want a general sense of what this document might say, well that’s okay, but if it’s important that we’re correct, we may have to hire a professional translator to confirm and verify. This is an area where I do not claim expertise. My family had settled into one Appalachian county by 1820. And so every document I ever look at is in English. And I barely passed high school ’cause I couldn’t pass Spanish or French or German. They wouldn’t let me pass until I took Latin and Fortran.
Steve Little (43m 7s):
So being good at languages is not my strength. Other folks were the ones that let me know that these tools, although they’re very good, still have to be checked. And if you don’t speak the language your document is in, whether Spanish or French or German or whatever, then you can’t do that verification and validation. And so you should limit how seriously you trust those results.
Diana (43m 34s):
Oh, that’s a great answer and a great caution, that we are careful, that we are the human in the loop. Yes, we may use it to play with, to get started, but if we are doing serious work, we need that additional verification from someone who actually speaks the language.
Nicole (43m 53s):
Wonderful, wonderful idea. What about translation in another sense, like transforming written texts to be at a different comprehension level or other types of translations?
Steve Little (44m 5s):
Yes. This is a much more useful and fun way to use these tools. One example is just translating from an older style of English into another. For example, you might have something that was written in Elizabethan English or Shakespearean English, a style of writing that may seem unfamiliar to you, and sometimes the meanings of words have changed over hundreds of years. These tools are very good at that. If you take an older document and you just say, translate this into modern standard English, it’s very good at that.
Steve Little (44m 46s):
And then you can also change the comprehension level. This is where you can say, translate this legalese into plain English. Or you take a scientific academic text and you can say, simplify this. My two favorites are these: I often have it translate things into what a fifth grader or seventh grader could understand. And then I’ll often have it write for a sophomore college student, a second year college student. And it’s very good at that.
Steve Little (45m 27s):
Now this only works for going from complex to more simple. If you had a son or a daughter or a niece or a nephew who was in the second grade and they did a report on elephants or dinosaurs, I don’t think I would trust it to transform that into the third draft of a peer reviewed study for the Journal of Mammalogy. If you’re trying to make things more complicated, that’s probably a bit more unproven territory for now. But making things simpler is a very good use.
Nicole (46m 5s):
I love that. That would just introduce a lot of opportunity for made up information probably
Steve Little (46m 11s):
And I wouldn’t be able to know if it was right or wrong.
Diana (46m 15s):
Well, something that we work with all the time in our research and our genealogy is our family tree. Do you have any ideas about how AI can be used with family trees and GEDCOM files?
Steve Little (46m 29s):
Cautiously, and with very low expectations, and with smaller files. This is one of the first things we looked at about 15 months ago, and we discovered that large language models did have some understanding of GEDCOM files, and you could even ask it to refresh its memory. If I’m going to try to do research or an experiment with something like this, I would start by saying, large language model, please explain to me the basics of the structure of a GEDCOM file. And by doing that, you’re introducing into your conversation the rules for making and understanding a GEDCOM file, so that for the rest of your conversation, each time it’s putting a sentence together, it’s rereading those rules, the GEDCOM standard, because you actually asked it to do that.
Steve Little (47m 32s):
But GEDCOM files are a little bit complicated, and the amount of information these tools can chew on at any one time is limited. And some folks have genealogical databases with not just 300 or 3,000, but 30,000 people, and the tools cannot do that today. A year ago we were working with GEDCOM files that just had a few dozen people, and the error rate was still quite high. And so this is what I would call a weak use of these tools. If your expectations are very low and the size of the information you’re working with is very small, you might be able to convert a GEDCOM file into a spreadsheet or a narrative report.
Steve Little (48m 26s):
But there’s probably better ways to do that. And that’s a lot of the learning curve with these large language models. It’s as if we’ve been given a hammer and everything looks like a nail. And so we try to use these tools to do things, and we experiment, and that’s okay. And so this is one of those places where you might spend three hours trying to get it to do something that, if you were to go into your genealogical database, it would do in three minutes. And so it’s just not practical. Now, GEDCOM files aren’t the only way to work with genealogical information; ahnentafel files are much better.
Steve Little (49m 13s):
This is the one where you have a direct ancestry. For example, I might be listed as person number one, and then my father and mother would be persons two and three. Grandparents would be 4, 5, 6, and 7. GEDCOM files can get quite messy, but an ahnentafel file is much more regular and structured, and so you can actually take an ahnentafel file and do quite a bit with that, and the error rate will be near zero. I’ve had much more success with those. And Nicole at the beginning of the conversation was talking about how you could create graphical family trees.
Steve Little (50m 1s):
I use ahnentafel files if I’m trying to create a graphic image of a family tree. The large language models are much more successful with ahnentafel files than GEDCOM files, just because the structure is much more well-defined and simpler.
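[Editor’s note: the ahnentafel numbering Steve describes follows a simple rule that can be sketched in a few lines of code: person 1 is the root, and each person n’s father is 2n and mother is 2n + 1, so the relationships are fully determined by the numbers alone.]

```python
import math

def father(n):
    """Ahnentafel number of person n's father."""
    return 2 * n

def mother(n):
    """Ahnentafel number of person n's mother."""
    return 2 * n + 1

def generation(n):
    """Generations back from the root: 0 = self, 1 = parents, 2 = grandparents."""
    return int(math.log2(n))

# Root (person 1): parents are 2 and 3, grandparents are 4 through 7.
parents = [father(1), mother(1)]
grandparents = [father(2), mother(2), father(3), mother(3)]
print(parents)       # [2, 3]
print(grandparents)  # [4, 5, 6, 7]
```

This regularity, where the number itself encodes the relationship, is likely why language models handle ahnentafel lists with far fewer errors than GEDCOM files, whose record IDs are arbitrary.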
Diana (50m 18s):
That’s such a great tip. And our programs like RootsMagic or Family Tree Maker can easily create those ahnentafel files for us, that we can then upload or paste in as text, or, you know, however we wanna get it into the large language model, to use it to write something for us or do a little bit of analysis. So that’s a great tip.
Nicole (50m 42s):
Well, I’m gonna try that with an ahnentafel file. And it makes sense that the numbering system in that file would be easier to understand than a GEDCOM. And I’ve actually learned a lot about how GEDCOM files work by asking it to generate GEDCOM files and seeing the underlying code that it creates, and seeing, oh, it just has ID numbers, and then it just associates family IDs with individual IDs, and it doesn’t have to be in order. So having the ahnentafel file where everyone’s numbered in order makes a lot more sense to the large language model, just like it makes a little more sense to us. Right,
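[Editor’s note: the GEDCOM structure Nicole describes, arbitrary individual IDs tied together by separate family records, can be seen in a toy example. The snippet below is an invented, drastically simplified fragment for illustration, not a complete or validated GEDCOM file.]

```python
# A minimal GEDCOM-style fragment: INDI records hold people, and a FAM
# record links their IDs (@I1@, @I2@); record order carries no meaning.
sample = """0 @I1@ INDI
1 NAME John /Smith/
0 @I2@ INDI
1 NAME Mary /Jones/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@"""

names, families = {}, {}
current = None
for line in sample.splitlines():
    parts = line.split(" ", 2)
    if parts[0] == "0" and len(parts) == 3:
        current = parts[1]                      # record ID, e.g. @I1@ or @F1@
    elif parts[1] == "NAME":
        names[current] = parts[2].replace("/", "")
    elif parts[1] in ("HUSB", "WIFE"):
        families.setdefault(current, {})[parts[1]] = parts[2]

print(names)      # individual IDs mapped to names
print(families)   # family records linking those IDs
```

The indirection here (a family record pointing at two opaque IDs) is exactly the kind of structure a language model has to track across a long file, which helps explain the higher error rates Steve mentions compared with ahnentafel lists.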
Steve Little (51m 15s):
Exactly. The tool you mentioned, Claude. When I discovered this in the spring, we were actually driving across the state of Virginia, and I was trying this new way of using Anthropic’s Claude to generate the scripts to create these graphical pedigree charts. My son had just gotten his driver’s license, so I was in the backseat of the car playing with this and figured out how to do this, and it was just so exciting, and it’s gonna be very useful.
Nicole (51m 46s):
Alright, let’s talk about one of those sexy uses of AI that you had mentioned. How can AI possibly help us with the transcription of printed and handwritten text?
Steve Little (51m 56s):
How can it do it? This is one of those tasks where our wish that these tools could do it better is not in proportion to their ability to do it. We really, really want these tools to help with optical character recognition and handwritten text recognition. And it’s another one of those cases where it’s not quite there yet. If 50% or 80% or 90% is good enough for you, you may have fun experimenting with this, but you have to be very, very careful.
Steve Little (52m 37s):
We should probably differentiate: there’s two flavors of artificial intelligence that people are using, especially for handwritten text recognition. For something that looks like it was made by a typewriter or computer, it’s much better at that, optical character recognition. And that technology has been pretty good and solid for a while. Although sometimes the original scans are rough: for example, we may go to a newspaper archive and look at the source text behind a newspaper article, and we’ll see, you know, almost every other word is misspelled, just by one or two characters, because sometimes it confuses the letter S with the letter F, or the letter E with the letter A.
Steve Little (53m 31s):
And little mistakes like that make it difficult to index and find those articles, and the large language models excel at proofreading those, at taking that messy raw optical character recognition text and cleaning it up. So that’s a good, strong use: correcting a bad raw scan of a newspaper. Now, when you start to get into handwritten text recognition, this is where it gets much more sketchy. And there’s actually two flavors of artificial intelligence that can do this, and one is better than the other.
Steve Little (54m 18s):
Unfortunately, the one that’s popular and easy for most folks to use is the one that is weaker at this task. The folks who are really serious about handwritten text recognition use a European tool, Transkribus, and again, I’m probably going to mispronounce this, I’m not sure if it’s tran-SCRIBE or tran-SCRI-bus, but that is the industry leader. That is the state of the art. If you’re really serious about this, that’s the tool that you wanna be using. And even then, they don’t claim that this is close to being perfected. And what I mean by that is this: as opposed to optical character recognition, think about speech recognition. We’ve taken for granted now for 15 years that you can talk to a computer and it will understand you.
Steve Little (55m 21s):
We’ve all been talking to Siri and Alexa, and I apologize if I just woke up a million people’s devices, but those tools have been very good at understanding our voices for 15 years. They’re near perfect. Well, handwriting recognition and optical character recognition are nowhere near that good. Optical character recognition, you know, they talk about it having a performance level of 85%, but it’s not getting 85% of the pages correct, or the sentences correct, or even the words correct.
Steve Little (56m 2s):
They’re lucky if they get 85% of the letters correct, which means that every third word is gonna be misspelled. And that’s with good print. Handwriting recognition is much, much worse. And so our hope and prayer is that handwriting recognition one day would be as good as speech recognition, and we’re not anywhere close to that today. But it’s so desired and so wanted. It shows glimmers. Occasionally folks will try easy tasks and succeed and get really excited. If you do, good for you.
Steve Little (56m 46s):
But just be very, very careful. You really have to put in the time to follow up.
Nicole (56m 53s):
That’s such a good way to think about it, that it’s not as good as speech recognition yet, and to kind of think about those implications for needing to verify what we get and double checking everything. And I don’t know how to pronounce Transkribus either, I go back and forth too, I’ve heard it both ways, and I’ve used it a lot. I just think there’s a lot of work to be done in this area as far as practicing, playing, and learning the limitations for each of us who want to try to use it. And so I think that all of your warnings about this are helpful. But I would also say that everybody needs to try it and play with it too, to see what it can do.
Nicole (57m 33s):
And the way that I’ve been playing with it is to verify everything it gives out. And then I have been slowly learning what I need to do to make the prompts better. And one of my favorite things to tell it is to put anything that you don’t know in square brackets. And it hardly ever does that because it thinks it knows everything. But as I’ve practiced with correcting different OCR and HTR transcriptions with a large language model, it has a lot of suggestions for a transcription something else made, and a lot of ideas for how to fix it up. And so putting those in square brackets works really well for that. So that’s one of my favorite uses for AI with transcription right now.
Nicole (58m 15s):
And, it is exciting to think about how these will be improved over time.
Diana (58m 19s):
And I would add, try different models. You know, if you’re not getting the results you like on Claude, go try ChatGPT, go try Gemini. I recently did my 57 page pension file for an ancestor, and I tried all of them and finally settled on Claude, which actually blew me away with how well it did with some things. So I think we just need to keep trying and see how it gets better. Which leads me to our last question as we wrap this up: what do you think the near future of AI and genealogy holds for us?
Steve Little (58m 59s):
Accelerated new things that we’re going to be able to do. What has made the past two years so exciting is that, maybe not week by week, but certainly month by month, we can see improvements and new things that we’re able to do. Two years ago it was a one horse race. Two years ago there was just OpenAI’s ChatGPT, and each time they would put out a new version, things leapt forward. For example, the one that we all started with was called GPT-3.5, and that was introduced in November of ’22.
Steve Little (59m 49s):
But by March of ’23, 15 or 16 months ago, they introduced GPT-4, and the leap in usefulness was breathtaking. And that continues: each time these tools get smarter, they become able to do more things, or they improve how they do the things we struggled with, and so it’s just thrilling to watch. At the beginning there was only ChatGPT, but now we not only have Gemini and Claude, but also the one from Facebook called Meta AI, and it’s as good as the others.
Steve Little (1h 0m 39s):
And so now we have a serious four horse race, and these competitors are continually improving what their tools can do. And when that happens, there’s new things that we can do. And so one of the things that I encourage Genealogists to do is track your failures. Your failures are gonna become very important. They’re going to be your roadmap for discovery and advancement, because today’s limits are tomorrow’s breakthroughs. The things that these tools cannot do today are going to be what they’re able to do tomorrow.
Steve Little (1h 1m 25s):
If you have tasks where you see, well, it fails 50% of the time, or it succeeds 90% of the time but that’s not good enough, and then you hear one of these tools is reported to be better at this, then you take that use case that was a little bit sketchy and you discover, oh wow, now it works. And that is how we map advances. And so we’re seeing that all the time with what we can do. And the easiest example is just the sheer amount of information we can feed these things.
Steve Little (1h 2m 6s):
When I taught my first class a year ago, we had to use Thomas Jefferson’s will because it was only three pages long, because George Washington’s will at six pages was too much. Now we can easily do 20 page documents. All of these tools can work with a 20 page document. And this week I worked with Google’s Gemini, which has a version that can work with documents that are more than 1,000 pages long, and that’s just mind blowing. Week by week, if not week by week then month by month, we can see how smart these things are getting.
Steve Little (1h 2m 53s):
There’s little things that I keep an eye on that I’m looking forward to. One of the ones I’m really excited about is transcripts. Many of us have audio files. Even if you have an old eight millimeter movie or VHS tape of a family reunion from decades ago, getting that audio transferred into something we can work with, that is tantalizingly close right now. If you have a little bit of computer experience, it’s easy to translate an audio file into a transcript, a written transcript, but it’s not as easy as we want it to be.
Steve Little (1h 3m 40s):
It still takes a little bit too much computer experience to do it easily. I would not be surprised if by the end of the year 2024 it’s just drag and drop easy, that you just drag an MP3 file into your large language model and it instantly converts an hour of audio into a 20 page transcript. And what’s stunning is that it costs about six cents and happens in about three minutes. It just costs pennies. And then you’ve got a 20 page transcript from an hour long family reunion.
Steve Little (1h 4m 25s):
And the audio transcription is very, very good. Again, because it’s using the same speech recognition that we were talking about before. It is very, very good. It’s just not as easy to get to yet as we would like. So some of the advances are just ease of use. But it’s gonna be an exciting couple years.
Diana (1h 4m 47s):
I agree. And I think it behooves us all just to jump in and get started learning, because it is going to change the way we do our work in family history and genealogy. It’s already changed mine. There’s so many tasks that I use AI for now that speed up my work and make it more fun to do some of those tedious things, and it’s only going to get better. So yes, thank you so much, Steve, for introducing us to what AI and genealogy can hold for us. It’s been such a pleasure to have you on the podcast and to discuss all these great things.
Steve Little (1h 5m 23s):
I’m thrilled to be here. I get excited. I’ve been talking about this stuff for a year and a half, two years now, and I still get goosebumps. I’m just having the time of my life, and I’m glad that other folks are finding this useful and helpful. I love helping folks do the next thing they’d like to do with these tools.
Nicole (1h 5m 42s):
Well, to wrap it up, I want to refer everyone to go listen to Steve’s podcast that he does with Mark Thompson. It’s called the Family History AI Show. You can listen to it on Apple Podcasts and wherever you get your podcasts. Right now they have nine episodes, with a new one coming out every week, and we’ve been listening to these and they’re very fun. It’s so helpful to have Mark and Steve’s perspective on the new developments, and it also saves us time, because we don’t have to go do the research. We can just listen to their podcast and we get all the news, the exciting developments, and how it applies to genealogy. So thanks for doing the podcast that you’re working on with Mark, and we’re excited to continue listening to that.
Steve Little (1h 6m 27s):
Well, thank you. We directly ascribe our inspiration to you and Diana. Mark is also a grateful Research Like a Pro alumnus; that’s how Mark and I met. We were both in Research Like a Pro with DNA Study Group 4, and so I’m deeply appreciative to you for introducing me to my podcast co-host.
Diana (1h 6m 47s):
Great.
Nicole (1h 6m 47s):
Well, this has been a fun podcast episode. Thanks Steve, and we’ll talk to you guys all again next week. Bye
Steve Little (1h 6m 56s):
Bye.
Diana (1h 6m 57s):
Bye-Bye everyone.
Nicole (1h 6m 56s):
Thank you for listening. We hope that something you heard today will help you make progress in your research. If you want to learn more, purchase our books, Research Like a Pro and Research Like a Pro with DNA, on Amazon.com and other booksellers. You can also register for our online courses or study groups of the same names. Learn more at FamilyLocket.com/services. To share your progress and ask questions, join our private Facebook group by sending us your book receipt or joining our courses. To get updates in your email inbox each Monday, subscribe to our newsletter at FamilyLocket.com/newsletter. Please subscribe, rate and review our podcast. We read each review and are so thankful for them. We hope you’ll start now to Research Like a Pro.
Links
The Family History AI Show with Steve Little and Mark Thompson – https://podcasts.apple.com/us/podcast/the-family-history-ai-show/id1749873836
National Genealogical Society – Empowering Genealogists with Artificial Intelligence – https://www.ngsgenealogy.org/ai/
Sponsor – Newspapers.com
For listeners of this podcast, Newspapers.com is offering new subscribers 20% off a Publisher Extra subscription so you can start exploring today. Just use the code “FamilyLocket” at checkout.
Research Like a Pro Resources
Airtable Universe – Nicole’s Airtable Templates – https://www.airtable.com/universe/creator/usrsBSDhwHyLNnP4O/nicole-dyer
Airtable Research Logs Quick Reference – by Nicole Dyer – https://familylocket.com/product-tag/airtable/
Research Like a Pro: A Genealogist’s Guide book by Diana Elder with Nicole Dyer on Amazon.com – https://amzn.to/2x0ku3d
14-Day Research Like a Pro Challenge Workbook – digital – https://familylocket.com/product/14-day-research-like-a-pro-challenge-workbook-digital-only/ and spiral bound – https://familylocket.com/product/14-day-research-like-a-pro-challenge-workbook-spiral-bound/
Research Like a Pro Webinar Series 2024 – monthly case study webinars including documentary evidence and many with DNA evidence – https://familylocket.com/product/research-like-a-pro-webinar-series-2024/
Research Like a Pro eCourse – independent study course – https://familylocket.com/product/research-like-a-pro-e-course/
RLP Study Group – upcoming group and email notification list – https://familylocket.com/services/research-like-a-pro-study-group/
Research Like a Pro with DNA Resources
Research Like a Pro with DNA: A Genealogist’s Guide to Finding and Confirming Ancestors with DNA Evidence book by Diana Elder, Nicole Dyer, and Robin Wirthlin – https://amzn.to/3gn0hKx
Research Like a Pro with DNA eCourse – independent study course – https://familylocket.com/product/research-like-a-pro-with-dna-ecourse/
RLP with DNA Study Group – upcoming group and email notification list – https://familylocket.com/services/research-like-a-pro-with-dna-study-group/
Thank you
Thanks for listening! We hope that you will share your thoughts about our podcast and help us out by doing the following:
Write a review on iTunes or Apple Podcasts. If you leave a review, we will read it on the podcast and answer any questions that you bring up in your review. Thank you!
Leave a comment or question in the comment section below.
Share the episode on Twitter, Facebook, or Pinterest.
Subscribe on iTunes, Stitcher, Google Podcasts, or your favorite podcast app.
Sign up for our newsletter to receive notifications of new episodes – https://familylocket.com/sign-up/
Check out this list of genealogy podcasts from Feedspot: Top 20 Genealogy Podcasts – https://blog.feedspot.com/genealogy_podcasts/