How to Evaluate an AncestryDNA Thrulines Hypothesis
Are you wondering how to tell whether an AncestryDNA Thrulines estimate is reliable, or even possible? The Thrulines algorithm works by comparing your DNA matches, their trees, and all the searchable trees in the Ancestry public member tree database. If the algorithm can connect your tree and your DNA match’s tree somewhere, the hypothesis shows up in Thrulines. One problem with this approach is that the public member tree database contains many errors. Another is that Thrulines is a computer algorithm, not a genealogist, so it sometimes merges identities that should not be merged. Finally, most of us have thousands of matches in the AncestryDNA database, and some of those matches are false, as explained brilliantly by Blaine Bettinger here:
Blaine Bettinger, “The Danger of Distant Matches,” blog post, 6 January 2017, The Genetic Genealogist (https://thegeneticgenealogist.com/2017/01/06/the-danger-of-distant-matches/).
I recently came across a Thrulines estimate that I wanted to check for accuracy. It was a Thruline for a cousin, Deanna, who shared her results with me.* I had not worked on her family tree or her matches much, so I had to do some work to determine if the Thrulines estimate was accurate. After finding that it was false, I made a list of criteria to determine if a Thrulines estimate is reliable or not.
*Deanna and her matches have all been privatized throughout this post.
Criteria for a Reliable Thrulines Estimate:
-Most of the matches share over 15 cM: if most of the matches are under 15 cM, many could be false matches sharing a pseudosegment with you instead of an actual segment.
-Matches are grouped to the correct parent: if one or both parents have tested, the DNA matches in the Thrulines estimate should be grouped in the appropriate maternal or paternal group – a Thruline through your mother’s side should only include matches that say “Mother’s Side.”
-Matches form a genetic network: when you check the shared match lists or cluster report, you should see that the matches also match each other and form a network.
-Documentary evidence points to a possible connection: the names, dates, and places for the known ancestor, potential ancestor, and potential siblings look feasible.
Possible Red Flags:
-Common surname: there’s an increased risk of erroneous merges, both by the Thrulines algorithm and in user-created trees.
-Potential ancestor lived before 1800: there’s an increased risk of errors in user-submitted trees, and more possible descendants to find among your thousands of matches.
-No siblings: this is an obvious problem – if the only matches descend through your known ancestor, the most recent common ancestor of the matches is not the ancestor in the Thruline estimate.
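If you keep notes on your Thrulines in a spreadsheet, the numeric parts of the criteria and red flags above can be turned into a quick checklist. The Python below is a minimal sketch, not anything Ancestry provides: the `Match` and `Thruline` records, the `red_flags` function, and the idea of entering matches by hand are all hypothetical, it uses the ancestor’s birth year as a stand-in for “lived before 1800,” and whether a surname is “common” is left to your own judgment. It cannot replace the documentary research or the parent and network checks.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the criteria and red flags listed above.
MIN_RELIABLE_CM = 15
EARLY_BIRTH_YEAR = 1800

@dataclass
class Match:
    name: str
    shared_cm: float
    # True if the match descends from a sibling of your known ancestor,
    # False if they descend through your known ancestor.
    through_sibling: bool

@dataclass
class Thruline:
    ancestor_birth_year: int
    common_surname: bool  # your own judgment; not computed here
    matches: List[Match]

def red_flags(t: Thruline) -> List[str]:
    """Return the red flags (hypothetical checklist) that apply to a Thruline."""
    flags = []
    if t.matches:
        over = sum(1 for m in t.matches if m.shared_cm > MIN_RELIABLE_CM)
        # "Most" is treated as a strict majority sharing over 15 cM.
        if over * 2 <= len(t.matches):
            flags.append("most matches share 15 cM or less")
    if t.common_surname:
        flags.append("common surname")
    if t.ancestor_birth_year < EARLY_BIRTH_YEAR:
        flags.append("potential ancestor lived before 1800")
    if not any(m.through_sibling for m in t.matches):
        flags.append("no siblings")
    return flags
```

Treating “most” as a strict majority is a design choice; a stricter cutoff is easy to substitute.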
Thrulines Example #1 – Martha M. Silvius
One common type of Thruline that is not helpful, though not necessarily erroneous, is the type with no siblings of our known ancestor. Here is an example from my Thrulines:
In my Thruline to Martha M. Silvius, all the matches descend through my known 2nd great-grandmother. This Thruline actually shows evidence that Jessie Estelle Ross is my 2nd great-grandmother. The only evidence that Martha M. Silvius is my 3rd great-grandmother comes from my tree and other trees – there are no DNA matches corroborating it. If this Thruline included additional siblings of my known ancestor, Jessie Estelle Ross, with DNA matches descending from them, then it might actually provide evidence that Martha is my 3rd great-grandmother.
Thrulines Example #2 – Isaac Newton Wilson
The next benchmark to check is whether most of the matches share more than 15 cM. The screenshot above shows matches who descend from Sarah E. Wilson, a possible sister of Arthusa Wilson. All of them except one, Art, share over 15 cM, so they are much less likely to be false matches.
The next thing to check is whether Kat, Joe, Di, Nel, Jen, and Art are truly paternal matches, since this Thruline is supposedly through Deanna’s father. Deanna’s mother has tested at Ancestry, but her father has not. We would therefore expect the maternal/paternal bucketing system to show “no group assigned” instead of “mother’s side” on all of the matches in this Thruline.
I clicked on Deanna’s matches with Kat, Joe, Di, Nel, Jen, and Art and found that none of them were labeled “mother’s side.” As you can see in the example above, Sarah E. Wilson’s descendant, “Kat,” is not assigned to a group. Because she is not in the “mother’s side” group, we can infer that she is on Deanna’s father’s side. Another possibility is that she is a false match, but since she shares 22 cM, that is less likely. Kat also matches Deanna’s son and daughter, who are labeled “Mother’s Side.” This is because they also match Deanna’s mother, which we would expect. The other shared matches don’t match Deanna’s mother, so they all appear to be paternal matches.
To confirm that the matches in this Thruline form a genetic network, I checked their location in a cluster chart called a network graph. I used Gephi to create the network graph with Deanna’s matches downloaded from the DNAGedcom client. Each colored cluster represents a genetic network – matches who also match each other. The idea is that each genetic network has a common ancestor or shared ancestral line. (To get your own network graph, contact Shelley Crawford of ConnectedDNA!)
The graph is zoomed out so far that you can’t see the nodes and connecting lines very well in the screenshot. The nodes (dots) in the graph are Deanna’s DNA matches, and the lines are shared match connections. When I zoom in, I can see the nodes and labels with the matches’ names.
I searched the list of matches in Gephi and located Kat, Joe, Di, Nel, and Jen on the network graph. They are all in the salmon-pink cluster at the bottom, showing that they belong to the same genetic network. That network probably has Isaac Newton Wilson as one of its common ancestors. Other potential common ancestors for that cluster include Isaac Newton Wilson’s wife and/or parents.
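The idea behind the network graph, that matches who share matches with each other belong together, can be sketched in a few lines of code. This is a hypothetical illustration, not how Gephi forms its clusters: it treats each shared-match connection as an edge and uses a breadth-first search to test whether a set of Thruline matches falls in one connected group. The function names and the example edges are made up; in practice the pairs would come from an export of your shared matches, such as the download from the DNAGedcom client, in whatever format that tool uses.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

def build_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected graph from (match, match) shared-match pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def component_of(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first search: every match reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def forms_network(graph: Dict[str, Set[str]], matches: List[str]) -> bool:
    """True if all the given matches are connected through shared matches."""
    if not matches:
        return False
    component = component_of(graph, matches[0])
    return all(m in component for m in matches)
```

Connected components are a much looser test than the clusters in a Gephi graph: a single false match that shares matches with several families can join otherwise separate groups, so a positive result is weak evidence on its own.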
After evaluating this Thruline, I believe it is a good hypothesis. I can work toward proving it with documentary research and further evaluation of shared cM and relationships.
Thrulines Example #3 – Jeremiah I. Green
This is another Thruline for Deanna, this one to Jeremiah I. Green. At first glance, it looks good because there are matches through several siblings. When I looked at the matches descending from each potential sibling of Deanna’s ancestor, I realized that most of them were under 10 cM. Only eight of the thirty matches shared over 15 cM with Deanna. I was also wary because the surname Green is fairly common and the potential ancestor, Jeremiah Green, lived in the early 1700s. To see whether this Thrulines estimate could be accurate, I focused on verifying the traditional research for Deanna first.
Deanna supposedly descends through Jeremiah’s daughter Martha Mabel Green. When I opened Martha Mabel Green’s branch of the Thruline, I noticed that there were two generations of “potential ancestors” to evaluate before the line got to someone already in Deanna’s tree, Lucinda Harris. Deanna only had the following info for Lucinda Harris:
b. 1802 [place unknown]
married William P. Jones (1797-?)
Child: Francis Lily Jones
No sources, no parents, no other children, no place of birth or death.
The information on Lucinda was so sparse that I decided to start with Deanna’s mother and verify the parent-child links back to her great-grandmother, Francis Lily Jones. When I got to Francis, the trail became difficult to follow.
Francis Lily Jones married Newton Elijah Laster and they lived in Hawkins, Tennessee. I reviewed Lily’s death certificate from 1921 and it revealed the following perplexing information for her parents:
Name of father: Unknown Ligitimate
Birthplace of father: [blank]
Maiden name of mother: Lucy Jones
Birthplace of mother: Tennessee
As I continued to review the documentary research, I found many public member trees with Lily’s parents as James C. Jones and Lucinda Jones. They seemed to be part of a large Jones family in Hawkins County, Tennessee, and may have been cousins who married each other.
Going back to the Thrulines estimate, I clicked on John Early Harris, the possible father of Lucinda, keeping in mind how common the name Harris is. The sidebar opened, as shown in the screenshot above, with information to evaluate whether Martha Mabel Green was the mother of John Early Harris. The trees did show that a John Early Harris of Warren, NC, was the father of a person named Lucinda Harris, whose spouse and children were unknown. This John Early Harris’s parents were Joseph Harris and Jane Egerton.
Another tree showed that a John Earley Harris of Granville, NC, was the son of Isham Harris and Martha Mabel Green, but this John Earley had no children. Although both men named John Earley Harris had the same birth and death dates, they were clearly two separate men whose identities had been merged.
In reviewing sources for John Earley Harris, I found a FindAGrave memorial (with no headstone photo) for him that listed several of the siblings in the Thruline. This was the family of Martha Mabel Green and Isham Harris. The other John Early Harris was the one with a daughter named Lucinda Harris. The parent-child links in the Thruline for this generation were not viable; they were founded on the merging of two identities. Traditional research showed that Deanna’s line didn’t actually go back to Martha Mabel Green. So why were there so many matches to Martha’s descendants showing up in the Thruline?
Another criterion for a good Thrulines estimate is that the matches should be on the appropriate side of your family. Deanna’s Thrulines estimate to the Green family was supposedly through her mother, who has also tested at Ancestry but hasn’t shared her results with me. However, Deanna’s matches have been grouped so that those who also match Deanna’s mother say “mother’s side.” If a match doesn’t say “mother’s side,” there are only two other possibilities: it is a paternal match or a false match. When I checked J, D, V, R, and K, the matches descending through Martha Mabel Green, only one of them was an actual maternal match. See the screenshots below:
Update 11/16 8:45am: I learned that matches under 20 cM are not given a mother’s side or father’s side designation. So, I can’t assume that J is not a mother’s side match just because there’s no “mother’s side” label. I would have to look in Deanna’s mother’s matches to see if she matches J also. The same applies to the match between Deanna and K below, who share only 9 cM.
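The reasoning in this update can be written as a small decision rule. This is a hedged sketch: the `likely_side` function and its result labels are hypothetical, it assumes that only the mother has tested so that Ancestry can assign the “Mother’s Side” label, and the 20 cM cutoff is the one described in the update rather than a documented Ancestry constant.

```python
from typing import Optional, Set

# Cutoff described in the update: below this, matches are left ungrouped,
# so a missing "Mother's Side" label tells us nothing by itself.
GROUPING_MIN_CM = 20

def likely_side(name: str, shared_cm: float, label: Optional[str],
                mother_matches: Optional[Set[str]] = None) -> str:
    """Classify one match when only the mother has tested (hypothetical helper).

    label: the parent-group label shown on the match, or None if no group.
    mother_matches: names found in the mother's own match list, if checked.
    """
    if label == "Mother's Side":
        return "maternal"
    if mother_matches is not None:
        # If you have looked at the mother's match list, use it instead of the label.
        return "maternal" if name in mother_matches else "paternal or false"
    if shared_cm >= GROUPING_MIN_CM:
        # Large enough to be grouped, yet not placed on the mother's side.
        return "paternal or false"
    return "undetermined"
```

For a 9 cM match like K, the function returns “undetermined” unless you pass in names from the mother’s match list, which mirrors the extra step the update describes.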
Many people have not had either parent test at Ancestry, so they can’t do this test. However, you can still look at the shared match lists and see whether the shared matches are known maternal or paternal cousins. This works well for matches over 20 cM, but for matches under about 15-20 cM, you may see shared matches who are related to you on your mother’s side and others on your father’s side. This is why some of the matches above look like they could be maternal matches, but they are not.
For example, K supposedly shares 9 cM with Deanna. There are 4 shared matches – SongMom1647, Heather, RSmith2, and E.L. Two of the matches also match Deanna’s mother, and are put in the “Mother’s Side” group. The other two are not assigned to a group, so are probably paternal. Shared match lists don’t show you people who are all triangulating on the same segment and thus probably all have a common ancestor – they are just people who share DNA with you and with each other. They could have a different common ancestor between them than you have with each of them.
Another criterion for a good Thrulines estimate is that the DNA matches form a genetic network. You can check this by looking at each match and viewing their shared matches. You should be able to see some shared match connections among the matches. Shared match lists only go down to 20 cM, so you may not see any shared matches for some of them.
You can also check their locations in a cluster chart or network graph, as I mentioned before. I found all the matches sharing 14 cM or more in the Green Thrulines estimate, since 14 cM was the minimum threshold included in the graph. As I mapped each of their locations on the graph, I was not surprised to find that they were scattered across different parts of it. Some were located in or near maternal clusters, others were in paternal clusters, and some were in a blob in the center that probably includes some false matches who seem to match several clusters.
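Mapping Thruline matches onto clusters, as described here, amounts to a tally. The sketch below assumes you have already exported a table of match names and cluster IDs from your network graph; the `cluster_of` mapping, the cluster names, and both helper functions are hypothetical. It skips matches under the graph’s 14 cM threshold and reports what share of the remaining Thruline matches fall in the single most common cluster.

```python
from collections import Counter
from typing import Dict, List, Tuple

GRAPH_MIN_CM = 14  # minimum cM included in the network graph described above

def cluster_spread(matches: List[Tuple[str, float]],
                   cluster_of: Dict[str, str]) -> Counter:
    """Count how many Thruline matches fall in each cluster.

    Matches below GRAPH_MIN_CM are skipped because they are not in the graph;
    matches missing from `cluster_of` are counted under "not found".
    """
    counts: Counter = Counter()
    for name, cm in matches:
        if cm < GRAPH_MIN_CM:
            continue
        counts[cluster_of.get(name, "not found")] += 1
    return counts

def dominant_share(counts: Counter) -> float:
    """Fraction of counted matches that sit in the most common cluster."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts.most_common(1)[0][1] / total
```

A Thruline whose matches sit together, like the Wilson example, should give a high share, while scattered results like the Green matches give a low one; where to draw the line is a judgment call, not a rule.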
My conclusion: this Thrulines estimate is inaccurate. Not only did the documentary sources show the parent-child links were based on merged identities and erroneous trees, but the analysis of maternal/paternal groupings of matches showed that the matches were not on the maternal side. The cluster analysis showed the matches did not appear in the same genetic network. This Thrulines estimate had the following red flags:
-The potential ancestor was born before 1800
-Most of the 30 matches were under 15 cM:
8-15 cM: 22 matches
16-30 cM: 7 matches
Over 30 cM: 1 match
-The matches didn’t match the correct parent
-The matches didn’t group into a genetic network
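The cM tally in this list is easy to reproduce for any Thruline. This hypothetical helper groups shared-cM values into the same three bins; the bin edges follow the list above, with fractional values just over 15 cM counted in the middle bin and the lowest bin labeled simply “15 cM or less.”

```python
from typing import Dict, Iterable

def tally_by_cm(shared_cms: Iterable[float]) -> Dict[str, int]:
    """Count matches in the three shared-cM bins used in the red-flag list."""
    bins = {"15 cM or less": 0, "16-30 cM": 0, "Over 30 cM": 0}
    for cm in shared_cms:
        if cm <= 15:
            bins["15 cM or less"] += 1
        elif cm <= 30:
            # Also catches fractional values such as 15.5 cM.
            bins["16-30 cM"] += 1
        else:
            bins["Over 30 cM"] += 1
    return bins

def mostly_small(bins: Dict[str, int]) -> bool:
    """True when more than half of the matches share 15 cM or less."""
    total = sum(bins.values())
    return total > 0 and bins["15 cM or less"] * 2 > total
```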
So, beware of Thrulines estimates, even ones that look good at first, with thirty matches descending from multiple siblings. If many of the matches are small, the Thrulines estimate could be false.
Good luck as you evaluate your Thrulines!