With the release of the new DNAGedcom 4, many are wondering the best way to use the tool. David Grawrock, CG, one of our Family Locket Genealogists team members is sharing a helpful tutorial for using this tool. After gathering DNA matches and shared matches into a spreadsheet, you can make network graphs with Gephi and perform other analyses.

-Nicole

Matches

Congratulations, you’ve got your DNA results back and you want to use them in your family research.

You could type them in one by one, but nobody, well at least sane people, wants to do that over 20 thousand times. And that doesn’t include the shared matches where you from 10 to 100 matches, which ends up looking like hundreds of thousands of entries if not into the millions. Note that my numbers are small as one set of grandparents are Norwegian and the other German, both regions notorious for having few testers. My wife has 46,256 matches for a comparison.

DNAGedCom

After rejecting the idea of manual entry, some sort of automation comes to mind. Thankfully, there is a program that does grab your data, DNAGedCom. The web site is:

www.dnagedcom.com

Program

DNAGedCom is not free. You must pay for its use. The website has the payment options.

DNAGedCom installs as an application on your local machine after downloading from the website. Installing with the defaults works just fine. Not going to reiterate the installation instructions, if you have problems work with the DNAGedCom team.

Version 4.0

Version 4.0 is currently in beta testing; however, it appears to be stable. As of 10 January 2025, the program can gather matches and ICW from 23, A, and FT. It can only gather matches from MH.

Bleeding Edge

In technical terms using an open beta, such as DNAGedCom 4.0 is not using leading edge technology, but rather bleeding edge technology. The difference is that the bleeding edge is VERY sharp and at times it can cut you, or in other words you are now bleeding. To avoid cutting yourself, backup often, keep a series of backups, and be ready for a potential hiccup. For “standard” use of getting matches and the ICW gathered, I personally haven’t had an issue with the program since way back in the closed Beta phase. The code works like it should. There are still some quirks, but the team continues to weed out those quirks.

Database Considerations

Database

The program, when gathering, stores the information in a SQL database in a location selected by the user, suggestions as to where to store are next. The database is a standard SQL database and capable of being read by SQL readers such as DBBrowser or code written in Python.

To select the location, first use the “Change Settings” button.

Note that when you restart the program, the database in use when you previously quit is automatically loaded as the database in use.

Switching databases

If the selected database is not correct, and there is a pre-existing database, then click on the three-dot button “…” and navigate in the file structure to the desired file.

If you want to create a new database, then click on the plus button “+” and navigate to the folder and name the database.

Database naming

The files use the standard .DB extension to indicate the file is a database. I like to name my databases with the program using the database, so as you can see the database name includes dnagedcom. That is merely a convention that I use. There is no requirement to use the program name in the file name.

Database file location

Where you store the file depends on how many research projects you have, how many companies you need to download from, and how time sensitive your needs are.

Single Tester, Single Company

When dealing with a single tester from a single company, the location can be anywhere. Personally, I like to have the file in a folder with my project, but it doesn’t really matter.

Single Tester, Multiple Companies

If you use a single database for all gathers, then the project folder works well.

If you use a database per testing company, the files MUST all be in the same location. The reason is fully explained in the Multiple Gathering section.

Multiple Testers, Single Company

You SHOULD use a single database in this situation. The reason is that it will drastically reduce your gather time and size of the database. The reason is that the ICW information is in many instances, duplicated. For example, you are A, first cousin B is the other tester, and C & D are second cousin matches. The first gather, for A, gets the AB, AC, AD, BC, BD, and CD. On the second gather for B, DNAGedCom can skip the matches already gathered, which are the AB, AC, AD, BC, BD, and CD.

Multiple Testers, Multiple Companies

I recommend the single database for the same reasons as the multiple testers, single company recommendation.

If you do separate into multiple files, you should separate them by company to gain the advantage of the gather time speed up.

Multiple Gathering

If you have multiple testers or companies, there is the obvious question of, “Can I do more at the same time?”

Maybe. That’s a horrible answer, but the truth as of January 2025, and the 4.0 beta test version, there are some issues.

Given that I’ve had to do multiple gathers, across multiple testers, and across multiple companies, I’ve felt the pain, or had to rerun entire gathers, from these options.

Gather two testers from the same company

Two testers from the same company, why not run two instances of DNAGedCom to gather faster?

DO NOT DO THIS. The reason is not technical but practical. While it might work, the gathering process takes up resources of the testing company. Doing this for multiple testers, from your single account, can put a dent in throughput for others on the website.

Gather same instance, different companies, same database

In this model you run a single copy of DNAGedCom and start gathers at more than one company, all to the same database. The design of DNAGedCom SHOULD allow this. It’s not deeply tested and may have hiccups. The hiccups are difficult to discover.

Given the difficulty of discovering the hiccups, I do not recommend this type of gather until the DNAGedCom team explicitly state this is tested and working.

Gather different instances, different companies, same database

In this model you start DNAGedCom twice, or more, with all instances pointing at the same database, and start gathering for different companies. This SHOULD work but has the same caveats as using the same instance.

Given the difficulty of discovering the hiccups, I do not recommend this type of gather until the DNAGedCom team explicitly state this is tested and working.

Gather different instances, different companies, different databases in same folder,

In this model you start DNAGedCom twice, or more, each instance pointing to a different database, the databases are all in the same folder and start gathering for the different companies.

While this worked in 3.0, this does NOT work. The team want this to be right in the future, but as of January 2025, do not attempt this.

Gather different instances, different companies, different databases in different folders,

In this model you start DNAGedCom twice, or more, each instance pointing to a different database, the databases are in different folders and start gathering for the different companies.

This did NOT work in 3.0 and it is NOT working in 4.0. It WILL corrupt the databases. The only recovery mechanism is to go to a backup or do the gathers from scratch.

Conclusion

Currently, the only concurrent processing I think is worth trying is same instance, same database, different companies.

Gathering Quirks

Time

How long will the gathering take? There is no hard and fast number, it all depends on how many matches to gather, how many of those matches have oodles of shared matches, and the load on the testing web site.

For instance, with my twenty thousand matches, a first-time download into a new database takes days. For my mother-in-law, with her Ashkenazi heritage, it’s weeks. This is especially true if you gather to the minimum size like 8 cM.

To minimize the time, my technique is to filter the first gather to 20 cM and above. The resulting database, while not complete, provides a starting point for analysis. After doing that gather, I’ll filter from 15-20 cM, and then a gather at 10-15. Only after those are done will I do the final gather of 8-10 cM. With those gathers complete, once every month or so, I’ll do a gather on the current database to catch any new matches.

Count

As you gather, on the left side of the screen, there is a count. That count is how many ICW matches need to be gathered for the current match. It’s helpful but it doesn’t tell you how many matches are remaining to gather.

Pausing

After gathering for some time, the gather can pause on an individual match. If you see the same name for some period, dependent on your patience, you can unfreeze the gather by pushing the stop button on the right. The great design point of DNAGedCom is that an interruption doesn’t really affect the gather and when restarted DNAGedCom picks up where it left off. This includes things like being interrupted by an OS reboot. If a reboot occurs in the middle of your gather, just relaunch DNAGedCom, filter for the same cM size, and restart the gather. DNAGedCom will figure out where it was and continue. This feature is critical to understand that you won’t lose the 2 weeks long gather that was ongoing.

Conclusion

This is only the start of what you can do with DNAGedCom, there are reports it can directly create. In addition to those reports, using the database for external activities, like network graphs in Gephi are possible.

If you are attempting to do more with your DNA results, then what is provided by your testing company, I strong encourage you to obtain and use DNAGedCom.

Additional Filter Images

Updated 5/5/2025

You bring up the filter panel by clicking the filter button.

Then from the filter panel you select shared DNA.

Karen Loop Boehme

· Reply

February 9, 2025 at 12:33 PM

I have questions! I have been gathering each of my testers in their own .db file for a single project. I’ve also been separating the .db file by platforms. From the post, I think I am supposed to be gathering them all in one big .db file from across all the platforms. Is that correct? The post also suggests (I think!) that I can run multiple gathers at a time on one device (meaning open the program multiple times on the same device). I have been running them concurrently on 3 separate devices with different .db files, but not on the same device nor trying to use the same .db file. Is this how everyone uses DNAGedcom 4? Do I now need to rerun all those testers (5 of us) into one big data file? Do I use the same .db file for the concurrent gathers? I’ve also asked this question on the RLP with DNA 7 FB group. Thanks for your help and for a very enlightening post!!!

Diana Elder
· Reply

February 10, 2025 at 3:44 PM

From David:
Multiple testers into multiple dbs is fine but the issue is going to be trying to merge them and do the cross analysis that a single db can provide.

As they have access to multiple devices to do the downloads I would not change that. One could write some python or other language to merge the dbs after completing the downloads.

There is no reason to redo the downloads. They are just fine. Just need to merge the data.

Pam

February 10, 2025 at 6:35 PM

this explanation was quite confusing.

MARCIE RITTER

February 12, 2025 at 9:17 AM

Thank you for this helpful article! I have used DNAGedcom in the distant past to gather info. I understand that now some of the companies are wary of gathering like this. Will using a method like David suggests to save time (starting with a small range of cM and gradually increasing it) also make it less likely that one’s account will be flagged? Thank you!

Kendra Stephenson

February 14, 2025 at 6:36 AM

Great article! It is nice to have gathering to one database explained clearly.

I’ve been using 4.0 since December without issue on the gather side. My question is the ICW (in common with) file generated that is needed to complete a gephi graph. My ICW files contain lines where each match has an entry that shows the match in common with itself. In looking back on the ICW files I generated with DNAGedCom 3.0, I didn’t have this issue. I use A with protools. The Gephi graphs I generate with the 4.0 are difficult to get to delineate and frequently locks up Gephi. Is there something I can or should do to correct this?

David Grawrock
· Reply

Author

February 25, 2025 at 1:40 PM

Sorry for the delay in responding, I was traveling.
That sounds like a DNAGedCom issue, you shouldn’t see those records that loop back on themselves. I’m sure that those types of rows would cause issues with Gephi. Two suggestions, first communicate with the DNAGedCom tech support team. They may have an answer.
Second would be to note what match has the duplicate record, create a new database, and then gather just the cM size for the duplicate i.e. set the range to 32-32 cM. Check that new database for duplicates. If they are present, then you have a problem with DNAGedCom. If they are not, then something happened on your gather and you most likely should just do the gather on a new database.

Martina P. White

May 4, 2025 at 2:58 PM

David,

Could you show a picture of where you are setting the filters in the gathering process for the new 4.0, while using “A” DNA Company. After I log into this particular DNA company I am workin with while in the “Gather’ window, I immediately get the picture you show above. I do have the ability to choose which kit I want to gather from, and continue with the usual “A” filters, but I don’t see any prompts for choosing the cm filters in the gathering. So let’s say I click on my Father’s Paternal side filter, how can I eliminate the under 20cm kits or above 400 cm kits to make things go faster? This is something I was able to do in the earlier versions of the Client. My ultimate goal is to produce a CLM colored graph of clusters.

Thank you,

Tina

May 4, 2025 at 3:00 PM

PS: I’m working on a MAC

David Grawrock

Author

May 4, 2025 at 5:55 PM

Sure. You just use the filters provided by A to filter. I’ll add the pictures at the bottom of the post.

John Branan

May 28, 2025 at 2:38 PM

Thank you for the great article. I had some difficulty with Ancestry “overloading” and kicking me out on my first attempt to capture data. I had better luck when I set the filter at the top from “All” to “None”, but that seems to be overlooking the ICW matches that I seek. Do you have any suggestions to gather all the data without getting kicked out?

DNAGedcom 4: Efficiently Gathering and Storing DNA Matches

Matches

DNAGedCom

Program

Version 4.0

Bleeding Edge

Database Considerations

Database

Switching databases

Database naming

Database file location

Single Tester, Single Company

Single Tester, Multiple Companies

Multiple Testers, Single Company

Multiple Testers, Multiple Companies

Multiple Gathering

Gather two testers from the same company

Gather same instance, different companies, same database

Gather different instances, different companies, same database

Gather different instances, different companies, different databases in same folder,

Gather different instances, different companies, different databases in different folders,

Conclusion

Gathering Quirks

Time

Count

Pausing

Conclusion

Additional Filter Images

About David Grawrock

You also might be interested in

Research Like a Pro with DNA Airtable Base Update – 2024

Should I sort my 23andMe matches by percent or strength of relationship?

If I know how a DNA match is related, do I still need to contact them?

10 Comments

Leave your reply.

Leave a Reply

Thanks for the note!

Search

Blog Post Categories

Airtable Links

Free Downloads

Institute Courses

Featured Products

Sale

Newsletter Sign Up – Receive Excel Research Log