Saturday, June 5, 2021

Ethnicity Guesstimates

Probably the biggest draw of direct-to-consumer (DTC) genetic testing companies is their ethnicity or ancestry estimates. But if you've ever tested with multiple companies, or known someone who has, you might have noticed that the results are not the same from company to company. So let's get this out of the way now - those ethnicity estimates are just estimates and likely to be wrong. You cannot be certain of the geographic origin of your ancestors by looking at a single DTC genetic test result. That doesn't mean that the information can't still be useful or interesting - but more on that another day. For those who haven't bought a test yet, let's use my results as an example to understand why these discrepancies in ethnicity estimates exist. I have tested with both 23andMe and Ancestry, and I uploaded my 23andMe raw data to the MyHeritage database (which is free).

23andMe Results
According to 23andMe, I'm 46.5% British/Irish, 17.4% French/German, 7% Broadly Northern European (which also would be associated with being British/Irish and French/German), 24.5% Italian, 0.5% Greek/Balkan and 2.8% North West Asian (split between Anatolian aka Turkish and Cypriot).
Ancestry Results
According to Ancestry, I'm 40% Irish, 19% English/Northwestern European (which they say includes parts of France, Belgium and the Netherlands), 19% Scottish, 11% French, 7% Northern Italian, 2% Cypriot, 1% Middle Eastern, and 1% Germanic. 
MyHeritage Results (using 23andMe data)
According to MyHeritage, I'm 27.5% North/Western European (which encompasses France and Germany), 21.8% Irish/Scottish/Welsh, 18.7% English, 14.2% Greek/Southern Italian, 9.9% Italian, 2.2% Balkan, and 5.7% North African. So what gives? 

One of the first things to notice is that each company carves up the globe differently. Each has decided to cluster geographic areas in its own way, which can account for a big chunk of the differences in the estimates. For example, 23andMe doesn't separate Scotland or Wales out of its British/Irish estimate, Ancestry lumps English ancestry in with Northwestern Europe, and MyHeritage defines North/Western Europe to include the same places as Ancestry plus France and Germany, which both 23andMe and Ancestry separate out. Since there's no easy way to compare the percentage of my DNA associated with single countries directly between the three companies, let's just count the percentage of my DNA that seems to have come from England, Wales, Scotland, Ireland, the Netherlands, France, and Germany: 23andMe says that's 71%, Ancestry says that's 90%, and MyHeritage says that's 68%. Now two of the companies agree fairly closely, but the three still don't line up. So how does that compare to what I know about my family after more than 25 years of paper genealogy research?
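For anyone who wants to check the combined totals above, here is a small Python sketch that adds up each company's categories. The percentages are the ones quoted in this post; which labels count toward the combined region is my own reading of each company's categories (an assumption, not something the companies publish in this form), and the names results, northwest_labels and northwest_total are made up for the example.

```python
# Percentages as quoted in the post, per company.
results = {
    "23andMe": {
        "British/Irish": 46.5,
        "French/German": 17.4,
        "Broadly Northern European": 7.0,
        "Italian": 24.5,
        "Greek/Balkan": 0.5,
        "North West Asian": 2.8,
    },
    "Ancestry": {
        "Irish": 40.0,
        "English/Northwestern European": 19.0,
        "Scottish": 19.0,
        "French": 11.0,
        "Northern Italian": 7.0,
        "Cypriot": 2.0,
        "Middle Eastern": 1.0,
        "Germanic": 1.0,
    },
    "MyHeritage": {
        "North/Western European": 27.5,
        "Irish/Scottish/Welsh": 21.8,
        "English": 18.7,
        "Greek/Southern Italian": 14.2,
        "Italian": 9.9,
        "Balkan": 2.2,
        "North African": 5.7,
    },
}

# Assumption: the labels that cover England, Wales, Scotland, Ireland,
# the Netherlands, France and Germany for each company.
northwest_labels = {
    "23andMe": ["British/Irish", "French/German", "Broadly Northern European"],
    "Ancestry": ["Irish", "English/Northwestern European", "Scottish",
                 "French", "Germanic"],
    "MyHeritage": ["North/Western European", "Irish/Scottish/Welsh", "English"],
}


def northwest_total(company):
    """Sum the percentages of the labels assigned to the combined region."""
    labels = northwest_labels[company]
    return round(sum(results[company][label] for label in labels), 1)


for company in results:
    print(company, northwest_total(company))
```

Running it gives about 70.9 for 23andMe, 90 for Ancestry and 68 for MyHeritage, which is where the rounded figures in the paragraph above come from. Moving a single label in or out of northwest_labels shifts a total by several points, which is the whole problem with comparing companies.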

Each person has 4 biological grandparents and will inherit roughly 25% of their DNA from each grandparent (NOTE: it is possible to inherit slightly more or less DNA from a given grandparent due to the random shuffling of DNA that occurs when parents make their egg and sperm). My father's father was born to two people who were each half English and half Irish, so he was likely 50% English and 50% Irish. My father's mother was born to a 100% German mother and an English/Welsh/Dutch/French father, so she was likely 50% German and 50% a mix of the other 4 places. My mother's father was born to a Scottish/Irish father and an English/Irish/German/Dutch mother. My mother's mother was born to first-generation Italian Americans - all of their family that I can trace back to the mid-1850s lived in Southern Italy. This grandmother's DNA on both 23andMe and Ancestry suggests she's about 80% Italian, with the rest a mix of Greek/Cypriot or North West Asian.

For simplicity's sake, let's just focus on a small bit of my ancestry - the German/French/Dutch bit. We can guess that my dad is about 25% German and that I, in turn, should have gotten about 12.5% German DNA from those ancestors, but I also could have inherited some German DNA from my mom's side of the family, as well as French and Dutch DNA from both sides. That makes the estimates from 23andMe (17.4% French/German) and MyHeritage (27.5% North/Western European) look reasonably accurate for those populations. But Ancestry saying I'm more French (11%) than German (1%) doesn't make a lot of sense based on oral family history and my paper trail. It was interesting to me, however, that each company was able to confirm some of the locations I know my ancestors emigrated from: Ancestry notes an association with Roscommon in Ireland, 23andMe identified Campania in Italy, and MyHeritage linked me to Aberdeen in Scotland.
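The 25% and 12.5% figures above come from halving at each generation. Here is a minimal Python sketch of that arithmetic for the German line only. It assumes a child's expected share is simply the average of the two parents' shares, it counts no German ancestry from my mom's side (so the result is a floor, not a full estimate), and the function and variable names are made up for the example. Real inheritance varies around these expected values because of the random shuffling mentioned earlier.

```python
def child_expected(parent_a, parent_b):
    """Expected percentage of one ancestry in a child: the parents' average."""
    return (parent_a + parent_b) / 2


# My father's mother: a 100% German mother and a father with no German
# ancestry counted here (English/Welsh/Dutch/French).
grandmother = child_expected(100.0, 0.0)

# My father's father was half English, half Irish, so 0% German.
father = child_expected(grandmother, 0.0)

# Assumption: ignore any German ancestry on my mother's side.
me = child_expected(father, 0.0)

print(grandmother, father, me)
```

The three values printed are the expected German percentages for my grandmother, my dad and me along this one line of descent. Any German DNA from my mom's side would be added on top of the last number.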

So why are there sometimes significant differences between companies when your DNA obviously doesn't change? Largely because each company uses its own proprietary method of assigning DNA segments to geographic locations in the first place. However, each method relies on a similar approach. Basically, the company selects a group of people currently living in a particular geographic location who are deemed to have "pure" DNA for that area, and that group serves as a reference for everyone who tests with the company. If a segment of your DNA matches a particular reference dataset, the company assigns that segment to that geographic region, and then it tallies all your DNA segments to give you the ethnicity estimate. Since each company selects its reference populations differently and clusters geographic areas differently, it is impossible to get complete concordance between companies. If you want to learn more about these methods, check out this on 23andMe, this on Ancestry and this on MyHeritage. (And if you really want to get into the science, read the white papers from 23andMe and Ancestry - I couldn't locate one for MyHeritage. Some of the reference sample sets are shockingly small.) And if you need more helpful visuals on this topic, check out the video from Vox below.
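To make the reference-population idea a little more concrete, here is a deliberately tiny Python sketch. It is not any company's actual method (those are proprietary and far more sophisticated). It just assigns each made-up DNA segment to whichever made-up reference group it resembles most, then tallies the segments into percentages. The regions, sequences and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical reference panels for two imaginary companies.
# Each region maps to a few short reference sequences.
COMPANY_ONE = {
    "Region A": ["AAGT", "AAGC"],
    "Region B": ["GGCT", "GGCC"],
}
COMPANY_TWO = {
    "Region A": ["AAGT"],
    "Region C": ["AAGC", "GGCC"],
}


def similarity(segment, reference_sequence):
    """Count the positions where the two sequences carry the same letter."""
    return sum(a == b for a, b in zip(segment, reference_sequence))


def assign_region(segment, panels):
    """Pick the region whose best-matching reference is closest to the segment."""
    return max(
        panels,
        key=lambda region: max(similarity(segment, ref) for ref in panels[region]),
    )


def ethnicity_estimate(segments, panels):
    """Tally segment lengths per assigned region and convert to percentages."""
    totals = defaultdict(int)
    for segment in segments:
        totals[assign_region(segment, panels)] += len(segment)
    genome_length = sum(totals.values())
    return {region: round(100 * n / genome_length, 1) for region, n in totals.items()}


# The same made-up DNA, scored against both companies' panels.
my_segments = ["AAGT", "GGCC", "AAGC", "AAGT"]
estimate_one = ethnicity_estimate(my_segments, COMPANY_ONE)
estimate_two = ethnicity_estimate(my_segments, COMPANY_TWO)
print(estimate_one)
print(estimate_two)
```

The same four segments come out as 75% Region A and 25% Region B with the first panel, but 50% Region A and 50% Region C with the second. Nothing about the DNA changed, only the reference groups and how the map was carved up.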
