Y-chromosome Basics

This page is intended to give an easy-to-understand explanation of the terms used in Y-chromosome DNA testing. The explanations presented here are sufficient to understand the results tables and the analysis of the data presented on the DNA Project page. Technical rigor is certainly not the intention! For a more rigorous discussion, see, for example, the FamilyTreeDNA site listed on the Favorite Links page.

First, a brief word about the Y-chromosome is appropriate. The DNA contained in the nucleus of most of our body's cells is organized into 23 pairs of chromosomes, long helices such as the one depicted in the cartoon above. The 23rd pair, the so-called "sex pair", is composed of either two X-chromosomes in the case of females or one X- and one Y-chromosome in the case of males. So only males have a Y-chromosome, which they inherit from their fathers who, in turn, inherited it from their fathers, and so on back through the generations. Consequently, Y-chromosome testing applies only to the paternal pedigree, and only males can be test subjects in this project.

What follows are explanations of the columns in Table 1 on the main project page.

Kit: The numbers in this column are the test kit numbers assigned by FamilyTreeDNA to each test subject when they join the project. The Kit numbers identify the test subjects. Other than the earliest known Hayden/Haden/Hyden male ancestor, no personal identifying information is disclosed, either on the web site or by the project administrator, without the consent of the test subject.

Group #: The group identifications in this column are assigned by the project administrator. The reasons for the assignments are discussed in the analysis sections. Test subjects with the same Group # are likely related as shown in the hypothesized pedigree diagrams, or are otherwise closely related within the same family line.


Earliest Known Ancestor: The earliest known "Hayden" ancestor is given in the table immediately below the Results Table. This ancestor is usually known from traditional, well documented genealogical research. It is typically the next generation back that the researcher is testing with Y-chromosome analysis.


Haplo*: The next column to the right of Group # is the test subject's haplogroup. I will return to this subject in a couple of paragraphs.


DYS#: The next 37 columns, to the right of Haplo*, contain the Y-chromosome data for each test subject. The values in the first row are the DYS#s, which have been assigned by scientists to each of the markers on the Y-chromosome that are measured. If a 12-marker test is ordered, then results are returned for DYS#s 393 through 389-2. The 25-marker test returns 393 through 464d, and the 37-marker test returns 393 through 438. The DYS#s in red are known to mutate at a faster rate than those in black. Note also that none of these markers has any connection to any genes.


The numbers in the DYS# columns are the number of repeats of a specific unit of DNA bases (a different sequence for each DYS#). If two individuals have different values for the same DYS#, then they have a different number of repeats, otherwise known as different alleles, for this marker. The values in each row to the right of the "Haplo*" column, consequently, constitute that Kit #'s Y-chromosome data. This collection of "repeat numbers" is called the test subject's haplotype. For example, in the results table for Kit 25320, the alleles in the row from DYS# 393 (13) through 438 (12) are that test subject's haplotype.

Now we can return to an explanation of "haplogroup". Haplotypes can be placed in a common group called a haplogroup. Members of the same haplogroup all descend from the same founding father, who lived tens of thousands of years ago. On the flip side, members of different haplogroups are likely not related along their paternal line in a genealogical time frame. Studies of haplogroups by scientists provide many clues to the migration of early humans but are of limited use in genealogy. FamilyTreeDNA predicts the haplogroup from each test subject's haplotype. These predictions are shown in red in the "Haplo*" column. In the case of haplogroups labeled green, such as for Kit 25320, the haplogroup has been confirmed (not just predicted) by an additional test.

Most of the members of the Hayden Family DNA Project belong to haplogroup R1b1. One might ask, "If all the members have the same haplogroup, thus the same founding father, then why do the haplotypes differ?" The answer is that over time, the allele at each marker can mutate and change from, say, 12 to 13. There is a great deal of discussion in the literature about mutation rates. It is apparent that the mutation rate varies by DYS#, which complicates analysis. However, for the type of pedigree hypothesis testing in this project, it is sufficient to assume an average rate to apply to all markers.

Average mutation rates quoted in the technical literature vary from a conservative once in 500 generations (0.002, or 0.2%, chance of a mutation occurring at a marker at a generation change) to once in 250 generations (0.004, or 0.4%, chance). For the first 12 and the first 25 markers, where much of our hypothesis testing is done, once in 400 generations (0.0025, or 0.25%, chance that any one marker will mutate at a generation change) is a reasonable estimate and the one used here. In the case of 37 markers, a slightly higher rate of 0.3% is used.
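The conversion between "once in N generations" and a per-marker chance is simple arithmetic. The short Python sketch below makes it explicit; the function name is ours and purely illustrative.

```python
def rate_per_generation(generations_per_mutation):
    """Chance that one marker mutates at a single generation change."""
    return 1.0 / generations_per_mutation

# The three average rates discussed above: conservative, the one used
# here for 12 and 25 markers, and the fastest quoted.
for n in (500, 400, 250):
    rate = rate_per_generation(n)
    print(f"once in {n} generations = {rate:.4f} = {rate * 100:.2f}%")
```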

Note that this is an AVERAGE rate; a DYS# marker could mutate twice in 400 generations or once in 800. It is important to understand that proper analysis requires the use of statistics. So Y-chromosome testing can provide compelling evidence about the connection of family lines in a genealogical time frame but cannot say for certain that the common ancestor lived exactly so many generations ago.

Because of mutations, exact matches are not always the most probable result. One can calculate the most probable number of mismatches as follows.

First, count the number of opportunities for a mutation between two test subjects. If you look at Pedigree Chart 1 in the analysis section, you will see that Kit 25320 is hypothesized to be 6 generations removed from the common ancestor, Nathaniel, Sr., and Kit 25694 is 5 generations removed. So there are a total of 6 + 5 = 11 opportunities for any one marker to have mutated between these two test subjects.

Since the Kit 25694 subject was tested at only 25 markers (compared to 37 markers for 25320), we can compare only the first 25 markers. At each generation change, the chance of a mutation is 0.25% per marker times 25 markers, or 6.25% that one of the 25 markers will mutate. For the 11 generation changes, then, the expected number of mutations between these two test subjects is 0.0625 (6.25%) times 11, which equals about 0.7. Since a marker mutates or it doesn't, the number actually observed has to be an integer; the nearest is 1, which is taken here as the most probable number. Zero mutations is also highly probable. As the number of observed mutations increases above 1, the probability that the hypothesized pedigree in Pedigree Chart 1 is correct decreases. The FamilyTreeDNA web site has links to the details of the probabilities for those interested.
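For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the calculation above. It is not the method used by FamilyTreeDNA. It assumes that every marker at every generation change is an independent chance to mutate at the same average rate (a simple binomial model), and the function names are our own.

```python
from math import comb

def expected_mutations(rate, markers, generation_changes):
    """Average number of mutations over all markers and generation changes."""
    return rate * markers * generation_changes

def prob_exactly(k, rate, markers, generation_changes):
    """Chance of exactly k mutations under the simple binomial assumption."""
    trials = markers * generation_changes  # one trial per marker per generation change
    return comb(trials, k) * rate**k * (1 - rate) ** (trials - k)

RATE = 0.0025               # 0.25% per marker per generation change
MARKERS = 25                # only the first 25 markers can be compared
GENERATION_CHANGES = 6 + 5  # Kit 25320 plus Kit 25694

print(f"expected: {expected_mutations(RATE, MARKERS, GENERATION_CHANGES):.2f}")
for k in range(4):
    p = prob_exactly(k, RATE, MARKERS, GENERATION_CHANGES)
    print(f"{k} mutations: {p:.1%}")
```

Under these assumptions the sketch gives roughly 50% for zero mutations, 35% for one, and 12% for two, so zero and one together cover most of the probability while two remains quite possible.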

Now look at Table 1 on the results page for these two test subjects. You see a difference of 2: 1 at DYS# 439 (which is a faster mutating marker) and 1 at DYS# 437 (which is not faster mutating than average). This is called the genetic distance between these two test subjects. The genetic distance between any pair of test subjects is shown in Table 2 in the results section. So, based on just the difference between the observed and most probable genetic distances, the hypothesized pedigree is not rejected but is still open to question. Remember, the most probable genetic distance is a statistically derived value and not absolute. As explained in the analysis section, this along with other factors leads to the conclusion that the hypothesized pedigree is correct or close to being so.
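Counting a genetic distance can be sketched in a few lines of Python. The allele values below are invented for illustration and are not the actual results from Table 1. The sketch assumes that the distance is the sum of the repeat-count differences over the markers both subjects were tested on; this is one common counting convention and may not match the project's tables in every case.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of allele differences over the markers present in both haplotypes."""
    shared = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared)

# Hypothetical haplotypes keyed by DYS#, heavily shortened. subject_a has one
# extra marker (438) that subject_b was not tested on, so it is ignored.
subject_a = {"393": 13, "439": 12, "437": 15, "438": 12}
subject_b = {"393": 13, "439": 11, "437": 14}

print(genetic_distance(subject_a, subject_b))  # prints 2
```

Restricting the comparison to shared markers mirrors the rule that a 37-marker subject can be compared with a 25-marker subject only on the first 25 markers.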
