
Y-chromosome Basics
This page is intended to give an easy-to-understand explanation of the
terms used in Y-chromosome DNA testing. The explanations presented here
are sufficient to understand the results tables and the analysis of the
data presented on the DNA Project page. Technical rigor is certainly
not the intention! For a more rigorous discussion, see, for example,
the FamilyTreeDNA site listed on the Favorite Links page.
First, a brief word about the Y-chromosome is appropriate. The DNA
contained in the nucleus of most of our body's cells is organized into
23 pairs of chromosomes, long helices such as the one depicted in the
cartoon above. The 23rd pair, the so-called "sex pair", is composed of
either two X-chromosomes in the case of females or one X and one
Y-chromosome in the case of males. So only males have a Y-chromosome,
which they inherit from their fathers who, in turn, inherited it from
their fathers, and so on back through the generations. Consequently,
Y-chromosome testing applies only to the paternal pedigree, and only
males can be test subjects in this project.
What follows are explanations of the columns in Table 1 on the main
project page.
Kit: The numbers in this column are the test kit numbers assigned to
each test subject by FamilyTreeDNA when they join the project. The Kit
numbers identify the test subjects. Other than the earliest known
Hayden/Haden/Hyden male ancestor, no personal identifying information
is disclosed, either on the web site or by the project administrator,
without the consent of the test subject.
Group #: The group identifications in this column are assigned by the
project administrator. The reasons for the assignments are discussed in
the analysis sections. Test subjects with the same Group # are likely
related as shown in the hypothesized pedigree diagrams, or are closely
related within the same family line.
Earliest Known Ancestor: The earliest known "Hayden" ancestor is
given in the table immediately below the Results Table. This ancestor
is usually known from traditional, well-documented genealogical
research. It is typically the next generation back that the researcher
is testing with Y-chromosome analysis.
Haplo*: The next column to the right of Group # is the test subject's
haplogroup. I will return to this subject in a couple of paragraphs.
DYS#: The next 37 columns, to the right of Haplo*, contain the
Y-chromosome data for each test subject. The values in the first row
are the DYS#s, which scientists have assigned to each of the markers
measured on the Y-chromosome. If a 12 marker test is ordered, then
results are returned for DYS#s 393 through 389-2. The 25 marker test
returns 393 through 464d, and the 37 marker test returns 393 through
438. The DYS#s in red are known to mutate at a faster rate than those
in black. Note also that none of these markers has any connection to
any genes.
The numbers in the DYS# columns refer to the number of repeats of a
specific unit of DNA bases (a different sequence for each DYS#). If two
individuals have different values for the same DYS#, then they have a
different number of repeats, otherwise known as different alleles, for
this marker. The values in each row to the right of the column headed
"Haplo*" consequently constitute that Kit # subject's Y-chromosome
data. This collection of "repeat numbers" is called the test subject's
haplotype. For example, in the results table for Kit 25320, the alleles
in the row running from DYS# 393 (13) through 438 (12) are that test
subject's haplotype.
Now we can return to an explanation of "haplogroup". Haplotypes can
be placed in a common group called a haplogroup. Members of the same
haplogroup all descend from the same founding father, who lived tens of
thousands of years ago. On the flip side, members of different
haplogroups are likely not related along their paternal lines within a
genealogical time frame. Scientific studies of haplogroups provide
many clues to the migration of early humans but are of limited use in
genealogy. FamilyTreeDNA predicts a haplogroup from each test subject's
haplotype. These predictions are shown in red in the "Haplo*" column.
In the case of haplogroups labeled green, such as for Kit 25320, the
haplogroup has been confirmed (not just predicted) by an additional
test.
Most of the members of the Hayden Family DNA Project belong to
haplogroup R1b1. One might ask, "If all the members have the same
haplogroup, thus the same founding father, then why do the haplotypes
differ?" The answer is that over time, the allele at each marker can
mutate and change from, say, 12 to 13. There is a great deal of
discussion in the literature about mutation rates. It is apparent that
the mutation rate varies by DYS#, which complicates analysis. However,
for the type of pedigree hypothesis testing in this project, it is
sufficient to assume an average rate that applies to all markers.
Average mutation rates quoted in the technical literature vary from a
conservative once in 500 generations (0.002, or 0.2%, chance of a
mutation occurring at a marker at a generation change) to once in 250
generations (0.004, or 0.4%, chance). For the first 12 and the first 25
markers, where much of our hypothesis testing is done, once in 400
generations (0.0025, or 0.25%, chance that any one marker will mutate
at a generation change) is a reasonable estimate and the one used here.
In the case of 37 markers, a slightly higher rate of 0.3% is used.
Note that this is an AVERAGE rate; a DYS# marker could mutate twice in
400 generations or once in 800. It is important to understand that
proper analysis requires the use of statistics. So Y-chromosome testing
can provide compelling evidence about the connection of family lines in
a genealogical time frame but cannot say for certain that the common
ancestor lived exactly so many generations ago.
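The arithmetic behind these rates can be sketched in a few lines of
Python. This is only an illustration: the function names are made up
for this example, and it assumes each generation change is an
independent chance for a marker to mutate. It converts "once in N
generations" into a per-generation chance and shows why the rate is
only an average: even at once in 400 generations, a marker has better
than one chance in three of not changing at all over 400 generations.

```python
def per_generation_rate(generations_per_mutation):
    """Convert 'once in N generations' into a per-generation probability."""
    return 1.0 / generations_per_mutation


def chance_of_no_mutation(rate, generations):
    """Probability that one marker stays unchanged over the given number
    of generation changes, assuming each change is independent."""
    return (1.0 - rate) ** generations


# The three average rates quoted in the text.
for n in (500, 400, 250):
    rate = per_generation_rate(n)
    print(f"once in {n} generations = {rate:.4f} ({rate:.2%})")

# An average is not a schedule: chance that a marker with the
# once-in-400 rate shows no change at all in 400 generations.
average_rate = per_generation_rate(400)
no_change = chance_of_no_mutation(average_rate, 400)
print(f"chance of no change in 400 generations: {no_change:.2f}")
```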
Because of mutations, exact matches are not always the most probable
result. One can estimate the most probable number of mismatches as
follows.
First, count the number of opportunities for a mutation between two
test subjects. If you look at Pedigree Chart 1 in the analysis section,
you will see that Kit 25320 is hypothesized to be 6 generations removed
from the common ancestor, Nathaniel, Sr., and Kit 25694 is 5
generations removed. So there are a total of 6 + 5 = 11 opportunities
for any one marker to have mutated between these two test subjects.
Since the Kit 25694 subject was tested at only 25 markers (compared to
37 markers for 25320), we can compare only the first 25 markers. At
each generation change, the chance of a mutation is 0.25% per marker
times 25 markers, or 6.25% that one of the 25 markers will mutate. For
the 11 generation changes, then, the expected number of mutations
between these two test subjects is 0.0625 (6.25%) times 11, which
equals about 0.7. Since a marker either mutates or it doesn't, the
observed number has to be an integer; rounding gives 1 in this case.
Zero mutations is also highly probable. As the number of observed
mutations increases above 1, the probability that the hypothesized
pedigree in Pedigree Chart 1 is correct decreases. The FamilyTreeDNA
web site has links to the details of the probabilities for those
interested.
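For readers who want to check the numbers, the calculation above can be
sketched in Python. This is a simplified model and not necessarily the
method FamilyTreeDNA uses: it assumes every marker at every generation
change is an independent trial with the same 0.25% chance of mutating,
and it counts mutation events rather than the differences that would
finally be observed. The function names are made up for this example.

```python
from math import comb


def expected_mutations(rate_per_marker, markers, generations):
    """Average number of mutations expected between two test subjects."""
    return rate_per_marker * markers * generations


def mutation_probability(k, rate_per_marker, markers, generations):
    """Binomial probability of exactly k mutations, treating each marker
    at each generation change as one independent trial."""
    trials = markers * generations
    return (comb(trials, k)
            * rate_per_marker ** k
            * (1 - rate_per_marker) ** (trials - k))


rate = 0.0025          # 0.25% per marker per generation change
markers = 25           # only the first 25 markers can be compared
generations = 6 + 5    # Kit 25320 plus Kit 25694, per Pedigree Chart 1

print(f"expected mutations: {expected_mutations(rate, markers, generations):.2f}")
for k in range(4):
    p = mutation_probability(k, rate, markers, generations)
    print(f"chance of exactly {k} mutation(s): {p:.1%}")
```

Under these assumptions, zero and one mutation together account for
most of the probability, and each additional mutation is markedly less
likely than the one before.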
Now look at Table 1 on the results page for these two test subjects.
You see a difference of 2: 1 at DYS# 439 (which is a faster mutating
marker) and 1 at DYS# 437 (which is not faster mutating than average).
This is called the genetic distance between these two test subjects.
The genetic distance between any pair of test subjects is shown in
Table 2 in the results section. So, based on just the difference
between the observed and most probable genetic distances, the
hypothesized pedigree is not rejected but is still open to question.
Remember, the most probable genetic distance is a statistically derived
value and not absolute. As explained in the analysis section, this
along with other factors leads to the conclusion that the hypothesized
pedigree is correct or close to being so.
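Counting a genetic distance can also be sketched in Python. The
haplotype values below are invented purely for illustration and are not
taken from Table 1. The sketch assumes the distance is the sum of the
differences in repeat counts over the markers both subjects were tested
on, which is one common convention; the function and variable names are
made up for this example.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of repeat-count differences over the markers present in both
    haplotypes. Markers tested for only one subject are ignored."""
    shared_markers = haplotype_a.keys() & haplotype_b.keys()
    return sum(abs(haplotype_a[m] - haplotype_b[m]) for m in shared_markers)


# Hypothetical haplotypes (DYS# -> number of repeats), shortened to a
# handful of markers. The first subject has one extra marker tested.
example_a = {"393": 13, "390": 24, "439": 12, "437": 15, "438": 12}
example_b = {"393": 13, "390": 24, "439": 11, "437": 14}

# One step apart at 439 and one at 437; 438 is skipped because only
# one subject was tested on it.
print(genetic_distance(example_a, example_b))  # 2
```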