AP Syllabus focus:
‘Comparing DNA nucleotide sequences and protein amino acid sequences provides strong evidence for evolution and shared ancestry.’
Molecular data let biologists test evolutionary relationships by directly comparing heritable information. Patterns of similarity in DNA and proteins, interpreted in light of mutation and inheritance, provide quantitative evidence for evolution and common ancestry.
Why DNA and proteins are powerful evidence
Evolution predicts that descendants inherit genetic information from ancestors, with changes accumulating through mutation and other genetic processes. Therefore, organisms with a more recent common ancestor should show greater similarity in:
DNA nucleotide sequences (A, T, C, G)
Protein amino acid sequences (translated products of genes)
Because DNA and proteins are present in all living organisms and are directly tied to heredity, they provide a widely applicable, testable line of evidence.
Core idea: sequence similarity reflects relatedness
Molecular homology: Similarity in DNA or protein sequences due to inheritance from a common ancestor.
Molecular homology is evaluated by aligning sequences and comparing positions (sites) to quantify similarity and differences.
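The position-by-position comparison described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the two sequences have already been aligned (gaps written as "-"); the function name `percent_identity` and the toy sequences are hypothetical and not taken from any real gene.

```python
def percent_identity(seq_a, seq_b):
    """Percent of compared alignment columns that hold the same residue.

    Assumes both sequences are already aligned to the same length.
    Columns with a gap ("-") in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap column: no residue to compare
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical aligned sequences, invented for illustration only
aligned_1 = "ATGCCGTA-GCT"
aligned_2 = "ATGCTGTAAGCT"
print(f"{percent_identity(aligned_1, aligned_2):.1f}% identity")  # 90.9% identity
```

Skipping gap columns is only one convention; other tools count gaps as mismatches, which gives a lower percentage for the same alignment.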
DNA evidence: comparing nucleotide sequences
DNA comparisons can target different genomic regions depending on the question:
Protein-coding genes: differences may affect amino acid sequences and function
Noncoding regions (e.g., introns, intergenic DNA): often accumulate changes with fewer functional constraints, making them useful for distinguishing closely related lineages
Organelle DNA (mitochondrial or chloroplast): often present in many copies per cell and may be easier to sequence

This diagram maps the human mitochondrial DNA genome as a circular chromosome with annotated protein-coding genes, rRNA genes, and tRNA genes. It helps connect the idea of “organelle DNA” to a real, heritable sequence source frequently used in molecular systematics and evolutionary inference.
Key interpretations:
Fewer nucleotide differences between two sequences generally indicate a more recent divergence from a common ancestor.
Some mutations are neutral (no effect on fitness) and can accumulate over time, providing a record of evolutionary change.
Unique changes shared at the same positions in a gene across species are unlikely to have arisen independently by chance, so they support shared ancestry.
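The first interpretation above (fewer differences, more recent divergence) can be illustrated by counting differences between every pair of sequences. This is a rough sketch, assuming equal-length sequences that need no gaps; the species names and sequences are hypothetical, and a raw difference count ignores complications such as unequal rates of change.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


# Hypothetical sequences for the same gene region, invented for illustration
sequences = {
    "species_A": "ATGGCCATTGTAATGGGCCGC",
    "species_B": "ATGGCCATTGTAATGGGCCGT",
    "species_C": "ATGGCTATCGTAATGGGACGT",
}

# Difference count for every pair of species
pairwise = {
    (x, y): count_differences(sequences[x], sequences[y])
    for x, y in combinations(sequences, 2)
}

for pair, diffs in pairwise.items():
    print(pair, diffs)

# The pair with the fewest differences is inferred to have diverged most recently
closest = min(pairwise, key=pairwise.get)
print("Fewest differences:", closest)
```

In this toy data set, species_A and species_B differ at one site, while each differs from species_C at three or four sites, which is the pattern expected if A and B share the more recent common ancestor.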
Protein evidence: comparing amino acid sequences
Proteins reflect the information encoded in DNA but add functional context:
Many proteins are conserved because changes can disrupt function; strong conservation across very different organisms supports deep common ancestry.
When proteins differ, the pattern of amino acid substitutions can still match expectations from descent with modification.
Important considerations:
Because multiple codons can code for the same amino acid, DNA sequences can differ while the protein remains unchanged (silent/synonymous substitutions). This is why DNA can provide finer resolution in some comparisons.
Conversely, protein comparisons can be especially informative when DNA differences are numerous but protein function remains comparable, revealing which parts of the molecule are under stronger constraint.
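The effect of synonymous substitutions can be shown by translating two DNA sequences and comparing them codon by codon. The sketch below builds the standard genetic code table and uses hypothetical sequences invented for illustration; the helper names `translate` and `classify_codon_changes` are assumptions, and counting each differing codon once is a simplification of how substitutions are normally tallied.

```python
from itertools import product

# Standard genetic code, codons ordered by base T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons(dna):
    """Split a coding sequence into consecutive three-base codons."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding DNA sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in codons(dna.upper()))


def classify_codon_changes(dna_a, dna_b):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    synonymous = nonsynonymous = 0
    for ca, cb in zip(codons(dna_a.upper()), codons(dna_b.upper())):
        if ca == cb:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            synonymous += 1  # DNA differs, amino acid unchanged
        else:
            nonsynonymous += 1  # DNA differs and amino acid changes
    return synonymous, nonsynonymous


# Hypothetical coding sequences, invented for illustration
gene_1 = "ATGGCCATTGTAATGGGCCGC"
gene_2 = "ATGGCTATCGTAATGGGACGT"  # four third-position changes
gene_3 = "ATGGCCATTGAAATGGGCCGC"  # one second-position change

print(translate(gene_1), translate(gene_2), translate(gene_3))
print(classify_codon_changes(gene_1, gene_2))  # (4, 0)
print(classify_codon_changes(gene_1, gene_3))  # (0, 1)
```

Here gene_1 and gene_2 differ at four DNA positions yet encode the same protein, so only the DNA comparison separates them, while the single change in gene_3 alters an amino acid and is visible at both levels.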
What patterns in sequences count as “evidence”?
This cladogram-style figure shows how a smaller clade (Amniota) nests within a larger clade (Vertebrata), illustrating hierarchical grouping. The same nested pattern is what molecular sequence data produce when taxa are grouped by shared derived changes inherited from common ancestors.

This phylogenetic tree (built from RNA sequence comparisons) illustrates nested branching relationships among major groups of life. Branch points represent inferred common ancestors, so taxa that share a more recent node are expected to have fewer sequence differences than taxa whose lineages diverged deeper in the tree.

Molecular and genetic evidence supports evolution and shared ancestry when it shows consistent, non-random structure, such as:
Nested similarity: species sort into groups within groups, and members of the same inner group share more sequence similarity with each other than with more distant relatives.
Conserved genes and proteins across life: fundamental processes (e.g., ATP synthesis, DNA replication enzymes) rely on homologous molecules found in many taxa, indicating descent from ancient ancestors.
Correspondence across independent genes: when multiple unrelated genes show similar relationship patterns among the same organisms, the best explanation is shared evolutionary history rather than coincidence.
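The last pattern in this list, agreement across independent genes, can be checked in a simple way by asking whether different genes point to the same closest pair of species. The sketch below is an illustration under strong assumptions: the gene names, species labels, and sequences are hypothetical, the sequences are treated as already aligned, and a single closest pair stands in for a full tree comparison.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def closest_pair(alignment):
    """Pair of species with the smallest p-distance for one gene."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: p_distance(alignment[pair[0]], alignment[pair[1]]),
    )


# Two hypothetical, unrelated genes sampled from the same three species
gene_x = {"sp1": "ACGTACGTAC", "sp2": "ACGTACGTAT", "sp3": "ACTTACGGAT"}
gene_y = {"sp1": "TTGACCGATA", "sp2": "TTGACCGTTA", "sp3": "TAGACGGTTC"}

pairs = [closest_pair(gene) for gene in (gene_x, gene_y)]
concordant = len(set(pairs)) == 1

print(pairs)
print("Genes agree on the closest pair:", concordant)
```

Both toy genes group sp1 with sp2, which is the kind of agreement that shared evolutionary history predicts and that coincidence would rarely produce across many genes.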
Strengths and limits of using molecular comparisons
Strengths:
Provides quantitative measures of similarity (percent identity, number of substitutions)
Uses a universal framework (DNA → RNA → protein) applicable across organisms
Can detect relatedness even when morphology is misleading (e.g., convergent traits)
Limits that require careful interpretation:
Different genes evolve at different rates; choosing an appropriate molecule matters.
Strong purifying (negative) selection removes most changes, so some sequences remain similar even over long timescales.
Sequence comparisons require correct alignment and identification of truly comparable genes (homologs) to avoid false conclusions.
FAQ
Mitochondrial DNA is present in many copies per cell, so it can be easier to recover from small or degraded samples.
It is also typically inherited from one parent (usually the mother in animals), which can simplify tracing sequence lineages in some studies.
Orthologs diverge after speciation; paralogs arise by gene duplication within a lineage.
Comparing paralogs instead of orthologs can make two species seem more distantly related (or oddly related) because the genes are not equivalent evolutionary counterparts.
Alignment decides which nucleotides or amino acids are treated as “corresponding positions.”
Different alignment settings can change estimated similarity, especially in regions with insertions/deletions or repeated motifs.
The genetic code is redundant: multiple codons can specify the same amino acid.
This allows DNA changes to accumulate without altering the protein sequence, particularly at third codon positions.
If the wrong sequences are compared (non-homologous genes), similarity values are not meaningful.
Also, strong selective constraints can keep sequences similar, while rapidly evolving regions can saturate with changes, obscuring deeper relationships.
Practice Questions
Explain how comparing DNA nucleotide sequences can provide evidence for shared ancestry between two species. (2 marks)
States that species with a more recent common ancestor have more similar DNA sequences / fewer nucleotide differences (1)
Links similarity to inheritance of DNA with mutations accumulating over time (1)
A gene is sequenced from four species. Species A and B have 98% nucleotide identity in this gene, while A and C have 90% and A and D have 89%. Describe how both DNA and protein (amino acid) sequence comparisons from this gene can be used to support evolution and shared ancestry, and give two reasons interpretations may differ between DNA and protein comparisons. (5 marks)
Uses DNA identity to infer A and B are more closely related / share a more recent common ancestor than A with C or D (1)
States that sequence differences arise from mutations inherited through lineages (1)
Explains that translating to proteins allows amino acid sequence comparison as additional molecular evidence (1)
Reason 1 for differing interpretations: synonymous (silent) substitutions change DNA but not amino acids (1)
Reason 2 for differing interpretations: functional constraint/selection conserves amino acids so proteins may change more slowly than DNA (1)
