AP Syllabus focus:
‘Molecular data usually provide more accurate, reliable evidence than morphology for building phylogenetic trees or cladograms.’
Molecular comparisons let biologists infer evolutionary relationships using heritable sequences rather than potentially misleading visible traits. By quantifying similarities and differences in DNA, RNA, and proteins, phylogenies can be built and evaluated with greater precision.
Why molecular data are often more reliable than morphology
More characters, less ambiguity
Molecular sequences provide thousands to millions of comparable positions (nucleotides or amino acids), creating a large evidence base.
Many morphological traits are complex and influenced by environment, development, and measurement subjectivity, yielding fewer clear, independent characters.
Convergent morphology is common
Different lineages can independently evolve similar structures when exposed to similar environments, obscuring true relatedness. Molecular data can reduce this problem because:
Some molecular changes are selectively neutral (especially synonymous DNA changes), accumulating in ways less tied to similar lifestyles.
Similar-looking organisms may still show substantial sequence divergence that reveals they are not close relatives.
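The idea of a synonymous change can be illustrated with a small sketch. It assumes only a fragment of the standard genetic code (four codon families in which the third base does not affect the amino acid); the names `FAMILIES`, `CODON_TO_AA`, and `is_synonymous` are hypothetical and chosen for illustration.

```python
# Partial standard genetic code: in these four families, any third base
# gives the same amino acid (an illustrative subset, not the full table).
FAMILIES = {"CT": "Leu", "CC": "Pro", "GG": "Gly", "GC": "Ala"}
CODON_TO_AA = {
    prefix + base: amino_acid
    for prefix, amino_acid in FAMILIES.items()
    for base in "ACGT"
}


def is_synonymous(codon_before: str, codon_after: str) -> bool:
    """Return True if both codons encode the same amino acid."""
    return CODON_TO_AA[codon_before] == CODON_TO_AA[codon_after]


third_base_change = is_synonymous("CTT", "CTC")   # Leu -> Leu
second_base_change = is_synonymous("CTT", "CCT")  # Leu -> Pro
```

Here the third-base change leaves the protein unchanged, which is why such differences can accumulate with little connection to an organism's lifestyle.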
Homology testing is clearer in sequences
To build a tree, comparisons must involve homologous features (inherited from a common ancestor). In molecular work:
Homology is evaluated through sequence alignment, where bases/amino acids are lined up so that each column contains positions inferred to descend from the same ancestral position.

Schematic legend illustrating how a multiple sequence alignment represents similarity and difference: matches, mismatches, insertions, gaps, and unaligned regions. This helps connect the concept of “homologous positions” to the practical mechanics of lining up residues so that comparable sites are analyzed in phylogenetic inference.
Many genes are conserved enough across taxa to align reliably, supporting comparisons across broad evolutionary distances.
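As a rough illustration of how aligned positions are compared, the sketch below tallies matches, mismatches, and gap columns in a pairwise alignment. The two sequences are made up for the example, and `classify_columns` is a hypothetical helper; real analyses use dedicated alignment software to produce the alignment first.

```python
def classify_columns(seq_a: str, seq_b: str) -> dict:
    """Count matches, mismatches, and gap columns in a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            counts["gap"] += 1       # an indel in one lineage
        elif a == b:
            counts["match"] += 1     # same base at a homologous position
        else:
            counts["mismatch"] += 1  # a substitution
    return counts


# Invented sequences, already aligned ("-" marks a gap).
aligned_x = "ATGC-TACGA"
aligned_y = "ATGCGTTCGA"
summary = classify_columns(aligned_x, aligned_y)
print(summary)
```

Each column is treated as one homologous position, so the counts only mean something if the alignment itself is sound.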
What “molecular data” includes in phylogenetics
Common data sources
Nuclear DNA: large genome; informative across many timescales.
Mitochondrial DNA (mtDNA, widely used for animals) and chloroplast DNA (cpDNA, widely used for plants): often present in many copies per cell, which can aid detection and sequencing.
Protein (amino acid) sequences: useful when DNA sequences are too divergent to compare reliably; amino acid sequences are often conserved over deeper timescales.
Types of molecular variation used
Single-nucleotide substitutions (base changes)
Insertions/deletions (indels) that change sequence length
Gene presence/absence or larger genomic rearrangements (when available)
Choosing genes for the question
Different molecules evolve at different rates, so gene choice affects resolution:
Slow-evolving genes help resolve deep splits (distant common ancestry).
Fast-evolving regions can distinguish closely related species or populations.
How molecular comparisons inform tree building
Sequence similarity and relatedness
In general, the more recently two lineages shared a common ancestor, the more similar their sequences tend to be (because fewer changes have accumulated). Molecular phylogenetics leverages this by:
Comparing aligned sequences across taxa
Estimating which branching pattern best explains the observed differences
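A minimal sketch of this comparison, assuming an invented four-species alignment: it computes the proportion of differing sites for every pair of taxa and reports the most similar pair. The function `p_distance` and the species labels are hypothetical, and raw proportions are only a starting point for tree building, not a full method.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns that contain a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    diffs = sum(1 for x, y in pairs if x != y)
    return diffs / len(pairs)


# Invented aligned sequences for illustration only.
alignment = {
    "species_A": "ATGCCGTAAC",
    "species_B": "ATGCCGTAAT",
    "species_C": "ATGTCGTTAT",
    "species_D": "TTGTCATTGT",
}

distances = {
    (s, t): p_distance(alignment[s], alignment[t])
    for s, t in combinations(alignment, 2)
}
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```

In this toy data set, the pair with the fewest differences would be the first candidate for sharing the most recent common ancestor.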
Orthologs vs paralogs (getting the right gene history)
Gene families can complicate inference if the wrong sequences are compared.

Diagram showing how a gene duplication event creates paralogs within a lineage, and how a later speciation event creates orthologs between species. It clarifies why phylogenetic analyses typically prefer orthologs for reconstructing species relationships, while paralogs can reflect gene-family history rather than organismal history.
Orthologs: Homologous genes in different species that diverged because of speciation (often best for reconstructing species relationships).
Paralogs: Homologous genes related by duplication within a lineage. Using them can mislead because the gene tree may not match the species tree.
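One common heuristic for picking likely orthologs is the reciprocal best match: two genes are paired only if each is the other's top-scoring match. The sketch below assumes invented similarity scores and hypothetical gene names; it identifies candidates rather than proving orthology.

```python
def best_hit(scores: dict, query: str) -> str:
    """Return the target gene with the highest similarity score for a query."""
    return max(scores[query], key=scores[query].get)


def reciprocal_best_hits(forward: dict, reverse: dict) -> list:
    """Pair genes that are each other's best match in both directions."""
    pairs = []
    for query in forward:
        hit = best_hit(forward, query)
        if best_hit(reverse, hit) == query:
            pairs.append((query, hit))
    return pairs


# Hypothetical similarity scores between two gene copies in species A and B.
a_vs_b = {
    "A_gene1": {"B_gene1": 95, "B_gene2": 70},
    "A_gene2": {"B_gene1": 68, "B_gene2": 91},
}
b_vs_a = {
    "B_gene1": {"A_gene1": 94, "A_gene2": 69},
    "B_gene2": {"A_gene1": 71, "A_gene2": 90},
}

candidate_orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(candidate_orthologs)
```

Comparing A_gene1 with B_gene2 here would mix paralogous copies, which is the kind of mismatch that makes a gene tree diverge from the species tree.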
Combining multiple molecular markers
Because any single gene may reflect an unusual history, stronger inferences often come from:
Comparing multiple genes or genomic regions
Checking whether different datasets support similar branching patterns
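The check for agreement across datasets can be sketched by asking, gene by gene, which two taxa are most similar and then tallying the answers. The three short alignments are invented, and using the closest pair as a stand-in for a full tree is a deliberate simplification.

```python
from collections import Counter
from itertools import combinations


def differences(a: str, b: str) -> int:
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def closest_pair(alignment: dict) -> tuple:
    """Return the pair of taxa with the fewest differences for one gene."""
    return min(
        combinations(sorted(alignment), 2),
        key=lambda pair: differences(alignment[pair[0]], alignment[pair[1]]),
    )


# Invented alignments of three genes for taxa A, B, and C.
genes = {
    "gene1": {"A": "ATGCCG", "B": "ATGCCA", "C": "TTGTCA"},
    "gene2": {"A": "GGCATA", "B": "GGCATA", "C": "GACTTA"},
    "gene3": {"A": "CCTGAA", "B": "CTTGGA", "C": "CTTGGA"},
}

support = Counter(closest_pair(aln) for aln in genes.values())
print(support)
```

Two of the three genes group A with B while one groups B with C, showing how a single marker can tell a different story from the majority.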
Key limitations and how they can affect reliability
Molecular data are powerful, but not automatically “correct.” Common issues include:
Different evolutionary rates among lineages or genes, which can distort perceived relatedness if not accounted for.
Multiple substitutions at the same site over long timescales, which can hide earlier changes and reduce signal.
Gene duplication and loss, which increases the risk of comparing non-orthologous sequences.
Horizontal gene transfer (especially in prokaryotes), where genes move between lineages, causing gene histories to conflict with organismal histories.
Alignment errors and missing data, which can introduce false similarities or differences.
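The effect of multiple substitutions at the same site can be shown with the Jukes-Cantor correction, one simple and standard substitution model (it is not named in these notes and is used here only as an example). It assumes all base changes are equally likely; `jukes_cantor` is a hypothetical function name.

```python
import math


def jukes_cantor(p: float) -> float:
    """Estimate substitutions per site from the observed proportion p of
    differing sites, using the Jukes-Cantor formula d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1 - (4 * p / 3))


observed = [0.05, 0.30, 0.60]
corrected = [jukes_cantor(p) for p in observed]
for p, d in zip(observed, corrected):
    print(f"observed {p:.2f} -> estimated {d:.3f} substitutions per site")
```

The gap between observed and estimated change widens as sequences diverge, which is why uncorrected differences understate the true amount of evolution over long timescales.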
Interpreting molecular-based phylogenies
When evaluating a molecular phylogeny, focus on:
Whether the compared sequences are likely homologous and appropriately chosen for the timescale
Whether multiple independent molecular regions support the same major groupings
Whether the inferred relationships make biological sense without relying on superficial similarity alone
FAQ
Different substitution models weight changes (e.g. transitions vs transversions) differently, which affects inferred distances and branching patterns. A poorly chosen model can bias the inferred topology.
Indels are represented as gaps during alignment. Treatment varies: gaps may be ignored, coded as characters, or modelled explicitly, and each choice can change support for particular branches.
A molecular clock estimates divergence time from the rate at which mutations accumulate. It can fail if rates vary across lineages, if selection differs among genes, or if calibration points are inaccurate.
mtDNA represents a single, typically maternally inherited history and can be skewed by introgression or incomplete lineage sorting, whereas nuclear DNA provides many independently inherited loci.
Researchers often use resampling or likelihood-based support metrics to estimate how consistently the data favour particular branches under the chosen model and alignment.
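The transition vs transversion distinction mentioned in the first answer above can be made concrete with a short sketch. Transitions swap a purine for a purine (A, G) or a pyrimidine for a pyrimidine (C, T); transversions swap between the two classes. The sequences are invented and `count_ts_tv` is a hypothetical helper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(a: str, b: str) -> tuple:
    """Count transitions and transversions between two aligned DNA sequences."""
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y or "-" in (x, y):
            continue  # identical sites and gap columns are not substitutions
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # change within the same chemical class
        else:
            transversions += 1  # change between purine and pyrimidine
    return transitions, transversions


ts, tv = count_ts_tv("ATGCCGTA", "GTGTCATC")
print(ts, tv)
```

A model that weights these two classes differently would assign this pair a different distance than one that treats every mismatch alike.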
Practice Questions
State two reasons why molecular data can provide more reliable evidence than morphology when constructing a phylogenetic tree. (2 marks)
Any two of: many more comparable characters; less affected by convergent evolution; more objective/quantifiable; can compare conserved genes across taxa. (1 mark each)
A student uses sequences from one gene to build a phylogeny for five species. Explain how comparing the wrong type of homologous gene could mislead the phylogeny, and describe two additional molecular approaches that could reduce this risk. (5 marks)
Correctly identifies that comparing paralogs (from gene duplication) instead of orthologs can make the gene tree differ from the species tree. (2)
Describes using multiple genes/independent loci to check for consistent relationships. (1)
Describes selecting confirmed orthologous sequences (e.g. single-copy genes or reciprocal best matches) before analysis. (1)
Describes using amino acid vs DNA, or combining nuclear with organelle markers, to test robustness across datasets. (1)
