AP Syllabus focus:
‘DNA sequencing determines nucleotide order in DNA molecules; resulting sequence data can generate DNA fingerprints used to compare different samples.’
DNA sequencing and DNA fingerprinting are core biotechnology tools for reading DNA and distinguishing samples. They rely on predictable base pairing and natural DNA variation to produce comparable, interpretable genetic information.
DNA sequencing: determining nucleotide order
DNA sequencing identifies the exact order of nucleotides (A, T, C, G) in a DNA molecule, producing a DNA sequence that can be stored, searched, and compared across samples.
DNA sequencing: A method used to determine the precise order of nucleotides in a DNA fragment.
What sequencing output looks like

This diagram summarizes Sanger (dideoxy) sequencing, showing how chain-terminating ddNTPs generate a set of DNA fragments that differ by one nucleotide. The fragments are separated by electrophoresis and detected as a color-coded chromatogram, which is then translated into a base-by-base sequence (A, T, C, G).
Sequencing generates a digital representation of bases (a string of letters). Depending on the method, you may also see:
Read data: many short sequences (“reads”) from the same sample
Consensus sequence: the most supported base at each position after combining reads
Variant calls: positions where a sample differs from a reference or another sample
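The three outputs above can be illustrated with a minimal Python sketch. It assumes the reads are already aligned and of equal length, which real reads are not; the function names `consensus` and `variant_calls`, the reads, and the reference are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def consensus(reads):
    """Return the most common base at each position of pre-aligned, equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))

def variant_calls(sample, reference):
    """List (position, reference_base, sample_base) wherever the two sequences differ.

    Positions are 0-based.
    """
    return [(i, r, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]

# Hypothetical read data: the third read carries one error at position 2.
reads = ["ACGTTA", "ACGTTA", "ACCTTA", "ACGTTA"]
reference = "ACGATA"

cons = consensus(reads)                 # the single-read error is outvoted
calls = variant_calls(cons, reference)  # one difference from the reference
print(cons, calls)
```

Here the consensus step removes an error seen in only one read, while the variant call reports the position where the whole sample differs from the reference.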
Why sequencing is useful in comparisons
Because sequence data are explicit, comparisons can be direct:
Match: the same bases at the same positions (within expected error)
Difference: base substitutions, or length differences caused by insertions or deletions (length differences are common in repetitive regions)
Relatedness inference: more shared variants usually indicate a closer relationship (interpretation depends on context and sampling)
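As a rough illustration of the relatedness idea, the sketch below scores how many variants two samples share. The score used here is the Jaccard index (shared variants divided by all variants seen in either sample); this is an assumption made for the example, not a formal kinship statistic, and the variant lists are hypothetical.

```python
def shared_fraction(variants_a, variants_b):
    """Fraction of variants shared by two samples (Jaccard index).

    Each variant is a (position, base) pair relative to the same reference.
    """
    a, b = set(variants_a), set(variants_b)
    union = a | b
    # Two samples with no variants at all are both identical to the reference.
    return len(a & b) / len(union) if union else 1.0

# Hypothetical variant calls for three samples.
sample_1 = [(3, "T"), (10, "G"), (25, "A")]
sample_2 = [(3, "T"), (10, "G"), (40, "C")]
sample_3 = [(7, "A")]

close_pair = shared_fraction(sample_1, sample_2)
distant_pair = shared_fraction(sample_1, sample_3)
print(close_pair, distant_pair)
```

A higher score suggests a closer relationship only if the samples were sequenced and analysed in the same way, which is the caveat noted in the list above.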
DNA fingerprinting: identifying and comparing samples
DNA fingerprinting uses variable regions of DNA to generate a pattern or profile that can be compared between samples (for identity testing or relatedness assessment).

This STR electropherogram shows how DNA fingerprinting produces a peak pattern across multiple loci, where each peak corresponds to an allele (a fragment size/repeat count) at a specific STR marker. In mixture samples, additional peak sets can appear, making interpretation depend on allele compatibility and relative peak heights. The resulting multi-locus peak pattern is the “profile” used to compare biological samples.
DNA fingerprinting: The creation of a DNA-based profile from variable genomic regions to compare biological samples.
A key idea is that many genomic regions are shared by all humans, but some regions vary enough to distinguish individuals.
Sources of variation used for fingerprints

This diagram illustrates gel electrophoresis, the classic method for separating DNA fragments by size to create a banding pattern. Smaller fragments migrate farther through the gel matrix than larger fragments, producing distinct bands that can be compared between samples. This size-based separation underlies many fingerprint-style profiles, including those based on variable repeat regions.
High-utility fingerprint regions typically include:
Short tandem repeats (STRs): short motifs (e.g., 2–6 bases) repeated a variable number of times
Variable number tandem repeats (VNTRs): longer repeat units with variable repeat counts
Single-nucleotide polymorphisms (SNPs): single-base differences (often used in large panels)
STR: A short DNA sequence motif repeated in tandem, with repeat number varying among individuals.
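To make the idea of a repeat count concrete, the following sketch counts the longest uninterrupted run of a motif in a sequence. The function name `longest_repeat_run`, the motif, and the sequences are hypothetical; real STR callers also have to handle flanking regions, sequencing errors, and interrupted repeats, which this example ignores.

```python
import re

def longest_repeat_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif found in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical alleles: the same locus with 7 repeats and with 9 repeats.
allele_a = "TTAGC" + "GATA" * 7 + "CCGT"
allele_b = "TTAGC" + "GATA" * 9 + "CCGT"

count_a = longest_repeat_run(allele_a, "GATA")
count_b = longest_repeat_run(allele_b, "GATA")
print(count_a, count_b)
```

The two alleles have identical flanking sequence and differ only in repeat number, which is the kind of variation an STR profile records.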
How sequencing data can generate DNA fingerprints
Sequence information can be converted into a fingerprint by focusing analysis on selected variable loci and then comparing the results between samples.
Conceptual workflow (sequencing-based fingerprinting)
Choose a set of highly variable loci (commonly STRs or SNP panels)
Obtain sequence reads covering those loci
Determine the alleles present at each locus (e.g., repeat counts for STRs, specific bases for SNPs)
Compile alleles into a profile (the “fingerprint”)
Compare profiles between samples to assess:
Consistency (same alleles across loci)
Exclusion (mismatched alleles at one or more loci)
Strength of match (more matching loci generally increases confidence)
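The workflow above can be sketched in a few lines of Python. This is an illustrative outline under simplifying assumptions: allele calls are supplied directly as (locus, allele) pairs, the locus names are hypothetical placeholders, and each sample is assumed to come from a single individual.

```python
def build_profile(allele_calls):
    """Compile (locus, allele) pairs into a profile: locus -> sorted tuple of alleles."""
    profile = {}
    for locus, allele in allele_calls:
        profile.setdefault(locus, set()).add(allele)
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}

def compare_profiles(profile_a, profile_b):
    """Compare the loci present in both profiles and report which ones disagree."""
    shared = sorted(set(profile_a) & set(profile_b))
    mismatched = [locus for locus in shared if profile_a[locus] != profile_b[locus]]
    return {
        "loci_compared": len(shared),
        "matching": len(shared) - len(mismatched),
        "mismatched": mismatched,
    }

# Hypothetical STR repeat counts for two samples.
evidence = build_profile([
    ("locus_1", 12), ("locus_1", 14),
    ("locus_2", 9), ("locus_2", 9),    # homozygous: one distinct allele
    ("locus_3", 17), ("locus_3", 20),
])
suspect = build_profile([
    ("locus_1", 14), ("locus_1", 12),
    ("locus_2", 9),
    ("locus_3", 17), ("locus_3", 21),
])

result = compare_profiles(evidence, suspect)
print(result)
```

Storing alleles as a sorted tuple means the order in which reads report the alleles does not affect the comparison.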
What “comparing samples” means in practice
Comparisons are most informative when:
Multiple independent loci are analysed (reduces chance of coincidental matches)
The same loci and analysis rules are used across all samples
Data quality is sufficient to avoid false differences from sequencing error
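The benefit of using multiple independent loci can be shown with simple arithmetic. The sketch below multiplies per-locus genotype frequencies together, an approach often called the product rule; it assumes the loci are inherited independently, and the frequencies used are hypothetical round numbers rather than real population data.

```python
from math import prod

def random_match_probability(genotype_frequencies):
    """Chance that an unrelated person matches at every locus, assuming independent loci."""
    return prod(genotype_frequencies)

# Hypothetical frequencies of the observed genotype at each locus.
three_loci = random_match_probability([0.1, 0.05, 0.2])
six_loci = random_match_probability([0.1, 0.05, 0.2, 0.1, 0.05, 0.2])

print(f"3 loci: about 1 in {1 / three_loci:,.0f}")  # roughly 1 in 1,000
print(f"6 loci: about 1 in {1 / six_loci:,.0f}")    # roughly 1 in 1,000,000
```

Doubling the number of loci does not halve the chance of a coincidental match; under these assumptions it multiplies the probabilities, so the chance falls much faster.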
Interpreting fingerprints: what can and cannot be concluded
Fingerprint interpretation depends on whether profiles are:
Complete (all loci successfully typed)
Partial (missing loci due to limited or degraded DNA)
Mixed (DNA from more than one individual)
Key interpretive outcomes include:
Match supports same source: profiles are consistent across tested loci
Exclusion supports different sources: at least one locus is clearly incompatible
Inconclusive: insufficient loci/quality to decide confidently
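The three outcomes can be expressed as a small decision rule. The sketch below is an assumption-laden simplification: untyped loci are recorded as `None`, the threshold `min_loci` is a hypothetical cut-off, and mixed profiles are not handled at all.

```python
def interpret(profile_a, profile_b, min_loci=3):
    """Classify a comparison as 'match', 'exclusion', or 'inconclusive'.

    Profiles map locus -> tuple of alleles, or None if the locus was not typed.
    """
    typed = [
        locus for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    if any(profile_a[locus] != profile_b[locus] for locus in typed):
        return "exclusion"      # at least one typed locus is incompatible
    if len(typed) < min_loci:
        return "inconclusive"   # too few loci to decide confidently
    return "match"              # consistent across all tested loci

# Hypothetical profiles: complete, partial, and clearly different.
complete = {"locus_1": (12, 14), "locus_2": (9,), "locus_3": (17, 20)}
partial = {"locus_1": (12, 14), "locus_2": None, "locus_3": None}
different = {"locus_1": (12, 14), "locus_2": (9,), "locus_3": (17, 21)}

outcomes = [
    interpret(complete, dict(complete)),
    interpret(complete, partial),
    interpret(complete, different),
]
print(outcomes)
```

Checking for exclusion first reflects the point that a single clear mismatch is informative even when few loci are available, whereas a match needs enough loci to be meaningful.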
Data reliability and common pitfalls
Even with strong methods, results can be affected by:
Contamination: foreign DNA introduced during collection or processing
Degradation: fragmented DNA reduces recoverable loci and read quality
Sequencing error: incorrect base calls can create apparent differences
Repeat-region challenges: repeats (STRs/VNTRs) can be harder to read accurately than non-repetitive DNA
Good practice typically includes:
Negative controls (to detect contamination)
Replicates (to confirm uncertain calls)
Clear rules for calling alleles and handling low-quality data
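One way to write down a clear calling rule is shown below. The thresholds `min_depth` and `min_fraction` are hypothetical values picked for the example, not recommended settings, and the read counts are invented.

```python
def call_alleles(read_counts, min_depth=20, min_fraction=0.2):
    """Call alleles at one locus from read counts per candidate allele.

    Returns None when total depth is too low to make a call; otherwise returns
    the alleles supported by at least min_fraction of the reads.
    """
    total = sum(read_counts.values())
    if total < min_depth:
        return None
    return tuple(sorted(
        allele for allele, count in read_counts.items()
        if count / total >= min_fraction
    ))

# Hypothetical counts: two well-supported alleles plus a few reads of a likely artifact.
well_covered = call_alleles({12: 48, 14: 45, 13: 7})
# Hypothetical low-depth locus: too few reads to trust any call.
low_depth = call_alleles({9: 5, 10: 3})

print(well_covered, low_depth)
```

Returning `None` rather than a guess keeps low-quality loci out of the profile, so they lead to a partial profile instead of a false difference.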
FAQ
Higher depth means the same locus is read many times, so true alleles are supported repeatedly.
Low depth increases the chance that random errors or missing reads cause incorrect or incomplete profiles.
Repeat tracts can cause read slippage and ambiguous alignment, making repeat-number calling difficult.
Short reads may not fully span longer repeats, reducing certainty.
Yes, mitochondrial DNA can help with highly degraded samples because it is abundant per cell.
However, it is shared along maternal lines, so it is less individually specific than nuclear STR/SNP profiles.
In some multiplexed sequencing runs, reads may be assigned to the wrong sample barcode.
This can introduce low-level foreign DNA into a profile unless stringent filtering and controls are used.
Some alleles are common in populations, so limited loci can coincidentally match.
Using more independent loci and well-chosen markers reduces the probability of an accidental match.
Practice Questions
State what DNA sequencing determines and explain how sequence data can be used to compare two DNA samples. (3 marks)
States sequencing determines the order of nucleotides/bases in DNA (1)
Explains sequences can be aligned/compared base-by-base to identify matches/differences (1)
Links differences to genetic variation/identity comparison (1)
Describe how a DNA fingerprint can be generated from sequencing data and used to assess whether two samples could come from the same individual. (6 marks)
Identifies variable loci used for fingerprinting (e.g., STRs/SNPs) (1)
Describes obtaining sequence reads covering the chosen loci (1)
Explains calling alleles at each locus (e.g., repeat number or specific base) (1)
Describes compiling allele calls into a profile/fingerprint across multiple loci (1)
Explains comparing profiles: match across loci supports same source; mismatch can exclude (1)
Mentions need for multiple loci and/or quality controls to reduce error (1)
