doj2016: erlich presentation
TRANSCRIPT
Advanced Strategies for DNA Identification
Yaniv Erlich (@erlichya), 2/23/16
Supported by NIJ 2014-DN-BX-K089
Outline
1. Surname inference from DNA
2. The power of genome-wide STR profiles
3. Rapid identification with handheld device
Surname inference HipSTR Y-STRs MinION Summary
Correlation between Y-chr and surnames
[Figure: a Y-chromosome sequence (ACGCACGC…) queried against www.ysearch.org; pedigree labels: Y, Smith, Erlich]
Databases of interest
www.smgf.org, www.ysearch.org
140,000 publicly accessible surname-Ychr records
How to find surnames?
Estimating the time to most recent common ancestor
[Figure labels: Target; i-th record in db; surname]
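One simple way to put a number on the time to the most recent common ancestor (TMRCA) between the target and the i-th record is sketched below. This is not necessarily the method used in the talk. It assumes an infinite-alleles approximation in which a marker still matches with probability exp(-2 * mu * t) after t generations on both lineages, a single hypothetical per-marker mutation rate (`mutation_rate`), and made-up marker values.

```python
import math

def estimate_tmrca(target, record, mutation_rate=0.002):
    """Maximum-likelihood TMRCA, in generations, between two Y-STR profiles.

    Assumption (not from the talk): each marker matches with probability
    exp(-2 * mutation_rate * t), so the MLE is -ln(match fraction) / (2 * mu).
    Profiles are dicts mapping marker name -> repeat count.
    """
    shared = [marker for marker in target if marker in record]
    if not shared:
        raise ValueError("profiles share no markers")
    mismatches = sum(1 for marker in shared if target[marker] != record[marker])
    if mismatches == 0:
        return 0.0
    if mismatches == len(shared):
        return math.inf  # no matching marker: the estimate is unbounded
    match_fraction = 1 - mismatches / len(shared)
    return -math.log(match_fraction) / (2 * mutation_rate)

# Hypothetical profiles, for illustration only
target = {"DYS458": 17, "DYS19": 14, "DYS390": 24, "DYS391": 10}
record = {"DYS458": 17, "DYS19": 14, "DYS390": 23, "DYS391": 10}
tmrca = estimate_tmrca(target, record)
```

Records with a short estimated TMRCA are the ones whose surname is most likely to be shared with the target.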
Empirical test to determine the probability of recovering a US surname
Surname inference algorithm (x900):
1. Y-chr of a real person
2. Querying Ysearch and SMGF
3. Inferring surname
4. Comparing the predicted surname to the true one
For US Caucasian males: 12% successful recoveries, 5% wrong recoveries, 83% unknown
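The three outcome rates of such a test can be tallied with a few lines of code. The sketch below assumes a hypothetical list of (predicted, true) surname pairs, where `None` means the algorithm declined to return a surname; the names are illustrative and not data from the study.

```python
from collections import Counter

def tally_outcomes(pairs):
    """Return the fraction of successful, wrong, and unknown recoveries."""
    if not pairs:
        raise ValueError("no test cases supplied")
    counts = Counter()
    for predicted, true in pairs:
        if predicted is None:
            counts["unknown"] += 1      # algorithm returned no surname
        elif predicted.lower() == true.lower():
            counts["successful"] += 1   # predicted surname matches the true one
        else:
            counts["wrong"] += 1        # a surname was returned but it is wrong
    total = len(pairs)
    return {key: counts[key] / total for key in ("successful", "wrong", "unknown")}

# Hypothetical test cases, for illustration only
pairs = [("Smith", "Smith"), (None, "Jones"), ("Lee", "Li"), (None, "Brown")]
rates = tally_outcomes(pairs)
```

With the rates reported on the slide, a surname is returned in 12% + 5% = 17% of cases, and 12/17 (about 71%) of the returned surnames are correct.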
Distribution of inferred surnames
Most of the inferred surnames are relatively rare
Inferred surname =~ zipcode
Putting it all together: the Venter case
www.ysearch.org:
lobSTR
DYS458: 17 repeats
Try it yourself: bit.ly/find_craig
We got a surname from whole genome sequencing data
Aftermath
[Figure label: Our study]
2. The power of genome-wide STR profiles
HipSTR: a new STR calling algorithm
• Haplotype-based imputation, phasing, and genotyping of STRs
• Haplotype: robustness to stutter noise
• Imputation: recover STR dropouts from nearby SNPs
• Phasing: resolve homoplasy between alleles
STR benchmarking results
HipSTR is the most accurate STR caller AND does this ~10x faster than the next best method
[Figure: bar chart of accuracy (axis from 50 to 100) for GATK, Platypus, Freebayes, lobSTR, RepeatSeq, and HipSTR]
HipSTR: solving homoplasy
• Can now correctly detect STRs with identical lengths but different sequences (homoplasy)
• Real example:
  – Length based genotype: -4/-4
  – HipSTR genotype: (AGAT)8(ACAT)9 / (AGAT)10(ACAT)7
• HipSTR available at https://github.com/tfwillems/HipSTR
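The homoplasy example can be checked by expanding the two alleles into their full sequences. The sketch below is only an illustration and does not use HipSTR itself; `expand_repeats` is a hypothetical helper that parses the bracketed repeat notation from the slide.

```python
import re

def expand_repeats(notation):
    """Expand bracketed repeat notation, e.g. "(AGAT)2(ACAT)1" -> "AGATAGATACAT"."""
    blocks = re.findall(r"\(([ACGT]+)\)(\d+)", notation)
    if not blocks:
        raise ValueError(f"no repeat blocks found in {notation!r}")
    return "".join(motif * int(count) for motif, count in blocks)

# The two alleles from the slide
allele_1 = expand_repeats("(AGAT)8(ACAT)9")
allele_2 = expand_repeats("(AGAT)10(ACAT)7")

same_length = len(allele_1) == len(allele_2)   # a length-based caller sees no difference
same_sequence = allele_1 == allele_2           # a sequence-based caller can tell them apart
```

Both alleles contain 17 repeat units (68 bp), which is why a length-based genotype reports them as identical, while their sequences differ.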
Chromosome wide scan of Y-STRs
• Goal: scan every possible STR on the Y chromosome and assess mutation rates
• >200,000 transmissions = high accuracy.
• Leverage 1500 whole genome world-wide samples.
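Why a large number of transmissions gives high accuracy can be illustrated with a simple binomial calculation. The sketch below assumes mutations are counted directly over independent father-to-son transmissions, which is a simplification and not the estimator used in the study; the mutation counts are hypothetical.

```python
import math

def mutation_rate_with_error(mutations, transmissions):
    """Point estimate and binomial standard error of a per-generation mutation rate."""
    if transmissions <= 0:
        raise ValueError("transmissions must be positive")
    rate = mutations / transmissions
    standard_error = math.sqrt(rate * (1 - rate) / transmissions)
    return rate, standard_error

# Hypothetical counts with the same underlying rate (0.002 per generation)
rate_small, se_small = mutation_rate_with_error(4, 2_000)
rate_large, se_large = mutation_rate_with_error(400, 200_000)
```

The standard error shrinks with the square root of the number of transmissions, so going from 2,000 to 200,000 transmissions narrows the uncertainty tenfold at the same rate.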
Benchmarking mutation rates
[Figure label: Our study]
Deliverables
http://bit.ly/DOJERLICH
Scanning 4500 Y-STRs
Estimating mutation rates of 700 polymorphic Y-STRs
Finding additional fast mutating Y-STRs
Imputation of Y-STRs with high accuracy
3. Rapid identification with handheld device
Oxford Nanopore MinION
Features:
• USB stick
• Portable
• Low throughput
• High error rate (10%)
Can we identify samples within minutes?
CODIS STRs are challenging
1. MinION has too many indels
2. Many reads required to correct for error
3. CODIS needs a PCR machine
Proposal: Shotgun sequencing + Bayesian approach
Approach
[Figure: pipeline]
• Real-time sequencing data (reads)
• Alignment to human genome (15% error in base calling)
• Filter SNPs
• Bayesian algorithm based on prior knowledge:
  – Frequency of alleles in population
  – Error rate
• Genome db
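The Bayesian step can be sketched as a running update of the probability that the sample matches one candidate genome. The code below is a minimal illustration under stated assumptions, not the algorithm from the talk: SNPs are treated as biallelic, a sequencing error flips the reference and alternate allele, observations are independent, and the alternative hypothesis is a random person drawn from population allele frequencies. The function names, the prior of 1e-7, and the example observations are hypothetical.

```python
import math

def prob_observe_alt(true_alt_prob, error_rate):
    """Probability that a read shows the alternate allele, allowing for base-call errors."""
    return true_alt_prob * (1 - error_rate) + (1 - true_alt_prob) * error_rate

def posterior_match(prior, observations, error_rate=0.15):
    """Posterior probability that the sample is the candidate.

    observations: iterable of (saw_alt, candidate_alt_dosage, population_alt_freq),
    where candidate_alt_dosage is 0, 1, or 2 copies of the alternate allele.
    Requires 0 < prior < 1 and 0 < error_rate < 1.
    """
    log_odds = math.log(prior) - math.log(1 - prior)
    for saw_alt, candidate_alt_dosage, population_alt_freq in observations:
        p_candidate = prob_observe_alt(candidate_alt_dosage / 2, error_rate)
        p_random = prob_observe_alt(population_alt_freq, error_rate)
        if not saw_alt:
            p_candidate, p_random = 1 - p_candidate, 1 - p_random
        log_odds += math.log(p_candidate) - math.log(p_random)
    # Convert log odds back to a probability without overflowing exp()
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Hypothetical reads: each shows the alternate allele at a SNP where the
# candidate is homozygous alternate and the population frequency is 10%.
matching_reads = [(True, 2, 0.1)] * 30
posterior = posterior_match(1e-7, matching_reads)
```

Each read that agrees with the candidate at a rare allele multiplies the odds of a match, so even with a high error rate the posterior can climb from a very small prior after a modest number of informative reads.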
Real experiment: me versus Venter
[Figure: y-axis: Probability of a match; annotation: 6m30s]
• Assuming a database of 10^7
• Retrospective analysis
Real experiment: Venter vs. me
[Figure: P(Erlich) and P(Venter) (axis ticks 0%, 1%, 10%, 100%) versus # of reads returned from MinION (50 to 950)]
• Bad flowcell: 50min for detection
• My genome: 23andMe array
Summary
Advanced capabilities:
1. Surnames
2. Homoplasy detection
3. More Y-STRs
4. Imputation of missing markers
5. Rapid identification of DNA
Acknowledgements
*Thomas Willems (MIT)
*Melissa Gymrek (HST – Harvard/MIT)
*Sophie Zaajier (Columbia University)
*Robert Piccone (Columbia University)
Chris Taylor-Smith (Sanger Institute)
David Poznik (Stanford University)
1000Y analysis group
* Supported by: NIJ 2014-DN-BX-K089