DoJ2016: Erlich presentation


Page 1: DoJ2016: Erlich presentation

Yaniv Erlich (@erlichya), 2/23/16

Advanced Strategies for DNA Identification

Supported by NIJ 2014-DN-BX-K089

Page 2

Outline
1. Surname inference from DNA
2. The power of genome-wide STR profiles
3. Rapid identification with a handheld device

Page 3

Outline: 1. Surname inference from DNA

Page 4

Correlation between Y-chr and surnames

[Figure: Y chromosomes passed down together with surnames (Smith, Smith, Smith, Erlich); Y-chr sequence ACGCACGC…; www.ysearch.org]

Page 5

Databases of interest

www.smgf.org and www.ysearch.org: 140,000 publicly accessible surname/Y-chr records

Page 6

How to find surnames? Estimating the time to the most recent common ancestor (TMRCA)

[Figure: the target's Y-chr haplotype compared with the i-th record in the database and that record's surname]
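Below is a toy sketch of the idea, not the study's algorithm. It compares the target's Y-STR haplotype with each database record, estimates the TMRCA under a deliberately simple model, and reports the surname of the record with the smallest expected TMRCA. Several pieces are assumptions made only for illustration: the per-locus mutation rate `mu` (0.002 per generation), the uniform prior over 1 to `max_gen` generations, the "identical only if no mutation" locus model, and the marker values in the example.

```python
def tmrca_posterior(h1, h2, mu=0.002, max_gen=200):
    """Posterior over T = 1..max_gen generations to the most recent common ancestor.

    Toy model (assumption): a shared locus is identical in the two men only if no
    mutation happened on either lineage, i.e. with probability (1 - mu) ** (2 * T);
    loci are independent and the prior over T is uniform.
    """
    loci = [k for k in h1 if k in h2]
    n = len(loci)
    d = sum(1 for k in loci if h1[k] != h2[k])  # number of mismatching loci
    weights = []
    for t in range(1, max_gen + 1):
        p_same = (1 - mu) ** (2 * t)
        weights.append(p_same ** (n - d) * (1 - p_same) ** d)
    total = sum(weights)
    return [w / total for w in weights]


def expected_tmrca(h1, h2, mu=0.002, max_gen=200):
    """Posterior mean of T, in generations."""
    post = tmrca_posterior(h1, h2, mu, max_gen)
    return sum(t * p for t, p in zip(range(1, max_gen + 1), post))


def infer_surname(target, records, mu=0.002, max_gen=200):
    """Surname of the record whose expected TMRCA with the target is smallest.

    records: list of (surname, haplotype); a haplotype maps marker name -> repeat count.
    """
    surname, _ = min(records, key=lambda r: expected_tmrca(target, r[1], mu, max_gen))
    return surname


# Hypothetical haplotypes: the marker names are real Y-STRs, the values are made up.
target = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13, "DYS458": 17}
records = [
    ("Smith", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13, "DYS458": 17}),
    ("Jones", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 12, "DYS458": 16}),
]
print(infer_surname(target, records))  # Smith
```

A real system would also need a rule for abstaining when no record is clearly closer than the others. That is one reason an empirical test can return "unknown" for many queries.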

Page 7

Empirical test to determine the probability of recovering a US surname

[Figure: workflow repeated 900 times (x900): Y-chr of a real person → querying Ysearch and SMGF → inferring the surname with the surname inference algorithm → comparing the predicted surname to the true one]

For US Caucasian males: 12% successful recoveries, 5% wrong recoveries, 83% unknown
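The three rates can also be read another way. The algorithm named a surname in about 17% of queries, and roughly 12/17 ≈ 71% of those calls were correct. The sketch below computes both views from per-query outcomes. Its counts are reconstructed from the rounded percentages on the slide (12%, 5% and 83% of 900), so they are approximate and not the study's raw data.

```python
from collections import Counter


def summarize(outcomes):
    """Per-outcome rates, plus precision among the queries where a surname was called."""
    counts = Counter(outcomes)
    total = len(outcomes)
    rates = {k: counts[k] / total for k in ("correct", "wrong", "unknown")}
    called = counts["correct"] + counts["wrong"]
    precision = counts["correct"] / called if called else float("nan")
    return rates, precision


# Approximate counts reconstructed from the slide's rounded percentages.
outcomes = ["correct"] * 108 + ["wrong"] * 45 + ["unknown"] * 747
rates, precision = summarize(outcomes)
print(rates)                # {'correct': 0.12, 'wrong': 0.05, 'unknown': 0.83}
print(round(precision, 3))  # 0.706
```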

Page 8

Distribution of inferred surnames

Most of the inferred surnames are relatively rare.

An inferred surname is roughly as identifying as a zip code.

Page 9

Putting it all together: the Venter case

[Figure: lobSTR call from Venter's genome, DYS458: 17 repeats, queried against www.ysearch.org]

Try it yourself: bit.ly/find_craig

We obtained a surname from whole-genome sequencing data.
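A call such as "DYS458: 17 repeats" comes down to counting how many tandem copies of the repeat unit a read spans. The sketch below counts only exact, in-frame copies of a motif within a single read. Real callers such as lobSTR also handle sequencing errors, align the flanking sequence and combine evidence across reads. The GAAA motif and the synthetic read are assumptions made for illustration.

```python
def longest_tandem_run(seq, motif):
    """Length, in repeat units, of the longest run of exact back-to-back copies of motif."""
    k = len(motif)
    best = 0
    for offset in range(k):  # try every reading frame
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Synthetic read: short flanks around 17 copies of an assumed GAAA repeat unit.
read = "TTC" + "GAAA" * 17 + "GGT"
print(longest_tandem_run(read, "GAAA"))  # 17
```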

Page 10

Aftermath of our study

Page 11

Outline: 2. The power of genome-wide STR profiles

Page 12

HipSTR: a new STR calling algorithm

• Haplotype-based imputation, phasing, and genotyping of STRs
• Haplotype: robustness to stutter noise
• Imputation: recovers STR dropouts from nearby SNPs
• Phasing: resolves homoplasy between alleles
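A toy genotyper makes "robustness to stutter noise" concrete. PCR stutter makes some reads report one repeat unit more or fewer than the true allele. So instead of taking the two most common read lengths, the sketch scores every candidate diploid genotype by likelihood. It uses a simplified, length-only model with an assumed symmetric ±1-unit stutter probability and an arbitrary error floor. It is not HipSTR's model, which works on full haplotypes and learns stutter parameters from the data.

```python
import itertools
import math


def read_prob(observed, allele, stutter=0.1, floor=1e-6):
    """P(a read reports `observed` repeat units | it came from `allele`).

    Assumed model: correct length with prob 1 - stutter, one unit off in either
    direction with prob stutter / 2 each, anything else at a tiny error floor.
    """
    if observed == allele:
        return 1 - stutter
    if abs(observed - allele) == 1:
        return stutter / 2
    return floor


def genotype_loglik(reads, a1, a2, stutter=0.1):
    """Log-likelihood of genotype (a1, a2); each read comes from either allele with prob 1/2."""
    return sum(
        math.log(0.5 * read_prob(r, a1, stutter) + 0.5 * read_prob(r, a2, stutter))
        for r in reads
    )


def call_genotype(reads, stutter=0.1):
    """Maximum-likelihood genotype among the allele lengths seen in the reads."""
    candidates = itertools.combinations_with_replacement(sorted(set(reads)), 2)
    return max(candidates, key=lambda g: genotype_loglik(reads, g[0], g[1], stutter))


# Hypothetical reads at one locus: true genotype 12/15 plus stutter reads at 11 and 14.
reads = [12] * 10 + [11] * 2 + [15] * 9 + [14] * 1
print(call_genotype(reads))  # (12, 15)
```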

Page 13

STR benchmarking results

HipSTR is the most accurate STR caller, and it runs ~10x faster than the next-best method.

[Figure: genotyping accuracy (%, axis 50 to 100) of GATK, Platypus, FreeBayes, lobSTR, RepeatSeq, and HipSTR]

Page 14

HipSTR: solving homoplasy

• Can now correctly detect STR alleles with identical lengths but different sequences (homoplasy)
• Real example:
– Length-based genotype: -4/-4
– HipSTR genotype: (AGAT)8(ACAT)9 / (AGAT)10(ACAT)7
• HipSTR available at https://github.com/tfwillems/HipSTR
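The real example shows why length alone is not enough. Both alleles contain 17 four-base units, so a length-based caller sees the same size twice, yet the two alleles arrange their AGAT and ACAT units differently. The small check below uses the slide's bracket notation. The `expand` helper is hypothetical and written only for this illustration.

```python
import re


def expand(notation):
    """Expand bracket notation such as '(AGAT)8(ACAT)9' into the full repeat sequence."""
    return "".join(
        unit * int(count)
        for unit, count in re.findall(r"\(([ACGT]+)\)(\d+)", notation)
    )


allele_1 = expand("(AGAT)8(ACAT)9")
allele_2 = expand("(AGAT)10(ACAT)7")
print(len(allele_1), len(allele_2))  # 68 68: identical lengths
print(allele_1 == allele_2)          # False: different sequences (homoplasy)
```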

Page 15

Chromosome-wide scan of Y-STRs

• Goal: scan every possible STR on the Y chromosome and assess mutation rates
• >200,000 transmissions = high accuracy
• Leverage 1,500 whole genomes from samples worldwide
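Many transmissions matter because a marker's mutation rate, estimated as (observed mutations) / (transmissions), gets a confidence interval whose width shrinks roughly with the square root of the number of transmissions. The sketch uses hypothetical counts and a Wilson score interval. With only 1,500 genomes, the transmissions here cannot all be directly observed father-son pairs, so this example illustrates only the scaling and not the study's estimation method.

```python
import math


def wilson_interval(mutations, transmissions, z=1.96):
    """Point estimate and Wilson score 95% interval for a per-transmission mutation rate."""
    n = transmissions
    p = mutations / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (center - half, center + half)


# Hypothetical marker that mutates about once per 1,000 transmissions.
for n in (2_000, 20_000, 200_000):
    k = round(1e-3 * n)  # mutations we would expect to observe
    p, (lo, hi) = wilson_interval(k, n)
    print(f"{n:>7,} transmissions: estimate {p:.1e}, 95% CI ({lo:.1e}, {hi:.1e})")
```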

Page 16

Benchmarking mutation rates: our study

Page 17

Deliverables

http://bit.ly/DOJERLICH

• Scanning 4,500 Y-STRs
• Estimating mutation rates of 700 polymorphic Y-STRs
• Finding additional fast-mutating Y-STRs
• Imputation of Y-STRs with high accuracy
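Imputation from nearby SNPs can work on the Y chromosome because it does not recombine. Men whose Y-SNP haplotypes are closely related tend to carry similar STR alleles. Below is a minimal nearest-neighbour sketch, which assumes a reference panel of men typed for both the SNPs and the STR. The SNP names and values are hypothetical, and majority voting over the k closest haplotypes is a stand-in technique rather than the study's imputation method.

```python
from collections import Counter


def impute_str(target_snps, panel, k=3):
    """Impute a missing Y-STR allele by majority vote over the k most similar panel haplotypes.

    target_snps: SNP name -> allele for the sample whose STR is missing.
    panel: list of (SNP dict, STR repeat count) from fully typed reference samples.
    """
    def shared(snps):
        return sum(1 for name, allele in target_snps.items() if snps.get(name) == allele)

    nearest = sorted(panel, key=lambda entry: shared(entry[0]), reverse=True)[:k]
    return Counter(allele for _, allele in nearest).most_common(1)[0][0]


# Hypothetical SNP names, alleles and STR repeat counts.
target = {"snp1": "A", "snp2": "G", "snp3": "T", "snp4": "C"}
panel = [
    ({"snp1": "A", "snp2": "G", "snp3": "T", "snp4": "C"}, 15),
    ({"snp1": "A", "snp2": "G", "snp3": "T", "snp4": "T"}, 15),
    ({"snp1": "A", "snp2": "G", "snp3": "C", "snp4": "C"}, 16),
    ({"snp1": "G", "snp2": "A", "snp3": "C", "snp4": "T"}, 13),
    ({"snp1": "G", "snp2": "A", "snp3": "C", "snp4": "C"}, 13),
]
print(impute_str(target, panel))  # 15
```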

Page 18

Outline: 3. Rapid identification with a handheld device

Page 19

Oxford Nanopore MinION

Features:
• USB stick
• Portable
• Low throughput
• High error rate (10%)

Can we identify samples within minutes?

Page 20

CODIS STRs are challenging:
1. MinION has too many indels
2. Many reads are required to correct for errors
3. CODIS needs a PCR machine

Proposal: shotgun sequencing + Bayesian approach

Page 21

Approach

[Figure: pipeline: real-time sequencing data (reads) → alignment to the human genome (15% error in base calling) → filter SNPs → Bayesian algorithm against a genome database, based on prior knowledge of allele frequencies in the population and the error rate]
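Below is a minimal sketch of the Bayesian step. Its simplifying assumptions are ours and not necessarily the study's:

- each read covers one biallelic SNP;
- reads are independent;
- a read shows the other allele with probability `eps` (0.15, matching the base-calling error above);
- the alternative to "one of the database genomes" is a random person whose alleles follow the population frequencies.

The names `person_A` and `person_B`, the SNP IDs and `prior_in_db` are hypothetical.

```python
import math


def p_alt_read(alt_fraction, eps):
    """P(read shows the alt allele) when a fraction alt_fraction of the sample's copies carry it."""
    return alt_fraction * (1 - eps) + (1 - alt_fraction) * eps


def log_likelihood(reads, alt_fraction_at, eps):
    """Sum of per-read log-probabilities; reads are (snp_id, shows_alt) pairs."""
    total = 0.0
    for snp, shows_alt in reads:
        p = p_alt_read(alt_fraction_at(snp), eps)
        total += math.log(p if shows_alt else 1 - p)
    return total


def match_posterior(reads, database, allele_freqs, eps=0.15, prior_in_db=0.5):
    """Posterior over 'sample is database member X' and 'sample is someone else'.

    database: name -> {snp_id: alt allele count 0/1/2}
    allele_freqs: snp_id -> population alt-allele frequency
    """
    n = len(database)
    log_post = {}
    for name, genos in database.items():
        log_post[name] = math.log(prior_in_db / n) + log_likelihood(
            reads, lambda snp: genos[snp] / 2, eps)
    log_post["someone else"] = math.log(1 - prior_in_db) + log_likelihood(
        reads, lambda snp: allele_freqs[snp], eps)
    top = max(log_post.values())  # log-sum-exp trick for numerical stability
    z = sum(math.exp(v - top) for v in log_post.values())
    return {name: math.exp(v - top) / z for name, v in log_post.items()}


# Hypothetical database: two people homozygous for opposite alleles at 20 SNPs.
snps = [f"snp{i}" for i in range(20)]
database = {
    "person_A": {s: (2 if i < 10 else 0) for i, s in enumerate(snps)},
    "person_B": {s: (0 if i < 10 else 2) for i, s in enumerate(snps)},
}
freqs = {s: 0.5 for s in snps}
reads = [(s, i < 10) for i, s in enumerate(snps)]  # one read per SNP, consistent with person_A
post = match_posterior(reads, database, freqs)
print(max(post, key=post.get))  # person_A
```

Because each read simply multiplies in a likelihood term, the posterior can be updated as reads stream off the sequencer. This fits the real-time setting.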

Page 22

Real experiment: me versus Venter

[Figure: probability of a match over time; identification at 6m30s]

• Assuming a database of 10^7 individuals
• Retrospective analysis
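Database size matters because it sets the prior. Assume the sample is equally likely to be any one of 10^7 people. Then the prior for one specific candidate is 10^-7, and the sequencing data must supply a likelihood ratio of roughly 10^9 before that candidate reaches 99% posterior probability. The arithmetic below makes the uniform-prior assumption explicit. The 99% target is an illustrative choice.

```python
def posterior_from_lr(lr, prior):
    """Bayes' rule in odds form: posterior odds = likelihood ratio x prior odds."""
    posterior_odds = lr * prior / (1 - prior)
    return posterior_odds / (1 + posterior_odds)


def lr_needed(target_posterior, prior):
    """Likelihood ratio required to move `prior` up to `target_posterior`."""
    return (target_posterior / (1 - target_posterior)) / (prior / (1 - prior))


prior = 1 / 10**7  # assumption: uniform prior over a database of 10^7 people
lr = lr_needed(0.99, prior)
print(f"{lr:.2e}")                             # 9.90e+08
print(round(posterior_from_lr(lr, prior), 6))  # 0.99
```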

Page 23

Real experiment: Venter vs. me

[Figure: P(Erlich) and P(Venter) (y axis: 0%, 1%, 10%, 100%) vs. number of reads returned from MinION (50 to 950)]

• Bad flowcell: 50 min for detection
• My genome: 23andMe array

Page 24

Summary

Advanced capabilities:
1. Surnames
2. Homoplasy detection
3. More Y-STRs
4. Imputation of missing markers
5. Rapid identification of DNA

Page 25

Acknowledgements
*Thomas Willems (MIT)
*Melissa Gymrek (HST – Harvard/MIT)
*Sophie Zaaijer (Columbia University)
*Robert Piccone (Columbia University)
Chris Tyler-Smith (Sanger Institute)
David Poznik (Stanford University)
1000Y analysis group

* Supported by: NIJ 2014-DN-BX-K089