Source: fch.upol.cz/wp-content/uploads/2018/03/BIN_02_sekvence_vz4.pdf
Bioinformatics
KFC/BIN
II. Sequences
RNDr. Karel Berka, Ph.D.
Palacký University Olomouc
The central dogma of molecular biology
DNA → RNA → protein (from information to function)
reverse transcription: RNA → DNA
IUB code

Nucleic acids (NA)

| Code | Nucleotides | Complement |
|---|---|---|
| A | A | T |
| C | C | G |
| G | G | C |
| T | T | A |
| U | U | A |
| M | A, C | K |
| R | A, G | Y |
| W | A, T | W |
| S | C, G | S |
| Y | C, T | R |
| K | G, T | M |
| V | A, C, G | B |
| H | A, C, T | D |
| D | A, G, T | H |
| B | C, G, T | V |
| N | A, C, G, T | N |
| - | gap (space) | - |

Proteins

| Code | Three-letter code | Amino acid |
|---|---|---|
| A | Ala | Alanine |
| C | Cys | Cysteine |
| D | Asp | Aspartic acid |
| E | Glu | Glutamic acid |
| F | Phe | Phenylalanine |
| G | Gly | Glycine |
| H | His | Histidine |
| I | Ile | Isoleucine |
| K | Lys | Lysine |
| L | Leu | Leucine |
| M | Met | Methionine |
| N | Asn | Asparagine |
| P | Pro | Proline |
| Q | Gln | Glutamine |
| R | Arg | Arginine |
| S | Ser | Serine |
| T | Thr | Threonine |
| V | Val | Valine |
| W | Trp | Tryptophan |
| Y | Tyr | Tyrosine |
| X | Xxx | Any amino acid |
| * | --- | stop |
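The complement column of the nucleotide table can be applied mechanically. The following is a minimal sketch, assuming the standard IUPAC/IUB convention in which W (A or T) and S (C or G) are each their own complement; the names `IUB_COMPLEMENT`, `complement` and `reverse_complement` are illustrative and do not come from the lecture.

```python
# Complement of every IUB nucleotide code ("-" marks a gap).
# Assumption: W and S are self-complementary, as in the IUPAC/IUB convention.
IUB_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R", "W": "W", "S": "S",
    "V": "B", "B": "V", "H": "D", "D": "H", "N": "N", "-": "-",
}


def complement(seq: str) -> str:
    """Complement each symbol while keeping the original order."""
    return "".join(IUB_COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the opposite strand written in the 5'->3' direction."""
    return complement(seq)[::-1]


# The opposite strand, first as aligned under the input (3'->5'),
# then rewritten in the conventional 5'->3' direction.
print(complement("CGATTGCAACGATGC"))
print(reverse_complement("CGATTGCAACGATGC"))
```

A dictionary lookup raises `KeyError` on symbols outside the table, which is a deliberate choice here: silently passing unknown characters through would hide input errors.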
Genetic code

| 1st base | T | C | A | G | 3rd base |
|---|---|---|---|---|---|
| T | TTT Phe | TCT Ser | TAT Tyr | TGT Cys | T |
| T | TTC Phe | TCC Ser | TAC Tyr | TGC Cys | C |
| T | TTA Leu | TCA Ser | TAA Stop | TGA Stop | A |
| T | TTG Leu | TCG Ser | TAG Stop | TGG Trp | G |
| C | CTT Leu | CCT Pro | CAT His | CGT Arg | T |
| C | CTC Leu | CCC Pro | CAC His | CGC Arg | C |
| C | CTA Leu | CCA Pro | CAA Gln | CGA Arg | A |
| C | CTG Leu | CCG Pro | CAG Gln | CGG Arg | G |
| A | ATT Ile | ACT Thr | AAT Asn | AGT Ser | T |
| A | ATC Ile | ACC Thr | AAC Asn | AGC Ser | C |
| A | ATA Ile | ACA Thr | AAA Lys | AGA Arg | A |
| A | ATG Met | ACG Thr | AAG Lys | AGG Arg | G |
| G | GTT Val | GCT Ala | GAT Asp | GGT Gly | T |
| G | GTC Val | GCC Ala | GAC Asp | GGC Gly | C |
| G | GTA Val | GCA Ala | GAA Glu | GGA Gly | A |
| G | GTG Val | GCG Ala | GAG Glu | GGG Gly | G |
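The genetic code table is in effect a lookup from codon to amino acid. The sketch below builds that lookup in the same T, C, A, G order and translates a reading frame until the first stop codon. It assumes the standard genetic code shown in the table; `CODON_TABLE` and `translate` are illustrative names, and unknown codons are reported as "X".

```python
from itertools import product

BASES = "TCAG"
# Amino acids in table order: first base, then second, then third,
# each running T, C, A, G ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str) -> str:
    """Translate DNA (or RNA) in frame 0 up to the first stop codon."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # "X" = any amino acid
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# First nine codons of a coding sequence
print(translate("atggtgcatctgactcctgaggagaag"))  # -> MVHLTPEEK
```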
Terms and abbreviations
Genomics: the complete genetic information of an organism (its DNA sequence) and its interpretation.
– structural
– functional
DNA, RNA: nt (nucleotide), bp (base pair)
Proteomics: what is expressed in an organism, where (and when), and what function it has.
Proteins: aa (amino acid)
Sequence
DNA      5' C-G-A-T-T-G-C-A-A-C-G-A-T-G-C 3'
            | | | | | | | | | | | | | | |
         3' G-C-T-A-A-C-G-T-T-G-C-T-A-C-G 5'
RNA      5' C-G-A-U-U-G-C-A-A-C-G-A-U-G-C 3'
Protein  N-term R L Q R C C-term
Example: hemoglobin
DNA sequence - 444 bp
atggtgcatctgactcctgaggagaagtctgccgttactgccctgtggggcaaggtgaac
gtggatgaagttggtggtgaggccctgggcaggctgctggtggtctacccttggacccag
aggttctttgagtcctttggggatctgtccactcctgatgctgttatgggcaaccctaag
gtgaaggctcatggcaagaaagtgctcggtgcctttagtgatggcctggctcacctggac
aacctcaagggcacctttgccacactgagtgagctgcactgtgacaagctgcacgtggat
cctgagaacttcaggctcctgggcaacgtgctggtctgtgtgctggcccatcactttggc
aaagaattcaccccaccagtgcaggctgcctatcagaaagtggtggctggtgtggctaat
gccctggcccacaagtatcactaa
Protein sequence - 147 aa
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFE
SFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFAT
LSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQ
KVVAGVANALAHKYH
The DNA sequence determines the protein sequence
The protein sequence determines the protein structure
The protein structure determines the function
DNA
• DNA sequencing
– 1972: DNA cloning
– 1975: DNA sequencing
– since the 1980s: the sequencing revolution
Manual (dideoxy method, electrophoresis)
• Sanger
Automated (robotics)
• J. Craig Venter
– Celera Genomics
Protein
• Protein sequencing
– Edman degradation
• Sanger: labeling reagent for the N-terminal residue (1-fluoro-2,4-dinitrobenzene)
– MS/MS
masses (m/z)
940.421 - ELSDIAR
1093.477 - QLLLTADDR
1341.556 - PHSHPALTPEQK
1469.633 - PHSHPALTPEQKK
1488.645 - GILAADESTGSIAKR
1646.650 - LQSIGTENTEWENRR
2122.975 - IGENHTPSALAIMENANVLAR
2241.903 - YTPSGQAGAAASESLFISNHAY
The Human Genome Project
• Initiated in the mid-1980s
• Original estimates: 100,000 genes; completion in 2005
• Automated sequencing and advances in computing
– shotgun methods
• First draft announced jointly in 2000 by
– the international Human Genome Project consortium (publicly funded)
– Celera Genomics (a private company)
• Reference sequence of human DNA completed in April 2003
http://genomics.energy.gov/
The Human Genome Project
• 20,313 genes (Ensembl.org, 21 Feb 2016)
• 20,769 genes (Ensembl.org, 30 Sep 2013)
Alternative splicing: 10,000,000 proteins
• Hundreds of genes result from horizontal transfer from bacteria (in the vertebrate lineage)
• Dozens of genes are derived from transposable elements
• The mutation rate in males is about twice that in females
• >1,400,000 single nucleotide polymorphisms (SNPs)
Biological databases
primary vs. secondary
format vs. content (computer-readable vs. human-readable)
Primary databases
– collect information about the sequences themselves
Secondary databases
– contain the results of analyses of data from primary databases
– are built by multiple alignment of homologous sequences to capture conserved regions and classify sequences into families
DNA databases
• GenBank (NCBI), since 1982
– release 212: 190,250,235 sequences, 207,018,196,067 nt (Feb 2016)
– release 1: 606 sequences, 680,338 nt (Dec 1982)
• WGS (Whole Genome Shotgun), since 2002
– release 212: 333,012,760 sequences, 1,399,865,495,608 nt (Feb 2016)
• ENA - EMBL (EBI)
– 713,500,000 sequences, 1,611,100,000,000 nt (Feb 2016)
– 83,666,567 sequences, 150,163,403,742 nt (Nov 2006); 69 GB compressed (376 GB uncompressed)
• DDBJ (DNA Data Bank of Japan)
– 64,267,978 sequences, 68,259,314,742 nt (Dec 2006)
The three databases share accession numbers ("A12345" in EMBL is the same record as "A12345" in GenBank or DDBJ).
Primary protein databases
• UniProtKB (PIR-PSD, Swiss-Prot, TrEMBL)
– UniProtKB/Swiss-Prot: manually curated and reviewed protein sequence database; 550,552 entries (Feb 2016)
– UniProtKB/TrEMBL: automatically annotated and not reviewed; 60,971,489 entries (Feb 2016)
• NCBInr
– compiled from a variety of sources, including Swiss-Prot, PIR, PRF, PDB, and translations from annotated coding regions in GenBank and RefSeq; 4,396,331 entries (January 2007), 4 GB
Secondary protein databases

| Secondary database | Data source | Classification principle |
|---|---|---|
| PROSITE | UniProt | regular expressions (patterns) |
| PRINTS | OWL | motifs (fingerprints) |
| Pfam | UniProt | hidden Markov models (HMMs) |
| BLOCKS | PROSITE / PRINTS | motifs (blocks) |
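Because PROSITE classifies sequences with regular-expression-like patterns, a pattern can be rewritten as an ordinary regular expression. The sketch below assumes the commonly documented PROSITE notation: `x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repetition, `-` as separator, and `<` / `>` for the sequence ends. The function name `prosite_to_regex`, the example pattern and the test sequences are hypothetical and not taken from the database.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Anchors: "<" = N-terminus, ">" = C-terminus
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split off an optional repetition such as (3) or (2,4)
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            core = "."                      # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any amino acid except these
        # "[ABC]" and single letters are already valid regex syntax
        repeat = "{" + count + "}" if count else ""
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# Hypothetical pattern, used only to show the conversion
regex = prosite_to_regex("<A-x(2)-[ST]-{P}-G>")
print(regex)                             # -> ^A.{2}[ST][^P]G$
print(bool(re.search(regex, "AKKSAG")))  # -> True
print(bool(re.search(regex, "AKKSPG")))  # -> False
```

Real PROSITE entries use a few extra conventions (for example anchors inside brackets) that this sketch does not handle.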
Sequence formats
Binary
– with chromatograms: SCF, ALF, ABI
– for programs: internal databases of those programs
Text (human-readable)
– minimal: plain text, FASTA
– annotated: EMBL, GenBank, ASN, XML
SCF
SCF: Standard Chromatogram Format
FASTA format
>gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4
TGGCAGGACTGAAGCAGCTGGAGGCTGTTACAAGACCAGACCACCAGCAAACCCTGGAGCCTGCACAGGA
CCCTGAGACCTCTTCCTGGAATTCCCACCTTTTTTCCTCCATCCAGAACCAGTCCCAAAGAGAAACTTCC
AGAAGGAGCTCTCCGTTTTCAGTTTGCCAGTTGGCTTCCTGTCCTTCTGTGAGGAGTACCAGTGTGAAGC
ATGCAGCAGGACGGACTCAGCAGTGTGAATCAGCTAGGGGGACTCTTTGTGAATGGCCGGCCCCTTCCTC
TGGACACCAGGCAGCAGATTGTGCAGCTAGCAATAAGAGGGATGCGACCCTGTGACATTTCACGGAGCCT
TAAGGTATCTAATGGCTGTGTGAGCAAGATCCTAGGACGCTACTACCGCACAGGTGTCTTGGAACCCAAG
TGTATTGGGGGAAGCAAACCACGTCTGGCCACACCTGCTGTGGTGGCTCGAATTGCCCAGCTAAAGGATG
AGTACCCTGCTCTTTTTGCCTGGGAGATCCAACACCAGCTTTGCACTGAAGGGCTTTGTACCCAGGACAA
GGCTCCCAGTGTGTCCTCTATCAATCGAGTACTTCGGGCACTTCAGGAAGACCAGAGCTTGCACTGGACT
CAACTCAGATCACCAGCTGTGTTGGCTCCAGTTCTTCCCAGTCCCCACAGTAACTGTGGGGCTCCCCGAG
GCCCCCACCCAGGAACCAGCCACAGGAATCGGACTATCTTCTCCCCGGGACAAGCCGAGGCACTGGAGAA
AGAGTTTCAGCGTGGGCAGTATCCAGATTCAGTGGCCCGTGGGAAGCTGGCTGCTGCCACCTCTCTGCCT
GAAGACACGGTGAGGGTTTGGTTTTCTAACAGAAGAGCCAAATGGCGCAGGCAAGAGAAGCTGAAATGGG
AAGCACAGCTGCCAGGTGCTTCCCAGGACCTGACAGTACCAAAAAATTCTCCAGGGATCATCTCTGCACA
GCAGTCCCCCGGCAGTGTACCCTCAGCTGCCTTGCCTGTGCTGGAACCATTGAGTCCTTCCTTCTGTCAG
CTATGCTGTGGGACAGCACCAGGCAGATGTTCCAGTGACACCTCATCCCAGGCCTATCTCCAACCCTACT
GGGACTGCCAATCCCTCCTTCCTGTGGCTTCCTCCTCATATGTGGAATTTGCCTGGCCCTGCCTCACCAC
CCATCCTGTGCATCATCTGATTGGAGGCCCAGGACAAGTGCCATCAACCCATTGCTCAAACTGGCCATAA
GAGGCCTCTATTTGACAGTAATAAAAACCTTTTCTTAGATGTTAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
The line starting with ">" is the comment (header) line; it can specify whether the sequence is a nucleic acid (NA) or a protein.
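Reading a FASTA file amounts to starting a new record at every ">" line and concatenating the lines that follow. Here is a minimal sketch; the function name `parse_fasta` is illustrative, and the accession extraction assumes the "gi|...|gb|...|" header style of the record above, which is not the only header layout in use.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header (without the leading '>') to its joined sequence."""
    records: Dict[str, str] = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()     # a new record starts here
            records[header] = ""
        elif header is not None:
            records[header] += line       # sequence may span many lines
    return records


# First two sequence lines of the record shown above
example = """>gi|6102607|gb|AF145233.1|AF145233 Mus musculus transcription factor PAX4
TGGCAGGACTGAAGCAGCTGGAGGCTGTTACAAGACCAGACCACCAGCAAACCCTGGAGCCTGCACAGGA
CCCTGAGACCTCTTCCTGGAATTCCCACCTTTTTTCCTCCATCCAGAACCAGTCCCAAAGAGAAACTTCC
""".splitlines()

for header, sequence in parse_fasta(example).items():
    accession = header.split("|")[3]  # 4th "|"-separated field in this header style
    print(accession, len(sequence))
```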
GenBank fields
Reference Seq-id
The NCBI RefSeq project provides a curated, nonredundant set of
reference sequence standards for naturally occurring biological
molecules, ranging from chromosomes to transcripts to proteins.
Prefixes:
•NC_ chromosomes
•NM_ mRNAs
•NP_ proteins
•NT_ constructed genomic contigs
•NG_ genomic regions or gene clusters
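The prefix list lends itself to a simple lookup. A small sketch, using only the five prefixes listed above; the function name and the accession numbers in the example are placeholders, not real records.

```python
# RefSeq accession prefixes from the list above
REFSEQ_PREFIXES = {
    "NC_": "chromosome",
    "NM_": "mRNA",
    "NP_": "protein",
    "NT_": "constructed genomic contig",
    "NG_": "genomic region or gene cluster",
}


def refseq_molecule_type(accession: str) -> str:
    """Classify an accession by its three-character RefSeq prefix."""
    return REFSEQ_PREFIXES.get(accession[:3].upper(), "unknown prefix")


# Placeholder accessions, for illustration only
for acc in ("NM_000000.0", "NP_000000.0", "AF145233"):
    print(acc, "->", refseq_molecule_type(acc))
```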
GenBank fields
FEATURE field:
structured record
must have location (which can be partial)
main fields:
•SOURCE
•CDS (coding region)
•RNA
•GENE
•PROTEIN
GenBank flatfile
LOCUS AF145233 1360 bp mRNA ROD 23-OCT-1999
DEFINITION Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds.
ACCESSION AF145233
VERSION AF145233.1 GI:6102607
KEYWORDS .
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 1360)
AUTHORS Kalousova,A., Benes,V., Paces,J., Paces,V. and Kozmik,Z.
TITLE DNA binding and transactivating properties of the paired and
homeobox protein Pax4
JOURNAL Biochem. Biophys. Res. Commun. 259 (3), 510-518 (1999)
MEDLINE 99294619
PUBMED 10364449
REFERENCE 2 (bases 1 to 1360)
AUTHORS Kalousova,A., Paces,J. and Kozmik,Z.
TITLE Direct Submission
JOURNAL Submitted (23-APR-1999) Dept. of Transcription Regulation,
Institute of Molecular Genetics, Videnska 1083, Prague 142 20,
Czech Republic
FEATURES Location/Qualifiers
source 1..1360
/organism="Mus musculus"
/db_xref="taxon:10090"
gene 1..1360
/gene="Pax4"
CDS 211..1260
/gene="Pax4"
/note="DNA binding protein; paired box protein; homeobox
protein"
/codon_start=1
/product="transcription factor PAX4"
/protein_id="AAF03533.1"
/db_xref="GI:6102608"
/translation="MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDIS
RSLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWE
IQHQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCG
APRGPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWF
SNRRAKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSP
SFCQLCCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLI
GGPGQVPSTHCSNWP"
BASE COUNT 359 a 381 c 328 g 292 t
ORIGIN
1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc
61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc
121 agtcccaaag agaaacttcc agaaggagct ctccgttttc agtttgccag ttggcttcct
181 gtccttctgt gaggagtacc agtgtgaagc atgcagcagg acggactcag cagtgtgaat
…
1081 tccagtgaca cctcatccca ggcctatctc caaccctact gggactgcca atccctcctt
1141 cctgtggctt cctcctcata tgtggaattt gcctggccct gcctcaccac ccatcctgtg
1201 catcatctga ttggaggccc aggacaagtg ccatcaaccc attgctcaaa ctggccataa
1261 gaggcctcta tttgacagta ataaaaacct tttcttagat gttaaaaaaa aaaaaaaaaa
1321 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
//
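The CDS location and the ORIGIN block of this record can be checked against each other. The sketch below is only an illustration: it handles plain `start..end` locations (not `complement(...)`, `join(...)` or partial locations), and the helper names are made up for this example.

```python
import re


def origin_to_sequence(origin_lines):
    """Join ORIGIN lines, dropping the position numbers and spaces."""
    return "".join(re.sub(r"[^a-zA-Z]", "", line) for line in origin_lines)


def parse_simple_location(location: str):
    """Parse a plain 'start..end' location (1-based, inclusive)."""
    start, end = (int(x) for x in location.split(".."))
    return start, end


# First four ORIGIN lines of the record above
origin = [
    "1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc",
    "61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc",
    "121 agtcccaaag agaaacttcc agaaggagct ctccgttttc agtttgccag ttggcttcct",
    "181 gtccttctgt gaggagtacc agtgtgaagc atgcagcagg acggactcag cagtgtgaat",
]
sequence = origin_to_sequence(origin)

start, end = parse_simple_location("211..1260")
cds_length = end - start + 1                  # nucleotides in the CDS
codons = cds_length // 3                      # includes the stop codon
first_codon = sequence[start - 1:start + 2]   # 1-based start -> 0-based slice

print(cds_length, codons, first_codon)        # -> 1050 350 atg
```

The 350 codons correspond to the 349 residues of the `/translation` qualifier plus one stop codon.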
EMBL flatfile
ID AF031150 standard; RNA; ROD; 1379 BP.
XX
AC AF031150;
XX
SV AF031150.1
XX
DT 27-FEB-1998 (Rel. 54, Created)
DT 27-FEB-1998 (Rel. 54, Last updated, Version 1)
XX
DE Mus musculus paired-box transcription factor (Pax4) mRNA, complete cds.
XX
KW .
XX
OS Mus musculus (house mouse)
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
XX
RN [1]
RP 1-1379
RA Inoue H., Nomiyama J., Nakai K., Matsutani A., Tanizawa Y., Oka Y.;
RT Isolation of full-length cDNA of mouse PAX4 gene and identification of its
RT human homologue;
RL Biochem. Biophys. Res. Commun. 243:628-633(1998).
XX
RN [2]
RP 1-1379
RA Inoue H., Nomiyama J., Nakai K., Tanizawa Y., Oka Y.;
RT ;
RL Submitted (23-OCT-1997) to the EMBL/GenBank/DDBJ databases.
RL Third Dept. of Int. Med., Yamaguchi University, 1144 Kogushi, Ube,
RL Yamaguchi 755, Japan
XX
FH Key Location/Qualifiers
FH
FT source 1..1379
FT /db_xref=taxon:10090
FT /organism=Mus musculus
FT /cell_line=MIN6
FT CDS 297..1346
FT /codon_start=1
FT /gene=Pax4
FT /product=paired-box transcription factor
FT /protein_id=AAC40046.1
FT /translation=MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDISR
FT SLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWEIQ
FT HQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCGAPR
FT GPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWFSNRR
FT AKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSPSFCQL
FT CCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLIGGPGQV
FT PSTHCSNWP
XX
SQ Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;
aaaaaaaaaa aaaaagcggc cgctgaattc tagcagaagg ctgccctctg ctcctgagtg 60
aaggctctgt gaagctctgg accccctggc aggactgaag cagctggagg ctgttacaag 120
accagaccac cagcaaaccc tggagcctgc acaggaccct gagacctctt cctggaattc 180
ccaccttttt tcctccatcc agaaccagtc ccaaagagaa acttccagaa ggagctctcc 240
gttttcagtt tgccagttgg cttcctgtcc ttctgtgagg agtaccagtg tgaagcatgc 300
agcaggacgg actcagcagt gtgaatcagc tagggggact ctttgtgaat ggccggcccc 360
…
gctgtgggac agcaccaggc agatgttcca gtgacacctc atcccaggcc tatctccaac 1200
cctactggga ctgccaatcc ctccttcctg tggcttcctc ctcatatgtg gaatttgcct 1260
ggccctgcct caccacccat cctgtgcatc atctgattgg aggcccagga caagtgccat 1320
caacccattg ctcaaactgg ccataagagg cctctatttg acagtaataa aaacctttt 1379
//
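Every line of an EMBL record starts with a two-letter line code, and the SQ line summarises the base composition. A minimal sketch of reading both; the function names are illustrative, and the regular expressions assume an SQ line laid out like the one in this record.

```python
import re


def split_embl_line(line: str):
    """Split an EMBL line into its two-letter code and the remaining content."""
    code, _, content = line.partition(" ")
    return code, content.strip()


def parse_sq_line(line: str):
    """Return (declared length, {base: count}) from an EMBL 'SQ' line."""
    length = int(re.search(r"Sequence (\d+) BP", line).group(1))
    counts = {
        base: int(number)
        for number, base in re.findall(r"(\d+) (A|C|G|T|other);", line)
    }
    return length, counts


print(split_embl_line("OS Mus musculus (house mouse)"))

length, counts = parse_sq_line(
    "SQ Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;"
)
# The base counts should add up to the declared sequence length
print(length, sum(counts.values()))  # -> 1379 1379
```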
ASN.1
Seq-entry ::= set {
class nuc-prot ,
descr {
title "Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds." ,
source {
org {
taxname "Mus musculus" ,
common "house mouse" ,
db {
{
db "taxon" ,
tag
id 10090 } } ,
orgname {
name
binomial {
genus "Mus" ,
species "musculus" } ,
lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
Euteleostomi; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Mus" ,
gcode 1 ,
mgcode 2 ,
div "ROD" } } } ,
pub {
pub {
sub {
authors {
names
std
Genome project of Rhodopseudomonas palustris
Sequencing and characterization of a 5 kb region
(diploma thesis of Jana Prejdová, supervised by Jan Pačes)
A model example of the use of sequencing
DNA sequencing
connecting contigs
>jana (4797 nt)
GAATTCGCCGCGGGGCTGCGCATCACCGATGCCGCCACCATCGAGATCGTCGAGATGGTACTGGCCGGCTCGATCAACAAGCAGCTCGTCGGC
TACATCAACGAAGCGGGCGGCAAGGCCGTCGGCCTGTGCGGCAAGGACGGCAACATGGTGTCCGCCACCAAGGCGACGCGCACCATGGTCGAT
CCGGATTCGCGGATCGAAGAGGTGATCGACCTCGGTTTCGTCGGCGAGCCGGAGAAGGTCGACCTCACCCTGCTCAACCAGCTGATCGGCCAC
GAGTTGATCCCGGTGCTGGCGCCGCTGGCGACCTCCGCGTCGGGCCAGACCTTCAACGTCAATGCCGACACCTTTGCAGGTGCGGTTGCCGGT
GCGCTGCGGGCCAAGCGCCTGCTGCTGCTGACCGACGTGCCGGGCGTGCTCGACCAGAACAAGAAGCTGATCCCCGAACTGTCGATCAAGGAT
GCCCGCAAGCTGATCGCAGACGGCACCATCTCGGGCGGCATGATCCCCAAGGTCGAGACCTGCATCTACGCGCTCGAACAGGGCGTCGAAGGC
GTCGTCATCCTCGACGGCAAGGTCCCGCACGCAGTGCTGCTCGAATTGTTCACCAACCAGGGCACCGGCACGCTGATCCACAAGTGATGCGAG
GCTGCGGCGACAACATCCGTCATGGCCGGGCTCGTCCCGGCCATCCACGTCTTTCCGGCGGTTTTCTCAGCAAGACGTGGATGCCCGGCACAA
GGCCGGGCATGACGGGGTGGAGATCGCGCGCCCTCGCCGCCATTGTCACCACCCTCGCCCTCACCTCCGCCGCCCACGCCGACCTCAAGCTCT
GCAACCGCATGAGCTACGTGGTCGAGACGGCGATCGGGGTCGATTCCAACGGCACCACCGCCTCGCGCGGATGGCTGCGGATTGATCCGGCGC
AATGCCGGGTCGTGGTGCAAGGCGCGCTCAACGCCGACCGCATCATGCTGAATGCCCGCGCGCTGGCGGTGTACGGCGTCTCGCCGCTGCCGC
AGAACGGCACTGACCGGCTGTGCATTGCCGAAGACAATTTCGTCATCGCCGCCGCGCGGCAATGCCGCGGCGGCCAAACGCTCGCCGCCTTCA
CCGAGATCAAGCCCACCGACACCGAGGACGGCAACAAGATCGCTTATCTGGCGGAAGACTCCGGCTACGACGACGAACAGGCCAAACTCGCCG
CGATCCAGCGGCTGCTGGTGATCGCCGGTTACGACGCCTCGCCGATCGACGGCGTCGACGGCCCGAAGACGCAGGCCGCGCTGTCCGCCTTCC
TCAAGAGCCGAGGCCTGAAGCCCGAGATCGTCGATGCGCCGGATTTCTTCGACGTGATGATCAAGGCAGTGCAGCAGCCGTCCGGCAGCGGGC
TGACCTGGTGCAACGACACCAAGTACAAGATCATGGCGGCCGTCGGCGAAGACGACGGCAAGACTGTCACCAGCCGCGGCTGGTACGGTGTTG
CGCCCGGCCAATGCCTGCGCCCCGACCTCGGCGCACAGCCGAAGCGGGTGTTCAGCTTCGCCGAAGCGGTCGACGGCAGCGGCAGGCCGGTGA
CCATCAAGGGCCGTGCGCTGAACTGGGGCGGCGGCGTGACGCTGTGCACGCGTGACAGCAAGTTCGAGATCGGCGAGCAAGGCGATTGCGCGG
CGCGCGGCCTCGCCGCCACCGGCTTCGCCGCCGTCGATCTCAGTAGCGGCAAGACATTGAGGTTGTCCGCCCCATGATGCAGCTCGGCAAACG
CGGCTTCGATCACGTCGAGACCTGGGTGTTCGATCTCGACAACACGCTGTACCCGCATCACCTCAACCTATGGCAGCAGGTCGATGCGCGGAT
CCGCGACTTCGTCGCCGACTGGCTGAAGGTTTCGCCGGAAGAAGCCTTCCGTATCCAGAAGGATTACTACAAGCGCTACGGCACCACGATGCG
CGGGATGATGACCGAGCACGGCGTTCACGCCGACGACTACCTGGCTTATGTCCACGCCATCGACCATTCGCCGCTGCAGCCGAATCCGGCGAT
GGGCGATGCGATCGAGCGACTGCCGGGCCGCAAGCTGATCCTGACCAACGGCTCGACCGCCCATGCGGGCAAGGTGCTGGAGCGGCTCGGCAT
CGGCCATCATTTCGAGGCGGTGTTCGACATCATTGCGGCCGACCTCGAGCCGAAGCCGGCGCCGCAGACCTACCGCCGTTTTCTCGATCGCCA
TGGTGTCGACCCGGCCCGCGCCGCGATGTTCGAAGACCTCGCCCGCAACCTCACCGTGCCGCACCAGCTCGGCATGACCACCGTGCTGGTGGT
GCCTGACGATAGCCAGGACGTGGTCCGCGAAGATTGGGAGCTTGAAGGCCGCGACGCCGCCCACGTCGATCACGTGACTGATGATTTGACAGG
GTTCTTGGGGAAGCTGAGTTCGCTGTAGGCCGGGGACGCCTCCCAAGCGTCAATCGTCATCGCCGCCGGATGCAAGGCGGCTAGGTATTGCGG
AGCGCTCGCGATCTTCCGTCCAATGCCCTGGGATACTGGATCGCCCGGACGAGCCGGGCGACGACGTTGAAGAGAGATGACGTGGCGTCACCA
CATCCCCCGCCGTCATCGCCCGCGCAGGCGGGCGATGACTTGGCGGACGGGGCGGCGCCTTGACTCCGACCCGGCGAATCCGGACAACACTCC
GCAAGGACTGGACCACGCTGTTCTTCAGCTTTCGAGGTCGGATCAATCGCGCCAAATACTGGCTGGTCGGACTGATCTACGTCGCCGCCTGGA
TGG …
Sequence in FASTA format
Leucine codon usage

| Codon | Rhodobacter capsulatus: number | Rhodobacter capsulatus: % | Escherichia coli: % |
|---|---|---|---|
| CUA | 3 | <1 | 4 |
| CUC | 119 | 16 | 9 |
| CUG | 458 | 60 | 52 |
| CUU | 157 | 20 | 10 |
| UUA | 0 | 0 | 11 |
| UUG | 27 | 3 | 13 |
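The percentage column follows from the codon counts: each count divided by the total number of leucine codons. A short sketch using the Rhodobacter capsulatus counts from the slide; the variable and function names are illustrative.

```python
# Leucine codon counts for Rhodobacter capsulatus, as listed on the slide
leucine_codon_counts = {
    "CUA": 3, "CUC": 119, "CUG": 458, "CUU": 157, "UUA": 0, "UUG": 27,
}


def codon_usage(counts):
    """Return the share of each synonymous codon in percent."""
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}


for codon, percent in codon_usage(leucine_codon_counts).items():
    print(f"{codon} {percent:5.1f} %")
```

The computed values agree with the whole-number percentages on the slide to within one percentage point.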
How do we find genes?
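One simple first pass, not necessarily the approach used in the lecture, is to scan for open reading frames (ORFs): stretches that begin with ATG and run in frame to a stop codon. The sketch below checks the three forward frames only; the function name, the `min_codons` threshold and the toy sequence are assumptions made for illustration. Real gene finders also consider alternative start codons, the reverse strand (scan the reverse complement as well) and signals such as codon usage.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100):
    """Return (start, end, frame) for ATG...stop ORFs on the given strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                          # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                       # look for the next ORF
    return orfs


# Toy sequence with one short ORF in frame 2
print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # -> [(2, 14, 2)]
```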
genes
Sanger
Ch21 (in Nature)
cDNA
GENESCAN
EXOFISH
eukaryotic genes
Which proteins are encoded by the genes?
ja1 ja2 ja3 ja4 ja5 ja6
BLAST - search for relatives
Which proteins are encoded by the genes?

| Gene | Protein | EC number |
|---|---|---|
| ja1 | ACETYLGLUTAMATE KINASE | EC 2.7.2.8 |
| ja2 | | |
| ja3 | | |
| ja4 | TETRAHYDRODIPICOLINATE N-SUCCINYLTRANSFERASE | EC 2.3.1.117 |
| ja5 | | |
| ja6 | SUCCINYL-DIAMINOPIMELATE DESUCCINYLASE | EC 3.5.1.18 |

ja1 ja2 ja3 ja4 ja5 ja6
What function do these genes have in the cell?
Which proteins are encoded by the genes?

| Gene | Protein | EC number |
|---|---|---|
| ja4 | TETRAHYDRODIPICOLINATE N-SUCCINYLTRANSFERASE | EC 2.3.1.117 |
| ja5 | ACETYLORNITHINE TRANSAMINASE | EC 2.6.1.11 |
| ja6 | SUCCINYL-DIAMINOPIMELATE DESUCCINYLASE | EC 3.5.1.18 |

ja1 ja2 ja3 ja4 ja5 ja6
Bioinformatics
Rhodopseudomonas palustris can synthesize the amino acid lysine via a biochemical pathway that involves the enzyme EC 2.6.1.17.
Credits
• This lecture was prepared using the following lectures:
– Jan Pačes and Jiří Vondrášek: Bioinformatika (Charles University, Prague)
– Aplikovaná proteomika (Applied Proteomics; UO Hradec Králové)