
Post on 07-Jan-2016


ParaMor Identifies Paradigms

Each line gives a candidate suffix set, the number of stems that take it, and example stems.

e.er.erá.ido.ieron.ió    28: deb, escog, ofrec, reconoc, vend, ...
e.ido.ieron.ir.irá.ió    28: asist, dirig, exig, ocurr, sufr, ...
e.erá.ido.ieron.ió       28: deb, escog, ...
e.er.ido.ieron.ió        46: deb, parec, recog, ...
e.ido.ieron.irá.ió       28: asist, dirig, ...
e.ido.ieron.ir.ió        39: asist, bat, sal, ...
e.er.erá.ieron.ió        32: deb, padec, romp, ...
e.ido.ieron.ió           86: asist, deb, hund, ...
e.erá.ieron.ió           32: deb, padec, ...
er.ido.ieron.ió          58: ascend, ejerc, recog, ...
ido.ieron.ir.ió          44: interrump, sal, ...
azar.e.ido.ieron.ir.ió    1: sal
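The suffix-set notation above can be illustrated with a minimal Python sketch that groups stems by the exact set of suffixes they occur with. This is only a naive illustration under stated assumptions, not ParaMor's actual search procedure: the function name `group_stems_by_suffix_set`, the hand-picked suffix list, and the toy word list are all hypothetical.

```python
from collections import defaultdict


def group_stems_by_suffix_set(words, suffixes):
    """Map each suffix set (joined with '.') to the stems that take exactly that set.

    Naive illustration only: candidate suffixes are supplied by hand here,
    whereas an unsupervised system has to discover them from the corpus.
    """
    vocabulary = set(words)
    stem_to_suffixes = defaultdict(set)
    for word in vocabulary:
        for suffix in suffixes:
            # Require a non-empty stem to the left of the suffix.
            if word.endswith(suffix) and len(word) > len(suffix):
                stem_to_suffixes[word[: -len(suffix)]].add(suffix)

    schemes = defaultdict(set)
    for stem, taken in stem_to_suffixes.items():
        if len(taken) > 1:  # a single suffix is not evidence of a paradigm
            schemes[".".join(sorted(taken))].add(stem)
    return schemes


if __name__ == "__main__":
    # Hypothetical toy corpus built from stems and suffixes shown in the list above.
    toy_words = [
        "debe", "debido", "debieron", "debió",
        "asiste", "asistido", "asistieron", "asistió",
    ]
    for scheme, stems in group_stems_by_suffix_set(
        toy_words, ["e", "ido", "ieron", "ió"]
    ).items():
        print(scheme, len(stems), sorted(stems))
```

On real text a grouping this simple would produce many overlapping and spurious sets, which is why the list above contains several closely related candidates for the same underlying paradigm.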

Linguistic Structure and Bilingual Informants Help Induce Machine Translation of Lesser-Resourced Languages

Christian Monson, Ariadna Font Llitjós, Vamshi Ambati, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Katharina Probst

Paradigms: The Structure of Inflectional Morphology

Spanish

Paradigm Cells      Inflection Class
                    ar      er      ir
1st, Sg, Present    o       o       o
2nd, Sg, Present    as      es      es
3rd, Sg, Present    a       e       e
1st, Pl, Present    amos    emos    imos
...                 …       …       …
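The paradigm table can be read as a lookup from inflection class and paradigm cell to a suffix. The following Python sketch encodes only the four cells listed above; the names `PRESENT_SUFFIXES` and `inflect` and the example stems are assumptions made for illustration.

```python
# Suffixes copied from the Spanish table above; only the four listed cells are covered.
PRESENT_SUFFIXES = {
    "ar": {
        "1st, Sg, Present": "o",
        "2nd, Sg, Present": "as",
        "3rd, Sg, Present": "a",
        "1st, Pl, Present": "amos",
    },
    "er": {
        "1st, Sg, Present": "o",
        "2nd, Sg, Present": "es",
        "3rd, Sg, Present": "e",
        "1st, Pl, Present": "emos",
    },
    "ir": {
        "1st, Sg, Present": "o",
        "2nd, Sg, Present": "es",
        "3rd, Sg, Present": "e",
        "1st, Pl, Present": "imos",
    },
}


def inflect(stem, inflection_class, cell):
    """Attach the suffix that fills `cell` for a stem of the given inflection class."""
    return stem + PRESENT_SUFFIXES[inflection_class][cell]


if __name__ == "__main__":
    # Hypothetical regular stems, one per inflection class.
    for stem, inflection_class in [("habl", "ar"), ("com", "er"), ("viv", "ir")]:
        print(inflect(stem, inflection_class, "1st, Pl, Present"))
```

The sketch makes the structure explicit: the cells are shared across classes, and the class determines which suffix realizes each cell.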

Mapudungun (Non-Indo-European, Central Chile)

Suffix Slot         Suffixes
Hab                 ke, Ø
Mode                pe, Ø
Report              (ü)rke, Ø
Pol / Mood          la, ki, nu, Ø
Tense               a, fu, afu, Ø
Obj Agr             fi, Ø
Subj Agr / Mood     (ü)n, li, chi, yu, …
Loc                 pa, pu, Ø
Asp                 tu, ka, Ø

Results

Rule Induction Statistics

[Figure: aligned parse trees for the English sentence "John ate an apple" and its translation "John ne ek seb khaya", together with the rules extracted from them, including S → NP VP, English VP → V NP, and target-side VP → NP V.]

Language Pair    Sentences  Structural Rules  Rules with Count > 2  Phrasal Lexicon
English-Urdu     3126       6824              640                   12456
English-Telugu   3126       7543              721                   13500
English-German   300K       183K              16K                   680K
English-French   1200K      1.3M              45K                   4.2M
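The "Rules with Count > 2" column implies a frequency cut-off over the induced rules. A minimal Python sketch of such a filter is given below; the poster does not say how rule instances were counted, so the function name `frequent_rules`, the string encoding of rules, and the toy instances are all assumptions.

```python
from collections import Counter


def frequent_rules(rule_instances, min_count=3):
    """Return the rules seen at least `min_count` times (count > 2 by default)."""
    counts = Counter(rule_instances)
    return {rule: n for rule, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical rule instances, one string per time a rule was extracted.
    instances = (
        ["VP -> V NP :: VP -> NP V"] * 3
        + ["S -> NP VP :: S -> NP VP"] * 2
        + ["NP -> Det N :: NP -> Det N"]
    )
    print(frequent_rules(instances))
```

A cut-off of this kind keeps rules that recur across sentences and drops those seen only once or twice.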

ParaMor: Unsupervised Induction of Paradigm Morphology

Syntactic Rule Induction for Machine Translation

Elicitation Tool

[Figure: parse tree of the Spanish sentence "los niños jugaron un juego", with nodes S, PolP, NP, VP, Det, N, and V.]

Syntactic Rule Refinement

Translation Correction Tool

Automatic Syntax Refinement

Automatic Syntax Induction

Refinement Results

          METEOR  BLEU   NIST
Baseline  0.618   0.361  6.68
Refined   0.623   0.378  6.79

Automatic metrics evaluate an English-to-Spanish MT system.
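To put the baseline and refined scores on a common scale, the relative gain per metric can be computed as (refined - baseline) / baseline. The short Python sketch below uses only the six scores reported above; the helper name `relative_gain` is an assumption. It gives roughly 0.8% for METEOR, 4.7% for the second metric column, and 1.6% for NIST.

```python
# Scores copied from the refinement results above; the second column holds BLEU scores.
baseline = {"METEOR": 0.618, "BLEU": 0.361, "NIST": 6.68}
refined = {"METEOR": 0.623, "BLEU": 0.378, "NIST": 6.79}


def relative_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


gains = {metric: relative_gain(baseline[metric], refined[metric]) for metric in baseline}

if __name__ == "__main__":
    for metric, gain in gains.items():
        print(f"{metric}: {gain:.2f}% relative gain")
```

Relative gains on different metrics are not directly comparable with one another, since each metric has its own scale and sensitivity.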

Rule Refinement has also been successfully applied to Mapudungun-to-Spanish MT.

Morpho Challenge 2007

System               English           German            Finnish           Turkish
                     P     R     F1    P     R     F1    P     R     F1    P     R     F1
ParaMor & Morfessor  50.6  63.3  56.3  49.5  59.5  54.1  49.8  47.3  48.5  51.9  52.1  52.0
Bernhard             61.6  60.0  60.8  49.1  57.4  52.9  59.7  40.4  48.2  73.7  14.8  24.7
Bordag               59.7  32.1  41.8  60.5  41.6  49.3  71.3  24.4  36.4  81.3  17.6  28.9
Morfessor            82.2  33.1  47.2  67.6  36.9  47.8  76.8  27.5  40.6  73.9  26.1  38.5
Zeman                53.0  42.1  46.9  52.8  28.5  37.0  58.8  20.9  30.9  65.8  18.8  29.2
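The F1 columns are consistent with the usual harmonic mean of precision (P) and recall (R), F1 = 2PR / (P + R). The Python sketch below recomputes F1 for two English entries; the function name `f1_score` is an assumption. Because the reported P and R are already rounded, a recomputed value can differ from the reported one in the last digit (for ParaMor & Morfessor on English, the rounded inputs give about 56.2 against a reported 56.3).

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (P, R) pairs copied from the English columns of the results above.
    english = {
        "Bernhard": (61.6, 60.0),
        "Morfessor": (82.2, 33.1),
    }
    for system, (p, r) in english.items():
        print(f"{system}: F1 = {f1_score(p, r):.1f}")
```

The harmonic mean penalizes imbalance: Morfessor's high English precision (82.2) is pulled down by its low recall (33.1), giving a lower F1 than the more balanced systems.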