Constraint Driven Transliteration Discovery

Dan Goldwasser, Ming-Wei Chang, Yuancheng Tu, Dan Roth
Book chapter in RANLP, 2009

Abstract

This paper introduces a novel constraint-driven learning framework for identifying named-entity (NE) transliterations. Traditional approaches to discovering transliterations depend heavily on correctly segmenting the target and the transliteration candidate and on aligning these segments. In this work we formulate the process of aligning segments as a constrained optimization problem. We treat the aligned segments as a latent feature representation and show how to infer an optimal latent representation and how to use it to learn an improved discriminative transliteration classifier. Our algorithm is an EM-like iterative procedure that alternates between an optimization step for the latent representation and a learning step for the classifier's parameters. We apply this method in both supervised and unsupervised settings and show that our model can significantly outperform previous methods trained using considerably more resources.
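The alternating scheme described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It is a simplified stand-in under several assumptions: segments are single characters; the alignment constraints are taken to be "each character joins at most one link, and links may not cross," which lets the constrained optimization be solved exactly by dynamic programming; and the discriminative classifier is a plain perceptron. The function names (`best_alignment`, `train`, `predict`), the parameters `init`, `rounds`, and `epochs`, and the toy string pairs are all hypothetical. The paper's actual constraints, features, and solver may differ.

```python
from collections import defaultdict


def best_alignment(src, tgt, w):
    """Optimization step (hypothetical sketch): choose character links that
    maximize total weight, subject to the ASSUMED constraints that every
    character is linked at most once and that links never cross."""
    n, m = len(src), len(tgt)
    # dp[i][j] = best score obtainable when aligning src[:i] with tgt[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            link = dp[i - 1][j - 1] + w[(src[i - 1], tgt[j - 1])]
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], link)
    # Backtrack to recover the chosen links (the latent representation).
    links, i, j = [], n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j]:
            i -= 1  # src[i-1] left unlinked
        elif dp[i][j] == dp[i][j - 1]:
            j -= 1  # tgt[j-1] left unlinked
        else:
            links.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
    return dp[n][m], links[::-1]


def train(pairs, rounds=5, epochs=5, init=0.1):
    """EM-like alternation: fix weights and infer alignments, then fix the
    alignments and update the classifier (a perceptron, an assumption here).
    `pairs` holds (source, candidate, label) with label +1 or -1."""
    # Small positive default so the first round links some characters
    # (an ASSUMED initialization, not taken from the paper).
    w = defaultdict(lambda: init)
    b = 0.0
    for _ in range(rounds):
        # Optimization step: best constrained alignment under current weights.
        reps = [(best_alignment(s, t, w)[1], y) for s, t, y in pairs]
        # Learning step: perceptron updates on the fixed latent features.
        for _ in range(epochs):
            for links, y in reps:
                score = sum(w[link] for link in links) + b
                if y * score <= 0:
                    for link in links:
                        w[link] += y
                    b += y
    return w, b


def predict(src, tgt, w, b):
    """Label a candidate pair using its best alignment under learned weights."""
    score, _ = best_alignment(src, tgt, w)
    return 1 if score + b > 0 else -1


if __name__ == "__main__":
    # Hypothetical toy data: one positive and one negative candidate pair.
    toy = [("ab", "xy", 1), ("ab", "zw", -1)]
    weights, bias = train(toy)
    print(predict("ab", "xy", weights, bias), predict("ab", "zw", weights, bias))
```

The dynamic program works here only because the assumed constraints (one link per character, no crossing links) give the alignment a monotone structure. Richer constraint sets would generally call for a general-purpose constrained solver in the optimization step, while the learning step can stay unchanged.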


Bib Entry

  @inproceedings{GCTR_ranlp_2009,
    author = "Dan Goldwasser and Ming-Wei Chang and Yuancheng Tu and Dan Roth",
    title = "Constraint Driven Transliteration Discovery",
    booktitle = "Book chapter in RANLP",
    year = "2009"
  }