Friday, January 23, 2009

Statistical Machine Translation

I recently stumbled upon a reasonably good survey of statistical machine translation by Lopez [1]. Starting with IBM Models 3 and 4, it explains the critical steps of machine translation:
1. Selection of the translation model (e.g. finite-state transducers, synchronous context-free grammars)
2. Parametrization of the model, i.e. which parameters can be learned (e.g. word fertility, word alignment)
3. Parameter estimation, i.e. how to estimate the values of those parameters (e.g. using generative or discriminative statistical models)
4. Decoding, i.e. searching for the best translation of new text under the selected and estimated model
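
To make steps 2 to 4 concrete, here is a minimal sketch in Python. It is not taken from the survey: it uses the much simpler IBM Model 1 instead of Models 3 and 4, so the only parameters are word translation probabilities t(e | f) and there is no fertility or distortion. The toy corpus, the function names and the word-by-word "decoder" are assumptions made purely for illustration; a real decoder would also use a language model and handle reordering.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (source tokens, target tokens).
CORPUS = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]


def train_ibm_model1(corpus, iterations=20):
    """Estimate t(e | f) with EM, as in IBM Model 1 (no NULL word).

    Returns a dict mapping (target_word, source_word) -> probability.
    """
    tgt_vocab = {e for _, tgt in corpus for e in tgt}
    # Parametrization: one probability per co-occurring word pair,
    # initialised uniformly.
    t = {
        (e, f): 1.0 / len(tgt_vocab)
        for src, tgt in corpus
        for e in tgt
        for f in src
    }
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: collect expected alignment counts.
        for src, tgt in corpus:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise so that sum_e t(e | f) = 1 for each f.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def translate_word_by_word(sentence, t):
    """Toy decoder: pick argmax_e t(e | f) for each source word."""
    output = []
    for f in sentence:
        candidates = {e: p for (e, f2), p in t.items() if f2 == f}
        # Unknown source words are passed through unchanged.
        output.append(max(candidates, key=candidates.get) if candidates else f)
    return output


if __name__ == "__main__":
    table = train_ibm_model1(CORPUS)
    print(translate_word_by_word(["das", "Haus"], table))
```

Even on this tiny corpus, EM resolves the ambiguity: "das" co-occurs with "the" twice, which pulls probability mass away from "das"/"house" and "das"/"book" and lets the remaining words pair up.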

Overall, it offers some interesting, detailed insights into problems such as how to deal with sequences and the difference between discriminative and generative statistical models (see also CRF Introduction). Worth reading.

Open Source Resources:
[Moses] http://www.statmt.org/moses/
[Overview] http://opentranslation.aspirationtech.org/index.php/Open_Source_Translation_Tools


[1] Lopez, A. 2008. Statistical machine translation. ACM Comput. Surv. 40, 3 (Aug. 2008), 1-49. DOI: http://doi.acm.org/10.1145/1380584.1380586
