Background

Various methods and techniques have been explored with the aim of automatically generating new bilingual (and multilingual) dictionaries from existing ones, so that, given L1>L2 and L2>L3 sets, a new L1>L3 dictionary is produced (see selected references below). The intermediate language used in the process (L2 here) is called a pivot, and multiple pivots can be used for this purpose. Although considerable work has been done to this end, it has usually been conducted on different types of datasets, evaluated in different ways, and based on algorithms that are often not comparable.
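As an illustration of the basic idea, the following minimal Python sketch composes an L1>L2 set and an L2>L3 set through the pivot to propose L1>L3 candidates. It is not the method of any particular system or of the shared task itself: the function name infer_via_pivot, the dictionary-of-sets representation and the Spanish/English/French toy entries are all assumptions made for the example.

```python
from collections import defaultdict


def infer_via_pivot(l1_to_l2, l2_to_l3):
    """Compose two bilingual dictionaries through a shared pivot language.

    Both inputs map a source word to a set of translations. The result maps
    each L1 word to its candidate L3 translations, and each candidate to the
    set of pivot (L2) words that link the two.
    """
    candidates = defaultdict(lambda: defaultdict(set))
    for source, pivots in l1_to_l2.items():
        for pivot in pivots:
            # Pivot words missing from the second dictionary yield nothing.
            for target in l2_to_l3.get(pivot, ()):
                candidates[source][target].add(pivot)
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {source: dict(targets) for source, targets in candidates.items()}


# Hypothetical toy data: Spanish > English and English > French.
es_en = {"banco": {"bank", "bench"}, "orilla": {"bank", "shore"}}
en_fr = {"bank": {"banque", "rive"}, "bench": {"banc"}, "shore": {"rive"}}

es_fr = infer_via_pivot(es_en, en_fr)
for source, targets in sorted(es_fr.items()):
    for target, pivots in sorted(targets.items()):
        print(source, ">", target, "via", sorted(pivots))
```

In this toy data the polysemous pivot word "bank" produces both valid candidates (banco > banque, orilla > rive) and spurious ones (banco > rive, orilla > banque), which is why plain composition tends to need further filtering. One simple heuristic, offered here only as an assumption, is to prefer candidates reached through more than one pivot word, as orilla > rive is.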

The Translation Inference Across Dictionaries shared task (TIAD-2017) is launched with the intention of offering quality lexical resources within a single coherent experiment. This enables reliable validation of results and solid comparison of methods and techniques for auto-generating translation equivalents across languages, and is meant to stimulate further research. The resulting systems and results will be presented at a workshop held as part of the first Language, Data and Knowledge conference in Galway, Ireland, on 18-20 June 2017 (http://ldk2017.org). The papers describing the participants' systems will be peer-reviewed and published on CEUR-WS (http://ceur-ws.org). We hope that the combined workshop and publication will constitute an ideal forum to share the results and ideas stemming from diverse approaches and perspectives, and to promote further improvements and their dissemination.

The data for TIAD-2017 is released by K Dictionaries Ltd and is extracted from its Global Series, which includes monolingual, bilingual and multilingual lexical sets for 24 languages (Kernerman, 2011). Each language core is compiled independently within a common overall framework and shares the same technical infrastructure with all other languages. It consists of detailed, well-structured lexicographic information on the core language, which serves as a base for translation into any other language. The language pairs are thus created directly, with no bias from an intermediate language (such as English in WordNet), and joining the different pairs of each language core forms a multilingual network.

References

Acs, J., Pajkossy, K. and Kornai, A. 2013. Building Basic Vocabulary across 40 Languages. In Proceedings of the 6th Workshop on Building and Using Comparable Corpora. ACL. http://eprints.sztaki.hu/7183/2/40lang.pdf

Kaji, H., Tamamura, S. and Erdenebat, D. 2008. Automatic Construction of a Japanese-Chinese Dictionary via English. In LREC 2008 Proceedings: 699–706. http://repository.dlsi.ua.es/242/1/pdf/175_paper.pdf

Kernerman, I. 2011. From Dictionary to Database: Developing a Global Multi-Language Series. In Proceedings of eLex 2011, Bled, 10-12 November 2011. Electronic Lexicography in the 21st Century: New Applications for New Users. http://elex2011.trojina.si/Vsebine/proceedings/eLex2011-14.pdf

Kovář, V., Baisa, V. and Jakubíček, M. 2016. Sketch Engine for Bilingual Lexicography. International Journal of Lexicography, 29.3, 339-352.

Mausam, Soderland, S., Etzioni, O., Weld, D., Skinner, M. and Bilmes, J. 2008. Compiling a Massive, Multilingual Dictionary via Probabilistic Inference. In Annual Meeting of the Association for Computational Linguistics. ACL. https://www.cs.washington.edu/sites/default/files/ai/papers/tmpiVvJEg.pdf

Saralegi, X., Manterola, I. and San Vicente, I. 2011. Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 846–856. ACL. http://dl.acm.org/citation.cfm?id=2145526

Shezaf, D. and Rappoport, A. 2010. Bilingual Lexicon Generation Using Non-Aligned Signatures. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 98–107. ACL. http://dl.acm.org/citation.cfm?id=1858692

Tanaka, K. and Umemura, K. 1994. Construction of a Bilingual Dictionary Intermediated by a Third Language. In Proceedings of the 15th Conference on Computational Linguistics, Volume 1, 297–303. ACL. http://dl.acm.org/citation.cfm?id=991937

Villegas, M., Melero, M., Gracia, J. and Bel, N. 2016. Leveraging RDF Graphs for Crossing Multiple Bilingual Dictionaries. In LREC 2016 Proceedings: 613–622. http://repository.dlsi.ua.es/242/1/pdf/175_paper.pdf