By Anna Feldman

While supervised corpus-based methods are highly accurate for various NLP tasks, including morphological tagging, they are difficult to port to other languages because they require resources that are expensive to create. As a result, many languages have no realistic prospect of morpho-syntactic annotation in the foreseeable future. The method presented in this book aims to overcome this problem by significantly limiting the necessary data and instead extrapolating the relevant information from another, related language. The approach has been tested on Catalan, Portuguese, and Russian. Although these languages are only relatively resource-poor, the same method can in principle be applied to any inflected language, as long as an annotated corpus of a related language is available. The time needed to adjust the system to a new language is a fraction of the time needed for systems with extensive, manually created resources: days instead of years. This book touches upon a number of topics: typology, morphology, corpus linguistics, contrastive linguistics, linguistic annotation, computational linguistics, and Natural Language Processing (NLP). Researchers and students who are interested in these scientific areas, as well as in cross-lingual studies and applications, will greatly benefit from this work. Scholars and practitioners in computer science and linguistics are the prospective readers of this book.

A Resource-Light Approach to Morpho-Syntactic Tagging

Best study & teaching books

Making Mathematics with Needlework: Ten Papers and Ten Projects

Mathematical craftwork has become extremely popular, and mathematicians and crafters alike are fascinated by the relationship between their crafts. The focus of this book, written for mathematicians, needleworkers, and teachers of mathematics, is on the relationship between mathematics and the fiber arts (including knitting, crocheting, cross-stitch, and quilting).

50 Math and Science Games for Leadership

Did you like Math or Science in school? Have you played games that stimulated your thought processes for Math and Science? Are you trying to be creative in your Math, Science, or Leadership class? Can leadership be taught? Is leadership an art, a science, or Math? Are you seeking to make an impact on your training program with creative games?

The Art Teacher's Survival Guide for Secondary Schools: Grades 7-12

A useful compendium of 75 creative art projects for art educators and classroom teachers. This authoritative, practical, and comprehensive guide offers everything teachers need to know to conduct an effective arts instruction and appreciation program. It meets secondary art teachers' unique needs for creating art lessons that cover everything from the fundamentals to digital media careers for aspiring artists.

Additional info for A Resource-Light Approach to Morpho-Syntactic Tagging

Example text

The MDL (Minimum Description Length) approach is based on the insight that a good grammar can be used to describe the corpus most compactly ([...] i.e., the grammar matches the corpus well; Hana and Culicover 2008). Goldsmith (2001) uses an MDL approach in an algorithm that acquires concatenative morphology (with 86% precision) in a completely unsupervised manner from raw text. More specifically, Goldsmith uses MDL to accept or reject the hypotheses proposed by a set of heuristics. There are also approaches which do not use probability or information-theoretic measures at all, but instead seek purely discrete relatedness measures and symbolic factorizations.
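The accept-or-reject idea can be illustrated with a minimal sketch. This is not Goldsmith's actual cost model: the flat per-letter cost, the toy corpus, and the names `description_length`, `no_split`, `split_suffix`, and `accept` are all assumptions made for illustration. The sketch counts the bits needed to spell out a lexicon of stems and suffixes, adds the bits needed to encode the corpus as pointers into that lexicon, and accepts a proposed segmentation only if the total shrinks.

```python
import math
from collections import Counter

# Assumption: every letter costs the same number of bits (26-letter alphabet).
BITS_PER_LETTER = math.log2(26)


def lexicon_cost(items):
    """Bits to spell out each distinct item once, plus one end marker each."""
    return sum((len(item) + 1) * BITS_PER_LETTER for item in set(items))


def pointer_cost(tokens):
    """Bits to encode the token sequence at -log2(relative frequency) per token."""
    counts = Counter(tokens)
    total = len(tokens)
    return sum(-math.log2(counts[t] / total) for t in tokens)


def description_length(corpus, split):
    """Total bits: the model (stem and suffix lists) plus the corpus given the model."""
    pairs = [split(word) for word in corpus]
    stems = [stem for stem, _ in pairs]
    suffixes = [suffix for _, suffix in pairs]
    model_bits = lexicon_cost(stems) + lexicon_cost(suffixes)
    data_bits = pointer_cost(stems) + pointer_cost(suffixes)
    return model_bits + data_bits


def no_split(word):
    """Baseline hypothesis: every word is an unanalyzed stem with an empty suffix."""
    return word, ""


def split_suffix(word):
    """Hypothetical heuristic: strip one of a few candidate suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)], suffix
    return word, ""


def accept(corpus, hypothesis, baseline=no_split):
    """Accept the hypothesis only if it shortens the total description."""
    return description_length(corpus, hypothesis) < description_length(corpus, baseline)


corpus = [
    "walk", "walks", "walked", "walking",
    "talk", "talks", "talked", "talking",
    "jump", "jumps", "jumped", "jumping",
]
print(accept(corpus, split_suffix))
```

On this toy corpus the segmented analysis is shorter because three stems and four suffixes replace twelve separately spelled word forms, so the hypothesis is accepted.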

Chapter 3. Previous resource-light approaches to NLP

[...]tization problem is often captured in Yarowsky and Wicentowski's (2000) work by a large root+POS↔inflection mapping table and a simple transducer to handle residual forms. Unfortunately, such an approach is not directly applicable to highly inflected languages, such as Czech or Russian, where sparse data becomes an issue. Yarowsky and Wicentowski (2000) use Cucerzan and Yarowsky's (2000) bootstrapping approximation of tag probability distributions.

The initial state annotator tags each word in the corpus with a list of all allowable tags. Since each word now carries a set of tags rather than a single tag, the transformation templates must also be changed. Instead of being templates which change one tag to another, they select a tag from the set of tags. That is, they change a word's tagging from a set of tags to a single tag. [...] 12). The context C can be defined as before, although Brill (1999) limits the context to the previous (following) word/tag.
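A set-reducing transformation of this kind can be sketched as follows. This is a minimal illustration, not Brill's implementation: the `Rule` class, the tag names, and the toy lexicon are hypothetical, and for brevity the context is limited to an unambiguous tag on the previous or following token (a word-based context would work analogously).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    """Change a tagging from the set `from_tags` to the single tag `to_tag`
    when the neighbour at `position` is unambiguously tagged `context_tag`."""
    from_tags: FrozenSet[str]
    to_tag: str
    position: int       # -1 = previous token, +1 = following token
    context_tag: str

    def __post_init__(self):
        # A rule may only select a tag that is already in the set.
        if self.to_tag not in self.from_tags:
            raise ValueError("to_tag must be a member of from_tags")


def initial_annotation(words: List[str], lexicon: Dict[str, Set[str]]) -> List[FrozenSet[str]]:
    """Initial state: every word receives the set of all its allowable tags."""
    return [frozenset(lexicon[word]) for word in words]


def apply_rule(tagging: List[FrozenSet[str]], rule: Rule) -> List[FrozenSet[str]]:
    """Apply one rule everywhere it matches; contexts are read from the input
    tagging, so all matches are evaluated against the same state."""
    result = list(tagging)
    for i, tags in enumerate(tagging):
        j = i + rule.position
        if tags == rule.from_tags and 0 <= j < len(tagging):
            if tagging[j] == frozenset({rule.context_tag}):
                result[i] = frozenset({rule.to_tag})
    return result


# Hypothetical toy lexicon and sentence.
lexicon = {"the": {"DET"}, "walk": {"NOUN", "VERB"}, "is": {"VERB"}, "long": {"ADJ"}}
words = ["the", "walk", "is", "long"]

rule = Rule(frozenset({"NOUN", "VERB"}), "NOUN", position=-1, context_tag="DET")
tagging = apply_rule(initial_annotation(words, lexicon), rule)
print([sorted(tags) for tags in tagging])
```

In a full learner, many candidate rules of this shape would be scored on unannotated text and the best one applied at each iteration; the sketch shows only how a single rule narrows an ambiguous tag set.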
