Introduction to post-editing
1. Introduction: Why post-editing MT outputs?
Is it really necessary for a translator to acquire post-editing skills? If the machine will replace the work of a technical translator, why acquire these “new” skills? The answer is simple. Technical translators need to acquire these skills, or at least be familiar with the peculiarities of this task, because there is currently an increasing demand in the market to post-edit texts coming from machine translation (MT) engines in order to attain different levels of quality.
Bartolomé Mesa-Lao
[email protected]
Center for Research and Innovation in Translation and Translation Technology
Copenhagen Business School, Denmark
22/05/2013 – SEECAT project
This hand-out presents the basic concepts of post-editing in the localization industry.
Aims of the session:
To acquire basic concepts about post-editing.
To reflect on the concept of quality in localization.
To identify different types and levels of post-editing.
To present general post-editing guidelines.
Contents
1. Introduction: Why post-editing MT outputs?
2. Machine Translation
2.1. MT integrated in the localization process
3. Basic concepts in post-editing
3.1. Defining Post-editing
3.2. Post-editing vs. Translation
3.3. Post-editing vs. Revision
3.4. Post-editor profile
3.5. Pre-editing and controlled language
4. Common MT errors
5. Quality in Translation
5.1. Quality concepts in Localization
5.2. Quality of post-edited material: assessment
6. Types of post-editing
6.1. Fast post-editing
6.2. Full post-editing
7. General post-editing guidelines
7.1. Guidelines for fast post-editing
7.2. Guidelines for full post-editing
8. Post-editing effort and productivity
8.1. Temporal post-editing effort
8.2. Cognitive post-editing effort
8.3. Technical post-editing effort
9. References
From the industry perspective, there are several reasons for using MT: a) to lower production costs, b) to publish more content, c) to publish into more languages, d) to publish in less time. In a recent survey carried out by TAUS (2010), 52% of the sixty-seven companies surveyed in the US, Europe and Asia declared that they provided post-editing services to their clients on a regular basis, and that 74% of the resources they used to carry out the task were freelance translators. As MT improves, the role of post-editors might eventually change, but there will still be a need for their involvement in the process of creating automatic output, either by editing the output or by implementing changes to the corpus or engines. For example, post-editors could be involved in selecting an adequate corpus and cleaning up the data so that the output is more suitable for a particular customer, as well as in providing constant feedback to improve the engine’s performance. There is room for translators in this “new” field, but there is also a need to be prepared and to acquire knowledge, so that translators can be the most capable resource to carry out these tasks and can contribute to the development of MT and of post-editing techniques and guidelines. According to Vasconcellos and León (1985), who led the first post-editing experience at the PAHO (an organization with one of the longest traditions in MT implementation and post-editing), their experience “has led to the conclusion that post-editing requires a trained professional translator” because “only an experienced translator will be aware of the words whose variable meanings are dependent on extra linguistic context”. Text disambiguation requires the “attention of a translator with training, experience, good knowledge of the subject matter, vocabulary in both languages, and technical understanding of what is meant by the text”.
They also explained that the post-editor is the professional best suited to give feedback about the engine and to suggest improvements. Moreover, acquiring post-editing skills might be good practice in translation training. As Kliffer (2008) concludes, following an experiment in which translation students post-edited raw output, “post-editing impressed upon our students the importance of a holistic approach to interpreting the source text and translating the phrase rather than the word. The activity also provided them with a taste of what to expect if they undertake a career in translation.” He also remarked that the experience built students' confidence and increased their motivation. In conclusion, training in post-editing not only serves the purpose of acquiring new skills for MT-related tasks; it also helps to open up different perspectives on the already “known” translation tasks.
2. Machine Translation
The definition of machine translation on the homepage of the European Association of Machine Translation (EAMT) reads: Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains. (EAMT 2008)
Although the definition is broad, since computers are used to translate texts in other forms that are not called “machine translation”, such as translation memories, it reflects the use of MT today. MT should be “useful in a number of specific domains” but not necessarily a replacement for human translation. The idea of a fully automatic high quality translation (FAHQT) has been replaced by a more practical use of human aided machine translation (HAMT) within restricted environments. Machine translation is used in different industries more or less successfully, especially in those that produce large volumes of highly repetitive content (as is the case in the localization industry) that can be easily “understood” by an engine. MT is frequently associated with controlled language and controlled translation because, if technical writers of source texts follow repetitive syntactical patterns, they will facilitate the implementation of MT solutions in a given company, thus increasing its translation capacity and saving costs. Even in this case, not everything is automatic in MT; there is a need for human intervention either before or after the machine has processed the data. The intervention before the machine processes the data is called “pre-editing”, and it occurs at the source-language level to change language structures so that the machine translation engine is not confronted with ambiguous options. The intervention after the machine processes the data is called “post-editing”, and it occurs at the target-language level to correct frequent errors in the machine-translated output. Post-editing is still essential to produce an end-quality product, meaning a product without the frequent language mistakes found in machine-translated output.
2.1. MT integrated in the localization process
The standard localization workflow consists of a pre-production or analysis phase, a production phase and a post-production phase. During the pre-production phase, files are analyzed to establish the type of files, subject matter, language combination and volume by means of word counts, thus establishing the complexity of the project. This information serves to calculate the most frequent variables in a localization project: time, cost and quality. The word counts are frequently done using a computer-aided translation (CAT) tool, such as SDL Trados, MemoQ, Déjà Vu or a client’s proprietary tool. Project Managers or Localization Engineers, depending on the size of the agency, carry out word counts against an existing translation memory (TM) using a specific language combination. This process determines the level of full and fuzzy matches in the text. These figures are used in all the financial transactions of a localization project (quotations, purchase orders and invoices). There are standards already set for different levels of fuzzy matches, and projects are paid and charged according to these standards (even if fuzzy match payment varies somewhat in the market). In recent years, however, there has been a change in the workflow of localization projects.
[Figure: workflow diagram. The source text (0% translated) goes through three phases. Phase 1: translation memories (TMs) produce a hybrid text, x% translated, containing only the retrieved matches. Phase 2: machine translation (MT) handles the untranslated segments, giving a hybrid text that is 100% translated but with MT errors. Phase 3: post-editing by a human translator/post-editor produces the target text, 100% translated.]
Figure 1. Current translation workflow for most language service providers (LSPs)
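As a rough illustration of the workflow in Figure 1, the sketch below routes each source segment either to a TM match or to MT before it reaches the post-editor. It is only a sketch under stated assumptions: the function names, the 75% fuzzy threshold and the toy MT engine are hypothetical, and the character-level similarity from Python's standard `difflib` module merely stands in for the matching algorithms that real CAT tools use.

```python
from difflib import SequenceMatcher


def best_tm_match(segment, tm):
    """Return (score, target) for the TM entry most similar to `segment`.

    `tm` maps source segments to stored translations; score is 0..100.
    """
    if segment in tm:  # identical source text: a full (100%) match
        return 100, tm[segment]
    best_score, best_target = 0, None
    for source, target in tm.items():
        # Character-level similarity, floored to a whole percentage.
        # Assumption: a stand-in for a CAT tool's own fuzzy-match metric.
        score = int(100 * SequenceMatcher(None, segment, source).ratio())
        if score > best_score:
            best_score, best_target = score, target
    return best_score, best_target


def route_segment(segment, tm, mt_engine, fuzzy_threshold=75):
    """Phase 1 (TM), then Phase 2 (MT); every draft still goes to Phase 3."""
    score, target = best_tm_match(segment, tm)
    if score == 100:
        origin = "TM full match"
    elif score >= fuzzy_threshold:  # hypothetical threshold
        origin = "TM fuzzy match"
    else:
        origin, target = "MT", mt_engine(segment)
    return {"source": segment, "draft": target, "origin": origin, "score": score}
```

The resulting list of drafts corresponds to the hybrid text that the human post-editor receives in Phase 3; the `origin` field tells the post-editor whether a draft came from the TM or from the MT engine, which matters because the two tend to contain different kinds of errors.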
3. Basic concepts in post-editing
In this section we will look at the basic concepts necessary to understand the nature of this task as opposed to other, already familiar tasks in translation/localization. It is quite common for students and professional translators to be trained (academically) in translation strategies and theories, but training in revision and post-editing is rarer. Therefore, it is advisable to have a clear idea of the tasks involved in post-editing and revision, as well as in translation itself, and to have a basic knowledge of how MT operates. Looking at different concepts will help us to define the task and focus on its execution.
3.1. Defining Post-editing
Post-editing can be defined as reviewing a pre-translated text generated by an MT
“Post-editing: examination and correction of the text resulting from an automatic or semiautomatic machine system (machine translation, translation memory) to ensure it complies with the natural laws of grammar, punctuation, spelling and meaning” according to the Draft of European Standard for Translation Services (in Joscelyne 2006). In this last definition, post-editing also refers to the editing of TM output. Although post-editing MT output and TM output tend to run in parallel, they require different skills, or at least a different focus on different types of errors. We will see this later on when comparing post-editing with revision. Although it is not mentioned in Joscelyne’s definition, it is important to highlight that the task of post-editing is closely related to the quality expectations set within a project.
3.2. Post-editing vs. Translation
Now that we have a definition of post-editing, how does post-editing differ from translation? And how is post-editing related to translation? There are many theories that give different definitions of translation, such as the traditional, functionalist or communicative approaches. However, translation is seen in localization as an individual step in which the source text is given an equivalent target text. The EN-15038 (the European quality standard for translation services) defines translation as “the rendering of the written text in the source language into the target language”. On many occasions this is only one single string of source text rendered into another string of target text. Translation, as most of us understand it, is something more “sophisticated” and broader, encompassing an in-depth knowledge of each language and culture in order to communicate the same meaning in both languages. In the localization industry, however, a simpler concept is used. In technical translation, the standard translation process is as follows: translators translate the source text using a substantial amount of given reference material (style guides, glossaries, dictionaries, term banks and TMs). Then, they will or should revise their work and correct any possible mistakes. And finally, if there is enough money in the budget, a reviewer will go over the translation again and check issues to do with language (including specific terminology), transfer and layout. The difference is that, during the post-editing task, the translator already has a draft translation of the source text (the MT output) and, depending on the quality provided by the MT engine, the output might require a) translating again from scratch (if it is not useful), b) correcting quite a lot of errors, c) correcting a few errors or d) simply accepting the proposal without any change.
Therefore the post-editor is faced with two source texts (the actual source and the MT proposal). In this sense, post-editing is closer to reviewing than to translating. During this
In a commercial setting, revising is carried out in order to improve texts, to supervise the quality produced by contractors, and to check work done by new employees or contractors. Sometimes this step is not carried out at all, either because of time or budget constraints or because the process is already defined as such and it is deemed more efficient not to revise. Although the EN-15038 specifies that the revision needs to be carried out by a third party, not all translation companies follow this standard. The fall in the price of translation has also contributed greatly to the elimination of this quality step. Post-editing also involves revising, but the main difference is the source: in post-editing the text comes from an MT engine (output), whereas in revising the source is a translation done by a human translator. As a consequence, the resulting target text contains different types of errors from those found in a human translation. These errors will need to be corrected in a different way depending on the purpose of the text. As Laurian (1984) states, “post-editing is not revision, nor correction, nor rewriting. It is a new way of considering a text, a new way of working on it, for a new aim”. Krings (2001), who has carried out the most comprehensive post-editing research to date, also points out that this task deals with recurring, predictable errors, while revising checks for mistranslations or omissions. Later on we will see the most frequent errors found in raw output, but in general terms, the errors made by a human translator are random and unpredictable, while MT errors follow certain patterns that can be anticipated according to the language combination, the type of text and the engine used. On some occasions human errors are more difficult to spot, but at the same time the texts are easier to read as they follow a “human logic”. Post-editing involves revising a text that might follow an odd syntactical structure.
This type of text puts a strain on the reviewer that is quite different from the effort required to revise human translations. As Krings points out, “working with three different texts in the post-editing situation with source text (source text, machine translation, and the subject’s own target text) leads to an additional cognitive load vis-à-vis normal translation with only two texts involved”. In conclusion, post-editing appears to be a more demanding task than translation in terms of cognitive effort. What seems to be clear is that both revising and post-editing require specific skills, and that translators are key agents in both activities.
3.4. Post-editor profile
Having analyzed what post-editing is and how it differs from other translation-related tasks, the natural next step is to look into the profile needed to carry out the task and how it differs from the requirements for a translator.
Proficient knowledge of the source language and contrastive knowledge of source and target languages.
Advanced word processing skills; full key proficiency and efficiency in cursor positioning.
Effective use of search and replace functions.
Positive, tolerant and open-minded predisposition towards MT.
Confidence in abilities and technical expertise.
Recognition of typical or repetitive MT errors.
Ability to use macros and coded dictionaries.
Advanced terminology management skills.
Background knowledge of MT technology and history, including types of post-editing and different levels of expected quality.
Pre-editing and controlled language skills.
Knowledge of controlled authoring tools.
Programming skills (for automatically correcting errors).
Text linguistics knowledge.
Some of these skills are shared with those of a translator. However, there are additional skills such as MT technology knowledge and tolerance, pre-editing and controlled language skills or programming skills that are not normally required when looking for translators to take part in a post-editing project.
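To give an idea of what “programming skills (for automatically correcting errors)” might look like in practice, here is a minimal sketch that applies a list of search-and-replace rules to raw MT output. Everything in it is an assumption made for illustration: the rule list, the function name and the sample sentence are hypothetical, and a real project would derive its rules from the recurring errors actually observed for a given engine and language pair.

```python
import re

# Hypothetical correction rules: (pattern, replacement).
# The first fixes a literal French rendering of the phrasal verb "carry out";
# the second collapses repeated spaces.
RULES = [
    (re.compile(r"\bporter dehors\b"), "exécuter"),
    (re.compile(r" {2,}"), " "),
]


def apply_rules(text, rules=RULES):
    """Apply each rule in order; return the corrected text and the number of changes."""
    changes = 0
    for pattern, replacement in rules:
        text, count = pattern.subn(replacement, text)
        changes += count
    return text, changes


corrected, changes = apply_rules("Il faut  porter dehors la commande.")
print(corrected, changes)
```

Rules of this kind only suit errors that are truly systematic; anything that depends on context still needs the post-editor's judgment.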
3.5. Pre-editing and controlled language
There are several pre-editing techniques that help reduce the post-editing effort: following a style guide (technical writers), controlled terminology (using a set of unique terms when writing) and controlled language. Controlled language means that the source text (e.g. a technical text) is written in a standard way to avoid lexical ambiguity and complex grammatical structures, thus making it easier for the user to read and understand and, consequently, easier to process with technology such as TMs or MT. As a consequence, texts have a consistent and direct style, they can be easily reused, they are easier and cheaper to translate, and they are easier to read. Controlled language focuses mainly on vocabulary and grammar, and it is intended for very specific domains, even for specific companies. It is useful not only for creating high quality MT output but also for taking full advantage of existing TMs (avoiding fuzzy matches caused by minor or unnecessary lexical or syntactical changes throughout a text). Basically, controlled language helps to disambiguate terms and sentences by keeping a very high level of consistency both externally (terms) and internally (grammatical structure).
Avoid strings of more than three nouns.
Avoid too many adjectives modifying a noun.
Use determiners.
Avoid spelling mistakes and make sure punctuation is correct.
Use the active voice.
Use “that”, “in order to” and “which” after verbs that admit omissions.
When using phrasal verbs, make sure that the preposition is as close to the verb as possible.
Repeat prepositions in conjoined constructions.
Use parallel structures in coordinated sentences.
Always use the same term for the same item/product: avoid synonyms.
Use general dictionary terms rather than obscure terms.
Use acronyms and abbreviations that will not cause ambiguity.
For example, “When reading this text, make sure to take notes” becomes “When you are reading this text, make sure that you take notes.” The consistency of the source text guarantees a smooth process when using MT or TMs and reduces costs for the companies that use it. Additionally, it spares translators from constantly having to query obscure passages in the text. However, controlled language is not always applied to source texts that will then be machine translated and eventually post-edited. Although post-editing time is reduced considerably, the initial investment required to apply controlled language is high, and therefore companies might skip this step. Post-editors will find that a vast number of the texts they work with have been neither written in controlled language nor pre-edited.
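One of the rules listed above, always using the same term for the same item, lends itself to a simple automated check at the pre-editing stage. The sketch below is only an illustration under stated assumptions: the term list and the function name are hypothetical, and a real controlled authoring tool would work from the company's own terminology database.

```python
import re

# Hypothetical controlled terminology: disallowed variant -> approved term.
APPROVED_TERMS = {"monitor": "screen", "display": "screen"}


def find_term_violations(text, approved=APPROVED_TERMS):
    """Return (position, variant found, approved term) for each disallowed synonym."""
    violations = []
    for variant, preferred in approved.items():
        pattern = re.compile(rf"\b{re.escape(variant)}\b", flags=re.IGNORECASE)
        for match in pattern.finditer(text):
            violations.append((match.start(), match.group(0), preferred))
    return sorted(violations)


for position, found, preferred in find_term_violations(
    "Turn on the screen. Clean the Display weekly."
):
    print(f"char {position}: replace '{found}' with '{preferred}'")
```

A check like this only flags candidates; the technical writer still decides whether the variant is really a synonym in that context.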
4. Common MT errors
There are several classifications of MT errors. The aim of classifying the errors is not only to improve MT output by providing feedback but also to raise awareness amongst post-editors. If they know the types of errors frequently found when performing this task, it is easier to spot them and to know what to change, thus avoiding unnecessary changes. It is important to point out that, depending on the type of engine, the content and the language pair, the types of errors might change considerably. These are just examples of errors and of error typology. Laurian (1984) distinguishes between three types of errors:
1.1. General vocabulary
1.1.1. Function words (articles, pronouns, conjunctions)
1.1.2. Other categories (verbs, nouns, adjectives)
1.2. Terminology
1.3. Homographs/Polysemic words (words like “uses”, “report” and “starts”)
1.4. Idioms (MT systems will tend to translate them literally)
2. Syntactic errors
2.1. Sentence/Clause analysis (wrong analysis of structures, relative pronouns, use of commas)
2.2. Syntagmatic structures (wrong interpretation of the past participle, for example)
2.3. Word order
3. Grammatical mistakes (for example, the translation of the pronoun “it”, gender in the Romance languages, or phrasal verbs such as “carry out”, rendered as “porter dehors” in French instead of “exécuter”)
3.1. Tense
3.2. Number
3.3. Active / passive voice
4. Errors due to defective input text (mistakes in the source language)
Krings (2001), along similar lines, classifies the errors found in the MT output of his extensive study as shown below. The classification is not intended as a general one; it applies to his particular output. However, it is useful to see how errors were classified in that study.
Lexical: Part of speech recognition error: verbs recognized as nouns or vice versa.
Lexical: Other: wrong use of certain terms in the context.
Morphology: Word formation: wrong formation of words. For example, Drähten des Telefons instead of Telefondrähte.
Morphology: Other: incorrect infinitive form, incorrect plural form.
Syntax: Word order
Syntax: Other: wrong use of infinitives
Stylistic usage norms
Punctuation: incorrect comma usage
Textual coherence: incorrect gender of anaphoric reference form, inconsistent form of address for text addressees (Du and Sie)
Textual pragmatics: inappropriate form of address for text addressees.
Literal transfer from ST
He rightly points out that several MT errors can overlap; each error can sometimes be assigned to different categories. Although all these classifications are valid for their specific purpose of a particular engine or
word for word translation, the source text and equivalence, the target text and the receiving reader/culture, the communication act and the role of translator as mediator, the purpose (skopos) of the translation, or even the mental state of the translator and her cognitive processes. Of course, every theory draws from the previous one and they all seem to live together, not altogether in harmony, but at least in constant development through these same differences. It is obvious then that, depending on the translation theory, a “good translation” will be classified differently. What might appear to be good for one theory might not be sufficient, and sometimes completely wrong, for another theory. Quality is therefore an obscure and elusive concept. In MT the predominant theory, as Chesterman (2000) reflects, is equivalence in its purest form: “strict equivalence is a sine qua non. Instead of waffling about mystical energy, practitioners of machine translation are concerned with practical rules of language use. They have to believe that rules exist, and that they are as stable as those of gravity.” Pym (2004) also points out that equivalence is the prevailing translation theory behind all processes in localization. And it is not only in translation theory that we find divergent points of view; professionals in the translation field seem to have their own very particular view of what a good translation is, and sometimes, if they are asked about it (what is quality for you?), it is hard for them to come up with a definition. When translation is a transfer of a source string into a target string with the least amount of changes and at maximum speed, equivalence becomes the prevailing concept in any translator’s behavior when translating, even without their being conscious of the theory behind it.
I would add that the skopos theory also plays a very important role, as the purpose of the translation and all the players involved in the translation activity play a fundamental role in localization and in machine translation post-editing. In this context, a “good translation” is one that renders an equivalent target text according to the skopos of the project in question. Therefore, translation quality should be judged according to these variables and not according to an abstract notion of linguistic quality. In the localization industry, quality is frequently seen as a series of procedures carried out in order to guarantee a “linguistic” quality that is, then again, very volatile and that tends to be simplified by classifying errors into different categories and counting them. The translation is a Pass if the resulting score reaches a set threshold, or a Fail if it falls below that threshold. In the first case, the overall quality is deemed to be “good” enough. Brian Mossop, who has written a complete and intelligent guide on editing and revising for translators (2001), distinguishes between “quality control” and “quality assessment” and explains that both contribute to “quality assurance”.
5.1. Quality concepts in Localization
In Localization, the concept of quality is considered an implicit value provided in all translations carried out by Language Service Providers (LSPs) or freelance translators. Quality is in most cases a given value, an assumed service provided to the client. When working in a translation agency or as a translator, delivering good quality is a must. Good quality is, then, variable depending on the customer, its product, audience, style guides, reference material, and QA group, amongst others. Since quality is difficult to define, everyone refers to it in very general and abstract terms. What customers mean by a quality translation, especially in technical translation and localization, is one that reflects exactly the content of the source text. What does “reflecting exactly the content” mean? As we saw before, translation is perceived as the rendering of an equivalent text, almost as a word by word exercise. The target text should contain exactly what the source text contains with minor exceptions, that is, a few adaptations to the local markets. LSPs, on the other hand, use a much more functionalist approach: the quality provided varies according to the translation brief discussed with the customer, and the focus is on the customer’s needs and what they pay for. If they do not pay for review, then the translation is not reviewed by a third party. Translators, for their part, have different approaches. On some occasions, they will work for a customer oriented purpose and, on other occasions, they might work towards their own idea of quality, an idea that is related to the use of correct grammar and language style.
The truth is that there is not much time allowed in localization to offer a very well written translation (in Mossop’s definition), and we aim at a well written translation in most cases, while reality more often than not obliges translation providers in general to produce a Fully accurate and even Intelligible translation. Most localization agencies, however, will follow procedures to guarantee the quality of the translated products. These procedures cover everything from correctly selecting the translators to checking the quality of the translation or offering the right translation brief during the project. This set of procedures is normally known as Quality Assurance (QA) and it is designed to assess the quality of products or services provided. QA implies that a series of steps are taken in order to guarantee quality and that corrective actions are in place in case errors are detected in the product or service. Normally, companies will use procedures and indicators to monitor this process. Wikipedia offers a very clear definition of Quality Assurance: “Quality Assurance refers to planned and systematic production processes that provide confidence in a product's suitability for its intended purpose”. It refers to a set of activities intended to ensure that products (goods
If we apply all the concepts that we have seen before, we can conclude that quality is not a set of grammatical rules set in stone or an ideal to try and reach. It is a variable concept that depends very much on the characteristics of a given project, as defined on many occasions by the customer and by the translation agency. More often than not, there will be no clear information about the quality of the MT output. Depending on who is providing the information about the output, the quality feedback could be overly enthusiastic or extremely negative. It is rare to receive a serious analysis of the output with samples and scores. Some MT output providers might send an automatic score (BLEU, METEOR, NIST or TER) that summarizes in a single number how close the output is to human quality. Unfortunately, this number might mean very little in practical terms. It is advisable to assess the output for each language combination using different parameters (for example, Grammar, Terminology, Format) on a randomly selected set of strings extracted from the overall content (which could be grouped by segment length), where a post-editor can then classify the quality of each segment (Excellent, Good, Poor, or from 0 to 4, or any other classification). Even though time is required for this assessment, it will give a clear idea of the productivity savings the team of post-editors might be expected to obtain during the project. If the post-editor and the translation team do not have this information, they are working pretty much in the dark in terms of prices, and the team might be overwhelmed by the number of e-mails sent by post-editors complaining about the quality of the output, with little data available to discuss the matter. The customer’s quality expectations for the final project need to be very specific, as post-editing can be “superficial” or “thorough” depending on the purpose of that translation.
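The assessment step described above (a random sample of strings, grouped by segment length, rated per parameter) can be sketched in Python. This is only an illustration under stated assumptions: the length bands, the function names and the 0 to 4 scale per parameter are hypothetical choices, not something the hand-out prescribes.

```python
import random
from collections import defaultdict


def length_band(segment):
    """Classify a segment by word count (hypothetical band limits)."""
    words = len(segment.split())
    if words <= 5:
        return "short"
    if words <= 15:
        return "medium"
    return "long"


def sample_for_assessment(segments, per_band=2, seed=0):
    """Pick up to `per_band` random segments from each length band."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    bands = defaultdict(list)
    for segment in segments:
        bands[length_band(segment)].append(segment)
    return {
        band: rng.sample(items, min(per_band, len(items)))
        for band, items in bands.items()
    }


def average_scores(ratings):
    """Average the post-editor's ratings per parameter.

    `ratings` is a list of dicts such as
    {"Grammar": 3, "Terminology": 4, "Format": 2}, one per segment,
    each value on an assumed 0-4 scale.
    """
    collected = defaultdict(list)
    for rating in ratings:
        for parameter, score in rating.items():
            collected[parameter].append(score)
    return {p: sum(scores) / len(scores) for p, scores in collected.items()}
```

The per-parameter averages give the team some data to discuss with the customer or the MT provider, which a single automatic score does not.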
As in revision in general, there are different types of expected quality levels. Post-editing is generally classified into two types: full post-editing, leading to human-quality translation, and rapid post-editing, with minimal corrections for text “gisting”. Between these two options, there is a wide range of alternatives. Establishing the quality expected by the customer will help determine the price as well as write specific instructions for post-editors. If this is not done, some might correct only major errors, thinking that they are obliged to use the MT proposal as much as possible, while others will correct major errors, minor errors, and even acceptable proposals because they feel the text has to be as human as possible. In general terms, customers know their “readers” and the type of text they want to produce. Post-editors should have a very clear idea of the expected quality. Otherwise, they will not be able to start the assignment.
5.2. Quality of post-edited material: assessment
One of the reasons to introduce MT in the localization cycle is to save costs. It would not make
Consistency to coherence in terminology across the project, and Format to correct use of tags, correct character styles, correct translation of footnotes, hotkeys not duplicated, correct flagging, correct resizing, and correct use of the parser, template or project settings file. The errors found are then assigned a severity that can be Minor, Major or Critical. All errors are weighted according to this severity. For example, an error classified as Minor counts as 1 point; if classified as Major, 5 points; and if it is deemed to be Critical, it is worth the total amount of allowed errors plus 1. Similarly, the J2450 errors are classified as:
Wrong term
Wrong meaning
Omission
Structural error
Misspelling
Punctuation error
Miscellaneous error
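The severity weighting described before the J2450 list (Minor 1 point, Major 5 points, Critical worth the allowed total plus 1) can be worked through with a short Python sketch. The function names and the pass rule (a sample passes when its points do not exceed the allowed total) are assumptions made for illustration; the hand-out only gives the weights.

```python
def error_points(severities, allowed_points):
    """Sum the weighted points for a list of error severities.

    Weights follow the text: Minor = 1, Major = 5,
    Critical = allowed_points + 1.
    """
    weights = {
        "minor": 1,
        "major": 5,
        "critical": allowed_points + 1,
    }
    return sum(weights[severity] for severity in severities)


def passes(severities, allowed_points):
    """Assumed pass rule: total points must not exceed the allowance."""
    return error_points(severities, allowed_points) <= allowed_points


# Two Minor errors and one Major error against an allowance of 10 points:
# 1 + 1 + 5 = 7 points, which stays within the allowance.
print(error_points(["minor", "minor", "major"], 10))  # 7
print(passes(["minor", "minor", "major"], 10))        # True
# A single Critical error is worth 11 points and exceeds the allowance.
print(passes(["critical"], 10))                       # False
```

Under this assumed rule, the Critical weight guarantees that one critical error is enough to fail the sample, whatever the allowance.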
The engine used
The language pair
The desired quality specified by the customer or the purpose of the translation
The volume of documents that needs to be translated
The time available for the translation
The structure of the given text
The type of “readers” or “users” for that particular text
The use of the final text
Depending on these factors, there will be different levels, ranging from full post-editing, leading to human quality, to rapid post-editing, with minimal corrections for text “gisting”. In MT and post-editing, it is common to differentiate between texts that will be read quickly, are for internal use and are perishable, and texts that will be published and are intended for a wider audience. In the first case, the text needs to be understandable and accurate, but the style is not fundamental and some grammatical and spelling errors are even admissible. In the second case, the text needs to be understandable and accurate, but the style, grammar, spelling and terminology also need to be similar to those produced by a human translator. The texts are
What should these guidelines cover? Obviously, it is difficult to answer this question, as it will depend on the quality of the output, the language combination, and the usual variables in MT. Besides, post-editors cannot be burdened with a whole book on post-editing, as time is of the essence and their work needs to be profitable. The guidelines should be short and precise and they should cover the following areas:
Description of the type of engine used.
Description of the source text (type and structure of source text).
Brief description of the quality of output for that language combination.
Expected quality by the customer (as described above).
Scenarios when to discard a not useful segment (post-editors should have an idea of how much time to spend in order to “recycle” a segment or discard it altogether).
Typical type of errors for that language combination that should be corrected (including reference to tagging and links).
Changes to be avoided (according to customer’s expectations, for example certain stylistic changes).
How to deal with terminology (according to output analysis and customer’s expectations. The

Post-editors should read the source segment first to understand the meaning of the sentence. Then, they should read the MT suggestion, so that they can decide whether it can be recycled in post-editing. There are some basic pointers to help with this decision:
The suggestion should be applied if:
1. Large pieces of the sentence/term are correct (these can be reused during post-editing).
2. The raw MT quality is very high, although some minor corrections may be needed.
3. Raw MT output contains several errors which might slow down the post-editing task. However, the post-editor types slowly, so post-editing still proves to be faster than translating from scratch.
4. The MT output has the correct meaning and it is completely understandable.
You should NOT apply the suggestion if:
1. Raw MT does not make any sense and it would take longer to post-edit than to translate from scratch.
2. The user takes a few minutes trying to figure out what the raw MT is trying to say, but it doesn’t make sense.
and, therefore, standard metrics do not exist yet, but mainly to the number of variables to consider. At any rate, we have little information on the productivity of translators’ work in general. The industry uses standards (for example, 2000 to 2500 translated words per day), but we all know these standards are hardly applicable to all translators. Moreover, there are also agreed metrics on TM editing (percentages paid according to fuzzy match level), but most translators would agree that these percentages hardly represent the amount of work they need to perform on each proposed segment. The studies dealing with productivity when post-editing MT segments (such as Krings 2001, O’Brien 2006, Guerberof 2008 and 2009) do not show pronounced productivity increases when using MT. Frequently, however, MT developers will claim that their engine dramatically increases the translator’s productivity without necessarily making their methodology available. There is definitely uncertainty about the gains when using MT and post-editing. A figure that is normally used when discussing productivity in post-editing is 5,000 words per day, but the reality is that each project will have a different productivity according to the different variables. Krings (2001) discusses post-editing effort as the key element in determining if the application of MT is worthwhile and distinguishes three main concepts necessary in order to understand
pp 45.
Guerberof, A. 2009. “Productivity and Quality in the post-editing of outputs from translation memories and machine translation”. Localisation Focus. The International Journal of Localisation. Vol. 7 Issue 1.
Joscelyne, A. 2006. “Best practices in post-editing”. In TAUS. www.translationautomation.com
Kliffer, D. 2008. “Post-Editing Machine Translation as an FSL Exercise”. In Porta Linguarum. Number 9. 53-67.
Krings, H. 2001. Repairing Texts: Empirical Investigations of Machine Translation Postediting Processes. G. S. Koby, ed. Ohio. Kent State University Press.
Mitamura, T. 1999. “Controlled Language for Multilingual Machine Translation”. In Proceedings of Machine Translation Summit VII. Singapore. 13-17.
Laurian, A.M. 1984. “Machine Translation: What type of post-editing on what type of documents for what type of users”. Proceedings of the 10th International Conference on Computational Linguistics and 22nd annual meeting on Association for Computational Linguistics. 236-238.