Thursday, 21 June 2012

Carly Rae Jepsen's Call Me Maybe in Arabic rendition

Carly Rae Jepsen's much-parodied but utterly brilliant hit 'Call Me Maybe' and its catchy chorus have been faithfully rendered into Arabic. Just so you know.


"Hey, I just met you
and this is crazy
but here's my number
So call me, maybe?"

Thursday, 14 June 2012

Correcting formatting issues to have Arabic and English text in the same line


There are often times when you need to have Arabic and English (or another left-to-right language) in the same piece of text. Microsoft Word has no issue adding Arabic text next to existing English text, but trying the opposite is disastrous. It’s very frustrating when the text flips to the opposite side and the whole line is thrown out of place, like this:



When what you actually want is this:



In this text I needed to add English book titles as references, yet as I went to write ‘Political Man’ next to الرجل السياسي, notice that on the first attempt to add the English, the whole line changes position so that the text no longer reads in the correct order.
The solution is actually very simple. Just follow these easy steps:

1)      Highlight the text or select all (Ctrl+A), and align the text to the right



2)      Align the text direction to the right with this button:



3)      Use the shortcut key Alt+Shift to switch to the English keyboard language, and simply type in the English where you want it, remembering to switch back to Arabic to continue typing in Arabic.
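As a side note, Word handles this through its paragraph-direction settings, but plain text has an equivalent mechanism that is worth knowing about. Below is a minimal sketch, assuming a Python 3 environment, of Unicode's invisible directional marks, which solve the same mixed-direction problem outside Word; the phrases are just the examples from above.

```python
# A side note rather than a Word feature: in plain-text contexts the same
# flipping problem can often be tamed with Unicode's invisible directional
# marks. Minimal illustration only.
LRM = "\u200e"  # LEFT-TO-RIGHT MARK: anchors the following run as LTR

arabic = "الرجل السياسي"
english = "Political Man"

# Placing an LRM before the English run keeps it reading left-to-right
# inside an otherwise right-to-left line.
line = arabic + " " + LRM + english
print(line)
```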


Tuesday, 12 June 2012

Evaluating Error Analysis Results From MT Systems across Genres (using Arabic)


Introduction

Developers, researchers and end-users of Machine Translation (MT) systems are often interested in analysing their efficacy to establish the relative benefits of their application to translation. In other words, users of MT are interested in the quality of a system’s output and whether it produces ‘good’ translations. When it comes to judging the quality of MT performance, the basic tenet is captured in the maxim adopted by Papineni et al.: ‘The closer a machine translation is to a professional human translation, the better it is’ (2002: 1).

Numerous evaluative methods of judging MT system performance have been developed, and they fall into two broad categories: human evaluation and automatic evaluation. The most widely recognized benchmark for assessing MT quality is professional human translators, who make judgements based on standards of accuracy, fidelity and fluency, usually against ‘gold standard’ human-translated reference texts (King, 1996 in Przybocki et al., 2006: 1).

For the purpose of this investigation, MT output quality is seen from the perspective of the typical non-commercial end-user, who may be one of the millions who use MT systems on a casual basis. These casual end-users are most likely to utilize MT for informational purposes or to ‘get the gist’ of a text, and in such instances it is accuracy of semantic content that becomes the major aspect of quality (Koponen, 2010: 2). This report will adopt an error analysis scheme to classify and count errors in texts machine translated by the popular MT systems Systran and Google Translate. The error scheme focuses on semantic errors found in the target texts. Two fields or text genres have been used, general politics and technology, although both texts are designed for general rather than technical or specialist audiences. The results demonstrate how both MT systems perform and whether patterns across text genres may be suggested. Finally, the results will be compared to scores generated by the automated evaluation metric BLEU to establish whether the general results of my own human evaluation are corroborated by a popular automatic metric.
Evaluating MT output quality is an important task that may be of interest to individuals and larger-scale commercial users wishing to decide which MT system to use. Error analysis is one way to do this, and comparing results from different systems may serve as a basis for determining the types and frequency of particular errors, and thus how a hierarchical classification of translation errors might serve as the basis for improving fluency in MT systems.

Materials

The texts selected for this experiment have been chosen to reflect the needs of average MT users, and comprise two source texts in Arabic: a general political text, and a technology text, suitable for a lay reader, on the development of communication technology. Both texts are taken from the Project Syndicate website, which provides high-quality articles in several languages; the translations are done by professional volunteer translators who render the text paragraph for paragraph. The two varying genres were chosen to observe whether levels of translation accuracy vary with text type, and both texts contain a range of sentence types as well as complex nouns, pronouns and names of organisations.

Each source text was translated from Arabic into English using two different MT systems: Systran, a freely available rule-based (RBMT) system from one of the oldest commercial organizations in the field, and Google’s Google Translate, a free statistical system. Rule-based systems rely on thousands of lexical and syntactic rules coded into the software, while statistical systems rely on statistical learning methods applied to large corpora to create a database of bilingual phrase tables and language models for translation (Koponen, 2010: 3).
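To make the contrast concrete, here is a toy sketch of the phrase-table idea behind statistical systems. The entries and probabilities are invented for illustration and bear no relation to Google's actual models.

```python
# A toy illustration of the phrase-table idea behind statistical MT:
# bilingual phrase pairs learned from corpora, each carrying a probability.
# The entries and scores are invented for illustration.
phrase_table = {
    "عضو البرلمان": [("member of parliament", 0.8),
                     ("a member of the parliament", 0.2)],
}

def best_translation(phrase: str) -> str:
    # Fall back to the untranslated source phrase if nothing is known,
    # which is exactly how 'untranslated concept' errors can arise.
    options = phrase_table.get(phrase, [(phrase, 0.0)])
    return max(options, key=lambda option: option[1])[0]

print(best_translation("عضو البرلمان"))
```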

Methodology

Many error analyses attempt to identify errors in correlation with parts of speech such as verbs, prepositions and determiners etc.[1] However, such an approach departs from my stated aim of evaluating the semantic, rather than purely linguistic, aspects of MT output. This evaluation will instead probe the correlation of semantic aspects between source and target text as a general but effective indicator of fluency.

Analysis of the MT output has been performed by myself, a professional translator competent in the language pair. Items in the target text that seem intuitively ‘unnatural’, or that are simply not fluent or adequate translations of the corresponding source text item from a semantic point of view, have been counted as errors using the following scheme adapted from Koponen (2010). Items can be larger than single words, since compound nouns and idioms, for example, are regarded as single semantic entities in this scheme and count as one error if incorrect. Semantic items were compared between the source and target texts with the human translations as ‘gold standard’ references.

It is of course the case that items can be translated in different ways and still remain legitimate translations by virtue of retaining the semantic content albeit in another expression. In this evaluation, such occurrences are called ‘substituted items’ if the reader of the target text can derive the same informational data from them as he could from the source text or human reference, and for this reason they are not classified as errors.
The following error categories were adopted, based on Koponen’s scheme:

Omitted concept: an ST concept that is not conveyed by the TT.
Added concept: a TT concept that is not present in the ST.
Untranslated concept: SL words that appear in the TT (with Arabic as the SL, untranslated concepts appear transliterated).
Mistranslated concept: a TT concept that has the wrong meaning for the context.
Misordered concept: TT concepts that appear in an order that misrepresents the relations in the ST.
Substituted concept: a TT concept that is not a direct lexical equivalent for the ST concept but can be considered a valid replacement in the context (i.e. a synonym).
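For concreteness, here is a minimal sketch of how such a tally might be kept, assuming each judged item is recorded by the evaluator as a (category, note) pair; the sample judgements are invented, not taken from the actual evaluation.

```python
# A minimal sketch of the error-tallying scheme above. Each judged item is
# assumed to be recorded as a (category, note) pair; sample data is invented.
from collections import Counter

judgements = [
    ("mistranslated", "خمسة عشر rendered as 'five ten'"),
    ("omitted", "كان (was) not conveyed"),
    ("substituted", "synonym used for 'privileges'"),
]

counts = Counter(category for category, _ in judgements)

# Substituted items are reported but, as stated above, not counted as errors.
total_errors = sum(n for category, n in counts.items()
                   if category != "substituted")
print(counts, "total errors:", total_errors)
```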

Results

The number of errors found in the target texts is shown in Table 1, while error rates and percentage breakdowns are presented in Table 2. Substituted items are shown but not counted as errors. The rule-based system, Systran, made a total of 472 errors, with a greater proportion, though not significantly so, in the technology genre. Its overall mean error rate was 21.37%. The statistical system, Google Translate, produced far fewer errors, just 135 across both genres, though with a far greater proportion in the technology genre. Overall, Google Translate’s combined mean error rate was 6.4%.

Table 1 Error count in MT target texts according to genre and MT system


| System | Genre | Omissions | Additions | Untranslated items | Mistranslated items | Substituted items (not counted) | Misordered items | Total errors |
|---|---|---|---|---|---|---|---|---|
| Systran | Politics | 8 | 12 | 5 | 189 | 12 | 30 | 244 |
| Systran | Technology | 20 | 6 | 5 | 152 | 0 | 45 | 228 |
| Google Translate | Politics | 2 | 5 | 2 | 13 | 9 | 6 | 28 |
| Google Translate | Technology | 20 | 1 | 2 | 66 | 0 | 18 | 107 |

Table 2 Total error counts and averages


| System | Genre | Errors/Words | Error rate (%) | Mean error rate |
|---|---|---|---|---|
| Systran | Politics | 244/1183 | 20.6% | 21.37% |
| Systran | Technology | 228/1030 | 22.14% | |
| Google Translate | Politics | 28/1147 | 2.4% | 6.4% |
| Google Translate | Technology | 107/1028 | 10.4% | |

Discussion

The error count results show some patterns but also reveal widely divergent performance across the MT systems, with Google Translate vastly outperforming Systran in preserving source semantic content. While both systems recorded higher error counts for the technology genre than for politics, Google’s error rate in the political domain was around a tenth of Systran’s, and roughly half in the technological domain.

By far the most common error type in both systems across the genres was mistranslated items (see Table 3), accounting for 72.2% of errors in Systran and 58.5% in Google Translate.

Table 3 Rate of error type

| System | Omissions | Additions | Untranslated items | Mistranslated items | Substituted items | Misordered items | Total errors |
|---|---|---|---|---|---|---|---|
| Systran | 5.9% | 3.8% | 2.1% | 72.2% | 2.5% | 15.9% | 472 |
| Google Translate | 16.3% | 4.4% | 2.9% | 58.5% | 6.7% | 17.8% | 135 |

That both systems recorded the highest number of errors in the ‘mistranslated’ category gives weight to the proposition that semantic mismatches in the form of mistranslations are the most common issue for MT systems in general. However, we must acknowledge that mistranslations occur on a cline from fluency to non-fluency: some mistranslations can still be related to the context, whilst others greatly impede comprehension. In the sample segment below we note that the mistranslations in some sentences preclude adequate comprehension for informational purposes, whilst others merely degrade it.

Arabic source text-
وكان من بين المزايا التي يحصل عليها عضو البرلمان الحق في تخصيص خمسة عشر خطاً هاتفياً لمن يعتبره مستحقا.
Human reference translation-
‘Members of parliament had among their privileges the right to allocate 15 telephone connections to whomever they deemed worthy.’
Gloss translation-
‘Was among the privileges that acquired them member of the parliament the right to allocate fifteen line telephone to-who he considered deserving.’
Systran-
‘Thevirtues were among which collects raised hermember of the parliament the right forspecification of five ten lines is telephoneblamed considers him deserved.’
Google Translate-
‘One of the advantages obtained by the Member of Parliament the right to allocate fifteen telephone line for those he considers worthy.’

An exhaustive list of errors classified according to parts of speech is beyond the scope of this report, but the major errors of each system can be seen in this representative example taken from the technological text. The Systran TT is incomprehensible to the extent that users cannot derive from it the same semantic content they otherwise could from the human translated text (of course, in authentic MT scenarios the user does not have access to a human ‘gold standard’ translation for reference).

Among the major mistranslation issues frequently occurring in Systran are:
  • Words joined together across the grammatical spectrum: definite articles, prepositions, possessives etc.
  • A very high occurrence of homography errors: the erroneous assignment of part-of-speech categories, a common issue for MT systems and especially for direct transfer systems (Lehrberger & Bourbeau, 1988: 15).
  • Failure to translate numbers: خمسة عشر (fifteen) is rendered ‘five ten’.
  • Confusion of definite and indefinite articles.
As for Google Translate, the most common errors are:
  • Omission of the verb ‘to be’, as seen in the example above. Arabic does not usually use the present tense of this verb, and this may indicate why Google has difficulty ‘detecting’ its semantic import in the source text.
  • Confusion of definite and indefinite articles.
  • Some cases of homography.

Automatic MT evaluation

Automatic MT evaluation methods build on the idea of proximity to professional human translation by developing metrics that can account for and replicate human judgements of MT output quality. One such automatic metric is the Bilingual Evaluation Understudy (BLEU), developed by a research team at IBM. BLEU scores translations between 0 and 1, with 1 being a perfect match with the reference human translation.
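To illustrate the mechanics, here is a minimal sketch using NLTK's BLEU implementation. The sentence pair is adapted from the example above, and the smoothing choice is my own assumption; this is not how the scores in Chart 1 were produced.

```python
# A minimal sketch of BLEU scoring with NLTK. The reference/hypothesis pair
# is illustrative; smoothing is applied because short single sentences
# often have no higher-order n-gram matches at all.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [("members of parliament had among their privileges the right "
              "to allocate 15 telephone connections").split()]
hypothesis = ("one of the advantages obtained by the member of parliament "
              "the right to allocate fifteen telephone line").split()

score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")  # between 0 and 1; 1 = perfect match
```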

With the error rate scores previously produced by human evaluation in mind, we can form and test the following hypothesis:

Since the BLEU metric measures the precision of machine translations by analysing their closeness to the reference translation, we can predict that a text with more errors will achieve a lower BLEU score. On this premise, Google Translate should achieve the higher score in the BLEU test, indicating that its output is a closer and more precise translation than that offered by Systran.

This proves to be the case (see Chart 1 below), and the large gap in error rate between the two systems is reflected in the BLEU scores. However, whereas the error rate evaluation showed that technology scored higher than politics in both systems, the relationship appears to be inverted in the BLEU scores.

Chart 1 Automated BLEU Evaluation.

Conclusion

The results of this report reveal that Google Translate performs significantly better than Systran in translating from Arabic into English in two text genres, politics and technology. Not only did Systran, the rule-based system, make more errors; it is also evident from the error type analysis and the translation segment examples shown that its semantic content was far less adequate than that produced by the statistical system Google Translate. The automatic evaluation scores provided by BLEU substantiate this, although they disagree on the relative performance of the two genres. It is of course the case that human evaluation is inherently subjective and open to vagaries, yet we have seen that it was able to form the basis of a successful hypothesis affirmed by BLEU, an automated evaluation metric.

Although Systran made more errors overall and may be deemed less precise, the degree and criticality of semantic mismatch cannot be ascertained from these results, as certain errors will have a greater impact in misconstruing meaning than others. To account for this, further studies may wish to assign a weighting to each error type according to how severely it distorts the semantic content of the source text. Further studies may also wish to assess whether a correlation can be found between error rates and types and the kind of MT system used.

Based on the conclusions of this report, I would recommend that Arabic-English users of automated MT systems choose Google Translate over Systran, due to the greater quality of its output as demonstrated in this report’s findings.


[1] See for example the taxonomy developed by Elliott et al. (2004).


Bibliography

Elliott, D., Hartley, A. & Atwell, E. (2004): ‘A fluency error categorization scheme to guide automated machine translation evaluation’. In: Machine translation: from real users to research: 6th conference of the Association for Machine Translation in the Americas, AMTA 2004, Washington, DC, September 28 – October 2, 2004; ed. Robert E. Frederking and Kathryn B. Taylor (Berlin: Springer Verlag, 2004); pp. 64-73.

King, M. (1996). ‘Evaluating Natural Language Processing Systems’. In: Communications of the ACM (39) 1, pp.73–79.

Koponen, M. (2010): ‘Assessing Machine Translation Quality with Error Analysis’. In: Electronic Proceedings of the KäTu Symposium on Translation and Interpreting Studies 4 (2010).

Lehrberger, J. & Bourbeau, L. (1988): Machine Translation: Linguistic Characteristics of MT Systems and General Methodology of Evaluation, Lingvisticae Investigationes Supplementa 15, Amsterdam/Philadelphia: John Benjamins.

Papineni, K., Roukos, S., Ward, T. & Zhu, W.J. (2002): ‘BLEU: A Method for Automatic Evaluation of Machine Translation’. In: ACL 2002: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, July 2002, pp. 311-318.

Przybocki, M., Sanders, G. & Le, A. (2006): ‘Edit Distance: A Metric for Machine Translation Evaluation’. In: LREC (2006).

Friday, 8 June 2012

Video emerges of Saudi man humiliating Bangladeshi worker

A video has emerged in recent days of a Saudi national humiliating and degrading a Bangladeshi man who appears to be employed by him. The Saudi man filmed the episode on his mobile and is seen slapping his victim after the worker apparently disparaged the Saudi government.


The victim is seen begging for his parents' livelihood as his Saudi attacker interrogates him as to why he purportedly said bad things about the Saudi government. Despite his repeated apologies and denials, the Bangladeshi man is hit and humiliated even more. The most disturbing thing in the video is how the Saudi man forces his victim to kiss his hand and feet, and spits on him. He also makes his victim disparage himself and his family by calling them animals while praising the Saudi government.
Let's be absolutely clear about this. Such behaviour on the part of (male) Saudi citizens towards non-white foreigners is not exactly rare. South Asian workers who get on the wrong side of their hosts often go missing, are tortured, and are treated like scum in a society whose vast oil reserves have only exacerbated its chronic and unprecedented state of backwardness and contempt for others. 

Thursday, 7 June 2012

Jordanian car park comes alive with poetry


A scene from the capital Amman.




“let’s imagine
the rivers immolating in the distance
we hear silence and suddenly
comes music toward us
as if it killed us
returning us to more powerful dance
under the white sun”

pics from Revan, translation via Azaadi






Tuesday, 5 June 2012

An Evaluative Comparison of SDL Trados Studio 2009 and MemoQ 5.0


This report presents a comparative analysis of two prominent Computer-Aided Translation (CAT) tools, SDL’s Trados Studio 2009 and Kilgray’s memoQ 5.0, with a view to evaluating the affordances of each tool and advising whether or not memoQ should be introduced to a CAT-tools teaching programme.

I have chosen to compare memoQ to Trados since the latter is the market leader in CAT tools, and is therefore memoQ’s major rival. I have identified the major areas of CAT tool functionality to present a comprehensive comparison, whilst bearing conceptual considerations and authentic professional applications in mind.

File Analysis & Invoicing

An integral part of undertaking a translation job is the ability to assess source files to deduce a project’s scope and the savings to be made from a tool’s application (Austermühl, 2001: 142). Further, the ability to derive an accurate invoice from such information is also important (Esselink, c2000: 364).

Both memoQ and Trados Studio provide detailed and customizable file analyses at the global and active-document level, detailing the standard references of word counts, segment counts and fuzzy matches. However, two features unique to memoQ are homogeneity and the ability to quantify formatting tags in the word count. Homogeneity estimates potential gains from internal leverage, i.e. a kind of internal fuzzy matching, so the translator gains a better understanding of time-cost savings and may pass this on to a client.

Although both Trados and memoQ can count tags, the latter is unique in its ability to quantify the additional time factor inherent in dealing with heavily-tagged documents such as HTML files, something many translators feel should be reflected in invoices. A tag-to-word proportion can be entered (i.e. how many words the translator thinks are equivalent to the average tag), which increases the overall word count; at a ratio of 0.5 words per tag, for example, a file containing 200 tags adds 100 words to the count.


Both tools offer comprehensive file analysis statistics, although memoQ is more readily tuned to the tasks of invoicing and quantifying workloads. What is more, statistical reports can be generated in memoQ and exported at any time, perhaps for a project manager during a long, multi-document project, whilst Trados can only export an initial statistical report.

Support for complex language scripts

A feature common to both memoQ and Trados 2009 is the use of an asterisk or pipe (*, |) to act as a de facto fuzzy match when terms are entered into a TB. Unfortunately, this feature does not work with Arabic in memoQ, though it is successful in Trados. The problem is compounded because memoQ has no terminology fuzzy match: it fails to recognize Arabic words with the definite article prefix attached, even if they were originally saved as indefinite or vice versa, and this is a major translation issue for Arabic.


The ‘workaround’ solution in memoQ is to perform a look-up or wildcard concordance search, though this has to be performed manually in a separate dialogue box.
Trados overwhelmingly excels in this aspect of automatic fuzzy term recognition: it accepts the asterisk and pipe signifiers, and it can also perform wildcard termbase searches within the immediate translation environment.
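To illustrate the prefix problem itself, here is a rough sketch of why a naive exact-match termbase misses definite forms and what a simple wildcard-style fallback recovers. The termbase and the stripping rule are simplifications for illustration, not how either tool implements matching.

```python
# A rough sketch of the Arabic definite-article problem: an exact-match
# termbase misses words carrying the 'ال' (al-) prefix, while a simple
# wildcard-style fallback recovers them. Simplified for illustration.
termbase = {"رجل": "man", "برلمان": "parliament"}

def lookup(word: str):
    if word in termbase:                 # exact match
        return termbase[word]
    if word.startswith("ال") and word[2:] in termbase:
        return "the " + termbase[word[2:]]   # strip the definite article
    return None

print(lookup("الرجل"))   # -> 'the man'
print(lookup("رجل"))     # -> 'man'
```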


Aside from term retrieval issues, complex language scripts such as Arabic also affect the display and interface. Trados is unable to display Arabic, Russian and the CJK languages in its preview mode for HTML files, and letters/characters are displayed as question marks.


For its part, memoQ offers superb real-time previews and is able to display all complex language scripts. Overall, whilst memoQ offers far better display and preview of complex languages, Trados is superior in handling them for fuzzy/wildcard terminology recognition.

Translation memory (TM): features and management

Translation memories enable the reuse of previously translated segments, known as ‘leveraging’ (Bowker, c2002: 92). The flexibility and efficacy of TM usage in memoQ and Trados 2009 can be compared by testing whether TMs can easily be transferred from one project to another where the language data differs slightly (i.e. with sublanguages), and indeed, whether or not TMs can be reversed to function in the inverse language pair.
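The leveraging idea itself can be sketched in a few lines, assuming the TM is held as an in-memory dictionary and similarity is computed with a generic string ratio; real tools use far more sophisticated matching and storage.

```python
# A minimal sketch of TM leveraging: look up a new source segment against
# previously translated segments and return the best fuzzy match. The
# in-memory dict and threshold are illustrative, not how either tool works.
from difflib import SequenceMatcher

tm = {
    "The committee approved the budget.": "وافقت اللجنة على الميزانية.",
    "The minister resigned yesterday.": "استقال الوزير أمس.",
}

def best_match(segment: str, threshold: float = 0.75):
    scored = [(src, tgt, SequenceMatcher(None, segment, src).ratio())
              for src, tgt in tm.items()]
    src, tgt, score = max(scored, key=lambda item: item[2])
    return (src, tgt, score) if score >= threshold else None

print(best_match("The committee approved the new budget."))
```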

When a document composed of TUs made in the opposite language pair is opened, memoQ immediately recognizes this, and all TMs (and TBs) in the project are able to function without the need to manually change the language pair or create new TMs. Furthermore, memoQ TMs function even when the sublanguages differ (i.e. Arabic (Lebanon) instead of Arabic (Jordan)).


The same experiment in Trados reveals that TMs and TBs are neither bidirectional nor able to function across sublanguages:


This problem can only be solved by exporting the original TM in TMX format, creating a new empty TM in the inverse language direction, and importing the data. Nevertheless, this is more work for the translator, especially in a project setting where they may be sent a TM or TB by a client and find they are unable to utilize it without creating new resources.

Translators wish to leverage as many TUs as possible from previous translations submitted to TMs. MemoQ, however, goes further in offering automated fragment assembly. This intelligently recognizes source segments which have previously been translated as part of larger TUs by searching at the subsegment level.


A second feature, TM-driven segmentation, allows memoQ to intuitively join and split segments to find a match. Trados Studio has no features approaching this kind of functionality, although its segmentation settings can be customized.

Terminology management

Terminology management involves the creation and maintenance of term bases so that terms do not have to be retranslated every time they appear, as well as providing consistency (Savourel, c2001: 281). As previously noted, memoQ term bases can operate bidirectionally and across sublanguages. This flexibility is significant as it offers full compatibility of translation resources, so that TBs received from clients or project managers in other sublanguages can be leveraged. MemoQ projects allow any number of TBs (and TMs) to be utilized at the same time, while Trados 2009 allows just one (with others for reference).

-Term extraction

An excellent productivity and term management feature in memoQ called term extraction allows translators to intuitively create term bases through an automated search process, at file or project level, that identifies suitable TB candidates. Statistical analysis generates a list of frequently occurring terms while ignoring common ‘stop words’ (this is also customizable). If the resultant candidate terms are already in existing TMs/TBs they are automatically translated, whilst the remaining terms are displayed in context so that they can be translated accurately.
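The underlying idea can be sketched in a few lines, assuming a plain frequency count with a stop-word filter; memoQ's real extraction uses richer statistics and multi-word candidates, so this is illustrative only.

```python
# A toy sketch of term extraction: count frequent words, skip stop words,
# and propose the rest as termbase candidates. Word lists are illustrative.
from collections import Counter
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "for", "are"}

def candidate_terms(text: str, min_count: int = 2):
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]

sample = ("The translation memory stores segments. The memory matches "
          "segments against new segments before translation.")
print(candidate_terms(sample))  # e.g. [('segments', 3), ('memory', 2), ...]
```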


The extraction tool is an effective way for translators to quickly and efficiently create terminology resources for a new translation job, and to find out how many terms can be leveraged from existing translation resources. Trados, for its part, does not come with an extraction tool, and if it did, it would still require existing TBs/TMs to be in the correct language direction to benefit from existing resources; as we have seen, inverting language directions requires further effort in Trados.

Terminology management in Trados is performed through its Multiterm package, which can be integrated into a translation environment. However, it requires the user to carefully manage and integrate TB files alongside the task of translation, and the benefits of this are severely compromised when it is considered that only one TB can be searched at a time, and that separate TB files are useful only for the ability to exchange them. MemoQ’s fully incorporated terminology management console offers more features and the security of TBs fully integrated into projects.

-Automated term leveraging processes

Another terminological feature that makes memoQ far more efficient and productive with regard to terminology management is automated concordancing. This tool has been described as “playing a video game with cheat mode constantly on” [1], as it leverages multi-term expressions from the concordance so that the translator does not have to remember what has been translated before. Users of Trados Studio 2009 have only a regular manual concordance search tool which does not offer automated concordance searches, meaning the user has to keep a mental record of whether a term or TU has been previously translated. Trados’ earlier 2007 edition did in fact have a ‘start concordance search if no matches found’ feature, and professional translators have decried its removal in the 2009 edition [2].



Although the absence of a fuzzy term match in memoQ is a startling omission, it nonetheless contains extremely powerful terminological features, such as term extraction and automatic concordance, that Trados Studio cannot rival.

-Importing Term bases

Although memoQ works with the standard XML termbase format, adding to or merging TBs must be done in CSV format. It is common for clients to have terminology lists or glossaries in Excel format which they send to translators to use. Trados users must use SDL’s Multiterm Convert tool in conjunction with the CSV (XLS) file. This process creates four files, of which the XDT and XML files must be correctly matched with field names in another conversion process in the separate Multiterm Desktop tool. The process is extremely convoluted, relies on two tools, and is time-consuming compared to memoQ’s import command, available in its term bases tab.

Complex file formats and file management

-Formatting tags
Translating an HTML webpage in Trados is made easier by the way it can effectively place ‘wrap-around’ tags into the target segment. With one mouse click the user can insert the tags around the target words in the correct places, and this is an important feature for Arabic, as the change in language direction easily confuses the user and the placing of tags requires some consideration.


What is more, Trados alerts the user to incomplete tag pairs by preventing the file from being verified and displaying grey ‘ghost’ tags which the translator must restore correctly.


MemoQ is far less efficient in inserting tag pairs around large stretches of text and can only insert individual tags with the F9 shortcut. Although memoQ provides an error message for incomplete tags, it has no restore feature comparable to Trados’ ghost tags. Both tools allow toggling of the tag information display, but most crucially memoQ has a tag edit feature allowing the user to edit faulty tags and, importantly, to localize hyperlinks.


Being able to edit tags is a major benefit for memoQ users. SDL has taken the opposite approach, insisting that its users not be allowed any kind of source text editing, despite the segment lock feature being available. MemoQ users can edit source text segments using F2, and this is vital as source text ‘typos’ will affect subsequent TM matches. Trados users cannot edit source segments but may edit TUs inside the TM.

-Exporting a ppt file

PowerPoint files are a complex file format. Exporting them from Trados results in formatting changes to text alignment, and tabled information becomes completely illegible.


However, the same export process from memoQ results in only minor formatting problems, and the file is nonetheless recoverable.


Interoperability

Interoperability is an essential aspect of modern CAT tools, and is reflected in the universality of some basic file formats, such as TMX, that are designed to operate in all tools (Savourel, c2001: 397). In real translation settings, interoperability is vital for transferring TMs/TBs from one CAT tool to another; for creating a translation resource in one particular tool and subsequently sharing it with colleagues who operate other tools; and for receiving a job made in one tool but completing it in another.

Because memoQ was a relative latecomer among translation tools, interoperability is one of its central ideals and even necessities. Indeed, one localization expert has called memoQ “the Swiss Army knife of translation environment tools when it comes to compatibility” [3].

MemoQ <-> Trados TM exchange

Exporting a TM from Trados is very simple, and can be done with one right-click in the translation memories pane. The exported TM is in the standard TMX format, and can be imported into memoQ with the ‘import from TMX/CSV’ or ‘create/use new’ command. MemoQ also has a ‘Process Trados TMX for best results in memoQ’ command in the import TMX function. This incorporates the TMX in such a way as to best suit the segmentation of files originating in Trados. What is more, memoQ still performs the import if the sublanguages do not completely match.
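To give a sense of what actually travels between the tools, here is a minimal sketch of a TMX translation unit generated with Python's standard library; real tool exports carry more header metadata, and the attribute values here are illustrative placeholders.

```python
# A minimal sketch of the TMX exchange format. Real tool exports carry more
# header metadata; the attribute values here are illustrative.
import xml.etree.ElementTree as ET

tmx = ET.Element("tmx", version="1.4")
ET.SubElement(tmx, "header", srclang="ar", adminlang="en",
              segtype="sentence", datatype="plaintext",
              creationtool="example", creationtoolversion="1.0",
              **{"o-tmf": "example"})
body = ET.SubElement(tmx, "body")
tu = ET.SubElement(body, "tu")
for lang, text in [("ar", "استقال الوزير أمس."),
                   ("en", "The minister resigned yesterday.")]:
    tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
    ET.SubElement(tuv, "seg").text = text

ET.ElementTree(tmx).write("sample.tmx", encoding="utf-8",
                          xml_declaration=True)
```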



MemoQ <-> Multiterm exchange

MemoQ has been developed with Trados file formats in mind, and can export any TB as a Multiterm-compatible XML file, although it cannot import term base data in the same format. A Multiterm TB has to be exported in delimited text format for import into memoQ, while memoQ’s export-as-XML command creates an XDT definition file that provides field name definitions alongside the XML file. However, it is extremely difficult to import this into an existing Multiterm termbase, because matching the field names is difficult if not confusing.

Project management & work flow

Project management is an essential aspect of managing the delivery and execution of translation jobs efficiently and on schedule, and as such relates to effective work flow procedures between translators, the project manager and the client. Although the tools under evaluation here are freelancer editions, they provide some work flow-related features that facilitate document and resource management, and interaction with project managers.

One vital aspect of document management is ensuring that file and folder locations work effectively, and some CAT tools have features to guarantee this. MemoQ’s reimport feature can be used to efficiently update and synchronize project documents, guaranteeing that the translator is working on the most up-to-date version of a particular document. This automated mechanism reflects a common work flow for translators: they receive an initial document to translate but later receive a changed or updated version from the project manager; or translator colleagues exchange files for reviewing and must keep them synchronized. This avoids the confusion of otherwise having to add and remove documents and keep track of which is the latest.

MemoQ also assigns a unique ‘tracking’ number to each document version, making changes made by the translator or reviewer visible. What is more, the X-translate feature enables any previous tracked version of a document to be reinstated, with segment statuses preserved.

In the same way, changes and comments are automatically updated from exported bilingual RTF documents reimported after review. Trados Studio 2009 cannot produce bilingual RTFs and users must revert to SDL’s TagEditor; this effectively renders the bilingual review exchange process impossible between Trados 2009 users and non-CAT-tool users who work from a word processor. The reimport and track changes features are not available in Trados 2009.

-Project Packages

Packages are a major project management tool intended to facilitate large file and resource exchanges in a compressed format, along with metadata and project information, providing a ready-made work environment. Although memoQ generally offers outstanding interoperability for Trados-made files, it cannot import Trados project packages (it does, however, import Transit packages). Trados packages conveniently display progress, word counts and assigned tasks.


MemoQ has its own ‘handoff and delivery package’ system that uses intuitive file extensions (.mqout, .mqback). The freelance edition is unable to create handoffs, though it can receive and return them, and this reflects an authentic routine work flow, although being able to compress files and resources (TMs and TBs) into one file would be highly beneficial for memoQ freelancer users.

Quality Assurance and reviewing

Both memoQ and Trados Studio have real-time quality assurance checks covering, for example, spelling, tag placement and placeables. The purpose of QA is to maintain quality by detecting errors and to speed up the translation process. Not all automated error warnings will match human quality evaluation, so it is vital that a) the translator can customize QA settings and b) warnings can be resolved quickly.

A QA check in memoQ displays all warning messages from either an active document or an entire project in a separate tab, where they can be globally managed and a report exported. The user can see each error in context with source and target segments present, and, impressively, can choose to ignore all warnings of one kind in a single click if they are not actual errors. This is useful in a scenario such as the one presented in this screenshot, where number mismatches are flagged because of mixed Arabic (known as ‘Hindi’) and Western numerals, though they are in fact correct.
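The flagged-but-correct scenario is easy to illustrate. Here is a sketch of a number-mismatch check that normalizes Eastern Arabic ('Hindi') digits to Western ones before comparing, so that mixed numerals are not falsely flagged; this is illustrative only, not memoQ's actual rule.

```python
# A sketch of a number-mismatch QA check that normalizes Eastern Arabic
# ('Hindi') digits to Western ones before comparing, so mixed numerals are
# not falsely flagged. Illustrative only, not memoQ's actual rule.
import re

EASTERN_TO_WESTERN = str.maketrans("٠١٢٣٤٥٦٧٨٩", "0123456789")

def numbers(text: str):
    return sorted(re.findall(r"\d+", text.translate(EASTERN_TO_WESTERN)))

def number_mismatch(source: str, target: str) -> bool:
    return numbers(source) != numbers(target)

print(number_mismatch("تم تخصيص ١٥ خطاً", "15 lines were allocated"))  # False
```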


Trados also produces a global list of error messages along with an exportable report, but interactivity is stymied and far less flexible than in memoQ: errors cannot simultaneously be seen in situ, and the user is not taken to the error location by clicking error messages. This inevitably adds a degree of blindness to the task of resolving errors. There is also far more rigidity in resolving errors: the Trados user can only delete messages he decides are not real errors, and this automatically confirms a segment’s status. Error messages in memoQ, on the other hand, are ‘ignored’ rather than deleted; this allows the file to be confirmed as translated while still signifying to later reviewers or project managers that a potential error was previously flagged, and the message can be re-reviewed if need be.


Trados also lacks a way of grouping error messages by kind and offering a batch ignore/delete command, meaning that the error review process in Trados is considerably more time-consuming than in memoQ.

A related feature that memoQ has but which is absent in Trados Studio is its extremely powerful global find and replace command (Ctrl+H). The user can find and view all occurrences of a word or phrase in both source and target segments, see them listed in a separate tab, then replace, correct or remove them in any or all documents of any format in a single project at once. This is also a major quality assurance mechanism in that it ensures complete consistency: the user is not simply replacing or editing things blindly, since the occurrences are shown in context.
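In spirit, the feature resembles the following rough sketch over plain-text files, which previews each match in context before replacing. MemoQ of course operates on segmented bilingual documents, so the file-based version and file names here are illustrative only.

```python
# A rough sketch of 'global find and replace with context' over plain-text
# files. MemoQ operates on segmented bilingual documents; this simplified
# version is illustrative only.
import re
from pathlib import Path

def preview_and_replace(paths, pattern, replacement):
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for match in re.finditer(pattern, text):
            start = max(match.start() - 20, 0)
            end = match.end() + 20
            print(f"{path}: ...{text[start:end]}...")  # show match in context
        Path(path).write_text(re.sub(pattern, replacement, text),
                              encoding="utf-8")

# Example call (hypothetical file names):
# preview_and_replace(["doc1.txt", "doc2.txt"], r"organisation", "organization")
```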


Trados Studio does not have this feature, although a batch find and replace can be downloaded at no cost from SDL’s OpenExchange app store. However, freelancers cannot be expected to pay large amounts for a CAT tool only then to have to download functionalities as add-ons.

Trados Studio instead has a basic find and replace function for the target text that operates on only one document at a time, and the user is not shown where the changes are made so as to ensure that they are correct in context.


As in many other areas, memoQ users enjoy greater reach with its quality assurance functionalities because of the time-saving batch processes available for resolving errors, as well as the visibility of errors in context. Trados lacks these, and this ultimately means it is less able to guarantee quality.

Conclusion

This report has set out to analytically compare the CAT tools SDL Trados Studio 2009 and Kilgray’s memoQ 5.0 (released in 2011).

Following an in-depth comparative analysis of the major functional areas of CAT tools, namely file analysis and invoicing, support for complex language scripts and file formats, translation memory and terminology management, project management and work flows, interoperability, and QA, I must conclude that memoQ overwhelmingly offers superior features overall, and greater efficiency and productivity for translators in a range of professional scenarios. It is of course the application of CAT tool features to real professional routines that ultimately justifies their usage (Austermühl, 2001: 107).

Although Trados excels in some areas, such as tag placement, it is less amenable than its rival memoQ to the contemporary need for interoperability and flexibility demanded by exchanges in commercial translation. The rigidity of Trados’ unidirectional TMs and TBs is a major drawback for effective resource exchange, whilst the lack of HTML text preview for complex language scripts is startling in a leading modern CAT tool. Ultimately, Kilgray is a relative newcomer to the industry, and memoQ has evidently been designed with two specifics in mind: to function compatibly with other CAT tools, especially the market leader Trados, and to offer solutions and improvements on it while providing a more intuitive work environment and ease of migration for translators used to other tools. It is my opinion that memoQ has successfully achieved those aims and more.

The powerful functionalities, capabilities and professionally-orientated features of memoQ certainly justify it as a major rival to Trados (2009), and I would highly recommend that it be taught to students enrolled on an MA Translation course, as I believe it offers the best features among the CAT tools I am aware of, in addition to being highly intuitive and easy to learn.


Bibliography (Harvard)

Austermühl, F., (2001): Electronic Tools for Translators, Manchester: St. Jerome.

Bowker, L., (c2002): Computer-Aided Translation Technology: A Practical Introduction, Ottawa: University of Ottawa Press.

Esselink, B., (c2000): A Practical Guide to Localization, Amsterdam/Philadelphia: John Benjamins Pub. Co.

Lossner, K., (2011): ‘Translation Tribulations: Compatibility workflows with the memoQ Translator Pro edition (Part 1)’. [ONLINE] Available at: <http://www.translationtribulations.com/2011/10/compatibility-workflows-with-memoq.html> [Accessed 01 June 2012].

Savourel, Y., (c2001): XML Internationalization and Localization, Indianapolis, Ind: Sams.

Automatic concordance lookup in Trados Studio 2009 FL (SDL Trados support). [ONLINE] Available at: <http://www.proz.com/forum/sdl_trados_support/161098-automatic_concordance_lookup_in_trados_studio_2009_fl.html> [Accessed 29 May 2012].

Improve leverage from existing translation memories. Kilgray Translation Technologies. [ONLINE] Available at: <http://kilgray.com/faq/business-problem/22-improve-leverage-existing-translation-memories> [Accessed 29 May 2012].

Other consulted materials
Quah, C.K, (2006): Translation and technology, Basingstoke: Palgrave Macmillan.

Comparing memoQ™ to SDL Trados Studio™; Benefits of memoQ™ version 3.5 translator pro over SDL Trados Studio 2009™. Kilgray Translation Technologies. [ONLINE PDF] Available at: http://kilgray.com/memoq/memoQvsTrados09.pdf [Accessed 29 May 2012].




[1] Quoted from Roberto Savelli, a translator and memoQ user. Improve Leverage from Existing Translation Memories. [ONLINE] Available at: <http://kilgray.com/faq/business-problem/22-improve-leverage-existing-translation-memories> [Accessed 29 May 2012].
[2] See for example this online user forum: <http://www.proz.com/forum/sdl_trados_support/161098-automatic_concordance_lookup_in_trados_studio_2009_fl.html> [Accessed 29 May 2012].
[3] Compatibility workflows with the memoQ Translator Pro Edition (Part 1), 2011. Translation Tribulations. [ONLINE] Available at: <http://www.translationtribulations.com/2011/10/compatibility-workflows-with-memoq.html> [Accessed 29 May 2012].