Playing with Google Translator Toolkit API



Just out of curiosity, I wanted to know how many words I have already translated from English to Portuguese as part of Translating ScottGu's Blog to Portuguese.

This looked like a great chance to play with the Google Translator Toolkit (GTT) API, since I use GTT to translate Scott Guthrie’s posts.

GTT gives me the number of words it finds in each source document. I could add those numbers up one by one, but that’d be a tedious task, don’t you think? That’s what computers are for.

Google Translator Toolkit is a pretty good tool because it helps translators translate better and more quickly through one shared, innovative translation technology. It uses machine translation when possible and still allows human intervention.

When a document is uploaded for translation, GTT pretranslates it using a combination of translation memories (documents previously translated by humans), machine translation, etc. Great technology put to work here. That’s why I’ve chosen it.

Given the above, the translated word count I’m interested in won’t be an exact figure, but it does give a realistic picture of my work as a translator. So let’s find this magic number using the GTT API.

Basically, what one needs to write a GTT client app is very well described in the Google Translator Toolkit Data API v1.0 Developer's Guide. Pay special attention to the Getting Started section, as it teaches you how to set up the Google client library. Refer to this: Getting Started with the Google Data Java Client Library.

I’m using a Mac, so I played with the GTT API in Eclipse for Mac OS, in case you’re wondering. The programming language is Java, and the following code creates a console application.

This is the code I wrote to satisfy my curiosity:


import com.google.gdata.client.gtt.*;
import com.google.gdata.data.gtt.*;
import com.google.gdata.util.*;

import java.io.IOException;
import java.net.URL;

/**
 * @author Leniel Macaferi
 * 12-11-2010
 */
public class GttClient
{
    static final String DOCUMENTS_FEED_URI = "http://translate.google.com/toolkit/feeds/documents";

    public static void main(String[] args) throws IOException, ServiceException
    {
        try
        {
            GttService myService = new GttService("GoogleTranslatorToolkitClientApp");

            // Your Google username and password go here...
            myService.setUserCredentials("YourUserName", "YourPassword");

            URL feedUrl = new URL(DOCUMENTS_FEED_URI);

            DocumentQuery query = new DocumentQuery(feedUrl);

            // Send the query to the server.
            DocumentFeed resultFeed = myService.getFeed(query, DocumentFeed.class);

            printResults(resultFeed);
        }
        catch (AuthenticationException e)
        {
            // The login failed: check the username and password above.
            e.printStackTrace();
        }
    }

    /**
     * Iterates the document feed and prints some information to the console screen.
     *
     * @param resultFeed the feed returned by the documents query
     */
    private static void printResults(DocumentFeed resultFeed)
    {
        System.out.println("...done, there are " + resultFeed.getEntries().size()
                + " documents matching the query in your inbox.\n");

        int i = 1;
        int totalWords = 0;

        for (DocumentEntry entry : resultFeed.getEntries())
        {
            // The entry id is the feed URI followed by "/" and the document id,
            // so the substring keeps only the document id.
            System.out.println(String.valueOf(i++) + ") "
                    + "id = " + entry.getId().substring(DOCUMENTS_FEED_URI.length() + 1)
                    + ", title = '" + entry.getTitle().getPlainText() + "'"
                    + ", number of words = '" + entry.getNumberOfSourceWords().getValue() + "'");

            totalWords += entry.getNumberOfSourceWords().getValue();
        }

        // Here's where I satisfy my curiosity... :D
        System.out.println("Total words translated so far = " + totalWords);
    }
}

As you can see, the code is straightforward.

Make sure to replace the strings YourUserName and YourPassword with your GTT login information.
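If you'd rather not keep your login in the source file, one option is to read it from environment variables. This is only a minimal sketch: the GttCredentials class and the variable names GTT_USERNAME and GTT_PASSWORD are made up for illustration and are not part of the sample app or of the Google client library.

```java
import java.util.Map;

// Hypothetical helper, not part of the sample app or the Google client library.
final class GttCredentials
{
    final String username;
    final String password;

    private GttCredentials(String username, String password)
    {
        this.username = username;
        this.password = password;
    }

    // Reads both values from the given map and fails fast when one is missing or empty.
    // GTT_USERNAME and GTT_PASSWORD are assumed names; pick whatever suits you.
    static GttCredentials from(Map<String, String> env)
    {
        return new GttCredentials(require(env, "GTT_USERNAME"), require(env, "GTT_PASSWORD"));
    }

    // Convenience overload that reads the environment of the running process.
    static GttCredentials fromEnvironment()
    {
        return from(System.getenv());
    }

    private static String require(Map<String, String> env, String name)
    {
        String value = env.get(name);
        if (value == null || value.isEmpty())
        {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }
}
```

With something like this in place, the call in main could become myService.setUserCredentials(credentials.username, credentials.password), where credentials comes from GttCredentials.fromEnvironment().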

When I ran the code, this was the output I got:

1) id = 00001vipkz2ce0w, title = 'a-few-quick-asp-net-mvc-3-installation-notes.aspx', number of words = '482'
2) id = 0000082e1f41udc, title = 'add-reference-dialog-improvements-vs-2010-and-net-4-0-series.aspx', number of words = '399'
3) id = 0000206w1a7tog0, title = 'announcing-entity-framework-code-first-ctp5-release.aspx', number of words = '2165'
4) id = 00001qlrxh63jls, title = 'announcing-nupack-asp-net-mvc-3-beta-and-webmatrix-beta-2.aspx', number of words = '1500'
5) id = 00001zm1lv1meio, title = 'announcing-silverlight-5.aspx', number of words = '784'
6) id = 00001vgl6hzbwg0, title = 'announcing-the-asp-net-mvc-3-release-candidate.aspx', number of words = '1999'
7) id = 00001rtmq1go4cg, title = 'asp-net-4-seo-improvements-vs-2010-and-net-4-0-series.aspx', number of words = '1113'
8) id = 00000snuf95gd8g, title = 'asp-net-mvc-2-model-validation.aspx', number of words = '2925'
9) id = 00000joo5ihwykg, title = 'asp-net-mvc-2-release-candidate-2-now-available.aspx', number of words = '549'
10) id = 00000si5lunjbi8, title = 'asp-net-mvc-2-released.aspx', number of words = '524'
11) id = 00000f8wakzalts, title = 'asp-net-mvc-2-strongly-typed-html-helpers.aspx', number of words = '705'
12) id = 00000en7g3nkvls, title = 'asp-net-mvc-2.aspx', number of words = '619'
13) id = 00001so4spobfnk, title = 'asp-net-mvc-3-layouts.aspx', number of words = '1817'
14) id = 00001snnr32adc0, title = 'asp-net-mvc-3-new-model-directive-support-in-razor.aspx', number of words = '792'
15) id = 00001vldns8skqo, title = 'asp-net-mvc-3-server-side-comments-with-razor.aspx', number of words = '653'
16) id = 00001r2yti5z4sg, title = 'automating-deployment-with-microsoft-web-deploy.aspx', number of words = '3784'
17) id = 00000m0if2iqmtc, title = 'built-in-charting-controls-vs-2010-and-net-4-series.aspx', number of words = '637'
18) id = 000010zzw2fq1a8, title = 'cleaner-html-markup-with-asp-net-4-web-forms-client-ids-vs-2010-a', number of words = '1725'
19) id = 00001ih7gktv6kg, title = 'code-first-development-with-entity-framework-4.aspx', number of words = '5365'
20) id = 00001r7ki4lh05c, title = 'debugging-tips-with-visual-studio-2010.aspx', number of words = '1692'
21) id = 0000151qr16itj4, title = 'download-and-share-visual-studio-color-schemes.aspx', number of words = '311'
22) id = 00001ibpvmetdkw, title = 'entity-framework-4-code-first-custom-database-schema-mapping.aspx', number of words = '1930'
23) id = 00001f6hlprohz4, title = 'introducing-asp-net-mvc-3-preview-1.aspx', number of words = '2833'
24) id = 00001so0gu8f3ls, title = 'introducing-razor.aspx', number of words = '3312'
25) id = 00000ug44uvcow0, title = 'javascript-intellisense-improvements-with-vs-2010.aspx', number of words = '930'
26) id = 00000jtavmijvgg, title = 'jquery-1-4-1-intellisense-with-visual-studio.aspx', number of words = '179'
27) id = 00001q9k1j435s0, title = 'jquery-templates-data-link-and-globalization-accepted-as-official', number of words = '828'
28) id = 00000b4mu45b4e8, title = 'microsoft-ajax-cdn-now-with-ssl-support.aspx', number of words = '309'
29) id = 00000ao1rdg9dds, title = 'my-presentations-in-europe-december-2009.aspx', number of words = '1684'
30) id = 000010q75bisw74, title = 'new-lt-gt-syntax-for-html-encoding-output-in-asp-net-4-and-asp-ne', number of words = '977'
31) id = 00000lubanp79c0, title = 'no-intellisense-with-vs-2010-rc-and-how-to-fix-it.aspx', number of words = '449'
32) id = 00000tqxxxv1h4w, title = 'optional-parameters-and-named-arguments-in-c-4-and-a-cool-scenari', number of words = '841'
33) id = 000010ydwjqjzeo, title = 'pinning-projects-and-solutions-with-visual-studio-2010.aspx', number of words = '667'
34) id = 00001wjhyhw29ds, title = 'search-engine-optimization-seo-toolkit.aspx', number of words = '711'
35) id = 000007lw738nbwg, title = 'searching-and-navigating-code-in-vs-2010-vs-2010-and-net-4-0-seri', number of words = '1305'
36) id = 00000e1cqi0ta0w, title = 'silverlight-4-demos-from-my-pdc-keynote-now-available.aspx', number of words = '613'
37) id = 00000743lq060hs, title = 'url-routing-with-asp-net-4-web-forms-vs-2010-and-net-4-0-series.a', number of words = '1045'
38) id = 000020npa0inls0, title = 'using-ef-code-first-with-an-existing-database.aspx', number of words = '2520'
39) id = 00001vo9oowkpvk, title = 'Using-Server-Side-Comments-with-ASP.NET-2.0-.aspx', number of words = '460'
40) id = 00000eafvti4sn4, title = 'visual-studio-2010-and-net-4-0-update.aspx', number of words = '391'
41) id = 000017d48tqvpc0, title = 'visual-studio-2010-productivity-power-tool-extensions.aspx', number of words = '682'
42) id = 000008fncz9zpc0, title = 'vs-2010-and-net-4-0-beta-2.aspx', number of words = '590'
43) id = 000007m1ubf8zcw, title = 'vs-2010-code-intellisense-improvements-vs-2010-and-net-4-0-series', number of words = '734'
44) id = 00000jxh5t9ebr4, title = 'vs-2010-net-4-release-candidate.aspx', number of words = '748'
45) id = 00001r3ulw3aygw, title = 'vs-2010-web-deployment.aspx', number of words = '1352'
46) id = 000008sxc7ta800, title = 'wpf-4-vs-2010-and-net-4-0-series.aspx', number of words = '3171'

Total words translated so far = 59801

If I were to charge for those translations at $ 0.07 per translated word (market price), this quick math shows how much I’d have accrued so far:

59801 x 0.07 = $ 4,186.07

$ 4,186.07 / 46 = $ 91.00 average per doc translated
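For those who prefer to let the computer do this part too, here is a small sketch of the same arithmetic. The TranslationEarnings class is a hypothetical helper of mine, not part of the sample app; it uses BigDecimal so the cents are rounded explicitly instead of relying on floating point.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of the sample app: redoes the math above.
final class TranslationEarnings
{
    // Total price for a word count at a given per-word rate, rounded to cents.
    static BigDecimal total(int words, BigDecimal ratePerWord)
    {
        return ratePerWord.multiply(BigDecimal.valueOf(words)).setScale(2, RoundingMode.HALF_UP);
    }

    // Average price per document, rounded to cents.
    static BigDecimal averagePerDocument(BigDecimal total, int documents)
    {
        return total.divide(BigDecimal.valueOf(documents), 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args)
    {
        // 59801 words in 46 documents at $ 0.07 per word, as in the output above.
        BigDecimal total = total(59801, new BigDecimal("0.07"));
        System.out.println("Total = $ " + total);                                   // Total = $ 4186.07
        System.out.println("Average per doc = $ " + averagePerDocument(total, 46)); // Average per doc = $ 91.00
    }
}
```

The exact average is a hair above $ 91.00 (about 91.0015), which is why it rounds to the figure shown.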

That’s a lot of money, but as you know already I do not charge a thing to translate ScottGu’s blog. That’s something I do to help others and to keep myself up to date.

Hope you liked the curiosity that made me write this post, the reasoning regarding the math and of course this simple Java console application.

Notes
It’s important to mention that I started using GTT on Oct 17, 2009, that is, more than a year after I started translating Scott’s posts. According to my records, I have in fact translated 77 posts since April 2008. The remaining 31 posts ( 77 - 46 = 31 ) didn’t figure in the math above. :(

GTT could have folders just like Google Docs so that one could organize translations by client or whatever.

I tried to get only the documents whose translation is 100% complete, but GTT doesn’t give me this info, even if I mark the translation as complete. Although GTT shows 100% complete in its UI, entry.getPercentComplete() returns not 100% but the value described at Word count and translation completion. So I had to consider every document, even those that I still need to finish translating.
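If the completion value were usable, one workaround might be to count only documents at or above some threshold instead of exactly 100%. The sketch below is just that idea in isolation: CompletionFilter and its Doc holder are hypothetical, the percentages in the usage note are made up, and copying the real values out of each feed entry is left as an assumption.

```java
import java.util.List;

// Hypothetical sketch, not part of the sample app.
final class CompletionFilter
{
    // Plain holder for the three values the sketch needs.
    static final class Doc
    {
        final String title;
        final int sourceWords;
        final int percentComplete;

        Doc(String title, int sourceWords, int percentComplete)
        {
            this.title = title;
            this.sourceWords = sourceWords;
            this.percentComplete = percentComplete;
        }
    }

    // Sums the source words of the documents whose completion is at least minPercent.
    static int totalWords(List<Doc> docs, int minPercent)
    {
        int total = 0;
        for (Doc doc : docs)
        {
            if (doc.percentComplete >= minPercent)
            {
                total += doc.sourceWords;
            }
        }
        return total;
    }
}
```

For example, with one document of 482 words at a made-up 97% and another of 399 words at a made-up 40%, totalWords(docs, 95) counts only the first one, while totalWords(docs, 0) counts both, which is what the app above does.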

Download
You can download the sample app with the necessary libraries at:

https://sites.google.com/site/leniel/blog/GoogleTranslatorToolkitClientApp.zip