Machine Translation Tools webinar


  • The purpose of this webinar is to share what we’ve learned over the two-year process of creating the website, from technology to content development to outreach. The site was built with computer-assisted translation, that is, translation that combines machine translation tools with human review. It’s been live since January 2012. I’m Gwen Daniels, the Technology Director at ILAO. Also on the call is Dennis Rios, our Spanish Content and Outreach Coordinator.
  • I’m going to start with a little background about the project. LSC awarded the TIG grant in 2010, and work began in 2011. ILAO is not LSC-funded, so the grant was submitted by LAF. The grant had three major components: funding for a two-year Spanish Outreach and Content Coordinator position, integration of the Google Translate API, and the creation of tools needed to generate mirrored websites.
  • Because ILAO is not an LSC grantee and we didn’t want to be a subgrantee, we faced some special issues about what the Spanish content and outreach coordinator could do. We’ll talk much more about that when we get to content development and marketing. Basically, LAF provided oversight of the grant, and its Community Engagement Unit oversaw the outreach component. LAF contracted with ILAO for technical development, content translation services, and implementation of the outreach plan. The LSC funding did not allow us to generate any new content under the project.
  • Dennis will speak to the issues related to content and outreach in a bit. I’m going to start with an overview of the technology. We did three major things to support creation of the website. First, we integrated the Google Translate API into our content management system. Second, we translated the National Subject Matter Index (NSMI) into Spanish; the NSMI provides the practice-area framework for most statewide websites. Third, we created a series of tools that make it easier to generate the Spanish site and keep it updated.
  • So what is the Google Translate API? It’s a REST-based service from Google that lets web developers integrate translation directly into their systems. By comparison, the Google Translate website that many people use is a separate service that requires you to copy and paste your text. The Translate API is a paid service. It let us build a drop-down in our content management system that sends our content to Google, and Google sends back a JSON response containing the translation.
  • The Google Translate API is actually really simple to use. You send a GET request to the API with your key, the two-letter code of the source language, the two-letter code of the target language, and the text to be translated. If all goes well, Google responds with a JSON-formatted string that includes the translated text.
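The Python sketch below shows the GET request described above. It assumes the Cloud Translation v2 ("Basic") endpoint `https://translation.googleapis.com/language/translate/v2` and its `key`, `source`, `target`, `format`, and `q` query parameters. It also assumes the v2 response shape `{"data": {"translations": [{"translatedText": ...}]}}`. The slides mention ColdFusion code samples, so this is not ILAO's code, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the Cloud Translation v2 ("Basic") REST API.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_url(api_key: str, text: str, source: str = "en",
                        target: str = "es", fmt: str = "html") -> str:
    """Build the GET URL: API key, two-letter source/target codes, and the text."""
    params = {"key": api_key, "source": source, "target": target,
              "format": fmt, "q": text}
    return TRANSLATE_URL + "?" + urlencode(params)


def parse_translate_response(body: str) -> str:
    """Pull the translated text out of the JSON the API returns."""
    payload = json.loads(body)
    return payload["data"]["translations"][0]["translatedText"]


def translate(api_key: str, text: str) -> str:
    """Send one GET request and return the Spanish translation (makes a network call)."""
    with urlopen(build_translate_url(api_key, text)) as resp:
        return parse_translate_response(resp.read().decode("utf-8"))
```

URL building and response parsing are kept separate from the network call. That way, both can be checked without an API key or a live request.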
  • So what does this mean for statewide websites? It means we can build that GET request into our own systems. We send content to the Google Translate API, Google translates it and sends the translated text back, and then we can do whatever we want with it. For ILAO, we built a drop-down into our content management system. When a language is selected, it takes all our translatable fields and sends them to Google. By translatable fields I mean the fields we want in Spanish: title, description, the actual content, and keywords. Google can’t handle more than about 1,000 characters at a time. So we built a splitter that breaks blocks of text into smaller chunks at paragraphs, headers, and list elements. So I go in and select Spanish from the list…
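Here is an illustrative Python version of the splitter, not ILAO's code. The function name `split_for_translation` is hypothetical, and the sketch assumes the content is HTML. It breaks only before opening `<p>`, `<h1>` through `<h6>`, and `<li>` tags. It then packs consecutive pieces into chunks of at most `max_chars` characters. A single block longer than the limit is kept whole rather than cut mid-sentence, so that case would still need separate handling.

```python
import re

# Split points: just before each opening paragraph, header, or list-item tag.
BLOCK_START = re.compile(r"(?=<(?:p|h[1-6]|li)\b)", re.IGNORECASE)


def split_for_translation(html: str, max_chars: int = 1000) -> list[str]:
    """Break HTML at block markers, then greedily pack pieces into chunks."""
    pieces = [p for p in BLOCK_START.split(html) if p]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Start a new chunk if adding this piece would exceed the limit.
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

To reassemble the page, translate each chunk and concatenate the results in order. Because chunks are packed greedily, an opening tag such as `<ul>` can end up in a different chunk from its closing tag.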
  • And within a few seconds, we have the same content in Spanish. In addition to the translations, we updated our content management system to create a new piece of content from the English version. It copies all the metadata, such as problem codes, jurisdictions, author, and site settings, and populates all the translated fields with the Spanish translations. Because Google Translate supports HTML, we can preserve all of our formatting in the new piece of content. This means Dennis isn’t wasting time formatting text; he can focus on reviewing the actual translations.
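The sketch below illustrates the "copy the English item, then fill in the Spanish fields" step. It assumes a content item can be modeled as a record with translatable and non-translatable fields. `ContentItem`, its field names, and `make_translated_copy` are hypothetical stand-ins for ILAO's CMS. `translate` can be any function that maps English text to Spanish, such as a wrapper around the API call.

```python
from dataclasses import dataclass, replace
from typing import Callable

# Hypothetical field names; the talk lists title, description, content, and keywords.
TRANSLATABLE_FIELDS = ("title", "description", "body", "keywords")


@dataclass(frozen=True)
class ContentItem:
    title: str
    description: str
    body: str  # HTML, so formatting survives translation
    keywords: str
    language: str = "en"
    problem_codes: tuple[str, ...] = ()
    jurisdictions: tuple[str, ...] = ()
    author: str = ""


def make_translated_copy(item: ContentItem, target: str,
                         translate: Callable[[str], str]) -> ContentItem:
    """Copy every field (metadata included), then overwrite only the translatable ones."""
    translated = {name: translate(getattr(item, name)) for name in TRANSLATABLE_FIELDS}
    return replace(item, language=target, **translated)
```

The English item is left untouched. Metadata such as problem codes and author carries over to the copy automatically because `replace` starts from the original record.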
  • We’ve been using the service for the last couple of years, and what we’ve learned is this: it works well as a first pass for translation. The plainer the language in the English content, the better Google translates it. We’ve seen significant improvements over the past year; as Google learns from and indexes more Spanish content, it gets better as a translator.
  • As I mentioned, it’s a paid service. When we first submitted the grant proposal, it was free. Then Google announced it was deprecating the API, and the developer community complained enough that Google agreed to relaunch it as a paid service. Google charges $20 per million characters translated, and HTML markup does not count toward that total. We’ve spent less than $100.
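The small Python sketch below works through the arithmetic behind those numbers. At $20 per million characters, $100 covers 5 million billed characters. So spending under $100 means fewer than 5 million characters were billed, assuming the rate stayed constant. The tag-stripping count follows the talk’s statement that HTML markup is not billed. The regular expression is only a rough approximation, not Google’s billing rule.

```python
import re

PRICE_PER_MILLION_CHARS = 20.00  # USD, the rate quoted in the talk

# Rough pattern for an HTML tag; markup is assumed not to be billed.
TAG = re.compile(r"<[^>]+>")


def billable_characters(html: str) -> int:
    """Count characters after stripping HTML tags."""
    return len(TAG.sub("", html))


def estimated_cost(characters: int) -> float:
    """Dollar cost for a given number of billed characters."""
    return characters / 1_000_000 * PRICE_PER_MILLION_CHARS
```

For example, `estimated_cost(5_000_000)` gives `100.0`, the budget ceiling mentioned above.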
  • So we came up with the idea of creating a “site builder” that would take the page layout files from the English website and replace the English text with variable names.
  • To create the page templates, we took our English site code and replaced all the navigation elements with variable names. Conceptually, it is similar to creating an automated form or a mail merge. We did this for each page of our website, and the same approach should also let us publish new websites using variables for other languages.
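The mail-merge idea can be sketched with Python’s `string.Template`. Navigation text in a page template becomes `$variable` placeholders, and each language supplies its own values. The template fragment, variable names, and Spanish labels below are hypothetical examples, not ILAO’s actual templates.

```python
from string import Template

# Hypothetical fragment of an English page template with the navigation
# text replaced by variable names, in the spirit of a mail merge.
NAV_TEMPLATE = Template(
    '<nav><a href="$home_url">$home_label</a> | '
    '<a href="$help_url">$get_help_label</a></nav>'
)

# One set of values per language; publishing another language means adding one more.
LANGUAGES = {
    "en": {"home_url": "/", "home_label": "Home",
           "help_url": "/get-help", "get_help_label": "Get legal help"},
    "es": {"home_url": "/", "home_label": "Inicio",
           "help_url": "/obtener-ayuda", "get_help_label": "Obtener ayuda legal"},
}


def render(template: Template, lang: str) -> str:
    # substitute() raises KeyError if a variable has no value for this language,
    # which flags untranslated navigation text before publishing.
    return template.substitute(LANGUAGES[lang])
```

Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch. A missing translation fails loudly instead of leaving a stray `$variable` on the published page.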
  • From an admin perspective, all the variable names used in our page templates are stored in a database table. The content team can access it by clicking Manage Templates. This lets them make changes and update translations when needed without tech support. I’ll show you what that screen looks like.
  • So they can change links and text through the admin panel, then regenerate the website and have their changes take effect immediately.
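The sketch below covers the database-backed side of the site builder. It assumes a simple table keyed by variable name and language, and SQLite stands in for ILAO’s database. The table name `template_vars` and the helpers `set_value` and `regenerate` are hypothetical. `set_value` models an edit made on the Manage Templates screen. `regenerate` re-renders every page template with the current values.

```python
import sqlite3
from string import Template


def open_store() -> sqlite3.Connection:
    """Create an in-memory table mapping (variable, language) to a value."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template_vars (name TEXT, lang TEXT, value TEXT, "
        "PRIMARY KEY (name, lang))"
    )
    return conn


def set_value(conn: sqlite3.Connection, name: str, lang: str, value: str) -> None:
    """Insert or overwrite one variable, as an admin edit would."""
    conn.execute(
        "INSERT OR REPLACE INTO template_vars (name, lang, value) VALUES (?, ?, ?)",
        (name, lang, value),
    )
    conn.commit()


def regenerate(conn: sqlite3.Connection, templates: dict[str, Template],
               lang: str) -> dict[str, str]:
    """Re-render every page template with the current values for one language."""
    values = dict(conn.execute(
        "SELECT name, value FROM template_vars WHERE lang = ?", (lang,)))
    return {page: tpl.substitute(values) for page, tpl in templates.items()}
```

Because pages are rebuilt from the table on demand, an admin’s edit shows up as soon as `regenerate` runs, without anyone touching the template files.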
  • And now, I’ll turn this over to Dennis to talk about content and marketing.
  • Content and Marketing: Ayuda Legal IL
  • So the idea was to combine technology with community engagement to create a vibrant, meaningful legal resource for Hispanics in Illinois. The project started in January 2011. By February, we had hired a Spanish Content & Outreach Coordinator and engaged a Spanish Website Advisory Committee, referred to as the “SWAC”. The committee consists of members from each of the organizations whose logos you see on this slide. These include the three LSC-funded programs in Illinois and several other community groups with strong ties to the Hispanic and Latino populations. The committee has a project charter. Its mission is to guide, promote, advise and support Illinois Legal Aid Online in its effort to better serve Illinois’ lower-income, Spanish-speaking community and their advocates by increasing access to justice. It meets every other month, and we provide lunch. We designed a number of documents, including a project timeline and a checklist of participation and outreach guidelines for committee members. We’d be happy to share these with you if you are interested. The members help us prioritize content for translation. They also advise on drafting an outreach and education plan and a separate marketing plan, and assist in implementing these plans. They also helped us pick out a “face” for the website, which I’ll show you in a moment.
  • Here are more metrics from Google Analytics: our most-viewed articles. Here you see the titles in Spanish. For those of you, like me, who don’t speak Spanish, here is their translation:
     – How can a criminal record affect my immigration status?
     – U Visas for victims of crime
     – How long will a foreclosure on my home take?
     – Food stamps
     – Obtaining permanent residency in the US
     – Tips for parents who are separating
    We were surprised by this list. For comparison’s sake, I’ll also give you the most popular pieces of legal content on our English website for the public. We expected more of a focus on immigration-related issues, but did not expect such a disparity in the areas covered.
  • I want to talk briefly about usage so far. The website launched on January 23, 2012, so it is about 10 months old. What you see on this slide are some metrics taken from Google Analytics, a free web analytics service. When I ran these stats, there had been almost 40,000 visits to the website. The pie chart shows how people are getting to us:
     – Search traffic represents users who come to the website from Google or another search engine.
     – Referral traffic represents users who follow links from other websites.
     – Direct traffic represents users who type the URL into their browser.
    The map shows the approximate geographic location of the visits. Not surprisingly, most came from Illinois, but a portion also came from California, Texas, and Florida. We had a fair number from Mexico and Spain as well. The other interesting usage statistic is mobile devices. So far, 24% of visits to the website have come from mobile devices, compared with about 16% on our English-language website, Illinois Legal Aid dot org. The breakdown by type of mobile device heavily favors Android, followed by Apple devices.
    Poll: What percentage of Hispanic adults own a smartphone? 50%

    1. Creating An Experiment with Machine Translation
    2. About Our TIG Grant
       • Project of LAF (formerly Legal Assistance Foundation of Metropolitan Chicago)
       • Started in January 2011
       • $157,100
       • Funded: creation and promotion of the Spanish website, a two-year Spanish Outreach and Content Coordinator, integration of machine translation tools, and tools to create mirrored sites
    3. Staffing & Structuring the Project
       • LAF oversaw the grant
       • LAF’s Community Engagement Unit oversaw the outreach component
       • LAF subcontracted with ILAO for:
         – Technical Development
         – Content translation of Existing Statewide Website Content
         – Implementation of the outreach plan
    4. The Technology Behind
       • Google Translate API
       • A Spanish NSMI
       • Site Builder Tools
    5. The Google Translate API
       • REST-based service that allows seamless translation from a website
       • Everything happens machine to machine; the Google part is completely hidden from the user
    6. Thoughts on Google Translate’s Utility
       • Plain language translates best
       • Short sentences give better translations
       • It has improved significantly in the past year
    7. How Much?
       • Google charges $20 per million characters translated
       • ILAO has spent less than $100 in Google Translate costs since January 2011
    8. Spanish NSMI
       • Will be available on Google Docs, and we’ll share it with LSNTAP
    9. The “Mirrored Site Builder”
       • How it works:
         – Took English web templates and replaced English text with variable names
         – Created a database table to map variable names to Spanish translations
         – Built tools to allow us to set the Spanish translations
         – Built tools to allow us to generate and re-generate the Spanish website on demand
    10. Our Site Builder
    11. Tech Docs Coming Soon
       • Translated NSMI
       • List of Variables and their translations used on
       • Technical Documentation on Google Translate, including code samples in ColdFusion
       • Technical Documentation on the Site Builder
    12. Content & Marketing
    13. The SWAC
       • Spanish Website Advisory Committee
       • 7 Local Pro-Latino Organizations
       • Meeting every other month
    14. What Content to Translate?
       • Most-viewed on the Spanish site (with English translations):
         1. ¿Cómo los Antecedentes Penales Pueden Afectar mi Estatus Migratorio? (How can a criminal record affect my immigration status?)
         2. La U-Visa: Beneficios de Inmigración para Víctimas de Crimen (U Visa: Immigration benefits for victims of crime)
         3. ¿Cuánto Tiempo Tomará el Proceso de Ejecución Hipotecaria sobre Mi Casa? (How long will a foreclosure on my home take?)
         4. Obtener Estampillas de Comida (SNAP) (Getting Food Stamps)
         5. ¿Cómo Puedo Obtener la Residencia Permanente en los EE.UU.? (Getting permanent residency in the US)
         6. Consejos para Padres Separados: Cómo Ayudar a sus Hijos con el Proceso de la Separación (Tips for parents who are separating)
       • Most-viewed on the English site:
         1. Getting a Divorce in Illinois
         2. Evicting Your Tenant
         3. Getting Custody of a Child
         4. Creating a Non-Profit Organization
         5. Changing Your Name in Illinois
         6. Applying for Unemployment Benefits
         7. Expunging Your Criminal Record
    15. Marketing the Site
       • Usability testing
       • Newspapers and other publications
       • Radio
       • Presentations
    16. Website Usage
       • Mobile use: 25% of visits from mobile (vs. 16% on the English site)
       • 60% Android, 31% iOS, 3% BlackBerry
    17. What would we have done differently?
       • Prioritized content based on Spanish, not on English
       • Had we not been limited to translating content, we probably would have developed some new content directly in Spanish
       • Quality assurance sooner, especially when English content is changed
    18. Questions?
       • Gwen Daniels, Director of Technology Development
       • Dennis Rios, Spanish Content and Outreach Coordinator