Periodic Reporting for period 1 - Unbabel (Unbabel: Scalable, affordable and seamless content globalization using distributed crowd-post editing)

Summary

Unbabel has developed an innovative translation pipeline that combines Statistical Machine Translation (MT) and distributed crowd post-editing. Unbabel’s approach has a strong emphasis on improving the current state-of-the-art technology to enable fast, scalable and cost-effective human quality translations.
The approach is based on two key insights:
1. A large percentage of mistakes in MT stem from small errors that are easy for humans to fix.
2. In order to produce consistent quality, we need to have multiple people working in sequence, each correcting the previous person’s work.

The use of post-editing, together with several computer-aided translation tools embedded directly in the translation UI, significantly reduces the human effort required to produce a quality translation and thus the cost of translation. Furthermore, because Unbabel uses statistical MT engines, the post-edited data it generates is used to update the MT engine. This leads to higher-quality MT output that requires less post-editing effort from the human editors, creating a virtuous cycle. Classical post-editing has been shown to increase translation speed by 20%-40% while also increasing translation quality. Adding parallelism to the translation process greatly increases its scalability and, to the best of our knowledge, this is the first time it has been done.
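The sequential part of this process can be illustrated with a minimal sketch. It is not Unbabel's implementation: the `Pipeline` class, the `Editor` type and the toy functions are hypothetical names introduced only for illustration, and the "MT engine" is a stand-in lookup table. The sketch shows a machine draft passing through several editors in turn, with each source/post-edited pair kept so that it could later be used to update the engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: an "editor" is any function that takes the source text
# and the current draft and returns a corrected draft.
Editor = Callable[[str, str], str]


@dataclass
class Pipeline:
    machine_translate: Callable[[str], str]  # stand-in for the MT engine
    editors: List[Editor]                    # applied one after another
    training_pairs: List[Tuple[str, str]] = field(default_factory=list)

    def translate(self, source: str) -> str:
        draft = self.machine_translate(source)
        # Each editor corrects the output of the previous step.
        for edit in self.editors:
            draft = edit(source, draft)
        # Keep the (source, post-edited) pair for a later MT engine update.
        self.training_pairs.append((source, draft))
        return draft


# Toy stand-ins so the sketch runs end to end (assumptions, not real components).
def toy_mt(source: str) -> str:
    return {"bom dia": "good  day"}.get(source, source)


def fix_spacing(source: str, draft: str) -> str:
    return " ".join(draft.split())


def fix_wording(source: str, draft: str) -> str:
    return draft.replace("good day", "good morning")


pipeline = Pipeline(toy_mt, [fix_spacing, fix_wording])
result = pipeline.translate("bom dia")
print(result)
```

In this sketch the editors run strictly in sequence on one text. Parallelism, as described above, would come from processing many texts or segments at the same time, which is left out here.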

Unbabel’s business strategy builds on existing cloud services and web application platforms, such as Zendesk, MailChimp or WordPress. With this approach, Unbabel is able to address large user communities at once, instead of trying to reach a large number of individual users spread around the world. To this end, Unbabel provides an API that allows its services to be integrated with online applications already used by companies, making translation a transparent part of the process. This increases the ability of companies, especially SMEs, to compete in international markets, effectively levelling the language playing field.

Two objectives have been identified as crucial for the next step before the start of the SME Instrument Phase 1 project:
1. To accelerate growth by integrating the Unbabel translation services with the most promising web/cloud platforms whose users, such as SMEs, are the main target for our services; and
2. To accelerate the growth and raise the quality of the editor community in order to improve the quality and availability of our services. A critical issue in this context is the quality of the user interfaces: to attract editors, the interfaces must be attractive, and to make post-editing efficient, they must allow for efficient use.

Unbabel performed two studies to support the team in pursuing these goals. The first study is a market survey aimed at identifying the most promising cloud/web services as channels for company growth. The second study is aimed at finding the best solution for the post-editing user interface. These studies were carried out in the course of the project and their results were integrated into a broader feasibility study evaluating the company’s business case and planning the next steps in business development.

Work performed

Market Survey

The main objective of the Market Survey was to identify the most promising cloud services as main channels for Unbabel’s translation services. To this end, Unbabel analysed the market potential of each channel and the risk and effort associated with integrating the translation platform with that channel. Finally, Unbabel weighed these factors against each other, also considering strategic aspects, and decided on the channels to integrate first.

Before this study, the market potential, the user segmentation and the characteristics of the translation process required to meet the demand of the various integrations were not yet well understood. Through an extensive market study of interesting integrations, Unbabel has learned much about the market it intends to enter. By first listing possible integrations by their strategic and financial potential, we came up with a short list of 50 potential integrations in the areas of E-commerce, CMS, Sales/CRM, Social Media, Marketing, Customer support, Websites, E-mail, Self Publishing, Collaboration Software, File sharing, Sharing Economy and Subtitling. After a first filtering based on data availability and market importance, 30 services were considered for further analysis. For these 30 potential integrations we conducted market analyses to look into the attractiveness and dynamics of each integration. In particular, we considered user data and the number of international users, estimated the approximate volume of words to be translated, and estimated how many of those customers we would be able to acquire, to get an idea of the size of the market we can capture.
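The kind of estimate described above can be sketched as a simple product of factors. The report does not give the actual formula or figures, so the function name, the parameters and the example values below are all hypothetical and serve only to show how user numbers, international share, acquisition rate and word volume might combine.

```python
def estimate_monthly_words(total_users: int,
                           international_share: float,
                           acquisition_rate: float,
                           words_per_customer: float) -> float:
    """Rough monthly word volume for one integration (all inputs hypothetical)."""
    if not 0.0 <= international_share <= 1.0:
        raise ValueError("international_share must be between 0 and 1")
    if not 0.0 <= acquisition_rate <= 1.0:
        raise ValueError("acquisition_rate must be between 0 and 1")
    # Users of the platform who work across languages.
    international_users = total_users * international_share
    # The fraction of those users expected to become customers.
    acquired_customers = international_users * acquisition_rate
    # Total words those customers would send for translation per month.
    return acquired_customers * words_per_customer


# Placeholder numbers, not taken from the study.
words = estimate_monthly_words(
    total_users=1_000_000,
    international_share=0.5,
    acquisition_rate=0.01,
    words_per_customer=2_000,
)
print(f"{words:,.0f} words per month")
```

Multiplying the result by a price per word would give a revenue estimate. The price is left out here because the report does not state one.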
Our qualitative assessment of the 30 integrations focused mostly on the product’s growth potential, ease of finding customers, type of customers, pricing structure, availability of recurring dynamic content, presence of competing offers and worldwide presence. We categorized the integrations on these aspects. After an elimination process, a list of 13 potential integrations remained. These were prioritized on prior customer requests, enterprise tools, direct competitors that are already clients of Unbabel (e.g. Zendesk, MailChimp) and presence of competition. If any competing translation service was available, the integration was deprioritized in a ranking that was initially based on the amount of revenue per month and the other prioritization topics (e.g. prior customer requests), with the weight determined by the strength of competition as assessed by Unbabel. After ranking the candidates on total revenue and average revenue per user, we discussed the strategic intention of each one. We also took into account the ease of building each integration. As a preliminary result, we selected Google Drive and Facebook. These are the two integrations that scored very well overall and that we believe make the most sense to develop.
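A ranking of this kind can be sketched as a weighted score. The report does not disclose the weights or the exact formula, so everything below is an assumption made for illustration: the `Candidate` fields, the `request_bonus` parameter, the multiplicative form of the score and the placeholder candidates with made-up values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    monthly_revenue: float    # estimated revenue per month (hypothetical units)
    customer_requested: bool  # whether customers already asked for it
    competition: float        # 0.0 = no competing service, 1.0 = very strong
    ease_of_build: float      # 0.0 = very hard, 1.0 = very easy


def priority_score(c: Candidate, request_bonus: float = 0.2) -> float:
    """Assumed scoring rule: revenue, boosted by prior customer requests,
    discounted by strength of competition and by build difficulty."""
    base = c.monthly_revenue * (1.0 + (request_bonus if c.customer_requested else 0.0))
    return base * (1.0 - c.competition) * c.ease_of_build


def rank(candidates: List[Candidate]) -> List[Candidate]:
    # Highest score first.
    return sorted(candidates, key=priority_score, reverse=True)


# Placeholder candidates with made-up values.
candidates = [
    Candidate("integration_a", 100.0, True, 0.0, 0.5),
    Candidate("integration_b", 200.0, False, 0.75, 1.0),
    Candidate("integration_c", 80.0, False, 0.0, 1.0),
]
ranking = [c.name for c in rank(candidates)]
print(ranking)
```

In this toy data the candidate with the highest raw revenue ends up last, because strong competition discounts its score. That mirrors the deprioritization described above, not the actual outcome of the study.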

The Unbabel integration for Google Drive will enable enterprise users to easily request the translation of whole files from within their workspace, thus overcoming language barriers. This tool will enable international managers to easily translate marketing material, manuals, customer-facing presentations, etc. Unbabel for Google Drive will be an enterprise-grade solution that will allow the operations team to manage the translation budget used by employees. The tool will also be used to translate documents used internally in multinational companies, reducing internal communication difficulties caused by language. Finally, some companies will use it to translate content into other languages to reach new clients with their products. Unbabel has already received several requests from customers to create a cloud storage tool/add-on that would work just like any other Google Drive add-on. Some current customers already upload files into a specific “Unbabel translation” folder on a cloud storage service, which they share with Unbabel so that these are quickly and without much additional effort tran

Final results

Both activities had very clear results that point us to next steps to accelerate the company’s growth. During the first activity, two channels, Google Drive and Facebook, were identified that should be integrated with the Unbabel API. The second activity resulted in the selection of two main interfaces to be distributed and disseminated in the editor community with promising features improving translation speed and quality. Unbabel will now start the integration with Google Drive and Facebook and focus on these channels in customer acquisition. Unbabel will complete the implementations of interfaces and start to promote their use in the editor community.
Besides these very concrete results, the studies also point to further potential for improvement and to areas that deserve further investigation. These results are:
· Unbabel should focus more on client relationship management, proposing holistic solutions to customers that so far use Unbabel services for only a subset of their own services. With this approach it is much easier and more cost-efficient to tailor solutions to the specific needs of customers, improving customer satisfaction and making services even more affordable to end users.
· Concerning the integration of new channels, Unbabel should rely more on the user community to extend the technical platform through which the translation services are available. There is a huge number of platforms available and Unbabel will not be able to address them all in the near future. This has already influenced the decision to go for the platforms with the greatest user bases. Instead of providing integrations for smaller platforms in the future, Unbabel will rather support clients in developing their own integrations, whether proprietary or, even better, open source. The positive experience with building the editor community is an inspiration to also foster a community for the Unbabel API. To this end, Unbabel must improve the API to make it more user-friendly and, even more importantly, must stimulate and support the growth of the community.
· In this context, a complete automation of the process is of growing importance.
· The complete pipeline includes machine quality assessment to decide if a result needs further editing or a human consistency check. A lot of attention is paid to these aspects in the research community. Unbabel will exploit and improve those results by adopting them in practice.
· As mentioned, there is still room for improvement in the mobile editor interfaces. Unbabel will study analytics data from the editor community to optimize the interfaces even further and get the maximum speed and quality out of them.
· Even more important, however, is the integration of the MT engine with the editor perspective. Until now, these aspects were handled separately by the research community and by us. However, from the way editors use the available translation options, we can learn how to provide options that help editors reach excellent human-quality translation quickly, instead of providing a translation that is “the best” from a merely theoretical perspective.
· Finally, automatic error detection mechanisms should be implemented to alert editors to potential improvements in their work. One of the unexpected results of the UX study is that users often do not correct minor errors unless these are pointed out to them. A simple indicator pointing to a potential error, similar to those found in modern word processors such as MS Word, therefore has the potential to significantly improve translation quality.
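Two of the points in the list above, quality-based routing and simple error indicators, can be illustrated with a short sketch. It makes several assumptions that are not in the report: the quality score is taken to be a number between 0 and 1 supplied by some external estimator, the threshold of 0.8 is arbitrary, and the three mechanical checks (repeated words, multiple spaces, space before punctuation) are merely examples of what an indicator might flag. The function names are hypothetical.

```python
import re
from typing import List, Tuple


def needs_human_review(quality_score: float, threshold: float = 0.8) -> bool:
    """Send a segment to an editor when its estimated quality is below the threshold."""
    return quality_score < threshold


def find_hints(text: str) -> List[Tuple[int, str]]:
    """Return (character offset, message) pairs for simple mechanical issues."""
    hints: List[Tuple[int, str]] = []
    # The same word twice in a row, e.g. "is is".
    for m in re.finditer(r"\b(\w+)\s+\1\b", text, flags=re.IGNORECASE):
        hints.append((m.start(), f"repeated word: {m.group(1)!r}"))
    # Two or more consecutive spaces.
    for m in re.finditer(r" {2,}", text):
        hints.append((m.start(), "multiple spaces"))
    # Whitespace directly before a punctuation mark.
    for m in re.finditer(r"\s+[,.;:!?]", text):
        hints.append((m.start(), "space before punctuation"))
    return sorted(hints)


draft = "This is is a  test ."
if needs_human_review(quality_score=0.6):
    for offset, message in find_hints(draft):
        print(f"offset {offset}: {message}")
```

The hints only point at possible problems and leave the decision to the editor, in line with the indicator idea described above. Checks of this kind are language-dependent: a space before punctuation, for instance, is correct in some languages, so a real tool would need per-language rules.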

Website & more info

More info: http://www.unbabel.com.