Machine translation (MT) has grown rapidly in recent years, and experts predict this is just the beginning of a new age for AI-powered translation businesses. AI machine translation has displaced statistical machine translation, rule-based translation programs, and similar technology. For many widely used languages, MT now approaches human translation quality.
A recent survey on machine translation deployment revealed that the technology is being integrated ever more widely. Unsurprisingly, claims that machine translation has reached human parity still meet widespread skepticism. We also noticed a widespread desire to use conversational machine translation for several purposes, which calls for further guidance on how and when to use it. These views stem from the real improvements the technology has achieved in recent years.
Market research revealed three trends expected to drive advancement in machine translation:
- Increased integration of machine translation as a built-in service for other applications. MT serves numerous purposes and varied audiences.
- A shift to context-driven machine translation. Some tech developers think of context only as large chunks of text, but research suggests that context can take several forms, and that accounting for them can bring significant improvements to the sector.
- The development of metadata-aware machine translation. Few machine translation engineers rely on metadata in their training today, but this will change because MT should be able to consider aspects such as the age, gender, and location of the speaker when registering and translating texts.
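One common way to make a translation model aware of metadata is to prepend pseudo-tokens to the source sentence, so the model can learn to condition on them during training. The snippet below is a minimal sketch of that idea only. The `SpeakerMetadata` class, the `tag_source` function, and the tag format are hypothetical names chosen for illustration, not part of any particular MT engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeakerMetadata:
    """Hypothetical metadata attached to a source sentence."""
    age_group: Optional[str] = None   # e.g. "adult"
    gender: Optional[str] = None      # e.g. "female"
    location: Optional[str] = None    # e.g. "MX"


def tag_source(sentence: str, meta: SpeakerMetadata) -> str:
    """Prepend one pseudo-token per known metadata field to the sentence."""
    tags = []
    if meta.age_group:
        tags.append(f"<age:{meta.age_group}>")
    if meta.gender:
        tags.append(f"<gender:{meta.gender}>")
    if meta.location:
        tags.append(f"<loc:{meta.location}>")
    # Fields that are unknown are simply omitted, so untagged text still works.
    return " ".join(tags + [sentence])


print(tag_source("¿Cómo estás?", SpeakerMetadata(gender="female", location="MX")))
```

A model trained on sentences tagged this way could, in principle, pick a register or gender agreement that fits the speaker. Whether it actually does depends on the training data.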
Taken together, the above trends show that machine translation could respond intelligently to the market’s needs at several levels and deliver high-quality results for set contexts. However, even though artificial intelligence has progressed in recent years thanks to improvements in data science, neural networks, and machine learning, some persistent challenges still hold back MT progress.
Frontiers machine translation must cross to reach human parity
Long-tail languages
Machine translation plays a paramount role in enabling organizations to address audiences in their own languages. For conversational MT, this implies communicating in long-tail, or niche, languages. Long-tail languages are less frequently used, and for a long time machine translation developers didn’t focus on building services suitable for them. But with the rise of globalization, organizations need machine translation software to communicate with their public in their native languages, whatever they may be. The term usually refers to languages that only a few thousand or a few million people use. These languages also include idioms that can become blockers for AI machine translation.
Why do long-tail languages pose challenges to MT?
Commonly used languages like English, French, Spanish, or Dutch have huge data resources that developers can use to build or improve machine translation capabilities. Long-tail languages, on the other hand, lack these data resources or have low-quality datasets that negatively affect machine translation. Neural machine translation (NMT) is the ace up developers’ sleeve for difficult languages like Russian and Japanese, and it paves the way for long-tail communication systems. It changes the traditional approach by building direct models for complex languages and refining them with input from human translators.
Word accuracy
Data science has progressed significantly in recent years, resulting in fast and effective computing technology that can analyze more data in shorter time frames. This trend has improved word accuracy in AI translation, and specialists are optimistic that the technology can achieve the higher level of accuracy that professional use requires. But word accuracy doesn’t solve all AI machine translation issues, because humans don’t communicate using isolated words. They combine words to obtain particular meanings and to construct paragraphs and whole works of writing or speech. People offer context, use articulation and tone, imply meaning, use metaphors and comparisons, and rely on irony and satire to convey a message. AI still fails to detect and translate many of these language characteristics, but it can get smarter as technology and data science advance.
Use of idioms
It’s raining chair legs. Earth and sand are falling. It’s throwing frogs. You’re probably confused by these three phrases. That’s understandable if English is your native language, because they’re the Greek, Japanese, and Polish equivalents of “It’s raining cats and dogs”. Now, if you didn’t understand their meaning, imagine how hard it is for AI. Yet organizations rely on local idioms to create an instant connection with users because these expressions are rooted in the history, culture, and general mindset of a society. Even though machine translation software is multifaceted these days thanks to the high availability of data and deep neural networks, it still finds these set expressions challenging to translate.
An MT system would most likely fail to handle the idioms mentioned above. However, machine translation developers like Pangeanic are aware that organizations rely on local set expressions to connect with their audiences. They strive to develop truly conversational systems by integrating thousands of idioms into the software to help it understand and connect cultural references from different cultures and languages. Conversational machine translation might be the best thing since sliced bread if it succeeds in transferring language subtleties. Pangeanic’s CEO Manuel Herranz states, “Trust me, it’ll take a lot of such examples for AI to learn or, at least, give me the benefit of the doubt!” Until MT systems manage to grasp the meaning of local idioms, organizations should rely on MT for medical, legal, or scientific texts, where literal translation is expected, and leave creative and literary texts to human translators.
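The idea of integrating idioms into software can be illustrated with a very small sketch: a lookup table that maps a literal rendering of a foreign idiom to its idiomatic English equivalent. This is a toy post-editing step under stated assumptions, not a description of how Pangeanic or any other vendor implements idiom handling. The `IDIOM_GLOSSARY` table and the `post_edit_idiom` function are hypothetical, and a production system would more likely work on the source language and on far larger data.

```python
# Hypothetical glossary: literal English rendering -> idiomatic English.
IDIOM_GLOSSARY = {
    "it's raining chair legs": "it's raining cats and dogs",     # Greek
    "earth and sand are falling": "it's raining cats and dogs",  # Japanese
    "it's throwing frogs": "it's raining cats and dogs",         # Polish
}


def normalize(text: str) -> str:
    """Lowercase, unify apostrophes, and drop surrounding punctuation."""
    return text.lower().replace("’", "'").strip(" .!?")


def post_edit_idiom(literal_translation: str) -> str:
    """Return the idiomatic equivalent if the whole sentence is a known
    literal idiom; otherwise return the input unchanged."""
    return IDIOM_GLOSSARY.get(normalize(literal_translation), literal_translation)


print(post_edit_idiom("It’s raining chair legs."))
print(post_edit_idiom("The weather is fine."))
```

An exact-match table like this only catches idioms that appear as complete sentences in a known form, which hints at why so many examples are needed before a system handles set expressions reliably.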
Algorithmic bias
Algorithmic bias is a well-known issue within AI technology and is challenging to overcome. The hard truth is that algorithms inherit their bias from the people who create and use them and from the social environment those people live in. With AI machine translation, the most common form of bias relates to gender (for example, the assumption that doctors are male and nurses are female). The problem is complex because several languages don’t use gendered pronouns, and MT software has to find ways to translate accurately when gender is relevant.
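One possible mitigation, when the source language leaves gender unspecified, is to return both gendered readings instead of silently picking one. The sketch below assumes a hypothetical upstream step that sets `source_is_gender_neutral` (for instance, when the source uses a gender-neutral pronoun such as Turkish “o”). The `gender_variants` function is an illustration only and handles just the English subject pronouns “he” and “she”.

```python
import re
from typing import List

# Matches the English subject pronouns "he" and "she" as whole words.
_SUBJECT = re.compile(r"\b(he|she)\b", flags=re.IGNORECASE)


def _swap(match: "re.Match[str]", pronoun: str) -> str:
    """Replace the matched pronoun, preserving initial capitalization."""
    word = match.group(0)
    return pronoun.capitalize() if word[0].isupper() else pronoun


def gender_variants(mt_output: str, source_is_gender_neutral: bool) -> List[str]:
    """Return both gendered readings when the source left gender unspecified."""
    if not source_is_gender_neutral or not _SUBJECT.search(mt_output):
        return [mt_output]
    masculine = _SUBJECT.sub(lambda m: _swap(m, "he"), mt_output)
    feminine = _SUBJECT.sub(lambda m: _swap(m, "she"), mt_output)
    return [masculine, feminine]


# Turkish "O bir doktor" does not specify the doctor's gender.
print(gender_variants("He is a doctor.", source_is_gender_neutral=True))
```

Showing both variants leaves the choice to the user or to surrounding context, rather than letting a stereotype in the training data decide.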
Crossing the final barriers is possible
Considering the widespread success of NMT, the technology is expected to overcome the barriers mentioned above in the coming years.
This article was written in cooperation with Pangeanic