IMLD 2019: “Once words are written, they can’t be taken back”

For International Mother Language Day 2019 (21 February), we interviewed Vadim Berman, CEO of Tisane Labs.

Q: Vadim, you are a co-founder and the CEO of Tisane Labs, which provides text analytics for over 20 languages. How did you get involved with languages?

Vadim Berman, CEO, Tisane Labs

VB: One could describe me as a “serial immigrant” or “ethnically confused” 😊. I have lived in six countries; my first move was in 1991, from the then-Soviet Union to Israel, when I was in my teens. I had to master Hebrew at school, and I was also the family translator. I was upset to discover that my English, which I had thought was pretty good, was nowhere near the level of my local classmates. So I sat down with a textbook and forced myself to keep a diary in English.

Later, when I started my career as a software engineer, I became obsessed with the connection between languages and software. The first push to explore the world of natural language processing was a 2000 Wired article called “Talking to Strangers”. (It looks like it’s still there: https://www.wired.com/2000/05/translation-2/. Little did I know that I would later meet some of the people mentioned in it in person.) The second push was a speculative fiction story by Jorge Luis Borges called “Tlön, Uqbar, Orbis Tertius”, which describes an imaginary world whose inhabitants deny reality and, as a result, use languages without nouns. I wondered how software could handle that kind of language.

I had zillions of ideas and wanted to try them all. I thought the combination of my linguistic and software development skills gave me an advantage. I started experimenting with machine translation and meaning extraction. The interest became an obsession, the obsession became a living, and a few years later, by then living in Australia, I co-founded LinguaSys, an American venture that was later acquired by Aspect Software, a US-based multinational. I met much of my current team at LinguaSys. After leaving Aspect, I started Tisane Labs in 2017.

Q: Tisane stands apart in supporting so many different languages. Can you explain why?

VB: In one colloquial sentence, because we can and because we have to.

The decision to focus on multilingualism was motivated both by the economics and by the possible applications. Our language models share a linguistically neutral core, so a new language never starts from zero, and we have several mechanisms that make it easy to reuse shared elements between language models.

When less mainstream languages take less effort to add, they become economically viable. Because these markets are often underserved, we face less competition and can offer broader coverage.

While there are many use cases where one or two languages are enough, in many scenarios, such as the hospitality industry, ignoring most languages means ignoring a significant segment of your customer base, no matter how small your business is.

Q: Can you give us some interesting examples of something similar / different among these languages?

VB: Funnily enough, the biggest similarity is that in every language, many native speakers believe their language is the most unique, the most complex, and the most deserving of special treatment.

My focus is on the inner workings of languages. I see them not as an amorphous bag of words, but as a set of cogs, levers, and pulleys.

On the structural level, after a while it looks like different languages borrow from more or less the same bag of tricks. There are lots of differences, of course, but when we add support for a new language, we often find that a new phenomenon can be handled the same way as a seemingly different phenomenon in another language. For example, when you think about it, English phrasal verbs (like “give away” or “tell off”) behave much like German and Dutch separable verbs (e.g. “ankommen” in German, as in “der Zug kommt um acht an”): in both cases the verb and its particle can be split apart in a sentence yet still form a single unit of meaning. So we handle both in very similar ways.
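To make the idea concrete, here is a minimal Python sketch, assuming the tokens of a single clause have already been lemmatised and tagged upstream. The lexicon, the `join_separable` function and the tag names are hypothetical illustrations, not Tisane’s actual implementation: the point is only that one lookup keyed on (verb, particle) can cover both the English phrasal verb and the German separable verb.

```python
# Hypothetical shared lexicon: (verb lemma, detached particle) -> multiword entry.
# Illustrative entries only; not Tisane's actual data model.
SEPARABLE = {
    ("give", "away"): "give away",  # English phrasal verb
    ("tell", "off"): "tell off",    # English phrasal verb
    ("kommen", "an"): "ankommen",   # German separable verb
}


def join_separable(tokens):
    """Attach each verb to a later particle in the same clause.

    tokens: one clause as a list of (lemma, tag) tuples, with tags such as
    "VERB" and "PART" assumed to come from an upstream tagger.
    Returns the multiword entries found, in the order of their verbs.
    """
    found = []
    used = set()  # positions of particles already attached to a verb
    for i, (lemma, tag) in enumerate(tokens):
        if tag != "VERB":
            continue
        # Scan forward for the first unused particle that forms a known pair.
        for j in range(i + 1, len(tokens)):
            part, ptag = tokens[j]
            if ptag == "PART" and j not in used and (lemma, part) in SEPARABLE:
                found.append(SEPARABLE[(lemma, part)])
                used.add(j)
                break
    return found


# "She gave the book away" and "Der Zug kommt um acht an", already lemmatised.
english = [("she", "PRON"), ("give", "VERB"), ("the", "DET"),
           ("book", "NOUN"), ("away", "PART")]
german = [("der", "DET"), ("Zug", "NOUN"), ("kommen", "VERB"),
          ("um", "ADP"), ("acht", "NUM"), ("an", "PART")]
print(join_separable(english))  # ['give away']
print(join_separable(german))   # ['ankommen']
```

The same code path serves both languages; only the lexicon entries differ, which is the kind of reuse described above.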

All in all, languages are influenced by the national mindset. If speakers see a need for a word or a structure, it will be invented. If they like nuance, a single word will have fewer interpretations. At one point, my team was working with hotel and restaurant reviews, which are a pretty good representation of how an average person thinks. We had a blast looking at all the different aspects. For some reason, lots of Swedish reviews were obsessed with glass doors in the bathroom. Russians were keen to write long, creative travelogues. My personal favourite was a French review of a fine dining establishment, which ended with the following: “It was almost perfect. Almost! The tiny flaw we found was that the antenna of the lobster was broken. All the rest was wonderful, and I would’ve given 9.5 out of 10, but because there is no way to deduct a fraction of a point, I give 9 out of 10.”

Q: The International Mother Language Day focuses on indigenous languages this year. What do you think AI could bring to indigenous languages?

VB: Most importantly, the promise not to be forgotten.

By some estimates, a language dies with its last speaker every two weeks, and between half and nine in ten of the world’s languages are predicted to disappear by the end of the century.

If we catalogue a language somewhere, somehow, it will continue to exist. There is a Russian saying which can be roughly translated as “once words are written, they can’t be taken back” (in Russian, “что написано пером, не вырубишь топором”). Today, with better tools, we can preserve context and samples, analyse a language’s structure, draw conclusions about its origin, and learn more about the culture that created it. A well-documented language may even help decipher or understand another one, as MIT researcher Regina Barzilay demonstrated with Ugaritic in 2010 (http://news.mit.edu/2010/ugaritic-barzilay-0630).

A well-documented dead language may also come back to life. There are multiple examples of dead languages that were revived: Cornish, Dalmatian, Hebrew, and more.

As AI software is now an essential part of modern infrastructure, a lack of software support pushes native speakers to “abandon ship” and increases the risk that a language becomes marginalised and eventually disappears. Monolingual speakers of poorly supported languages are effectively locked out of the global discourse.

Sadly, natural language processing today is mostly data-hungry, so it takes a long time for advances to trickle down to languages with little available data. We see it as our mission to change that.