Moderation in the Age of Deplatforming


Watch Your Language

As the cautionary tale of Parler shows, a lack of proper moderation may bring down an entire community. All it takes is one incident that raises the prospect of lawsuits or delete-your-app consumer campaigns against everyone involved, even indirectly. Nor is it the first time: back in 2018, Gab was forced offline over the Pittsburgh synagogue shooting.

It’s not just the United States. Virtually every jurisdiction today has laws to make sure that online chatter does not cause offline issues. You think you can host your controversial website in Russia? Think again. Roskomnadzor (the Russian federal service in charge of censorship of communications) goes beyond banning neo-Nazi outlets. For example, some websites are forced to replace specific Russian swearwords with descriptions of what they mean, as in a charade. (Literally, descriptions like “a word made of four letters starting with X that rhymes with Y”.)

Even without the threat of deplatforming, users leaving, and pressure from campaigns like Stop Hate for Profit, brands are no longer happy to let their ads be displayed near problematic content. When the №1 advertiser on Facebook considers pulling their spending, no deplatforming is needed to get the message across. With the bulk of social media subsisting on ads, whatcha gonna do when no one wants to advertise with you?

Parler Case Study

A class action suit is basically a courtroom equivalent of an angry mob with torches and pitchforks looking for targets. It does not matter if a party was just marginally involved; if it can be proven that it was involved at all, and can pay, it will likely be named an accomplice.

Suppose there was an incident or a chain of incidents in which people got killed and massive damage was inflicted. The action was planned on a platform built by a small unprofitable startup, hosted on a cloud maintained by a trillion dollar company, and distributed via app stores by other trillion dollar companies. Do you think the lawyers, whose compensation may be contingent on the settlement amount, will miss an opportunity to pick on the big guys? Neither do I.

The reaction of Apple, Google, and Amazon therefore should not be surprising. What’s surprising is that Parler wasn’t deplatformed earlier, given the number of warnings AWS sent (note the obfuscated F-word in the first post; apparently, that’s what the author felt was the worst part of the death threat). Just before the deplatforming, a data dump of Parler’s user content was obtained, and it is now making the rounds in the digital forensics community.

Could Parler have survived? They had moderation in place; very loose, but it was there: no porn, no spam, no obscene nicknames. Some observers complained about the policy being inconsistent and arbitrary. For example, a user got banned for mocking Parler, even though it was never against the published rules.

Clearly, what they had in place wasn’t enough. When you are repeatedly told by your hosting company that you have potential law enforcement issues, you had better do something about it. Were they paying attention? Variety and others posted conflicting accounts of the former CEO and the shareholders, mentioning discussions about the need to crack down on the darker parts of the gray zone. Whoever said what, it does sound like the subject was raised.

If that’s the case, they knew it was problematic; they were discussing ways to mitigate the issue. But it took too long.

The moral of the story is, don’t let an inconsistent and lacking moderation policy become a trainwreck in slow motion.

Element Case Study

Just around the time when Parler went offline, Google banned Element, a messaging app using a decentralized protocol, from its app store. This time, however, it was only Google, and the ban did not last long. According to the firsthand account of an Element team member on Hacker News:

  • it was an extremely severe violation.
  • after an explanation of how Element and Matrix work, Google apologized and reinstated Element.
  • most importantly, both Element and Matrix provide a robust toolset to moderate the Matrix communities.

And that was the end of the story.

The sophistication of the Matrix.org moderation toolset is impressive, and it should be regarded as a gold standard. The creators of Matrix seem to realize that decentralized != chaos. They have a concept of policy servers, and a component that provides portable moderation and, as they say, “impersonal” rules. Contrast it with the approach of “poop emoji in the comments not OK, death threats OK, but sometimes not, and sometimes we’ll just kick out people who troll us”.

Because proper moderation is, like proper law enforcement, consistent and impersonal.

Tisane API: Moderation Aid Done Differently

Here at Tisane Labs, we build automated tools to tag problematic textual content. Tisane API is not a panacea by any means, but we work hard to avoid the common pitfalls. Our starting point is the need, not what can be done quickly. Some of the capabilities of Tisane appeared to be overkill at first, only to be requested later by the users.

Outlined below are the core design principles of Tisane.

Transparency and Granularity by Design

Every alert the system provides contains a built-in rationale, and the actual fragment that caused the alert.

Example Tisane response, complete with explanation

A moderator, with their human biases, likely overworked, underpaid, and possibly coming from a different culture, should not be required to be an arbiter and a professor of ethics. Nor should they be required to figure out what’s wrong with a non-abusive utterance like “I am a black sex worker”.

With Tisane, their task is reduced to verifying that the system interpreted the utterance correctly.
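A minimal sketch of how a moderation queue could surface that rationale next to the offending fragment. The response shape used here (an `abuse` list whose items carry `type`, `severity`, `text`, and `explanation`) is an assumption made for illustration, and the sample data is hand-written; check the actual response format before relying on these field names.

```python
def summarize_alerts(response):
    """Turn a response-shaped dict into one review line per alert.

    The field names (abuse, type, severity, text, explanation) are
    assumptions for illustration, not a documented contract.
    """
    lines = []
    for alert in response.get("abuse", []):
        fragment = alert.get("text", "")
        reason = alert.get("explanation", "no explanation provided")
        lines.append(
            f'[{alert.get("type", "unknown")}/{alert.get("severity", "?")}] '
            f'"{fragment}": {reason}'
        )
    return lines


# Hypothetical sample, hand-written for this sketch.
sample = {
    "text": "You are a complete idiot",
    "abuse": [
        {
            "type": "personal_attack",
            "severity": "high",
            "text": "You are a complete idiot",
            "explanation": "hypothetical rationale text",
        }
    ],
}

for line in summarize_alerts(sample):
    print(line)
```

The moderator then only has to confirm or reject each line, rather than re-derive why the message was flagged.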

Consistent Decisions

Wouldn’t it be more democratic if every policeman could apply the laws the way they see fit? After all, they know the reality better than some lawmakers far away, right?..

Shocked? An insane idea, isn’t it?

And yet that’s roughly how many automatic moderation systems function today. Some rely on models trained by underpaid human annotators; others make it even worse and let the model be trained by decisions of moderators.

Do you want to get deplatformed because five of ten of your moderators were fixated on the poop emoji and taught the system to dismiss the death threats to public figures?

Tisane’s alerts are strictly aligned with the provided guidelines. Our built-in guidelines rely on principles today regarded as universal or near-universal. We do not tag content critical of a political faction or a public figure (unless they are part of the conversation) as a violation. We avoid political decisions; the management team of Tisane Labs, as well as the rest of the staff, is made up of people with very different opinions, mindsets, and backgrounds.

That said, if your community wants to steer clear of a particular topic (like the Peloton communities avoiding political discussions), there is a topic detection and entity extraction mechanism. Rather than providing a simple classifier, we follow the “Swiss army knife” approach.

Simplicity

We avoid floating point “confidence thresholds”.

Such thresholds may give developers a false sense of control, but they don’t make the decisions more accurate, let alone transparent. We view them as an equivalent of security theater in the airports.

Privacy and Data Security: On-prem

Sometimes, the conversations in the community may be more discreet. Whatever the policies of the vendor are, whatever jurisdictions they must be compliant with, some communities want to be shielded from changes in the legislation or intrusion into privacy. On the other hand, some communities have specific requirements, and need to construct their own guidelines.

Which is why Tisane has an option for an on-prem single tenant installation, with the ability to adhere to custom policies. Contact us to request more information.

Pricing

Moderation is important, but investing more in fences and locks than the house is worth makes no sense. We realize that moderation is akin to an insurance policy, and keep our prices affordable.

To try Tisane, sign up here. The free plan does not expire.

Conclusion

Moderation is no longer a luxury. Even in cases when it’s not a legal requirement, running an online community without a moderation solution in place is akin to driving without insurance or never locking your house.

You may get away with it for a while, but as your community grows, the probability that bad actors abuse your platform is too high to ignore.

Quickwork Releases Tisane Slack Plug-in

Our partners at Quickwork, a platform that allows building any workflow automation in minutes, have integrated Tisane API.

Tisane can now be linked to thousands of applications and APIs using Quickwork’s simple and powerful UI, no coding skills required. For example, try Quickwork’s moderation plug-in for Slack, which is powered by Tisane. See the video of the Slack plug-in in action here.

Quickwork is an enterprise-grade, ISO 27001 and GDPR compliant, no-code, API-based SaaS platform with an extensive range of pre-integrated business and consumer apps to solve various automation problems. It offers three key capabilities: Integration, API Management, and Conversation Management as a Service. The platform’s real-time messaging and API-based architecture makes it easier to build on these capabilities to offer better workflow solutions and customer experience.

Tisane Labs’ Solutions on Microsoft Azure Add Wikidata Extraction Feature

Published on Marketwatch and Yahoo! Finance


SINGAPORE, Oct. 13, 2020 /PRNewswire/ — Tisane Labs, a supplier of text analytics AI solutions, today announced a new feature in Tisane API, already available on Microsoft Azure Marketplace and AppSource. With the new feature, Tisane API now allows tagging and extraction of Wikidata entities, complementing the capabilities provided by Azure Cognitive Services and supporting nearly 30 languages.

“Wikidata allows utilizing Wikipedia knowledge in ways never explored before, but there are not many ways to get Wikidata references from unstructured text,” said Vadim Berman. “Now, with the new feature of Tisane API built on Azure, our users can easily obtain Wikidata IDs from Tisane’s JSON response. Imagine being able to annotate text with images, GPS coordinates, important dates, 3rd party references, and whatever the ever-growing and open Wikidata database contains.”

“Microsoft Azure Marketplace and AppSource lets customers worldwide discover, try, and deploy software solutions that are certified and optimized to run on Azure,” said Sajan Parihar, Senior Director, Microsoft Azure Platform at Microsoft Corp. “Azure Marketplace and AppSource helps solutions like Tisane API reach more customers and markets.”

Tisane API runs in the cloud utilizing Azure API Management, with a simple REST interface that can be linked from any popular programming platform today. Tisane Labs provides a range of tailored plans for its clients with the option of a custom installation on-premises and a free plan.

To try Tisane API, visit https://tisane.ai.

For more information, email press@tisane.ai.

Chai Update: Explainability, Name Validation, New Entities & Formats, Wikidata

The next update after Boldo has to start with C, and so it’s Chai this time (no, corona does not qualify).

It has been a busy year. With more users and new use cases come more feature requests, and we worked hard to implement them.

Explainability

In the moderation space, it helps to provide a cue why the system classified a message as problematic. Human moderators are often stressed, overworked, and overwhelmed, while natural language processing is bound not to be 100% error-free. Reducing their task to a simple “sanity check” whether the system understood the utterance makes them more productive and more consistent. When the explain setting is set to true, Tisane provides a short snippet describing the reasoning (settings: read more here).

JSON response including an explanation
Pinpointing problematic content
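As a rough sketch of switching the explain setting on from client code, the snippet below builds a request without sending it. The endpoint URL, the subscription header name, and the top-level `language` and `content` keys are assumptions for illustration; only the `explain` setting itself comes from the text above.

```python
import json
import urllib.request

# Assumptions for this sketch: the endpoint URL, the header name, and the
# top-level "language"/"content" keys are illustrative, not confirmed here.
API_URL = "https://api.tisane.ai/parse"
API_KEY_HEADER = "Ocp-Apim-Subscription-Key"


def build_explain_request(text, language, api_key):
    """Build (but do not send) a request with the explain setting enabled."""
    payload = {
        "language": language,
        "content": text,
        "settings": {"explain": True},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


request = build_explain_request("some user message", "en", "YOUR_KEY")
print(json.loads(request.data)["settings"])
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access or a real key.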

Name Parsing & Validation

Many communities require users to enter real names. Some users prefer not to, for different reasons. In some cases, there is a need to break down a full name into constituents, such as given name, surname, middle name, etc.

Tisane now provides methods to:

  • parse names into constituents, extracting the components
  • validate names, tagging names of important people, spiritual beings, and fictional characters. No more Boba Fett or Satan as a real name!
  • compare names in the same or different languages, producing a list of differences
JSON response recognizing a name as that of an “important person”

Read more about name-related methods.

Wikidata Everywhere

Wikidata IDs are now supported at the level of entities, topics, and even some non-entity words.

Imagine deducing geographic coordinates, Britannica article ID, or semantic properties from text that went through Tisane!

New Formats, Entity Types, Abuse Types

Different content formats may have different logic tied to them, especially when the context is lacking. With new requirements, we added two new formats:

  • search to screen search queries
  • alias to screen user names in online communities
Example of search query screening
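A small sketch of choosing a format based on where the text came from. The two format values, search and alias, are the ones listed above; the `format` settings key and the source labels are assumptions made for this example.

```python
# Hypothetical mapping from where the text came from to a format value.
# "search" and "alias" are the two formats named above; the "format" settings
# key and the source labels are assumptions made for this sketch.
FORMAT_BY_SOURCE = {
    "search_box": "search",
    "signup_username": "alias",
}


def settings_for_source(source):
    """Return a settings dict, adding a format only for known sources."""
    settings = {}
    content_format = FORMAT_BY_SOURCE.get(source)
    if content_format is not None:
        settings["format"] = content_format
    return settings


print(settings_for_source("search_box"))
print(settings_for_source("chat_message"))
```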

With more law enforcement vendors adopting Tisane, we were asked to add entity types of interest to law enforcement, specifically:

  • weight
  • bank_account 
  • credit_card, including subtypes representing the major credit card types
  • credential, with optional subtypes md5 and sha-1
  • crypto for major cryptocurrency addresses

and more. See all the types here.

We also added two more types to tag problematic content:

  • adult_only, for activities allowed in adult communities but restricted for minors (e.g. consuming alcohol)
  • mental_issues, for signs of depression and suicidal thoughts

Miscellaneous

As always, we keep honing and enhancing our language models, as well as throughput and stability of Tisane.

Tisane Adds Spellchecking and Hashtag Parsing

Spellchecking and De-Obfuscation

Tisane API now provides built-in spellchecking, as well as hashtag parsing capabilities. Just like with all the other functions, the functionality is supported across all the languages.

The spellchecking is context-sensitive; for example, as shown on the screenshots below, solmon may be interpreted either as Solomon or salmon, depending on the context.

solmon -> Solomon

solmon -> salmon

Note that we do not provide full contextual spellchecking when the misused word is itself a legitimate word (for example, they’re vs their, or you’re vs your). Spellchecking is also not supported for languages that do not use white space, such as all flavours of Chinese, Japanese, or Thai. Tisane also skips special entities, such as email addresses, phone numbers, online aliases, and names.

As Tisane API is aimed at abuse detection, jargon, as well as misspelled and/or obfuscated swearwords, is supported.

The spellchecking can be used to detect and correct obfuscated swearwords, as shown on the screenshot below:

Deobfuscating swearwords

The spellchecking / deobfuscating functionality is part of the /parse method. To disable the spellchecking, set the disable_spellcheck setting to true:


...
"settings":{..., "disable_spellcheck":true,...}
...

Hashtag Parsing & Segmentation

The hashtags can now be parsed, whether they contain cues like underscores or camel case, or not, and integrated with the rest of the utterance, as shown on the screenshot below:

Parsing and desegmenting hashtags

The same functionality is used (in a limited scope) to de-obfuscate utterances spelled without spaces, like in the screenshot below:

ihateyou love

The spellchecking / deobfuscating functionality is part of the /parse method.

WARNING: the hashtag parsing is off by default. It is activated by the subscope parameter in the settings:


...
"settings": {"subscope": true}
...

Do not hesitate to contact us with questions or for more information. If you are new to Tisane, please sign up here; it’s free up to 50,000 requests a month.

Zendesk Integrates Tisane in Smooch

Tisane API can now be used to moderate chat messages in Zendesk’s Smooch platform.

Zendesk is a world leader in customer service software. Smooch is a conversational platform that collates messages across web, mobile, and social messaging and combines user activity and existing profile data, enabling admins to create more tailored experiences. A hotel, for instance, could give guests the ability to ping staff on-property, and an online retailer could manage issues like incorrect shipments and returns across channels.

With support of Tisane, Zendesk users can:

  • Tag and classify abusive content, like profane and non-profane insults, hate speech, sexual harassment, criminal activity.
  • Track and tag attempts to establish external contacts between the users.
  • Extract topics, entities, and more.

The Tisane integration is aimed at:

  • Online communities looking to improve and streamline their moderation process.
  • Business chat operators trying to block abusive, hostile and irrelevant content.
  • Enterprises trying to keep the online conversations compliant with HR regulations and abuse-free.
  • e-Commerce portals that need to track and/or prevent deals bypassing them.

Visit the Tisane integration page on Smooch and sign up to start.

New Features: Boldo Update

Photo by Praveen (CC BY 2.0)

Following the requests of our users, we implemented additional features. They are now active; feel free to kick the tires. (And yes, our updates are named after herbal teas.)

Detecting Attempts to Establish External Contact

In some communities, external contacts must be monitored: marketplaces, communities with harassment issues, communities where scammers attempt to lure users out, and so on. Today, a common simple solution is to scan messages using regular expressions to find phone numbers and emails. That, however, is insufficient, as users often find ways to bypass these checks or introduce non-standard formatting.

We now detect these attempts and tag the detected snippets with the external_contact type. Examples include a request to provide an email, a WhatsApp number, and so on (“we need your email”, “wat is ur whats app”, etc.).
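To illustrate why regular expressions alone fall short, here is a minimal sketch of the naive approach. The patterns are simplified examples written for this post and are not part of Tisane; they catch literal addresses and numbers but miss the requests quoted above.

```python
import re

# Deliberately naive patterns of the kind described above.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def naive_contact_check(message):
    """Return True if the message contains a literal email or phone number."""
    return bool(EMAIL_RE.search(message) or PHONE_RE.search(message))


for message in (
    "mail me at jane.doe@example.com",
    "call +1 555 010 9999",
    "we need your email",
    "wat is ur whats app",
):
    print(f"{naive_contact_check(message)!s:5}  {message}")
```

The first two messages are flagged and the last two are not, even though all four are attempts to move the conversation elsewhere.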

Signal to Noise Ranking

If you need to create a summary or write a report about how a particular topic or brand is reflected in social media, the sheer number of posts that need to be processed is often overwhelming. What’s worse, 95% of these are not much help. They either copy other people’s thoughts, are completely off-topic, contain all kinds of abuse, or just vent frustration and negative emotions. The same goes for the comments: apart from a few pearls, most are just background noise.

The signal to noise ratio is not unlike conventional search engine rankings, but better adapted for the social media content needs.

The ranking prioritises posts related to the specified concepts and domains, and penalises off-topic content and abuse.

In order to compute the signal to noise ranking, provide an array of concept IDs (family IDs) in your settings under the relevant attribute (e.g. "relevant": [12345, 6789]).

Native Topic Standard Overhaul

While we support taxonomy standards like IPTC and IAB, our internal taxonomy is much richer. The topics that don’t appear in IPTC and IAB can be exposed using the native topic mode (code: native). Previously, it was used for internal purposes only, and contained numeric codes.

After this update, the native topics contain English descriptions, and the taxonomy was also expanded.

Topic Optimization

Some of the topics may overlap. “Compound” topics like cryptocurrency may imply other topics like finance and software. Depending on your application, you may or may not need these “constituent” topics.

The optimize_topics parameter controls how overlapping topics are presented. For example, when analyzing a sentence like “exchange btc to xmr” with optimize_topics set to false, we get:

  {
    "text": "exchange btc to xmr",
    "topics": [
       "money",
       "commerce",
       "business",
       "finance",
       "software",
       "currency",
       "cryptocurrency"
    ]
  }

When the parameter is set to true, we get:

  {
    "text": "exchange btc to xmr",
    "topics": [
       "cryptocurrency"
    ]
  }
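The idea of dropping “constituent” topics can be sketched as follows. The implication map is hand-written to reproduce the example above, and the pruning logic is an illustration of the concept only; it is not the taxonomy or the algorithm Tisane uses.

```python
# Hypothetical implication map, hand-written to reproduce the example above.
# An illustration of the idea, not the taxonomy or algorithm Tisane uses.
IMPLIES = {
    "cryptocurrency": {"currency", "finance", "software", "money"},
    "finance": {"money", "business"},
    "business": {"commerce"},
}


def implied_by(topic, seen=None):
    """Collect every topic reachable from `topic` through IMPLIES."""
    seen = set() if seen is None else seen
    for child in IMPLIES.get(topic, ()):
        if child not in seen:
            seen.add(child)
            implied_by(child, seen)
    return seen


def optimize_topics(topics):
    """Drop topics that are implied by another topic in the same list."""
    redundant = set()
    for topic in topics:
        redundant |= implied_by(topic)
    return [topic for topic in topics if topic not in redundant]


all_topics = [
    "money", "commerce", "business", "finance",
    "software", "currency", "cryptocurrency",
]
print(optimize_topics(all_topics))
```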

Format-Sensitive Logic

We had to learn the hard way that it matters where the text is coming from.

A simple example: a single word like “fool” may be a title, in which case it bears a negative connotation but is not a personal attack. However, when posted as part of a dialogue (e.g. in a comment on Instagram), it is a personal attack.

We introduced support of different logic for different formats.

Feature Default Format Changed to Universal Dependencies

Tisane supports several standards to display grammar features, such as Penn, Universal Dependencies, Glossing Abbreviations, and the native codes and descriptions. We saw that the original glossing abbreviation format was confusing for many users, and changed the default to Universal Dependencies.

If you prefer to do so, you can still use “glossing” to obtain glossing abbreviations.

Document-Level Sentiment

While we stress that aspect-based sentiment analysis provides more actionable intelligence, we added a document-level attribute for certain scenarios. Add "document_sentiment": true to the settings to obtain the document-level sentiment value in the range from -1 (most negative) through 1 (most positive). It will be placed in the sentiment attribute.
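If a coarse label is more convenient than a number, the value can be bucketed on the client side. A minimal sketch; the function name and the width of the neutral band are arbitrary choices for this example, and only the -1 to 1 range comes from the description above.

```python
def sentiment_label(value, neutral_band=0.2):
    """Map a document-level sentiment in [-1, 1] to a coarse label.

    The neutral_band threshold is an arbitrary choice for this sketch.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("sentiment is expected in the range -1 to 1")
    if value > neutral_band:
        return "positive"
    if value < -neutral_band:
        return "negative"
    return "neutral"


for value in (0.75, 0.1, -0.5):
    print(value, sentiment_label(value))
```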

Contact us with questions or for more information. If you are new to Tisane, please sign up here; it’s free.

IMLD 2019: “Once words are written, they can’t be taken back”

For the International Mother Language Day 2019, 21st February, we interviewed Tisane Labs CEO, Vadim Berman.

Q: Vadim, you are a co-founder and the CEO of Tisane Labs, that provides text analytics for over 20 languages. How did you get involved with languages?

Vadim Berman, CEO, Tisane Labs

VB: One could describe me as a “serial immigrant” or “ethnically confused” 😊. I lived in 6 countries; my first move happened back in 1991 from then-Soviet Union to Israel, when I was in my teens. I had to master Hebrew at school; I was also the family translator. I was upset about the fact that my English, which I thought was pretty good, was nowhere near the local level. I sat with a textbook and then forced myself to manage a diary in English.

Later, when I started my career as a software engineer, I became obsessed with the ties between languages and software. The first push to explore the world of natural language processing was an article in Wired back in 2000 called Talking to Strangers. (Looks like it’s still there: https://www.wired.com/2000/05/translation-2/ – little did I know that I would meet some of the people mentioned there in person.) The second push was a speculative fiction story by Jorge Luis Borges called Tlön, Uqbar, Orbis Tertius, which describes an imaginary world whose inhabitants deny reality, and as a result, use languages without nouns. I was wondering how software could handle this kind of language.

I had zillions of ideas and wanted to try them all. I thought the combination of my linguistic and software development skills gives me an advantage. I started experimenting with machine translation and extraction of meaning. The interest became an obsession, the obsession then became a living, and in a few years, already living in Australia, I cofounded LinguaSys, an American venture which was later acquired by Aspect Software, a US-based multinational. I met much of my current team in LinguaSys. I went on to start Tisane Labs in 2017, after I left Aspect.

 

Q: Tisane stands apart supporting so many different languages. Can you explain why?

VB: In one colloquial sentence, because we can and because we have to.

The decision to focus on multilingualism was motivated both by the economics and by the possible applications. There is a shared linguistically neutral core, and so a new language is not started from zero. There are multiple devices to easily reuse shared elements between language models.

When the less mainstream languages take less effort to add, they become more economically viable. As these markets are often underserved, we can benefit from less competition and more coverage.

While there are many use cases where one or two languages are enough, in many scenarios, such as the hospitality industry, no matter how small your business is, ignoring most languages means ignoring a significant segment of your customer base.

 

Q: Can you give us some interesting examples of something similar / different among these languages?

VB: Funny enough, the biggest similarity is that in every language, many native speakers believe that their language is the most unique, the most complex, and deserves the most special treatment.

My focus is on the inner workings of languages. I see them not as an amorphous bag of words, but as a set of cogs, levers, and pulleys.

On the structural level, after a while it looks like different languages borrow from more or less the same bag of tricks. There are lots of differences, of course, but when we add support to a new language, we find that a new phenomenon can be handled the same way as a seemingly different phenomenon in a different language. For example, when you think of it, English compound verbs (like “give away” or “tell off”) behave somewhat similarly to the German and Dutch separable verbs (e.g. “ankommen” in German), and so the ways we handled both are very similar.

All in all, languages are influenced by the national mindset. If the speakers see a need for a word or a structure, it will be invented. If they like nuance, one word will have fewer interpretations. At one point, my team was working with hotel and restaurant reviews, which are a pretty good representation of how an average person thinks. We had a blast looking at all the different aspects. There were lots of Swedish reviews that were, for some reason, obsessed with glass doors in the bathroom. Russians were keen to write long, creative travelogues. My personal favourite was a French review of a fine dining establishment, which ended with the following: “It was almost perfect. Almost! The tiny flaw we found was that the antenna of the lobster was broken. All the rest was wonderful, and I would’ve given 9.5 out of 10, but because there is no way to deduct a fraction of a point, I give 9 out of 10.”

 

Q: The International Mother Language Day focuses on indigenous languages this year. What do you think AI could bring to indigenous languages?

VB: Most importantly, the promise not to be forgotten.

By some estimates, every two weeks a language dies with its last speaker, and anywhere between every ninth language and half of all languages is predicted to disappear by the end of the century.

If we catalogue the language somewhere, somehow, it will exist. There is a Russian saying which can be roughly translated as, “once words are written, they can’t be taken back” (in Russian, “что написано пером, не вырубишь топором”). Today, with better tools, we can preserve context and samples; we can analyse the language’s structure, derive conclusions about its origin, and learn more about the culture that created it. It may even be used to decipher or understand another language, as MIT researcher Regina Barzilay demonstrated with Ugaritic in 2010 (http://news.mit.edu/2010/ugaritic-barzilay-0630).

A well-documented dead language may also come back to life. There are multiple examples of dead languages that were revived: Cornish, Dalmatian, Hebrew, and more.

As AI software today is an essential part of the modern infrastructure, the lack of software support causes native speakers to “abandon ship” and raises the risk of the language becoming marginalised and eventually disappearing. Monolingual speakers of poorly supported languages are somewhat locked out of the global discourse.

Sadly, natural language processing today is mostly data-hungry, and therefore, it takes a lot of time for the advances to trickle down. We see it as our mission to change that.

Tisane API Integrated with PubNub

PubNub, the company behind the world’s leading realtime Data Stream Network (DSN), added Tisane API as a supported component in its catalog. The Tisane Labs Natural Language Processing Block runs serverlessly in the PubNub network, joining the blocks released by Microsoft, Amazon, IBM, ESRI, and more. The block fully supports the original Tisane functionality, including:

  • Detection of personal attacks and cyberbullying, hate speech, criminal activities, sexual harassment
  • Topic modelling, compliant with IPTC and IAB standards
  • Sentiment analysis 2.0
  • Entity extraction

And more.

The PubNub Data Stream Network powers thousands of apps, streaming 1.9 trillion messages to over 330 million devices a month, with powerful and extensible frameworks like PubNub ChatEngine™.