Chai Update: Explainability, Name Validation, New Entities & Formats, Wikidata

The next update after Boldo has to start with C, and so it’s Chai this time (no, corona does not qualify).

It has been a busy year. With more users and new use cases come more feature requests, and we worked hard to implement them.

Explainability

In the moderation space, it helps to provide a cue as to why the system classified a message as problematic. Human moderators are often stressed, overworked, and overwhelmed, and natural language processing is never 100% error-free. Reducing the moderators' task to a simple “sanity check” of whether the system understood the utterance makes them more productive and more consistent. When the explain setting is set to true, Tisane provides a short snippet describing the reasoning (settings: read more here).

[Image: JSON response including an explanation, pinpointing problematic content]
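
As a rough illustration, turning the setting on could look like the sketch below. Only the explain setting itself comes from this post; the top-level field names (language, content, settings) and the helper build_parse_request are assumptions made for the example, so check the settings documentation for the exact request shape. The sketch only builds the request body and does not call any service.

```python
import json


def build_parse_request(content, language="en", explain=True):
    """Build a request body that asks for explanations.

    Assumption: the field names "language", "content" and "settings"
    are illustrative and not confirmed by this post. Only the
    "explain" setting is named in the text.
    """
    return {
        "language": language,
        "content": content,
        "settings": {"explain": explain},
    }


# Placeholder text standing in for a message to moderate.
body = build_parse_request("example message to moderate")
print(json.dumps(body, indent=2))
```

Keeping the request construction in one small function makes it easy to switch explanations off where the extra snippet is not needed.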

Name Parsing & Validation

Many communities require users to enter real names. Some users prefer not to, for different reasons. In some cases, there is a need to break down a full name into constituents, such as given name, surname, middle name, etc.

Tisane now provides methods to:

  • parse names, extracting constituents such as given name, surname, and middle name
  • validate names, tagging names of important people, spiritual beings, and fictional characters. No more Boba Fett or Satan as a real name!
  • compare names in the same or different languages, producing a list of differences
JSON response recognizing a name as that of an “important person”

Read more about name-related methods.

Wikidata Everywhere

Wikidata IDs are now supported at the level of entities, topics, and even some non-entity words.

Imagine deducing geographic coordinates, Britannica article ID, or semantic properties from text that went through Tisane!

New Formats, Entity Types, Abuse Types

Different content formats may have different logic tied to them, especially when context is lacking. To meet new requirements, we added two formats:

  • search to screen search queries
  • alias to screen user names in online communities
Example of search query screening
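
A minimal sketch of selecting one of the two new formats follows. The format values search and alias are from this post; the settings key name "format" and the helper screening_settings are assumptions for illustration only.

```python
# The two format values introduced in this update.
NEW_FORMATS = {"search", "alias"}


def screening_settings(fmt):
    """Return a settings fragment selecting one of the new formats.

    Assumption: the settings key is called "format". The post names
    the format values but not the key.
    """
    if fmt not in NEW_FORMATS:
        raise ValueError(f"expected one of {sorted(NEW_FORMATS)}, got {fmt!r}")
    return {"format": fmt}


search_settings = screening_settings("search")  # for search queries
alias_settings = screening_settings("alias")    # for user names
print(search_settings, alias_settings)
```

Rejecting unknown values early keeps a typo in the format name from silently falling back to default screening logic.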

With more law enforcement vendors adopting Tisane, we were asked to add entity types of interest to law enforcement, specifically:

  • weight
  • bank_account 
  • credit_card, including subtypes representing the major credit card types
  • credential, with optional subtypes md5 and sha-1
  • crypto for major cryptocurrency addresses

and more. See all the types here.
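
To show how these types might be used downstream, here is a small sketch that keeps only the entities of the types listed above. The type names are from this post; the response shape (a list of dictionaries with a "type" key), the helper entities_of_interest, and the sample data are invented placeholders, not real Tisane output.

```python
# Entity types named in this update.
LAW_ENFORCEMENT_TYPES = {
    "weight", "bank_account", "credit_card", "credential", "crypto",
}


def entities_of_interest(entities, wanted=LAW_ENFORCEMENT_TYPES):
    """Keep only entities whose "type" is in the wanted set.

    Assumption: each entity is a dict with a "type" key. The real
    response structure may differ.
    """
    return [e for e in entities if e.get("type") in wanted]


# Invented placeholder data, for illustration only.
sample = [
    {"type": "person", "name": "placeholder person"},
    {"type": "crypto", "name": "placeholder address"},
    {"type": "credit_card", "name": "placeholder card number"},
]

hits = entities_of_interest(sample)
print([e["type"] for e in hits])
```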

We also added two more types to tag problematic content:

  • adult_only, for activities allowed in adult communities but restricted for minors (e.g. consuming alcohol)
  • mental_issues, for signs of depression and suicidal thoughts

Miscellaneous

As always, we keep honing and enhancing our language models, as well as the throughput and stability of Tisane.