The next update after Boldo has to start with C, and so it’s Chai this time (no, corona does not qualify).
It has been a busy year. With more users and new use cases come more feature requests, and we worked hard to implement them.
Explainability
In the moderation space, it helps to provide a cue as to why the system classified a message as problematic. Human moderators are often stressed, overworked, and overwhelmed, while natural language processing is bound not to be 100% error-free. Reducing the moderators' task to a simple “sanity check” of whether the system understood the utterance makes them more productive and more consistent. When the explain setting is set to true, Tisane provides a short snippet describing the reasoning (settings: read more here).
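As a rough sketch, a request with explanations switched on might be assembled like this. Only the explain setting comes from the text above. The endpoint URL, the header name, and the language, content, and settings field names are assumptions to verify against the current API documentation. The helper name build_parse_request is hypothetical.

```python
import json

# Assumed endpoint; verify against the current API documentation.
PARSE_URL = "https://api.tisane.ai/parse"


def build_parse_request(text, language="en", explain=True):
    """Return (url, headers, body) for a parse call with the explain setting.

    Hypothetical helper: it only assembles the request and sends nothing.
    """
    headers = {
        "Content-Type": "application/json",
        # Assumed header name; the value is a placeholder for a real key.
        "Ocp-Apim-Subscription-Key": "YOUR_API_KEY",
    }
    body = json.dumps({
        "language": language,
        "content": text,
        # The setting described in this section.
        "settings": {"explain": explain},
    })
    return PARSE_URL, headers, body


url, headers, body = build_parse_request("some user message")
print(body)
```

The body can then be posted with any HTTP client. Keeping the assembly separate from the network call makes the settings easy to inspect and test.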
Name Parsing & Validation
Many communities require users to enter real names. Some users prefer not to, for different reasons. In some cases, there is a need to break down a full name into constituents, such as given name, surname, middle name, etc.
Tisane now provides methods to:
- parse names into constituents, extracting the components
- validate names, tagging names of important people, spiritual beings, and fictional characters. No more Boba Fett or Satan as a real name!
- compare names in the same or different languages, producing a list of differences
Read more about name-related methods.
Wikidata Everywhere
Wikidata IDs are now supported at the level of entities, topics, and even some non-entity words.
Imagine deducing geographic coordinates, Britannica article ID, or semantic properties from text that went through Tisane!
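To illustrate the coordinates case: Wikidata stores an item's coordinate location under property P625 and serves each item as JSON through Special:EntityData. The sketch below builds that URL and reads P625 from a document that has already been downloaded. The function names are illustrative, and the sample document is a hand-written stub with a placeholder ID and made-up values, not real Wikidata or Tisane output.

```python
def entity_data_url(qid):
    """Build the Wikidata JSON URL for an item ID such as 'Q42'."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def coordinates(entity_doc, qid):
    """Return (latitude, longitude) from the item's P625 claim, or None."""
    claims = entity_doc.get("entities", {}).get(qid, {}).get("claims", {})
    for claim in claims.get("P625", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if value:
            return value["latitude"], value["longitude"]
    return None


# Hand-written stub mimicking the shape of a Wikidata entity document.
# The ID and the coordinates are placeholders, not real data.
stub = {
    "entities": {
        "Q1234567": {
            "claims": {
                "P625": [
                    {"mainsnak": {"datavalue": {"value": {
                        "latitude": 10.0, "longitude": 20.0}}}}
                ]
            }
        }
    }
}

print(entity_data_url("Q1234567"))
print(coordinates(stub, "Q1234567"))
```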
New Formats, Entity Types, Abuse Types
Different content formats may have different logic tied to them, especially when the context is lacking. With new requirements, we added two new formats:
- search, to screen search queries
- alias, to screen user names in online communities
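A minimal sketch of selecting one of the new formats, assuming the format is passed as a format key in the request settings (an assumption to check against the settings documentation). The helper settings_for is hypothetical and validates only the two values introduced here.

```python
# Only the two formats introduced in this release; other formats exist
# and are deliberately not listed in this sketch.
NEW_FORMATS = {"search", "alias"}


def settings_for(fmt, **extra):
    """Return a settings dict for one of the new formats.

    Assumes the format is selected through a "format" key in the settings.
    """
    if fmt not in NEW_FORMATS:
        raise ValueError(f"unsupported format in this sketch: {fmt!r}")
    settings = {"format": fmt}
    settings.update(extra)  # any further settings are passed through as-is
    return settings


query_settings = settings_for("search")
username_settings = settings_for("alias", explain=True)
print(query_settings, username_settings)
```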
With more law enforcement vendors adopting Tisane, we were asked to add entity types of interest to law enforcement, specifically:
- weight
- bank_account
- credit_card, including subtypes representing the major credit card types
- credential, with optional subtypes md5 and sha-1
- crypto, for major cryptocurrency addresses
and more. See all the types here.
We also added two more types to tag problematic content:
- adult_only, for activities allowed in adult communities but restricted for minors (e.g. consuming alcohol)
- mental_issues, for signs of depression and suicidal thoughts
Miscellaneous
As always, we keep honing and enhancing our language models, as well as the throughput and stability of Tisane.