Tisane Adds Spellchecking and Hashtag Parsing

Spellchecking and De-Obfuscation

Tisane API now provides built-in spellchecking as well as hashtag parsing. As with all other functions, these features are available across all supported languages, with the spellchecking exceptions noted below.

The spellchecking is context-sensitive. For example, as the screenshots below show, solmon may be interpreted as either Solomon or salmon, depending on the context.

solmon -> Solomon

solmon -> salmon
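The following toy Python sketch shows the general idea of context-sensitive correction. It is not Tisane's algorithm. It assumes a hypothetical two-entry lexicon in which each word lists a few cue words. It collects candidates within a small Levenshtein edit distance, then picks the candidate whose cue words overlap the surrounding text most. Both solomon and salmon are exactly one edit away from solmon, so the context alone decides between them.

```python
# Toy illustration only: NOT Tisane's algorithm.

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete a[i-1]
                           cur[j - 1] + 1,            # insert b[j-1]
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

# Hypothetical mini-lexicon: each word maps to cue words that tend to appear near it.
LEXICON = {
    "solomon": {"king", "wisdom", "temple", "david"},
    "salmon": {"fish", "grilled", "smoked", "river"},
}

def correct(word: str, context: list, max_dist: int = 2) -> str:
    """Return the best lexicon candidate for `word` given the surrounding tokens."""
    w = word.lower()
    if w in LEXICON:
        return word  # already a known word
    ctx = {t.lower() for t in context}
    candidates = [c for c in LEXICON if edit_distance(w, c) <= max_dist]
    if not candidates:
        return word
    # Most cue-word overlap wins; ties go to the smaller edit distance.
    return max(candidates, key=lambda c: (len(LEXICON[c] & ctx), -edit_distance(w, c)))

print(correct("solmon", "King solmon was famous for his wisdom".split()))  # solomon
print(correct("solmon", "I had grilled solmon for dinner".split()))        # salmon
```

A real spellchecker would draw candidates from a full lexicon and score context statistically. The cue-word sets here only stand in for that signal.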

Note that we do not provide full contextual spellchecking when the misspelled word is itself a legitimate word (for example, they’re vs. their, or you’re vs. your). Spellchecking is also not supported for languages that do not separate words with white space, such as all flavours of Chinese, Japanese, and Thai. Tisane also skips special entities, such as email addresses, phone numbers, online aliases, and names.

As Tisane API is aimed at abuse detection, it supports jargon as well as misspelled and/or obfuscated swearwords.

Spellchecking can also be used to detect and correct obfuscated swearwords, as shown in the screenshot below:

Deobfuscating swearwords
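A common way to undo character-substitution obfuscation works in two steps. First, map each look-alike symbol back to the letters it may stand for. Then check the decoded forms against a lexicon. The sketch below does exactly that. The substitution table and the tiny lexicon are hypothetical, and this is not a description of how Tisane works internally.

```python
from itertools import product
from typing import Optional

# Hypothetical substitution table: obfuscating character -> letters it may stand for.
LEET = {"0": "o", "1": "il", "3": "e", "4": "a", "@": "a",
        "$": "s", "5": "s", "7": "t", "!": "i"}

# Hypothetical toy lexicon of insults (illustrative only).
LEXICON = {"idiot", "loser", "stupid"}

def deobfuscate(token: str) -> Optional[str]:
    """Return the lexicon word the token decodes to, or None if nothing matches."""
    # Each position becomes a string of possible letters; unmapped characters stay as-is.
    options = [LEET.get(ch, ch) for ch in token.lower()]
    for letters in product(*options):  # try every combination of readings
        candidate = "".join(letters)
        if candidate in LEXICON:
            return candidate
    return None

print(deobfuscate("id10t"))  # idiot
print(deobfuscate("l0$3r"))  # loser
```

Ambiguous symbols such as 1 (i or l) multiply the number of candidates. A production system would therefore prune this search instead of enumerating it exhaustively.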

The spellchecking / deobfuscation functionality is part of the /parse method. To disable spellchecking, set the disable_spellcheck setting to true:


...
"settings":{..., "disable_spellcheck":true,...}
...
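Here is a minimal Python sketch of a /parse call with spellchecking turned off, using only the standard library. Several details are assumptions about the API contract rather than facts stated in this post: the endpoint URL, the Ocp-Apim-Subscription-Key header, and the language and content body fields. Check them against the Tisane API documentation. The function only builds the request; sending it is left to urllib.request.urlopen.

```python
import json
import urllib.request

# ASSUMPTION: endpoint URL, auth header name and the "language"/"content" fields
# are not given in this post; verify them against the Tisane API documentation.
TISANE_PARSE_URL = "https://api.tisane.ai/parse"

def build_parse_request(text: str, language: str, api_key: str,
                        disable_spellcheck: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /parse method."""
    settings = {}
    if disable_spellcheck:
        settings["disable_spellcheck"] = True  # turn off spellchecking / deobfuscation
    body = {"language": language, "content": text, "settings": settings}
    return urllib.request.Request(
        TISANE_PARSE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Ocp-Apim-Subscription-Key": api_key},
        method="POST",
    )

req = build_parse_request("I had grilled solmon", "en", "YOUR_KEY", disable_spellcheck=True)
print(json.loads(req.data)["settings"])  # {'disable_spellcheck': True}
# To send it: urllib.request.urlopen(req)
```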

Hashtag Parsing & Segmentation

Hashtags can now be parsed, whether or not they contain cues such as underscores or camel case, and integrated with the rest of the utterance, as shown in the screenshot below:

Parsing and desegmenting hashtags

The same functionality is used (in a limited scope) to de-obfuscate utterances written without spaces, as in the screenshot below:

ihateyou love
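One way to segment hashtags and unspaced text is to split on explicit cues (underscores, camel case) where they exist, and fall back to dictionary-based segmentation where they do not. The sketch below uses standard dynamic-programming word segmentation that prefers the fewest words. It is a generic technique with a hypothetical toy lexicon, not Tisane's implementation.

```python
import re
from typing import List, Optional

# Hypothetical toy lexicon; a real system would use a large lexicon with word frequencies.
LEXICON = {"i", "hate", "you", "love", "throwback", "thursday", "no", "filter"}

def split_by_cues(tag: str) -> List[str]:
    """Split a hashtag on explicit cues: underscores and camel-case boundaries."""
    parts: List[str] = []
    for chunk in tag.lstrip("#").split("_"):
        # A capitalised or lowercase word, an all-caps run (e.g. an acronym), or digits.
        parts.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", chunk))
    return [p.lower() for p in parts]

def segment_run(s: str) -> Optional[List[str]]:
    """Segment an unspaced run into lexicon words, preferring the fewest words."""
    best: List[Optional[List[str]]] = [None] * (len(s) + 1)
    best[0] = []  # the empty prefix is trivially segmented
    for end in range(1, len(s) + 1):
        for start in range(end):
            prefix = best[start]
            if prefix is not None and s[start:end] in LEXICON:
                candidate = prefix + [s[start:end]]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[len(s)]  # None if no full segmentation exists

def parse_hashtag(tag: str) -> List[str]:
    words: List[str] = []
    for part in split_by_cues(tag):
        segmented = [part] if part in LEXICON else segment_run(part)
        words.extend(segmented or [part])  # keep unknown pieces unchanged
    return words

print(parse_hashtag("#ThrowbackThursday"))  # ['throwback', 'thursday']
print(parse_hashtag("#no_filter"))          # ['no', 'filter']
print(parse_hashtag("#ihateyou"))           # ['i', 'hate', 'you']
```

The fewest-words rule is a simple tie-breaker. Frequency-weighted scoring is the usual refinement when several segmentations are possible.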

Hashtag parsing is also part of the /parse method.

WARNING: hashtag parsing is off by default. To activate it, set the subscope parameter to true in the settings:


...
"settings": {"subscope": true}
...
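The subscope and disable_spellcheck options sit in the same settings object, so a small helper can assemble both. The helper name build_settings is hypothetical. The language and content fields in the request body are assumptions, since this post does not document them.

```python
import json

def build_settings(parse_hashtags: bool = False, spellcheck: bool = True) -> dict:
    """Assemble the "settings" object for a /parse call (hypothetical helper)."""
    settings = {}
    if parse_hashtags:
        settings["subscope"] = True            # hashtag parsing is off by default
    if not spellcheck:
        settings["disable_spellcheck"] = True  # spellchecking stays on unless disabled
    return settings

# ASSUMPTION: "language" and "content" are the body fields /parse expects.
body = {"language": "en", "content": "#ihateyou love",
        "settings": build_settings(parse_hashtags=True)}
print(json.dumps(body))
# {"language": "en", "content": "#ihateyou love", "settings": {"subscope": true}}
```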

Do not hesitate to contact us with questions or for more information. If you are new to Tisane, please sign up here; it’s free for up to 50,000 requests a month.