boldo – Tisane Labs, language, and everything else

Following the requests of our users, we implemented additional features. They are now active; feel free to kick the tires. (And yes, our updates are named after herbal teas.)

Detecting Attempts to Establish External Contact

In some communities, attempts to establish external contact must be monitored: marketplaces, communities with harassment issues, platforms where scammers try to lure users elsewhere, and so on. Today, a common and simple solution is to scan messages with regular expressions for phone numbers and emails. That, however, is insufficient, as users often find ways to bypass these checks or use non-standard formatting.

We now detect these attempts and tag the detected snippets with the external_contact type. This covers, for example, a request to provide an email, a WhatsApp number, and so on (“we need your email”, “wat is ur whats app”, etc.).
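As a rough illustration, the snippets could be picked out of a parsed response along these lines. This is a minimal sketch: the "abuse" key and the "type" and "text" fields are assumptions about the response layout, and the sample response is hand-written rather than real API output.

```python
from typing import Any


def external_contact_snippets(response: dict[str, Any]) -> list[str]:
    """Return the text of every detected snippet typed external_contact."""
    # Assumption: detected snippets are listed under an "abuse" key,
    # each entry carrying "type" and "text" fields.
    return [
        item.get("text", "")
        for item in response.get("abuse", [])
        if item.get("type") == "external_contact"
    ]


# Hand-written sample for illustration only, not real API output.
sample_response = {
    "text": "nice bike. wat is ur whats app",
    "abuse": [
        {"type": "external_contact", "text": "wat is ur whats app"},
    ],
}

found = external_contact_snippets(sample_response)
```

A moderation pipeline might route messages with a non-empty result to a review queue instead of blocking them outright.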

Signal to Noise Ranking

If you need to create a summary or write a report about how a particular topic or brand is reflected on social media, the sheer number of posts that need to be processed is often overwhelming. What’s worse, 95% of these are not much help. They either copy other people’s thoughts, are completely off-topic, contain all kinds of abuse, or just vent frustration and negative emotions. The same goes for the comments: apart from a few pearls, most are just background noise.

The signal to noise ranking is similar to conventional search engine ranking, but better adapted to social media content.

The ranking prioritises posts related to the specified concepts and domains, and penalises off-topic content and abuse.

To compute the signal to noise ranking, provide an array of concept IDs (family IDs) in your settings under the relevant attribute (e.g. “relevant”: [12345,6789]).
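A request body carrying the relevant attribute might be assembled as in the sketch below. The "language", "content" and "settings" keys of the outer body are assumptions about the request layout, and build_request is a hypothetical helper; only the relevant attribute and the sample IDs come from the text above.

```python
import json


def build_request(content: str, language: str, relevant_ids: list[int]) -> str:
    """Serialize a request body with the "relevant" setting filled in."""
    body = {
        # Assumed outer layout: language code, text, and a settings object.
        "language": language,
        "content": content,
        "settings": {
            # Concept (family) IDs the ranking should prioritise.
            "relevant": relevant_ids,
        },
    }
    return json.dumps(body)


payload = build_request("Some post about the brand", "en", [12345, 6789])
```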

Native Topic Standard Overhaul

While we support taxonomy standards like IPTC and IAB, our internal taxonomy is much richer. The topics that don’t appear in IPTC and IAB can be exposed using the native topic mode (code: native). Previously, it was used for internal purposes only, and contained numeric codes.

After this update, native topics carry English descriptions instead of numeric codes, and the taxonomy has also been expanded.

Topic Optimization

Some of the topics may overlap. “Compound” topics like cryptocurrency may imply other topics like finance and software. Depending on your application, you may or may not need these “constituent” topics.

The optimize_topics parameter controls whether these constituent topics are included in the output. For example, when analyzing a sentence like “exchange btc to xmr” with optimize_topics set to false, we get:

  {
    "text": "exchange btc to xmr",
    "topics": [
       "money",
       "commerce",
       "business",
       "finance",
       "software",
       "currency",
       "cryptocurrency"
    ]
  }

When the parameter is set to true, we get:

  {
    "text": "exchange btc to xmr",
    "topics": [
       "cryptocurrency"
    ]
  }
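If you request the full list but later want the compact view, a similar effect can be approximated on the client side. The sketch below is an illustration only, not Tisane's internal logic: the IMPLIED map is hypothetical, built by hand from the example above, and drop_constituents is an illustrative helper name.

```python
# Hypothetical map from a compound topic to the constituent topics it implies.
IMPLIED: dict[str, set[str]] = {
    "cryptocurrency": {
        "money", "commerce", "business", "finance", "software", "currency",
    },
}


def drop_constituents(topics: list[str], implied: dict[str, set[str]]) -> list[str]:
    """Remove every topic that is implied by another topic in the list."""
    redundant: set[str] = set()
    for topic in topics:
        redundant |= implied.get(topic, set())
    # Keep the original order of the surviving topics.
    return [topic for topic in topics if topic not in redundant]


full = [
    "money", "commerce", "business", "finance",
    "software", "currency", "cryptocurrency",
]
compact = drop_constituents(full, IMPLIED)
```

Topics that are not implied by anything else in the list pass through unchanged, so unrelated topics in the same text are preserved.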

Format-Sensitive Logic

We had to learn the hard way that it matters where the text is coming from.

Here is a simple example. A single word like “fool” may be a title, in which case it carries a negative connotation but is not a personal attack. However, when posted as part of a dialogue (e.g. in a comment on Instagram), it is a personal attack.

We introduced support for applying different logic to different formats.

Feature Default Format Changed to Universal Dependencies

Tisane supports several standards to display grammar features, such as Penn, Universal Dependencies, Glossing Abbreviations, and the native codes and descriptions. We saw that the original glossing abbreviation format was confusing for many users, and changed the default to Universal Dependencies.

If you prefer the previous behaviour, you can still use “glossing” to obtain glossing abbreviations.

Document-Level Sentiment

While we stress that aspect-based sentiment analysis provides more actionable intelligence, we added a document-level attribute for certain scenarios. Add “document_sentiment”:true to the settings to obtain the document-level sentiment value in the range -1 (most negative) to 1 (most positive). The value is placed in the sentiment attribute.
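The setting and a simple way to consume the value might look like the sketch below. The sentiment_label helper and its neutral_band threshold are hypothetical conveniences, not part of the API; only the document_sentiment setting, the sentiment attribute and the -1 to 1 range come from the text above.

```python
import json
from typing import Any, Optional

# Settings fragment enabling the document-level value.
settings = {"document_sentiment": True}
settings_json = json.dumps(settings)


def sentiment_label(response: dict[str, Any], neutral_band: float = 0.1) -> Optional[str]:
    """Map the document-level sentiment value to a coarse label."""
    value = response.get("sentiment")
    if value is None:
        return None  # the attribute is absent from the response
    if not -1.0 <= value <= 1.0:
        raise ValueError(f"sentiment out of range: {value}")
    # neutral_band is an arbitrary illustrative threshold.
    if value > neutral_band:
        return "positive"
    if value < -neutral_band:
        return "negative"
    return "neutral"
```

For example, a response containing "sentiment": -0.6 would be labelled "negative" by this helper.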


New Features: Boldo Update