Classification Setup

Mailytica’s Classification Setup offers several methods to classify messages, ranging from keyword detection to AI models, so you can choose the one that best fits your needs.

To get started, you need to have already created a topic as explained in the Add a New Topic section. Then, in the Edit section of your topic, select Classification Setup from the top tab bar.

Add a new classification model

Click on the button to add a classification model. A new screen will appear with all the classification options for your topic.

Artificial Intelligence Classification

With Mailytica, you can use artificial intelligence to easily classify your emails by topic or business transaction. The ability to learn from unstructured data is one of the major components of artificial intelligence. As a result, Mailytica is designed to learn by itself, by analysing email data for patterns.

These characteristics make Mailytica’s AI classification system powerful, but it still needs a set of labelled email messages to train the AI model to recognize the email topic. Training requires an initial set of at least 10 labelled messages, although at least 30 are recommended for better results.

Mailytica uses machine learning techniques to generate statistical models from email messages. An important step involved in this process is training the AI model, so that it understands which type of messages should be labelled as that particular topic or business transaction.

Once trained, the model will be able to automatically classify incoming messages into pre-established topics.

The following steps are required to create an Artificial Intelligence Classification model:

  1. First, in the Classification Setup section, select Artificial Intelligence Classification from the top tab bar. The Artificial Intelligence Classification tab will show you the instructions on how to define your AI model. Click Next.

    Add a new AI classification model
  2. Under the Training Language section, you can select the language of your labelled messages. Mailytica will use these messages to understand the topics and then provide labels and predictions for new emails. For example, if you have a sample of emails already labelled in English with the topic Delivery Status, your training language would be English.

    Select AI model language
  3. The Training Datasets section allows you to choose which messages Mailytica will use when training the model. Typically, training datasets are defined during the supervised training or manual labelling process. But with Mailytica, datasets can also be automatically created. For example, if you manually label your email messages, they will be saved to the Gold dataset. Similarly, if you use a dictionary model to classify your messages, they will be saved in the Dictionary dataset.

  4. Next, under Maximum number of messages, you can define how many labelled messages will be used to train the AI classification model. The recommended default value is 1000. Additionally, you have the option to further refine the selection of messages using a percentile value. Setting this percentile value can be useful for discarding outlier data that might negatively impact model predictions. For more information on how to detect outliers, please visit the Messages section.

    Select maximum number of messages for AI classification model
  5. Then, in Minimum number of a message’s token, you can establish a minimum number of tokens necessary for classification. The recommended default value is 3. Tokens are just individual words in any email message. Setting the minimum value to 3 means that a message must contain at least 3 words to be considered for training.

    Select minimum number of tokens for AI classification model
  6. Next, in Features, you can select which parts of the email message the AI classification model should consider. The available options are the email body, subject, attachment content, and attachment filenames, and you can select more than one. There are three distinct configurations to consider:

    • New Inbound Message: Choose which parts of a new incoming email (not a reply) should be considered for prediction.

    • Reply Inbound Message: Determine which parts of a reply email should be considered for prediction.

    • Message for Training: Select which parts of an email should be used during the training phase for the AI model.

    Select email features
  7. Then, in Term discrimination you can choose the term discrimination type between TF-IDF (Term Frequency - Inverse Document Frequency) or TF-NBW (Term Frequency - Negative Binomial Weighting). The recommended value is TF-IDF. In addition, you can set Retrain term discrimination to Active if you want to reuse previously trained models for faster training time, or to Inactive if you prefer to train the model from scratch.

  8. Now set the Confidence Threshold for your AI classification model. A confidence score is a number between 0 and 1 that represents the probability that the model has correctly classified the email message. For instance, a threshold of 0.5 means that all predictions with a confidence score below or equal to 0.5 are rejected. For higher precision, set a higher confidence threshold; for higher recall, lower it. If you are unsure which value suits you best, we recommend starting with a threshold of 0.5 and adjusting it over the next training iterations.

    Select AI classification model's confidence threshold
  9. Next, in Reclassification you can choose whether to include already seen messages in the classification model. If this is disabled, only new and unseen email messages will be classified.

    AI classification model reclassification
  10. Finally, in Status, toggle the Active button and then click Add. If you don’t want your model to classify any emails, you can set the Status option to Inactive.

    Create AI classification model
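
To illustrate steps 5 and 8, here is a minimal Python sketch of the two checks. It is only an illustration, not Mailytica's actual implementation: the function names has_enough_tokens and accept_prediction are hypothetical, and a token is assumed to be a whitespace-separated word.

```python
def has_enough_tokens(message: str, min_tokens: int = 3) -> bool:
    """Return True if the message contains at least `min_tokens` words.

    Assumption: a token is a whitespace-separated word.
    """
    return len(message.split()) >= min_tokens


def accept_prediction(confidence: float, threshold: float = 0.5) -> bool:
    """Return True if a prediction passes the confidence threshold.

    Scores below or equal to the threshold are rejected, as described
    in step 8.
    """
    return confidence > threshold


print(has_enough_tokens("Where is my order?"))  # four words
print(has_enough_tokens("Thanks!"))             # one word
print(accept_prediction(0.5))                   # equal to the threshold
print(accept_prediction(0.82))                  # above the threshold
```

Raising the threshold in this sketch rejects more predictions, which mirrors the trade-off described above: fewer but more reliable classifications.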

Once you click the Add button, the training process will begin. The time it takes to train depends on the number of labelled email messages you have. You can track the training progress by clicking on your AI classification model and checking the Status section. As soon as it says Active, your AI Classification model has finished training and is ready to use.

Trained AI classification model
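
If you are curious about the TF-IDF option from step 7, the following Python sketch computes TF-IDF weights with a common textbook formula: the term frequency multiplied by the logarithm of the inverse document frequency. This is an illustration under stated assumptions; the exact formula Mailytica uses may differ, and the function name tf_idf and the sample emails are made up for this example.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses tf = count / document length and idf = log(N / document frequency),
    which is one of several common variants.
    """
    n_docs = len(documents)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


emails = [
    "where is my order".split(),
    "my order has not arrived".split(),
    "please reset my password".split(),
]
weights = tf_idf(emails)
# "my" appears in every email, so it carries no weight;
# "password" appears in only one email, so it is more distinctive.
print(weights[2]["my"], weights[2]["password"])
```

Terms that occur in every message receive a weight of zero, while terms that are specific to a few messages receive higher weights. This is why such a weighting helps a model tell topics apart.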

Dictionary Classification

Dictionary Classification categorizes email messages by finding keywords that are associated with a topic.

For instance, if you have a topic called Order Status, you can define keywords that your clients often use when inquiring about the status of their orders. Mailytica will then classify those emails and carry out an action on them (an automated Smart Response or E-Mail Routing).

To set up a Dictionary Classification model, follow these steps:

  1. First, in the Classification Setup section, select Dictionary Classification from the top tab bar. The Dictionary Classification tab will show you the instructions on how to define your keywords. Click Next.

    Add a new dictionary classification model
  2. Afterward, you can choose the language of the messages you want to classify. For example, if you select English, then only emails in that language will be considered for the dictionary classification.

  3. Next, you can define the model’s Input Field, which refers to the different parts of an email message that Mailytica can use to find your keywords. To select an input field, first delete the default value and then select your desired input from the dropdown menu.

    Dictionary classification input field

    The recommended value is cleaned_email_message. For more information on Input Fields, please visit the section Mailytica’s Email Data.

  4. In Dictionary Entries, you can add all the keywords related to your topic. There are two ways to enter your keywords. The first option is to upload a .txt file with the keywords separated by semicolons. The second option is to add them manually: write your keywords in the text box and press enter after each one. For example, for our Order Status topic we will add the keywords order status, shipping status, and status of my order.

    Add dictionary entries
  5. Next, in Case Sensitivity, you can choose whether keywords should be case-sensitive. The recommended value is Inactive. When inactive, Mailytica will detect the keywords regardless of whether they are written in upper or lower case. To illustrate, the keyword order status in our previous example will be detected whether it is written as order status, Order Status, or Order status in a message.

  6. Next, the Reclassification feature allows you to choose whether Mailytica also classifies past messages that you have already received. If this feature is disabled, only new messages are classified. The recommended value is Active, so that existing messages are classified as well.

  7. The next step is to configure the Confidence Strategy, which determines how confident Mailytica should be before classifying an email.

    • First, select a Confidence Value from 0 to 1. The Confidence Value represents the minimum confidence Mailytica needs to classify an email under a particular topic. A higher value (closer to 1) means Mailytica will only classify emails it’s very sure about, while a lower value (closer to 0) allows more flexibility, meaning Mailytica can classify emails even if it’s less certain about the topic.

    • In Confidence Strategy, choose among three strategies to calculate match confidence:

      • Max: This option selects the highest confidence score from all possible topics.

      • Multiply: This strategy multiplies the confidence scores of all potential matches. This strategy increases classification accuracy for emails that include multiple relevant keywords. In this way, it’s less likely to incorrectly classify an email based on a single, potentially misleading match.

      • Log: The log strategy applies logarithmic scaling to confidence scores, helping to balance the impact of very high or very low scores. This approach is helpful for finding a middle ground.

    • Next, select a value for Document Normalization. The default value is 100. This setting reduces the impact of longer emails so that they are treated equally with shorter ones. If an email exceeds this value, Mailytica will adjust its calculation to prevent length from impacting the classification.

  8. You can then select the Classification Mode between normal Classification, Blacklist, or Whitelist. In Blacklist, you set specific terms to block certain topics from a classification. The model will exclude every message that matches these terms. On the other hand, Whitelist positively selects your emails. A classification is successful only if the contents of the message match the keywords in the dictionary model.

  9. To finish, in Status, toggle the Active button and then click Add. Your Dictionary Classification model should now be activated and saved. If you don’t want your model to classify any emails, you can set the Status option to Inactive.

    Dictionary classification model details
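
The keyword entry and case sensitivity steps above can be sketched in Python as follows. This is a simplified illustration, not Mailytica's implementation: the function names parse_keywords and find_keywords are hypothetical, keywords are assumed to match as plain substrings, and the confidence calculation is left out.

```python
def parse_keywords(file_content: str) -> list[str]:
    """Split semicolon-separated keywords and drop empty entries."""
    return [kw.strip() for kw in file_content.split(";") if kw.strip()]


def find_keywords(message: str, keywords: list[str],
                  case_sensitive: bool = False) -> list[str]:
    """Return the keywords that occur in the message.

    Assumption: a keyword matches if it appears as a substring.
    """
    haystack = message if case_sensitive else message.lower()
    return [
        kw for kw in keywords
        if (kw if case_sensitive else kw.lower()) in haystack
    ]


# Content of a hypothetical .txt upload for the Order Status topic.
keywords = parse_keywords("order status; shipping status; status of my order")
message = "Hello, could you tell me the Order Status for my last purchase?"

print(find_keywords(message, keywords))                       # ['order status']
print(find_keywords(message, keywords, case_sensitive=True))  # []
```

With case sensitivity switched off, Order Status in the message matches the keyword order status. With it switched on, the same message produces no match.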

Regular Expression Classification

Regular Expression Classification is a powerful way to categorize email messages with user-defined words or expressions. Regular expressions work similarly to keywords, but with more rigorous matching rules.

A regular expression (also known as a regex or RegExp) is a pattern that you can use to match specific sequences of characters. You can use regular expressions to identify text patterns and to extract specific portions of text from within larger blocks of text.

Regular expressions are used to determine whether a string matches a certain pattern. This can be done through the use of special characters such as *, ^, [ ], ( ), and +. For example, square brackets [] can be used to match any character that is inside them. Therefore, the regular expression [cb]at would match the words cat and bat, while eat would not match this pattern.
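
As a quick check of the [cb]at example, here is a short sketch using Python's built-in re module. Python is used only for illustration; the regex engine Mailytica runs may behave differently in some details.

```python
import re

pattern = re.compile(r"[cb]at")

for word in ["cat", "bat", "eat"]:
    # fullmatch succeeds only if the whole word matches the pattern.
    print(word, bool(pattern.fullmatch(word)))
```

The first two words match because their first letter is one of the characters listed inside the brackets; eat does not, because e is not in the set.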

Common Regular Expressions

Regular Expression  Meaning                                                     Example
------------------  ----------------------------------------------------------  ----------------------------------------
[ ]                 Any one of the characters inside the brackets               gr[ae]y: grey or gray
[a-zA-Z]            Any one letter in the range a-z or A-Z                      [a-zA-Z]: My order number is #12345
\p{L}               Any kind of letter, including language-specific characters  \p{L}: My last name is Müller
.                   Any single character                                        .: I have not received a confirmation email
\d                  Any digit                                                   \d: Can you please call me at 123 456 7890?
( )                 Captures the text matched inside the parentheses            (shipment): What is the status of my shipment?
(?i)                Case-insensitive match                                      (?i)(shipment): Shipment, shipment, SHIPMENT, etc.
(?: )               Non-capturing (unnamed) group, e.g. for optional parts      ship(?:ping)?: How long does it take for my order to ship?
|                   Either the left or the right alternative                    (order|shipping): Is my order ready for shipping?
?                   0 or 1 of the preceding element                             colou?r: colour or color
*                   0 or more of the preceding element                          #*: Your order # 567890 is out for delivery.
+                   1 or more of the preceding element                          ad+res+: address, adress, addres, …
{}                  An exact number of repetitions                              \d{10}: The tracking number is 1234567890.
^                   Beginning of the string or line                             ^Hello: Hello, what is the status of my order?
$                   End of the string or line                                   Regards$: Thank you for your help. Regards
\n                  A line break                                                \n(\w+): I recently placed an order. [line break] Could you please send me any updates?
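
To try some of these expressions yourself, the following sketch runs a few of the patterns against their example sentences using Python's built-in re module. It is meant only as a way to experiment; Mailytica's own regex engine may differ in details.

```python
import re

# (pattern, sample text) pairs taken from the examples above.
examples = [
    (r"gr[ae]y", "grey or gray"),
    (r"\d", "Can you please call me at 123 456 7890?"),
    (r"(?i)(shipment)", "Shipment, shipment, SHIPMENT"),
    (r"colou?r", "colour or color"),
    (r"\d{10}", "The tracking number is 1234567890."),
    (r"^Hello", "Hello, what is the status of my order?"),
    (r"Regards$", "Thank you for your help. Regards"),
]

# findall returns every non-overlapping match of the pattern in the text.
results = {pattern: re.findall(pattern, text) for pattern, text in examples}

for pattern, found in results.items():
    print(pattern, "->", found)
```

Note that Python's built-in re module does not support \p{L}. If you want to test that expression in Python, the third-party regex package accepts it.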

Regular Expressions Examples

Below is an example of how to use regular expressions to extract information from email messages:

Example email message

Subject: Order status

Sender: client@company.com

Hello,

I wanted to get a status update on my last order #98237 that I placed on 01/10. When can I expect it to ship?

You can contact me at 123-456-7890 if you need any additional information.

Thanks,

Veronica Jones

We could use regular expressions to extract information from a message as follows:

Using regular expressions

Target         Regular Expression          Result
-------------  --------------------------  ------------
Order number   (#\d{5})                    #98237
Order date     [0-9]{2}\/[0-9]{2}          01/10
Phone number   [0-9]{3}-[0-9]{3}-[0-9]{4}  123-456-7890
Message topic  (order|ship(?:ping)?)       order, ship
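
The extraction above can be reproduced with a short Python sketch that applies each expression to the body of the example email. The dictionary keys such as order_number are names chosen for this illustration, and Python's re module stands in for whatever regex engine Mailytica uses.

```python
import re

# Body of the example email message shown above.
email_body = (
    "Hello,\n\n"
    "I wanted to get a status update on my last order #98237 that I placed "
    "on 01/10. When can I expect it to ship?\n\n"
    "You can contact me at 123-456-7890 if you need any additional "
    "information.\n\n"
    "Thanks,\n\n"
    "Veronica Jones"
)

patterns = {
    "order_number": r"(#\d{5})",
    "order_date": r"[0-9]{2}\/[0-9]{2}",
    "phone_number": r"[0-9]{3}-[0-9]{3}-[0-9]{4}",
    "message_topic": r"(order|ship(?:ping)?)",
}

# Collect every match of each pattern in the email body.
extracted = {name: re.findall(p, email_body) for name, p in patterns.items()}

for name, found in extracted.items():
    print(name, "->", found)
```

Because the message topic expression is case-sensitive, it would not match Order in the subject line. Prefix it with (?i) if you want to match both spellings.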

Regular Expression Classification Setup

To set up a Regular Expression Classification model, follow these steps:

  1. First, in the Classification Setup section, select Regular Expression Classification from the top tab bar. The Regular Expression Classification tab will show you the instructions on how to define your expressions. Click Next.

    Add a new regular expression model
  2. Then, you can choose the Language of the messages you want to classify. For example, if you select English, then only emails in that language will be considered for the regular expression classification.

  3. Next, in Sender Filter, if the sender holds an identifier that can be used as a filter, you can add its corresponding regular expression. Otherwise, you can add the expression .* to match any sender.

  4. Similarly, in Recipient Filter, if the recipient holds an identifier that can be used as a filter, you can add its corresponding regular expression, or just add .* to match any recipient.

  5. Next, under Subject Filter, you can enter a regular expression to match emails with a specific subject. For example, we can match emails with either the subject Order status or Shipping status using the following regular expression: (Order|Shipping) status.

    Add a subject filter to a regular expression classification model
  6. Likewise, in Message Filter, you can enter a regular expression to match the message’s body. Otherwise, you can add the expression .* to match any message body.

  7. Afterward, you can also define an Input Field filter, which refers to the different parts of an email message that Mailytica can search with your regular expression. To do this, enter the name of the input field in the first text box and then, in the Filter text box, enter the regular expression of your choice. If you wish to skip this step, enter .* to match any content. For more information on Input Fields, please visit the section Mailytica’s Email Data.

  8. Next, the Reclassification feature allows you to choose whether Mailytica also classifies past messages that you have already received. If this feature is disabled, only new messages are classified. The recommended value is Active, so that existing messages are classified as well.

  9. In Confidence Weighting you can set a value between 0 and 1 depending on the reliability of your regular expressions. A higher confidence weighting (closer to 1) means you’re more sure that the email is classified correctly when it matches the regex pattern. Confidence weighting is just a measure of your trust in the regex match. It’s a way to control how sure Mailytica needs to be that a certain pattern is correct before deciding the classification of an email.

  10. You can then select the Classification Mode between normal Classification, Blacklist, or Whitelist. In Blacklist, you set specific terms to block certain topics from a classification. The model will exclude every message that matches these terms. On the other hand, Whitelist positively selects your emails. A classification is successful only if the contents of the message match the expressions in the regex model.

  11. To finish, in Status, toggle the Active button and then click Add. Your Regular Expression Classification model should now be activated and saved. If you don’t want your model to classify any emails, you can set the Status option to Inactive.

    Regular expression classification model details
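
To make the role of the filters more concrete, here is a minimal Python sketch of how the sender, recipient, subject, and message filters might work together. It rests on assumptions that the steps above do not confirm: that all filters must match for a message to be classified, and that a filter matches when the pattern occurs anywhere in the field. The RegexModel class, the matches function, and the address support@example.com are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class RegexModel:
    """Hypothetical container for the filters described in the steps above."""
    sender_filter: str = ".*"
    recipient_filter: str = ".*"
    subject_filter: str = ".*"
    message_filter: str = ".*"


def matches(model: RegexModel, sender: str, recipient: str,
            subject: str, body: str) -> bool:
    """Return True only if every filter finds a match in its field.

    Assumption: filters are combined with a logical AND and are
    searched anywhere in the field.
    """
    checks = [
        (model.sender_filter, sender),
        (model.recipient_filter, recipient),
        (model.subject_filter, subject),
        (model.message_filter, body),
    ]
    return all(re.search(pattern, text, re.DOTALL) for pattern, text in checks)


# Only the subject is restricted; the other filters keep the default .*
model = RegexModel(subject_filter=r"(Order|Shipping) status")

print(matches(model, "client@company.com", "support@example.com",
              "Order status", "Hello, when will my order ship?"))
print(matches(model, "client@company.com", "support@example.com",
              "Invoice question", "Hello, I have a question."))
```

In this sketch the default .* never excludes a message, which is why the steps above suggest it for the filters you do not need.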

Document Length Classification

With Mailytica, you can also classify email messages depending on the number of words they contain. You might use this feature to prioritize your inbox or to group messages in a specific language. Additionally, by classifying email messages by number of words, you can find messages that are likely to be advertisements or spam.

To create a new Document Length Classification model, follow these steps:

  1. To begin, in the Classification Setup section, select Document Length Classification from the top tab bar. The Document Length Classification tab will show you the instructions on how to define your model. Click Next.

    Add a new document length classification model
  2. Next, in the Language section, you can choose the language of the messages you want to classify. For instance, if you are receiving emails in multiple languages and would like to sort your inbox by language, then you could create Document Length Classification models for each one. As an example, we can create a model that can identify messages in German, so they can be filed in a language-specific folder in our inbox.

    Document length classification model language
  3. Next, under Minimum Number of Words, you can set the minimum number of words a message must contain before it can be classified into the topic. Set the value to 0 if you want messages of any length to be classified.

    Document length classification model minimum words
  4. Similarly, in the Maximum Number of Words section, you can set the maximum number of words a message may contain to be matched with the topic. If you want messages to be classified regardless of length, enter a large number, such as 100,000.

    Document length classification model maximum words
  5. Then, the Reclassification feature allows you to choose whether Mailytica also classifies past messages that you have already received. If this feature is disabled, only new messages are classified. The recommended value is Active, so that existing messages are classified as well.

    Document length classification model reclassification
  6. To finish, in Status, toggle the Active button and then click Add. Your Document Length Classification model should now be activated and saved. If you don’t want your model to classify any emails, you can set the Status option to Inactive.

    Document length classification model
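
The minimum and maximum word settings can be pictured with a short Python sketch. It is a simplified illustration, not Mailytica's implementation: the function name within_length is hypothetical, words are assumed to be separated by whitespace, and both limits are assumed to be inclusive.

```python
def within_length(message: str, min_words: int = 0,
                  max_words: int = 100_000) -> bool:
    """Return True if the word count lies between the two limits.

    Assumptions: words are whitespace-separated and both limits
    are inclusive.
    """
    word_count = len(message.split())
    return min_words <= word_count <= max_words


short_message = "Thanks!"
long_message = "Hello, I would like to know when my order will be delivered."

print(within_length(short_message, min_words=3))   # too short for the topic
print(within_length(long_message, min_words=3))    # long enough
print(within_length(long_message, max_words=5))    # too long for the topic
```

With the defaults of 0 and 100,000, practically every message falls inside the range, which corresponds to classifying messages regardless of their length.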