# Classification

## What

### What is a classifier and what is classification?

When an end user queries your AI, the AI determines which transition to take and which state to land on by processing the query through an AI engine called a classifier (used interchangeably with query classifier/QC). This process, which determines the direction of the next turn in the conversation, is called classification. Note that every query that comes into the platform is classified first. During this process, the classifier engine assigns a label to the query, maps it to an existing intent, and takes the classification transition that the intent is associated with.

*(Image: classification)*

Classification data is used to train the classifier engine to make decisions on queries. For example, if you have an intent called food_order in your training dataset, when an end user says "I want a burger", this utterance is likely to be mapped to that intent. As a conversational designer, you will want to make sure the classification data for your competencies has a wide variety of "utterances" (ways of saying something) so that your AI will be able to respond correctly to natural and messy human expressions.
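As an illustrative sketch, varied training data for a hypothetical food_order_start intent might look like this in the JSON import format documented later on this page (the utterances are made up):

```json
{
    "version": 1,
    "data": {
        "food_order_start": [
            "I want a burger",
            "can I get something to eat",
            "lemme order some food please",
            "im starving, what do you have"
        ]
    }
}
```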

### What is an intent?

An intent is a collection of data with which you train the classifier engine. You would normally assign a label upon creating an intent. The two most common intent labels that we use in the Platform are competencyname_start and competencyname_update.

There is also a list of pre-built intents in the Platform that you can use for your model training:

  • clean_hello_start
  • clean_goodbye_start
  • cs_yes
  • cs_no
  • cs_cancel

### What is the difference between adding a new classification transition and adding an intent to an existing classification transition?

You can either add a new transition or add an intent to an existing transition (see graph below). Adding an intent is almost like adding the data to the existing intent, except that on classification you will see both intents (intent1 & intent2) in the response payload. If you instead add another classification transition, the two transitions "compete" with each other for classification, which could lower their respective scores. With an individual transition, you can also configure actions/responses specific to that transition through Business Logic, which provides more customizability.

*(Image: shared & individual transitions)*

## How

  • How to add/remove an intent?
  • How to collect classification data?
  • How to export classification data?
  • How to curate classification data?
  • How to use the uniqueness sorting tool?
  • How to use the classification data insight tool?
  • How to utilize query classifier?
  • Best Practice Tips

### How to add/remove an intent?

There are two ways to go about creating an intent:

  1. When adding a classification transition, you need to specify the intent. You can choose an existing intent or enter your own. See [How to add different types of transitions?](states&transitions.html#how-to-add-different-types-of-transitions)

  2. You can also hover over the classification transition to which you wish to add an intent, then click on Add to reveal the add intent field.

*(Image: add intent)*

To remove an intent, hover over the intent name and you should see Remove. Generally, the intents that link the root to the competency cannot be removed.

### How to collect classification data?

Generally in the Clinc AI Platform, there are three ways to collect classification data:

  1. Manual Entry
  2. Import
  3. Crowdsource

#### Manual Entry

  1. From the detail sidebar, select the competency/state/transition, then click on the intent to which you would like to add data. If you need to add a new intent, see How to add/remove an intent?

  2. On the data curation page, type in utterances in the Add utterances field one at a time and click Add.

  3. After making changes, click Save Data on the top right to save edits before leaving the page.

*(Image: add classification data)*

Other actions:

  1. Edit utterances by clicking into text and changing the utterance in place.

  2. Remove utterances by clicking Clear at the end of each row.

#### Import

On the same classification data page where you manually add data, use the Import button to import classification data from a JSON file or a CSV file. Taking get_balance as an example, the uploaded classification data file needs to be in one of the following formats:

JSON

```json
{
    "version": 1,
    "data": {
        "get_balance_start": [
            "Can you tell me the balance for my savings account please",
            "I want my balance",
            "I want the balance for my checking account",
            "I'd like to know how much money is currently in my savings account"
        ]
    }
}
```

or

```json
{
    "Can you tell me the balance for my savings account please": "get_balance_start",
    "I want my balance": "get_balance_start",
    "I want the balance for my checking account": "get_balance_start",
    "I'd like to know how much money is currently in my savings account": "get_balance_start"
}
```

CSV

Similar to the JSON format, you will have the intent name in the first column and utterances in the second column.

*(Image: classification data in CSV)*
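For reference, the same get_balance data might look like the following as CSV (illustrative; export an existing dataset if you want to confirm the exact layout, including whether a header row is expected):

```csv
get_balance_start,Can you tell me the balance for my savings account please
get_balance_start,I want my balance
get_balance_start,I want the balance for my checking account
get_balance_start,I'd like to know how much money is currently in my savings account
```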

If the file passes format validation, the utterances will be imported and appear in the data view under their corresponding intent.
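If you want to sanity-check a JSON file locally before uploading, a small script along the following lines can catch format problems early (this is not a platform tool; the checks simply mirror the two formats shown above):

```python
import json

def validate_classification_json(path):
    """Check that a file matches one of the two import formats shown above."""
    with open(path) as f:
        payload = json.load(f)  # raises an error on malformed JSON

    if "data" in payload:
        # Format 1: {"version": 1, "data": {intent: [utterances, ...]}}
        for intent, utterances in payload["data"].items():
            assert isinstance(utterances, list), f"{intent}: expected a list of utterances"
            assert all(isinstance(u, str) for u in utterances), f"{intent}: utterances must be strings"
    else:
        # Format 2: {utterance: intent}
        assert all(isinstance(k, str) and isinstance(v, str) for k, v in payload.items()), \
            "expected a flat mapping of utterance -> intent"
    print(f"{path} looks structurally valid")

validate_classification_json("get_balance.json")
```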

#### Crowdsource

You can either click the Crowdsource button (below Import and Export) on the data page, or, from your AI version workspace, go to Crowdsourcing Data in the competency sidebar.

  1. Click New Classification Job.

  2. Fill in the modal that pops up:

    *(Image: classification crowdsource job)*

    Default Settings:

    • Job Name: Name the job for your own reference. It will NOT be visible to the crowdsourcing workers.
    • Worker Job Title: The job name that will be posted to the crowdsourcing platform.
    • Job Description: A description that is passed to the crowd. Make sure that it includes a descriptive prompt so that the crowd worker knows how to successfully complete the labelling task. Read the Best Practice Tips to learn our recommendations on how to frame the description.
    • Classification: Select the intent for which you are collecting data.
    • Example Utterance: Provide at least three example utterances to demonstrate to the crowdsource workers the kind of utterances you are collecting.
    • Number of Utterances: Number of utterances you are collecting. It should be a multiple of 5, and no greater than 500.
    • Reward per Worker: How much the crowdsource workers will be paid for each utterance. The reward must be no more than $0.50. Each worker will provide 5 utterances. MTurk charges an additional 20% fee on top of the reward per worker. (See the cost sketch after these steps.)

    Advanced Settings:

    • Job Duration: How long the job will be available on MTurk before expiring.
    • Worker Country: You can choose the regions that the crowdsource workers are from. Currently only three countries are available: China, the United States, and the United Kingdom.
  3. Once all the information is filled in, you are ready to Launch the Job!

  4. Once a job is complete, click on the job title to review the results. It's time to curate the classification data.
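Before launching, it can help to sanity-check the cost. The sketch below is not a platform feature; it just applies the numbers above, and it assumes the reward is billed once per worker (verify this against your actual MTurk billing):

```python
def estimate_job_cost(num_utterances, reward_per_worker):
    """Hypothetical helper: rough MTurk cost for a classification job."""
    # Constraints from the Default Settings above.
    assert num_utterances % 5 == 0 and num_utterances <= 500, "multiple of 5, max 500"
    assert reward_per_worker <= 0.50, "reward must be no more than $0.50"

    workers = num_utterances // 5        # each worker provides 5 utterances
    base = workers * reward_per_worker   # assumption: reward paid once per worker
    return base * 1.20                   # MTurk adds a 20% fee on top

print(estimate_job_cost(100, 0.50))      # 20 workers -> $10.00 + 20% fee = $12.00
```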

Other actions:

  1. You can also cancel the job at any point in the process.

### How to export classification data?

On the classification data page, use the Export button to export classification data to a JSON file or a CSV file.

*(Image: export in CSV)*

Reference the Import section for the format of each file type.

### How to curate classification data?

When your crowdsource job progress reaches 100%, you are ready to curate the data. This process is just as important as collecting the data. Here is a checklist that we recommend keeping in mind while curating:

  • Make sure that all utterances fit your job description, i.e., fit within the competency and represent the intent you are collecting utterances for. For example, if you are collecting data for food_order_start, an utterance like "actually change the burger to a sandwich" does not fall under the intent.
  • Do not delete typos, slang, incorrect grammar, etc. Keep them to better represent "messy" human language.
  • Ask "Is this teaching the AI something new?" Make sure to delete copy-cat utterances (a quick way to flag them is sketched after this list).
  • Make sure there's a variety of data that is representative of all the ways someone could give a query.
  • Collect data at different granularity: capture as many ways as possible that people talk about something (the intent you are collecting data for).
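As flagged in the checklist, copy-cat utterances teach the AI nothing new. Here is a minimal sketch (not a platform tool; the normalization rules are an assumption you should tune) for spotting near-duplicates in an exported utterance list:

```python
import re
from collections import defaultdict

def find_copycats(utterances):
    """Group utterances that collapse to the same normalized form."""
    groups = defaultdict(list)
    for u in utterances:
        # Lowercase and strip punctuation/extra whitespace so trivial
        # variants ("What's my balance?" vs "whats my balance") collide.
        key = re.sub(r"[^a-z0-9 ]", "", u.lower())
        key = re.sub(r"\s+", " ", key).strip()
        groups[key].append(u)
    return [g for g in groups.values() if len(g) > 1]

for group in find_copycats([
    "What's my balance?",
    "whats my balance",
    "How much money do I have",
]):
    print("near-duplicates:", group)
```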

*(Image: export to classifier)*

To curate classification data:

  1. Click View Results on the crowdsourcing page. You will see a list of utterances provided by crowdsource workers.

  2. Deselect the ones that don't fit your job description/competency/intent (or delete them later from the classification data page). You can also edit the utterances.

  3. Once you have a curated dataset, click Export to Classifier to export the utterances for training purposes. Select the intent that the utterances will train. If an utterance is successfully exported, its row will be greyed out.

### How to use the uniqueness sorting tool?

The uniqueness sort tool provides two main functions:

  1. help identify errors;

  2. help identify unique and underrepresented training samples.

Errors are training samples that are either noise (i.e. out-of-scope) or are mislabeled with the wrong intent. Underrepresented training samples are samples that deviate significantly from the lexical norm within an intent. This means that the words used in the underrepresented training sample are more unique than most of the other samples within an intent.

To access the uniqueness sort tool for an intent, navigate to the intent’s data page:

*(Image: uniqueness sorting tool)*

  1. Select Uniqueness ascending or descending in the SORT BY dropdown.

  2. After a moment, the tool displays a uniqueness score. The higher the score, the more unique the utterance is.

When an utterance gets a high uniqueness score, that can mean either that it is erroneous or irrelevant to the intent, or that it is a more unique way of expressing the intent compared to the rest of the data within the intent. In the example above, the utterance "Which weather app has the best widget?" is clearly irrelevant to the intent, but the first one, "Weather prediction please", is a valid query. With underrepresented data, you should consider collecting more diverse training data. One good practice is to use the unique sample as an example in a new crowdsourcing job.

Note: The uniqueness scores are computed per intent; they should not be compared across intents.
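The platform computes these scores internally, but the underlying idea, distance from an intent's lexical norm, can be approximated. Here is a hedged sketch using TF-IDF distance from the intent centroid (scikit-learn; illustrative only, not Clinc's actual algorithm):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def uniqueness_scores(utterances):
    """Score each utterance by its TF-IDF distance from the intent's centroid."""
    vectors = TfidfVectorizer().fit_transform(utterances).toarray()
    centroid = vectors.mean(axis=0)
    return np.linalg.norm(vectors - centroid, axis=1)

utterances = [
    "weather prediction please",
    "what's the forecast for today",
    "will it rain tomorrow",
    "which weather app has the best widget",  # likely out of scope
]
for score, u in sorted(zip(uniqueness_scores(utterances), utterances), reverse=True):
    print(f"{score:.3f}  {u}")
```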

### How to use the classification data insight tool?

The goal of the classifier weight analysis tool is to provide an estimate of how heavily certain words are weighted within an intent classifier. We call this score the "Influence Index". If a word has a high influence index for a given intent, then the intent classifier gives that word high importance, and a query containing that word is more likely to be classified as that intent.
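Clinc's actual weighting is internal, but you can get a similar signal by training a simple bag-of-words classifier on your exported data and inspecting its per-token coefficients. A hedged sketch (illustrative, not the platform's algorithm):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy training data; in practice, use utterances exported from your intents.
texts = ["what's the weather today", "will it rain", "what should I wear", "recommend an outfit"]
labels = ["weather_forecast_start"] * 2 + ["clothing_recommendation_start"] * 2

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# Positive weights pull a query toward clf.classes_[1], negative toward clf.classes_[0].
for token, weight in sorted(zip(vectorizer.get_feature_names_out(), clf.coef_[0]),
                            key=lambda t: t[1]):
    print(f"{weight:+.2f}  {token}")
```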

To access the classification data insight tool, click the bar chart icon and a data insight modal will appear.

*(Image: classification data insight)*

What do we do with the insights?

  • Make sure each of the terms listed under each intent makes sense to belong under that intent. What is important to keep in mind is that each term listed will make any utterance containing that token more likely to be classified as that intent. In our example above, the high influence index of the word "expect" should raise a red flag. If you put in the query "What should I expect to wear today?", it is likely to be classified as weather_forecast_start while clothing_recommendation_start is the correct one.

  • Another thing to look for in the data insights is stop words and punctuation that don't have anything to do with the subject of the intent. Examples of these terms are "is", "are", "the", and any sort of punctuation. When these appear, it can be useful to type utterances that could be ambiguous without the specified term into the query sidebar and check the intent probability to see how much of an impact that term is making.

Note: You need at least two outgoing transitions from each state for this tool to function, as it is not possible to draw inferences about intents when there are fewer than two outgoing intents.

### How to utilize query classifier?

Watch this video on Query classifier vs. SVP to learn how to utilize query classifier and slot-value pairing.

### Best Practice Tips

  1. How to frame a crowdsource job to collect data more efficiently
  • The job description should be concise yet detailed at the same time. You shouldn't assume that the crowd has any existing knowledge pertaining to your competency/task. One effective way to describe your job is to describe a scenario that a user may be in when they need to use this assistant and ask the crowd to provide the type of questions they would usually ask. For example, for balance, a good job description could read as follows:

    • Imagine you have a virtual assistant that has access to your bank information and can answer questions about your personal finance. If you are wondering about your bank account balance, what would you say to the assistant?
  • Provide multiple example utterances, for both classification and slots. The examples should be as diverse as possible, which leads to more diverse results collected from the crowd. For example, example 1: "what's my balance" and example 2: "how much money do I have" make a much better example list than example 1: "what's my balance" and example 2: "what's my balance last week?"

  • Resolve common trends in your answers. Large crowdsource jobs often have similar responses and do not have enough diversity to support your virtual assistant. Try adding diversity by running several smaller crowdsource jobs with nuanced prompts and examples.

  2. What makes data "good"
  • Good data can be thought of as a representation of how users interact with the AI. Quality and consistency are important for a good dataset. A high quality dataset requires that its utterances be correctly labelled and added to the proper use case. Consistency is also important and requires that the utterances and words/phrases in the dataset are labelled with the same ruleset and heuristics across the entire dataset.
  3. What amount is "enough"
  • Generally we recommend having 300-500 utterances for each classification intent label and 600-1000 utterances for each SVP slot for a human-in-the-room level of quality. However, depending on the use case, the number needed can vary. The amount of data required depends heavily on a range of factors, including the complexity of the competency, the diversity of the utterances for the competency, and the other competencies that co-exist in the same AI version (e.g., how close they are to the new competency). Therefore, there is no golden rule for how many utterances you need for a competency. Additionally, data collection and curation are an ongoing process, even well beyond the point of the competency prototype. These numbers are the minimum amounts of utterances you need for a "working" competency prototype of average complexity. Continuous testing and data curation are required to improve competency quality. It is through testing the competency yourself and on others that you can truly evaluate whether there is enough data.


Last updated: 02/10/2020