# Crowdsourcing Data

If you've experimented with some queries in the query sidebar, you may have discovered that your AI version does not always respond as you would expect.

User: How much money do I have in checking?

AI: Hello!

Remember that a new AI version is basically a new brain, eager to get out into the world and learn. So far, we've given the AI very little knowledge of the world by way of the training data we've provided. The more data the AI has to learn from, the better its predictive power becomes. Just as humans become more intelligent and wise with experience, artificial intelligence needs the "experience" of data to discover patterns and "truths" about its environment.

Instead of manually entering hundreds of utterances (and then labeling any potential slots) by hand, the Clinc Platform provides a method of data collection known as crowdsourcing. Using this feature, you post a "job" to Amazon's Mechanical Turk (MTurk) platform. Once posted, MTurk workers view the job and complete it for small amounts of money.

# Scoping Jobs

We won't run a scoping job in this tutorial, since we are already working with get_balance and account_transfer. Just remember that scoping jobs are the tool you need when you aren't sure what scope or capabilities a competency should have.

# Classification Jobs

Since we already have two competencies, we are in a great position to launch classification jobs. Doing so adds more diverse data to the classifier and saves you from having to enter hundreds of utterances by hand.

At a high level, any classification job is asking the MTurk worker:

If you were trying to ___ or were asking for ___, what kinds of things would you say?

In other words, we want to take an intention a user might have, such as querying for account balance, and populate a variety of ways that intention might be expressed.

To do so, hover over the CREATE tab in the nav and select Crowdsource.

# Job Name

Fill in Job Name with get_balance. The job's icon will indicate that it is a classification job. This name is for organizational purposes on the platform.

# Job Worker Title

Fill in Job Worker Title with something like Virtual Financial Assistant. This title is what MTurk workers will see when they browse for jobs to complete.

# Job Description

In Job Description, enter something like:

Imagine you had a virtual assistant that could help you determine the balance of your bank account(s). What kinds of things would you say to or ask the assistant? Be creative.

At a minimum, the description should cover:

  1. The scope of the competency
  2. A prompt that tells the user to provide utterances

Keep the description as general as possible so you don't bias the workers' inputs.

# Classification

Select the get_balance_start classification from the dropdown labeled Select Intent. This prepares the collected data in the JSON format required for importing into a classifier. The selection is for internal use only; the worker will have no idea they are providing data for a Clinc AI competency.

# Examples

Provide at least three example utterances that are diverse in structure and content. You as the conversational designer will know if slots exist for a competency, so vary the number and values of slots if applicable.

Here are three examples:

What is my balance?

I haven't checked on my savings account in a while, how much do I have?

Tell me my checking account balance

# Number of Utterances

This number determines how many responses are needed before the job is taken down. We will leave ours as the default 125. If you want the job to finish sooner and gather more data later, you can enter a lower value like 50.

# Reward per Worker

This is the amount an MTurk worker receives for providing an utterance. We will stick with the default of 12 cents. Too low a reward, and no one will take the job. Too high, and you risk workers trying to game the job by submitting as many random entries as possible, which might not be related to account balance queries.

# Launch Job

Click Launch Job and wait for your results to populate. You can view intermediate results at any time. Click the down arrow on the job listing and click View Results.

# Curating Results

Before we bring the data into our AI, we can weave in a layer of data curation. Scan the results for any bad data. For get_balance, we would want to uncheck anything that didn't have to do with asking for an account balance.

You can also filter out data that is repetitive with only minor parts changing. Take the following example:

What is my checking balance?

What is my savings balance?

What is my credit balance?

The only thing that changes is the word before balance. Patterns like these can cause the system to "overfit" to this type of utterance. In practical terms, the system might become more likely to identify anything before balance as the account type. That would be incorrect for the query

What is my balance?

Because my is clearly not an account type.
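The near-duplicate check described above can be automated. Below is a minimal sketch (assuming simple whitespace tokenization; the function name is my own, not part of the platform) that keeps at most one utterance per "only one word changed" template:

```python
from collections import defaultdict

def filter_near_duplicates(utterances, max_per_template=1):
    """Keep at most `max_per_template` utterances per one-word-masked template.

    Two utterances share a template if they differ in exactly one token,
    e.g. "What is my checking balance?" vs. "What is my savings balance?".
    """
    kept = []
    template_counts = defaultdict(int)
    for utt in utterances:
        tokens = utt.lower().split()
        # Build every template formed by masking a single token.
        templates = [
            " ".join(tokens[:i] + ["_"] + tokens[i + 1:])
            for i in range(len(tokens))
        ]
        if any(template_counts[t] >= max_per_template for t in templates):
            continue  # too similar to an utterance we already kept
        kept.append(utt)
        for t in templates:
            template_counts[t] += 1
    return kept

examples = [
    "What is my checking balance?",
    "What is my savings balance?",
    "What is my credit balance?",
    "How much do I have in savings?",
]
print(filter_near_duplicates(examples))
# → ['What is my checking balance?', 'How much do I have in savings?']
```

A filter like this is a starting point, not a substitute for reading the results: some repetitive utterances are still worth keeping if the slot values themselves are what you want variety in.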

# Assign to a Class

Click the dropdown on the right to select get_balance_start. This will classify each checked utterance as this intent.

# Export

We can either EXPORT TO CLASSIFIER or EXPORT TO FILE. The file export is a JSON object of the form:

```json
{
  "some utterance a user said about account balance": "get_balance_start",
  ...
}
```

Where get_balance_start comes from the selection in the Assign to a Class dropdown.
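If you export to a file, a short script can help you inspect the data before importing it anywhere. This sketch simply loads the `{utterance: intent}` map shown above and groups utterances by intent (the file path is hypothetical; use whatever name your download has):

```python
import json
from collections import defaultdict

def group_by_intent(export_path):
    """Load a classification export ({utterance: intent, ...}) and
    return a dict mapping each intent to its list of utterances."""
    with open(export_path) as f:
        data = json.load(f)
    by_intent = defaultdict(list)
    for utterance, intent in data.items():
        by_intent[intent].append(utterance)
    return dict(by_intent)

# Example (assumed filename):
# counts = {k: len(v) for k, v in group_by_intent("get_balance_export.json").items()}
```

Counting utterances per intent this way is a quick sanity check that each class is getting a comparable amount of data.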

For this tutorial, we will export directly to the get_balance_start classifier.

Once you export, you'll see some utterances greyed out, indicating that they have been exported to the classifier. Utterances that keep a white background after exporting were already present in the classifier.

# Train

That's it for crowdsourcing classification data. Now head back to the competency to train and export the utterances for SVP labeling (if applicable).

# Slot-Value Pairing Jobs

# Job Name and Worker Job Title

Similar to classification jobs, an SVP job has a Job Name for internal use and a Worker Job Title that the MTurk workers see. Enter get_balance and Asking for Account Balance for these fields, respectively.

# Job Description

For the description, enter something like:

Imagine all the ways someone might ask for an account balance. However they say it, they might mention which account in particular. Highlight the account type in the following phrases. If there is no account type, leave it unlabeled.

An example of a phrase without the account specified:

What is my balance?

Like the classification job, we want to provide enough information for the worker to provide good data and nothing more. Too much context or description can influence how people respond.

# Examples

To clarify expectations, we show some examples. As with classification examples, we want variety in structure and in the number of slots used. All examples must have at least one slot, but we can provide a "no slot" example in the description. Since get_balance has only source_account, each of our examples will include one labeled slot. Use the following or similar utterances as examples:

How much do I have in my savings account?

Tell me how much I have in checking.

Do you know how much I have in my credit card account?

Use your mouse to highlight the slot values and select source_account under Available Slots as the slot name.

# Utterance Source

Instead of providing utterances like in classification jobs, MTurk workers label slots for the utterances you provide. You can either use the results from an existing classification job or upload a file of utterances. For this example, we will use the results from the get_balance classification job. You will want to wait for that job to finish before using it.

# Number of Votes Per Slot

We will leave this as 1 because this is a straightforward slot scenario: it only takes one worker to label a slot for the labeling to be valid.

# Reward Per Worker

Similar to classification jobs, $0.12 is a sweet-spot value: enough to be worth the work, but not so much as to encourage sloppy or dishonest work.

# Launch Job

Click Launch Job and wait for your results to populate. You can view intermediate results at any time. Click the down arrow on the job listing and click View Results.

# Curating Results

Before we bring the data into our AI, we can weave in a layer of data curation. Scan the results for any bad data. For get_balance, we want to make sure all the labeled slots are valid.

# Export

We can either EXPORT TO COMPETENCY or EXPORT TO FILE. The file export is a JSON file of type:

```json
{
    "sentences": [
        {
            "classification": "unclassified",
            "sentence": "Has my checking account gained any interest this month?",
            "svpLabels": [
                {"label": "O", "word": "Has"},
                {"label": "O", "word": "my"},
                {"label": "B-source_account", "word": "checking"},
                {"label": "O", "word": "account"},
                {"label": "O", "word": "gained"},
                {"label": "O", "word": "any"},
                {"label": "O", "word": "interest"},
                {"label": "O", "word": "this"},
                {"label": "O", "word": "month"},
                {"label": "O", "word": "?"}
            ]
        },
        ...
    ]
}
```
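The svpLabels entries follow the common BIO tagging convention: a `B-<slot>` label begins a slot value and `O` marks tokens outside any slot; assuming the standard scheme, an `I-<slot>` label continues a multi-word value. Here is a small sketch (the function name is my own) that reconstructs slot values from labeled tokens like those above:

```python
def extract_slots(svp_labels):
    """Collect slot values from a BIO-labeled token list.

    Returns a dict mapping slot names to the list of values found,
    e.g. {"source_account": ["checking"]}.
    """
    slots = {}
    current_slot, current_words = None, []
    # Append an "O" sentinel so a slot at the end of the sentence is flushed.
    for item in svp_labels + [{"label": "O", "word": ""}]:
        label, word = item["label"], item["word"]
        if label.startswith("B-"):
            if current_slot:
                slots.setdefault(current_slot, []).append(" ".join(current_words))
            current_slot, current_words = label[2:], [word]
        elif label.startswith("I-") and current_slot:
            current_words.append(word)
        else:
            if current_slot:
                slots.setdefault(current_slot, []).append(" ".join(current_words))
            current_slot, current_words = None, []
    return slots

labels = [
    {"label": "O", "word": "Has"},
    {"label": "O", "word": "my"},
    {"label": "B-source_account", "word": "checking"},
    {"label": "O", "word": "account"},
]
print(extract_slots(labels))  # → {'source_account': ['checking']}
```

Running something like this over the export is a quick way to spot mislabeled slots before importing the data.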

For this tutorial, we will export directly to the get_balance competency. Click EXPORT TO COMPETENCY and select get_balance from the dropdown.

# Train

That's it for crowdsourcing SVP data. Imagine doing all of that labeling yourself...

# On Your Own

Try creating your own classification and SVP jobs for account_transfer.

# Classification Job for account_transfer

| Field | Value |
| --- | --- |
| Job Name | account_transfer |
| Worker Job Title | Virtual Financial Assistant |
| Job Description | Imagine you had a virtual assistant that could transfer money between your accounts. What kinds of things would you say to make those requests? Be creative. |
| Classification | account_transfer_start |
| Examples | Transfer 40 bucks from checking to savings<br>Can you transfer some money to my checking account?<br>Move money from savings to checking |

# SVP Job for account_transfer

| Field | Value |
| --- | --- |
| Job Name | account_transfer |
| Worker Job Title | Making an Account Transfer |
| Job Description | To transfer money between accounts, a user must provide the source_account (aka transfer from), destination_account (aka transfer to), and amount. Given the following phrases, highlight the word (or words) that represent the source_account, destination_account, and amount. |
| Classification | account_transfer_start |
| Examples | Transfer 40 bucks from checking to savings<br>Can you transfer some money to my checking account?<br>Move money from savings to checking |

# Conclusion

By leveraging crowdsourcing jobs, you can collect large amounts of robust data to train your AI to have the human-in-the-room fluency that only Clinc can provide.