# Data Query Language
# What
# What is Data Query Language?
Data Query Language (DQL) is a query language developed by Clinc for locating data. It supports queries based on strings, phrases, slots, slot values, and the number of labeled slots, which gives a clear picture of slot value distributions and slot data quality.
# What's the syntax of DQL?
The syntax of DQL and what it can do are illustrated below using the following annotated samples:
Index | Annotated Sample |
---|---|
0 | Book a flight from { SRC rome } to { DST paris } |
1 | I need to book a flight |
2 | Please find airfare out of { SRC london } to { DST dublin } |
3 | I need to fly to { DST turin } |
# String Queries
Query for strings by using double-quotes. We can select all samples containing the string "flight" by querying:
Query | Retrieved |
---|---|
"flight" | [0, 1] |
To find samples that have the phrase "book a flight", query:
Query | Retrieved |
---|---|
"Book a flight" | [0, 1] |
You can also write the search string as a regular expression:
Query: "r'(cheese|ham)burger'"
This selects all utterances containing either "cheeseburger" or "hamburger".
Note: The whole regex must be within double quotes and the expression body must be within single quotes. All regular expressions in DQL use POSIX standards.
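As a rough illustration of the matching behavior described above, the following Python sketch runs a plain string query and a regex query over the four samples. It is not DQL's implementation: the helper names (`plain_text`, `string_query`, `regex_query`) are hypothetical, case-insensitive matching is an assumption inferred from the "Book a flight" example, and Python's `re` module stands in for a POSIX regex engine.

```python
import re

# The four annotated samples from the table above.
SAMPLES = [
    "Book a flight from { SRC rome } to { DST paris }",
    "I need to book a flight",
    "Please find airfare out of { SRC london } to { DST dublin }",
    "I need to fly to { DST turin }",
]

# Matches one annotation: group 1 is the slot name, group 2 the slot value.
SLOT = re.compile(r"\{\s*(\S+)\s+(.*?)\s*\}")

def plain_text(sample: str) -> str:
    """Replace each slot annotation with its bare value."""
    return SLOT.sub(r"\2", sample)

def string_query(text: str) -> list[int]:
    """Indices of samples containing `text` (case-insensitive: an assumption)."""
    needle = text.lower()
    return [i for i, s in enumerate(SAMPLES) if needle in plain_text(s).lower()]

def regex_query(pattern: str) -> list[int]:
    """Indices of samples in which `pattern` matches somewhere."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, s in enumerate(SAMPLES) if rx.search(plain_text(s))]

print(string_query("flight"))            # [0, 1]
print(regex_query(r"(flight|airfare)"))  # [0, 1, 2]

# Alternation between whole words needs a group; square brackets
# define a character class that matches a single character.
print(bool(re.fullmatch(r"(cheese|ham)burger", "hamburger")))  # True
print(bool(re.fullmatch(r"[cheese|ham]burger", "hamburger")))  # False
```

The last two lines show why alternation between words is written with parentheses: a bracket expression would match only one character before "burger".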
# Boolean Operators
DQL currently supports the `and`, `or`, and `not` operators. All samples with "from" and "to" can be selected by querying:
Query | Retrieved |
---|---|
"from" and "to" | [0] |
Boolean expressions can be grouped within parentheses `()`. The query below shows an example:
Query | Retrieved |
---|---|
("flight" or "book") and not "from" | [1] |
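The two boolean queries above can be mimicked in Python by combining simple predicates. This is only a sketch of the semantics, not how DQL evaluates queries; `contains` and `select` are hypothetical helpers, and case-insensitive substring matching is an assumption.

```python
import re

# The four annotated samples from the table above.
SAMPLES = [
    "Book a flight from { SRC rome } to { DST paris }",
    "I need to book a flight",
    "Please find airfare out of { SRC london } to { DST dublin }",
    "I need to fly to { DST turin }",
]

def plain_text(sample: str) -> str:
    """Replace each slot annotation with its bare value."""
    return re.sub(r"\{\s*\S+\s+(.*?)\s*\}", r"\1", sample)

def contains(sample: str, text: str) -> bool:
    """True if the sample's plain text contains `text`, ignoring case."""
    return text.lower() in plain_text(sample).lower()

def select(predicate) -> list[int]:
    """Indices of the samples for which `predicate` holds."""
    return [i for i, s in enumerate(SAMPLES) if predicate(s)]

# "from" and "to"
print(select(lambda s: contains(s, "from") and contains(s, "to")))  # [0]

# ("flight" or "book") and not "from"
print(select(lambda s: (contains(s, "flight") or contains(s, "book"))
             and not contains(s, "from")))  # [1]
```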
# Slot Queries
The `{ }` characters can be used to specify slots. The syntax is: `{ slot_name slot_value_expression }`
The `slot_name` is the name of the slot for which we wish to search and may either be a string or a regex string (format `r'expression'`). The `slot_value_expression` can be a string representing a slot label, a regular expression (format `r"expression"`), or a boolean expression composed of strings joined with the `and`, `or`, or `not` operators.
Note: Spacing matters. There must be whitespace after the `{` and before the `}`.
All samples with the slot DST and value "dublin" can be selected by querying:
Query | Retrieved |
---|---|
{ DST "dublin" } | [2] |
Find all samples with SRC slot with value "london" or "rome" by querying:
Query | Retrieved |
---|---|
{ SRC "london" or "rome" } | [0, 2] |
Note: The expression "london" or "rome" is an example of a `slot_value_expression`. Also note that using a regex as the slot name requires single quotes `''`, while using a regex as the slot value requires double quotes `""`.
The wildcard symbol `*` can be used within slot queries. All samples with the slot SRC and any value can be selected by querying:
Query | Retrieved |
---|---|
{ SRC * } | [0, 2] |
All samples with any slot can be found by querying:
Query | Retrieved |
---|---|
{ * * } | [0, 2, 3] |
Boolean operators can be used to join together slot queries into an expression. To find samples that have DST but not SRC slots, query:
Query | Retrieved |
---|---|
{ DST * } and not { SRC * } | [3] |
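To make the slot query semantics above concrete, here is a minimal Python sketch that extracts `(name, value)` pairs from the annotated samples and filters on them. It is an illustration under assumptions, not DQL itself: `slots`, `has_slot`, and `select` are hypothetical names, `"*"` stands for the slot-name wildcard, and `values=None` stands for the slot-value wildcard.

```python
import re

# The four annotated samples from the table above.
SAMPLES = [
    "Book a flight from { SRC rome } to { DST paris }",
    "I need to book a flight",
    "Please find airfare out of { SRC london } to { DST dublin }",
    "I need to fly to { DST turin }",
]

# Matches one annotation: group 1 is the slot name, group 2 the slot value.
SLOT = re.compile(r"\{\s*(\S+)\s+(.*?)\s*\}")

def slots(sample: str) -> list[tuple[str, str]]:
    """All (slot_name, slot_value) pairs annotated in a sample."""
    return SLOT.findall(sample)

def has_slot(sample: str, name: str = "*", values=None) -> bool:
    """True if some slot matches `name` and has a value in `values`."""
    return any(
        (name == "*" or n == name) and (values is None or v in values)
        for n, v in slots(sample)
    )

def select(predicate) -> list[int]:
    """Indices of the samples for which `predicate` holds."""
    return [i for i, s in enumerate(SAMPLES) if predicate(s)]

print(select(lambda s: has_slot(s, "DST", {"dublin"})))          # [2]
print(select(lambda s: has_slot(s, "SRC", {"london", "rome"})))  # [0, 2]
print(select(lambda s: has_slot(s, "SRC")))                      # [0, 2]
print(select(lambda s: has_slot(s)))                             # [0, 2, 3]
print(select(lambda s: has_slot(s, "DST") and not has_slot(s, "SRC")))  # [3]
```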
Regular expressions can be used in slot_value_expressions. The query below shows how to use regex to find all slots that begin with "par" or "tur":
Query | Retrieved |
---|---|
{ * r"par | tur" } |
# String and Slot Queries
We can use slot queries within string queries. One use case is where we want to find all samples with the phrase "from rome to paris", and where "rome" is a SRC slot and "paris" is a DST slot. This can be done by querying:
Query | Retrieved |
---|---|
"from { SRC "rome" } to { DST "paris" }" | [0] |
Note: We still used the same double quote symbol `""` introduced above to specify the phrase, and we also used the same slot syntax `{ }` introduced above to specify the slots.
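One simple way to picture a phrase query with embedded slots is as a literal match against the annotated sample. The Python sketch below takes that shortcut; it is an assumption-laden illustration, not DQL's matching logic. The function `phrase_with_slots` is hypothetical, and it handles only literal slot names and values (no wildcards, regexes, or boolean expressions inside the slots).

```python
# The four annotated samples from the table above.
SAMPLES = [
    "Book a flight from { SRC rome } to { DST paris }",
    "I need to book a flight",
    "Please find airfare out of { SRC london } to { DST dublin }",
    "I need to fly to { DST turin }",
]

def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())

def phrase_with_slots(query_body: str) -> list[int]:
    """Indices of samples whose annotated text contains the phrase.

    `query_body` is the text between the outer double quotes of the
    query; the inner quotes around slot values are dropped so the
    phrase can be compared directly with the annotated sample.
    """
    needle = normalize(query_body.replace('"', ""))
    return [i for i, s in enumerate(SAMPLES) if needle in normalize(s)]

print(phrase_with_slots('from { SRC "rome" } to { DST "paris" }'))  # [0]
```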
# Number Queries
The `<`, `>`, and `=` operators can be used to query for samples containing specific numbers of strings or slots.
Find all samples with more than 1 "to" by querying:
Query | Retrieved |
---|---|
"to" > 1 | [3] |
Find all samples with fewer than 2 instances of "to" by querying:
Query | Retrieved |
---|---|
"to" < 2 | [0, 1, 2] |
Number queries can be mixed with booleans. (However, the comparison value cannot be grouped.) An example with a number query combined with a slot query is:
Query | Retrieved |
---|---|
"to" < 2 and { SRC "london" } | [2] |
Number queries can work with slots as well:
Query | Retrieved |
---|---|
0 < { SRC * } | [0, 2] |
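The number queries above amount to counting occurrences and comparing the count with a threshold. The Python sketch below reproduces the results from the tables under stated assumptions: `count_string` and `count_slot` are hypothetical helpers, strings are counted as case-insensitive substrings of the sample's plain text, and `"*"` matches any slot name. DQL's own counting rules may differ.

```python
import re

# The four annotated samples from the table above.
SAMPLES = [
    "Book a flight from { SRC rome } to { DST paris }",
    "I need to book a flight",
    "Please find airfare out of { SRC london } to { DST dublin }",
    "I need to fly to { DST turin }",
]

# Matches one annotation: group 1 is the slot name, group 2 the slot value.
SLOT = re.compile(r"\{\s*(\S+)\s+(.*?)\s*\}")

def count_string(sample: str, text: str) -> int:
    """Number of occurrences of `text` in the sample's plain text."""
    plain = SLOT.sub(r"\2", sample).lower()
    return plain.count(text.lower())

def count_slot(sample: str, name: str = "*") -> int:
    """Number of annotated slots whose name matches `name`."""
    return sum(1 for n, _ in SLOT.findall(sample) if name == "*" or n == name)

def select(predicate) -> list[int]:
    """Indices of the samples for which `predicate` holds."""
    return [i for i, s in enumerate(SAMPLES) if predicate(s)]

print(select(lambda s: count_string(s, "to") > 1))  # [3]
print(select(lambda s: count_string(s, "to") < 2))  # [0, 1, 2]
print(select(lambda s: 0 < count_slot(s, "SRC")))   # [0, 2]
```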
# How
# How to use the advanced slot data search tool?
To use the advanced slot data search feature in the platform, you need to enable it first:
- Go to the Settings page, scroll to the Institution-Level Beta Features, and find DQL.
- Click Enable. The changes are saved automatically.
The search bar is located in the upper-right corner of every slot data page. Click the checkbox to enable DQL.
When the syntax of a query is incorrect, the search bar shows a detailed error message. It also shows the number of matching utterances.
Note: Data will be filtered implicitly by the slot of the page you are on. If you want to filter all the slot data, make sure to be on the All Data page.
Last updated: 08/21/2020