# AI FAQ
# What is AI FAQ?
AI FAQ is a feature that allows the Clinc Platform to fulfill queries from externally sourced data. Users will be able to import files or add URLs to create documents that can be referenced during the query.
# How to use AI FAQ
In order to use this feature, the institution needs the `AI FAQ` entitlement. The `AI FAQ` feature must also be enabled under the institution settings page.

The `AI FAQ Collection Management` page contains a listing of all existing collections for an institution. New collections are also created here.

To create a collection, click on the `Create Collection` button. Refer to Setting Advanced Query Options for a deeper understanding of `Threshold` and `Top K`.

Click on a collection to view a listing of all the documents in that collection. Users can create new documents by uploading files or adding URLs. Documents can be filtered by tags using the tag sidebar and the same sort/filter options that exist elsewhere on the platform. URL-type documents have a sync button that will index the newest version of that webpage's data.
Click the `+ Upload Files` button to add files to a collection. Accepted file types are `.pdf`, `.docx`, `.txt`, and `.json`. `.json` files must be in the following format: `[ { "source": "gcp", "filename": "ai_platform_data_labeling_docs_faq", "content": "what quality control methods can i use to ensure the labeling quality?", "answer": "you can request multiple human labelers to annotate each piece of your\n data" } ]`

Multiple files can be uploaded at once as long as each file is under 5MB and its filename contains no spaces. A description can be specified for each uploaded file, and any provided tags will be applied to all files. Refer to advanced options to learn more about each of the options in the `Advance Upload Options` dropdown.
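The upload rules above can be checked locally before a file is sent to the platform. The following is a minimal sketch, not part of the platform itself: the function names are hypothetical, "5MB" is read as 5 MiB, and all four keys of the sample entry (`source`, `filename`, `content`, `answer`) are assumed to be required strings.

```python
import json
import os

ACCEPTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".json"}
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5MB" is read as 5 MiB
REQUIRED_KEYS = ("source", "filename", "content", "answer")  # keys of the sample entry


def check_json_entries(entries):
    """Return a list of problems found in parsed .json upload content."""
    if not isinstance(entries, list):
        return ["top-level value must be a list of entries"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not an object")
            continue
        for key in REQUIRED_KEYS:
            # Assumption: every key must be present and hold a string.
            if not isinstance(entry.get(key), str):
                problems.append(f"entry {i}: '{key}' is missing or not a string")
    return problems


def check_upload_file(path):
    """Return a list of problems that would likely block an upload."""
    problems = []
    name = os.path.basename(path)
    ext = os.path.splitext(name)[1].lower()
    if ext not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if " " in name:
        problems.append("filename contains spaces")
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is not under 5MB")
    if ext == ".json":
        try:
            with open(path, encoding="utf-8") as handle:
                problems.extend(check_json_entries(json.load(handle)))
        except json.JSONDecodeError as error:
            problems.append(f"invalid JSON: {error}")
    return problems
```

An empty list from `check_upload_file` means none of the documented restrictions were violated; the platform may apply further checks that this sketch does not know about.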
Click the `+ Add URLs` button to add web content to a collection as documents. Similar to files, multiple websites can be added at a time via manual entry, webscraping, or import. Any tags added or advanced settings configured will be applied to all websites. URLs have an extra advanced option, `Remove HTML Tags`, defined in the advanced options section.

URLs can be entered manually by clicking the `+ Add Url` button and specifying the URL and an optional description.

URLs can also be imported into the platform to aid in data entry. Importing URLs requires a CSV file in a specific format. There is no limit to the number of URLs that can be entered via CSV import. To import a CSV file, click on the `Import` button seen in the screenshot. A dialog will appear where the user should provide a valid CSV file containing the URLs to be added. Below is an example of a valid CSV file that can be used to input many URLs at once. It starts with the header row `url,description`, followed by one URL per row:

- `https://www.company.com`
- `https://www.company.com/human-resources.html`
- `https://www.company.com/employee-handbook.html`
- `https://www.company.com/faq`
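For long URL lists, the import file can be generated rather than typed by hand. The sketch below is illustrative only: `build_url_csv` is a hypothetical helper, and it assumes that a row without a description may contain just the URL, as in the example file.

```python
import csv
import io


def build_url_csv(rows):
    """Render (url, description) pairs as CSV text with a url,description header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "description"])
    for url, description in rows:
        # Assumption: rows without a description hold only the URL,
        # matching the example file in this section.
        writer.writerow([url, description] if description else [url])
    return buffer.getvalue()


if __name__ == "__main__":
    print(build_url_csv([
        ("https://www.company.com", ""),
        ("https://www.company.com/faq", "Company FAQ, public"),
    ]))
```

Using the `csv` module instead of string concatenation means that descriptions containing commas or quotes are escaped correctly.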
Webscraping is another tool Clinc offers to automate the process of adding URLs. Clinc can scrape static sites for URLs to be used as web content, so a user can extract URLs from a webpage and use them to curate web content. To get started with this feature, click on the `Scrape` button seen in the screenshot. A dialog will appear where the user should enter the URL they intend to scrape.

HTML class attributes specify one or more class names for an element and are mostly used to point to a class in a style sheet; here they can be used to narrow scrape results. An optional class selector field in the URL entry dialog narrows down the results of a scrape. Class attribute names can be viewed by inspecting the page source of the site the user wishes to scrape. The results will contain all URLs nested inside the elements selected by the class attribute. See the screenshot below for an example of how to use the CSS class selector option in the Webscrape URL dialog. In the example, the user has entered `siteMap` for the CSS class selector. This will select all HTML elements that match `class=siteMap`.

After uploading files or adding URLs, a reindexing process for the collection will start and will be visible via the notification icon. New documents will be queryable upon completion of reindexing.
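To make the effect of the class selector in the webscraping dialog more concrete, the sketch below collects only the links nested inside elements that carry a given class. It is not Clinc's scraper: `ClassLinkExtractor` and `extract_links` are hypothetical names, it uses only Python's standard `html.parser`, and it assumes well-formed HTML in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassLinkExtractor(HTMLParser):
    """Collect href values of <a> tags inside elements with a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.links = []
        self._depth = 0  # > 0 while inside an element carrying class_name

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attr_map = dict(attrs)
        if self._depth > 0:
            self._depth += 1
        elif self.class_name in (attr_map.get("class") or "").split():
            self._depth = 1
        if tag == "a" and self._depth > 0 and attr_map.get("href"):
            self.links.append(attr_map["href"])

    def handle_endtag(self, tag):
        # Limitation: unclosed tags such as a bare <li> would skew the depth.
        if tag not in VOID_TAGS and self._depth > 0:
            self._depth -= 1


def extract_links(html, class_name):
    """Return the links found inside elements whose class list has class_name."""
    parser = ClassLinkExtractor(class_name)
    parser.feed(html)
    parser.close()
    return parser.links
```

With `class_name="siteMap"`, links outside the selected elements are ignored, which is the narrowing behavior the class selector field is meant to provide.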
Make a request to the `/v1/ai-faq/query` endpoint from Business Logic to start querying a collection. The endpoint accepts a number of parameters; refer to the AI FAQ query endpoint section to learn more.

Clicking on an existing document will display more information about the file, such as its tags and the advanced options configured for it.
# How to Configure Advanced Options
The platform offers several preprocessors that allow users to customize the way documents are processed before they are stored in the datastore. These parameters are crucial for optimizing the retrieval and generation processes based on the user's specific requirements.
- Remove Substrings

Because a document may contain sensitive information, a user can specify any substrings they would like to be excluded from processing.

- Remove HTML Tags

This option allows pruning of website data by removing all content between the given tags. For example, entering `header` will remove all content between `<header>` tags.

- Split By and Split Length

Large documents should be broken down into smaller chunks for more efficient processing. Users can specify whether they want to split a document by words, sentences, or passages. The split length value describes the maximum number of units (words, sentences, or passages) allowed in one document after splitting. Depending on the split by option, the split length value will update automatically to the default.
| Split By | Default Split Length |
| --- | --- |
| sentence | 2 |
| word | 200 |
| passage | 1 |

Example: `split_by: "sentence"`, `split_length: 2`

Explanation: This configuration will split the document into chunks of two consecutive sentences each. For example:

- Chunk 1: "You are processing a legal document with lengthy sentences and want to analyze each sentence individually for clarity and content. This configuration will split the document into chunks of two consecutive sentences each."
- ...

Example: `split_by: "word"`, `split_length: 2`

Explanation: This configuration will split the document into chunks of two consecutive words each. For example:

- Chunk 1: "technical specifications"
- Chunk 2: "for a"
- Chunk 3: "product You"
- ...

Example: `split_by: "passage"`, `split_length: 2`

Explanation: This configuration will split the document into passages, each containing two consecutive paragraphs.

- Lower

Convert text to all lowercase.
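The split by and split length options can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the platform's preprocessor: `split_text` is a hypothetical function, sentences are detected with a naive punctuation rule, and passages are taken to be paragraphs separated by blank lines.

```python
import re


def split_text(text, split_by="sentence", split_length=2):
    """Split text into chunks of at most split_length units (no overlap)."""
    if split_length < 1:
        raise ValueError("split_length must be at least 1")
    if split_by == "word":
        units, joiner = text.split(), " "
    elif split_by == "sentence":
        # Naive rule: a sentence ends at ., ! or ? followed by whitespace.
        units, joiner = re.split(r"(?<=[.!?])\s+", text.strip()), " "
    elif split_by == "passage":
        # Assumption: passages are paragraphs separated by blank lines.
        units, joiner = re.split(r"\n\s*\n", text.strip()), "\n\n"
    else:
        raise ValueError(f"unsupported split_by: {split_by}")
    units = [unit for unit in units if unit]
    return [
        joiner.join(units[start:start + split_length])
        for start in range(0, len(units), split_length)
    ]
```

The last chunk may hold fewer than `split_length` units when the document does not divide evenly, which is why split length is described as a maximum.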
- Split By Overlap
The split by overlap value only applies to `word` splitting. It describes the number of words shared by two adjacent chunks after a split, which preserves continuity in the processed data. The overlap value must be less than the split length value.

For example, consider the following configuration: split by `word`, a split length of `5`, and a split overlap of `2`. In this scenario, the document will be split into chunks of five consecutive words each, with an overlap of two words between adjacent chunks. Consider the following sentence from the document:

"The experiment results indicated a significant improvement in the performance of the new algorithm compared to the previous one."

Splitting the sentence into chunks of five words each with an overlap of two words gives:

- Chunk 1: "The experiment results indicated a"
- Chunk 2: "indicated a significant improvement in"
- Chunk 3: "improvement in the performance of"
- Chunk 4: "performance of the new algorithm"
- Chunk 5: "new algorithm compared to the"
- Chunk 6: "to the previous one."
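Word splitting with overlap amounts to a sliding window whose step is the split length minus the overlap. The sketch below shows that idea; `split_words_with_overlap` is a hypothetical name, and stopping as soon as a window reaches the end of the text is an assumption about how the final chunk is handled.

```python
def split_words_with_overlap(text, split_length=5, split_overlap=2):
    """Split text into word chunks that share split_overlap words with the previous chunk."""
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be non-negative and less than split_length")
    words = text.split()
    step = split_length - split_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Assumption: stop once a window has reached the end of the text,
        # so no extra chunk made only of overlapping words is emitted.
        if start + split_length >= len(words):
            break
    return chunks
```

With a split length of 5 and an overlap of 2, each window starts three words after the previous one, so the last two words of one chunk reappear as the first two words of the next.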
# Creating a Personality and Editing Response Templates
In order to show the AI FAQ response when querying the collection, users need to modify the response template for the device they are querying.
The list of current AI FAQ personalities is shown in the `Personalities` tab of the collection page. A default personality is created when a user creates a new collection, and users can also create new personalities. Attaching devices to personalities works the same way as for version personalities.
![auto faq personality](../media/auto_faq_personality.png)
Each personality will have one response template and users can edit the contents of the response template by clicking `View Templates`.
The AI FAQ response variable is exposed as `ai_faq_response`. The `ai_faq_response` variable has the following format:
```
{
  "results": {
    "documents": [
      {
        "id": "a3931d51058f15235eb8b39fc387eb3f",
        "content": "This APR will vary with the market based on the Prime\nRate.a\nCash Advance APR\n29.49%. This APR will vary with the market based on the Prime Rate.b\nPenalty APR and When\nIt Applies\nUp to 29.99%. ",
        "content_type": "text",
        "meta": {
          "embedded": true,
          "page": 1,
          "uuid": "e7a6cc8c-706a-4711-9b4c-4260734ce34c",
          "tags": [],
          "source": "card_agreement.pdf"
        },
        "id_hash_keys": [
          "content",
          "meta.uuid"
        ],
        "score": 0.9984670281410217
      }
    ],
    "root_node": "Query",
    "params": {
      "threshold": 0.0,
      "top_k": 5,
      "filters": {
        "tags": [
          "quality"
        ]
      },
      "index": "1d8a75fd-aa9f-46f6-b957-ff286fe5b24f_test"
    },
    "query": "quality control",
    "node_id": "Thresholder"
  }
}
```
Below is an example response template that iterates over the documents and renders the content of each document.
![auto faq template](../media/auto_faq_template.png)
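The same iteration can be expressed in plain Python, which may help when prototyping a template's logic outside the platform. This is only a sketch: it assumes `ai_faq_response` is available as a dictionary with a top-level `results` key, and `render_ai_faq_response` is a hypothetical helper, not a platform function or the template language itself.

```python
def render_ai_faq_response(ai_faq_response, separator="\n\n"):
    """Join the content of every returned document into one reply string."""
    documents = ai_faq_response.get("results", {}).get("documents", [])
    contents = [doc.get("content", "").strip() for doc in documents]
    # Skip documents whose content is empty after trimming whitespace.
    return separator.join(text for text in contents if text)


if __name__ == "__main__":
    example = {
        "results": {
            "documents": [
                {"content": "Cash Advance APR\n29.49%. ", "score": 0.99},
                {"content": "Penalty APR: up to 29.99%.", "score": 0.97},
            ]
        }
    }
    print(render_ai_faq_response(example))
```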
# How to Query a Collection
In order to query the collection, the user needs to associate the collection with an application.
1. Create a new [environment](environments_applications.html#how-to-create-an-environment)
2. Create an application and attach the collection to it.
3. There are multiple ways to query a collection once it is attached to an application. One way is to use the Query Sidebar and choose the environment/application. Another way is to attach the application to an integration and query the integration. A user can also query the collection directly using the query endpoint similar to querying an environment/application.
# How to Query from BLS
In order to query the collection from a business logic server (BLS), send a request to the `/v1/ai-faq/query` endpoint. The request requires a `collection_name`, `query`, `institution_id`, and `pipeline_name`.

Other parameters can be specified to restrict the results of the query, such as the tags or the filenames of the documents a user wishes to query.
Example payload:
```
{
  "query": "what is the apr rate?",
  "collection_name": "FAQ",
  "pipeline_name": "search_pipeline",
  "institution_id": "123",
  "params": {
    "top_k": 3,
    "threshold": 0.99,
    "generate": true,
    "filters": {
      "tags": ["APR"],
      "name": ["filename.txt"]
    }
  }
}
```
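A business logic server written in Python could assemble this request as sketched below. Several details are assumptions rather than documented behavior: the base URL is a placeholder, the HTTP method is taken to be `POST` with a JSON body, and any authentication headers the platform requires are omitted. `build_query_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_query_request(base_url, query, collection_name, institution_id,
                        pipeline_name, params=None):
    """Build (but do not send) a request for the /v1/ai-faq/query endpoint."""
    payload = {
        "query": query,
        "collection_name": collection_name,
        "pipeline_name": pipeline_name,
        "institution_id": institution_id,
    }
    if params:
        payload["params"] = params  # e.g. top_k, threshold, generate, filters
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/ai-faq/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint expects POST with a JSON body
    )


if __name__ == "__main__":
    request = build_query_request(
        "https://clinc.example.com",  # placeholder host, not a real deployment
        query="what is the apr rate?",
        collection_name="FAQ",
        institution_id="123",
        pipeline_name="search_pipeline",
        params={"top_k": 3, "threshold": 0.99, "filters": {"tags": ["APR"]}},
    )
    print(request.full_url)
    # Sending would look like: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access.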
# Setting Advanced Query Options
When querying a collection, the user can specify parameters to restrict the results of the query. These options are set on the collection level and can be set when creating a collection or by editing an existing collection.
- `Top K` determines the maximum number of documents the service will return.
- `Threshold` determines the level of confidence required to return a document. A higher value is stricter and returns fewer results, whereas a lower value is more lenient and yields more results.
- `Generate Response` uses generative AI to build a summarized response from all the returned documents. This is useful when the user wants to provide an easy-to-read response.
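The interaction between Top K and Threshold can be pictured as a filter followed by a cut-off. The sketch below is a simplified model, not the service's implementation: it assumes each document carries a `score` field as in the `ai_faq_response` format, that the threshold comparison is inclusive, and that results are ranked by descending score. `apply_query_options` is a hypothetical name.

```python
def apply_query_options(documents, top_k=5, threshold=0.0):
    """Keep documents scoring at least threshold, then return the top_k best."""
    confident = [doc for doc in documents if doc["score"] >= threshold]
    ranked = sorted(confident, key=lambda doc: doc["score"], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    docs = [
        {"id": "a", "score": 0.998},
        {"id": "b", "score": 0.42},
        {"id": "c", "score": 0.91},
    ]
    # A strict threshold leaves fewer candidates than top_k allows.
    print([doc["id"] for doc in apply_query_options(docs, top_k=3, threshold=0.9)])
```

In this model a high threshold can return fewer than `top_k` documents, while a threshold of `0.0` leaves `top_k` as the only limit.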
Last Updated: 7/29/24