Thoropass uses GenAI to automate some workflows, such as document ingestion and answering questionnaires. We currently use Google's Vertex AI and Vector Search as our machine learning platform and vector store, respectively.
GenAI Flow
Like many platforms, we follow these steps:
1. Ingest the data (e.g., PDFs, CSVs).
2. Separate the data into chunks. For documents, we typically chunk page by page. Chunking segments large amounts of data into smaller, more manageable pieces.
3. Process the chunks and convert the data into embeddings. An embedding is a numerical representation of a complex object, designed to be consumed by machine learning models.
4. Store the embeddings in a vector store (Vector Search).
5. When a question is asked about the stored data, convert the question to embeddings and send it to the vector store, which finds the data points most similar to the question.
6. Send the most similar data points to a machine learning platform (Vertex AI) to determine the answers. Depending on the prompt, we may return multiple answers with varying levels of confidence.
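The ingest-and-query flow above can be sketched in miniature. The `embed` function below is a toy stand-in (real embeddings come from a trained model, e.g. a Vertex AI text embedding endpoint), and a plain in-memory list stands in for Vector Search:

```python
import math

# Toy stand-in for a real embedding model. Real embeddings capture
# semantic meaning; this one just hashes characters into a fixed-size,
# unit-normalized vector so the rest of the flow is runnable.
def embed(text: str, dim: int = 8) -> list[float]:
    vec = [0.0] * dim
    for i, ch in enumerate(text.lower()):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product suffices because embed() returns unit vectors.
    return sum(x * y for x, y in zip(a, b))

# Steps 1-2: ingest and chunk (here, one "page" per chunk).
pages = [
    "Our vendor encrypts data at rest with AES-256.",
    "Employees complete security training annually.",
]

# Steps 3-4: embed each chunk and store it in the "vector store".
store = [(page, embed(page)) for page in pages]

# Step 5: embed the question and rank chunks by similarity.
question_vec = embed("How is data encrypted at rest?")
ranked = sorted(
    store,
    key=lambda item: cosine_similarity(question_vec, item[1]),
    reverse=True,
)
best_chunk = ranked[0][0]
# Step 6 would send best_chunk (and the question) to the model
# to generate an answer.
```

In production, the embedding model and the nearest-neighbor search are both managed services; the structure of the flow, however, is the same.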
Our Default Confidence Configuration
For some workflows, such as our due diligence questionnaire flow, we may return multiple answers with varying levels of confidence. The following is the configuration for this flow.
temperature - Controls the randomness used when generating an answer. The default value is 0.0, which corresponds to a "High" confidence answer.
{
  "confidence_levels": {
    "low": {
      "top_k": 20,
      "top_p": 1,
      "temperature": 0.5
    },
    "medium": {
      "top_k": 10,
      "top_p": 0.5,
      "temperature": 0.25
    },
    "high": {
      "top_k": 1,
      "top_p": 0.1,
      "temperature": 0
    }
  }
}
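As a sketch of how this configuration might be consumed, the helper below looks up the sampling parameters for a requested confidence level. The `generation_params` name is illustrative, not part of our API, and the config is inlined here rather than loaded from a file:

```python
import json

# The confidence configuration from above, inlined for the example.
CONFIG = json.loads("""
{
  "confidence_levels": {
    "low":    {"top_k": 20, "top_p": 1,   "temperature": 0.5},
    "medium": {"top_k": 10, "top_p": 0.5, "temperature": 0.25},
    "high":   {"top_k": 1,  "top_p": 0.1, "temperature": 0}
  }
}
""")

def generation_params(level: str) -> dict:
    """Return the sampling parameters for a confidence level."""
    return CONFIG["confidence_levels"][level]

params = generation_params("high")
# params == {"top_k": 1, "top_p": 0.1, "temperature": 0}
```

These parameters would then be passed to the model call: a "high" confidence answer samples greedily (temperature 0, top_k 1), while a "low" confidence answer allows much more variability.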