Vector Database: Web Gateway offload of inbound encoding and outbound generative inference

A scaled deployment of an IRIS web application tends to have multiple web servers.

These load-balance traffic well, giving good utilization on busy applications.

The web servers take over checking and serving static content, letting the database server focus on data query tasks.

Recently, vector similarity search combined with generative AI has gone mainstream.

This can be broken down into several steps:

1) Generate an encoding for an input question. For example, convert the string "Tell me how to increase the lock table size" into an encoding for vector search.

2) Using the new encoding, scan or use index retrieval on a vector column to find the content with the highest similarity.

3) Return one or more text items with the highest similarity.

4) [ optional ] Generate content based on a prompt and the retrieved text content.
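Steps 2 and 3 amount to ranking stored vectors by their similarity to the query encoding. The Python sketch below illustrates this with a brute-force scan, assuming cosine similarity as the metric and an in-memory list of (text, vector) rows standing in for the database's vector column. The function names and toy vectors are hypothetical; in the proposed design this ranking stays inside the database, where an index can avoid comparing every row.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[str, Sequence[float]]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Steps 2-3: scan every row of the 'vector column' and return the k most similar texts."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional encodings standing in for real model output (hypothetical data).
    rows = [
        ("Lock table sizing", [0.9, 0.1, 0.0]),
        ("Journal configuration", [0.1, 0.9, 0.1]),
        ("Lock escalation", [0.8, 0.2, 0.1]),
    ]
    query = [1.0, 0.0, 0.0]  # pretend encoding of the step 1 question
    for text, score in top_k(query, rows, k=2):
        print(f"{score:.3f}  {text}")
```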

The idea here is to delegate / offload steps 1 and 4 to the web servers, while retaining vector search (steps 2 and 3) within the main database.

Web servers would normally be deployed on a CPU-type cloud node.

The hypothesis is that the web server could instead be hosted on a TPU (inference) type node, where it hosts a model for:

Deriving encodings from input
[ optional ] Generative output enriching outbound content

This removes the need for:

Sharing a model with the client to generate encodings
An additional pre-query request-response cycle to generate the encodings then used for vector search

Inbound convention

A convention for client web requests to indicate which form fields require an encoding to be generated
Configuration to enable the functionality per CSP application
Metrics for utilization
A header passed to the database to indicate that generative AI is available for response processing
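To make the inbound convention concrete, here is a minimal sketch of gateway-side request preprocessing in Python. It assumes a hypothetical naming convention (form fields whose names end in "__encode"), a hypothetical header name "X-Gateway-Generative", and a toy hashed bag-of-words encoder standing in for a real embedding model. None of these are existing Web Gateway features; they only illustrate the shape of the idea.

```python
import hashlib
import json
from collections import Counter
from typing import Callable, Dict, List, Tuple

ENCODE_SUFFIX = "__encode"  # hypothetical convention: fields to encode end with this suffix
GENERATIVE_HEADER = "X-Gateway-Generative"  # hypothetical header name

metrics: Counter = Counter()  # simple utilization metrics


def toy_encoder(text: str, dim: int = 8) -> List[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


def preprocess_request(
    form: Dict[str, str],
    headers: Dict[str, str],
    encoder: Callable[[str], List[float]] = toy_encoder,
    generative_available: bool = True,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Swap marked form fields for JSON-encoded vectors before forwarding to the database."""
    out_form: Dict[str, str] = {}
    for name, value in form.items():
        if name.endswith(ENCODE_SUFFIX):
            # "question__encode" is forwarded as "question" holding the encoding.
            out_form[name[: -len(ENCODE_SUFFIX)]] = json.dumps(encoder(value))
            metrics["encodings"] += 1
        else:
            out_form[name] = value
    out_headers = dict(headers)
    if generative_available:
        # Tell the database that outbound generative processing is available.
        out_headers[GENERATIVE_HEADER] = "available"
    return out_form, out_headers
```

The database side then receives a ready-made vector in the "question" field and never needs to run the encoding model itself.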

Output template

For output processing there would need to be:

A "prompt placeholder" - removed after processing
One or more "content placeholders" - removed after processing
An "output placeholder" - replaced by generative content
Metrics for utilization
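The output template processing could work along these lines. This Python sketch assumes a hypothetical placeholder syntax (HTML comments of the form <!--prompt:...-->, <!--content:...--> and <!--output-->) and takes the generative model as a plain callback, so any inference backend could be plugged in. The generated text is HTML-escaped before insertion, since it is model output being written into a page.

```python
import html
import re
from typing import Callable, List

# Hypothetical placeholder syntax; the idea does not prescribe one.
PROMPT_RE = re.compile(r"<!--prompt:(.*?)-->", re.DOTALL)
CONTENT_RE = re.compile(r"<!--content:(.*?)-->", re.DOTALL)
OUTPUT_MARKER = "<!--output-->"


def transform_response(body: str, generate: Callable[[str, List[str]], str]) -> str:
    """Fill the output placeholder with generated text; strip prompt and content placeholders."""
    prompt_match = PROMPT_RE.search(body)
    if prompt_match is None or OUTPUT_MARKER not in body:
        return body  # no template markers: pass the page through unchanged
    prompt = prompt_match.group(1).strip()
    contents = [c.strip() for c in CONTENT_RE.findall(body)]
    # Placeholders are removed so they never reach the browser.
    body = CONTENT_RE.sub("", PROMPT_RE.sub("", body))
    generated = html.escape(generate(prompt, contents))
    return body.replace(OUTPUT_MARKER, generated, 1)
```

A fuller version would also need a policy for when generation fails or times out, for example falling back to rendering the retrieved content as-is.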

Output prompt templates could be focused on:

Summarizing information

Shortening the length of text
Converting lists to sentences

Redacting / anonymizing content where appropriate based on user context

Clarifying workflow

Web Browser -> Web Gateway Encode -> Database Search and output -> Web Gateway Generative Transform -> Web Browser Render

Encodings for pictures

For uploaded pictures, an optimization of the web gateway to database connection could be to discard the uploaded picture content after first generating the encoding vector and / or classifications. That is, only the lightweight encoding / classifications need to be passed on as input to the database.
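A sketch of this optimization in Python: the gateway derives an encoding (and optionally classifications) from the upload and forwards only a small JSON payload, so the image bytes never travel to the database. The byte-histogram encoder is a deliberately trivial stand-in for a real vision model, and the function and payload field names are hypothetical.

```python
import json
from typing import Callable, Dict, List, Optional


def toy_image_encoder(image_bytes: bytes, bins: int = 16) -> List[float]:
    """Stand-in for a real vision model: normalised histogram of byte values."""
    counts = [0] * bins
    for b in image_bytes:
        counts[b * bins // 256] += 1
    total = len(image_bytes) or 1
    return [c / total for c in counts]


def build_db_payload(
    image_bytes: bytes,
    encoder: Callable[[bytes], List[float]] = toy_image_encoder,
    classifier: Optional[Callable[[bytes], List[str]]] = None,
) -> str:
    """Return the lightweight JSON forwarded to the database; the image itself is not included."""
    payload: Dict[str, object] = {"encoding": encoder(image_bytes)}
    if classifier is not None:
        payload["classifications"] = classifier(image_bytes)
    return json.dumps(payload)
```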

For example:

Identifying a wound / burn / rash / skin inflammation from an uploaded image, then retrieving and summarizing the corresponding patient knowledge data.

Summary

Essentially, this is looking for a viable way to provide encoder and generative transform functionality out of the box, in a pluggable way, to enrich existing and new web applications, so that application developers don't need to reinvent the approach or obtain new training or experience to achieve greater impact.

ADMIN RESPONSE

Apr 23, 2024

Thank you for submitting the idea. The status has been changed to "Planned or In Progress".
This is not a commitment; plans are subject to change. Stay tuned!


Vadim Aniskin (Admin) | Nov 15, 2023

@Alex Woodhead, you have a comment on your idea. Please answer it to help your idea get promoted.


Thomas Dyar | Nov 9, 2023

IMO, this relies too much on our creaky, not-very-modern web components. If we really had a modern, robust web gateway framework on which to build such a scheme, it might be more attractive. We will be building some of this functionality into IRIS using Embedded Python at least, and maybe supporting a sharded instance will deliver some of the distributed processing benefits that this idea is intended to provide.
