A scaled deployment of an IRIS web application tends to have multiple web servers.
These work well for load-balancing traffic and provide good utilization for busy applications.
The web servers take over checking and serving static content, which lets the database server focus on data-query tasks.
Recently, vector similarity search combined with generative AI has gone mainstream.
This can be broken down into several steps:
1) Generate an encoding for an input question. For example: convert the string "Tell me how to increase the lock table size" into an encoding for vector search.
2) Using the new encoding, scan or use index retrieval to calculate the highest-similarity content from a Vector column.
3) Return the one or more text items with the highest similarity.
4) [ optional ] Generate content based on a prompt and the retrieved text content.
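Steps 1 to 3 can be illustrated with a small, self-contained sketch. It is not how IRIS implements vector search: the word-hashing `encode` function is a toy stand-in for a real embedding model, the in-memory `table` stands in for a table with a Vector column, and `DIM`, `encode`, `search` and the sample rows are all hypothetical names chosen for illustration.

```python
import hashlib
import math
import re

DIM = 64  # hypothetical vector size; real embedding models use hundreds of dimensions


def encode(text: str) -> list[float]:
    """Step 1 (toy stand-in for a real encoder model): hash each word into a
    fixed-size count vector, then L2-normalise it."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are normalised, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def search(query_vec: list[float], rows: list[tuple[str, list[float]]], k: int = 1) -> list[str]:
    """Steps 2 and 3: full scan of the 'vector column', returning the text of
    the k most similar rows."""
    scored = sorted(rows, key=lambda r: cosine(query_vec, r[1]), reverse=True)
    return [text for text, _ in scored[:k]]


# Hypothetical rows: (text, vector) pairs, as if stored with a Vector column.
docs = [
    "Steps to increase the lock table size",
    "Configure a web gateway server access profile",
    "Rebuild an index after a schema change",
]
table = [(d, encode(d)) for d in docs]

question = "Tell me how to increase the lock table size"
print(search(encode(question), table, k=1))
```

In the proposal, only `search` would stay in the database; `encode` (step 1) and any step 4 generation are the parts that would move to the web tier.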
So the idea here is to delegate / offload both steps 1 and 4 to the web servers, while retaining vector search within the main database.
A web server would normally be deployed on a CPU-type cloud node.
So the hypothesis is that the web server could alternatively be hosted on a TPU (inference) type node, where it hosts a model for:
Deriving encodings of input
[ optional ] Generating output that enriches outbound content
This mitigates the need for:
Sharing a model with the client for generating encodings
An additional pre-query request-response cycle to generate the encodings that are then used for vector search
For input processing, there would need to be:
A convention for client web requests to indicate which form fields require an encoding to be generated
Configuration to enable the functionality per CSP application
Metrics for utilization
A header passed to the database to indicate that generative AI is available for response processing
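The input-side requirements can be sketched as a gateway preprocessing function. Everything here is an assumption for illustration, not an existing Web Gateway feature: the `__encode` field-name suffix is a hypothetical convention, `X-GenAI-Available` is a hypothetical header name, `app_enabled` stands in for per-application configuration, and `stub_encode` replaces the model that would run on the inference node.

```python
import json
from collections import Counter

ENCODE_SUFFIX = "__encode"          # hypothetical field-naming convention
GENAI_HEADER = "X-GenAI-Available"  # hypothetical header name

metrics = Counter()  # simple utilization metrics


def stub_encode(text: str) -> list[float]:
    # Placeholder for the encoder model hosted on the inference node.
    return [float(len(text)), float(text.count(" ") + 1)]


def preprocess(form: dict[str, str], headers: dict[str, str], app_enabled: bool):
    """Rewrite a request before forwarding it to the database server.

    Fields whose names end with ENCODE_SUFFIX are replaced by a JSON-encoded
    vector under the base field name; other fields pass through unchanged."""
    if not app_enabled:  # per-application configuration switch
        return form, headers
    out: dict[str, str] = {}
    for name, value in form.items():
        if name.endswith(ENCODE_SUFFIX):
            base = name[: -len(ENCODE_SUFFIX)]
            out[base] = json.dumps(stub_encode(value))
            metrics["fields_encoded"] += 1
        else:
            out[name] = value
    new_headers = dict(headers)
    new_headers[GENAI_HEADER] = "1"  # tell the database generation is available
    metrics["requests"] += 1
    return out, new_headers


form = {"question__encode": "Tell me how to increase the lock table size", "page": "2"}
out, hdrs = preprocess(form, {}, app_enabled=True)
print(out)  # {'question': '[43.0, 9.0]', 'page': '2'}
```

Putting the marker in the field name keeps existing HTML forms working: a page opts in by renaming a field, and the database receives the base field name with a vector instead of raw text.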
For output processing, there would need to be:
A "prompt placeholder" - removed after processing
One or more "content placeholders" - removed after processing
An "output placeholder" - replaced by the generated content
Metrics for utilization
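The output-side placeholders could be processed by the gateway roughly as follows. The `<!--genai:...-->` comment syntax is a hypothetical marker format, and `stub_generate` merely concatenates its inputs where a real deployment would call the generative model on the inference node.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical placeholder markers embedded by the database in its HTML response.
PROMPT_RE = re.compile(r"<!--genai:prompt\s+(.*?)-->", re.S)
CONTENT_RE = re.compile(r"<!--genai:content\s+(.*?)-->", re.S)
OUTPUT_MARK = "<!--genai:output-->"

metrics = Counter()  # simple utilization metrics


def transform(html: str, generate: Callable[[str, list[str]], str]) -> str:
    """Gateway-side post-processing of a database response."""
    if OUTPUT_MARK not in html:
        return html  # nothing to generate; pass the response through untouched
    prompt_match = PROMPT_RE.search(html)
    prompt = prompt_match.group(1).strip() if prompt_match else ""
    contents = [c.strip() for c in CONTENT_RE.findall(html)]
    # Prompt and content placeholders are removed after processing.
    html = PROMPT_RE.sub("", html)
    html = CONTENT_RE.sub("", html)
    metrics["generations"] += 1
    # The output placeholder is replaced by the generated content.
    return html.replace(OUTPUT_MARK, generate(prompt, contents))


def stub_generate(prompt: str, contents: list[str]) -> str:
    # Stand-in for the generative model on the inference node.
    return f"[{prompt}] " + " ".join(contents)


page = (
    "<p><!--genai:prompt Summarize-->"
    "<!--genai:content Lock table is full.-->"
    "<!--genai:content Increase the lock table size.-->"
    "<!--genai:output--></p>"
)
print(transform(page, stub_generate))
# <p>[Summarize] Lock table is full. Increase the lock table size.</p>
```

HTML comments are used here only because browsers ignore them, so a response still renders sensibly if the gateway passes it through unprocessed.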
Output prompt templates could focus on:
Summarizing information
Shortening text
Converting lists to sentences
Redacting / anonymizing content where appropriate, based on user context
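A template catalogue for these purposes might look like the sketch below. The template keys, their wording, and the rule that only users with a `clinician` role see identifying details are all hypothetical; the point is only that the user context selects whether a redaction instruction is added to the prompt.

```python
# Hypothetical template catalogue; names and wording are illustrative only.
TEMPLATES = {
    "summarize": "Summarize the following information:\n{content}",
    "shorten": "Shorten the following text to at most {max_words} words:\n{content}",
    "list_to_sentences": "Rewrite the following list as complete sentences:\n{content}",
}
REDACT_INSTRUCTION = "Remove names, addresses and other identifiers from the answer."


def build_prompt(kind: str, contents: list[str], user_roles: set[str], **params: object) -> str:
    """Fill a template with retrieved content, adding a redaction instruction
    when the user's context does not allow identifying details."""
    body = "\n".join(f"- {c}" for c in contents)
    prompt = TEMPLATES[kind].format(content=body, **params)
    # Hypothetical rule: only users holding the "clinician" role see identifiers.
    if "clinician" not in user_roles:
        prompt += "\n" + REDACT_INSTRUCTION
    return prompt


print(build_prompt("shorten", ["Lock table is full.", "Increase the lock table size."],
                   {"clinician"}, max_words=20))
```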
Web Browser -> Web Gateway Encode -> Database Search and output -> Web Gateway Generative Transform -> Web Browser Render
For uploaded picture encoding, one optimization of the web gateway to database connection could be to discard the uploaded picture content after first generating the encoding vector and / or classifications, i.e. only the lightweight encoding / classifications need to be passed on as input to the database.
For example:
Identifying a wound / burn / rash / skin inflammation from an uploaded image, then retrieving and summarizing the corresponding patient knowledge data.
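The picture-discarding optimization can be sketched as follows. `stub_image_model` is a hypothetical stand-in for an image encoder / classifier (it returns a 4-bucket byte histogram and a dummy label), and the `_vector` / `_labels` field suffixes are illustrative assumptions; the relevant behavior is that the raw image bytes never reach the forwarded payload.

```python
import json
from dataclasses import dataclass


@dataclass
class Upload:
    field: str
    filename: str
    data: bytes


def stub_image_model(data: bytes) -> tuple[list[float], list[str]]:
    # Stand-in for an image encoder/classifier on the inference node:
    # a 4-bucket byte histogram as the "vector" and a dummy label list.
    hist = [0, 0, 0, 0]
    for b in data:
        hist[b // 64] += 1
    total = len(data) or 1
    return [h / total for h in hist], ["unclassified"]


def forward_payload(uploads: list[Upload], form: dict[str, str]) -> str:
    """Build the request body sent on to the database server."""
    payload: dict[str, object] = dict(form)
    for up in uploads:
        vector, labels = stub_image_model(up.data)
        payload[up.field + "_vector"] = vector
        payload[up.field + "_labels"] = labels
        # The raw image bytes are deliberately not copied into the payload.
    return json.dumps(payload)


image = Upload("photo", "rash.jpg", bytes(range(256)) * 400)  # 102,400 bytes
body = forward_payload([image], {"patient_ref": "demo"})
print(len(image.data), len(body))
```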
Essentially, I am looking for a viable way for encoder and generative transform functionality to be provided out of the box, in a pluggable way, to enrich existing and new web applications, so that application developers don't need to reinvent the approach or acquire new training or experience to achieve greater impact.
Thank you for submitting the idea. The status has been changed to "Planned or In Progress".
This is not a commitment; plans are subject to change. Stay tuned!
@Alex Woodhead, you have a comment on your idea. Please answer it to help promote your idea.
I think this relies too much on our creaky, not-very-modern web components. If we had a modern, robust web gateway framework on which to build such a scheme, it might be more attractive. We will be building some of this functionality into IRIS, at least using Embedded Python, and supporting a sharded instance may deliver some of the distributed-processing benefits that this idea is meant to provide.