InterSystems Ideas
Status Needs review
Categories Generative AI
Created by Alex Woodhead
Created on Nov 8, 2023

Vector Database Web Gateway offload encoding inbound and generative inference outbound

A scaled deployment of an IRIS web application tends to have multiple web servers.

These load-balance traffic well, providing good utilization for busy applications.

Web servers take over the checking and serving of static content, letting the database server focus on data-query-centered tasks.

Recently, Vector Similarity Search with generative AI has gone mainstream.

This can be broken down into several steps of activity:

1) Generate an encoding for an input question. For example: convert the string "Tell me how to increase the lock table size" into an encoding for vector search.

2) Using the new encoding, scan the index and calculate from a Vector column the content with the highest similarity.

3) Return one or more pieces of text content with the highest similarity.

4) [ optional ] Generate content based on a prompt and the retrieved text content.
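The four steps above can be sketched end-to-end. The `embed` function below is a toy hashed bag-of-words stand-in for a real embedding model, and the in-memory `index` stands in for an IRIS vector column; both are illustrative assumptions, not the proposed implementation.

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model (step 1): hashed
    # bag-of-words into a small fixed-size vector.
    vec = [0.0] * 8
    for word in text.lower().split():
        word = word.strip(".,")
        vec[sum(ord(c) for c in word) % 8] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# The "Vector column": each document stored with its encoding.
documents = [
    "To increase the lock table size, adjust the locksiz setting.",
    "Web servers load-balance traffic across application nodes.",
]
index = [(doc, embed(doc)) for doc in documents]

def search(question, top_k=1):
    q = embed(question)                        # step 1: encode the question
    ranked = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]  # steps 2-3: best matches

best = search("Tell me how to increase the lock table size")
```

In the proposed split, `embed` would run on the web server (TPU node) and the similarity scan would stay in the database.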

So the idea here is to delegate / offload steps 1 and 4 to the web servers, while retaining vector search within the main database.

A web server would normally be deployed on a CPU-type cloud node.

So the hypothesis is for the web server to instead be hosted on a TPU (inference) type node, where it hosts a model for:

  • Deriving the encoding for input

  • [ optional ] Generative output, enriching outbound content.

This removes the need to:

  • Share a model with the client for generating encodings

  • Add an extra pre-query request-response cycle to generate the encodings used for vector search.

Inbound convention

  • A convention for client web requests to indicate which form fields require an encoding to be generated

  • Configuration to enable the functionality per CSP application

  • Metrics for utilization

  • A header shared with the database to indicate that generative AI is available for response processing.
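A minimal sketch of such an inbound convention, assuming an invented `x-vector:` field-name prefix and an invented `X-Gateway-Generative` header — neither is an existing IRIS or Web Gateway feature:

```python
# Hypothetical convention: form fields whose names carry an
# "x-vector:" prefix are replaced by their encoding before the
# request is forwarded to the database server.

def encode_field(text):
    # Placeholder for the gateway-hosted embedding model.
    return [float(len(w)) for w in text.split()][:4]

def preprocess_form(fields):
    out = {}
    for name, value in fields.items():
        if name.startswith("x-vector:"):
            out[name[len("x-vector:"):]] = encode_field(value)
        else:
            out[name] = value
    # Header telling the database that the gateway can also run
    # generative post-processing on the response.
    headers = {"X-Gateway-Generative": "available"}
    return out, headers

fields = {"x-vector:question": "increase lock table size", "user": "alex"}
processed, headers = preprocess_form(fields)
# processed["question"] is now a vector; "user" passes through unchanged.
```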

Output template

Output processing would need:

  • "prompt placeholder" - removed after processing

  • One or more "content placeholders" - removed after processing

  • "Output placeholder" - replaced by generative content

  • Metrics for utilization
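One way the placeholder handling could look, using an invented HTML-comment syntax for the prompt, content, and output placeholders (not an existing template language), with a toy stand-in for the generative model:

```python
import re

def generate(prompt, contents):
    # Stand-in for the gateway-hosted generative model.
    return "Summary of %d passage(s)." % len(contents)

def render(template):
    # Read the prompt placeholder and the content placeholders.
    prompt = re.search(r"<!--prompt:(.*?)-->", template, re.S).group(1)
    contents = re.findall(r"<!--content-->(.*?)<!--/content-->", template, re.S)
    output = generate(prompt, contents)
    # Prompt and content placeholders are removed after processing.
    template = re.sub(r"<!--prompt:.*?-->", "", template, flags=re.S)
    template = re.sub(r"<!--content-->.*?<!--/content-->", "", template, flags=re.S)
    # The output placeholder is replaced by the generative content.
    return template.replace("<!--output-->", output)

page = ("<!--prompt:Summarize the passages.-->"
        "<!--content-->Lock table sizing notes.<!--/content-->"
        "<div><!--output--></div>")
html = render(page)
```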

Output prompt templates could be focused on:

Summarizing information

  • Shorten length of text

  • Convert list to sentences

Redacting / anonymizing content where appropriate based on user context

Clarifying workflow

Web Browser -> Web Gateway Encode -> Database Search and output -> Web Gateway Generative Transform -> Web Browser Render

Encodings for pictures

For uploaded pictures, one web gateway to database connection optimization could be to discard the uploaded picture content after first generating the encoding vector and / or classifications, i.e. where only the lightweight encoding / classifications need to be passed on as input to the database.

For example:

Wound / burn / rash / skin inflammation identification from uploaded image and retrieval and summary of corresponding patient knowledge data.
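A sketch of this discard-after-encoding idea, with toy stand-ins for the image encoder and classifier (the labels and functions are illustrative, not a real model):

```python
def encode_image(data):
    # Toy "embedding": byte histogram over 4 coarse buckets.
    vec = [0] * 4
    for b in data:
        vec[b % 4] += 1
    return vec

def classify(vec):
    # Hypothetical labels for the medical-imaging example.
    return "rash" if vec[0] > vec[1] else "burn"

def gateway_upload(image_bytes):
    vec = encode_image(image_bytes)
    payload = {"vector": vec, "label": classify(vec)}
    del image_bytes  # raw image is discarded at the gateway
    return payload   # only the compact payload goes to the database

payload = gateway_upload(bytes([0, 1, 2, 3, 4, 5, 6, 7]))
```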


Essentially, this is looking for a viable way for encoder and generative-transform functionality to be provided out of the box, in a pluggable way, to enrich existing and new web applications, so that application developers don't need to reinvent the approach or acquire new training or experience to achieve greater impact.

    Nov 8, 2023

    Thank you for submitting the idea. The status has been changed to "Needs review".

    Stay tuned!

  • Admin
    Vadim Aniskin
    Nov 15, 2023

    @Alex Woodhead, you have a comment on your idea. Please answer it to help your idea to be promoted.

  • Thomas Dyar
    Nov 9, 2023

    I think this relies too much on our creaky not-very-modern web components, IMO. If we really had a modern, robust web gateway framework on which to build such a scheme, it might make it more attractive. We will be building some of this functionality into IRIS using Embedded Python at least, and maybe supporting a sharded instance will get some of the distributed processing benefits that this idea is supposed to deliver.
