InterSystems Ideas
Status Needs review
Created by Tirthankar Bachhar
Created on Nov 24, 2025

AI Digital Twin & Safe Replay Studio: One‑click production clone, anonymized traffic replay, and “what‑if” simulation

Problem

  • Changes to productions (new DTLs, rules, adapters, timeouts, upgrades) are risky because test environments rarely mirror real traffic, partner quirks, or peak volumes.

  • Reproducing an incident or validating a partner change requires manual data gathering, custom scripts, and coordination, delaying delivery and increasing outage risk.

Proposal

  • Add a “Create Digital Twin” wizard in the Management Portal that clones an existing production into an isolated sandbox namespace and replays real traffic (anonymized) to validate changes before go‑live.

  • The studio uses AI to auto‑generate targeted test cases, predict risk and capacity impact, and highlight behavioral diffs between “current” and “candidate” configurations.

Customer value

  • Faster onboarding and safer releases; fewer production incidents; confident partner testing; clear audit trail for compliance.

  • Cuts manual UAT from days or weeks to hours, with measurable risk scoring and automated rollback guidance.

Key capabilities

  • One‑click clone

    • Copies Ens.Config.Production and Ens.Config.Item into a sandbox namespace.

    • Automatically stubs outbound Operations (switches to simulated endpoints or “Null” ops) to avoid touching partners.

    • Imports related classes/DTLs/BPL/RuleSets and compiles.

  • Data capture and replay

    • Sources: Message Bank, Ens.MessageHeader + body payloads, journals, and Visual Trace links.

    • Choose time window, message types, and rate: 1x, accelerated, or peak profile.

    • Maintains causality/correlation (SessionId, MessageBodyId, Sync responses).

  • PHI‑safe anonymization

    • Built‑in de‑identification transforms for HL7 (segment/field maps), FHIR (bulk de‑identification via FHIRPath), and JSON/XML (XPath/JSONPath rules).

    • Configurable tokenization so referential integrity across messages is preserved.

  • AI test authoring and risk scoring

    • Analyzes historical errors/timeouts and auto‑creates edge‑case messages.

    • Suggests mutation tests for fields with high defect history or schema drift.

    • Produces a Risk Score per change (DTL edit, rule tweak, timeout change) with top contributing factors.

  • Diff and impact analysis

    • Compares old vs new behavior at multiple levels:

      • Routing decisions (RuleSet outcomes)

      • DTL field‑level differences (added/removed/mutated values)

      • Operation call patterns, HTTP status mixes, latency distributions

    • Generates a human‑readable “Change Impact Report” and a machine‑readable JSON for CI/CD gates.

  • Capacity and SLA simulation

    • Predicts queue growth, CPU/IO hot spots, and p95 latency under projected volumes.

    • Recommends adapter settings (pool size, retry/backoff), index suggestions, and throttling strategy.

  • Release support

    • Exports a self‑contained zpm module with test corpus and replay scripts for CI.

    • Optional blue/green and canary assist: route X% of live traffic through the candidate flow and compare outcomes in real time (read‑only shadow mode).
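
One way the "route X% of live traffic" canary assist could choose its slice is by hashing a stable key, so that every message of a session lands on the same side of the split. The following is a minimal Python sketch under that assumption; the function name in_canary, the salt value, and the use of SessionId as the key are illustrative and not part of any existing IRIS API.

```python
import hashlib


def in_canary(session_id: str, percent: float, salt: str = "dtn-canary") -> bool:
    """Return True if this session belongs to the canary (shadow) slice.

    Hypothetical helper: the same session_id always gets the same answer,
    so a whole session is mirrored or skipped consistently.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{session_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in 0..9999 (0.01% steps).
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


if __name__ == "__main__":
    sessions = [f"session-{i}" for i in range(10_000)]
    mirrored = sum(in_canary(s, 5.0) for s in sessions)
    print(f"{mirrored} of {len(sessions)} sessions mirrored to the candidate flow")
```

Changing the salt reshuffles which sessions are selected, which would let successive canary runs sample different traffic without changing the percentage.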

Technical design (IRIS‑specific)

  • Namespace topology

    • DTN_ sandbox namespace with roles limited to read/write locally; no external credentials.

    • Production definition copied via Ens.Config.Production/Item APIs; operations swapped to DTN.MockOperation subclasses.

  • Replay engine

    • DTN.Replay.Service reads Message Bank/Ens.MessageBody records by timestamp range and drives messages through the cloned Production, preserving sequence and sync semantics.

    • Uses EnsLib.* adapters in loopback or embedded mocks; generates canned responses from recorded traces when needed.

  • Anonymization

    • DTN.Deidentify.DTLs for HL7/FHIR/JSON with pluggable lookup/token services (HS.* libraries where available).

    • Policy stored in DTN.Security.Policy; enforced on import and before replay.

  • Diff engine

    • DTN.Diff.Router compares RuleLog; DTN.Diff.DTL walks source/target objects and produces field‑level deltas with severity tags.

    • Results persisted in DTN.Results.* tables; exposed via SQL views for dashboards.

  • AI integration

    • Embedded Python for test generation and risk scoring (scikit‑learn/xgboost) or IntegratedML models over DTN.Results.Aggregates.

    • LLM-powered explanations that cite exact rules/DTL nodes and link to source.
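
To make the field‑level diff idea concrete, here is a rough Python sketch of what a DTN.Diff.DTL style comparison might compute once the "current" and "candidate" outputs are available as nested dictionaries. The flattening scheme, the function names, and the severity mapping are all assumptions made for illustration; the proposal does not define them.

```python
import json
from typing import Any

# Assumed severity tags per change type; a real policy would be configurable.
SEVERITY = {"removed": "high", "mutated": "medium", "added": "low"}


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {path: leaf value}.

    Empty dicts and lists contribute no paths in this simple sketch.
    """
    if isinstance(obj, dict):
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in obj.items()]
    elif isinstance(obj, list):
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(obj)]
    else:
        return {prefix: obj}
    flat: dict[str, Any] = {}
    for path, value in items:
        flat.update(flatten(value, path))
    return flat


def field_diff(current: Any, candidate: Any) -> list[dict[str, Any]]:
    """Return added/removed/mutated deltas between two outputs, sorted by path."""
    old, new = flatten(current), flatten(candidate)
    deltas = []
    for path in sorted(old.keys() | new.keys()):
        if path not in new:
            delta = {"path": path, "change": "removed", "old": old[path]}
        elif path not in old:
            delta = {"path": path, "change": "added", "new": new[path]}
        elif old[path] != new[path]:
            delta = {"path": path, "change": "mutated", "old": old[path], "new": new[path]}
        else:
            continue
        delta["severity"] = SEVERITY[delta["change"]]
        deltas.append(delta)
    return deltas


if __name__ == "__main__":
    current = {"PID": {"5": "DOE^JANE", "8": "F"}, "OBX": [{"5": "7.2"}]}
    candidate = {"PID": {"5": "DOE^JANE", "7": "19800101"}, "OBX": [{"5": "7.20"}]}
    # Machine-readable form, e.g. for a CI/CD gate.
    print(json.dumps(field_diff(current, candidate), indent=2))
```

A CI gate could then fail the build when any delta carries a severity above an agreed threshold, while the same list feeds the human‑readable report.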

Portal UX

  • New action: Interoperability > Productions > “Create Digital Twin”

  • Wizard steps: Select production and time window -> choose anonymization policy -> select changes to test (new DTL, rule file, parameters) -> run replay -> view Impact Report.

  • Dashboards: Diff Summary, Risk Score, Capacity Forecast, and “Open in Visual Trace” deep links for any anomaly.

Security and governance

  • Never reuses production credentials; outbound endpoints are mocked by default.

  • All PHI handling is policy‑driven with audit logs of every sample and transform.

  • Role‑based access; approvals required to export artifacts or enable shadow traffic.
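
As an illustration of policy‑driven PHI handling with tokenization that keeps referential integrity, the sketch below uses a keyed HMAC so the same input value always maps to the same token within one key. Everything here is an assumption for discussion: the policy format (field name to action), the "TKN_" prefix, the default of redacting unlisted fields, and the audit entry shape are hypothetical and not an existing DTN.Security.Policy format.

```python
import hashlib
import hmac


def tokenize(value: str, field: str, key: bytes, length: int = 16) -> str:
    """Deterministic keyed token: same (field, value, key) gives the same token."""
    mac = hmac.new(key, f"{field}|{value}".encode("utf-8"), hashlib.sha256)
    return "TKN_" + mac.hexdigest()[:length].upper()


def deidentify(record: dict, policy: dict, key: bytes, audit: list) -> dict:
    """Apply a field -> action policy ("keep", "tokenize", "redact") to one record.

    Fields missing from the policy are redacted (default deny). The audit log
    records which transform ran on which field, never the raw value.
    """
    out = {}
    for field, value in record.items():
        action = policy.get(field, "redact")
        if action == "keep":
            out[field] = value
        elif action == "tokenize":
            out[field] = tokenize(str(value), field, key)
        elif action == "redact":
            out[field] = "REDACTED"
        else:
            raise ValueError(f"unknown policy action {action!r} for field {field!r}")
        audit.append({"field": field, "action": action})
    return out


if __name__ == "__main__":
    key = b"sandbox-only-secret"  # assumption: a per-sandbox key, never a production secret
    policy = {"mrn": "tokenize", "name": "redact", "msg_type": "keep"}
    audit: list = []
    adt = deidentify({"mrn": "12345", "name": "Jane Doe", "msg_type": "ADT^A01"}, policy, key, audit)
    oru = deidentify({"mrn": "12345", "name": "Jane Doe", "msg_type": "ORU^R01"}, policy, key, audit)
    print(adt["mrn"] == oru["mrn"])  # the two messages still correlate on the tokenized MRN
```

Keyed hashing rather than a plain hash is the design choice that matters here: without the key, tokens for low‑entropy identifiers could be reversed by brute force. Rotating the key per sandbox would also keep tokens from being joined across twins.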

MVP acceptance criteria

  • Clone a production into a sandbox namespace; swap outbound Operations to mocks.

  • Import messages from Message Bank for a selected window; apply de‑identification.

  • Execute replay and show:

    • Routing decision diffs

    • Field‑level DTL diffs for at least one message type

    • Latency and queue forecasts under 1x and 3x volume

  • Generate an Impact Report and export a zpm bundle with test data and replay job.

  • Performance: replay 10k messages in <10 minutes on a standard dev box.
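
Two of the criteria above reduce to simple arithmetic, sketched here in Python. The backlog forecast is a deliberately crude fluid model (arrivals minus a fixed processing capacity per interval), chosen only to show what a "1x versus 3x" comparison could look like; the function name, the per‑minute granularity, and the sample numbers are assumptions, not measurements.

```python
def forecast_backlog(arrivals_per_min, capacity_per_min, scale=1.0):
    """Return the queue backlog at the end of each minute.

    Fluid approximation: backlog grows by (scaled arrivals - capacity)
    and can never drop below zero.
    """
    backlog = 0.0
    series = []
    for arrivals in arrivals_per_min:
        backlog = max(0.0, backlog + arrivals * scale - capacity_per_min)
        series.append(backlog)
    return series


def required_rate(messages: int, minutes: float) -> float:
    """Sustained messages per second needed to finish a replay in time."""
    return messages / (minutes * 60)


if __name__ == "__main__":
    recorded = [200, 400, 900, 1200, 600, 300]  # illustrative messages per minute
    for scale in (1.0, 3.0):
        series = forecast_backlog(recorded, capacity_per_min=1000, scale=scale)
        print(f"{scale:.0f}x volume: peak backlog {max(series):.0f} messages")
    print(f"10k in 10 min needs about {required_rate(10_000, 10):.1f} msg/s")
```

With these sample numbers the backlog peaks at 200 messages at 1x and 5300 at 3x, and the performance target works out to roughly 16.7 messages per second sustained. A real forecast would need per‑component service times rather than one fixed capacity.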

Example scenarios

  • Validate a new HL7→FHIR DTL before a payer go‑live; catch unexpected PID/OBX mapping changes and propose fixes.

  • Reproduce a weekend outage using last week’s traffic and confirm that a timeout + retry policy prevents recurrence.

  • Capacity plan for a new site onboarding by simulating 2x traffic with current hardware.
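
Scenarios like these depend on replaying recorded traffic at its original pace or faster while keeping message order. A minimal Python sketch of the pacing step follows, assuming each captured message carries a timestamp in seconds and an identifier; replay_schedule and its tuple layout are hypothetical names used only for illustration.

```python
def replay_schedule(recorded, speed=1.0):
    """Compute when to send each message, as seconds from replay start.

    recorded: list of (timestamp_seconds, message_id) tuples.
    speed:    1.0 replays at the original pace; 2.0 halves every gap,
              which doubles the arrival rate.
    Messages are ordered by timestamp; Python's sort is stable, so messages
    sharing a timestamp keep their captured order.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not recorded:
        return []
    ordered = sorted(recorded, key=lambda item: item[0])
    start = ordered[0][0]
    return [((ts - start) / speed, msg_id) for ts, msg_id in ordered]


if __name__ == "__main__":
    captured = [(100.0, "adt-1"), (102.0, "oru-1"), (102.0, "oru-2"), (106.0, "adt-2")]
    for offset, msg_id in replay_schedule(captured, speed=2.0):
        print(f"t+{offset:.1f}s send {msg_id}")
```

A replay driver would sleep until each offset before sending. Synchronous request/response pairs need extra handling beyond this sketch, since a response cannot be scheduled before its request completes.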

Success metrics

  • 70% reduction in UAT cycle time for interface changes.

  • 50% fewer post‑release defects tied to mapping/routing.

  • Documented audit packages produced automatically for every release.

  • ADMIN RESPONSE
    Nov 24, 2025

    Thank you for submitting the idea. The status has been changed to "Needs review".

    Stay tuned!