InterSystems Ideas

Status Needs review
Created by Tirthankar Bachhar
Created on Nov 24, 2025

AI Log Analytics & Predictive Anomaly Detection for Interoperability

Problem

  • Operational issues (bad data, unusual traffic spikes, rising error rates) are discovered too late, after queues back up or downstream systems fail.

  • Current Event Log, Message Viewer, and Visual Trace are excellent for investigation, but they’re reactive and manual. Teams need proactive signals and clear root‑cause hints.

Proposal

  • Add a new “Log Analytics” section in the Management Portal that uses IRIS SQL + IntegratedML/Embedded Python to learn baselines from historical logs and predict anomalies in near real time.

  • It surfaces patterns of probable bad data, unusual traffic, adapter errors, latency changes, and queue growth for each Service, Process, and Operation, then suggests likely causes and next actions.

Data sources (IRIS/Interoperability specific)

  • Event Log (class Ens.Util.Log; errors/warnings/info), Ens.MessageHeader (message metadata), EnsLib.HL7.SearchTable (field indexes), EnsLib.FHIR.* resource validation results, adapter logs (EnsLib.* TCP/HTTP/File/SFTP/JDBC), system metrics via %SYS.Monitor and ^%SS, and production metrics (queue depth, retry counts, message sizes, latency).

  • Optional: Message Bank for centralized multi‑namespace aggregation.

  • Journal timestamps for high‑precision time alignment.

Data model (new Observability schema)

  • OBS.Event (ts, Production, Namespace, Host, Service, Process, Operation, AdapterType, Direction, Status, ErrorCode, ErrorText, RetryCount, LatencyMs, MsgSize, RemoteHost, Protocol, MessageType, HL7Event, FHIRResource, HTTPStatus, TLSFlags).

  • OBS.FieldQuality (per message type): RequiredMissingCnt, UnknownCodeCnt, OutOfRangeCnt, DuplicatesCnt, CoercionFailuresCnt, SegmentCountStats.

  • OBS.Aggregates (1min/5min/1hr windows): volume, error_rate, p95_latency, queue_depth, backoff_rate.

  • Views that join OBS.* back to Ens.* classes for deep-linking to Visual Trace/Message Viewer.
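As a rough illustration of how rows in OBS.Aggregates could be derived from OBS.Event, the sketch below buckets events into fixed windows and computes volume, error rate, and p95 latency per component. It is plain Python over dictionaries, not IRIS code; the key names (`ts`, `component`, `status`, `latency_ms`), the `"Error"` status value, and the nearest-rank percentile method are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
import math

EPOCH = datetime(1970, 1, 1)


def floor_to_window(ts, window):
    """Round a timestamp down to the start of its window."""
    seconds = int((ts - EPOCH).total_seconds())
    size = int(window.total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % size)


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def aggregate(events, window=timedelta(minutes=5)):
    """Group events by (window start, component) and summarise each group.

    Each event is a dict with the hypothetical keys
    ts, component, status, latency_ms.
    """
    buckets = defaultdict(list)
    for ev in events:
        key = (floor_to_window(ev["ts"], window), ev["component"])
        buckets[key].append(ev)

    rows = []
    for key in sorted(buckets):
        evs = buckets[key]
        errors = sum(1 for e in evs if e["status"] == "Error")
        rows.append({
            "window_start": key[0],
            "component": key[1],
            "volume": len(evs),
            "error_rate": errors / len(evs),
            "p95_latency": p95([e["latency_ms"] for e in evs]),
        })
    return rows
```

In the proposal itself this step would more likely run as SQL inside a scheduled task; the Python version only makes the windowing logic explicit.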

ML/Analytics approach

  • Baselines per interface: learn normal ranges by time‑of‑day/day‑of‑week and seasonality.

  • Unsupervised anomaly detection (Isolation Forest or robust Z‑score) for:

    • Volume spikes/drops

    • Error/timeout surges

    • Latency shifts

    • Queue depth acceleration

  • Predictive forecasts (next 30–60 minutes):

    • Expected inbound volume and error rate

    • Probability of SLA breach given current trends

  • Data quality scoring for HL7/FHIR/JSON:

    • Required fields missing (e.g., PID‑3 patient identifier, MSH‑12 version ID)

    • Unknown value sets/code systems

    • Structural deviations (segment/resource cardinality)

  • Implementation options:

    • IntegratedML models: CREATE MODEL, TRAIN MODEL, PREDICT via SQL over OBS.Aggregates.

    • Embedded Python (scikit‑learn/prophet) for advanced time‑series; exposed as IRIS methods or SQL table functions.

    • Feature store built with computed columns and scheduled tasks (Task Manager tasks, i.e., %SYS.Task.Definition subclasses) to maintain aggregates.
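The baseline and robust Z‑score ideas above can be sketched together as follows. This is a minimal illustration, assuming one baseline per (day‑of‑week, hour) slot and the modified z‑score based on the median and the median absolute deviation (MAD). The class name `SeasonalBaseline`, the 3.5 cutoff, and the minimum of three historical samples are assumptions for the example, not part of the proposal.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


class SeasonalBaseline:
    """Per (weekday, hour) baseline scored with a MAD-based robust z-score."""

    def __init__(self, threshold=3.5, min_history=3):
        self.threshold = threshold      # assumed cutoff for flagging
        self.min_history = min_history  # assumed minimum samples per slot
        self.history = defaultdict(list)

    @staticmethod
    def _slot(ts):
        return (ts.weekday(), ts.hour)

    def observe(self, ts, value):
        """Record a historical value (e.g. message volume) for its slot."""
        self.history[self._slot(ts)].append(value)

    def score(self, ts, value):
        """Return 0.6745 * (value - median) / MAD for the matching slot."""
        past = self.history.get(self._slot(ts), [])
        if len(past) < self.min_history:
            return 0.0  # not enough history to judge
        med = median(past)
        mad = median(abs(v - med) for v in past)
        if mad == 0:
            # All history identical: any deviation is infinitely unusual.
            return 0.0 if value == med else math.copysign(math.inf, value - med)
        return 0.6745 * (value - med) / mad

    def is_anomaly(self, ts, value):
        return abs(self.score(ts, value)) > self.threshold
```

Because the median and MAD are barely moved by a few extreme values, a past incident in the history does not inflate the baseline the way a mean and standard deviation would. An Isolation Forest, also named above, would be the multivariate alternative.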

Portal UX

  • New menu: Interoperability > Log Analytics.

  • Dashboards:

    • Traffic & Error Heatmap by Service/Operation

    • Anomaly Timeline with severity score and confidence

    • Queue Risk Forecast (when/where backlog will exceed threshold)

    • Data Quality Panel (top offending fields, segments, value sets)

  • Drill‑downs:

    • Click an anomaly to open Visual Trace filtered to the affected window and components.

    • “Explain” side panel: contributing features (e.g., remote host, message type), top correlated signals, and suggested actions.

  • Alerts:

    • Rules like “if AnomalyScore > 0.8 and queue_depth_slope > X, alert via Ens.Alert or email/SNMP/webhook.”

    • Maintenance: certificate expiry predictors (based on TLSFlags/errors), disk space risk (journal size/velocity), connection flaps.
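The alert rule quoted above (anomaly score combined with queue depth slope) could be represented roughly as below. This is a sketch only: `AlertRule`, its field names, and the dictionary it returns are hypothetical, and the delivery step (Ens.Alert, email, SNMP, webhook) is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """Fires when both the anomaly score and the queue slope exceed limits."""
    name: str
    min_anomaly_score: float      # e.g. 0.8 in the rule quoted above
    min_queue_depth_slope: float  # the "X" in the rule; messages per minute

    def evaluate(self, component: str, anomaly_score: float,
                 queue_depth_slope: float) -> Optional[dict]:
        """Return an alert payload if the rule fires, otherwise None."""
        if (anomaly_score > self.min_anomaly_score
                and queue_depth_slope > self.min_queue_depth_slope):
            return {
                "rule": self.name,
                "component": component,
                "anomaly_score": anomaly_score,
                "queue_depth_slope": queue_depth_slope,
            }
        return None
```

For example, `AlertRule("queue-backlog", 0.8, 50.0).evaluate("BS_FileIn", 0.92, 75.0)` returns a payload, while a score of 0.5 with the same slope returns `None`.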

Suggested actions (generated)

  • Throttle or burst buffer recommendations per adapter.

  • Routing rule hints (e.g., quarantine messages with unknown PV1 values).

  • Index suggestions for heavy SQL in Operations.

  • Configuration checks: TCP keepalive, HTTP timeouts, SSL/TLS ciphers, retry/backoff tuning.

Technical details

  • OBS.* classes stored in a dedicated schema; populated by:

    • A lightweight Business Service/Process that subscribes to the Event Log (Ens.Util.Log) and message metadata streams.

    • Scheduled Task that compacts and aggregates into OBS.Aggregates.

  • Models:

    • Train nightly; update baselines continuously using sliding windows.

    • Governance: versioned models stored in a class (OBS.ModelRegistry), with metrics (AUC, precision/recall for labeled events when available).

  • Performance:

    • Use shard-friendly tables (IDKEY on ts + component) and bitmap indexes for event type filters.

    • Efficient queries via window functions (AVG, STDDEV, PERCENTILE_DISC) and materialized views for hot panels.

Security and privacy

  • Read‑only views by default; no PHI content stored beyond necessary field statistics unless enabled.

  • Role‑aware: Ops role can view/acknowledge; Admin can change thresholds and retraining cadence.

  • Redaction options for ErrorText; configurable field sampling.

  • Works on‑prem or with customer‑selected LLM/ML endpoints; no production payloads sent externally unless approved.
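One possible shape for the ErrorText redaction option is a small list of pattern/placeholder pairs applied before the text is stored. The two patterns below (email addresses and digit runs of six or more characters, as a stand‑in for identifiers such as MRNs) are illustrative assumptions; a real deployment would need patterns chosen for its own data.

```python
import re

# Hypothetical redaction rules: (compiled pattern, placeholder).
REDACTION_RULES = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "<email>"),
    (re.compile(r"\b\d{6,}\b"), "<id>"),
]


def redact(text):
    """Replace every match of each rule with its placeholder."""
    for pattern, placeholder in REDACTION_RULES:
        text = pattern.sub(placeholder, text)
    return text
```

Short numbers such as error codes are left intact by these rules, so a string like `ERROR #5001 timeout` passes through unchanged.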

MVP acceptance criteria

  • OBS.Event and OBS.Aggregates populated from the Event Log (Ens.Util.Log) and Ens.MessageHeader across at least one production.

  • Anomaly Timeline showing predicted spikes/drops with confidence and deep links to Visual Trace.

  • Data Quality Panel for HL7 messages using EnsLib.HL7.SearchTable indices (missing required, unknown codes).

  • IntegratedML model that forecasts next‑hour error rate or volume per Service and exposes PREDICT() in SQL.

  • Alerting to Ens.Alert with severity and suggested next step.

  • Dashboard performance: <2s for last 24h view; <5s for 7‑day view.
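For the Data Quality Panel criterion, the "missing required" check could look something like the sketch below. It works on raw HL7 v2 text purely for illustration, whereas the proposal would read EnsLib.HL7.SearchTable indices instead. It assumes the field separator is `|` and segments are separated by carriage returns; the `REQUIRED` list and function names are hypothetical.

```python
# Hypothetical list of required fields as (segment, field number).
REQUIRED = [("MSH", 12), ("PID", 3)]


def get_field(message, segment, index):
    """Return field `index` of the first `segment`, or "" if absent.

    Assumes "|" as the field separator. In MSH the separator itself is
    MSH-1, so MSH-n sits one position earlier after splitting.
    """
    for line in message.replace("\n", "\r").split("\r"):
        fields = line.split("|")
        if fields[0] != segment:
            continue
        pos = index - 1 if segment == "MSH" else index
        return fields[pos] if pos < len(fields) else ""
    return ""


def missing_required(message):
    """List required fields that are empty or absent, e.g. ["PID-3"]."""
    return [f"{seg}-{idx}" for seg, idx in REQUIRED
            if not get_field(message, seg, idx)]
```

Counting the results of `missing_required` per message type and window would give a figure comparable to RequiredMissingCnt in OBS.FieldQuality.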

Example detections

  • “Inbound HL7 ADT volume +180% vs baseline; unknown PID‑3 code system rising; probable bad feed from Host X.”

  • “Operation OP_FHIR posting 429s increasing; forecast SLA breach in 25 minutes; suggest retry/backoff adjustments.”

  • “Queue depth for BS_FileIn accelerating; expected to exceed 5,000 in 40 minutes; recommend temporary throttle and quarantine rule.”
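A detection such as "expected to exceed 5,000 in 40 minutes" implies some forecast of queue depth. The simplest version, shown here as a sketch, fits a least‑squares line to recent samples and extrapolates to the threshold. The function name and the (minute, depth) sample format are assumptions; the proposal's IntegratedML or Embedded Python models would replace this linear fit.

```python
def minutes_until_threshold(samples, threshold):
    """Estimate minutes from the last sample until depth reaches threshold.

    samples: list of (minute, queue_depth) pairs in time order.
    Returns None when the fitted trend is flat or falling, or when the
    samples do not span any time.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_d = sum(d for _, d in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    cov = sum((t - mean_t) * (d - mean_d) for t, d in samples)
    slope = cov / var_t  # messages per minute
    if slope <= 0:
        return None
    intercept = mean_d - slope * mean_t
    current = intercept + slope * samples[-1][0]  # fitted depth right now
    if current >= threshold:
        return 0.0
    return (threshold - current) / slope
```

With samples `[(0, 1000), (10, 2000), (20, 3000)]` and a threshold of 5000, the fitted slope is 100 messages per minute and the estimate is 20 minutes.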

Success metrics

  • 50% faster time‑to‑detect issues compared to manual monitoring.

  • 30% reduction in after‑hours incidents caused by unnoticed data/traffic anomalies.

  • Measurable decrease in message reprocessing due to early quarantine of bad data.

Admin response (Nov 24, 2025)

  • Thank you for submitting the idea. The status has been changed to "Needs review". Stay tuned!