On-Prem Deployment Guide

This guide describes how to deploy the On-Premise version of [Ai]levate Revenue Recovery, where customers take responsibility for hosting and operating both the Database Storage Layer (Elastic datastore) and the AI Compute Layer (Tenstorrent AI Warehouse). [Ai]levate continues to manage the Cloud Services Layer, providing orchestration, workflow management, authentication, and application delivery. The Relay Service Layer remains customer-hosted and is used to securely connect the customer’s EHR datastore to the platform.

Introduction

The On-Premise model is designed for organizations with strict data residency, sovereignty, or infrastructure control requirements, giving them full ownership of compute and storage, while still benefiting from [Ai]levate’s managed SaaS orchestration layer.


General Approach

In the On-Premise model, customers deploy and manage their own Elastic datastore and AI Warehouse hardware, which serve as the backbone of the platform’s storage and compute layers. These components must be provisioned, secured, patched, and scaled by the customer, while [Ai]levate connects to them through secure service connections. Unlike the SaaS model, the AI Warehouse and datastore require inbound connectivity, meaning customers must configure firewall rules, certificates, and access policies to allow [Ai]levate services to interact with these systems.

The Relay Service Layer remains outbound-only, providing secure connectivity between the EHR system and [Ai]levate’s Cloud Services. This hybrid model enables customers to retain full control of infrastructure while relying on [Ai]levate for orchestration, governance, and compliance enforcement at the cloud layer.


Deployment Overview

The On-Prem deployment of [Ai]levate Revenue Recovery is a joint operation between the customer and [Ai]levate. The customer provisions and secures all infrastructure components — Elastic datastore, AI Warehouse, and Relay VM — ensuring that each is reachable and properly configured. Once prerequisites are in place, [Ai]levate engineers onboard the Relay, validate inbound connections to Elastic and the AI Warehouse, and configure the Cloud Services Layer to orchestrate workflows across the customer environment. The process is deliberately staged to minimize disruption, with clear validation checkpoints at each step before moving to production.

sequenceDiagram
    participant C as Customer SysAdmin
    participant E as "[Ai]levate Engineer"
    participant R as Relay VM
    participant SQL as EHR Datastore
    participant ES as Elastic Datastore (On-Prem)
    participant AI as AI Warehouse (vLLM API)
    participant CS as "[Ai]levate Cloud Services"

    %% Preparation
    Note over C: Preparation Phase
    C->>C: Provision Elastic, AI Warehouse, Relay VM
    C->>SQL: Validate SQL connectivity (TCP 1433)
    C->>ES: Enable TLS & access policies
    C->>AI: Expose vLLM API securely

    %% Deployment
    Note over E,R: Deployment Phase
    E->>R: Onboard Relay VM (Azure Arc, tunnel, proxy)
    R->>SQL: Test LAN connectivity to EHR
    R->>CS: Establish outbound tunnel (443)

    %% Integration
    Note over E,CS: Integration Phase
    E->>ES: Validate inbound TLS connectivity from Cloud
    E->>AI: Validate inbound TLS connectivity from Cloud
    CS->>ES: Register Elastic datastore
    CS->>AI: Register AI Warehouse endpoint

    %% Validation
    Note over E,C: Validation Phase
    CS->>SQL: Retrieve test claims
    CS->>AI: Execute sample AI task
    CS->>ES: Store metadata & logs
    E->>C: Provide runbook, confirm handover

Step-by-Step Deployment

Preparation (👤 Customer)

• Provision Elastic datastore, AI Warehouse hardware, and Relay VM.
• Validate SQL connectivity from Relay (TCP 1433).
• Configure TLS certificates and DNS for Elastic and AI Warehouse.

Deployment (🤝 Joint)

• [Ai]levate engineer onboards Relay (Azure Arc agent, tunnel, proxy).
• Relay establishes outbound connectivity to Cloud Services.

Integration (🤝 Joint)

• Validate inbound TLS connections from [Ai]levate Cloud to Elastic and AI Warehouse.
• Register endpoints in Cloud Services tenant.

Validation (🤝 Joint)

• Run test claim ingestion from EHR.
• Execute sample AI task on AI Warehouse.
• Store metadata in Elastic datastore.
• Provide handover documentation and runbook.


Technical Prerequisites

The following technical prerequisites must be prepared by the customer before deployment. These ensure the [Ai]levate platform can securely interact with customer-managed infrastructure.

Elastic Datastore Requirements

Cloud Hosting: The [Ai]levate product team recommends hosting the required Elastic cluster in public cloud infrastructure owned by the customer, as this offers greater sizing flexibility and a simpler networking configuration.
| Area | Requirement | Notes |
|------|-------------|-------|
| Deployment | Customer-provisioned Elastic cluster. A service connection string must be provided to the [Ai]levate engineer (or registered partner). | Minimum 3 nodes recommended for HA |
| Sizing | Based on claim volume and query concurrency | Plan for growth and redundancy |
| Encryption | AES-256 encryption at rest | Customer responsibility |
| Access | Service account with READ/WRITE permissions. A service connection must be shared with [Ai]levate | Used by [Ai]levate Cloud Services |
| Backup/DR | Backup, restore, and retention policies | Customer responsibility |
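
Before sharing the service connection string, the cluster can be smoke-tested from outside the customer network. A minimal sketch, assuming a hypothetical FQDN (elastic.customer.example.com) and service account (ailevate_svc); substitute the values for your environment.

```bash
#!/usr/bin/env bash
# Smoke test for the customer-provisioned Elastic cluster before sharing
# the service connection string with [Ai]levate. Hostname, port, and the
# service account are placeholders for your environment.
set -euo pipefail

ES_HOST="elastic.customer.example.com"   # hypothetical Elastic FQDN
ES_PORT=9200
ES_USER="ailevate_svc"                   # hypothetical READ/WRITE service account
ES_PASS="${ES_PASS:?export ES_PASS before running}"

# Endpoint must answer over TLS 1.2+ with a valid certificate chain.
curl --fail --tlsv1.2 -u "${ES_USER}:${ES_PASS}" "https://${ES_HOST}:${ES_PORT}/"

# Cluster health should report "green" on a 3-node HA cluster.
curl --fail --tlsv1.2 -u "${ES_USER}:${ES_PASS}" \
     "https://${ES_HOST}:${ES_PORT}/_cluster/health?pretty"
```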

AI Warehouse Requirements

AI Warehouse Deployment Guide: A more detailed deployment guide for the Tenstorrent hardware is available [HERE].


| Area | Requirement | Notes |
|------|-------------|-------|
| Hardware | Customer-procured Tenstorrent hardware. A service connection string must be provided to the [Ai]levate engineer (or registered partner). | Must be sized for workload concurrency |
| Software | [Ai]levate-provided AI model images | Runs in vLLM-compatible runtime |
| Connectivity | Inbound TLS access to vLLM API | Exposed to [Ai]levate Cloud Services |
| Security | Certificates + TLS enforced | Customer responsibility |
| Patch Management | OS, runtime, and firmware updates | Customer responsibility |
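
Before the endpoint is registered with Cloud Services, the vLLM API can be probed directly. The sketch below assumes a hypothetical FQDN and an API key configured on the runtime; it uses vLLM's OpenAI-compatible routes (/v1/models, /v1/completions), and the model ID is a placeholder.

```bash
#!/usr/bin/env bash
# Verify the AI Warehouse vLLM endpoint before registering it with Cloud Services.
# Host, port, and API key are placeholders for your environment.
set -euo pipefail

VLLM_HOST="ai-warehouse.customer.example.com"  # hypothetical FQDN
VLLM_PORT=8080                                 # port from the network flow matrix
API_KEY="${VLLM_API_KEY:?export VLLM_API_KEY before running}"

# List the models loaded in the runtime.
curl --fail --tlsv1.2 -H "Authorization: Bearer ${API_KEY}" \
     "https://${VLLM_HOST}:${VLLM_PORT}/v1/models"

# Issue a minimal completion to confirm inference works end to end.
curl --fail --tlsv1.2 -H "Authorization: Bearer ${API_KEY}" \
     -H "Content-Type: application/json" \
     -d '{"model": "REPLACE_WITH_MODEL_ID", "prompt": "ping", "max_tokens": 4}' \
     "https://${VLLM_HOST}:${VLLM_PORT}/v1/completions"
```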

Relay Service Requirements

Relay Service Deployment Guide: A more detailed deployment guide for the [Ai]levate Relay Service is available [HERE].


| Area | Requirement | Notes |
|------|-------------|-------|
| Operating System | Ubuntu Server LTS (recommended) | Other modern Linux distros acceptable |
| Sizing | 2 vCPU, 4–8 GB RAM, 20 GB disk | Lightweight workload |
| Placement | Same LAN/subnet as SQL datastore | Must reach EHR SQL over TCP 1433 |
| Access | Local user with sudo privileges | Needed for [Ai]levate engineers |
| Connectivity | Outbound 443 to *.ailevate.com and Azure Arc services | No inbound rules required |
| Time Sync | NTP enabled and accurate | Required for TLS operations |
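
A quick preflight on the Relay VM can confirm the outbound and time-sync requirements before the [Ai]levate engineer begins onboarding. The [Ai]levate hostname below is a placeholder under *.ailevate.com; management.azure.com is used as a representative Azure Arc endpoint.

```bash
#!/usr/bin/env bash
# Relay VM preflight: outbound-only prerequisites before [Ai]levate onboarding.
set -euo pipefail

# Outbound 443 to [Ai]levate (hypothetical endpoint under *.ailevate.com)
# and to the Azure Arc control plane.
for host in relay.ailevate.com management.azure.com; do
    nc -vz -w 5 "${host}" 443
done

# NTP must be enabled and synchronized (required for TLS operations).
timedatectl | grep -E 'NTP|synchronized'
```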

Network & DNS Requirements

| Component | Requirement | Notes |
|-----------|-------------|-------|
| Elastic Cluster | Allow inbound TLS connections from [Ai]levate Cloud Services | Secure firewall rules required |
| AI Warehouse | Expose vLLM API endpoint to [Ai]levate | TLS enforced |
| Relay VM | Outbound 443 to [Ai]levate and Azure Arc endpoints | Outbound-only design |
| Internal LAN | Relay must reach SQL datastore on TCP 1433 | Validate with `nc -vz` |
| DNS Resolution | Public FQDNs + internal hostnames resolvable | Forward + reverse lookups required |
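
The checks below exercise the last rows of the table and would run from the Relay VM; the hostnames are placeholders for your internal and public DNS entries.

```bash
#!/usr/bin/env bash
# Validate LAN and DNS prerequisites from the Relay VM. Hostnames are placeholders.
set -euo pipefail

SQL_HOST="ehr-sql.customer.local"   # hypothetical internal hostname of the EHR SQL datastore

# Relay -> EHR SQL over TCP 1433 (the check referenced in the table above).
nc -vz -w 5 "${SQL_HOST}" 1433

# Forward lookup of an internal hostname and a public [Ai]levate FQDN.
dig +short "${SQL_HOST}"
dig +short relay.ailevate.com        # hypothetical public endpoint under *.ailevate.com

# Reverse lookup of the SQL host's address (forward + reverse are both required).
SQL_IP="$(dig +short "${SQL_HOST}" | head -n 1)"
dig +short -x "${SQL_IP}"
```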

Elastic Cluster Sizing

Capacity planning for On-Premise deployments must consider both storage and compute growth. Elastic clusters should be sized based on historical claim volumes, with headroom for growth and redundancy. Customers are responsible for scaling and upgrading both layers, while [Ai]levate ensures Cloud Services can orchestrate workloads efficiently across them.

As part of its operations, [Ai]levate Revenue Recovery stores Denied Claims and AI analysis. This section provides both minimum and recommended hardware requirements depending on the volume of claims that [Ai]levate will have to analyze.

Due to the amount of data it processes, [Ai]levate is a CPU-intensive application. To avoid bottlenecks introduced by storage (disk or SAN) or compute, [Ai]levate provides both a minimal and a recommended configuration.

  • The minimal configuration generally covers the needs of most infrastructures.
  • The recommended configuration offers a better experience for large or highly active EHR infrastructures.

Data nodes (primary storage & query workhorses)

| Number of Claims to Process | RAM (per data node) | CPU (vCPUs per node) | Disk Capacity (per node, SSD) | Minimal Performance | Recommended Performance | Cluster Notes |
|---|---|---|---|---|---|---|
| 1 – 2,500 | 16 GB (8 GB heap) | 4–8 | 500 GB | 200 MB/s, 3,000 IOPS | 400 MB/s, 6,000 IOPS | 3 data nodes, small cluster |
| 2,500 – 5,000 | 32 GB (16 GB heap) | 8–12 | 1 TB | 300 MB/s, 5,000 IOPS | 600 MB/s, 10,000 IOPS | 3–5 data nodes, 1 coord node |
| 5,000 – 7,500 | 64 GB (32 GB heap) | 12–16 | 2 TB | 400 MB/s, 7,500 IOPS | 800 MB/s, 15,000 IOPS | 5–7 data nodes, dedicated master |
| 7,500 – 10,000 | 96 GB (32 GB heap, rest OS cache) | 16–24 | 3–4 TB | 500 MB/s, 10,000 IOPS | 1,000 MB/s, 20,000 IOPS | 7–9 data nodes, 3 master nodes |
| 10,001 – 15,000 | 128 GB (32 GB heap, rest OS cache) | 24+ | 6 TB | 600 MB/s, 12,500 IOPS | 1,200 MB/s, 25,000 IOPS | 9–12 data nodes, coord + master split |
| 15,001 – 30,000+ | 256 GB (32 GB heap, rest OS cache) | 32+ | 8–12 TB | 800 MB/s, 15,000 IOPS | 1,500 MB/s, 30,000 IOPS | 12–20+ data nodes, large-scale cluster |

Dedicated master-eligible nodes (cluster state & consensus)

| Number of Claims to Process | # Master Nodes | RAM (per node) | CPU (vCPUs) | Disk Capacity (SSD) | Minimal Performance | Recommended Performance | Notes |
|---|---|---|---|---|---|---|---|
| 1 – 2,500 | 0 (co-locate) or 3* | 16 GB (8–12 GB heap) | 4 | 100 GB | 100 MB/s, 500 IOPS | 200 MB/s, 1,000 IOPS | *3 dedicated for HA/SLA |
| 2,500 – 5,000 | 3 | 16 GB (8–12 GB heap) | 4–6 | 100–150 GB | 100 MB/s, 500 IOPS | 300 MB/s, 1,500 IOPS | AZ-spread required |
| 5,000 – 7,500 | 3 | 32 GB (16 GB heap) | 6–8 | 150 GB | 100 MB/s, 500 IOPS | 300 MB/s, 1,500 IOPS | Watch cluster state size |
| 7,500 – 10,000 | 3 | 32 GB (16 GB heap) | 8 | 200 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Keep them master-only |
| 10,001 – 15,000 | 3 | 32–64 GB (16–24 GB heap) | 8–12 | 200 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Increase heap if state grows |
| 15,001 – 30,000+ | 3 | 64 GB (24–32 GB heap) | 12–16 | 200–300 GB | 300 MB/s, 1,500 IOPS | 600 MB/s, 3,000 IOPS | Consider voter-only masters in very large clusters |

Coordinating (query/router) nodes

| Number of Claims to Process | # Coordinating Nodes | RAM (per node) | CPU (vCPUs) | Disk Capacity (SSD) | Minimal Performance | Recommended Performance | Notes |
|---|---|---|---|---|---|---|---|
| 1 – 2,500 | 0–1 | 16 GB (8–12 GB heap) | 4–8 | 100 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Add if dashboards/API are spiky |
| 2,500 – 5,000 | 1 | 32 GB (16 GB heap) | 8–12 | 100–150 GB | 300 MB/s, 1,500 IOPS | 600 MB/s, 3,000 IOPS | Put behind LB |
| 5,000 – 7,500 | 1–2 | 64 GB (24–32 GB heap) | 12–16 | 150–200 GB | 400 MB/s, 2,000 IOPS | 800 MB/s, 4,000 IOPS | Scale with query concurrency |
| 7,500 – 10,000 | 2–3 | 64–96 GB (32 GB heap) | 16–24 | 200 GB | 500 MB/s, 3,000 IOPS | 1,000 MB/s, 6,000 IOPS | Keep stateless; autoscale if possible |
| 10,001 – 15,000 | 3–4 | 96–128 GB (32 GB heap) | 24+ | 200 GB | 600 MB/s, 4,000 IOPS | 1,200 MB/s, 8,000 IOPS | Separate read vs write paths if needed |
| 15,001 – 30,000+ | 4–6 | 128–192 GB (32 GB heap) | 32+ | 200–300 GB | 800 MB/s, 5,000 IOPS | 1,500 MB/s, 10,000 IOPS | Add more for heavy aggregations |

Additional Requirements

  • Keep JVM heap ≤ 32 GB on all roles; give extra RAM to the OS page cache (Lucene loves it).
  • Use SSD/NVMe everywhere; spinning disks won’t meet the IOPS targets.
  • Shards: aim for ~50 GB per shard (avoid tiny shards); size node disk to keep utilization < 70%.
  • HA: 3 dedicated masters, spread across AZs; avoid colocating master with data under load.
  • Scale out > scale up: add nodes to handle concurrency instead of endlessly growing single-node specs.
  • Networking: low latency between nodes (same region/AZ set); place clients near coordinating nodes.
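
To make the shard and utilization rules concrete, here is a back-of-the-envelope sizing sketch; the index size, replica count, and node count are illustrative inputs, while the 50 GB and 70% constants come from the bullets above.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope sizing from the rules above: ~50 GB per shard,
# keep disk utilization under 70%. Inputs below are illustrative.
set -euo pipefail

INDEX_GB=1500          # example: 1.5 TB of primary index data
REPLICAS=1             # one replica copy for HA
DATA_NODES=5           # example node count

TOTAL_GB=$(( INDEX_GB * (1 + REPLICAS) ))      # primaries + replicas on disk
PRIMARY_SHARDS=$(( (INDEX_GB + 49) / 50 ))     # ceil(INDEX_GB / 50 GB per shard)
PER_NODE_GB=$(( TOTAL_GB / DATA_NODES ))
MIN_DISK_GB=$(( PER_NODE_GB * 100 / 70 ))      # keep utilization < 70%

echo "Primary shards:        ${PRIMARY_SHARDS}"
echo "Data on disk per node: ${PER_NODE_GB} GB"
echo "Minimum disk per node: ${MIN_DISK_GB} GB"
```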

Networking

On-Premise Topology

This chart summarizes the [Ai]levate on-premise network topology.


flowchart TD
    subgraph AilevateCloud["[Ai]levate Cloud (Azure)"]
        CloudServices["Cloud Services Layer (RCM Orchestration, Auth, Apps)"]
    end

    subgraph Customer["Customer Environment"]
        EHR["EHR System (Epic, Cerner, etc.)"]
        Relay["Relay Service (Outbound-only)"]

        subgraph AICompute["AI Compute Layer"]
            Warehouse["Dedicated AI Warehouse (Tenstorrent Hardware, vLLM)"]
        end
        subgraph Storage["Database Storage Layer"]
            Elastic["Elastic Datastore (Encrypted at Rest, Tenant-Isolated)"]
        end
    end

    EHR <--> Relay
    Relay --> CloudServices
    CloudServices --> Warehouse
    CloudServices --> Elastic

Network Bandwidth Sizing

In addition to compute and storage capacity, the network bandwidth between the Cloud Services Layer (Azure), the AI Compute Layer (AI Warehouse), and the Database Storage Layer (Elastic) plays a critical role in ensuring predictable performance. High latency or insufficient bandwidth directly slows claim ingestion, AI task execution, and overall denial remediation workflows. The following table provides minimum and recommended bandwidth values based on the average number of claim objects processed per minute.


flowchart LR
    subgraph Ailevate["[Ai]levate Cloud Services (Azure)"]
        Services["Cloud Services Layer
(RCM Orchestration, Auth, Apps)"]
    end

    subgraph Customer["Customer Environment"]
        Elastic["Elastic Datastore (Encrypted)"]
        Warehouse["AI Warehouse (Tenstorrent, vLLM)"]
    end

    Services <--->|TLS Encrypted Traffic<br/>1–30 Mbps depending on claims| Elastic
    Services <--->|TLS Encrypted Traffic<br/>1–30 Mbps depending on claims| Warehouse

| Claims Processed per Minute | Minimum Bandwidth | Recommended Bandwidth |
|---|---|---|
| 1 – 500 | 1 Mbps | 2 Mbps |
| 501 – 750 | 5 Mbps | 10 Mbps |
| 751 – 4,000 | 15 Mbps | 30 Mbps |
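
One way to verify the available link budget is iperf3, assuming a temporary iperf3 server can be run alongside the Elastic or AI Warehouse hosts; the hostname and duration below are illustrative.

```bash
# On a host in the customer environment (e.g., alongside Elastic), run a temporary server:
iperf3 --server --port 5201

# From a VM in the same Azure region as [Ai]levate Cloud Services, measure throughput
# toward that host (hypothetical FQDN) and compare against the table above.
iperf3 --client elastic.customer.example.com --port 5201 --time 30

# Round-trip latency also matters for claim ingestion; a quick check:
ping -c 10 elastic.customer.example.com
```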

Networking & Connectivity

Concept: Unlike the SaaS model, On-Premise deployments require inbound connectivity for both the Elastic datastore and the AI Warehouse. Customers must configure firewall rules, certificates, and DNS entries to securely expose these services to [Ai]levate Cloud Services. The Relay Service remains outbound-only, minimizing the attack surface on the EHR side. VPN or PrivateLink can be implemented to provide private connectivity instead of public TLS endpoints.


Network Flow Matrix: The table below summarizes all required network flows between [Ai]levate Cloud Services and the customer-managed Elastic datastore, AI Warehouse, and Relay Service. It specifies the direction of traffic, protocol/port, and the entity responsible for configuration. Customer networking and security teams should use this matrix to validate firewall rules and connectivity.


flowchart TD
    subgraph Customer["Customer Environment"]
        EHR["EHR SQL Datastore"]
        Relay["Relay VM (Outbound-only)"]
        Elastic["Elastic Datastore (9200/tls)"]
        Warehouse["AI Warehouse (vLLM API, 8080/tls)"]
    end

    subgraph Ailevate["[Ai]levate Cloud Services (Azure)"]
        Cloud["Cloud Services Layer
(Orchestration, Auth, Apps)"]
    end

    Relay -->|TCP 1433| EHR
    Relay -->|443 TLS| Cloud
    Cloud -->|9200 TLS| Elastic
    Cloud -->|8080 TLS| Warehouse
    Elastic -->|443 TLS| Cloud
    Warehouse -->|443 TLS| Cloud

| Source | Destination | Protocol / Port | Direction | Encryption | Responsibility |
|---|---|---|---|---|---|
| Relay VM (Customer) | [Ai]levate Cloud | HTTPS / 443 | Outbound | TLS 1.2+ | Customer (firewall rules) |
| Relay VM (Customer) | EHR Datastore | TCP / 1433 (NextGen) | LAN | TLS / Local Network | Customer |
| [Ai]levate Cloud Services | Elastic Datastore | HTTPS / 9200 (Elastic API) | Inbound to Customer | TLS 1.2+ | Customer (expose Elastic securely) |
| [Ai]levate Cloud Services | AI Warehouse (vLLM API) | HTTPS / 8080* | Inbound to Customer | TLS 1.2+ | Customer (expose vLLM securely) |
| Elastic Datastore | [Ai]levate Cloud Services | HTTPS / 443 | Outbound | TLS 1.2+ | Customer |
| AI Warehouse | [Ai]levate Cloud Services | HTTPS / 443 | Outbound | TLS 1.2+ | Customer |
| DNS Resolver (Customer) | Public DNS / Internal | UDP/TCP 53 | Outbound | N/A | Customer |
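
As an illustrative translation of the two inbound rows above into firewall rules, the sketch below uses ufw with a documentation-range CIDR (203.0.113.0/24) as a stand-in for the [Ai]levate egress range, which [Ai]levate would supply during onboarding.

```bash
#!/usr/bin/env bash
# Example firewall rules implementing the inbound rows of the flow matrix (ufw syntax).
# 203.0.113.0/24 is a documentation-range placeholder for the [Ai]levate egress CIDR.
set -euo pipefail

AILEVATE_CIDR="203.0.113.0/24"

# Elastic datastore host: allow only [Ai]levate Cloud Services to reach 9200/tcp.
ufw allow proto tcp from "${AILEVATE_CIDR}" to any port 9200

# AI Warehouse host: allow only [Ai]levate Cloud Services to reach the vLLM API on 8080/tcp.
ufw allow proto tcp from "${AILEVATE_CIDR}" to any port 8080

# Everything else inbound stays denied by default.
ufw default deny incoming
ufw --force enable
```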

Operation

In On-Premise deployments, customers assume operational ownership of the Elastic datastore and AI Warehouse, including patching, scaling, monitoring, backups, and disaster recovery. They must also maintain TLS certificates, inbound firewall rules, and access controls for both components. The Relay VM, while lightweight, also requires customer patching, SQL credential rotation, and log monitoring.

[Ai]levate remains responsible for the Cloud Services Layer, ensuring RCM workflows, authentication, orchestration, and application delivery operate seamlessly. This division of responsibilities allows customers to maintain sovereignty over their infrastructure while benefiting from [Ai]levate’s managed orchestration and compliance capabilities.
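
Because certificate maintenance for Elastic and the AI Warehouse sits with the customer, a periodic expiry check is worth automating. A minimal sketch, assuming hypothetical endpoint hostnames; it could run from cron and feed existing monitoring.

```bash
#!/usr/bin/env bash
# Print TLS certificate expiry for the customer-exposed endpoints (hostnames are placeholders).
set -euo pipefail

for endpoint in elastic.customer.example.com:9200 ai-warehouse.customer.example.com:8080; do
    expiry=$(echo | openssl s_client -connect "${endpoint}" -servername "${endpoint%%:*}" 2>/dev/null \
             | openssl x509 -noout -enddate)
    echo "${endpoint} -> ${expiry}"
done
```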


flowchart TB
    subgraph Customer["Customer Operations"]
        C1["Manage Elastic Datastore<br/>(scaling, backup, patching)"]
        C2["Operate AI Warehouse<br/>(hardware, vLLM runtime, patching)"]
        C3["Maintain Relay VM<br/>(patching, SQL creds, logs)"]
    end

    subgraph Ailevate["[Ai]levate Operations"]
        A1["Manage Cloud Services<br/>(apps, workflows, orchestration)"]
        A2["Ensure Compliance<br/>(HIPAA, tenant isolation guidance)"]
    end

    Customer -->|Provide infrastructure| Ailevate
    Ailevate -->|Platform orchestration| Customer

Security and Data Privacy

The On-Premise deployment enforces the same “secure by design” principles as the SaaS model but gives customers direct control over storage and compute. All data at rest in Elastic must be encrypted using AES-256, and all data in transit must use TLS 1.2+. Customers may configure BYOK via Azure Key Vault, integrating their own key lifecycle policies.

By design, the AI Warehouse never stores data, only executing tasks via the vLLM interface. The Elastic datastore remains entirely customer-controlled, with [Ai]levate accessing it only through secured service connections. Strong role-based access controls (RBAC) ensure fine-grained permissions within applications, while single-tenant isolation guarantees logical separation across customers.

Instead of replicating or exporting datasets, [Ai]levate employs secure data sharing patterns, executing queries without exposing raw storage. This approach, combined with HIPAA compliance across the platform, enables healthcare organizations to maintain sovereignty while meeting regulatory requirements.
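
To verify that TLS 1.2+ is actually enforced, a deliberately outdated handshake should be rejected. The sketch below assumes a hypothetical Elastic endpoint and an openssl build that still offers -tls1_1 for negative testing.

```bash
#!/usr/bin/env bash
# Negative test: a TLS 1.1 handshake against the Elastic endpoint should be rejected.
# (Requires an openssl build that still offers -tls1_1; hostname is a placeholder.)
set -uo pipefail

ES_ENDPOINT="elastic.customer.example.com:9200"

if echo | openssl s_client -connect "${ES_ENDPOINT}" -tls1_1 >/dev/null 2>&1; then
    echo "FAIL: endpoint accepted TLS 1.1"
    exit 1
else
    echo "OK: TLS 1.1 rejected; endpoint enforces TLS 1.2+"
fi
```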


flowchart TB
    subgraph Customer["Customer-Controlled"]
        Elastic["Elastic Datastore (Encrypted, BYOK supported)"]
        Warehouse["AI Warehouse (Tenstorrent, vLLM - No Storage)"]
    end

    subgraph Ailevate["[Ai]levate Cloud Services"]
        Services["Orchestration, Auth, Workflows"]
    end

    Services --> Elastic
    Services --> Warehouse

    classDef secure fill:#cce5ff,stroke:#333,stroke-width:1.5px;
    classDef private fill:#e6ffe6,stroke:#333,stroke-width:1.5px;

    Elastic:::private
    Warehouse:::private
    Services:::secure

Security Recommendations for On-Premise Implementation

| Control Area | On-Premise Implementation |
|---|---|
| Encryption | AES-256 for storage and transit; customer-managed |
| Key Management | BYOK via Azure Key Vault; customer controls policy |
| Separation of Duties | Elastic stores data; AI Warehouse executes only |
| Tenant Isolation | Logical isolation across [Ai]levate Cloud Services |
| Access Control (RBAC) | Fine-grained role-based permissions in apps |
| Secure Data Sharing | Queries executed without exposing raw data |
| Compliance | HIPAA alignment across storage, compute, and cloud |

Checklist (Summary)

| Responsibility | Action | Owner |
|---|---|---|
| Elastic datastore | Provision, encrypt, backup | 👤 Customer |
| AI Warehouse | Procure, configure, patch | 👤 Customer |
| Relay VM | Deploy, patch, monitor | 👤 Customer |
| Networking | Configure inbound (Elastic & vLLM), outbound (Relay) | 👤 Customer |
| Identity/SSO | Configure SSO or Magic Link | 🤝 Joint |
| Cloud Services Layer | Provision & operate | 🏢 [Ai]levate |
| Compliance | HIPAA (cloud + infra) | 🤝 Shared |