On-Prem Deployment Guide
This guide describes how to deploy the On-Premise version of [Ai]levate Revenue Recovery, where customers take responsibility for hosting and operating both the Database Storage Layer (Elastic datastore) and the AI Compute Layer (Tenstorrent AI Warehouse). [Ai]levate continues to manage the Cloud Services Layer, providing orchestration, workflow management, authentication, and application delivery. The Relay Service Layer remains customer-hosted and is used to securely connect the customer’s EHR datastore to the platform.
Introduction
The On-Premise model is designed for organizations with strict data residency, sovereignty, or infrastructure control requirements, giving them full ownership of compute and storage, while still benefiting from [Ai]levate’s managed SaaS orchestration layer.
General Approach
In the On-Premise model, customers deploy and manage their own Elastic datastore and AI Warehouse hardware, which serve as the backbone of the platform’s storage and compute layers. These components must be provisioned, secured, patched, and scaled by the customer, while [Ai]levate connects to them through secure service connections. Unlike the SaaS model, the AI Warehouse and datastore require inbound connectivity, meaning customers must configure firewall rules, certificates, and access policies to allow [Ai]levate services to interact with these systems.
The Relay Service Layer remains outbound-only, providing secure connectivity between the EHR system and [Ai]levate’s Cloud Services. This hybrid model enables customers to retain full control of infrastructure while relying on [Ai]levate for orchestration, governance, and compliance enforcement at the cloud layer.
Deployment overview
The On-Prem deployment of [Ai]levate Revenue Recovery is a joint operation between the customer and [Ai]levate. The customer provisions and secures all infrastructure components — Elastic datastore, AI Warehouse, and Relay VM — ensuring that each is reachable and properly configured. Once prerequisites are in place, [Ai]levate engineers onboard the Relay, validate inbound connections to Elastic and the AI Warehouse, and configure the Cloud Services Layer to orchestrate workflows across the customer environment. The process is deliberately staged to minimize disruption, with clear validation checkpoints at each step before moving to production.
```mermaid
sequenceDiagram
    participant C as Customer SysAdmin
    participant E as "[Ai]levate Engineer"
    participant R as Relay VM
    participant SQL as EHR Datastore
    participant ES as Elastic Datastore (On-Prem)
    participant AI as AI Warehouse (vLLM API)
    participant CS as "[Ai]levate Cloud Services"
    %% Preparation
    Note over C: Preparation Phase
    C->>C: Provision Elastic, AI Warehouse, Relay VM
    C->>SQL: Validate SQL connectivity (TCP 1433)
    C->>ES: Enable TLS & access policies
    C->>AI: Expose vLLM API securely
    %% Deployment
    Note over E,R: Deployment Phase
    E->>R: Onboard Relay VM (Azure Arc, tunnel, proxy)
    R->>SQL: Test LAN connectivity to EHR
    R->>CS: Establish outbound tunnel (443)
    %% Integration
    Note over E,CS: Integration Phase
    E->>ES: Validate inbound TLS connectivity from Cloud
    E->>AI: Validate inbound TLS connectivity from Cloud
    CS->>ES: Register Elastic datastore
    CS->>AI: Register AI Warehouse endpoint
    %% Validation
    Note over E,C: Validation Phase
    CS->>SQL: Retrieve test claims
    CS->>AI: Execute sample AI task
    CS->>ES: Store metadata & logs
    E->>C: Provide runbook, confirm handover
```
Step-by-Step Deployment
| Phase | Steps |
|---|---|
| Preparation (👤 Customer) | • Provision Elastic datastore, AI Warehouse hardware, and Relay VM. |
| Deployment (🤝 Joint) | • [Ai]levate engineer onboards Relay (Azure Arc agent, tunnel, proxy). |
| Integration (🤝 Joint) | • Validate inbound TLS connections from [Ai]levate Cloud to Elastic and AI Warehouse. |
| Validation (🤝 Joint) | • Run test claim ingestion from EHR. |
Technical Prerequisites
The following technical prerequisites must be prepared by the customer before deployment. These ensure the [Ai]levate platform can securely interact with customer-managed infrastructure.
Elastic Datastore Requirements
| Area | Requirement | Notes |
|---|---|---|
| Deployment | Customer-provisioned Elastic cluster. A service connection string must be provided to the [Ai]levate engineer (or a registered partner). | Minimum 3 nodes recommended for HA |
| Sizing | Based on claim volume and query concurrency | Plan for growth and redundancy |
| Encryption | AES-256 encryption at rest | Customer responsibility |
| Access | Service account with READ/WRITE permissions. A service connection must be shared with [Ai]levate | Used by [Ai]levate Cloud Services |
| Backup/DR | Backup, restore, and retention policies | Customer responsibility |
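The checks below sketch one way to confirm that the Elastic cluster is reachable over TLS and that a READ/WRITE service account exists before the connection string is shared. They assume a self-managed Elasticsearch cluster with the security APIs enabled; the hostname `elastic.customer.example`, the CA file, the credentials, and the `ailevate-*` index pattern are illustrative placeholders, not values prescribed by [Ai]levate.

```bash
# Confirm the cluster answers over TLS with the customer-issued CA
# (elastic.customer.example and ca.pem are placeholders).
curl --cacert ca.pem -u elastic:'<admin-password>' \
  "https://elastic.customer.example:9200/_cluster/health?pretty"

# Create a role limited to the indices the platform will use
# (the ailevate-* index pattern is illustrative, not prescribed).
curl --cacert ca.pem -u elastic:'<admin-password>' \
  -X PUT "https://elastic.customer.example:9200/_security/role/ailevate_rw" \
  -H 'Content-Type: application/json' \
  -d '{"indices":[{"names":["ailevate-*"],"privileges":["read","write","create_index","view_index_metadata"]}]}'

# Create the service account user whose connection details are shared with [Ai]levate.
curl --cacert ca.pem -u elastic:'<admin-password>' \
  -X PUT "https://elastic.customer.example:9200/_security/user/ailevate_svc" \
  -H 'Content-Type: application/json' \
  -d '{"password":"<strong-generated-password>","roles":["ailevate_rw"]}'
```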
AI Warehouse Requirements
A more detailed deployment guide for the Tenstorrent AI Warehouse hardware is available [HERE].
| Area | Requirement | Notes |
|---|---|---|
| Hardware | Customer-procured Tenstorrent hardware. A service connection string must be provided to the [Ai]levate engineer (or a registered partner). | Must be sized for workload concurrency |
| Software | [Ai]levate-provided AI model images | Runs in vLLM-compatible runtime |
| Connectivity | Inbound TLS access to vLLM API | Exposed to [Ai]levate Cloud Services |
| Security | Certificates + TLS enforced | Customer responsibility |
| Patch Management | OS, runtime, and firmware updates | Customer responsibility |
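Because the vLLM runtime exposes an OpenAI-compatible HTTP API, the AI Warehouse endpoint can be smoke-tested with curl once TLS is in place. This is a minimal sketch: the hostname, port 8080, CA file, and model name are placeholders to be replaced with the values from your deployment.

```bash
# List the model(s) served by the vLLM runtime (hostname/port are placeholders).
curl --cacert ca.pem "https://ai-warehouse.customer.example:8080/v1/models"

# Issue a minimal completion request to confirm end-to-end inference.
# The model name is whatever the [Ai]levate-provided image serves.
curl --cacert ca.pem "https://ai-warehouse.customer.example:8080/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model": "<served-model-name>", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
```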
Relay Service Requirements
A more detailed deployment guide for the [Ai]levate Relay Service is available [HERE].
| Area | Requirement | Notes |
|---|---|---|
| Operating System | Ubuntu Server LTS (recommended) | Other modern Linux distros acceptable |
| Sizing | 2 vCPU, 4–8 GB RAM, 20 GB disk | Lightweight workload |
| Placement | Same LAN/subnet as SQL datastore | Must reach EHR SQL over TCP 1433 |
| Access | Local user with sudo privileges | Needed for [Ai]levate engineers |
| Connectivity | Outbound 443 to *.ailevate.com and Azure Arc services | No inbound rules required |
| Time Sync | NTP enabled and accurate | Required for TLS operations |
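Before handing the Relay VM over for onboarding, the prerequisites above can be spot-checked from the VM itself. This is a sketch: the SQL hostname and the `portal.ailevate.com` endpoint are placeholders (the actual *.ailevate.com endpoints are confirmed during onboarding), and `management.azure.com` is one of the public endpoints used by Azure Arc.

```bash
# EHR SQL reachability over the LAN (hostname is a placeholder).
nc -vz ehr-sql.customer.local 1433

# Outbound HTTPS to [Ai]levate and Azure Arc endpoints
# (portal.ailevate.com is an illustrative *.ailevate.com host).
curl -sS -o /dev/null -w '%{http_code}\n' https://portal.ailevate.com
curl -sS -o /dev/null -w '%{http_code}\n' https://management.azure.com

# NTP must be active and synchronized for TLS to work reliably.
timedatectl status | grep -E 'NTP service|System clock synchronized'
```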
Network & DNS Requirements
| Component | Requirement | Notes |
|---|---|---|
| Elastic Cluster | Allow inbound TLS connections from [Ai]levate Cloud Services | Secure firewall rules required |
| AI Warehouse | Expose vLLM API endpoint to [Ai]levate | TLS enforced |
| Relay VM | Outbound 443 to [Ai]levate and Azure Arc endpoints | Outbound-only design |
| Internal LAN | Relay must reach SQL datastore on TCP 1433 | Validate with nc -vz |
| DNS Resolution | Public FQDNs + internal hostnames resolvable | Forward + reverse lookups required |
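Forward and reverse DNS resolution can be verified with dig from the Relay VM or any host in the same segment; all hostnames and the IP address below are placeholders.

```bash
# Forward lookups: public FQDNs used by the platform and internal hostnames.
dig +short portal.ailevate.com            # illustrative *.ailevate.com host
dig +short elastic.customer.example       # internal Elastic endpoint (placeholder)
dig +short ai-warehouse.customer.example  # internal vLLM endpoint (placeholder)

# Reverse lookup for an internal address (placeholder IP).
dig +short -x 10.0.20.15
```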
Elastic Cluster Sizing
Capacity planning for On-Premise deployments must consider both storage and compute growth. Elastic clusters should be sized based on historical claim volumes, with headroom for growth and redundancy. Customers are responsible for scaling and upgrading both layers, while [Ai]levate ensures Cloud Services can orchestrate workloads efficiently across them.
As part of its operations, [Ai]levate Revenue Recovery stores Denied Claims and the associated AI analysis. This section provides both minimum and recommended hardware requirements depending on the volume of claims that [Ai]levate will have to analyze.
Due to the amount of data it processes, [Ai]levate is a CPU-intensive application. To avoid bottlenecks introduced by storage (disk or SAN) or compute, [Ai]levate provides both a minimal and a recommended configuration.
- The minimal configuration generally covers the needs of most infrastructures.
- The recommended configuration offers a better experience for large or highly active EHR infrastructures.
Data nodes (primary storage & query workhorses)
| Number of Claims to process | RAM (per data node) | CPU (vCPUs per node) | Disk Capacity (per node, SSD) | Minimal Performance | Recommended Performance | Cluster Notes |
|---|---|---|---|---|---|---|
| 1 – 2,500 | 16 GB (8 GB heap) | 4–8 | 500 GB | 200 MB/s, 3,000 IOPS | 400 MB/s, 6,000 IOPS | 3 data nodes, small cluster |
| 2,500 – 5,000 | 32 GB (16 GB heap) | 8–12 | 1 TB | 300 MB/s, 5,000 IOPS | 600 MB/s, 10,000 IOPS | 3–5 data nodes, 1 coord node |
| 5,001 – 7,500 | 64 GB (32 GB heap) | 12–16 | 2 TB | 400 MB/s, 7,500 IOPS | 800 MB/s, 15,000 IOPS | 5–7 data nodes, dedicated master |
| 7,500 – 10,000 | 96 GB (32 GB heap, rest OS cache) | 16–24 | 3–4 TB | 500 MB/s, 10,000 IOPS | 1,000 MB/s, 20,000 IOPS | 7–9 data nodes, 3 master nodes |
| 10,001 – 15,000 | 128 GB (32 GB heap, rest OS cache) | 24+ | 6 TB | 600 MB/s, 12,500 IOPS | 1,200 MB/s, 25,000 IOPS | 9–12 data nodes, coord + master split |
| 15,001 – 30,000+ | 256 GB (32 GB heap, rest OS cache) | 32+ | 8–12 TB | 800 MB/s, 15,000 IOPS | 1,500 MB/s, 30,000 IOPS | 12–20+ data nodes, large-scale cluster |
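As a sanity check, a short fio run on a data node's storage volume can confirm whether it meets the IOPS and throughput targets above before the node is put into service. The data path, test size, and runtime below are illustrative.

```bash
# Random-read IOPS check against the Elastic data volume (path is a placeholder).
fio --name=iops-check --directory=/var/lib/elasticsearch \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --size=4G --runtime=60 --time_based --group_reporting

# Sequential throughput check (MB/s) on the same volume.
fio --name=throughput-check --directory=/var/lib/elasticsearch \
    --rw=read --bs=1M --ioengine=libaio --iodepth=16 \
    --size=4G --runtime=60 --time_based --group_reporting
```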
Dedicated master-eligible nodes (cluster state & consensus)
| Number of Claims to process | # Master Nodes | RAM (per node) | CPU (vCPUs) | Disk Capacity (SSD) | Minimal Performance | Recommended Performance | Notes |
|---|---|---|---|---|---|---|---|
| 1 – 2,500 | 0 (co-locate) or 3* | 16 GB (8–12 GB heap) | 4 | 100 GB | 100 MB/s, 500 IOPS | 200 MB/s, 1,000 IOPS | *3 dedicated for HA/SLA |
| 2,500 – 5,000 | 3 | 16 GB (8–12 GB heap) | 4–6 | 100–150 GB | 100 MB/s, 500 IOPS | 300 MB/s, 1,500 IOPS | AZ-spread required |
| 5,001 – 7,500 | 3 | 32 GB (16 GB heap) | 6–8 | 150 GB | 100 MB/s, 500 IOPS | 300 MB/s, 1,500 IOPS | Watch cluster state size |
| 7,500 – 10,000 | 3 | 32 GB (16 GB heap) | 8 | 200 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Keep them master-only |
| 10,001 – 15,000 | 3 | 32–64 GB (16–24 GB heap) | 8–12 | 200 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Increase heap if state grows |
| 15,001 – 30,000+ | 3 | 64 GB (24–32 GB heap) | 12–16 | 200–300 GB | 300 MB/s, 1,500 IOPS | 600 MB/s, 3,000 IOPS | Consider voter-only masters in very large clusters |
Coordinating (query/router) nodes
| Number of Claims to process | # Coordinating Nodes | RAM (per node) | CPU (vCPUs) | Disk Capacity (SSD) | Minimal Performance | Recommended Performance | Notes |
|---|---|---|---|---|---|---|---|
| 1 – 2,500 | 0–1 | 16 GB (8–12 GB heap) | 4–8 | 100 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Add if dashboards/API are spiky |
| 2,500 – 5,000 | 1 | 32 GB (16 GB heap) | 8–12 | 100–150 GB | 300 MB/s, 1,500 IOPS | 600 MB/s, 3,000 IOPS | Put behind LB |
| 5,001 – 7,500 | 1–2 | 64 GB (24–32 GB heap) | 12–16 | 150–200 GB | 400 MB/s, 2,000 IOPS | 800 MB/s, 4,000 IOPS | Scale with query concurrency |
| 7,500 – 10,000 | 2–3 | 64–96 GB (32 GB heap) | 16–24 | 200 GB | 500 MB/s, 3,000 IOPS | 1,000 MB/s, 6,000 IOPS | Keep stateless; autoscale if possible |
| 10,001 – 15,000 | 3–4 | 96–128 GB (32 GB heap) | 24+ | 200 GB | 600 MB/s, 4,000 IOPS | 1,200 MB/s, 8,000 IOPS | Separate read vs write paths if needed |
| 15,001 – 30,000+ | 4–6 | 128–192 GB (32 GB heap) | 32+ | 200–300 GB | 800 MB/s, 5,000 IOPS | 1,500 MB/s, 10,000 IOPS | Add more for heavy aggregations |
Additional Requirements
- Keep JVM heap ≤ 32 GB on all roles; give extra RAM to the OS page cache (Lucene loves it).
- Use SSD/NVMe everywhere; spinning disks won’t meet the IOPS targets.
- Shards: aim for ~50 GB per shard (avoid tiny shards); size node disk to keep utilization < 70%.
- HA: 3 dedicated masters, spread across AZs; avoid colocating master with data under load.
- Scale out > scale up: add nodes to handle concurrency instead of endlessly growing single-node specs.
- Networking: low latency between nodes (same region/AZ set); place clients near coordinating nodes.
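On a running cluster, the heap, disk, and shard guidance above can be verified with the _cat APIs (heap itself is set per node in jvm.options, e.g. -Xms31g / -Xmx31g to stay below 32 GB). The hostname and credentials below are placeholders.

```bash
# Confirm no node is configured with more than ~31 GB of JVM heap.
curl --cacert ca.pem -u ailevate_svc:'<password>' \
  "https://elastic.customer.example:9200/_cat/nodes?v&h=name,node.role,heap.max,ram.max"

# Check disk utilization per node; keep it under roughly 70%.
curl --cacert ca.pem -u ailevate_svc:'<password>' \
  "https://elastic.customer.example:9200/_cat/allocation?v"

# Review shard sizes; aim for shards around ~50 GB, avoiding many tiny shards.
curl --cacert ca.pem -u ailevate_svc:'<password>' \
  "https://elastic.customer.example:9200/_cat/shards?v&h=index,shard,prirep,store&s=store:desc"
```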
Networking
On-Premise Topology
This chart summarizes the [Ai]levate on-premise network topology.
```mermaid
flowchart TD
    subgraph AilevateCloud["[Ai]levate Cloud (Azure)"]
        CloudServices["Cloud Services Layer (RCM Orchestration, Auth, Apps)"]
    end
    subgraph Customer["Customer Environment"]
        EHR["EHR System (Epic, Cerner, etc.)"]
        Relay["Relay Service (Outbound-only)"]
        subgraph AICompute["AI Compute Layer"]
            Warehouse["Dedicated AI Warehouse (Tenstorrent Hardware, vLLM)"]
        end
        subgraph Storage["Database Storage Layer"]
            Elastic["Elastic Datastore (Encrypted at Rest, Tenant-Isolated)"]
        end
    end
    EHR <--> Relay
    Relay --> CloudServices
    CloudServices --> Warehouse
    CloudServices --> Elastic
```
Network Bandwidth Sizing
In addition to compute and storage capacity, the network bandwidth between the Cloud Services Layer (Azure), the AI Compute Layer (AI Warehouse), and the Database Storage Layer (Elastic) plays a critical role in ensuring predictable performance. Latency and insufficient bandwidth may directly impact the speed of claim ingestion, AI task execution, and overall denial remediation workflows. The following table provides minimum and recommended bandwidth values based on the average number of claim objects processed per minute.
```mermaid
flowchart LR
    subgraph Ailevate["[Ai]levate Cloud Services (Azure)"]
        Services["Cloud Services Layer<br/>(RCM Orchestration, Auth, Apps)"]
    end
    subgraph Customer["Customer Environment"]
        Elastic["Elastic Datastore (Encrypted)"]
        Warehouse["AI Warehouse (Tenstorrent, vLLM)"]
    end
    Services <--->|TLS Encrypted Traffic<br/>1–30 Mbps depending on claims| Elastic
    Services <--->|TLS Encrypted Traffic<br/>1–30 Mbps depending on claims| Warehouse
```
| Claims Processed per Minute | Minimum Bandwidth | Recommended Bandwidth |
|---|---|---|
| 1 – 500 | 1 Mbps | 2 Mbps |
| 501 – 750 | 5 Mbps | 10 Mbps |
| 751 – 4,000 | 15 Mbps | 30 Mbps |
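Where a rough validation of available bandwidth is useful before go-live, an iperf3 test between the on-prem network and a temporary cloud-side test VM gives a quick signal. The hostname below is a placeholder and the test VM is an assumption for illustration, not part of the [Ai]levate platform.

```bash
# On a temporary cloud-side test VM (placeholder host), run the server side:
iperf3 -s

# From the on-prem network, measure throughput for 30 seconds over 4 parallel streams:
iperf3 -c bandwidth-test.customer.example -t 30 -P 4
```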
Networking & Connectivity
Concept: Unlike the SaaS model, On-Premise deployments require inbound connectivity for both the Elastic datastore and the AI Warehouse. Customers must configure firewall rules, certificates, and DNS entries to securely expose these services to [Ai]levate Cloud Services. The Relay Service remains outbound-only, minimizing the attack surface on the EHR side. VPN or PrivateLink can be implemented to provide private connectivity instead of public TLS endpoints.
Network Flow Matrix: The table below summarizes all required network flows between the [Ai]levate Cloud Services, the customer-managed Elastic datastore, the AI Warehouse, and the Relay Service. It specifies the direction of traffic, the protocol/port, and the entity responsible for configuration. This matrix should be used by customer networking and security teams to validate firewall rules and connectivity.
```mermaid
flowchart TD
    subgraph Customer["Customer Environment"]
        EHR["EHR SQL Datastore"]
        Relay["Relay VM (Outbound-only)"]
        Elastic["Elastic Datastore (9200/tls)"]
        Warehouse["AI Warehouse (vLLM API, 8080/tls)"]
    end
    subgraph Ailevate["[Ai]levate Cloud Services (Azure)"]
        Cloud["Cloud Services Layer<br/>(Orchestration, Auth, Apps)"]
    end
    Relay -->|TCP 1433| EHR
    Relay -->|443 TLS| Cloud
    Cloud -->|9200 TLS| Elastic
    Cloud -->|8080 TLS| Warehouse
    Elastic -->|443 TLS| Cloud
    Warehouse -->|443 TLS| Cloud
```
| Source | Destination | Protocol / Port | Direction | Encryption | Responsibility |
|---|---|---|---|---|---|
| Relay VM (Customer) | [Ai]levate Cloud | HTTPS / 443 | Outbound | TLS 1.2+ | Customer (firewall rules) |
| Relay VM (Customer) | EHR Datastore | TCP / 1433 (NextGen) | LAN | TLS / Local Network | Customer |
| [Ai]levate Cloud Services | Elastic Datastore | HTTPS / 9200 (Elastic API) | Inbound to Customer | TLS 1.2+ | Customer (expose Elastic securely) |
| [Ai]levate Cloud Services | AI Warehouse (vLLM API) | HTTPS / 8080* | Inbound to Customer | TLS 1.2+ | Customer (expose vLLM securely) |
| Elastic Datastore | [Ai]levate Cloud Services | HTTPS / 443 | Outbound | TLS 1.2+ | Customer |
| AI Warehouse | [Ai]levate Cloud Services | HTTPS / 443 | Outbound | TLS 1.2+ | Customer |
| DNS Resolver (Customer) | Public DNS / Internal | UDP/TCP 53 | Outbound | — | Customer |
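On Ubuntu hosts, the inbound flows in the matrix can be expressed as ufw rules restricted to the address range [Ai]levate publishes for its Cloud Services. The 203.0.113.0/24 range below is a documentation placeholder, not an actual [Ai]levate range.

```bash
# On the Elastic nodes: allow the [Ai]levate Cloud Services range to reach the Elastic API.
sudo ufw allow from 203.0.113.0/24 to any port 9200 proto tcp

# On the AI Warehouse host: allow the same range to reach the vLLM API.
sudo ufw allow from 203.0.113.0/24 to any port 8080 proto tcp

# The Relay VM needs no inbound rules; confirm only outbound traffic is permitted.
sudo ufw status verbose
```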
Operation
In On-Premise deployments, customers assume operational ownership of the Elastic datastore and AI Warehouse, including patching, scaling, monitoring, backups, and disaster recovery. They must also maintain TLS certificates, inbound firewall rules, and access controls for both components. The Relay VM, while lightweight, also requires customer patching, SQL credential rotation, and log monitoring.
[Ai]levate remains responsible for the Cloud Services Layer, ensuring RCM workflows, authentication, orchestration, and application delivery operate seamlessly. This division of responsibilities allows customers to maintain sovereignty over their infrastructure while benefiting from [Ai]levate’s managed orchestration and compliance capabilities.
```mermaid
flowchart TB
    subgraph Customer["Customer Operations"]
        C1["Manage Elastic Datastore<br/>(scaling, backup, patching)"]
        C2["Operate AI Warehouse<br/>(hardware, vLLM runtime, patching)"]
        C3["Maintain Relay VM<br/>(patching, SQL creds, logs)"]
    end
    subgraph Ailevate["[Ai]levate Operations"]
        A1["Manage Cloud Services<br/>(apps, workflows, orchestration)"]
        A2["Ensure Compliance<br/>(HIPAA, tenant isolation guidance)"]
    end
    Customer -->|Provide infrastructure| Ailevate
    Ailevate -->|Platform orchestration| Customer
```
Security and Data Privacy
The On-Premise deployment enforces the same “secure by design” principles as the SaaS model but gives customers direct control over storage and compute. All data at rest in Elastic must be encrypted using AES-256, and all data in transit must use TLS 1.2+. Customers may configure BYOK via Azure Key Vault, integrating their own key lifecycle policies.
By design, the AI Warehouse never stores data, only executing tasks via the vLLM interface. The Elastic datastore remains entirely customer-controlled, with [Ai]levate accessing it only through secured service connections. Strong role-based access controls (RBAC) ensure fine-grained permissions within applications, while single-tenant isolation guarantees logical separation across customers.
Instead of replicating or exporting datasets, [Ai]levate employs secure data sharing patterns, executing queries without exposing raw storage. This approach, combined with HIPAA compliance across the platform, enables healthcare organizations to maintain sovereignty while meeting regulatory requirements.
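TLS enforcement on the customer-exposed endpoints can be spot-checked with openssl s_client: a TLS 1.2 handshake should succeed while older protocol versions are refused. The hostnames are placeholders, and the -tls1_1 probe only works if the local OpenSSL build still supports TLS 1.1.

```bash
# Expect a successful handshake with TLS 1.2 (or newer) on the Elastic endpoint.
openssl s_client -connect elastic.customer.example:9200 -tls1_2 -brief </dev/null

# Expect this handshake to be rejected when TLS 1.0/1.1 are disabled server-side.
openssl s_client -connect elastic.customer.example:9200 -tls1_1 -brief </dev/null

# The same checks apply to the AI Warehouse vLLM endpoint on its TLS port.
openssl s_client -connect ai-warehouse.customer.example:8080 -tls1_2 -brief </dev/null
```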
```mermaid
flowchart TB
    subgraph Customer["Customer-Controlled"]
        Elastic["Elastic Datastore (Encrypted, BYOK supported)"]
        Warehouse["AI Warehouse (Tenstorrent, vLLM - No Storage)"]
    end
    subgraph Ailevate["[Ai]levate Cloud Services"]
        Services["Orchestration, Auth, Workflows"]
    end
    Services --> Elastic
    Services --> Warehouse
    classDef secure fill:#cce5ff,stroke:#333,stroke-width:1.5px;
    classDef private fill:#e6ffe6,stroke:#333,stroke-width:1.5px;
    Elastic:::private
    Warehouse:::private
    Services:::secure
```
Security Recommendations for On-Premise Implementation
| Control Area | On-Premise Implementation |
|---|---|
| Encryption | AES-256 for storage and transit; customer-managed |
| Key Management | BYOK via Azure Key Vault; customer controls policy |
| Separation of Duties | Elastic stores data; AI Warehouse executes only |
| Tenant Isolation | Logical isolation across [Ai]levate Cloud Services |
| Access Control (RBAC) | Fine-grained role-based permissions in apps |
| Secure Data Sharing | Queries executed without exposing raw data |
| Compliance | HIPAA alignment across storage, compute, and cloud |
Checklist (Summary)
| Responsibility | Action | Owner |
|---|---|---|
| Elastic datastore | Provision, encrypt, backup | 👤 Customer |
| AI Warehouse | Procure, configure, patch | 👤 Customer |
| Relay VM | Deploy, patch, monitor | 👤 Customer |
| Networking | Configure inbound (Elastic & vLLM), outbound (Relay) | 👤 Customer |
| Identity/SSO | Configure SSO or Magic Link | 🤝 Joint |
| Cloud Services Layer | Provision & operate | 🏢 [Ai]levate |
| Compliance | HIPAA (cloud + infra) | 🤝 Shared |
