On-Prem Deployment Guide
This guide describes how to deploy the On-Premise version of [Ai]levate Revenue Recovery, where customers take responsibility for hosting and operating both the Database Storage Layer (Elastic datastore) and the AI Compute Layer (Tenstorrent AI Warehouse). [Ai]levate continues to manage the Cloud Services Layer, providing orchestration, workflow management, authentication, and application delivery. The Relay Service Layer remains customer-hosted and is used to securely connect the customer’s EHR datastore to the platform.
Introduction
The On-Premise model is designed for organizations with strict data residency, sovereignty, or infrastructure control requirements, giving them full ownership of compute and storage, while still benefiting from [Ai]levate’s managed SaaS orchestration layer.
General Approach
In the On-Premise model, customers deploy and manage their own Elastic datastore and AI Warehouse hardware, which serve as the backbone of the platform’s storage and compute layers. These components must be provisioned, secured, patched, and scaled by the customer, while [Ai]levate connects to them through secure service connections. Unlike the SaaS model, the AI Warehouse and datastore require inbound connectivity, meaning customers must configure firewall rules, certificates, and access policies to allow [Ai]levate services to interact with these systems.
The Relay Service Layer remains outbound-only, providing secure connectivity between the EHR system and [Ai]levate’s Cloud Services. This hybrid model enables customers to retain full control of infrastructure while relying on [Ai]levate for orchestration, governance, and compliance enforcement at the cloud layer.
Deployment overview
The On-Prem deployment of [Ai]levate Revenue Recovery is a joint operation between the customer and [Ai]levate. The customer provisions and secures all infrastructure components — Elastic datastore, AI Warehouse, and Relay VM — ensuring that each is reachable and properly configured. Once prerequisites are in place, [Ai]levate engineers onboard the Relay, validate inbound connections to Elastic and the AI Warehouse, and configure the Cloud Services Layer to orchestrate workflows across the customer environment. The process is deliberately staged to minimize disruption, with clear validation checkpoints at each step before moving to production.
```mermaid
sequenceDiagram
    participant C as Customer SysAdmin
    participant E as "[Ai]levate Engineer"
    participant R as Relay VM
    participant SQL as EHR Datastore
    participant ES as Elastic Datastore (On-Prem)
    participant AI as AI Warehouse (vLLM API)
    participant CS as "[Ai]levate Cloud Services"
    %% Preparation
    Note over C: Preparation Phase
    C->>C: Provision Elastic, AI Warehouse, Relay VM
    C->>SQL: Validate SQL connectivity (TCP 1433)
    C->>ES: Enable TLS & access policies
    C->>AI: Expose vLLM API securely
    %% Deployment
    Note over E,R: Deployment Phase
    E->>R: Onboard Relay VM (Azure Arc, tunnel, proxy)
    R->>SQL: Test LAN connectivity to EHR
    R->>CS: Establish outbound tunnel (443)
    %% Integration
    Note over E,CS: Integration Phase
    E->>ES: Validate inbound TLS connectivity from Cloud
    E->>AI: Validate inbound TLS connectivity from Cloud
    CS->>ES: Register Elastic datastore
    CS->>AI: Register AI Warehouse endpoint
    %% Validation
    Note over E,C: Validation Phase
    CS->>SQL: Retrieve test claims
    CS->>AI: Execute sample AI task
    CS->>ES: Store metadata & logs
    E->>C: Provide runbook, confirm handover
```
Step-by-Step Deployment
| Phase | Steps |
|---|---|
| Preparation (👤 Customer) | • Provision Elastic datastore, AI Warehouse hardware, and Relay VM. |
| Deployment (🤝 Joint) | • [Ai]levate engineer onboards Relay (Azure Arc agent, tunnel, proxy). |
| Integration (🤝 Joint) | • Validate inbound TLS connections from [Ai]levate Cloud to Elastic and AI Warehouse. |
| Validation (🤝 Joint) | • Run test claim ingestion from EHR. |
Technical Prerequisites
The following technical prerequisites must be prepared by the customer before deployment. These ensure the [Ai]levate platform can securely interact with customer-managed infrastructure.
Elastic Datastore Requirements
| Area | Requirement | Notes |
|---|---|---|
| Deployment | Customer-provisioned Elastic cluster. A service connection string must be provided to the [Ai]levate engineer (or a registered partner). | Minimum 3 nodes recommended for HA |
| Sizing | Based on claim volume and query concurrency | Plan for growth and redundancy |
| Encryption | AES-256 encryption at rest | Customer responsibility |
| Access | Service account with READ/WRITE permissions. A service connection must be shared with [Ai]levate | Used by [Ai]levate Cloud Services |
| Backup/DR | Backup, restore, and retention policies | Customer responsibility |
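The checks below sketch one way to confirm that the Elastic cluster is reachable over TLS and that a READ/WRITE service account exists before the connection string is shared. They assume a self-managed Elasticsearch cluster with the security APIs enabled; the hostname `elastic.customer.example`, the CA file, the credentials, and the `ailevate-*` index pattern are illustrative placeholders, not values prescribed by [Ai]levate.

```bash
# Confirm the cluster answers over TLS with the customer-issued CA
# (elastic.customer.example and ca.pem are placeholders).
curl --cacert ca.pem -u elastic:'<admin-password>' \
  "https://elastic.customer.example:9200/_cluster/health?pretty"

# Create a role limited to the indices the platform will use
# (the ailevate-* index pattern is illustrative, not prescribed).
curl --cacert ca.pem -u elastic:'<admin-password>' \
  -X PUT "https://elastic.customer.example:9200/_security/role/ailevate_rw" \
  -H 'Content-Type: application/json' \
  -d '{"indices":[{"names":["ailevate-*"],"privileges":["read","write","create_index","view_index_metadata"]}]}'

# Create the service account user whose connection details are shared with [Ai]levate.
curl --cacert ca.pem -u elastic:'<admin-password>' \
  -X PUT "https://elastic.customer.example:9200/_security/user/ailevate_svc" \
  -H 'Content-Type: application/json' \
  -d '{"password":"<strong-generated-password>","roles":["ailevate_rw"]}'
```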
AI Warehouse Requirements
A more detailed deployment guide for the Tenstorrent AI Warehouse hardware is available [HERE].
| Area | Requirement | Notes |
|---|---|---|
| Hardware | Customer-procured Tenstorrent hardware. A service connection string must be provided to the [Ai]levate engineer (or a registered partner). | Must be sized for workload concurrency |
| Software | [Ai]levate-provided AI model images | Runs in vLLM-compatible runtime |
| Connectivity | Inbound TLS access to vLLM API | Exposed to [Ai]levate Cloud Services |
| Security | Certificates + TLS enforced | Customer responsibility |
| Patch Management | OS, runtime, and firmware updates | Customer responsibility |
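Because the vLLM runtime exposes an OpenAI-compatible HTTP API, the AI Warehouse endpoint can be smoke-tested with curl once TLS is in place. This is a minimal sketch: the hostname, port 8080, CA file, and model name are placeholders to be replaced with the values from your deployment.

```bash
# List the model(s) served by the vLLM runtime (hostname/port are placeholders).
curl --cacert ca.pem "https://ai-warehouse.customer.example:8080/v1/models"

# Issue a minimal completion request to confirm end-to-end inference.
# The model name is whatever the [Ai]levate-provided image serves.
curl --cacert ca.pem "https://ai-warehouse.customer.example:8080/v1/chat/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model": "<served-model-name>", "messages": [{"role": "user", "content": "ping"}], "max_tokens": 8}'
```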
Relay Service Requirements
A more detailed deployment guide for the [Ai]levate Relay Service is available [HERE].
| Area | Requirement | Notes |
|---|---|---|
| Operating System | Ubuntu Server LTS (recommended) | Other modern Linux distros acceptable |
| Sizing | 2 vCPU, 4–8 GB RAM, 20 GB disk | Lightweight workload |
| Placement | Same LAN/subnet as SQL datastore | Must reach EHR SQL over TCP 1433 |
| Access | Local user with sudo privileges | Needed for [Ai]levate engineers |
| Connectivity | Outbound 443 to *.ailevate.com and Azure Arc services | No inbound rules required |
| Time Sync | NTP enabled and accurate | Required for TLS operations |
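Before handing the Relay VM over for onboarding, the prerequisites above can be spot-checked from the VM itself. This is a sketch: the SQL hostname and the `portal.ailevate.com` endpoint are placeholders (the actual *.ailevate.com endpoints are confirmed during onboarding), and `management.azure.com` is one of the public endpoints used by Azure Arc.

```bash
# EHR SQL reachability over the LAN (hostname is a placeholder).
nc -vz ehr-sql.customer.local 1433

# Outbound HTTPS to [Ai]levate and Azure Arc endpoints
# (portal.ailevate.com is an illustrative *.ailevate.com host).
curl -sS -o /dev/null -w '%{http_code}\n' https://portal.ailevate.com
curl -sS -o /dev/null -w '%{http_code}\n' https://management.azure.com

# NTP must be active and synchronized for TLS to work reliably.
timedatectl status | grep -E 'NTP service|System clock synchronized'
```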
Network & DNS Requirements
| Component | Requirement | Notes |
|---|---|---|
| Elastic Cluster | Allow inbound TLS connections from [Ai]levate Cloud Services | Secure firewall rules required |
| AI Warehouse | Expose vLLM API endpoint to [Ai]levate | TLS enforced |
| Relay VM | Outbound 443 to [Ai]levate and Azure Arc endpoints | Outbound-only design |
| Internal LAN | Relay must reach SQL datastore on TCP 1433 | Validate with nc -vz |
| DNS Resolution | Public FQDNs + internal hostnames resolvable | Forward + reverse lookups required |
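Forward and reverse DNS resolution can be verified with dig from the Relay VM or any host in the same segment; all hostnames and the IP address below are placeholders.

```bash
# Forward lookups: public FQDNs used by the platform and internal hostnames.
dig +short portal.ailevate.com            # illustrative *.ailevate.com host
dig +short elastic.customer.example       # internal Elastic endpoint (placeholder)
dig +short ai-warehouse.customer.example  # internal vLLM endpoint (placeholder)

# Reverse lookup for an internal address (placeholder IP).
dig +short -x 10.0.20.15
```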
Elastic Cluster Sizing
Capacity planning for On-Premise deployments must consider both storage and compute growth. Elastic clusters should be sized based on historical claim volumes, with headroom for growth and redundancy. Customers are responsible for scaling and upgrading both layers, while [Ai]levate ensures Cloud Services can orchestrate workloads efficiently across them.
As part of its operations, [Ai]levate Revenue Recovery stores Denied Claims and the associated AI analysis. This section provides both minimum and recommended hardware requirements depending on the volume of claims that [Ai]levate will have to analyze.
Due to the amount of data it processes, [Ai]levate is a CPU-intensive application. To avoid bottlenecks introduced by storage (disk or SAN) or compute, [Ai]levate provides both a minimal and a recommended configuration.
- The minimal configuration generally covers the needs of most infrastructures.
- The recommended configuration offers a better experience for large or highly active EHR infrastructures.
Data nodes (primary storage & query workhorses)
| Number of Claims to process | RAM (per data node) | CPU (vCPUs per node) | Disk Capacity (per node, SSD) | Minimal Performance | Recommended Performance | Cluster Notes |
|---|---|---|---|---|---|---|
| 1 – 2,500 | 16 GB (8 GB heap) | 4–8 | 500 GB | 200 MB/s, 3,000 IOPS | 400 MB/s, 6,000 IOPS | 3 data nodes, small cluster |
| 2,500 – 5,000 | 32 GB (16 GB heap) | 8–12 | 1 TB | 300 MB/s, 5,000 IOPS | 600 MB/s, 10,000 IOPS | 3–5 data nodes, 1 coord node |
| 5,001 – 7,500 | 64 GB (32 GB heap) | 12–16 | 2 TB | 400 MB/s, 7,500 IOPS | 800 MB/s, 15,000 IOPS | 5–7 data nodes, dedicated master |
| 7,500 – 10,000 | 96 GB (32 GB heap, rest OS cache) | 16–24 | 3–4 TB | 500 MB/s, 10,000 IOPS | 1,000 MB/s, 20,000 IOPS | 7–9 data nodes, 3 master nodes |
| 10,001 – 15,000 | 128 GB (32 GB heap, rest OS cache) | 24+ | 6 TB | 600 MB/s, 12,500 IOPS | 1,200 MB/s, 25,000 IOPS | 9–12 data nodes, coord + master split |
| 15,001 – 30,000+ | 256 GB (32 GB heap, rest OS cache) | 32+ | 8–12 TB | 800 MB/s, 15,000 IOPS | 1,500 MB/s, 30,000 IOPS | 12–20+ data nodes, large-scale cluster |
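As a sanity check, a short fio run on a data node's storage volume can confirm whether it meets the IOPS and throughput targets above before the node is put into service. The data path, test size, and runtime below are illustrative.

```bash
# Random-read IOPS check against the Elastic data volume (path is a placeholder).
fio --name=iops-check --directory=/var/lib/elasticsearch \
    --rw=randread --bs=4k --ioengine=libaio --iodepth=32 \
    --size=4G --runtime=60 --time_based --group_reporting

# Sequential throughput check (MB/s) on the same volume.
fio --name=throughput-check --directory=/var/lib/elasticsearch \
    --rw=read --bs=1M --ioengine=libaio --iodepth=16 \
    --size=4G --runtime=60 --time_based --group_reporting
```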
Dedicated master-eligible nodes (cluster state & consensus)
| Number of Claims to process | # Master Nodes | RAM (per node) | CPU (vCPUs) | Disk Capacity (SSD) | Minimal Performance | Recommended Performance | Notes |
|---|---|---|---|---|---|---|---|
| 1 – 2,500 | 0 (co-locate) or 3* | 16 GB (8–12 GB heap) | 4 | 100 GB | 100 MB/s, 500 IOPS | 200 MB/s, 1,000 IOPS | *3 dedicated for HA/SLA |
| 2,500 – 5,000 | 3 | 16 GB (8–12 GB heap) | 4–6 | 100–150 GB | 100 MB/s, 500 IOPS | 300 MB/s, 1,500 IOPS | AZ-spread required |
| 5,001 – 7,500 | 3 | 32 GB (16 GB heap) | 6–8 | 150 GB | 100 MB/s, 500 IOPS | 300 MB/s, 1,500 IOPS | Watch cluster state size |
| 7,500 – 10,000 | 3 | 32 GB (16 GB heap) | 8 | 200 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Keep them master-only |
| 10,001 – 15,000 | 3 | 32–64 GB (16–24 GB heap) | 8–12 | 200 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Increase heap if state grows |
| 15,001 – 30,000+ | 3 | 64 GB (24–32 GB heap) | 12–16 | 200–300 GB | 300 MB/s, 1,500 IOPS | 600 MB/s, 3,000 IOPS | Consider voter-only masters in very large clusters |
Coordinating (query/router) nodes
| Number of Claims to process | # Coordinating Nodes | RAM (per node) | CPU (vCPUs) | Disk Capacity (SSD) | Minimal Performance | Recommended Performance | Notes |
|---|---|---|---|---|---|---|---|
| 1 – 2,500 | 0–1 | 16 GB (8–12 GB heap) | 4–8 | 100 GB | 200 MB/s, 1,000 IOPS | 400 MB/s, 2,000 IOPS | Add if dashboards/API are spiky |
| 2,500 – 5,000 | 1 | 32 GB (16 GB heap) | 8–12 | 100–150 GB | 300 MB/s, 1,500 IOPS | 600 MB/s, 3,000 IOPS | Put behind LB |
| 5,001 – 7,500 | 1–2 | 64 GB (24–32 GB heap) | 12–16 | 150–200 GB | 400 MB/s, 2,000 IOPS | 800 MB/s, 4,000 IOPS | Scale with query concurrency |
| 7,500 – 10,000 | 2–3 | 64–96 GB (32 GB heap) | 16–24 | 200 GB | 500 MB/s, 3,000 IOPS | 1,000 MB/s, 6,000 IOPS | Keep stateless; autoscale if possible |
| 10,001 – 15,000 | 3–4 | 96–128 GB (32 GB heap) | 24+ | 200 GB | 600 MB/s, 4,000 IOPS | 1,200 MB/s, 8,000 IOPS | Separate read vs write paths if needed |
| 15,001 – 30,000+ | 4–6 | 128–192 GB (32 GB heap) | 32+ | 200–300 GB | 800 MB/s, 5,000 IOPS | 1,500 MB/s, 10,000 IOPS | Add more for heavy aggregations |
Additional Requirements
- Keep JVM heap ≤ 32 GB on all roles; give extra RAM to the OS page cache (Lucene loves it).
- Use SSD/NVMe everywhere; spinning disks won’t meet the IOPS targets.
- Shards: aim for ~50 GB per shard (avoid tiny shards); size node disk to keep utilization < 70%.
- HA: 3 dedicated masters, spread across AZs; avoid colocating master with data under load.
- Scale out > scale up: add nodes to handle concurrency instead of endlessly growing single-node specs.
- Networking: low latency between nodes (same region/AZ set); place clients near coordinating nodes.
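On a running cluster, the heap, disk, and shard guidance above can be verified with the _cat APIs (heap itself is set per node in jvm.options, e.g. -Xms31g / -Xmx31g to stay below 32 GB). The hostname and credentials below are placeholders.

```bash
# Confirm no node is configured with more than ~31 GB of JVM heap.
curl --cacert ca.pem -u ailevate_svc:'<password>' \
  "https://elastic.customer.example:9200/_cat/nodes?v&h=name,node.role,heap.max,ram.max"

# Check disk utilization per node; keep it under roughly 70%.
curl --cacert ca.pem -u ailevate_svc:'<password>' \
  "https://elastic.customer.example:9200/_cat/allocation?v"

# Review shard sizes; aim for shards around ~50 GB, avoiding many tiny shards.
curl --cacert ca.pem -u ailevate_svc:'<password>' \
  "https://elastic.customer.example:9200/_cat/shards?v&h=index,shard,prirep,store&s=store:desc"
```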
Networking
On-Premise Topology
This chart summarizes the [Ai]levate on-premise network topology.
```mermaid
flowchart TD
    subgraph AilevateCloud["[Ai]levate Cloud (Azure)"]
        CloudServices["Cloud Services Layer (RCM Orchestration, Auth, Apps)"]
    end
    subgraph Customer["Customer Environment"]
        EHR["EHR System (Epic, Cerner, etc.)"]
        Relay["Relay Service (Outbound-only)"]
        subgraph AICompute["AI Compute Layer"]
            Warehouse["Dedicated AI Warehouse (Tenstorrent Hardware, vLLM)"]
        end
        subgraph Storage["Database Storage Layer"]
            Elastic["Elastic Datastore (Encrypted at Rest, Tenant-Isolated)"]
        end
    end
    EHR <--> Relay
    Relay --> CloudServices
    CloudServices --> Warehouse
    CloudServices --> Elastic
```
Network Bandwidth Sizing
In addition to compute and storage capacity, the network bandwidth between the Cloud Services Layer (Azure), the AI Compute Layer (AI Warehouse), and the Database Storage Layer (Elastic) plays a critical role in ensuring predictable performance. Latency and insufficient bandwidth may directly impact the speed of claim ingestion, AI task execution, and overall denial remediation workflows. The following table provides minimum and recommended bandwidth values based on the average number of claim objects processed per minute.
```mermaid
flowchart LR
    subgraph Ailevate["[Ai]levate Cloud Services (Azure)"]
        Services["Cloud Services Layer<br/>(RCM Orchestration, Auth, Apps)"]
    end
    subgraph Customer["Customer Environment"]
        Elastic["Elastic Datastore (Encrypted)"]
        Warehouse["AI Warehouse (Tenstorrent, vLLM)"]
    end
    Services <--->|TLS Encrypted Traffic<br/>1–30 Mbps depending on claims| Elastic
    Services <--->|TLS Encrypted Traffic<br/>1–30 Mbps depending on claims| Warehouse
```
| Claims Processed per Minute | Minimum Bandwidth | Recommended Bandwidth |
|---|---|---|
| 1 – 500 | 1 Mbps | 2 Mbps |
| 501 – 750 | 5 Mbps | 10 Mbps |
| 751 – 4,000 | 15 Mbps | 30 Mbps |
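Where a rough validation of available bandwidth is useful before go-live, an iperf3 test between the on-prem network and a temporary cloud-side test VM gives a quick signal. The hostname below is a placeholder and the test VM is an assumption for illustration, not part of the [Ai]levate platform.

```bash
# On a temporary cloud-side test VM (placeholder host), run the server side:
iperf3 -s

# From the on-prem network, measure throughput for 30 seconds over 4 parallel streams:
iperf3 -c bandwidth-test.customer.example -t 30 -P 4
```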
Networking & Connectivity
Concept: Unlike the SaaS model, On-Premise deployments require inbound connectivity for both the Elastic datastore and the AI Warehouse. Customers must configure firewall rules, certificates, and DNS entries to securely expose these services to [Ai]levate Cloud Services. The Relay Service remains outbound-only, minimizing the attack surface on the EHR side. VPN or PrivateLink can be implemented to provide private connectivity instead of public TLS endpoints.
Network Flow Matrix: The table below summarizes all required network flows between the [Ai]levate Cloud Services, the customer-managed Elastic datastore, the AI Warehouse, and the Relay Service. It specifies the direction of traffic, the protocol/port, and the entity responsible for configuration. This matrix should be used by customer networking and security teams to validate firewall rules and connectivity.
```mermaid
flowchart TD
    subgraph Customer["Customer Environment"]
        EHR["EHR SQL Datastore"]
        Relay["Relay VM (Outbound-only)"]
        Elastic["Elastic Datastore (9200/tls)"]
        Warehouse["AI Warehouse (vLLM API, 8080/tls)"]
    end
    subgraph Ailevate["[Ai]levate Cloud Services (Azure)"]
        Cloud["Cloud Services Layer<br/>(Orchestration, Auth, Apps)"]
    end
    Relay -->|TCP 1433| EHR
    Relay -->|443 TLS| Cloud
    Cloud -->|9200 TLS| Elastic
    Cloud -->|8080 TLS| Warehouse
    Elastic -->|443 TLS| Cloud
    Warehouse -->|443 TLS| Cloud
```
| Source | Destination | Protocol / Port | Direction | Encryption | Responsibility |
|---|---|---|---|---|---|
| Relay VM (Customer) | [Ai]levate Cloud | HTTPS / 443 | Outbound | TLS 1.2+ | Customer (firewall rules) |
| Relay VM (Customer) | EHR Datastore | TCP / 1433 (NextGen) | LAN | TLS / Local Network | Customer |
| [Ai]levate Cloud Services | Elastic Datastore | HTTPS / 9200 (Elastic API) | Inbound to Customer | TLS 1.2+ | Customer (expose Elastic securely) |
| [Ai]levate Cloud Services | AI Warehouse (vLLM API) | HTTPS / 8080* | Inbound to Customer | TLS 1.2+ | Customer (expose vLLM securely) |
| Elastic Datastore | [Ai]levate Cloud Services | HTTPS / 443 | Outbound | TLS 1.2+ | Customer |
| AI Warehouse | [Ai]levate Cloud Services | HTTPS / 443 | Outbound | TLS 1.2+ | Customer |
| DNS Resolver (Customer) | Public DNS / Internal | UDP/TCP 53 | Outbound | — | Customer |
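On Ubuntu hosts, the inbound flows in the matrix can be expressed as ufw rules restricted to the address range [Ai]levate publishes for its Cloud Services. The 203.0.113.0/24 range below is a documentation placeholder, not an actual [Ai]levate range.

```bash
# On the Elastic nodes: allow the [Ai]levate Cloud Services range to reach the Elastic API.
sudo ufw allow from 203.0.113.0/24 to any port 9200 proto tcp

# On the AI Warehouse host: allow the same range to reach the vLLM API.
sudo ufw allow from 203.0.113.0/24 to any port 8080 proto tcp

# The Relay VM needs no inbound rules; confirm only outbound traffic is permitted.
sudo ufw status verbose
```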
Operation
In On-Premise deployments, customers assume operational ownership of the Elastic datastore and AI Warehouse, including patching, scaling, monitoring, backups, and disaster recovery. They must also maintain TLS certificates, inbound firewall rules, and access controls for both components. The Relay VM, while lightweight, also requires customer patching, SQL credential rotation, and log monitoring.
[Ai]levate remains responsible for the Cloud Services Layer, ensuring RCM workflows, authentication, orchestration, and application delivery operate seamlessly. This division of responsibilities allows customers to maintain sovereignty over their infrastructure while benefiting from [Ai]levate’s managed orchestration and compliance capabilities.
```mermaid
flowchart TB
    subgraph Customer["Customer Operations"]
        C1["Manage Elastic Datastore<br/>(scaling, backup, patching)"]
        C2["Operate AI Warehouse<br/>(hardware, vLLM runtime, patching)"]
        C3["Maintain Relay VM<br/>(patching, SQL creds, logs)"]
    end
    subgraph Ailevate["[Ai]levate Operations"]
        A1["Manage Cloud Services<br/>(apps, workflows, orchestration)"]
        A2["Ensure Compliance<br/>(HIPAA, tenant isolation guidance)"]
    end
    Customer -->|Provide infrastructure| Ailevate
    Ailevate -->|Platform orchestration| Customer
```
Security and Data Privacy
The On-Premise deployment enforces the same “secure by design” principles as the SaaS model but gives customers direct control over storage and compute. All data at rest in Elastic must be encrypted using AES-256, and all data in transit must use TLS 1.2+. Customers may configure BYOK via Azure Key Vault, integrating their own key lifecycle policies.
By design, the AI Warehouse never stores data, only executing tasks via the vLLM interface. The Elastic datastore remains entirely customer-controlled, with [Ai]levate accessing it only through secured service connections. Strong role-based access controls (RBAC) ensure fine-grained permissions within applications, while single-tenant isolation guarantees logical separation across customers.
Instead of replicating or exporting datasets, [Ai]levate employs secure data sharing patterns, executing queries without exposing raw storage. This approach, combined with HIPAA compliance across the platform, enables healthcare organizations to maintain sovereignty while meeting regulatory requirements.
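TLS enforcement on the customer-exposed endpoints can be spot-checked with openssl s_client: a TLS 1.2 handshake should succeed while older protocol versions are refused. The hostnames are placeholders, and the -tls1_1 probe only works if the local OpenSSL build still supports TLS 1.1.

```bash
# Expect a successful handshake with TLS 1.2 (or newer) on the Elastic endpoint.
openssl s_client -connect elastic.customer.example:9200 -tls1_2 -brief </dev/null

# Expect this handshake to be rejected when TLS 1.0/1.1 are disabled server-side.
openssl s_client -connect elastic.customer.example:9200 -tls1_1 -brief </dev/null

# The same checks apply to the AI Warehouse vLLM endpoint on its TLS port.
openssl s_client -connect ai-warehouse.customer.example:8080 -tls1_2 -brief </dev/null
```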
```mermaid
flowchart TB
    subgraph Customer["Customer-Controlled"]
        Elastic["Elastic Datastore (Encrypted, BYOK supported)"]
        Warehouse["AI Warehouse (Tenstorrent, vLLM - No Storage)"]
    end
    subgraph Ailevate["[Ai]levate Cloud Services"]
        Services["Orchestration, Auth, Workflows"]
    end
    Services --> Elastic
    Services --> Warehouse
    classDef secure fill:#cce5ff,stroke:#333,stroke-width:1.5px;
    classDef private fill:#e6ffe6,stroke:#333,stroke-width:1.5px;
    Elastic:::private
    Warehouse:::private
    Services:::secure
```
Security Recommendations for On-Premise Implementation
| Control Area | On-Premise Implementation |
|---|---|
| Encryption | AES-256 for storage and transit; customer-managed |
| Key Management | BYOK via Azure Key Vault; customer controls policy |
| Separation of Duties | Elastic stores data; AI Warehouse executes only |
| Tenant Isolation | Logical isolation across [Ai]levate Cloud Services |
| Access Control (RBAC) | Fine-grained role-based permissions in apps |
| Secure Data Sharing | Queries executed without exposing raw data |
| Compliance | HIPAA alignment across storage, compute, and cloud |
Checklist (Summary)
| Responsibility | Action | Owner |
|---|---|---|
| Elastic datastore | Provision, encrypt, backup | 👤 Customer |
| AI Warehouse | Procure, configure, patch | 👤 Customer |
| Relay VM | Deploy, patch, monitor | 👤 Customer |
| Networking | Configure inbound (Elastic & vLLM), outbound (Relay) | 👤 Customer |
| Identity/SSO | Configure SSO or Magic Link | 🤝 Joint |
| Cloud Services Layer | Provision & operate | 🏢 [Ai]levate |
| Compliance | HIPAA (cloud + infra) | 🤝 Shared |
