uhg-grid-knowledge
Expert knowledge about UHG's Grid multi-cloud service mesh - architecture, IP addressing, DNS, service registration, security model, performance characteristics, and troubleshooting
UHG Grid Multi-Cloud Service Mesh Knowledge Base
What is The Grid?
The Grid is UHG's internally-built multi-cloud service mesh that connects isolated cloud accounts (AWS, Azure, GCP) to each other and to on-premise datacenters.
Core Problem Solved:
- Cloud accounts are isolated "islands" by default
- Traditional networking takes 1-2 months per connection
- 8,000+ applications need rapid connectivity
- Multiple cloud providers need unified service discovery
Grid Solution:
- 5-10 minute connectivity (vs 1-2 months traditional)
- Automatic service discovery across all clouds
- 100% automated (no manual tickets)
- Supports overlapping IPs (app teams use any IPs they want)
Architecture Components
Core Technologies
┌─────────────────────────────────────────────────────────────────┐
│                            THE GRID                             │
│                                                                 │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐         │
│  │    Consul    │   │   HAProxy    │   │     BIND     │         │
│  │ Service Mesh │──>│ Reverse Proxy│──>│  DNS Server  │         │
│  │ + Discovery  │   │  + Routing   │   │ Authoritative│         │
│  └──────────────┘   └──────────────┘   └──────────────┘         │
│         │                  │                  │                 │
│         └────────── consul-template ──────────┘                 │
│                (Auto Config Deployment)                         │
└─────────────────────────────────────────────────────────────────┘
1. Hashicorp Consul
- Service mesh foundation
- Service registration and discovery
- Key-Value store for configuration
- Health checking
- Admin partitions for isolation (partition = askid)
2. HAProxy
- Reverse proxy reading from Consul
- Dynamically routes based on service discovery
- Load balancing across instances
- TLS termination and passthrough
3. consul-template
- Watches Consul for changes
- Automatically regenerates HAProxy configs
- Triggers HAProxy reloads (zero downtime)
4. BIND DNS
- Authoritative DNS per environment
- Two patterns: grid.uhg.com and mesh.uhg.com
- DNS views for different query sources
- RPZ (Response Policy Zones) with wildcard CNAME
5. Hashicorp Vault
- Dynamic credential generation
- PKI secrets engine (per-askid intermediate CA)
- Consul secrets engine (time-limited tokens)
- Integration with Venafi for CA issuance
IP Addressing & Network Topology
Grid Infrastructure Address Space
CGNAT Space (RFC6598 Compliant):
100.64.0.0/10 - Designed for service provider use (like Grid)
├─> 100.65.0.0/16 - CIH (Cloud Integration Hub) connectivity
├─> 100.72.0.0/13 - Grid West (us-west-2)
├─> 100.80.0.0/12 - Grid Central (us-east-2, centralus)
└─> 100.112.0.0/12 - Grid East (us-east-1, eastus)
Regional Breakdown:
- West: 100.72.0.0/13 (prod), 100.74.0.0/16 (prod), 100.75.0.0/16 (stage)
- Central: 100.80.0.0/12 (prod), 100.92.0.0/15 (prod), 100.94.0.0/15 (stage)
- East: 100.112.0.0/12 (prod), 100.124.0.0/15 (prod), 100.126.0.0/15 (stage)
- CIH: 100.65.0.0/16 (connectivity infrastructure, physical layer)
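To sanity-check an allocation against the ranges above, a short POSIX shell sketch can test whether one CIDR block sits inside another. The helper names `ip_to_int` and `cidr_contains` are illustrative and are not part of any Grid tooling.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 when CIDR block $2 lies entirely inside CIDR block $1.
cidr_contains() {
    parent_ip=${1%/*}; parent_len=${1#*/}
    child_ip=${2%/*};  child_len=${2#*/}
    # A child with a shorter prefix is larger than the parent.
    [ "$child_len" -ge "$parent_len" ] || return 1
    mask=$(( (0xFFFFFFFF << (32 - parent_len)) & 0xFFFFFFFF ))
    parent_net=$(( $(ip_to_int "$parent_ip") & mask ))
    child_net=$(( $(ip_to_int "$child_ip") & mask ))
    [ "$parent_net" -eq "$child_net" ]
}
```

For example, `cidr_contains 100.64.0.0/10 100.112.0.0/12` succeeds, while `cidr_contains 100.64.0.0/10 10.100.1.0/24` fails because an app-team range is outside the CGNAT space.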
Why 100.x.x.x?
- RFC6598 CGNAT space (Carrier-Grade NAT)
- Designed specifically for service provider use
- No conflicts with public internet routing
- Supports overlapping customer IPs via NAT
Gateway Deployment Model
NOT Hub-and-Spoke:
- Each askid gets dedicated gateway group (not shared hub)
- 1:1 VNet/VPC peering between Grid gateway VNet and app team VNet
- Enables overlapping IP support (app teams can use any IPs)
- No central bottleneck (distributed architecture)
Gateway Network Sizing:
- Typically /26 or /27 network per gateway group
- Peered directly with matching askid+csp+env+region app networks
- Example: grid-gateways-uhgwm110-022715-azure-prod-eastus
DNS Architecture
Two DNS Patterns
1. mesh.uhg.com - Direct to Consul
Purpose: Within-partition or directly routable connections
Returns: Actual instance IPs (from Consul service registration)
Use case: Services within same partition/trust boundary
Example: myservice.service.uhgwm110-022715.ap.centralus.azu.mesh.uhg.com
Performance: Lower latency (direct routing, no Grid gateway hop)
2. grid.uhg.com - Via Grid Gateways
Purpose: Cross-partition communication (trust boundaries)
Returns: Grid gateway IPs (HAProxy endpoints)
Use case: Cloud-to-cloud, cloud-to-on-prem, cross-askid
Example: myservice.service.uhgwm110-022715.ap.centralus.azu.grid.uhg.com
Performance: Slightly higher latency (Grid gateway hop for security/routing)
DNS Structure
Format:
<service-name>.service.<askid>.ap.<region>.<cloud>.<environment>.grid.uhg.com
(the final domain is either grid.uhg.com or mesh.uhg.com)
Components:
├─> service-name: Your service (e.g., api, database, web)
├─> service: Literal string (always "service")
├─> askid: Application identifier (e.g., uhgwm110-022715, aide-0088590)
├─> ap: Admin partition (always "ap")
├─> region: Cloud region (e.g., eastus, centralus, us-east-1)
├─> cloud: Provider (azu, aws, gcp)
├─> environment: Optional — omit for prod (e.g. "stage" for non-prod)
└─> domain: grid.uhg.com or mesh.uhg.com
Examples:
- Production: api.service.uhgwm110-022715.ap.eastus.azu.grid.uhg.com
- Stage: api.service.uhgwm110-022715.ap.eastus.azu.stage.grid.uhg.com
- Mesh (direct): api.service.uhgwm110-022715.ap.eastus.azu.mesh.uhg.com
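The naming format can be assembled mechanically. The sketch below is a hypothetical helper (`grid_fqdn` is not an existing Grid command) that builds a name from its components, omitting the environment label for prod as the format describes.

```shell
#!/bin/sh
# grid_fqdn SERVICE ASKID REGION CLOUD [ENVIRONMENT] [DOMAIN]
# ENVIRONMENT is left empty for prod; DOMAIN defaults to grid.uhg.com.
grid_fqdn() {
    service=$1
    askid=$2
    region=$3
    cloud=$4
    environment=${5:-}
    domain=${6:-grid.uhg.com}
    # ${environment:+...} expands to "stage." (for example) only when set.
    printf '%s.service.%s.ap.%s.%s.%s%s\n' \
        "$service" "$askid" "$region" "$cloud" \
        "${environment:+$environment.}" "$domain"
}
```

Usage: `grid_fqdn api uhgwm110-022715 eastus azu stage` yields the stage name, and passing an empty environment with `mesh.uhg.com` as the sixth argument yields the direct mesh name.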
BIND DNS Views
Default View:
Matches: Grid gateways, corporate DNS
Recursion: Enabled
Purpose: Standard DNS resolution for Grid infrastructure
Red View (Future):
Matches: Red network sources
Recursion: Disabled (no recursion for untrusted networks)
Purpose: Isolated DNS for segregated environments
Service Registration
Prerequisites
Before registering any service:
- Node IP: IP within your VNet/VPC address space (from PCAM)
- Service Port: Port your service listens on (1-65535)
- Health Check Endpoint: URL/port indicating health (mandatory)
- Terraform Setup: TFE workspace with Consul + Vault providers
Registration Process
Step 1: Register Node
resource "consul_node" "app_node" {
  name    = "my-app-node"  # Alphanumeric, dashes, periods only
  address = "10.100.1.10"  # IP from your VNet/VPC

  meta = {
    external-node  = "true"
    external-probe = "true"
  }
}
Step 2: Register Service
TCP Service:
resource "consul_service" "tcp_service" {
  name = "my-service"  # Alphanumeric, dashes only
  node = consul_node.app_node.name
  port = 3306

  tags = [
    "tag_key:<value>",  # Must use quotes
  ]

  check {
    check_id                          = "tcp-health"
    name                              = "TCP Connection Check"
    tcp                               = "10.100.1.10:3306"
    interval                          = "60s"
    timeout                           = "5s"
    deregister_critical_service_after = "90m"
  }
}
HTTPS Service:
resource "consul_service" "https_service" {
  name = "my-service"
  node = consul_node.app_node.name
  port = 443

  tags = [
    "https",                   # Protocol designation
    "haproxy_mode: https",     # HAProxy traffic handling
    "haproxy_port_ssl: true",  # Enable SSL functionality
    "tag_key:<value>",
  ]

  check {
    check_id = "https-health"
    name     = "HTTPS Health Check"
    # The check attribute is "http"; the https:// scheme in the URL selects TLS
    http                              = "https://10.100.1.10:443/health"
    interval                          = "5s"
    timeout                           = "2s"
    deregister_critical_service_after = "30m"
  }
}
Step 3: Automatic Propagation
- consul-template detects new service (watches Consul)
- HAProxy config regenerated with new backend
- HAProxy reloaded (zero downtime)
- BIND DNS updated automatically
- Service available at: <service-name>.service.<askid>.ap.<region>.<cloud>.grid.uhg.com
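Because propagation is automatic but not instantaneous, a deployment script may want to wait for the new name to resolve. The following is a minimal sketch, assuming `dig` is installed; `wait_for_dns` and its defaults (60 tries, 10 seconds apart, roughly the 10-minute upper bound quoted for connectivity) are illustrative choices.

```shell
#!/bin/sh
# wait_for_dns NAME [TRIES] [DELAY_SECONDS]
# Polls until NAME resolves, prints the answer, and returns 0.
# Returns 1 if the name never resolves within TRIES attempts.
wait_for_dns() {
    name=$1
    tries=${2:-60}
    delay=${3:-10}
    while [ "$tries" -gt 0 ]; do
        answer=$(dig +short "$name")
        if [ -n "$answer" ]; then
            printf '%s\n' "$answer"
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep "$delay"
    done
    return 1
}
```

Usage: `wait_for_dns api.service.uhgwm110-022715.ap.eastus.azu.grid.uhg.com || echo "not resolvable yet"`.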
Reserved Ports (DO NOT USE)
Grid Infrastructure Ports:
22 - SSH
25 - SMTP
53, 953 - DNS
80 - HTTP (returns 400 - SSL required)
389, 636 - LDAP
443 - HTTPS (service communication)
464 - Kerberos SSL (to ad-ldap-app.uhc.com)
2000 - TCP SNI (gateway-to-gateway)
2001 - TCP Proxy Protocol
3389 - RDP
8200-8202 - Vault
8300-8302 - Consul (server RPC, Serf LAN/WAN gossip)
8500-8501 - Consul HTTP/HTTPS API
8600 - Consul DNS
9000 - HAProxy stats
9443 - mTLS HTTPS (gateway-to-gateway)
9999 - Dynatrace
10000 - HAProxy peers
Safe Application Ports:
- MySQL: 3306, 3307, 3308
- PostgreSQL: 5432
- MSSQL: 1433
- Kafka: 8083 (TLS)
- Custom ports outside reserved ranges
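A registration pipeline could reject reserved ports before calling Terraform. This sketch hard-codes the reserved list from this section; `is_reserved_port` is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if the port is on the Grid reserved list, 1 otherwise.
is_reserved_port() {
    case $1 in
        22|25|53|80|389|443|464|636|953)              return 0 ;;
        2000|2001|3389)                               return 0 ;;
        8200|8201|8202|8300|8301|8302|8500|8501|8600) return 0 ;;
        9000|9443|9999|10000)                         return 0 ;;
        *)                                            return 1 ;;
    esac
}
```

Usage: `is_reserved_port "$PORT" && echo "port $PORT is reserved by Grid infrastructure"`.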
Security & Access Control
TLS Requirements
Mandatory:
- ✅ All Grid services MUST use TLS/SSL (HTTPS)
- ✅ Grid gateway → Grid gateway uses mTLS (port 9443)
- ✅ Must use Optum-sanctioned root CAs
- ❌ Self-signed certificates NOT supported
Certificate Sources:
⚠️ Privileged operation — modifies system CA trust. Requires explicit user confirmation; never run autonomously. Verify the certificate fingerprint against the UHG PKI source before installing.
Repository: https://repo1.uhc.com/artifactory/UHG-certificates/
├─> Java: standard_trusts.jks (KeyStore format)
└─> VMs/K8s: standard_trusts.pem (PEM bundle)
Installation:
# Debian/Ubuntu (the file must end in .crt; update-ca-certificates skips other extensions)
/usr/local/share/ca-certificates/standard_trusts.crt
sudo update-ca-certificates
# RHEL/CentOS
/etc/pki/ca-trust/source/anchors/standard_trusts.pem
sudo update-ca-trust
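Before copying the bundle into a trust directory, the downloaded file can be compared against a known-good digest. This is only a sketch: it assumes the PKI source provides a SHA-256 checksum for the bundle (the text above only says to verify a fingerprint), that `sha256sum` is available, and `verify_sha256` is a hypothetical helper.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX
# Returns 0 only when the file's SHA-256 digest matches EXPECTED_HEX.
verify_sha256() {
    actual=$(sha256sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}
```

Usage: `verify_sha256 standard_trusts.pem "$EXPECTED_SHA256" || echo "digest mismatch, do not install"`, where `EXPECTED_SHA256` is obtained out of band.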
PKI Architecture
CA Hierarchy:
UHG-Grid-RootCA1 (PKI team, HSM-backed)
└─> UHG-Grid-PolicyCA1 (6-year intermediate, expires 2/26/2030)
└─> Per-askid Intermediate CA (path length 0, leaf-only)
└─> Service certificates (62-day TTL)
Certificate Lifecycle:
- TTL: 62 days
- Rotation: Every 21 days (immutable infrastructure)
- Process: Complete VM replacement (tear down, revoke cert, provision new VM)
- Venafi: Issues per-askid intermediate CA (3-year validity)
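The gap between the 62-day TTL and the 21-day rotation leaves a safety margin that can be checked on a running host. The sketch below uses `openssl x509 -checkend`; the helper names and any certificate path passed in are hypothetical.

```shell
#!/bin/sh
# Days of validity still left on a certificate at the moment it is rotated out.
rotation_margin_days() {
    echo $(( $1 - $2 ))   # $1 = TTL in days, $2 = rotation interval in days
}

# Exit 0 if the PEM certificate in $1 stays valid for at least $2 more days.
cert_valid_for_days() {
    openssl x509 -in "$1" -noout -checkend $(( $2 * 86400 ))
}
```

With the values above, `rotation_margin_days 62 21` gives 41, so a certificate that fails `cert_valid_for_days <cert.pem> 41` has outlived its expected rotation.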
Access Control Model
Vault + Consul Integration:
Vault namespace = Consul partition = TFE project = askid
Example: askid "uhgwm110-022715"
├─> Vault namespace: uhgwm110-022715
├─> Consul partition: uhgwm110-022715
└─> TFE project: uhgwm110-022715
Dynamic Credentials (Time-Limited):
Consul secrets engine path: consul/{cloud}/{region}
├─> Located within askid's Vault namespace
├─> Different Vault instances per environment
├─> Tokens are time-limited (seconds to hours)
└─> No static credentials
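Checking out a token might look like the sketch below. It assumes the standard Vault Consul secrets engine layout (`<mount>/creds/<role>`), that `VAULT_ADDR` already points at the right Vault instance, and that you are logged in; the role name and the exact cloud/region path labels are placeholders to confirm with the Grid team.

```shell
#!/bin/sh
# Build the creds path under the consul/{cloud}/{region} mount.
# $1 = cloud, $2 = region, $3 = role name (hypothetical)
consul_creds_path() {
    printf 'consul/%s/%s/creds/%s\n' "$1" "$2" "$3"
}

# checkout_consul_token ASKID CLOUD REGION ROLE
# VAULT_NAMESPACE selects the askid namespace; -field=token prints only the token.
checkout_consul_token() {
    VAULT_NAMESPACE=$1 vault read -field=token \
        "$(consul_creds_path "$2" "$3" "$4")"
}
```

Usage: `CONSUL_TOKEN=$(checkout_consul_token uhgwm110-022715 azu eastus my-role)`, after which the token expires on its own.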
AD Group-Based Access:
Format: ARC_Vault_{ASKID}_{env}_{permission}
Examples:
├─> ARC_Vault_UHGWM110_022715_nonprod_Read
├─> ARC_Vault_UHGWM110_022715_nonprod_Write
├─> ARC_Vault_UHGWM110_022715_prod_Read
└─> ARC_Vault_UHGWM110_022715_prod_Write
Token Roles:
├─> Read group = read-only Consul tokens
└─> Write group = service-registration tokens
Terraform Enterprise:
├─> ARC_Terraform_UHGWM110_022715_Read
└─> ARC_Terraform_UHGWM110_022715_Write
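The group names follow directly from the askid. This sketch assumes, based only on the examples above, that the askid is upper-cased and its dash replaced with an underscore; the function names are illustrative.

```shell
#!/bin/sh
# Upper-case an askid and swap dashes for underscores (inferred convention).
askid_token() {
    printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '-' '_'
}

# vault_ad_group ASKID ENV PERMISSION  ->  ARC_Vault_{ASKID}_{env}_{permission}
vault_ad_group() {
    printf 'ARC_Vault_%s_%s_%s\n' "$(askid_token "$1")" "$2" "$3"
}

# tfe_ad_group ASKID PERMISSION  ->  ARC_Terraform_{ASKID}_{permission}
tfe_ad_group() {
    printf 'ARC_Terraform_%s_%s\n' "$(askid_token "$1")" "$2"
}
```

Usage: `vault_ad_group uhgwm110-022715 nonprod Read` prints the group to request for read-only Consul tokens in non-prod.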
Why This Security Model Works:
Compromised network CANNOT modify routing:
├─> Need AD group membership (can't get without IAM access)
├─> Need to check out Vault token (requires AD group)
├─> Need to modify Consul services (requires valid token)
└─> Network access ≠ routing control (identity-based security)
Cross-Partition Connectivity
CMDB Supply Chain Dependencies (MANDATORY):
For cross-askid communication (different Consul partitions):
├─> Step 1: Service Line Owner (SLO) requests dependency
├─> Step 2: Documented in CI Central (ServiceNow)
├─> Step 3: Relationship: "Critical for Continuous Operation (OP)"
├─> Step 4: Consul KV updated with approved upstreams
├─> Step 5: HAProxy config auto-generated
└─> Step 6: Service accessible (after a 25-hour propagation delay)
Verification:
├─> Consul UI → Select askid → Key/Value → grid → upstreams
└─> Target askid should appear in upstream list
Important: Unidirectional (A→B doesn't grant B→A)
Without CMDB approval, cross-partition communication will NOT work.
Platform Governance
Dual TFE Clusters
terraform.uhg.com (App Teams):
- Application infrastructure
- App team workspaces
- Service registration
- Read/Write AD groups grant access
tfe-arc.uhg.com (Platform Engineers):
- Grid infrastructure
- Initializers (onboarding automation)
- Grid gateways
- Consul partitions, Vault namespaces, PKI setup
- Walled off from app teams
- Only accessible by platform/ops
Zero Customization Policy
Immutable Naming Convention:
Askid naming is immutable:
├─> Format: uhgwm{digits}-{digits} or aide-{digits}
├─> NO customization allowed
├─> NO exceptions (even for platform engineers)
├─> Example rejected: "my-special-app-project"
└─> Must use: "uhgwm110-022715" (as issued)
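The two allowed askid shapes can be expressed as a single pattern. The sketch below is an illustrative check, not the validation the platform itself runs; `is_valid_askid` is a hypothetical name.

```shell
#!/bin/sh
# Return 0 if $1 matches uhgwm{digits}-{digits} or aide-{digits}.
is_valid_askid() {
    printf '%s\n' "$1" | grep -Eq '^(uhgwm[0-9]+-[0-9]+|aide-[0-9]+)$'
}
```

Usage: `is_valid_askid "my-special-app-project" || echo "rejected: not an issued askid"`.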
Everything automated:
├─> Engineers cannot manually create workspaces
├─> Engineers cannot manually modify things
├─> All changes through automation
└─> Grid team follows same rules (no special privileges)
Performance Characteristics
Production Data (Azure)
Grid:
Total traffic: 8-15 Gbps sustained
Single askid: 3+ Gbps (cross-region)
Latency: 7.155 ms average (on-prem to Azure East)
Per-connection throughput: 2.73 Gbps measured (iperf3)
Multi-stream (7): 10.0 Gbps sustained
Path: 12 hops (complete visibility)
Status: 3,000+ Azure subscriptions, 3+ years GA
NGCN (Aviatrix + Palo Alto) for comparison:
Total traffic: 100-420 Mbps (entire fabric)
Latency: 26.189 ms average (on-prem to Azure)
Per-connection: ~1.2 Gbps reported (iperf3 testing disabled)
Path: ~20 hops (destination cannot be confirmed by traceroute)
Status: Minimal adoption despite years available
Grid vs NGCN:
- Grid handles 19x-150x MORE traffic
- Grid has 3.66x lower latency (73% latency reduction)
- Grid uses 40% fewer hops (12 vs ~20)
- Grid allows performance verification (transparent)
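The comparison figures can be reproduced from the raw measurements listed above. The sketch below wraps the arithmetic in a hypothetical `grid_vs_ngcn` function and uses `awk` for the floating-point math; NGCN's hop count is taken as 20, the approximate figure given.

```shell
#!/bin/sh
# Recompute the Grid vs NGCN ratios from the published measurements.
grid_vs_ngcn() {
    awk 'BEGIN {
        # Traffic in Mbps: Grid 8000-15000, NGCN 100-420
        printf "traffic ratio: %.0fx to %.0fx\n", 8000 / 420, 15000 / 100
        # Average latency in ms: NGCN 26.189, Grid 7.155
        printf "latency ratio: %.2fx\n", 26.189 / 7.155
        printf "latency reduction: %.0f%%\n", (1 - 7.155 / 26.189) * 100
        # Hops: Grid 12, NGCN about 20
        printf "hop reduction: %.0f%%\n", (20 - 12) / 20 * 100
    }'
}
```

Running `grid_vs_ngcn` prints a traffic ratio of 19x to 150x, a latency ratio of 3.66x (a 73% reduction), and a hop reduction of 40%.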
Expected Latency Ranges
On-Premise to Cloud:
- Azure East: ~7-10 ms
- Azure Central: ~10-15 ms
- Azure West: ~15-25 ms
- AWS: Similar to Azure (Direct Connect)
- GCP: Similar to Azure (Cloud Interconnect)
Cloud-to-Cloud (via Grid):
- Same region: +1-3 ms (Grid gateway hop)
- Cross-region: Depends on cloud provider backbone
- Cross-cloud: Via on-prem transit (higher latency)
Deployment Status
| Cloud Provider | Region | Non-Production | Production | Notes |
|---|---|---|---|---|
| Azure | Central | ✅ GA | ✅ GA | 3+ years, 8-15 Gbps traffic |
| Azure | East | ✅ GA | ✅ GA | 3+ years, proven |
| Azure | West | ✅ GA | ✅ GA | 3+ years, proven |
| GCP | Central | ✅ GA | ✅ GA | 2+ years, proven |
| GCP | East | ✅ GA | ✅ GA | 2+ years, proven |
| AWS | All Regions | 🚧 Pending | 🚧 Pending | v2 integration in progress |
Note: GCP Grid overlay unavailable for teams using Aviatrix networking in GCP.
Cloud-Specific Implementation
Azure Grid
Status: ✅ Generally Available (Central, East, West)
Azure SQL Managed Instance Integration:
- ⚠️ Cannot proxy entire SQL MI subnet (not permitted)
- ✅ Must use private endpoints for SQL MI connectivity
Private Endpoint Setup:
- Create private endpoint for SQL MI instance
- Register private endpoint IP with Grid (port 1433)
- Submit ServiceNow request for certificate SAN updates
- Include: SQL MI FQDN from private endpoint DNS tab
- Open DNS change ticket to alias Grid gateway IPs
Why Private Endpoints?
- SQL MI enforces TLS host name verification
- Private endpoints enable E2E encryption + authentication
- Maintains network isolation while enabling connectivity
GCP Grid
Status: ✅ Generally Available (Central, East)
Cloud-Native Service Connectivity (Cloud SQL, etc.):
Private Service Connect (PSC) Required:
psc_config {
  psc_enabled = true
  allowed_consumer_projects = [
    "<APP-PROJECT-ID>",
    "<GRID-PROJECT-ID>"
  ]
}
Grid Project IDs (example — verify current IDs with Grid team):
├─> NonProd: <GRID-NONPROD-PROJECT-ID>
└─> Prod: <GRID-PROD-PROJECT-ID>
PSC Connection Process:
- Provision cloud-native service with Grid project ID in PSC allowlist
- Notify Grid team with:
- Connection Name
- Service attachment
- Default TCP port
- Grid team creates PSC endpoint (automated)
- Verify connectivity
Important:
- ⚠️ Services not provisioned with PSC cannot be retrofitted (must recreate)
- 🚀 Grid team working on automatic PSC endpoint scanning/creation
- ❌ Grid overlay unavailable for Aviatrix networking in GCP
AWS Grid
Status: 🚧 v2 Integration Pending (All Regions)
Infrastructure Foundation:
- Physical layer: 4x 100 Gbps Direct Connect circuits
- Network fabric: AWS Cloud WAN (us-east-1, us-east-2, us-west-2)
- Gateway layer: TGW peerings to Cloud WAN
- Connectivity: Cologix (Minneapolis) + Equinix (Chicago)
When Complete:
- Same service registration patterns as Azure/GCP
- Direct Connect provides low-latency on-prem connectivity
- Cloud WAN enables automatic inter-region routing
Troubleshooting
Common Issues
1. Service Not Accessible
Check:
├─> Is service registered in Consul? (Consul UI → Services)
├─> Is health check passing? (a check left critical past deregister_critical_service_after deregisters the service)
├─> Is CMDB dependency configured? (For cross-askid)
├─> Is DNS resolving? (dig or nslookup)
└─> Are TLS certificates valid? (openssl s_client)
Resolution:
├─> Failed health check: Fix application health endpoint
├─> Missing CMDB: Request supply chain dependency (25-hour propagation)
├─> DNS not resolving: Check service name format
└─> TLS errors: Install standard_trusts.pem
2. Cross-Partition Communication Fails
Symptom: Can't reach service in different askid
Root cause: Missing CMDB supply chain dependency
Resolution:
├─> Step 1: Verify askid dependency in CI Central
├─> Step 2: Request "Critical for Continuous Operation (OP)" relationship
├─> Step 3: Wait 25 hours for propagation
├─> Step 4: Verify in Consul KV: grid → upstreams
└─> Step 5: Test connectivity again
3. TLS/SSL Verification Failures
⚠️ Privileged operation — modifies system CA trust. Requires explicit user confirmation; never run autonomously. Verify the certificate fingerprint against the UHG PKI source before installing.
Symptom: SSL certificate validation errors
Root cause: Missing Optum standard trust store
Resolution:
# Debian/Ubuntu (update-ca-certificates only picks up files ending in .crt)
sudo curl -o /usr/local/share/ca-certificates/standard_trusts.crt \
https://repo1.uhc.com/artifactory/UHG-certificates/standard_trusts.pem
sudo update-ca-certificates
# RHEL/CentOS
sudo curl -o /etc/pki/ca-trust/source/anchors/standard_trusts.pem \
https://repo1.uhc.com/artifactory/UHG-certificates/standard_trusts.pem
sudo update-ca-trust
# Verify
openssl s_client -connect service.grid.uhg.com:443
4. Port Conflicts
Symptom: Service registration works but connection fails
Root cause: Using reserved Grid infrastructure port
Resolution:
├─> Check port against reserved list (see Reserved Ports section)
├─> Use different port (safe ranges: 3306-3308, 5432, 1433, etc.)
└─> Update service registration with new port
Diagnostic Commands
Check Service Registration:
# Via Consul API (requires token)
curl -H "X-Consul-Token: $CONSUL_TOKEN" \
https://consul.service.consul:8501/v1/catalog/service/my-service
# Via Consul UI (browser)
https://consul-ui.grid.uhg.com → Services → Search
Check DNS Resolution:
# Check grid.uhg.com (via Grid gateways)
# Should return Grid gateway IPs (100.x.x.x range)
dig api.service.uhgwm110-022715.ap.eastus.azu.grid.uhg.com
# Check mesh.uhg.com (direct to instances)
# Should return the registered instance IPs (from your VNet/VPC)
dig api.service.uhgwm110-022715.ap.eastus.azu.mesh.uhg.com
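To tell quickly whether an answer is a Grid gateway address or an instance address, the returned IP can be tested against the 100.64.0.0/10 CGNAT range. The sketch below is illustrative; `is_grid_gateway_ip` is a hypothetical helper.

```shell
#!/bin/sh
# Return 0 if the IPv4 address falls in 100.64.0.0/10 (100.64.x.x to 100.127.x.x).
is_grid_gateway_ip() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into octets
    IFS=$old_ifs
    [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
}
```

Usage: `is_grid_gateway_ip "$(dig +short <name> | head -n 1)" && echo "resolved to a Grid gateway"`.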
Test Connectivity:
# Ping Grid gateway
ping <GRID-GATEWAY-IP>
# Traceroute to Grid gateway
traceroute <GRID-GATEWAY-IP>
# Test HTTPS connectivity
curl -v https://api.service.uhgwm110-022715.ap.eastus.azu.grid.uhg.com
# Test with cert validation
openssl s_client -connect api.service.uhgwm110-022715.ap.eastus.azu.grid.uhg.com:443
Check Consul KV for Upstreams:
# Via Consul API
curl -H "X-Consul-Token: $CONSUL_TOKEN" \
"https://consul.service.consul:8501/v1/kv/grid/upstreams?recurse"
# Via Consul UI
Consul UI → Key/Value → grid → upstreams
Grid vs NGCN Comparison
Why Grid is Superior
Architecture:
- ✅ Grid: Distributed (dedicated per-app capacity)
- ❌ NGCN: Hub-and-spoke (central bottleneck)
IP Addressing:
- ✅ Grid: RFC6598 CGNAT (100.x.x.x) - designed for this purpose
- ❌ NGCN: 6.0.0.0/8 (allocated to US DoD per IANA) - misuse of public IPs
Overlapping IPs:
- ✅ Grid: Fully supported (NAT at gateway)
- ❌ NGCN: Not supported (unique IPs required)
Performance:
- ✅ Grid: 7ms latency, 2.73 Gbps per connection, 10 Gbps multi-stream
- ❌ NGCN: 26ms latency, ~1.2 Gbps per connection (IPSec limit)
Security:
- ✅ Grid: Identity-based (Consul ACLs, Vault tokens, AD groups)
- ❌ NGCN: IP-based (firewall rules only)
Connectivity:
- ✅ Grid: Private circuits (Direct Connect, ExpressRoute, Interconnect)
- ❌ NGCN: Public internet (IPSec tunnels)
Time to Connect:
- ✅ Grid: 5-10 minutes (automated)
- ❌ NGCN: 1-2 weeks (manual firewall rules)
Team Size:
- ✅ Grid: 3-5 engineers (constant)
- ❌ NGCN: Scales with app count (10+ for 1,000 apps)
Production Traffic:
- ✅ Grid: 8-15 Gbps (real usage, proven)
- ❌ NGCN: 100-420 Mbps (minimal usage, theoretical)
Network Visibility:
- ✅ Grid: Complete (troubleshootable)
- ❌ NGCN: Blocked (security by obscurity)
ESRO Concerns Addressed (Grid Team Perspective)
Concern: "Grid bypasses firewalls"
Reality: Grid has MORE security controls than firewalls
├─> Vault dynamic credentials (time-limited)
├─> Consul ACLs (service-level authorization)
├─> AD group membership (IAM integration)
├─> CMDB supply chain (business logic enforcement)
└─> mTLS gateway-to-gateway (encrypted transit)
NGCN's controls:
├─> IP-based firewall rules (network-level only)
└─> Manual approval workflow (slower)
Verdict: Grid has more technical controls
Concern: "NGCN inspects traffic for threats"
Reality: NGCN Palo Alto firewalls CANNOT decrypt mTLS
├─> Do NOT have application private keys
├─> CANNOT inspect encrypted payload
├─> Can only see: source IP, dest IP, port, protocol
└─> Does NOT require TLS (allows plain HTTP)
Grid's approach:
├─> REQUIRES TLS for all services (enforced)
├─> mTLS for gateway-to-gateway (mutual auth)
├─> Applications handle encryption (E2E)
└─> No MITM (preserves E2E encryption)
Verdict: Neither solution inspects encrypted payload
Grid REQUIRES encryption, NGCN doesn't
Concern: "Grid allows data exfiltration"
Reality: Grid blocks internet traffic (same as NGCN)
├─> No internet ingress (nothing in)
├─> No internet egress (nothing out)
├─> Only registered services trusted
└─> 'Internet' is NOT a registered service
Verdict: Grid has same internet restrictions as NGCN
Concern: "Grid has no approval workflow"
Reality: Grid has CMDB-based approval workflow
├─> Service Line Owner (SLO) requests dependency
├─> Documented in CI Central (ServiceNow)
├─> Consul KV updated with approved upstreams
├─> HAProxy auto-configured from approved list
└─> Unidirectional (A→B doesn't grant B→A)
NGCN's workflow:
├─> Manual firewall rule request
├─> Manual approval by firewall team
└─> Manual implementation
Verdict: Grid has formal approval (CMDB vs firewall)
Concern: "NGCN is proven at scale"
Production data:
├─> NGCN: 100-420 Mbps (minimal usage)
├─> Grid: 8-15 Gbps (19x-150x MORE)
├─> NGCN: Available for years (unused)
└─> Grid: 3+ years GA (3,000+ subscriptions)
Verdict: Grid is proven at UHG, NGCN is not
Support & Resources
Documentation:
Teams Channel:
- Search: "UHG Grid" in Microsoft Teams
Office Hours:
- Monday, Tuesday, Wednesday, Friday at 9:00 AM CT
- Join via Teams meeting link in channel
ServiceNow:
- Workgroup: "UHG GRID"
- Request Type: P5 Service Request
- Provide: Subscription/Account ID, regions, VNet/VPC details
Key Repositories:
- GitHub Organization: https://github.com/uhg-arc
- uhg-grid/ - Multi-cloud shared components
- uhg-grid-aws/ - AWS-specific automation
- uhg-grid-azure/ - Azure-specific automation
- uhg-grid-gcp/ - GCP-specific automation
- uhg-grid-packages/ - Building .deb packages for Grid components
- tfe-initializer/ - Onboarding automation (Initializers)
Summary: When to Use Grid
Use Grid when you need:
- ✅ Cloud-to-cloud connectivity (AWS, Azure, GCP)
- ✅ Cloud-to-on-premise connectivity
- ✅ Cross-subscription/cross-account connectivity
- ✅ Rapid connectivity (5-10 minutes vs weeks)
- ✅ Automatic service discovery
- ✅ Support for overlapping IP addresses
- ✅ Identity-based security (not just IP-based)
- ✅ High-performance connectivity (Gbps scale)
- ✅ Complete network visibility (troubleshooting)
Grid is NOT for:
- ❌ Public internet traffic (Grid is internal-only)
- ❌ Interactive user sessions (service-to-service only)
- ❌ Applications using Aviatrix in GCP (incompatible)
- ❌ Applications requiring hub-and-spoke (Grid is distributed)
Grid Status:
- ✅ Azure: GA (3+ years), 8-15 Gbps production traffic
- ✅ GCP: GA (2+ years), production-proven
- 🚧 AWS: v2 integration pending
Key Takeaway: Grid is UHG's proven multi-cloud networking solution, handling 19x-150x more traffic than alternatives with superior performance, security, and operational efficiency.

