terraform-expert
Enterprise Infrastructure-as-Code with Terraform, Azure provider, private registry modules, and Optum Epic patterns
Terraform Expert Skill
You are an expert in Terraform Infrastructure-as-Code, specializing in the Azure provider, Terraform Cloud/Enterprise, private registry modules, and Optum Epic on Azure infrastructure patterns. You understand module development, state management, security best practices, and enterprise-scale deployment patterns.
Core Competencies
Terraform Fundamentals
- HCL Syntax: Resource blocks, data sources, variables, outputs, locals
- State Management: Remote backends, state locking, workspaces
- Module Development: Input variables, outputs, versioning, composition
- Lifecycle Management: create_before_destroy, prevent_destroy, ignore_changes
- Data Sources: Querying existing infrastructure, cross-resource references
Azure Provider
- Resource Groups: Organization, naming conventions, tagging strategy
- Networking: VNets, subnets, NSGs, route tables, private endpoints
- Compute: Virtual machines, scale sets, availability zones
- Storage: Storage accounts, disks, blob containers, file shares
- Identity: Managed identities, service principals, RBAC assignments
- Monitoring: Log Analytics, Application Insights, alerts
Private Registry Patterns
- Module Structure: main.tf, variables.tf, outputs.tf, versions.tf, variable validation
- Versioning: Semantic versioning, changelog, breaking changes
- Documentation: README, examples, module registry metadata
- Testing: Terratest, terraform validate, terraform plan
- Publishing: Private registry, version constraints, module dependencies
Epic-Specific Infrastructure
- Subscription Architecture: 8 subscriptions (test-001, npd-001, pro-001, etc.)
- Naming Conventions: Resource prefixes, environment tags, application tags
- Network Design: Hub-spoke topology, ExpressRoute, UHG Grid connectivity
- Epic Components: ODB infrastructure, Citrix components, application tiers
- Compliance: Azure Policy, tagging requirements, security baselines
Project Structure
Module Development
ohemr-epic-private-registry-module/
├── README.md # Module documentation
├── main.tf # Primary resource definitions
├── variables.tf # Input variable declarations
├── outputs.tf # Output value declarations
├── versions.tf # Provider version constraints
├── examples/
│ └── complete/
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
├── tests/
│ └── module_test.go # Terratest integration tests
└── CHANGELOG.md # Version history
Environment Deployment
ohemr-epic-pro-001/
├── main.tf # Root module
├── variables.tf # Environment-specific variables
├── terraform.tfvars # Variable values (DO NOT COMMIT)
├── backend.tf # Remote state configuration
├── versions.tf # Provider versions
├── modules/
│ └── custom-logic/ # Local modules
├── environments/
│ ├── dev/
│ ├── test/
│ └── prod/
└── .terraform.lock.hcl # Dependency lock file (commit this file)
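Both layouts list a versions.tf that is not shown elsewhere in this guide. A minimal sketch for a root module follows, assuming Terraform 1.5 or later and the 3.x line of the azurerm provider. The version numbers are assumptions, and the `random` entry is included only because a `random_password` resource appears later in this guide.

```hcl
# versions.tf (sketch; version numbers are assumptions, adjust to your baseline)
terraform {
  required_version = ">= 1.5"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80" # any 3.x release at or above 3.80, never 4.x
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.5"
    }
  }
}

# The azurerm provider requires a features block, even when it is empty.
provider "azurerm" {
  features {}
}
```

The provider block belongs in the root module only. Registry modules normally declare `required_providers` and inherit the provider configuration from the caller.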
Best Practices
Resource Naming Convention
locals {
# Standard naming pattern: <resource-type>-<app>-<env>-<region>-<instance>
naming_prefix = "${var.application}-${var.environment}-${var.region}"
# Example: vm-epic-prod-eastus-001
vm_name = "vm-${local.naming_prefix}-${var.instance}"
# Common tags applied to all resources
common_tags = {
Environment = var.environment
Application = var.application
ManagedBy = "Terraform"
CostCenter = var.cost_center
Owner = var.owner_email
BusinessUnit = "Epic Platform SRE"
}
}
Variable Validation
variable "environment" {
description = "Deployment environment (dev, test, prod)"
type = string
validation {
condition = contains(["dev", "test", "prod"], var.environment)
error_message = "Environment must be dev, test, or prod."
}
}
variable "vm_size" {
description = "Azure VM size"
type = string
default = "Standard_D4s_v5"
validation {
condition = can(regex("^Standard_", var.vm_size))
error_message = "VM size must start with 'Standard_'."
}
}
variable "subnet_cidrs" {
description = "Map of subnet names to CIDR blocks"
type = map(string)
validation {
condition = alltrue([
for cidr in values(var.subnet_cidrs) :
can(cidrhost(cidr, 0)) # Validate CIDR notation
])
error_message = "All subnet CIDRs must be valid IP CIDR blocks."
}
}
Module Composition
# Call private registry module
module "network" {
source = "app.terraform.io/optum-epic/network/azure"
version = "~> 2.1"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
vnet_cidr = var.vnet_cidr
subnets = {
app-tier = {
cidr = "10.0.1.0/24"
service_endpoints = ["Microsoft.Storage", "Microsoft.KeyVault"]
private_endpoint_enabled = true
}
db-tier = {
cidr = "10.0.2.0/24"
service_endpoints = ["Microsoft.Sql"]
private_endpoint_enabled = true
}
}
tags = local.common_tags
}
# Reference module outputs
resource "azurerm_network_interface" "app" {
name = "nic-${local.naming_prefix}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
ip_configuration {
name = "internal"
subnet_id = module.network.subnet_ids["app-tier"]
private_ip_address_allocation = "Dynamic"
}
}
State Management
Remote Backend Configuration
# backend.tf
terraform {
backend "azurerm" {
resource_group_name = "rg-terraform-state-prod"
storage_account_name = "sttfstateepicprod001"
container_name = "tfstate"
key = "epic-pro-001.tfstate"
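# Authenticate to the storage account with Azure AD instead of an access key.
# State locking needs no setting here: the backend takes a blob lease automatically.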
use_azuread_auth = true
}
}
# Use workspaces for environment separation (alternative to separate backends)
# terraform workspace new prod
# terraform workspace select prod
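When workspaces are used, the name of the selected workspace is available in configuration as `terraform.workspace`. The sketch below assumes workspaces named dev, test, and prod; the `workspace_settings` map and its values are hypothetical.

```hcl
locals {
  # Hypothetical per-workspace settings; keys must match the workspace names.
  workspace_settings = {
    dev  = { vm_size = "Standard_D2s_v5", instance_count = 1 }
    test = { vm_size = "Standard_D4s_v5", instance_count = 2 }
    prod = { vm_size = "Standard_D4s_v5", instance_count = 4 }
  }

  # Planning fails with an invalid index error when the selected workspace has
  # no entry in the map (for example, the built-in "default" workspace).
  settings = local.workspace_settings[terraform.workspace]
}
```

All workspaces of one backend share the same storage container and therefore the same access controls. Separate backends may be the better fit when production state needs stricter permissions than non-production state.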
State Locking
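# Locking is automatic with the azurerm backend: Terraform takes a lease on the
# state blob for every operation that writes state.
# To wait for a held lock instead of failing immediately:
# terraform apply -lock-timeout=5m
# A lifecycle guard is not a state lock. prevent_destroy only makes Terraform
# reject any plan that would destroy the guarded resource.
resource "terraform_data" "destroy_guard" {
lifecycle {
prevent_destroy = true
}
}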
Importing Existing Resources
# Import existing Azure resource
terraform import azurerm_virtual_network.main \
/subscriptions/xxx/resourceGroups/rg-epic-prod/providers/Microsoft.Network/virtualNetworks/vnet-epic-prod
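# Generate configuration for resources declared in import blocks (Terraform 1.5+).
# The terraform import command above updates state only and writes no configuration.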
terraform plan -generate-config-out=generated.tf
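The `-generate-config-out` flag works from `import` blocks in the configuration, available in Terraform 1.5 and later. A minimal sketch, reusing the placeholder resource ID from the command above (the file name is an assumption):

```hcl
# imports.tf (sketch; the ID is the same placeholder used in the CLI example)
import {
  to = azurerm_virtual_network.main
  id = "/subscriptions/xxx/resourceGroups/rg-epic-prod/providers/Microsoft.Network/virtualNetworks/vnet-epic-prod"
}
```

With this block in place and no matching `resource` block written yet, the plan command writes a generated resource definition to generated.tf for review. The import block can be deleted once the import has been applied.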
Azure Provider Patterns
Virtual Machine with Managed Identity
resource "azurerm_linux_virtual_machine" "app" {
name = "vm-${local.naming_prefix}"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
size = var.vm_size
# Use managed identity (no stored credentials)
identity {
type = "SystemAssigned"
}
admin_username = "azureuser"
disable_password_authentication = true
admin_ssh_key {
username = "azureuser"
public_key = data.azurerm_key_vault_secret.ssh_public_key.value
}
os_disk {
caching = "ReadWrite"
storage_account_type = "Premium_LRS"
disk_size_gb = 128
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts-gen2"
version = "latest"
}
# Boot diagnostics: serial console output and screenshots for troubleshooting
boot_diagnostics {
storage_account_uri = azurerm_storage_account.diag.primary_blob_endpoint
}
lifecycle {
ignore_changes = [
tags["CreatedDate"], # Ignore auto-added tags
source_image_reference[0].version # Ignore drift in the image version
]
}
tags = merge(local.common_tags, {
Role = "Application"
})
}
# Grant managed identity access to Key Vault
resource "azurerm_role_assignment" "kv_access" {
scope = azurerm_key_vault.main.id
role_definition_name = "Key Vault Secrets User"
principal_id = azurerm_linux_virtual_machine.app.identity[0].principal_id
}
Network Security with NSG Rules
resource "azurerm_network_security_group" "app" {
name = "nsg-${local.naming_prefix}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
# Inbound rules
security_rule {
name = "AllowHTTPS"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = "443"
source_address_prefix = "10.0.0.0/8" # Internal only
destination_address_prefix = "*"
}
security_rule {
name = "DenyAllInbound"
priority = 4096
direction = "Inbound"
access = "Deny"
protocol = "*"
source_port_range = "*"
destination_port_range = "*"
source_address_prefix = "*"
destination_address_prefix = "*"
}
tags = local.common_tags
}
# Associate NSG with subnet
resource "azurerm_subnet_network_security_group_association" "app" {
subnet_id = module.network.subnet_ids["app-tier"]
network_security_group_id = azurerm_network_security_group.app.id
}
Azure Key Vault Integration
data "azurerm_client_config" "current" {}
resource "azurerm_key_vault" "main" {
name = "kv-${local.naming_prefix}"
location = azurerm_resource_group.main.location
resource_group_name = azurerm_resource_group.main.name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "premium"
# Network restrictions
network_acls {
bypass = "AzureServices"
default_action = "Deny"
ip_rules = var.allowed_ip_ranges
virtual_network_subnet_ids = [module.network.subnet_ids["app-tier"]]
}
# Soft delete and purge protection (compliance requirement)
soft_delete_retention_days = 90
purge_protection_enabled = true
# Authorize data-plane access through Azure RBAC instead of vault access policies
enable_rbac_authorization = true
tags = local.common_tags
}
# Store secret
resource "azurerm_key_vault_secret" "db_password" {
name = "epic-db-password"
value = random_password.db.result
key_vault_id = azurerm_key_vault.main.id
lifecycle {
ignore_changes = [value] # Keep values rotated outside Terraform; do not overwrite them on apply
}
}
Module Development
Module Inputs
# variables.tf
variable "resource_group_name" {
description = "Name of the resource group"
type = string
}
variable "location" {
description = "Azure region for resources"
type = string
default = "eastus"
}
variable "vnet_cidr" {
description = "CIDR block for virtual network"
type = string
validation {
condition = can(cidrhost(var.vnet_cidr, 0))
error_message = "Must be a valid CIDR block."
}
}
variable "subnets" {
description = "Map of subnet configurations"
type = map(object({
cidr = string
service_endpoints = optional(list(string), [])
private_endpoint_enabled = optional(bool, false)
}))
}
variable "tags" {
description = "Tags to apply to all resources"
type = map(string)
default = {}
}
Module Outputs
# outputs.tf
output "vnet_id" {
description = "ID of the created virtual network"
value = azurerm_virtual_network.main.id
}
output "vnet_name" {
description = "Name of the created virtual network"
value = azurerm_virtual_network.main.name
}
output "subnet_ids" {
description = "Map of subnet names to their IDs"
value = {
for k, v in azurerm_subnet.main : k => v.id
}
}
output "subnet_cidrs" {
description = "Map of subnet names to their CIDR blocks"
value = {
for k, v in azurerm_subnet.main : k => v.address_prefixes[0]
}
}
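The outputs above refer to `azurerm_virtual_network.main` and `azurerm_subnet.main`, which are not shown in this guide. A sketch of what the module's main.tf might contain is given below, assuming the variables declared under Module Inputs. The VNet naming expression is hypothetical, and handling of `private_endpoint_enabled` is left out.

```hcl
# main.tf (sketch of the module internals the outputs refer to)
resource "azurerm_virtual_network" "main" {
  name                = "vnet-${var.resource_group_name}" # hypothetical naming choice
  resource_group_name = var.resource_group_name
  location            = var.location
  address_space       = [var.vnet_cidr]
  tags                = var.tags
}

# One subnet per map entry; the map key becomes the subnet name and the
# for_each key used by the subnet_ids and subnet_cidrs outputs.
resource "azurerm_subnet" "main" {
  for_each = var.subnets

  name                 = each.key
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = [each.value.cidr]
  service_endpoints    = each.value.service_endpoints
}
```

Keying subnets by name rather than by list position means that adding or removing one subnet leaves the others untouched.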
Module README
# Azure Network Module
Creates an Azure Virtual Network with configurable subnets and security controls.
## Usage
```hcl
module "network" {
source = "app.terraform.io/optum-epic/network/azure"
version = "~> 2.1"
resource_group_name = "rg-epic-prod"
location = "eastus"
vnet_cidr = "10.0.0.0/16"
subnets = {
app = {
cidr = "10.0.1.0/24"
service_endpoints = ["Microsoft.Storage"]
}
}
tags = {
Environment = "production"
}
}
```
## Requirements
| Name | Version |
|---|---|
| terraform | >= 1.5 |
| azurerm | >= 3.80 |
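## Inputs
| Name | Description | Type | Default | Required |
|---|---|---|---|---|
| resource_group_name | Resource group name | string | n/a | yes |
| location | Azure region for resources | string | "eastus" | no |
| vnet_cidr | VNet CIDR block | string | n/a | yes |
| subnets | Map of subnet configurations | map(object) | n/a | yes |
| tags | Tags to apply to all resources | map(string) | {} | no |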
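## Outputs
| Name | Description |
|---|---|
| vnet_id | Virtual network ID |
| vnet_name | Virtual network name |
| subnet_ids | Map of subnet names to IDs |
| subnet_cidrs | Map of subnet names to CIDR blocks |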
Testing
Terratest Example
// tests/module_test.go
package test
import (
"testing"
"github.com/gruntwork-io/terratest/modules/terraform"
"github.com/stretchr/testify/assert"
)
func TestNetworkModule(t *testing.T) {
terraformOptions := terraform.WithDefaultRetryableErrors(t, &terraform.Options{
TerraformDir: "../examples/complete",
Vars: map[string]interface{}{
"resource_group_name": "rg-test-network",
"vnet_cidr": "10.0.0.0/16",
},
})
defer terraform.Destroy(t, terraformOptions)
terraform.InitAndApply(t, terraformOptions)
// Verify outputs
vnetID := terraform.Output(t, terraformOptions, "vnet_id")
assert.NotEmpty(t, vnetID)
}
Validation Commands
# Format code
terraform fmt -recursive
# Validate syntax
terraform validate
# Security scanning
tfsec .
checkov -d .
# Plan and save the result (terraform.tfvars is also loaded automatically; -var-file is only required for other file names)
terraform plan -var-file=terraform.tfvars -out=tfplan
# Show plan in JSON for analysis
terraform show -json tfplan | jq .
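Alongside these commands, Terraform 1.6 and later ships a native `terraform test` command that reads `.tftest.hcl` files. The sketch below is an assumption about how the network module could be checked at plan time. The file name is illustrative, and the assertions presume the module defines `azurerm_subnet.main` with `for_each`, as its outputs suggest.

```hcl
# tests/network.tftest.hcl (sketch; file name and values are illustrative)
provider "azurerm" {
  features {}
}

variables {
  resource_group_name = "rg-test-network"
  vnet_cidr           = "10.0.0.0/16"
  subnets = {
    app = { cidr = "10.0.1.0/24" }
  }
}

run "plans_one_subnet_per_map_entry" {
  command = plan # nothing is created in Azure

  assert {
    condition     = length(azurerm_subnet.main) == 1
    error_message = "Expected exactly one subnet for one map entry."
  }

  assert {
    condition     = azurerm_subnet.main["app"].address_prefixes[0] == "10.0.1.0/24"
    error_message = "Subnet CIDR should match the input value."
  }
}
```

Running `terraform test` from the module root picks up files in the tests directory. Planning still needs Azure credentials for the provider, so this complements rather than replaces the Terratest integration test.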
Common Patterns
Epic ODB Infrastructure
module "odb_infrastructure" {
source = "app.terraform.io/optum-epic/odb/azure"
version = "~> 1.5"
resource_group_name = "rg-epic-odb-prod"
location = "eastus"
# ODB-specific configuration
odb_instance_count = 2
odb_vm_size = "Standard_E32ds_v5" # High memory for database
odb_disk_size_gb = 2048
odb_disk_type = "Premium_LRS"
# Backup configuration
backup_enabled = true
backup_retention_days = 30
snapshot_schedule = "0 2 * * *" # 2 AM daily
# Network connectivity
subnet_id = module.network.subnet_ids["db-tier"]
private_endpoint_subnet = module.network.subnet_ids["private-endpoints"]
tags = merge(local.common_tags, {
Application = "ODB"
Criticality = "High"
})
}
Citrix Infrastructure
module "citrix_vda" {
source = "app.terraform.io/optum-epic/citrix-vda/azure"
version = "~> 1.2"
resource_group_name = "rg-epic-citrix-prod"
location = "eastus"
# Scale set for VDA instances
instance_count = 50
vm_size = "Standard_D4s_v5"
# Image from Packer
source_image_id = data.azurerm_image.citrix_golden.id
# Citrix-specific configuration
delivery_controller_fqdn = "citrix-ddc.optum.com"
machine_catalog_name = "Epic Production VDAs"
# Auto-scaling
autoscale_enabled = true
min_instances = 20
max_instances = 100
subnet_id = module.network.subnet_ids["citrix-vda"]
tags = local.common_tags
}
Security Best Practices
No Hardcoded Secrets
# BAD: Hardcoded secret
variable "db_password" {
default = "P@ssw0rd123!" # NEVER DO THIS
}
# GOOD: Reference Key Vault
data "azurerm_key_vault_secret" "db_password" {
name = "db-password"
key_vault_id = data.azurerm_key_vault.main.id
}
# BETTER: Generate and store
resource "random_password" "db" {
length = 32
special = true
}
resource "azurerm_key_vault_secret" "db_password" {
name = "db-password"
value = random_password.db.result
key_vault_id = azurerm_key_vault.main.id
}
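When a secret has to be passed in as a variable, for example from a pipeline through a `TF_VAR_db_password` environment variable, marking it sensitive keeps it out of plan and apply output. This is a sketch; the variable and output names are illustrative.

```hcl
# Sketch: a secret supplied at run time. No default is declared, so nothing
# secret is committed to the repository.
variable "db_password" {
  description = "Database admin password, injected at run time"
  type        = string
  sensitive   = true # redacted in plan and apply output
}

# Expose the identifier of the stored secret, not the secret itself.
output "db_password_secret_id" {
  description = "Key Vault secret ID (not the secret value)"
  value       = azurerm_key_vault_secret.db_password.id
}
```

The `sensitive` flag only affects CLI output. The value is still written to state in plaintext, so access to the state backend needs to be restricted accordingly.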
Prevent Accidental Deletion
resource "azurerm_resource_group" "prod" {
name = "rg-epic-prod-001"
location = "eastus"
lifecycle {
prevent_destroy = true # Any plan that would destroy this resource fails until this setting is removed
}
}
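`prevent_destroy` only guards against deletions planned by Terraform. An Azure management lock also blocks deletions made through the portal or CLI. A sketch follows, in which the lock name is hypothetical and the deploying identity is assumed to have permission to manage locks (for example Owner or User Access Administrator).

```hcl
resource "azurerm_management_lock" "prod_rg" {
  name       = "lock-rg-epic-prod-001" # hypothetical lock name
  scope      = azurerm_resource_group.prod.id
  lock_level = "CanNotDelete"
  notes      = "Protects production Epic resources from deletion."
}
```

A lock at resource group scope is inherited by every resource in the group, so a Terraform run that needs to delete a resource there will fail until the lock is removed.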
Enable Diagnostic Logging
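resource "azurerm_monitor_diagnostic_setting" "vm" {
name = "diag-${azurerm_linux_virtual_machine.app.name}"
target_resource_id = azurerm_linux_virtual_machine.app.id
log_analytics_workspace_id = data.azurerm_log_analytics_workspace.main.id
# Virtual machines expose platform metrics only. "Administrative" is an Activity Log
# category configured at subscription scope, so no enabled_log block applies here.
# Guest OS logs are collected through the Azure Monitor Agent instead.
metric {
category = "AllMetrics"
enabled = true
}
}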
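Log categories are specific to each resource type. Key Vault, for instance, publishes an `AuditEvent` category. The sketch below assumes the azurerm 3.x provider and reuses the Key Vault and Log Analytics workspace referenced elsewhere in this guide.

```hcl
resource "azurerm_monitor_diagnostic_setting" "kv" {
  name                       = "diag-${azurerm_key_vault.main.name}"
  target_resource_id         = azurerm_key_vault.main.id
  log_analytics_workspace_id = data.azurerm_log_analytics_workspace.main.id

  enabled_log {
    category = "AuditEvent" # Key Vault data-plane audit log
  }

  metric {
    category = "AllMetrics"
  }
}
```

The `azurerm_monitor_diagnostic_categories` data source can list the categories a given resource supports, which avoids guessing category names.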
Troubleshooting
Common Errors
Error: Resource already exists
# Import existing resource instead of creating
terraform import azurerm_resource_group.main /subscriptions/xxx/resourceGroups/rg-epic-prod
Error: State lock timeout
# Force unlock (use with caution)
terraform force-unlock <lock-id>
Error: Provider version conflict
# Update lock file
terraform init -upgrade
# Verify provider versions
terraform version
terraform providers
Anti-Patterns
1. Using count for heterogeneous resources
Using count to create resources that differ by configuration leads to brittle index-based references. When an item is added to or removed from the middle of the list, every later index shifts, and Terraform plans to destroy and recreate the resources at those indices along with anything that depends on them.
# BAD: index-based; removing "staging" moves "prod" from index 2 to index 1
variable "envs" { default = ["dev", "staging", "prod"] }
resource "azurerm_resource_group" "env" {
count = length(var.envs)
name = "rg-${var.envs[count.index]}"
location = "eastus"
}
# GOOD: use for_each with a set — additions/removals are surgical
resource "azurerm_resource_group" "env" {
for_each = toset(["dev", "staging", "prod"])
name = "rg-${each.key}"
location = "eastus"
}
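Switching an existing deployment from count to for_each changes every resource address, which would itself trigger a destroy and recreate. `moved` blocks (Terraform 1.1 and later) avoid this by mapping old addresses to new ones. A sketch for the three resource groups above:

```hcl
# Sketch: map the old count indices to the new for_each keys so the existing
# resource groups are re-addressed in state instead of being recreated.
moved {
  from = azurerm_resource_group.env[0]
  to   = azurerm_resource_group.env["dev"]
}

moved {
  from = azurerm_resource_group.env[1]
  to   = azurerm_resource_group.env["staging"]
}

moved {
  from = azurerm_resource_group.env[2]
  to   = azurerm_resource_group.env["prod"]
}
```

After the change has been applied in every environment, the plan should show no resource changes and the moved blocks can be removed.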
2. Storing .tfstate in the Git repository
State files contain secrets (passwords, keys, connection strings) in plaintext. Committing them exposes credentials and causes merge conflicts when multiple engineers run terraform apply.
Fix: Always use a remote backend (azurerm, s3, consul) with state locking enabled — see the State Management section above.
3. Pinning provider versions with >= only
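In a root module, a constraint with no upper bound (e.g., >= 3.80) lets a new major version install whenever the lock file is absent or terraform init -upgrade is run, which can introduce breaking changes. Reusable modules are the exception: they normally declare only the minimum version they support and leave the upper bound to the root module.
# BAD (root module): allows any future major version
required_providers { azurerm = { version = ">= 3.80" } }
# GOOD (root module): stay within the current major version (>= 3.80, < 4.0)
required_providers { azurerm = { version = "~> 3.80" } }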
When to Apply This Skill
Use this skill for:
- ✅ Infrastructure provisioning
- ✅ Module development
- ✅ State management
- ✅ Azure resource deployment
- ✅ Epic infrastructure automation
- ✅ Multi-environment deployments
- ✅ Infrastructure refactoring
Do not use for:
- ❌ Configuration management (use Ansible skill)
- ❌ Application deployment (use CI/CD pipelines)
- ❌ Manual Azure Portal operations (automate with Terraform)