RDS Module - Artos Documentation

Overview

The RDS Module deploys an Amazon Aurora PostgreSQL database cluster optimized for the Artos platform. Aurora provides a fully-managed, highly available database with automated backups, point-in-time recovery, and read scaling capabilities. The module handles secure credential management, network isolation, parameter optimization, and monitoring integration.

Key Features

High Availability: Multi-AZ deployment with automatic failover
Secure Credentials: Automated password generation and Secrets Manager storage
Encryption: Data encrypted at rest using KMS and in transit using SSL
Automated Backups: Continuous backups with configurable retention
Performance Insights: Built-in query performance monitoring
Read Scaling: Support for multiple reader instances
Serverless Option: Aurora Serverless v2 for variable workloads

Core Components

1. Database Credentials Management

The module automatically generates and securely stores database credentials.

Random Password Generation

Purpose: Creates a cryptographically secure password for the database master user. Configuration:

Length: 32 characters
Character Set: Letters, numbers, and special characters
Special Characters: !#$%&*()-_=+[]{}|;:,.<>?~ (excludes problematic characters)

Security Benefits:

No hardcoded passwords in code or configuration
Meets complexity requirements for enterprise security policies
Unique per deployment

Secrets Manager Storage

Purpose: Securely stores database credentials for application access. Secret Name: {db_identifier}-password Secret Content:

{
  "username": "artos_admin",
  "password": "<generated-password>"
}

Recovery Window:

Production: 30 days (allows recovery from accidental deletion)
Non-Production: 0 days (immediate deletion for faster iteration)

Lifecycle Management: The secret version has ignore_changes on secret_string to prevent Terraform from updating the password after initial creation. This means:

Password rotations are managed outside Terraform
State file doesn’t contain password updates
Manual rotation doesn’t trigger Terraform changes

Accessing Credentials: Applications retrieve credentials from Secrets Manager using IAM authentication. The IAM module grants the necessary permissions to application pods.

2. Aurora PostgreSQL Cluster

The Aurora cluster is the logical database container that manages replication, backups, and endpoints. Engine: Aurora PostgreSQL (MySQL-compatible API on PostgreSQL engine) Engine Version: 15.4 (default, configurable) Database Name: artos (default, configurable) Key Configurations:

Storage and Encryption

Storage Type: Aurora Serverless storage (auto-scaling)
Encryption: Enabled by default using KMS
KMS Key: Customer-managed key for compliance requirements

Benefits:

Storage automatically grows from 10GB to 128TB
Only pay for storage actually used
KMS encryption enables fine-grained access control

Backup Configuration

Automated Backups:

Retention Period: 7 days (default), configurable up to 35 days
Backup Window: 03:00-04:00 UTC (default, configurable)
Continuous Backups: Transaction logs continuously backed up to S3

Point-in-Time Recovery:

Restore to any second within retention period
No impact on production performance
Fast restore times (minutes vs hours for snapshots)

Final Snapshot:

skip_final_snapshot: false (default)
Creates final snapshot when cluster is deleted
Enables data recovery after accidental deletion

Production Protection: The module sets deletion_protection = true by default. You must explicitly disable deletion protection before destroying the cluster to prevent accidental data loss.

Maintenance Configuration

Preferred Maintenance Window: sun:04:00-sun:05:00 (default) What Happens During Maintenance:

Minor version patches applied
Security updates installed
Parameter changes requiring reboot take effect

Best Practices:

Schedule during low-traffic periods
Coordinate with application maintenance windows
Monitor CloudWatch alarms during maintenance

CloudWatch Logs Integration

Enabled Logs: PostgreSQL logs Log Group: /aws/rds/cluster/{db_identifier}/postgresql Retention: 7 days (default, configurable) Log Contents:

SQL queries (if logging enabled)
Connection events
Error messages
Slow query logs
Database startup/shutdown events

3. Cluster Instances

Aurora uses a cluster architecture with separate compute instances. Instance Configuration:

Component	Description
Instance Count	2 (default) - 1 writer, 1 reader
Instance Class	db.r6g.large (default)
Writer Instance	Handles all write operations
Reader Instances	Handle read operations for scaling

High Availability: Multi-AZ Deployment:

Instances automatically deployed across multiple availability zones
Aurora maintains 6 copies of data across 3 AZs
Automatic failover in case of AZ failure (typically < 30 seconds)

Failover Process:

Primary instance becomes unavailable
Aurora promotes a reader instance to writer
DNS endpoint automatically updates
Applications reconnect transparently

Instance Classes: General Purpose (R6g - AWS Graviton2):

db.r6g.large: 2 vCPU, 16 GB RAM
db.r6g.xlarge: 4 vCPU, 32 GB RAM
db.r6g.2xlarge: 8 vCPU, 64 GB RAM

When to Use:

Standard production workloads
Predictable traffic patterns
Consistent performance requirements

Serverless v2:

db.serverless: Auto-scaling compute
ACUs (Aurora Capacity Units): 0.5 to 4 (default)
Scales in 0.5 ACU increments

When to Use:

Variable workloads with unpredictable traffic
Development/staging environments
Applications with long idle periods

Serverless Configuration:

instance_class = "db.serverless"
serverless_min_capacity = 0.5  # 1 GB RAM
serverless_max_capacity = 4     # 8 GB RAM

4. Parameter Group

The parameter group defines database configuration settings. Family: aurora-postgresql16 Configured Parameters:

max_connections

Default: 1000 connections Purpose: Maximum number of concurrent database connections. Formula: Based on instance memory

max_connections = (DBInstanceClassMemory / 9531392)

Tuning Recommendations:

Connection Pooling: Use PgBouncer or application-level pooling
Monitor: Track connection usage via CloudWatch metric DatabaseConnections
Adjust: Increase if seeing “too many connections” errors

Common Values:

Development: 100-200
Staging: 500-1000
Production: 1000-2000

shared_buffers

Default: 262144 KB (256 MB) Purpose: Memory used for caching data pages. Best Practice: Set to 25% of available memory for dedicated database server. Formula for Aurora:

shared_buffers = {DBInstanceClassMemory / 10922} KB

Examples:

db.r6g.large (16 GB): ~4 GB shared_buffers
db.r6g.xlarge (32 GB): ~8 GB shared_buffers

Impact:

Higher values: Better cache hit ratio, fewer disk reads
Too high: Less memory for other operations
Requires reboot to apply changes

Apply Method: Both parameters use apply_method = "pending-reboot", meaning changes take effect after cluster restart. Plan parameter changes during maintenance windows.

5. Network Configuration

DB Subnet Group

Purpose: Defines which subnets the RDS cluster can use. Requirements:

Minimum 2 subnets in different availability zones
Subnets must be in database tier (isolated from application tier)
All subnets in same VPC

Why Multiple AZs:

Required for Multi-AZ deployments
Enables automatic failover
Aurora distributes replicas across AZs

Security Group

Purpose: Controls network access to the database. Ingress Rules:

Port	Protocol	Source	Description
5432	TCP	Allowed CIDR blocks	PostgreSQL from specified networks
5432	TCP	Allowed security groups	PostgreSQL from EKS nodes

Egress Rules:

All traffic to VPC CIDR (enables communication within VPC)

Typical Configuration:

allowed_security_groups = [
  module.eks.eks_nodes_security_group_id
]
allowed_cidr_blocks = []  # Rely on security groups for access control

Security Best Practices:

Prefer security group references over CIDR blocks
Never use 0.0.0.0/0 for database access
Use bastion host for administrative access
Enable VPC Flow Logs to monitor connection attempts

6. Cluster Endpoints

Aurora provides multiple endpoints for different access patterns.

Writer Endpoint

Format: {cluster-identifier}.cluster-xxxxx.{region}.rds.amazonaws.com Purpose: All write operations and consistent reads Use Cases:

INSERT, UPDATE, DELETE operations
Schema changes (DDL)
Transactions requiring consistency
Administrative operations

Connection String:

DATABASE_URL = "postgresql://artos_admin:<password>@artos-production.cluster-xxxxx.us-east-1.rds.amazonaws.com:5432/artos"

Reader Endpoint

Format: {cluster-identifier}.cluster-ro-xxxxx.{region}.rds.amazonaws.com Purpose: Load-balanced read operations across reader instances Use Cases:

SELECT queries for reporting
Analytics workloads
Read-only API endpoints
Background processing tasks

Load Balancing:

Aurora automatically distributes connections across readers
Round-robin with session-level stickiness
Unhealthy readers automatically removed from rotation

Module Configuration

Basic Configuration

module "rds" {
  source = "./modules/rds"

  db_identifier = "artos-production"
  database_name = "artos"
  db_username   = "artos_admin"
  
  # Instance configuration
  instance_class = "db.r6g.large"
  instance_count = 2  # 1 writer + 1 reader
  engine_version = "15.4"
  
  # Network configuration
  vpc_id     = module.networking.vpc_id
  vpc_cidr   = module.networking.vpc_cidr_block
  subnet_ids = module.networking.database_subnet_ids
  
  # Security
  allowed_security_groups = [
    module.networking.eks_nodes_security_group_id
  ]
  kms_key_arn = module.kms.key_arn
  
  # Backups
  backup_retention_period = 7
  
  # Environment
  environment = "production"
  
  tags = {
    Environment = "production"
  }
}

Production Configuration

module "rds_production" {
  source = "./modules/rds"

  db_identifier = "artos-production"
  database_name = "artos"
  db_username   = "artos_admin"
  
  # Larger instances for production
  instance_class = "db.r6g.xlarge"
  instance_count = 3  # 1 writer + 2 readers for read scaling
  engine_version = "15.4"
  
  # Network configuration
  vpc_id     = module.networking.vpc_id
  vpc_cidr   = module.networking.vpc_cidr_block
  subnet_ids = module.networking.database_subnet_ids
  
  # Security
  allowed_security_groups = [
    module.networking.eks_nodes_security_group_id
  ]
  kms_key_arn = module.kms.key_arn
  
  # Extended backup retention for production
  backup_retention_period      = 30
  preferred_backup_window      = "03:00-04:00"
  preferred_maintenance_window = "sun:04:00-sun:05:00"
  
  # Protection settings
  skip_final_snapshot  = false
  deletion_protection  = true
  
  # Performance tuning
  max_connections = "2000"
  shared_buffers  = "524288"  # 512 MB
  
  # Monitoring
  performance_insights_enabled = true
  monitoring_interval          = 60
  log_retention_days           = 30
  
  environment = "production"
  
  tags = {
    Environment = "production"
    Backup      = "daily"
    Compliance  = "required"
  }
}

Serverless v2 Configuration

module "rds_dev" {
  source = "./modules/rds"

  db_identifier = "artos-dev"
  database_name = "artos"
  db_username   = "artos_admin"
  
  # Serverless v2 for variable workloads
  instance_class = "db.serverless"
  instance_count = 1
  engine_version = "15.4"
  
  # Serverless scaling
  serverless_min_capacity = 0.5   # 1 GB RAM
  serverless_max_capacity = 4     # 8 GB RAM
  
  # Network configuration
  vpc_id     = module.networking.vpc_id
  vpc_cidr   = module.networking.vpc_cidr_block
  subnet_ids = module.networking.database_subnet_ids
  
  # Security
  allowed_security_groups = [
    module.networking.eks_nodes_security_group_id
  ]
  kms_key_arn = module.kms.key_arn
  
  # Minimal backups for development
  backup_retention_period = 1
  skip_final_snapshot     = true
  deletion_protection     = false
  
  # Development settings
  performance_insights_enabled = false
  monitoring_interval          = 0  # Disable enhanced monitoring
  log_retention_days           = 3
  
  environment = "development"
  
  tags = {
    Environment  = "development"
    AutoShutdown = "true"
  }
}

Connecting to the Database

From Application Pods

Applications access the database using credentials from Secrets Manager: Python (using psycopg2):

import json
import boto3
import psycopg2

# Retrieve credentials from Secrets Manager
secrets_client = boto3.client('secretsmanager')
secret_value = secrets_client.get_secret_value(SecretId='artos-production-password')
credentials = json.loads(secret_value['SecretString'])

# Connect to database
conn = psycopg2.connect(
    host='artos-production.cluster-xxxxx.us-east-1.rds.amazonaws.com',
    port=5432,
    database='artos',
    user=credentials['username'],
    password=credentials['password'],
    sslmode='require'
)

From Bastion Host

For administrative access and troubleshooting: SSH to Bastion:

aws ssm start-session --target <bastion-instance-id> --region us-east-1

or use the connect-bastion.sh script:

chmod +x ./scripts/connect-bastion.sh
./scripts/connect-bastion.sh

Connect to Database:

# Retrieve password from Secrets Manager
export DB_PASSWORD=$(aws secretsmanager get-secret-value \
  --secret-id artos-production-password \
  --query SecretString \
  --output text | jq -r '.password')

# Connect using psql
psql -h artos-production.cluster-xxxxx.us-east-1.rds.amazonaws.com \
     -U artos_admin \
     -d artos \
     -p 5432

# Or export PGPASSWORD to avoid prompt
export PGPASSWORD=$DB_PASSWORD
psql -h artos-production.cluster-xxxxx.us-east-1.rds.amazonaws.com \
     -U artos_admin \
     -d artos

Connection Pooling

For production applications, use connection pooling to optimize database connections.

Database Maintenance

Backup and Restore

Manual Snapshot

Create a manual snapshot for specific points in time:

aws rds create-db-cluster-snapshot \
  --db-cluster-identifier artos-production \
  --db-cluster-snapshot-identifier artos-production-manual-2024-01-15

Point-in-Time Restore

Restore to any second within the retention period:

aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier artos-production \
  --db-cluster-identifier artos-production-restored \
  --restore-to-time 2024-01-15T10:30:00Z \
  --vpc-security-group-ids sg-xxxxx \
  --db-subnet-group-name artos-production-subnet-group

Restore from Snapshot

aws rds restore-db-cluster-from-snapshot \
  --db-cluster-identifier artos-production-restored \
  --snapshot-identifier artos-production-manual-2024-01-15 \
  --engine aurora-postgresql \
  --vpc-security-group-ids sg-xxxxx \
  --db-subnet-group-name artos-production-subnet-group

Parameter Changes

Applying parameter changes:

# Modify parameter group via Terraform
# Then reboot cluster for changes to take effect

# Reboot writer instance
aws rds reboot-db-instance --db-instance-identifier artos-production-1

# Reboot reader instances
aws rds reboot-db-instance --db-instance-identifier artos-production-2

Reboot Impact: Rebooting the writer instance causes a brief downtime (typically < 30 seconds). Aurora automatically promotes a reader to writer during reboot to minimize downtime.

Version Upgrades

Minor version upgrades are automatic during maintenance windows. For major version upgrades:

# Create snapshot before upgrade
aws rds create-db-cluster-snapshot \
  --db-cluster-identifier artos-production \
  --db-cluster-snapshot-identifier artos-production-before-upgrade

# Modify cluster to new version
aws rds modify-db-cluster \
  --db-cluster-identifier artos-production \
  --engine-version 16.1 \
  --apply-immediately

Monitoring and Troubleshooting

CloudWatch Metrics

Key metrics to monitor:

Metric	Description	Alarm Threshold
CPUUtilization	CPU usage percentage	> 80%
DatabaseConnections	Active connections	> 80% of max_connections
FreeableMemory	Available memory	< 1 GB
ReadLatency	Read operation latency	> 20ms
WriteLatency	Write operation latency	> 50ms
CommitLatency	Transaction commit time	> 100ms

Common Issues

Too Many Connections

Symptoms: FATAL: sorry, too many clients already Solutions:

Increase max_connections parameter
Implement connection pooling (PgBouncer)
Review application connection management
Close idle connections

Connection Timeouts

Symptoms: Applications cannot connect to database Troubleshooting:

# Verify security group rules
aws ec2 describe-security-groups --group-ids <rds-sg-id>

# Check if cluster is available
aws rds describe-db-clusters --db-cluster-identifier artos-production \
  --query 'DBClusters[0].Status'

# Test connectivity from EKS pod
kubectl run -it --rm debug --image=postgres:15 --restart=Never -- \
  psql -h artos-production.cluster-xxxxx.us-east-1.rds.amazonaws.com \
       -U artos_admin -d artos

Security

Descriptions of security around the RDS instances.

1. Network Isolation

Deploy in private database subnets (no internet access)
Use security groups to restrict access to EKS nodes only
Enable VPC Flow Logs for connection audit trail

2. Encryption

At Rest:

Always use KMS encryption with customer-managed keys
Enables fine-grained access control via key policies
Meets compliance requirements (HIPAA, PCI-DSS)

In Transit:

Enforce SSL/TLS connections
Set rds.force_ssl = 1 parameter (optional)
Verify SSL in application connection strings

3. Credential Management

Never hardcode database passwords
Rotate passwords regularly (outside Terraform)
Use Secrets Manager for automatic rotation
Grant minimal IAM permissions for secret access

Networking Module - Provides database subnets and security groups
IAM Module - Grants permissions to access RDS and Secrets Manager
Monitoring Module - CloudWatch log groups for RDS logs
Bastion Module - Administrative access to database

Module Maintenance: This module is compatible with Terraform 1.0+ and AWS Provider 5.x. Aurora PostgreSQL version 15.4 is the default. Review AWS Aurora release notes before upgrading to newer versions. Always test database upgrades in non-production environments first.

Getting Started

API Reference

Core Concepts

SDK Reference

Cookbooks

Integrations

Deployment

​Overview

​Key Features

​Core Components

​1. Database Credentials Management

​Random Password Generation

​Secrets Manager Storage

​2. Aurora PostgreSQL Cluster

​Storage and Encryption

​Backup Configuration

​Maintenance Configuration

​CloudWatch Logs Integration

​3. Cluster Instances

​4. Parameter Group

​max_connections

​shared_buffers

​5. Network Configuration

​DB Subnet Group

​Security Group

​6. Cluster Endpoints

​Writer Endpoint

​Reader Endpoint

​Module Configuration

​Basic Configuration

​Production Configuration

​Serverless v2 Configuration

​Connecting to the Database

​From Application Pods

​From Bastion Host

​Connection Pooling

​Database Maintenance

​Backup and Restore

​Manual Snapshot

​Point-in-Time Restore

​Restore from Snapshot

​Parameter Changes

​Version Upgrades

​Monitoring and Troubleshooting

​CloudWatch Metrics

​Common Issues

​Too Many Connections

​Connection Timeouts

​Security

​1. Network Isolation

​2. Encryption

​3. Credential Management

​Related Modules

Overview

Key Features

Core Components

1. Database Credentials Management

Random Password Generation

Secrets Manager Storage

2. Aurora PostgreSQL Cluster

Storage and Encryption

Backup Configuration

Maintenance Configuration

CloudWatch Logs Integration

3. Cluster Instances

4. Parameter Group

max_connections

shared_buffers

5. Network Configuration

DB Subnet Group

Security Group

6. Cluster Endpoints

Writer Endpoint

Reader Endpoint

Module Configuration

Basic Configuration

Production Configuration

Serverless v2 Configuration

Connecting to the Database

From Application Pods

From Bastion Host

Connection Pooling

Database Maintenance

Backup and Restore

Manual Snapshot

Point-in-Time Restore

Restore from Snapshot

Parameter Changes

Version Upgrades

Monitoring and Troubleshooting

CloudWatch Metrics

Common Issues

Too Many Connections

Connection Timeouts

Security

1. Network Isolation

2. Encryption

3. Credential Management

Related Modules