xalgorithm/epyc2

Kubernetes Infrastructure on Proxmox

A comprehensive Infrastructure as Code (IaC) solution for deploying a production-ready Kubernetes cluster on Proxmox with monitoring, backup, and network scanning capabilities.

πŸ—οΈ Architecture Overview

This project deploys a complete Kubernetes infrastructure stack including:

  • Kubernetes Cluster: K3s-based cluster with control plane and worker nodes
  • Load Balancing: MetalLB for bare-metal load balancing
  • Monitoring Stack: Prometheus, Grafana, Loki, and Mimir for comprehensive observability
  • Log Aggregation: Syslog receiver for OPNsense and external device logs
  • Backup System: Automated backup solution with NFS storage
  • Ingress: Nginx ingress controller with host-based routing
  • Automation: N8N workflow automation platform

🚀 Quick Start

Prerequisites

  • Proxmox VE 7.0+ with API access (an API token ID and secret)
  • Terraform 1.0+
  • SSH key pair for VM access
  • NFS server for backup storage (optional)
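
The version and key requirements above can be checked by hand before running anything. The following is a minimal sketch and not the repository's pre-flight-check.sh: `version_ge` and `check_prereqs` are hypothetical names, the default key path is taken from the example configuration, and the comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when version $1 >= version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Hypothetical check: reports missing prerequisites on stderr and
# returns non-zero if any of them is not satisfied.
check_prereqs() {
  local key_path="${1:-$HOME/.ssh/id_ed25519.pub}" status=0 tf_version

  if ! command -v terraform >/dev/null 2>&1; then
    echo "terraform not found in PATH" >&2
    status=1
  else
    # `terraform version` prints "Terraform vX.Y.Z" on its first line.
    tf_version="$(terraform version | head -n1 | sed 's/^Terraform v//')"
    if ! version_ge "$tf_version" "1.0"; then
      echo "Terraform $tf_version is older than 1.0" >&2
      status=1
    fi
  fi

  if [ ! -f "$key_path" ]; then
    echo "SSH public key not found: $key_path" >&2
    status=1
  fi
  return "$status"
}
```

Calling `check_prereqs` with no argument checks the default key path; pass a different public key path as the first argument if yours lives elsewhere.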

1. Clone and Configure

git clone <repository-url>
cd kubernetes-proxmox-infrastructure
cp terraform.tfvars.example terraform.tfvars

2. Configure Variables

Edit terraform.tfvars with your environment settings:

# Proxmox Configuration
proxmox_api_url          = "https://your-proxmox:8006/api2/json"
proxmox_api_token_id     = "your-token-id"
proxmox_api_token_secret = "your-token-secret"

# VM Configuration
ssh_public_key_path  = "~/.ssh/id_ed25519.pub"
ssh_private_key_path = "~/.ssh/id_ed25519"

# Network Configuration
vm_network_bridge = "vmbr0"
vm_network_vlan   = 100

# NFS Backup Configuration (optional)
nfs_server_ip   = "192.168.1.100"
nfs_backup_path = "/data/kubernetes/backups"
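
Before deploying, it can help to confirm that the example placeholders were actually replaced. The sketch below is an illustration only: `check_tfvars` is a hypothetical function, not one of the repository's scripts, and it assumes the placeholder strings shown above (`your-proxmox`, `your-token-id`, `your-token-secret`) are the ones you started from.

```shell
#!/usr/bin/env bash
# Hypothetical check: fail if a tfvars file lacks a required key or still
# contains one of the example placeholder values.
check_tfvars() {
  local file="$1" status=0 key
  [ -f "$file" ] || { echo "missing file: $file" >&2; return 1; }

  # Keys taken from the example configuration in this README.
  for key in proxmox_api_url proxmox_api_token_id proxmox_api_token_secret \
             ssh_public_key_path ssh_private_key_path; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "missing variable: $key" >&2
      status=1
    fi
  done

  # Placeholder strings from the example values.
  if grep -Eq 'your-(proxmox|token-id|token-secret)' "$file"; then
    echo "placeholder values still present in $file" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_tfvars terraform.tfvars && echo "tfvars looks complete"` prints the message only when every required key is set and no placeholder remains.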

3. Deploy Infrastructure

# Pre-flight checks
./scripts/deployment/pre-flight-check.sh

# Deploy full stack
./scripts/deployment/deploy-full-stack.sh

📁 Project Structure

├── terraform/                    # Terraform infrastructure code
│   ├── infrastructure/          # Proxmox VMs, networking
│   ├── kubernetes/              # K8s clusters, storage, ingress
│   ├── applications/            # Application deployments
│   ├── platform/                # Monitoring, backup, logging
│   ├── main.tf                  # Main configuration
│   ├── providers.tf             # Provider configurations
│   ├── variables.tf             # Variable definitions
│   ├── outputs.tf               # Output definitions
│   └── terraform.tfvars         # Environment variables
├── docs/                        # Documentation
│   ├── deployment/              # Deployment guides
│   ├── backup/                  # Backup documentation
│   ├── monitoring/              # Monitoring setup
│   └── troubleshooting/         # Troubleshooting guides
├── scripts/                     # Automation scripts
│   ├── deployment/              # Deployment scripts
│   ├── backup/                  # Backup and restore scripts
│   ├── maintenance/             # Maintenance scripts
│   └── troubleshooting/         # Troubleshooting scripts
├── configs/                     # Configuration files
│   ├── grafana/                 # Grafana dashboards
│   ├── prometheus/              # Prometheus configs
│   └── backup/                  # Backup configurations
└── README.md                    # This file

🔧 Components

Infrastructure (Terraform)

The Terraform configuration is organized into logical subdirectories for better maintainability:

Directory Structure

  • infrastructure/: Proxmox VMs, networking, and base infrastructure
  • kubernetes/: Kubernetes clusters, storage, and ingress configuration
  • applications/: Application deployments (Immich, media apps, automation)
  • platform/: Platform services (monitoring, backup, logging)

Core Files

  • main.tf: Main configuration and resource orchestration
  • providers.tf: Provider configurations (Proxmox, Kubernetes, Helm)
  • variables.tf: Input variable declarations
  • outputs.tf: Output value declarations
  • backend.tf: Backend configuration for state management
  • versions.tf: Terraform and provider version constraints

See terraform/README.md for detailed structure documentation.

Key Features

🔍 Monitoring & Observability

  • Prometheus: Metrics collection and alerting
  • Grafana: Visualization and dashboards
  • Loki: Log aggregation and analysis
  • Mimir: Long-term metrics storage

💾 Backup & Recovery

  • Automated Backups: Scheduled etcd and application data backups
  • Manual Backup Triggers: On-demand backup capabilities
  • Restoration Testing: Comprehensive restore validation
  • NFS Storage: Centralized backup storage with redundancy
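
As a rough illustration of the backup-then-validate idea, the sketch below archives a directory into a backup location and checks that the archive is readable. It is not the repository's backup implementation: `backup_dir` is a hypothetical function, the timestamped file name is an assumed convention, and the destination is assumed to be an already-mounted NFS path.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: archive a directory into a backup location with a
# UTC timestamp, then verify that the archive can be listed.
backup_dir() {
  local src="$1" dest="$2" name stamp archive
  name="$(basename "$src")"
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  archive="${dest}/${name}-${stamp}.tar.gz"

  mkdir -p "$dest" || return 1
  # -C keeps paths in the archive relative to the source's parent directory.
  tar -czf "$archive" -C "$(dirname "$src")" "$name" || return 1
  # Listing the archive is a cheap integrity check before trusting it.
  tar -tzf "$archive" >/dev/null || return 1
  echo "$archive"
}
```

The function prints the path of the archive it wrote, so a caller can log it or hand it to a later restore test.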

🌐 Networking

  • MetalLB: Layer 2 load balancing for bare-metal
  • Traefik Ingress: HTTP/HTTPS routing with automatic SSL
  • Network Policies: Secure inter-pod communication
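
For orientation, Layer 2 mode in recent MetalLB releases is configured with an IPAddressPool and an L2Advertisement resource. The sketch below prints such a pair for a given address range. It only illustrates the shape of the configuration: this project may create these resources through Terraform instead, and the `render_metallb_l2` function, the resource names `default-pool` and `default-l2`, and the address range in the usage note are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MetalLB Layer 2 manifests for an address range.
# The resource names below are illustrative, not taken from this project.
render_metallb_l2() {
  local range="$1"
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - ${range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
EOF
}
```

The output can be reviewed first and then piped to the cluster, for example `render_metallb_l2 "192.168.1.240-192.168.1.250" | kubectl apply -f -`, using a range that is free on your own network.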

📖 Documentation

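Detailed guides live under docs/:

  • Deployment: docs/deployment/
  • Backup & Recovery: docs/backup/
  • Monitoring: docs/monitoring/
  • Troubleshooting: docs/troubleshooting/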

🛠️ Common Operations

Deployment

# Navigate to terraform directory
cd terraform

# Full stack deployment
terraform init
terraform plan
terraform apply

# Or use deployment scripts from root
./scripts/deployment/deploy-full-stack.sh

Backup Operations

# Manual backup (all components)
./scripts/backup/trigger-manual-backup.sh

# Test backup restoration
./scripts/backup/test-backup-restoration.sh dry-run

# Restore specific component
./scripts/backup/test-individual-restore.sh grafana

Maintenance

# Check NFS permissions
./scripts/maintenance/test-nfs-permissions.sh

# Update Grafana dashboards
./scripts/maintenance/update-grafana-dashboards.sh

Troubleshooting

# Diagnose NFS access issues
./scripts/troubleshooting/diagnose-nfs-access.sh

# Fix kubeconfig secret encoding
./scripts/troubleshooting/fix-kubeconfig-secret.sh

🔐 Security Considerations

  • SSH Key Authentication: Password authentication disabled by default
  • Network Segmentation: VLANs and network policies for isolation
  • Secret Management: Kubernetes secrets for sensitive data
  • Backup Encryption: Consider encrypting backup data at rest
  • Access Control: RBAC policies for service accounts

📊 Monitoring & Alerting

Default Dashboards

  • Kubernetes Cluster Overview: Node and pod metrics
  • Backup Monitoring: Backup status and performance
  • Application Metrics: Component-specific dashboards

Key Metrics

  • Cluster resource utilization
  • Backup success/failure rates
  • Network device discovery status
  • Application performance metrics

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

  • Documentation: Check the docs/ directory for detailed guides
  • Issues: Report bugs and feature requests via GitHub issues
  • Troubleshooting: Use the troubleshooting scripts in scripts/troubleshooting/

🏷️ Version

Current version: 1.0.0

📝 Changelog

See CHANGELOG.md for version history and updates.


Note: This infrastructure is designed for production use but should be thoroughly tested in your environment before deployment. Always follow your organization's security and operational guidelines.
