Compare commits

..

13 Commits

f6f8b04785 fix(tui): improve restore error display formatting
- Parse and structure error messages for clean TUI display
- Extract error type, message, hint, and failed databases
- Show specific recommendations based on error type
- Fix for 'out of shared memory' - suggest profile settings
- Limit line width to prevent scrambled display
- Add structured sections: Error Details, Diagnosis, Recommendations
2026-01-18 11:48:07 +01:00
670c9af2e7 feat(tui): add resource profiles for backup/restore operations
- Add memory detection for Linux, macOS, Windows
- Add 5 predefined profiles: conservative, balanced, performance, max-performance, large-db
- Add Resource Profile and Cluster Parallelism settings in TUI
- Add quick hotkeys: 'l' for large-db, 'c' for conservative, 'p' for recommendations
- Display system resources (CPU cores, memory) and recommended profile
- Auto-detect and recommend profile based on system resources

Fixes 'out of shared memory' errors when restoring large databases on small VMs.
Use 'large-db' or 'conservative' profile for large databases on constrained systems.
2026-01-18 11:38:24 +01:00
e2cf9adc62 fix: improve cleanup toggle UX when database detection fails
- Allow cleanup toggle even when preview detection failed
- Show 'detection pending' message instead of blocking the toggle
- Will re-detect databases at restore execution time
- Always show cleanup toggle option for cluster restores
- Better messaging: 'enabled/disabled' instead of showing 0 count
2026-01-17 17:07:26 +01:00
29e089fe3b fix: re-detect databases at execution time for cluster cleanup
- Detection in preview may fail or return stale results
- Re-detect user databases when cleanup is enabled at execution time
- Fall back to preview list if re-detection fails
- Ensures actual databases are dropped, not just what was detected earlier
2026-01-17 17:00:28 +01:00
9396c8e605 fix: add debug logging for database detection
- Always set cmd.Env to preserve PGPASSWORD from environment
- Add debug logging for connection parameters and results
- Helps diagnose cluster restore database detection issues
2026-01-17 16:54:20 +01:00
e363e1937f fix: cluster restore database detection and TUI error display
- Fixed psql connection for database detection (always use -h flag)
- Use CombinedOutput() to capture stderr for better diagnostics
- Added existingDBError tracking in restore preview
- Show 'Unable to detect' instead of misleading 'None' when listing fails
- Disable cleanup toggle when database detection failed
2026-01-17 16:44:44 +01:00
df1ab2f55b feat: TUI improvements and consistency fixes
- Add product branding header to main menu (version + tagline)
- Fix backup success/error report formatting consistency
- Remove extra newline before error box in backup_exec
- Align backup and restore completion screens
2026-01-17 16:26:00 +01:00
0e050b2def fix: cluster backup TUI success report formatting consistency
- Aligned box width and indentation with restore success screen
- Removed inconsistent 2-space prefix from success/error boxes
- Standardized content indentation to 4 spaces
- Moved timing section outside else block (always shown)
- Updated footer style to match restore screen
2026-01-17 16:15:16 +01:00
62d58c77af feat(restore): add --parallel-dbs=-1 auto-detection based on CPU/RAM
- Add CalculateOptimalParallel() function to preflight.go
- Calculates optimal workers: min(RAM/3GB, CPU cores), capped at 16
- Reduces parallelism by 50% if memory pressure >80%
- Add -1 flag value for auto-detection mode
- Preflight summary now shows CPU cores and recommended parallel
2026-01-17 13:41:28 +01:00
c5be9bcd2b fix(grafana): update dashboard queries and thresholds
- Fix Last Backup Status panel to use bool modifier for proper 1/0 values
- Change RPO threshold from 24h to 7 days (604800s) for status check
- Clean up table transformations to exclude duplicate fields
- Update variable refresh to trigger on time range change
2026-01-17 13:24:54 +01:00
b120f1507e style: format struct field alignment
2026-01-17 11:44:05 +01:00
dd1db844ce fix: improve lock capacity calculation for smaller VMs
- Fix boostLockCapacity: max_locks_per_transaction requires RESTART, not reload
- Calculate total lock capacity: max_locks × (max_connections + max_prepared_txns)
- Add TotalLockCapacity to preflight checks with warning if < 200,000
- Update error hints to explain capacity formula and recommend 4096+ for small VMs
- Show max_connections and total capacity in preflight summary

Fixes 'out of shared memory' lock-table errors on VMs with reduced resources
2026-01-17 07:48:17 +01:00
4ea3ec2cf8 fix(preflight): improve BLOB count detection and block restore when max_locks_per_transaction is critically low
- Add higher lock boost tiers for extreme BLOB counts (100K+, 200K+)
- Calculate actual lock requirement: (totalBLOBs / max_connections) * 1.5
- Block restore with CRITICAL error when BLOB count exceeds 2x safe limit
- Improve preflight summary with PASSED/FAILED status display
- Add clearer fix instructions for max_locks_per_transaction
2026-01-17 07:25:45 +01:00
14 changed files with 1295 additions and 150 deletions

View File

@@ -4,8 +4,8 @@ This directory contains pre-compiled binaries for the DB Backup Tool across mult
 ## Build Information
 - **Version**: 3.42.50
-- **Build Time**: 2026-01-16_17:31:38_UTC
+- **Build Time**: 2026-01-18_10:38:39_UTC
-- **Git Commit**: 698b8a7
+- **Git Commit**: 670c9af
 ## Recent Updates (v1.1.0)
 - ✅ Fixed TUI progress display with line-by-line output

View File

@@ -290,7 +290,7 @@ func init() {
     restoreClusterCmd.Flags().BoolVar(&restoreForce, "force", false, "Skip safety checks and confirmations")
     restoreClusterCmd.Flags().BoolVar(&restoreCleanCluster, "clean-cluster", false, "Drop all existing user databases before restore (disaster recovery)")
     restoreClusterCmd.Flags().IntVar(&restoreJobs, "jobs", 0, "Number of parallel decompression jobs (0 = auto)")
-    restoreClusterCmd.Flags().IntVar(&restoreParallelDBs, "parallel-dbs", 0, "Number of databases to restore in parallel (0 = use config default, 1 = sequential)")
+    restoreClusterCmd.Flags().IntVar(&restoreParallelDBs, "parallel-dbs", 0, "Number of databases to restore in parallel (0 = use config default, 1 = sequential, -1 = auto-detect based on CPU/RAM)")
     restoreClusterCmd.Flags().StringVar(&restoreWorkdir, "workdir", "", "Working directory for extraction (use when system disk is small, e.g. /mnt/storage/restore_tmp)")
     restoreClusterCmd.Flags().BoolVar(&restoreVerbose, "verbose", false, "Show detailed restore progress")
     restoreClusterCmd.Flags().BoolVar(&restoreNoProgress, "no-progress", false, "Disable progress indicators")
@@ -786,7 +786,12 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
     }
     // Override cluster parallelism if --parallel-dbs is specified
-    if restoreParallelDBs > 0 {
+    if restoreParallelDBs == -1 {
+        // Auto-detect optimal parallelism based on system resources
+        autoParallel := restore.CalculateOptimalParallel()
+        cfg.ClusterParallelism = autoParallel
+        log.Info("Auto-detected optimal parallelism for database restores", "parallel_dbs", autoParallel, "mode", "auto")
+    } else if restoreParallelDBs > 0 {
         cfg.ClusterParallelism = restoreParallelDBs
         log.Info("Using custom parallelism for database restores", "parallel_dbs", restoreParallelDBs)
     }
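With this change, --parallel-dbs has three modes: a positive value pins the database-level parallelism, 0 defers to the config default, and -1 delegates to restore.CalculateOptimalParallel() (added later in this changeset) to size the worker pool from CPU cores and available RAM. A hypothetical invocation — the exact subcommand spelling behind restoreClusterCmd is an assumption here:

dbbackup restore cluster --parallel-dbs=-1 --workdir /mnt/storage/restore_tmp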

View File

@@ -94,7 +94,7 @@
"uid": "${DS_PROMETHEUS}" "uid": "${DS_PROMETHEUS}"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < 86400", "expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < bool 604800",
"legendFormat": "{{database}}", "legendFormat": "{{database}}",
"range": true, "range": true,
"refId": "A" "refId": "A"
@@ -711,19 +711,6 @@
}, },
"pluginVersion": "10.2.0", "pluginVersion": "10.2.0",
"targets": [ "targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < 86400",
"format": "table",
"instant": true,
"legendFormat": "__auto",
"range": false,
"refId": "Status"
},
{ {
"datasource": { "datasource": {
"type": "prometheus", "type": "prometheus",
@@ -769,26 +756,30 @@
"Time": true, "Time": true,
"Time 1": true, "Time 1": true,
"Time 2": true, "Time 2": true,
"Time 3": true,
"__name__": true, "__name__": true,
"__name__ 1": true, "__name__ 1": true,
"__name__ 2": true, "__name__ 2": true,
"__name__ 3": true,
"instance 1": true, "instance 1": true,
"instance 2": true, "instance 2": true,
"instance 3": true,
"job": true, "job": true,
"job 1": true, "job 1": true,
"job 2": true, "job 2": true,
"job 3": true "engine 1": true,
"engine 2": true
},
"indexByName": {
"Database": 0,
"Instance": 1,
"Engine": 2,
"RPO": 3,
"Size": 4
}, },
"indexByName": {},
"renameByName": { "renameByName": {
"Value #RPO": "RPO", "Value #RPO": "RPO",
"Value #Size": "Size", "Value #Size": "Size",
"Value #Status": "Status",
"database": "Database", "database": "Database",
"instance": "Instance" "instance": "Instance",
"engine": "Engine"
} }
} }
} }
@@ -1275,7 +1266,7 @@
"query": "label_values(dbbackup_rpo_seconds, instance)", "query": "label_values(dbbackup_rpo_seconds, instance)",
"refId": "StandardVariableQuery" "refId": "StandardVariableQuery"
}, },
"refresh": 1, "refresh": 2,
"regex": "", "regex": "",
"skipUrlSync": false, "skipUrlSync": false,
"sort": 1, "sort": 1,

View File

@@ -68,8 +68,8 @@ func ClassifyError(errorMsg string) *ErrorClassification {
Type: "critical", Type: "critical",
Category: "locks", Category: "locks",
Message: errorMsg, Message: errorMsg,
Hint: "Lock table exhausted - typically caused by large objects (BLOBs) during restore", Hint: "Lock table exhausted. Total capacity = max_locks_per_transaction × (max_connections + max_prepared_transactions). If you reduced VM size or max_connections, you need higher max_locks_per_transaction to compensate.",
Action: "Option 1: Increase max_locks_per_transaction to 1024+ in postgresql.conf (requires restart). Option 2: Update dbbackup and retry - phased restore now auto-enabled for BLOB databases", Action: "Fix: ALTER SYSTEM SET max_locks_per_transaction = 4096; then RESTART PostgreSQL. For smaller VMs with fewer connections, you need higher max_locks_per_transaction values.",
Severity: 2, Severity: 2,
} }
case "permission_denied": case "permission_denied":
@@ -142,8 +142,8 @@ func ClassifyError(errorMsg string) *ErrorClassification {
Type: "critical", Type: "critical",
Category: "locks", Category: "locks",
Message: errorMsg, Message: errorMsg,
Hint: "Lock table exhausted - typically caused by large objects (BLOBs) during restore", Hint: "Lock table exhausted. Total capacity = max_locks_per_transaction × (max_connections + max_prepared_transactions). If you reduced VM size or max_connections, you need higher max_locks_per_transaction to compensate.",
Action: "Option 1: Increase max_locks_per_transaction to 1024+ in postgresql.conf (requires restart). Option 2: Update dbbackup and retry - phased restore now auto-enabled for BLOB databases", Action: "Fix: ALTER SYSTEM SET max_locks_per_transaction = 4096; then RESTART PostgreSQL. For smaller VMs with fewer connections, you need higher max_locks_per_transaction values.",
Severity: 2, Severity: 2,
} }
} }
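To make the new hint concrete: on PostgreSQL defaults (max_locks_per_transaction = 64, max_connections = 100, max_prepared_transactions = 0) the lock table holds 64 × (100 + 0) = 6,400 locks. The suggested ALTER SYSTEM SET max_locks_per_transaction = 4096 raises that to 4096 × 100 = 409,600 — comfortably above the 200,000 floor the preflight checks in this changeset use.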

View File

@@ -36,9 +36,13 @@ type Config struct {
     AutoDetectCores bool
     CPUWorkloadType string // "cpu-intensive", "io-intensive", "balanced"
+    // Resource profile for backup/restore operations
+    ResourceProfile string // "conservative", "balanced", "performance", "max-performance", "large-db"
     // CPU detection
     CPUDetector *cpu.Detector
     CPUInfo     *cpu.CPUInfo
+    MemoryInfo  *cpu.MemoryInfo // System memory information
     // Sample backup options
     SampleStrategy string // "ratio", "percent", "count"
@@ -178,6 +182,13 @@ func New() *Config {
         sslMode = ""
     }
+    // Detect memory information
+    memInfo, _ := cpu.DetectMemory()
+    // Determine recommended resource profile
+    recommendedProfile := cpu.RecommendProfile(cpuInfo, memInfo, false)
+    defaultProfile := getEnvString("RESOURCE_PROFILE", recommendedProfile.Name)
     cfg := &Config{
         // Database defaults
         Host: host,
@@ -197,10 +208,12 @@ func New() *Config {
         MaxCores:        getEnvInt("MAX_CORES", getDefaultMaxCores(cpuInfo)),
         AutoDetectCores: getEnvBool("AUTO_DETECT_CORES", true),
         CPUWorkloadType: getEnvString("CPU_WORKLOAD_TYPE", "balanced"),
+        ResourceProfile: defaultProfile,
-        // CPU detection
+        // CPU and memory detection
         CPUDetector: cpuDetector,
         CPUInfo:     cpuInfo,
+        MemoryInfo:  memInfo,
         // Sample backup defaults
         SampleStrategy: getEnvString("SAMPLE_STRATEGY", "ratio"),
@@ -409,6 +422,45 @@ func (c *Config) OptimizeForCPU() error {
     return nil
 }
+// ApplyResourceProfile applies a resource profile to the configuration
+// This adjusts parallelism settings based on the chosen profile
+func (c *Config) ApplyResourceProfile(profileName string) error {
+    profile := cpu.GetProfileByName(profileName)
+    if profile == nil {
+        return &ConfigError{
+            Field:   "resource_profile",
+            Value:   profileName,
+            Message: "unknown profile. Valid profiles: conservative, balanced, performance, max-performance, large-db",
+        }
+    }
+    // Validate profile against current system
+    isValid, warnings := cpu.ValidateProfileForSystem(profile, c.CPUInfo, c.MemoryInfo)
+    if !isValid {
+        // Log warnings but don't block - user may know what they're doing
+        _ = warnings // In production, log these warnings
+    }
+    // Apply profile settings
+    c.ResourceProfile = profile.Name
+    c.ClusterParallelism = profile.ClusterParallelism
+    c.Jobs = profile.Jobs
+    c.DumpJobs = profile.DumpJobs
+    return nil
+}
+// GetResourceProfileRecommendation returns the recommended profile and reason
+func (c *Config) GetResourceProfileRecommendation(isLargeDB bool) (string, string) {
+    profile, reason := cpu.RecommendProfileWithReason(c.CPUInfo, c.MemoryInfo, isLargeDB)
+    return profile.Name, reason
+}
+// GetCurrentProfile returns the current resource profile details
+func (c *Config) GetCurrentProfile() *cpu.ResourceProfile {
+    return cpu.GetProfileByName(c.ResourceProfile)
+}
 // GetCPUInfo returns CPU information, detecting if necessary
 func (c *Config) GetCPUInfo() (*cpu.CPUInfo, error) {
     if c.CPUInfo != nil {
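A minimal sketch of how this new config surface might be driven from a call site. The import path and the main wrapper are assumptions for illustration; the functions and fields are the ones added in this diff:

package main

import (
	"fmt"
	"log"

	"dbbackup/internal/config" // import path assumed for this sketch
)

func main() {
	cfg := config.New() // honors RESOURCE_PROFILE env var, else the auto-recommended profile

	// Ask which profile fits a large database on this host, then apply it.
	name, reason := cfg.GetResourceProfileRecommendation(true)
	fmt.Println("recommended:", name, "-", reason)

	if err := cfg.ApplyResourceProfile(name); err != nil {
		log.Fatal(err) // only fails for an unknown profile name
	}
	fmt.Println("cluster parallelism:", cfg.ClusterParallelism, "jobs:", cfg.Jobs)
}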

internal/cpu/profiles.go (new file, +445 lines)
View File

@@ -0,0 +1,445 @@
package cpu
import (
"bufio"
"fmt"
"os"
"os/exec"
"runtime"
"strconv"
"strings"
)
// MemoryInfo holds system memory information
type MemoryInfo struct {
TotalBytes int64 `json:"total_bytes"`
AvailableBytes int64 `json:"available_bytes"`
FreeBytes int64 `json:"free_bytes"`
UsedBytes int64 `json:"used_bytes"`
SwapTotalBytes int64 `json:"swap_total_bytes"`
SwapFreeBytes int64 `json:"swap_free_bytes"`
TotalGB int `json:"total_gb"`
AvailableGB int `json:"available_gb"`
Platform string `json:"platform"`
}
// ResourceProfile defines a resource allocation profile for backup/restore operations
type ResourceProfile struct {
Name string `json:"name"`
Description string `json:"description"`
ClusterParallelism int `json:"cluster_parallelism"` // Concurrent databases
Jobs int `json:"jobs"` // Parallel jobs within pg_restore
DumpJobs int `json:"dump_jobs"` // Parallel jobs for pg_dump
MaintenanceWorkMem string `json:"maintenance_work_mem"` // PostgreSQL recommendation
MaxLocksPerTxn int `json:"max_locks_per_txn"` // PostgreSQL recommendation
RecommendedForLarge bool `json:"recommended_for_large"` // Suitable for large DBs?
MinMemoryGB int `json:"min_memory_gb"` // Minimum memory for this profile
MinCores int `json:"min_cores"` // Minimum cores for this profile
}
// Predefined resource profiles
var (
// ProfileConservative - Safe for constrained VMs, avoids shared memory issues
ProfileConservative = ResourceProfile{
Name: "conservative",
Description: "Safe for small VMs (2-4 cores, <16GB). Sequential operations, minimal memory pressure. Best for large DBs on limited hardware.",
ClusterParallelism: 1,
Jobs: 1,
DumpJobs: 2,
MaintenanceWorkMem: "256MB",
MaxLocksPerTxn: 4096,
RecommendedForLarge: true,
MinMemoryGB: 4,
MinCores: 2,
}
// ProfileBalanced - Default profile, works for most scenarios
ProfileBalanced = ResourceProfile{
Name: "balanced",
Description: "Balanced for medium VMs (4-8 cores, 16-32GB). Moderate parallelism with good safety margin.",
ClusterParallelism: 2,
Jobs: 2,
DumpJobs: 4,
MaintenanceWorkMem: "512MB",
MaxLocksPerTxn: 2048,
RecommendedForLarge: true,
MinMemoryGB: 16,
MinCores: 4,
}
// ProfilePerformance - Aggressive parallelism for powerful servers
ProfilePerformance = ResourceProfile{
Name: "performance",
Description: "Aggressive for powerful servers (8+ cores, 32GB+). Maximum parallelism for fast operations.",
ClusterParallelism: 4,
Jobs: 4,
DumpJobs: 8,
MaintenanceWorkMem: "1GB",
MaxLocksPerTxn: 1024,
RecommendedForLarge: false, // Large DBs may still need conservative
MinMemoryGB: 32,
MinCores: 8,
}
// ProfileMaxPerformance - Maximum parallelism for high-end servers
ProfileMaxPerformance = ResourceProfile{
Name: "max-performance",
Description: "Maximum for high-end servers (16+ cores, 64GB+). Full CPU utilization.",
ClusterParallelism: 8,
Jobs: 8,
DumpJobs: 16,
MaintenanceWorkMem: "2GB",
MaxLocksPerTxn: 512,
RecommendedForLarge: false, // Large DBs should use conservative
MinMemoryGB: 64,
MinCores: 16,
}
// ProfileLargeDB - Optimized specifically for large databases
ProfileLargeDB = ResourceProfile{
Name: "large-db",
Description: "Optimized for large databases with many tables/BLOBs. Prevents 'out of shared memory' errors.",
ClusterParallelism: 1,
Jobs: 2,
DumpJobs: 2,
MaintenanceWorkMem: "1GB",
MaxLocksPerTxn: 8192,
RecommendedForLarge: true,
MinMemoryGB: 8,
MinCores: 2,
}
// AllProfiles contains all available profiles
AllProfiles = []ResourceProfile{
ProfileConservative,
ProfileBalanced,
ProfilePerformance,
ProfileMaxPerformance,
ProfileLargeDB,
}
)
// GetProfileByName returns a profile by its name
func GetProfileByName(name string) *ResourceProfile {
for _, p := range AllProfiles {
if strings.EqualFold(p.Name, name) {
return &p
}
}
return nil
}
// DetectMemory detects system memory information
func DetectMemory() (*MemoryInfo, error) {
info := &MemoryInfo{
Platform: runtime.GOOS,
}
switch runtime.GOOS {
case "linux":
if err := detectLinuxMemory(info); err != nil {
return info, fmt.Errorf("linux memory detection failed: %w", err)
}
case "darwin":
if err := detectDarwinMemory(info); err != nil {
return info, fmt.Errorf("darwin memory detection failed: %w", err)
}
case "windows":
if err := detectWindowsMemory(info); err != nil {
return info, fmt.Errorf("windows memory detection failed: %w", err)
}
default:
// Fallback: use Go runtime memory stats
var memStats runtime.MemStats
runtime.ReadMemStats(&memStats)
info.TotalBytes = int64(memStats.Sys)
info.AvailableBytes = int64(memStats.Sys - memStats.Alloc)
}
// Calculate GB values
info.TotalGB = int(info.TotalBytes / (1024 * 1024 * 1024))
info.AvailableGB = int(info.AvailableBytes / (1024 * 1024 * 1024))
return info, nil
}
// detectLinuxMemory reads memory info from /proc/meminfo
func detectLinuxMemory(info *MemoryInfo) error {
file, err := os.Open("/proc/meminfo")
if err != nil {
return err
}
defer file.Close()
scanner := bufio.NewScanner(file)
for scanner.Scan() {
line := scanner.Text()
parts := strings.Fields(line)
if len(parts) < 2 {
continue
}
key := strings.TrimSuffix(parts[0], ":")
value, err := strconv.ParseInt(parts[1], 10, 64)
if err != nil {
continue
}
// Values are in kB
valueBytes := value * 1024
switch key {
case "MemTotal":
info.TotalBytes = valueBytes
case "MemAvailable":
info.AvailableBytes = valueBytes
case "MemFree":
info.FreeBytes = valueBytes
case "SwapTotal":
info.SwapTotalBytes = valueBytes
case "SwapFree":
info.SwapFreeBytes = valueBytes
}
}
info.UsedBytes = info.TotalBytes - info.AvailableBytes
return scanner.Err()
}
// detectDarwinMemory detects memory on macOS
func detectDarwinMemory(info *MemoryInfo) error {
// Use sysctl for total memory
if output, err := runCommand("sysctl", "-n", "hw.memsize"); err == nil {
if val, err := strconv.ParseInt(strings.TrimSpace(output), 10, 64); err == nil {
info.TotalBytes = val
}
}
// Use vm_stat for available memory (more complex parsing required)
if output, err := runCommand("vm_stat"); err == nil {
pageSize := int64(4096) // Default page size
var freePages, inactivePages int64
lines := strings.Split(output, "\n")
for _, line := range lines {
if strings.Contains(line, "page size of") {
parts := strings.Fields(line)
for i, p := range parts {
if p == "of" && i+1 < len(parts) {
if ps, err := strconv.ParseInt(parts[i+1], 10, 64); err == nil {
pageSize = ps
}
}
}
} else if strings.Contains(line, "Pages free:") {
val := extractNumberFromLine(line)
freePages = val
} else if strings.Contains(line, "Pages inactive:") {
val := extractNumberFromLine(line)
inactivePages = val
}
}
info.FreeBytes = freePages * pageSize
info.AvailableBytes = (freePages + inactivePages) * pageSize
}
info.UsedBytes = info.TotalBytes - info.AvailableBytes
return nil
}
// detectWindowsMemory detects memory on Windows
func detectWindowsMemory(info *MemoryInfo) error {
// Use wmic for memory info
if output, err := runCommand("wmic", "OS", "get", "TotalVisibleMemorySize,FreePhysicalMemory", "/format:list"); err == nil {
lines := strings.Split(output, "\n")
for _, line := range lines {
line = strings.TrimSpace(line)
if strings.HasPrefix(line, "TotalVisibleMemorySize=") {
val := strings.TrimPrefix(line, "TotalVisibleMemorySize=")
if v, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64); err == nil {
info.TotalBytes = v * 1024 // KB to bytes
}
} else if strings.HasPrefix(line, "FreePhysicalMemory=") {
val := strings.TrimPrefix(line, "FreePhysicalMemory=")
if v, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64); err == nil {
info.FreeBytes = v * 1024
info.AvailableBytes = v * 1024
}
}
}
}
info.UsedBytes = info.TotalBytes - info.AvailableBytes
return nil
}
// RecommendProfile recommends a resource profile based on system resources and workload
func RecommendProfile(cpuInfo *CPUInfo, memInfo *MemoryInfo, isLargeDB bool) *ResourceProfile {
cores := 0
if cpuInfo != nil {
cores = cpuInfo.PhysicalCores
if cores == 0 {
cores = cpuInfo.LogicalCores
}
}
if cores == 0 {
cores = runtime.NumCPU()
}
memGB := 0
if memInfo != nil {
memGB = memInfo.TotalGB
}
// Special case: large databases should always use conservative/large-db profile
if isLargeDB {
if memGB >= 32 && cores >= 8 {
return &ProfileLargeDB // Still conservative but with more memory for maintenance
}
return &ProfileConservative
}
// Resource-based selection
if cores >= 16 && memGB >= 64 {
return &ProfileMaxPerformance
} else if cores >= 8 && memGB >= 32 {
return &ProfilePerformance
} else if cores >= 4 && memGB >= 16 {
return &ProfileBalanced
}
// Default to conservative for constrained systems
return &ProfileConservative
}
// RecommendProfileWithReason returns a profile recommendation with explanation
func RecommendProfileWithReason(cpuInfo *CPUInfo, memInfo *MemoryInfo, isLargeDB bool) (*ResourceProfile, string) {
cores := 0
if cpuInfo != nil {
cores = cpuInfo.PhysicalCores
if cores == 0 {
cores = cpuInfo.LogicalCores
}
}
if cores == 0 {
cores = runtime.NumCPU()
}
memGB := 0
if memInfo != nil {
memGB = memInfo.TotalGB
}
// Build reason string
var reason strings.Builder
reason.WriteString(fmt.Sprintf("System: %d cores, %dGB RAM. ", cores, memGB))
profile := RecommendProfile(cpuInfo, memInfo, isLargeDB)
if isLargeDB {
reason.WriteString("Large database detected - using conservative settings to avoid 'out of shared memory' errors.")
} else if profile.Name == "conservative" {
reason.WriteString("Limited resources detected - using conservative profile for stability.")
} else if profile.Name == "max-performance" {
reason.WriteString("High-end server detected - using maximum parallelism.")
} else if profile.Name == "performance" {
reason.WriteString("Good resources detected - using performance profile.")
} else {
reason.WriteString("Using balanced profile for optimal performance/stability trade-off.")
}
return profile, reason.String()
}
// ValidateProfileForSystem checks if a profile is suitable for the current system
func ValidateProfileForSystem(profile *ResourceProfile, cpuInfo *CPUInfo, memInfo *MemoryInfo) (bool, []string) {
var warnings []string
cores := 0
if cpuInfo != nil {
cores = cpuInfo.PhysicalCores
if cores == 0 {
cores = cpuInfo.LogicalCores
}
}
if cores == 0 {
cores = runtime.NumCPU()
}
memGB := 0
if memInfo != nil {
memGB = memInfo.TotalGB
}
// Check minimum requirements
if cores < profile.MinCores {
warnings = append(warnings,
fmt.Sprintf("Profile '%s' recommends %d+ cores (system has %d)", profile.Name, profile.MinCores, cores))
}
if memGB < profile.MinMemoryGB {
warnings = append(warnings,
fmt.Sprintf("Profile '%s' recommends %dGB+ RAM (system has %dGB)", profile.Name, profile.MinMemoryGB, memGB))
}
// Check for potential issues
if profile.ClusterParallelism > cores {
warnings = append(warnings,
fmt.Sprintf("Cluster parallelism (%d) exceeds CPU cores (%d) - may cause contention",
profile.ClusterParallelism, cores))
}
// Memory pressure warning
memPerWorker := 2 // Rough estimate: 2GB per parallel worker for large DB operations
requiredMem := profile.ClusterParallelism * profile.Jobs * memPerWorker
if memGB > 0 && requiredMem > memGB {
warnings = append(warnings,
fmt.Sprintf("High parallelism may require ~%dGB RAM (system has %dGB) - risk of OOM",
requiredMem, memGB))
}
return len(warnings) == 0, warnings
}
// FormatProfileSummary returns a formatted summary of a profile
func (p *ResourceProfile) FormatProfileSummary() string {
return fmt.Sprintf("[%s] Parallel: %d DBs, %d jobs | Recommended for large DBs: %v",
strings.ToUpper(p.Name),
p.ClusterParallelism,
p.Jobs,
p.RecommendedForLarge)
}
// PostgreSQLRecommendations returns PostgreSQL configuration recommendations for this profile
func (p *ResourceProfile) PostgreSQLRecommendations() []string {
return []string{
fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d;", p.MaxLocksPerTxn),
fmt.Sprintf("ALTER SYSTEM SET maintenance_work_mem = '%s';", p.MaintenanceWorkMem),
"-- Restart PostgreSQL after changes to max_locks_per_transaction",
}
}
// Helper functions
func runCommand(name string, args ...string) (string, error) {
cmd := exec.Command(name, args...)
output, err := cmd.Output()
if err != nil {
return "", err
}
return string(output), nil
}
func extractNumberFromLine(line string) int64 {
// Extract number before the period at end (e.g., "Pages free: 123456.")
parts := strings.Fields(line)
for _, p := range parts {
p = strings.TrimSuffix(p, ".")
if val, err := strconv.ParseInt(p, 10, 64); err == nil && val > 0 {
return val
}
}
return 0
}
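A short sketch of the intended flow through this new package (hypothetical call site; the import path and main wrapper are assumptions, the API is as defined above):

package main

import (
	"fmt"

	"dbbackup/internal/cpu" // import path assumed for this sketch
)

func main() {
	memInfo, err := cpu.DetectMemory()
	if err != nil {
		fmt.Println("memory detection degraded:", err) // partial info may still be populated
	}

	// A nil *CPUInfo is tolerated: RecommendProfile falls back to runtime.NumCPU().
	profile, reason := cpu.RecommendProfileWithReason(nil, memInfo, false)
	fmt.Println(profile.FormatProfileSummary())
	fmt.Println(reason)

	// PostgreSQL settings this profile suggests (restart needed for max_locks_per_transaction).
	for _, stmt := range profile.PostgreSQLRecommendations() {
		fmt.Println(stmt)
	}
}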

View File

@@ -2125,9 +2125,10 @@ func (e *Engine) quickValidateSQLDump(archivePath string, compressed bool) error
     return nil
 }
-// boostLockCapacity temporarily increases max_locks_per_transaction to prevent OOM
-// during large restores with many BLOBs. Returns the original value for later reset.
-// Uses ALTER SYSTEM + pg_reload_conf() so no restart is needed.
+// boostLockCapacity checks and reports on max_locks_per_transaction capacity.
+// IMPORTANT: max_locks_per_transaction requires a PostgreSQL RESTART to change!
+// This function now calculates total lock capacity based on max_connections and
+// warns the user if capacity is insufficient for the restore.
 func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
     // Connect to PostgreSQL to run system commands
     connStr := fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable",
@@ -2145,7 +2146,7 @@ func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
     }
     defer db.Close()
-    // Get current value
+    // Get current max_locks_per_transaction
     var currentValue int
     err = db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&currentValue)
     if err != nil {
@@ -2158,22 +2159,56 @@ func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
         fmt.Sscanf(currentValueStr, "%d", &currentValue)
     }
-    // Skip if already high enough
-    if currentValue >= 2048 {
-        e.log.Info("max_locks_per_transaction already sufficient", "value", currentValue)
-        return currentValue, nil
-    }
-    // Boost to 2048 (enough for most BLOB-heavy databases)
-    _, err = db.ExecContext(ctx, "ALTER SYSTEM SET max_locks_per_transaction = 2048")
-    if err != nil {
-        return currentValue, fmt.Errorf("failed to set max_locks_per_transaction: %w", err)
-    }
-    // Reload config without restart
-    _, err = db.ExecContext(ctx, "SELECT pg_reload_conf()")
-    if err != nil {
-        return currentValue, fmt.Errorf("failed to reload config: %w", err)
-    }
+    // Get max_connections to calculate total lock capacity
+    var maxConns int
+    if err := db.QueryRowContext(ctx, "SHOW max_connections").Scan(&maxConns); err != nil {
+        maxConns = 100 // default
+    }
+    // Get max_prepared_transactions
+    var maxPreparedTxns int
+    if err := db.QueryRowContext(ctx, "SHOW max_prepared_transactions").Scan(&maxPreparedTxns); err != nil {
+        maxPreparedTxns = 0
+    }
+    // Calculate total lock table capacity:
+    // Total locks = max_locks_per_transaction × (max_connections + max_prepared_transactions)
+    totalLockCapacity := currentValue * (maxConns + maxPreparedTxns)
+    e.log.Info("PostgreSQL lock table capacity",
+        "max_locks_per_transaction", currentValue,
+        "max_connections", maxConns,
+        "max_prepared_transactions", maxPreparedTxns,
+        "total_lock_capacity", totalLockCapacity)
+    // Minimum recommended total capacity for BLOB-heavy restores: 200,000 locks
+    minRecommendedCapacity := 200000
+    if totalLockCapacity < minRecommendedCapacity {
+        recommendedMaxLocks := minRecommendedCapacity / (maxConns + maxPreparedTxns)
+        if recommendedMaxLocks < 4096 {
+            recommendedMaxLocks = 4096
+        }
+        e.log.Warn("Lock table capacity may be insufficient for BLOB-heavy restores",
+            "current_total_capacity", totalLockCapacity,
+            "recommended_capacity", minRecommendedCapacity,
+            "current_max_locks", currentValue,
+            "recommended_max_locks", recommendedMaxLocks,
+            "note", "max_locks_per_transaction requires PostgreSQL RESTART to change")
+        // Write suggested fix to ALTER SYSTEM but warn about restart
+        _, err = db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", recommendedMaxLocks))
+        if err != nil {
+            e.log.Warn("Could not set recommended max_locks_per_transaction (needs superuser)", "error", err)
+        } else {
+            e.log.Warn("Wrote recommended max_locks_per_transaction to postgresql.auto.conf",
+                "value", recommendedMaxLocks,
+                "action", "RESTART PostgreSQL to apply: sudo systemctl restart postgresql")
+        }
+    } else {
+        e.log.Info("Lock table capacity is sufficient",
+            "total_capacity", totalLockCapacity,
+            "max_locks_per_transaction", currentValue)
+    }
     return currentValue, nil
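Worked through with numbers: a downsized VM running max_connections = 20 with max_locks_per_transaction = 64 has 64 × 20 = 1,280 total locks, so the function proposes 200,000 / 20 = 10,000 for max_locks_per_transaction. With the common default of 100 connections the quotient is 2,000, so the 4,096 floor applies instead. Either way the new value only takes effect after a restart, which is why the function merely writes it via ALTER SYSTEM and warns rather than reloading.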

View File

@@ -16,6 +16,57 @@ import (
"github.com/shirou/gopsutil/v3/mem" "github.com/shirou/gopsutil/v3/mem"
) )
// CalculateOptimalParallel returns the recommended number of parallel workers
// based on available system resources (CPU cores and RAM).
// This is a standalone function that can be called from anywhere.
// Returns 0 if resources cannot be detected.
func CalculateOptimalParallel() int {
cpuCores := runtime.NumCPU()
vmem, err := mem.VirtualMemory()
if err != nil {
// Fallback: use half of CPU cores if memory detection fails
if cpuCores > 1 {
return cpuCores / 2
}
return 1
}
memAvailableGB := float64(vmem.Available) / (1024 * 1024 * 1024)
// Each pg_restore worker needs approximately 2-4GB of RAM
// Use conservative 3GB per worker to avoid OOM
const memPerWorkerGB = 3.0
// Calculate limits
maxByMem := int(memAvailableGB / memPerWorkerGB)
maxByCPU := cpuCores
// Use the minimum of memory and CPU limits
recommended := maxByMem
if maxByCPU < recommended {
recommended = maxByCPU
}
// Apply sensible bounds
if recommended < 1 {
recommended = 1
}
if recommended > 16 {
recommended = 16 // Cap at 16 to avoid diminishing returns
}
// If memory pressure is high (>80%), reduce parallelism
if vmem.UsedPercent > 80 && recommended > 1 {
recommended = recommended / 2
if recommended < 1 {
recommended = 1
}
}
return recommended
}
// PreflightResult contains all preflight check results // PreflightResult contains all preflight check results
type PreflightResult struct { type PreflightResult struct {
// Linux system checks // Linux system checks
@@ -40,6 +91,8 @@ type LinuxChecks struct {
     MemTotal       uint64  // Total RAM in bytes
     MemAvailable   uint64  // Available RAM in bytes
     MemUsedPercent float64 // Memory usage percentage
+    CPUCores            int // Number of CPU cores
+    RecommendedParallel int // Auto-calculated optimal parallel count
     ShmMaxOK       bool // Is shmmax sufficient?
     ShmAllOK       bool // Is shmall sufficient?
     MemAvailableOK bool // Is available RAM sufficient?
@@ -49,6 +102,8 @@
 // PostgreSQLChecks contains PostgreSQL configuration checks
 type PostgreSQLChecks struct {
     MaxLocksPerTransaction int // Current setting
+    MaxPreparedTransactions int // Current setting (affects lock capacity)
+    TotalLockCapacity       int // Calculated: max_locks × (max_connections + max_prepared)
     MaintenanceWorkMem string // Current setting
     SharedBuffers      string // Current setting (info only)
     MaxConnections     int    // Current setting
@@ -98,6 +153,7 @@ func (e *Engine) RunPreflightChecks(ctx context.Context, dumpsDir string, entrie
 // checkSystemResources uses gopsutil for cross-platform system checks
 func (e *Engine) checkSystemResources(result *PreflightResult) {
     result.Linux.IsLinux = runtime.GOOS == "linux"
+    result.Linux.CPUCores = runtime.NumCPU()
     // Get memory info (works on Linux, macOS, Windows, BSD)
     if vmem, err := mem.VirtualMemory(); err == nil {
@@ -116,6 +172,9 @@ func (e *Engine) checkSystemResources(result *PreflightResult) {
         e.log.Warn("Could not detect system memory", "error", err)
     }
+    // Calculate recommended parallel based on resources
+    result.Linux.RecommendedParallel = e.calculateRecommendedParallel(result)
     // Linux-specific kernel checks (shmmax, shmall)
     if result.Linux.IsLinux {
         e.checkLinuxKernel(result)
@@ -201,6 +260,29 @@ func (e *Engine) checkPostgreSQL(ctx context.Context, result *PreflightResult) {
     result.PostgreSQL.IsSuperuser = isSuperuser
 }
+    // Check max_prepared_transactions for lock capacity calculation
+    var maxPreparedTxns string
+    if err := db.QueryRowContext(ctx, "SHOW max_prepared_transactions").Scan(&maxPreparedTxns); err == nil {
+        result.PostgreSQL.MaxPreparedTransactions, _ = strconv.Atoi(maxPreparedTxns)
+    }
+    // CRITICAL: Calculate TOTAL lock table capacity
+    // Formula: max_locks_per_transaction × (max_connections + max_prepared_transactions)
+    // This is THE key capacity metric for BLOB-heavy restores
+    maxConns := result.PostgreSQL.MaxConnections
+    if maxConns == 0 {
+        maxConns = 100 // default
+    }
+    maxPrepared := result.PostgreSQL.MaxPreparedTransactions
+    totalLockCapacity := result.PostgreSQL.MaxLocksPerTransaction * (maxConns + maxPrepared)
+    result.PostgreSQL.TotalLockCapacity = totalLockCapacity
+    e.log.Info("PostgreSQL lock table capacity",
+        "max_locks_per_transaction", result.PostgreSQL.MaxLocksPerTransaction,
+        "max_connections", maxConns,
+        "max_prepared_transactions", maxPrepared,
+        "total_lock_capacity", totalLockCapacity)
 // CRITICAL: max_locks_per_transaction requires PostgreSQL RESTART to change!
 // Warn users loudly about this - it's the #1 cause of "out of shared memory" errors
 if result.PostgreSQL.MaxLocksPerTransaction < 256 {
@@ -212,10 +294,38 @@ func (e *Engine) checkPostgreSQL(ctx context.Context, result *PreflightResult) {
     result.Warnings = append(result.Warnings,
         fmt.Sprintf("max_locks_per_transaction=%d is low (recommend 256+). "+
             "This setting requires PostgreSQL RESTART to change. "+
-            "BLOB-heavy databases may fail with 'out of shared memory' error.",
+            "BLOB-heavy databases may fail with 'out of shared memory' error. "+
+            "Fix: Edit postgresql.conf, set max_locks_per_transaction=2048, then restart PostgreSQL.",
             result.PostgreSQL.MaxLocksPerTransaction))
 }
+    // NEW: Check total lock capacity is sufficient for typical BLOB operations
+    // Minimum recommended: 200,000 for moderate BLOB databases
+    minRecommendedCapacity := 200000
+    if totalLockCapacity < minRecommendedCapacity {
+        recommendedMaxLocks := minRecommendedCapacity / (maxConns + maxPrepared)
+        if recommendedMaxLocks < 4096 {
+            recommendedMaxLocks = 4096
+        }
+        e.log.Warn("Total lock table capacity is LOW for BLOB-heavy restores",
+            "current_capacity", totalLockCapacity,
+            "recommended", minRecommendedCapacity,
+            "current_max_locks", result.PostgreSQL.MaxLocksPerTransaction,
+            "current_max_connections", maxConns,
+            "recommended_max_locks", recommendedMaxLocks,
+            "note", "VMs with fewer connections need higher max_locks_per_transaction")
+        result.Warnings = append(result.Warnings,
+            fmt.Sprintf("Total lock capacity=%d is low (recommend %d+). "+
+                "Capacity = max_locks_per_transaction(%d) × max_connections(%d). "+
+                "If you reduced VM size/connections, increase max_locks_per_transaction to %d. "+
+                "Fix: ALTER SYSTEM SET max_locks_per_transaction = %d; then RESTART PostgreSQL.",
+                totalLockCapacity, minRecommendedCapacity,
+                result.PostgreSQL.MaxLocksPerTransaction, maxConns,
+                recommendedMaxLocks, recommendedMaxLocks))
+    }
 // Parse shared_buffers and warn if very low
 sharedBuffersMB := parseMemoryToMB(result.PostgreSQL.SharedBuffers)
 if sharedBuffersMB > 0 && sharedBuffersMB < 256 {
@@ -324,20 +434,113 @@ func (e *Engine) calculateRecommendations(result *PreflightResult) {
     if result.Archive.TotalBlobCount > 50000 {
         lockBoost = 16384
     }
+    if result.Archive.TotalBlobCount > 100000 {
+        lockBoost = 32768
+    }
+    if result.Archive.TotalBlobCount > 200000 {
+        lockBoost = 65536
+    }
-    // Cap at reasonable maximum
-    if lockBoost > 16384 {
-        lockBoost = 16384
-    }
+    // For extreme cases, calculate actual requirement
+    // Rule of thumb: ~1 lock per BLOB, divided by max_connections (default 100)
+    // Add 50% safety margin
+    maxConns := result.PostgreSQL.MaxConnections
+    if maxConns == 0 {
+        maxConns = 100 // default
+    }
+    calculatedLocks := (result.Archive.TotalBlobCount / maxConns) * 3 / 2 // 1.5x safety margin
+    if calculatedLocks > lockBoost {
+        lockBoost = calculatedLocks
+    }
     result.Archive.RecommendedLockBoost = lockBoost
+    // CRITICAL: Check if current max_locks_per_transaction is dangerously low for this BLOB count
+    currentLocks := result.PostgreSQL.MaxLocksPerTransaction
+    if currentLocks > 0 && result.Archive.TotalBlobCount > 0 {
+        // Estimate max BLOBs we can handle: locks * max_connections
+        maxSafeBLOBs := currentLocks * maxConns
+        if result.Archive.TotalBlobCount > maxSafeBLOBs {
+            severity := "WARNING"
+            if result.Archive.TotalBlobCount > maxSafeBLOBs*2 {
+                severity = "CRITICAL"
+                result.CanProceed = false
+            }
+            e.log.Error(fmt.Sprintf("%s: max_locks_per_transaction too low for BLOB count", severity),
+                "current_max_locks", currentLocks,
+                "total_blobs", result.Archive.TotalBlobCount,
+                "max_safe_blobs", maxSafeBLOBs,
+                "recommended_max_locks", lockBoost)
+            result.Errors = append(result.Errors,
+                fmt.Sprintf("%s: Archive contains %s BLOBs but max_locks_per_transaction=%d can only safely handle ~%s. "+
+                    "Increase max_locks_per_transaction to %d in postgresql.conf and RESTART PostgreSQL.",
+                    severity,
+                    humanize.Comma(int64(result.Archive.TotalBlobCount)),
+                    currentLocks,
+                    humanize.Comma(int64(maxSafeBLOBs)),
+                    lockBoost))
+        }
+    }
     // Log recommendation
     e.log.Info("Calculated recommended lock boost",
         "total_blobs", result.Archive.TotalBlobCount,
         "recommended_locks", lockBoost)
 }
+// calculateRecommendedParallel determines optimal parallelism based on system resources
+// Returns the recommended number of parallel workers for pg_restore
+func (e *Engine) calculateRecommendedParallel(result *PreflightResult) int {
+    cpuCores := result.Linux.CPUCores
+    if cpuCores == 0 {
+        cpuCores = runtime.NumCPU()
+    }
+    memAvailableGB := float64(result.Linux.MemAvailable) / (1024 * 1024 * 1024)
+    // Each pg_restore worker needs approximately 2-4GB of RAM
+    // Use conservative 3GB per worker to avoid OOM
+    const memPerWorkerGB = 3.0
+    // Calculate limits
+    maxByMem := int(memAvailableGB / memPerWorkerGB)
+    maxByCPU := cpuCores
+    // Use the minimum of memory and CPU limits
+    recommended := maxByMem
+    if maxByCPU < recommended {
+        recommended = maxByCPU
+    }
+    // Apply sensible bounds
+    if recommended < 1 {
+        recommended = 1
+    }
+    if recommended > 16 {
+        recommended = 16 // Cap at 16 to avoid diminishing returns
+    }
+    // If memory pressure is high (>80%), reduce parallelism
+    if result.Linux.MemUsedPercent > 80 && recommended > 1 {
+        recommended = recommended / 2
+        if recommended < 1 {
+            recommended = 1
+        }
+    }
+    e.log.Info("Calculated recommended parallel",
+        "cpu_cores", cpuCores,
+        "mem_available_gb", fmt.Sprintf("%.1f", memAvailableGB),
+        "max_by_mem", maxByMem,
+        "max_by_cpu", maxByCPU,
+        "recommended", recommended)
+    return recommended
+}
 // printPreflightSummary prints a nice summary of all checks
 func (e *Engine) printPreflightSummary(result *PreflightResult) {
     fmt.Println()
@@ -350,6 +553,8 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
printCheck("Total RAM", humanize.Bytes(result.Linux.MemTotal), true) printCheck("Total RAM", humanize.Bytes(result.Linux.MemTotal), true)
printCheck("Available RAM", humanize.Bytes(result.Linux.MemAvailable), result.Linux.MemAvailableOK || result.Linux.MemAvailable == 0) printCheck("Available RAM", humanize.Bytes(result.Linux.MemAvailable), result.Linux.MemAvailableOK || result.Linux.MemAvailable == 0)
printCheck("Memory Usage", fmt.Sprintf("%.1f%%", result.Linux.MemUsedPercent), result.Linux.MemUsedPercent < 85) printCheck("Memory Usage", fmt.Sprintf("%.1f%%", result.Linux.MemUsedPercent), result.Linux.MemUsedPercent < 85)
printCheck("CPU Cores", fmt.Sprintf("%d", result.Linux.CPUCores), true)
printCheck("Recommended Parallel", fmt.Sprintf("%d (auto-calculated)", result.Linux.RecommendedParallel), true)
// Linux-specific kernel checks // Linux-specific kernel checks
if result.Linux.IsLinux && result.Linux.ShmMax > 0 { if result.Linux.IsLinux && result.Linux.ShmMax > 0 {
@@ -365,6 +570,13 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
humanize.Comma(int64(result.PostgreSQL.MaxLocksPerTransaction)), humanize.Comma(int64(result.PostgreSQL.MaxLocksPerTransaction)),
humanize.Comma(int64(result.Archive.RecommendedLockBoost))), humanize.Comma(int64(result.Archive.RecommendedLockBoost))),
true) true)
printCheck("max_connections", humanize.Comma(int64(result.PostgreSQL.MaxConnections)), true)
// Show total lock capacity with warning if low
totalCapacityOK := result.PostgreSQL.TotalLockCapacity >= 200000
printCheck("Total Lock Capacity",
fmt.Sprintf("%s (max_locks × max_conns)",
humanize.Comma(int64(result.PostgreSQL.TotalLockCapacity))),
totalCapacityOK)
printCheck("maintenance_work_mem", fmt.Sprintf("%s → 2GB (auto-boost)", printCheck("maintenance_work_mem", fmt.Sprintf("%s → 2GB (auto-boost)",
result.PostgreSQL.MaintenanceWorkMem), true) result.PostgreSQL.MaintenanceWorkMem), true)
printInfo("shared_buffers", result.PostgreSQL.SharedBuffers) printInfo("shared_buffers", result.PostgreSQL.SharedBuffers)
@@ -386,6 +598,14 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
} }
} }
// Errors (blocking issues)
if len(result.Errors) > 0 {
fmt.Println("\n ✗ ERRORS (must fix before proceeding):")
for _, e := range result.Errors {
fmt.Printf(" • %s\n", e)
}
}
// Warnings // Warnings
if len(result.Warnings) > 0 { if len(result.Warnings) > 0 {
fmt.Println("\n ⚠ Warnings:") fmt.Println("\n ⚠ Warnings:")
@@ -394,6 +614,23 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
} }
} }
// Final status
fmt.Println()
if !result.CanProceed {
fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
fmt.Println(" │ ✗ PREFLIGHT FAILED - Cannot proceed with restore │")
fmt.Println(" │ Fix the errors above and try again. │")
fmt.Println(" └─────────────────────────────────────────────────────────┘")
} else if len(result.Warnings) > 0 {
fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
fmt.Println(" │ ⚠ PREFLIGHT PASSED WITH WARNINGS - Proceed with care │")
fmt.Println(" └─────────────────────────────────────────────────────────┘")
} else {
fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
fmt.Println(" │ ✓ PREFLIGHT PASSED - Ready to restore │")
fmt.Println(" └─────────────────────────────────────────────────────────┘")
}
fmt.Println(strings.Repeat("─", 60)) fmt.Println(strings.Repeat("─", 60))
fmt.Println() fmt.Println()
} }
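Two of the formulas above, traced with sample numbers: CalculateOptimalParallel() on an 8-core host with 20 GB of available RAM computes min(⌊20/3⌋, 8) = 6 workers, halved to 3 if memory usage already exceeds 80%. And for the blocking check in calculateRecommendations: an archive with 450,000 BLOBs against max_locks_per_transaction = 64 and 100 connections gives maxSafeBLOBs = 64 × 100 = 6,400; since 450,000 exceeds 2 × 6,400, the severity escalates to CRITICAL and CanProceed is set to false.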

View File

@@ -334,10 +334,12 @@ func (s *Safety) checkPostgresDatabaseExists(ctx context.Context, dbName string)
"-tAc", fmt.Sprintf("SELECT 1 FROM pg_database WHERE datname='%s'", dbName), "-tAc", fmt.Sprintf("SELECT 1 FROM pg_database WHERE datname='%s'", dbName),
} }
// Only add -h flag if host is not localhost (to use Unix socket for peer auth) // Always add -h flag for explicit host connection (required for password auth)
if s.cfg.Host != "localhost" && s.cfg.Host != "127.0.0.1" && s.cfg.Host != "" { host := s.cfg.Host
args = append([]string{"-h", s.cfg.Host}, args...) if host == "" {
host = "localhost"
} }
args = append([]string{"-h", host}, args...)
cmd := exec.CommandContext(ctx, "psql", args...) cmd := exec.CommandContext(ctx, "psql", args...)
@@ -346,9 +348,9 @@ func (s *Safety) checkPostgresDatabaseExists(ctx context.Context, dbName string)
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password)) cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
} }
output, err := cmd.Output() output, err := cmd.CombinedOutput()
if err != nil { if err != nil {
return false, fmt.Errorf("failed to check database existence: %w", err) return false, fmt.Errorf("failed to check database existence: %w (output: %s)", err, strings.TrimSpace(string(output)))
} }
return strings.TrimSpace(string(output)) == "1", nil return strings.TrimSpace(string(output)) == "1", nil
@@ -405,21 +407,29 @@ func (s *Safety) listPostgresUserDatabases(ctx context.Context) ([]string, error
"-c", query, "-c", query,
} }
// Only add -h flag if host is not localhost (to use Unix socket for peer auth) // Always add -h flag for explicit host connection (required for password auth)
if s.cfg.Host != "localhost" && s.cfg.Host != "127.0.0.1" && s.cfg.Host != "" { // Empty or unset host defaults to localhost
args = append([]string{"-h", s.cfg.Host}, args...) host := s.cfg.Host
if host == "" {
host = "localhost"
} }
args = append([]string{"-h", host}, args...)
cmd := exec.CommandContext(ctx, "psql", args...) cmd := exec.CommandContext(ctx, "psql", args...)
// Set password if provided // Set password - check config first, then environment
env := os.Environ()
if s.cfg.Password != "" { if s.cfg.Password != "" {
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password)) env = append(env, fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
} }
cmd.Env = env
output, err := cmd.Output() s.log.Debug("Listing PostgreSQL databases", "host", host, "port", s.cfg.Port, "user", s.cfg.User)
output, err := cmd.CombinedOutput()
if err != nil { if err != nil {
return nil, fmt.Errorf("failed to list databases: %w", err) // Include psql output in error for debugging
return nil, fmt.Errorf("failed to list databases: %w (output: %s)", err, strings.TrimSpace(string(output)))
} }
// Parse output // Parse output
@@ -432,6 +442,8 @@ func (s *Safety) listPostgresUserDatabases(ctx context.Context) ([]string, error
} }
} }
s.log.Debug("Found user databases", "count", len(databases), "databases", databases, "raw_output", string(output))
return databases, nil return databases, nil
} }
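A note on why the switch to CombinedOutput() matters here: Go's (*exec.Cmd).Output() captures only stdout, while psql writes authentication and connection failures to stderr, so the old error paths lost the actual diagnostic. CombinedOutput() folds stderr into the returned bytes, which is what makes the new "(output: %s)" context in these error messages useful.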

View File

@@ -454,26 +454,24 @@ func (m BackupExecutionModel) View() string {
 } else {
     // Show completion summary with detailed stats
     if m.err != nil {
-        s.WriteString("\n")
-        s.WriteString(errorStyle.Render("  ╔══════════════════════════════════════════════════════════╗"))
+        s.WriteString(errorStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
         s.WriteString("\n")
         s.WriteString(errorStyle.Render("║ [FAIL] BACKUP FAILED ║"))
         s.WriteString("\n")
-        s.WriteString(errorStyle.Render("  ╚══════════════════════════════════════════════════════════╝"))
+        s.WriteString(errorStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
         s.WriteString("\n\n")
         s.WriteString(errorStyle.Render(fmt.Sprintf("  Error: %v", m.err)))
         s.WriteString("\n")
     } else {
-        s.WriteString("\n")
-        s.WriteString(successStyle.Render("  ╔══════════════════════════════════════════════════════════╗"))
+        s.WriteString(successStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
         s.WriteString("\n")
         s.WriteString(successStyle.Render("║ [OK] BACKUP COMPLETED SUCCESSFULLY ║"))
         s.WriteString("\n")
-        s.WriteString(successStyle.Render("  ╚══════════════════════════════════════════════════════════╝"))
+        s.WriteString(successStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
         s.WriteString("\n\n")
         // Summary section
-        s.WriteString(infoStyle.Render("  ─── Summary ─────────────────────────────────────────────"))
+        s.WriteString(infoStyle.Render("  ─── Summary ───────────────────────────────────────────────"))
         s.WriteString("\n\n")
         // Backup type specific info
@@ -493,26 +491,24 @@ func (m BackupExecutionModel) View() string {
         }
         s.WriteString("\n")
+    }
-    // Timing section
+    // Timing section (always shown, consistent with restore)
-    s.WriteString(infoStyle.Render("  ─── Timing ──────────────────────────────────────────────"))
+    s.WriteString(infoStyle.Render("  ─── Timing ────────────────────────────────────────────────"))
     s.WriteString("\n\n")
     elapsed := time.Since(m.startTime)
     s.WriteString(fmt.Sprintf("  Total Time: %s\n", formatBackupDuration(elapsed)))
-    if m.backupType == "cluster" && m.dbTotal > 0 {
+    if m.backupType == "cluster" && m.dbTotal > 0 && m.err == nil {
         avgPerDB := elapsed / time.Duration(m.dbTotal)
         s.WriteString(fmt.Sprintf("  Avg per DB: %s\n", formatBackupDuration(avgPerDB)))
     }
     s.WriteString("\n")
-    s.WriteString(infoStyle.Render("  ─────────────────────────────────────────────────────────"))
-    s.WriteString("\n")
-    }
-    s.WriteString("  [KEY] Press Enter or ESC to return to menu\n")
+    s.WriteString(infoStyle.Render("  ───────────────────────────────────────────────────────────"))
+    s.WriteString("\n\n")
+    s.WriteString(infoStyle.Render("  [KEYS] Press Enter to continue"))
+    s.WriteString("\n")
 }
 return s.String()

View File

@@ -299,9 +299,13 @@ func (m *MenuModel) View() string {
 	var s string

+	// Product branding header
+	brandLine := fmt.Sprintf("dbbackup v%s • Enterprise Database Backup & Recovery", m.config.Version)
+	s += "\n" + infoStyle.Render(brandLine) + "\n"

 	// Header
-	header := titleStyle.Render("Database Backup Tool - Interactive Menu")
-	s += fmt.Sprintf("\n%s\n\n", header)
+	header := titleStyle.Render("Interactive Menu")
+	s += fmt.Sprintf("%s\n\n", header)

 	if len(m.dbTypes) > 0 {
 		options := make([]string, len(m.dbTypes))

View File

@@ -273,7 +273,20 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
 	defer dbClient.Close()

 	// STEP 1: Clean cluster if requested (drop all existing user databases)
-	if restoreType == "restore-cluster" && cleanClusterFirst && len(existingDBs) > 0 {
+	if restoreType == "restore-cluster" && cleanClusterFirst {
+		// Re-detect databases at execution time to get current state
+		// The preview list may be stale or detection may have failed earlier
+		safety := restore.NewSafety(cfg, log)
+		currentDBs, err := safety.ListUserDatabases(ctx)
+		if err != nil {
+			log.Warn("Failed to list databases for cleanup, using preview list", "error", err)
+			currentDBs = existingDBs // Fall back to preview list
+		} else if len(currentDBs) > 0 {
+			log.Info("Re-detected user databases for cleanup", "count", len(currentDBs), "databases", currentDBs)
+			existingDBs = currentDBs // Update with fresh list
+		}
+
+		if len(existingDBs) > 0 {
 			log.Info("Dropping existing user databases before cluster restore", "count", len(existingDBs))

 			// Drop databases using command-line psql (no connection required)
@@ -293,6 +306,9 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
 			}

 			log.Info("Cluster cleanup completed", "dropped", droppedCount, "total", len(existingDBs))
+		} else {
+			log.Info("No user databases to clean up")
+		}
 	}

 	// STEP 2: Create restore engine with silent progress (no stdout interference with TUI)
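The cleanup path drops each database through the dropDatabaseCLI helper referenced below, whose body is not shown in this diff. A plausible minimal shape for such a helper, with all specifics (flags, quoting, FORCE handling) as assumptions:

package restore

import (
	"context"
	"fmt"
	"os"
	"os/exec"
)

// dropDatabaseSketch drops one database via command-line psql, connecting to
// the maintenance database so no client connection to the target is needed.
func dropDatabaseSketch(ctx context.Context, host string, port int, user, dbName string) error {
	// %q yields a double-quoted identifier, which PostgreSQL accepts for
	// ordinary database names (hypothetical; the real helper may differ).
	sql := fmt.Sprintf("DROP DATABASE IF EXISTS %q", dbName)
	cmd := exec.CommandContext(ctx, "psql",
		"-h", host, "-p", fmt.Sprintf("%d", port), "-U", user,
		"-d", "postgres", "-c", sql)
	cmd.Env = os.Environ() // PGPASSWORD travels via the environment
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("drop %s: %w: %s", dbName, err, out)
	}
	return nil
}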
@@ -643,7 +659,13 @@ func (m RestoreExecutionModel) View() string {
 		s.WriteString("\n")
 		s.WriteString(errorStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
 		s.WriteString("\n\n")
-		s.WriteString(errorStyle.Render(fmt.Sprintf("  Error: %v", m.err)))
+
+		// Parse and display error in a clean, structured format
+		errStr := m.err.Error()
+
+		// Extract key parts from the error message
+		errDisplay := formatRestoreError(errStr)
+		s.WriteString(errDisplay)
 		s.WriteString("\n")
 	} else {
 		s.WriteString(successStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
@@ -989,3 +1011,188 @@ func dropDatabaseCLI(ctx context.Context, cfg *config.Config, dbName string) err
	return nil
}
// formatRestoreError formats a restore error message for clean TUI display
func formatRestoreError(errStr string) string {
var s strings.Builder
maxLineWidth := 60
// First, try to extract a clean error summary
errLines := strings.Split(errStr, "\n")
// Find the main error message (first line or first ERROR:)
mainError := ""
hint := ""
totalErrors := ""
dbsFailed := []string{}
for _, line := range errLines {
line = strings.TrimSpace(line)
if line == "" {
continue
}
// Extract ERROR messages
if strings.Contains(line, "ERROR:") {
if mainError == "" {
// Get just the ERROR part
idx := strings.Index(line, "ERROR:")
if idx >= 0 {
mainError = strings.TrimSpace(line[idx:])
// Truncate if too long
if len(mainError) > maxLineWidth {
mainError = mainError[:maxLineWidth-3] + "..."
}
}
}
}
// Extract HINT
if strings.Contains(line, "HINT:") {
idx := strings.Index(line, "HINT:")
if idx >= 0 {
hint = strings.TrimSpace(line[idx+5:])
if len(hint) > maxLineWidth {
hint = hint[:maxLineWidth-3] + "..."
}
}
}
// Extract total errors count
if strings.Contains(line, "total errors:") {
idx := strings.Index(line, "total errors:")
if idx >= 0 {
totalErrors = strings.TrimSpace(line[idx+13:])
// Just extract the number
parts := strings.Fields(totalErrors)
if len(parts) > 0 {
totalErrors = parts[0]
}
}
}
// Extract failed database names (for cluster restore)
if strings.Contains(line, ": restore failed:") {
parts := strings.SplitN(line, ":", 2)
if len(parts) > 0 {
dbName := strings.TrimSpace(parts[0])
if dbName != "" && !strings.HasPrefix(dbName, "Error") {
dbsFailed = append(dbsFailed, dbName)
}
}
}
}
// If no structured error found, use the first line
if mainError == "" {
firstLine := errStr
if idx := strings.Index(errStr, "\n"); idx > 0 {
firstLine = errStr[:idx]
}
if len(firstLine) > maxLineWidth*2 {
firstLine = firstLine[:maxLineWidth*2-3] + "..."
}
mainError = firstLine
}
// Build structured error display
s.WriteString(infoStyle.Render(" ─── Error Details ─────────────────────────────────────────"))
s.WriteString("\n\n")
// Error type detection
errorType := "critical"
if strings.Contains(errStr, "out of shared memory") || strings.Contains(errStr, "max_locks_per_transaction") {
errorType = "critical"
} else if strings.Contains(errStr, "connection") {
errorType = "connection"
} else if strings.Contains(errStr, "permission") || strings.Contains(errStr, "access") {
errorType = "permission"
}
s.WriteString(fmt.Sprintf(" Type: %s\n", errorType))
s.WriteString(fmt.Sprintf(" Message: %s\n", mainError))
if hint != "" {
s.WriteString(fmt.Sprintf(" Hint: %s\n", hint))
}
if totalErrors != "" {
s.WriteString(fmt.Sprintf(" Total Errors: %s\n", totalErrors))
}
// Show failed databases (max 5)
if len(dbsFailed) > 0 {
s.WriteString("\n")
s.WriteString(" Failed Databases:\n")
for i, db := range dbsFailed {
if i >= 5 {
s.WriteString(fmt.Sprintf(" ... and %d more\n", len(dbsFailed)-5))
break
}
s.WriteString(fmt.Sprintf(" • %s\n", db))
}
}
s.WriteString("\n")
s.WriteString(infoStyle.Render(" ─── Diagnosis ─────────────────────────────────────────────"))
s.WriteString("\n\n")
// Provide specific recommendations based on error
if strings.Contains(errStr, "out of shared memory") || strings.Contains(errStr, "max_locks_per_transaction") {
s.WriteString(errorStyle.Render(" • Cannot access file: stat : no such file or directory\n"))
s.WriteString("\n")
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
s.WriteString("\n\n")
s.WriteString(" Lock table exhausted. Total capacity = max_locks_per_transaction\n")
s.WriteString(" × (max_connections + max_prepared_transactions).\n\n")
s.WriteString(" If you reduced VM size or max_connections, you need higher\n")
s.WriteString(" max_locks_per_transaction to compensate.\n\n")
s.WriteString(successStyle.Render(" FIX OPTIONS:\n"))
s.WriteString(" 1. Use 'conservative' or 'large-db' profile in Settings\n")
s.WriteString(" (press 'l' for large-db, 'c' for conservative)\n\n")
s.WriteString(" 2. Increase PostgreSQL locks:\n")
s.WriteString(" ALTER SYSTEM SET max_locks_per_transaction = 4096;\n")
s.WriteString(" Then RESTART PostgreSQL.\n\n")
s.WriteString(" 3. Reduce parallel jobs:\n")
s.WriteString(" Set Cluster Parallelism = 1 in Settings\n")
} else if strings.Contains(errStr, "connection") || strings.Contains(errStr, "refused") {
s.WriteString(" • Database connection failed\n\n")
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
s.WriteString("\n\n")
s.WriteString(" 1. Check database is running\n")
s.WriteString(" 2. Verify host, port, and credentials in Settings\n")
s.WriteString(" 3. Check firewall/network connectivity\n")
} else if strings.Contains(errStr, "permission") || strings.Contains(errStr, "denied") {
s.WriteString(" • Permission denied\n\n")
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
s.WriteString("\n\n")
s.WriteString(" 1. Verify database user has sufficient privileges\n")
s.WriteString(" 2. Grant CREATE/DROP DATABASE permissions if restoring cluster\n")
s.WriteString(" 3. Check file system permissions on backup directory\n")
} else {
s.WriteString(" See error message above for details.\n\n")
s.WriteString(infoStyle.Render(" ─── [HINT] General Recommendations ────────────────────────"))
s.WriteString("\n\n")
s.WriteString(" 1. Check the full error log for details\n")
s.WriteString(" 2. Try restoring with 'conservative' profile (press 'c')\n")
s.WriteString(" 3. For large databases, use 'large-db' profile (press 'l')\n")
}
s.WriteString("\n")
return s.String()
}
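The capacity formula quoted in the recommendations above can be sanity-checked in a few lines. The defaults used below (64 locks per transaction, 100 connections, 0 prepared transactions) are PostgreSQL's stock settings:

package main

import "fmt"

// lockTableCapacity mirrors the formula in the recommendation text:
// max_locks_per_transaction × (max_connections + max_prepared_transactions).
func lockTableCapacity(maxLocksPerTxn, maxConnections, maxPreparedTxns int) int {
	return maxLocksPerTxn * (maxConnections + maxPreparedTxns)
}

func main() {
	fmt.Println(lockTableCapacity(64, 100, 0))   // 6400 lock slots with stock settings
	fmt.Println(lockTableCapacity(4096, 100, 0)) // 409600 after the suggested ALTER SYSTEM
}

This is why a restore that touches many objects in parallel can exhaust the lock table on a small VM even when total memory looks sufficient.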

View File

@@ -55,6 +55,7 @@ type RestorePreviewModel struct {
	cleanClusterFirst bool     // For cluster restore: drop all user databases first
	existingDBCount   int      // Number of existing user databases
	existingDBs       []string // List of existing user databases
	existingDBError   string   // Error message if database listing failed
	safetyChecks      []SafetyCheck
	checking          bool
	canProceed        bool
@@ -102,6 +103,7 @@ type safetyCheckCompleteMsg struct {
	canProceed      bool
	existingDBCount int
	existingDBs     []string
	existingDBError string
}

func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string) tea.Cmd {
@@ -221,10 +223,12 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
 		check = SafetyCheck{Name: "Existing databases", Status: "checking", Critical: false}

 		// Get list of existing user databases (exclude templates and system DBs)
+		var existingDBError string
 		dbList, err := safety.ListUserDatabases(ctx)
 		if err != nil {
 			check.Status = "warning"
 			check.Message = fmt.Sprintf("Cannot list databases: %v", err)
+			existingDBError = err.Error()
 		} else {
 			existingDBCount = len(dbList)
 			existingDBs = dbList
@@ -238,6 +242,14 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
 			}
 		}
 		checks = append(checks, check)
+
+		return safetyCheckCompleteMsg{
+			checks:          checks,
+			canProceed:      canProceed,
+			existingDBCount: existingDBCount,
+			existingDBs:     existingDBs,
+			existingDBError: existingDBError,
+		}
 	}

 	return safetyCheckCompleteMsg{
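For readers new to Bubble Tea: the early return works because runSafetyChecks produces a tea.Cmd, a function the framework runs off the UI loop, whose returned message is then fed into Update. A trimmed sketch of that flow, with stand-in types (only the field names echo this diff; the wiring is an assumption):

package tui

import tea "github.com/charmbracelet/bubbletea"

// safetyDoneMsg carries check results back to the model's Update method.
type safetyDoneMsg struct {
	canProceed      bool
	existingDBs     []string
	existingDBError string // empty when listing succeeded
}

// runChecksSketch returns a tea.Cmd; Bubble Tea executes it in a goroutine
// and delivers the returned tea.Msg to Update.
func runChecksSketch(list func() ([]string, error)) tea.Cmd {
	return func() tea.Msg {
		dbs, err := list()
		msg := safetyDoneMsg{canProceed: true, existingDBs: dbs}
		if err != nil {
			msg.existingDBError = err.Error() // preview continues; re-detect at restore time
		}
		return msg
	}
}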
@@ -257,6 +269,7 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 		m.canProceed = msg.canProceed
 		m.existingDBCount = msg.existingDBCount
 		m.existingDBs = msg.existingDBs
+		m.existingDBError = msg.existingDBError

 		// Auto-forward in auto-confirm mode
 		if m.config.TUIAutoConfirm {
 			return m.parent, tea.Quit
@@ -275,10 +288,17 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 		case "c":
 			if m.mode == "restore-cluster" {
-				// Toggle cluster cleanup
+				// Toggle cluster cleanup - databases will be re-detected at execution time
 				m.cleanClusterFirst = !m.cleanClusterFirst
 				if m.cleanClusterFirst {
-					m.message = checkWarningStyle.Render(fmt.Sprintf("[WARN] Will drop %d existing database(s) before restore", m.existingDBCount))
+					if m.existingDBError != "" {
+						// Detection failed in preview - will re-detect at execution
+						m.message = checkWarningStyle.Render("[WARN] Will clean existing databases before restore (detection pending)")
+					} else if m.existingDBCount > 0 {
+						m.message = checkWarningStyle.Render(fmt.Sprintf("[WARN] Will drop %d existing database(s) before restore", m.existingDBCount))
+					} else {
+						m.message = infoStyle.Render("[INFO] Cleanup enabled (no databases currently detected)")
+					}
 				} else {
 					m.message = fmt.Sprintf("Clean cluster first: disabled")
 				}
@@ -382,7 +402,12 @@ func (m RestorePreviewModel) View() string {
 		s.WriteString("\n")
 		s.WriteString(fmt.Sprintf("  Host: %s:%d\n", m.config.Host, m.config.Port))

-		if m.existingDBCount > 0 {
+		if m.existingDBError != "" {
+			// Show warning when database listing failed - but still allow cleanup toggle
+			s.WriteString(checkWarningStyle.Render("  Existing Databases: Detection failed\n"))
+			s.WriteString(infoStyle.Render(fmt.Sprintf("    (%s)\n", m.existingDBError)))
+			s.WriteString(infoStyle.Render("    (Will re-detect at restore time)\n"))
+		} else if m.existingDBCount > 0 {
 			s.WriteString(fmt.Sprintf("  Existing Databases: %d found\n", m.existingDBCount))

 			// Show first few database names
@@ -395,16 +420,19 @@ func (m RestorePreviewModel) View() string {
 			}
 				s.WriteString(fmt.Sprintf("    - %s\n", db))
 			}
+		} else {
+			s.WriteString("  Existing Databases: None (clean slate)\n")
+		}

+		// Always show cleanup toggle for cluster restore
 		cleanIcon := "[N]"
 		cleanStyle := infoStyle
 		if m.cleanClusterFirst {
 			cleanIcon = "[Y]"
 			cleanStyle = checkWarningStyle
-		}
-		s.WriteString(cleanStyle.Render(fmt.Sprintf("  Clean All First: %s %v (press 'c' to toggle)\n", cleanIcon, m.cleanClusterFirst)))
-	} else {
-		s.WriteString("  Existing Databases: None (clean slate)\n")
+			s.WriteString(cleanStyle.Render(fmt.Sprintf("  Clean All First: %s enabled (press 'c' to toggle)\n", cleanIcon)))
+		} else {
+			s.WriteString(cleanStyle.Render(fmt.Sprintf("  Clean All First: %s disabled (press 'c' to toggle)\n", cleanIcon)))
 		}
 		s.WriteString("\n")
 	}
@@ -453,10 +481,18 @@ func (m RestorePreviewModel) View() string {
 		s.WriteString(infoStyle.Render("  All existing data in target database will be dropped!"))
 		s.WriteString("\n\n")
 	}

-	if m.cleanClusterFirst && m.existingDBCount > 0 {
+	if m.cleanClusterFirst {
 		s.WriteString(checkWarningStyle.Render("[DANGER] WARNING: Cluster cleanup enabled"))
 		s.WriteString("\n")
+		if m.existingDBError != "" {
+			s.WriteString(checkWarningStyle.Render("  Existing databases will be DROPPED before restore!"))
+			s.WriteString("\n")
+			s.WriteString(infoStyle.Render("  (Database count will be detected at restore time)"))
+		} else if m.existingDBCount > 0 {
 			s.WriteString(checkWarningStyle.Render(fmt.Sprintf("  %d existing database(s) will be DROPPED before restore!", m.existingDBCount)))
+		} else {
+			s.WriteString(infoStyle.Render("  No databases currently detected - cleanup will verify at restore time"))
+		}
 		s.WriteString("\n")
 		s.WriteString(infoStyle.Render("  This ensures a clean disaster recovery scenario"))
 		s.WriteString("\n\n")

View File

@@ -10,6 +10,7 @@ import (
	"github.com/charmbracelet/lipgloss"

	"dbbackup/internal/config"
	"dbbackup/internal/cpu"
	"dbbackup/internal/logger"
)
@@ -101,6 +102,49 @@ func NewSettingsModel(cfg *config.Config, log logger.Logger, parent tea.Model) S
		Type:        "selector",
		Description: "CPU workload profile (press Enter to cycle: Balanced → CPU-Intensive → I/O-Intensive)",
	},
{
Key: "resource_profile",
DisplayName: "Resource Profile",
Value: func(c *config.Config) string {
profile := c.GetCurrentProfile()
if profile != nil {
return fmt.Sprintf("%s (P:%d J:%d)", profile.Name, profile.ClusterParallelism, profile.Jobs)
}
return c.ResourceProfile
},
Update: func(c *config.Config, v string) error {
profiles := []string{"conservative", "balanced", "performance", "max-performance", "large-db"}
currentIdx := 0
for i, p := range profiles {
if c.ResourceProfile == p {
currentIdx = i
break
}
}
nextIdx := (currentIdx + 1) % len(profiles)
return c.ApplyResourceProfile(profiles[nextIdx])
},
Type: "selector",
Description: "Resource profile for backup/restore. Use 'conservative' or 'large-db' for large databases on small VMs.",
},
{
Key: "cluster_parallelism",
DisplayName: "Cluster Parallelism",
Value: func(c *config.Config) string { return fmt.Sprintf("%d", c.ClusterParallelism) },
Update: func(c *config.Config, v string) error {
val, err := strconv.Atoi(v)
if err != nil {
return fmt.Errorf("cluster parallelism must be a number")
}
if val < 1 {
return fmt.Errorf("cluster parallelism must be at least 1")
}
c.ClusterParallelism = val
return nil
},
Type: "int",
Description: "Concurrent databases during cluster backup/restore (1=sequential, safer for large DBs)",
},
	{
		Key:         "backup_dir",
		DisplayName: "Backup Directory",
@@ -528,12 +572,58 @@ func (m SettingsModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
		case "s":
			return m.saveSettings()
case "l":
// Quick shortcut: Apply "large-db" profile for large databases
return m.applyLargeDBProfile()
case "c":
// Quick shortcut: Apply "conservative" profile for constrained VMs
return m.applyConservativeProfile()
case "p":
// Show profile recommendation
return m.showProfileRecommendation()
		}
	}
	return m, nil
}
// applyLargeDBProfile applies the large-db profile optimized for large databases
func (m SettingsModel) applyLargeDBProfile() (tea.Model, tea.Cmd) {
if err := m.config.ApplyResourceProfile("large-db"); err != nil {
m.message = errorStyle.Render(fmt.Sprintf("[FAIL] %s", err.Error()))
return m, nil
}
m.message = successStyle.Render("[OK] Applied 'large-db' profile: Cluster=1, Jobs=2. Optimized for large DBs to avoid 'out of shared memory' errors.")
return m, nil
}
// applyConservativeProfile applies the conservative profile for constrained VMs
func (m SettingsModel) applyConservativeProfile() (tea.Model, tea.Cmd) {
if err := m.config.ApplyResourceProfile("conservative"); err != nil {
m.message = errorStyle.Render(fmt.Sprintf("[FAIL] %s", err.Error()))
return m, nil
}
m.message = successStyle.Render("[OK] Applied 'conservative' profile: Cluster=1, Jobs=1. Safe for small VMs with limited memory.")
return m, nil
}
// showProfileRecommendation displays the recommended profile based on system resources
func (m SettingsModel) showProfileRecommendation() (tea.Model, tea.Cmd) {
profileName, reason := m.config.GetResourceProfileRecommendation(false)
largeDBProfile, largeDBReason := m.config.GetResourceProfileRecommendation(true)
m.message = infoStyle.Render(fmt.Sprintf(
"[RECOMMEND] Default: %s | For Large DBs: %s\n"+
" → %s\n"+
" → Large DB: %s\n"+
" Press 'l' for large-db profile, 'c' for conservative",
profileName, largeDBProfile, reason, largeDBReason))
return m, nil
}
// handleEditingInput handles input when editing a setting
func (m SettingsModel) handleEditingInput(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
	switch msg.String() {
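The five profile names cycle in the order listed in the resource_profile setting above; their parameters live elsewhere in the codebase. A hedged sketch of what ApplyResourceProfile could map them to — only the two rows quoted in the hotkey messages (large-db: Cluster=1, Jobs=2; conservative: Cluster=1, Jobs=1) are confirmed by this diff, the rest are assumptions:

package config

// ResourceProfile is a stand-in for the project's profile type; only the two
// fields surfaced in this diff (ClusterParallelism, Jobs) are included.
type ResourceProfile struct {
	Name               string
	ClusterParallelism int // concurrent databases during cluster backup/restore
	Jobs               int // parallel jobs within a single dump/restore
}

var profileSketch = []ResourceProfile{
	{"conservative", 1, 1},     // confirmed by the 'c' hotkey message
	{"balanced", 2, 4},         // assumed
	{"performance", 4, 8},      // assumed
	{"max-performance", 8, 16}, // assumed
	{"large-db", 1, 2},         // confirmed by the 'l' hotkey message
}

Serialising cluster work (ClusterParallelism = 1) is what keeps lock-table pressure down on constrained systems, which is why both safety profiles pin it to 1.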
@@ -747,7 +837,32 @@ func (m SettingsModel) View() string {
 	// Current configuration summary
 	if !m.editing {
 		b.WriteString("\n")
-		b.WriteString(infoStyle.Render("[INFO] Current Configuration"))
+		b.WriteString(infoStyle.Render("[INFO] System Resources & Configuration"))
+		b.WriteString("\n")
+
+		// System resources
+		var sysInfo []string
+		if m.config.CPUInfo != nil {
+			sysInfo = append(sysInfo, fmt.Sprintf("CPU: %d cores (physical), %d logical",
+				m.config.CPUInfo.PhysicalCores, m.config.CPUInfo.LogicalCores))
+		}
+		if m.config.MemoryInfo != nil {
+			sysInfo = append(sysInfo, fmt.Sprintf("Memory: %dGB total, %dGB available",
+				m.config.MemoryInfo.TotalGB, m.config.MemoryInfo.AvailableGB))
+		}
+
+		// Recommended profile
+		recommendedProfile, reason := m.config.GetResourceProfileRecommendation(false)
+		sysInfo = append(sysInfo, fmt.Sprintf("Recommended Profile: %s", recommendedProfile))
+		sysInfo = append(sysInfo, fmt.Sprintf("  → %s", reason))
+
+		for _, line := range sysInfo {
+			b.WriteString(detailStyle.Render(fmt.Sprintf("  %s", line)))
+			b.WriteString("\n")
+		}
+
+		b.WriteString("\n")
+		b.WriteString(infoStyle.Render("[CONFIG] Current Settings"))
 		b.WriteString("\n")

 		summary := []string{
@@ -755,7 +870,17 @@ func (m SettingsModel) View() string {
 			fmt.Sprintf("Database: %s@%s:%d", m.config.User, m.config.Host, m.config.Port),
 			fmt.Sprintf("Backup Dir: %s", m.config.BackupDir),
 			fmt.Sprintf("Compression: Level %d", m.config.CompressionLevel),
-			fmt.Sprintf("Jobs: %d parallel, %d dump", m.config.Jobs, m.config.DumpJobs),
+			fmt.Sprintf("Profile: %s | Cluster: %d parallel | Jobs: %d",
+				m.config.ResourceProfile, m.config.ClusterParallelism, m.config.Jobs),
+		}
+
+		// Show profile warnings if applicable
+		profile := m.config.GetCurrentProfile()
+		if profile != nil {
+			isValid, warnings := cpu.ValidateProfileForSystem(profile, m.config.CPUInfo, m.config.MemoryInfo)
+			if !isValid && len(warnings) > 0 {
+				summary = append(summary, fmt.Sprintf("⚠️ Warning: %s", warnings[0]))
+			}
 		}

 		if m.config.CloudEnabled {
@@ -782,9 +907,9 @@ func (m SettingsModel) View() string {
 	} else {
 		// Show different help based on current selection
 		if m.cursor >= 0 && m.cursor < len(m.settings) && m.settings[m.cursor].Type == "path" {
-			footer = infoStyle.Render("\n[KEYS] Up/Down navigate | Enter edit | Tab browse directories | 's' save | 'r' reset | 'q' menu")
+			footer = infoStyle.Render("\n[KEYS] ↑↓ navigate | Enter edit | Tab dirs | 'l' large-db | 'c' conservative | 'p' recommend | 's' save | 'q' menu")
 		} else {
-			footer = infoStyle.Render("\n[KEYS] Up/Down navigate | Enter edit | 's' save | 'r' reset | 'q' menu | Tab=dirs on path fields only")
+			footer = infoStyle.Render("\n[KEYS] ↑↓ navigate | Enter edit | 'l' large-db profile | 'c' conservative | 'p' recommend | 's' save | 'r' reset | 'q' menu")
 		}
 	}
 }