Compare commits
31 Commits
| SHA1 |
|---|
| 59a717abe7 |
| 490a12f858 |
| ea4337e298 |
| bbd4f0ceac |
| f6f8b04785 |
| 670c9af2e7 |
| e2cf9adc62 |
| 29e089fe3b |
| 9396c8e605 |
| e363e1937f |
| df1ab2f55b |
| 0e050b2def |
| 62d58c77af |
| c5be9bcd2b |
| b120f1507e |
| dd1db844ce |
| 4ea3ec2cf8 |
| 9200024e50 |
| 698b8a761c |
| dd7c4da0eb |
| b2a78cad2a |
| 5728b465e6 |
| bfe99e959c |
| 780beaadfb |
| 838c5b8c15 |
| 9d95a193db |
| 3201f0fb6a |
| 62ddc57fb7 |
| 510175ff04 |
| a85ad0c88c |
| 4938dc1918 |
62
CHANGELOG.md
@@ -5,6 +5,68 @@ All notable changes to dbbackup will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [3.42.50] - 2026-01-16 "Ctrl+C Signal Handling Fix"
+
+### Fixed - Proper Ctrl+C/SIGINT Handling in TUI
+- **Added tea.InterruptMsg handling** - Bubbletea v1.3+ sends `InterruptMsg` for SIGINT signals
+  instead of a `KeyMsg` with "ctrl+c", causing cancellation to not work
+- **Fixed cluster restore cancellation** - Ctrl+C now properly cancels running restore operations
+- **Fixed cluster backup cancellation** - Ctrl+C now properly cancels running backup operations
+- **Added interrupt handling to main menu** - Proper cleanup on SIGINT from menu
+- **Orphaned process cleanup** - `cleanup.KillOrphanedProcesses()` called on all interrupt paths
+
+### Changed
+- All TUI execution views now handle both `tea.KeyMsg` ("ctrl+c") and `tea.InterruptMsg`
+- Context cancellation properly propagates to child processes via `exec.CommandContext`
+- No zombie pg_dump/pg_restore/gzip processes left behind on cancellation
+
+## [3.42.49] - 2026-01-16 "Unified Cluster Backup Progress"
+
+### Added - Unified Progress Display for Cluster Backup
+- **Combined overall progress bar** for cluster backup showing all phases:
+  - Phase 1/3: Backing up Globals (0-15% of overall)
+  - Phase 2/3: Backing up Databases (15-90% of overall)
+  - Phase 3/3: Compressing Archive (90-100% of overall)
+- **Current database indicator** - Shows which database is currently being backed up
+- **Phase-aware progress tracking** - New fields in backup progress state:
+  - `overallPhase` - Current phase (1=globals, 2=databases, 3=compressing)
+  - `phaseDesc` - Human-readable phase description
+- **Dual progress bars** for cluster backup:
+  - Overall progress bar showing combined operation progress
+  - Database count progress bar showing individual database progress
+
+### Changed
+- Cluster backup TUI now shows unified progress display matching restore
+- Progress callbacks now include phase information
+- Better visual feedback during entire cluster backup operation
+
+## [3.42.48] - 2026-01-15 "Unified Cluster Restore Progress"
+
+### Added - Unified Progress Display for Cluster Restore
+- **Combined overall progress bar** showing progress across all restore phases:
+  - Phase 1/3: Extracting Archive (0-60% of overall)
+  - Phase 2/3: Restoring Globals (60-65% of overall)
+  - Phase 3/3: Restoring Databases (65-100% of overall)
+- **Current database indicator** - Shows which database is currently being restored
+- **Phase-aware progress tracking** - New fields in progress state:
+  - `overallPhase` - Current phase (1=extraction, 2=globals, 3=databases)
+  - `currentDB` - Name of database currently being restored
+  - `extractionDone` - Boolean flag for phase transition
+- **Dual progress bars** for cluster restore:
+  - Overall progress bar showing combined operation progress
+  - Phase-specific progress bar (extraction bytes or database count)
+
+### Changed
+- Cluster restore TUI now shows unified progress display
+- Progress callbacks now set phase and current database information
+- Extraction completion triggers automatic transition to globals phase
+- Database restore phase shows current database name with spinner
+
+### Improved
+- Better visual feedback during entire cluster restore operation
+- Clear phase indicators help users understand restore progress
+- Overall progress percentage gives better time estimates
+
 ## [3.42.35] - 2026-01-15 "TUI Detailed Progress"

 ### Added - Enhanced TUI Progress Display
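The phase ranges listed in the 3.42.49 and 3.42.48 changelog entries can be read as a mapping from per-phase progress onto a single overall percentage. The following is a minimal sketch of that arithmetic, not the project's implementation: `phaseRange`, `backupPhases`, `restorePhases`, and `overallPercent` are hypothetical names that do not appear in this changeset, and only the percentage ranges are taken from the changelog.

```go
package main

import "fmt"

// phaseRange is a hypothetical helper type: the share of the overall
// progress bar, in percent, that one phase occupies.
type phaseRange struct{ start, end int }

// Ranges copied from the changelog entries; the map keys are the
// overallPhase values (1, 2, 3) described there.
var (
	backupPhases  = map[int]phaseRange{1: {0, 15}, 2: {15, 90}, 3: {90, 100}}
	restorePhases = map[int]phaseRange{1: {0, 60}, 2: {60, 65}, 3: {65, 100}}
)

// overallPercent maps "done out of total" within one phase onto the
// overall 0-100 scale using integer arithmetic. An unknown phase yields 0,
// and an empty total yields the start of the phase's range.
func overallPercent(phases map[int]phaseRange, phase, done, total int) int {
	r, ok := phases[phase]
	if !ok {
		return 0
	}
	if total <= 0 {
		return r.start
	}
	if done < 0 {
		done = 0
	}
	if done > total {
		done = total
	}
	return r.start + done*(r.end-r.start)/total
}

func main() {
	// Backup, phase 2, 4 of 10 databases done: 15 + 4*75/10 = 45.
	fmt.Println(overallPercent(backupPhases, 2, 4, 10))
	// Restore, phase 1, half of the archive extracted: 0 + 1*60/2 = 30.
	fmt.Println(overallPercent(restorePhases, 1, 1, 2))
}
```

Integer division keeps the result stable for display; a real progress bar might prefer floating point for smoother updates.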
42
README.md
@@ -194,21 +194,51 @@ r: Restore | v: Verify | i: Info | d: Diagnose | D: Delete | R: Refresh | Esc: B
 ```
 Configuration Settings

+[SYSTEM] Detected Resources
+CPU: 8 physical cores, 16 logical cores
+Memory: 32GB total, 28GB available
+Recommended Profile: balanced
+→ 8 cores and 32GB RAM supports moderate parallelism
+
+[CONFIG] Current Settings
+Target DB: PostgreSQL (postgres)
+Database: postgres@localhost:5432
+Backup Dir: /var/backups/postgres
+Compression: Level 6
+Profile: balanced | Cluster: 2 parallel | Jobs: 4
+
 > Database Type: postgres
 CPU Workload Type: balanced
-Backup Directory: /root/db_backups
-Work Directory: /tmp
+Resource Profile: balanced (P:2 J:4)
+Cluster Parallelism: 2
+Backup Directory: /var/backups/postgres
+Work Directory: (system temp)
 Compression Level: 6
-Parallel Jobs: 16
-Dump Jobs: 8
+Parallel Jobs: 4
+Dump Jobs: 4
 Database Host: localhost
 Database Port: 5432
-Database User: root
+Database User: postgres
 SSL Mode: prefer

-s: Save | r: Reset | q: Menu
+[KEYS] ↑↓ navigate | Enter edit | 'l' toggle LargeDB | 'c' conservative | 'p' recommend | 's' save | 'q' menu
 ```

+**Resource Profiles for Large Databases:**
+
+When restoring large databases on VMs with limited resources, use the resource profile settings to prevent "out of shared memory" errors:
+
+| Profile | Cluster Parallel | Jobs | Best For |
+|---------|------------------|------|----------|
+| conservative | 1 | 1 | Small VMs (<16GB RAM) |
+| balanced | 2 | 2-4 | Medium VMs (16-32GB RAM) |
+| performance | 4 | 4-8 | Large servers (32GB+ RAM) |
+| max-performance | 8 | 8-16 | High-end servers (64GB+) |
+
+**Large DB Mode:** Toggle with `l` key. Reduces parallelism by 50% and sets max_locks_per_transaction=8192 for complex databases with many tables/LOBs.
+
+**Quick shortcuts:** Press `l` to toggle Large DB Mode, `c` for conservative, `p` to show recommendation.
+
 **Database Status:**
 ```
 Database Status & Health Check
@@ -3,9 +3,9 @@
 This directory contains pre-compiled binaries for the DB Backup Tool across multiple platforms and architectures.

 ## Build Information
-- **Version**: 3.42.34
-- **Build Time**: 2026-01-15_14:16:33_UTC
-- **Git Commit**: eeacbfa
+- **Version**: 3.42.50
+- **Build Time**: 2026-01-18_11:19:47_UTC
+- **Git Commit**: 490a12f

 ## Recent Updates (v1.1.0)
 - ✅ Fixed TUI progress display with line-by-line output
@@ -28,6 +28,7 @@ var (
|
||||
restoreClean bool
|
||||
restoreCreate bool
|
||||
restoreJobs int
|
||||
restoreParallelDBs int // Number of parallel database restores
|
||||
restoreTarget string
|
||||
restoreVerbose bool
|
||||
restoreNoProgress bool
|
||||
@@ -289,6 +290,7 @@ func init() {
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreForce, "force", false, "Skip safety checks and confirmations")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreCleanCluster, "clean-cluster", false, "Drop all existing user databases before restore (disaster recovery)")
|
||||
restoreClusterCmd.Flags().IntVar(&restoreJobs, "jobs", 0, "Number of parallel decompression jobs (0 = auto)")
|
||||
restoreClusterCmd.Flags().IntVar(&restoreParallelDBs, "parallel-dbs", 0, "Number of databases to restore in parallel (0 = use config default, 1 = sequential, -1 = auto-detect based on CPU/RAM)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreWorkdir, "workdir", "", "Working directory for extraction (use when system disk is small, e.g. /mnt/storage/restore_tmp)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreVerbose, "verbose", false, "Show detailed restore progress")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreNoProgress, "no-progress", false, "Disable progress indicators")
|
||||
@@ -783,6 +785,17 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
}
|
||||
}
|
||||
|
||||
// Override cluster parallelism if --parallel-dbs is specified
|
||||
if restoreParallelDBs == -1 {
|
||||
// Auto-detect optimal parallelism based on system resources
|
||||
autoParallel := restore.CalculateOptimalParallel()
|
||||
cfg.ClusterParallelism = autoParallel
|
||||
log.Info("Auto-detected optimal parallelism for database restores", "parallel_dbs", autoParallel, "mode", "auto")
|
||||
} else if restoreParallelDBs > 0 {
|
||||
cfg.ClusterParallelism = restoreParallelDBs
|
||||
log.Info("Using custom parallelism for database restores", "parallel_dbs", restoreParallelDBs)
|
||||
}
|
||||
|
||||
// Create restore engine
|
||||
engine := restore.New(cfg, log, db)
|
||||
|
||||
|
||||
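The `--parallel-dbs` handling added to `runRestoreCluster` distinguishes three cases: `-1` auto-detects, a positive value overrides the configured default, and anything else (including `0`) leaves `cfg.ClusterParallelism` untouched. The sketch below restates that decision as a pure function so the cases are easy to check. `resolveParallelDBs` is a hypothetical name, and the `autoDetect` callback stands in for `restore.CalculateOptimalParallel`, whose body is not part of this diff.

```go
package main

import "fmt"

// resolveParallelDBs mirrors the branch structure in the hunk above:
//   flag == -1 -> ask the auto-detection callback
//   flag  >  0 -> use the explicit value
//   otherwise  -> keep the configured default (covers 0 and values below -1)
func resolveParallelDBs(flag, configDefault int, autoDetect func() int) int {
	switch {
	case flag == -1:
		return autoDetect()
	case flag > 0:
		return flag
	default:
		return configDefault
	}
}

func main() {
	// Placeholder detector; the real calculation depends on CPU and RAM.
	detect := func() int { return 3 }

	for _, flag := range []int{-1, 0, 1, 4} {
		fmt.Printf("--parallel-dbs=%d -> %d\n", flag, resolveParallelDBs(flag, 2, detect))
	}
}
```

Note that values below `-1` fall through to the configured default here, as they do in the diff, rather than being rejected.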
@@ -94,7 +94,7 @@
|
||||
"uid": "${DS_PROMETHEUS}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < 86400",
|
||||
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < bool 604800",
|
||||
"legendFormat": "{{database}}",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
@@ -711,19 +711,6 @@
|
||||
},
|
||||
"pluginVersion": "10.2.0",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${DS_PROMETHEUS}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < 86400",
|
||||
"format": "table",
|
||||
"instant": true,
|
||||
"legendFormat": "__auto",
|
||||
"range": false,
|
||||
"refId": "Status"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
@@ -769,26 +756,30 @@
|
||||
"Time": true,
|
||||
"Time 1": true,
|
||||
"Time 2": true,
|
||||
"Time 3": true,
|
||||
"__name__": true,
|
||||
"__name__ 1": true,
|
||||
"__name__ 2": true,
|
||||
"__name__ 3": true,
|
||||
"instance 1": true,
|
||||
"instance 2": true,
|
||||
"instance 3": true,
|
||||
"job": true,
|
||||
"job 1": true,
|
||||
"job 2": true,
|
||||
"job 3": true
|
||||
"engine 1": true,
|
||||
"engine 2": true
|
||||
},
|
||||
"indexByName": {
|
||||
"Database": 0,
|
||||
"Instance": 1,
|
||||
"Engine": 2,
|
||||
"RPO": 3,
|
||||
"Size": 4
|
||||
},
|
||||
"indexByName": {},
|
||||
"renameByName": {
|
||||
"Value #RPO": "RPO",
|
||||
"Value #Size": "Size",
|
||||
"Value #Status": "Status",
|
||||
"database": "Database",
|
||||
"instance": "Instance"
|
||||
"instance": "Instance",
|
||||
"engine": "Engine"
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -1275,7 +1266,7 @@
|
||||
"query": "label_values(dbbackup_rpo_seconds, instance)",
|
||||
"refId": "StandardVariableQuery"
|
||||
},
|
||||
"refresh": 1,
|
||||
"refresh": 2,
|
||||
"regex": "",
|
||||
"skipUrlSync": false,
|
||||
"sort": 1,
|
||||
|
||||
@@ -84,20 +84,14 @@ func findHbaFileViaPostgres() string {

 // parsePgHbaConf parses pg_hba.conf and returns the authentication method
 func parsePgHbaConf(path string, user string) AuthMethod {
-	// Try with sudo if we can't read directly
+	// Try to read the file directly - do NOT use sudo as it triggers password prompts
+	// If we can't read pg_hba.conf, we'll rely on connection attempts to determine auth
 	file, err := os.Open(path)
 	if err != nil {
-		// Try with sudo (with timeout)
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
-		defer cancel()
-
-		cmd := exec.CommandContext(ctx, "sudo", "cat", path)
-		output, err := cmd.Output()
-		if err != nil {
+		// If we can't read the file, return unknown and let the connection determine auth
+		// This avoids sudo password prompts when running as postgres via su
 		return AuthUnknown
-		}
-		return parseHbaContent(string(output), user)
 	}
 	defer file.Close()

 	scanner := bufio.NewScanner(file)
@@ -937,11 +937,15 @@ func (e *Engine) createSampleBackup(ctx context.Context, databaseName, outputFil
 func (e *Engine) backupGlobals(ctx context.Context, tempDir string) error {
 	globalsFile := filepath.Join(tempDir, "globals.sql")

-	cmd := exec.CommandContext(ctx, "pg_dumpall", "--globals-only")
-	if e.cfg.Host != "localhost" {
-		cmd.Args = append(cmd.Args, "-h", e.cfg.Host, "-p", fmt.Sprintf("%d", e.cfg.Port))
+	// CRITICAL: Always pass port even for localhost - user may have non-standard port
+	cmd := exec.CommandContext(ctx, "pg_dumpall", "--globals-only",
+		"-p", fmt.Sprintf("%d", e.cfg.Port),
+		"-U", e.cfg.User)
+
+	// Only add -h flag for non-localhost to use Unix socket for peer auth
+	if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
+		cmd.Args = append([]string{cmd.Args[0], "-h", e.cfg.Host}, cmd.Args[1:]...)
 	}
-	cmd.Args = append(cmd.Args, "-U", e.cfg.User)

 	cmd.Env = os.Environ()
 	if e.cfg.Password != "" {
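The `backupGlobals` change always passes `-p` and `-U` and adds `-h` only for non-local hosts, so that local connections go through the Unix socket. A small sketch of the resulting argument list, written as a pure function so it can be inspected without running `pg_dumpall`. `isLocalHost` and `globalsArgs` are hypothetical helper names, and the host name in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// isLocalHost reports whether the host should be reached through the
// Unix socket, using the same three values the diff checks.
func isLocalHost(host string) bool {
	return host == "" || host == "localhost" || host == "127.0.0.1"
}

// globalsArgs builds the argv for pg_dumpall in the order the new code
// produces: optional -h directly after the program name, then
// --globals-only, -p <port>, -U <user>.
func globalsArgs(host string, port int, user string) []string {
	args := []string{"pg_dumpall"}
	if !isLocalHost(host) {
		args = append(args, "-h", host)
	}
	return append(args, "--globals-only", "-p", strconv.Itoa(port), "-U", user)
}

func main() {
	// Local server on a non-standard port: no -h, but the port is still passed.
	fmt.Println(globalsArgs("localhost", 5433, "postgres"))
	// Remote server (hypothetical host name): -h is included.
	fmt.Println(globalsArgs("db.example.internal", 5432, "backup"))
}
```

The first case is the one the old code got wrong: with `localhost` it skipped the `-p` flag entirely, so a server on a non-default port was not reached.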
@@ -68,8 +68,8 @@ func ClassifyError(errorMsg string) *ErrorClassification {
 			Type: "critical",
 			Category: "locks",
 			Message: errorMsg,
-			Hint: "Lock table exhausted - typically caused by large objects (BLOBs) during restore",
-			Action: "Option 1: Increase max_locks_per_transaction to 1024+ in postgresql.conf (requires restart). Option 2: Update dbbackup and retry - phased restore now auto-enabled for BLOB databases",
+			Hint: "Lock table exhausted. Total capacity = max_locks_per_transaction × (max_connections + max_prepared_transactions). If you reduced VM size or max_connections, you need higher max_locks_per_transaction to compensate.",
+			Action: "Fix: ALTER SYSTEM SET max_locks_per_transaction = 4096; then RESTART PostgreSQL. For smaller VMs with fewer connections, you need higher max_locks_per_transaction values.",
 			Severity: 2,
 		}
 	case "permission_denied":
@@ -142,8 +142,8 @@ func ClassifyError(errorMsg string) *ErrorClassification {
 			Type: "critical",
 			Category: "locks",
 			Message: errorMsg,
-			Hint: "Lock table exhausted - typically caused by large objects (BLOBs) during restore",
-			Action: "Option 1: Increase max_locks_per_transaction to 1024+ in postgresql.conf (requires restart). Option 2: Update dbbackup and retry - phased restore now auto-enabled for BLOB databases",
+			Hint: "Lock table exhausted. Total capacity = max_locks_per_transaction × (max_connections + max_prepared_transactions). If you reduced VM size or max_connections, you need higher max_locks_per_transaction to compensate.",
+			Action: "Fix: ALTER SYSTEM SET max_locks_per_transaction = 4096; then RESTART PostgreSQL. For smaller VMs with fewer connections, you need higher max_locks_per_transaction values.",
 			Severity: 2,
 		}
 	}
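The new hint text states the capacity formula `max_locks_per_transaction × (max_connections + max_prepared_transactions)`. A short worked example shows why halving `max_connections` requires doubling `max_locks_per_transaction` to keep the same lock table size. The function names below are hypothetical; the sample values 64, 100, and 0 are the stock PostgreSQL defaults for the three settings.

```go
package main

import "fmt"

// lockTableCapacity applies the formula quoted in the hint text.
func lockTableCapacity(maxLocksPerTxn, maxConnections, maxPreparedTxns int) int {
	return maxLocksPerTxn * (maxConnections + maxPreparedTxns)
}

// minLocksPerTxn returns the smallest max_locks_per_transaction that
// reaches the required capacity, rounding up. It returns 0 when there
// are no connection slots at all, where no setting could help.
func minLocksPerTxn(requiredCapacity, maxConnections, maxPreparedTxns int) int {
	slots := maxConnections + maxPreparedTxns
	if slots <= 0 {
		return 0
	}
	return (requiredCapacity + slots - 1) / slots
}

func main() {
	before := lockTableCapacity(64, 100, 0) // 64 * 100 = 6400
	after := lockTableCapacity(64, 50, 0)   // 64 * 50  = 3200
	fmt.Println(before, after)

	// Setting needed to get back to the original capacity with 50 connections.
	fmt.Println(minLocksPerTxn(before, 50, 0)) // 6400 / 50 = 128

	// Capacity with the value suggested in the Action text.
	fmt.Println(lockTableCapacity(4096, 50, 0)) // 4096 * 50 = 204800
}
```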
@@ -36,9 +36,14 @@ type Config struct {
|
||||
AutoDetectCores bool
|
||||
CPUWorkloadType string // "cpu-intensive", "io-intensive", "balanced"
|
||||
|
||||
// Resource profile for backup/restore operations
|
||||
ResourceProfile string // "conservative", "balanced", "performance", "max-performance"
|
||||
LargeDBMode bool // Enable large database mode (reduces parallelism, increases max_locks)
|
||||
|
||||
// CPU detection
|
||||
CPUDetector *cpu.Detector
|
||||
CPUInfo *cpu.CPUInfo
|
||||
MemoryInfo *cpu.MemoryInfo // System memory information
|
||||
|
||||
// Sample backup options
|
||||
SampleStrategy string // "ratio", "percent", "count"
|
||||
@@ -178,6 +183,13 @@ func New() *Config {
|
||||
sslMode = ""
|
||||
}
|
||||
|
||||
// Detect memory information
|
||||
memInfo, _ := cpu.DetectMemory()
|
||||
|
||||
// Determine recommended resource profile
|
||||
recommendedProfile := cpu.RecommendProfile(cpuInfo, memInfo, false)
|
||||
defaultProfile := getEnvString("RESOURCE_PROFILE", recommendedProfile.Name)
|
||||
|
||||
cfg := &Config{
|
||||
// Database defaults
|
||||
Host: host,
|
||||
@@ -189,18 +201,21 @@ func New() *Config {
 		SSLMode: sslMode,
 		Insecure: getEnvBool("INSECURE", false),

-		// Backup defaults
+		// Backup defaults - use recommended profile's settings for small VMs
 		BackupDir: backupDir,
 		CompressionLevel: getEnvInt("COMPRESS_LEVEL", 6),
-		Jobs: getEnvInt("JOBS", getDefaultJobs(cpuInfo)),
-		DumpJobs: getEnvInt("DUMP_JOBS", getDefaultDumpJobs(cpuInfo)),
+		Jobs: getEnvInt("JOBS", recommendedProfile.Jobs),
+		DumpJobs: getEnvInt("DUMP_JOBS", recommendedProfile.DumpJobs),
 		MaxCores: getEnvInt("MAX_CORES", getDefaultMaxCores(cpuInfo)),
 		AutoDetectCores: getEnvBool("AUTO_DETECT_CORES", true),
 		CPUWorkloadType: getEnvString("CPU_WORKLOAD_TYPE", "balanced"),
+		ResourceProfile: defaultProfile,
+		LargeDBMode: getEnvBool("LARGE_DB_MODE", false),

-		// CPU detection
+		// CPU and memory detection
 		CPUDetector: cpuDetector,
 		CPUInfo: cpuInfo,
+		MemoryInfo: memInfo,

 		// Sample backup defaults
 		SampleStrategy: getEnvString("SAMPLE_STRATEGY", "ratio"),
@@ -220,8 +235,8 @@ func New() *Config {
 		// Timeouts - default 24 hours (1440 min) to handle very large databases with large objects
 		ClusterTimeoutMinutes: getEnvInt("CLUSTER_TIMEOUT_MIN", 1440),

-		// Cluster parallelism (default: 2 concurrent operations for faster cluster backup/restore)
-		ClusterParallelism: getEnvInt("CLUSTER_PARALLELISM", 2),
+		// Cluster parallelism - use recommended profile's setting for small VMs
+		ClusterParallelism: getEnvInt("CLUSTER_PARALLELISM", recommendedProfile.ClusterParallelism),

 		// Working directory for large operations (default: system temp)
 		WorkDir: getEnvString("WORK_DIR", ""),
@@ -409,6 +424,56 @@ func (c *Config) OptimizeForCPU() error {
|
||||
return nil
|
||||
}
|
||||
|
||||
// ApplyResourceProfile applies a resource profile to the configuration
|
||||
// This adjusts parallelism settings based on the chosen profile
|
||||
func (c *Config) ApplyResourceProfile(profileName string) error {
|
||||
profile := cpu.GetProfileByName(profileName)
|
||||
if profile == nil {
|
||||
return &ConfigError{
|
||||
Field: "resource_profile",
|
||||
Value: profileName,
|
||||
Message: "unknown profile. Valid profiles: conservative, balanced, performance, max-performance",
|
||||
}
|
||||
}
|
||||
|
||||
// Validate profile against current system
|
||||
isValid, warnings := cpu.ValidateProfileForSystem(profile, c.CPUInfo, c.MemoryInfo)
|
||||
if !isValid {
|
||||
// Log warnings but don't block - user may know what they're doing
|
||||
_ = warnings // In production, log these warnings
|
||||
}
|
||||
|
||||
// Apply profile settings
|
||||
c.ResourceProfile = profile.Name
|
||||
c.ClusterParallelism = profile.ClusterParallelism
|
||||
c.Jobs = profile.Jobs
|
||||
c.DumpJobs = profile.DumpJobs
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// GetResourceProfileRecommendation returns the recommended profile and reason
|
||||
func (c *Config) GetResourceProfileRecommendation(isLargeDB bool) (string, string) {
|
||||
profile, reason := cpu.RecommendProfileWithReason(c.CPUInfo, c.MemoryInfo, isLargeDB)
|
||||
return profile.Name, reason
|
||||
}
|
||||
|
||||
// GetCurrentProfile returns the current resource profile details
|
||||
// If LargeDBMode is enabled, returns a modified profile with reduced parallelism
|
||||
func (c *Config) GetCurrentProfile() *cpu.ResourceProfile {
|
||||
profile := cpu.GetProfileByName(c.ResourceProfile)
|
||||
if profile == nil {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Apply LargeDBMode modifier if enabled
|
||||
if c.LargeDBMode {
|
||||
return cpu.ApplyLargeDBMode(profile)
|
||||
}
|
||||
|
||||
return profile
|
||||
}
|
||||
|
||||
// GetCPUInfo returns CPU information, detecting if necessary
|
||||
func (c *Config) GetCPUInfo() (*cpu.CPUInfo, error) {
|
||||
if c.CPUInfo != nil {
|
||||
|
||||
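In the `New()` hunks of this file, the recommended profile now supplies the fallback for `JOBS`, `DUMP_JOBS`, and `CLUSTER_PARALLELISM`, while an environment value still takes precedence. The bodies of `getEnvInt` and `getEnvString` are not shown in this diff, so the sketch below only illustrates that precedence under stated assumptions: `lookupFunc` and `intSetting` are hypothetical, the key is looked up without any prefix, and an unparsable value is assumed to fall back to the default.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFunc matches the signature of os.LookupEnv so that tests can
// substitute a map-backed fake.
type lookupFunc func(key string) (string, bool)

// intSetting returns the integer stored under key, or fallback when the
// key is unset or does not parse as an integer (an assumption; the real
// helper may behave differently).
func intSetting(lookup lookupFunc, key string, fallback int) int {
	raw, ok := lookup(key)
	if !ok {
		return fallback
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return fallback
	}
	return n
}

func main() {
	// 2 is the Jobs value of the "balanced" profile defined in profiles.go.
	recommendedJobs := 2
	fmt.Println("jobs:", intSetting(os.LookupEnv, "JOBS", recommendedJobs))
}
```

Injecting the lookup function keeps the precedence rule testable without touching the process environment.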
@@ -31,6 +31,8 @@ type LocalConfig struct {
|
||||
CPUWorkload string
|
||||
MaxCores int
|
||||
ClusterTimeout int // Cluster operation timeout in minutes (default: 1440 = 24 hours)
|
||||
ResourceProfile string
|
||||
LargeDBMode bool // Enable large database mode (reduces parallelism, increases locks)
|
||||
|
||||
// Security settings
|
||||
RetentionDays int
|
||||
@@ -126,6 +128,10 @@ func LoadLocalConfig() (*LocalConfig, error) {
|
||||
if ct, err := strconv.Atoi(value); err == nil {
|
||||
cfg.ClusterTimeout = ct
|
||||
}
|
||||
case "resource_profile":
|
||||
cfg.ResourceProfile = value
|
||||
case "large_db_mode":
|
||||
cfg.LargeDBMode = value == "true" || value == "1"
|
||||
}
|
||||
case "security":
|
||||
switch key {
|
||||
@@ -207,6 +213,12 @@ func SaveLocalConfig(cfg *LocalConfig) error {
|
||||
if cfg.ClusterTimeout != 0 {
|
||||
sb.WriteString(fmt.Sprintf("cluster_timeout = %d\n", cfg.ClusterTimeout))
|
||||
}
|
||||
if cfg.ResourceProfile != "" {
|
||||
sb.WriteString(fmt.Sprintf("resource_profile = %s\n", cfg.ResourceProfile))
|
||||
}
|
||||
if cfg.LargeDBMode {
|
||||
sb.WriteString("large_db_mode = true\n")
|
||||
}
|
||||
sb.WriteString("\n")
|
||||
|
||||
// Security section
|
||||
@@ -280,6 +292,14 @@ func ApplyLocalConfig(cfg *Config, local *LocalConfig) {
|
||||
if local.ClusterTimeout != 0 {
|
||||
cfg.ClusterTimeoutMinutes = local.ClusterTimeout
|
||||
}
|
||||
// Apply resource profile settings
|
||||
if local.ResourceProfile != "" {
|
||||
cfg.ResourceProfile = local.ResourceProfile
|
||||
}
|
||||
// LargeDBMode is a boolean - apply if true in config
|
||||
if local.LargeDBMode {
|
||||
cfg.LargeDBMode = true
|
||||
}
|
||||
if cfg.RetentionDays == 30 && local.RetentionDays != 0 {
|
||||
cfg.RetentionDays = local.RetentionDays
|
||||
}
|
||||
@@ -308,6 +328,8 @@ func ConfigFromConfig(cfg *Config) *LocalConfig {
|
||||
CPUWorkload: cfg.CPUWorkloadType,
|
||||
MaxCores: cfg.MaxCores,
|
||||
ClusterTimeout: cfg.ClusterTimeoutMinutes,
|
||||
ResourceProfile: cfg.ResourceProfile,
|
||||
LargeDBMode: cfg.LargeDBMode,
|
||||
RetentionDays: cfg.RetentionDays,
|
||||
MinBackups: cfg.MinBackups,
|
||||
MaxRetries: cfg.MaxRetries,
|
||||
|
||||
475
internal/cpu/profiles.go
Normal file
@@ -0,0 +1,475 @@
|
||||
package cpu
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/exec"
|
||||
"runtime"
|
||||
"strconv"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// MemoryInfo holds system memory information
|
||||
type MemoryInfo struct {
|
||||
TotalBytes int64 `json:"total_bytes"`
|
||||
AvailableBytes int64 `json:"available_bytes"`
|
||||
FreeBytes int64 `json:"free_bytes"`
|
||||
UsedBytes int64 `json:"used_bytes"`
|
||||
SwapTotalBytes int64 `json:"swap_total_bytes"`
|
||||
SwapFreeBytes int64 `json:"swap_free_bytes"`
|
||||
TotalGB int `json:"total_gb"`
|
||||
AvailableGB int `json:"available_gb"`
|
||||
Platform string `json:"platform"`
|
||||
}
|
||||
|
||||
// ResourceProfile defines a resource allocation profile for backup/restore operations
|
||||
type ResourceProfile struct {
|
||||
Name string `json:"name"`
|
||||
Description string `json:"description"`
|
||||
ClusterParallelism int `json:"cluster_parallelism"` // Concurrent databases
|
||||
Jobs int `json:"jobs"` // Parallel jobs within pg_restore
|
||||
DumpJobs int `json:"dump_jobs"` // Parallel jobs for pg_dump
|
||||
MaintenanceWorkMem string `json:"maintenance_work_mem"` // PostgreSQL recommendation
|
||||
MaxLocksPerTxn int `json:"max_locks_per_txn"` // PostgreSQL recommendation
|
||||
RecommendedForLarge bool `json:"recommended_for_large"` // Suitable for large DBs?
|
||||
MinMemoryGB int `json:"min_memory_gb"` // Minimum memory for this profile
|
||||
MinCores int `json:"min_cores"` // Minimum cores for this profile
|
||||
}
|
||||
|
||||
// Predefined resource profiles
|
||||
var (
|
||||
// ProfileConservative - Safe for constrained VMs, avoids shared memory issues
|
||||
ProfileConservative = ResourceProfile{
|
||||
Name: "conservative",
|
||||
Description: "Safe for small VMs (2-4 cores, <16GB). Sequential operations, minimal memory pressure. Best for large DBs on limited hardware.",
|
||||
ClusterParallelism: 1,
|
||||
Jobs: 1,
|
||||
DumpJobs: 2,
|
||||
MaintenanceWorkMem: "256MB",
|
||||
MaxLocksPerTxn: 4096,
|
||||
RecommendedForLarge: true,
|
||||
MinMemoryGB: 4,
|
||||
MinCores: 2,
|
||||
}
|
||||
|
||||
// ProfileBalanced - Default profile, works for most scenarios
|
||||
ProfileBalanced = ResourceProfile{
|
||||
Name: "balanced",
|
||||
Description: "Balanced for medium VMs (4-8 cores, 16-32GB). Moderate parallelism with good safety margin.",
|
||||
ClusterParallelism: 2,
|
||||
Jobs: 2,
|
||||
DumpJobs: 4,
|
||||
MaintenanceWorkMem: "512MB",
|
||||
MaxLocksPerTxn: 2048,
|
||||
RecommendedForLarge: true,
|
||||
MinMemoryGB: 16,
|
||||
MinCores: 4,
|
||||
}
|
||||
|
||||
// ProfilePerformance - Aggressive parallelism for powerful servers
|
||||
ProfilePerformance = ResourceProfile{
|
||||
Name: "performance",
|
||||
Description: "Aggressive for powerful servers (8+ cores, 32GB+). Maximum parallelism for fast operations.",
|
||||
ClusterParallelism: 4,
|
||||
Jobs: 4,
|
||||
DumpJobs: 8,
|
||||
MaintenanceWorkMem: "1GB",
|
||||
MaxLocksPerTxn: 1024,
|
||||
RecommendedForLarge: false, // Large DBs may still need conservative
|
||||
MinMemoryGB: 32,
|
||||
MinCores: 8,
|
||||
}
|
||||
|
||||
// ProfileMaxPerformance - Maximum parallelism for high-end servers
|
||||
ProfileMaxPerformance = ResourceProfile{
|
||||
Name: "max-performance",
|
||||
Description: "Maximum for high-end servers (16+ cores, 64GB+). Full CPU utilization.",
|
||||
ClusterParallelism: 8,
|
||||
Jobs: 8,
|
||||
DumpJobs: 16,
|
||||
MaintenanceWorkMem: "2GB",
|
||||
MaxLocksPerTxn: 512,
|
||||
RecommendedForLarge: false, // Large DBs should use LargeDBMode
|
||||
MinMemoryGB: 64,
|
||||
MinCores: 16,
|
||||
}
|
||||
|
||||
// AllProfiles contains all available profiles (VM resource-based)
|
||||
AllProfiles = []ResourceProfile{
|
||||
ProfileConservative,
|
||||
ProfileBalanced,
|
||||
ProfilePerformance,
|
||||
ProfileMaxPerformance,
|
||||
}
|
||||
)
|
||||
|
||||
// GetProfileByName returns a profile by its name
|
||||
func GetProfileByName(name string) *ResourceProfile {
|
||||
for _, p := range AllProfiles {
|
||||
if strings.EqualFold(p.Name, name) {
|
||||
return &p
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// ApplyLargeDBMode modifies a profile for large database operations.
|
||||
// This is a modifier that reduces parallelism and increases max_locks_per_transaction
|
||||
// to prevent "out of shared memory" errors with large databases (many tables, LOBs, etc.).
|
||||
// It returns a new profile with adjusted settings, leaving the original unchanged.
|
||||
func ApplyLargeDBMode(profile *ResourceProfile) *ResourceProfile {
|
||||
if profile == nil {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Create a copy with adjusted settings
|
||||
modified := *profile
|
||||
|
||||
// Add " +large-db" suffix to indicate this is modified
|
||||
modified.Name = profile.Name + " +large-db"
|
||||
modified.Description = fmt.Sprintf("%s [LargeDBMode: reduced parallelism, high locks]", profile.Description)
|
||||
|
||||
// Reduce parallelism to avoid lock exhaustion
|
||||
// Rule: halve parallelism, minimum 1
|
||||
modified.ClusterParallelism = max(1, profile.ClusterParallelism/2)
|
||||
modified.Jobs = max(1, profile.Jobs/2)
|
||||
modified.DumpJobs = max(2, profile.DumpJobs/2)
|
||||
|
||||
// Force high max_locks_per_transaction for large schemas
|
||||
modified.MaxLocksPerTxn = 8192
|
||||
|
||||
// Set maintenance_work_mem to 1GB, or to 2GB for profiles whose MinMemoryGB is at least 32
|
||||
modified.MaintenanceWorkMem = "1GB"
|
||||
if profile.MinMemoryGB >= 32 {
|
||||
modified.MaintenanceWorkMem = "2GB"
|
||||
}
|
||||
|
||||
modified.RecommendedForLarge = true
|
||||
|
||||
return &modified
|
||||
}
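To make the effect of `ApplyLargeDBMode` concrete, the sketch below applies the same halving rule to the parallelism values of the four predefined profiles and prints the result. It is a trimmed, standalone illustration rather than the package code: `miniProfile`, `atLeast`, and `withLargeDB` are hypothetical names, and only the fields relevant to the rule are kept.

```go
package main

import "fmt"

// miniProfile keeps only the fields that ApplyLargeDBMode's halving rule touches.
type miniProfile struct {
	Name               string
	ClusterParallelism int
	Jobs               int
	DumpJobs           int
	MaxLocksPerTxn     int
}

// atLeast returns v, raised to floor if it is smaller.
func atLeast(floor, v int) int {
	if v < floor {
		return floor
	}
	return v
}

// withLargeDB returns a modified copy: parallelism halved (minimum 1, or 2
// for dump jobs) and max_locks_per_transaction forced to 8192.
func withLargeDB(p miniProfile) miniProfile {
	p.Name += " +large-db"
	p.ClusterParallelism = atLeast(1, p.ClusterParallelism/2)
	p.Jobs = atLeast(1, p.Jobs/2)
	p.DumpJobs = atLeast(2, p.DumpJobs/2)
	p.MaxLocksPerTxn = 8192
	return p
}

func main() {
	// Values copied from the predefined profiles in this file.
	profiles := []miniProfile{
		{"conservative", 1, 1, 2, 4096},
		{"balanced", 2, 2, 4, 2048},
		{"performance", 4, 4, 8, 1024},
		{"max-performance", 8, 8, 16, 512},
	}
	for _, p := range profiles {
		m := withLargeDB(p)
		fmt.Printf("%-26s cluster=%d jobs=%d dump=%d locks=%d\n",
			m.Name, m.ClusterParallelism, m.Jobs, m.DumpJobs, m.MaxLocksPerTxn)
	}
}
```

One consequence worth noting: with the minimums in place, `conservative` and `balanced` end up with identical parallelism (1, 1, 2) once Large DB Mode is on.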
|
||||
|
||||
// max returns the larger of two integers
|
||||
func max(a, b int) int {
|
||||
if a > b {
|
||||
return a
|
||||
}
|
||||
return b
|
||||
}
|
||||
|
||||
// DetectMemory detects system memory information
|
||||
func DetectMemory() (*MemoryInfo, error) {
|
||||
info := &MemoryInfo{
|
||||
Platform: runtime.GOOS,
|
||||
}
|
||||
|
||||
switch runtime.GOOS {
|
||||
case "linux":
|
||||
if err := detectLinuxMemory(info); err != nil {
|
||||
return info, fmt.Errorf("linux memory detection failed: %w", err)
|
||||
}
|
||||
case "darwin":
|
||||
if err := detectDarwinMemory(info); err != nil {
|
||||
return info, fmt.Errorf("darwin memory detection failed: %w", err)
|
||||
}
|
||||
case "windows":
|
||||
if err := detectWindowsMemory(info); err != nil {
|
||||
return info, fmt.Errorf("windows memory detection failed: %w", err)
|
||||
}
|
||||
default:
|
||||
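// Fallback: use Go runtime memory stats. MemStats.Sys is the memory this process
// has obtained from the OS, not total system RAM, so these values are only a rough stand-in.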
|
||||
var memStats runtime.MemStats
|
||||
runtime.ReadMemStats(&memStats)
|
||||
info.TotalBytes = int64(memStats.Sys)
|
||||
info.AvailableBytes = int64(memStats.Sys - memStats.Alloc)
|
||||
}
|
||||
|
||||
// Calculate GB values
|
||||
info.TotalGB = int(info.TotalBytes / (1024 * 1024 * 1024))
|
||||
info.AvailableGB = int(info.AvailableBytes / (1024 * 1024 * 1024))
|
||||
|
||||
return info, nil
|
||||
}
|
||||
|
||||
// detectLinuxMemory reads memory info from /proc/meminfo
|
||||
func detectLinuxMemory(info *MemoryInfo) error {
|
||||
file, err := os.Open("/proc/meminfo")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
scanner := bufio.NewScanner(file)
|
||||
for scanner.Scan() {
|
||||
line := scanner.Text()
|
||||
parts := strings.Fields(line)
|
||||
if len(parts) < 2 {
|
||||
continue
|
||||
}
|
||||
|
||||
key := strings.TrimSuffix(parts[0], ":")
|
||||
value, err := strconv.ParseInt(parts[1], 10, 64)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
|
||||
// Values are in kB
|
||||
valueBytes := value * 1024
|
||||
|
||||
switch key {
|
||||
case "MemTotal":
|
||||
info.TotalBytes = valueBytes
|
||||
case "MemAvailable":
|
||||
info.AvailableBytes = valueBytes
|
||||
case "MemFree":
|
||||
info.FreeBytes = valueBytes
|
||||
case "SwapTotal":
|
||||
info.SwapTotalBytes = valueBytes
|
||||
case "SwapFree":
|
||||
info.SwapFreeBytes = valueBytes
|
||||
}
|
||||
}
|
||||
|
||||
info.UsedBytes = info.TotalBytes - info.AvailableBytes
|
||||
|
||||
return scanner.Err()
|
||||
}
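`detectLinuxMemory` reads `/proc/meminfo` directly, which makes the parsing hard to exercise on other platforms. One possible variation, sketched here and not part of this changeset, is to parse from an `io.Reader` so the same logic can run against a fixed sample. `parseMeminfo` and `parseMeminfoString` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// parseMeminfo reads "Key:   value kB" lines and returns the values in
// bytes, keyed by field name. Like the function above, it multiplies every
// value by 1024; entries without a kB unit (for example the HugePages_*
// counters) are therefore not meaningful in the result and should be ignored.
func parseMeminfo(r io.Reader) (map[string]int64, error) {
	out := make(map[string]int64)
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		kb, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			continue
		}
		out[strings.TrimSuffix(fields[0], ":")] = kb * 1024
	}
	return out, sc.Err()
}

// parseMeminfoString is a convenience wrapper for fixed samples.
func parseMeminfoString(s string) (map[string]int64, error) {
	return parseMeminfo(strings.NewReader(s))
}

func main() {
	sample := "MemTotal:       16384 kB\nMemAvailable:    8192 kB\nSwapTotal:          0 kB\n"
	m, err := parseMeminfoString(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	used := m["MemTotal"] - m["MemAvailable"]
	fmt.Println(m["MemTotal"], m["MemAvailable"], used)
}
```

With this shape, the production path would pass the opened `/proc/meminfo` file as the reader and copy the five keys it needs into `MemoryInfo`.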
|
||||
|
||||
// detectDarwinMemory detects memory on macOS
func detectDarwinMemory(info *MemoryInfo) error {
	// Use sysctl for total memory
	if output, err := runCommand("sysctl", "-n", "hw.memsize"); err == nil {
		if val, err := strconv.ParseInt(strings.TrimSpace(output), 10, 64); err == nil {
			info.TotalBytes = val
		}
	}

	// Use vm_stat for available memory (more complex parsing required)
	if output, err := runCommand("vm_stat"); err == nil {
		pageSize := int64(4096) // Default page size
		var freePages, inactivePages int64

		lines := strings.Split(output, "\n")
		for _, line := range lines {
			if strings.Contains(line, "page size of") {
				parts := strings.Fields(line)
				for i, p := range parts {
					if p == "of" && i+1 < len(parts) {
						if ps, err := strconv.ParseInt(parts[i+1], 10, 64); err == nil {
							pageSize = ps
						}
					}
				}
			} else if strings.Contains(line, "Pages free:") {
				val := extractNumberFromLine(line)
				freePages = val
			} else if strings.Contains(line, "Pages inactive:") {
				val := extractNumberFromLine(line)
				inactivePages = val
			}
		}

		info.FreeBytes = freePages * pageSize
		info.AvailableBytes = (freePages + inactivePages) * pageSize
	}

	info.UsedBytes = info.TotalBytes - info.AvailableBytes
	return nil
}

// detectWindowsMemory detects memory on Windows
func detectWindowsMemory(info *MemoryInfo) error {
	// Use wmic for memory info
	if output, err := runCommand("wmic", "OS", "get", "TotalVisibleMemorySize,FreePhysicalMemory", "/format:list"); err == nil {
		lines := strings.Split(output, "\n")
		for _, line := range lines {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "TotalVisibleMemorySize=") {
				val := strings.TrimPrefix(line, "TotalVisibleMemorySize=")
				if v, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64); err == nil {
					info.TotalBytes = v * 1024 // KB to bytes
				}
			} else if strings.HasPrefix(line, "FreePhysicalMemory=") {
				val := strings.TrimPrefix(line, "FreePhysicalMemory=")
				if v, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64); err == nil {
					info.FreeBytes = v * 1024
					info.AvailableBytes = v * 1024
				}
			}
		}
	}

	info.UsedBytes = info.TotalBytes - info.AvailableBytes
	return nil
}

// RecommendProfile recommends a resource profile based on system resources and workload
func RecommendProfile(cpuInfo *CPUInfo, memInfo *MemoryInfo, isLargeDB bool) *ResourceProfile {
	cores := 0
	if cpuInfo != nil {
		cores = cpuInfo.PhysicalCores
		if cores == 0 {
			cores = cpuInfo.LogicalCores
		}
	}
	if cores == 0 {
		cores = runtime.NumCPU()
	}

	memGB := 0
	if memInfo != nil {
		memGB = memInfo.TotalGB
	}

	// Special case: large databases should use conservative profile
	// The caller should also enable LargeDBMode for increased MaxLocksPerTxn
	if isLargeDB {
		// For large DBs, recommend conservative regardless of resources
		// LargeDBMode flag will handle the lock settings separately
		return &ProfileConservative
	}

	// Resource-based selection
	if cores >= 16 && memGB >= 64 {
		return &ProfileMaxPerformance
	} else if cores >= 8 && memGB >= 32 {
		return &ProfilePerformance
	} else if cores >= 4 && memGB >= 16 {
		return &ProfileBalanced
	}

	// Default to conservative for constrained systems
	return &ProfileConservative
}

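The selection rule in `RecommendProfile` is a descending threshold ladder: both the core count and the memory size must clear a tier, otherwise the next lower tier is tried. A table-driven sketch of the same rule follows. The `tier` type and `pickTier` function are hypothetical, and the name `"balanced"` is an assumption (the diff only shows the names `conservative`, `performance` and `max-performance` as string literals).

```go
package main

import "fmt"

// tier is a hypothetical stand-in for a ResourceProfile's minimum requirements.
type tier struct {
	name     string
	minCores int
	minMemGB int
}

// tiers lists the thresholds used in RecommendProfile, most demanding first.
var tiers = []tier{
	{"max-performance", 16, 64},
	{"performance", 8, 32},
	{"balanced", 4, 16}, // name assumed for illustration
}

// pickTier returns the first tier whose core AND memory minimums are both met.
// Large databases always map to "conservative", as in the diff.
func pickTier(cores, memGB int, isLargeDB bool) string {
	if isLargeDB {
		return "conservative"
	}
	for _, t := range tiers {
		if cores >= t.minCores && memGB >= t.minMemGB {
			return t.name
		}
	}
	return "conservative"
}

func main() {
	fmt.Println(pickTier(16, 64, false)) // both thresholds of the top tier met
	fmt.Println(pickTier(16, 32, false)) // memory limits the choice
	fmt.Println(pickTier(32, 128, true)) // large DB overrides resources
}
```

Because both conditions must hold, a machine with many cores but little memory falls through to a lower tier.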
// RecommendProfileWithReason returns a profile recommendation with explanation
func RecommendProfileWithReason(cpuInfo *CPUInfo, memInfo *MemoryInfo, isLargeDB bool) (*ResourceProfile, string) {
	cores := 0
	if cpuInfo != nil {
		cores = cpuInfo.PhysicalCores
		if cores == 0 {
			cores = cpuInfo.LogicalCores
		}
	}
	if cores == 0 {
		cores = runtime.NumCPU()
	}

	memGB := 0
	if memInfo != nil {
		memGB = memInfo.TotalGB
	}

	// Build reason string
	var reason strings.Builder
	reason.WriteString(fmt.Sprintf("System: %d cores, %dGB RAM. ", cores, memGB))

	profile := RecommendProfile(cpuInfo, memInfo, isLargeDB)

	if isLargeDB {
		reason.WriteString("Large database mode - using conservative settings. Enable LargeDBMode for higher max_locks.")
	} else if profile.Name == "conservative" {
		reason.WriteString("Limited resources detected - using conservative profile for stability.")
	} else if profile.Name == "max-performance" {
		reason.WriteString("High-end server detected - using maximum parallelism.")
	} else if profile.Name == "performance" {
		reason.WriteString("Good resources detected - using performance profile.")
	} else {
		reason.WriteString("Using balanced profile for optimal performance/stability trade-off.")
	}

	return profile, reason.String()
}

// ValidateProfileForSystem checks if a profile is suitable for the current system
func ValidateProfileForSystem(profile *ResourceProfile, cpuInfo *CPUInfo, memInfo *MemoryInfo) (bool, []string) {
	var warnings []string

	cores := 0
	if cpuInfo != nil {
		cores = cpuInfo.PhysicalCores
		if cores == 0 {
			cores = cpuInfo.LogicalCores
		}
	}
	if cores == 0 {
		cores = runtime.NumCPU()
	}

	memGB := 0
	if memInfo != nil {
		memGB = memInfo.TotalGB
	}

	// Check minimum requirements
	if cores < profile.MinCores {
		warnings = append(warnings,
			fmt.Sprintf("Profile '%s' recommends %d+ cores (system has %d)", profile.Name, profile.MinCores, cores))
	}

	if memGB < profile.MinMemoryGB {
		warnings = append(warnings,
			fmt.Sprintf("Profile '%s' recommends %dGB+ RAM (system has %dGB)", profile.Name, profile.MinMemoryGB, memGB))
	}

	// Check for potential issues
	if profile.ClusterParallelism > cores {
		warnings = append(warnings,
			fmt.Sprintf("Cluster parallelism (%d) exceeds CPU cores (%d) - may cause contention",
				profile.ClusterParallelism, cores))
	}

	// Memory pressure warning
	memPerWorker := 2 // Rough estimate: 2GB per parallel worker for large DB operations
	requiredMem := profile.ClusterParallelism * profile.Jobs * memPerWorker
	if memGB > 0 && requiredMem > memGB {
		warnings = append(warnings,
			fmt.Sprintf("High parallelism may require ~%dGB RAM (system has %dGB) - risk of OOM",
				requiredMem, memGB))
	}

	return len(warnings) == 0, warnings
}

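The memory pressure warning in `ValidateProfileForSystem` multiplies the number of databases restored in parallel by the jobs per database and by a rough 2 GB per worker. The arithmetic can be sketched in isolation; `estimateRequiredGB` and `oomRisk` are hypothetical helper names, not functions from this change.

```go
package main

import "fmt"

// memPerWorkerGB is the rough per-worker estimate used by the check (2 GB).
const memPerWorkerGB = 2

// estimateRequiredGB is a hypothetical helper: parallel databases x jobs per
// database x the per-worker estimate.
func estimateRequiredGB(clusterParallelism, jobs int) int {
	return clusterParallelism * jobs * memPerWorkerGB
}

// oomRisk reports whether the estimate exceeds the detected memory.
// A memGB of 0 means "unknown" and never triggers the warning.
func oomRisk(clusterParallelism, jobs, memGB int) bool {
	return memGB > 0 && estimateRequiredGB(clusterParallelism, jobs) > memGB
}

func main() {
	// 4 databases in parallel with 8 jobs each: 4 * 8 * 2 = 64 GB estimated.
	fmt.Println(estimateRequiredGB(4, 8)) // 64
	fmt.Println(oomRisk(4, 8, 32))        // true: 64 GB estimated, 32 GB present
	fmt.Println(oomRisk(4, 8, 0))         // false: memory size unknown
}
```

The `memGB > 0` guard matters: when memory detection fails, the check stays silent instead of warning on every profile.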
// FormatProfileSummary returns a formatted summary of a profile
func (p *ResourceProfile) FormatProfileSummary() string {
	return fmt.Sprintf("[%s] Parallel: %d DBs, %d jobs | Recommended for large DBs: %v",
		strings.ToUpper(p.Name),
		p.ClusterParallelism,
		p.Jobs,
		p.RecommendedForLarge)
}

// PostgreSQLRecommendations returns PostgreSQL configuration recommendations for this profile
func (p *ResourceProfile) PostgreSQLRecommendations() []string {
	return []string{
		fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d;", p.MaxLocksPerTxn),
		fmt.Sprintf("ALTER SYSTEM SET maintenance_work_mem = '%s';", p.MaintenanceWorkMem),
		"-- Restart PostgreSQL after changes to max_locks_per_transaction",
	}
}

// Helper functions

func runCommand(name string, args ...string) (string, error) {
	cmd := exec.Command(name, args...)
	output, err := cmd.Output()
	if err != nil {
		return "", err
	}
	return string(output), nil
}

func extractNumberFromLine(line string) int64 {
	// Extract number before the period at end (e.g., "Pages free: 123456.")
	parts := strings.Fields(line)
	for _, p := range parts {
		p = strings.TrimSuffix(p, ".")
		if val, err := strconv.ParseInt(p, 10, 64); err == nil && val > 0 {
			return val
		}
	}
	return 0
}
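The behaviour of `extractNumberFromLine` on `vm_stat`-style lines can be checked with a standalone copy. The sketch below renames the function to `firstPositiveInt` (a hypothetical name) to make clear it is an illustration, and the sample lines are made up in the `vm_stat` layout.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// firstPositiveInt mirrors extractNumberFromLine: it returns the first
// whitespace-separated field that parses as a positive integer after a
// trailing period is trimmed, or 0 if no field qualifies.
func firstPositiveInt(line string) int64 {
	for _, f := range strings.Fields(line) {
		f = strings.TrimSuffix(f, ".")
		if v, err := strconv.ParseInt(f, 10, 64); err == nil && v > 0 {
			return v
		}
	}
	return 0
}

func main() {
	fmt.Println(firstPositiveInt("Pages free:                               123456.")) // 123456
	fmt.Println(firstPositiveInt("Pages purgeable:                               0.")) // 0
	fmt.Println(firstPositiveInt("no digits here"))                                    // 0
}
```

A literal zero and a missing number both yield 0, so callers cannot tell the two cases apart; for page counts that is harmless.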
@@ -316,11 +316,12 @@ func (p *PostgreSQL) BuildBackupCommand(database, outputFile string, options Bac
 	cmd := []string{"pg_dump"}

 	// Connection parameters
-	if p.cfg.Host != "localhost" {
+	// CRITICAL: Always pass port even for localhost - user may have non-standard port
+	if p.cfg.Host != "localhost" && p.cfg.Host != "127.0.0.1" && p.cfg.Host != "" {
 		cmd = append(cmd, "-h", p.cfg.Host)
-		cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
 		cmd = append(cmd, "--no-password")
 	}
+	cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
 	cmd = append(cmd, "-U", p.cfg.User)

 	// Format and compression
@@ -380,11 +381,12 @@ func (p *PostgreSQL) BuildRestoreCommand(database, inputFile string, options Res
 	cmd := []string{"pg_restore"}

 	// Connection parameters
-	if p.cfg.Host != "localhost" {
+	// CRITICAL: Always pass port even for localhost - user may have non-standard port
+	if p.cfg.Host != "localhost" && p.cfg.Host != "127.0.0.1" && p.cfg.Host != "" {
 		cmd = append(cmd, "-h", p.cfg.Host)
-		cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
 		cmd = append(cmd, "--no-password")
 	}
+	cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
 	cmd = append(cmd, "-U", p.cfg.User)

 	// Parallel jobs (incompatible with --single-transaction per PostgreSQL docs)

@@ -4,6 +4,7 @@ import (
 	"bytes"
 	"crypto/rand"
 	"io"
+	mathrand "math/rand"
 	"testing"
 )

@@ -100,12 +101,15 @@ func TestChunker_Deterministic(t *testing.T) {

 func TestChunker_ShiftedData(t *testing.T) {
 	// Test that shifted data still shares chunks (the key CDC benefit)
+	// Use deterministic random data for reproducible test results
+	rng := mathrand.New(mathrand.NewSource(42))
+
 	original := make([]byte, 100*1024)
-	rand.Read(original)
+	rng.Read(original)

 	// Create shifted version (prepend some bytes)
 	prefix := make([]byte, 1000)
-	rand.Read(prefix)
+	rng.Read(prefix)
 	shifted := append(prefix, original...)

 	// Chunk both

@@ -34,6 +34,14 @@ type ProgressCallback func(current, total int64, description string)
 // DatabaseProgressCallback is called with database count progress during cluster restore
 type DatabaseProgressCallback func(done, total int, dbName string)

+// DatabaseProgressWithTimingCallback is called with database progress including timing info
+// Parameters: done count, total count, database name, elapsed time for current restore phase, avg duration per DB
+type DatabaseProgressWithTimingCallback func(done, total int, dbName string, phaseElapsed, avgPerDB time.Duration)
+
+// DatabaseProgressByBytesCallback is called with progress weighted by database sizes (bytes)
+// Parameters: bytes completed, total bytes, current database name, databases done count, total database count
+type DatabaseProgressByBytesCallback func(bytesDone, bytesTotal int64, dbName string, dbDone, dbTotal int)
+
 // Engine handles database restore operations
 type Engine struct {
 	cfg *config.Config
@@ -47,6 +55,8 @@ type Engine struct {
 	// TUI progress callback for detailed progress reporting
 	progressCallback          ProgressCallback
 	dbProgressCallback        DatabaseProgressCallback
+	dbProgressTimingCallback  DatabaseProgressWithTimingCallback
+	dbProgressByBytesCallback DatabaseProgressByBytesCallback
 }

 // New creates a new restore engine
@@ -112,6 +122,16 @@ func (e *Engine) SetDatabaseProgressCallback(cb DatabaseProgressCallback) {
 	e.dbProgressCallback = cb
 }

+// SetDatabaseProgressWithTimingCallback sets a callback for database progress with timing info
+func (e *Engine) SetDatabaseProgressWithTimingCallback(cb DatabaseProgressWithTimingCallback) {
+	e.dbProgressTimingCallback = cb
+}
+
+// SetDatabaseProgressByBytesCallback sets a callback for progress weighted by database sizes
+func (e *Engine) SetDatabaseProgressByBytesCallback(cb DatabaseProgressByBytesCallback) {
+	e.dbProgressByBytesCallback = cb
+}
+
 // reportProgress safely calls the progress callback if set
 func (e *Engine) reportProgress(current, total int64, description string) {
 	if e.progressCallback != nil {
@@ -126,6 +146,20 @@ func (e *Engine) reportDatabaseProgress(done, total int, dbName string) {
 	}
 }

+// reportDatabaseProgressWithTiming safely calls the timing-aware callback if set
+func (e *Engine) reportDatabaseProgressWithTiming(done, total int, dbName string, phaseElapsed, avgPerDB time.Duration) {
+	if e.dbProgressTimingCallback != nil {
+		e.dbProgressTimingCallback(done, total, dbName, phaseElapsed, avgPerDB)
+	}
+}
+
+// reportDatabaseProgressByBytes safely calls the bytes-weighted callback if set
+func (e *Engine) reportDatabaseProgressByBytes(bytesDone, bytesTotal int64, dbName string, dbDone, dbTotal int) {
+	if e.dbProgressByBytesCallback != nil {
+		e.dbProgressByBytesCallback(bytesDone, bytesTotal, dbName, dbDone, dbTotal)
+	}
+}
+
 // loggerAdapter adapts our logger to the progress.Logger interface
 type loggerAdapter struct {
 	logger logger.Logger
@@ -425,16 +459,18 @@ func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB
 	var cmd []string

 	// For localhost, omit -h to use Unix socket (avoids Ident auth issues)
+	// But always include -p for port (in case of non-standard port)
 	hostArg := ""
+	portArg := fmt.Sprintf("-p %d", e.cfg.Port)
 	if e.cfg.Host != "localhost" && e.cfg.Host != "" {
-		hostArg = fmt.Sprintf("-h %s -p %d", e.cfg.Host, e.cfg.Port)
+		hostArg = fmt.Sprintf("-h %s", e.cfg.Host)
 	}

 	if compressed {
 		// Use ON_ERROR_STOP=1 to fail fast on first error (prevents millions of errors on truncated dumps)
-		psqlCmd := fmt.Sprintf("psql -U %s -d %s -v ON_ERROR_STOP=1", e.cfg.User, targetDB)
+		psqlCmd := fmt.Sprintf("psql %s -U %s -d %s -v ON_ERROR_STOP=1", portArg, e.cfg.User, targetDB)
 		if hostArg != "" {
-			psqlCmd = fmt.Sprintf("psql %s -U %s -d %s -v ON_ERROR_STOP=1", hostArg, e.cfg.User, targetDB)
+			psqlCmd = fmt.Sprintf("psql %s %s -U %s -d %s -v ON_ERROR_STOP=1", hostArg, portArg, e.cfg.User, targetDB)
 		}
 		// Set PGPASSWORD in the bash command for password-less auth
 		cmd = []string{
@@ -455,6 +491,7 @@ func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB
 	} else {
 		cmd = []string{
 			"psql",
+			"-p", fmt.Sprintf("%d", e.cfg.Port),
 			"-U", e.cfg.User,
 			"-d", targetDB,
 			"-v", "ON_ERROR_STOP=1",
@@ -841,6 +878,25 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 	// Create temporary extraction directory in configured WorkDir
 	workDir := e.cfg.GetEffectiveWorkDir()
 	tempDir := filepath.Join(workDir, fmt.Sprintf(".restore_%d", time.Now().Unix()))
+
+	// Check disk space for extraction (need ~3x archive size: compressed + extracted + working space)
+	if archiveInfo != nil {
+		requiredBytes := uint64(archiveInfo.Size()) * 3
+		extractionCheck := checks.CheckDiskSpace(workDir)
+		if extractionCheck.AvailableBytes < requiredBytes {
+			operation.Fail("Insufficient disk space for extraction")
+			return fmt.Errorf("insufficient disk space for extraction in %s: need %.1f GB, have %.1f GB (archive size: %.1f GB × 3)",
+				workDir,
+				float64(requiredBytes)/(1024*1024*1024),
+				float64(extractionCheck.AvailableBytes)/(1024*1024*1024),
+				float64(archiveInfo.Size())/(1024*1024*1024))
+		}
+		e.log.Info("Disk space check for extraction passed",
+			"workdir", workDir,
+			"required_gb", float64(requiredBytes)/(1024*1024*1024),
+			"available_gb", float64(extractionCheck.AvailableBytes)/(1024*1024*1024))
+	}
+
 	if err := os.MkdirAll(tempDir, 0755); err != nil {
 		operation.Fail("Failed to create temporary directory")
 		return fmt.Errorf("failed to create temp directory in %s: %w", workDir, err)
@@ -854,6 +910,16 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 		return fmt.Errorf("failed to extract archive: %w", err)
 	}

+	// Check context validity after extraction (debugging context cancellation issues)
+	if ctx.Err() != nil {
+		e.log.Error("Context cancelled after extraction - this should not happen",
+			"context_error", ctx.Err(),
+			"extraction_completed", true)
+		operation.Fail("Context cancelled unexpectedly")
+		return fmt.Errorf("context cancelled after extraction completed: %w", ctx.Err())
+	}
+	e.log.Info("Extraction completed, context still valid")
+
 	// Check if user has superuser privileges (required for ownership restoration)
 	e.progress.Update("Checking privileges...")
 	isSuperuser, err := e.checkSuperuser(ctx)
@@ -1004,12 +1070,27 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 	var restoreErrorsMu sync.Mutex
 	totalDBs := 0

-	// Count total databases
+	// Count total databases and calculate total bytes for weighted progress
+	var totalBytes int64
+	dbSizes := make(map[string]int64) // Map database name to dump file size
 	for _, entry := range entries {
 		if !entry.IsDir() {
 			totalDBs++
+			dumpFile := filepath.Join(dumpsDir, entry.Name())
+			if info, err := os.Stat(dumpFile); err == nil {
+				dbName := entry.Name()
+				dbName = strings.TrimSuffix(dbName, ".dump")
+				dbName = strings.TrimSuffix(dbName, ".sql.gz")
+				dbSizes[dbName] = info.Size()
+				totalBytes += info.Size()
+			}
 		}
 	}
+	e.log.Info("Calculated total restore size", "databases", totalDBs, "total_bytes", totalBytes)
+
+	// Track bytes completed for weighted progress
+	var bytesCompleted int64
+	var bytesCompletedMu sync.Mutex

 	// Create ETA estimator for database restores
 	estimator := progress.NewETAEstimator("Restoring cluster", totalDBs)
@@ -1037,6 +1118,23 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 	var successCount, failCount int32
 	var mu sync.Mutex // Protect shared resources (progress, logger)

+	// CRITICAL: Check context before starting database restore loop
+	// This helps debug issues where context gets cancelled between extraction and restore
+	if ctx.Err() != nil {
+		e.log.Error("Context cancelled before database restore loop started",
+			"context_error", ctx.Err(),
+			"total_databases", totalDBs,
+			"parallelism", parallelism)
+		operation.Fail("Context cancelled before database restores could start")
+		return fmt.Errorf("context cancelled before database restore: %w", ctx.Err())
+	}
+	e.log.Info("Starting database restore loop", "databases", totalDBs, "parallelism", parallelism)
+
+	// Timing tracking for restore phase progress
+	restorePhaseStart := time.Now()
+	var completedDBTimes []time.Duration // Track duration for each completed DB restore
+	var completedDBTimesMu sync.Mutex
+
 	// Create semaphore to limit concurrency
 	semaphore := make(chan struct{}, parallelism)
 	var wg sync.WaitGroup
@@ -1062,6 +1160,19 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 				}
 			}()

+			// Check for context cancellation before starting
+			if ctx.Err() != nil {
+				e.log.Warn("Context cancelled - skipping database restore", "file", filename)
+				atomic.AddInt32(&failCount, 1)
+				restoreErrorsMu.Lock()
+				restoreErrors = multierror.Append(restoreErrors, fmt.Errorf("%s: restore skipped (context cancelled)", strings.TrimSuffix(strings.TrimSuffix(filename, ".dump"), ".sql.gz")))
+				restoreErrorsMu.Unlock()
+				return
+			}
+
+			// Track timing for this database restore
+			dbRestoreStart := time.Now()
+
 			// Update estimator progress (thread-safe)
 			mu.Lock()
 			estimator.UpdateProgress(idx)
@@ -1074,12 +1185,26 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {

 			dbProgress := 15 + int(float64(idx)/float64(totalDBs)*85.0)

+			// Calculate average time per DB and report progress with timing
+			completedDBTimesMu.Lock()
+			var avgPerDB time.Duration
+			if len(completedDBTimes) > 0 {
+				var totalDuration time.Duration
+				for _, d := range completedDBTimes {
+					totalDuration += d
+				}
+				avgPerDB = totalDuration / time.Duration(len(completedDBTimes))
+			}
+			phaseElapsed := time.Since(restorePhaseStart)
+			completedDBTimesMu.Unlock()
+
 			mu.Lock()
 			statusMsg := fmt.Sprintf("Restoring database %s (%d/%d)", dbName, idx+1, totalDBs)
 			e.progress.Update(statusMsg)
 			e.log.Info("Restoring database", "name", dbName, "file", dumpFile, "progress", dbProgress)
-			// Report database progress for TUI
+			// Report database progress for TUI (both callbacks)
 			e.reportDatabaseProgress(idx, totalDBs, dbName)
+			e.reportDatabaseProgressWithTiming(idx, totalDBs, dbName, phaseElapsed, avgPerDB)
 			mu.Unlock()

 			// STEP 1: Drop existing database completely (clean slate)
@@ -1144,7 +1269,27 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 				return
 			}

+			// Track completed database restore duration for ETA calculation
+			dbRestoreDuration := time.Since(dbRestoreStart)
+			completedDBTimesMu.Lock()
+			completedDBTimes = append(completedDBTimes, dbRestoreDuration)
+			completedDBTimesMu.Unlock()
+
+			// Update bytes completed for weighted progress
+			dbSize := dbSizes[dbName]
+			bytesCompletedMu.Lock()
+			bytesCompleted += dbSize
+			currentBytesCompleted := bytesCompleted
+			currentSuccessCount := int(atomic.LoadInt32(&successCount)) + 1 // +1 because we're about to increment
+			bytesCompletedMu.Unlock()
+
+			// Report weighted progress (bytes-based)
+			e.reportDatabaseProgressByBytes(currentBytesCompleted, totalBytes, dbName, currentSuccessCount, totalDBs)
+
 			atomic.AddInt32(&successCount, 1)
+
+			// Small delay to ensure PostgreSQL fully closes connections before next restore
+			time.Sleep(100 * time.Millisecond)
 		}(dbIndex, entry.Name())

 		dbIndex++
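The bytes-weighted callback exists because a count-based percentage misleads when one database dominates the cluster. The sketch below contrasts the two measures with made-up dump sizes; `weightedPercent` is a hypothetical helper, since the diff only passes the raw byte counts to the callback and leaves the percentage to the receiver.

```go
package main

import "fmt"

// weightedPercent is a hypothetical helper: completed bytes as a percentage
// of total bytes, with 0 returned when the total is unknown.
func weightedPercent(bytesDone, bytesTotal int64) float64 {
	if bytesTotal <= 0 {
		return 0
	}
	return float64(bytesDone) / float64(bytesTotal) * 100
}

func main() {
	// Illustrative dump sizes in bytes: one large database and two small ones.
	sizes := []int64{900, 50, 50}

	var total int64
	for _, s := range sizes {
		total += s
	}

	var done int64
	for i, s := range sizes {
		done += s
		countPct := float64(i+1) / float64(len(sizes)) * 100
		fmt.Printf("db %d/%d: by count %.0f%%, by bytes %.0f%%\n",
			i+1, len(sizes), countPct, weightedPercent(done, total))
	}
}
```

Dump file size is only a proxy for restore time, so the weighted figure is an estimate rather than an exact measure of remaining work.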
@@ -1156,6 +1301,35 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 	successCountFinal := int(atomic.LoadInt32(&successCount))
 	failCountFinal := int(atomic.LoadInt32(&failCount))

+	// SANITY CHECK: Verify all databases were accounted for
+	// This catches any goroutine that exited without updating counters
+	accountedFor := successCountFinal + failCountFinal
+	if accountedFor != totalDBs {
+		missingCount := totalDBs - accountedFor
+		e.log.Error("INTERNAL ERROR: Some database restore goroutines did not report status",
+			"expected", totalDBs,
+			"success", successCountFinal,
+			"failed", failCountFinal,
+			"unaccounted", missingCount)
+
+		// Treat unaccounted databases as failures
+		failCountFinal += missingCount
+		restoreErrorsMu.Lock()
+		restoreErrors = multierror.Append(restoreErrors, fmt.Errorf("%d database(s) did not complete (possible goroutine crash or deadlock)", missingCount))
+		restoreErrorsMu.Unlock()
+	}
+
+	// CRITICAL: Check if no databases were restored at all
+	if successCountFinal == 0 {
+		e.progress.Fail(fmt.Sprintf("Cluster restore FAILED: 0 of %d databases restored", totalDBs))
+		operation.Fail("No databases were restored")
+
+		if failCountFinal > 0 && restoreErrors != nil {
+			return fmt.Errorf("cluster restore failed: all %d database(s) failed:\n%s", failCountFinal, restoreErrors.Error())
+		}
+		return fmt.Errorf("cluster restore failed: no databases were restored (0 of %d total). Check PostgreSQL logs for details", totalDBs)
+	}
+
 	if failCountFinal > 0 {
 		// Format multi-error with detailed output
 		restoreErrors.ErrorFormat = func(errs []error) string {
@@ -1375,6 +1549,8 @@ func (e *Engine) extractArchiveShell(ctx context.Context, archivePath, destDir s
 }

 // restoreGlobals restores global objects (roles, tablespaces)
+// Note: psql returns 0 even when some statements fail (e.g., role already exists)
+// We track errors but only fail on FATAL errors that would prevent restore
 func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {
 	args := []string{
 		"-p", fmt.Sprintf("%d", e.cfg.Port),
@@ -1404,6 +1580,8 @@ func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {

 	// Read stderr in chunks in goroutine
 	var lastError string
+	var errorCount int
+	var fatalError bool
 	stderrDone := make(chan struct{})
 	go func() {
 		defer close(stderrDone)
@@ -1412,9 +1590,23 @@ func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {
 			n, err := stderr.Read(buf)
 			if n > 0 {
 				chunk := string(buf[:n])
-				if strings.Contains(chunk, "ERROR") || strings.Contains(chunk, "FATAL") {
+				// Track different error types
+				if strings.Contains(chunk, "FATAL") {
+					fatalError = true
 					lastError = chunk
-					e.log.Warn("Globals restore stderr", "output", chunk)
+					e.log.Error("Globals restore FATAL error", "output", chunk)
+				} else if strings.Contains(chunk, "ERROR") {
+					errorCount++
+					lastError = chunk
+					// Only log first few errors to avoid spam
+					if errorCount <= 5 {
+						// Check if it's an ignorable "already exists" error
+						if strings.Contains(chunk, "already exists") {
+							e.log.Debug("Globals restore: object already exists (expected)", "output", chunk)
+						} else {
+							e.log.Warn("Globals restore error", "output", chunk)
+						}
+					}
 				}
 			}
 			if err != nil {
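The stderr handling above sorts each chunk into one of three buckets: FATAL (fails the restore), an ERROR mentioning "already exists" (expected, logged at debug level), and any other ERROR (logged as a warning, not fatal). The decision can be sketched as a small classifier; the `severity` type and `classify` function are hypothetical and the sample messages are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// severity is a hypothetical classification of one stderr chunk.
type severity int

const (
	sevNone      severity = iota // no ERROR or FATAL marker found
	sevIgnorable                 // ERROR that mentions "already exists"
	sevError                     // any other ERROR
	sevFatal                     // FATAL always wins
)

// classify applies the same substring checks, in the same order, as the hunk
// above: FATAL first, then ERROR, then the "already exists" refinement.
func classify(chunk string) severity {
	switch {
	case strings.Contains(chunk, "FATAL"):
		return sevFatal
	case strings.Contains(chunk, "ERROR") && strings.Contains(chunk, "already exists"):
		return sevIgnorable
	case strings.Contains(chunk, "ERROR"):
		return sevError
	}
	return sevNone
}

func main() {
	fmt.Println(classify(`ERROR:  role "app" already exists`) == sevIgnorable)
	fmt.Println(classify("ERROR:  permission denied") == sevError)
	fmt.Println(classify("FATAL:  connection refused") == sevFatal)
}
```

Because stderr is read in fixed-size chunks, one chunk can hold several messages or split a message in two, so substring matching of this kind is approximate.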
@@ -1442,10 +1634,23 @@ func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {

 	<-stderrDone

+	// Only fail on actual command errors or FATAL PostgreSQL errors
+	// Regular ERROR messages (like "role already exists") are expected
 	if cmdErr != nil {
 		return fmt.Errorf("failed to restore globals: %w (last error: %s)", cmdErr, lastError)
 	}

+	// If we had FATAL errors, those are real problems
+	if fatalError {
+		return fmt.Errorf("globals restore had FATAL error: %s", lastError)
+	}
+
+	// Log summary if there were errors (but don't fail)
+	if errorCount > 0 {
+		e.log.Info("Globals restore completed with some errors (usually 'already exists' - expected)",
+			"error_count", errorCount)
+	}
+
 	return nil
 }

@@ -1513,6 +1718,7 @@ func (e *Engine) terminateConnections(ctx context.Context, dbName string) error
 }

 // dropDatabaseIfExists drops a database completely (clean slate)
+// Uses PostgreSQL 13+ WITH (FORCE) option to forcefully drop even with active connections
 func (e *Engine) dropDatabaseIfExists(ctx context.Context, dbName string) error {
 	// First terminate all connections
 	if err := e.terminateConnections(ctx, dbName); err != nil {
@@ -1522,28 +1728,69 @@ func (e *Engine) dropDatabaseIfExists(ctx context.Context, dbName string) error
 	// Wait a moment for connections to terminate
 	time.Sleep(500 * time.Millisecond)

-	// Drop the database
+	// Try to revoke new connections (prevents race condition)
+	// This only works if we have the privilege to do so
+	revokeArgs := []string{
+		"-p", fmt.Sprintf("%d", e.cfg.Port),
+		"-U", e.cfg.User,
+		"-d", "postgres",
+		"-c", fmt.Sprintf("REVOKE CONNECT ON DATABASE \"%s\" FROM PUBLIC", dbName),
+	}
+	if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
+		revokeArgs = append([]string{"-h", e.cfg.Host}, revokeArgs...)
+	}
+	revokeCmd := exec.CommandContext(ctx, "psql", revokeArgs...)
+	revokeCmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))
+	revokeCmd.Run() // Ignore errors - database might not exist
+
+	// Terminate connections again after revoking connect privilege
+	e.terminateConnections(ctx, dbName)
+	time.Sleep(200 * time.Millisecond)
+
+	// Try DROP DATABASE WITH (FORCE) first (PostgreSQL 13+)
+	// This forcefully terminates connections and drops the database atomically
+	forceArgs := []string{
+		"-p", fmt.Sprintf("%d", e.cfg.Port),
+		"-U", e.cfg.User,
+		"-d", "postgres",
+		"-c", fmt.Sprintf("DROP DATABASE IF EXISTS \"%s\" WITH (FORCE)", dbName),
+	}
+	if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
+		forceArgs = append([]string{"-h", e.cfg.Host}, forceArgs...)
+	}
+	forceCmd := exec.CommandContext(ctx, "psql", forceArgs...)
+	forceCmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))
+
+	output, err := forceCmd.CombinedOutput()
+	if err == nil {
+		e.log.Info("Dropped existing database (with FORCE)", "name", dbName)
+		return nil
+	}
+
+	// If FORCE option failed (PostgreSQL < 13), try regular drop
+	if strings.Contains(string(output), "syntax error") || strings.Contains(string(output), "WITH (FORCE)") {
+		e.log.Debug("WITH (FORCE) not supported, using standard DROP", "name", dbName)
+
 		args := []string{
 			"-p", fmt.Sprintf("%d", e.cfg.Port),
 			"-U", e.cfg.User,
 			"-d", "postgres",
 			"-c", fmt.Sprintf("DROP DATABASE IF EXISTS \"%s\"", dbName),
 		}

 		// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
 		if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
 			args = append([]string{"-h", e.cfg.Host}, args...)
 		}

 		cmd := exec.CommandContext(ctx, "psql", args...)

 		// Always set PGPASSWORD (empty string is fine for peer/ident auth)
 		cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))

-	output, err := cmd.CombinedOutput()
+		output, err = cmd.CombinedOutput()
 		if err != nil {
 			return fmt.Errorf("failed to drop database '%s': %w\nOutput: %s", dbName, err, string(output))
 		}
+	} else if err != nil {
+		return fmt.Errorf("failed to drop database '%s': %w\nOutput: %s", dbName, err, string(output))
+	}

 	e.log.Info("Dropped existing database", "name", dbName)
 	return nil
@@ -1584,12 +1831,14 @@ func (e *Engine) ensureMySQLDatabaseExists(ctx context.Context, dbName string) e
 }

 // ensurePostgresDatabaseExists checks if a PostgreSQL database exists and creates it if not
+// It attempts to extract encoding/locale from the dump file to preserve original settings
 func (e *Engine) ensurePostgresDatabaseExists(ctx context.Context, dbName string) error {
 	// Skip creation for postgres and template databases - they should already exist
 	if dbName == "postgres" || dbName == "template0" || dbName == "template1" {
 		e.log.Info("Skipping create for system database (assume exists)", "name", dbName)
 		return nil
 	}
+
 	// Build psql command with authentication
 	buildPsqlCmd := func(ctx context.Context, database, query string) *exec.Cmd {
 		args := []string{
@@ -1629,14 +1878,31 @@ func (e *Engine) ensurePostgresDatabaseExists(ctx context.Context, dbName string

	// Database doesn't exist, create it
	// IMPORTANT: Use template0 to avoid duplicate definition errors from local additions to template1
	// Also use UTF8 encoding explicitly as it's the most common and safest choice
	// See PostgreSQL docs: https://www.postgresql.org/docs/current/app-pgrestore.html#APP-PGRESTORE-NOTES
	e.log.Info("Creating database from template0", "name", dbName)
	e.log.Info("Creating database from template0 with UTF8 encoding", "name", dbName)

	// Get server's default locale for LC_COLLATE and LC_CTYPE
	// This ensures compatibility while using the correct encoding
	localeCmd := buildPsqlCmd(ctx, "postgres", "SHOW lc_collate")
	localeOutput, _ := localeCmd.CombinedOutput()
	serverLocale := strings.TrimSpace(string(localeOutput))
	if serverLocale == "" {
		serverLocale = "en_US.UTF-8" // Fallback to common default
	}

	// Build CREATE DATABASE command with encoding and locale
	// Using ENCODING 'UTF8' explicitly ensures the dump can be restored
	createSQL := fmt.Sprintf(
		"CREATE DATABASE \"%s\" WITH TEMPLATE template0 ENCODING 'UTF8' LC_COLLATE '%s' LC_CTYPE '%s'",
		dbName, serverLocale, serverLocale,
	)

	createArgs := []string{
		"-p", fmt.Sprintf("%d", e.cfg.Port),
		"-U", e.cfg.User,
		"-d", "postgres",
		"-c", fmt.Sprintf("CREATE DATABASE \"%s\" WITH TEMPLATE template0", dbName),
		"-c", createSQL,
	}

	// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
@@ -1651,10 +1917,28 @@ func (e *Engine) ensurePostgresDatabaseExists(ctx context.Context, dbName string

	output, err = createCmd.CombinedOutput()
	if err != nil {
		// Log the error and include the psql output in the returned error to aid debugging
		// If encoding/locale fails, try simpler CREATE DATABASE
		e.log.Warn("Database creation with encoding failed, trying simple create", "name", dbName, "error", err)

		simpleArgs := []string{
			"-p", fmt.Sprintf("%d", e.cfg.Port),
			"-U", e.cfg.User,
			"-d", "postgres",
			"-c", fmt.Sprintf("CREATE DATABASE \"%s\" WITH TEMPLATE template0", dbName),
		}
		if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
			simpleArgs = append([]string{"-h", e.cfg.Host}, simpleArgs...)
		}

		simpleCmd := exec.CommandContext(ctx, "psql", simpleArgs...)
		simpleCmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))

		output, err = simpleCmd.CombinedOutput()
		if err != nil {
			e.log.Warn("Database creation failed", "name", dbName, "error", err, "output", string(output))
			return fmt.Errorf("failed to create database '%s': %w (output: %s)", dbName, err, strings.TrimSpace(string(output)))
		}
	}

	e.log.Info("Successfully created database from template0", "name", dbName)
	return nil
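The two-step creation strategy in these hunks (first with explicit encoding and locale, then a plain `template0` create) can be sketched as a pair of statements built up front. This is a minimal sketch with hypothetical names (`cleanLocale`, `createStatements`). Rejecting locale output that contains `ERROR`, a quote, or a newline is an assumption of the sketch, not behavior shown in the diff; it is there because `CombinedOutput` would also carry any error text psql prints.

```go
package main

import (
	"fmt"
	"strings"
)

const fallbackLocale = "en_US.UTF-8" // same fallback value the diff uses

// cleanLocale (hypothetical helper) trims psql output and falls back to a
// default when the output is empty or does not look like a locale name.
func cleanLocale(raw string) string {
	locale := strings.TrimSpace(raw)
	if locale == "" || strings.ContainsAny(locale, "\n'") || strings.Contains(locale, "ERROR") {
		return fallbackLocale
	}
	return locale
}

// createStatements (hypothetical helper) returns the preferred CREATE DATABASE
// statement and the simpler one to try if the first is rejected.
func createStatements(dbName, rawLocale string) (withLocale, simple string) {
	locale := cleanLocale(rawLocale)
	withLocale = fmt.Sprintf(
		"CREATE DATABASE \"%s\" WITH TEMPLATE template0 ENCODING 'UTF8' LC_COLLATE '%s' LC_CTYPE '%s'",
		dbName, locale, locale)
	simple = fmt.Sprintf("CREATE DATABASE \"%s\" WITH TEMPLATE template0", dbName)
	return withLocale, simple
}

func main() {
	first, second := createStatements("app", "  de_DE.UTF-8\n")
	fmt.Println(first)
	fmt.Println(second)
}
```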
@@ -1841,9 +2125,10 @@ func (e *Engine) quickValidateSQLDump(archivePath string, compressed bool) error
	return nil
}

// boostLockCapacity temporarily increases max_locks_per_transaction to prevent OOM
// during large restores with many BLOBs. Returns the original value for later reset.
// Uses ALTER SYSTEM + pg_reload_conf() so no restart is needed.
// boostLockCapacity checks and reports on max_locks_per_transaction capacity.
// IMPORTANT: max_locks_per_transaction requires a PostgreSQL RESTART to change!
// This function now calculates total lock capacity based on max_connections and
// warns the user if capacity is insufficient for the restore.
func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
	// Connect to PostgreSQL to run system commands
	connStr := fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable",
@@ -1861,7 +2146,7 @@ func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
	}
	defer db.Close()

	// Get current value
	// Get current max_locks_per_transaction
	var currentValue int
	err = db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&currentValue)
	if err != nil {
@@ -1874,22 +2159,56 @@ func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
		fmt.Sscanf(currentValueStr, "%d", &currentValue)
	}

	// Skip if already high enough
	if currentValue >= 2048 {
		e.log.Info("max_locks_per_transaction already sufficient", "value", currentValue)
		return currentValue, nil
	// Get max_connections to calculate total lock capacity
	var maxConns int
	if err := db.QueryRowContext(ctx, "SHOW max_connections").Scan(&maxConns); err != nil {
		maxConns = 100 // default
	}

	// Boost to 2048 (enough for most BLOB-heavy databases)
	_, err = db.ExecContext(ctx, "ALTER SYSTEM SET max_locks_per_transaction = 2048")
	if err != nil {
		return currentValue, fmt.Errorf("failed to set max_locks_per_transaction: %w", err)
	// Get max_prepared_transactions
	var maxPreparedTxns int
	if err := db.QueryRowContext(ctx, "SHOW max_prepared_transactions").Scan(&maxPreparedTxns); err != nil {
		maxPreparedTxns = 0
	}

	// Reload config without restart
	_, err = db.ExecContext(ctx, "SELECT pg_reload_conf()")
	// Calculate total lock table capacity:
	// Total locks = max_locks_per_transaction × (max_connections + max_prepared_transactions)
	totalLockCapacity := currentValue * (maxConns + maxPreparedTxns)

	e.log.Info("PostgreSQL lock table capacity",
		"max_locks_per_transaction", currentValue,
		"max_connections", maxConns,
		"max_prepared_transactions", maxPreparedTxns,
		"total_lock_capacity", totalLockCapacity)

	// Minimum recommended total capacity for BLOB-heavy restores: 200,000 locks
	minRecommendedCapacity := 200000
	if totalLockCapacity < minRecommendedCapacity {
		recommendedMaxLocks := minRecommendedCapacity / (maxConns + maxPreparedTxns)
		if recommendedMaxLocks < 4096 {
			recommendedMaxLocks = 4096
		}

		e.log.Warn("Lock table capacity may be insufficient for BLOB-heavy restores",
			"current_total_capacity", totalLockCapacity,
			"recommended_capacity", minRecommendedCapacity,
			"current_max_locks", currentValue,
			"recommended_max_locks", recommendedMaxLocks,
			"note", "max_locks_per_transaction requires PostgreSQL RESTART to change")

		// Write suggested fix to ALTER SYSTEM but warn about restart
		_, err = db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", recommendedMaxLocks))
		if err != nil {
			return currentValue, fmt.Errorf("failed to reload config: %w", err)
			e.log.Warn("Could not set recommended max_locks_per_transaction (needs superuser)", "error", err)
		} else {
			e.log.Warn("Wrote recommended max_locks_per_transaction to postgresql.auto.conf",
				"value", recommendedMaxLocks,
				"action", "RESTART PostgreSQL to apply: sudo systemctl restart postgresql")
		}
	} else {
		e.log.Info("Lock table capacity is sufficient",
			"total_capacity", totalLockCapacity,
			"max_locks_per_transaction", currentValue)
	}

	return currentValue, nil
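The capacity arithmetic introduced by this hunk can be tried out on its own. The following is a minimal sketch with hypothetical function names (`totalLockCapacity`, `recommendedMaxLocks`). The constants 200000 and 4096 come from the diff, while the guard against a zero divisor is an addition of the sketch.

```go
package main

import "fmt"

const (
	minRecommendedCapacity = 200000 // total locks the diff treats as the minimum for BLOB-heavy restores
	minRecommendedMaxLocks = 4096   // floor the diff applies to the per-transaction recommendation
)

// totalLockCapacity follows the formula in the diff:
// max_locks_per_transaction × (max_connections + max_prepared_transactions).
func totalLockCapacity(maxLocks, maxConns, maxPrepared int) int {
	return maxLocks * (maxConns + maxPrepared)
}

// recommendedMaxLocks returns the per-transaction value needed to reach the
// minimum total capacity, never lower than the 4096 floor.
func recommendedMaxLocks(maxConns, maxPrepared int) int {
	slots := maxConns + maxPrepared
	if slots <= 0 {
		return minRecommendedMaxLocks // sketch-only guard against division by zero
	}
	rec := minRecommendedCapacity / slots
	if rec < minRecommendedMaxLocks {
		rec = minRecommendedMaxLocks
	}
	return rec
}

func main() {
	// PostgreSQL defaults: 64 locks per transaction, 100 connections, 0 prepared transactions.
	capacity := totalLockCapacity(64, 100, 0)
	fmt.Println("total capacity:", capacity)
	if capacity < minRecommendedCapacity {
		fmt.Println("recommended max_locks_per_transaction:", recommendedMaxLocks(100, 0))
	}
}
```

With stock settings the total is 6,400 locks, far below the 200,000 threshold, which is why the warning path is the common case on an untuned server.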
@@ -1937,6 +2256,8 @@ type OriginalSettings struct {
}

// boostPostgreSQLSettings boosts multiple PostgreSQL settings for large restores
// NOTE: max_locks_per_transaction requires a PostgreSQL RESTART to take effect!
// maintenance_work_mem can be changed with pg_reload_conf().
func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int) (*OriginalSettings, error) {
	connStr := e.buildConnString()
	db, err := sql.Open("pgx", connStr)
@@ -1956,30 +2277,156 @@ func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int
	// Get current maintenance_work_mem
	db.QueryRowContext(ctx, "SHOW maintenance_work_mem").Scan(&original.MaintenanceWorkMem)

	// Boost max_locks_per_transaction (if not already high enough)
	// CRITICAL: max_locks_per_transaction requires a PostgreSQL RESTART!
	// pg_reload_conf() is NOT sufficient for this parameter.
	needsRestart := false
	if original.MaxLocks < lockBoostValue {
		_, err = db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", lockBoostValue))
		if err != nil {
			e.log.Warn("Could not boost max_locks_per_transaction", "error", err)
			e.log.Warn("Could not set max_locks_per_transaction", "error", err)
		} else {
			needsRestart = true
			e.log.Warn("max_locks_per_transaction requires PostgreSQL restart to take effect",
				"current", original.MaxLocks,
				"target", lockBoostValue)
		}
	}

	// Boost maintenance_work_mem to 2GB for faster index creation
	// (this one CAN be applied via pg_reload_conf)
	_, err = db.ExecContext(ctx, "ALTER SYSTEM SET maintenance_work_mem = '2GB'")
	if err != nil {
		e.log.Warn("Could not boost maintenance_work_mem", "error", err)
	}

	// Reload config to apply changes (no restart needed for these settings)
	// Reload config to apply maintenance_work_mem
	_, err = db.ExecContext(ctx, "SELECT pg_reload_conf()")
	if err != nil {
		return original, fmt.Errorf("failed to reload config: %w", err)
	}

	// If max_locks_per_transaction needs a restart, try to do it
	if needsRestart {
		if restarted := e.tryRestartPostgreSQL(ctx); restarted {
			e.log.Info("PostgreSQL restarted successfully - max_locks_per_transaction now active")
			// Wait for PostgreSQL to be ready
			time.Sleep(3 * time.Second)
		} else {
			// Cannot restart - warn user but continue
			// The setting is written to postgresql.auto.conf and will take effect on next restart
			e.log.Warn("=" + strings.Repeat("=", 70))
			e.log.Warn("NOTE: max_locks_per_transaction change requires PostgreSQL restart")
			e.log.Warn("Current value: " + strconv.Itoa(original.MaxLocks) + ", target: " + strconv.Itoa(lockBoostValue))
			e.log.Warn("")
			e.log.Warn("The setting has been saved to postgresql.auto.conf and will take")
			e.log.Warn("effect on the next PostgreSQL restart. If restore fails with")
			e.log.Warn("'out of shared memory' errors, ask your DBA to restart PostgreSQL.")
			e.log.Warn("")
			e.log.Warn("Continuing with restore - this may succeed if your databases")
			e.log.Warn("don't have many large objects (BLOBs).")
			e.log.Warn("=" + strings.Repeat("=", 70))
			// Continue anyway - might work for small restores or DBs without BLOBs
		}
	}

	return original, nil
}

// canRestartPostgreSQL checks if we have the ability to restart PostgreSQL
// Returns false if running in a restricted environment (e.g., su postgres on enterprise systems)
func (e *Engine) canRestartPostgreSQL() bool {
	// Check if we're running as postgres user - if so, we likely can't restart
	// because PostgreSQL is managed by init/systemd, not directly by pg_ctl
	currentUser := os.Getenv("USER")
	if currentUser == "" {
		currentUser = os.Getenv("LOGNAME")
	}

	// If we're the postgres user, check if we have sudo access
	if currentUser == "postgres" {
		// Try a quick sudo check - if this fails, we can't restart
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		defer cancel()
		cmd := exec.CommandContext(ctx, "sudo", "-n", "true")
		cmd.Stdin = nil
		if err := cmd.Run(); err != nil {
			e.log.Info("Running as postgres user without sudo access - cannot restart PostgreSQL",
				"user", currentUser,
				"hint", "Ask system administrator to restart PostgreSQL if needed")
			return false
		}
	}

	return true
}

// tryRestartPostgreSQL attempts to restart PostgreSQL using various methods
// Returns true if restart was successful
// IMPORTANT: Uses short timeouts and non-interactive sudo to avoid blocking on password prompts
// NOTE: This function will return false immediately if running as postgres without sudo
func (e *Engine) tryRestartPostgreSQL(ctx context.Context) bool {
	// First check if we can even attempt a restart
	if !e.canRestartPostgreSQL() {
		e.log.Info("Skipping PostgreSQL restart attempt (no privileges)")
		return false
	}

	e.progress.Update("Attempting PostgreSQL restart for lock settings...")

	// Use short timeout for each restart attempt (don't block on sudo password prompts)
	runWithTimeout := func(args ...string) bool {
		cmdCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		defer cancel()
		cmd := exec.CommandContext(cmdCtx, args[0], args[1:]...)
		// Set stdin to /dev/null to prevent sudo from waiting for password
		cmd.Stdin = nil
		return cmd.Run() == nil
	}

	// Method 1: systemctl (most common on modern Linux) - use sudo -n for non-interactive
	if runWithTimeout("sudo", "-n", "systemctl", "restart", "postgresql") {
		return true
	}

	// Method 2: systemctl with version suffix (e.g., postgresql-15)
	for _, ver := range []string{"17", "16", "15", "14", "13", "12"} {
		if runWithTimeout("sudo", "-n", "systemctl", "restart", "postgresql-"+ver) {
			return true
		}
	}

	// Method 3: service command (older systems)
	if runWithTimeout("sudo", "-n", "service", "postgresql", "restart") {
		return true
	}

	// Method 4: pg_ctl as postgres user (if we ARE postgres user, no sudo needed)
	if runWithTimeout("pg_ctl", "restart", "-D", "/var/lib/postgresql/data", "-m", "fast") {
		return true
	}

	// Method 5: Try common PGDATA paths with pg_ctl directly (for postgres user)
	pgdataPaths := []string{
		"/var/lib/pgsql/data",
		"/var/lib/pgsql/17/data",
		"/var/lib/pgsql/16/data",
		"/var/lib/pgsql/15/data",
		"/var/lib/postgresql/17/main",
		"/var/lib/postgresql/16/main",
		"/var/lib/postgresql/15/main",
	}
	for _, pgdata := range pgdataPaths {
		if runWithTimeout("pg_ctl", "restart", "-D", pgdata, "-m", "fast") {
			return true
		}
	}

	return false
}

// resetPostgreSQLSettings restores original PostgreSQL settings
// NOTE: max_locks_per_transaction changes are written but require restart to take effect.
// We don't restart here since we're done with the restore.
func (e *Engine) resetPostgreSQLSettings(ctx context.Context, original *OriginalSettings) error {
	connStr := e.buildConnString()
	db, err := sql.Open("pgx", connStr)
@@ -1988,25 +2435,28 @@ func (e *Engine) resetPostgreSQLSettings(ctx context.Context, original *Original
	}
	defer db.Close()

	// Reset max_locks_per_transaction
	// Reset max_locks_per_transaction (will take effect on next restart)
	if original.MaxLocks == 64 { // Default
		db.ExecContext(ctx, "ALTER SYSTEM RESET max_locks_per_transaction")
	} else if original.MaxLocks > 0 {
		db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", original.MaxLocks))
	}

	// Reset maintenance_work_mem
	// Reset maintenance_work_mem (takes effect immediately with reload)
	if original.MaintenanceWorkMem == "64MB" { // Default
		db.ExecContext(ctx, "ALTER SYSTEM RESET maintenance_work_mem")
	} else if original.MaintenanceWorkMem != "" {
		db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET maintenance_work_mem = '%s'", original.MaintenanceWorkMem))
	}

	// Reload config
	// Reload config (only maintenance_work_mem will take effect immediately)
	_, err = db.ExecContext(ctx, "SELECT pg_reload_conf()")
	if err != nil {
		return fmt.Errorf("failed to reload config: %w", err)
	}

	e.log.Info("PostgreSQL settings reset queued",
		"note", "max_locks_per_transaction will revert on next PostgreSQL restart")

	return nil
}
||||
@@ -16,6 +16,57 @@ import (
	"github.com/shirou/gopsutil/v3/mem"
)

// CalculateOptimalParallel returns the recommended number of parallel workers
// based on available system resources (CPU cores and RAM).
// This is a standalone function that can be called from anywhere.
// Returns 0 if resources cannot be detected.
func CalculateOptimalParallel() int {
	cpuCores := runtime.NumCPU()

	vmem, err := mem.VirtualMemory()
	if err != nil {
		// Fallback: use half of CPU cores if memory detection fails
		if cpuCores > 1 {
			return cpuCores / 2
		}
		return 1
	}

	memAvailableGB := float64(vmem.Available) / (1024 * 1024 * 1024)

	// Each pg_restore worker needs approximately 2-4GB of RAM
	// Use conservative 3GB per worker to avoid OOM
	const memPerWorkerGB = 3.0

	// Calculate limits
	maxByMem := int(memAvailableGB / memPerWorkerGB)
	maxByCPU := cpuCores

	// Use the minimum of memory and CPU limits
	recommended := maxByMem
	if maxByCPU < recommended {
		recommended = maxByCPU
	}

	// Apply sensible bounds
	if recommended < 1 {
		recommended = 1
	}
	if recommended > 16 {
		recommended = 16 // Cap at 16 to avoid diminishing returns
	}

	// If memory pressure is high (>80%), reduce parallelism
	if vmem.UsedPercent > 80 && recommended > 1 {
		recommended = recommended / 2
		if recommended < 1 {
			recommended = 1
		}
	}

	return recommended
}

// PreflightResult contains all preflight check results
type PreflightResult struct {
	// Linux system checks
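The sizing rule in `CalculateOptimalParallel` can be expressed as a pure function, which makes it easy to check against known inputs without querying the host. This is a minimal sketch; `recommendParallel` is a hypothetical name, and its inputs stand in for the values the real function reads from `runtime.NumCPU()` and gopsutil.

```go
package main

import "fmt"

// recommendParallel applies the same rule as the diff: one worker per 3 GB of
// available RAM, limited by CPU cores, clamped to the range 1..16, and halved
// when memory usage is above 80%.
func recommendParallel(cpuCores int, memAvailableGB, memUsedPercent float64) int {
	const memPerWorkerGB = 3.0
	const maxWorkers = 16

	recommended := int(memAvailableGB / memPerWorkerGB)
	if cpuCores < recommended {
		recommended = cpuCores
	}
	if recommended < 1 {
		recommended = 1
	}
	if recommended > maxWorkers {
		recommended = maxWorkers
	}
	// Halving a value greater than 1 can never drop below 1, so no second clamp is needed.
	if memUsedPercent > 80 && recommended > 1 {
		recommended /= 2
	}
	return recommended
}

func main() {
	fmt.Println(recommendParallel(8, 12, 50))   // memory-bound: 12 GB / 3 GB per worker
	fmt.Println(recommendParallel(32, 128, 50)) // CPU-bound, then capped
	fmt.Println(recommendParallel(8, 12, 90))   // halved under memory pressure
}
```

Both `CalculateOptimalParallel` and the method `calculateRecommendedParallel` elsewhere in this diff repeat this logic, so a shared helper of this shape could remove the duplication.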
@@ -40,6 +91,8 @@ type LinuxChecks struct {
	MemTotal uint64 // Total RAM in bytes
	MemAvailable uint64 // Available RAM in bytes
	MemUsedPercent float64 // Memory usage percentage
	CPUCores int // Number of CPU cores
	RecommendedParallel int // Auto-calculated optimal parallel count
	ShmMaxOK bool // Is shmmax sufficient?
	ShmAllOK bool // Is shmall sufficient?
	MemAvailableOK bool // Is available RAM sufficient?
@@ -49,6 +102,8 @@ type LinuxChecks struct {
// PostgreSQLChecks contains PostgreSQL configuration checks
type PostgreSQLChecks struct {
	MaxLocksPerTransaction int // Current setting
	MaxPreparedTransactions int // Current setting (affects lock capacity)
	TotalLockCapacity int // Calculated: max_locks × (max_connections + max_prepared)
	MaintenanceWorkMem string // Current setting
	SharedBuffers string // Current setting (info only)
	MaxConnections int // Current setting
@@ -98,6 +153,7 @@ func (e *Engine) RunPreflightChecks(ctx context.Context, dumpsDir string, entrie
// checkSystemResources uses gopsutil for cross-platform system checks
func (e *Engine) checkSystemResources(result *PreflightResult) {
	result.Linux.IsLinux = runtime.GOOS == "linux"
	result.Linux.CPUCores = runtime.NumCPU()

	// Get memory info (works on Linux, macOS, Windows, BSD)
	if vmem, err := mem.VirtualMemory(); err == nil {
@@ -116,6 +172,9 @@ func (e *Engine) checkSystemResources(result *PreflightResult) {
		e.log.Warn("Could not detect system memory", "error", err)
	}

	// Calculate recommended parallel based on resources
	result.Linux.RecommendedParallel = e.calculateRecommendedParallel(result)

	// Linux-specific kernel checks (shmmax, shmall)
	if result.Linux.IsLinux {
		e.checkLinuxKernel(result)
@@ -201,10 +260,70 @@ func (e *Engine) checkPostgreSQL(ctx context.Context, result *PreflightResult) {
		result.PostgreSQL.IsSuperuser = isSuperuser
	}

	// Add info/warnings
	// Check max_prepared_transactions for lock capacity calculation
	var maxPreparedTxns string
	if err := db.QueryRowContext(ctx, "SHOW max_prepared_transactions").Scan(&maxPreparedTxns); err == nil {
		result.PostgreSQL.MaxPreparedTransactions, _ = strconv.Atoi(maxPreparedTxns)
	}

	// CRITICAL: Calculate TOTAL lock table capacity
	// Formula: max_locks_per_transaction × (max_connections + max_prepared_transactions)
	// This is THE key capacity metric for BLOB-heavy restores
	maxConns := result.PostgreSQL.MaxConnections
	if maxConns == 0 {
		maxConns = 100 // default
	}
	maxPrepared := result.PostgreSQL.MaxPreparedTransactions
	totalLockCapacity := result.PostgreSQL.MaxLocksPerTransaction * (maxConns + maxPrepared)
	result.PostgreSQL.TotalLockCapacity = totalLockCapacity

	e.log.Info("PostgreSQL lock table capacity",
		"max_locks_per_transaction", result.PostgreSQL.MaxLocksPerTransaction,
		"max_connections", maxConns,
		"max_prepared_transactions", maxPrepared,
		"total_lock_capacity", totalLockCapacity)

	// CRITICAL: max_locks_per_transaction requires PostgreSQL RESTART to change!
	// Warn users loudly about this - it's the #1 cause of "out of shared memory" errors
	if result.PostgreSQL.MaxLocksPerTransaction < 256 {
		e.log.Info("PostgreSQL max_locks_per_transaction is low - will auto-boost",
			"current", result.PostgreSQL.MaxLocksPerTransaction)
		e.log.Warn("PostgreSQL max_locks_per_transaction is LOW",
			"current", result.PostgreSQL.MaxLocksPerTransaction,
			"recommended", "256+",
			"note", "REQUIRES PostgreSQL restart to change!")

		result.Warnings = append(result.Warnings,
			fmt.Sprintf("max_locks_per_transaction=%d is low (recommend 256+). "+
				"This setting requires PostgreSQL RESTART to change. "+
				"BLOB-heavy databases may fail with 'out of shared memory' error. "+
				"Fix: Edit postgresql.conf, set max_locks_per_transaction=2048, then restart PostgreSQL.",
				result.PostgreSQL.MaxLocksPerTransaction))
	}

	// NEW: Check total lock capacity is sufficient for typical BLOB operations
	// Minimum recommended: 200,000 for moderate BLOB databases
	minRecommendedCapacity := 200000
	if totalLockCapacity < minRecommendedCapacity {
		recommendedMaxLocks := minRecommendedCapacity / (maxConns + maxPrepared)
		if recommendedMaxLocks < 4096 {
			recommendedMaxLocks = 4096
		}

		e.log.Warn("Total lock table capacity is LOW for BLOB-heavy restores",
			"current_capacity", totalLockCapacity,
			"recommended", minRecommendedCapacity,
			"current_max_locks", result.PostgreSQL.MaxLocksPerTransaction,
			"current_max_connections", maxConns,
			"recommended_max_locks", recommendedMaxLocks,
			"note", "VMs with fewer connections need higher max_locks_per_transaction")

		result.Warnings = append(result.Warnings,
			fmt.Sprintf("Total lock capacity=%d is low (recommend %d+). "+
				"Capacity = max_locks_per_transaction(%d) × max_connections(%d). "+
				"If you reduced VM size/connections, increase max_locks_per_transaction to %d. "+
				"Fix: ALTER SYSTEM SET max_locks_per_transaction = %d; then RESTART PostgreSQL.",
				totalLockCapacity, minRecommendedCapacity,
				result.PostgreSQL.MaxLocksPerTransaction, maxConns,
				recommendedMaxLocks, recommendedMaxLocks))
	}

	// Parse shared_buffers and warn if very low
@@ -315,20 +434,113 @@ func (e *Engine) calculateRecommendations(result *PreflightResult) {
	if result.Archive.TotalBlobCount > 50000 {
		lockBoost = 16384
	}
	if result.Archive.TotalBlobCount > 100000 {
		lockBoost = 32768
	}
	if result.Archive.TotalBlobCount > 200000 {
		lockBoost = 65536
	}

	// Cap at reasonable maximum
	if lockBoost > 16384 {
		lockBoost = 16384
	// For extreme cases, calculate actual requirement
	// Rule of thumb: ~1 lock per BLOB, divided by max_connections (default 100)
	// Add 50% safety margin
	maxConns := result.PostgreSQL.MaxConnections
	if maxConns == 0 {
		maxConns = 100 // default
	}
	calculatedLocks := (result.Archive.TotalBlobCount / maxConns) * 3 / 2 // 1.5x safety margin
	if calculatedLocks > lockBoost {
		lockBoost = calculatedLocks
	}

	result.Archive.RecommendedLockBoost = lockBoost

	// CRITICAL: Check if current max_locks_per_transaction is dangerously low for this BLOB count
	currentLocks := result.PostgreSQL.MaxLocksPerTransaction
	if currentLocks > 0 && result.Archive.TotalBlobCount > 0 {
		// Estimate max BLOBs we can handle: locks * max_connections
		maxSafeBLOBs := currentLocks * maxConns

		if result.Archive.TotalBlobCount > maxSafeBLOBs {
			severity := "WARNING"
			if result.Archive.TotalBlobCount > maxSafeBLOBs*2 {
				severity = "CRITICAL"
				result.CanProceed = false
			}

			e.log.Error(fmt.Sprintf("%s: max_locks_per_transaction too low for BLOB count", severity),
				"current_max_locks", currentLocks,
				"total_blobs", result.Archive.TotalBlobCount,
				"max_safe_blobs", maxSafeBLOBs,
				"recommended_max_locks", lockBoost)

			result.Errors = append(result.Errors,
				fmt.Sprintf("%s: Archive contains %s BLOBs but max_locks_per_transaction=%d can only safely handle ~%s. "+
					"Increase max_locks_per_transaction to %d in postgresql.conf and RESTART PostgreSQL.",
					severity,
					humanize.Comma(int64(result.Archive.TotalBlobCount)),
					currentLocks,
					humanize.Comma(int64(maxSafeBLOBs)),
					lockBoost))
		}
	}

	// Log recommendation
	e.log.Info("Calculated recommended lock boost",
		"total_blobs", result.Archive.TotalBlobCount,
		"recommended_locks", lockBoost)
}

// calculateRecommendedParallel determines optimal parallelism based on system resources
// Returns the recommended number of parallel workers for pg_restore
func (e *Engine) calculateRecommendedParallel(result *PreflightResult) int {
	cpuCores := result.Linux.CPUCores
	if cpuCores == 0 {
		cpuCores = runtime.NumCPU()
	}

	memAvailableGB := float64(result.Linux.MemAvailable) / (1024 * 1024 * 1024)

	// Each pg_restore worker needs approximately 2-4GB of RAM
	// Use conservative 3GB per worker to avoid OOM
	const memPerWorkerGB = 3.0

	// Calculate limits
	maxByMem := int(memAvailableGB / memPerWorkerGB)
	maxByCPU := cpuCores

	// Use the minimum of memory and CPU limits
	recommended := maxByMem
	if maxByCPU < recommended {
		recommended = maxByCPU
	}

	// Apply sensible bounds
	if recommended < 1 {
		recommended = 1
	}
	if recommended > 16 {
		recommended = 16 // Cap at 16 to avoid diminishing returns
	}

	// If memory pressure is high (>80%), reduce parallelism
	if result.Linux.MemUsedPercent > 80 && recommended > 1 {
		recommended = recommended / 2
		if recommended < 1 {
			recommended = 1
		}
	}

	e.log.Info("Calculated recommended parallel",
		"cpu_cores", cpuCores,
		"mem_available_gb", fmt.Sprintf("%.1f", memAvailableGB),
		"max_by_mem", maxByMem,
		"max_by_cpu", maxByCPU,
		"recommended", recommended)

	return recommended
}

// printPreflightSummary prints a nice summary of all checks
func (e *Engine) printPreflightSummary(result *PreflightResult) {
	fmt.Println()
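The BLOB-count rules in `calculateRecommendations` reduce to two small calculations: the lock value to recommend and the severity of the current setting. The following is a minimal sketch with hypothetical function names (`lockBoostFor`, `blobSeverity`). The tier value passed in stands for whatever the threshold ladder at the top of the hunk selected.

```go
package main

import "fmt"

// lockBoostFor returns the larger of the tier-based boost and the calculated
// requirement: roughly one lock per BLOB spread over max_connections, with a
// 1.5x safety margin, as in the diff.
func lockBoostFor(tierBoost, totalBlobs, maxConns int) int {
	if maxConns == 0 {
		maxConns = 100 // default
	}
	calculated := (totalBlobs / maxConns) * 3 / 2
	if calculated > tierBoost {
		return calculated
	}
	return tierBoost
}

// blobSeverity classifies the current setting against the BLOB count.
// It returns "" when the archive fits, "WARNING" when it exceeds the estimated
// safe count, and "CRITICAL" when it exceeds twice that count.
func blobSeverity(totalBlobs, currentLocks, maxConns int) string {
	if currentLocks <= 0 || totalBlobs <= 0 {
		return ""
	}
	if maxConns == 0 {
		maxConns = 100 // default
	}
	maxSafe := currentLocks * maxConns
	switch {
	case totalBlobs > maxSafe*2:
		return "CRITICAL"
	case totalBlobs > maxSafe:
		return "WARNING"
	default:
		return ""
	}
}

func main() {
	fmt.Println(lockBoostFor(65536, 10000000, 100))
	fmt.Println(blobSeverity(20000, 64, 100))
}
```

In the diff only the CRITICAL case sets `CanProceed` to false, so a WARNING result still lets the restore start.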
@@ -341,6 +553,8 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
	printCheck("Total RAM", humanize.Bytes(result.Linux.MemTotal), true)
	printCheck("Available RAM", humanize.Bytes(result.Linux.MemAvailable), result.Linux.MemAvailableOK || result.Linux.MemAvailable == 0)
	printCheck("Memory Usage", fmt.Sprintf("%.1f%%", result.Linux.MemUsedPercent), result.Linux.MemUsedPercent < 85)
	printCheck("CPU Cores", fmt.Sprintf("%d", result.Linux.CPUCores), true)
	printCheck("Recommended Parallel", fmt.Sprintf("%d (auto-calculated)", result.Linux.RecommendedParallel), true)

	// Linux-specific kernel checks
	if result.Linux.IsLinux && result.Linux.ShmMax > 0 {
@@ -356,6 +570,13 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
			humanize.Comma(int64(result.PostgreSQL.MaxLocksPerTransaction)),
			humanize.Comma(int64(result.Archive.RecommendedLockBoost))),
		true)
	printCheck("max_connections", humanize.Comma(int64(result.PostgreSQL.MaxConnections)), true)
	// Show total lock capacity with warning if low
	totalCapacityOK := result.PostgreSQL.TotalLockCapacity >= 200000
	printCheck("Total Lock Capacity",
		fmt.Sprintf("%s (max_locks × max_conns)",
			humanize.Comma(int64(result.PostgreSQL.TotalLockCapacity))),
		totalCapacityOK)
	printCheck("maintenance_work_mem", fmt.Sprintf("%s → 2GB (auto-boost)",
		result.PostgreSQL.MaintenanceWorkMem), true)
	printInfo("shared_buffers", result.PostgreSQL.SharedBuffers)
@@ -377,6 +598,14 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
		}
	}

	// Errors (blocking issues)
	if len(result.Errors) > 0 {
		fmt.Println("\n ✗ ERRORS (must fix before proceeding):")
		for _, e := range result.Errors {
			fmt.Printf(" • %s\n", e)
		}
	}

	// Warnings
	if len(result.Warnings) > 0 {
		fmt.Println("\n ⚠ Warnings:")
@@ -385,6 +614,23 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
		}
	}

	// Final status
	fmt.Println()
	if !result.CanProceed {
		fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
		fmt.Println(" │ ✗ PREFLIGHT FAILED - Cannot proceed with restore │")
		fmt.Println(" │ Fix the errors above and try again. │")
		fmt.Println(" └─────────────────────────────────────────────────────────┘")
	} else if len(result.Warnings) > 0 {
		fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
		fmt.Println(" │ ⚠ PREFLIGHT PASSED WITH WARNINGS - Proceed with care │")
		fmt.Println(" └─────────────────────────────────────────────────────────┘")
	} else {
		fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
		fmt.Println(" │ ✓ PREFLIGHT PASSED - Ready to restore │")
		fmt.Println(" └─────────────────────────────────────────────────────────┘")
	}

	fmt.Println(strings.Repeat("─", 60))
	fmt.Println()
}
@@ -334,10 +334,12 @@ func (s *Safety) checkPostgresDatabaseExists(ctx context.Context, dbName string)
 		"-tAc", fmt.Sprintf("SELECT 1 FROM pg_database WHERE datname='%s'", dbName),
 	}
 
-	// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
-	if s.cfg.Host != "localhost" && s.cfg.Host != "127.0.0.1" && s.cfg.Host != "" {
-		args = append([]string{"-h", s.cfg.Host}, args...)
+	// Always add -h flag for explicit host connection (required for password auth)
+	host := s.cfg.Host
+	if host == "" {
+		host = "localhost"
 	}
+	args = append([]string{"-h", host}, args...)
 
 	cmd := exec.CommandContext(ctx, "psql", args...)
 
@@ -346,9 +348,9 @@ func (s *Safety) checkPostgresDatabaseExists(ctx context.Context, dbName string)
 		cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
 	}
 
-	output, err := cmd.Output()
+	output, err := cmd.CombinedOutput()
 	if err != nil {
-		return false, fmt.Errorf("failed to check database existence: %w", err)
+		return false, fmt.Errorf("failed to check database existence: %w (output: %s)", err, strings.TrimSpace(string(output)))
 	}
 
 	return strings.TrimSpace(string(output)) == "1", nil
@@ -405,21 +407,29 @@ func (s *Safety) listPostgresUserDatabases(ctx context.Context) ([]string, error
 		"-c", query,
 	}
 
-	// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
-	if s.cfg.Host != "localhost" && s.cfg.Host != "127.0.0.1" && s.cfg.Host != "" {
-		args = append([]string{"-h", s.cfg.Host}, args...)
+	// Always add -h flag for explicit host connection (required for password auth)
+	// Empty or unset host defaults to localhost
+	host := s.cfg.Host
+	if host == "" {
+		host = "localhost"
 	}
+	args = append([]string{"-h", host}, args...)
 
 	cmd := exec.CommandContext(ctx, "psql", args...)
 
-	// Set password if provided
+	// Set password - check config first, then environment
+	env := os.Environ()
 	if s.cfg.Password != "" {
-		cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
+		env = append(env, fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
 	}
+	cmd.Env = env
 
-	output, err := cmd.Output()
+	s.log.Debug("Listing PostgreSQL databases", "host", host, "port", s.cfg.Port, "user", s.cfg.User)
+
+	output, err := cmd.CombinedOutput()
 	if err != nil {
-		return nil, fmt.Errorf("failed to list databases: %w", err)
+		// Include psql output in error for debugging
+		return nil, fmt.Errorf("failed to list databases: %w (output: %s)", err, strings.TrimSpace(string(output)))
 	}
 
 	// Parse output
@@ -432,6 +442,8 @@ func (s *Safety) listPostgresUserDatabases(ctx context.Context) ([]string, error
|
||||
}
|
||||
}
|
||||
|
||||
s.log.Debug("Found user databases", "count", len(databases), "databases", databases, "raw_output", string(output))
|
||||
|
||||
return databases, nil
|
||||
}
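The host handling and error wrapping introduced in the two `Safety` functions above can be sketched in isolation. This is a minimal sketch, not code from dbbackup: `psqlArgs` and `wrapPsqlError` are hypothetical helper names used only for illustration, and the sample host and error text are made up.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// psqlArgs is a hypothetical helper (not part of dbbackup). It mirrors the
// host handling in the diff: -h is always passed, and an empty host falls
// back to "localhost".
func psqlArgs(host string, rest ...string) []string {
	if host == "" {
		host = "localhost"
	}
	return append([]string{"-h", host}, rest...)
}

// wrapPsqlError is a hypothetical helper that attaches the trimmed command
// output to the error, as the diff does after cmd.CombinedOutput().
func wrapPsqlError(action string, err error, output []byte) error {
	return fmt.Errorf("failed to %s: %w (output: %s)", action, err, strings.TrimSpace(string(output)))
}

func main() {
	// Empty host: the -h flag is still emitted, with the localhost default.
	fmt.Println(psqlArgs("", "-tAc", "SELECT 1"))
	// Explicit host: passed through unchanged.
	fmt.Println(psqlArgs("db.example.internal", "-tAc", "SELECT 1"))

	// Illustrative error and output only; real psql messages will differ.
	err := wrapPsqlError("list databases", errors.New("exit status 2"), []byte("connection refused\n"))
	fmt.Println(err)
}
```

One thing to keep in mind with `CombinedOutput`: stdout and stderr arrive in the same byte slice, so any caller that compares the output against an exact value sees stderr text as well.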
|
||||
|
||||
|
||||
@@ -39,6 +39,8 @@ type BackupExecutionModel struct {
 	dbTotal int
 	dbDone int
 	dbName string // Current database being backed up
+	overallPhase int // 1=globals, 2=databases, 3=compressing
+	phaseDesc string // Description of current phase
 }
 
 // sharedBackupProgressState holds progress state that can be safely accessed from callbacks
@@ -47,6 +49,8 @@ type sharedBackupProgressState struct {
 	dbTotal int
 	dbDone int
 	dbName string
+	overallPhase int // 1=globals, 2=databases, 3=compressing
+	phaseDesc string // Description of current phase
 	hasUpdate bool
 }
 
@@ -68,12 +72,12 @@ func clearCurrentBackupProgress() {
 	currentBackupProgressState = nil
 }
 
-func getCurrentBackupProgress() (dbTotal, dbDone int, dbName string, hasUpdate bool) {
+func getCurrentBackupProgress() (dbTotal, dbDone int, dbName string, overallPhase int, phaseDesc string, hasUpdate bool) {
 	currentBackupProgressMu.Lock()
 	defer currentBackupProgressMu.Unlock()
 
 	if currentBackupProgressState == nil {
-		return 0, 0, "", false
+		return 0, 0, "", 0, "", false
 	}
 
 	currentBackupProgressState.mu.Lock()
@@ -83,7 +87,8 @@ func getCurrentBackupProgress() (dbTotal, dbDone int, dbName string, hasUpdate b
|
||||
currentBackupProgressState.hasUpdate = false
|
||||
|
||||
return currentBackupProgressState.dbTotal, currentBackupProgressState.dbDone,
|
||||
currentBackupProgressState.dbName, hasUpdate
|
||||
currentBackupProgressState.dbName, currentBackupProgressState.overallPhase,
|
||||
currentBackupProgressState.phaseDesc, hasUpdate
|
||||
}
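The getter above follows a poll-and-clear pattern: the callback writes under a mutex and sets `hasUpdate`, and the TUI tick reads the values and resets the flag in the same critical section. A minimal sketch of that pattern follows, assuming a simplified stand-in type; `progressState`, `report`, and `poll` are hypothetical names, not identifiers from dbbackup.

```go
package main

import (
	"fmt"
	"sync"
)

// progressState is a simplified, hypothetical stand-in for
// sharedBackupProgressState.
type progressState struct {
	mu        sync.Mutex
	dbTotal   int
	dbDone    int
	dbName    string
	hasUpdate bool
}

// report is what a progress callback would call from the worker goroutine.
func (p *progressState) report(done, total int, name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.dbDone, p.dbTotal, p.dbName = done, total, name
	p.hasUpdate = true
}

// poll returns the latest values and clears hasUpdate in the same critical
// section, so each update is reported as fresh exactly once.
func (p *progressState) poll() (total, done int, name string, hasUpdate bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	hasUpdate = p.hasUpdate
	p.hasUpdate = false
	return p.dbTotal, p.dbDone, p.dbName, hasUpdate
}

func main() {
	p := &progressState{}

	// Simulate a worker goroutine reporting progress.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		p.report(3, 10, "orders")
	}()
	wg.Wait()

	// First poll sees the update; the second one does not.
	total, done, name, fresh := p.poll()
	fmt.Println(total, done, name, fresh)
	_, _, _, fresh = p.poll()
	fmt.Println(fresh)
}
```

Because the flag is cleared on read, this design assumes a single consumer (here, the tick handler); a second reader would miss updates.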
|
||||
|
||||
func NewBackupExecution(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, backupType, dbName string, ratio int) BackupExecutionModel {
|
||||
@@ -171,6 +176,8 @@ func executeBackupWithTUIProgress(parentCtx context.Context, cfg *config.Config,
 		progressState.dbDone = done
 		progressState.dbTotal = total
 		progressState.dbName = currentDB
+		progressState.overallPhase = 2 // Phase 2: Backing up databases
+		progressState.phaseDesc = fmt.Sprintf("Phase 2/3: Databases (%d/%d)", done, total)
 		progressState.hasUpdate = true
 		progressState.mu.Unlock()
 	})
@@ -223,11 +230,13 @@ func (m BackupExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 			m.spinnerFrame = (m.spinnerFrame + 1) % len(spinnerFrames)
 
 			// Poll for database progress updates from callbacks
-			dbTotal, dbDone, dbName, hasUpdate := getCurrentBackupProgress()
+			dbTotal, dbDone, dbName, overallPhase, phaseDesc, hasUpdate := getCurrentBackupProgress()
 			if hasUpdate {
 				m.dbTotal = dbTotal
 				m.dbDone = dbDone
 				m.dbName = dbName
+				m.overallPhase = overallPhase
+				m.phaseDesc = phaseDesc
 			}
 
 			// Update status based on progress and elapsed time
@@ -286,6 +295,20 @@ func (m BackupExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 		}
 		return m, nil
 
+	case tea.InterruptMsg:
+		// Handle Ctrl+C signal (SIGINT) - Bubbletea v1.3+ sends this instead of KeyMsg for ctrl+c
+		if !m.done && !m.cancelling {
+			m.cancelling = true
+			m.status = "[STOP] Cancelling backup... (please wait)"
+			if m.cancel != nil {
+				m.cancel()
+			}
+			return m, nil
+		} else if m.done {
+			return m.parent, tea.Quit
+		}
+		return m, nil
+
 	case tea.KeyMsg:
 		switch msg.String() {
 		case "ctrl+c", "esc":
@@ -361,41 +384,147 @@ func (m BackupExecutionModel) View() string {
|
||||
|
||||
// Status display
|
||||
if !m.done {
|
||||
// Show database progress bar if we have progress data (cluster backup)
|
||||
// Unified progress display for cluster backup
|
||||
if m.backupType == "cluster" {
|
||||
// Calculate overall progress across all phases
|
||||
// Phase 1: Globals (0-15%)
|
||||
// Phase 2: Databases (15-90%)
|
||||
// Phase 3: Compressing (90-100%)
|
||||
overallProgress := 0
|
||||
phaseLabel := "Starting..."
|
||||
|
||||
elapsedSec := int(time.Since(m.startTime).Seconds())
|
||||
|
||||
if m.overallPhase == 2 && m.dbTotal > 0 {
|
||||
// Phase 2: Database backups - contributes 15-90%
|
||||
dbPct := int((int64(m.dbDone) * 100) / int64(m.dbTotal))
|
||||
overallProgress = 15 + (dbPct * 75 / 100)
|
||||
phaseLabel = m.phaseDesc
|
||||
} else if elapsedSec < 5 {
|
||||
// Initial setup
|
||||
overallProgress = 2
|
||||
phaseLabel = "Phase 1/3: Initializing..."
|
||||
} else if m.dbTotal == 0 {
|
||||
// Phase 1: Globals backup (before databases start)
|
||||
overallProgress = 10
|
||||
phaseLabel = "Phase 1/3: Backing up Globals"
|
||||
}
|
||||
|
||||
// Header with phase and overall progress
|
||||
s.WriteString(infoStyle.Render(" ─── Cluster Backup Progress ──────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(fmt.Sprintf(" %s\n\n", phaseLabel))
|
||||
|
||||
// Overall progress bar
|
||||
s.WriteString(" Overall: ")
|
||||
s.WriteString(renderProgressBar(overallProgress))
|
||||
s.WriteString(fmt.Sprintf(" %d%%\n", overallProgress))
|
||||
|
||||
// Phase-specific details
|
||||
if m.dbTotal > 0 && m.dbDone > 0 {
|
||||
// Show progress bar instead of spinner when we have real progress
|
||||
// Show current database being backed up
|
||||
s.WriteString("\n")
|
||||
spinner := spinnerFrames[m.spinnerFrame]
|
||||
if m.dbName != "" && m.dbDone <= m.dbTotal {
|
||||
s.WriteString(fmt.Sprintf(" Current: %s %s\n", spinner, m.dbName))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
|
||||
// Database progress bar
|
||||
progressBar := renderBackupDatabaseProgressBar(m.dbDone, m.dbTotal, m.dbName, 50)
|
||||
s.WriteString(progressBar + "\n")
|
||||
s.WriteString(fmt.Sprintf(" %s\n", m.status))
|
||||
} else {
|
||||
// Show spinner during initial phases
|
||||
if m.cancelling {
|
||||
s.WriteString(fmt.Sprintf(" %s %s\n", spinnerFrames[m.spinnerFrame], m.status))
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" %s %s\n", spinnerFrames[m.spinnerFrame], m.status))
|
||||
// Intermediate phase (globals)
|
||||
spinner := spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("\n %s %s\n\n", spinner, m.status))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
} else {
|
||||
// Single/sample database backup - simpler display
|
||||
spinner := spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf(" %s %s\n", spinner, m.status))
|
||||
}
|
||||
|
||||
if !m.cancelling {
|
||||
s.WriteString("\n [KEY] Press Ctrl+C or ESC to cancel\n")
|
||||
}
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" %s\n\n", m.status))
|
||||
|
||||
// Show completion summary with detailed stats
|
||||
if m.err != nil {
|
||||
s.WriteString(fmt.Sprintf(" [FAIL] Error: %v\n", m.err))
|
||||
} else if m.result != "" {
|
||||
// Parse and display result cleanly
|
||||
lines := strings.Split(m.result, "\n")
|
||||
for _, line := range lines {
|
||||
line = strings.TrimSpace(line)
|
||||
if line != "" {
|
||||
s.WriteString(" " + line + "\n")
|
||||
s.WriteString(errorStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("║ [FAIL] BACKUP FAILED ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(errorStyle.Render(fmt.Sprintf(" Error: %v", m.err)))
|
||||
s.WriteString("\n")
|
||||
} else {
|
||||
s.WriteString(successStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("║ [OK] BACKUP COMPLETED SUCCESSFULLY ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Summary section
|
||||
s.WriteString(infoStyle.Render(" ─── Summary ───────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Backup type specific info
|
||||
switch m.backupType {
|
||||
case "cluster":
|
||||
s.WriteString(" Type: Cluster Backup\n")
|
||||
if m.dbTotal > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Databases: %d backed up\n", m.dbTotal))
|
||||
}
|
||||
case "single":
|
||||
s.WriteString(" Type: Single Database Backup\n")
|
||||
s.WriteString(fmt.Sprintf(" Database: %s\n", m.databaseName))
|
||||
case "sample":
|
||||
s.WriteString(" Type: Sample Backup\n")
|
||||
s.WriteString(fmt.Sprintf(" Database: %s\n", m.databaseName))
|
||||
s.WriteString(fmt.Sprintf(" Sample Ratio: %d\n", m.ratio))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
}
|
||||
s.WriteString("\n [KEY] Press Enter or ESC to return to menu\n")
|
||||
|
||||
// Timing section (always shown, consistent with restore)
|
||||
s.WriteString(infoStyle.Render(" ─── Timing ────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
elapsed := time.Since(m.startTime)
|
||||
s.WriteString(fmt.Sprintf(" Total Time: %s\n", formatBackupDuration(elapsed)))
|
||||
|
||||
if m.backupType == "cluster" && m.dbTotal > 0 && m.err == nil {
|
||||
avgPerDB := elapsed / time.Duration(m.dbTotal)
|
||||
s.WriteString(fmt.Sprintf(" Avg per DB: %s\n", formatBackupDuration(avgPerDB)))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(infoStyle.Render(" [KEYS] Press Enter to continue"))
|
||||
}
|
||||
|
||||
return s.String()
|
||||
}
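The phase weighting used by the cluster backup view (database phase mapped onto 15-90% of the overall bar) can be restated as a pure function. This is a minimal sketch under that reading of the diff; `overallBackupProgress` is a hypothetical name and the function is not part of dbbackup.

```go
package main

import "fmt"

// overallBackupProgress is a hypothetical, side-effect-free restatement of the
// if/else chain in View for cluster backups:
//   - phase 2 with a known database count maps dbDone/dbTotal onto 15..90
//   - the first 5 seconds show a fixed 2 (initializing)
//   - no database count yet shows a fixed 10 (globals)
func overallBackupProgress(phase, dbDone, dbTotal, elapsedSec int) int {
	switch {
	case phase == 2 && dbTotal > 0:
		dbPct := dbDone * 100 / dbTotal // integer percentage of databases done
		return 15 + dbPct*75/100
	case elapsedSec < 5:
		return 2
	case dbTotal == 0:
		return 10
	default:
		return 0
	}
}

func main() {
	fmt.Println(overallBackupProgress(0, 0, 0, 1))    // initializing
	fmt.Println(overallBackupProgress(0, 0, 0, 30))   // globals
	fmt.Println(overallBackupProgress(2, 5, 10, 60))  // halfway through databases
	fmt.Println(overallBackupProgress(2, 10, 10, 90)) // all databases done
}
```

With this mapping the database phase tops out at 90%; the remaining 10% is reserved for the compression phase named in the comments.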
|
||||
|
||||
// formatBackupDuration formats duration in human readable format
|
||||
func formatBackupDuration(d time.Duration) string {
|
||||
if d < time.Minute {
|
||||
return fmt.Sprintf("%.1fs", d.Seconds())
|
||||
}
|
||||
if d < time.Hour {
|
||||
minutes := int(d.Minutes())
|
||||
seconds := int(d.Seconds()) % 60
|
||||
return fmt.Sprintf("%dm %ds", minutes, seconds)
|
||||
}
|
||||
hours := int(d.Hours())
|
||||
minutes := int(d.Minutes()) % 60
|
||||
return fmt.Sprintf("%dh %dm", hours, minutes)
|
||||
}
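To see which format each duration range produces, the function can be exercised on its own. In this sketch `formatBackupDuration` is copied from the change above so that the block runs standalone; the sample durations are arbitrary.

```go
package main

import (
	"fmt"
	"time"
)

// formatBackupDuration is copied from the diff so this example is self-contained.
func formatBackupDuration(d time.Duration) string {
	if d < time.Minute {
		return fmt.Sprintf("%.1fs", d.Seconds())
	}
	if d < time.Hour {
		minutes := int(d.Minutes())
		seconds := int(d.Seconds()) % 60
		return fmt.Sprintf("%dm %ds", minutes, seconds)
	}
	hours := int(d.Hours())
	minutes := int(d.Minutes()) % 60
	return fmt.Sprintf("%dh %dm", hours, minutes)
}

func main() {
	samples := []time.Duration{
		42500 * time.Millisecond,                    // under a minute: one decimal place
		3*time.Minute + 5*time.Second,               // under an hour: minutes and seconds
		2*time.Hour + 15*time.Minute + 30*time.Second, // an hour or more: seconds are dropped
	}
	for _, d := range samples {
		fmt.Printf("%v -> %s\n", d, formatBackupDuration(d))
	}
}
```

Note that the hour branch truncates rather than rounds, so the trailing 30 seconds in the last sample do not appear in the output.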
|
||||
|
||||
@@ -188,6 +188,21 @@ func (m *MenuModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 		}
 		return m, nil
 
+	case tea.InterruptMsg:
+		// Handle Ctrl+C signal (SIGINT) - Bubbletea v1.3+ sends this
+		if m.cancel != nil {
+			m.cancel()
+		}
+
+		// Clean up any orphaned processes before exit
+		m.logger.Info("Cleaning up processes before exit (SIGINT)")
+		if err := cleanup.KillOrphanedProcesses(m.logger); err != nil {
+			m.logger.Warn("Failed to clean up all processes", "error", err)
+		}
+
+		m.quitting = true
+		return m, tea.Quit
+
 	case tea.KeyMsg:
 		switch msg.String() {
 		case "ctrl+c", "q":
@@ -284,9 +299,13 @@ func (m *MenuModel) View() string {
 
 	var s string
 
+	// Product branding header
+	brandLine := fmt.Sprintf("dbbackup v%s • Enterprise Database Backup & Recovery", m.config.Version)
+	s += "\n" + infoStyle.Render(brandLine) + "\n"
+
 	// Header
-	header := titleStyle.Render("Database Backup Tool - Interactive Menu")
-	s += fmt.Sprintf("\n%s\n\n", header)
+	header := titleStyle.Render("Interactive Menu")
+	s += fmt.Sprintf("%s\n\n", header)
 
 	if len(m.dbTypes) > 0 {
 		options := make([]string, len(m.dbTypes))
@@ -57,6 +57,18 @@ type RestoreExecutionModel struct {
 	dbTotal int
 	dbDone int
 
+	// Current database being restored (for detailed display)
+	currentDB string
+
+	// Timing info for database restore phase (ETA calculation)
+	dbPhaseElapsed time.Duration // Elapsed time since restore phase started
+	dbAvgPerDB time.Duration // Average time per database restore
+
+	// Overall progress tracking for unified display
+	overallPhase int // 1=Extracting, 2=Globals, 3=Databases
+	extractionDone bool
+	extractionTime time.Duration // How long extraction took (for ETA calc)
+
 	// Results
 	done bool
 	cancelling bool // True when user has requested cancellation
@@ -136,6 +148,21 @@ type sharedProgressState struct {
 	dbTotal int
 	dbDone int
 
+	// Current database being restored
+	currentDB string
+
+	// Timing info for database restore phase
+	dbPhaseElapsed time.Duration // Elapsed time since restore phase started
+	dbAvgPerDB time.Duration // Average time per database restore
+
+	// Overall phase tracking (1=Extract, 2=Globals, 3=Databases)
+	overallPhase int
+	extractionDone bool
+
+	// Weighted progress by database sizes (bytes)
+	dbBytesTotal int64 // Total bytes across all databases
+	dbBytesDone int64 // Bytes completed (sum of finished DB sizes)
+
 	// Rolling window for speed calculation
 	speedSamples []restoreSpeedSample
 }
@@ -163,12 +190,12 @@ func clearCurrentRestoreProgress() {
 	currentRestoreProgressState = nil
 }
 
-func getCurrentRestoreProgress() (bytesTotal, bytesDone int64, description string, hasUpdate bool, dbTotal, dbDone int, speed float64) {
+func getCurrentRestoreProgress() (bytesTotal, bytesDone int64, description string, hasUpdate bool, dbTotal, dbDone int, speed float64, dbPhaseElapsed, dbAvgPerDB time.Duration, currentDB string, overallPhase int, extractionDone bool, dbBytesTotal, dbBytesDone int64) {
 	currentRestoreProgressMu.Lock()
 	defer currentRestoreProgressMu.Unlock()
 
 	if currentRestoreProgressState == nil {
-		return 0, 0, "", false, 0, 0, 0
+		return 0, 0, "", false, 0, 0, 0, 0, 0, "", 0, false, 0, 0
 	}
 
 	currentRestoreProgressState.mu.Lock()
@@ -179,7 +206,11 @@ func getCurrentRestoreProgress() (bytesTotal, bytesDone int64, description strin
 
 	return currentRestoreProgressState.bytesTotal, currentRestoreProgressState.bytesDone,
 		currentRestoreProgressState.description, currentRestoreProgressState.hasUpdate,
-		currentRestoreProgressState.dbTotal, currentRestoreProgressState.dbDone, speed
+		currentRestoreProgressState.dbTotal, currentRestoreProgressState.dbDone, speed,
+		currentRestoreProgressState.dbPhaseElapsed, currentRestoreProgressState.dbAvgPerDB,
+		currentRestoreProgressState.currentDB, currentRestoreProgressState.overallPhase,
+		currentRestoreProgressState.extractionDone,
+		currentRestoreProgressState.dbBytesTotal, currentRestoreProgressState.dbBytesDone
 }
 
 // calculateRollingSpeed calculates speed from recent samples (last 5 seconds)
@@ -242,7 +273,20 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
|
||||
defer dbClient.Close()
|
||||
|
||||
// STEP 1: Clean cluster if requested (drop all existing user databases)
|
||||
if restoreType == "restore-cluster" && cleanClusterFirst && len(existingDBs) > 0 {
|
||||
if restoreType == "restore-cluster" && cleanClusterFirst {
|
||||
// Re-detect databases at execution time to get current state
|
||||
// The preview list may be stale or detection may have failed earlier
|
||||
safety := restore.NewSafety(cfg, log)
|
||||
currentDBs, err := safety.ListUserDatabases(ctx)
|
||||
if err != nil {
|
||||
log.Warn("Failed to list databases for cleanup, using preview list", "error", err)
|
||||
currentDBs = existingDBs // Fall back to preview list
|
||||
} else if len(currentDBs) > 0 {
|
||||
log.Info("Re-detected user databases for cleanup", "count", len(currentDBs), "databases", currentDBs)
|
||||
existingDBs = currentDBs // Update with fresh list
|
||||
}
|
||||
|
||||
if len(existingDBs) > 0 {
|
||||
log.Info("Dropping existing user databases before cluster restore", "count", len(existingDBs))
|
||||
|
||||
// Drop databases using command-line psql (no connection required)
|
||||
@@ -262,6 +306,9 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
|
||||
}
|
||||
|
||||
log.Info("Cluster cleanup completed", "dropped", droppedCount, "total", len(existingDBs))
|
||||
} else {
|
||||
log.Info("No user databases to clean up")
|
||||
}
|
||||
}
|
||||
|
||||
// STEP 2: Create restore engine with silent progress (no stdout interference with TUI)
|
||||
@@ -279,6 +326,14 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
 		progressState.bytesTotal = total
 		progressState.description = description
 		progressState.hasUpdate = true
+		progressState.overallPhase = 1
+		progressState.extractionDone = false
+
+		// Check if extraction is complete
+		if current >= total && total > 0 {
+			progressState.extractionDone = true
+			progressState.overallPhase = 2
+		}
 
 		// Add speed sample for rolling window calculation
 		progressState.speedSamples = append(progressState.speedSamples, restoreSpeedSample{
@@ -298,12 +353,47 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
|
||||
progressState.dbDone = done
|
||||
progressState.dbTotal = total
|
||||
progressState.description = fmt.Sprintf("Restoring %s", dbName)
|
||||
progressState.currentDB = dbName
|
||||
progressState.overallPhase = 3
|
||||
progressState.extractionDone = true
|
||||
progressState.hasUpdate = true
|
||||
// Clear byte progress when switching to db progress
|
||||
progressState.bytesTotal = 0
|
||||
progressState.bytesDone = 0
|
||||
})
|
||||
|
||||
// Set up timing-aware database progress callback for cluster restore ETA
|
||||
engine.SetDatabaseProgressWithTimingCallback(func(done, total int, dbName string, phaseElapsed, avgPerDB time.Duration) {
|
||||
progressState.mu.Lock()
|
||||
defer progressState.mu.Unlock()
|
||||
progressState.dbDone = done
|
||||
progressState.dbTotal = total
|
||||
progressState.description = fmt.Sprintf("Restoring %s", dbName)
|
||||
progressState.currentDB = dbName
|
||||
progressState.overallPhase = 3
|
||||
progressState.extractionDone = true
|
||||
progressState.dbPhaseElapsed = phaseElapsed
|
||||
progressState.dbAvgPerDB = avgPerDB
|
||||
progressState.hasUpdate = true
|
||||
// Clear byte progress when switching to db progress
|
||||
progressState.bytesTotal = 0
|
||||
progressState.bytesDone = 0
|
||||
})
|
||||
|
||||
// Set up weighted (bytes-based) progress callback for accurate cluster restore progress
|
||||
engine.SetDatabaseProgressByBytesCallback(func(bytesDone, bytesTotal int64, dbName string, dbDone, dbTotal int) {
|
||||
progressState.mu.Lock()
|
||||
defer progressState.mu.Unlock()
|
||||
progressState.dbBytesDone = bytesDone
|
||||
progressState.dbBytesTotal = bytesTotal
|
||||
progressState.dbDone = dbDone
|
||||
progressState.dbTotal = dbTotal
|
||||
progressState.currentDB = dbName
|
||||
progressState.overallPhase = 3
|
||||
progressState.extractionDone = true
|
||||
progressState.hasUpdate = true
|
||||
})
|
||||
|
||||
// Store progress state in a package-level variable for the ticker to access
|
||||
// This is a workaround because tea messages can't be sent from callbacks
|
||||
setCurrentRestoreProgress(progressState)
|
||||
@@ -357,26 +447,54 @@ func (m RestoreExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
m.elapsed = time.Since(m.startTime)
|
||||
|
||||
// Poll shared progress state for real-time updates
|
||||
bytesTotal, bytesDone, description, hasUpdate, dbTotal, dbDone, speed := getCurrentRestoreProgress()
|
||||
if hasUpdate && bytesTotal > 0 {
|
||||
bytesTotal, bytesDone, description, hasUpdate, dbTotal, dbDone, speed, dbPhaseElapsed, dbAvgPerDB, currentDB, overallPhase, extractionDone, dbBytesTotal, dbBytesDone := getCurrentRestoreProgress()
|
||||
if hasUpdate && bytesTotal > 0 && !extractionDone {
|
||||
// Phase 1: Extraction
|
||||
m.bytesTotal = bytesTotal
|
||||
m.bytesDone = bytesDone
|
||||
m.description = description
|
||||
m.showBytes = true
|
||||
m.speed = speed
|
||||
m.overallPhase = 1
|
||||
m.extractionDone = false
|
||||
|
||||
// Update status to reflect actual progress
|
||||
m.status = description
|
||||
m.phase = "Extracting"
|
||||
m.phase = "Phase 1/3: Extracting Archive"
|
||||
m.progress = int((bytesDone * 100) / bytesTotal)
|
||||
} else if hasUpdate && dbTotal > 0 {
|
||||
// Database count progress for cluster restore
|
||||
// Phase 3: Database restores
|
||||
m.dbTotal = dbTotal
|
||||
m.dbDone = dbDone
|
||||
m.dbPhaseElapsed = dbPhaseElapsed
|
||||
m.dbAvgPerDB = dbAvgPerDB
|
||||
m.currentDB = currentDB
|
||||
m.overallPhase = overallPhase
|
||||
m.extractionDone = extractionDone
|
||||
m.showBytes = false
|
||||
m.status = fmt.Sprintf("Restoring database %d of %d...", dbDone+1, dbTotal)
|
||||
m.phase = "Restore"
|
||||
|
||||
if dbDone < dbTotal {
|
||||
m.status = fmt.Sprintf("Restoring: %s", currentDB)
|
||||
} else {
|
||||
m.status = "Finalizing..."
|
||||
}
|
||||
|
||||
// Use weighted progress by bytes if available, otherwise use count
|
||||
if dbBytesTotal > 0 {
|
||||
weightedPercent := int((dbBytesDone * 100) / dbBytesTotal)
|
||||
m.phase = fmt.Sprintf("Phase 3/3: Databases (%d/%d) - %.1f%% by size", dbDone, dbTotal, float64(dbBytesDone*100)/float64(dbBytesTotal))
|
||||
m.progress = weightedPercent
|
||||
} else {
|
||||
m.phase = fmt.Sprintf("Phase 3/3: Databases (%d/%d)", dbDone, dbTotal)
|
||||
m.progress = int((dbDone * 100) / dbTotal)
|
||||
}
|
||||
} else if hasUpdate && extractionDone && dbTotal == 0 {
|
||||
// Phase 2: Globals restore (brief phase between extraction and databases)
|
||||
m.overallPhase = 2
|
||||
m.extractionDone = true
|
||||
m.showBytes = false
|
||||
m.status = "Restoring global objects (roles, tablespaces)..."
|
||||
m.phase = "Phase 2/3: Restoring Globals"
|
||||
} else {
|
||||
// Fallback: Update status based on elapsed time to show progress
|
||||
// This provides visual feedback even though we don't have real-time progress
|
||||
@@ -461,6 +579,21 @@ func (m RestoreExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 		}
 		return m, nil
 
+	case tea.InterruptMsg:
+		// Handle Ctrl+C signal (SIGINT) - Bubbletea v1.3+ sends this instead of KeyMsg for ctrl+c
+		if !m.done && !m.cancelling {
+			m.cancelling = true
+			m.status = "[STOP] Cancelling restore... (please wait)"
+			m.phase = "Cancelling"
+			if m.cancel != nil {
+				m.cancel()
+			}
+			return m, nil
+		} else if m.done {
+			return m.parent, tea.Quit
+		}
+		return m, nil
+
 	case tea.KeyMsg:
 		switch msg.String() {
 		case "ctrl+c", "esc":
@@ -518,53 +651,160 @@ func (m RestoreExecutionModel) View() string {
|
||||
s.WriteString("\n")
|
||||
|
||||
if m.done {
|
||||
// Show result
|
||||
// Show result with comprehensive summary
|
||||
if m.err != nil {
|
||||
s.WriteString(errorStyle.Render("[FAIL] Restore Failed"))
|
||||
s.WriteString(errorStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("║ [FAIL] RESTORE FAILED ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(errorStyle.Render(fmt.Sprintf("Error: %v", m.err)))
|
||||
|
||||
// Parse and display error in a clean, structured format
|
||||
errStr := m.err.Error()
|
||||
|
||||
// Extract key parts from the error message
|
||||
errDisplay := formatRestoreError(errStr)
|
||||
s.WriteString(errDisplay)
|
||||
s.WriteString("\n")
|
||||
} else {
|
||||
s.WriteString(successStyle.Render("[OK] Restore Completed Successfully"))
|
||||
s.WriteString(successStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("║ [OK] RESTORE COMPLETED SUCCESSFULLY ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(successStyle.Render(m.result))
|
||||
|
||||
// Summary section
|
||||
s.WriteString(infoStyle.Render(" ─── Summary ───────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Archive info
|
||||
s.WriteString(fmt.Sprintf(" Archive: %s\n", m.archive.Name))
|
||||
if m.archive.Size > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Archive Size: %s\n", FormatBytes(m.archive.Size)))
|
||||
}
|
||||
|
||||
// Restore type specific info
|
||||
if m.restoreType == "restore-cluster" {
|
||||
s.WriteString(fmt.Sprintf(" Type: Cluster Restore\n"))
|
||||
if m.dbTotal > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Databases: %d restored\n", m.dbTotal))
|
||||
}
|
||||
if m.cleanClusterFirst && len(m.existingDBs) > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Cleaned: %d existing database(s) dropped\n", len(m.existingDBs)))
|
||||
}
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" Type: Single Database Restore\n"))
|
||||
s.WriteString(fmt.Sprintf(" Target DB: %s\n", m.targetDB))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
s.WriteString(fmt.Sprintf("\nElapsed Time: %s\n", formatDuration(m.elapsed)))
|
||||
// Timing section
|
||||
s.WriteString(infoStyle.Render(" ─── Timing ────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(fmt.Sprintf(" Total Time: %s\n", formatDuration(m.elapsed)))
|
||||
|
||||
// Calculate and show throughput if we have size info
|
||||
if m.archive.Size > 0 && m.elapsed.Seconds() > 0 {
|
||||
throughput := float64(m.archive.Size) / m.elapsed.Seconds()
|
||||
s.WriteString(fmt.Sprintf(" Throughput: %s/s (average)\n", FormatBytes(int64(throughput))))
|
||||
}
|
||||
|
||||
if m.dbTotal > 0 && m.err == nil {
|
||||
avgPerDB := m.elapsed / time.Duration(m.dbTotal)
|
||||
s.WriteString(fmt.Sprintf(" Avg per DB: %s\n", formatDuration(avgPerDB)))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(infoStyle.Render(" [KEYS] Press Enter to continue"))
|
||||
} else {
|
||||
// Show progress
|
||||
// Show unified progress for cluster restore
|
||||
if m.restoreType == "restore-cluster" {
|
||||
// Calculate overall progress across all phases
|
||||
// Phase 1: Extraction (0-60%)
|
||||
// Phase 2: Globals (60-65%)
|
||||
// Phase 3: Databases (65-100%)
|
||||
overallProgress := 0
|
||||
phaseLabel := "Starting..."
|
||||
|
||||
if m.showBytes && m.bytesTotal > 0 {
|
||||
// Phase 1: Extraction - contributes 0-60%
|
||||
extractPct := int((m.bytesDone * 100) / m.bytesTotal)
|
||||
overallProgress = (extractPct * 60) / 100
|
||||
phaseLabel = "Phase 1/3: Extracting Archive"
|
||||
} else if m.extractionDone && m.dbTotal == 0 {
|
||||
// Phase 2: Globals restore
|
||||
overallProgress = 62
|
||||
phaseLabel = "Phase 2/3: Restoring Globals"
|
||||
} else if m.dbTotal > 0 {
|
||||
// Phase 3: Database restores - contributes 65-100%
|
||||
dbPct := int((int64(m.dbDone) * 100) / int64(m.dbTotal))
|
||||
overallProgress = 65 + (dbPct * 35 / 100)
|
||||
phaseLabel = fmt.Sprintf("Phase 3/3: Databases (%d/%d)", m.dbDone, m.dbTotal)
|
||||
}
|
||||
|
||||
// Header with phase and overall progress
|
||||
s.WriteString(infoStyle.Render(" ─── Cluster Restore Progress ─────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(fmt.Sprintf(" %s\n\n", phaseLabel))
|
||||
|
||||
// Overall progress bar
|
||||
s.WriteString(" Overall: ")
|
||||
s.WriteString(renderProgressBar(overallProgress))
|
||||
s.WriteString(fmt.Sprintf(" %d%%\n", overallProgress))
|
||||
|
||||
// Phase-specific details
|
||||
if m.showBytes && m.bytesTotal > 0 {
|
||||
// Show extraction details
|
||||
s.WriteString("\n")
|
||||
s.WriteString(fmt.Sprintf(" %s\n", m.status))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(renderDetailedProgressBarWithSpeed(m.bytesDone, m.bytesTotal, m.speed))
|
||||
s.WriteString("\n")
|
||||
} else if m.dbTotal > 0 {
|
||||
// Show current database being restored
|
||||
s.WriteString("\n")
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
if m.currentDB != "" && m.dbDone < m.dbTotal {
|
||||
s.WriteString(fmt.Sprintf(" Current: %s %s\n", spinner, m.currentDB))
|
||||
} else if m.dbDone >= m.dbTotal {
|
||||
s.WriteString(fmt.Sprintf(" %s Finalizing...\n", spinner))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
|
||||
// Database progress bar with timing
|
||||
s.WriteString(renderDatabaseProgressBarWithTiming(m.dbDone, m.dbTotal, m.dbPhaseElapsed, m.dbAvgPerDB))
|
||||
s.WriteString("\n")
|
||||
} else {
|
||||
// Intermediate phase (globals)
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("\n %s %s\n\n", spinner, m.status))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
} else {
|
||||
// Single database restore - simpler display
|
||||
s.WriteString(fmt.Sprintf("Phase: %s\n", m.phase))
|
||||
|
||||
// Show detailed progress bar when we have byte-level information
|
||||
// In this case, hide the spinner for cleaner display
|
||||
if m.showBytes && m.bytesTotal > 0 {
|
||||
// Status line without spinner (progress bar provides activity indication)
|
||||
s.WriteString(fmt.Sprintf("Status: %s\n", m.status))
|
||||
s.WriteString("\n")
|
||||
|
||||
// Render schollz-style progress bar with bytes, rolling speed, ETA
|
||||
s.WriteString(renderDetailedProgressBarWithSpeed(m.bytesDone, m.bytesTotal, m.speed))
|
||||
s.WriteString("\n\n")
|
||||
} else if m.dbTotal > 0 {
|
||||
// Database count progress for cluster restore
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("Status: %s %s\n", spinner, m.status))
|
||||
s.WriteString("\n")
|
||||
|
||||
// Show database progress bar
|
||||
s.WriteString(renderDatabaseProgressBar(m.dbDone, m.dbTotal))
|
||||
s.WriteString("\n\n")
|
||||
} else {
|
||||
// Show status with rotating spinner (for phases without detailed progress)
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("Status: %s %s\n", spinner, m.status))
|
||||
s.WriteString("\n")
|
||||
|
||||
if m.restoreType == "restore-single" {
|
||||
// Fallback to simple progress bar for single database restore
|
||||
// Fallback to simple progress bar
|
||||
progressBar := renderProgressBar(m.progress)
|
||||
s.WriteString(progressBar)
|
||||
s.WriteString(fmt.Sprintf(" %d%%\n", m.progress))
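The three-phase weighting used in the cluster restore view above (extraction 0-60%, globals 60-65%, databases 65-100%) can be sketched as a standalone function. This is a minimal illustration, not the project's code: `overallProgress` and `clampPct` are hypothetical names, and unlike the diff this sketch checks the latest phase first so that a byte counter left over from extraction cannot hold the bar in phase 1. Whether that situation can occur in the real model is not visible in this hunk.

```go
package main

import "fmt"

// clampPct keeps a percentage inside 0..100.
func clampPct(p int) int {
	if p < 0 {
		return 0
	}
	if p > 100 {
		return 100
	}
	return p
}

// overallProgress maps per-phase progress onto one 0-100 scale.
// Weights follow the comments in the diff: extraction contributes 0-60%,
// globals sit at a fixed 62%, databases contribute 65-100%.
// The name and parameter list are assumptions made for this sketch.
func overallProgress(bytesDone, bytesTotal int64, extractionDone bool, dbDone, dbTotal int) (int, string) {
	switch {
	case dbTotal > 0:
		// Phase 3: database restores.
		dbPct := clampPct(dbDone * 100 / dbTotal)
		return 65 + dbPct*35/100, fmt.Sprintf("Phase 3/3: Databases (%d/%d)", dbDone, dbTotal)
	case extractionDone:
		// Phase 2: globals restore has no measurable sub-progress.
		return 62, "Phase 2/3: Restoring Globals"
	case bytesTotal > 0:
		// Phase 1: archive extraction.
		extractPct := clampPct(int(bytesDone * 100 / bytesTotal))
		return extractPct * 60 / 100, "Phase 1/3: Extracting Archive"
	default:
		return 0, "Starting..."
	}
}

func main() {
	pct, label := overallProgress(50, 100, false, 0, 0)
	fmt.Printf("%s: %d%%\n", label, pct)
}
```

Keeping the mapping in a pure function like this makes the phase boundaries testable without rendering the TUI.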
|
||||
@@ -678,6 +918,55 @@ func renderDatabaseProgressBar(done, total int) string {
|
||||
return s.String()
|
||||
}
|
||||
|
||||
// renderDatabaseProgressBarWithTiming renders a progress bar for database count with timing and ETA
|
||||
func renderDatabaseProgressBarWithTiming(done, total int, phaseElapsed, avgPerDB time.Duration) string {
|
||||
var s strings.Builder
|
||||
|
||||
// Calculate percentage
|
||||
percent := 0
|
||||
if total > 0 {
|
||||
percent = (done * 100) / total
|
||||
if percent > 100 {
|
||||
percent = 100
|
||||
}
|
||||
}
|
||||
|
||||
// Render progress bar
|
||||
width := 30
|
||||
filled := (percent * width) / 100
|
||||
barFilled := strings.Repeat("█", filled)
|
||||
barEmpty := strings.Repeat("░", width-filled)
|
||||
|
||||
s.WriteString(successStyle.Render("["))
|
||||
s.WriteString(successStyle.Render(barFilled))
|
||||
s.WriteString(infoStyle.Render(barEmpty))
|
||||
s.WriteString(successStyle.Render("]"))
|
||||
|
||||
// Count and percentage
|
||||
s.WriteString(fmt.Sprintf(" %3d%% %d / %d databases", percent, done, total))
|
||||
|
||||
// Timing and ETA
|
||||
if phaseElapsed > 0 {
|
||||
s.WriteString(fmt.Sprintf(" [%s", FormatDurationShort(phaseElapsed)))
|
||||
|
||||
// Calculate ETA based on average time per database
|
||||
if avgPerDB > 0 && done < total {
|
||||
remainingDBs := total - done
|
||||
eta := time.Duration(remainingDBs) * avgPerDB
|
||||
s.WriteString(fmt.Sprintf(" / ETA: %s", FormatDurationShort(eta)))
|
||||
} else if done > 0 && done < total {
|
||||
// Fallback: estimate ETA from overall elapsed time
|
||||
avgElapsed := phaseElapsed / time.Duration(done)
|
||||
remainingDBs := total - done
|
||||
eta := time.Duration(remainingDBs) * avgElapsed
|
||||
s.WriteString(fmt.Sprintf(" / ETA: ~%s", FormatDurationShort(eta)))
|
||||
}
|
||||
s.WriteString("]")
|
||||
}
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
// formatDuration formats duration in human readable format
|
||||
func formatDuration(d time.Duration) string {
|
||||
if d < time.Minute {
|
||||
@@ -722,3 +1011,188 @@ func dropDatabaseCLI(ctx context.Context, cfg *config.Config, dbName string) err
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// formatRestoreError formats a restore error message for clean TUI display
|
||||
func formatRestoreError(errStr string) string {
|
||||
var s strings.Builder
|
||||
maxLineWidth := 60
|
||||
|
||||
// Common patterns to extract
|
||||
patterns := []struct {
|
||||
key string
|
||||
pattern string
|
||||
}{
|
||||
{"Error Type", "ERROR:"},
|
||||
{"Hint", "HINT:"},
|
||||
{"Last Error", "last error:"},
|
||||
{"Total Errors", "total errors:"},
|
||||
}
|
||||
|
||||
// First, try to extract a clean error summary
|
||||
errLines := strings.Split(errStr, "\n")
|
||||
|
||||
// Find the main error message (first line or first ERROR:)
|
||||
mainError := ""
|
||||
hint := ""
|
||||
totalErrors := ""
|
||||
dbsFailed := []string{}
|
||||
|
||||
for _, line := range errLines {
|
||||
line = strings.TrimSpace(line)
|
||||
if line == "" {
|
||||
continue
|
||||
}
|
||||
|
||||
// Extract ERROR messages
|
||||
if strings.Contains(line, "ERROR:") {
|
||||
if mainError == "" {
|
||||
// Get just the ERROR part
|
||||
idx := strings.Index(line, "ERROR:")
|
||||
if idx >= 0 {
|
||||
mainError = strings.TrimSpace(line[idx:])
|
||||
// Truncate if too long
|
||||
if len(mainError) > maxLineWidth {
|
||||
mainError = mainError[:maxLineWidth-3] + "..."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Extract HINT
|
||||
if strings.Contains(line, "HINT:") {
|
||||
idx := strings.Index(line, "HINT:")
|
||||
if idx >= 0 {
|
||||
hint = strings.TrimSpace(line[idx+5:])
|
||||
if len(hint) > maxLineWidth {
|
||||
hint = hint[:maxLineWidth-3] + "..."
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Extract total errors count
|
||||
if strings.Contains(line, "total errors:") {
|
||||
idx := strings.Index(line, "total errors:")
|
||||
if idx >= 0 {
|
||||
totalErrors = strings.TrimSpace(line[idx+13:])
|
||||
// Just extract the number
|
||||
parts := strings.Fields(totalErrors)
|
||||
if len(parts) > 0 {
|
||||
totalErrors = parts[0]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Extract failed database names (for cluster restore)
|
||||
if strings.Contains(line, ": restore failed:") {
|
||||
parts := strings.SplitN(line, ":", 2)
|
||||
if len(parts) > 0 {
|
||||
dbName := strings.TrimSpace(parts[0])
|
||||
if dbName != "" && !strings.HasPrefix(dbName, "Error") {
|
||||
dbsFailed = append(dbsFailed, dbName)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// If no structured error found, use the first line
|
||||
if mainError == "" {
|
||||
firstLine := errStr
|
||||
if idx := strings.Index(errStr, "\n"); idx > 0 {
|
||||
firstLine = errStr[:idx]
|
||||
}
|
||||
if len(firstLine) > maxLineWidth*2 {
|
||||
firstLine = firstLine[:maxLineWidth*2-3] + "..."
|
||||
}
|
||||
mainError = firstLine
|
||||
}
|
||||
|
||||
// Build structured error display
|
||||
s.WriteString(infoStyle.Render(" ─── Error Details ─────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Error type detection
|
||||
errorType := "critical"
|
||||
if strings.Contains(errStr, "out of shared memory") || strings.Contains(errStr, "max_locks_per_transaction") {
|
||||
errorType = "critical"
|
||||
} else if strings.Contains(errStr, "connection") {
|
||||
errorType = "connection"
|
||||
} else if strings.Contains(errStr, "permission") || strings.Contains(errStr, "access") {
|
||||
errorType = "permission"
|
||||
}
|
||||
|
||||
s.WriteString(fmt.Sprintf(" Type: %s\n", errorType))
|
||||
s.WriteString(fmt.Sprintf(" Message: %s\n", mainError))
|
||||
|
||||
if hint != "" {
|
||||
s.WriteString(fmt.Sprintf(" Hint: %s\n", hint))
|
||||
}
|
||||
|
||||
if totalErrors != "" {
|
||||
s.WriteString(fmt.Sprintf(" Total Errors: %s\n", totalErrors))
|
||||
}
|
||||
|
||||
// Show failed databases (max 5)
|
||||
if len(dbsFailed) > 0 {
|
||||
s.WriteString("\n")
|
||||
s.WriteString(" Failed Databases:\n")
|
||||
for i, db := range dbsFailed {
|
||||
if i >= 5 {
|
||||
s.WriteString(fmt.Sprintf(" ... and %d more\n", len(dbsFailed)-5))
|
||||
break
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" • %s\n", db))
|
||||
}
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ─── Diagnosis ─────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Provide specific recommendations based on error
|
||||
if strings.Contains(errStr, "out of shared memory") || strings.Contains(errStr, "max_locks_per_transaction") {
|
||||
s.WriteString(errorStyle.Render(" • PostgreSQL lock table exhausted (out of shared memory)\n"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" Lock table exhausted. Total capacity = max_locks_per_transaction\n")
|
||||
s.WriteString(" × (max_connections + max_prepared_transactions).\n\n")
|
||||
s.WriteString(" If you reduced VM size or max_connections, you need higher\n")
|
||||
s.WriteString(" max_locks_per_transaction to compensate.\n\n")
|
||||
s.WriteString(successStyle.Render(" FIX OPTIONS:\n"))
|
||||
s.WriteString(" 1. Enable 'Large DB Mode' in Settings\n")
|
||||
s.WriteString(" (press 'l' to toggle, reduces parallelism, increases locks)\n\n")
|
||||
s.WriteString(" 2. Increase PostgreSQL locks:\n")
|
||||
s.WriteString(" ALTER SYSTEM SET max_locks_per_transaction = 4096;\n")
|
||||
s.WriteString(" Then RESTART PostgreSQL.\n\n")
|
||||
s.WriteString(" 3. Reduce parallel jobs:\n")
|
||||
s.WriteString(" Set Cluster Parallelism = 1 in Settings\n")
|
||||
} else if strings.Contains(errStr, "connection") || strings.Contains(errStr, "refused") {
|
||||
s.WriteString(" • Database connection failed\n\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" 1. Check database is running\n")
|
||||
s.WriteString(" 2. Verify host, port, and credentials in Settings\n")
|
||||
s.WriteString(" 3. Check firewall/network connectivity\n")
|
||||
} else if strings.Contains(errStr, "permission") || strings.Contains(errStr, "denied") {
|
||||
s.WriteString(" • Permission denied\n\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" 1. Verify database user has sufficient privileges\n")
|
||||
s.WriteString(" 2. Grant CREATE/DROP DATABASE permissions if restoring cluster\n")
|
||||
s.WriteString(" 3. Check file system permissions on backup directory\n")
|
||||
} else {
|
||||
s.WriteString(" See error message above for details.\n\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] General Recommendations ────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" 1. Check the full error log for details\n")
|
||||
s.WriteString(" 2. Try restoring with 'conservative' profile (press 'c')\n")
|
||||
s.WriteString(" 3. For complex databases, enable 'Large DB Mode' (press 'l')\n")
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
|
||||
// Suppress the pattern variable since we don't use it but defined it
|
||||
_ = patterns
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
@@ -55,6 +55,7 @@ type RestorePreviewModel struct {
|
||||
cleanClusterFirst bool // For cluster restore: drop all user databases first
|
||||
existingDBCount int // Number of existing user databases
|
||||
existingDBs []string // List of existing user databases
|
||||
existingDBError string // Error message if database listing failed
|
||||
safetyChecks []SafetyCheck
|
||||
checking bool
|
||||
canProceed bool
|
||||
@@ -102,6 +103,7 @@ type safetyCheckCompleteMsg struct {
|
||||
canProceed bool
|
||||
existingDBCount int
|
||||
existingDBs []string
|
||||
existingDBError string
|
||||
}
|
||||
|
||||
func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string) tea.Cmd {
|
||||
@@ -221,10 +223,12 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
|
||||
check = SafetyCheck{Name: "Existing databases", Status: "checking", Critical: false}
|
||||
|
||||
// Get list of existing user databases (exclude templates and system DBs)
|
||||
var existingDBError string
|
||||
dbList, err := safety.ListUserDatabases(ctx)
|
||||
if err != nil {
|
||||
check.Status = "warning"
|
||||
check.Message = fmt.Sprintf("Cannot list databases: %v", err)
|
||||
existingDBError = err.Error()
|
||||
} else {
|
||||
existingDBCount = len(dbList)
|
||||
existingDBs = dbList
|
||||
@@ -238,6 +242,14 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
|
||||
}
|
||||
}
|
||||
checks = append(checks, check)
|
||||
|
||||
return safetyCheckCompleteMsg{
|
||||
checks: checks,
|
||||
canProceed: canProceed,
|
||||
existingDBCount: existingDBCount,
|
||||
existingDBs: existingDBs,
|
||||
existingDBError: existingDBError,
|
||||
}
|
||||
}
|
||||
|
||||
return safetyCheckCompleteMsg{
|
||||
@@ -257,6 +269,7 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
m.canProceed = msg.canProceed
|
||||
m.existingDBCount = msg.existingDBCount
|
||||
m.existingDBs = msg.existingDBs
|
||||
m.existingDBError = msg.existingDBError
|
||||
// Auto-forward in auto-confirm mode
|
||||
if m.config.TUIAutoConfirm {
|
||||
return m.parent, tea.Quit
|
||||
@@ -275,10 +288,17 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
|
||||
case "c":
|
||||
if m.mode == "restore-cluster" {
|
||||
// Toggle cluster cleanup
|
||||
// Toggle cluster cleanup - databases will be re-detected at execution time
|
||||
m.cleanClusterFirst = !m.cleanClusterFirst
|
||||
if m.cleanClusterFirst {
|
||||
if m.existingDBError != "" {
|
||||
// Detection failed in preview - will re-detect at execution
|
||||
m.message = checkWarningStyle.Render("[WARN] Will clean existing databases before restore (detection pending)")
|
||||
} else if m.existingDBCount > 0 {
|
||||
m.message = checkWarningStyle.Render(fmt.Sprintf("[WARN] Will drop %d existing database(s) before restore", m.existingDBCount))
|
||||
} else {
|
||||
m.message = infoStyle.Render("[INFO] Cleanup enabled (no databases currently detected)")
|
||||
}
|
||||
} else {
|
||||
m.message = "Clean cluster first: disabled"
|
||||
}
|
||||
@@ -382,7 +402,27 @@ func (m RestorePreviewModel) View() string {
|
||||
s.WriteString("\n")
|
||||
s.WriteString(fmt.Sprintf(" Host: %s:%d\n", m.config.Host, m.config.Port))
|
||||
|
||||
if m.existingDBCount > 0 {
|
||||
// Show Resource Profile and CPU Workload settings
|
||||
profile := m.config.GetCurrentProfile()
|
||||
if profile != nil {
|
||||
s.WriteString(fmt.Sprintf(" Resource Profile: %s (Parallel:%d, Jobs:%d)\n",
|
||||
profile.Name, profile.ClusterParallelism, profile.Jobs))
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" Resource Profile: %s\n", m.config.ResourceProfile))
|
||||
}
|
||||
// Show Large DB Mode status
|
||||
if m.config.LargeDBMode {
|
||||
s.WriteString(" Large DB Mode: ON (reduced parallelism, high locks)\n")
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" CPU Workload: %s\n", m.config.CPUWorkloadType))
|
||||
s.WriteString(fmt.Sprintf(" Cluster Parallelism: %d databases\n", m.config.ClusterParallelism))
|
||||
|
||||
if m.existingDBError != "" {
|
||||
// Show warning when database listing failed - but still allow cleanup toggle
|
||||
s.WriteString(checkWarningStyle.Render(" Existing Databases: Detection failed\n"))
|
||||
s.WriteString(infoStyle.Render(fmt.Sprintf(" (%s)\n", m.existingDBError)))
|
||||
s.WriteString(infoStyle.Render(" (Will re-detect at restore time)\n"))
|
||||
} else if m.existingDBCount > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Existing Databases: %d found\n", m.existingDBCount))
|
||||
|
||||
// Show first few database names
|
||||
@@ -395,16 +435,19 @@ func (m RestorePreviewModel) View() string {
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" - %s\n", db))
|
||||
}
|
||||
} else {
|
||||
s.WriteString(" Existing Databases: None (clean slate)\n")
|
||||
}
|
||||
|
||||
// Always show cleanup toggle for cluster restore
|
||||
cleanIcon := "[N]"
|
||||
cleanStyle := infoStyle
|
||||
if m.cleanClusterFirst {
|
||||
cleanIcon = "[Y]"
|
||||
cleanIcon := "[Y]"
|
||||
cleanStyle = checkWarningStyle
|
||||
}
|
||||
s.WriteString(cleanStyle.Render(fmt.Sprintf(" Clean All First: %s %v (press 'c' to toggle)\n", cleanIcon, m.cleanClusterFirst)))
|
||||
s.WriteString(cleanStyle.Render(fmt.Sprintf(" Clean All First: %s enabled (press 'c' to toggle)\n", cleanIcon)))
|
||||
} else {
|
||||
s.WriteString(" Existing Databases: None (clean slate)\n")
|
||||
s.WriteString(cleanStyle.Render(fmt.Sprintf(" Clean All First: %s disabled (press 'c' to toggle)\n", cleanIcon)))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
}
|
||||
@@ -453,10 +496,18 @@ func (m RestorePreviewModel) View() string {
|
||||
s.WriteString(infoStyle.Render(" All existing data in target database will be dropped!"))
|
||||
s.WriteString("\n\n")
|
||||
}
|
||||
if m.cleanClusterFirst && m.existingDBCount > 0 {
|
||||
if m.cleanClusterFirst {
|
||||
s.WriteString(checkWarningStyle.Render("[DANGER] WARNING: Cluster cleanup enabled"))
|
||||
s.WriteString("\n")
|
||||
if m.existingDBError != "" {
|
||||
s.WriteString(checkWarningStyle.Render(" Existing databases will be DROPPED before restore!"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" (Database count will be detected at restore time)"))
|
||||
} else if m.existingDBCount > 0 {
|
||||
s.WriteString(checkWarningStyle.Render(fmt.Sprintf(" %d existing database(s) will be DROPPED before restore!", m.existingDBCount)))
|
||||
} else {
|
||||
s.WriteString(infoStyle.Render(" No databases currently detected - cleanup will verify at restore time"))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" This ensures a clean disaster recovery scenario"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
@@ -10,6 +10,7 @@ import (
|
||||
"github.com/charmbracelet/lipgloss"
|
||||
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/cpu"
|
||||
"dbbackup/internal/logger"
|
||||
)
|
||||
|
||||
@@ -101,6 +102,65 @@ func NewSettingsModel(cfg *config.Config, log logger.Logger, parent tea.Model) S
|
||||
Type: "selector",
|
||||
Description: "CPU workload profile (press Enter to cycle: Balanced → CPU-Intensive → I/O-Intensive)",
|
||||
},
|
||||
{
|
||||
Key: "resource_profile",
|
||||
DisplayName: "Resource Profile",
|
||||
Value: func(c *config.Config) string {
|
||||
profile := c.GetCurrentProfile()
|
||||
if profile != nil {
|
||||
return fmt.Sprintf("%s (P:%d J:%d)", profile.Name, profile.ClusterParallelism, profile.Jobs)
|
||||
}
|
||||
return c.ResourceProfile
|
||||
},
|
||||
Update: func(c *config.Config, v string) error {
|
||||
profiles := []string{"conservative", "balanced", "performance", "max-performance"}
|
||||
currentIdx := 0
|
||||
for i, p := range profiles {
|
||||
if c.ResourceProfile == p {
|
||||
currentIdx = i
|
||||
break
|
||||
}
|
||||
}
|
||||
nextIdx := (currentIdx + 1) % len(profiles)
|
||||
return c.ApplyResourceProfile(profiles[nextIdx])
|
||||
},
|
||||
Type: "selector",
|
||||
Description: "Resource profile for VM capacity. Toggle 'l' for Large DB Mode on any profile.",
|
||||
},
|
||||
{
|
||||
Key: "large_db_mode",
|
||||
DisplayName: "Large DB Mode",
|
||||
Value: func(c *config.Config) string {
|
||||
if c.LargeDBMode {
|
||||
return "ON (↓parallelism, ↑locks)"
|
||||
}
|
||||
return "OFF"
|
||||
},
|
||||
Update: func(c *config.Config, v string) error {
|
||||
c.LargeDBMode = !c.LargeDBMode
|
||||
return nil
|
||||
},
|
||||
Type: "selector",
|
||||
Description: "Enable for databases with many tables/LOBs. Reduces parallelism, increases max_locks_per_transaction.",
|
||||
},
|
||||
{
|
||||
Key: "cluster_parallelism",
|
||||
DisplayName: "Cluster Parallelism",
|
||||
Value: func(c *config.Config) string { return fmt.Sprintf("%d", c.ClusterParallelism) },
|
||||
Update: func(c *config.Config, v string) error {
|
||||
val, err := strconv.Atoi(v)
|
||||
if err != nil {
|
||||
return fmt.Errorf("cluster parallelism must be a number")
|
||||
}
|
||||
if val < 1 {
|
||||
return fmt.Errorf("cluster parallelism must be at least 1")
|
||||
}
|
||||
c.ClusterParallelism = val
|
||||
return nil
|
||||
},
|
||||
Type: "int",
|
||||
Description: "Concurrent databases during cluster backup/restore (1=sequential, safer for large DBs)",
|
||||
},
|
||||
{
|
||||
Key: "backup_dir",
|
||||
DisplayName: "Backup Directory",
|
||||
@@ -528,12 +588,70 @@ func (m SettingsModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
|
||||
case "s":
|
||||
return m.saveSettings()
|
||||
|
||||
case "l":
|
||||
// Quick shortcut: Toggle Large DB Mode
|
||||
return m.toggleLargeDBMode()
|
||||
|
||||
case "c":
|
||||
// Quick shortcut: Apply "conservative" profile for constrained VMs
|
||||
return m.applyConservativeProfile()
|
||||
|
||||
case "p":
|
||||
// Show profile recommendation
|
||||
return m.showProfileRecommendation()
|
||||
}
|
||||
}
|
||||
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// toggleLargeDBMode toggles the Large DB Mode flag
|
||||
func (m SettingsModel) toggleLargeDBMode() (tea.Model, tea.Cmd) {
|
||||
m.config.LargeDBMode = !m.config.LargeDBMode
|
||||
if m.config.LargeDBMode {
|
||||
profile := m.config.GetCurrentProfile()
|
||||
m.message = successStyle.Render(fmt.Sprintf(
|
||||
"[ON] Large DB Mode enabled: %s → Parallel=%d, Jobs=%d, MaxLocks=%d",
|
||||
profile.Name, profile.ClusterParallelism, profile.Jobs, profile.MaxLocksPerTxn))
|
||||
} else {
|
||||
profile := m.config.GetCurrentProfile()
|
||||
m.message = successStyle.Render(fmt.Sprintf(
|
||||
"[OFF] Large DB Mode disabled: %s → Parallel=%d, Jobs=%d",
|
||||
profile.Name, profile.ClusterParallelism, profile.Jobs))
|
||||
}
|
||||
return m, nil
|
||||
}
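Elsewhere in this diff the result of `GetCurrentProfile()` is checked against `nil` before use, while `toggleLargeDBMode` dereferences it directly. If the profile can be nil here as well, a guarded message builder along these lines would avoid a panic. This is a sketch only: the `Profile` struct below is a stand-in carrying the four fields the diff reads, and the wording of the nil fallback is an assumption.

```go
package main

import "fmt"

// Profile is a stand-in for the project's resource profile type; only the
// fields referenced in the diff are modelled here.
type Profile struct {
	Name               string
	ClusterParallelism int
	Jobs               int
	MaxLocksPerTxn     int
}

// largeDBMessage builds the status line shown after toggling Large DB Mode.
// It tolerates a nil profile by falling back to a shorter message.
func largeDBMessage(enabled bool, p *Profile) string {
	state := "[OFF] Large DB Mode disabled"
	if enabled {
		state = "[ON] Large DB Mode enabled"
	}
	if p == nil {
		// Hypothetical fallback: no profile details available.
		return state
	}
	if enabled {
		return fmt.Sprintf("%s: %s → Parallel=%d, Jobs=%d, MaxLocks=%d",
			state, p.Name, p.ClusterParallelism, p.Jobs, p.MaxLocksPerTxn)
	}
	return fmt.Sprintf("%s: %s → Parallel=%d, Jobs=%d",
		state, p.Name, p.ClusterParallelism, p.Jobs)
}

func main() {
	fmt.Println(largeDBMessage(true, nil))
	fmt.Println(largeDBMessage(false, &Profile{Name: "balanced", ClusterParallelism: 2, Jobs: 4}))
}
```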
|
||||
|
||||
// applyConservativeProfile applies the conservative profile for constrained VMs
|
||||
func (m SettingsModel) applyConservativeProfile() (tea.Model, tea.Cmd) {
|
||||
if err := m.config.ApplyResourceProfile("conservative"); err != nil {
|
||||
m.message = errorStyle.Render(fmt.Sprintf("[FAIL] %s", err.Error()))
|
||||
return m, nil
|
||||
}
|
||||
m.message = successStyle.Render("[OK] Applied 'conservative' profile: Cluster=1, Jobs=1. Safe for small VMs with limited memory.")
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// showProfileRecommendation displays the recommended profile based on system resources
|
||||
func (m SettingsModel) showProfileRecommendation() (tea.Model, tea.Cmd) {
|
||||
profileName, reason := m.config.GetResourceProfileRecommendation(false)
|
||||
|
||||
var largeDBHint string
|
||||
if m.config.LargeDBMode {
|
||||
largeDBHint = "Large DB Mode: ON"
|
||||
} else {
|
||||
largeDBHint = "Large DB Mode: OFF (press 'l' to enable)"
|
||||
}
|
||||
|
||||
m.message = infoStyle.Render(fmt.Sprintf(
|
||||
"[RECOMMEND] Profile: %s | %s\n"+
|
||||
" → %s\n"+
|
||||
" Press 'l' to toggle Large DB Mode, 'c' for conservative",
|
||||
profileName, largeDBHint, reason))
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// handleEditingInput handles input when editing a setting
|
||||
func (m SettingsModel) handleEditingInput(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
|
||||
switch msg.String() {
|
||||
@@ -747,7 +865,32 @@ func (m SettingsModel) View() string {
|
||||
// Current configuration summary
|
||||
if !m.editing {
|
||||
b.WriteString("\n")
|
||||
b.WriteString(infoStyle.Render("[INFO] Current Configuration"))
|
||||
b.WriteString(infoStyle.Render("[INFO] System Resources & Configuration"))
|
||||
b.WriteString("\n")
|
||||
|
||||
// System resources
|
||||
var sysInfo []string
|
||||
if m.config.CPUInfo != nil {
|
||||
sysInfo = append(sysInfo, fmt.Sprintf("CPU: %d cores (physical), %d logical",
|
||||
m.config.CPUInfo.PhysicalCores, m.config.CPUInfo.LogicalCores))
|
||||
}
|
||||
if m.config.MemoryInfo != nil {
|
||||
sysInfo = append(sysInfo, fmt.Sprintf("Memory: %dGB total, %dGB available",
|
||||
m.config.MemoryInfo.TotalGB, m.config.MemoryInfo.AvailableGB))
|
||||
}
|
||||
|
||||
// Recommended profile
|
||||
recommendedProfile, reason := m.config.GetResourceProfileRecommendation(false)
|
||||
sysInfo = append(sysInfo, fmt.Sprintf("Recommended Profile: %s", recommendedProfile))
|
||||
sysInfo = append(sysInfo, fmt.Sprintf(" → %s", reason))
|
||||
|
||||
for _, line := range sysInfo {
|
||||
b.WriteString(detailStyle.Render(fmt.Sprintf(" %s", line)))
|
||||
b.WriteString("\n")
|
||||
}
|
||||
|
||||
b.WriteString("\n")
|
||||
b.WriteString(infoStyle.Render("[CONFIG] Current Settings"))
|
||||
b.WriteString("\n")
|
||||
|
||||
summary := []string{
|
||||
@@ -755,7 +898,17 @@ func (m SettingsModel) View() string {
|
||||
fmt.Sprintf("Database: %s@%s:%d", m.config.User, m.config.Host, m.config.Port),
|
||||
fmt.Sprintf("Backup Dir: %s", m.config.BackupDir),
|
||||
fmt.Sprintf("Compression: Level %d", m.config.CompressionLevel),
|
||||
fmt.Sprintf("Jobs: %d parallel, %d dump", m.config.Jobs, m.config.DumpJobs),
|
||||
fmt.Sprintf("Profile: %s | Cluster: %d parallel | Jobs: %d",
|
||||
m.config.ResourceProfile, m.config.ClusterParallelism, m.config.Jobs),
|
||||
}
|
||||
|
||||
// Show profile warnings if applicable
|
||||
profile := m.config.GetCurrentProfile()
|
||||
if profile != nil {
|
||||
isValid, warnings := cpu.ValidateProfileForSystem(profile, m.config.CPUInfo, m.config.MemoryInfo)
|
||||
if !isValid && len(warnings) > 0 {
|
||||
summary = append(summary, fmt.Sprintf("⚠️ Warning: %s", warnings[0]))
|
||||
}
|
||||
}
|
||||
|
||||
if m.config.CloudEnabled {
|
||||
@@ -782,9 +935,9 @@ func (m SettingsModel) View() string {
|
||||
} else {
|
||||
// Show different help based on current selection
|
||||
if m.cursor >= 0 && m.cursor < len(m.settings) && m.settings[m.cursor].Type == "path" {
|
||||
footer = infoStyle.Render("\n[KEYS] Up/Down navigate | Enter edit | Tab browse directories | 's' save | 'r' reset | 'q' menu")
|
||||
footer = infoStyle.Render("\n[KEYS] ↑↓ navigate | Enter edit | Tab dirs | 'l' toggle LargeDB | 'c' conservative | 'p' recommend | 's' save | 'q' menu")
|
||||
} else {
|
||||
footer = infoStyle.Render("\n[KEYS] Up/Down navigate | Enter edit | 's' save | 'r' reset | 'q' menu | Tab=dirs on path fields only")
|
||||
footer = infoStyle.Render("\n[KEYS] ↑↓ navigate | Enter edit | 'l' toggle LargeDB mode | 'c' conservative | 'p' recommend | 's' save | 'r' reset | 'q' menu")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||