Compare commits
28 Commits
| SHA1 |
|---|
| 527435a3b8 |
| 6a7cf3c11e |
| fd3f8770b7 |
| 15f10c280c |
| 35a9a6e837 |
| 82378be971 |
| 9fec2c79f8 |
| ae34467b4a |
| 379ca06146 |
| c9bca42f28 |
| c90ec1156e |
| 23265a33a4 |
| 9b9abbfde7 |
| 6282d66693 |
| 4486a5d617 |
| 75dee1fff5 |
| 91d494537d |
| 8ffc1ba23c |
| 8e8045d8c0 |
| 0e94dcf384 |
| 33adfbdb38 |
| af34eaa073 |
| babce7cc83 |
| ae8c8fde3d |
| 346cb7fb61 |
| 18549584b1 |
| b1d1d57b61 |
| d0e1da1bea |
CHANGELOG.md (132 lines changed)

@@ -5,6 +5,138 @@ All notable changes to dbbackup will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]

### Added - Single Database Extraction from Cluster Backups (CLI + TUI)

- **Extract and restore individual databases from cluster backups** - selective restore without full cluster restoration
- **CLI Commands**:
  - **List databases**: `dbbackup restore cluster backup.tar.gz --list-databases`
    - Shows all databases in the cluster backup with sizes
    - Fast scan without full extraction
  - **Extract single database**: `dbbackup restore cluster backup.tar.gz --database myapp --output-dir /tmp/extract`
    - Extracts only the specified database dump
    - File extraction only, no restore
  - **Restore single database from cluster**: `dbbackup restore cluster backup.tar.gz --database myapp --confirm`
    - Extracts and restores only one database
    - Much faster than a full cluster restore when you only need one database
  - **Rename on restore**: `dbbackup restore cluster backup.tar.gz --database myapp --target myapp_test --confirm`
    - Restore under a different database name (useful for testing)
  - **Extract multiple databases**: `dbbackup restore cluster backup.tar.gz --databases "app1,app2,app3" --output-dir /tmp/extract`
    - Comma-separated list of databases to extract
- **TUI Support**:
  - Press **'s'** on any cluster backup in the archive browser to select individual databases
  - New **ClusterDatabaseSelector** view shows all databases with sizes
  - Navigate with arrow keys, select with Enter
  - Automatic handling when a cluster backup is selected in single restore mode
  - Full restore preview and confirmation workflow
- **Benefits**:
  - Faster restores (extract only what you need)
  - Less disk space usage during restore
  - Easy database migration/copying
  - Better testing workflow
  - Selective disaster recovery
### Performance - Cluster Restore Optimization

- **Eliminated duplicate archive extraction in cluster restore** - saves 30-50% of the time on large restores
  - Previously: the archive was extracted twice (once in preflight validation, once in the actual restore)
  - Now: the archive is extracted once and reused for both validation and restore
- **Time savings**:
  - 50 GB cluster: ~3-6 minutes faster
  - 10 GB cluster: ~1-2 minutes faster
  - Small clusters (<5 GB): ~30 seconds faster
- Optimization automatically enabled when the `--diagnose` flag is used
- New `ValidateAndExtractCluster()` performs combined validation + extraction
- `RestoreCluster()` accepts an optional `preExtractedPath` parameter to reuse the extracted directory (see the sketch after this list)
- Disk space checks are skipped when using a pre-extracted directory
- Maintains backward compatibility - works with and without pre-extraction
- Log output shows the optimization: `"Using pre-extracted cluster directory ... optimization: skipping duplicate extraction"`
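As a concrete illustration, here is a minimal sketch of the call pattern, assuming the signatures that appear later in this diff (`safety.ValidateAndExtractCluster` and the variadic `preExtractedPath` argument of `RestoreCluster`); it is not the exact caller code from `cmd/restore.go`.

```go
// Minimal sketch (assumed imports: context, fmt, os, plus the project's
// internal safety and restore packages). Error handling trimmed for brevity.
func restoreWithSharedExtraction(ctx context.Context, engine *restore.Engine, archivePath string) error {
	// Extract once; the same directory serves both validation and restore.
	extractedDir, err := safety.ValidateAndExtractCluster(ctx, archivePath)
	if err != nil {
		return fmt.Errorf("failed to extract cluster archive: %w", err)
	}
	defer os.RemoveAll(extractedDir) // the caller owns cleanup of the shared directory

	// ... with --diagnose, dump files under extractedDir can be validated here ...

	// Passing the pre-extracted path makes RestoreCluster skip its own extraction.
	return engine.RestoreCluster(ctx, archivePath, extractedDir)
}
```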
### Improved - Archive Validation

- **Enhanced tar.gz validation with stream-based checks** (a sketch follows this list)
  - Fast header-only validation (validates gzip + tar structure without full extraction)
  - Checks gzip magic bytes (0x1f 0x8b) and tar header signature
  - Reduces preflight validation time from minutes to seconds on large archives
  - Falls back to full extraction only when necessary (with `--diagnose`)
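The validator itself is not part of this diff hunk, so the following is only a minimal, self-contained sketch of the kind of header check described above: confirm the gzip magic bytes, then decode just enough of the stream for `archive/tar` to accept the first 512-byte header.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// validateTarGzHeader checks that path looks like a tar.gz archive without
// extracting it: gzip magic bytes first, then the first tar header.
func validateTarGzHeader(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// gzip members always start with the magic bytes 0x1f 0x8b.
	magic := make([]byte, 2)
	if _, err := io.ReadFull(f, magic); err != nil {
		return fmt.Errorf("reading gzip magic: %w", err)
	}
	if magic[0] != 0x1f || magic[1] != 0x8b {
		return fmt.Errorf("not a gzip file (magic %x)", magic)
	}

	// Rewind and let archive/tar validate the first 512-byte tar header.
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}
	gz, err := gzip.NewReader(f)
	if err != nil {
		return fmt.Errorf("invalid gzip stream: %w", err)
	}
	defer gz.Close()

	if _, err := tar.NewReader(gz).Next(); err != nil {
		return fmt.Errorf("invalid tar header: %w", err)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: validate <archive.tar.gz>")
		return
	}
	if err := validateTarGzHeader(os.Args[1]); err != nil {
		fmt.Println("validation failed:", err)
		return
	}
	fmt.Println("archive header looks valid")
}
```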
## [3.42.74] - 2026-01-20 "Resource Profile System + Critical Ctrl+C Fix"

### Critical Bug Fix

- **Fixed Ctrl+C not working in TUI backup/restore** - context cancellation was broken in TUI mode (illustrated in the sketch after this list)
  - `executeBackupWithTUIProgress()` and `executeRestoreWithTUIProgress()` created new contexts with `WithCancel(parentCtx)`
  - When the user pressed Ctrl+C, `model.cancel()` was called on the parent context, but execution ran under a separate context
  - Fixed by using the parent context directly instead of creating a new one
  - Ctrl+C/ESC/q now properly propagate cancellation to running operations
  - Users can now interrupt long-running TUI operations
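The sketch below is a self-contained illustration of the failure mode (cancellation only propagates down a context tree, so a cancel function never reaches work running under an unrelated context); the `model` type and function names are hypothetical, not the actual TUI code.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Hypothetical names for illustration only; not the actual TUI code.
type model struct {
	ctx    context.Context
	cancel context.CancelFunc
}

// executeBackup stands in for the long-running backup/restore command.
func executeBackup(ctx context.Context) {
	select {
	case <-ctx.Done():
		fmt.Println("operation cancelled:", ctx.Err())
	case <-time.After(2 * time.Second):
		fmt.Println("operation finished (cancellation never arrived)")
	}
}

func main() {
	parent, cancel := context.WithCancel(context.Background())
	m := &model{ctx: parent, cancel: cancel}

	// Simulate the user pressing Ctrl+C shortly after the operation starts.
	go func() {
		time.Sleep(100 * time.Millisecond)
		m.cancel()
	}()

	// Broken pattern: the operation runs under a context that is not descended
	// from m.ctx, so m.cancel() never reaches it.
	detached, stop := context.WithCancel(context.Background())
	defer stop()
	executeBackup(detached) // prints "operation finished ..."

	// Fixed pattern: run directly under the model's context, so cancellation
	// wired to Ctrl+C/ESC/q propagates to the running operation.
	executeBackup(m.ctx) // prints "operation cancelled ..."
}
```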
### Added - Resource Profile System

- **`--profile` flag for restore operations** with three presets (plus an easter-egg alias):
  - **Conservative** (`--profile=conservative`): single-threaded (`--parallel=1`), minimal memory usage
    - Best for resource-constrained servers, shared hosting, or when "out of shared memory" errors occur
    - Automatically enables `LargeDBMode` for better resource management
  - **Balanced** (default): auto-detect resources, moderate parallelism
    - Good default for most scenarios
  - **Aggressive** (`--profile=aggressive`): maximum parallelism, all available resources
    - Best for dedicated database servers with ample resources
  - **Potato** (`--profile=potato`): Easter egg 🥔, same as conservative
- **Profile system applies to both CLI and TUI**:
  - CLI: `dbbackup restore cluster backup.tar.gz --profile=conservative --confirm`
  - TUI: automatically uses the conservative profile for safer interactive operation
- **User overrides supported**: `--jobs` and `--parallel-dbs` flags override profile settings
- **New `internal/config/profile.go`** module (sketched below):
  - `GetRestoreProfile(name)` - returns profile settings
  - `ApplyProfile(cfg, profile, jobs, parallelDBs)` - applies a profile with overrides
  - `GetProfileDescription(name)` - human-readable descriptions
  - `ListProfiles()` - all available profiles
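A rough sketch of what such a profile module can look like; the reduced `Config`, the `ParallelDBs` field name, and the `-1 = use everything` convention are assumptions for illustration and may not match the real `internal/config/profile.go`.

```go
package config

import "fmt"

// Config is reduced here to the fields the sketch touches; field names other
// than Jobs and LargeDBMode are hypothetical.
type Config struct {
	Jobs        int
	ParallelDBs int
	LargeDBMode bool
}

// RestoreProfile is an assumed shape for a preset.
type RestoreProfile struct {
	Name        string
	Jobs        int  // 0 = auto-detect
	ParallelDBs int  // 1 = sequential
	LargeDBMode bool // conservative memory/lock handling
}

// GetRestoreProfile returns the preset for a profile name.
func GetRestoreProfile(name string) (RestoreProfile, error) {
	switch name {
	case "conservative", "potato": // potato is the easter-egg alias
		return RestoreProfile{Name: "conservative", Jobs: 1, ParallelDBs: 1, LargeDBMode: true}, nil
	case "balanced", "":
		return RestoreProfile{Name: "balanced"}, nil // zero values = auto-detect
	case "aggressive":
		return RestoreProfile{Name: "aggressive", Jobs: -1, ParallelDBs: -1}, nil // -1 = use all resources
	default:
		return RestoreProfile{}, fmt.Errorf("unknown profile: %s", name)
	}
}

// ApplyProfile copies profile settings into cfg, letting explicit user flags win.
func ApplyProfile(cfg *Config, profileName string, jobs, parallelDBs int) error {
	p, err := GetRestoreProfile(profileName)
	if err != nil {
		return err
	}
	cfg.Jobs, cfg.ParallelDBs, cfg.LargeDBMode = p.Jobs, p.ParallelDBs, p.LargeDBMode
	if jobs != 0 { // explicit --jobs overrides the profile
		cfg.Jobs = jobs
	}
	if parallelDBs != 0 { // explicit --parallel-dbs overrides the profile
		cfg.ParallelDBs = parallelDBs
	}
	return nil
}
```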
### Added - PostgreSQL Diagnostic Tools

- **`diagnose_postgres_memory.sh`** - Comprehensive memory and resource analysis script:
  - System memory overview with usage percentages and warnings
  - Top 15 memory-consuming processes
  - PostgreSQL-specific memory configuration analysis
  - Current locks and connections monitoring
  - Shared memory segments inspection
  - Disk space and swap usage checks
  - Identifies other resource consumers (Nessus, Elastic Agent, monitoring tools)
  - Smart recommendations based on findings
  - Detects temp file usage (an indicator of low work_mem)
- **`fix_postgres_locks.sh`** - PostgreSQL lock configuration helper:
  - Automatically increases `max_locks_per_transaction` to 4096
  - Shows current configuration before applying changes
  - Calculates total lock capacity
  - Provides restart commands for different PostgreSQL setups
  - References the diagnostic tool for comprehensive analysis

### Added - Documentation

- **`RESTORE_PROFILES.md`** - Complete profile guide with real-world scenarios:
  - Profile comparison table
  - When to use each profile
  - Override examples
  - Troubleshooting guide for "out of shared memory" errors
  - Integration with diagnostic tools
- **`email_infra_team.txt`** - Admin communication template (German):
  - Analysis results template
  - Problem identification section
  - Three solution variants (temporary, permanent, workaround)
  - Includes diagnostic tool references

### Changed - TUI Improvements

- **TUI mode defaults to the conservative profile** for safer operation
  - Interactive users benefit from stability over speed
  - Prevents resource exhaustion on shared systems
  - Can be overridden with an environment variable: `export RESOURCE_PROFILE=balanced`

### Fixed

- Context cancellation in TUI backup operations (critical)
- Context cancellation in TUI restore operations (critical)
- Better error diagnostics for "out of shared memory" errors
- Improved resource detection and management

### Technical Details

- Profile system respects explicit user flags (`--jobs`, `--parallel-dbs`)
- Conservative profile sets `cfg.LargeDBMode = true` automatically
- TUI profile selection is logged when `Debug` mode is enabled
- All profiles support both single and cluster restore operations

## [3.42.50] - 2026-01-16 "Ctrl+C Signal Handling Fix"

### Fixed - Proper Ctrl+C/SIGINT Handling in TUI
README.md (15 lines changed)

@@ -56,7 +56,7 @@ Download from [releases](https://git.uuxo.net/UUXO/dbbackup/releases):

```bash
# Linux x86_64
wget https://git.uuxo.net/UUXO/dbbackup/releases/download/v3.42.35/dbbackup-linux-amd64   # old
wget https://git.uuxo.net/UUXO/dbbackup/releases/download/v3.42.74/dbbackup-linux-amd64   # new
chmod +x dbbackup-linux-amd64
sudo mv dbbackup-linux-amd64 /usr/local/bin/dbbackup
```

@@ -239,6 +239,14 @@ When restoring large databases on VMs with limited resources, use the resource p

**Quick shortcuts:** Press `l` to toggle Large DB Mode, `c` for conservative, `p` to show recommendation.

**Troubleshooting Tools:**

For PostgreSQL restore issues ("out of shared memory" errors), diagnostic scripts are available:
- **diagnose_postgres_memory.sh** - Comprehensive system memory, PostgreSQL configuration, and resource analysis
- **fix_postgres_locks.sh** - Automatically increase max_locks_per_transaction to 4096

See [RESTORE_PROFILES.md](RESTORE_PROFILES.md) for detailed troubleshooting guidance.

**Database Status:**

Database Status & Health Check

@@ -278,6 +286,9 @@ dbbackup restore single backup.dump --target myapp_db --create --confirm

# Restore cluster
dbbackup restore cluster cluster_backup.tar.gz --confirm

# Restore with resource profile (for resource-constrained servers)
dbbackup restore cluster backup.tar.gz --profile=conservative --confirm

# Restore with debug logging (saves detailed error report on failure)
dbbackup restore cluster backup.tar.gz --save-debug-log /tmp/restore-debug.json --confirm

@@ -333,6 +344,7 @@ dbbackup backup single mydb --dry-run

| `--backup-dir` | Backup directory | ~/db_backups |
| `--compression` | Compression level (0-9) | 6 |
| `--jobs` | Parallel jobs | 8 |
| `--profile` | Resource profile (conservative/balanced/aggressive) | balanced |
| `--cloud` | Cloud storage URI | - |
| `--encrypt` | Enable encryption | false |
| `--dry-run, -n` | Run preflight checks only | false |

@@ -888,6 +900,7 @@ Workload types:

## Documentation

- [RESTORE_PROFILES.md](RESTORE_PROFILES.md) - Restore resource profiles & troubleshooting
- [SYSTEMD.md](SYSTEMD.md) - Systemd installation & scheduling
- [DOCKER.md](DOCKER.md) - Docker deployment
- [CLOUD.md](CLOUD.md) - Cloud storage configuration
RESTORE_PROGRESS_PROPOSAL.md (new file, 171 lines)

@@ -0,0 +1,171 @@

# Restore Progress Bar Enhancement Proposal

## Problem
During Phase 2 cluster restore, the progress bar is not real-time because:
- `pg_restore` subprocess blocks until completion
- Progress updates only happen **before** each database restore starts
- No feedback during actual restore execution (which can take hours)
- Users see a frozen progress bar during large database restores

## Root Cause
In `internal/restore/engine.go`:
- `executeRestoreCommand()` blocks on `cmd.Wait()`
- Progress is only reported at goroutine entry (line ~1315)
- No streaming progress during pg_restore execution

## Proposed Solutions

### Option 1: Parse pg_restore stderr for progress (RECOMMENDED)
**Pros:**
- Real-time feedback during restore
- Works with existing pg_restore
- No external tools needed

**Implementation:**
```go
// In executeRestoreCommand, modify stderr reader:
go func() {
	scanner := bufio.NewScanner(stderr)
	for scanner.Scan() {
		line := scanner.Text()

		// Parse pg_restore progress lines
		// Format: "pg_restore: processing item 1234 TABLE public users"
		if strings.Contains(line, "processing item") {
			e.reportItemProgress(line) // Update progress bar
		}

		// Capture errors
		if strings.Contains(line, "ERROR:") {
			lastError = line
			errorCount++
		}
	}
}()
```

**Add to RestoreCluster goroutine:**
```go
// Track sub-items within each database
var currentDBItems, totalDBItems int
e.setItemProgressCallback(func(current, total int) {
	currentDBItems = current
	totalDBItems = total
	// Update TUI with sub-progress
	e.reportDatabaseSubProgress(idx, totalDBs, dbName, current, total)
})
```

### Option 2: Verbose mode with line counting
**Pros:**
- More granular progress (row-level)
- Shows exact operation being performed

**Cons:**
- `--verbose` causes massive stderr output (OOM risk on huge DBs)
- Currently disabled for memory safety
- Requires careful memory management

### Option 3: Hybrid approach (BEST)
**Combine both:**
1. **Default**: Parse non-verbose pg_restore output for item counts
2. **Small DBs** (<500MB): Enable verbose for detailed progress
3. **Periodic updates**: Report progress every 5 seconds even without stderr changes

**Implementation:**
```go
// Add periodic progress ticker
progressTicker := time.NewTicker(5 * time.Second)
defer progressTicker.Stop()

go func() {
	for {
		select {
		case <-progressTicker.C:
			// Report heartbeat even if no stderr
			e.reportHeartbeat(dbName, time.Since(dbRestoreStart))
		case <-stderrDone:
			return
		}
	}
}()
```

## Recommended Implementation Plan

### Phase 1: Quick Win (1-2 hours)
1. Add heartbeat ticker in cluster restore goroutines
2. Update TUI to show "Restoring database X... (elapsed: 3m 45s)"
3. No code changes to pg_restore wrapper

### Phase 2: Parse pg_restore Output (4-6 hours)
1. Parse stderr for "processing item" lines
2. Extract current/total item counts
3. Report sub-progress to TUI
4. Update the progress bar calculation (a worked sketch follows):
```
dbProgress = baseProgress + (itemsDone/totalItems) * dbWeightedPercent
```
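To make the calculation in step 4 concrete, here is a small self-contained sketch of the formula; `baseProgress` and `dbWeightedPercent` are hypothetical inputs that the cluster loop would supply.

```go
package main

import "fmt"

// dbProgress implements the Phase 2 formula:
//   dbProgress = baseProgress + (itemsDone/totalItems) * dbWeightedPercent
// baseProgress is the percentage already completed by previously restored
// databases; dbWeightedPercent is this database's share of the overall bar.
func dbProgress(baseProgress, dbWeightedPercent float64, itemsDone, totalItems int) float64 {
	if totalItems <= 0 {
		return baseProgress // no item counts yet; hold at the base value
	}
	return baseProgress + (float64(itemsDone)/float64(totalItems))*dbWeightedPercent
}

func main() {
	// Example: second of five equally weighted databases (each worth 20%),
	// currently at item 234 of 567.
	fmt.Printf("%.1f%%\n", dbProgress(20.0, 20.0, 234, 567)) // ≈ 28.3%
}
```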
### Phase 3: Smart Verbose Mode (optional)
1. Detect database size before restore
2. Enable verbose for DBs < 500MB
3. Parse verbose output for detailed progress
4. Automatic fallback to item-based progress for large DBs

## Files to Modify

1. **internal/restore/engine.go**:
   - `executeRestoreCommand()` - add progress parsing
   - `RestoreCluster()` - add heartbeat ticker
   - New: `reportItemProgress()`, `reportHeartbeat()`

2. **internal/tui/restore_exec.go**:
   - Update `RestoreExecModel` to handle sub-progress
   - Add "elapsed time" display during restore
   - Show item counts: "Restoring tables... (234/567)"

3. **internal/progress/indicator.go** (sketched below):
   - Add `UpdateSubProgress(current, total int)` method
   - Add `ReportHeartbeat(elapsed time.Duration)` method
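A possible shape for the two proposed indicator methods; the `Indicator` struct is reduced to a single field here and the render step just prints, so this is an illustration of the interface rather than the eventual implementation.

```go
// Illustrative sketch for internal/progress/indicator.go (simplified type).
package progress

import (
	"fmt"
	"time"
)

// Indicator is reduced to one field here; the real type carries more state
// (spinner style, output writer, enabled flag, ...).
type Indicator struct {
	message string
}

// UpdateSubProgress records per-item progress within the current database,
// e.g. "Restoring tables... (234/567)".
func (i *Indicator) UpdateSubProgress(current, total int) {
	i.message = fmt.Sprintf("Restoring tables... (%d/%d)", current, total)
	fmt.Println(i.message)
}

// ReportHeartbeat refreshes the display with elapsed time even when pg_restore
// produces no new output, so the bar never looks frozen.
func (i *Indicator) ReportHeartbeat(elapsed time.Duration) {
	fmt.Printf("%s (elapsed: %s)\n", i.message, elapsed.Round(time.Second))
}
```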
## Example Output

**Before (current):**
```
[====================] Phase 2/3: Restoring Databases (1/5)
Restoring database myapp...
[frozen for 30 minutes]
```

**After (with heartbeat):**
```
[====================] Phase 2/3: Restoring Databases (1/5)
Restoring database myapp... (elapsed: 4m 32s)
[updates every 5 seconds]
```

**After (with item parsing):**
```
[=========>-----------] Phase 2/3: Restoring Databases (1/5)
Restoring database myapp... (processing item 1,234/5,678) (elapsed: 4m 32s)
[smooth progress bar movement]
```

## Testing Strategy
1. Test with small DB (< 100MB) - verify heartbeat works
2. Test with large DB (> 10GB) - verify no OOM, heartbeat works
3. Test with BLOB-heavy DB - verify phased restore shows progress
4. Test parallel cluster restore - verify multiple heartbeats don't conflict

## Risk Assessment
- **Low risk**: Heartbeat ticker (Phase 1)
- **Medium risk**: stderr parsing (Phase 2) - test thoroughly
- **High risk**: Verbose mode (Phase 3) - can cause OOM

## Estimated Implementation Time
- Phase 1 (heartbeat): 1-2 hours
- Phase 2 (item parsing): 4-6 hours
- Phase 3 (smart verbose): 8-10 hours (optional)

**Total for Phases 1+2: 5-8 hours**
@@ -3,9 +3,9 @@

This directory contains pre-compiled binaries for the DB Backup Tool across multiple platforms and architectures.

## Build Information
- **Version**: 3.42.50
- **Build Time**: 2026-01-18_20:23:55_UTC
- **Git Commit**: a759f4d
- **Version**: 3.42.81
- **Build Time**: 2026-01-22_17:13:41_UTC
- **Git Commit**: 6a7cf3c

## Recent Updates (v1.1.0)
- ✅ Fixed TUI progress display with line-by-line output
cmd/restore.go (290 lines changed)

@@ -16,6 +16,7 @@ import (
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/database"
|
||||
"dbbackup/internal/pitr"
|
||||
"dbbackup/internal/progress"
|
||||
"dbbackup/internal/restore"
|
||||
"dbbackup/internal/security"
|
||||
|
||||
@ -38,6 +39,13 @@ var (
|
||||
restoreCleanCluster bool
|
||||
restoreDiagnose bool // Run diagnosis before restore
|
||||
restoreSaveDebugLog string // Path to save debug log on failure
|
||||
restoreDebugLocks bool // Enable detailed lock debugging
|
||||
|
||||
// Single database extraction from cluster flags
|
||||
restoreDatabase string // Single database to extract/restore from cluster
|
||||
restoreDatabases string // Comma-separated list of databases to extract
|
||||
restoreOutputDir string // Extract to directory (no restore)
|
||||
restoreListDBs bool // List databases in cluster backup
|
||||
|
||||
// Diagnose flags
|
||||
diagnoseJSON bool
|
||||
@ -136,6 +144,11 @@ var restoreClusterCmd = &cobra.Command{
|
||||
This command restores all databases that were backed up together
|
||||
in a cluster backup operation.
|
||||
|
||||
Single Database Extraction:
|
||||
Use --list-databases to see available databases
|
||||
Use --database to extract/restore a specific database
|
||||
Use --output-dir to extract without restoring
|
||||
|
||||
Safety features:
|
||||
- Dry-run by default (use --confirm to execute)
|
||||
- Archive validation and listing
|
||||
@ -143,6 +156,21 @@ Safety features:
|
||||
- Sequential database restoration
|
||||
|
||||
Examples:
|
||||
# List databases in cluster backup
|
||||
dbbackup restore cluster backup.tar.gz --list-databases
|
||||
|
||||
# Extract single database (no restore)
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --output-dir /tmp/extract
|
||||
|
||||
# Restore single database from cluster
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --confirm
|
||||
|
||||
# Restore single database with different name
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --target myapp_test --confirm
|
||||
|
||||
# Extract multiple databases
|
||||
dbbackup restore cluster backup.tar.gz --databases "app1,app2,app3" --output-dir /tmp/extract
|
||||
|
||||
# Preview cluster restore
|
||||
dbbackup restore cluster cluster_backup_20240101_120000.tar.gz
|
||||
|
||||
@ -295,13 +323,18 @@ func init() {
|
||||
restoreSingleCmd.Flags().StringVar(&restoreEncryptionKeyEnv, "encryption-key-env", "DBBACKUP_ENCRYPTION_KEY", "Environment variable containing encryption key")
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreDiagnose, "diagnose", false, "Run deep diagnosis before restore to detect corruption/truncation")
|
||||
restoreSingleCmd.Flags().StringVar(&restoreSaveDebugLog, "save-debug-log", "", "Save detailed error report to file on failure (e.g., /tmp/restore-debug.json)")
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreDebugLocks, "debug-locks", false, "Enable detailed lock debugging (captures PostgreSQL config, Guard decisions, boost attempts)")
|
||||
|
||||
// Cluster restore flags
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreListDBs, "list-databases", false, "List databases in cluster backup and exit")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreDatabase, "database", "", "Extract/restore single database from cluster")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreDatabases, "databases", "", "Extract multiple databases (comma-separated)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreOutputDir, "output-dir", "", "Extract to directory without restoring (requires --database or --databases)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreConfirm, "confirm", false, "Confirm and execute restore (required)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreDryRun, "dry-run", false, "Show what would be done without executing")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreForce, "force", false, "Skip safety checks and confirmations")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreCleanCluster, "clean-cluster", false, "Drop all existing user databases before restore (disaster recovery)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreProfile, "profile", "balanced", "Resource profile: conservative (--parallel=1, low memory), balanced, aggressive (max performance)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreProfile, "profile", "conservative", "Resource profile: conservative (single-threaded, prevents lock issues), balanced (auto-detect), aggressive (max speed)")
|
||||
restoreClusterCmd.Flags().IntVar(&restoreJobs, "jobs", 0, "Number of parallel decompression jobs (0 = auto, overrides profile)")
|
||||
restoreClusterCmd.Flags().IntVar(&restoreParallelDBs, "parallel-dbs", 0, "Number of databases to restore in parallel (0 = use profile, 1 = sequential, -1 = auto-detect, overrides profile)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreWorkdir, "workdir", "", "Working directory for extraction (use when system disk is small, e.g. /mnt/storage/restore_tmp)")
|
||||
@ -311,6 +344,9 @@ func init() {
|
||||
restoreClusterCmd.Flags().StringVar(&restoreEncryptionKeyEnv, "encryption-key-env", "DBBACKUP_ENCRYPTION_KEY", "Environment variable containing encryption key")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreDiagnose, "diagnose", false, "Run deep diagnosis on all dumps before restore")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreSaveDebugLog, "save-debug-log", "", "Save detailed error report to file on failure (e.g., /tmp/restore-debug.json)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreDebugLocks, "debug-locks", false, "Enable detailed lock debugging (captures PostgreSQL config, Guard decisions, boost attempts)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreClean, "clean", false, "Drop and recreate target database (for single DB restore)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreCreate, "create", false, "Create target database if it doesn't exist (for single DB restore)")
|
||||
|
||||
// PITR restore flags
|
||||
restorePITRCmd.Flags().StringVar(&pitrBaseBackup, "base-backup", "", "Path to base backup file (.tar.gz) (required)")
|
||||
@ -597,6 +633,12 @@ func runRestoreSingle(cmd *cobra.Command, args []string) error {
|
||||
log.Info("Debug logging enabled", "output", restoreSaveDebugLog)
|
||||
}
|
||||
|
||||
// Enable lock debugging if requested (single restore)
|
||||
if restoreDebugLocks {
|
||||
cfg.DebugLocks = true
|
||||
log.Info("🔍 Lock debugging enabled - will capture PostgreSQL lock config, Guard decisions, boost attempts")
|
||||
}
|
||||
|
||||
// Setup signal handling
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
defer cancel()
|
||||
@ -666,6 +708,193 @@ func runRestoreSingle(cmd *cobra.Command, args []string) error {
|
||||
func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
archivePath := args[0]
|
||||
|
||||
// Convert to absolute path
|
||||
if !filepath.IsAbs(archivePath) {
|
||||
absPath, err := filepath.Abs(archivePath)
|
||||
if err != nil {
|
||||
return fmt.Errorf("invalid archive path: %w", err)
|
||||
}
|
||||
archivePath = absPath
|
||||
}
|
||||
|
||||
// Check if file exists
|
||||
if _, err := os.Stat(archivePath); err != nil {
|
||||
return fmt.Errorf("archive not found: %s", archivePath)
|
||||
}
|
||||
|
||||
// Handle --list-databases flag
|
||||
if restoreListDBs {
|
||||
return runListDatabases(archivePath)
|
||||
}
|
||||
|
||||
// Handle single/multiple database extraction
|
||||
if restoreDatabase != "" || restoreDatabases != "" {
|
||||
return runExtractDatabases(archivePath)
|
||||
}
|
||||
|
||||
// Otherwise proceed with full cluster restore
|
||||
return runFullClusterRestore(archivePath)
|
||||
}
|
||||
|
||||
// runListDatabases lists all databases in a cluster backup
|
||||
func runListDatabases(archivePath string) error {
|
||||
ctx := context.Background()
|
||||
|
||||
log.Info("Scanning cluster backup", "archive", filepath.Base(archivePath))
|
||||
fmt.Println()
|
||||
|
||||
databases, err := restore.ListDatabasesInCluster(ctx, archivePath, log)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to list databases: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("📦 Databases in cluster backup:\n")
|
||||
var totalSize int64
|
||||
for _, db := range databases {
|
||||
sizeStr := formatSize(db.Size)
|
||||
fmt.Printf(" - %-30s (%s)\n", db.Name, sizeStr)
|
||||
totalSize += db.Size
|
||||
}
|
||||
|
||||
fmt.Printf("\nTotal: %s across %d database(s)\n", formatSize(totalSize), len(databases))
|
||||
return nil
|
||||
}
|
||||
|
||||
// runExtractDatabases extracts single or multiple databases from cluster backup
|
||||
func runExtractDatabases(archivePath string) error {
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
defer cancel()
|
||||
|
||||
// Setup signal handling
|
||||
sigChan := make(chan os.Signal, 1)
|
||||
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
|
||||
defer signal.Stop(sigChan)
|
||||
|
||||
go func() {
|
||||
<-sigChan
|
||||
log.Warn("Extraction interrupted by user")
|
||||
cancel()
|
||||
}()
|
||||
|
||||
// Single database extraction
|
||||
if restoreDatabase != "" {
|
||||
return handleSingleDatabaseExtraction(ctx, archivePath, restoreDatabase)
|
||||
}
|
||||
|
||||
// Multiple database extraction
|
||||
if restoreDatabases != "" {
|
||||
return handleMultipleDatabaseExtraction(ctx, archivePath, restoreDatabases)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// handleSingleDatabaseExtraction handles single database extraction or restore
|
||||
func handleSingleDatabaseExtraction(ctx context.Context, archivePath, dbName string) error {
|
||||
// Extract-only mode (no restore)
|
||||
if restoreOutputDir != "" {
|
||||
return extractSingleDatabase(ctx, archivePath, dbName, restoreOutputDir)
|
||||
}
|
||||
|
||||
// Restore mode
|
||||
if !restoreConfirm {
|
||||
fmt.Println("\n[DRY-RUN] DRY-RUN MODE - No changes will be made")
|
||||
fmt.Printf("\nWould extract and restore:\n")
|
||||
fmt.Printf(" Database: %s\n", dbName)
|
||||
fmt.Printf(" From: %s\n", archivePath)
|
||||
targetDB := restoreTarget
|
||||
if targetDB == "" {
|
||||
targetDB = dbName
|
||||
}
|
||||
fmt.Printf(" Target: %s\n", targetDB)
|
||||
if restoreClean {
|
||||
fmt.Printf(" Clean: true (drop and recreate)\n")
|
||||
}
|
||||
if restoreCreate {
|
||||
fmt.Printf(" Create: true (create if missing)\n")
|
||||
}
|
||||
fmt.Println("\nTo execute this restore, add --confirm flag")
|
||||
return nil
|
||||
}
|
||||
|
||||
// Create database instance
|
||||
db, err := database.New(cfg, log)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create database instance: %w", err)
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
// Create restore engine
|
||||
engine := restore.New(cfg, log, db)
|
||||
|
||||
// Determine target database name
|
||||
targetDB := restoreTarget
|
||||
if targetDB == "" {
|
||||
targetDB = dbName
|
||||
}
|
||||
|
||||
log.Info("Restoring single database from cluster", "database", dbName, "target", targetDB)
|
||||
|
||||
// Restore single database from cluster
|
||||
if err := engine.RestoreSingleFromCluster(ctx, archivePath, dbName, targetDB, restoreClean, restoreCreate); err != nil {
|
||||
return fmt.Errorf("restore failed: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("\n✅ Successfully restored '%s' as '%s'\n", dbName, targetDB)
|
||||
return nil
|
||||
}
|
||||
|
||||
// extractSingleDatabase extracts a single database without restoring
|
||||
func extractSingleDatabase(ctx context.Context, archivePath, dbName, outputDir string) error {
|
||||
log.Info("Extracting database", "database", dbName, "output", outputDir)
|
||||
|
||||
// Create progress indicator
|
||||
prog := progress.NewIndicator(!restoreNoProgress, "dots")
|
||||
|
||||
extractedPath, err := restore.ExtractDatabaseFromCluster(ctx, archivePath, dbName, outputDir, log, prog)
|
||||
if err != nil {
|
||||
return fmt.Errorf("extraction failed: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("\n✅ Extracted: %s\n", extractedPath)
|
||||
fmt.Printf(" Database: %s\n", dbName)
|
||||
fmt.Printf(" Location: %s\n", outputDir)
|
||||
return nil
|
||||
}
|
||||
|
||||
// handleMultipleDatabaseExtraction handles multiple database extraction
|
||||
func handleMultipleDatabaseExtraction(ctx context.Context, archivePath, databases string) error {
|
||||
if restoreOutputDir == "" {
|
||||
return fmt.Errorf("--output-dir required when using --databases")
|
||||
}
|
||||
|
||||
// Parse database list
|
||||
dbNames := strings.Split(databases, ",")
|
||||
for i := range dbNames {
|
||||
dbNames[i] = strings.TrimSpace(dbNames[i])
|
||||
}
|
||||
|
||||
log.Info("Extracting multiple databases", "count", len(dbNames), "output", restoreOutputDir)
|
||||
|
||||
// Create progress indicator
|
||||
prog := progress.NewIndicator(!restoreNoProgress, "dots")
|
||||
|
||||
extractedPaths, err := restore.ExtractMultipleDatabasesFromCluster(ctx, archivePath, dbNames, restoreOutputDir, log, prog)
|
||||
if err != nil {
|
||||
return fmt.Errorf("extraction failed: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("\n✅ Extracted %d database(s):\n", len(extractedPaths))
|
||||
for dbName, path := range extractedPaths {
|
||||
fmt.Printf(" - %s → %s\n", dbName, filepath.Base(path))
|
||||
}
|
||||
fmt.Printf(" Location: %s\n", restoreOutputDir)
|
||||
return nil
|
||||
}
|
||||
|
||||
// runFullClusterRestore performs a full cluster restore
|
||||
func runFullClusterRestore(archivePath string) error {
|
||||
|
||||
// Apply resource profile
|
||||
if err := config.ApplyProfile(cfg, restoreProfile, restoreJobs, restoreParallelDBs); err != nil {
|
||||
log.Warn("Invalid profile, using balanced", "error", err)
|
||||
@ -838,6 +1067,12 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
log.Info("Debug logging enabled", "output", restoreSaveDebugLog)
|
||||
}
|
||||
|
||||
// Enable lock debugging if requested (cluster restore)
|
||||
if restoreDebugLocks {
|
||||
cfg.DebugLocks = true
|
||||
log.Info("🔍 Lock debugging enabled - will capture PostgreSQL lock config, Guard decisions, boost attempts")
|
||||
}
|
||||
|
||||
// Setup signal handling
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
defer cancel()
|
||||
@ -873,22 +1108,50 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
log.Info("Database cleanup completed")
|
||||
}
|
||||
|
||||
// Run pre-restore diagnosis if requested
|
||||
if restoreDiagnose {
|
||||
log.Info("[DIAG] Running pre-restore diagnosis...")
|
||||
// OPTIMIZATION: Pre-extract archive once for both diagnosis and restore
|
||||
// This avoids extracting the same tar.gz twice (saves 5-10 min on large clusters)
|
||||
var extractedDir string
|
||||
var extractErr error
|
||||
|
||||
// Create temp directory for extraction in configured WorkDir
|
||||
workDir := cfg.GetEffectiveWorkDir()
|
||||
diagTempDir, err := os.MkdirTemp(workDir, "dbbackup-diagnose-*")
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create temp directory for diagnosis in %s: %w", workDir, err)
|
||||
if restoreDiagnose || restoreConfirm {
|
||||
log.Info("Pre-extracting cluster archive (shared for validation and restore)...")
|
||||
extractedDir, extractErr = safety.ValidateAndExtractCluster(ctx, archivePath)
|
||||
if extractErr != nil {
|
||||
return fmt.Errorf("failed to extract cluster archive: %w", extractErr)
|
||||
}
|
||||
defer os.RemoveAll(diagTempDir)
|
||||
defer os.RemoveAll(extractedDir) // Cleanup at end
|
||||
log.Info("Archive extracted successfully", "location", extractedDir)
|
||||
}
|
||||
|
||||
// Run pre-restore diagnosis if requested (using already-extracted directory)
|
||||
if restoreDiagnose {
|
||||
log.Info("[DIAG] Running pre-restore diagnosis on extracted dumps...")
|
||||
|
||||
diagnoser := restore.NewDiagnoser(log, restoreVerbose)
|
||||
results, err := diagnoser.DiagnoseClusterDumps(archivePath, diagTempDir)
|
||||
// Diagnose dumps directly from extracted directory
|
||||
dumpsDir := filepath.Join(extractedDir, "dumps")
|
||||
if _, err := os.Stat(dumpsDir); err != nil {
|
||||
return fmt.Errorf("no dumps directory found in extracted archive: %w", err)
|
||||
}
|
||||
|
||||
entries, err := os.ReadDir(dumpsDir)
|
||||
if err != nil {
|
||||
return fmt.Errorf("diagnosis failed: %w", err)
|
||||
return fmt.Errorf("failed to read dumps directory: %w", err)
|
||||
}
|
||||
|
||||
// Diagnose each dump file
|
||||
var results []*restore.DiagnoseResult
|
||||
for _, entry := range entries {
|
||||
if entry.IsDir() {
|
||||
continue
|
||||
}
|
||||
dumpPath := filepath.Join(dumpsDir, entry.Name())
|
||||
result, err := diagnoser.DiagnoseFile(dumpPath)
|
||||
if err != nil {
|
||||
log.Warn("Could not diagnose dump", "file", entry.Name(), "error", err)
|
||||
continue
|
||||
}
|
||||
results = append(results, result)
|
||||
}
|
||||
|
||||
// Check for any invalid dumps
|
||||
@ -928,7 +1191,8 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
startTime := time.Now()
|
||||
auditLogger.LogRestoreStart(user, "all_databases", archivePath)
|
||||
|
||||
if err := engine.RestoreCluster(ctx, archivePath); err != nil {
|
||||
// Pass pre-extracted directory to avoid double extraction
|
||||
if err := engine.RestoreCluster(ctx, archivePath, extractedDir); err != nil {
|
||||
auditLogger.LogRestoreFailed(user, "all_databases", err)
|
||||
return fmt.Errorf("cluster restore failed: %w", err)
|
||||
}
|
||||
|
||||
@ -134,6 +134,7 @@ func Execute(ctx context.Context, config *config.Config, logger logger.Logger) e
|
||||
rootCmd.PersistentFlags().StringVar(&cfg.BackupDir, "backup-dir", cfg.BackupDir, "Backup directory")
|
||||
rootCmd.PersistentFlags().BoolVar(&cfg.NoColor, "no-color", cfg.NoColor, "Disable colored output")
|
||||
rootCmd.PersistentFlags().BoolVar(&cfg.Debug, "debug", cfg.Debug, "Enable debug logging")
|
||||
rootCmd.PersistentFlags().BoolVar(&cfg.DebugLocks, "debug-locks", cfg.DebugLocks, "Enable detailed lock debugging (captures PostgreSQL lock configuration, Large DB Guard decisions, boost attempts)")
|
||||
rootCmd.PersistentFlags().IntVar(&cfg.Jobs, "jobs", cfg.Jobs, "Number of parallel jobs")
|
||||
rootCmd.PersistentFlags().IntVar(&cfg.DumpJobs, "dump-jobs", cfg.DumpJobs, "Number of parallel dump jobs")
|
||||
rootCmd.PersistentFlags().IntVar(&cfg.MaxCores, "max-cores", cfg.MaxCores, "Maximum CPU cores to use")
|
||||
|
||||
@ -35,34 +35,77 @@ echo "📊 Current PostgreSQL Configuration:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
sudo -u postgres psql -c "SHOW max_locks_per_transaction;" 2>/dev/null || psql -c "SHOW max_locks_per_transaction;" || echo "Unable to query current value"
|
||||
sudo -u postgres psql -c "SHOW max_connections;" 2>/dev/null || psql -c "SHOW max_connections;" || echo "Unable to query current value"
|
||||
sudo -u postgres psql -c "SHOW work_mem;" 2>/dev/null || psql -c "SHOW work_mem;" || echo "Unable to query current value"
|
||||
sudo -u postgres psql -c "SHOW maintenance_work_mem;" 2>/dev/null || psql -c "SHOW maintenance_work_mem;" || echo "Unable to query current value"
|
||||
echo
|
||||
|
||||
# Recommended value
|
||||
# Recommended values
|
||||
RECOMMENDED_LOCKS=4096
|
||||
RECOMMENDED_WORK_MEM="256MB"
|
||||
RECOMMENDED_MAINTENANCE_WORK_MEM="4GB"
|
||||
|
||||
echo "🔧 Applying Fix:"
|
||||
echo "🔧 Applying Fixes:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
echo "Setting max_locks_per_transaction = $RECOMMENDED_LOCKS"
|
||||
echo "1. Setting max_locks_per_transaction = $RECOMMENDED_LOCKS"
|
||||
echo "2. Setting work_mem = $RECOMMENDED_WORK_MEM (improves query performance)"
|
||||
echo "3. Setting maintenance_work_mem = $RECOMMENDED_MAINTENANCE_WORK_MEM (speeds up restore/vacuum)"
|
||||
echo
|
||||
|
||||
# Apply the setting
|
||||
# Apply the settings
|
||||
SUCCESS=0
|
||||
|
||||
# Fix 1: max_locks_per_transaction
|
||||
if sudo -u postgres psql -c "ALTER SYSTEM SET max_locks_per_transaction = $RECOMMENDED_LOCKS;" 2>/dev/null; then
|
||||
echo "✅ Configuration updated successfully"
|
||||
echo "✅ max_locks_per_transaction updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
elif psql -c "ALTER SYSTEM SET max_locks_per_transaction = $RECOMMENDED_LOCKS;" 2>/dev/null; then
|
||||
echo "✅ Configuration updated successfully"
|
||||
echo "✅ max_locks_per_transaction updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
else
|
||||
echo "❌ Failed to update configuration"
|
||||
echo "❌ Failed to update max_locks_per_transaction"
|
||||
fi
|
||||
|
||||
# Fix 2: work_mem
|
||||
if sudo -u postgres psql -c "ALTER SYSTEM SET work_mem = '$RECOMMENDED_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
elif psql -c "ALTER SYSTEM SET work_mem = '$RECOMMENDED_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
else
|
||||
echo "❌ Failed to update work_mem"
|
||||
fi
|
||||
|
||||
# Fix 3: maintenance_work_mem
|
||||
if sudo -u postgres psql -c "ALTER SYSTEM SET maintenance_work_mem = '$RECOMMENDED_MAINTENANCE_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ maintenance_work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
elif psql -c "ALTER SYSTEM SET maintenance_work_mem = '$RECOMMENDED_MAINTENANCE_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ maintenance_work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
else
|
||||
echo "❌ Failed to update maintenance_work_mem"
|
||||
fi
|
||||
|
||||
if [ $SUCCESS -eq 0 ]; then
|
||||
echo
|
||||
echo "❌ All configuration updates failed"
|
||||
echo
|
||||
echo "Manual steps:"
|
||||
echo "1. Connect to PostgreSQL as superuser:"
|
||||
echo " sudo -u postgres psql"
|
||||
echo
|
||||
echo "2. Run this command:"
|
||||
echo "2. Run these commands:"
|
||||
echo " ALTER SYSTEM SET max_locks_per_transaction = $RECOMMENDED_LOCKS;"
|
||||
echo " ALTER SYSTEM SET work_mem = '$RECOMMENDED_WORK_MEM';"
|
||||
echo " ALTER SYSTEM SET maintenance_work_mem = '$RECOMMENDED_MAINTENANCE_WORK_MEM';"
|
||||
echo
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo
|
||||
echo "✅ Applied $SUCCESS out of 3 configuration changes"
|
||||
|
||||
echo
|
||||
echo "⚠️ IMPORTANT: PostgreSQL restart required!"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
@ -73,12 +116,23 @@ echo " • systemd: sudo systemctl restart postgresql"
|
||||
echo " • pg_ctl: sudo -u postgres pg_ctl restart -D /var/lib/postgresql/data"
|
||||
echo " • service: sudo service postgresql restart"
|
||||
echo
|
||||
echo "Lock capacity after restart will be:"
|
||||
echo " max_locks_per_transaction × (max_connections + max_prepared_transactions)"
|
||||
echo " = $RECOMMENDED_LOCKS × (connections + prepared)"
|
||||
echo "📊 Expected capacity after restart:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
echo " Lock capacity: max_locks_per_transaction × (max_connections + max_prepared)"
|
||||
echo " = $RECOMMENDED_LOCKS × (connections + prepared)"
|
||||
echo
|
||||
echo " Work memory: $RECOMMENDED_WORK_MEM per query operation"
|
||||
echo " Maintenance: $RECOMMENDED_MAINTENANCE_WORK_MEM for restore/vacuum/index"
|
||||
echo
|
||||
echo "After restarting, verify with:"
|
||||
echo " psql -c 'SHOW max_locks_per_transaction;'"
|
||||
echo " psql -c 'SHOW work_mem;'"
|
||||
echo " psql -c 'SHOW maintenance_work_mem;'"
|
||||
echo
|
||||
echo "💡 Benefits:"
|
||||
echo " ✓ Prevents 'out of shared memory' errors during restore"
|
||||
echo " ✓ Reduces temp file usage (better performance)"
|
||||
echo " ✓ Faster restore, vacuum, and index operations"
|
||||
echo
|
||||
echo "🔍 For comprehensive diagnostics, run:"
|
||||
echo " ./diagnose_postgres_memory.sh"
|
||||
|
||||
@ -1372,6 +1372,27 @@ func (e *Engine) executeCommand(ctx context.Context, cmdArgs []string, outputFil
|
||||
// NO GO BUFFERING - pg_dump writes directly to disk
|
||||
cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
|
||||
|
||||
// Start heartbeat ticker for backup progress
|
||||
backupStart := time.Now()
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
defer heartbeatTicker.Stop()
|
||||
defer cancelHeartbeat()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(backupStart)
|
||||
if e.progress != nil {
|
||||
e.progress.Update(fmt.Sprintf("Backing up database... (elapsed: %s)", formatDuration(elapsed)))
|
||||
}
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
// Set environment variables for database tools
|
||||
cmd.Env = os.Environ()
|
||||
if e.cfg.Password != "" {
|
||||
@ -1598,3 +1619,22 @@ func formatBytes(bytes int64) string {
|
||||
}
|
||||
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||
}
|
||||
|
||||
// formatDuration formats a duration to human readable format (e.g., "3m 45s", "1h 23m", "45s")
|
||||
func formatDuration(d time.Duration) string {
|
||||
if d < time.Second {
|
||||
return "0s"
|
||||
}
|
||||
|
||||
hours := int(d.Hours())
|
||||
minutes := int(d.Minutes()) % 60
|
||||
seconds := int(d.Seconds()) % 60
|
||||
|
||||
if hours > 0 {
|
||||
return fmt.Sprintf("%dh %dm", hours, minutes)
|
||||
}
|
||||
if minutes > 0 {
|
||||
return fmt.Sprintf("%dm %ds", minutes, seconds)
|
||||
}
|
||||
return fmt.Sprintf("%ds", seconds)
|
||||
}
|
||||
|
||||
@ -50,10 +50,11 @@ type Config struct {
|
||||
SampleValue int
|
||||
|
||||
// Output options
|
||||
NoColor bool
|
||||
Debug bool
|
||||
LogLevel string
|
||||
LogFormat string
|
||||
NoColor bool
|
||||
Debug bool
|
||||
DebugLocks bool // Extended lock debugging (captures lock detection, Guard decisions, boost attempts)
|
||||
LogLevel string
|
||||
LogFormat string
|
||||
|
||||
// Config persistence
|
||||
NoSaveConfig bool
|
||||
|
||||
@ -146,7 +146,7 @@ func (d *Dots) Start(message string) {
|
||||
fmt.Fprint(d.writer, message)
|
||||
|
||||
go func() {
|
||||
ticker := time.NewTicker(500 * time.Millisecond)
|
||||
ticker := time.NewTicker(100 * time.Millisecond)
|
||||
defer ticker.Stop()
|
||||
|
||||
count := 0
|
||||
|
||||
@ -292,6 +292,25 @@ func (e *Engine) restorePostgreSQLDump(ctx context.Context, archivePath, targetD
|
||||
|
||||
cmd := e.db.BuildRestoreCommand(targetDB, archivePath, opts)
|
||||
|
||||
// Start heartbeat ticker for restore progress
|
||||
restoreStart := time.Now()
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
defer heartbeatTicker.Stop()
|
||||
defer cancelHeartbeat()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(restoreStart)
|
||||
e.progress.Update(fmt.Sprintf("Restoring %s... (elapsed: %s)", targetDB, formatDuration(elapsed)))
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
if compressed {
|
||||
// For compressed dumps, decompress first
|
||||
return e.executeRestoreWithDecompression(ctx, archivePath, cmd)
|
||||
@ -820,8 +839,99 @@ func (e *Engine) previewRestore(archivePath, targetDB string, format ArchiveForm
|
||||
return nil
|
||||
}
|
||||
|
||||
// RestoreSingleFromCluster extracts and restores a single database from a cluster backup
|
||||
func (e *Engine) RestoreSingleFromCluster(ctx context.Context, clusterArchivePath, dbName, targetDB string, cleanFirst, createIfMissing bool) error {
|
||||
operation := e.log.StartOperation("Single Database Restore from Cluster")
|
||||
|
||||
// Validate and sanitize archive path
|
||||
validArchivePath, pathErr := security.ValidateArchivePath(clusterArchivePath)
|
||||
if pathErr != nil {
|
||||
operation.Fail(fmt.Sprintf("Invalid archive path: %v", pathErr))
|
||||
return fmt.Errorf("invalid archive path: %w", pathErr)
|
||||
}
|
||||
clusterArchivePath = validArchivePath
|
||||
|
||||
// Validate archive exists
|
||||
if _, err := os.Stat(clusterArchivePath); os.IsNotExist(err) {
|
||||
operation.Fail("Archive not found")
|
||||
return fmt.Errorf("archive not found: %s", clusterArchivePath)
|
||||
}
|
||||
|
||||
// Verify it's a cluster archive
|
||||
format := DetectArchiveFormat(clusterArchivePath)
|
||||
if format != FormatClusterTarGz {
|
||||
operation.Fail("Not a cluster archive")
|
||||
return fmt.Errorf("not a cluster archive: %s (format: %s)", clusterArchivePath, format)
|
||||
}
|
||||
|
||||
// Create temporary directory for extraction
|
||||
workDir := e.cfg.GetEffectiveWorkDir()
|
||||
tempDir := filepath.Join(workDir, fmt.Sprintf(".extract_%d", time.Now().Unix()))
|
||||
if err := os.MkdirAll(tempDir, 0755); err != nil {
|
||||
operation.Fail("Failed to create temporary directory")
|
||||
return fmt.Errorf("failed to create temp directory: %w", err)
|
||||
}
|
||||
defer os.RemoveAll(tempDir)
|
||||
|
||||
// Extract the specific database from cluster archive
|
||||
e.log.Info("Extracting database from cluster backup", "database", dbName, "cluster", filepath.Base(clusterArchivePath))
|
||||
e.progress.Start(fmt.Sprintf("Extracting '%s' from cluster backup", dbName))
|
||||
|
||||
extractedPath, err := ExtractDatabaseFromCluster(ctx, clusterArchivePath, dbName, tempDir, e.log, e.progress)
|
||||
if err != nil {
|
||||
e.progress.Fail(fmt.Sprintf("Extraction failed: %v", err))
|
||||
operation.Fail(fmt.Sprintf("Extraction failed: %v", err))
|
||||
return fmt.Errorf("failed to extract database: %w", err)
|
||||
}
|
||||
|
||||
e.progress.Update(fmt.Sprintf("Extracted: %s", filepath.Base(extractedPath)))
|
||||
e.log.Info("Database extracted successfully", "path", extractedPath)
|
||||
|
||||
// Now restore the extracted database file
|
||||
e.progress.Update("Restoring database...")
|
||||
|
||||
// Create database if requested and it doesn't exist
|
||||
if createIfMissing {
|
||||
e.log.Info("Checking if target database exists", "database", targetDB)
|
||||
if err := e.ensureDatabaseExists(ctx, targetDB); err != nil {
|
||||
operation.Fail(fmt.Sprintf("Failed to create database: %v", err))
|
||||
return fmt.Errorf("failed to create database '%s': %w", targetDB, err)
|
||||
}
|
||||
}
|
||||
|
||||
// Detect format of extracted file
|
||||
extractedFormat := DetectArchiveFormat(extractedPath)
|
||||
e.log.Info("Restoring extracted database", "format", extractedFormat, "target", targetDB)
|
||||
|
||||
// Restore based on format
|
||||
var restoreErr error
|
||||
switch extractedFormat {
|
||||
case FormatPostgreSQLDump, FormatPostgreSQLDumpGz:
|
||||
restoreErr = e.restorePostgreSQLDump(ctx, extractedPath, targetDB, extractedFormat == FormatPostgreSQLDumpGz, cleanFirst)
|
||||
case FormatPostgreSQLSQL, FormatPostgreSQLSQLGz:
|
||||
restoreErr = e.restorePostgreSQLSQL(ctx, extractedPath, targetDB, extractedFormat == FormatPostgreSQLSQLGz)
|
||||
case FormatMySQLSQL, FormatMySQLSQLGz:
|
||||
restoreErr = e.restoreMySQLSQL(ctx, extractedPath, targetDB, extractedFormat == FormatMySQLSQLGz)
|
||||
default:
|
||||
operation.Fail("Unsupported extracted format")
|
||||
return fmt.Errorf("unsupported extracted format: %s", extractedFormat)
|
||||
}
|
||||
|
||||
if restoreErr != nil {
|
||||
e.progress.Fail(fmt.Sprintf("Restore failed: %v", restoreErr))
|
||||
operation.Fail(fmt.Sprintf("Restore failed: %v", restoreErr))
|
||||
return restoreErr
|
||||
}
|
||||
|
||||
e.progress.Complete(fmt.Sprintf("Database '%s' restored from cluster backup", targetDB))
|
||||
operation.Complete(fmt.Sprintf("Restored '%s' from cluster as '%s'", dbName, targetDB))
|
||||
return nil
|
||||
}
|
||||
|
||||
// RestoreCluster restores a full cluster from a tar.gz archive
|
||||
func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
// If preExtractedPath is non-empty, uses that directory instead of extracting archivePath
|
||||
// This avoids double extraction when ValidateAndExtractCluster was already called
|
||||
func (e *Engine) RestoreCluster(ctx context.Context, archivePath string, preExtractedPath ...string) error {
|
||||
operation := e.log.StartOperation("Cluster Restore")
|
||||
|
||||
// Validate and sanitize archive path
|
||||
@ -852,22 +962,32 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
return fmt.Errorf("not a cluster archive: %s (detected format: %s)", archivePath, format)
|
||||
}
|
||||
|
||||
// Check disk space before starting restore
|
||||
e.log.Info("Checking disk space for restore")
|
||||
archiveInfo, err := os.Stat(archivePath)
|
||||
if err == nil {
|
||||
spaceCheck := checks.CheckDiskSpaceForRestore(e.cfg.BackupDir, archiveInfo.Size())
|
||||
// Check if we have a pre-extracted directory (optimization to avoid double extraction)
|
||||
// This check must happen BEFORE disk space checks to avoid false failures
|
||||
usingPreExtracted := len(preExtractedPath) > 0 && preExtractedPath[0] != ""
|
||||
|
||||
if spaceCheck.Critical {
|
||||
operation.Fail("Insufficient disk space")
|
||||
return fmt.Errorf("insufficient disk space for restore: %.1f%% used - need at least 4x archive size", spaceCheck.UsedPercent)
|
||||
}
|
||||
// Check disk space before starting restore (skip if using pre-extracted directory)
|
||||
var archiveInfo os.FileInfo
|
||||
var err error
|
||||
if !usingPreExtracted {
|
||||
e.log.Info("Checking disk space for restore")
|
||||
archiveInfo, err = os.Stat(archivePath)
|
||||
if err == nil {
|
||||
spaceCheck := checks.CheckDiskSpaceForRestore(e.cfg.BackupDir, archiveInfo.Size())
|
||||
|
||||
if spaceCheck.Warning {
|
||||
e.log.Warn("Low disk space - restore may fail",
|
||||
"available_gb", float64(spaceCheck.AvailableBytes)/(1024*1024*1024),
|
||||
"used_percent", spaceCheck.UsedPercent)
|
||||
if spaceCheck.Critical {
|
||||
operation.Fail("Insufficient disk space")
|
||||
return fmt.Errorf("insufficient disk space for restore: %.1f%% used - need at least 4x archive size", spaceCheck.UsedPercent)
|
||||
}
|
||||
|
||||
if spaceCheck.Warning {
|
||||
e.log.Warn("Low disk space - restore may fail",
|
||||
"available_gb", float64(spaceCheck.AvailableBytes)/(1024*1024*1024),
|
||||
"used_percent", spaceCheck.UsedPercent)
|
||||
}
|
||||
}
|
||||
} else {
|
||||
e.log.Info("Skipping disk space check (using pre-extracted directory)")
|
||||
}
|
||||
|
||||
if e.dryRun {
|
||||
@ -881,46 +1001,56 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
workDir := e.cfg.GetEffectiveWorkDir()
|
||||
tempDir := filepath.Join(workDir, fmt.Sprintf(".restore_%d", time.Now().Unix()))
|
||||
|
||||
// Check disk space for extraction (need ~3x archive size: compressed + extracted + working space)
|
||||
if archiveInfo != nil {
|
||||
requiredBytes := uint64(archiveInfo.Size()) * 3
|
||||
extractionCheck := checks.CheckDiskSpace(workDir)
|
||||
if extractionCheck.AvailableBytes < requiredBytes {
|
||||
operation.Fail("Insufficient disk space for extraction")
|
||||
return fmt.Errorf("insufficient disk space for extraction in %s: need %.1f GB, have %.1f GB (archive size: %.1f GB × 3)",
|
||||
workDir,
|
||||
float64(requiredBytes)/(1024*1024*1024),
|
||||
float64(extractionCheck.AvailableBytes)/(1024*1024*1024),
|
||||
float64(archiveInfo.Size())/(1024*1024*1024))
|
||||
// Handle pre-extracted directory or extract archive
|
||||
if usingPreExtracted {
|
||||
tempDir = preExtractedPath[0]
|
||||
// Note: Caller handles cleanup of pre-extracted directory
|
||||
e.log.Info("Using pre-extracted cluster directory",
|
||||
"path", tempDir,
|
||||
"optimization", "skipping duplicate extraction")
|
||||
} else {
|
||||
// Check disk space for extraction (need ~3x archive size: compressed + extracted + working space)
|
||||
if archiveInfo != nil {
|
||||
requiredBytes := uint64(archiveInfo.Size()) * 3
|
||||
extractionCheck := checks.CheckDiskSpace(workDir)
|
||||
if extractionCheck.AvailableBytes < requiredBytes {
|
||||
operation.Fail("Insufficient disk space for extraction")
|
||||
return fmt.Errorf("insufficient disk space for extraction in %s: need %.1f GB, have %.1f GB (archive size: %.1f GB × 3)",
|
||||
workDir,
|
||||
float64(requiredBytes)/(1024*1024*1024),
|
||||
float64(extractionCheck.AvailableBytes)/(1024*1024*1024),
|
||||
float64(archiveInfo.Size())/(1024*1024*1024))
|
||||
}
|
||||
e.log.Info("Disk space check for extraction passed",
|
||||
"workdir", workDir,
|
||||
"required_gb", float64(requiredBytes)/(1024*1024*1024),
|
||||
"available_gb", float64(extractionCheck.AvailableBytes)/(1024*1024*1024))
|
||||
}
|
||||
e.log.Info("Disk space check for extraction passed",
|
||||
"workdir", workDir,
|
||||
"required_gb", float64(requiredBytes)/(1024*1024*1024),
|
||||
"available_gb", float64(extractionCheck.AvailableBytes)/(1024*1024*1024))
|
||||
}
|
||||
|
||||
if err := os.MkdirAll(tempDir, 0755); err != nil {
|
||||
operation.Fail("Failed to create temporary directory")
|
||||
return fmt.Errorf("failed to create temp directory in %s: %w", workDir, err)
|
||||
}
|
||||
defer os.RemoveAll(tempDir)
|
||||
// Need to extract archive ourselves
|
||||
if err := os.MkdirAll(tempDir, 0755); err != nil {
|
||||
operation.Fail("Failed to create temporary directory")
|
||||
return fmt.Errorf("failed to create temp directory in %s: %w", workDir, err)
|
||||
}
|
||||
defer os.RemoveAll(tempDir)
|
||||
|
||||
// Extract archive
|
||||
e.log.Info("Extracting cluster archive", "archive", archivePath, "tempDir", tempDir)
|
||||
if err := e.extractArchive(ctx, archivePath, tempDir); err != nil {
|
||||
operation.Fail("Archive extraction failed")
|
||||
return fmt.Errorf("failed to extract archive: %w", err)
|
||||
}
|
||||
// Extract archive
|
||||
e.log.Info("Extracting cluster archive", "archive", archivePath, "tempDir", tempDir)
|
||||
if err := e.extractArchive(ctx, archivePath, tempDir); err != nil {
|
||||
operation.Fail("Archive extraction failed")
|
||||
return fmt.Errorf("failed to extract archive: %w", err)
|
||||
}
|
||||
|
||||
// Check context validity after extraction (debugging context cancellation issues)
|
||||
if ctx.Err() != nil {
|
||||
e.log.Error("Context cancelled after extraction - this should not happen",
|
||||
"context_error", ctx.Err(),
|
||||
"extraction_completed", true)
|
||||
operation.Fail("Context cancelled unexpectedly")
|
||||
return fmt.Errorf("context cancelled after extraction completed: %w", ctx.Err())
|
||||
// Check context validity after extraction (debugging context cancellation issues)
|
||||
if ctx.Err() != nil {
|
||||
e.log.Error("Context cancelled after extraction - this should not happen",
|
||||
"context_error", ctx.Err(),
|
||||
"extraction_completed", true)
|
||||
operation.Fail("Context cancelled unexpectedly")
|
||||
return fmt.Errorf("context cancelled after extraction completed: %w", ctx.Err())
|
||||
}
|
||||
e.log.Info("Extraction completed, context still valid")
|
||||
}
|
||||
e.log.Info("Extraction completed, context still valid")
|
||||
|
||||
// Check if user has superuser privileges (required for ownership restoration)
|
||||
e.progress.Update("Checking privileges...")
|
||||
@ -1042,6 +1172,27 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
e.log.Warn("Preflight checks failed", "error", preflightErr)
|
||||
}
|
||||
|
||||
// 🛡️ LARGE DATABASE GUARD - Bulletproof protection for large database restores
|
||||
e.progress.Update("Analyzing database characteristics...")
|
||||
guard := NewLargeDBGuard(e.cfg, e.log)
|
||||
|
||||
// Build list of dump files for analysis
|
||||
var dumpFilePaths []string
|
||||
for _, entry := range entries {
|
||||
if !entry.IsDir() {
|
||||
dumpFilePaths = append(dumpFilePaths, filepath.Join(dumpsDir, entry.Name()))
|
||||
}
|
||||
}
|
||||
|
||||
// Determine optimal restore strategy
|
||||
strategy := guard.DetermineStrategy(ctx, archivePath, dumpFilePaths)
|
||||
|
||||
// Apply strategy (override config if needed)
|
||||
if strategy.UseConservative {
|
||||
guard.ApplyStrategy(strategy, e.cfg)
|
||||
guard.WarnUser(strategy, e.silentMode)
|
||||
}
|
||||
|
||||
// Calculate optimal lock boost based on BLOB count
|
||||
lockBoostValue := 2048 // Default
|
||||
if preflight != nil && preflight.Archive.RecommendedLockBoost > 0 {
|
||||
@ -1050,23 +1201,88 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
|
||||
// AUTO-TUNE: Boost PostgreSQL settings for large restores
|
||||
e.progress.Update("Tuning PostgreSQL for large restore...")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Attempting to boost PostgreSQL lock settings",
|
||||
"target_max_locks", lockBoostValue,
|
||||
"conservative_mode", strategy.UseConservative)
|
||||
}
|
||||
|
||||
originalSettings, tuneErr := e.boostPostgreSQLSettings(ctx, lockBoostValue)
|
||||
if tuneErr != nil {
|
||||
e.log.Warn("Could not boost PostgreSQL settings - restore may fail on BLOB-heavy databases",
|
||||
"error", tuneErr)
|
||||
} else {
|
||||
e.log.Info("Boosted PostgreSQL settings for restore",
|
||||
"max_locks_per_transaction", fmt.Sprintf("%d → %d", originalSettings.MaxLocks, lockBoostValue),
|
||||
"maintenance_work_mem", fmt.Sprintf("%s → 2GB", originalSettings.MaintenanceWorkMem))
|
||||
// Ensure we reset settings when done (even on failure)
|
||||
defer func() {
|
||||
if resetErr := e.resetPostgreSQLSettings(ctx, originalSettings); resetErr != nil {
|
||||
e.log.Warn("Could not reset PostgreSQL settings", "error", resetErr)
|
||||
} else {
|
||||
e.log.Info("Reset PostgreSQL settings to original values")
|
||||
}
|
||||
}()
|
||||
e.log.Error("Could not boost PostgreSQL settings", "error", tuneErr)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] Lock boost attempt FAILED",
|
||||
"error", tuneErr,
|
||||
"phase", "boostPostgreSQLSettings")
|
||||
}
|
||||
|
||||
operation.Fail("PostgreSQL tuning failed")
|
||||
return fmt.Errorf("failed to boost PostgreSQL settings: %w", tuneErr)
|
||||
}
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Lock boost function returned",
|
||||
"original_max_locks", originalSettings.MaxLocks,
|
||||
"target_max_locks", lockBoostValue,
|
||||
"boost_successful", originalSettings.MaxLocks >= lockBoostValue)
|
||||
}
|
||||
|
||||
// CRITICAL: Verify locks were actually increased
|
||||
// Even in conservative mode (--jobs=1), a single massive database can exhaust locks
|
||||
// If boost failed (couldn't restart PostgreSQL), we MUST abort
|
||||
if originalSettings.MaxLocks < lockBoostValue {
|
||||
e.log.Error("PostgreSQL lock boost FAILED - restart required but not possible",
|
||||
"current_locks", originalSettings.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"conservative_mode", strategy.UseConservative,
|
||||
"note", "Even single-threaded restore can fail with massive databases")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] CRITICAL: Lock verification FAILED",
|
||||
"actual_locks", originalSettings.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"delta", lockBoostValue-originalSettings.MaxLocks,
|
||||
"verdict", "ABORT RESTORE")
|
||||
}
|
||||
|
||||
operation.Fail(fmt.Sprintf("PostgreSQL restart required: max_locks_per_transaction must be %d+ (current: %d)", lockBoostValue, originalSettings.MaxLocks))
|
||||
|
||||
// Provide clear instructions
|
||||
e.log.Error("=" + strings.Repeat("=", 70))
|
||||
e.log.Error("RESTORE ABORTED - Action Required:")
|
||||
e.log.Error("1. ALTER SYSTEM has saved max_locks_per_transaction=%d to postgresql.auto.conf", lockBoostValue)
|
||||
e.log.Error("2. Restart PostgreSQL to activate the new setting:")
|
||||
e.log.Error(" sudo systemctl restart postgresql")
|
||||
e.log.Error("3. Retry the restore - it will then complete successfully")
|
||||
e.log.Error("=" + strings.Repeat("=", 70))
|
||||
|
||||
return fmt.Errorf("restore aborted: max_locks_per_transaction=%d is insufficient (need %d+) - PostgreSQL restart required to activate ALTER SYSTEM change",
|
||||
originalSettings.MaxLocks, lockBoostValue)
|
||||
}
|
||||
|
||||
e.log.Info("PostgreSQL tuning verified - locks sufficient for restore",
|
||||
"max_locks_per_transaction", originalSettings.MaxLocks,
|
||||
"target_locks", lockBoostValue,
|
||||
"maintenance_work_mem", "2GB",
|
||||
"conservative_mode", strategy.UseConservative)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Lock verification PASSED",
|
||||
"actual_locks", originalSettings.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"verdict", "PROCEED WITH RESTORE")
|
||||
}
|
||||
|
||||
// Ensure we reset settings when done (even on failure)
|
||||
defer func() {
|
||||
if resetErr := e.resetPostgreSQLSettings(ctx, originalSettings); resetErr != nil {
|
||||
e.log.Warn("Could not reset PostgreSQL settings", "error", resetErr)
|
||||
} else {
|
||||
e.log.Info("Reset PostgreSQL settings to original values")
|
||||
}
|
||||
}()
|
||||
|
||||
var restoreErrors *multierror.Error
|
||||
var restoreErrorsMu sync.Mutex
|
||||
@ -1229,6 +1445,25 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
preserveOwnership := isSuperuser
|
||||
isCompressedSQL := strings.HasSuffix(dumpFile, ".sql.gz")
|
||||
|
||||
// Start heartbeat ticker to show progress during long-running restore
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(dbRestoreStart)
|
||||
mu.Lock()
|
||||
statusMsg := fmt.Sprintf("Restoring %s (%d/%d) - elapsed: %s",
|
||||
dbName, idx+1, totalDBs, formatDuration(elapsed))
|
||||
e.progress.Update(statusMsg)
|
||||
mu.Unlock()
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
var restoreErr error
|
||||
if isCompressedSQL {
|
||||
mu.Lock()
|
||||
@ -1242,6 +1477,10 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
restoreErr = e.restorePostgreSQLDumpWithOwnership(ctx, dumpFile, dbName, false, preserveOwnership)
|
||||
}
|
||||
|
||||
// Stop heartbeat ticker
|
||||
heartbeatTicker.Stop()
|
||||
cancelHeartbeat()
|
||||
|
||||
if restoreErr != nil {
|
||||
mu.Lock()
|
||||
e.log.Error("Failed to restore database", "name", dbName, "file", dumpFile, "error", restoreErr)
|
||||
@ -1484,9 +1723,9 @@ func (pr *progressReader) Read(p []byte) (n int, err error) {
|
||||
n, err = pr.reader.Read(p)
|
||||
pr.bytesRead += int64(n)
|
||||
|
||||
// Throttle progress reporting to every 100ms
|
||||
// Throttle progress reporting to every 50ms for smoother updates
|
||||
if pr.reportEvery == 0 {
|
||||
pr.reportEvery = 100 * time.Millisecond
|
||||
pr.reportEvery = 50 * time.Millisecond
|
||||
}
|
||||
if time.Since(pr.lastReport) > pr.reportEvery {
|
||||
if pr.callback != nil {
|
||||
@ -1500,6 +1739,25 @@ func (pr *progressReader) Read(p []byte) (n int, err error) {
|
||||
|
||||
// extractArchiveShell extracts using shell tar command (faster but no progress)
|
||||
func (e *Engine) extractArchiveShell(ctx context.Context, archivePath, destDir string) error {
|
||||
// Start heartbeat ticker for extraction progress
|
||||
extractionStart := time.Now()
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
defer heartbeatTicker.Stop()
|
||||
defer cancelHeartbeat()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(extractionStart)
|
||||
e.progress.Update(fmt.Sprintf("Extracting archive... (elapsed: %s)", formatDuration(elapsed)))
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
cmd := exec.CommandContext(ctx, "tar", "-xzf", archivePath, "-C", destDir)
|
||||
|
||||
// Stream stderr to avoid memory issues - tar can produce lots of output for large archives
|
||||
@ -2082,6 +2340,25 @@ func FormatBytes(bytes int64) string {
|
||||
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||
}
|
||||
|
||||
// formatDuration formats a duration to human readable format (e.g., "3m 45s", "1h 23m", "45s")
func formatDuration(d time.Duration) string {
	if d < time.Second {
		return "0s"
	}

	hours := int(d.Hours())
	minutes := int(d.Minutes()) % 60
	seconds := int(d.Seconds()) % 60

	if hours > 0 {
		return fmt.Sprintf("%dh %dm", hours, minutes)
	}
	if minutes > 0 {
		return fmt.Sprintf("%dm %ds", minutes, seconds)
	}
	return fmt.Sprintf("%ds", seconds)
}
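A small table-test sketch for the formatter above; it would live alongside the function in the restore package (the `testing` import and the test file are assumptions, not part of this diff).

```go
// Sketch of a table test for formatDuration; expected strings follow the branches above.
func TestFormatDuration(t *testing.T) {
	cases := map[time.Duration]string{
		900 * time.Millisecond:         "0s",
		45 * time.Second:               "45s",
		3*time.Minute + 45*time.Second: "3m 45s",
		83 * time.Minute:               "1h 23m",
	}
	for in, want := range cases {
		if got := formatDuration(in); got != want {
			t.Errorf("formatDuration(%v) = %q, want %q", in, got, want)
		}
	}
}
```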
|
||||
|
||||
// quickValidateSQLDump performs a fast validation of SQL dump files
|
||||
// by checking for truncated COPY blocks. This catches corrupted dumps
|
||||
// BEFORE attempting a full restore (which could waste 49+ minutes).
|
||||
@ -2261,9 +2538,18 @@ type OriginalSettings struct {
|
||||
// NOTE: max_locks_per_transaction requires a PostgreSQL RESTART to take effect!
|
||||
// maintenance_work_mem can be changed with pg_reload_conf().
|
||||
func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int) (*OriginalSettings, error) {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] boostPostgreSQLSettings: Starting lock boost procedure",
|
||||
"target_lock_value", lockBoostValue)
|
||||
}
|
||||
|
||||
connStr := e.buildConnString()
|
||||
db, err := sql.Open("pgx", connStr)
|
||||
if err != nil {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] Failed to connect to PostgreSQL",
|
||||
"error", err)
|
||||
}
|
||||
return nil, fmt.Errorf("failed to connect: %w", err)
|
||||
}
|
||||
defer db.Close()
|
||||
@ -2275,6 +2561,13 @@ func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int
|
||||
if err := db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&maxLocksStr); err == nil {
|
||||
original.MaxLocks, _ = strconv.Atoi(maxLocksStr)
|
||||
}
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Current PostgreSQL lock configuration",
|
||||
"current_max_locks", original.MaxLocks,
|
||||
"target_max_locks", lockBoostValue,
|
||||
"boost_required", original.MaxLocks < lockBoostValue)
|
||||
}
|
||||
|
||||
// Get current maintenance_work_mem
|
||||
db.QueryRowContext(ctx, "SHOW maintenance_work_mem").Scan(&original.MaintenanceWorkMem)
|
||||
@ -2283,14 +2576,31 @@ func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int
|
||||
// pg_reload_conf() is NOT sufficient for this parameter.
|
||||
needsRestart := false
|
||||
if original.MaxLocks < lockBoostValue {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Executing ALTER SYSTEM to boost locks",
|
||||
"from", original.MaxLocks,
|
||||
"to", lockBoostValue)
|
||||
}
|
||||
|
||||
_, err = db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", lockBoostValue))
|
||||
if err != nil {
|
||||
e.log.Warn("Could not set max_locks_per_transaction", "error", err)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] ALTER SYSTEM failed",
|
||||
"error", err)
|
||||
}
|
||||
} else {
|
||||
needsRestart = true
|
||||
e.log.Warn("max_locks_per_transaction requires PostgreSQL restart to take effect",
|
||||
"current", original.MaxLocks,
|
||||
"target", lockBoostValue)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] ALTER SYSTEM succeeded - restart required",
|
||||
"setting_saved_to", "postgresql.auto.conf",
|
||||
"active_after", "PostgreSQL restart")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -2309,28 +2619,62 @@ func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int
|
||||
|
||||
// If max_locks_per_transaction needs a restart, try to do it
|
||||
if needsRestart {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Attempting PostgreSQL restart to activate new lock setting")
|
||||
}
|
||||
|
||||
if restarted := e.tryRestartPostgreSQL(ctx); restarted {
|
||||
e.log.Info("PostgreSQL restarted successfully - max_locks_per_transaction now active")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] PostgreSQL restart SUCCEEDED")
|
||||
}
|
||||
|
||||
// Wait for PostgreSQL to be ready
|
||||
time.Sleep(3 * time.Second)
|
||||
// Update original.MaxLocks to reflect the new value after restart
|
||||
var newMaxLocksStr string
|
||||
if err := db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&newMaxLocksStr); err == nil {
|
||||
original.MaxLocks, _ = strconv.Atoi(newMaxLocksStr)
|
||||
e.log.Info("Verified new max_locks_per_transaction after restart", "value", original.MaxLocks)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Post-restart verification",
|
||||
"new_max_locks", original.MaxLocks,
|
||||
"target_was", lockBoostValue,
|
||||
"verification", "PASS")
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Cannot restart - warn user but continue
|
||||
// The setting is written to postgresql.auto.conf and will take effect on next restart
|
||||
e.log.Warn("=" + strings.Repeat("=", 70))
|
||||
e.log.Warn("NOTE: max_locks_per_transaction change requires PostgreSQL restart")
|
||||
e.log.Warn("Current value: " + strconv.Itoa(original.MaxLocks) + ", target: " + strconv.Itoa(lockBoostValue))
|
||||
e.log.Warn("")
|
||||
e.log.Warn("The setting has been saved to postgresql.auto.conf and will take")
|
||||
e.log.Warn("effect on the next PostgreSQL restart. If restore fails with")
|
||||
e.log.Warn("'out of shared memory' errors, ask your DBA to restart PostgreSQL.")
|
||||
e.log.Warn("")
|
||||
e.log.Warn("Continuing with restore - this may succeed if your databases")
|
||||
e.log.Warn("don't have many large objects (BLOBs).")
|
||||
e.log.Warn("=" + strings.Repeat("=", 70))
|
||||
// Continue anyway - might work for small restores or DBs without BLOBs
|
||||
// Cannot restart - this is now a CRITICAL failure
|
||||
// We tried to boost locks but can't apply them without restart
|
||||
e.log.Error("CRITICAL: max_locks_per_transaction boost requires PostgreSQL restart")
|
||||
e.log.Error("Current value: "+strconv.Itoa(original.MaxLocks)+", required: "+strconv.Itoa(lockBoostValue))
|
||||
e.log.Error("The setting has been saved to postgresql.auto.conf but is NOT ACTIVE")
|
||||
e.log.Error("Restore will ABORT to prevent 'out of shared memory' failure")
|
||||
e.log.Error("Action required: Ask DBA to restart PostgreSQL, then retry restore")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] PostgreSQL restart FAILED",
|
||||
"current_locks", original.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"setting_saved", true,
|
||||
"setting_active", false,
|
||||
"verdict", "ABORT - Manual restart required")
|
||||
}
|
||||
|
||||
// Return original settings so caller can check and abort
|
||||
return original, nil
|
||||
}
|
||||
}
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] boostPostgreSQLSettings: Complete",
|
||||
"final_max_locks", original.MaxLocks,
|
||||
"target_was", lockBoostValue,
|
||||
"boost_successful", original.MaxLocks >= lockBoostValue)
|
||||
}
|
||||
|
||||
return original, nil
|
||||
}
|
||||
|
||||
|
||||
344
internal/restore/extract.go
Normal file
@ -0,0 +1,344 @@
|
||||
package restore
|
||||
|
||||
import (
|
||||
"archive/tar"
|
||||
"compress/gzip"
|
||||
"context"
|
||||
"fmt"
|
||||
"io"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sort"
|
||||
"strings"
|
||||
|
||||
"dbbackup/internal/logger"
|
||||
"dbbackup/internal/progress"
|
||||
)
|
||||
|
||||
// DatabaseInfo represents metadata about a database in a cluster backup
|
||||
type DatabaseInfo struct {
|
||||
Name string
|
||||
Filename string
|
||||
Size int64
|
||||
}
|
||||
|
||||
// ListDatabasesInCluster lists all databases in a cluster backup archive
|
||||
func ListDatabasesInCluster(ctx context.Context, archivePath string, log logger.Logger) ([]DatabaseInfo, error) {
|
||||
file, err := os.Open(archivePath)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot open archive: %w", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
gz, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("not a valid gzip archive: %w", err)
|
||||
}
|
||||
defer gz.Close()
|
||||
|
||||
tarReader := tar.NewReader(gz)
|
||||
databases := make([]DatabaseInfo, 0)
|
||||
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return nil, ctx.Err()
|
||||
default:
|
||||
}
|
||||
|
||||
header, err := tarReader.Next()
|
||||
if err == io.EOF {
|
||||
break
|
||||
}
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("error reading tar archive: %w", err)
|
||||
}
|
||||
|
||||
// Look for files in dumps/ directory
|
||||
if !header.FileInfo().IsDir() && strings.HasPrefix(header.Name, "dumps/") {
|
||||
filename := filepath.Base(header.Name)
|
||||
|
||||
// Extract database name from filename (remove .dump, .dump.gz, .sql, .sql.gz)
|
||||
dbName := filename
|
||||
dbName = strings.TrimSuffix(dbName, ".dump.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".dump")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql")
|
||||
|
||||
databases = append(databases, DatabaseInfo{
|
||||
Name: dbName,
|
||||
Filename: filename,
|
||||
Size: header.Size,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// Sort by name for consistent output
|
||||
sort.Slice(databases, func(i, j int) bool {
|
||||
return databases[i].Name < databases[j].Name
|
||||
})
|
||||
|
||||
if len(databases) == 0 {
|
||||
return nil, fmt.Errorf("no databases found in cluster backup")
|
||||
}
|
||||
|
||||
return databases, nil
|
||||
}
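A minimal caller sketch for the listing function above, roughly what the `--list-databases` CLI path would do; it assumes it lives in the restore package and that `log` is an already-constructed logger.Logger.

```go
// Print every database in a cluster archive with its size (sketch, error handling abbreviated).
func printClusterContents(ctx context.Context, archivePath string, log logger.Logger) error {
	dbs, err := ListDatabasesInCluster(ctx, archivePath, log)
	if err != nil {
		return err
	}
	for _, db := range dbs {
		fmt.Printf("%-40s %10s  (%s)\n", db.Name, formatBytes(db.Size), db.Filename)
	}
	return nil
}
```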
|
||||
|
||||
// ExtractDatabaseFromCluster extracts a single database dump from cluster backup
|
||||
func ExtractDatabaseFromCluster(ctx context.Context, archivePath, dbName, outputDir string, log logger.Logger, prog progress.Indicator) (string, error) {
|
||||
file, err := os.Open(archivePath)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("cannot open archive: %w", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
stat, err := file.Stat()
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("cannot stat archive: %w", err)
|
||||
}
|
||||
archiveSize := stat.Size()
|
||||
|
||||
gz, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("not a valid gzip archive: %w", err)
|
||||
}
|
||||
defer gz.Close()
|
||||
|
||||
tarReader := tar.NewReader(gz)
|
||||
|
||||
// Create output directory if needed
|
||||
if err := os.MkdirAll(outputDir, 0755); err != nil {
|
||||
return "", fmt.Errorf("cannot create output directory: %w", err)
|
||||
}
|
||||
|
||||
targetPattern := fmt.Sprintf("dumps/%s.", dbName) // Match dbName.dump, dbName.sql, etc.
|
||||
var extractedPath string
|
||||
found := false
|
||||
|
||||
if prog != nil {
|
||||
prog.Start(fmt.Sprintf("Extracting database: %s", dbName))
|
||||
defer prog.Stop()
|
||||
}
|
||||
|
||||
var bytesRead int64
|
||||
ticker := make(chan struct{})
|
||||
stopTicker := make(chan struct{})
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return
|
||||
case <-stopTicker:
|
||||
return
|
||||
case <-ticker:
|
||||
if prog != nil && archiveSize > 0 {
|
||||
percentage := float64(bytesRead) / float64(archiveSize) * 100
|
||||
prog.Update(fmt.Sprintf("Scanning: %.1f%%", percentage))
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
close(stopTicker)
|
||||
return "", ctx.Err()
|
||||
default:
|
||||
}
|
||||
|
||||
header, err := tarReader.Next()
|
||||
if err == io.EOF {
|
||||
break
|
||||
}
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return "", fmt.Errorf("error reading tar archive: %w", err)
|
||||
}
|
||||
|
||||
bytesRead += header.Size
|
||||
select {
|
||||
case ticker <- struct{}{}:
|
||||
default:
|
||||
}
|
||||
|
||||
// Check if this is the database we're looking for
|
||||
if strings.HasPrefix(header.Name, targetPattern) && !header.FileInfo().IsDir() {
|
||||
filename := filepath.Base(header.Name)
|
||||
extractedPath = filepath.Join(outputDir, filename)
|
||||
|
||||
// Extract the file
|
||||
outFile, err := os.Create(extractedPath)
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return "", fmt.Errorf("cannot create output file: %w", err)
|
||||
}
|
||||
|
||||
if prog != nil {
|
||||
prog.Update(fmt.Sprintf("Extracting: %s", filename))
|
||||
}
|
||||
|
||||
written, err := io.Copy(outFile, tarReader)
|
||||
outFile.Close()
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return "", fmt.Errorf("extraction failed: %w", err)
|
||||
}
|
||||
|
||||
log.Info("Database extracted successfully", "database", dbName, "size", formatBytes(written), "path", extractedPath)
|
||||
found = true
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
close(stopTicker)
|
||||
|
||||
if !found {
|
||||
return "", fmt.Errorf("database '%s' not found in cluster backup", dbName)
|
||||
}
|
||||
|
||||
return extractedPath, nil
|
||||
}
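A hedged sketch of the extract-only flow for a single database; the nil progress indicator is tolerated by the nil checks above, and the wrapper name is illustrative.

```go
// Extract one database dump from a cluster archive without restoring it.
func extractOne(ctx context.Context, archivePath, dbName, outputDir string, log logger.Logger) (string, error) {
	path, err := ExtractDatabaseFromCluster(ctx, archivePath, dbName, outputDir, log, nil)
	if err != nil {
		return "", fmt.Errorf("extract %s: %w", dbName, err)
	}
	log.Info("dump ready for restore or inspection", "path", path)
	return path, nil
}
```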
|
||||
|
||||
// ExtractMultipleDatabasesFromCluster extracts multiple databases from cluster backup
|
||||
func ExtractMultipleDatabasesFromCluster(ctx context.Context, archivePath string, dbNames []string, outputDir string, log logger.Logger, prog progress.Indicator) (map[string]string, error) {
|
||||
file, err := os.Open(archivePath)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot open archive: %w", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
stat, err := file.Stat()
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot stat archive: %w", err)
|
||||
}
|
||||
archiveSize := stat.Size()
|
||||
|
||||
gz, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("not a valid gzip archive: %w", err)
|
||||
}
|
||||
defer gz.Close()
|
||||
|
||||
tarReader := tar.NewReader(gz)
|
||||
|
||||
// Create output directory if needed
|
||||
if err := os.MkdirAll(outputDir, 0755); err != nil {
|
||||
return nil, fmt.Errorf("cannot create output directory: %w", err)
|
||||
}
|
||||
|
||||
// Build lookup map
|
||||
targetDBs := make(map[string]bool)
|
||||
for _, dbName := range dbNames {
|
||||
targetDBs[dbName] = true
|
||||
}
|
||||
|
||||
extractedPaths := make(map[string]string)
|
||||
|
||||
if prog != nil {
|
||||
prog.Start(fmt.Sprintf("Extracting %d databases", len(dbNames)))
|
||||
defer prog.Stop()
|
||||
}
|
||||
|
||||
var bytesRead int64
|
||||
ticker := make(chan struct{})
|
||||
stopTicker := make(chan struct{})
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return
|
||||
case <-stopTicker:
|
||||
return
|
||||
case <-ticker:
|
||||
if prog != nil && archiveSize > 0 {
|
||||
percentage := float64(bytesRead) / float64(archiveSize) * 100
|
||||
prog.Update(fmt.Sprintf("Scanning: %.1f%% (%d/%d found)", percentage, len(extractedPaths), len(dbNames)))
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
close(stopTicker)
|
||||
return nil, ctx.Err()
|
||||
default:
|
||||
}
|
||||
|
||||
header, err := tarReader.Next()
|
||||
if err == io.EOF {
|
||||
break
|
||||
}
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return nil, fmt.Errorf("error reading tar archive: %w", err)
|
||||
}
|
||||
|
||||
bytesRead += header.Size
|
||||
select {
|
||||
case ticker <- struct{}{}:
|
||||
default:
|
||||
}
|
||||
|
||||
// Check if this is one of the databases we're looking for
|
||||
if strings.HasPrefix(header.Name, "dumps/") && !header.FileInfo().IsDir() {
|
||||
filename := filepath.Base(header.Name)
|
||||
|
||||
// Extract database name
|
||||
dbName := filename
|
||||
dbName = strings.TrimSuffix(dbName, ".dump.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".dump")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql")
|
||||
|
||||
if targetDBs[dbName] {
|
||||
extractedPath := filepath.Join(outputDir, filename)
|
||||
|
||||
// Extract the file
|
||||
outFile, err := os.Create(extractedPath)
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return nil, fmt.Errorf("cannot create output file for %s: %w", dbName, err)
|
||||
}
|
||||
|
||||
if prog != nil {
|
||||
prog.Update(fmt.Sprintf("Extracting: %s (%d/%d)", dbName, len(extractedPaths)+1, len(dbNames)))
|
||||
}
|
||||
|
||||
written, err := io.Copy(outFile, tarReader)
|
||||
outFile.Close()
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return nil, fmt.Errorf("extraction failed for %s: %w", dbName, err)
|
||||
}
|
||||
|
||||
log.Info("Database extracted", "database", dbName, "size", formatBytes(written))
|
||||
extractedPaths[dbName] = extractedPath
|
||||
|
||||
// Stop early if we found all databases
|
||||
if len(extractedPaths) == len(dbNames) {
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
close(stopTicker)
|
||||
|
||||
// Check if all requested databases were found
|
||||
missing := make([]string, 0)
|
||||
for _, dbName := range dbNames {
|
||||
if _, found := extractedPaths[dbName]; !found {
|
||||
missing = append(missing, dbName)
|
||||
}
|
||||
}
|
||||
|
||||
if len(missing) > 0 {
|
||||
return extractedPaths, fmt.Errorf("databases not found in cluster backup: %s", strings.Join(missing, ", "))
|
||||
}
|
||||
|
||||
return extractedPaths, nil
|
||||
}
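A sketch of feeding a comma-separated list (as accepted by the `--databases` flag) into the bulk extractor above; `listFlag` is an assumed CLI value, and the flag parsing itself happens elsewhere.

```go
// Split "app1,app2,app3" into the dbNames slice expected above and extract them all.
names := strings.Split(listFlag, ",")
for i := range names {
	names[i] = strings.TrimSpace(names[i])
}
paths, err := ExtractMultipleDatabasesFromCluster(ctx, archivePath, names, outputDir, log, nil)
if err != nil {
	return err // databases extracted before the failure may still be present in paths
}
for db, p := range paths {
	log.Info("extracted", "database", db, "path", p)
}
```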
|
||||
363
internal/restore/large_db_guard.go
Normal file
@ -0,0 +1,363 @@
|
||||
package restore
|
||||
|
||||
import (
|
||||
"context"
|
||||
"database/sql"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/logger"
|
||||
)
|
||||
|
||||
// LargeDBGuard provides bulletproof protection for large database restores
|
||||
type LargeDBGuard struct {
|
||||
log logger.Logger
|
||||
cfg *config.Config
|
||||
}
|
||||
|
||||
// RestoreStrategy determines how to restore based on database characteristics
|
||||
type RestoreStrategy struct {
|
||||
UseConservative bool // Force conservative (single-threaded) mode
|
||||
Reason string // Why this strategy was chosen
|
||||
Jobs int // Recommended --jobs value
|
||||
ParallelDBs int // Recommended parallel database restores
|
||||
ExpectedTime string // Estimated restore time
|
||||
}
|
||||
|
||||
// NewLargeDBGuard creates a new guard
|
||||
func NewLargeDBGuard(cfg *config.Config, log logger.Logger) *LargeDBGuard {
|
||||
return &LargeDBGuard{
|
||||
cfg: cfg,
|
||||
log: log,
|
||||
}
|
||||
}
|
||||
|
||||
// DetermineStrategy analyzes the restore and determines the safest approach
|
||||
func (g *LargeDBGuard) DetermineStrategy(ctx context.Context, archivePath string, dumpFiles []string) *RestoreStrategy {
|
||||
strategy := &RestoreStrategy{
|
||||
UseConservative: false,
|
||||
Jobs: 0, // Will use profile default
|
||||
ParallelDBs: 0, // Will use profile default
|
||||
}
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Large DB Guard: Starting strategy analysis",
|
||||
"archive", archivePath,
|
||||
"dump_count", len(dumpFiles))
|
||||
}
|
||||
|
||||
// 1. Check for large objects (BLOBs)
|
||||
hasLargeObjects, blobCount := g.detectLargeObjects(ctx, dumpFiles)
|
||||
if hasLargeObjects {
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("Database contains %d large objects (BLOBs)", blobCount)
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
|
||||
if blobCount > 10000 {
|
||||
strategy.ExpectedTime = "8-12 hours for very large BLOB database"
|
||||
} else if blobCount > 1000 {
|
||||
strategy.ExpectedTime = "4-8 hours for large BLOB database"
|
||||
} else {
|
||||
strategy.ExpectedTime = "2-4 hours"
|
||||
}
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: Forcing conservative mode",
|
||||
"blob_count", blobCount,
|
||||
"reason", strategy.Reason)
|
||||
return strategy
|
||||
}
|
||||
|
||||
// 2. Check total database size
|
||||
totalSize := g.estimateTotalSize(dumpFiles)
|
||||
if totalSize > 50*1024*1024*1024 { // > 50GB
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("Total database size: %s (>50GB)", FormatBytes(totalSize))
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
strategy.ExpectedTime = "6-10 hours for very large database"
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: Forcing conservative mode",
|
||||
"total_size_gb", totalSize/(1024*1024*1024),
|
||||
"reason", strategy.Reason)
|
||||
return strategy
|
||||
}
|
||||
|
||||
// 3. Check PostgreSQL lock configuration
|
||||
// CRITICAL: ALWAYS force conservative mode unless locks are 4096+
|
||||
// Parallel restore exhausts locks even with 2048 and high connection count
|
||||
// This is the PRIMARY protection - lock exhaustion is the #1 failure mode
|
||||
maxLocks, maxConns := g.checkLockConfiguration(ctx)
|
||||
lockCapacity := maxLocks * maxConns
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] PostgreSQL lock configuration detected",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"calculated_capacity", lockCapacity,
|
||||
"threshold_required", 4096,
|
||||
"below_threshold", maxLocks < 4096)
|
||||
}
|
||||
|
||||
if maxLocks < 4096 {
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("PostgreSQL max_locks_per_transaction=%d (need 4096+ for parallel restore)", maxLocks)
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: FORCING conservative mode - lock protection",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"total_capacity", lockCapacity,
|
||||
"required_locks", 4096,
|
||||
"reason", strategy.Reason)
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Guard decision: CONSERVATIVE mode",
|
||||
"jobs", 1,
|
||||
"parallel_dbs", 1,
|
||||
"reason", "Lock threshold not met (max_locks < 4096)")
|
||||
}
|
||||
return strategy
|
||||
}
|
||||
|
||||
g.log.Info("✅ Large DB Guard: Lock configuration OK for parallel restore",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"total_capacity", lockCapacity)
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Lock check PASSED - parallel restore allowed",
|
||||
"max_locks", maxLocks,
|
||||
"threshold", 4096,
|
||||
"verdict", "PASS")
|
||||
}
|
||||
|
||||
// 4. Check individual dump file sizes
|
||||
largestDump := g.findLargestDump(dumpFiles)
|
||||
if largestDump.size > 10*1024*1024*1024 { // > 10GB single dump
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("Largest database: %s (%s)", largestDump.name, FormatBytes(largestDump.size))
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: Forcing conservative mode",
|
||||
"largest_db", largestDump.name,
|
||||
"size_gb", largestDump.size/(1024*1024*1024),
|
||||
"reason", strategy.Reason)
|
||||
return strategy
|
||||
}
|
||||
|
||||
// All checks passed - safe to use default profile
|
||||
strategy.Reason = "No large database risks detected"
|
||||
g.log.Info("✅ Large DB Guard: Safe to use default profile")
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Final strategy: Default profile (no restrictions)",
|
||||
"use_conservative", false,
|
||||
"reason", strategy.Reason)
|
||||
}
|
||||
|
||||
return strategy
|
||||
}
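For orientation, this mirrors how RestoreCluster drives the guard earlier in this diff: analyze first, then only override the config and warn when a risk was flagged (condensed sketch, no new behaviour).

```go
// Condensed guard lifecycle as used by the cluster restore path.
guard := NewLargeDBGuard(cfg, log)
strategy := guard.DetermineStrategy(ctx, archivePath, dumpFilePaths)
if strategy.UseConservative {
	guard.ApplyStrategy(strategy, cfg)   // forces Jobs / ClusterParallelism down to 1
	guard.WarnUser(strategy, silentMode) // banner on stdout, or a log line in TUI mode
}
```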
|
||||
|
||||
// detectLargeObjects checks dump files for BLOBs/large objects
|
||||
func (g *LargeDBGuard) detectLargeObjects(ctx context.Context, dumpFiles []string) (bool, int) {
|
||||
totalBlobCount := 0
|
||||
|
||||
for _, dumpFile := range dumpFiles {
|
||||
// Skip if not a custom format dump
|
||||
if !strings.HasSuffix(dumpFile, ".dump") {
|
||||
continue
|
||||
}
|
||||
|
||||
// Use pg_restore -l to list contents (fast)
|
||||
listCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
|
||||
cmd := exec.CommandContext(listCtx, "pg_restore", "-l", dumpFile)
|
||||
output, err := cmd.Output()
|
||||
cancel()
|
||||
|
||||
if err != nil {
|
||||
continue // Skip on error
|
||||
}
|
||||
|
||||
// Count BLOB entries
|
||||
for _, line := range strings.Split(string(output), "\n") {
|
||||
if strings.Contains(line, "BLOB") ||
|
||||
strings.Contains(line, "LARGE OBJECT") ||
|
||||
strings.Contains(line, " BLOBS ") {
|
||||
totalBlobCount++
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return totalBlobCount > 0, totalBlobCount
|
||||
}
|
||||
|
||||
// estimateTotalSize calculates total size of all dump files
|
||||
func (g *LargeDBGuard) estimateTotalSize(dumpFiles []string) int64 {
|
||||
var total int64
|
||||
for _, file := range dumpFiles {
|
||||
if info, err := os.Stat(file); err == nil {
|
||||
total += info.Size()
|
||||
}
|
||||
}
|
||||
return total
|
||||
}
|
||||
|
||||
// checkLockCapacity gets PostgreSQL lock table capacity
|
||||
func (g *LargeDBGuard) checkLockCapacity(ctx context.Context) int {
|
||||
maxLocks, maxConns := g.checkLockConfiguration(ctx)
|
||||
maxPrepared := 0 // We don't use prepared transactions in restore
|
||||
|
||||
// Calculate total lock capacity
|
||||
capacity := maxLocks * (maxConns + maxPrepared)
|
||||
return capacity
|
||||
}
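The capacity formula above matches PostgreSQL's documented sizing of the shared lock table, max_locks_per_transaction × (max_connections + max_prepared_transactions). With stock defaults that is only 64 × (100 + 0) = 6,400 slots, which is why the guard insists on 4096 locks per transaction before allowing a parallel restore.

```go
// Worked example: PostgreSQL defaults vs. the boosted configuration (illustrative numbers).
defaultCapacity := 64 * (100 + 0)   // 6,400 lockable objects — easily exhausted by BLOB-heavy restores
boostedCapacity := 4096 * (100 + 0) // 409,600 — the headroom the guard requires for parallel mode
```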
|
||||
|
||||
// checkLockConfiguration returns max_locks_per_transaction and max_connections
|
||||
func (g *LargeDBGuard) checkLockConfiguration(ctx context.Context) (int, int) {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Querying PostgreSQL for lock configuration",
|
||||
"host", g.cfg.Host,
|
||||
"port", g.cfg.Port,
|
||||
"user", g.cfg.User)
|
||||
}
|
||||
|
||||
// Build connection string
|
||||
connStr := fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable",
|
||||
g.cfg.Host, g.cfg.Port, g.cfg.User, g.cfg.Password)
|
||||
|
||||
db, err := sql.Open("pgx", connStr)
|
||||
if err != nil {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Warn("🔍 [LOCK-DEBUG] Failed to connect to PostgreSQL, using defaults",
|
||||
"error", err,
|
||||
"default_max_locks", 64,
|
||||
"default_max_connections", 100)
|
||||
}
|
||||
return 64, 100 // PostgreSQL defaults
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
var maxLocks, maxConns int
|
||||
|
||||
// Get max_locks_per_transaction
|
||||
err = db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&maxLocks)
|
||||
if err != nil {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Warn("🔍 [LOCK-DEBUG] Failed to query max_locks_per_transaction",
|
||||
"error", err,
|
||||
"using_default", 64)
|
||||
}
|
||||
maxLocks = 64 // PostgreSQL default
|
||||
}
|
||||
|
||||
// Get max_connections
|
||||
err = db.QueryRowContext(ctx, "SHOW max_connections").Scan(&maxConns)
|
||||
if err != nil {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Warn("🔍 [LOCK-DEBUG] Failed to query max_connections",
|
||||
"error", err,
|
||||
"using_default", 100)
|
||||
}
|
||||
maxConns = 100 // PostgreSQL default
|
||||
}
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Successfully retrieved PostgreSQL lock settings",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"total_capacity", maxLocks*maxConns)
|
||||
}
|
||||
|
||||
return maxLocks, maxConns
|
||||
}
|
||||
|
||||
// findLargestDump finds the largest individual dump file
|
||||
func (g *LargeDBGuard) findLargestDump(dumpFiles []string) struct {
|
||||
name string
|
||||
size int64
|
||||
} {
|
||||
var largest struct {
|
||||
name string
|
||||
size int64
|
||||
}
|
||||
|
||||
for _, file := range dumpFiles {
|
||||
if info, err := os.Stat(file); err == nil {
|
||||
if info.Size() > largest.size {
|
||||
largest.name = filepath.Base(file)
|
||||
largest.size = info.Size()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return largest
|
||||
}
|
||||
|
||||
// ApplyStrategy enforces the recommended strategy
|
||||
func (g *LargeDBGuard) ApplyStrategy(strategy *RestoreStrategy, cfg *config.Config) {
|
||||
if !strategy.UseConservative {
|
||||
return
|
||||
}
|
||||
|
||||
// Override configuration to force conservative settings
|
||||
if strategy.Jobs > 0 {
|
||||
cfg.Jobs = strategy.Jobs
|
||||
}
|
||||
if strategy.ParallelDBs > 0 {
|
||||
cfg.ClusterParallelism = strategy.ParallelDBs
|
||||
}
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard ACTIVE",
|
||||
"reason", strategy.Reason,
|
||||
"jobs", cfg.Jobs,
|
||||
"parallel_dbs", cfg.ClusterParallelism,
|
||||
"expected_time", strategy.ExpectedTime)
|
||||
}
|
||||
|
||||
// WarnUser displays prominent warning about single-threaded restore
|
||||
// In silent mode (TUI), this is skipped to prevent scrambled output
|
||||
func (g *LargeDBGuard) WarnUser(strategy *RestoreStrategy, silentMode bool) {
|
||||
if !strategy.UseConservative {
|
||||
return
|
||||
}
|
||||
|
||||
// In TUI/silent mode, don't print to stdout - it causes scrambled output
|
||||
if silentMode {
|
||||
// Log the warning instead for debugging
|
||||
g.log.Info("Large Database Protection Active",
|
||||
"reason", strategy.Reason,
|
||||
"jobs", strategy.Jobs,
|
||||
"parallel_dbs", strategy.ParallelDBs,
|
||||
"expected_time", strategy.ExpectedTime)
|
||||
return
|
||||
}
|
||||
|
||||
fmt.Println()
|
||||
fmt.Println("╔══════════════════════════════════════════════════════════════╗")
|
||||
fmt.Println("║ 🛡️ LARGE DATABASE PROTECTION ACTIVE 🛡️ ║")
|
||||
fmt.Println("╚══════════════════════════════════════════════════════════════╝")
|
||||
fmt.Println()
|
||||
fmt.Printf(" Reason: %s\n", strategy.Reason)
|
||||
fmt.Println()
|
||||
fmt.Println(" Strategy: SINGLE-THREADED RESTORE (Conservative Mode)")
|
||||
fmt.Println(" • Prevents PostgreSQL lock exhaustion")
|
||||
fmt.Println(" • Guarantees completion without 'out of shared memory' errors")
|
||||
fmt.Println(" • Slower but 100% reliable")
|
||||
fmt.Println()
|
||||
if strategy.ExpectedTime != "" {
|
||||
fmt.Printf(" Estimated Time: %s\n", strategy.ExpectedTime)
|
||||
fmt.Println()
|
||||
}
|
||||
fmt.Println(" This restore will complete successfully. Please be patient.")
|
||||
fmt.Println()
|
||||
fmt.Println("═══════════════════════════════════════════════════════════════")
|
||||
fmt.Println()
|
||||
}
|
||||
@ -190,7 +190,7 @@ func (s *Safety) validateSQLScriptGz(path string) error {
|
||||
return fmt.Errorf("does not appear to contain SQL content")
|
||||
}
|
||||
|
||||
// validateTarGz validates tar.gz archive
|
||||
// validateTarGz validates tar.gz archive with fast stream-based checks
|
||||
func (s *Safety) validateTarGz(path string) error {
|
||||
file, err := os.Open(path)
|
||||
if err != nil {
|
||||
@ -205,11 +205,40 @@ func (s *Safety) validateTarGz(path string) error {
|
||||
return fmt.Errorf("cannot read file header")
|
||||
}
|
||||
|
||||
if buffer[0] == 0x1f && buffer[1] == 0x8b {
|
||||
return nil // Valid gzip header
|
||||
if buffer[0] != 0x1f || buffer[1] != 0x8b {
|
||||
return fmt.Errorf("not a valid gzip file")
|
||||
}
|
||||
|
||||
return fmt.Errorf("not a valid gzip file")
|
||||
// Quick tar structure validation (stream-based, no full extraction)
|
||||
// Reset to start and decompress first few KB to check tar header
|
||||
file.Seek(0, 0)
|
||||
gzReader, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return fmt.Errorf("gzip corruption detected: %w", err)
|
||||
}
|
||||
defer gzReader.Close()
|
||||
|
||||
// Read first tar header to verify it's a valid tar archive
|
||||
headerBuf := make([]byte, 512) // Tar header is 512 bytes
|
||||
n, err = gzReader.Read(headerBuf)
|
||||
if err != nil && err != io.EOF {
|
||||
return fmt.Errorf("failed to read tar header: %w", err)
|
||||
}
|
||||
if n < 512 {
|
||||
return fmt.Errorf("archive too small or corrupted")
|
||||
}
|
||||
|
||||
// Check tar magic ("ustar\0" at offset 257)
|
||||
if len(headerBuf) >= 263 {
|
||||
magic := string(headerBuf[257:262])
|
||||
if magic != "ustar" {
|
||||
s.log.Debug("No tar magic found, but may still be valid tar", "magic", magic)
|
||||
// Don't fail - some tar implementations don't use magic
|
||||
}
|
||||
}
|
||||
|
||||
s.log.Debug("Cluster archive validation passed (stream-based check)")
|
||||
return nil // Valid gzip + tar structure
|
||||
}
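The checks above stay cheap because they never decompress more than the first tar header. A standalone illustration of the same two magic-number constants (gzip 0x1f 0x8b at the start, POSIX "ustar" at byte offset 257 of the 512-byte tar header) — stricter than validateTarGz, which tolerates a missing tar magic:

```go
// Standalone illustration of the header checks used by validateTarGz (sketch only).
func looksLikeTarGz(gzipHeader, tarHeader []byte) bool {
	if len(gzipHeader) < 2 || gzipHeader[0] != 0x1f || gzipHeader[1] != 0x8b {
		return false // not gzip-compressed
	}
	// Some tar writers omit the magic, which is why validateTarGz only logs a debug
	// message instead of failing; this sketch is deliberately stricter.
	return len(tarHeader) >= 262 && string(tarHeader[257:262]) == "ustar"
}
```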
|
||||
|
||||
// containsSQLKeywords checks if content contains SQL keywords
|
||||
@ -228,6 +257,42 @@ func containsSQLKeywords(content string) bool {
|
||||
return false
|
||||
}
|
||||
|
||||
// ValidateAndExtractCluster performs validation and pre-extraction for cluster restore
|
||||
// Returns path to extracted directory (in temp location) to avoid double-extraction
|
||||
// Caller must clean up the returned directory with os.RemoveAll() when done
|
||||
func (s *Safety) ValidateAndExtractCluster(ctx context.Context, archivePath string) (extractedDir string, err error) {
|
||||
// First validate archive integrity (fast stream check)
|
||||
if err := s.ValidateArchive(archivePath); err != nil {
|
||||
return "", fmt.Errorf("archive validation failed: %w", err)
|
||||
}
|
||||
|
||||
// Create temp directory for extraction in configured WorkDir
|
||||
workDir := s.cfg.GetEffectiveWorkDir()
|
||||
if workDir == "" {
|
||||
workDir = s.cfg.BackupDir
|
||||
}
|
||||
|
||||
tempDir, err := os.MkdirTemp(workDir, "dbbackup-cluster-extract-*")
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("failed to create temp extraction directory in %s: %w", workDir, err)
|
||||
}
|
||||
|
||||
// Extract using tar command (fastest method)
|
||||
s.log.Info("Pre-extracting cluster archive for validation and restore",
|
||||
"archive", archivePath,
|
||||
"dest", tempDir)
|
||||
|
||||
cmd := exec.CommandContext(ctx, "tar", "-xzf", archivePath, "-C", tempDir)
|
||||
output, err := cmd.CombinedOutput()
|
||||
if err != nil {
|
||||
os.RemoveAll(tempDir) // Cleanup on failure
|
||||
return "", fmt.Errorf("extraction failed: %w: %s", err, string(output))
|
||||
}
|
||||
|
||||
s.log.Info("Cluster archive extracted successfully", "location", tempDir)
|
||||
return tempDir, nil
|
||||
}
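A hedged sketch of the single-extraction flow this function enables: validate and extract once, then hand the directory to the engine so it skips its own extraction. The variadic pre-extracted argument is the one consumed as preExtractedPath[0] in the engine code above; `safety` and `engine` are assumed to be already-constructed values.

```go
// Validate the archive, extract it once, and reuse the directory for the actual restore.
extractedDir, err := safety.ValidateAndExtractCluster(ctx, archivePath)
if err != nil {
	return fmt.Errorf("preflight validation failed: %w", err)
}
defer os.RemoveAll(extractedDir) // caller-owned cleanup, as documented above

// Passing the directory as the optional trailing argument sets usingPreExtracted
// in RestoreCluster and skips the duplicate extraction.
if err := engine.RestoreCluster(ctx, archivePath, extractedDir); err != nil {
	return err
}
```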
|
||||
|
||||
// CheckDiskSpace verifies sufficient disk space for restore
|
||||
// Uses the effective work directory (WorkDir if set, otherwise BackupDir) since
|
||||
// that's where extraction actually happens for large databases
|
||||
|
||||
@ -214,8 +214,9 @@ func (m ArchiveBrowserModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
}
|
||||
|
||||
if m.mode == "restore-single" && selected.Format.IsClusterBackup() {
|
||||
m.message = errorStyle.Render("[FAIL] Please select a single database backup")
|
||||
return m, nil
|
||||
// Cluster backup selected in single restore mode - offer to select individual database
|
||||
clusterSelector := NewClusterDatabaseSelector(m.config, m.logger, m, m.ctx, selected, "single", false)
|
||||
return clusterSelector, clusterSelector.Init()
|
||||
}
|
||||
|
||||
// Open restore preview
|
||||
@ -223,6 +224,18 @@ func (m ArchiveBrowserModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
return preview, preview.Init()
|
||||
}
|
||||
|
||||
case "s":
|
||||
// Select single database from cluster (shortcut key)
|
||||
if len(m.archives) > 0 && m.cursor < len(m.archives) {
|
||||
selected := m.archives[m.cursor]
|
||||
if selected.Format.IsClusterBackup() {
|
||||
clusterSelector := NewClusterDatabaseSelector(m.config, m.logger, m, m.ctx, selected, "single", false)
|
||||
return clusterSelector, clusterSelector.Init()
|
||||
} else {
|
||||
m.message = infoStyle.Render("💡 [s] only works with cluster backups")
|
||||
}
|
||||
}
|
||||
|
||||
case "i":
|
||||
// Show detailed info
|
||||
if len(m.archives) > 0 && m.cursor < len(m.archives) {
|
||||
@ -351,7 +364,7 @@ func (m ArchiveBrowserModel) View() string {
|
||||
s.WriteString(infoStyle.Render(fmt.Sprintf("Total: %d archive(s) | Selected: %d/%d",
|
||||
len(m.archives), m.cursor+1, len(m.archives))))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render("[KEY] ↑/↓: Navigate | Enter: Select | d: Diagnose | f: Filter | i: Info | Esc: Back"))
|
||||
s.WriteString(infoStyle.Render("[KEY] ↑/↓: Navigate | Enter: Select | s: Single DB from Cluster | d: Diagnose | f: Filter | i: Info | Esc: Back"))
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
281
internal/tui/cluster_db_selector.go
Normal file
@ -0,0 +1,281 @@
|
||||
package tui
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"strings"
|
||||
|
||||
tea "github.com/charmbracelet/bubbletea"
|
||||
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/logger"
|
||||
"dbbackup/internal/restore"
|
||||
)
|
||||
|
||||
// ClusterDatabaseSelectorModel for selecting databases from a cluster backup
|
||||
type ClusterDatabaseSelectorModel struct {
|
||||
config *config.Config
|
||||
logger logger.Logger
|
||||
parent tea.Model
|
||||
ctx context.Context
|
||||
archive ArchiveInfo
|
||||
databases []restore.DatabaseInfo
|
||||
cursor int
|
||||
selected map[int]bool // Track multiple selections
|
||||
loading bool
|
||||
err error
|
||||
title string
|
||||
mode string // "single" or "multiple"
|
||||
extractOnly bool // If true, extract without restoring
|
||||
}
|
||||
|
||||
func NewClusterDatabaseSelector(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, archive ArchiveInfo, mode string, extractOnly bool) ClusterDatabaseSelectorModel {
|
||||
return ClusterDatabaseSelectorModel{
|
||||
config: cfg,
|
||||
logger: log,
|
||||
parent: parent,
|
||||
ctx: ctx,
|
||||
archive: archive,
|
||||
databases: nil,
|
||||
selected: make(map[int]bool),
|
||||
title: "Select Database(s) from Cluster Backup",
|
||||
loading: true,
|
||||
mode: mode,
|
||||
extractOnly: extractOnly,
|
||||
}
|
||||
}
|
||||
|
||||
func (m ClusterDatabaseSelectorModel) Init() tea.Cmd {
|
||||
return fetchClusterDatabases(m.ctx, m.archive, m.logger)
|
||||
}
|
||||
|
||||
type clusterDatabaseListMsg struct {
|
||||
databases []restore.DatabaseInfo
|
||||
err error
|
||||
}
|
||||
|
||||
func fetchClusterDatabases(ctx context.Context, archive ArchiveInfo, log logger.Logger) tea.Cmd {
|
||||
return func() tea.Msg {
|
||||
databases, err := restore.ListDatabasesInCluster(ctx, archive.Path, log)
|
||||
if err != nil {
|
||||
return clusterDatabaseListMsg{databases: nil, err: fmt.Errorf("failed to list databases: %w", err)}
|
||||
}
|
||||
return clusterDatabaseListMsg{databases: databases, err: nil}
|
||||
}
|
||||
}
|
||||
|
||||
func (m ClusterDatabaseSelectorModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
switch msg := msg.(type) {
|
||||
case clusterDatabaseListMsg:
|
||||
m.loading = false
|
||||
if msg.err != nil {
|
||||
m.err = msg.err
|
||||
} else {
|
||||
m.databases = msg.databases
|
||||
if len(m.databases) > 0 && m.mode == "single" {
|
||||
m.selected[0] = true // Pre-select first database in single mode
|
||||
}
|
||||
}
|
||||
return m, nil
|
||||
|
||||
case tea.KeyMsg:
|
||||
if m.loading {
|
||||
return m, nil
|
||||
}
|
||||
|
||||
switch msg.String() {
|
||||
case "q", "esc":
|
||||
// Return to parent
|
||||
return m.parent, nil
|
||||
|
||||
case "up", "k":
|
||||
if m.cursor > 0 {
|
||||
m.cursor--
|
||||
}
|
||||
|
||||
case "down", "j":
|
||||
if m.cursor < len(m.databases)-1 {
|
||||
m.cursor++
|
||||
}
|
||||
|
||||
case " ": // Space to toggle selection (multiple mode)
|
||||
if m.mode == "multiple" {
|
||||
m.selected[m.cursor] = !m.selected[m.cursor]
|
||||
} else {
|
||||
// Single mode: clear all and select current
|
||||
m.selected = make(map[int]bool)
|
||||
m.selected[m.cursor] = true
|
||||
}
|
||||
|
||||
case "enter":
|
||||
if m.err != nil {
|
||||
return m.parent, nil
|
||||
}
|
||||
|
||||
if len(m.databases) == 0 {
|
||||
return m.parent, nil
|
||||
}
|
||||
|
||||
// Get selected database(s)
|
||||
var selectedDBs []restore.DatabaseInfo
|
||||
for i, selected := range m.selected {
|
||||
if selected && i < len(m.databases) {
|
||||
selectedDBs = append(selectedDBs, m.databases[i])
|
||||
}
|
||||
}
|
||||
|
||||
if len(selectedDBs) == 0 {
|
||||
// No selection, use cursor position
|
||||
selectedDBs = []restore.DatabaseInfo{m.databases[m.cursor]}
|
||||
}
|
||||
|
||||
if m.extractOnly {
|
||||
// TODO: Implement extraction flow
|
||||
m.logger.Info("Extract-only mode not yet implemented in TUI")
|
||||
return m.parent, nil
|
||||
}
|
||||
|
||||
// For restore: proceed to restore preview/confirmation
|
||||
if len(selectedDBs) == 1 {
|
||||
// Single database restore from cluster
|
||||
// Create a temporary archive info for the selected database
|
||||
dbArchive := ArchiveInfo{
|
||||
Name: selectedDBs[0].Filename,
|
||||
Path: m.archive.Path, // Still use cluster archive path
|
||||
Format: m.archive.Format,
|
||||
Size: selectedDBs[0].Size,
|
||||
Modified: m.archive.Modified,
|
||||
DatabaseName: selectedDBs[0].Name,
|
||||
}
|
||||
|
||||
preview := NewRestorePreview(m.config, m.logger, m.parent, m.ctx, dbArchive, "restore-cluster-single")
|
||||
return preview, preview.Init()
|
||||
} else {
|
||||
// Multiple database restore - not yet implemented
|
||||
m.logger.Info("Multiple database restore not yet implemented in TUI")
|
||||
return m.parent, nil
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return m, nil
|
||||
}
|
||||
|
||||
func (m ClusterDatabaseSelectorModel) View() string {
|
||||
if m.loading {
|
||||
return TitleStyle.Render("Loading databases from cluster backup...") + "\n\nPlease wait..."
|
||||
}
|
||||
|
||||
if m.err != nil {
|
||||
var s strings.Builder
|
||||
s.WriteString(TitleStyle.Render("Error"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusErrorStyle.Render("Failed to list databases"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(m.err.Error())
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusReadyStyle.Render("Press any key to go back"))
|
||||
return s.String()
|
||||
}
|
||||
|
||||
if len(m.databases) == 0 {
|
||||
var s strings.Builder
|
||||
s.WriteString(TitleStyle.Render("No Databases Found"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusWarningStyle.Render("The cluster backup appears to be empty or invalid."))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusReadyStyle.Render("Press any key to go back"))
|
||||
return s.String()
|
||||
}
|
||||
|
||||
var s strings.Builder
|
||||
|
||||
// Title
|
||||
s.WriteString(TitleStyle.Render(m.title))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Archive info
|
||||
s.WriteString(LabelStyle.Render("Archive: "))
|
||||
s.WriteString(m.archive.Name)
|
||||
s.WriteString("\n")
|
||||
s.WriteString(LabelStyle.Render("Databases: "))
|
||||
s.WriteString(fmt.Sprintf("%d", len(m.databases)))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Instructions
|
||||
if m.mode == "multiple" {
|
||||
s.WriteString(StatusReadyStyle.Render("↑/↓: navigate • space: select/deselect • enter: confirm • q/esc: back"))
|
||||
} else {
|
||||
s.WriteString(StatusReadyStyle.Render("↑/↓: navigate • enter: select • q/esc: back"))
|
||||
}
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Database list
|
||||
s.WriteString(ListHeaderStyle.Render("Available Databases:"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
for i, db := range m.databases {
|
||||
cursor := " "
|
||||
if m.cursor == i {
|
||||
cursor = "▶ "
|
||||
}
|
||||
|
||||
checkbox := ""
|
||||
if m.mode == "multiple" {
|
||||
if m.selected[i] {
|
||||
checkbox = "[✓] "
|
||||
} else {
|
||||
checkbox = "[ ] "
|
||||
}
|
||||
} else {
|
||||
if m.selected[i] {
|
||||
checkbox = "● "
|
||||
} else {
|
||||
checkbox = "○ "
|
||||
}
|
||||
}
|
||||
|
||||
sizeStr := formatBytes(db.Size)
|
||||
line := fmt.Sprintf("%s%s%-40s %10s", cursor, checkbox, db.Name, sizeStr)
|
||||
|
||||
if m.cursor == i {
|
||||
s.WriteString(ListSelectedStyle.Render(line))
|
||||
} else {
|
||||
s.WriteString(ListNormalStyle.Render(line))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
|
||||
// Selection summary
|
||||
selectedCount := 0
|
||||
var totalSize int64
|
||||
for i, selected := range m.selected {
|
||||
if selected && i < len(m.databases) {
|
||||
selectedCount++
|
||||
totalSize += m.databases[i].Size
|
||||
}
|
||||
}
|
||||
|
||||
if selectedCount > 0 {
|
||||
s.WriteString(StatusSuccessStyle.Render(fmt.Sprintf("Selected: %d database(s), Total size: %s", selectedCount, formatBytes(totalSize))))
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
// formatBytes formats byte count as human-readable string
|
||||
func formatBytes(bytes int64) string {
|
||||
const unit = 1024
|
||||
if bytes < unit {
|
||||
return fmt.Sprintf("%d B", bytes)
|
||||
}
|
||||
div, exp := int64(unit), 0
|
||||
for n := bytes / unit; n >= unit; n /= unit {
|
||||
div *= unit
|
||||
exp++
|
||||
}
|
||||
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||
}
|
||||
@@ -430,6 +430,9 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
	var restoreErr error
	if restoreType == "restore-cluster" {
		restoreErr = engine.RestoreCluster(ctx, archive.Path)
	} else if restoreType == "restore-cluster-single" {
		// Restore single database from cluster backup
		restoreErr = engine.RestoreSingleFromCluster(ctx, archive.Path, targetDB, targetDB, cleanFirst, createIfMissing)
	} else {
		restoreErr = engine.RestoreSingle(ctx, archive.Path, targetDB, cleanFirst, createIfMissing)
	}
@@ -445,6 +448,8 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
	result := fmt.Sprintf("Successfully restored from %s", archive.Name)
	if restoreType == "restore-single" {
		result = fmt.Sprintf("Successfully restored '%s' from %s", targetDB, archive.Name)
	} else if restoreType == "restore-cluster-single" {
		result = fmt.Sprintf("Successfully restored '%s' from cluster %s", targetDB, archive.Name)
	} else if restoreType == "restore-cluster" && cleanClusterFirst {
		result = fmt.Sprintf("Successfully restored cluster from %s (cleaned %d existing database(s) first)", archive.Name, len(existingDBs))
	}
@@ -658,13 +663,15 @@ func (m RestoreExecutionModel) View() string {
	title := "[EXEC] Restoring Database"
	if m.restoreType == "restore-cluster" {
		title = "[EXEC] Restoring Cluster"
	} else if m.restoreType == "restore-cluster-single" {
		title = "[EXEC] Restoring Single Database from Cluster"
	}
	s.WriteString(titleStyle.Render(title))
	s.WriteString("\n\n")

	// Archive info
	s.WriteString(fmt.Sprintf("Archive: %s\n", m.archive.Name))
	if m.restoreType == "restore-single" {
	if m.restoreType == "restore-single" || m.restoreType == "restore-cluster-single" {
		s.WriteString(fmt.Sprintf("Target: %s\n", m.targetDB))
	}
	s.WriteString("\n")
@@ -61,6 +61,7 @@ type RestorePreviewModel struct {
	canProceed bool
	message string
	saveDebugLog bool // Save detailed error report on failure
	debugLocks bool // Enable detailed lock debugging
	workDir string // Custom work directory for extraction
}

@@ -317,6 +318,15 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
			m.message = "Debug log: disabled"
		}

	case "l":
		// Toggle lock debugging
		m.debugLocks = !m.debugLocks
		if m.debugLocks {
			m.message = infoStyle.Render("🔍 [LOCK-DEBUG] Lock debugging: ENABLED (captures PostgreSQL lock config, Guard decisions, boost attempts)")
		} else {
			m.message = "Lock debugging: disabled"
		}

	case "w":
		// Toggle/set work directory
		if m.workDir == "" {
@@ -346,7 +356,10 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
			return m, nil
		}

		// Proceed to restore execution
		// Proceed to restore execution (enable lock debugging in Config)
		if m.debugLocks {
			m.config.DebugLocks = true
		}
		exec := NewRestoreExecution(m.config, m.logger, m.parent, m.ctx, m.archive, m.targetDB, m.cleanFirst, m.createIfMissing, m.mode, m.cleanClusterFirst, m.existingDBs, m.saveDebugLog, m.workDir)
		return exec, exec.Init()
	}
@@ -546,6 +559,20 @@ func (m RestorePreviewModel) View() string {
		s.WriteString(infoStyle.Render(fmt.Sprintf(" Saves detailed error report to %s on failure", m.config.GetEffectiveWorkDir())))
		s.WriteString("\n")
	}

	// Lock debugging option
	lockDebugIcon := "[-]"
	lockDebugStyle := infoStyle
	if m.debugLocks {
		lockDebugIcon = "[🔍]"
		lockDebugStyle = checkPassedStyle
	}
	s.WriteString(lockDebugStyle.Render(fmt.Sprintf(" %s Lock Debug: %v (press 'l' to toggle)", lockDebugIcon, m.debugLocks)))
	s.WriteString("\n")
	if m.debugLocks {
		s.WriteString(infoStyle.Render(" Captures PostgreSQL lock config, Guard decisions, boost attempts"))
		s.WriteString("\n")
	}
	s.WriteString("\n")

	// Message
@@ -561,10 +588,10 @@ func (m RestorePreviewModel) View() string {
	s.WriteString(successStyle.Render("[OK] Ready to restore"))
	s.WriteString("\n")
	if m.mode == "restore-single" {
		s.WriteString(infoStyle.Render("t: Clean-first | c: Create | w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
		s.WriteString(infoStyle.Render("t: Clean-first | c: Create | w: WorkDir | d: Debug | l: LockDebug | Enter: Proceed | Esc: Cancel"))
	} else if m.mode == "restore-cluster" {
		if m.existingDBCount > 0 {
			s.WriteString(infoStyle.Render("c: Cleanup | w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
			s.WriteString(infoStyle.Render("c: Cleanup | w: WorkDir | d: Debug | l: LockDebug | Enter: Proceed | Esc: Cancel"))
		} else {
			s.WriteString(infoStyle.Render("w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
		}
2
main.go
@@ -16,7 +16,7 @@ import (

// Build information (set by ldflags)
var (
	version = "3.42.50"
	version = "3.42.81"
	buildTime = "unknown"
	gitCommit = "unknown"
)
77
release-notes-v3.42.77.md
Normal file
@@ -0,0 +1,77 @@
# dbbackup v3.42.77

## 🎯 New Feature: Single Database Extraction from Cluster Backups

Extract and restore individual databases from cluster backups without full cluster restoration!

### 🆕 New Flags

- **`--list-databases`**: List all databases in cluster backup with sizes
- **`--database <name>`**: Extract/restore a single database from cluster
- **`--databases "db1,db2,db3"`**: Extract multiple databases (comma-separated)
- **`--output-dir <path>`**: Extract to directory without restoring
- **`--target <name>`**: Rename database during restore

### 📖 Examples

```bash
# List databases in cluster backup
dbbackup restore cluster backup.tar.gz --list-databases

# Extract single database (no restore)
dbbackup restore cluster backup.tar.gz --database myapp --output-dir /tmp/extract

# Restore single database from cluster
dbbackup restore cluster backup.tar.gz --database myapp --confirm

# Restore with different name (testing)
dbbackup restore cluster backup.tar.gz --database myapp --target myapp_test --confirm

# Extract multiple databases
dbbackup restore cluster backup.tar.gz --databases "app1,app2,app3" --output-dir /tmp/extract
```

### 💡 Use Cases

✅ **Selective disaster recovery** - restore only affected databases
✅ **Database migration** - copy databases between clusters
✅ **Testing workflows** - restore with different names
✅ **Faster restores** - extract only what you need
✅ **Less disk space** - no need to extract entire cluster

### ⚙️ Technical Details

- Stream-based extraction with progress feedback
- Fast cluster archive scanning (no full extraction needed; see the sketch below)
- Works with all cluster backup formats (.tar.gz)
- Compatible with existing cluster restore workflow
- Automatic format detection for extracted dumps

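The "fast scanning" item above can be illustrated with a short, hypothetical Go sketch: walk the tar stream inside the gzip archive and read only the entry headers, so nothing is extracted to disk. The assumption that each database dump appears as `<name>.dump` inside the tarball, and the helper name `listClusterDatabases`, are illustrative and may not match dbbackup's actual layout.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

type dbEntry struct {
	Name string
	Size int64
}

// listClusterDatabases walks the archive's tar headers without extracting
// any file contents, returning the database dumps it finds.
func listClusterDatabases(archivePath string) ([]dbEntry, error) {
	f, err := os.Open(archivePath)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	var dbs []dbEntry
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next() // advances past the previous entry's data
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		base := filepath.Base(hdr.Name)
		if strings.HasSuffix(base, ".dump") { // assumed naming convention
			dbs = append(dbs, dbEntry{Name: strings.TrimSuffix(base, ".dump"), Size: hdr.Size})
		}
	}
	return dbs, nil
}

func main() {
	dbs, err := listClusterDatabases("backup.tar.gz")
	if err != nil {
		fmt.Fprintln(os.Stderr, "scan failed:", err)
		os.Exit(1)
	}
	for _, db := range dbs {
		fmt.Printf("%-40s %d bytes\n", db.Name, db.Size)
	}
}
```

Because only headers are inspected, the cost of such a scan is dominated by decompression, which is still far cheaper than writing every dump out to disk.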

### 🖥️ TUI Support (Interactive Mode)

**New in this release**: Press the **`s`** key when viewing a cluster backup to select individual databases!

- Navigate cluster backups in the TUI and press `s` for database selection
- Interactive database picker with size information
- Visual selection confirmation before restore
- Seamless integration with existing TUI workflows

**TUI Workflow:**
1. Launch the TUI: `dbbackup` (no arguments)
2. Navigate to "Restore" → "Single Database"
3. Select the cluster backup archive
4. Press `s` to show the database list
5. Select a database and confirm the restore

## 📦 Installation

Download the binary for your platform below and make it executable:

```bash
chmod +x dbbackup_*
./dbbackup_* --version
```

## 🔍 Checksums

SHA256 checksums are in `checksums.txt`.
99
verify_postgres_locks.sh
Executable file
@@ -0,0 +1,99 @@
#!/bin/bash
#
# PostgreSQL Lock Configuration Check & Restore Guidance
#

echo "════════════════════════════════════════════════════════════"
echo " PostgreSQL Lock Configuration & Restore Strategy"
echo "════════════════════════════════════════════════════════════"
echo

# Get values - extract ONLY digits, remove all non-numeric chars
LOCKS=$(sudo -u postgres psql --no-psqlrc -t -A -c "SHOW max_locks_per_transaction;" 2>/dev/null | tr -cd '0-9' | head -c 10)
CONNS=$(sudo -u postgres psql --no-psqlrc -t -A -c "SHOW max_connections;" 2>/dev/null | tr -cd '0-9' | head -c 10)
PREPARED=$(sudo -u postgres psql --no-psqlrc -t -A -c "SHOW max_prepared_transactions;" 2>/dev/null | tr -cd '0-9' | head -c 10)

if [ -z "$LOCKS" ]; then
    LOCKS=$(psql --no-psqlrc -t -A -c "SHOW max_locks_per_transaction;" 2>/dev/null | tr -cd '0-9' | head -c 10)
    CONNS=$(psql --no-psqlrc -t -A -c "SHOW max_connections;" 2>/dev/null | tr -cd '0-9' | head -c 10)
    PREPARED=$(psql --no-psqlrc -t -A -c "SHOW max_prepared_transactions;" 2>/dev/null | tr -cd '0-9' | head -c 10)
fi

if [ -z "$LOCKS" ] || [ -z "$CONNS" ]; then
    echo "❌ ERROR: Could not retrieve PostgreSQL settings"
    echo " Ensure PostgreSQL is running and accessible"
    exit 1
fi

echo "📊 Current Configuration:"
echo "────────────────────────────────────────────────────────────"
echo " max_locks_per_transaction: $LOCKS"
echo " max_connections: $CONNS"
echo " max_prepared_transactions: ${PREPARED:-0}"
echo

# Calculate capacity
PREPARED=${PREPARED:-0}
CAPACITY=$((LOCKS * (CONNS + PREPARED)))
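# Worked example: with max_locks_per_transaction=4096, max_connections=100
# and max_prepared_transactions=0, the lock table holds 4096 * (100 + 0) = 409,600 locks.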
echo " Total Lock Capacity: $CAPACITY locks"
|
||||
echo
|
||||
|
||||
# Determine status
|
||||
if [ "$LOCKS" -lt 2048 ]; then
|
||||
STATUS="❌ CRITICAL"
|
||||
RECOMMENDATION="increase_locks"
|
||||
elif [ "$LOCKS" -lt 4096 ]; then
|
||||
STATUS="⚠️ LOW"
|
||||
RECOMMENDATION="single_threaded"
|
||||
else
|
||||
STATUS="✅ OK"
|
||||
RECOMMENDATION="single_threaded"
|
||||
fi
|
||||
|
||||
echo "Status: $STATUS (locks=$LOCKS, capacity=$CAPACITY)"
|
||||
echo
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " 🎯 RECOMMENDED RESTORE COMMAND"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo
|
||||
|
||||
if [ "$RECOMMENDATION" = "increase_locks" ]; then
|
||||
echo "CRITICAL: Locks too low. Increase first, THEN use single-threaded:"
|
||||
echo
|
||||
echo "1. Increase locks (requires PostgreSQL restart):"
|
||||
echo " sudo -u postgres psql -c \"ALTER SYSTEM SET max_locks_per_transaction = 4096;\""
|
||||
echo " sudo systemctl restart postgresql"
|
||||
echo
|
||||
echo "2. Run restore with single-threaded mode:"
|
||||
echo " dbbackup restore cluster <backup-file> \\"
|
||||
echo " --profile conservative \\"
|
||||
echo " --parallel-dbs 1 \\"
|
||||
echo " --jobs 1 \\"
|
||||
echo " --confirm"
|
||||
else
|
||||
echo "✅ Use default CONSERVATIVE profile (single-threaded, prevents lock issues):"
|
||||
echo
|
||||
echo " dbbackup restore cluster <backup-file> --confirm"
|
||||
echo
|
||||
echo " (Default profile is now 'conservative' = single-threaded)"
|
||||
echo
|
||||
echo " For faster restore (if locks are sufficient):"
|
||||
echo " dbbackup restore cluster <backup-file> --profile balanced --confirm"
|
||||
echo " dbbackup restore cluster <backup-file> --profile aggressive --confirm"
|
||||
fi
|
||||
|
||||
echo
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " ℹ️ WHY SINGLE-THREADED?"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo
|
||||
echo " Parallel restore with large databases (especially with BLOBs)"
|
||||
echo " can exhaust locks EVEN with high max_locks_per_transaction."
|
||||
echo
|
||||
echo " --jobs 1 = Single-threaded pg_restore (minimal locks)"
|
||||
echo " --parallel-dbs 1 = Restore one database at a time"
|
||||
echo
|
||||
echo " Trade-off: Slower restore, but GUARANTEED completion."
|
||||
echo
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||