Compare commits
74 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 527435a3b8 | |||
| 6a7cf3c11e | |||
| fd3f8770b7 | |||
| 15f10c280c | |||
| 35a9a6e837 | |||
| 82378be971 | |||
| 9fec2c79f8 | |||
| ae34467b4a | |||
| 379ca06146 | |||
| c9bca42f28 | |||
| c90ec1156e | |||
| 23265a33a4 | |||
| 9b9abbfde7 | |||
| 6282d66693 | |||
| 4486a5d617 | |||
| 75dee1fff5 | |||
| 91d494537d | |||
| 8ffc1ba23c | |||
| 8e8045d8c0 | |||
| 0e94dcf384 | |||
| 33adfbdb38 | |||
| af34eaa073 | |||
| babce7cc83 | |||
| ae8c8fde3d | |||
| 346cb7fb61 | |||
| 18549584b1 | |||
| b1d1d57b61 | |||
| d0e1da1bea | |||
| 343a8b782d | |||
| bc5f7c07f4 | |||
| 821521470f | |||
| 147b9fc234 | |||
| 6f3e81a5a6 | |||
| bf1722c316 | |||
| a759f4d3db | |||
| 7cf1d6f85b | |||
| b305d1342e | |||
| 5456da7183 | |||
| f9ff45cf2a | |||
| 72c06ba5c2 | |||
| a0a401cab1 | |||
| 59a717abe7 | |||
| 490a12f858 | |||
| ea4337e298 | |||
| bbd4f0ceac | |||
| f6f8b04785 | |||
| 670c9af2e7 | |||
| e2cf9adc62 | |||
| 29e089fe3b | |||
| 9396c8e605 | |||
| e363e1937f | |||
| df1ab2f55b | |||
| 0e050b2def | |||
| 62d58c77af | |||
| c5be9bcd2b | |||
| b120f1507e | |||
| dd1db844ce | |||
| 4ea3ec2cf8 | |||
| 9200024e50 | |||
| 698b8a761c | |||
| dd7c4da0eb | |||
| b2a78cad2a | |||
| 5728b465e6 | |||
| bfe99e959c | |||
| 780beaadfb | |||
| 838c5b8c15 | |||
| 9d95a193db | |||
| 3201f0fb6a | |||
| 62ddc57fb7 | |||
| 510175ff04 | |||
| a85ad0c88c | |||
| 4938dc1918 | |||
| 09a917766f | |||
| eeacbfa007 |
224
CHANGELOG.md
224
CHANGELOG.md
@ -5,6 +5,230 @@ All notable changes to dbbackup will be documented in this file.
|
||||
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
|
||||
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
||||
|
||||
## [Unreleased]
|
||||
|
||||
### Added - Single Database Extraction from Cluster Backups (CLI + TUI)
|
||||
- **Extract and restore individual databases from cluster backups** - selective restore without full cluster restoration
|
||||
- **CLI Commands**:
|
||||
- **List databases**: `dbbackup restore cluster backup.tar.gz --list-databases`
|
||||
- Shows all databases in cluster backup with sizes
|
||||
- Fast scan without full extraction
|
||||
- **Extract single database**: `dbbackup restore cluster backup.tar.gz --database myapp --output-dir /tmp/extract`
|
||||
- Extracts only the specified database dump
|
||||
- No restore, just file extraction
|
||||
- **Restore single database from cluster**: `dbbackup restore cluster backup.tar.gz --database myapp --confirm`
|
||||
- Extracts and restores only one database
|
||||
- Much faster than full cluster restore when you only need one database
|
||||
- **Rename on restore**: `dbbackup restore cluster backup.tar.gz --database myapp --target myapp_test --confirm`
|
||||
- Restore with different database name (useful for testing)
|
||||
- **Extract multiple databases**: `dbbackup restore cluster backup.tar.gz --databases "app1,app2,app3" --output-dir /tmp/extract`
|
||||
- Comma-separated list of databases to extract
|
||||
- **TUI Support**:
|
||||
- Press **'s'** on any cluster backup in archive browser to select individual databases
|
||||
- New **ClusterDatabaseSelector** view shows all databases with sizes
|
||||
- Navigate with arrow keys, select with Enter
|
||||
- Automatic handling when cluster backup selected in single restore mode
|
||||
- Full restore preview and confirmation workflow
|
||||
- **Benefits**:
|
||||
- Faster restores (extract only what you need)
|
||||
- Less disk space usage during restore
|
||||
- Easy database migration/copying
|
||||
- Better testing workflow
|
||||
- Selective disaster recovery
|
||||
|
||||
### Performance - Cluster Restore Optimization
|
||||
- **Eliminated duplicate archive extraction in cluster restore** - saves 30-50% time on large restores
|
||||
- Previously: Archive was extracted twice (once in preflight validation, once in actual restore)
|
||||
- Now: Archive extracted once and reused for both validation and restore
|
||||
- **Time savings**:
|
||||
- 50 GB cluster: ~3-6 minutes faster
|
||||
- 10 GB cluster: ~1-2 minutes faster
|
||||
- Small clusters (<5 GB): ~30 seconds faster
|
||||
- Optimization automatically enabled when `--diagnose` flag is used
|
||||
- New `ValidateAndExtractCluster()` performs combined validation + extraction
|
||||
- `RestoreCluster()` accepts optional `preExtractedPath` parameter to reuse extracted directory
|
||||
- Disk space checks intelligently skipped when using pre-extracted directory
|
||||
- Maintains backward compatibility - works with and without pre-extraction
|
||||
- Log output shows optimization: `"Using pre-extracted cluster directory ... optimization: skipping duplicate extraction"`
|
||||
|
||||
### Improved - Archive Validation
|
||||
- **Enhanced tar.gz validation with stream-based checks**
|
||||
- Fast header-only validation (validates gzip + tar structure without full extraction)
|
||||
- Checks gzip magic bytes (0x1f 0x8b) and tar header signature
|
||||
- Reduces preflight validation time from minutes to seconds on large archives
|
||||
- Falls back to full extraction only when necessary (with `--diagnose`)
|
||||
|
||||
## [3.42.74] - 2026-01-20 "Resource Profile System + Critical Ctrl+C Fix"
|
||||
|
||||
### Critical Bug Fix
|
||||
- **Fixed Ctrl+C not working in TUI backup/restore** - Context cancellation was broken in TUI mode
|
||||
- `executeBackupWithTUIProgress()` and `executeRestoreWithTUIProgress()` created new contexts with `WithCancel(parentCtx)`
|
||||
- When user pressed Ctrl+C, `model.cancel()` was called on parent context but execution had separate context
|
||||
- Fixed by using parent context directly instead of creating new one
|
||||
- Ctrl+C/ESC/q now properly propagate cancellation to running operations
|
||||
- Users can now interrupt long-running TUI operations
|
||||
|
||||
### Added - Resource Profile System
|
||||
- **`--profile` flag for restore operations** with three presets:
|
||||
- **Conservative** (`--profile=conservative`): Single-threaded (`--parallel=1`), minimal memory usage
|
||||
- Best for resource-constrained servers, shared hosting, or when "out of shared memory" errors occur
|
||||
- Automatically enables `LargeDBMode` for better resource management
|
||||
- **Balanced** (default): Auto-detect resources, moderate parallelism
|
||||
- Good default for most scenarios
|
||||
- **Aggressive** (`--profile=aggressive`): Maximum parallelism, all available resources
|
||||
- Best for dedicated database servers with ample resources
|
||||
- **Potato** (`--profile=potato`): Easter egg 🥔, same as conservative
|
||||
- **Profile system applies to both CLI and TUI**:
|
||||
- CLI: `dbbackup restore cluster backup.tar.gz --profile=conservative --confirm`
|
||||
- TUI: Automatically uses conservative profile for safer interactive operation
|
||||
- **User overrides supported**: `--jobs` and `--parallel-dbs` flags override profile settings
|
||||
- **New `internal/config/profile.go`** module:
|
||||
- `GetRestoreProfile(name)` - Returns profile settings
|
||||
- `ApplyProfile(cfg, profile, jobs, parallelDBs)` - Applies profile with overrides
|
||||
- `GetProfileDescription(name)` - Human-readable descriptions
|
||||
- `ListProfiles()` - All available profiles
|
||||
|
||||
### Added - PostgreSQL Diagnostic Tools
|
||||
- **`diagnose_postgres_memory.sh`** - Comprehensive memory and resource analysis script:
|
||||
- System memory overview with usage percentages and warnings
|
||||
- Top 15 memory consuming processes
|
||||
- PostgreSQL-specific memory configuration analysis
|
||||
- Current locks and connections monitoring
|
||||
- Shared memory segments inspection
|
||||
- Disk space and swap usage checks
|
||||
- Identifies other resource consumers (Nessus, Elastic Agent, monitoring tools)
|
||||
- Smart recommendations based on findings
|
||||
- Detects temp file usage (indicator of low work_mem)
|
||||
- **`fix_postgres_locks.sh`** - PostgreSQL lock configuration helper:
|
||||
- Automatically increases `max_locks_per_transaction` to 4096
|
||||
- Shows current configuration before applying changes
|
||||
- Calculates total lock capacity
|
||||
- Provides restart commands for different PostgreSQL setups
|
||||
- References diagnostic tool for comprehensive analysis
|
||||
|
||||
### Added - Documentation
|
||||
- **`RESTORE_PROFILES.md`** - Complete profile guide with real-world scenarios:
|
||||
- Profile comparison table
|
||||
- When to use each profile
|
||||
- Override examples
|
||||
- Troubleshooting guide for "out of shared memory" errors
|
||||
- Integration with diagnostic tools
|
||||
- **`email_infra_team.txt`** - Admin communication template (German):
|
||||
- Analysis results template
|
||||
- Problem identification section
|
||||
- Three solution variants (temporary, permanent, workaround)
|
||||
- Includes diagnostic tool references
|
||||
|
||||
### Changed - TUI Improvements
|
||||
- **TUI mode defaults to conservative profile** for safer operation
|
||||
- Interactive users benefit from stability over speed
|
||||
- Prevents resource exhaustion on shared systems
|
||||
- Can be overridden with environment variable: `export RESOURCE_PROFILE=balanced`
|
||||
|
||||
### Fixed
|
||||
- Context cancellation in TUI backup operations (critical)
|
||||
- Context cancellation in TUI restore operations (critical)
|
||||
- Better error diagnostics for "out of shared memory" errors
|
||||
- Improved resource detection and management
|
||||
|
||||
### Technical Details
|
||||
- Profile system respects explicit user flags (`--jobs`, `--parallel-dbs`)
|
||||
- Conservative profile sets `cfg.LargeDBMode = true` automatically
|
||||
- TUI profile selection logged when `Debug` mode enabled
|
||||
- All profiles support both single and cluster restore operations
|
||||
|
||||
## [3.42.50] - 2026-01-16 "Ctrl+C Signal Handling Fix"
|
||||
|
||||
### Fixed - Proper Ctrl+C/SIGINT Handling in TUI
|
||||
- **Added tea.InterruptMsg handling** - Bubbletea v1.3+ sends `InterruptMsg` for SIGINT signals
|
||||
instead of a `KeyMsg` with "ctrl+c", causing cancellation to not work
|
||||
- **Fixed cluster restore cancellation** - Ctrl+C now properly cancels running restore operations
|
||||
- **Fixed cluster backup cancellation** - Ctrl+C now properly cancels running backup operations
|
||||
- **Added interrupt handling to main menu** - Proper cleanup on SIGINT from menu
|
||||
- **Orphaned process cleanup** - `cleanup.KillOrphanedProcesses()` called on all interrupt paths
|
||||
|
||||
### Changed
|
||||
- All TUI execution views now handle both `tea.KeyMsg` ("ctrl+c") and `tea.InterruptMsg`
|
||||
- Context cancellation properly propagates to child processes via `exec.CommandContext`
|
||||
- No zombie pg_dump/pg_restore/gzip processes left behind on cancellation
|
||||
|
||||
## [3.42.49] - 2026-01-16 "Unified Cluster Backup Progress"
|
||||
|
||||
### Added - Unified Progress Display for Cluster Backup
|
||||
- **Combined overall progress bar** for cluster backup showing all phases:
|
||||
- Phase 1/3: Backing up Globals (0-15% of overall)
|
||||
- Phase 2/3: Backing up Databases (15-90% of overall)
|
||||
- Phase 3/3: Compressing Archive (90-100% of overall)
|
||||
- **Current database indicator** - Shows which database is currently being backed up
|
||||
- **Phase-aware progress tracking** - New fields in backup progress state:
|
||||
- `overallPhase` - Current phase (1=globals, 2=databases, 3=compressing)
|
||||
- `phaseDesc` - Human-readable phase description
|
||||
- **Dual progress bars** for cluster backup:
|
||||
- Overall progress bar showing combined operation progress
|
||||
- Database count progress bar showing individual database progress
|
||||
|
||||
### Changed
|
||||
- Cluster backup TUI now shows unified progress display matching restore
|
||||
- Progress callbacks now include phase information
|
||||
- Better visual feedback during entire cluster backup operation
|
||||
|
||||
## [3.42.48] - 2026-01-15 "Unified Cluster Restore Progress"
|
||||
|
||||
### Added - Unified Progress Display for Cluster Restore
|
||||
- **Combined overall progress bar** showing progress across all restore phases:
|
||||
- Phase 1/3: Extracting Archive (0-60% of overall)
|
||||
- Phase 2/3: Restoring Globals (60-65% of overall)
|
||||
- Phase 3/3: Restoring Databases (65-100% of overall)
|
||||
- **Current database indicator** - Shows which database is currently being restored
|
||||
- **Phase-aware progress tracking** - New fields in progress state:
|
||||
- `overallPhase` - Current phase (1=extraction, 2=globals, 3=databases)
|
||||
- `currentDB` - Name of database currently being restored
|
||||
- `extractionDone` - Boolean flag for phase transition
|
||||
- **Dual progress bars** for cluster restore:
|
||||
- Overall progress bar showing combined operation progress
|
||||
- Phase-specific progress bar (extraction bytes or database count)
|
||||
|
||||
### Changed
|
||||
- Cluster restore TUI now shows unified progress display
|
||||
- Progress callbacks now set phase and current database information
|
||||
- Extraction completion triggers automatic transition to globals phase
|
||||
- Database restore phase shows current database name with spinner
|
||||
|
||||
### Improved
|
||||
- Better visual feedback during entire cluster restore operation
|
||||
- Clear phase indicators help users understand restore progress
|
||||
- Overall progress percentage gives better time estimates
|
||||
|
||||
## [3.42.35] - 2026-01-15 "TUI Detailed Progress"
|
||||
|
||||
### Added - Enhanced TUI Progress Display
|
||||
- **Detailed progress bar in TUI restore** - schollz-style progress bar with:
|
||||
- Byte progress display (e.g., `245 MB / 1.2 GB`)
|
||||
- Transfer speed calculation (e.g., `45 MB/s`)
|
||||
- ETA prediction for long operations
|
||||
- Unicode block-based visual bar
|
||||
- **Real-time extraction progress** - Archive extraction now reports actual bytes processed
|
||||
- **Go-native tar extraction** - Uses Go's `archive/tar` + `compress/gzip` when progress callback is set
|
||||
- **New `DetailedProgress` component** in TUI package:
|
||||
- `NewDetailedProgress(total, description)` - Byte-based progress
|
||||
- `NewDetailedProgressItems(total, description)` - Item count progress
|
||||
- `NewDetailedProgressSpinner(description)` - Indeterminate spinner
|
||||
- `RenderProgressBar(width)` - Generate schollz-style output
|
||||
- **Progress callback API** in restore engine:
|
||||
- `SetProgressCallback(func(current, total int64, description string))`
|
||||
- Allows TUI to receive real-time progress updates from restore operations
|
||||
- **Shared progress state** pattern for Bubble Tea integration
|
||||
|
||||
### Changed
|
||||
- TUI restore execution now shows detailed byte progress during archive extraction
|
||||
- Cluster restore shows extraction progress instead of just spinner
|
||||
- Falls back to shell `tar` command when no progress callback is set (faster)
|
||||
|
||||
### Technical Details
|
||||
- `progressReader` wrapper tracks bytes read through gzip/tar pipeline
|
||||
- Throttled progress updates (every 100ms) to avoid UI flooding
|
||||
- Thread-safe shared state pattern for cross-goroutine progress updates
|
||||
|
||||
## [3.42.34] - 2026-01-14 "Filesystem Abstraction"
|
||||
|
||||
### Added - spf13/afero for Filesystem Abstraction
|
||||
|
||||
57
README.md
57
README.md
@ -56,7 +56,7 @@ Download from [releases](https://git.uuxo.net/UUXO/dbbackup/releases):
|
||||
|
||||
```bash
|
||||
# Linux x86_64
|
||||
wget https://git.uuxo.net/UUXO/dbbackup/releases/download/v3.42.35/dbbackup-linux-amd64
|
||||
wget https://git.uuxo.net/UUXO/dbbackup/releases/download/v3.42.74/dbbackup-linux-amd64
|
||||
chmod +x dbbackup-linux-amd64
|
||||
sudo mv dbbackup-linux-amd64 /usr/local/bin/dbbackup
|
||||
```
|
||||
@ -194,21 +194,59 @@ r: Restore | v: Verify | i: Info | d: Diagnose | D: Delete | R: Refresh | Esc: B
|
||||
```
|
||||
Configuration Settings
|
||||
|
||||
[SYSTEM] Detected Resources
|
||||
CPU: 8 physical cores, 16 logical cores
|
||||
Memory: 32GB total, 28GB available
|
||||
Recommended Profile: balanced
|
||||
→ 8 cores and 32GB RAM supports moderate parallelism
|
||||
|
||||
[CONFIG] Current Settings
|
||||
Target DB: PostgreSQL (postgres)
|
||||
Database: postgres@localhost:5432
|
||||
Backup Dir: /var/backups/postgres
|
||||
Compression: Level 6
|
||||
Profile: balanced | Cluster: 2 parallel | Jobs: 4
|
||||
|
||||
> Database Type: postgres
|
||||
CPU Workload Type: balanced
|
||||
Backup Directory: /root/db_backups
|
||||
Work Directory: /tmp
|
||||
Resource Profile: balanced (P:2 J:4)
|
||||
Cluster Parallelism: 2
|
||||
Backup Directory: /var/backups/postgres
|
||||
Work Directory: (system temp)
|
||||
Compression Level: 6
|
||||
Parallel Jobs: 16
|
||||
Dump Jobs: 8
|
||||
Parallel Jobs: 4
|
||||
Dump Jobs: 4
|
||||
Database Host: localhost
|
||||
Database Port: 5432
|
||||
Database User: root
|
||||
Database User: postgres
|
||||
SSL Mode: prefer
|
||||
|
||||
s: Save | r: Reset | q: Menu
|
||||
[KEYS] ↑↓ navigate | Enter edit | 'l' toggle LargeDB | 'c' conservative | 'p' recommend | 's' save | 'q' menu
|
||||
```
|
||||
|
||||
**Resource Profiles for Large Databases:**
|
||||
|
||||
When restoring large databases on VMs with limited resources, use the resource profile settings to prevent "out of shared memory" errors:
|
||||
|
||||
| Profile | Cluster Parallel | Jobs | Best For |
|
||||
|---------|------------------|------|----------|
|
||||
| conservative | 1 | 1 | Small VMs (<16GB RAM) |
|
||||
| balanced | 2 | 2-4 | Medium VMs (16-32GB RAM) |
|
||||
| performance | 4 | 4-8 | Large servers (32GB+ RAM) |
|
||||
| max-performance | 8 | 8-16 | High-end servers (64GB+) |
|
||||
|
||||
**Large DB Mode:** Toggle with `l` key. Reduces parallelism by 50% and sets max_locks_per_transaction=8192 for complex databases with many tables/LOBs.
|
||||
|
||||
**Quick shortcuts:** Press `l` to toggle Large DB Mode, `c` for conservative, `p` to show recommendation.
|
||||
|
||||
**Troubleshooting Tools:**
|
||||
|
||||
For PostgreSQL restore issues ("out of shared memory" errors), diagnostic scripts are available:
|
||||
- **diagnose_postgres_memory.sh** - Comprehensive system memory, PostgreSQL configuration, and resource analysis
|
||||
- **fix_postgres_locks.sh** - Automatically increase max_locks_per_transaction to 4096
|
||||
|
||||
See [RESTORE_PROFILES.md](RESTORE_PROFILES.md) for detailed troubleshooting guidance.
|
||||
|
||||
**Database Status:**
|
||||
```
|
||||
Database Status & Health Check
|
||||
@ -248,6 +286,9 @@ dbbackup restore single backup.dump --target myapp_db --create --confirm
|
||||
# Restore cluster
|
||||
dbbackup restore cluster cluster_backup.tar.gz --confirm
|
||||
|
||||
# Restore with resource profile (for resource-constrained servers)
|
||||
dbbackup restore cluster backup.tar.gz --profile=conservative --confirm
|
||||
|
||||
# Restore with debug logging (saves detailed error report on failure)
|
||||
dbbackup restore cluster backup.tar.gz --save-debug-log /tmp/restore-debug.json --confirm
|
||||
|
||||
@ -303,6 +344,7 @@ dbbackup backup single mydb --dry-run
|
||||
| `--backup-dir` | Backup directory | ~/db_backups |
|
||||
| `--compression` | Compression level (0-9) | 6 |
|
||||
| `--jobs` | Parallel jobs | 8 |
|
||||
| `--profile` | Resource profile (conservative/balanced/aggressive) | balanced |
|
||||
| `--cloud` | Cloud storage URI | - |
|
||||
| `--encrypt` | Enable encryption | false |
|
||||
| `--dry-run, -n` | Run preflight checks only | false |
|
||||
@ -858,6 +900,7 @@ Workload types:
|
||||
|
||||
## Documentation
|
||||
|
||||
- [RESTORE_PROFILES.md](RESTORE_PROFILES.md) - Restore resource profiles & troubleshooting
|
||||
- [SYSTEMD.md](SYSTEMD.md) - Systemd installation & scheduling
|
||||
- [DOCKER.md](DOCKER.md) - Docker deployment
|
||||
- [CLOUD.md](CLOUD.md) - Cloud storage configuration
|
||||
|
||||
195
RESTORE_PROFILES.md
Normal file
195
RESTORE_PROFILES.md
Normal file
@ -0,0 +1,195 @@
|
||||
# Restore Profiles
|
||||
|
||||
## Overview
|
||||
|
||||
The `--profile` flag allows you to optimize restore operations based on your server's resources and current workload. This is particularly useful when dealing with "out of shared memory" errors or resource-constrained environments.
|
||||
|
||||
## Available Profiles
|
||||
|
||||
### Conservative Profile (`--profile=conservative`)
|
||||
**Best for:** Resource-constrained servers, production systems with other running services, or when dealing with "out of shared memory" errors.
|
||||
|
||||
**Settings:**
|
||||
- Single-threaded restore (`--parallel=1`)
|
||||
- Single-threaded decompression (`--jobs=1`)
|
||||
- Memory-conservative mode enabled
|
||||
- Minimal memory footprint
|
||||
|
||||
**When to use:**
|
||||
- Server RAM usage > 70%
|
||||
- Other critical services running (web servers, monitoring agents)
|
||||
- "out of shared memory" errors during restore
|
||||
- Small VMs or shared hosting environments
|
||||
- Disk I/O is the bottleneck
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
dbbackup restore cluster backup.tar.gz --profile=conservative --confirm
|
||||
```
|
||||
|
||||
### Balanced Profile (`--profile=balanced`) - DEFAULT
|
||||
**Best for:** Most scenarios, general-purpose servers with adequate resources.
|
||||
|
||||
**Settings:**
|
||||
- Auto-detect parallelism based on CPU/RAM
|
||||
- Moderate resource usage
|
||||
- Good balance between speed and stability
|
||||
|
||||
**When to use:**
|
||||
- Default choice for most restores
|
||||
- Dedicated database server with moderate load
|
||||
- Unknown or variable server conditions
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
dbbackup restore cluster backup.tar.gz --confirm
|
||||
# or explicitly:
|
||||
dbbackup restore cluster backup.tar.gz --profile=balanced --confirm
|
||||
```
|
||||
|
||||
### Aggressive Profile (`--profile=aggressive`)
|
||||
**Best for:** Dedicated database servers with ample resources, maintenance windows, performance-critical restores.
|
||||
|
||||
**Settings:**
|
||||
- Maximum parallelism (auto-detect based on CPU cores)
|
||||
- Maximum resource utilization
|
||||
- Fastest restore speed
|
||||
|
||||
**When to use:**
|
||||
- Dedicated database server (no other services)
|
||||
- Server RAM usage < 50%
|
||||
- Time-critical restores (RTO minimization)
|
||||
- Maintenance windows with service downtime
|
||||
- Testing/development environments
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
dbbackup restore cluster backup.tar.gz --profile=aggressive --confirm
|
||||
```
|
||||
|
||||
### Potato Profile (`--profile=potato`) 🥔
|
||||
**Easter egg:** Same as conservative, for servers running on a potato.
|
||||
|
||||
## Profile Comparison
|
||||
|
||||
| Setting | Conservative | Balanced | Aggressive |
|
||||
|---------|-------------|----------|-----------|
|
||||
| Parallel DBs | 1 (sequential) | Auto (2-4) | Auto (all CPUs) |
|
||||
| Jobs (decompression) | 1 | Auto (2-4) | Auto (all CPUs) |
|
||||
| Memory Usage | Minimal | Moderate | Maximum |
|
||||
| Speed | Slowest | Medium | Fastest |
|
||||
| Stability | Most stable | Stable | Requires resources |
|
||||
|
||||
## Overriding Profile Settings
|
||||
|
||||
You can override specific profile settings:
|
||||
|
||||
```bash
|
||||
# Use conservative profile but allow 2 parallel jobs for decompression
|
||||
dbbackup restore cluster backup.tar.gz \\
|
||||
--profile=conservative \\
|
||||
--jobs=2 \\
|
||||
--confirm
|
||||
|
||||
# Use aggressive profile but limit to 2 parallel databases
|
||||
dbbackup restore cluster backup.tar.gz \\
|
||||
--profile=aggressive \\
|
||||
--parallel-dbs=2 \\
|
||||
--confirm
|
||||
```
|
||||
|
||||
## Real-World Scenarios
|
||||
|
||||
### Scenario 1: "Out of Shared Memory" Error
|
||||
**Problem:** PostgreSQL restore fails with `ERROR: out of shared memory`
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Step 1: Use conservative profile
|
||||
dbbackup restore cluster backup.tar.gz --profile=conservative --confirm
|
||||
|
||||
# Step 2: If still failing, temporarily stop monitoring agents
|
||||
sudo systemctl stop nessus-agent elastic-agent
|
||||
dbbackup restore cluster backup.tar.gz --profile=conservative --confirm
|
||||
sudo systemctl start nessus-agent elastic-agent
|
||||
|
||||
# Step 3: Ask infrastructure team to increase work_mem (see email_infra_team.txt)
|
||||
```
|
||||
|
||||
### Scenario 2: Fast Disaster Recovery
|
||||
**Goal:** Restore as quickly as possible during maintenance window
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Stop all non-essential services first
|
||||
sudo systemctl stop nginx php-fpm
|
||||
dbbackup restore cluster backup.tar.gz --profile=aggressive --confirm
|
||||
sudo systemctl start nginx php-fpm
|
||||
```
|
||||
|
||||
### Scenario 3: Shared Server with Multiple Services
|
||||
**Environment:** Web server + database + monitoring all on same VM
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Always use conservative to avoid impacting other services
|
||||
dbbackup restore cluster backup.tar.gz --profile=conservative --confirm
|
||||
```
|
||||
|
||||
### Scenario 4: Unknown Server Conditions
|
||||
**Situation:** Restoring to a new server, unsure of resources
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Step 1: Run diagnostics first
|
||||
./diagnose_postgres_memory.sh > diagnosis.log
|
||||
|
||||
# Step 2: Choose profile based on memory usage:
|
||||
# - If memory > 80%: use conservative
|
||||
# - If memory 50-80%: use balanced (default)
|
||||
# - If memory < 50%: use aggressive
|
||||
|
||||
# Step 3: Start with balanced and adjust if needed
|
||||
dbbackup restore cluster backup.tar.gz --confirm
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Profile Selection Guide
|
||||
|
||||
**Use Conservative when:**
|
||||
- ✅ Memory usage > 70%
|
||||
- ✅ Other services running
|
||||
- ✅ Getting "out of shared memory" errors
|
||||
- ✅ Restore keeps failing
|
||||
- ✅ Small VM (< 4 GB RAM)
|
||||
- ✅ High swap usage
|
||||
|
||||
**Use Balanced when:**
|
||||
- ✅ Normal operation
|
||||
- ✅ Moderate server load
|
||||
- ✅ Unsure what to use
|
||||
- ✅ Medium VM (4-16 GB RAM)
|
||||
|
||||
**Use Aggressive when:**
|
||||
- ✅ Dedicated database server
|
||||
- ✅ Memory usage < 50%
|
||||
- ✅ No other critical services
|
||||
- ✅ Need fastest possible restore
|
||||
- ✅ Large VM (> 16 GB RAM)
|
||||
- ✅ Maintenance window
|
||||
|
||||
## Environment Variables
|
||||
|
||||
You can set a default profile:
|
||||
|
||||
```bash
|
||||
export RESOURCE_PROFILE=conservative
|
||||
dbbackup restore cluster backup.tar.gz --confirm
|
||||
```
|
||||
|
||||
## See Also
|
||||
|
||||
- [diagnose_postgres_memory.sh](diagnose_postgres_memory.sh) - Analyze system resources before restore
|
||||
- [fix_postgres_locks.sh](fix_postgres_locks.sh) - Fix PostgreSQL lock exhaustion
|
||||
- [email_infra_team.txt](email_infra_team.txt) - Template email for infrastructure team
|
||||
171
RESTORE_PROGRESS_PROPOSAL.md
Normal file
171
RESTORE_PROGRESS_PROPOSAL.md
Normal file
@ -0,0 +1,171 @@
|
||||
# Restore Progress Bar Enhancement Proposal
|
||||
|
||||
## Problem
|
||||
During Phase 2 cluster restore, the progress bar is not real-time because:
|
||||
- `pg_restore` subprocess blocks until completion
|
||||
- Progress updates only happen **before** each database restore starts
|
||||
- No feedback during actual restore execution (which can take hours)
|
||||
- Users see frozen progress bar during large database restores
|
||||
|
||||
## Root Cause
|
||||
In `internal/restore/engine.go`:
|
||||
- `executeRestoreCommand()` blocks on `cmd.Wait()`
|
||||
- Progress is only reported at goroutine entry (line ~1315)
|
||||
- No streaming progress during pg_restore execution
|
||||
|
||||
## Proposed Solutions
|
||||
|
||||
### Option 1: Parse pg_restore stderr for progress (RECOMMENDED)
|
||||
**Pros:**
|
||||
- Real-time feedback during restore
|
||||
- Works with existing pg_restore
|
||||
- No external tools needed
|
||||
|
||||
**Implementation:**
|
||||
```go
|
||||
// In executeRestoreCommand, modify stderr reader:
|
||||
go func() {
|
||||
scanner := bufio.NewScanner(stderr)
|
||||
for scanner.Scan() {
|
||||
line := scanner.Text()
|
||||
|
||||
// Parse pg_restore progress lines
|
||||
// Format: "pg_restore: processing item 1234 TABLE public users"
|
||||
if strings.Contains(line, "processing item") {
|
||||
e.reportItemProgress(line) // Update progress bar
|
||||
}
|
||||
|
||||
// Capture errors
|
||||
if strings.Contains(line, "ERROR:") {
|
||||
lastError = line
|
||||
errorCount++
|
||||
}
|
||||
}
|
||||
}()
|
||||
```
|
||||
|
||||
**Add to RestoreCluster goroutine:**
|
||||
```go
|
||||
// Track sub-items within each database
|
||||
var currentDBItems, totalDBItems int
|
||||
e.setItemProgressCallback(func(current, total int) {
|
||||
currentDBItems = current
|
||||
totalDBItems = total
|
||||
// Update TUI with sub-progress
|
||||
e.reportDatabaseSubProgress(idx, totalDBs, dbName, current, total)
|
||||
})
|
||||
```
|
||||
|
||||
### Option 2: Verbose mode with line counting
|
||||
**Pros:**
|
||||
- More granular progress (row-level)
|
||||
- Shows exact operation being performed
|
||||
|
||||
**Cons:**
|
||||
- `--verbose` causes massive stderr output (OOM risk on huge DBs)
|
||||
- Currently disabled for memory safety
|
||||
- Requires careful memory management
|
||||
|
||||
### Option 3: Hybrid approach (BEST)
|
||||
**Combine both:**
|
||||
1. **Default**: Parse non-verbose pg_restore output for item counts
|
||||
2. **Small DBs** (<500MB): Enable verbose for detailed progress
|
||||
3. **Periodic updates**: Report progress every 5 seconds even without stderr changes
|
||||
|
||||
**Implementation:**
|
||||
```go
|
||||
// Add periodic progress ticker
|
||||
progressTicker := time.NewTicker(5 * time.Second)
|
||||
defer progressTicker.Stop()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-progressTicker.C:
|
||||
// Report heartbeat even if no stderr
|
||||
e.reportHeartbeat(dbName, time.Since(dbRestoreStart))
|
||||
case <-stderrDone:
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
```
|
||||
|
||||
## Recommended Implementation Plan
|
||||
|
||||
### Phase 1: Quick Win (1-2 hours)
|
||||
1. Add heartbeat ticker in cluster restore goroutines
|
||||
2. Update TUI to show "Restoring database X... (elapsed: 3m 45s)"
|
||||
3. No code changes to pg_restore wrapper
|
||||
|
||||
### Phase 2: Parse pg_restore Output (4-6 hours)
|
||||
1. Parse stderr for "processing item" lines
|
||||
2. Extract current/total item counts
|
||||
3. Report sub-progress to TUI
|
||||
4. Update progress bar calculation:
|
||||
```
|
||||
dbProgress = baseProgress + (itemsDone/totalItems) * dbWeightedPercent
|
||||
```
|
||||
|
||||
### Phase 3: Smart Verbose Mode (optional)
|
||||
1. Detect database size before restore
|
||||
2. Enable verbose for DBs < 500MB
|
||||
3. Parse verbose output for detailed progress
|
||||
4. Automatic fallback to item-based for large DBs
|
||||
|
||||
## Files to Modify
|
||||
|
||||
1. **internal/restore/engine.go**:
|
||||
- `executeRestoreCommand()` - add progress parsing
|
||||
- `RestoreCluster()` - add heartbeat ticker
|
||||
- New: `reportItemProgress()`, `reportHeartbeat()`
|
||||
|
||||
2. **internal/tui/restore_exec.go**:
|
||||
- Update `RestoreExecModel` to handle sub-progress
|
||||
- Add "elapsed time" display during restore
|
||||
- Show item counts: "Restoring tables... (234/567)"
|
||||
|
||||
3. **internal/progress/indicator.go**:
|
||||
- Add `UpdateSubProgress(current, total int)` method
|
||||
- Add `ReportHeartbeat(elapsed time.Duration)` method
|
||||
|
||||
## Example Output
|
||||
|
||||
**Before (current):**
|
||||
```
|
||||
[====================] Phase 2/3: Restoring Databases (1/5)
|
||||
Restoring database myapp...
|
||||
[frozen for 30 minutes]
|
||||
```
|
||||
|
||||
**After (with heartbeat):**
|
||||
```
|
||||
[====================] Phase 2/3: Restoring Databases (1/5)
|
||||
Restoring database myapp... (elapsed: 4m 32s)
|
||||
[updates every 5 seconds]
|
||||
```
|
||||
|
||||
**After (with item parsing):**
|
||||
```
|
||||
[=========>-----------] Phase 2/3: Restoring Databases (1/5)
|
||||
Restoring database myapp... (processing item 1,234/5,678) (elapsed: 4m 32s)
|
||||
[smooth progress bar movement]
|
||||
```
|
||||
|
||||
## Testing Strategy
|
||||
1. Test with small DB (< 100MB) - verify heartbeat works
|
||||
2. Test with large DB (> 10GB) - verify no OOM, heartbeat works
|
||||
3. Test with BLOB-heavy DB - verify phased restore shows progress
|
||||
4. Test parallel cluster restore - verify multiple heartbeats don't conflict
|
||||
|
||||
## Risk Assessment
|
||||
- **Low risk**: Heartbeat ticker (Phase 1)
|
||||
- **Medium risk**: stderr parsing (Phase 2) - test thoroughly
|
||||
- **High risk**: Verbose mode (Phase 3) - can cause OOM
|
||||
|
||||
## Estimated Implementation Time
|
||||
- Phase 1 (heartbeat): 1-2 hours
|
||||
- Phase 2 (item parsing): 4-6 hours
|
||||
- Phase 3 (smart verbose): 8-10 hours (optional)
|
||||
|
||||
**Total for Phases 1+2: 5-8 hours**
|
||||
@ -3,9 +3,9 @@
|
||||
This directory contains pre-compiled binaries for the DB Backup Tool across multiple platforms and architectures.
|
||||
|
||||
## Build Information
|
||||
- **Version**: 3.42.34
|
||||
- **Build Time**: 2026-01-14_16:06:08_UTC
|
||||
- **Git Commit**: ba6e8a2
|
||||
- **Version**: 3.42.81
|
||||
- **Build Time**: 2026-01-22_17:13:41_UTC
|
||||
- **Git Commit**: 6a7cf3c
|
||||
|
||||
## Recent Updates (v1.1.0)
|
||||
- ✅ Fixed TUI progress display with line-by-line output
|
||||
|
||||
@ -66,6 +66,15 @@ TUI Automation Flags (for testing and CI/CD):
|
||||
cfg.TUIVerbose, _ = cmd.Flags().GetBool("verbose-tui")
|
||||
cfg.TUILogFile, _ = cmd.Flags().GetString("tui-log-file")
|
||||
|
||||
// Set conservative profile as default for TUI mode (safer for interactive users)
|
||||
if cfg.ResourceProfile == "" || cfg.ResourceProfile == "balanced" {
|
||||
cfg.ResourceProfile = "conservative"
|
||||
cfg.LargeDBMode = true
|
||||
if cfg.Debug {
|
||||
log.Info("TUI mode: using conservative profile by default")
|
||||
}
|
||||
}
|
||||
|
||||
// Check authentication before starting TUI
|
||||
if cfg.IsPostgreSQL() {
|
||||
if mismatch, msg := auth.CheckAuthenticationMismatch(cfg); mismatch {
|
||||
|
||||
336
cmd/restore.go
336
cmd/restore.go
@ -13,8 +13,10 @@ import (
|
||||
|
||||
"dbbackup/internal/backup"
|
||||
"dbbackup/internal/cloud"
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/database"
|
||||
"dbbackup/internal/pitr"
|
||||
"dbbackup/internal/progress"
|
||||
"dbbackup/internal/restore"
|
||||
"dbbackup/internal/security"
|
||||
|
||||
@ -28,6 +30,8 @@ var (
|
||||
restoreClean bool
|
||||
restoreCreate bool
|
||||
restoreJobs int
|
||||
restoreParallelDBs int // Number of parallel database restores
|
||||
restoreProfile string // Resource profile: conservative, balanced, aggressive
|
||||
restoreTarget string
|
||||
restoreVerbose bool
|
||||
restoreNoProgress bool
|
||||
@ -35,6 +39,13 @@ var (
|
||||
restoreCleanCluster bool
|
||||
restoreDiagnose bool // Run diagnosis before restore
|
||||
restoreSaveDebugLog string // Path to save debug log on failure
|
||||
restoreDebugLocks bool // Enable detailed lock debugging
|
||||
|
||||
// Single database extraction from cluster flags
|
||||
restoreDatabase string // Single database to extract/restore from cluster
|
||||
restoreDatabases string // Comma-separated list of databases to extract
|
||||
restoreOutputDir string // Extract to directory (no restore)
|
||||
restoreListDBs bool // List databases in cluster backup
|
||||
|
||||
// Diagnose flags
|
||||
diagnoseJSON bool
|
||||
@ -111,6 +122,9 @@ Examples:
|
||||
# Restore to different database
|
||||
dbbackup restore single mydb.dump.gz --target mydb_test --confirm
|
||||
|
||||
# Memory-constrained server (single-threaded, minimal memory)
|
||||
dbbackup restore single mydb.dump.gz --profile=conservative --confirm
|
||||
|
||||
# Clean target database before restore
|
||||
dbbackup restore single mydb.sql.gz --clean --confirm
|
||||
|
||||
@ -130,6 +144,11 @@ var restoreClusterCmd = &cobra.Command{
|
||||
This command restores all databases that were backed up together
|
||||
in a cluster backup operation.
|
||||
|
||||
Single Database Extraction:
|
||||
Use --list-databases to see available databases
|
||||
Use --database to extract/restore a specific database
|
||||
Use --output-dir to extract without restoring
|
||||
|
||||
Safety features:
|
||||
- Dry-run by default (use --confirm to execute)
|
||||
- Archive validation and listing
|
||||
@ -137,12 +156,33 @@ Safety features:
|
||||
- Sequential database restoration
|
||||
|
||||
Examples:
|
||||
# List databases in cluster backup
|
||||
dbbackup restore cluster backup.tar.gz --list-databases
|
||||
|
||||
# Extract single database (no restore)
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --output-dir /tmp/extract
|
||||
|
||||
# Restore single database from cluster
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --confirm
|
||||
|
||||
# Restore single database with different name
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --target myapp_test --confirm
|
||||
|
||||
# Extract multiple databases
|
||||
dbbackup restore cluster backup.tar.gz --databases "app1,app2,app3" --output-dir /tmp/extract
|
||||
|
||||
# Preview cluster restore
|
||||
dbbackup restore cluster cluster_backup_20240101_120000.tar.gz
|
||||
|
||||
# Restore full cluster
|
||||
dbbackup restore cluster cluster_backup_20240101_120000.tar.gz --confirm
|
||||
|
||||
# Memory-constrained server (conservative profile)
|
||||
dbbackup restore cluster cluster_backup.tar.gz --profile=conservative --confirm
|
||||
|
||||
# Maximum performance (dedicated server)
|
||||
dbbackup restore cluster cluster_backup.tar.gz --profile=aggressive --confirm
|
||||
|
||||
# Use parallel decompression
|
||||
dbbackup restore cluster cluster_backup.tar.gz --jobs 4 --confirm
|
||||
|
||||
@ -276,19 +316,27 @@ func init() {
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreClean, "clean", false, "Drop and recreate target database")
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreCreate, "create", false, "Create target database if it doesn't exist")
|
||||
restoreSingleCmd.Flags().StringVar(&restoreTarget, "target", "", "Target database name (defaults to original)")
|
||||
restoreSingleCmd.Flags().StringVar(&restoreProfile, "profile", "balanced", "Resource profile: conservative (--parallel=1, low memory), balanced, aggressive (max performance)")
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreVerbose, "verbose", false, "Show detailed restore progress")
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreNoProgress, "no-progress", false, "Disable progress indicators")
|
||||
restoreSingleCmd.Flags().StringVar(&restoreEncryptionKeyFile, "encryption-key-file", "", "Path to encryption key file (required for encrypted backups)")
|
||||
restoreSingleCmd.Flags().StringVar(&restoreEncryptionKeyEnv, "encryption-key-env", "DBBACKUP_ENCRYPTION_KEY", "Environment variable containing encryption key")
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreDiagnose, "diagnose", false, "Run deep diagnosis before restore to detect corruption/truncation")
|
||||
restoreSingleCmd.Flags().StringVar(&restoreSaveDebugLog, "save-debug-log", "", "Save detailed error report to file on failure (e.g., /tmp/restore-debug.json)")
|
||||
restoreSingleCmd.Flags().BoolVar(&restoreDebugLocks, "debug-locks", false, "Enable detailed lock debugging (captures PostgreSQL config, Guard decisions, boost attempts)")
|
||||
|
||||
// Cluster restore flags
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreListDBs, "list-databases", false, "List databases in cluster backup and exit")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreDatabase, "database", "", "Extract/restore single database from cluster")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreDatabases, "databases", "", "Extract multiple databases (comma-separated)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreOutputDir, "output-dir", "", "Extract to directory without restoring (requires --database or --databases)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreConfirm, "confirm", false, "Confirm and execute restore (required)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreDryRun, "dry-run", false, "Show what would be done without executing")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreForce, "force", false, "Skip safety checks and confirmations")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreCleanCluster, "clean-cluster", false, "Drop all existing user databases before restore (disaster recovery)")
|
||||
restoreClusterCmd.Flags().IntVar(&restoreJobs, "jobs", 0, "Number of parallel decompression jobs (0 = auto)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreProfile, "profile", "conservative", "Resource profile: conservative (single-threaded, prevents lock issues), balanced (auto-detect), aggressive (max speed)")
|
||||
restoreClusterCmd.Flags().IntVar(&restoreJobs, "jobs", 0, "Number of parallel decompression jobs (0 = auto, overrides profile)")
|
||||
restoreClusterCmd.Flags().IntVar(&restoreParallelDBs, "parallel-dbs", 0, "Number of databases to restore in parallel (0 = use profile, 1 = sequential, -1 = auto-detect, overrides profile)")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreWorkdir, "workdir", "", "Working directory for extraction (use when system disk is small, e.g. /mnt/storage/restore_tmp)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreVerbose, "verbose", false, "Show detailed restore progress")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreNoProgress, "no-progress", false, "Disable progress indicators")
|
||||
@ -296,6 +344,9 @@ func init() {
|
||||
restoreClusterCmd.Flags().StringVar(&restoreEncryptionKeyEnv, "encryption-key-env", "DBBACKUP_ENCRYPTION_KEY", "Environment variable containing encryption key")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreDiagnose, "diagnose", false, "Run deep diagnosis on all dumps before restore")
|
||||
restoreClusterCmd.Flags().StringVar(&restoreSaveDebugLog, "save-debug-log", "", "Save detailed error report to file on failure (e.g., /tmp/restore-debug.json)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreDebugLocks, "debug-locks", false, "Enable detailed lock debugging (captures PostgreSQL config, Guard decisions, boost attempts)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreClean, "clean", false, "Drop and recreate target database (for single DB restore)")
|
||||
restoreClusterCmd.Flags().BoolVar(&restoreCreate, "create", false, "Create target database if it doesn't exist (for single DB restore)")
|
||||
|
||||
// PITR restore flags
|
||||
restorePITRCmd.Flags().StringVar(&pitrBaseBackup, "base-backup", "", "Path to base backup file (.tar.gz) (required)")
|
||||
@ -434,6 +485,16 @@ func runRestoreDiagnose(cmd *cobra.Command, args []string) error {
|
||||
func runRestoreSingle(cmd *cobra.Command, args []string) error {
|
||||
archivePath := args[0]
|
||||
|
||||
// Apply resource profile
|
||||
if err := config.ApplyProfile(cfg, restoreProfile, restoreJobs, 0); err != nil {
|
||||
log.Warn("Invalid profile, using balanced", "error", err)
|
||||
restoreProfile = "balanced"
|
||||
_ = config.ApplyProfile(cfg, restoreProfile, restoreJobs, 0)
|
||||
}
|
||||
if cfg.Debug && restoreProfile != "balanced" {
|
||||
log.Info("Using restore profile", "profile", restoreProfile)
|
||||
}
|
||||
|
||||
// Check if this is a cloud URI
|
||||
var cleanupFunc func() error
|
||||
|
||||
@ -572,6 +633,12 @@ func runRestoreSingle(cmd *cobra.Command, args []string) error {
|
||||
log.Info("Debug logging enabled", "output", restoreSaveDebugLog)
|
||||
}
|
||||
|
||||
// Enable lock debugging if requested (single restore)
|
||||
if restoreDebugLocks {
|
||||
cfg.DebugLocks = true
|
||||
log.Info("🔍 Lock debugging enabled - will capture PostgreSQL lock config, Guard decisions, boost attempts")
|
||||
}
|
||||
|
||||
// Setup signal handling
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
defer cancel()
|
||||
@ -655,6 +722,203 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
return fmt.Errorf("archive not found: %s", archivePath)
|
||||
}
|
||||
|
||||
// Handle --list-databases flag
|
||||
if restoreListDBs {
|
||||
return runListDatabases(archivePath)
|
||||
}
|
||||
|
||||
// Handle single/multiple database extraction
|
||||
if restoreDatabase != "" || restoreDatabases != "" {
|
||||
return runExtractDatabases(archivePath)
|
||||
}
|
||||
|
||||
// Otherwise proceed with full cluster restore
|
||||
return runFullClusterRestore(archivePath)
|
||||
}
|
||||
|
||||
// runListDatabases lists all databases in a cluster backup
|
||||
func runListDatabases(archivePath string) error {
|
||||
ctx := context.Background()
|
||||
|
||||
log.Info("Scanning cluster backup", "archive", filepath.Base(archivePath))
|
||||
fmt.Println()
|
||||
|
||||
databases, err := restore.ListDatabasesInCluster(ctx, archivePath, log)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to list databases: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("📦 Databases in cluster backup:\n")
|
||||
var totalSize int64
|
||||
for _, db := range databases {
|
||||
sizeStr := formatSize(db.Size)
|
||||
fmt.Printf(" - %-30s (%s)\n", db.Name, sizeStr)
|
||||
totalSize += db.Size
|
||||
}
|
||||
|
||||
fmt.Printf("\nTotal: %s across %d database(s)\n", formatSize(totalSize), len(databases))
|
||||
return nil
|
||||
}
|
||||
|
||||
// runExtractDatabases extracts single or multiple databases from cluster backup
|
||||
func runExtractDatabases(archivePath string) error {
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
defer cancel()
|
||||
|
||||
// Setup signal handling
|
||||
sigChan := make(chan os.Signal, 1)
|
||||
signal.Notify(sigChan, os.Interrupt, syscall.SIGTERM)
|
||||
defer signal.Stop(sigChan)
|
||||
|
||||
go func() {
|
||||
<-sigChan
|
||||
log.Warn("Extraction interrupted by user")
|
||||
cancel()
|
||||
}()
|
||||
|
||||
// Single database extraction
|
||||
if restoreDatabase != "" {
|
||||
return handleSingleDatabaseExtraction(ctx, archivePath, restoreDatabase)
|
||||
}
|
||||
|
||||
// Multiple database extraction
|
||||
if restoreDatabases != "" {
|
||||
return handleMultipleDatabaseExtraction(ctx, archivePath, restoreDatabases)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// handleSingleDatabaseExtraction handles single database extraction or restore
|
||||
func handleSingleDatabaseExtraction(ctx context.Context, archivePath, dbName string) error {
|
||||
// Extract-only mode (no restore)
|
||||
if restoreOutputDir != "" {
|
||||
return extractSingleDatabase(ctx, archivePath, dbName, restoreOutputDir)
|
||||
}
|
||||
|
||||
// Restore mode
|
||||
if !restoreConfirm {
|
||||
fmt.Println("\n[DRY-RUN] DRY-RUN MODE - No changes will be made")
|
||||
fmt.Printf("\nWould extract and restore:\n")
|
||||
fmt.Printf(" Database: %s\n", dbName)
|
||||
fmt.Printf(" From: %s\n", archivePath)
|
||||
targetDB := restoreTarget
|
||||
if targetDB == "" {
|
||||
targetDB = dbName
|
||||
}
|
||||
fmt.Printf(" Target: %s\n", targetDB)
|
||||
if restoreClean {
|
||||
fmt.Printf(" Clean: true (drop and recreate)\n")
|
||||
}
|
||||
if restoreCreate {
|
||||
fmt.Printf(" Create: true (create if missing)\n")
|
||||
}
|
||||
fmt.Println("\nTo execute this restore, add --confirm flag")
|
||||
return nil
|
||||
}
|
||||
|
||||
// Create database instance
|
||||
db, err := database.New(cfg, log)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create database instance: %w", err)
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
// Create restore engine
|
||||
engine := restore.New(cfg, log, db)
|
||||
|
||||
// Determine target database name
|
||||
targetDB := restoreTarget
|
||||
if targetDB == "" {
|
||||
targetDB = dbName
|
||||
}
|
||||
|
||||
log.Info("Restoring single database from cluster", "database", dbName, "target", targetDB)
|
||||
|
||||
// Restore single database from cluster
|
||||
if err := engine.RestoreSingleFromCluster(ctx, archivePath, dbName, targetDB, restoreClean, restoreCreate); err != nil {
|
||||
return fmt.Errorf("restore failed: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("\n✅ Successfully restored '%s' as '%s'\n", dbName, targetDB)
|
||||
return nil
|
||||
}
|
||||
|
||||
// extractSingleDatabase extracts a single database without restoring
|
||||
func extractSingleDatabase(ctx context.Context, archivePath, dbName, outputDir string) error {
|
||||
log.Info("Extracting database", "database", dbName, "output", outputDir)
|
||||
|
||||
// Create progress indicator
|
||||
prog := progress.NewIndicator(!restoreNoProgress, "dots")
|
||||
|
||||
extractedPath, err := restore.ExtractDatabaseFromCluster(ctx, archivePath, dbName, outputDir, log, prog)
|
||||
if err != nil {
|
||||
return fmt.Errorf("extraction failed: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("\n✅ Extracted: %s\n", extractedPath)
|
||||
fmt.Printf(" Database: %s\n", dbName)
|
||||
fmt.Printf(" Location: %s\n", outputDir)
|
||||
return nil
|
||||
}
|
||||
|
||||
// handleMultipleDatabaseExtraction handles multiple database extraction
|
||||
func handleMultipleDatabaseExtraction(ctx context.Context, archivePath, databases string) error {
|
||||
if restoreOutputDir == "" {
|
||||
return fmt.Errorf("--output-dir required when using --databases")
|
||||
}
|
||||
|
||||
// Parse database list
|
||||
dbNames := strings.Split(databases, ",")
|
||||
for i := range dbNames {
|
||||
dbNames[i] = strings.TrimSpace(dbNames[i])
|
||||
}
|
||||
|
||||
log.Info("Extracting multiple databases", "count", len(dbNames), "output", restoreOutputDir)
|
||||
|
||||
// Create progress indicator
|
||||
prog := progress.NewIndicator(!restoreNoProgress, "dots")
|
||||
|
||||
extractedPaths, err := restore.ExtractMultipleDatabasesFromCluster(ctx, archivePath, dbNames, restoreOutputDir, log, prog)
|
||||
if err != nil {
|
||||
return fmt.Errorf("extraction failed: %w", err)
|
||||
}
|
||||
|
||||
fmt.Printf("\n✅ Extracted %d database(s):\n", len(extractedPaths))
|
||||
for dbName, path := range extractedPaths {
|
||||
fmt.Printf(" - %s → %s\n", dbName, filepath.Base(path))
|
||||
}
|
||||
fmt.Printf(" Location: %s\n", restoreOutputDir)
|
||||
return nil
|
||||
}
|
||||
|
||||
// runFullClusterRestore performs a full cluster restore
|
||||
func runFullClusterRestore(archivePath string) error {
|
||||
|
||||
// Apply resource profile
|
||||
if err := config.ApplyProfile(cfg, restoreProfile, restoreJobs, restoreParallelDBs); err != nil {
|
||||
log.Warn("Invalid profile, using balanced", "error", err)
|
||||
restoreProfile = "balanced"
|
||||
_ = config.ApplyProfile(cfg, restoreProfile, restoreJobs, restoreParallelDBs)
|
||||
}
|
||||
if cfg.Debug || restoreProfile != "balanced" {
|
||||
log.Info("Using restore profile", "profile", restoreProfile, "parallel_dbs", cfg.ClusterParallelism, "jobs", cfg.Jobs)
|
||||
}
|
||||
|
||||
// Convert to absolute path
|
||||
if !filepath.IsAbs(archivePath) {
|
||||
absPath, err := filepath.Abs(archivePath)
|
||||
if err != nil {
|
||||
return fmt.Errorf("invalid archive path: %w", err)
|
||||
}
|
||||
archivePath = absPath
|
||||
}
|
||||
|
||||
// Check if file exists
|
||||
if _, err := os.Stat(archivePath); err != nil {
|
||||
return fmt.Errorf("archive not found: %s", archivePath)
|
||||
}
|
||||
|
||||
// Check if backup is encrypted and decrypt if necessary
|
||||
if backup.IsBackupEncrypted(archivePath) {
|
||||
log.Info("Encrypted cluster backup detected, decrypting...")
|
||||
@ -783,6 +1047,17 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
}
|
||||
}
|
||||
|
||||
// Override cluster parallelism if --parallel-dbs is specified
|
||||
if restoreParallelDBs == -1 {
|
||||
// Auto-detect optimal parallelism based on system resources
|
||||
autoParallel := restore.CalculateOptimalParallel()
|
||||
cfg.ClusterParallelism = autoParallel
|
||||
log.Info("Auto-detected optimal parallelism for database restores", "parallel_dbs", autoParallel, "mode", "auto")
|
||||
} else if restoreParallelDBs > 0 {
|
||||
cfg.ClusterParallelism = restoreParallelDBs
|
||||
log.Info("Using custom parallelism for database restores", "parallel_dbs", restoreParallelDBs)
|
||||
}
|
||||
|
||||
// Create restore engine
|
||||
engine := restore.New(cfg, log, db)
|
||||
|
||||
@ -792,6 +1067,12 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
log.Info("Debug logging enabled", "output", restoreSaveDebugLog)
|
||||
}
|
||||
|
||||
// Enable lock debugging if requested (cluster restore)
|
||||
if restoreDebugLocks {
|
||||
cfg.DebugLocks = true
|
||||
log.Info("🔍 Lock debugging enabled - will capture PostgreSQL lock config, Guard decisions, boost attempts")
|
||||
}
|
||||
|
||||
// Setup signal handling
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
defer cancel()
|
||||
@ -827,22 +1108,50 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
log.Info("Database cleanup completed")
|
||||
}
|
||||
|
||||
// Run pre-restore diagnosis if requested
|
||||
if restoreDiagnose {
|
||||
log.Info("[DIAG] Running pre-restore diagnosis...")
|
||||
// OPTIMIZATION: Pre-extract archive once for both diagnosis and restore
|
||||
// This avoids extracting the same tar.gz twice (saves 5-10 min on large clusters)
|
||||
var extractedDir string
|
||||
var extractErr error
|
||||
|
||||
// Create temp directory for extraction in configured WorkDir
|
||||
workDir := cfg.GetEffectiveWorkDir()
|
||||
diagTempDir, err := os.MkdirTemp(workDir, "dbbackup-diagnose-*")
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create temp directory for diagnosis in %s: %w", workDir, err)
|
||||
if restoreDiagnose || restoreConfirm {
|
||||
log.Info("Pre-extracting cluster archive (shared for validation and restore)...")
|
||||
extractedDir, extractErr = safety.ValidateAndExtractCluster(ctx, archivePath)
|
||||
if extractErr != nil {
|
||||
return fmt.Errorf("failed to extract cluster archive: %w", extractErr)
|
||||
}
|
||||
defer os.RemoveAll(diagTempDir)
|
||||
defer os.RemoveAll(extractedDir) // Cleanup at end
|
||||
log.Info("Archive extracted successfully", "location", extractedDir)
|
||||
}
|
||||
|
||||
// Run pre-restore diagnosis if requested (using already-extracted directory)
|
||||
if restoreDiagnose {
|
||||
log.Info("[DIAG] Running pre-restore diagnosis on extracted dumps...")
|
||||
|
||||
diagnoser := restore.NewDiagnoser(log, restoreVerbose)
|
||||
results, err := diagnoser.DiagnoseClusterDumps(archivePath, diagTempDir)
|
||||
// Diagnose dumps directly from extracted directory
|
||||
dumpsDir := filepath.Join(extractedDir, "dumps")
|
||||
if _, err := os.Stat(dumpsDir); err != nil {
|
||||
return fmt.Errorf("no dumps directory found in extracted archive: %w", err)
|
||||
}
|
||||
|
||||
entries, err := os.ReadDir(dumpsDir)
|
||||
if err != nil {
|
||||
return fmt.Errorf("diagnosis failed: %w", err)
|
||||
return fmt.Errorf("failed to read dumps directory: %w", err)
|
||||
}
|
||||
|
||||
// Diagnose each dump file
|
||||
var results []*restore.DiagnoseResult
|
||||
for _, entry := range entries {
|
||||
if entry.IsDir() {
|
||||
continue
|
||||
}
|
||||
dumpPath := filepath.Join(dumpsDir, entry.Name())
|
||||
result, err := diagnoser.DiagnoseFile(dumpPath)
|
||||
if err != nil {
|
||||
log.Warn("Could not diagnose dump", "file", entry.Name(), "error", err)
|
||||
continue
|
||||
}
|
||||
results = append(results, result)
|
||||
}
|
||||
|
||||
// Check for any invalid dumps
|
||||
@ -882,7 +1191,8 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
|
||||
startTime := time.Now()
|
||||
auditLogger.LogRestoreStart(user, "all_databases", archivePath)
|
||||
|
||||
if err := engine.RestoreCluster(ctx, archivePath); err != nil {
|
||||
// Pass pre-extracted directory to avoid double extraction
|
||||
if err := engine.RestoreCluster(ctx, archivePath, extractedDir); err != nil {
|
||||
auditLogger.LogRestoreFailed(user, "all_databases", err)
|
||||
return fmt.Errorf("cluster restore failed: %w", err)
|
||||
}
|
||||
|
||||
@ -134,6 +134,7 @@ func Execute(ctx context.Context, config *config.Config, logger logger.Logger) e
|
||||
rootCmd.PersistentFlags().StringVar(&cfg.BackupDir, "backup-dir", cfg.BackupDir, "Backup directory")
|
||||
rootCmd.PersistentFlags().BoolVar(&cfg.NoColor, "no-color", cfg.NoColor, "Disable colored output")
|
||||
rootCmd.PersistentFlags().BoolVar(&cfg.Debug, "debug", cfg.Debug, "Enable debug logging")
|
||||
rootCmd.PersistentFlags().BoolVar(&cfg.DebugLocks, "debug-locks", cfg.DebugLocks, "Enable detailed lock debugging (captures PostgreSQL lock configuration, Large DB Guard decisions, boost attempts)")
|
||||
rootCmd.PersistentFlags().IntVar(&cfg.Jobs, "jobs", cfg.Jobs, "Number of parallel jobs")
|
||||
rootCmd.PersistentFlags().IntVar(&cfg.DumpJobs, "dump-jobs", cfg.DumpJobs, "Number of parallel dump jobs")
|
||||
rootCmd.PersistentFlags().IntVar(&cfg.MaxCores, "max-cores", cfg.MaxCores, "Maximum CPU cores to use")
|
||||
|
||||
359
diagnose_postgres_memory.sh
Executable file
359
diagnose_postgres_memory.sh
Executable file
@ -0,0 +1,359 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# PostgreSQL Memory and Resource Diagnostic Tool
|
||||
# Analyzes memory usage, locks, and system resources to identify restore issues
|
||||
#
|
||||
|
||||
set -e
|
||||
|
||||
# Colors for output
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " PostgreSQL Memory & Resource Diagnostics"
|
||||
echo " $(date '+%Y-%m-%d %H:%M:%S')"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo
|
||||
|
||||
# Function to format bytes to human readable
|
||||
bytes_to_human() {
|
||||
local bytes=$1
|
||||
if [ "$bytes" -ge 1073741824 ]; then
|
||||
echo "$(awk "BEGIN {printf \"%.2f GB\", $bytes/1073741824}")"
|
||||
elif [ "$bytes" -ge 1048576 ]; then
|
||||
echo "$(awk "BEGIN {printf \"%.2f MB\", $bytes/1048576}")"
|
||||
else
|
||||
echo "$(awk "BEGIN {printf \"%.2f KB\", $bytes/1024}")"
|
||||
fi
|
||||
}
|
||||
|
||||
# 1. SYSTEM MEMORY OVERVIEW
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}📊 SYSTEM MEMORY OVERVIEW${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
if command -v free &> /dev/null; then
|
||||
free -h
|
||||
echo
|
||||
|
||||
# Calculate percentages
|
||||
MEM_TOTAL=$(free -b | awk '/^Mem:/ {print $2}')
|
||||
MEM_USED=$(free -b | awk '/^Mem:/ {print $3}')
|
||||
MEM_FREE=$(free -b | awk '/^Mem:/ {print $4}')
|
||||
MEM_AVAILABLE=$(free -b | awk '/^Mem:/ {print $7}')
|
||||
|
||||
MEM_PERCENT=$(awk "BEGIN {printf \"%.1f\", ($MEM_USED/$MEM_TOTAL)*100}")
|
||||
|
||||
echo "Memory Utilization: ${MEM_PERCENT}%"
|
||||
echo "Total: $(bytes_to_human $MEM_TOTAL)"
|
||||
echo "Used: $(bytes_to_human $MEM_USED)"
|
||||
echo "Available: $(bytes_to_human $MEM_AVAILABLE)"
|
||||
|
||||
if (( $(echo "$MEM_PERCENT > 90" | bc -l) )); then
|
||||
echo -e "${RED}⚠️ WARNING: Memory usage is critically high (>90%)${NC}"
|
||||
elif (( $(echo "$MEM_PERCENT > 70" | bc -l) )); then
|
||||
echo -e "${YELLOW}⚠️ CAUTION: Memory usage is high (>70%)${NC}"
|
||||
else
|
||||
echo -e "${GREEN}✓ Memory usage is acceptable${NC}"
|
||||
fi
|
||||
fi
|
||||
echo
|
||||
|
||||
# 2. TOP MEMORY CONSUMERS
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}🔍 TOP 15 MEMORY CONSUMING PROCESSES${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
ps aux --sort=-%mem | head -16 | awk 'NR==1 {print $0} NR>1 {printf "%-8s %5s%% %7s %s\n", $1, $4, $6/1024"M", $11}'
|
||||
echo
|
||||
|
||||
# 3. POSTGRESQL PROCESSES
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}🐘 POSTGRESQL PROCESSES${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
PG_PROCS=$(ps aux | grep -E "postgres.*:" | grep -v grep || true)
|
||||
if [ -z "$PG_PROCS" ]; then
|
||||
echo "No PostgreSQL processes found"
|
||||
else
|
||||
echo "$PG_PROCS" | awk '{printf "%-8s %5s%% %7s %s\n", $1, $4, $6/1024"M", $11}'
|
||||
echo
|
||||
|
||||
# Sum up PostgreSQL memory
|
||||
PG_MEM_TOTAL=$(echo "$PG_PROCS" | awk '{sum+=$6} END {print sum/1024}')
|
||||
echo "Total PostgreSQL Memory: ${PG_MEM_TOTAL} MB"
|
||||
fi
|
||||
echo
|
||||
|
||||
# 4. POSTGRESQL CONFIGURATION
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}⚙️ POSTGRESQL MEMORY CONFIGURATION${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
if command -v psql &> /dev/null; then
|
||||
PSQL_CMD="psql -t -A -c"
|
||||
|
||||
# Try as postgres user first, then current user
|
||||
if sudo -u postgres $PSQL_CMD "SELECT 1" &> /dev/null; then
|
||||
PSQL_PREFIX="sudo -u postgres"
|
||||
elif $PSQL_CMD "SELECT 1" &> /dev/null; then
|
||||
PSQL_PREFIX=""
|
||||
else
|
||||
echo "❌ Cannot connect to PostgreSQL"
|
||||
PSQL_PREFIX="NONE"
|
||||
fi
|
||||
|
||||
if [ "$PSQL_PREFIX" != "NONE" ]; then
|
||||
echo "Key Memory Settings:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
|
||||
# Get all relevant settings (strip timing output)
|
||||
SHARED_BUFFERS=$($PSQL_PREFIX psql -t -A -c "SHOW shared_buffers;" 2>/dev/null | head -1 || echo "unknown")
|
||||
WORK_MEM=$($PSQL_PREFIX psql -t -A -c "SHOW work_mem;" 2>/dev/null | head -1 || echo "unknown")
|
||||
MAINT_WORK_MEM=$($PSQL_PREFIX psql -t -A -c "SHOW maintenance_work_mem;" 2>/dev/null | head -1 || echo "unknown")
|
||||
EFFECTIVE_CACHE=$($PSQL_PREFIX psql -t -A -c "SHOW effective_cache_size;" 2>/dev/null | head -1 || echo "unknown")
|
||||
MAX_CONNECTIONS=$($PSQL_PREFIX psql -t -A -c "SHOW max_connections;" 2>/dev/null | head -1 || echo "unknown")
|
||||
MAX_LOCKS=$($PSQL_PREFIX psql -t -A -c "SHOW max_locks_per_transaction;" 2>/dev/null | head -1 || echo "unknown")
|
||||
MAX_PREPARED=$($PSQL_PREFIX psql -t -A -c "SHOW max_prepared_transactions;" 2>/dev/null | head -1 || echo "unknown")
|
||||
|
||||
echo "shared_buffers: $SHARED_BUFFERS"
|
||||
echo "work_mem: $WORK_MEM"
|
||||
echo "maintenance_work_mem: $MAINT_WORK_MEM"
|
||||
echo "effective_cache_size: $EFFECTIVE_CACHE"
|
||||
echo "max_connections: $MAX_CONNECTIONS"
|
||||
echo "max_locks_per_transaction: $MAX_LOCKS"
|
||||
echo "max_prepared_transactions: $MAX_PREPARED"
|
||||
echo
|
||||
|
||||
# Calculate lock capacity
|
||||
if [ "$MAX_LOCKS" != "unknown" ] && [ "$MAX_CONNECTIONS" != "unknown" ] && [ "$MAX_PREPARED" != "unknown" ]; then
|
||||
# Ensure values are numeric
|
||||
if [[ "$MAX_LOCKS" =~ ^[0-9]+$ ]] && [[ "$MAX_CONNECTIONS" =~ ^[0-9]+$ ]] && [[ "$MAX_PREPARED" =~ ^[0-9]+$ ]]; then
|
||||
LOCK_CAPACITY=$((MAX_LOCKS * (MAX_CONNECTIONS + MAX_PREPARED)))
|
||||
echo "Total Lock Capacity: $LOCK_CAPACITY locks"
|
||||
|
||||
if [ "$MAX_LOCKS" -lt 1000 ]; then
|
||||
echo -e "${RED}⚠️ WARNING: max_locks_per_transaction is too low for large restores${NC}"
|
||||
echo -e "${YELLOW} Recommended: 4096 or higher${NC}"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
echo
|
||||
fi
|
||||
else
|
||||
echo "❌ psql not found"
|
||||
fi
|
||||
|
||||
# 5. CURRENT LOCKS AND CONNECTIONS
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}🔒 CURRENT LOCKS AND CONNECTIONS${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
if [ "$PSQL_PREFIX" != "NONE" ] && command -v psql &> /dev/null; then
|
||||
# Active connections
|
||||
ACTIVE_CONNS=$($PSQL_PREFIX psql -t -A -c "SELECT count(*) FROM pg_stat_activity;" 2>/dev/null | head -1 || echo "0")
|
||||
echo "Active Connections: $ACTIVE_CONNS / $MAX_CONNECTIONS"
|
||||
echo
|
||||
|
||||
# Lock statistics
|
||||
echo "Current Lock Usage:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
$PSQL_PREFIX psql -c "
|
||||
SELECT
|
||||
mode,
|
||||
COUNT(*) as count
|
||||
FROM pg_locks
|
||||
GROUP BY mode
|
||||
ORDER BY count DESC;
|
||||
" 2>/dev/null || echo "Unable to query locks"
|
||||
echo
|
||||
|
||||
# Total locks
|
||||
TOTAL_LOCKS=$($PSQL_PREFIX psql -t -A -c "SELECT COUNT(*) FROM pg_locks;" 2>/dev/null | head -1 || echo "0")
|
||||
echo "Total Active Locks: $TOTAL_LOCKS"
|
||||
|
||||
if [ ! -z "$LOCK_CAPACITY" ] && [ ! -z "$TOTAL_LOCKS" ] && [[ "$TOTAL_LOCKS" =~ ^[0-9]+$ ]] && [ "$TOTAL_LOCKS" -gt 0 ] 2>/dev/null; then
|
||||
LOCK_PERCENT=$((TOTAL_LOCKS * 100 / LOCK_CAPACITY))
|
||||
echo "Lock Usage: ${LOCK_PERCENT}%"
|
||||
|
||||
if [ "$LOCK_PERCENT" -gt 80 ]; then
|
||||
echo -e "${RED}⚠️ WARNING: Lock table usage is critically high${NC}"
|
||||
elif [ "$LOCK_PERCENT" -gt 60 ]; then
|
||||
echo -e "${YELLOW}⚠️ CAUTION: Lock table usage is elevated${NC}"
|
||||
fi
|
||||
fi
|
||||
echo
|
||||
|
||||
# Blocking queries
|
||||
echo "Blocking Queries:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
$PSQL_PREFIX psql -c "
|
||||
SELECT
|
||||
blocked_locks.pid AS blocked_pid,
|
||||
blocking_locks.pid AS blocking_pid,
|
||||
blocked_activity.usename AS blocked_user,
|
||||
blocking_activity.usename AS blocking_user,
|
||||
blocked_activity.query AS blocked_query
|
||||
FROM pg_catalog.pg_locks blocked_locks
|
||||
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
|
||||
JOIN pg_catalog.pg_locks blocking_locks
|
||||
ON blocking_locks.locktype = blocked_locks.locktype
|
||||
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
|
||||
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
|
||||
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
|
||||
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
|
||||
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
|
||||
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
|
||||
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
|
||||
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
|
||||
AND blocking_locks.pid != blocked_locks.pid
|
||||
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
|
||||
WHERE NOT blocked_locks.granted;
|
||||
" 2>/dev/null || echo "No blocking queries or unable to query"
|
||||
echo
|
||||
fi
|
||||
|
||||
# 6. SHARED MEMORY USAGE
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}💾 SHARED MEMORY SEGMENTS${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
if command -v ipcs &> /dev/null; then
|
||||
ipcs -m
|
||||
echo
|
||||
|
||||
# Sum up shared memory
|
||||
TOTAL_SHM=$(ipcs -m | awk '/^0x/ {sum+=$5} END {print sum}')
|
||||
if [ ! -z "$TOTAL_SHM" ]; then
|
||||
echo "Total Shared Memory: $(bytes_to_human $TOTAL_SHM)"
|
||||
fi
|
||||
else
|
||||
echo "ipcs command not available"
|
||||
fi
|
||||
echo
|
||||
|
||||
# 7. DISK SPACE (relevant for temp files)
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}💿 DISK SPACE${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
df -h | grep -E "Filesystem|/$|/var|/tmp|/postgres"
|
||||
echo
|
||||
|
||||
# Check for PostgreSQL temp files
|
||||
if [ "$PSQL_PREFIX" != "NONE" ] && command -v psql &> /dev/null; then
|
||||
TEMP_FILES=$($PSQL_PREFIX psql -t -A -c "SELECT count(*) FROM pg_stat_database WHERE temp_files > 0;" 2>/dev/null | head -1 || echo "0")
|
||||
if [ ! -z "$TEMP_FILES" ] && [ "$TEMP_FILES" -gt 0 ] 2>/dev/null; then
|
||||
echo -e "${YELLOW}⚠️ Databases are using temporary files (work_mem may be too low)${NC}"
|
||||
$PSQL_PREFIX psql -c "SELECT datname, temp_files, pg_size_pretty(temp_bytes) as temp_size FROM pg_stat_database WHERE temp_files > 0;" 2>/dev/null
|
||||
echo
|
||||
fi
|
||||
fi
|
||||
|
||||
# 8. OTHER RESOURCE CONSUMERS
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}🔍 OTHER POTENTIAL MEMORY CONSUMERS${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
# Check for common memory hogs
|
||||
echo "Checking for common memory-intensive services..."
|
||||
echo
|
||||
|
||||
for service in "mysqld" "mongodb" "redis" "elasticsearch" "java" "docker" "containerd"; do
|
||||
MEM=$(ps aux | grep "$service" | grep -v grep | awk '{sum+=$4} END {printf "%.1f", sum}')
|
||||
if [ ! -z "$MEM" ] && (( $(echo "$MEM > 0" | bc -l) )); then
|
||||
echo " ${service}: ${MEM}%"
|
||||
fi
|
||||
done
|
||||
echo
|
||||
|
||||
# 9. SWAP USAGE
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}🔄 SWAP USAGE${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
if command -v free &> /dev/null; then
|
||||
SWAP_TOTAL=$(free -b | awk '/^Swap:/ {print $2}')
|
||||
SWAP_USED=$(free -b | awk '/^Swap:/ {print $3}')
|
||||
|
||||
if [ "$SWAP_TOTAL" -gt 0 ]; then
|
||||
SWAP_PERCENT=$(awk "BEGIN {printf \"%.1f\", ($SWAP_USED/$SWAP_TOTAL)*100}")
|
||||
echo "Swap Total: $(bytes_to_human $SWAP_TOTAL)"
|
||||
echo "Swap Used: $(bytes_to_human $SWAP_USED) (${SWAP_PERCENT}%)"
|
||||
|
||||
if (( $(echo "$SWAP_PERCENT > 50" | bc -l) )); then
|
||||
echo -e "${RED}⚠️ WARNING: Heavy swap usage detected - system may be thrashing${NC}"
|
||||
elif (( $(echo "$SWAP_PERCENT > 20" | bc -l) )); then
|
||||
echo -e "${YELLOW}⚠️ CAUTION: System is using swap${NC}"
|
||||
else
|
||||
echo -e "${GREEN}✓ Swap usage is low${NC}"
|
||||
fi
|
||||
else
|
||||
echo "No swap configured"
|
||||
fi
|
||||
fi
|
||||
echo
|
||||
|
||||
# 10. RECOMMENDATIONS
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo -e "${BLUE}💡 RECOMMENDATIONS${NC}"
|
||||
echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
|
||||
echo
|
||||
|
||||
echo "Based on the diagnostics:"
|
||||
echo
|
||||
|
||||
# Memory recommendations
|
||||
if [ ! -z "$MEM_PERCENT" ]; then
|
||||
if (( $(echo "$MEM_PERCENT > 80" | bc -l) )); then
|
||||
echo "1. ⚠️ Memory Pressure:"
|
||||
echo " • System memory is ${MEM_PERCENT}% utilized"
|
||||
echo " • Stop non-essential services before restore"
|
||||
echo " • Consider increasing system RAM"
|
||||
echo " • Use 'dbbackup restore --parallel=1' to reduce memory usage"
|
||||
echo
|
||||
fi
|
||||
fi
|
||||
|
||||
# Lock recommendations
|
||||
if [ "$MAX_LOCKS" != "unknown" ] && [ ! -z "$MAX_LOCKS" ] && [[ "$MAX_LOCKS" =~ ^[0-9]+$ ]]; then
|
||||
if [ "$MAX_LOCKS" -lt 1000 ] 2>/dev/null; then
|
||||
echo "2. ⚠️ Lock Configuration:"
|
||||
echo " • max_locks_per_transaction is too low: $MAX_LOCKS"
|
||||
echo " • Run: ./fix_postgres_locks.sh"
|
||||
echo " • Or manually: ALTER SYSTEM SET max_locks_per_transaction = 4096;"
|
||||
echo " • Then restart PostgreSQL"
|
||||
echo
|
||||
fi
|
||||
fi
|
||||
|
||||
# Other recommendations
|
||||
echo "3. 🔧 Before Large Restores:"
|
||||
echo " • Stop unnecessary services (web servers, cron jobs, etc.)"
|
||||
echo " • Clear PostgreSQL idle connections"
|
||||
echo " • Ensure adequate disk space for temp files"
|
||||
echo " • Consider using --large-db mode for very large databases"
|
||||
echo
|
||||
|
||||
echo "4. 📊 Monitor During Restore:"
|
||||
echo " • Watch: watch -n 2 'ps aux | grep postgres | head -20'"
|
||||
echo " • Locks: watch -n 5 'psql -c \"SELECT COUNT(*) FROM pg_locks;\"'"
|
||||
echo " • Memory: watch -n 2 free -h"
|
||||
echo
|
||||
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " Report generated: $(date '+%Y-%m-%d %H:%M:%S')"
|
||||
echo " Save this output: $0 > diagnosis_$(date +%Y%m%d_%H%M%S).log"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
112
email_infra_team.txt
Normal file
112
email_infra_team.txt
Normal file
@ -0,0 +1,112 @@
|
||||
Betreff: PostgreSQL Restore Fehler - "out of shared memory" auf RST Server
|
||||
|
||||
Hallo Infra-Team,
|
||||
|
||||
wir haben auf dem RST PostgreSQL Server (PostgreSQL 17.4) wiederholt Restore-Fehler mit "out of shared memory" Meldungen.
|
||||
|
||||
═══════════════════════════════════════════════════════════
|
||||
ANALYSE (Stand: 20.01.2026)
|
||||
═══════════════════════════════════════════════════════════
|
||||
|
||||
Server-Specs:
|
||||
• RAM: 31 GB (aktuell 19.6 GB belegt = 63.9%)
|
||||
• PostgreSQL nutzt nur ~118 MB für eigene Prozesse
|
||||
• Swap: 4 GB (6.4% genutzt)
|
||||
|
||||
Lock-Konfiguration:
|
||||
• max_locks_per_transaction: 4096 ✓ (bereits korrekt)
|
||||
• max_connections: 100
|
||||
• Lock Capacity: 409.600 ✓ (ausreichend)
|
||||
|
||||
═══════════════════════════════════════════════════════════
|
||||
PROBLEM-IDENTIFIKATION
|
||||
═══════════════════════════════════════════════════════════
|
||||
|
||||
1. MEMORY CONSUMER (nicht-PostgreSQL):
|
||||
• Nessus Agent: ~173 MB
|
||||
• Elastic Agent: ~300 MB (mehrere Komponenten)
|
||||
• Icinga: ~24 MB
|
||||
• Weitere Monitoring: ~100+ MB
|
||||
|
||||
2. WORK_MEM ZU NIEDRIG:
|
||||
• Aktuell: 64 MB
|
||||
• 4 Datenbanken nutzen Temp-Files (Indikator für zu wenig work_mem):
|
||||
- prodkc: 201 MB temp files
|
||||
- keycloak: 45 MB temp files
|
||||
- d7030: 6 MB temp files
|
||||
- pgbench_db: 2 MB temp files
|
||||
|
||||
═══════════════════════════════════════════════════════════
|
||||
EMPFOHLENE MASSNAHMEN
|
||||
═══════════════════════════════════════════════════════════
|
||||
|
||||
VARIANTE A - Temporär für große Restores:
|
||||
-------------------------------------------
|
||||
1. Monitoring-Agents stoppen (frei: ~500 MB):
|
||||
sudo systemctl stop nessus-agent
|
||||
sudo systemctl stop elastic-agent
|
||||
|
||||
2. work_mem erhöhen:
|
||||
sudo -u postgres psql -c "ALTER SYSTEM SET work_mem = '256MB';"
|
||||
sudo systemctl restart postgresql
|
||||
|
||||
3. Restore durchführen
|
||||
|
||||
4. Agents wieder starten:
|
||||
sudo systemctl start nessus-agent
|
||||
sudo systemctl start elastic-agent
|
||||
|
||||
|
||||
VARIANTE B - Permanente Lösung:
|
||||
-------------------------------------------
|
||||
1. work_mem auf 256 MB erhöhen (statt 64 MB)
|
||||
2. maintenance_work_mem optional auf 4 GB erhöhen (statt 2 GB)
|
||||
3. Falls möglich: Monitoring auf dedizierten Server verschieben
|
||||
|
||||
SQL-Befehle:
|
||||
ALTER SYSTEM SET work_mem = '256MB';
|
||||
ALTER SYSTEM SET maintenance_work_mem = '4GB';
|
||||
-- Anschließend PostgreSQL restart
|
||||
|
||||
|
||||
VARIANTE C - Falls keine Config-Änderung möglich:
|
||||
-------------------------------------------
|
||||
• Restore mit --profile=conservative durchführen (reduziert Memory-Druck)
|
||||
dbbackup restore cluster backup.tar.gz --profile=conservative --confirm
|
||||
|
||||
• Oder TUI-Modus nutzen (verwendet automatisch conservative profile):
|
||||
dbbackup interactive
|
||||
|
||||
• Monitoring während Restore-Fenster deaktivieren
|
||||
|
||||
═══════════════════════════════════════════════════════════
|
||||
DETAIL-REPORT
|
||||
═══════════════════════════════════════════════════════════
|
||||
|
||||
Vollständiger Diagnose-Report liegt bei bzw. kann jederzeit mit
|
||||
diesem Script generiert werden:
|
||||
|
||||
/path/to/diagnose_postgres_memory.sh
|
||||
|
||||
Das Script analysiert:
|
||||
• System Memory Usage
|
||||
• PostgreSQL Konfiguration
|
||||
• Lock Usage
|
||||
• Temp File Usage
|
||||
• Blocking Queries
|
||||
• Shared Memory Segments
|
||||
|
||||
═══════════════════════════════════════════════════════════
|
||||
|
||||
Bevorzugt wäre Variante B (permanente work_mem Erhöhung), damit künftige
|
||||
große Restores ohne manuelle Eingriffe durchlaufen.
|
||||
|
||||
Bitte um Rückmeldung, welche Variante ihr umsetzt bzw. ob ihr weitere
|
||||
Infos benötigt.
|
||||
|
||||
Danke & Grüße
|
||||
[Dein Name]
|
||||
|
||||
---
|
||||
Anhang: diagnose_postgres_memory.sh (falls nicht vorhanden)
|
||||
Error Log: /a01/dba/tmp/dbbackup-restore-debug-20260119-221730.json
|
||||
140
fix_postgres_locks.sh
Executable file
140
fix_postgres_locks.sh
Executable file
@ -0,0 +1,140 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# Fix PostgreSQL Lock Table Exhaustion
|
||||
# Increases max_locks_per_transaction to handle large database restores
|
||||
#
|
||||
|
||||
set -e
|
||||
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " PostgreSQL Lock Configuration Fix"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo
|
||||
|
||||
# Check if running as postgres user or with sudo
|
||||
if [ "$EUID" -ne 0 ] && [ "$(whoami)" != "postgres" ]; then
|
||||
echo "⚠️ This script should be run as:"
|
||||
echo " sudo $0"
|
||||
echo " or as the postgres user"
|
||||
echo
|
||||
read -p "Continue anyway? (y/N) " -n 1 -r
|
||||
echo
|
||||
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
# Detect PostgreSQL version and config
|
||||
PSQL=$(command -v psql || echo "")
|
||||
if [ -z "$PSQL" ]; then
|
||||
echo "❌ psql not found in PATH"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "📊 Current PostgreSQL Configuration:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
sudo -u postgres psql -c "SHOW max_locks_per_transaction;" 2>/dev/null || psql -c "SHOW max_locks_per_transaction;" || echo "Unable to query current value"
|
||||
sudo -u postgres psql -c "SHOW max_connections;" 2>/dev/null || psql -c "SHOW max_connections;" || echo "Unable to query current value"
|
||||
sudo -u postgres psql -c "SHOW work_mem;" 2>/dev/null || psql -c "SHOW work_mem;" || echo "Unable to query current value"
|
||||
sudo -u postgres psql -c "SHOW maintenance_work_mem;" 2>/dev/null || psql -c "SHOW maintenance_work_mem;" || echo "Unable to query current value"
|
||||
echo
|
||||
|
||||
# Recommended values
|
||||
RECOMMENDED_LOCKS=4096
|
||||
RECOMMENDED_WORK_MEM="256MB"
|
||||
RECOMMENDED_MAINTENANCE_WORK_MEM="4GB"
|
||||
|
||||
echo "🔧 Applying Fixes:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
echo "1. Setting max_locks_per_transaction = $RECOMMENDED_LOCKS"
|
||||
echo "2. Setting work_mem = $RECOMMENDED_WORK_MEM (improves query performance)"
|
||||
echo "3. Setting maintenance_work_mem = $RECOMMENDED_MAINTENANCE_WORK_MEM (speeds up restore/vacuum)"
|
||||
echo
|
||||
|
||||
# Apply the settings
|
||||
SUCCESS=0
|
||||
|
||||
# Fix 1: max_locks_per_transaction
|
||||
if sudo -u postgres psql -c "ALTER SYSTEM SET max_locks_per_transaction = $RECOMMENDED_LOCKS;" 2>/dev/null; then
|
||||
echo "✅ max_locks_per_transaction updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
elif psql -c "ALTER SYSTEM SET max_locks_per_transaction = $RECOMMENDED_LOCKS;" 2>/dev/null; then
|
||||
echo "✅ max_locks_per_transaction updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
else
|
||||
echo "❌ Failed to update max_locks_per_transaction"
|
||||
fi
|
||||
|
||||
# Fix 2: work_mem
|
||||
if sudo -u postgres psql -c "ALTER SYSTEM SET work_mem = '$RECOMMENDED_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
elif psql -c "ALTER SYSTEM SET work_mem = '$RECOMMENDED_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
else
|
||||
echo "❌ Failed to update work_mem"
|
||||
fi
|
||||
|
||||
# Fix 3: maintenance_work_mem
|
||||
if sudo -u postgres psql -c "ALTER SYSTEM SET maintenance_work_mem = '$RECOMMENDED_MAINTENANCE_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ maintenance_work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
elif psql -c "ALTER SYSTEM SET maintenance_work_mem = '$RECOMMENDED_MAINTENANCE_WORK_MEM';" 2>/dev/null; then
|
||||
echo "✅ maintenance_work_mem updated successfully"
|
||||
SUCCESS=$((SUCCESS + 1))
|
||||
else
|
||||
echo "❌ Failed to update maintenance_work_mem"
|
||||
fi
|
||||
|
||||
if [ $SUCCESS -eq 0 ]; then
|
||||
echo
|
||||
echo "❌ All configuration updates failed"
|
||||
echo
|
||||
echo "Manual steps:"
|
||||
echo "1. Connect to PostgreSQL as superuser:"
|
||||
echo " sudo -u postgres psql"
|
||||
echo
|
||||
echo "2. Run these commands:"
|
||||
echo " ALTER SYSTEM SET max_locks_per_transaction = $RECOMMENDED_LOCKS;"
|
||||
echo " ALTER SYSTEM SET work_mem = '$RECOMMENDED_WORK_MEM';"
|
||||
echo " ALTER SYSTEM SET maintenance_work_mem = '$RECOMMENDED_MAINTENANCE_WORK_MEM';"
|
||||
echo
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo
|
||||
echo "✅ Applied $SUCCESS out of 3 configuration changes"
|
||||
|
||||
echo
|
||||
echo "⚠️ IMPORTANT: PostgreSQL restart required!"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
echo
|
||||
echo "Restart PostgreSQL using one of these commands:"
|
||||
echo
|
||||
echo " • systemd: sudo systemctl restart postgresql"
|
||||
echo " • pg_ctl: sudo -u postgres pg_ctl restart -D /var/lib/postgresql/data"
|
||||
echo " • service: sudo service postgresql restart"
|
||||
echo
|
||||
echo "📊 Expected capacity after restart:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
echo " Lock capacity: max_locks_per_transaction × (max_connections + max_prepared)"
|
||||
echo " = $RECOMMENDED_LOCKS × (connections + prepared)"
|
||||
echo
|
||||
echo " Work memory: $RECOMMENDED_WORK_MEM per query operation"
|
||||
echo " Maintenance: $RECOMMENDED_MAINTENANCE_WORK_MEM for restore/vacuum/index"
|
||||
echo
|
||||
echo "After restarting, verify with:"
|
||||
echo " psql -c 'SHOW max_locks_per_transaction;'"
|
||||
echo " psql -c 'SHOW work_mem;'"
|
||||
echo " psql -c 'SHOW maintenance_work_mem;'"
|
||||
echo
|
||||
echo "💡 Benefits:"
|
||||
echo " ✓ Prevents 'out of shared memory' errors during restore"
|
||||
echo " ✓ Reduces temp file usage (better performance)"
|
||||
echo " ✓ Faster restore, vacuum, and index operations"
|
||||
echo
|
||||
echo "🔍 For comprehensive diagnostics, run:"
|
||||
echo " ./diagnose_postgres_memory.sh"
|
||||
echo
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
@ -94,7 +94,7 @@
|
||||
"uid": "${DS_PROMETHEUS}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < 86400",
|
||||
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < bool 604800",
|
||||
"legendFormat": "{{database}}",
|
||||
"range": true,
|
||||
"refId": "A"
|
||||
@ -711,19 +711,6 @@
|
||||
},
|
||||
"pluginVersion": "10.2.0",
|
||||
"targets": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "${DS_PROMETHEUS}"
|
||||
},
|
||||
"editorMode": "code",
|
||||
"expr": "dbbackup_rpo_seconds{instance=~\"$instance\"} < 86400",
|
||||
"format": "table",
|
||||
"instant": true,
|
||||
"legendFormat": "__auto",
|
||||
"range": false,
|
||||
"refId": "Status"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
@ -769,26 +756,30 @@
|
||||
"Time": true,
|
||||
"Time 1": true,
|
||||
"Time 2": true,
|
||||
"Time 3": true,
|
||||
"__name__": true,
|
||||
"__name__ 1": true,
|
||||
"__name__ 2": true,
|
||||
"__name__ 3": true,
|
||||
"instance 1": true,
|
||||
"instance 2": true,
|
||||
"instance 3": true,
|
||||
"job": true,
|
||||
"job 1": true,
|
||||
"job 2": true,
|
||||
"job 3": true
|
||||
"engine 1": true,
|
||||
"engine 2": true
|
||||
},
|
||||
"indexByName": {
|
||||
"Database": 0,
|
||||
"Instance": 1,
|
||||
"Engine": 2,
|
||||
"RPO": 3,
|
||||
"Size": 4
|
||||
},
|
||||
"indexByName": {},
|
||||
"renameByName": {
|
||||
"Value #RPO": "RPO",
|
||||
"Value #Size": "Size",
|
||||
"Value #Status": "Status",
|
||||
"database": "Database",
|
||||
"instance": "Instance"
|
||||
"instance": "Instance",
|
||||
"engine": "Engine"
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -1275,7 +1266,7 @@
|
||||
"query": "label_values(dbbackup_rpo_seconds, instance)",
|
||||
"refId": "StandardVariableQuery"
|
||||
},
|
||||
"refresh": 1,
|
||||
"refresh": 2,
|
||||
"regex": "",
|
||||
"skipUrlSync": false,
|
||||
"sort": 1,
|
||||
|
||||
@ -84,19 +84,13 @@ func findHbaFileViaPostgres() string {
|
||||
|
||||
// parsePgHbaConf parses pg_hba.conf and returns the authentication method
|
||||
func parsePgHbaConf(path string, user string) AuthMethod {
|
||||
// Try with sudo if we can't read directly
|
||||
// Try to read the file directly - do NOT use sudo as it triggers password prompts
|
||||
// If we can't read pg_hba.conf, we'll rely on connection attempts to determine auth
|
||||
file, err := os.Open(path)
|
||||
if err != nil {
|
||||
// Try with sudo (with timeout)
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||
defer cancel()
|
||||
|
||||
cmd := exec.CommandContext(ctx, "sudo", "cat", path)
|
||||
output, err := cmd.Output()
|
||||
if err != nil {
|
||||
return AuthUnknown
|
||||
}
|
||||
return parseHbaContent(string(output), user)
|
||||
// If we can't read the file, return unknown and let the connection determine auth
|
||||
// This avoids sudo password prompts when running as postgres via su
|
||||
return AuthUnknown
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
|
||||
@ -28,14 +28,22 @@ import (
|
||||
"dbbackup/internal/swap"
|
||||
)
|
||||
|
||||
// ProgressCallback is called with byte-level progress updates during backup operations
|
||||
type ProgressCallback func(current, total int64, description string)
|
||||
|
||||
// DatabaseProgressCallback is called with database count progress during cluster backup
|
||||
type DatabaseProgressCallback func(done, total int, dbName string)
|
||||
|
||||
// Engine handles backup operations
|
||||
type Engine struct {
|
||||
cfg *config.Config
|
||||
log logger.Logger
|
||||
db database.Database
|
||||
progress progress.Indicator
|
||||
detailedReporter *progress.DetailedReporter
|
||||
silent bool // Silent mode for TUI
|
||||
cfg *config.Config
|
||||
log logger.Logger
|
||||
db database.Database
|
||||
progress progress.Indicator
|
||||
detailedReporter *progress.DetailedReporter
|
||||
silent bool // Silent mode for TUI
|
||||
progressCallback ProgressCallback
|
||||
dbProgressCallback DatabaseProgressCallback
|
||||
}
|
||||
|
||||
// New creates a new backup engine
|
||||
@ -86,6 +94,30 @@ func NewSilent(cfg *config.Config, log logger.Logger, db database.Database, prog
|
||||
}
|
||||
}
|
||||
|
||||
// SetProgressCallback sets a callback for detailed progress reporting (for TUI mode)
|
||||
func (e *Engine) SetProgressCallback(cb ProgressCallback) {
|
||||
e.progressCallback = cb
|
||||
}
|
||||
|
||||
// SetDatabaseProgressCallback sets a callback for database count progress during cluster backup
|
||||
func (e *Engine) SetDatabaseProgressCallback(cb DatabaseProgressCallback) {
|
||||
e.dbProgressCallback = cb
|
||||
}
|
||||
|
||||
// reportProgress reports progress to the callback if set
|
||||
func (e *Engine) reportProgress(current, total int64, description string) {
|
||||
if e.progressCallback != nil {
|
||||
e.progressCallback(current, total, description)
|
||||
}
|
||||
}
|
||||
|
||||
// reportDatabaseProgress reports database count progress to the callback if set
|
||||
func (e *Engine) reportDatabaseProgress(done, total int, dbName string) {
|
||||
if e.dbProgressCallback != nil {
|
||||
e.dbProgressCallback(done, total, dbName)
|
||||
}
|
||||
}
|
||||
|
||||
// loggerAdapter adapts our logger to the progress.Logger interface
|
||||
type loggerAdapter struct {
|
||||
logger logger.Logger
|
||||
@ -465,6 +497,8 @@ func (e *Engine) BackupCluster(ctx context.Context) error {
|
||||
estimator.UpdateProgress(idx)
|
||||
e.printf(" [%d/%d] Backing up database: %s\n", idx+1, len(databases), name)
|
||||
quietProgress.Update(fmt.Sprintf("Backing up database %d/%d: %s", idx+1, len(databases), name))
|
||||
// Report database progress to TUI callback
|
||||
e.reportDatabaseProgress(idx+1, len(databases), name)
|
||||
mu.Unlock()
|
||||
|
||||
// Check database size and warn if very large
|
||||
@ -903,11 +937,15 @@ func (e *Engine) createSampleBackup(ctx context.Context, databaseName, outputFil
|
||||
func (e *Engine) backupGlobals(ctx context.Context, tempDir string) error {
|
||||
globalsFile := filepath.Join(tempDir, "globals.sql")
|
||||
|
||||
cmd := exec.CommandContext(ctx, "pg_dumpall", "--globals-only")
|
||||
if e.cfg.Host != "localhost" {
|
||||
cmd.Args = append(cmd.Args, "-h", e.cfg.Host, "-p", fmt.Sprintf("%d", e.cfg.Port))
|
||||
// CRITICAL: Always pass port even for localhost - user may have non-standard port
|
||||
cmd := exec.CommandContext(ctx, "pg_dumpall", "--globals-only",
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
"-U", e.cfg.User)
|
||||
|
||||
// Only add -h flag for non-localhost to use Unix socket for peer auth
|
||||
if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
|
||||
cmd.Args = append([]string{cmd.Args[0], "-h", e.cfg.Host}, cmd.Args[1:]...)
|
||||
}
|
||||
cmd.Args = append(cmd.Args, "-U", e.cfg.User)
|
||||
|
||||
cmd.Env = os.Environ()
|
||||
if e.cfg.Password != "" {
|
||||
@ -1334,6 +1372,27 @@ func (e *Engine) executeCommand(ctx context.Context, cmdArgs []string, outputFil
|
||||
// NO GO BUFFERING - pg_dump writes directly to disk
|
||||
cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
|
||||
|
||||
// Start heartbeat ticker for backup progress
|
||||
backupStart := time.Now()
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
defer heartbeatTicker.Stop()
|
||||
defer cancelHeartbeat()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(backupStart)
|
||||
if e.progress != nil {
|
||||
e.progress.Update(fmt.Sprintf("Backing up database... (elapsed: %s)", formatDuration(elapsed)))
|
||||
}
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
// Set environment variables for database tools
|
||||
cmd.Env = os.Environ()
|
||||
if e.cfg.Password != "" {
|
||||
@ -1560,3 +1619,22 @@ func formatBytes(bytes int64) string {
|
||||
}
|
||||
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||
}
|
||||
|
||||
// formatDuration formats a duration to human readable format (e.g., "3m 45s", "1h 23m", "45s")
|
||||
func formatDuration(d time.Duration) string {
|
||||
if d < time.Second {
|
||||
return "0s"
|
||||
}
|
||||
|
||||
hours := int(d.Hours())
|
||||
minutes := int(d.Minutes()) % 60
|
||||
seconds := int(d.Seconds()) % 60
|
||||
|
||||
if hours > 0 {
|
||||
return fmt.Sprintf("%dh %dm", hours, minutes)
|
||||
}
|
||||
if minutes > 0 {
|
||||
return fmt.Sprintf("%dm %ds", minutes, seconds)
|
||||
}
|
||||
return fmt.Sprintf("%ds", seconds)
|
||||
}
|
||||
|
||||
@ -68,8 +68,8 @@ func ClassifyError(errorMsg string) *ErrorClassification {
|
||||
Type: "critical",
|
||||
Category: "locks",
|
||||
Message: errorMsg,
|
||||
Hint: "Lock table exhausted - typically caused by large objects (BLOBs) during restore",
|
||||
Action: "Option 1: Increase max_locks_per_transaction to 1024+ in postgresql.conf (requires restart). Option 2: Update dbbackup and retry - phased restore now auto-enabled for BLOB databases",
|
||||
Hint: "Lock table exhausted. Total capacity = max_locks_per_transaction × (max_connections + max_prepared_transactions). If you reduced VM size or max_connections, you need higher max_locks_per_transaction to compensate.",
|
||||
Action: "Fix: ALTER SYSTEM SET max_locks_per_transaction = 4096; then RESTART PostgreSQL. For smaller VMs with fewer connections, you need higher max_locks_per_transaction values.",
|
||||
Severity: 2,
|
||||
}
|
||||
case "permission_denied":
|
||||
@ -142,8 +142,8 @@ func ClassifyError(errorMsg string) *ErrorClassification {
|
||||
Type: "critical",
|
||||
Category: "locks",
|
||||
Message: errorMsg,
|
||||
Hint: "Lock table exhausted - typically caused by large objects (BLOBs) during restore",
|
||||
Action: "Option 1: Increase max_locks_per_transaction to 1024+ in postgresql.conf (requires restart). Option 2: Update dbbackup and retry - phased restore now auto-enabled for BLOB databases",
|
||||
Hint: "Lock table exhausted. Total capacity = max_locks_per_transaction × (max_connections + max_prepared_transactions). If you reduced VM size or max_connections, you need higher max_locks_per_transaction to compensate.",
|
||||
Action: "Fix: ALTER SYSTEM SET max_locks_per_transaction = 4096; then RESTART PostgreSQL. For smaller VMs with fewer connections, you need higher max_locks_per_transaction values.",
|
||||
Severity: 2,
|
||||
}
|
||||
}
|
||||
|
||||
@ -36,19 +36,25 @@ type Config struct {
|
||||
AutoDetectCores bool
|
||||
CPUWorkloadType string // "cpu-intensive", "io-intensive", "balanced"
|
||||
|
||||
// Resource profile for backup/restore operations
|
||||
ResourceProfile string // "conservative", "balanced", "performance", "max-performance"
|
||||
LargeDBMode bool // Enable large database mode (reduces parallelism, increases max_locks)
|
||||
|
||||
// CPU detection
|
||||
CPUDetector *cpu.Detector
|
||||
CPUInfo *cpu.CPUInfo
|
||||
MemoryInfo *cpu.MemoryInfo // System memory information
|
||||
|
||||
// Sample backup options
|
||||
SampleStrategy string // "ratio", "percent", "count"
|
||||
SampleValue int
|
||||
|
||||
// Output options
|
||||
NoColor bool
|
||||
Debug bool
|
||||
LogLevel string
|
||||
LogFormat string
|
||||
NoColor bool
|
||||
Debug bool
|
||||
DebugLocks bool // Extended lock debugging (captures lock detection, Guard decisions, boost attempts)
|
||||
LogLevel string
|
||||
LogFormat string
|
||||
|
||||
// Config persistence
|
||||
NoSaveConfig bool
|
||||
@ -178,6 +184,13 @@ func New() *Config {
|
||||
sslMode = ""
|
||||
}
|
||||
|
||||
// Detect memory information
|
||||
memInfo, _ := cpu.DetectMemory()
|
||||
|
||||
// Determine recommended resource profile
|
||||
recommendedProfile := cpu.RecommendProfile(cpuInfo, memInfo, false)
|
||||
defaultProfile := getEnvString("RESOURCE_PROFILE", recommendedProfile.Name)
|
||||
|
||||
cfg := &Config{
|
||||
// Database defaults
|
||||
Host: host,
|
||||
@ -189,18 +202,21 @@ func New() *Config {
|
||||
SSLMode: sslMode,
|
||||
Insecure: getEnvBool("INSECURE", false),
|
||||
|
||||
// Backup defaults
|
||||
// Backup defaults - use recommended profile's settings for small VMs
|
||||
BackupDir: backupDir,
|
||||
CompressionLevel: getEnvInt("COMPRESS_LEVEL", 6),
|
||||
Jobs: getEnvInt("JOBS", getDefaultJobs(cpuInfo)),
|
||||
DumpJobs: getEnvInt("DUMP_JOBS", getDefaultDumpJobs(cpuInfo)),
|
||||
Jobs: getEnvInt("JOBS", recommendedProfile.Jobs),
|
||||
DumpJobs: getEnvInt("DUMP_JOBS", recommendedProfile.DumpJobs),
|
||||
MaxCores: getEnvInt("MAX_CORES", getDefaultMaxCores(cpuInfo)),
|
||||
AutoDetectCores: getEnvBool("AUTO_DETECT_CORES", true),
|
||||
CPUWorkloadType: getEnvString("CPU_WORKLOAD_TYPE", "balanced"),
|
||||
ResourceProfile: defaultProfile,
|
||||
LargeDBMode: getEnvBool("LARGE_DB_MODE", false),
|
||||
|
||||
// CPU detection
|
||||
// CPU and memory detection
|
||||
CPUDetector: cpuDetector,
|
||||
CPUInfo: cpuInfo,
|
||||
MemoryInfo: memInfo,
|
||||
|
||||
// Sample backup defaults
|
||||
SampleStrategy: getEnvString("SAMPLE_STRATEGY", "ratio"),
|
||||
@ -220,8 +236,8 @@ func New() *Config {
|
||||
// Timeouts - default 24 hours (1440 min) to handle very large databases with large objects
|
||||
ClusterTimeoutMinutes: getEnvInt("CLUSTER_TIMEOUT_MIN", 1440),
|
||||
|
||||
// Cluster parallelism (default: 2 concurrent operations for faster cluster backup/restore)
|
||||
ClusterParallelism: getEnvInt("CLUSTER_PARALLELISM", 2),
|
||||
// Cluster parallelism - use recommended profile's setting for small VMs
|
||||
ClusterParallelism: getEnvInt("CLUSTER_PARALLELISM", recommendedProfile.ClusterParallelism),
|
||||
|
||||
// Working directory for large operations (default: system temp)
|
||||
WorkDir: getEnvString("WORK_DIR", ""),
|
||||
@ -409,6 +425,62 @@ func (c *Config) OptimizeForCPU() error {
|
||||
return nil
|
||||
}
|
||||
|
||||
// ApplyResourceProfile applies a resource profile to the configuration
|
||||
// This adjusts parallelism settings based on the chosen profile
|
||||
func (c *Config) ApplyResourceProfile(profileName string) error {
|
||||
profile := cpu.GetProfileByName(profileName)
|
||||
if profile == nil {
|
||||
return &ConfigError{
|
||||
Field: "resource_profile",
|
||||
Value: profileName,
|
||||
Message: "unknown profile. Valid profiles: conservative, balanced, performance, max-performance",
|
||||
}
|
||||
}
|
||||
|
||||
// Validate profile against current system
|
||||
isValid, warnings := cpu.ValidateProfileForSystem(profile, c.CPUInfo, c.MemoryInfo)
|
||||
if !isValid {
|
||||
// Log warnings but don't block - user may know what they're doing
|
||||
_ = warnings // In production, log these warnings
|
||||
}
|
||||
|
||||
// Apply profile settings
|
||||
c.ResourceProfile = profile.Name
|
||||
|
||||
// If LargeDBMode is enabled, apply its modifiers
|
||||
if c.LargeDBMode {
|
||||
profile = cpu.ApplyLargeDBMode(profile)
|
||||
}
|
||||
|
||||
c.ClusterParallelism = profile.ClusterParallelism
|
||||
c.Jobs = profile.Jobs
|
||||
c.DumpJobs = profile.DumpJobs
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// GetResourceProfileRecommendation returns the recommended profile and reason
|
||||
func (c *Config) GetResourceProfileRecommendation(isLargeDB bool) (string, string) {
|
||||
profile, reason := cpu.RecommendProfileWithReason(c.CPUInfo, c.MemoryInfo, isLargeDB)
|
||||
return profile.Name, reason
|
||||
}
|
||||
|
||||
// GetCurrentProfile returns the current resource profile details
|
||||
// If LargeDBMode is enabled, returns a modified profile with reduced parallelism
|
||||
func (c *Config) GetCurrentProfile() *cpu.ResourceProfile {
|
||||
profile := cpu.GetProfileByName(c.ResourceProfile)
|
||||
if profile == nil {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Apply LargeDBMode modifier if enabled
|
||||
if c.LargeDBMode {
|
||||
return cpu.ApplyLargeDBMode(profile)
|
||||
}
|
||||
|
||||
return profile
|
||||
}
|
||||
|
||||
// GetCPUInfo returns CPU information, detecting if necessary
|
||||
func (c *Config) GetCPUInfo() (*cpu.CPUInfo, error) {
|
||||
if c.CPUInfo != nil {
|
||||
|
||||
@ -28,9 +28,11 @@ type LocalConfig struct {
|
||||
DumpJobs int
|
||||
|
||||
// Performance settings
|
||||
CPUWorkload string
|
||||
MaxCores int
|
||||
ClusterTimeout int // Cluster operation timeout in minutes (default: 1440 = 24 hours)
|
||||
CPUWorkload string
|
||||
MaxCores int
|
||||
ClusterTimeout int // Cluster operation timeout in minutes (default: 1440 = 24 hours)
|
||||
ResourceProfile string
|
||||
LargeDBMode bool // Enable large database mode (reduces parallelism, increases locks)
|
||||
|
||||
// Security settings
|
||||
RetentionDays int
|
||||
@ -126,6 +128,10 @@ func LoadLocalConfig() (*LocalConfig, error) {
|
||||
if ct, err := strconv.Atoi(value); err == nil {
|
||||
cfg.ClusterTimeout = ct
|
||||
}
|
||||
case "resource_profile":
|
||||
cfg.ResourceProfile = value
|
||||
case "large_db_mode":
|
||||
cfg.LargeDBMode = value == "true" || value == "1"
|
||||
}
|
||||
case "security":
|
||||
switch key {
|
||||
@ -207,6 +213,12 @@ func SaveLocalConfig(cfg *LocalConfig) error {
|
||||
if cfg.ClusterTimeout != 0 {
|
||||
sb.WriteString(fmt.Sprintf("cluster_timeout = %d\n", cfg.ClusterTimeout))
|
||||
}
|
||||
if cfg.ResourceProfile != "" {
|
||||
sb.WriteString(fmt.Sprintf("resource_profile = %s\n", cfg.ResourceProfile))
|
||||
}
|
||||
if cfg.LargeDBMode {
|
||||
sb.WriteString("large_db_mode = true\n")
|
||||
}
|
||||
sb.WriteString("\n")
|
||||
|
||||
// Security section
|
||||
@ -280,6 +292,14 @@ func ApplyLocalConfig(cfg *Config, local *LocalConfig) {
|
||||
if local.ClusterTimeout != 0 {
|
||||
cfg.ClusterTimeoutMinutes = local.ClusterTimeout
|
||||
}
|
||||
// Apply resource profile settings
|
||||
if local.ResourceProfile != "" {
|
||||
cfg.ResourceProfile = local.ResourceProfile
|
||||
}
|
||||
// LargeDBMode is a boolean - apply if true in config
|
||||
if local.LargeDBMode {
|
||||
cfg.LargeDBMode = true
|
||||
}
|
||||
if cfg.RetentionDays == 30 && local.RetentionDays != 0 {
|
||||
cfg.RetentionDays = local.RetentionDays
|
||||
}
|
||||
@ -294,22 +314,24 @@ func ApplyLocalConfig(cfg *Config, local *LocalConfig) {
|
||||
// ConfigFromConfig creates a LocalConfig from a Config
|
||||
func ConfigFromConfig(cfg *Config) *LocalConfig {
|
||||
return &LocalConfig{
|
||||
DBType: cfg.DatabaseType,
|
||||
Host: cfg.Host,
|
||||
Port: cfg.Port,
|
||||
User: cfg.User,
|
||||
Database: cfg.Database,
|
||||
SSLMode: cfg.SSLMode,
|
||||
BackupDir: cfg.BackupDir,
|
||||
WorkDir: cfg.WorkDir,
|
||||
Compression: cfg.CompressionLevel,
|
||||
Jobs: cfg.Jobs,
|
||||
DumpJobs: cfg.DumpJobs,
|
||||
CPUWorkload: cfg.CPUWorkloadType,
|
||||
MaxCores: cfg.MaxCores,
|
||||
ClusterTimeout: cfg.ClusterTimeoutMinutes,
|
||||
RetentionDays: cfg.RetentionDays,
|
||||
MinBackups: cfg.MinBackups,
|
||||
MaxRetries: cfg.MaxRetries,
|
||||
DBType: cfg.DatabaseType,
|
||||
Host: cfg.Host,
|
||||
Port: cfg.Port,
|
||||
User: cfg.User,
|
||||
Database: cfg.Database,
|
||||
SSLMode: cfg.SSLMode,
|
||||
BackupDir: cfg.BackupDir,
|
||||
WorkDir: cfg.WorkDir,
|
||||
Compression: cfg.CompressionLevel,
|
||||
Jobs: cfg.Jobs,
|
||||
DumpJobs: cfg.DumpJobs,
|
||||
CPUWorkload: cfg.CPUWorkloadType,
|
||||
MaxCores: cfg.MaxCores,
|
||||
ClusterTimeout: cfg.ClusterTimeoutMinutes,
|
||||
ResourceProfile: cfg.ResourceProfile,
|
||||
LargeDBMode: cfg.LargeDBMode,
|
||||
RetentionDays: cfg.RetentionDays,
|
||||
MinBackups: cfg.MinBackups,
|
||||
MaxRetries: cfg.MaxRetries,
|
||||
}
|
||||
}
|
||||
|
||||
128
internal/config/profile.go
Normal file
128
internal/config/profile.go
Normal file
@ -0,0 +1,128 @@
|
||||
package config
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// RestoreProfile defines resource settings for restore operations
|
||||
type RestoreProfile struct {
|
||||
Name string
|
||||
ParallelDBs int // Number of databases to restore in parallel
|
||||
Jobs int // Parallel decompression jobs
|
||||
DisableProgress bool // Disable progress indicators to reduce overhead
|
||||
MemoryConservative bool // Use memory-conservative settings
|
||||
}
|
||||
|
||||
// GetRestoreProfile returns the profile settings for a given profile name
|
||||
func GetRestoreProfile(profileName string) (*RestoreProfile, error) {
|
||||
profileName = strings.ToLower(strings.TrimSpace(profileName))
|
||||
|
||||
switch profileName {
|
||||
case "conservative":
|
||||
return &RestoreProfile{
|
||||
Name: "conservative",
|
||||
ParallelDBs: 1, // Single-threaded restore
|
||||
Jobs: 1, // Single-threaded decompression
|
||||
DisableProgress: false,
|
||||
MemoryConservative: true,
|
||||
}, nil
|
||||
|
||||
case "balanced", "":
|
||||
return &RestoreProfile{
|
||||
Name: "balanced",
|
||||
ParallelDBs: 0, // Use config default or auto-detect
|
||||
Jobs: 0, // Use config default or auto-detect
|
||||
DisableProgress: false,
|
||||
MemoryConservative: false,
|
||||
}, nil
|
||||
|
||||
case "aggressive", "performance", "max":
|
||||
return &RestoreProfile{
|
||||
Name: "aggressive",
|
||||
ParallelDBs: -1, // Auto-detect based on resources
|
||||
Jobs: -1, // Auto-detect based on CPU
|
||||
DisableProgress: false,
|
||||
MemoryConservative: false,
|
||||
}, nil
|
||||
|
||||
case "potato":
|
||||
// Easter egg: same as conservative but with a fun name
|
||||
return &RestoreProfile{
|
||||
Name: "potato",
|
||||
ParallelDBs: 1,
|
||||
Jobs: 1,
|
||||
DisableProgress: false,
|
||||
MemoryConservative: true,
|
||||
}, nil
|
||||
|
||||
default:
|
||||
return nil, fmt.Errorf("unknown profile: %s (valid: conservative, balanced, aggressive)", profileName)
|
||||
}
|
||||
}
|
||||
|
||||
// ApplyProfile applies profile settings to config, respecting explicit user overrides
|
||||
func ApplyProfile(cfg *Config, profileName string, explicitJobs, explicitParallelDBs int) error {
|
||||
profile, err := GetRestoreProfile(profileName)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
// Show profile being used
|
||||
if cfg.Debug {
|
||||
fmt.Printf("Using restore profile: %s\n", profile.Name)
|
||||
if profile.MemoryConservative {
|
||||
fmt.Println("Memory-conservative mode enabled")
|
||||
}
|
||||
}
|
||||
|
||||
// Apply profile settings only if not explicitly overridden
|
||||
if explicitJobs == 0 && profile.Jobs > 0 {
|
||||
cfg.Jobs = profile.Jobs
|
||||
}
|
||||
|
||||
if explicitParallelDBs == 0 && profile.ParallelDBs != 0 {
|
||||
cfg.ClusterParallelism = profile.ParallelDBs
|
||||
}
|
||||
|
||||
// Store profile name
|
||||
cfg.ResourceProfile = profile.Name
|
||||
|
||||
// Conservative profile implies large DB mode settings
|
||||
if profile.MemoryConservative {
|
||||
cfg.LargeDBMode = true
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// GetProfileDescription returns a human-readable description of the profile
|
||||
func GetProfileDescription(profileName string) string {
|
||||
profile, err := GetRestoreProfile(profileName)
|
||||
if err != nil {
|
||||
return "Unknown profile"
|
||||
}
|
||||
|
||||
switch profile.Name {
|
||||
case "conservative":
|
||||
return "Conservative: --parallel=1, single-threaded, minimal memory usage. Best for resource-constrained servers or when other services are running."
|
||||
case "potato":
|
||||
return "Potato Mode: Same as conservative, for servers running on a potato 🥔"
|
||||
case "balanced":
|
||||
return "Balanced: Auto-detect resources, moderate parallelism. Good default for most scenarios."
|
||||
case "aggressive":
|
||||
return "Aggressive: Maximum parallelism, all available resources. Best for dedicated database servers with ample resources."
|
||||
default:
|
||||
return profile.Name
|
||||
}
|
||||
}
|
||||
|
||||
// ListProfiles returns a list of all available profiles with descriptions
|
||||
func ListProfiles() map[string]string {
|
||||
return map[string]string{
|
||||
"conservative": GetProfileDescription("conservative"),
|
||||
"balanced": GetProfileDescription("balanced"),
|
||||
"aggressive": GetProfileDescription("aggressive"),
|
||||
"potato": GetProfileDescription("potato"),
|
||||
}
|
||||
}
|
||||
475
internal/cpu/profiles.go
Normal file
475
internal/cpu/profiles.go
Normal file
@ -0,0 +1,475 @@
|
||||
package cpu
|
||||
|
||||
import (
|
||||
"bufio"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/exec"
|
||||
"runtime"
|
||||
"strconv"
|
||||
"strings"
|
||||
)
|
||||
|
||||
// MemoryInfo holds system memory information
|
||||
type MemoryInfo struct {
|
||||
TotalBytes int64 `json:"total_bytes"`
|
||||
AvailableBytes int64 `json:"available_bytes"`
|
||||
FreeBytes int64 `json:"free_bytes"`
|
||||
UsedBytes int64 `json:"used_bytes"`
|
||||
SwapTotalBytes int64 `json:"swap_total_bytes"`
|
||||
SwapFreeBytes int64 `json:"swap_free_bytes"`
|
||||
TotalGB int `json:"total_gb"`
|
||||
AvailableGB int `json:"available_gb"`
|
||||
Platform string `json:"platform"`
|
||||
}
|
||||
|
||||
// ResourceProfile defines a resource allocation profile for backup/restore operations
|
||||
type ResourceProfile struct {
|
||||
Name string `json:"name"`
|
||||
Description string `json:"description"`
|
||||
ClusterParallelism int `json:"cluster_parallelism"` // Concurrent databases
|
||||
Jobs int `json:"jobs"` // Parallel jobs within pg_restore
|
||||
DumpJobs int `json:"dump_jobs"` // Parallel jobs for pg_dump
|
||||
MaintenanceWorkMem string `json:"maintenance_work_mem"` // PostgreSQL recommendation
|
||||
MaxLocksPerTxn int `json:"max_locks_per_txn"` // PostgreSQL recommendation
|
||||
RecommendedForLarge bool `json:"recommended_for_large"` // Suitable for large DBs?
|
||||
MinMemoryGB int `json:"min_memory_gb"` // Minimum memory for this profile
|
||||
MinCores int `json:"min_cores"` // Minimum cores for this profile
|
||||
}
|
||||
|
||||
// Predefined resource profiles
|
||||
var (
|
||||
// ProfileConservative - Safe for constrained VMs, avoids shared memory issues
|
||||
ProfileConservative = ResourceProfile{
|
||||
Name: "conservative",
|
||||
Description: "Safe for small VMs (2-4 cores, <16GB). Sequential operations, minimal memory pressure. Best for large DBs on limited hardware.",
|
||||
ClusterParallelism: 1,
|
||||
Jobs: 1,
|
||||
DumpJobs: 2,
|
||||
MaintenanceWorkMem: "256MB",
|
||||
MaxLocksPerTxn: 4096,
|
||||
RecommendedForLarge: true,
|
||||
MinMemoryGB: 4,
|
||||
MinCores: 2,
|
||||
}
|
||||
|
||||
// ProfileBalanced - Default profile, works for most scenarios
|
||||
ProfileBalanced = ResourceProfile{
|
||||
Name: "balanced",
|
||||
Description: "Balanced for medium VMs (4-8 cores, 16-32GB). Moderate parallelism with good safety margin.",
|
||||
ClusterParallelism: 2,
|
||||
Jobs: 2,
|
||||
DumpJobs: 4,
|
||||
MaintenanceWorkMem: "512MB",
|
||||
MaxLocksPerTxn: 2048,
|
||||
RecommendedForLarge: true,
|
||||
MinMemoryGB: 16,
|
||||
MinCores: 4,
|
||||
}
|
||||
|
||||
// ProfilePerformance - Aggressive parallelism for powerful servers
|
||||
ProfilePerformance = ResourceProfile{
|
||||
Name: "performance",
|
||||
Description: "Aggressive for powerful servers (8+ cores, 32GB+). Maximum parallelism for fast operations.",
|
||||
ClusterParallelism: 4,
|
||||
Jobs: 4,
|
||||
DumpJobs: 8,
|
||||
MaintenanceWorkMem: "1GB",
|
||||
MaxLocksPerTxn: 1024,
|
||||
RecommendedForLarge: false, // Large DBs may still need conservative
|
||||
MinMemoryGB: 32,
|
||||
MinCores: 8,
|
||||
}
|
||||
|
||||
// ProfileMaxPerformance - Maximum parallelism for high-end servers
|
||||
ProfileMaxPerformance = ResourceProfile{
|
||||
Name: "max-performance",
|
||||
Description: "Maximum for high-end servers (16+ cores, 64GB+). Full CPU utilization.",
|
||||
ClusterParallelism: 8,
|
||||
Jobs: 8,
|
||||
DumpJobs: 16,
|
||||
MaintenanceWorkMem: "2GB",
|
||||
MaxLocksPerTxn: 512,
|
||||
RecommendedForLarge: false, // Large DBs should use LargeDBMode
|
||||
MinMemoryGB: 64,
|
||||
MinCores: 16,
|
||||
}
|
||||
|
||||
// AllProfiles contains all available profiles (VM resource-based)
|
||||
AllProfiles = []ResourceProfile{
|
||||
ProfileConservative,
|
||||
ProfileBalanced,
|
||||
ProfilePerformance,
|
||||
ProfileMaxPerformance,
|
||||
}
|
||||
)
|
||||
|
||||
// GetProfileByName returns a profile by its name
|
||||
func GetProfileByName(name string) *ResourceProfile {
|
||||
for _, p := range AllProfiles {
|
||||
if strings.EqualFold(p.Name, name) {
|
||||
return &p
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// ApplyLargeDBMode modifies a profile for large database operations.
|
||||
// This is a modifier that reduces parallelism and increases max_locks_per_transaction
|
||||
// to prevent "out of shared memory" errors with large databases (many tables, LOBs, etc.).
|
||||
// It returns a new profile with adjusted settings, leaving the original unchanged.
|
||||
func ApplyLargeDBMode(profile *ResourceProfile) *ResourceProfile {
|
||||
if profile == nil {
|
||||
return nil
|
||||
}
|
||||
|
||||
// Create a copy with adjusted settings
|
||||
modified := *profile
|
||||
|
||||
// Add "(large-db)" suffix to indicate this is modified
|
||||
modified.Name = profile.Name + " +large-db"
|
||||
modified.Description = fmt.Sprintf("%s [LargeDBMode: reduced parallelism, high locks]", profile.Description)
|
||||
|
||||
// Reduce parallelism to avoid lock exhaustion
|
||||
// Rule: halve parallelism, minimum 1
|
||||
modified.ClusterParallelism = max(1, profile.ClusterParallelism/2)
|
||||
modified.Jobs = max(1, profile.Jobs/2)
|
||||
modified.DumpJobs = max(2, profile.DumpJobs/2)
|
||||
|
||||
// Force high max_locks_per_transaction for large schemas
|
||||
modified.MaxLocksPerTxn = 8192
|
||||
|
||||
// Increase maintenance_work_mem for complex operations
|
||||
// Keep or boost maintenance work mem
|
||||
modified.MaintenanceWorkMem = "1GB"
|
||||
if profile.MinMemoryGB >= 32 {
|
||||
modified.MaintenanceWorkMem = "2GB"
|
||||
}
|
||||
|
||||
modified.RecommendedForLarge = true
|
||||
|
||||
return &modified
|
||||
}
|
||||
|
||||
// max returns the larger of two integers
|
||||
func max(a, b int) int {
|
||||
if a > b {
|
||||
return a
|
||||
}
|
||||
return b
|
||||
}
|
||||
|
||||
// DetectMemory detects system memory information
|
||||
func DetectMemory() (*MemoryInfo, error) {
|
||||
info := &MemoryInfo{
|
||||
Platform: runtime.GOOS,
|
||||
}
|
||||
|
||||
switch runtime.GOOS {
|
||||
case "linux":
|
||||
if err := detectLinuxMemory(info); err != nil {
|
||||
return info, fmt.Errorf("linux memory detection failed: %w", err)
|
||||
}
|
||||
case "darwin":
|
||||
if err := detectDarwinMemory(info); err != nil {
|
||||
return info, fmt.Errorf("darwin memory detection failed: %w", err)
|
||||
}
|
||||
case "windows":
|
||||
if err := detectWindowsMemory(info); err != nil {
|
||||
return info, fmt.Errorf("windows memory detection failed: %w", err)
|
||||
}
|
||||
default:
|
||||
// Fallback: use Go runtime memory stats
|
||||
var memStats runtime.MemStats
|
||||
runtime.ReadMemStats(&memStats)
|
||||
info.TotalBytes = int64(memStats.Sys)
|
||||
info.AvailableBytes = int64(memStats.Sys - memStats.Alloc)
|
||||
}
|
||||
|
||||
// Calculate GB values
|
||||
info.TotalGB = int(info.TotalBytes / (1024 * 1024 * 1024))
|
||||
info.AvailableGB = int(info.AvailableBytes / (1024 * 1024 * 1024))
|
||||
|
||||
return info, nil
|
||||
}
|
||||
|
||||
// detectLinuxMemory reads memory info from /proc/meminfo
|
||||
func detectLinuxMemory(info *MemoryInfo) error {
|
||||
file, err := os.Open("/proc/meminfo")
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
scanner := bufio.NewScanner(file)
|
||||
for scanner.Scan() {
|
||||
line := scanner.Text()
|
||||
parts := strings.Fields(line)
|
||||
if len(parts) < 2 {
|
||||
continue
|
||||
}
|
||||
|
||||
key := strings.TrimSuffix(parts[0], ":")
|
||||
value, err := strconv.ParseInt(parts[1], 10, 64)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
|
||||
// Values are in kB
|
||||
valueBytes := value * 1024
|
||||
|
||||
switch key {
|
||||
case "MemTotal":
|
||||
info.TotalBytes = valueBytes
|
||||
case "MemAvailable":
|
||||
info.AvailableBytes = valueBytes
|
||||
case "MemFree":
|
||||
info.FreeBytes = valueBytes
|
||||
case "SwapTotal":
|
||||
info.SwapTotalBytes = valueBytes
|
||||
case "SwapFree":
|
||||
info.SwapFreeBytes = valueBytes
|
||||
}
|
||||
}
|
||||
|
||||
info.UsedBytes = info.TotalBytes - info.AvailableBytes
|
||||
|
||||
return scanner.Err()
|
||||
}
|
||||
|
||||
// detectDarwinMemory detects memory on macOS
|
||||
func detectDarwinMemory(info *MemoryInfo) error {
|
||||
// Use sysctl for total memory
|
||||
if output, err := runCommand("sysctl", "-n", "hw.memsize"); err == nil {
|
||||
if val, err := strconv.ParseInt(strings.TrimSpace(output), 10, 64); err == nil {
|
||||
info.TotalBytes = val
|
||||
}
|
||||
}
|
||||
|
||||
// Use vm_stat for available memory (more complex parsing required)
|
||||
if output, err := runCommand("vm_stat"); err == nil {
|
||||
pageSize := int64(4096) // Default page size
|
||||
var freePages, inactivePages int64
|
||||
|
||||
lines := strings.Split(output, "\n")
|
||||
for _, line := range lines {
|
||||
if strings.Contains(line, "page size of") {
|
||||
parts := strings.Fields(line)
|
||||
for i, p := range parts {
|
||||
if p == "of" && i+1 < len(parts) {
|
||||
if ps, err := strconv.ParseInt(parts[i+1], 10, 64); err == nil {
|
||||
pageSize = ps
|
||||
}
|
||||
}
|
||||
}
|
||||
} else if strings.Contains(line, "Pages free:") {
|
||||
val := extractNumberFromLine(line)
|
||||
freePages = val
|
||||
} else if strings.Contains(line, "Pages inactive:") {
|
||||
val := extractNumberFromLine(line)
|
||||
inactivePages = val
|
||||
}
|
||||
}
|
||||
|
||||
info.FreeBytes = freePages * pageSize
|
||||
info.AvailableBytes = (freePages + inactivePages) * pageSize
|
||||
}
|
||||
|
||||
info.UsedBytes = info.TotalBytes - info.AvailableBytes
|
||||
return nil
|
||||
}
|
||||
|
||||
// detectWindowsMemory detects memory on Windows
|
||||
func detectWindowsMemory(info *MemoryInfo) error {
|
||||
// Use wmic for memory info
|
||||
if output, err := runCommand("wmic", "OS", "get", "TotalVisibleMemorySize,FreePhysicalMemory", "/format:list"); err == nil {
|
||||
lines := strings.Split(output, "\n")
|
||||
for _, line := range lines {
|
||||
line = strings.TrimSpace(line)
|
||||
if strings.HasPrefix(line, "TotalVisibleMemorySize=") {
|
||||
val := strings.TrimPrefix(line, "TotalVisibleMemorySize=")
|
||||
if v, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64); err == nil {
|
||||
info.TotalBytes = v * 1024 // KB to bytes
|
||||
}
|
||||
} else if strings.HasPrefix(line, "FreePhysicalMemory=") {
|
||||
val := strings.TrimPrefix(line, "FreePhysicalMemory=")
|
||||
if v, err := strconv.ParseInt(strings.TrimSpace(val), 10, 64); err == nil {
|
||||
info.FreeBytes = v * 1024
|
||||
info.AvailableBytes = v * 1024
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
info.UsedBytes = info.TotalBytes - info.AvailableBytes
|
||||
return nil
|
||||
}
|
||||
|
||||
// RecommendProfile recommends a resource profile based on system resources and workload
|
||||
func RecommendProfile(cpuInfo *CPUInfo, memInfo *MemoryInfo, isLargeDB bool) *ResourceProfile {
|
||||
cores := 0
|
||||
if cpuInfo != nil {
|
||||
cores = cpuInfo.PhysicalCores
|
||||
if cores == 0 {
|
||||
cores = cpuInfo.LogicalCores
|
||||
}
|
||||
}
|
||||
if cores == 0 {
|
||||
cores = runtime.NumCPU()
|
||||
}
|
||||
|
||||
memGB := 0
|
||||
if memInfo != nil {
|
||||
memGB = memInfo.TotalGB
|
||||
}
|
||||
|
||||
// Special case: large databases should use conservative profile
|
||||
// The caller should also enable LargeDBMode for increased MaxLocksPerTxn
|
||||
if isLargeDB {
|
||||
// For large DBs, recommend conservative regardless of resources
|
||||
// LargeDBMode flag will handle the lock settings separately
|
||||
return &ProfileConservative
|
||||
}
|
||||
|
||||
// Resource-based selection
|
||||
if cores >= 16 && memGB >= 64 {
|
||||
return &ProfileMaxPerformance
|
||||
} else if cores >= 8 && memGB >= 32 {
|
||||
return &ProfilePerformance
|
||||
} else if cores >= 4 && memGB >= 16 {
|
||||
return &ProfileBalanced
|
||||
}
|
||||
|
||||
// Default to conservative for constrained systems
|
||||
return &ProfileConservative
|
||||
}
|
||||
|
||||
// RecommendProfileWithReason returns a profile recommendation with explanation
|
||||
func RecommendProfileWithReason(cpuInfo *CPUInfo, memInfo *MemoryInfo, isLargeDB bool) (*ResourceProfile, string) {
|
||||
cores := 0
|
||||
if cpuInfo != nil {
|
||||
cores = cpuInfo.PhysicalCores
|
||||
if cores == 0 {
|
||||
cores = cpuInfo.LogicalCores
|
||||
}
|
||||
}
|
||||
if cores == 0 {
|
||||
cores = runtime.NumCPU()
|
||||
}
|
||||
|
||||
memGB := 0
|
||||
if memInfo != nil {
|
||||
memGB = memInfo.TotalGB
|
||||
}
|
||||
|
||||
// Build reason string
|
||||
var reason strings.Builder
|
||||
reason.WriteString(fmt.Sprintf("System: %d cores, %dGB RAM. ", cores, memGB))
|
||||
|
||||
profile := RecommendProfile(cpuInfo, memInfo, isLargeDB)
|
||||
|
||||
if isLargeDB {
|
||||
reason.WriteString("Large database mode - using conservative settings. Enable LargeDBMode for higher max_locks.")
|
||||
} else if profile.Name == "conservative" {
|
||||
reason.WriteString("Limited resources detected - using conservative profile for stability.")
|
||||
} else if profile.Name == "max-performance" {
|
||||
reason.WriteString("High-end server detected - using maximum parallelism.")
|
||||
} else if profile.Name == "performance" {
|
||||
reason.WriteString("Good resources detected - using performance profile.")
|
||||
} else {
|
||||
reason.WriteString("Using balanced profile for optimal performance/stability trade-off.")
|
||||
}
|
||||
|
||||
return profile, reason.String()
|
||||
}
|
||||
|
||||
// ValidateProfileForSystem checks if a profile is suitable for the current system
|
||||
func ValidateProfileForSystem(profile *ResourceProfile, cpuInfo *CPUInfo, memInfo *MemoryInfo) (bool, []string) {
|
||||
var warnings []string
|
||||
|
||||
cores := 0
|
||||
if cpuInfo != nil {
|
||||
cores = cpuInfo.PhysicalCores
|
||||
if cores == 0 {
|
||||
cores = cpuInfo.LogicalCores
|
||||
}
|
||||
}
|
||||
if cores == 0 {
|
||||
cores = runtime.NumCPU()
|
||||
}
|
||||
|
||||
memGB := 0
|
||||
if memInfo != nil {
|
||||
memGB = memInfo.TotalGB
|
||||
}
|
||||
|
||||
// Check minimum requirements
|
||||
if cores < profile.MinCores {
|
||||
warnings = append(warnings,
|
||||
fmt.Sprintf("Profile '%s' recommends %d+ cores (system has %d)", profile.Name, profile.MinCores, cores))
|
||||
}
|
||||
|
||||
if memGB < profile.MinMemoryGB {
|
||||
warnings = append(warnings,
|
||||
fmt.Sprintf("Profile '%s' recommends %dGB+ RAM (system has %dGB)", profile.Name, profile.MinMemoryGB, memGB))
|
||||
}
|
||||
|
||||
// Check for potential issues
|
||||
if profile.ClusterParallelism > cores {
|
||||
warnings = append(warnings,
|
||||
fmt.Sprintf("Cluster parallelism (%d) exceeds CPU cores (%d) - may cause contention",
|
||||
profile.ClusterParallelism, cores))
|
||||
}
|
||||
|
||||
// Memory pressure warning
|
||||
memPerWorker := 2 // Rough estimate: 2GB per parallel worker for large DB operations
|
||||
requiredMem := profile.ClusterParallelism * profile.Jobs * memPerWorker
|
||||
if memGB > 0 && requiredMem > memGB {
|
||||
warnings = append(warnings,
|
||||
fmt.Sprintf("High parallelism may require ~%dGB RAM (system has %dGB) - risk of OOM",
|
||||
requiredMem, memGB))
|
||||
}
|
||||
|
||||
return len(warnings) == 0, warnings
|
||||
}
|
||||
|
||||
// FormatProfileSummary returns a formatted summary of a profile
|
||||
func (p *ResourceProfile) FormatProfileSummary() string {
|
||||
return fmt.Sprintf("[%s] Parallel: %d DBs, %d jobs | Recommended for large DBs: %v",
|
||||
strings.ToUpper(p.Name),
|
||||
p.ClusterParallelism,
|
||||
p.Jobs,
|
||||
p.RecommendedForLarge)
|
||||
}
|
||||
|
||||
// PostgreSQLRecommendations returns PostgreSQL configuration recommendations for this profile
|
||||
func (p *ResourceProfile) PostgreSQLRecommendations() []string {
|
||||
return []string{
|
||||
fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d;", p.MaxLocksPerTxn),
|
||||
fmt.Sprintf("ALTER SYSTEM SET maintenance_work_mem = '%s';", p.MaintenanceWorkMem),
|
||||
"-- Restart PostgreSQL after changes to max_locks_per_transaction",
|
||||
}
|
||||
}
|
||||
|
||||
// Helper functions
|
||||
|
||||
func runCommand(name string, args ...string) (string, error) {
|
||||
cmd := exec.Command(name, args...)
|
||||
output, err := cmd.Output()
|
||||
if err != nil {
|
||||
return "", err
|
||||
}
|
||||
return string(output), nil
|
||||
}
|
||||
|
||||
func extractNumberFromLine(line string) int64 {
|
||||
// Extract number before the period at end (e.g., "Pages free: 123456.")
|
||||
parts := strings.Fields(line)
|
||||
for _, p := range parts {
|
||||
p = strings.TrimSuffix(p, ".")
|
||||
if val, err := strconv.ParseInt(p, 10, 64); err == nil && val > 0 {
|
||||
return val
|
||||
}
|
||||
}
|
||||
return 0
|
||||
}
|
||||
@ -316,11 +316,12 @@ func (p *PostgreSQL) BuildBackupCommand(database, outputFile string, options Bac
|
||||
cmd := []string{"pg_dump"}
|
||||
|
||||
// Connection parameters
|
||||
if p.cfg.Host != "localhost" {
|
||||
// CRITICAL: Always pass port even for localhost - user may have non-standard port
|
||||
if p.cfg.Host != "localhost" && p.cfg.Host != "127.0.0.1" && p.cfg.Host != "" {
|
||||
cmd = append(cmd, "-h", p.cfg.Host)
|
||||
cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
|
||||
cmd = append(cmd, "--no-password")
|
||||
}
|
||||
cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
|
||||
cmd = append(cmd, "-U", p.cfg.User)
|
||||
|
||||
// Format and compression
|
||||
@ -380,11 +381,12 @@ func (p *PostgreSQL) BuildRestoreCommand(database, inputFile string, options Res
|
||||
cmd := []string{"pg_restore"}
|
||||
|
||||
// Connection parameters
|
||||
if p.cfg.Host != "localhost" {
|
||||
// CRITICAL: Always pass port even for localhost - user may have non-standard port
|
||||
if p.cfg.Host != "localhost" && p.cfg.Host != "127.0.0.1" && p.cfg.Host != "" {
|
||||
cmd = append(cmd, "-h", p.cfg.Host)
|
||||
cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
|
||||
cmd = append(cmd, "--no-password")
|
||||
}
|
||||
cmd = append(cmd, "-p", strconv.Itoa(p.cfg.Port))
|
||||
cmd = append(cmd, "-U", p.cfg.User)
|
||||
|
||||
// Parallel jobs (incompatible with --single-transaction per PostgreSQL docs)
|
||||
|
||||
@ -4,6 +4,7 @@ import (
|
||||
"bytes"
|
||||
"crypto/rand"
|
||||
"io"
|
||||
mathrand "math/rand"
|
||||
"testing"
|
||||
)
|
||||
|
||||
@ -100,12 +101,15 @@ func TestChunker_Deterministic(t *testing.T) {
|
||||
|
||||
func TestChunker_ShiftedData(t *testing.T) {
|
||||
// Test that shifted data still shares chunks (the key CDC benefit)
|
||||
// Use deterministic random data for reproducible test results
|
||||
rng := mathrand.New(mathrand.NewSource(42))
|
||||
|
||||
original := make([]byte, 100*1024)
|
||||
rand.Read(original)
|
||||
rng.Read(original)
|
||||
|
||||
// Create shifted version (prepend some bytes)
|
||||
prefix := make([]byte, 1000)
|
||||
rand.Read(prefix)
|
||||
rng.Read(prefix)
|
||||
shifted := append(prefix, original...)
|
||||
|
||||
// Chunk both
|
||||
|
||||
@ -146,7 +146,7 @@ func (d *Dots) Start(message string) {
|
||||
fmt.Fprint(d.writer, message)
|
||||
|
||||
go func() {
|
||||
ticker := time.NewTicker(500 * time.Millisecond)
|
||||
ticker := time.NewTicker(100 * time.Millisecond)
|
||||
defer ticker.Stop()
|
||||
|
||||
count := 0
|
||||
|
||||
@ -1,9 +1,12 @@
|
||||
package restore
|
||||
|
||||
import (
|
||||
"archive/tar"
|
||||
"compress/gzip"
|
||||
"context"
|
||||
"database/sql"
|
||||
"fmt"
|
||||
"io"
|
||||
"os"
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
@ -24,6 +27,21 @@ import (
|
||||
_ "github.com/jackc/pgx/v5/stdlib" // PostgreSQL driver
|
||||
)
|
||||
|
||||
// ProgressCallback is called with progress updates during long operations
|
||||
// Parameters: current bytes/items done, total bytes/items, description
|
||||
type ProgressCallback func(current, total int64, description string)
|
||||
|
||||
// DatabaseProgressCallback is called with database count progress during cluster restore
|
||||
type DatabaseProgressCallback func(done, total int, dbName string)
|
||||
|
||||
// DatabaseProgressWithTimingCallback is called with database progress including timing info
|
||||
// Parameters: done count, total count, database name, elapsed time for current restore phase, avg duration per DB
|
||||
type DatabaseProgressWithTimingCallback func(done, total int, dbName string, phaseElapsed, avgPerDB time.Duration)
|
||||
|
||||
// DatabaseProgressByBytesCallback is called with progress weighted by database sizes (bytes)
|
||||
// Parameters: bytes completed, total bytes, current database name, databases done count, total database count
|
||||
type DatabaseProgressByBytesCallback func(bytesDone, bytesTotal int64, dbName string, dbDone, dbTotal int)
|
||||
|
||||
// Engine handles database restore operations
|
||||
type Engine struct {
|
||||
cfg *config.Config
|
||||
@ -32,7 +50,14 @@ type Engine struct {
|
||||
progress progress.Indicator
|
||||
detailedReporter *progress.DetailedReporter
|
||||
dryRun bool
|
||||
silentMode bool // Suppress stdout output (for TUI mode)
|
||||
debugLogPath string // Path to save debug log on error
|
||||
|
||||
// TUI progress callback for detailed progress reporting
|
||||
progressCallback ProgressCallback
|
||||
dbProgressCallback DatabaseProgressCallback
|
||||
dbProgressTimingCallback DatabaseProgressWithTimingCallback
|
||||
dbProgressByBytesCallback DatabaseProgressByBytesCallback
|
||||
}
|
||||
|
||||
// New creates a new restore engine
|
||||
@ -62,6 +87,7 @@ func NewSilent(cfg *config.Config, log logger.Logger, db database.Database) *Eng
|
||||
progress: progressIndicator,
|
||||
detailedReporter: detailedReporter,
|
||||
dryRun: false,
|
||||
silentMode: true, // Suppress stdout for TUI
|
||||
}
|
||||
}
|
||||
|
||||
@ -88,6 +114,54 @@ func (e *Engine) SetDebugLogPath(path string) {
|
||||
e.debugLogPath = path
|
||||
}
|
||||
|
||||
// SetProgressCallback sets a callback for detailed progress reporting (for TUI mode)
|
||||
func (e *Engine) SetProgressCallback(cb ProgressCallback) {
|
||||
e.progressCallback = cb
|
||||
}
|
||||
|
||||
// SetDatabaseProgressCallback sets a callback for database count progress during cluster restore
|
||||
func (e *Engine) SetDatabaseProgressCallback(cb DatabaseProgressCallback) {
|
||||
e.dbProgressCallback = cb
|
||||
}
|
||||
|
||||
// SetDatabaseProgressWithTimingCallback sets a callback for database progress with timing info
|
||||
func (e *Engine) SetDatabaseProgressWithTimingCallback(cb DatabaseProgressWithTimingCallback) {
|
||||
e.dbProgressTimingCallback = cb
|
||||
}
|
||||
|
||||
// SetDatabaseProgressByBytesCallback sets a callback for progress weighted by database sizes
|
||||
func (e *Engine) SetDatabaseProgressByBytesCallback(cb DatabaseProgressByBytesCallback) {
|
||||
e.dbProgressByBytesCallback = cb
|
||||
}
|
||||
|
||||
// reportProgress safely calls the progress callback if set
|
||||
func (e *Engine) reportProgress(current, total int64, description string) {
|
||||
if e.progressCallback != nil {
|
||||
e.progressCallback(current, total, description)
|
||||
}
|
||||
}
|
||||
|
||||
// reportDatabaseProgress safely calls the database progress callback if set
|
||||
func (e *Engine) reportDatabaseProgress(done, total int, dbName string) {
|
||||
if e.dbProgressCallback != nil {
|
||||
e.dbProgressCallback(done, total, dbName)
|
||||
}
|
||||
}
|
||||
|
||||
// reportDatabaseProgressWithTiming safely calls the timing-aware callback if set
|
||||
func (e *Engine) reportDatabaseProgressWithTiming(done, total int, dbName string, phaseElapsed, avgPerDB time.Duration) {
|
||||
if e.dbProgressTimingCallback != nil {
|
||||
e.dbProgressTimingCallback(done, total, dbName, phaseElapsed, avgPerDB)
|
||||
}
|
||||
}
|
||||
|
||||
// reportDatabaseProgressByBytes safely calls the bytes-weighted callback if set
|
||||
func (e *Engine) reportDatabaseProgressByBytes(bytesDone, bytesTotal int64, dbName string, dbDone, dbTotal int) {
|
||||
if e.dbProgressByBytesCallback != nil {
|
||||
e.dbProgressByBytesCallback(bytesDone, bytesTotal, dbName, dbDone, dbTotal)
|
||||
}
|
||||
}
|
||||
|
||||
// loggerAdapter adapts our logger to the progress.Logger interface
|
||||
type loggerAdapter struct {
|
||||
logger logger.Logger
|
||||
@ -218,6 +292,25 @@ func (e *Engine) restorePostgreSQLDump(ctx context.Context, archivePath, targetD
|
||||
|
||||
cmd := e.db.BuildRestoreCommand(targetDB, archivePath, opts)
|
||||
|
||||
// Start heartbeat ticker for restore progress
|
||||
restoreStart := time.Now()
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
defer heartbeatTicker.Stop()
|
||||
defer cancelHeartbeat()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(restoreStart)
|
||||
e.progress.Update(fmt.Sprintf("Restoring %s... (elapsed: %s)", targetDB, formatDuration(elapsed)))
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
if compressed {
|
||||
// For compressed dumps, decompress first
|
||||
return e.executeRestoreWithDecompression(ctx, archivePath, cmd)
|
||||
@ -387,16 +480,18 @@ func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB
|
||||
var cmd []string
|
||||
|
||||
// For localhost, omit -h to use Unix socket (avoids Ident auth issues)
|
||||
// But always include -p for port (in case of non-standard port)
|
||||
hostArg := ""
|
||||
portArg := fmt.Sprintf("-p %d", e.cfg.Port)
|
||||
if e.cfg.Host != "localhost" && e.cfg.Host != "" {
|
||||
hostArg = fmt.Sprintf("-h %s -p %d", e.cfg.Host, e.cfg.Port)
|
||||
hostArg = fmt.Sprintf("-h %s", e.cfg.Host)
|
||||
}
|
||||
|
||||
if compressed {
|
||||
// Use ON_ERROR_STOP=1 to fail fast on first error (prevents millions of errors on truncated dumps)
|
||||
psqlCmd := fmt.Sprintf("psql -U %s -d %s -v ON_ERROR_STOP=1", e.cfg.User, targetDB)
|
||||
psqlCmd := fmt.Sprintf("psql %s -U %s -d %s -v ON_ERROR_STOP=1", portArg, e.cfg.User, targetDB)
|
||||
if hostArg != "" {
|
||||
psqlCmd = fmt.Sprintf("psql %s -U %s -d %s -v ON_ERROR_STOP=1", hostArg, e.cfg.User, targetDB)
|
||||
psqlCmd = fmt.Sprintf("psql %s %s -U %s -d %s -v ON_ERROR_STOP=1", hostArg, portArg, e.cfg.User, targetDB)
|
||||
}
|
||||
// Set PGPASSWORD in the bash command for password-less auth
|
||||
cmd = []string{
|
||||
@ -417,6 +512,7 @@ func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB
|
||||
} else {
|
||||
cmd = []string{
|
||||
"psql",
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
"-U", e.cfg.User,
|
||||
"-d", targetDB,
|
||||
"-v", "ON_ERROR_STOP=1",
|
||||
@ -743,8 +839,99 @@ func (e *Engine) previewRestore(archivePath, targetDB string, format ArchiveForm
|
||||
return nil
|
||||
}
|
||||
|
||||
// RestoreSingleFromCluster extracts and restores a single database from a cluster backup
|
||||
func (e *Engine) RestoreSingleFromCluster(ctx context.Context, clusterArchivePath, dbName, targetDB string, cleanFirst, createIfMissing bool) error {
|
||||
operation := e.log.StartOperation("Single Database Restore from Cluster")
|
||||
|
||||
// Validate and sanitize archive path
|
||||
validArchivePath, pathErr := security.ValidateArchivePath(clusterArchivePath)
|
||||
if pathErr != nil {
|
||||
operation.Fail(fmt.Sprintf("Invalid archive path: %v", pathErr))
|
||||
return fmt.Errorf("invalid archive path: %w", pathErr)
|
||||
}
|
||||
clusterArchivePath = validArchivePath
|
||||
|
||||
// Validate archive exists
|
||||
if _, err := os.Stat(clusterArchivePath); os.IsNotExist(err) {
|
||||
operation.Fail("Archive not found")
|
||||
return fmt.Errorf("archive not found: %s", clusterArchivePath)
|
||||
}
|
||||
|
||||
// Verify it's a cluster archive
|
||||
format := DetectArchiveFormat(clusterArchivePath)
|
||||
if format != FormatClusterTarGz {
|
||||
operation.Fail("Not a cluster archive")
|
||||
return fmt.Errorf("not a cluster archive: %s (format: %s)", clusterArchivePath, format)
|
||||
}
|
||||
|
||||
// Create temporary directory for extraction
|
||||
workDir := e.cfg.GetEffectiveWorkDir()
|
||||
tempDir := filepath.Join(workDir, fmt.Sprintf(".extract_%d", time.Now().Unix()))
|
||||
if err := os.MkdirAll(tempDir, 0755); err != nil {
|
||||
operation.Fail("Failed to create temporary directory")
|
||||
return fmt.Errorf("failed to create temp directory: %w", err)
|
||||
}
|
||||
defer os.RemoveAll(tempDir)
|
||||
|
||||
// Extract the specific database from cluster archive
|
||||
e.log.Info("Extracting database from cluster backup", "database", dbName, "cluster", filepath.Base(clusterArchivePath))
|
||||
e.progress.Start(fmt.Sprintf("Extracting '%s' from cluster backup", dbName))
|
||||
|
||||
extractedPath, err := ExtractDatabaseFromCluster(ctx, clusterArchivePath, dbName, tempDir, e.log, e.progress)
|
||||
if err != nil {
|
||||
e.progress.Fail(fmt.Sprintf("Extraction failed: %v", err))
|
||||
operation.Fail(fmt.Sprintf("Extraction failed: %v", err))
|
||||
return fmt.Errorf("failed to extract database: %w", err)
|
||||
}
|
||||
|
||||
e.progress.Update(fmt.Sprintf("Extracted: %s", filepath.Base(extractedPath)))
|
||||
e.log.Info("Database extracted successfully", "path", extractedPath)
|
||||
|
||||
// Now restore the extracted database file
|
||||
e.progress.Update("Restoring database...")
|
||||
|
||||
// Create database if requested and it doesn't exist
|
||||
if createIfMissing {
|
||||
e.log.Info("Checking if target database exists", "database", targetDB)
|
||||
if err := e.ensureDatabaseExists(ctx, targetDB); err != nil {
|
||||
operation.Fail(fmt.Sprintf("Failed to create database: %v", err))
|
||||
return fmt.Errorf("failed to create database '%s': %w", targetDB, err)
|
||||
}
|
||||
}
|
||||
|
||||
// Detect format of extracted file
|
||||
extractedFormat := DetectArchiveFormat(extractedPath)
|
||||
e.log.Info("Restoring extracted database", "format", extractedFormat, "target", targetDB)
|
||||
|
||||
// Restore based on format
|
||||
var restoreErr error
|
||||
switch extractedFormat {
|
||||
case FormatPostgreSQLDump, FormatPostgreSQLDumpGz:
|
||||
restoreErr = e.restorePostgreSQLDump(ctx, extractedPath, targetDB, extractedFormat == FormatPostgreSQLDumpGz, cleanFirst)
|
||||
case FormatPostgreSQLSQL, FormatPostgreSQLSQLGz:
|
||||
restoreErr = e.restorePostgreSQLSQL(ctx, extractedPath, targetDB, extractedFormat == FormatPostgreSQLSQLGz)
|
||||
case FormatMySQLSQL, FormatMySQLSQLGz:
|
||||
restoreErr = e.restoreMySQLSQL(ctx, extractedPath, targetDB, extractedFormat == FormatMySQLSQLGz)
|
||||
default:
|
||||
operation.Fail("Unsupported extracted format")
|
||||
return fmt.Errorf("unsupported extracted format: %s", extractedFormat)
|
||||
}
|
||||
|
||||
if restoreErr != nil {
|
||||
e.progress.Fail(fmt.Sprintf("Restore failed: %v", restoreErr))
|
||||
operation.Fail(fmt.Sprintf("Restore failed: %v", restoreErr))
|
||||
return restoreErr
|
||||
}
|
||||
|
||||
e.progress.Complete(fmt.Sprintf("Database '%s' restored from cluster backup", targetDB))
|
||||
operation.Complete(fmt.Sprintf("Restored '%s' from cluster as '%s'", dbName, targetDB))
|
||||
return nil
|
||||
}
|
||||
|
||||
// RestoreCluster restores a full cluster from a tar.gz archive
|
||||
func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
// If preExtractedPath is non-empty, uses that directory instead of extracting archivePath
|
||||
// This avoids double extraction when ValidateAndExtractCluster was already called
|
||||
func (e *Engine) RestoreCluster(ctx context.Context, archivePath string, preExtractedPath ...string) error {
|
||||
operation := e.log.StartOperation("Cluster Restore")
|
||||
|
||||
// Validate and sanitize archive path
|
||||
@ -775,22 +962,32 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
return fmt.Errorf("not a cluster archive: %s (detected format: %s)", archivePath, format)
|
||||
}
|
||||
|
||||
// Check disk space before starting restore
|
||||
e.log.Info("Checking disk space for restore")
|
||||
archiveInfo, err := os.Stat(archivePath)
|
||||
if err == nil {
|
||||
spaceCheck := checks.CheckDiskSpaceForRestore(e.cfg.BackupDir, archiveInfo.Size())
|
||||
// Check if we have a pre-extracted directory (optimization to avoid double extraction)
|
||||
// This check must happen BEFORE disk space checks to avoid false failures
|
||||
usingPreExtracted := len(preExtractedPath) > 0 && preExtractedPath[0] != ""
|
||||
|
||||
if spaceCheck.Critical {
|
||||
operation.Fail("Insufficient disk space")
|
||||
return fmt.Errorf("insufficient disk space for restore: %.1f%% used - need at least 4x archive size", spaceCheck.UsedPercent)
|
||||
}
|
||||
// Check disk space before starting restore (skip if using pre-extracted directory)
|
||||
var archiveInfo os.FileInfo
|
||||
var err error
|
||||
if !usingPreExtracted {
|
||||
e.log.Info("Checking disk space for restore")
|
||||
archiveInfo, err = os.Stat(archivePath)
|
||||
if err == nil {
|
||||
spaceCheck := checks.CheckDiskSpaceForRestore(e.cfg.BackupDir, archiveInfo.Size())
|
||||
|
||||
if spaceCheck.Warning {
|
||||
e.log.Warn("Low disk space - restore may fail",
|
||||
"available_gb", float64(spaceCheck.AvailableBytes)/(1024*1024*1024),
|
||||
"used_percent", spaceCheck.UsedPercent)
|
||||
if spaceCheck.Critical {
|
||||
operation.Fail("Insufficient disk space")
|
||||
return fmt.Errorf("insufficient disk space for restore: %.1f%% used - need at least 4x archive size", spaceCheck.UsedPercent)
|
||||
}
|
||||
|
||||
if spaceCheck.Warning {
|
||||
e.log.Warn("Low disk space - restore may fail",
|
||||
"available_gb", float64(spaceCheck.AvailableBytes)/(1024*1024*1024),
|
||||
"used_percent", spaceCheck.UsedPercent)
|
||||
}
|
||||
}
|
||||
} else {
|
||||
e.log.Info("Skipping disk space check (using pre-extracted directory)")
|
||||
}
|
||||
|
||||
if e.dryRun {
|
||||
@ -803,17 +1000,56 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
// Create temporary extraction directory in configured WorkDir
|
||||
workDir := e.cfg.GetEffectiveWorkDir()
|
||||
tempDir := filepath.Join(workDir, fmt.Sprintf(".restore_%d", time.Now().Unix()))
|
||||
if err := os.MkdirAll(tempDir, 0755); err != nil {
|
||||
operation.Fail("Failed to create temporary directory")
|
||||
return fmt.Errorf("failed to create temp directory in %s: %w", workDir, err)
|
||||
}
|
||||
defer os.RemoveAll(tempDir)
|
||||
|
||||
// Extract archive
|
||||
e.log.Info("Extracting cluster archive", "archive", archivePath, "tempDir", tempDir)
|
||||
if err := e.extractArchive(ctx, archivePath, tempDir); err != nil {
|
||||
operation.Fail("Archive extraction failed")
|
||||
return fmt.Errorf("failed to extract archive: %w", err)
|
||||
// Handle pre-extracted directory or extract archive
|
||||
if usingPreExtracted {
|
||||
tempDir = preExtractedPath[0]
|
||||
// Note: Caller handles cleanup of pre-extracted directory
|
||||
e.log.Info("Using pre-extracted cluster directory",
|
||||
"path", tempDir,
|
||||
"optimization", "skipping duplicate extraction")
|
||||
} else {
|
||||
// Check disk space for extraction (need ~3x archive size: compressed + extracted + working space)
|
||||
if archiveInfo != nil {
|
||||
requiredBytes := uint64(archiveInfo.Size()) * 3
|
||||
extractionCheck := checks.CheckDiskSpace(workDir)
|
||||
if extractionCheck.AvailableBytes < requiredBytes {
|
||||
operation.Fail("Insufficient disk space for extraction")
|
||||
return fmt.Errorf("insufficient disk space for extraction in %s: need %.1f GB, have %.1f GB (archive size: %.1f GB × 3)",
|
||||
workDir,
|
||||
float64(requiredBytes)/(1024*1024*1024),
|
||||
float64(extractionCheck.AvailableBytes)/(1024*1024*1024),
|
||||
float64(archiveInfo.Size())/(1024*1024*1024))
|
||||
}
|
||||
e.log.Info("Disk space check for extraction passed",
|
||||
"workdir", workDir,
|
||||
"required_gb", float64(requiredBytes)/(1024*1024*1024),
|
||||
"available_gb", float64(extractionCheck.AvailableBytes)/(1024*1024*1024))
|
||||
}
|
||||
|
||||
// Need to extract archive ourselves
|
||||
if err := os.MkdirAll(tempDir, 0755); err != nil {
|
||||
operation.Fail("Failed to create temporary directory")
|
||||
return fmt.Errorf("failed to create temp directory in %s: %w", workDir, err)
|
||||
}
|
||||
defer os.RemoveAll(tempDir)
|
||||
|
||||
// Extract archive
|
||||
e.log.Info("Extracting cluster archive", "archive", archivePath, "tempDir", tempDir)
|
||||
if err := e.extractArchive(ctx, archivePath, tempDir); err != nil {
|
||||
operation.Fail("Archive extraction failed")
|
||||
return fmt.Errorf("failed to extract archive: %w", err)
|
||||
}
|
||||
|
||||
// Check context validity after extraction (debugging context cancellation issues)
|
||||
if ctx.Err() != nil {
|
||||
e.log.Error("Context cancelled after extraction - this should not happen",
|
||||
"context_error", ctx.Err(),
|
||||
"extraction_completed", true)
|
||||
operation.Fail("Context cancelled unexpectedly")
|
||||
return fmt.Errorf("context cancelled after extraction completed: %w", ctx.Err())
|
||||
}
|
||||
e.log.Info("Extraction completed, context still valid")
|
||||
}
|
||||
|
||||
// Check if user has superuser privileges (required for ownership restoration)
|
||||
@ -936,6 +1172,27 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
e.log.Warn("Preflight checks failed", "error", preflightErr)
|
||||
}
|
||||
|
||||
// 🛡️ LARGE DATABASE GUARD - Bulletproof protection for large database restores
|
||||
e.progress.Update("Analyzing database characteristics...")
|
||||
guard := NewLargeDBGuard(e.cfg, e.log)
|
||||
|
||||
// Build list of dump files for analysis
|
||||
var dumpFilePaths []string
|
||||
for _, entry := range entries {
|
||||
if !entry.IsDir() {
|
||||
dumpFilePaths = append(dumpFilePaths, filepath.Join(dumpsDir, entry.Name()))
|
||||
}
|
||||
}
|
||||
|
||||
// Determine optimal restore strategy
|
||||
strategy := guard.DetermineStrategy(ctx, archivePath, dumpFilePaths)
|
||||
|
||||
// Apply strategy (override config if needed)
|
||||
if strategy.UseConservative {
|
||||
guard.ApplyStrategy(strategy, e.cfg)
|
||||
guard.WarnUser(strategy, e.silentMode)
|
||||
}
|
||||
|
||||
// Calculate optimal lock boost based on BLOB count
|
||||
lockBoostValue := 2048 // Default
|
||||
if preflight != nil && preflight.Archive.RecommendedLockBoost > 0 {
|
||||
@ -944,34 +1201,114 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
|
||||
// AUTO-TUNE: Boost PostgreSQL settings for large restores
|
||||
e.progress.Update("Tuning PostgreSQL for large restore...")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Attempting to boost PostgreSQL lock settings",
|
||||
"target_max_locks", lockBoostValue,
|
||||
"conservative_mode", strategy.UseConservative)
|
||||
}
|
||||
|
||||
originalSettings, tuneErr := e.boostPostgreSQLSettings(ctx, lockBoostValue)
|
||||
if tuneErr != nil {
|
||||
e.log.Warn("Could not boost PostgreSQL settings - restore may fail on BLOB-heavy databases",
|
||||
"error", tuneErr)
|
||||
} else {
|
||||
e.log.Info("Boosted PostgreSQL settings for restore",
|
||||
"max_locks_per_transaction", fmt.Sprintf("%d → %d", originalSettings.MaxLocks, lockBoostValue),
|
||||
"maintenance_work_mem", fmt.Sprintf("%s → 2GB", originalSettings.MaintenanceWorkMem))
|
||||
// Ensure we reset settings when done (even on failure)
|
||||
defer func() {
|
||||
if resetErr := e.resetPostgreSQLSettings(ctx, originalSettings); resetErr != nil {
|
||||
e.log.Warn("Could not reset PostgreSQL settings", "error", resetErr)
|
||||
} else {
|
||||
e.log.Info("Reset PostgreSQL settings to original values")
|
||||
}
|
||||
}()
|
||||
e.log.Error("Could not boost PostgreSQL settings", "error", tuneErr)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] Lock boost attempt FAILED",
|
||||
"error", tuneErr,
|
||||
"phase", "boostPostgreSQLSettings")
|
||||
}
|
||||
|
||||
operation.Fail("PostgreSQL tuning failed")
|
||||
return fmt.Errorf("failed to boost PostgreSQL settings: %w", tuneErr)
|
||||
}
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Lock boost function returned",
|
||||
"original_max_locks", originalSettings.MaxLocks,
|
||||
"target_max_locks", lockBoostValue,
|
||||
"boost_successful", originalSettings.MaxLocks >= lockBoostValue)
|
||||
}
|
||||
|
||||
// CRITICAL: Verify locks were actually increased
|
||||
// Even in conservative mode (--jobs=1), a single massive database can exhaust locks
|
||||
// If boost failed (couldn't restart PostgreSQL), we MUST abort
|
||||
if originalSettings.MaxLocks < lockBoostValue {
|
||||
e.log.Error("PostgreSQL lock boost FAILED - restart required but not possible",
|
||||
"current_locks", originalSettings.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"conservative_mode", strategy.UseConservative,
|
||||
"note", "Even single-threaded restore can fail with massive databases")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] CRITICAL: Lock verification FAILED",
|
||||
"actual_locks", originalSettings.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"delta", lockBoostValue-originalSettings.MaxLocks,
|
||||
"verdict", "ABORT RESTORE")
|
||||
}
|
||||
|
||||
operation.Fail(fmt.Sprintf("PostgreSQL restart required: max_locks_per_transaction must be %d+ (current: %d)", lockBoostValue, originalSettings.MaxLocks))
|
||||
|
||||
// Provide clear instructions
|
||||
e.log.Error("=" + strings.Repeat("=", 70))
|
||||
e.log.Error("RESTORE ABORTED - Action Required:")
|
||||
e.log.Error("1. ALTER SYSTEM has saved max_locks_per_transaction=%d to postgresql.auto.conf", lockBoostValue)
|
||||
e.log.Error("2. Restart PostgreSQL to activate the new setting:")
|
||||
e.log.Error(" sudo systemctl restart postgresql")
|
||||
e.log.Error("3. Retry the restore - it will then complete successfully")
|
||||
e.log.Error("=" + strings.Repeat("=", 70))
|
||||
|
||||
return fmt.Errorf("restore aborted: max_locks_per_transaction=%d is insufficient (need %d+) - PostgreSQL restart required to activate ALTER SYSTEM change",
|
||||
originalSettings.MaxLocks, lockBoostValue)
|
||||
}
|
||||
|
||||
e.log.Info("PostgreSQL tuning verified - locks sufficient for restore",
|
||||
"max_locks_per_transaction", originalSettings.MaxLocks,
|
||||
"target_locks", lockBoostValue,
|
||||
"maintenance_work_mem", "2GB",
|
||||
"conservative_mode", strategy.UseConservative)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Lock verification PASSED",
|
||||
"actual_locks", originalSettings.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"verdict", "PROCEED WITH RESTORE")
|
||||
}
|
||||
|
||||
// Ensure we reset settings when done (even on failure)
|
||||
defer func() {
|
||||
if resetErr := e.resetPostgreSQLSettings(ctx, originalSettings); resetErr != nil {
|
||||
e.log.Warn("Could not reset PostgreSQL settings", "error", resetErr)
|
||||
} else {
|
||||
e.log.Info("Reset PostgreSQL settings to original values")
|
||||
}
|
||||
}()
|
||||
|
||||
var restoreErrors *multierror.Error
|
||||
var restoreErrorsMu sync.Mutex
|
||||
totalDBs := 0
|
||||
|
||||
// Count total databases
|
||||
// Count total databases and calculate total bytes for weighted progress
|
||||
var totalBytes int64
|
||||
dbSizes := make(map[string]int64) // Map database name to dump file size
|
||||
for _, entry := range entries {
|
||||
if !entry.IsDir() {
|
||||
totalDBs++
|
||||
dumpFile := filepath.Join(dumpsDir, entry.Name())
|
||||
if info, err := os.Stat(dumpFile); err == nil {
|
||||
dbName := entry.Name()
|
||||
dbName = strings.TrimSuffix(dbName, ".dump")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql.gz")
|
||||
dbSizes[dbName] = info.Size()
|
||||
totalBytes += info.Size()
|
||||
}
|
||||
}
|
||||
}
|
||||
e.log.Info("Calculated total restore size", "databases", totalDBs, "total_bytes", totalBytes)
|
||||
|
||||
// Track bytes completed for weighted progress
|
||||
var bytesCompleted int64
|
||||
var bytesCompletedMu sync.Mutex
|
||||
|
||||
// Create ETA estimator for database restores
|
||||
estimator := progress.NewETAEstimator("Restoring cluster", totalDBs)
|
||||
@ -999,6 +1336,23 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
var successCount, failCount int32
|
||||
var mu sync.Mutex // Protect shared resources (progress, logger)
|
||||
|
||||
// CRITICAL: Check context before starting database restore loop
|
||||
// This helps debug issues where context gets cancelled between extraction and restore
|
||||
if ctx.Err() != nil {
|
||||
e.log.Error("Context cancelled before database restore loop started",
|
||||
"context_error", ctx.Err(),
|
||||
"total_databases", totalDBs,
|
||||
"parallelism", parallelism)
|
||||
operation.Fail("Context cancelled before database restores could start")
|
||||
return fmt.Errorf("context cancelled before database restore: %w", ctx.Err())
|
||||
}
|
||||
e.log.Info("Starting database restore loop", "databases", totalDBs, "parallelism", parallelism)
|
||||
|
||||
// Timing tracking for restore phase progress
|
||||
restorePhaseStart := time.Now()
|
||||
var completedDBTimes []time.Duration // Track duration for each completed DB restore
|
||||
var completedDBTimesMu sync.Mutex
|
||||
|
||||
// Create semaphore to limit concurrency
|
||||
semaphore := make(chan struct{}, parallelism)
|
||||
var wg sync.WaitGroup
|
||||
@ -1024,6 +1378,19 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
}
|
||||
}()
|
||||
|
||||
// Check for context cancellation before starting
|
||||
if ctx.Err() != nil {
|
||||
e.log.Warn("Context cancelled - skipping database restore", "file", filename)
|
||||
atomic.AddInt32(&failCount, 1)
|
||||
restoreErrorsMu.Lock()
|
||||
restoreErrors = multierror.Append(restoreErrors, fmt.Errorf("%s: restore skipped (context cancelled)", strings.TrimSuffix(strings.TrimSuffix(filename, ".dump"), ".sql.gz")))
|
||||
restoreErrorsMu.Unlock()
|
||||
return
|
||||
}
|
||||
|
||||
// Track timing for this database restore
|
||||
dbRestoreStart := time.Now()
|
||||
|
||||
// Update estimator progress (thread-safe)
|
||||
mu.Lock()
|
||||
estimator.UpdateProgress(idx)
|
||||
@ -1036,10 +1403,26 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
|
||||
dbProgress := 15 + int(float64(idx)/float64(totalDBs)*85.0)
|
||||
|
||||
// Calculate average time per DB and report progress with timing
|
||||
completedDBTimesMu.Lock()
|
||||
var avgPerDB time.Duration
|
||||
if len(completedDBTimes) > 0 {
|
||||
var totalDuration time.Duration
|
||||
for _, d := range completedDBTimes {
|
||||
totalDuration += d
|
||||
}
|
||||
avgPerDB = totalDuration / time.Duration(len(completedDBTimes))
|
||||
}
|
||||
phaseElapsed := time.Since(restorePhaseStart)
|
||||
completedDBTimesMu.Unlock()
|
||||
|
||||
mu.Lock()
|
||||
statusMsg := fmt.Sprintf("Restoring database %s (%d/%d)", dbName, idx+1, totalDBs)
|
||||
e.progress.Update(statusMsg)
|
||||
e.log.Info("Restoring database", "name", dbName, "file", dumpFile, "progress", dbProgress)
|
||||
// Report database progress for TUI (both callbacks)
|
||||
e.reportDatabaseProgress(idx, totalDBs, dbName)
|
||||
e.reportDatabaseProgressWithTiming(idx, totalDBs, dbName, phaseElapsed, avgPerDB)
|
||||
mu.Unlock()
|
||||
|
||||
// STEP 1: Drop existing database completely (clean slate)
|
||||
@ -1062,6 +1445,25 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
preserveOwnership := isSuperuser
|
||||
isCompressedSQL := strings.HasSuffix(dumpFile, ".sql.gz")
|
||||
|
||||
// Start heartbeat ticker to show progress during long-running restore
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(dbRestoreStart)
|
||||
mu.Lock()
|
||||
statusMsg := fmt.Sprintf("Restoring %s (%d/%d) - elapsed: %s",
|
||||
dbName, idx+1, totalDBs, formatDuration(elapsed))
|
||||
e.progress.Update(statusMsg)
|
||||
mu.Unlock()
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
var restoreErr error
|
||||
if isCompressedSQL {
|
||||
mu.Lock()
|
||||
@ -1075,6 +1477,10 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
restoreErr = e.restorePostgreSQLDumpWithOwnership(ctx, dumpFile, dbName, false, preserveOwnership)
|
||||
}
|
||||
|
||||
// Stop heartbeat ticker
|
||||
heartbeatTicker.Stop()
|
||||
cancelHeartbeat()
|
||||
|
||||
if restoreErr != nil {
|
||||
mu.Lock()
|
||||
e.log.Error("Failed to restore database", "name", dbName, "file", dumpFile, "error", restoreErr)
|
||||
@ -1104,7 +1510,27 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
return
|
||||
}
|
||||
|
||||
// Track completed database restore duration for ETA calculation
|
||||
dbRestoreDuration := time.Since(dbRestoreStart)
|
||||
completedDBTimesMu.Lock()
|
||||
completedDBTimes = append(completedDBTimes, dbRestoreDuration)
|
||||
completedDBTimesMu.Unlock()
|
||||
|
||||
// Update bytes completed for weighted progress
|
||||
dbSize := dbSizes[dbName]
|
||||
bytesCompletedMu.Lock()
|
||||
bytesCompleted += dbSize
|
||||
currentBytesCompleted := bytesCompleted
|
||||
currentSuccessCount := int(atomic.LoadInt32(&successCount)) + 1 // +1 because we're about to increment
|
||||
bytesCompletedMu.Unlock()
|
||||
|
||||
// Report weighted progress (bytes-based)
|
||||
e.reportDatabaseProgressByBytes(currentBytesCompleted, totalBytes, dbName, currentSuccessCount, totalDBs)
|
||||
|
||||
atomic.AddInt32(&successCount, 1)
|
||||
|
||||
// Small delay to ensure PostgreSQL fully closes connections before next restore
|
||||
time.Sleep(100 * time.Millisecond)
|
||||
}(dbIndex, entry.Name())
|
||||
|
||||
dbIndex++
|
||||
@ -1116,6 +1542,35 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
successCountFinal := int(atomic.LoadInt32(&successCount))
|
||||
failCountFinal := int(atomic.LoadInt32(&failCount))
|
||||
|
||||
// SANITY CHECK: Verify all databases were accounted for
|
||||
// This catches any goroutine that exited without updating counters
|
||||
accountedFor := successCountFinal + failCountFinal
|
||||
if accountedFor != totalDBs {
|
||||
missingCount := totalDBs - accountedFor
|
||||
e.log.Error("INTERNAL ERROR: Some database restore goroutines did not report status",
|
||||
"expected", totalDBs,
|
||||
"success", successCountFinal,
|
||||
"failed", failCountFinal,
|
||||
"unaccounted", missingCount)
|
||||
|
||||
// Treat unaccounted databases as failures
|
||||
failCountFinal += missingCount
|
||||
restoreErrorsMu.Lock()
|
||||
restoreErrors = multierror.Append(restoreErrors, fmt.Errorf("%d database(s) did not complete (possible goroutine crash or deadlock)", missingCount))
|
||||
restoreErrorsMu.Unlock()
|
||||
}
|
||||
|
||||
// CRITICAL: Check if no databases were restored at all
|
||||
if successCountFinal == 0 {
|
||||
e.progress.Fail(fmt.Sprintf("Cluster restore FAILED: 0 of %d databases restored", totalDBs))
|
||||
operation.Fail("No databases were restored")
|
||||
|
||||
if failCountFinal > 0 && restoreErrors != nil {
|
||||
return fmt.Errorf("cluster restore failed: all %d database(s) failed:\n%s", failCountFinal, restoreErrors.Error())
|
||||
}
|
||||
return fmt.Errorf("cluster restore failed: no databases were restored (0 of %d total). Check PostgreSQL logs for details", totalDBs)
|
||||
}
|
||||
|
||||
if failCountFinal > 0 {
|
||||
// Format multi-error with detailed output
|
||||
restoreErrors.ErrorFormat = func(errs []error) string {
|
||||
@ -1146,8 +1601,163 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
|
||||
return nil
|
||||
}
|
||||
|
||||
// extractArchive extracts a tar.gz archive
|
||||
// extractArchive extracts a tar.gz archive with progress reporting
|
||||
func (e *Engine) extractArchive(ctx context.Context, archivePath, destDir string) error {
|
||||
// If progress callback is set, use Go's archive/tar for progress tracking
|
||||
if e.progressCallback != nil {
|
||||
return e.extractArchiveWithProgress(ctx, archivePath, destDir)
|
||||
}
|
||||
|
||||
// Otherwise use fast shell tar (no progress)
|
||||
return e.extractArchiveShell(ctx, archivePath, destDir)
|
||||
}
|
||||
|
||||
// extractArchiveWithProgress extracts using Go's archive/tar with detailed progress reporting
|
||||
func (e *Engine) extractArchiveWithProgress(ctx context.Context, archivePath, destDir string) error {
|
||||
// Get archive size for progress calculation
|
||||
archiveInfo, err := os.Stat(archivePath)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to stat archive: %w", err)
|
||||
}
|
||||
totalSize := archiveInfo.Size()
|
||||
|
||||
// Open the archive file
|
||||
file, err := os.Open(archivePath)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to open archive: %w", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
// Wrap with progress reader
|
||||
progressReader := &progressReader{
|
||||
reader: file,
|
||||
totalSize: totalSize,
|
||||
callback: e.progressCallback,
|
||||
desc: "Extracting archive",
|
||||
}
|
||||
|
||||
// Create gzip reader
|
||||
gzReader, err := gzip.NewReader(progressReader)
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create gzip reader: %w", err)
|
||||
}
|
||||
defer gzReader.Close()
|
||||
|
||||
// Create tar reader
|
||||
tarReader := tar.NewReader(gzReader)
|
||||
|
||||
// Extract files
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return ctx.Err()
|
||||
default:
|
||||
}
|
||||
|
||||
header, err := tarReader.Next()
|
||||
if err == io.EOF {
|
||||
break // End of archive
|
||||
}
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to read tar header: %w", err)
|
||||
}
|
||||
|
||||
// Sanitize and validate path
|
||||
targetPath := filepath.Join(destDir, header.Name)
|
||||
|
||||
// Security check: ensure path is within destDir (prevent path traversal)
|
||||
if !strings.HasPrefix(filepath.Clean(targetPath), filepath.Clean(destDir)) {
|
||||
e.log.Warn("Skipping potentially malicious path in archive", "path", header.Name)
|
||||
continue
|
||||
}
|
||||
|
||||
switch header.Typeflag {
|
||||
case tar.TypeDir:
|
||||
if err := os.MkdirAll(targetPath, 0755); err != nil {
|
||||
return fmt.Errorf("failed to create directory %s: %w", targetPath, err)
|
||||
}
|
||||
case tar.TypeReg:
|
||||
// Ensure parent directory exists
|
||||
if err := os.MkdirAll(filepath.Dir(targetPath), 0755); err != nil {
|
||||
return fmt.Errorf("failed to create parent directory: %w", err)
|
||||
}
|
||||
|
||||
// Create the file
|
||||
outFile, err := os.OpenFile(targetPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.FileMode(header.Mode))
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to create file %s: %w", targetPath, err)
|
||||
}
|
||||
|
||||
// Copy file contents
|
||||
if _, err := io.Copy(outFile, tarReader); err != nil {
|
||||
outFile.Close()
|
||||
return fmt.Errorf("failed to write file %s: %w", targetPath, err)
|
||||
}
|
||||
outFile.Close()
|
||||
case tar.TypeSymlink:
|
||||
// Handle symlinks (common in some archives)
|
||||
if err := os.Symlink(header.Linkname, targetPath); err != nil {
|
||||
// Ignore symlink errors (may already exist or not supported)
|
||||
e.log.Debug("Could not create symlink", "path", targetPath, "target", header.Linkname)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Final progress update
|
||||
e.reportProgress(totalSize, totalSize, "Extraction complete")
|
||||
return nil
|
||||
}
|
||||
|
||||
// progressReader wraps an io.Reader to report read progress
|
||||
type progressReader struct {
|
||||
reader io.Reader
|
||||
totalSize int64
|
||||
bytesRead int64
|
||||
callback ProgressCallback
|
||||
desc string
|
||||
lastReport time.Time
|
||||
reportEvery time.Duration
|
||||
}
|
||||
|
||||
func (pr *progressReader) Read(p []byte) (n int, err error) {
|
||||
n, err = pr.reader.Read(p)
|
||||
pr.bytesRead += int64(n)
|
||||
|
||||
// Throttle progress reporting to every 50ms for smoother updates
|
||||
if pr.reportEvery == 0 {
|
||||
pr.reportEvery = 50 * time.Millisecond
|
||||
}
|
||||
if time.Since(pr.lastReport) > pr.reportEvery {
|
||||
if pr.callback != nil {
|
||||
pr.callback(pr.bytesRead, pr.totalSize, pr.desc)
|
||||
}
|
||||
pr.lastReport = time.Now()
|
||||
}
|
||||
|
||||
return n, err
|
||||
}
|
||||
|
||||
// extractArchiveShell extracts using shell tar command (faster but no progress)
|
||||
func (e *Engine) extractArchiveShell(ctx context.Context, archivePath, destDir string) error {
|
||||
// Start heartbeat ticker for extraction progress
|
||||
extractionStart := time.Now()
|
||||
heartbeatCtx, cancelHeartbeat := context.WithCancel(ctx)
|
||||
heartbeatTicker := time.NewTicker(5 * time.Second)
|
||||
defer heartbeatTicker.Stop()
|
||||
defer cancelHeartbeat()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-heartbeatTicker.C:
|
||||
elapsed := time.Since(extractionStart)
|
||||
e.progress.Update(fmt.Sprintf("Extracting archive... (elapsed: %s)", formatDuration(elapsed)))
|
||||
case <-heartbeatCtx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
cmd := exec.CommandContext(ctx, "tar", "-xzf", archivePath, "-C", destDir)
|
||||
|
||||
// Stream stderr to avoid memory issues - tar can produce lots of output for large archives
|
||||
@ -1199,6 +1809,8 @@ func (e *Engine) extractArchive(ctx context.Context, archivePath, destDir string
|
||||
}
|
||||
|
||||
// restoreGlobals restores global objects (roles, tablespaces)
|
||||
// Note: psql returns 0 even when some statements fail (e.g., role already exists)
|
||||
// We track errors but only fail on FATAL errors that would prevent restore
|
||||
func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {
|
||||
args := []string{
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
@ -1228,6 +1840,8 @@ func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {
|
||||
|
||||
// Read stderr in chunks in goroutine
|
||||
var lastError string
|
||||
var errorCount int
|
||||
var fatalError bool
|
||||
stderrDone := make(chan struct{})
|
||||
go func() {
|
||||
defer close(stderrDone)
|
||||
@ -1236,9 +1850,23 @@ func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {
|
||||
n, err := stderr.Read(buf)
|
||||
if n > 0 {
|
||||
chunk := string(buf[:n])
|
||||
if strings.Contains(chunk, "ERROR") || strings.Contains(chunk, "FATAL") {
|
||||
// Track different error types
|
||||
if strings.Contains(chunk, "FATAL") {
|
||||
fatalError = true
|
||||
lastError = chunk
|
||||
e.log.Warn("Globals restore stderr", "output", chunk)
|
||||
e.log.Error("Globals restore FATAL error", "output", chunk)
|
||||
} else if strings.Contains(chunk, "ERROR") {
|
||||
errorCount++
|
||||
lastError = chunk
|
||||
// Only log first few errors to avoid spam
|
||||
if errorCount <= 5 {
|
||||
// Check if it's an ignorable "already exists" error
|
||||
if strings.Contains(chunk, "already exists") {
|
||||
e.log.Debug("Globals restore: object already exists (expected)", "output", chunk)
|
||||
} else {
|
||||
e.log.Warn("Globals restore error", "output", chunk)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
if err != nil {
|
||||
@ -1266,10 +1894,23 @@ func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {
|
||||
|
||||
<-stderrDone
|
||||
|
||||
// Only fail on actual command errors or FATAL PostgreSQL errors
|
||||
// Regular ERROR messages (like "role already exists") are expected
|
||||
if cmdErr != nil {
|
||||
return fmt.Errorf("failed to restore globals: %w (last error: %s)", cmdErr, lastError)
|
||||
}
|
||||
|
||||
// If we had FATAL errors, those are real problems
|
||||
if fatalError {
|
||||
return fmt.Errorf("globals restore had FATAL error: %s", lastError)
|
||||
}
|
||||
|
||||
// Log summary if there were errors (but don't fail)
|
||||
if errorCount > 0 {
|
||||
e.log.Info("Globals restore completed with some errors (usually 'already exists' - expected)",
|
||||
"error_count", errorCount)
|
||||
}
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
@ -1337,6 +1978,7 @@ func (e *Engine) terminateConnections(ctx context.Context, dbName string) error
|
||||
}
|
||||
|
||||
// dropDatabaseIfExists drops a database completely (clean slate)
|
||||
// Uses PostgreSQL 13+ WITH (FORCE) option to forcefully drop even with active connections
|
||||
func (e *Engine) dropDatabaseIfExists(ctx context.Context, dbName string) error {
|
||||
// First terminate all connections
|
||||
if err := e.terminateConnections(ctx, dbName); err != nil {
|
||||
@ -1346,26 +1988,67 @@ func (e *Engine) dropDatabaseIfExists(ctx context.Context, dbName string) error
|
||||
// Wait a moment for connections to terminate
|
||||
time.Sleep(500 * time.Millisecond)
|
||||
|
||||
// Drop the database
|
||||
args := []string{
|
||||
// Try to revoke new connections (prevents race condition)
|
||||
// This only works if we have the privilege to do so
|
||||
revokeArgs := []string{
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
"-U", e.cfg.User,
|
||||
"-d", "postgres",
|
||||
"-c", fmt.Sprintf("DROP DATABASE IF EXISTS \"%s\"", dbName),
|
||||
"-c", fmt.Sprintf("REVOKE CONNECT ON DATABASE \"%s\" FROM PUBLIC", dbName),
|
||||
}
|
||||
|
||||
// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
|
||||
if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
|
||||
args = append([]string{"-h", e.cfg.Host}, args...)
|
||||
revokeArgs = append([]string{"-h", e.cfg.Host}, revokeArgs...)
|
||||
}
|
||||
revokeCmd := exec.CommandContext(ctx, "psql", revokeArgs...)
|
||||
revokeCmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))
|
||||
revokeCmd.Run() // Ignore errors - database might not exist
|
||||
|
||||
// Terminate connections again after revoking connect privilege
|
||||
e.terminateConnections(ctx, dbName)
|
||||
time.Sleep(200 * time.Millisecond)
|
||||
|
||||
// Try DROP DATABASE WITH (FORCE) first (PostgreSQL 13+)
|
||||
// This forcefully terminates connections and drops the database atomically
|
||||
forceArgs := []string{
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
"-U", e.cfg.User,
|
||||
"-d", "postgres",
|
||||
"-c", fmt.Sprintf("DROP DATABASE IF EXISTS \"%s\" WITH (FORCE)", dbName),
|
||||
}
|
||||
if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
|
||||
forceArgs = append([]string{"-h", e.cfg.Host}, forceArgs...)
|
||||
}
|
||||
forceCmd := exec.CommandContext(ctx, "psql", forceArgs...)
|
||||
forceCmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))
|
||||
|
||||
output, err := forceCmd.CombinedOutput()
|
||||
if err == nil {
|
||||
e.log.Info("Dropped existing database (with FORCE)", "name", dbName)
|
||||
return nil
|
||||
}
|
||||
|
||||
cmd := exec.CommandContext(ctx, "psql", args...)
|
||||
// If FORCE option failed (PostgreSQL < 13), try regular drop
|
||||
if strings.Contains(string(output), "syntax error") || strings.Contains(string(output), "WITH (FORCE)") {
|
||||
e.log.Debug("WITH (FORCE) not supported, using standard DROP", "name", dbName)
|
||||
|
||||
// Always set PGPASSWORD (empty string is fine for peer/ident auth)
|
||||
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))
|
||||
args := []string{
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
"-U", e.cfg.User,
|
||||
"-d", "postgres",
|
||||
"-c", fmt.Sprintf("DROP DATABASE IF EXISTS \"%s\"", dbName),
|
||||
}
|
||||
if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
|
||||
args = append([]string{"-h", e.cfg.Host}, args...)
|
||||
}
|
||||
|
||||
output, err := cmd.CombinedOutput()
|
||||
if err != nil {
|
||||
cmd := exec.CommandContext(ctx, "psql", args...)
|
||||
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))
|
||||
|
||||
output, err = cmd.CombinedOutput()
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to drop database '%s': %w\nOutput: %s", dbName, err, string(output))
|
||||
}
|
||||
} else if err != nil {
|
||||
return fmt.Errorf("failed to drop database '%s': %w\nOutput: %s", dbName, err, string(output))
|
||||
}
|
||||
|
||||
@ -1408,12 +2091,14 @@ func (e *Engine) ensureMySQLDatabaseExists(ctx context.Context, dbName string) e
|
||||
}
|
||||
|
||||
// ensurePostgresDatabaseExists checks if a PostgreSQL database exists and creates it if not
|
||||
// It attempts to extract encoding/locale from the dump file to preserve original settings
|
||||
func (e *Engine) ensurePostgresDatabaseExists(ctx context.Context, dbName string) error {
|
||||
// Skip creation for postgres and template databases - they should already exist
|
||||
if dbName == "postgres" || dbName == "template0" || dbName == "template1" {
|
||||
e.log.Info("Skipping create for system database (assume exists)", "name", dbName)
|
||||
return nil
|
||||
}
|
||||
|
||||
// Build psql command with authentication
|
||||
buildPsqlCmd := func(ctx context.Context, database, query string) *exec.Cmd {
|
||||
args := []string{
|
||||
@ -1453,14 +2138,31 @@ func (e *Engine) ensurePostgresDatabaseExists(ctx context.Context, dbName string
|
||||
|
||||
// Database doesn't exist, create it
|
||||
// IMPORTANT: Use template0 to avoid duplicate definition errors from local additions to template1
|
||||
// Also use UTF8 encoding explicitly as it's the most common and safest choice
|
||||
// See PostgreSQL docs: https://www.postgresql.org/docs/current/app-pgrestore.html#APP-PGRESTORE-NOTES
|
||||
e.log.Info("Creating database from template0", "name", dbName)
|
||||
e.log.Info("Creating database from template0 with UTF8 encoding", "name", dbName)
|
||||
|
||||
// Get server's default locale for LC_COLLATE and LC_CTYPE
|
||||
// This ensures compatibility while using the correct encoding
|
||||
localeCmd := buildPsqlCmd(ctx, "postgres", "SHOW lc_collate")
|
||||
localeOutput, _ := localeCmd.CombinedOutput()
|
||||
serverLocale := strings.TrimSpace(string(localeOutput))
|
||||
if serverLocale == "" {
|
||||
serverLocale = "en_US.UTF-8" // Fallback to common default
|
||||
}
|
||||
|
||||
// Build CREATE DATABASE command with encoding and locale
|
||||
// Using ENCODING 'UTF8' explicitly ensures the dump can be restored
|
||||
createSQL := fmt.Sprintf(
|
||||
"CREATE DATABASE \"%s\" WITH TEMPLATE template0 ENCODING 'UTF8' LC_COLLATE '%s' LC_CTYPE '%s'",
|
||||
dbName, serverLocale, serverLocale,
|
||||
)
|
||||
|
||||
createArgs := []string{
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
"-U", e.cfg.User,
|
||||
"-d", "postgres",
|
||||
"-c", fmt.Sprintf("CREATE DATABASE \"%s\" WITH TEMPLATE template0", dbName),
|
||||
"-c", createSQL,
|
||||
}
|
||||
|
||||
// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
|
||||
@ -1475,9 +2177,27 @@ func (e *Engine) ensurePostgresDatabaseExists(ctx context.Context, dbName string
|
||||
|
||||
output, err = createCmd.CombinedOutput()
|
||||
if err != nil {
|
||||
// Log the error and include the psql output in the returned error to aid debugging
|
||||
e.log.Warn("Database creation failed", "name", dbName, "error", err, "output", string(output))
|
||||
return fmt.Errorf("failed to create database '%s': %w (output: %s)", dbName, err, strings.TrimSpace(string(output)))
|
||||
// If encoding/locale fails, try simpler CREATE DATABASE
|
||||
e.log.Warn("Database creation with encoding failed, trying simple create", "name", dbName, "error", err)
|
||||
|
||||
simpleArgs := []string{
|
||||
"-p", fmt.Sprintf("%d", e.cfg.Port),
|
||||
"-U", e.cfg.User,
|
||||
"-d", "postgres",
|
||||
"-c", fmt.Sprintf("CREATE DATABASE \"%s\" WITH TEMPLATE template0", dbName),
|
||||
}
|
||||
if e.cfg.Host != "localhost" && e.cfg.Host != "127.0.0.1" && e.cfg.Host != "" {
|
||||
simpleArgs = append([]string{"-h", e.cfg.Host}, simpleArgs...)
|
||||
}
|
||||
|
||||
simpleCmd := exec.CommandContext(ctx, "psql", simpleArgs...)
|
||||
simpleCmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", e.cfg.Password))
|
||||
|
||||
output, err = simpleCmd.CombinedOutput()
|
||||
if err != nil {
|
||||
e.log.Warn("Database creation failed", "name", dbName, "error", err, "output", string(output))
|
||||
return fmt.Errorf("failed to create database '%s': %w (output: %s)", dbName, err, strings.TrimSpace(string(output)))
|
||||
}
|
||||
}
|
||||
|
||||
e.log.Info("Successfully created database from template0", "name", dbName)
|
||||
@ -1620,6 +2340,25 @@ func FormatBytes(bytes int64) string {
|
||||
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||
}
|
||||
|
||||
// formatDuration formats a duration to human readable format (e.g., "3m 45s", "1h 23m", "45s")
|
||||
func formatDuration(d time.Duration) string {
|
||||
if d < time.Second {
|
||||
return "0s"
|
||||
}
|
||||
|
||||
hours := int(d.Hours())
|
||||
minutes := int(d.Minutes()) % 60
|
||||
seconds := int(d.Seconds()) % 60
|
||||
|
||||
if hours > 0 {
|
||||
return fmt.Sprintf("%dh %dm", hours, minutes)
|
||||
}
|
||||
if minutes > 0 {
|
||||
return fmt.Sprintf("%dm %ds", minutes, seconds)
|
||||
}
|
||||
return fmt.Sprintf("%ds", seconds)
|
||||
}
|
||||
|
||||
// quickValidateSQLDump performs a fast validation of SQL dump files
|
||||
// by checking for truncated COPY blocks. This catches corrupted dumps
|
||||
// BEFORE attempting a full restore (which could waste 49+ minutes).
|
||||
@ -1665,9 +2404,10 @@ func (e *Engine) quickValidateSQLDump(archivePath string, compressed bool) error
|
||||
return nil
|
||||
}
|
||||
|
||||
// boostLockCapacity temporarily increases max_locks_per_transaction to prevent OOM
|
||||
// during large restores with many BLOBs. Returns the original value for later reset.
|
||||
// Uses ALTER SYSTEM + pg_reload_conf() so no restart is needed.
|
||||
// boostLockCapacity checks and reports on max_locks_per_transaction capacity.
|
||||
// IMPORTANT: max_locks_per_transaction requires a PostgreSQL RESTART to change!
|
||||
// This function now calculates total lock capacity based on max_connections and
|
||||
// warns the user if capacity is insufficient for the restore.
|
||||
func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
|
||||
// Connect to PostgreSQL to run system commands
|
||||
connStr := fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable",
|
||||
@ -1685,7 +2425,7 @@ func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
// Get current value
|
||||
// Get current max_locks_per_transaction
|
||||
var currentValue int
|
||||
err = db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(¤tValue)
|
||||
if err != nil {
|
||||
@ -1698,22 +2438,56 @@ func (e *Engine) boostLockCapacity(ctx context.Context) (int, error) {
|
||||
fmt.Sscanf(currentValueStr, "%d", ¤tValue)
|
||||
}
|
||||
|
||||
// Skip if already high enough
|
||||
if currentValue >= 2048 {
|
||||
e.log.Info("max_locks_per_transaction already sufficient", "value", currentValue)
|
||||
return currentValue, nil
|
||||
// Get max_connections to calculate total lock capacity
|
||||
var maxConns int
|
||||
if err := db.QueryRowContext(ctx, "SHOW max_connections").Scan(&maxConns); err != nil {
|
||||
maxConns = 100 // default
|
||||
}
|
||||
|
||||
// Boost to 2048 (enough for most BLOB-heavy databases)
|
||||
_, err = db.ExecContext(ctx, "ALTER SYSTEM SET max_locks_per_transaction = 2048")
|
||||
if err != nil {
|
||||
return currentValue, fmt.Errorf("failed to set max_locks_per_transaction: %w", err)
|
||||
// Get max_prepared_transactions
|
||||
var maxPreparedTxns int
|
||||
if err := db.QueryRowContext(ctx, "SHOW max_prepared_transactions").Scan(&maxPreparedTxns); err != nil {
|
||||
maxPreparedTxns = 0
|
||||
}
|
||||
|
||||
// Reload config without restart
|
||||
_, err = db.ExecContext(ctx, "SELECT pg_reload_conf()")
|
||||
if err != nil {
|
||||
return currentValue, fmt.Errorf("failed to reload config: %w", err)
|
||||
// Calculate total lock table capacity:
|
||||
// Total locks = max_locks_per_transaction × (max_connections + max_prepared_transactions)
|
||||
totalLockCapacity := currentValue * (maxConns + maxPreparedTxns)
|
||||
|
||||
e.log.Info("PostgreSQL lock table capacity",
|
||||
"max_locks_per_transaction", currentValue,
|
||||
"max_connections", maxConns,
|
||||
"max_prepared_transactions", maxPreparedTxns,
|
||||
"total_lock_capacity", totalLockCapacity)
|
||||
|
||||
// Minimum recommended total capacity for BLOB-heavy restores: 200,000 locks
|
||||
minRecommendedCapacity := 200000
|
||||
if totalLockCapacity < minRecommendedCapacity {
|
||||
recommendedMaxLocks := minRecommendedCapacity / (maxConns + maxPreparedTxns)
|
||||
if recommendedMaxLocks < 4096 {
|
||||
recommendedMaxLocks = 4096
|
||||
}
|
||||
|
||||
e.log.Warn("Lock table capacity may be insufficient for BLOB-heavy restores",
|
||||
"current_total_capacity", totalLockCapacity,
|
||||
"recommended_capacity", minRecommendedCapacity,
|
||||
"current_max_locks", currentValue,
|
||||
"recommended_max_locks", recommendedMaxLocks,
|
||||
"note", "max_locks_per_transaction requires PostgreSQL RESTART to change")
|
||||
|
||||
// Write suggested fix to ALTER SYSTEM but warn about restart
|
||||
_, err = db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", recommendedMaxLocks))
|
||||
if err != nil {
|
||||
e.log.Warn("Could not set recommended max_locks_per_transaction (needs superuser)", "error", err)
|
||||
} else {
|
||||
e.log.Warn("Wrote recommended max_locks_per_transaction to postgresql.auto.conf",
|
||||
"value", recommendedMaxLocks,
|
||||
"action", "RESTART PostgreSQL to apply: sudo systemctl restart postgresql")
|
||||
}
|
||||
} else {
|
||||
e.log.Info("Lock table capacity is sufficient",
|
||||
"total_capacity", totalLockCapacity,
|
||||
"max_locks_per_transaction", currentValue)
|
||||
}
|
||||
|
||||
return currentValue, nil
|
||||
@ -1761,10 +2535,21 @@ type OriginalSettings struct {
|
||||
}
|
||||
|
||||
// boostPostgreSQLSettings boosts multiple PostgreSQL settings for large restores
|
||||
// NOTE: max_locks_per_transaction requires a PostgreSQL RESTART to take effect!
|
||||
// maintenance_work_mem can be changed with pg_reload_conf().
|
||||
func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int) (*OriginalSettings, error) {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] boostPostgreSQLSettings: Starting lock boost procedure",
|
||||
"target_lock_value", lockBoostValue)
|
||||
}
|
||||
|
||||
connStr := e.buildConnString()
|
||||
db, err := sql.Open("pgx", connStr)
|
||||
if err != nil {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] Failed to connect to PostgreSQL",
|
||||
"error", err)
|
||||
}
|
||||
return nil, fmt.Errorf("failed to connect: %w", err)
|
||||
}
|
||||
defer db.Close()
|
||||
@ -1776,34 +2561,218 @@ func (e *Engine) boostPostgreSQLSettings(ctx context.Context, lockBoostValue int
|
||||
if err := db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&maxLocksStr); err == nil {
|
||||
original.MaxLocks, _ = strconv.Atoi(maxLocksStr)
|
||||
}
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Current PostgreSQL lock configuration",
|
||||
"current_max_locks", original.MaxLocks,
|
||||
"target_max_locks", lockBoostValue,
|
||||
"boost_required", original.MaxLocks < lockBoostValue)
|
||||
}
|
||||
|
||||
// Get current maintenance_work_mem
|
||||
db.QueryRowContext(ctx, "SHOW maintenance_work_mem").Scan(&original.MaintenanceWorkMem)
|
||||
|
||||
// Boost max_locks_per_transaction (if not already high enough)
|
||||
// CRITICAL: max_locks_per_transaction requires a PostgreSQL RESTART!
|
||||
// pg_reload_conf() is NOT sufficient for this parameter.
|
||||
needsRestart := false
|
||||
if original.MaxLocks < lockBoostValue {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Executing ALTER SYSTEM to boost locks",
|
||||
"from", original.MaxLocks,
|
||||
"to", lockBoostValue)
|
||||
}
|
||||
|
||||
_, err = db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", lockBoostValue))
|
||||
if err != nil {
|
||||
e.log.Warn("Could not boost max_locks_per_transaction", "error", err)
|
||||
e.log.Warn("Could not set max_locks_per_transaction", "error", err)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] ALTER SYSTEM failed",
|
||||
"error", err)
|
||||
}
|
||||
} else {
|
||||
needsRestart = true
|
||||
e.log.Warn("max_locks_per_transaction requires PostgreSQL restart to take effect",
|
||||
"current", original.MaxLocks,
|
||||
"target", lockBoostValue)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] ALTER SYSTEM succeeded - restart required",
|
||||
"setting_saved_to", "postgresql.auto.conf",
|
||||
"active_after", "PostgreSQL restart")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Boost maintenance_work_mem to 2GB for faster index creation
|
||||
// (this one CAN be applied via pg_reload_conf)
|
||||
_, err = db.ExecContext(ctx, "ALTER SYSTEM SET maintenance_work_mem = '2GB'")
|
||||
if err != nil {
|
||||
e.log.Warn("Could not boost maintenance_work_mem", "error", err)
|
||||
}
|
||||
|
||||
// Reload config to apply changes (no restart needed for these settings)
|
||||
// Reload config to apply maintenance_work_mem
|
||||
_, err = db.ExecContext(ctx, "SELECT pg_reload_conf()")
|
||||
if err != nil {
|
||||
return original, fmt.Errorf("failed to reload config: %w", err)
|
||||
}
|
||||
|
||||
// If max_locks_per_transaction needs a restart, try to do it
|
||||
if needsRestart {
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Attempting PostgreSQL restart to activate new lock setting")
|
||||
}
|
||||
|
||||
if restarted := e.tryRestartPostgreSQL(ctx); restarted {
|
||||
e.log.Info("PostgreSQL restarted successfully - max_locks_per_transaction now active")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] PostgreSQL restart SUCCEEDED")
|
||||
}
|
||||
|
||||
// Wait for PostgreSQL to be ready
|
||||
time.Sleep(3 * time.Second)
|
||||
// Update original.MaxLocks to reflect the new value after restart
|
||||
var newMaxLocksStr string
|
||||
if err := db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&newMaxLocksStr); err == nil {
|
||||
original.MaxLocks, _ = strconv.Atoi(newMaxLocksStr)
|
||||
e.log.Info("Verified new max_locks_per_transaction after restart", "value", original.MaxLocks)
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] Post-restart verification",
|
||||
"new_max_locks", original.MaxLocks,
|
||||
"target_was", lockBoostValue,
|
||||
"verification", "PASS")
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Cannot restart - this is now a CRITICAL failure
|
||||
// We tried to boost locks but can't apply them without restart
|
||||
e.log.Error("CRITICAL: max_locks_per_transaction boost requires PostgreSQL restart")
|
||||
e.log.Error("Current value: "+strconv.Itoa(original.MaxLocks)+", required: "+strconv.Itoa(lockBoostValue))
|
||||
e.log.Error("The setting has been saved to postgresql.auto.conf but is NOT ACTIVE")
|
||||
e.log.Error("Restore will ABORT to prevent 'out of shared memory' failure")
|
||||
e.log.Error("Action required: Ask DBA to restart PostgreSQL, then retry restore")
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Error("🔍 [LOCK-DEBUG] PostgreSQL restart FAILED",
|
||||
"current_locks", original.MaxLocks,
|
||||
"required_locks", lockBoostValue,
|
||||
"setting_saved", true,
|
||||
"setting_active", false,
|
||||
"verdict", "ABORT - Manual restart required")
|
||||
}
|
||||
|
||||
// Return original settings so caller can check and abort
|
||||
return original, nil
|
||||
}
|
||||
}
|
||||
|
||||
if e.cfg.DebugLocks {
|
||||
e.log.Info("🔍 [LOCK-DEBUG] boostPostgreSQLSettings: Complete",
|
||||
"final_max_locks", original.MaxLocks,
|
||||
"target_was", lockBoostValue,
|
||||
"boost_successful", original.MaxLocks >= lockBoostValue)
|
||||
}
|
||||
|
||||
return original, nil
|
||||
}
|
||||
|
||||
// canRestartPostgreSQL checks if we have the ability to restart PostgreSQL
|
||||
// Returns false if running in a restricted environment (e.g., su postgres on enterprise systems)
|
||||
func (e *Engine) canRestartPostgreSQL() bool {
|
||||
// Check if we're running as postgres user - if so, we likely can't restart
|
||||
// because PostgreSQL is managed by init/systemd, not directly by pg_ctl
|
||||
currentUser := os.Getenv("USER")
|
||||
if currentUser == "" {
|
||||
currentUser = os.Getenv("LOGNAME")
|
||||
}
|
||||
|
||||
// If we're the postgres user, check if we have sudo access
|
||||
if currentUser == "postgres" {
|
||||
// Try a quick sudo check - if this fails, we can't restart
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
|
||||
defer cancel()
|
||||
cmd := exec.CommandContext(ctx, "sudo", "-n", "true")
|
||||
cmd.Stdin = nil
|
||||
if err := cmd.Run(); err != nil {
|
||||
e.log.Info("Running as postgres user without sudo access - cannot restart PostgreSQL",
|
||||
"user", currentUser,
|
||||
"hint", "Ask system administrator to restart PostgreSQL if needed")
|
||||
return false
|
||||
}
|
||||
}
|
||||
|
||||
return true
|
||||
}
|
||||
|
||||
// tryRestartPostgreSQL attempts to restart PostgreSQL using various methods
|
||||
// Returns true if restart was successful
|
||||
// IMPORTANT: Uses short timeouts and non-interactive sudo to avoid blocking on password prompts
|
||||
// NOTE: This function will return false immediately if running as postgres without sudo
|
||||
func (e *Engine) tryRestartPostgreSQL(ctx context.Context) bool {
|
||||
// First check if we can even attempt a restart
|
||||
if !e.canRestartPostgreSQL() {
|
||||
e.log.Info("Skipping PostgreSQL restart attempt (no privileges)")
|
||||
return false
|
||||
}
|
||||
|
||||
e.progress.Update("Attempting PostgreSQL restart for lock settings...")
|
||||
|
||||
// Use short timeout for each restart attempt (don't block on sudo password prompts)
|
||||
runWithTimeout := func(args ...string) bool {
|
||||
cmdCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||
defer cancel()
|
||||
cmd := exec.CommandContext(cmdCtx, args[0], args[1:]...)
|
||||
// Set stdin to /dev/null to prevent sudo from waiting for password
|
||||
cmd.Stdin = nil
|
||||
return cmd.Run() == nil
|
||||
}
|
||||
|
||||
// Method 1: systemctl (most common on modern Linux) - use sudo -n for non-interactive
|
||||
if runWithTimeout("sudo", "-n", "systemctl", "restart", "postgresql") {
|
||||
return true
|
||||
}
|
||||
|
||||
// Method 2: systemctl with version suffix (e.g., postgresql-15)
|
||||
for _, ver := range []string{"17", "16", "15", "14", "13", "12"} {
|
||||
if runWithTimeout("sudo", "-n", "systemctl", "restart", "postgresql-"+ver) {
|
||||
return true
|
||||
}
|
||||
}
|
||||
|
||||
// Method 3: service command (older systems)
|
||||
if runWithTimeout("sudo", "-n", "service", "postgresql", "restart") {
|
||||
return true
|
||||
}
|
||||
|
||||
// Method 4: pg_ctl as postgres user (if we ARE postgres user, no sudo needed)
|
||||
if runWithTimeout("pg_ctl", "restart", "-D", "/var/lib/postgresql/data", "-m", "fast") {
|
||||
return true
|
||||
}
|
||||
|
||||
// Method 5: Try common PGDATA paths with pg_ctl directly (for postgres user)
|
||||
pgdataPaths := []string{
|
||||
"/var/lib/pgsql/data",
|
||||
"/var/lib/pgsql/17/data",
|
||||
"/var/lib/pgsql/16/data",
|
||||
"/var/lib/pgsql/15/data",
|
||||
"/var/lib/postgresql/17/main",
|
||||
"/var/lib/postgresql/16/main",
|
||||
"/var/lib/postgresql/15/main",
|
||||
}
|
||||
for _, pgdata := range pgdataPaths {
|
||||
if runWithTimeout("pg_ctl", "restart", "-D", pgdata, "-m", "fast") {
|
||||
return true
|
||||
}
|
||||
}
|
||||
|
||||
return false
|
||||
}
|
||||
|
||||
// resetPostgreSQLSettings restores original PostgreSQL settings
|
||||
// NOTE: max_locks_per_transaction changes are written but require restart to take effect.
|
||||
// We don't restart here since we're done with the restore.
|
||||
func (e *Engine) resetPostgreSQLSettings(ctx context.Context, original *OriginalSettings) error {
|
||||
connStr := e.buildConnString()
|
||||
db, err := sql.Open("pgx", connStr)
|
||||
@ -1812,25 +2781,28 @@ func (e *Engine) resetPostgreSQLSettings(ctx context.Context, original *Original
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
// Reset max_locks_per_transaction
|
||||
// Reset max_locks_per_transaction (will take effect on next restart)
|
||||
if original.MaxLocks == 64 { // Default
|
||||
db.ExecContext(ctx, "ALTER SYSTEM RESET max_locks_per_transaction")
|
||||
} else if original.MaxLocks > 0 {
|
||||
db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET max_locks_per_transaction = %d", original.MaxLocks))
|
||||
}
|
||||
|
||||
// Reset maintenance_work_mem
|
||||
// Reset maintenance_work_mem (takes effect immediately with reload)
|
||||
if original.MaintenanceWorkMem == "64MB" { // Default
|
||||
db.ExecContext(ctx, "ALTER SYSTEM RESET maintenance_work_mem")
|
||||
} else if original.MaintenanceWorkMem != "" {
|
||||
db.ExecContext(ctx, fmt.Sprintf("ALTER SYSTEM SET maintenance_work_mem = '%s'", original.MaintenanceWorkMem))
|
||||
}
|
||||
|
||||
// Reload config
|
||||
// Reload config (only maintenance_work_mem will take effect immediately)
|
||||
_, err = db.ExecContext(ctx, "SELECT pg_reload_conf()")
|
||||
if err != nil {
|
||||
return fmt.Errorf("failed to reload config: %w", err)
|
||||
}
|
||||
|
||||
e.log.Info("PostgreSQL settings reset queued",
|
||||
"note", "max_locks_per_transaction will revert on next PostgreSQL restart")
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
344
internal/restore/extract.go
Normal file
344
internal/restore/extract.go
Normal file
@ -0,0 +1,344 @@
|
||||
package restore
|
||||
|
||||
import (
|
||||
"archive/tar"
|
||||
"compress/gzip"
|
||||
"context"
|
||||
"fmt"
|
||||
"io"
|
||||
"os"
|
||||
"path/filepath"
|
||||
"sort"
|
||||
"strings"
|
||||
|
||||
"dbbackup/internal/logger"
|
||||
"dbbackup/internal/progress"
|
||||
)
|
||||
|
||||
// DatabaseInfo represents metadata about a database in a cluster backup
|
||||
type DatabaseInfo struct {
|
||||
Name string
|
||||
Filename string
|
||||
Size int64
|
||||
}
|
||||
|
||||
// ListDatabasesInCluster lists all databases in a cluster backup archive
|
||||
func ListDatabasesInCluster(ctx context.Context, archivePath string, log logger.Logger) ([]DatabaseInfo, error) {
|
||||
file, err := os.Open(archivePath)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot open archive: %w", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
gz, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("not a valid gzip archive: %w", err)
|
||||
}
|
||||
defer gz.Close()
|
||||
|
||||
tarReader := tar.NewReader(gz)
|
||||
databases := make([]DatabaseInfo, 0)
|
||||
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return nil, ctx.Err()
|
||||
default:
|
||||
}
|
||||
|
||||
header, err := tarReader.Next()
|
||||
if err == io.EOF {
|
||||
break
|
||||
}
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("error reading tar archive: %w", err)
|
||||
}
|
||||
|
||||
// Look for files in dumps/ directory
|
||||
if !header.FileInfo().IsDir() && strings.HasPrefix(header.Name, "dumps/") {
|
||||
filename := filepath.Base(header.Name)
|
||||
|
||||
// Extract database name from filename (remove .dump, .dump.gz, .sql, .sql.gz)
|
||||
dbName := filename
|
||||
dbName = strings.TrimSuffix(dbName, ".dump.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".dump")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql")
|
||||
|
||||
databases = append(databases, DatabaseInfo{
|
||||
Name: dbName,
|
||||
Filename: filename,
|
||||
Size: header.Size,
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
// Sort by name for consistent output
|
||||
sort.Slice(databases, func(i, j int) bool {
|
||||
return databases[i].Name < databases[j].Name
|
||||
})
|
||||
|
||||
if len(databases) == 0 {
|
||||
return nil, fmt.Errorf("no databases found in cluster backup")
|
||||
}
|
||||
|
||||
return databases, nil
|
||||
}
|
||||
|
||||
// ExtractDatabaseFromCluster extracts a single database dump from cluster backup
|
||||
func ExtractDatabaseFromCluster(ctx context.Context, archivePath, dbName, outputDir string, log logger.Logger, prog progress.Indicator) (string, error) {
|
||||
file, err := os.Open(archivePath)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("cannot open archive: %w", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
stat, err := file.Stat()
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("cannot stat archive: %w", err)
|
||||
}
|
||||
archiveSize := stat.Size()
|
||||
|
||||
gz, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("not a valid gzip archive: %w", err)
|
||||
}
|
||||
defer gz.Close()
|
||||
|
||||
tarReader := tar.NewReader(gz)
|
||||
|
||||
// Create output directory if needed
|
||||
if err := os.MkdirAll(outputDir, 0755); err != nil {
|
||||
return "", fmt.Errorf("cannot create output directory: %w", err)
|
||||
}
|
||||
|
||||
targetPattern := fmt.Sprintf("dumps/%s.", dbName) // Match dbName.dump, dbName.sql, etc.
|
||||
var extractedPath string
|
||||
found := false
|
||||
|
||||
if prog != nil {
|
||||
prog.Start(fmt.Sprintf("Extracting database: %s", dbName))
|
||||
defer prog.Stop()
|
||||
}
|
||||
|
||||
var bytesRead int64
|
||||
ticker := make(chan struct{})
|
||||
stopTicker := make(chan struct{})
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return
|
||||
case <-stopTicker:
|
||||
return
|
||||
case <-ticker:
|
||||
if prog != nil && archiveSize > 0 {
|
||||
percentage := float64(bytesRead) / float64(archiveSize) * 100
|
||||
prog.Update(fmt.Sprintf("Scanning: %.1f%%", percentage))
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
close(stopTicker)
|
||||
return "", ctx.Err()
|
||||
default:
|
||||
}
|
||||
|
||||
header, err := tarReader.Next()
|
||||
if err == io.EOF {
|
||||
break
|
||||
}
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return "", fmt.Errorf("error reading tar archive: %w", err)
|
||||
}
|
||||
|
||||
bytesRead += header.Size
|
||||
select {
|
||||
case ticker <- struct{}{}:
|
||||
default:
|
||||
}
|
||||
|
||||
// Check if this is the database we're looking for
|
||||
if strings.HasPrefix(header.Name, targetPattern) && !header.FileInfo().IsDir() {
|
||||
filename := filepath.Base(header.Name)
|
||||
extractedPath = filepath.Join(outputDir, filename)
|
||||
|
||||
// Extract the file
|
||||
outFile, err := os.Create(extractedPath)
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return "", fmt.Errorf("cannot create output file: %w", err)
|
||||
}
|
||||
|
||||
if prog != nil {
|
||||
prog.Update(fmt.Sprintf("Extracting: %s", filename))
|
||||
}
|
||||
|
||||
written, err := io.Copy(outFile, tarReader)
|
||||
outFile.Close()
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return "", fmt.Errorf("extraction failed: %w", err)
|
||||
}
|
||||
|
||||
log.Info("Database extracted successfully", "database", dbName, "size", formatBytes(written), "path", extractedPath)
|
||||
found = true
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
close(stopTicker)
|
||||
|
||||
if !found {
|
||||
return "", fmt.Errorf("database '%s' not found in cluster backup", dbName)
|
||||
}
|
||||
|
||||
return extractedPath, nil
|
||||
}
|
||||
|
||||
// ExtractMultipleDatabasesFromCluster extracts multiple databases from cluster backup
|
||||
func ExtractMultipleDatabasesFromCluster(ctx context.Context, archivePath string, dbNames []string, outputDir string, log logger.Logger, prog progress.Indicator) (map[string]string, error) {
|
||||
file, err := os.Open(archivePath)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot open archive: %w", err)
|
||||
}
|
||||
defer file.Close()
|
||||
|
||||
stat, err := file.Stat()
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("cannot stat archive: %w", err)
|
||||
}
|
||||
archiveSize := stat.Size()
|
||||
|
||||
gz, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("not a valid gzip archive: %w", err)
|
||||
}
|
||||
defer gz.Close()
|
||||
|
||||
tarReader := tar.NewReader(gz)
|
||||
|
||||
// Create output directory if needed
|
||||
if err := os.MkdirAll(outputDir, 0755); err != nil {
|
||||
return nil, fmt.Errorf("cannot create output directory: %w", err)
|
||||
}
|
||||
|
||||
// Build lookup map
|
||||
targetDBs := make(map[string]bool)
|
||||
for _, dbName := range dbNames {
|
||||
targetDBs[dbName] = true
|
||||
}
|
||||
|
||||
extractedPaths := make(map[string]string)
|
||||
|
||||
if prog != nil {
|
||||
prog.Start(fmt.Sprintf("Extracting %d databases", len(dbNames)))
|
||||
defer prog.Stop()
|
||||
}
|
||||
|
||||
var bytesRead int64
|
||||
ticker := make(chan struct{})
|
||||
stopTicker := make(chan struct{})
|
||||
go func() {
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return
|
||||
case <-stopTicker:
|
||||
return
|
||||
case <-ticker:
|
||||
if prog != nil && archiveSize > 0 {
|
||||
percentage := float64(bytesRead) / float64(archiveSize) * 100
|
||||
prog.Update(fmt.Sprintf("Scanning: %.1f%% (%d/%d found)", percentage, len(extractedPaths), len(dbNames)))
|
||||
}
|
||||
}
|
||||
}
|
||||
}()
|
||||
|
||||
for {
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
close(stopTicker)
|
||||
return nil, ctx.Err()
|
||||
default:
|
||||
}
|
||||
|
||||
header, err := tarReader.Next()
|
||||
if err == io.EOF {
|
||||
break
|
||||
}
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return nil, fmt.Errorf("error reading tar archive: %w", err)
|
||||
}
|
||||
|
||||
bytesRead += header.Size
|
||||
select {
|
||||
case ticker <- struct{}{}:
|
||||
default:
|
||||
}
|
||||
|
||||
// Check if this is one of the databases we're looking for
|
||||
if strings.HasPrefix(header.Name, "dumps/") && !header.FileInfo().IsDir() {
|
||||
filename := filepath.Base(header.Name)
|
||||
|
||||
// Extract database name
|
||||
dbName := filename
|
||||
dbName = strings.TrimSuffix(dbName, ".dump.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".dump")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql.gz")
|
||||
dbName = strings.TrimSuffix(dbName, ".sql")
|
||||
|
||||
if targetDBs[dbName] {
|
||||
extractedPath := filepath.Join(outputDir, filename)
|
||||
|
||||
// Extract the file
|
||||
outFile, err := os.Create(extractedPath)
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return nil, fmt.Errorf("cannot create output file for %s: %w", dbName, err)
|
||||
}
|
||||
|
||||
if prog != nil {
|
||||
prog.Update(fmt.Sprintf("Extracting: %s (%d/%d)", dbName, len(extractedPaths)+1, len(dbNames)))
|
||||
}
|
||||
|
||||
written, err := io.Copy(outFile, tarReader)
|
||||
outFile.Close()
|
||||
if err != nil {
|
||||
close(stopTicker)
|
||||
return nil, fmt.Errorf("extraction failed for %s: %w", dbName, err)
|
||||
}
|
||||
|
||||
log.Info("Database extracted", "database", dbName, "size", formatBytes(written))
|
||||
extractedPaths[dbName] = extractedPath
|
||||
|
||||
// Stop early if we found all databases
|
||||
if len(extractedPaths) == len(dbNames) {
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
close(stopTicker)
|
||||
|
||||
// Check if all requested databases were found
|
||||
missing := make([]string, 0)
|
||||
for _, dbName := range dbNames {
|
||||
if _, found := extractedPaths[dbName]; !found {
|
||||
missing = append(missing, dbName)
|
||||
}
|
||||
}
|
||||
|
||||
if len(missing) > 0 {
|
||||
return extractedPaths, fmt.Errorf("databases not found in cluster backup: %s", strings.Join(missing, ", "))
|
||||
}
|
||||
|
||||
return extractedPaths, nil
|
||||
}
|
||||
363
internal/restore/large_db_guard.go
Normal file
363
internal/restore/large_db_guard.go
Normal file
@ -0,0 +1,363 @@
|
||||
package restore
|
||||
|
||||
import (
|
||||
"context"
|
||||
"database/sql"
|
||||
"fmt"
|
||||
"os"
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/logger"
|
||||
)
|
||||
|
||||
// LargeDBGuard provides bulletproof protection for large database restores
|
||||
type LargeDBGuard struct {
|
||||
log logger.Logger
|
||||
cfg *config.Config
|
||||
}
|
||||
|
||||
// RestoreStrategy determines how to restore based on database characteristics
|
||||
type RestoreStrategy struct {
|
||||
UseConservative bool // Force conservative (single-threaded) mode
|
||||
Reason string // Why this strategy was chosen
|
||||
Jobs int // Recommended --jobs value
|
||||
ParallelDBs int // Recommended parallel database restores
|
||||
ExpectedTime string // Estimated restore time
|
||||
}
|
||||
|
||||
// NewLargeDBGuard creates a new guard
|
||||
func NewLargeDBGuard(cfg *config.Config, log logger.Logger) *LargeDBGuard {
|
||||
return &LargeDBGuard{
|
||||
cfg: cfg,
|
||||
log: log,
|
||||
}
|
||||
}
|
||||
|
||||
// DetermineStrategy analyzes the restore and determines the safest approach
|
||||
func (g *LargeDBGuard) DetermineStrategy(ctx context.Context, archivePath string, dumpFiles []string) *RestoreStrategy {
|
||||
strategy := &RestoreStrategy{
|
||||
UseConservative: false,
|
||||
Jobs: 0, // Will use profile default
|
||||
ParallelDBs: 0, // Will use profile default
|
||||
}
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Large DB Guard: Starting strategy analysis",
|
||||
"archive", archivePath,
|
||||
"dump_count", len(dumpFiles))
|
||||
}
|
||||
|
||||
// 1. Check for large objects (BLOBs)
|
||||
hasLargeObjects, blobCount := g.detectLargeObjects(ctx, dumpFiles)
|
||||
if hasLargeObjects {
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("Database contains %d large objects (BLOBs)", blobCount)
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
|
||||
if blobCount > 10000 {
|
||||
strategy.ExpectedTime = "8-12 hours for very large BLOB database"
|
||||
} else if blobCount > 1000 {
|
||||
strategy.ExpectedTime = "4-8 hours for large BLOB database"
|
||||
} else {
|
||||
strategy.ExpectedTime = "2-4 hours"
|
||||
}
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: Forcing conservative mode",
|
||||
"blob_count", blobCount,
|
||||
"reason", strategy.Reason)
|
||||
return strategy
|
||||
}
|
||||
|
||||
// 2. Check total database size
|
||||
totalSize := g.estimateTotalSize(dumpFiles)
|
||||
if totalSize > 50*1024*1024*1024 { // > 50GB
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("Total database size: %s (>50GB)", FormatBytes(totalSize))
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
strategy.ExpectedTime = "6-10 hours for very large database"
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: Forcing conservative mode",
|
||||
"total_size_gb", totalSize/(1024*1024*1024),
|
||||
"reason", strategy.Reason)
|
||||
return strategy
|
||||
}
|
||||
|
||||
// 3. Check PostgreSQL lock configuration
|
||||
// CRITICAL: ALWAYS force conservative mode unless locks are 4096+
|
||||
// Parallel restore exhausts locks even with 2048 and high connection count
|
||||
// This is the PRIMARY protection - lock exhaustion is the #1 failure mode
|
||||
maxLocks, maxConns := g.checkLockConfiguration(ctx)
|
||||
lockCapacity := maxLocks * maxConns
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] PostgreSQL lock configuration detected",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"calculated_capacity", lockCapacity,
|
||||
"threshold_required", 4096,
|
||||
"below_threshold", maxLocks < 4096)
|
||||
}
|
||||
|
||||
if maxLocks < 4096 {
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("PostgreSQL max_locks_per_transaction=%d (need 4096+ for parallel restore)", maxLocks)
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: FORCING conservative mode - lock protection",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"total_capacity", lockCapacity,
|
||||
"required_locks", 4096,
|
||||
"reason", strategy.Reason)
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Guard decision: CONSERVATIVE mode",
|
||||
"jobs", 1,
|
||||
"parallel_dbs", 1,
|
||||
"reason", "Lock threshold not met (max_locks < 4096)")
|
||||
}
|
||||
return strategy
|
||||
}
|
||||
|
||||
g.log.Info("✅ Large DB Guard: Lock configuration OK for parallel restore",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"total_capacity", lockCapacity)
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Lock check PASSED - parallel restore allowed",
|
||||
"max_locks", maxLocks,
|
||||
"threshold", 4096,
|
||||
"verdict", "PASS")
|
||||
}
|
||||
|
||||
// 4. Check individual dump file sizes
|
||||
largestDump := g.findLargestDump(dumpFiles)
|
||||
if largestDump.size > 10*1024*1024*1024 { // > 10GB single dump
|
||||
strategy.UseConservative = true
|
||||
strategy.Reason = fmt.Sprintf("Largest database: %s (%s)", largestDump.name, FormatBytes(largestDump.size))
|
||||
strategy.Jobs = 1
|
||||
strategy.ParallelDBs = 1
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard: Forcing conservative mode",
|
||||
"largest_db", largestDump.name,
|
||||
"size_gb", largestDump.size/(1024*1024*1024),
|
||||
"reason", strategy.Reason)
|
||||
return strategy
|
||||
}
|
||||
|
||||
// All checks passed - safe to use default profile
|
||||
strategy.Reason = "No large database risks detected"
|
||||
g.log.Info("✅ Large DB Guard: Safe to use default profile")
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Final strategy: Default profile (no restrictions)",
|
||||
"use_conservative", false,
|
||||
"reason", strategy.Reason)
|
||||
}
|
||||
|
||||
return strategy
|
||||
}
|
||||
|
||||
// detectLargeObjects checks dump files for BLOBs/large objects
|
||||
func (g *LargeDBGuard) detectLargeObjects(ctx context.Context, dumpFiles []string) (bool, int) {
|
||||
totalBlobCount := 0
|
||||
|
||||
for _, dumpFile := range dumpFiles {
|
||||
// Skip if not a custom format dump
|
||||
if !strings.HasSuffix(dumpFile, ".dump") {
|
||||
continue
|
||||
}
|
||||
|
||||
// Use pg_restore -l to list contents (fast)
|
||||
listCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
|
||||
cmd := exec.CommandContext(listCtx, "pg_restore", "-l", dumpFile)
|
||||
output, err := cmd.Output()
|
||||
cancel()
|
||||
|
||||
if err != nil {
|
||||
continue // Skip on error
|
||||
}
|
||||
|
||||
// Count BLOB entries
|
||||
for _, line := range strings.Split(string(output), "\n") {
|
||||
if strings.Contains(line, "BLOB") ||
|
||||
strings.Contains(line, "LARGE OBJECT") ||
|
||||
strings.Contains(line, " BLOBS ") {
|
||||
totalBlobCount++
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return totalBlobCount > 0, totalBlobCount
|
||||
}
|
||||
|
||||
// estimateTotalSize calculates total size of all dump files
|
||||
func (g *LargeDBGuard) estimateTotalSize(dumpFiles []string) int64 {
|
||||
var total int64
|
||||
for _, file := range dumpFiles {
|
||||
if info, err := os.Stat(file); err == nil {
|
||||
total += info.Size()
|
||||
}
|
||||
}
|
||||
return total
|
||||
}
|
||||
|
||||
// checkLockCapacity gets PostgreSQL lock table capacity
|
||||
func (g *LargeDBGuard) checkLockCapacity(ctx context.Context) int {
|
||||
maxLocks, maxConns := g.checkLockConfiguration(ctx)
|
||||
maxPrepared := 0 // We don't use prepared transactions in restore
|
||||
|
||||
// Calculate total lock capacity
|
||||
capacity := maxLocks * (maxConns + maxPrepared)
|
||||
return capacity
|
||||
}
|
||||
|
||||
// checkLockConfiguration returns max_locks_per_transaction and max_connections
|
||||
func (g *LargeDBGuard) checkLockConfiguration(ctx context.Context) (int, int) {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Querying PostgreSQL for lock configuration",
|
||||
"host", g.cfg.Host,
|
||||
"port", g.cfg.Port,
|
||||
"user", g.cfg.User)
|
||||
}
|
||||
|
||||
// Build connection string
|
||||
connStr := fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=postgres sslmode=disable",
|
||||
g.cfg.Host, g.cfg.Port, g.cfg.User, g.cfg.Password)
|
||||
|
||||
db, err := sql.Open("pgx", connStr)
|
||||
if err != nil {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Warn("🔍 [LOCK-DEBUG] Failed to connect to PostgreSQL, using defaults",
|
||||
"error", err,
|
||||
"default_max_locks", 64,
|
||||
"default_max_connections", 100)
|
||||
}
|
||||
return 64, 100 // PostgreSQL defaults
|
||||
}
|
||||
defer db.Close()
|
||||
|
||||
var maxLocks, maxConns int
|
||||
|
||||
// Get max_locks_per_transaction
|
||||
err = db.QueryRowContext(ctx, "SHOW max_locks_per_transaction").Scan(&maxLocks)
|
||||
if err != nil {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Warn("🔍 [LOCK-DEBUG] Failed to query max_locks_per_transaction",
|
||||
"error", err,
|
||||
"using_default", 64)
|
||||
}
|
||||
maxLocks = 64 // PostgreSQL default
|
||||
}
|
||||
|
||||
// Get max_connections
|
||||
err = db.QueryRowContext(ctx, "SHOW max_connections").Scan(&maxConns)
|
||||
if err != nil {
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Warn("🔍 [LOCK-DEBUG] Failed to query max_connections",
|
||||
"error", err,
|
||||
"using_default", 100)
|
||||
}
|
||||
maxConns = 100 // PostgreSQL default
|
||||
}
|
||||
|
||||
if g.cfg.DebugLocks {
|
||||
g.log.Info("🔍 [LOCK-DEBUG] Successfully retrieved PostgreSQL lock settings",
|
||||
"max_locks_per_transaction", maxLocks,
|
||||
"max_connections", maxConns,
|
||||
"total_capacity", maxLocks*maxConns)
|
||||
}
|
||||
|
||||
return maxLocks, maxConns
|
||||
}
|
||||
|
||||
// findLargestDump finds the largest individual dump file
|
||||
func (g *LargeDBGuard) findLargestDump(dumpFiles []string) struct {
|
||||
name string
|
||||
size int64
|
||||
} {
|
||||
var largest struct {
|
||||
name string
|
||||
size int64
|
||||
}
|
||||
|
||||
for _, file := range dumpFiles {
|
||||
if info, err := os.Stat(file); err == nil {
|
||||
if info.Size() > largest.size {
|
||||
largest.name = filepath.Base(file)
|
||||
largest.size = info.Size()
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return largest
|
||||
}
|
||||
|
||||
// ApplyStrategy enforces the recommended strategy
|
||||
func (g *LargeDBGuard) ApplyStrategy(strategy *RestoreStrategy, cfg *config.Config) {
|
||||
if !strategy.UseConservative {
|
||||
return
|
||||
}
|
||||
|
||||
// Override configuration to force conservative settings
|
||||
if strategy.Jobs > 0 {
|
||||
cfg.Jobs = strategy.Jobs
|
||||
}
|
||||
if strategy.ParallelDBs > 0 {
|
||||
cfg.ClusterParallelism = strategy.ParallelDBs
|
||||
}
|
||||
|
||||
g.log.Warn("🛡️ Large DB Guard ACTIVE",
|
||||
"reason", strategy.Reason,
|
||||
"jobs", cfg.Jobs,
|
||||
"parallel_dbs", cfg.ClusterParallelism,
|
||||
"expected_time", strategy.ExpectedTime)
|
||||
}
|
||||
|
||||
// WarnUser displays prominent warning about single-threaded restore
|
||||
// In silent mode (TUI), this is skipped to prevent scrambled output
|
||||
func (g *LargeDBGuard) WarnUser(strategy *RestoreStrategy, silentMode bool) {
|
||||
if !strategy.UseConservative {
|
||||
return
|
||||
}
|
||||
|
||||
// In TUI/silent mode, don't print to stdout - it causes scrambled output
|
||||
if silentMode {
|
||||
// Log the warning instead for debugging
|
||||
g.log.Info("Large Database Protection Active",
|
||||
"reason", strategy.Reason,
|
||||
"jobs", strategy.Jobs,
|
||||
"parallel_dbs", strategy.ParallelDBs,
|
||||
"expected_time", strategy.ExpectedTime)
|
||||
return
|
||||
}
|
||||
|
||||
fmt.Println()
|
||||
fmt.Println("╔══════════════════════════════════════════════════════════════╗")
|
||||
fmt.Println("║ 🛡️ LARGE DATABASE PROTECTION ACTIVE 🛡️ ║")
|
||||
fmt.Println("╚══════════════════════════════════════════════════════════════╝")
|
||||
fmt.Println()
|
||||
fmt.Printf(" Reason: %s\n", strategy.Reason)
|
||||
fmt.Println()
|
||||
fmt.Println(" Strategy: SINGLE-THREADED RESTORE (Conservative Mode)")
|
||||
fmt.Println(" • Prevents PostgreSQL lock exhaustion")
|
||||
fmt.Println(" • Guarantees completion without 'out of shared memory' errors")
|
||||
fmt.Println(" • Slower but 100% reliable")
|
||||
fmt.Println()
|
||||
if strategy.ExpectedTime != "" {
|
||||
fmt.Printf(" Estimated Time: %s\n", strategy.ExpectedTime)
|
||||
fmt.Println()
|
||||
}
|
||||
fmt.Println(" This restore will complete successfully. Please be patient.")
|
||||
fmt.Println()
|
||||
fmt.Println("═══════════════════════════════════════════════════════════════")
|
||||
fmt.Println()
|
||||
}
|
||||
@ -16,6 +16,57 @@ import (
|
||||
"github.com/shirou/gopsutil/v3/mem"
|
||||
)
|
||||
|
||||
// CalculateOptimalParallel returns the recommended number of parallel workers
|
||||
// based on available system resources (CPU cores and RAM).
|
||||
// This is a standalone function that can be called from anywhere.
|
||||
// Returns 0 if resources cannot be detected.
|
||||
func CalculateOptimalParallel() int {
|
||||
cpuCores := runtime.NumCPU()
|
||||
|
||||
vmem, err := mem.VirtualMemory()
|
||||
if err != nil {
|
||||
// Fallback: use half of CPU cores if memory detection fails
|
||||
if cpuCores > 1 {
|
||||
return cpuCores / 2
|
||||
}
|
||||
return 1
|
||||
}
|
||||
|
||||
memAvailableGB := float64(vmem.Available) / (1024 * 1024 * 1024)
|
||||
|
||||
// Each pg_restore worker needs approximately 2-4GB of RAM
|
||||
// Use conservative 3GB per worker to avoid OOM
|
||||
const memPerWorkerGB = 3.0
|
||||
|
||||
// Calculate limits
|
||||
maxByMem := int(memAvailableGB / memPerWorkerGB)
|
||||
maxByCPU := cpuCores
|
||||
|
||||
// Use the minimum of memory and CPU limits
|
||||
recommended := maxByMem
|
||||
if maxByCPU < recommended {
|
||||
recommended = maxByCPU
|
||||
}
|
||||
|
||||
// Apply sensible bounds
|
||||
if recommended < 1 {
|
||||
recommended = 1
|
||||
}
|
||||
if recommended > 16 {
|
||||
recommended = 16 // Cap at 16 to avoid diminishing returns
|
||||
}
|
||||
|
||||
// If memory pressure is high (>80%), reduce parallelism
|
||||
if vmem.UsedPercent > 80 && recommended > 1 {
|
||||
recommended = recommended / 2
|
||||
if recommended < 1 {
|
||||
recommended = 1
|
||||
}
|
||||
}
|
||||
|
||||
return recommended
|
||||
}
|
||||
|
||||
// PreflightResult contains all preflight check results
|
||||
type PreflightResult struct {
|
||||
// Linux system checks
|
||||
@ -35,25 +86,29 @@ type PreflightResult struct {
|
||||
|
||||
// LinuxChecks contains Linux kernel/system checks
|
||||
type LinuxChecks struct {
|
||||
ShmMax int64 // /proc/sys/kernel/shmmax
|
||||
ShmAll int64 // /proc/sys/kernel/shmall
|
||||
MemTotal uint64 // Total RAM in bytes
|
||||
MemAvailable uint64 // Available RAM in bytes
|
||||
MemUsedPercent float64 // Memory usage percentage
|
||||
ShmMaxOK bool // Is shmmax sufficient?
|
||||
ShmAllOK bool // Is shmall sufficient?
|
||||
MemAvailableOK bool // Is available RAM sufficient?
|
||||
IsLinux bool // Are we running on Linux?
|
||||
ShmMax int64 // /proc/sys/kernel/shmmax
|
||||
ShmAll int64 // /proc/sys/kernel/shmall
|
||||
MemTotal uint64 // Total RAM in bytes
|
||||
MemAvailable uint64 // Available RAM in bytes
|
||||
MemUsedPercent float64 // Memory usage percentage
|
||||
CPUCores int // Number of CPU cores
|
||||
RecommendedParallel int // Auto-calculated optimal parallel count
|
||||
ShmMaxOK bool // Is shmmax sufficient?
|
||||
ShmAllOK bool // Is shmall sufficient?
|
||||
MemAvailableOK bool // Is available RAM sufficient?
|
||||
IsLinux bool // Are we running on Linux?
|
||||
}
|
||||
|
||||
// PostgreSQLChecks contains PostgreSQL configuration checks
|
||||
type PostgreSQLChecks struct {
|
||||
MaxLocksPerTransaction int // Current setting
|
||||
MaintenanceWorkMem string // Current setting
|
||||
SharedBuffers string // Current setting (info only)
|
||||
MaxConnections int // Current setting
|
||||
Version string // PostgreSQL version
|
||||
IsSuperuser bool // Can we modify settings?
|
||||
MaxLocksPerTransaction int // Current setting
|
||||
MaxPreparedTransactions int // Current setting (affects lock capacity)
|
||||
TotalLockCapacity int // Calculated: max_locks × (max_connections + max_prepared)
|
||||
MaintenanceWorkMem string // Current setting
|
||||
SharedBuffers string // Current setting (info only)
|
||||
MaxConnections int // Current setting
|
||||
Version string // PostgreSQL version
|
||||
IsSuperuser bool // Can we modify settings?
|
||||
}
|
||||
|
||||
// ArchiveChecks contains analysis of the backup archive
|
||||
@ -98,6 +153,7 @@ func (e *Engine) RunPreflightChecks(ctx context.Context, dumpsDir string, entrie
|
||||
// checkSystemResources uses gopsutil for cross-platform system checks
|
||||
func (e *Engine) checkSystemResources(result *PreflightResult) {
|
||||
result.Linux.IsLinux = runtime.GOOS == "linux"
|
||||
result.Linux.CPUCores = runtime.NumCPU()
|
||||
|
||||
// Get memory info (works on Linux, macOS, Windows, BSD)
|
||||
if vmem, err := mem.VirtualMemory(); err == nil {
|
||||
@ -116,6 +172,9 @@ func (e *Engine) checkSystemResources(result *PreflightResult) {
|
||||
e.log.Warn("Could not detect system memory", "error", err)
|
||||
}
|
||||
|
||||
// Calculate recommended parallel based on resources
|
||||
result.Linux.RecommendedParallel = e.calculateRecommendedParallel(result)
|
||||
|
||||
// Linux-specific kernel checks (shmmax, shmall)
|
||||
if result.Linux.IsLinux {
|
||||
e.checkLinuxKernel(result)
|
||||
@ -201,10 +260,70 @@ func (e *Engine) checkPostgreSQL(ctx context.Context, result *PreflightResult) {
|
||||
result.PostgreSQL.IsSuperuser = isSuperuser
|
||||
}
|
||||
|
||||
// Add info/warnings
|
||||
// Check max_prepared_transactions for lock capacity calculation
|
||||
var maxPreparedTxns string
|
||||
if err := db.QueryRowContext(ctx, "SHOW max_prepared_transactions").Scan(&maxPreparedTxns); err == nil {
|
||||
result.PostgreSQL.MaxPreparedTransactions, _ = strconv.Atoi(maxPreparedTxns)
|
||||
}
|
||||
|
||||
// CRITICAL: Calculate TOTAL lock table capacity
|
||||
// Formula: max_locks_per_transaction × (max_connections + max_prepared_transactions)
|
||||
// This is THE key capacity metric for BLOB-heavy restores
|
||||
maxConns := result.PostgreSQL.MaxConnections
|
||||
if maxConns == 0 {
|
||||
maxConns = 100 // default
|
||||
}
|
||||
maxPrepared := result.PostgreSQL.MaxPreparedTransactions
|
||||
totalLockCapacity := result.PostgreSQL.MaxLocksPerTransaction * (maxConns + maxPrepared)
|
||||
result.PostgreSQL.TotalLockCapacity = totalLockCapacity
|
||||
|
||||
e.log.Info("PostgreSQL lock table capacity",
|
||||
"max_locks_per_transaction", result.PostgreSQL.MaxLocksPerTransaction,
|
||||
"max_connections", maxConns,
|
||||
"max_prepared_transactions", maxPrepared,
|
||||
"total_lock_capacity", totalLockCapacity)
|
||||
|
||||
// CRITICAL: max_locks_per_transaction requires PostgreSQL RESTART to change!
|
||||
// Warn users loudly about this - it's the #1 cause of "out of shared memory" errors
|
||||
if result.PostgreSQL.MaxLocksPerTransaction < 256 {
|
||||
e.log.Info("PostgreSQL max_locks_per_transaction is low - will auto-boost",
|
||||
"current", result.PostgreSQL.MaxLocksPerTransaction)
|
||||
e.log.Warn("PostgreSQL max_locks_per_transaction is LOW",
|
||||
"current", result.PostgreSQL.MaxLocksPerTransaction,
|
||||
"recommended", "256+",
|
||||
"note", "REQUIRES PostgreSQL restart to change!")
|
||||
|
||||
result.Warnings = append(result.Warnings,
|
||||
fmt.Sprintf("max_locks_per_transaction=%d is low (recommend 256+). "+
|
||||
"This setting requires PostgreSQL RESTART to change. "+
|
||||
"BLOB-heavy databases may fail with 'out of shared memory' error. "+
|
||||
"Fix: Edit postgresql.conf, set max_locks_per_transaction=2048, then restart PostgreSQL.",
|
||||
result.PostgreSQL.MaxLocksPerTransaction))
|
||||
}
|
||||
|
||||
// NEW: Check total lock capacity is sufficient for typical BLOB operations
|
||||
// Minimum recommended: 200,000 for moderate BLOB databases
|
||||
minRecommendedCapacity := 200000
|
||||
if totalLockCapacity < minRecommendedCapacity {
|
||||
recommendedMaxLocks := minRecommendedCapacity / (maxConns + maxPrepared)
|
||||
if recommendedMaxLocks < 4096 {
|
||||
recommendedMaxLocks = 4096
|
||||
}
|
||||
|
||||
e.log.Warn("Total lock table capacity is LOW for BLOB-heavy restores",
|
||||
"current_capacity", totalLockCapacity,
|
||||
"recommended", minRecommendedCapacity,
|
||||
"current_max_locks", result.PostgreSQL.MaxLocksPerTransaction,
|
||||
"current_max_connections", maxConns,
|
||||
"recommended_max_locks", recommendedMaxLocks,
|
||||
"note", "VMs with fewer connections need higher max_locks_per_transaction")
|
||||
|
||||
result.Warnings = append(result.Warnings,
|
||||
fmt.Sprintf("Total lock capacity=%d is low (recommend %d+). "+
|
||||
"Capacity = max_locks_per_transaction(%d) × max_connections(%d). "+
|
||||
"If you reduced VM size/connections, increase max_locks_per_transaction to %d. "+
|
||||
"Fix: ALTER SYSTEM SET max_locks_per_transaction = %d; then RESTART PostgreSQL.",
|
||||
totalLockCapacity, minRecommendedCapacity,
|
||||
result.PostgreSQL.MaxLocksPerTransaction, maxConns,
|
||||
recommendedMaxLocks, recommendedMaxLocks))
|
||||
}
|
||||
|
||||
// Parse shared_buffers and warn if very low
|
||||
@ -315,22 +434,128 @@ func (e *Engine) calculateRecommendations(result *PreflightResult) {
|
||||
if result.Archive.TotalBlobCount > 50000 {
|
||||
lockBoost = 16384
|
||||
}
|
||||
if result.Archive.TotalBlobCount > 100000 {
|
||||
lockBoost = 32768
|
||||
}
|
||||
if result.Archive.TotalBlobCount > 200000 {
|
||||
lockBoost = 65536
|
||||
}
|
||||
|
||||
// Cap at reasonable maximum
|
||||
if lockBoost > 16384 {
|
||||
lockBoost = 16384
|
||||
// For extreme cases, calculate actual requirement
|
||||
// Rule of thumb: ~1 lock per BLOB, divided by max_connections (default 100)
|
||||
// Add 50% safety margin
|
||||
maxConns := result.PostgreSQL.MaxConnections
|
||||
if maxConns == 0 {
|
||||
maxConns = 100 // default
|
||||
}
|
||||
calculatedLocks := (result.Archive.TotalBlobCount / maxConns) * 3 / 2 // 1.5x safety margin
|
||||
if calculatedLocks > lockBoost {
|
||||
lockBoost = calculatedLocks
|
||||
}
|
||||
|
||||
result.Archive.RecommendedLockBoost = lockBoost
|
||||
|
||||
// CRITICAL: Check if current max_locks_per_transaction is dangerously low for this BLOB count
|
||||
currentLocks := result.PostgreSQL.MaxLocksPerTransaction
|
||||
if currentLocks > 0 && result.Archive.TotalBlobCount > 0 {
|
||||
// Estimate max BLOBs we can handle: locks * max_connections
|
||||
maxSafeBLOBs := currentLocks * maxConns
|
||||
|
||||
if result.Archive.TotalBlobCount > maxSafeBLOBs {
|
||||
severity := "WARNING"
|
||||
if result.Archive.TotalBlobCount > maxSafeBLOBs*2 {
|
||||
severity = "CRITICAL"
|
||||
result.CanProceed = false
|
||||
}
|
||||
|
||||
e.log.Error(fmt.Sprintf("%s: max_locks_per_transaction too low for BLOB count", severity),
|
||||
"current_max_locks", currentLocks,
|
||||
"total_blobs", result.Archive.TotalBlobCount,
|
||||
"max_safe_blobs", maxSafeBLOBs,
|
||||
"recommended_max_locks", lockBoost)
|
||||
|
||||
result.Errors = append(result.Errors,
|
||||
fmt.Sprintf("%s: Archive contains %s BLOBs but max_locks_per_transaction=%d can only safely handle ~%s. "+
|
||||
"Increase max_locks_per_transaction to %d in postgresql.conf and RESTART PostgreSQL.",
|
||||
severity,
|
||||
humanize.Comma(int64(result.Archive.TotalBlobCount)),
|
||||
currentLocks,
|
||||
humanize.Comma(int64(maxSafeBLOBs)),
|
||||
lockBoost))
|
||||
}
|
||||
}
|
||||
|
||||
// Log recommendation
|
||||
e.log.Info("Calculated recommended lock boost",
|
||||
"total_blobs", result.Archive.TotalBlobCount,
|
||||
"recommended_locks", lockBoost)
|
||||
}
|
||||
|
||||
// calculateRecommendedParallel determines optimal parallelism based on system resources
|
||||
// Returns the recommended number of parallel workers for pg_restore
|
||||
func (e *Engine) calculateRecommendedParallel(result *PreflightResult) int {
|
||||
cpuCores := result.Linux.CPUCores
|
||||
if cpuCores == 0 {
|
||||
cpuCores = runtime.NumCPU()
|
||||
}
|
||||
|
||||
memAvailableGB := float64(result.Linux.MemAvailable) / (1024 * 1024 * 1024)
|
||||
|
||||
// Each pg_restore worker needs approximately 2-4GB of RAM
|
||||
// Use conservative 3GB per worker to avoid OOM
|
||||
const memPerWorkerGB = 3.0
|
||||
|
||||
// Calculate limits
|
||||
maxByMem := int(memAvailableGB / memPerWorkerGB)
|
||||
maxByCPU := cpuCores
|
||||
|
||||
// Use the minimum of memory and CPU limits
|
||||
recommended := maxByMem
|
||||
if maxByCPU < recommended {
|
||||
recommended = maxByCPU
|
||||
}
|
||||
|
||||
// Apply sensible bounds
|
||||
if recommended < 1 {
|
||||
recommended = 1
|
||||
}
|
||||
if recommended > 16 {
|
||||
recommended = 16 // Cap at 16 to avoid diminishing returns
|
||||
}
|
||||
|
||||
// If memory pressure is high (>80%), reduce parallelism
|
||||
if result.Linux.MemUsedPercent > 80 && recommended > 1 {
|
||||
recommended = recommended / 2
|
||||
if recommended < 1 {
|
||||
recommended = 1
|
||||
}
|
||||
}
|
||||
|
||||
e.log.Info("Calculated recommended parallel",
|
||||
"cpu_cores", cpuCores,
|
||||
"mem_available_gb", fmt.Sprintf("%.1f", memAvailableGB),
|
||||
"max_by_mem", maxByMem,
|
||||
"max_by_cpu", maxByCPU,
|
||||
"recommended", recommended)
|
||||
|
||||
return recommended
|
||||
}
|
||||
|
||||
// printPreflightSummary prints a nice summary of all checks
|
||||
// In silent mode (TUI), this is skipped and results are logged instead
|
||||
func (e *Engine) printPreflightSummary(result *PreflightResult) {
|
||||
// In TUI/silent mode, don't print to stdout - it causes scrambled output
|
||||
if e.silentMode {
|
||||
// Log summary instead for debugging
|
||||
e.log.Info("Preflight checks complete",
|
||||
"can_proceed", result.CanProceed,
|
||||
"warnings", len(result.Warnings),
|
||||
"errors", len(result.Errors),
|
||||
"total_blobs", result.Archive.TotalBlobCount,
|
||||
"recommended_locks", result.Archive.RecommendedLockBoost)
|
||||
return
|
||||
}
|
||||
|
||||
fmt.Println()
|
||||
fmt.Println(strings.Repeat("─", 60))
|
||||
fmt.Println(" PREFLIGHT CHECKS")
|
||||
@ -341,6 +566,8 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
|
||||
printCheck("Total RAM", humanize.Bytes(result.Linux.MemTotal), true)
|
||||
printCheck("Available RAM", humanize.Bytes(result.Linux.MemAvailable), result.Linux.MemAvailableOK || result.Linux.MemAvailable == 0)
|
||||
printCheck("Memory Usage", fmt.Sprintf("%.1f%%", result.Linux.MemUsedPercent), result.Linux.MemUsedPercent < 85)
|
||||
printCheck("CPU Cores", fmt.Sprintf("%d", result.Linux.CPUCores), true)
|
||||
printCheck("Recommended Parallel", fmt.Sprintf("%d (auto-calculated)", result.Linux.RecommendedParallel), true)
|
||||
|
||||
// Linux-specific kernel checks
|
||||
if result.Linux.IsLinux && result.Linux.ShmMax > 0 {
|
||||
@ -356,6 +583,13 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
|
||||
humanize.Comma(int64(result.PostgreSQL.MaxLocksPerTransaction)),
|
||||
humanize.Comma(int64(result.Archive.RecommendedLockBoost))),
|
||||
true)
|
||||
printCheck("max_connections", humanize.Comma(int64(result.PostgreSQL.MaxConnections)), true)
|
||||
// Show total lock capacity with warning if low
|
||||
totalCapacityOK := result.PostgreSQL.TotalLockCapacity >= 200000
|
||||
printCheck("Total Lock Capacity",
|
||||
fmt.Sprintf("%s (max_locks × max_conns)",
|
||||
humanize.Comma(int64(result.PostgreSQL.TotalLockCapacity))),
|
||||
totalCapacityOK)
|
||||
printCheck("maintenance_work_mem", fmt.Sprintf("%s → 2GB (auto-boost)",
|
||||
result.PostgreSQL.MaintenanceWorkMem), true)
|
||||
printInfo("shared_buffers", result.PostgreSQL.SharedBuffers)
|
||||
@ -377,6 +611,14 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
|
||||
}
|
||||
}
|
||||
|
||||
// Errors (blocking issues)
|
||||
if len(result.Errors) > 0 {
|
||||
fmt.Println("\n ✗ ERRORS (must fix before proceeding):")
|
||||
for _, e := range result.Errors {
|
||||
fmt.Printf(" • %s\n", e)
|
||||
}
|
||||
}
|
||||
|
||||
// Warnings
|
||||
if len(result.Warnings) > 0 {
|
||||
fmt.Println("\n ⚠ Warnings:")
|
||||
@ -385,6 +627,23 @@ func (e *Engine) printPreflightSummary(result *PreflightResult) {
|
||||
}
|
||||
}
|
||||
|
||||
// Final status
|
||||
fmt.Println()
|
||||
if !result.CanProceed {
|
||||
fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
|
||||
fmt.Println(" │ ✗ PREFLIGHT FAILED - Cannot proceed with restore │")
|
||||
fmt.Println(" │ Fix the errors above and try again. │")
|
||||
fmt.Println(" └─────────────────────────────────────────────────────────┘")
|
||||
} else if len(result.Warnings) > 0 {
|
||||
fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
|
||||
fmt.Println(" │ ⚠ PREFLIGHT PASSED WITH WARNINGS - Proceed with care │")
|
||||
fmt.Println(" └─────────────────────────────────────────────────────────┘")
|
||||
} else {
|
||||
fmt.Println(" ┌─────────────────────────────────────────────────────────┐")
|
||||
fmt.Println(" │ ✓ PREFLIGHT PASSED - Ready to restore │")
|
||||
fmt.Println(" └─────────────────────────────────────────────────────────┘")
|
||||
}
|
||||
|
||||
fmt.Println(strings.Repeat("─", 60))
|
||||
fmt.Println()
|
||||
}
|
||||
|
||||
@ -190,7 +190,7 @@ func (s *Safety) validateSQLScriptGz(path string) error {
|
||||
return fmt.Errorf("does not appear to contain SQL content")
|
||||
}
|
||||
|
||||
// validateTarGz validates tar.gz archive
|
||||
// validateTarGz validates tar.gz archive with fast stream-based checks
|
||||
func (s *Safety) validateTarGz(path string) error {
|
||||
file, err := os.Open(path)
|
||||
if err != nil {
|
||||
@ -205,11 +205,40 @@ func (s *Safety) validateTarGz(path string) error {
|
||||
return fmt.Errorf("cannot read file header")
|
||||
}
|
||||
|
||||
if buffer[0] == 0x1f && buffer[1] == 0x8b {
|
||||
return nil // Valid gzip header
|
||||
if buffer[0] != 0x1f || buffer[1] != 0x8b {
|
||||
return fmt.Errorf("not a valid gzip file")
|
||||
}
|
||||
|
||||
return fmt.Errorf("not a valid gzip file")
|
||||
// Quick tar structure validation (stream-based, no full extraction)
|
||||
// Reset to start and decompress first few KB to check tar header
|
||||
file.Seek(0, 0)
|
||||
gzReader, err := gzip.NewReader(file)
|
||||
if err != nil {
|
||||
return fmt.Errorf("gzip corruption detected: %w", err)
|
||||
}
|
||||
defer gzReader.Close()
|
||||
|
||||
// Read first tar header to verify it's a valid tar archive
|
||||
headerBuf := make([]byte, 512) // Tar header is 512 bytes
|
||||
n, err = gzReader.Read(headerBuf)
|
||||
if err != nil && err != io.EOF {
|
||||
return fmt.Errorf("failed to read tar header: %w", err)
|
||||
}
|
||||
if n < 512 {
|
||||
return fmt.Errorf("archive too small or corrupted")
|
||||
}
|
||||
|
||||
// Check tar magic ("ustar\0" at offset 257)
|
||||
if len(headerBuf) >= 263 {
|
||||
magic := string(headerBuf[257:262])
|
||||
if magic != "ustar" {
|
||||
s.log.Debug("No tar magic found, but may still be valid tar", "magic", magic)
|
||||
// Don't fail - some tar implementations don't use magic
|
||||
}
|
||||
}
|
||||
|
||||
s.log.Debug("Cluster archive validation passed (stream-based check)")
|
||||
return nil // Valid gzip + tar structure
|
||||
}
|
||||
|
||||
// containsSQLKeywords checks if content contains SQL keywords
|
||||
@ -228,6 +257,42 @@ func containsSQLKeywords(content string) bool {
|
||||
return false
|
||||
}
|
||||
|
||||
// ValidateAndExtractCluster performs validation and pre-extraction for cluster restore
|
||||
// Returns path to extracted directory (in temp location) to avoid double-extraction
|
||||
// Caller must clean up the returned directory with os.RemoveAll() when done
|
||||
func (s *Safety) ValidateAndExtractCluster(ctx context.Context, archivePath string) (extractedDir string, err error) {
|
||||
// First validate archive integrity (fast stream check)
|
||||
if err := s.ValidateArchive(archivePath); err != nil {
|
||||
return "", fmt.Errorf("archive validation failed: %w", err)
|
||||
}
|
||||
|
||||
// Create temp directory for extraction in configured WorkDir
|
||||
workDir := s.cfg.GetEffectiveWorkDir()
|
||||
if workDir == "" {
|
||||
workDir = s.cfg.BackupDir
|
||||
}
|
||||
|
||||
tempDir, err := os.MkdirTemp(workDir, "dbbackup-cluster-extract-*")
|
||||
if err != nil {
|
||||
return "", fmt.Errorf("failed to create temp extraction directory in %s: %w", workDir, err)
|
||||
}
|
||||
|
||||
// Extract using tar command (fastest method)
|
||||
s.log.Info("Pre-extracting cluster archive for validation and restore",
|
||||
"archive", archivePath,
|
||||
"dest", tempDir)
|
||||
|
||||
cmd := exec.CommandContext(ctx, "tar", "-xzf", archivePath, "-C", tempDir)
|
||||
output, err := cmd.CombinedOutput()
|
||||
if err != nil {
|
||||
os.RemoveAll(tempDir) // Cleanup on failure
|
||||
return "", fmt.Errorf("extraction failed: %w: %s", err, string(output))
|
||||
}
|
||||
|
||||
s.log.Info("Cluster archive extracted successfully", "location", tempDir)
|
||||
return tempDir, nil
|
||||
}
|
||||
|
||||
// CheckDiskSpace verifies sufficient disk space for restore
|
||||
// Uses the effective work directory (WorkDir if set, otherwise BackupDir) since
|
||||
// that's where extraction actually happens for large databases
|
||||
@ -334,10 +399,12 @@ func (s *Safety) checkPostgresDatabaseExists(ctx context.Context, dbName string)
|
||||
"-tAc", fmt.Sprintf("SELECT 1 FROM pg_database WHERE datname='%s'", dbName),
|
||||
}
|
||||
|
||||
// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
|
||||
if s.cfg.Host != "localhost" && s.cfg.Host != "127.0.0.1" && s.cfg.Host != "" {
|
||||
args = append([]string{"-h", s.cfg.Host}, args...)
|
||||
// Always add -h flag for explicit host connection (required for password auth)
|
||||
host := s.cfg.Host
|
||||
if host == "" {
|
||||
host = "localhost"
|
||||
}
|
||||
args = append([]string{"-h", host}, args...)
|
||||
|
||||
cmd := exec.CommandContext(ctx, "psql", args...)
|
||||
|
||||
@ -346,9 +413,9 @@ func (s *Safety) checkPostgresDatabaseExists(ctx context.Context, dbName string)
|
||||
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
|
||||
}
|
||||
|
||||
output, err := cmd.Output()
|
||||
output, err := cmd.CombinedOutput()
|
||||
if err != nil {
|
||||
return false, fmt.Errorf("failed to check database existence: %w", err)
|
||||
return false, fmt.Errorf("failed to check database existence: %w (output: %s)", err, strings.TrimSpace(string(output)))
|
||||
}
|
||||
|
||||
return strings.TrimSpace(string(output)) == "1", nil
|
||||
@ -405,21 +472,29 @@ func (s *Safety) listPostgresUserDatabases(ctx context.Context) ([]string, error
|
||||
"-c", query,
|
||||
}
|
||||
|
||||
// Only add -h flag if host is not localhost (to use Unix socket for peer auth)
|
||||
if s.cfg.Host != "localhost" && s.cfg.Host != "127.0.0.1" && s.cfg.Host != "" {
|
||||
args = append([]string{"-h", s.cfg.Host}, args...)
|
||||
// Always add -h flag for explicit host connection (required for password auth)
|
||||
// Empty or unset host defaults to localhost
|
||||
host := s.cfg.Host
|
||||
if host == "" {
|
||||
host = "localhost"
|
||||
}
|
||||
args = append([]string{"-h", host}, args...)
|
||||
|
||||
cmd := exec.CommandContext(ctx, "psql", args...)
|
||||
|
||||
// Set password if provided
|
||||
// Set password - check config first, then environment
|
||||
env := os.Environ()
|
||||
if s.cfg.Password != "" {
|
||||
cmd.Env = append(os.Environ(), fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
|
||||
env = append(env, fmt.Sprintf("PGPASSWORD=%s", s.cfg.Password))
|
||||
}
|
||||
cmd.Env = env
|
||||
|
||||
output, err := cmd.Output()
|
||||
s.log.Debug("Listing PostgreSQL databases", "host", host, "port", s.cfg.Port, "user", s.cfg.User)
|
||||
|
||||
output, err := cmd.CombinedOutput()
|
||||
if err != nil {
|
||||
return nil, fmt.Errorf("failed to list databases: %w", err)
|
||||
// Include psql output in error for debugging
|
||||
return nil, fmt.Errorf("failed to list databases: %w (output: %s)", err, strings.TrimSpace(string(output)))
|
||||
}
|
||||
|
||||
// Parse output
|
||||
@ -432,6 +507,8 @@ func (s *Safety) listPostgresUserDatabases(ctx context.Context) ([]string, error
|
||||
}
|
||||
}
|
||||
|
||||
s.log.Debug("Found user databases", "count", len(databases), "databases", databases, "raw_output", string(output))
|
||||
|
||||
return databases, nil
|
||||
}
|
||||
|
||||
|
||||
@ -214,8 +214,9 @@ func (m ArchiveBrowserModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
}
|
||||
|
||||
if m.mode == "restore-single" && selected.Format.IsClusterBackup() {
|
||||
m.message = errorStyle.Render("[FAIL] Please select a single database backup")
|
||||
return m, nil
|
||||
// Cluster backup selected in single restore mode - offer to select individual database
|
||||
clusterSelector := NewClusterDatabaseSelector(m.config, m.logger, m, m.ctx, selected, "single", false)
|
||||
return clusterSelector, clusterSelector.Init()
|
||||
}
|
||||
|
||||
// Open restore preview
|
||||
@ -223,6 +224,18 @@ func (m ArchiveBrowserModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
return preview, preview.Init()
|
||||
}
|
||||
|
||||
case "s":
|
||||
// Select single database from cluster (shortcut key)
|
||||
if len(m.archives) > 0 && m.cursor < len(m.archives) {
|
||||
selected := m.archives[m.cursor]
|
||||
if selected.Format.IsClusterBackup() {
|
||||
clusterSelector := NewClusterDatabaseSelector(m.config, m.logger, m, m.ctx, selected, "single", false)
|
||||
return clusterSelector, clusterSelector.Init()
|
||||
} else {
|
||||
m.message = infoStyle.Render("💡 [s] only works with cluster backups")
|
||||
}
|
||||
}
|
||||
|
||||
case "i":
|
||||
// Show detailed info
|
||||
if len(m.archives) > 0 && m.cursor < len(m.archives) {
|
||||
@ -351,7 +364,7 @@ func (m ArchiveBrowserModel) View() string {
|
||||
s.WriteString(infoStyle.Render(fmt.Sprintf("Total: %d archive(s) | Selected: %d/%d",
|
||||
len(m.archives), m.cursor+1, len(m.archives))))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render("[KEY] ↑/↓: Navigate | Enter: Select | d: Diagnose | f: Filter | i: Info | Esc: Back"))
|
||||
s.WriteString(infoStyle.Render("[KEY] ↑/↓: Navigate | Enter: Select | s: Single DB from Cluster | d: Diagnose | f: Filter | i: Info | Esc: Back"))
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
389
internal/tui/backup_exec.go
Executable file → Normal file
389
internal/tui/backup_exec.go
Executable file → Normal file
@ -4,6 +4,7 @@ import (
|
||||
"context"
|
||||
"fmt"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
tea "github.com/charmbracelet/bubbletea"
|
||||
@ -12,6 +13,14 @@ import (
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/database"
|
||||
"dbbackup/internal/logger"
|
||||
"path/filepath"
|
||||
)
|
||||
|
||||
// Backup phase constants for consistency
|
||||
const (
|
||||
backupPhaseGlobals = 1
|
||||
backupPhaseDatabases = 2
|
||||
backupPhaseCompressing = 3
|
||||
)
|
||||
|
||||
// BackupExecutionModel handles backup execution with progress
|
||||
@ -30,9 +39,81 @@ type BackupExecutionModel struct {
|
||||
cancelling bool // True when user has requested cancellation
|
||||
err error
|
||||
result string
|
||||
archivePath string // Path to created archive (for summary)
|
||||
archiveSize int64 // Size of created archive (for summary)
|
||||
startTime time.Time
|
||||
elapsed time.Duration // Final elapsed time
|
||||
details []string
|
||||
spinnerFrame int
|
||||
|
||||
// Database count progress (for cluster backup)
|
||||
dbTotal int
|
||||
dbDone int
|
||||
dbName string // Current database being backed up
|
||||
overallPhase int // 1=globals, 2=databases, 3=compressing
|
||||
phaseDesc string // Description of current phase
|
||||
phase2StartTime time.Time // When phase 2 (databases) started (for realtime ETA)
|
||||
dbPhaseElapsed time.Duration // Elapsed time since database backup phase started
|
||||
dbAvgPerDB time.Duration // Average time per database backup
|
||||
}
|
||||
|
||||
// sharedBackupProgressState holds progress state that can be safely accessed from callbacks
|
||||
type sharedBackupProgressState struct {
|
||||
mu sync.Mutex
|
||||
dbTotal int
|
||||
dbDone int
|
||||
dbName string
|
||||
overallPhase int // 1=globals, 2=databases, 3=compressing
|
||||
phaseDesc string // Description of current phase
|
||||
hasUpdate bool
|
||||
phase2StartTime time.Time // When phase 2 started (for realtime ETA calculation)
|
||||
dbPhaseElapsed time.Duration // Elapsed time since database backup phase started
|
||||
dbAvgPerDB time.Duration // Average time per database backup
|
||||
}
|
||||
|
||||
// Package-level shared progress state for backup operations
|
||||
var (
|
||||
currentBackupProgressMu sync.Mutex
|
||||
currentBackupProgressState *sharedBackupProgressState
|
||||
)
|
||||
|
||||
func setCurrentBackupProgress(state *sharedBackupProgressState) {
|
||||
currentBackupProgressMu.Lock()
|
||||
defer currentBackupProgressMu.Unlock()
|
||||
currentBackupProgressState = state
|
||||
}
|
||||
|
||||
func clearCurrentBackupProgress() {
|
||||
currentBackupProgressMu.Lock()
|
||||
defer currentBackupProgressMu.Unlock()
|
||||
currentBackupProgressState = nil
|
||||
}
|
||||
|
||||
func getCurrentBackupProgress() (dbTotal, dbDone int, dbName string, overallPhase int, phaseDesc string, hasUpdate bool, dbPhaseElapsed, dbAvgPerDB time.Duration, phase2StartTime time.Time) {
|
||||
currentBackupProgressMu.Lock()
|
||||
defer currentBackupProgressMu.Unlock()
|
||||
|
||||
if currentBackupProgressState == nil {
|
||||
return 0, 0, "", 0, "", false, 0, 0, time.Time{}
|
||||
}
|
||||
|
||||
currentBackupProgressState.mu.Lock()
|
||||
defer currentBackupProgressState.mu.Unlock()
|
||||
|
||||
hasUpdate = currentBackupProgressState.hasUpdate
|
||||
currentBackupProgressState.hasUpdate = false
|
||||
|
||||
// Calculate realtime phase elapsed if we have a phase 2 start time
|
||||
dbPhaseElapsed = currentBackupProgressState.dbPhaseElapsed
|
||||
if !currentBackupProgressState.phase2StartTime.IsZero() {
|
||||
dbPhaseElapsed = time.Since(currentBackupProgressState.phase2StartTime)
|
||||
}
|
||||
|
||||
return currentBackupProgressState.dbTotal, currentBackupProgressState.dbDone,
|
||||
currentBackupProgressState.dbName, currentBackupProgressState.overallPhase,
|
||||
currentBackupProgressState.phaseDesc, hasUpdate,
|
||||
dbPhaseElapsed, currentBackupProgressState.dbAvgPerDB,
|
||||
currentBackupProgressState.phase2StartTime
|
||||
}
|
||||
|
||||
func NewBackupExecution(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, backupType, dbName string, ratio int) BackupExecutionModel {
|
||||
@ -55,7 +136,6 @@ func NewBackupExecution(cfg *config.Config, log logger.Logger, parent tea.Model,
|
||||
}
|
||||
|
||||
func (m BackupExecutionModel) Init() tea.Cmd {
|
||||
// TUI handles all display through View() - no progress callbacks needed
|
||||
return tea.Batch(
|
||||
executeBackupWithTUIProgress(m.ctx, m.config, m.logger, m.backupType, m.databaseName, m.ratio),
|
||||
backupTickCmd(),
|
||||
@ -77,20 +157,26 @@ type backupProgressMsg struct {
|
||||
}
|
||||
|
||||
type backupCompleteMsg struct {
|
||||
result string
|
||||
err error
|
||||
result string
|
||||
err error
|
||||
archivePath string
|
||||
archiveSize int64
|
||||
elapsed time.Duration
|
||||
}
|
||||
|
||||
func executeBackupWithTUIProgress(parentCtx context.Context, cfg *config.Config, log logger.Logger, backupType, dbName string, ratio int) tea.Cmd {
|
||||
return func() tea.Msg {
|
||||
// NO TIMEOUT for backup operations - a backup takes as long as it takes
|
||||
// Large databases can take many hours
|
||||
// Only manual cancellation (Ctrl+C) should stop the backup
|
||||
ctx, cancel := context.WithCancel(parentCtx)
|
||||
defer cancel()
|
||||
// Use the parent context directly - it's already cancellable from the model
|
||||
// DO NOT create a new context here as it breaks Ctrl+C cancellation
|
||||
ctx := parentCtx
|
||||
|
||||
start := time.Now()
|
||||
|
||||
// Setup shared progress state for TUI polling
|
||||
progressState := &sharedBackupProgressState{}
|
||||
setCurrentBackupProgress(progressState)
|
||||
defer clearCurrentBackupProgress()
|
||||
|
||||
dbClient, err := database.New(cfg, log)
|
||||
if err != nil {
|
||||
return backupCompleteMsg{
|
||||
@ -110,6 +196,22 @@ func executeBackupWithTUIProgress(parentCtx context.Context, cfg *config.Config,
|
||||
// Pass nil as indicator - TUI itself handles all display, no stdout printing
|
||||
engine := backup.NewSilent(cfg, log, dbClient, nil)
|
||||
|
||||
// Set database progress callback for cluster backups
|
||||
engine.SetDatabaseProgressCallback(func(done, total int, currentDB string) {
|
||||
progressState.mu.Lock()
|
||||
progressState.dbDone = done
|
||||
progressState.dbTotal = total
|
||||
progressState.dbName = currentDB
|
||||
progressState.overallPhase = backupPhaseDatabases
|
||||
progressState.phaseDesc = fmt.Sprintf("Phase 2/3: Backing up Databases (%d/%d)", done, total)
|
||||
progressState.hasUpdate = true
|
||||
// Set phase 2 start time on first callback (for realtime ETA calculation)
|
||||
if progressState.phase2StartTime.IsZero() {
|
||||
progressState.phase2StartTime = time.Now()
|
||||
}
|
||||
progressState.mu.Unlock()
|
||||
})
|
||||
|
||||
var backupErr error
|
||||
switch backupType {
|
||||
case "single":
|
||||
@ -144,8 +246,9 @@ func executeBackupWithTUIProgress(parentCtx context.Context, cfg *config.Config,
|
||||
}
|
||||
|
||||
return backupCompleteMsg{
|
||||
result: result,
|
||||
err: nil,
|
||||
result: result,
|
||||
err: nil,
|
||||
elapsed: elapsed,
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -157,10 +260,25 @@ func (m BackupExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
// Increment spinner frame for smooth animation
|
||||
m.spinnerFrame = (m.spinnerFrame + 1) % len(spinnerFrames)
|
||||
|
||||
// Update status based on elapsed time to show progress
|
||||
// Poll for database progress updates from callbacks
|
||||
dbTotal, dbDone, dbName, overallPhase, phaseDesc, hasUpdate, dbPhaseElapsed, dbAvgPerDB, _ := getCurrentBackupProgress()
|
||||
if hasUpdate {
|
||||
m.dbTotal = dbTotal
|
||||
m.dbDone = dbDone
|
||||
m.dbName = dbName
|
||||
m.overallPhase = overallPhase
|
||||
m.phaseDesc = phaseDesc
|
||||
m.dbPhaseElapsed = dbPhaseElapsed
|
||||
m.dbAvgPerDB = dbAvgPerDB
|
||||
}
|
||||
|
||||
// Update status based on progress and elapsed time
|
||||
elapsedSec := int(time.Since(m.startTime).Seconds())
|
||||
|
||||
if elapsedSec < 2 {
|
||||
if m.dbTotal > 0 && m.dbDone > 0 {
|
||||
// We have real progress from cluster backup
|
||||
m.status = fmt.Sprintf("Backing up database: %s", m.dbName)
|
||||
} else if elapsedSec < 2 {
|
||||
m.status = "Initializing backup..."
|
||||
} else if elapsedSec < 5 {
|
||||
if m.backupType == "cluster" {
|
||||
@ -199,6 +317,7 @@ func (m BackupExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
m.done = true
|
||||
m.err = msg.err
|
||||
m.result = msg.result
|
||||
m.elapsed = msg.elapsed
|
||||
if m.err == nil {
|
||||
m.status = "[OK] Backup completed successfully!"
|
||||
} else {
|
||||
@ -210,6 +329,20 @@ func (m BackupExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
}
|
||||
return m, nil
|
||||
|
||||
case tea.InterruptMsg:
|
||||
// Handle Ctrl+C signal (SIGINT) - Bubbletea v1.3+ sends this instead of KeyMsg for ctrl+c
|
||||
if !m.done && !m.cancelling {
|
||||
m.cancelling = true
|
||||
m.status = "[STOP] Cancelling backup... (please wait)"
|
||||
if m.cancel != nil {
|
||||
m.cancel()
|
||||
}
|
||||
return m, nil
|
||||
} else if m.done {
|
||||
return m.parent, tea.Quit
|
||||
}
|
||||
return m, nil
|
||||
|
||||
case tea.KeyMsg:
|
||||
switch msg.String() {
|
||||
case "ctrl+c", "esc":
|
||||
@ -234,14 +367,80 @@ func (m BackupExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// renderDatabaseProgressBar renders a progress bar for database count progress
|
||||
func renderBackupDatabaseProgressBar(done, total int, dbName string, width int) string {
|
||||
if total == 0 {
|
||||
return ""
|
||||
}
|
||||
|
||||
// Calculate progress percentage
|
||||
percent := float64(done) / float64(total)
|
||||
if percent > 1.0 {
|
||||
percent = 1.0
|
||||
}
|
||||
|
||||
// Calculate filled width
|
||||
barWidth := width - 20 // Leave room for label and percentage
|
||||
if barWidth < 10 {
|
||||
barWidth = 10
|
||||
}
|
||||
filled := int(float64(barWidth) * percent)
|
||||
if filled > barWidth {
|
||||
filled = barWidth
|
||||
}
|
||||
|
||||
// Build progress bar
|
||||
bar := strings.Repeat("█", filled) + strings.Repeat("░", barWidth-filled)
|
||||
|
||||
return fmt.Sprintf(" Database: [%s] %d/%d", bar, done, total)
|
||||
}
|
||||
|
||||
// renderBackupDatabaseProgressBarWithTiming renders database backup progress with ETA
|
||||
func renderBackupDatabaseProgressBarWithTiming(done, total int, dbPhaseElapsed, dbAvgPerDB time.Duration) string {
|
||||
if total == 0 {
|
||||
return ""
|
||||
}
|
||||
|
||||
// Calculate progress percentage
|
||||
percent := float64(done) / float64(total)
|
||||
if percent > 1.0 {
|
||||
percent = 1.0
|
||||
}
|
||||
|
||||
// Build progress bar
|
||||
barWidth := 50
|
||||
filled := int(float64(barWidth) * percent)
|
||||
if filled > barWidth {
|
||||
filled = barWidth
|
||||
}
|
||||
bar := strings.Repeat("█", filled) + strings.Repeat("░", barWidth-filled)
|
||||
|
||||
// Calculate ETA similar to restore
|
||||
var etaStr string
|
||||
if done > 0 && done < total {
|
||||
avgPerDB := dbPhaseElapsed / time.Duration(done)
|
||||
remaining := total - done
|
||||
eta := avgPerDB * time.Duration(remaining)
|
||||
etaStr = fmt.Sprintf(" | ETA: %s", formatDuration(eta))
|
||||
} else if done == total {
|
||||
etaStr = " | Complete"
|
||||
}
|
||||
|
||||
return fmt.Sprintf(" Databases: [%s] %d/%d | Elapsed: %s%s\n",
|
||||
bar, done, total, formatDuration(dbPhaseElapsed), etaStr)
|
||||
}
|
||||
|
||||
func (m BackupExecutionModel) View() string {
|
||||
var s strings.Builder
|
||||
s.Grow(512) // Pre-allocate estimated capacity for better performance
|
||||
|
||||
// Clear screen with newlines and render header
|
||||
s.WriteString("\n\n")
|
||||
header := titleStyle.Render("[EXEC] Backup Execution")
|
||||
s.WriteString(header)
|
||||
header := "[EXEC] Backing up Database"
|
||||
if m.backupType == "cluster" {
|
||||
header = "[EXEC] Cluster Backup"
|
||||
}
|
||||
s.WriteString(titleStyle.Render(header))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Backup details - properly aligned
|
||||
@ -252,33 +451,159 @@ func (m BackupExecutionModel) View() string {
|
||||
if m.ratio > 0 {
|
||||
s.WriteString(fmt.Sprintf(" %-10s %d\n", "Sample:", m.ratio))
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" %-10s %s\n", "Duration:", time.Since(m.startTime).Round(time.Second)))
|
||||
s.WriteString("\n")
|
||||
|
||||
// Status with spinner
|
||||
// Status display
|
||||
if !m.done {
|
||||
if m.cancelling {
|
||||
s.WriteString(fmt.Sprintf(" %s %s\n", spinnerFrames[m.spinnerFrame], m.status))
|
||||
// Unified progress display for cluster backup
|
||||
if m.backupType == "cluster" {
|
||||
// Calculate overall progress across all phases
|
||||
// Phase 1: Globals (0-15%)
|
||||
// Phase 2: Databases (15-90%)
|
||||
// Phase 3: Compressing (90-100%)
|
||||
overallProgress := 0
|
||||
phaseLabel := "Starting..."
|
||||
|
||||
elapsedSec := int(time.Since(m.startTime).Seconds())
|
||||
|
||||
if m.overallPhase == backupPhaseDatabases && m.dbTotal > 0 {
|
||||
// Phase 2: Database backups - contributes 15-90%
|
||||
dbPct := int((int64(m.dbDone) * 100) / int64(m.dbTotal))
|
||||
overallProgress = 15 + (dbPct * 75 / 100)
|
||||
phaseLabel = m.phaseDesc
|
||||
} else if m.overallPhase == backupPhaseCompressing {
|
||||
// Phase 3: Compressing archive
|
||||
overallProgress = 92
|
||||
phaseLabel = "Phase 3/3: Compressing Archive"
|
||||
} else if elapsedSec < 5 {
|
||||
// Initial setup
|
||||
overallProgress = 2
|
||||
phaseLabel = "Phase 1/3: Initializing..."
|
||||
} else if m.dbTotal == 0 {
|
||||
// Phase 1: Globals backup (before databases start)
|
||||
overallProgress = 10
|
||||
phaseLabel = "Phase 1/3: Backing up Globals"
|
||||
}
|
||||
|
||||
// Header with phase and overall progress
|
||||
s.WriteString(infoStyle.Render(" ─── Cluster Backup Progress ──────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(fmt.Sprintf(" %s\n\n", phaseLabel))
|
||||
|
||||
// Overall progress bar
|
||||
s.WriteString(" Overall: ")
|
||||
s.WriteString(renderProgressBar(overallProgress))
|
||||
s.WriteString(fmt.Sprintf(" %d%%\n", overallProgress))
|
||||
|
||||
// Phase-specific details
|
||||
if m.dbTotal > 0 && m.dbDone > 0 {
|
||||
// Show current database being backed up
|
||||
s.WriteString("\n")
|
||||
spinner := spinnerFrames[m.spinnerFrame]
|
||||
if m.dbName != "" && m.dbDone <= m.dbTotal {
|
||||
s.WriteString(fmt.Sprintf(" Current: %s %s\n", spinner, m.dbName))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
|
||||
// Database progress bar with timing
|
||||
s.WriteString(renderBackupDatabaseProgressBarWithTiming(m.dbDone, m.dbTotal, m.dbPhaseElapsed, m.dbAvgPerDB))
|
||||
s.WriteString("\n")
|
||||
} else {
|
||||
// Intermediate phase (globals)
|
||||
spinner := spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("\n %s %s\n\n", spinner, m.status))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" %s %s\n", spinnerFrames[m.spinnerFrame], m.status))
|
||||
s.WriteString("\n [KEY] Press Ctrl+C or ESC to cancel\n")
|
||||
// Single/sample database backup - simpler display
|
||||
spinner := spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf(" %s %s\n", spinner, m.status))
|
||||
}
|
||||
|
||||
if !m.cancelling {
|
||||
// Elapsed time
|
||||
s.WriteString(fmt.Sprintf("Elapsed: %s\n", formatDuration(time.Since(m.startTime))))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render("[KEYS] Press Ctrl+C or ESC to cancel"))
|
||||
}
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" %s\n\n", m.status))
|
||||
|
||||
// Show completion summary with detailed stats
|
||||
if m.err != nil {
|
||||
s.WriteString(fmt.Sprintf(" [FAIL] Error: %v\n", m.err))
|
||||
} else if m.result != "" {
|
||||
// Parse and display result cleanly
|
||||
lines := strings.Split(m.result, "\n")
|
||||
for _, line := range lines {
|
||||
line = strings.TrimSpace(line)
|
||||
if line != "" {
|
||||
s.WriteString(" " + line + "\n")
|
||||
}
|
||||
s.WriteString(errorStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("║ [FAIL] BACKUP FAILED ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(errorStyle.Render(fmt.Sprintf(" Error: %v", m.err)))
|
||||
s.WriteString("\n")
|
||||
} else {
|
||||
s.WriteString(successStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("║ [OK] BACKUP COMPLETED SUCCESSFULLY ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Summary section
|
||||
s.WriteString(infoStyle.Render(" ─── Summary ───────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Archive info (if available)
|
||||
if m.archivePath != "" {
|
||||
s.WriteString(fmt.Sprintf(" Archive: %s\n", filepath.Base(m.archivePath)))
|
||||
}
|
||||
if m.archiveSize > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Archive Size: %s\n", FormatBytes(m.archiveSize)))
|
||||
}
|
||||
|
||||
// Backup type specific info
|
||||
switch m.backupType {
|
||||
case "cluster":
|
||||
s.WriteString(" Type: Cluster Backup\n")
|
||||
if m.dbTotal > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Databases: %d backed up\n", m.dbTotal))
|
||||
}
|
||||
case "single":
|
||||
s.WriteString(" Type: Single Database Backup\n")
|
||||
s.WriteString(fmt.Sprintf(" Database: %s\n", m.databaseName))
|
||||
case "sample":
|
||||
s.WriteString(" Type: Sample Backup\n")
|
||||
s.WriteString(fmt.Sprintf(" Database: %s\n", m.databaseName))
|
||||
s.WriteString(fmt.Sprintf(" Sample Ratio: %d\n", m.ratio))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
}
|
||||
s.WriteString("\n [KEY] Press Enter or ESC to return to menu\n")
|
||||
|
||||
// Timing section (always shown, consistent with restore)
|
||||
s.WriteString(infoStyle.Render(" ─── Timing ────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
elapsed := m.elapsed
|
||||
if elapsed == 0 {
|
||||
elapsed = time.Since(m.startTime)
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" Total Time: %s\n", formatDuration(elapsed)))
|
||||
|
||||
// Calculate and show throughput if we have size info
|
||||
if m.archiveSize > 0 && elapsed.Seconds() > 0 {
|
||||
throughput := float64(m.archiveSize) / elapsed.Seconds()
|
||||
s.WriteString(fmt.Sprintf(" Throughput: %s/s (average)\n", FormatBytes(int64(throughput))))
|
||||
}
|
||||
|
||||
if m.backupType == "cluster" && m.dbTotal > 0 && m.err == nil {
|
||||
avgPerDB := elapsed / time.Duration(m.dbTotal)
|
||||
s.WriteString(fmt.Sprintf(" Avg per DB: %s\n", formatDuration(avgPerDB)))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(infoStyle.Render(" [KEYS] Press Enter to continue"))
|
||||
}
|
||||
|
||||
return s.String()
|
||||
|
||||
281
internal/tui/cluster_db_selector.go
Normal file
281
internal/tui/cluster_db_selector.go
Normal file
@ -0,0 +1,281 @@
|
||||
package tui
|
||||
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"strings"
|
||||
|
||||
tea "github.com/charmbracelet/bubbletea"
|
||||
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/logger"
|
||||
"dbbackup/internal/restore"
|
||||
)
|
||||
|
||||
// ClusterDatabaseSelectorModel for selecting databases from a cluster backup
|
||||
type ClusterDatabaseSelectorModel struct {
|
||||
config *config.Config
|
||||
logger logger.Logger
|
||||
parent tea.Model
|
||||
ctx context.Context
|
||||
archive ArchiveInfo
|
||||
databases []restore.DatabaseInfo
|
||||
cursor int
|
||||
selected map[int]bool // Track multiple selections
|
||||
loading bool
|
||||
err error
|
||||
title string
|
||||
mode string // "single" or "multiple"
|
||||
extractOnly bool // If true, extract without restoring
|
||||
}
|
||||
|
||||
func NewClusterDatabaseSelector(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, archive ArchiveInfo, mode string, extractOnly bool) ClusterDatabaseSelectorModel {
|
||||
return ClusterDatabaseSelectorModel{
|
||||
config: cfg,
|
||||
logger: log,
|
||||
parent: parent,
|
||||
ctx: ctx,
|
||||
archive: archive,
|
||||
databases: nil,
|
||||
selected: make(map[int]bool),
|
||||
title: "Select Database(s) from Cluster Backup",
|
||||
loading: true,
|
||||
mode: mode,
|
||||
extractOnly: extractOnly,
|
||||
}
|
||||
}
|
||||
|
||||
func (m ClusterDatabaseSelectorModel) Init() tea.Cmd {
|
||||
return fetchClusterDatabases(m.ctx, m.archive, m.logger)
|
||||
}
|
||||
|
||||
type clusterDatabaseListMsg struct {
|
||||
databases []restore.DatabaseInfo
|
||||
err error
|
||||
}
|
||||
|
||||
func fetchClusterDatabases(ctx context.Context, archive ArchiveInfo, log logger.Logger) tea.Cmd {
|
||||
return func() tea.Msg {
|
||||
databases, err := restore.ListDatabasesInCluster(ctx, archive.Path, log)
|
||||
if err != nil {
|
||||
return clusterDatabaseListMsg{databases: nil, err: fmt.Errorf("failed to list databases: %w", err)}
|
||||
}
|
||||
return clusterDatabaseListMsg{databases: databases, err: nil}
|
||||
}
|
||||
}
|
||||
|
||||
func (m ClusterDatabaseSelectorModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
switch msg := msg.(type) {
|
||||
case clusterDatabaseListMsg:
|
||||
m.loading = false
|
||||
if msg.err != nil {
|
||||
m.err = msg.err
|
||||
} else {
|
||||
m.databases = msg.databases
|
||||
if len(m.databases) > 0 && m.mode == "single" {
|
||||
m.selected[0] = true // Pre-select first database in single mode
|
||||
}
|
||||
}
|
||||
return m, nil
|
||||
|
||||
case tea.KeyMsg:
|
||||
if m.loading {
|
||||
return m, nil
|
||||
}
|
||||
|
||||
switch msg.String() {
|
||||
case "q", "esc":
|
||||
// Return to parent
|
||||
return m.parent, nil
|
||||
|
||||
case "up", "k":
|
||||
if m.cursor > 0 {
|
||||
m.cursor--
|
||||
}
|
||||
|
||||
case "down", "j":
|
||||
if m.cursor < len(m.databases)-1 {
|
||||
m.cursor++
|
||||
}
|
||||
|
||||
case " ": // Space to toggle selection (multiple mode)
|
||||
if m.mode == "multiple" {
|
||||
m.selected[m.cursor] = !m.selected[m.cursor]
|
||||
} else {
|
||||
// Single mode: clear all and select current
|
||||
m.selected = make(map[int]bool)
|
||||
m.selected[m.cursor] = true
|
||||
}
|
||||
|
||||
case "enter":
|
||||
if m.err != nil {
|
||||
return m.parent, nil
|
||||
}
|
||||
|
||||
if len(m.databases) == 0 {
|
||||
return m.parent, nil
|
||||
}
|
||||
|
||||
// Get selected database(s)
|
||||
var selectedDBs []restore.DatabaseInfo
|
||||
for i, selected := range m.selected {
|
||||
if selected && i < len(m.databases) {
|
||||
selectedDBs = append(selectedDBs, m.databases[i])
|
||||
}
|
||||
}
|
||||
|
||||
if len(selectedDBs) == 0 {
|
||||
// No selection, use cursor position
|
||||
selectedDBs = []restore.DatabaseInfo{m.databases[m.cursor]}
|
||||
}
|
||||
|
||||
if m.extractOnly {
|
||||
// TODO: Implement extraction flow
|
||||
m.logger.Info("Extract-only mode not yet implemented in TUI")
|
||||
return m.parent, nil
|
||||
}
|
||||
|
||||
// For restore: proceed to restore preview/confirmation
|
||||
if len(selectedDBs) == 1 {
|
||||
// Single database restore from cluster
|
||||
// Create a temporary archive info for the selected database
|
||||
dbArchive := ArchiveInfo{
|
||||
Name: selectedDBs[0].Filename,
|
||||
Path: m.archive.Path, // Still use cluster archive path
|
||||
Format: m.archive.Format,
|
||||
Size: selectedDBs[0].Size,
|
||||
Modified: m.archive.Modified,
|
||||
DatabaseName: selectedDBs[0].Name,
|
||||
}
|
||||
|
||||
preview := NewRestorePreview(m.config, m.logger, m.parent, m.ctx, dbArchive, "restore-cluster-single")
|
||||
return preview, preview.Init()
|
||||
} else {
|
||||
// Multiple database restore - not yet implemented
|
||||
m.logger.Info("Multiple database restore not yet implemented in TUI")
|
||||
return m.parent, nil
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return m, nil
|
||||
}
|
||||
|
||||
func (m ClusterDatabaseSelectorModel) View() string {
|
||||
if m.loading {
|
||||
return TitleStyle.Render("Loading databases from cluster backup...") + "\n\nPlease wait..."
|
||||
}
|
||||
|
||||
if m.err != nil {
|
||||
var s strings.Builder
|
||||
s.WriteString(TitleStyle.Render("Error"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusErrorStyle.Render("Failed to list databases"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(m.err.Error())
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusReadyStyle.Render("Press any key to go back"))
|
||||
return s.String()
|
||||
}
|
||||
|
||||
if len(m.databases) == 0 {
|
||||
var s strings.Builder
|
||||
s.WriteString(TitleStyle.Render("No Databases Found"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusWarningStyle.Render("The cluster backup appears to be empty or invalid."))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(StatusReadyStyle.Render("Press any key to go back"))
|
||||
return s.String()
|
||||
}
|
||||
|
||||
var s strings.Builder
|
||||
|
||||
// Title
|
||||
s.WriteString(TitleStyle.Render(m.title))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Archive info
|
||||
s.WriteString(LabelStyle.Render("Archive: "))
|
||||
s.WriteString(m.archive.Name)
|
||||
s.WriteString("\n")
|
||||
s.WriteString(LabelStyle.Render("Databases: "))
|
||||
s.WriteString(fmt.Sprintf("%d", len(m.databases)))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Instructions
|
||||
if m.mode == "multiple" {
|
||||
s.WriteString(StatusReadyStyle.Render("↑/↓: navigate • space: select/deselect • enter: confirm • q/esc: back"))
|
||||
} else {
|
||||
s.WriteString(StatusReadyStyle.Render("↑/↓: navigate • enter: select • q/esc: back"))
|
||||
}
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Database list
|
||||
s.WriteString(ListHeaderStyle.Render("Available Databases:"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
for i, db := range m.databases {
|
||||
cursor := " "
|
||||
if m.cursor == i {
|
||||
cursor = "▶ "
|
||||
}
|
||||
|
||||
checkbox := ""
|
||||
if m.mode == "multiple" {
|
||||
if m.selected[i] {
|
||||
checkbox = "[✓] "
|
||||
} else {
|
||||
checkbox = "[ ] "
|
||||
}
|
||||
} else {
|
||||
if m.selected[i] {
|
||||
checkbox = "● "
|
||||
} else {
|
||||
checkbox = "○ "
|
||||
}
|
||||
}
|
||||
|
||||
sizeStr := formatBytes(db.Size)
|
||||
line := fmt.Sprintf("%s%s%-40s %10s", cursor, checkbox, db.Name, sizeStr)
|
||||
|
||||
if m.cursor == i {
|
||||
s.WriteString(ListSelectedStyle.Render(line))
|
||||
} else {
|
||||
s.WriteString(ListNormalStyle.Render(line))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
|
||||
// Selection summary
|
||||
selectedCount := 0
|
||||
var totalSize int64
|
||||
for i, selected := range m.selected {
|
||||
if selected && i < len(m.databases) {
|
||||
selectedCount++
|
||||
totalSize += m.databases[i].Size
|
||||
}
|
||||
}
|
||||
|
||||
if selectedCount > 0 {
|
||||
s.WriteString(StatusSuccessStyle.Render(fmt.Sprintf("Selected: %d database(s), Total size: %s", selectedCount, formatBytes(totalSize))))
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
// formatBytes formats byte count as human-readable string
|
||||
func formatBytes(bytes int64) string {
|
||||
const unit = 1024
|
||||
if bytes < unit {
|
||||
return fmt.Sprintf("%d B", bytes)
|
||||
}
|
||||
div, exp := int64(unit), 0
|
||||
for n := bytes / unit; n >= unit; n /= unit {
|
||||
div *= unit
|
||||
exp++
|
||||
}
|
||||
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
|
||||
}
|
||||
406
internal/tui/detailed_progress.go
Normal file
406
internal/tui/detailed_progress.go
Normal file
@ -0,0 +1,406 @@
|
||||
package tui
|
||||
|
||||
import (
|
||||
"fmt"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
)
|
||||
|
||||
// DetailedProgress provides schollz-like progress information for TUI rendering
|
||||
// This is a data structure that can be queried by Bubble Tea's View() method
|
||||
type DetailedProgress struct {
|
||||
mu sync.RWMutex
|
||||
|
||||
// Core progress
|
||||
Total int64 // Total bytes or items
|
||||
Current int64 // Current bytes or items done
|
||||
|
||||
// Display info
|
||||
Description string // What operation is happening
|
||||
Unit string // "bytes", "files", "databases", etc.
|
||||
|
||||
// Timing for ETA/speed calculation
|
||||
StartTime time.Time
|
||||
LastUpdate time.Time
|
||||
SpeedWindow []speedSample // Rolling window for speed calculation
|
||||
|
||||
// State
|
||||
IsIndeterminate bool // True if total is unknown (spinner mode)
|
||||
IsComplete bool
|
||||
IsFailed bool
|
||||
ErrorMessage string
|
||||
}
|
||||
|
||||
type speedSample struct {
|
||||
timestamp time.Time
|
||||
bytes int64
|
||||
}
|
||||
|
||||
// NewDetailedProgress creates a progress tracker with known total
|
||||
func NewDetailedProgress(total int64, description string) *DetailedProgress {
|
||||
return &DetailedProgress{
|
||||
Total: total,
|
||||
Description: description,
|
||||
Unit: "bytes",
|
||||
StartTime: time.Now(),
|
||||
LastUpdate: time.Now(),
|
||||
SpeedWindow: make([]speedSample, 0, 20),
|
||||
IsIndeterminate: total <= 0,
|
||||
}
|
||||
}
|
||||
|
||||
// NewDetailedProgressItems creates a progress tracker for item counts
|
||||
func NewDetailedProgressItems(total int, description string) *DetailedProgress {
|
||||
return &DetailedProgress{
|
||||
Total: int64(total),
|
||||
Description: description,
|
||||
Unit: "items",
|
||||
StartTime: time.Now(),
|
||||
LastUpdate: time.Now(),
|
||||
SpeedWindow: make([]speedSample, 0, 20),
|
||||
IsIndeterminate: total <= 0,
|
||||
}
|
||||
}
|
||||
|
||||
// NewDetailedProgressSpinner creates an indeterminate progress tracker
|
||||
func NewDetailedProgressSpinner(description string) *DetailedProgress {
|
||||
return &DetailedProgress{
|
||||
Total: -1,
|
||||
Description: description,
|
||||
Unit: "",
|
||||
StartTime: time.Now(),
|
||||
LastUpdate: time.Now(),
|
||||
SpeedWindow: make([]speedSample, 0, 20),
|
||||
IsIndeterminate: true,
|
||||
}
|
||||
}
|
||||
|
||||
// Add adds to the current progress
|
||||
func (dp *DetailedProgress) Add(n int64) {
|
||||
dp.mu.Lock()
|
||||
defer dp.mu.Unlock()
|
||||
|
||||
dp.Current += n
|
||||
dp.LastUpdate = time.Now()
|
||||
|
||||
// Add speed sample
|
||||
dp.SpeedWindow = append(dp.SpeedWindow, speedSample{
|
||||
timestamp: dp.LastUpdate,
|
||||
bytes: dp.Current,
|
||||
})
|
||||
|
||||
// Keep only last 20 samples for speed calculation
|
||||
if len(dp.SpeedWindow) > 20 {
|
||||
dp.SpeedWindow = dp.SpeedWindow[len(dp.SpeedWindow)-20:]
|
||||
}
|
||||
}
|
||||
|
||||
// Set sets the current progress to a specific value
|
||||
func (dp *DetailedProgress) Set(n int64) {
|
||||
dp.mu.Lock()
|
||||
defer dp.mu.Unlock()
|
||||
|
||||
dp.Current = n
|
||||
dp.LastUpdate = time.Now()
|
||||
|
||||
// Add speed sample
|
||||
dp.SpeedWindow = append(dp.SpeedWindow, speedSample{
|
||||
timestamp: dp.LastUpdate,
|
||||
bytes: dp.Current,
|
||||
})
|
||||
|
||||
if len(dp.SpeedWindow) > 20 {
|
||||
dp.SpeedWindow = dp.SpeedWindow[len(dp.SpeedWindow)-20:]
|
||||
}
|
||||
}
|
||||
|
||||
// SetTotal updates the total (useful when total becomes known during operation)
|
||||
func (dp *DetailedProgress) SetTotal(total int64) {
|
||||
dp.mu.Lock()
|
||||
defer dp.mu.Unlock()
|
||||
|
||||
dp.Total = total
|
||||
dp.IsIndeterminate = total <= 0
|
||||
}
|
||||
|
||||
// SetDescription updates the description
|
||||
func (dp *DetailedProgress) SetDescription(desc string) {
|
||||
dp.mu.Lock()
|
||||
defer dp.mu.Unlock()
|
||||
dp.Description = desc
|
||||
}
|
||||
|
||||
// Complete marks the progress as complete
|
||||
func (dp *DetailedProgress) Complete() {
|
||||
dp.mu.Lock()
|
||||
defer dp.mu.Unlock()
|
||||
|
||||
dp.IsComplete = true
|
||||
dp.Current = dp.Total
|
||||
}
|
||||
|
||||
// Fail marks the progress as failed
|
||||
func (dp *DetailedProgress) Fail(errMsg string) {
|
||||
dp.mu.Lock()
|
||||
defer dp.mu.Unlock()
|
||||
|
||||
dp.IsFailed = true
|
||||
dp.ErrorMessage = errMsg
|
||||
}
|
||||
|
||||
// GetPercent returns the progress percentage (0-100)
|
||||
func (dp *DetailedProgress) GetPercent() int {
|
||||
dp.mu.RLock()
|
||||
defer dp.mu.RUnlock()
|
||||
|
||||
if dp.IsIndeterminate || dp.Total <= 0 {
|
||||
return 0
|
||||
}
|
||||
percent := int((dp.Current * 100) / dp.Total)
|
||||
if percent > 100 {
|
||||
return 100
|
||||
}
|
||||
return percent
|
||||
}
|
||||
|
||||
// GetSpeed returns the current transfer speed in bytes/second
|
||||
func (dp *DetailedProgress) GetSpeed() float64 {
|
||||
dp.mu.RLock()
|
||||
defer dp.mu.RUnlock()
|
||||
|
||||
if len(dp.SpeedWindow) < 2 {
|
||||
return 0
|
||||
}
|
||||
|
||||
// Use first and last samples in window for smoothed speed
|
||||
first := dp.SpeedWindow[0]
|
||||
last := dp.SpeedWindow[len(dp.SpeedWindow)-1]
|
||||
|
||||
elapsed := last.timestamp.Sub(first.timestamp).Seconds()
|
||||
if elapsed <= 0 {
|
||||
return 0
|
||||
}
|
||||
|
||||
bytesTransferred := last.bytes - first.bytes
|
||||
return float64(bytesTransferred) / elapsed
|
||||
}
|
||||
|
||||
// GetETA returns the estimated time remaining
|
||||
func (dp *DetailedProgress) GetETA() time.Duration {
|
||||
dp.mu.RLock()
|
||||
defer dp.mu.RUnlock()
|
||||
|
||||
if dp.IsIndeterminate || dp.Total <= 0 || dp.Current >= dp.Total {
|
||||
return 0
|
||||
}
|
||||
|
||||
speed := dp.getSpeedLocked()
|
||||
if speed <= 0 {
|
||||
return 0
|
||||
}
|
||||
|
||||
remaining := dp.Total - dp.Current
|
||||
seconds := float64(remaining) / speed
|
||||
return time.Duration(seconds) * time.Second
|
||||
}
|
||||
|
||||
func (dp *DetailedProgress) getSpeedLocked() float64 {
|
||||
if len(dp.SpeedWindow) < 2 {
|
||||
return 0
|
||||
}
|
||||
|
||||
first := dp.SpeedWindow[0]
|
||||
last := dp.SpeedWindow[len(dp.SpeedWindow)-1]
|
||||
|
||||
elapsed := last.timestamp.Sub(first.timestamp).Seconds()
|
||||
if elapsed <= 0 {
|
||||
return 0
|
||||
}
|
||||
|
||||
bytesTransferred := last.bytes - first.bytes
|
||||
return float64(bytesTransferred) / elapsed
|
||||
}
|
||||
|
||||
// GetElapsed returns the elapsed time since start
|
||||
func (dp *DetailedProgress) GetElapsed() time.Duration {
|
||||
dp.mu.RLock()
|
||||
defer dp.mu.RUnlock()
|
||||
return time.Since(dp.StartTime)
|
||||
}
|
||||
|
||||
// GetState returns a snapshot of the current state for rendering
|
||||
func (dp *DetailedProgress) GetState() DetailedProgressState {
|
||||
dp.mu.RLock()
|
||||
defer dp.mu.RUnlock()
|
||||
|
||||
return DetailedProgressState{
|
||||
Description: dp.Description,
|
||||
Current: dp.Current,
|
||||
Total: dp.Total,
|
||||
Percent: dp.getPercentLocked(),
|
||||
Speed: dp.getSpeedLocked(),
|
||||
ETA: dp.getETALocked(),
|
||||
Elapsed: time.Since(dp.StartTime),
|
||||
Unit: dp.Unit,
|
||||
IsIndeterminate: dp.IsIndeterminate,
|
||||
IsComplete: dp.IsComplete,
|
||||
IsFailed: dp.IsFailed,
|
||||
ErrorMessage: dp.ErrorMessage,
|
||||
}
|
||||
}
|
||||
|
||||
func (dp *DetailedProgress) getPercentLocked() int {
|
||||
if dp.IsIndeterminate || dp.Total <= 0 {
|
||||
return 0
|
||||
}
|
||||
percent := int((dp.Current * 100) / dp.Total)
|
||||
if percent > 100 {
|
||||
return 100
|
||||
}
|
||||
return percent
|
||||
}
|
||||
|
||||
func (dp *DetailedProgress) getETALocked() time.Duration {
|
||||
if dp.IsIndeterminate || dp.Total <= 0 || dp.Current >= dp.Total {
|
||||
return 0
|
||||
}
|
||||
|
||||
speed := dp.getSpeedLocked()
|
||||
if speed <= 0 {
|
||||
return 0
|
||||
}
|
||||
|
||||
remaining := dp.Total - dp.Current
|
||||
seconds := float64(remaining) / speed
|
||||
return time.Duration(seconds) * time.Second
|
||||
}
|
||||
|
||||
// DetailedProgressState is an immutable snapshot for rendering
|
||||
type DetailedProgressState struct {
|
||||
Description string
|
||||
Current int64
|
||||
Total int64
|
||||
Percent int
|
||||
Speed float64 // bytes/sec
|
||||
ETA time.Duration
|
||||
Elapsed time.Duration
|
||||
Unit string
|
||||
IsIndeterminate bool
|
||||
IsComplete bool
|
||||
IsFailed bool
|
||||
ErrorMessage string
|
||||
}
|
||||
|
||||
// RenderProgressBar renders a TUI-friendly progress bar string
|
||||
// Returns something like: "Extracting archive [████████░░░░░░░░░░░░] 45% 12.5 MB/s ETA: 2m 30s"
|
||||
func (s DetailedProgressState) RenderProgressBar(width int) string {
|
||||
if s.IsIndeterminate {
|
||||
return s.renderIndeterminate()
|
||||
}
|
||||
|
||||
// Progress bar
|
||||
barWidth := 30
|
||||
if width < 80 {
|
||||
barWidth = 20
|
||||
}
|
||||
filled := (s.Percent * barWidth) / 100
|
||||
if filled > barWidth {
|
||||
filled = barWidth
|
||||
}
|
||||
|
||||
bar := strings.Repeat("█", filled) + strings.Repeat("░", barWidth-filled)
|
||||
|
||||
// Format bytes
|
||||
currentStr := FormatBytes(s.Current)
|
||||
totalStr := FormatBytes(s.Total)
|
||||
|
||||
// Format speed
|
||||
speedStr := ""
|
||||
if s.Speed > 0 {
|
||||
speedStr = fmt.Sprintf("%s/s", FormatBytes(int64(s.Speed)))
|
||||
}
|
||||
|
||||
// Format ETA
|
||||
etaStr := ""
|
||||
if s.ETA > 0 && !s.IsComplete {
|
||||
etaStr = fmt.Sprintf("ETA: %s", FormatDurationShort(s.ETA))
|
||||
}
|
||||
|
||||
// Build the line
|
||||
parts := []string{
|
||||
fmt.Sprintf("[%s]", bar),
|
||||
fmt.Sprintf("%3d%%", s.Percent),
|
||||
}
|
||||
|
||||
if s.Unit == "bytes" && s.Total > 0 {
|
||||
parts = append(parts, fmt.Sprintf("%s/%s", currentStr, totalStr))
|
||||
} else if s.Total > 0 {
|
||||
parts = append(parts, fmt.Sprintf("%d/%d", s.Current, s.Total))
|
||||
}
|
||||
|
||||
if speedStr != "" {
|
||||
parts = append(parts, speedStr)
|
||||
}
|
||||
if etaStr != "" {
|
||||
parts = append(parts, etaStr)
|
||||
}
|
||||
|
||||
return strings.Join(parts, " ")
|
||||
}
|
||||
|
||||
func (s DetailedProgressState) renderIndeterminate() string {
|
||||
elapsed := FormatDurationShort(s.Elapsed)
|
||||
return fmt.Sprintf("[spinner] %s Elapsed: %s", s.Description, elapsed)
|
||||
}
|
||||
|
||||
// RenderCompact renders a compact single-line progress string
|
||||
func (s DetailedProgressState) RenderCompact() string {
|
||||
if s.IsComplete {
|
||||
return fmt.Sprintf("[OK] %s completed in %s", s.Description, FormatDurationShort(s.Elapsed))
|
||||
}
|
||||
if s.IsFailed {
|
||||
return fmt.Sprintf("[FAIL] %s: %s", s.Description, s.ErrorMessage)
|
||||
}
|
||||
if s.IsIndeterminate {
|
||||
return fmt.Sprintf("[...] %s (%s)", s.Description, FormatDurationShort(s.Elapsed))
|
||||
}
|
||||
|
||||
return fmt.Sprintf("[%3d%%] %s - %s/%s", s.Percent, s.Description,
|
||||
FormatBytes(s.Current), FormatBytes(s.Total))
|
||||
}
|
||||
|
||||
// FormatBytes formats bytes in human-readable format
|
||||
func FormatBytes(b int64) string {
|
||||
const unit = 1024
|
||||
if b < unit {
|
||||
return fmt.Sprintf("%d B", b)
|
||||
}
|
||||
div, exp := int64(unit), 0
|
||||
for n := b / unit; n >= unit; n /= unit {
|
||||
div *= unit
|
||||
exp++
|
||||
}
|
||||
return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "KMGTPE"[exp])
|
||||
}
|
||||
|
||||
// FormatDurationShort formats duration in short form
|
||||
func FormatDurationShort(d time.Duration) string {
|
||||
if d < time.Second {
|
||||
return "<1s"
|
||||
}
|
||||
if d < time.Minute {
|
||||
return fmt.Sprintf("%ds", int(d.Seconds()))
|
||||
}
|
||||
if d < time.Hour {
|
||||
m := int(d.Minutes())
|
||||
s := int(d.Seconds()) % 60
|
||||
if s > 0 {
|
||||
return fmt.Sprintf("%dm %ds", m, s)
|
||||
}
|
||||
return fmt.Sprintf("%dm", m)
|
||||
}
|
||||
h := int(d.Hours())
|
||||
m := int(d.Minutes()) % 60
|
||||
return fmt.Sprintf("%dh %dm", h, m)
|
||||
}
|
||||
@ -188,6 +188,21 @@ func (m *MenuModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
}
|
||||
return m, nil
|
||||
|
||||
case tea.InterruptMsg:
|
||||
// Handle Ctrl+C signal (SIGINT) - Bubbletea v1.3+ sends this
|
||||
if m.cancel != nil {
|
||||
m.cancel()
|
||||
}
|
||||
|
||||
// Clean up any orphaned processes before exit
|
||||
m.logger.Info("Cleaning up processes before exit (SIGINT)")
|
||||
if err := cleanup.KillOrphanedProcesses(m.logger); err != nil {
|
||||
m.logger.Warn("Failed to clean up all processes", "error", err)
|
||||
}
|
||||
|
||||
m.quitting = true
|
||||
return m, tea.Quit
|
||||
|
||||
case tea.KeyMsg:
|
||||
switch msg.String() {
|
||||
case "ctrl+c", "q":
|
||||
@ -284,9 +299,13 @@ func (m *MenuModel) View() string {
|
||||
|
||||
var s string
|
||||
|
||||
// Product branding header
|
||||
brandLine := fmt.Sprintf("dbbackup v%s • Enterprise Database Backup & Recovery", m.config.Version)
|
||||
s += "\n" + infoStyle.Render(brandLine) + "\n"
|
||||
|
||||
// Header
|
||||
header := titleStyle.Render("Database Backup Tool - Interactive Menu")
|
||||
s += fmt.Sprintf("\n%s\n\n", header)
|
||||
header := titleStyle.Render("Interactive Menu")
|
||||
s += fmt.Sprintf("%s\n\n", header)
|
||||
|
||||
if len(m.dbTypes) > 0 {
|
||||
options := make([]string, len(m.dbTypes))
|
||||
|
||||
@ -6,6 +6,7 @@ import (
|
||||
"os/exec"
|
||||
"path/filepath"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
tea "github.com/charmbracelet/bubbletea"
|
||||
@ -45,6 +46,29 @@ type RestoreExecutionModel struct {
|
||||
spinnerFrame int
|
||||
spinnerFrames []string
|
||||
|
||||
// Detailed byte progress for schollz-style display
|
||||
bytesTotal int64
|
||||
bytesDone int64
|
||||
description string
|
||||
showBytes bool // True when we have real byte progress to show
|
||||
speed float64 // Rolling window speed in bytes/sec
|
||||
|
||||
// Database count progress (for cluster restore)
|
||||
dbTotal int
|
||||
dbDone int
|
||||
|
||||
// Current database being restored (for detailed display)
|
||||
currentDB string
|
||||
|
||||
// Timing info for database restore phase (ETA calculation)
|
||||
dbPhaseElapsed time.Duration // Elapsed time since restore phase started
|
||||
dbAvgPerDB time.Duration // Average time per database restore
|
||||
|
||||
// Overall progress tracking for unified display
|
||||
overallPhase int // 1=Extracting, 2=Globals, 3=Databases
|
||||
extractionDone bool
|
||||
extractionTime time.Duration // How long extraction took (for ETA calc)
|
||||
|
||||
// Results
|
||||
done bool
|
||||
cancelling bool // True when user has requested cancellation
|
||||
@ -97,10 +121,13 @@ func restoreTickCmd() tea.Cmd {
|
||||
}
|
||||
|
||||
type restoreProgressMsg struct {
|
||||
status string
|
||||
phase string
|
||||
progress int
|
||||
detail string
|
||||
status string
|
||||
phase string
|
||||
progress int
|
||||
detail string
|
||||
bytesTotal int64
|
||||
bytesDone int64
|
||||
description string
|
||||
}
|
||||
|
||||
type restoreCompleteMsg struct {
|
||||
@ -109,13 +136,134 @@ type restoreCompleteMsg struct {
|
||||
elapsed time.Duration
|
||||
}
|
||||
|
||||
// sharedProgressState holds progress state that can be safely accessed from callbacks
|
||||
type sharedProgressState struct {
|
||||
mu sync.Mutex
|
||||
bytesTotal int64
|
||||
bytesDone int64
|
||||
description string
|
||||
hasUpdate bool
|
||||
|
||||
// Database count progress (for cluster restore)
|
||||
dbTotal int
|
||||
dbDone int
|
||||
|
||||
// Current database being restored
|
||||
currentDB string
|
||||
|
||||
// Timing info for database restore phase
|
||||
dbPhaseElapsed time.Duration // Elapsed time since restore phase started
|
||||
dbAvgPerDB time.Duration // Average time per database restore
|
||||
phase3StartTime time.Time // When phase 3 started (for realtime ETA calculation)
|
||||
|
||||
// Overall phase tracking (1=Extract, 2=Globals, 3=Databases)
|
||||
overallPhase int
|
||||
extractionDone bool
|
||||
|
||||
// Weighted progress by database sizes (bytes)
|
||||
dbBytesTotal int64 // Total bytes across all databases
|
||||
dbBytesDone int64 // Bytes completed (sum of finished DB sizes)
|
||||
|
||||
// Rolling window for speed calculation
|
||||
speedSamples []restoreSpeedSample
|
||||
}
|
||||
|
||||
type restoreSpeedSample struct {
|
||||
timestamp time.Time
|
||||
bytes int64
|
||||
}
|
||||
|
||||
// Package-level shared progress state for restore operations
|
||||
var (
|
||||
currentRestoreProgressMu sync.Mutex
|
||||
currentRestoreProgressState *sharedProgressState
|
||||
)
|
||||
|
||||
func setCurrentRestoreProgress(state *sharedProgressState) {
|
||||
currentRestoreProgressMu.Lock()
|
||||
defer currentRestoreProgressMu.Unlock()
|
||||
currentRestoreProgressState = state
|
||||
}
|
||||
|
||||
func clearCurrentRestoreProgress() {
|
||||
currentRestoreProgressMu.Lock()
|
||||
defer currentRestoreProgressMu.Unlock()
|
||||
currentRestoreProgressState = nil
|
||||
}
|
||||
|
||||
func getCurrentRestoreProgress() (bytesTotal, bytesDone int64, description string, hasUpdate bool, dbTotal, dbDone int, speed float64, dbPhaseElapsed, dbAvgPerDB time.Duration, currentDB string, overallPhase int, extractionDone bool, dbBytesTotal, dbBytesDone int64, phase3StartTime time.Time) {
|
||||
currentRestoreProgressMu.Lock()
|
||||
defer currentRestoreProgressMu.Unlock()
|
||||
|
||||
if currentRestoreProgressState == nil {
|
||||
return 0, 0, "", false, 0, 0, 0, 0, 0, "", 0, false, 0, 0, time.Time{}
|
||||
}
|
||||
|
||||
currentRestoreProgressState.mu.Lock()
|
||||
defer currentRestoreProgressState.mu.Unlock()
|
||||
|
||||
// Calculate rolling window speed
|
||||
speed = calculateRollingSpeed(currentRestoreProgressState.speedSamples)
|
||||
|
||||
// Calculate realtime phase elapsed if we have a phase 3 start time
|
||||
dbPhaseElapsed = currentRestoreProgressState.dbPhaseElapsed
|
||||
if !currentRestoreProgressState.phase3StartTime.IsZero() {
|
||||
dbPhaseElapsed = time.Since(currentRestoreProgressState.phase3StartTime)
|
||||
}
|
||||
|
||||
return currentRestoreProgressState.bytesTotal, currentRestoreProgressState.bytesDone,
|
||||
currentRestoreProgressState.description, currentRestoreProgressState.hasUpdate,
|
||||
currentRestoreProgressState.dbTotal, currentRestoreProgressState.dbDone, speed,
|
||||
dbPhaseElapsed, currentRestoreProgressState.dbAvgPerDB,
|
||||
currentRestoreProgressState.currentDB, currentRestoreProgressState.overallPhase,
|
||||
currentRestoreProgressState.extractionDone,
|
||||
currentRestoreProgressState.dbBytesTotal, currentRestoreProgressState.dbBytesDone,
|
||||
currentRestoreProgressState.phase3StartTime
|
||||
}
|
||||
|
||||
// calculateRollingSpeed calculates speed from recent samples (last 5 seconds)
|
||||
func calculateRollingSpeed(samples []restoreSpeedSample) float64 {
|
||||
if len(samples) < 2 {
|
||||
return 0
|
||||
}
|
||||
|
||||
// Use samples from last 5 seconds for smoothed speed
|
||||
now := time.Now()
|
||||
cutoff := now.Add(-5 * time.Second)
|
||||
|
||||
var firstInWindow, lastInWindow *restoreSpeedSample
|
||||
for i := range samples {
|
||||
if samples[i].timestamp.After(cutoff) {
|
||||
if firstInWindow == nil {
|
||||
firstInWindow = &samples[i]
|
||||
}
|
||||
lastInWindow = &samples[i]
|
||||
}
|
||||
}
|
||||
|
||||
// Fall back to first and last if window is empty
|
||||
if firstInWindow == nil || lastInWindow == nil || firstInWindow == lastInWindow {
|
||||
firstInWindow = &samples[0]
|
||||
lastInWindow = &samples[len(samples)-1]
|
||||
}
|
||||
|
||||
elapsed := lastInWindow.timestamp.Sub(firstInWindow.timestamp).Seconds()
|
||||
if elapsed <= 0 {
|
||||
return 0
|
||||
}
|
||||
|
||||
bytesTransferred := lastInWindow.bytes - firstInWindow.bytes
|
||||
return float64(bytesTransferred) / elapsed
|
||||
}
|
||||
|
||||
// restoreProgressChannel allows sending progress updates from the restore goroutine
|
||||
type restoreProgressChannel chan restoreProgressMsg
|
||||
|
||||
func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string, cleanFirst, createIfMissing bool, restoreType string, cleanClusterFirst bool, existingDBs []string, saveDebugLog bool) tea.Cmd {
|
||||
return func() tea.Msg {
|
||||
// NO TIMEOUT for restore operations - a restore takes as long as it takes
|
||||
// Large databases with large objects can take many hours
|
||||
// Only manual cancellation (Ctrl+C) should stop the restore
|
||||
ctx, cancel := context.WithCancel(parentCtx)
|
||||
defer cancel()
|
||||
// Use the parent context directly - it's already cancellable from the model
|
||||
// DO NOT create a new context here as it breaks Ctrl+C cancellation
|
||||
ctx := parentCtx
|
||||
|
||||
start := time.Now()
|
||||
|
||||
@ -131,31 +279,144 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
|
||||
defer dbClient.Close()
|
||||
|
||||
// STEP 1: Clean cluster if requested (drop all existing user databases)
|
||||
if restoreType == "restore-cluster" && cleanClusterFirst && len(existingDBs) > 0 {
|
||||
log.Info("Dropping existing user databases before cluster restore", "count", len(existingDBs))
|
||||
|
||||
// Drop databases using command-line psql (no connection required)
|
||||
// This matches how cluster restore works - uses CLI tools, not database connections
|
||||
droppedCount := 0
|
||||
for _, dbName := range existingDBs {
|
||||
// Create timeout context for each database drop (5 minutes per DB - large DBs take time)
|
||||
dropCtx, dropCancel := context.WithTimeout(ctx, 5*time.Minute)
|
||||
if err := dropDatabaseCLI(dropCtx, cfg, dbName); err != nil {
|
||||
log.Warn("Failed to drop database", "name", dbName, "error", err)
|
||||
// Continue with other databases
|
||||
} else {
|
||||
droppedCount++
|
||||
log.Info("Dropped database", "name", dbName)
|
||||
}
|
||||
dropCancel() // Clean up context
|
||||
if restoreType == "restore-cluster" && cleanClusterFirst {
|
||||
// Re-detect databases at execution time to get current state
|
||||
// The preview list may be stale or detection may have failed earlier
|
||||
safety := restore.NewSafety(cfg, log)
|
||||
currentDBs, err := safety.ListUserDatabases(ctx)
|
||||
if err != nil {
|
||||
log.Warn("Failed to list databases for cleanup, using preview list", "error", err)
|
||||
currentDBs = existingDBs // Fall back to preview list
|
||||
} else if len(currentDBs) > 0 {
|
||||
log.Info("Re-detected user databases for cleanup", "count", len(currentDBs), "databases", currentDBs)
|
||||
existingDBs = currentDBs // Update with fresh list
|
||||
}
|
||||
|
||||
log.Info("Cluster cleanup completed", "dropped", droppedCount, "total", len(existingDBs))
|
||||
if len(existingDBs) > 0 {
|
||||
log.Info("Dropping existing user databases before cluster restore", "count", len(existingDBs))
|
||||
|
||||
// Drop databases using command-line psql (no connection required)
|
||||
// This matches how cluster restore works - uses CLI tools, not database connections
|
||||
droppedCount := 0
|
||||
for _, dbName := range existingDBs {
|
||||
// Create timeout context for each database drop (5 minutes per DB - large DBs take time)
|
||||
dropCtx, dropCancel := context.WithTimeout(ctx, 5*time.Minute)
|
||||
if err := dropDatabaseCLI(dropCtx, cfg, dbName); err != nil {
|
||||
log.Warn("Failed to drop database", "name", dbName, "error", err)
|
||||
// Continue with other databases
|
||||
} else {
|
||||
droppedCount++
|
||||
log.Info("Dropped database", "name", dbName)
|
||||
}
|
||||
dropCancel() // Clean up context
|
||||
}
|
||||
|
||||
log.Info("Cluster cleanup completed", "dropped", droppedCount, "total", len(existingDBs))
|
||||
} else {
|
||||
log.Info("No user databases to clean up")
|
||||
}
|
||||
}
|
||||
|
||||
// STEP 2: Create restore engine with silent progress (no stdout interference with TUI)
|
||||
engine := restore.NewSilent(cfg, log, dbClient)
|
||||
|
||||
// Set up progress callback for detailed progress reporting
|
||||
// We use a shared pointer that can be queried by the TUI ticker
|
||||
progressState := &sharedProgressState{
|
||||
speedSamples: make([]restoreSpeedSample, 0, 100),
|
||||
}
|
||||
engine.SetProgressCallback(func(current, total int64, description string) {
|
||||
progressState.mu.Lock()
|
||||
defer progressState.mu.Unlock()
|
||||
progressState.bytesDone = current
|
||||
progressState.bytesTotal = total
|
||||
progressState.description = description
|
||||
progressState.hasUpdate = true
|
||||
progressState.overallPhase = 1
|
||||
progressState.extractionDone = false
|
||||
|
||||
// Check if extraction is complete
|
||||
if current >= total && total > 0 {
|
||||
progressState.extractionDone = true
|
||||
progressState.overallPhase = 2
|
||||
}
|
||||
|
||||
// Add speed sample for rolling window calculation
|
||||
progressState.speedSamples = append(progressState.speedSamples, restoreSpeedSample{
|
||||
timestamp: time.Now(),
|
||||
bytes: current,
|
||||
})
|
||||
// Keep only last 100 samples
|
||||
if len(progressState.speedSamples) > 100 {
|
||||
progressState.speedSamples = progressState.speedSamples[len(progressState.speedSamples)-100:]
|
||||
}
|
||||
})
|
||||
|
||||
// Set up database progress callback for cluster restore
|
||||
engine.SetDatabaseProgressCallback(func(done, total int, dbName string) {
|
||||
progressState.mu.Lock()
|
||||
defer progressState.mu.Unlock()
|
||||
progressState.dbDone = done
|
||||
progressState.dbTotal = total
|
||||
progressState.description = fmt.Sprintf("Restoring %s", dbName)
|
||||
progressState.currentDB = dbName
|
||||
progressState.overallPhase = 3
|
||||
progressState.extractionDone = true
|
||||
progressState.hasUpdate = true
|
||||
// Set phase 3 start time on first callback (for realtime ETA calculation)
|
||||
if progressState.phase3StartTime.IsZero() {
|
||||
progressState.phase3StartTime = time.Now()
|
||||
}
|
||||
// Clear byte progress when switching to db progress
|
||||
progressState.bytesTotal = 0
|
||||
progressState.bytesDone = 0
|
||||
})
|
||||
|
||||
// Set up timing-aware database progress callback for cluster restore ETA
|
||||
engine.SetDatabaseProgressWithTimingCallback(func(done, total int, dbName string, phaseElapsed, avgPerDB time.Duration) {
|
||||
progressState.mu.Lock()
|
||||
defer progressState.mu.Unlock()
|
||||
progressState.dbDone = done
|
||||
progressState.dbTotal = total
|
||||
progressState.description = fmt.Sprintf("Restoring %s", dbName)
|
||||
progressState.currentDB = dbName
|
||||
progressState.overallPhase = 3
|
||||
progressState.extractionDone = true
|
||||
progressState.dbPhaseElapsed = phaseElapsed
|
||||
progressState.dbAvgPerDB = avgPerDB
|
||||
progressState.hasUpdate = true
|
||||
// Set phase 3 start time on first callback (for realtime ETA calculation)
|
||||
if progressState.phase3StartTime.IsZero() {
|
||||
progressState.phase3StartTime = time.Now()
|
||||
}
|
||||
// Clear byte progress when switching to db progress
|
||||
progressState.bytesTotal = 0
|
||||
progressState.bytesDone = 0
|
||||
})
|
||||
|
||||
// Set up weighted (bytes-based) progress callback for accurate cluster restore progress
|
||||
engine.SetDatabaseProgressByBytesCallback(func(bytesDone, bytesTotal int64, dbName string, dbDone, dbTotal int) {
|
||||
progressState.mu.Lock()
|
||||
defer progressState.mu.Unlock()
|
||||
progressState.dbBytesDone = bytesDone
|
||||
progressState.dbBytesTotal = bytesTotal
|
||||
progressState.dbDone = dbDone
|
||||
progressState.dbTotal = dbTotal
|
||||
progressState.currentDB = dbName
|
||||
progressState.overallPhase = 3
|
||||
progressState.extractionDone = true
|
||||
progressState.hasUpdate = true
|
||||
// Set phase 3 start time on first callback (for realtime ETA calculation)
|
||||
if progressState.phase3StartTime.IsZero() {
|
||||
progressState.phase3StartTime = time.Now()
|
||||
}
|
||||
})
|
||||
|
||||
// Store progress state in a package-level variable for the ticker to access
|
||||
// This is a workaround because tea messages can't be sent from callbacks
|
||||
setCurrentRestoreProgress(progressState)
|
||||
defer clearCurrentRestoreProgress()
|
||||
|
||||
// Enable debug logging if requested
|
||||
if saveDebugLog {
|
||||
// Generate debug log path using configured WorkDir
|
||||
@ -165,13 +426,13 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
|
||||
log.Info("Debug logging enabled", "path", debugLogPath)
|
||||
}
|
||||
|
||||
// Set up progress callback (but it won't work in goroutine - progress is already sent via logs)
|
||||
// The TUI will just use spinner animation to show activity
|
||||
|
||||
// STEP 3: Execute restore based on type
|
||||
var restoreErr error
|
||||
if restoreType == "restore-cluster" {
|
||||
restoreErr = engine.RestoreCluster(ctx, archive.Path)
|
||||
} else if restoreType == "restore-cluster-single" {
|
||||
// Restore single database from cluster backup
|
||||
restoreErr = engine.RestoreSingleFromCluster(ctx, archive.Path, targetDB, targetDB, cleanFirst, createIfMissing)
|
||||
} else {
|
||||
restoreErr = engine.RestoreSingle(ctx, archive.Path, targetDB, cleanFirst, createIfMissing)
|
||||
}
|
||||
@ -187,6 +448,8 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
|
||||
result := fmt.Sprintf("Successfully restored from %s", archive.Name)
|
||||
if restoreType == "restore-single" {
|
||||
result = fmt.Sprintf("Successfully restored '%s' from %s", targetDB, archive.Name)
|
||||
} else if restoreType == "restore-cluster-single" {
|
||||
result = fmt.Sprintf("Successfully restored '%s' from cluster %s", targetDB, archive.Name)
|
||||
} else if restoreType == "restore-cluster" && cleanClusterFirst {
|
||||
result = fmt.Sprintf("Successfully restored cluster from %s (cleaned %d existing database(s) first)", archive.Name, len(existingDBs))
|
||||
}
|
||||
@ -206,39 +469,91 @@ func (m RestoreExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
m.spinnerFrame = (m.spinnerFrame + 1) % len(m.spinnerFrames)
|
||||
m.elapsed = time.Since(m.startTime)
|
||||
|
||||
// Update status based on elapsed time to show progress
|
||||
// This provides visual feedback even though we don't have real-time progress
|
||||
elapsedSec := int(m.elapsed.Seconds())
|
||||
// Poll shared progress state for real-time updates
|
||||
// Note: dbPhaseElapsed is now calculated in realtime inside getCurrentRestoreProgress()
|
||||
bytesTotal, bytesDone, description, hasUpdate, dbTotal, dbDone, speed, dbPhaseElapsed, dbAvgPerDB, currentDB, overallPhase, extractionDone, dbBytesTotal, dbBytesDone, _ := getCurrentRestoreProgress()
|
||||
if hasUpdate && bytesTotal > 0 && !extractionDone {
|
||||
// Phase 1: Extraction
|
||||
m.bytesTotal = bytesTotal
|
||||
m.bytesDone = bytesDone
|
||||
m.description = description
|
||||
m.showBytes = true
|
||||
m.speed = speed
|
||||
m.overallPhase = 1
|
||||
m.extractionDone = false
|
||||
|
||||
if elapsedSec < 2 {
|
||||
m.status = "Initializing restore..."
|
||||
m.phase = "Starting"
|
||||
} else if elapsedSec < 5 {
|
||||
if m.cleanClusterFirst && len(m.existingDBs) > 0 {
|
||||
m.status = fmt.Sprintf("Cleaning %d existing database(s)...", len(m.existingDBs))
|
||||
m.phase = "Cleanup"
|
||||
} else if m.restoreType == "restore-cluster" {
|
||||
m.status = "Extracting cluster archive..."
|
||||
m.phase = "Extraction"
|
||||
// Update status to reflect actual progress
|
||||
m.status = description
|
||||
m.phase = "Phase 1/3: Extracting Archive"
|
||||
m.progress = int((bytesDone * 100) / bytesTotal)
|
||||
} else if hasUpdate && dbTotal > 0 {
|
||||
// Phase 3: Database restores
|
||||
m.dbTotal = dbTotal
|
||||
m.dbDone = dbDone
|
||||
m.dbPhaseElapsed = dbPhaseElapsed
|
||||
m.dbAvgPerDB = dbAvgPerDB
|
||||
m.currentDB = currentDB
|
||||
m.overallPhase = overallPhase
|
||||
m.extractionDone = extractionDone
|
||||
m.showBytes = false
|
||||
|
||||
if dbDone < dbTotal {
|
||||
m.status = fmt.Sprintf("Restoring: %s", currentDB)
|
||||
} else {
|
||||
m.status = "Preparing restore..."
|
||||
m.phase = "Preparation"
|
||||
m.status = "Finalizing..."
|
||||
}
|
||||
} else if elapsedSec < 10 {
|
||||
if m.restoreType == "restore-cluster" {
|
||||
m.status = "Restoring global objects..."
|
||||
m.phase = "Globals"
|
||||
|
||||
// Use weighted progress by bytes if available, otherwise use count
|
||||
if dbBytesTotal > 0 {
|
||||
weightedPercent := int((dbBytesDone * 100) / dbBytesTotal)
|
||||
m.phase = fmt.Sprintf("Phase 3/3: Databases (%d/%d) - %.1f%% by size", dbDone, dbTotal, float64(dbBytesDone*100)/float64(dbBytesTotal))
|
||||
m.progress = weightedPercent
|
||||
} else {
|
||||
m.status = fmt.Sprintf("Restoring database '%s'...", m.targetDB)
|
||||
m.phase = "Restore"
|
||||
m.phase = fmt.Sprintf("Phase 3/3: Databases (%d/%d)", dbDone, dbTotal)
|
||||
m.progress = int((dbDone * 100) / dbTotal)
|
||||
}
|
||||
} else if hasUpdate && extractionDone && dbTotal == 0 {
|
||||
// Phase 2: Globals restore (brief phase between extraction and databases)
|
||||
m.overallPhase = 2
|
||||
m.extractionDone = true
|
||||
m.showBytes = false
|
||||
m.status = "Restoring global objects (roles, tablespaces)..."
|
||||
m.phase = "Phase 2/3: Restoring Globals"
|
||||
} else {
|
||||
if m.restoreType == "restore-cluster" {
|
||||
m.status = "Restoring cluster databases..."
|
||||
m.phase = "Restore"
|
||||
// Fallback: Update status based on elapsed time to show progress
|
||||
// This provides visual feedback even though we don't have real-time progress
|
||||
elapsedSec := int(m.elapsed.Seconds())
|
||||
|
||||
if elapsedSec < 2 {
|
||||
m.status = "Initializing restore..."
|
||||
m.phase = "Starting"
|
||||
} else if elapsedSec < 5 {
|
||||
if m.cleanClusterFirst && len(m.existingDBs) > 0 {
|
||||
m.status = fmt.Sprintf("Cleaning %d existing database(s)...", len(m.existingDBs))
|
||||
m.phase = "Cleanup"
|
||||
} else if m.restoreType == "restore-cluster" {
|
||||
m.status = "Extracting cluster archive..."
|
||||
m.phase = "Extraction"
|
||||
} else {
|
||||
m.status = "Preparing restore..."
|
||||
m.phase = "Preparation"
|
||||
}
|
||||
} else if elapsedSec < 10 {
|
||||
if m.restoreType == "restore-cluster" {
|
||||
m.status = "Restoring global objects..."
|
||||
m.phase = "Globals"
|
||||
} else {
|
||||
m.status = fmt.Sprintf("Restoring database '%s'...", m.targetDB)
|
||||
m.phase = "Restore"
|
||||
}
|
||||
} else {
|
||||
m.status = fmt.Sprintf("Restoring database '%s'...", m.targetDB)
|
||||
m.phase = "Restore"
|
||||
if m.restoreType == "restore-cluster" {
|
||||
m.status = "Restoring cluster databases..."
|
||||
m.phase = "Restore"
|
||||
} else {
|
||||
m.status = fmt.Sprintf("Restoring database '%s'...", m.targetDB)
|
||||
m.phase = "Restore"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@ -250,6 +565,15 @@ func (m RestoreExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
m.status = msg.status
|
||||
m.phase = msg.phase
|
||||
m.progress = msg.progress
|
||||
|
||||
// Update byte-level progress if available
|
||||
if msg.bytesTotal > 0 {
|
||||
m.bytesTotal = msg.bytesTotal
|
||||
m.bytesDone = msg.bytesDone
|
||||
m.description = msg.description
|
||||
m.showBytes = true
|
||||
}
|
||||
|
||||
if msg.detail != "" {
|
||||
m.details = append(m.details, msg.detail)
|
||||
// Keep only last 5 details
|
||||
@ -279,6 +603,21 @@ func (m RestoreExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
}
|
||||
return m, nil
|
||||
|
||||
case tea.InterruptMsg:
|
||||
// Handle Ctrl+C signal (SIGINT) - Bubbletea v1.3+ sends this instead of KeyMsg for ctrl+c
|
||||
if !m.done && !m.cancelling {
|
||||
m.cancelling = true
|
||||
m.status = "[STOP] Cancelling restore... (please wait)"
|
||||
m.phase = "Cancelling"
|
||||
if m.cancel != nil {
|
||||
m.cancel()
|
||||
}
|
||||
return m, nil
|
||||
} else if m.done {
|
||||
return m.parent, tea.Quit
|
||||
}
|
||||
return m, nil
|
||||
|
||||
case tea.KeyMsg:
|
||||
switch msg.String() {
|
||||
case "ctrl+c", "esc":
|
||||
@ -324,50 +663,179 @@ func (m RestoreExecutionModel) View() string {
|
||||
title := "[EXEC] Restoring Database"
|
||||
if m.restoreType == "restore-cluster" {
|
||||
title = "[EXEC] Restoring Cluster"
|
||||
} else if m.restoreType == "restore-cluster-single" {
|
||||
title = "[EXEC] Restoring Single Database from Cluster"
|
||||
}
|
||||
s.WriteString(titleStyle.Render(title))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Archive info
|
||||
s.WriteString(fmt.Sprintf("Archive: %s\n", m.archive.Name))
|
||||
if m.restoreType == "restore-single" {
|
||||
if m.restoreType == "restore-single" || m.restoreType == "restore-cluster-single" {
|
||||
s.WriteString(fmt.Sprintf("Target: %s\n", m.targetDB))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
|
||||
if m.done {
|
||||
// Show result
|
||||
// Show result with comprehensive summary
|
||||
if m.err != nil {
|
||||
s.WriteString(errorStyle.Render("[FAIL] Restore Failed"))
|
||||
s.WriteString(errorStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("║ [FAIL] RESTORE FAILED ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(errorStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(errorStyle.Render(fmt.Sprintf("Error: %v", m.err)))
|
||||
|
||||
// Parse and display error in a clean, structured format
|
||||
errStr := m.err.Error()
|
||||
|
||||
// Extract key parts from the error message
|
||||
errDisplay := formatRestoreError(errStr)
|
||||
s.WriteString(errDisplay)
|
||||
s.WriteString("\n")
|
||||
} else {
|
||||
s.WriteString(successStyle.Render("[OK] Restore Completed Successfully"))
|
||||
s.WriteString(successStyle.Render("╔══════════════════════════════════════════════════════════════╗"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("║ [OK] RESTORE COMPLETED SUCCESSFULLY ║"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(successStyle.Render("╚══════════════════════════════════════════════════════════════╝"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(successStyle.Render(m.result))
|
||||
|
||||
// Summary section
|
||||
s.WriteString(infoStyle.Render(" ─── Summary ───────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Archive info
|
||||
s.WriteString(fmt.Sprintf(" Archive: %s\n", m.archive.Name))
|
||||
if m.archive.Size > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Archive Size: %s\n", FormatBytes(m.archive.Size)))
|
||||
}
|
||||
|
||||
// Restore type specific info
|
||||
if m.restoreType == "restore-cluster" {
|
||||
s.WriteString(fmt.Sprintf(" Type: Cluster Restore\n"))
|
||||
if m.dbTotal > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Databases: %d restored\n", m.dbTotal))
|
||||
}
|
||||
if m.cleanClusterFirst && len(m.existingDBs) > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Cleaned: %d existing database(s) dropped\n", len(m.existingDBs)))
|
||||
}
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" Type: Single Database Restore\n"))
|
||||
s.WriteString(fmt.Sprintf(" Target DB: %s\n", m.targetDB))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
s.WriteString(fmt.Sprintf("\nElapsed Time: %s\n", formatDuration(m.elapsed)))
|
||||
// Timing section
|
||||
s.WriteString(infoStyle.Render(" ─── Timing ────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(fmt.Sprintf(" Total Time: %s\n", formatDuration(m.elapsed)))
|
||||
|
||||
// Calculate and show throughput if we have size info
|
||||
if m.archive.Size > 0 && m.elapsed.Seconds() > 0 {
|
||||
throughput := float64(m.archive.Size) / m.elapsed.Seconds()
|
||||
s.WriteString(fmt.Sprintf(" Throughput: %s/s (average)\n", FormatBytes(int64(throughput))))
|
||||
}
|
||||
|
||||
if m.dbTotal > 0 && m.err == nil {
|
||||
avgPerDB := m.elapsed / time.Duration(m.dbTotal)
|
||||
s.WriteString(fmt.Sprintf(" Avg per DB: %s\n", formatDuration(avgPerDB)))
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render("[KEYS] Press Enter to continue"))
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(infoStyle.Render(" [KEYS] Press Enter to continue"))
|
||||
} else {
|
||||
// Show progress
|
||||
s.WriteString(fmt.Sprintf("Phase: %s\n", m.phase))
|
||||
// Show unified progress for cluster restore
|
||||
if m.restoreType == "restore-cluster" {
|
||||
// Calculate overall progress across all phases
|
||||
// Phase 1: Extraction (0-60%)
|
||||
// Phase 2: Globals (60-65%)
|
||||
// Phase 3: Databases (65-100%)
|
||||
overallProgress := 0
|
||||
phaseLabel := "Starting..."
|
||||
|
||||
// Show status with rotating spinner (unified indicator for all operations)
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("Status: %s %s\n", spinner, m.status))
|
||||
s.WriteString("\n")
|
||||
if m.showBytes && m.bytesTotal > 0 {
|
||||
// Phase 1: Extraction - contributes 0-60%
|
||||
extractPct := int((m.bytesDone * 100) / m.bytesTotal)
|
||||
overallProgress = (extractPct * 60) / 100
|
||||
phaseLabel = "Phase 1/3: Extracting Archive"
|
||||
} else if m.extractionDone && m.dbTotal == 0 {
|
||||
// Phase 2: Globals restore
|
||||
overallProgress = 62
|
||||
phaseLabel = "Phase 2/3: Restoring Globals"
|
||||
} else if m.dbTotal > 0 {
|
||||
// Phase 3: Database restores - contributes 65-100%
|
||||
dbPct := int((int64(m.dbDone) * 100) / int64(m.dbTotal))
|
||||
overallProgress = 65 + (dbPct * 35 / 100)
|
||||
phaseLabel = fmt.Sprintf("Phase 3/3: Databases (%d/%d)", m.dbDone, m.dbTotal)
|
||||
}
|
||||
|
||||
// Header with phase and overall progress
|
||||
s.WriteString(infoStyle.Render(" ─── Cluster Restore Progress ─────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(fmt.Sprintf(" %s\n\n", phaseLabel))
|
||||
|
||||
// Overall progress bar
|
||||
s.WriteString(" Overall: ")
|
||||
s.WriteString(renderProgressBar(overallProgress))
|
||||
s.WriteString(fmt.Sprintf(" %d%%\n", overallProgress))
|
||||
|
||||
// Phase-specific details
|
||||
if m.showBytes && m.bytesTotal > 0 {
|
||||
// Show extraction details
|
||||
s.WriteString("\n")
|
||||
s.WriteString(fmt.Sprintf(" %s\n", m.status))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(renderDetailedProgressBarWithSpeed(m.bytesDone, m.bytesTotal, m.speed))
|
||||
s.WriteString("\n")
|
||||
} else if m.dbTotal > 0 {
|
||||
// Show current database being restored
|
||||
s.WriteString("\n")
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
if m.currentDB != "" && m.dbDone < m.dbTotal {
|
||||
s.WriteString(fmt.Sprintf(" Current: %s %s\n", spinner, m.currentDB))
|
||||
} else if m.dbDone >= m.dbTotal {
|
||||
s.WriteString(fmt.Sprintf(" %s Finalizing...\n", spinner))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
|
||||
// Database progress bar with timing
|
||||
s.WriteString(renderDatabaseProgressBarWithTiming(m.dbDone, m.dbTotal, m.dbPhaseElapsed, m.dbAvgPerDB))
|
||||
s.WriteString("\n")
|
||||
} else {
|
||||
// Intermediate phase (globals)
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("\n %s %s\n\n", spinner, m.status))
|
||||
}
|
||||
|
||||
// Only show progress bar for single database restore
|
||||
// Cluster restore uses spinner only (consistent with CLI behavior)
|
||||
if m.restoreType == "restore-single" {
|
||||
progressBar := renderProgressBar(m.progress)
|
||||
s.WriteString(progressBar)
|
||||
s.WriteString(fmt.Sprintf(" %d%%\n", m.progress))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ───────────────────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
} else {
|
||||
// Single database restore - simpler display
|
||||
s.WriteString(fmt.Sprintf("Phase: %s\n", m.phase))
|
||||
|
||||
// Show detailed progress bar when we have byte-level information
|
||||
if m.showBytes && m.bytesTotal > 0 {
|
||||
s.WriteString(fmt.Sprintf("Status: %s\n", m.status))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(renderDetailedProgressBarWithSpeed(m.bytesDone, m.bytesTotal, m.speed))
|
||||
s.WriteString("\n\n")
|
||||
} else {
|
||||
spinner := m.spinnerFrames[m.spinnerFrame]
|
||||
s.WriteString(fmt.Sprintf("Status: %s %s\n", spinner, m.status))
|
||||
s.WriteString("\n")
|
||||
|
||||
// Fallback to simple progress bar
|
||||
progressBar := renderProgressBar(m.progress)
|
||||
s.WriteString(progressBar)
|
||||
s.WriteString(fmt.Sprintf(" %d%%\n", m.progress))
|
||||
s.WriteString("\n")
|
||||
}
|
||||
}
|
||||
|
||||
// Elapsed time
|
||||
@ -390,6 +858,141 @@ func renderProgressBar(percent int) string {
|
||||
return successStyle.Render(bar) + infoStyle.Render(empty)
|
||||
}
|
||||
|
||||
// renderDetailedProgressBar renders a schollz-style progress bar with bytes, speed, and ETA
|
||||
// Uses elapsed time for speed calculation (fallback)
|
||||
func renderDetailedProgressBar(done, total int64, elapsed time.Duration) string {
|
||||
speed := 0.0
|
||||
if elapsed.Seconds() > 0 {
|
||||
speed = float64(done) / elapsed.Seconds()
|
||||
}
|
||||
return renderDetailedProgressBarWithSpeed(done, total, speed)
|
||||
}
|
||||
|
||||
// renderDetailedProgressBarWithSpeed renders a schollz-style progress bar with pre-calculated rolling speed
|
||||
func renderDetailedProgressBarWithSpeed(done, total int64, speed float64) string {
|
||||
var s strings.Builder
|
||||
|
||||
// Calculate percentage
|
||||
percent := 0
|
||||
if total > 0 {
|
||||
percent = int((done * 100) / total)
|
||||
if percent > 100 {
|
||||
percent = 100
|
||||
}
|
||||
}
|
||||
|
||||
// Render progress bar
|
||||
width := 30
|
||||
filled := (percent * width) / 100
|
||||
barFilled := strings.Repeat("█", filled)
|
||||
barEmpty := strings.Repeat("░", width-filled)
|
||||
|
||||
s.WriteString(successStyle.Render("["))
|
||||
s.WriteString(successStyle.Render(barFilled))
|
||||
s.WriteString(infoStyle.Render(barEmpty))
|
||||
s.WriteString(successStyle.Render("]"))
|
||||
|
||||
// Percentage
|
||||
s.WriteString(fmt.Sprintf(" %3d%%", percent))
|
||||
|
||||
// Bytes progress
|
||||
s.WriteString(fmt.Sprintf(" %s / %s", FormatBytes(done), FormatBytes(total)))
|
||||
|
||||
// Speed display (using rolling window speed)
|
||||
if speed > 0 {
|
||||
s.WriteString(fmt.Sprintf(" %s/s", FormatBytes(int64(speed))))
|
||||
|
||||
// ETA calculation based on rolling speed
|
||||
if done < total {
|
||||
remaining := total - done
|
||||
etaSeconds := float64(remaining) / speed
|
||||
eta := time.Duration(etaSeconds) * time.Second
|
||||
s.WriteString(fmt.Sprintf(" ETA: %s", FormatDurationShort(eta)))
|
||||
}
|
||||
}
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
// renderDatabaseProgressBar renders a progress bar for database count (cluster restore)
|
||||
func renderDatabaseProgressBar(done, total int) string {
|
||||
var s strings.Builder
|
||||
|
||||
// Calculate percentage
|
||||
percent := 0
|
||||
if total > 0 {
|
||||
percent = (done * 100) / total
|
||||
if percent > 100 {
|
||||
percent = 100
|
||||
}
|
||||
}
|
||||
|
||||
// Render progress bar
|
||||
width := 30
|
||||
filled := (percent * width) / 100
|
||||
barFilled := strings.Repeat("█", filled)
|
||||
barEmpty := strings.Repeat("░", width-filled)
|
||||
|
||||
s.WriteString(successStyle.Render("["))
|
||||
s.WriteString(successStyle.Render(barFilled))
|
||||
s.WriteString(infoStyle.Render(barEmpty))
|
||||
s.WriteString(successStyle.Render("]"))
|
||||
|
||||
// Count and percentage
|
||||
s.WriteString(fmt.Sprintf(" %3d%% %d / %d databases", percent, done, total))
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
// renderDatabaseProgressBarWithTiming renders a progress bar for database count with timing and ETA
|
||||
func renderDatabaseProgressBarWithTiming(done, total int, phaseElapsed, avgPerDB time.Duration) string {
|
||||
var s strings.Builder
|
||||
|
||||
// Calculate percentage
|
||||
percent := 0
|
||||
if total > 0 {
|
||||
percent = (done * 100) / total
|
||||
if percent > 100 {
|
||||
percent = 100
|
||||
}
|
||||
}
|
||||
|
||||
// Render progress bar
|
||||
width := 30
|
||||
filled := (percent * width) / 100
|
||||
barFilled := strings.Repeat("█", filled)
|
||||
barEmpty := strings.Repeat("░", width-filled)
|
||||
|
||||
s.WriteString(successStyle.Render("["))
|
||||
s.WriteString(successStyle.Render(barFilled))
|
||||
s.WriteString(infoStyle.Render(barEmpty))
|
||||
s.WriteString(successStyle.Render("]"))
|
||||
|
||||
// Count and percentage
|
||||
s.WriteString(fmt.Sprintf(" %3d%% %d / %d databases", percent, done, total))
|
||||
|
||||
// Timing and ETA
|
||||
if phaseElapsed > 0 {
|
||||
s.WriteString(fmt.Sprintf(" [%s", FormatDurationShort(phaseElapsed)))
|
||||
|
||||
// Calculate ETA based on average time per database
|
||||
if avgPerDB > 0 && done < total {
|
||||
remainingDBs := total - done
|
||||
eta := time.Duration(remainingDBs) * avgPerDB
|
||||
s.WriteString(fmt.Sprintf(" / ETA: %s", FormatDurationShort(eta)))
|
||||
} else if done > 0 && done < total {
|
||||
// Fallback: estimate ETA from overall elapsed time
|
||||
avgElapsed := phaseElapsed / time.Duration(done)
|
||||
remainingDBs := total - done
|
||||
eta := time.Duration(remainingDBs) * avgElapsed
|
||||
s.WriteString(fmt.Sprintf(" / ETA: ~%s", FormatDurationShort(eta)))
|
||||
}
|
||||
s.WriteString("]")
|
||||
}
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
// formatDuration formats duration in human readable format
|
||||
func formatDuration(d time.Duration) string {
|
||||
if d < time.Minute {
|
||||
@ -434,3 +1037,188 @@ func dropDatabaseCLI(ctx context.Context, cfg *config.Config, dbName string) err
|
||||
|
||||
return nil
|
||||
}
|
||||
|
||||
// formatRestoreError formats a restore error message for clean TUI display
|
||||
func formatRestoreError(errStr string) string {
|
||||
var s strings.Builder
|
||||
maxLineWidth := 60
|
||||
|
||||
// Common patterns to extract
|
||||
patterns := []struct {
|
||||
key string
|
||||
pattern string
|
||||
}{
|
||||
{"Error Type", "ERROR:"},
|
||||
{"Hint", "HINT:"},
|
||||
{"Last Error", "last error:"},
|
||||
{"Total Errors", "total errors:"},
|
||||
}
|
||||
|
||||
// First, try to extract a clean error summary
|
||||
errLines := strings.Split(errStr, "\n")
|
||||
|
||||
// Find the main error message (first line or first ERROR:)
|
||||
mainError := ""
|
||||
hint := ""
|
||||
totalErrors := ""
|
||||
dbsFailed := []string{}
|
||||
|
||||
for _, line := range errLines {
|
||||
line = strings.TrimSpace(line)
|
||||
if line == "" {
|
||||
continue
|
||||
}
|
||||
|
||||
// Extract ERROR messages
|
||||
if strings.Contains(line, "ERROR:") {
|
||||
if mainError == "" {
|
||||
// Get just the ERROR part
|
||||
idx := strings.Index(line, "ERROR:")
|
||||
if idx >= 0 {
|
||||
mainError = strings.TrimSpace(line[idx:])
|
||||
// Truncate if too long
|
||||
if len(mainError) > maxLineWidth {
|
||||
mainError = mainError[:maxLineWidth-3] + "..."
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Extract HINT
|
||||
if strings.Contains(line, "HINT:") {
|
||||
idx := strings.Index(line, "HINT:")
|
||||
if idx >= 0 {
|
||||
hint = strings.TrimSpace(line[idx+5:])
|
||||
if len(hint) > maxLineWidth {
|
||||
hint = hint[:maxLineWidth-3] + "..."
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Extract total errors count
|
||||
if strings.Contains(line, "total errors:") {
|
||||
idx := strings.Index(line, "total errors:")
|
||||
if idx >= 0 {
|
||||
totalErrors = strings.TrimSpace(line[idx+13:])
|
||||
// Just extract the number
|
||||
parts := strings.Fields(totalErrors)
|
||||
if len(parts) > 0 {
|
||||
totalErrors = parts[0]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Extract failed database names (for cluster restore)
|
||||
if strings.Contains(line, ": restore failed:") {
|
||||
parts := strings.SplitN(line, ":", 2)
|
||||
if len(parts) > 0 {
|
||||
dbName := strings.TrimSpace(parts[0])
|
||||
if dbName != "" && !strings.HasPrefix(dbName, "Error") {
|
||||
dbsFailed = append(dbsFailed, dbName)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// If no structured error found, use the first line
|
||||
if mainError == "" {
|
||||
firstLine := errStr
|
||||
if idx := strings.Index(errStr, "\n"); idx > 0 {
|
||||
firstLine = errStr[:idx]
|
||||
}
|
||||
if len(firstLine) > maxLineWidth*2 {
|
||||
firstLine = firstLine[:maxLineWidth*2-3] + "..."
|
||||
}
|
||||
mainError = firstLine
|
||||
}
|
||||
|
||||
// Build structured error display
|
||||
s.WriteString(infoStyle.Render(" ─── Error Details ─────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Error type detection
|
||||
errorType := "critical"
|
||||
if strings.Contains(errStr, "out of shared memory") || strings.Contains(errStr, "max_locks_per_transaction") {
|
||||
errorType = "critical"
|
||||
} else if strings.Contains(errStr, "connection") {
|
||||
errorType = "connection"
|
||||
} else if strings.Contains(errStr, "permission") || strings.Contains(errStr, "access") {
|
||||
errorType = "permission"
|
||||
}
|
||||
|
||||
s.WriteString(fmt.Sprintf(" Type: %s\n", errorType))
|
||||
s.WriteString(fmt.Sprintf(" Message: %s\n", mainError))
|
||||
|
||||
if hint != "" {
|
||||
s.WriteString(fmt.Sprintf(" Hint: %s\n", hint))
|
||||
}
|
||||
|
||||
if totalErrors != "" {
|
||||
s.WriteString(fmt.Sprintf(" Total Errors: %s\n", totalErrors))
|
||||
}
|
||||
|
||||
// Show failed databases (max 5)
|
||||
if len(dbsFailed) > 0 {
|
||||
s.WriteString("\n")
|
||||
s.WriteString(" Failed Databases:\n")
|
||||
for i, db := range dbsFailed {
|
||||
if i >= 5 {
|
||||
s.WriteString(fmt.Sprintf(" ... and %d more\n", len(dbsFailed)-5))
|
||||
break
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" • %s\n", db))
|
||||
}
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ─── Diagnosis ─────────────────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
|
||||
// Provide specific recommendations based on error
|
||||
if strings.Contains(errStr, "out of shared memory") || strings.Contains(errStr, "max_locks_per_transaction") {
|
||||
s.WriteString(errorStyle.Render(" • PostgreSQL lock table exhausted\n"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" Lock capacity = max_locks_per_transaction\n")
|
||||
s.WriteString(" × (max_connections + max_prepared_transactions)\n\n")
|
||||
s.WriteString(" If you reduced VM size or max_connections, you need higher\n")
|
||||
s.WriteString(" max_locks_per_transaction to compensate.\n\n")
|
||||
s.WriteString(successStyle.Render(" FIX OPTIONS:\n"))
|
||||
s.WriteString(" 1. Enable 'Large DB Mode' in Settings\n")
|
||||
s.WriteString(" (press 'l' to toggle, reduces parallelism, increases locks)\n\n")
|
||||
s.WriteString(" 2. Increase PostgreSQL locks:\n")
|
||||
s.WriteString(" ALTER SYSTEM SET max_locks_per_transaction = 4096;\n")
|
||||
s.WriteString(" Then RESTART PostgreSQL.\n\n")
|
||||
s.WriteString(" 3. Reduce parallel jobs:\n")
|
||||
s.WriteString(" Set Cluster Parallelism = 1 in Settings\n")
|
||||
} else if strings.Contains(errStr, "connection") || strings.Contains(errStr, "refused") {
|
||||
s.WriteString(" • Database connection failed\n\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" 1. Check database is running\n")
|
||||
s.WriteString(" 2. Verify host, port, and credentials in Settings\n")
|
||||
s.WriteString(" 3. Check firewall/network connectivity\n")
|
||||
} else if strings.Contains(errStr, "permission") || strings.Contains(errStr, "denied") {
|
||||
s.WriteString(" • Permission denied\n\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] Recommendations ────────────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" 1. Verify database user has sufficient privileges\n")
|
||||
s.WriteString(" 2. Grant CREATE/DROP DATABASE permissions if restoring cluster\n")
|
||||
s.WriteString(" 3. Check file system permissions on backup directory\n")
|
||||
} else {
|
||||
s.WriteString(" See error message above for details.\n\n")
|
||||
s.WriteString(infoStyle.Render(" ─── [HINT] General Recommendations ────────────────────────"))
|
||||
s.WriteString("\n\n")
|
||||
s.WriteString(" 1. Check the full error log for details\n")
|
||||
s.WriteString(" 2. Try restoring with 'conservative' profile (press 'c')\n")
|
||||
s.WriteString(" 3. For complex databases, enable 'Large DB Mode' (press 'l')\n")
|
||||
}
|
||||
|
||||
s.WriteString("\n")
|
||||
|
||||
// Suppress the pattern variable since we don't use it but defined it
|
||||
_ = patterns
|
||||
|
||||
return s.String()
|
||||
}
|
||||
|
||||
@ -55,11 +55,13 @@ type RestorePreviewModel struct {
|
||||
cleanClusterFirst bool // For cluster restore: drop all user databases first
|
||||
existingDBCount int // Number of existing user databases
|
||||
existingDBs []string // List of existing user databases
|
||||
existingDBError string // Error message if database listing failed
|
||||
safetyChecks []SafetyCheck
|
||||
checking bool
|
||||
canProceed bool
|
||||
message string
|
||||
saveDebugLog bool // Save detailed error report on failure
|
||||
debugLocks bool // Enable detailed lock debugging
|
||||
workDir string // Custom work directory for extraction
|
||||
}
|
||||
|
||||
@ -102,6 +104,7 @@ type safetyCheckCompleteMsg struct {
|
||||
canProceed bool
|
||||
existingDBCount int
|
||||
existingDBs []string
|
||||
existingDBError string
|
||||
}
|
||||
|
||||
func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string) tea.Cmd {
|
||||
@ -221,10 +224,12 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
|
||||
check = SafetyCheck{Name: "Existing databases", Status: "checking", Critical: false}
|
||||
|
||||
// Get list of existing user databases (exclude templates and system DBs)
|
||||
var existingDBError string
|
||||
dbList, err := safety.ListUserDatabases(ctx)
|
||||
if err != nil {
|
||||
check.Status = "warning"
|
||||
check.Message = fmt.Sprintf("Cannot list databases: %v", err)
|
||||
existingDBError = err.Error()
|
||||
} else {
|
||||
existingDBCount = len(dbList)
|
||||
existingDBs = dbList
|
||||
@ -238,6 +243,14 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
|
||||
}
|
||||
}
|
||||
checks = append(checks, check)
|
||||
|
||||
return safetyCheckCompleteMsg{
|
||||
checks: checks,
|
||||
canProceed: canProceed,
|
||||
existingDBCount: existingDBCount,
|
||||
existingDBs: existingDBs,
|
||||
existingDBError: existingDBError,
|
||||
}
|
||||
}
|
||||
|
||||
return safetyCheckCompleteMsg{
|
||||
@ -257,6 +270,7 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
m.canProceed = msg.canProceed
|
||||
m.existingDBCount = msg.existingDBCount
|
||||
m.existingDBs = msg.existingDBs
|
||||
m.existingDBError = msg.existingDBError
|
||||
// Auto-forward in auto-confirm mode
|
||||
if m.config.TUIAutoConfirm {
|
||||
return m.parent, tea.Quit
|
||||
@ -275,10 +289,17 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
|
||||
case "c":
|
||||
if m.mode == "restore-cluster" {
|
||||
// Toggle cluster cleanup
|
||||
// Toggle cluster cleanup - databases will be re-detected at execution time
|
||||
m.cleanClusterFirst = !m.cleanClusterFirst
|
||||
if m.cleanClusterFirst {
|
||||
m.message = checkWarningStyle.Render(fmt.Sprintf("[WARN] Will drop %d existing database(s) before restore", m.existingDBCount))
|
||||
if m.existingDBError != "" {
|
||||
// Detection failed in preview - will re-detect at execution
|
||||
m.message = checkWarningStyle.Render("[WARN] Will clean existing databases before restore (detection pending)")
|
||||
} else if m.existingDBCount > 0 {
|
||||
m.message = checkWarningStyle.Render(fmt.Sprintf("[WARN] Will drop %d existing database(s) before restore", m.existingDBCount))
|
||||
} else {
|
||||
m.message = infoStyle.Render("[INFO] Cleanup enabled (no databases currently detected)")
|
||||
}
|
||||
} else {
|
||||
m.message = fmt.Sprintf("Clean cluster first: disabled")
|
||||
}
|
||||
@ -297,6 +318,15 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
m.message = "Debug log: disabled"
|
||||
}
|
||||
|
||||
case "l":
|
||||
// Toggle lock debugging
|
||||
m.debugLocks = !m.debugLocks
|
||||
if m.debugLocks {
|
||||
m.message = infoStyle.Render("🔍 [LOCK-DEBUG] Lock debugging: ENABLED (captures PostgreSQL lock config, Guard decisions, boost attempts)")
|
||||
} else {
|
||||
m.message = "Lock debugging: disabled"
|
||||
}
|
||||
|
||||
case "w":
|
||||
// Toggle/set work directory
|
||||
if m.workDir == "" {
|
||||
@ -326,7 +356,10 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// Proceed to restore execution
|
||||
// Proceed to restore execution (enable lock debugging in Config)
|
||||
if m.debugLocks {
|
||||
m.config.DebugLocks = true
|
||||
}
|
||||
exec := NewRestoreExecution(m.config, m.logger, m.parent, m.ctx, m.archive, m.targetDB, m.cleanFirst, m.createIfMissing, m.mode, m.cleanClusterFirst, m.existingDBs, m.saveDebugLog, m.workDir)
|
||||
return exec, exec.Init()
|
||||
}
|
||||
@ -382,7 +415,27 @@ func (m RestorePreviewModel) View() string {
|
||||
s.WriteString("\n")
|
||||
s.WriteString(fmt.Sprintf(" Host: %s:%d\n", m.config.Host, m.config.Port))
|
||||
|
||||
if m.existingDBCount > 0 {
|
||||
// Show Resource Profile and CPU Workload settings
|
||||
profile := m.config.GetCurrentProfile()
|
||||
if profile != nil {
|
||||
s.WriteString(fmt.Sprintf(" Resource Profile: %s (Parallel:%d, Jobs:%d)\n",
|
||||
profile.Name, profile.ClusterParallelism, profile.Jobs))
|
||||
} else {
|
||||
s.WriteString(fmt.Sprintf(" Resource Profile: %s\n", m.config.ResourceProfile))
|
||||
}
|
||||
// Show Large DB Mode status
|
||||
if m.config.LargeDBMode {
|
||||
s.WriteString(" Large DB Mode: ON (reduced parallelism, high locks)\n")
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" CPU Workload: %s\n", m.config.CPUWorkloadType))
|
||||
s.WriteString(fmt.Sprintf(" Cluster Parallelism: %d databases\n", m.config.ClusterParallelism))
|
||||
|
||||
if m.existingDBError != "" {
|
||||
// Show warning when database listing failed - but still allow cleanup toggle
|
||||
s.WriteString(checkWarningStyle.Render(" Existing Databases: Detection failed\n"))
|
||||
s.WriteString(infoStyle.Render(fmt.Sprintf(" (%s)\n", m.existingDBError)))
|
||||
s.WriteString(infoStyle.Render(" (Will re-detect at restore time)\n"))
|
||||
} else if m.existingDBCount > 0 {
|
||||
s.WriteString(fmt.Sprintf(" Existing Databases: %d found\n", m.existingDBCount))
|
||||
|
||||
// Show first few database names
|
||||
@ -395,17 +448,20 @@ func (m RestorePreviewModel) View() string {
|
||||
}
|
||||
s.WriteString(fmt.Sprintf(" - %s\n", db))
|
||||
}
|
||||
|
||||
cleanIcon := "[N]"
|
||||
cleanStyle := infoStyle
|
||||
if m.cleanClusterFirst {
|
||||
cleanIcon = "[Y]"
|
||||
cleanStyle = checkWarningStyle
|
||||
}
|
||||
s.WriteString(cleanStyle.Render(fmt.Sprintf(" Clean All First: %s %v (press 'c' to toggle)\n", cleanIcon, m.cleanClusterFirst)))
|
||||
} else {
|
||||
s.WriteString(" Existing Databases: None (clean slate)\n")
|
||||
}
|
||||
|
||||
// Always show cleanup toggle for cluster restore
|
||||
cleanIcon := "[N]"
|
||||
cleanStyle := infoStyle
|
||||
if m.cleanClusterFirst {
|
||||
cleanIcon := "[Y]"
|
||||
cleanStyle = checkWarningStyle
|
||||
s.WriteString(cleanStyle.Render(fmt.Sprintf(" Clean All First: %s enabled (press 'c' to toggle)\n", cleanIcon)))
|
||||
} else {
|
||||
s.WriteString(cleanStyle.Render(fmt.Sprintf(" Clean All First: %s disabled (press 'c' to toggle)\n", cleanIcon)))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
@ -453,10 +509,18 @@ func (m RestorePreviewModel) View() string {
|
||||
s.WriteString(infoStyle.Render(" All existing data in target database will be dropped!"))
|
||||
s.WriteString("\n\n")
|
||||
}
|
||||
if m.cleanClusterFirst && m.existingDBCount > 0 {
|
||||
if m.cleanClusterFirst {
|
||||
s.WriteString(checkWarningStyle.Render("[DANGER] WARNING: Cluster cleanup enabled"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(checkWarningStyle.Render(fmt.Sprintf(" %d existing database(s) will be DROPPED before restore!", m.existingDBCount)))
|
||||
if m.existingDBError != "" {
|
||||
s.WriteString(checkWarningStyle.Render(" Existing databases will be DROPPED before restore!"))
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" (Database count will be detected at restore time)"))
|
||||
} else if m.existingDBCount > 0 {
|
||||
s.WriteString(checkWarningStyle.Render(fmt.Sprintf(" %d existing database(s) will be DROPPED before restore!", m.existingDBCount)))
|
||||
} else {
|
||||
s.WriteString(infoStyle.Render(" No databases currently detected - cleanup will verify at restore time"))
|
||||
}
|
||||
s.WriteString("\n")
|
||||
s.WriteString(infoStyle.Render(" This ensures a clean disaster recovery scenario"))
|
||||
s.WriteString("\n\n")
|
||||
@ -495,6 +559,20 @@ func (m RestorePreviewModel) View() string {
|
||||
s.WriteString(infoStyle.Render(fmt.Sprintf(" Saves detailed error report to %s on failure", m.config.GetEffectiveWorkDir())))
|
||||
s.WriteString("\n")
|
||||
}
|
||||
|
||||
// Lock debugging option
|
||||
lockDebugIcon := "[-]"
|
||||
lockDebugStyle := infoStyle
|
||||
if m.debugLocks {
|
||||
lockDebugIcon = "[🔍]"
|
||||
lockDebugStyle = checkPassedStyle
|
||||
}
|
||||
s.WriteString(lockDebugStyle.Render(fmt.Sprintf(" %s Lock Debug: %v (press 'l' to toggle)", lockDebugIcon, m.debugLocks)))
|
||||
s.WriteString("\n")
|
||||
if m.debugLocks {
|
||||
s.WriteString(infoStyle.Render(" Captures PostgreSQL lock config, Guard decisions, boost attempts"))
|
||||
s.WriteString("\n")
|
||||
}
|
||||
s.WriteString("\n")
|
||||
|
||||
// Message
|
||||
@ -510,10 +588,10 @@ func (m RestorePreviewModel) View() string {
|
||||
s.WriteString(successStyle.Render("[OK] Ready to restore"))
|
||||
s.WriteString("\n")
|
||||
if m.mode == "restore-single" {
|
||||
s.WriteString(infoStyle.Render("t: Clean-first | c: Create | w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
|
||||
s.WriteString(infoStyle.Render("t: Clean-first | c: Create | w: WorkDir | d: Debug | l: LockDebug | Enter: Proceed | Esc: Cancel"))
|
||||
} else if m.mode == "restore-cluster" {
|
||||
if m.existingDBCount > 0 {
|
||||
s.WriteString(infoStyle.Render("c: Cleanup | w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
|
||||
s.WriteString(infoStyle.Render("c: Cleanup | w: WorkDir | d: Debug | l: LockDebug | Enter: Proceed | Esc: Cancel"))
|
||||
} else {
|
||||
s.WriteString(infoStyle.Render("w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
|
||||
}
|
||||
|
||||
@ -10,6 +10,7 @@ import (
|
||||
"github.com/charmbracelet/lipgloss"
|
||||
|
||||
"dbbackup/internal/config"
|
||||
"dbbackup/internal/cpu"
|
||||
"dbbackup/internal/logger"
|
||||
)
|
||||
|
||||
@ -101,6 +102,65 @@ func NewSettingsModel(cfg *config.Config, log logger.Logger, parent tea.Model) S
|
||||
Type: "selector",
|
||||
Description: "CPU workload profile (press Enter to cycle: Balanced → CPU-Intensive → I/O-Intensive)",
|
||||
},
|
||||
{
|
||||
Key: "resource_profile",
|
||||
DisplayName: "Resource Profile",
|
||||
Value: func(c *config.Config) string {
|
||||
profile := c.GetCurrentProfile()
|
||||
if profile != nil {
|
||||
return fmt.Sprintf("%s (P:%d J:%d)", profile.Name, profile.ClusterParallelism, profile.Jobs)
|
||||
}
|
||||
return c.ResourceProfile
|
||||
},
|
||||
Update: func(c *config.Config, v string) error {
|
||||
profiles := []string{"conservative", "balanced", "performance", "max-performance"}
|
||||
currentIdx := 0
|
||||
for i, p := range profiles {
|
||||
if c.ResourceProfile == p {
|
||||
currentIdx = i
|
||||
break
|
||||
}
|
||||
}
|
||||
nextIdx := (currentIdx + 1) % len(profiles)
|
||||
return c.ApplyResourceProfile(profiles[nextIdx])
|
||||
},
|
||||
Type: "selector",
|
||||
Description: "Resource profile for VM capacity. Toggle 'l' for Large DB Mode on any profile.",
|
||||
},
|
||||
{
|
||||
Key: "large_db_mode",
|
||||
DisplayName: "Large DB Mode",
|
||||
Value: func(c *config.Config) string {
|
||||
if c.LargeDBMode {
|
||||
return "ON (↓parallelism, ↑locks)"
|
||||
}
|
||||
return "OFF"
|
||||
},
|
||||
Update: func(c *config.Config, v string) error {
|
||||
c.LargeDBMode = !c.LargeDBMode
|
||||
return nil
|
||||
},
|
||||
Type: "selector",
|
||||
Description: "Enable for databases with many tables/LOBs. Reduces parallelism, increases max_locks_per_transaction.",
|
||||
},
|
||||
{
|
||||
Key: "cluster_parallelism",
|
||||
DisplayName: "Cluster Parallelism",
|
||||
Value: func(c *config.Config) string { return fmt.Sprintf("%d", c.ClusterParallelism) },
|
||||
Update: func(c *config.Config, v string) error {
|
||||
val, err := strconv.Atoi(v)
|
||||
if err != nil {
|
||||
return fmt.Errorf("cluster parallelism must be a number")
|
||||
}
|
||||
if val < 1 {
|
||||
return fmt.Errorf("cluster parallelism must be at least 1")
|
||||
}
|
||||
c.ClusterParallelism = val
|
||||
return nil
|
||||
},
|
||||
Type: "int",
|
||||
Description: "Concurrent databases during cluster backup/restore (1=sequential, safer for large DBs)",
|
||||
},
|
||||
{
|
||||
Key: "backup_dir",
|
||||
DisplayName: "Backup Directory",
|
||||
@ -528,12 +588,70 @@ func (m SettingsModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
|
||||
|
||||
case "s":
|
||||
return m.saveSettings()
|
||||
|
||||
case "l":
|
||||
// Quick shortcut: Toggle Large DB Mode
|
||||
return m.toggleLargeDBMode()
|
||||
|
||||
case "c":
|
||||
// Quick shortcut: Apply "conservative" profile for constrained VMs
|
||||
return m.applyConservativeProfile()
|
||||
|
||||
case "p":
|
||||
// Show profile recommendation
|
||||
return m.showProfileRecommendation()
|
||||
}
|
||||
}
|
||||
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// toggleLargeDBMode toggles the Large DB Mode flag
|
||||
func (m SettingsModel) toggleLargeDBMode() (tea.Model, tea.Cmd) {
|
||||
m.config.LargeDBMode = !m.config.LargeDBMode
|
||||
if m.config.LargeDBMode {
|
||||
profile := m.config.GetCurrentProfile()
|
||||
m.message = successStyle.Render(fmt.Sprintf(
|
||||
"[ON] Large DB Mode enabled: %s → Parallel=%d, Jobs=%d, MaxLocks=%d",
|
||||
profile.Name, profile.ClusterParallelism, profile.Jobs, profile.MaxLocksPerTxn))
|
||||
} else {
|
||||
profile := m.config.GetCurrentProfile()
|
||||
m.message = successStyle.Render(fmt.Sprintf(
|
||||
"[OFF] Large DB Mode disabled: %s → Parallel=%d, Jobs=%d",
|
||||
profile.Name, profile.ClusterParallelism, profile.Jobs))
|
||||
}
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// applyConservativeProfile applies the conservative profile for constrained VMs
|
||||
func (m SettingsModel) applyConservativeProfile() (tea.Model, tea.Cmd) {
|
||||
if err := m.config.ApplyResourceProfile("conservative"); err != nil {
|
||||
m.message = errorStyle.Render(fmt.Sprintf("[FAIL] %s", err.Error()))
|
||||
return m, nil
|
||||
}
|
||||
m.message = successStyle.Render("[OK] Applied 'conservative' profile: Cluster=1, Jobs=1. Safe for small VMs with limited memory.")
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// showProfileRecommendation displays the recommended profile based on system resources
|
||||
func (m SettingsModel) showProfileRecommendation() (tea.Model, tea.Cmd) {
|
||||
profileName, reason := m.config.GetResourceProfileRecommendation(false)
|
||||
|
||||
var largeDBHint string
|
||||
if m.config.LargeDBMode {
|
||||
largeDBHint = "Large DB Mode: ON"
|
||||
} else {
|
||||
largeDBHint = "Large DB Mode: OFF (press 'l' to enable)"
|
||||
}
|
||||
|
||||
m.message = infoStyle.Render(fmt.Sprintf(
|
||||
"[RECOMMEND] Profile: %s | %s\n"+
|
||||
" → %s\n"+
|
||||
" Press 'l' to toggle Large DB Mode, 'c' for conservative",
|
||||
profileName, largeDBHint, reason))
|
||||
return m, nil
|
||||
}
|
||||
|
||||
// handleEditingInput handles input when editing a setting
|
||||
func (m SettingsModel) handleEditingInput(msg tea.KeyMsg) (tea.Model, tea.Cmd) {
|
||||
switch msg.String() {
|
||||
@ -747,7 +865,32 @@ func (m SettingsModel) View() string {
|
||||
// Current configuration summary
|
||||
if !m.editing {
|
||||
b.WriteString("\n")
|
||||
b.WriteString(infoStyle.Render("[INFO] Current Configuration"))
|
||||
b.WriteString(infoStyle.Render("[INFO] System Resources & Configuration"))
|
||||
b.WriteString("\n")
|
||||
|
||||
// System resources
|
||||
var sysInfo []string
|
||||
if m.config.CPUInfo != nil {
|
||||
sysInfo = append(sysInfo, fmt.Sprintf("CPU: %d cores (physical), %d logical",
|
||||
m.config.CPUInfo.PhysicalCores, m.config.CPUInfo.LogicalCores))
|
||||
}
|
||||
if m.config.MemoryInfo != nil {
|
||||
sysInfo = append(sysInfo, fmt.Sprintf("Memory: %dGB total, %dGB available",
|
||||
m.config.MemoryInfo.TotalGB, m.config.MemoryInfo.AvailableGB))
|
||||
}
|
||||
|
||||
// Recommended profile
|
||||
recommendedProfile, reason := m.config.GetResourceProfileRecommendation(false)
|
||||
sysInfo = append(sysInfo, fmt.Sprintf("Recommended Profile: %s", recommendedProfile))
|
||||
sysInfo = append(sysInfo, fmt.Sprintf(" → %s", reason))
|
||||
|
||||
for _, line := range sysInfo {
|
||||
b.WriteString(detailStyle.Render(fmt.Sprintf(" %s", line)))
|
||||
b.WriteString("\n")
|
||||
}
|
||||
|
||||
b.WriteString("\n")
|
||||
b.WriteString(infoStyle.Render("[CONFIG] Current Settings"))
|
||||
b.WriteString("\n")
|
||||
|
||||
summary := []string{
|
||||
@ -755,7 +898,17 @@ func (m SettingsModel) View() string {
|
||||
fmt.Sprintf("Database: %s@%s:%d", m.config.User, m.config.Host, m.config.Port),
|
||||
fmt.Sprintf("Backup Dir: %s", m.config.BackupDir),
|
||||
fmt.Sprintf("Compression: Level %d", m.config.CompressionLevel),
|
||||
fmt.Sprintf("Jobs: %d parallel, %d dump", m.config.Jobs, m.config.DumpJobs),
|
||||
fmt.Sprintf("Profile: %s | Cluster: %d parallel | Jobs: %d",
|
||||
m.config.ResourceProfile, m.config.ClusterParallelism, m.config.Jobs),
|
||||
}
|
||||
|
||||
// Show profile warnings if applicable
|
||||
profile := m.config.GetCurrentProfile()
|
||||
if profile != nil {
|
||||
isValid, warnings := cpu.ValidateProfileForSystem(profile, m.config.CPUInfo, m.config.MemoryInfo)
|
||||
if !isValid && len(warnings) > 0 {
|
||||
summary = append(summary, fmt.Sprintf("⚠️ Warning: %s", warnings[0]))
|
||||
}
|
||||
}
|
||||
|
||||
if m.config.CloudEnabled {
|
||||
@ -782,9 +935,9 @@ func (m SettingsModel) View() string {
|
||||
} else {
|
||||
// Show different help based on current selection
|
||||
if m.cursor >= 0 && m.cursor < len(m.settings) && m.settings[m.cursor].Type == "path" {
|
||||
footer = infoStyle.Render("\n[KEYS] Up/Down navigate | Enter edit | Tab browse directories | 's' save | 'r' reset | 'q' menu")
|
||||
footer = infoStyle.Render("\n[KEYS] ↑↓ navigate | Enter edit | Tab dirs | 'l' toggle LargeDB | 'c' conservative | 'p' recommend | 's' save | 'q' menu")
|
||||
} else {
|
||||
footer = infoStyle.Render("\n[KEYS] Up/Down navigate | Enter edit | 's' save | 'r' reset | 'q' menu | Tab=dirs on path fields only")
|
||||
footer = infoStyle.Render("\n[KEYS] ↑↓ navigate | Enter edit | 'l' toggle LargeDB mode | 'c' conservative | 'p' recommend | 's' save | 'r' reset | 'q' menu")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
2
main.go
2
main.go
@ -16,7 +16,7 @@ import (
|
||||
|
||||
// Build information (set by ldflags)
|
||||
var (
|
||||
version = "3.42.34"
|
||||
version = "3.42.81"
|
||||
buildTime = "unknown"
|
||||
gitCommit = "unknown"
|
||||
)
|
||||
|
||||
77
release-notes-v3.42.77.md
Normal file
77
release-notes-v3.42.77.md
Normal file
@ -0,0 +1,77 @@
|
||||
# dbbackup v3.42.77
|
||||
|
||||
## 🎯 New Feature: Single Database Extraction from Cluster Backups
|
||||
|
||||
Extract and restore individual databases from cluster backups without full cluster restoration!
|
||||
|
||||
### 🆕 New Flags
|
||||
|
||||
- **`--list-databases`**: List all databases in cluster backup with sizes
|
||||
- **`--database <name>`**: Extract/restore a single database from cluster
|
||||
- **`--databases "db1,db2,db3"`**: Extract multiple databases (comma-separated)
|
||||
- **`--output-dir <path>`**: Extract to directory without restoring
|
||||
- **`--target <name>`**: Rename database during restore
|
||||
|
||||
### 📖 Examples
|
||||
|
||||
```bash
|
||||
# List databases in cluster backup
|
||||
dbbackup restore cluster backup.tar.gz --list-databases
|
||||
|
||||
# Extract single database (no restore)
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --output-dir /tmp/extract
|
||||
|
||||
# Restore single database from cluster
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --confirm
|
||||
|
||||
# Restore with different name (testing)
|
||||
dbbackup restore cluster backup.tar.gz --database myapp --target myapp_test --confirm
|
||||
|
||||
# Extract multiple databases
|
||||
dbbackup restore cluster backup.tar.gz --databases "app1,app2,app3" --output-dir /tmp/extract
|
||||
```
|
||||
|
||||
### 💡 Use Cases
|
||||
|
||||
✅ **Selective disaster recovery** - restore only affected databases
|
||||
✅ **Database migration** - copy databases between clusters
|
||||
✅ **Testing workflows** - restore with different names
|
||||
✅ **Faster restores** - extract only what you need
|
||||
✅ **Less disk space** - no need to extract entire cluster
|
||||
|
||||
### ⚙️ Technical Details
|
||||
|
||||
- Stream-based extraction with progress feedback
|
||||
- Fast cluster archive scanning (no full extraction needed)
|
||||
- Works with all cluster backup formats (.tar.gz)
|
||||
- Compatible with existing cluster restore workflow
|
||||
- Automatic format detection for extracted dumps
|
||||
|
||||
### 🖥️ TUI Support (Interactive Mode)
|
||||
|
||||
**New in this release**: Press **`s`** key when viewing a cluster backup to select individual databases!
|
||||
|
||||
- Navigate cluster backups in TUI and press `s` for database selection
|
||||
- Interactive database picker with size information
|
||||
- Visual selection confirmation before restore
|
||||
- Seamless integration with existing TUI workflows
|
||||
|
||||
**TUI Workflow:**
|
||||
1. Launch TUI: `dbbackup` (no arguments)
|
||||
2. Navigate to "Restore" → "Single Database"
|
||||
3. Select cluster backup archive
|
||||
4. Press `s` to show database list
|
||||
5. Select database and confirm restore
|
||||
|
||||
## 📦 Installation
|
||||
|
||||
Download the binary for your platform below and make it executable:
|
||||
|
||||
```bash
|
||||
chmod +x dbbackup_*
|
||||
./dbbackup_* --version
|
||||
```
|
||||
|
||||
## 🔍 Checksums
|
||||
|
||||
SHA256 checksums in `checksums.txt`.
|
||||
99
verify_postgres_locks.sh
Executable file
99
verify_postgres_locks.sh
Executable file
@ -0,0 +1,99 @@
|
||||
#!/bin/bash
|
||||
#
|
||||
# PostgreSQL Lock Configuration Check & Restore Guidance
|
||||
#
|
||||
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " PostgreSQL Lock Configuration & Restore Strategy"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo
|
||||
|
||||
# Get values - extract ONLY digits, remove all non-numeric chars
|
||||
LOCKS=$(sudo -u postgres psql --no-psqlrc -t -A -c "SHOW max_locks_per_transaction;" 2>/dev/null | tr -cd '0-9' | head -c 10)
|
||||
CONNS=$(sudo -u postgres psql --no-psqlrc -t -A -c "SHOW max_connections;" 2>/dev/null | tr -cd '0-9' | head -c 10)
|
||||
PREPARED=$(sudo -u postgres psql --no-psqlrc -t -A -c "SHOW max_prepared_transactions;" 2>/dev/null | tr -cd '0-9' | head -c 10)
|
||||
|
||||
if [ -z "$LOCKS" ]; then
|
||||
LOCKS=$(psql --no-psqlrc -t -A -c "SHOW max_locks_per_transaction;" 2>/dev/null | tr -cd '0-9' | head -c 10)
|
||||
CONNS=$(psql --no-psqlrc -t -A -c "SHOW max_connections;" 2>/dev/null | tr -cd '0-9' | head -c 10)
|
||||
PREPARED=$(psql --no-psqlrc -t -A -c "SHOW max_prepared_transactions;" 2>/dev/null | tr -cd '0-9' | head -c 10)
|
||||
fi
|
||||
|
||||
if [ -z "$LOCKS" ] || [ -z "$CONNS" ]; then
|
||||
echo "❌ ERROR: Could not retrieve PostgreSQL settings"
|
||||
echo " Ensure PostgreSQL is running and accessible"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "📊 Current Configuration:"
|
||||
echo "────────────────────────────────────────────────────────────"
|
||||
echo " max_locks_per_transaction: $LOCKS"
|
||||
echo " max_connections: $CONNS"
|
||||
echo " max_prepared_transactions: ${PREPARED:-0}"
|
||||
echo
|
||||
|
||||
# Calculate capacity
|
||||
PREPARED=${PREPARED:-0}
|
||||
CAPACITY=$((LOCKS * (CONNS + PREPARED)))
|
||||
|
||||
echo " Total Lock Capacity: $CAPACITY locks"
|
||||
echo
|
||||
|
||||
# Determine status
|
||||
if [ "$LOCKS" -lt 2048 ]; then
|
||||
STATUS="❌ CRITICAL"
|
||||
RECOMMENDATION="increase_locks"
|
||||
elif [ "$LOCKS" -lt 4096 ]; then
|
||||
STATUS="⚠️ LOW"
|
||||
RECOMMENDATION="single_threaded"
|
||||
else
|
||||
STATUS="✅ OK"
|
||||
RECOMMENDATION="single_threaded"
|
||||
fi
|
||||
|
||||
echo "Status: $STATUS (locks=$LOCKS, capacity=$CAPACITY)"
|
||||
echo
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " 🎯 RECOMMENDED RESTORE COMMAND"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo
|
||||
|
||||
if [ "$RECOMMENDATION" = "increase_locks" ]; then
|
||||
echo "CRITICAL: Locks too low. Increase first, THEN use single-threaded:"
|
||||
echo
|
||||
echo "1. Increase locks (requires PostgreSQL restart):"
|
||||
echo " sudo -u postgres psql -c \"ALTER SYSTEM SET max_locks_per_transaction = 4096;\""
|
||||
echo " sudo systemctl restart postgresql"
|
||||
echo
|
||||
echo "2. Run restore with single-threaded mode:"
|
||||
echo " dbbackup restore cluster <backup-file> \\"
|
||||
echo " --profile conservative \\"
|
||||
echo " --parallel-dbs 1 \\"
|
||||
echo " --jobs 1 \\"
|
||||
echo " --confirm"
|
||||
else
|
||||
echo "✅ Use default CONSERVATIVE profile (single-threaded, prevents lock issues):"
|
||||
echo
|
||||
echo " dbbackup restore cluster <backup-file> --confirm"
|
||||
echo
|
||||
echo " (Default profile is now 'conservative' = single-threaded)"
|
||||
echo
|
||||
echo " For faster restore (if locks are sufficient):"
|
||||
echo " dbbackup restore cluster <backup-file> --profile balanced --confirm"
|
||||
echo " dbbackup restore cluster <backup-file> --profile aggressive --confirm"
|
||||
fi
|
||||
|
||||
echo
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo " ℹ️ WHY SINGLE-THREADED?"
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
echo
|
||||
echo " Parallel restore with large databases (especially with BLOBs)"
|
||||
echo " can exhaust locks EVEN with high max_locks_per_transaction."
|
||||
echo
|
||||
echo " --jobs 1 = Single-threaded pg_restore (minimal locks)"
|
||||
echo " --parallel-dbs 1 = Restore one database at a time"
|
||||
echo
|
||||
echo " Trade-off: Slower restore, but GUARANTEED completion."
|
||||
echo
|
||||
echo "════════════════════════════════════════════════════════════"
|
||||
Reference in New Issue
Block a user