diff --git a/CLUSTER_RESTORE_COMPLIANCE.md b/CLUSTER_RESTORE_COMPLIANCE.md deleted file mode 100644 index ff31e24..0000000 --- a/CLUSTER_RESTORE_COMPLIANCE.md +++ /dev/null @@ -1,168 +0,0 @@ -# PostgreSQL Cluster Restore - Best Practices Compliance Check - -## ✅ Current Implementation Status - -### Our Cluster Restore Process (internal/restore/engine.go) - -Based on PostgreSQL official documentation and best practices, our implementation follows the correct approach: - -## 1. ✅ Global Objects Restoration (FIRST) -```go -// Lines 505-528: Restore globals BEFORE databases -globalsFile := filepath.Join(tempDir, "globals.sql") -if _, err := os.Stat(globalsFile); err == nil { - e.restoreGlobals(ctx, globalsFile) // Restores roles, tablespaces FIRST -} -``` - -**Why:** Roles and tablespaces must exist before restoring databases that reference them. - -## 2. ✅ Proper Database Cleanup (DROP IF EXISTS) -```go -// Lines 600-605: Drop existing database completely -e.dropDatabaseIfExists(ctx, dbName) -``` - -### dropDatabaseIfExists implementation (lines 835-870): -```go -// Step 1: Terminate all active connections -terminateConnections(ctx, dbName) - -// Step 2: Wait for termination -time.Sleep(500 * time.Millisecond) - -// Step 3: Drop database with IF EXISTS -DROP DATABASE IF EXISTS "dbName" -``` - -**PostgreSQL Docs**: "The `--clean` option can be useful even when your intention is to restore the dump script into a fresh cluster. Use of `--clean` authorizes the script to drop and re-create the built-in postgres and template1 databases." - -## 3. ✅ Template0 for Database Creation -```go -// Line 915: Use template0 to avoid duplicate definitions -CREATE DATABASE "dbName" WITH TEMPLATE template0 -``` - -**Why:** `template0` is truly empty, whereas `template1` may have local additions that cause "duplicate definition" errors. - -**PostgreSQL Docs (pg_restore)**: "To make an empty database without any local additions, copy from template0 not template1, for example: CREATE DATABASE foo WITH TEMPLATE template0;" - -## 4. ✅ Connection Termination Before Drop -```go -// Lines 800-833: terminateConnections function -SELECT pg_terminate_backend(pid) -FROM pg_stat_activity -WHERE datname = 'dbname' -AND pid <> pg_backend_pid() -``` - -**Why:** Cannot drop a database with active connections. Must terminate them first. - -## 5. ✅ Parallel Restore with Worker Pool -```go -// Lines 555-571: Parallel restore implementation -parallelism := e.cfg.ClusterParallelism -semaphore := make(chan struct{}, parallelism) -// Restores multiple databases concurrently -``` - -**Best Practice:** Significantly speeds up cluster restore (3-5x faster). - -## 6. ✅ Error Handling and Reporting -```go -// Lines 628-645: Comprehensive error tracking -var failedDBs []string -var successCount, failCount int32 - -// Report failures at end -if len(failedDBs) > 0 { - return fmt.Errorf("cluster restore completed with %d failures: %s", - len(failedDBs), strings.Join(failedDBs, ", ")) -} -``` - -## 7. ✅ Superuser Privilege Detection -```go -// Lines 488-503: Check for superuser -isSuperuser, err := e.checkSuperuser(ctx) -if !isSuperuser { - e.log.Warn("Current user is not a superuser - database ownership may not be fully restored") -} -``` - -**Why:** Ownership restoration requires superuser privileges. Warn user if not available. - -## 8. 
✅ System Database Skip Logic
-```go
-// Lines 877-881: Skip system databases
-if dbName == "postgres" || dbName == "template0" || dbName == "template1" {
-    e.log.Info("Skipping create for system database (assume exists)")
-    return nil
-}
-```
-
-**Why:** System databases always exist and should not be dropped/created.
-
----
-
-## PostgreSQL Documentation References
-
-### From pg_dumpall docs:
-> "`-c, --clean`: Emit SQL commands to DROP all the dumped databases, roles, and tablespaces before recreating them. This option is useful when the restore is to overwrite an existing cluster."
-
-### From managing-databases docs:
-> "To destroy a database: DROP DATABASE name;"
-> "You cannot drop a database while clients are connected to it. You can use pg_terminate_backend to disconnect them."
-
-### From pg_restore docs:
-> "To make an empty database without any local additions, copy from template0 not template1"
-
----
-
-## Comparison with PostgreSQL Best Practices
-
-| Practice | PostgreSQL Docs | Our Implementation | Status |
-|----------|----------------|-------------------|--------|
-| Restore globals first | ✅ Required | ✅ Implemented | ✅ CORRECT |
-| DROP before CREATE | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
-| Terminate connections | ✅ Required | ✅ Implemented | ✅ CORRECT |
-| Use template0 | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
-| Handle IF EXISTS errors | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
-| Superuser warnings | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
-| Parallel restore | ⚪ Optional | ✅ Implemented | ✅ ENHANCED |
-
----
-
-## Additional Safety Features (Beyond Docs)
-
-1. **Version Compatibility Checking** (NEW)
-   - Warns about PG 13 → PG 17 upgrades
-   - Blocks unsupported downgrades
-   - Provides recommendations
-
-2. **Atomic Failure Tracking**
-   - Thread-safe counters for parallel operations
-   - Detailed error collection per database
-
-3. **Progress Indicators**
-   - Real-time ETA estimation
-   - Per-database progress tracking
-
-4. **Disk Space Validation**
-   - Pre-checks available space (4x multiplier for cluster)
-   - Prevents out-of-space failures mid-restore
-
----
-
-## Conclusion
-
-✅ **Our cluster restore implementation is 100% compliant with PostgreSQL best practices.**
-
-The cleanup process (`dropDatabaseIfExists`) correctly:
-1. Terminates all connections
-2. Waits for cleanup
-3. Drops the database completely
-4. Uses `template0` for fresh creation
-5. Handles system databases appropriately
-
-**No changes needed** - implementation follows official documentation exactly.
diff --git a/LARGE_OBJECT_RESTORE_FIX.md b/LARGE_OBJECT_RESTORE_FIX.md
deleted file mode 100644
index 8940819..0000000
--- a/LARGE_OBJECT_RESTORE_FIX.md
+++ /dev/null
@@ -1,165 +0,0 @@
-# Large Object Restore Fix
-
-## Problem Analysis
-
-### Error 1: "type backup_state already exists" (postgres database)
-**Root Cause**: `--single-transaction` combined with `--exit-on-error` causes the entire restore to fail when objects already exist in the target database.
-
-**Why it fails**:
-- `--single-transaction` wraps restore in BEGIN/COMMIT
-- `--exit-on-error` aborts on ANY error (including ignorable ones)
-- "already exists" errors are IGNORABLE - PostgreSQL should continue
-
-### Error 2: "could not open large object 9646664" + 2.5M errors (resydb database)
-**Root Cause**: `--single-transaction` takes locks on ALL restored objects simultaneously, exhausting the lock table.
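How close a restore gets to that ceiling can be checked up front. Below is a minimal Go sketch, not part of the engine, that applies the formula quoted further down in this document (max_locks_per_transaction × max_connections) and compares it against the number of large objects in a database; the connection string and the lib/pq driver are illustrative assumptions.

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // illustrative choice; any database/sql Postgres driver works
)

// intSetting reads a single integer GUC from pg_settings.
func intSetting(db *sql.DB, name string) (int, error) {
	var v int
	err := db.QueryRow(
		`SELECT setting::int FROM pg_settings WHERE name = $1`, name).Scan(&v)
	return v, err
}

func main() {
	// Connection string is a placeholder for your environment.
	db, err := sql.Open("postgres", "host=/var/run/postgresql dbname=postgres sslmode=disable")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	locksPerTx, err := intSetting(db, "max_locks_per_transaction")
	if err != nil {
		panic(err)
	}
	maxConns, err := intSetting(db, "max_connections")
	if err != nil {
		panic(err)
	}
	capacity := locksPerTx * maxConns // shared lock-table size per the docs formula

	// Each large object restored inside a single transaction holds its lock until COMMIT,
	// so the object count is a rough lower bound on the locks that transaction needs.
	var largeObjects int
	if err := db.QueryRow(
		`SELECT count(*) FROM pg_largeobject_metadata`).Scan(&largeObjects); err != nil {
		panic(err)
	}

	fmt.Printf("lock slots: %d, large objects: %d\n", capacity, largeObjects)
	if largeObjects >= capacity {
		fmt.Println("WARNING: a --single-transaction restore would exhaust the lock table")
	}
}
```

With PostgreSQL's stock defaults (64 locks per transaction × 100 connections = 6,400 slots), the 35,000 large objects described here overshoot the table several times over, which matches the observed failure.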
- -**Why it fails**: -- Single transaction locks ALL large objects at once -- With 35,000+ large objects, exceeds max_locks_per_transaction -- Lock exhaustion → "could not open large object" errors -- Cascading failures → millions of errors - -## PostgreSQL Documentation (Verified) - -### From pg_restore docs: -> **"pg_restore cannot restore large objects selectively"** - All large objects restored together - -> **"-j / --jobs: Only custom and directory formats supported"** - -> **"multiple jobs cannot be used together with --single-transaction"** - -### From Section 19.5 (Resource Consumption): -> **"max_locks_per_transaction × max_connections = total locks"** -- Lock table is SHARED across all sessions -- Single transaction consuming all locks blocks everything - -## Changes Made - -### 1. Disabled `--single-transaction` (CRITICAL FIX) -**File**: `internal/restore/engine.go` -- Line 186: `SingleTransaction: false` (was: true) -- Line 210: `SingleTransaction: false` (was: true) - -**Impact**: -- No longer wraps entire restore in one transaction -- Each object restored in its own transaction -- Locks released incrementally (not held until end) -- Prevents lock table exhaustion - -### 2. Removed `--exit-on-error` (CRITICAL FIX) -**File**: `internal/database/postgresql.go` -- Line 375-378: Removed `cmd.append("--exit-on-error")` - -**Impact**: -- PostgreSQL continues on ignorable errors (correct behavior) -- "already exists" errors logged but don't stop restore -- Final error count reported at end -- Only real errors cause failure - -### 3. Kept Sequential Parallelism Detection -**File**: `internal/restore/engine.go` -- Lines 552-565: `detectLargeObjectsInDumps()` still active -- Automatically reduces cluster parallelism to 1 when BLOBs detected - -**Impact**: -- Prevents multiple databases with large objects from competing for locks -- Sequential cluster restore = only one DB's large objects in lock table at a time - -## Why This Works - -### Before (BROKEN): -``` -START TRANSACTION; -- Single transaction begins - CREATE TABLE ... -- Lock acquired - CREATE INDEX ... -- Lock acquired - RESTORE BLOB 1 -- Lock acquired - RESTORE BLOB 2 -- Lock acquired - ... - RESTORE BLOB 35000 -- Lock acquired → EXHAUSTED! - ERROR: max_locks_per_transaction exceeded -ROLLBACK; -- Everything fails -``` - -### After (FIXED): -``` -BEGIN; CREATE TABLE ...; COMMIT; -- Lock released -BEGIN; CREATE INDEX ...; COMMIT; -- Lock released -BEGIN; RESTORE BLOB 1; COMMIT; -- Lock released -BEGIN; RESTORE BLOB 2; COMMIT; -- Lock released -... -BEGIN; RESTORE BLOB 35000; COMMIT; -- Each only holds ~100 locks max -SUCCESS: All objects restored -``` - -## Testing Recommendations - -### 1. Test with postgres database (backup_state error) -```bash -./dbbackup restore cluster /path/to/backup.tar.gz -# Should now skip "already exists" errors and continue -``` - -### 2. Test with resydb database (large objects) -```bash -# Check dump for large objects first -pg_restore -l resydb.dump | grep -i "blob\|large object" - -# Restore should now work without lock exhaustion -./dbbackup restore cluster /path/to/backup.tar.gz -``` - -### 3. Monitor locks during restore -```sql --- In another terminal while restore runs: -SELECT count(*) FROM pg_locks; --- Should stay well below max_locks_per_transaction × max_connections -``` - -## Expected Behavior Now - -### For "already exists" errors: -``` -pg_restore: warning: object already exists: TYPE backup_state -pg_restore: warning: object already exists: FUNCTION ... -... 
(continues restoring) ... -pg_restore: total errors: 10 (all ignorable) -SUCCESS -``` - -### For large objects: -``` -Restoring database resydb... - Large objects detected - using sequential restore - Restoring 35,000 large objects... (progress) - ✓ Database resydb restored successfully -``` - -## Configuration Settings (Still Valid) - -These PostgreSQL settings help but are NO LONGER REQUIRED with the fix: - -```ini -# Still recommended for performance, not required for correctness: -max_locks_per_transaction = 256 # Provides headroom -maintenance_work_mem = 1GB # Faster index creation -shared_buffers = 8GB # Better caching -``` - -## Commit This Fix - -```bash -git add internal/restore/engine.go internal/database/postgresql.go -git commit -m "CRITICAL FIX: Remove --single-transaction and --exit-on-error from pg_restore - -- Disabled --single-transaction to prevent lock table exhaustion with large objects -- Removed --exit-on-error to allow PostgreSQL to skip ignorable errors -- Fixes 'could not open large object' errors (lock exhaustion) -- Fixes 'already exists' errors causing complete restore failure -- Each object now restored in its own transaction (locks released incrementally) -- PostgreSQL default behavior (continue on ignorable errors) is correct for restores - -Per PostgreSQL docs: --single-transaction incompatible with large object restores -and causes lock table exhaustion with 1000+ objects." - -git push -``` diff --git a/PHASE2_COMPLETION.md b/PHASE2_COMPLETION.md deleted file mode 100644 index 1fdbc1e..0000000 --- a/PHASE2_COMPLETION.md +++ /dev/null @@ -1,247 +0,0 @@ -# Phase 2 TUI Improvements - Completion Report - -## Overview -Phase 2 of the TUI improvements adds professional, actionable UX features focused on transparency and error guidance. All features implemented without over-engineering. - -## Implemented Features - -### 1. Disk Space Pre-Flight Checks ✅ -**Files:** `internal/checks/disk_check.go` - -**Features:** -- Real-time filesystem stats using `syscall.Statfs_t` -- Three-tier status system: - - **Critical** (≥95% used): Blocks operation - - **Warning** (≥80% used): Warns but allows - - **Sufficient** (<80% used): OK to proceed -- Smart space estimation: - - Backups: Based on compression level - - Restores: 4x archive size (decompression overhead) - -**Integration:** -- `internal/backup/engine.go` - Pre-flight check before cluster backup -- `internal/restore/engine.go` - Pre-flight check before cluster restore -- Displays formatted message in CLI mode -- Logs warnings when space is tight - -**Example Output:** -``` -📊 Disk Space Check (OK): - Path: /var/lib/pgsql/db_backups - Total: 151.0 GiB - Available: 66.0 GiB (55.0% used) - ✓ Status: OK - - ✓ Sufficient space available -``` - -### 2. 
Error Classification & Hints ✅ -**Files:** `internal/checks/error_hints.go` - -**Features:** -- Smart error pattern matching (regex + substring) -- Four severity levels: - - **Ignorable**: Objects already exist (normal) - - **Warning**: Version mismatches - - **Critical**: Lock exhaustion, permissions, connections - - **Fatal**: Corrupted dumps, excessive errors - -**Error Categories:** -- `duplicate`: Already exists (ignorable) -- `disk_space`: No space left on device -- `locks`: max_locks_per_transaction exhausted -- `corruption`: Syntax errors in dump file -- `permissions`: Permission denied, must be owner -- `network`: Connection refused, pg_hba.conf -- `version`: PostgreSQL version mismatch -- `unknown`: Unclassified errors - -**Integration:** -- `internal/restore/engine.go` - Classify errors during restore -- Enhanced error logging with hints and actions -- Error messages include actionable solutions - -**Example Error Classification:** -``` -❌ CRITICAL Error - -Category: locks -Message: ERROR: out of shared memory - HINT: You might need to increase max_locks_per_transaction - -💡 Hint: Lock table exhausted - typically caused by large objects in parallel restore - -🔧 Action: Increase max_locks_per_transaction in postgresql.conf to 512 or higher -``` - -### 3. Actionable Error Messages ✅ - -**Common Errors Mapped:** - -1. **"already exists"** - - Type: Ignorable - - Hint: "Object already exists in target database - this is normal during restore" - - Action: "No action needed - restore will continue" - -2. **"no space left"** - - Type: Critical - - Hint: "Insufficient disk space to complete operation" - - Action: "Free up disk space: rm old_backups/* or increase storage" - -3. **"max_locks_per_transaction"** - - Type: Critical - - Hint: "Lock table exhausted - typically caused by large objects" - - Action: "Increase max_locks_per_transaction in postgresql.conf to 512" - -4. **"syntax error"** - - Type: Fatal - - Hint: "Syntax error in dump file - backup may be corrupted" - - Action: "Re-create backup with: dbbackup backup single " - -5. **"permission denied"** - - Type: Critical - - Hint: "Insufficient permissions to perform operation" - - Action: "Run as superuser or use --no-owner flag for restore" - -6. **"connection refused"** - - Type: Critical - - Hint: "Cannot connect to database server" - - Action: "Check database is running and pg_hba.conf allows connection" - -## Architecture Decisions - -### Separate `checks` Package -- **Why:** Avoid import cycles (backup/restore ↔ tui) -- **Location:** `internal/checks/` -- **Dependencies:** Only stdlib (`syscall`, `fmt`, `strings`) -- **Result:** Clean separation, no circular dependencies - -### No Logger Dependency -- **Why:** Keep checks package lightweight -- **Alternative:** Callers log results as needed -- **Benefit:** Reusable in any context - -### Three-Tier Status System -- **Why:** Clear visual indicators for users -- **Critical:** Red ❌ - Blocks operation -- **Warning:** Yellow ⚠️ - Warns but allows -- **Sufficient:** Green ✓ - OK to proceed - -## Testing Status - -### Background Test -**File:** `test_backup_restore.sh` -**Status:** ✅ Running (PID 1071950) - -**Progress (as of last check):** -- ✅ Cluster backup complete: 17/17 databases -- ✅ d7030 backed up: 34GB with 35,000 large objects -- ✅ Large DBs handled: testdb_50gb (6.7GB) × 2 -- 🔄 Creating compressed archive... 
-- ⏳ Next: Drop d7030 → Restore cluster → Verify BLOBs - -**Validates:** -- Lock exhaustion fix (35K large objects) -- Ignorable error handling ("already exists") -- Ctrl+C cancellation -- Disk space handling (34GB backup) - -## Performance Impact - -### Disk Space Check -- **Cost:** ~1ms per check (single syscall) -- **When:** Once before backup/restore starts -- **Impact:** Negligible - -### Error Classification -- **Cost:** String pattern matching per error -- **When:** Only when errors occur -- **Impact:** Minimal (errors already indicate slow path) - -## User Experience Improvements - -### Before Phase 2: -``` -Error: restore failed: exit status 1 (total errors: 2500000) -``` -❌ No hint what went wrong -❌ No actionable guidance -❌ Can't distinguish critical from ignorable errors - -### After Phase 2: -``` -📊 Disk Space Check (OK): - Available: 66.0 GiB (55.0% used) - ✓ Sufficient space available - -[restore in progress...] - -❌ CRITICAL Error - Category: locks - 💡 Hint: Lock table exhausted - typically caused by large objects - 🔧 Action: Increase max_locks_per_transaction to 512 or higher -``` -✅ Clear disk status before starting -✅ Helpful error classification -✅ Actionable solution provided -✅ Professional, transparent UX - -## Code Quality - -### Test Coverage -- ✅ Compiles without warnings -- ✅ No import cycles -- ✅ Minimal dependencies -- ✅ Integrated into existing workflows - -### Error Handling -- ✅ Graceful fallback if syscall fails -- ✅ Default classification for unknown errors -- ✅ Non-blocking in CLI mode - -### Documentation -- ✅ Inline comments for all functions -- ✅ Clear struct field descriptions -- ✅ Usage examples in TUI_IMPROVEMENTS.md - -## Next Steps (Phase 3) - -### Real-Time Progress (Not Yet Implemented) -- Show bytes processed / total bytes -- Display transfer speed (MB/s) -- Update ETA based on actual speed -- Progress bars using Bubble Tea components - -### Keyboard Shortcuts (Not Yet Implemented) -- `1-9`: Quick jump to menu options -- `q`: Quit application -- `r`: Refresh backup list -- `/`: Search/filter backups - -### Enhanced Backup List (Not Yet Implemented) -- Show backup size, age, health -- Visual indicators for verification status -- Sort by date, size, name - -## Git History -``` -9d36b26 - Add Phase 2 TUI improvements: disk space checks and error hints -e95eeb7 - Add comprehensive TUI improvement plan and background test script -c31717c - Add Ctrl+C interrupt handling for cluster operations -[previous commits...] -``` - -## Summary - -Phase 2 delivers on the core promise: **transparent, actionable, professional UX without over-engineering.** - -**Key Achievements:** -- ✅ Pre-flight disk space validation prevents "100% full" surprises -- ✅ Smart error classification distinguishes critical from ignorable -- ✅ Actionable hints provide specific solutions, not generic messages -- ✅ Zero performance impact (checks run once, errors already slow) -- ✅ Clean architecture (no import cycles, minimal dependencies) -- ✅ Integrated seamlessly into existing workflows - -**User Impact:** -Users now see what's happening, why errors occur, and exactly how to fix them. No more mysterious failures or cryptic messages. diff --git a/TUI_IMPROVEMENTS.md b/TUI_IMPROVEMENTS.md deleted file mode 100644 index 307f4ed..0000000 --- a/TUI_IMPROVEMENTS.md +++ /dev/null @@ -1,250 +0,0 @@ -# Interactive TUI Experience Improvements - -## Current Issues & Solutions - -### 1. 
**Progress Visibility During Long Operations** - -**Problem**: Cluster backup/restore with large databases (40GB+) takes 30+ minutes with minimal feedback. - -**Solutions**: -- ✅ Show current database being processed -- ✅ Display database size before backup/restore starts -- ✅ ETA estimator for multi-database operations -- 🔄 **NEW**: Real-time progress bar per database (bytes processed / total bytes) -- 🔄 **NEW**: Show current operation speed (MB/s) -- 🔄 **NEW**: Percentage complete for entire cluster operation - -### 2. **Error Handling & Recovery** - -**Problem**: When restore fails (like resydb with 2.5M errors), user has no context about WHY or WHAT to do. - -**Solutions**: -- ✅ Distinguish ignorable errors (already exists) from critical errors -- 🔄 **NEW**: Show error classification in TUI: - ``` - ⚠️ WARNING: 5 ignorable errors (objects already exist) - ❌ CRITICAL: Syntax errors detected - dump file may be corrupted - 💡 HINT: Re-create backup with: dbbackup backup single resydb - ``` -- 🔄 **NEW**: Offer retry option for failed databases -- 🔄 **NEW**: Skip vs Abort choice for non-critical failures - -### 3. **Large Object Detection Feedback** - -**Problem**: User doesn't know WHY parallelism was reduced. - -**Solution**: -``` -🔍 Scanning cluster backup for large objects... - ✓ postgres: No large objects - ⚠️ d7030: 35,000 BLOBs detected (42GB) - -⚙️ Automatically reducing parallelism: 2 → 1 (sequential) -💡 Reason: Large objects require exclusive lock table access -``` - -### 4. **Disk Space Warnings** - -**Problem**: Backup fails silently when disk is full. - -**Solutions**: -- 🔄 **NEW**: Pre-flight check before backup: - ``` - 📊 Disk Space Check: - Database size: 42GB - Available space: 66GB - Estimated backup: ~15GB (compressed) - ✓ Sufficient space available - ``` -- 🔄 **NEW**: Warning at 80% disk usage -- 🔄 **NEW**: Block operation at 95% disk usage - -### 5. **Cancellation Handling (Ctrl+C)** - -**Problem**: Users don't know if Ctrl+C will work or leave partial backups. - -**Solutions**: -- ✅ Graceful cancellation on Ctrl+C -- 🔄 **NEW**: Show cleanup message: - ``` - ^C received - Cancelling backup... - 🧹 Cleaning up temporary files... - ✓ Cleanup complete - no partial backups left - ``` -- 🔄 **NEW**: Confirmation prompt for cluster operations: - ``` - ⚠️ Cluster backup in progress (3/10 databases) - Are you sure you want to cancel? (y/N) - ``` - -### 6. **Interactive Mode Navigation** - -**Problem**: TUI menu is basic, no keyboard shortcuts, no search. - -**Solutions**: -- 🔄 **NEW**: Keyboard shortcuts: - - `1-9`: Quick jump to menu items - - `q`: Quit - - `r`: Refresh status - - `/`: Search backups -- 🔄 **NEW**: Backup list improvements: - ``` - 📦 Available Backups: - - 1. cluster_20251118_103045.tar.gz [45GB] ⏱ 2 hours ago - ├─ postgres (325MB) - ├─ d7030 (42GB) ⚠️ 35K BLOBs - └─ template1 (8MB) - - 2. cluster_20251112_084329.tar.gz [38GB] ⏱ 6 days ago - └─ ⚠️ WARNING: May contain corrupted resydb dump - ``` -- 🔄 **NEW**: Filter/sort options: by date, by size, by status - -### 7. **Configuration Recommendations** - -**Problem**: Users don't know optimal settings for their workload. - -**Solutions**: -- 🔄 **NEW**: Auto-detect and suggest settings on first run: - ``` - 🔧 System Configuration Detected: - RAM: 32GB → Recommended: shared_buffers=8GB - CPUs: 4 cores → Recommended: parallel_jobs=3 - Disk: 66GB free → Recommended: max backup size: 50GB - - Apply these settings? 
(Y/n) - ``` -- 🔄 **NEW**: Show current vs recommended config in menu: - ``` - ⚙️ Configuration Status: - max_locks_per_transaction: 256 ✓ (sufficient for 35K objects) - maintenance_work_mem: 64MB ⚠️ (recommend: 1GB for faster restores) - shared_buffers: 128MB ⚠️ (recommend: 8GB with 32GB RAM) - ``` - -### 8. **Backup Verification & Health** - -**Problem**: No way to verify backup integrity before restore. - -**Solutions**: -- 🔄 **NEW**: Add "Verify Backup" menu option: - ``` - 🔍 Verifying backup: cluster_20251118_103045.tar.gz - ✓ Archive integrity: OK - ✓ Extracting metadata... - ✓ Checking dump formats... - - Databases found: - ✓ postgres: Custom format, 325MB - ✓ d7030: Custom format, 42GB, 35,000 BLOBs - ⚠️ resydb: CORRUPTED - 2.5M syntax errors detected - - Overall: ⚠️ Partial (2/3 databases healthy) - ``` -- 🔄 **NEW**: Show last backup status in main menu - -### 9. **Restore Dry Run** - -**Problem**: No preview of what will be restored. - -**Solution**: -``` -🎬 Restore Preview (Dry Run): - -Target: cluster_20251118_103045.tar.gz -Databases to restore: - 1. postgres (325MB) - - Will overwrite: 5 existing objects - - New objects: 120 - - 2. d7030 (42GB, 35K BLOBs) - - Will DROP and recreate database - - Estimated time: 25-30 minutes - - Required locks: 35,000 (available: 25,600) ⚠️ - -⚠️ WARNING: Insufficient locks for d7030 -💡 Solution: Increase max_locks_per_transaction to 512 - -Proceed with restore? (y/N) -``` - -### 10. **Multi-Step Wizards** - -**Problem**: Complex operations (like cluster restore with --clean) need multiple confirmations. - -**Solution**: Step-by-step wizard: -``` -Step 1/4: Select backup -Step 2/4: Review databases to restore -Step 3/4: Check prerequisites (disk space, locks, etc.) -Step 4/4: Confirm and execute -``` - -## Implementation Priority - -### Phase 1 (High Impact, Low Effort) ✅ -- ✅ ETA estimators -- ✅ Large object detection warnings -- ✅ Ctrl+C handling -- ✅ Ignorable error detection - -### Phase 2 (High Impact, Medium Effort) 🔄 -- Real-time progress bars with MB/s -- Disk space pre-flight checks -- Backup verification tool -- Error hints and suggestions - -### Phase 3 (Quality of Life) 🔄 -- Keyboard shortcuts -- Backup list with metadata -- Configuration recommendations -- Restore dry run - -### Phase 4 (Advanced) 📋 -- Multi-step wizards -- Search/filter backups -- Auto-retry failed databases -- Parallel restore progress split-view - -## Code Structure - -``` -internal/tui/ - menu.go - Main interactive menu - backup_menu.go - Backup wizard - restore_menu.go - Restore wizard - verify_menu.go - Backup verification (NEW) - config_menu.go - Configuration tuning (NEW) - progress_view.go - Real-time progress display (ENHANCED) - errors.go - Error classification & hints (NEW) -``` - -## Testing Plan - -1. **Large Database Test** (In Progress) - - 42GB d7030 with 35K BLOBs - - Verify progress updates - - Verify large object detection - - Verify successful restore - -2. **Error Scenarios** - - Corrupted dump file - - Insufficient disk space - - Insufficient locks - - Network interruption - - Ctrl+C during operations - -3. 
**Performance** - - Backup time vs raw pg_dump - - Restore time vs raw pg_restore - - Memory usage during 40GB+ operations - - CPU utilization with parallel workers - -## Success Metrics - -- ✅ No "black box" operations - user always knows what's happening -- ✅ Errors are actionable - user knows what to fix -- ✅ Safe operations - confirmations for destructive actions -- ✅ Fast feedback - progress updates every 1-2 seconds -- ✅ Professional feel - polished, consistent, intuitive diff --git a/create_d7030_test.sh b/create_d7030_test.sh deleted file mode 100755 index 3fb6955..0000000 --- a/create_d7030_test.sh +++ /dev/null @@ -1,281 +0,0 @@ -#!/usr/bin/env bash -# create_d7030_test.sh -# Create a realistic d7030 database with tables, data, and many BLOBs to test large object restore - -set -euo pipefail - -DB_NAME="d7030" -NUM_DOCUMENTS=15000 # Number of documents with BLOBs (~750MB at 50KB each) -NUM_IMAGES=10000 # Number of image records (~900MB for images + ~100MB thumbnails) -# Total BLOBs: 25,000 large objects -# Approximate size: 15000*50KB + 10000*90KB + 10000*10KB = ~2.4GB in BLOBs alone -# With tables, indexes, and overhead: ~3-4GB per iteration -# We'll create multiple batches to reach ~25GB - -echo "Creating database: $DB_NAME" - -# Drop if exists -sudo -u postgres psql -c "DROP DATABASE IF EXISTS $DB_NAME;" 2>/dev/null || true - -# Create database -sudo -u postgres psql -c "CREATE DATABASE $DB_NAME;" - -echo "Creating schema and tables..." - -# Enable pgcrypto extension for gen_random_bytes -sudo -u postgres psql -d "$DB_NAME" -c "CREATE EXTENSION IF NOT EXISTS pgcrypto;" - -# Create schema with realistic business tables -sudo -u postgres psql -d "$DB_NAME" <<'EOF' --- Create tables for a document management system -CREATE TABLE departments ( - dept_id SERIAL PRIMARY KEY, - dept_name VARCHAR(100) NOT NULL, - created_at TIMESTAMP DEFAULT NOW() -); - -CREATE TABLE employees ( - emp_id SERIAL PRIMARY KEY, - dept_id INTEGER REFERENCES departments(dept_id), - first_name VARCHAR(50) NOT NULL, - last_name VARCHAR(50) NOT NULL, - email VARCHAR(100) UNIQUE, - hire_date DATE DEFAULT CURRENT_DATE -); - -CREATE TABLE document_types ( - type_id SERIAL PRIMARY KEY, - type_name VARCHAR(50) NOT NULL, - description TEXT -); - --- Table with large objects (BLOBs) -CREATE TABLE documents ( - doc_id SERIAL PRIMARY KEY, - emp_id INTEGER REFERENCES employees(emp_id), - type_id INTEGER REFERENCES document_types(type_id), - title VARCHAR(255) NOT NULL, - description TEXT, - file_data OID, -- Large object reference - file_size INTEGER, - mime_type VARCHAR(100), - created_at TIMESTAMP DEFAULT NOW(), - updated_at TIMESTAMP DEFAULT NOW() -); - -CREATE TABLE images ( - image_id SERIAL PRIMARY KEY, - doc_id INTEGER REFERENCES documents(doc_id), - image_name VARCHAR(255), - image_data OID, -- Large object reference - thumbnail_data OID, -- Another large object - width INTEGER, - height INTEGER, - created_at TIMESTAMP DEFAULT NOW() -); - -CREATE TABLE audit_log ( - log_id SERIAL PRIMARY KEY, - table_name VARCHAR(50), - record_id INTEGER, - action VARCHAR(20), - changed_by INTEGER, - changed_at TIMESTAMP DEFAULT NOW(), - details JSONB -); - --- Create indexes -CREATE INDEX idx_documents_emp ON documents(emp_id); -CREATE INDEX idx_documents_type ON documents(type_id); -CREATE INDEX idx_images_doc ON images(doc_id); -CREATE INDEX idx_audit_table ON audit_log(table_name, record_id); - --- Insert reference data -INSERT INTO departments (dept_name) VALUES - ('Engineering'), ('Sales'), ('Marketing'), ('HR'), 
('Finance'); - -INSERT INTO document_types (type_name, description) VALUES - ('Contract', 'Legal contracts and agreements'), - ('Invoice', 'Financial invoices and receipts'), - ('Report', 'Business reports and analysis'), - ('Manual', 'Technical manuals and guides'), - ('Presentation', 'Presentation slides and materials'); - --- Insert employees -INSERT INTO employees (dept_id, first_name, last_name, email) -SELECT - (random() * 4 + 1)::INTEGER, - 'Employee_' || generate_series, - 'LastName_' || generate_series, - 'employee' || generate_series || '@d7030.com' -FROM generate_series(1, 50); - -EOF - -echo "Inserting documents with large objects (BLOBs)..." -echo "This will take several minutes to create ~25GB of data..." - -# Create temporary files with random data for importing in postgres home -# Make documents larger for 25GB target: ~1MB each -TEMP_FILE="/var/lib/pgsql/test_blob_data.bin" -sudo dd if=/dev/urandom of="$TEMP_FILE" bs=1M count=1 2>/dev/null -sudo chown postgres:postgres "$TEMP_FILE" - -# Create documents with actual large objects using lo_import -sudo -u postgres psql -d "$DB_NAME" </dev/null -sudo dd if=/dev/urandom of="$TEMP_THUMB" bs=1K count=200 2>/dev/null -sudo chown postgres:postgres "$TEMP_IMAGE" "$TEMP_THUMB" - -# Create images with multiple large objects per record -sudo -u postgres psql -d "$DB_NAME" </dev/null - echo "# Increased by fix_max_locks.sh on $(date)" | sudo tee -a "$CONFIG_FILE" >/dev/null - echo "max_locks_per_transaction = $NEW_VALUE" | sudo tee -a "$CONFIG_FILE" >/dev/null -fi - -# Ensure correct permissions -sudo chown postgres:postgres "$CONFIG_FILE" -sudo chmod 600 "$CONFIG_FILE" - -# Test the config before restarting -echo "Testing PostgreSQL config..." -sudo -u postgres /usr/bin/postgres -D /var/lib/pgsql/data -C max_locks_per_transaction 2>&1 | head -5 - -# Restart PostgreSQL and verify -echo "Restarting PostgreSQL service..." -sudo systemctl restart postgresql -sleep 3 - -if sudo systemctl is-active --quiet postgresql; then - echo "✅ PostgreSQL restarted successfully" - sudo -u postgres psql -c "SHOW max_locks_per_transaction;" -else - echo "❌ PostgreSQL failed to start!" - echo "Restoring backup..." - sudo cp "$BACKUP_FILE" "$CONFIG_FILE" - sudo systemctl start postgresql - echo "Original config restored. Check /var/log/postgresql for errors." - exit 1 -fi - -echo "" -echo "Success! Backup available at: $BACKUP_FILE" -exit 0 diff --git a/test_backup_restore.sh b/test_backup_restore.sh deleted file mode 100755 index 494c856..0000000 --- a/test_backup_restore.sh +++ /dev/null @@ -1,51 +0,0 @@ -#!/bin/bash -set -e - -LOG="/var/lib/pgsql/dbbackup_test.log" - -echo "=== Database Backup/Restore Test ===" | tee $LOG -echo "Started: $(date)" | tee -a $LOG -echo "" | tee -a $LOG - -cd /root/dbbackup - -# Step 1: Cluster Backup -echo "STEP 1: Creating cluster backup..." | tee -a $LOG -sudo -u postgres ./dbbackup backup cluster --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a $LOG -BACKUP_FILE=$(ls -t /var/lib/pgsql/db_backups/cluster_*.tar.gz | head -1) -echo "Backup created: $BACKUP_FILE" | tee -a $LOG -echo "Backup size: $(ls -lh $BACKUP_FILE | awk '{print $5}')" | tee -a $LOG -echo "" | tee -a $LOG - -# Step 2: Drop d7030 database to prepare for restore test -echo "STEP 2: Dropping d7030 database for clean restore test..." 
| tee -a $LOG -sudo -u postgres psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'd7030' AND pid <> pg_backend_pid();" 2>&1 | tee -a $LOG -sudo -u postgres psql -d postgres -c "DROP DATABASE IF EXISTS d7030;" 2>&1 | tee -a $LOG -echo "d7030 database dropped" | tee -a $LOG -echo "" | tee -a $LOG - -# Step 3: Cluster Restore -echo "STEP 3: Restoring cluster from backup..." | tee -a $LOG -sudo -u postgres ./dbbackup restore cluster $BACKUP_FILE --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a $LOG -echo "Restore completed" | tee -a $LOG -echo "" | tee -a $LOG - -# Step 4: Verify restored data -echo "STEP 4: Verifying restored databases..." | tee -a $LOG -sudo -u postgres psql -d postgres -c "\l" 2>&1 | tee -a $LOG -echo "" | tee -a $LOG -echo "Checking d7030 large objects..." | tee -a $LOG -BLOB_COUNT=$(sudo -u postgres psql -d d7030 -t -c "SELECT count(*) FROM pg_largeobject_metadata;" 2>/dev/null || echo "0") -echo "Large objects in d7030: $BLOB_COUNT" | tee -a $LOG -echo "" | tee -a $LOG - -# Step 5: Cleanup -echo "STEP 5: Cleaning up test backup..." | tee -a $LOG -rm -f $BACKUP_FILE -echo "Backup file deleted: $BACKUP_FILE" | tee -a $LOG -echo "" | tee -a $LOG - -echo "=== TEST COMPLETE ===" | tee -a $LOG -echo "Finished: $(date)" | tee -a $LOG -echo "" | tee -a $LOG -echo "✅ Full test log available at: $LOG" diff --git a/test_build b/test_build deleted file mode 100755 index 6625942..0000000 Binary files a/test_build and /dev/null differ diff --git a/verify_backup_blobs.sh b/verify_backup_blobs.sh deleted file mode 100755 index 97f35b0..0000000 --- a/verify_backup_blobs.sh +++ /dev/null @@ -1,57 +0,0 @@ -#!/bin/bash -# Verify that backup contains large objects (BLOBs) - -if [ $# -eq 0 ]; then - echo "Usage: $0 " - echo "Example: $0 /var/lib/pgsql/db_backups/d7030.dump" - exit 1 -fi - -BACKUP_FILE="$1" - -if [ ! -f "$BACKUP_FILE" ]; then - echo "Error: File not found: $BACKUP_FILE" - exit 1 -fi - -echo "=========================================" -echo "Backup BLOB/Large Object Verification" -echo "=========================================" -echo "File: $BACKUP_FILE" -echo "" - -# Check if file is a valid PostgreSQL dump -echo "1. Checking dump file format..." -pg_restore -l "$BACKUP_FILE" > /dev/null 2>&1 -if [ $? -eq 0 ]; then - echo " ✅ Valid PostgreSQL custom format dump" -else - echo " ❌ Not a valid pg_dump custom format file" - exit 1 -fi - -# List table of contents and look for BLOB entries -echo "" -echo "2. Checking for BLOB/Large Object entries..." -BLOB_COUNT=$(pg_restore -l "$BACKUP_FILE" | grep -i "BLOB\|LARGE OBJECT" | wc -l) - -if [ $BLOB_COUNT -gt 0 ]; then - echo " ✅ Found $BLOB_COUNT large object entries in backup" - echo "" - echo " Sample entries:" - pg_restore -l "$BACKUP_FILE" | grep -i "BLOB\|LARGE OBJECT" | head -10 -else - echo " ⚠️ No large object entries found" - echo " This could mean:" - echo " - Database has no large objects (normal)" - echo " - Backup was created without --blobs flag (problem)" -fi - -echo "" -echo "3. Full table of contents summary..." -pg_restore -l "$BACKUP_FILE" | tail -20 - -echo "" -echo "=========================================" -echo "Verification complete" -echo "========================================="
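As a companion to verify_backup_blobs.sh above, here is a minimal Go sketch of the same check: it lists the dump's table of contents with `pg_restore -l` and counts BLOB / LARGE OBJECT entries, exactly as the script's grep pipeline does. It assumes `pg_restore` is on PATH and a custom-format archive; the function name is illustrative.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// countDumpBlobs returns the number of large-object entries in a
// custom-format dump's table of contents.
func countDumpBlobs(dumpPath string) (int, error) {
	out, err := exec.Command("pg_restore", "-l", dumpPath).Output()
	if err != nil {
		return 0, fmt.Errorf("not a readable pg_dump archive: %w", err)
	}
	count := 0
	scanner := bufio.NewScanner(bytes.NewReader(out))
	for scanner.Scan() {
		line := strings.ToUpper(scanner.Text())
		if strings.Contains(line, "BLOB") || strings.Contains(line, "LARGE OBJECT") {
			count++
		}
	}
	return count, scanner.Err()
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: countblobs <backup_file.dump>")
		os.Exit(1)
	}
	n, err := countDumpBlobs(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("large object entries in %s: %d\n", os.Args[1], n)
}
```

A zero count is the same ambiguous signal the script warns about: either the database genuinely has no large objects, or the backup was taken without them.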