Add comprehensive TUI improvement plan and background test script

- Created TUI_IMPROVEMENTS.md with 10 major UX enhancements
- Prioritized improvements into 4 phases (Phase 1 already complete)
- Created test_backup_restore.sh for safe background testing
- Plan includes: real-time progress, error hints, disk checks, backup verification
- Focus on making operations transparent, actionable, and professional
- Background test running: backup → restore → verify → cleanup cycle
2025-11-18 12:42:06 +00:00
parent f80f19fe93
commit 3a2ff21e6f
2 changed files with 301 additions and 0 deletions
--- a/TUI_IMPROVEMENTS.md
+++ b/TUI_IMPROVEMENTS.md
@@ -0,0 +1,250 @@
+# Interactive TUI Experience Improvements
+
+## Current Issues & Solutions
+
+### 1. **Progress Visibility During Long Operations**
+
+**Problem**: Cluster backup/restore with large databases (40GB+) takes 30+ minutes with minimal feedback.
+
+**Solutions**:
+- ✅ Show current database being processed
+- ✅ Display database size before backup/restore starts
+- ✅ ETA estimator for multi-database operations
+- 🔄 **NEW**: Real-time progress bar per database (bytes processed / total bytes)
+- 🔄 **NEW**: Show current operation speed (MB/s)
+- 🔄 **NEW**: Percentage complete for entire cluster operation
+
+### 2. **Error Handling & Recovery**
+
+**Problem**: When a restore fails (like resydb with 2.5M errors), the user has no context about WHY it failed or WHAT to do next.
+
+**Solutions**:
+- ✅ Distinguish ignorable errors (already exists) from critical errors
+- 🔄 **NEW**: Show error classification in TUI:
+  ```
+  ⚠️  WARNING: 5 ignorable errors (objects already exist)
+  ❌ CRITICAL: Syntax errors detected - dump file may be corrupted
+  💡 HINT: Re-create backup with: dbbackup backup single resydb
+  ```
+- 🔄 **NEW**: Offer retry option for failed databases
+- 🔄 **NEW**: Skip vs Abort choice for non-critical failures
+
+### 3. **Large Object Detection Feedback**
+
+**Problem**: User doesn't know WHY parallelism was reduced.
+
+**Solution**:
+```
+🔍 Scanning cluster backup for large objects...
+   ✓ postgres: No large objects
+   ⚠️  d7030: 35,000 BLOBs detected (42GB)
+   
+⚙️  Automatically reducing parallelism: 2 → 1 (sequential)
+💡 Reason: Restoring many large objects in parallel can exhaust the shared lock table
+```
+
+### 4. **Disk Space Warnings**
+
+**Problem**: Backup fails silently when disk is full.
+
+**Solutions**:
+- 🔄 **NEW**: Pre-flight check before backup:
+  ```
+  📊 Disk Space Check:
+     Database size: 42GB
+     Available space: 66GB
+     Estimated backup: ~15GB (compressed)
+     ✓ Sufficient space available
+  ```
+- 🔄 **NEW**: Warning at 80% disk usage
+- 🔄 **NEW**: Block operation at 95% disk usage
+
+### 5. **Cancellation Handling (Ctrl+C)**
+
+**Problem**: Users don't know if Ctrl+C will work or leave partial backups.
+
+**Solutions**:
+- ✅ Graceful cancellation on Ctrl+C
+- 🔄 **NEW**: Show cleanup message:
+  ```
+  ^C received - Cancelling backup...
+  🧹 Cleaning up temporary files...
+  ✓ Cleanup complete - no partial backups left
+  ```
+- 🔄 **NEW**: Confirmation prompt for cluster operations:
+  ```
+  ⚠️  Cluster backup in progress (3/10 databases)
+  Are you sure you want to cancel? (y/N)
+  ```
+
+### 6. **Interactive Mode Navigation**
+
+**Problem**: The TUI menu is basic: it has no keyboard shortcuts and no search.
+
+**Solutions**:
+- 🔄 **NEW**: Keyboard shortcuts:
+  - `1-9`: Quick jump to menu items
+  - `q`: Quit
+  - `r`: Refresh status
+  - `/`: Search backups
+- 🔄 **NEW**: Backup list improvements:
+  ```
+  📦 Available Backups:
+  
+  1. cluster_20251118_103045.tar.gz  [45GB]  ⏱ 2 hours ago
+     ├─ postgres (325MB)
+     ├─ d7030 (42GB) ⚠️ 35K BLOBs
+     └─ template1 (8MB)
+  
+  2. cluster_20251112_084329.tar.gz  [38GB]  ⏱ 6 days ago
+     └─ ⚠️ WARNING: May contain corrupted resydb dump
+  ```
+- 🔄 **NEW**: Filter/sort options: by date, by size, by status
+
+### 7. **Configuration Recommendations**
+
+**Problem**: Users don't know optimal settings for their workload.
+
+**Solutions**:
+- 🔄 **NEW**: Auto-detect and suggest settings on first run:
+  ```
+  🔧 System Configuration Detected:
+     RAM: 32GB → Recommended: shared_buffers=8GB
+     CPUs: 4 cores → Recommended: parallel_jobs=3
+     Disk: 66GB free → Recommended: max backup size: 50GB
+     
+  Apply these settings? (Y/n)
+  ```
+- 🔄 **NEW**: Show current vs recommended config in menu:
+  ```
+  ⚙️  Configuration Status:
+     max_locks_per_transaction: 256 ⚠️ (recommend: 512 for 35K large objects)
+     maintenance_work_mem: 64MB ⚠️ (recommend: 1GB for faster restores)
+     shared_buffers: 128MB ⚠️ (recommend: 8GB with 32GB RAM)
+  ```
+
+### 8. **Backup Verification & Health**
+
+**Problem**: No way to verify backup integrity before restore.
+
+**Solutions**:
+- 🔄 **NEW**: Add "Verify Backup" menu option:
+  ```
+  🔍 Verifying backup: cluster_20251118_103045.tar.gz
+     ✓ Archive integrity: OK
+     ✓ Extracting metadata...
+     ✓ Checking dump formats...
+     
+  Databases found:
+     ✓ postgres: Custom format, 325MB
+     ✓ d7030: Custom format, 42GB, 35,000 BLOBs
+     ⚠️  resydb: CORRUPTED - 2.5M syntax errors detected
+     
+  Overall: ⚠️ Partial (2/3 databases healthy)
+  ```
+- 🔄 **NEW**: Show last backup status in main menu
+
+### 9. **Restore Dry Run**
+
+**Problem**: No preview of what will be restored.
+
+**Solution**:
+```
+🎬 Restore Preview (Dry Run):
+
+Source backup: cluster_20251118_103045.tar.gz
+Databases to restore:
+  1. postgres (325MB)
+     - Will overwrite: 5 existing objects
+     - New objects: 120
+     
+  2. d7030 (42GB, 35K BLOBs)
+     - Will DROP and recreate database
+     - Estimated time: 25-30 minutes
+     - Required locks: 35,000 (available: 25,600) ⚠️
+     
+⚠️  WARNING: Insufficient locks for d7030
+💡 Solution: Increase max_locks_per_transaction to 512
+
+Proceed with restore? (y/N)
+```
+
+### 10. **Multi-Step Wizards**
+
+**Problem**: Complex operations (like cluster restore with --clean) need multiple confirmations.
+
+**Solution**: Step-by-step wizard:
+```
+Step 1/4: Select backup
+Step 2/4: Review databases to restore
+Step 3/4: Check prerequisites (disk space, locks, etc.)
+Step 4/4: Confirm and execute
+```
+
+## Implementation Priority
+
+### Phase 1 (High Impact, Low Effort) ✅
+- ✅ ETA estimators
+- ✅ Large object detection warnings
+- ✅ Ctrl+C handling
+- ✅ Ignorable error detection
+
+### Phase 2 (High Impact, Medium Effort) 🔄
+- Real-time progress bars with MB/s
+- Disk space pre-flight checks
+- Backup verification tool
+- Error hints and suggestions
+
+### Phase 3 (Quality of Life) 🔄
+- Keyboard shortcuts
+- Backup list with metadata
+- Configuration recommendations
+- Restore dry run
+
+### Phase 4 (Advanced) 📋
+- Multi-step wizards
+- Search/filter backups
+- Auto-retry failed databases
+- Parallel restore progress split-view
+
+## Code Structure
+
+```
+internal/tui/
+  menu.go          - Main interactive menu
+  backup_menu.go   - Backup wizard
+  restore_menu.go  - Restore wizard
+  verify_menu.go   - Backup verification (NEW)
+  config_menu.go   - Configuration tuning (NEW)
+  progress_view.go - Real-time progress display (ENHANCED)
+  errors.go        - Error classification & hints (NEW)
+```
+
+## Testing Plan
+
+1. **Large Database Test** (In Progress)
+   - 42GB d7030 with 35K BLOBs
+   - Verify progress updates
+   - Verify large object detection
+   - Verify successful restore
+
+2. **Error Scenarios**
+   - Corrupted dump file
+   - Insufficient disk space
+   - Insufficient locks
+   - Network interruption
+   - Ctrl+C during operations
+
+3. **Performance**
+   - Backup time vs raw pg_dump
+   - Restore time vs raw pg_restore
+   - Memory usage during 40GB+ operations
+   - CPU utilization with parallel workers
+
+## Success Metrics
+
+- ✅ No "black box" operations - user always knows what's happening
+- ✅ Errors are actionable - user knows what to fix
+- ✅ Safe operations - confirmations for destructive actions
+- ✅ Fast feedback - progress updates every 1-2 seconds
+- ✅ Professional feel - polished, consistent, intuitive
--- a/test_backup_restore.sh
+++ b/test_backup_restore.sh
@@ -0,0 +1,51 @@
+#!/bin/bash
+set -eo pipefail  # pipefail: a failed backup/restore must not be masked by tee
+
+LOG="/var/lib/pgsql/dbbackup_test.log"
+
+echo "=== Database Backup/Restore Test ===" | tee $LOG
+echo "Started: $(date)" | tee -a $LOG
+echo "" | tee -a $LOG
+
+cd /root/dbbackup
+
+# Step 1: Cluster Backup
+echo "STEP 1: Creating cluster backup..." | tee -a $LOG
+sudo -u postgres ./dbbackup backup cluster --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a $LOG
+BACKUP_FILE=$(ls -t /var/lib/pgsql/db_backups/cluster_*.tar.gz | head -1)
+echo "Backup created: $BACKUP_FILE" | tee -a $LOG
+echo "Backup size: $(ls -lh "$BACKUP_FILE" | awk '{print $5}')" | tee -a $LOG
+echo "" | tee -a $LOG
+
+# Step 2: Drop d7030 database to prepare for restore test
+echo "STEP 2: Dropping d7030 database for clean restore test..." | tee -a $LOG
+sudo -u postgres psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'd7030' AND pid <> pg_backend_pid();" 2>&1 | tee -a $LOG
+sudo -u postgres psql -d postgres -c "DROP DATABASE IF EXISTS d7030;" 2>&1 | tee -a $LOG
+echo "d7030 database dropped" | tee -a $LOG
+echo "" | tee -a $LOG
+
+# Step 3: Cluster Restore
+echo "STEP 3: Restoring cluster from backup..." | tee -a $LOG
+sudo -u postgres ./dbbackup restore cluster "$BACKUP_FILE" --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a $LOG
+echo "Restore completed" | tee -a $LOG
+echo "" | tee -a $LOG
+
+# Step 4: Verify restored data
+echo "STEP 4: Verifying restored databases..." | tee -a $LOG
+sudo -u postgres psql -d postgres -c "\l" 2>&1 | tee -a $LOG
+echo "" | tee -a $LOG
+echo "Checking d7030 large objects..." | tee -a $LOG
+BLOB_COUNT=$(sudo -u postgres psql -d d7030 -At -c "SELECT count(*) FROM pg_largeobject_metadata;" 2>/dev/null || echo "0")
+echo "Large objects in d7030: $BLOB_COUNT" | tee -a $LOG
+echo "" | tee -a $LOG
+
+# Step 5: Cleanup
+echo "STEP 5: Cleaning up test backup..." | tee -a $LOG
+rm -f "$BACKUP_FILE"
+echo "Backup file deleted: $BACKUP_FILE" | tee -a $LOG
+echo "" | tee -a $LOG
+
+echo "=== TEST COMPLETE ===" | tee -a $LOG
+echo "Finished: $(date)" | tee -a $LOG
+echo "" | tee -a $LOG
+echo "✅ Full test log available at: $LOG"
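
Section 4 of TUI_IMPROVEMENTS.md proposes a warning at 80% disk usage and a hard block at 95%. The following is a minimal sketch of how test_backup_restore.sh might apply those thresholds before STEP 1. It is not part of this commit: the function names `classify_disk_usage`, `disk_usage_pct`, and `preflight_disk_check` are hypothetical, and the sketch assumes `df -P` reports the usage percentage in the fifth column, as the POSIX output format specifies.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers; none of these names exist in dbbackup.

# Map a usage percentage to the plan's thresholds:
# below 80 is "ok", 80 to 94 is "warn", 95 and above is "block".
classify_disk_usage() {
    local used_pct=$1
    if [ "$used_pct" -ge 95 ]; then
        echo "block"
    elif [ "$used_pct" -ge 80 ]; then
        echo "warn"
    else
        echo "ok"
    fi
}

# Print the usage percentage (without the % sign) of the filesystem
# that holds the given directory. Line 1 of `df -P` is the header.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report the status for a directory; return non-zero only when blocked.
preflight_disk_check() {
    local dir=$1 pct status
    pct=$(disk_usage_pct "$dir")
    status=$(classify_disk_usage "$pct")
    echo "Disk usage for $dir: ${pct}% ($status)"
    [ "$status" != "block" ]
}
```

Under `set -e`, a call such as `preflight_disk_check /var/lib/pgsql/db_backups` placed ahead of the backup step would stop the script before any database is dropped if the backup directory is nearly full. Keeping the threshold logic in its own function separates it from the `df` parsing, so it can be checked without touching a real filesystem.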