diff --git a/TUI_IMPROVEMENTS.md b/TUI_IMPROVEMENTS.md
new file mode 100644
index 0000000..307f4ed
--- /dev/null
+++ b/TUI_IMPROVEMENTS.md
@@ -0,0 +1,250 @@
+# Interactive TUI Experience Improvements
+
+## Current Issues & Solutions
+
+### 1. **Progress Visibility During Long Operations**
+
+**Problem**: Cluster backup/restore with large databases (40GB+) takes 30+ minutes with minimal feedback.
+
+**Solutions**:
+- ✅ Show current database being processed
+- ✅ Display database size before backup/restore starts
+- ✅ ETA estimator for multi-database operations
+- 🔄 **NEW**: Real-time progress bar per database (bytes processed / total bytes)
+- 🔄 **NEW**: Show current operation speed (MB/s)
+- 🔄 **NEW**: Percentage complete for entire cluster operation
+
+### 2. **Error Handling & Recovery**
+
+**Problem**: When a restore fails (for example, resydb with 2.5M errors), the user has no context about WHY it failed or WHAT to do next.
+
+**Solutions**:
+- ✅ Distinguish ignorable errors (object already exists) from critical errors
+- 🔄 **NEW**: Show error classification in TUI:
+  ```
+  ⚠️ WARNING: 5 ignorable errors (objects already exist)
+  ❌ CRITICAL: Syntax errors detected - dump file may be corrupted
+  💡 HINT: Re-create backup with: dbbackup backup single resydb
+  ```
+- 🔄 **NEW**: Offer retry option for failed databases
+- 🔄 **NEW**: Skip vs Abort choice for non-critical failures
+
+### 3. **Large Object Detection Feedback**
+
+**Problem**: Users don't know WHY parallelism was reduced.
+
+**Solution**:
+```
+🔍 Scanning cluster backup for large objects...
+  ✓ postgres: No large objects
+  ⚠️ d7030: 35,000 BLOBs detected (42GB)
+
+⚙️ Automatically reducing parallelism: 2 → 1 (sequential)
+💡 Reason: Restoring large objects consumes many lock table entries; parallel restores risk exhausting the shared lock table
+```
+
+### 4. **Disk Space Warnings**
+
+**Problem**: Backup fails silently when the disk is full.
+
+**Solutions**:
+- 🔄 **NEW**: Pre-flight check before backup:
+  ```
+  📊 Disk Space Check:
+    Database size: 42GB
+    Available space: 66GB
+    Estimated backup: ~15GB (compressed)
+  ✓ Sufficient space available
+  ```
+- 🔄 **NEW**: Warning at 80% disk usage
+- 🔄 **NEW**: Block operation at 95% disk usage
+
+### 5. **Cancellation Handling (Ctrl+C)**
+
+**Problem**: Users don't know whether Ctrl+C will work or will leave partial backups behind.
+
+**Solutions**:
+- ✅ Graceful cancellation on Ctrl+C
+- 🔄 **NEW**: Show cleanup message:
+  ```
+  ^C received - Cancelling backup...
+  🧹 Cleaning up temporary files...
+  ✓ Cleanup complete - no partial backups left
+  ```
+- 🔄 **NEW**: Confirmation prompt for cluster operations:
+  ```
+  ⚠️ Cluster backup in progress (3/10 databases)
+  Are you sure you want to cancel? (y/N)
+  ```
+
+### 6. **Interactive Mode Navigation**
+
+**Problem**: The TUI menu is basic: no keyboard shortcuts and no search.
+
+**Solutions**:
+- 🔄 **NEW**: Keyboard shortcuts:
+  - `1-9`: Quick jump to menu items
+  - `q`: Quit
+  - `r`: Refresh status
+  - `/`: Search backups
+- 🔄 **NEW**: Backup list improvements:
+  ```
+  📦 Available Backups:
+
+  1. cluster_20251118_103045.tar.gz [45GB] ⏱ 2 hours ago
+     ├─ postgres (325MB)
+     ├─ d7030 (42GB) ⚠️ 35K BLOBs
+     └─ template1 (8MB)
+
+  2. cluster_20251112_084329.tar.gz [38GB] ⏱ 6 days ago
+     └─ ⚠️ WARNING: May contain corrupted resydb dump
+  ```
+- 🔄 **NEW**: Filter/sort options: by date, by size, by status
+
+### 7. **Configuration Recommendations**
+
+**Problem**: Users don't know the optimal settings for their workload.
+
+**Solutions**:
+- 🔄 **NEW**: Auto-detect and suggest settings on first run:
+  ```
+  🔧 System Configuration Detected:
+    RAM: 32GB → Recommended: shared_buffers=8GB
+    CPUs: 4 cores → Recommended: parallel_jobs=3
+    Disk: 66GB free → Recommended: max backup size 50GB
+
+  Apply these settings? (Y/n)
+  ```
+- 🔄 **NEW**: Show current vs recommended config in menu:
+  ```
+  ⚙️ Configuration Status:
+    max_locks_per_transaction: 256 ⚠️ (insufficient for 35K objects; recommend: 512)
+    maintenance_work_mem: 64MB ⚠️ (recommend: 1GB for faster restores)
+    shared_buffers: 128MB ⚠️ (recommend: 8GB with 32GB RAM)
+  ```
+
+### 8. **Backup Verification & Health**
+
+**Problem**: There is no way to verify backup integrity before a restore.
+
+**Solutions**:
+- 🔄 **NEW**: Add "Verify Backup" menu option:
+  ```
+  🔍 Verifying backup: cluster_20251118_103045.tar.gz
+  ✓ Archive integrity: OK
+  ✓ Extracting metadata...
+  ✓ Checking dump formats...
+
+  Databases found:
+    ✓ postgres: Custom format, 325MB
+    ✓ d7030: Custom format, 42GB, 35,000 BLOBs
+    ⚠️ resydb: CORRUPTED - 2.5M syntax errors detected
+
+  Overall: ⚠️ Partial (2/3 databases healthy)
+  ```
+- 🔄 **NEW**: Show last backup status in main menu
+
+### 9. **Restore Dry Run**
+
+**Problem**: There is no preview of what a restore will do.
+
+**Solution**:
+```
+🎬 Restore Preview (Dry Run):
+
+Target: cluster_20251118_103045.tar.gz
+Databases to restore:
+  1. postgres (325MB)
+     - Will overwrite: 5 existing objects
+     - New objects: 120
+
+  2. d7030 (42GB, 35K BLOBs)
+     - Will DROP and recreate database
+     - Estimated time: 25-30 minutes
+     - Required locks: 35,000 (available: 25,600) ⚠️
+
+⚠️ WARNING: Insufficient locks for d7030
+💡 Solution: Increase max_locks_per_transaction to 512
+
+Proceed with restore? (y/N)
+```
+
+### 10. **Multi-Step Wizards**
+
+**Problem**: Complex operations (such as cluster restore with --clean) need multiple confirmations.
+
+**Solution**: Step-by-step wizard:
+```
+Step 1/4: Select backup
+Step 2/4: Review databases to restore
+Step 3/4: Check prerequisites (disk space, locks, etc.)
+Step 4/4: Confirm and execute
+```
+
+## Implementation Priority
+
+### Phase 1 (High Impact, Low Effort) ✅
+- ✅ ETA estimators
+- ✅ Large object detection warnings
+- ✅ Ctrl+C handling
+- ✅ Ignorable error detection
+
+### Phase 2 (High Impact, Medium Effort) 🔄
+- Real-time progress bars with MB/s
+- Disk space pre-flight checks
+- Backup verification tool
+- Error hints and suggestions
+
+### Phase 3 (Quality of Life) 🔄
+- Keyboard shortcuts
+- Backup list with metadata
+- Configuration recommendations
+- Restore dry run
+
+### Phase 4 (Advanced) 📋
+- Multi-step wizards
+- Search/filter backups
+- Auto-retry failed databases
+- Parallel restore progress split-view
+
+## Code Structure
+
+```
+internal/tui/
+  menu.go          - Main interactive menu
+  backup_menu.go   - Backup wizard
+  restore_menu.go  - Restore wizard
+  verify_menu.go   - Backup verification (NEW)
+  config_menu.go   - Configuration tuning (NEW)
+  progress_view.go - Real-time progress display (ENHANCED)
+  errors.go        - Error classification & hints (NEW)
+```
+
+## Testing Plan
+
+1. **Large Database Test** (In Progress)
+   - 42GB d7030 with 35K BLOBs
+   - Verify progress updates
+   - Verify large object detection
+   - Verify successful restore
+
+2. **Error Scenarios**
+   - Corrupted dump file
+   - Insufficient disk space
+   - Insufficient locks
+   - Network interruption
+   - Ctrl+C during operations
+
+3. **Performance**
+   - Backup time vs raw pg_dump
+   - Restore time vs raw pg_restore
+   - Memory usage during 40GB+ operations
+   - CPU utilization with parallel workers
+
+## Success Metrics
+
+- ✅ No "black box" operations - user always knows what's happening
+- ✅ Errors are actionable - user knows what to fix
+- ✅ Safe operations - confirmations for destructive actions
+- ✅ Fast feedback - progress updates every 1-2 seconds
+- ✅ Professional feel - polished, consistent, intuitive
diff --git a/test_backup_restore.sh b/test_backup_restore.sh
new file mode 100755
index 0000000..494c856
--- /dev/null
+++ b/test_backup_restore.sh
@@ -0,0 +1,51 @@
+#!/bin/bash
+set -euo pipefail
+
+LOG="/var/lib/pgsql/dbbackup_test.log"
+
+echo "=== Database Backup/Restore Test ===" | tee "$LOG"
+echo "Started: $(date)" | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+
+cd /root/dbbackup
+
+# Step 1: Cluster Backup
+echo "STEP 1: Creating cluster backup..." | tee -a "$LOG"
+sudo -u postgres ./dbbackup backup cluster --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a "$LOG"
+BACKUP_FILE=$(ls -t /var/lib/pgsql/db_backups/cluster_*.tar.gz | head -n 1)
+echo "Backup created: $BACKUP_FILE" | tee -a "$LOG"
+echo "Backup size: $(du -h "$BACKUP_FILE" | cut -f1)" | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+
+# Step 2: Drop d7030 database to prepare for restore test
+echo "STEP 2: Dropping d7030 database for clean restore test..." | tee -a "$LOG"
+sudo -u postgres psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'd7030' AND pid <> pg_backend_pid();" 2>&1 | tee -a "$LOG"
+sudo -u postgres psql -d postgres -c "DROP DATABASE IF EXISTS d7030;" 2>&1 | tee -a "$LOG"
+echo "d7030 database dropped" | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+
+# Step 3: Cluster Restore
+echo "STEP 3: Restoring cluster from backup..." | tee -a "$LOG"
+sudo -u postgres ./dbbackup restore cluster "$BACKUP_FILE" --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a "$LOG"
+echo "Restore completed" | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+
+# Step 4: Verify restored data
+echo "STEP 4: Verifying restored databases..." | tee -a "$LOG"
+sudo -u postgres psql -d postgres -c "\l" 2>&1 | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+echo "Checking d7030 large objects..." | tee -a "$LOG"
+BLOB_COUNT=$(sudo -u postgres psql -d d7030 -tA -c "SELECT count(*) FROM pg_largeobject_metadata;" 2>/dev/null || echo "0")
+echo "Large objects in d7030: $BLOB_COUNT" | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+
+# Step 5: Cleanup
+echo "STEP 5: Cleaning up test backup..." | tee -a "$LOG"
+rm -f "$BACKUP_FILE"
+echo "Backup file deleted: $BACKUP_FILE" | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+
+echo "=== TEST COMPLETE ===" | tee -a "$LOG"
+echo "Finished: $(date)" | tee -a "$LOG"
+echo "" | tee -a "$LOG"
+echo "✅ Full test log available at: $LOG"
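Section 2 plans to separate ignorable errors (objects that already exist) from critical ones such as syntax errors. One way to sketch this, assuming restore output is scanned line by line, is plain substring matching. The pattern lists, `classify` and `summarize` are hypothetical, and the sample lines in `main` are made up for illustration rather than captured from a real run.

```go
package main

import (
	"fmt"
	"strings"
)

// severity is a hypothetical classification level, not a dbbackup type.
type severity int

const (
	sevInfo severity = iota
	sevIgnorable
	sevCritical
)

// Illustrative substring patterns (assumptions, not a complete catalogue of
// PostgreSQL messages). Critical patterns are checked first.
var (
	criticalPatterns  = []string{"syntax error", "invalid input syntax", "out of shared memory"}
	ignorablePatterns = []string{"already exists"}
)

// classify assigns a severity to one line of restore output.
func classify(line string) severity {
	l := strings.ToLower(line)
	for _, p := range criticalPatterns {
		if strings.Contains(l, p) {
			return sevCritical
		}
	}
	for _, p := range ignorablePatterns {
		if strings.Contains(l, p) {
			return sevIgnorable
		}
	}
	// Any other "error:" line is unknown, so it is treated as critical.
	if strings.Contains(l, "error:") {
		return sevCritical
	}
	return sevInfo
}

// summarize counts ignorable and critical lines for the TUI summary.
func summarize(lines []string) (ignorable, critical int) {
	for _, ln := range lines {
		switch classify(ln) {
		case sevIgnorable:
			ignorable++
		case sevCritical:
			critical++
		}
	}
	return ignorable, critical
}

func main() {
	// Made-up sample lines for illustration.
	out := []string{
		`pg_restore: error: could not execute query: ERROR:  relation "users" already exists`,
		`ERROR:  syntax error at or near "COPY"`,
		`restoring table data for orders`,
	}
	ign, crit := summarize(out)
	fmt.Printf("WARNING: %d ignorable, CRITICAL: %d\n", ign, crit)
}
```

Checking critical patterns before ignorable ones means a line matching both is never downgraded, and unrecognised `error:` lines default to critical rather than being silently ignored.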
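Section 4 proposes a warning at 80% disk usage and blocking the operation at 95%. The document does not say whether current or projected usage is meant; the sketch below assumes the thresholds apply to projected usage after the estimated backup is written, and that total and free bytes come from the filesystem (on Linux, for example, via `syscall.Statfs`). `checkDisk` and `diskVerdict` are hypothetical, and the 200 GiB filesystem size in `main` is an assumption.

```go
package main

import "fmt"

// diskVerdict is a hypothetical result type for the pre-flight check.
type diskVerdict string

const (
	diskOK    diskVerdict = "ok"
	diskWarn  diskVerdict = "warn"
	diskBlock diskVerdict = "block"
)

const gib = 1 << 30

// checkDisk projects filesystem usage after a backup of estimatedBackup bytes
// is written and applies the 80% (warn) and 95% (block) thresholds.
func checkDisk(totalBytes, freeBytes, estimatedBackup uint64) (diskVerdict, float64) {
	if totalBytes == 0 {
		return diskBlock, 0 // unknown filesystem size: refuse rather than guess
	}
	// Float arithmetic avoids unsigned underflow if freeBytes > totalBytes.
	projected := float64(totalBytes) - float64(freeBytes) + float64(estimatedBackup)
	pct := projected / float64(totalBytes) * 100
	switch {
	case pct >= 95:
		return diskBlock, pct
	case pct >= 80:
		return diskWarn, pct
	default:
		return diskOK, pct
	}
}

func main() {
	// 66 GiB free and a ~15 GiB estimate as in the mock-up; 200 GiB total is assumed.
	v, pct := checkDisk(200*gib, 66*gib, 15*gib)
	fmt.Printf("projected usage %.1f%%: %s\n", pct, v)
}
```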
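Section 7's mock-up maps 32GB of RAM to shared_buffers=8GB and 4 cores to parallel_jobs=3. Those numbers fit two simple rules: 25% of RAM (the starting point the PostgreSQL documentation suggests for a dedicated server) and one job fewer than the CPU count. The Go sketch below assumes exactly those rules; `recommendSettings` is hypothetical, and RAM detection is left out because it is OS-specific.

```go
package main

import (
	"fmt"
	"runtime"
)

// recommendation is a hypothetical container for suggested settings.
type recommendation struct {
	SharedBuffersMB int
	ParallelJobs    int
}

// recommendSettings applies two rules inferred from the section 7 mock-up:
// shared_buffers = 25% of RAM, parallel_jobs = CPU cores - 1 (minimum 1).
func recommendSettings(ramMB, cpus int) recommendation {
	jobs := cpus - 1
	if jobs < 1 {
		jobs = 1
	}
	return recommendation{SharedBuffersMB: ramMB / 4, ParallelJobs: jobs}
}

func main() {
	// RAM is passed in (32 GB here); the CPU count comes from the Go runtime.
	r := recommendSettings(32*1024, runtime.NumCPU())
	fmt.Printf("shared_buffers=%dMB parallel_jobs=%d\n", r.SharedBuffersMB, r.ParallelJobs)
}
```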
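The dry-run mock-up in section 9 compares 35,000 required locks with 25,600 available and suggests max_locks_per_transaction=512. The PostgreSQL documentation sizes the shared lock table as max_locks_per_transaction × (max_connections + max_prepared_transactions), so 25,600 matches 256 × 100, assuming max_connections=100 and no prepared transactions. The sketch below reproduces that check; `lockCapacity` and `recommendMaxLocks` are hypothetical, and rounding the recommendation up to a power of two is an assumption that happens to yield the 512 shown in the mock-up.

```go
package main

import "fmt"

// lockCapacity follows the lock table sizing rule in the PostgreSQL docs:
// max_locks_per_transaction * (max_connections + max_prepared_transactions).
func lockCapacity(maxLocksPerTx, maxConnections, maxPrepared int) int {
	return maxLocksPerTx * (maxConnections + maxPrepared)
}

// recommendMaxLocks returns the smallest power of two, starting from the
// PostgreSQL default of 64, whose capacity covers `required` locks.
// The power-of-two rounding is a stylistic choice, not a PostgreSQL rule.
func recommendMaxLocks(required, maxConnections, maxPrepared int) int {
	slots := maxConnections + maxPrepared
	if slots <= 0 {
		return 0
	}
	perTx := (required + slots - 1) / slots // ceiling division
	rec := 64
	for rec < perTx {
		rec *= 2
	}
	return rec
}

func main() {
	// Assumes max_connections=100 and max_prepared_transactions=0.
	avail := lockCapacity(256, 100, 0)
	need := 35000
	if need > avail {
		fmt.Printf("insufficient locks: need %d, have %d; try max_locks_per_transaction=%d\n",
			need, avail, recommendMaxLocks(need, 100, 0))
	}
}
```

max_locks_per_transaction can only be set at server start, so the TUI hint should also mention that a PostgreSQL restart is needed.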