Add comprehensive TUI improvement plan and background test script

- Created TUI_IMPROVEMENTS.md with 10 major UX enhancements
- Prioritized improvements into 4 phases (Phase 1 already complete)
- Created test_backup_restore.sh for safe background testing
- Plan includes: real-time progress, error hints, disk checks, backup verification
- Focus on making operations transparent, actionable, and professional
- Background test running: backup → restore → verify → cleanup cycle
250
TUI_IMPROVEMENTS.md
Normal file
@@ -0,0 +1,250 @@
# Interactive TUI Experience Improvements

## Current Issues & Solutions

### 1. **Progress Visibility During Long Operations**

**Problem**: Cluster backup/restore with large databases (40GB+) takes 30+ minutes with minimal feedback.

**Solutions**:
- ✅ Show current database being processed
- ✅ Display database size before backup/restore starts
- ✅ ETA estimator for multi-database operations
- 🔄 **NEW**: Real-time progress bar per database (bytes processed / total bytes)
- 🔄 **NEW**: Show current operation speed (MB/s)
- 🔄 **NEW**: Percentage complete for entire cluster operation

### 2. **Error Handling & Recovery**

**Problem**: When a restore fails (like resydb with 2.5M errors), the user gets no context about why it failed or what to do next.

**Solutions**:
- ✅ Distinguish ignorable errors (already exists) from critical errors
- 🔄 **NEW**: Show error classification in TUI:
  ```
  ⚠️  WARNING: 5 ignorable errors (objects already exist)
  ❌ CRITICAL: Syntax errors detected - dump file may be corrupted
  💡 HINT: Re-create backup with: dbbackup backup single resydb
  ```
- 🔄 **NEW**: Offer retry option for failed databases
- 🔄 **NEW**: Skip vs Abort choice for non-critical failures

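The classification described above could be sketched as a small function that maps one error line to a severity and a hint. This is an illustration only: the `Classify` and `Summarize` names, the substring patterns, and the hint texts are assumptions, not the planned `errors.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// Severity is a hypothetical two-level error classification.
type Severity int

const (
	Ignorable Severity = iota
	Critical
)

// Classification pairs a severity with a user-facing hint.
type Classification struct {
	Severity Severity
	Hint     string
}

// Classify inspects one error line from restore output.
// The substring patterns are assumptions chosen for illustration.
func Classify(line string) Classification {
	l := strings.ToLower(line)
	switch {
	case strings.Contains(l, "already exists"):
		return Classification{Ignorable, "object already exists; safe to ignore"}
	case strings.Contains(l, "syntax error"):
		return Classification{Critical, "dump file may be corrupted; re-create the backup"}
	default:
		// Unknown errors are treated as critical so they are never hidden.
		return Classification{Critical, "unrecognized error; check the full log"}
	}
}

// Summarize counts ignorable and critical lines.
func Summarize(lines []string) (ignorable, critical int) {
	for _, line := range lines {
		if Classify(line).Severity == Ignorable {
			ignorable++
		} else {
			critical++
		}
	}
	return ignorable, critical
}

func main() {
	lines := []string{
		`ERROR:  relation "users" already exists`,
		`ERROR:  syntax error at or near "1"`,
	}
	ignorable, critical := Summarize(lines)
	fmt.Printf("WARNING: %d ignorable errors\n", ignorable)
	fmt.Printf("CRITICAL: %d critical errors\n", critical)
}
```

Defaulting unknown lines to critical is a deliberate choice in this sketch: a misclassified warning is cheaper than a hidden failure.
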
### 3. **Large Object Detection Feedback**

**Problem**: User doesn't know WHY parallelism was reduced.

**Solution**:
```
🔍 Scanning cluster backup for large objects...
  ✓ postgres: No large objects
  ⚠️  d7030: 35,000 BLOBs detected (42GB)

⚙️  Automatically reducing parallelism: 2 → 1 (sequential)
💡 Reason: Restoring large objects in parallel can exhaust the shared lock table
```

### 4. **Disk Space Warnings**

**Problem**: Backup fails silently when disk is full.

**Solutions**:
- 🔄 **NEW**: Pre-flight check before backup:
  ```
  📊 Disk Space Check:
    Database size: 42GB
    Available space: 66GB
    Estimated backup: ~15GB (compressed)
    ✓ Sufficient space available
  ```
- 🔄 **NEW**: Warning at 80% disk usage
- 🔄 **NEW**: Block operation at 95% disk usage

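A sketch of the pre-flight decision, assuming the 80% and 95% thresholds apply to current disk usage and that a backup estimated to exceed free space is also blocked. `CheckDisk` and `DiskStatus` are hypothetical names; the total and free byte counts are passed in so the logic stays portable (on Linux they could come from `syscall.Statfs`).

```go
package main

import "fmt"

// DiskStatus is a hypothetical result of the pre-flight check.
type DiskStatus int

const (
	DiskOK DiskStatus = iota
	DiskWarn
	DiskBlock
)

// CheckDisk applies the thresholds from the plan: warn at 80% usage,
// block at 95% usage. It also blocks when the estimated backup does not
// fit into free space (an assumption, not stated in the plan).
func CheckDisk(totalBytes, freeBytes, estimatedBackupBytes int64) (DiskStatus, string) {
	if totalBytes <= 0 {
		return DiskBlock, "cannot determine disk size"
	}
	usedPct := (totalBytes - freeBytes) * 100 / totalBytes
	switch {
	case estimatedBackupBytes > freeBytes:
		return DiskBlock, "estimated backup is larger than available space"
	case usedPct >= 95:
		return DiskBlock, fmt.Sprintf("disk usage at %d%%", usedPct)
	case usedPct >= 80:
		return DiskWarn, fmt.Sprintf("disk usage at %d%%", usedPct)
	default:
		return DiskOK, "sufficient space available"
	}
}

func main() {
	const gb = int64(1) << 30
	// Values from the example screen: 66GB free, ~15GB estimated backup.
	// The 100GB total is made up for the demo.
	status, msg := CheckDisk(100*gb, 66*gb, 15*gb)
	fmt.Println(status == DiskOK, msg)
}
```
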
### 5. **Cancellation Handling (Ctrl+C)**

**Problem**: Users don't know if Ctrl+C will work or leave partial backups.

**Solutions**:
- ✅ Graceful cancellation on Ctrl+C
- 🔄 **NEW**: Show cleanup message:
  ```
  ^C received - Cancelling backup...
  🧹 Cleaning up temporary files...
  ✓ Cleanup complete - no partial backups left
  ```
- 🔄 **NEW**: Confirmation prompt for cluster operations:
  ```
  ⚠️  Cluster backup in progress (3/10 databases)
  Are you sure you want to cancel? (y/N)
  ```

### 6. **Interactive Mode Navigation**

**Problem**: TUI menu is basic, no keyboard shortcuts, no search.

**Solutions**:
- 🔄 **NEW**: Keyboard shortcuts:
  - `1-9`: Quick jump to menu items
  - `q`: Quit
  - `r`: Refresh status
  - `/`: Search backups
- 🔄 **NEW**: Backup list improvements:
  ```
  📦 Available Backups:

  1. cluster_20251118_103045.tar.gz [45GB] ⏱ 2 hours ago
     ├─ postgres (325MB)
     ├─ d7030 (42GB) ⚠️ 35K BLOBs
     └─ template1 (8MB)

  2. cluster_20251112_084329.tar.gz [38GB] ⏱ 6 days ago
     └─ ⚠️ WARNING: May contain corrupted resydb dump
  ```
- 🔄 **NEW**: Filter/sort options: by date, by size, by status

### 7. **Configuration Recommendations**

**Problem**: Users don't know optimal settings for their workload.

**Solutions**:
- 🔄 **NEW**: Auto-detect and suggest settings on first run:
  ```
  🔧 System Configuration Detected:
    RAM: 32GB → Recommended: shared_buffers=8GB
    CPUs: 4 cores → Recommended: parallel_jobs=3
    Disk: 66GB free → Recommended: max backup size: 50GB

  Apply these settings? (Y/n)
  ```
- 🔄 **NEW**: Show current vs recommended config in menu:
  ```
  ⚙️  Configuration Status:
    max_locks_per_transaction: 256 ⚠️ (recommend: 512 for 35K large objects)
    maintenance_work_mem: 64MB ⚠️ (recommend: 1GB for faster restores)
    shared_buffers: 128MB ⚠️ (recommend: 8GB with 32GB RAM)
  ```

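The example screen suggests simple rules of thumb: `shared_buffers` at a quarter of RAM (32GB → 8GB) and `parallel_jobs` at one less than the core count (4 → 3). The sketch below encodes those two rules as inferred from the example; they are assumptions, not confirmed tuning logic, and `Recommend` is a hypothetical name. The disk rule is left out because the plan does not say how 50GB is derived from 66GB.

```go
package main

import "fmt"

// Recommendation holds suggested settings (hypothetical type).
type Recommendation struct {
	SharedBuffersGB int
	ParallelJobs    int
}

// Recommend applies two rules inferred from the example screen:
// shared_buffers = RAM / 4, parallel_jobs = CPUs - 1, both at least 1.
func Recommend(ramGB, cpus int) Recommendation {
	sharedBuffers := ramGB / 4
	if sharedBuffers < 1 {
		sharedBuffers = 1
	}
	jobs := cpus - 1
	if jobs < 1 {
		jobs = 1
	}
	return Recommendation{SharedBuffersGB: sharedBuffers, ParallelJobs: jobs}
}

func main() {
	r := Recommend(32, 4)
	fmt.Printf("RAM: 32GB -> shared_buffers=%dGB\n", r.SharedBuffersGB)
	fmt.Printf("CPUs: 4 cores -> parallel_jobs=%d\n", r.ParallelJobs)
}
```
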
### 8. **Backup Verification & Health**

**Problem**: No way to verify backup integrity before restore.

**Solutions**:
- 🔄 **NEW**: Add "Verify Backup" menu option:
  ```
  🔍 Verifying backup: cluster_20251118_103045.tar.gz
    ✓ Archive integrity: OK
    ✓ Extracting metadata...
    ✓ Checking dump formats...

  Databases found:
    ✓ postgres: Custom format, 325MB
    ✓ d7030: Custom format, 42GB, 35,000 BLOBs
    ⚠️  resydb: CORRUPTED - 2.5M syntax errors detected

  Overall: ⚠️  Partial (2/3 databases healthy)
  ```
- 🔄 **NEW**: Show last backup status in main menu

### 9. **Restore Dry Run**

**Problem**: No preview of what will be restored.

**Solution**:
```
🎬 Restore Preview (Dry Run):

Target: cluster_20251118_103045.tar.gz
Databases to restore:
  1. postgres (325MB)
     - Will overwrite: 5 existing objects
     - New objects: 120

  2. d7030 (42GB, 35K BLOBs)
     - Will DROP and recreate database
     - Estimated time: 25-30 minutes
     - Required locks: 35,000 (available: 25,600) ⚠️

⚠️  WARNING: Insufficient locks for d7030
💡 Solution: Increase max_locks_per_transaction to 512

Proceed with restore? (y/N)
```

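The lock figures in the preview follow from how PostgreSQL sizes its shared lock table: roughly `max_locks_per_transaction * (max_connections + max_prepared_transactions)` entries. With 256 locks per transaction and 100 connections that gives the 25,600 shown above. A minimal sketch of the check follows; the function names are hypothetical, and two things are assumptions: that each large object needs one lock entry, and that the suggestion is found by doubling the current value.

```go
package main

import "fmt"

// LockCapacity returns the approximate number of lock-table entries:
// max_locks_per_transaction * (max_connections + max_prepared_transactions).
func LockCapacity(maxLocksPerTx, maxConnections, maxPreparedTx int) int {
	return maxLocksPerTx * (maxConnections + maxPreparedTx)
}

// SuggestMaxLocks doubles the current setting until the capacity covers
// the required number of locks. Doubling is an assumption for this sketch.
func SuggestMaxLocks(current, required, maxConnections, maxPreparedTx int) int {
	if current < 1 {
		current = 1
	}
	if maxConnections+maxPreparedTx < 1 {
		return current // capacity cannot grow; avoid looping forever
	}
	for LockCapacity(current, maxConnections, maxPreparedTx) < required {
		current *= 2
	}
	return current
}

func main() {
	required := 35000 // one lock per large object (assumption)
	available := LockCapacity(256, 100, 0)
	fmt.Printf("Required locks: %d (available: %d)\n", required, available)
	if available < required {
		fmt.Printf("Increase max_locks_per_transaction to %d\n",
			SuggestMaxLocks(256, required, 100, 0))
	}
}
```
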
### 10. **Multi-Step Wizards**

**Problem**: Complex operations (like cluster restore with `--clean`) need multiple confirmations.

**Solution**: Step-by-step wizard:
```
Step 1/4: Select backup
Step 2/4: Review databases to restore
Step 3/4: Check prerequisites (disk space, locks, etc.)
Step 4/4: Confirm and execute
```

## Implementation Priority

### Phase 1 (High Impact, Low Effort) ✅
- ✅ ETA estimators
- ✅ Large object detection warnings
- ✅ Ctrl+C handling
- ✅ Ignorable error detection

### Phase 2 (High Impact, Medium Effort) 🔄
- Real-time progress bars with MB/s
- Disk space pre-flight checks
- Backup verification tool
- Error hints and suggestions

### Phase 3 (Quality of Life) 🔄
- Keyboard shortcuts
- Backup list with metadata
- Configuration recommendations
- Restore dry run

### Phase 4 (Advanced) 📋
- Multi-step wizards
- Search/filter backups
- Auto-retry failed databases
- Parallel restore progress split-view

## Code Structure

```
internal/tui/
  menu.go          - Main interactive menu
  backup_menu.go   - Backup wizard
  restore_menu.go  - Restore wizard
  verify_menu.go   - Backup verification (NEW)
  config_menu.go   - Configuration tuning (NEW)
  progress_view.go - Real-time progress display (ENHANCED)
  errors.go        - Error classification & hints (NEW)
```

## Testing Plan

1. **Large Database Test** (In Progress)
   - 42GB d7030 with 35K BLOBs
   - Verify progress updates
   - Verify large object detection
   - Verify successful restore

2. **Error Scenarios**
   - Corrupted dump file
   - Insufficient disk space
   - Insufficient locks
   - Network interruption
   - Ctrl+C during operations

3. **Performance**
   - Backup time vs raw pg_dump
   - Restore time vs raw pg_restore
   - Memory usage during 40GB+ operations
   - CPU utilization with parallel workers

## Success Metrics

- ✅ No "black box" operations - user always knows what's happening
- ✅ Errors are actionable - user knows what to fix
- ✅ Safe operations - confirmations for destructive actions
- ✅ Fast feedback - progress updates every 1-2 seconds
- ✅ Professional feel - polished, consistent, intuitive
51
test_backup_restore.sh
Executable file
@@ -0,0 +1,51 @@
#!/bin/bash
set -euo pipefail  # pipefail: a failed backup/restore must not be masked by tee

LOG="/var/lib/pgsql/dbbackup_test.log"

echo "=== Database Backup/Restore Test ===" | tee "$LOG"
echo "Started: $(date)" | tee -a "$LOG"
echo "" | tee -a "$LOG"

cd /root/dbbackup

# Step 1: Cluster Backup
echo "STEP 1: Creating cluster backup..." | tee -a "$LOG"
sudo -u postgres ./dbbackup backup cluster --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a "$LOG"
BACKUP_FILE=$(ls -t /var/lib/pgsql/db_backups/cluster_*.tar.gz | head -n 1)  # newest archive
echo "Backup created: $BACKUP_FILE" | tee -a "$LOG"
echo "Backup size: $(ls -lh "$BACKUP_FILE" | awk '{print $5}')" | tee -a "$LOG"
echo "" | tee -a "$LOG"

# Step 2: Drop d7030 database to prepare for restore test
echo "STEP 2: Dropping d7030 database for clean restore test..." | tee -a "$LOG"
sudo -u postgres psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'd7030' AND pid <> pg_backend_pid();" 2>&1 | tee -a "$LOG"
sudo -u postgres psql -d postgres -c "DROP DATABASE IF EXISTS d7030;" 2>&1 | tee -a "$LOG"
echo "d7030 database dropped" | tee -a "$LOG"
echo "" | tee -a "$LOG"

# Step 3: Cluster Restore
echo "STEP 3: Restoring cluster from backup..." | tee -a "$LOG"
sudo -u postgres ./dbbackup restore cluster "$BACKUP_FILE" --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a "$LOG"
echo "Restore completed" | tee -a "$LOG"
echo "" | tee -a "$LOG"

# Step 4: Verify restored data
echo "STEP 4: Verifying restored databases..." | tee -a "$LOG"
sudo -u postgres psql -d postgres -c "\l" 2>&1 | tee -a "$LOG"
echo "" | tee -a "$LOG"
echo "Checking d7030 large objects..." | tee -a "$LOG"
BLOB_COUNT=$(sudo -u postgres psql -d d7030 -tA -c "SELECT count(*) FROM pg_largeobject_metadata;" 2>/dev/null || echo "0")  # -tA: bare number, no padding
echo "Large objects in d7030: $BLOB_COUNT" | tee -a "$LOG"
echo "" | tee -a "$LOG"

# Step 5: Cleanup
echo "STEP 5: Cleaning up test backup..." | tee -a "$LOG"
rm -f "$BACKUP_FILE"
echo "Backup file deleted: $BACKUP_FILE" | tee -a "$LOG"
echo "" | tee -a "$LOG"

echo "=== TEST COMPLETE ===" | tee -a "$LOG"
echo "Finished: $(date)" | tee -a "$LOG"
echo "" | tee -a "$LOG"
echo "✅ Full test log available at: $LOG"