Remove obsolete development documentation and test scripts

Removed files (features now implemented in production code):
- CLUSTER_RESTORE_COMPLIANCE.md - cluster restore best practices implemented
- LARGE_OBJECT_RESTORE_FIX.md - large object fixes applied (--single-transaction removed)
- PHASE2_COMPLETION.md - Phase 2 TUI improvements completed
- TUI_IMPROVEMENTS.md - all TUI enhancements implemented
- create_d7030_test.sh - test database no longer needed
- fix_max_locks.sh - fix applied to codebase
- test_backup_restore.sh - superseded by production features
- test_build - build artifact
- verify_backup_blobs.sh - verification built into restore process

All features described in these files are now part of the main codebase and are documented in README.md.
2025-11-19 05:07:08 +00:00
parent 6831d96dba
commit 0a6aec5801
9 changed files with 0 additions and 1277 deletions

View File

@@ -1,168 +0,0 @@
# PostgreSQL Cluster Restore - Best Practices Compliance Check
## ✅ Current Implementation Status
### Our Cluster Restore Process (internal/restore/engine.go)
Based on PostgreSQL official documentation and best practices, our implementation follows the correct approach:
## 1. ✅ Global Objects Restoration (FIRST)
```go
// Lines 505-528: Restore globals BEFORE databases
globalsFile := filepath.Join(tempDir, "globals.sql")
if _, err := os.Stat(globalsFile); err == nil {
    e.restoreGlobals(ctx, globalsFile) // Restores roles, tablespaces FIRST
}
```
**Why:** Roles and tablespaces must exist before restoring databases that reference them.
## 2. ✅ Proper Database Cleanup (DROP IF EXISTS)
```go
// Lines 600-605: Drop existing database completely
e.dropDatabaseIfExists(ctx, dbName)
```
### dropDatabaseIfExists implementation (lines 835-870):
```go
// Step 1: Terminate all active connections
terminateConnections(ctx, dbName)

// Step 2: Wait for termination
time.Sleep(500 * time.Millisecond)

// Step 3: Drop database with IF EXISTS
// Executes: DROP DATABASE IF EXISTS "dbName"
```
**PostgreSQL Docs**: "The `--clean` option can be useful even when your intention is to restore the dump script into a fresh cluster. Use of `--clean` authorizes the script to drop and re-create the built-in postgres and template1 databases."
## 3. ✅ Template0 for Database Creation
```sql
-- Line 915: Use template0 to avoid duplicate definitions
CREATE DATABASE "dbName" WITH TEMPLATE template0
```
**Why:** `template0` is truly empty, whereas `template1` may have local additions that cause "duplicate definition" errors.
**PostgreSQL Docs (pg_restore)**: "To make an empty database without any local additions, copy from template0 not template1, for example: CREATE DATABASE foo WITH TEMPLATE template0;"
## 4. ✅ Connection Termination Before Drop
```sql
-- Lines 800-833: terminateConnections function
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'dbname'
  AND pid <> pg_backend_pid()
```
**Why:** Cannot drop a database with active connections. Must terminate them first.
## 5. ✅ Parallel Restore with Worker Pool
```go
// Lines 555-571: Parallel restore implementation
parallelism := e.cfg.ClusterParallelism
semaphore := make(chan struct{}, parallelism)
// Restores multiple databases concurrently
```
**Best Practice:** Significantly speeds up cluster restore (3-5x faster).
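For reference, the worker-pool pattern above can be sketched as follows; the function name and signature are illustrative, not the engine's actual API:
```go
package restore

import (
	"context"
	"sync"
)

// restoreInParallel is a minimal sketch of the semaphore-bounded worker pool:
// at most `parallelism` databases restore concurrently, and the names of any
// failed restores are collected for the final report.
func restoreInParallel(ctx context.Context, databases []string, parallelism int,
	restoreOne func(ctx context.Context, db string) error) []string {

	if parallelism < 1 {
		parallelism = 1
	}
	semaphore := make(chan struct{}, parallelism)

	var wg sync.WaitGroup
	var mu sync.Mutex
	var failed []string

	for _, db := range databases {
		wg.Add(1)
		go func(dbName string) {
			defer wg.Done()
			semaphore <- struct{}{}        // acquire a worker slot
			defer func() { <-semaphore }() // release it when done

			if err := restoreOne(ctx, dbName); err != nil {
				mu.Lock()
				failed = append(failed, dbName)
				mu.Unlock()
			}
		}(db)
	}

	wg.Wait()
	return failed
}
```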
## 6. ✅ Error Handling and Reporting
```go
// Lines 628-645: Comprehensive error tracking
var failedDBs []string
var successCount, failCount int32
// Report failures at end
if len(failedDBs) > 0 {
    return fmt.Errorf("cluster restore completed with %d failures: %s",
        len(failedDBs), strings.Join(failedDBs, ", "))
}
```
## 7. ✅ Superuser Privilege Detection
```go
// Lines 488-503: Check for superuser
isSuperuser, err := e.checkSuperuser(ctx)
if !isSuperuser {
    e.log.Warn("Current user is not a superuser - database ownership may not be fully restored")
}
```
**Why:** Ownership restoration requires superuser privileges. Warn user if not available.
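A minimal sketch of such a check, assuming a plain `database/sql` connection (the real `checkSuperuser` may differ in how it obtains the connection):
```go
package restore

import (
	"context"
	"database/sql"
)

// checkSuperuser sketch: read the current role's rolsuper flag from pg_roles.
// The free-function form and *sql.DB parameter are assumptions for illustration.
func checkSuperuser(ctx context.Context, db *sql.DB) (bool, error) {
	var isSuper bool
	err := db.QueryRowContext(ctx,
		"SELECT rolsuper FROM pg_roles WHERE rolname = current_user").Scan(&isSuper)
	return isSuper, err
}
```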
## 8. ✅ System Database Skip Logic
```go
// Lines 877-881: Skip system databases
if dbName == "postgres" || dbName == "template0" || dbName == "template1" {
    e.log.Info("Skipping create for system database (assume exists)")
    return nil
}
```
**Why:** System databases always exist and should not be dropped/created.
---
## PostgreSQL Documentation References
### From pg_dumpall docs:
> "`-c, --clean`: Emit SQL commands to DROP all the dumped databases, roles, and tablespaces before recreating them. This option is useful when the restore is to overwrite an existing cluster."
### From managing-databases docs:
> "To destroy a database: DROP DATABASE name;"
> "You cannot drop a database while clients are connected to it. You can use pg_terminate_backend to disconnect them."
### From pg_restore docs:
> "To make an empty database without any local additions, copy from template0 not template1"
---
## Comparison with PostgreSQL Best Practices
| Practice | PostgreSQL Docs | Our Implementation | Status |
|----------|----------------|-------------------|--------|
| Restore globals first | ✅ Required | ✅ Implemented | ✅ CORRECT |
| DROP before CREATE | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
| Terminate connections | ✅ Required | ✅ Implemented | ✅ CORRECT |
| Use template0 | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
| Handle IF EXISTS errors | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
| Superuser warnings | ✅ Recommended | ✅ Implemented | ✅ CORRECT |
| Parallel restore | ⚪ Optional | ✅ Implemented | ✅ ENHANCED |
---
## Additional Safety Features (Beyond Docs)
1. **Version Compatibility Checking** (NEW)
   - Warns about PG 13 → PG 17 upgrades
   - Blocks unsupported downgrades
   - Provides recommendations
2. **Atomic Failure Tracking**
   - Thread-safe counters for parallel operations
   - Detailed error collection per database
3. **Progress Indicators**
   - Real-time ETA estimation
   - Per-database progress tracking
4. **Disk Space Validation**
   - Pre-checks available space (4x multiplier for cluster)
   - Prevents out-of-space failures mid-restore
---
## Conclusion
**Our cluster restore implementation is 100% compliant with PostgreSQL best practices.**
The cleanup process (`dropDatabaseIfExists`) correctly:
1. Terminates all connections
2. Waits for cleanup
3. Drops the database completely
4. Uses `template0` for fresh creation
5. Handles system databases appropriately
**No changes needed** - implementation follows official documentation exactly.

View File

@@ -1,165 +0,0 @@
# Large Object Restore Fix
## Problem Analysis
### Error 1: "type backup_state already exists" (postgres database)
**Root Cause**: `--single-transaction` combined with `--exit-on-error` causes entire restore to fail when objects already exist in target database.
**Why it fails**:
- `--single-transaction` wraps restore in BEGIN/COMMIT
- `--exit-on-error` aborts on ANY error (including ignorable ones)
- "already exists" errors are IGNORABLE - PostgreSQL should continue
### Error 2: "could not open large object 9646664" + 2.5M errors (resydb database)
**Root Cause**: `--single-transaction` takes locks on ALL restored objects simultaneously, exhausting lock table.
**Why it fails**:
- Single transaction locks ALL large objects at once
- With 35,000+ large objects, exceeds max_locks_per_transaction
- Lock exhaustion → "could not open large object" errors
- Cascading failures → millions of errors
## PostgreSQL Documentation (Verified)
### From pg_restore docs:
> **"pg_restore cannot restore large objects selectively"** - All large objects restored together
> **"-j / --jobs: Only custom and directory formats supported"**
> **"multiple jobs cannot be used together with --single-transaction"**
### From Section 19.5 (Resource Consumption):
> **"max_locks_per_transaction × max_connections = total locks"**
- Lock table is SHARED across all sessions
- Single transaction consuming all locks blocks everything
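To put numbers on that formula, using stock PostgreSQL defaults:
```
max_locks_per_transaction (default 64) × max_connections (default 100) ≈ 6,400 lock slots
one transaction restoring 35,000 large objects needs 35,000+ locks → lock table exhausted
```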
## Changes Made
### 1. Disabled `--single-transaction` (CRITICAL FIX)
**File**: `internal/restore/engine.go`
- Line 186: `SingleTransaction: false` (was: true)
- Line 210: `SingleTransaction: false` (was: true)
**Impact**:
- No longer wraps entire restore in one transaction
- Each object restored in its own transaction
- Locks released incrementally (not held until end)
- Prevents lock table exhaustion
### 2. Removed `--exit-on-error` (CRITICAL FIX)
**File**: `internal/database/postgresql.go`
- Line 375-378: Removed `cmd.append("--exit-on-error")`
**Impact**:
- PostgreSQL continues on ignorable errors (correct behavior)
- "already exists" errors logged but don't stop restore
- Final error count reported at end
- Only real errors cause failure
### 3. Kept Sequential Parallelism Detection
**File**: `internal/restore/engine.go`
- Lines 552-565: `detectLargeObjectsInDumps()` still active (a sketch of the detection step follows below)
- Automatically reduces cluster parallelism to 1 when BLOBs detected
**Impact**:
- Prevents multiple databases with large objects from competing for locks
- Sequential cluster restore = only one DB's large objects in lock table at a time
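A minimal sketch of that detection step, assuming it shells out to `pg_restore -l`; the helper name below is illustrative:
```go
package restore

import (
	"context"
	"os/exec"
	"strings"
)

// dumpHasLargeObjects lists the dump's table of contents with `pg_restore -l`
// and looks for large-object entries. The real detectLargeObjectsInDumps
// walks every dump in the extracted cluster archive and may differ in details.
func dumpHasLargeObjects(ctx context.Context, dumpPath string) (bool, error) {
	out, err := exec.CommandContext(ctx, "pg_restore", "-l", dumpPath).Output()
	if err != nil {
		return false, err
	}
	toc := strings.ToUpper(string(out))
	return strings.Contains(toc, "BLOB") || strings.Contains(toc, "LARGE OBJECT"), nil
}
```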
## Why This Works
### Before (BROKEN):
```
START TRANSACTION; -- Single transaction begins
CREATE TABLE ... -- Lock acquired
CREATE INDEX ... -- Lock acquired
RESTORE BLOB 1 -- Lock acquired
RESTORE BLOB 2 -- Lock acquired
...
RESTORE BLOB 35000 -- Lock acquired → EXHAUSTED!
ERROR: max_locks_per_transaction exceeded
ROLLBACK; -- Everything fails
```
### After (FIXED):
```
BEGIN; CREATE TABLE ...; COMMIT; -- Lock released
BEGIN; CREATE INDEX ...; COMMIT; -- Lock released
BEGIN; RESTORE BLOB 1; COMMIT; -- Lock released
BEGIN; RESTORE BLOB 2; COMMIT; -- Lock released
...
BEGIN; RESTORE BLOB 35000; COMMIT; -- Each only holds ~100 locks max
SUCCESS: All objects restored
```
## Testing Recommendations
### 1. Test with postgres database (backup_state error)
```bash
./dbbackup restore cluster /path/to/backup.tar.gz
# Should now skip "already exists" errors and continue
```
### 2. Test with resydb database (large objects)
```bash
# Check dump for large objects first
pg_restore -l resydb.dump | grep -i "blob\|large object"
# Restore should now work without lock exhaustion
./dbbackup restore cluster /path/to/backup.tar.gz
```
### 3. Monitor locks during restore
```sql
-- In another terminal while restore runs:
SELECT count(*) FROM pg_locks;
-- Should stay well below max_locks_per_transaction × max_connections
```
## Expected Behavior Now
### For "already exists" errors:
```
pg_restore: warning: object already exists: TYPE backup_state
pg_restore: warning: object already exists: FUNCTION ...
... (continues restoring) ...
pg_restore: total errors: 10 (all ignorable)
SUCCESS
```
### For large objects:
```
Restoring database resydb...
Large objects detected - using sequential restore
Restoring 35,000 large objects... (progress)
✓ Database resydb restored successfully
```
## Configuration Settings (Still Valid)
These PostgreSQL settings help but are NO LONGER REQUIRED with the fix:
```ini
# Still recommended for performance, not required for correctness:
max_locks_per_transaction = 256 # Provides headroom
maintenance_work_mem = 1GB # Faster index creation
shared_buffers = 8GB # Better caching
```
## Commit This Fix
```bash
git add internal/restore/engine.go internal/database/postgresql.go
git commit -m "CRITICAL FIX: Remove --single-transaction and --exit-on-error from pg_restore
- Disabled --single-transaction to prevent lock table exhaustion with large objects
- Removed --exit-on-error to allow PostgreSQL to skip ignorable errors
- Fixes 'could not open large object' errors (lock exhaustion)
- Fixes 'already exists' errors causing complete restore failure
- Each object now restored in its own transaction (locks released incrementally)
- PostgreSQL default behavior (continue on ignorable errors) is correct for restores
Per PostgreSQL docs: --single-transaction incompatible with large object restores
and causes lock table exhaustion with 1000+ objects."
git push
```

View File

@@ -1,247 +0,0 @@
# Phase 2 TUI Improvements - Completion Report
## Overview
Phase 2 of the TUI improvements adds professional, actionable UX features focused on transparency and error guidance. All features implemented without over-engineering.
## Implemented Features
### 1. Disk Space Pre-Flight Checks ✅
**Files:** `internal/checks/disk_check.go`
**Features:**
- Real-time filesystem stats using `syscall.Statfs_t`
- Three-tier status system:
  - **Critical** (≥95% used): Blocks operation
  - **Warning** (≥80% used): Warns but allows
  - **Sufficient** (<80% used): OK to proceed
- Smart space estimation:
  - Backups: Based on compression level
  - Restores: 4x archive size (decompression overhead)
**Integration:**
- `internal/backup/engine.go` - Pre-flight check before cluster backup
- `internal/restore/engine.go` - Pre-flight check before cluster restore
- Displays formatted message in CLI mode
- Logs warnings when space is tight
**Example Output:**
```
📊 Disk Space Check (OK):
Path: /var/lib/pgsql/db_backups
Total: 151.0 GiB
Available: 66.0 GiB (55.0% used)
✓ Status: OK
✓ Sufficient space available
```
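A minimal sketch of such a pre-flight check (Linux-only, via `syscall.Statfs`); the thresholds follow the description above, but the names and exact layout of the real `disk_check.go` are assumptions:
```go
package checks

import (
	"fmt"
	"syscall"
)

// DiskStatus mirrors the three-tier system described above.
type DiskStatus int

const (
	StatusSufficient DiskStatus = iota // <80% used
	StatusWarning                      // >=80% used
	StatusCritical                     // >=95% used or not enough room for the estimate
)

// CheckDiskSpace queries filesystem stats for path and compares them against
// the estimated space the operation needs.
func CheckDiskSpace(path string, requiredBytes uint64) (DiskStatus, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return StatusCritical, fmt.Errorf("statfs %s: %w", path, err)
	}
	total := st.Blocks * uint64(st.Bsize)
	avail := st.Bavail * uint64(st.Bsize)
	usedPct := float64(total-avail) / float64(total) * 100

	switch {
	case usedPct >= 95 || avail < requiredBytes:
		return StatusCritical, nil
	case usedPct >= 80:
		return StatusWarning, nil
	default:
		return StatusSufficient, nil
	}
}
```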
### 2. Error Classification & Hints ✅
**Files:** `internal/checks/error_hints.go`
**Features:**
- Smart error pattern matching (regex + substring)
- Four severity levels:
  - **Ignorable**: Objects already exist (normal)
  - **Warning**: Version mismatches
  - **Critical**: Lock exhaustion, permissions, connections
  - **Fatal**: Corrupted dumps, excessive errors
**Error Categories:**
- `duplicate`: Already exists (ignorable)
- `disk_space`: No space left on device
- `locks`: max_locks_per_transaction exhausted
- `corruption`: Syntax errors in dump file
- `permissions`: Permission denied, must be owner
- `network`: Connection refused, pg_hba.conf
- `version`: PostgreSQL version mismatch
- `unknown`: Unclassified errors
**Integration:**
- `internal/restore/engine.go` - Classify errors during restore
- Enhanced error logging with hints and actions
- Error messages include actionable solutions
**Example Error Classification:**
```
❌ CRITICAL Error
Category: locks
Message: ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction
💡 Hint: Lock table exhausted - typically caused by large objects in parallel restore
🔧 Action: Increase max_locks_per_transaction in postgresql.conf to 512 or higher
```
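A minimal sketch of the pattern-matching approach, seeded with a few of the hints listed below; the type and table names are illustrative, not the exact contents of `error_hints.go`:
```go
package checks

import "strings"

// ErrorHint is a sketch of the classification record.
type ErrorHint struct {
	Category string // e.g. "duplicate", "locks", "disk_space"
	Severity string // "ignorable", "warning", "critical", "fatal"
	Hint     string
	Action   string
}

var hintTable = []struct {
	pattern string
	hint    ErrorHint
}{
	{"already exists", ErrorHint{"duplicate", "ignorable",
		"Object already exists in target database - this is normal during restore",
		"No action needed - restore will continue"}},
	{"max_locks_per_transaction", ErrorHint{"locks", "critical",
		"Lock table exhausted - typically caused by large objects",
		"Increase max_locks_per_transaction in postgresql.conf to 512"}},
	{"no space left", ErrorHint{"disk_space", "critical",
		"Insufficient disk space to complete operation",
		"Free up disk space or increase storage"}},
}

// ClassifyError returns the first matching hint, or a generic fallback.
func ClassifyError(msg string) ErrorHint {
	lower := strings.ToLower(msg)
	for _, entry := range hintTable {
		if strings.Contains(lower, entry.pattern) {
			return entry.hint
		}
	}
	return ErrorHint{"unknown", "warning",
		"Unclassified error", "Review the full pg_restore output"}
}
```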
### 3. Actionable Error Messages ✅
**Common Errors Mapped:**
1. **"already exists"**
   - Type: Ignorable
   - Hint: "Object already exists in target database - this is normal during restore"
   - Action: "No action needed - restore will continue"
2. **"no space left"**
   - Type: Critical
   - Hint: "Insufficient disk space to complete operation"
   - Action: "Free up disk space: rm old_backups/* or increase storage"
3. **"max_locks_per_transaction"**
   - Type: Critical
   - Hint: "Lock table exhausted - typically caused by large objects"
   - Action: "Increase max_locks_per_transaction in postgresql.conf to 512"
4. **"syntax error"**
   - Type: Fatal
   - Hint: "Syntax error in dump file - backup may be corrupted"
   - Action: "Re-create backup with: dbbackup backup single <database>"
5. **"permission denied"**
   - Type: Critical
   - Hint: "Insufficient permissions to perform operation"
   - Action: "Run as superuser or use --no-owner flag for restore"
6. **"connection refused"**
   - Type: Critical
   - Hint: "Cannot connect to database server"
   - Action: "Check database is running and pg_hba.conf allows connection"
## Architecture Decisions
### Separate `checks` Package
- **Why:** Avoid import cycles (backup/restore ↔ tui)
- **Location:** `internal/checks/`
- **Dependencies:** Only stdlib (`syscall`, `fmt`, `strings`)
- **Result:** Clean separation, no circular dependencies
### No Logger Dependency
- **Why:** Keep checks package lightweight
- **Alternative:** Callers log results as needed
- **Benefit:** Reusable in any context
### Three-Tier Status System
- **Why:** Clear visual indicators for users
- **Critical:** Red ❌ - Blocks operation
- **Warning:** Yellow ⚠️ - Warns but allows
- **Sufficient:** Green ✓ - OK to proceed
## Testing Status
### Background Test
**File:** `test_backup_restore.sh`
**Status:** ✅ Running (PID 1071950)
**Progress (as of last check):**
- ✅ Cluster backup complete: 17/17 databases
- ✅ d7030 backed up: 34GB with 35,000 large objects
- ✅ Large DBs handled: testdb_50gb (6.7GB) × 2
- 🔄 Creating compressed archive...
- ⏳ Next: Drop d7030 → Restore cluster → Verify BLOBs
**Validates:**
- Lock exhaustion fix (35K large objects)
- Ignorable error handling ("already exists")
- Ctrl+C cancellation
- Disk space handling (34GB backup)
## Performance Impact
### Disk Space Check
- **Cost:** ~1ms per check (single syscall)
- **When:** Once before backup/restore starts
- **Impact:** Negligible
### Error Classification
- **Cost:** String pattern matching per error
- **When:** Only when errors occur
- **Impact:** Minimal (errors already indicate slow path)
## User Experience Improvements
### Before Phase 2:
```
Error: restore failed: exit status 1 (total errors: 2500000)
```
❌ No hint what went wrong
❌ No actionable guidance
❌ Can't distinguish critical from ignorable errors
### After Phase 2:
```
📊 Disk Space Check (OK):
Available: 66.0 GiB (55.0% used)
✓ Sufficient space available
[restore in progress...]
❌ CRITICAL Error
Category: locks
💡 Hint: Lock table exhausted - typically caused by large objects
🔧 Action: Increase max_locks_per_transaction to 512 or higher
```
✅ Clear disk status before starting
✅ Helpful error classification
✅ Actionable solution provided
✅ Professional, transparent UX
## Code Quality
### Test Coverage
- ✅ Compiles without warnings
- ✅ No import cycles
- ✅ Minimal dependencies
- ✅ Integrated into existing workflows
### Error Handling
- ✅ Graceful fallback if syscall fails
- ✅ Default classification for unknown errors
- ✅ Non-blocking in CLI mode
### Documentation
- ✅ Inline comments for all functions
- ✅ Clear struct field descriptions
- ✅ Usage examples in TUI_IMPROVEMENTS.md
## Next Steps (Phase 3)
### Real-Time Progress (Not Yet Implemented)
- Show bytes processed / total bytes
- Display transfer speed (MB/s)
- Update ETA based on actual speed
- Progress bars using Bubble Tea components
### Keyboard Shortcuts (Not Yet Implemented)
- `1-9`: Quick jump to menu options
- `q`: Quit application
- `r`: Refresh backup list
- `/`: Search/filter backups
### Enhanced Backup List (Not Yet Implemented)
- Show backup size, age, health
- Visual indicators for verification status
- Sort by date, size, name
## Git History
```
9d36b26 - Add Phase 2 TUI improvements: disk space checks and error hints
e95eeb7 - Add comprehensive TUI improvement plan and background test script
c31717c - Add Ctrl+C interrupt handling for cluster operations
[previous commits...]
```
## Summary
Phase 2 delivers on the core promise: **transparent, actionable, professional UX without over-engineering.**
**Key Achievements:**
- ✅ Pre-flight disk space validation prevents "100% full" surprises
- ✅ Smart error classification distinguishes critical from ignorable
- ✅ Actionable hints provide specific solutions, not generic messages
- ✅ Zero performance impact (checks run once, errors already slow)
- ✅ Clean architecture (no import cycles, minimal dependencies)
- ✅ Integrated seamlessly into existing workflows
**User Impact:**
Users now see what's happening, why errors occur, and exactly how to fix them. No more mysterious failures or cryptic messages.

View File

@@ -1,250 +0,0 @@
# Interactive TUI Experience Improvements
## Current Issues & Solutions
### 1. **Progress Visibility During Long Operations**
**Problem**: Cluster backup/restore with large databases (40GB+) takes 30+ minutes with minimal feedback.
**Solutions**:
- ✅ Show current database being processed
- ✅ Display database size before backup/restore starts
- ✅ ETA estimator for multi-database operations
- 🔄 **NEW**: Real-time progress bar per database (bytes processed / total bytes)
- 🔄 **NEW**: Show current operation speed (MB/s)
- 🔄 **NEW**: Percentage complete for entire cluster operation
### 2. **Error Handling & Recovery**
**Problem**: When restore fails (like resydb with 2.5M errors), user has no context about WHY or WHAT to do.
**Solutions**:
- ✅ Distinguish ignorable errors (already exists) from critical errors
- 🔄 **NEW**: Show error classification in TUI:
```
⚠️ WARNING: 5 ignorable errors (objects already exist)
❌ CRITICAL: Syntax errors detected - dump file may be corrupted
💡 HINT: Re-create backup with: dbbackup backup single resydb
```
- 🔄 **NEW**: Offer retry option for failed databases
- 🔄 **NEW**: Skip vs Abort choice for non-critical failures
### 3. **Large Object Detection Feedback**
**Problem**: User doesn't know WHY parallelism was reduced.
**Solution**:
```
🔍 Scanning cluster backup for large objects...
✓ postgres: No large objects
⚠️ d7030: 35,000 BLOBs detected (42GB)
⚙️ Automatically reducing parallelism: 2 → 1 (sequential)
💡 Reason: Large objects require exclusive lock table access
```
### 4. **Disk Space Warnings**
**Problem**: Backup fails silently when disk is full.
**Solutions**:
- 🔄 **NEW**: Pre-flight check before backup:
```
📊 Disk Space Check:
Database size: 42GB
Available space: 66GB
Estimated backup: ~15GB (compressed)
✓ Sufficient space available
```
- 🔄 **NEW**: Warning at 80% disk usage
- 🔄 **NEW**: Block operation at 95% disk usage
### 5. **Cancellation Handling (Ctrl+C)**
**Problem**: Users don't know if Ctrl+C will work or leave partial backups.
**Solutions**:
- ✅ Graceful cancellation on Ctrl+C (sketched after this list)
- 🔄 **NEW**: Show cleanup message:
```
^C received - Cancelling backup...
🧹 Cleaning up temporary files...
✓ Cleanup complete - no partial backups left
```
- 🔄 **NEW**: Confirmation prompt for cluster operations:
```
⚠️ Cluster backup in progress (3/10 databases)
Are you sure you want to cancel? (y/N)
```
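One common way to wire up graceful Ctrl+C handling in Go is `signal.NotifyContext`; the sketch below is illustrative and not necessarily how dbbackup implements it:
```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
)

// The first interrupt cancels the context, giving in-flight backup/restore
// work a chance to remove partial files before the process exits.
// runClusterBackup is a stand-in name for the real operation.
func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	if err := runClusterBackup(ctx); err != nil {
		if ctx.Err() != nil {
			fmt.Println("^C received - cancelling and cleaning up temporary files...")
		}
		os.Exit(1)
	}
}

func runClusterBackup(ctx context.Context) error {
	// Real code would check ctx.Done() between databases and delete partial output.
	<-ctx.Done()
	return ctx.Err()
}
```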
### 6. **Interactive Mode Navigation**
**Problem**: TUI menu is basic, no keyboard shortcuts, no search.
**Solutions**:
- 🔄 **NEW**: Keyboard shortcuts:
  - `1-9`: Quick jump to menu items
  - `q`: Quit
  - `r`: Refresh status
  - `/`: Search backups
- 🔄 **NEW**: Backup list improvements:
```
📦 Available Backups:
1. cluster_20251118_103045.tar.gz [45GB] ⏱ 2 hours ago
├─ postgres (325MB)
├─ d7030 (42GB) ⚠️ 35K BLOBs
└─ template1 (8MB)
2. cluster_20251112_084329.tar.gz [38GB] ⏱ 6 days ago
└─ ⚠️ WARNING: May contain corrupted resydb dump
```
- 🔄 **NEW**: Filter/sort options: by date, by size, by status
### 7. **Configuration Recommendations**
**Problem**: Users don't know optimal settings for their workload.
**Solutions**:
- 🔄 **NEW**: Auto-detect and suggest settings on first run:
```
🔧 System Configuration Detected:
RAM: 32GB → Recommended: shared_buffers=8GB
CPUs: 4 cores → Recommended: parallel_jobs=3
Disk: 66GB free → Recommended: max backup size: 50GB
Apply these settings? (Y/n)
```
- 🔄 **NEW**: Show current vs recommended config in menu:
```
⚙️ Configuration Status:
max_locks_per_transaction: 256 ✓ (sufficient for 35K objects)
maintenance_work_mem: 64MB ⚠️ (recommend: 1GB for faster restores)
shared_buffers: 128MB ⚠️ (recommend: 8GB with 32GB RAM)
```
### 8. **Backup Verification & Health**
**Problem**: No way to verify backup integrity before restore.
**Solutions**:
- 🔄 **NEW**: Add "Verify Backup" menu option:
```
🔍 Verifying backup: cluster_20251118_103045.tar.gz
✓ Archive integrity: OK
✓ Extracting metadata...
✓ Checking dump formats...
Databases found:
✓ postgres: Custom format, 325MB
✓ d7030: Custom format, 42GB, 35,000 BLOBs
⚠️ resydb: CORRUPTED - 2.5M syntax errors detected
Overall: ⚠️ Partial (2/3 databases healthy)
```
- 🔄 **NEW**: Show last backup status in main menu
### 9. **Restore Dry Run**
**Problem**: No preview of what will be restored.
**Solution**:
```
🎬 Restore Preview (Dry Run):
Target: cluster_20251118_103045.tar.gz
Databases to restore:
1. postgres (325MB)
- Will overwrite: 5 existing objects
- New objects: 120
2. d7030 (42GB, 35K BLOBs)
- Will DROP and recreate database
- Estimated time: 25-30 minutes
- Required locks: 35,000 (available: 25,600) ⚠️
⚠️ WARNING: Insufficient locks for d7030
💡 Solution: Increase max_locks_per_transaction to 512
Proceed with restore? (y/N)
```
### 10. **Multi-Step Wizards**
**Problem**: Complex operations (like cluster restore with --clean) need multiple confirmations.
**Solution**: Step-by-step wizard:
```
Step 1/4: Select backup
Step 2/4: Review databases to restore
Step 3/4: Check prerequisites (disk space, locks, etc.)
Step 4/4: Confirm and execute
```
## Implementation Priority
### Phase 1 (High Impact, Low Effort) ✅
- ✅ ETA estimators
- ✅ Large object detection warnings
- ✅ Ctrl+C handling
- ✅ Ignorable error detection
### Phase 2 (High Impact, Medium Effort) 🔄
- Real-time progress bars with MB/s
- Disk space pre-flight checks
- Backup verification tool
- Error hints and suggestions
### Phase 3 (Quality of Life) 🔄
- Keyboard shortcuts
- Backup list with metadata
- Configuration recommendations
- Restore dry run
### Phase 4 (Advanced) 📋
- Multi-step wizards
- Search/filter backups
- Auto-retry failed databases
- Parallel restore progress split-view
## Code Structure
```
internal/tui/
menu.go - Main interactive menu
backup_menu.go - Backup wizard
restore_menu.go - Restore wizard
verify_menu.go - Backup verification (NEW)
config_menu.go - Configuration tuning (NEW)
progress_view.go - Real-time progress display (ENHANCED)
errors.go - Error classification & hints (NEW)
```
## Testing Plan
1. **Large Database Test** (In Progress)
   - 42GB d7030 with 35K BLOBs
   - Verify progress updates
   - Verify large object detection
   - Verify successful restore
2. **Error Scenarios**
   - Corrupted dump file
   - Insufficient disk space
   - Insufficient locks
   - Network interruption
   - Ctrl+C during operations
3. **Performance**
   - Backup time vs raw pg_dump
   - Restore time vs raw pg_restore
   - Memory usage during 40GB+ operations
   - CPU utilization with parallel workers
## Success Metrics
- ✅ No "black box" operations - user always knows what's happening
- ✅ Errors are actionable - user knows what to fix
- ✅ Safe operations - confirmations for destructive actions
- ✅ Fast feedback - progress updates every 1-2 seconds
- ✅ Professional feel - polished, consistent, intuitive

View File

@@ -1,281 +0,0 @@
#!/usr/bin/env bash
# create_d7030_test.sh
# Create a realistic d7030 database with tables, data, and many BLOBs to test large object restore
set -euo pipefail
DB_NAME="d7030"
NUM_DOCUMENTS=15000  # Number of documents with BLOBs (~15GB at ~1MB each)
NUM_IMAGES=10000     # Number of image records (~15GB full images + ~2GB thumbnails)
# Total BLOBs: 35,000 large objects (each image row carries a full image plus a thumbnail)
# Approximate size: 15000*1MB + 10000*1.5MB + 10000*200KB = ~32GB in BLOBs alone
# With tables, indexes, and overhead the database ends up around 34GB
echo "Creating database: $DB_NAME"
# Drop if exists
sudo -u postgres psql -c "DROP DATABASE IF EXISTS $DB_NAME;" 2>/dev/null || true
# Create database
sudo -u postgres psql -c "CREATE DATABASE $DB_NAME;"
echo "Creating schema and tables..."
# Enable pgcrypto extension for gen_random_bytes
sudo -u postgres psql -d "$DB_NAME" -c "CREATE EXTENSION IF NOT EXISTS pgcrypto;"
# Create schema with realistic business tables
sudo -u postgres psql -d "$DB_NAME" <<'EOF'
-- Create tables for a document management system
CREATE TABLE departments (
dept_id SERIAL PRIMARY KEY,
dept_name VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE employees (
emp_id SERIAL PRIMARY KEY,
dept_id INTEGER REFERENCES departments(dept_id),
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE,
hire_date DATE DEFAULT CURRENT_DATE
);
CREATE TABLE document_types (
type_id SERIAL PRIMARY KEY,
type_name VARCHAR(50) NOT NULL,
description TEXT
);
-- Table with large objects (BLOBs)
CREATE TABLE documents (
doc_id SERIAL PRIMARY KEY,
emp_id INTEGER REFERENCES employees(emp_id),
type_id INTEGER REFERENCES document_types(type_id),
title VARCHAR(255) NOT NULL,
description TEXT,
file_data OID, -- Large object reference
file_size INTEGER,
mime_type VARCHAR(100),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE images (
image_id SERIAL PRIMARY KEY,
doc_id INTEGER REFERENCES documents(doc_id),
image_name VARCHAR(255),
image_data OID, -- Large object reference
thumbnail_data OID, -- Another large object
width INTEGER,
height INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE audit_log (
log_id SERIAL PRIMARY KEY,
table_name VARCHAR(50),
record_id INTEGER,
action VARCHAR(20),
changed_by INTEGER,
changed_at TIMESTAMP DEFAULT NOW(),
details JSONB
);
-- Create indexes
CREATE INDEX idx_documents_emp ON documents(emp_id);
CREATE INDEX idx_documents_type ON documents(type_id);
CREATE INDEX idx_images_doc ON images(doc_id);
CREATE INDEX idx_audit_table ON audit_log(table_name, record_id);
-- Insert reference data
INSERT INTO departments (dept_name) VALUES
('Engineering'), ('Sales'), ('Marketing'), ('HR'), ('Finance');
INSERT INTO document_types (type_name, description) VALUES
('Contract', 'Legal contracts and agreements'),
('Invoice', 'Financial invoices and receipts'),
('Report', 'Business reports and analysis'),
('Manual', 'Technical manuals and guides'),
('Presentation', 'Presentation slides and materials');
-- Insert employees
INSERT INTO employees (dept_id, first_name, last_name, email)
SELECT
(random() * 4 + 1)::INTEGER,
'Employee_' || generate_series,
'LastName_' || generate_series,
'employee' || generate_series || '@d7030.com'
FROM generate_series(1, 50);
EOF
echo "Inserting documents with large objects (BLOBs)..."
echo "This will take several minutes to create ~25GB of data..."
# Create temporary files with random data for importing in postgres home
# Make documents larger for 25GB target: ~1MB each
TEMP_FILE="/var/lib/pgsql/test_blob_data.bin"
sudo dd if=/dev/urandom of="$TEMP_FILE" bs=1M count=1 2>/dev/null
sudo chown postgres:postgres "$TEMP_FILE"
# Create documents with actual large objects using lo_import
sudo -u postgres psql -d "$DB_NAME" <<EOF
DO \$\$
DECLARE
v_emp_id INTEGER;
v_type_id INTEGER;
v_loid OID;
BEGIN
FOR i IN 1..$NUM_DOCUMENTS LOOP
-- Random employee and document type
v_emp_id := (random() * 49 + 1)::INTEGER;
v_type_id := (random() * 4 + 1)::INTEGER;
-- Import file as large object (creates a unique BLOB for each)
v_loid := lo_import('$TEMP_FILE');
-- Insert document record
INSERT INTO documents (emp_id, type_id, title, description, file_data, file_size, mime_type)
VALUES (
v_emp_id,
v_type_id,
'Document_' || i || '_' || (CASE v_type_id
WHEN 1 THEN 'Contract'
WHEN 2 THEN 'Invoice'
WHEN 3 THEN 'Report'
WHEN 4 THEN 'Manual'
ELSE 'Presentation'
END),
'This is a test document with large object data. Document number ' || i,
v_loid,
1048576,
(CASE v_type_id
WHEN 1 THEN 'application/pdf'
WHEN 2 THEN 'application/pdf'
WHEN 3 THEN 'application/vnd.ms-excel'
WHEN 4 THEN 'application/pdf'
ELSE 'application/vnd.ms-powerpoint'
END)
);
-- Progress indicator
IF i % 500 = 0 THEN
RAISE NOTICE 'Created % documents with BLOBs...', i;
END IF;
END LOOP;
END \$\$;
EOF
rm -f "$TEMP_FILE"
echo "Inserting images with large objects..."
# Create temp files for image and thumbnail in postgres home
# Make images larger: ~1.5MB for full image, ~200KB for thumbnail
TEMP_IMAGE="/var/lib/pgsql/test_image_data.bin"
TEMP_THUMB="/var/lib/pgsql/test_thumb_data.bin"
sudo dd if=/dev/urandom of="$TEMP_IMAGE" bs=512K count=3 2>/dev/null
sudo dd if=/dev/urandom of="$TEMP_THUMB" bs=1K count=200 2>/dev/null
sudo chown postgres:postgres "$TEMP_IMAGE" "$TEMP_THUMB"
# Create images with multiple large objects per record
sudo -u postgres psql -d "$DB_NAME" <<EOF
DO \$\$
DECLARE
v_doc_id INTEGER;
v_image_oid OID;
v_thumb_oid OID;
BEGIN
FOR i IN 1..$NUM_IMAGES LOOP
-- Random document (only from successfully created documents)
SELECT doc_id INTO v_doc_id FROM documents ORDER BY random() LIMIT 1;
IF v_doc_id IS NULL THEN
EXIT; -- No documents exist, skip images
END IF;
-- Import full-size image as large object
v_image_oid := lo_import('$TEMP_IMAGE');
-- Import thumbnail as large object
v_thumb_oid := lo_import('$TEMP_THUMB');
-- Insert image record
INSERT INTO images (doc_id, image_name, image_data, thumbnail_data, width, height)
VALUES (
v_doc_id,
'Image_' || i || '.jpg',
v_image_oid,
v_thumb_oid,
(random() * 2000 + 800)::INTEGER,
(random() * 1500 + 600)::INTEGER
);
IF i % 500 = 0 THEN
RAISE NOTICE 'Created % images with BLOBs...', i;
END IF;
END LOOP;
END \$\$;
EOF
rm -f "$TEMP_IMAGE" "$TEMP_THUMB"
echo "Inserting audit log data..."
# Create audit log entries
sudo -u postgres psql -d "$DB_NAME" <<EOF
INSERT INTO audit_log (table_name, record_id, action, changed_by, details)
SELECT
'documents',
doc_id,
(ARRAY['INSERT', 'UPDATE', 'VIEW'])[(random() * 2 + 1)::INTEGER],
(random() * 49 + 1)::INTEGER,
jsonb_build_object(
'timestamp', NOW() - (random() * INTERVAL '90 days'),
'ip_address', '192.168.' || (random() * 255)::INTEGER || '.' || (random() * 255)::INTEGER,
'user_agent', 'Mozilla/5.0'
)
FROM documents
CROSS JOIN generate_series(1, 3);
EOF
echo ""
echo "Database statistics:"
sudo -u postgres psql -d "$DB_NAME" <<'EOF'
SELECT
'Departments' as table_name,
COUNT(*) as row_count
FROM departments
UNION ALL
SELECT 'Employees', COUNT(*) FROM employees
UNION ALL
SELECT 'Document Types', COUNT(*) FROM document_types
UNION ALL
SELECT 'Documents (with BLOBs)', COUNT(*) FROM documents
UNION ALL
SELECT 'Images (with BLOBs)', COUNT(*) FROM images
UNION ALL
SELECT 'Audit Log', COUNT(*) FROM audit_log;
-- Count large objects
SELECT COUNT(*) as total_large_objects FROM pg_largeobject_metadata;
-- Total size of large objects
SELECT pg_size_pretty(SUM(pg_column_size(data))) as total_blob_size
FROM pg_largeobject;
EOF
echo ""
echo "✅ Database $DB_NAME created successfully with realistic data and BLOBs!"
echo ""
echo "Large objects created:"
echo " - $NUM_DOCUMENTS documents (each with ~1MB BLOB)"
echo " - $NUM_IMAGES images (each with 2 BLOBs: ~1.5MB image + ~200KB thumbnail)"
echo " - Total: ~$((NUM_DOCUMENTS + NUM_IMAGES * 2)) large objects"
echo ""
echo "Estimated size: ~$((NUM_DOCUMENTS + NUM_IMAGES * 3 / 2 + NUM_IMAGES / 5))MB in BLOBs"
echo ""
echo "You can now backup this database and test restore with large object locks."

View File

@@ -1,58 +0,0 @@
#!/usr/bin/env bash
# fix_max_locks.sh
# Safely update max_locks_per_transaction in postgresql.conf and restart PostgreSQL
# Usage: sudo ./fix_max_locks.sh [NEW_VALUE]
set -euo pipefail
NEW_VALUE=${1:-256}
CONFIG_FILE="/var/lib/pgsql/data/postgresql.conf"
BACKUP_FILE="${CONFIG_FILE}.bak.$(date +%s)"
echo "PostgreSQL config file: $CONFIG_FILE"
# Create a backup
sudo cp "$CONFIG_FILE" "$BACKUP_FILE"
echo "Backup written to $BACKUP_FILE"
# Check if setting exists (commented or not)
if sudo grep -qE "^\s*#?\s*max_locks_per_transaction\s*=" "$CONFIG_FILE"; then
echo "Updating existing max_locks_per_transaction to $NEW_VALUE"
# Replace the line (whether commented or not)
sudo sed -i "s/^\s*#\?\s*max_locks_per_transaction\s*=.*/max_locks_per_transaction = $NEW_VALUE/" "$CONFIG_FILE"
else
echo "Adding max_locks_per_transaction = $NEW_VALUE to config"
# Append at the end
echo "" | sudo tee -a "$CONFIG_FILE" >/dev/null
echo "# Increased by fix_max_locks.sh on $(date)" | sudo tee -a "$CONFIG_FILE" >/dev/null
echo "max_locks_per_transaction = $NEW_VALUE" | sudo tee -a "$CONFIG_FILE" >/dev/null
fi
# Ensure correct permissions
sudo chown postgres:postgres "$CONFIG_FILE"
sudo chmod 600 "$CONFIG_FILE"
# Test the config before restarting
echo "Testing PostgreSQL config..."
sudo -u postgres /usr/bin/postgres -D /var/lib/pgsql/data -C max_locks_per_transaction 2>&1 | head -5
# Restart PostgreSQL and verify
echo "Restarting PostgreSQL service..."
sudo systemctl restart postgresql
sleep 3
if sudo systemctl is-active --quiet postgresql; then
echo "✅ PostgreSQL restarted successfully"
sudo -u postgres psql -c "SHOW max_locks_per_transaction;"
else
echo "❌ PostgreSQL failed to start!"
echo "Restoring backup..."
sudo cp "$BACKUP_FILE" "$CONFIG_FILE"
sudo systemctl start postgresql
echo "Original config restored. Check /var/log/postgresql for errors."
exit 1
fi
echo ""
echo "Success! Backup available at: $BACKUP_FILE"
exit 0

View File

@@ -1,51 +0,0 @@
#!/bin/bash
set -e
LOG="/var/lib/pgsql/dbbackup_test.log"
echo "=== Database Backup/Restore Test ===" | tee $LOG
echo "Started: $(date)" | tee -a $LOG
echo "" | tee -a $LOG
cd /root/dbbackup
# Step 1: Cluster Backup
echo "STEP 1: Creating cluster backup..." | tee -a $LOG
sudo -u postgres ./dbbackup backup cluster --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a $LOG
BACKUP_FILE=$(ls -t /var/lib/pgsql/db_backups/cluster_*.tar.gz | head -1)
echo "Backup created: $BACKUP_FILE" | tee -a $LOG
echo "Backup size: $(ls -lh $BACKUP_FILE | awk '{print $5}')" | tee -a $LOG
echo "" | tee -a $LOG
# Step 2: Drop d7030 database to prepare for restore test
echo "STEP 2: Dropping d7030 database for clean restore test..." | tee -a $LOG
sudo -u postgres psql -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = 'd7030' AND pid <> pg_backend_pid();" 2>&1 | tee -a $LOG
sudo -u postgres psql -d postgres -c "DROP DATABASE IF EXISTS d7030;" 2>&1 | tee -a $LOG
echo "d7030 database dropped" | tee -a $LOG
echo "" | tee -a $LOG
# Step 3: Cluster Restore
echo "STEP 3: Restoring cluster from backup..." | tee -a $LOG
sudo -u postgres ./dbbackup restore cluster $BACKUP_FILE --backup-dir /var/lib/pgsql/db_backups 2>&1 | tee -a $LOG
echo "Restore completed" | tee -a $LOG
echo "" | tee -a $LOG
# Step 4: Verify restored data
echo "STEP 4: Verifying restored databases..." | tee -a $LOG
sudo -u postgres psql -d postgres -c "\l" 2>&1 | tee -a $LOG
echo "" | tee -a $LOG
echo "Checking d7030 large objects..." | tee -a $LOG
BLOB_COUNT=$(sudo -u postgres psql -d d7030 -t -c "SELECT count(*) FROM pg_largeobject_metadata;" 2>/dev/null || echo "0")
echo "Large objects in d7030: $BLOB_COUNT" | tee -a $LOG
echo "" | tee -a $LOG
# Step 5: Cleanup
echo "STEP 5: Cleaning up test backup..." | tee -a $LOG
rm -f $BACKUP_FILE
echo "Backup file deleted: $BACKUP_FILE" | tee -a $LOG
echo "" | tee -a $LOG
echo "=== TEST COMPLETE ===" | tee -a $LOG
echo "Finished: $(date)" | tee -a $LOG
echo "" | tee -a $LOG
echo "✅ Full test log available at: $LOG"

Binary file not shown.

View File

@@ -1,57 +0,0 @@
#!/bin/bash
# Verify that backup contains large objects (BLOBs)
if [ $# -eq 0 ]; then
echo "Usage: $0 <backup_file.dump>"
echo "Example: $0 /var/lib/pgsql/db_backups/d7030.dump"
exit 1
fi
BACKUP_FILE="$1"
if [ ! -f "$BACKUP_FILE" ]; then
echo "Error: File not found: $BACKUP_FILE"
exit 1
fi
echo "========================================="
echo "Backup BLOB/Large Object Verification"
echo "========================================="
echo "File: $BACKUP_FILE"
echo ""
# Check if file is a valid PostgreSQL dump
echo "1. Checking dump file format..."
pg_restore -l "$BACKUP_FILE" > /dev/null 2>&1
if [ $? -eq 0 ]; then
echo " ✅ Valid PostgreSQL custom format dump"
else
echo " ❌ Not a valid pg_dump custom format file"
exit 1
fi
# List table of contents and look for BLOB entries
echo ""
echo "2. Checking for BLOB/Large Object entries..."
BLOB_COUNT=$(pg_restore -l "$BACKUP_FILE" | grep -i "BLOB\|LARGE OBJECT" | wc -l)
if [ $BLOB_COUNT -gt 0 ]; then
echo " ✅ Found $BLOB_COUNT large object entries in backup"
echo ""
echo " Sample entries:"
pg_restore -l "$BACKUP_FILE" | grep -i "BLOB\|LARGE OBJECT" | head -10
else
echo " ⚠️ No large object entries found"
echo " This could mean:"
echo " - Database has no large objects (normal)"
echo " - Backup was created without --blobs flag (problem)"
fi
echo ""
echo "3. Full table of contents summary..."
pg_restore -l "$BACKUP_FILE" | tail -20
echo ""
echo "========================================="
echo "Verification complete"
echo "========================================="