Commit Graph

20 Commits

Author SHA1 Message Date
37f55fdfb3 restore: improve error reporting and add specific error handling
IMPROVEMENTS:
- Better formatted error list (newline separated instead of semicolons)
- Detect and log specific error types (max_locks, massive error counts)
- Show succeeded/failed/total count in summary
- Provide actionable hints for known issues

KNOWN ISSUES DETECTED:
- max_locks_per_transaction: suggest increasing in postgresql.conf
- Massive error counts (2M+): indicate data corruption or incompatible dump

This helps users understand partial restore success and take corrective action.
2025-11-13 16:01:32 +00:00
ab3aceb5c0 restore: fix OOM caused by --verbose output accumulation
CRITICAL OOM FIX:
- pg_restore --verbose outputs MASSIVE text (gigabytes for large DBs)
- Previous fix accumulated ALL errors in allErrors slice causing OOM
- Now limit error capture to last 10 errors only
- Discard verbose progress output entirely to prevent memory buildup

CHANGES:
- Replace allErrors slice with lastError string + errorCount counter
- Only log first 10 errors to prevent memory exhaustion
- Make --verbose optional via RestoreOptions.Verbose flag
- Disable --verbose for cluster restores (prevent OOM)
- Keep --verbose for single DB restores (better diagnostics)

This resolves 'runtime: out of memory' panic during cluster restore.
2025-11-13 14:19:56 +00:00
58d11bc4b3 restore: add critical PostgreSQL restore flags per official documentation
Based on PostgreSQL documentation research (postgresql.org/docs/current/app-pgrestore.html):

CRITICAL FIXES:
- Add --exit-on-error: pg_restore continues on errors by default, masking failures
- Add --no-data-for-failed-tables: prevents duplicate data in existing tables
- Use template0 for CREATE DATABASE: avoids duplicate definition errors from template1 additions
- Fix --jobs incompatibility: cannot use with --single-transaction per docs

WHY THIS MATTERS:
- Without --exit-on-error, pg_restore returns success even with failures
- Without --no-data-for-failed-tables, restore fails on existing objects
- template1 may have local additions causing 'duplicate definition' errors
- --jobs with --single-transaction causes pg_restore to fail

This should resolve the 'exit status 1' cluster restore failures.
2025-11-13 12:54:44 +00:00
b9b44dd989 restore: enhance error capture with detailed stderr logging and verbose pg_restore
- Capture all ERROR/FATAL/error: messages from pg_restore/psql stderr
- Include full error details in failure messages for better diagnostics
- Add --verbose flag to pg_restore for comprehensive error reporting
- Improve thread-safe logging in parallel cluster restore
- Help diagnose cluster restore failures with actual PostgreSQL error messages
2025-11-13 12:47:40 +00:00
71386828bb restore: skip creating system DBs (postgres, template0/1) during cluster restore to avoid spurious failures 2025-11-13 09:03:44 +00:00
b2d3fdf105 fix: Typo 2025-11-12 17:10:18 +00:00
2722ff782d Perf: Major performance improvements - parallel cluster operations and optimized goroutines
1. Parallel Cluster Operations (3-5x speedup):
   - Added ClusterParallelism config option (default: 2 concurrent operations)
   - Implemented worker pool pattern for cluster backup/restore
   - Thread-safe progress tracking with sync.Mutex and atomic counters
   - Configurable via CLUSTER_PARALLELISM env var

2. Progress Indicator Optimizations:
   - Replaced busy-wait select+sleep with time.Ticker in Spinner
   - Replaced busy-wait select+sleep with time.Ticker in Dots
   - More CPU-efficient, cleaner shutdown pattern

3. Signal Handler Cleanup:
   - Added signal.Stop() to properly deregister signal handlers
   - Prevents goroutine leaks on long-running operations
   - Applied to both single and cluster restore commands

Benefits:
- Cluster backup/restore 3-5x faster with 2-4 workers
- Reduced CPU usage in progress spinners
- Cleaner goroutine lifecycle management
- No breaking changes - sequential by default if parallelism=1
2025-11-12 13:07:41 +00:00
3d38e909b8 Fix: Critical OOM issue in cluster restore - stream command output instead of loading into memory
- Replaced CombinedOutput() with streaming StderrPipe() in restore engine
- Fixed executeRestoreCommand() to read stderr in 4KB chunks
- Fixed executeRestoreWithDecompression() to stream output
- Fixed extractArchive() to avoid loading tar output into memory
- Fixed restoreGlobals() to stream large globals.sql files
- Only log ERROR/FATAL messages, not all output
- Prevents out-of-memory crashes on large database restores (GB+ data)

This fixes the 'fatal error: out of memory allocating heap arena metadata'
issue when restoring large cluster backups.
2025-11-12 12:22:32 +00:00
b926bb7806 Fix database names in cluster restore: strip .sql.gz extension
- Previously: testdb_50gb.sql.gz.sql.gz (double extension bug)
- Now: testdb_50gb (correct database name)
- Strips both .dump and .sql.gz extensions from filenames
2025-11-11 18:33:29 +00:00
d675e6b7da Fix cluster restore: detect .sql.gz files and use psql instead of pg_restore
- Added format detection in RestoreCluster to distinguish between custom dumps and compressed SQL
- Route .sql.gz files to restorePostgreSQLSQL() with gunzip pipeline
- Fixed PGPASSWORD environment variable propagation in bash subshells
- Successfully tested full cluster restore: 17 databases, 43 minutes, 7GB+ databases verified
- Ultimate validation test passed: backup -> destroy all DBs -> restore -> verify data integrity
2025-11-11 17:43:32 +00:00
cd948e84f1 fix: Implement database creation in RestoreSingle
BUG #1: restore single --create flag was not implemented
- Added ensureDatabaseExists() call when createIfMissing=true
- Database is now created before restore if --create flag is used
- Added TEST_PLAN.md with comprehensive testing matrix

Tested: restore single --create flag now works correctly
Before: ERROR: database does not exist
After: Database created successfully and restored
2025-11-10 09:03:36 +00:00
bdbd8d5e54 feat: Implement ownership preservation in cluster restore
- Add superuser privilege detection (checkSuperuser)
- Implement clean slate restore (DROP DATABASE before restore)
- Add connection termination before DROP (prevents errors)
- Create restorePostgreSQLDumpWithOwnership for configurable ownership
- Fix Unix socket support (skip -h localhost for peer auth)
- Restore global objects (roles/tablespaces) BEFORE databases
- Preserve table/view/function ownership when superuser
- Add comprehensive logging and error handling
- Update restore workflow with ETA tracking
- Add OWNERSHIP_RESTORATION.md documentation

Fixes: Database ownership and privileges not preserved during restore
Tested: ownership_test database with custom owner restored correctly
2025-11-10 08:48:56 +00:00
1a8bf35bbc Add ETA estimation to cluster backup/restore operations
- Created internal/progress/estimator.go with ETAEstimator component
- Tracks elapsed time and estimates remaining time based on progress
- Enhanced Spinner and LineByLine indicators to display ETA info
- Integrated into BackupCluster and RestoreCluster functions
- Display format: 'Operation | X/Y (Z%) | Elapsed: Xm | ETA: ~Ym remaining'
- Preserves spinner animation while showing progress/time estimates
- Quick Win approach: no historical data storage, just current operation tracking
2025-11-07 13:28:11 +00:00
b201d527dd Quality improvements: Remove dead code, add unit tests, fix ignored errors
HIGH PRIORITY FIXES:
1. Remove unused progressCallback mechanism (dead code cleanup)
2. Add unit tests for restore package (formats, safety checks)
   - Test coverage for archive format detection
   - Test coverage for safety validation
   - Added NullLogger for testing
3. Fix ignored errors in backup pipeline
   - Handle StdoutPipe() errors properly
   - Log stderr pipe errors
   - Document CPU detection errors

IMPROVEMENTS:
- formats_test.go: 8 test functions, all passing
- safety_test.go: 6 test functions for validation
- logger/null.go: Test helper for unit tests
- Proper error handling in streaming compression
- Fixed indentation in stderr handling
2025-11-07 11:47:07 +00:00
ce7d820b47 Add rotating spinner to TUI status for visual progress feedback 2025-11-07 11:20:36 +00:00
894a334cb5 Fix: Disable stdout progress in TUI mode to prevent display breaking 2025-11-07 10:50:45 +00:00
828c4d6a47 Fix: Enable --clean flag for cluster restore to handle existing tables 2025-11-07 10:46:27 +00:00
4a5d63e2bb Fix: Ctrl+C now works in TUI, improve database creation with peer auth support 2025-11-07 10:35:24 +00:00
969b936843 Fix: Ensure databases exist before cluster restore - resolves 11 failures issue 2025-11-07 10:27:03 +00:00
97be6564ef feat: implement full restore functionality with TUI integration
- Add complete restore engine (internal/restore/)
  - RestoreSingle() for single database restore
  - RestoreCluster() for full cluster restore
  - Archive format detection (7 formats supported)
  - Safety validation (integrity, disk space, tools)
  - Streaming decompression with pigz support

- Add CLI restore commands (cmd/restore.go)
  - restore single: restore single database backup
  - restore cluster: restore full cluster backup
  - restore list: list available backup archives
  - Safety-first design: dry-run by default, --confirm required

- Add TUI restore integration (internal/tui/)
  - Archive browser: browse and select backups
  - Restore preview: safety checks and confirmation
  - Restore execution: real-time progress tracking
  - Backup manager: comprehensive archive management

- Features:
  - Format auto-detection (.dump, .dump.gz, .sql, .sql.gz, .tar.gz)
  - Archive validation before restore
  - Disk space verification
  - Tool availability checks
  - Target database configuration
  - Clean-first and create-if-missing options
  - Parallel decompression support
  - Progress tracking with phases

Phase 1 (Core Functionality) complete and tested
2025-11-07 09:41:44 +00:00