Compare commits

2 commits: 15a60d2e71 → 9c65821250

CHANGELOG.md (161 lines changed)
@@ -5,6 +5,167 @@ All notable changes to dbbackup will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.42.10] - 2026-01-08 "Code Quality"

### Fixed - Code Quality Issues

- Removed deprecated `io/ioutil` usage (replaced with `os`)
- Fixed `os.DirEntry.ModTime()` → `file.Info().ModTime()` (see the sketch after this list)
- Removed unused fields and variables
- Fixed ineffective assignments in TUI code
- Fixed error strings (no capitalization, no trailing punctuation)
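
For context on the `ModTime` item: `os.DirEntry` has no `ModTime()` method; the modification time lives on the `fs.FileInfo` returned by `Info()`. A minimal standalone sketch of the corrected pattern (the directory and variable names are illustrative, not taken from the dbbackup source):

```go
package main

import (
	"fmt"
	"log"
	"os"
)

func main() {
	entries, err := os.ReadDir(".") // replaces deprecated ioutil.ReadDir
	if err != nil {
		log.Fatal(err)
	}
	for _, file := range entries {
		info, err := file.Info() // os.DirEntry exposes ModTime only via Info()
		if err != nil {
			continue // entry may have been removed since ReadDir
		}
		fmt.Println(file.Name(), info.ModTime())
	}
}
```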

## [3.42.9] - 2026-01-08 "Diagnose Timeout Fix"

### Fixed - diagnose.go Timeout Bugs

**More short timeouts that caused large archive failures:**

- `diagnoseClusterArchive()`: tar listing 60s → **5 minutes**
- `verifyWithPgRestore()`: pg_restore --list 60s → **5 minutes**
- `DiagnoseClusterDumps()`: archive listing 120s → **10 minutes**

**Impact:** These timeouts caused "context deadline exceeded" errors when diagnosing multi-GB backup archives, preventing TUI restore from even starting.

## [3.42.8] - 2026-01-08 "TUI Timeout Fix"

### Fixed - TUI Timeout Bugs Causing Backup/Restore Failures

**ROOT CAUSE of 2-3 month TUI backup/restore failures identified and fixed:**

#### Critical Timeout Fixes:

- **restore_preview.go**: Safety check timeout increased from 60s → **10 minutes**
  - Large archives (>1GB) take 2+ minutes to diagnose
  - Users saw "context deadline exceeded" before backup even started
- **dbselector.go**: Database listing timeout increased from 15s → **60 seconds**
  - Busy PostgreSQL servers need more time to respond
- **status.go**: Status check timeout increased from 10s → **30 seconds**
  - SSL negotiation and slow networks caused failures

#### Stability Improvements:

- **Panic recovery** added to parallel goroutines in (see the sketch after this list):
  - `backup/engine.go:BackupCluster()` - cluster backup workers
  - `restore/engine.go:RestoreCluster()` - cluster restore workers
- Prevents single database panic from crashing entire operation
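
The actual `BackupCluster()` change appears in the engine.go diff further below; this standalone sketch only illustrates the idiom of a deferred `recover()` at the top of each worker (the database names and worker body are placeholders, not the real engine code):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	databases := []string{"app", "billing", "broken"}
	var wg sync.WaitGroup
	var failCount int32

	for _, name := range databases {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			// A panic in one worker is recovered, logged, and counted
			// instead of crashing the whole cluster operation.
			defer func() {
				if r := recover(); r != nil {
					fmt.Printf("panic in worker for %s: %v\n", name, r)
					atomic.AddInt32(&failCount, 1)
				}
			}()
			if name == "broken" {
				panic("simulated per-database failure")
			}
			fmt.Println("backed up", name)
		}(name)
	}
	wg.Wait()
	fmt.Printf("failed: %d\n", atomic.LoadInt32(&failCount))
}
```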

#### Bug Fix:

- **restore/engine.go**: Fixed variable shadowing `err` → `cmdErr` for exit code detection

## [3.42.7] - 2026-01-08 "Context Killer Complete"

### Fixed - Additional Deadlock Bugs in Restore & Engine

**All remaining cmd.Wait() deadlock bugs fixed across the codebase:**

#### internal/restore/engine.go:
- `executeRestoreWithDecompression()` - gunzip/pigz pipeline restore
- `extractArchive()` - tar extraction for cluster restore
- `restoreGlobals()` - pg_dumpall globals restore

#### internal/backup/engine.go:
- `createArchive()` - tar/pigz archive creation pipeline

#### internal/engine/mysqldump.go:
- `Backup()` - mysqldump backup operation
- `BackupToWriter()` - streaming mysqldump to writer

**All 6 functions now use proper channel-based context handling with Process.Kill().**

## [3.42.6] - 2026-01-08 "Deadlock Killer"

### Fixed - Backup Command Context Handling

**Critical Bug: pg_dump/mysqldump could hang forever on context cancellation**

The `executeCommand`, `executeCommandWithProgress`, `executeMySQLWithProgressAndCompression`,
and `executeMySQLWithCompression` functions had a race condition where:

1. A goroutine was spawned to read stderr
2. `cmd.Wait()` was called directly
3. If the context was cancelled, the process was NOT killed
4. The goroutine could hang forever waiting for stderr

**Fix**: All backup execution functions now use proper channel-based context handling:

```go
// Wait for command with context handling
cmdDone := make(chan error, 1)
go func() {
	cmdDone <- cmd.Wait()
}()

select {
case cmdErr = <-cmdDone:
	// Command completed
case <-ctx.Done():
	// Context cancelled - kill process
	cmd.Process.Kill()
	<-cmdDone
	cmdErr = ctx.Err()
}
```

**Affected Functions:**
- `executeCommand()` - pg_dump for cluster backup
- `executeCommandWithProgress()` - pg_dump for single backup with progress
- `executeMySQLWithProgressAndCompression()` - mysqldump pipeline
- `executeMySQLWithCompression()` - mysqldump pipeline

**This fixes:** Backup operations hanging indefinitely when cancelled or timing out.

## [3.42.5] - 2026-01-08 "False Positive Fix"

### Fixed - Encryption Detection Bug

**IsBackupEncrypted False Positive:**

- **BUG FIX**: `IsBackupEncrypted()` returned `true` for ALL files, blocking normal restores
- Root cause: the fallback logic checked whether the first 12 bytes (nonce size) could be read - true for any file at least that long
- Fix: now properly detects known unencrypted formats by magic bytes:
  - Gzip: `1f 8b`
  - PostgreSQL custom: `PGDMP`
  - Plain SQL: starts with `--`, `SET`, `CREATE`
- Returns `false` if no metadata is present and the format is recognized as unencrypted
- Affected file: `internal/backup/encryption.go`

## [3.42.4] - 2026-01-08 "The Long Haul"

### Fixed - Critical Restore Timeout Bug

**Removed Arbitrary Timeouts from Backup/Restore Operations** (see the sketch after this list):

- **CRITICAL FIX**: Removed the 4-hour timeout that was killing large database restores
- PostgreSQL cluster restores of 69GB+ databases no longer fail with "context deadline exceeded"
- All backup/restore operations now use `context.WithCancel` instead of `context.WithTimeout`
- Operations run until completion or manual cancellation (Ctrl+C)
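
The TUI files themselves are not part of the diff shown below, so as illustration only: a minimal sketch of swapping a hard deadline for a cancellable context tied to Ctrl+C. The source uses `context.WithCancel`; `signal.NotifyContext` shown here is a stdlib shorthand that combines `WithCancel` with SIGINT handling, and the `sleep` command is a placeholder for pg_restore:

```go
package main

import (
	"context"
	"log"
	"os"
	"os/exec"
	"os/signal"
)

func main() {
	// Before: ctx, cancel := context.WithTimeout(context.Background(), 4*time.Hour)
	// After: no deadline; the context is cancelled only by Ctrl+C (SIGINT).
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	cmd := exec.CommandContext(ctx, "sleep", "5") // placeholder for the restore command
	if err := cmd.Run(); err != nil {
		log.Fatalf("restore failed: %v (ctx err: %v)", err, ctx.Err())
	}
	log.Println("restore completed")
}
```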

**Affected Files:**
- `internal/tui/restore_exec.go`: Changed from 4-hour timeout to context.WithCancel
- `internal/tui/backup_exec.go`: Changed from 4-hour timeout to context.WithCancel
- `internal/backup/engine.go`: Removed per-database timeout in cluster backup
- `cmd/restore.go`: CLI restore commands use context.WithCancel

**exec.Command Context Audit:**
- Fixed `exec.Command` without Context in `internal/restore/engine.go:730`
- Added proper context handling to all external command calls
- Added timeouts only for quick diagnostic/version checks (not the restore path):
  - `restore/version_check.go`: 30s timeout for pg_restore --version check only
  - `restore/error_report.go`: 10s timeout for tool version detection
  - `restore/diagnose.go`: 60s timeout for diagnostic functions
  - `pitr/binlog.go`: 10s timeout for mysqlbinlog --version check
  - `cleanup/processes.go`: 5s timeout for process listing
  - `auth/helper.go`: 30s timeout for auth helper commands

**Verification:**
- 54 total `exec.CommandContext` calls verified in the backup/restore/pitr path
- 0 `exec.Command` calls without Context in the critical restore path
- All 14 PostgreSQL exec calls use CommandContext (pg_dump, pg_restore, psql)
- All 15 MySQL/MariaDB exec calls use CommandContext (mysqldump, mysql, mysqlbinlog)
- All 14 test packages pass

### Technical Details
- Large Object (BLOB/BYTEA) restores are particularly affected by timeouts
- A 69GB database with large objects can take 5+ hours to restore
- The previous 4-hour hard timeout was causing consistent failures
- Now: no timeout - runs until complete or the user cancels

## [3.42.1] - 2026-01-07 "Resistance is Futile"

### Added - Content-Defined Chunking Deduplication

EMOTICON_REMOVAL_PLAN.md (new file, 295 lines)

@@ -0,0 +1,295 @@

# Emoticon Removal Plan for Python Code

## ⚠️ CRITICAL: Code Must Remain Functional After Removal

This document outlines a **safe, systematic approach** to removing emoticons from Python code without breaking functionality.

---

## 1. Identification Phase

### 1.1 Where Emoticons CAN Safely Exist (Safe to Remove)

| Location | Risk Level | Action |
|----------|------------|--------|
| Comments (`# 🎉 Success!`) | ✅ SAFE | Remove or replace with text |
| Docstrings (`"""📌 Note:..."""`) | ✅ SAFE | Remove or replace with text |
| Print statements for decoration (`print("✅ Done!")`) | ⚠️ LOW | Replace with ASCII or text |
| Logging messages (`logger.info("🔥 Starting...")`) | ⚠️ LOW | Replace with text equivalent |

### 1.2 Where Emoticons are DANGEROUS to Remove

| Location | Risk Level | Action |
|----------|------------|--------|
| String literals used in logic | 🚨 HIGH | **DO NOT REMOVE** without analysis |
| Dictionary keys (`{"🔑": value}`) | 🚨 CRITICAL | **NEVER REMOVE** - breaks code |
| Regex patterns | 🚨 CRITICAL | **NEVER REMOVE** - breaks matching |
| String comparisons (`if x == "✅"`) | 🚨 CRITICAL | Requires refactoring, not just removal |
| Database/API payloads | 🚨 CRITICAL | May break external systems |
| File content markers | 🚨 HIGH | May break parsing logic |

---

## 2. Pre-Removal Checklist

### 2.1 Before ANY Changes
- [ ] **Full backup** of the codebase
- [ ] **Run all tests** and record baseline results
- [ ] **Document all emoticon locations** with grep/search
- [ ] **Identify emoticon usage patterns** (decorative vs. functional)

### 2.2 Discovery Commands
```bash
# Find all files with emoticons (Unicode range for common emojis)
grep -rn --include="*.py" -P '[\x{1F300}-\x{1F9FF}]' .

# Find emoticons inside string literals (-P is required: \x{...} escapes are PCRE, not ERE)
grep -rn --include="*.py" -P '["'"'"'][^"'"'"']*[\x{1F300}-\x{1F9FF}]' .

# List unique emoticons used
grep -oP '[\x{1F300}-\x{1F9FF}]' *.py | sort -u
```

---

## 3. Replacement Strategy

### 3.1 Semantic Replacement Table

| Emoticon | Text Replacement | Context |
|----------|------------------|---------|
| ✅ | `[OK]` or `[SUCCESS]` | Status indicators |
| ❌ | `[FAIL]` or `[ERROR]` | Error indicators |
| ⚠️ | `[WARNING]` | Warning messages |
| 🔥 | `[HOT]` or remove | Decorative |
| 🎉 | `[DONE]` or remove | Celebration/completion |
| 📌 | `[NOTE]` | Notes/pinned items |
| 🚀 | `[START]` or remove | Launch/start indicators |
| 💾 | `[SAVE]` | Save operations |
| 🔑 | `[KEY]` | Key/authentication |
| 📁 | `[FILE]` | File operations |
| 🔍 | `[SEARCH]` | Search operations |
| ⏳ | `[WAIT]` or `[LOADING]` | Progress indicators |
| 🛑 | `[STOP]` | Stop/halt indicators |
| ℹ️ | `[INFO]` | Information |
| 🐛 | `[BUG]` or `[DEBUG]` | Debug messages |

### 3.2 Context-Aware Replacement Rules

```
RULE 1: Comments
- Remove the emoticon entirely OR replace it with text
- Example: `# 🎉 Feature complete` → `# Feature complete`

RULE 2: User-facing strings (print/logging)
- Replace with the semantic text equivalent
- Example: `print("✅ Backup complete")` → `print("[OK] Backup complete")`

RULE 3: Functional strings (DANGER ZONE)
- DO NOT auto-replace
- Requires manual code refactoring
- Example: `status = "✅"` → Refactor to `status = "success"` AND update all comparisons
```

---

## 4. Safe Removal Process

### Step 1: Audit
```python
# Python script to audit emoticon usage
import re
import ast  # reserved for deeper AST-based context analysis (unused below)

EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F9FF"  # Symbols & Pictographs
    "\U00002600-\U000026FF"  # Misc symbols
    "\U00002700-\U000027BF"  # Dingbats
    "\U0001F600-\U0001F64F"  # Emoticons
    "]+"
)

def audit_file(filepath):
    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()

    findings = []
    for lineno, line in enumerate(content.split('\n'), 1):
        matches = EMOJI_PATTERN.findall(line)
        if matches:
            # Determine context (comment, string, etc.)
            context = classify_context(line, matches)
            findings.append({
                'line': lineno,
                'content': line.strip(),
                'emojis': matches,
                'context': context,
                'risk': assess_risk(context)
            })
    return findings

def classify_context(line, matches):
    stripped = line.strip()
    if stripped.startswith('#'):
        return 'COMMENT'
    if 'print(' in line or 'logging.' in line or 'logger.' in line:
        return 'OUTPUT'
    if '==' in line or '!=' in line:
        return 'COMPARISON'
    # Heuristic: an unterminated quote before any trailing comment
    # suggests the emoji sits inside a string literal
    if re.search(r'["\'][^"\']*$', line.split('#')[0]):
        return 'STRING_LITERAL'
    return 'UNKNOWN'

def assess_risk(context):
    risk_map = {
        'COMMENT': 'LOW',
        'OUTPUT': 'LOW',
        'COMPARISON': 'CRITICAL',
        'STRING_LITERAL': 'HIGH',
        'UNKNOWN': 'HIGH'
    }
    return risk_map.get(context, 'HIGH')
```

### Step 2: Generate Change Plan
```python
def generate_change_plan(findings):
    plan = {'safe': [], 'review_required': [], 'do_not_touch': []}

    for finding in findings:
        if finding['risk'] == 'LOW':
            plan['safe'].append(finding)
        elif finding['risk'] == 'HIGH':
            plan['review_required'].append(finding)
        else:  # CRITICAL
            plan['do_not_touch'].append(finding)

    return plan
```

### Step 3: Apply Changes (SAFE items only)
```python
def apply_safe_replacements(filepath, replacements):
    # Create a backup first!
    import shutil
    shutil.copy(filepath, filepath + '.backup')

    with open(filepath, 'r', encoding='utf-8') as f:
        content = f.read()

    for old, new in replacements:
        content = content.replace(old, new)

    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(content)
```

### Step 4: Validate
```bash
# After each file change:
python -m py_compile <modified_file.py>  # Syntax check
pytest <related_tests>                   # Run tests
```

---

## 5. Validation Checklist

### After EACH File Modification
- [ ] File compiles without syntax errors (`python -m py_compile file.py`)
- [ ] All imports still work
- [ ] Related unit tests pass
- [ ] Integration tests pass
- [ ] Manual smoke test if applicable

### After ALL Modifications
- [ ] Full test suite passes
- [ ] Application starts correctly
- [ ] Key functionality verified manually
- [ ] No new warnings in logs
- [ ] Compare output with baseline

---

## 6. Rollback Plan

### If Something Breaks
1. **Immediate**: Restore from `.backup` files
2. **Git**: `git checkout -- <file>` or `git stash pop`
3. **Full rollback**: Restore from the pre-change backup

### Keep Until Verified
```bash
# Backup storage structure
backups/
├── pre_emoticon_removal/
│   ├── timestamp.tar.gz
│   └── git_commit_hash.txt
└── individual_files/
    ├── file1.py.backup
    └── file2.py.backup
```

---

## 7. Implementation Order

1. **Phase 1**: Comments only (LOWEST risk)
2. **Phase 2**: Docstrings (LOW risk)
3. **Phase 3**: Print/logging statements (LOW-MEDIUM risk)
4. **Phase 4**: Manual review items (HIGH risk) - one by one
5. **Phase 5**: NEVER touch CRITICAL items without full refactoring

---

## 8. Example Workflow

```bash
# 1. Create a working branch
git stash && git checkout -b emoticon-removal

# 2. Run the audit script
python emoticon_audit.py > audit_report.json

# 3. Review the audit report
jq '.do_not_touch' audit_report.json  # Check critical items

# 4. Apply safe changes only
python apply_safe_changes.py --dry-run  # Preview first!
python apply_safe_changes.py            # Apply

# 5. Validate after each change
python -m pytest tests/

# 6. Commit incrementally
git add -p  # Review each change
git commit -m "Remove emoticons from comments in module X"
```

---

## 9. DO NOT DO

❌ **Never** use global find-replace on emoticons
❌ **Never** remove emoticons from string comparisons without refactoring
❌ **Never** change multiple files without testing between changes
❌ **Never** assume an emoticon is decorative - verify its context
❌ **Never** proceed if tests fail after a change

---

## 10. Sign-Off Requirements

Before merging emoticon removal changes:
- [ ] All tests pass (100%)
- [ ] Code review by a second developer
- [ ] Manual testing of affected features
- [ ] Documented all CRITICAL items left unchanged (with justification)
- [ ] Backup verified and accessible

---

**Author**: Generated Plan
**Date**: 2026-01-07
**Status**: PLAN ONLY - No code changes made

@@ -143,7 +143,7 @@ Backup Execution

 Backup created: cluster_20251128_092928.tar.gz
 Size: 22.5 GB (compressed)
-Location: /u01/dba/dumps/
+Location: /var/backups/postgres/
 Databases: 7
 Checksum: SHA-256 verified
 ```

@@ -4,8 +4,8 @@ This directory contains pre-compiled binaries for the DB Backup Tool across mult

 ## Build Information
 - **Version**: 3.42.1
-- **Build Time**: 2026-01-07_19:40:21_UTC
-- **Git Commit**: 3653ced
+- **Build Time**: 2026-01-08_05:03:53_UTC
+- **Git Commit**: 9c65821

 ## Recent Updates (v1.1.0)
 - ✅ Fixed TUI progress display with line-by-line output

@@ -2,12 +2,14 @@ package auth

 import (
 	"bufio"
+	"context"
 	"fmt"
 	"os"
 	"os/exec"
 	"path/filepath"
 	"strconv"
 	"strings"
+	"time"

 	"dbbackup/internal/config"
 )

@@ -69,7 +71,10 @@ func checkPgHbaConf(user string) AuthMethod {

 // findHbaFileViaPostgres asks PostgreSQL for the hba_file location
 func findHbaFileViaPostgres() string {
-	cmd := exec.Command("psql", "-U", "postgres", "-t", "-c", "SHOW hba_file;")
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+
+	cmd := exec.CommandContext(ctx, "psql", "-U", "postgres", "-t", "-c", "SHOW hba_file;")
 	output, err := cmd.Output()
 	if err != nil {
 		return ""

@@ -82,8 +87,11 @@ func parsePgHbaConf(path string, user string) AuthMethod {
 	// Try with sudo if we can't read directly
 	file, err := os.Open(path)
 	if err != nil {
-		// Try with sudo
-		cmd := exec.Command("sudo", "cat", path)
+		// Try with sudo (with timeout)
+		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+		defer cancel()
+
+		cmd := exec.CommandContext(ctx, "sudo", "cat", path)
 		output, err := cmd.Output()
 		if err != nil {
 			return AuthUnknown

@@ -87,20 +87,46 @@ func IsBackupEncrypted(backupPath string) bool {
 		return meta.Encrypted
 	}

-	// Fallback: check if file starts with encryption nonce
+	// No metadata found - check file format to determine if encrypted
+	// Known unencrypted formats have specific magic bytes:
+	// - Gzip: 1f 8b
+	// - PGDMP (PostgreSQL custom): 50 47 44 4d 50 (PGDMP)
+	// - Plain SQL: starts with text (-- or SET or CREATE)
+	// - Tar: 75 73 74 61 72 (ustar) at offset 257
+	//
+	// If file doesn't match any known format, it MIGHT be encrypted,
+	// but we return false to avoid false positives. User must provide
+	// metadata file or use --encrypt flag explicitly.
 	file, err := os.Open(backupPath)
 	if err != nil {
 		return false
 	}
 	defer file.Close()

-	// Try to read nonce - if it succeeds, likely encrypted
-	nonce := make([]byte, crypto.NonceSize)
-	if n, err := file.Read(nonce); err != nil || n != crypto.NonceSize {
+	header := make([]byte, 6)
+	if n, err := file.Read(header); err != nil || n < 2 {
 		return false
 	}

-	return true
+	// Check for known unencrypted formats
+	// Gzip magic: 1f 8b
+	if header[0] == 0x1f && header[1] == 0x8b {
+		return false // Gzip compressed - not encrypted
+	}
+
+	// PGDMP magic (PostgreSQL custom format)
+	if len(header) >= 5 && string(header[:5]) == "PGDMP" {
+		return false // PostgreSQL custom dump - not encrypted
+	}
+
+	// Plain text SQL (starts with --, SET, CREATE, etc.)
+	if header[0] == '-' || header[0] == 'S' || header[0] == 'C' || header[0] == '/' {
+		return false // Plain text SQL - not encrypted
+	}
+
+	// Without metadata, we cannot reliably determine encryption status
+	// Return false to avoid blocking restores with false positives
+	return false
 }

 // DecryptBackupFile decrypts an encrypted backup file

@@ -443,6 +443,14 @@ func (e *Engine) BackupCluster(ctx context.Context) error {
 			defer wg.Done()
 			defer func() { <-semaphore }() // Release

+			// Panic recovery - prevent one database failure from crashing entire cluster backup
+			defer func() {
+				if r := recover(); r != nil {
+					e.log.Error("Panic in database backup goroutine", "database", name, "panic", r)
+					atomic.AddInt32(&failCount, 1)
+				}
+			}()

 			// Check for cancellation at start of goroutine
 			select {
 			case <-ctx.Done():

@@ -502,26 +510,10 @@ func (e *Engine) BackupCluster(ctx context.Context) error {

 			cmd := e.db.BuildBackupCommand(name, dumpFile, options)

-			// Calculate timeout based on database size:
-			// - Minimum 2 hours for small databases
-			// - Add 1 hour per 20GB for large databases
-			// - This allows ~69GB database to take up to 5+ hours
-			timeout := 2 * time.Hour
-			if size, err := e.db.GetDatabaseSize(ctx, name); err == nil {
-				sizeGB := size / (1024 * 1024 * 1024)
-				if sizeGB > 20 {
-					extraHours := (sizeGB / 20) + 1
-					timeout = time.Duration(2+extraHours) * time.Hour
-					mu.Lock()
-					e.printf(" Extended timeout: %v (for %dGB database)\n", timeout, sizeGB)
-					mu.Unlock()
-				}
-			}
-
-			dbCtx, cancel := context.WithTimeout(ctx, timeout)
-			defer cancel()
-			err := e.executeCommand(dbCtx, cmd, dumpFile)
-			cancel()
+			// NO TIMEOUT for individual database backups
+			// Large databases with large objects can take many hours
+			// The parent context handles cancellation if needed
+			err := e.executeCommand(ctx, cmd, dumpFile)
 			if err != nil {
 				e.log.Warn("Failed to backup database", "database", name, "error", err)

@@ -614,12 +606,36 @@ func (e *Engine) executeCommandWithProgress(ctx context.Context, cmdArgs []strin
 		return fmt.Errorf("failed to start command: %w", err)
 	}

-	// Monitor progress via stderr
-	go e.monitorCommandProgress(stderr, tracker)
+	// Monitor progress via stderr in goroutine
+	stderrDone := make(chan struct{})
+	go func() {
+		defer close(stderrDone)
+		e.monitorCommandProgress(stderr, tracker)
+	}()

-	// Wait for command to complete
-	if err := cmd.Wait(); err != nil {
-		return fmt.Errorf("backup command failed: %w", err)
+	// Wait for command to complete with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed (success or failure)
+	case <-ctx.Done():
+		// Context cancelled - kill process to unblock
+		e.log.Warn("Backup cancelled - killing process")
+		cmd.Process.Kill()
+		<-cmdDone // Wait for goroutine to finish
+		cmdErr = ctx.Err()
+	}
+
+	// Wait for stderr reader to finish
+	<-stderrDone
+
+	if cmdErr != nil {
+		return fmt.Errorf("backup command failed: %w", cmdErr)
 	}

 	return nil

@@ -696,8 +712,12 @@ func (e *Engine) executeMySQLWithProgressAndCompression(ctx context.Context, cmd
 		return fmt.Errorf("failed to get stderr pipe: %w", err)
 	}

-	// Start monitoring progress
-	go e.monitorCommandProgress(stderr, tracker)
+	// Start monitoring progress in goroutine
+	stderrDone := make(chan struct{})
+	go func() {
+		defer close(stderrDone)
+		e.monitorCommandProgress(stderr, tracker)
+	}()

 	// Start both commands
 	if err := gzipCmd.Start(); err != nil {

@@ -705,20 +725,41 @@ func (e *Engine) executeMySQLWithProgressAndCompression(ctx context.Context, cmd
 	}

 	if err := dumpCmd.Start(); err != nil {
+		gzipCmd.Process.Kill()
 		return fmt.Errorf("failed to start mysqldump: %w", err)
 	}

-	// Wait for mysqldump to complete
-	if err := dumpCmd.Wait(); err != nil {
-		return fmt.Errorf("mysqldump failed: %w", err)
+	// Wait for mysqldump with context handling
+	dumpDone := make(chan error, 1)
+	go func() {
+		dumpDone <- dumpCmd.Wait()
+	}()
+
+	var dumpErr error
+	select {
+	case dumpErr = <-dumpDone:
+		// mysqldump completed
+	case <-ctx.Done():
+		e.log.Warn("Backup cancelled - killing mysqldump")
+		dumpCmd.Process.Kill()
+		gzipCmd.Process.Kill()
+		<-dumpDone
+		return ctx.Err()
 	}

+	// Wait for stderr reader
+	<-stderrDone
+
 	// Close pipe and wait for gzip
 	pipe.Close()
 	if err := gzipCmd.Wait(); err != nil {
 		return fmt.Errorf("gzip failed: %w", err)
 	}

+	if dumpErr != nil {
+		return fmt.Errorf("mysqldump failed: %w", dumpErr)
+	}
+
 	return nil
 }

@@ -749,19 +790,45 @@ func (e *Engine) executeMySQLWithCompression(ctx context.Context, cmdArgs []stri
 	gzipCmd.Stdin = stdin
 	gzipCmd.Stdout = outFile

-	// Start both commands
+	// Start gzip first
 	if err := gzipCmd.Start(); err != nil {
 		return fmt.Errorf("failed to start gzip: %w", err)
 	}

-	if err := dumpCmd.Run(); err != nil {
-		return fmt.Errorf("mysqldump failed: %w", err)
+	// Start mysqldump
+	if err := dumpCmd.Start(); err != nil {
+		gzipCmd.Process.Kill()
+		return fmt.Errorf("failed to start mysqldump: %w", err)
 	}

+	// Wait for mysqldump with context handling
+	dumpDone := make(chan error, 1)
+	go func() {
+		dumpDone <- dumpCmd.Wait()
+	}()
+
+	var dumpErr error
+	select {
+	case dumpErr = <-dumpDone:
+		// mysqldump completed
+	case <-ctx.Done():
+		e.log.Warn("Backup cancelled - killing mysqldump")
+		dumpCmd.Process.Kill()
+		gzipCmd.Process.Kill()
+		<-dumpDone
+		return ctx.Err()
+	}
+
+	// Close pipe and wait for gzip
+	stdin.Close()
 	if err := gzipCmd.Wait(); err != nil {
 		return fmt.Errorf("gzip failed: %w", err)
 	}

+	if dumpErr != nil {
+		return fmt.Errorf("mysqldump failed: %w", dumpErr)
+	}
+
 	return nil
 }

@@ -898,15 +965,46 @@ func (e *Engine) createArchive(ctx context.Context, sourceDir, outputFile string
 		goto regularTar
 	}

-	// Wait for tar to finish
-	if err := cmd.Wait(); err != nil {
-		pigzCmd.Process.Kill()
-		return fmt.Errorf("tar failed: %w", err)
+	// Wait for tar with proper context handling
+	tarDone := make(chan error, 1)
+	go func() {
+		tarDone <- cmd.Wait()
+	}()
+
+	var tarErr error
+	select {
+	case tarErr = <-tarDone:
+		// tar completed
+	case <-ctx.Done():
+		e.log.Warn("Archive creation cancelled - killing processes")
+		cmd.Process.Kill()
+		pigzCmd.Process.Kill()
+		<-tarDone
+		return ctx.Err()
 	}

-	// Wait for pigz to finish
-	if err := pigzCmd.Wait(); err != nil {
-		return fmt.Errorf("pigz compression failed: %w", err)
+	if tarErr != nil {
+		pigzCmd.Process.Kill()
+		return fmt.Errorf("tar failed: %w", tarErr)
+	}
+
+	// Wait for pigz with proper context handling
+	pigzDone := make(chan error, 1)
+	go func() {
+		pigzDone <- pigzCmd.Wait()
+	}()
+
+	var pigzErr error
+	select {
+	case pigzErr = <-pigzDone:
+	case <-ctx.Done():
+		pigzCmd.Process.Kill()
+		<-pigzDone
+		return ctx.Err()
+	}
+
+	if pigzErr != nil {
+		return fmt.Errorf("pigz compression failed: %w", pigzErr)
 	}
 	return nil
 }

@@ -1251,8 +1349,10 @@ func (e *Engine) executeCommand(ctx context.Context, cmdArgs []string, outputFil
 		return fmt.Errorf("failed to start backup command: %w", err)
 	}

-	// Stream stderr output (don't buffer it all in memory)
+	// Stream stderr output in goroutine (don't buffer it all in memory)
+	stderrDone := make(chan struct{})
 	go func() {
+		defer close(stderrDone)
 		scanner := bufio.NewScanner(stderr)
 		scanner.Buffer(make([]byte, 64*1024), 1024*1024) // 1MB max line size
 		for scanner.Scan() {

@@ -1263,10 +1363,30 @@ func (e *Engine) executeCommand(ctx context.Context, cmdArgs []string, outputFil
 		}
 	}()

-	// Wait for command to complete
-	if err := cmd.Wait(); err != nil {
-		e.log.Error("Backup command failed", "error", err, "database", filepath.Base(outputFile))
-		return fmt.Errorf("backup command failed: %w", err)
+	// Wait for command to complete with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed (success or failure)
+	case <-ctx.Done():
+		// Context cancelled - kill process to unblock
+		e.log.Warn("Backup cancelled - killing pg_dump process")
+		cmd.Process.Kill()
+		<-cmdDone // Wait for goroutine to finish
+		cmdErr = ctx.Err()
+	}
+
+	// Wait for stderr reader to finish
+	<-stderrDone
+
+	if cmdErr != nil {
+		e.log.Error("Backup command failed", "error", cmdErr, "database", filepath.Base(outputFile))
+		return fmt.Errorf("backup command failed: %w", cmdErr)
 	}

 	return nil

@@ -12,6 +12,7 @@ import (
 	"strings"
 	"sync"
 	"syscall"
+	"time"

 	"dbbackup/internal/logger"
 )

@@ -116,8 +117,11 @@ func KillOrphanedProcesses(log logger.Logger) error {

 // findProcessesByName returns PIDs of processes matching the given name
 func findProcessesByName(name string, excludePID int) ([]int, error) {
-	// Use pgrep for efficient process searching
-	cmd := exec.Command("pgrep", "-x", name)
+	// Use pgrep for efficient process searching with timeout
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+
+	cmd := exec.CommandContext(ctx, "pgrep", "-x", name)
 	output, err := cmd.Output()
 	if err != nil {
 		// Exit code 1 means no processes found (not an error)

@@ -90,7 +90,7 @@ func NewAzureBackend(cfg *Config) (*AzureBackend, error) {
 		}
 	} else {
 		// Use default Azure credential (managed identity, environment variables, etc.)
-		return nil, fmt.Errorf("Azure authentication requires account name and key, or use AZURE_STORAGE_CONNECTION_STRING environment variable")
+		return nil, fmt.Errorf("azure authentication requires account name and key, or use AZURE_STORAGE_CONNECTION_STRING environment variable")
 	}
 }

@@ -217,8 +217,8 @@ func New() *Config {
 		SingleDBName:  getEnvString("SINGLE_DB_NAME", ""),
 		RestoreDBName: getEnvString("RESTORE_DB_NAME", ""),

-		// Timeouts
-		ClusterTimeoutMinutes: getEnvInt("CLUSTER_TIMEOUT_MIN", 240),
+		// Timeouts - default 24 hours (1440 min) to handle very large databases with large objects
+		ClusterTimeoutMinutes: getEnvInt("CLUSTER_TIMEOUT_MIN", 1440),

 		// Cluster parallelism (default: 2 concurrent operations for faster cluster backup/restore)
 		ClusterParallelism: getEnvInt("CLUSTER_PARALLELISM", 2),

@@ -227,7 +227,7 @@ func New() *Config {
 	WorkDir: getEnvString("WORK_DIR", ""),

 	// Swap file management
 	SwapFilePath:   "", // Will be set after WorkDir is initialized
 	SwapFileSizeGB: getEnvInt("SWAP_FILE_SIZE_GB", 0), // 0 = disabled by default
 	AutoSwap:       getEnvBool("AUTO_SWAP", false),

@@ -28,8 +28,9 @@ type LocalConfig struct {
 	DumpJobs int

 	// Performance settings
 	CPUWorkload string
 	MaxCores    int
+	ClusterTimeout int // Cluster operation timeout in minutes (default: 1440 = 24 hours)

 	// Security settings
 	RetentionDays int

@@ -121,6 +122,10 @@ func LoadLocalConfig() (*LocalConfig, error) {
 			if mc, err := strconv.Atoi(value); err == nil {
 				cfg.MaxCores = mc
 			}
+		case "cluster_timeout":
+			if ct, err := strconv.Atoi(value); err == nil {
+				cfg.ClusterTimeout = ct
+			}
 		}
 	case "security":
 		switch key {

@@ -199,6 +204,9 @@ func SaveLocalConfig(cfg *LocalConfig) error {
 	if cfg.MaxCores != 0 {
 		sb.WriteString(fmt.Sprintf("max_cores = %d\n", cfg.MaxCores))
 	}
+	if cfg.ClusterTimeout != 0 {
+		sb.WriteString(fmt.Sprintf("cluster_timeout = %d\n", cfg.ClusterTimeout))
+	}
 	sb.WriteString("\n")

 	// Security section

@@ -268,6 +276,10 @@ func ApplyLocalConfig(cfg *Config, local *LocalConfig) {
 	if local.MaxCores != 0 {
 		cfg.MaxCores = local.MaxCores
 	}
+	// Apply cluster timeout from config file (overrides default)
+	if local.ClusterTimeout != 0 {
+		cfg.ClusterTimeoutMinutes = local.ClusterTimeout
+	}
 	if cfg.RetentionDays == 30 && local.RetentionDays != 0 {
 		cfg.RetentionDays = local.RetentionDays
 	}

@@ -282,21 +294,22 @@ func ApplyLocalConfig(cfg *Config, local *LocalConfig) {
 // ConfigFromConfig creates a LocalConfig from a Config
 func ConfigFromConfig(cfg *Config) *LocalConfig {
 	return &LocalConfig{
 		DBType:      cfg.DatabaseType,
 		Host:        cfg.Host,
 		Port:        cfg.Port,
 		User:        cfg.User,
 		Database:    cfg.Database,
 		SSLMode:     cfg.SSLMode,
 		BackupDir:   cfg.BackupDir,
 		WorkDir:     cfg.WorkDir,
 		Compression: cfg.CompressionLevel,
 		Jobs:        cfg.Jobs,
 		DumpJobs:    cfg.DumpJobs,
 		CPUWorkload: cfg.CPUWorkloadType,
 		MaxCores:    cfg.MaxCores,
-		RetentionDays: cfg.RetentionDays,
-		MinBackups:    cfg.MinBackups,
-		MaxRetries:    cfg.MaxRetries,
+		ClusterTimeout: cfg.ClusterTimeoutMinutes,
+		RetentionDays:  cfg.RetentionDays,
+		MinBackups:     cfg.MinBackups,
+		MaxRetries:     cfg.MaxRetries,
 	}
 }
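
For orientation only: `SaveLocalConfig` writes the new setting as a plain `key = value` line, and `LoadLocalConfig` reads it back under the same section as `max_cores`. A sketch of how the resulting file might look; the file path and the `[performance]` section header are assumptions, not confirmed by this diff:

```ini
# ~/.dbbackup.conf (hypothetical path; section name assumed)
[performance]
max_cores = 8
cluster_timeout = 1440  ; minutes; 1440 = 24 hours, the new default
```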

@@ -15,7 +15,6 @@ import (

 	"github.com/jackc/pgx/v5/pgxpool"
 	"github.com/jackc/pgx/v5/stdlib"
-	_ "github.com/jackc/pgx/v5/stdlib" // PostgreSQL driver (pgx)
 )

 // PostgreSQL implements Database interface for PostgreSQL

@@ -234,10 +234,26 @@ func (e *MySQLDumpEngine) Backup(ctx context.Context, opts *BackupOptions) (*Bac
 		gzWriter.Close()
 	}

-	// Wait for command
-	if err := cmd.Wait(); err != nil {
+	// Wait for command with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed
+	case <-ctx.Done():
+		e.log.Warn("MySQL backup cancelled - killing process")
+		cmd.Process.Kill()
+		<-cmdDone
+		cmdErr = ctx.Err()
+	}
+
+	if cmdErr != nil {
 		stderr := stderrBuf.String()
-		return nil, fmt.Errorf("mysqldump failed: %w\n%s", err, stderr)
+		return nil, fmt.Errorf("mysqldump failed: %w\n%s", cmdErr, stderr)
 	}

 	// Get file info

@@ -442,8 +458,25 @@ func (e *MySQLDumpEngine) BackupToWriter(ctx context.Context, w io.Writer, opts
 		gzWriter.Close()
 	}

-	if err := cmd.Wait(); err != nil {
-		return nil, fmt.Errorf("mysqldump failed: %w\n%s", err, stderrBuf.String())
+	// Wait for command with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed
+	case <-ctx.Done():
+		e.log.Warn("MySQL streaming backup cancelled - killing process")
+		cmd.Process.Kill()
+		<-cmdDone
+		cmdErr = ctx.Err()
+	}
+
+	if cmdErr != nil {
+		return nil, fmt.Errorf("mysqldump failed: %w\n%s", cmdErr, stderrBuf.String())
 	}

 	return &BackupResult{

@@ -63,7 +63,7 @@ func (b *BtrfsBackend) Detect(dataDir string) (bool, error) {
 // CreateSnapshot creates a Btrfs snapshot
 func (b *BtrfsBackend) CreateSnapshot(ctx context.Context, opts SnapshotOptions) (*Snapshot, error) {
 	if b.config == nil || b.config.Subvolume == "" {
-		return nil, fmt.Errorf("Btrfs subvolume not configured")
+		return nil, fmt.Errorf("btrfs subvolume not configured")
 	}

 	// Generate snapshot name

@@ -212,7 +212,11 @@ func (m *BinlogManager) detectTools() error {

 // detectServerType determines if we're working with MySQL or MariaDB
 func (m *BinlogManager) detectServerType() DatabaseType {
-	cmd := exec.Command(m.mysqlbinlogPath, "--version")
+	// Use timeout to prevent blocking if command hangs
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+
+	cmd := exec.CommandContext(ctx, m.mysqlbinlogPath, "--version")
 	output, err := cmd.Output()
 	if err != nil {
 		return DatabaseMySQL // Default to MySQL
@@ -4,6 +4,7 @@ import (
|
|||||||
"bufio"
|
"bufio"
|
||||||
"bytes"
|
"bytes"
|
||||||
"compress/gzip"
|
"compress/gzip"
|
||||||
|
"context"
|
||||||
"encoding/json"
|
"encoding/json"
|
||||||
"fmt"
|
"fmt"
|
||||||
"io"
|
"io"
|
||||||
@@ -12,6 +13,7 @@ import (
|
|||||||
"path/filepath"
|
"path/filepath"
|
||||||
"regexp"
|
"regexp"
|
||||||
"strings"
|
"strings"
|
||||||
|
"time"
|
||||||
|
|
||||||
"dbbackup/internal/logger"
|
"dbbackup/internal/logger"
|
||||||
)
|
)
|
||||||
@@ -60,9 +62,9 @@ type DiagnoseDetails struct {
|
|||||||
TableList []string `json:"table_list,omitempty"`
|
TableList []string `json:"table_list,omitempty"`
|
||||||
|
|
||||||
// Compression analysis
|
// Compression analysis
|
||||||
GzipValid bool `json:"gzip_valid,omitempty"`
|
GzipValid bool `json:"gzip_valid,omitempty"`
|
||||||
GzipError string `json:"gzip_error,omitempty"`
|
GzipError string `json:"gzip_error,omitempty"`
|
||||||
ExpandedSize int64 `json:"expanded_size,omitempty"`
|
ExpandedSize int64 `json:"expanded_size,omitempty"`
|
||||||
CompressionRatio float64 `json:"compression_ratio,omitempty"`
|
CompressionRatio float64 `json:"compression_ratio,omitempty"`
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -157,7 +159,7 @@ func (d *Diagnoser) diagnosePgDump(filePath string, result *DiagnoseResult) {
|
|||||||
result.IsCorrupted = true
|
result.IsCorrupted = true
|
||||||
result.Details.HasPGDMPSignature = false
|
result.Details.HasPGDMPSignature = false
|
||||||
result.Details.FirstBytes = fmt.Sprintf("%q", header[:minInt(n, 20)])
|
result.Details.FirstBytes = fmt.Sprintf("%q", header[:minInt(n, 20)])
|
||||||
result.Errors = append(result.Errors,
|
result.Errors = append(result.Errors,
|
||||||
"Missing PGDMP signature - file is NOT PostgreSQL custom format",
|
"Missing PGDMP signature - file is NOT PostgreSQL custom format",
|
||||||
"This file may be SQL format incorrectly named as .dump",
|
"This file may be SQL format incorrectly named as .dump",
|
||||||
"Try: file "+filePath+" to check actual file type")
|
"Try: file "+filePath+" to check actual file type")
|
||||||
@@ -185,7 +187,7 @@ func (d *Diagnoser) diagnosePgDumpGz(filePath string, result *DiagnoseResult) {
|
|||||||
result.IsCorrupted = true
|
result.IsCorrupted = true
|
||||||
result.Details.GzipValid = false
|
result.Details.GzipValid = false
|
||||||
result.Details.GzipError = err.Error()
|
result.Details.GzipError = err.Error()
|
||||||
result.Errors = append(result.Errors,
|
result.Errors = append(result.Errors,
|
||||||
fmt.Sprintf("Invalid gzip format: %v", err),
|
fmt.Sprintf("Invalid gzip format: %v", err),
|
||||||
"The file may be truncated or corrupted during transfer")
|
"The file may be truncated or corrupted during transfer")
|
||||||
return
|
return
|
||||||
@@ -210,7 +212,7 @@ func (d *Diagnoser) diagnosePgDumpGz(filePath string, result *DiagnoseResult) {
|
|||||||
} else {
|
} else {
|
||||||
result.Details.HasPGDMPSignature = false
|
result.Details.HasPGDMPSignature = false
|
||||||
result.Details.FirstBytes = fmt.Sprintf("%q", header[:minInt(n, 20)])
|
result.Details.FirstBytes = fmt.Sprintf("%q", header[:minInt(n, 20)])
|
||||||
|
|
||||||
// Check if it's actually SQL content
|
// Check if it's actually SQL content
|
||||||
content := string(header[:n])
|
content := string(header[:n])
|
||||||
if strings.Contains(content, "PostgreSQL") || strings.Contains(content, "pg_dump") ||
|
if strings.Contains(content, "PostgreSQL") || strings.Contains(content, "pg_dump") ||
|
||||||
@@ -233,7 +235,7 @@ func (d *Diagnoser) diagnosePgDumpGz(filePath string, result *DiagnoseResult) {
|
|||||||
// Verify full gzip stream integrity by reading to end
|
// Verify full gzip stream integrity by reading to end
|
||||||
file.Seek(0, 0)
|
file.Seek(0, 0)
|
||||||
gz, _ = gzip.NewReader(file)
|
gz, _ = gzip.NewReader(file)
|
||||||
|
|
||||||
var totalRead int64
|
var totalRead int64
|
||||||
buf := make([]byte, 32*1024)
|
buf := make([]byte, 32*1024)
|
||||||
for {
|
for {
|
||||||
@@ -255,7 +257,7 @@ func (d *Diagnoser) diagnosePgDumpGz(filePath string, result *DiagnoseResult) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
gz.Close()
|
gz.Close()
|
||||||
|
|
||||||
result.Details.ExpandedSize = totalRead
|
result.Details.ExpandedSize = totalRead
|
||||||
if result.FileSize > 0 {
|
if result.FileSize > 0 {
|
||||||
result.Details.CompressionRatio = float64(totalRead) / float64(result.FileSize)
|
result.Details.CompressionRatio = float64(totalRead) / float64(result.FileSize)
|
||||||
@@ -392,7 +394,7 @@ func (d *Diagnoser) diagnoseSQLScript(filePath string, compressed bool, result *
|
|||||||
lastCopyTable, copyStartLine),
|
lastCopyTable, copyStartLine),
|
||||||
"The backup was truncated during data export",
|
"The backup was truncated during data export",
|
||||||
"This explains the 'syntax error' during restore - COPY data is being interpreted as SQL")
|
"This explains the 'syntax error' during restore - COPY data is being interpreted as SQL")
|
||||||
|
|
||||||
if len(copyDataSamples) > 0 {
|
if len(copyDataSamples) > 0 {
|
||||||
result.Errors = append(result.Errors,
|
result.Errors = append(result.Errors,
|
||||||
fmt.Sprintf("Sample orphaned data: %s", copyDataSamples[0]))
|
fmt.Sprintf("Sample orphaned data: %s", copyDataSamples[0]))
|
||||||
@@ -412,8 +414,12 @@ func (d *Diagnoser) diagnoseSQLScript(filePath string, compressed bool, result *
|
|||||||
|
|
||||||
// diagnoseClusterArchive analyzes a cluster tar.gz archive
|
// diagnoseClusterArchive analyzes a cluster tar.gz archive
|
||||||
func (d *Diagnoser) diagnoseClusterArchive(filePath string, result *DiagnoseResult) {
|
func (d *Diagnoser) diagnoseClusterArchive(filePath string, result *DiagnoseResult) {
|
||||||
// First verify tar.gz integrity
|
// First verify tar.gz integrity with timeout
|
||||||
cmd := exec.Command("tar", "-tzf", filePath)
|
// 5 minutes for large archives (multi-GB archives need more time)
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
cmd := exec.CommandContext(ctx, "tar", "-tzf", filePath)
|
||||||
output, err := cmd.Output()
|
output, err := cmd.Output()
|
||||||
if err != nil {
|
if err != nil {
|
||||||
result.IsValid = false
|
result.IsValid = false
|
||||||
@@ -491,13 +497,18 @@ func (d *Diagnoser) diagnoseUnknown(filePath string, result *DiagnoseResult) {
|
|||||||
|
|
||||||
// verifyWithPgRestore uses pg_restore --list to verify dump integrity
|
// verifyWithPgRestore uses pg_restore --list to verify dump integrity
|
||||||
func (d *Diagnoser) verifyWithPgRestore(filePath string, result *DiagnoseResult) {
|
func (d *Diagnoser) verifyWithPgRestore(filePath string, result *DiagnoseResult) {
|
||||||
cmd := exec.Command("pg_restore", "--list", filePath)
|
// Use timeout to prevent blocking on very large dump files
|
||||||
|
// 5 minutes for large dumps (multi-GB dumps with many tables)
|
||||||
|
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
|
||||||
|
defer cancel()
|
||||||
|
|
||||||
|
cmd := exec.CommandContext(ctx, "pg_restore", "--list", filePath)
|
||||||
output, err := cmd.CombinedOutput()
|
output, err := cmd.CombinedOutput()
|
||||||
|
|
||||||
if err != nil {
|
if err != nil {
|
||||||
result.Details.PgRestoreListable = false
|
result.Details.PgRestoreListable = false
|
||||||
result.Details.PgRestoreError = string(output)
|
result.Details.PgRestoreError = string(output)
|
||||||
|
|
||||||
// Check for specific errors
|
// Check for specific errors
|
||||||
errStr := string(output)
|
errStr := string(output)
|
||||||
if strings.Contains(errStr, "unexpected end of file") ||
|
if strings.Contains(errStr, "unexpected end of file") ||
|
||||||
@@ -544,7 +555,11 @@ func (d *Diagnoser) verifyWithPgRestore(filePath string, result *DiagnoseResult)
 // DiagnoseClusterDumps extracts and diagnoses all dumps in a cluster archive
 func (d *Diagnoser) DiagnoseClusterDumps(archivePath, tempDir string) ([]*DiagnoseResult, error) {
 	// First, try to list archive contents without extracting (fast check)
-	listCmd := exec.Command("tar", "-tzf", archivePath)
+	// 10 minutes for very large archives
+	listCtx, listCancel := context.WithTimeout(context.Background(), 10*time.Minute)
+	defer listCancel()
+
+	listCmd := exec.CommandContext(listCtx, "tar", "-tzf", archivePath)
 	listOutput, listErr := listCmd.CombinedOutput()
 	if listErr != nil {
 		// Archive listing failed - likely corrupted

@@ -557,9 +572,9 @@ func (d *Diagnoser) DiagnoseClusterDumps(archivePath, tempDir string) ([]*Diagno
 			IsCorrupted: true,
 			Details:     &DiagnoseDetails{},
 		}

 		errOutput := string(listOutput)
 		if strings.Contains(errOutput, "unexpected end of file") ||
 			strings.Contains(errOutput, "Unexpected EOF") ||
 			strings.Contains(errOutput, "truncated") {
 			errResult.IsTruncated = true
@@ -574,28 +589,34 @@ func (d *Diagnoser) DiagnoseClusterDumps(archivePath, tempDir string) ([]*Diagno
 				fmt.Sprintf("tar error: %s", truncateString(errOutput, 300)),
 				"Run manually: tar -tzf "+archivePath+" 2>&1 | tail -50")
 		}

 		return []*DiagnoseResult{errResult}, nil
 	}

 	// Archive is listable - now check disk space before extraction
 	files := strings.Split(strings.TrimSpace(string(listOutput)), "\n")

 	// Check if we have enough disk space (estimate 4x archive size needed)
 	archiveInfo, _ := os.Stat(archivePath)
 	requiredSpace := archiveInfo.Size() * 4

 	// Check temp directory space - try to extract metadata first
 	if stat, err := os.Stat(tempDir); err == nil && stat.IsDir() {
-		// Try extraction of a small test file first
-		testCmd := exec.Command("tar", "-xzf", archivePath, "-C", tempDir, "--wildcards", "*.json", "--wildcards", "globals.sql")
+		// Try extraction of a small test file first with timeout
+		testCtx, testCancel := context.WithTimeout(context.Background(), 30*time.Second)
+		testCmd := exec.CommandContext(testCtx, "tar", "-xzf", archivePath, "-C", tempDir, "--wildcards", "*.json", "--wildcards", "globals.sql")
 		testCmd.Run() // Ignore error - just try to extract metadata
+		testCancel()
 	}

 	d.log.Info("Archive listing successful", "files", len(files))

-	// Try full extraction
-	cmd := exec.Command("tar", "-xzf", archivePath, "-C", tempDir)
+	// Try full extraction - NO TIMEOUT here as large archives can take a long time
+	// Use a generous timeout (30 minutes) for very large archives
+	extractCtx, extractCancel := context.WithTimeout(context.Background(), 30*time.Minute)
+	defer extractCancel()
+
+	cmd := exec.CommandContext(extractCtx, "tar", "-xzf", archivePath, "-C", tempDir)
 	var stderr bytes.Buffer
 	cmd.Stderr = &stderr
 	if err := cmd.Run(); err != nil {

@@ -608,14 +629,14 @@ func (d *Diagnoser) DiagnoseClusterDumps(archivePath, tempDir string) ([]*Diagno
 			IsValid: false,
 			Details: &DiagnoseDetails{},
 		}

 		errOutput := stderr.String()
 		if strings.Contains(errOutput, "No space left") ||
 			strings.Contains(errOutput, "cannot write") ||
 			strings.Contains(errOutput, "Disk quota exceeded") {
 			errResult.Errors = append(errResult.Errors,
 				"INSUFFICIENT DISK SPACE to extract archive for diagnosis",
 				fmt.Sprintf("Archive size: %s (needs ~%s for extraction)",
 					formatBytes(archiveInfo.Size()), formatBytes(requiredSpace)),
 				"Use CLI diagnosis instead: dbbackup restore diagnose "+archivePath,
 				"Or use --workdir flag to specify a location with more space")

@@ -634,7 +655,7 @@ func (d *Diagnoser) DiagnoseClusterDumps(archivePath, tempDir string) ([]*Diagno
 				fmt.Sprintf("Extraction failed: %v", err),
 				fmt.Sprintf("tar error: %s", truncateString(errOutput, 300)))
 		}

 		// Still report what files we found in the listing
 		var dumpFiles []string
 		for _, f := range files {

@@ -648,7 +669,7 @@ func (d *Diagnoser) DiagnoseClusterDumps(archivePath, tempDir string) ([]*Diagno
 			errResult.Warnings = append(errResult.Warnings,
 				fmt.Sprintf("Archive contains %d database dumps (listing only)", len(dumpFiles)))
 		}

 		return []*DiagnoseResult{errResult}, nil
 	}

@@ -27,8 +27,7 @@ type Engine struct {
 	progress         progress.Indicator
 	detailedReporter *progress.DetailedReporter
 	dryRun           bool
 	debugLogPath     string          // Path to save debug log on error
-	errorCollector   *ErrorCollector // Collects detailed error info
 }

 // New creates a new restore engine
@@ -357,43 +356,68 @@ func (e *Engine) executeRestoreCommandWithContext(ctx context.Context, cmdArgs [
 		return fmt.Errorf("failed to start restore command: %w", err)
 	}

-	// Read stderr in chunks to log errors without loading all into memory
-	buf := make([]byte, 4096)
+	// Read stderr in goroutine to avoid blocking
 	var lastError string
 	var errorCount int
-	const maxErrors = 10 // Limit captured errors to prevent OOM
-	for {
-		n, err := stderr.Read(buf)
-		if n > 0 {
-			chunk := string(buf[:n])
-
-			// Feed to error collector if enabled
-			if collector != nil {
-				collector.CaptureStderr(chunk)
-			}
-
-			// Only capture REAL errors, not verbose output
-			if strings.Contains(chunk, "ERROR:") || strings.Contains(chunk, "FATAL:") || strings.Contains(chunk, "error:") {
-				lastError = strings.TrimSpace(chunk)
-				errorCount++
-				if errorCount <= maxErrors {
-					e.log.Warn("Restore stderr", "output", chunk)
-				}
-			}
-			// Note: --verbose output is discarded to prevent OOM
-		}
-		if err != nil {
-			break
-		}
-	}
+	stderrDone := make(chan struct{})
+	go func() {
+		defer close(stderrDone)
+		buf := make([]byte, 4096)
+		const maxErrors = 10 // Limit captured errors to prevent OOM
+		for {
+			n, err := stderr.Read(buf)
+			if n > 0 {
+				chunk := string(buf[:n])
+
+				// Feed to error collector if enabled
+				if collector != nil {
+					collector.CaptureStderr(chunk)
+				}
+
+				// Only capture REAL errors, not verbose output
+				if strings.Contains(chunk, "ERROR:") || strings.Contains(chunk, "FATAL:") || strings.Contains(chunk, "error:") {
+					lastError = strings.TrimSpace(chunk)
+					errorCount++
+					if errorCount <= maxErrors {
+						e.log.Warn("Restore stderr", "output", chunk)
+					}
+				}
+				// Note: --verbose output is discarded to prevent OOM
+			}
+			if err != nil {
+				break
+			}
+		}
+	}()
+
+	// Wait for command with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed (success or failure)
+	case <-ctx.Done():
+		// Context cancelled - kill process
+		e.log.Warn("Restore cancelled - killing process")
+		cmd.Process.Kill()
+		<-cmdDone
+		cmdErr = ctx.Err()
+	}

-	if err := cmd.Wait(); err != nil {
+	// Wait for stderr reader to finish
+	<-stderrDone
+
+	if cmdErr != nil {
 		// Get exit code
 		exitCode := 1
-		if exitErr, ok := err.(*exec.ExitError); ok {
+		if exitErr, ok := cmdErr.(*exec.ExitError); ok {
 			exitCode = exitErr.ExitCode()
 		}

 		// PostgreSQL pg_restore returns exit code 1 even for ignorable errors
 		// Check if errors are ignorable (already exists, duplicate, etc.)
 		if lastError != "" && e.isIgnorableError(lastError) {
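
The shape of this fix: the old code read stderr to EOF and only then called `cmd.Wait()` on the same goroutine, so a cancelled context could interrupt neither step. The rewrite drains stderr concurrently and races `cmd.Wait()` against `ctx.Done()`. A compilable sketch of that shape, assuming a hypothetical helper name and using `sleep` as a stand-in child process:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os/exec"
	"time"
)

// runWithCancel starts cmd, drains stderr concurrently, and honours ctx
// cancellation by killing the process. Draining in a separate goroutine
// keeps the child from blocking on a full stderr pipe while we wait.
func runWithCancel(ctx context.Context, cmd *exec.Cmd) error {
	stderr, err := cmd.StderrPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}

	stderrDone := make(chan struct{})
	go func() {
		defer close(stderrDone)
		// A real caller would scan chunks for ERROR:/FATAL: lines here.
		// Wait closes the pipe after exit, which ends this copy loop.
		io.Copy(io.Discard, stderr)
	}()

	cmdDone := make(chan error, 1)
	go func() { cmdDone <- cmd.Wait() }()

	var cmdErr error
	select {
	case cmdErr = <-cmdDone:
	case <-ctx.Done():
		cmd.Process.Kill()
		<-cmdDone // reap the process before returning
		cmdErr = ctx.Err()
	}
	<-stderrDone
	return cmdErr
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	fmt.Println(runWithCancel(ctx, exec.Command("sleep", "10")))
}
```

The buffered `cmdDone` channel is deliberate: if the select takes the `ctx.Done()` branch, the `cmd.Wait()` goroutine can still send its result without leaking. This also explains the `err` to `cmdErr` rename: the loop-scoped `err` previously shadowed the command's exit error, breaking exit-code detection.
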
@@ -427,10 +451,10 @@ func (e *Engine) executeRestoreCommandWithContext(ctx context.Context, cmdArgs [
 				errType,
 				errHint,
 			)

 			// Print report to console
 			collector.PrintReport(report)

 			// Save to file
 			if e.debugLogPath != "" {
 				if saveErr := collector.SaveReport(report, e.debugLogPath); saveErr != nil {
@@ -481,31 +505,56 @@ func (e *Engine) executeRestoreWithDecompression(ctx context.Context, archivePat
 		return fmt.Errorf("failed to start restore command: %w", err)
 	}

-	// Read stderr in chunks to log errors without loading all into memory
-	buf := make([]byte, 4096)
+	// Read stderr in goroutine to avoid blocking
 	var lastError string
 	var errorCount int
-	const maxErrors = 10 // Limit captured errors to prevent OOM
-	for {
-		n, err := stderr.Read(buf)
-		if n > 0 {
-			chunk := string(buf[:n])
-			// Only capture REAL errors, not verbose output
-			if strings.Contains(chunk, "ERROR:") || strings.Contains(chunk, "FATAL:") || strings.Contains(chunk, "error:") {
-				lastError = strings.TrimSpace(chunk)
-				errorCount++
-				if errorCount <= maxErrors {
-					e.log.Warn("Restore stderr", "output", chunk)
-				}
-			}
-			// Note: --verbose output is discarded to prevent OOM
-		}
-		if err != nil {
-			break
-		}
-	}
+	stderrDone := make(chan struct{})
+	go func() {
+		defer close(stderrDone)
+		buf := make([]byte, 4096)
+		const maxErrors = 10 // Limit captured errors to prevent OOM
+		for {
+			n, err := stderr.Read(buf)
+			if n > 0 {
+				chunk := string(buf[:n])
+				// Only capture REAL errors, not verbose output
+				if strings.Contains(chunk, "ERROR:") || strings.Contains(chunk, "FATAL:") || strings.Contains(chunk, "error:") {
+					lastError = strings.TrimSpace(chunk)
+					errorCount++
+					if errorCount <= maxErrors {
+						e.log.Warn("Restore stderr", "output", chunk)
+					}
+				}
+				// Note: --verbose output is discarded to prevent OOM
+			}
+			if err != nil {
+				break
+			}
+		}
+	}()
+
+	// Wait for command with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed (success or failure)
+	case <-ctx.Done():
+		// Context cancelled - kill process
+		e.log.Warn("Restore with decompression cancelled - killing process")
+		cmd.Process.Kill()
+		<-cmdDone
+		cmdErr = ctx.Err()
+	}

-	if err := cmd.Wait(); err != nil {
+	// Wait for stderr reader to finish
+	<-stderrDone
+
+	if cmdErr != nil {
 		// PostgreSQL pg_restore returns exit code 1 even for ignorable errors
 		// Check if errors are ignorable (already exists, duplicate, etc.)
 		if lastError != "" && e.isIgnorableError(lastError) {
@@ -517,18 +566,18 @@ func (e *Engine) executeRestoreWithDecompression(ctx context.Context, archivePat
 		if lastError != "" {
 			classification := checks.ClassifyError(lastError)
 			e.log.Error("Restore with decompression failed",
-				"error", err,
+				"error", cmdErr,
 				"last_stderr", lastError,
 				"error_count", errorCount,
 				"error_type", classification.Type,
 				"hint", classification.Hint,
 				"action", classification.Action)
 			return fmt.Errorf("restore failed: %w (last error: %s, total errors: %d) - %s",
-				err, lastError, errorCount, classification.Hint)
+				cmdErr, lastError, errorCount, classification.Hint)
 		}

-		e.log.Error("Restore with decompression failed", "error", err, "last_stderr", lastError, "error_count", errorCount)
-		return fmt.Errorf("restore failed: %w", err)
+		e.log.Error("Restore with decompression failed", "error", cmdErr, "last_stderr", lastError, "error_count", errorCount)
+		return fmt.Errorf("restore failed: %w", cmdErr)
 	}

 	return nil
@@ -727,7 +776,7 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 			}
 		} else if strings.HasSuffix(dumpFile, ".dump") {
 			// Validate custom format dumps using pg_restore --list
-			cmd := exec.Command("pg_restore", "--list", dumpFile)
+			cmd := exec.CommandContext(ctx, "pg_restore", "--list", dumpFile)
 			output, err := cmd.CombinedOutput()
 			if err != nil {
 				dbName := strings.TrimSuffix(entry.Name(), ".dump")
@@ -753,8 +802,8 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 	if len(corruptedDumps) > 0 {
 		operation.Fail("Corrupted dump files detected")
 		e.progress.Fail(fmt.Sprintf("Found %d corrupted dump files - restore aborted", len(corruptedDumps)))
-		return fmt.Errorf("pre-validation failed: %d corrupted dump files detected:\n  %s\n\nThe backup archive appears to be damaged. You need to restore from a different backup.",
-			len(corruptedDumps), strings.Join(corruptedDumps, "\n  "))
+		return fmt.Errorf("pre-validation failed: %d corrupted dump files detected: %s - the backup archive appears to be damaged, restore from a different backup",
+			len(corruptedDumps), strings.Join(corruptedDumps, ", "))
 	}
 	e.log.Info("All dump files passed validation")

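
The rewritten message follows Go's error-string convention: single-line, uncapitalized fragments without trailing punctuation, because errors are routinely wrapped with `%w` and logged inline, where embedded newlines and sentence punctuation read badly. A tiny illustrative example (not project code):

```go
package main

import (
	"errors"
	"fmt"
)

func main() {
	// Single-line, lowercase fragments compose cleanly when wrapped:
	base := errors.New("archive appears to be damaged")
	err := fmt.Errorf("pre-validation failed: %w", base)
	fmt.Println(err) // pre-validation failed: archive appears to be damaged
}
```
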
@@ -812,6 +861,14 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
 			defer wg.Done()
 			defer func() { <-semaphore }() // Release
+
+			// Panic recovery - prevent one database failure from crashing entire cluster restore
+			defer func() {
+				if r := recover(); r != nil {
+					e.log.Error("Panic in database restore goroutine", "file", filename, "panic", r)
+					atomic.AddInt32(&failCount, 1)
+				}
+			}()

 			// Update estimator progress (thread-safe)
 			mu.Lock()
 			estimator.UpdateProgress(idx)
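
This guard is necessary because `recover` only works inside the goroutine that panicked: an unrecovered panic in any worker terminates the whole process, taking every other database's restore down with it. A self-contained sketch of the same worker pattern (database names and the simulated bug are illustrative):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var (
		wg        sync.WaitGroup
		failCount int32 // shared failure counter, updated atomically
	)
	for _, name := range []string{"app", "reports", "broken"} {
		wg.Add(1)
		go func(db string) {
			defer wg.Done()
			// Each worker recovers its own panics so one bad database
			// cannot crash the whole cluster operation.
			defer func() {
				if r := recover(); r != nil {
					fmt.Println("recovered panic in", db, ":", r)
					atomic.AddInt32(&failCount, 1)
				}
			}()
			if db == "broken" {
				panic("simulated restore bug")
			}
		}(name)
	}
	wg.Wait()
	fmt.Println("failures:", atomic.LoadInt32(&failCount))
}
```
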
@@ -939,16 +996,39 @@ func (e *Engine) extractArchive(ctx context.Context, archivePath, destDir string
 	}

 	// Discard stderr output in chunks to prevent memory buildup
-	buf := make([]byte, 4096)
-	for {
-		_, err := stderr.Read(buf)
-		if err != nil {
-			break
-		}
-	}
+	stderrDone := make(chan struct{})
+	go func() {
+		defer close(stderrDone)
+		buf := make([]byte, 4096)
+		for {
+			_, err := stderr.Read(buf)
+			if err != nil {
+				break
+			}
+		}
+	}()
+
+	// Wait for command with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed
+	case <-ctx.Done():
+		e.log.Warn("Archive extraction cancelled - killing process")
+		cmd.Process.Kill()
+		<-cmdDone
+		cmdErr = ctx.Err()
+	}

-	if err := cmd.Wait(); err != nil {
-		return fmt.Errorf("tar extraction failed: %w", err)
+	<-stderrDone
+
+	if cmdErr != nil {
+		return fmt.Errorf("tar extraction failed: %w", cmdErr)
 	}
 	return nil
 }
@@ -981,25 +1061,48 @@ func (e *Engine) restoreGlobals(ctx context.Context, globalsFile string) error {
 		return fmt.Errorf("failed to start psql: %w", err)
 	}

-	// Read stderr in chunks
-	buf := make([]byte, 4096)
+	// Read stderr in chunks in goroutine
 	var lastError string
-	for {
-		n, err := stderr.Read(buf)
-		if n > 0 {
-			chunk := string(buf[:n])
-			if strings.Contains(chunk, "ERROR") || strings.Contains(chunk, "FATAL") {
-				lastError = chunk
-				e.log.Warn("Globals restore stderr", "output", chunk)
-			}
-		}
-		if err != nil {
-			break
-		}
-	}
+	stderrDone := make(chan struct{})
+	go func() {
+		defer close(stderrDone)
+		buf := make([]byte, 4096)
+		for {
+			n, err := stderr.Read(buf)
+			if n > 0 {
+				chunk := string(buf[:n])
+				if strings.Contains(chunk, "ERROR") || strings.Contains(chunk, "FATAL") {
+					lastError = chunk
+					e.log.Warn("Globals restore stderr", "output", chunk)
+				}
+			}
+			if err != nil {
+				break
+			}
+		}
+	}()
+
+	// Wait for command with proper context handling
+	cmdDone := make(chan error, 1)
+	go func() {
+		cmdDone <- cmd.Wait()
+	}()
+
+	var cmdErr error
+	select {
+	case cmdErr = <-cmdDone:
+		// Command completed
+	case <-ctx.Done():
+		e.log.Warn("Globals restore cancelled - killing process")
+		cmd.Process.Kill()
+		<-cmdDone
+		cmdErr = ctx.Err()
+	}

-	if err := cmd.Wait(); err != nil {
-		return fmt.Errorf("failed to restore globals: %w (last error: %s)", err, lastError)
+	<-stderrDone
+
+	if cmdErr != nil {
+		return fmt.Errorf("failed to restore globals: %w (last error: %s)", cmdErr, lastError)
 	}

 	return nil
@@ -1263,7 +1366,8 @@ func (e *Engine) detectLargeObjectsInDumps(dumpsDir string, entries []os.DirEntr
 	}

 	// Use pg_restore -l to list contents (fast, doesn't restore data)
-	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	// 2 minutes for large dumps with many objects
+	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Minute)
 	defer cancel()

 	cmd := exec.CommandContext(ctx, "pg_restore", "-l", dumpFile)

@@ -3,6 +3,7 @@ package restore
 import (
 	"bufio"
 	"compress/gzip"
+	"context"
 	"encoding/json"
 	"fmt"
 	"io"
@@ -20,43 +21,43 @@ import (
 // RestoreErrorReport contains comprehensive information about a restore failure
 type RestoreErrorReport struct {
 	// Metadata
 	Timestamp time.Time `json:"timestamp"`
 	Version   string    `json:"version"`
 	GoVersion string    `json:"go_version"`
 	OS        string    `json:"os"`
 	Arch      string    `json:"arch"`

 	// Archive info
 	ArchivePath   string `json:"archive_path"`
 	ArchiveSize   int64  `json:"archive_size"`
 	ArchiveFormat string `json:"archive_format"`

 	// Database info
 	TargetDB     string `json:"target_db"`
 	DatabaseType string `json:"database_type"`

 	// Error details
 	ExitCode     int    `json:"exit_code"`
 	ErrorMessage string `json:"error_message"`
 	ErrorType    string `json:"error_type"`
 	ErrorHint    string `json:"error_hint"`
 	TotalErrors  int    `json:"total_errors"`

 	// Captured output
 	LastStderr  []string `json:"last_stderr"`
 	FirstErrors []string `json:"first_errors"`

 	// Context around failure
 	FailureContext *FailureContext `json:"failure_context,omitempty"`

 	// Diagnosis results
 	DiagnosisResult *DiagnoseResult `json:"diagnosis_result,omitempty"`

 	// Environment (sanitized)
 	PostgresVersion  string `json:"postgres_version,omitempty"`
 	PgRestoreVersion string `json:"pg_restore_version,omitempty"`
 	PsqlVersion      string `json:"psql_version,omitempty"`

 	// Recommendations
 	Recommendations []string `json:"recommendations"`
 }

@@ -67,40 +68,40 @@ type FailureContext struct {
 	FailedLine       int      `json:"failed_line,omitempty"`
 	FailedStatement  string   `json:"failed_statement,omitempty"`
 	SurroundingLines []string `json:"surrounding_lines,omitempty"`

 	// For COPY block errors
 	InCopyBlock    bool     `json:"in_copy_block,omitempty"`
 	CopyTableName  string   `json:"copy_table_name,omitempty"`
 	CopyStartLine  int      `json:"copy_start_line,omitempty"`
 	SampleCopyData []string `json:"sample_copy_data,omitempty"`

 	// File position info
 	BytePosition    int64   `json:"byte_position,omitempty"`
 	PercentComplete float64 `json:"percent_complete,omitempty"`
 }

 // ErrorCollector captures detailed error information during restore
 type ErrorCollector struct {
 	log         logger.Logger
 	cfg         *config.Config
 	archivePath string
 	targetDB    string
 	format      ArchiveFormat

 	// Captured data
 	stderrLines []string
 	firstErrors []string
 	lastErrors  []string
 	totalErrors int
 	exitCode    int

 	// Limits
 	maxStderrLines  int
 	maxErrorCapture int

 	// State
 	startTime time.Time
 	enabled   bool
 }

 // NewErrorCollector creates a new error collector
@@ -126,30 +127,30 @@ func (ec *ErrorCollector) CaptureStderr(chunk string) {
 	if !ec.enabled {
 		return
 	}

 	lines := strings.Split(chunk, "\n")
 	for _, line := range lines {
 		line = strings.TrimSpace(line)
 		if line == "" {
 			continue
 		}

 		// Store last N lines of stderr
 		if len(ec.stderrLines) >= ec.maxStderrLines {
 			// Shift array, drop oldest
 			ec.stderrLines = ec.stderrLines[1:]
 		}
 		ec.stderrLines = append(ec.stderrLines, line)

 		// Check if this is an error line
 		if isErrorLine(line) {
 			ec.totalErrors++

 			// Capture first N errors
 			if len(ec.firstErrors) < ec.maxErrorCapture {
 				ec.firstErrors = append(ec.firstErrors, line)
 			}

 			// Keep last N errors (ring buffer style)
 			if len(ec.lastErrors) >= ec.maxErrorCapture {
 				ec.lastErrors = ec.lastErrors[1:]
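
CaptureStderr keeps memory bounded by dropping the oldest entry once each cap (`maxStderrLines`, `maxErrorCapture`) is reached, which is what lets the collector survive restores that emit millions of error lines. A minimal sketch of that bounded-buffer idea (cap value illustrative):

```go
package main

import "fmt"

// appendBounded keeps at most max entries, discarding the oldest - the
// same shift-and-append approach CaptureStderr uses above.
func appendBounded(lines []string, line string, max int) []string {
	if len(lines) >= max {
		lines = lines[1:]
	}
	return append(lines, line)
}

func main() {
	var buf []string
	for i := 0; i < 6; i++ {
		buf = appendBounded(buf, fmt.Sprintf("stderr line %d", i), 3)
	}
	fmt.Println(buf) // [stderr line 3 stderr line 4 stderr line 5]
}
```

Repeated re-slicing keeps the old backing array alive a little longer than a true ring buffer would, but for caps this small the simplicity is a reasonable trade.
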
@@ -184,36 +185,36 @@ func (ec *ErrorCollector) GenerateReport(errMessage string, errType string, errH
 		LastStderr:  ec.stderrLines,
 		FirstErrors: ec.firstErrors,
 	}

 	// Get archive size
 	if stat, err := os.Stat(ec.archivePath); err == nil {
 		report.ArchiveSize = stat.Size()
 	}

 	// Get tool versions
 	report.PostgresVersion = getCommandVersion("postgres", "--version")
 	report.PgRestoreVersion = getCommandVersion("pg_restore", "--version")
 	report.PsqlVersion = getCommandVersion("psql", "--version")

 	// Analyze failure context
 	report.FailureContext = ec.analyzeFailureContext()

 	// Run diagnosis if not already done
 	diagnoser := NewDiagnoser(ec.log, false)
 	if diagResult, err := diagnoser.DiagnoseFile(ec.archivePath); err == nil {
 		report.DiagnosisResult = diagResult
 	}

 	// Generate recommendations
 	report.Recommendations = ec.generateRecommendations(report)

 	return report
 }

 // analyzeFailureContext extracts context around the failure
 func (ec *ErrorCollector) analyzeFailureContext() *FailureContext {
 	ctx := &FailureContext{}

 	// Look for line number in errors
 	for _, errLine := range ec.lastErrors {
 		if lineNum := extractLineNumber(errLine); lineNum > 0 {
@@ -221,7 +222,7 @@ func (ec *ErrorCollector) analyzeFailureContext() *FailureContext {
 			break
 		}
 	}

 	// Look for COPY-related errors
 	for _, errLine := range ec.lastErrors {
 		if strings.Contains(errLine, "COPY") || strings.Contains(errLine, "syntax error") {

@@ -233,12 +234,12 @@ func (ec *ErrorCollector) analyzeFailureContext() *FailureContext {
 			break
 		}
 	}

 	// If we have a line number, try to get surrounding context from the dump
 	if ctx.FailedLine > 0 && ec.archivePath != "" {
 		ctx.SurroundingLines = ec.getSurroundingLines(ctx.FailedLine, 5)
 	}

 	return ctx
 }

@@ -246,13 +247,13 @@ func (ec *ErrorCollector) analyzeFailureContext() *FailureContext {
 func (ec *ErrorCollector) getSurroundingLines(lineNum int, context int) []string {
 	var reader io.Reader
 	var lines []string

 	file, err := os.Open(ec.archivePath)
 	if err != nil {
 		return nil
 	}
 	defer file.Close()

 	// Handle compressed files
 	if strings.HasSuffix(ec.archivePath, ".gz") {
 		gz, err := gzip.NewReader(file)

@@ -264,19 +265,19 @@ func (ec *ErrorCollector) getSurroundingLines(lineNum int, context int) []string
 	} else {
 		reader = file
 	}

 	scanner := bufio.NewScanner(reader)
 	buf := make([]byte, 0, 1024*1024)
 	scanner.Buffer(buf, 10*1024*1024)

 	currentLine := 0
 	startLine := lineNum - context
 	endLine := lineNum + context

 	if startLine < 1 {
 		startLine = 1
 	}

 	for scanner.Scan() {
 		currentLine++
 		if currentLine >= startLine && currentLine <= endLine {

@@ -290,18 +291,18 @@ func (ec *ErrorCollector) getSurroundingLines(lineNum int, context int) []string
 			break
 		}
 	}

 	return lines
 }

 // generateRecommendations provides actionable recommendations based on the error
 func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []string {
 	var recs []string

 	// Check diagnosis results
 	if report.DiagnosisResult != nil {
 		if report.DiagnosisResult.IsTruncated {
 			recs = append(recs,
 				"CRITICAL: Backup file is truncated/incomplete",
 				"Action: Re-run the backup for the affected database",
 				"Check: Verify disk space was available during backup",
@@ -317,14 +318,14 @@ func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []
 		}
 		if report.DiagnosisResult.Details != nil && report.DiagnosisResult.Details.UnterminatedCopy {
 			recs = append(recs,
 				fmt.Sprintf("ISSUE: COPY block for table '%s' was not terminated",
 					report.DiagnosisResult.Details.LastCopyTable),
 				"Cause: Backup was interrupted during data export",
 				"Action: Re-run backup ensuring it completes fully",
 			)
 		}
 	}

 	// Check error patterns
 	if report.TotalErrors > 1000000 {
 		recs = append(recs,

@@ -333,7 +334,7 @@ func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []
 			"Check: Verify dump format matches restore command",
 		)
 	}

 	// Check for common error types
 	errLower := strings.ToLower(report.ErrorMessage)
 	if strings.Contains(errLower, "syntax error") {

@@ -343,7 +344,7 @@ func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []
 			"Check: Run 'dbbackup restore diagnose <archive>' for detailed analysis",
 		)
 	}

 	if strings.Contains(errLower, "permission denied") {
 		recs = append(recs,
 			"ISSUE: Permission denied",

@@ -351,7 +352,7 @@ func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []
 			"Action: For ownership preservation, use a superuser account",
 		)
 	}

 	if strings.Contains(errLower, "does not exist") {
 		recs = append(recs,
 			"ISSUE: Missing object reference",

@@ -359,7 +360,7 @@ func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []
 			"Action: Check if target database was created",
 		)
 	}

 	if len(recs) == 0 {
 		recs = append(recs,
 			"Run 'dbbackup restore diagnose <archive>' for detailed analysis",

@@ -367,7 +368,7 @@ func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []
 			"Review the PostgreSQL/MySQL logs on the target server",
 		)
 	}

 	return recs
 }

@@ -378,18 +379,18 @@ func (ec *ErrorCollector) SaveReport(report *RestoreErrorReport, outputPath stri
 	if err := os.MkdirAll(dir, 0755); err != nil {
 		return fmt.Errorf("failed to create directory: %w", err)
 	}

 	// Marshal to JSON with indentation
 	data, err := json.MarshalIndent(report, "", "  ")
 	if err != nil {
 		return fmt.Errorf("failed to marshal report: %w", err)
 	}

 	// Write file
 	if err := os.WriteFile(outputPath, data, 0644); err != nil {
 		return fmt.Errorf("failed to write report: %w", err)
 	}

 	return nil
 }

@@ -399,35 +400,35 @@ func (ec *ErrorCollector) PrintReport(report *RestoreErrorReport) {
 	fmt.Println(strings.Repeat("═", 70))
 	fmt.Println("  🔴 RESTORE ERROR REPORT")
 	fmt.Println(strings.Repeat("═", 70))

 	fmt.Printf("\n📅 Timestamp: %s\n", report.Timestamp.Format("2006-01-02 15:04:05"))
 	fmt.Printf("📦 Archive: %s\n", filepath.Base(report.ArchivePath))
 	fmt.Printf("📊 Format: %s\n", report.ArchiveFormat)
 	fmt.Printf("🎯 Target DB: %s\n", report.TargetDB)
 	fmt.Printf("⚠️  Exit Code: %d\n", report.ExitCode)
 	fmt.Printf("❌ Total Errors: %d\n", report.TotalErrors)

 	fmt.Println("\n" + strings.Repeat("─", 70))
 	fmt.Println("ERROR DETAILS:")
 	fmt.Println(strings.Repeat("─", 70))

 	fmt.Printf("\nType: %s\n", report.ErrorType)
 	fmt.Printf("Message: %s\n", report.ErrorMessage)
 	if report.ErrorHint != "" {
 		fmt.Printf("Hint: %s\n", report.ErrorHint)
 	}

 	// Show failure context
 	if report.FailureContext != nil && report.FailureContext.FailedLine > 0 {
 		fmt.Println("\n" + strings.Repeat("─", 70))
 		fmt.Println("FAILURE CONTEXT:")
 		fmt.Println(strings.Repeat("─", 70))

 		fmt.Printf("\nFailed at line: %d\n", report.FailureContext.FailedLine)
 		if report.FailureContext.InCopyBlock {
 			fmt.Printf("Inside COPY block for table: %s\n", report.FailureContext.CopyTableName)
 		}

 		if len(report.FailureContext.SurroundingLines) > 0 {
 			fmt.Println("\nSurrounding lines:")
 			for _, line := range report.FailureContext.SurroundingLines {

@@ -435,13 +436,13 @@ func (ec *ErrorCollector) PrintReport(report *RestoreErrorReport) {
 		}
 		}
 	}

 	// Show first few errors
 	if len(report.FirstErrors) > 0 {
 		fmt.Println("\n" + strings.Repeat("─", 70))
 		fmt.Println("FIRST ERRORS:")
 		fmt.Println(strings.Repeat("─", 70))

 		for i, err := range report.FirstErrors {
 			if i >= 5 {
 				fmt.Printf("... and %d more\n", len(report.FirstErrors)-5)

@@ -450,13 +451,13 @@ func (ec *ErrorCollector) PrintReport(report *RestoreErrorReport) {
 			fmt.Printf("  %d. %s\n", i+1, truncateString(err, 100))
 		}
 	}

 	// Show diagnosis summary
 	if report.DiagnosisResult != nil && !report.DiagnosisResult.IsValid {
 		fmt.Println("\n" + strings.Repeat("─", 70))
 		fmt.Println("DIAGNOSIS:")
 		fmt.Println(strings.Repeat("─", 70))

 		if report.DiagnosisResult.IsTruncated {
 			fmt.Println("  ❌ File is TRUNCATED")
 		}

@@ -470,21 +471,21 @@ func (ec *ErrorCollector) PrintReport(report *RestoreErrorReport) {
 			fmt.Printf("  • %s\n", err)
 		}
 	}

 	// Show recommendations
 	fmt.Println("\n" + strings.Repeat("─", 70))
 	fmt.Println("💡 RECOMMENDATIONS:")
 	fmt.Println(strings.Repeat("─", 70))

 	for _, rec := range report.Recommendations {
 		fmt.Printf("  • %s\n", rec)
 	}

 	// Show tool versions
 	fmt.Println("\n" + strings.Repeat("─", 70))
 	fmt.Println("ENVIRONMENT:")
 	fmt.Println(strings.Repeat("─", 70))

 	fmt.Printf("  OS: %s/%s\n", report.OS, report.Arch)
 	fmt.Printf("  Go: %s\n", report.GoVersion)
 	if report.PgRestoreVersion != "" {

@@ -493,15 +494,15 @@ func (ec *ErrorCollector) PrintReport(report *RestoreErrorReport) {
 	if report.PsqlVersion != "" {
 		fmt.Printf("  psql: %s\n", report.PsqlVersion)
 	}

 	fmt.Println(strings.Repeat("═", 70))
 }

 // Helper functions

 func isErrorLine(line string) bool {
 	return strings.Contains(line, "ERROR:") ||
 		strings.Contains(line, "FATAL:") ||
 		strings.Contains(line, "error:") ||
 		strings.Contains(line, "PANIC:")
 }
@@ -556,7 +557,11 @@ func getDatabaseType(format ArchiveFormat) string {
 }

 func getCommandVersion(cmd string, arg string) string {
-	output, err := exec.Command(cmd, arg).CombinedOutput()
+	// Use timeout to prevent blocking if command hangs
+	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+	defer cancel()
+
+	output, err := exec.CommandContext(ctx, cmd, arg).CombinedOutput()
 	if err != nil {
 		return ""
 	}

@@ -6,6 +6,7 @@ import (
 	"os/exec"
 	"regexp"
 	"strconv"
+	"time"

 	"dbbackup/internal/database"
 )

@@ -47,8 +48,13 @@ func ParsePostgreSQLVersion(versionStr string) (*VersionInfo, error) {

 // GetDumpFileVersion extracts the PostgreSQL version from a dump file
 // Uses pg_restore -l to read the dump metadata
+// Uses a 30-second timeout to avoid blocking on large files
 func GetDumpFileVersion(dumpPath string) (*VersionInfo, error) {
-	cmd := exec.Command("pg_restore", "-l", dumpPath)
+	// Use a timeout context to prevent blocking on very large dump files
+	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
+	defer cancel()
+
+	cmd := exec.CommandContext(ctx, "pg_restore", "-l", dumpPath)
 	output, err := cmd.CombinedOutput()
 	if err != nil {
 		return nil, fmt.Errorf("failed to read dump file metadata: %w (output: %s)", err, string(output))

@@ -83,10 +83,10 @@ type backupCompleteMsg {

 func executeBackupWithTUIProgress(parentCtx context.Context, cfg *config.Config, log logger.Logger, backupType, dbName string, ratio int) tea.Cmd {
 	return func() tea.Msg {
-		// Use configurable cluster timeout (minutes) from config; default set in config.New()
-		// Use parent context to inherit cancellation from TUI
-		clusterTimeout := time.Duration(cfg.ClusterTimeoutMinutes) * time.Minute
-		ctx, cancel := context.WithTimeout(parentCtx, clusterTimeout)
+		// NO TIMEOUT for backup operations - a backup takes as long as it takes
+		// Large databases can take many hours
+		// Only manual cancellation (Ctrl+C) should stop the backup
+		ctx, cancel := context.WithCancel(parentCtx)
 		defer cancel()

 		start := time.Now()
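
The distinction this hunk relies on: a context derived with `WithCancel` has no deadline of its own, but it still propagates the parent's cancellation, so the operation can run for hours yet still stops on Ctrl+C. A small runnable illustration (the goroutine stands in for the user cancelling):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	parent, stop := context.WithCancel(context.Background())

	// Derived with WithCancel: no deadline of its own, but it still
	// ends when the parent is cancelled (the TUI's Ctrl+C path).
	op, cancel := context.WithCancel(parent)
	defer cancel()

	go func() {
		time.Sleep(50 * time.Millisecond)
		stop() // simulate the user cancelling from the parent
	}()

	<-op.Done()
	fmt.Println("operation stopped:", op.Err()) // context canceled
}
```
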

@@ -67,7 +67,6 @@ func (m ConfirmationModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 	switch msg := msg.(type) {
 	case autoConfirmMsg:
 		// Auto-confirm triggered
-		m.confirmed = true
 		if m.onConfirm != nil {
 			return m.onConfirm()
 		}

@@ -95,7 +94,6 @@ func (m ConfirmationModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {

 		case "enter", "y":
 			if msg.String() == "y" || m.cursor == 0 {
-				m.confirmed = true
 				// Execute the onConfirm callback if provided
 				if m.onConfirm != nil {
 					return m.onConfirm()

@@ -53,7 +53,8 @@ type databaseListMsg {

 func fetchDatabases(cfg *config.Config, log logger.Logger) tea.Cmd {
 	return func() tea.Msg {
-		ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
+		// 60 seconds for database listing - busy servers may be slow
+		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
 		defer cancel()

 		dbClient, err := database.New(cfg, log)

@@ -2,7 +2,7 @@ package tui

 import (
 	"fmt"
-	"io/ioutil"
+	"os"
 	"strings"
 	"time"

@@ -59,7 +59,7 @@ func loadHistory(cfg *config.Config) []HistoryEntry {
 	var entries []HistoryEntry

 	// Read backup files from backup directory
-	files, err := ioutil.ReadDir(cfg.BackupDir)
+	files, err := os.ReadDir(cfg.BackupDir)
 	if err != nil {
 		return entries
 	}

@@ -74,6 +74,12 @@ func loadHistory(cfg *config.Config) []HistoryEntry {
 			continue
 		}

+		// Get file info for ModTime
+		info, err := file.Info()
+		if err != nil {
+			continue
+		}
+
 		var backupType string
 		var database string

@@ -97,7 +103,7 @@ func loadHistory(cfg *config.Config) []HistoryEntry {
 		entries = append(entries, HistoryEntry{
 			Type:      backupType,
 			Database:  database,
-			Timestamp: file.ModTime(),
+			Timestamp: info.ModTime(),
 			Status:    "✅ Completed",
 			Filename:  name,
 		})
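
The `io/ioutil` migration changes more than the import: `os.ReadDir` returns lightweight `os.DirEntry` values, so `ModTime` needs an extra `Info()` call, and that call can fail if the file vanished between the listing and the stat, hence the continue-on-error above. A small sketch of the migrated pattern:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	entries, err := os.ReadDir(".")
	if err != nil {
		return
	}
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			continue // file removed mid-walk; skip it
		}
		fmt.Println(e.Name(), info.ModTime().Format("2006-01-02 15:04"))
	}
}
```
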
|||||||
```diff
@@ -111,10 +111,10 @@ type restoreCompleteMsg struct {

 func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string, cleanFirst, createIfMissing bool, restoreType string, cleanClusterFirst bool, existingDBs []string, saveDebugLog bool) tea.Cmd {
 	return func() tea.Msg {
-		// Use configurable cluster timeout (minutes) from config; default set in config.New()
-		// Use parent context to inherit cancellation from TUI
-		restoreTimeout := time.Duration(cfg.ClusterTimeoutMinutes) * time.Minute
-		ctx, cancel := context.WithTimeout(parentCtx, restoreTimeout)
+		// NO TIMEOUT for restore operations - a restore takes as long as it takes
+		// Large databases with large objects can take many hours
+		// Only manual cancellation (Ctrl+C) should stop the restore
+		ctx, cancel := context.WithCancel(parentCtx)
 		defer cancel()

 		start := time.Now()
```
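Switching from `context.WithTimeout` to `context.WithCancel` removes the deadline while keeping the parent's cancellation: the derived context is cancelled whenever the TUI's parent context is (e.g. on Ctrl+C), but never expires on its own. A runnable sketch of that behavior, with a simulated cancellation standing in for a real restore:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// run demonstrates the switch made in the diff: context.WithCancel
// inherits cancellation from the parent but imposes no deadline of
// its own, so a long restore is never killed by an arbitrary timer.
func run(parent context.Context, restore func(context.Context) error) error {
	ctx, cancel := context.WithCancel(parent)
	defer cancel()
	return restore(ctx)
}

func main() {
	parent, stop := context.WithCancel(context.Background())
	go func() { // simulate a user pressing Ctrl+C after 100ms
		time.Sleep(100 * time.Millisecond)
		stop()
	}()
	err := run(parent, func(ctx context.Context) error {
		<-ctx.Done() // a real restore would poll ctx while working
		return ctx.Err()
	})
	fmt.Println(err) // "context canceled": parent cancellation propagated
}
```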
```diff
@@ -138,8 +138,8 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
 		// This matches how cluster restore works - uses CLI tools, not database connections
 		droppedCount := 0
 		for _, dbName := range existingDBs {
-			// Create timeout context for each database drop (30 seconds per DB)
-			dropCtx, dropCancel := context.WithTimeout(ctx, 30*time.Second)
+			// Create timeout context for each database drop (5 minutes per DB - large DBs take time)
+			dropCtx, dropCancel := context.WithTimeout(ctx, 5*time.Minute)
 			if err := dropDatabaseCLI(dropCtx, cfg, dbName); err != nil {
 				log.Warn("Failed to drop database", "name", dbName, "error", err)
 				// Continue with other databases
```
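Each drop still gets its own deadline, now five minutes, derived from the unbounded restore context, so one hung drop cannot stall the whole operation forever while the operation as a whole remains deadline-free. One design note on this shape: the per-iteration `cancel` should be called each pass rather than deferred, or timers accumulate across a long loop. A sketch with a hypothetical `dropAll` helper in place of `dropDatabaseCLI`:

```go
package example

import (
	"context"
	"time"
)

// dropAll shows the per-iteration timeout pattern from the diff:
// each drop derives a 5-minute deadline from the cancellable parent,
// and cancel is called at the end of the iteration (not deferred),
// so timers are released as the loop proceeds.
func dropAll(ctx context.Context, names []string, drop func(context.Context, string) error) (dropped int) {
	for _, name := range names {
		dropCtx, cancel := context.WithTimeout(ctx, 5*time.Minute)
		err := drop(dropCtx, name)
		cancel() // release this iteration's timer immediately
		if err != nil {
			continue // keep going; one failed drop shouldn't abort the rest
		}
		dropped++
	}
	return dropped
}
```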
```diff
@@ -106,7 +106,8 @@ type safetyCheckCompleteMsg struct {

 func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string) tea.Cmd {
 	return func() tea.Msg {
-		ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
+		// 10 minutes for safety checks - large archives can take a long time to diagnose
+		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
 		defer cancel()

 		safety := restore.NewSafety(cfg, log)
```
```diff
@@ -444,7 +445,7 @@ func (m RestorePreviewModel) View() string {
 	// Advanced Options
 	s.WriteString(archiveHeaderStyle.Render("⚙️  Advanced Options"))
 	s.WriteString("\n")

 	// Work directory option
 	workDirIcon := "✗"
 	workDirStyle := infoStyle
```
```diff
@@ -460,7 +461,7 @@ func (m RestorePreviewModel) View() string {
 		s.WriteString(infoStyle.Render("   ⚠️  Large archives need more space than /tmp may have"))
 		s.WriteString("\n")
 	}

 	// Debug log option
 	debugIcon := "✗"
 	debugStyle := infoStyle
```
```diff
@@ -482,7 +482,6 @@ func (m SettingsModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {

 	switch msg.String() {
 	case "ctrl+c", "q", "esc":
-		m.quitting = true
 		return m.parent, nil

 	case "up", "k":
```
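The deleted `m.quitting = true` is a classic ineffectual assignment: Bubble Tea's `Update` receives the model by value, and this branch returns `m.parent` rather than `m`, so the write to the local copy can never be observed. A stripped-down illustration of why such writes get flagged as dead, using a hypothetical `model` type:

```go
package example

// model mirrors the Bubble Tea shape: Update-style methods take the
// model by value, so field writes mutate only a local copy.
type model struct {
	quitting bool
	parent   *model
}

// update returns m.parent, not m, so the write to m.quitting below
// is never observed by any caller - the assignment is dead, which is
// why the diff removes it.
func (m model) update() *model {
	m.quitting = true // ineffectual: m is a copy and is not returned
	return m.parent
}
```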
```diff
@@ -70,7 +70,8 @@ type statusMsg struct {

 func fetchStatus(cfg *config.Config, log logger.Logger) tea.Cmd {
 	return func() tea.Msg {
-		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+		// 30 seconds for status check - slow networks or SSL negotiation
+		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 		defer cancel()

 		dbClient, err := database.New(cfg, log)
```