Compare commits

...

42 Commits

Author SHA1 Message Date
56ad0824c7 ci: simplify JSON creation, add HTTP code debug
All checks were successful
CI/CD / Test (push) Successful in 1m15s
CI/CD / Lint (push) Successful in 1m22s
CI/CD / Build & Release (push) Successful in 1m51s
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-07 12:57:07 +01:00
ec65df2976 ci: add verbose output for binary upload debugging
All checks were successful
CI/CD / Test (push) Successful in 1m14s
CI/CD / Lint (push) Successful in 1m22s
CI/CD / Build & Release (push) Successful in 1m51s
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-07 12:55:08 +01:00
23cc1e0e08 ci: use jq to build JSON payload safely
All checks were successful
CI/CD / Test (push) Successful in 1m15s
CI/CD / Lint (push) Successful in 1m21s
CI/CD / Build & Release (push) Successful in 1m53s
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-07 12:52:59 +01:00
7770abab6f ci: fix JSON escaping in release creation
Some checks failed
CI/CD / Test (push) Successful in 1m15s
CI/CD / Lint (push) Successful in 1m22s
CI/CD / Build & Release (push) Failing after 1m49s
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-07 12:45:03 +01:00
f6a20f035b ci: simplified build-and-release job, add optional GitHub mirror
Some checks failed
CI/CD / Test (push) Successful in 1m15s
CI/CD / Lint (push) Successful in 1m22s
CI/CD / Build & Release (push) Failing after 1m52s
CI/CD / Mirror to GitHub (push) Has been skipped
- Removed matrix build + artifact passing (was failing)
- Single job builds all platforms and creates release
- Added optional mirror-to-github job (needs GITHUB_MIRROR_TOKEN var)
- Better error handling for release creation
2026-01-07 12:31:21 +01:00
28e54d118f ci: use github.token instead of secrets.GITEA_TOKEN
Some checks failed
CI/CD / Test (push) Successful in 1m14s
CI/CD / Lint (push) Successful in 1m23s
CI/CD / Release (push) Has been skipped
CI/CD / Build (amd64, darwin) (push) Failing after 30s
CI/CD / Build (amd64, linux) (push) Failing after 30s
CI/CD / Build (arm64, darwin) (push) Failing after 30s
CI/CD / Build (arm64, linux) (push) Failing after 31s
2026-01-07 12:20:41 +01:00
ab0ff3f28d ci: add release job with Gitea binary uploads
All checks were successful
CI/CD / Test (push) Successful in 1m15s
CI/CD / Lint (push) Successful in 1m21s
CI/CD / Build (amd64, darwin) (push) Successful in 42s
CI/CD / Build (amd64, linux) (push) Successful in 30s
CI/CD / Build (arm64, darwin) (push) Successful in 30s
CI/CD / Build (arm64, linux) (push) Successful in 31s
CI/CD / Release (push) Has been skipped
- Upload artifacts on tag pushes
- Create release via Gitea API
- Attach all platform binaries to release
2026-01-07 12:10:33 +01:00
b7dd325c51 chore: remove binaries from git tracking
All checks were successful
CI/CD / Test (push) Successful in 1m15s
CI/CD / Lint (push) Successful in 1m22s
CI/CD / Build (amd64, darwin) (push) Successful in 30s
CI/CD / Build (amd64, linux) (push) Successful in 29s
CI/CD / Build (arm64, darwin) (push) Successful in 30s
CI/CD / Build (arm64, linux) (push) Successful in 30s
- Add bin/dbbackup_* to .gitignore
- Binaries distributed via GitHub Releases instead
- Reduces repo size and eliminates large file warnings
2026-01-07 12:04:22 +01:00
2ed54141a3 chore: rebuild all platform binaries
Some checks failed
CI/CD / Test (push) Successful in 2m43s
CI/CD / Lint (push) Successful in 2m50s
CI/CD / Build (amd64, linux) (push) Has been cancelled
CI/CD / Build (arm64, darwin) (push) Has been cancelled
CI/CD / Build (arm64, linux) (push) Has been cancelled
CI/CD / Build (amd64, darwin) (push) Has been cancelled
2026-01-07 11:57:08 +01:00
495ee31247 docs: add comprehensive SYSTEMD.md installation guide
- Create dedicated SYSTEMD.md with full manual installation steps
- Cover security hardening, multiple instances, troubleshooting
- Document Prometheus exporter manual setup
- Simplify README systemd section with link to detailed guide
- Add SYSTEMD.md to documentation list
2026-01-07 11:55:20 +01:00
78e10f5057 fix: installer issues found during testing
- Remove invalid --config flag from exporter service template
- Change ReadOnlyPaths to ReadWritePaths for catalog access
- Add copyBinary() to install binary to /usr/local/bin (ProtectHome compat)
- Fix exporter status detection using direct systemctl check
- Add os/exec import for status check
2026-01-07 11:50:51 +01:00
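The copyBinary() helper named in the commit above isn't shown here; below is a minimal sketch of what such a function might look like, assuming it copies the currently running executable to /usr/local/bin so that ProtectHome= in the hardened unit can't hide the original binary path. The signature, temp-file handling, and error messages are illustrative, not the repository's actual code.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// copyBinary copies the currently running executable to destPath
// (e.g. /usr/local/bin/dbbackup) so the systemd unit can execute it
// even when ProtectHome= blocks the original location.
func copyBinary(destPath string) error {
	self, err := os.Executable()
	if err != nil {
		return fmt.Errorf("locate running binary: %w", err)
	}

	src, err := os.Open(self)
	if err != nil {
		return fmt.Errorf("open source binary: %w", err)
	}
	defer src.Close()

	// Write to a temp file first, then rename for an atomic replace.
	tmp := destPath + ".tmp"
	dst, err := os.OpenFile(tmp, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755)
	if err != nil {
		return fmt.Errorf("create %s: %w", tmp, err)
	}
	if _, err := io.Copy(dst, src); err != nil {
		dst.Close()
		os.Remove(tmp)
		return fmt.Errorf("copy binary: %w", err)
	}
	if err := dst.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, destPath)
}

func main() {
	if err := copyBinary("/usr/local/bin/dbbackup"); err != nil {
		fmt.Fprintln(os.Stderr, "install failed:", err)
		os.Exit(1)
	}
}
```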
f4a0e2d82c build: rebuild all platform binaries with dry-run fix
All checks were successful
CI/CD / Test (push) Successful in 2m49s
CI/CD / Lint (push) Successful in 2m50s
CI/CD / Build (amd64, darwin) (push) Successful in 1m58s
CI/CD / Build (amd64, linux) (push) Successful in 1m58s
CI/CD / Build (arm64, darwin) (push) Successful in 2m0s
CI/CD / Build (arm64, linux) (push) Successful in 1m59s
2026-01-07 11:40:10 +01:00
f66d19acb0 fix: allow dry-run install without root privileges
Some checks failed
CI/CD / Test (push) Successful in 2m53s
CI/CD / Build (amd64, darwin) (push) Has been cancelled
CI/CD / Build (amd64, linux) (push) Has been cancelled
CI/CD / Build (arm64, darwin) (push) Has been cancelled
CI/CD / Build (arm64, linux) (push) Has been cancelled
CI/CD / Lint (push) Has been cancelled
2026-01-07 11:37:13 +01:00
16f377e9b5 docs: update README with systemd and Prometheus metrics sections
Some checks failed
CI/CD / Test (push) Successful in 2m45s
CI/CD / Lint (push) Successful in 2m56s
CI/CD / Build (amd64, linux) (push) Has been cancelled
CI/CD / Build (arm64, darwin) (push) Has been cancelled
CI/CD / Build (arm64, linux) (push) Has been cancelled
CI/CD / Build (amd64, darwin) (push) Has been cancelled
- Add install/uninstall and metrics commands to command table
- Add Systemd Integration section with install examples
- Add Prometheus Metrics section with textfile and HTTP exporter docs
- Update Features list with systemd and metrics highlights
- Rebuild all platform binaries
2026-01-07 11:26:54 +01:00
7e32a0369d feat: add embedded systemd installer and Prometheus metrics
Some checks failed
CI/CD / Test (push) Successful in 2m42s
CI/CD / Lint (push) Successful in 2m50s
CI/CD / Build (amd64, darwin) (push) Successful in 2m0s
CI/CD / Build (amd64, linux) (push) Successful in 1m58s
CI/CD / Build (arm64, darwin) (push) Successful in 2m1s
CI/CD / Build (arm64, linux) (push) Has been cancelled
Systemd Integration:
- New 'dbbackup install' command creates service/timer units
- Supports single-database and cluster backup modes
- Automatic dbbackup user/group creation with proper permissions
- Hardened service units with security features
- Template units with configurable OnCalendar schedules
- 'dbbackup uninstall' for clean removal

Prometheus Metrics:
- 'dbbackup metrics export' for textfile collector format
- 'dbbackup metrics serve' runs HTTP exporter on port 9399
- Metrics: last_success_timestamp, rpo_seconds, backup_total, etc.
- Integration with node_exporter textfile collector
- --with-metrics flag during install

Technical:
- Systemd templates embedded with //go:embed
- Service units include ReadWritePaths, OOMScoreAdjust
- Metrics exporter caches with 30s TTL
- Graceful shutdown on SIGTERM
2026-01-07 11:18:09 +01:00
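The commit above notes that the systemd templates are embedded with //go:embed so the installer ships inside a single self-contained binary. A minimal sketch of that pattern, assuming a templates/ directory and unit file names chosen purely for illustration:

```go
package main

import (
	"embed"
	"fmt"
	"os"
)

// The real project embeds its service/timer templates; the directory
// and file names here are assumptions and must exist at build time.
//
//go:embed templates/*.service templates/*.timer
var systemdTemplates embed.FS

func main() {
	unit, err := systemdTemplates.ReadFile("templates/dbbackup.service")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read embedded unit:", err)
		os.Exit(1)
	}
	// An installer would render this template with the chosen schedule
	// and paths, write it under /etc/systemd/system/, then run
	// `systemctl daemon-reload`.
	fmt.Print(string(unit))
}
```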
120ee33e3b build: v3.41.0 binaries with TUI cancellation fix
All checks were successful
CI/CD / Test (push) Successful in 2m44s
CI/CD / Lint (push) Successful in 2m51s
CI/CD / Build (amd64, darwin) (push) Successful in 1m58s
CI/CD / Build (amd64, linux) (push) Successful in 1m59s
CI/CD / Build (arm64, darwin) (push) Successful in 2m1s
CI/CD / Build (arm64, linux) (push) Successful in 1m59s
2026-01-07 09:55:08 +01:00
9f375621d1 fix(tui): enable Ctrl+C/ESC to cancel running backup/restore operations
PROBLEM: Users could not interrupt backup or restore operations through
the TUI interface. Pressing Ctrl+C or ESC did nothing during execution.

ROOT CAUSE:
- BackupExecutionModel ignored ALL key presses while running (only handled when done)
- RestoreExecutionModel returned tea.Quit but didn't cancel the context
- The operation goroutine kept running in the background with its own context

FIX:
- Added cancel context.CancelFunc to both execution models
- Create child context with WithCancel in New*Execution constructors
- Handle ctrl+c and esc during execution to call cancel()
- Show 'Cancelling...' status while waiting for graceful shutdown
- Show cancel hint in View: 'Press Ctrl+C or ESC to cancel'

The fix works because:
- exec.CommandContext(ctx) will SIGKILL the subprocess when ctx is cancelled
- pg_dump, pg_restore, psql, mysql all get terminated properly
- User sees immediate feedback that cancellation is in progress
2026-01-07 09:53:47 +01:00
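The cancellation mechanism described in the commit above relies on standard library behavior: exec.CommandContext kills the child process once its context is cancelled. A standalone sketch, with a `sleep` subprocess standing in for pg_dump/pg_restore:

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Child context the TUI execution model would keep a CancelFunc for.
	ctx, cancel := context.WithCancel(context.Background())

	// Simulate the user pressing Ctrl+C / ESC after two seconds.
	go func() {
		time.Sleep(2 * time.Second)
		cancel()
	}()

	// exec.CommandContext terminates the process when ctx is cancelled,
	// which is what stops pg_dump, pg_restore, psql, or mysql.
	cmd := exec.CommandContext(ctx, "sleep", "30")
	start := time.Now()
	err := cmd.Run()
	fmt.Printf("finished after %s: %v (ctx: %v)\n",
		time.Since(start).Round(time.Millisecond), err, ctx.Err())
}
```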
9ad925191e build: v3.41.0 binaries with P0 security fixes
Some checks failed
CI/CD / Test (push) Successful in 2m45s
CI/CD / Lint (push) Successful in 2m51s
CI/CD / Build (amd64, darwin) (push) Successful in 1m59s
CI/CD / Build (arm64, darwin) (push) Has been cancelled
CI/CD / Build (arm64, linux) (push) Has been cancelled
CI/CD / Build (amd64, linux) (push) Has been cancelled
2026-01-07 09:46:49 +01:00
9d8a6e763e security: P0 fixes - SQL injection prevention + data race fix
- Add identifier validation for database names in PostgreSQL and MySQL
  - validateIdentifier() rejects names with invalid characters
  - quoteIdentifier() safely quotes identifiers with proper escaping
  - Max length: 63 chars (PostgreSQL), 64 chars (MySQL)
  - Only allows alphanumeric + underscores, must start with letter/underscore

- Fix data race in notification manager
  - Multiple goroutines were appending to shared error slice
  - Added errMu sync.Mutex to protect concurrent error collection

- Security improvements prevent:
  - SQL injection via malicious database names
  - CREATE DATABASE `foo`; DROP DATABASE production; --`
  - Race conditions causing lost or corrupted error data
2026-01-07 09:45:13 +01:00
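A sketch of the kind of identifier validation and quoting the commit above describes. The regex, length limits, and function names follow the commit message; the exact implementation in the repository may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// identRe allows alphanumerics and underscores, starting with a letter
// or underscore, per the rules described in the commit message.
var identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

func validateIdentifier(name string, maxLen int) error {
	if len(name) == 0 || len(name) > maxLen {
		return fmt.Errorf("identifier %q: length must be 1..%d", name, maxLen)
	}
	if !identRe.MatchString(name) {
		return fmt.Errorf("identifier %q contains invalid characters", name)
	}
	return nil
}

// quoteIdentifier double-quotes a PostgreSQL identifier, escaping any
// embedded quotes (MySQL would use backticks instead).
func quoteIdentifier(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

func main() {
	names := []string{"myapp_prod", "foo`; DROP DATABASE production; --"}
	for _, db := range names {
		// 63-char limit for PostgreSQL, 64 for MySQL.
		if err := validateIdentifier(db, 63); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("CREATE DATABASE", quoteIdentifier(db))
	}
}
```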
63b16eee8b build: v3.41.0 binaries with DB+Go specialist fixes
All checks were successful
CI/CD / Test (push) Successful in 2m41s
CI/CD / Lint (push) Successful in 2m50s
CI/CD / Build (amd64, darwin) (push) Successful in 1m58s
CI/CD / Build (amd64, linux) (push) Successful in 1m58s
CI/CD / Build (arm64, darwin) (push) Successful in 1m56s
CI/CD / Build (arm64, linux) (push) Successful in 1m58s
2026-01-07 08:59:53 +01:00
91228552fb fix(backup/restore): implement DB+Go specialist recommendations
P0: Add ON_ERROR_STOP=1 to psql (fail fast, not 2.6M errors)
P1: Fix pipe deadlock in streaming compression (goroutine+context)
P1: Handle SIGPIPE (exit 141) - report compressor as root cause
P2: Validate .dump files with pg_restore --list before restore
P2: Add fsync after streaming compression for durability

Fixes potential hung backups and improves error diagnostics.
2026-01-07 08:58:00 +01:00
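Two of the fixes above are easy to illustrate: invoking psql with ON_ERROR_STOP=1 so the restore fails on the first error, and treating exit code 141 (128 + SIGPIPE) as a downstream compressor failure rather than a database-tool bug. The psql flags are real options; the database name and file are placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

func main() {
	// P0: fail fast on the first SQL error instead of accumulating
	// millions of them.
	cmd := exec.Command("psql",
		"-v", "ON_ERROR_STOP=1",
		"--dbname", "mydb",
		"--file", "dump.sql",
	)
	err := cmd.Run()

	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// 141 = 128 + SIGPIPE(13): the reader side of the pipe died,
		// so report the compressor as the root cause.
		if exitErr.ExitCode() == 141 {
			fmt.Println("got SIGPIPE (exit 141): check the compressor, not the database tool")
			return
		}
		fmt.Println("command failed with exit code", exitErr.ExitCode())
		return
	}
	if err != nil {
		fmt.Println("could not start command:", err)
	}
}
```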
9ee55309bd docs: update CHANGELOG for v3.41.0 pre-restore validation
Some checks failed
CI/CD / Test (push) Successful in 2m41s
CI/CD / Lint (push) Successful in 2m50s
CI/CD / Build (amd64, darwin) (push) Successful in 1m59s
CI/CD / Build (amd64, linux) (push) Successful in 1m57s
CI/CD / Build (arm64, darwin) (push) Successful in 1m58s
CI/CD / Build (arm64, linux) (push) Has been cancelled
2026-01-07 08:48:38 +01:00
0baf741c0b build: v3.40.0 binaries for all platforms
Some checks failed
CI/CD / Test (push) Successful in 2m44s
CI/CD / Lint (push) Successful in 2m47s
CI/CD / Build (amd64, darwin) (push) Successful in 1m58s
CI/CD / Build (amd64, linux) (push) Successful in 1m57s
CI/CD / Build (arm64, linux) (push) Has been cancelled
CI/CD / Build (arm64, darwin) (push) Has been cancelled
2026-01-07 08:36:26 +01:00
faace7271c fix(restore): add pre-validation for truncated SQL dumps
Some checks failed
CI/CD / Test (push) Successful in 2m42s
CI/CD / Build (amd64, darwin) (push) Has been cancelled
CI/CD / Build (amd64, linux) (push) Has been cancelled
CI/CD / Build (arm64, darwin) (push) Has been cancelled
CI/CD / Build (arm64, linux) (push) Has been cancelled
CI/CD / Lint (push) Has been cancelled
- Validate SQL dump files BEFORE attempting restore
- Detect unterminated COPY blocks that cause 'syntax error' failures
- Cluster restore now pre-validates ALL dumps upfront (fail-fast)
- Saves hours of wasted restore time on corrupted backups

The truncated resydb.sql.gz was causing 49min restore attempts
that failed with 2.6M errors. Now fails immediately with clear
error message showing which table's COPY block was truncated.
2026-01-07 08:34:10 +01:00
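A minimal sketch of the pre-validation idea from the commit above: scan a gzipped SQL dump for a `COPY ... FROM stdin;` block that never reaches its `\.` terminator, which is the signature of a truncated dump. The real validator likely tracks more state (byte limits, multiple findings); this is illustrative only.

```go
package main

import (
	"bufio"
	"compress/gzip"
	"fmt"
	"os"
	"strings"
)

// checkCopyBlocks reports an error if the gzipped SQL dump at path
// contains a COPY block that is never terminated by "\.".
func checkCopyBlocks(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		return fmt.Errorf("gzip corrupt: %w", err)
	}
	defer gz.Close()

	sc := bufio.NewScanner(gz)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long data lines

	inCopy := false
	table := ""
	for sc.Scan() {
		line := sc.Text()
		switch {
		case !inCopy && strings.HasPrefix(line, "COPY ") && strings.HasSuffix(line, "FROM stdin;"):
			inCopy = true
			table = strings.Fields(line)[1]
		case inCopy && line == `\.`:
			inCopy = false
		}
	}
	if err := sc.Err(); err != nil {
		return fmt.Errorf("read error (dump likely truncated): %w", err)
	}
	if inCopy {
		return fmt.Errorf("COPY block for table %s is not terminated: dump is truncated", table)
	}
	return nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checkdump <dump.sql.gz>")
		os.Exit(2)
	}
	if err := checkCopyBlocks(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "validation failed:", err)
		os.Exit(1)
	}
	fmt.Println("dump looks complete")
}
```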
c3ade7a693 Include pre-built binaries for distribution
All checks were successful
CI/CD / Test (push) Successful in 2m36s
CI/CD / Lint (push) Successful in 2m45s
CI/CD / Build (amd64, darwin) (push) Successful in 1m53s
CI/CD / Build (amd64, linux) (push) Successful in 1m56s
CI/CD / Build (arm64, darwin) (push) Successful in 1m54s
CI/CD / Build (arm64, linux) (push) Successful in 1m54s
2026-01-06 15:32:47 +01:00
52d475506c fix(backup): dynamic timeout for large database backups
All checks were successful
CI/CD / Test (push) Successful in 1m11s
CI/CD / Lint (push) Successful in 1m20s
CI/CD / Build (amd64, darwin) (push) Successful in 29s
CI/CD / Build (amd64, linux) (push) Successful in 28s
CI/CD / Build (arm64, darwin) (push) Successful in 29s
CI/CD / Build (arm64, linux) (push) Successful in 29s
- 2-hour timeout was causing truncated backups for databases > 40GB
- Now scales: 2 hours base + 1 hour per 20GB
- 69GB database now gets ~5.5 hour timeout
- Fixed streaming compression error handling order

Fixes truncated resydb.sql.gz in cluster backups
2026-01-06 15:09:29 +01:00
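The timeout rule above is simple enough to show as a worked example: 2 hours base plus one hour per 20 GB, so a 69 GB database comes out at roughly 5.5 hours. Whether the actual code scales linearly or rounds up per started 20 GB chunk isn't visible here; this sketch assumes linear scaling.

```go
package main

import (
	"fmt"
	"time"
)

// backupTimeout scales with database size: 2 hours base plus one hour
// per 20 GB, following the rule described in the commit message.
func backupTimeout(sizeBytes int64) time.Duration {
	const base = 2 * time.Hour
	const perChunk = time.Hour
	const chunk = 20 * 1024 * 1024 * 1024 // 20 GB

	extra := time.Duration(float64(sizeBytes) / float64(chunk) * float64(perChunk))
	return base + extra
}

func main() {
	// 69 GB -> 2h + 69/20 h ≈ 5.5 hours.
	size := int64(69) * 1024 * 1024 * 1024
	fmt.Println(backupTimeout(size).Round(time.Minute))
}
```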
938ee61686 docs: update README with v3.40.0 TUI features (Diagnose, WorkDir)
All checks were successful
CI/CD / Test (push) Successful in 1m12s
CI/CD / Lint (push) Successful in 1m19s
CI/CD / Build (amd64, darwin) (push) Successful in 29s
CI/CD / Build (amd64, linux) (push) Successful in 29s
CI/CD / Build (arm64, darwin) (push) Successful in 29s
CI/CD / Build (arm64, linux) (push) Successful in 29s
2026-01-06 14:58:10 +01:00
85b61048c0 fix(ci): simplify CI - use github.token via env, remove mirror until working
All checks were successful
CI/CD / Test (push) Successful in 1m12s
CI/CD / Lint (push) Successful in 1m20s
CI/CD / Build (amd64, darwin) (push) Successful in 30s
CI/CD / Build (amd64, linux) (push) Successful in 29s
CI/CD / Build (arm64, darwin) (push) Successful in 29s
CI/CD / Build (arm64, linux) (push) Successful in 29s
2026-01-06 14:13:54 +01:00
30954cb7c2 fix(ci): use GITHUB_TOKEN for repo authentication
Some checks failed
CI/CD / Test (push) Failing after 4s
CI/CD / Generate SBOM (push) Has been skipped
CI/CD / Lint (push) Failing after 4s
CI/CD / Build (darwin-amd64) (push) Has been skipped
CI/CD / Build (linux-amd64) (push) Has been skipped
CI/CD / Build (darwin-arm64) (push) Has been skipped
CI/CD / Build (linux-arm64) (push) Has been skipped
CI/CD / Release (push) Has been skipped
CI/CD / Build & Push Docker Image (push) Has been skipped
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-06 14:06:11 +01:00
ddf46f190b fix(ci): use public git.uuxo.net URL instead of internal gitea:3000
Some checks failed
CI/CD / Test (push) Failing after 4s
CI/CD / Generate SBOM (push) Has been skipped
CI/CD / Lint (push) Failing after 4s
CI/CD / Build (darwin-amd64) (push) Has been skipped
CI/CD / Build (linux-amd64) (push) Has been skipped
CI/CD / Build (darwin-arm64) (push) Has been skipped
CI/CD / Build (linux-arm64) (push) Has been skipped
CI/CD / Release (push) Has been skipped
CI/CD / Build & Push Docker Image (push) Has been skipped
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-06 14:04:57 +01:00
4c6d44725e fix(ci): use manual git fetch with GITHUB_SERVER_URL/SHA (no Node.js needed)
Some checks failed
CI/CD / Test (push) Failing after 3s
CI/CD / Generate SBOM (push) Has been skipped
CI/CD / Lint (push) Failing after 3s
CI/CD / Build (darwin-amd64) (push) Has been skipped
CI/CD / Build (linux-amd64) (push) Has been skipped
CI/CD / Build (darwin-arm64) (push) Has been skipped
CI/CD / Build (linux-arm64) (push) Has been skipped
CI/CD / Release (push) Has been skipped
CI/CD / Build & Push Docker Image (push) Has been skipped
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-06 14:03:09 +01:00
be69c0e00f fix(ci): use actions/checkout@v4 instead of manual git clone
Some checks failed
CI/CD / Test (push) Failing after 9s
CI/CD / Generate SBOM (push) Has been skipped
CI/CD / Lint (push) Failing after 2s
CI/CD / Build (darwin-amd64) (push) Has been skipped
CI/CD / Build (linux-amd64) (push) Has been skipped
CI/CD / Build (darwin-arm64) (push) Has been skipped
CI/CD / Build (linux-arm64) (push) Has been skipped
CI/CD / Release (push) Has been skipped
CI/CD / Build & Push Docker Image (push) Has been skipped
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-06 14:01:15 +01:00
ee1f58efdb chore: ignore bin/ directory to prevent repository bloat
Some checks failed
CI/CD / Test (push) Failing after 4s
CI/CD / Generate SBOM (push) Has been skipped
CI/CD / Lint (push) Failing after 4s
CI/CD / Build (darwin-amd64) (push) Has been skipped
CI/CD / Build (linux-amd64) (push) Has been skipped
CI/CD / Build (darwin-arm64) (push) Has been skipped
CI/CD / Build (linux-arm64) (push) Has been skipped
CI/CD / Release (push) Has been skipped
CI/CD / Build & Push Docker Image (push) Has been skipped
CI/CD / Mirror to GitHub (push) Has been skipped
2026-01-06 13:39:47 +01:00
5959d7313d fix(diagnose): add debug logging for WorkDir usage 2026-01-06 12:34:00 +01:00
b856d8b3f8 feat(tui): add Work Directory setting for large archive operations
- Added WorkDir to Config for custom temp directory
- TUI Settings: new 'Work Directory' option to set alternative temp location
- Restore Preview: press 'w' to toggle work directory (uses backup dir as default)
- Diagnose View: now uses configured WorkDir for cluster extraction
- Config persistence: WorkDir saved to .dbbackup.conf

This fixes diagnosis/restore failures when /tmp is too small for large archives.
Use cases: servers with limited /tmp, 70GB+ archives needing 280GB+ extraction space.
2026-01-06 11:11:22 +01:00
886aa4810a fix(diagnose): improve cluster archive diagnosis error handling
- Better error messages when tar extraction fails
- Detect truncated/corrupted archives without full extraction
- Show archive contents even when extraction fails
- Provide helpful hints for disk space and corruption issues
- Exit status 2 from tar now shows detailed diagnostics
2026-01-06 10:42:38 +01:00
14bd1f848c feat(tui): add Diagnose Backup File option to interactive menu
- Added 'Diagnose Backup File' as menu option in TUI
- Archive browser now supports 'diagnose' mode
- Allows users to run deep diagnosis on backups before restore
- Helps identify truncation/corruption issues in large backups
2026-01-06 09:44:22 +01:00
4c171c0e44 v3.40.0: Restore diagnostics and error reporting
Features:
- restore diagnose command for backup file analysis
- Deep COPY block verification for truncated dump detection
- PGDMP signature and gzip integrity validation
- Detailed error reports with --save-debug-log flag
- Ring buffer stderr capture (prevents OOM on 2M+ errors)
- Error classification with actionable recommendations

TUI Enhancements:
- Automatic dump validity safety check before restore
- Press 'd' in archive browser to diagnose backups
- Press 'd' in restore preview for debug log toggle
- Debug logs saved to /tmp on failure when enabled

Documentation:
- Updated README with diagnose command and examples
- Updated CHANGELOG with full feature list
- Updated restore preview screenshots
2026-01-05 15:17:54 +01:00
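The ring-buffer stderr capture mentioned above can be illustrated with a small fixed-capacity line buffer that keeps only the most recent output, so a restore producing millions of errors cannot exhaust memory. The capacity and API here are assumptions; per the CHANGELOG excerpt further down, the real collector also caps total bytes (1 MB / 10K lines).

```go
package main

import "fmt"

// lineRing keeps only the most recent N lines.
type lineRing struct {
	lines []string
	next  int
	full  bool
}

func newLineRing(capacity int) *lineRing {
	return &lineRing{lines: make([]string, capacity)}
}

// Add overwrites the oldest entry once the buffer is full.
func (r *lineRing) Add(line string) {
	r.lines[r.next] = line
	r.next = (r.next + 1) % len(r.lines)
	if r.next == 0 {
		r.full = true
	}
}

// Last returns the buffered lines in chronological order.
func (r *lineRing) Last() []string {
	if !r.full {
		return append([]string(nil), r.lines[:r.next]...)
	}
	return append(append([]string(nil), r.lines[r.next:]...), r.lines[:r.next]...)
}

func main() {
	ring := newLineRing(3)
	for i := 1; i <= 5; i++ {
		ring.Add(fmt.Sprintf("ERROR: syntax error near line %d", i))
	}
	fmt.Println(ring.Last()) // only the last 3 lines survive
}
```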
e7f0a9f5eb docs: update documentation to match current CLI syntax
- AZURE.md, GCS.md: Replace 'backup postgres' with 'backup single'
- AZURE.md, GCS.md: Replace 'restore postgres --source' with proper syntax
- AZURE.md, GCS.md: Remove non-existent --output, --source flags
- VEEAM_ALTERNATIVE.md: Fix command examples and broken link
- CONTRIBUTING.md: Remove RELEASE_NOTES step from release process
- CHANGELOG.md: Remove reference to deleted file
- Remove RELEASE_NOTES_v3.1.md (content is in CHANGELOG.md)
2026-01-05 12:41:18 +01:00
2e942f04a4 docs: remove undocumented --notify flag from README
The --notify CLI flag was documented but not implemented.
Notifications are configured via environment variables only.
2026-01-05 12:35:33 +01:00
f29e6fe102 docs: fix MYSQL_PITR.md - remove non-existent --pitr flag
Regular backups already capture binlog position automatically when
PITR is enabled at the MySQL level. No special flag needed.
2025-12-15 15:12:50 +01:00
51fc570fc7 chore: bump version to 3.2.0 across all files 2025-12-15 15:09:34 +01:00
46 changed files with 5518 additions and 748 deletions


@@ -1,4 +1,6 @@
# CI/CD Pipeline for dbbackup
# Main repo: Gitea (git.uuxo.net)
# Mirror: GitHub (github.com/PlusOne/dbbackup)
name: CI/CD
on:
@@ -8,9 +10,6 @@ on:
pull_request:
branches: [main, master]
env:
GITEA_URL: https://git.uuxo.net
jobs:
test:
name: Test
@@ -18,26 +17,25 @@ jobs:
container:
image: golang:1.24-bookworm
steps:
- name: Install git
run: apt-get update && apt-get install -y git ca-certificates
- name: Checkout code
env:
TOKEN: ${{ github.token }}
run: |
apt-get update && apt-get install -y -qq git ca-certificates
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git clone --depth 1 --branch ${GITHUB_REF_NAME} ${{ env.GITEA_URL }}/${GITHUB_REPOSITORY}.git .
git init
git remote add origin "https://${TOKEN}@git.uuxo.net/${GITHUB_REPOSITORY}.git"
git fetch --depth=1 origin "${GITHUB_SHA}"
git checkout FETCH_HEAD
- name: Download dependencies
run: go mod download
- name: Run tests with race detection
env:
GOMAXPROCS: 8
run: go test -race -coverprofile=coverage.out -covermode=atomic ./...
- name: Run tests
run: go test -race -coverprofile=coverage.out ./...
- name: Generate coverage report
run: |
go tool cover -func=coverage.out
go tool cover -html=coverage.out -o coverage.html
- name: Coverage summary
run: go tool cover -func=coverage.out | tail -1
lint:
name: Lint
@@ -45,168 +43,140 @@ jobs:
container:
image: golang:1.24-bookworm
steps:
- name: Install git
run: apt-get update && apt-get install -y git ca-certificates
- name: Checkout code
run: |
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git clone --depth 1 --branch ${GITHUB_REF_NAME} ${{ env.GITEA_URL }}/${GITHUB_REPOSITORY}.git .
- name: Install golangci-lint
run: go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.62.2
- name: Run golangci-lint
env:
GOMAXPROCS: 8
run: golangci-lint run --timeout=5m ./...
TOKEN: ${{ github.token }}
run: |
apt-get update && apt-get install -y -qq git ca-certificates
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git init
git remote add origin "https://${TOKEN}@git.uuxo.net/${GITHUB_REPOSITORY}.git"
git fetch --depth=1 origin "${GITHUB_SHA}"
git checkout FETCH_HEAD
build:
name: Build (${{ matrix.goos }}-${{ matrix.goarch }})
- name: Install and run golangci-lint
run: |
go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.62.2
golangci-lint run --timeout=5m ./...
build-and-release:
name: Build & Release
runs-on: ubuntu-latest
needs: [test, lint]
if: startsWith(github.ref, 'refs/tags/')
container:
image: golang:1.24-bookworm
strategy:
max-parallel: 8
matrix:
goos: [linux, darwin]
goarch: [amd64, arm64]
steps:
- name: Install git
run: apt-get update && apt-get install -y git ca-certificates
- name: Checkout code
run: |
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git clone --depth 1 --branch ${GITHUB_REF_NAME} ${{ env.GITEA_URL }}/${GITHUB_REPOSITORY}.git .
- name: Build binary
env:
GOOS: ${{ matrix.goos }}
GOARCH: ${{ matrix.goarch }}
CGO_ENABLED: 0
GOMAXPROCS: 8
run: |
BINARY_NAME=dbbackup
go build -ldflags="-s -w" -o dist/${BINARY_NAME}-${{ matrix.goos }}-${{ matrix.goarch }} .
sbom:
name: Generate SBOM
runs-on: ubuntu-latest
needs: [test]
container:
image: golang:1.24-bookworm
steps:
- name: Install git
run: apt-get update && apt-get install -y git ca-certificates
- name: Checkout code
TOKEN: ${{ github.token }}
run: |
apt-get update && apt-get install -y -qq git ca-certificates curl jq
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git clone --depth 1 --branch ${GITHUB_REF_NAME} ${{ env.GITEA_URL }}/${GITHUB_REPOSITORY}.git .
git init
git remote add origin "https://${TOKEN}@git.uuxo.net/${GITHUB_REPOSITORY}.git"
git fetch --depth=1 origin "${GITHUB_SHA}"
git checkout FETCH_HEAD
- name: Install Syft
run: curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin
- name: Generate SBOM
- name: Build all platforms
run: |
syft . -o spdx-json=sbom-spdx.json
syft . -o cyclonedx-json=sbom-cyclonedx.json
mkdir -p release
# Linux amd64
echo "Building linux/amd64..."
CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-s -w" -o release/dbbackup-linux-amd64 .
# Linux arm64
echo "Building linux/arm64..."
CGO_ENABLED=0 GOOS=linux GOARCH=arm64 go build -ldflags="-s -w" -o release/dbbackup-linux-arm64 .
# Darwin amd64
echo "Building darwin/amd64..."
CGO_ENABLED=0 GOOS=darwin GOARCH=amd64 go build -ldflags="-s -w" -o release/dbbackup-darwin-amd64 .
# Darwin arm64
echo "Building darwin/arm64..."
CGO_ENABLED=0 GOOS=darwin GOARCH=arm64 go build -ldflags="-s -w" -o release/dbbackup-darwin-arm64 .
# FreeBSD amd64
echo "Building freebsd/amd64..."
CGO_ENABLED=0 GOOS=freebsd GOARCH=amd64 go build -ldflags="-s -w" -o release/dbbackup-freebsd-amd64 .
echo "All builds complete:"
ls -lh release/
release:
name: Release
runs-on: ubuntu-latest
needs: [test, lint, build]
if: startsWith(github.ref, 'refs/tags/v')
container:
image: golang:1.24-bookworm
steps:
- name: Install tools
run: |
apt-get update && apt-get install -y git ca-certificates
curl -sSfL https://github.com/goreleaser/goreleaser/releases/download/v2.4.8/goreleaser_Linux_x86_64.tar.gz | tar xz -C /usr/local/bin goreleaser
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin
- name: Checkout code
run: |
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git clone --branch ${GITHUB_REF_NAME} ${{ env.GITEA_URL }}/${GITHUB_REPOSITORY}.git .
git fetch --tags
- name: Run goreleaser
- name: Create Gitea Release
env:
GITEA_TOKEN: ${{ secrets.GITEA_TOKEN }}
run: goreleaser release --clean
docker:
name: Build & Push Docker Image
runs-on: ubuntu-latest
needs: [test, lint]
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/'))
container:
image: docker:24-cli
options: --privileged
services:
docker:
image: docker:24-dind
options: --privileged
steps:
- name: Install dependencies
run: apk add --no-cache git curl
- name: Checkout code
GITEA_TOKEN: ${{ github.token }}
run: |
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git clone --depth 1 --branch ${GITHUB_REF_NAME} ${{ env.GITEA_URL }}/${GITHUB_REPOSITORY}.git .
- name: Set up Docker Buildx
run: |
docker buildx create --use --name builder --driver docker-container
docker buildx inspect --bootstrap
- name: Login to Gitea Registry
if: ${{ secrets.REGISTRY_USER != '' && secrets.REGISTRY_TOKEN != '' }}
run: |
echo "${{ secrets.REGISTRY_TOKEN }}" | docker login git.uuxo.net -u "${{ secrets.REGISTRY_USER }}" --password-stdin
- name: Build and push
if: ${{ secrets.REGISTRY_USER != '' && secrets.REGISTRY_TOKEN != '' }}
run: |
# Determine tags
if [[ "${GITHUB_REF}" == refs/tags/* ]]; then
VERSION=${GITHUB_REF#refs/tags/}
TAGS="-t git.uuxo.net/uuxo/dbbackup:${VERSION} -t git.uuxo.net/uuxo/dbbackup:latest"
else
TAGS="-t git.uuxo.net/uuxo/dbbackup:${GITHUB_SHA::8} -t git.uuxo.net/uuxo/dbbackup:main"
TAG=${GITHUB_REF#refs/tags/}
echo "Creating Gitea release for ${TAG}..."
echo "Debug: GITHUB_REPOSITORY=${GITHUB_REPOSITORY}"
echo "Debug: TAG=${TAG}"
# Simple body without special characters
BODY="Download binaries for your platform"
# Create release via API with simple inline JSON
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST \
-H "Authorization: token ${GITEA_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"tag_name":"'"${TAG}"'","name":"'"${TAG}"'","body":"'"${BODY}"'","draft":false,"prerelease":false}' \
"https://git.uuxo.net/api/v1/repos/${GITHUB_REPOSITORY}/releases")
HTTP_CODE=$(echo "$RESPONSE" | tail -1)
BODY_RESPONSE=$(echo "$RESPONSE" | sed '$d')
echo "HTTP Code: $HTTP_CODE"
echo "Response: $BODY_RESPONSE"
RELEASE_ID=$(echo "$BODY_RESPONSE" | jq -r '.id')
if [ "$RELEASE_ID" = "null" ] || [ -z "$RELEASE_ID" ]; then
echo "Failed to create release"
exit 1
fi
docker buildx build \
--platform linux/amd64,linux/arm64 \
--push \
${TAGS} \
.
# Test 1765481480
echo "Created release ID: $RELEASE_ID"
# Upload each binary
echo "Files to upload:"
ls -la release/
for file in release/dbbackup-*; do
FILENAME=$(basename "$file")
echo "Uploading $FILENAME..."
UPLOAD_RESPONSE=$(curl -s -X POST \
-H "Authorization: token ${GITEA_TOKEN}" \
-F "attachment=@${file}" \
"https://git.uuxo.net/api/v1/repos/${GITHUB_REPOSITORY}/releases/${RELEASE_ID}/assets?name=${FILENAME}")
echo "Upload response: $UPLOAD_RESPONSE"
done
echo "Gitea release complete!"
mirror:
# Mirror to GitHub (optional - runs if GITHUB_MIRROR_TOKEN secret is set)
mirror-to-github:
name: Mirror to GitHub
runs-on: ubuntu-latest
needs: [test, lint]
if: github.event_name == 'push' && github.ref == 'refs/heads/main' && vars.MIRROR_ENABLED != 'false'
container:
image: debian:bookworm-slim
volumes:
- /root/.ssh:/root/.ssh:ro
needs: [build-and-release]
if: startsWith(github.ref, 'refs/tags/') && vars.GITHUB_MIRROR_TOKEN != ''
continue-on-error: true
steps:
- name: Install git
run: apt-get update && apt-get install -y --no-install-recommends git openssh-client ca-certificates && rm -rf /var/lib/apt/lists/*
- name: Clone and mirror
- name: Mirror to GitHub
env:
GIT_SSH_COMMAND: "ssh -i /root/.ssh/id_ed25519 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no"
GITHUB_MIRROR_TOKEN: ${{ vars.GITHUB_MIRROR_TOKEN }}
run: |
git config --global --add safe.directory "$GITHUB_WORKSPACE"
git clone --mirror ${{ env.GITEA_URL }}/${GITHUB_REPOSITORY}.git repo.git
TAG=${GITHUB_REF#refs/tags/}
echo "Mirroring ${TAG} to GitHub..."
# Clone from Gitea
git clone --bare "https://git.uuxo.net/${GITHUB_REPOSITORY}.git" repo.git
cd repo.git
git remote add github git@github.com:PlusOne/dbbackup.git
git push --mirror github || git push --force --all github && git push --force --tags github
# Push to GitHub
git push --mirror "https://${GITHUB_MIRROR_TOKEN}@github.com/PlusOne/dbbackup.git" || echo "Mirror push failed (non-critical)"
echo "GitHub mirror complete!"

.gitignore

@@ -9,10 +9,12 @@ logs/
*.trace
*.err
# Ignore built binaries in root (keep bin/ directory for releases)
# Ignore built binaries (built fresh via build_all.sh on release)
/dbbackup
/dbbackup_*
!dbbackup.png
bin/dbbackup_*
bin/*.exe
# Ignore development artifacts
*.swp

AZURE.md

@@ -28,21 +28,16 @@ This guide covers using **Azure Blob Storage** with `dbbackup` for secure, scala
```bash
# Backup PostgreSQL to Azure
dbbackup backup postgres \
--host localhost \
--database mydb \
--output backup.sql \
--cloud "azure://mycontainer/backups/db.sql?account=myaccount&key=ACCOUNT_KEY"
dbbackup backup single mydb \
--cloud "azure://mycontainer/backups/?account=myaccount&key=ACCOUNT_KEY"
```
### 3. Restore from Azure
```bash
# Restore from Azure backup
dbbackup restore postgres \
--source "azure://mycontainer/backups/db.sql?account=myaccount&key=ACCOUNT_KEY" \
--host localhost \
--database mydb_restored
# Download backup from Azure and restore
dbbackup cloud download "azure://mycontainer/backups/mydb.dump.gz?account=myaccount&key=ACCOUNT_KEY" ./mydb.dump.gz
dbbackup restore single ./mydb.dump.gz --target mydb_restored --confirm
```
## URI Syntax
@@ -99,7 +94,7 @@ export AZURE_STORAGE_ACCOUNT="myaccount"
export AZURE_STORAGE_KEY="YOUR_ACCOUNT_KEY"
# Use simplified URI (credentials from environment)
dbbackup backup postgres --cloud "azure://container/path/backup.sql"
dbbackup backup single mydb --cloud "azure://container/path/"
```
### Method 3: Connection String
@@ -109,7 +104,7 @@ Use Azure connection string:
```bash
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=YOUR_KEY;EndpointSuffix=core.windows.net"
dbbackup backup postgres --cloud "azure://container/path/backup.sql"
dbbackup backup single mydb --cloud "azure://container/path/"
```
### Getting Your Account Key
@@ -196,11 +191,8 @@ Configure automatic tier transitions:
```bash
# PostgreSQL backup with automatic Azure upload
dbbackup backup postgres \
--host localhost \
--database production_db \
--output /backups/db.sql \
--cloud "azure://prod-backups/postgres/$(date +%Y%m%d_%H%M%S).sql?account=myaccount&key=KEY" \
dbbackup backup single production_db \
--cloud "azure://prod-backups/postgres/?account=myaccount&key=KEY" \
--compression 6
```
@@ -208,10 +200,7 @@ dbbackup backup postgres \
```bash
# Backup entire PostgreSQL cluster to Azure
dbbackup backup postgres \
--host localhost \
--all-databases \
--output-dir /backups \
dbbackup backup cluster \
--cloud "azure://prod-backups/postgres/cluster/?account=myaccount&key=KEY"
```
@@ -257,13 +246,9 @@ dbbackup cleanup "azure://prod-backups/postgres/?account=myaccount&key=KEY" --ke
#!/bin/bash
# Azure backup script (run via cron)
DATE=$(date +%Y%m%d_%H%M%S)
AZURE_URI="azure://prod-backups/postgres/${DATE}.sql?account=myaccount&key=${AZURE_STORAGE_KEY}"
AZURE_URI="azure://prod-backups/postgres/?account=myaccount&key=${AZURE_STORAGE_KEY}"
dbbackup backup postgres \
--host localhost \
--database production_db \
--output /tmp/backup.sql \
dbbackup backup single production_db \
--cloud "${AZURE_URI}" \
--compression 9
@@ -289,35 +274,25 @@ For large files (>256MB), dbbackup automatically uses Azure Block Blob staging:
```bash
# Large database backup (automatically uses block blob)
dbbackup backup postgres \
--host localhost \
--database huge_db \
--output /backups/huge.sql \
--cloud "azure://backups/huge.sql?account=myaccount&key=KEY"
dbbackup backup single huge_db \
--cloud "azure://backups/?account=myaccount&key=KEY"
```
### Progress Tracking
```bash
# Backup with progress display
dbbackup backup postgres \
--host localhost \
--database mydb \
--output backup.sql \
--cloud "azure://backups/backup.sql?account=myaccount&key=KEY" \
--progress
dbbackup backup single mydb \
--cloud "azure://backups/?account=myaccount&key=KEY"
```
### Concurrent Operations
```bash
# Backup multiple databases in parallel
dbbackup backup postgres \
--host localhost \
--all-databases \
--output-dir /backups \
# Backup cluster with parallel jobs
dbbackup backup cluster \
--cloud "azure://backups/cluster/?account=myaccount&key=KEY" \
--parallelism 4
--jobs 4
```
### Custom Metadata
@@ -365,11 +340,8 @@ Endpoint: http://localhost:10000/devstoreaccount1
```bash
# Backup to Azurite
dbbackup backup postgres \
--host localhost \
--database testdb \
--output test.sql \
--cloud "azure://test-backups/test.sql?endpoint=http://localhost:10000&account=devstoreaccount1&key=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
dbbackup backup single testdb \
--cloud "azure://test-backups/?endpoint=http://localhost:10000&account=devstoreaccount1&key=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw=="
```
### Run Integration Tests
@@ -492,8 +464,8 @@ Tests include:
Enable debug mode:
```bash
dbbackup backup postgres \
--cloud "azure://container/backup.sql?account=myaccount&key=KEY" \
dbbackup backup single mydb \
--cloud "azure://container/?account=myaccount&key=KEY" \
--debug
```

CHANGELOG.md

@@ -5,6 +5,188 @@ All notable changes to dbbackup will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [3.42.0] - 2026-01-07 "The Operator"
### Added - 🐧 Systemd Integration & Prometheus Metrics
**Embedded Systemd Installer:**
- New `dbbackup install` command installs as systemd service/timer
- Supports single-database (`--backup-type single`) and cluster (`--backup-type cluster`) modes
- Automatic `dbbackup` user/group creation with proper permissions
- Hardened service units with security features (NoNewPrivileges, ProtectSystem, CapabilityBoundingSet)
- Templated timer units with configurable schedules (daily, weekly, or custom OnCalendar)
- Built-in dry-run mode (`--dry-run`) to preview installation
- `dbbackup install --status` shows current installation state
- `dbbackup uninstall` cleanly removes all systemd units and optionally configuration
**Prometheus Metrics Support:**
- New `dbbackup metrics export` command writes textfile collector format
- New `dbbackup metrics serve` command runs HTTP exporter on port 9399
- Metrics: `dbbackup_last_success_timestamp`, `dbbackup_rpo_seconds`, `dbbackup_backup_total`, etc.
- Integration with node_exporter textfile collector
- Metrics automatically updated via ExecStopPost in service units
- `--with-metrics` flag during install sets up exporter as systemd service
**New Commands:**
```bash
# Install as systemd service
sudo dbbackup install --backup-type cluster --schedule daily
# Install with Prometheus metrics
sudo dbbackup install --with-metrics --metrics-port 9399
# Check installation status
dbbackup install --status
# Export metrics for node_exporter
dbbackup metrics export --output /var/lib/dbbackup/metrics/dbbackup.prom
# Run HTTP metrics server
dbbackup metrics serve --port 9399
```
### Technical Details
- Systemd templates embedded with `//go:embed` for self-contained binary
- Templates use ReadWritePaths for security isolation
- Service units include proper OOMScoreAdjust (-100) to protect backups
- Metrics exporter caches with 30-second TTL for performance
- Graceful shutdown on SIGTERM for metrics server
---
## [3.41.0] - 2026-01-07 "The Pre-Flight Check"
### Added - 🛡️ Pre-Restore Validation
**Automatic Dump Validation Before Restore:**
- SQL dump files are now validated BEFORE attempting restore
- Detects truncated COPY blocks that cause "syntax error" failures
- Catches corrupted backups in seconds instead of wasting 49+ minutes
- Cluster restore pre-validates ALL dumps upfront (fail-fast approach)
- Custom format `.dump` files now validated with `pg_restore --list`
**Improved Error Messages:**
- Clear indication when dump file is truncated
- Shows which table's COPY block was interrupted
- Displays sample orphaned data for diagnosis
- Provides actionable error messages with root cause
### Fixed
- **P0: SQL Injection** - Added identifier validation for database names in CREATE/DROP DATABASE to prevent SQL injection attacks; uses safe quoting and regex validation (alphanumeric + underscore only)
- **P0: Data Race** - Fixed concurrent goroutines appending to shared error slice in notification manager; now uses mutex synchronization
- **P0: psql ON_ERROR_STOP** - Added `-v ON_ERROR_STOP=1` to psql commands to fail fast on first error instead of accumulating millions of errors
- **P1: Pipe deadlock** - Fixed streaming compression deadlock when pg_dump blocks on full pipe buffer; now uses goroutine with proper context timeout handling
- **P1: SIGPIPE handling** - Detect exit code 141 (broken pipe) and report compressor failure as root cause
- **P2: .dump validation** - Custom format dumps now validated with `pg_restore --list` before restore
- **P2: fsync durability** - Added `outFile.Sync()` after streaming compression to prevent truncation on power loss
- Truncated `.sql.gz` dumps no longer waste hours on doomed restores
- "syntax error at or near" errors now caught before restore begins
- Cluster restores abort immediately if any dump is corrupted
### Technical Details
- Integrated `Diagnoser` into restore pipeline for pre-validation
- Added `quickValidateSQLDump()` for fast integrity checks
- Pre-validation runs on all `.sql.gz` and `.dump` files in cluster archives
- Streaming compression uses channel-based wait with context cancellation
- Zero performance impact on valid backups (diagnosis is fast)
---
## [3.40.0] - 2026-01-05 "The Diagnostician"
### Added - 🔍 Restore Diagnostics & Error Reporting
**Backup Diagnosis Command:**
- `restore diagnose <archive>` - Deep analysis of backup files before restore
- Detects truncated dumps, corrupted archives, incomplete COPY blocks
- PGDMP signature validation for PostgreSQL custom format
- Gzip integrity verification with decompression test
- `pg_restore --list` validation for custom format archives
- `--deep` flag for exhaustive line-by-line analysis
- `--json` flag for machine-readable output
- Cluster archive diagnosis scans all contained dumps
**Detailed Error Reporting:**
- Comprehensive error collector captures stderr during restore
- Ring buffer prevents OOM on high-error restores (2M+ errors)
- Error classification with actionable hints and recommendations
- `--save-debug-log <path>` saves JSON report on failure
- Reports include: exit codes, last errors, line context, tool versions
- Automatic recommendations based on error patterns
**TUI Restore Enhancements:**
- **Dump validity** safety check runs automatically before restore
- Detects truncated/corrupted backups in restore preview
- Press **`d`** to toggle debug log saving in Advanced Options
- Debug logs saved to `/tmp/dbbackup-restore-debug-*.json` on failure
- Press **`d`** in archive browser to run diagnosis on any backup
**New Commands:**
- `restore diagnose` - Analyze backup file integrity and structure
**New Flags:**
- `--save-debug-log <path>` - Save detailed JSON error report on failure
- `--diagnose` - Run deep diagnosis before cluster restore
- `--deep` - Enable exhaustive diagnosis (line-by-line analysis)
- `--json` - Output diagnosis in JSON format
- `--keep-temp` - Keep temporary files after diagnosis
- `--verbose` - Show detailed diagnosis progress
### Technical Details
- 1,200+ lines of new diagnostic code
- Error classification system with 15+ error patterns
- Ring buffer stderr capture (1MB max, 10K lines)
- Zero memory growth on high-error restores
- Full TUI integration for diagnostics
---
## [3.2.0] - 2025-12-13 "The Margin Eraser"
### Added - 🚀 Physical Backup Revolution
**MySQL Clone Plugin Integration:**
- Native physical backup using MySQL 8.0.17+ Clone Plugin
- No XtraBackup dependency - pure Go implementation
- Real-time progress monitoring via performance_schema
- Support for both local and remote clone operations
**Filesystem Snapshot Orchestration:**
- LVM snapshot support with automatic cleanup
- ZFS snapshot integration with send/receive
- Btrfs subvolume snapshot support
- Brief table lock (<100ms) for consistency
- Automatic snapshot backend detection
**Continuous Binlog Streaming:**
- Real-time binlog capture using MySQL replication protocol
- Multiple targets: file, compressed file, S3 direct streaming
- Sub-second RPO without impacting database server
- Automatic position tracking and checkpointing
**Parallel Cloud Streaming:**
- Direct database-to-S3 streaming (zero local storage)
- Configurable worker pool for parallel uploads
- S3 multipart upload with automatic retry
- Support for S3, GCS, and Azure Blob Storage
**Smart Engine Selection:**
- Automatic engine selection based on environment
- MySQL version detection and capability checking
- Filesystem type detection for optimal snapshot backend
- Database size-based recommendations
**New Commands:**
- `engine list` - List available backup engines
- `engine info <name>` - Show detailed engine information
- `backup --engine=<name>` - Use specific backup engine
### Technical Details
- 7,559 lines of new code
- Zero new external dependencies
- 10/10 platform builds successful
- Full test coverage for new engines
## [3.1.0] - 2025-11-26
### Added - 🔄 Point-in-Time Recovery (PITR)
@@ -117,7 +299,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Documentation
- Added comprehensive PITR.md guide (complete PITR documentation)
- Updated README.md with PITR section (200+ lines)
- Added RELEASE_NOTES_v3.1.md (full feature list)
- Updated CHANGELOG.md with v3.1.0 details
- Added NOTICE file for Apache License attribution
- Created comprehensive test suite (tests/pitr_complete_test.go - 700+ lines)

CONTRIBUTING.md

@@ -17,7 +17,7 @@ Be respectful, constructive, and professional in all interactions. We're buildin
**Bug Report Template:**
```
**Version:** dbbackup v3.1.0
**Version:** dbbackup v3.40.0
**OS:** Linux/macOS/BSD
**Database:** PostgreSQL 14 / MySQL 8.0 / MariaDB 10.6
**Command:** The exact command that failed
@@ -274,12 +274,11 @@ Fixes #56
1. Update version in `main.go`
2. Update `CHANGELOG.md`
3. Create release notes (`RELEASE_NOTES_vX.Y.Z.md`)
4. Commit: `git commit -m "Release vX.Y.Z"`
5. Tag: `git tag -a vX.Y.Z -m "Release vX.Y.Z"`
6. Push: `git push origin main vX.Y.Z`
7. Build binaries: `./build_all.sh`
8. Create GitHub Release with binaries
3. Commit: `git commit -m "Release vX.Y.Z"`
4. Tag: `git tag -a vX.Y.Z -m "Release vX.Y.Z"`
5. Push: `git push origin main vX.Y.Z`
6. Build binaries: `./build_all.sh`
7. Create GitHub Release with binaries
## Questions?

GCS.md

@@ -28,21 +28,16 @@ This guide covers using **Google Cloud Storage (GCS)** with `dbbackup` for secur
```bash
# Backup PostgreSQL to GCS (using ADC)
dbbackup backup postgres \
--host localhost \
--database mydb \
--output backup.sql \
--cloud "gs://mybucket/backups/db.sql"
dbbackup backup single mydb \
--cloud "gs://mybucket/backups/"
```
### 3. Restore from GCS
```bash
# Restore from GCS backup
dbbackup restore postgres \
--source "gs://mybucket/backups/db.sql" \
--host localhost \
--database mydb_restored
# Download backup from GCS and restore
dbbackup cloud download "gs://mybucket/backups/mydb.dump.gz" ./mydb.dump.gz
dbbackup restore single ./mydb.dump.gz --target mydb_restored --confirm
```
## URI Syntax
@@ -107,7 +102,7 @@ gcloud auth application-default login
gcloud auth activate-service-account --key-file=/path/to/key.json
# Use simplified URI (credentials from environment)
dbbackup backup postgres --cloud "gs://mybucket/backups/backup.sql"
dbbackup backup single mydb --cloud "gs://mybucket/backups/"
```
### Method 2: Service Account JSON
@@ -121,14 +116,14 @@ Download service account key from GCP Console:
**Use in URI:**
```bash
dbbackup backup postgres \
--cloud "gs://mybucket/backup.sql?credentials=/path/to/service-account.json"
dbbackup backup single mydb \
--cloud "gs://mybucket/?credentials=/path/to/service-account.json"
```
**Or via environment:**
```bash
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
dbbackup backup postgres --cloud "gs://mybucket/backup.sql"
dbbackup backup single mydb --cloud "gs://mybucket/"
```
### Method 3: Workload Identity (GKE)
@@ -147,7 +142,7 @@ metadata:
Then use ADC in your pod:
```bash
dbbackup backup postgres --cloud "gs://mybucket/backup.sql"
dbbackup backup single mydb --cloud "gs://mybucket/"
```
### Required IAM Permissions
@@ -250,11 +245,8 @@ gsutil mb -l eu gs://mybucket/
```bash
# PostgreSQL backup with automatic GCS upload
dbbackup backup postgres \
--host localhost \
--database production_db \
--output /backups/db.sql \
--cloud "gs://prod-backups/postgres/$(date +%Y%m%d_%H%M%S).sql" \
dbbackup backup single production_db \
--cloud "gs://prod-backups/postgres/" \
--compression 6
```
@@ -262,10 +254,7 @@ dbbackup backup postgres \
```bash
# Backup entire PostgreSQL cluster to GCS
dbbackup backup postgres \
--host localhost \
--all-databases \
--output-dir /backups \
dbbackup backup cluster \
--cloud "gs://prod-backups/postgres/cluster/"
```
@@ -314,13 +303,9 @@ dbbackup cleanup "gs://prod-backups/postgres/" --keep 7
#!/bin/bash
# GCS backup script (run via cron)
DATE=$(date +%Y%m%d_%H%M%S)
GCS_URI="gs://prod-backups/postgres/${DATE}.sql"
GCS_URI="gs://prod-backups/postgres/"
dbbackup backup postgres \
--host localhost \
--database production_db \
--output /tmp/backup.sql \
dbbackup backup single production_db \
--cloud "${GCS_URI}" \
--compression 9
@@ -360,35 +345,25 @@ For large files, dbbackup automatically uses GCS chunked upload:
```bash
# Large database backup (automatically uses chunked upload)
dbbackup backup postgres \
--host localhost \
--database huge_db \
--output /backups/huge.sql \
--cloud "gs://backups/huge.sql"
dbbackup backup single huge_db \
--cloud "gs://backups/"
```
### Progress Tracking
```bash
# Backup with progress display
dbbackup backup postgres \
--host localhost \
--database mydb \
--output backup.sql \
--cloud "gs://backups/backup.sql" \
--progress
dbbackup backup single mydb \
--cloud "gs://backups/"
```
### Concurrent Operations
```bash
# Backup multiple databases in parallel
dbbackup backup postgres \
--host localhost \
--all-databases \
--output-dir /backups \
# Backup cluster with parallel jobs
dbbackup backup cluster \
--cloud "gs://backups/cluster/" \
--parallelism 4
--jobs 4
```
### Custom Metadata
@@ -460,11 +435,8 @@ curl -X POST "http://localhost:4443/storage/v1/b?project=test-project" \
```bash
# Backup to fake-gcs-server
dbbackup backup postgres \
--host localhost \
--database testdb \
--output test.sql \
--cloud "gs://test-backups/test.sql?endpoint=http://localhost:4443/storage/v1"
dbbackup backup single testdb \
--cloud "gs://test-backups/?endpoint=http://localhost:4443/storage/v1"
```
### Run Integration Tests
@@ -593,8 +565,8 @@ Tests include:
Enable debug mode:
```bash
dbbackup backup postgres \
--cloud "gs://bucket/backup.sql" \
dbbackup backup single mydb \
--cloud "gs://bucket/" \
--debug
```

MYSQL_PITR.md

@@ -110,10 +110,12 @@ dbbackup pitr mysql-enable --archive-dir /backups/binlog_archive
### 3. Create a Base Backup
```bash
# Create a PITR-capable backup
dbbackup backup single mydb --pitr
# Create a backup - binlog position is automatically recorded
dbbackup backup single mydb
```
> **Note:** All backups automatically capture the current binlog position when PITR is enabled at the MySQL level. This position is stored in the backup metadata and used as the starting point for binlog replay during recovery.
### 4. Start Binlog Archiving
```bash

README.md

@@ -19,6 +19,8 @@ Database backup and restore utility for PostgreSQL, MySQL, and MariaDB.
- Point-in-Time Recovery (PITR) for PostgreSQL and MySQL/MariaDB
- **GFS retention policies**: Grandfather-Father-Son backup rotation
- **Notifications**: SMTP email and webhook alerts
- **Systemd integration**: Install as service with scheduled timers
- **Prometheus metrics**: Textfile collector and HTTP exporter
- Interactive terminal UI
- Cross-platform binaries
@@ -54,7 +56,7 @@ Download from [releases](https://git.uuxo.net/UUXO/dbbackup/releases):
```bash
# Linux x86_64
wget https://git.uuxo.net/UUXO/dbbackup/releases/download/v3.1.0/dbbackup-linux-amd64
wget https://git.uuxo.net/UUXO/dbbackup/releases/download/v3.40.0/dbbackup-linux-amd64
chmod +x dbbackup-linux-amd64
sudo mv dbbackup-linux-amd64 /usr/local/bin/dbbackup
```
@@ -94,6 +96,7 @@ Database: postgres@localhost:5432 (PostgreSQL)
────────────────────────────────
Restore Single Database
Restore Cluster Backup
Diagnose Backup File
List & Manage Backups
────────────────────────────────
View Active Operations
@@ -161,11 +164,15 @@ Cluster Restore Options
Safety Checks
[OK] Archive integrity verified
[OK] Dump validity verified
[OK] Disk space: 140 GB available
[OK] Required tools found
[OK] Target database accessible
c: Toggle cleanup | Enter: Proceed | Esc: Cancel
Advanced Options
✗ Debug Log: false (press 'd' to toggle)
c: Toggle cleanup | d: Debug log | Enter: Proceed | Esc: Cancel
```
**Backup Manager:**
@@ -180,7 +187,7 @@ FILENAME FORMAT SIZE MODIFIED
[OK] myapp_prod_20250114.dump.gz PostgreSQL Custom 12.3 GB 2025-01-14
[!!] users_db_20241220.dump.gz PostgreSQL Custom 850 MB 2024-12-20
r: Restore | v: Verify | i: Info | d: Delete | R: Refresh | Esc: Back
r: Restore | v: Verify | i: Info | d: Diagnose | D: Delete | R: Refresh | Esc: Back
```
**Configuration Settings:**
@@ -190,6 +197,7 @@ Configuration Settings
> Database Type: postgres
CPU Workload Type: balanced
Backup Directory: /root/db_backups
Work Directory: /tmp
Compression Level: 6
Parallel Jobs: 16
Dump Jobs: 8
@@ -240,6 +248,12 @@ dbbackup restore single backup.dump --target myapp_db --create --confirm
# Restore cluster
dbbackup restore cluster cluster_backup.tar.gz --confirm
# Restore with debug logging (saves detailed error report on failure)
dbbackup restore cluster backup.tar.gz --save-debug-log /tmp/restore-debug.json --confirm
# Diagnose backup before restore
dbbackup restore diagnose backup.dump.gz --deep
# Cloud backup
dbbackup backup single mydb --cloud s3://my-bucket/backups/
@@ -257,6 +271,7 @@ dbbackup backup single mydb --dry-run
| `restore single` | Restore single database |
| `restore cluster` | Restore full cluster |
| `restore pitr` | Point-in-Time Recovery |
| `restore diagnose` | Diagnose backup file integrity |
| `verify-backup` | Verify backup integrity |
| `cleanup` | Remove old backups |
| `status` | Check connection status |
@@ -271,6 +286,10 @@ dbbackup backup single mydb --dry-run
| `drill` | DR drill testing |
| `report` | Compliance report generation |
| `rto` | RTO/RPO analysis |
| `install` | Install as systemd service |
| `uninstall` | Remove systemd service |
| `metrics export` | Export Prometheus metrics to textfile |
| `metrics serve` | Run Prometheus HTTP exporter |
## Global Flags
@@ -287,8 +306,8 @@ dbbackup backup single mydb --dry-run
| `--cloud` | Cloud storage URI | - |
| `--encrypt` | Enable encryption | false |
| `--dry-run, -n` | Run preflight checks only | false |
| `--notify` | Enable notifications | false |
| `--debug` | Enable debug logging | false |
| `--save-debug-log` | Save error report to file on failure | - |
## Encryption
@@ -436,9 +455,64 @@ dbbackup backup cluster -n # Short flag
Ready to backup. Remove --dry-run to execute.
```
## Backup Diagnosis
Diagnose backup files before restore to detect corruption or truncation:
```bash
# Diagnose a backup file
dbbackup restore diagnose backup.dump.gz
# Deep analysis (line-by-line COPY block verification)
dbbackup restore diagnose backup.dump.gz --deep
# JSON output for automation
dbbackup restore diagnose backup.dump.gz --json
# Diagnose cluster archive (checks all contained dumps)
dbbackup restore diagnose cluster_backup.tar.gz --deep
```
**Checks performed:**
- PGDMP signature validation (PostgreSQL custom format)
- Gzip integrity verification
- COPY block termination (detects truncated dumps)
- `pg_restore --list` validation
- Archive structure analysis
**Example output:**
```
🔍 Backup Diagnosis Report
══════════════════════════════════════════════════════════════
📁 File: mydb_20260105.dump.gz
Format: PostgreSQL Custom (gzip)
Size: 2.5 GB
🔬 Analysis Results:
✅ Gzip integrity: Valid
✅ PGDMP signature: Valid
✅ pg_restore --list: Success (245 objects)
❌ COPY block check: TRUNCATED
⚠️ Issues Found:
- COPY block for table 'orders' not terminated
- Dump appears truncated at line 1,234,567
💡 Recommendations:
- Re-run the backup for this database
- Check disk space on backup server
- Verify network stability during backup
```
**In Interactive Mode:**
- Press `d` in archive browser to diagnose any backup
- Automatic dump validity check in restore preview
- Toggle debug logging with `d` in restore options
## Notifications
Get alerted on backup events via email or webhooks.
Get alerted on backup events via email or webhooks. Configure via environment variables.
### SMTP Email
@@ -451,8 +525,8 @@ export NOTIFY_SMTP_PASSWORD="secret"
export NOTIFY_SMTP_FROM="dbbackup@example.com"
export NOTIFY_SMTP_TO="admin@example.com,dba@example.com"
# Enable notifications
dbbackup backup single mydb --notify
# Run backup (notifications triggered when SMTP is configured)
dbbackup backup single mydb
```
### Webhooks
@@ -465,7 +539,8 @@ export NOTIFY_WEBHOOK_SECRET="signing-secret" # Optional HMAC signing
# Slack webhook
export NOTIFY_WEBHOOK_URL="https://hooks.slack.com/services/T00/B00/XXX"
dbbackup backup single mydb --notify
# Run backup (notifications triggered when webhook is configured)
dbbackup backup single mydb
```
**Webhook payload:**
@@ -604,6 +679,102 @@ dbbackup rto analyze mydb --target-rto 4h --target-rpo 1h
- Compliance status
- Recommendations for improvement
## Systemd Integration
Install dbbackup as a systemd service for automated scheduled backups:
```bash
# Install with Prometheus metrics exporter
sudo dbbackup install --backup-type cluster --with-metrics
# Preview what would be installed
dbbackup install --dry-run --backup-type cluster
# Check installation status
dbbackup install --status
# Uninstall
sudo dbbackup uninstall cluster --purge
```
**Schedule options:**
```bash
--schedule daily # Every day at midnight (default)
--schedule weekly # Every Monday at midnight
--schedule "*-*-* 02:00:00" # Every day at 2am
--schedule "Mon *-*-* 03:00" # Every Monday at 3am
```
**What gets installed:**
- Systemd service and timer units
- Dedicated `dbbackup` user with security hardening
- Directories: `/var/lib/dbbackup/`, `/etc/dbbackup/`
- Optional: Prometheus HTTP exporter on port 9399
📖 **Full documentation:** [SYSTEMD.md](SYSTEMD.md) - Manual setup, security hardening, multiple instances, troubleshooting
## Prometheus Metrics
Export backup metrics for monitoring with Prometheus:
### Textfile Collector
For integration with node_exporter:
```bash
# Export metrics to textfile
dbbackup metrics export --output /var/lib/node_exporter/textfile_collector/dbbackup.prom
# Export for specific instance
dbbackup metrics export --instance production --output /var/lib/dbbackup/metrics/production.prom
```
Configure node_exporter:
```bash
node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector/
```
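The textfile is a point-in-time snapshot, so it needs to be refreshed periodically. One hedged option is a cron entry (a systemd timer works equally well), assuming the `dbbackup` user can write to the collector directory:

```bash
# /etc/cron.d/dbbackup-metrics (illustrative; adjust user and paths as needed)
*/5 * * * * dbbackup /usr/local/bin/dbbackup metrics export --output /var/lib/node_exporter/textfile_collector/dbbackup.prom
```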
### HTTP Exporter
Run a dedicated metrics HTTP server:
```bash
# Start metrics server on default port 9399
dbbackup metrics serve
# Custom port
dbbackup metrics serve --port 9100
# Run as systemd service (installed via --with-metrics)
sudo systemctl start dbbackup-exporter
```
**Endpoints:**
- `/metrics` - Prometheus exposition format
- `/health` - Health check (returns 200 OK)
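A quick smoke test of both endpoints (the sample metric line is illustrative):

```bash
# Health endpoint should return 200
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9399/health

# Spot-check one metric; expect something like:
#   dbbackup_rpo_seconds{instance="production",database="mydb",engine="postgres"} 3600
curl -s http://localhost:9399/metrics | grep dbbackup_rpo_seconds
```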
**Available metrics:**
| Metric | Type | Description |
|--------|------|-------------|
| `dbbackup_last_success_timestamp` | gauge | Unix timestamp of last successful backup |
| `dbbackup_last_backup_duration_seconds` | gauge | Duration of last backup |
| `dbbackup_last_backup_size_bytes` | gauge | Size of last backup |
| `dbbackup_backup_total` | counter | Total backups by status (success/failure) |
| `dbbackup_rpo_seconds` | gauge | Seconds since last successful backup |
| `dbbackup_backup_verified` | gauge | Whether last backup was verified (1/0) |
| `dbbackup_scrape_timestamp` | gauge | When metrics were collected |
**Labels:** `instance`, `database`, `engine`
**Example Prometheus query:**
```promql
# Alert if RPO exceeds 24 hours
dbbackup_rpo_seconds{instance="production"} > 86400
# Backup success rate
sum(rate(dbbackup_backup_total{status="success"}[24h])) / sum(rate(dbbackup_backup_total[24h]))
```
## Configuration
### PostgreSQL Authentication
@@ -687,6 +858,7 @@ Workload types:
## Documentation
- [SYSTEMD.md](SYSTEMD.md) - Systemd installation & scheduling
- [DOCKER.md](DOCKER.md) - Docker deployment
- [CLOUD.md](CLOUD.md) - Cloud storage configuration
- [PITR.md](PITR.md) - Point-in-Time Recovery


@@ -1,396 +0,0 @@
# dbbackup v3.1.0 - Enterprise Backup Solution
**Released:** November 26, 2025
---
## 🎉 Major Features
### Point-in-Time Recovery (PITR)
Complete PostgreSQL Point-in-Time Recovery implementation:
- **WAL Archiving**: Continuous archiving of Write-Ahead Log files
- **WAL Monitoring**: Real-time monitoring of archive status and statistics
- **Timeline Management**: Track and visualize PostgreSQL timeline branching
- **Recovery Targets**: Restore to any point in time:
- Specific timestamp (`--target-time "2024-11-26 12:00:00"`)
- Transaction ID (`--target-xid 1000000`)
- Log Sequence Number (`--target-lsn "0/3000000"`)
- Named restore point (`--target-name before_migration`)
- Earliest consistent point (`--target-immediate`)
- **Version Support**: Both PostgreSQL 12+ (modern) and legacy formats
- **Recovery Actions**: Promote to primary, pause for inspection, or shutdown
- **Comprehensive Testing**: 700+ lines of tests with 100% pass rate
**New Commands:**
- `pitr enable/disable/status` - PITR configuration management
- `wal archive/list/cleanup/timeline` - WAL archive operations
- `restore pitr` - Point-in-time recovery with multiple target types
### Cloud Storage Integration
Multi-cloud backend support with streaming efficiency:
- **Amazon S3 / MinIO**: Full S3-compatible storage support
- **Azure Blob Storage**: Native Azure integration
- **Google Cloud Storage**: GCS backend support
- **Streaming Operations**: Memory-efficient uploads/downloads
- **Cloud-Native**: Direct backup to cloud, no local disk required
**Features:**
- Automatic multipart uploads for large files
- Resumable downloads with retry logic
- Cloud-side encryption support
- Metadata preservation in cloud storage
### Incremental Backups
Space-efficient backup strategies:
- **PostgreSQL**: File-level incremental backups
- Track changed files since base backup
- Automatic base backup detection
- Efficient restore chain resolution
- **MySQL/MariaDB**: Binary log incremental backups
- Capture changes via binlog
- Automatic log rotation handling
- Point-in-time restore capability
**Benefits:**
- 70-90% reduction in backup size
- Faster backup completion times
- Automated backup chain management
- Intelligent dependency tracking
### AES-256-GCM Encryption
Military-grade encryption for data protection:
- **Algorithm**: AES-256-GCM authenticated encryption
- **Key Derivation**: PBKDF2-SHA256 with 600,000 iterations (OWASP 2023)
- **Streaming**: Memory-efficient for large backups
- **Key Sources**: File (raw/base64), environment variable, or passphrase
- **Auto-Detection**: Restore automatically detects encrypted backups
- **Tamper Protection**: Authenticated encryption prevents tampering
**Security:**
- Unique nonce per encryption (no key reuse)
- Cryptographically secure random generation
- 56-byte header with algorithm metadata
- ~1-2 GB/s encryption throughput
### Foundation Features
Production-ready backup operations:
- **SHA-256 Verification**: Cryptographic backup integrity checking
- **Intelligent Retention**: Day-based policies with minimum backup guarantees
- **Safe Cleanup**: Dry-run mode, safety checks, detailed reporting
- **Multi-Database**: PostgreSQL, MySQL, MariaDB support
- **Interactive TUI**: Beautiful terminal UI with progress tracking
- **CLI Mode**: Full command-line interface for automation
- **Cross-Platform**: Linux, macOS, FreeBSD, OpenBSD, NetBSD
- **Docker Support**: Official container images
- **100% Test Coverage**: Comprehensive test suite
---
## ✅ Production Validated
**Real-World Deployment:**
- ✅ 2 production hosts
- ✅ 8 databases backed up nightly
- ✅ 30-day retention with minimum 5 backups
- ✅ ~10MB/night backup volume
- ✅ Scheduled at 02:09 and 02:25 CET
- **Resolved 4-day backup failure immediately**
**User Feedback (Ansible Claude):**
> "The cleanup command is SO good that everyone should use it"
> "--dry-run feature: chef's kiss!" 💋
> "Modern tooling in place, pragmatic and maintainable"
> "CLI design: Professional & polished"
**Impact:**
- Fixed failing backup infrastructure on first deployment
- Stable operation in production environment
- Positive feedback from DevOps team
- Validation of feature set and UX design
---
## 📦 Installation
### Download Pre-compiled Binary
**Linux (x86_64):**
```bash
wget https://git.uuxo.net/PlusOne/dbbackup/releases/download/v3.1.0/dbbackup-linux-amd64
chmod +x dbbackup-linux-amd64
sudo mv dbbackup-linux-amd64 /usr/local/bin/dbbackup
```
**Linux (ARM64):**
```bash
wget https://git.uuxo.net/PlusOne/dbbackup/releases/download/v3.1.0/dbbackup-linux-arm64
chmod +x dbbackup-linux-arm64
sudo mv dbbackup-linux-arm64 /usr/local/bin/dbbackup
```
**macOS (Intel):**
```bash
wget https://git.uuxo.net/PlusOne/dbbackup/releases/download/v3.1.0/dbbackup-darwin-amd64
chmod +x dbbackup-darwin-amd64
sudo mv dbbackup-darwin-amd64 /usr/local/bin/dbbackup
```
**macOS (Apple Silicon):**
```bash
wget https://git.uuxo.net/PlusOne/dbbackup/releases/download/v3.1.0/dbbackup-darwin-arm64
chmod +x dbbackup-darwin-arm64
sudo mv dbbackup-darwin-arm64 /usr/local/bin/dbbackup
```
### Build from Source
```bash
git clone https://git.uuxo.net/PlusOne/dbbackup.git
cd dbbackup
go build -o dbbackup
sudo mv dbbackup /usr/local/bin/
```
### Docker
```bash
docker pull git.uuxo.net/PlusOne/dbbackup:v3.1.0
docker pull git.uuxo.net/PlusOne/dbbackup:latest
```
---
## 🚀 Quick Start Examples
### Basic Backup
```bash
# Simple database backup
dbbackup backup single mydb
# Backup with verification
dbbackup backup single mydb
dbbackup verify mydb_backup.sql.gz
```
### Cloud Backup
```bash
# Backup to S3
dbbackup backup single mydb --cloud s3://my-bucket/backups/
# Backup to Azure
dbbackup backup single mydb --cloud azure://container/backups/
# Backup to GCS
dbbackup backup single mydb --cloud gs://my-bucket/backups/
```
### Encrypted Backup
```bash
# Generate encryption key
head -c 32 /dev/urandom | base64 > encryption.key
# Encrypted backup
dbbackup backup single mydb --encrypt --encryption-key-file encryption.key
# Restore (automatic decryption)
dbbackup restore single mydb_backup.sql.gz --encryption-key-file encryption.key
```
### Incremental Backup
```bash
# Create base backup
dbbackup backup single mydb --backup-type full
# Create incremental backup
dbbackup backup single mydb --backup-type incremental \
--base-backup mydb_base_20241126_120000.tar.gz
# Restore (automatic chain resolution)
dbbackup restore single mydb_incr_20241126_150000.tar.gz
```
### Point-in-Time Recovery
```bash
# Enable PITR
dbbackup pitr enable --archive-dir /backups/wal_archive
# Take base backup
pg_basebackup -D /backups/base.tar.gz -Ft -z -P
# Perform PITR
dbbackup restore pitr \
--base-backup /backups/base.tar.gz \
--wal-archive /backups/wal_archive \
--target-time "2024-11-26 12:00:00" \
--target-dir /var/lib/postgresql/14/restored
# Monitor WAL archiving
dbbackup pitr status
dbbackup wal list
```
### Retention & Cleanup
```bash
# Cleanup old backups (dry-run first!)
dbbackup cleanup --retention-days 30 --min-backups 5 --dry-run
# Actually cleanup
dbbackup cleanup --retention-days 30 --min-backups 5
```
### Cluster Operations
```bash
# Backup entire cluster
dbbackup backup cluster
# Restore entire cluster
dbbackup restore cluster --backups /path/to/backups/ --confirm
```
---
## 🔮 What's Next (v3.2)
Based on production feedback from Ansible Claude:
### High Priority
1. **Config File Support** (2-3h)
- Persist flags like `--allow-root` in `.dbbackup.conf`
- Per-directory configuration management
- Better automation support
2. **Socket Auth Auto-Detection** (1-2h)
- Auto-detect Unix socket authentication
- Skip password prompts for socket connections
- Improved UX for root users
### Medium Priority
3. **Inline Backup Verification** (2-3h)
- Automatic verification after backup
- Immediate corruption detection
- Better workflow integration
4. **Progress Indicators** (4-6h)
- Progress bars for mysqldump operations
- Real-time backup size tracking
- ETA for large backups
### Additional Features
5. **Ansible Module** (4-6h)
- Native Ansible integration
- Declarative backup configuration
- DevOps automation support
---
## 📊 Performance Metrics
**Backup Performance:**
- PostgreSQL: 50-150 MB/s (network dependent)
- MySQL: 30-100 MB/s (with compression)
- Encryption: ~1-2 GB/s (streaming)
- Compression: 70-80% size reduction (typical)
**PITR Performance:**
- WAL archiving: 100-200 MB/s
- WAL encryption: ~1-2 GB/s
- Recovery replay: 10-100 MB/s (disk I/O dependent)
**Resource Usage:**
- Memory: ~1GB constant (streaming architecture)
- CPU: 1-4 cores (configurable)
- Disk I/O: Streaming (no intermediate files)
---
## 🏗️ Architecture Highlights
**Split-Brain Development:**
- Human architects system design
- AI implements features and tests
- Micro-task decomposition (1-2h phases)
- Progressive enhancement approach
- **Result:** 52% faster development (5.75h vs 12h planned)
**Key Innovations:**
- Streaming architecture for constant memory usage
- Interface-first design for clean modularity
- Comprehensive test coverage (700+ test lines)
- Production validation in parallel with development
---
## 📄 Documentation
**Core Documentation:**
- [README.md](README.md) - Complete feature overview and setup
- [PITR.md](PITR.md) - Comprehensive PITR guide
- [DOCKER.md](DOCKER.md) - Docker usage and deployment
- [CHANGELOG.md](CHANGELOG.md) - Detailed version history
**Getting Started:**
- [QUICKRUN.md](QUICKRUN.MD) - Quick start guide
- [PROGRESS_IMPLEMENTATION.md](PROGRESS_IMPLEMENTATION.md) - Progress tracking
---
## 📜 License
Apache License 2.0
Copyright 2025 dbbackup Project
Licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) for details.
---
## 🙏 Credits
**Development:**
- Built using Multi-Claude collaboration architecture
- Split-brain development pattern (human architecture + AI implementation)
- 5.75 hours intensive development (52% time savings)
**Production Validation:**
- Deployed in production environments
- Real-world testing and feedback
- DevOps validation and feature requests
**Technologies:**
- Go 1.21+
- PostgreSQL 9.5-17
- MySQL/MariaDB 5.7+
- AWS SDK, Azure SDK, Google Cloud SDK
- Cobra CLI framework
---
## 🐛 Known Issues
None reported in production deployment.
If you encounter issues, please report them at:
https://git.uuxo.net/PlusOne/dbbackup/issues
---
## 📞 Support
**Documentation:** See [README.md](README.md) and [PITR.md](PITR.md)
**Issues:** https://git.uuxo.net/PlusOne/dbbackup/issues
**Repository:** https://git.uuxo.net/PlusOne/dbbackup
---
**Thank you for using dbbackup!** 🎉
*Professional database backup and restore utility for PostgreSQL, MySQL, and MariaDB.*


@@ -85,7 +85,7 @@ We release security updates for the following versions:
- Never store unencrypted backups on public storage
**Docker Usage:**
- Use specific version tags (`:v3.1.0` not `:latest`)
- Use specific version tags (`:v3.2.0` not `:latest`)
- Run as non-root user (default in our image)
- Mount volumes read-only when possible
- Use Docker secrets for credentials

SYSTEMD.md (new file, 529 lines)

@@ -0,0 +1,529 @@
# Systemd Integration Guide
This guide covers installing dbbackup as a systemd service for automated scheduled backups.
## Quick Start (Installer)
The easiest way to set up systemd services is using the built-in installer:
```bash
# Install as cluster backup service (daily at midnight)
sudo dbbackup install --backup-type cluster --schedule daily
# Check what would be installed (dry-run)
dbbackup install --dry-run --backup-type cluster
# Check installation status
dbbackup install --status
# Uninstall
sudo dbbackup uninstall cluster --purge
```
## Installer Options
| Flag | Description | Default |
|------|-------------|---------|
| `--instance NAME` | Instance name for named backups | - |
| `--backup-type TYPE` | Backup type: `cluster`, `single`, `sample` | `cluster` |
| `--schedule SPEC` | Timer schedule (see below) | `daily` |
| `--with-metrics` | Install Prometheus metrics exporter | false |
| `--metrics-port PORT` | HTTP port for metrics exporter | 9399 |
| `--dry-run` | Preview changes without applying | false |
### Schedule Format
The `--schedule` option accepts systemd OnCalendar format:
| Value | Description |
|-------|-------------|
| `daily` | Every day at midnight |
| `weekly` | Every Monday at midnight |
| `hourly` | Every hour |
| `*-*-* 02:00:00` | Every day at 2:00 AM |
| `*-*-* 00/6:00:00` | Every 6 hours |
| `Mon *-*-* 03:00` | Every Monday at 3:00 AM |
| `*-*-01 00:00:00` | First day of every month |
Test schedule with: `systemd-analyze calendar "Mon *-*-* 03:00"`
## What Gets Installed
### Directory Structure
```
/etc/dbbackup/
├── dbbackup.conf              # Main configuration
└── env.d/
    └── cluster.conf           # Instance credentials (mode 0600)

/var/lib/dbbackup/
├── catalog/
│   └── backups.db             # SQLite backup catalog
├── backups/                   # Default backup storage
└── metrics/                   # Prometheus textfile metrics

/var/log/dbbackup/             # Log files
/usr/local/bin/dbbackup        # Binary copy
```
### Systemd Units
**For cluster backups:**
- `/etc/systemd/system/dbbackup-cluster.service` - Backup service
- `/etc/systemd/system/dbbackup-cluster.timer` - Backup scheduler
**For named instances:**
- `/etc/systemd/system/dbbackup@.service` - Template service
- `/etc/systemd/system/dbbackup@.timer` - Template timer
**Metrics exporter (optional):**
- `/etc/systemd/system/dbbackup-exporter.service`
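Named instances are instantiated from the template units; enabling one looks roughly like this (the instance name is an example):

```bash
# Enable and start the timer for a "production" instance
sudo systemctl enable --now dbbackup@production.timer

# Inspect schedule and logs
sudo systemctl list-timers 'dbbackup@*'
sudo journalctl -u dbbackup@production.service -n 50
```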
### System User
A dedicated `dbbackup` user and group are created:
- Home: `/var/lib/dbbackup`
- Shell: `/usr/sbin/nologin`
- Purpose: Run backup services with minimal privileges
## Manual Installation
If you prefer to set up systemd services manually without the installer:
### Step 1: Create User and Directories
```bash
# Create system user
sudo useradd --system --home-dir /var/lib/dbbackup --shell /usr/sbin/nologin dbbackup
# Create directories
sudo mkdir -p /etc/dbbackup/env.d
sudo mkdir -p /var/lib/dbbackup/{catalog,backups,metrics}
sudo mkdir -p /var/log/dbbackup
# Set ownership
sudo chown -R dbbackup:dbbackup /var/lib/dbbackup /var/log/dbbackup
sudo chown root:dbbackup /etc/dbbackup
sudo chmod 750 /etc/dbbackup
# Copy binary
sudo cp dbbackup /usr/local/bin/
sudo chmod 755 /usr/local/bin/dbbackup
```
### Step 2: Create Configuration
```bash
# Main configuration
sudo tee /etc/dbbackup/dbbackup.conf << 'EOF'
# DBBackup Configuration
db-type=postgres
host=localhost
port=5432
user=postgres
backup-dir=/var/lib/dbbackup/backups
compression=6
retention-days=30
min-backups=7
EOF
# Instance credentials (secure permissions)
sudo tee /etc/dbbackup/env.d/cluster.conf << 'EOF'
PGPASSWORD=your_secure_password
# Or for MySQL:
# MYSQL_PWD=your_secure_password
EOF
sudo chmod 600 /etc/dbbackup/env.d/cluster.conf
sudo chown dbbackup:dbbackup /etc/dbbackup/env.d/cluster.conf
```
### Step 3: Create Service Unit
```bash
sudo tee /etc/systemd/system/dbbackup-cluster.service << 'EOF'
[Unit]
Description=DBBackup Cluster Backup
Documentation=https://github.com/PlusOne/dbbackup
After=network.target postgresql.service mysql.service
Wants=network.target
[Service]
Type=oneshot
User=dbbackup
Group=dbbackup
# Load configuration
EnvironmentFile=-/etc/dbbackup/env.d/cluster.conf
# Working directory
WorkingDirectory=/var/lib/dbbackup
# Execute backup
ExecStart=/usr/local/bin/dbbackup backup cluster \
--config /etc/dbbackup/dbbackup.conf \
--backup-dir /var/lib/dbbackup/backups \
--allow-root
# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
MemoryDenyWriteExecute=yes
LockPersonality=yes
# Allow write to specific paths
ReadWritePaths=/var/lib/dbbackup /var/log/dbbackup
# Capability restrictions
CapabilityBoundingSet=CAP_DAC_READ_SEARCH
AmbientCapabilities=
# Resource limits
MemoryMax=4G
CPUQuota=80%
# Prevent OOM killer from terminating backups
OOMScoreAdjust=-100
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=dbbackup
[Install]
WantedBy=multi-user.target
EOF
```
### Step 4: Create Timer Unit
```bash
sudo tee /etc/systemd/system/dbbackup-cluster.timer << 'EOF'
[Unit]
Description=DBBackup Cluster Backup Timer
Documentation=https://github.com/PlusOne/dbbackup
[Timer]
# Run daily at midnight
OnCalendar=daily
# Randomize start time within 15 minutes to avoid thundering herd
RandomizedDelaySec=900
# Run immediately if we missed the last scheduled time
Persistent=true
# Run even if system was sleeping
WakeSystem=false
[Install]
WantedBy=timers.target
EOF
```
### Step 5: Enable and Start
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable timer (auto-start on boot)
sudo systemctl enable dbbackup-cluster.timer
# Start timer
sudo systemctl start dbbackup-cluster.timer
# Verify timer is active
sudo systemctl status dbbackup-cluster.timer
# View next scheduled run
sudo systemctl list-timers dbbackup-cluster.timer
```
### Step 6: Test Backup
```bash
# Run backup manually
sudo systemctl start dbbackup-cluster.service
# Check status
sudo systemctl status dbbackup-cluster.service
# View logs
sudo journalctl -u dbbackup-cluster.service -f
```
## Prometheus Metrics Exporter (Manual)
### Service Unit
```bash
sudo tee /etc/systemd/system/dbbackup-exporter.service << 'EOF'
[Unit]
Description=DBBackup Prometheus Metrics Exporter
Documentation=https://github.com/PlusOne/dbbackup
After=network.target
[Service]
Type=simple
User=dbbackup
Group=dbbackup
# Working directory
WorkingDirectory=/var/lib/dbbackup
# Start HTTP metrics server
ExecStart=/usr/local/bin/dbbackup metrics serve --port 9399
# Restart on failure
Restart=on-failure
RestartSec=10
# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
LockPersonality=yes
# Catalog access
ReadWritePaths=/var/lib/dbbackup
# Capability restrictions
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=
# Logging
StandardOutput=journal
StandardError=journal
SyslogIdentifier=dbbackup-exporter
[Install]
WantedBy=multi-user.target
EOF
```
### Enable Exporter
```bash
sudo systemctl daemon-reload
sudo systemctl enable dbbackup-exporter
sudo systemctl start dbbackup-exporter
# Test
curl http://localhost:9399/health
curl http://localhost:9399/metrics
```
### Prometheus Configuration
Add to `prometheus.yml`:
```yaml
scrape_configs:
  - job_name: 'dbbackup'
    static_configs:
      - targets: ['localhost:9399']
    scrape_interval: 60s
```
## Security Hardening
The systemd units include comprehensive security hardening:
| Setting | Purpose |
|---------|---------|
| `NoNewPrivileges=yes` | Prevent privilege escalation |
| `ProtectSystem=strict` | Read-only filesystem except allowed paths |
| `ProtectHome=yes` | Block access to /home, /root, /run/user |
| `PrivateTmp=yes` | Isolated /tmp namespace |
| `PrivateDevices=yes` | No access to physical devices |
| `RestrictAddressFamilies` | Only Unix and IP sockets |
| `MemoryDenyWriteExecute=yes` | Prevent code injection |
| `CapabilityBoundingSet` | Minimal Linux capabilities |
| `OOMScoreAdjust=-100` | Protect backup from OOM killer |
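To see how much exposure these directives actually remove, systemd's built-in audit can be run against the installed unit (available in recent systemd releases; a lower score means less exposure):

```bash
# Overview of all units
systemd-analyze security

# Per-directive report for the backup unit
systemd-analyze security dbbackup-cluster.service
```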
### Database Access
For PostgreSQL with peer authentication:
```bash
# Add dbbackup user to postgres group
sudo usermod -aG postgres dbbackup
# Or create a .pgpass file
sudo -u dbbackup tee /var/lib/dbbackup/.pgpass << EOF
localhost:5432:*:postgres:password
EOF
sudo chmod 600 /var/lib/dbbackup/.pgpass
```
For PostgreSQL with password authentication:
```bash
# Store password in environment file
echo "PGPASSWORD=your_password" | sudo tee /etc/dbbackup/env.d/cluster.conf
sudo chmod 600 /etc/dbbackup/env.d/cluster.conf
```
## Multiple Instances
Run different backup configurations as separate instances:
```bash
# Install multiple instances
sudo dbbackup install --instance production --schedule "*-*-* 02:00:00"
sudo dbbackup install --instance staging --schedule "*-*-* 04:00:00"
sudo dbbackup install --instance analytics --schedule "weekly"
# Manage individually
sudo systemctl status dbbackup@production.timer
sudo systemctl start dbbackup@staging.service
```
Each instance has its own:
- Configuration: `/etc/dbbackup/env.d/<instance>.conf`
- Timer schedule
- Journal logs: `journalctl -u dbbackup@<instance>.service`
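Per-instance credentials follow the same pattern as the cluster example earlier; a sketch for a "production" instance (the password value is a placeholder):

```bash
sudo tee /etc/dbbackup/env.d/production.conf << 'EOF'
PGPASSWORD=change_me
EOF
sudo chmod 600 /etc/dbbackup/env.d/production.conf
sudo chown dbbackup:dbbackup /etc/dbbackup/env.d/production.conf
```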
## Troubleshooting
### View Logs
```bash
# Real-time logs
sudo journalctl -u dbbackup-cluster.service -f
# Last backup run
sudo journalctl -u dbbackup-cluster.service -n 100
# All dbbackup logs
sudo journalctl -t dbbackup
# Exporter logs
sudo journalctl -u dbbackup-exporter -f
```
### Timer Not Running
```bash
# Check timer status
sudo systemctl status dbbackup-cluster.timer
# List all timers
sudo systemctl list-timers --all | grep dbbackup
# Check if timer is enabled
sudo systemctl is-enabled dbbackup-cluster.timer
```
### Service Fails to Start
```bash
# Check service status
sudo systemctl status dbbackup-cluster.service
# View detailed error
sudo journalctl -u dbbackup-cluster.service -n 50 --no-pager
# Test manually as dbbackup user
sudo -u dbbackup /usr/local/bin/dbbackup backup cluster --config /etc/dbbackup/dbbackup.conf
# Check permissions
ls -la /var/lib/dbbackup/
ls -la /etc/dbbackup/
```
### Permission Denied
```bash
# Fix ownership
sudo chown -R dbbackup:dbbackup /var/lib/dbbackup
# Check SELinux (if enabled)
sudo ausearch -m avc -ts recent
# Check AppArmor (if enabled)
sudo aa-status
```
### Exporter Not Accessible
```bash
# Check if running
sudo systemctl status dbbackup-exporter
# Check port binding
sudo ss -tlnp | grep 9399
# Test locally
curl -v http://localhost:9399/health
# Check firewall
sudo ufw status
sudo iptables -L -n | grep 9399
```
## Uninstallation
### Using Installer
```bash
# Remove cluster backup (keeps config)
sudo dbbackup uninstall cluster
# Remove and purge configuration
sudo dbbackup uninstall cluster --purge
# Remove named instance
sudo dbbackup uninstall production --purge
```
### Manual Removal
```bash
# Stop and disable services
sudo systemctl stop dbbackup-cluster.timer dbbackup-cluster.service dbbackup-exporter
sudo systemctl disable dbbackup-cluster.timer dbbackup-exporter
# Remove unit files
sudo rm /etc/systemd/system/dbbackup-cluster.service
sudo rm /etc/systemd/system/dbbackup-cluster.timer
sudo rm /etc/systemd/system/dbbackup-exporter.service
sudo rm /etc/systemd/system/dbbackup@.service
sudo rm /etc/systemd/system/dbbackup@.timer
# Reload systemd
sudo systemctl daemon-reload
# Optional: Remove user and directories
sudo userdel dbbackup
sudo rm -rf /var/lib/dbbackup
sudo rm -rf /etc/dbbackup
sudo rm -rf /var/log/dbbackup
sudo rm /usr/local/bin/dbbackup
```
## See Also
- [README.md](README.md) - Main documentation
- [DOCKER.md](DOCKER.md) - Docker deployment
- [CLOUD.md](CLOUD.md) - Cloud storage configuration
- [PITR.md](PITR.md) - Point-in-Time Recovery


@@ -43,7 +43,7 @@ What are you actually getting?
```bash
# Physical backup at InnoDB page level
# No XtraBackup. No external tools. Pure Go.
dbbackup backup --engine=clone --output=s3://bucket/backup
dbbackup backup single mydb --db-type mysql --cloud s3://bucket/backups/
```
### Filesystem Snapshots
@@ -78,10 +78,10 @@ dbbackup backup --engine=streaming --parallel-workers=8
**Day 1**: Run dbbackup alongside existing solution
```bash
# Test backup
dbbackup backup --database=mydb --output=s3://test-bucket/
dbbackup backup single mydb --cloud s3://test-bucket/
# Verify integrity
dbbackup verify s3://test-bucket/backup.sql.gz.enc
dbbackup verify s3://test-bucket/mydb_20260115.dump.gz
```
**Week 1**: Compare backup times, storage costs, recovery speed
@@ -112,10 +112,9 @@ curl -LO https://github.com/UUXO/dbbackup/releases/latest/download/dbbackup_linu
chmod +x dbbackup_linux_amd64
# Your first backup
./dbbackup_linux_amd64 backup \
--database=production \
--engine=auto \
--output=s3://my-backups/$(date +%Y%m%d)/
./dbbackup_linux_amd64 backup single production \
--db-type mysql \
--cloud s3://my-backups/
```
## The Bottom Line
@@ -131,4 +130,4 @@ Every dollar you spend on backup licensing is a dollar not spent on:
*Apache 2.0 Licensed. Free forever. No sales calls required.*
[GitHub](https://github.com/UUXO/dbbackup) | [Documentation](https://github.com/UUXO/dbbackup#readme) | [Release Notes](RELEASE_NOTES_v3.2.md)
[GitHub](https://github.com/UUXO/dbbackup) | [Documentation](https://github.com/UUXO/dbbackup#readme) | [Changelog](CHANGELOG.md)

bin/README.md (new file, 98 lines)

@@ -0,0 +1,98 @@
# DB Backup Tool - Pre-compiled Binaries
## Download
**Binaries are distributed via GitHub Releases:**
📦 **https://github.com/PlusOne/dbbackup/releases**
Or build from source:
```bash
git clone https://github.com/PlusOne/dbbackup.git
cd dbbackup
./build_all.sh
```
## Build Information
- **Version**: 3.40.0
- **Build Time**: 2026-01-07_10:55:47_UTC
- **Git Commit**: 495ee31
## Recent Updates (v1.1.0)
- ✅ Fixed TUI progress display with line-by-line output
- ✅ Added interactive configuration settings menu
- ✅ Improved menu navigation and responsiveness
- ✅ Enhanced completion status handling
- ✅ Better CPU detection and optimization
- ✅ Silent mode support for TUI operations
## Available Binaries
### Linux
- `dbbackup_linux_amd64` - Linux 64-bit (Intel/AMD)
- `dbbackup_linux_arm64` - Linux 64-bit (ARM)
- `dbbackup_linux_arm_armv7` - Linux 32-bit (ARMv7)
### macOS
- `dbbackup_darwin_amd64` - macOS 64-bit (Intel)
- `dbbackup_darwin_arm64` - macOS 64-bit (Apple Silicon)
### Windows
- `dbbackup_windows_amd64.exe` - Windows 64-bit (Intel/AMD)
- `dbbackup_windows_arm64.exe` - Windows 64-bit (ARM)
### BSD Systems
- `dbbackup_freebsd_amd64` - FreeBSD 64-bit
- `dbbackup_openbsd_amd64` - OpenBSD 64-bit
- `dbbackup_netbsd_amd64` - NetBSD 64-bit
## Usage
1. Download the appropriate binary for your platform
2. Make it executable (Unix-like systems): `chmod +x dbbackup_*`
3. Run: `./dbbackup_* --help`
## Interactive Mode
Launch the interactive TUI menu for easy configuration and operation:
```bash
# Interactive mode with TUI menu
./dbbackup_linux_amd64
# Features:
# - Interactive configuration settings
# - Real-time progress display
# - Operation history and status
# - CPU detection and optimization
```
## Command Line Mode
Direct command line usage with line-by-line progress:
```bash
# Show CPU information and optimization settings
./dbbackup_linux_amd64 cpu
# Auto-optimize for your hardware
./dbbackup_linux_amd64 backup cluster --auto-detect-cores
# Manual CPU configuration
./dbbackup_linux_amd64 backup single mydb --jobs 8 --dump-jobs 4
# Line-by-line progress output
./dbbackup_linux_amd64 backup cluster --progress-type line
```
## CPU Detection
All binaries include advanced CPU detection capabilities:
- Automatic core detection for optimal parallelism
- Support for different workload types (CPU-intensive, I/O-intensive, balanced)
- Platform-specific optimizations for Linux, macOS, and Windows
- Interactive CPU configuration in TUI mode
## Support
For issues or questions, please refer to the main project documentation.


@@ -15,7 +15,7 @@ echo "🔧 Using Go version: $GO_VERSION"
# Configuration
APP_NAME="dbbackup"
VERSION="3.1.0"
VERSION="3.40.0"
BUILD_TIME=$(date -u '+%Y-%m-%d_%H:%M:%S_UTC')
GIT_COMMIT=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
BIN_DIR="bin"
@@ -83,7 +83,8 @@ for platform_config in "${PLATFORMS[@]}"; do
echo -e "${YELLOW}[$current/$total_platforms]${NC} Building for ${BOLD}$description${NC} (${platform})"
# Set environment and build (using export for better compatibility)
export GOOS GOARCH
# CGO_ENABLED=0 creates static binaries without glibc dependency
export CGO_ENABLED=0 GOOS GOARCH
if go build -ldflags "$LDFLAGS" -o "${BIN_DIR}/${binary_name}" . 2>/dev/null; then
# Get file size
if [[ "$OSTYPE" == "darwin"* ]]; then

cmd/install.go (new file, 239 lines)

@@ -0,0 +1,239 @@
package cmd
import (
"context"
"fmt"
"os"
"os/exec"
"os/signal"
"strings"
"syscall"
"dbbackup/internal/installer"
"github.com/spf13/cobra"
)
var (
// Install flags
installInstance string
installSchedule string
installBackupType string
installUser string
installGroup string
installBackupDir string
installConfigPath string
installTimeout int
installWithMetrics bool
installMetricsPort int
installDryRun bool
installStatus bool
// Uninstall flags
uninstallPurge bool
)
// installCmd represents the install command
var installCmd = &cobra.Command{
Use: "install",
Short: "Install dbbackup as a systemd service",
Long: `Install dbbackup as a systemd service with automatic scheduling.
This command creates systemd service and timer units for automated database backups.
It supports both single database and cluster backup modes.
Examples:
# Interactive installation (will prompt for options)
sudo dbbackup install
# Install cluster backup running daily at 2am
sudo dbbackup install --backup-type cluster --schedule "daily"
# Install single database backup with custom schedule
sudo dbbackup install --instance production --backup-type single --schedule "*-*-* 03:00:00"
# Install with Prometheus metrics exporter
sudo dbbackup install --with-metrics --metrics-port 9399
# Check installation status
dbbackup install --status
# Dry-run to see what would be installed
sudo dbbackup install --dry-run
Schedule format (OnCalendar):
daily - Every day at midnight
weekly - Every Monday at midnight
*-*-* 02:00:00 - Every day at 2am
*-*-* 02,14:00 - Twice daily at 2am and 2pm
Mon *-*-* 03:00 - Every Monday at 3am
`,
RunE: func(cmd *cobra.Command, args []string) error {
// Handle --status flag
if installStatus {
return runInstallStatus(cmd.Context())
}
return runInstall(cmd.Context())
},
}
// uninstallCmd represents the uninstall command
var uninstallCmd = &cobra.Command{
Use: "uninstall [instance]",
Short: "Uninstall dbbackup systemd service",
Long: `Uninstall dbbackup systemd service and timer.
Examples:
# Uninstall default instance
sudo dbbackup uninstall
# Uninstall specific instance
sudo dbbackup uninstall production
# Uninstall and remove all configuration
sudo dbbackup uninstall --purge
`,
RunE: func(cmd *cobra.Command, args []string) error {
instance := "cluster"
if len(args) > 0 {
instance = args[0]
}
return runUninstall(cmd.Context(), instance)
},
}
func init() {
rootCmd.AddCommand(installCmd)
rootCmd.AddCommand(uninstallCmd)
// Install flags
installCmd.Flags().StringVarP(&installInstance, "instance", "i", "", "Instance name (e.g., production, staging)")
installCmd.Flags().StringVarP(&installSchedule, "schedule", "s", "daily", "Backup schedule (OnCalendar format)")
installCmd.Flags().StringVarP(&installBackupType, "backup-type", "t", "cluster", "Backup type: single or cluster")
installCmd.Flags().StringVar(&installUser, "user", "dbbackup", "System user to run backups")
installCmd.Flags().StringVar(&installGroup, "group", "dbbackup", "System group for backup user")
installCmd.Flags().StringVar(&installBackupDir, "backup-dir", "/var/lib/dbbackup/backups", "Directory for backups")
installCmd.Flags().StringVar(&installConfigPath, "config-path", "/etc/dbbackup/dbbackup.conf", "Path to config file")
installCmd.Flags().IntVar(&installTimeout, "timeout", 3600, "Backup timeout in seconds")
installCmd.Flags().BoolVar(&installWithMetrics, "with-metrics", false, "Install Prometheus metrics exporter")
installCmd.Flags().IntVar(&installMetricsPort, "metrics-port", 9399, "Prometheus metrics port")
installCmd.Flags().BoolVar(&installDryRun, "dry-run", false, "Show what would be installed without making changes")
installCmd.Flags().BoolVar(&installStatus, "status", false, "Show installation status")
// Uninstall flags
uninstallCmd.Flags().BoolVar(&uninstallPurge, "purge", false, "Also remove configuration files")
}
func runInstall(ctx context.Context) error {
// Create context with signal handling
ctx, cancel := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
defer cancel()
// Expand schedule shortcuts
schedule := expandSchedule(installSchedule)
// Create installer
inst := installer.NewInstaller(log, installDryRun)
// Set up options
opts := installer.InstallOptions{
Instance: installInstance,
BackupType: installBackupType,
Schedule: schedule,
User: installUser,
Group: installGroup,
BackupDir: installBackupDir,
ConfigPath: installConfigPath,
TimeoutSeconds: installTimeout,
WithMetrics: installWithMetrics,
MetricsPort: installMetricsPort,
}
// For cluster backup, override instance
if installBackupType == "cluster" {
opts.Instance = "cluster"
}
return inst.Install(ctx, opts)
}
func runUninstall(ctx context.Context, instance string) error {
ctx, cancel := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
defer cancel()
inst := installer.NewInstaller(log, false)
return inst.Uninstall(ctx, instance, uninstallPurge)
}
func runInstallStatus(ctx context.Context) error {
inst := installer.NewInstaller(log, false)
// Check cluster status
clusterStatus, err := inst.Status(ctx, "cluster")
if err != nil {
return err
}
fmt.Println()
fmt.Println("📦 DBBackup Installation Status")
fmt.Println(strings.Repeat("═", 50))
if clusterStatus.Installed {
fmt.Println()
fmt.Println("🔹 Cluster Backup:")
fmt.Printf(" Service: %s\n", formatStatus(clusterStatus.Installed, clusterStatus.Active))
fmt.Printf(" Timer: %s\n", formatStatus(clusterStatus.TimerEnabled, clusterStatus.TimerActive))
if clusterStatus.NextRun != "" {
fmt.Printf(" Next run: %s\n", clusterStatus.NextRun)
}
if clusterStatus.LastRun != "" {
fmt.Printf(" Last run: %s\n", clusterStatus.LastRun)
}
} else {
fmt.Println()
fmt.Println("❌ No systemd services installed")
fmt.Println()
fmt.Println("Run 'sudo dbbackup install' to install as a systemd service")
}
// Check for exporter
if _, err := os.Stat("/etc/systemd/system/dbbackup-exporter.service"); err == nil {
fmt.Println()
fmt.Println("🔹 Metrics Exporter:")
// Check if exporter is active using systemctl
cmd := exec.CommandContext(ctx, "systemctl", "is-active", "dbbackup-exporter")
if err := cmd.Run(); err == nil {
fmt.Printf(" Service: ✅ active\n")
} else {
fmt.Printf(" Service: ⚪ inactive\n")
}
}
fmt.Println()
return nil
}
func formatStatus(installed, active bool) string {
if !installed {
return "not installed"
}
if active {
return "✅ active"
}
return "⚪ inactive"
}
func expandSchedule(schedule string) string {
shortcuts := map[string]string{
"hourly": "*-*-* *:00:00",
"daily": "*-*-* 02:00:00",
"weekly": "Mon *-*-* 02:00:00",
"monthly": "*-*-01 02:00:00",
}
if expanded, ok := shortcuts[strings.ToLower(schedule)]; ok {
return expanded
}
return schedule
}

cmd/metrics.go (new file, 138 lines)

@@ -0,0 +1,138 @@
package cmd
import (
"context"
"fmt"
"os"
"os/signal"
"syscall"
"dbbackup/internal/prometheus"
"github.com/spf13/cobra"
)
var (
metricsInstance string
metricsOutput string
metricsPort int
)
// metricsCmd represents the metrics command
var metricsCmd = &cobra.Command{
Use: "metrics",
Short: "Prometheus metrics management",
Long: `Prometheus metrics management for dbbackup.
Export metrics to a textfile for node_exporter, or run an HTTP server
for direct Prometheus scraping.`,
}
// metricsExportCmd exports metrics to a textfile
var metricsExportCmd = &cobra.Command{
Use: "export",
Short: "Export metrics to textfile",
Long: `Export Prometheus metrics to a textfile for node_exporter.
The textfile collector in node_exporter can scrape metrics from files
in a designated directory (typically /var/lib/node_exporter/textfile_collector/).
Examples:
# Export metrics to default location
dbbackup metrics export
# Export with custom output path
dbbackup metrics export --output /var/lib/dbbackup/metrics/dbbackup.prom
# Export for specific instance
dbbackup metrics export --instance production --output /var/lib/dbbackup/metrics/production.prom
After export, configure node_exporter with:
--collector.textfile.directory=/var/lib/dbbackup/metrics/
`,
RunE: func(cmd *cobra.Command, args []string) error {
return runMetricsExport(cmd.Context())
},
}
// metricsServeCmd runs the HTTP metrics server
var metricsServeCmd = &cobra.Command{
Use: "serve",
Short: "Run Prometheus HTTP server",
Long: `Run an HTTP server exposing Prometheus metrics.
This starts a long-running daemon that serves metrics at /metrics.
Prometheus can scrape this endpoint directly.
Examples:
# Start server on default port 9399
dbbackup metrics serve
# Start server on custom port
dbbackup metrics serve --port 9100
# Run as systemd service (installed via 'dbbackup install --with-metrics')
sudo systemctl start dbbackup-exporter
Endpoints:
/metrics - Prometheus metrics
/health - Health check (returns 200 OK)
/ - Service info page
`,
RunE: func(cmd *cobra.Command, args []string) error {
return runMetricsServe(cmd.Context())
},
}
func init() {
rootCmd.AddCommand(metricsCmd)
metricsCmd.AddCommand(metricsExportCmd)
metricsCmd.AddCommand(metricsServeCmd)
// Export flags
metricsExportCmd.Flags().StringVar(&metricsInstance, "instance", "default", "Instance name for metrics labels")
metricsExportCmd.Flags().StringVarP(&metricsOutput, "output", "o", "/var/lib/dbbackup/metrics/dbbackup.prom", "Output file path")
// Serve flags
metricsServeCmd.Flags().StringVar(&metricsInstance, "instance", "default", "Instance name for metrics labels")
metricsServeCmd.Flags().IntVarP(&metricsPort, "port", "p", 9399, "HTTP server port")
}
func runMetricsExport(ctx context.Context) error {
// Open catalog
cat, err := openCatalog()
if err != nil {
return fmt.Errorf("failed to open catalog: %w", err)
}
defer cat.Close()
// Create metrics writer
writer := prometheus.NewMetricsWriter(log, cat, metricsInstance)
// Write textfile
if err := writer.WriteTextfile(metricsOutput); err != nil {
return fmt.Errorf("failed to write metrics: %w", err)
}
log.Info("Exported metrics to textfile", "path", metricsOutput, "instance", metricsInstance)
return nil
}
func runMetricsServe(ctx context.Context) error {
// Setup signal handling
ctx, cancel := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
defer cancel()
// Open catalog
cat, err := openCatalog()
if err != nil {
return fmt.Errorf("failed to open catalog: %w", err)
}
defer cat.Close()
// Create exporter
exporter := prometheus.NewExporter(log, cat, metricsInstance, metricsPort)
// Run server (blocks until context is cancelled)
return exporter.Serve(ctx)
}


@@ -33,6 +33,13 @@ var (
restoreNoProgress bool
restoreWorkdir string
restoreCleanCluster bool
restoreDiagnose bool // Run diagnosis before restore
restoreSaveDebugLog string // Path to save debug log on failure
// Diagnose flags
diagnoseJSON bool
diagnoseDeep bool
diagnoseKeepTemp bool
// Encryption flags
restoreEncryptionKeyFile string
@@ -214,12 +221,53 @@ Examples:
RunE: runRestorePITR,
}
// restoreDiagnoseCmd diagnoses backup files before restore
var restoreDiagnoseCmd = &cobra.Command{
Use: "diagnose [archive-file]",
Short: "Diagnose backup file integrity and format",
Long: `Perform deep analysis of backup files to detect issues before restore.
This command validates backup archives and provides detailed diagnostics
including truncation detection, format verification, and COPY block integrity.
Use this when:
- Restore fails with syntax errors
- You suspect backup corruption or truncation
- You want to verify backup integrity before restore
- Restore reports millions of errors
Checks performed:
- File format detection (custom dump vs SQL)
- PGDMP signature verification
- Gzip integrity validation
- COPY block termination check
- pg_restore --list verification
- Cluster archive structure validation
Examples:
# Diagnose a single dump file
dbbackup restore diagnose mydb.dump.gz
# Diagnose with verbose output
dbbackup restore diagnose mydb.sql.gz --verbose
# Diagnose cluster archive and all contained dumps
dbbackup restore diagnose cluster_backup.tar.gz --deep
# Output as JSON for scripting
dbbackup restore diagnose mydb.dump --json
`,
Args: cobra.ExactArgs(1),
RunE: runRestoreDiagnose,
}
func init() {
rootCmd.AddCommand(restoreCmd)
restoreCmd.AddCommand(restoreSingleCmd)
restoreCmd.AddCommand(restoreClusterCmd)
restoreCmd.AddCommand(restoreListCmd)
restoreCmd.AddCommand(restorePITRCmd)
restoreCmd.AddCommand(restoreDiagnoseCmd)
// Single restore flags
restoreSingleCmd.Flags().BoolVar(&restoreConfirm, "confirm", false, "Confirm and execute restore (required)")
@@ -232,6 +280,8 @@ func init() {
restoreSingleCmd.Flags().BoolVar(&restoreNoProgress, "no-progress", false, "Disable progress indicators")
restoreSingleCmd.Flags().StringVar(&restoreEncryptionKeyFile, "encryption-key-file", "", "Path to encryption key file (required for encrypted backups)")
restoreSingleCmd.Flags().StringVar(&restoreEncryptionKeyEnv, "encryption-key-env", "DBBACKUP_ENCRYPTION_KEY", "Environment variable containing encryption key")
restoreSingleCmd.Flags().BoolVar(&restoreDiagnose, "diagnose", false, "Run deep diagnosis before restore to detect corruption/truncation")
restoreSingleCmd.Flags().StringVar(&restoreSaveDebugLog, "save-debug-log", "", "Save detailed error report to file on failure (e.g., /tmp/restore-debug.json)")
// Cluster restore flags
restoreClusterCmd.Flags().BoolVar(&restoreConfirm, "confirm", false, "Confirm and execute restore (required)")
@@ -244,6 +294,8 @@ func init() {
restoreClusterCmd.Flags().BoolVar(&restoreNoProgress, "no-progress", false, "Disable progress indicators")
restoreClusterCmd.Flags().StringVar(&restoreEncryptionKeyFile, "encryption-key-file", "", "Path to encryption key file (required for encrypted backups)")
restoreClusterCmd.Flags().StringVar(&restoreEncryptionKeyEnv, "encryption-key-env", "DBBACKUP_ENCRYPTION_KEY", "Environment variable containing encryption key")
restoreClusterCmd.Flags().BoolVar(&restoreDiagnose, "diagnose", false, "Run deep diagnosis on all dumps before restore")
restoreClusterCmd.Flags().StringVar(&restoreSaveDebugLog, "save-debug-log", "", "Save detailed error report to file on failure (e.g., /tmp/restore-debug.json)")
// PITR restore flags
restorePITRCmd.Flags().StringVar(&pitrBaseBackup, "base-backup", "", "Path to base backup file (.tar.gz) (required)")
@@ -264,6 +316,117 @@ func init() {
restorePITRCmd.MarkFlagRequired("base-backup")
restorePITRCmd.MarkFlagRequired("wal-archive")
restorePITRCmd.MarkFlagRequired("target-dir")
// Diagnose flags
restoreDiagnoseCmd.Flags().BoolVar(&diagnoseJSON, "json", false, "Output diagnosis as JSON")
restoreDiagnoseCmd.Flags().BoolVar(&diagnoseDeep, "deep", false, "For cluster archives, extract and diagnose all contained dumps")
restoreDiagnoseCmd.Flags().BoolVar(&diagnoseKeepTemp, "keep-temp", false, "Keep temporary extraction directory (for debugging)")
restoreDiagnoseCmd.Flags().BoolVar(&restoreVerbose, "verbose", false, "Show detailed analysis progress")
}
// runRestoreDiagnose diagnoses backup files
func runRestoreDiagnose(cmd *cobra.Command, args []string) error {
archivePath := args[0]
// Convert to absolute path
if !filepath.IsAbs(archivePath) {
absPath, err := filepath.Abs(archivePath)
if err != nil {
return fmt.Errorf("invalid archive path: %w", err)
}
archivePath = absPath
}
// Check if file exists
if _, err := os.Stat(archivePath); err != nil {
return fmt.Errorf("archive not found: %s", archivePath)
}
log.Info("🔍 Diagnosing backup file", "path", archivePath)
diagnoser := restore.NewDiagnoser(log, restoreVerbose)
// Check if it's a cluster archive that needs deep analysis
format := restore.DetectArchiveFormat(archivePath)
if format.IsClusterBackup() && diagnoseDeep {
// Create temp directory for extraction
tempDir, err := os.MkdirTemp("", "dbbackup-diagnose-*")
if err != nil {
return fmt.Errorf("failed to create temp directory: %w", err)
}
if !diagnoseKeepTemp {
defer os.RemoveAll(tempDir)
} else {
log.Info("Temp directory preserved", "path", tempDir)
}
log.Info("Extracting cluster archive for deep analysis...")
// Extract and diagnose all dumps
results, err := diagnoser.DiagnoseClusterDumps(archivePath, tempDir)
if err != nil {
return fmt.Errorf("cluster diagnosis failed: %w", err)
}
// Output results
var hasErrors bool
for _, result := range results {
if diagnoseJSON {
diagnoser.PrintDiagnosisJSON(result)
} else {
diagnoser.PrintDiagnosis(result)
}
if !result.IsValid {
hasErrors = true
}
}
// Summary
if !diagnoseJSON {
fmt.Println("\n" + strings.Repeat("=", 70))
fmt.Printf("📊 CLUSTER SUMMARY: %d databases analyzed\n", len(results))
validCount := 0
for _, r := range results {
if r.IsValid {
validCount++
}
}
if validCount == len(results) {
fmt.Println("✅ All dumps are valid")
} else {
fmt.Printf("❌ %d/%d dumps have issues\n", len(results)-validCount, len(results))
}
fmt.Println(strings.Repeat("=", 70))
}
if hasErrors {
return fmt.Errorf("one or more dumps have validation errors")
}
return nil
}
// Single file diagnosis
result, err := diagnoser.DiagnoseFile(archivePath)
if err != nil {
return fmt.Errorf("diagnosis failed: %w", err)
}
if diagnoseJSON {
diagnoser.PrintDiagnosisJSON(result)
} else {
diagnoser.PrintDiagnosis(result)
}
if !result.IsValid {
return fmt.Errorf("backup file has validation errors")
}
log.Info("✅ Backup file appears valid")
return nil
}
// runRestoreSingle restores a single database
@@ -401,6 +564,12 @@ func runRestoreSingle(cmd *cobra.Command, args []string) error {
// Create restore engine
engine := restore.New(cfg, log, db)
// Enable debug logging if requested
if restoreSaveDebugLog != "" {
engine.SetDebugLogPath(restoreSaveDebugLog)
log.Info("Debug logging enabled", "output", restoreSaveDebugLog)
}
// Setup signal handling
ctx, cancel := context.WithCancel(context.Background())
@@ -416,6 +585,37 @@ func runRestoreSingle(cmd *cobra.Command, args []string) error {
cancel()
}()
// Run pre-restore diagnosis if requested
if restoreDiagnose {
log.Info("🔍 Running pre-restore diagnosis...")
diagnoser := restore.NewDiagnoser(log, restoreVerbose)
result, err := diagnoser.DiagnoseFile(archivePath)
if err != nil {
return fmt.Errorf("diagnosis failed: %w", err)
}
diagnoser.PrintDiagnosis(result)
if !result.IsValid {
log.Error("❌ Pre-restore diagnosis found issues")
if result.IsTruncated {
log.Error(" The backup file appears to be TRUNCATED")
}
if result.IsCorrupted {
log.Error(" The backup file appears to be CORRUPTED")
}
fmt.Println("\nUse --force to attempt restore anyway.")
if !restoreForce {
return fmt.Errorf("aborting restore due to backup file issues")
}
log.Warn("Continuing despite diagnosis errors (--force enabled)")
} else {
log.Info("✅ Backup file passed diagnosis")
}
}
// Execute restore
log.Info("Starting restore...", "database", targetDB)
@@ -584,6 +784,12 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
// Create restore engine
engine := restore.New(cfg, log, db)
// Enable debug logging if requested
if restoreSaveDebugLog != "" {
engine.SetDebugLogPath(restoreSaveDebugLog)
log.Info("Debug logging enabled", "output", restoreSaveDebugLog)
}
// Setup signal handling
ctx, cancel := context.WithCancel(context.Background())
@@ -620,6 +826,52 @@ func runRestoreCluster(cmd *cobra.Command, args []string) error {
log.Info("Database cleanup completed")
}
// Run pre-restore diagnosis if requested
if restoreDiagnose {
log.Info("🔍 Running pre-restore diagnosis...")
// Create temp directory for extraction
diagTempDir, err := os.MkdirTemp("", "dbbackup-diagnose-*")
if err != nil {
return fmt.Errorf("failed to create temp directory for diagnosis: %w", err)
}
defer os.RemoveAll(diagTempDir)
diagnoser := restore.NewDiagnoser(log, restoreVerbose)
results, err := diagnoser.DiagnoseClusterDumps(archivePath, diagTempDir)
if err != nil {
return fmt.Errorf("diagnosis failed: %w", err)
}
// Check for any invalid dumps
var invalidDumps []string
for _, result := range results {
if !result.IsValid {
invalidDumps = append(invalidDumps, result.FileName)
diagnoser.PrintDiagnosis(result)
}
}
if len(invalidDumps) > 0 {
log.Error("❌ Pre-restore diagnosis found issues",
"invalid_dumps", len(invalidDumps),
"total_dumps", len(results))
fmt.Println("\n⚠ The following dumps have issues and will likely fail during restore:")
for _, name := range invalidDumps {
fmt.Printf(" - %s\n", name)
}
fmt.Println("\nRun 'dbbackup restore diagnose <archive> --deep' for full details.")
fmt.Println("Use --force to attempt restore anyway.")
if !restoreForce {
return fmt.Errorf("aborting restore due to %d invalid dump(s)", len(invalidDumps))
}
log.Warn("Continuing despite diagnosis errors (--force enabled)")
} else {
log.Info("✅ All dumps passed diagnosis", "count", len(results))
}
}
// Execute cluster restore
log.Info("Starting cluster restore...")


@@ -502,7 +502,23 @@ func (e *Engine) BackupCluster(ctx context.Context) error {
cmd := e.db.BuildBackupCommand(name, dumpFile, options)
dbCtx, cancel := context.WithTimeout(ctx, 2*time.Hour)
// Calculate timeout based on database size:
// - Minimum 2 hours for small databases
// - Add 1 hour per 20GB for large databases
// - Example: a ~69GB database gets 2 + (69/20 + 1) = 6 hours
timeout := 2 * time.Hour
if size, err := e.db.GetDatabaseSize(ctx, name); err == nil {
sizeGB := size / (1024 * 1024 * 1024)
if sizeGB > 20 {
extraHours := (sizeGB / 20) + 1
timeout = time.Duration(2+extraHours) * time.Hour
mu.Lock()
e.printf(" Extended timeout: %v (for %dGB database)\n", timeout, sizeGB)
mu.Unlock()
}
}
dbCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
err := e.executeCommand(dbCtx, cmd, dumpFile)
cancel()
@@ -1352,20 +1368,53 @@ func (e *Engine) executeWithStreamingCompression(ctx context.Context, cmdArgs []
// Then start pg_dump
if err := dumpCmd.Start(); err != nil {
compressCmd.Process.Kill()
return fmt.Errorf("failed to start pg_dump: %w", err)
}
// Wait for pg_dump to complete
if err := dumpCmd.Wait(); err != nil {
return fmt.Errorf("pg_dump failed: %w", err)
// Wait for pg_dump in a goroutine to handle context timeout properly
// This prevents deadlock if pipe buffer fills and pg_dump blocks
dumpDone := make(chan error, 1)
go func() {
dumpDone <- dumpCmd.Wait()
}()
var dumpErr error
select {
case dumpErr = <-dumpDone:
// pg_dump completed (success or failure)
case <-ctx.Done():
// Context cancelled/timeout - kill pg_dump to unblock
e.log.Warn("Backup timeout - killing pg_dump process")
dumpCmd.Process.Kill()
<-dumpDone // Wait for goroutine to finish
dumpErr = ctx.Err()
}
// Close stdout pipe to signal compressor we're done
// This MUST happen after pg_dump exits to avoid broken pipe
dumpStdout.Close()
// Wait for compression to complete
if err := compressCmd.Wait(); err != nil {
return fmt.Errorf("compression failed: %w", err)
compressErr := compressCmd.Wait()
// Check errors - compressor failure first (it's usually the root cause)
if compressErr != nil {
e.log.Error("Compressor failed", "error", compressErr)
return fmt.Errorf("compression failed (check disk space): %w", compressErr)
}
if dumpErr != nil {
// Check for SIGPIPE (exit code 141) - indicates compressor died first
if exitErr, ok := dumpErr.(*exec.ExitError); ok && exitErr.ExitCode() == 141 {
e.log.Error("pg_dump received SIGPIPE - compressor may have failed")
return fmt.Errorf("pg_dump broken pipe - check disk space and compressor")
}
return fmt.Errorf("pg_dump failed: %w", dumpErr)
}
// Sync file to disk to ensure durability (prevents truncation on power loss)
if err := outFile.Sync(); err != nil {
e.log.Warn("Failed to sync output file", "error", err)
}
e.log.Debug("Streaming compression completed", "output", compressedFile)


@@ -64,6 +64,9 @@ type Config struct {
// Cluster parallelism
ClusterParallelism int // Number of concurrent databases during cluster operations (0 = sequential)
// Working directory for large operations (extraction, diagnosis)
WorkDir string // Alternative temp directory for large operations (default: system temp)
// Swap file management (for large backups)
SwapFilePath string // Path to temporary swap file
SwapFileSizeGB int // Size in GB (0 = disabled)


@@ -22,6 +22,7 @@ type LocalConfig struct {
// Backup settings
BackupDir string
WorkDir string // Working directory for large operations
Compression int
Jobs int
DumpJobs int
@@ -97,6 +98,8 @@ func LoadLocalConfig() (*LocalConfig, error) {
switch key {
case "backup_dir":
cfg.BackupDir = value
case "work_dir":
cfg.WorkDir = value
case "compression":
if c, err := strconv.Atoi(value); err == nil {
cfg.Compression = c
@@ -174,6 +177,9 @@ func SaveLocalConfig(cfg *LocalConfig) error {
if cfg.BackupDir != "" {
sb.WriteString(fmt.Sprintf("backup_dir = %s\n", cfg.BackupDir))
}
if cfg.WorkDir != "" {
sb.WriteString(fmt.Sprintf("work_dir = %s\n", cfg.WorkDir))
}
if cfg.Compression != 0 {
sb.WriteString(fmt.Sprintf("compression = %d\n", cfg.Compression))
}
@@ -244,6 +250,9 @@ func ApplyLocalConfig(cfg *Config, local *LocalConfig) {
if local.BackupDir != "" {
cfg.BackupDir = local.BackupDir
}
if local.WorkDir != "" {
cfg.WorkDir = local.WorkDir
}
if cfg.CompressionLevel == 6 && local.Compression != 0 {
cfg.CompressionLevel = local.Compression
}
@@ -280,6 +289,7 @@ func ConfigFromConfig(cfg *Config) *LocalConfig {
Database: cfg.Database,
SSLMode: cfg.SSLMode,
BackupDir: cfg.BackupDir,
WorkDir: cfg.WorkDir,
Compression: cfg.CompressionLevel,
Jobs: cfg.Jobs,
DumpJobs: cfg.DumpJobs,


@@ -126,13 +126,46 @@ func (m *MySQL) ListTables(ctx context.Context, database string) ([]string, erro
return tables, rows.Err()
}
// validateMySQLIdentifier checks if a database/table name is safe for use in SQL
// Prevents SQL injection by only allowing alphanumeric names with underscores
func validateMySQLIdentifier(name string) error {
if len(name) == 0 {
return fmt.Errorf("identifier cannot be empty")
}
if len(name) > 64 {
return fmt.Errorf("identifier too long (max 64 chars): %s", name)
}
// Only allow alphanumeric, underscores, and must start with letter or underscore
for i, c := range name {
if i == 0 && !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') {
return fmt.Errorf("identifier must start with letter or underscore: %s", name)
}
if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_') {
return fmt.Errorf("identifier contains invalid character %q: %s", c, name)
}
}
return nil
}
// quoteMySQLIdentifier safely quotes a MySQL identifier
func quoteMySQLIdentifier(name string) string {
// Escape any backticks by doubling them and wrap in backticks
return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}
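A quick sketch of what the validator accepts and rejects, using illustrative names (the printed errors come from the messages above):

	for _, name := range []string{"app_db", "1bad", "drop;--"} {
		fmt.Println(name, validateMySQLIdentifier(name))
	}
	// app_db  <nil>
	// 1bad    identifier must start with letter or underscore: 1bad
	// drop;-- identifier contains invalid character ';': drop;--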
// CreateDatabase creates a new database
func (m *MySQL) CreateDatabase(ctx context.Context, name string) error {
if m.db == nil {
return fmt.Errorf("not connected to database")
}
query := fmt.Sprintf("CREATE DATABASE IF NOT EXISTS `%s`", name)
// Validate identifier to prevent SQL injection
if err := validateMySQLIdentifier(name); err != nil {
return fmt.Errorf("invalid database name: %w", err)
}
// Use safe quoting for identifier
query := fmt.Sprintf("CREATE DATABASE IF NOT EXISTS %s", quoteMySQLIdentifier(name))
_, err := m.db.ExecContext(ctx, query)
if err != nil {
return fmt.Errorf("failed to create database %s: %w", name, err)
@@ -148,7 +181,13 @@ func (m *MySQL) DropDatabase(ctx context.Context, name string) error {
return fmt.Errorf("not connected to database")
}
query := fmt.Sprintf("DROP DATABASE IF EXISTS `%s`", name)
// Validate identifier to prevent SQL injection
if err := validateMySQLIdentifier(name); err != nil {
return fmt.Errorf("invalid database name: %w", err)
}
// Use safe quoting for identifier
query := fmt.Sprintf("DROP DATABASE IF EXISTS %s", quoteMySQLIdentifier(name))
_, err := m.db.ExecContext(ctx, query)
if err != nil {
return fmt.Errorf("failed to drop database %s: %w", name, err)

View File

@@ -163,14 +163,47 @@ func (p *PostgreSQL) ListTables(ctx context.Context, database string) ([]string,
return tables, rows.Err()
}
// validateIdentifier checks if a database/table name is safe for use in SQL
// Prevents SQL injection by only allowing alphanumeric names with underscores
func validateIdentifier(name string) error {
if len(name) == 0 {
return fmt.Errorf("identifier cannot be empty")
}
if len(name) > 63 {
return fmt.Errorf("identifier too long (max 63 chars): %s", name)
}
// Only allow alphanumeric, underscores, and must start with letter or underscore
for i, c := range name {
if i == 0 && !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') {
return fmt.Errorf("identifier must start with letter or underscore: %s", name)
}
if !((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_') {
return fmt.Errorf("identifier contains invalid character %q: %s", c, name)
}
}
return nil
}
// quoteIdentifier safely quotes a PostgreSQL identifier
func quoteIdentifier(name string) string {
// Double any existing double quotes and wrap in double quotes
return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}
// CreateDatabase creates a new database
func (p *PostgreSQL) CreateDatabase(ctx context.Context, name string) error {
if p.db == nil {
return fmt.Errorf("not connected to database")
}
// Validate identifier to prevent SQL injection
if err := validateIdentifier(name); err != nil {
return fmt.Errorf("invalid database name: %w", err)
}
// PostgreSQL doesn't support CREATE DATABASE in transactions or prepared statements
query := fmt.Sprintf("CREATE DATABASE %s", name)
// Use quoted identifier for safety
query := fmt.Sprintf("CREATE DATABASE %s", quoteIdentifier(name))
_, err := p.db.ExecContext(ctx, query)
if err != nil {
return fmt.Errorf("failed to create database %s: %w", name, err)
@@ -186,8 +219,14 @@ func (p *PostgreSQL) DropDatabase(ctx context.Context, name string) error {
return fmt.Errorf("not connected to database")
}
// Validate identifier to prevent SQL injection
if err := validateIdentifier(name); err != nil {
return fmt.Errorf("invalid database name: %w", err)
}
// Force drop connections and drop database
query := fmt.Sprintf("DROP DATABASE IF EXISTS %s", name)
// Use quoted identifier for safety
query := fmt.Sprintf("DROP DATABASE IF EXISTS %s", quoteIdentifier(name))
_, err := p.db.ExecContext(ctx, query)
if err != nil {
return fmt.Errorf("failed to drop database %s: %w", name, err)

View File

@@ -339,7 +339,7 @@ func (e *CloneEngine) Backup(ctx context.Context, opts *BackupOptions) (*BackupR
// Save metadata
meta := &metadata.BackupMetadata{
Version: "3.1.0",
Version: "3.40.0",
Timestamp: startTime,
Database: opts.Database,
DatabaseType: "mysql",

View File

@@ -254,7 +254,7 @@ func (e *MySQLDumpEngine) Backup(ctx context.Context, opts *BackupOptions) (*Bac
// Save metadata
meta := &metadata.BackupMetadata{
Version: "3.1.0",
Version: "3.40.0",
Timestamp: startTime,
Database: opts.Database,
DatabaseType: "mysql",

View File

@@ -223,7 +223,7 @@ func (e *SnapshotEngine) Backup(ctx context.Context, opts *BackupOptions) (*Back
// Save metadata
meta := &metadata.BackupMetadata{
Version: "3.1.0",
Version: "3.40.0",
Timestamp: startTime,
Database: opts.Database,
DatabaseType: "mysql",

View File

@@ -0,0 +1,11 @@
// Package installer provides systemd service installation for dbbackup
package installer
import (
"embed"
)
// Templates contains embedded systemd unit files
//
//go:embed templates/*.service templates/*.timer
var Templates embed.FS

View File

@@ -0,0 +1,680 @@
// Package installer provides systemd service installation for dbbackup
package installer
import (
"context"
"fmt"
"io"
"os"
"os/exec"
"os/user"
"path/filepath"
"runtime"
"strings"
"text/template"
"dbbackup/internal/logger"
)
// Installer handles systemd service installation
type Installer struct {
log logger.Logger
unitDir string // /etc/systemd/system or custom
dryRun bool
}
// InstallOptions configures the installation
type InstallOptions struct {
// Instance name (e.g., "production", "staging")
Instance string
// Binary path (auto-detected if empty)
BinaryPath string
// Backup configuration
BackupType string // "single" or "cluster"
Schedule string // OnCalendar format, e.g., "daily", "*-*-* 02:00:00"
// Service user/group
User string
Group string
// Paths
BackupDir string
ConfigPath string
// Timeout in seconds (default: 3600)
TimeoutSeconds int
// Metrics
WithMetrics bool
MetricsPort int
}
// ServiceStatus contains information about installed services
type ServiceStatus struct {
Installed bool
Enabled bool
Active bool
TimerEnabled bool
TimerActive bool
LastRun string
NextRun string
ServicePath string
TimerPath string
ExporterPath string
}
// NewInstaller creates a new Installer
func NewInstaller(log logger.Logger, dryRun bool) *Installer {
return &Installer{
log: log,
unitDir: "/etc/systemd/system",
dryRun: dryRun,
}
}
// SetUnitDir allows overriding the systemd unit directory (for testing)
func (i *Installer) SetUnitDir(dir string) {
i.unitDir = dir
}
// Install installs the systemd service and timer
func (i *Installer) Install(ctx context.Context, opts InstallOptions) error {
// Validate platform
if runtime.GOOS != "linux" {
return fmt.Errorf("systemd installation only supported on Linux (current: %s)", runtime.GOOS)
}
// Validate prerequisites
if err := i.validatePrerequisites(); err != nil {
return err
}
// Set defaults
if err := i.setDefaults(&opts); err != nil {
return err
}
// Create user if needed
if err := i.ensureUser(opts.User, opts.Group); err != nil {
return err
}
// Create directories
if err := i.createDirectories(opts); err != nil {
return err
}
// Copy binary to /usr/local/bin (required for ProtectHome=yes)
if err := i.copyBinary(&opts); err != nil {
return err
}
// Write service and timer files
if err := i.writeUnitFiles(opts); err != nil {
return err
}
// Reload systemd
if err := i.systemctl(ctx, "daemon-reload"); err != nil {
return err
}
// Enable timer
timerName := i.getTimerName(opts)
if err := i.systemctl(ctx, "enable", timerName); err != nil {
return err
}
// Install metrics exporter if requested
if opts.WithMetrics {
if err := i.installExporter(ctx, opts); err != nil {
i.log.Warn("Failed to install metrics exporter", "error", err)
}
}
i.log.Info("Installation complete",
"instance", opts.Instance,
"timer", timerName,
"schedule", opts.Schedule)
i.printNextSteps(opts)
return nil
}
// Uninstall removes the systemd service and timer
func (i *Installer) Uninstall(ctx context.Context, instance string, purge bool) error {
if runtime.GOOS != "linux" {
return fmt.Errorf("systemd uninstallation only supported on Linux")
}
if err := i.validatePrerequisites(); err != nil {
return err
}
// Determine service names
var serviceName, timerName string
if instance == "cluster" || instance == "" {
serviceName = "dbbackup-cluster.service"
timerName = "dbbackup-cluster.timer"
} else {
serviceName = fmt.Sprintf("dbbackup@%s.service", instance)
timerName = fmt.Sprintf("dbbackup@%s.timer", instance)
}
// Stop and disable timer
_ = i.systemctl(ctx, "stop", timerName)
_ = i.systemctl(ctx, "disable", timerName)
// Stop and disable service
_ = i.systemctl(ctx, "stop", serviceName)
_ = i.systemctl(ctx, "disable", serviceName)
// Remove unit files
servicePath := filepath.Join(i.unitDir, serviceName)
timerPath := filepath.Join(i.unitDir, timerName)
if !i.dryRun {
os.Remove(servicePath)
os.Remove(timerPath)
} else {
i.log.Info("Would remove", "service", servicePath)
i.log.Info("Would remove", "timer", timerPath)
}
// Also try to remove template units if they exist
if instance != "cluster" && instance != "" {
templateService := filepath.Join(i.unitDir, "dbbackup@.service")
templateTimer := filepath.Join(i.unitDir, "dbbackup@.timer")
// Only remove templates if no other instances are using them
if i.canRemoveTemplates() {
if !i.dryRun {
os.Remove(templateService)
os.Remove(templateTimer)
}
}
}
// Remove exporter
exporterPath := filepath.Join(i.unitDir, "dbbackup-exporter.service")
_ = i.systemctl(ctx, "stop", "dbbackup-exporter.service")
_ = i.systemctl(ctx, "disable", "dbbackup-exporter.service")
if !i.dryRun {
os.Remove(exporterPath)
}
// Reload systemd
_ = i.systemctl(ctx, "daemon-reload")
// Purge config files if requested
if purge {
configDirs := []string{
"/etc/dbbackup",
"/var/lib/dbbackup",
}
for _, dir := range configDirs {
if !i.dryRun {
if err := os.RemoveAll(dir); err != nil {
i.log.Warn("Failed to remove directory", "path", dir, "error", err)
} else {
i.log.Info("Removed directory", "path", dir)
}
} else {
i.log.Info("Would remove directory", "path", dir)
}
}
}
i.log.Info("Uninstallation complete", "instance", instance, "purge", purge)
return nil
}
// Status returns the current installation status
func (i *Installer) Status(ctx context.Context, instance string) (*ServiceStatus, error) {
if runtime.GOOS != "linux" {
return nil, fmt.Errorf("systemd status only supported on Linux")
}
status := &ServiceStatus{}
// Determine service names
var serviceName, timerName string
if instance == "cluster" || instance == "" {
serviceName = "dbbackup-cluster.service"
timerName = "dbbackup-cluster.timer"
} else {
serviceName = fmt.Sprintf("dbbackup@%s.service", instance)
timerName = fmt.Sprintf("dbbackup@%s.timer", instance)
}
// Check service file exists
status.ServicePath = filepath.Join(i.unitDir, serviceName)
if _, err := os.Stat(status.ServicePath); err == nil {
status.Installed = true
}
// Check timer file exists
status.TimerPath = filepath.Join(i.unitDir, timerName)
// Check exporter
status.ExporterPath = filepath.Join(i.unitDir, "dbbackup-exporter.service")
// Check enabled/active status
if status.Installed {
status.Enabled = i.isEnabled(ctx, serviceName)
status.Active = i.isActive(ctx, serviceName)
status.TimerEnabled = i.isEnabled(ctx, timerName)
status.TimerActive = i.isActive(ctx, timerName)
// Get timer info
status.NextRun = i.getTimerNext(ctx, timerName)
status.LastRun = i.getTimerLast(ctx, timerName)
}
return status, nil
}
// validatePrerequisites checks system requirements
func (i *Installer) validatePrerequisites() error {
// Check root (skip in dry-run mode)
if os.Getuid() != 0 && !i.dryRun {
return fmt.Errorf("installation requires root privileges (use sudo)")
}
// Check systemd
if _, err := exec.LookPath("systemctl"); err != nil {
return fmt.Errorf("systemctl not found - is this a systemd-based system?")
}
// Check for container environment
if _, err := os.Stat("/.dockerenv"); err == nil {
i.log.Warn("Running inside Docker container - systemd may not work correctly")
}
return nil
}
// setDefaults fills in default values
func (i *Installer) setDefaults(opts *InstallOptions) error {
// Auto-detect binary path
if opts.BinaryPath == "" {
binPath, err := os.Executable()
if err != nil {
return fmt.Errorf("failed to detect binary path: %w", err)
}
binPath, err = filepath.EvalSymlinks(binPath)
if err != nil {
return fmt.Errorf("failed to resolve binary path: %w", err)
}
opts.BinaryPath = binPath
}
// Default instance
if opts.Instance == "" {
opts.Instance = "default"
}
// Default backup type
if opts.BackupType == "" {
opts.BackupType = "single"
}
// Default schedule (daily at 2am)
if opts.Schedule == "" {
opts.Schedule = "*-*-* 02:00:00"
}
// Default user/group
if opts.User == "" {
opts.User = "dbbackup"
}
if opts.Group == "" {
opts.Group = "dbbackup"
}
// Default paths
if opts.BackupDir == "" {
opts.BackupDir = "/var/lib/dbbackup/backups"
}
if opts.ConfigPath == "" {
opts.ConfigPath = "/etc/dbbackup/dbbackup.conf"
}
// Default timeout (1 hour)
if opts.TimeoutSeconds == 0 {
opts.TimeoutSeconds = 3600
}
// Default metrics port
if opts.MetricsPort == 0 {
opts.MetricsPort = 9399
}
return nil
}
// ensureUser creates the service user if it doesn't exist
func (i *Installer) ensureUser(username, groupname string) error {
// Check if user exists
if _, err := user.Lookup(username); err == nil {
i.log.Debug("User already exists", "user", username)
return nil
}
if i.dryRun {
i.log.Info("Would create user", "user", username, "group", groupname)
return nil
}
// Create group first
groupCmd := exec.Command("groupadd", "--system", groupname)
if output, err := groupCmd.CombinedOutput(); err != nil {
// Ignore if group already exists
if !strings.Contains(string(output), "already exists") {
i.log.Debug("Group creation output", "output", string(output))
}
}
// Create user
userCmd := exec.Command("useradd",
"--system",
"--shell", "/usr/sbin/nologin",
"--home-dir", "/var/lib/dbbackup",
"--gid", groupname,
username)
if output, err := userCmd.CombinedOutput(); err != nil {
if !strings.Contains(string(output), "already exists") {
return fmt.Errorf("failed to create user %s: %w (%s)", username, err, output)
}
}
i.log.Info("Created system user", "user", username, "group", groupname)
return nil
}
// createDirectories creates required directories
func (i *Installer) createDirectories(opts InstallOptions) error {
dirs := []struct {
path string
mode os.FileMode
}{
{"/etc/dbbackup", 0755},
{"/etc/dbbackup/env.d", 0700},
{"/var/lib/dbbackup", 0750},
{"/var/lib/dbbackup/backups", 0750},
{"/var/lib/dbbackup/metrics", 0755},
{"/var/log/dbbackup", 0750},
{opts.BackupDir, 0750},
}
for _, d := range dirs {
if i.dryRun {
i.log.Info("Would create directory", "path", d.path, "mode", d.mode)
continue
}
if err := os.MkdirAll(d.path, d.mode); err != nil {
return fmt.Errorf("failed to create directory %s: %w", d.path, err)
}
// Set ownership
u, err := user.Lookup(opts.User)
if err == nil {
var uid, gid int
fmt.Sscanf(u.Uid, "%d", &uid)
fmt.Sscanf(u.Gid, "%d", &gid)
os.Chown(d.path, uid, gid)
}
}
return nil
}
// copyBinary copies the binary to /usr/local/bin for systemd access
// This is required because ProtectHome=yes blocks access to home directories
func (i *Installer) copyBinary(opts *InstallOptions) error {
const installPath = "/usr/local/bin/dbbackup"
// Check if binary is already in a system path
if opts.BinaryPath == installPath {
return nil
}
if i.dryRun {
i.log.Info("Would copy binary", "from", opts.BinaryPath, "to", installPath)
opts.BinaryPath = installPath
return nil
}
// Read source binary
src, err := os.Open(opts.BinaryPath)
if err != nil {
return fmt.Errorf("failed to open source binary: %w", err)
}
defer src.Close()
// Create destination
dst, err := os.OpenFile(installPath, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0755)
if err != nil {
return fmt.Errorf("failed to create %s: %w", installPath, err)
}
defer dst.Close()
// Copy
if _, err := io.Copy(dst, src); err != nil {
return fmt.Errorf("failed to copy binary: %w", err)
}
i.log.Info("Copied binary", "from", opts.BinaryPath, "to", installPath)
opts.BinaryPath = installPath
return nil
}
// writeUnitFiles renders and writes the systemd unit files
func (i *Installer) writeUnitFiles(opts InstallOptions) error {
// Prepare template data
data := map[string]interface{}{
"User": opts.User,
"Group": opts.Group,
"BinaryPath": opts.BinaryPath,
"BackupType": opts.BackupType,
"BackupDir": opts.BackupDir,
"ConfigPath": opts.ConfigPath,
"TimeoutSeconds": opts.TimeoutSeconds,
"Schedule": opts.Schedule,
"MetricsPort": opts.MetricsPort,
}
// Determine which templates to use
var serviceTemplate, timerTemplate string
var serviceName, timerName string
if opts.BackupType == "cluster" {
serviceTemplate = "templates/dbbackup-cluster.service"
timerTemplate = "templates/dbbackup-cluster.timer"
serviceName = "dbbackup-cluster.service"
timerName = "dbbackup-cluster.timer"
} else {
serviceTemplate = "templates/dbbackup@.service"
timerTemplate = "templates/dbbackup@.timer"
serviceName = "dbbackup@.service"
timerName = "dbbackup@.timer"
}
// Write service file
if err := i.writeTemplateFile(serviceTemplate, serviceName, data); err != nil {
return fmt.Errorf("failed to write service file: %w", err)
}
// Write timer file
if err := i.writeTemplateFile(timerTemplate, timerName, data); err != nil {
return fmt.Errorf("failed to write timer file: %w", err)
}
return nil
}
// writeTemplateFile reads an embedded template and writes it to the unit directory
func (i *Installer) writeTemplateFile(templatePath, outputName string, data map[string]interface{}) error {
// Read template
content, err := Templates.ReadFile(templatePath)
if err != nil {
return fmt.Errorf("failed to read template %s: %w", templatePath, err)
}
// Parse template
tmpl, err := template.New(outputName).Parse(string(content))
if err != nil {
return fmt.Errorf("failed to parse template %s: %w", templatePath, err)
}
// Render template
var buf strings.Builder
if err := tmpl.Execute(&buf, data); err != nil {
return fmt.Errorf("failed to render template %s: %w", templatePath, err)
}
// Write file
outputPath := filepath.Join(i.unitDir, outputName)
if i.dryRun {
i.log.Info("Would write unit file", "path", outputPath)
i.log.Debug("Unit file content", "content", buf.String())
return nil
}
if err := os.WriteFile(outputPath, []byte(buf.String()), 0644); err != nil {
return fmt.Errorf("failed to write %s: %w", outputPath, err)
}
i.log.Info("Created unit file", "path", outputPath)
return nil
}
// installExporter installs the metrics exporter service
func (i *Installer) installExporter(ctx context.Context, opts InstallOptions) error {
data := map[string]interface{}{
"User": opts.User,
"Group": opts.Group,
"BinaryPath": opts.BinaryPath,
"ConfigPath": opts.ConfigPath,
"MetricsPort": opts.MetricsPort,
}
if err := i.writeTemplateFile("templates/dbbackup-exporter.service", "dbbackup-exporter.service", data); err != nil {
return err
}
if err := i.systemctl(ctx, "daemon-reload"); err != nil {
return err
}
if err := i.systemctl(ctx, "enable", "dbbackup-exporter.service"); err != nil {
return err
}
if err := i.systemctl(ctx, "start", "dbbackup-exporter.service"); err != nil {
return err
}
i.log.Info("Installed metrics exporter", "port", opts.MetricsPort)
return nil
}
// getTimerName returns the timer unit name for the given options
func (i *Installer) getTimerName(opts InstallOptions) string {
if opts.BackupType == "cluster" {
return "dbbackup-cluster.timer"
}
return fmt.Sprintf("dbbackup@%s.timer", opts.Instance)
}
// systemctl runs a systemctl command
func (i *Installer) systemctl(ctx context.Context, args ...string) error {
if i.dryRun {
i.log.Info("Would run: systemctl", "args", args)
return nil
}
cmd := exec.CommandContext(ctx, "systemctl", args...)
output, err := cmd.CombinedOutput()
if err != nil {
return fmt.Errorf("systemctl %v failed: %w\n%s", args, err, string(output))
}
return nil
}
// isEnabled checks if a unit is enabled
func (i *Installer) isEnabled(ctx context.Context, unit string) bool {
cmd := exec.CommandContext(ctx, "systemctl", "is-enabled", unit)
return cmd.Run() == nil
}
// isActive checks if a unit is active
func (i *Installer) isActive(ctx context.Context, unit string) bool {
cmd := exec.CommandContext(ctx, "systemctl", "is-active", unit)
return cmd.Run() == nil
}
// getTimerNext gets the next run time for a timer
func (i *Installer) getTimerNext(ctx context.Context, timer string) string {
cmd := exec.CommandContext(ctx, "systemctl", "show", timer, "--property=NextElapseUSecRealtime", "--value")
output, err := cmd.Output()
if err != nil {
return ""
}
return strings.TrimSpace(string(output))
}
// getTimerLast gets the last run time for a timer
func (i *Installer) getTimerLast(ctx context.Context, timer string) string {
cmd := exec.CommandContext(ctx, "systemctl", "show", timer, "--property=LastTriggerUSec", "--value")
output, err := cmd.Output()
if err != nil {
return ""
}
return strings.TrimSpace(string(output))
}
// canRemoveTemplates checks if template units can be safely removed
func (i *Installer) canRemoveTemplates() bool {
// Check if any dbbackup@*.service instances exist
pattern := filepath.Join(i.unitDir, "dbbackup@*.service")
matches, _ := filepath.Glob(pattern)
// Also check for running instances
cmd := exec.Command("systemctl", "list-units", "--type=service", "--all", "dbbackup@*")
output, _ := cmd.Output()
return len(matches) == 0 && !strings.Contains(string(output), "dbbackup@")
}
// printNextSteps prints helpful next steps after installation
func (i *Installer) printNextSteps(opts InstallOptions) {
timerName := i.getTimerName(opts)
serviceName := strings.Replace(timerName, ".timer", ".service", 1)
fmt.Println()
fmt.Println("✅ Installation successful!")
fmt.Println()
fmt.Println("📋 Next steps:")
fmt.Println()
fmt.Printf(" 1. Edit configuration: sudo nano %s\n", opts.ConfigPath)
fmt.Printf(" 2. Set credentials: sudo nano /etc/dbbackup/env.d/%s.conf\n", opts.Instance)
fmt.Printf(" 3. Start the timer: sudo systemctl start %s\n", timerName)
fmt.Printf(" 4. Verify timer status: sudo systemctl status %s\n", timerName)
fmt.Printf(" 5. Run backup manually: sudo systemctl start %s\n", serviceName)
fmt.Println()
fmt.Println("📊 View backup logs:")
fmt.Printf(" journalctl -u %s -f\n", serviceName)
fmt.Println()
if opts.WithMetrics {
fmt.Println("📈 Prometheus metrics:")
fmt.Printf(" curl http://localhost:%d/metrics\n", opts.MetricsPort)
fmt.Println()
}
}
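As a usage sketch, a CLI command could drive the installer roughly as follows; only the Installer API comes from this file, while the instance name, schedule, and pre-built logger are illustrative assumptions:

	inst := installer.NewInstaller(log, dryRun)
	opts := installer.InstallOptions{
		Instance:    "production", // hypothetical instance name
		BackupType:  "single",
		Schedule:    "*-*-* 02:00:00",
		WithMetrics: true,
		MetricsPort: 9399,
	}
	if err := inst.Install(ctx, opts); err != nil {
		log.Error("install failed", "error", err)
	}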

View File

@@ -0,0 +1,47 @@
[Unit]
Description=Database Cluster Backup
Documentation=https://github.com/PlusOne/dbbackup
After=network-online.target postgresql.service mysql.service mariadb.service
Wants=network-online.target
[Service]
Type=oneshot
User={{.User}}
Group={{.Group}}
# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=read-only
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
RestrictSUIDSGID=yes
RestrictRealtime=yes
LockPersonality=yes
RemoveIPC=yes
CapabilityBoundingSet=
AmbientCapabilities=
# Directories
ReadWritePaths={{.BackupDir}} /var/lib/dbbackup /var/log/dbbackup
# Network access for cloud uploads
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# Environment
EnvironmentFile=-/etc/dbbackup/env.d/cluster.conf
# Execution - cluster backup (all databases)
ExecStart={{.BinaryPath}} backup cluster --config {{.ConfigPath}}
TimeoutStartSec={{.TimeoutSeconds}}
# Post-backup metrics export
ExecStopPost=-{{.BinaryPath}} metrics export --instance cluster --output /var/lib/dbbackup/metrics/cluster.prom
# OOM protection for large backups
OOMScoreAdjust=-500
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,11 @@
[Unit]
Description=Database Cluster Backup Timer
Documentation=https://github.com/PlusOne/dbbackup
[Timer]
OnCalendar={{.Schedule}}
Persistent=true
RandomizedDelaySec=1800
[Install]
WantedBy=timers.target
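For reference, {{.Schedule}} is rendered straight into OnCalendar, which accepts any systemd calendar expression; a few illustrative values (not part of this change):

# daily at 02:00 (the installer default)
OnCalendar=*-*-* 02:00:00
# weekdays at 01:30
OnCalendar=Mon..Fri *-*-* 01:30:00
# shorthand for *-*-* 00:00:00
OnCalendar=daily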

View File

@@ -0,0 +1,37 @@
[Unit]
Description=DBBackup Prometheus Metrics Exporter
Documentation=https://github.com/PlusOne/dbbackup
After=network-online.target
[Service]
Type=simple
User={{.User}}
Group={{.Group}}
# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
RestrictSUIDSGID=yes
RestrictRealtime=yes
LockPersonality=yes
RemoveIPC=yes
# Read-write access to catalog for metrics collection
ReadWritePaths=/var/lib/dbbackup
# Network for HTTP server
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# Execution
ExecStart={{.BinaryPath}} metrics serve --port {{.MetricsPort}}
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,47 @@
[Unit]
Description=Database Backup for %i
Documentation=https://github.com/PlusOne/dbbackup
After=network-online.target postgresql.service mysql.service mariadb.service
Wants=network-online.target
[Service]
Type=oneshot
User={{.User}}
Group={{.Group}}
# Security hardening
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=read-only
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
RestrictSUIDSGID=yes
RestrictRealtime=yes
LockPersonality=yes
RemoveIPC=yes
CapabilityBoundingSet=
AmbientCapabilities=
# Directories
ReadWritePaths={{.BackupDir}} /var/lib/dbbackup /var/log/dbbackup
# Network access for cloud uploads
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# Environment
EnvironmentFile=-/etc/dbbackup/env.d/%i.conf
# Execution
ExecStart={{.BinaryPath}} backup {{.BackupType}} %i --config {{.ConfigPath}}
TimeoutStartSec={{.TimeoutSeconds}}
# Post-backup metrics export
ExecStopPost=-{{.BinaryPath}} metrics export --instance %i --output /var/lib/dbbackup/metrics/%i.prom
# OOM protection for large backups
OOMScoreAdjust=-500
[Install]
WantedBy=multi-user.target

View File

@@ -0,0 +1,11 @@
[Unit]
Description=Database Backup Timer for %i
Documentation=https://github.com/PlusOne/dbbackup
[Timer]
OnCalendar={{.Schedule}}
Persistent=true
RandomizedDelaySec=1800
[Install]
WantedBy=timers.target

View File

@@ -69,6 +69,7 @@ func (m *Manager) NotifySync(ctx context.Context, event *Event) error {
m.mu.RUnlock()
var errors []error
var errMu sync.Mutex
var wg sync.WaitGroup
for _, n := range notifiers {
@@ -80,7 +81,9 @@ func (m *Manager) NotifySync(ctx context.Context, event *Event) error {
go func(notifier Notifier) {
defer wg.Done()
if err := notifier.Send(ctx, event); err != nil {
errMu.Lock()
errors = append(errors, fmt.Errorf("%s: %w", notifier.Name(), err))
errMu.Unlock()
}
}(n)
}

View File

@@ -0,0 +1,174 @@
// Package prometheus provides Prometheus metrics for dbbackup
package prometheus
import (
"context"
"fmt"
"net/http"
"sync"
"time"
"dbbackup/internal/catalog"
"dbbackup/internal/logger"
)
// Exporter provides an HTTP endpoint for Prometheus metrics
type Exporter struct {
log logger.Logger
catalog catalog.Catalog
instance string
port int
mu sync.RWMutex
cachedData string
lastRefresh time.Time
refreshTTL time.Duration
}
// NewExporter creates a new Prometheus exporter
func NewExporter(log logger.Logger, cat catalog.Catalog, instance string, port int) *Exporter {
return &Exporter{
log: log,
catalog: cat,
instance: instance,
port: port,
refreshTTL: 30 * time.Second,
}
}
// Serve starts the HTTP server and blocks until context is cancelled
func (e *Exporter) Serve(ctx context.Context) error {
mux := http.NewServeMux()
// /metrics endpoint
mux.HandleFunc("/metrics", e.handleMetrics)
// /health endpoint
mux.HandleFunc("/health", e.handleHealth)
// / root with info
mux.HandleFunc("/", e.handleRoot)
addr := fmt.Sprintf(":%d", e.port)
srv := &http.Server{
Addr: addr,
Handler: mux,
ReadTimeout: 10 * time.Second,
WriteTimeout: 30 * time.Second,
IdleTimeout: 60 * time.Second,
}
// Start refresh goroutine
go e.refreshLoop(ctx)
// Graceful shutdown
go func() {
<-ctx.Done()
e.log.Info("Shutting down metrics server...")
shutdownCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := srv.Shutdown(shutdownCtx); err != nil {
e.log.Error("Server shutdown error", "error", err)
}
}()
e.log.Info("Starting Prometheus metrics server", "addr", addr)
if err := srv.ListenAndServe(); err != http.ErrServerClosed {
return fmt.Errorf("server error: %w", err)
}
return nil
}
// handleMetrics handles /metrics endpoint
func (e *Exporter) handleMetrics(w http.ResponseWriter, r *http.Request) {
e.mu.RLock()
data := e.cachedData
e.mu.RUnlock()
if data == "" {
// Force refresh if cache is empty
if err := e.refresh(); err != nil {
http.Error(w, "Failed to collect metrics", http.StatusInternalServerError)
return
}
e.mu.RLock()
data = e.cachedData
e.mu.RUnlock()
}
w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
w.WriteHeader(http.StatusOK)
w.Write([]byte(data))
}
// handleHealth handles /health endpoint
func (e *Exporter) handleHealth(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"status":"ok","service":"dbbackup-exporter"}`))
}
// handleRoot handles / endpoint
func (e *Exporter) handleRoot(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/" {
http.NotFound(w, r)
return
}
w.Header().Set("Content-Type", "text/html")
w.WriteHeader(http.StatusOK)
w.Write([]byte(`<!DOCTYPE html>
<html>
<head>
<title>DBBackup Exporter</title>
</head>
<body>
<h1>DBBackup Prometheus Exporter</h1>
<p>This is a Prometheus metrics exporter for DBBackup.</p>
<ul>
<li><a href="/metrics">/metrics</a> - Prometheus metrics</li>
<li><a href="/health">/health</a> - Health check</li>
</ul>
</body>
</html>`))
}
// refreshLoop periodically refreshes the metrics cache
func (e *Exporter) refreshLoop(ctx context.Context) {
ticker := time.NewTicker(e.refreshTTL)
defer ticker.Stop()
// Initial refresh
if err := e.refresh(); err != nil {
e.log.Error("Initial metrics refresh failed", "error", err)
}
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
if err := e.refresh(); err != nil {
e.log.Error("Metrics refresh failed", "error", err)
}
}
}
}
// refresh updates the cached metrics
func (e *Exporter) refresh() error {
writer := NewMetricsWriter(e.log, e.catalog, e.instance)
data, err := writer.GenerateMetricsString()
if err != nil {
return err
}
e.mu.Lock()
e.cachedData = data
e.lastRefresh = time.Now()
e.mu.Unlock()
e.log.Debug("Refreshed metrics cache")
return nil
}
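A minimal sketch of wiring the exporter into a command, assuming a logger and catalog have already been constructed elsewhere (the instance name and port are illustrative):

	exp := prometheus.NewExporter(log, cat, "production", 9399)
	// Stop serving on Ctrl-C so Serve's graceful shutdown path runs.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()
	if err := exp.Serve(ctx); err != nil {
		log.Error("metrics server stopped", "error", err)
	}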

View File

@@ -0,0 +1,245 @@
// Package prometheus provides Prometheus metrics for dbbackup
package prometheus
import (
"context"
"fmt"
"os"
"path/filepath"
"sort"
"strings"
"time"
"dbbackup/internal/catalog"
"dbbackup/internal/logger"
)
// MetricsWriter writes metrics in Prometheus text format
type MetricsWriter struct {
log logger.Logger
catalog catalog.Catalog
instance string
}
// NewMetricsWriter creates a new MetricsWriter
func NewMetricsWriter(log logger.Logger, cat catalog.Catalog, instance string) *MetricsWriter {
return &MetricsWriter{
log: log,
catalog: cat,
instance: instance,
}
}
// BackupMetrics holds metrics for a single database
type BackupMetrics struct {
Database string
Engine string
LastSuccess time.Time
LastDuration time.Duration
LastSize int64
TotalBackups int
SuccessCount int
FailureCount int
Verified bool
RPOSeconds float64
}
// WriteTextfile writes metrics to a Prometheus textfile collector file
func (m *MetricsWriter) WriteTextfile(path string) error {
metrics, err := m.collectMetrics()
if err != nil {
return fmt.Errorf("failed to collect metrics: %w", err)
}
output := m.formatMetrics(metrics)
// Atomic write: write to temp file, then rename
dir := filepath.Dir(path)
if err := os.MkdirAll(dir, 0755); err != nil {
return fmt.Errorf("failed to create directory %s: %w", dir, err)
}
tmpPath := path + ".tmp"
if err := os.WriteFile(tmpPath, []byte(output), 0644); err != nil {
return fmt.Errorf("failed to write temp file: %w", err)
}
if err := os.Rename(tmpPath, path); err != nil {
os.Remove(tmpPath)
return fmt.Errorf("failed to rename temp file: %w", err)
}
m.log.Debug("Wrote metrics to textfile", "path", path, "databases", len(metrics))
return nil
}
// collectMetrics gathers metrics from the catalog
func (m *MetricsWriter) collectMetrics() ([]BackupMetrics, error) {
if m.catalog == nil {
return nil, fmt.Errorf("catalog not available")
}
ctx := context.Background()
// Get recent backups using Search with limit
query := &catalog.SearchQuery{
Limit: 1000,
}
entries, err := m.catalog.Search(ctx, query)
if err != nil {
return nil, fmt.Errorf("failed to search backups: %w", err)
}
// Group by database
byDB := make(map[string]*BackupMetrics)
for _, e := range entries {
key := e.Database
if key == "" {
key = "unknown"
}
metrics, ok := byDB[key]
if !ok {
metrics = &BackupMetrics{
Database: key,
Engine: e.DatabaseType,
}
byDB[key] = metrics
}
metrics.TotalBackups++
isSuccess := e.Status == catalog.StatusCompleted || e.Status == catalog.StatusVerified
if isSuccess {
metrics.SuccessCount++
// Track most recent success
if e.CreatedAt.After(metrics.LastSuccess) {
metrics.LastSuccess = e.CreatedAt
metrics.LastDuration = time.Duration(e.Duration * float64(time.Second))
metrics.LastSize = e.SizeBytes
metrics.Verified = e.VerifiedAt != nil && e.VerifyValid != nil && *e.VerifyValid
metrics.Engine = e.DatabaseType
}
} else {
metrics.FailureCount++
}
}
// Calculate RPO for each database
now := time.Now()
for _, metrics := range byDB {
if !metrics.LastSuccess.IsZero() {
metrics.RPOSeconds = now.Sub(metrics.LastSuccess).Seconds()
}
}
// Convert to slice and sort
result := make([]BackupMetrics, 0, len(byDB))
for _, metrics := range byDB {
result = append(result, *metrics)
}
sort.Slice(result, func(i, j int) bool {
return result[i].Database < result[j].Database
})
return result, nil
}
// formatMetrics formats metrics in Prometheus exposition format
func (m *MetricsWriter) formatMetrics(metrics []BackupMetrics) string {
var b strings.Builder
// Timestamp of metrics generation
now := time.Now().Unix()
// Header comment
b.WriteString("# DBBackup Prometheus Metrics\n")
b.WriteString(fmt.Sprintf("# Generated at: %s\n", time.Now().Format(time.RFC3339)))
b.WriteString(fmt.Sprintf("# Instance: %s\n", m.instance))
b.WriteString("\n")
// dbbackup_last_success_timestamp
b.WriteString("# HELP dbbackup_last_success_timestamp Unix timestamp of last successful backup\n")
b.WriteString("# TYPE dbbackup_last_success_timestamp gauge\n")
for _, met := range metrics {
if !met.LastSuccess.IsZero() {
b.WriteString(fmt.Sprintf("dbbackup_last_success_timestamp{instance=%q,database=%q,engine=%q} %d\n",
m.instance, met.Database, met.Engine, met.LastSuccess.Unix()))
}
}
b.WriteString("\n")
// dbbackup_last_backup_duration_seconds
b.WriteString("# HELP dbbackup_last_backup_duration_seconds Duration of last successful backup in seconds\n")
b.WriteString("# TYPE dbbackup_last_backup_duration_seconds gauge\n")
for _, met := range metrics {
if met.LastDuration > 0 {
b.WriteString(fmt.Sprintf("dbbackup_last_backup_duration_seconds{instance=%q,database=%q,engine=%q} %.2f\n",
m.instance, met.Database, met.Engine, met.LastDuration.Seconds()))
}
}
b.WriteString("\n")
// dbbackup_last_backup_size_bytes
b.WriteString("# HELP dbbackup_last_backup_size_bytes Size of last successful backup in bytes\n")
b.WriteString("# TYPE dbbackup_last_backup_size_bytes gauge\n")
for _, met := range metrics {
if met.LastSize > 0 {
b.WriteString(fmt.Sprintf("dbbackup_last_backup_size_bytes{instance=%q,database=%q,engine=%q} %d\n",
m.instance, met.Database, met.Engine, met.LastSize))
}
}
b.WriteString("\n")
// dbbackup_backup_total (counter)
b.WriteString("# HELP dbbackup_backup_total Total number of backup attempts\n")
b.WriteString("# TYPE dbbackup_backup_total counter\n")
for _, met := range metrics {
b.WriteString(fmt.Sprintf("dbbackup_backup_total{instance=%q,database=%q,status=\"success\"} %d\n",
m.instance, met.Database, met.SuccessCount))
b.WriteString(fmt.Sprintf("dbbackup_backup_total{instance=%q,database=%q,status=\"failure\"} %d\n",
m.instance, met.Database, met.FailureCount))
}
b.WriteString("\n")
// dbbackup_rpo_seconds
b.WriteString("# HELP dbbackup_rpo_seconds Recovery Point Objective - seconds since last successful backup\n")
b.WriteString("# TYPE dbbackup_rpo_seconds gauge\n")
for _, met := range metrics {
if met.RPOSeconds > 0 {
b.WriteString(fmt.Sprintf("dbbackup_rpo_seconds{instance=%q,database=%q} %.0f\n",
m.instance, met.Database, met.RPOSeconds))
}
}
b.WriteString("\n")
// dbbackup_backup_verified
b.WriteString("# HELP dbbackup_backup_verified Whether the last backup was verified (1=yes, 0=no)\n")
b.WriteString("# TYPE dbbackup_backup_verified gauge\n")
for _, met := range metrics {
verified := 0
if met.Verified {
verified = 1
}
b.WriteString(fmt.Sprintf("dbbackup_backup_verified{instance=%q,database=%q} %d\n",
m.instance, met.Database, verified))
}
b.WriteString("\n")
// dbbackup_scrape_timestamp
b.WriteString("# HELP dbbackup_scrape_timestamp Unix timestamp when metrics were collected\n")
b.WriteString("# TYPE dbbackup_scrape_timestamp gauge\n")
b.WriteString(fmt.Sprintf("dbbackup_scrape_timestamp{instance=%q} %d\n", m.instance, now))
return b.String()
}
// GenerateMetricsString returns metrics as a string (for HTTP endpoint)
func (m *MetricsWriter) GenerateMetricsString() (string, error) {
metrics, err := m.collectMetrics()
if err != nil {
return "", err
}
return m.formatMetrics(metrics), nil
}
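As a usage sketch, matching the textfile path the systemd units export to (logger and catalog construction is assumed):

	w := prometheus.NewMetricsWriter(log, cat, "production")
	if err := w.WriteTextfile("/var/lib/dbbackup/metrics/production.prom"); err != nil {
		log.Error("metrics export failed", "error", err)
	}

The resulting file holds exposition-format lines such as dbbackup_rpo_seconds{instance="production",database="app_db"} 3600, which node_exporter's textfile collector or the HTTP exporter above can serve to Prometheus.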

View File

@@ -0,0 +1,829 @@
package restore
import (
"bufio"
"bytes"
"compress/gzip"
"encoding/json"
"fmt"
"io"
"os"
"os/exec"
"path/filepath"
"regexp"
"strings"
"dbbackup/internal/logger"
)
// DiagnoseResult contains the results of a dump file diagnosis
type DiagnoseResult struct {
FilePath string `json:"file_path"`
FileName string `json:"file_name"`
FileSize int64 `json:"file_size"`
Format ArchiveFormat `json:"format"`
DetectedFormat string `json:"detected_format"`
IsValid bool `json:"is_valid"`
IsTruncated bool `json:"is_truncated"`
IsCorrupted bool `json:"is_corrupted"`
Errors []string `json:"errors,omitempty"`
Warnings []string `json:"warnings,omitempty"`
Details *DiagnoseDetails `json:"details,omitempty"`
}
// DiagnoseDetails contains detailed analysis of the dump file
type DiagnoseDetails struct {
// Header info
HasPGDMPSignature bool `json:"has_pgdmp_signature,omitempty"`
HasSQLHeader bool `json:"has_sql_header,omitempty"`
FirstBytes string `json:"first_bytes,omitempty"`
LastBytes string `json:"last_bytes,omitempty"`
// COPY block analysis (for SQL dumps)
CopyBlockCount int `json:"copy_block_count,omitempty"`
UnterminatedCopy bool `json:"unterminated_copy,omitempty"`
LastCopyTable string `json:"last_copy_table,omitempty"`
LastCopyLineNumber int `json:"last_copy_line_number,omitempty"`
SampleCopyData []string `json:"sample_copy_data,omitempty"`
// Structure analysis
HasCreateStatements bool `json:"has_create_statements,omitempty"`
HasInsertStatements bool `json:"has_insert_statements,omitempty"`
HasCopyStatements bool `json:"has_copy_statements,omitempty"`
HasTransactionBlock bool `json:"has_transaction_block,omitempty"`
ProperlyTerminated bool `json:"properly_terminated,omitempty"`
// pg_restore analysis (for custom format)
PgRestoreListable bool `json:"pg_restore_listable,omitempty"`
PgRestoreError string `json:"pg_restore_error,omitempty"`
TableCount int `json:"table_count,omitempty"`
TableList []string `json:"table_list,omitempty"`
// Compression analysis
GzipValid bool `json:"gzip_valid,omitempty"`
GzipError string `json:"gzip_error,omitempty"`
ExpandedSize int64 `json:"expanded_size,omitempty"`
CompressionRatio float64 `json:"compression_ratio,omitempty"`
}
// Diagnoser performs deep analysis of backup files
type Diagnoser struct {
log logger.Logger
verbose bool
}
// NewDiagnoser creates a new diagnoser
func NewDiagnoser(log logger.Logger, verbose bool) *Diagnoser {
return &Diagnoser{
log: log,
verbose: verbose,
}
}
// DiagnoseFile performs comprehensive diagnosis of a backup file
func (d *Diagnoser) DiagnoseFile(filePath string) (*DiagnoseResult, error) {
result := &DiagnoseResult{
FilePath: filePath,
FileName: filepath.Base(filePath),
Details: &DiagnoseDetails{},
IsValid: true, // Assume valid until proven otherwise
}
// Check file exists and get size
stat, err := os.Stat(filePath)
if err != nil {
result.IsValid = false
result.Errors = append(result.Errors, fmt.Sprintf("Cannot access file: %v", err))
return result, nil
}
result.FileSize = stat.Size()
if result.FileSize == 0 {
result.IsValid = false
result.IsTruncated = true
result.Errors = append(result.Errors, "File is empty (0 bytes)")
return result, nil
}
// Detect format
result.Format = DetectArchiveFormat(filePath)
result.DetectedFormat = result.Format.String()
// Analyze based on format
switch result.Format {
case FormatPostgreSQLDump:
d.diagnosePgDump(filePath, result)
case FormatPostgreSQLDumpGz:
d.diagnosePgDumpGz(filePath, result)
case FormatPostgreSQLSQL:
d.diagnoseSQLScript(filePath, false, result)
case FormatPostgreSQLSQLGz:
d.diagnoseSQLScript(filePath, true, result)
case FormatClusterTarGz:
d.diagnoseClusterArchive(filePath, result)
default:
result.Warnings = append(result.Warnings, "Unknown format - limited diagnosis available")
d.diagnoseUnknown(filePath, result)
}
return result, nil
}
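A short sketch of calling the diagnoser from outside the package (the file path is illustrative; the CLI wires this up elsewhere):

	d := restore.NewDiagnoser(log, true) // verbose output
	res, err := d.DiagnoseFile("/var/lib/dbbackup/backups/app_db.dump")
	if err != nil {
		log.Error("diagnosis failed", "error", err)
	} else {
		d.PrintDiagnosis(res)
	}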
// diagnosePgDump analyzes PostgreSQL custom format dump
func (d *Diagnoser) diagnosePgDump(filePath string, result *DiagnoseResult) {
file, err := os.Open(filePath)
if err != nil {
result.IsValid = false
result.Errors = append(result.Errors, fmt.Sprintf("Cannot open file: %v", err))
return
}
defer file.Close()
// Read first 512 bytes
header := make([]byte, 512)
n, err := file.Read(header)
if err != nil && err != io.EOF {
result.IsValid = false
result.Errors = append(result.Errors, fmt.Sprintf("Cannot read header: %v", err))
return
}
// Check PGDMP signature
if n >= 5 && string(header[:5]) == "PGDMP" {
result.Details.HasPGDMPSignature = true
result.Details.FirstBytes = "PGDMP..."
} else {
result.IsValid = false
result.IsCorrupted = true
result.Details.HasPGDMPSignature = false
result.Details.FirstBytes = fmt.Sprintf("%q", header[:minInt(n, 20)])
result.Errors = append(result.Errors,
"Missing PGDMP signature - file is NOT PostgreSQL custom format",
"This file may be SQL format incorrectly named as .dump",
"Try: file "+filePath+" to check actual file type")
return
}
// Try pg_restore --list to verify dump integrity
d.verifyWithPgRestore(filePath, result)
}
// diagnosePgDumpGz analyzes compressed PostgreSQL custom format dump
func (d *Diagnoser) diagnosePgDumpGz(filePath string, result *DiagnoseResult) {
file, err := os.Open(filePath)
if err != nil {
result.IsValid = false
result.Errors = append(result.Errors, fmt.Sprintf("Cannot open file: %v", err))
return
}
defer file.Close()
// Verify gzip integrity
gz, err := gzip.NewReader(file)
if err != nil {
result.IsValid = false
result.IsCorrupted = true
result.Details.GzipValid = false
result.Details.GzipError = err.Error()
result.Errors = append(result.Errors,
fmt.Sprintf("Invalid gzip format: %v", err),
"The file may be truncated or corrupted during transfer")
return
}
result.Details.GzipValid = true
// Read and check header
header := make([]byte, 512)
n, err := gz.Read(header)
if err != nil && err != io.EOF {
result.IsValid = false
result.Errors = append(result.Errors, fmt.Sprintf("Cannot read decompressed header: %v", err))
gz.Close()
return
}
gz.Close()
// Check PGDMP signature
if n >= 5 && string(header[:5]) == "PGDMP" {
result.Details.HasPGDMPSignature = true
result.Details.FirstBytes = "PGDMP..."
} else {
result.Details.HasPGDMPSignature = false
result.Details.FirstBytes = fmt.Sprintf("%q", header[:minInt(n, 20)])
// Check if it's actually SQL content
content := string(header[:n])
if strings.Contains(content, "PostgreSQL") || strings.Contains(content, "pg_dump") ||
strings.Contains(content, "SET ") || strings.Contains(content, "CREATE ") {
result.Details.HasSQLHeader = true
result.Warnings = append(result.Warnings,
"File contains SQL text but has .dump extension",
"This appears to be SQL format, not custom format",
"Restore should use psql, not pg_restore")
} else {
result.IsValid = false
result.IsCorrupted = true
result.Errors = append(result.Errors,
"Missing PGDMP signature in decompressed content",
"File is neither custom format nor valid SQL")
}
return
}
// Verify full gzip stream integrity by reading to end
file.Seek(0, 0)
gz, _ = gzip.NewReader(file)
var totalRead int64
buf := make([]byte, 32*1024)
for {
n, err := gz.Read(buf)
totalRead += int64(n)
if err == io.EOF {
break
}
if err != nil {
result.IsValid = false
result.IsTruncated = true
result.Details.ExpandedSize = totalRead
result.Errors = append(result.Errors,
fmt.Sprintf("Gzip stream truncated after %d bytes: %v", totalRead, err),
"The backup file appears to be incomplete",
"Check if backup process completed successfully")
gz.Close()
return
}
}
gz.Close()
result.Details.ExpandedSize = totalRead
if result.FileSize > 0 {
result.Details.CompressionRatio = float64(totalRead) / float64(result.FileSize)
}
}
// diagnoseSQLScript analyzes SQL script format
func (d *Diagnoser) diagnoseSQLScript(filePath string, compressed bool, result *DiagnoseResult) {
var reader io.Reader
var file *os.File
var gz *gzip.Reader
var err error
file, err = os.Open(filePath)
if err != nil {
result.IsValid = false
result.Errors = append(result.Errors, fmt.Sprintf("Cannot open file: %v", err))
return
}
defer file.Close()
if compressed {
gz, err = gzip.NewReader(file)
if err != nil {
result.IsValid = false
result.IsCorrupted = true
result.Details.GzipValid = false
result.Details.GzipError = err.Error()
result.Errors = append(result.Errors, fmt.Sprintf("Invalid gzip format: %v", err))
return
}
result.Details.GzipValid = true
reader = gz
defer gz.Close()
} else {
reader = file
}
// Analyze SQL content
scanner := bufio.NewScanner(reader)
// Increase buffer size for large lines (COPY data can have long lines)
buf := make([]byte, 0, 1024*1024)
scanner.Buffer(buf, 10*1024*1024)
var lineNumber int
var inCopyBlock bool
var lastCopyTable string
var copyStartLine int
var copyDataSamples []string
copyBlockPattern := regexp.MustCompile(`^COPY\s+("?[\w\."]+)"?\s+\(`)
copyEndPattern := regexp.MustCompile(`^\\\.`)
for scanner.Scan() {
lineNumber++
line := scanner.Text()
// Check first few lines for header
if lineNumber <= 10 {
if strings.Contains(line, "PostgreSQL") || strings.Contains(line, "pg_dump") {
result.Details.HasSQLHeader = true
}
}
// Track structure
upperLine := strings.ToUpper(strings.TrimSpace(line))
if strings.HasPrefix(upperLine, "CREATE ") {
result.Details.HasCreateStatements = true
}
if strings.HasPrefix(upperLine, "INSERT ") {
result.Details.HasInsertStatements = true
}
if strings.HasPrefix(upperLine, "BEGIN") {
result.Details.HasTransactionBlock = true
}
// Track COPY blocks
if copyBlockPattern.MatchString(line) {
if inCopyBlock {
// Previous COPY block wasn't terminated!
result.Details.UnterminatedCopy = true
result.IsTruncated = true
result.IsValid = false
result.Errors = append(result.Errors,
fmt.Sprintf("COPY block for '%s' starting at line %d was never terminated",
lastCopyTable, copyStartLine))
}
inCopyBlock = true
result.Details.HasCopyStatements = true
result.Details.CopyBlockCount++
matches := copyBlockPattern.FindStringSubmatch(line)
if len(matches) > 1 {
lastCopyTable = matches[1]
}
copyStartLine = lineNumber
copyDataSamples = nil
} else if copyEndPattern.MatchString(line) {
inCopyBlock = false
} else if inCopyBlock {
// We're in COPY data
if len(copyDataSamples) < 3 {
copyDataSamples = append(copyDataSamples, truncateString(line, 100))
}
}
// Periodically log scan progress (verbose mode only)
if lineNumber > 0 && (lineNumber%100000 == 0) && d.verbose {
d.log.Debug("Scanning SQL file", "lines_processed", lineNumber)
}
}
if err := scanner.Err(); err != nil {
result.IsValid = false
result.IsTruncated = true
result.Errors = append(result.Errors,
fmt.Sprintf("Error reading file at line %d: %v", lineNumber, err),
"File may be truncated or contain invalid data")
}
// Check if we ended while still in a COPY block
if inCopyBlock {
result.Details.UnterminatedCopy = true
result.Details.LastCopyTable = lastCopyTable
result.Details.LastCopyLineNumber = copyStartLine
result.Details.SampleCopyData = copyDataSamples
result.IsTruncated = true
result.IsValid = false
result.Errors = append(result.Errors,
fmt.Sprintf("File ends inside COPY block for table '%s' (started at line %d)",
lastCopyTable, copyStartLine),
"The backup was truncated during data export",
"This explains the 'syntax error' during restore - COPY data is being interpreted as SQL")
if len(copyDataSamples) > 0 {
result.Errors = append(result.Errors,
fmt.Sprintf("Sample orphaned data: %s", copyDataSamples[0]))
}
} else {
result.Details.ProperlyTerminated = true
}
// Read last bytes for additional context
if !compressed {
file.Seek(-min(500, result.FileSize), 2)
lastBytes := make([]byte, 500)
n, _ := file.Read(lastBytes)
result.Details.LastBytes = strings.TrimSpace(string(lastBytes[:n]))
}
}
// diagnoseClusterArchive analyzes a cluster tar.gz archive
func (d *Diagnoser) diagnoseClusterArchive(filePath string, result *DiagnoseResult) {
// First verify tar.gz integrity
cmd := exec.Command("tar", "-tzf", filePath)
output, err := cmd.Output()
if err != nil {
result.IsValid = false
result.IsCorrupted = true
result.Errors = append(result.Errors,
fmt.Sprintf("Tar archive is invalid or corrupted: %v", err),
"Run: tar -tzf "+filePath+" 2>&1 | tail -20")
return
}
// Parse tar listing
files := strings.Split(strings.TrimSpace(string(output)), "\n")
var dumpFiles []string
hasGlobals := false
hasMetadata := false
for _, f := range files {
if strings.HasSuffix(f, ".dump") || strings.HasSuffix(f, ".sql.gz") {
dumpFiles = append(dumpFiles, f)
}
if strings.Contains(f, "globals.sql") {
hasGlobals = true
}
if strings.Contains(f, "manifest.json") || strings.Contains(f, "metadata.json") {
hasMetadata = true
}
}
result.Details.TableCount = len(dumpFiles)
result.Details.TableList = dumpFiles
if len(dumpFiles) == 0 {
result.Warnings = append(result.Warnings, "No database dump files found in archive")
}
if !hasGlobals {
result.Warnings = append(result.Warnings, "No globals.sql found - roles/tablespaces won't be restored")
}
if !hasMetadata {
result.Warnings = append(result.Warnings, "No manifest/metadata found - limited validation possible")
}
// For verbose mode, diagnose individual dumps inside the archive
if d.verbose && len(dumpFiles) > 0 {
d.log.Info("Cluster archive contains databases", "count", len(dumpFiles))
for _, df := range dumpFiles {
d.log.Info(" - " + df)
}
}
}
// diagnoseUnknown handles unknown format files
func (d *Diagnoser) diagnoseUnknown(filePath string, result *DiagnoseResult) {
file, err := os.Open(filePath)
if err != nil {
return
}
defer file.Close()
header := make([]byte, 512)
n, _ := file.Read(header)
result.Details.FirstBytes = fmt.Sprintf("%q", header[:minInt(n, 50)])
// Try to identify by content
content := string(header[:n])
if strings.Contains(content, "PGDMP") {
result.Warnings = append(result.Warnings, "File appears to be PostgreSQL custom format - rename to .dump")
} else if strings.Contains(content, "PostgreSQL") || strings.Contains(content, "pg_dump") {
result.Warnings = append(result.Warnings, "File appears to be PostgreSQL SQL - rename to .sql")
} else if bytes.HasPrefix(header, []byte{0x1f, 0x8b}) {
result.Warnings = append(result.Warnings, "File appears to be gzip compressed - add .gz extension")
}
}
// verifyWithPgRestore uses pg_restore --list to verify dump integrity
func (d *Diagnoser) verifyWithPgRestore(filePath string, result *DiagnoseResult) {
cmd := exec.Command("pg_restore", "--list", filePath)
output, err := cmd.CombinedOutput()
if err != nil {
result.Details.PgRestoreListable = false
result.Details.PgRestoreError = string(output)
// Check for specific errors
errStr := string(output)
if strings.Contains(errStr, "unexpected end of file") ||
strings.Contains(errStr, "invalid large-object TOC entry") {
result.IsTruncated = true
result.IsValid = false
result.Errors = append(result.Errors,
"pg_restore reports truncated or incomplete dump file",
fmt.Sprintf("Error: %s", truncateString(errStr, 200)))
} else if strings.Contains(errStr, "not a valid archive") {
result.IsCorrupted = true
result.IsValid = false
result.Errors = append(result.Errors,
"pg_restore reports file is not a valid archive",
"File may be corrupted or wrong format")
} else {
result.Warnings = append(result.Warnings,
fmt.Sprintf("pg_restore --list warning: %s", truncateString(errStr, 200)))
}
return
}
result.Details.PgRestoreListable = true
// Count tables in the TOC
lines := strings.Split(string(output), "\n")
tableCount := 0
var tables []string
for _, line := range lines {
if strings.Contains(line, " TABLE DATA ") {
tableCount++
if len(tables) < 20 {
parts := strings.Fields(line)
if len(parts) > 3 {
tables = append(tables, parts[len(parts)-1])
}
}
}
}
result.Details.TableCount = tableCount
result.Details.TableList = tables
}
// DiagnoseClusterDumps extracts and diagnoses all dumps in a cluster archive
func (d *Diagnoser) DiagnoseClusterDumps(archivePath, tempDir string) ([]*DiagnoseResult, error) {
// First, try to list archive contents without extracting (fast check)
listCmd := exec.Command("tar", "-tzf", archivePath)
listOutput, listErr := listCmd.CombinedOutput()
if listErr != nil {
// Archive listing failed - likely corrupted
errResult := &DiagnoseResult{
FilePath: archivePath,
FileName: filepath.Base(archivePath),
Format: FormatClusterTarGz,
DetectedFormat: "Cluster Archive (tar.gz)",
IsValid: false,
IsCorrupted: true,
Details: &DiagnoseDetails{},
}
errOutput := string(listOutput)
if strings.Contains(errOutput, "unexpected end of file") ||
strings.Contains(errOutput, "Unexpected EOF") ||
strings.Contains(errOutput, "truncated") {
errResult.IsTruncated = true
errResult.Errors = append(errResult.Errors,
"Archive appears to be TRUNCATED - incomplete download or backup",
fmt.Sprintf("tar error: %s", truncateString(errOutput, 300)),
"Possible causes: disk full during backup, interrupted transfer, network timeout",
"Solution: Re-create the backup from source database")
} else {
errResult.Errors = append(errResult.Errors,
fmt.Sprintf("Cannot list archive contents: %v", listErr),
fmt.Sprintf("tar error: %s", truncateString(errOutput, 300)),
"Run manually: tar -tzf "+archivePath+" 2>&1 | tail -50")
}
return []*DiagnoseResult{errResult}, nil
}
// Archive is listable - now check disk space before extraction
files := strings.Split(strings.TrimSpace(string(listOutput)), "\n")
// Check if we have enough disk space (estimate 4x archive size needed)
archiveInfo, _ := os.Stat(archivePath)
requiredSpace := archiveInfo.Size() * 4
// If the temp directory is usable, try extracting just the small metadata files first
if stat, err := os.Stat(tempDir); err == nil && stat.IsDir() {
// Try extraction of a small test file first
testCmd := exec.Command("tar", "-xzf", archivePath, "-C", tempDir, "--wildcards", "*.json", "--wildcards", "globals.sql")
testCmd.Run() // Ignore error - just try to extract metadata
}
d.log.Info("Archive listing successful", "files", len(files))
// Try full extraction
cmd := exec.Command("tar", "-xzf", archivePath, "-C", tempDir)
var stderr bytes.Buffer
cmd.Stderr = &stderr
if err := cmd.Run(); err != nil {
// Extraction failed
errResult := &DiagnoseResult{
FilePath: archivePath,
FileName: filepath.Base(archivePath),
Format: FormatClusterTarGz,
DetectedFormat: "Cluster Archive (tar.gz)",
IsValid: false,
Details: &DiagnoseDetails{},
}
errOutput := stderr.String()
if strings.Contains(errOutput, "No space left") ||
strings.Contains(errOutput, "cannot write") ||
strings.Contains(errOutput, "Disk quota exceeded") {
errResult.Errors = append(errResult.Errors,
"INSUFFICIENT DISK SPACE to extract archive for diagnosis",
fmt.Sprintf("Archive size: %s (needs ~%s for extraction)",
formatBytes(archiveInfo.Size()), formatBytes(requiredSpace)),
"Use CLI diagnosis instead: dbbackup restore diagnose "+archivePath,
"Or use --workdir flag to specify a location with more space")
} else if strings.Contains(errOutput, "unexpected end of file") ||
strings.Contains(errOutput, "Unexpected EOF") {
errResult.IsTruncated = true
errResult.IsCorrupted = true
errResult.Errors = append(errResult.Errors,
"Archive is TRUNCATED - extraction failed mid-way",
fmt.Sprintf("Error: %s", truncateString(errOutput, 200)),
"The backup file is incomplete and cannot be restored",
"Solution: Re-create the backup from source database")
} else {
errResult.IsCorrupted = true
errResult.Errors = append(errResult.Errors,
fmt.Sprintf("Extraction failed: %v", err),
fmt.Sprintf("tar error: %s", truncateString(errOutput, 300)))
}
// Still report what files we found in the listing
var dumpFiles []string
for _, f := range files {
if strings.HasSuffix(f, ".dump") || strings.HasSuffix(f, ".sql.gz") {
dumpFiles = append(dumpFiles, filepath.Base(f))
}
}
if len(dumpFiles) > 0 {
errResult.Details.TableList = dumpFiles
errResult.Details.TableCount = len(dumpFiles)
errResult.Warnings = append(errResult.Warnings,
fmt.Sprintf("Archive contains %d database dumps (listing only)", len(dumpFiles)))
}
return []*DiagnoseResult{errResult}, nil
}
// Find dump files
dumpsDir := filepath.Join(tempDir, "dumps")
entries, err := os.ReadDir(dumpsDir)
if err != nil {
// Try without dumps subdirectory
entries, err = os.ReadDir(tempDir)
if err != nil {
return nil, fmt.Errorf("cannot read extracted files: %w", err)
}
dumpsDir = tempDir
}
var results []*DiagnoseResult
for _, entry := range entries {
if entry.IsDir() {
continue
}
name := entry.Name()
if !strings.HasSuffix(name, ".dump") && !strings.HasSuffix(name, ".sql.gz") &&
!strings.HasSuffix(name, ".sql") {
continue
}
dumpPath := filepath.Join(dumpsDir, name)
d.log.Info("Diagnosing dump file", "file", name)
result, err := d.DiagnoseFile(dumpPath)
if err != nil {
d.log.Warn("Failed to diagnose file", "file", name, "error", err)
continue
}
results = append(results, result)
}
return results, nil
}
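The listing step above relies on tar's exit status and error text to spot truncation; the same signal is available from the gzip layer alone. A minimal stand-alone sketch, assuming only the standard library (compress/gzip, io, os); the helper name gzipStreamComplete is illustrative, not something the Diagnoser defines:

    // gzipStreamComplete reports whether a .gz payload decompresses to a clean EOF.
    // An io.ErrUnexpectedEOF or gzip checksum error here corresponds to the
    // "unexpected end of file" that tar -tzf reports for a truncated archive.
    func gzipStreamComplete(path string) (bool, error) {
        f, err := os.Open(path)
        if err != nil {
            return false, err
        }
        defer f.Close()
        gz, err := gzip.NewReader(f)
        if err != nil {
            return false, nil // header unreadable: not gzip, or damaged before the payload
        }
        defer gz.Close()
        if _, err := io.Copy(io.Discard, gz); err != nil {
            return false, nil // payload truncated or corrupted
        }
        return true, nil
    }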
// PrintDiagnosis outputs a human-readable diagnosis report
func (d *Diagnoser) PrintDiagnosis(result *DiagnoseResult) {
fmt.Println("\n" + strings.Repeat("=", 70))
fmt.Printf("📋 DIAGNOSIS: %s\n", result.FileName)
fmt.Println(strings.Repeat("=", 70))
// Basic info
fmt.Printf("\nFile: %s\n", result.FilePath)
fmt.Printf("Size: %s\n", formatBytes(result.FileSize))
fmt.Printf("Format: %s\n", result.DetectedFormat)
// Status
if result.IsValid {
fmt.Println("\n✅ STATUS: VALID")
} else {
fmt.Println("\n❌ STATUS: INVALID")
}
if result.IsTruncated {
fmt.Println("⚠️ TRUNCATED: Yes - file appears incomplete")
}
if result.IsCorrupted {
fmt.Println("⚠️ CORRUPTED: Yes - file structure is damaged")
}
// Details
if result.Details != nil {
fmt.Println("\n📊 DETAILS:")
if result.Details.HasPGDMPSignature {
fmt.Println(" ✓ Has PGDMP signature (PostgreSQL custom format)")
}
if result.Details.HasSQLHeader {
fmt.Println(" ✓ Has PostgreSQL SQL header")
}
if result.Details.GzipValid {
fmt.Println(" ✓ Gzip compression valid")
}
if result.Details.PgRestoreListable {
fmt.Printf(" ✓ pg_restore can list contents (%d tables)\n", result.Details.TableCount)
}
if result.Details.CopyBlockCount > 0 {
fmt.Printf(" • Contains %d COPY blocks\n", result.Details.CopyBlockCount)
}
if result.Details.UnterminatedCopy {
fmt.Printf(" ✗ Unterminated COPY block: %s (line %d)\n",
result.Details.LastCopyTable, result.Details.LastCopyLineNumber)
}
if result.Details.ProperlyTerminated {
fmt.Println(" ✓ All COPY blocks properly terminated")
}
if result.Details.ExpandedSize > 0 {
fmt.Printf(" • Expanded size: %s (ratio: %.1fx)\n",
formatBytes(result.Details.ExpandedSize), result.Details.CompressionRatio)
}
}
// Errors
if len(result.Errors) > 0 {
fmt.Println("\n❌ ERRORS:")
for _, e := range result.Errors {
fmt.Printf(" • %s\n", e)
}
}
// Warnings
if len(result.Warnings) > 0 {
fmt.Println("\n⚠ WARNINGS:")
for _, w := range result.Warnings {
fmt.Printf(" • %s\n", w)
}
}
// Recommendations
if !result.IsValid {
fmt.Println("\n💡 RECOMMENDATIONS:")
if result.IsTruncated {
fmt.Println(" 1. Re-run the backup process for this database")
fmt.Println(" 2. Check disk space on backup server during backup")
fmt.Println(" 3. Verify network stability if backup was remote")
fmt.Println(" 4. Check backup logs for errors during the backup")
}
if result.IsCorrupted {
fmt.Println(" 1. Verify backup file was transferred completely")
fmt.Println(" 2. Check if backup file was modified after creation")
fmt.Println(" 3. Try restoring from a previous backup")
}
}
fmt.Println(strings.Repeat("=", 70))
}
// PrintDiagnosisJSON outputs diagnosis as JSON
func (d *Diagnoser) PrintDiagnosisJSON(result *DiagnoseResult) error {
output, err := json.MarshalIndent(result, "", " ")
if err != nil {
return err
}
fmt.Println(string(output))
return nil
}
// Helper functions
func truncateString(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
return s[:maxLen-3] + "..."
}
func formatBytes(bytes int64) string {
const unit = 1024
if bytes < unit {
return fmt.Sprintf("%d B", bytes)
}
div, exp := int64(unit), 0
for n := bytes / unit; n >= unit; n /= unit {
div *= unit
exp++
}
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}
func min(a, b int64) int64 {
if a < b {
return a
}
return b
}
func minInt(a, b int) int {
if a < b {
return a
}
return b
}
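For quick reference, a few worked values for these helpers, computed from the code as written:

    // formatBytes(512)                -> "512 B"
    // formatBytes(1536)               -> "1.5 KB"
    // formatBytes(5 * 1024 * 1024)    -> "5.0 MB"
    // truncateString("abcdefghij", 8) -> "abcde..."
    // min(3, 9) == 3; minInt(7, 2) == 2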

View File

@@ -27,6 +27,8 @@ type Engine struct {
progress progress.Indicator
detailedReporter *progress.DetailedReporter
dryRun bool
debugLogPath string // Path to save debug log on error
errorCollector *ErrorCollector // Collects detailed error info
}
// New creates a new restore engine
@@ -77,6 +79,11 @@ func NewWithProgress(cfg *config.Config, log logger.Logger, db database.Database
}
}
// SetDebugLogPath enables saving detailed error reports on failure
func (e *Engine) SetDebugLogPath(path string) {
e.debugLogPath = path
}
// loggerAdapter adapts our logger to the progress.Logger interface
type loggerAdapter struct {
logger logger.Logger
@@ -245,6 +252,15 @@ func (e *Engine) restorePostgreSQLDumpWithOwnership(ctx context.Context, archive
// restorePostgreSQLSQL restores from PostgreSQL SQL script
func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB string, compressed bool) error {
// Pre-validate SQL dump to detect truncation BEFORE attempting restore
// This saves time by catching corrupted files early (instead of failing ~49 minutes into a restore)
if err := e.quickValidateSQLDump(archivePath, compressed); err != nil {
e.log.Error("Pre-restore validation failed - dump file appears corrupted",
"file", archivePath,
"error", err)
return fmt.Errorf("dump validation failed: %w - the backup file may be truncated or corrupted", err)
}
// Use psql for SQL scripts
var cmd []string
@@ -255,9 +271,10 @@ func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB
}
if compressed {
psqlCmd := fmt.Sprintf("psql -U %s -d %s", e.cfg.User, targetDB)
// Use ON_ERROR_STOP=1 to fail fast on first error (prevents millions of errors on truncated dumps)
psqlCmd := fmt.Sprintf("psql -U %s -d %s -v ON_ERROR_STOP=1", e.cfg.User, targetDB)
if hostArg != "" {
psqlCmd = fmt.Sprintf("psql %s -U %s -d %s", hostArg, e.cfg.User, targetDB)
psqlCmd = fmt.Sprintf("psql %s -U %s -d %s -v ON_ERROR_STOP=1", hostArg, e.cfg.User, targetDB)
}
// Set PGPASSWORD in the bash command for password-less auth
cmd = []string{
@@ -272,6 +289,7 @@ func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB
"-p", fmt.Sprintf("%d", e.cfg.Port),
"-U", e.cfg.User,
"-d", targetDB,
"-v", "ON_ERROR_STOP=1",
"-f", archivePath,
}
} else {
@@ -279,6 +297,7 @@ func (e *Engine) restorePostgreSQLSQL(ctx context.Context, archivePath, targetDB
"psql",
"-U", e.cfg.User,
"-d", targetDB,
"-v", "ON_ERROR_STOP=1",
"-f", archivePath,
}
}
@@ -306,6 +325,11 @@ func (e *Engine) restoreMySQLSQL(ctx context.Context, archivePath, targetDB stri
// executeRestoreCommand executes a restore command
func (e *Engine) executeRestoreCommand(ctx context.Context, cmdArgs []string) error {
return e.executeRestoreCommandWithContext(ctx, cmdArgs, "", "", FormatUnknown)
}
// executeRestoreCommandWithContext executes a restore command with error collection context
func (e *Engine) executeRestoreCommandWithContext(ctx context.Context, cmdArgs []string, archivePath, targetDB string, format ArchiveFormat) error {
e.log.Info("Executing restore command", "command", strings.Join(cmdArgs, " "))
cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
@@ -316,6 +340,12 @@ func (e *Engine) executeRestoreCommand(ctx context.Context, cmdArgs []string) er
fmt.Sprintf("MYSQL_PWD=%s", e.cfg.Password),
)
// Create error collector if debug log path is set
var collector *ErrorCollector
if e.debugLogPath != "" {
collector = NewErrorCollector(e.cfg, e.log, archivePath, targetDB, format, true)
}
// Stream stderr to avoid memory issues with large output
// Don't use CombinedOutput() as it loads everything into memory
stderr, err := cmd.StderrPipe()
@@ -336,6 +366,12 @@ func (e *Engine) executeRestoreCommand(ctx context.Context, cmdArgs []string) er
n, err := stderr.Read(buf)
if n > 0 {
chunk := string(buf[:n])
// Feed to error collector if enabled
if collector != nil {
collector.CaptureStderr(chunk)
}
// Only capture REAL errors, not verbose output
if strings.Contains(chunk, "ERROR:") || strings.Contains(chunk, "FATAL:") || strings.Contains(chunk, "error:") {
lastError = strings.TrimSpace(chunk)
@@ -352,6 +388,12 @@ func (e *Engine) executeRestoreCommand(ctx context.Context, cmdArgs []string) er
}
if err := cmd.Wait(); err != nil {
// Get exit code
exitCode := 1
if exitErr, ok := err.(*exec.ExitError); ok {
exitCode = exitErr.ExitCode()
}
// PostgreSQL pg_restore returns exit code 1 even for ignorable errors
// Check if errors are ignorable (already exists, duplicate, etc.)
if lastError != "" && e.isIgnorableError(lastError) {
@@ -360,8 +402,12 @@ func (e *Engine) executeRestoreCommand(ctx context.Context, cmdArgs []string) er
}
// Classify error and provide helpful hints
var classification *checks.ErrorClassification
var errType, errHint string
if lastError != "" {
classification := checks.ClassifyError(lastError)
classification = checks.ClassifyError(lastError)
errType = classification.Type
errHint = classification.Hint
e.log.Error("Restore command failed",
"error", err,
"last_stderr", lastError,
@@ -369,11 +415,37 @@ func (e *Engine) executeRestoreCommand(ctx context.Context, cmdArgs []string) er
"error_type", classification.Type,
"hint", classification.Hint,
"action", classification.Action)
return fmt.Errorf("restore failed: %w (last error: %s, total errors: %d) - %s",
err, lastError, errorCount, classification.Hint)
} else {
e.log.Error("Restore command failed", "error", err, "error_count", errorCount)
}
e.log.Error("Restore command failed", "error", err, "last_stderr", lastError, "error_count", errorCount)
// Generate and save error report if collector is enabled
if collector != nil {
collector.SetExitCode(exitCode)
report := collector.GenerateReport(
lastError,
errType,
errHint,
)
// Print report to console
collector.PrintReport(report)
// Save to file
if e.debugLogPath != "" {
if saveErr := collector.SaveReport(report, e.debugLogPath); saveErr != nil {
e.log.Warn("Failed to save debug log", "error", saveErr)
} else {
e.log.Info("Debug log saved", "path", e.debugLogPath)
fmt.Printf("\n📋 Detailed error report saved to: %s\n", e.debugLogPath)
}
}
}
if lastError != "" {
return fmt.Errorf("restore failed: %w (last error: %s, total errors: %d) - %s",
err, lastError, errorCount, errHint)
}
return fmt.Errorf("restore failed: %w", err)
}
@@ -622,6 +694,69 @@ func (e *Engine) RestoreCluster(ctx context.Context, archivePath string) error {
return fmt.Errorf("failed to read dumps directory: %w", err)
}
// PRE-VALIDATE all SQL dumps BEFORE starting restore
// This catches truncated files early instead of failing after hours of work
e.log.Info("Pre-validating dump files before restore...")
e.progress.Update("Pre-validating dump files...")
var corruptedDumps []string
diagnoser := NewDiagnoser(e.log, false)
for _, entry := range entries {
if entry.IsDir() {
continue
}
dumpFile := filepath.Join(dumpsDir, entry.Name())
if strings.HasSuffix(dumpFile, ".sql.gz") {
result, err := diagnoser.DiagnoseFile(dumpFile)
if err != nil {
e.log.Warn("Could not validate dump file", "file", entry.Name(), "error", err)
continue
}
if result.IsTruncated || result.IsCorrupted || !result.IsValid {
dbName := strings.TrimSuffix(entry.Name(), ".sql.gz")
errDetail := "unknown issue"
if len(result.Errors) > 0 {
errDetail = result.Errors[0]
}
corruptedDumps = append(corruptedDumps, fmt.Sprintf("%s: %s", dbName, errDetail))
e.log.Error("CORRUPTED dump file detected",
"database", dbName,
"file", entry.Name(),
"truncated", result.IsTruncated,
"errors", result.Errors)
}
} else if strings.HasSuffix(dumpFile, ".dump") {
// Validate custom format dumps using pg_restore --list
cmd := exec.Command("pg_restore", "--list", dumpFile)
output, err := cmd.CombinedOutput()
if err != nil {
dbName := strings.TrimSuffix(entry.Name(), ".dump")
errDetail := strings.TrimSpace(string(output))
if len(errDetail) > 100 {
errDetail = errDetail[:100] + "..."
}
// Check for truncation indicators
if strings.Contains(errDetail, "unexpected end") || strings.Contains(errDetail, "invalid") {
corruptedDumps = append(corruptedDumps, fmt.Sprintf("%s: %s", dbName, errDetail))
e.log.Error("CORRUPTED custom dump file detected",
"database", dbName,
"file", entry.Name(),
"error", errDetail)
} else {
e.log.Warn("pg_restore --list warning (may be recoverable)",
"file", entry.Name(),
"error", errDetail)
}
}
}
}
if len(corruptedDumps) > 0 {
operation.Fail("Corrupted dump files detected")
e.progress.Fail(fmt.Sprintf("Found %d corrupted dump files - restore aborted", len(corruptedDumps)))
return fmt.Errorf("pre-validation failed: %d corrupted dump files detected:\n %s\n\nThe backup archive appears to be damaged. You need to restore from a different backup.",
len(corruptedDumps), strings.Join(corruptedDumps, "\n "))
}
e.log.Info("All dump files passed validation")
var failedDBs []string
totalDBs := 0
@@ -1214,3 +1349,48 @@ func FormatBytes(bytes int64) string {
}
return fmt.Sprintf("%.1f %cB", float64(bytes)/float64(div), "KMGTPE"[exp])
}
// quickValidateSQLDump performs a fast validation of SQL dump files
// by checking for truncated COPY blocks. This catches corrupted dumps
// BEFORE attempting a full restore (which could waste 49+ minutes).
func (e *Engine) quickValidateSQLDump(archivePath string, compressed bool) error {
e.log.Debug("Pre-validating SQL dump file", "path", archivePath, "compressed", compressed)
diagnoser := NewDiagnoser(e.log, false) // non-verbose for speed
result, err := diagnoser.DiagnoseFile(archivePath)
if err != nil {
return fmt.Errorf("diagnosis error: %w", err)
}
// Check for critical issues that would cause restore failure
if result.IsTruncated {
errMsg := "SQL dump file is TRUNCATED"
if result.Details != nil && result.Details.UnterminatedCopy {
errMsg = fmt.Sprintf("%s - unterminated COPY block for table '%s' at line %d",
errMsg, result.Details.LastCopyTable, result.Details.LastCopyLineNumber)
if len(result.Details.SampleCopyData) > 0 {
errMsg = fmt.Sprintf("%s (sample orphaned data: %s)", errMsg, result.Details.SampleCopyData[0])
}
}
return fmt.Errorf("%s", errMsg)
}
if result.IsCorrupted {
return fmt.Errorf("SQL dump file is corrupted: %v", result.Errors)
}
if !result.IsValid {
if len(result.Errors) > 0 {
return fmt.Errorf("dump validation failed: %s", result.Errors[0])
}
return fmt.Errorf("dump file is invalid (unknown reason)")
}
// Log any warnings but don't fail
for _, warning := range result.Warnings {
e.log.Warn("Dump validation warning", "warning", warning)
}
e.log.Debug("SQL dump validation passed", "path", archivePath)
return nil
}
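The truncation this validator guards against boils down to a COPY data section that never reaches its "\." terminator line, which is the plain-format pg_dump convention. A minimal sketch of that scan, independent of how the Diagnoser implements it; findUnterminatedCopy is an illustrative name:

    // findUnterminatedCopy scans a plain-format SQL dump and reports a COPY
    // data section that was never closed by a line containing only "\.".
    // The returned table/line are only meaningful when truncated is true.
    func findUnterminatedCopy(r io.Reader) (table string, line int, truncated bool) {
        sc := bufio.NewScanner(r)
        sc.Buffer(make([]byte, 0, 1024*1024), 10*1024*1024) // allow long data rows
        inCopy := false
        n := 0
        for sc.Scan() {
            n++
            text := sc.Text()
            switch {
            case !inCopy && strings.HasPrefix(text, "COPY ") && strings.HasSuffix(text, "FROM stdin;"):
                inCopy = true
                table = strings.Fields(text)[1]
                line = n
            case inCopy && text == `\.`:
                inCopy = false
            }
        }
        return table, line, inCopy
    }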

View File

@@ -0,0 +1,569 @@
package restore
import (
"bufio"
"compress/gzip"
"encoding/json"
"fmt"
"io"
"os"
"os/exec"
"path/filepath"
"runtime"
"strings"
"time"
"dbbackup/internal/config"
"dbbackup/internal/logger"
)
// RestoreErrorReport contains comprehensive information about a restore failure
type RestoreErrorReport struct {
// Metadata
Timestamp time.Time `json:"timestamp"`
Version string `json:"version"`
GoVersion string `json:"go_version"`
OS string `json:"os"`
Arch string `json:"arch"`
// Archive info
ArchivePath string `json:"archive_path"`
ArchiveSize int64 `json:"archive_size"`
ArchiveFormat string `json:"archive_format"`
// Database info
TargetDB string `json:"target_db"`
DatabaseType string `json:"database_type"`
// Error details
ExitCode int `json:"exit_code"`
ErrorMessage string `json:"error_message"`
ErrorType string `json:"error_type"`
ErrorHint string `json:"error_hint"`
TotalErrors int `json:"total_errors"`
// Captured output
LastStderr []string `json:"last_stderr"`
FirstErrors []string `json:"first_errors"`
// Context around failure
FailureContext *FailureContext `json:"failure_context,omitempty"`
// Diagnosis results
DiagnosisResult *DiagnoseResult `json:"diagnosis_result,omitempty"`
// Environment (sanitized)
PostgresVersion string `json:"postgres_version,omitempty"`
PgRestoreVersion string `json:"pg_restore_version,omitempty"`
PsqlVersion string `json:"psql_version,omitempty"`
// Recommendations
Recommendations []string `json:"recommendations"`
}
// FailureContext captures context around where the failure occurred
type FailureContext struct {
// For SQL/COPY errors
FailedLine int `json:"failed_line,omitempty"`
FailedStatement string `json:"failed_statement,omitempty"`
SurroundingLines []string `json:"surrounding_lines,omitempty"`
// For COPY block errors
InCopyBlock bool `json:"in_copy_block,omitempty"`
CopyTableName string `json:"copy_table_name,omitempty"`
CopyStartLine int `json:"copy_start_line,omitempty"`
SampleCopyData []string `json:"sample_copy_data,omitempty"`
// File position info
BytePosition int64 `json:"byte_position,omitempty"`
PercentComplete float64 `json:"percent_complete,omitempty"`
}
// ErrorCollector captures detailed error information during restore
type ErrorCollector struct {
log logger.Logger
cfg *config.Config
archivePath string
targetDB string
format ArchiveFormat
// Captured data
stderrLines []string
firstErrors []string
lastErrors []string
totalErrors int
exitCode int
// Limits
maxStderrLines int
maxErrorCapture int
// State
startTime time.Time
enabled bool
}
// NewErrorCollector creates a new error collector
func NewErrorCollector(cfg *config.Config, log logger.Logger, archivePath, targetDB string, format ArchiveFormat, enabled bool) *ErrorCollector {
return &ErrorCollector{
log: log,
cfg: cfg,
archivePath: archivePath,
targetDB: targetDB,
format: format,
stderrLines: make([]string, 0, 100),
firstErrors: make([]string, 0, 10),
lastErrors: make([]string, 0, 10),
maxStderrLines: 100,
maxErrorCapture: 10,
startTime: time.Now(),
enabled: enabled,
}
}
// CaptureStderr processes and captures stderr output
func (ec *ErrorCollector) CaptureStderr(chunk string) {
if !ec.enabled {
return
}
lines := strings.Split(chunk, "\n")
for _, line := range lines {
line = strings.TrimSpace(line)
if line == "" {
continue
}
// Store last N lines of stderr
if len(ec.stderrLines) >= ec.maxStderrLines {
// Shift array, drop oldest
ec.stderrLines = ec.stderrLines[1:]
}
ec.stderrLines = append(ec.stderrLines, line)
// Check if this is an error line
if isErrorLine(line) {
ec.totalErrors++
// Capture first N errors
if len(ec.firstErrors) < ec.maxErrorCapture {
ec.firstErrors = append(ec.firstErrors, line)
}
// Keep last N errors (ring buffer style)
if len(ec.lastErrors) >= ec.maxErrorCapture {
ec.lastErrors = ec.lastErrors[1:]
}
ec.lastErrors = append(ec.lastErrors, line)
}
}
}
// SetExitCode records the exit code
func (ec *ErrorCollector) SetExitCode(code int) {
ec.exitCode = code
}
// GenerateReport creates a comprehensive error report
func (ec *ErrorCollector) GenerateReport(errMessage string, errType string, errHint string) *RestoreErrorReport {
report := &RestoreErrorReport{
Timestamp: time.Now(),
Version: "1.0.0", // TODO: inject actual version
GoVersion: runtime.Version(),
OS: runtime.GOOS,
Arch: runtime.GOARCH,
ArchivePath: ec.archivePath,
ArchiveFormat: ec.format.String(),
TargetDB: ec.targetDB,
DatabaseType: getDatabaseType(ec.format),
ExitCode: ec.exitCode,
ErrorMessage: errMessage,
ErrorType: errType,
ErrorHint: errHint,
TotalErrors: ec.totalErrors,
LastStderr: ec.stderrLines,
FirstErrors: ec.firstErrors,
}
// Get archive size
if stat, err := os.Stat(ec.archivePath); err == nil {
report.ArchiveSize = stat.Size()
}
// Get tool versions
report.PostgresVersion = getCommandVersion("postgres", "--version")
report.PgRestoreVersion = getCommandVersion("pg_restore", "--version")
report.PsqlVersion = getCommandVersion("psql", "--version")
// Analyze failure context
report.FailureContext = ec.analyzeFailureContext()
// Run diagnosis if not already done
diagnoser := NewDiagnoser(ec.log, false)
if diagResult, err := diagnoser.DiagnoseFile(ec.archivePath); err == nil {
report.DiagnosisResult = diagResult
}
// Generate recommendations
report.Recommendations = ec.generateRecommendations(report)
return report
}
// analyzeFailureContext extracts context around the failure
func (ec *ErrorCollector) analyzeFailureContext() *FailureContext {
ctx := &FailureContext{}
// Look for line number in errors
for _, errLine := range ec.lastErrors {
if lineNum := extractLineNumber(errLine); lineNum > 0 {
ctx.FailedLine = lineNum
break
}
}
// Look for COPY-related errors
for _, errLine := range ec.lastErrors {
if strings.Contains(errLine, "COPY") || strings.Contains(errLine, "syntax error") {
ctx.InCopyBlock = true
// Try to extract table name
if tableName := extractTableName(errLine); tableName != "" {
ctx.CopyTableName = tableName
}
break
}
}
// If we have a line number, try to get surrounding context from the dump
if ctx.FailedLine > 0 && ec.archivePath != "" {
ctx.SurroundingLines = ec.getSurroundingLines(ctx.FailedLine, 5)
}
return ctx
}
// getSurroundingLines reads lines around a specific line number from the dump
func (ec *ErrorCollector) getSurroundingLines(lineNum int, context int) []string {
var reader io.Reader
var lines []string
file, err := os.Open(ec.archivePath)
if err != nil {
return nil
}
defer file.Close()
// Handle compressed files
if strings.HasSuffix(ec.archivePath, ".gz") {
gz, err := gzip.NewReader(file)
if err != nil {
return nil
}
defer gz.Close()
reader = gz
} else {
reader = file
}
scanner := bufio.NewScanner(reader)
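// Allow tokens up to 10 MiB; bufio.Scanner's default limit is 64 KiB, which long COPY rows can exceed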
buf := make([]byte, 0, 1024*1024)
scanner.Buffer(buf, 10*1024*1024)
currentLine := 0
startLine := lineNum - context
endLine := lineNum + context
if startLine < 1 {
startLine = 1
}
for scanner.Scan() {
currentLine++
if currentLine >= startLine && currentLine <= endLine {
prefix := " "
if currentLine == lineNum {
prefix = "> "
}
lines = append(lines, fmt.Sprintf("%s%d: %s", prefix, currentLine, truncateString(scanner.Text(), 100)))
}
if currentLine > endLine {
break
}
}
return lines
}
// generateRecommendations provides actionable recommendations based on the error
func (ec *ErrorCollector) generateRecommendations(report *RestoreErrorReport) []string {
var recs []string
// Check diagnosis results
if report.DiagnosisResult != nil {
if report.DiagnosisResult.IsTruncated {
recs = append(recs,
"CRITICAL: Backup file is truncated/incomplete",
"Action: Re-run the backup for the affected database",
"Check: Verify disk space was available during backup",
"Check: Verify network was stable during backup transfer",
)
}
if report.DiagnosisResult.IsCorrupted {
recs = append(recs,
"CRITICAL: Backup file appears corrupted",
"Action: Restore from a previous backup",
"Action: Verify backup file checksum if available",
)
}
if report.DiagnosisResult.Details != nil && report.DiagnosisResult.Details.UnterminatedCopy {
recs = append(recs,
fmt.Sprintf("ISSUE: COPY block for table '%s' was not terminated",
report.DiagnosisResult.Details.LastCopyTable),
"Cause: Backup was interrupted during data export",
"Action: Re-run backup ensuring it completes fully",
)
}
}
// Check error patterns
if report.TotalErrors > 1000000 {
recs = append(recs,
"ISSUE: Millions of errors indicate structural problem, not individual data issues",
"Cause: Likely wrong restore method or truncated dump",
"Check: Verify dump format matches restore command",
)
}
// Check for common error types
errLower := strings.ToLower(report.ErrorMessage)
if strings.Contains(errLower, "syntax error") {
recs = append(recs,
"ISSUE: SQL syntax errors during restore",
"Cause: COPY data being interpreted as SQL commands",
"Check: Run 'dbbackup restore diagnose <archive>' for detailed analysis",
)
}
if strings.Contains(errLower, "permission denied") {
recs = append(recs,
"ISSUE: Permission denied",
"Action: Check database user has sufficient privileges",
"Action: For ownership preservation, use a superuser account",
)
}
if strings.Contains(errLower, "does not exist") {
recs = append(recs,
"ISSUE: Missing object reference",
"Action: Ensure globals.sql was restored first (for roles/tablespaces)",
"Action: Check if target database was created",
)
}
if len(recs) == 0 {
recs = append(recs,
"Run 'dbbackup restore diagnose <archive>' for detailed analysis",
"Check the stderr output above for specific error messages",
"Review the PostgreSQL/MySQL logs on the target server",
)
}
return recs
}
// SaveReport saves the error report to a file
func (ec *ErrorCollector) SaveReport(report *RestoreErrorReport, outputPath string) error {
// Create directory if needed
dir := filepath.Dir(outputPath)
if err := os.MkdirAll(dir, 0755); err != nil {
return fmt.Errorf("failed to create directory: %w", err)
}
// Marshal to JSON with indentation
data, err := json.MarshalIndent(report, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal report: %w", err)
}
// Write file
if err := os.WriteFile(outputPath, data, 0644); err != nil {
return fmt.Errorf("failed to write report: %w", err)
}
return nil
}
// PrintReport prints a human-readable summary of the error report
func (ec *ErrorCollector) PrintReport(report *RestoreErrorReport) {
fmt.Println()
fmt.Println(strings.Repeat("═", 70))
fmt.Println(" 🔴 RESTORE ERROR REPORT")
fmt.Println(strings.Repeat("═", 70))
fmt.Printf("\n📅 Timestamp: %s\n", report.Timestamp.Format("2006-01-02 15:04:05"))
fmt.Printf("📦 Archive: %s\n", filepath.Base(report.ArchivePath))
fmt.Printf("📊 Format: %s\n", report.ArchiveFormat)
fmt.Printf("🎯 Target DB: %s\n", report.TargetDB)
fmt.Printf("⚠️ Exit Code: %d\n", report.ExitCode)
fmt.Printf("❌ Total Errors: %d\n", report.TotalErrors)
fmt.Println("\n" + strings.Repeat("─", 70))
fmt.Println("ERROR DETAILS:")
fmt.Println(strings.Repeat("─", 70))
fmt.Printf("\nType: %s\n", report.ErrorType)
fmt.Printf("Message: %s\n", report.ErrorMessage)
if report.ErrorHint != "" {
fmt.Printf("Hint: %s\n", report.ErrorHint)
}
// Show failure context
if report.FailureContext != nil && report.FailureContext.FailedLine > 0 {
fmt.Println("\n" + strings.Repeat("─", 70))
fmt.Println("FAILURE CONTEXT:")
fmt.Println(strings.Repeat("─", 70))
fmt.Printf("\nFailed at line: %d\n", report.FailureContext.FailedLine)
if report.FailureContext.InCopyBlock {
fmt.Printf("Inside COPY block for table: %s\n", report.FailureContext.CopyTableName)
}
if len(report.FailureContext.SurroundingLines) > 0 {
fmt.Println("\nSurrounding lines:")
for _, line := range report.FailureContext.SurroundingLines {
fmt.Println(line)
}
}
}
// Show first few errors
if len(report.FirstErrors) > 0 {
fmt.Println("\n" + strings.Repeat("─", 70))
fmt.Println("FIRST ERRORS:")
fmt.Println(strings.Repeat("─", 70))
for i, err := range report.FirstErrors {
if i >= 5 {
fmt.Printf("... and %d more\n", len(report.FirstErrors)-5)
break
}
fmt.Printf(" %d. %s\n", i+1, truncateString(err, 100))
}
}
// Show diagnosis summary
if report.DiagnosisResult != nil && !report.DiagnosisResult.IsValid {
fmt.Println("\n" + strings.Repeat("─", 70))
fmt.Println("DIAGNOSIS:")
fmt.Println(strings.Repeat("─", 70))
if report.DiagnosisResult.IsTruncated {
fmt.Println(" ❌ File is TRUNCATED")
}
if report.DiagnosisResult.IsCorrupted {
fmt.Println(" ❌ File is CORRUPTED")
}
for i, err := range report.DiagnosisResult.Errors {
if i >= 3 {
break
}
fmt.Printf(" • %s\n", err)
}
}
// Show recommendations
fmt.Println("\n" + strings.Repeat("─", 70))
fmt.Println("💡 RECOMMENDATIONS:")
fmt.Println(strings.Repeat("─", 70))
for _, rec := range report.Recommendations {
fmt.Printf(" • %s\n", rec)
}
// Show tool versions
fmt.Println("\n" + strings.Repeat("─", 70))
fmt.Println("ENVIRONMENT:")
fmt.Println(strings.Repeat("─", 70))
fmt.Printf(" OS: %s/%s\n", report.OS, report.Arch)
fmt.Printf(" Go: %s\n", report.GoVersion)
if report.PgRestoreVersion != "" {
fmt.Printf(" pg_restore: %s\n", report.PgRestoreVersion)
}
if report.PsqlVersion != "" {
fmt.Printf(" psql: %s\n", report.PsqlVersion)
}
fmt.Println(strings.Repeat("═", 70))
}
// Helper functions
func isErrorLine(line string) bool {
return strings.Contains(line, "ERROR:") ||
strings.Contains(line, "FATAL:") ||
strings.Contains(line, "error:") ||
strings.Contains(line, "PANIC:")
}
func extractLineNumber(errLine string) int {
// Look for patterns like "LINE 1:" or "line 123"
patterns := []string{"LINE ", "line "}
for _, pattern := range patterns {
if idx := strings.Index(errLine, pattern); idx >= 0 {
numStart := idx + len(pattern)
numEnd := numStart
for numEnd < len(errLine) && errLine[numEnd] >= '0' && errLine[numEnd] <= '9' {
numEnd++
}
if numEnd > numStart {
var num int
fmt.Sscanf(errLine[numStart:numEnd], "%d", &num)
return num
}
}
}
return 0
}
func extractTableName(errLine string) string {
// Look for patterns like 'COPY "tablename"' or 'table "tablename"'
patterns := []string{"COPY ", "table "}
for _, pattern := range patterns {
if idx := strings.Index(errLine, pattern); idx >= 0 {
start := idx + len(pattern)
// Skip optional quote
if start < len(errLine) && errLine[start] == '"' {
start++
}
end := start
for end < len(errLine) && errLine[end] != '"' && errLine[end] != ' ' && errLine[end] != '(' {
end++
}
if end > start {
return errLine[start:end]
}
}
}
return ""
}
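A few worked values for the two extractors above; the inputs are representative psql/pg_restore-style messages, not captured output:

    // extractLineNumber(`ERROR:  syntax error at or near "x" LINE 42: x y z`)  -> 42
    // extractTableName(`COPY public.users (id, name) FROM stdin;`)             -> "public.users"
    // extractTableName(`ERROR: missing data for column "b" of table "orders"`) -> "orders"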
func getDatabaseType(format ArchiveFormat) string {
if format.IsMySQL() {
return "mysql"
}
return "postgresql"
}
func getCommandVersion(cmd string, arg string) string {
output, err := exec.Command(cmd, arg).CombinedOutput()
if err != nil {
return ""
}
// Return first line only
lines := strings.Split(string(output), "\n")
if len(lines) > 0 {
return strings.TrimSpace(lines[0])
}
return ""
}
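Putting the pieces together, the collector is driven the same way the engine wires it into executeRestoreCommandWithContext: capture stderr as it streams, record the exit code, then generate, print, and save the report. A minimal sketch assuming it lives in the same restore package; reportRestoreFailure, its parameters, and the fixed output path are illustrative:

    func reportRestoreFailure(cfg *config.Config, log logger.Logger,
        archivePath, targetDB, lastError, errType, errHint string,
        stderrChunks []string, exitCode int) {
        collector := NewErrorCollector(cfg, log, archivePath, targetDB, FormatUnknown, true)
        for _, chunk := range stderrChunks {
            collector.CaptureStderr(chunk) // normally fed from the live stderr pipe
        }
        collector.SetExitCode(exitCode)
        report := collector.GenerateReport(lastError, errType, errHint)
        collector.PrintReport(report)
        if err := collector.SaveReport(report, "/tmp/dbbackup-restore-debug.json"); err != nil {
            log.Warn("Failed to save debug log", "error", err)
        }
    }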

View File

@@ -201,6 +201,12 @@ func (m ArchiveBrowserModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
if len(m.archives) > 0 && m.cursor < len(m.archives) {
selected := m.archives[m.cursor]
// Handle diagnose mode - go directly to diagnosis view
if m.mode == "diagnose" {
diagnoseView := NewDiagnoseView(m.config, m.logger, m.parent, m.ctx, selected)
return diagnoseView, diagnoseView.Init()
}
// Validate selection based on mode
if m.mode == "restore-cluster" && !selected.Format.IsClusterBackup() {
m.message = errorStyle.Render("❌ Please select a cluster backup (.tar.gz)")
@@ -227,6 +233,14 @@ func (m ArchiveBrowserModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
formatSize(selected.Size),
selected.Modified.Format("2006-01-02 15:04:05"))
}
case "d":
// Run diagnosis on selected archive
if len(m.archives) > 0 && m.cursor < len(m.archives) {
selected := m.archives[m.cursor]
diagnoseView := NewDiagnoseView(m.config, m.logger, m, m.ctx, selected)
return diagnoseView, diagnoseView.Init()
}
}
}
@@ -242,6 +256,8 @@ func (m ArchiveBrowserModel) View() string {
title = "📦 Select Archive to Restore (Single Database)"
} else if m.mode == "restore-cluster" {
title = "📦 Select Archive to Restore (Cluster)"
} else if m.mode == "diagnose" {
title = "🔍 Select Archive to Diagnose"
}
s.WriteString(titleStyle.Render(title))
@@ -335,7 +351,7 @@ func (m ArchiveBrowserModel) View() string {
s.WriteString(infoStyle.Render(fmt.Sprintf("Total: %d archive(s) | Selected: %d/%d",
len(m.archives), m.cursor+1, len(m.archives))))
s.WriteString("\n")
s.WriteString(infoStyle.Render("⌨️ ↑/↓: Navigate | Enter: Select | f: Filter | i: Info | Esc: Back"))
s.WriteString(infoStyle.Render("⌨️ ↑/↓: Navigate | Enter: Select | d: Diagnose | f: Filter | i: Info | Esc: Back"))
return s.String()
}

View File

@@ -20,12 +20,14 @@ type BackupExecutionModel struct {
logger logger.Logger
parent tea.Model
ctx context.Context
cancel context.CancelFunc // Cancel function to stop the operation
backupType string
databaseName string
ratio int
status string
progress int
done bool
cancelling bool // True when user has requested cancellation
err error
result string
startTime time.Time
@@ -34,11 +36,14 @@ type BackupExecutionModel struct {
}
func NewBackupExecution(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, backupType, dbName string, ratio int) BackupExecutionModel {
// Create a cancellable context derived from parent
childCtx, cancel := context.WithCancel(ctx)
return BackupExecutionModel{
config: cfg,
logger: log,
parent: parent,
ctx: ctx,
ctx: childCtx,
cancel: cancel,
backupType: backupType,
databaseName: dbName,
ratio: ratio,
@@ -206,9 +211,21 @@ func (m BackupExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
return m, nil
case tea.KeyMsg:
if m.done {
switch msg.String() {
case "enter", "esc", "q":
switch msg.String() {
case "ctrl+c", "esc":
if !m.done && !m.cancelling {
// User requested cancellation - cancel the context
m.cancelling = true
m.status = "⏹️ Cancelling backup... (please wait)"
if m.cancel != nil {
m.cancel()
}
return m, nil
} else if m.done {
return m.parent, nil
}
case "enter", "q":
if m.done {
return m.parent, nil
}
}
@@ -240,7 +257,12 @@ func (m BackupExecutionModel) View() string {
// Status with spinner
if !m.done {
s.WriteString(fmt.Sprintf(" %s %s\n", spinnerFrames[m.spinnerFrame], m.status))
if m.cancelling {
s.WriteString(fmt.Sprintf(" %s %s\n", spinnerFrames[m.spinnerFrame], m.status))
} else {
s.WriteString(fmt.Sprintf(" %s %s\n", spinnerFrames[m.spinnerFrame], m.status))
s.WriteString("\n ⌨️ Press Ctrl+C or ESC to cancel\n")
}
} else {
s.WriteString(fmt.Sprintf(" %s\n\n", m.status))

View File

@@ -0,0 +1,463 @@
package tui
import (
"context"
"fmt"
"os"
"strings"
tea "github.com/charmbracelet/bubbletea"
"github.com/charmbracelet/lipgloss"
"dbbackup/internal/config"
"dbbackup/internal/logger"
"dbbackup/internal/restore"
)
var (
diagnoseBoxStyle = lipgloss.NewStyle().
Border(lipgloss.RoundedBorder()).
BorderForeground(lipgloss.Color("63")).
Padding(1, 2)
diagnosePassStyle = lipgloss.NewStyle().
Foreground(lipgloss.Color("2")).
Bold(true)
diagnoseFailStyle = lipgloss.NewStyle().
Foreground(lipgloss.Color("1")).
Bold(true)
diagnoseWarnStyle = lipgloss.NewStyle().
Foreground(lipgloss.Color("3"))
diagnoseInfoStyle = lipgloss.NewStyle().
Foreground(lipgloss.Color("244"))
diagnoseHeaderStyle = lipgloss.NewStyle().
Foreground(lipgloss.Color("63")).
Bold(true)
)
// DiagnoseViewModel shows backup file diagnosis results
type DiagnoseViewModel struct {
config *config.Config
logger logger.Logger
parent tea.Model
ctx context.Context
archive ArchiveInfo
result *restore.DiagnoseResult
results []*restore.DiagnoseResult // For cluster archives
running bool
completed bool
progress string
cursor int // For scrolling through cluster results
err error
}
// NewDiagnoseView creates a new diagnose view
func NewDiagnoseView(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, archive ArchiveInfo) DiagnoseViewModel {
return DiagnoseViewModel{
config: cfg,
logger: log,
parent: parent,
ctx: ctx,
archive: archive,
running: true,
progress: "Starting diagnosis...",
}
}
func (m DiagnoseViewModel) Init() tea.Cmd {
return runDiagnosis(m.config, m.logger, m.archive)
}
type diagnoseCompleteMsg struct {
result *restore.DiagnoseResult
results []*restore.DiagnoseResult
err error
}
type diagnoseProgressMsg struct {
message string
}
func runDiagnosis(cfg *config.Config, log logger.Logger, archive ArchiveInfo) tea.Cmd {
return func() tea.Msg {
diagnoser := restore.NewDiagnoser(log, true)
// For cluster archives, we can do deep analysis
if archive.Format.IsClusterBackup() {
// Create temp directory (use WorkDir if configured for large archives)
log.Info("Creating temp directory for diagnosis", "workDir", cfg.WorkDir)
tempDir, err := createTempDirIn(cfg.WorkDir, "dbbackup-diagnose-*")
if err != nil {
return diagnoseCompleteMsg{err: fmt.Errorf("failed to create temp dir (workDir=%s): %w", cfg.WorkDir, err)}
}
log.Info("Using temp directory", "path", tempDir)
defer removeTempDir(tempDir)
// Diagnose all dumps in the cluster
results, err := diagnoser.DiagnoseClusterDumps(archive.Path, tempDir)
if err != nil {
return diagnoseCompleteMsg{err: err}
}
return diagnoseCompleteMsg{results: results}
}
// Single file diagnosis
result, err := diagnoser.DiagnoseFile(archive.Path)
if err != nil {
return diagnoseCompleteMsg{err: err}
}
return diagnoseCompleteMsg{result: result}
}
}
func (m DiagnoseViewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
switch msg := msg.(type) {
case diagnoseCompleteMsg:
m.running = false
m.completed = true
if msg.err != nil {
m.err = msg.err
return m, nil
}
m.result = msg.result
m.results = msg.results
return m, nil
case diagnoseProgressMsg:
m.progress = msg.message
return m, nil
case tea.KeyMsg:
switch msg.String() {
case "ctrl+c", "q", "esc":
return m.parent, nil
case "up", "k":
if len(m.results) > 0 && m.cursor > 0 {
m.cursor--
}
case "down", "j":
if len(m.results) > 0 && m.cursor < len(m.results)-1 {
m.cursor++
}
case "enter", " ":
return m.parent, nil
}
}
return m, nil
}
func (m DiagnoseViewModel) View() string {
var s strings.Builder
// Header
s.WriteString(titleStyle.Render("🔍 Backup Diagnosis"))
s.WriteString("\n\n")
// Archive info
s.WriteString(diagnoseHeaderStyle.Render("Archive: "))
s.WriteString(m.archive.Name)
s.WriteString("\n")
s.WriteString(diagnoseHeaderStyle.Render("Format: "))
s.WriteString(m.archive.Format.String())
s.WriteString("\n")
s.WriteString(diagnoseHeaderStyle.Render("Size: "))
s.WriteString(formatSize(m.archive.Size))
s.WriteString("\n\n")
if m.running {
s.WriteString(infoStyle.Render("⏳ " + m.progress))
s.WriteString("\n\n")
s.WriteString(diagnoseInfoStyle.Render("This may take a while for large archives..."))
return s.String()
}
if m.err != nil {
s.WriteString(errorStyle.Render(fmt.Sprintf("❌ Diagnosis failed: %v", m.err)))
s.WriteString("\n\n")
s.WriteString(infoStyle.Render("Press Enter or Esc to go back"))
return s.String()
}
// For cluster archives, show summary + details
if len(m.results) > 0 {
s.WriteString(m.renderClusterResults())
} else if m.result != nil {
s.WriteString(m.renderSingleResult(m.result))
}
s.WriteString("\n")
s.WriteString(infoStyle.Render("Press Enter or Esc to go back"))
return s.String()
}
func (m DiagnoseViewModel) renderSingleResult(result *restore.DiagnoseResult) string {
var s strings.Builder
// Status
s.WriteString(strings.Repeat("─", 60))
s.WriteString("\n")
if result.IsValid {
s.WriteString(diagnosePassStyle.Render("✅ STATUS: VALID"))
} else {
s.WriteString(diagnoseFailStyle.Render("❌ STATUS: INVALID"))
}
s.WriteString("\n")
if result.IsTruncated {
s.WriteString(diagnoseFailStyle.Render("⚠️ TRUNCATED: File appears incomplete"))
s.WriteString("\n")
}
if result.IsCorrupted {
s.WriteString(diagnoseFailStyle.Render("⚠️ CORRUPTED: File structure is damaged"))
s.WriteString("\n")
}
s.WriteString(strings.Repeat("─", 60))
s.WriteString("\n\n")
// Details
if result.Details != nil {
s.WriteString(diagnoseHeaderStyle.Render("📊 DETAILS:"))
s.WriteString("\n")
if result.Details.HasPGDMPSignature {
s.WriteString(diagnosePassStyle.Render(" ✓ "))
s.WriteString("Has PGDMP signature (custom format)\n")
}
if result.Details.HasSQLHeader {
s.WriteString(diagnosePassStyle.Render(" ✓ "))
s.WriteString("Has PostgreSQL SQL header\n")
}
if result.Details.GzipValid {
s.WriteString(diagnosePassStyle.Render(" ✓ "))
s.WriteString("Gzip compression valid\n")
}
if result.Details.PgRestoreListable {
s.WriteString(diagnosePassStyle.Render(" ✓ "))
s.WriteString(fmt.Sprintf("pg_restore can list contents (%d tables)\n", result.Details.TableCount))
}
if result.Details.CopyBlockCount > 0 {
s.WriteString(diagnoseInfoStyle.Render(" • "))
s.WriteString(fmt.Sprintf("Contains %d COPY blocks\n", result.Details.CopyBlockCount))
}
if result.Details.UnterminatedCopy {
s.WriteString(diagnoseFailStyle.Render(" ✗ "))
s.WriteString(fmt.Sprintf("Unterminated COPY block: %s (line %d)\n",
result.Details.LastCopyTable, result.Details.LastCopyLineNumber))
}
if result.Details.ProperlyTerminated {
s.WriteString(diagnosePassStyle.Render(" ✓ "))
s.WriteString("All COPY blocks properly terminated\n")
}
if result.Details.ExpandedSize > 0 {
s.WriteString(diagnoseInfoStyle.Render(" • "))
s.WriteString(fmt.Sprintf("Expanded size: %s (ratio: %.1fx)\n",
formatSize(result.Details.ExpandedSize), result.Details.CompressionRatio))
}
}
// Errors
if len(result.Errors) > 0 {
s.WriteString("\n")
s.WriteString(diagnoseFailStyle.Render("❌ ERRORS:"))
s.WriteString("\n")
for i, e := range result.Errors {
if i >= 5 {
s.WriteString(diagnoseInfoStyle.Render(fmt.Sprintf(" ... and %d more\n", len(result.Errors)-5)))
break
}
s.WriteString(diagnoseFailStyle.Render(" • "))
s.WriteString(truncate(e, 70))
s.WriteString("\n")
}
}
// Warnings
if len(result.Warnings) > 0 {
s.WriteString("\n")
s.WriteString(diagnoseWarnStyle.Render("⚠️ WARNINGS:"))
s.WriteString("\n")
for i, w := range result.Warnings {
if i >= 3 {
s.WriteString(diagnoseInfoStyle.Render(fmt.Sprintf(" ... and %d more\n", len(result.Warnings)-3)))
break
}
s.WriteString(diagnoseWarnStyle.Render(" • "))
s.WriteString(truncate(w, 70))
s.WriteString("\n")
}
}
// Recommendations
if !result.IsValid {
s.WriteString("\n")
s.WriteString(diagnoseHeaderStyle.Render("💡 RECOMMENDATIONS:"))
s.WriteString("\n")
if result.IsTruncated {
s.WriteString(" 1. Re-run the backup process for this database\n")
s.WriteString(" 2. Check disk space on backup server\n")
s.WriteString(" 3. Verify network stability for remote backups\n")
}
if result.IsCorrupted {
s.WriteString(" 1. Verify backup was transferred completely\n")
s.WriteString(" 2. Try restoring from a previous backup\n")
}
}
return s.String()
}
func (m DiagnoseViewModel) renderClusterResults() string {
var s strings.Builder
// Summary
validCount := 0
invalidCount := 0
for _, r := range m.results {
if r.IsValid {
validCount++
} else {
invalidCount++
}
}
s.WriteString(strings.Repeat("─", 60))
s.WriteString("\n")
s.WriteString(diagnoseHeaderStyle.Render(fmt.Sprintf("📊 CLUSTER SUMMARY: %d databases\n", len(m.results))))
s.WriteString(strings.Repeat("─", 60))
s.WriteString("\n\n")
if invalidCount == 0 {
s.WriteString(diagnosePassStyle.Render("✅ All dumps are valid"))
s.WriteString("\n\n")
} else {
s.WriteString(diagnoseFailStyle.Render(fmt.Sprintf("❌ %d/%d dumps have issues", invalidCount, len(m.results))))
s.WriteString("\n\n")
}
// List all dumps with status
s.WriteString(diagnoseHeaderStyle.Render("Database Dumps:"))
s.WriteString("\n")
// Show visible range based on cursor
start := m.cursor - 5
if start < 0 {
start = 0
}
end := start + 12
if end > len(m.results) {
end = len(m.results)
}
for i := start; i < end; i++ {
r := m.results[i]
cursor := " "
if i == m.cursor {
cursor = ">"
}
var status string
if r.IsValid {
status = diagnosePassStyle.Render("✓")
} else if r.IsTruncated {
status = diagnoseFailStyle.Render("✗ TRUNCATED")
} else if r.IsCorrupted {
status = diagnoseFailStyle.Render("✗ CORRUPTED")
} else {
status = diagnoseFailStyle.Render("✗ INVALID")
}
line := fmt.Sprintf("%s %s %-35s %s",
cursor,
status,
truncate(r.FileName, 35),
formatSize(r.FileSize))
if i == m.cursor {
s.WriteString(archiveSelectedStyle.Render(line))
} else {
s.WriteString(line)
}
s.WriteString("\n")
}
// Show selected dump details
if m.cursor < len(m.results) {
selected := m.results[m.cursor]
s.WriteString("\n")
s.WriteString(strings.Repeat("─", 60))
s.WriteString("\n")
s.WriteString(diagnoseHeaderStyle.Render("Selected: " + selected.FileName))
s.WriteString("\n\n")
// Show condensed details for selected
if selected.Details != nil {
if selected.Details.UnterminatedCopy {
s.WriteString(diagnoseFailStyle.Render(" ✗ Unterminated COPY: "))
s.WriteString(selected.Details.LastCopyTable)
s.WriteString(fmt.Sprintf(" (line %d)\n", selected.Details.LastCopyLineNumber))
}
if len(selected.Details.SampleCopyData) > 0 {
s.WriteString(diagnoseInfoStyle.Render(" Sample orphaned data: "))
s.WriteString(truncate(selected.Details.SampleCopyData[0], 50))
s.WriteString("\n")
}
}
if len(selected.Errors) > 0 {
for i, e := range selected.Errors {
if i >= 2 {
break
}
s.WriteString(diagnoseFailStyle.Render(" • "))
s.WriteString(truncate(e, 55))
s.WriteString("\n")
}
}
}
s.WriteString("\n")
s.WriteString(infoStyle.Render("Use ↑/↓ to browse, Enter/Esc to go back"))
return s.String()
}
// Helper functions for temp directory management
func createTempDir(pattern string) (string, error) {
return os.MkdirTemp("", pattern)
}
func createTempDirIn(baseDir, pattern string) (string, error) {
if baseDir == "" {
return os.MkdirTemp("", pattern)
}
// Ensure base directory exists
if err := os.MkdirAll(baseDir, 0755); err != nil {
return "", fmt.Errorf("cannot create work directory: %w", err)
}
return os.MkdirTemp(baseDir, pattern)
}
func removeTempDir(path string) error {
return os.RemoveAll(path)
}

View File

@@ -92,6 +92,7 @@ func NewMenuModel(cfg *config.Config, log logger.Logger) *MenuModel {
"────────────────────────────────",
"Restore Single Database",
"Restore Cluster Backup",
"Diagnose Backup File",
"List & Manage Backups",
"────────────────────────────────",
"View Active Operations",
@@ -163,19 +164,21 @@ func (m *MenuModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
return m.handleRestoreSingle()
case 5: // Restore Cluster Backup
return m.handleRestoreCluster()
case 6: // List & Manage Backups
case 6: // Diagnose Backup File
return m.handleDiagnoseBackup()
case 7: // List & Manage Backups
return m.handleBackupManager()
case 8: // View Active Operations
case 9: // View Active Operations
return m.handleViewOperations()
case 9: // Show Operation History
case 10: // Show Operation History
return m.handleOperationHistory()
case 10: // Database Status
case 11: // Database Status
return m.handleStatus()
case 11: // Settings
case 12: // Settings
return m.handleSettings()
case 12: // Clear History
case 13: // Clear History
m.message = "🗑️ History cleared"
case 13: // Quit
case 14: // Quit
if m.cancel != nil {
m.cancel()
}
@@ -244,21 +247,23 @@ func (m *MenuModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
return m.handleRestoreSingle()
case 5: // Restore Cluster Backup
return m.handleRestoreCluster()
case 6: // List & Manage Backups
case 6: // Diagnose Backup File
return m.handleDiagnoseBackup()
case 7: // List & Manage Backups
return m.handleBackupManager()
case 7: // Separator
case 8: // Separator
// Do nothing
case 8: // View Active Operations
case 9: // View Active Operations
return m.handleViewOperations()
case 9: // Show Operation History
case 10: // Show Operation History
return m.handleOperationHistory()
case 10: // Database Status
case 11: // Database Status
return m.handleStatus()
case 11: // Settings
case 12: // Settings
return m.handleSettings()
case 12: // Clear History
case 13: // Clear History
m.message = "🗑️ History cleared"
case 13: // Quit
case 14: // Quit
if m.cancel != nil {
m.cancel()
}
@@ -407,6 +412,12 @@ func (m *MenuModel) handleBackupManager() (tea.Model, tea.Cmd) {
return manager, manager.Init()
}
// handleDiagnoseBackup opens archive browser for diagnosis
func (m *MenuModel) handleDiagnoseBackup() (tea.Model, tea.Cmd) {
browser := NewArchiveBrowser(m.config, m.logger, m, m.ctx, "diagnose")
return browser, browser.Init()
}
func (m *MenuModel) applyDatabaseSelection() {
if m == nil || len(m.dbTypes) == 0 {
return

View File

@@ -24,6 +24,7 @@ type RestoreExecutionModel struct {
logger logger.Logger
parent tea.Model
ctx context.Context
cancel context.CancelFunc // Cancel function to stop the operation
archive ArchiveInfo
targetDB string
cleanFirst bool
@@ -31,6 +32,8 @@ type RestoreExecutionModel struct {
restoreType string
cleanClusterFirst bool // Drop all user databases before cluster restore
existingDBs []string // List of databases to drop
saveDebugLog bool // Save detailed error report on failure
workDir string // Custom work directory for extraction
// Progress tracking
status string
@@ -42,19 +45,23 @@ type RestoreExecutionModel struct {
spinnerFrames []string
// Results
done bool
err error
result string
elapsed time.Duration
done bool
cancelling bool // True when user has requested cancellation
err error
result string
elapsed time.Duration
}
// NewRestoreExecution creates a new restore execution model
func NewRestoreExecution(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, archive ArchiveInfo, targetDB string, cleanFirst, createIfMissing bool, restoreType string, cleanClusterFirst bool, existingDBs []string) RestoreExecutionModel {
func NewRestoreExecution(cfg *config.Config, log logger.Logger, parent tea.Model, ctx context.Context, archive ArchiveInfo, targetDB string, cleanFirst, createIfMissing bool, restoreType string, cleanClusterFirst bool, existingDBs []string, saveDebugLog bool, workDir string) RestoreExecutionModel {
// Create a cancellable context derived from parent
childCtx, cancel := context.WithCancel(ctx)
return RestoreExecutionModel{
config: cfg,
logger: log,
parent: parent,
ctx: ctx,
ctx: childCtx,
cancel: cancel,
archive: archive,
targetDB: targetDB,
cleanFirst: cleanFirst,
@@ -62,6 +69,8 @@ func NewRestoreExecution(cfg *config.Config, log logger.Logger, parent tea.Model
restoreType: restoreType,
cleanClusterFirst: cleanClusterFirst,
existingDBs: existingDBs,
saveDebugLog: saveDebugLog,
workDir: workDir,
status: "Initializing...",
phase: "Starting",
startTime: time.Now(),
@@ -73,7 +82,7 @@ func NewRestoreExecution(cfg *config.Config, log logger.Logger, parent tea.Model
func (m RestoreExecutionModel) Init() tea.Cmd {
return tea.Batch(
executeRestoreWithTUIProgress(m.ctx, m.config, m.logger, m.archive, m.targetDB, m.cleanFirst, m.createIfMissing, m.restoreType, m.cleanClusterFirst, m.existingDBs),
executeRestoreWithTUIProgress(m.ctx, m.config, m.logger, m.archive, m.targetDB, m.cleanFirst, m.createIfMissing, m.restoreType, m.cleanClusterFirst, m.existingDBs, m.saveDebugLog),
restoreTickCmd(),
)
}
@@ -99,7 +108,7 @@ type restoreCompleteMsg struct {
elapsed time.Duration
}
func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string, cleanFirst, createIfMissing bool, restoreType string, cleanClusterFirst bool, existingDBs []string) tea.Cmd {
func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string, cleanFirst, createIfMissing bool, restoreType string, cleanClusterFirst bool, existingDBs []string, saveDebugLog bool) tea.Cmd {
return func() tea.Msg {
// Use configurable cluster timeout (minutes) from config; default set in config.New()
// Use parent context to inherit cancellation from TUI
@@ -146,6 +155,14 @@ func executeRestoreWithTUIProgress(parentCtx context.Context, cfg *config.Config
// STEP 2: Create restore engine with silent progress (no stdout interference with TUI)
engine := restore.NewSilent(cfg, log, dbClient)
// Enable debug logging if requested
if saveDebugLog {
// Generate debug log path based on archive name and timestamp
debugLogPath := fmt.Sprintf("/tmp/dbbackup-restore-debug-%s.json", time.Now().Format("20060102-150405"))
engine.SetDebugLogPath(debugLogPath)
log.Info("Debug logging enabled", "path", debugLogPath)
}
// Set up progress callback (but it won't work in goroutine - progress is already sent via logs)
// The TUI will just use spinner animation to show activity
@@ -262,11 +279,32 @@ func (m RestoreExecutionModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
case tea.KeyMsg:
switch msg.String() {
case "ctrl+c", "q":
// Always allow quitting
return m.parent, tea.Quit
case "enter", " ", "esc":
case "ctrl+c", "esc":
if !m.done && !m.cancelling {
// User requested cancellation - cancel the context
m.cancelling = true
m.status = "⏹️ Cancelling restore... (please wait)"
m.phase = "Cancelling"
if m.cancel != nil {
m.cancel()
}
return m, nil
} else if m.done {
return m.parent, nil
}
case "q":
if !m.done && !m.cancelling {
m.cancelling = true
m.status = "⏹️ Cancelling restore... (please wait)"
m.phase = "Cancelling"
if m.cancel != nil {
m.cancel()
}
return m, nil
} else if m.done {
return m.parent, tea.Quit
}
case "enter", " ":
if m.done {
return m.parent, nil
}

View File

@@ -59,6 +59,8 @@ type RestorePreviewModel struct {
checking bool
canProceed bool
message string
saveDebugLog bool // Save detailed error report on failure
workDir string // Custom work directory for extraction
}
// NewRestorePreview creates a new restore preview
@@ -80,8 +82,10 @@ func NewRestorePreview(cfg *config.Config, log logger.Logger, parent tea.Model,
cleanFirst: false,
createIfMissing: true,
checking: true,
workDir: cfg.WorkDir, // Use configured work directory
safetyChecks: []SafetyCheck{
{Name: "Archive integrity", Status: "pending", Critical: true},
{Name: "Dump validity", Status: "pending", Critical: true},
{Name: "Disk space", Status: "pending", Critical: true},
{Name: "Required tools", Status: "pending", Critical: true},
{Name: "Target database", Status: "pending", Critical: false},
@@ -102,7 +106,7 @@ type safetyCheckCompleteMsg struct {
func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo, targetDB string) tea.Cmd {
return func() tea.Msg {
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
safety := restore.NewSafety(cfg, log)
@@ -121,7 +125,33 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
}
checks = append(checks, check)
// 2. Disk space
// 2. Dump validity (deep diagnosis)
check = SafetyCheck{Name: "Dump validity", Status: "checking", Critical: true}
diagnoser := restore.NewDiagnoser(log, false)
diagResult, diagErr := diagnoser.DiagnoseFile(archive.Path)
if diagErr != nil {
check.Status = "warning"
check.Message = fmt.Sprintf("Cannot diagnose: %v", diagErr)
} else if !diagResult.IsValid {
check.Status = "failed"
check.Critical = true
if diagResult.IsTruncated {
check.Message = "Dump is TRUNCATED - restore will fail"
} else if diagResult.IsCorrupted {
check.Message = "Dump is CORRUPTED - restore will fail"
} else if len(diagResult.Errors) > 0 {
check.Message = diagResult.Errors[0]
} else {
check.Message = "Dump has validation errors"
}
canProceed = false
} else {
check.Status = "passed"
check.Message = "Dump structure verified"
}
checks = append(checks, check)
// 3. Disk space
check = SafetyCheck{Name: "Disk space", Status: "checking", Critical: true}
multiplier := 3.0
if archive.Format.IsClusterBackup() {
@@ -137,7 +167,7 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
}
checks = append(checks, check)
// 3. Required tools
// 4. Required tools
check = SafetyCheck{Name: "Required tools", Status: "checking", Critical: true}
dbType := "postgres"
if archive.Format.IsMySQL() {
@@ -153,7 +183,7 @@ func runSafetyChecks(cfg *config.Config, log logger.Logger, archive ArchiveInfo,
}
checks = append(checks, check)
// 4. Target database check (skip for cluster restores)
// 5. Target database check (skip for cluster restores)
existingDBCount := 0
existingDBs := []string{}
@@ -243,6 +273,27 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
m.message = fmt.Sprintf("Create if missing: %v", m.createIfMissing)
}
case "d":
// Toggle debug log saving
m.saveDebugLog = !m.saveDebugLog
if m.saveDebugLog {
m.message = infoStyle.Render("📋 Debug log: enabled (will save detailed report on failure)")
} else {
m.message = "Debug log: disabled"
}
case "w":
// Toggle/set work directory
if m.workDir == "" {
// Set to backup directory as default alternative
m.workDir = m.config.BackupDir
m.message = infoStyle.Render(fmt.Sprintf("📁 Work directory set to: %s", m.workDir))
} else {
// Clear work directory (use system temp)
m.workDir = ""
m.message = "Work directory: using system temp"
}
case "enter", " ":
if m.checking {
m.message = "Please wait for safety checks to complete..."
@@ -255,7 +306,7 @@ func (m RestorePreviewModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
}
// Proceed to restore execution
exec := NewRestoreExecution(m.config, m.logger, m.parent, m.ctx, m.archive, m.targetDB, m.cleanFirst, m.createIfMissing, m.mode, m.cleanClusterFirst, m.existingDBs)
exec := NewRestoreExecution(m.config, m.logger, m.parent, m.ctx, m.archive, m.targetDB, m.cleanFirst, m.createIfMissing, m.mode, m.cleanClusterFirst, m.existingDBs, m.saveDebugLog, m.workDir)
return exec, exec.Init()
}
}
@@ -390,6 +441,41 @@ func (m RestorePreviewModel) View() string {
s.WriteString("\n\n")
}
// Advanced Options
s.WriteString(archiveHeaderStyle.Render("⚙️ Advanced Options"))
s.WriteString("\n")
// Work directory option
workDirIcon := "✗"
workDirStyle := infoStyle
workDirValue := "(system temp)"
if m.workDir != "" {
workDirIcon = "✓"
workDirStyle = checkPassedStyle
workDirValue = m.workDir
}
s.WriteString(workDirStyle.Render(fmt.Sprintf(" %s Work Dir: %s (press 'w' to toggle)", workDirIcon, workDirValue)))
s.WriteString("\n")
if m.workDir == "" {
s.WriteString(infoStyle.Render(" ⚠️ Large archives may need more space than /tmp provides"))
s.WriteString("\n")
}
// Debug log option
debugIcon := "✗"
debugStyle := infoStyle
if m.saveDebugLog {
debugIcon = "✓"
debugStyle = checkPassedStyle
}
s.WriteString(debugStyle.Render(fmt.Sprintf(" %s Debug Log: %v (press 'd' to toggle)", debugIcon, m.saveDebugLog)))
s.WriteString("\n")
if m.saveDebugLog {
s.WriteString(infoStyle.Render(" Saves detailed error report to /tmp on failure"))
s.WriteString("\n")
}
s.WriteString("\n")
// Message
if m.message != "" {
s.WriteString(m.message)
@@ -403,15 +489,15 @@ func (m RestorePreviewModel) View() string {
s.WriteString(successStyle.Render("✅ Ready to restore"))
s.WriteString("\n")
if m.mode == "restore-single" {
s.WriteString(infoStyle.Render("⌨️ t: Toggle clean-first | c: Toggle create | Enter: Proceed | Esc: Cancel"))
s.WriteString(infoStyle.Render("⌨️ t: Clean-first | c: Create | w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
} else if m.mode == "restore-cluster" {
if m.existingDBCount > 0 {
s.WriteString(infoStyle.Render("⌨️ c: Toggle cleanup | Enter: Proceed | Esc: Cancel"))
s.WriteString(infoStyle.Render("⌨️ c: Cleanup | w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
} else {
s.WriteString(infoStyle.Render("⌨️ Enter: Proceed | Esc: Cancel"))
s.WriteString(infoStyle.Render("⌨️ w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
}
} else {
s.WriteString(infoStyle.Render("⌨️ Enter: Proceed | Esc: Cancel"))
s.WriteString(infoStyle.Render("⌨️ w: WorkDir | d: Debug | Enter: Proceed | Esc: Cancel"))
}
} else {
s.WriteString(errorStyle.Render("❌ Cannot proceed - please fix errors above"))

View File

@@ -115,6 +115,26 @@ func NewSettingsModel(cfg *config.Config, log logger.Logger, parent tea.Model) S
Type: "path",
Description: "Directory where backup files will be stored",
},
{
Key: "work_dir",
DisplayName: "Work Directory",
Value: func(c *config.Config) string {
if c.WorkDir == "" {
return "(system temp)"
}
return c.WorkDir
},
Update: func(c *config.Config, v string) error {
if v == "" || v == "(system temp)" {
c.WorkDir = ""
return nil
}
c.WorkDir = filepath.Clean(v)
return nil
},
Type: "path",
Description: "Working directory for large operations (extraction, diagnosis). Use when /tmp is too small.",
},
{
Key: "compression_level",
DisplayName: "Compression Level",

View File

@@ -16,7 +16,7 @@ import (
// Build information (set by ldflags)
var (
version = "3.1.0"
version = "3.40.0"
buildTime = "unknown"
gitCommit = "unknown"
)