Implemented full native support for Azure Blob Storage and Google Cloud Storage:

**Azure Blob Storage (internal/cloud/azure.go):**
- Native Azure SDK integration (github.com/Azure/azure-sdk-for-go)
- Block blob upload for large files (>256MB with 100MB blocks)
- Azurite emulator support for local testing
- Production Azure authentication (account name + key)
- SHA-256 integrity verification with metadata
- Streaming uploads with progress tracking

**Google Cloud Storage (internal/cloud/gcs.go):**
- Native GCS SDK integration (cloud.google.com/go/storage)
- Chunked upload for large files (16MB chunks)
- fake-gcs-server emulator support for local testing
- Application Default Credentials support
- Service account JSON key file support
- SHA-256 integrity verification with metadata
- Streaming uploads with progress tracking

**Backend Integration:**
- Updated NewBackend() factory to support azure/azblob and gs/gcs providers
- Added Name() methods to both backends
- Fixed ProgressReader usage across all backends
- Updated Config comments to document Azure/GCS support

**Testing Infrastructure:**
- docker-compose.azurite.yml: Azurite + PostgreSQL + MySQL test environment
- docker-compose.gcs.yml: fake-gcs-server + PostgreSQL + MySQL test environment
- scripts/test_azure_storage.sh: 8 comprehensive Azure integration tests
- scripts/test_gcs_storage.sh: 8 comprehensive GCS integration tests
- Both test scripts validate upload/download/verify/cleanup/restore operations

**Documentation:**
- AZURE.md: Complete guide (600+ lines) covering setup, authentication, usage
- GCS.md: Complete guide (600+ lines) covering setup, authentication, usage
- Updated CLOUD.md with Azure and GCS sections
- Updated internal/config/config.go with Azure/GCS field documentation

**Test Coverage:**
- Large file uploads (300MB for Azure, 200MB for GCS)
- Block/chunked upload verification
- Backup verification with SHA-256 checksums
- Restore from cloud URIs
- Cleanup and retention policies
- Emulator support for both providers

**Dependencies Added:**
- Azure: github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.3
- GCS: cloud.google.com/go/storage v1.57.2
- Plus transitive dependencies (~50+ packages)

**Build:**
- Compiles successfully: 68MB binary
- All imports resolved
- No compilation errors

Sprint 4 closes the multi-cloud gap identified in the Sprint 3 evaluation. Users can now use Azure and GCS URIs that were previously parsed but unsupported.
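As an illustration of the provider routing described above, here is a minimal, self-contained sketch of how a NewBackend()-style factory can dispatch on the URI scheme. Function and provider names are hypothetical; this is not the shipped code.

```go
package main

import (
	"fmt"
	"net/url"
)

// backendName maps a cloud URI scheme to a provider, mirroring the kind of
// dispatch a NewBackend() factory performs. Names here are illustrative.
func backendName(rawURI string) (string, error) {
	u, err := url.Parse(rawURI)
	if err != nil {
		return "", fmt.Errorf("invalid cloud URI %q: %w", rawURI, err)
	}
	switch u.Scheme {
	case "s3":
		return "s3", nil
	case "azure", "azblob":
		return "azure-blob", nil
	case "gs", "gcs":
		return "gcs", nil
	default:
		return "", fmt.Errorf("unsupported provider scheme %q", u.Scheme)
	}
}

func main() {
	for _, uri := range []string{
		"gs://mybucket/backups/db.sql",
		"azblob://container/backups/db.sql",
	} {
		name, err := backendName(uri)
		fmt.Println(uri, "->", name, err)
	}
}
```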
Google Cloud Storage Integration
This guide covers using Google Cloud Storage (GCS) with dbbackup for secure, scalable cloud backup storage.
Table of Contents
- Quick Start
- URI Syntax
- Authentication
- Configuration
- Usage Examples
- Advanced Features
- Testing with fake-gcs-server
- Best Practices
- Troubleshooting
Quick Start
1. GCP Setup
- Create a GCS bucket in Google Cloud Console
- Set up authentication (choose one):
- Service Account: Create and download JSON key file
- Application Default Credentials: Use gcloud CLI
- Workload Identity: For GKE clusters
2. Basic Backup
# Backup PostgreSQL to GCS (using ADC)
dbbackup backup postgres \
--host localhost \
--database mydb \
--output backup.sql \
--cloud "gs://mybucket/backups/db.sql"
3. Restore from GCS
# Restore from GCS backup
dbbackup restore postgres \
--source "gs://mybucket/backups/db.sql" \
--host localhost \
--database mydb_restored
URI Syntax
Basic Format
gs://bucket/path/to/backup.sql
gcs://bucket/path/to/backup.sql
Both gs:// and gcs:// prefixes are supported.
URI Components
| Component | Required | Description | Example |
|---|---|---|---|
| bucket | Yes | GCS bucket name | mybucket |
| path | Yes | Object path within bucket | backups/db.sql |
| credentials | No | Path to service account JSON | /path/to/key.json |
| project | No | GCP project ID | my-project-id |
| endpoint | No | Custom endpoint (emulator) | http://localhost:4443 |
URI Examples
Production GCS (Application Default Credentials):
gs://prod-backups/postgres/db.sql
With Service Account:
gs://prod-backups/postgres/db.sql?credentials=/path/to/service-account.json
With Project ID:
gs://prod-backups/postgres/db.sql?project=my-project-id
fake-gcs-server Emulator:
gs://test-backups/postgres/db.sql?endpoint=http://localhost:4443/storage/v1
With Path Prefix:
gs://backups/production/postgres/2024/db.sql
Authentication
Method 1: Application Default Credentials (Recommended)
Use gcloud CLI to set up ADC:
# Login with your Google account
gcloud auth application-default login
# Or use service account for server environments
gcloud auth activate-service-account --key-file=/path/to/key.json
# Use simplified URI (credentials from environment)
dbbackup backup postgres --cloud "gs://mybucket/backups/backup.sql"
Method 2: Service Account JSON
Download service account key from GCP Console:
- Go to IAM & Admin → Service Accounts
- Create or select a service account
- Click Keys → Add Key → Create new key → JSON
- Download the JSON file
Use in URI:
dbbackup backup postgres \
--cloud "gs://mybucket/backup.sql?credentials=/path/to/service-account.json"
Or via environment:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
dbbackup backup postgres --cloud "gs://mybucket/backup.sql"
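Both methods ultimately control how the GCS client is constructed. The sketch below shows the two paths with cloud.google.com/go/storage; the helper name is illustrative and this is not dbbackup's internal code.

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
	"google.golang.org/api/option"
)

// newGCSClient builds a storage client either from Application Default
// Credentials (empty credentialsFile) or from a service-account JSON key,
// roughly what the ?credentials= URI parameter selects between.
func newGCSClient(ctx context.Context, credentialsFile string) (*storage.Client, error) {
	if credentialsFile == "" {
		// Falls back to GOOGLE_APPLICATION_CREDENTIALS, gcloud ADC,
		// or the metadata server on GCE/GKE.
		return storage.NewClient(ctx)
	}
	return storage.NewClient(ctx, option.WithCredentialsFile(credentialsFile))
}

func main() {
	ctx := context.Background()
	client, err := newGCSClient(ctx, "") // "" => Application Default Credentials
	if err != nil {
		log.Fatalf("failed to create GCS client: %v", err)
	}
	defer client.Close()
	log.Println("GCS client ready")
}
```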
Method 3: Workload Identity (GKE)
For Kubernetes workloads:
apiVersion: v1
kind: ServiceAccount
metadata:
name: dbbackup-sa
annotations:
iam.gke.io/gcp-service-account: dbbackup@project.iam.gserviceaccount.com
Then use ADC in your pod:
dbbackup backup postgres --cloud "gs://mybucket/backup.sql"
Required IAM Permissions
The service account needs these roles:
- Storage Object Creator: Upload backups
- Storage Object Viewer: List and download backups
- Storage Object Admin: Delete backups (for cleanup)
Or use the broader predefined role: Storage Admin
# Grant permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
--member="serviceAccount:dbbackup@PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/storage.objectAdmin"
Configuration
Bucket Setup
Create a bucket before first use:
# gcloud CLI
gsutil mb -p PROJECT_ID -c STANDARD -l us-central1 gs://mybucket/
# Or let dbbackup create it (requires permissions)
dbbackup cloud upload file.sql "gs://mybucket/file.sql?create=true&project=PROJECT_ID"
Storage Classes
GCS offers multiple storage classes:
- Standard: Frequent access (default)
- Nearline: Access <1/month (lower cost)
- Coldline: Access <1/quarter (very low cost)
- Archive: Long-term retention (lowest cost)
Set the storage class when creating the bucket:
gsutil mb -c NEARLINE gs://mybucket/
Lifecycle Management
Configure automatic transitions and deletion:
{
"lifecycle": {
"rule": [
{
"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
"condition": {"age": 30, "matchesPrefix": ["backups/"]}
},
{
"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
"condition": {"age": 90, "matchesPrefix": ["backups/"]}
},
{
"action": {"type": "Delete"},
"condition": {"age": 365, "matchesPrefix": ["backups/"]}
}
]
}
}
Apply lifecycle configuration:
gsutil lifecycle set lifecycle.json gs://mybucket/
Regional Configuration
Choose a bucket location for better performance:
# US regions
gsutil mb -l us-central1 gs://mybucket/
gsutil mb -l us-east1 gs://mybucket/
# EU regions
gsutil mb -l europe-west1 gs://mybucket/
# Multi-region
gsutil mb -l us gs://mybucket/
gsutil mb -l eu gs://mybucket/
Usage Examples
Backup with Auto-Upload
# PostgreSQL backup with automatic GCS upload
dbbackup backup postgres \
--host localhost \
--database production_db \
--output /backups/db.sql \
--cloud "gs://prod-backups/postgres/$(date +%Y%m%d_%H%M%S).sql" \
--compression 6
Backup All Databases
# Backup entire PostgreSQL cluster to GCS
dbbackup backup postgres \
--host localhost \
--all-databases \
--output-dir /backups \
--cloud "gs://prod-backups/postgres/cluster/"
Verify Backup
# Verify backup integrity
dbbackup verify "gs://prod-backups/postgres/backup.sql"
List Backups
# List all backups in bucket
dbbackup cloud list "gs://prod-backups/postgres/"
# List with pattern
dbbackup cloud list "gs://prod-backups/postgres/2024/"
# Or use gsutil
gsutil ls gs://prod-backups/postgres/
Download Backup
# Download from GCS to local
dbbackup cloud download \
"gs://prod-backups/postgres/backup.sql" \
/local/path/backup.sql
Delete Old Backups
# Manual delete
dbbackup cloud delete "gs://prod-backups/postgres/old_backup.sql"
# Automatic cleanup (keep last 7 backups)
dbbackup cleanup "gs://prod-backups/postgres/" --keep 7
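Conceptually, a keep-the-newest-N cleanup over a prefix looks like the sketch below. This is a simplified illustration of the retention idea using the GCS Go client, not dbbackup's actual cleanup code.

```go
package main

import (
	"context"
	"log"
	"sort"

	"cloud.google.com/go/storage"
	"google.golang.org/api/iterator"
)

// keepNewest deletes every object under prefix except the newest `keep` ones.
func keepNewest(ctx context.Context, client *storage.Client, bucket, prefix string, keep int) error {
	var objs []*storage.ObjectAttrs
	it := client.Bucket(bucket).Objects(ctx, &storage.Query{Prefix: prefix})
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			break
		}
		if err != nil {
			return err
		}
		objs = append(objs, attrs)
	}
	// Sort newest first, then delete everything past the first `keep` entries.
	sort.Slice(objs, func(i, j int) bool { return objs[i].Created.After(objs[j].Created) })
	for i := keep; i < len(objs); i++ {
		if err := client.Bucket(bucket).Object(objs[i].Name).Delete(ctx); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	if err := keepNewest(ctx, client, "prod-backups", "postgres/", 7); err != nil {
		log.Fatal(err)
	}
}
```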
Scheduled Backups
#!/bin/bash
# GCS backup script (run via cron)
DATE=$(date +%Y%m%d_%H%M%S)
GCS_URI="gs://prod-backups/postgres/${DATE}.sql"
dbbackup backup postgres \
--host localhost \
--database production_db \
--output /tmp/backup.sql \
--cloud "${GCS_URI}" \
--compression 9
# Cleanup old backups
dbbackup cleanup "gs://prod-backups/postgres/" --keep 30
Crontab:
# Daily at 2 AM
0 2 * * * /usr/local/bin/gcs-backup.sh >> /var/log/gcs-backup.log 2>&1
Systemd Timer:
# /etc/systemd/system/gcs-backup.timer
[Unit]
Description=Daily GCS Database Backup
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
Advanced Features
Chunked Upload
For large files, dbbackup automatically uses GCS chunked upload:
- Chunk Size: 16MB per chunk
- Streaming: Direct streaming from source
- Checksum: SHA-256 integrity verification
# Large database backup (automatically uses chunked upload)
dbbackup backup postgres \
--host localhost \
--database huge_db \
--output /backups/huge.sql \
--cloud "gs://backups/huge.sql"
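For reference, a streaming upload with a 16MB chunk size and a SHA-256 checksum stored as object metadata can be written roughly as below with cloud.google.com/go/storage. This is a sketch under assumed names (including the "sha256" metadata key), not the backend's exact implementation.

```go
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"io"
	"log"
	"os"

	"cloud.google.com/go/storage"
)

// uploadChunked streams a local file to GCS in 16MB chunks and stores its
// SHA-256 as object metadata.
func uploadChunked(ctx context.Context, client *storage.Client, bucket, object, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// First pass: compute the SHA-256 so it can be attached as metadata,
	// which must be set on the writer before the first Write call.
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	if _, err := f.Seek(0, io.SeekStart); err != nil {
		return err
	}

	w := client.Bucket(bucket).Object(object).NewWriter(ctx)
	w.ChunkSize = 16 * 1024 * 1024 // upload in 16MB chunks
	w.Metadata = map[string]string{"sha256": hex.EncodeToString(h.Sum(nil))}

	// Second pass: stream the file contents; Close finalizes the object.
	if _, err := io.Copy(w, f); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	if err := uploadChunked(ctx, client, "backups", "huge.sql", "/backups/huge.sql"); err != nil {
		log.Fatal(err)
	}
}
```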
Progress Tracking
# Backup with progress display
dbbackup backup postgres \
--host localhost \
--database mydb \
--output backup.sql \
--cloud "gs://backups/backup.sql" \
--progress
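Progress reporting for streaming uploads boils down to wrapping the source io.Reader with a byte counter, along the lines of the sketch below (illustrative, not the tool's actual ProgressReader type).

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
	"sync/atomic"
)

// progressReader wraps an io.Reader and counts bytes as they are read,
// so an upload can report progress without buffering the whole file.
type progressReader struct {
	r     io.Reader
	total int64
	read  atomic.Int64
}

func (p *progressReader) Read(b []byte) (int, error) {
	n, err := p.r.Read(b)
	if n > 0 {
		done := p.read.Add(int64(n))
		fmt.Fprintf(os.Stderr, "\r%.1f%%", float64(done)/float64(p.total)*100)
	}
	return n, err
}

func main() {
	data := strings.NewReader(strings.Repeat("x", 1<<20))
	pr := &progressReader{r: data, total: 1 << 20}
	// Any uploader that accepts an io.Reader (for example io.Copy into a
	// GCS object writer) can consume pr and report progress for free.
	io.Copy(io.Discard, pr)
	fmt.Fprintln(os.Stderr)
}
```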
Concurrent Operations
# Backup multiple databases in parallel
dbbackup backup postgres \
--host localhost \
--all-databases \
--output-dir /backups \
--cloud "gs://backups/cluster/" \
--parallelism 4
Custom Metadata
Backups include SHA-256 checksums as object metadata:
# View metadata using gsutil
gsutil stat gs://backups/backup.sql
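A downloaded copy can be checked against that stored metadata. The sketch below assumes the checksum lives under a "sha256" metadata key; names are illustrative.

```go
package main

import (
	"context"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"log"
	"os"

	"cloud.google.com/go/storage"
)

// verifyLocalCopy compares a downloaded file's SHA-256 against the checksum
// stored in the object's metadata.
func verifyLocalCopy(ctx context.Context, client *storage.Client, bucket, object, localPath string) error {
	attrs, err := client.Bucket(bucket).Object(object).Attrs(ctx)
	if err != nil {
		return err
	}
	want := attrs.Metadata["sha256"]

	f, err := os.Open(localPath)
	if err != nil {
		return err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return err
	}
	got := hex.EncodeToString(h.Sum(nil))

	if want != got {
		return fmt.Errorf("checksum mismatch: object=%s local=%s", want, got)
	}
	return nil
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	if err := verifyLocalCopy(ctx, client, "backups", "backup.sql", "/local/path/backup.sql"); err != nil {
		log.Fatal(err)
	}
	log.Println("checksum OK")
}
```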
Object Versioning
Enable versioning to protect against accidental deletion:
# Enable versioning
gsutil versioning set on gs://mybucket/
# List all versions
gsutil ls -a gs://mybucket/backup.sql
# Restore previous version
gsutil cp gs://mybucket/backup.sql#VERSION /local/backup.sql
Customer-Managed Encryption Keys (CMEK)
Use your own encryption keys:
# Create encryption key in Cloud KMS
gcloud kms keyrings create backup-keyring --location=us-central1
gcloud kms keys create backup-key --location=us-central1 --keyring=backup-keyring --purpose=encryption
# Set default CMEK for bucket
gsutil kms encryption gs://mybucket/ projects/PROJECT/locations/us-central1/keyRings/backup-keyring/cryptoKeys/backup-key
Testing with fake-gcs-server
Setup fake-gcs-server Emulator
Docker Compose:
services:
gcs-emulator:
image: fsouza/fake-gcs-server:latest
ports:
- "4443:4443"
command: -scheme http -public-host localhost:4443
Start:
docker-compose -f docker-compose.gcs.yml up -d
Create Test Bucket
# Using curl
curl -X POST "http://localhost:4443/storage/v1/b?project=test-project" \
-H "Content-Type: application/json" \
-d '{"name": "test-backups"}'
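From Go, the storage client can be pointed at the emulator either via the STORAGE_EMULATOR_HOST environment variable or with explicit client options, which is roughly what the ?endpoint= URI parameter maps to. A minimal sketch:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
	"google.golang.org/api/option"
)

func main() {
	ctx := context.Background()

	// The Go client also honors STORAGE_EMULATOR_HOST=localhost:4443;
	// the explicit options below achieve the same thing per client.
	client, err := storage.NewClient(ctx,
		option.WithEndpoint("http://localhost:4443/storage/v1/"),
		option.WithoutAuthentication(),
	)
	if err != nil {
		log.Fatalf("failed to create GCS client: %v", err)
	}
	defer client.Close()

	// Smoke test against the bucket created with curl above.
	attrs, err := client.Bucket("test-backups").Attrs(ctx)
	if err != nil {
		log.Fatalf("bucket lookup failed: %v", err)
	}
	log.Printf("bucket %q is reachable via the emulator", attrs.Name)
}
```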
Test Backup
# Backup to fake-gcs-server
dbbackup backup postgres \
--host localhost \
--database testdb \
--output test.sql \
--cloud "gs://test-backups/test.sql?endpoint=http://localhost:4443/storage/v1"
Run Integration Tests
# Run comprehensive test suite
./scripts/test_gcs_storage.sh
Tests include:
- PostgreSQL and MySQL backups
- Upload/download operations
- Large file handling (200MB+)
- Verification and cleanup
- Restore operations
Best Practices
1. Security
- Never commit credentials to version control
- Use Application Default Credentials when possible
- Rotate service account keys regularly
- Use Workload Identity for GKE
- Enable VPC Service Controls for enterprise security
- Use Customer-Managed Encryption Keys (CMEK) for sensitive data
2. Performance
- Use compression for faster uploads: --compression 6
- Enable parallelism for cluster backups: --parallelism 4
- Choose appropriate GCS region (close to source)
- Use multi-region buckets for high availability
3. Cost Optimization
- Use Nearline for backups older than 30 days
- Use Archive for long-term retention (>90 days)
- Enable lifecycle management for automatic transitions
- Monitor storage costs in GCP Billing Console
- Use Coldline for quarterly access patterns
4. Reliability
- Test restore procedures regularly
- Use retention policies: --keep 30
- Enable object versioning (30-day recovery)
- Use multi-region buckets for disaster recovery
- Monitor backup success with Cloud Monitoring
5. Organization
- Use consistent naming: {database}/{date}/{backup}.sql
- Use bucket prefixes: prod-backups, dev-backups
- Tag backups with labels (environment, version)
- Document restore procedures
- Use separate buckets per environment
Troubleshooting
Connection Issues
Problem: failed to create GCS client
Solutions:
- Check the GOOGLE_APPLICATION_CREDENTIALS environment variable
- Verify service account JSON file exists and is valid
- Ensure gcloud CLI is authenticated: gcloud auth list
- For emulator, confirm http://localhost:4443 is running
Authentication Errors
Problem: authentication failed or permission denied
Solutions:
- Verify service account has required IAM roles
- Check if Application Default Credentials are set up
- Run gcloud auth application-default login
- Verify service account JSON is not corrupted
- Check GCP project ID is correct
Upload Failures
Problem: failed to upload object
Solutions:
- Check bucket exists (or use &create=true)
- Verify service account has the storage.objects.create permission
- Check network connectivity to GCS
- Try smaller files first (test connection)
- Check GCP quota limits
Large File Issues
Problem: Upload timeout for large files
Solutions:
- dbbackup automatically uses chunked upload
- Increase compression: --compression 9
- Check network bandwidth
- Use Transfer Appliance for TB+ data
List/Download Issues
Problem: object not found
Solutions:
- Verify object name (check GCS Console)
- Check bucket name is correct
- Ensure object hasn't been moved/deleted
- Check if the object is in the Archive class (reads are still immediate, but retrieval charges apply)
Performance Issues
Problem: Slow upload/download
Solutions:
- Use compression: --compression 6
- Choose a closer GCS region
- Check network bandwidth
- Use multi-region bucket for better availability
- Enable parallelism for multiple files
Debugging
Enable debug mode:
dbbackup backup postgres \
--cloud "gs://bucket/backup.sql" \
--debug
Check GCP logs:
# Cloud Logging
gcloud logging read "resource.type=gcs_bucket AND resource.labels.bucket_name=mybucket" \
--limit 50 \
--format json
View bucket details:
gsutil ls -L -b gs://mybucket/
Monitoring and Alerting
Cloud Monitoring
Create metrics and alerts:
# Monitor backup success rate
gcloud monitoring policies create \
--notification-channels=CHANNEL_ID \
--display-name="Backup Failure Alert" \
--condition-display-name="No backups in 24h" \
--condition-threshold-value=0 \
--condition-threshold-duration=86400s
Logging
Export logs to BigQuery for analysis:
gcloud logging sinks create backup-logs \
bigquery.googleapis.com/projects/PROJECT_ID/datasets/backup_logs \
--log-filter='resource.type="gcs_bucket" AND resource.labels.bucket_name="prod-backups"'
Additional Resources
- Google Cloud Storage Documentation
- fake-gcs-server
- gsutil Tool
- GCS Client Libraries
- dbbackup Cloud Storage Guide
Support
For issues specific to GCS integration:
- Check Troubleshooting section
- Run integration tests: ./scripts/test_gcs_storage.sh
- Enable debug mode: --debug
- Check GCP Service Status
- Open an issue on GitHub with debug logs