Google Cloud Storage Integration

This guide covers using Google Cloud Storage (GCS) with dbbackup for secure, scalable cloud backup storage.

Table of Contents

  • Quick Start
  • URI Syntax
  • Authentication
  • Configuration
  • Usage Examples
  • Advanced Features
  • Testing with fake-gcs-server
  • Best Practices
  • Troubleshooting
  • Monitoring and Alerting
  • Additional Resources

Quick Start

1. GCP Setup

  1. Create a GCS bucket in Google Cloud Console
  2. Set up authentication (choose one):
    • Service Account: Create and download JSON key file
    • Application Default Credentials: Use gcloud CLI
    • Workload Identity: For GKE clusters
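
A minimal sketch of steps 1–2 using the gcloud CLI, assuming Application Default Credentials and the illustrative project my-project-id and bucket mybucket (substitute your own names):

# Authenticate with Application Default Credentials
gcloud auth application-default login

# Create the backup bucket
gsutil mb -p my-project-id -l us-central1 gs://mybucket/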

2. Basic Backup

# Backup PostgreSQL to GCS (using ADC)
dbbackup backup postgres \
  --host localhost \
  --database mydb \
  --output backup.sql \
  --cloud "gs://mybucket/backups/db.sql"

3. Restore from GCS

# Restore from GCS backup
dbbackup restore postgres \
  --source "gs://mybucket/backups/db.sql" \
  --host localhost \
  --database mydb_restored

URI Syntax

Basic Format

gs://bucket/path/to/backup.sql
gcs://bucket/path/to/backup.sql

Both gs:// and gcs:// prefixes are supported.

URI Components

  • bucket (required): GCS bucket name, e.g. mybucket
  • path (required): Object path within the bucket, e.g. backups/db.sql
  • credentials (optional): Path to a service account JSON key, e.g. /path/to/key.json
  • project (optional): GCP project ID, e.g. my-project-id
  • endpoint (optional): Custom endpoint for emulators, e.g. http://localhost:4443

URI Examples

Production GCS (Application Default Credentials):

gs://prod-backups/postgres/db.sql

With Service Account:

gs://prod-backups/postgres/db.sql?credentials=/path/to/service-account.json

With Project ID:

gs://prod-backups/postgres/db.sql?project=my-project-id

fake-gcs-server Emulator:

gs://test-backups/postgres/db.sql?endpoint=http://localhost:4443/storage/v1

With Path Prefix:

gs://backups/production/postgres/2024/db.sql

Authentication

Method 1: Application Default Credentials (ADC)

Use the gcloud CLI to set up ADC:

# Login with your Google account
gcloud auth application-default login

# Or use service account for server environments
gcloud auth activate-service-account --key-file=/path/to/key.json

# Use simplified URI (credentials from environment)
dbbackup backup postgres --cloud "gs://mybucket/backups/backup.sql"

Method 2: Service Account JSON

Download service account key from GCP Console:

  1. Go to IAM & Admin → Service Accounts
  2. Create or select a service account
  3. Click Keys → Add Key → Create new key → JSON
  4. Download the JSON file
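
The same key can also be created from the command line; a sketch, assuming the dbbackup@PROJECT_ID.iam.gserviceaccount.com service account used elsewhere in this guide:

# Create and download a JSON key for the service account
gcloud iam service-accounts keys create /path/to/service-account.json \
  --iam-account=dbbackup@PROJECT_ID.iam.gserviceaccount.com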

Use in URI:

dbbackup backup postgres \
  --cloud "gs://mybucket/backup.sql?credentials=/path/to/service-account.json"

Or via environment:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
dbbackup backup postgres --cloud "gs://mybucket/backup.sql"

Method 3: Workload Identity (GKE)

For Kubernetes workloads:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: dbbackup-sa
  annotations:
    iam.gke.io/gcp-service-account: dbbackup@project.iam.gserviceaccount.com
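
The Kubernetes ServiceAccount also needs a Workload Identity binding on the GCP side; a sketch, assuming the namespace default and the project name used in the annotation above:

# Allow the Kubernetes SA to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding dbbackup@project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:project.svc.id.goog[default/dbbackup-sa]"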

Then use ADC in your pod:

dbbackup backup postgres --cloud "gs://mybucket/backup.sql"

Required IAM Permissions

The service account needs these roles:

  • Storage Object Creator: Upload backups
  • Storage Object Viewer: List and download backups
  • Storage Object Admin: Delete backups (for cleanup)

Or grant a single broader role such as Storage Object Admin (used in the example below) or Storage Admin:

# Grant permissions
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:dbbackup@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectAdmin"

Configuration

Bucket Setup

Create a bucket before first use:

# gcloud CLI
gsutil mb -p PROJECT_ID -c STANDARD -l us-central1 gs://mybucket/

# Or let dbbackup create it (requires permissions)
dbbackup cloud upload file.sql "gs://mybucket/file.sql?create=true&project=PROJECT_ID"

Storage Classes

GCS offers multiple storage classes:

  • Standard: Frequently accessed data (default)
  • Nearline: Data accessed less than once a month (lower cost)
  • Coldline: Data accessed less than once a quarter (very low cost)
  • Archive: Data accessed less than once a year; long-term retention (lowest cost)

Set the class when creating the bucket:

gsutil mb -c NEARLINE gs://mybucket/
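
Existing backups can also be moved to a cheaper class in place with gsutil rewrite (bucket and object names here are illustrative):

# Move an older backup to Nearline
gsutil rewrite -s NEARLINE gs://mybucket/backups/old_backup.sql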

Lifecycle Management

Configure automatic transitions and deletion:

{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30, "matchesPrefix": ["backups/"]}
      },
      {
        "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
        "condition": {"age": 90, "matchesPrefix": ["backups/"]}
      },
      {
        "action": {"type": "Delete"},
        "condition": {"age": 365, "matchesPrefix": ["backups/"]}
      }
    ]
  }
}

Apply lifecycle configuration:

gsutil lifecycle set lifecycle.json gs://mybucket/
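
To confirm the rules were applied:

gsutil lifecycle get gs://mybucket/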

Regional Configuration

Choose a bucket location close to your database servers for better performance:

# US regions
gsutil mb -l us-central1 gs://mybucket/
gsutil mb -l us-east1 gs://mybucket/

# EU regions
gsutil mb -l europe-west1 gs://mybucket/

# Multi-region
gsutil mb -l us gs://mybucket/
gsutil mb -l eu gs://mybucket/

Usage Examples

Backup with Auto-Upload

# PostgreSQL backup with automatic GCS upload
dbbackup backup postgres \
  --host localhost \
  --database production_db \
  --output /backups/db.sql \
  --cloud "gs://prod-backups/postgres/$(date +%Y%m%d_%H%M%S).sql" \
  --compression 6

Backup All Databases

# Backup entire PostgreSQL cluster to GCS
dbbackup backup postgres \
  --host localhost \
  --all-databases \
  --output-dir /backups \
  --cloud "gs://prod-backups/postgres/cluster/"

Verify Backup

# Verify backup integrity
dbbackup verify "gs://prod-backups/postgres/backup.sql"

List Backups

# List all backups in bucket
dbbackup cloud list "gs://prod-backups/postgres/"

# List with pattern
dbbackup cloud list "gs://prod-backups/postgres/2024/"

# Or use gsutil
gsutil ls gs://prod-backups/postgres/

Download Backup

# Download from GCS to local
dbbackup cloud download \
  "gs://prod-backups/postgres/backup.sql" \
  /local/path/backup.sql

Delete Old Backups

# Manual delete
dbbackup cloud delete "gs://prod-backups/postgres/old_backup.sql"

# Automatic cleanup (keep last 7 backups)
dbbackup cleanup "gs://prod-backups/postgres/" --keep 7

Scheduled Backups

#!/bin/bash
# GCS backup script (run via cron)

DATE=$(date +%Y%m%d_%H%M%S)
GCS_URI="gs://prod-backups/postgres/${DATE}.sql"

dbbackup backup postgres \
  --host localhost \
  --database production_db \
  --output /tmp/backup.sql \
  --cloud "${GCS_URI}" \
  --compression 9

# Cleanup old backups
dbbackup cleanup "gs://prod-backups/postgres/" --keep 30

Crontab:

# Daily at 2 AM
0 2 * * * /usr/local/bin/gcs-backup.sh >> /var/log/gcs-backup.log 2>&1

Systemd Timer:

# /etc/systemd/system/gcs-backup.timer
[Unit]
Description=Daily GCS Database Backup

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
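
The timer needs a matching service unit; a minimal sketch, assuming the script above is installed as /usr/local/bin/gcs-backup.sh:

# /etc/systemd/system/gcs-backup.service
[Unit]
Description=GCS Database Backup

[Service]
Type=oneshot
ExecStart=/usr/local/bin/gcs-backup.sh

Enable both units with: systemctl enable --now gcs-backup.timer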

Advanced Features

Chunked Upload

For large files, dbbackup automatically uses GCS chunked upload:

  • Chunk Size: 16MB per chunk
  • Streaming: Direct streaming from source
  • Checksum: SHA-256 integrity verification

# Large database backup (automatically uses chunked upload)
dbbackup backup postgres \
  --host localhost \
  --database huge_db \
  --output /backups/huge.sql \
  --cloud "gs://backups/huge.sql"

Progress Tracking

# Backup with progress display
dbbackup backup postgres \
  --host localhost \
  --database mydb \
  --output backup.sql \
  --cloud "gs://backups/backup.sql" \
  --progress

Concurrent Operations

# Backup multiple databases in parallel
dbbackup backup postgres \
  --host localhost \
  --all-databases \
  --output-dir /backups \
  --cloud "gs://backups/cluster/" \
  --parallelism 4

Custom Metadata

Backups include SHA-256 checksums as object metadata:

# View metadata using gsutil
gsutil stat gs://backups/backup.sql

Object Versioning

Enable versioning to protect against accidental deletion:

# Enable versioning
gsutil versioning set on gs://mybucket/

# List all versions
gsutil ls -a gs://mybucket/backup.sql

# Restore previous version
gsutil cp gs://mybucket/backup.sql#VERSION /local/backup.sql
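
Versioning retains every overwritten or deleted object, so it is worth pairing with a lifecycle rule that expires old noncurrent versions; a sketch using the daysSinceNoncurrentTime condition (adjust the 30-day window to your recovery needs):

{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {"daysSinceNoncurrentTime": 30}
      }
    ]
  }
}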

Customer-Managed Encryption Keys (CMEK)

Use your own encryption keys:

# Create encryption key in Cloud KMS
gcloud kms keyrings create backup-keyring --location=us-central1
gcloud kms keys create backup-key --location=us-central1 --keyring=backup-keyring --purpose=encryption

# Set the default CMEK for the bucket
gsutil kms encryption -k projects/PROJECT/locations/us-central1/keyRings/backup-keyring/cryptoKeys/backup-key gs://mybucket/

Testing with fake-gcs-server

Setup fake-gcs-server Emulator

Docker Compose:

services:
  gcs-emulator:
    image: fsouza/fake-gcs-server:latest
    ports:
      - "4443:4443"
    command: -scheme http -public-host localhost:4443

Start:

docker-compose -f docker-compose.gcs.yml up -d

Create Test Bucket

# Using curl
curl -X POST "http://localhost:4443/storage/v1/b?project=test-project" \
  -H "Content-Type: application/json" \
  -d '{"name": "test-backups"}'
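
To confirm the bucket was created, list buckets through the same JSON API endpoint:

curl "http://localhost:4443/storage/v1/b?project=test-project"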

Test Backup

# Backup to fake-gcs-server
dbbackup backup postgres \
  --host localhost \
  --database testdb \
  --output test.sql \
  --cloud "gs://test-backups/test.sql?endpoint=http://localhost:4443/storage/v1"

Run Integration Tests

# Run comprehensive test suite
./scripts/test_gcs_storage.sh

Tests include:

  • PostgreSQL and MySQL backups
  • Upload/download operations
  • Large file handling (200MB+)
  • Verification and cleanup
  • Restore operations

Best Practices

1. Security

  • Never commit credentials to version control
  • Use Application Default Credentials when possible
  • Rotate service account keys regularly
  • Use Workload Identity for GKE
  • Enable VPC Service Controls for enterprise security
  • Use Customer-Managed Encryption Keys (CMEK) for sensitive data

2. Performance

  • Use compression for faster uploads: --compression 6
  • Enable parallelism for cluster backups: --parallelism 4
  • Choose appropriate GCS region (close to source)
  • Use multi-region buckets for high availability

3. Cost Optimization

  • Use Nearline for backups older than 30 days
  • Use Archive for long-term retention (>90 days)
  • Enable lifecycle management for automatic transitions
  • Monitor storage costs in GCP Billing Console
  • Use Coldline for quarterly access patterns

4. Reliability

  • Test restore procedures regularly
  • Use retention policies: --keep 30
  • Enable object versioning, and expire noncurrent versions after ~30 days (see Object Versioning above)
  • Use multi-region buckets for disaster recovery
  • Monitor backup success with Cloud Monitoring

5. Organization

  • Use consistent naming: {database}/{date}/{backup}.sql
  • Use bucket prefixes: prod-backups, dev-backups
  • Tag backups with labels (environment, version)
  • Document restore procedures
  • Use separate buckets per environment

Troubleshooting

Connection Issues

Problem: failed to create GCS client

Solutions:

  • Check GOOGLE_APPLICATION_CREDENTIALS environment variable
  • Verify service account JSON file exists and is valid
  • Ensure gcloud CLI is authenticated: gcloud auth list
  • For emulator, confirm http://localhost:4443 is running

Authentication Errors

Problem: authentication failed or permission denied

Solutions:

  • Verify service account has required IAM roles
  • Check if Application Default Credentials are set up
  • Run gcloud auth application-default login
  • Verify service account JSON is not corrupted
  • Check GCP project ID is correct
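
A quick way to confirm which credentials and project are active (standard gcloud commands):

# Show authenticated accounts and the active project
gcloud auth list
gcloud config get-value project

# Confirm Application Default Credentials can mint a token
gcloud auth application-default print-access-token > /dev/null && echo "ADC OK"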

Upload Failures

Problem: failed to upload object

Solutions:

  • Check bucket exists (or use &create=true)
  • Verify service account has storage.objects.create permission
  • Check network connectivity to GCS
  • Try smaller files first (test connection)
  • Check GCP quota limits
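
For the "try smaller files first" step, a quick connectivity smoke test with gsutil (the object name is illustrative):

# Upload and remove a tiny test object
echo "connectivity test" > /tmp/gcs-test.txt
gsutil cp /tmp/gcs-test.txt gs://mybucket/connectivity-test.txt
gsutil rm gs://mybucket/connectivity-test.txt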

Large File Issues

Problem: Upload timeout for large files

Solutions:

  • dbbackup automatically uses chunked upload
  • Increase compression: --compression 9
  • Check network bandwidth
  • Use Transfer Appliance for TB+ data

List/Download Issues

Problem: object not found

Solutions:

  • Verify object name (check GCS Console)
  • Check bucket name is correct
  • Ensure object hasn't been moved/deleted
  • Check the object's storage class (Archive objects are still readable directly, but retrieval costs apply)

Performance Issues

Problem: Slow upload/download

Solutions:

  • Use compression: --compression 6
  • Choose closer GCS region
  • Check network bandwidth
  • Use multi-region bucket for better availability
  • Enable parallelism for multiple files

Debugging

Enable debug mode:

dbbackup backup postgres \
  --cloud "gs://bucket/backup.sql" \
  --debug

Check GCP logs:

# Cloud Logging
gcloud logging read "resource.type=gcs_bucket AND resource.labels.bucket_name=mybucket" \
  --limit 50 \
  --format json

View bucket details:

gsutil ls -L -b gs://mybucket/

Monitoring and Alerting

Cloud Monitoring

Create metrics and alerts:

# Monitor backup success rate
gcloud monitoring policies create \
  --notification-channels=CHANNEL_ID \
  --display-name="Backup Failure Alert" \
  --condition-display-name="No backups in 24h" \
  --condition-threshold-value=0 \
  --condition-threshold-duration=86400s

Logging

Export logs to BigQuery for analysis:

gcloud logging sinks create backup-logs \
  bigquery.googleapis.com/projects/PROJECT_ID/datasets/backup_logs \
  --log-filter='resource.type="gcs_bucket" AND resource.labels.bucket_name="prod-backups"'

Additional Resources

Support

For issues specific to GCS integration:

  1. Check Troubleshooting section
  2. Run integration tests: ./scripts/test_gcs_storage.sh
  3. Enable debug mode: --debug
  4. Check GCP Service Status
  5. Open an issue on GitHub with debug logs

See Also