Backup and Disaster-Recovery Policy — Veloxis
Effective date: [to be set on first publication]
Last revised: 2026-05-24
Owner: CA Krishna Gujarathi
Review cadence: Annual + on every material infrastructure change
This policy specifies how the Veloxis Platform is backed up, where backups live, and how the Firm recovers from failure or destruction events. It is read with the Information Security Policy and the Incident Response Plan.
1. Recovery objectives
| Metric | Target | Definition |
|---|---|---|
| RPO (Recovery Point Objective) | 24 hours | Maximum acceptable data loss measured in time |
| RTO (Recovery Time Objective) | 4 hours for partial restore from same-region backup; 24 hours for full rebuild on new VM | Maximum acceptable time to restore service after a disaster |
These objectives suit the Firm's current deployment: a single-firm internal tool. If Veloxis becomes customer-facing, the objectives will be tightened (see §8).
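
The RPO target can be checked mechanically against the timestamp of the newest successful backup. The following is a minimal sketch, assuming that timestamp is available as a timezone-aware datetime; the names `RPO` and `rpo_breached` are illustrative and not part of Veloxis.

```python
from datetime import datetime, timedelta, timezone

# RPO target from the table above; the constant name is illustrative.
RPO = timedelta(hours=24)


def rpo_breached(last_backup_at: datetime, now: datetime, rpo: timedelta = RPO) -> bool:
    """Return True when the newest backup is older than the RPO allows."""
    return now - last_backup_at > rpo


# Example: a backup taken at 02:00 IST is checked at 09:00 IST the next day.
IST = timezone(timedelta(hours=5, minutes=30))
last = datetime(2026, 5, 24, 2, 0, tzinfo=IST)
print(rpo_breached(last, datetime(2026, 5, 25, 9, 0, tzinfo=IST)))  # prints True (31 hours old)
```

A check like this would flag a missed nightly run before the gap grows further, which is the situation the 24-hour target is meant to bound.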
2. What is backed up
| Subject | Method | Frequency | Storage | Encryption | Retention |
|---|---|---|---|---|---|
| PostgreSQL database (every table, including TokenMap, AuditLog, AIPrivacyLog) | pg_dump --format=custom --compress=9 to local file, then upload to R2 | Nightly at 02:00 IST | Local + Cloudflare R2 (India region where possible) | AES-256-GCM with BACKUP_AES_KEY (separate from firm key) before upload | 30 days rolling |
| Cloudflare R2 object storage (uploaded client documents + generated outputs) | R2 internal versioning + lifecycle policy | Continuous | R2 native | Server-side encryption | 30 days versioned, then permanent latest |
| Source code | Git push to github.com:kgujarathi/veloxis.git | Every commit | GitHub (US) | TLS in transit; encryption at rest by GitHub | Permanent (Git history) |
| Knowledge base + legal docs + checklists | Same as source code (in Git) | Every commit | GitHub | Same | Permanent |
| .env files containing keys | Not backed up to any external location. Stored mode-600 on production VM only; documented in encrypted Firm password store | Manual | Firm password store | At rest in password store | Permanent / on rotation |
| Application logs (Nginx, PM2, PostgreSQL, sidecar, cron) | Rotated by logrotate; latest 180 days retained on local disk | Continuous | Local VM disk | Disk encryption (LUKS) | 180 days rolling (CERT-In Directions §IV) |
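
The PostgreSQL row's dump step can be expressed as an argument list suitable for `subprocess.run`. This is only a sketch: the database name `veloxis`, the dated file name, and the helper `build_pg_dump_cmd` are assumptions, and the encryption and R2 upload steps are deliberately left out.

```python
from pathlib import Path


def build_pg_dump_cmd(database: str, out_file: Path) -> list[str]:
    """Build the pg_dump invocation named in the table (custom format, compression 9)."""
    return [
        "pg_dump",
        "--format=custom",
        "--compress=9",
        f"--file={out_file}",
        database,
    ]


# "veloxis" as the database name and the dated file name are assumptions.
cmd = build_pg_dump_cmd("veloxis", Path("/var/backups/veloxis/veloxis-2026-05-24.dump"))
print(" ".join(cmd))
```

Passing the command as a list rather than a shell string avoids quoting problems if a path ever contains spaces.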
3. Where backups live
- Local working backup on the production VM at /var/backups/veloxis/. Last seven days. Mode-600. Owned by application user.
- Off-server backup on Cloudflare R2 bucket veloxis-backups. Last thirty days. Encrypted before upload using BACKUP_AES_KEY (AES-256-GCM); R2 also encrypts at rest server-side, giving two layers.
- Off-site backup of the encryption key: BACKUP_AES_KEY itself lives in the Firm's password store (1Password / Bitwarden / similar) so a complete VM loss does not destroy the only key.
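
The two retention windows above (seven days locally, thirty days on R2) amount to a simple pruning rule. The sketch below assumes backups are tracked as a mapping from file name to the time the dump was taken; the file names, the `RETENTION` table, and the `expired` helper are hypothetical.

```python
from datetime import datetime, timedelta

# Retention windows stated above; the dictionary keys are illustrative labels.
RETENTION = {"local": timedelta(days=7), "r2": timedelta(days=30)}


def expired(backups: dict[str, datetime], tier: str, now: datetime) -> list[str]:
    """Return names of backups older than the tier's retention window, oldest first."""
    cutoff = now - RETENTION[tier]
    old = [name for name, taken_at in backups.items() if taken_at < cutoff]
    return sorted(old, key=lambda name: backups[name])


now = datetime(2026, 5, 24, 2, 0)
backups = {
    "veloxis-2026-05-23.dump": datetime(2026, 5, 23, 2, 0),
    "veloxis-2026-05-10.dump": datetime(2026, 5, 10, 2, 0),
    "veloxis-2026-04-20.dump": datetime(2026, 4, 20, 2, 0),
}
print(expired(backups, "local", now))  # ['veloxis-2026-04-20.dump', 'veloxis-2026-05-10.dump']
print(expired(backups, "r2", now))     # ['veloxis-2026-04-20.dump']
```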
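The Firm does not back up client data to a country other than India; the only material held abroad is the source code and documentation in Git, which GitHub hosts in the US (see §2). R2 is configured to use the India region as primary where available.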
4. Backup integrity
- Each nightly backup runs a pg_restore --list check after upload to confirm the dump is readable.
- Once a quarter, the Firm restores the latest backup into a throwaway PostgreSQL instance and runs the Veloxis migration verification scripts. The result is recorded under docs/incidents/backup-restore-drill-{YYYY-QQ}.md.
- A failed quarterly restore is treated as a SEV-2 incident under the Incident Response Plan.
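
The nightly readability check can be wrapped in a small function that treats a non-zero exit code from pg_restore as a failed backup. This sketch assumes the check runs against the unencrypted dump (or a decrypted copy), since pg_restore cannot read the AES-encrypted file; `dump_is_readable` and its `run` parameter are illustrative names.

```python
import subprocess
from typing import Callable


def dump_is_readable(
    dump_path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when pg_restore can list the archive's table of contents."""
    result = run(
        ["pg_restore", "--list", dump_path],
        stdout=subprocess.DEVNULL,  # the listing itself is not needed, only the exit code
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

The `run` parameter exists only so the check can be exercised without PostgreSQL installed; in normal use the default `subprocess.run` is called, and it raises `FileNotFoundError` if pg_restore is not on the PATH.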
5. Disaster scenarios and recovery
5.1 Single-row corruption
- Restore the affected row(s) from the latest nightly dump.
- Document the corruption + the restore in the application audit log.
5.2 Database corruption
- Stop veloxis-prod to prevent further writes.
- Restore the latest clean nightly dump into a fresh PostgreSQL instance.
- Verify TokenMap + AuditLog + AIPrivacyLog row counts against the previous business-day reconciliation.
- Restart veloxis-prod pointing at the restored DB.
- Investigate the corruption trigger; if security-related, escalate to SEV-1.
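
The row-count verification step above can be reduced to a comparison of two small mappings. The sketch below assumes the reconciliation figures and the restored counts have already been collected (for example with `SELECT count(*)` per table); the numbers are made up and `count_mismatches` is a hypothetical helper.

```python
from typing import Optional

# Tables named in the verification step above.
CRITICAL_TABLES = ("TokenMap", "AuditLog", "AIPrivacyLog")


def count_mismatches(
    reconciled: dict[str, int], restored: dict[str, int]
) -> dict[str, tuple[int, Optional[int]]]:
    """Map each critical table whose restored count differs to (reconciled, restored)."""
    mismatches = {}
    for table in CRITICAL_TABLES:
        got = restored.get(table)  # None when the table is missing after restore
        if got != reconciled[table]:
            mismatches[table] = (reconciled[table], got)
    return mismatches


# Illustrative figures only.
reconciled = {"TokenMap": 1200, "AuditLog": 58000, "AIPrivacyLog": 310}
restored = {"TokenMap": 1200, "AuditLog": 57990, "AIPrivacyLog": 310}
print(count_mismatches(reconciled, restored))  # {'AuditLog': (58000, 57990)}
```

An empty result means the restored database agrees with the reconciliation; any entry is a reason to pause before restarting veloxis-prod.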
5.3 VM compromise
- Detach the compromised VM from the public internet.
- Snapshot the disk for forensics.
- Provision a fresh Linode VM in the India region.
- Re-run the Linode bootstrap script (scripts/bootstrap-prod-vm.sh).
- git clone the Veloxis repository; check out the last-known-good commit.
- Restore the database from the latest pre-compromise R2 backup.
- Re-issue keys (FIRM_KEY_BYTES, BACKUP_AES_KEY, JWT signing secret, DB password, AI-provider keys); re-encrypt TokenMap rows via the key-rotation script.
- Restore the R2 documents (already on R2 and untouched if R2 access was not compromised).
- Repoint DNS to the new VM IP.
- Verify the public health endpoint returns HTTP 200.
- The Firm publishes a post-mortem under docs/incidents/ within 30 days.
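
The health-endpoint verification near the end of the list above can be scripted with the standard library. This is a sketch only: the policy does not give the endpoint URL, so the caller must supply it, and `is_healthy` with its `opener` parameter is an illustrative name.

```python
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True only when the endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx) and socket timeouts are all OSError subclasses.
        return False
```

Treating connection errors and timeouts as "not healthy" rather than letting them raise keeps the check usable in a loop that waits for DNS to propagate to the new VM.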
5.4 R2 service outage
- Application falls back to serving documents already in the on-VM cache.
- Uploads queue locally and retry on R2 recovery.
- If the R2 outage exceeds 24 hours, the Firm provisions an alternative S3-compatible bucket (e.g., a separate Linode Object Storage bucket) and redirects writes there until R2 recovers.
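
The three behaviours above (normal writes, local queueing, failover after 24 hours) can be summarised as one decision function. The sketch assumes the application records when the outage began; the target labels and the name `write_target` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

FAILOVER_AFTER = timedelta(hours=24)  # threshold stated in the policy


def write_target(outage_started_at: Optional[datetime], now: datetime) -> str:
    """Pick where new uploads go given the current R2 outage state."""
    if outage_started_at is None:
        return "r2"  # no outage in progress
    if now - outage_started_at > FAILOVER_AFTER:
        return "fallback-bucket"  # alternative S3-compatible bucket
    return "local-queue"  # queue on the VM and retry when R2 recovers


start = datetime(2026, 5, 24, 10, 0)
print(write_target(start, datetime(2026, 5, 24, 18, 0)))  # local-queue
print(write_target(start, datetime(2026, 5, 25, 11, 0)))  # fallback-bucket
```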
5.5 Linode region failure
- The Firm provisions a fresh VM in an alternative Linode region (Singapore as nearest), restores from the R2 backup, and repoints DNS.
- Note: a region-failure recovery into a non-India region may temporarily breach sectoral data-localisation overlays (RBI, SEBI, IRDAI). The Firm flags affected engagements and disables AI features for them until the Indian region is restored. Affected partners are notified within four hours.
5.6 Catastrophic loss (region + R2 bucket + Firm password store all unavailable simultaneously)
This requires three independent failures. The Firm accepts this residual risk for the current scale; the source code in GitHub remains intact and the Firm can rebuild from scratch, with seven years of audit working papers being the unrecoverable loss. Customer-facing deployment will require a third backup region to retire this residual risk.
6. Operational procedure
The nightly backup job is a cron entry on the production VM:
0 2 * * * /home/kgujrathi/veloxis/scripts/backup-prod.sh >> /var/log/veloxis/backup.log 2>&1
The script scripts/backup-prod.sh (to be authored) performs the steps in §2 above and emits a one-line summary log entry. Daily log monitoring confirms the job ran.
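
Daily log monitoring can be automated once the summary line has a fixed shape. Because the script is not yet authored, the format below is purely an assumption: an ISO timestamp, the word `backup`, and a `status=ok` token. The helper name `backup_ran_ok` is also hypothetical.

```python
from datetime import date


def backup_ran_ok(log_lines: list[str], day: date) -> bool:
    """Return True when a summary line for `day` reports success.

    Assumes a hypothetical summary format: "<ISO timestamp> backup status=ok ...".
    """
    prefix = day.isoformat()
    return any(
        line.startswith(prefix) and "status=ok" in line.split()
        for line in log_lines
    )


# Illustrative log content, not real output.
lines = [
    "2026-05-23T02:04:10+05:30 backup status=ok size=123456",
    "2026-05-24T02:00:02+05:30 backup status=failed step=upload",
]
print(backup_ran_ok(lines, date(2026, 5, 23)))  # True
print(backup_ran_ok(lines, date(2026, 5, 24)))  # False
```

A day with no matching line at all also returns False, which covers the case where cron never started the job.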
7. Restore drill schedule
| Drill | Frequency | Owner |
|---|---|---|
| Quarterly read-only restore | 4 × per year | Krishna |
| Annual VM-rebuild dry run on a sandbox VM | 1 × per year | Krishna |
| Tabletop walk-through of §5 scenarios | 1 × per year | Krishna |
Each drill produces a dated entry under docs/incidents/. Findings feed remediation tasks.
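
For the quarterly restore drill, the entry name follows the backup-restore-drill-{YYYY-QQ}.md pattern from §4. The sketch below assumes `QQ` means a calendar quarter written as `Q1` to `Q4`; if the Firm counts quarters by financial year, the arithmetic would differ. `drill_report_path` is an illustrative name.

```python
from datetime import date


def drill_report_path(day: date) -> str:
    """Return the drill report path for the calendar quarter containing `day`."""
    quarter = (day.month - 1) // 3 + 1  # Jan-Mar -> 1, ..., Oct-Dec -> 4
    return f"docs/incidents/backup-restore-drill-{day.year}-Q{quarter}.md"


print(drill_report_path(date(2026, 5, 24)))  # docs/incidents/backup-restore-drill-2026-Q2.md
```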
8. Future-readiness
For external-customer deployment, the Firm will upgrade:
- RPO to 1 hour (streaming replication or WAL shipping).
- RTO to 1 hour (warm standby in a second region).
- Backup retention to 1 year minimum.
- Quarterly restore drill to monthly.
These upgrades are tracked in the Veloxis backlog under "Customer-readiness".