🧹 Contact Export Cleanup

📖 Overview

The Contact Export Cleanup module automatically removes expired CSV export files from Wasabi S3 storage and their corresponding database records. It runs on a scheduled basis to maintain storage hygiene and comply with data retention policies.

Cron Schedule: Configured in crons/contacts/cleanup.js (typically daily)

Source Files:

  • Cron: queue-manager/crons/contacts/cleanup.js
  • Service: queue-manager/services/contacts/cleanup.js (~35 lines)

🎯 Business Purpose

Ensures:

  • Storage Cost Management: Removes old export files to reduce S3 storage costs
  • Data Retention Compliance: Adheres to 24-hour retention policy for exports
  • System Hygiene: Prevents accumulation of temporary export files
  • Database Cleanup: Removes orphaned upload records
  • Security: Limits exposure of exported data to roughly 24 hours (files are removed on the first cleanup run after they pass that age)

🔄 Complete Processing Flow

sequenceDiagram
participant CRON as Cleanup Cron
participant SERVICE as Cleanup Service
participant DB as MongoDB (Uploads)
participant WASABI as Wasabi S3
participant LOGGER as Logger

loop Daily Schedule
CRON->>SERVICE: Execute cleanupContactsCSV()
SERVICE->>DB: Query exports > 24 hours old
DB-->>SERVICE: Export records

alt Exports Found
loop For each export
SERVICE->>LOGGER: Log cleanup start
SERVICE->>WASABI: deleteByKeyPublic(key)
WASABI-->>SERVICE: File deleted
SERVICE->>DB: Delete upload record
DB-->>SERVICE: Record deleted
end
SERVICE->>LOGGER: Log completion
else No Exports
SERVICE->>LOGGER: No files to cleanup
end
end

🔧 Main Service Function

cleanupContactsCSV()

Purpose: Identifies and removes expired contact export files from both S3 storage and database.

Complete Source Code

const UploadModel = require('../../models/uploads');
const wasabiUtility = require('../../utilities/wasabi');
const logger = require('../../utilities/logger');

exports.cleanupContactsCSV = async () => {
  try {
    // Step 1: Query exports older than 24 hours
    let exportData = await UploadModel.find({
      $or: [
        {
          source: 'person-export',
          type: 'public',
          createdAt: {
            $lte: new Date(new Date() - 24 * 60 * 60 * 1000),
          },
        },
        {
          source: 'business-export',
          type: 'public',
          createdAt: {
            $lte: new Date(new Date() - 24 * 60 * 60 * 1000),
          },
        },
      ],
    })
      .sort({ createdAt: -1 })
      .lean()
      .exec();

    if (exportData.length > 0) {
      let wasabiObj = new wasabiUtility();

      // Step 2: Delete all files in parallel
      await Promise.all(
        exportData.map(async d => {
          logger.log({
            initiator: 'wasabi cleanup',
            message: `cleaning up ${d.key} from wasabi`,
          });

          await wasabiObj.deleteByKeyPublic(d.key);
          await UploadModel.deleteOne({ _id: d._id });
        }),
      );
    }

    logger.log({
      initiator: 'wasabi cleanup',
      message: `Deletion completed`,
    });
  } catch (error) {
    console.log(error);
  }
};

📋 Step-by-Step Logic

Step 1: Query Expired Exports

let exportData = await UploadModel.find({
  $or: [
    {
      source: 'person-export',
      type: 'public',
      createdAt: {
        $lte: new Date(new Date() - 24 * 60 * 60 * 1000),
      },
    },
    {
      source: 'business-export',
      type: 'public',
      createdAt: {
        $lte: new Date(new Date() - 24 * 60 * 60 * 1000),
      },
    },
  ],
})
  .sort({ createdAt: -1 })
  .lean()
  .exec();

Query Logic:

OR Condition

Matches either person or business exports:

  • source: 'person-export': Contact/person exports
  • source: 'business-export': Company/business exports

Filters Applied

  • type: 'public': Only public uploads (accessible via direct URL)
  • createdAt: { $lte: ... }: Created 24+ hours ago
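
Because the two $or branches differ only in source, the same selection could also be written with MongoDB's $in operator. The following is a minimal sketch under that assumption; buildExpiredExportFilter and EXPORT_SOURCES are hypothetical names, not part of the service.

```javascript
// Hypothetical helper: builds the same selection as the $or query,
// using $in because only `source` differs between the two branches.
const EXPORT_SOURCES = ['person-export', 'business-export'];

function buildExpiredExportFilter(cutoff) {
  return {
    source: { $in: EXPORT_SOURCES },
    type: 'public',
    createdAt: { $lte: cutoff },
  };
}

// Usage inside the service would look roughly like:
//   UploadModel.find(buildExpiredExportFilter(cutoff)).lean().exec()
const filter = buildExpiredExportFilter(new Date('2025-10-09T15:00:00Z'));
console.log(JSON.stringify(filter, null, 2));
```

Computing the cutoff once and passing it in also avoids evaluating new Date() separately for each branch.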

Date Calculation

new Date(new Date() - 24 * 60 * 60 * 1000);
// 24 hours * 60 minutes * 60 seconds * 1000 milliseconds
// = 86,400,000 milliseconds = 24 hours

Example:

  • Current time: 2025-10-10 15:00:00
  • Cutoff time: 2025-10-09 15:00:00
  • Files created at or before the cutoff are selected
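
The example can be reproduced with a small helper. This is a sketch only: computeCutoff is a hypothetical name, and the timestamps are written in UTC so the output is deterministic.

```javascript
// Hypothetical helper: subtracts the retention window from a reference time.
const MS_PER_HOUR = 60 * 60 * 1000;

function computeCutoff(now, retentionHours = 24) {
  return new Date(now.getTime() - retentionHours * MS_PER_HOUR);
}

const now = new Date('2025-10-10T15:00:00Z');
const cutoff = computeCutoff(now);
console.log(cutoff.toISOString()); // 2025-10-09T15:00:00.000Z

// A record is selected when createdAt <= cutoff (the query uses $lte).
const createdAt = new Date('2025-10-09T14:59:59Z');
console.log(createdAt <= cutoff); // true
```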

Sort and Optimization

  • sort({ createdAt: -1 }): Newest first, i.e. descending by createdAt (for logging consistency; the order does not change which files are deleted)
  • lean(): Returns plain JavaScript objects (faster, no Mongoose overhead)
  • exec(): Explicitly executes query

Step 2: Parallel Deletion

await Promise.all(
  exportData.map(async d => {
    logger.log({
      initiator: 'wasabi cleanup',
      message: `cleaning up ${d.key} from wasabi`,
    });

    await wasabiObj.deleteByKeyPublic(d.key);
    await UploadModel.deleteOne({ _id: d._id });
  }),
);

Parallel Processing:

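  • Promise.all(): Starts all deletions concurrently
  • Significantly faster than sequential deletion
  • The function continues only after every deletion has resolved; if any deletion rejects, Promise.all() rejects immediately and control passes to the catch block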

Two-Step Deletion:

  1. S3 Deletion: wasabiObj.deleteByKeyPublic(d.key)

    • Removes file from Wasabi S3 storage
    • Uses public bucket deletion method
    • Key format: exports/contacts/account-123/export-456.csv
  2. Database Deletion: UploadModel.deleteOne({ _id: d._id })

    • Removes upload record from MongoDB
    • Prevents orphaned database records
    • Frees up database storage

Logging:

  • Logs each file being cleaned up
  • Includes S3 key for audit trail
  • Helps troubleshoot deletion failures

Step 3: Completion Logging

logger.log({
  initiator: 'wasabi cleanup',
  message: `Deletion completed`,
});

Purpose:

  • Confirms cleanup job completed
  • Marks end of cleanup cycle
  • Useful for monitoring and alerting

📊 Data Structures

UploadModel Document

{
  _id: ObjectId,
  source: 'person-export', // 'person-export' or 'business-export'
  type: 'public', // 'public' or 'private'
  key: 'exports/contacts/account-123/export-456.csv',
  url: 'https://bucket.s3.wasabisys.com/exports/...',
  filename: 'contacts_export_2025-10-09.csv',
  size: 1024576, // File size in bytes
  account_id: ObjectId,
  user_id: ObjectId,
  createdAt: Date, // Used for 24-hour filter
  updatedAt: Date
}

Cleanup Query Result Example

[
  {
    _id: '507f1f77bcf86cd799439011',
    source: 'person-export',
    type: 'public',
    key: 'exports/contacts/acc123/person-2025-10-08.csv',
    createdAt: new Date('2025-10-08T10:00:00Z'), // 2+ days old
    // ... other fields
  },
  {
    _id: '507f1f77bcf86cd799439012',
    source: 'business-export',
    type: 'public',
    key: 'exports/contacts/acc456/business-2025-10-07.csv',
    createdAt: new Date('2025-10-07T14:00:00Z'), // 3+ days old
    // ... other fields
  },
];

🎨 Usage Patterns

Typical Cleanup Cycle

// Cron runs daily at midnight
// Example: 12:00 AM every day

// 1. User exports contacts at 10:00 AM on Oct 9
const exportRecord = await UploadModel.create({
  source: 'person-export',
  type: 'public',
  key: 'exports/contacts/acc123/export.csv',
  createdAt: new Date('2025-10-09T10:00:00Z'),
});

// 2. Export stays downloadable until the first cleanup run that finds it at least 24 hours old

// 3. Cleanup runs at midnight on Oct 10 (14 hours later)
// File NOT deleted (only 14 hours old)

// 4. Cleanup runs at midnight on Oct 11 (38 hours later)
// File DELETED (over 24 hours old)
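
The timeline above can be checked with a small simulation. This is a sketch: isDeletedAtRun is a hypothetical helper that mirrors the query's age condition, and all times are treated as UTC.

```javascript
// Hypothetical simulation of the schedule above; times are UTC for determinism.
const RETENTION_MS = 24 * 60 * 60 * 1000;

// Mirrors the query's `createdAt <= now - 24h` condition for a single run.
function isDeletedAtRun(createdAt, runTime) {
  return createdAt.getTime() <= runTime.getTime() - RETENTION_MS;
}

const createdAt = new Date('2025-10-09T10:00:00Z');
const runs = [new Date('2025-10-10T00:00:00Z'), new Date('2025-10-11T00:00:00Z')];

for (const run of runs) {
  const ageHours = (run - createdAt) / (60 * 60 * 1000);
  console.log(`${run.toISOString()}: age ${ageHours}h, deleted: ${isDeletedAtRun(createdAt, run)}`);
}
// 2025-10-10T00:00:00.000Z: age 14h, deleted: false
// 2025-10-11T00:00:00.000Z: age 38h, deleted: true
```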

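Manual Cleanup Trigger

// For immediate cleanup (e.g., after storage migration)
const { cleanupContactsCSV } = require('./services/contacts/cleanup');

// Top-level await is not available in CommonJS, so wrap the call
(async () => {
  await cleanupContactsCSV();
  // Errors are caught inside the service, so this logs even if cleanup failed
  console.log('Manual cleanup finished');
})();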

⚙️ Configuration

Required Environment Variables

# Wasabi S3 Configuration
WASABI_ACCESS_KEY=your-access-key
WASABI_SECRET_KEY=your-secret-key
WASABI_BUCKET=your-bucket-name
WASABI_REGION=us-east-1
WASABI_ENDPOINT=s3.wasabisys.com

# MongoDB
MONGO_DB_URL=mongodb://...

Cleanup Timing

// Retention period: 24 hours
const RETENTION_HOURS = 24;
const RETENTION_MS = RETENTION_HOURS * 60 * 60 * 1000;

// Could be made configurable via an environment variable
// (named differently here so both declarations can coexist):
const CONFIGURED_RETENTION_MS = process.env.EXPORT_RETENTION_HOURS
  ? Number(process.env.EXPORT_RETENTION_HOURS) * 60 * 60 * 1000
  : 24 * 60 * 60 * 1000;
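
Environment variables are always strings, so a configurable retention period benefits from explicit parsing and a fallback. A minimal sketch, assuming the EXPORT_RETENTION_HOURS variable named above; parseRetentionMs is a hypothetical helper.

```javascript
// Hypothetical helper; EXPORT_RETENTION_HOURS is the variable name used above.
const DEFAULT_RETENTION_HOURS = 24;

function parseRetentionMs(env = process.env) {
  const hours = Number(env.EXPORT_RETENTION_HOURS);
  // Fall back to the default for missing, non-numeric, zero, or negative values.
  const valid = Number.isFinite(hours) && hours > 0;
  return (valid ? hours : DEFAULT_RETENTION_HOURS) * 60 * 60 * 1000;
}

console.log(parseRetentionMs({ EXPORT_RETENTION_HOURS: '48' })); // 172800000
console.log(parseRetentionMs({ EXPORT_RETENTION_HOURS: 'abc' })); // 86400000
console.log(parseRetentionMs({})); // 86400000
```

Falling back to 24 hours on invalid input keeps a typo in the environment from silently turning the cutoff into an invalid date.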

Cron Schedule Examples

// Daily at midnight
'0 0 * * *';

// Every 6 hours
'0 */6 * * *';

// Every hour
'0 * * * *';

// Twice daily (midnight and noon)
'0 0,12 * * *';
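
The schedule determines how long a file can outlive the 24-hour window: a file just under the retention age at one run survives until the next. The sketch below works through that upper bound for the four schedules; worstCaseLifetimeHours is a hypothetical helper and the interval values are read off the cron expressions.

```javascript
// Upper bound on how long an export can exist: a file that is just under the
// retention age at one run survives until the following run.
function worstCaseLifetimeHours(retentionHours, cronIntervalHours) {
  return retentionHours + cronIntervalHours;
}

const schedules = [
  { cron: '0 0 * * *', intervalHours: 24 },
  { cron: '0 */6 * * *', intervalHours: 6 },
  { cron: '0 * * * *', intervalHours: 1 },
  { cron: '0 0,12 * * *', intervalHours: 12 },
];

for (const { cron, intervalHours } of schedules) {
  console.log(`${cron} -> up to ${worstCaseLifetimeHours(24, intervalHours)}h`);
}
```

With a daily schedule a file can therefore exist for up to 48 hours, which is worth keeping in mind when the 24-hour figure is quoted for compliance purposes.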

🚨 Error Handling

Top-Level Error Handling

try {
  // Cleanup logic
} catch (error) {
  console.log(error);
}

Error Behavior:

  • Logs error to console
  • Does not rethrow (prevents cron failure)
  • Cleanup will retry on next scheduled run
  • No user notification on failures

Partial Failure Handling

await Promise.all(
  exportData.map(async d => {
    // If either call rejects, Promise.all() rejects immediately;
    // deletions already in flight keep running but are no longer awaited
    await wasabiObj.deleteByKeyPublic(d.key);
    await UploadModel.deleteOne({ _id: d._id });
  }),
);

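Failure Scenarios:

  1. S3 Deletion Fails:

    • If deleteByKeyPublic() rejects, the deleteOne() call for that record never runs, because the two calls are awaited in sequence
    • Both the file and the database record remain
    • The record matches the query again, so the deletion is retried on the next run
    • An orphaned file (S3 object without a record) can only result if deleteByKeyPublic() resolves without actually removing the object, or if the record is removed by other means; such files remain until cleaned manually or by a bucket lifecycle policy
  2. S3 Deletion Succeeds, Database Fails:

    • File removed from S3
    • Database record remains (orphaned)
    • The record is picked up again on the next run; deleting a key that no longer exists is normally a successful no-op in S3-compatible storage
  3. One File Fails While Others Are In Flight:

    • Promise.all() rejects as soon as the first deletion fails
    • Deletions already started are not cancelled, but the function no longer waits for them
    • The error goes to the top-level catch and is printed with console.log; the "Deletion completed" message is skipped
    • Records that were not deleted are retried on the next run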
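
A minimal sketch of how Promise.all() behaves when one task rejects, using timer-based stubs in place of the real delete calls. The names demo and fakeDelete are hypothetical; nothing here touches Wasabi or MongoDB.

```javascript
// Stub tasks stand in for the per-file delete calls; all names are hypothetical.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demo() {
  const finished = [];
  const fakeDelete = async (key, ms, shouldFail) => {
    await delay(ms);
    if (shouldFail) throw new Error(`delete failed: ${key}`);
    finished.push(key);
  };

  let rejectedWith = null;
  try {
    await Promise.all([
      fakeDelete('a.csv', 30, false),
      fakeDelete('b.csv', 5, true), // fails first
      fakeDelete('c.csv', 30, false),
    ]);
  } catch (error) {
    rejectedWith = error.message;
  }

  // At this point a.csv and c.csv are still pending.
  const finishedAtRejection = finished.length;
  // The in-flight tasks are not cancelled and still finish later.
  await delay(60);
  return { rejectedWith, finishedAtRejection, finishedLater: finished.length };
}

demo().then((result) => console.log(result));
```

The rejection surfaces before the slower tasks finish, which is why the completion log in the service can be skipped even though most files were in fact deleted.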

Improvement: Better Error Handling

await Promise.all(
  exportData.map(async d => {
    try {
      logger.log({
        initiator: 'wasabi cleanup',
        message: `cleaning up ${d.key}`,
      });

      await wasabiObj.deleteByKeyPublic(d.key);
      await UploadModel.deleteOne({ _id: d._id });

      logger.log({
        initiator: 'wasabi cleanup',
        message: `Successfully cleaned up ${d.key}`,
      });
    } catch (error) {
      logger.error({
        initiator: 'wasabi cleanup',
        message: `Failed to cleanup ${d.key}`,
        error: error.message,
      });
      // Continue with other deletions
    }
  }),
);
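
An alternative to a try/catch per item is Promise.allSettled(), which waits for every deletion and reports each outcome. The following is a sketch, not the service's implementation: cleanupWithSummary is a hypothetical function, and deleteFile and deleteRecord are injected stand-ins for wasabiObj.deleteByKeyPublic(key) and UploadModel.deleteOne({ _id }) so the example runs without external services.

```javascript
// Hypothetical variant: deleteFile and deleteRecord are injected so the sketch
// runs without Wasabi or MongoDB.
async function cleanupWithSummary(exportData, deleteFile, deleteRecord) {
  const results = await Promise.allSettled(
    exportData.map(async (d) => {
      await deleteFile(d.key);
      await deleteRecord(d._id);
      return d.key;
    }),
  );

  // Collect the key and error message of every failed deletion.
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      failed.push({ key: exportData[i].key, reason: result.reason.message });
    }
  });
  return { deleted: results.length - failed.length, failed };
}

// Usage with stubs: the second file fails at the storage step.
const records = [{ _id: 1, key: 'a.csv' }, { _id: 2, key: 'b.csv' }];
const deleteFile = async (key) => {
  if (key === 'b.csv') throw new Error('storage unavailable');
};
const deleteRecord = async () => {};

cleanupWithSummary(records, deleteFile, deleteRecord).then((summary) => console.log(summary));
```

The summary object gives monitoring a concrete count of failures per run instead of a single completion message.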

📈 Performance Considerations

Optimization Strategies

  1. Parallel Deletion: Uses Promise.all() for concurrent operations
  2. Lean Queries: Uses .lean() to avoid Mongoose document overhead
  3. Targeted Query: Filters at database level (not in-memory)
  4. Batch Processing: No pagination needed for typical volumes

Scalability

  • Query Performance: Relies on an index covering source, type, createdAt (see Index Recommendation)
  • S3 Rate Limits: Wasabi allows 100 deletes/second (sufficient)
  • Deletion Speed: ~100-200ms per file (S3 + DB)
  • Typical Load: 10-100 exports per day (cleanup takes 1-10 seconds)
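
If volumes grow beyond the typical load, unbounded Promise.all() would issue every delete at once. One common way to cap concurrency is to process the records in fixed-size batches. A sketch under that assumption; processInBatches is a hypothetical helper and the worker is a stub.

```javascript
// Hypothetical helper: runs `worker` over `items` with at most `batchSize`
// operations in flight at a time.
async function processInBatches(items, batchSize, worker) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Each batch runs concurrently; the next batch starts only after it settles.
    results.push(...(await Promise.allSettled(batch.map((item) => worker(item)))));
  }
  return results;
}

// Usage with a stub worker that records the peak number of concurrent calls.
let inFlight = 0;
let peak = 0;
const worker = async (key) => {
  inFlight += 1;
  peak = Math.max(peak, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5));
  inFlight -= 1;
  return key;
};

const keys = Array.from({ length: 7 }, (_, i) => `export-${i}.csv`);
processInBatches(keys, 3, worker).then((results) => {
  console.log(results.length, peak);
});
```

The batch size would need tuning against the storage provider's actual rate limits.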

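Index Recommendation

// Compound index on the uploads collection (matches the cleanup query filters)
{
  source: 1,
  type: 1,
  createdAt: 1
}

// Same compound index with createdAt descending (matches the query's sort order)
{
  source: 1,
  type: 1,
  createdAt: -1
}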

Typical Performance

  • Small Cleanup (< 10 files): 1-2 seconds
  • Medium Cleanup (10-50 files): 5-10 seconds
  • Large Cleanup (50-200 files): 15-30 seconds
  • Very Large (200+ files): 1-2 minutes

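🧪 Testing Considerations

Mock Setup

// Paths are relative to the test file; adjust them to its actual location
jest.mock('../../models/uploads');
jest.mock('../../utilities/wasabi');
jest.mock('../../utilities/logger');

// The mocked modules must also be imported so the tests can configure them
const UploadModel = require('../../models/uploads');
const wasabiUtility = require('../../utilities/wasabi');
const { cleanupContactsCSV } = require('./services/contacts/cleanup');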

Test Cases

describe('cleanupContactsCSV', () => {
  // Reset call counts so assertions in one test are not affected by another
  beforeEach(() => jest.clearAllMocks());

  test('Deletes expired person exports', async () => {
    const mockExports = [
      {
        _id: 'export1',
        source: 'person-export',
        type: 'public',
        key: 'exports/test.csv',
        createdAt: new Date(Date.now() - 48 * 60 * 60 * 1000), // 48 hours old
      },
    ];

    UploadModel.find.mockReturnValue({
      sort: jest.fn().mockReturnThis(),
      lean: jest.fn().mockReturnThis(),
      exec: jest.fn().mockResolvedValue(mockExports),
    });

    await cleanupContactsCSV();

    expect(wasabiUtility.prototype.deleteByKeyPublic).toHaveBeenCalledWith('exports/test.csv');
    expect(UploadModel.deleteOne).toHaveBeenCalledWith({ _id: 'export1' });
  });

  test('Does not delete recent exports', async () => {
    // The query filters by age, so recent exports never reach the service
    const mockExports = [];

    UploadModel.find.mockReturnValue({
      sort: jest.fn().mockReturnThis(),
      lean: jest.fn().mockReturnThis(),
      exec: jest.fn().mockResolvedValue(mockExports),
    });

    await cleanupContactsCSV();

    expect(wasabiUtility.prototype.deleteByKeyPublic).not.toHaveBeenCalled();
  });

  test('Handles errors gracefully', async () => {
    // Spy on console.log because the service reports errors through it
    const logSpy = jest.spyOn(console, 'log').mockImplementation(() => {});
    UploadModel.find.mockImplementation(() => {
      throw new Error('Database error');
    });

    await expect(cleanupContactsCSV()).resolves.toBeUndefined();
    expect(logSpy).toHaveBeenCalledWith(expect.any(Error));
    logSpy.mockRestore();
  });
});

📝 Notes

Why 24-Hour Retention?

  • Balance: Gives users time to download without long-term storage
  • Security: Limits exposure of exported customer data
  • Cost: Reduces S3 storage costs
  • Compliance: Meets temporary data processing requirements

Public vs Private Exports

This cleanup only targets public exports:

  • Public: Direct download URLs (removed by this job once older than 24 hours)
  • Private: Internal use, longer retention, different cleanup policy

Storage Cost Impact

Typical savings:

  • 100 exports/day: ~500MB/day
  • 30 days without cleanup: ~15GB accumulated
  • With cleanup: < 1GB at any time
  • One year without cleanup: ~180GB accumulated (500MB × 365 days), versus < 1GB with cleanup
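
The accumulation figures follow from simple multiplication. The sketch below is illustrative arithmetic only: the 5 MB average file size is derived from the figures above (500 MB/day across 100 exports/day), it uses 1 GB = 1000 MB, and accumulatedGB is a hypothetical helper.

```javascript
// Illustrative arithmetic only; inputs are taken from the figures above.
function accumulatedGB(exportsPerDay, avgSizeMB, days) {
  return (exportsPerDay * avgSizeMB * days) / 1000;
}

console.log(accumulatedGB(100, 5, 1));   // 0.5
console.log(accumulatedGB(100, 5, 30));  // 15
console.log(accumulatedGB(100, 5, 365)); // 182.5
```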

Orphaned Records

Possible orphan scenarios:

  1. S3 file deleted manually but database record remains
  2. Database record deleted but S3 file remains
  3. Partial cleanup failure

Solution: Run cleanup more frequently or add a reconciliation job that compares database records against the bucket contents
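
The core of a reconciliation job is a comparison between the keys the database knows about and the keys present in storage. A minimal sketch of that comparison; findOrphans is a hypothetical function, and how the two key lists are obtained (for example a bucket listing and a query on the uploads collection) is deliberately left out.

```javascript
// Hypothetical reconciliation step: compares the keys known to the database
// with the keys present in storage.
function findOrphans(dbKeys, storageKeys) {
  const inDb = new Set(dbKeys);
  const inStorage = new Set(storageKeys);
  return {
    // Records whose file no longer exists in storage.
    recordsWithoutFile: dbKeys.filter((key) => !inStorage.has(key)),
    // Files that no database record points to.
    filesWithoutRecord: storageKeys.filter((key) => !inDb.has(key)),
  };
}

const report = findOrphans(
  ['exports/a.csv', 'exports/b.csv'],
  ['exports/b.csv', 'exports/c.csv'],
);
console.log(report);
```

Each list in the report maps to one of the orphan scenarios above and could feed either an alert or an automated repair step.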

Alternative: S3 Lifecycle Policy

Instead of application-level cleanup, the bucket could use S3 lifecycle rules:

// Wasabi lifecycle rule
{
  "Rules": [{
    "ID": "cleanup-exports",
    "Status": "Enabled",
    "Prefix": "exports/contacts/",
    "Expiration": {
      "Days": 1
    }
  }]
}

Pros: Automatic, no code maintenance
Cons: Doesn't clean up database records, less control


Complexity: Low
Business Impact: Medium - Cost optimization and compliance
Dependencies: Wasabi S3, UploadModel
Last Updated: 2025-10-10
