Deployment Guide
This guide covers environment configuration, Docker deployment, monitoring, and production best practices for the General Socket service.
Environment Variables
Core Configuration
# Service Identity
PORT=4000
SERVICE_NAME=general-socket
# MongoDB Connection (note: MONGO_HOST holds the full connection URI, despite the name)
MONGO_HOST=mongodb://localhost:27017/dashclicks
MONGO_CONNECTION_STRING=mongodb://username:password@host:27017/dashclicks
# Redis Configuration (Required for Socket.IO adapter)
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=your-redis-password
# API Endpoints
INTERNAL_API_URL=http://localhost:5002
EXTERNAL_API_URL=http://localhost:5003
# JWT Authentication
JWT_SECRET=your-jwt-secret-key
JWT_ISSUER=dashclicks-auth
JWT_AUDIENCE=dashclicks-api
# CORS Configuration
ALLOWED_ORIGINS=http://localhost:3000,https://app.dashclicks.com
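ALLOWED_ORIGINS is a comma-separated list, so it must be parsed before it can drive a CORS check. A minimal sketch, assuming the env value above; `buildOriginCheck` is an illustrative helper (not part of the service) that produces the `(origin, callback)` function shape accepted by the `cors` package and by Socket.IO's `cors.origin` option:

```javascript
// Sketch: turn the comma-separated ALLOWED_ORIGINS value into an
// origin-check function. The helper name is illustrative.
function buildOriginCheck(allowedOrigins) {
  const allowed = new Set(
    (allowedOrigins || '')
      .split(',')
      .map(o => o.trim())
      .filter(Boolean),
  );
  // Signature expected by the `cors` package: (origin, callback)
  return (origin, callback) => {
    // Requests with no Origin header (non-browser clients) pass through here;
    // tighten this check if the service should reject them.
    if (!origin || allowed.has(origin)) return callback(null, true);
    return callback(new Error(`Origin ${origin} not allowed`));
  };
}

module.exports = { buildOriginCheck };
```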
Socket.IO Configuration
# Connection Settings
SOCKET_PING_TIMEOUT=60000 # 60 seconds
SOCKET_PING_INTERVAL=25000 # 25 seconds
SOCKET_MAX_HTTP_BUFFER_SIZE=1048576 # 1MB
# Performance Tuning
SOCKET_TRANSPORTS=["websocket", "polling"]
SOCKET_UPGRADE_TIMEOUT=10000 # 10 seconds
SOCKET_CLOSE_TIMEOUT=60000 # 60 seconds
Logging & Monitoring
# Log Level
LOG_LEVEL=info # debug, info, warn, error
# Monitoring
ENABLE_METRICS=true
METRICS_PORT=9090
# Sentry Error Tracking
SENTRY_DSN=https://your-sentry-dsn@sentry.io/project-id
SENTRY_ENVIRONMENT=production
SENTRY_TRACES_SAMPLE_RATE=0.1
Feature Flags
# Enable/Disable Namespaces
ENABLE_CONVERSATION=true
ENABLE_LIVECHAT=true
ENABLE_LEADFINDER=true
ENABLE_SHARED=true
# Rate Limiting
ENABLE_RATE_LIMITING=true
RATE_LIMIT_WINDOW_MS=60000 # 1 minute
RATE_LIMIT_MAX_REQUESTS=100 # per window
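When ENABLE_RATE_LIMITING is on, the window/max values above imply a fixed-window limiter. A minimal single-process sketch (illustrative; a production deployment would typically back the counters with Redis so limits hold across instances):

```javascript
// Sketch: in-memory fixed-window rate limiter driven by the
// RATE_LIMIT_WINDOW_MS / RATE_LIMIT_MAX_REQUESTS settings above.
function createRateLimiter({ windowMs = 60000, maxRequests = 100 } = {}) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      // First event for this key, or the previous window expired: start fresh.
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= maxRequests;
  };
}
```

It can be wired into the event path with Socket.IO's per-socket middleware, e.g. `socket.use(([event], next) => allow(socket.id) ? next() : next(new Error('rate limited')))`.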
Docker Deployment
Dockerfile
FROM node:18-alpine
WORKDIR /app
# Install dependencies
COPY package*.json ./
RUN npm ci --omit=dev
# Copy application code
COPY . .
# Copy shared models and utilities
# NOTE: COPY cannot reference paths outside the build context, so `COPY ../shared/...`
# fails. Build with the repository root as the context
# (e.g. `docker build -f general-socket/Dockerfile .`) so the paths below resolve:
COPY shared/models ./models
COPY shared/utilities ./utilities
# Expose port
EXPOSE 4000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s --retries=3 \
  CMD node -e "require('http').get('http://localhost:4000/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"
# Start service
CMD ["node", "index.js"]
Docker Compose
version: '3.8'
services:
general-socket:
build:
context: .
dockerfile: Dockerfile
container_name: general-socket
restart: unless-stopped
ports:
- '4000:4000'
environment:
- PORT=4000
- MONGO_HOST=mongodb://mongo:27017/dashclicks
- REDIS_HOST=redis
      - REDIS_PORT=6379
      - REDIS_PASSWORD=${REDIS_PASSWORD}
- JWT_SECRET=${JWT_SECRET}
- INTERNAL_API_URL=http://internal-api:5002
- LOG_LEVEL=info
depends_on:
- mongo
- redis
networks:
- dashclicks-network
volumes:
- ./logs:/app/logs
redis:
image: redis:7-alpine
container_name: redis
restart: unless-stopped
ports:
- '6379:6379'
command: redis-server --requirepass ${REDIS_PASSWORD}
volumes:
- redis-data:/data
networks:
- dashclicks-network
mongo:
image: mongo:6
container_name: mongo
restart: unless-stopped
ports:
- '27017:27017'
environment:
- MONGO_INITDB_ROOT_USERNAME=${MONGO_USERNAME}
- MONGO_INITDB_ROOT_PASSWORD=${MONGO_PASSWORD}
volumes:
- mongo-data:/data/db
networks:
- dashclicks-network
volumes:
redis-data:
mongo-data:
networks:
dashclicks-network:
driver: bridge
Multi-Instance Deployment (Horizontal Scaling)
version: '3.8'
services:
general-socket-1:
build: .
environment:
- PORT=4000
- REDIS_HOST=redis
- INSTANCE_ID=1
ports:
- '4001:4000'
networks:
- dashclicks-network
general-socket-2:
build: .
environment:
- PORT=4000
- REDIS_HOST=redis
- INSTANCE_ID=2
ports:
- '4002:4000'
networks:
- dashclicks-network
general-socket-3:
build: .
environment:
- PORT=4000
- REDIS_HOST=redis
- INSTANCE_ID=3
ports:
- '4003:4000'
networks:
- dashclicks-network
nginx:
image: nginx:alpine
ports:
- '4000:80'
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
depends_on:
- general-socket-1
- general-socket-2
- general-socket-3
networks:
- dashclicks-network
networks:
dashclicks-network:
driver: bridge
Nginx Load Balancer Configuration
http {
upstream general_socket {
ip_hash; # Sticky sessions for Socket.IO
server general-socket-1:4000;
server general-socket-2:4000;
server general-socket-3:4000;
}
server {
listen 80;
server_name socket.dashclicks.com;
location / {
proxy_pass http://general_socket;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
            # Timeouts: long read/send keep idle WebSocket connections open,
            # but connect should stay short so a dead upstream fails fast
            proxy_connect_timeout 60s;
proxy_send_timeout 7d;
proxy_read_timeout 7d;
}
location /health {
proxy_pass http://general_socket/health;
access_log off;
}
}
}
Kubernetes Deployment
Deployment Manifest
apiVersion: apps/v1
kind: Deployment
metadata:
name: general-socket
namespace: dashclicks
spec:
replicas: 3
selector:
matchLabels:
app: general-socket
template:
metadata:
labels:
app: general-socket
spec:
containers:
- name: general-socket
image: dashclicks/general-socket:latest
ports:
- containerPort: 4000
name: http
env:
- name: PORT
value: '4000'
- name: REDIS_HOST
value: redis-service
- name: REDIS_PORT
value: '6379'
- name: MONGO_HOST
valueFrom:
secretKeyRef:
name: mongo-secret
key: connection-string
- name: JWT_SECRET
valueFrom:
secretKeyRef:
name: jwt-secret
key: secret
resources:
requests:
memory: '256Mi'
cpu: '250m'
limits:
memory: '512Mi'
cpu: '500m'
livenessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 4000
initialDelaySeconds: 10
periodSeconds: 5
Service Manifest
apiVersion: v1
kind: Service
metadata:
name: general-socket-service
namespace: dashclicks
spec:
type: LoadBalancer
sessionAffinity: ClientIP # Sticky sessions for Socket.IO
selector:
app: general-socket
ports:
- port: 4000
targetPort: 4000
protocol: TCP
Ingress Manifest
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: general-socket-ingress
namespace: dashclicks
annotations:
nginx.ingress.kubernetes.io/websocket-services: general-socket-service
nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
nginx.ingress.kubernetes.io/proxy-send-timeout: '3600'
spec:
rules:
- host: socket.dashclicks.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: general-socket-service
port:
number: 4000
tls:
- hosts:
- socket.dashclicks.com
secretName: socket-tls-secret
Monitoring & Observability
Health Check Endpoint
// Add to index.js (assumes `mongoose`, a node-redis v4 client `redisClient`,
// and the Socket.IO server `io` are already in scope)
app.get('/health', (req, res) => {
  const health = {
    status: 'UP',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {
      mongodb: mongoose.connection.readyState === 1 ? 'UP' : 'DOWN',
      redis: redisClient.isReady ? 'UP' : 'DOWN',
      socketConnections: io.engine.clientsCount,
    },
  };
  res.status(200).json(health);
});
Prometheus Metrics
// Install: npm install prom-client
const promClient = require('prom-client');
// Metrics
const socketConnectionsGauge = new promClient.Gauge({
name: 'socket_connections_total',
help: 'Total number of active socket connections',
});
const socketEventsCounter = new promClient.Counter({
name: 'socket_events_total',
help: 'Total number of socket events',
labelNames: ['event', 'namespace'],
});
const socketEmissionLatency = new promClient.Histogram({
name: 'socket_emission_duration_seconds',
help: 'Socket emission latency',
labelNames: ['event'],
});
// Update metrics
io.on('connection', socket => {
socketConnectionsGauge.inc();
socket.on('disconnect', () => {
socketConnectionsGauge.dec();
});
socket.onAny((event, ...args) => {
socketEventsCounter.inc({ event, namespace: socket.nsp.name });
});
});
// Expose metrics endpoint
app.get('/metrics', (req, res) => {
res.set('Content-Type', promClient.register.contentType);
res.end(promClient.register.metrics());
});
Grafana Dashboard
Key Metrics to Monitor:
- Active socket connections
- Events per second (by namespace)
- Message emission latency
- Redis pub/sub lag
- Memory usage
- CPU usage
- Error rate
Example Queries:
# Active connections
socket_connections_total
# Event rate
rate(socket_events_total[5m])
# 95th percentile latency
histogram_quantile(0.95, rate(socket_emission_duration_seconds_bucket[5m]))
# Error rate (assumes a socket_errors_total counter is registered)
rate(socket_errors_total[5m])
Logging Best Practices
Structured Logging with Winston
const winston = require('winston');
const logger = winston.createLogger({
level: process.env.LOG_LEVEL || 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json(),
),
defaultMeta: {
service: 'general-socket',
instance: process.env.INSTANCE_ID,
},
transports: [
new winston.transports.File({
filename: 'logs/error.log',
level: 'error',
}),
new winston.transports.File({
filename: 'logs/combined.log',
}),
],
});
if (process.env.NODE_ENV !== 'production') {
logger.add(
new winston.transports.Console({
format: winston.format.simple(),
}),
);
}
// Usage
io.on('connection', socket => {
logger.info('Socket connected', {
socketId: socket.id,
userId: socket.userId,
namespace: socket.nsp.name,
});
});
Log Aggregation with ELK Stack
Filebeat Configuration:
filebeat.inputs:
- type: log
enabled: true
paths:
- /app/logs/*.log
json.keys_under_root: true
json.add_error_key: true
output.elasticsearch:
hosts: ['elasticsearch:9200']
index: 'general-socket-%{+yyyy.MM.dd}'
setup.kibana:
host: 'kibana:5601'
Testing
Unit Tests
# Run all tests
npm test
# Run with coverage
npm run coverage
# Run specific test suite
npm test -- --testPathPattern=conversation
Integration Tests
// tests/integration/socket.test.js
const io = require('socket.io-client');
describe('Socket Connection', () => {
let socket;
beforeAll(() => {
  // validJWT must be generated or loaded in the test setup
  socket = io('http://localhost:4000/v1/conversation', {
    auth: { token: validJWT },
  });
});
afterAll(() => {
socket.close();
});
test('should connect with valid JWT', done => {
  if (socket.connected) return done(); // may already be connected from beforeAll
  socket.on('connect', () => {
    expect(socket.connected).toBe(true);
    done();
  });
});
test('should emit and receive message', done => {
socket.emit('convo_send_message', {
conversationId: 'test-convo',
message: 'Hello',
});
socket.on('convo_new_message', data => {
expect(data.message).toBe('Hello');
done();
});
});
});
Load Testing with Artillery
# artillery-load-test.yml
config:
target: 'http://localhost:4000'
socketio:
transports: ['websocket']
phases:
- duration: 60
arrivalRate: 10
name: 'Warm up'
- duration: 300
arrivalRate: 50
name: 'Sustained load'
- duration: 60
arrivalRate: 100
name: 'Spike test'
scenarios:
- name: 'Socket Connection and Message'
engine: socketio
flow:
- emit:
channel: 'convo_join'
data:
            conversationId: 'test-convo-{{ $randomNumber(1, 100000) }}'
- think: 3
- emit:
channel: 'convo_send_message'
data:
            conversationId: 'test-convo-{{ $randomNumber(1, 100000) }}'
message: 'Load test message'
- think: 5
Run Load Test:
artillery run artillery-load-test.yml
Production Checklist
Pre-Deployment
- Environment variables configured
- JWT secret set (strong, random)
- Redis connection tested
- MongoDB connection tested
- CORS origins whitelisted
- SSL/TLS certificates installed
- Load balancer configured with sticky sessions
- Health checks enabled
- Monitoring dashboards created
Security
- JWT authentication enforced (except LeadFinder)
- Rate limiting enabled
- Input validation on all socket events
- NoSQL injection prevention (sanitize query operators; Mongoose schema casting helps)
- XSS prevention (sanitize user input)
- API key authentication for REST endpoints
- Network isolation (internal services only)
Performance
- Redis adapter enabled for multi-instance
- Connection pooling optimized
- Socket timeout configured (60s)
- Message size limits enforced (1MB)
- Database indexes created
- Query optimization completed
- Caching strategy implemented
Monitoring
- Prometheus metrics exposed
- Grafana dashboards configured
- Error tracking (Sentry) enabled
- Log aggregation (ELK) configured
- Alerts configured (high error rate, low connections)
- Uptime monitoring (Pingdom/UptimeRobot)
Disaster Recovery
- Database backups automated (daily)
- Redis persistence enabled (AOF)
- Graceful shutdown implemented
- Auto-restart on crash (PM2/systemd)
- Rollback plan documented
- Incident response plan created
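The "graceful shutdown" item above can be sketched as a signal handler that stops accepting new connections, disconnects remaining sockets, and exits after a bounded grace period. A minimal sketch, assuming `io` and `server` are the Socket.IO and HTTP servers from index.js; `createShutdownHandler` and the timeout value are illustrative:

```javascript
// Sketch: graceful shutdown with dependencies injected so it stays testable.
function createShutdownHandler({ io, server, logger = console, timeoutMs = 10000 }) {
  let shuttingDown = false;
  return async function shutdown(signal) {
    if (shuttingDown) return false; // ignore repeated signals
    shuttingDown = true;
    logger.log(`${signal} received, closing server`);
    // Stop accepting new connections, then drop the remaining clients.
    await new Promise(resolve => server.close(resolve));
    io.disconnectSockets(true); // Socket.IO v4 API
    // Give in-flight work a bounded grace period, then exit.
    setTimeout(() => process.exit(0), timeoutMs).unref();
    return true;
  };
}
```

Wired up in index.js as, e.g., `process.on('SIGTERM', s => handler(s))` (and likewise for SIGINT).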
Troubleshooting
Common Issues
Socket Connections Not Working
Symptoms: Clients can't connect, timeout errors
Solutions:
- Check CORS configuration matches frontend origin
- Verify JWT token is valid and not expired
- Check firewall allows WebSocket connections (port 4000)
- Ensure Nginx/Load balancer has WebSocket upgrade headers
- Check Redis connection (Redis adapter requirement)
Redis Connection Errors
Symptoms: Error: Redis connection refused
Solutions:
- Verify Redis is running: redis-cli ping
- Check Redis host/port in environment variables
- Verify Redis password if authentication enabled
- Check network connectivity to Redis server
- Review Redis logs for errors
High Memory Usage
Symptoms: Memory usage increasing over time
Solutions:
- Check for socket memory leaks (disconnect listeners)
- Monitor active socket connections (may have stale sockets)
- Implement socket connection limits per user
- Review message buffering strategy
- Move shared state into Redis rather than holding it in process memory
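The per-user connection limit suggested above can be sketched as a small counter keyed by userId. The helper and the limit of 5 are illustrative, not part of the service:

```javascript
// Sketch: cap concurrent socket connections per user.
function createConnectionGuard(maxPerUser = 5) {
  const counts = new Map(); // userId -> live connection count
  return {
    // Returns true if the user is under the cap and records the connection.
    tryAcquire(userId) {
      const n = counts.get(userId) || 0;
      if (n >= maxPerUser) return false;
      counts.set(userId, n + 1);
      return true;
    },
    // Call from the socket's disconnect handler.
    release(userId) {
      const n = counts.get(userId) || 0;
      if (n <= 1) counts.delete(userId);
      else counts.set(userId, n - 1);
    },
    count(userId) {
      return counts.get(userId) || 0;
    },
  };
}
```

In the connection handler: reject with `socket.disconnect(true)` when `tryAcquire(socket.userId)` returns false, and call `release(socket.userId)` on disconnect.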
Message Delivery Failures
Symptoms: Messages not reaching clients
Solutions:
- Verify user is in correct Socket.IO room
- Check socket ID exists in userSocketObj
- Review emit logic (ensure correct namespace)
- Check client-side event listeners are registered
- Verify MongoDB Communication record created
Scaling Strategies
Vertical Scaling
- Increase CPU/memory allocation
- Optimize Node.js event loop
- Use clustering (PM2)
Horizontal Scaling
- Deploy multiple instances behind load balancer
- Use Redis adapter for pub/sub
- Implement sticky sessions (IP hash)
- Database read replicas for queries
Database Optimization
- Index frequently queried fields
- Use aggregation pipelines for complex queries
- Implement caching layer (Redis)
- Archive old messages/conversations
Redis Optimization
- Use Redis Cluster for high availability
- Enable Redis persistence (AOF + RDB)
- Monitor Redis memory usage
- Set eviction policy (allkeys-lru)
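The persistence and eviction items above map to redis.conf directives. A sketch with illustrative values (size `maxmemory` to the host; note that eviction applies to stored keys, not to the Socket.IO adapter's pub/sub traffic, and `allkeys-lru` should not be used if Redis holds state that must never be evicted):

```
# redis.conf — illustrative values for the checklist above
maxmemory 2gb
maxmemory-policy allkeys-lru   # eviction policy from the checklist
appendonly yes                 # AOF persistence
appendfsync everysec
save 900 1                     # RDB snapshot: every 15 min if >= 1 change
```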
Maintenance
Regular Tasks
- Daily: Monitor error logs and metrics
- Weekly: Review socket connection patterns
- Monthly: Database cleanup (old messages)
- Quarterly: Security audit and dependency updates
Upgrade Process
- Test new version in staging environment
- Deploy to single instance (canary deployment)
- Monitor metrics for 1 hour
- Gradually roll out to remaining instances
- Keep previous version ready for rollback