
LeadFinder Integration

Namespace: /v1/lead-finder

Purpose: Lead scraper connection tracking and task distribution

Authentication: None required


Socket Events

Client → Server Events

| Event | Description | Authentication | Scope Required |
|---|---|---|---|
| scraper_connect | Register scraper connection | None | None |
| disconnect | Scraper disconnected (automatic) | N/A | N/A |

Database Models

  • LeadFinderScraper - Scraper connection tracking

Schema:

{
  type: String, // Scraper type (e.g., 'google', 'yelp', 'bing')
  name: String, // Scraper instance name
  socketId: String, // Current socket connection ID
  createdAt: Date,
  updatedAt: Date
}

Event Details

scraper_connect

Purpose: Register external scraper with socket connection

Request:

socket.emit('scraper_connect', {
  type: 'google', // Scraper type identifier
  name: 'google-scraper-1', // Unique scraper name
});

No Response: Fire-and-forget event

Side Effects:

  • Creates or updates LeadFinderScraper document
  • Sets socketId to current connection ID
  • Uses { type, name } as unique identifier (upsert)
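Because the upsert matches on { type, name }, a scraper that reconnects overwrites its own record instead of creating a second one. The in-memory sketch below mimics that behavior with a Map keyed on the pair; ScraperRegistry and its upsert method are hypothetical illustrations, not part of the service.

```javascript
// Hypothetical in-memory stand-in for the LeadFinderScraper collection.
// Records are keyed on the { type, name } pair, mirroring the upsert filter.
class ScraperRegistry {
  constructor() {
    this.records = new Map();
  }

  static key(type, name) {
    // JSON-encode the pair so names containing separators cannot collide.
    return JSON.stringify([type, name]);
  }

  // Rough equivalent of updateOne({ type, name }, { socketId }, { upsert: true }).
  upsert(type, name, socketId) {
    const key = ScraperRegistry.key(type, name);
    const now = new Date();
    const existing = this.records.get(key);
    if (existing) {
      existing.socketId = socketId; // same scraper, new connection
      existing.updatedAt = now;
      return existing;
    }
    const record = { type, name, socketId, createdAt: now, updatedAt: now };
    this.records.set(key, record);
    return record;
  }
}

const registry = new ScraperRegistry();
registry.upsert('google', 'google-scraper-1', 'xyz123');
registry.upsert('google', 'google-scraper-1', 'abc456'); // reconnect
console.log(registry.records.size); // 1
```

In MongoDB itself, concurrent upserts with the same filter can still insert duplicates unless a unique compound index on { type: 1, name: 1 } exists; whether the real schema declares one is not stated here.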

Example:

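// On scraper_connect, the server upserts the scraper record
// (ScraperModel is the LeadFinderScraper model)
await ScraperModel.updateOne(
  { type: 'google', name: 'google-scraper-1' }, // match on { type, name }
  { socketId: 'xyz123' }, // socket ID of the connecting scraper
  { upsert: true },
);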

disconnect

Purpose: Handle scraper disconnection (automatic)

Trigger: Automatic when socket disconnects

Side Effects:

  • Connection removed from socket pool
  • ScraperModel record remains (socketId becomes stale)
  • Next scraper_connect updates socketId
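Since socketId is left in place after a disconnect, a stored socketId does not prove that the scraper is still online. One possible guard, sketched here as an assumption rather than existing behavior, is to cross-check stored records against the IDs currently connected to the namespace; in Socket.IO v4, io.of('/v1/lead-finder').sockets is a Map keyed by socket ID. filterConnectedScrapers is a hypothetical helper.

```javascript
// Keep only scraper records whose socketId belongs to a live connection.
// connectedIds can be anything with a has() method: a Set of IDs, or the
// Socket.IO v4 namespace sockets Map (keyed by socket ID).
function filterConnectedScrapers(records, connectedIds) {
  return records.filter(
    (record) => Boolean(record.socketId) && connectedIds.has(record.socketId),
  );
}

const records = [
  { type: 'google', name: 'google-scraper-1', socketId: 'xyz123' },
  { type: 'google', name: 'google-scraper-2', socketId: 'old999' }, // stale
  { type: 'yelp', name: 'yelp-scraper-1', socketId: null },
];
const live = filterConnectedScrapers(records, new Set(['xyz123', 'abc456']));
console.log(live.map((r) => r.name)); // [ 'google-scraper-1' ]
```

The live ID set is only available inside the General Socket process, and the sockets Map only covers sockets on the local node; with several Socket.IO nodes, the namespace's fetchSockets() method returns sockets across nodes.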

REST API Integration

The LeadFinder namespace is controlled via REST API endpoints:

Emit Custom Event to Scraper

Endpoint: POST /lead-finder/:event

Purpose: Send task or command to specific scraper

Request:

POST /lead-finder/scrape_task
Content-Type: application/json

{
  "socketId": "xyz123",
  "payload": {
    "task_id": "task_001",
    "keyword": "plumber near me",
    "location": "New York, NY",
    "max_results": 50
  }
}

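Response:

{ "success": true, "message": "SUCCESS" }

The response only confirms that the event was handed to Socket.IO. It does not indicate whether a socket with the given socketId is connected or whether the scraper received the event.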

Controller Logic:

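exports.emit = async (req, res, next) => {
  const io = req.io;
  const event = req.params.event; // e.g. 'scrape_task'
  const socketId = req.body.socketId; // e.g. 'xyz123'
  const payload = req.body.payload;

  // Emit to the specific scraper. If req.io is the root Socket.IO server,
  // io.to() targets the default '/' namespace; sockets connected to
  // /v1/lead-finder are reached through io.of('/v1/lead-finder').to(socketId).
  io.to(socketId).emit(event, payload);

  return res.status(200).json({ success: true, message: 'SUCCESS' });
};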
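Emitting to a socket ID that no longer exists is a silent no-op, so callers cannot tell a delivered task from a dropped one. The sketch below shows one possible hardening, not the current controller: it rejects event names outside an allowlist and checks the namespace's connected sockets before emitting. createEmitHandler, ALLOWED_EVENTS and the 400/404 responses are assumptions; the namespace argument is expected to behave like a Socket.IO v4 Namespace (a sockets Map plus to(room).emit()).

```javascript
// Hypothetical allowlist of events the REST endpoint may forward to scrapers.
const ALLOWED_EVENTS = new Set(['scrape_task', 'health_check']);

// Build an Express-style handler bound to one Socket.IO namespace.
function createEmitHandler(namespace) {
  return (req, res) => {
    const { event } = req.params;
    const { socketId, payload } = req.body || {};

    if (!ALLOWED_EVENTS.has(event)) {
      return res.status(400).json({ success: false, message: `Unknown event: ${event}` });
    }
    // In Socket.IO v4, Namespace#sockets is a Map keyed by socket ID (local node only).
    if (!socketId || !namespace.sockets.has(socketId)) {
      return res.status(404).json({ success: false, message: 'Scraper socket not connected' });
    }

    namespace.to(socketId).emit(event, payload);
    return res.status(200).json({ success: true, message: 'SUCCESS' });
  };
}
```

In an Express app this might be mounted as app.post('/lead-finder/:event', createEmitHandler(io.of('/v1/lead-finder'))). A 404 lets the Internal API retry with another scraper instead of waiting for results that will never arrive.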

Use Cases

1. Lead Scraping Task Distribution

Flow:

1. User initiates lead scraping through the Internal API:
   POST /v1/lead-finder/scrape
   body: { keyword, location, type: 'google' }

2. Internal API finds an available scraper:
   query LeadFinderScraper where type = 'google'

3. Internal API calls the General Socket REST API:
   POST /lead-finder/scrape_task
   body: { socketId, payload: { task_id, keyword, location } }

4. General Socket emits 'scrape_task' to the scraper's socket

5. Scraper receives the task and starts processing

6. Scraper sends results back to the Internal API (via webhook or socket):
   POST /v1/lead-finder/results
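Steps 2 and 3 of this flow can be sketched as a single Internal API function. To keep it runnable on its own, the database query and HTTP call are injected: findScrapers stands in for a LeadFinderScraper query and postJson for an HTTP client such as axios.post. dispatchScrapeTask, both dependency names and the GENERAL_SOCKET_URL constant are hypothetical; the URL matches the one used in the load-balancing example.

```javascript
// Assumed base URL of the General Socket service.
const GENERAL_SOCKET_URL = 'http://general-socket:4000';

// deps.findScrapers(type)  -> Promise<Array<{ name, socketId }>>
// deps.postJson(url, body) -> Promise<unknown>
async function dispatchScrapeTask({ type, keyword, location, maxResults }, taskId, deps) {
  // Step 2: find a scraper of the requested type that has registered a socket.
  const scrapers = await deps.findScrapers(type);
  const scraper = scrapers.find((s) => s.socketId);
  if (!scraper) {
    throw new Error(`No ${type} scraper has registered a socket`);
  }

  // Step 3: ask General Socket to emit scrape_task to that socket.
  const body = {
    socketId: scraper.socketId,
    payload: { task_id: taskId, keyword, location, max_results: maxResults },
  };
  await deps.postJson(`${GENERAL_SOCKET_URL}/lead-finder/scrape_task`, body);
  return { scraper: scraper.name, body };
}
```

Injecting the two dependencies keeps the routing logic testable without MongoDB or a running General Socket instance.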

2. Scraper Health Check

Flow:

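1. Cron job queries all scrapers:
   LeadFinderScraper.find({})

2. For each scraper, emit a 'health_check' event through the REST API:
   POST /lead-finder/health_check
   body: { socketId: scraper.socketId }

3. Scraper responds with its status:
   socket.emit('health_status', { status: 'active', tasks: 3 })

4. Internal API updates scraper availability

Note: health_status is not listed under Client → Server Events above, so how the response reaches the Internal API is not covered by this namespace's documented handlers.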
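Step 4 could look roughly like the sketch below, which marks a scraper active only if its latest health_status reply says 'active' and arrived recently. evaluateAvailability, the latestStatus store (socketId mapped to { status, tasks, receivedAt }) and the 60-second default timeout are all assumptions; the docs do not say how replies are stored.

```javascript
// Derive availability from the most recent health_status reply per socket.
// latestStatus: Map of socketId -> { status, tasks, receivedAt } (ms timestamps).
function evaluateAvailability(scrapers, latestStatus, now, timeoutMs = 60_000) {
  return scrapers.map((scraper) => {
    const reply = scraper.socketId ? latestStatus.get(scraper.socketId) : undefined;
    // A reply older than timeoutMs is treated as if the scraper never answered.
    const fresh = reply !== undefined && now - reply.receivedAt <= timeoutMs;
    return {
      ...scraper,
      active: fresh && reply.status === 'active',
      tasks: fresh ? reply.tasks : null, // unknown load when there is no fresh reply
    };
  });
}
```

Requiring a fresh reply means a scraper whose socket silently died drops out of rotation after one timeout window, even though its socketId is still stored.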

3. Scraper Load Balancing

Internal API Logic:

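import axios from 'axios';

// Find the first google scraper that has a socketId and is marked active.
// Note: `active` is not part of the LeadFinderScraper schema above; it is
// assumed to be maintained from health_status responses.
const scrapers = await LeadFinderScraper.find({ type: 'google' });
const availableScraper = scrapers.find(s => s.socketId && s.active);

if (!availableScraper) {
  throw new Error('No available google scraper');
}

// Emit task to scraper through the General Socket REST API
await axios.post('http://general-socket:4000/lead-finder/scrape_task', {
  socketId: availableScraper.socketId,
  payload: { task_id, keyword, location },
});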
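Picking the first match does not balance load. A least-loaded variant is sketched below; it assumes each record carries active and tasks values (for example from the most recent health_status reply), neither of which is in the documented schema. pickLeastLoaded is a hypothetical helper.

```javascript
// Choose the active scraper of the given type with the fewest in-flight tasks.
function pickLeastLoaded(scrapers, type) {
  let best = null;
  for (const s of scrapers) {
    if (s.type !== type || !s.socketId || !s.active) continue;
    // Scrapers with an unknown task count are only chosen if nothing better exists.
    const load = Number.isFinite(s.tasks) ? s.tasks : Infinity;
    if (best === null || load < best.load) best = { scraper: s, load };
  }
  return best ? best.scraper : null;
}
```

Ties keep the first record returned by the query. Because tasks is only refreshed by health checks, tasks dispatched between checks are not counted, so a burst of requests can still land on the same scraper.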

Client Example (Scraper)

import { io } from 'socket.io-client';
import axios from 'axios';

const RESULTS_URL = 'http://internal-api:5002/v1/lead-finder/results';
let currentTaskCount = 0; // in-flight tasks, reported in health_status

// Placeholder for the actual Google scraping logic (not shown here)
async function scrapeGoogle(keyword, location, maxResults) {
  return [];
}

const socket = io('http://localhost:4000/v1/lead-finder', {
  transports: ['websocket'],
  // No authentication required
});

// Register scraper. 'connect' also fires after every successful
// reconnection, so the new socket ID is re-registered automatically.
socket.on('connect', () => {
  console.log('Connected to LeadFinder namespace');

  socket.emit('scraper_connect', {
    type: 'google',
    name: 'google-scraper-1',
  });
});

// Listen for scraping tasks
socket.on('scrape_task', async payload => {
  console.log('Received scrape task:', payload);

  const { task_id, keyword, location, max_results } = payload;
  currentTaskCount += 1;

  try {
    // Perform scraping
    const results = await scrapeGoogle(keyword, location, max_results);

    // Send results back to Internal API
    await axios.post(RESULTS_URL, { task_id, results, status: 'completed' });

    console.log(`Task ${task_id} completed`);
  } catch (error) {
    console.error(`Task ${task_id} failed:`, error);

    // Report error; catch so a failed report does not become an unhandled rejection
    await axios
      .post(RESULTS_URL, { task_id, status: 'failed', error: error.message })
      .catch(reportError => console.error('Failed to report error:', reportError.message));
  } finally {
    currentTaskCount -= 1;
  }
});

// Handle health checks
socket.on('health_check', () => {
  socket.emit('health_status', {
    status: 'active',
    tasks: currentTaskCount,
    uptime: process.uptime(),
  });
});

// Handle disconnection
socket.on('disconnect', () => {
  console.log('Disconnected from LeadFinder namespace');
});

Scraper Types

Commonly registered scraper types:

  • google - Google Maps/Search lead scraping
  • yelp - Yelp business listings
  • bing - Bing Places scraping
  • yellowpages - Yellow Pages directory
  • facebook - Facebook business pages
  • linkedin - LinkedIn company profiles

Security Considerations

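  1. No Authentication: Anyone who can reach the server can connect to the /v1/lead-finder namespace
  2. IP Whitelisting: Recommended for production (only allow scraper IPs)
  3. Rate Limiting: Implement per-socket rate limits
  4. Task Validation: Validate the event name and task payload before emitting
  5. Scraper Verification: Verify scraper identity through an additional mechanism (for example, a shared token sent with the Socket.IO handshake auth option); without it, any client can send scraper_connect with an existing { type, name } and overwrite that scraper's socketId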
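Items 2 and 4 could be sketched as below. ipAllowlist follows the standard Socket.IO middleware shape (socket, next), so it could be registered with io.of('/v1/lead-finder').use(ipAllowlist([...])); validateScrapeTask checks the fields used in the example scrape_task request. Both helpers, the allowlist contents and the choice of required fields are assumptions, not existing service code.

```javascript
// Socket.IO middleware factory: reject connections from addresses not in the allowlist.
function ipAllowlist(allowedIps) {
  const allowed = new Set(allowedIps);
  return (socket, next) => {
    // IPv4 clients on a dual-stack listener appear as '::ffff:a.b.c.d'.
    const address = String(socket.handshake.address || '').replace(/^::ffff:/, '');
    if (allowed.has(address)) return next();
    return next(new Error('Scraper IP not allowed')); // client receives connect_error
  };
}

// Validate a scrape_task payload before emitting it; returns a list of problems.
function validateScrapeTask(payload) {
  if (!payload || typeof payload !== 'object') return ['payload must be an object'];
  const errors = [];
  const nonEmpty = (v) => typeof v === 'string' && v.trim() !== '';
  if (!nonEmpty(payload.task_id)) errors.push('task_id is required');
  if (!nonEmpty(payload.keyword)) errors.push('keyword is required');
  if (!nonEmpty(payload.location)) errors.push('location is required');
  if (
    payload.max_results !== undefined &&
    !(Number.isInteger(payload.max_results) && payload.max_results > 0)
  ) {
    errors.push('max_results must be a positive integer');
  }
  return errors;
}
```

Behind a reverse proxy, handshake.address is the proxy's address, so the allowlist would need to read the client IP from a trusted forwarding header instead.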

Monitoring

Key Metrics:

  • Active scraper count
  • Tasks per scraper
  • Task completion rate
  • Average task duration
  • Socket connection/disconnection rate

Recommended Monitoring:

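// Count scrapers with a stored socketId. socketId is not cleared on
// disconnect, so this also includes scrapers that have since gone offline.
const registeredScrapers = await LeadFinderScraper.countDocuments({
  socketId: { $exists: true, $ne: null },
});

// List sockets currently connected to the LeadFinder namespace
// (in Socket.IO v4, Namespace#sockets is a Map of socket ID -> Socket)
const nsp = io.of('/v1/lead-finder');
console.log('Connected scraper sockets:', nsp.sockets.size);
nsp.sockets.forEach(socket => {
  console.log('Active scraper socket:', socket.id);
});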
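Task completion rate and average task duration could be computed along these lines. The task record shape ({ status: 'completed' | 'failed', startedAt, finishedAt } in milliseconds) is an assumption based on the status values the scraper posts to /v1/lead-finder/results; the docs do not describe how tasks are stored. summarizeTasks is a hypothetical helper.

```javascript
// Summarize finished tasks: completion rate and mean duration of completed ones.
function summarizeTasks(tasks) {
  const finished = tasks.filter((t) => t.status === 'completed' || t.status === 'failed');
  const completed = finished.filter((t) => t.status === 'completed');
  // Failed tasks are excluded from the duration average.
  const totalDurationMs = completed.reduce((sum, t) => sum + (t.finishedAt - t.startedAt), 0);
  return {
    finished: finished.length,
    completionRate: finished.length ? completed.length / finished.length : null,
    avgDurationMs: completed.length ? totalDurationMs / completed.length : null,
  };
}

const summary = summarizeTasks([
  { status: 'completed', startedAt: 0, finishedAt: 4000 },
  { status: 'completed', startedAt: 1000, finishedAt: 3000 },
  { status: 'failed', startedAt: 2000, finishedAt: 2500 },
]);
// completionRate is 2/3 and avgDurationMs is 3000 for this input
console.log(summary);
```

Returning null instead of 0 when there is no data keeps an idle period from looking like a 0% completion rate on a dashboard.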