Yext Scan Service

📖 Overview

The Yext Scan service detects duplicate business listings across the web that may compete with or harm a business's online presence. It scans publisher directories to find existing listings for a business, identifies duplicates, and provides tools to monitor and manage them. Scan results are cached in MongoDB with TTL expiration for efficient retrieval.

Source File: external/Integrations/Yext/services/scan.service.js
External API: Yext Scan API
Primary Use: Detect duplicate listings and monitor online presence

🗄️ Collections Used

yext-scans

  • Operations: Create, Read, Update, Delete
  • Model: external/models/yext-scans.js
  • Usage Context: Cache scan results with automatic expiration
  • Key Fields:
    • account_id - DashClicks account ID
    • scan_data.jobId - Yext scan job ID
    • scan_data.sites - Array of detected listings with logos
    • business_info - Business data used for scan
    • expiresAt - TTL expiration timestamp
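
As a rough illustration of how these fields fit together, the sketch below builds the plain object that a cached scan document might hold. The helper name `buildScanDocument` and the 30-day lifetime are assumptions made for this example; the real TTL value and schema options live in external/models/yext-scans.js and are not shown in this document.

```javascript
// Assumed lifetime for illustration only; the real value is defined by the model.
const ASSUMED_TTL_MS = 30 * 24 * 60 * 60 * 1000;

// Hypothetical helper: assembles the fields listed above into one plain object.
function buildScanDocument(accountId, scanData, businessInfo, now = new Date(), ttlMs = ASSUMED_TTL_MS) {
  return {
    account_id: accountId,
    scan_data: { jobId: scanData.jobId, sites: scanData.sites },
    business_info: businessInfo,
    // MongoDB TTL indexes delete a document once the indexed date has passed,
    // so expiresAt is simply "now plus the desired lifetime".
    expiresAt: new Date(now.getTime() + ttlMs),
  };
}
```

Passing `now` and `ttlMs` as parameters keeps the helper deterministic, which makes the expiration logic easy to unit test.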

🔄 Data Flow

Initial Scan Creation

sequenceDiagram
participant Client
participant ScanService
participant ScanDB as YextScan Collection
participant YextScanAPI as Yext Scan API
participant YextListingsAPI as Yext Listings API
participant Logger

Client->>ScanService: getScan(account_id, business_info)
ScanService->>ScanDB: Find existing scan

alt No scan exists
ScanService->>YextScanAPI: POST /scan (create)
Note over ScanService,YextScanAPI: Body: name, address, phone
YextScanAPI-->>ScanService: jobId + sites array

ScanService->>YextScanAPI: GET /scan/:jobId/:siteIds
YextScanAPI-->>ScanService: Scan results

ScanService->>ScanService: Add logos to results
ScanService->>ScanDB: Save scan with TTL
ScanService-->>Client: Return scan with logos
end

style ScanService fill:#e3f2fd
style YextScanAPI fill:#fff4e6

Cached Scan Retrieval

sequenceDiagram
participant Client
participant ScanService
participant ScanDB as YextScan Collection
participant YextScanAPI as Yext Scan API
participant YextListingsAPI as Yext Listings API
participant Logger

Client->>ScanService: getScan(account_id, business_info)
ScanService->>ScanDB: Find existing scan

alt Scan exists and valid
ScanDB-->>ScanService: Return cached jobId
ScanService->>YextScanAPI: GET /scan/:jobId/:siteIds
YextScanAPI-->>ScanService: Updated results

ScanService->>YextListingsAPI: GET /listings/publishers
YextListingsAPI-->>ScanService: Publisher details

ScanService->>ScanService: Merge publisher data
Note over ScanService: Add publisher names, URLs

ScanService->>ScanDB: Update cached results
ScanService-->>Client: Return updated scan
else Scan expired/invalid
ScanService->>YextScanAPI: POST /scan (recreate)
Note over ScanService: Follow creation flow
end

alt Error during merge
ScanService->>Logger: Log merge error
Note over Logger: Error logged but doesn't fail
end

style ScanService fill:#e3f2fd
style YextScanAPI fill:#fff4e6

🔧 Business Logic & Functions

create(requestBody)

Purpose: Initiate a new duplicate listing scan with Yext

Source: services/scan.service.js

External API Endpoint: POST https://api.yext.com/v2/accounts/me/scan

Parameters:

  • requestBody (Object) - Business information for scan
    • name (String) - Business name
    • address (String) - Business address
    • phone (String) - Business phone number
    • Additional business fields

Returns: Promise<Object> - Scan creation response with jobId and sites

Business Logic Flow:

const v = process.env.YEXT_API_VPARAM || '20200525';
const url = 'https://api.yext.com/v2/accounts/me/scan';

const options = {
  method: 'POST',
  headers: { 'api-key': process.env.YEXT_API_KEY_SCAN },
  params: { v: v },
  data: requestBody,
  url,
};

const dns = await axios(options);
return dns.data;

API Request Example:

POST https://api.yext.com/v2/accounts/me/scan?v=20200525
Headers:
api-key: {YEXT_API_KEY_SCAN}
Content-Type: application/json
Body:
{
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "+1-212-555-0100"
}

API Response Example:

{
"meta": {
"uuid": "abc-def-123",
"errors": []
},
"response": {
"jobId": "scan_xyz789",
"sites": [
{
"siteId": "123",
"siteName": "Google",
"logo": "https://example.com/google-logo.png"
},
{
"siteId": "456",
"siteName": "Yelp",
"logo": "https://example.com/yelp-logo.png"
}
]
}
}

Key Business Rules:

  • Separate API Key: Uses YEXT_API_KEY_SCAN (different from main API key)
  • Returns Job ID: Use jobId to fetch detailed results
  • Site List: Initial response includes site IDs where duplicates may exist
  • Async Process: Scan runs in background, may take time to complete
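
A minimal sketch of how a caller might unpack the creation response before fetching results. `parseCreateResponse` is a hypothetical helper, not part of the service; it assumes the payload shape shown in the API Response Example above.

```javascript
// Hypothetical helper: extracts what later steps need from a create() payload.
function parseCreateResponse(payload) {
  const errors = payload?.meta?.errors ?? [];
  if (errors.length > 0) {
    throw new Error(`Yext scan creation failed: ${errors[0].message ?? 'unknown error'}`);
  }
  const { jobId, sites = [] } = payload?.response ?? {};
  if (!jobId) {
    throw new Error('Yext scan creation response did not include a jobId');
  }
  return {
    jobId,
    // Site IDs are what get() expects alongside the job ID.
    siteIds: sites.map(site => site.siteId),
    // In the examples here, logos appear in the creation response, so index them by site ID.
    logoBySiteId: Object.fromEntries(sites.map(site => [site.siteId, site.logo])),
  };
}
```

Failing fast on a missing `jobId` avoids issuing a results request that cannot succeed.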

get(scanId, siteIds)

Purpose: Retrieve scan results for specific job and sites

Source: services/scan.service.js

External API Endpoint: GET https://api.yext.com/v2/accounts/me/scan/:scanId/:siteIds

Parameters:

  • scanId (String) - Scan job ID from create()
  • siteIds (String | Array) - Comma-separated site IDs, or an array of IDs (an array is joined with commas when it is concatenated into the URL)

Returns: Promise<Object> - Detailed scan results

Business Logic Flow:

const v = process.env.YEXT_API_VPARAM || '20200525';
const url = 'https://api.yext.com/v2/accounts/me/scan/' + scanId + '/' + siteIds;

const options = {
  method: 'GET',
  headers: { 'api-key': process.env.YEXT_API_KEY_SCAN },
  params: { v: v },
  url,
};

const dns = await axios(options);
return dns.data;
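
The URL concatenation above works with an array only because JavaScript converts arrays to comma-joined strings implicitly. A more explicit variant might look like the sketch below; `buildScanResultsUrl` is a hypothetical helper and is not taken from the service source.

```javascript
const SCAN_BASE_URL = 'https://api.yext.com/v2/accounts/me/scan';

// Hypothetical helper: accepts an array of site IDs or a comma-separated string.
function buildScanResultsUrl(scanId, siteIds) {
  const ids = Array.isArray(siteIds) ? siteIds : String(siteIds).split(',');
  // Encode each ID separately so the separating commas stay literal.
  const path = ids.map(id => encodeURIComponent(String(id).trim())).join(',');
  return `${SCAN_BASE_URL}/${encodeURIComponent(scanId)}/${path}`;
}
```

Both `buildScanResultsUrl('scan_xyz789', ['123', '456'])` and `buildScanResultsUrl('scan_xyz789', '123, 456')` produce the same path segment.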

API Request Example:

GET https://api.yext.com/v2/accounts/me/scan/scan_xyz789/123,456,789?v=20200525
Headers:
api-key: {YEXT_API_KEY_SCAN}

API Response Example:

{
"meta": {
"uuid": "abc-def-123",
"errors": []
},
"response": [
{
"siteId": "123",
"siteName": "Google",
"status": "FOUND",
"url": "https://www.google.com/maps/place/...",
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY",
"phone": "+1-212-555-0100",
"duplicates": [
{
"url": "https://www.google.com/maps/place/another-listing",
"name": "Coffee Shop",
"address": "123 Broadway"
}
]
},
{
"siteId": "456",
"siteName": "Yelp",
"status": "FOUND",
"url": "https://www.yelp.com/biz/coffee-shop-nyc",
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "(212) 555-0100",
"duplicates": []
},
{
"siteId": "789",
"siteName": "Facebook",
"status": "NOT_FOUND",
"duplicates": []
}
]
}

Scan Result Structure:

  • siteId - Publisher site ID
  • siteName - Publisher name
  • status - FOUND, NOT_FOUND, SCANNING
  • url - URL of found listing
  • name, address, phone - Listing information
  • duplicates - Array of duplicate listings found

Key Business Rules:

  • Comma-Separated Sites: Multiple site IDs separated by commas
  • Async Results: Status may be SCANNING if not complete
  • Duplicate Detection: duplicates array contains potential duplicates
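
Because a site can still report SCANNING, a caller may want to know which sites to request again. The sketch below separates finished results from pending ones; `partitionByCompletion` is a hypothetical helper that assumes the response shape shown in the example above.

```javascript
// Hypothetical helper: splits a get() payload into finished and still-scanning sites.
function partitionByCompletion(payload) {
  const results = Array.isArray(payload?.response) ? payload.response : [];
  const pending = results.filter(r => r.status === 'SCANNING').map(r => r.siteId);
  const finished = results.filter(r => r.status !== 'SCANNING');
  return { finished, pending, complete: pending.length === 0 };
}
```

The `pending` array can be passed straight back to `get()` as the `siteIds` argument on a later attempt.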

getScan(account_id, body)

Purpose: Get or create scan for account with intelligent caching and error recovery

Source: services/scan.service.js

Parameters:

  • account_id (ObjectId) - DashClicks account ID
  • body (Object) - Business information for scan
    • name (String) - Business name
    • address (String) - Business address
    • phone (String) - Business phone
    • Additional fields

Returns: Promise<Object | null> - Scan data with sites or null

Business Logic Flow:

  1. Check for Existing Scan

    let data = await YextScan.findOne({ account_id });
  2. If Scan Exists

    if (data) {
      const resp = {
        jobId: data.scan_data.jobId,
      };

      const sites = data.scan_data.sites.map(site => site.siteId);
      let scans;

      try {
        scans = await exports.get(data.scan_data.jobId, sites);
      } catch (err) {
        // Handle invalid/expired job ID
        // Optional chaining keeps this check from throwing when the error
        // carries no Yext payload (for example, a network failure)
        const errMsg = err?.response?.data?.meta?.errors?.[0];
        if (
          errMsg?.code === 21 &&
          errMsg?.name === 'BAD_PARAMETER' &&
          errMsg?.type === 'FATAL_ERROR' &&
          errMsg?.message?.startsWith('Invalid parameter Invalid Job ID')
        ) {
          // Recreate scan
          const scanData = await exports.create(body);
          const response = scanData.response;
          // ... fetch new results and update DB
        }
      }
    }
  3. Merge Publisher Data

    try {
      const url = 'https://api.yext.com/v2/accounts/me/listings/publishers';
      const options = {
        method: 'GET',
        headers: { 'api-key': process.env.YEXT_API_KEYS },
        data: body,
        params: { v },
        url,
      };
      let {
        data: {
          response: { publishers },
        },
      } = await axios(options);

      // Rename publisher fields (declare the loop variable to avoid an implicit global)
      for (const pub of publishers || []) {
        pub.pubName = pub.name;
        delete pub.name;
      }

      // Merge publisher data with scan results that have no listing URL
      scanResp = scanResp.map(sc => {
        const publisher = (publishers || []).find(site => site.id === sc.siteId);
        if (publisher && !sc.hasOwnProperty('url')) {
          return { ...sc, ...publisher };
        }
        return sc;
      });
    } catch (err) {
      logger.error({
        initiator: 'external/yext/get-scan',
        error: err,
        message: 'Error while merging',
      });
    }
  4. Update Database

    resp.sites = scanResp;
    // Note: Mongoose returns the pre-update document here unless { new: true } is passed
    data = await YextScan.findOneAndUpdate({ account_id }, { scan_data: resp });

    data = data.toObject();
    data.id = data._id;
    delete data['_id'];
    delete data['__v'];
    delete data['expiresAt'];

    return data;
  5. If No Scan Exists

    else {
      if (body?.name) {
        const scanData = await exports.create(body);
        const response = scanData.response;
        const resp = {
          jobId: response.jobId
        };

        const sites = response.sites.map(site => site.siteId);
        const scans = await exports.get(response.jobId, sites);
        const scanResp = scans.response.map(sc => {
          sc.logo = response.sites.find(site => site.siteId === sc.siteId)?.logo;
          return sc;
        });

        resp.sites = scanResp;

        data = await (new YextScan({ account_id, scan_data: resp, business_info: body })).save();
        data = data.toObject();
        data.id = data._id;
        delete data["_id"];
        delete data["__v"];
        delete data["expiresAt"];

        return data;
      }
      else {
        return null;
      }
    }
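
The merge in step 3 can also be expressed as a pure function, which is easier to test in isolation. The following is a sketch under the field names used in that step (`id`, `name`, `siteId`, `url`); `mergePublishers` is a hypothetical name, and unlike the snippet above it copies publisher objects instead of mutating them.

```javascript
// Hypothetical pure version of the publisher merge from step 3.
function mergePublishers(scanResults, publishers) {
  // Rename `name` to `pubName` on copies so the listing's own `name` is never overwritten.
  const byId = new Map(
    (publishers ?? []).map(({ name, ...rest }) => [rest.id, { ...rest, pubName: name }]),
  );
  return scanResults.map(result => {
    const publisher = byId.get(result.siteId);
    // Only enrich results without a listing URL, mirroring the hasOwnProperty check.
    if (publisher && !Object.prototype.hasOwnProperty.call(result, 'url')) {
      return { ...result, ...publisher };
    }
    return result;
  });
}
```

Indexing publishers in a `Map` replaces the repeated `find()` scan with a single lookup per result.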

Return Structure:

{
"id": ObjectId("507f1f77bcf86cd799439011"),
"account_id": ObjectId("507f191e810c19729de860ea"),
"scan_data": {
"jobId": "scan_xyz789",
"sites": [
{
"siteId": "123",
"siteName": "Google",
"logo": "https://example.com/google-logo.png",
"status": "FOUND",
"url": "https://www.google.com/maps/...",
"name": "Coffee Shop NYC",
"address": "123 Broadway",
"phone": "+1-212-555-0100",
"pubName": "Google", // From publisher merge
"favicon": "https://google.com/favicon.ico", // From publisher merge
"duplicates": [...]
}
]
},
"business_info": {
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "+1-212-555-0100"
}
}
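
The field cleanup that produces this shape appears twice in getScan(). One way to express it once is the sketch below; `toResponseShape` is a hypothetical helper and assumes its input is the plain object returned by `toObject()`.

```javascript
// Hypothetical helper: drops internal MongoDB fields and exposes _id as id.
function toResponseShape(plainDoc) {
  // Destructuring removes the internal fields without mutating the input.
  const { _id, __v, expiresAt, ...rest } = plainDoc;
  return { ...rest, id: _id };
}
```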

Error Recovery:

Invalid Job ID Error:

// Yext error structure
{
"code": 21,
"name": "BAD_PARAMETER",
"type": "FATAL_ERROR",
"message": "Invalid parameter Invalid Job ID: scan_xyz789"
}

// Recovery: Create new scan
const scanData = await exports.create(body);
// Update database with new jobId
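
The error check can be isolated in a small predicate so that errors without a Yext payload (timeouts, DNS failures) are treated as "not an invalid job ID" rather than causing a second exception. `isInvalidJobIdError` is a hypothetical name; the matched fields come from the error structure shown above.

```javascript
// Hypothetical predicate: true only for the "Invalid Job ID" error shown above.
function isInvalidJobIdError(err) {
  const first = err?.response?.data?.meta?.errors?.[0];
  return (
    first?.code === 21 &&
    first?.name === 'BAD_PARAMETER' &&
    first?.type === 'FATAL_ERROR' &&
    typeof first?.message === 'string' &&
    first.message.startsWith('Invalid parameter Invalid Job ID')
  );
}
```

Any error for which the predicate returns false would still need its own handling, for example rethrowing it to the caller.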

Publisher Merge Error:

// Logged but doesn't fail
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
// Continues with scan data without publisher info

Example Usage:

// Get scan for account (creates if doesn't exist)
const scan = await scanService.getScan(accountId, {
name: 'Coffee Shop NYC',
address: '123 Broadway, New York, NY 10001',
phone: '+1-212-555-0100',
});

if (scan) {
console.log('Scan Job ID:', scan.scan_data.jobId);
console.log('Sites scanned:', scan.scan_data.sites.length);

// Count found listings
const found = scan.scan_data.sites.filter(s => s.status === 'FOUND');
console.log('Listings found:', found.length);

// Count duplicates
const withDuplicates = scan.scan_data.sites.filter(s => s.duplicates?.length > 0);
console.log('Sites with duplicates:', withDuplicates.length);
} else {
console.log('No business info provided');
}

Key Business Rules:

  • Caching: Results cached in MongoDB with TTL
  • Auto-Recovery: Recreates scan if job ID expired/invalid
  • Publisher Enrichment: Merges publisher data for enhanced display
  • Logo Preservation: Preserves logos from initial scan creation
  • Graceful Failures: Publisher merge errors logged but don't fail request
  • Null Return: Returns null if no business info provided
  • Field Cleanup: Removes internal MongoDB fields from response

deleteScan(account_id)

Purpose: Delete cached scan data for an account

Source: services/scan.service.js

Parameters:

  • account_id (ObjectId) - DashClicks account ID

Returns: Promise<Boolean> - Returns true on success

Business Logic Flow:

await YextScan.deleteOne({ account_id });
return true;

Example Usage:

// Delete scan to force fresh scan on next request
await scanService.deleteScan(accountId);

// Next getScan() will create new scan
const newScan = await scanService.getScan(accountId, businessInfo);

Key Business Rules:

  • Simple Delete: Removes document from collection
  • No Validation: Doesn't check if scan exists
  • Force Refresh: Next getScan() will create new scan
  • TTL Alternative: Scans auto-expire via TTL, manual delete optional
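
Since deleteScan() always returns true, callers cannot tell whether anything was removed. If that distinction matters, a variant could read `deletedCount` from the result that Mongoose's `deleteOne()` resolves to. The sketch below is hypothetical: `deleteScanWithResult` is not part of the service, and the model is passed in as a parameter purely for illustration.

```javascript
// Hypothetical variant: reports whether a cached scan was actually removed.
// `model` is any object exposing a Mongoose-style deleteOne() method.
async function deleteScanWithResult(model, account_id) {
  const { deletedCount = 0 } = await model.deleteOne({ account_id });
  return { deleted: deletedCount > 0 };
}
```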

🔀 Integration Points

Complete Scan Workflow

// Complete duplicate detection workflow
async function detectDuplicateListings(accountId, businessInfo) {
  // 1. Get or create scan
  const scan = await scanService.getScan(accountId, businessInfo);

  if (!scan) {
    return {
      error: 'Business information required',
    };
  }

  // 2. Analyze results
  const sites = scan.scan_data.sites;
  const found = sites.filter(s => s.status === 'FOUND');
  const withDuplicates = sites.filter(s => s.duplicates?.length > 0);

  // 3. Count total duplicates
  const totalDuplicates = withDuplicates.reduce((sum, s) => sum + (s.duplicates?.length || 0), 0);

  // 4. Group by publisher
  const byPublisher = found.reduce((acc, site) => {
    acc[site.siteName] = {
      url: site.url,
      name: site.name,
      address: site.address,
      phone: site.phone,
      duplicates: site.duplicates || [],
    };
    return acc;
  }, {});

  return {
    jobId: scan.scan_data.jobId,
    totalSites: sites.length,
    foundListings: found.length,
    sitesWithDuplicates: withDuplicates.length,
    totalDuplicates: totalDuplicates,
    byPublisher: byPublisher,
    recommendations:
      totalDuplicates > 0
        ? 'Duplicate listings detected. Consider claiming or removing duplicates.'
        : 'No duplicate listings detected. Your online presence is clean.',
  };
}

Scan Monitoring Dashboard

// Monitor scan results over time
async function monitorScanChanges(accountId, businessInfo) {
  // Delete old scan to force refresh
  await scanService.deleteScan(accountId);

  // Get fresh scan
  const scan = await scanService.getScan(accountId, businessInfo);

  // getScan() returns null when no business name is provided
  if (!scan) {
    return {
      error: 'Business information required',
    };
  }

  const sites = scan.scan_data.sites;

  return {
    timestamp: new Date(),
    summary: {
      sitesScanned: sites.length,
      listingsFound: sites.filter(s => s.status === 'FOUND').length,
      duplicatesDetected: sites.reduce((sum, s) => sum + (s.duplicates?.length || 0), 0),
    },
    details: sites.map(site => ({
      publisher: site.siteName,
      status: site.status,
      url: site.url,
      duplicateCount: site.duplicates?.length || 0,
    })),
  };
}

Duplicate Alert System

// Check for new duplicates and alert
async function checkForNewDuplicates(accountId, businessInfo, previousScan) {
  const currentScan = await scanService.getScan(accountId, businessInfo);

  // getScan() returns null when no business name is provided
  if (!currentScan) {
    return {
      error: 'Business information required',
    };
  }

  if (!previousScan) {
    return {
      newDuplicates: 0,
      message: 'First scan completed',
    };
  }

  // Compare duplicate counts
  const prevTotal = previousScan.scan_data.sites.reduce(
    (sum, s) => sum + (s.duplicates?.length || 0),
    0,
  );

  const currTotal = currentScan.scan_data.sites.reduce(
    (sum, s) => sum + (s.duplicates?.length || 0),
    0,
  );

  if (currTotal > prevTotal) {
    return {
      newDuplicates: currTotal - prevTotal,
      message: `${currTotal - prevTotal} new duplicate(s) detected!`,
      alert: true,
    };
  }

  return {
    newDuplicates: 0,
    message: 'No new duplicates',
    alert: false,
  };
}
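
Comparing totals, as the function above does, cannot detect the case where one duplicate is resolved while a different one appears, because the counts stay equal. A possible refinement is to compare duplicates by site and URL. The sketch below is hypothetical (`duplicateKeys` and `diffDuplicates` are illustrative names) and assumes each duplicate entry carries a `url`, as in the response examples.

```javascript
// Hypothetical helper: collects a "siteId|url" key for every duplicate in a scan.
function duplicateKeys(scan) {
  const keys = new Set();
  for (const site of scan?.scan_data?.sites ?? []) {
    for (const dup of site.duplicates ?? []) {
      keys.add(`${site.siteId}|${dup.url}`);
    }
  }
  return keys;
}

// Hypothetical helper: reports duplicates that appeared or disappeared between two scans.
function diffDuplicates(previousScan, currentScan) {
  const before = duplicateKeys(previousScan);
  const after = duplicateKeys(currentScan);
  return {
    added: [...after].filter(key => !before.has(key)),
    resolved: [...before].filter(key => !after.has(key)),
  };
}
```

An alert could then fire whenever `added` is non-empty, regardless of how the totals compare.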

🧪 Edge Cases & Special Handling

Expired Job ID

Issue: Cached job ID expired or invalid

Handling: Auto-recovery by creating new scan

catch (err) {
  const errMsg = err?.response?.data?.meta?.errors?.[0];
  // Optional chaining avoids a TypeError when the error has no Yext payload
  if (errMsg?.code === 21 && errMsg?.message?.startsWith('Invalid parameter Invalid Job ID')) {
    // Create new scan
    const scanData = await exports.create(body);
    // Update database
  }
}

Why: Job IDs expire after a period of time; recreating the scan automatically keeps the experience seamless

Publisher Merge Failure

Issue: Fetching publisher data fails

Handling: Logged but doesn't fail request

try {
// Fetch and merge publisher data
} catch (err) {
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
// Continue without publisher data
}

Why: Publisher data is an enhancement, not a critical part of the response

Missing Business Info

Issue: No body.name provided

Handling: Returns null

if (body?.name) {
// Create scan
} else {
return null;
}

Why: A business name is required to create a scan

TTL Expiration

Issue: Cached scan expires via TTL

Handling: Document automatically deleted by MongoDB

Result: Next getScan() creates fresh scan

Site ID Format

Issue: Multiple site IDs need joining

Handling: Maps the sites to an array of IDs; the array is joined with commas when it is concatenated into the request URL

const sites = data.scan_data.sites.map(site => site.siteId);
// ['123', '456', '789']

await exports.get(jobId, sites);
// Internally: GET /scan/:jobId/123,456,789

⚠️ Important Notes

  1. Separate API Key: Uses YEXT_API_KEY_SCAN (not YEXT_API_KEYS)
  2. Caching: Results cached in MongoDB with TTL expiration
  3. Auto-Recovery: Recreates scan if job ID expired
  4. Publisher Enrichment: Merges publisher data for display
  5. Graceful Failures: Publisher merge errors don't fail request
  6. Logo Preservation: Logos from initial scan preserved
  7. Field Cleanup: Removes _id, __v, expiresAt from response
  8. Null Return: Returns null if no business info provided
  9. Async Scanning: Scan results may show "SCANNING" status
  10. Duplicate Detection: duplicates array contains found duplicates
  11. Status Types: FOUND, NOT_FOUND, SCANNING
  12. Job ID Validation: Auto-detects invalid job IDs via error code 21

🎯 Scan Result Interpretation

Status Meanings

  • FOUND: Listing exists on publisher
  • NOT_FOUND: No listing found on publisher
  • SCANNING: Scan still in progress (check again later)

Duplicate Scenarios

No Duplicates:

{
"status": "FOUND",
"url": "https://...",
"duplicates": [] // Single correct listing
}

With Duplicates:

{
"status": "FOUND",
"url": "https://...", // Primary listing
"duplicates": [
{
"url": "https://...", // Duplicate 1
"name": "...",
"address": "..."
},
{
"url": "https://...", // Duplicate 2
"name": "...",
"address": "..."
}
]
}

Action Items:

  • Claim correct listing
  • Remove/merge duplicates
  • Update incorrect information
  • Monitor for new duplicates
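
To support the "update incorrect information" item, a FOUND listing can be compared against the stored business_info. The sketch below is one possible approach, not part of the service: `normalizePhone` and `findMismatchedFields` are hypothetical helpers, and comparing phone numbers by their last 10 digits is an assumption that only suits US-style numbers.

```javascript
// Assumption: phone numbers are compared by their last 10 digits (US-style numbers).
function normalizePhone(phone) {
  return String(phone ?? '').replace(/\D/g, '').slice(-10);
}

// Hypothetical helper: lists which fields of a listing differ from business_info.
function findMismatchedFields(listing, businessInfo) {
  const sameText = (a, b) =>
    String(a ?? '').trim().toLowerCase() === String(b ?? '').trim().toLowerCase();
  const mismatches = [];
  if (!sameText(listing.name, businessInfo.name)) mismatches.push('name');
  if (!sameText(listing.address, businessInfo.address)) mismatches.push('address');
  if (normalizePhone(listing.phone) !== normalizePhone(businessInfo.phone)) mismatches.push('phone');
  return mismatches;
}
```

With the sample data used earlier, the phone formats `+1-212-555-0100` and `(212) 555-0100` compare as equal, while an address missing its ZIP code is reported as a mismatch.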

🚀 Performance Considerations

Caching Strategy:

  • Scans cached with TTL (auto-expiration)
  • Reduces API calls to Yext
  • Fresh scans on TTL expiry or manual delete

API Limits:

  • Scan API has rate limits (check Yext documentation)
  • Consider throttling scan requests
  • Cache results for a reasonable period
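
One simple way to throttle scan requests is a per-account gate that refuses a new scan until a minimum interval has elapsed. The sketch below is a hypothetical in-memory version (`createScanGate` is an illustrative name); it assumes a single process, so a multi-instance deployment would need shared storage instead.

```javascript
// Hypothetical gate: allows at most one scan per account per interval.
// The clock is injectable so the behavior can be tested deterministically.
function createScanGate(minIntervalMs, now = () => Date.now()) {
  const lastScanAt = new Map();
  return function canScan(accountId) {
    const key = String(accountId);
    const last = lastScanAt.get(key);
    const current = now();
    if (last !== undefined && current - last < minIntervalMs) {
      return false; // Too soon since the last allowed scan for this account.
    }
    lastScanAt.set(key, current);
    return true;
  };
}
```

A caller would check `canScan(accountId)` before deleting a cached scan or forcing a fresh one, and fall back to the cached results when it returns false.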

Best Practices:

  1. Don't scan on every request
  2. Use cached results when available
  3. Delete and rescan periodically (weekly/monthly)
  4. Monitor for new duplicates proactively