Yext Scan Service
Overview
The Yext Scan service detects duplicate business listings across the web that may compete with or harm a business's online presence. It scans publisher directories to find existing listings for a business, identifies duplicates, and provides tools to monitor and manage them. Scan results are cached in MongoDB with TTL expiration for efficient retrieval.
Source File: external/Integrations/Yext/services/scan.service.js
External API: Yext Scan API
Primary Use: Detect duplicate listings and monitor online presence
Collections Used
yext-scans
- Operations: Create, Read, Update, Delete
- Model: external/models/yext-scans.js
- Usage Context: Cache scan results with automatic expiration
- Key Fields:
  - account_id - DashClicks account ID
  - scan_data.jobId - Yext scan job ID
  - scan_data.sites - Array of detected listings with logos
  - business_info - Business data used for the scan
  - expiresAt - TTL expiration timestamp
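A minimal sketch of what the yext-scans model might look like, assuming Mongoose; the field names follow the key fields above, and the seven-day retention window is illustrative (the actual value lives in external/models/yext-scans.js):

// Hypothetical sketch of external/models/yext-scans.js (assumes Mongoose)
const mongoose = require('mongoose');

const yextScanSchema = new mongoose.Schema({
  account_id: { type: mongoose.Schema.Types.ObjectId, required: true, index: true },
  scan_data: {
    jobId: String,                              // Yext scan job ID
    sites: [mongoose.Schema.Types.Mixed],       // Detected listings with logos
  },
  business_info: mongoose.Schema.Types.Mixed,   // Business data used for the scan
  expiresAt: {
    type: Date,
    default: () => new Date(Date.now() + 7 * 24 * 60 * 60 * 1000), // illustrative retention
  },
});

// TTL index: MongoDB deletes the document once expiresAt has passed
yextScanSchema.index({ expiresAt: 1 }, { expireAfterSeconds: 0 });

module.exports = mongoose.model('YextScan', yextScanSchema, 'yext-scans');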
Data Flow
Initial Scan Creation
sequenceDiagram
participant Client
participant ScanService
participant ScanDB as YextScan Collection
participant YextScanAPI as Yext Scan API
participant YextListingsAPI as Yext Listings API
participant Logger
Client->>ScanService: getScan(account_id, business_info)
ScanService->>ScanDB: Find existing scan
alt No scan exists
ScanService->>YextScanAPI: POST /scan (create)
Note over ScanService,YextScanAPI: Body: name, address, phone
YextScanAPI-->>ScanService: jobId + sites array
ScanService->>YextScanAPI: GET /scan/:jobId/:siteIds
YextScanAPI-->>ScanService: Scan results
ScanService->>ScanService: Add logos to results
ScanService->>ScanDB: Save scan with TTL
ScanService-->>Client: Return scan with logos
end
style ScanService fill:#e3f2fd
style YextScanAPI fill:#fff4e6
Cached Scan Retrieval
sequenceDiagram
participant Client
participant ScanService
participant ScanDB as YextScan Collection
participant YextScanAPI as Yext Scan API
participant YextListingsAPI as Yext Listings API
participant Logger
Client->>ScanService: getScan(account_id, business_info)
ScanService->>ScanDB: Find existing scan
alt Scan exists and valid
ScanDB-->>ScanService: Return cached jobId
ScanService->>YextScanAPI: GET /scan/:jobId/:siteIds
YextScanAPI-->>ScanService: Updated results
ScanService->>YextListingsAPI: GET /listings/publishers
YextListingsAPI-->>ScanService: Publisher details
ScanService->>ScanService: Merge publisher data
Note over ScanService: Add publisher names, URLs
ScanService->>ScanDB: Update cached results
ScanService-->>Client: Return updated scan
else Scan expired/invalid
ScanService->>YextScanAPI: POST /scan (recreate)
Note over ScanService: Follow creation flow
end
alt Error during merge
ScanService->>Logger: Log merge error
Note over Logger: Error logged but doesn't fail
end
style ScanService fill:#e3f2fd
style YextScanAPI fill:#fff4e6
Business Logic & Functions
create(requestBody)
Purpose: Initiate a new duplicate listing scan with Yext
Source: services/scan.service.js
External API Endpoint: POST https://api.yext.com/v2/accounts/me/scan
Parameters:
- requestBody (Object) - Business information for scan
  - name (String) - Business name
  - address (String) - Business address
  - phone (String) - Business phone number
  - Additional business fields
Returns: Promise<Object> - Scan creation response with jobId and sites
Business Logic Flow:
const v = process.env.YEXT_API_VPARAM || '20200525';
const url = 'https://api.yext.com/v2/accounts/me/scan';
const options = {
method: 'POST',
headers: { 'api-key': process.env.YEXT_API_KEY_SCAN },
params: { v: v },
data: requestBody,
url,
};
const dns = await axios(options);
return dns.data;
API Request Example:
POST https://api.yext.com/v2/accounts/me/scan?v=20200525
Headers:
api-key: {YEXT_API_KEY_SCAN}
Content-Type: application/json
Body:
{
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "+1-212-555-0100"
}
API Response Example:
{
"meta": {
"uuid": "abc-def-123",
"errors": []
},
"response": {
"jobId": "scan_xyz789",
"sites": [
{
"siteId": "123",
"siteName": "Google",
"logo": "https://example.com/google-logo.png"
},
{
"siteId": "456",
"siteName": "Yelp",
"logo": "https://example.com/yelp-logo.png"
}
]
}
}
Key Business Rules:
- Separate API Key: Uses YEXT_API_KEY_SCAN (different from main API key)
- Returns Job ID: Use jobId to fetch detailed results
- Site List: Initial response includes site IDs where duplicates may exist
- Async Process: Scan runs in background, may take time to complete
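A short usage sketch for create(), assuming the service is imported as scanService and called from an async context; it captures the jobId and site IDs needed for the follow-up get() call:

// Usage sketch (import path and calling context are assumptions)
const scanService = require('./services/scan.service');

const scanData = await scanService.create({
  name: 'Coffee Shop NYC',
  address: '123 Broadway, New York, NY 10001',
  phone: '+1-212-555-0100',
});

const { jobId, sites } = scanData.response;
const siteIds = sites.map(site => site.siteId); // pass these to get() next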
get(scanId, siteIds)
Purpose: Retrieve scan results for specific job and sites
Source: services/scan.service.js
External API Endpoint: GET https://api.yext.com/v2/accounts/me/scan/:scanId/:siteIds
Parameters:
- scanId (String) - Scan job ID from create()
- siteIds (String | Array) - Comma-separated site IDs, or an array (joined with commas)
Returns: Promise<Object> - Detailed scan results
Business Logic Flow:
const v = process.env.YEXT_API_VPARAM || '20200525';
const url = 'https://api.yext.com/v2/accounts/me/scan/' + scanId + '/' + siteIds;
const options = {
method: 'GET',
headers: { 'api-key': process.env.YEXT_API_KEY_SCAN },
params: { v: v },
url,
};
const dns = await axios(options);
return dns.data;
API Request Example:
GET https://api.yext.com/v2/accounts/me/scan/scan_xyz789/123,456,789?v=20200525
Headers:
api-key: {YEXT_API_KEY_SCAN}
API Response Example:
{
"meta": {
"uuid": "abc-def-123",
"errors": []
},
"response": [
{
"siteId": "123",
"siteName": "Google",
"status": "FOUND",
"url": "https://www.google.com/maps/place/...",
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY",
"phone": "+1-212-555-0100",
"duplicates": [
{
"url": "https://www.google.com/maps/place/another-listing",
"name": "Coffee Shop",
"address": "123 Broadway"
}
]
},
{
"siteId": "456",
"siteName": "Yelp",
"status": "FOUND",
"url": "https://www.yelp.com/biz/coffee-shop-nyc",
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "(212) 555-0100",
"duplicates": []
},
{
"siteId": "789",
"siteName": "Facebook",
"status": "NOT_FOUND",
"duplicates": []
}
]
}
Scan Result Structure:
- siteId - Publisher site ID
- siteName - Publisher name
- status - FOUND, NOT_FOUND, SCANNING
- url - URL of found listing
- name, address, phone - Listing information
- duplicates - Array of duplicate listings found
Key Business Rules:
- Comma-Separated Sites: Multiple site IDs separated by commas
- Async Results: Status may be SCANNING if not complete
- Duplicate Detection: duplicates array contains potential duplicates
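A hedged usage sketch for get(): because the URL is built with string concatenation, passing an array works as well as a pre-joined string (JavaScript joins array elements with commas when coercing them to a string):

// Usage sketch (jobId and site IDs are illustrative)
const results = await scanService.get('scan_xyz789', ['123', '456', '789']);
// Internally resolves to .../scan/scan_xyz789/123,456,789

// Results are asynchronous; some sites may still be scanning
const pending = results.response.filter(site => site.status === 'SCANNING');
if (pending.length > 0) {
  // Poll again later for these sites
}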
getScan(account_id, body)
Purpose: Get or create scan for account with intelligent caching and error recovery
Source: services/scan.service.js
Parameters:
- account_id (ObjectId) - DashClicks account ID
- body (Object) - Business information for scan
  - name (String) - Business name
  - address (String) - Business address
  - phone (String) - Business phone
  - Additional fields
Returns: Promise<Object | null> - Scan data with sites or null
Business Logic Flow:
1. Check for Existing Scan
let data = await YextScan.findOne({ account_id });
2. If Scan Exists
if (data) {
const resp = {
jobId: data.scan_data.jobId,
};
const sites = data.scan_data.sites.map(site => site.siteId);
let scans;
try {
scans = await exports.get(data.scan_data.jobId, sites);
} catch (err) {
// Handle invalid/expired job ID
const errMsg = err?.response?.data?.meta?.errors?.[0];
if (
errMsg.code === 21 &&
errMsg?.name === 'BAD_PARAMETER' &&
errMsg.type === 'FATAL_ERROR' &&
errMsg.message.startsWith('Invalid parameter Invalid Job ID')
) {
// Recreate scan
const scanData = await exports.create(body);
const response = scanData.response;
// ... fetch new results and update DB
}
}
}
3. Merge Publisher Data
try {
const url = 'https://api.yext.com/v2/accounts/me/listings/publishers';
const options = {
method: 'GET',
headers: { 'api-key': process.env.YEXT_API_KEYS },
data: body,
params: { v },
url,
};
let {
data: {
response: { publishers },
},
} = await axios(options);
// Rename publisher fields
for (const pub of publishers || []) {
pub.pubName = pub.name;
delete pub.name;
}
// Merge publisher data with scan results
scanResp = scanResp.map(sc => {
const publisher = publishers.find(site => site.id === sc.siteId);
if (publisher && !sc.hasOwnProperty('url')) {
return { ...sc, ...publisher };
}
return sc;
});
} catch (err) {
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
}
4. Update Database
resp.sites = scanResp;
data = await YextScan.findOneAndUpdate({ account_id }, { scan_data: resp });
data = data.toObject();
data.id = data._id;
delete data['_id'];
delete data['__v'];
delete data['expiresAt'];
return data;
5. If No Scan Exists
else {
if (body?.name) {
const scanData = await exports.create(body);
const response = scanData.response;
const resp = {
jobId: response.jobId
};
const sites = response.sites.map(site => site.siteId);
const scans = await exports.get(response.jobId, sites);
const scanResp = scans.response.map(sc => {
sc.logo = response.sites.find(site => site.siteId === sc.siteId)?.logo;
return sc;
});
resp.sites = scanResp;
data = await (new YextScan({ account_id, scan_data: resp, business_info: body })).save();
data = data.toObject();
data.id = data._id;
delete data["_id"]; delete data["__v"]; delete data["expiresAt"];
return data;
}
else {
return null;
}
}
Return Structure:
{
"id": ObjectId("507f1f77bcf86cd799439011"),
"account_id": ObjectId("507f191e810c19729de860ea"),
"scan_data": {
"jobId": "scan_xyz789",
"sites": [
{
"siteId": "123",
"siteName": "Google",
"logo": "https://example.com/google-logo.png",
"status": "FOUND",
"url": "https://www.google.com/maps/...",
"name": "Coffee Shop NYC",
"address": "123 Broadway",
"phone": "+1-212-555-0100",
"pubName": "Google", // From publisher merge
"favicon": "https://google.com/favicon.ico", // From publisher merge
"duplicates": [...]
}
]
},
"business_info": {
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "+1-212-555-0100"
}
}
Error Recovery:
Invalid Job ID Error:
// Yext error structure
{
"code": 21,
"name": "BAD_PARAMETER",
"type": "FATAL_ERROR",
"message": "Invalid parameter Invalid Job ID: scan_xyz789"
}
// Recovery: Create new scan
const scanData = await exports.create(body);
// Update database with new jobId
Publisher Merge Error:
// Logged but doesn't fail
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
// Continues with scan data without publisher info
Example Usage:
// Get scan for account (creates if doesn't exist)
const scan = await scanService.getScan(accountId, {
name: 'Coffee Shop NYC',
address: '123 Broadway, New York, NY 10001',
phone: '+1-212-555-0100',
});
if (scan) {
console.log('Scan Job ID:', scan.scan_data.jobId);
console.log('Sites scanned:', scan.scan_data.sites.length);
// Count found listings
const found = scan.scan_data.sites.filter(s => s.status === 'FOUND');
console.log('Listings found:', found.length);
// Count duplicates
const withDuplicates = scan.scan_data.sites.filter(s => s.duplicates?.length > 0);
console.log('Sites with duplicates:', withDuplicates.length);
} else {
console.log('No business info provided');
}
Key Business Rules:
- Caching: Results cached in MongoDB with TTL
- Auto-Recovery: Recreates scan if job ID expired/invalid
- Publisher Enrichment: Merges publisher data for enhanced display
- Logo Preservation: Preserves logos from initial scan creation
- Graceful Failures: Publisher merge errors logged but don't fail request
- Null Return: Returns null if no business info provided
- Field Cleanup: Removes internal MongoDB fields from response
deleteScan(account_id)
Purpose: Delete cached scan data for an account
Source: services/scan.service.js
Parameters:
account_id(ObjectId) - DashClicks account ID
Returns: Promise<Boolean> - Returns true on success
Business Logic Flow:
await YextScan.deleteOne({ account_id });
return true;
Example Usage:
// Delete scan to force fresh scan on next request
await scanService.deleteScan(accountId);
// Next getScan() will create new scan
const newScan = await scanService.getScan(accountId, businessInfo);
Key Business Rules:
- Simple Delete: Removes document from collection
- No Validation: Doesn't check if scan exists
- Force Refresh: Next getScan() will create new scan
- TTL Alternative: Scans auto-expire via TTL, manual delete optional
Integration Points
Complete Scan Workflow
// Complete duplicate detection workflow
async function detectDuplicateListings(accountId, businessInfo) {
// 1. Get or create scan
const scan = await scanService.getScan(accountId, businessInfo);
if (!scan) {
return {
error: 'Business information required',
};
}
// 2. Analyze results
const sites = scan.scan_data.sites;
const found = sites.filter(s => s.status === 'FOUND');
const withDuplicates = sites.filter(s => s.duplicates?.length > 0);
// 3. Count total duplicates
const totalDuplicates = withDuplicates.reduce((sum, s) => sum + (s.duplicates?.length || 0), 0);
// 4. Group by publisher
const byPublisher = found.reduce((acc, site) => {
acc[site.siteName] = {
url: site.url,
name: site.name,
address: site.address,
phone: site.phone,
duplicates: site.duplicates || [],
};
return acc;
}, {});
return {
jobId: scan.scan_data.jobId,
totalSites: sites.length,
foundListings: found.length,
sitesWithDuplicates: withDuplicates.length,
totalDuplicates: totalDuplicates,
byPublisher: byPublisher,
recommendations:
totalDuplicates > 0
? 'Duplicate listings detected. Consider claiming or removing duplicates.'
: 'No duplicate listings detected. Your online presence is clean.',
};
}
Scan Monitoring Dashboard
// Monitor scan results over time
async function monitorScanChanges(accountId, businessInfo) {
// Delete old scan to force refresh
await scanService.deleteScan(accountId);
// Get fresh scan
const scan = await scanService.getScan(accountId, businessInfo);
const sites = scan.scan_data.sites;
return {
timestamp: new Date(),
summary: {
sitesScanned: sites.length,
listingsFound: sites.filter(s => s.status === 'FOUND').length,
duplicatesDetected: sites.reduce((sum, s) => sum + (s.duplicates?.length || 0), 0),
},
details: sites.map(site => ({
publisher: site.siteName,
status: site.status,
url: site.url,
duplicateCount: site.duplicates?.length || 0,
})),
};
}
Duplicate Alert System
// Check for new duplicates and alert
async function checkForNewDuplicates(accountId, businessInfo, previousScan) {
const currentScan = await scanService.getScan(accountId, businessInfo);
if (!previousScan) {
return {
newDuplicates: 0,
message: 'First scan completed',
};
}
// Compare duplicate counts
const prevTotal = previousScan.scan_data.sites.reduce(
(sum, s) => sum + (s.duplicates?.length || 0),
0,
);
const currTotal = currentScan.scan_data.sites.reduce(
(sum, s) => sum + (s.duplicates?.length || 0),
0,
);
if (currTotal > prevTotal) {
return {
newDuplicates: currTotal - prevTotal,
message: `${currTotal - prevTotal} new duplicate(s) detected!`,
alert: true,
};
}
return {
newDuplicates: 0,
message: 'No new duplicates',
alert: false,
};
}
Edge Cases & Special Handling
Expired Job ID
Issue: Cached job ID expired or invalid
Handling: Auto-recovery by creating new scan
catch (err) {
const errMsg = err?.response?.data?.meta?.errors?.[0];
if (errMsg.code === 21 && errMsg.message.startsWith('Invalid parameter Invalid Job ID')) {
// Create new scan
const scanData = await exports.create(body);
// Update database
}
}
Why: Job IDs expire after a period of time; auto-recreation provides a seamless experience
Publisher Merge Failure
Issue: Fetching publisher data fails
Handling: Logged but doesn't fail request
try {
// Fetch and merge publisher data
} catch (err) {
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
// Continue without publisher data
}
Why: Publisher data is enhancement, not critical
Missing Business Info
Issue: No body.name provided
Handling: Returns null
if (body?.name) {
// Create scan
} else {
return null;
}
Why: Business name required to create scan
TTL Expiration
Issue: Cached scan expires via TTL
Handling: Document automatically deleted by MongoDB
Result: Next getScan() creates fresh scan
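If stale results are a concern, a caller can check how close the cached document is to its TTL and force a refresh; a sketch, assuming the YextScan model is in scope and using an illustrative one-day threshold:

// Sketch: proactively refresh a cache that is about to expire (threshold is an assumption)
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const cached = await YextScan.findOne({ account_id });

if (cached && cached.expiresAt - Date.now() < ONE_DAY_MS) {
  await scanService.deleteScan(account_id);             // drop the near-expired cache
  await scanService.getScan(account_id, businessInfo);  // recreate on demand
}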
Site ID Format
Issue: Multiple site IDs need joining
Handling: Maps and joins array
const sites = data.scan_data.sites.map(site => site.siteId);
// ['123', '456', '789']
await exports.get(jobId, sites);
// Internally: GET /scan/:jobId/123,456,789
Important Notes
- Separate API Key: Uses YEXT_API_KEY_SCAN (not YEXT_API_KEYS)
- Caching: Results cached in MongoDB with TTL expiration
- Auto-Recovery: Recreates scan if job ID expired
- Publisher Enrichment: Merges publisher data for display
- Graceful Failures: Publisher merge errors don't fail request
- Logo Preservation: Logos from initial scan preserved
- Field Cleanup: Removes _id, __v, expiresAt from response
- Null Return: Returns null if no business info provided
- Async Scanning: Scan results may show "SCANNING" status
- Duplicate Detection: duplicates array contains found duplicates
- Status Types: FOUND, NOT_FOUND, SCANNING
- Job ID Validation: Auto-detects invalid job IDs via error code 21
Related Documentation
- Yext Integration Overview: index.md
- Yext Scan API: https://developer.yext.com/docs/api-reference/scan/
- YextScan Model: external/models/yext-scans.js
- Logger Utility: external/utilities/logger.js
Scan Result Interpretation
Status Meanings
- FOUND: Listing exists on publisher
- NOT_FOUND: No listing found on publisher
- SCANNING: Scan still in progress (check again later)
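A small hedged helper that turns a site result into a user-facing summary, using the status values documented above (the function name and wording are illustrative):

// Maps a scan site result to a short summary string
function describeScanStatus(site) {
  switch (site.status) {
    case 'FOUND':
      return site.duplicates?.length
        ? `Listed on ${site.siteName} with ${site.duplicates.length} possible duplicate(s)`
        : `Listed on ${site.siteName}`;
    case 'NOT_FOUND':
      return `No listing found on ${site.siteName}`;
    case 'SCANNING':
      return `${site.siteName} scan still in progress, check again later`;
    default:
      return `Unrecognized status for ${site.siteName}`;
  }
}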
Duplicate Scenarios
No Duplicates:
{
"status": "FOUND",
"url": "https://...",
"duplicates": [] // Single correct listing
}
With Duplicates:
{
"status": "FOUND",
"url": "https://...", // Primary listing
"duplicates": [
{
"url": "https://...", // Duplicate 1
"name": "...",
"address": "..."
},
{
"url": "https://...", // Duplicate 2
"name": "...",
"address": "..."
}
]
}
Action Items:
- Claim correct listing
- Remove/merge duplicates
- Update incorrect information
- Monitor for new duplicates
Performance Considerations
Caching Strategy:
- Scans cached with TTL (auto-expiration)
- Reduces API calls to Yext
- Fresh scans on TTL expiry or manual delete
API Limits:
- Scan API has rate limits (check Yext documentation)
- Consider throttling scan requests
- Cache results for reasonable period
Best Practices:
- Don't scan on every request
- Use cached results when available
- Delete and rescan periodically (weekly/monthly)
- Monitor for new duplicates proactively
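A sketch of a periodic refresh along these lines, assuming a hypothetical getActiveListingAccounts() helper that returns accounts with stored business_info; scheduling (cron, queue, worker) is left to the surrounding infrastructure:

// Periodic rescan sketch (getActiveListingAccounts is a hypothetical helper)
async function refreshScans() {
  const accounts = await getActiveListingAccounts();
  for (const { account_id, business_info } of accounts) {
    try {
      await scanService.deleteScan(account_id);              // drop cached results
      await scanService.getScan(account_id, business_info);  // trigger a fresh scan
    } catch (err) {
      logger.error({ initiator: 'external/yext/refresh-scans', error: err });
    }
    // Simple throttle to stay well under Yext rate limits
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}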