Yext Scan Service
Overview
The Yext Scan service detects duplicate business listings across the web that may compete with or harm a business's online presence. It scans publisher directories to find existing listings for a business, identifies duplicates, and provides tools to monitor and manage them. Scan results are cached in MongoDB with TTL expiration for efficient retrieval.
Source File: external/Integrations/Yext/services/scan.service.js
External API: Yext Scan API
Primary Use: Detect duplicate listings and monitor online presence
Collections Used
yext-scans
- Operations: Create, Read, Update, Delete
- Model: external/models/yext-scans.js
- Usage Context: Cache scan results with automatic expiration
- Key Fields:
  - account_id - DashClicks account ID
  - scan_data.jobId - Yext scan job ID
  - scan_data.sites - Array of detected listings with logos
  - business_info - Business data used for the scan
  - expiresAt - TTL expiration timestamp
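A minimal sketch of what the yext-scans model might look like, assuming Mongoose; the field names follow the key fields above, and the seven-day retention window is illustrative (the actual value lives in external/models/yext-scans.js):

// Hypothetical sketch of external/models/yext-scans.js (assumes Mongoose)
const mongoose = require('mongoose');

const yextScanSchema = new mongoose.Schema({
  account_id: { type: mongoose.Schema.Types.ObjectId, required: true, index: true },
  scan_data: {
    jobId: String,                              // Yext scan job ID
    sites: [mongoose.Schema.Types.Mixed],       // Detected listings with logos
  },
  business_info: mongoose.Schema.Types.Mixed,   // Business data used for the scan
  expiresAt: {
    type: Date,
    default: () => new Date(Date.now() + 7 * 24 * 60 * 60 * 1000), // illustrative retention
  },
});

// TTL index: MongoDB deletes the document once expiresAt has passed
yextScanSchema.index({ expiresAt: 1 }, { expireAfterSeconds: 0 });

module.exports = mongoose.model('YextScan', yextScanSchema, 'yext-scans');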
Data Flow
Initial Scan Creation
sequenceDiagram
participant Client
participant ScanService
participant ScanDB as YextScan Collection
participant YextScanAPI as Yext Scan API
participant YextListingsAPI as Yext Listings API
participant Logger
Client->>ScanService: getScan(account_id, business_info)
ScanService->>ScanDB: Find existing scan
alt No scan exists
ScanService->>YextScanAPI: POST /scan (create)
Note over ScanService,YextScanAPI: Body: name, address, phone
YextScanAPI-->>ScanService: jobId + sites array
ScanService->>YextScanAPI: GET /scan/:jobId/:siteIds
YextScanAPI-->>ScanService: Scan results
ScanService->>ScanService: Add logos to results
ScanService->>ScanDB: Save scan with TTL
ScanService-->>Client: Return scan with logos
end
style ScanService fill:#e3f2fd
style YextScanAPI fill:#fff4e6
Cached Scan Retrieval
sequenceDiagram
participant Client
participant ScanService
participant ScanDB as YextScan Collection
participant YextScanAPI as Yext Scan API
participant YextListingsAPI as Yext Listings API
participant Logger
Client->>ScanService: getScan(account_id, business_info)
ScanService->>ScanDB: Find existing scan
alt Scan exists and valid
ScanDB-->>ScanService: Return cached jobId
ScanService->>YextScanAPI: GET /scan/:jobId/:siteIds
YextScanAPI-->>ScanService: Updated results
ScanService->>YextListingsAPI: GET /listings/publishers
YextListingsAPI-->>ScanService: Publisher details
ScanService->>ScanService: Merge publisher data
Note over ScanService: Add publisher names, URLs
ScanService->>ScanDB: Update cached results
ScanService-->>Client: Return updated scan
else Scan expired/invalid
ScanService->>YextScanAPI: POST /scan (recreate)
Note over ScanService: Follow creation flow
end
alt Error during merge
ScanService->>Logger: Log merge error
Note over Logger: Error logged but doesn't fail
end
style ScanService fill:#e3f2fd
style YextScanAPI fill:#fff4e6
Business Logic & Functions
create(requestBody)
Purpose: Initiate a new duplicate listing scan with Yext
Source: services/scan.service.js
External API Endpoint: POST https://api.yext.com/v2/accounts/me/scan
Parameters:
- requestBody (Object) - Business information for scan
  - name (String) - Business name
  - address (String) - Business address
  - phone (String) - Business phone number
  - Additional business fields
Returns: Promise<Object> - Scan creation response with jobId and sites
Business Logic Flow:
const v = process.env.YEXT_API_VPARAM || '20200525';
const url = 'https://api.yext.com/v2/accounts/me/scan';
const options = {
method: 'POST',
headers: { 'api-key': process.env.YEXT_API_KEY_SCAN },
params: { v: v },
data: requestBody,
url,
};
const dns = await axios(options);
return dns.data;
API Request Example:
POST https://api.yext.com/v2/accounts/me/scan?v=20200525
Headers:
api-key: {YEXT_API_KEY_SCAN}
Content-Type: application/json
Body:
{
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "+1-212-555-0100"
}
API Response Example:
{
"meta": {
"uuid": "abc-def-123",
"errors": []
},
"response": {
"jobId": "scan_xyz789",
"sites": [
{
"siteId": "123",
"siteName": "Google",
"logo": "https://example.com/google-logo.png"
},
{
"siteId": "456",
"siteName": "Yelp",
"logo": "https://example.com/yelp-logo.png"
}
]
}
}
Key Business Rules:
- Separate API Key: Uses YEXT_API_KEY_SCAN (different from main API key)
- Returns Job ID: Use jobId to fetch detailed results
- Site List: Initial response includes site IDs where duplicates may exist
- Async Process: Scan runs in background, may take time to complete
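A short usage sketch for create(), assuming the service is imported as scanService and called from an async context; it captures the jobId and site IDs needed for the follow-up get() call:

// Usage sketch (import path and calling context are assumptions)
const scanService = require('./services/scan.service');

const scanData = await scanService.create({
  name: 'Coffee Shop NYC',
  address: '123 Broadway, New York, NY 10001',
  phone: '+1-212-555-0100',
});

const { jobId, sites } = scanData.response;
const siteIds = sites.map(site => site.siteId); // pass these to get() next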
get(scanId, siteIds)
Purpose: Retrieve scan results for specific job and sites
Source: services/scan.service.js
External API Endpoint: GET https://api.yext.com/v2/accounts/me/scan/:scanId/:siteIds
Parameters:
- scanId (String) - Scan job ID from create()
- siteIds (String | Array) - Comma-separated site IDs, or an array (joined with commas)
Returns: Promise<Object> - Detailed scan results
Business Logic Flow:
const v = process.env.YEXT_API_VPARAM || '20200525';
const url = 'https://api.yext.com/v2/accounts/me/scan/' + scanId + '/' + siteIds;
const options = {
method: 'GET',
headers: { 'api-key': process.env.YEXT_API_KEY_SCAN },
params: { v: v },
url,
};
const dns = await axios(options);
return dns.data;
API Request Example:
GET https://api.yext.com/v2/accounts/me/scan/scan_xyz789/123,456,789?v=20200525
Headers:
api-key: {YEXT_API_KEY_SCAN}
API Response Example:
{
"meta": {
"uuid": "abc-def-123",
"errors": []
},
"response": [
{
"siteId": "123",
"siteName": "Google",
"status": "FOUND",
"url": "https://www.google.com/maps/place/...",
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY",
"phone": "+1-212-555-0100",
"duplicates": [
{
"url": "https://www.google.com/maps/place/another-listing",
"name": "Coffee Shop",
"address": "123 Broadway"
}
]
},
{
"siteId": "456",
"siteName": "Yelp",
"status": "FOUND",
"url": "https://www.yelp.com/biz/coffee-shop-nyc",
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "(212) 555-0100",
"duplicates": []
},
{
"siteId": "789",
"siteName": "Facebook",
"status": "NOT_FOUND",
"duplicates": []
}
]
}
Scan Result Structure:
- siteId - Publisher site ID
- siteName - Publisher name
- status - FOUND, NOT_FOUND, SCANNING
- url - URL of found listing
- name, address, phone - Listing information
- duplicates - Array of duplicate listings found
Key Business Rules:
- Comma-Separated Sites: Multiple site IDs separated by commas
- Async Results: Status may be SCANNING if not complete
- Duplicate Detection: duplicates array contains potential duplicates
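A hedged usage sketch for get(): because the URL is built with string concatenation, passing an array works as well as a pre-joined string (JavaScript joins array elements with commas when coercing them to a string):

// Usage sketch (jobId and site IDs are illustrative)
const results = await scanService.get('scan_xyz789', ['123', '456', '789']);
// Internally resolves to .../scan/scan_xyz789/123,456,789

// Results are asynchronous; some sites may still be scanning
const pending = results.response.filter(site => site.status === 'SCANNING');
if (pending.length > 0) {
  // Poll again later for these sites
}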
getScan(account_id, body)
Purpose: Get or create scan for account with intelligent caching and error recovery
Source: services/scan.service.js
Parameters:
- account_id (ObjectId) - DashClicks account ID
- body (Object) - Business information for scan
  - name (String) - Business name
  - address (String) - Business address
  - phone (String) - Business phone
  - Additional fields
Returns: Promise<Object | null> - Scan data with sites or null
Business Logic Flow:
1. Check for Existing Scan
let data = await YextScan.findOne({ account_id });
2. If Scan Exists
if (data) {
const resp = {
jobId: data.scan_data.jobId,
};
const sites = data.scan_data.sites.map(site => site.siteId);
let scans;
try {
scans = await exports.get(data.scan_data.jobId, sites);
} catch (err) {
// Handle invalid/expired job ID
const errMsg = err?.response?.data?.meta?.errors?.[0];
if (
errMsg.code === 21 &&
errMsg?.name === 'BAD_PARAMETER' &&
errMsg.type === 'FATAL_ERROR' &&
errMsg.message.startsWith('Invalid parameter Invalid Job ID')
) {
// Recreate scan
const scanData = await exports.create(body);
const response = scanData.response;
// ... fetch new results and update DB
}
}
}
3. Merge Publisher Data
try {
const url = 'https://api.yext.com/v2/accounts/me/listings/publishers';
const options = {
method: 'GET',
headers: { 'api-key': process.env.YEXT_API_KEYS },
data: body,
params: { v },
url,
};
let {
data: {
response: { publishers },
},
} = await axios(options);
// Rename publisher fields
for (const pub of publishers || []) {
pub.pubName = pub.name;
delete pub.name;
}
// Merge publisher data with scan results
scanResp = scanResp.map(sc => {
const publisher = publishers.find(site => site.id === sc.siteId);
if (publisher && !sc.hasOwnProperty('url')) {
return { ...sc, ...publisher };
}
return sc;
});
} catch (err) {
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
}
4. Update Database
resp.sites = scanResp;
data = await YextScan.findOneAndUpdate({ account_id }, { scan_data: resp });
data = data.toObject();
data.id = data._id;
delete data['_id'];
delete data['__v'];
delete data['expiresAt'];
return data;
5. If No Scan Exists
else {
if (body?.name) {
const scanData = await exports.create(body);
const response = scanData.response;
const resp = {
jobId: response.jobId
};
const sites = response.sites.map(site => site.siteId);
const scans = await exports.get(response.jobId, sites);
const scanResp = scans.response.map(sc => {
sc.logo = response.sites.find(site => site.siteId === sc.siteId)?.logo;
return sc;
});
resp.sites = scanResp;
data = await (new YextScan({ account_id, scan_data: resp, business_info: body })).save();
data = data.toObject();
data.id = data._id;
delete data["_id"]; delete data["__v"]; delete data["expiresAt"];
return data;
}
else {
return null;
}
}
Return Structure:
{
"id": ObjectId("507f1f77bcf86cd799439011"),
"account_id": ObjectId("507f191e810c19729de860ea"),
"scan_data": {
"jobId": "scan_xyz789",
"sites": [
{
"siteId": "123",
"siteName": "Google",
"logo": "https://example.com/google-logo.png",
"status": "FOUND",
"url": "https://www.google.com/maps/...",
"name": "Coffee Shop NYC",
"address": "123 Broadway",
"phone": "+1-212-555-0100",
"pubName": "Google", // From publisher merge
"favicon": "https://google.com/favicon.ico", // From publisher merge
"duplicates": [...]
}
]
},
"business_info": {
"name": "Coffee Shop NYC",
"address": "123 Broadway, New York, NY 10001",
"phone": "+1-212-555-0100"
}
}
Error Recovery:
Invalid Job ID Error:
// Yext error structure
{
"code": 21,
"name": "BAD_PARAMETER",
"type": "FATAL_ERROR",
"message": "Invalid parameter Invalid Job ID: scan_xyz789"
}
// Recovery: Create new scan
const scanData = await exports.create(body);
// Update database with new jobId
Publisher Merge Error:
// Logged but doesn't fail
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
// Continues with scan data without publisher info
Example Usage:
// Get scan for account (creates if doesn't exist)
const scan = await scanService.getScan(accountId, {
name: 'Coffee Shop NYC',
address: '123 Broadway, New York, NY 10001',
phone: '+1-212-555-0100',
});
if (scan) {
console.log('Scan Job ID:', scan.scan_data.jobId);
console.log('Sites scanned:', scan.scan_data.sites.length);
// Count found listings
const found = scan.scan_data.sites.filter(s => s.status === 'FOUND');
console.log('Listings found:', found.length);
// Count duplicates
const withDuplicates = scan.scan_data.sites.filter(s => s.duplicates?.length > 0);
console.log('Sites with duplicates:', withDuplicates.length);
} else {
console.log('No business info provided');
}
Key Business Rules:
- Caching: Results cached in MongoDB with TTL
- Auto-Recovery: Recreates scan if job ID expired/invalid
- Publisher Enrichment: Merges publisher data for enhanced display
- Logo Preservation: Preserves logos from initial scan creation
- Graceful Failures: Publisher merge errors logged but don't fail request
- Null Return: Returns null if no business info provided
- Field Cleanup: Removes internal MongoDB fields from response
deleteScan(account_id)
Purpose: Delete cached scan data for an account
Source: services/scan.service.js
Parameters:
account_id(ObjectId) - DashClicks account ID
Returns: Promise<Boolean> - Returns true on success
Business Logic Flow:
await YextScan.deleteOne({ account_id });
return true;
Example Usage:
// Delete scan to force fresh scan on next request
await scanService.deleteScan(accountId);
// Next getScan() will create new scan
const newScan = await scanService.getScan(accountId, businessInfo);
Key Business Rules:
- Simple Delete: Removes document from collection
- No Validation: Doesn't check if scan exists
- Force Refresh: Next getScan() will create new scan
- TTL Alternative: Scans auto-expire via TTL, manual delete optional
Integration Points
Complete Scan Workflow
// Complete duplicate detection workflow
async function detectDuplicateListings(accountId, businessInfo) {
// 1. Get or create scan
const scan = await scanService.getScan(accountId, businessInfo);
if (!scan) {
return {
error: 'Business information required',
};
}
// 2. Analyze results
const sites = scan.scan_data.sites;
const found = sites.filter(s => s.status === 'FOUND');
const withDuplicates = sites.filter(s => s.duplicates?.length > 0);
// 3. Count total duplicates
const totalDuplicates = withDuplicates.reduce((sum, s) => sum + (s.duplicates?.length || 0), 0);
// 4. Group by publisher
const byPublisher = found.reduce((acc, site) => {
acc[site.siteName] = {
url: site.url,
name: site.name,
address: site.address,
phone: site.phone,
duplicates: site.duplicates || [],
};
return acc;
}, {});
return {
jobId: scan.scan_data.jobId,
totalSites: sites.length,
foundListings: found.length,
sitesWithDuplicates: withDuplicates.length,
totalDuplicates: totalDuplicates,
byPublisher: byPublisher,
recommendations:
totalDuplicates > 0
? 'Duplicate listings detected. Consider claiming or removing duplicates.'
: 'No duplicate listings detected. Your online presence is clean.',
};
}
Scan Monitoring Dashboard
// Monitor scan results over time
async function monitorScanChanges(accountId, businessInfo) {
// Delete old scan to force refresh
await scanService.deleteScan(accountId);
// Get fresh scan
const scan = await scanService.getScan(accountId, businessInfo);
const sites = scan.scan_data.sites;
return {
timestamp: new Date(),
summary: {
sitesScanned: sites.length,
listingsFound: sites.filter(s => s.status === 'FOUND').length,
duplicatesDetected: sites.reduce((sum, s) => sum + (s.duplicates?.length || 0), 0),
},
details: sites.map(site => ({
publisher: site.siteName,
status: site.status,
url: site.url,
duplicateCount: site.duplicates?.length || 0,
})),
};
}
Duplicate Alert System
// Check for new duplicates and alert
async function checkForNewDuplicates(accountId, businessInfo, previousScan) {
const currentScan = await scanService.getScan(accountId, businessInfo);
if (!previousScan) {
return {
newDuplicates: 0,
message: 'First scan completed',
};
}
// Compare duplicate counts
const prevTotal = previousScan.scan_data.sites.reduce(
(sum, s) => sum + (s.duplicates?.length || 0),
0,
);
const currTotal = currentScan.scan_data.sites.reduce(
(sum, s) => sum + (s.duplicates?.length || 0),
0,
);
if (currTotal > prevTotal) {
return {
newDuplicates: currTotal - prevTotal,
message: `${currTotal - prevTotal} new duplicate(s) detected!`,
alert: true,
};
}
return {
newDuplicates: 0,
message: 'No new duplicates',
alert: false,
};
}
Edge Cases & Special Handling
Expired Job ID
Issue: Cached job ID expired or invalid
Handling: Auto-recovery by creating new scan
catch (err) {
const errMsg = err?.response?.data?.meta?.errors?.[0];
if (errMsg.code === 21 && errMsg.message.startsWith('Invalid parameter Invalid Job ID')) {
// Create new scan
const scanData = await exports.create(body);
// Update database
}
}
Why: Job IDs expire after a period of time; auto-recreation provides a seamless experience
Publisher Merge Failure
Issue: Fetching publisher data fails
Handling: Logged but doesn't fail request
try {
// Fetch and merge publisher data
} catch (err) {
logger.error({
initiator: 'external/yext/get-scan',
error: err,
message: 'Error while merging',
});
// Continue without publisher data
}
Why: Publisher data is enhancement, not critical
Missing Business Info
Issue: No body.name provided
Handling: Returns null
if (body?.name) {
// Create scan
} else {
return null;
}
Why: Business name required to create scan
TTL Expiration
Issue: Cached scan expires via TTL
Handling: Document automatically deleted by MongoDB
Result: Next getScan() creates fresh scan
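If stale results are a concern, a caller can check how close the cached document is to its TTL and force a refresh; a sketch, assuming the YextScan model is in scope and using an illustrative one-day threshold:

// Sketch: proactively refresh a cache that is about to expire (threshold is an assumption)
const ONE_DAY_MS = 24 * 60 * 60 * 1000;
const cached = await YextScan.findOne({ account_id });

if (cached && cached.expiresAt - Date.now() < ONE_DAY_MS) {
  await scanService.deleteScan(account_id);             // drop the near-expired cache
  await scanService.getScan(account_id, businessInfo);  // recreate on demand
}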
Site ID Format
Issue: Multiple site IDs need joining
Handling: Maps and joins array
const sites = data.scan_data.sites.map(site => site.siteId);
// ['123', '456', '789']
await exports.get(jobId, sites);
// Internally: GET /scan/:jobId/123,456,789
Important Notes
- Separate API Key: Uses YEXT_API_KEY_SCAN (not YEXT_API_KEYS)
- Caching: Results cached in MongoDB with TTL expiration
- Auto-Recovery: Recreates scan if job ID expired
- Publisher Enrichment: Merges publisher data for display
- Graceful Failures: Publisher merge errors don't fail request
- Logo Preservation: Logos from initial scan preserved
- Field Cleanup: Removes _id, __v, expiresAt from response
- Null Return: Returns null if no business info provided
- Async Scanning: Scan results may show "SCANNING" status
- Duplicate Detection: duplicates array contains found duplicates
- Status Types: FOUND, NOT_FOUND, SCANNING
- Job ID Validation: Auto-detects invalid job IDs via error code 21
Related Documentation
- Yext Integration Overview: index.md
- Yext Scan API: https://developer.yext.com/docs/api-reference/scan/
- YextScan Model: external/models/yext-scans.js
- Logger Utility: external/utilities/logger.js
Scan Result Interpretation
Status Meanings
- FOUND: Listing exists on publisher
- NOT_FOUND: No listing found on publisher
- SCANNING: Scan still in progress (check again later)
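A small hedged helper that turns a site result into a user-facing summary, using the status values documented above (the function name and wording are illustrative):

// Maps a scan site result to a short summary string
function describeScanStatus(site) {
  switch (site.status) {
    case 'FOUND':
      return site.duplicates?.length
        ? `Listed on ${site.siteName} with ${site.duplicates.length} possible duplicate(s)`
        : `Listed on ${site.siteName}`;
    case 'NOT_FOUND':
      return `No listing found on ${site.siteName}`;
    case 'SCANNING':
      return `${site.siteName} scan still in progress, check again later`;
    default:
      return `Unrecognized status for ${site.siteName}`;
  }
}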
Duplicate Scenarios
No Duplicates:
{
"status": "FOUND",
"url": "https://...",
"duplicates": [] // Single correct listing
}
With Duplicates:
{
"status": "FOUND",
"url": "https://...", // Primary listing
"duplicates": [
{
"url": "https://...", // Duplicate 1
"name": "...",
"address": "..."
},
{
"url": "https://...", // Duplicate 2
"name": "...",
"address": "..."
}
]
}
Action Items:
- Claim correct listing
- Remove/merge duplicates
- Update incorrect information
- Monitor for new duplicates
Performance Considerations
Caching Strategy:
- Scans cached with TTL (auto-expiration)
- Reduces API calls to Yext
- Fresh scans on TTL expiry or manual delete
API Limits:
- Scan API has rate limits (check Yext documentation)
- Consider throttling scan requests
- Cache results for reasonable period
Best Practices:
- Don't scan on every request
- Use cached results when available
- Delete and rescan periodically (weekly/monthly)
- Monitor for new duplicates proactively
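A sketch of a periodic refresh along these lines, assuming a hypothetical getActiveListingAccounts() helper that returns accounts with stored business_info; scheduling (cron, queue, worker) is left to the surrounding infrastructure:

// Periodic rescan sketch (getActiveListingAccounts is a hypothetical helper)
async function refreshScans() {
  const accounts = await getActiveListingAccounts();
  for (const { account_id, business_info } of accounts) {
    try {
      await scanService.deleteScan(account_id);              // drop cached results
      await scanService.getScan(account_id, business_info);  // trigger a fresh scan
    } catch (err) {
      logger.error({ initiator: 'external/yext/refresh-scans', error: err });
    }
    // Simple throttle to stay well under Yext rate limits
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}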