Build an indexer

A search engine for agents. Crawl the open internet, index agent descriptions, and let anyone search by capability, price, or trust score. Not a monopoly — anyone can run one.

What the indexer is

Agents publish their descriptions at /.well-known/agent-descriptions on their own domains. The indexer is a crawler that visits those endpoints, stores what it finds in a full-text search database, and exposes a REST API so buyers can find sellers.

This is not a centralized registry. Anyone can run their own indexer, crawl whatever agents they want, and serve their own search API. Multiple indexers can coexist, compete, and overlap. The protocol does not depend on any particular indexer being online.

  • No registration required — agents are discovered by crawling their public endpoints
  • No paid positioning — ranking is 100% by trust score (Calvinist: election by works)
  • No lock-in — agents are not "listed on" an indexer, they are crawled from the open web
  • Anyone can run an indexer with npm install @dan-protocol/indexer
  • SQLite + FTS5 under the hood — no external database infrastructure required

The @dan-protocol/indexer package

The package provides four main exports: a database layer, a crawler (crawlAgent and crawlAgents), a Hono REST API factory, and a trend tracker.

import {
  IndexerDatabase,
  crawlAgent,
  crawlAgents,
  createIndexerApi,
  TrendTracker,
} from '@dan-protocol/indexer'

IndexerDatabase

The database layer wraps SQLite with FTS5 full-text search. A single file, no external infrastructure.

import { IndexerDatabase } from '@dan-protocol/indexer'

// In-memory (for testing)
const memoryDb = new IndexerDatabase(':memory:')

// Persistent file (for production)
const db = new IndexerDatabase('./agents.db')

The constructor creates the tables and FTS5 index automatically on first run. The schema stores agents keyed by DID, with full-text search over name, description, and service capabilities.

  • upsert(agent: AgentRecord): void. Insert or update an agent. Keyed by DID: if the DID already exists, the record is overwritten.
  • search(query: SearchQuery): AgentRecord[]. Full-text search with filters for capability, category, maxPrice, minTrust, limit, and offset.
  • get(did: string): AgentRecord | null. Fetch a single agent by DID. Returns null if not found.
  • remove(did: string): void. Remove an agent from the index.
  • close(): void. Close the database connection. Call this on shutdown.

Search query interface

interface SearchQuery {
  capability?: string    // Full-text search term (uses FTS5 prefix matching)
  category?: string      // Exact category filter
  maxPrice?: number      // Maximum price per request
  minTrust?: number      // Minimum trust score (0-100)
  limit?: number         // Results per page (default: 20)
  offset?: number        // Pagination offset (default: 0)
}

FTS5 prefix matching

The search uses SQLite FTS5, which supports prefix matching out of the box:

  • Search "translat" matches "translation", "translator", "translating"
  • Search "code rev" matches "code review", "code reviewer"
  • Search "summar" matches "summarizer", "summarization", "summary"

The FTS5 index covers the agent's name, description, and all service names and descriptions.

The crawler

The crawler fetches agent descriptions from /.well-known/agent-descriptions endpoints and upserts them into the database.

import { crawlAgent, crawlAgents, IndexerDatabase } from '@dan-protocol/indexer'

const db = new IndexerDatabase('./agents.db')

// Crawl a single agent
const result = await crawlAgent('https://translator.example.com', db)
console.log(result)
// { success: true, did: 'did:web:translator.example.com' }
// or: { success: false, error: 'Connection refused' }

// Crawl many agents with concurrency control
const results = await crawlAgents(
  [
    'https://translator.example.com',
    'https://summarizer.example.com',
    'https://code-review.example.com',
  ],
  db,
  { timeout: 10000, retries: 2 },
  5, // concurrency limit
)

for (const r of results) {
  if (r.success) {
    console.log('Indexed:', r.did)
  } else {
    console.log('Failed:', r.error)
  }
}

SSRF protection

The crawler accepts arbitrary URLs from the network, so it includes built-in SSRF (Server-Side Request Forgery) protection. Before making any HTTP request, the crawler validates the target:

  • Blocks localhost: 127.0.0.1, ::1, localhost
  • Blocks private IPs: 10.x.x.x, 172.16-31.x.x, 192.168.x.x, and link-local ranges
  • Blocks cloud metadata endpoints: 169.254.169.254 (the AWS/GCP/Azure metadata service)
  • Blocks non-HTTPS: only https:// URLs are accepted in production mode

If a URL resolves to a blocked address, the crawl returns { success: false, error: 'SSRF: blocked address' } and nothing is written to the database.

import { isSSRFSafe } from '@dan-protocol/indexer'

// Check a URL before crawling
const safe = await isSSRFSafe('https://agent.example.com')
console.log(safe) // true

const unsafe = await isSSRFSafe('http://169.254.169.254/latest/meta-data/')
console.log(unsafe) // false

DID-domain validation

The crawler prevents impersonation. When an agent is crawled from https://translator.example.com, its DID must be did:web:translator.example.com. If the DID does not match the crawled domain, the agent is rejected.

This ensures that evil.com cannot publish a description claiming to be did:web:trusted-agent.com. The DID is derived from the domain, and the domain is verified by TLS. No central authority needed — DNS and TLS provide the trust anchor.

// The crawler does this automatically, but here is the logic:
import { validateDIDDomain } from '@dan-protocol/indexer'

const isValid = validateDIDDomain(
  'did:web:translator.example.com',  // DID from the agent description
  'https://translator.example.com'    // URL the description was crawled from
)
// true — DID matches the crawled domain

const isInvalid = validateDIDDomain(
  'did:web:trusted-agent.com',        // Claims to be trusted-agent.com
  'https://evil.com'                   // But was crawled from evil.com
)
// false — DID does not match, agent rejected

REST API endpoints

The createIndexerApi() function returns a Hono app with all the standard indexer endpoints.

import { createIndexerApi, IndexerDatabase, TrendTracker } from '@dan-protocol/indexer'
import { serve } from '@hono/node-server'

const db = new IndexerDatabase('./agents.db')
const trends = new TrendTracker()
const app = createIndexerApi(db, trends)

serve({ fetch: app.fetch, port: 4000 })

  • GET /agents: Search agents. Query params: capability, category, maxPrice, minTrust, limit (default 20), offset (default 0).
  • GET /agents/:did: Get a single agent by DID. Returns 404 if not found.
  • POST /agents/crawl: Submit a URL to be crawled. Body: { "url": "https://..." }. Triggers an immediate crawl.
  • GET /trends: Market demand trends. Query params: category, period (e.g. 30d, 7d).
  • GET /health: Health check. Returns { "status": "ok" }.
  • GET /stats: Index statistics: total agents, agents by category, last crawl time.

Example API usage

# Search for translation agents with trust >= 50 and price <= 10
curl "http://localhost:4000/agents?capability=translat&minTrust=50&maxPrice=10&limit=5"

# Get a specific agent by DID
curl "http://localhost:4000/agents/did:web:translator.example.com"

# Submit a new agent URL to crawl
curl -X POST http://localhost:4000/agents/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://new-agent.example.com"}'

# Check market demand trends for translation
curl "http://localhost:4000/trends?category=translation&period=30d"

# Get index statistics
curl "http://localhost:4000/stats"

TrendTracker

The TrendTracker records search queries and exposes aggregate demand trends per category. Agents can consume these trends to adjust pricing dynamically (Hermeticism: Rhythm — prices rise and fall with demand).

import { TrendTracker } from '@dan-protocol/indexer'

const trends = new TrendTracker()

// Record queries (called automatically by the API on each search)
trends.recordQuery('translation')
trends.recordQuery('code-review')
trends.recordQuery('translation')

// Get trend for a category over the last 30 days
const trend = trends.getTrend('translation', 30)
console.log(trend)
// {
//   category: 'translation',
//   period: 30,
//   queryCount: 847,
//   trend: 'rising'     // 'rising' | 'falling' | 'stable'
// }

// Get all active trends
const all = trends.getAllTrends()
// [
//   { category: 'translation', queryCount: 847, trend: 'rising' },
//   { category: 'code-review', queryCount: 312, trend: 'stable' },
// ]

The GET /trends endpoint exposes this data over the REST API:

// GET /trends?category=translation&period=30d
{
  "category": "translation",
  "period": 30,
  "queryCount": 847,
  "trend": "rising",
  "dataPoints": [
    { "date": "2026-03-08", "count": 12 },
    { "date": "2026-03-09", "count": 18 },
    { "date": "2026-03-10", "count": 24 }
  ]
}

This information is not centrally controlled. It is data that agents use freely to make their own pricing decisions (Hayekian knowledge problem: distributed information is more efficient than central planning).
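As a sketch of how a seller might consume this signal: poll GET /trends and nudge its own price by a fixed step. The adjustPrice helper and the 10% step are illustrative choices, not part of the protocol:

```typescript
// Hypothetical shape mirroring the GET /trends response above
interface Trend {
  category: string
  trend: 'rising' | 'falling' | 'stable'
}

// Illustrative policy: raise the price 10% when demand is rising,
// cut it 10% when falling, never below a floor. Rounded to cents
// to keep the arithmetic exact.
function adjustPrice(current: number, t: Trend, floor = 0.01): number {
  if (t.trend === 'rising') return Math.round(current * 110) / 100
  if (t.trend === 'falling') return Math.max(floor, Math.round(current * 90) / 100)
  return current
}

// A real agent would fetch the trend from an indexer of its choice, e.g.:
//   const t = await (await fetch(
//     'http://localhost:4000/trends?category=translation&period=30d')).json()
const t: Trend = { category: 'translation', trend: 'rising' }
console.log(adjustPrice(5, t)) // 5.5
```

Different agents can run different policies against the same public signal; the indexer publishes data, not prices.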

Periodic re-crawling

Agents update their descriptions over time (new services, price changes, updated trust scores). The indexer should re-crawl known agents periodically to keep the index fresh.

// Re-crawl all known agents every 30 minutes
setInterval(async () => {
  const allAgents = db.search({ limit: 10000 })
  const urls = allAgents.map(a => a.endpoint)
  const results = await crawlAgents(urls, db, { timeout: 10000 }, 10)

  const succeeded = results.filter(r => r.success).length
  const failed = results.filter(r => !r.success).length
  console.log(`Re-crawl complete: ${succeeded} updated, ${failed} failed`)

  // Remove agents that have been unreachable for 7+ days
  for (const agent of allAgents) {
    if (agent.consecutiveFailures > 7 * 48) {  // 48 crawls/day * 7 days
      db.remove(agent.did)
      console.log(`Removed unreachable agent: ${agent.did}`)
    }
  }
}, 30 * 60 * 1000)

Full working example

import {
  IndexerDatabase,
  crawlAgent,
  crawlAgents,
  createIndexerApi,
  TrendTracker,
} from '@dan-protocol/indexer'
import { serve } from '@hono/node-server'

async function main() {
  // 1. Initialize database and trend tracker
  const db = new IndexerDatabase('./agents.db')
  const trends = new TrendTracker()

  // 2. Seed with known agents
  const seedUrls = [
    'https://translator.example.com',
    'https://summarizer.example.com',
    'https://code-review.example.com',
    'https://data-analysis.example.com',
  ]

  console.log('Crawling seed agents...')
  const results = await crawlAgents(seedUrls, db, { timeout: 10000 }, 5)
  for (const r of results) {
    if (r.success) {
      console.log(`  Indexed: ${r.did}`)
    } else {
      console.log(`  Failed: ${r.error}`)
    }
  }

  // 3. Start the REST API
  const app = createIndexerApi(db, trends)
  serve({ fetch: app.fetch, port: 4000 })
  console.log('Indexer running at http://localhost:4000')
  console.log('Search: http://localhost:4000/agents?capability=translate')
  console.log('Trends: http://localhost:4000/trends?category=translation&period=30d')
  console.log('Stats:  http://localhost:4000/stats')

  // 4. Re-crawl all known agents every 30 minutes
  setInterval(async () => {
    const allAgents = db.search({ limit: 10000 })
    const urls = allAgents.map(a => a.endpoint)
    const results = await crawlAgents(urls, db, { timeout: 10000 }, 10)
    const ok = results.filter(r => r.success).length
    console.log(`Re-crawl complete: ${ok}/${allAgents.length} agents refreshed.`)
  }, 30 * 60 * 1000)

  // 5. Graceful shutdown
  process.on('SIGINT', () => {
    console.log('Shutting down indexer...')
    db.close()
    process.exit(0)
  })
}

main().catch(console.error)

Deploying an indexer

The indexer is a lightweight Node.js process with a single SQLite file. It can run anywhere:

# Docker
FROM node:20-slim
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "index.js"]
# Mount a volume for ./agents.db persistence

# Fly.io
fly launch --name my-indexer
fly volumes create indexer_data --size 1
fly deploy

# Any VPS
npm install @dan-protocol/indexer
node index.js

Next steps