Technical Documentation

How Threada works, what it requires, and how to integrate it.

Architecture overview

  • Crawl — Fetch pages from your domain following sitemap and internal links
  • Render — Render JavaScript-heavy pages before extraction
  • Extract — Extract content including structured data
  • Chunk — Split content into semantic chunks
  • Embed — Convert chunks into vector embeddings
  • Index — Store embeddings for retrieval
  • Retrieve — Retrieve relevant chunks based on semantic similarity
  • Generate — Synthesize a response from retrieved content
  • Cite — Link the source page when relevant content is found
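To make the Chunk stage concrete, here is a minimal sketch that groups paragraphs into size-bounded chunks, using paragraph breaks as a rough stand-in for semantic boundaries. The function name `chunk_text` and the `max_chars` limit are assumptions for illustration; the documentation does not describe Threada's actual chunking strategy.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Hypothetical sketch: a single paragraph longer than max_chars is kept
    whole rather than being split mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow, so close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded and indexed separately, so that retrieval can return a focused passage rather than a whole page.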

Crawling

What we crawl

  • HTML pages from sitemap or internal links
  • Structured data (JSON-LD, schema.org)
  • Manually uploaded documents (PDF, HTML, DOCX)

Crawl behavior

  • Respects robots.txt
  • Identifies itself with a User-Agent header
  • Rate-limited to avoid server load
  • Backs off on 429 or 5xx
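The crawl rules above can be sketched with Python's standard library. This is an illustration of the general technique, not Threada's crawler: the `USER_AGENT` string, the function names, and the backoff parameters (`base`, `cap`) are all assumptions.

```python
import random
from urllib.robotparser import RobotFileParser

USER_AGENT = "ExampleCrawler/1.0"  # hypothetical; not Threada's real User-Agent


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def should_back_off(status: int) -> bool:
    """429 and 5xx responses signal that the crawler should slow down."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter; attempt counts from 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

A crawler loop would call `is_allowed` before each fetch and sleep for `backoff_delay(attempt)` seconds whenever `should_back_off` returns True. The jitter spreads retries out so that they do not arrive at the server in synchronized bursts.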

JavaScript rendering

Pages are rendered in a headless browser before extraction, so content from SPAs and other dynamic pages is captured.

Re-indexing

  • Sitemap monitoring triggers automatic re-indexing
  • Manual re-index available anytime
  • Scheduled re-indexing configurable by plan
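One plausible way to implement sitemap monitoring is to compare each URL's `<lastmod>` value between two sitemap snapshots. The sketch below assumes that approach; the documentation does not say how Threada detects changes, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> dict[str, str]:
    """Map each <loc> URL to its <lastmod> value ('' when absent)."""
    root = ET.fromstring(xml_text)
    entries: dict[str, str] = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if loc:
            entries[loc] = lastmod
    return entries


def urls_to_reindex(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return URLs that are new or whose lastmod changed since the last check."""
    return sorted(u for u, mod in current.items() if previous.get(u) != mod)
```

Because detection here depends on `<lastmod>`, pages whose sitemap entries omit or never update that field would only be picked up by a manual or scheduled re-index.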

Retrieval

Semantic search

Queries are converted to embeddings and matched by semantic similarity, not keywords.

Relevance thresholds

Configurable. Higher thresholds yield fewer, more confident answers. Lower thresholds yield broader coverage with more clarifying questions.
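The effect of a relevance threshold can be sketched with cosine similarity over toy vectors. Cosine similarity, the `threshold` and `top_k` defaults, and the function names are assumptions for illustration; the documentation does not name the similarity metric or the scale Threada uses.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    index: list[tuple[str, list[float]]],
    threshold: float = 0.75,
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs scoring at least threshold."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in index]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)[:top_k]
```

With a high threshold, fewer chunks qualify and more queries return nothing; an empty result is the case where the assistant would acknowledge uncertainty or ask a clarifying question rather than answer.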

Response generation

Retrieved content is passed as context to the language model. Responses are synthesized from provided context only.

Source citations

When relevant content is found, responses include a link to the source page. A response that acknowledges uncertainty may carry no citation, because there is no source to link.
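The sketch below shows one way retrieved chunks could be assembled into a context-only prompt and a citation list. The `Chunk` type, the prompt wording, and both function names are hypothetical; Threada's actual prompt and citation logic are not documented here.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_url: str


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Number each chunk and instruct the model to answer from context only."""
    context = "\n\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def citations(chunks: list[Chunk]) -> list[str]:
    """Unique source URLs in retrieval order; empty when nothing was retrieved."""
    return list(dict.fromkeys(c.source_url for c in chunks))
```

When retrieval returns no chunks, `citations` returns an empty list, which mirrors the behavior described above: no source, no citation.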

Boundary behavior

  • Acknowledges what it doesn’t know
  • Asks clarifying questions
  • Suggests related topics

Embed integration

Installation

Single script snippet. Loads asynchronously.

Compatibility

  • WordPress
  • Webflow
  • Shopify
  • Squarespace
  • Custom builds
  • SPAs (React, Vue, Angular)

Configuration via dashboard

  • Colors, logo, placement
  • Welcome message, tone
  • Language settings
  • Relevance thresholds

Public API

Secure, tenant-scoped API access for tickets and actions across channels.

Authentication & access

  • Send the API key in the `X-Api-Key` header.
  • Create and revoke keys in Admin → API keys; use one key per integration.
  • Keys are tenant-scoped and least-privilege via scopes.
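As a minimal sketch of key-based authentication, the helper below attaches the `X-Api-Key` header to a request object without sending it. `BASE_URL` is a placeholder (the real API host is not given in this document), and `build_request` is a hypothetical helper name.

```python
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder; substitute your tenant's API host


def build_request(path: str, api_key: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) an authenticated request for an API path."""
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        method=method,
        headers={"X-Api-Key": api_key, "Accept": "application/json"},
    )
```

Passing the result to `urllib.request.urlopen` would perform the call. Load the key from an environment variable or a secret store rather than hard-coding it.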

Scopes

  • `support.read` — list and retrieve tickets, messages, and actions.
  • `support.write` — create/update tickets and append messages.
  • `support.actions` — execute actions and query action status.

Core endpoints

  • `/api/v1/public/tickets` — list or create tickets.
  • `/api/v1/public/tickets/{ticket_id}` — get or update ticket details.
  • `/api/v1/public/tickets/{ticket_id}/messages` — list or append messages.
  • `/api/v1/public/tickets/{ticket_id}/actions` — execute an action for a ticket.
  • `/api/v1/public/actions` — list or fetch action status.

Response format

  • JSON responses; timestamps are RFC 3339.
  • Errors return `{ error: { type, message, code } }`.
  • List endpoints accept `limit` and `page_token`.
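The error envelope and the `limit` / `page_token` parameters suggest a client loop like the sketch below. The response field names `items` and `next_page_token` are assumptions, since the document does not say how a list response carries its results or the next token; `ApiError`, `raise_for_error`, and `iter_pages` are hypothetical names.

```python
from typing import Any, Callable, Iterator


class ApiError(Exception):
    """Raised when a response body carries the documented error envelope."""

    def __init__(self, type_: Any, message: Any, code: Any):
        super().__init__(f"{type_} ({code}): {message}")
        self.type, self.message, self.code = type_, message, code


def raise_for_error(body: dict) -> dict:
    """Raise ApiError for { error: { type, message, code } }; else return body."""
    err = body.get("error")
    if err:
        raise ApiError(err.get("type"), err.get("message"), err.get("code"))
    return body


def iter_pages(fetch: Callable[[dict], dict], limit: int = 50) -> Iterator[dict]:
    """Yield items from every page.

    fetch takes query parameters and returns the decoded JSON body.
    Assumes hypothetical 'items' and 'next_page_token' response fields.
    """
    params: dict = {"limit": limit}
    while True:
        body = raise_for_error(fetch(params))
        yield from body.get("items", [])
        token = body.get("next_page_token")
        if not token:
            return
        params = {"limit": limit, "page_token": token}
```

Injecting `fetch` keeps the pagination logic independent of any HTTP library and makes it easy to test against canned responses.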

Channel values

  • `web`, `email`, `sms`, `whatsapp`, `social`, `voice`, `custom`.
  • Use `channel_id` and `channel_thread_id` to map external conversations.
  • Use `external_message_id` to de-duplicate message writes.
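A message payload that follows these conventions might be assembled as in the sketch below. The identifiers `channel_id`, `channel_thread_id`, and `external_message_id` come from the list above; the `channel` and `body` field names and the `build_message` helper are assumptions, since the request schema is not shown here.

```python
CHANNELS = {"web", "email", "sms", "whatsapp", "social", "voice", "custom"}


def build_message(
    channel: str,
    channel_id: str,
    channel_thread_id: str,
    external_message_id: str,
    body: str,
) -> dict:
    """Validate the channel and assemble a message payload (hypothetical shape)."""
    if channel not in CHANNELS:
        raise ValueError(f"unsupported channel: {channel!r}")
    return {
        "channel": channel,  # assumed field name
        "channel_id": channel_id,
        "channel_thread_id": channel_thread_id,
        # Reuse the upstream system's message ID so retries send the same value.
        "external_message_id": external_message_id,
        "body": body,  # assumed field name
    }
```

Deriving `external_message_id` from the originating system's own message ID, rather than generating a fresh value per attempt, is what lets a retried write be recognized as a duplicate.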

Zapier setup

Connect Threada to Zapier for no-code automation with a controlled surface.

Outbound automation (Threada → Zapier)

  • Create a Zap with a Webhooks by Zapier “Catch Hook” trigger and copy the hook URL.
  • In Admin → Support → Integrations, add a Zapier integration with the webhook URL and optional secret header.
  • Use a Custom HTTP action to send payloads; if no custom HTTP integration is selected, Zapier is used by default.

Inbound automation (Zapier → Threada)

  • Create a scoped API key for Zapier (least privilege).
  • Use Webhooks by Zapier to call the Public API endpoints for tickets and actions.
  • Store the key securely in Zapier and rotate it on a schedule.

Enterprise hygiene

  • Separate keys per environment (production vs. sandbox).
  • Revoke keys when a Zap is disabled or ownership changes.
  • Prefer read-only scopes unless an automation must write.

Security

  • Hosting: GCP, us-central1 by default
  • Encryption: TLS 1.2+ in transit, AES-256 at rest
  • Authentication: SSO via OIDC/SAML (Enterprise)
  • Compliance: GDPR-aligned, configurable retention, audit logging
  • Threat model: documented, covering prompt injection, XSS, SSRF, and data exfiltration

Data handling

Stored

  • Indexed content
  • Embeddings
  • Chat logs
  • Analytics
  • Configuration

Not stored

  • Payment info (handled by processor)
  • Credentials in plaintext

Retention

Configurable per tenant. Deletion available on request.

Training

Your content is not used to train AI models.

FAQ

Password-protected pages?
Not currently. Upload documents for private content.
Exclude pages?
Yes. Via robots.txt or dashboard configuration.
How fast are updates?
Automatic re-indexing typically completes within 24 hours. Manual re-indexing is immediate.
Site down during crawl?
Crawler backs off and retries. Previous content remains available.
See indexed content?
Yes. The dashboard shows indexed pages and status.
Multiple languages?
Yes. Auto-detect or set a default per embed.
Why no citation sometimes?
Citations link to sources. When acknowledging uncertainty, there’s no source to cite.