Phase 1: Page Classification

Before any auction can run, the system must understand what a page is about. Page classification maps URLs to IAB Content Taxonomy 2.1 categories with confidence scores using LLM-based analysis.

Two Taxonomies, One Match

Promovolve uses two distinct IAB taxonomies that meet at auction time:

| Taxonomy | Version | Who sets it | Purpose |
| --- | --- | --- | --- |
| Ad Product Taxonomy | 2.0 | Advertiser | “What is my product?” (e.g., Travel, Kitchen Equipment) |
| Content Taxonomy | 2.1 | LLM classifier | “What is this page about?” (e.g., Destinations, Outdoor Recreation) |

The advertiser never sees content categories. They pick their product category, and ContentToAdProductMapping derives the matching content categories using the official IAB mapping file (content_2.1_to_ad_product_2.0.tsv). If no direct mapping exists for a product category, the system walks up the taxonomy’s parent chain until it finds one.

At auction time, matching is exact: the page’s content category must be in the campaign’s derived content category set. There is no fuzzy or hierarchical matching at bid time — the hierarchy is resolved once, at campaign setup.
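
The setup-time derivation and the bid-time check described above can be sketched roughly as follows. This is an illustration, not the actual ContentToAdProductMapping code: the object name, the in-memory `Mapping` and `Parents` shapes, and the assumption that the parent chain belongs to the Ad Product Taxonomy and is acyclic are all hypothetical.

```scala
import scala.annotation.tailrec

object ContentMappingSketch {
  // Hypothetical in-memory view of the mapping file:
  // ad product category ID -> content category IDs.
  type Mapping = Map[String, Set[String]]
  // Hypothetical parent links: child product ID -> parent product ID.
  type Parents = Map[String, String]

  // Campaign setup: use the direct mapping when one exists, otherwise
  // walk up the parent chain until an ancestor has a mapping.
  @tailrec
  def deriveContentCategories(
      product: String,
      mapping: Mapping,
      parents: Parents
  ): Set[String] =
    mapping.get(product) match {
      case Some(content) if content.nonEmpty => content
      case _ =>
        parents.get(product) match {
          case Some(parent) => deriveContentCategories(parent, mapping, parents)
          case None         => Set.empty // reached the root without a mapping
        }
    }

  // Bid time: plain set membership, no hierarchy involved.
  def matches(pageCategory: String, derived: Set[String]): Boolean =
    derived.contains(pageCategory)
}
```

Because the hierarchy is flattened into a plain set once per campaign, the bid-time check stays a single set lookup regardless of how deep the taxonomy is.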

Classification Pipeline

Promovolve supports multiple LLM providers for classification, configured in application.conf:

| Provider | Config Key | Env Var |
| --- | --- | --- |
| Gemini | promovolve.gemini.api-key | GEMINI_API_KEY |
| OpenAI | promovolve.openai.api-key | OPENAI_API_KEY |
| Anthropic | promovolve.anthropic.api-key | ANTHROPIC_API_KEY |

Gemini is enabled by default (promovolve.gemini.enabled = true).

Classification Output

The LLM returns category IDs, which are normalized to IAB Content Taxonomy 2.1 numeric IDs. Legacy IAB 1.0 format IDs (e.g., "IAB17") are converted via TieredCategory.normalize() to their 2.1 equivalents (e.g., "483"). The result is a map from category ID to confidence:

{
  "url": "https://example.com/sports/nba-finals-recap",
  "categories": {
    "483": 0.92,
    "484": 0.85,
    "393": 0.45
  }
}

Each Confidence value is an opaque Double in [0, 1]. All downstream matching uses these numeric Content Taxonomy 2.1 IDs.
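
As a rough illustration of the normalization step, the sketch below maps raw LLM output keys onto 2.1 numeric IDs. It is not the real TieredCategory.normalize(): the lookup table is hypothetical and holds only the single IAB17 to 483 pair quoted above, and dropping unresolvable IDs and keeping the higher confidence when two raw IDs collapse to the same category are assumptions made for the example.

```scala
object NormalizeSketch {
  // Hypothetical lookup table; only the IAB17 -> 483 pair comes from the text.
  val legacyToV21: Map[String, String] = Map("IAB17" -> "483")

  // Returns the 2.1 numeric ID, or None when the raw ID cannot be resolved.
  def normalize(raw: String): Option[String] = {
    val id = raw.trim
    if (id.nonEmpty && id.forall(_.isDigit)) Some(id) // already numeric
    else legacyToV21.get(id.toUpperCase)              // legacy "IABnn" form
  }

  // Normalize every key, drop unresolvable ones, and keep the highest
  // confidence when several raw IDs map to the same 2.1 category.
  def normalizeAll(raw: Map[String, Double]): Map[String, Double] =
    raw.toSeq
      .flatMap { case (id, confidence) => normalize(id).map(_ -> confidence) }
      .groupMapReduce(_._1)(_._2)((a, b) => math.max(a, b))
}
```

For instance, `NormalizeSketch.normalize("IAB17")` yields `Some("483")`, while an ID that is neither numeric nor in the table yields `None`.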

Top-K Category Selection

AuctioneerEntity selects the top K categories (default K=3) by confidence score. Only these categories proceed to ranking and bidding.
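
A minimal sketch of top-K selection over the category-to-confidence map is shown below. The object and method names are hypothetical, and breaking ties by category ID is an assumption added only to make the result deterministic; the source does not say how AuctioneerEntity orders equal scores.

```scala
object TopKSketch {
  // Keep the k highest-confidence categories, highest first.
  def topK(categories: Map[String, Double], k: Int = 3): Seq[(String, Double)] =
    categories.toSeq
      .sortWith { (x, y) =>
        if (x._2 != y._2) x._2 > y._2 // higher confidence first
        else x._1 < y._1              // assumed tie-break: category ID
      }
      .take(k)
}
```

Applied to the example classification above with one extra low-confidence entry (a made-up ID "100" at 0.10), the selection keeps "483", "484", and "393" and drops the fourth.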

Classification Storage

Classifications are stored in AuctioneerEntity’s state as a Map[URL, Classification], keyed by page URL and timestamped with classifiedAtMs. Every 5 minutes, a cleanup task removes entries older than the 48-hour recency window.
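
The recency cleanup can be pictured as a pure function over the stored map, which a periodic task would apply every 5 minutes. The sketch below is an assumption-laden illustration: the `Classification` fields other than classifiedAtMs, the use of `String` for the URL key, and the function name are hypothetical, and the scheduling itself is left out.

```scala
object ClassificationStoreSketch {
  // Hypothetical shape; only classifiedAtMs is named in the text.
  final case class Classification(
      categories: Map[String, Double],
      classifiedAtMs: Long
  )

  // 48-hour recency window, in milliseconds.
  val RecencyWindowMs: Long = 48L * 60 * 60 * 1000

  // Drop every entry whose classification is older than the window.
  def pruneStale(
      state: Map[String, Classification],
      nowMs: Long
  ): Map[String, Classification] =
    state.filter { case (_, c) => nowMs - c.classifiedAtMs <= RecencyWindowMs }
}
```

Keeping the pruning logic free of clocks and timers makes it easy to test by passing in an explicit `nowMs`.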

Role in Scoring

The confidence score feeds into category ranking:

categoryScore = classifierConfidence × rankerWeight

This composite score is stored in CandidateView.categoryScore and used as a prior for Thompson Sampling during cold start.
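
As a small worked example of the formula, the sketch below multiplies each category's confidence by a ranker weight. The weights used in the usage note are invented for illustration, and falling back to a weight of 1.0 for categories without one is an assumption, not documented behavior. How the score is turned into a Thompson Sampling prior is not covered here.

```scala
object CategoryScoreSketch {
  // categoryScore = classifierConfidence × rankerWeight
  def categoryScore(classifierConfidence: Double, rankerWeight: Double): Double =
    classifierConfidence * rankerWeight

  // Score every classified category; a missing weight defaults to 1.0 (assumed).
  def scoreAll(
      confidences: Map[String, Double],
      rankerWeights: Map[String, Double]
  ): Map[String, Double] =
    confidences.map { case (id, confidence) =>
      id -> categoryScore(confidence, rankerWeights.getOrElse(id, 1.0))
    }
}
```

With hypothetical weights of 0.5 for "483" and 1.0 for "484", the scores become 0.92 × 0.5 = 0.46 and 0.85 × 1.0 = 0.85, so the category with the lower classifier confidence ends up ranked higher.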