Phase 1: Page Classification

Before any auction can run, the system must understand what a page is about. Page classification maps URLs to IAB Content Taxonomy 2.1 categories with confidence scores using LLM-based analysis.

Two Taxonomies, One Match

Promovolve uses two distinct IAB taxonomies that meet at auction time:

| Taxonomy | Version | Who sets it | Purpose |
|---|---|---|---|
| Ad Product Taxonomy | 2.0 | Advertiser | “What is my product?” (e.g., Travel, Kitchen Equipment) |
| Content Taxonomy | 2.1 | LLM classifier | “What is this page about?” (e.g., Destinations, Outdoor Recreation) |

The advertiser never sees content categories. They pick their product category, and ContentToAdProductMapping derives the matching content categories using the official IAB mapping file (content_2.1_to_ad_product_2.0.tsv). If no direct mapping exists for a product category, the system walks up the taxonomy’s parent chain until it finds one.
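
The setup-time derivation can be sketched as follows. This is a minimal illustration under stated assumptions, not Promovolve's `ContentToAdProductMapping`: the TSV column names (`content_id`, `ad_product_id`), the `parent_of` dictionary, and every ID used are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Optional, Set


def load_mapping(tsv_text: str) -> Dict[str, Set[str]]:
    """Build ad-product ID -> set of content-category IDs from a TSV.

    ASSUMPTION: columns named 'content_id' and 'ad_product_id'; the real
    content_2.1_to_ad_product_2.0.tsv may use different headers.
    """
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        mapping[row["ad_product_id"]].add(row["content_id"])
    return dict(mapping)


def derive_content_categories(
    product_id: str,
    mapping: Dict[str, Set[str]],
    parent_of: Dict[str, Optional[str]],
) -> Set[str]:
    """Return content categories for product_id, falling back to the nearest
    ancestor that has a mapping; empty set if no ancestor is mapped."""
    seen: Set[str] = set()
    current: Optional[str] = product_id
    while current is not None and current not in seen:
        if current in mapping:
            return set(mapping[current])
        seen.add(current)  # guard against cycles in malformed parent data
        current = parent_of.get(current)  # None once we pass the root
    return set()


# Hypothetical IDs: "ap-11" has no row of its own, so its parent "ap-10" is used.
TSV = "content_id\tad_product_id\nc-100\tap-10\nc-101\tap-10\n"
mapping = load_mapping(TSV)
parent_of = {"ap-11": "ap-10", "ap-10": None}
print(sorted(derive_content_categories("ap-11", mapping, parent_of)))
# ['c-100', 'c-101']
```

Because the walk happens once per campaign, its cost never reaches the bid path.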

At auction time, matching is exact: the page’s content category must be in the campaign’s derived content category set. There is no fuzzy or hierarchical matching at bid time; the hierarchy is resolved once, at campaign setup.
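
At bid time the check reduces to set membership. The sketch below assumes campaigns are held as a dictionary from a hypothetical campaign ID to its derived content-category set; the category IDs are borrowed from the Classification Output example.

```python
from typing import Dict, FrozenSet, List


def eligible_campaigns(
    page_category: str, campaigns: Dict[str, FrozenSet[str]]
) -> List[str]:
    """Return IDs of campaigns whose derived content-category set contains
    the page's category. Plain set membership: no fuzzy or parent/child
    matching happens here, because the hierarchy was resolved at setup."""
    return [cid for cid, cats in campaigns.items() if page_category in cats]


# Hypothetical campaign IDs.
campaigns = {
    "camp-sports": frozenset({"483"}),
    "camp-other": frozenset({"393"}),
}
print(eligible_campaigns("483", campaigns))  # ['camp-sports']
print(eligible_campaigns("484", campaigns))  # [] (no hierarchical fallback)
```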

Classification Pipeline

Promovolve supports multiple LLM providers for classification, configured in application.conf:

| Provider | Config Key | Env Var |
|---|---|---|
| Gemini | `promovolve.gemini.api-key` | `GEMINI_API_KEY` |
| OpenAI | `promovolve.openai.api-key` | `OPENAI_API_KEY` |
| Anthropic | `promovolve.anthropic.api-key` | `ANTHROPIC_API_KEY` |

Gemini is enabled by default (`promovolve.gemini.enabled = true`).
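
A rough sketch of how the keys in the provider table might be resolved. Several things here are assumptions rather than documented behavior: `config` is a flat dictionary of dotted keys standing in for application.conf, the environment variable wins over the config key, and providers other than Gemini are treated as disabled unless a hypothetical `promovolve.<name>.enabled` key is set.

```python
import os
from typing import Any, Dict, Mapping

# Config keys and env vars as listed in the provider table.
PROVIDERS = {
    "gemini": ("promovolve.gemini.api-key", "GEMINI_API_KEY"),
    "openai": ("promovolve.openai.api-key", "OPENAI_API_KEY"),
    "anthropic": ("promovolve.anthropic.api-key", "ANTHROPIC_API_KEY"),
}


def resolve_providers(
    config: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> Dict[str, str]:
    """Return provider name -> API key for each enabled provider with a key.

    ASSUMPTIONS: flat dotted-key config; env var takes precedence over the
    config key; only Gemini defaults to enabled (the one documented default).
    """
    resolved: Dict[str, str] = {}
    for name, (key_path, env_var) in PROVIDERS.items():
        enabled = config.get(f"promovolve.{name}.enabled", name == "gemini")
        if not enabled:
            continue
        api_key = env.get(env_var) or config.get(key_path)
        if api_key:
            resolved[name] = str(api_key)
    return resolved


print(resolve_providers({"promovolve.gemini.api-key": "cfg-key"}, env={}))
# {'gemini': 'cfg-key'}
```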

Classification Output

The LLM returns category IDs, which are normalized to IAB Content Taxonomy 2.1 numeric IDs. Legacy IAB 1.0 IDs (e.g., "IAB17") are converted via TieredCategory.normalize() to their 2.1 equivalents (e.g., "483"). The result is a map from category ID to confidence:

```json
{
  "url": "https://example.com/sports/nba-finals-recap",
  "categories": {
    "483": 0.92,
    "484": 0.85,
    "393": 0.45
  }
}
```

Each Confidence value is an opaque Double in [0, 1]. All downstream matching uses these numeric Content Taxonomy 2.1 IDs.
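
The normalization step can be sketched like this. It is an illustration, not TieredCategory.normalize(): the lookup table contains only the IAB17 to 483 pair given above, and dropping unknown IDs and keeping the higher confidence when two raw IDs collapse to the same 2.1 ID are assumptions of the sketch.

```python
from typing import Dict, Optional

# HYPOTHETICAL legacy table: only IAB17 -> 483 comes from the docs; a real
# table would cover every IAB 1.0 ID.
LEGACY_TO_2_1 = {"IAB17": "483"}


def normalize(category_id: str) -> Optional[str]:
    """Map a raw LLM category ID to a Content Taxonomy 2.1 numeric ID."""
    cid = category_id.strip()
    if cid.isdigit():  # already a numeric 2.1 ID
        return cid
    return LEGACY_TO_2_1.get(cid.upper())  # None when unknown


def to_category_map(raw: Dict[str, float]) -> Dict[str, float]:
    """Normalize IDs, drop unknown ones, and reject confidences outside [0, 1]."""
    out: Dict[str, float] = {}
    for raw_id, confidence in raw.items():
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence {confidence} for {raw_id!r} not in [0, 1]")
        cid = normalize(raw_id)
        if cid is not None:
            # Two raw IDs can collapse to one 2.1 ID; keep the higher confidence.
            out[cid] = max(confidence, out.get(cid, 0.0))
    return out


print(to_category_map({"IAB17": 0.92, "484": 0.85, "393": 0.45}))
# {'483': 0.92, '484': 0.85, '393': 0.45}
```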

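Role in Scoring

The classifier's confidence for a category is multiplied by a ranker weight to give a composite category score:

categoryScore = classifierConfidence × rankerWeight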

This composite score is stored in CandidateView.categoryScore and used as a prior for Thompson Sampling during cold start.
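
The sketch below shows the score computation and one plausible way a score could seed a Beta-Bernoulli Thompson Sampling prior. The Beta pseudo-count mapping, the `strength` parameter, the clamping to [0, 1], and the 0.5 ranker weight are all hypothetical; the documentation does not say how CandidateView.categoryScore is turned into a prior.

```python
import random
from typing import Dict, Tuple


def category_score(classifier_confidence: float, ranker_weight: float) -> float:
    """categoryScore = classifierConfidence × rankerWeight."""
    return classifier_confidence * ranker_weight


def beta_prior(score: float, strength: float = 10.0) -> Tuple[float, float]:
    """HYPOTHETICAL: turn a score into Beta(alpha, beta) pseudo-counts.

    The score is clamped to [0, 1]; `strength` is how many pseudo-observations
    the prior is worth before real click data takes over.
    """
    s = min(max(score, 0.0), 1.0)
    return 1.0 + strength * s, 1.0 + strength * (1.0 - s)


def thompson_pick(priors: Dict[str, Tuple[float, float]], rng: random.Random) -> str:
    """Thompson Sampling: draw once from each candidate's Beta and take the max."""
    return max(priors, key=lambda c: rng.betavariate(*priors[c]))


rng = random.Random(7)
priors = {
    "ad-a": beta_prior(category_score(0.92, 0.5)),  # score ≈ 0.46
    "ad-b": beta_prior(category_score(0.45, 0.5)),  # score ≈ 0.225
}
print(thompson_pick(priors, rng))  # random; favors "ad-a" without being certain
```

With no observations the posterior equals the prior, so the higher-scoring candidate is sampled more often but not always. As impressions accumulate, clicks would be added to alpha and non-clicks to beta, so the prior's influence fades.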

Promovolve: Ad Auction Algorithms & Architecture