Data Provider Guide
This guide explains how data providers publish signal definitions via adagents.json signals[], enabling AI agents to discover, verify authorization, and activate signals for advertising campaigns.
The Problem
Data providers (Pinnacle Data, Meridian Analytics, Apex Segments, etc.) own valuable audience and contextual data, but integrating with the growing ecosystem of AI-powered advertising agents presents challenges:
- Discovery is fragmented. Each signals agent (Luminary Data, Nova DSP, etc.) needs custom integrations to know what signals you offer. There’s no standard way for an AI agent to ask “what automotive purchase intent signals does Pinnacle Data have?”
- Authorization is opaque. When a buyer receives a signal from a signals agent, they can’t verify that the agent is actually authorized to resell it. They have to trust the intermediary.
- Signal semantics are inconsistent. Without standardized definitions, an AI agent can’t know whether “auto_intenders” is a binary segment, a propensity score, or a multi-value category, which makes it impossible to construct proper targeting expressions.
- Scaling requires N×M integrations. Every data provider needs custom integrations with every signals agent. This doesn’t scale.
The Solution
Published signal definitions solve these problems by letting data providers publish machine-readable descriptions of their signals at a well-known URL. This enables:
- Discovery: AI agents can find signals via natural language (“find automotive purchase intent signals”) or structured lookup
- Authorization verification: Buyers can verify authorization by checking the data provider’s domain directly
- Typed targeting: Signal definitions include value types (binary, categorical, numeric) so agents can construct correct targeting expressions
- Scalable partnerships: Authorize agents once in adagents.json; as you add signals, authorized agents automatically have access
Overview
Data providers own audience and contextual data (purchase intent, demographics, behavioral segments). The signals[] publishing model lets you publish your signals in a standardized format that:
- Enables discovery via natural language queries
- Provides authorization verification for agents
- Describes signal characteristics (binary, categorical, numeric)
- Supports tag-based grouping for efficient authorization
The Parallel Pattern
| Publishers | Data Providers |
|---|---|
| Declare properties (websites, apps) | Declare signals (audiences, segments) |
| Authorize agents to sell inventory | Authorize agents to resell signals |
| Use property_ids / property_tags | Use signal_ids / signal_tags |
| Buyers verify via publisher_domain | Buyers verify via data_provider_domain |
Both use /.well-known/adagents.json as the publishing mechanism. A single adagents.json file can declare both properties and signals simultaneously; see Unified declaration model.
File Location
Data providers host their published signal definitions at /.well-known/adagents.json on their own domain.
Basic Structure
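As a minimal sketch of the overall shape, the Python below assembles a skeleton document using only the top-level keys this guide names (signals, signal_tags, authorized_agents, last_updated). The values, the agent URL, the url key on each authorized_agents entry, and the name/description keys inside signal_tags are illustrative assumptions, not the normative schema.

```python
import json

# Hypothetical skeleton: top-level keys come from this guide; all values,
# the "url" key, and the signal_tags entry shape are assumptions.
adagents = {
    "signals": [
        {
            "id": "auto_intenders",
            "name": "Auto Intenders",
            "value_type": "binary",
            "tags": ["automotive"],
        },
    ],
    "signal_tags": {
        "automotive": {
            "name": "Automotive",
            "description": "Vehicle shopping signals",
        },
    },
    "authorized_agents": [
        {"url": "https://agent.example", "signal_tags": ["automotive"]},
    ],
    "last_updated": "2025-01-01T00:00:00Z",
}

# Serialize the way the file would be served from /.well-known/adagents.json.
document = json.dumps(adagents, indent=2)
print(document)
```

Building the file from a data structure rather than by hand makes it easier to run consistency checks before publishing.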
Signal Definition
Each signal in the signals array describes a targetable segment.
The signal definition is not itself a signal_ref. In adagents.json, the hosting domain supplies the namespace, so the definition only needs its local id. A media-buy product that makes this provider-published signal selectable points back to it with signal_ref: { "scope": "data_provider", "data_provider_domain": "<this adagents.json domain>", "signal_id": "<signals[].id>" }. If a seller publishes its own signals[], the seller can be the data-provider domain for these references; scope: "data_provider" means adagents.json-resolved, not necessarily third-party.
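The relationship between a local definition and the signal_ref that points back to it can be sketched as follows. The helper name make_signal_ref and the sample definition are hypothetical; the three keys of the reference are the ones given in the paragraph above.

```python
def make_signal_ref(data_provider_domain: str, signal: dict) -> dict:
    """Build a data_provider-scoped signal_ref for one signals[] entry.

    The definition carries only its local id; the hosting domain of the
    adagents.json file supplies the namespace.
    """
    return {
        "scope": "data_provider",
        "data_provider_domain": data_provider_domain,
        "signal_id": signal["id"],
    }


# Hypothetical definition as it might appear in signals[].
definition = {"id": "auto_intenders", "name": "Auto Intenders", "value_type": "binary"}
ref = make_signal_ref("pinnacle-data.example", definition)
```

A seller that publishes its own signals[] would pass its own publishing domain as data_provider_domain.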
Required Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier within your published signals[]. Pattern: ^[a-zA-Z0-9_-]+$ |
| name | string | Human-readable signal name |
| value_type | enum | Data type: binary, categorical, or numeric |
Optional Fields
| Field | Type | Description |
|---|---|---|
| description | string | Detailed description of what this signal represents |
| tags | array | Tags for grouping (lowercase, alphanumeric: ^[a-z0-9_-]+$) |
| allowed_values | array | For categorical signals: valid values |
| range | object | For numeric signals: { min, max, unit } |
| restricted_attributes | array | Restricted attribute categories this signal touches (e.g., ["health_data"]). Enables structural governance matching. |
| policy_categories | array | Policy categories this signal is sensitive for (e.g., ["children_directed"]). Enables structural governance matching. |
| taxonomy | object | External taxonomy metadata for discovery, review, and translation. Does not change value_type or package targeting grammar. |
| data_sources | array | IAB Data Transparency Standard-aligned source categories used to compile the signal. |
| methodology | enum | How membership or attribute values were determined: observed, declared, derived, inferred, or modeled. |
| modeling | object | Required when methodology is modeled or audience_expansion is true. Includes seed source, training jurisdictions, AI risk class, and disclosure notes. |
| countries | array | ISO 3166-1 alpha-2 countries where the signal is applicable. |
| consent_basis | array | GDPR Article 6 lawful basis or consent basis for processing this signal’s data. Use countries, policy_categories, and disclosure fields for non-GDPR obligations. |
| data_subject_rights | object | Per-signal rights-routing channels and commitments for access, erasure, objection, and related requests. Use this when routing or SLA varies by signal, upstream source, or custom/private signal context. |
Definition Enrichment
Use enrichment fields when buyers, sellers, governance agents, or regulators need more than a name and value type:
| Purpose | Fields | When to use |
|---|---|---|
| Discovery and taxonomy | taxonomy, taxonomy.values, taxonomy.value_mappings, taxonomy.parent_match_behavior | The signal maps to an external audience, content, retail-media, or provider-owned taxonomy. For categorical signals, value_mappings maps each package-targeting allowed_values[] string to a stable taxonomy node; each mapping value should match an allowed_values[] entry. |
| Source and methodology | data_sources, methodology, segmentation_criteria, criteria_url, refresh_cadence, lookback_window, onboarder | The provider wants DTS-style transparency. onboarder is required for offline or public-record sources. |
| Modeling and disclosure | modeling, audience_expansion, device_expansion | Required when methodology is modeled or audience_expansion is true. device_expansion is for deterministic cross-device expansion; probabilistic cross-device inference should use modeling. modeling.training_data_jurisdictions must be non-empty. If modeling.disclosure.required is true, modeling.disclosure.jurisdictions is required. |
| Privacy and governance | countries, consent_basis, restricted_attributes, policy_categories, art9_basis | The signal may be jurisdictionally limited, policy-sensitive, or subject to special-category review. |
| Rights routing | data_subject_rights | Rights requests route differently by signal or upstream source. Each routing block must include at least one channel supporting one or more of access, erasure, or objection. response_sla_days is the maximum response time for the declared channels. Signal definitions do not declare Global Privacy Control support, and consumers must not infer GPC handling from this routing block. |
When parent_match_behavior is descendants_supported, the seller may expand known, version-pinned parent nodes internally, typically by ORing children in its execution system. Package targeting still uses the signal’s declared value_type.
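The modeling rules in the table above lend themselves to a mechanical check. The sketch below covers only the three conditions stated there; the function name modeling_errors is hypothetical, and a real validator would follow the published schema rather than this approximation.

```python
def modeling_errors(signal: dict) -> list[str]:
    """Check the modeling-related rules stated in the enrichment table."""
    errors = []
    needs_modeling = (
        signal.get("methodology") == "modeled"
        or signal.get("audience_expansion") is True
    )
    modeling = signal.get("modeling")

    # Rule 1: modeling is required for modeled or expanded signals.
    if needs_modeling and modeling is None:
        errors.append("modeling is required")

    if modeling is not None:
        # Rule 2: training jurisdictions must be listed.
        if not modeling.get("training_data_jurisdictions"):
            errors.append("modeling.training_data_jurisdictions must be non-empty")
        # Rule 3: required disclosure must name its jurisdictions.
        disclosure = modeling.get("disclosure") or {}
        if disclosure.get("required") is True and not disclosure.get("jurisdictions"):
            errors.append("modeling.disclosure.jurisdictions is required")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a publisher fix every issue in one pass.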
Publishing large definition resources
Provider-published adagents.json is the authoritative definition surface, but
large external resources do not need to be duplicated in every get_signals
listing. For large taxonomies, long segmentation criteria, or
jurisdiction-specific disclosure text, publish stable pointers such as taxonomy.ref,
criteria_url, and modeling.disclosure.jurisdictions[].disclosure_url; use
taxonomy.etag when the taxonomy has its own freshness validator. For
provider-published public signals, the signal_ref already points to the
authoritative definition: fetch the provider’s /.well-known/adagents.json
(following authoritative_location when present) and select the matching
signals[].id. Buyers cache the fetched adagents.json document by resolved
authoritative URL plus the existing catalog_etag, HTTP ETag/Last-Modified,
or a bounded TTL; clients must not key signal-definition caches by validator
alone. No signal-specific validator is required. Signal agents can then return
compact discovery listings while buyers fetch the full definition only when they
need the deeper review context.
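One way to honor the caching rule above is to key entries by the resolved authoritative URL and treat the validator as a freshness check on that entry, never as the key itself. The class below is a minimal in-memory sketch under that assumption; AdagentsCache, its method names, and the default TTL are hypothetical, and fetching the document is out of scope.

```python
import time


class AdagentsCache:
    """Cache adagents.json documents keyed by resolved authoritative URL."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds
        # resolved_url -> (validator, fetched_at, document)
        self._entries = {}

    def put(self, resolved_url, validator, document, now=None):
        fetched_at = time.time() if now is None else now
        self._entries[resolved_url] = (validator, fetched_at, document)

    def get(self, resolved_url, current_validator=None, now=None):
        entry = self._entries.get(resolved_url)
        if entry is None:
            return None  # never cached for this URL, whatever the validator
        validator, fetched_at, document = entry
        now = time.time() if now is None else now
        if now - fetched_at > self.ttl_seconds:
            return None  # bounded TTL expired
        if current_validator is not None and current_validator != validator:
            return None  # catalog_etag or HTTP ETag changed
        return document
```

Because two providers can emit the same validator string, a lookup for a different URL misses even when the validator matches.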
Signal Value Types
Binary Signals
User either matches or doesn’t. Most common type.
Categorical Signals
User has one of several possible values.
Numeric Signals
User has a score or measurement within a range.
Authorization Patterns
Pattern 1: Signal IDs (Direct References)
Authorize specific signals by ID. Here, signal_ids means local catalog IDs from this adagents.json signals[] array, not the deprecated SignalId object used by older Signals Protocol payloads.
Pattern 2: Signal Tags (Efficient Grouping)
Authorize all signals with certain tags.
Signal Tags
The signal_tags object provides metadata for tags used in signals:
- Human-readable context for buyers exploring your signal definitions
- Enables efficient authorization (“all premium signals”)
- Groups related signals for easier discovery
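The two authorization patterns can be compared by expanding each authorized_agents entry into the set of signal IDs it covers. This is a sketch under assumptions: the entry is taken to carry optional signal_ids and signal_tags lists, the url values are placeholders, and the function name is hypothetical.

```python
def authorized_signal_ids(agent_entry: dict, signals: list[dict]) -> set[str]:
    """Expand one authorized_agents entry into the signal IDs it covers."""
    by_id = set(agent_entry.get("signal_ids", []))
    by_tag = set(agent_entry.get("signal_tags", []))
    covered = set()
    for signal in signals:
        # Pattern 1: direct reference by local ID.
        # Pattern 2: any shared tag grants access.
        if signal["id"] in by_id or by_tag & set(signal.get("tags", [])):
            covered.add(signal["id"])
    return covered


# Hypothetical catalog and entries.
signals = [
    {"id": "auto_intenders", "tags": ["automotive", "premium"]},
    {"id": "luxury_auto", "tags": ["automotive"]},
    {"id": "new_parents", "tags": ["life_events"]},
]
id_entry = {"url": "https://agent-a.example", "signal_ids": ["new_parents"]}
tag_entry = {"url": "https://agent-b.example", "signal_tags": ["automotive"]}
```

With tag-based authorization, a newly published signal tagged automotive is covered without editing the agent entry, which is the scaling benefit described above.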
How Buyers Use Your Signal Definitions
1. Discovery
Buyers call get_signals on a signals agent. The agent may use your published definitions for:
- Natural language matching (“find automotive purchase intent signals”)
- Structured lookup by signal_ref
2. Authorization Verification
When a buyer receives a signal, they can verify authorization. The buyer fetches https://pinnacle-auto-data.com/.well-known/adagents.json and checks:
- Does the signal exist in the signals array?
- Is the signals agent in authorized_agents?
- Does the authorization cover this signal (by ID or tag)?
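The three checks above can be sketched as a single function over an already fetched and parsed document. Matching agents by a url key is an assumption about the entry shape, and verify_authorization is a hypothetical name; fetching and following authoritative_location are left out.

```python
def verify_authorization(adagents: dict, agent_url: str, signal_id: str) -> tuple[bool, str]:
    """Run the three buyer-side checks against a parsed adagents.json."""
    # Check 1: the signal exists in the signals array.
    signal = next(
        (s for s in adagents.get("signals", []) if s.get("id") == signal_id), None
    )
    if signal is None:
        return False, "signal not published"

    # Check 2: the signals agent appears in authorized_agents.
    entries = [
        a for a in adagents.get("authorized_agents", []) if a.get("url") == agent_url
    ]
    if not entries:
        return False, "agent not authorized"

    # Check 3: some entry covers this signal by ID or by tag.
    tags = set(signal.get("tags", []))
    for entry in entries:
        if signal_id in entry.get("signal_ids", []) or tags & set(entry.get("signal_tags", [])):
            return True, "authorized"
    return False, "authorization does not cover signal"
```

Returning a reason alongside the boolean helps a buyer distinguish a stale catalog from a missing partnership.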
3. Targeting
Based on value_type, buyers construct targeting expressions.
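To illustrate how value_type drives the expression, the sketch below builds one expression per type and rejects values that the definition does not allow. The output keys (value, values, min_value, max_value) and the function name are assumptions made for illustration; consult the targeting schema for the actual grammar.

```python
def build_targeting(signal: dict, *, values=None, min_value=None, max_value=None) -> dict:
    """Build a hypothetical targeting expression from a signal definition."""
    value_type = signal["value_type"]
    expr = {"signal_id": signal["id"], "value_type": value_type}

    if value_type == "binary":
        expr["value"] = True  # user matches the segment
    elif value_type == "categorical":
        allowed = set(signal.get("allowed_values", []))
        unknown = [v for v in (values or []) if v not in allowed]
        if not values or unknown:
            raise ValueError(f"values must be a non-empty subset of allowed_values: {unknown}")
        expr["values"] = list(values)
    elif value_type == "numeric":
        declared = signal.get("range", {})
        low = declared.get("min") if min_value is None else min_value
        high = declared.get("max") if max_value is None else max_value
        if low is None or high is None or low > high:
            raise ValueError("numeric targeting needs min <= max")
        below = "min" in declared and low < declared["min"]
        above = "max" in declared and high > declared["max"]
        if below or above:
            raise ValueError("bounds fall outside the declared range")
        expr["min_value"], expr["max_value"] = low, high
    else:
        raise ValueError(f"unknown value_type: {value_type}")
    return expr
```

Omitted numeric bounds default to the declared range, so passing only min_value=70 on a 0 to 100 score targets 70 through 100.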
Local and External Signal References
Not all signals come from third-party data providers. Sellers and publishers can also define their own signals - custom models, first-party data, or composite segments - the same way they define other named resources in adagents.json. When those signals are meant to be portable and externally referenceable, publish them in the signals array and reference them from products with scope: "data_provider" using the seller’s own publishing domain. When they are only meaningful within one product/package context, expose them only as product-local signal_targeting_options with scope: "product".
Use signal_ref for get_signals, media-buy product targeting, and package-level buy-time targeting:
| Reference | Meaning |
|---|---|
| { "scope": "product", "signal_id": "high_intent_shoppers" } | Product-local signal option, meaningful inside the selected product/package context |
| { "scope": "data_provider", "data_provider_domain": "pinnacle-data.example", "signal_id": "auto_intenders" } | Signal defined in a data provider’s published adagents.json signals[] |
| { "scope": "signal_source", "signal_source_url": "https://signals.example/.well-known/adcp/signals", "signal_id": "custom_model_run_123" } | Source-native signal not published in upstream adagents.json signals[] |
The legacy signal_id.source shape is deprecated and retained only for backwards compatibility with older Signals Protocol clients. New discovery, activation, and media-buy targeting surfaces use signal_ref. For product-local signals exposed on both get_products and a product-contextual get_signals response, signal_ref.signal_id MUST match across both surfaces.
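The three reference shapes in the table above differ only in which locator key accompanies signal_id. A small shape check can be sketched as follows; treating any extra key as invalid is an assumption of this sketch, not a stated rule, and the names are hypothetical.

```python
# Keys each scope carries, taken from the reference table above.
REQUIRED_KEYS = {
    "product": {"scope", "signal_id"},
    "data_provider": {"scope", "data_provider_domain", "signal_id"},
    "signal_source": {"scope", "signal_source_url", "signal_id"},
}


def is_valid_signal_ref(ref: dict) -> bool:
    """Return True when ref has exactly the keys its scope calls for."""
    scope = ref.get("scope")
    if not isinstance(scope, str):
        return False
    required = REQUIRED_KEYS.get(scope)
    return required is not None and set(ref) == required
```

A check like this catches the common mistake of sending a data_provider reference without its data_provider_domain.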
Complete Example
A full adagents.json signal definition set for an automotive data provider:
Location data provider example
A geo/mobility provider’s published signal definitions use the same structure but with location-specific signals. Here’s the signals array for a provider publishing foot traffic and mobility data:
The three value types map naturally onto location data: binary for yes/no store visitation, numeric for visit frequency with a meaningful range, and categorical for classified mobility behavior.
Identity / demographic provider example
An identity company’s published signal definitions can include consumer segments derived from financial records, surveys, and public data. Note: these are targeting segments, not raw data. Credit-derived signals may carry regulatory obligations (FCRA); consult your compliance team before publishing.
Retail media provider example
Retailers have first-party purchase data that doubles as high-value targeting signals. A retail media network can publish signals alongside its properties in the same adagents.json:
Validation
Use the AdAgents.json Builder to validate your published signal definitions, or validate programmatically:
- Required fields (id, name, value_type for each signal)
- ID patterns (alphanumeric with underscores/hyphens)
- Tag consistency (tags used in signals should be defined in signal_tags)
- Authorization references (signal_ids/signal_tags should reference existing signals/tags)
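The four checks listed above can be approximated in a few dozen lines. The sketch below uses the ID and tag patterns from the field tables and assumes signal_tags is an object keyed by tag name; validate_signals is a hypothetical name, and this is not a substitute for schema validation.

```python
import re

ID_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")
TAG_PATTERN = re.compile(r"^[a-z0-9_-]+$")
VALUE_TYPES = {"binary", "categorical", "numeric"}


def validate_signals(adagents: dict) -> list[str]:
    """Return human-readable problems found in a parsed adagents.json."""
    errors = []
    defined_tags = set(adagents.get("signal_tags", {}))
    known_ids = set()

    for i, signal in enumerate(adagents.get("signals", [])):
        where = f"signals[{i}]"
        # Required fields.
        for field in ("id", "name", "value_type"):
            if field not in signal:
                errors.append(f"{where}: missing required field {field}")
        # ID pattern.
        signal_id = signal.get("id")
        if isinstance(signal_id, str):
            known_ids.add(signal_id)
            if not ID_PATTERN.fullmatch(signal_id):
                errors.append(f"{where}: id does not match the ID pattern")
        if "value_type" in signal and signal["value_type"] not in VALUE_TYPES:
            errors.append(f"{where}: unknown value_type")
        # Tag consistency.
        for tag in signal.get("tags", []):
            if not TAG_PATTERN.fullmatch(tag):
                errors.append(f"{where}: tag {tag!r} does not match the tag pattern")
            if tag not in defined_tags:
                errors.append(f"{where}: tag {tag!r} is not defined in signal_tags")

    # Authorization references.
    for j, agent in enumerate(adagents.get("authorized_agents", [])):
        where = f"authorized_agents[{j}]"
        for signal_id in agent.get("signal_ids", []):
            if signal_id not in known_ids:
                errors.append(f"{where}: unknown signal id {signal_id!r}")
        for tag in agent.get("signal_tags", []):
            if tag not in defined_tags:
                errors.append(f"{where}: unknown tag {tag!r}")
    return errors
```

An empty list means the document passed these four checks; running it in CI before deploying the file is one way to avoid publishing dangling references.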
Best Practices
1. Use Descriptive IDs
2. Provide Complete Metadata
Include description so buyers understand what each signal represents.
3. Use Tags for Scalability
As your signal set grows, tags enable efficient authorization without listing every signal ID.
4. Document Value Types Clearly
For categorical signals, always include allowed_values. For numeric signals, include range with unit.
5. Keep Files Updated
Update the last_updated timestamp when signals change. Buyers cache these files, so stale data causes authorization failures.
Declaring governance metadata
Signal definitions support two optional fields that enable structural governance matching: restricted_attributes and policy_categories. When declared, governance agents can match signals against a campaign plan’s restrictions deterministically instead of relying on semantic inference from signal names.
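Deterministic matching of this kind reduces to set intersection. In the sketch below, the campaign plan is assumed to list its restrictions under the same two field names as the signal; that plan shape and both function names are hypothetical.

```python
def governance_conflicts(signal: dict, plan: dict) -> dict:
    """Return declared values that collide with the plan's restrictions."""
    return {
        field: sorted(set(signal.get(field, [])) & set(plan.get(field, [])))
        for field in ("restricted_attributes", "policy_categories")
    }


def is_blocked(signal: dict, plan: dict) -> bool:
    """A signal is blocked when any declared value is restricted by the plan."""
    return any(governance_conflicts(signal, plan).values())


# Hypothetical plan and signals.
plan = {"restricted_attributes": ["health_data"], "policy_categories": ["children_directed"]}
health_signal = {"id": "condition_interest", "restricted_attributes": ["health_data"]}
auto_signal = {"id": "auto_intenders"}
```

No description text is inspected here, which is the point of declaring the metadata: the decision depends only on the declared fields.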
restricted_attributes
Declare which GDPR Article 9 special categories of personal data a signal touches. Values: racial_ethnic_origin, political_opinions, religious_beliefs, trade_union_membership, health_data, sex_life_sexual_orientation, genetic_data, biometric_data.
When a signal declares restricted_attributes: ["health_data"] and the campaign plan restricts that category, a governance agent blocks this signal without needing to interpret the description.
policy_categories
Declare which policy categories a signal is sensitive for. Policy categories group related regulatory regimes; for example, children_directed covers COPPA, UK AADC, and GDPR Article 8. Values are registry-defined category IDs.
Combining both fields
A signal can declare both when it touches restricted personal data and is relevant to a specific regulatory regime.
Relationship to the Policy Registry
Signal definitions declare policy_categories and restricted_attributes using the same vocabulary as the Policy Registry. These fields enable governance agents to match signal metadata against policy entries during campaign validation.
| Signal field | Registry equivalent | Purpose |
|---|---|---|
| policy_categories | policy_categories on policy entries | Declares which regulatory regimes the signal touches (e.g., children_directed, health_wellness) |
| restricted_attributes | restricted_attributes on policy categories | Declares which GDPR Article 9 special categories the signal touches (e.g., health_data, racial_ethnic_origin) |
See the policy category definitions for valid policy_categories values and the restricted attribute definitions for valid restricted_attributes values.
Integration with get_adcp_capabilities
Signal agents advertise available data providers via get_adcp_capabilities:
Next Steps
- Create your adagents.json with your published signal definitions
- Host at /.well-known/adagents.json on your domain
- Validate using the AdAgents.json Builder
- Partner with signals agents who will resell your data
- Add agents to authorized_agents as partnerships are established
Related Documentation
- Signals Protocol Overview - How signals work in AdCP
- get_signals Task - Signal discovery API
- activate_signal Task - Signal activation API
- adagents.json Tech Spec - Full adagents.json reference (property-focused)