AI Analyst Platform
Technical Documentation & Architecture Reference
market-analysis-451415
01 Architecture Overview

Pipeline Architecture

Eight source platforms deliver data files to a shared GCS storage bucket. From there, an automated three-stage pipeline processes each file through BigQuery — first capturing it as-is, then cleaning and transforming it, then rolling it into the analysis tables that power the AI Analyst.

Figure 1.1 — End-to-end data flow
Source platform (delivered reports)                          → raw target
Google Ads   (delivery report, match type report)            → raw_google_ads
Microsoft    (search delivery report)                        → raw_microsoft
Meta         (delivery report, performance report)           → raw_meta
Nexxen       (delivery report)                               → raw_nexxen
MTA          (media cost, conversions (P2C), final attribution) → raw_mta
Innovid      (creative distribution)                         → raw_innovid
Polk         (registrations)                                 → raw_polk
Toyota       (LeadView lead list)                            → raw_toyota
Files reach the GCS bucket by manual upload or automated export, in CSV, ZIP, or CSV.GZ format.
GCS Storage Bucket — tcaa_ne_data_ingestion
A single shared storage bucket organized by source path (e.g. google_ads/delivery/). Uploading any file here automatically kicks off the pipeline — no manual trigger needed.
The upload event is detected automatically. The file path is used to look up routing and configuration in ref.pipeline_sources. Metadata is written back to the GCS file to mark it as processed.
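The routing step can be sketched in plain Python: the upload event carries the object path, and its leading path segments select a configuration entry. The dictionary below stands in for ref.pipeline_sources; its keys, field names, and values are illustrative, not the real schema.

```python
# Hypothetical stand-in for ref.pipeline_sources: prefix -> routing config.
PIPELINE_SOURCES = {
    "google_ads/delivery": {"raw_table": "raw_google_ads.delivery", "format": "csv"},
    "mta/media_cost":      {"raw_table": "raw_mta.media_cost", "format": "csv.gz"},
}

def route(object_path: str) -> dict:
    """Match the longest configured prefix of the uploaded object's path."""
    parts = object_path.split("/")
    for depth in range(len(parts) - 1, 0, -1):
        prefix = "/".join(parts[:depth])
        if prefix in PIPELINE_SOURCES:
            return PIPELINE_SOURCES[prefix]
    raise KeyError(f"no pipeline_sources entry for {object_path}")
```

An upload to google_ads/delivery/2024-05.csv would resolve to the google_ads/delivery entry; an unrecognized path raises, which is the kind of failure that would surface in ref.pipeline_log.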
Stage 1 — Ingest
The file is read, column names are standardized to snake_case, and all values are loaded into BigQuery exactly as they arrived — no type-casting or transformation at this stage. ZIP and GZ files are decompressed automatically.
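The column standardization can be illustrated with a small helper; the exact normalization rules used by the pipeline are not documented here, so the rules below (camelCase splitting, punctuation collapsed to underscores) are an assumption.

```python
import re

def to_snake_case(col: str) -> str:
    """Normalize a delivered column header to snake_case (illustrative rules)."""
    col = col.strip()
    col = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", col)  # split camelCase boundaries
    col = re.sub(r"[^0-9a-zA-Z]+", "_", col)           # spaces/punctuation -> _
    return col.strip("_").lower()
```

For example, a header like "CampaignName" becomes campaign_name, and "Impr. (Top) %" becomes impr_top; the values in those columns are still loaded untouched, as strings.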
On successful ingest, a notification triggers Stage 2.
Raw tables
One table per source file. All values stored as plain text strings. Serves as the permanent record of exactly what each platform delivered.
Stage 2 — Clean & Transform
Values are cast to proper types (dates, integers, decimals). Market names are resolved against ref.dim_market. Source-specific logic is applied — e.g. parsing MTA placement strings, classifying Innovid creative types.
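A minimal sketch of the casting step, assuming BigQuery-style type names and a tolerant policy where empty or unparseable cells become NULL rather than aborting the load (the real pipeline's error handling may differ):

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

def cast_value(raw: str, target_type: str):
    """Cast a raw Stage 1 string to its Stage 2 type; None for empty/bad cells.
    Type names and accepted formats here are illustrative assumptions."""
    raw = (raw or "").strip()
    if raw == "":
        return None
    try:
        if target_type == "DATE":
            return datetime.strptime(raw, "%Y-%m-%d").date()
        if target_type == "INT64":
            return int(raw.replace(",", ""))
        if target_type == "NUMERIC":
            return Decimal(raw.replace(",", "").replace("$", ""))
    except (ValueError, InvalidOperation):
        return None
    return raw  # unknown types pass through as text
```

Source-specific logic (MTA placement parsing, Innovid creative classification) would sit alongside this generic casting, keyed off the same per-source configuration.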
Clean tables trigger Stage 3. The dependency map in ref.pipeline_analysis_dependencies determines which analysis tables need to be rebuilt based on which source just updated.
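The fan-out from a clean table to its dependent analysis tables is a reverse lookup over the dependency map. The mapping below mirrors the role of ref.pipeline_analysis_dependencies; the specific rows shown are an illustrative guess.

```python
# Hypothetical stand-in for ref.pipeline_analysis_dependencies:
# analysis table -> the set of clean tables it is built from.
DEPENDENCIES = {
    "analysis.media_delivery": {
        "clean_google_ads.delivery", "clean_microsoft.delivery",
        "clean_meta.delivery", "clean_nexxen.delivery",
    },
}

def tables_to_rebuild(updated_clean_table: str) -> list:
    """Return every analysis table whose inputs include the updated table."""
    return sorted(t for t, deps in DEPENDENCIES.items()
                  if updated_clean_table in deps)
```

So a refresh of clean_meta.delivery would queue a rebuild of analysis.media_delivery, while a source with no downstream dependents triggers nothing.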
Stage 3 — Analysis
Cross-source joins produce the final reporting tables used by the AI Analyst. Each table is rebuilt in full whenever its upstream dependencies update.
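In spirit, a Stage 3 rebuild is a join of clean source rows against shared dimensions. The toy example below joins platform spend rows to ref.dim_market-style rows on a market key; every field name is illustrative, not the real analysis.media_delivery schema.

```python
# Clean-layer rows from two platforms (illustrative shapes).
spend = [
    {"date": "2024-05-01", "market_id": 7, "platform": "meta",       "spend": 125.40},
    {"date": "2024-05-01", "market_id": 7, "platform": "google_ads", "spend": 98.10},
]
# Dimension rows, keyed by market_id (stands in for ref.dim_market).
markets = {7: {"market_name": "Boston"}}

# Full rebuild: every output row is recomputed from its inputs.
media_delivery = [
    {**row, **markets[row["market_id"]]}
    for row in spend
    if row["market_id"] in markets
]
```

Because each analysis table is rebuilt in full rather than incrementally appended, a corrected source file only needs to be re-uploaded for its downstream tables to come out consistent.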

Current outputs:
analysis.media_delivery

Additional tables for registrations, search performance, and attribution are planned and will be added as their source data pipelines are completed.
AI Analyst
Queries the analysis dataset to answer questions about campaign performance, market trends, attribution, and more.
Audit trail — ref.pipeline_log
Every step across all three stages is logged here: the source file, target table, timestamp, row count, status, and any error message. In addition, metadata is written back to the GCS file itself to mark it as processed. If something fails at any point in the pipeline, this is the first place to check.
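A log row carrying the fields listed above might look like the following; the exact column names in ref.pipeline_log are not specified here, so this dataclass is an assumed shape for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class PipelineLogRow:
    """One row in ref.pipeline_log (field names are an illustrative guess)."""
    source_file: str
    target_table: str
    stage: str                 # "ingest" / "clean" / "analysis"
    status: str                # e.g. "ok" / "error"
    row_count: int = 0
    error_message: str = ""
    logged_at: str = field(default="")

    def __post_init__(self):
        # Stamp the row at creation time if no timestamp was supplied.
        if not self.logged_at:
            self.logged_at = datetime.now(timezone.utc).isoformat()
```

Filtering this table by source_file reconstructs a single upload's journey through all three stages, which is why it is the first stop when debugging a failed run.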

All data remains within GCP (us-east1). The pipeline is fully event-driven — uploading a file is the only action required. Manual rebuilds of any analysis table are available via the corresponding manual_refresh_[table name] saved query in BigQuery.