Pipeline Architecture
Eight source platforms deliver data files to a shared GCS storage bucket. From there, an automated three-stage pipeline processes each file through BigQuery — first capturing it as-is, then cleaning and transforming it, then rolling it into the analysis tables that power the AI Analyst.
Across these platforms, the delivered report files include:
- Delivery report
- Match type report
- Search delivery report
- Delivery report
- Performance report
- Delivery report
- Media cost
- Conversions (P2C)
- Final attribution
- Creative distribution
- Registrations
- LeadView lead list
Stage 1 (raw capture): Each source delivers to its own folder in the bucket (e.g. google_ads/delivery/). Uploading any file there automatically kicks off the pipeline; no manual trigger is needed. The pipeline identifies the source by matching the file's path against ref.pipeline_sources, and metadata is written back to the GCS file to mark it as processed. Column names are normalized to snake_case, and all values are loaded into BigQuery exactly as they arrived; no type-casting or transformation happens at this stage. ZIP and GZ files are decompressed automatically.
Stage 2 (clean and transform): Data is standardized against reference tables such as ref.dim_market, and source-specific logic is applied, e.g. parsing MTA placement strings and classifying Innovid creative types.
Stage 3 (analysis): ref.pipeline_analysis_dependencies determines which analysis tables need to be rebuilt based on which source just updated.
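The stage-1 header normalization can be sketched in a few lines. This is an illustrative function, not the pipeline's actual implementation; the exact normalization rules used by the Cloud Function are not documented here.

```python
import re

def to_snake_case(header: str) -> str:
    """Normalize a raw column header to snake_case (illustrative sketch)."""
    # Collapse any run of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^0-9a-zA-Z]+", "_", header.strip())
    # Insert an underscore at camelCase boundaries before lowercasing.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", s)
    return s.strip("_").lower()

print(to_snake_case("Impressions (Total)"))  # impressions_total
print(to_snake_case("clickThroughRate"))     # click_through_rate
```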
Current outputs:
- analysis.media_delivery: The primary cross-source delivery table, joining cleaned impressions, clicks, and spend data across Google Ads, Microsoft, Meta, and Nexxen.
Additional tables for registrations, search performance, and attribution are planned and will be added as their source data pipelines are completed.
Every step across all three stages is logged: the source file, target table, timestamp, row count, status, and any error message. In addition, metadata is written back to the GCS file itself to mark it as processed. If something fails at any point in the pipeline, the log is the first place to check.
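A log record of this shape can be sketched as a small dataclass. The field names mirror the list above; the real log table's name, schema, and the example values below are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class PipelineLogEntry:
    """One logged pipeline step (illustrative; real schema not shown here)."""
    source_file: str                      # GCS path of the input file
    target_table: str                     # BigQuery table written by this step
    status: str                           # e.g. "success" or "error"
    row_count: int = 0
    error_message: Optional[str] = None   # populated only on failure
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Hypothetical successful raw-load step:
entry = PipelineLogEntry(
    source_file="google_ads/delivery/report.csv",
    target_table="raw.google_ads_delivery",  # assumed table name
    status="success",
    row_count=1200,
)
print(entry.status, entry.row_count)
```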
All data remains within GCP (us-east1). The pipeline is fully event-driven — uploading a file is the only action required. Manual rebuilds of any analysis table are available via the corresponding manual_refresh_[table name] saved query in BigQuery.
How the Pipeline Is Configured
The pipeline's behavior is controlled by two configuration tables in BigQuery, not by code. Adding a new data source or changing what gets rebuilt downstream means updating a table row — no Cloud Function changes required.
ref.pipeline_sources: This table is the single source of truth for what the pipeline recognizes. If a source isn't listed here, the pipeline won't act on it.
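The lookup behavior can be sketched as follows. The column names (gcs_prefix, source_name, target_table) and the sample row are assumptions for illustration; the table's actual schema is not shown in this document.

```python
# Hypothetical rows from ref.pipeline_sources (column names assumed).
PIPELINE_SOURCES = [
    {"gcs_prefix": "google_ads/delivery/",
     "source_name": "google_ads_delivery",
     "target_table": "raw.google_ads_delivery"},
]

def match_source(gcs_path: str):
    """Return the configured source for an uploaded file, or None to skip it."""
    for row in PIPELINE_SOURCES:
        if gcs_path.startswith(row["gcs_prefix"]):
            return row
    # Not listed in ref.pipeline_sources: the pipeline won't act on the file.
    return None

print(match_source("google_ads/delivery/2024-01.csv") is not None)  # True
print(match_source("unknown/folder/file.csv"))                      # None
```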
ref.pipeline_analysis_dependencies: This is what makes the pipeline fully automatic end to end. When new delivery data arrives from Google Ads, Microsoft, Meta, or Nexxen, the pipeline knows, without any manual instruction, that analysis.media_delivery needs to be refreshed.
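The fan-out from a refreshed source to its downstream analysis tables can be sketched like this. The row shape and source names are assumptions based on the description above, not the table's actual contents.

```python
# Hypothetical rows from ref.pipeline_analysis_dependencies (shape assumed).
DEPENDENCIES = [
    {"source_name": "google_ads_delivery", "analysis_table": "analysis.media_delivery"},
    {"source_name": "microsoft_delivery",  "analysis_table": "analysis.media_delivery"},
    {"source_name": "meta_delivery",       "analysis_table": "analysis.media_delivery"},
    {"source_name": "nexxen_delivery",     "analysis_table": "analysis.media_delivery"},
]

def tables_to_rebuild(updated_source: str) -> set:
    """Analysis tables that must refresh after this source's raw data updates."""
    return {d["analysis_table"] for d in DEPENDENCIES
            if d["source_name"] == updated_source}

print(tables_to_rebuild("meta_delivery"))  # {'analysis.media_delivery'}
```

Because the mapping lives in a table rather than in code, adding a dependency is a row insert, which is exactly why no Cloud Function changes are needed.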
The configuration tables above define the rules — the Cloud Functions carry them out.
To add a new source, register it in ref.pipeline_sources with its GCS folder path and target raw table. If it should contribute to an analysis table, add a corresponding row to ref.pipeline_analysis_dependencies. No code changes are needed.