gravatar
 · 
April 9, 2024
 · 
2 min read

Pivot data in Dataform

We like Dataform. It's built into BigQuery, and you don't have to bother setting up infrastructure, services, or pipelines to run it. You don't even have to link it to a remote git repository (although it's highly recommended!). The built-in editor has git pre-baked and is super user-friendly for people who just want to run SQL in a dag-like fashion.

It's got one problem, though. We tried to make a pivot() macro using JS, but found the best way to do it was by using an operation within an sqlx file.

Create a .sqlx file, just as you would usually, but instead of configuring it as a table or a view, configure it as an operation. It's basically like a stored procedure, but you get all the goodies of version control, dependency graphs, scheduled execution, and the Dataform API.

SQL
config {
  type: "operations",
  hasOutput: true // Necessary for dependency graphs.
}

/*
  PARAMETERS TO CUSTOMIZE:
  - source_schema: The schema of your source table.
  - source_table: The table you're pivoting.
  - dynamic_column: The column to pivot on.
  - dimension_1, dimension_2: Columns you want to group by.
  - pivot_metric: The metric column you're aggregating.
  - aggregation_function: The aggregation function (e.g., AVG, SUM).
  - order_by_column: The column to order your results by.
*/

DECLARE pivot_columns STRING;

-- Dynamic SQL for pivoting
DECLARE dynamic_sql STRING;
SET dynamic_sql = FORMAT("""
  CREATE OR REPLACE TABLE ${self()} FROM (
    SELECT 
      dimension_1,
      dimension_2,
      pivot_metric
    FROM ${ref({name:"source_table", schema:"source_schema"})}
  ) PIVOT (
    %s(pivot_metric) AS pivot_metric
    FOR dynamic_column IN (%s)
  ) ORDER BY %s ASC
""", 
  "aggregation_function", -- Placeholder for the aggregation function, replace as needed
  pivot_columns,
  "order_by_column" -- Placeholder for ordering, replace with dimension_1, dimension_2 as required
);

-- Generate the dynamic pivot columns
SET pivot_columns = (
  SELECT 
    STRING_AGG(DISTINCT FORMAT('"%s"', dynamic_column), ', ')
  FROM ${ref({name:"source_table", schema:"source_schema"})}
);

-- Execute the dynamic SQL
EXECUTE IMMEDIATE dynamic_sql;

You can then reference this file in your dependant .sqlx files, just as you would reference a table using the ${ref()} Javascript function.

If you're still struggling with it, let us know - I'd love to help! 👇

Stay Social

© Cobry Ltd | 0333 789 0102
24 Sandyford Place, Glasgow, Scotland, UK, G3 7NG
167/169 Great Portland Street, 5th Floor, London, W1W 5PF
Newsletter
Compliance
Privacy Policy

Care for a towel? 👀

logo-established-large