Mastering Date Dimensions with BigQuery Procedures

 Efficient date management is crucial for any data analytics project. A calendar table, often called a "date dimension," provides a precomputed dataset that simplifies time-based queries. In this article, we’ll explore how to create a dynamic and comprehensive calendar table in BigQuery using a stored procedure. This solution handles holidays, quarters, and other essential date features, making it a valuable tool for analytics.

Why You Need a Calendar Table

If you’re analyzing time-series data, you’ve likely faced challenges with date calculations. A well-designed calendar table lets you:

  • Avoid repetitive date logic in your queries.
  • Enable rich time-based analysis (e.g., fiscal quarters, holidays).
  • Optimize performance by precomputing frequently used fields.

BigQuery’s stored procedures allow us to automate the creation of such a table, saving time and ensuring consistency.

Key Features of the Calendar Table

Here’s what our solution provides:

  1. Dynamic Year Input: The stored procedure accepts a year as input. If no year is provided, it defaults to the upcoming year.
  2. Comprehensive Date Fields: Includes yearQuarter, yearSemester, dayName, and holiday markers (isHoliday).
  3. Previous Month Calculations: Fields like lastBom, lastEom, and last_eom_cutoff_time offer insights into the prior month's start and end.

This design enables flexibility for both business and technical users.


The Query in Action

The stored procedure automatically generates a calendar for the given year and inserts it into a table. Below is a breakdown of the logic:

  1. Dynamic Defaults and Validation:

    • If no year is provided, it calculates the next year.
    • Ensures the input is within logical bounds (1900–2100).
  2. Date Range and Deletion:

    • Defines the year’s start and end dates.
    • Deletes any existing rows for the same year to prevent duplicates.
  3. Comprehensive Fields:

    • Fields like isHoliday use day-of-week logic to mark Saturdays and Sundays.
    • Custom fields like last_eom_cutoff_time append a fixed UTC time to prior month-end dates.

Example Output

The table generated by this procedure includes the following fields:

dateKeydayNameisHolidayyearQuarteryearSemesterlastEomlast_eom_cutoff_time
2024-01-01MondayFALSE2024Q12024S12023-12-312023-12-31 17:00:00 UTC
2024-01-02TuesdayFALSE2024Q12024S12023-12-312023-12-31 17:00:00 UTC

With fields like yearQuarter and isHoliday, you can easily implement business logic for reporting or planning.


How This Helps Your Analytics

  1. Performance Optimization: Precomputed fields reduce query complexity.
  2. Ease of Use: Business users can directly query attributes like yearQuarter without needing custom logic.
  3. Automation: With a single stored procedure, you can generate consistent date data year after year.

Conclusion

Building a calendar table in BigQuery isn’t just about dates—it’s about enabling smarter analytics. This stored procedure simplifies the process, ensuring your team can focus on insights instead of repetitive date logic.

Try implementing this solution in your projects to unlock the full potential of your time-series data!



CREATE OR REPLACE PROCEDURE your_dataset.insert_calendar(year_input INT64 DEFAULT NULL)
BEGIN
  DECLARE start_date DATE;
  DECLARE end_date DATE;
  DECLARE input_year INT64;

  -- Validasi input: Jika NULL, gunakan tahun depan
  SET input_year = IFNULL(year_input, EXTRACT(YEAR FROM CURRENT_DATE()) + 1);

  -- Validasi tambahan untuk tahun input (logis)
  IF input_year < 1900 OR input_year > 2100 THEN
    RAISE USING MESSAGE = 'Invalid year_input. Please input a year between 1900 and 2100.';
  END IF;

  -- Menentukan awal dan akhir tahun berdasarkan parameter
  SET start_date = DATE(input_year, 1, 1);
  SET end_date = DATE(input_year, 12, 31);

  -- Menghapus data untuk tahun yang akan di-generate
  DELETE FROM your_dataset.calendar
  WHERE EXTRACT(YEAR FROM dateKey) = input_year;

  -- Menyisipkan data kalender ke tabel
  INSERT INTO your_dataset.calendar (
    dateKey,
    bom,
    eom,
    period,
    eom_cutoff,
    yearNumber,
    dayName,
    isHoliday,
    yearQuarter,
    yearSemester,
    lastBom,
    lastEom,
    last_eom_cutoff_time
  )
  SELECT
    day AS dateKey,
    DATE_TRUNC(day, MONTH) AS bom,
    LAST_DAY(day, MONTH) AS eom,
    FORMAT_DATE('%Y%m', day) AS period,
    DATETIME(LAST_DAY(day, MONTH), '23:59:59') AS eom_cutoff,
    EXTRACT(YEAR FROM day) AS yearNumber,
    FORMAT_TIMESTAMP('%A', TIMESTAMP(day)) AS dayName,
    -- Menentukan apakah hari adalah hari libur (Sabtu/Minggu)
    CASE WHEN EXTRACT(DAYOFWEEK FROM day) IN (7, 1) THEN TRUE ELSE FALSE END AS isHoliday,
    -- Format kuartal tahun
    FORMAT('%dQ%d', EXTRACT(YEAR FROM day), EXTRACT(QUARTER FROM day)) AS yearQuarter,
    -- Format semester tahun
    FORMAT('%dS%d', EXTRACT(YEAR FROM day), IF(EXTRACT(MONTH FROM day) <= 6, 1, 2)) AS yearSemester,
    -- Beginning of the month sebelum tanggal saat ini
    DATE_TRUNC(DATE_SUB(day, INTERVAL 1 MONTH), MONTH) AS lastBom,
    -- End of the month sebelum tanggal saat ini
    LAST_DAY(DATE_SUB(day, INTERVAL 1 MONTH), MONTH) AS lastEom,
    -- Last EOM dengan jam 17:00 UTC
    TIMESTAMP(LAST_DAY(DATE_SUB(day, INTERVAL 1 MONTH), MONTH), '17:00:00+00:00') AS last_eom_cutoff_time
  FROM UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day;
END;