Mastering Date Dimensions with BigQuery Procedures
Efficient date management is crucial for any data analytics project. A calendar table, often called a "date dimension," provides a precomputed dataset that simplifies time-based queries. In this article, we’ll explore how to create a dynamic and comprehensive calendar table in BigQuery using a stored procedure. This solution handles holidays, quarters, and other essential date features, making it a valuable tool for analytics.
Why You Need a Calendar Table
If you’re analyzing time-series data, you’ve likely faced challenges with date calculations. A well-designed calendar table lets you:
- Avoid repetitive date logic in your queries.
- Enable rich time-based analysis (e.g., fiscal quarters, holidays).
- Optimize performance by precomputing frequently used fields.
BigQuery’s stored procedures allow us to automate the creation of such a table, saving time and ensuring consistency.
Key Features of the Calendar Table
Here’s what our solution provides:
- Dynamic Year Input: The stored procedure accepts a year as input. If no year is provided, it defaults to the upcoming year.
- Comprehensive Date Fields: Includes
yearQuarter,yearSemester,dayName, and holiday markers (isHoliday). - Previous Month Calculations: Fields like
lastBom,lastEom, andlast_eom_cutoff_timeoffer insights into the prior month's start and end.
This design enables flexibility for both business and technical users.
The Query in Action
The stored procedure automatically generates a calendar for the given year and inserts it into a table. Below is a breakdown of the logic:
Dynamic Defaults and Validation:
- If no year is provided, it calculates the next year.
- Ensures the input is within logical bounds (1900–2100).
Date Range and Deletion:
- Defines the year’s start and end dates.
- Deletes any existing rows for the same year to prevent duplicates.
Comprehensive Fields:
- Fields like
isHolidayuse day-of-week logic to mark Saturdays and Sundays. - Custom fields like
last_eom_cutoff_timeappend a fixed UTC time to prior month-end dates.
- Fields like
Example Output
The table generated by this procedure includes the following fields:
| dateKey | dayName | isHoliday | yearQuarter | yearSemester | lastEom | last_eom_cutoff_time |
|---|---|---|---|---|---|---|
| 2024-01-01 | Monday | FALSE | 2024Q1 | 2024S1 | 2023-12-31 | 2023-12-31 17:00:00 UTC |
| 2024-01-02 | Tuesday | FALSE | 2024Q1 | 2024S1 | 2023-12-31 | 2023-12-31 17:00:00 UTC |
With fields like yearQuarter and isHoliday, you can easily implement business logic for reporting or planning.
How This Helps Your Analytics
- Performance Optimization: Precomputed fields reduce query complexity.
- Ease of Use: Business users can directly query attributes like
yearQuarterwithout needing custom logic. - Automation: With a single stored procedure, you can generate consistent date data year after year.
Conclusion
Building a calendar table in BigQuery isn’t just about dates—it’s about enabling smarter analytics. This stored procedure simplifies the process, ensuring your team can focus on insights instead of repetitive date logic.
Try implementing this solution in your projects to unlock the full potential of your time-series data!
CREATE OR REPLACE PROCEDURE your_dataset.insert_calendar(year_input INT64 DEFAULT NULL)
BEGIN
DECLARE start_date DATE;
DECLARE end_date DATE;
DECLARE input_year INT64;
-- Validasi input: Jika NULL, gunakan tahun depan
SET input_year = IFNULL(year_input, EXTRACT(YEAR FROM CURRENT_DATE()) + 1);
-- Validasi tambahan untuk tahun input (logis)
IF input_year < 1900 OR input_year > 2100 THEN
RAISE USING MESSAGE = 'Invalid year_input. Please input a year between 1900 and 2100.';
END IF;
-- Menentukan awal dan akhir tahun berdasarkan parameter
SET start_date = DATE(input_year, 1, 1);
SET end_date = DATE(input_year, 12, 31);
-- Menghapus data untuk tahun yang akan di-generate
DELETE FROM your_dataset.calendar
WHERE EXTRACT(YEAR FROM dateKey) = input_year;
-- Menyisipkan data kalender ke tabel
INSERT INTO your_dataset.calendar (
dateKey,
bom,
eom,
period,
eom_cutoff,
yearNumber,
dayName,
isHoliday,
yearQuarter,
yearSemester,
lastBom,
lastEom,
last_eom_cutoff_time
)
SELECT
day AS dateKey,
DATE_TRUNC(day, MONTH) AS bom,
LAST_DAY(day, MONTH) AS eom,
FORMAT_DATE('%Y%m', day) AS period,
DATETIME(LAST_DAY(day, MONTH), '23:59:59') AS eom_cutoff,
EXTRACT(YEAR FROM day) AS yearNumber,
FORMAT_TIMESTAMP('%A', TIMESTAMP(day)) AS dayName,
-- Menentukan apakah hari adalah hari libur (Sabtu/Minggu)
CASE WHEN EXTRACT(DAYOFWEEK FROM day) IN (7, 1) THEN TRUE ELSE FALSE END AS isHoliday,
-- Format kuartal tahun
FORMAT('%dQ%d', EXTRACT(YEAR FROM day), EXTRACT(QUARTER FROM day)) AS yearQuarter,
-- Format semester tahun
FORMAT('%dS%d', EXTRACT(YEAR FROM day), IF(EXTRACT(MONTH FROM day) <= 6, 1, 2)) AS yearSemester,
-- Beginning of the month sebelum tanggal saat ini
DATE_TRUNC(DATE_SUB(day, INTERVAL 1 MONTH), MONTH) AS lastBom,
-- End of the month sebelum tanggal saat ini
LAST_DAY(DATE_SUB(day, INTERVAL 1 MONTH), MONTH) AS lastEom,
-- Last EOM dengan jam 17:00 UTC
TIMESTAMP(LAST_DAY(DATE_SUB(day, INTERVAL 1 MONTH), MONTH), '17:00:00+00:00') AS last_eom_cutoff_time
FROM UNNEST(GENERATE_DATE_ARRAY(start_date, end_date)) AS day;
END;
Post a Comment