Summarize Query Results by Week

When writing queries to track your metrics, it is common (and easy) to summarize your data by date. If you have a timestamp column (“2016-01-01 00:00:00”), there are built in functions to remove the time component. In MySQL and Postgres the date() function will do that for you. Using this method in the group by statements makes summarizing data very easy.

However, when you want to group your data by week, there is no easy built in method that will allow you to generate a human readable, group-by compatible date stamp.

The Bad Way:
The first instinct may be to want to generate a year week string (“2016 W1”). This is easy to do with a string concatenation call on two date components. However, its much better to still generate a valid date stamp for one simple reason: summarizing data like this is mostly being done to graph the numbers. Most charting libraries or tools do not support the “2016 W1” string. This is not a problem if you have at least one data point for ever week (no gaps in the data). If you do have gaps in the data, where nothing happens on one week, you will not be able to graph a timeline correctly and you will be looking at a distorted view.

The Better Way:
To ensure that your data will always render correctly in a timeline view, it is much better to determine a date transformation like the built in date() method but to determine one particular day of the week (start or end) and group by that value:

  1. Get the date stamp from the timestamp object
  2. Add X days to date based on the day of the week that date falls on
  3. You are done!


// Start of Week (Monday)
DATE_ADD(date(created_at), INTERVAL(-WEEKDAY(created_at)) DAY)

// End of Week (Sunday)
DATE_ADD(date(created_at), INTERVAL(6-WEEKDAY(created_at)) DAY)

Postgres / Redshift:

// Start of Week (Monday) 
date(date_trunc('week', created_at))

// End of Week (Sunday)
DATEADD(day, 6, date(date_trunc('week', created_at )))

Thats all there is  to it. The same concept can be applied to group by full months, years, or quarters and still just use a plain date string as the aggregator. This ensures you will always be able to throw your data into a charting library and easily generate actually useful timelines.