Group by any date interval

  • Filter
  • Time
  • Show
Clear All
new posts

  • Group by any date interval

    I have a table with datestamped info (DATETIME type) and some tinyint and smallint columns. I want to calculate averages and sums over any date interval the user in my application wants.

    I have tried many ways but none seem to work properly. Has anybody done this before and has a definitive solution?

    The things I have tried:

    1) Convert to unix time:


    Where 86400 is the number of seconds in the grouping period (e.g. if you want to group by hour, then it is 60 * 60, for grouping by 2 hours it is 2 * 60 * 60 ).

    There are problems with this approach although I don't remember the exact details I'm afraid.

    2) Group by year, month, day, hour separately:

    SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timestamp, '%Y%m%d-%H:%i:%s')),STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), (FLOOR(DATE_FORMAT(timeStamp, '%H') / 2) ) * 2) ,'%Y%m%d%H') as CalculatedTimeStamp, sum(Length*(nVehicles1+nVehicles2+nVehicles3+nVehi cles4+nVehicles5))/sum(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nV ehicles5), sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= "2008-05-09 11:28:09" AND TimeStamp <= "2008-06-08 11:28:00" AND ( zoneid=1 OR zoneid=2 OR zoneid=3 OR zoneid=4 OR zoneid=5 OR zoneid=6 OR zoneid=7 OR zoneid=8) GROUP BY DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), FLOOR(DATE_FORMAT(timeStamp, '%H') / 2), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000

    This example groups per 2 hours. Notice the 'CalculatedTimeStamp', this calculation makes it so that each returned row starts at the start of an hour. The returned resultset will contains times like this:


    -> It does not work when you have summer and wintertime! If you group by hour:

    SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timeStamp, '%H')), STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), DATE_FORMAT(timeStamp, '%H')) ,'%Y%m%d%H') as CalculatedTimeStamp, sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= "2008-03-30 0:01:00" AND TimeStamp <= "2008-03-30 14:01:00" AND ( zoneid=1) GROUP BY DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), DATE_FORMAT(timeStamp, '%H'), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000

    Then you get the following result:
    The group by happens correctly, the GROUP_CONCAT column shows:

    However, the calculated timestamp contains 3:00 2 times!
    30-mrt-2008 00:00:00
    30-mrt-2008 01:00:00
    30-mrt-2008 03:00:00
    30-mrt-2008 03:00:00

    This is due to the fact that at that time we switch from winter to summertime.

    Another problem with this approarch is that if the user selects to group by 7 days, it always gets truncated at the month. E.g.:
    1 jan -> 7 jan
    8 jan -> 16 jan
    17 jan -> 23 jan
    23 jan -> 30 jan
    31 jan -> 1 feb !!!

    That is ofcourse because I group by year first, then by month and then by day.

    Any help would be greatly appriciated, I already spend 3 days trying to figure out the correct way to do this.