Announcement

Announcement Module
Collapse
No announcement yet.

Group by any date interval

Page Title Module
Move Remove Collapse
X
Conversation Detail Module
Collapse
  • Filter
  • Time
  • Show
Clear All
new posts

  • Group by any date interval

    I have a table with datestamped info (DATETIME type) and some tinyint and smallint columns. I want to calculate averages and sums over any date interval the user in my application wants.

    I have tried many ways but none seem to work properly. Has anybody done this before and has a definitive solution?

    The things I have tried:

    1) Convert to unix time:

    SELECT ... FROM ... GROUP BY FLOOR(UNIX_TIMESTAMP(TimeStamp)/86400)

    Where 86400 is the number of seconds in the grouping period (e.g. if you want to group by hour, then it is 60 * 60, for grouping by 2 hours it is 2 * 60 * 60 ).

    There are problems with this approach although I don't remember the exact details I'm afraid.

    2) Group by year, month, day, hour separately:

    SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timestamp, '%Y%m%d-%H:%i:%s')),STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), (FLOOR(DATE_FORMAT(timeStamp, '%H') / 2) ) * 2) ,'%Y%m%d%H') as CalculatedTimeStamp, sum(Length*(nVehicles1+nVehicles2+nVehicles3+nVehi cles4+nVehicles5))/sum(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nV ehicles5), sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= "2008-05-09 11:28:09" AND TimeStamp <= "2008-06-08 11:28:00" AND ( zoneid=1 OR zoneid=2 OR zoneid=3 OR zoneid=4 OR zoneid=5 OR zoneid=6 OR zoneid=7 OR zoneid=8) GROUP BY DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), FLOOR(DATE_FORMAT(timeStamp, '%H') / 2), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000


    This example groups per 2 hours. Notice the 'CalculatedTimeStamp', this calculation makes it so that each returned row starts at the start of an hour. The returned resultset will contains times like this:

    0:00
    2:00
    4:00
    6:00
    ...

    BUT:
    -> It does not work when you have summer and wintertime! If you group by hour:

    SELECT ZoneId, GROUP_CONCAT(DATE_FORMAT(timeStamp, '%H')), STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), DATE_FORMAT(timeStamp, '%H')) ,'%Y%m%d%H') as CalculatedTimeStamp, sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= "2008-03-30 0:01:00" AND TimeStamp <= "2008-03-30 14:01:00" AND ( zoneid=1) GROUP BY DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), DATE_FORMAT(timeStamp, '%H'), zoneid ORDER BY CalculatedTimeStamp LIMIT 0, 5000

    Then you get the following result:
    The group by happens correctly, the GROUP_CONCAT column shows:
    00,00,00,00
    01,01,01,01
    02,02,02,02
    03,03,03,03

    However, the calculated timestamp contains 3:00 2 times!
    30-mrt-2008 00:00:00
    30-mrt-2008 01:00:00
    30-mrt-2008 03:00:00
    30-mrt-2008 03:00:00

    This is due to the fact that at that time we switch from winter to summertime.

    Another problem with this approarch is that if the user selects to group by 7 days, it always gets truncated at the month. E.g.:
    1 jan -> 7 jan
    8 jan -> 16 jan
    17 jan -> 23 jan
    23 jan -> 30 jan
    31 jan -> 1 feb !!!

    That is ofcourse because I group by year first, then by month and then by day.


    Any help would be greatly appriciated, I already spend 3 days trying to figure out the correct way to do this.

    regards,

    Wim
Working...
X