November 27, 2014

Progress with ClickAider project

About three months ago I announced that ClickAider had become available to the general public, and I think it is about time to write about the progress we have made with this project for those who are interested.

The project has generated decent interest: about 3,000 sites have registered over this time, which I consider a good number, especially as we did not do much advertising or PR, keeping the project low profile while working out the few bugs we might have.

We use GeoIP DNS-based load balancing between “gathering” servers in Europe and the US, which seems to work very well: it provides a level of HA if one of the servers goes down and improves accuracy by reducing round-trip time. Over time we plan to add more locations, with a pair of servers in each, so we do not have to rely on relatively slow DNS-based failover when one of the servers goes down.

We get some 600 tracking events per second on our lighttpd tracking servers, which currently works well and still leaves some capacity available, but we are planning to get rid of the little PHP code we have left at this layer to make it even more efficient. It would be good to handle some 5,000 events/sec per server.

MySQL 5.1 with partitioning works nicely for data storage, and no MySQL bugs have hit us on this project so far. We use InnoDB tables now because checking and repairing MyISAM is a nightmare, and PBXT, which could be a good fit for this work, is just not ready.
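To give an idea of the layout, here is a minimal sketch of what such a monthly range-partitioned table might look like (the table and column names are illustrative, not our actual schema):

    CREATE TABLE clicks (
      site_id    INT UNSIGNED NOT NULL,
      click_time DATETIME     NOT NULL,
      country    CHAR(2)      NOT NULL,
      url        VARCHAR(255) NOT NULL
    ) ENGINE=InnoDB
    -- one partition per month; new partitions get added as time moves on
    PARTITION BY RANGE (TO_DAYS(click_time)) (
      PARTITION p200711 VALUES LESS THAN (TO_DAYS('2007-12-01')),
      PARTITION p200712 VALUES LESS THAN (TO_DAYS('2008-01-01')),
      PARTITION p200801 VALUES LESS THAN (TO_DAYS('2008-02-01'))
    );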

MySQL performance is in fact the most serious issue we have to work on: even now, reports covering just click statistics for some huge sites like Mininova may take quite a while to generate.

The typical solution for trackers is to build summary data one way or another, and we may need to do some of that for the most common queries. At this point, however, we are looking at how much performance we can get from real-time aggregation, because we want absolutely unrestricted dynamic filters and dynamic timezones, which makes things hard to pre-aggregate. It is surely a fun challenge to deal with.
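As a rough illustration (reusing the made-up clicks table above, with example filter values), the kind of query the reports end up running in real time instead of reading a pre-built summary table looks something like this:

    -- Clicks per day for one site, shifted into the viewer's timezone
    -- (a fixed -05:00 offset here; named zones would need the MySQL
    -- timezone tables to be loaded), plus an arbitrary user-chosen filter.
    SELECT DATE(CONVERT_TZ(click_time, '+00:00', '-05:00')) AS day,
           COUNT(*) AS clicks
    FROM clicks
    WHERE site_id = 42
      AND click_time >= '2007-11-01 00:00:00'
      AND click_time <  '2007-12-01 00:00:00'  -- range on click_time lets partitions be pruned
      AND country = 'US'                       -- any ad-hoc filter the user picks
    GROUP BY day
    ORDER BY day;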

We also continue to work on adding support for more advertisers and improving the way we track the existing ones. Recently we’ve added support for Vibrant Media IntelliTXT (hovers only at this point). We also now support ShoppingAds, even though they are not yet out of private beta.

Looking at the site, we’ve added a major improvement – saved reports: you can now create custom reports with all the filters you want and save them for quick reuse later. For example, you can track the most popular click directions for a US audience compared to the general audience, or track the performance of referrals from a given domain name to see if a partnership makes any sense for you.

Finally, we’ve added a demo account, so you no longer have to register to see the system in action. We used our MySQL Performance Forums site for the demo; it may be a bit low-traffic, but it is still good for seeing how the system works. This is also the reason we added Google AdSense ads on that site.

If you have any other ideas about what you would like to see implemented in ClickAider, let us know.

About Peter Zaitsev

Peter managed the High Performance Group within MySQL until 2006, when he founded Percona. Peter has a Master's Degree in Computer Science and is an expert in database kernels, computer hardware, and application scaling.

Comments

  1. Brian Aker says:

    Hi!

    That is wonderful to hear about partitioning. How many partitions are you running with? Did you have to adjust file descriptors?

    Cheers,
    -Brian

  2. Norbert says:

    What are you using for GeoIP DNS-based load balancing?

  3. peter says:

    Norbert,

    We’re using PowerDNS. Actually, GeoIP is probably the wrong name for it – it uses a somewhat different database, but it is similar in spirit: a mapping of IPs to countries.

    Aurimas has yet to write the posts with good instructions for this, especially as it uses MySQL and MySQL replication to transfer the DNS zone to the secondary DNS servers. So let me know if you’re interested; that will likely motivate Aurimas :)

  4. peter says:

    Brian,

    We’re using InnoDB, so file handles are less of an issue, and they are cheap anyway, so it is not a big problem.
    We use one partition per month, so it is not a huge number of partitions.

    Two things that are inconvenient for us are the lack of automatic partition creation – we can’t just say we want to partition by month and have the partitions created automatically, it has to be done manually (a sketch of what that looks like is below) – and the restriction that all partitions of a table must use the same storage engine. I would love to keep only the current month in InnoDB and compress the rest to packed MyISAM.

    It is also worth noting that a large number of partitions causes very significant overhead – we tried creating partitions by day, but that was way too slow on larger date ranges.
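    For reference, appending the next monthly partition by hand looks roughly like this (the table and partition names are illustrative, following the sketch in the post above):

        ALTER TABLE clicks
          ADD PARTITION (PARTITION p200806 VALUES LESS THAN (TO_DAYS('2008-07-01')));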

  5. Norbert says:

    Peter,

    I’m very interested in it, please motivate Aurimas. :-) I’m currently looking for a DNS-based load balancing solution, but focused mostly on appliances (e.g. from Zeus). Good to know that PowerDNS does DNS-based load balancing as well.

  6. Brian Aker says:

    Hi Peter,

    I am hearing that there are issues at around 300 partitions, so I am curious about your setup. The stated limit is 1024, but I have not met anyone who can make that work. In testing I can’t get MyISAM anywhere near this, and for Archive I modified the use of file descriptors and open handles so that it doesn’t have the problems MyISAM is having.

    So did you set up your partitions so that they are “into the future”? Re-partitioning requires a complete rebuild.

    Cheers,
    -Brian

  7. Vadim says:

    Brian,

    We have 25 tables per server, each divided into 12 partitions (one per month).
    We cover the period from Jun-2007 to May-2008, so as you can see we have “future” partitions.
    Now we have to remember to add new partitions as May-2008 approaches. It would be good if MySQL did it automatically :)

  8. peter says:

    I have not tested with a large number of partitions. But what is the trouble? Does it just require a lot of file handles, or does it start to run into other problems?

    Generally, such a high number of partitions can be a nightmare if a lot of partitions need to be searched, i.e. when there is a lookup by a non-partitioned key.
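    For instance, with the illustrative clicks table from the post, the first query below has to touch every partition, while the second can be pruned to a single month:

        -- no condition on the partitioning column: all partitions are searched
        SELECT COUNT(*) FROM clicks WHERE country = 'US';

        -- range on click_time: the optimizer can prune to the matching monthly partition
        SELECT COUNT(*) FROM clicks
        WHERE click_time >= '2007-11-01' AND click_time < '2007-12-01'
          AND country = 'US';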

  9. Basi says:

    Maybe you can use the event scheduler available in 5.1 to create new partitions on the fly, only at the moment the new partitions are needed.

  10. Jonathon says:

    I agree with Basi on this option. I have implemented a solution using partitioning under 5.1, using the event scheduler to automagically create new partitions as needed. This was because of the limitation Brian mentioned when trying to exceed the 300 mark. From memory, I ended up restricting it to 250 and rolled the partitions: create a new one and archive off the oldest. The event scheduler made this job much easier to achieve, but I never tried to solve the problem with higher numbers of partitions, as I put it down to the beta state of the software.
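    A rough sketch of that approach in 5.1 could look like the following (it assumes the illustrative clicks table from the post; an event calls a procedure that builds the ALTER dynamically, and the names and schedule are only examples):

        SET GLOBAL event_scheduler = ON;

        DELIMITER //
        CREATE PROCEDURE add_next_clicks_partition()
        BEGIN
          -- partition for next month, bounded by the first day of the month after it
          SET @pname = DATE_FORMAT(DATE_ADD(NOW(), INTERVAL 1 MONTH), 'p%Y%m');
          SET @bound = TO_DAYS(DATE_FORMAT(DATE_ADD(NOW(), INTERVAL 2 MONTH), '%Y-%m-01'));
          SET @sql   = CONCAT('ALTER TABLE clicks ADD PARTITION (PARTITION ', @pname,
                              ' VALUES LESS THAN (', @bound, '))');
          PREPARE stmt FROM @sql;
          EXECUTE stmt;
          DEALLOCATE PREPARE stmt;
        END//

        CREATE EVENT add_clicks_partition_monthly
          ON SCHEDULE EVERY 1 MONTH STARTS '2008-05-01 00:00:00'
          DO CALL add_next_clicks_partition()//
        DELIMITER ;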

  11. Brian Aker says:

    Hi!

    Lookup on a non-partitioned key is one of the issues with partitioning; the other is just the use of resources. When a table is opened, all of the underlying tables are opened at the same time. Depending on the engine and the number of partitions, this can turn out pretty badly.

    Cheers,
    -Brian

  12. peter says:

    Jonathon,

    Sure, you can use the event scheduler or a cron job to do this. It is just another piece to maintain, separate from the partitioned table itself, especially as in this case there are multiple partitioned tables on the server and their number can change over time. You can still do it, of course; it is just more complicated than automatic partition creation would be.

  13. peter says:

    Thanks Brian,

    So there is nothing totally unexpected. Though I would expect lazy opening to make a lot of sense for partitioning, so an underlying partition is only opened the first time it is accessed. There are many cases, such as joins, where static partition pruning does not work, but based on the join conditions only a few partitions will actually be touched.

  14. Brian Aker says:

    Hi!

    Right now only Archive does a lazy open (and I am thinking about adding a reaper to go through and close based on non-usage).

    One thing to consider for those talking about events: an event that does an ALTER will lock the table up throughout the ALTER. You also need 2x the disk space during the ALTER (half for the old partition and half for the new).

    Cheers,
    -Brian

  15. peter says:

    That is nasty. Why not allow adding partitions online when none of the existing partitions are affected, or at least only lock the partitions whose range allocation is changing?

  16. Brian Aker says:

    Hi!

    It just wasn’t written that way (and I would agree that it is bad). Partitioning makes sense on “at rest” data, or on data that doesn’t require 24/7 availability. For 24/7… I suspect someone is going to have to come up with a solution that doesn’t require blocking.

    I need to look and see if NDB can now do online partition adding; I know it was talked about.

    Cheers,
    -Brian

  17. AlexN says:

    Adding the demo account is the most important improvement. Now, even in the beta stage, the service looks attractive. SpyLog is too expensive, and Google Analytics has too many bugs that they are not going to fix soon. At a reasonable price this would be a nice alternative.

    The biggest problem with all these sites is connection availability. It is better to lose some information than to lose some visitors annoyed by a “connecting to #$@analytics.com” message.

  18. Emin says:

    I know it is not a bottleneck now, but I suggest taking a look at nginx instead of lighttpd as well. Much better performance.

  19. peter says:

    Emin,

    Can you point to any benchmarks that show significantly better performance for nginx compared to lighttpd?

    I’ve seen nginx being, say, 10% faster, but that is minor. I’ve also seen people comparing things incorrectly, e.g. you need to configure multiple workers in lighttpd 1.4 to serve many small files from disk efficiently, etc.

    The choice for lighttpd in this case was made because there is good documentation for developing modules, together with good enough performance.

  20. Emin says:

    Peter, I am running a website which, although much smaller, still gets a few hundred requests for static content and a few dozen requests for dynamic content per second.

    I do not have formal benchmarks at hand, but I have experienced a significant performance improvement on the server myself. A very important point that is quite often omitted in benchmarks is that although nginx may be, for example, 10% faster at serving requests, its CPU and memory usage are much, much lower, actually almost zero. This leaves everything for PHP/MySQL and other memory/CPU-hungry processes. My server load averages dropped significantly after I moved to nginx, and its flexibility, for example adjusting config files or even upgrading the web server without stopping the service for a second, is unbeatable.

  21. adrian ilarion ciobanu says:

    re: What are you using for GeoIP DNS based load balancing? (Norbert)

    There is actually a djbdns-based authoritative DNS server called geoipdns that has some more functionality and adds views with per-record granularity instead of per-zone for geo-based filters/rules: http://pub.mud.ro/wiki/Geoipdns
