Ryan will demonstrate how FoundationDB can be applied to solve real business problems today, and how to map common infrastructure components like logs, tables, and indexes into a cohesive system within FoundationDB. His example applies these techniques to a problem ClickFunnels was facing in mid-2018, which required scanning millions of end-user data points for each of their tens of thousands of customers multiple times per hour. Through custom bitmap indexes built on top of FoundationDB, queries that previously would not finish now take milliseconds, enabling use cases never thought possible.
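The bitmap-index idea can be sketched in a few lines: keep one bitmap per attribute value, with bit i set when row i has that value, so a multi-attribute query becomes a bitwise AND. This is a toy in-memory illustration, not ClickFunnels' actual FoundationDB schema; the class and attribute names are invented.

```python
# Toy bitmap index: one integer-as-bitmap per attribute value.
# Bit i is set in a value's bitmap when row i carries that value,
# so an AND-query over several values is a few bitwise ANDs.
from collections import defaultdict

class BitmapIndex:
    def __init__(self):
        self.bitmaps = defaultdict(int)  # value -> int used as a bitmap

    def add(self, row_id, value):
        self.bitmaps[value] |= 1 << row_id

    def query_and(self, *values):
        """Row ids matching ALL of the given values."""
        result = -1  # all bits set (Python ints sign-extend infinitely)
        for v in values:
            result &= self.bitmaps[v]
        # decode the surviving set bits back into row ids
        return [i for i in range(result.bit_length()) if result >> i & 1]

idx = BitmapIndex()
idx.add(0, "active"); idx.add(1, "active"); idx.add(2, "churned")
idx.add(0, "pro");    idx.add(2, "pro")
print(idx.query_and("active", "pro"))  # -> [0]
```

In a real deployment the bitmaps would be chunked and stored as FoundationDB key-value pairs so they can be updated transactionally, but the query-time mechanics are the same.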
I will present a subset of the most notable ClickHouse features from the past half-year:
- data skipping indices, including full text indices (with performance evaluation and insights on implementation);
- custom compression codecs for time series data;
- HDFS and Parquet integration;
- fuzzy string search (it is really fast); multiple substring matching;
- sampling profiler on the query level;
- z-curve indexing;
- table- and column-level TTL.
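The idea behind the first item, data skipping indices, can be shown with a minimal sketch: keep per-block min/max summaries of a column and skip any block whose range cannot contain the predicate value. This mirrors the concept behind ClickHouse's minmax skip index, but the code below is a toy, not ClickHouse internals.

```python
# Min/max data skipping sketch: summarize each block of a column by its
# (min, max), then prune blocks whose range cannot match the predicate.

def build_minmax(data, block_size):
    """Split a flat list into blocks and compute per-block (min, max)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return blocks, [(min(b), max(b)) for b in blocks]

def scan_eq(blocks, summaries, needle):
    """Scan only blocks whose [min, max] range may contain `needle`."""
    hits, scanned = [], 0
    for block, (lo, hi) in zip(blocks, summaries):
        if lo <= needle <= hi:          # otherwise the block is skipped
            scanned += 1
            hits.extend(v for v in block if v == needle)
    return hits, scanned

blocks, summaries = build_minmax(list(range(1000)), block_size=100)
hits, scanned = scan_eq(blocks, summaries, 250)
print(hits, scanned)  # -> [250] 1
```

With sorted data, a point lookup touches 1 of 10 blocks here; the skipping is what turns full scans into near-index lookups.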
Embedded databases, tightly integrated with application software, are a great alternative to standalone database systems for small applications. This talk will cover:
- Comparison of popular embedded database engines (Berkeley DB, SQLite, Firebird Embedded, and the deprecated libmysqld).
- How to design an application that uses an embedded database? When not to use one?
- What are the advantages and limitations of embedded database engines?
By the end of the session, attendees will understand the advantages of embedded databases and know when and how to use one compared to an external database server.
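As a concrete taste of what "tightly integrated" means, here is SQLite used through Python's bundled sqlite3 module: the database engine runs inside the application process, with no server to install, configure, or connect to.

```python
# Embedded database in action: SQLite lives inside the application process.
import sqlite3

# ":memory:" keeps the database in RAM; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO settings VALUES (?, ?)", ("theme", "dark"))
conn.commit()

(value,) = conn.execute(
    "SELECT value FROM settings WHERE key = ?", ("theme",)
).fetchone()
print(value)  # -> dark
conn.close()
```

The trade-off previewed in the talk is visible even here: zero operational overhead, but the database is only as available and concurrent as the process that embeds it.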
We all use and love relational databases... until we use them for purposes for which they are not a good fit: queues, caches, catalogs, unstructured data, counters, and many other use cases can be solved with relational databases, but are better served by alternatives.
In this talk, we'll review the goals, pros and cons, and good and bad use cases of these alternative paradigms by looking at some modern open source implementations.
By the end of this talk, the audience will have learned the basics of three database paradigms (document, key-value, and columnar store) and will know when it's appropriate to opt for one of these or when to favor relational databases and avoid falling into buzzword temptations.
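The three paradigms can be contrasted in miniature with the same "users" data. These are purely illustrative in-memory structures showing the storage shape each paradigm favors, not any particular database's format.

```python
# The same data under the three paradigms discussed in the talk.

# Document store: each record is a self-contained, possibly nested document;
# the schema can vary from one document to the next.
documents = [
    {"id": 1, "name": "Ada", "tags": ["admin", "beta"]},
    {"id": 2, "name": "Bob"},
]

# Key-value store: opaque values addressed by key; very fast point lookups,
# but no ad-hoc queries over the values.
kv = {"user:1": "Ada", "user:2": "Bob"}

# Columnar store: each column stored contiguously, which makes scans and
# aggregates over one column cheap even when the table is wide.
columns = {"id": [1, 2], "name": ["Ada", "Bob"]}

print(documents[0]["tags"])  # -> ['admin', 'beta']
print(kv["user:2"])          # -> Bob
print(sum(columns["id"]))    # -> 3
```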
Columnar stores like ClickHouse enable users to pull insights from big data in seconds, but only if you set things up correctly. This talk will walk through how to implement a data warehouse that contains 1.3 billion rows using the famous NY Yellow Cab ride data. We'll start with the basic schema design, including clustering and table definitions, then show how to load data efficiently. Next we'll discuss important features like dictionaries and materialized views, and how they improve query efficiency. We'll end by demonstrating typical queries to illustrate the kind of inferences you can draw rapidly from a well-designed data warehouse. It should be enough to get you started--the next billion rows is up to you!
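The payoff of a materialized view is easy to see in a sketch: the aggregate is maintained at insert time, so a read becomes a lookup instead of a scan over all 1.3 billion rows. This mimics the idea behind ClickHouse materialized views in plain Python; the column names are made up for the illustration.

```python
# Materialized-view sketch: maintain the aggregate on insert so queries
# become O(1) lookups instead of full scans of the base table.
from collections import defaultdict

rides = []                        # the base "table"
rides_per_day = defaultdict(int)  # the "materialized view"

def insert_ride(pickup_date, fare):
    rides.append({"pickup_date": pickup_date, "fare": fare})
    rides_per_day[pickup_date] += 1  # view maintained as data arrives

insert_ride("2016-01-01", 9.5)
insert_ride("2016-01-01", 14.0)
insert_ride("2016-01-02", 7.25)

# Answered from the view -- no scan of `rides` needed.
print(rides_per_day["2016-01-01"])  # -> 2
```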
It's easy for Java developers (and users of other OO languages) to mix object-oriented and imperative thinking. But when it comes to writing SQL, the nightmare begins! SQL is a declarative language that has nothing to do with either OO or imperative thinking. On the one hand, it is relatively easy to express a condition in SQL, but not so easy to express it optimally, and harder still to translate it into the OO paradigm. On the other hand, developers need to think in terms of sets and relational algebra, even if unconsciously!
In this talk, we'll see the most common mistakes that OO developers, and Java developers in particular, make when writing SQL code, and how we can avoid them.
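One classic example of the imperative-vs-declarative clash is the "N+1" pattern: looping over parent rows and issuing one query per row, instead of letting SQL do a set-based join. The sketch below (table and column names invented, SQLite used only so the example is runnable) shows both versions producing the same answer.

```python
# Imperative "N+1" querying vs. one set-based SQL query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO books   VALUES (1, 1, 'SQL 101'), (2, 1, 'Sets'), (3, 2, 'OO');
""")

# Anti-pattern: one round trip per author (N+1 queries in total).
naive = {}
for author_id, name in conn.execute("SELECT id, name FROM authors"):
    count = conn.execute(
        "SELECT COUNT(*) FROM books WHERE author_id = ?", (author_id,)
    ).fetchone()[0]
    naive[name] = count

# Set-based: one declarative query does the whole job.
set_based = dict(conn.execute("""
    SELECT a.name, COUNT(b.id)
    FROM authors a LEFT JOIN books b ON b.author_id = a.id
    GROUP BY a.name
"""))

print(naive == set_based)  # -> True
conn.close()
```

Both return {'Ada': 2, 'Bob': 1}, but the set-based form makes one round trip and lets the optimizer choose the join strategy.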
At ITSumma, we provide 24/7 site reliability engineering for more than 300 clients with 10,000+ servers in total, collecting over 200 thousand metrics per second.
In 2010, we realized that existing monitoring systems could not handle our requirements. What we needed was the capability to instantly process and display analytics, store a minimum of 1 year's worth of data at 15-second (better yet, 1-second) intervals, and make quick-fire (as little as 200-millisecond) queries to retrieve high-resolution data snapshots.
That's why we developed our own monitoring system, and it served the infrastructure of that time well. By 2018, however, our system could no longer meet the requirements of newer infrastructures, and had outlived its usefulness in some ways.
Since late 2018, we have been developing a new monitoring system.
To assist us with this project, we compared several major solutions for storing time-series data, including Prometheus storage, InfluxDB, Cassandra, ClickHouse, and others.
We investigated their capabilities with our production data in terms of performance, stability, scalability, and storage usage.
At Percona Live I would like to present our findings and show the results of our production and performance tests which we consider useful for anyone interested in storing massive amounts of time series data.
Kirill & Kostja will present an overview of Tarantool, an open source in-memory DBMS. They will explain why it is cool to have a DBMS in the same address space as your application server, why Tarantool is in fact single-threaded, and why the in-memory DBMS now features a new disk-oriented engine.
Finally, they'll explain why a NoSQL DBMS now supports SQL.