Revision control for source code - and especially Git - has caused a great leap forward in software development and delivery.
A similar revolution has not yet taken place in data. This talk will discuss the various open source databases that are approaching this problem, their underlying architectures, and the challenges in building both a 'Git for data' and a 'GitHub for data'.
It will posit that, to be a truly collaborative and distributed system, such a platform must be:
2) offline-first: work offline and then resync when online again
3) reliable: conflicts are handled properly
4) private: end-to-end encrypted, if desired
5) efficient: only changes (diffs) to the data set are transmitted between participants
6) collaborative: multiple people can work on the same data set
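
The "efficient" requirement can be illustrated with a minimal sketch, assuming a data set modelled as a dictionary of rows keyed by primary key. The names `diff` and `apply_patch` are hypothetical and are not taken from any of the databases the talk covers; the point is only that a participant who already holds the old version needs the patch, not the full data set.

```python
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[str, Row]  # primary key -> row (an assumed, simplified data model)


def diff(old: Table, new: Table) -> Dict[str, Table]:
    """Return only the rows that were added, removed, or changed."""
    return {
        "added": {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {
            k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]
        },
    }


def apply_patch(base: Table, patch: Dict[str, Table]) -> Table:
    """Rebuild the new version from the old version plus the patch."""
    result = dict(base)
    for key in patch["removed"]:
        result.pop(key, None)
    result.update(patch["added"])
    result.update(patch["changed"])
    return result


old = {
    "1": {"name": "Ada", "city": "London"},
    "2": {"name": "Linus", "city": "Helsinki"},
}
new = {
    "1": {"name": "Ada", "city": "Paris"},
    "3": {"name": "Grace", "city": "New York"},
}

patch = diff(old, new)           # this is all that has to be transmitted
rebuilt = apply_patch(old, patch)
```

This sketch ignores concurrent edits; handling them is what the "reliable" requirement asks for, and it needs a merge strategy on top of plain diffs.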
Many applications choose the SaaS route, with one central database behind a web service, where every frontend displays an instantaneous view of some part of the data set. This breaks most of the requirements above. A database-as-a-service approach built on an MVCC database, with the flexibility to version schemas, is a prerequisite for success. Finally, the talk will look to the future and the dawn of CI/CD for data.