CIKM 2018 Tutorial: Multi-model Databases and Tightly Coupled Polystores
As more businesses realize that data, in all forms and sizes, is critical to making the best possible decisions, we see continued growth of systems that support massive volumes of non-relational or unstructured data. One of the most challenging issues in the era of big data is the "Variety" of the data. Data may be presented in various types and formats (structured, semi-structured, and unstructured) and produced by different sources, and hence natively follow various models. In general, there are currently two solutions for managing multi-model data: a single integrated multi-model database system or a tightly integrated middleware over multiple single-model data stores.
In this tutorial, we review and compare these two approaches, giving insights into their advantages, trade-offs, and research opportunities. In particular, we dive into four key aspects of the technology behind both kinds of systems, namely (1) the theoretical foundation of multi-model data management, (2) storage strategies for multi-model data, (3) query languages across models, and (4) query evaluation and its optimization. We provide a performance comparison of the two approaches and discuss related open problems and remaining challenges.
Authors: Jiaheng Lu, Irena Holubova, and Bogdan Cautis
Tutorial in Proc. of the International Conference on Information and Knowledge Management (CIKM), October 26th, 2018.
1. Motivation and multiple model examples
3. Multi-model data storage
4. Multi-model data query languages
5. Multi-model query processing
6. Overview on tightly integrated polystores
7. Query processing in tightly integrated polystores
8. Advanced aspects of tightly integrated polystores
9. Comparison of multi-model databases and tightly integrated polystores
10. Open problems and challenges
Code and Data
Open multi-model datasets: The Helsinki Multi-Model Dataset Repository collects and integrates publicly available datasets in diverse formats, and provides statistics on the datasets, for use in research experiments.
- UniBench: Towards Benchmarking Multi-Model DBMS
Jiaheng Lu is an Associate Professor at the University of Helsinki, Finland. His main research interests lie in Big Data management and database systems. He has published more than eighty journal and conference papers, as well as several books on XML, Hadoop, and NoSQL databases. His book on Hadoop was one of the top-10 best-selling books in the category of computer software in China in 2013. He has frequently served as a PC member for conferences including SIGMOD, VLDB, ICDE, EDBT, and CIKM.
Irena Holubova is an Associate Professor at Charles University, Prague, Czech Republic. Her current main research interests include Big Data management and NoSQL databases, evolution and change management of database applications, analysis of real-world data, and schema inference. She has published more than 80 conference and journal papers, and her work has received four awards. She has published two books on XML and NoSQL databases.
Bogdan Cautis has been a Professor at the Department of Computer Science of the University of Paris-Sud, France, since September 2013. Before that, he was an Associate Professor at Telecom ParisTech, Paris (2007–2013). His current research interests lie in the broad area of data management and information retrieval, including social data management and database theory. He has frequently served as a PC member for conferences including SIGMOD, VLDB, ICDE, and EDBT.