NO-SQL and NewSQL impact on the next-generation SQL standard for unified big data systems
Pr Serge Miranda, University of Nice Sophia Antipolis (MBDS) and LIS, France
Gaetan Lescouflair, University of Nice (MBDS) and LIS (UMA)
Big Data is the evolution of computing boundaries (one zettabyte (ZB) = 10^21 bytes = 1,000 exabytes)
[Chart: size of the digital universe in ZB; values shown include 0.1 ZB, 1.2 ZB, 2.8 ZB and 40 ZB (2020*)]
Volume, Variety, Velocity
IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day.
*Source: IDC, Digital Universe in 2020
Mobility, Big Data, Cloud/IoT
Copyright Serge Miranda 2018
Plethora of BIG DATA MANAGEMENT SYSTEMS (inspired by ASLETT 2015)
TIME-SERIES-oriented DBMSs: OpenTSDB, KairosDB
Goal? « An effective mathematical model that encompasses the concepts of SQL, NoSQL and NewSQL would enable their interoperability » (Kepner, MIT, 2016)
TWO complementary computational models in BIG DATA MANAGEMENT
DATA management is a (theoretical) SCIENCE: CONCEPTS + METHODS + TOOLS
« Computational model » = STRUCTURES + OPERATORS
- Data structures: SET, GRAPH, MATRIX, SERIES
- Operators: set operators, graph traversal, ordering, sub-graph, linear algebra, …
2 complementary computational models under big data constraints (3 Vs):
- DATA retrieval: USER QUERY
- DATA analysis: Method? Interpretation?
« VARIETY » (BIG DATA)?
- STRUCTURED data (SQL, OQL): SCHEMA
- UNSTRUCTURED data (N.O.SQL)
- SEMI-STRUCTURED data (RDF, SPARQL, OWL): METADATA
Major SQL milestones
- Relational data model (CODD70): SQL2
- Object-relational data model (3rd Manifesto) (DATE 95): SQL3 (OQL)
- Semantic web (RDF98): SPARQL, … RQL in 2004
- Google Bigtable (CHANG08): NO-SQL
- NewSQL: (CATTEL10), (STON13)
Data categories along the timeline: STRUCTURED DATA (predefined schema), SEMI-STRUCTURED DATA (metadata), NON-STRUCTURED DATA
Copyright Big Data 2018 Pr Serge Miranda, MBDS, Univ de Nice Sophia Antipolis
Codd's relational algebra (SQL foundation)
« RELATION »: a « set » or « predicate »; two-dimensional arrays with 4 specific algebraic operators (Select, Project, Join, Division) that give an abstraction of the real world as a set of arrays (schema)
- COLUMN implementation for decision support: AD HOC applications on BIG TABLES, value AGGREGATION, or CLASSIFICATION algorithms (ML & DL)
- LINE (tuple) implementation for transaction support
Closure + Completeness + Orthogonality of the relational ALGEBRA
QUERY INTERFACE to retrieve DATA without any programming (the code is predefined once for EVERY TYPE of SEARCH!): NON-PROCEDURAL PROGRAMMING!
NOTE: some operators (like join) are sensitive to data size, …; join algorithms rely on sorting, indexes (B-tree), parallelism, …
SQL2 (relational): example
Which pilots (number and name) from Nice are on duty on flights from Nice?
SELECT pilot.pl#, plname
FROM pilot, flight
WHERE pilot.pl# = flight.pl# AND pilot.adr = 'Nice' AND flight.dc = 'Nice';
Codd's algebra:
V1 = Join pilot (pl# = pl#) flight
V2 = Select V1 (adr = 'Nice' and dc = 'Nice')
RES = Project V2 (pl#, plname)
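The three algebra steps above (V1, V2, RES) can be sketched in Python over lists of dictionaries. This is only an illustration: the sample rows, the attribute name "pl" (standing in for pl#) and the helper functions join, select and project are assumptions, not part of the slides.

```python
# Hypothetical sample rows; the slide only names the attributes pl#, plname, adr, dc.
pilot = [
    {"pl": 1, "plname": "Serge", "adr": "Nice"},
    {"pl": 2, "plname": "Peter", "adr": "Paris"},
]
flight = [
    {"f": "AF100", "pl": 1, "dc": "Nice"},
    {"f": "AF101", "pl": 2, "dc": "Nice"},
    {"f": "AF102", "pl": 1, "dc": "Paris"},
]

def join(r, s, key):
    """Equi-join of two relations on a shared attribute name."""
    return [{**a, **b} for a in r for b in s if a[key] == b[key]]

def select(r, pred):
    """Keep the tuples that satisfy the predicate."""
    return [t for t in r if pred(t)]

def project(r, attrs):
    """Keep only the given attributes and drop duplicate tuples (set semantics)."""
    seen, out = set(), []
    for t in r:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

v1 = join(pilot, flight, "pl")
v2 = select(v1, lambda t: t["adr"] == "Nice" and t["dc"] == "Nice")
res = project(v2, ["pl", "plname"])
print(res)
```

Each operator takes relations and returns a relation, which is the closure property that lets the steps be chained.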
SQL3 (object-relational): example
Which pilots (number and name) from Nice are on duty on flights from Nice?
SELECT REFPIL->PL#, REFPIL->PLNAME
FROM flight
WHERE DC = 'Nice' AND REFPIL->ADR = 'Nice';
Note:
- REFPIL is an attribute of REF type holding the ROWID (OID) of a Pilot row (a Rowid is a tuple pointer, not a value)
- « -> »: dereferencing operator
OQL (ODMG): example
Which pilots (number and name) from Nice are on duty on flights from Nice?
SELECT p.Pl#, p.PLNAME
FROM p in pilot, f in p.insureflight
WHERE p.adr = "Nice" and f.dc = "Nice";
Note: « insureflight » is a bidirectional REF pointer defined in the ODMG schema from the PILOT class to the FLIGHT class
RDF graph (example)
[Figure: RDF graph with nodes :Serge, :AF100, AIRBUSA320 and Paul, and edges labelled :ispasengeroflight, :isusedinflight, :insureflight and :drivesplane]
SPARQL (example)
Which pilots from Nice are on duty on flights from Nice?
Prefix rdf :
SELECT ?pilote
WHERE { GRAPH ?g { ?pilote rdf:address rdf:Nice . ?pilote rdf:insureflight ?vol . ?vol rdf:dc rdf:Nice } }
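How such a graph pattern is evaluated can be sketched in Python as repeated triple-pattern matching with variable bindings. This is a minimal sketch and not a SPARQL engine: the triples are hypothetical, and it assumes that the insureflight predicate from the RDF graph slide links a pilot to a flight.

```python
# Hypothetical triples; predicate names follow the slides (address, dc, insureflight).
triples = {
    ("Serge", "address", "Nice"),
    ("Peter", "address", "Paris"),
    ("Serge", "insureflight", "AF100"),
    ("Peter", "insureflight", "AF101"),
    ("AF100", "dc", "Nice"),
    ("AF101", "dc", "Nice"),
}

def match(pattern, binding):
    """Yield every extension of `binding` under which `pattern` matches a triple.
    Terms starting with '?' are variables; all other terms are constants."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def query(patterns):
    """Evaluate a conjunction of triple patterns, one pattern at a time."""
    bindings = [{}]
    for p in patterns:
        bindings = [b2 for b in bindings for b2 in match(p, b)]
    return bindings

result = query([
    ("?pilot", "address", "Nice"),
    ("?pilot", "insureflight", "?flight"),
    ("?flight", "dc", "Nice"),
])
pilots = sorted({b["?pilot"] for b in result})
print(pilots)
```

Reusing the variable ?pilot in two patterns is what plays the role of the relational join.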
Mathematics underlying data management
Columns: data type; paradigm; properties; data model; math theory; data structures; data operators; reference
- Structured data (TRANSACTION oriented): VALUE, POINTER/VALUE; TIPS, RICE; Codd's RM, Object RM (Date's 3rd Manifesto); SET, GRAPH; relation/TABLE, NF2, CLASS; relational algebra; (Codd 70), (DATE 95)
- Semi-structured data (SEARCH oriented): PREDICATE/VALUE; WHAT; RDF data model; GRAPH; CLASS; (RDF 98)
- Unstructured data (analytics oriented): KEY/VALUE & graph; WHAT; key/blob, key/doc, key/column; GRAPH; CLASS & DOCUMENT; NOMAD ALGEBRA (NA), Associative Array (AA) algebra; (Chang 08)
- NEWSQL (analytics oriented): VALUE & graph; RM; (sparse) MATRICES; TABLES; NA & AA; (Cattel 10)
- Polystore // D4 model: ARRAYS; arrays; AA; (Duggan 15)
SQL, RDF, NoSQL and NewSQL on an example
SQL: a set of rows within a table
FLIGHT2
F#     PilotName  PlaneN
AF100  Serge      AirbusA320
AF101  Peter      B747
AF102  Serge      B747
SELECT * FROM FLIGHT2 WHERE PilotName = 'Serge';
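The FLIGHT2 example can be tried with Python's built-in sqlite3 module. This is a sketch under one assumption: the column F# is renamed fnum here, because # is not valid in an unquoted identifier.

```python
import sqlite3

# In-memory database holding the three FLIGHT2 rows from the slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FLIGHT2 (fnum TEXT PRIMARY KEY, pilotname TEXT, planen TEXT)"
)
conn.executemany(
    "INSERT INTO FLIGHT2 VALUES (?, ?, ?)",
    [
        ("AF100", "Serge", "AirbusA320"),
        ("AF101", "Peter", "B747"),
        ("AF102", "Serge", "B747"),
    ],
)

# Same query as on the slide, with the pilot name passed as a parameter.
rows = conn.execute(
    "SELECT * FROM FLIGHT2 WHERE pilotname = ? ORDER BY fnum", ("Serge",)
).fetchall()
conn.close()
print(rows)
```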
Semi-structured data: RDF graph (for the 1st row)
[Figure: RDF graph with nodes :Serge, :AF100 and AIRBUSA320, and edges labelled :isusedinflight, :insureflight and :drivesplane]
NoSQL (graphs)
[Figure: graph with nodes Serge, Peter, AF100, AF101, AF102, A320 and B747]
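The graph view of the same data can be sketched in Python as an adjacency list, where a query becomes a traversal. The edge directions (pilot to flight, flight to plane) are an assumption read off the FLIGHT2 rows, and neighbours is a hypothetical helper.

```python
# Adjacency list: pilot -> flights, flight -> plane (edges assumed from FLIGHT2).
graph = {
    "Serge": ["AF100", "AF102"],
    "Peter": ["AF101"],
    "AF100": ["A320"],
    "AF101": ["B747"],
    "AF102": ["B747"],
}

def neighbours(nodes):
    """One traversal step: every node reachable by one edge from `nodes`."""
    return {m for n in nodes for m in graph.get(n, [])}

flights = neighbours({"Serge"})   # one hop: flights flown by Serge
planes = neighbours(flights)      # two hops: planes used on those flights
print(sorted(flights), sorted(planes))
```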
NewSQL (matrix)
[Figure: matrix with rows AF100, AF101, AF102 and columns Serge, Peter; labels on the slide: MT, V, MTV]
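The matrix view can be sketched in plain Python. The entries of M are an assumption derived from the FLIGHT2 rows (1 where a flight is flown by a pilot), and reading the slide labels as a matrix, its transpose and an indicator vector is also an assumption. A query then becomes a matrix-vector product.

```python
flights = ["AF100", "AF101", "AF102"]
pilots = ["Serge", "Peter"]

# M[i][j] = 1 when flight i is flown by pilot j (values assumed from FLIGHT2).
M = [
    [1, 0],
    [0, 1],
    [1, 0],
]

def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]

def matvec(a, v):
    """Matrix-vector product over ordinary integer arithmetic."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

# Indicator vector over pilots selecting Serge: M v marks his flights.
v = [1, 0]
serge_flights = [f for f, x in zip(flights, matvec(M, v)) if x]

# Indicator vector over flights selecting AF101: M^T u marks its pilots.
u = [0, 1, 0]
af101_pilots = [p for p, x in zip(pilots, matvec(transpose(M), u)) if x]
print(serge_flights, af101_pilots)
```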
Mathematical bridge over SQL, NoSQL and NewSQL
Erik Meijer, Partner Architect, Microsoft (CACM 2011, on coSQL): « … the industry needs a common query language and data model to feed the ecosystem for key-value stores. The UnQL language presents an important practical next step in this process. We are looking forward to working with Couchbase and other industry leaders in the NoSQL space on taking the design to the next level. »
Jeremy Kepner (MIT, 2016): « An effective mathematical model that encompasses the concepts of SQL, NoSQL and NewSQL would enable their interoperability »
Bridging the gap between SQL, NoSQL and NewSQL
2 existing formal approaches:
- « associative arrays » (MIT, 2016)
- « category theory » (Microsoft Research, 2011)
SQL and NoSQL/NewSQL
SQL focuses on SET THEORY for TRANSACTIONS
NoSQL and NewSQL focus on graph theory and matrix mathematics for high-performance DATA ANALYSIS, with mathematical properties such as associativity, commutativity, distributivity, identity, annihilator and inverses
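Most of the properties listed above can be spot-checked in Python for a pair of operations. The helper is_semiring below is hypothetical: it tests the laws on a small sample only, so it is a sanity check and not a proof, and it does not cover inverses.

```python
import itertools
import math
import operator

def is_semiring(add, mul, zero, one, samples):
    """Check the semiring laws on every triple drawn from `samples`."""
    for a, b, c in itertools.product(samples, repeat=3):
        laws = [
            add(add(a, b), c) == add(a, add(b, c)),          # associativity of add
            mul(mul(a, b), c) == mul(a, mul(b, c)),          # associativity of mul
            add(a, b) == add(b, a),                          # commutativity of add
            mul(a, add(b, c)) == add(mul(a, b), mul(a, c)),  # distributivity
            add(a, zero) == a,                               # additive identity
            mul(a, one) == a,                                # multiplicative identity
            mul(a, zero) == zero,                            # annihilator
        ]
        if not all(laws):
            return False
    return True

samples = [0, 1, 2, 5]
# Ordinary arithmetic (+, x) with identities 0 and 1.
plus_times = is_semiring(operator.add, operator.mul, 0, 1, samples)
# Tropical (min, +): "addition" is min with identity infinity, "multiplication" is +.
min_plus = is_semiring(min, operator.add, math.inf, 0, samples + [math.inf])
# Subtraction is not associative, so (-, x) fails the check.
minus_times = is_semiring(operator.sub, operator.mul, 0, 1, samples)
print(plus_times, min_plus, minus_times)
```

Swapping the pair of operations while keeping these laws is what allows one matrix-style computation to express different graph analyses.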
coSQL, Microsoft Research (CACM, 2011)*
*« A co-Relational Model of Data for Large Shared Data Banks », Erik Meijer and Gavin Bierman, Microsoft Research, CACM 2011
The paper gives a DATA MODEL for common NoSQL databases, namely key/value relationships, and demonstrates that this data model is the mathematical dual of CODD's relational data model of foreign-/primary-key relationships.
The two models are « instead deeply connected via beautiful mathematical theory (category theory) »
EX: « Category » (labelled directed graph) (source: Wikipedia)
- Nodes of a category: « OBJECTS » (with an identity arrow for each object)
- Arrows of a category: « morphisms »
- « COMPOSITION », « IDENTITY » and « DUALITY »
EX: schematic representation of a category with objects X, Y, Z and morphisms f, g, g ∘ f. (The category's three identity morphisms 1_X, 1_Y and 1_Z, if explicitly represented, would appear as three arrows, from the letters X, Y, and Z to themselves, respectively.)
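Composition and identity can be illustrated in Python by treating types as objects and functions as morphisms. This is an informal sketch: the functions f, g and h are arbitrary examples chosen here, and the laws are only checked on one input.

```python
def compose(g, f):
    """Return the composite morphism g after f."""
    return lambda x: g(f(x))

def identity(x):
    """Identity morphism: returns its argument unchanged."""
    return x

f = len                      # X = str  -> Y = int
g = lambda n: n % 2 == 0     # Y = int  -> Z = bool
h = str                      # Z = bool -> str

g_after_f = compose(g, f)    # X -> Z

# Identity laws: composing with the identity changes nothing.
left = compose(identity, f)("Nice")
right = compose(f, identity)("Nice")

# Associativity: both groupings give the same composite.
assoc_a = compose(h, compose(g, f))("Nice")
assoc_b = compose(compose(h, g), f)("Nice")
print(g_after_f("Nice"), left, right, assoc_a, assoc_b)
```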
« MONADS » (Category) and UQL
Columns: big data management; mathematical model; paradigm
- SQL: Codd's relational data model (SET THEORY or 1st-order LOGIC); VALUE, PRIMARY KEY/FOREIGN KEY, synchronous, ACID
- N.O.SQL: GRAPHS & hypergraphs; KEY/VALUE, asynchronous, BASE
- UQL: CATEGORY THEORY; MONADS (comprehension)
« key/value stores are the mathematical DUAL of SQL FK/PK » (coSQL 2011)
Towards a « Monad Algebra » for NoSQL (key/value data stores) & SQL
F-algebras (F: functor) in category theory generalize algebraic structures to model data structures such as lists and trees:
- Lattices are F-algebras
- Algebraic structures are F-algebras
- …
Towards a « MONAD ALGEBRA » for SQL and NoSQL (key/value data stores), with machine learning interfaces
Operators to be emulated in category algebra:
- Equivalence, Rename
- PROJECT (extended projection), SELECT, JOIN
- UNION, INTERSECTION, DIFFERENCE
- AGGREGATION (aggregation function on all the values of a column)
- ML and DL (matrix)
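How SELECT, PROJECT and JOIN might be emulated with a monad can be sketched in Python using the list monad. This is an illustration under stated assumptions, not the algebra proposed in the slides: unit, bind and the three operators are hypothetical helpers, and the pilots collection is invented sample data.

```python
def unit(x):
    """Wrap a single value in the list monad."""
    return [x]

def bind(xs, f):
    """Apply f (value -> list) to every element and flatten the results."""
    return [y for x in xs for y in f(x)]

def select(xs, pred):
    return bind(xs, lambda x: unit(x) if pred(x) else [])

def project(xs, attrs):
    return bind(xs, lambda x: unit({a: x[a] for a in attrs}))

def join(xs, ys, pred):
    return bind(xs, lambda x: bind(ys, lambda y: unit({**x, **y}) if pred(x, y) else []))

# FLIGHT2 rows from the slides; column F# is renamed fnum.
flight2 = [
    {"fnum": "AF100", "pilotname": "Serge", "planen": "AirbusA320"},
    {"fnum": "AF101", "pilotname": "Peter", "planen": "B747"},
    {"fnum": "AF102", "pilotname": "Serge", "planen": "B747"},
]
# Hypothetical second collection, added only to demonstrate the join.
pilots = [
    {"pilotname": "Serge", "adr": "Nice"},
    {"pilotname": "Peter", "adr": "Paris"},
]

joined = join(flight2, pilots, lambda f, p: f["pilotname"] == p["pilotname"])
from_nice = select(joined, lambda t: t["adr"] == "Nice")
res = project(from_nice, ["fnum", "pilotname"])
print(res)
```

All three operators are defined through bind alone, which is the sense in which a monad comprehension can cover the relational core.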
BIG QUESTIONS?
BIO
Serge Miranda is a full-time Professor of Computer Science at the University of Nice Sophia Antipolis (UNS), France, a position he has held since October 1983, after a PhD at Toulouse University (France) and a Master's thesis at UCLA (University of California, Los Angeles, with an INRIA scholarship). Since 1992 he has run the MBDS master's degree and innovation laboratory, devoted to databases, Big Data and mobiquitous information systems, with significant financial involvement from industry partners to prototype the information services of the future. MBDS is de facto an INNOVATION laboratory, which gave the key initial impetus to Nice becoming the 1st NFC city in Europe in […]. MBDS has been successfully set up abroad in Haiti (since 1998), Morocco, Madagascar and Russia.
Serge Miranda founded the multidisciplinary university foundation DreamIT in December 2009 around the MBDS kernel and was its first president (until 2012). DreamIT was key in rebuilding the MBDS facility in Haiti: in April 2011, the first mobiquitous smart building in America was inaugurated for the MBDS degree on the Science Campus of the University of Haiti in Port-au-Prince (the initial MBDS building was destroyed during the devastating 2010 earthquake).
On March 21st, 1998, he was decorated (Chevalier, Ordre du Mérite) by Senator Pierre Laffitte (founder of the Sophia Antipolis science park) on behalf of the French Ministry of Industry, in recognition of his original contribution to links between higher education and industry in the Sophia Antipolis science park.
Strictly confidential. Pr Serge Miranda (MBDS Director)