NO-SQL and NewSQL impact on the next-generation SQL standard for unified big data systems. Pr Serge Miranda, University of Nice Sophia Antipolis (MBDS) and LIS, France; Gaetan Lescouflair, University of Nice (MBDS) and LIS (UMA)

Big Data pushes back the boundaries of computing (1 zettabyte (ZB) = 10^21 bytes = 1000 exabytes). [Figure: growth of the digital universe, from 0.1 ZB through 1.2 ZB and 2.8 ZB to 40 ZB in 2020*; the three Vs (Volume, Variety, Velocity); drivers: Mobility, Big Data, Cloud/IoT.] IDC estimates that by 2020, business transactions on the internet (business-to-business and business-to-consumer) will reach 450 billion per day. *Source: IDC, Digital Universe in 2020. Copyright Serge Miranda 2018
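The orders of magnitude above can be checked with plain unit arithmetic. This uses decimal SI prefixes. The per-second figure assumes, for illustration only, that the 450 billion daily transactions are spread evenly over 24 hours:

```latex
\begin{aligned}
1\,\mathrm{ZB} &= 10^{21}\,\mathrm{B} = 10^{3}\times 10^{18}\,\mathrm{B} = 1000\,\mathrm{EB}\\
\frac{450\times 10^{9}\ \text{transactions/day}}{86\,400\ \text{s/day}} &\approx 5.2\times 10^{6}\ \text{transactions/s}
\end{aligned}
```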

Plethora of BIG DATA MANAGEMENT SYSTEMS (inspired by Aslett, 2015). [Figure: landscape of data management systems, including time-series-oriented DBMSs: OpenTSDB, KairosDB.]

Goal? « An effective mathematical model that encompasses the concepts of SQL, NoSQL and NewSQL would enable their interoperability » (Kepner, MIT, 2016)

TWO complementary computational models in BIG DATA MANAGEMENT. Data management is a (theoretical) SCIENCE: CONCEPTS + METHODS + TOOLS. A « computational model » is STRUCTURES + OPERATORS. The data structures are SET, GRAPH, MATRIX and SERIES. The operators are set operators, graph traversal, ordering, sub-graph, linear algebra, … Under big data constraints (the 3 Vs), two computational models complement each other. DATA retrieval is driven by a USER QUERY. DATA analysis raises the questions of method and interpretation.

« VARIETY » (BIG DATA)? There are three kinds of DATA:
- STRUCTURED data (SQL, OQL), described by a SCHEMA;
- SEMI-STRUCTURED data (RDF, SPARQL, OWL), described by METADATA;
- NON-STRUCTURED data (N.O.SQL, i.e. Not Only SQL).
Copyright Serge Miranda 2018

Major SQL milestones:
- Relational DATA model (CODD 70): SQL, SQL2.
- Object-relational DATA model (3rd Manifesto, DATE 95): SQL3 (OQL).
- Semantic web (RDF 98): SPARQL, …RQL in 2004.
- Google BigTable (CHANG 08): NO-SQL.
- NEWSQL (CATTELL 10), (STON 13).
The timeline spans STRUCTURED DATA (predefined schema), SEMI-STRUCTURED DATA (metadata) and NON-STRUCTURED DATA.

Codd's relational algebra (the foundation of SQL). A « RELATION » is a « set » or a « predicate ». It is a two-dimensional array with four specific algebraic operators (Select, Project, Join, Division). A set of such arrays (the schema) gives an abstraction of the real world.

Two implementations serve different workloads:
- COLUMN implementation: decision support, ad hoc applications on BIG TABLES, value AGGREGATION, CLASSIFICATION algorithms (ML & DL).
- ROW (tuple) implementation: transaction support.

Closure + completeness + orthogonality of the relational ALGEBRA give a QUERY INTERFACE that retrieves DATA without any programming. The code is predefined once for EVERY TYPE of query: NON-PROCEDURAL PROGRAMMING!

NOTE: some operators (like join) are sensitive to data size. Join algorithms rely on sorting, indexes (B-tree), parallelism, …
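The four operators named above can be sketched on relations modelled as Python sets of rows. Each row is a frozenset of (attribute, value) pairs, an encoding chosen here only so that a relation is a true set. The FLIES and PLANES relations are hypothetical sample data. Division answers a "for all" question: which pilots fly every plane?

```python
# Relations as sets of rows; each row is a frozenset of (attribute, value) pairs
# so that rows are hashable and a relation has no duplicates (set semantics).

def rel(*rows):
    """Build a relation from plain dicts."""
    return frozenset(frozenset(r.items()) for r in rows)

def select(r, pred):
    """Selection: keep the rows satisfying the predicate."""
    return frozenset(t for t in r if pred(dict(t)))

def project(r, attrs):
    """Projection: keep only the given attributes; set semantics removes duplicates."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def join(r, s):
    """Natural join: combine rows that agree on all shared attributes."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

def divide(r, s, s_attrs):
    """Division r / s: rows of r (without the s attributes) paired in r with EVERY row of s."""
    candidates = {frozenset((a, v) for a, v in t if a not in s_attrs) for t in r}
    return frozenset(c for c in candidates if all(c | b in r for b in s))

FLIES = rel({"pilot": "Serge", "plane": "A320"},
            {"pilot": "Serge", "plane": "B747"},
            {"pilot": "Peter", "plane": "B747"})
PLANES = rel({"plane": "A320"}, {"plane": "B747"})

# Which pilots fly every plane? Only Serge.
print(divide(FLIES, PLANES, {"plane"}))
```

Every operator returns a relation, so results compose freely. This is the closure property the slide relies on.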

SQL2 (relational) example. Which pilots (number and name) living in Nice are on duty on flights departing from Nice?
SELECT DISTINCT pilot.pl#, pilot.plname FROM pilot, flight WHERE pilot.pl# = flight.pl# AND pilot.adr = 'Nice' AND flight.dc = 'Nice';
Codd's algebra:
V1 = Join Pilot (pl# = pl#) Flight
V2 = Select V1 (adr = 'Nice' and dc = 'Nice')
RES = Project V2 (pl#, plname)
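A minimal sketch, assuming hypothetical sample rows and renaming pl# to plnum (SQLite would need quoting for '#'). It answers the same question twice. The first answer comes from SQLite through Python's standard sqlite3 module. The second applies Codd's three steps (join, select, project) by hand. Both give the same set.

```python
import sqlite3

# Hypothetical sample data; pl# is renamed plnum for SQLite
PILOTS = [(1, "Serge", "Nice"), (2, "Peter", "Paris"), (3, "Paul", "Nice")]
FLIGHTS = [("AF100", 1, "Nice"), ("AF101", 2, "Nice"), ("AF102", 1, "Nice"), ("AF103", 3, "Paris")]

def sql_answer():
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE pilot (plnum INTEGER PRIMARY KEY, plname TEXT, adr TEXT)")
        con.execute("CREATE TABLE flight (fnum TEXT PRIMARY KEY, plnum INTEGER REFERENCES pilot, dc TEXT)")
        con.executemany("INSERT INTO pilot VALUES (?, ?, ?)", PILOTS)
        con.executemany("INSERT INTO flight VALUES (?, ?, ?)", FLIGHTS)
        rows = con.execute(
            "SELECT DISTINCT pilot.plnum, pilot.plname FROM pilot, flight "
            "WHERE pilot.plnum = flight.plnum AND pilot.adr = 'Nice' AND flight.dc = 'Nice'"
        ).fetchall()
        return set(rows)
    finally:
        con.close()

def algebra_answer():
    # V1 = Join Pilot (pl# = pl#) Flight
    v1 = [(p, f) for p in PILOTS for f in FLIGHTS if p[0] == f[1]]
    # V2 = Select V1 (adr = 'Nice' and dc = 'Nice')
    v2 = [(p, f) for p, f in v1 if p[2] == "Nice" and f[2] == "Nice"]
    # RES = Project V2 (pl#, plname); building a set removes duplicates, like DISTINCT
    return {(p[0], p[1]) for p, _ in v2}

print(sql_answer())  # {(1, 'Serge')}
```

Serge flies two flights from Nice. Without DISTINCT, the SQL result would list him twice, whereas the algebraic projection is a set.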

SQL3 (object-relational) example. Which pilots (number and name) living in Nice are on duty on flights departing from Nice?
SELECT REFPIL->PL#, REFPIL->PLNAME FROM flight WHERE DC = 'Nice' AND REFPIL->ADR = 'Nice';
Note: REFPIL is an attribute of REF type holding the ROWID (OID) of a Pilot row. A ROWID is a tuple pointer, not a value. « -> » is the dereferencing operator.
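The REF idea can be sketched in Python by letting each flight hold a reference to a Pilot object instead of a copy of its key. This stands in for the ROWID pointer, as an analogy rather than real SQL3 semantics. The query then reaches the pilot by dereferencing, with no join. All data and names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Pilot:
    plnum: int
    plname: str
    adr: str

@dataclass
class Flight:
    fnum: str
    dc: str
    refpil: Pilot  # plays the role of a REF: a pointer to the Pilot row, not a copy of its key

serge = Pilot(1, "Serge", "Nice")
peter = Pilot(2, "Peter", "Paris")
FLIGHTS = [
    Flight("AF100", "Nice", serge),
    Flight("AF101", "Nice", peter),
    Flight("AF102", "Paris", serge),
]

def nice_pilots_on_nice_flights(flights):
    # f.refpil.adr mirrors REFPIL->ADR: the pilot is reached by dereferencing, no join
    return {(f.refpil.plnum, f.refpil.plname)
            for f in flights
            if f.dc == "Nice" and f.refpil.adr == "Nice"}

print(nice_pilots_on_nice_flights(FLIGHTS))  # {(1, 'Serge')}
```

Because flights point at the same Pilot object, updating serge.adr is immediately visible through every flight. This is the pointer (not value) semantics the note describes.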

OQL (ODMG) example. Which pilots (number and name) living in Nice are on duty on flights departing from Nice?
SELECT p.pl#, p.plname FROM p IN pilot, f IN p.insureflight WHERE p.adr = 'Nice' AND f.dc = 'Nice';
Note: « insureflight » is a bidirectional REF pointer (relationship) defined in the ODMG schema from the PILOT class to the FLIGHT class.
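The OQL FROM clause ranges f over p.insureflight, so the inner iteration depends on the outer variable. A minimal Python sketch follows. It assumes hypothetical Pilot and Flight classes and uses an assign helper to keep both directions of the relationship consistent, as an ODMG inverse relationship would. The sample data is invented.

```python
class Pilot:
    def __init__(self, plnum, plname, adr):
        self.plnum, self.plname, self.adr = plnum, plname, adr
        self.insureflight = []  # PILOT -> FLIGHT side of the relationship (to-many)

class Flight:
    def __init__(self, fnum, dc):
        self.fnum, self.dc = fnum, dc
        self.pilot = None  # inverse side, FLIGHT -> PILOT

def assign(pilot, flight):
    """Set both directions together, as an ODMG inverse relationship would."""
    pilot.insureflight.append(flight)
    flight.pilot = pilot

def nice_query(pilots):
    # FROM p IN pilot, f IN p.insureflight: the inner range depends on p
    return {(p.plnum, p.plname)
            for p in pilots
            for f in p.insureflight
            if p.adr == "Nice" and f.dc == "Nice"}

serge, paul, peter = Pilot(1, "Serge", "Nice"), Pilot(3, "Paul", "Nice"), Pilot(2, "Peter", "Paris")
af100, af101, af103 = Flight("AF100", "Nice"), Flight("AF101", "Nice"), Flight("AF103", "Paris")
assign(serge, af100)
assign(peter, af101)
assign(paul, af103)
PILOTS = [serge, paul, peter]

print(nice_query(PILOTS))  # {(1, 'Serge')}
```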

RDF graph (example). [Figure: RDF graph with nodes :Serge, :AF100, AIRBUSA320 and Paul, linked by arcs labelled :ispasengeroflight, :isusedinflight, :insureflight and :drivesplane.]

SPARQL (example). Which pilots living in Nice are on duty on flights departing from Nice?
PREFIX rdf: <…>
SELECT ?pilote WHERE { GRAPH ?g { ?pilote rdf:address rdf:Nice . ?pilote rdf:insureflight ?vol . ?vol rdf:dc rdf:Nice . } }
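At its core, a SPARQL WHERE clause is a basic graph pattern. It is a set of triple patterns whose ?variables must bind consistently across all patterns. A minimal Python sketch of that matching follows. The triples are hypothetical and use a default ":" prefix instead of the slide's rdf: prefix. The GRAPH ?g wrapper is omitted.

```python
TRIPLES = {
    (":Serge", ":address", ":Nice"),
    (":Serge", ":insureflight", ":AF100"),
    (":AF100", ":dc", ":Nice"),
    (":Peter", ":address", ":Paris"),
    (":Peter", ":insureflight", ":AF101"),
    (":AF101", ":dc", ":Nice"),
    (":Paul", ":address", ":Nice"),
    (":Paul", ":insureflight", ":AF103"),
    (":AF103", ":dc", ":Paris"),
}

def unify(term, value, binding):
    """Match one pattern term against one RDF term, extending binding for ?variables."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value

def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)  # copy, so a failed match leaves the outer binding intact
        if all(unify(p, t, candidate) for p, t in zip(first, triple)):
            yield from match(rest, triples, candidate)

QUERY = [
    ("?pilote", ":address", ":Nice"),
    ("?pilote", ":insureflight", "?vol"),
    ("?vol", ":dc", ":Nice"),
]
print({b["?pilote"] for b in match(QUERY, TRIPLES)})  # {':Serge'}
```

The shared variables ?pilote and ?vol do the work of the join. Without the ?pilote :insureflight ?vol pattern, pilots and flights would be combined independently.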

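Mathematics underlying data management. Each row lists: data type; paradigm; properties; data model; math theory; data structures; data operators; reference.
- Structured data (TRANSACTION oriented): paradigm VALUE (Codd's RM) and POINTER/VALUE (object RM, Date's 3rd Manifesto); properties TIPS, RICE; math theory SET (RM) and GRAPH (object RM); data structures Relation/TABLE and NF2, CLASS; operators: relational algebra; references (Codd 70), (Date 95).
- Semi-structured DATA (SEARCH oriented): paradigm PREDICATE/VALUE; properties WHAT; data model RDF; math theory GRAPH; data structure CLASS; reference (RDF 98).
- UNSTRUCTURED DATA (analytics oriented): paradigm KEY/VALUE & graph; properties WHAT; data models key/blob, key/doc, key/column; math theory GRAPH; data structures CLASS & DOCUMENT; operators NOMAD ALGEBRA (NA) and Associative Array (AA) algebra; reference (Chang 08).
- NEWSQL (analytics oriented): paradigm VALUE & graph; data model RM (sparse); math theory MATRICES; data structures TABLES; operators NA & AA; reference (Cattell 10).
- Polystore / D4 model: math theory ARRAYS; data structures arrays; operators AA; reference (Duggan 15).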

SQL, RDF, NoSQL and NewSQL on an example. SQL: a set of rows within a table.
FLIGHT2
F#     PilotName  PlaneN
AF100  Serge      AirbusA320
AF101  Peter      B747
AF102  Serge      B747
SELECT * FROM FLIGHT2 WHERE PilotName = 'Serge';
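The FLIGHT2 query can be run as-is against an in-memory SQLite database through Python's sqlite3 module. In this sketch the column F# is renamed fnum, an assumption made to avoid quoting '#'. The pilot name is passed as a bound parameter rather than spliced into the SQL text.

```python
import sqlite3

# FLIGHT2 rows from the slide (F# renamed fnum, PlaneN renamed planen)
FLIGHT2 = [("AF100", "Serge", "AirbusA320"), ("AF101", "Peter", "B747"), ("AF102", "Serge", "B747")]

def flights_of(pilot_name):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE flight2 (fnum TEXT PRIMARY KEY, pilotname TEXT, planen TEXT)")
        con.executemany("INSERT INTO flight2 VALUES (?, ?, ?)", FLIGHT2)
        # The ? placeholder lets sqlite3 quote the value correctly
        return con.execute(
            "SELECT * FROM flight2 WHERE pilotname = ? ORDER BY fnum", (pilot_name,)
        ).fetchall()
    finally:
        con.close()

print(flights_of("Serge"))  # [('AF100', 'Serge', 'AirbusA320'), ('AF102', 'Serge', 'B747')]
```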

Semi-structured data: RDF graph for the first row (AF100, Serge, AirbusA320). [Figure: nodes :Serge, :AF100 and AIRBUSA320, linked by arcs labelled :isusedinflight, :insureflight and :drivesplane.]

NoSQL (graphs). [Figure: graph linking the pilots Serge and Peter to the flights AF100, AF101 and AF102, and the flights to the planes A320 and B747.]
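The same FLIGHT2 data can be sketched as an undirected graph (pilot to flight to plane). Questions then become traversals rather than joins. The adjacency-set representation and the breadth-first search below are illustrative choices, not a specific NoSQL product's API.

```python
from collections import defaultdict, deque

FLIGHT2 = [("AF100", "Serge", "A320"), ("AF101", "Peter", "B747"), ("AF102", "Serge", "B747")]

def build_graph(rows):
    """Adjacency sets: pilot - flight - plane edges, stored in both directions."""
    g = defaultdict(set)
    for flight, pilot, plane in rows:
        for a, b in ((pilot, flight), (flight, plane)):
            g[a].add(b)
            g[b].add(a)
    return g

def bfs(g, start):
    """Breadth-first traversal: number of hops from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in g[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

g = build_graph(FLIGHT2)
d = bfs(g, "Serge")
planes_of_serge = {n for n, hops in d.items() if hops == 2}  # pilot -> flight -> plane
print(sorted(planes_of_serge))  # ['A320', 'B747']
```

Peter is reached in 4 hops (Serge, AF102, B747, AF101, Peter). In the graph, "who shares a plane with Serge" is a path question.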

NewSQL (matrix). [Figure: matrices indexed by the flights AF100, AF101, AF102 and by the pilots Serge and Peter, labelled MT, V and MTV, i.e. the transpose of M multiplied by V.]
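One plausible reading of the slide is that M is a flight × pilot incidence matrix and V a flight × plane incidence matrix. Under that reading, the product MᵀV is a pilot × plane matrix counting the flights that link each pair. The reading itself is an assumption. This sketch stores the sparse matrices as associative arrays (dicts of dicts), in the spirit of the associative-array approach cited later.

```python
from collections import defaultdict

FLIGHT2 = [("AF100", "Serge", "A320"), ("AF101", "Peter", "B747"), ("AF102", "Serge", "B747")]

# Hypothetical incidence matrices, stored sparsely: row key -> {column key: value}
M = {f: {p: 1} for f, p, _ in FLIGHT2}   # flight x pilot
V = {f: {pl: 1} for f, _, pl in FLIGHT2}  # flight x plane

def transpose(a):
    t = defaultdict(dict)
    for r, row in a.items():
        for c, v in row.items():
            t[c][r] = v
    return dict(t)

def matmul(a, b):
    """Sparse product: out[r][c] = sum over k of a[r][k] * b[k][c]."""
    out = defaultdict(lambda: defaultdict(int))
    for r, row in a.items():
        for k, v in row.items():
            for c, w in b.get(k, {}).items():
                out[r][c] += v * w
    return {r: dict(row) for r, row in out.items()}

MTV = matmul(transpose(M), V)  # pilot x plane
print(MTV)  # {'Serge': {'A320': 1, 'B747': 1}, 'Peter': {'B747': 1}}
```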

Mathematical bridge between SQL, NoSQL and NewSQL.
Erik Meijer (Partner Architect, Microsoft), CACM 2011: « …the industry needs a common query language and data model to feed the ecosystem for key-value stores. The UnQL language presents an important practical next step in this process. We are looking forward to working with Couchbase and other industry leaders in the NoSQL space on taking the design to the next level. » (CACM 2011, on coSQL)
Jeremy Kepner (MIT, 2016): « An effective mathematical model that encompasses the concepts of SQL, NoSQL and NewSQL would enable their interoperability »

Bridging the gap between SQL, NoSQL and NewSQL: two existing formal approaches, « associative arrays » (MIT, 2016) and « category theory » (Microsoft Research, 2011).

SQL vs NoSQL/NewSQL. SQL focuses on SET THEORY for TRANSACTIONS. NoSQL and NewSQL focus on graph theory and matrix mathematics for high-performance DATA ANALYSIS. They rely on mathematical properties such as associativity, commutativity, distributivity, identity, annihilator and inverses.
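The properties listed above can be checked mechanically for a candidate pair of operations (⊕, ⊗). Here they are tested on a finite sample, which illustrates but does not prove them. The sketch compares ordinary arithmetic (+, ×) with the max-plus pair (max, +), a common choice in graph and matrix analytics. Both satisfy the semiring laws, but max has no inverses.

```python
import itertools
import math
import operator

def check(plus, times, zero, one, samples):
    """Test semiring-style laws of (plus, times, zero, one) on a finite sample."""
    triples = list(itertools.product(samples, repeat=3))
    return {
        "plus associative": all(plus(plus(a, b), c) == plus(a, plus(b, c)) for a, b, c in triples),
        "plus commutative": all(plus(a, b) == plus(b, a) for a, b, _ in triples),
        "times associative": all(times(times(a, b), c) == times(a, times(b, c)) for a, b, c in triples),
        "distributive": all(times(a, plus(b, c)) == plus(times(a, b), times(a, c)) for a, b, c in triples),
        "identities": all(plus(a, zero) == a and times(a, one) == a for a in samples),
        "annihilator": all(times(a, zero) == zero for a in samples),
        "additive inverses": all(any(plus(a, b) == zero for b in samples) for a in samples),
    }

SAMPLES = [-3, -1, 0, 1, 3]  # symmetric, so every value's negation is in the sample
ordinary = check(operator.add, operator.mul, 0, 1, SAMPLES)
max_plus = check(max, operator.add, -math.inf, 0, SAMPLES)
print(max_plus["additive inverses"])  # False
```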

coSQL, Microsoft Research (CACM, 2011)*. coSQL proposes a DATA MODEL for common NoSQL databases, namely key/value relationships. The authors demonstrate that this data model is the mathematical dual of Codd's relational data model of foreign-/primary-key relationships, the two being « instead deeply connected via beautiful mathematical theory (category theory) ».
*« A co-Relational Model of Data for Large Shared Data Banks », Erik Meijer and Gavin Bierman, Microsoft Research, CACM 2011.
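The intuition behind the duality can be sketched in a few lines. In the relational form, each child row holds a foreign key pointing to its parent. In the key/value form, each parent value holds references to its children, so the arrows are reversed. This toy conversion with hypothetical data only illustrates the arrow reversal. It is not the category-theoretic construction of the paper.

```python
# Relational form: child rows (flights) carry a foreign key to the parent (pilot)
PILOT = {1: {"plname": "Serge"}, 2: {"plname": "Peter"}}
FLIGHT = {"AF100": {"plnum": 1}, "AF101": {"plnum": 2}, "AF102": {"plnum": 1}}

def to_key_value(pilots, flights):
    """Reverse every FK arrow: each pilot document now lists its flights' keys."""
    store = {pk: {**row, "flights": []} for pk, row in pilots.items()}
    for flight_key, row in flights.items():
        store[row["plnum"]]["flights"].append(flight_key)
    return store

def to_relational(store):
    """Reverse the arrows back: each flight gets a foreign key to its pilot."""
    pilots = {pk: {k: v for k, v in doc.items() if k != "flights"} for pk, doc in store.items()}
    flights = {f: {"plnum": pk} for pk, doc in store.items() for f in doc["flights"]}
    return pilots, flights

store = to_key_value(PILOT, FLIGHT)
print(store[1])  # {'plname': 'Serge', 'flights': ['AF100', 'AF102']}
```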

Example: a « category » is a labelled directed graph (source: Wikipedia). The nodes of a category are « OBJECTS », with an identity arrow for each object. The arrows of a category are « MORPHISMS ». The key notions are « COMPOSITION », « IDENTITY » and « DUALITY ». Example: a schematic representation of a category with objects X, Y, Z and morphisms f, g and g ∘ f. The category's three identity morphisms 1_X, 1_Y and 1_Z, if explicitly represented, would appear as three arrows from the letters X, Y and Z to themselves, respectively.
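In the category of sets, morphisms are functions and composition is function composition. This lets the identity and associativity laws be checked directly. The sketch checks them on a finite sample only, with hypothetical morphisms f, g, h on integers. Duality is shown on a symbolic description of the arrows: the opposite category keeps the objects and reverses every arrow.

```python
def compose(g, f):
    """Return g ∘ f: apply f first, then g."""
    return lambda x: g(f(x))

def identity(x):
    """Identity morphism (here the same function serves as 1_X for every object X)."""
    return x

# Hypothetical morphisms between sets of integers (objects of the category Set)
def f(x):  # f: X -> Y
    return x + 1

def g(y):  # g: Y -> Z
    return 2 * y

def h(z):  # h: Z -> W
    return z - 3

SAMPLES = range(-5, 6)

def same(u, v):
    """Extensional equality, checked on the sample points only."""
    return all(u(x) == v(x) for x in SAMPLES)

identity_laws = same(compose(f, identity), f) and same(compose(identity, f), f)
associativity = same(compose(h, compose(g, f)), compose(compose(h, g), f))

# Duality: the opposite category has the same objects and every arrow reversed
ARROWS = {"f": ("X", "Y"), "g": ("Y", "Z"), "g∘f": ("X", "Z")}
OPPOSITE = {name: (cod, dom) for name, (dom, cod) in ARROWS.items()}
print(identity_laws, associativity, OPPOSITE["g∘f"])  # True True ('Z', 'X')
```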

« MONADS » (category theory) and UQL. Each row lists: big data management; mathematical model; paradigm.
- SQL: Codd's relational data model (SET THEORY or first-order LOGIC); VALUE, PRIMARY KEY/FOREIGN KEY, synchronous, ACID.
- N.O.SQL: GRAPHS & hypergraphs; KEY/VALUE, asynchronous, BASE.
- UQL: CATEGORY THEORY; MONADS (comprehension).
« key/value stores are the mathematical DUAL of SQL FK/PK » (coSQL, 2011)
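"Comprehension" refers to query comprehensions, which desugar into a monad's unit and bind. A minimal sketch with the list monad follows, on hypothetical PILOT and FLIGHT data. Generators become bind, and guards become "unit or empty". The three monad laws are then checked on a small example.

```python
def unit(x):
    """unit (return) of the list monad: wrap one value."""
    return [x]

def bind(m, k):
    """bind (>>=) of the list monad: apply k to every element and concatenate the results."""
    out = []
    for x in m:
        out.extend(k(x))
    return out

PILOT = [(1, "Serge", "Nice"), (2, "Peter", "Paris")]
FLIGHT = [("AF100", 1, "Nice"), ("AF101", 2, "Nice"), ("AF102", 1, "Paris")]

# { (p.pl#, p.plname) | p <- PILOT, f <- FLIGHT, p.pl# = f.pl#, p.adr = 'Nice', f.dc = 'Nice' }
answer = bind(PILOT, lambda p:
              bind(FLIGHT, lambda f:
                   unit((p[0], p[1])) if p[0] == f[1] and p[2] == "Nice" and f[2] == "Nice" else []))

# Monad laws on a small example
def k(x):
    return [x, x + 10]

def h(x):
    return [x * 2]

m = [1, 2, 3]
left_identity = bind(unit(5), k) == k(5)
right_identity = bind(m, unit) == m
assoc = bind(bind(m, k), h) == bind(m, lambda x: bind(k(x), h))
print(answer)  # [(1, 'Serge')]
```

Python's own list comprehension `[(p[0], p[1]) for p in PILOT for f in FLIGHT if ...]` is this same desugaring, written with built-in syntax.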

Towards a « MONAD ALGEBRA » for SQL and NoSQL (key/value data stores), with Machine Learning interfaces. F-algebras (F being a functor) generalize algebraic structures in category theory and can model data structures such as lists and trees. Lattices are F-algebras, and algebraic structures in general are F-algebras, … The operators to be emulated in the category algebra are:
- equivalence, rename;
- PROJECT (extended projection), SELECT, JOIN;
- UNION, INTERSECTION, DIFFERENCE;
- AGGREGATION (an aggregation function over all the values of a column);
- ML and DL (matrix) operators.
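A rough sketch of how some of the listed operators might be emulated follows. It is not the authors' Monad Algebra. SELECT, PROJECT and JOIN are written with nothing but the list monad's unit and bind. The set operators are simple list filters. AGGREGATION is a fold (catamorphism) over the list F-algebra F X = 1 + A × X. Relations are modelled here as lists of dicts (bags), and the data is hypothetical.

```python
def unit(x):
    return [x]

def bind(m, k):
    out = []
    for x in m:
        out.extend(k(x))
    return out

def select(rel, pred):
    return bind(rel, lambda t: unit(t) if pred(t) else [])

def project(rel, attrs):
    return bind(rel, lambda t: unit({a: t[a] for a in attrs}))

def join(r, s, on):
    return bind(r, lambda t: bind(s, lambda u: unit({**t, **u}) if t[on] == u[on] else []))

def union(r, s):
    return r + [t for t in s if t not in r]

def intersection(r, s):
    return [t for t in r if t in s]

def difference(r, s):
    return [t for t in r if t not in s]

def cata(nil, cons, xs):
    """Right fold: the catamorphism given by the list F-algebra (nil, cons)."""
    acc = nil
    for x in reversed(xs):
        acc = cons(x, acc)
    return acc

PILOT = [{"pl": 1, "plname": "Serge", "adr": "Nice"}, {"pl": 2, "plname": "Peter", "adr": "Paris"}]
FLIGHT = [{"f": "AF100", "pl": 1, "dc": "Nice"}, {"f": "AF101", "pl": 2, "dc": "Nice"},
          {"f": "AF102", "pl": 1, "dc": "Paris"}]

query = project(select(join(PILOT, FLIGHT, "pl"),
                       lambda t: t["adr"] == "Nice" and t["dc"] == "Nice"),
                ["pl", "plname"])
flight_count = cata(0, lambda x, acc: 1 + acc, FLIGHT)  # COUNT(*) as a fold
print(query, flight_count)  # [{'pl': 1, 'plname': 'Serge'}] 3
```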

BIG QUESTIONS?

BIO. Serge Miranda is a full-time Professor of Computer Science at the University of Nice Sophia Antipolis (UNS), France. He has held this position since October 1983, after a PhD at Toulouse University (France) and a Master's thesis at UCLA (University of California, Los Angeles) with an INRIA scholarship. Since 1992 he has run the MBDS master's degree and innovation laboratory, devoted to databases, Big Data and mobiquitous information systems. Industry partners contribute significant funding to prototype the information services of the future. MBDS is de facto an INNOVATION laboratory, which gave the key initial impetus for Nice becoming the first NFC city in Europe in […]. MBDS has been successfully exported to Haiti (since 1998), Morocco, Madagascar and Russia.
In December 2009, Serge Miranda founded the multidisciplinary university foundation DreamIT around the MBDS core and was its first president (until 2012). DreamIT was key in rebuilding the MBDS facility in Haiti, after the original MBDS building was destroyed in the devastating 2010 earthquake. In April 2011, the first mobiquitous smart building in America was inaugurated for the MBDS degree on the Science Campus of the University of Haiti in Port-au-Prince.
On 21 March 1998, he was decorated (Chevalier, Ordre du Mérite) by Senator Pierre Laffitte, founder of the Sophia Antipolis science park, on behalf of the French Ministry of Industry. The award recognized his original contribution to links between higher education and industry in the Sophia Antipolis science park.
Strictly confidential. Pr Serge Miranda (MBDS Director)