SlideShare a Scribd company logo
2015- © GMC - 1
with Apache MetaModel
Unified access to all your
data points
2015- © GMC - 2
Who am I?
Kasper Sørensen, dad, geek, guitarist …
@kaspersor
Long-time developer and
PMC member of:
Founder also of another
nice open source project:
Principal Software Engineer @
2015- © GMC - 3
Session agenda
- Introduction to Apache MetaModel
- Use case: Query composition
- The role of Apache MetaModel in Big Data
- (New) architectural possibilities with Apache MetaModel
Introduction
to Apache MetaModel
2015- © GMC - 5
The Apache journey
2011-xx: MetaModel founded (outside of Apache)
2013-06: Incubation of Apache MetaModel starts
2014-12: Apache MetaModel is officially an Apache TLP
2015-08: Latest stable release, 4.3.6
2015-09: Latest RC, 4.4.0-RC1
- We're still a small project, just 10 committers (plus some
extra contributors), so we have room for you!
2015- © GMC - 6
Helicopter view
You can look at Apache MetaModel by it's formal description:
"… a uniform connector and query API
to many very different datastore types …"
But let's start with a problem to solve ...
2015- © GMC - 7
A problem
How to process multiple data sources while:
- Staying agnostic to the database engine.
- Respecting the metadata of the underlying database.
- Not repeating yourself.
- Avoiding fragility towards metadata changes.
- Mutating the data structure when needed.
We used to rely on ORM frameworks to handle the bulk of these ...
2015- © GMC - 8
ORM - Queries via domain models
public class Person {
public String getName() {...}
}
ORM.query(Person.class).where(Person::getName).eq("John Doe");
2015- © GMC - 9
Why an ORM might not work
Requires a domain model
- While many data-oriented apps are agnostic to a domain model.
- Sometimes the data itself is the domain.
Model cannot change at runtime
- Usually needed especially if your app creates new tables/entities
Type-safety through metadata assumptions
- This is quite common, yet it might be a problem that the model is statically
mapped and cannot survive e.g. column renames etc.
2015- © GMC - 10
Alternative to ORM: Use JDBC
Metadata discoverable via java.sql.DatabaseMetaData.
Queries can be assembled safely with a bit of String magic:
"SELECT " + columnNames + " FROM " + tableName + …
...
2015- © GMC - 11
Alternative to ORM: Use JDBC
Wrong!
It turns out that …
- not all databases have the same SQL "dialect".
- they also don't all implement DatabaseMetaData the same way.
- you cannot use this on much else than relational databases that use SQL.
- What about NoSQL, various file formats, web services etc.
2015- © GMC - 12
MetaModel - Queries via metadata model
DataContext dc = …
Table[] tables = dc.getDefaultSchema().getTables();
Table table = tables[0];
Column[] primaryKeys = table.getPrimaryKeys();
dc.query().from(table).selectAll().where(primaryKeys[0]).eq(42);
dc.query().from("person").selectAll().where("id").eq(42);
2015- © GMC - 13
MetaModel updates
UpdateableDataContext dc = …
dc.executeUpdate(new UpdateScript() {
// multiple updates go here - transactional characteristics (ACID, synchronization etc.)
// as per the capabilities of the concrete DataContext implementation.
});
dc.executeUpdate(new BatchUpdateScript() {
// multiple "batch/bulk" (non atomic) updates go here
});
// if I only want to do a single update, there are convenience classes for this
InsertInto insert = new InsertInto(table).value(nameColumn, "John Doe");
dc.executeUpdate(insert);
2015- © GMC - 14
MetaModel - Connectivity
Sometimes we have SQL, sometimes we have another native query engine (e.g.
Lucene) and sometimes we use MetaModel's own query engine.
(A few more connectors available via the MetaModel-extras (LGPL) project)
2015- © GMC - 15
Tables, Columns and SQL
– that doesn't sound right.
This is one of our trade-offs to make life easier!
We map NoSQL models (doc, key/value etc.) to a table-based model.
As a user you have some choices:
• Supply your own mapping (non-dynamic).
• Use schema inference provided by Apache MetaModel.
• Provide schema inference instructions
(e.g. turn certain key/value pairs into separate tables).
2015- © GMC - 16
Other key characteristic about MetaModel
• Non-intrusive, you can use it when you want to.
• It's just a library – no need for a running a separate service.
• Easy to test, stub and mock. Inject a replacement DataContext
instead of the real one (typically PojoDataContext).
• Easy to implement a new DataContext for your curious
database or data format – reuse our abstract
implementations.
Use case:
Query composition
2015- © GMC - 18
MetaModel schema model
2015- © GMC - 19
MetaModel query and schema model
2015- © GMC - 20
Query representation for developers?
"SELECT * FROM foo"
Query as a string
- Easy to write.
- Easy to read.
- Prone to parsing errors.
- Prone to typos etc.
- Fails at runtime.
Query q = new Query();
q.from(table).selectAll();
Query as an object
- More involved code to
read and write.
- Lends itself to inspection
and mutation.
- Fails at compile time.
2015- © GMC - 21
This STATUS=DELIVERED
filter never actually executes, it
just updates the query on the
ORDERS table.
Context-based Query optimization
You might not always need to
pass data around between
components …
Sometimes you can just pass
the query around!
The role of Apache MetaModel
in Big Data
2015- © GMC - 23
So how does this all
relate to Big Data?
2015- © GMC - 24
Big Data and the need for metadata
Variety
- Not only structured data
- Social
- Sensors
- Many new sources, but
also the “old”:
• Relational
• NoSQL
• Files
• Cloud
Volume
Volume
Velocity
Volume
Velocity
Variety
Volume
Velocity
Variety
Veracity ?
2015- © GMC - 29
Query language examples
SELECT * FROM customers
WHERE country_code = 'GB'
OR country_code IS NULL
db.customers.find({
$or: [
{"country.code": "GB"},
{"country.code": {$exists: false}}
]
})
for (line : customers.csv) {
values = parse(line);
country = values[country_index];
if (country == null || "GB".equals(country) {
emit(line);
}
}
SQL
CSV
MongoDB
2015- © GMC - 30
Query language examples
dataContext.query()
.from(customers)
.selectAll()
.where(countryCode).eq("GB")
.or(countryCode).isNull();
Any datastore
2015- © GMC - 31
Make it easy to ingest data in your lake
2015- © GMC - 32
Metadata to enable automation
Big Data Variety and Veracity means we will be handling:
● Different physical formats of the same data
● Different query engines
● Different quality-levels of data
How can we automate data ingestion in such a landscape?
2015- © GMC - 33
Metadata to enable automation
Big Data Variety and Veracity means we will be handling:
● Different physical formats of the same data
We need a uniform metamodel for all the datastores.
And enough metadata to infer the ingestion transformations needed.
● Different query engines
We need a uniform query API based on the metamodel.
● Different quality-levels of data
We need our ingestion target to be aware of the ingestion sources.
2015- © GMC - 34
Metadata to enable automation
Big Data Variety and Veracity means we will be handling:
● Different physical formats of the same data
We need a uniform metamodel for all the datastores.
And enough metadata to infer the ingestion transformations needed.
● Different query engines
We need a uniform query API based on the metamodel.
● Different quality-levels of data
We need our ingestion target to be aware of the ingestion sources.
As an industry we lack more
elaborate metadata to support this!
2015- © GMC - 35
Traditional view on metadata
user
id (pk) CHAR(32) java.lang.String not null
username VARCHAR(64) java.lang.String not null, unique
real_name VARCHAR(256) java.lang.String not null
address VARCHAR(256) java.lang.String nullable
age INT int nullable
2015- © GMC - 36
Traditional view on metadata
user
id (pk) CHAR(32) java.lang.String not null
username VARCHAR(64) java.lang.String not null, unique
real_name VARCHAR(256) java.lang.String not null
address VARCHAR(256) java.lang.String nullable
age INT int nullable
customer
id (pk) BIGINT long not null
firstname VARCHAR(128) java.lang.String not null
lastname VARCHAR(128) java.lang.String not null
street VARCHAR(64) java.lang.String not null
house number INT int not null
2015- © GMC - 37
A more elaborate metadata view
user
id (pk) CHAR(32) java.lang.String not null
username VARCHAR(64) java.lang.String not null, unique
real_name VARCHAR(256) java.lang.String not null
address VARCHAR(256) java.lang.String nullable
age INT int nullable
customer
id (pk) BIGINT long not null
firstname VARCHAR(128) java.lang.String not null
lastname VARCHAR(128) java.lang.String not null
street VARCHAR(64) java.lang.String not null
house number INT int not null
32 char UUID
Person full name
Address (unstructured)
Person first name
Person last name
Address part - street
Same real-
world
entity?
Address part - house number
Numeric nominal variable
Person age
Numeric ratio variable
2015- © GMC - 38
Elaborate metadata and querying
SELECT (columns with header 'name') FROM (customer and user)
SELECT firstname, lastname FROM customer
SELECT real_name FROM user
SELECT EXTRACT(full name FROM (name columns))
FROM (tables containing person names)
SELECT (person name columns) FROM (customer and user)
Static
Manual
Dynamic
Automated
(or supervised)
2015- © GMC - 39
Elaborate metadata and querying
SELECT street, house_number FROM customer
SELECT address FROM user
SELECT EXTRACT(street, hno, city, zip) FROM (Address part columns))
FROM (tables containing addresses)
SELECT (Address part columns) FROM (customer and user)
Static
Manual
Dynamic
Automated
(or supervised)
2015- © GMC - 40
Elaborate metadata and querying
Individual/specific DB connectors
JDBC
Apache MetaModel (today)
Apache MetaModel (future)
Static
Manual
Dynamic
Automated
(or supervised)
(New) architectural possibilities
with Apache MetaModel
2015- © GMC - 42
Data Integration scenario
Copy (ETL)
2015- © GMC - 43
Data Federation scenario
2015- © GMC - 44
What about speed?
Performance
- Performance is important in
development of MetaModel.
- But not in favor of uniformness.
- In some cases the metadata may
benefit performance by
automatically tweaking query
parameters (e.g. fetching
strategies).
- We usually expose the native
connector objects too.
Time to market
- MetaModel makes it easy to cover
all the data sources with the same
codebase.
- Typical 80/20 trade-off scenario.
- Avoid premature optimization.
2015- © GMC - 45
The future of Apache MetaModel?
Stuff currently being built (version 4.4.0):
• Pluggable operators in queries
• Pluggable functions in queries
• Phase-out Java 6 support
Ideas being prototyped:
• Elaborate metadata service
• Spark module (Turn a Query into a RDD or DataFrame)
• More connectors - Apache Solr, Neo4j, Couchbase, Riak
Let's see some code
Query a remote CSV file
https://github.com/kaspersorensen/ApacheBigDataMetaModelExample
Thank you!
Questions?

More Related Content

What's hot

Spark sql
Spark sqlSpark sql
Spark sql
Zahra Eskandari
 
Spark sql
Spark sqlSpark sql
Spark sql
Freeman Zhang
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
Ryan Bosshart
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
Yan Zhou
 
Data Migration with Spark to Hive
Data Migration with Spark to HiveData Migration with Spark to Hive
Data Migration with Spark to Hive
Databricks
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Legacy Typesafe (now Lightbend)
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
Sandish Kumar H N
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
Caserta
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
DataWorks Summit
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Simplilearn
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
Simplilearn
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons
Provectus
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
huguk
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
Bill Liu
 
Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...
Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...
Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...
DataWorks Summit
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
Takuya UESHIN
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Skills Matter
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
Eric Sun
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsar
Timothy Spann
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache Spark
Meeraj Kunnumpurath
 

What's hot (20)

Spark sql
Spark sqlSpark sql
Spark sql
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
 
Data Migration with Spark to Hive
Data Migration with Spark to HiveData Migration with Spark to Hive
Data Migration with Spark to Hive
 
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and DatabricksFour Things to Know About Reliable Spark Streaming with Typesafe and Databricks
Four Things to Know About Reliable Spark Streaming with Typesafe and Databricks
 
Spark,Hadoop,Presto Comparition
Spark,Hadoop,Presto ComparitionSpark,Hadoop,Presto Comparition
Spark,Hadoop,Presto Comparition
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
Big Data and Hadoop Guide
Big Data and Hadoop GuideBig Data and Hadoop Guide
Big Data and Hadoop Guide
 
Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons          Using Apache Spark as ETL engine. Pros and Cons
Using Apache Spark as ETL engine. Pros and Cons
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...
Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...
Easy, Scalable, Fault-tolerant stream processing with Structured Streaming in...
 
20140908 spark sql & catalyst
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Ingesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsarIngesting data at scale into elasticsearch with apache pulsar
Ingesting data at scale into elasticsearch with apache pulsar
 
Machine Learning by Example - Apache Spark
Machine Learning by Example - Apache SparkMachine Learning by Example - Apache Spark
Machine Learning by Example - Apache Spark
 

Similar to Apache MetaModel - unified access to all your data points

黑豹 ch4 ddd pattern practice (2)
黑豹 ch4 ddd pattern practice (2)黑豹 ch4 ddd pattern practice (2)
黑豹 ch4 ddd pattern practice (2)
Fong Liou
 
NoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONNoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSON
Mario Beck
 
Elevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBCElevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBC
MongoDB
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
James Anderson
 
FOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worldsFOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worlds
Andrew Morgan
 
Webinar on MongoDB BI Connectors
Webinar on MongoDB BI ConnectorsWebinar on MongoDB BI Connectors
Webinar on MongoDB BI Connectors
Sumit Sarkar
 
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News! ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
Embarcadero Technologies
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
DataWorks Summit
 
Ad hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and SparkAd hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and Spark
Mohammed Guller
 
Practical OData
Practical ODataPractical OData
Practical OData
Vagif Abilov
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
kbajda
 
Tutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaborationTutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaboration
PascalDesmarets1
 
Adding Data into your SOA with WSO2 WSAS
Adding Data into your SOA with WSO2 WSASAdding Data into your SOA with WSO2 WSAS
Adding Data into your SOA with WSO2 WSAS
sumedha.r
 
Mondrian - Geo Mondrian
Mondrian - Geo MondrianMondrian - Geo Mondrian
Mondrian - Geo Mondrian
Simone Campora
 
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Codemotion
 
From Legacy Database to Domain Layer Using a New Cincom VisualWorks Tool
From Legacy Database to Domain Layer Using a New Cincom VisualWorks ToolFrom Legacy Database to Domain Layer Using a New Cincom VisualWorks Tool
From Legacy Database to Domain Layer Using a New Cincom VisualWorks Tool
ESUG
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7
Morgan Tocker
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
Srivatsan Ramanujam
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
Deepak Chandramouli
 

Similar to Apache MetaModel - unified access to all your data points (20)

黑豹 ch4 ddd pattern practice (2)
黑豹 ch4 ddd pattern practice (2)黑豹 ch4 ddd pattern practice (2)
黑豹 ch4 ddd pattern practice (2)
 
NoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONNoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSON
 
Elevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBCElevate MongoDB with ODBC/JDBC
Elevate MongoDB with ODBC/JDBC
 
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
 
FOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worldsFOSDEM 2015 - NoSQL and SQL the best of both worlds
FOSDEM 2015 - NoSQL and SQL the best of both worlds
 
Webinar on MongoDB BI Connectors
Webinar on MongoDB BI ConnectorsWebinar on MongoDB BI Connectors
Webinar on MongoDB BI Connectors
 
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News! ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
Ad hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and SparkAd hoc analytics with Cassandra and Spark
Ad hoc analytics with Cassandra and Spark
 
Practical OData
Practical ODataPractical OData
Practical OData
 
Presto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 BostonPresto talk @ Global AI conference 2018 Boston
Presto talk @ Global AI conference 2018 Boston
 
Tutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaborationTutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaboration
 
Adding Data into your SOA with WSO2 WSAS
Adding Data into your SOA with WSO2 WSASAdding Data into your SOA with WSO2 WSAS
Adding Data into your SOA with WSO2 WSAS
 
Mondrian - Geo Mondrian
Mondrian - Geo MondrianMondrian - Geo Mondrian
Mondrian - Geo Mondrian
 
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
 
From Legacy Database to Domain Layer Using a New Cincom VisualWorks Tool
From Legacy Database to Domain Layer Using a New Cincom VisualWorks ToolFrom Legacy Database to Domain Layer Using a New Cincom VisualWorks Tool
From Legacy Database to Domain Layer Using a New Cincom VisualWorks Tool
 
Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7Upcoming changes in MySQL 5.7
Upcoming changes in MySQL 5.7
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
 
Scale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | GimelScale By The Bay | 2020 | Gimel
Scale By The Bay | 2020 | Gimel
 

Recently uploaded

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 

Recently uploaded (20)

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 

Apache MetaModel - unified access to all your data points

  • 1. 2015- © GMC - 1 with Apache MetaModel Unified access to all your data points
  • 2. 2015- © GMC - 2 Who am I? Kasper Sørensen, dad, geek, guitarist … @kaspersor Long-time developer and PMC member of: Founder also of another nice open source project: Principal Software Engineer @
  • 3. 2015- © GMC - 3 Session agenda - Introduction to Apache MetaModel - Use case: Query composition - The role of Apache MetaModel in Big Data - (New) architectural possibilities with Apache MetaModel
  • 5. 2015- © GMC - 5 The Apache journey 2011-xx: MetaModel founded (outside of Apache) 2013-06: Incubation of Apache MetaModel starts 2014-12: Apache MetaModel is officially an Apache TLP 2015-08: Latest stable release, 4.3.6 2015-09: Latest RC, 4.4.0-RC1 - We're still a small project, just 10 committers (plus some extra contributors), so we have room for you!
  • 6. 2015- © GMC - 6 Helicopter view You can look at Apache MetaModel by it's formal description: "… a uniform connector and query API to many very different datastore types …" But let's start with a problem to solve ...
  • 7. 2015- © GMC - 7 A problem How to process multiple data sources while: - Staying agnostic to the database engine. - Respecting the metadata of the underlying database. - Not repeating yourself. - Avoiding fragility towards metadata changes. - Mutating the data structure when needed. We used to rely on ORM frameworks to handle the bulk of these ...
  • 8. 2015- © GMC - 8 ORM - Queries via domain models public class Person { public String getName() {...} } ORM.query(Person.class).where(Person::getName).eq("John Doe");
  • 9. 2015- © GMC - 9 Why an ORM might not work Requires a domain model - While many data-oriented apps are agnostic to a domain model. - Sometimes the data itself is the domain. Model cannot change at runtime - Usually needed especially if your app creates new tables/entities Type-safety through metadata assumptions - This is quite common, yet it might be a problem that the model is statically mapped and cannot survive e.g. column renames etc.
  • 10. 2015- © GMC - 10 Alternative to ORM: Use JDBC Metadata discoverable via java.sql.DatabaseMetaData. Queries can be assembled safely with a bit of String magic: "SELECT " + columnNames + " FROM " + tableName + … ...
  • 11. 2015- © GMC - 11 Alternative to ORM: Use JDBC Wrong! It turns out that … - not all databases have the same SQL "dialect". - they also don't all implement DatabaseMetaData the same way. - you cannot use this on much else than relational databases that use SQL. - What about NoSQL, various file formats, web services etc.
  • 12. 2015- © GMC - 12 MetaModel - Queries via metadata model DataContext dc = … Table[] tables = dc.getDefaultSchema().getTables(); Table table = tables[0]; Column[] primaryKeys = table.getPrimaryKeys(); dc.query().from(table).selectAll().where(primaryKeys[0]).eq(42); dc.query().from("person").selectAll().where("id").eq(42);
  • 13. 2015- © GMC - 13 MetaModel updates UpdateableDataContext dc = … dc.executeUpdate(new UpdateScript() { // multiple updates go here - transactional characteristics (ACID, synchronization etc.) // as per the capabilities of the concrete DataContext implementation. }); dc.executeUpdate(new BatchUpdateScript() { // multiple "batch/bulk" (non atomic) updates go here }); // if I only want to do a single update, there are convenience classes for this InsertInto insert = new InsertInto(table).value(nameColumn, "John Doe"); dc.executeUpdate(insert);
  • 14. 2015- © GMC - 14 MetaModel - Connectivity Sometimes we have SQL, sometimes we have another native query engine (e.g. Lucene) and sometimes we use MetaModel's own query engine. (A few more connectors available via the MetaModel-extras (LGPL) project)
  • 15. 2015- © GMC - 15 Tables, Columns and SQL – that doesn't sound right. This is one of our trade-offs to make life easier! We map NoSQL models (doc, key/value etc.) to a table-based model. As a user you have some choices: • Supply your own mapping (non-dynamic). • Use schema inference provided by Apache MetaModel. • Provide schema inference instructions (e.g. turn certain key/value pairs into separate tables).
  • 16. 2015- © GMC - 16 Other key characteristic about MetaModel • Non-intrusive, you can use it when you want to. • It's just a library – no need for a running a separate service. • Easy to test, stub and mock. Inject a replacement DataContext instead of the real one (typically PojoDataContext). • Easy to implement a new DataContext for your curious database or data format – reuse our abstract implementations.
  • 18. 2015- © GMC - 18 MetaModel schema model
  • 19. 2015- © GMC - 19 MetaModel query and schema model
  • 20. 2015- © GMC - 20 Query representation for developers? "SELECT * FROM foo" Query as a string - Easy to write. - Easy to read. - Prone to parsing errors. - Prone to typos etc. - Fails at runtime. Query q = new Query(); q.from(table).selectAll(); Query as an object - More involved code to read and write. - Lends itself to inspection and mutation. - Fails at compile time.
  • 21. 2015- © GMC - 21 This STATUS=DELIVERED filter never actually executes, it just updates the query on the ORDERS table. Context-based Query optimization You might not always need to pass data around between components … Sometimes you can just pass the query around!
  • 22. The role of Apache MetaModel in Big Data
  • 23. 2015- © GMC - 23 So how does this all relate to Big Data?
  • 24. 2015- © GMC - 24 Big Data and the need for metadata Variety - Not only structured data - Social - Sensors - Many new sources, but also the “old”: • Relational • NoSQL • Files • Cloud
  • 29. 2015- © GMC - 29 Query language examples SELECT * FROM customers WHERE country_code = 'GB' OR country_code IS NULL db.customers.find({ $or: [ {"country.code": "GB"}, {"country.code": {$exists: false}} ] }) for (line : customers.csv) { values = parse(line); country = values[country_index]; if (country == null || "GB".equals(country) { emit(line); } } SQL CSV MongoDB
  • 30. 2015- © GMC - 30 Query language examples dataContext.query() .from(customers) .selectAll() .where(countryCode).eq("GB") .or(countryCode).isNull(); Any datastore
  • 31. 2015- © GMC - 31 Make it easy to ingest data in your lake
  • 32. 2015- © GMC - 32 Metadata to enable automation Big Data Variety and Veracity means we will be handling: ● Different physical formats of the same data ● Different query engines ● Different quality-levels of data How can we automate data ingestion in such a landscape?
  • 33. 2015- © GMC - 33 Metadata to enable automation Big Data Variety and Veracity means we will be handling: ● Different physical formats of the same data We need a uniform metamodel for all the datastores. And enough metadata to infer the ingestion transformations needed. ● Different query engines We need a uniform query API based on the metamodel. ● Different quality-levels of data We need our ingestion target to be aware of the ingestion sources.
  • 34. 2015- © GMC - 34 Metadata to enable automation Big Data Variety and Veracity means we will be handling: ● Different physical formats of the same data We need a uniform metamodel for all the datastores. And enough metadata to infer the ingestion transformations needed. ● Different query engines We need a uniform query API based on the metamodel. ● Different quality-levels of data We need our ingestion target to be aware of the ingestion sources. As an industry we lack more elaborate metadata to support this!
  • 35. 2015- © GMC - 35 Traditional view on metadata user id (pk) CHAR(32) java.lang.String not null username VARCHAR(64) java.lang.String not null, unique real_name VARCHAR(256) java.lang.String not null address VARCHAR(256) java.lang.String nullable age INT int nullable
  • 36. 2015- © GMC - 36 Traditional view on metadata user id (pk) CHAR(32) java.lang.String not null username VARCHAR(64) java.lang.String not null, unique real_name VARCHAR(256) java.lang.String not null address VARCHAR(256) java.lang.String nullable age INT int nullable customer id (pk) BIGINT long not null firstname VARCHAR(128) java.lang.String not null lastname VARCHAR(128) java.lang.String not null street VARCHAR(64) java.lang.String not null house number INT int not null
  • 37. 2015- © GMC - 37 A more elaborate metadata view user id (pk) CHAR(32) java.lang.String not null username VARCHAR(64) java.lang.String not null, unique real_name VARCHAR(256) java.lang.String not null address VARCHAR(256) java.lang.String nullable age INT int nullable customer id (pk) BIGINT long not null firstname VARCHAR(128) java.lang.String not null lastname VARCHAR(128) java.lang.String not null street VARCHAR(64) java.lang.String not null house number INT int not null 32 char UUID Person full name Address (unstructured) Person first name Person last name Address part - street Same real- world entity? Address part - house number Numeric nominal variable Person age Numeric ratio variable
  • 38. 2015- © GMC - 38 Elaborate metadata and querying SELECT (columns with header 'name') FROM (customer and user) SELECT firstname, lastname FROM customer SELECT real_name FROM user SELECT EXTRACT(full name FROM (name columns)) FROM (tables containing person names) SELECT (person name columns) FROM (customer and user) Static Manual Dynamic Automated (or supervised)
  • 39. 2015- © GMC - 39 Elaborate metadata and querying SELECT street, house_number FROM customer SELECT address FROM user SELECT EXTRACT(street, hno, city, zip) FROM (Address part columns)) FROM (tables containing addresses) SELECT (Address part columns) FROM (customer and user) Static Manual Dynamic Automated (or supervised)
  • 40. 2015- © GMC - 40 Elaborate metadata and querying Individual/specific DB connectors JDBC Apache MetaModel (today) Apache MetaModel (future) Static Manual Dynamic Automated (or supervised)
  • 42. 2015- © GMC - 42 Data Integration scenario Copy (ETL)
  • 43. 2015- © GMC - 43 Data Federation scenario
  • 44. 2015- © GMC - 44 What about speed? Performance - Performance is important in development of MetaModel. - But not in favor of uniformness. - In some cases the metadata may benefit performance by automatically tweaking query parameters (e.g. fetching strategies). - We usually expose the native connector objects too. Time to market - MetaModel makes it easy to cover all the data sources with the same codebase. - Typical 80/20 trade-off scenario. - Avoid premature optimization.
  • 45. 2015- © GMC - 45 The future of Apache MetaModel? Stuff currently being built (version 4.4.0): • Pluggable operators in queries • Pluggable functions in queries • Phase-out Java 6 support Ideas being prototyped: • Elaborate metadata service • Spark module (Turn a Query into a RDD or DataFrame) • More connectors - Apache Solr, Neo4j, Couchbase, Riak
  • 46. Let's see some code Query a remote CSV file https://github.com/kaspersorensen/ApacheBigDataMetaModelExample