
Pentaho and NoSQL

This is the PowerPoint slide presentation given at the Jakarta Java Meet Up on 7th March 2014 at BliBli.com.

Transcript of "Pentaho and NoSQL"

  1. JaMU – Jakarta, 7 March 2014. Pentaho and NoSQL. Java Meet Up (JaMU), Jakarta, 7th March 2014. Feris Thia, feris@phi-integration.com, 08176-474-525
  2. ABOUT ME: Founder. 2007, 2013. Feris Thia. PHI-Integration
  3. ABOUT ME: Book Author. Feris Thia. November 2013
  4. ABOUT ME: Community Manager. Feris Thia. Excel Indonesia User Group (EIUG); Pentaho User Group Indonesia (Pentaho-ID). 2008 (~1000 members); 2013 (~5000 members)
  5. ABOUT ME: PHI-Integration Clients. Community Manager. Feris Thia
  6. AGENDA
     • DATA PREPARATION: What is it and why is it important?
     • PENTAHO DATA INTEGRATION: Popular Open Source ETL
     • NOSQL: An Emerging Non-Relational Database Technology
  7. PROBLEMS?
  8. "What caused the sales increase in this area? Did something unusual happen?" "WHAT?? So we cannot make any decisions until the data is ready." "We need some time to prepare additional data to answer that." "Yes, sir…." (Image source: http://www.huntbigsales.com/winning-in-the-meeting-after-the-meeting/)
  9. TYPICAL SOLUTION: SOPHISTICATED REPORTING OR DASHBOARD APPLICATION! (Image source: http://wrapbootstrap.com/preview/WB0KDM51J/)
  10. PROBLEMS REMAIN… (Image source: http://reallybadboss.com/wp-content/uploads/2012/02/frustration.jpg)
  11. Time spent on data preparation: 80% (Data Quality: 50%; Extract, Transformation & Load: 30%)
  12. (no text)
  13. DATA PREPARATION IS THE KEY: 1. Entry Systems; 2. Data Preparation; 3. Reporting (Basic Data Presentation); 4. Performance Dashboard (Visualization). Note: data preparation is often underestimated.
  14. DATA WAREHOUSE: 1. Entry Systems; 2. Data Warehouse; 3. Business Intelligence
  15. DATA WAREHOUSE
  16. CHALLENGES
  17. EXTRACT
      • INTEGRATION: of many data sources
      • INCREMENTAL: extract only changes
      • DATA SIZE: big data
      • INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
      • DATA QUALITY: missing data, conversion, etc.
      • PROTOCOL: driver availability, reliability, etc.
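The INCREMENTAL item above (extract only changes) is often handled with a "high-water mark": remember the latest modification timestamp already extracted and pull only rows newer than it. The following is a minimal in-memory sketch under assumptions, not a PDI feature; the row shape, the `updatedAt` field, and the `extractChanges` helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical source rows; `updatedAt` is an assumed last-modified column.
const sourceRows = [
  { id: 1, name: "A", updatedAt: "2014-03-01T10:00:00Z" },
  { id: 2, name: "B", updatedAt: "2014-03-05T08:30:00Z" },
  { id: 3, name: "C", updatedAt: "2014-03-06T12:00:00Z" },
];

// Return rows changed after `watermark`, plus the new watermark to store.
function extractChanges(rows, watermark) {
  const since = Date.parse(watermark);
  const changed = rows.filter((r) => Date.parse(r.updatedAt) > since);
  const newWatermark = changed.reduce(
    (max, r) => (Date.parse(r.updatedAt) > Date.parse(max) ? r.updatedAt : max),
    watermark
  );
  return { changed, newWatermark };
}

// First run picks up everything newer than the stored watermark.
const firstRun = extractChanges(sourceRows, "2014-03-02T00:00:00Z");
console.log(firstRun.changed.map((r) => r.id));

// Second run starts from the new watermark, so unchanged rows are skipped.
const secondRun = extractChanges(sourceRows, firstRun.newWatermark);
console.log(secondRun.changed.length);
```

This approach assumes the source reliably maintains a modification timestamp; sources without one usually need a different change-detection strategy.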
  18. TRANSFORM
      • NORMALIZE
      • DENORMALIZE
      • SPLIT / MERGE
      • DATA REDUCTION (aggregate, etc.)
      • TRANSPOSE
      • TEXT PARSING
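Two of the transform types listed above, data reduction (aggregation) and transpose, can be sketched in a few lines. The sample rows and the `aggregate` and `transpose` helpers below are hypothetical and only illustrate the idea; in PDI these operations would be configured as transformation steps rather than hand-written.

```javascript
// Hypothetical sales rows in "long" form: one row per region and month.
const sales = [
  { region: "Jakarta", month: "Jan", amount: 100 },
  { region: "Jakarta", month: "Feb", amount: 150 },
  { region: "Bandung", month: "Jan", amount: 80 },
  { region: "Jakarta", month: "Jan", amount: 20 },
];

// Data reduction: sum `amount` per (region, month).
function aggregate(rows) {
  const totals = new Map();
  for (const { region, month, amount } of rows) {
    const key = `${region}|${month}`;
    totals.set(key, (totals.get(key) || 0) + amount);
  }
  return [...totals].map(([key, amount]) => {
    const [region, month] = key.split("|");
    return { region, month, amount };
  });
}

// Transpose: turn month values into columns, one output row per region.
function transpose(rows) {
  const byRegion = new Map();
  for (const { region, month, amount } of rows) {
    if (!byRegion.has(region)) byRegion.set(region, { region });
    byRegion.get(region)[month] = amount;
  }
  return [...byRegion.values()];
}

const pivoted = transpose(aggregate(sales));
console.log(pivoted);
```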
  19. LOAD
      • PERFORMANCE: of many data sources
      • CHANGES: structure, data type, column size, etc.
      • DATA SIZE: big data
      • INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
      • DATA MAPPING: sync with correlated data
      • OUTPUT FORMAT: Excel, PDF, HTML, RDBMS, etc.
  20. DEMO: Data structure changes to increase SQL query performance.
  21. Pentaho Data Integration: Open Source ETL
  22. FEATURES AND BENEFITS
      • Open Source
      • Cost Efficient
      • More than 200 modules
      • Multi OS Platform
      • Working with emerging Big Data platforms
      • Low Learning Curve
  23. DEMO: 1. Basic Extract and Transformation; 2. More I/O; 3. Helper Table (Closure)
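The "Helper Table (Closure)" item refers to a closure table: a helper table that stores every ancestor-descendant pair of a hierarchy, so that "all descendants of X" becomes a plain filter instead of a recursive query. The sketch below builds such rows in memory from parent-child pairs. The node names and the `buildClosure` helper are hypothetical, and the code assumes a tree (each node has at most one parent and there are no cycles); it is not the demo shown in the talk.

```javascript
// Hypothetical parent-child rows (adjacency list); names are illustrative.
const edges = [
  { parent: "Root", child: "A" },
  { parent: "Root", child: "B" },
  { parent: "A", child: "A1" },
  { parent: "A1", child: "A1x" },
];

// Build closure rows {ancestor, descendant, depth}, including self rows (depth 0).
function buildClosure(edgeRows) {
  const parentOf = new Map(edgeRows.map((e) => [e.child, e.parent]));
  const nodes = new Set(edgeRows.flatMap((e) => [e.parent, e.child]));
  const rows = [];
  for (const node of nodes) {
    let ancestor = node;
    let depth = 0;
    // Walk up from each node to the root, emitting one row per ancestor.
    while (ancestor !== undefined) {
      rows.push({ ancestor, descendant: node, depth });
      ancestor = parentOf.get(ancestor);
      depth += 1;
    }
  }
  return rows;
}

const closure = buildClosure(edges);

// All descendants of "A" become a simple filter instead of a recursive query.
const descendantsOfA = closure
  .filter((row) => row.ancestor === "A" && row.depth > 0)
  .map((row) => row.descendant);
console.log(descendantsOfA);
```

The trade-off is extra storage and extra work on every hierarchy change in exchange for simpler and usually faster reads.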
  24. NoSQL: Not only SQL
  25. TIMELINE: Emergence of open source NoSQL
      • 1998: NoSQL coined
      • 2004: Google's MapReduce paper published
      • 2006: Hadoop started
      • 2007: MongoDB started, Neo4J initial release
      • 2008: Apache HBase, Apache Cassandra
      • 2009: Redis initial release
      • 2012: Google Spanner paper published
  26. NOSQL GROUPS
      • DOCUMENT: MongoDB, CouchDB, Riak
      • WIDE COLUMN: Cassandra, HBase, Hypertable
      • GRAPH: Neo4J, OrientDB
      • KEY-VALUE <K, V>: Redis, MemcacheDB, SimpleDB
  27. NOSQL VS SQL (source: http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/)
      • Key-Value
        – Use cases: in-memory cache, web-site analytics, log file analysis
        – Advantages: simple, replication, versioning, locking, transactions, and sorting, web-accessible, schema-less, distributed
        – Disadvantages: simple, small set of data types, limited transaction support
        – Key products: Redis, Scalaris, Tokyo Cabinet
      • Tabular or Columnar
        – Use cases: data mining, analytics
        – Advantages: rapid data aggregation, scalable, versioning, locking, web-accessible, schema-less, distributed
        – Disadvantages: limited transaction support
        – Key products: Google BigTable, HBase or HyperTable, Cassandra
      • Document Store
        – Use cases: document management, CRM, business continuity
        – Advantages: stores and retrieves unstructured documents, map/reduce, web-accessible, schema-less, distributed
        – Disadvantages: limited transaction support
        – Key products: CouchDB, MongoDB, Riak
      • Traditional Relational
        – Use cases: transaction processing, typical corporate workloads
        – Advantages: well documented and supported, mature code, widely implemented in production
        – Disadvantages: cost, vertical scaling, increased complexity
        – Key products: Oracle, Microsoft SQL Server, MySQL Cluster
  28. NoSQL VS SQL
      • Schemas are much more flexible
      • Non-relational (no joins)
      • Horizontal Scalability
        – Master – Slave
        – Peer-to-peer
      • Data Pipeline
        – Expressions
        – Functional Programming
      • ACID (Atomicity, Consistency, Isolation, Durability)
      • BASE (Basically Available, Soft state, Eventual consistency)
      • CAP (Consistency, Availability, Partition Tolerance)
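Horizontal scalability, listed above, usually means spreading records across several servers (shards) by key. The sketch below routes keys to shards with a simple hash and modulo. It is only an illustration under assumptions: the `hashKey` and `shardFor` helpers and the sample keys are hypothetical, and no particular database is claimed to distribute data this way.

```javascript
// Deterministic string hash (djb2 variant); chosen here only for illustration.
function hashKey(key) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = (h * 33 + key.charCodeAt(i)) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// Map a key to one of `shardCount` shards.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Distribute a few hypothetical keys over three in-memory "shards".
const shards = [[], [], []];
for (const key of ["user:1", "user:2", "user:3", "user:4", "user:5"]) {
  shards[shardFor(key, shards.length)].push(key);
}
console.log(shards);
```

Plain modulo routing moves most keys when the shard count changes, which is why range-based or consistent-hashing schemes are generally preferred in practice.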
  29. DB-ENGINES.COM DB RANKING PER 7 MARCH 2014
      Rank  Last Month  DBMS                  Database Model     Score    Changes
      1     1           Oracle                Relational DBMS    1491.8   -8.43
      2     2           MySQL                 Relational DBMS    1290.21  1.83
      3     3           Microsoft SQL Server  Relational DBMS    1205.28  -8.99
      4     4           PostgreSQL            Relational DBMS    235.06   4.61
      5     5           MongoDB               Document store     199.99   4.81
      6     6           DB2                   Relational DBMS    187.32   -1.14
      7     7           Microsoft Access      Relational DBMS    146.48   -6.4
      8     8           SQLite                Relational DBMS    92.98    -0.03
      9     9           Sybase ASE            Relational DBMS    81.55    -6.33
      10    10          Cassandra             Wide column store  78.09    -2.23
  30. MongoDB: Document Oriented Database
      • Schemaless
      • Distributed
      • Auto Sharding
      • Map Reduce Capabilities
      • Multi Platform
      • Structures
        – Database
        – Collections
        – Documents
      • Document
        – A record is a document
        – Similar to JSON Objects
  31. MongoDB: Basic Commands & Expressions
      • MongoDB Shell
      • Insert: db.koleksi.insert({nama: "PHI-Integration", type: "Company"})
      • Insert / Update: db.koleksi.update({nama: "PHI-Integration"}, {name: "Lightora"}, {upsert: true})
      • Delete: db.koleksi.remove({nama: "PHI-Integration", type: "Company"})
      • Read / Query: db.koleksi.find({nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]})
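To make the Read / Query filter above more concrete, the sketch below evaluates the same kind of filter (equality plus `$and`, `$gt`, `$lt`) against plain in-memory objects. It is an illustration only and not MongoDB's implementation: the `matches` helper is hypothetical, only these three operators are handled, and the sample documents and `posting` values are made up.

```javascript
// Minimal matcher supporting equality, $gt, $lt and $and.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    if (field === "$and") {
      // Every sub-filter in the array must match.
      return condition.every((sub) => matches(doc, sub));
    }
    if (condition !== null && typeof condition === "object") {
      // Operator object such as { $gt: 100 }.
      return Object.entries(condition).every(([op, value]) => {
        if (op === "$gt") return doc[field] > value;
        if (op === "$lt") return doc[field] < value;
        throw new Error(`Unsupported operator: ${op}`);
      });
    }
    // Plain value: exact equality.
    return doc[field] === condition;
  });
}

// Hypothetical documents standing in for the `koleksi` collection.
const koleksi = [
  { nama: "PHI-Integration", type: "Company", posting: 150 },
  { nama: "PHI-Integration", type: "Company", posting: 250 },
  { nama: "Lightora", type: "Company", posting: 120 },
];

const filter = {
  nama: "PHI-Integration",
  $and: [{ posting: { $gt: 100 } }, { posting: { $lt: 200 } }],
};

const found = koleksi.filter((doc) => matches(doc, filter));
console.log(found);
```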
  32. MONGODB DEMO: 1. Basic Commands; 2. PDI Extract and Load; 3. Aggregation Framework
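The Aggregation Framework item processes documents through a pipeline of stages, each stage consuming the output of the previous one. The sketch below imitates that idea in plain JavaScript with a filter stage and a group-and-sum stage, which correspond roughly to MongoDB's `$match` and `$group` with `$sum`. The `matchStage`, `groupSumStage`, and `runPipeline` helpers and the sample data are hypothetical; this is not the demo from the talk.

```javascript
// Hypothetical documents; field names follow the earlier slide examples.
const postings = [
  { nama: "PHI-Integration", type: "Company", posting: 150 },
  { nama: "Lightora", type: "Company", posting: 120 },
  { nama: "Feris", type: "Person", posting: 40 },
];

// Each stage takes an array of documents and returns a new array.
const matchStage = (predicate) => (docs) => docs.filter(predicate);

const groupSumStage = (keyField, sumField) => (docs) => {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) || 0) + doc[sumField]);
  }
  return [...totals].map(([_id, total]) => ({ _id, total }));
};

// Run stages left to right, feeding each stage's output to the next.
const runPipeline = (docs, stages) =>
  stages.reduce((acc, stage) => stage(acc), docs);

const result = runPipeline(postings, [
  matchStage((doc) => doc.posting > 100),
  groupSumStage("type", "posting"),
]);
console.log(result);
```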
  33. Neo4j: Graph Database. Node, Relationship, Properties, Cypher
  34. Neo4J: Basic Utility, Commands & Expressions
      • Neo4J Web Admin
      • Create Node: CREATE (n {property_name: "property_value"})
      • Create Relation: CREATE (n)-[:RELATION]->(m)
      • Where:
        – n and m are identifiers
        – :RELATION is the relation name
  35. Neo4J: Basic Commands & Expressions
      • Matching and Returning Objects: START emil=node:people(name='Emil') MATCH (emil)-[:MARRIED_TO]-(madde) RETURN madde
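The pattern in the query above (start at one node, follow a typed relationship in either direction, return the node on the other side) can be illustrated without a graph database. The sketch below models nodes with properties and typed relationships as plain arrays. The `neighbours` helper, the node "Peter", and the `KNOWS` relationship are hypothetical additions for illustration, and this is not how Neo4j stores or traverses data.

```javascript
// Nodes carry properties; relationships connect node ids and have a type.
const graphNodes = [
  { id: 1, properties: { name: "Emil" } },
  { id: 2, properties: { name: "Madde" } },
  { id: 3, properties: { name: "Peter" } },
];
const relationships = [
  { from: 1, to: 2, type: "MARRIED_TO" },
  { from: 1, to: 3, type: "KNOWS" },
];

// Find nodes linked to `startId` by `type`, ignoring direction
// (like the undirected pattern -[:TYPE]- in the slide's query).
function neighbours(startId, type) {
  const ids = relationships
    .filter((r) => r.type === type && (r.from === startId || r.to === startId))
    .map((r) => (r.from === startId ? r.to : r.from));
  return graphNodes.filter((n) => ids.includes(n.id));
}

// Look up the start node by property, then follow MARRIED_TO.
const emil = graphNodes.find((n) => n.properties.name === "Emil");
const spouses = neighbours(emil.id, "MARRIED_TO");
console.log(spouses.map((n) => n.properties.name));
```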
  36. HIERARCHICAL MODEL: Neo4j Case Demo. Root; Child 1, Child 2, Child 3, Child 4, Child 5
  37. Q&A
  38. CONTACT ME: Universitas Multimedia Nusantara, New Media Tower, Lv.12, Scientia Boulevard St., Tangerang, Banten, 15811. +6221-7038-7738 (phone), +628176-474-525 (mobile). https://www.facebook.com/feris.thia, @FerisThia, feris@phi-integration.com
  39. BIG THANK YOU!