Pentaho and NoSQL

This is the PowerPoint slide presentation given at the Jakarta Java Meet Up on 7 March 2014 at BliBli.com.

  1. Pentaho and NoSQL. Java Meet Up (JaMU), Jakarta, 7th March 2014. Feris Thia, feris@phi-integration.com, 08176-474-525. (Footer on every slide: JaMU – Jakarta, 7 March 2014.)
  2. ABOUT ME: Feris Thia, Founder, PHI-Integration. 2007, 2013.
  3. ABOUT ME: Feris Thia, Book Author. November 2013.
  4. ABOUT ME: Feris Thia, Community Manager. Excel Indonesia User Group (EIUG), Pentaho User Group Indonesia (Pentaho-ID). 2008 (~1000 members), 2013 (~5000 members).
  5. ABOUT ME: Feris Thia, Community Manager. PHI-Integration Clients.
  6. AGENDA
     • DATA PREPARATION: What is it and why is it important?
     • PENTAHO DATA INTEGRATION: Popular Open Source ETL
     • NOSQL: An Emerging Non-Relational Database Technology
  7. PROBLEMS?
  8. Speech bubbles (image source: http://www.huntbigsales.com/winning-in-the-meeting-after-the-meeting/):
     • "What caused sales to increase in this area? Did something unusual happen?"
     • "WHAT??"
     • "So we cannot make any decisions until the data is ready."
     • "We need some time to prepare additional data to answer that."
     • "Yes, sir…."
  9. TYPICAL SOLUTION: SOPHISTICATED REPORTING OR DASHBOARD APPLICATION! (Image source: http://wrapbootstrap.com/preview/WB0KDM51J/)
  10. PROBLEMS REMAIN… (Image source: http://reallybadboss.com/wp-content/uploads/2012/02/frustration.jpg)
  11. Time Spent on Data Preparation: 80%
     • Data Quality: 50%
     • Extract, Transformation & Load: 30%
  12. (Slide without text.)
  13. DATA PREPARATION IS THE KEY
     1. Entry Systems
     2. Data Preparation
     3. Reporting (Basic Data Presentation)
     4. Performance Dashboard (Visualization)
     Note: Data preparation is often underestimated.
  14. DATA WAREHOUSE
     1. Entry Systems
     2. Data Warehouse
     3. Business Intelligence
  15. DATA WAREHOUSE
  16. CHALLENGES
  17. EXTRACT
     • INTEGRATION: of many data sources
     • INCREMENTAL: extract only changes
     • DATA SIZE: big data
     • INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
     • DATA QUALITY: missing data, conversion, etc.
     • PROTOCOL: driver availability, reliability, etc.
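The "extract only changes" item can be illustrated with a watermark: remember the latest modification timestamp already extracted and ask the source only for newer rows. The following is a minimal sketch, not how Pentaho Data Integration implements it; the `sales` table, its `updated_at` column and the `extract_changes` helper are assumptions made for the illustration.

```python
import sqlite3

def extract_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # If nothing changed, keep the old watermark.
    new_watermark = rows[-1][2] if rows else last_seen
    return rows, new_watermark

# Hypothetical source table, built in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, 100.0, "2014-03-01"), (2, 250.0, "2014-03-05"), (3, 75.0, "2014-03-07")],
)

# First run picks up everything changed after 2 March.
first_batch, watermark = extract_changes(conn, "2014-03-02")
# A second run with the new watermark finds nothing new.
second_batch, watermark_after = extract_changes(conn, watermark)

print(first_batch)
print(second_batch)
```

The sketch relies on ISO-formatted date strings sorting in chronological order; a source without a reliable modification timestamp would need another change-detection approach.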
  18. TRANSFORM
     • NORMALIZE
     • DENORMALIZE
     • SPLIT / MERGE
     • DATA REDUCTION (aggregate, etc.)
     • TRANSPOSE
     • TEXT PARSING
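Three of the transformations listed above (denormalize, data reduction by aggregation, and transpose) can be sketched in a few lines of plain Python. The sample customers and orders, and the helper names, are hypothetical and only serve the illustration; an ETL tool would perform the same operations as configurable steps.

```python
from collections import defaultdict

# Hypothetical sample data: a normalized customer lookup and an order list.
customers = {
    1: {"name": "Andi", "city": "Jakarta"},
    2: {"name": "Budi", "city": "Bandung"},
}
orders = [
    {"customer_id": 1, "month": "Jan", "amount": 100},
    {"customer_id": 1, "month": "Feb", "amount": 150},
    {"customer_id": 2, "month": "Jan", "amount": 200},
]

def denormalize(orders, customers):
    """Copy the customer attributes onto every order row (a join)."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]

def aggregate(rows, key, value):
    """Data reduction: sum `value` per distinct `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

def transpose(rows, row_key, col_key, value):
    """Turn the values of `col_key` into columns, one output row per `row_key`."""
    pivot = defaultdict(dict)
    for row in rows:
        pivot[row[row_key]][row[col_key]] = row[value]
    return {key: dict(cols) for key, cols in pivot.items()}

flat = denormalize(orders, customers)
totals_by_city = aggregate(flat, "city", "amount")
amount_by_month = transpose(flat, "name", "month", "amount")

print(totals_by_city)
print(amount_by_month)
```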
  19. LOAD
     • PERFORMANCE: of many data sources
     • CHANGES: structure, data type, column size, etc.
     • DATA SIZE: big data
     • INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
     • DATA MAPPING: sync with correlated data
     • OUTPUT FORMAT: Excel, PDF, HTML, RDBMS, etc.
  20. DEMO: Data structure changes to increase SQL query performance.
  21. Pentaho Data Integration: Open Source ETL
  22. FEATURES AND BENEFITS
     • Open Source
     • Cost Efficient
     • More than 200 modules
     • Multi OS Platform
     • Working with emerging Big Data platforms
     • Low Learning Curve
  23. DEMO
     1. Basic Extract and Transformation
     2. More I/O
     3. Helper Table (Closure)
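A closure helper table stores every ancestor/descendant pair of a hierarchy, so that "all descendants of X" becomes a simple lookup instead of a recursive query. The sketch below builds such rows from parent links; the hierarchy, the `parent_of` mapping and the `build_closure` helper are hypothetical and are not taken from the demo itself. It assumes the parent links contain no cycles.

```python
def build_closure(parent_of):
    """Return (ancestor, descendant, depth) rows for every node.

    Each node is paired with itself at depth 0 and with every
    ancestor reached by walking up the parent links.
    """
    rows = []
    for node in parent_of:
        ancestor, depth = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent_of.get(ancestor)
            depth += 1
    return rows

# Hypothetical hierarchy: child -> parent (None marks the root).
parent_of = {
    "Root": None,
    "Child 1": "Root",
    "Child 2": "Root",
    "Child 3": "Child 1",
}

closure = build_closure(parent_of)

# All descendants of Root, at any depth, without recursion at query time.
descendants_of_root = sorted(
    descendant
    for ancestor, descendant, depth in closure
    if ancestor == "Root" and depth > 0
)
print(descendants_of_root)
```

In a relational database the same rows would live in a table with one row per pair, and the lookup would be a single filtered `SELECT`.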
  24. NoSQL: Not only SQL
  25. TIMELINE: Emergence of open source NoSQL
     • 1998: NoSQL coined
     • 2004: Google's MapReduce paper published
     • 2006: Hadoop started
     • 2007: MongoDB started, Neo4j initial release
     • 2008: Apache HBase, Apache Cassandra
     • 2009: Redis initial release
     • 2012: Google Spanner paper published
  26. NOSQL GROUPS
     • DOCUMENT: MongoDB, CouchDB, Riak
     • WIDE COLUMN: Cassandra, HBase, Hypertable
     • GRAPH: Neo4j, OrientDB
     • KEY-VALUE <K, V>: Redis, MemcacheDB, SimpleDB
  27. NOSQL VS SQL (source: http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/)
     • Key-Value
       – Use cases: in-memory cache, web-site analytics, log file analysis
       – Advantages: simple, replication, versioning, locking, transactions, sorting, web-accessible, schema-less, distributed
       – Disadvantages: simple, small set of data types, limited transaction support
       – Key products: Redis, Scalaris, Tokyo Cabinet
     • Tabular or Columnar
       – Use cases: data mining, analytics
       – Advantages: rapid data aggregation, scalable, versioning, locking, web-accessible, schema-less, distributed
       – Disadvantages: limited transaction support
       – Key products: Google BigTable, HBase or Hypertable, Cassandra
     • Document Store
       – Use cases: document management, CRM, business continuity
       – Advantages: stores and retrieves unstructured documents, map/reduce, web-accessible, schema-less, distributed
       – Disadvantages: limited transaction support
       – Key products: CouchDB, MongoDB, Riak
     • Traditional Relational
       – Use cases: transaction processing, typical corporate workloads
       – Advantages: well documented and supported, mature code, widely implemented in production
       – Disadvantages: cost, vertical scaling, increased complexity
       – Key products: Oracle, Microsoft SQL Server, MySQL Cluster
  28. NoSQL VS SQL
     • Schemas are much more flexible
     • Non-relational (no joins)
     • Horizontal scalability
     • Master-slave
     • Peer-to-peer
     • Data pipeline
       – Expressions
       – Functional programming
     • ACID (Atomicity, Consistency, Isolation, Durability)
     • BASE (Basically Available, Soft state, Eventual consistency)
     • CAP (Consistency, Availability, Partition Tolerance)
  29. DB-ENGINES.COM DB RANKING AS OF 7 MARCH 2014

     | Rank | Last Month | DBMS                 | Database Model    | Score   | Changes |
     |------|------------|----------------------|-------------------|---------|---------|
     | 1    | 1          | Oracle               | Relational DBMS   | 1491.8  | -8.43   |
     | 2    | 2          | MySQL                | Relational DBMS   | 1290.21 | 1.83    |
     | 3    | 3          | Microsoft SQL Server | Relational DBMS   | 1205.28 | -8.99   |
     | 4    | 4          | PostgreSQL           | Relational DBMS   | 235.06  | 4.61    |
     | 5    | 5          | MongoDB              | Document store    | 199.99  | 4.81    |
     | 6    | 6          | DB2                  | Relational DBMS   | 187.32  | -1.14   |
     | 7    | 7          | Microsoft Access     | Relational DBMS   | 146.48  | -6.4    |
     | 8    | 8          | SQLite               | Relational DBMS   | 92.98   | -0.03   |
     | 9    | 9          | Sybase ASE           | Relational DBMS   | 81.55   | -6.33   |
     | 10   | 10         | Cassandra            | Wide column store | 78.09   | -2.23   |
  30. MongoDB: Document Oriented Database
     • Schemaless
     • Distributed
     • Auto Sharding
     • Map Reduce Capabilities
     • Multi Platform
     • Structures
       – Database
       – Collections
       – Documents
     • Document
       – A record is a document
       – Similar to JSON Objects
  31. MongoDB: Basic Commands & Expressions
     • MongoDB Shell
     • Insert
         db.koleksi.insert({nama: "PHI-Integration", type: "Company"})
     • Insert / Update
         db.koleksi.update({nama: "PHI-Integration"}, {nama: "Lightora"}, {upsert: true})
     • Delete
         db.koleksi.remove({nama: "PHI-Integration", type: "Company"})
     • Read / Query
         db.koleksi.find({nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]})
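To make the meaning of the Read / Query filter concrete, the sketch below evaluates the same kind of filter against plain Python dictionaries. It is a simplified emulation for illustration only, not MongoDB's implementation and not a driver API: the `matches` helper is hypothetical and supports just field equality, `$gt`, `$lt` and `$and`, and the sample documents are invented.

```python
def matches(doc, query):
    """Return True if doc satisfies a (very small) subset of MongoDB filters."""
    for key, cond in query.items():
        if key == "$and":
            # Every sub-filter in the list must match.
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            # Operator document such as {"$gt": 100}.
            value = doc.get(key)
            for op, operand in cond.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                elif op == "$lt":
                    if value is None or not value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif doc.get(key) != cond:
            # Plain value: field must be equal.
            return False
    return True

# Hypothetical documents standing in for the `koleksi` collection.
koleksi = [
    {"nama": "PHI-Integration", "posting": 150},
    {"nama": "PHI-Integration", "posting": 250},
    {"nama": "Lightora", "posting": 120},
]

query = {
    "nama": "PHI-Integration",
    "$and": [{"posting": {"$gt": 100}}, {"posting": {"$lt": 200}}],
}

found = [doc for doc in koleksi if matches(doc, query)]
print(found)
```

Only the first document matches: it has the requested `nama` and a `posting` value strictly between 100 and 200.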
  32. MONGODB DEMO
     1. Basic Commands
     2. PDI Extract and Load
     3. Aggregation Framework
  33. Neo4j Graph Database: Node, Relationship, Properties, Cypher
  34. Neo4j: Basic Utility, Commands & Expressions
     • Neo4j Web Admin
     • Create Node
         CREATE (n {property_name: "property_value"})
     • Create Relation
         CREATE (n)-[:RELATION]->(m)
     • Where:
       – n and m are identifiers
       – :RELATION is the relation name
  35. Neo4j: Basic Commands & Expressions
     • Matching and Returning Objects
         START emil=node:people(name='Emil')
         MATCH (emil)-[:MARRIED_TO]-(madde)
         RETURN madde
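The matching query above looks up a start node by name and then follows `MARRIED_TO` relationships in either direction, since the pattern carries no arrow. The sketch below imitates that lookup on a tiny in-memory graph. It is an illustration of the idea rather than Neo4j's API; the node data (including the names "Madde" and "Johan") and the helper functions are hypothetical.

```python
# Hypothetical graph: nodes with properties, relationships as (start, type, end).
nodes = {
    1: {"name": "Emil"},
    2: {"name": "Madde"},
    3: {"name": "Johan"},
}
relationships = [
    (1, "MARRIED_TO", 2),
    (3, "KNOWS", 1),
]

def find_by_name(name):
    """Stand-in for the index lookup in the START clause."""
    return [node_id for node_id, props in nodes.items() if props["name"] == name]

def neighbours(node_id, rel_type):
    """Follow relationships of rel_type in either direction."""
    result = []
    for start, kind, end in relationships:
        if kind != rel_type:
            continue
        if start == node_id:
            result.append(end)
        elif end == node_id:
            result.append(start)
    return result

# Equivalent of: start at Emil, match MARRIED_TO in any direction, return the other node.
spouses = [
    nodes[other]["name"]
    for emil in find_by_name("Emil")
    for other in neighbours(emil, "MARRIED_TO")
]
print(spouses)
```

Because direction is ignored, starting from the other end of the relationship finds "Emil" in the same way.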
  36. HIERARCHICAL MODEL: Neo4j Case Demo (tree diagram with the nodes Root, Child 1, Child 2, Child 3, Child 4 and Child 5)
  37. Q&A
  38. CONTACT ME
     • Universitas Multimedia Nusantara, New Media Tower, Lv. 12, Scientia Boulevard St., Tangerang, Banten, 15811
     • +6221-7038-7738 (phone)
     • +628176-474-525 (mobile)
     • https://www.facebook.com/feris.thia
     • @FerisThia
     • feris@phi-integration.com
  39. BIG THANK YOU!
