Pentaho and NoSQL

This is the PowerPoint slide presentation given at the Jakarta Java Meet Up on 7th March 2014 at BliBli.com.

Usage Rights

© All Rights Reserved

    Pentaho and NoSQL: Presentation Transcript

    • Slide 1 (JaMU – Jakarta, 7 March 2014): Pentaho and NoSQL. Java Meet Up (JaMU), Jakarta, 7th March 2014. Feris Thia, feris@phi-integration.com, 08176-474-525
    • Slide 2: ABOUT ME. Founder. 2007, 2013. Feris Thia, PHI-Integration
    • Slide 3: ABOUT ME. Book Author. Feris Thia. November 2013
    • Slide 4: ABOUT ME. Community Manager. Feris Thia. Excel Indonesia User Group (EIUG), Pentaho User Group Indonesia (Pentaho-ID). 2008 (~1000 members), 2013 (~5000 members)
    • Slide 5: ABOUT ME. PHI-Integration Clients. Community Manager. Feris Thia
    • Slide 6: AGENDA
      • DATA PREPARATION: What is it and why is it important?
      • PENTAHO DATA INTEGRATION: Popular open source ETL
      • NOSQL: An emerging non-relational database technology
    • Slide 7: PROBLEMS?
    • Slide 8: "What caused the sales increase in this area? Did something unusual happen?" "WHAT?? So we cannot make any decisions until the data is ready." "We need some time to prepare additional data to answer that." "Yes, sir…." Image source: http://www.huntbigsales.com/winning-in-the-meeting-after-the-meeting/
    • Slide 9: TYPICAL SOLUTION: SOPHISTICATED REPORTING OR DASHBOARD APPLICATION! Image source: http://wrapbootstrap.com/preview/WB0KDM51J/
    • Slide 10: PROBLEMS REMAIN… Image source: http://reallybadboss.com/wp-content/uploads/2012/02/frustration.jpg
    • Slide 11: Time Spent on Data Preparation: 80%. Data Quality: 50%. Extract, Transformation & Load: 30%
    • Slide 12
    • Slide 13: DATA PREPARATION IS THE KEY. 1. Entry Systems, 2. Data Preparation, 3. Reporting (Basic Data Presentation), 4. Performance Dashboard (Visualization). Note: Data preparation is often underestimated.
    • Slide 14: DATA WAREHOUSE. 1. Entry Systems, 2. Data Warehouse, 3. Business Intelligence
    • Slide 15: DATA WAREHOUSE
    • Slide 16: CHALLENGES
    • Slide 17: EXTRACT
      • INTEGRATION: of many data sources
      • INCREMENTAL: extract only changes
      • DATA SIZE: big data
      • INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
      • DATA QUALITY: missing data, conversion, etc.
      • PROTOCOL: driver availability, reliability, etc.
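The INCREMENTAL challenge on the Extract slide ("extract only changes") is often handled with a watermark: remember the latest modification time already extracted and ask the source only for newer rows. The sketch below is a minimal illustration using an in-memory SQLite database. The `sales` table, its columns, and the `extract_changes` function are hypothetical names chosen for this example; they are not taken from the presentation or from Pentaho Data Integration.

```python
import sqlite3


def extract_changes(conn, last_watermark):
    """Return rows modified after last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # If nothing changed, keep the old watermark so the next run starts from it.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Hypothetical source table with ISO-8601 timestamps, which sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 100.0, "2014-03-01T09:00:00"),
        (2, 250.0, "2014-03-05T14:30:00"),
        (3, 75.0, "2014-03-07T08:15:00"),
    ],
)

# First run picks up rows changed since the stored watermark.
first_batch, watermark = extract_changes(conn, "2014-03-02T00:00:00")
# A second run with the updated watermark finds nothing new.
second_batch, watermark_after = extract_changes(conn, watermark)
```

This approach assumes the source keeps a reliable modification timestamp; where it does not, other change-detection techniques would be needed.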
    • Slide 18: TRANSFORM
      • NORMALIZE
      • DENORMALIZE
      • SPLIT / MERGE
      • DATA REDUCTION (aggregate, etc.)
      • TRANSPOSE
      • TEXT PARSING
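Two of the transform operations listed on this slide, data reduction (aggregate) and transpose, can be illustrated with a few lines of plain Python. This is only a sketch on made-up sample rows; the field names `region`, `month`, and `sales` are assumptions for the example, and an ETL tool would normally do this work with its own steps.

```python
from collections import defaultdict

# Hypothetical detail rows, as they might arrive from an entry system.
rows = [
    {"region": "West", "month": "Jan", "sales": 120},
    {"region": "West", "month": "Feb", "sales": 150},
    {"region": "East", "month": "Jan", "sales": 90},
    {"region": "East", "month": "Jan", "sales": 30},
]


def aggregate(detail_rows):
    """Data reduction: sum sales per (region, month) pair."""
    totals = defaultdict(int)
    for row in detail_rows:
        totals[(row["region"], row["month"])] += row["sales"]
    return dict(totals)


def transpose(totals):
    """Transpose: turn month values into columns, one entry per region."""
    table = defaultdict(dict)
    for (region, month), total in totals.items():
        table[region][month] = total
    return dict(table)


totals = aggregate(rows)
pivot = transpose(totals)
```

After these two steps each region has a single record with one value per month, which is the shape a report or dashboard usually wants.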
    • Slide 19: LOAD
      • PERFORMANCE: of many data sources
      • CHANGES: structure, data type, column size, etc.
      • DATA SIZE: big data
      • INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
      • DATA MAPPING: sync with correlated data
      • OUTPUT FORMAT: Excel, PDF, HTML, RDBMS, etc.
    • Slide 20: DEMO. Data structure changes to increase SQL query performance.
    • Slide 21: Pentaho Data Integration. Open Source ETL
    • Slide 22: FEATURES AND BENEFITS
      • Open Source
      • Cost Efficient
      • More than 200 modules
      • Multi OS Platform
      • Working with emerging Big Data platforms
      • Low Learning Curve
    • Slide 23: DEMO. 1. Basic Extract and Transformation, 2. More I/O, 3. Helper Table (Closure)
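The slide does not describe the "Helper Table (Closure)" demo further. Assuming it refers to the common closure-table pattern, the idea is to store one row for every ancestor-descendant pair in a hierarchy, so that "all descendants of X" becomes a simple filter instead of a recursive query. The sketch below builds such rows in plain Python; the tree and the `build_closure` function are hypothetical and are not the presenter's demo data.

```python
def build_closure(parent_of):
    """Return (ancestor, descendant, depth) rows for every pair on a path.

    Each node is also paired with itself at depth 0.
    """
    closure = []
    for node in parent_of:
        ancestor, depth = node, 0
        # Walk up from the node to the root, emitting one row per step.
        while ancestor is not None:
            closure.append((ancestor, node, depth))
            ancestor = parent_of[ancestor]
            depth += 1
    return closure


# Hypothetical hierarchy: each key maps to its parent, the root maps to None.
parent_of = {
    "Root": None,
    "Child 1": "Root",
    "Child 2": "Root",
    "Child 3": "Child 1",
}

closure = build_closure(parent_of)

# With the helper table in place, finding all descendants needs no recursion.
descendants_of_root = sorted(
    descendant
    for ancestor, descendant, depth in closure
    if ancestor == "Root" and depth > 0
)
```

The trade-off is extra storage and extra work on load in exchange for simpler and faster reads.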
    • Slide 24: NoSQL. Not only SQL
    • Slide 25: TIMELINE. Emergence of open source NoSQL
      • 1998: NoSQL coined
      • 2004: Google's MapReduce paper published
      • 2006: Hadoop started
      • 2007: MongoDB started, Neo4J initial release
      • 2008: Apache HBase, Apache Cassandra
      • 2009: Redis initial release
      • 2012: Google Spanner paper published
    • Slide 26: NOSQL GROUPS
      • DOCUMENT: MongoDB, CouchDB, Riak
      • WIDE COLUMN: Cassandra, HBase, Hypertable
      • GRAPH: Neo4J, OrientDB
      • KEY-VALUE <K, V>: Redis, MemcacheDB, SimpleDB
    • Slide 27: NOSQL VS SQL. Source: http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/
      • Key-Value
        – Use cases: in-memory cache, web-site analytics, log file analysis
        – Advantages: simple, replication, versioning, locking, transactions, and sorting, web-accessible, schema-less, distributed
        – Disadvantages: simple, small set of data types, limited transaction support
        – Key products: Redis, Scalaris, Tokyo Cabinet
      • Tabular or Columnar
        – Use cases: data mining, analytics
        – Advantages: rapid data aggregation, scalable, versioning, locking, web-accessible, schema-less, distributed
        – Disadvantages: limited transaction support
        – Key products: Google BigTable, HBase or HyperTable, Cassandra
      • Document Store
        – Use cases: document management, CRM, business continuity
        – Advantages: stores and retrieves unstructured documents, map/reduce, web-accessible, schema-less, distributed
        – Disadvantages: limited transaction support
        – Key products: CouchDB, MongoDB, Riak
      • Traditional Relational
        – Use cases: transaction processing, typical corporate workloads
        – Advantages: well documented and supported, mature code, widely implemented in production
        – Disadvantages: cost, vertical scaling, increased complexity
        – Key products: Oracle, Microsoft SQL Server, MySQL Cluster
    • Slide 28: NoSQL VS SQL
      • Schemas are much more flexible
      • Non-relational (no joins)
      • Horizontal scalability
      • Master-slave
      • Peer-to-peer
      • Data pipeline
        – Expressions
        – Functional programming
      • ACID (Atomicity, Consistency, Isolation, Durability)
      • BASE (Basically Available, Soft state, Eventual consistency)
      • CAP (Consistency, Availability, Partition tolerance)
    • Slide 29: DB-ENGINES.COM DB RANKING PER 7 MARCH 2014
      Rank | Last Month | DBMS | Database Model | Score | Changes
      1 | 1 | Oracle | Relational DBMS | 1491.8 | -8.43
      2 | 2 | MySQL | Relational DBMS | 1290.21 | 1.83
      3 | 3 | Microsoft SQL Server | Relational DBMS | 1205.28 | -8.99
      4 | 4 | PostgreSQL | Relational DBMS | 235.06 | 4.61
      5 | 5 | MongoDB | Document store | 199.99 | 4.81
      6 | 6 | DB2 | Relational DBMS | 187.32 | -1.14
      7 | 7 | Microsoft Access | Relational DBMS | 146.48 | -6.4
      8 | 8 | SQLite | Relational DBMS | 92.98 | -0.03
      9 | 9 | Sybase ASE | Relational DBMS | 81.55 | -6.33
      10 | 10 | Cassandra | Wide column store | 78.09 | -2.23
    • Slide 30: MongoDB. Document Oriented Database
      • Schemaless
      • Distributed
      • Auto Sharding
      • Map Reduce Capabilities
      • Multi Platform
      • Structures
        – Database
        – Collections
        – Documents
      • Document
        – A record is a document
        – Similar to JSON Objects
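The database, collection, and document structure described on this slide can be pictured with plain Python objects. The sketch below is only an analogy, not a MongoDB client: the collection name `koleksi` is borrowed from the shell examples in this deck, while the `roles` and `posting` fields and the second document are made up for illustration. It shows what "schemaless" means in practice: two documents in the same collection need not share the same fields.

```python
import json

# Hypothetical database -> collection -> documents layout.
database = {
    "koleksi": [
        {"nama": "PHI-Integration", "type": "Company", "posting": 150},
        {"nama": "Feris Thia", "roles": ["Founder", "Author"]},
    ]
}


def field_names(collection):
    """Collect every field name that appears in at least one document."""
    names = set()
    for document in collection:
        names.update(document.keys())
    return sorted(names)


# Each document serializes naturally to a JSON object.
serialized = [json.dumps(doc, sort_keys=True) for doc in database["koleksi"]]
all_fields = field_names(database["koleksi"])
```

No table definition had to change to store the second document, which is the flexibility the slide refers to.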
    • Slide 31: MongoDB. Basic Commands & Expressions
      • MongoDB Shell
      • Insert
          db.koleksi.insert({nama: "PHI-Integration", type: "Company"})
      • Insert / Update
          db.koleksi.update({nama: "PHI-Integration"}, {nama: "Lightora"}, {upsert: true})
      • Delete
          db.koleksi.remove({nama: "PHI-Integration", type: "Company"})
      • Read / Query
          db.koleksi.find({nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]})
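To make the meaning of the Read / Query filter on this slide concrete, the sketch below evaluates the same kind of filter against in-memory dictionaries. It is a toy matcher written for this explanation, not MongoDB's implementation and not a driver API: the `matches` function is hypothetical, it supports only equality, `$gt`, `$lt`, and `$and`, and the sample documents are made up.

```python
def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for key, condition in query.items():
        if key == "$and":
            # Every sub-query in the list must match.
            if not all(matches(document, sub) for sub in condition):
                return False
        elif isinstance(condition, dict):
            # Operator form, e.g. {"posting": {"$gt": 100}}.
            value = document.get(key)
            for operator, operand in condition.items():
                if operator == "$gt":
                    ok = value is not None and value > operand
                elif operator == "$lt":
                    ok = value is not None and value < operand
                else:
                    raise ValueError(f"unsupported operator: {operator}")
                if not ok:
                    return False
        elif document.get(key) != condition:
            # Plain value form means equality.
            return False
    return True


# Hypothetical contents of the "koleksi" collection.
koleksi = [
    {"nama": "PHI-Integration", "type": "Company", "posting": 150},
    {"nama": "PHI-Integration", "type": "Company", "posting": 250},
    {"nama": "Lightora", "type": "Company", "posting": 120},
]

query = {
    "nama": "PHI-Integration",
    "$and": [{"posting": {"$gt": 100}}, {"posting": {"$lt": 200}}],
}

found = [doc for doc in koleksi if matches(doc, query)]
```

Only the first sample document satisfies both the name and the range conditions; both bounds are exclusive.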
    • Slide 32: MONGODB DEMO. 1. Basic Commands, 2. PDI Extract and Load, 3. Aggregation Framework
    • Slide 33: Neo4j Graph Database. Properties, Relationship, Cypher, Node
    • Slide 34: Neo4J. Basic Utility, Commands & Expressions
      • Neo4J Web Admin
      • Create Node
          CREATE (n {property_name: "property_value"})
      • Create Relation
          CREATE n-[:RELATION]->m
      • Where:
        – n and m are identifiers
        – :RELATION is the relation name
    • Slide 35: Neo4J. Basic Commands & Expressions
      • Matching and Returning Objects
          START emil=node:people(name='Emil')
          MATCH emil-[:MARRIED_TO]-madde
          RETURN madde
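The Cypher query on this slide starts from the node named Emil and returns whatever node is connected to it by a MARRIED_TO relationship, in either direction (the pattern has no arrow). The sketch below mimics that lookup on a tiny in-memory graph. It is an analogy in plain Python, not Neo4j code: the node ids, the extra `KNOWS` relationship, the names other than Emil, and the `match_either_direction` function are all assumptions made for the example.

```python
# Hypothetical graph: nodes carry properties, relationships carry a type.
nodes = {
    1: {"name": "Emil"},
    2: {"name": "Madde"},
    3: {"name": "Johan"},
}
relationships = [
    (2, "MARRIED_TO", 1),  # stored direction: Madde -> Emil
    (1, "KNOWS", 3),
]


def find_node_id(name):
    """Stand-in for the index lookup in the START clause."""
    for node_id, properties in nodes.items():
        if properties.get("name") == name:
            return node_id
    return None


def match_either_direction(start_id, relationship_type):
    """Return nodes linked to start_id by the given type, ignoring direction."""
    found = []
    for source, kind, target in relationships:
        if kind != relationship_type:
            continue
        if source == start_id:
            found.append(nodes[target])
        elif target == start_id:
            found.append(nodes[source])
    return found


emil = find_node_id("Emil")
madde = match_either_direction(emil, "MARRIED_TO")
```

Because direction is ignored, the spouse is found even though the sample relationship is stored pointing toward Emil.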
    • Slide 36: HIERARCHICAL MODEL. Neo4j Case Demo. Diagram nodes: Root, Child 1, Child 2, Child 3, Child 4, Child 5
    • Slide 37: Q&A
    • Slide 38: CONTACT ME. Universitas Multimedia Nusantara, New Media Tower, Lv. 12, Scientia Boulevard St., Tangerang, Banten, 15811. +6221-7038-7738 (phone), +628176-474-525 (mobile). https://www.facebook.com/feris.thia, @FerisThia, feris@phi-integration.com
    • Slide 39: BIG THANK YOU!