JaMU – Jakarta, 7 March 2014
Pentaho
and NoSQL
Java Meet Up (JaMU), Jakarta
7th March, 2014
Feris Thia
feris@phi-integration.com
08176-474-525
ABOUT ME
Founder
2007 2013
Feris Thia
PHI-Integration
ABOUT ME
Book Author
Feris Thia
November 2013
ABOUT ME
Community Manager
Feris Thia
Excel Indonesia User Group (EIUG)
Pentaho User Group Indonesia (Pentaho-ID)
2008 (~1000 members)
2013 (~5000 members)
ABOUT ME
PHI-Integration Clients
Community Manager
Feris Thia
AGENDA
DATA PREPARATION
What is it and why is it important?
PENTAHO DATA INTEGRATION
Popular Open Source ETL
NOSQL
An Emerging Non-Relational Database Technology
PROBLEMS?
image source: http://www.huntbigsales.com/winning-in-the-meeting-after-the-meeting/
What caused the sales increase in this area? Did something unusual happen?
We need some time to prepare additional data to answer that.
WHAT?? So we cannot make any decisions until the data is ready?
Yes, sir…
Image Source: http://wrapbootstrap.com/preview/WB0KDM51J/
TYPICAL SOLUTION
SOPHISTICATED REPORTING OR
DASHBOARD APPLICATION!
Image Source: http://reallybadboss.com/wp-content/uploads/2012/02/frustration.jpg
PROBLEMS REMAIN…
Time Spent on Data Preparation: 80%
Data Quality: 50%
Extract, Transform & Load: 30%
DATA PREPARATION IS THE KEY
1. Entry Systems
2. Data Preparation
3. Reporting (Basic Data Presentation)
4. Performance Dashboard (Visualization)
Note: Data preparation is often underestimated.
DATA WAREHOUSE
1. Entry Systems
2. Data Warehouse
3. Business Intelligence
DATA WAREHOUSE
CHALLENGES
EXTRACT
• INTEGRATION of many data sources
• INCREMENTAL: extract only changes
• DATA SIZE: big data
• INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
• DATA QUALITY: missing data, conversion, etc.
• PROTOCOL: driver availability, reliability, etc.
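The INCREMENTAL challenge (extract only changes) is commonly handled with a watermark: remember the latest change timestamp seen in the previous run and pull only rows changed after it. The following is a minimal sketch in plain JavaScript, not the approach used in the talk; the `updated_at` column, the sample rows and the `extractIncremental` helper are hypothetical, and it assumes timestamps are UTC ISO-8601 strings in one fixed format, so that string comparison matches time order.

```javascript
// Hypothetical source rows; `updated_at` is an assumed change-tracking column
// holding UTC ISO-8601 timestamps in a single fixed format.
const sourceRows = [
  { id: 1, amount: 100, updated_at: "2014-03-01T08:00:00Z" },
  { id: 2, amount: 250, updated_at: "2014-03-05T10:30:00Z" },
  { id: 3, amount: 75, updated_at: "2014-03-06T16:45:00Z" },
];

// Returns the rows changed after `lastWatermark` plus the watermark to persist
// for the next run (the latest timestamp seen, or the old one if nothing changed).
function extractIncremental(rows, lastWatermark) {
  const changed = rows.filter((row) => row.updated_at > lastWatermark);
  const newWatermark = changed.reduce(
    (max, row) => (row.updated_at > max ? row.updated_at : max),
    lastWatermark
  );
  return { changed, newWatermark };
}

const firstRun = extractIncremental(sourceRows, "2014-03-04T00:00:00Z");
console.log(firstRun.changed.map((row) => row.id), firstRun.newWatermark);

// A second run with the stored watermark finds nothing new.
const secondRun = extractIncremental(sourceRows, firstRun.newWatermark);
console.log(secondRun.changed.length);
```

The strict `>` avoids re-extracting the last row. However, a row committed late with an older timestamp would be missed, so production jobs often re-read a small overlap window and de-duplicate downstream.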
TRANSFORM
• NORMALIZE
• DENORMALIZE
• SPLIT / MERGE
• DATA REDUCTION (aggregate, etc.)
• TRANSPOSE
• TEXT PARSING
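TRANSPOSE and DENORMALIZE often come down to a pivot: "long" rows with one value per (key, attribute) pair become one "wide" row per key. The following is a minimal JavaScript sketch with hypothetical field names (`region`, `month`, `sales`) and sample data. In PDI this kind of pivot is typically done with the Row Denormaliser step, whose options are not reproduced here.

```javascript
// Hypothetical "long" input: one row per (region, month) pair.
const longRows = [
  { region: "Jakarta", month: "Jan", sales: 120 },
  { region: "Jakarta", month: "Feb", sales: 90 },
  { region: "Bandung", month: "Jan", sales: 60 },
];

// Pivots rows into one object per `keyField` value, adding one property per
// distinct `pivotField` value. Missing combinations are simply absent (not 0),
// and a later duplicate (key, pivot) pair overwrites the earlier value.
function pivot(rows, keyField, pivotField, valueField) {
  const byKey = new Map(); // Map keeps first-seen key order
  for (const row of rows) {
    const key = row[keyField];
    if (!byKey.has(key)) byKey.set(key, { [keyField]: key });
    byKey.get(key)[row[pivotField]] = row[valueField];
  }
  return [...byKey.values()];
}

console.log(pivot(longRows, "region", "month", "sales"));
```

If the source can contain duplicate pairs, aggregate them first (the DATA REDUCTION step) so the overwrite does not silently drop values.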
LOAD
• PERFORMANCE of many data sources
• CHANGES: structure, data type, column size, etc.
• DATA SIZE: big data
• INFRASTRUCTURE: network failure, high latency, slow I/O, etc.
• DATA MAPPING: sync with correlated data
• OUTPUT FORMAT: Excel, PDF, HTML, RDBMS, etc.
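DATA MAPPING (sync with correlated data) usually means swapping source keys for the warehouse's surrogate keys before facts are loaded, so every fact row points at an existing dimension row. The following is a minimal JavaScript sketch under stated assumptions: the customer dimension is a hypothetical in-memory `Map`, and `customer_code` / `customer_key` are made-up field names. In PDI this is more likely done with a lookup step such as Database lookup or Dimension lookup/update.

```javascript
// Hypothetical customer dimension already in the warehouse: natural key -> surrogate key.
const customerDim = new Map([
  ["C-001", 1],
  ["C-002", 2],
]);

// Hypothetical incoming fact rows that reference customers by natural (source) key.
const incomingSales = [
  { customer_code: "C-001", amount: 500 },
  { customer_code: "C-003", amount: 200 },
];

// Replaces natural keys with surrogate keys. Rows without a matching dimension
// entry go to a reject list instead of being loaded with a dangling reference.
function mapToSurrogateKeys(rows, dim) {
  const loaded = [];
  const rejected = [];
  for (const row of rows) {
    const customerKey = dim.get(row.customer_code);
    if (customerKey === undefined) {
      rejected.push(row);
    } else {
      loaded.push({ customer_key: customerKey, amount: row.amount });
    }
  }
  return { loaded, rejected };
}

const { loaded, rejected } = mapToSurrogateKeys(incomingSales, customerDim);
console.log(loaded, rejected);
```

Rejecting unmatched rows is only one design choice. Another common option is to map them to a reserved "unknown" dimension member so that totals still reconcile.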
DEMO
Data structure changes to increase SQL query performance.
Pentaho Data Integration
Open Source ETL
FEATURES AND BENEFITS
• Open Source
• Cost Efficient
• More than 200 modules
• Multi OS Platform
• Working with emerging Big Data platforms
• Low Learning Curve
DEMO
1. Basic Extract and Transformation
2. More I/O
3. Helper Table (Closure)
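The helper (closure) table demo is not transcribed. As an illustration of the idea: a closure table stores one row per ancestor/descendant pair, including each node paired with itself at depth 0. Hierarchy questions then become plain lookups instead of recursive joins. The following is a minimal JavaScript sketch that builds such rows from a hypothetical child-to-parent map and assumes the hierarchy has no cycles. PDI includes a Closure Generator step for this purpose, but this sketch does not reproduce that step's options.

```javascript
// Hypothetical hierarchy as child -> parent; the root's parent is null.
const parentOf = {
  Root: null,
  "Child 1": "Root",
  "Child 2": "Root",
  "Child 3": "Child 1",
};

// Builds closure-table rows: for every node, one (ancestor, descendant, depth) row
// per ancestor on its path to the root, including the node itself at depth 0.
// Assumes the hierarchy is acyclic; a cycle would make the inner loop run forever.
function buildClosureTable(parents) {
  const rows = [];
  for (const node of Object.keys(parents)) {
    let ancestor = node;
    let depth = 0;
    while (ancestor !== null && ancestor !== undefined) {
      rows.push({ ancestor, descendant: node, depth });
      ancestor = parents[ancestor];
      depth += 1;
    }
  }
  return rows;
}

const closure = buildClosureTable(parentOf);
const descendantsOfRoot = closure
  .filter((r) => r.ancestor === "Root" && r.depth > 0)
  .map((r) => r.descendant);
console.log(descendantsOfRoot);
```

Loaded into a table (hypothetically named `org_closure`), "all descendants of Root" becomes `SELECT descendant FROM org_closure WHERE ancestor = 'Root' AND depth > 0`.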
NoSQL
Not only SQL
TIMELINE
Emergence of open source NoSQL
1998: NoSQL coined
2004: Google's MapReduce paper published
2006: Hadoop started
2007: MongoDB started, Neo4j initial release
2008: Apache HBase, Apache Cassandra
2009: Redis initial release
2012: Google Spanner paper published
NOSQL GROUPS
• DOCUMENT: MongoDB, CouchDB, Riak
• WIDE COLUMN: Cassandra, HBase, Hypertable
• GRAPH: Neo4j, OrientDB
• KEY-VALUE <K, V>: Redis, MemcacheDB, SimpleDB
NOSQL VS SQL
Source: http://gigaom.com/2010/07/12/nosql-pioneers-are-driving-the-webs-manifest-destiny/
• Key-Value
– Use cases: in-memory cache, web-site analytics, log file analysis
– Advantages: simple, replication, versioning, locking, transactions, and sorting; web-accessible, schema-less, distributed
– Disadvantages: simple, small set of data types, limited transaction support
– Key products: Redis, Scalaris, Tokyo Cabinet
• Tabular or Columnar
– Use cases: data mining, analytics
– Advantages: rapid data aggregation, scalable, versioning, locking, web-accessible, schema-less, distributed
– Disadvantages: limited transaction support
– Key products: Google BigTable, HBase or HyperTable, Cassandra
• Document Store
– Use cases: document management, CRM, business continuity
– Advantages: stores and retrieves unstructured documents, map/reduce, web-accessible, schema-less, distributed
– Disadvantages: limited transaction support
– Key products: CouchDB, MongoDB, Riak
• Traditional Relational
– Use cases: transaction processing, typical corporate workloads
– Advantages: well documented and supported, mature code, widely implemented in production
– Disadvantages: cost, vertical scaling, increased complexity
– Key products: Oracle, Microsoft SQL Server, MySQL Cluster
NOSQL VS SQL
• Schemas are much more flexible
• Non-relational (no joins)
• Horizontal scalability
• Master – Slave
• Peer-to-peer
• Data pipeline
– Expressions
– Functional programming
• ACID (Atomicity, Consistency, Isolation, Durability)
• BASE (Basically Available, Soft state, Eventual consistency)
• CAP (Consistency, Availability, Partition tolerance)
DB-ENGINES.COM DB RANKING AS OF 7 MARCH 2014
Rank  Last Month  DBMS                  Database Model     Score    Change
1     1           Oracle                Relational DBMS    1491.8   -8.43
2     2           MySQL                 Relational DBMS    1290.21  1.83
3     3           Microsoft SQL Server  Relational DBMS    1205.28  -8.99
4     4           PostgreSQL            Relational DBMS    235.06   4.61
5     5           MongoDB               Document store     199.99   4.81
6     6           DB2                   Relational DBMS    187.32   -1.14
7     7           Microsoft Access      Relational DBMS    146.48   -6.4
8     8           SQLite                Relational DBMS    92.98    -0.03
9     9           Sybase ASE            Relational DBMS    81.55    -6.33
10    10          Cassandra             Wide column store  78.09    -2.23
MongoDB
Document Oriented Database
• Schemaless
• Distributed
• Auto Sharding
• Map Reduce Capabilities
• Multi Platform
• Structures
– Database
– Collections
– Documents
• Document
– A record is a document
– Similar to JSON Objects
MongoDB
• MongoDB Shell
• Insert
db.koleksi.insert({nama: "PHI-Integration", type: "Company"})
• Insert / Update (upsert)
db.koleksi.update({nama: "PHI-Integration"}, {$set: {nama: "Lightora"}}, {upsert: true})
• Delete
db.koleksi.remove({nama: "PHI-Integration", type: "Company"})
• Read / Query
db.koleksi.find({nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]})
Basic Commands & Expressions
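To make the semantics of the `find` filter concrete: a document matches only if every condition holds, so `nama` must equal "PHI-Integration" and `posting` must lie strictly between 100 and 200. The plain JavaScript sketch below applies the same test to hypothetical in-memory documents. It assumes `posting` is numeric; MongoDB compares values only within the same BSON type, so a simple `typeof` check stands in for that rule.

```javascript
// Hypothetical documents standing in for the `koleksi` collection.
const koleksi = [
  { nama: "PHI-Integration", type: "Company", posting: 150 },
  { nama: "PHI-Integration", type: "Company", posting: 250 },
  { nama: "Lightora", type: "Company", posting: 120 },
  { nama: "PHI-Integration", type: "Company" }, // no `posting` field: cannot match $gt/$lt
];

// Mirrors {nama: "PHI-Integration", $and: [{posting: {$gt: 100}}, {posting: {$lt: 200}}]}
// for numeric `posting` values: all conditions must be true for a document to match.
function matchesQuery(doc) {
  return (
    doc.nama === "PHI-Integration" &&
    typeof doc.posting === "number" &&
    doc.posting > 100 &&
    doc.posting < 200
  );
}

const result = koleksi.filter(matchesQuery);
console.log(result);
```

Because both range conditions are on the same field, `{nama: "PHI-Integration", posting: {$gt: 100, $lt: 200}}` expresses the same query more compactly. An explicit `$and` is mainly needed when the same operator must appear twice, for example when combining two `$or` clauses.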
MONGODB DEMO
Basic
Commands
PDI Extract
and
Load
Aggregation
Framework
1 2 3
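The Aggregation Framework demo itself is not transcribed. As an illustration only, a pipeline such as `db.koleksi.aggregate([{$match: {type: "Company"}}, {$group: {_id: "$nama", total: {$sum: "$posting"}}}])` filters documents and then sums `posting` per distinct `nama`. The following JavaScript sketch computes the same result in memory over hypothetical sample documents and assumes `posting` is always numeric. MongoDB's `$sum` ignores non-numeric values, but this sketch does not model that behaviour.

```javascript
// Hypothetical sample documents standing in for the `koleksi` collection.
const koleksi = [
  { nama: "PHI-Integration", type: "Company", posting: 150 },
  { nama: "PHI-Integration", type: "Company", posting: 50 },
  { nama: "Lightora", type: "Company", posting: 120 },
  { nama: "Jakarta", type: "City", posting: 300 },
];

// In-memory equivalent of:
// db.koleksi.aggregate([
//   { $match: { type: "Company" } },
//   { $group: { _id: "$nama", total: { $sum: "$posting" } } }
// ])
function matchAndGroup(docs) {
  const totals = new Map();
  for (const doc of docs) {
    if (doc.type !== "Company") continue; // $match stage
    const key = doc.nama; // _id: "$nama"
    totals.set(key, (totals.get(key) || 0) + doc.posting); // total: { $sum: "$posting" }
  }
  // One output document per distinct _id (MongoDB does not guarantee $group output order).
  return [...totals].map(([_id, total]) => ({ _id, total }));
}

console.log(matchAndGroup(koleksi));
```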
Neo4j
Graph Database
Properties
Relationship
Cypher
Node
Neo4j
• Neo4J Web Admin
• Create Node
CREATE (n {property_name: "property_value"})
• Create Relation
CREATE (n)-[:RELATION]->(m)
• Where:
– n, m are identifiers
– :RELATION is the relationship type
Basic Utility, Commands & Expressions
Neo4j
• Matching and Returning Objects
START emil=node:people(name='Emil')
MATCH (emil)-[:MARRIED_TO]-(madde)
RETURN madde
Basic Commands & Expressions
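The MATCH pattern above has no arrow on `-[:MARRIED_TO]-`, so Cypher matches the relationship in either direction. The JavaScript sketch below illustrates only that matching rule over a hypothetical in-memory edge list. The names `Emil`, `Madde` and `Johan` and the helper `undirectedNeighbours` are made up for illustration. The sketch does not model Neo4j storage or the legacy `START ... node:people(...)` index lookup.

```javascript
// Hypothetical relationships, each stored with a direction (from -> to).
const relationships = [
  { from: "Emil", type: "MARRIED_TO", to: "Madde" },
  { from: "Emil", type: "KNOWS", to: "Johan" },
];

// Mimics MATCH (start)-[:relType]-(other): with no arrow in the pattern,
// an edge matches whether `start` is its source or its target.
function undirectedNeighbours(start, relType, rels) {
  const result = [];
  for (const rel of rels) {
    if (rel.type !== relType) continue;
    if (rel.from === start) result.push(rel.to);
    else if (rel.to === start) result.push(rel.from);
  }
  return result;
}

console.log(undirectedNeighbours("Emil", "MARRIED_TO", relationships));
console.log(undirectedNeighbours("Madde", "MARRIED_TO", relationships));
```

A directed pattern such as `(emil)-[:MARRIED_TO]->(madde)` would correspond to keeping only the `rel.from === start` branch.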
HIERARCHICAL MODEL
Neo4j Case Demo
Diagram: a Root node with five children (Child 1 to Child 5)
Q&A
Universitas Multimedia Nusantara
New Media Tower, Lv.12
Scientia Boulevard St.
Tangerang, Banten, 15811
+6221-7038-7738 (phone)
+628176-474-525 (mobile)
https://www.facebook.com/feris.thia
@FerisThia
feris@phi-integration.com
CONTACT ME
BIG
THANK YOU!
