Pentaho & MongoDB Partner to Solve
Government Big Data Challenges
December 2013
Bob Gourley
Publisher, CTOvision.com

Will LaForest
Director of Federal, MongoDB

Dave Henry
SVP Enterprise Solutions, Pentaho

© 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Big Data Management
Best Practices for Federal Big
Data Projects
Bob Gourley
Publisher, CTOvision.com

Brief Purpose
Research & Reports

• A focus on a new discipline of “Big Data Management”
• Intro to top 5 “Best Practices” of Federal Data activities
• Invitation to collaborate and refine approaches
• A perpetual draft - your input is requested
• Contribute your thoughts at CTOvision.com

Update Sources
• Big Data Government Newsletter - reader survey
  • 2,600 readers
  • 2% response rate, across Federal agencies
• Review of openly published research by Wikibon, TDWI, IDC, Gartner, Forrester and of course our own CTOvision
• Review of best practices and use cases from the best vendors in Enterprise Big Data
• Engagement of the community at events like Strata and Hadoop World

Planning Assumption
The ability to collect, parse, and analyze machine data in real time, whether on premises or in the cloud, will continue to grow.

Big Data Management

• Agencies are thinking through the right changes to concepts and technologies
• Old approaches are still important, but cannot solve emerging problems
• Big Data Management is an evolved discipline that builds on existing data management approaches to leverage new concepts, technologies and best practices to optimize mission support

Solutions That Require Big Data Management
• Open Source Information: analysis and integration
• Situational Awareness across disparate data sets
• Two use cases: “Connect the Dots” and “Needle in Haystack”
• Cyber Security: rapid real time analysis of all relevant data
• Asset catalog across extensive/dynamic enterprises
• Rapid return of geospatial data
• Location based push of data
• Real time return of relevant search
• Real time suggestion of topics
• Bioinformatics:
  • Human Genome
  • Patient location, treatment, outcomes
• Law Enforcement: Predictive Policing
• Data Hub: Unified storage, governance, security, functionality

Best Practices in Big Data Management
VISION
Start with a mission-focused vision. This will vary by organization. Support to mission will drive everything else. Consider that analytics and Big Data go together.

STRATEGY
Prioritize and tackle challenges like: changes to governance processes, the right mix of skills for the workforce, learning new technology, and prioritizing which workload types will be handled by which part of the architecture.

KNOW
Know existing infrastructure and processes, with a focus on: understanding of legal/policy dynamics relevant to your agency, understanding of new capabilities available, current and required throughputs/capacities, types of workloads supported by each component in the architecture, and available tech choices.

DESIGN
Document and continuously improve. Architect to manage data in its original form. Include the right mix of traditional and new in your design. Don’t assume any one platform will be a solution. Architect to insulate applications and users from a variety of disparate big data platforms.

EXECUTE
Avoid custom coding wherever possible. Don’t let new Big Data platforms become proprietary silos. ETL remains important. Ensure training for all based on job function. Don’t neglect your own training. Serve the analyst.

Next Steps
• Continue your market surveys; stay aware of what new technologies can do for you.
• Revisit your vision. As you do, ponder this: How can you leverage data to support your mission?
• Continue to study use cases and exchange best practices. Talk with others in and out of your sector. Great lessons are coming from other industries.
• Continue to engage with the broader community. Sign up for our Government Big Data Weekly.
• Share your lessons learned.

Provide Your Thoughts, Input, Questions
E-mail: bob@ctovision.com
Blog: http://ctovision.com
Twitter: http://www.twitter.com/bobgourley
Facebook, LinkedIn, etc: See the blog

The Modern Operational
Database for Government

Will LaForest
Director of Federal, MongoDB

The Evolution of Databases

[Timeline diagram: 1990, 2000, 2010]
Online (Operational & Real-time): RDBMS, later joined by NoSQL
Offline: OLAP/BI and Datawarehouse, later joined by Hadoop

Relational Database Challenges
Variety
• Unstructured data
• Semi-structured data
• Polymorphic data

Agile Development
• Iterative
• Short development cycles
• New workloads

Volume & Velocity
• Petabytes of data
• Trillions of records
• Millions of queries per second

New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing

MongoDB
The Modern Operational Database

• General Purpose
• Document Oriented
• Open Source

Fully Featured

Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980

Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.

Text Search
• Find all the cars described as having leather seats

Aggregation
• Calculate the average value of Paul’s car collection

Native Indexes
• Secondary
• Compound
• Geospatial
• Full Text
• Hash
• Covering

MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

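The query and aggregation examples on this slide can be illustrated without a running server. What follows is a minimal sketch in plain Python, assuming the sample document from the slide. The filter document uses MongoDB's query operators ($elemMatch, $gte, $lte), while `matches_car_range` and `average_car_value` are hypothetical in-memory stand-ins for what the server would evaluate, not part of any MongoDB API.

```python
# Sample document from the slide, as a Python dict. The elided fields ("…")
# are simply left out.
paul = {
    "first_name": "Paul",
    "surname": "Miller",
    "city": "London",
    "location": [45.123, 47.232],
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Rich query: everybody in London with a car built between 1970 and 1980.
# This is the kind of filter document a driver's find() call accepts.
london_1970s_filter = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}


def matches_car_range(doc, city, first_year, last_year):
    """Hypothetical in-memory equivalent of the filter above."""
    return doc["city"] == city and any(
        first_year <= car["year"] <= last_year for car in doc["cars"]
    )


def average_car_value(doc):
    """Hypothetical in-memory equivalent of averaging the car values."""
    values = [car["value"] for car in doc["cars"]]
    return sum(values) / len(values)


print(matches_car_range(paul, "London", 1970, 1980))  # True (the 1973 Bentley)
print(average_car_value(paul))  # 215000.0
```

On a real deployment the same questions would be answered by the server, ideally backed by the secondary and compound indexes listed on the slide.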
MongoDB and Enterprise IT Stack

Applications: CRM, ERP, Collaboration, Mobile, BI

Data Management
Online Data and Offline Data: RDBMS, Hadoop, EDW

Infrastructure: OS & Virtualization, Compute, Storage, Network

Security & Auditing
Management & Monitoring

Variety – Modern Data
Document Data Model

Relational

MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}

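To make the contrast between the relational and document models concrete, here is a sketch of how the same owner might be split across two relational tables and then joined back into one nested document. The table layout, the column names and the `owner_id` key are assumptions for illustration; the slide does not show the relational schema.

```python
# Hypothetical relational layout: one row per owner, one row per car,
# linked by an assumed owner_id foreign key.
owners = [
    {"owner_id": 1, "first_name": "Paul", "surname": "Miller", "city": "London"},
]
cars = [
    {"owner_id": 1, "model": "Bentley", "year": 1973, "value": 100000},
    {"owner_id": 1, "model": "Rolls Royce", "year": 1965, "value": 330000},
]


def to_document(owner, car_rows):
    """Join the car rows onto their owner, producing one nested document."""
    doc = {k: v for k, v in owner.items() if k != "owner_id"}
    doc["cars"] = [
        {k: v for k, v in car.items() if k != "owner_id"}
        for car in car_rows
        if car["owner_id"] == owner["owner_id"]
    ]
    return doc


document = to_document(owners[0], cars)
print(document["cars"][0]["model"])  # Bentley
```

In the document model the join is done once, at write time, so reading an owner with all of their cars is a single lookup.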
Dynamic Schema
MongoDB does not require a predefined schema. Every document can have different fields.

{name: "will",
 eyes: "blue",
 birthplace: "NY",
 aliases: ["bill", "la ciacco"],
 gender: "???",
 boss: "ben"}

{name: "jeff",
 eyes: "blue",
 height: 72,
 boss: "ben"}

{name: "ben",
 hat: "yes"}

{name: "brendan",
 aliases: ["el diablo"]}

{name: "matt",
 pizza: "DiGiorno",
 height: 74,
 boss: "555.555.1212"}

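A short sketch of what a dynamic schema means for code that reads the data. It holds the five documents from this slide in a plain Python list standing in for one collection; the phone-number value for "matt" is written as a string here so the example runs. No database is involved, so this only illustrates the shape of the data, not MongoDB behavior.

```python
# The five documents from the slide, as Python dicts in one list
# (standing in for a single collection).
people = [
    {"name": "will", "eyes": "blue", "birthplace": "NY",
     "aliases": ["bill", "la ciacco"], "gender": "???", "boss": "ben"},
    {"name": "jeff", "eyes": "blue", "height": 72, "boss": "ben"},
    {"name": "ben", "hat": "yes"},
    {"name": "brendan", "aliases": ["el diablo"]},
    {"name": "matt", "pizza": "DiGiorno", "height": 74, "boss": "555.555.1212"},
]

# Fields present in every document versus fields that appear in any document.
common_fields = set.intersection(*(set(p) for p in people))
all_fields = set.union(*(set(p) for p in people))

print(sorted(common_fields))  # ['name']
print(len(all_fields))        # 9

# Readers of schemaless data must tolerate missing fields.
heights = [p.get("height") for p in people]
print(heights)  # [None, 72, None, None, 74]
```

Only `name` is shared by all five documents, which is why application code typically treats every other field as optional.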
Volume, Velocity, and New Architectures
Automatic Sharding

• Increase or decrease capacity as you go
• Automatic balancing
• Optimized for commodity servers and cloud
infrastructure
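The idea of spreading data across commodity servers can be sketched in a few lines. This is an illustration of hash-based partitioning only, not MongoDB's implementation: MongoDB assigns ranges of shard-key values (chunks) to shards and its balancer migrates chunks between them. The `shard_for` helper and the shard names are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(key: str, shards: list[str]) -> str:
    """Pick a shard by hashing the key (illustrative only; MongoDB itself
    assigns ranges of key values, called chunks, to shards and migrates them)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


shards = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names
keys = [f"user-{i}" for i in range(3000)]

placement = Counter(shard_for(k, shards) for k in keys)
print(placement)  # roughly 1000 keys per shard

# Adding a shard changes the modulus, so many keys map elsewhere. A real
# balancer moves data incrementally instead of rehashing everything.
bigger = shards + ["shard-d"]
moved = sum(shard_for(k, shards) != shard_for(k, bigger) for k in keys)
print(moved)
```

The second half shows why automatic balancing matters: naive rehashing relocates most keys when capacity changes, which is the work a balancer is meant to do gradually and without operator involvement.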

High Availability

• Automated replication and failover
• Zero downtime during hardware failures and upgrades
• Multi-data center support
• Improved operational simplicity (e.g., HW swaps)
• Data durability and consistency

MongoDB Performance*

          | Top 5 Marketing Firm | Government Agency | Top 5 Investment Bank
Data      | Key/value | 10+ fields, arrays, nested documents | 20+ fields, arrays, nested documents
Queries   | Key-based; 1 – 100 docs/query; 80/20 read/write | Compound queries; Range queries; MapReduce; 20/80 read/write | Compound queries; Range queries; 50/50 read/write
Servers   | ~250 | ~50 | ~40
Ops/sec   | 1,200,000 | 500,000 | 30,000

* These figures are provided as examples. Your application governs your performance.

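A quick worked calculation helps when reading the performance figures: dividing operations per second by server count gives a rough per-server rate. The numbers are taken from the Performance slide. The server counts there are approximate, and the workloads differ, so the results are order-of-magnitude indications rather than benchmarks.

```python
# Figures copied from the Performance slide: (servers, operations per second).
deployments = {
    "Top 5 Marketing Firm": (250, 1_200_000),
    "Government Agency": (50, 500_000),
    "Top 5 Investment Bank": (40, 30_000),
}

per_server = {
    name: ops_per_sec / servers
    for name, (servers, ops_per_sec) in deployments.items()
}

for name, rate in per_server.items():
    print(f"{name}: about {rate:,.0f} ops/sec per server")
# Top 5 Marketing Firm: about 4,800 ops/sec per server
# Government Agency: about 10,000 ops/sec per server
# Top 5 Investment Bank: about 750 ops/sec per server
```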
Replication Benefits
Operational and Analytical Workloads
• Application interacts with primaries
• Analytical workloads on secondaries
• Workloads are isolated from one
another
• Working set appropriate for each
application
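The workload isolation described in these bullets can be sketched as a routing rule. The `ReplicaSetRouter` class and the host names below are hypothetical and exist only to show the idea. In a real deployment the driver's read preference setting (for example the `secondaryPreferred` mode) plays this role.

```python
import itertools


class ReplicaSetRouter:
    """Hypothetical router: writes and application reads go to the primary,
    analytical reads rotate across the secondaries."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self._secondaries = itertools.cycle(secondaries)

    def route(self, workload):
        if workload == "analytical":
            return next(self._secondaries)
        return self.primary


router = ReplicaSetRouter("node-1", ["node-2", "node-3"])  # hypothetical hosts
print(router.route("operational"))  # node-1
print(router.route("analytical"))   # node-2
print(router.route("analytical"))   # node-3
```

Because analytical reads never touch the primary in this sketch, a long-running report cannot evict the operational application's working set from the primary's memory.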

Global Data Distribution

[Diagram: seven labels, each reading “Real-time”]

Read Global / Write Local
Primary: LON | Secondary: NYC | Secondary: SYD
Primary: NYC | Secondary: LON | Secondary: SYD
Primary: SYD | Secondary: LON | Secondary: NYC

Solving Big Data
Challenges in the
Federal Government
Dave Diegtel
Head of Federal Sales, Pentaho

Why Pentaho for Federal Government
• Company and Product Maturity: Pentaho has been around for over 9 years, with thousands of paid customers and a 5.0 version release. Pentaho is proven and less risky.
• Business Model and Subscription: Pentaho’s subscription model and server-based pricing allow for lower upfront investment and risk compared to legacy BI vendors, which traditionally cost an average of 4X as much for similar-size deployments.
• Government Certifications: Pentaho has made significant investments in government certifications and compliance, such as Section 508 and security.
• Open APIs and an extensible architecture enable ease of integration and reduce the potential for vendor lock-in.
• Existing Government Customers and Cleared Personnel

A Comprehensive Big
Data Platform

Dave Henry
Senior VP Enterprise Solutions, Pentaho

Pentaho 5.0 Architected for the Future
Simplified analytics experience for all users

[Diagram labels: Billing, Customer, Social Media, Web, Location, Network; Existing & New Data Infrastructure & Processes; Analytics]

ANY Data
• Relational
• Operational
• Big Data
• Data sources not yet anticipated…

ANY Environment
• Data warehouses
• Data marts
• Stack vendors
• Cloud
• Embedded

ANY Analytics
• Reports
• Dashboards
• Visualizations
• Discovery
• Predictive

The New Reality
Simplified analysis for all users

• Simplified Analytics Experience
• Blended Big Data
• Enterprise Big Data Integration

Pentaho & MongoDB Enable Key Use Cases
Customer 360 and Device Data Analytics enable comprehensive insight

• MongoDB delivers a scalable, low-latency enterprise data store
• Visual ETL development with Pentaho Data Integration (PDI)
• Reporting, dashboards, visualization and discovery with Pentaho Analytics

[Diagram labels: Mission Scope; …; Pentaho Data Integration; Pentaho Analytics (Reporting, Dashboards, Visualization, Discovery); Pentaho Data Integration]

Enterprise Customer Data Store
Powerful data integration for MongoDB

[Diagram: Customer Master, POS Data and Web Event Data flow through PDI ETL, which uses $push to data arrays in the MongoDB cluster]

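The "$push to data arrays" step on this slide appends incoming events to arrays inside a customer document. Below is a minimal in-memory sketch of that behavior in Python. The `apply_push` helper, the field names and the sample events are hypothetical; with a live cluster the same update documents would be sent through a driver or a PDI step rather than applied locally.

```python
import copy


def apply_push(document, push_spec):
    """Hypothetical in-memory imitation of a {"$push": {...}} update:
    append each value to the named array, creating the array if absent."""
    updated = copy.deepcopy(document)
    for field, value in push_spec.items():
        updated.setdefault(field, []).append(value)
    return updated


customer = {"customer_id": 42, "name": "Paul Miller"}  # assumed master record

# Update documents an ETL step might send, one per incoming event.
pos_update = {"$push": {"purchases": {"sku": "A-100", "amount": 19.99}}}
web_update = {"$push": {"web_events": {"page": "/pricing"}}}

for update in (pos_update, web_update):
    customer = apply_push(customer, update["$push"])

print(customer["purchases"])   # [{'sku': 'A-100', 'amount': 19.99}]
print(customer["web_events"])  # [{'page': '/pricing'}]
```

Keeping point-of-sale and web events inside the customer document is what lets a single read return the full customer view.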
Data Integration
Exploits MongoDB’s native APIs and query language

Operational Reports
Multi-page, highly formatted reports – real-time, scheduled or burst to email

Operational Dashboards
Highly tailored, pixel-perfect dashboards on MongoDB

Analyzer
Explore and visualize data

James Dixon
Founder and CTO, Pentaho

As CTO at Pentaho, James Dixon is
responsible for Pentaho's
architecture and technology
roadmap. James has over 15 years
of professional experience in
software architecture, development
and systems consulting. Prior to
Pentaho, James held key technical
roles at AppSource Corporation
(acquired by Arbor Software which
later merged into Hyperion
Solutions) and Keyola (acquired by
Lawson Software). Earlier in his
career, James was a technology
consultant working with large and
small firms to deliver the benefits of
innovative technology in real-world
environments.
Why Pentaho?
• Pentaho is the best platform to connect, integrate, and analyze both
traditional sources and MongoDB
• Pentaho embraces and extends the MongoDB environment with rich
visualization and exploration of data
• Pentaho’s Subscription-based business model lowers upfront investments,
enabling faster ROI
• Pentaho has dozens of Federal Government Customers and made
significant investments in government certifications and cleared personnel
• Pentaho and MongoDB are established partners – Pentaho carefully
engineers its products to use the latest MongoDB APIs to provide the best
possible performance

Next Steps and Q&A
• Needs Assessment with Pentaho and MongoDB
  • Dave Diegtel - ddiegtel@pentaho.com
  • Will LaForest - will@mongodb.com
• Try Pentaho (30-day free trial) -- pentaho.com/download
• Learn more about Big Data and Government Solutions
  • Pentaho
    • Big Data Website: pentahobigdata.com/
    • Government Solutions: pentaho.com/solutions/government
  • MongoDB
    • Government Solutions: mongodb.com/industries/government
    • Big Data: Examples and Guidelines for the Enterprise Decision Maker: mongodb.com/lp/whitepaper/big-data-nosql
    • MongoDB Top 5 Considerations When Evaluating NoSQL Databases: mongodb.com/lp/whitepaper/nosql-considerations
• Sign up for the Big Data Government Newsletter at CTOvision.com and take the reader survey

Thank You


More Related Content

More from Pentaho

Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy Pentaho
 
Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Pentaho
 
Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Pentaho
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Pentaho
 
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho EvaluationPentaho
 
Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing Pentaho
 
Embedded Analytics in Customer Success
Embedded Analytics in Customer SuccessEmbedded Analytics in Customer Success
Embedded Analytics in Customer SuccessPentaho
 
Embedded Analytics in Human Capital Management
Embedded Analytics in Human Capital ManagementEmbedded Analytics in Human Capital Management
Embedded Analytics in Human Capital ManagementPentaho
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Pentaho
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Pentaho
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataPentaho
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User GroupPentaho
 

More from Pentaho (16)

Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy Why Your Product Needs an Analytic Strategy
Why Your Product Needs an Analytic Strategy
 
Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity Data Is Your Next Product Opportunity
Data Is Your Next Product Opportunity
 
Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics Improving the Business of Healthcare through Better Analytics
Improving the Business of Healthcare through Better Analytics
 
Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica Up Your Analytics Game with Pentaho and Vertica
Up Your Analytics Game with Pentaho and Vertica
 
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
Pentaho Analytics for MongoDB - presentation from MongoDB World 2014
 
30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation30 for 30: Quick Start Your Pentaho Evaluation
30 for 30: Quick Start Your Pentaho Evaluation
 
Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing Embedded Analytics in CRM and Marketing
Embedded Analytics in CRM and Marketing
 
Embedded Analytics in Customer Success
Embedded Analytics in Customer SuccessEmbedded Analytics in Customer Success
Embedded Analytics in Customer Success
 
Embedded Analytics in Human Capital Management
Embedded Analytics in Human Capital ManagementEmbedded Analytics in Human Capital Management
Embedded Analytics in Human Capital Management
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
 
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
Predictive Analytics with Pentaho Data Mining - Análisis Predictivo con Penta...
 
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
Big Data Integration Webinar: Reducing Implementation Efforts of Hadoop, NoSQ...
 
Big Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big DataBig Data Integration Webinar: Getting Started With Hadoop Big Data
Big Data Integration Webinar: Getting Started With Hadoop Big Data
 
Pentaho Healthcare Solutions
Pentaho Healthcare SolutionsPentaho Healthcare Solutions
Pentaho Healthcare Solutions
 
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcarePentaho Business Analytics for ISVs and SaaS providers in healthcare
Pentaho Business Analytics for ISVs and SaaS providers in healthcare
 
Bay Area Hadoop User Group
Bay Area Hadoop User GroupBay Area Hadoop User Group
Bay Area Hadoop User Group
 

Recently uploaded

Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)IES VE
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingMAGNIntelligence
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and businessFrancesco Corti
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch TuesdayIvanti
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIVijayananda Mohire
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud DataEric D. Schabell
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 

Recently uploaded (20)

Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
IT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced ComputingIT Service Management (ITSM) Best Practices for Advanced Computing
IT Service Management (ITSM) Best Practices for Advanced Computing
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
From the origin to the future of Open Source model and business
From the origin to the future of  Open Source model and businessFrom the origin to the future of  Open Source model and business
From the origin to the future of Open Source model and business
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
March Patch Tuesday
March Patch TuesdayMarch Patch Tuesday
March Patch Tuesday
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
My key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAIMy key hands-on projects in Quantum, and QAI
My key hands-on projects in Quantum, and QAI
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 

Pentaho and MongoDB Partner to Solve Government Big Data Challenges

  • 1. Pentaho & MongoDB Partner to Solve Government Big Data Challenges December 2013 Bob Gourley Publisher, CTOvision.com Will LaForest Director of Federal, MongoDB Dave Henry SVP Enterprise Solutions, Pentaho 1 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 2. Big Data Management Best Practices for Federal Big Data Projects Bob Gourley Publisher, CTOvision.com 2 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 3. Brief Purpose Research & Reports A focus on a new discipline of “Big Data Management” Contribute your thoughts at CTOvision.com 3 Intro to top 5 “Best Practices” of Federal Data activities Invitation to collaborate and refine approaches A perpetual draft - your input is requested © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 4. Update Sources  Big Data Government Newsletter - reader survey  2,600 readers  2% response rate, across Federal agencies  Review of openly published research by Wikibon, TDWI, IDC, Gartner, Forrester and of course our own CTOvision  Review of best practices and use cases from the best vendors in Enterprise Big Data  Engagement of the community at events like Strata and Hadoop World Planning Assumption The ability to collect, parse, analyze machine data in real time, whether on premise or in the cloud, will continue to grow 4 © 2013, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
  • 5. Big Data Management: Agencies are thinking through the right changes to concepts and technologies. Old approaches are still important, but cannot solve emerging problems. Big Data Management is an evolved discipline which builds on existing data management approaches to leverage new concepts, technologies and best practices to optimize mission support.
  • 6. Solutions That Require Big Data Management: Open Source Information: analysis and integration; Situational Awareness across disparate data sets; Two use cases: “Connect the Dots” and “Needle in Haystack”; Cyber Security: rapid real time analysis of all relevant data; Asset catalog across extensive/dynamic enterprises; Rapid return of geospatial data; Location based push of data; Real time return of relevant search; Real time suggestion of topics; Bioinformatics: Human Genome; Patient location, treatment, outcomes; Law Enforcement: Predictive Policing; Data Hub: Unified storage, governance, security, functionality.
  • 7. Best Practices in Big Data Management. VISION: Start with a mission-focused vision. This will vary by organization. Support to mission will drive everything else. Consider that analytics and Big Data go together. STRATEGY: Prioritize and tackle challenges like changes to governance processes, the right mix of skills for the workforce, learning new technology, and prioritizing which workload types will be handled by which part of the architecture. KNOW: Know existing infrastructure and process, with a focus on understanding the legal/policy dynamics relevant to your agency, the new capabilities available, current and required throughputs/capacities, the types of workloads supported by each component in the architecture, and the available tech choices. DESIGN: Document and continuously improve. Architect to manage data in its original form. Include the right mix of traditional and new in your design. Don’t assume any one platform will be a solution. Architect to insulate applications and users from a variety of disparate big data platforms. EXECUTE: Avoid custom coding wherever possible. Don’t let new Big Data Platforms become proprietary silos. ETL remains important. Ensure training for all based on job function. Don’t neglect your own training. Serve the analyst.
  • 8. Next Steps: Continue your market surveys and stay aware of what new technologies can do for you. Revisit your vision. As you do, ponder this: How can you leverage data to support your mission? Continue to study use cases and exchange best practices. Dialog with others in and out of your sector. Great lessons are coming from other industries. Continue to engage with the broader community. Sign up for our Government Big Data Weekly. Share your lessons learned.
  • 9. Provide Your Thoughts, Input, Questions. E-mail: bob@ctovision.com Blog: http://ctovision.com Twitter: http://www.twitter.com/bobgourley Facebook, LinkedIn, etc: See the blog
  • 10. The Modern Operational Database for Government. Will LaForest, Director of Federal, MongoDB
  • 11. The Evolution of Databases (figure: timeline for 1990, 2000 and 2010. Online, operational and real-time: RDBMS in every decade, joined by NoSQL in 2010. Offline: data warehouse and OLAP/BI, joined by Hadoop in 2010.)
  • 12. Relational Database Challenges. Variety: unstructured data, semi-structured data, polymorphic data. Agile Development: iterative, short development cycles, new workloads. Volume & Velocity: petabytes of data, trillions of records, millions of queries per second. New Architectures: horizontal scaling, commodity servers, cloud computing.
  • 13. MongoDB: The Modern Operational Database. General Purpose, Document Oriented, Open Source.
  • 14. Fully Featured MongoDB. Rich Queries: Find Paul’s cars; find everybody in London with a car built between 1970 and 1980. Geospatial: Find all of the car owners within 5km of Trafalgar Sq. Text Search: Find all the cars described as having leather seats. Aggregation: Calculate the average value of Paul’s car collection. Native Indexes: Secondary, Compound, Geospatial, Full Text, Hash, Covering. Example document: { first_name: ‘Paul’, surname: ‘Miller’, city: ‘London’, location: [45.123,47.232], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] }
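The example questions on this slide can be written as MongoDB query documents. The following is only a sketch under stated assumptions: it builds the filter and pipeline documents as plain Python dicts and does not connect to a server. The field names come from the slide's sample document; the helper name `within_km`, the approximate Trafalgar Square coordinates, and the choice of `$geoWithin` with `$centerSphere` are illustrative assumptions, not something the deck specifies.

```python
# Sketch only: plain dicts that mirror MongoDB's query language.
# Nothing here talks to a database.

EARTH_RADIUS_KM = 6378.1  # used to convert kilometres to radians for $centerSphere

# Rich query: "find Paul's cars" filters on the owner; the cars array comes back with the document.
pauls_cars = {"first_name": "Paul", "surname": "Miller"}

# Rich query: everybody in London with a car built between 1970 and 1980.
# $elemMatch requires one array element to satisfy both bounds.
london_1970s = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}

# Aggregation: average value of Paul's car collection.
avg_value_pipeline = [
    {"$match": pauls_cars},
    {"$unwind": "$cars"},  # one output document per car
    {"$group": {"_id": "$surname", "avg_value": {"$avg": "$cars.value"}}},
]


def within_km(lon, lat, km):
    """Hypothetical helper: filter for documents whose `location` lies within `km` of a point."""
    return {
        "location": {
            "$geoWithin": {"$centerSphere": [[lon, lat], km / EARTH_RADIUS_KM]}
        }
    }


# Approximate coordinates for Trafalgar Square (assumption for illustration).
near_trafalgar = within_km(-0.128, 51.508, 5)
```

With a driver such as PyMongo, documents like these would typically be passed to `find` or `aggregate` on a collection; the collection itself is left out here because the deck does not name one.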
  • 15. MongoDB and Enterprise IT Stack (figure). Applications: CRM, ERP, Collaboration, Mobile, BI. Data Management: Online Data and Offline Data (RDBMS, Hadoop, EDW). Infrastructure: OS & Virtualization, Compute, Storage, Network. Across all layers: Security & Auditing, Management & Monitoring.
  • 17. Document Data Model: Relational versus MongoDB. MongoDB document: { first_name: ‘Paul’, surname: ‘Miller’, city: ‘London’, location: [45.123,47.232], cars: [ { model: ‘Bentley’, year: 1973, value: 100000, … }, { model: ‘Rolls Royce’, year: 1965, value: 330000, … } ] }
  • 18. Dynamic Schema: MongoDB does not require a predefined data schema. Every document can have different fields: {name: “will”, eyes: “blue”, birthplace: “NY”, aliases: [“bill”, “la ciacco”], gender: “???”, boss: “ben”} {name: “jeff”, eyes: “blue”, height: 72, boss: “ben”} {name: “ben”, hat: “yes”} {name: “brendan”, aliases: [“el diablo”]} {name: “matt”, pizza: “DiGiorno”, height: 74, boss: 555.555.1212}
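To make the dynamic-schema point concrete, here is a small sketch that holds the slide's five documents as Python dicts and inspects which fields each one carries. It is an in-memory illustration, not a MongoDB call; the helper names `field_counts` and `names_with_field` are hypothetical, and matt's `boss` value is stored as a string because the slide's literal is not a valid number.

```python
from collections import Counter

# The five documents from the slide. No two share the same set of fields.
people = [
    {"name": "will", "eyes": "blue", "birthplace": "NY",
     "aliases": ["bill", "la ciacco"], "gender": "???", "boss": "ben"},
    {"name": "jeff", "eyes": "blue", "height": 72, "boss": "ben"},
    {"name": "ben", "hat": "yes"},
    {"name": "brendan", "aliases": ["el diablo"]},
    # The slide shows boss: 555.555.1212; kept as a string here (assumption).
    {"name": "matt", "pizza": "DiGiorno", "height": 74, "boss": "555.555.1212"},
]


def field_counts(docs):
    """Count how many documents contain each field name."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
    return counts


def names_with_field(docs, field):
    """In-memory analogue of filtering on {field: {"$exists": True}}."""
    return [doc["name"] for doc in docs if field in doc]


counts = field_counts(people)
tall_enough_to_measure = names_with_field(people, "height")
```

Only `name` appears in all five documents; every other field is present in a subset, which is the situation a fixed relational schema would handle with nullable columns or extra tables.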
  • 19. Volume, Velocity, and New Architectures
  • 20. Automatic Sharding • Increase or decrease capacity as you go • Automatic balancing • Optimized for commodity servers and cloud infrastructure
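The idea of automatic balancing after adding capacity can be sketched with a toy model. This is not MongoDB's balancer: it is a deliberately simplified illustration that assumes equal-sized chunks and moves one chunk at a time from the fullest shard to the emptiest until the counts differ by at most one. The function name `rebalance` and the shard names are hypothetical.

```python
def rebalance(chunks_per_shard):
    """Toy balancer: even out chunk counts across shards.

    Returns the final counts and the list of (source, destination) moves.
    Assumes at least one shard and equal-sized chunks.
    """
    counts = dict(chunks_per_shard)
    moves = []
    while True:
        fullest = max(counts, key=counts.get)
        emptiest = min(counts, key=counts.get)
        if counts[fullest] - counts[emptiest] <= 1:
            return counts, moves
        counts[fullest] -= 1
        counts[emptiest] += 1
        moves.append((fullest, emptiest))


# A third, empty shard is added to a two-shard cluster ("increase capacity as you go").
final_counts, migrations = rebalance({"shard0": 8, "shard1": 4, "shard2": 0})
```

In this run the four migrations all flow from `shard0` to the new `shard2`, leaving four chunks on each shard.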
  • 21. High Availability • Automated replication and failover • Zero downtime with hardware failure and upgrades • Multi-data center support • Improved operational simplicity (e.g., HW swaps) • Data durability and consistency
  • 22. MongoDB Performance*
      Top 5 Marketing Firm. Data: key/value. Queries: key-based, 1 – 100 docs/query, 80/20 read/write. Servers: ~250. Ops/sec: 1,200,000.
      Government Agency. Data: 10+ fields, arrays, nested documents. Queries: compound queries, range queries, MapReduce, 20/80 read/write. Servers: ~50. Ops/sec: 500,000.
      Top 5 Investment Bank. Data: 20+ fields, arrays, nested documents. Queries: compound queries, range queries, 50/50 read/write. Servers: ~40. Ops/sec: 30,000.
      * These figures are provided as examples. Your application governs your performance.
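One way to compare the three example deployments is throughput per server. The sketch below reads the three columns of the slide in order (Top 5 Marketing Firm, Government Agency, Top 5 Investment Bank) and divides operations per second by server count. The server counts are approximate on the slide, so the results are rough, and the dictionary layout is an assumption made for illustration.

```python
# Figures copied from the slide; server counts are approximate ("~").
deployments = {
    "Top 5 Marketing Firm": {"servers": 250, "ops_per_sec": 1_200_000},
    "Government Agency": {"servers": 50, "ops_per_sec": 500_000},
    "Top 5 Investment Bank": {"servers": 40, "ops_per_sec": 30_000},
}

# Operations per second handled by each server, on average.
ops_per_server = {
    name: figures["ops_per_sec"] / figures["servers"]
    for name, figures in deployments.items()
}

for name, value in ops_per_server.items():
    print(f"{name}: about {value:,.0f} ops/sec per server")
```

The per-server numbers differ by more than an order of magnitude, which is consistent with the slide's footnote: document complexity, query types and the read/write mix govern performance more than cluster size does.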
  • 24. Operational and Analytical Workloads • Application interacts with primaries • Analytical workloads on secondaries • Workloads are isolated from one another • Working set appropriate for each application
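The split between primaries for the application and secondaries for analytics is commonly expressed through the read preference on the connection string. The sketch below only builds two such URIs; it assumes a replica set named `rs0` and three made-up host names, neither of which comes from the deck. `replicaSet` and `readPreference` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode

# Hypothetical replica set members, for illustration only.
HOSTS = ["db1.example.gov:27017", "db2.example.gov:27017", "db3.example.gov:27017"]


def replica_set_uri(hosts, replica_set, read_preference):
    """Build a MongoDB connection string with an explicit read preference."""
    query = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{query}"


# The operational application reads from the primary.
app_uri = replica_set_uri(HOSTS, "rs0", "primary")

# Analytical jobs read from secondaries so they do not compete with the application.
analytics_uri = replica_set_uri(HOSTS, "rs0", "secondary")
```

Reads from secondaries can lag slightly behind the primary, so this arrangement suits reporting and analysis that tolerate data a moment out of date.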
  • 25. Global Data Distribution (figure: sites labeled “Real-time”)
  • 26. Read Global / Write Local (figure: three replica sets). Primary: LON, with secondaries in NYC and SYD. Primary: NYC, with secondaries in LON and SYD. Primary: SYD, with secondaries in LON and NYC.
  • 27. Solving Big Data Challenges in the Federal Government. Dave Diegtel, Head of Federal Sales, Pentaho
  • 28. Why Pentaho for Federal Government • Business Model and Subscription: Pentaho’s subscription model and server-based pricing allow for lower upfront investment and risk compared to legacy BI vendors, which traditionally cost an average of 4X for similar size deployments. • Government Certifications: Pentaho has made significant investments in government certifications and compliance, such as 508 and Security. • Open APIs and an extensible architecture enable ease of integration and reduce the potential for vendor lock-in. • Company and Product Maturity: Pentaho has been around for over 9 years, with thousands of paid customers and a 5.0 version release. Pentaho is proven and less risky. • Existing Government Customers and Cleared Personnel
  • 29. A Comprehensive Big Data Platform. Dave Henry, Senior VP Enterprise Solutions, Pentaho
  • 30. Pentaho 5.0: Architected for the Future. Simplified analytics experience for all users. Analytics over existing & new data infrastructure & processes: Billing, Customer, Social Media, Web, Location, Network. ANY Data: Relational, Operational, Big Data, Data sources not yet anticipated… ANY Environment: Data warehouses, Data marts, Stack vendors, Cloud, Embedded. ANY Analytics: Reports, Dashboards, Visualizations, Discovery, Predictive.
  • 31. The New Reality: Simplified analysis for all users. Simplified Analytics Experience; Blended Big Data; Enterprise Big Data Integration.
  • 32. Pentaho & MongoDB Enable Key Use Cases. Customer 360 and Device Data Analytics enable comprehensive insight … Mission Scope: • MongoDB delivers Scalable, Low-Latency Enterprise Data Store • Visual ETL development with Pentaho Data Integration (PDI) • Reporting, Dashboards, Visualization and Discovery with Pentaho Analytics
  • 33. Enterprise Customer Data Store: Powerful data integration for MongoDB (figure: Customer Master, POS Data and Web Event Data pass through PDI ETL into a MongoDB cluster, using $push to data arrays)
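The “$push to data arrays” label refers to MongoDB's `$push` update operator, which appends to an array field inside an existing document. As a rough sketch of what an ETL step might send, the code below builds a `$push` update with the `$each` modifier and applies it to an in-memory dict so the effect is visible without a server. The function names, the `web_events` field and the sample customer are all hypothetical; `apply_push` is a simplified imitation for illustration, not driver code.

```python
def push_events_update(field, events):
    """Build a $push update that appends several events to one array field."""
    return {"$push": {field: {"$each": list(events)}}}


def apply_push(doc, update):
    """Simplified in-memory imitation of $push (with optional $each).

    Returns a new document; the input is left unchanged.
    """
    result = dict(doc)
    for field, spec in update["$push"].items():
        if isinstance(spec, dict) and "$each" in spec:
            new_items = spec["$each"]
        else:
            new_items = [spec]
        result[field] = list(result.get(field, [])) + list(new_items)
    return result


# Hypothetical customer master record with point-of-sale data already attached.
customer = {"_id": 42, "name": "Paul Miller", "pos": [{"sku": "A1", "qty": 2}]}

# New web events arriving from the ETL flow.
update = push_events_update("web_events", [{"page": "/home"}, {"page": "/cart"}])
enriched = apply_push(customer, update)
```

Against a real cluster, an update document of this shape would typically be passed to an update call such as PyMongo's `update_one` together with a filter on the customer's `_id`, so each customer's record accumulates events in place.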
  • 34. Data Integration: Exploits MongoDB’s native APIs and query language
  • 35. Operational Reports: Multi-page, highly formatted reports, delivered in real time, on a schedule, or burst to email
  • 36. Operational Dashboards: Highly tailored, pixel-perfect dashboards on MongoDB
  • 37. Analyzer: Explore and visualize data
  • 38. James Dixon, Founder and CTO, Pentaho. As CTO at Pentaho, James Dixon is responsible for Pentaho's architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software, which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.
  • 39. Why Pentaho? • Pentaho is the best platform to connect, integrate, and analyze both traditional sources and MongoDB • Pentaho embraces and extends the MongoDB environment with rich visualization and exploration of data • Pentaho’s subscription-based business model lowers upfront investments, enabling faster ROI • Pentaho has dozens of Federal Government customers and has made significant investments in government certifications and cleared personnel • Pentaho and MongoDB are established partners: Pentaho carefully engineers its products to use the latest MongoDB APIs to provide the best possible performance
  • 40. Next Steps and Q&A • Needs Assessment with Pentaho and MongoDB: Dave Diegtel - ddiegtel@pentaho.com; Will LaForest - will@mongodb.com • Try Pentaho (30-Day Free Trial) -- pentaho.com/download • Learn More about Big Data and Government Solutions • Pentaho: Big Data Website: pentahobigdata.com/; Government Solutions: pentaho.com/solutions/government • MongoDB: Government Solutions: mongodb.com/industries/government; Big Data: Examples and Guidelines for the Enterprise Decision Maker: mongodb.com/lp/whitepaper/big-data-nosql; MongoDB Top 5 Considerations When Evaluating NoSQL Databases: mongodb.com/lp/whitepaper/nosql-considerations • Sign up for the Big Data Government Newsletter at CTOvision.com & take the reader survey
  • 41. Thank You