SlideShare a Scribd company logo
1 of 14
Data Analytic Technology Platforms
Options and Tradeoffs
J Singh
January 7, 2014
Do you have a “Big Data” problem?
• Or do you have a big “data problem”?

© DataThinks 2013-14
2

2
Some Big Data problems (1)
• Recommendations

© DataThinks 2013-14
3

3
Some Big Data problems (2)
• Financial Analysis
– Really Big Data if we want Real Time analysis

© DataThinks 2013-14
4

4
Some Big Data problems (3)
• Internet Infrastructure Security Monitoring

© DataThinks 2013-14
5

5
Other Big Data problems
• Network graph problems (Social Media data)
• Bioinformatics problems (Genomics data)
• Physics/engineering problems (Sensor data)
•…

• Key characteristics
1. Not much common between problems
2. Data too big to download or upload.
3. Data changes fast, requires near-real-time analysis.

© DataThinks 2013-14
6

6
Just Big “Data Problems” (JBDP)
• Most problems on Kaggle
• Popular data sets (e.g., Amazon, Kaggle, …, data sets)
– If it can be downloaded,
– If it doesn’t change very often, …
– It’s a JBDP

© DataThinks 2013-14
7

7
About us
• Technology and analytics service based on Big Data
problems, focused on small & medium companies
• Analytics products
– App Kinetics – Application analytics for servicing users
– Pop Kinetics – Population analytics for targeting prospects

© DataThinks 2013-14
8

8
Background for this talk
• Experience building the “Kinetics”
products
– Harvest the kinetic energy of
your data for the benefit of your
business 

• Prior work.
– Like-you: an application that

trolls through Facebook data to
find users who like the same
things you do

© DataThinks 2013-14
9

9
Governing Principle for Platform Choices
• Big Data is difficult to move
– If you can move it easily, how big can it really be?

• Processing needs to be brought closer to the data
– Moving the data to processing is a losing proposition.

• Connector solutions for a database won’t scale

© DataThinks 2013-14
10

10
Implications of the Governing Principle
• Architecture has to be optimized across the entire pipeline
– Lesson learned:
•
•
•
•

The architecture is a giant jig-saw puzzle
Best of breed solutions may not fit!
Importance of caching in the pipeline
Vendor lock-in may be inevitable

– Cost, Data Volume and Bandwidth are primary drivers

• Different stacks for different applications
– App Kinetics: MongoDB-based stack
– Pop Kinetics: S3, Elastic Map Reduce-based stack
– Similarities: Google App Engine, Google Map Reduce
© DataThinks 2013-14
11

11
Governing Principle in Action
Function
Data
Collection

App Kinetics

Pop Kinetics

Custom “probes”

Like-You

Facebook API

Facebook API

Data Storage MongoDB

Amazon S3

Google Datastore

Analysis

Mongo M/R (JS)
PyMongo
(Python)

Amazon EMR
(Hadoop, Python)

Google App
Engine M/R
(Python)

Visualization

HTML+D3 (JS)

Text

HTML+JS

Recommend
ations

Text

© DataThinks 2013-14
12

12
The decision-making process
• An iterative process (like solving a jig-saw puzzle)
– Not linear or formulaic

• What is the objective?

• About the data

– Discovery?

– Volume

• If there is a market?
• If the concept is
feasible?

–
–
–
–

• Rate of Growth

– Velocity
– Variety

Time to market?
Hitting a cost target?
A scalable solution?
Minimizing lock-in?

– Format
– Location, location, …

© DataThinks 2013-14
13
Thank you
• J Singh
– Principal, DataThinks
• j.singh@datathinks.org

– Adj. Prof, WPI

© DataThinks 2013-14
14

14

More Related Content

What's hot

Big dataproposal
Big dataproposalBig dataproposal
Big dataproposalQubole
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop frameworkTu Pham
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Wes McKinney
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsPat Patterson
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to RedshiftTreasure Data, Inc.
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using sparkDatabricks
 
From R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep GillFrom R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep GillDatabricks
 
Hourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on HadoopHourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on HadoopMatthew Hayes
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Lucidworks
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
Introduction to Google BigQuery
Introduction to Google BigQueryIntroduction to Google BigQuery
Introduction to Google BigQueryCsaba Toth
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Zekeriya Besiroglu
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovBig Data Spain
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataTreasure Data, Inc.
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudEduardo Silva Pereira
 

What's hot (20)

Big dataproposal
Big dataproposalBig dataproposal
Big dataproposal
 
Intro to Big Data - Spark
Intro to Big Data - SparkIntro to Big Data - Spark
Intro to Big Data - Spark
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web ArchivingHBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
HBaseCon 2015: Warcbase - Scaling 'Out' and 'Down' HBase for Web Archiving
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DB
 
Big data & hadoop framework
Big data & hadoop frameworkBig data & hadoop framework
Big data & hadoop framework
 
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
Practical Medium Data Analytics with Python (10 Things I Hate About pandas, P...
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
Treasure Data From MySQL to Redshift
Treasure Data  From MySQL to RedshiftTreasure Data  From MySQL to Redshift
Treasure Data From MySQL to Redshift
 
Visualizing big data in the browser using spark
Visualizing big data in the browser using sparkVisualizing big data in the browser using spark
Visualizing big data in the browser using spark
 
From R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep GillFrom R Script to Production Using rsparkling with Navdeep Gill
From R Script to Production Using rsparkling with Navdeep Gill
 
Hourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on HadoopHourglass: a Library for Incremental Processing on Hadoop
Hourglass: a Library for Incremental Processing on Hadoop
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
Introduction to Google BigQuery
Introduction to Google BigQueryIntroduction to Google BigQuery
Introduction to Google BigQuery
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...
 
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey KharlamovRUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
RUNNING A PETASCALE DATA SYSTEM: GOOD, BAD, AND UGLY CHOICES by Alexey Kharlamov
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
Unifying Events and Logs into the Cloud
Unifying Events and Logs into the CloudUnifying Events and Logs into the Cloud
Unifying Events and Logs into the Cloud
 

Viewers also liked

Take Care of Your Health
Take Care of Your HealthTake Care of Your Health
Take Care of Your HealthDr Vipul Gupta
 
Сведения о педагогических кадрах
Сведения о педагогических кадрахСведения о педагогических кадрах
Сведения о педагогических кадрахВладеймир Потанин
 
Harrow council Ref
Harrow council RefHarrow council Ref
Harrow council RefFozia Tahir
 
Reference from the Administration of the President of the Republic of Bulgaria
Reference from the Administration of the President of the Republic of BulgariaReference from the Administration of the President of the Republic of Bulgaria
Reference from the Administration of the President of the Republic of BulgariaGeorgi Zografov
 
Reference Letter - Frank AlvillarJr & Kevin Konst
Reference Letter - Frank AlvillarJr & Kevin KonstReference Letter - Frank AlvillarJr & Kevin Konst
Reference Letter - Frank AlvillarJr & Kevin KonstTravis Byakeddy
 
IABCSeattle Lara Feltin - Biznik - Social Media
IABCSeattle Lara Feltin - Biznik - Social MediaIABCSeattle Lara Feltin - Biznik - Social Media
IABCSeattle Lara Feltin - Biznik - Social MediaIABC Seattle
 
Получение энергии из отходов Херсон
Получение энергии из отходов Херсон Получение энергии из отходов Херсон
Получение энергии из отходов Херсон Pierre-Fran Beuchard
 
Inventory Valuation
Inventory ValuationInventory Valuation
Inventory Valuationmscuttle
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)J Singh
 

Viewers also liked (13)

Take Care of Your Health
Take Care of Your HealthTake Care of Your Health
Take Care of Your Health
 
Сведения о педагогических кадрах
Сведения о педагогических кадрахСведения о педагогических кадрах
Сведения о педагогических кадрах
 
Présentation OTC- INVEST
Présentation OTC- INVESTPrésentation OTC- INVEST
Présentation OTC- INVEST
 
Policy and Institutional Approaches in Nutrition Sensitive Agriculture: Feedb...
Policy and Institutional Approaches in Nutrition Sensitive Agriculture: Feedb...Policy and Institutional Approaches in Nutrition Sensitive Agriculture: Feedb...
Policy and Institutional Approaches in Nutrition Sensitive Agriculture: Feedb...
 
Harrow council Ref
Harrow council RefHarrow council Ref
Harrow council Ref
 
Reference from the Administration of the President of the Republic of Bulgaria
Reference from the Administration of the President of the Republic of BulgariaReference from the Administration of the President of the Republic of Bulgaria
Reference from the Administration of the President of the Republic of Bulgaria
 
Reference Letter - Frank AlvillarJr & Kevin Konst
Reference Letter - Frank AlvillarJr & Kevin KonstReference Letter - Frank AlvillarJr & Kevin Konst
Reference Letter - Frank AlvillarJr & Kevin Konst
 
IABCSeattle Lara Feltin - Biznik - Social Media
IABCSeattle Lara Feltin - Biznik - Social MediaIABCSeattle Lara Feltin - Biznik - Social Media
IABCSeattle Lara Feltin - Biznik - Social Media
 
Получение энергии из отходов Херсон
Получение энергии из отходов Херсон Получение энергии из отходов Херсон
Получение энергии из отходов Херсон
 
Management of SAH
Management of SAHManagement of SAH
Management of SAH
 
Inventory Valuation
Inventory ValuationInventory Valuation
Inventory Valuation
 
KM Machinist
KM MachinistKM Machinist
KM Machinist
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)
 

Similar to Data Analytic Technology Platforms: Options and Tradeoffs

Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersRuhollah Farchtchi
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013Erdenebayar Erdenebileg
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...BigMine
 
Big and fast data strategy 2017 jr
Big and fast data strategy 2017 jrBig and fast data strategy 2017 jr
Big and fast data strategy 2017 jrJonathan Raspaud
 
Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1RUHULAMINHAZARIKA
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )fmarukanda
 
Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Guido Schmutz
 
Top BI trends and predictions for 2017
Top BI trends and predictions for 2017Top BI trends and predictions for 2017
Top BI trends and predictions for 2017Panorama Software
 
20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design PatternsAllen Day, PhD
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyTamrMarketing
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Soujanya V
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and AnalyticsVMware Tanzu
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata Hortonworks
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwarePanorama Software
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeDATAVERSITY
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohenTaldor Group
 

Similar to Data Analytic Technology Platforms: Options and Tradeoffs (20)

Big data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makersBig data analytics presented at meetup big data for decision makers
Big data analytics presented at meetup big data for decision makers
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013Big Data Infrastructure and Analytics Solution on FITAT2013
Big Data Infrastructure and Analytics Solution on FITAT2013
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
 
Big and fast data strategy 2017 jr
Big and fast data strategy 2017 jrBig and fast data strategy 2017 jr
Big and fast data strategy 2017 jr
 
Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )f
 
Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?Big Data and Fast Data – Big and Fast Combined, is it Possible?
Big Data and Fast Data – Big and Fast Combined, is it Possible?
 
Top BI trends and predictions for 2017
Top BI trends and predictions for 2017Top BI trends and predictions for 2017
Top BI trends and predictions for 2017
 
20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns20131011 - Los Gatos - Netflix - Big Data Design Patterns
20131011 - Los Gatos - Netflix - Big Data Design Patterns
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
 
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
Bigdataissueschallengestoolsngoodpractices 141130054740-conversion-gate01
 
Innovating With Data and Analytics
Innovating With Data and AnalyticsInnovating With Data and Analytics
Innovating With Data and Analytics
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama Software
 
Using Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-PurposeUsing Data Platforms That Are Fit-For-Purpose
Using Data Platforms That Are Fit-For-Purpose
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohen
 

More from J Singh

PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engineJ Singh
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceJ Singh
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceJ Singh
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data AnalysisJ Singh
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitJ Singh
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlJ Singh
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query OptimizationJ Singh
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index StructuresJ Singh
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceJ Singh
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processingJ Singh
 
CS 542 Introduction
CS 542 IntroductionCS 542 Introduction
CS 542 IntroductionJ Singh
 
Cloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's ViewpointCloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's ViewpointJ Singh
 

More from J Singh (18)

PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engine
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/Reduce
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map Reduce
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data Analysis
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed Commit
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index Structures
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and Performance
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processing
 
CS 542 Introduction
CS 542 IntroductionCS 542 Introduction
CS 542 Introduction
 
Cloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's ViewpointCloud Computing from an Entrpreneur's Viewpoint
Cloud Computing from an Entrpreneur's Viewpoint
 

Recently uploaded

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Recently uploaded (20)

Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Data Analytic Technology Platforms: Options and Tradeoffs

  • 1. Data Analytic Technology Platforms Options and Tradeoffs J Singh January 7, 2014
  • 2. Do you have a “Big Data” problem? • Or do you have a big “data problem”? © DataThinks 2013-14 2 2
  • 3. Some Big Data problems (1) • Recommendations © DataThinks 2013-14 3 3
  • 4. Some Big Data problems (2) • Financial Analysis – Really Big Data if we want Real Time analysis © DataThinks 2013-14 4 4
  • 5. Some Big Data problems (3) • Internet Infrastructure Security Monitoring © DataThinks 2013-14 5 5
  • 6. Other Big Data problems • Network graph problems (Social Media data) • Bioinformatics problems (Genomics data) • Physics/engineering problems (Sensor data) •… • Key characteristics 1. Not much common between problems 2. Data too big to download or upload. 3. Data changes fast, requires near-real-time analysis. © DataThinks 2013-14 6 6
  • 7. Just Big “Data Problems” (JBDP) • Most problems on Kaggle • Popular data sets (e.g., Amazon, Kaggle, …, data sets) – If it can be downloaded, – If it doesn’t change very often, … – It’s a JBDP © DataThinks 2013-14 7 7
  • 8. About us • Technology and analytics service based on Big Data problems, focused on small & medium companies • Analytics products – App Kinetics – Application analytics for servicing users – Pop Kinetics – Population analytics for targeting prospects © DataThinks 2013-14 8 8
  • 9. Background for this talk • Experience building the “Kinetics” products – Harvest the kinetic energy of your data for the benefit of your business  • Prior work. – Like-you: an application that trolls through Facebook data to find users who like the same things you do © DataThinks 2013-14 9 9
  • 10. Governing Principle for Platform Choices • Big Data is difficult to move – If you can move it easily, how big can it really be? • Processing needs to be brought closer to the data – Moving the data to processing is a losing proposition. • Connector solutions for a database won’t scale © DataThinks 2013-14 10 10
  • 11. Implications of the Governing Principle • Architecture has to be optimized across the entire pipeline – Lesson learned: • • • • The architecture is a giant jig-saw puzzle Best of breed solutions may not fit! Importance of caching in the pipeline Vendor lock-in may be inevitable – Cost, Data Volume and Bandwidth are primary drivers • Different stacks for different applications – App Kinetics: MongoDB-based stack – Pop Kinetics: S3, Elastic Map Reduce-based stack – Similarities: Google App Engine, Google Map Reduce © DataThinks 2013-14 11 11
  • 12. Governing Principle in Action Function Data Collection App Kinetics Pop Kinetics Custom “probes” Like-You Facebook API Facebook API Data Storage MongoDB Amazon S3 Google Datastore Analysis Mongo M/R (JS) PyMongo (Python) Amazon EMR (Hadoop, Python) Google App Engine M/R (Python) Visualization HTML+D3 (JS) Text HTML+JS Recommend ations Text © DataThinks 2013-14 12 12
  • 13. The decision-making process • An iterative process (like solving a jig-saw puzzle) – Not linear or formulaic • What is the objective? • About the data – Discovery? – Volume • If there is a market? • If the concept is feasible? – – – – • Rate of Growth – Velocity – Variety Time to market? Hitting a cost target? A scalable solution? Minimizing lock-in? – Format – Location, location, … © DataThinks 2013-14 13
  • 14. Thank you • J Singh – Principal, DataThinks • j.singh@datathinks.org – Adj. Prof, WPI © DataThinks 2013-14 14 14