Data Analytic Technology Platforms: Options and Tradeoffs
Presentation Transcript

  • Data Analytic Technology Platforms: Options and Tradeoffs
    J Singh, January 7, 2014
  • Do you have a “Big Data” problem?
    • Or do you have a big “data problem”?
    © DataThinks 2013-14
  • Some Big Data problems (1)
    • Recommendations
  • Some Big Data problems (2)
    • Financial Analysis
      – Really Big Data if we want real-time analysis
  • Some Big Data problems (3)
    • Internet Infrastructure Security Monitoring
  • Other Big Data problems
    • Network graph problems (Social Media data)
    • Bioinformatics problems (Genomics data)
    • Physics/engineering problems (Sensor data)
    • …
    • Key characteristics
      1. Not much in common between problems.
      2. Data too big to download or upload.
      3. Data changes fast and requires near-real-time analysis.
  • Just Big “Data Problems” (JBDP)
    • Most problems on Kaggle
    • Popular data sets (e.g., Amazon, Kaggle, …, data sets)
      – If it can be downloaded,
      – If it doesn’t change very often, …
      – It’s a JBDP
  • About us
    • Technology and analytics service based on Big Data problems, focused on small & medium companies
    • Analytics products
      – App Kinetics: Application analytics for servicing users
      – Pop Kinetics: Population analytics for targeting prospects
  • Background for this talk
    • Experience building the “Kinetics” products
      – Harvest the kinetic energy of your data for the benefit of your business
    • Prior work
      – Like-You: an application that trolls through Facebook data to find users who like the same things you do
  • Governing Principle for Platform Choices
    • Big Data is difficult to move
      – If you can move it easily, how big can it really be?
    • Processing needs to be brought closer to the data
      – Moving the data to processing is a losing proposition.
    • Connector solutions for a database won’t scale
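The "difficult to move" point can be made concrete with a back-of-envelope estimate. The sketch below is illustrative only: the data sizes, link speeds, and the function name `transfer_time_hours` are assumptions chosen for the example, not figures from the talk.

```python
def transfer_time_hours(data_bytes: float, bandwidth_bits_per_sec: float,
                        efficiency: float = 1.0) -> float:
    """Estimate hours needed to move data_bytes over a network link.

    efficiency is the assumed fraction of nominal bandwidth actually
    achieved (1.0 means the link is fully and exclusively used).
    """
    if bandwidth_bits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    seconds = data_bytes * 8 / (bandwidth_bits_per_sec * efficiency)
    return seconds / 3600


TB = 10**12  # decimal terabyte, in bytes

# Hypothetical scenarios: 1, 10 and 100 TB over 100 Mbit/s and 1 Gbit/s links.
for size_tb in (1, 10, 100):
    for label, bps in (("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)):
        hours = transfer_time_hours(size_tb * TB, bps)
        print(f"{size_tb:>4} TB over {label:>10}: {hours:8.1f} hours")
```

Even under the optimistic assumption of a fully used link, the hours grow linearly with data volume, which is the arithmetic behind bringing processing to the data.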
  • Implications of the Governing Principle
    • Architecture has to be optimized across the entire pipeline
      – Lessons learned:
        • The architecture is a giant jig-saw puzzle
        • Best of breed solutions may not fit!
        • Importance of caching in the pipeline
        • Vendor lock-in may be inevitable
      – Cost, Data Volume and Bandwidth are primary drivers
    • Different stacks for different applications
      – App Kinetics: MongoDB-based stack
      – Pop Kinetics: S3, Elastic Map Reduce-based stack
      – Similarities: Google App Engine, Google Map Reduce
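As a rough illustration of why caching matters in a pipeline, the sketch below memoizes a slow stage with Python's standard `functools.lru_cache`. The `fetch_profile` function and its data are hypothetical stand-ins for an expensive remote call; this is not the caching layer used in the products described in the talk.

```python
from functools import lru_cache

fetch_calls = 0  # counts how often the "expensive" stage actually runs


@lru_cache(maxsize=1024)
def fetch_profile(user_id: int) -> tuple:
    """Hypothetical stand-in for a slow remote lookup (e.g. an API call)."""
    global fetch_calls
    fetch_calls += 1
    return (user_id, f"user-{user_id}")


def run_pipeline(user_ids):
    """Downstream stage: repeated ids are served from the cache."""
    return [fetch_profile(uid) for uid in user_ids]


results = run_pipeline([1, 2, 1, 3, 2, 1])
print(f"{len(results)} records processed, {fetch_calls} expensive fetches")
print(fetch_profile.cache_info())
```

With six requests over three distinct ids, the expensive stage runs only three times; the other three requests are cache hits.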
  • Governing Principle in Action
    Function        | App Kinetics                     | Pop Kinetics                | Like-You
    Data Collection | Custom “probes”                  | Facebook API                | Facebook API
    Data Storage    | MongoDB                          | Amazon S3                   | Google Datastore
    Analysis        | Mongo M/R (JS), PyMongo (Python) | Amazon EMR (Hadoop, Python) | Google App Engine M/R (Python)
    Visualization   | HTML+D3 (JS)                     | Text                        | HTML+JS, Recommendations, Text
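All three stacks list a map/reduce step for analysis (Mongo M/R, Amazon EMR, App Engine M/R). The pattern they share can be sketched in plain Python as follows. This is a single-process illustration with made-up records and hypothetical function names; it does not use any of those platforms' APIs.

```python
from collections import defaultdict
from itertools import chain


def map_likes(record):
    """Map step: emit a (page, 1) pair for every page a user likes."""
    for page in record["likes"]:
        yield page, 1


def reduce_counts(key, values):
    """Reduce step: sum all values emitted for one key."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    """Run mapper over all records, group pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())


# Made-up sample data, for illustration only.
records = [
    {"user": "a", "likes": ["jazz", "chess"]},
    {"user": "b", "likes": ["jazz"]},
    {"user": "c", "likes": ["chess", "hiking", "jazz"]},
]

counts = map_reduce(records, map_likes, reduce_counts)
print(counts)
```

On a real platform the map and reduce functions run next to the stored data across many machines, so only the small aggregated result has to travel.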
  • The decision-making process
    • An iterative process (like solving a jig-saw puzzle)
      – Not linear or formulaic
    • What is the objective?
      – Discovery?
        • If there is a market?
        • If the concept is feasible?
      – Time to market?
      – Hitting a cost target?
      – A scalable solution?
      – Minimizing lock-in?
    • About the data
      – Volume
        • Rate of Growth
      – Velocity
      – Variety
      – Format
      – Location, location, …
  • Thank you
    • J Singh
      – Principal, DataThinks
        • j.singh@datathinks.org
      – Adj. Prof, WPI