Mining Big Data in Real Time
 


Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools because of their size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary because of the volume, variability, and velocity of such data.

Mining Big Data in Real Time: Presentation Transcript

  • Mining Big Data in Real Time (Albert Bifet)
  • Motivation (slides dated 7/6/2013)
    • BIG DATA is an OPEN SOURCE Software Revolution
    • BIG DATA Analytics 2.0
    • What is happening right now
    • Why do we need new tools?
    • Improve decision making:
    • Measure and react in REAL-TIME
  • Real Time Decision Making
    • Companies need to know what is happening right now, in real time, to be able to:
      – react
      – anticipate and detect new business opportunities
  • Big Data 6 Vs
    • Volume
    • Variety
    • Velocity
    • Value
    • Variability
    • Veracity
  • Controversy of Big Data
    • All data is BIG now
    • Hype to sell Hadoop-based systems
    • Ethical concerns about accessibility
    • Limited access to Big Data creates new digital divides
  • Controversy of Big Data
    • Statistical Significance:
      – When the number of variables grows, the number of spurious (fake) correlations also grows
      – Leinweber: S&P 500 stock index correlated with butter production in Bangladesh
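The statistical significance point on this slide can be illustrated with a small simulation. The sketch below is not from the talk: it generates mutually independent Gaussian variables and counts how many pairs nevertheless show a sample correlation above a threshold. The function names, the 20 observations per variable, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(num_vars, num_obs=20, threshold=0.5, seed=0):
    """Count pairs of independent random variables with |r| above threshold.

    Every variable is drawn independently, so any strong correlation
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(num_obs)] for _ in range(num_vars)]
    return sum(
        1 for x, y in combinations(data, 2) if abs(pearson(x, y)) > threshold
    )


if __name__ == "__main__":
    for num_vars in (10, 50, 200):
        pairs = num_vars * (num_vars - 1) // 2
        found = count_spurious(num_vars)
        print(f"{num_vars} variables, {pairs} pairs, {found} with |r| > 0.5")
```

The number of pairs grows quadratically with the number of variables, so even a small per-pair chance of a strong sample correlation yields many such pairs once the variable count is large.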
  • Need for Big Data
    • McKinsey Global Institute (MGI) Report on Big Data, 2011
  • More data or better models?
    • Xavier Amatriain, Netflix Research/Engineering Director
    • http://recsys.acm.org/more-data-or-better-models/
  • Future Challenges for Big Data
    • Evaluation
    • Time evolving data
    • Distributed mining
    • Compression
    • Visualization
    • Hidden Big Data
  • Hadoop Architecture
  • Apache Mahout
  • Pig
    • Similar to SQL
  • Apache S4
  • Twitter Storm
  • Runaway Complexity
    • “Lambda Architecture” (diagram labels): All data, Precomputed batch view, New data stream, Precomputed realtime view, Query
    • Tools (diagram labels): Hadoop; Storm; ElephantDB, Voldemort; Cassandra, Riak, HBase; Kafka
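The labels on this slide (all data, a precomputed batch view, a precomputed realtime view, and a query over both) can be made concrete with a toy example. The sketch below is an illustrative reading of those labels, not code from the talk: plain in-memory Python stands in for the named tools, and the class `LambdaCounter` and its methods are hypothetical.

```python
from collections import Counter


class LambdaCounter:
    """Toy lambda-architecture store for per-key event counts (hypothetical)."""

    def __init__(self):
        self.master_log = []            # "all data": append-only record of every event
        self.batch_view = Counter()     # precomputed batch view, rebuilt from the log
        self.realtime_view = Counter()  # precomputed realtime view of recent events

    def ingest(self, key):
        """New data is appended to the log and counted in the realtime view."""
        self.master_log.append(key)
        self.realtime_view[key] += 1

    def run_batch(self):
        """Recompute the batch view from all data, then reset the realtime view.

        This single-threaded sketch assumes no events arrive while the
        batch job runs; a real system has to handle that overlap.
        """
        self.batch_view = Counter(self.master_log)
        self.realtime_view.clear()

    def query(self, key):
        """Answer a query by merging the batch and realtime views."""
        return self.batch_view[key] + self.realtime_view[key]


if __name__ == "__main__":
    store = LambdaCounter()
    for event in ["click", "click", "view"]:
        store.ingest(event)
    store.run_batch()
    store.ingest("click")
    print(store.query("click"), store.batch_view["click"], store.realtime_view["click"])
```

Even in this toy form, every query depends on two views that must be kept consistent, which hints at the complexity the slide title refers to.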
  • What is SAMOA?
    • NEW software framework for mining distributed data streams
    • Big Data mining for evolving streams in REAL-TIME
  • Big Data Stream Mining: BIG DATA Streams
    • Sequence is potentially infinite
    • High amount of data, high speed of arrival
    • Change over time
    • Process elements from a data stream in only one pass
    • Approximation algorithms
      – Small error rate with high probability
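As one concrete instance of a one-pass approximation algorithm with a small error rate with high probability, the sketch below implements a Count-Min sketch for item frequencies. The slide does not name this technique; it is chosen here as an example. The parameter names `epsilon` and `delta` and the use of salted BLAKE2b as a stand-in for a pairwise-independent hash family are assumptions.

```python
import hashlib
import math


class CountMinSketch:
    """Approximate frequency counts over a stream in fixed memory.

    Estimates never undercount. Under the usual analysis (which assumes
    pairwise-independent hash functions), an estimate exceeds the true
    count by more than epsilon * total with probability at most delta.
    """

    def __init__(self, epsilon=0.01, delta=0.01):
        self.width = math.ceil(math.e / epsilon)     # counters per row
        self.depth = math.ceil(math.log(1 / delta))  # number of hash rows
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0                               # total count seen so far

    def _index(self, item, row):
        # Salting the hash with the row number gives one hash function per row.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        """Process one stream element; each element is seen only once."""
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        """Return the minimum counter across rows (an upper bound on the count)."""
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch()
    for word in ["storm"] * 50 + ["s4"] * 30 + [f"w{i}" for i in range(200)]:
        sketch.add(word)
    print(sketch.estimate("storm"), sketch.estimate("s4"), sketch.total)
```

Memory use depends only on `epsilon` and `delta`, not on the length of the stream, which is what makes this kind of structure usable on a potentially infinite sequence.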
  • Big Data Stream Mining: Distributed BIG DATA
    • BIG DATA Analytics 2.0
      – Apache S4 (Yahoo!, 2010)
      – Storm (Twitter, 2011)
    • Machine Learning
      – Distributed, batch: Hadoop, Mahout
      – Distributed, stream: S4, Storm, SAMOA
      – Non-distributed, batch: R, WEKA, …
      – Non-distributed, stream: MOA
  • SAMOA Architecture
    • Use S4, Storm, or other distributed stream processing platform
    • Use MOA, or other streaming machine learning library
    • Easy to extend through PACKAGES
    • Diagram labels: SAMOA; S4, Storm, …; Classifier Methods, Clustering Methods, Frequent Pattern Mining
  • Thanks!
    • http://samoa-project.net/
    • G. De Francisci Morales. SAMOA: A Platform for Mining Big Data Streams. Keynote talk at RAMSS ’13: 2nd International Workshop on Real-Time Analysis and Mining of Social Streams @WWW, Rio de Janeiro, 2013.