Mining Big Data in Real Time

Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data.

Published in: Technology, Business
  • Transcript

    • 1. Mining Big Data in Real Time Albert Bifet
    • 2. Motivation • BIG DATA is an OPEN SOURCE Software Revolution • BIG DATA Analytics 2.0 • What is happening right now • Why do we need new tools? • Improve decision making: • Measure and react in REAL-TIME 2 7/6/2013
    • 3. Real Time Decision Making • Companies need to know what is happening right now, in real time, to be able to: • react • anticipate and detect new business opportunities.
    • 4. Big Data 6 Vs • Volume • Variety • Velocity • Value • Variability • Veracity
    • 5. Controversy of Big Data • All data is BIG now • Hype to sell Hadoop based systems • Ethical concerns about accessibility • Limited access to Big Data creates new digital divides
    • 6. Controversy of Big Data • Statistical Significance: – When the number of variables grows, the number of fake correlations also grows – Leinweber: S&P 500 stock index correlated with butter production in Bangladesh
    • 7. Need for Big Data • McKinsey Global Institute (MGI) Report on Big Data, 2011
    • 8. Need for Big Data • McKinsey Global Institute (MGI) Report on Big Data, 2011
    • 9. More data or better models? • Xavier Amatriain, Netflix Research/Engineering Director • http://recsys.acm.org/more-data-or-better-models/
    • 10. Future Challenges for Big Data • Evaluation • Time evolving data • Distributed mining • Compression • Visualization • Hidden Big Data
    • 11. HADOOP Architecture
    • 12. Apache Mahout
    • 13. Pig • Similar to SQL
    • 14. Apache S4
    • 15. Twitter Storm
    • 16. Runaway Complexity • “Lambda Architecture” diagram labels: New data stream; All data; Precomputed batch view; Precomputed realtime view; Query • Tools: Hadoop, Storm, ElephantDB, Voldemort, Cassandra, Riak, HBase, Kafka
    • 17. What is SAMOA? • NEW Software framework for mining distributed data streams • Big Data mining for evolving streams in REAL-TIME
    • 18. Big Data Stream Mining • BIG DATA Streams: • Sequence is potentially infinite • High amount of data, high speed of arrival • Change over time • Process elements from a data stream in only one pass • Approximation algorithms – Small error rate with high probability
    • 19. Big Data Stream Mining • Distributed BIG DATA • BIG DATA Analytics 2.0 – Apache S4 (Yahoo!, 2010) – Storm (Twitter, 2011) • Machine Learning: – Distributed, Batch: Hadoop, Mahout – Distributed, Stream: S4, Storm, SAMOA – Non Distributed, Batch: R, WEKA,… – Non Distributed, Stream: MOA
    • 20. SAMOA Architecture • Use S4, Storm, or other distributed stream processing platform • Use MOA, or other streaming machine learning library • Easy to extend through PACKAGES • Diagram labels: SAMOA; S4, Storm, …; Classifier Methods, Clustering Methods, Frequent Pattern Mining
    • 21. Thanks! • http://samoa-project.net/ • G. De Francisci Morales. SAMOA: A Platform for Mining Big Data Streams. Keynote Talk at RAMSS ’13: 2nd International Workshop on Real-Time Analysis and Mining of Social Streams @WWW, Rio de Janeiro, 2013.
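
The statistical significance point on slide 6 (more variables, more fake correlations) can be illustrated with a small simulation. The sketch below is not from the talk: it draws one random target series and a number of independent random candidate variables, then counts how many candidates correlate with the target above a threshold purely by chance. The function names, the 20 observations per series, and the 0.5 threshold are illustrative assumptions.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def count_spurious(num_variables, num_observations=20, threshold=0.5, seed=0):
    """Count independent random variables whose |correlation| with a random
    target exceeds the threshold. Every such hit is spurious by construction,
    because all series are generated independently of each other."""
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(num_observations)]
    hits = 0
    for _ in range(num_variables):
        candidate = [rng.gauss(0.0, 1.0) for _ in range(num_observations)]
        if abs(pearson(target, candidate)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    # With a fixed seed, each larger run reuses the same target and the same
    # leading candidates, so the count can only stay equal or increase.
    for k in (10, 100, 1000, 10000):
        print(f"{k:>6} variables -> {count_spurious(k)} spurious correlations")
```

Because nothing in the generated data is related, any "significant" correlation found here is noise, which is the situation the slide warns about when many variables are tested against one outcome.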
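
Slide 18 describes stream mining as processing each element in only one pass with approximation algorithms that have a small error rate with high probability. One well-known algorithm of that kind, chosen here for illustration and not named in the slides, is the Count-Min sketch for frequency estimation. The sketch below is a minimal version under stated assumptions: the class name and parameters are hypothetical, and salted BLAKE2b digests stand in for the pairwise-independent hash functions that the usual analysis assumes. With width `ceil(e / epsilon)` and depth `ceil(ln(1 / delta))`, an estimate never falls below the true count and exceeds it by more than `epsilon * N` with probability at most `delta`, where `N` is the total count seen so far.

```python
import hashlib
import math
import random
from collections import Counter


class CountMinSketch:
    """One-pass frequency estimator with memory independent of stream length."""

    def __init__(self, epsilon, delta):
        self.width = math.ceil(math.e / epsilon)       # counters per row
        self.depth = math.ceil(math.log(1.0 / delta))  # number of hash rows
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, item, row):
        # One hash function per row, derived by salting the digest with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        """Process one stream element; the element is never stored."""
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        """Return an upper-biased estimate: collisions only add to a counter,
        so the minimum over the rows is the least contaminated one."""
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


def demo():
    # Synthetic skewed stream of hypothetical hashtags, for illustration only.
    rng = random.Random(0)
    tags = [f"tag{i}" for i in range(1000)]
    weights = [1.0 / (i + 1) for i in range(1000)]
    stream = rng.choices(tags, weights=weights, k=20000)

    sketch = CountMinSketch(epsilon=0.01, delta=0.01)
    exact = Counter()  # kept only to compare against the approximation
    for tag in stream:
        sketch.add(tag)
        exact[tag] += 1

    for tag in ("tag0", "tag1", "tag500"):
        print(tag, "exact:", exact[tag], "estimate:", sketch.estimate(tag))
    print("allowed overestimate (epsilon * N):", 0.01 * sketch.total)


if __name__ == "__main__":
    demo()
```

The memory used is `width * depth` counters regardless of how many elements arrive, which is what makes this style of algorithm suitable for potentially infinite streams.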
