This is an introductory lecture on Big Data Science, one of the most talked-about technology domains today.

The domain encompasses many new concepts, keywords, theories and paradigm shifts, spanning computer science and business.


- 1. The Rise of Big Data Science GILAD BARKAN
- 2. Big Data Science Big Data Data Science Big Data Science
- 3. Big Data Why ? What ? How ?
- 4. Big Data Why ? What ? How ?
- 5. Why Big Data ? It’s the flooded information era we live in. In a world where data is power, big data is big power.
- 6. Why Big Data ? Web 2.0
- 7. Why should we care about Big Data ? The big business opportunities: a competitive, fast-moving marketplace; capitalizing on business opportunities before everyone else; existing channels to every person on the planet; maximizing revenues from customers; segment-of-1, i.e. more personal customer experiences.
- 8. Big Data Why ? What ? How ?
- 9. What is Big Data ? The 3 V’s: Volume, Variety, Velocity
- 10. What is Big Data ? The 3 V’s: Volume, Variety, Velocity
- 11. Big Data - Volume
- 12. Big Data - Volume. Big Users: more users, all the time. 2 billion global online population; 35 billion hours spent online; 1+ billion smartphone users.
- 13. More Users More Data + Big Data
- 14. What is Big Data ? The 3 V’s: Volume, Variety, Velocity
- 15. Big Data - Variety. Trillions of gigabytes (zettabytes) from heterogeneous sources of data. Structured data (traditional, SQL): tables, 5 KB / record. Unstructured / semi-structured data (NoSQL): text, log files, click streams, blogs, tweets, audio, video, images, etc.; 50 KB / record, 1000 KB / image, 5000 KB / song, 700 MB / movie.
- 16. What is Big Data ? The 3 V’s: Volume, Variety, Velocity
- 17. Big Data - Velocity How the hell does Google return an answer in 0.28 seconds by looking at 4 Billion pages?
- 18. Big Data - Velocity Online Advertisement - Real Time Bidding (RTB)
- 19. Big Data - Velocity Recommendations
- 20. Big Data Why ? What ? How ?
- 21. How is Big Data Handled ? The challenge is huge: store, analyze and serve a huge volume of varied data at high velocity. We can’t achieve this using a single machine, no matter how powerful it is. Why? Expensive (stay tuned). Load balancing requests: Outbrain serves 3,000 per second; DG (MediaMind) serves 500K per second! Not fault tolerant.
- 22. The Big Data Paradigm Shifts - Volume: Distributing the Data. Scale Up (Vertical): SQL Server. Scale Out (Horizontal): Hadoop cluster of HDFS (GFS) nodes.
- 23. Big Data – Reducing Costs. Hadoop is a 5 times cheaper infrastructure! TCO (purchase + maintenance) over 3 years per 300 TB: DBMS server = 5 M$; 75-node cluster = 1 M$.
- 24. Big Data Paradigm Shift - Computing. MapReduce computing paradigm: exploiting the distributed architecture for large-scale computations in parallel.
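- 25. MapReduce: “Hello MapReduce” – counting words. [Diagram: in the Map step, mappers on the Hadoop cluster each produce a word (W) / count (C) table for “the”, “Cow” and “quick”; in the Reduce step, the reducer adds (+) the tables into the final result: the 21, Cow 2, quick 5. A master node is also shown.]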
- 26. Big Data Paradigm Shift – NoSQL (Variety). Schema-less databases to support the variety of data. Complex SQL queries (joins, etc.) in a distributed data framework are extremely inefficient. NoSQL key-value store: the key can be anything (user_id, url, image_id, video_id), not a single primary key as in SQL tables; the value can be anything (text, images, video).
- 27. Big Data Paradigm Shift – Velocity. RAM-based DBs instead of traditional disk-based DBs: store critical data in memory (much more expensive). If the data doesn’t come to the algorithm (Alg), the algorithm will come to the data. [Diagram: traditional vs. today, showing where Alg reads and writes Data.]
- 28. Big Data - Summary
- 29. Big Data - Summary: BIG business opportunities. The 3 V’s: Volume, Variety, Velocity. Technological paradigm shifts.
- 30. Big Data Technological Paradigm Shifts. Volume: from scale up to scale out; MapReduce (mappers, master, reducer). Variety: NoSQL key-value stores. Velocity: the algorithm (Alg) comes to the data.
- 31. Big Data - Summary: BIG business opportunities. The 3 V’s: Volume, Variety, Velocity. Computing and DB paradigm shifts. Flood of new (open source) technologies.
- 32. Flood of New Big Data Technologies Open Source
- 33. Big Data - Summary: BIG business opportunities. The 3 V’s: Volume, Variety, Velocity. Computing and DB paradigm shifts. Flood of new (open source) technologies. It’s definitely not just a buzz.
- 34. Big Buzz ?
- 35. Big Data - Summary: BIG business opportunities. The 3 V’s: Volume, Variety, Velocity. Computing and DB paradigm shifts. Flood of new (open source) technologies. It’s definitely not just a buzz: it’s a real response to the world’s hectic pace of change, reducing costs by an order of magnitude. Still, it doesn’t mean every business today will or should transform its technology stack to support big data.
- 36. Big Data Science Big Data Data Science Big Data Science
- 37. Data Science Why ? What ? How ?
- 38. Data Science Why ? What ? How ?
- 39. Why Data Science ? data scientists
- 40. Data is a real value Facebook acquires Onavo for ~150M$
- 41. Data Science Why ? What ? How ?
- 42. Welcome to the Intelligent world: Data Analysis, Data Mining, Data Analytics, Data Science, Automatic Decisioning, Machine Learning, Predictive Analytics.
- 43. Data Miners are the New Gold Miners
- 44. Search
- 45. Online Advertisement - Real Time Bidding (RTB)
- 46. Recommendations
- 47. Text Analysis
- 48. CRM – Customers Churn Prediction
- 49. Time Series Analysis
- 50. Machine Learning: Classification, Clustering, Regression, Recommendation.
- 51. Classification. Amdocs Insight™: why is the customer calling the call center? Pay bill, third-party charges, bill too high, overage, abnormal fee.
- 52. Clustering: market segmentation, social network analysis.
- 53. Regression: housing price prediction. [Plot: price in $1000’s (axis from 100 to 400) against size in m² (axis from 50 to 250).]
- 54. The Data Scientist
- 55. Data Scientist Skillset: hands-on with tools, languages, technologies; MSc / PhD in Math, CS, Stats, Physics; hands-on with the specific problem domain.
- 56. Data Science ≠ BI. Apply advanced statistical machine learning algorithms to: dig deeper to find patterns that traditional BI tools may not reveal; cover a much wider spectrum of domains and applications. Predictive Analytics ≠ Exploratory Analytics.
- 57. Predictive Analytics (Data Science, Big Data Science) vs. Exploratory Analytics (Business Intelligence, Traditional BI).
- 58. Academia Response to Data Science
- 59. Data Science Why ? What ? How ?
- 60. The Art of Data Science We need at least one semester course for it Still…
- 61. Data Science Life Cycle. Offline (data analysis): business goal, understand data, prepare data, model, evaluate. Run time: deploy, monitor.
- 62. Closing the Loop. Technically speaking, what do you think: is Big Data good or bad for Data Science ? (Big Data / Data Science / Big Data Science)
- 63. The Bad - Finding a Needle in a Haystack. It’s the same treasure that hides; the problem is that the pile is now huge. Big Data, Big Noise.
- 64. The Bad - Finding a Needle in a Haystack. It’s the same treasure that hides; the problem is that the pile is now huge. Big Data, Big Noise.
- 65. The Good - The Statistical View. Statistics is the fuel of predictive analytics! The more data you have (Big Data), the better your predictive models will perform.
- 66. Law of Large Numbers
- 67. Law of Large Numbers
- 68. Law of Large Numbers
- 69. Law of Large Numbers
- 70. Law of Large Numbers
- 71. Law of Large Numbers
- 72. Combining the Good & Bad: data is a function of quality and quantity. [Chart axes: Quality (low to high) and Quantity (small to big).]
- 73. Big Data Science - Summary. Big Data, Big Numbers, Big Opportunities. Big Data is the buzziest technology nowadays. Data scientists are the ones who coax the treasures out of the big data for their companies; they are multi-disciplinary and the new industry rock stars.
- 74. Thank You for your attention
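
The cost comparison on slide 23 can be checked with a few lines of arithmetic. This is only a sketch built on the figures quoted there (5 M$ vs. 1 M$ over 3 years for 300 TB on 75 nodes). The per-TB and per-node breakdowns are derived values, not numbers from the deck, and the per-node figure assumes data is spread evenly with no replication.

```python
# Figures quoted on slide 23: 3-year TCO (purchase + maintenance) for 300 TB.
DATA_TB = 300
YEARS = 3
DBMS_TCO_USD = 5_000_000
HADOOP_TCO_USD = 1_000_000
HADOOP_NODES = 75


def cost_per_tb_year(tco_usd: float, data_tb: int = DATA_TB, years: int = YEARS) -> float:
    """Spread a total cost of ownership evenly over terabytes and years."""
    return tco_usd / (data_tb * years)


dbms_rate = cost_per_tb_year(DBMS_TCO_USD)      # USD per TB per year
hadoop_rate = cost_per_tb_year(HADOOP_TCO_USD)  # USD per TB per year
ratio = DBMS_TCO_USD / HADOOP_TCO_USD           # the "5 times cheaper" claim

# Assumption: even spread, no replication (HDFS normally replicates blocks).
tb_per_node = DATA_TB / HADOOP_NODES

print(f"DBMS:   {dbms_rate:,.0f} USD per TB per year")
print(f"Hadoop: {hadoop_rate:,.0f} USD per TB per year")
print(f"Ratio:  {ratio:.1f}x, about {tb_per_node:.0f} TB per node")
```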
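
The “Hello MapReduce” word count on slide 25 can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code. The input documents and the function names `mapper`, `shuffle` and `reducer` are made up for the example, and on a real cluster the mappers would run in parallel on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


# Hypothetical input splits, one per mapper.
documents = [
    "the quick cow",
    "the cow and the quick quick dog",
    "the end",
]

mapped = [pair for doc in documents for pair in mapper(doc)]
word_counts = dict(reducer(word, counts) for word, counts in shuffle(mapped).items())

print(word_counts["the"], word_counts["cow"], word_counts["quick"])
```

Only the small per-word totals travel to the reducer, which is what makes the approach scale across many machines.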
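
The key-value model on slide 26 can be illustrated with a toy in-memory store. This is a minimal sketch, not the API of any particular NoSQL product. The class `KeyValueStore`, its `put`/`get` methods and the sample keys and values are all hypothetical.

```python
from typing import Any, Dict


class KeyValueStore:
    """Toy schema-less store: any string key maps to a value of any type."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._data.get(key, default)


store = KeyValueStore()

# Different kinds of keys and values live side by side, with no table schema.
store.put("user_id:42", {"name": "Dana", "plan": "prepaid"})
store.put("url:http://example.com/", "<html>...</html>")
store.put("image_id:7", b"\x89PNG")

print(store.get("user_id:42")["plan"])
print(store.get("video_id:1", default="not found"))
```

Lookups go straight to a single key, so no joins across tables are needed.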
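
The housing-price regression on slide 53 can be sketched as an ordinary least-squares line fit. The data points below are invented for illustration and are not read off the slide's plot. They are deliberately exactly linear so the fitted line is easy to verify by hand, and `fit_line` is a hypothetical helper name.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: size in m2, price in $1000's.
sizes = [50, 100, 150, 200, 250]
prices = [100, 180, 260, 340, 420]

slope, intercept = fit_line(sizes, prices)

# Predict the price of a house of a size not in the training data.
predicted_price = slope * 130 + intercept
print(f"price = {slope:.2f} * size + {intercept:.2f}")
print(f"predicted price for 130 m2: {predicted_price:.0f} thousand dollars")
```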
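
The Law of Large Numbers (slides 66 to 71) can be seen in a short simulation. This sketch assumes a fair coin and a fixed random seed, both chosen for the example rather than taken from the deck. It tracks how the running fraction of heads settles towards 0.5 as the number of flips grows.

```python
import random
from typing import List


def running_mean_of_coin_flips(n_flips: int, seed: int = 0) -> List[float]:
    """Return the fraction of heads after each of n_flips fair coin flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    means = []
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as 1, False as 0
        means.append(heads / i)
    return means


means = running_mean_of_coin_flips(100_000)

# The fraction of heads wanders for small samples and settles near 0.5.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} flips: {means[n - 1]:.4f}")
```

This is the statistical argument behind slide 65: estimates based on more observations tend to be closer to the true value.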
