Dr. Brian J. Spiering
Practical Tips On
Handling Big Data
hi, brian.
Data Science Faculty @GalvanizeU
@BrianSpiering
Roadmap
1. Defining “Big Data” (aka, you probably don’t have Big Data)
2. How to avoid Big Data (and associated problems)
3. Okay, I really have Big Data. What should I do?
Defining “Big Data”
(aka, you probably don’t have Big Data)
1
What is Big Data?
“Data sets with sizes beyond the ability of
commonly used software tools to capture,
curate, manage, and process data within a
tolerable amount of time.”
Data that doesn’t fit on a single machine.
What is not Big Data?
How to avoid Big Data
(and associated problems)
2
Handling Medium Data
Cache
RAM
Disk
Data Center
Big Data Gotcha!
Scaling Out
1. Single Local Machine < 10s GB*
2. Single Cloud Machine < 2 TB*
3. Cloud Cluster of Machines > 2 TB*
* Summer 2016
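The three tiers above can be read as a simple decision rule. A minimal sketch, assuming a hypothetical cutoff of 50 GB for the slide's "10s GB" and taking 2 TB as 2,000 GB; the function name and both default limits are illustrative, not from the talk, and the slide itself warns that these numbers date from Summer 2016:

```python
def suggest_tier(size_gb, local_limit_gb=50.0, cloud_limit_gb=2000.0):
    """Map a data size in GB to one of the three tiers on the slide.

    local_limit_gb is an assumed stand-in for "10s of GB";
    cloud_limit_gb is the slide's 2 TB expressed in GB.
    """
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    if size_gb < local_limit_gb:
        return "single local machine"
    if size_gb < cloud_limit_gb:
        return "single cloud machine"
    return "cloud cluster of machines"


for size in (5, 500, 5000):
    print(size, "GB ->", suggest_tier(size))
```

Keeping the limits as parameters makes it easy to revise them as hardware changes.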
Matrix Multiplication
Matrix Multiplication:
Imperative Implementation
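The code from this slide is not preserved in the transcript. As a minimal sketch of what an imperative, vanilla-Python implementation might look like (the function name and the list-of-rows representation are assumptions):

```python
def matmul_imperative(a, b):
    """Multiply two matrices (lists of rows) with explicit nested loops."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    if any(len(row) != n_inner for row in a):
        raise ValueError("inner dimensions do not match")

    # Pre-allocate the result, then fill it one cell at a time.
    result = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                result[i][j] += a[i][k] * b[k][j]
    return result


print(matmul_imperative([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Every step of the "how" is spelled out here, which makes the code easy to follow but leaves no room for the runtime to optimize it.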
Matrix Multiplication:
Functional Implementation
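The code from this slide is also not preserved. One possible functional formulation, sketched with only the standard library (the helper names `dot` and `matmul_functional` are assumptions), expresses the product as "every row dotted with every column" and leaves iteration order unspecified:

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def matmul_functional(a, b):
    """Multiply two matrices (lists of rows) without mutable state."""
    columns = list(zip(*b))  # transpose b so its columns can be iterated
    return [[dot(row, col) for col in columns] for row in a]


print(matmul_functional([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

Because each output row depends only on one input row and the columns of `b`, the underlying implementation can be swapped for a faster or distributed one without changing the caller.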
Matrix Multiplication
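The speaker notes for this slide describe distributing the work by row, sending each row to a worker. A minimal sketch of that decomposition, assuming a thread pool as a stand-in for cores or cluster members (the function names are hypothetical; in CPython, threads will not speed up pure-Python arithmetic, so this only illustrates the structure):

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def row_times_matrix(row, columns):
    """Compute one output row: the input row dotted with every column."""
    return [sum(x * y for x, y in zip(row, col)) for col in columns]


def matmul_by_row(a, b, max_workers=4):
    """Multiply matrices by handing each row of `a` to a worker."""
    columns = list(zip(*b))
    work = partial(row_times_matrix, columns=columns)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the input rows.
        return list(pool.map(work, a))


print(matmul_by_row([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

The same shape carries over to a process pool or a cluster framework: only the executor changes, not the per-row function.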
Head, Torso, Tail:
Separate models (and hardware)
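As a rough sketch of how items might be split into head, torso, and tail by popularity: the function below sorts items by count and assigns tiers by cumulative share of traffic. The 50% and 40% cutoffs, the function name, and the example counts are all illustrative assumptions, not figures from the talk.

```python
def split_head_torso_tail(counts, head_share=0.5, torso_share=0.4):
    """Assign items to tiers by the share of traffic ranked ahead of them."""
    total = sum(counts.values())
    tiers = {"head": [], "torso": [], "tail": []}
    running = 0
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    for item, count in ranked:
        share_before = running / total
        if share_before < head_share:
            tiers["head"].append(item)
        elif share_before < head_share + torso_share:
            tiers["torso"].append(item)
        else:
            tiers["tail"].append(item)
        running += count
    return tiers


views = {"a": 60, "b": 15, "c": 10, "d": 6, "e": 4, "f": 3, "g": 2}
print(split_head_torso_tail(views))
# {'head': ['a'], 'torso': ['b', 'c', 'd'], 'tail': ['e', 'f', 'g']}
```

Each tier can then get its own model and storage; the speaker notes suggest keeping the head in cache, the torso in RAM, and the tail on disk.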
Okay, I really have Big Data.
What should I do?
3
“But my data is more than 5TB!
(and I need it in memory)”
Your life sucks now…
You are stuck with
distributed computing
map reduce
Big Data is functional
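To make the link between map reduce and functional programming concrete, here is a single-machine word count written as a map step and a reduce step. It uses only the standard library; the function names and sample lines are assumptions for illustration, and a real framework would run the same two steps across many machines.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map: turn one line of text into per-word counts."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce: merge two partial counts into one."""
    return left + right


lines = ["big data is functional", "big data is big"]
totals = reduce(reduce_phase, map(map_phase, lines), Counter())
print(totals["big"], totals["data"], totals["functional"])
# 3 2 1
```

Because `map_phase` touches one line at a time and `reduce_phase` is associative, the lines can be processed in any order and on any worker.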
What to do:
1. Learn some math tricks (linear algebra)
2. Learn how to optimize your code
3. Learn how to use cloud compute
4. Learn a Big Data Framework
Where have we been?
1. Defining “Big Data” (aka, you probably don’t have Big Data)
2. How to avoid Big Data (and associated problems)
3. Okay, I really have Big Data. What should I do?
Thank You!
Questions?

Editor's Notes

  1. Good evening! Tonight, I’m going to share a couple of practical tips on handling Big Data. I’m …
  2. I have been working in Big Data for the last couple of years. About one year ago I joined Galvanize. Galvanize is an education company that builds learning communities. GalvanizeU is its MSDS program. I teach NLP, Big Data, and Deep Learning.
  3. Many people think they have big data but
  4. Here is a popular quote. Does this sound reasonable?
  5. I’ll more precisely define what I mean by a single machine: compute (RAM) and storage (disk).
  6. You can load hundreds of megabytes into memory in an efficient vectorized format. Tell story: I was working at a SaaS company and my intern was fitting a random forest for churn on 1MM rows / 1K attributes. It took about 8 hours in R, 1 hour in Python, and 10 minutes in Spark.
  7. I was working for a company doing competitive intelligence… 5 GB in a data frame on my laptop. Real time: <100 ms. Wes McKinney’s projects to scale out pandas: Ibis / Arrow. Single “computer”.
  8. redefine “machine”
  9. 2 TB of RAM (2,000 GB). In-memory DBs: limited rollout / use, but it’s the future.
  10. bigger, cheaper, faster, easier
  11. [Walk through slowly] http://www.theregister.co.uk/2016/04/04/memory_and_storage_boundary_changes/
  12. Remember the competitive intelligence project: it took 5 minutes to load into RAM. “The difference between RAM and cache is its performance, cost, and proximity to the CPU. Cache is faster, more costly, and closest to the CPU. Due to the cost there is much less cache than RAM. The most basic computer is a CPU and storage for data. The structure we have these days is to give us the most bang for the buck. Generally faster is more expensive. For best performance the faster more expensive storage is closer to the CPU. The relation is like this: CPU - L1 cache - L2 cache - RAM - Hard drive - backup media (tape). The CPU itself has its registers for storing data. The cost per bit of storage goes down from the CPU out.”
  13. Stay local or stay in the cloud. I was storing the data. Moore’s Law: the number of transistors in a dense integrated circuit doubles approximately every two years. 60% annual growth rate: like a printer with a smaller font, more information on each sheet. Kryder’s Law: a 2005 Scientific American article titled “Kryder’s Law” observed that magnetic disk areal storage density was then increasing very quickly. The pace was then much faster than the two-year doubling time of semiconductor chip density posited by Moore’s Law. Nielsen’s Law of Internet bandwidth states that a high-end user’s connection speed grows by 50% per year.
  14. These numbers are going to change, both in value and in the relative tipping point. What is your preference?
  15. Alex Smola: Carnegie Mellon, now leading AWS machine learning offerings.
  16. http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture1/lecture1.html regression (GLM) PCA / eigenvalue
  17. Vanilla Python: very clear & logical, very slow.
  18. Functional programming is an API call: what, not how. Less code. Functional * hides optimizations; we can swap out the underlying code.
  19. Optimization: distribute/parallelize by row. Send each row to a worker (core or cluster member).
  20. Power Law - The internet 101. Chris Anderson. Movies: a few blockbusters, many in the middle of the pack; YouTube/Vimeo have enabled MANY amateur cinematographers.
  21. Power Law - The internet 101. Head, Torso, Tail for recommenders. Keep: Head in cache, Torso in RAM, Tail on disk.
  22. Learn Spark first, then go back to Hadoop. Spark just works better and is easier to understand. Beyond the scope of the talk: Databricks Cloud.
  23. Get out of the data center as quickly as possible. Simple ETL into aggregates. Competitive intelligence project: I would ETL in the cloud and fit aggregate data locally.
  24. Inputs and outputs. Hadoop / MapReduce / Spark extends this but is still functional. Practice on simple problems, then extend to data.
  25. Keep The Goal, The Goal. I love to delight people, especially customers. What are you trying to do with your data? If properly spec’d, then it is not Big Data. Data density.