What is Big Data in a Nutshell?:
An Introduction to Problems and
Bottlenecks in Data Systems
Zach Gazak
David E Drummond
Insight Data Science & Engineering
Program mentors are data teams from top
technology companies including:
500+
Fellows
100+
Companies
Goals
• Understand what can be done with “Big Data” and
the scale of the data.
• Understand the hardware bottlenecks that dictate
the technology “stack”.
• Understand different stacks that are used for
different types of companies, and why.
Facebook is Data
Types of Data
• Audio / Visual:
Images and Videos
• Text: Comments,
Notes, Profile Content
• Interactions: Likes,
Friendships, Groups
• Site usage: Log in,
Scroll, Click, Post, etc.
Unstructured
Structured
How is it Used?
• Business Intelligence / Analytics
• Customer engagement
• Research and Development
• Product Iteration and Improvement
How much data is there?
For Zach:
• ~1 MB per month
• Unstructured data only
For 1.2 billion Zachs ~ 1.2 petabytes per month
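The estimate above is plain multiplication. A minimal sketch, assuming decimal units (1 PB = 10^9 MB) and taking the slide's round numbers at face value; the function name is illustrative only:

```python
# Back-of-the-envelope scale estimate using the slide's round numbers.
# Assumption: decimal units, so 1 PB = 1e9 MB.
MB_PER_PB = 10**9


def monthly_volume_pb(users: int, mb_per_user_per_month: float) -> float:
    """Total unstructured data generated per month, in petabytes."""
    return users * mb_per_user_per_month / MB_PER_PB


total_pb = monthly_volume_pb(users=1_200_000_000, mb_per_user_per_month=1.0)
print(f"{total_pb:.1f} PB per month")
```

Changing either input shows how quickly the total grows with the user count or the per-user volume.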
How is this done?
Hardware basics
• Various ports (I/O): up to ~10 GB/s
• CPU (processor): ~1 GHz
• Hard Drive (storage): ~250 GB
• RAM (memory): ~8 GB
Network Processing Storage
Bottlenecks in Data Systems
Proper data system design should consider these
limiting bottlenecks:
• Processing time by the CPU
• Loading data into the CPU and memory
• Finding data on the disk
• Reading data from the disk
• Moving data across the network
Bottlenecks: Processing Data
• All data that is processed must be loaded into the CPU
Disk Storage
Memory
CPU
Price
Speed
• Solution: Storage Hierarchy, Supercomputers, Distributed Systems
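A rough sketch of why distributing the work helps: it estimates how long one pass over the 1.2 PB monthly volume from the earlier slide would take when the data is split evenly across machines. The 100 MB/s per-disk read rate is an assumed round number, not a figure from the slides, and the model ignores coordination overhead.

```python
# Assumed figure (not from the slides): ~100 MB/s sequential read per disk.
ASSUMED_DISK_MB_PER_S = 100.0
MB_PER_PB = 10**9          # decimal units
SECONDS_PER_DAY = 86_400


def scan_days(data_pb: float, machines: int) -> float:
    """Days to read data_pb once when it is split evenly across machines."""
    seconds = data_pb * MB_PER_PB / (ASSUMED_DISK_MB_PER_S * machines)
    return seconds / SECONDS_PER_DAY


for n in (1, 100, 10_000):
    print(f"{n:>6} machines: {scan_days(1.2, n):8.2f} days")
```

Under these assumptions a single disk needs months for one pass, which is the motivation for spreading both storage and processing over many machines.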
Bottlenecks: Finding Data
• Finding a new file on disk (known as random seeks)
Actuator arm
with head that reads from disk
End of Desired File
Beginning of Desired File
• Solution: SSD and structured databases for specific use cases
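The cost of random seeks can be sketched with a simple model: every time the head has to reposition, a fixed delay is paid before any data is transferred. The 10 ms seek time and 100 MB/s transfer rate below are assumed values for a generic spinning disk, not numbers from the slides; the 2 kB record size matches the transaction size used in the deck's example.

```python
# Assumed figures (not from the slides) for a generic spinning disk.
ASSUMED_SEEK_S = 0.010          # ~10 ms per random seek
ASSUMED_READ_MB_PER_S = 100.0   # ~100 MB/s sequential transfer


def read_time_s(total_mb: float, seeks: int) -> float:
    """Time to read total_mb when the head must reposition `seeks` times."""
    return seeks * ASSUMED_SEEK_S + total_mb / ASSUMED_READ_MB_PER_S


# 1,000 records of 2 kB each (2 MB in total, decimal units).
records, record_mb = 1_000, 0.002
sequential = read_time_s(records * record_mb, seeks=1)       # one contiguous file
scattered = read_time_s(records * record_mb, seeks=records)  # one seek per record
print(f"sequential: {sequential:.3f} s, random: {scattered:.3f} s")
```

In this model the seek term dominates as soon as records are scattered, which is why removing the mechanical seek (SSD) or organizing data so related records sit together (structured databases) pays off.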
Bottlenecks: Moving Data
• Moving data from machine to machine over a network
• Solution: Keeping data close to the processors (MapReduce)
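The idea of keeping computation next to the data can be illustrated in a few lines. This is a single-process sketch in plain Python, not the Hadoop API: each inner list is hypothetical data standing in for log lines stored on one machine, the map step summarizes each partition where it lives, and only the small summaries are combined in the reduce step.

```python
from collections import Counter
from functools import reduce

# Hypothetical data: each inner list stands for log lines stored on one machine.
partitions = [
    ["click", "scroll", "click"],
    ["post", "click"],
    ["scroll", "scroll", "login"],
]


def map_local(lines):
    """Map step: runs where the data lives and returns a small summary."""
    return Counter(lines)


def merge(a, b):
    """Reduce step: combine two partial summaries into one."""
    return a + b


partial_counts = [map_local(p) for p in partitions]  # raw lines stay put
totals = reduce(merge, partial_counts, Counter())    # only summaries are combined
print(dict(totals))
```

In a real cluster the partial counts would travel over the network instead of the raw log lines, which is usually far less data.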
Bottlenecks: Example
• Processing a 2 kB transaction in memory, sequentially and randomly on disk, or across the network
Ratios shown on the slide: 100:1, 200:1, 50:1
Open Questions
• Will processors continue to improve?
• Are there new types of processing?
• What if memory replaced hard
disks?
Quantum Computing
GPU and Deep Learning
Memory Optimized
Tech Stacks for Companies
Depending on your growth plans:
• Single system with small data
• Distributed data center with large data
• Renting computers for flexibility (cloud)
Small Firms with Small Data
• Example: Small medical firm with slow growth
• Pros: Easy to maintain, data locality, inexpensive
• Cons: Difficult to grow quickly, risky, not ideal for analysis
Large Firms with Stable Growth
• Example: Facebook with steadily growing data centers
• Pros: Economies of scale, redundancy, innovative design
• Cons: Upfront capital, dedicated maintenance
• >100 PB of Data
• 7 PB / Day
• 1 kW / TB
• ~$20 / TB / Month
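Combining two of the figures above gives a feel for the monthly bill at this scale. A minimal sketch, assuming decimal units (1 PB = 1,000 TB) and treating 100 PB as a lower bound, since the slide says ">100 PB"; the function name is illustrative:

```python
TB_PER_PB = 1_000  # assumption: decimal units


def monthly_cost_usd(stored_pb: float, usd_per_tb_month: float) -> float:
    """Monthly storage cost for stored_pb at a flat per-TB rate."""
    return stored_pb * TB_PER_PB * usd_per_tb_month


# Slide figures: at least 100 PB stored, roughly $20 per TB per month.
print(f"at least ${monthly_cost_usd(100, 20):,.0f} per month")
```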
Start-Ups with Exponential Growth
• Example: AirBnB - rent processing and storage from AWS
• Pros: Scales easily, no maintenance, no upfront capital
• Cons: Expensive in the long run, depend on data provider
• 50 GB / Day
• $20-50 / TB / Mo
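The "expensive in the long run" point follows from the data accumulating while the per-TB price stays flat. A rough sketch using the two figures above, assuming a 30-day month, decimal units, and that every byte is retained (none of these assumptions come from the slides):

```python
GB_PER_TB = 1_000     # assumption: decimal units
DAYS_PER_MONTH = 30   # assumption: flat 30-day month


def stored_tb(months: int, gb_per_day: float = 50.0) -> float:
    """Data accumulated after `months`, assuming nothing is deleted."""
    return months * DAYS_PER_MONTH * gb_per_day / GB_PER_TB


def monthly_bill_range(months: int, low: float = 20.0, high: float = 50.0):
    """(low, high) monthly bill in USD for the data stored after `months`."""
    tb = stored_tb(months)
    return tb * low, tb * high


for m in (1, 12, 36):
    cheap, pricey = monthly_bill_range(m)
    print(f"month {m:>2}: {stored_tb(m):5.1f} TB, ${cheap:,.0f} to ${pricey:,.0f} per month")
```

The bill grows linearly here only because the ingest rate is held constant; with the exponential growth the slide title refers to, it would grow faster.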
Start-Ups with Exponential Growth
• Example: Netflix - AWS fails on Christmas Eve
• Con: You can rent the computers, but you own the failure
Questions?
• info@insightdatascience.com
• jzgazak@gmail.com
• david@insightdatascience.com

Data for Action Talk - 2016-02-22
