Tour of Big Data
Raymond Yu
Socal Code Camp 2013
About myself
•Sr. Database Architect @ BridgePoint Education
•Blog www.yutechnet.com
•LinkedIn www.linkedin.com/in/raymondyu1
•@yutechnet
About this talk…
7/28/2013 yutechnet.com
•Inspired by “Introduction to Data Science”
on Coursera (Bill Howe, UW)
•Guided tour of topics in data science
– MapReduce, Pig
– noSQL
– Machine Learning
– Information Visualization
•Goal
Big Data
•Volume
– Size of data
•Velocity
– The latency of data processing relative to the growing
demand of interactivity
•Variety
– The diversity of sources, formats, quality, and structures
Big Data is any data that is expensive to manage and hard to
extract value from. -Michael Franklin
Where does big data come from?
•“Data exhaust” from customers
•New sensor technologies
•Individually contributed data at massive scale
•Cheap to keep data
Data Science
•Data Preparation (at scale)
•Analytics
•Communication
The ability to take data, understand it, process it,
extract value from it, visualize it, and communicate it
- Hal Varian, Google's Chief Economist
Context…
src. Introduction to Data Science course
Relational Databases
•SQL as Declarative Language
•Indexes
– Extract small result from big dataset
– Built easily and automatically used when appropriate
•Data consistency
•“Old-style” scalability
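A minimal sketch of the first two points, using Python's built-in sqlite3 module. The `orders` table, its columns, and the index name are hypothetical and exist only for illustration. The query states what is wanted, and the planner decides on its own to use the index.

```python
import sqlite3

# Hypothetical in-memory table; all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

# Building an index takes a single statement.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Declarative query: it says what to return, not how to find it.
query = "SELECT total FROM orders WHERE customer_id = ?"
rows = conn.execute(query, (42,)).fetchall()

# Ask the planner how it intends to run the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(len(rows), plan_text)
```

The query text never mentions the index; the plan output should show that the planner picked it up without being told.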
MapReduce
•Google paper 2004
•Hadoop 2008
•High-level programming model for large-scale parallel data processing
•Divide-and-conquer
•Mapper + Reducer
“Hello World” of MapReduce
Count word frequency in millions of documents
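The word count can be sketched in a single Python process. This is not Hadoop code; the function names `mapper`, `reducer`, and `map_reduce` are assumptions chosen for illustration, and the two sample documents stand in for the millions a real job would read.

```python
from collections import defaultdict


def mapper(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def map_reduce(documents):
    """Run map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for key, value in mapper(doc_id, text):
            groups[key].append(value)  # shuffle: values with the same key meet
    return dict(reducer(key, values) for key, values in groups.items())


docs = {"d1": "big data is big", "d2": "data science"}
counts = map_reduce(docs)
print(counts)
```

In a real cluster the mapper calls run in parallel on different machines and the shuffle moves data over the network; the programmer still writes only the two small functions.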
MapReduce Programming Model
src. Course slide
Show me the MapReduce…
•www.jsmapreduce.com
MapReduce in Hadoop
Pig
• An engine to execute programs on top of
Hadoop
• Language layer Pig Latin
• An Apache open source project
(http://pig.apache.org)
•Yahoo! 2009
Why use Pig?
In MapReduce…
In Pig Latin
Pig System Overview
Context…
src. Introduction to Data Science course
noSQL definitions
•A term to designate databases which
differ from classic relational databases
– Transactional model
– Data model
•Not much to do with SQL
•“not only SQL”
Concepts
• CAP Theorem
– Consistency
– Availability
– Partition Tolerance
• Eventual consistency
Src: blog.beany.co.kr
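Eventual consistency can be illustrated with a toy model, assuming two replicas that reconcile with a last-write-wins rule on a version number. The `Replica` class and `sync` function are hypothetical and do not reflect how any particular database implements replication.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions so the replicas converge."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.store.items()):
            dst.write(key, value, version)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], version=1)
stale = r2.read("cart")   # the write has not reached r2 yet
sync(r1, r2)
fresh = r2.read("cart")   # after syncing, both replicas agree
print(stale, fresh)
```

Between the write and the sync, a reader of the second replica sees old data; that window is the price paid for staying available without waiting on every copy.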
noSQL One-page Overview
Let’s walk through a few
•Column definitions
•RDBMS
•Memcache
•Dynamo
•CouchDB
•BigTable (Hbase)
noSQL Common Features
• The ability to replicate and partition data
over many servers (scale)
• Horizontally scale simple operation
throughput over many servers
• A simple API - no query language (no SQL)
• Weaker concurrency model than ACID
transactions (no transaction)
• The ability to dynamically add new attributes
to data records (no schema)
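Several of these features fit in a short sketch: a put/get API with no query language, records without a fixed schema, and keys partitioned and replicated across nodes. The `TinyKVCluster` class is hypothetical, and it uses plain modulo hashing for brevity, not the consistent hashing that production systems typically rely on.

```python
import hashlib


class TinyKVCluster:
    """Toy key-value store spread over several in-memory 'servers'."""

    def __init__(self, num_nodes=4, replicas=2):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas

    def _owners(self, key):
        # Partitioning: hash the key to choose the first node, then
        # replicate onto the following nodes (wrapping around).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, record):
        for n in self._owners(key):
            self.nodes[n][key] = dict(record)

    def get(self, key):
        for n in self._owners(key):
            if key in self.nodes[n]:
                return self.nodes[n][key]
        return None


cluster = TinyKVCluster()
cluster.put("user:1", {"name": "Ann"})
# No schema: a second record can carry an attribute the first one lacks.
cluster.put("user:2", {"name": "Bob", "twitter": "@bob"})
print(cluster.get("user:2"))
```

Because each key lives on two nodes, a read can still succeed if the first owner loses its data. There is no transaction here: the two copies are written one after the other.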
Machine Learning
• Systems that automatically learn programs
from data
• Prediction
– Given examples of inputs and outputs
– Learn the relationship between them
– Apply the relationship to larger set
• Different from statistical modeling
– A large data set with a simple model trumps a small data set
with a sophisticated model
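The prediction loop above can be sketched with about the simplest model available: a straight line fitted by least squares. The training numbers are made up for illustration, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from example inputs and outputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to an unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training examples that follow y = 2x + 1.
train_x = [1, 2, 3, 4, 5]
train_y = [3, 5, 7, 9, 11]

model = fit_line(train_x, train_y)
predictions = [predict(model, x) for x in [10, 20]]
print(model, predictions)
```

The program is never told the rule; it recovers the relationship from the examples and then applies it to inputs it has not seen.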
Bertin’s Visual Attributes
Data Encoding Exercise
Information Visualization
src. http://www.tableausoftware.com/public
Closing example
Src. http://commons.wikimedia.org/wiki/File:ElectoralCollege2012.svg
Nate Silver
fivethirtyeight.com
Obama’s Data-Driven Campaign
• Massive voter db
• Hadoop as ETL
• Vertica db for slice-and-dice
Questions?

Editor's Notes

  1. Whenever you see “yutechnet”, it is me. Next, ask the audience: Developer? DBA? DBE? Worked on any databases beyond relational databases? Use Hadoop and other noSQL on a daily basis?
  2. Dumbed-down version of the course. Hard to pick topics to share. Major areas of data science, with a focus on big data and noSQL. Goal: become familiar with the big picture and terminology of data science, speak intelligently about the field, and springboard into the specific areas that interest you further.
  3. Franklin’s key idea: “Big” is relative, it depends on what you try to do
  4. Analytics: statistics model, machine learning, slice-dice
  5. Call out a few great features of relational databases to set the context of how we got here, so that they are not lost in the big data and noSQL discussion or written off as the old guard. Declarative: specify what you want, with no need to worry about logical or physical operations and optimization.
  6. Map, shuffle, and reduce.
  7. Touch base on HDFS layer about fault tolerance, job tracker, task tracker, etc.
  8. Comments in lieu of a demo: schema-on-read with LOAD. Relational JOIN operation. Optimization: relational algebra. Lazy evaluation: no work is done until STORE.
  9. Pig performance: initially not as good as MR, but it caught up quickly and is now almost the same as MR. Hive is not covered, but 2011 data showed that >90% of MR jobs are executed via Hive. A clear win for a declarative language. Don’t feel bad if you know SQL :)
  10. About EC. Databases: “Everyone MUST see the same thing, either old or new, no matter how long it takes.” NoSQL: “For large applications, we can’t afford to wait that long, and maybe it doesn’t matter anyway.”
  11. Memcache: load everything into memory and scale across hundreds of machines with consistent hashing. BigTable: Google 2006, complementary to MapReduce, added an index (zoom-in) for fast key-based lookup.
  12. Statistics emphasizes the accuracy of the model, while ML cares less about the nature of the model. Think of the example of building a super-accurate gun.