SlideShare a Scribd company logo
1 of 33
Download to read offline
The Big Data Ecosystem at LinkedIn,[object Object],Jay Kreps,[object Object]
Me,[object Object],Background in data not infrastructure,[object Object],LinkedIn’s SNA team,[object Object],Original co-author of some LinkedIn open source projects (Voldemort, Azkaban, Kafka),[object Object]
This Talk,[object Object],We are in a renaissance of data infrastructure.,[object Object],How do all these pieces fit together?,[object Object]
Why the current obsession with “Big Data”?,[object Object]
The goal of modern data infrastructure is to make many small computers act like one big one.,[object Object]
The Old Picture,[object Object]
The New Picture,[object Object]
Polyglot persistence?,[object Object]
Infrastructure Icebergs,[object Object],90k lines of tooling and monitoring, 30k lines of logic,[object Object],Dedicated engineers, operations,[object Object],Training,[object Object],First three nines come from operations,[object Object]
This is (still) a very immature space. Which systems should we have?,[object Object]
Infrastructure is sculpted by applications and constraints,[object Object],Projects are defined by trade-offs,[object Object]
Constraints,[object Object],Hardware,[object Object],Jeff Dean: Numbers everyone should know,[object Object],David Patterson: Latency lags bandwidth,[object Object],$$$,[object Object],Other,[object Object],Path dependence,[object Object],Complexity,[object Object],Resources,[object Object]
Applications,[object Object]
Common categories of non-CRUD,[object Object],Recommendations & Matching,[object Object],Graphs,[object Object],Search,[object Object],Data Normalization,[object Object],News feed,[object Object],Analysis & Monitoring,[object Object]
Social Graph,[object Object]
Search,[object Object]
Recommendations: People,[object Object]
Recommendations: Jobs,[object Object]
Recommendations: Newsfeed,[object Object]
Data Normalization,[object Object]
Analytics,[object Object]
Infrastructure,[object Object],Search,[object Object],Lucene,[object Object],Bobo (facets), Zoie (real-time indexing), Sensei (distribution),[object Object],Social Graph,[object Object],Storage,[object Object],Oracle,[object Object],Voldemort,[object Object],Espresso,[object Object],Streams,[object Object],Databus,[object Object],Kafka,[object Object],Offline,[object Object],Hadoop & friends (Pig, Hive, Azkaban, etc),[object Object]
Three Major Paradigms,[object Object],Request/Response,[object Object],Search,[object Object],Social Graph,[object Object],Storage,[object Object],Streams,[object Object],Kafka,[object Object],Batch,[object Object],Hadoop,[object Object]
Most features are multi-paradigm,[object Object]
Request/Response,[object Object],Search,[object Object],Social Graph,[object Object],Storage,[object Object],Voldemort,[object Object],Espresso,[object Object]
Request/Response Patterns,[object Object],Broker, scatter-gather,[object Object],Storage systems: only ,[object Object],Partitioning strategy,[object Object],Latency oriented,[object Object]
Batch: Hadoop,[object Object],Uses,[object Object],Ad hoc,[object Object],Production batch,[object Object],Ecosystem,[object Object],Hive, Pig,[object Object],Azkaban (workflow),[object Object],Avro data,[object Object],Data in: Kafka,[object Object],Data out: Voldemort, Kafka,[object Object]
Why do batch if you have real-time?,[object Object],Batch advantages,[object Object],Safety,[object Object],Easy,[object Object],Throughput,[object Object],Simplicity,[object Object],Economics,[object Object],Tricky bit: engineering the data cycle,[object Object]
Why do streaming?,[object Object],You have to glue all these systems together,[object Object],Throughput as good as batch,[object Object],Latency much better,[object Object],Metaphor more natural for low latency than Hadoop,[object Object]
What makes successful infrastructure systems?,[object Object],Operability and Operations,[object Object],Monitoring,[object Object],Simplicity,[object Object],Documentation,[object Object],Broad adoption,[object Object],Lazy users,[object Object],Open source,[object Object]
Open Source,[object Object],Data > Infrastructure,[object Object],Open source creates better code—even with few outside contributors,[object Object],Commercial infrastructure not interesting,[object Object]
Open Source Projects,[object Object],We made,[object Object],Voldemort: Key/Value storage,[object Object],Sensei, Bobo, Zoie: Elastic, faceted, real-time search with Lucene,[object Object],Kafka: Persistent, distributed data streams,[object Object],Norbert: Cluster aware RPC, load balancing, and group membership,[object Object],And others…,[object Object],We stole,[object Object],Hadoop, Pig, Hive,[object Object],Lucene,[object Object],Netty, Jetty,[object Object],Zookeeper,[object Object],Avro,[object Object],Apache Traffic Server,[object Object]
The End,[object Object],jay.kreps@gmail.com,[object Object],http://www.linkedin.com/in/jaykreps,[object Object],http://twitter.com/jaykreps,[object Object],http://sna-projects.com,[object Object]

More Related Content

What's hot

What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperVasu S
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionSteve Loughran
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data Geoffrey Fox
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing ArchitectureGang Tao
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsMars Lan
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInSam Shah
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureNiels Naglé
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahDatabricks
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!DataWorks Summit/Hadoop Summit
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher Tamir Dresher
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundationshktripathy
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Big Data Spain
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesDenodo
 

What's hot (20)

What is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | WhitepaperWhat is an Open Data Lake? - Data Sheets | Whitepaper
What is an Open Data Lake? - Data Sheets | Whitepaper
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
 
Big Data Tech Stack
Big Data Tech StackBig Data Tech Stack
Big Data Tech Stack
 
High Performance Computing and Big Data
High Performance Computing and Big Data High Performance Computing and Big Data
High Performance Computing and Big Data
 
Big Data Computing Architecture
Big Data Computing ArchitectureBig Data Computing Architecture
Big Data Computing Architecture
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
 
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data PlatformsWhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
WhereHows: Taming Metadata for 150K Datasets Over 9 Data Platforms
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
The "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedInThe "Big Data" Ecosystem at LinkedIn
The "Big Data" Ecosystem at LinkedIn
 
Data & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architectureData & analytics challenges in a microservice architecture
Data & analytics challenges in a microservice architecture
 
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq AbdullahLeveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
Leveraging Spark to Democratize Data for Omni-Commerce with Shafaq Abdullah
 
Big Data with Azure
Big Data with AzureBig Data with Azure
Big Data with Azure
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for Architects
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Lecture4 big data technology foundations
Lecture4 big data technology foundationsLecture4 big data technology foundations
Lecture4 big data technology foundations
 
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
Fishing Graphs in a Hadoop Data Lake by Jörg Schad and Max Neunhoeffer at Big...
 
Big Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data LakesBig Data: Architecture and Performance Considerations in Logical Data Lakes
Big Data: Architecture and Performance Considerations in Logical Data Lakes
 

Viewers also liked

AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Databricks
 
Real-Time Analytics for Industries
Real-Time Analytics for IndustriesReal-Time Analytics for Industries
Real-Time Analytics for IndustriesAvadhoot Patwardhan
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Denodo
 
The Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesThe Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesDataStax
 
Big Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated DronesBig Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated DronesEspeo Software
 
A look back bkelly duo farewell - june 2015
A look back   bkelly duo farewell - june 2015A look back   bkelly duo farewell - june 2015
A look back bkelly duo farewell - june 2015Brian Kelly
 
LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...Nish Mahanty
 
Patrick Shields Digitising the Public Sector
Patrick Shields   Digitising the Public SectorPatrick Shields   Digitising the Public Sector
Patrick Shields Digitising the Public SectorSoftware AG South Africa
 
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...Chanita Anuwong
 
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011Compulsiva Accesorios
 
同玩節海報事件
同玩節海報事件同玩節海報事件
同玩節海報事件lalacamp07
 

Viewers also liked (20)

AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
 
Real-Time Analytics for Industries
Real-Time Analytics for IndustriesReal-Time Analytics for Industries
Real-Time Analytics for Industries
 
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
Designing an Agile Fast Data Architecture for Big Data Ecosystem using Logica...
 
Aligning BPM and EA
Aligning BPM and EAAligning BPM and EA
Aligning BPM and EA
 
The Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial ServicesThe Big Data Ecosystem for Financial Services
The Big Data Ecosystem for Financial Services
 
Big Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated DronesBig Data Ecosystem - 1000 Simulated Drones
Big Data Ecosystem - 1000 Simulated Drones
 
Big Data Application Architectures - IoT
Big Data Application Architectures - IoTBig Data Application Architectures - IoT
Big Data Application Architectures - IoT
 
Your Garden and Global Warming
Your Garden and Global WarmingYour Garden and Global Warming
Your Garden and Global Warming
 
A look back bkelly duo farewell - june 2015
A look back   bkelly duo farewell - june 2015A look back   bkelly duo farewell - june 2015
A look back bkelly duo farewell - june 2015
 
LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...LAST Conference - The Mickey Mouse model of leadership for software delivery ...
LAST Conference - The Mickey Mouse model of leadership for software delivery ...
 
Quick Introduction to the Semantic Web, RDFa & Microformats
Quick Introduction to the Semantic Web, RDFa & MicroformatsQuick Introduction to the Semantic Web, RDFa & Microformats
Quick Introduction to the Semantic Web, RDFa & Microformats
 
4º básico a semana 25 abril al 29 abril
4º básico a  semana 25 abril  al 29 abril4º básico a  semana 25 abril  al 29 abril
4º básico a semana 25 abril al 29 abril
 
Patrick Shields Digitising the Public Sector
Patrick Shields   Digitising the Public SectorPatrick Shields   Digitising the Public Sector
Patrick Shields Digitising the Public Sector
 
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
Product Discovery and Delivery by Odd-e (Thailand). Build the right thing at ...
 
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
Lo Mejor de Cibeles Madrid Fashion Week - Otoño/Invierno 2010 - 2011
 
Driving school
Driving schoolDriving school
Driving school
 
Comic Analysis
Comic AnalysisComic Analysis
Comic Analysis
 
同玩節海報事件
同玩節海報事件同玩節海報事件
同玩節海報事件
 

Similar to The Big Data Ecosystem at LinkedIn

Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPeter Wang
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfkalai75
 
Startds9.19.17sd
Startds9.19.17sdStartds9.19.17sd
Startds9.19.17sdThinkful
 
GraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data ScienceGraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data ScienceNeo4j
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big DataMrinal Kumar
 
Data sci sd-11.6.17
Data sci sd-11.6.17Data sci sd-11.6.17
Data sci sd-11.6.17Thinkful
 
Big Data Basic Concepts | Presented in 2014
Big Data Basic Concepts  | Presented in 2014Big Data Basic Concepts  | Presented in 2014
Big Data Basic Concepts | Presented in 2014Kenneth Igiri
 
Intro to Neo4j Webinar
Intro to Neo4j WebinarIntro to Neo4j Webinar
Intro to Neo4j WebinarNeo4j
 
Jon cohn exton pa corporate data architecture
Jon cohn exton pa   corporate data architectureJon cohn exton pa   corporate data architecture
Jon cohn exton pa corporate data architectureJon Cohn
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8dallemang
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data ManagementeXascale Infolab
 
The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringAndrei Savu
 
Business_Analytics_Presentation_Luke_Caratan
Business_Analytics_Presentation_Luke_CaratanBusiness_Analytics_Presentation_Luke_Caratan
Business_Analytics_Presentation_Luke_CaratanLuke Caratan
 
ITCamp 2018 - Magnus Mårtensson - Azure Global Application Perspectives
ITCamp 2018 - Magnus Mårtensson - Azure Global Application PerspectivesITCamp 2018 - Magnus Mårtensson - Azure Global Application Perspectives
ITCamp 2018 - Magnus Mårtensson - Azure Global Application PerspectivesITCamp
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data ScientistsRichard Garris
 

Similar to The Big Data Ecosystem at LinkedIn (20)

Microsoft Dryad
Microsoft DryadMicrosoft Dryad
Microsoft Dryad
 
Python's Role in the Future of Data Analysis
Python's Role in the Future of Data AnalysisPython's Role in the Future of Data Analysis
Python's Role in the Future of Data Analysis
 
Grandata
GrandataGrandata
Grandata
 
unit-4-notes.pdf
unit-4-notes.pdfunit-4-notes.pdf
unit-4-notes.pdf
 
Cloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdfCloud and Bid data Dr.VK.pdf
Cloud and Bid data Dr.VK.pdf
 
Startds9.19.17sd
Startds9.19.17sdStartds9.19.17sd
Startds9.19.17sd
 
GraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data ScienceGraphTour 2020 - Graphs & AI: A Path for Data Science
GraphTour 2020 - Graphs & AI: A Path for Data Science
 
Cloud Computing & Big Data
Cloud Computing & Big DataCloud Computing & Big Data
Cloud Computing & Big Data
 
Data sci sd-11.6.17
Data sci sd-11.6.17Data sci sd-11.6.17
Data sci sd-11.6.17
 
Big Data Basic Concepts | Presented in 2014
Big Data Basic Concepts  | Presented in 2014Big Data Basic Concepts  | Presented in 2014
Big Data Basic Concepts | Presented in 2014
 
Intro to Neo4j Webinar
Intro to Neo4j WebinarIntro to Neo4j Webinar
Intro to Neo4j Webinar
 
Jon cohn exton pa corporate data architecture
Jon cohn exton pa   corporate data architectureJon cohn exton pa   corporate data architecture
Jon cohn exton pa corporate data architecture
 
Sem tech 2011 v8
Sem tech 2011 v8Sem tech 2011 v8
Sem tech 2011 v8
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
 
Business_Analytics_Presentation_Luke_Caratan
Business_Analytics_Presentation_Luke_CaratanBusiness_Analytics_Presentation_Luke_Caratan
Business_Analytics_Presentation_Luke_Caratan
 
ITCamp 2018 - Magnus Mårtensson - Azure Global Application Perspectives
ITCamp 2018 - Magnus Mårtensson - Azure Global Application PerspectivesITCamp 2018 - Magnus Mårtensson - Azure Global Application Perspectives
ITCamp 2018 - Magnus Mårtensson - Azure Global Application Perspectives
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
 
2018 learning approach-digitaltrends
2018 learning approach-digitaltrends2018 learning approach-digitaltrends
2018 learning approach-digitaltrends
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 

More from OSCON Byrum

OSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom FifieldOSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom FifieldOSCON Byrum
 
Protecting Open Innovation with the Defensive Patent License
Protecting Open Innovation with the Defensive Patent LicenseProtecting Open Innovation with the Defensive Patent License
Protecting Open Innovation with the Defensive Patent LicenseOSCON Byrum
 
Using Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open DataUsing Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open DataOSCON Byrum
 
Finite State Machines - Why the fear?
Finite State Machines - Why the fear?Finite State Machines - Why the fear?
Finite State Machines - Why the fear?OSCON Byrum
 
Open Source Automotive Development
Open Source Automotive DevelopmentOpen Source Automotive Development
Open Source Automotive DevelopmentOSCON Byrum
 
How we built our community using Github - Uri Cohen
How we built our community using Github - Uri CohenHow we built our community using Github - Uri Cohen
How we built our community using Github - Uri CohenOSCON Byrum
 
The Vanishing Pattern: from iterators to generators in Python
The Vanishing Pattern: from iterators to generators in PythonThe Vanishing Pattern: from iterators to generators in Python
The Vanishing Pattern: from iterators to generators in PythonOSCON Byrum
 
Distributed Coordination with Python
Distributed Coordination with PythonDistributed Coordination with Python
Distributed Coordination with PythonOSCON Byrum
 
An overview of open source in East Asia (China, Japan, Korea)
An overview of open source in East Asia (China, Japan, Korea)An overview of open source in East Asia (China, Japan, Korea)
An overview of open source in East Asia (China, Japan, Korea)OSCON Byrum
 
Oscon 2013 Jesse Anderson
Oscon 2013 Jesse AndersonOscon 2013 Jesse Anderson
Oscon 2013 Jesse AndersonOSCON Byrum
 
US Patriot Act OSCON2012 David Mertz
US Patriot Act OSCON2012 David MertzUS Patriot Act OSCON2012 David Mertz
US Patriot Act OSCON2012 David MertzOSCON Byrum
 
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...OSCON Byrum
 
Big Data for each one of us
Big Data for each one of usBig Data for each one of us
Big Data for each one of usOSCON Byrum
 
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking OSCON Byrum
 
Declarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScriptDeclarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScriptOSCON Byrum
 
Using and Building Open Source in Google Corporate Engineering - Justin McWil...
Using and Building Open Source in Google Corporate Engineering - Justin McWil...Using and Building Open Source in Google Corporate Engineering - Justin McWil...
Using and Building Open Source in Google Corporate Engineering - Justin McWil...OSCON Byrum
 
A Look at the Network: Searching for Truth in Distributed Applications
A Look at the Network: Searching for Truth in Distributed ApplicationsA Look at the Network: Searching for Truth in Distributed Applications
A Look at the Network: Searching for Truth in Distributed ApplicationsOSCON Byrum
 
Life After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data CloudLife After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data CloudOSCON Byrum
 
Faster! Faster! Accelerate your business with blazing prototypes
Faster! Faster! Accelerate your business with blazing prototypesFaster! Faster! Accelerate your business with blazing prototypes
Faster! Faster! Accelerate your business with blazing prototypesOSCON Byrum
 
Comparing open source private cloud platforms
Comparing open source private cloud platformsComparing open source private cloud platforms
Comparing open source private cloud platformsOSCON Byrum
 

More from OSCON Byrum (20)

OSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom FifieldOSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
OSCON 2013 - Planning an OpenStack Cloud - Tom Fifield
 
Protecting Open Innovation with the Defensive Patent License
Protecting Open Innovation with the Defensive Patent LicenseProtecting Open Innovation with the Defensive Patent License
Protecting Open Innovation with the Defensive Patent License
 
Using Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open DataUsing Cascalog to build an app with City of Palo Alto Open Data
Using Cascalog to build an app with City of Palo Alto Open Data
 
Finite State Machines - Why the fear?
Finite State Machines - Why the fear?Finite State Machines - Why the fear?
Finite State Machines - Why the fear?
 
Open Source Automotive Development
Open Source Automotive DevelopmentOpen Source Automotive Development
Open Source Automotive Development
 
How we built our community using Github - Uri Cohen
How we built our community using Github - Uri CohenHow we built our community using Github - Uri Cohen
How we built our community using Github - Uri Cohen
 
The Vanishing Pattern: from iterators to generators in Python
The Vanishing Pattern: from iterators to generators in PythonThe Vanishing Pattern: from iterators to generators in Python
The Vanishing Pattern: from iterators to generators in Python
 
Distributed Coordination with Python
Distributed Coordination with PythonDistributed Coordination with Python
Distributed Coordination with Python
 
An overview of open source in East Asia (China, Japan, Korea)
An overview of open source in East Asia (China, Japan, Korea)An overview of open source in East Asia (China, Japan, Korea)
An overview of open source in East Asia (China, Japan, Korea)
 
Oscon 2013 Jesse Anderson
Oscon 2013 Jesse AndersonOscon 2013 Jesse Anderson
Oscon 2013 Jesse Anderson
 
US Patriot Act OSCON2012 David Mertz
US Patriot Act OSCON2012 David MertzUS Patriot Act OSCON2012 David Mertz
US Patriot Act OSCON2012 David Mertz
 
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
OSCON 2012 US Patriot Act Implications for Cloud Computing - Diane Mueller, A...
 
Big Data for each one of us
Big Data for each one of usBig Data for each one of us
Big Data for each one of us
 
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
BodyTrack: Open Source Tools for Health Empowerment through Self-Tracking
 
Declarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScriptDeclarative web data visualization using ClojureScript
Declarative web data visualization using ClojureScript
 
Using and Building Open Source in Google Corporate Engineering - Justin McWil...
Using and Building Open Source in Google Corporate Engineering - Justin McWil...Using and Building Open Source in Google Corporate Engineering - Justin McWil...
Using and Building Open Source in Google Corporate Engineering - Justin McWil...
 
A Look at the Network: Searching for Truth in Distributed Applications
A Look at the Network: Searching for Truth in Distributed ApplicationsA Look at the Network: Searching for Truth in Distributed Applications
A Look at the Network: Searching for Truth in Distributed Applications
 
Life After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data CloudLife After Sharding: Monitoring and Management of a Complex Data Cloud
Life After Sharding: Monitoring and Management of a Complex Data Cloud
 
Faster! Faster! Accelerate your business with blazing prototypes
Faster! Faster! Accelerate your business with blazing prototypesFaster! Faster! Accelerate your business with blazing prototypes
Faster! Faster! Accelerate your business with blazing prototypes
 
Comparing open source private cloud platforms
Comparing open source private cloud platformsComparing open source private cloud platforms
Comparing open source private cloud platforms
 

Recently uploaded

9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 

Recently uploaded (20)

9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 

The Big Data Ecosystem at LinkedIn

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.

Editor's Notes

  1. Good news for users, bad news for distributed systems nerdsFilesystems take a decade to mature. Don’t expect this will be easier.