Big Data EcoSystem @ LinkedIn
October 20, 2012
LinkedIn Confidential ©2013 All Rights Reserved
Sunil Shirguppi
Head of Data Services- International
LinkedIn Corporation
http://www.linkedin.com/in/sunilshirguppi
Outline
LinkedIn Overview
Data Science
Big Data Eco-System
Learnings
Our Mission
Connect the world’s professionals
to make them more productive and successful
We are the professional profile of record
Googled yourself lately?
Don’t feel bad, we all do it.
Executives from all Companies are LinkedIn members
The LinkedIn Opportunity
Fundamentally transforming the way the world works
Connect talent with opportunity at massive scale
The World’s Largest Professional Network
LinkedIn Members (Millions)
2004: 2
2005: 4
2006: 8
2007: 17
2008: 32
2009: 55
2010: 90
• 175M+* members
• 82% of Fortune 100 Companies use LinkedIn to hire
• >2M** Company Pages
• ~2/sec New Members joining
• ~4.2B Professional searches in 2011
*as of Nov 4, 2011
**as of June 30, 2011
Multiple revenue channels
• Premium Subscriptions
• Self Serve Ads
• Hiring Solutions
• Marketing Solutions
Let’s talk Data…
Business is recognizing the importance of analytics
What makes a Data Scientist?
Data Scientist = Curiosity + Intuition + Data gathering + Standardization + Statistics + Modeling + Visualization + Communication
Big Data at LinkedIn
* Chart from Philip Russom, Research Director, TDWI
What do we do with Data?
• Data Standardization
• Build innovative data products to help professionals
• Draw insights
• Drive the business
Before we can do that...
There are a few challenges that we have to overcome
• Scale
• Standardization
• Infrastructure
A Few Data-Driven Products
• Pandora Search for People
• Events You May Be Interested In
• Groups browse maps
How do we do it?
Big Data Ecosystem @ LinkedIn
LinkedIn Sample Data Stack
Crowdsourcing
Big Data at LinkedIn
[Diagram: Users, Application, Online Data Store, Near-Line Data Store, Offline Data Store, Web Logs]
High-level data environment
Challenges so complex that off-the-shelf products or a handful of technologies can’t address them
Built our own combination of toolsets and technologies to meet specific requirements
LinkedIn Data Stack – Online
Systems Capabilities
• Rich structures (e.g. indexes)
• Change capture capability
LinkedIn Data Stack – Nearline
Systems Capabilities
• Key-value access (Voldemort)
• Search platform (Zoie, Bobo, Sensei)
• Distributed graph engine (D-Graph)
LinkedIn Data Stack – Pipeline
Systems Capabilities
• Messaging for site events,
monitoring
• Change data capture streams
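As a rough illustration of the change data capture idea in the bullets above, the sketch below keeps an ordered log of row changes and lets a downstream store replay them from the last sequence number it has seen. The names `ChangeEvent`, `ChangeLog` and `ReplicaStore` are hypothetical and are not taken from LinkedIn's systems; this is a minimal in-memory sketch under those assumptions, not a description of the actual pipeline.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    seq: int              # monotonically increasing sequence number
    key: str              # primary key of the changed row
    value: Optional[Any]  # new value, or None for a delete


@dataclass
class ChangeLog:
    """Hypothetical append-only log of changes captured from a source store."""
    events: List[ChangeEvent] = field(default_factory=list)

    def append(self, key: str, value: Optional[Any]) -> ChangeEvent:
        event = ChangeEvent(seq=len(self.events) + 1, key=key, value=value)
        self.events.append(event)
        return event

    def read_since(self, last_seq: int) -> List[ChangeEvent]:
        # Sequence numbers start at 1, so events after last_seq begin at index last_seq.
        return self.events[last_seq:]


@dataclass
class ReplicaStore:
    """Hypothetical downstream store that catches up by replaying the log."""
    data: Dict[str, Any] = field(default_factory=dict)
    last_seq: int = 0

    def catch_up(self, log: ChangeLog) -> int:
        applied = 0
        for event in log.read_since(self.last_seq):
            if event.value is None:
                self.data.pop(event.key, None)   # delete
            else:
                self.data[event.key] = event.value  # insert or update
            self.last_seq = event.seq
            applied += 1
        return applied


log = ChangeLog()
log.append("member:1", {"title": "Engineer"})
log.append("member:2", {"title": "Analyst"})

replica = ReplicaStore()
replica.catch_up(log)

log.append("member:1", {"title": "Senior Engineer"})
log.append("member:2", None)  # delete
replica.catch_up(log)
```

Because the replica tracks only its last applied sequence number, it can fall behind and later resume without the source store knowing about it, which is the property that makes a change stream useful for feeding near-line and offline stores.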
LinkedIn Data Stack – Offline
Systems Capabilities
• Machine learning, ranking,
relevance
• Warehouse and analytics
LinkedIn with Hadoop, Aster, and Teradata
Integrated Data Warehouse
• Exec Dashboards
• Adhoc/OLAP
• Complex SQL
• SQL
Data transformation & batch processing
• Image processing
• Search indexes
• Graph (PYMK)
• MapReduce
Analytic Platform for data discovery
• nPath Pattern/Path
• Clickstream analysis
• A/B site testing
• Data Sciences discovery
• SQL-MapReduce
Aster/Teradata Bi-Directional Connector
Aster/Teradata Hadoop Connectors
Batch data transformations for engineering groups using HDFS + MapReduce
Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce
Integration with structured data, operational intelligence, scalable distribution of analytics
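The slide lists "Graph (PYMK)" next to MapReduce under batch processing but does not describe the computation. One common way to frame People You May Know is to count shared connections between members who are not yet connected. The sketch below expresses that as a map step and a reduce step in plain Python. The function names and the tiny sample graph are illustrative assumptions, not LinkedIn's actual algorithm or data.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Pair = Tuple[str, str]


def map_common_neighbors(member: str, connections: Set[str]) -> Iterator[Tuple[Pair, str]]:
    # Every pair of this member's connections shares `member` as a common connection.
    for a, b in combinations(sorted(connections), 2):
        yield (a, b), member


def reduce_counts(pairs: Iterable[Tuple[Pair, str]], graph: Dict[str, Set[str]]) -> Dict[Pair, int]:
    # Group by pair (the shuffle step), then count common connections per pair.
    grouped: Dict[Pair, List[str]] = defaultdict(list)
    for pair, via in pairs:
        grouped[pair].append(via)
    # Keep only pairs that are not already connected.
    return {
        pair: len(vias)
        for pair, vias in grouped.items()
        if pair[1] not in graph[pair[0]]
    }


def suggest(graph: Dict[str, Set[str]]) -> List[Tuple[Pair, int]]:
    mapped = (
        kv
        for member, conns in graph.items()
        for kv in map_common_neighbors(member, conns)
    )
    counts = reduce_counts(mapped, graph)
    # Rank candidate pairs by number of shared connections, ties broken by name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Tiny undirected sample graph (made-up members).
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
}

suggestions = suggest(graph)
```

In this sample, "ann" and "dan" are the only unconnected pair, and they share two connections, so they are the single suggestion. On a real cluster the map and reduce functions would run as distributed tasks over partitions of the graph instead of one in-process generator.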
Big Data Ecosystem @ LinkedIn
It’s a global economy
Country connectedness on LinkedIn
Data deep dives
Job migration after financial collapse
How Often do people change jobs?
Visualization is important
If your name is Chip, you are likely in sales!
Industry Growth
Buzzwords
What next?
• Self service analytics
• Metadata framework
• Integrate reporting solutions
• Go Mobile!
• Scalability and Data Quality
Challenges
• Data volumes and availability
– Billion+ rows every day
– Users in Global locations need data
• Multiple platforms
– Agile development
– Data Integration
• Data Quality
– User input data
– Data standardization
Key Learnings
• Self Service
– Making data accessible to key stakeholders in a timely manner creates tremendous value.
– Viz is more important than we think
• Measuring your future investments
– Performance is not the only measure
– Company fundamentals matter
• As a Data team, be in control of your destiny
– Identify what to measure and lead by metrics
– Become the Think-tank
Web 3.0 – It’s all about data!!
ULTIMATELY…
It is all about the people!
Thank You!

More Related Content

What's hot

Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...
Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...
Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...Databricks
 
Dell hans timmerman v1.1
Dell hans timmerman v1.1Dell hans timmerman v1.1
Dell hans timmerman v1.1BigDataExpo
 
GraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business GraphGraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business GraphNeo4j
 
Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...
Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...
Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...Data Con LA
 
Unlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location IntelligenceUnlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location IntelligencePrecisely
 
Share point saturday access services 2015 final 2
Share point saturday access services 2015 final 2Share point saturday access services 2015 final 2
Share point saturday access services 2015 final 2InnoTech
 
Opportunities in Big Data by Arihant Patni
Opportunities in Big Data by Arihant PatniOpportunities in Big Data by Arihant Patni
Opportunities in Big Data by Arihant PatniThe Hive
 
Cisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyCisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyArthur_Hansen
 
Top 10 Big Data Technologies | Edureka
Top 10 Big Data Technologies | EdurekaTop 10 Big Data Technologies | Edureka
Top 10 Big Data Technologies | EdurekaEdureka!
 
Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...
Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...
Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...Data Con LA
 
Building a Collaborative Data Architecture
Building a Collaborative Data ArchitectureBuilding a Collaborative Data Architecture
Building a Collaborative Data ArchitectureDATAVERSITY
 
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...DATAVERSITY
 
The Connected Data Imperative: An Introduction to Neo4j
The Connected Data Imperative: An Introduction to Neo4jThe Connected Data Imperative: An Introduction to Neo4j
The Connected Data Imperative: An Introduction to Neo4jNeo4j
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldDATAVERSITY
 
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...VMware Tanzu
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Knowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your KnowledgeKnowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your KnowledgeNeo4j
 
How Data is Driving AI Innovation
How Data is Driving AI InnovationHow Data is Driving AI Innovation
How Data is Driving AI InnovationMatt Turner
 
Strata 2015 - Architecting For The Cloud
Strata 2015 - Architecting For The CloudStrata 2015 - Architecting For The Cloud
Strata 2015 - Architecting For The CloudDataHero
 

What's hot (20)

Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...
Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...
Revolutionizing the Legal Industry with Spark, NLP and Azure Databricks at Cl...
 
Dell hans timmerman v1.1
Dell hans timmerman v1.1Dell hans timmerman v1.1
Dell hans timmerman v1.1
 
GraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business GraphGraphTalks Rome - The Italian Business Graph
GraphTalks Rome - The Italian Business Graph
 
Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...
Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...
Big Data on The Rise: Views of Emerging Trends from real life end-users by Ro...
 
Unlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location IntelligenceUnlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location Intelligence
 
Share point saturday access services 2015 final 2
Share point saturday access services 2015 final 2Share point saturday access services 2015 final 2
Share point saturday access services 2015 final 2
 
Opportunities in Big Data by Arihant Patni
Opportunities in Big Data by Arihant PatniOpportunities in Big Data by Arihant Patni
Opportunities in Big Data by Arihant Patni
 
Cisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyCisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt only
 
Top 10 Big Data Technologies | Edureka
Top 10 Big Data Technologies | EdurekaTop 10 Big Data Technologies | Edureka
Top 10 Big Data Technologies | Edureka
 
Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...
Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...
Logitech Accelerates Cloud Analytics Using Data Virtualization by Avinash Des...
 
Building a Collaborative Data Architecture
Building a Collaborative Data ArchitectureBuilding a Collaborative Data Architecture
Building a Collaborative Data Architecture
 
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...
Slides: Accelerate and Assure the Adoption of Cloud Data Platforms Using Inte...
 
The Connected Data Imperative: An Introduction to Neo4j
The Connected Data Imperative: An Introduction to Neo4jThe Connected Data Imperative: An Introduction to Neo4j
The Connected Data Imperative: An Introduction to Neo4j
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
 
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...
Driving Datascience at scale using Postgresql, Greenplum and Dataiku - Greenp...
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Rocking the World of Big Data at Centrica
Rocking the World of Big Data at CentricaRocking the World of Big Data at Centrica
Rocking the World of Big Data at Centrica
 
Knowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your KnowledgeKnowledge Architecture: Graphing Your Knowledge
Knowledge Architecture: Graphing Your Knowledge
 
How Data is Driving AI Innovation
How Data is Driving AI InnovationHow Data is Driving AI Innovation
How Data is Driving AI Innovation
 
Strata 2015 - Architecting For The Cloud
Strata 2015 - Architecting For The CloudStrata 2015 - Architecting For The Cloud
Strata 2015 - Architecting For The Cloud
 

Similar to Big Data Ecosystem @ LinkedIn

Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bhaskar Ghosh
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
How Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data ProcessesHow Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data ProcessesCA | Automic Software
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldDataWorks Summit/Hadoop Summit
 
Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)
Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)
Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)Denodo
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Denodo
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsDenodo
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with MicrosoftCaserta
 
Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsDATAVERSITY
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneySai Paravastu
 
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...Usama Fayyad
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Matt Stubbs
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)Denodo
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnectaDigital
 
Cloud Con 2015 - Integration & Web APIs
Cloud Con 2015 - Integration & Web APIsCloud Con 2015 - Integration & Web APIs
Cloud Con 2015 - Integration & Web APIsSnapLogic
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Denodo
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationDenodo
 
Driving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data AssetsDriving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data AssetsEmbarcadero Technologies
 

Similar to Big Data Ecosystem @ LinkedIn (20)

Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
Bg linkedin bigdata_martinschultz_symposium_yale_oct2012
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
How Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data ProcessesHow Linkedin uses Automic for Big Data Processes
How Linkedin uses Automic for Big Data Processes
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 
Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)
Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)
Big Data with Data Virtualization (session 3 from Packed Lunch Webinar Series)
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Self-Service Analytics with Guard Rails
Self-Service Analytics with Guard RailsSelf-Service Analytics with Guard Rails
Self-Service Analytics with Guard Rails
 
Big Data Analytics with Microsoft
Big Data Analytics with MicrosoftBig Data Analytics with Microsoft
Big Data Analytics with Microsoft
 
Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced Analytics
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
 
BAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, SydneyBAR360 open data platform presentation at DAMA, Sydney
BAR360 open data platform presentation at DAMA, Sydney
 
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
Keynote talk at Financial Times Forum - BigData and Advanced Analytics at SIB...
 
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAI...
 
Group 1 LinkedIn
Group 1 LinkedInGroup 1 LinkedIn
Group 1 LinkedIn
 
A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)A Key to Real-time Insights in a Post-COVID World (ASEAN)
A Key to Real-time Insights in a Post-COVID World (ASEAN)
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Cloud Con 2015 - Integration & Web APIs
Cloud Con 2015 - Integration & Web APIsCloud Con 2015 - Integration & Web APIs
Cloud Con 2015 - Integration & Web APIs
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
 
Modernizing Integration with Data Virtualization
Modernizing Integration with Data VirtualizationModernizing Integration with Data Virtualization
Modernizing Integration with Data Virtualization
 
Driving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data AssetsDriving Business Value Through Agile Data Assets
Driving Business Value Through Agile Data Assets
 

Recently uploaded

9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 

Recently uploaded (20)

9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
Big Data Ecosystem @ LinkedIn

  • 1. Big Data EcoSystem @ LinkedIn October 20, 2012 LinkedIn Confidential ©2013 All Rights Reserved
  • 2. Sunil Shirguppi, Head of Data Services - International, LinkedIn Corporation http://www.linkedin.com/in/sunilshirguppi
  • 3. Outline LinkedIn Overview Data Science Big Data Eco-System Learnings LinkedIn Confidential ©2013 All Rights Reserved 3
  • 4. Our Mission: Connect the world’s professionals to make them more productive and successful
  • 5. We are the professional profile of record. Googled yourself lately? Don’t feel bad, we all do it.
  • 6. Executives from all Companies are LinkedIn members
  • 7. The LinkedIn Opportunity: Fundamentally transforming the way the world works. Connect talent with opportunity at massive scale.
  • 8. The World’s Largest Professional Network. LinkedIn Members (Millions): 2004: 2 • 2005: 4 • 2006: 8 • 2007: 17 • 2008: 32 • 2009: 55 • 2010: 90 • 175M+*. 82% of Fortune 100 companies use LinkedIn to hire. Company Pages: >2M**. New members joining: ~2/sec. Professional searches in 2011: ~4.2B. (*as of Nov 4, 2011; **as of June 30, 2011)
  • 9. Multiple revenue channels • Premium Subscriptions • Self Serve Ads • Hiring Solutions • Marketing Solutions
  • 11. Business is recognizing the importance of analytics
  • 12. What makes a Data Scientist? Data Scientist = Curiosity + Intuition + Data gathering + Standardization + Statistics + Modeling + Visualization + Communication
  • 13. Big Data at LinkedIn (* Chart from Philip Russom, Research Director, TDWI)
  • 14. What do we do with Data? • Data Standardization • Build innovative data products to help professionals • Draw insights • Drive the business. Before we can do that, there are a few challenges that we have to overcome: • Scale • Standardization • Infrastructure
  • 15. A Few Data-Driven Products: Pandora Search for People Events You May Be Interested In Groups browse maps
  • 16. How do we do it?
  • 18. LinkedIn Sample Data Stack Crowdsourcing
  • 19. Big Data at LinkedIn: High-level data environment (diagram: Users, Application, Online Data Store, Near-Line Data Store, Offline Data Store, Web Logs). The challenges are too complex for off-the-shelf products or a handful of technologies to address, so we built our own combination of toolsets and technologies to meet specific requirements.
  • 20. LinkedIn Data Stack – Online. Capabilities: • Rich structures (e.g. indexes) • Change capture capability
  • 21. LinkedIn Data Stack – Nearline. Systems and capabilities: • Voldemort: key-value access • Zoie, Bobo, Sensei: search platform • D-Graph: distributed graph engine
  • 22. LinkedIn Data Stack – Pipeline. Capabilities: • Messaging for site events, monitoring • Change data capture streams
  • 23. LinkedIn Data Stack – Offline. Capabilities: • Machine learning, ranking, relevance • Warehouse and analytics
  • 24. LinkedIn with Hadoop, Aster, and Teradata. Integrated Data Warehouse: • Exec Dashboards • Adhoc/OLAP • Complex SQL • SQL. Data transformation & batch processing: • Image processing • Search indexes • Graph (PYMK) • MapReduce. Analytic Platform for data discovery: • nPath Pattern/Path • Clickstream analysis • A/B site testing • Data Sciences discovery • SQL-MapReduce. Connectors: Aster/Teradata Bi-Directional Connector; Aster/Teradata Hadoop Connectors. Captions: Batch data transformations for engineering groups using HDFS + MapReduce. Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce. Integration with structured data, operational intelligence, scalable distribution of analytics.
  • 26. It’s a global economy: Country connectedness on LinkedIn
  • 27. Data deep dives: Job migration after financial collapse
  • 28. How often do people change jobs?
  • 30. If your name is Chip, you are likely in sales!
  • 33. What next? • Self service analytics • Metadata framework • Integrate reporting solutions • Go Mobile! • Scalability and Data Quality
  • 34. Challenges • Data volumes and availability – Billion+ rows every day – Users in global locations need data • Multiple platforms – Agile development – Data Integration • Data Quality – User input data – Data standardization
  • 35. Key Learnings • Self Service – Making data accessible to key stakeholders in a timely manner creates tremendous value. – Viz is more important than we think • Measuring your future investments – Performance is not the only measure – Company fundamentals matter • As a Data team, be in control of your destiny – Identify what to measure and lead by metrics – Become the Think-tank
  • 36. Web 3.0 – It’s all about data!!
  • 38. It is all about the people!
  • 39. Thank You!