SlideShare a Scribd company logo
1 of 73
Social Network Analysis at
         Linkedin
                 Mitul Tiwari
     Search, Network, and Analytics (SNA)
                  LinkedIn
                      1
Who am I?




    2
Outline


About Me

About LinkedIn

Analytics products at LinkedIn

Fun Facts: Sequencing the Startup DNA


                    3
LinkedIn by the numbers

120,000,000+ users (August 4, 2011)

2+ new user registrations per second

81+ Million monthly unique users* (Comscore)

2 Billion People Searches in 2010

7.1 Billion page views in Q2 2010

2+ Million companies with LinkedIn Company Pages

2+ M LinkedIn Groups

                            4
Broad Range of Products




           5
User Profile




     6
Connections




     7
Communications

Communications




                        10


                 8
Hiring Solutions




       9
Job Search




    10
Company Pages




      11
Outline


About Me

About LinkedIn

Analytics products at LinkedIn

Fun Facts: Sequencing the Startup DNA


                    12
What do I mean by Analytics Products?




                 13
People You May Know




         14
Profile Stats: WVMP




        15
Viewers of this profile also ...




               16
Skills




  17
InMaps




  18
Related Searches


Millions of Searches everyday

Help users to explore and refine their queries




                     19
Analytics Products: Key Ideas

Recommendations
 People You May Know, Related Searches, Viewers of
 this profile ...

Insight
 Profile Stats: Who Viewed My Profile, Skills

Visualization
 InMaps
                      20
Analytics Products: Challenges


 LinkedIn: largest professional network

 120+ million members on LinkedIn

 Billions of pageviews

 Terabytes of data to process


                      21
Outline


Analytics products at LinkedIn

Deep Dive - Connection Strength

Deep Dive - Related Searches

Fun Facts: Sequencing the Startup DNA


                    22
Connection Strength


Let’s build “Connection Strength”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                     23
Connection Strength


How well do people know each other?

Connection Strength

Application: reorder updates to show updates
from close connections



                      24
Connection Strength
  How well do
                          Alice
people know each
     other?


               Bob                Carol




                     25
Connection Strength
  How well do
                          Alice
people know each
     other?


               Bob                Carol




                     26
Connection Strength
  How well do
                               Alice
people know each
     other?


               Bob                     Carol



                   Triangle closing


                          27
Connection Strength
  How well do
                              Alice
people know each
     other?


               Bob                    Carol



                 Triangle closing
Prob(Bob knows Carol) ~ the # of common connections

                         28
Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD `connections` USING PigStorage();
group_conn = GROUP connections BY source_id;
pairs = FOREACH group_conn GENERATE
        generatePair(connections.dest_id) as (id1, id2);

common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
              flatten(group) as (source_id, dest_id),
              COUNT(pairs) as common_connections;
STORE common_conn INTO `common_conn` USING PigStorage();


                                      29
Pig Overview
Load: load data, specify format

Store: store data, specify format

Foreach, Generate: Projections, similar to select

Group by: group by column(s)

Join, Filter, Limit, Order, ...

User Defined Functions (UDFs)
                        30
Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD `connections` USING PigStorage();
group_conn = GROUP connections BY source_id;
pairs = FOREACH group_conn GENERATE
        generatePair(connections.dest_id) as (id1, id2);

common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
              flatten(group) as (source_id, dest_id),
              COUNT(pairs) as common_connections;
STORE common_conn INTO `common_conn` USING PigStorage();


                                      31
Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD `connections` USING PigStorage();
group_conn = GROUP connections BY source_id;
pairs = FOREACH group_conn GENERATE
        generatePair(connections.dest_id) as (id1, id2);

common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
              flatten(group) as (source_id, dest_id),
              COUNT(pairs) as common_connections;
STORE common_conn INTO `common_conn` USING PigStorage();


                                      32
Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD `connections` USING PigStorage();
group_conn = GROUP connections BY source_id;
pairs = FOREACH group_conn GENERATE
        generatePair(connections.dest_id) as (id1, id2);

common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
              flatten(group) as (source_id, dest_id),
              COUNT(pairs) as common_connections;
STORE common_conn INTO `common_conn` USING PigStorage();


                                      33
Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD `connections` USING PigStorage();
group_conn = GROUP connections BY source_id;
pairs = FOREACH group_conn GENERATE
        generatePair(connections.dest_id) as (id1, id2);

common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
              flatten(group) as (source_id, dest_id),
              COUNT(pairs) as common_connections;
STORE common_conn INTO `common_conn` USING PigStorage();


                                      34
Triangle Closing in Pig
-- connections in (source_id, dest_id) format in both directions
connections = LOAD `connections` USING PigStorage();
group_conn = GROUP connections BY source_id;
pairs = FOREACH group_conn GENERATE
        generatePair(connections.dest_id) as (id1, id2);

common_conn = GROUP pairs BY (id1, id2);
common_conn = FOREACH common_conn GENERATE
              flatten(group) as (source_id, dest_id),
              COUNT(pairs) as common_connections;
STORE common_conn INTO `common_conn` USING PigStorage();


                                      35
Triangle Closing Example
                                   Alice




                  Bob                       Carol

                               connections = LOAD `connections` USING
1.(A,B),(B,A),(A,C),(C,A)      PigStorage();
2.(A,{B,C}),(B,{A}),(C,{A})
3.(A,{B,C}),(A,{C,B})
4.(B,C,1), (C,B,1)
                              36
Triangle Closing Example
                                    Alice




                  Bob                         Carol


1.(A,B),(B,A),(A,C),(C,A)
                              group_conn = GROUP connections BY
2.(A,{B,C}),(B,{A}),(C,{A})   source_id;
3.(A,{B,C}),(A,{C,B})
4.(B,C,1), (C,B,1)
                               37
Triangle Closing Example
                                     Alice




                  Bob                             Carol


1.(A,B),(B,A),(A,C),(C,A)
2.(A,{B,C}),(B,{A}),(C,{A})
                              pairs = FOREACH group_conn GENERATE
3.(A,{B,C}),(A,{C,B})         generatePair(connections.dest_id) as (id1, id2);
4.(B,C,1), (C,B,1)
                                38
Triangle Closing Example
                                     Alice




                  Bob                           Carol


1.(A,B),(B,A),(A,C),(C,A)
2.(A,{B,C}),(B,{A}),(C,{A})   common_conn = GROUP pairs BY (id1, id2);
                              common_conn = FOREACH common_conn
3.(A,{B,C}),(A,{C,B})         GENERATE flatten(group) as (source_id, dest_id),
4.(B,C,1), (C,B,1)            COUNT(pairs) as common_connections;
                                39
Connection Strength



Age

Company Overlap

School Overlap



                  40
Related Searches


Let’s build “Related Searches”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                      41
Systems and Tools


Kafka (LinkedIn)

Hadoop (Apache)

Azkaban (LinkedIn)

Voldemort (LinkedIn)


                     42
Data Cycle




    43
Systems and Tools

Kafka
 publish-subscribe messaging system

 transfer data from production to HDFS

Hadoop

Azkaban

Voldemort

                      44
Systems and Tools

Kafka

Hadoop
 Java MapReduce and Pig

 process data

Azkaban

Voldemort

                    45
Systems and Tools

Kafka

Hadoop

Azkaban
 Hadoop workflow management tool

 to manage hundreds of Hadoop jobs

Voldemort

                     46
Systems and Tools

Kafka

Hadoop

Azkaban

Voldemort
 Key-value store

 store output of Hadoop jobs and serve in production

                      47
Related Searches


Let’s build “Related Searches”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                      48
Hadoop MapReduce

Large-scale distributed data processing

Map phase: partition the problem in smaller
steps

Reduce phase: aggregate the output of smaller
steps

Allows parallelism and fault-tolerance

                     49
Related Searches


Let’s build “Related Searches”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                      50
Our Workflow

         Triangle
Age                   School Overlap
         Closing



         Union




       push-to-prod



                 51
Our Workflow

                Triangle
  Age                        School Overlap
                Closing



                Union




push-to-qa    push-to-prod



                        52
Sample Workflow




      53
Azkaban Workflow Management

  Dependency management
  Regular Scheduling
  Monitoring
  Diverse jobs: Java, Pig, Clojure
  Configuration/Parameters
  Resource control/locking
  Restart/Stop/Retry
  Visualization
  History
  Logs
                           54
Our Workflow

         Triangle
Age                   School Overlap
         Closing



         Union




       push-to-prod



                 55
Our Workflow

         Triangle
Age                   School Overlap
         Closing



         Union




       push-to-prod



                 56
Related Searches


Let’s build “Related Searches”

Systems and Tools we use

Hadoop MapReduce

Managing workflow

Serving data in production

                      57
Voldemort

Large amount of data/Scalable

Quick lookup/low latency

Versioning and Rollback

Fault tolerance through replication

Read only

Offline index building

                        58
Voldemort RO Store




        59
Outline


Analytics products at LinkedIn

Deep Dive - Connection Strength

Deep Dive - Related Searches

Fun Facts: Sequencing the Startup DNA


                    60
Related Searches


Millions of Searches everyday

Help users to explore and refine their queries




                     61
How to build Related Searches?

 Collaborative Filtering: searches done in the
 same session



        Q1       Q2        Q3    Q4

             Time



                      62
How to build Related Searches?

 Searches correlated by results clicks


         Q1                 R1




         Qn                 Rm


                       63
How to build Related Searches?

 Searches with overlapping terms



 Q1     Jeff LinkedIn


 Q2         LinkedIn CEO




                        64
Outline


Analytics products at LinkedIn

Deep Dive - Connection Strength

Deep Dive - Related Searches

Fun Facts: Sequencing the Startup DNA


                    65
Sequencing the Startup DNA




            Done by Monica Rogati at
                   LinkedIn


            66
Sequencing the Startup DNA

•Top majors: Entrepreneurship, Computer Engineering
•Bottom: Nursing, Administration




                         67
Sequencing the Startup DNA

•Most founders age at first startup between 20 and 40




                          68
Sequencing the Startup DNA

•Top regions: San Francisco, NYC




                         69
Sequencing the Startup DNA


•Top Business Schools:
Stanford, Harvard, MIT Sloan




                         70
Things Covered


Analytics products at LinkedIn

Deep Dive - Connection Strength

Deep Dive - Related Searches

Fun Facts: Sequencing the Startup DNA


                    71
SNA Team



Thanks to SNA Team at LinkedIn

http://sna-projects.com

We are hiring!



                    72
Questions?




    73

More Related Content

What's hot

Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4jNeo4j
 
Regular expressions in oracle
Regular expressions in oracleRegular expressions in oracle
Regular expressions in oracleLogan Palanisamy
 
Hybrid use of machine learning and ontology
Hybrid use of machine learning and ontologyHybrid use of machine learning and ontology
Hybrid use of machine learning and ontologyAnthony (Tony) Sarris
 
Social Network Analysis: Applications & Challenges
Social Network Analysis: Applications & ChallengesSocial Network Analysis: Applications & Challenges
Social Network Analysis: Applications & ChallengesIIIT Hyderabad
 
Sql task answers
Sql task answersSql task answers
Sql task answersNawaz Sk
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4jNeo4j
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Xiaohan Zeng
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Neo4j
 
Microsoft Power BI for Office 365 Pricing and Licensing
Microsoft Power BI for Office 365Pricing and LicensingMicrosoft Power BI for Office 365Pricing and Licensing
Microsoft Power BI for Office 365 Pricing and Licensing InnoTech
 
Step by Step Beginner's Guide on Workday Integration
Step by Step Beginner's Guide on Workday IntegrationStep by Step Beginner's Guide on Workday Integration
Step by Step Beginner's Guide on Workday IntegrationERP Cloud Training
 
Community Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewCommunity Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewSatyaki Sikdar
 
SAP BO and Teradata best practices
SAP BO and Teradata best practicesSAP BO and Teradata best practices
SAP BO and Teradata best practicesDmitry Anoshin
 
PowerPivot and PowerQuery
PowerPivot and PowerQueryPowerPivot and PowerQuery
PowerPivot and PowerQueryin4400
 
Power BI Architecture
Power BI ArchitecturePower BI Architecture
Power BI ArchitectureArthur Graus
 
Big Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social MediaBig Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social MediaR A Akerkar
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...jexp
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network AnalysisPatti Anklam
 
DATABASE PROJECT
DATABASE PROJECTDATABASE PROJECT
DATABASE PROJECTabdul basit
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4jNeo4j
 
Database Design & Implementation Report
Database Design & Implementation ReportDatabase Design & Implementation Report
Database Design & Implementation ReportIan Morris
 

What's hot (20)

Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
Regular expressions in oracle
Regular expressions in oracleRegular expressions in oracle
Regular expressions in oracle
 
Hybrid use of machine learning and ontology
Hybrid use of machine learning and ontologyHybrid use of machine learning and ontology
Hybrid use of machine learning and ontology
 
Social Network Analysis: Applications & Challenges
Social Network Analysis: Applications & ChallengesSocial Network Analysis: Applications & Challenges
Social Network Analysis: Applications & Challenges
 
Sql task answers
Sql task answersSql task answers
Sql task answers
 
Intro to Neo4j
Intro to Neo4jIntro to Neo4j
Intro to Neo4j
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
 
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
Knowledge Graphs for Transformation: Dynamic Context for the Intelligent Ente...
 
Microsoft Power BI for Office 365 Pricing and Licensing
Microsoft Power BI for Office 365Pricing and LicensingMicrosoft Power BI for Office 365Pricing and Licensing
Microsoft Power BI for Office 365 Pricing and Licensing
 
Step by Step Beginner's Guide on Workday Integration
Step by Step Beginner's Guide on Workday IntegrationStep by Step Beginner's Guide on Workday Integration
Step by Step Beginner's Guide on Workday Integration
 
Community Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief OverviewCommunity Detection in Social Networks: A Brief Overview
Community Detection in Social Networks: A Brief Overview
 
SAP BO and Teradata best practices
SAP BO and Teradata best practicesSAP BO and Teradata best practices
SAP BO and Teradata best practices
 
PowerPivot and PowerQuery
PowerPivot and PowerQueryPowerPivot and PowerQuery
PowerPivot and PowerQuery
 
Power BI Architecture
Power BI ArchitecturePower BI Architecture
Power BI Architecture
 
Big Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social MediaBig Data and Harvesting Data from Social Media
Big Data and Harvesting Data from Social Media
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
 
DATABASE PROJECT
DATABASE PROJECTDATABASE PROJECT
DATABASE PROJECT
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Database Design & Implementation Report
Database Design & Implementation ReportDatabase Design & Implementation Report
Database Design & Implementation Report
 

Viewers also liked

Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8Manuela Aparicio
 
TAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphTAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphAdrian-Tudor Panescu
 
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the CloudKey-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the CloudUniversity of New South Wales
 
Visualizing My Facebook Networks
Visualizing My Facebook NetworksVisualizing My Facebook Networks
Visualizing My Facebook NetworksAndy Carvin
 
LinkedIn Graph Presentation
LinkedIn Graph PresentationLinkedIn Graph Presentation
LinkedIn Graph PresentationAmy W. Tang
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsNitish Upreti
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use CasesMax De Marzi
 

Viewers also liked (8)

Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8Smm8: Social Media Management_Gestao dos Social Media 8
Smm8: Social Media Management_Gestao dos Social Media 8
 
TAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social GraphTAO: Facebook's Distributed Data Store for the Social Graph
TAO: Facebook's Distributed Data Store for the Social Graph
 
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the CloudKey-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
Key-Key-Value Stores for Efficiently Processing Graph Data in the Cloud
 
Visualizing My Facebook Networks
Visualizing My Facebook NetworksVisualizing My Facebook Networks
Visualizing My Facebook Networks
 
LinkedIn Graph Presentation
LinkedIn Graph PresentationLinkedIn Graph Presentation
LinkedIn Graph Presentation
 
Facebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platformsFacebook's TAO & Unicorn data storage and search platforms
Facebook's TAO & Unicorn data storage and search platforms
 
Dex
DexDex
Dex
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 

Similar to Social Network Analysis at LinkedIn

Building Data Driven Products at Linkedin
Building Data Driven Products at LinkedinBuilding Data Driven Products at Linkedin
Building Data Driven Products at LinkedinMitul Tiwari
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api TrainingSpark Summit
 
Building a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDBBuilding a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDBMongoDB
 
What to do when one size does not fit all?!
What to do when one size does not fit all?!What to do when one size does not fit all?!
What to do when one size does not fit all?!Arjen de Vries
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19ngamou
 
Clustering
ClusteringClustering
Clusteringbutest
 
Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29Zia Babar
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL DatabasesReal-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL DatabasesEugene Dvorkin
 
Developer Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesDeveloper Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesMax De Marzi
 
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevImage Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevDatabricks
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases MongoDB
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 
UNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt pythonUNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt pythonSuganthiDPSGRKCW
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Spencer Fox
 
Druid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidDruid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidYousun Jeong
 
MongoDB Analytics
MongoDB AnalyticsMongoDB Analytics
MongoDB Analyticsdatablend
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...semanticsconference
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...MongoDB
 
RDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and BeyondRDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and BeyondFadi Maali
 

Similar to Social Network Analysis at LinkedIn (20)

Building Data Driven Products at Linkedin
Building Data Driven Products at LinkedinBuilding Data Driven Products at Linkedin
Building Data Driven Products at Linkedin
 
Visual Api Training
Visual Api TrainingVisual Api Training
Visual Api Training
 
Building a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDBBuilding a Cross Channel Content Delivery Platform with MongoDB
Building a Cross Channel Content Delivery Platform with MongoDB
 
What to do when one size does not fit all?!
What to do when one size does not fit all?!What to do when one size does not fit all?!
What to do when one size does not fit all?!
 
Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19Collective entity linking with WSRM DocEng'19
Collective entity linking with WSRM DocEng'19
 
Clustering
ClusteringClustering
Clustering
 
Bill howe 2_databases
Bill howe 2_databasesBill howe 2_databases
Bill howe 2_databases
 
Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29Waterloo Data Science and Data Engineering Meetup - 2018-08-29
Waterloo Data Science and Data Engineering Meetup - 2018-08-29
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL DatabasesReal-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases
 
Developer Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker NotesDeveloper Intro Deck-PowerPoint - Download for Speaker Notes
Developer Intro Deck-PowerPoint - Download for Speaker Notes
 
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey GusevImage Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
Image Similarity Detection at Scale Using LSH and Tensorflow with Andrey Gusev
 
Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases Real-Time Integration Between MongoDB and SQL Databases
Real-Time Integration Between MongoDB and SQL Databases
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
UNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt pythonUNIT V PYTHON.pptx python basics ppt python
UNIT V PYTHON.pptx python basics ppt python
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016
 
Druid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druidDruid meetup 4th_sql_on_druid
Druid meetup 4th_sql_on_druid
 
MongoDB Analytics
MongoDB AnalyticsMongoDB Analytics
MongoDB Analytics
 
Session 1.5 supporting virtual integration of linked data with just-in-time...
Session 1.5   supporting virtual integration of linked data with just-in-time...Session 1.5   supporting virtual integration of linked data with just-in-time...
Session 1.5 supporting virtual integration of linked data with just-in-time...
 
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
Big Data Analytics 3: Machine Learning to Engage the Customer, with Apache Sp...
 
RDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and BeyondRDF Analytics... SPARQL and Beyond
RDF Analytics... SPARQL and Beyond
 

More from Mitul Tiwari

Large scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedInLarge scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedInMitul Tiwari
 
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Mitul Tiwari
 
Modeling Impression discounting in large-scale recommender systems
Modeling Impression discounting in large-scale recommender systemsModeling Impression discounting in large-scale recommender systems
Modeling Impression discounting in large-scale recommender systemsMitul Tiwari
 
Large scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationLarge scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationMitul Tiwari
 
Metaphor: A system for related searches recommendations
Metaphor: A system for related searches recommendationsMetaphor: A system for related searches recommendations
Metaphor: A system for related searches recommendationsMitul Tiwari
 
Related searches at LinkedIn
Related searches at LinkedInRelated searches at LinkedIn
Related searches at LinkedInMitul Tiwari
 
Structural Diversity in Social Recommender Systems
Structural Diversity in Social Recommender SystemsStructural Diversity in Social Recommender Systems
Structural Diversity in Social Recommender SystemsMitul Tiwari
 
Organizational Overlap on Social Networks and its Applications
Organizational Overlap on Social Networks and its ApplicationsOrganizational Overlap on Social Networks and its Applications
Organizational Overlap on Social Networks and its ApplicationsMitul Tiwari
 
Large-scale Social Recommendation Systems: Challenges and Opportunity
Large-scale Social Recommendation Systems: Challenges and OpportunityLarge-scale Social Recommendation Systems: Challenges and Opportunity
Large-scale Social Recommendation Systems: Challenges and OpportunityMitul Tiwari
 

More from Mitul Tiwari (9)

Large scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedInLarge scale social recommender systems at LinkedIn
Large scale social recommender systems at LinkedIn
 
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering...
 
Modeling Impression discounting in large-scale recommender systems
Modeling Impression discounting in large-scale recommender systemsModeling Impression discounting in large-scale recommender systems
Modeling Impression discounting in large-scale recommender systems
 
Large scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluationLarge scale social recommender systems and their evaluation
Large scale social recommender systems and their evaluation
 
Metaphor: A system for related searches recommendations
Metaphor: A system for related searches recommendationsMetaphor: A system for related searches recommendations
Metaphor: A system for related searches recommendations
 
Related searches at LinkedIn
Related searches at LinkedInRelated searches at LinkedIn
Related searches at LinkedIn
 
Structural Diversity in Social Recommender Systems
Structural Diversity in Social Recommender SystemsStructural Diversity in Social Recommender Systems
Structural Diversity in Social Recommender Systems
 
Organizational Overlap on Social Networks and its Applications
Organizational Overlap on Social Networks and its ApplicationsOrganizational Overlap on Social Networks and its Applications
Organizational Overlap on Social Networks and its Applications
 
Large-scale Social Recommendation Systems: Challenges and Opportunity
Large-scale Social Recommendation Systems: Challenges and OpportunityLarge-scale Social Recommendation Systems: Challenges and Opportunity
Large-scale Social Recommendation Systems: Challenges and Opportunity
 

Recently uploaded

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Social Network Analysis at LinkedIn

  • 1. Social Network Analysis at Linkedin Mitul Tiwari Search, Network, and Analytics (SNA) LinkedIn 1
  • 3. Outline About Me About LinkedIn Analytics products at LinkedIn Fun Facts: Sequencing the Startup DNA 3
  • 4. LinkedIn by the numbers 120,000,000+ users (August 4, 2011) 2+ new user registrations per second 81+ Million monthly unique users* (Comscore) 2 Billion People Searches in 2010 7.1 Billion page views in Q2 2010 2+ Million companies with LinkedIn Company Pages 2+ M LinkedIn Groups 4
  • 5. Broad Range of Products 5
  • 12. Outline About Me About LinkedIn Analytics products at LinkedIn Fun Facts: Sequencing the Startup DNA 12
  • 13. What do I mean by Analytics Products? 13
  • 14. People You May Know 14
  • 16. Viewers of this profile also ... 16
  • 19. Related Searches Millions of Searches everyday Help users to explore and refine their queries 19
  • 20. Analytics Products: Key Ideas Recommendations People You May Know, Related Searches, Viewers of this profile ... Insight Profile Stats: Who Viewed My Profile, Skills Visualization InMaps 20
  • 21. Analytics Products: Challenges LinkedIn: largest professional network 120+ million members on LinkedIn Billions of pageviews Terabytes of data to process 21
  • 22. Outline Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 22
  • 23. Connection Strength Let’s build “Connection Strength” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 23
  • 24. Connection Strength How well do people know each other? Connection Strength Application: reorder updates to show updates from close connections 24
  • 25. Connection Strength How well do Alice people know each other? Bob Carol 25
  • 26. Connection Strength How well do Alice people know each other? Bob Carol 26
  • 27. Connection Strength How well do Alice people know each other? Bob Carol Triangle closing 27
  • 28. Connection Strength How well do Alice people know each other? Bob Carol Triangle closing Prob(Bob knows Carol) ~ the # of common connections 28
  • 29. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 29
  • 30. Pig Overview Load: load data, specify format Store: store data, specify format Foreach, Generate: Projections, similar to select Group by: group by column(s) Join, Filter, Limit, Order, ... User Defined Functions (UDFs) 30
  • 31. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 31
  • 32. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 32
  • 33. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 33
  • 34. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 34
  • 35. Triangle Closing in Pig -- connections in (source_id, dest_id) format in both directions connections = LOAD `connections` USING PigStorage(); group_conn = GROUP connections BY source_id; pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2); common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections; STORE common_conn INTO `common_conn` USING PigStorage(); 35
  • 36. Triangle Closing Example Alice Bob Carol connections = LOAD `connections` USING 1.(A,B),(B,A),(A,C),(C,A) PigStorage(); 2.(A,{B,C}),(B,{A}),(C,{A}) 3.(A,{B,C}),(A,{C,B}) 4.(B,C,1), (C,B,1) 36
  • 37. Triangle Closing Example Alice Bob Carol 1.(A,B),(B,A),(A,C),(C,A) group_conn = GROUP connections BY 2.(A,{B,C}),(B,{A}),(C,{A}) source_id; 3.(A,{B,C}),(A,{C,B}) 4.(B,C,1), (C,B,1) 37
  • 38. Triangle Closing Example Alice Bob Carol 1.(A,B),(B,A),(A,C),(C,A) 2.(A,{B,C}),(B,{A}),(C,{A}) pairs = FOREACH group_conn GENERATE 3.(A,{B,C}),(A,{C,B}) generatePair(connections.dest_id) as (id1, id2); 4.(B,C,1), (C,B,1) 38
  • 39. Triangle Closing Example Alice Bob Carol 1.(A,B),(B,A),(A,C),(C,A) 2.(A,{B,C}),(B,{A}),(C,{A}) common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn 3.(A,{B,C}),(A,{C,B}) GENERATE flatten(group) as (source_id, dest_id), 4.(B,C,1), (C,B,1) COUNT(pairs) as common_connections; 39
  • 41. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 41
  • 42. Systems and Tools Kafka (LinkedIn) Hadoop (Apache) Azkaban (LinkedIn) Voldemort (LinkedIn) 42
  • 44. Systems and Tools Kafka publish-subscribe messaging system transfer data from production to HDFS Hadoop Azkaban Voldemort 44
  • 45. Systems and Tools Kafka Hadoop Java MapReduce and Pig process data Azkaban Voldemort 45
  • 46. Systems and Tools Kafka Hadoop Azkaban Hadoop workflow management tool to manage hundreds of Hadoop jobs Voldemort 46
  • 47. Systems and Tools Kafka Hadoop Azkaban Voldemort Key-value store store output of Hadoop jobs and serve in production 47
  • 48. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 48
  • 49. Hadoop MapReduce Large-scale distributed data processing Map phase: partition the problem in smaller steps Reduce phase: aggregate the output of smaller steps Allows parallelism and fault-tolerance 49
  • 50. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 50
  • 51. Our Workflow Triangle Age School Overlap Closing Union push-to-prod 51
  • 52. Our Workflow Triangle Age School Overlap Closing Union push-to-qa push-to-prod 52
  • 54. Azkaban Workflow Management Dependency management Regular Scheduling Monitoring Diverse jobs: Java, Pig, Clojure Configuration/Parameters Resource control/locking Restart/Stop/Retry Visualization History Logs 54
  • 55. Our Workflow Triangle Age School Overlap Closing Union push-to-prod 55
  • 56. Our Workflow Triangle Age School Overlap Closing Union push-to-prod 56
  • 57. Related Searches Let’s build “Related Searches” Systems and Tools we use Hadoop MapReduce Managing workflow Serving data in production 57
  • 58. Voldemort Large amount of data/Scalable Quick lookup/low latency Versioning and Rollback Fault tolerance through replication Read only Offline index building 58
  • 60. Outline Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 60
  • 61. Related Searches Millions of Searches everyday Help users to explore and refine their queries 61
  • 62. How to build Related Searches? Collaborative Filtering: searches done in the same session Q1 Q2 Q3 Q4 Time 62
  • 63. How to build Related Searches? Searches correlated by results clicks Q1 R1 Qn Rm 63
  • 64. How to build Related Searches? Searches with overlapping terms Q1 Jeff LinkedIn Q2 LinkedIn CEO 64
  • 65. Outline Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 65
  • 66. Sequencing the Startup DNA Done by Monica Rogati at LinkedIn 66
  • 67. Sequencing the Startup DNA •Top majors: Entrepreneurship, Computer Engineering •Bottom: Nursing, Administration 67
  • 68. Sequencing the Startup DNA •Most founders age at first startup between 20 and 40 68
  • 69. Sequencing the Startup DNA •Top regions: San Francisco, NYC 69
  • 70. Sequencing the Startup DNA •Top Business Schools: Stanford, Harvard, MIT Sloan 70
  • 71. Things Covered Analytics products at LinkedIn Deep Dive - Connection Strength Deep Dive - Related Searches Fun Facts: Sequencing the Startup DNA 71
  • 72. SNA Team Thanks to SNA Team at LinkedIn http://sna-projects.com We are hiring! 72