SlideShare a Scribd company logo
1 of 35
Recommendations @ LinkedIn 1
Think Platform Leverage Hadoop 2
The world’s largest professional networkOver 50% of members are now international 135M+ 75% * Fortune 100 Companies use LinkedIn to hire ** >2M Company Pages ** ~2/sec New Members joining *as of Nov 4, 2011**as of June 30, 2011 3
4 Recommendations Opportunity
5
6
7
8
9
10
       The Recommendations Opportunity Pandora Search for People Groups browse maps Events You May Be Interested In 11
50% 12
13 Positions Education Summary Experience Skills
Are all titles the same? ,[object Object]
Technical Yahoo
Member Technical Staff
Software Development Engineer
SDE,[object Object]
ibm research
T J Watson Labs
International Bus. Machines,[object Object]
Recommendation Trade-offsThe need for a common platform Content Analysis Collaborative 17
Recommendation Trade-offsThe need for a common platform Precision  Recall 18
Specialty -> Specialty          Skills-> Skills Seniority Skills Title Specialty Education Experience Location Industry          Title -> Title Matching 0.58 Seniority -> Seniority Related Titles Related Companies Related Industries 0.94 Binary Exact match Exact match in bucket Summary -> Summary 0.26 Title -> Related Title 0.18 Education -> Education Soft Match       v1 = tf * idf CosΘ =v1*v2 |v1|*|v2| 0.98 . . . 0.16 Seniority Skills Title Specialty Education Experience Location Industry 0.40 Related Titles Related Companies Related Industries
Importance  weight vector (Skills-> Skills) Feedback 0.70 Normalization,  Scoring  & Ranking Filtering Location Company Industry Similarity  score vector (Skills-> Skills) 0.94
Technologies
22 Hadoop Case Studies ,[object Object]
 Blending Recommendation Algorithms
 Grandfathering
 Model Selection
 A/B Testing
 Tracking and Reporting,[object Object]
24 Hadoop Case Studies ,[object Object]
 Blending Recommendation Algorithms
 Grandfathering

More Related Content

What's hot

“Controlling of messages flow in Microservices architecture” by Andris Lubans...
“Controlling of messages flow in Microservices architecture” by Andris Lubans...“Controlling of messages flow in Microservices architecture” by Andris Lubans...
“Controlling of messages flow in Microservices architecture” by Andris Lubans...DevClub_lv
 
Using machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionUsing machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionTammy Everts
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowDatabricks
 
“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...
“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...
“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...DevClub_lv
 
Redis Day TLV 2018 - Using Redis for Schema Detection
Redis Day TLV 2018 - Using Redis for Schema DetectionRedis Day TLV 2018 - Using Redis for Schema Detection
Redis Day TLV 2018 - Using Redis for Schema DetectionRedis Labs
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosSpark Summit
 
Ed Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual AnalysisEd Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual AnalysisVolha Banadyseva
 
„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...
„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...
„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...DevClub_lv
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Databricks
 
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DChallenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DDomonkos Tikk
 

What's hot (11)

“Controlling of messages flow in Microservices architecture” by Andris Lubans...
“Controlling of messages flow in Microservices architecture” by Andris Lubans...“Controlling of messages flow in Microservices architecture” by Andris Lubans...
“Controlling of messages flow in Microservices architecture” by Andris Lubans...
 
Using machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversionUsing machine learning to determine drivers of bounce and conversion
Using machine learning to determine drivers of bounce and conversion
 
Importance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLowImportance of ML Reproducibility & Applications with MLfLow
Importance of ML Reproducibility & Applications with MLfLow
 
“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...
“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...
“How to Succeed with Machine Learning” by Arturs Valujevs from Intrum Global ...
 
Redis Day TLV 2018 - Using Redis for Schema Detection
Redis Day TLV 2018 - Using Redis for Schema DetectionRedis Day TLV 2018 - Using Redis for Schema Detection
Redis Day TLV 2018 - Using Redis for Schema Detection
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
 
Ed Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual AnalysisEd Snelson. Counterfactual Analysis
Ed Snelson. Counterfactual Analysis
 
„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...
„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...
„OWASP Top Ten in Latvia“ by Agris Krusts from IT Centrs SIA at Security focu...
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
 
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&DChallenges Encountered by Scaling Up Recommendation Services at Gravity R&D
Challenges Encountered by Scaling Up Recommendation Services at Gravity R&D
 

Viewers also liked

LinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationLinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationDataWorks Summit
 
Computational advertising in Social Networks
Computational advertising in Social NetworksComputational advertising in Social Networks
Computational advertising in Social NetworksAnmol Bhasin
 
Connecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedInConnecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedInAnmol Bhasin
 
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...Anmol Bhasin
 
Leadership in Uncertain Times - Hudson
Leadership in Uncertain Times - HudsonLeadership in Uncertain Times - Hudson
Leadership in Uncertain Times - HudsonHudsonAPAC
 
Linked in data to power sales - dreamforce nov 18 2013 - vfinal w. appendix
Linked in   data to power sales - dreamforce nov 18 2013 - vfinal w. appendixLinked in   data to power sales - dreamforce nov 18 2013 - vfinal w. appendix
Linked in data to power sales - dreamforce nov 18 2013 - vfinal w. appendixAndres Bang
 
LinkedIn Presentation Plainfield Library 2016
LinkedIn Presentation Plainfield Library 2016LinkedIn Presentation Plainfield Library 2016
LinkedIn Presentation Plainfield Library 2016Denis Curtin
 
By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...
By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...
By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...LinkedIn Talent Solutions
 
Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...
Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...
Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...LinkedIn Talent Solutions
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Shirshanka Das
 
The latest in LinkedIn talent pool reports | Talent Connect Anaheim
The latest in LinkedIn talent pool reports  | Talent Connect AnaheimThe latest in LinkedIn talent pool reports  | Talent Connect Anaheim
The latest in LinkedIn talent pool reports | Talent Connect AnaheimLinkedIn Talent Solutions
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...David Chen
 
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016Carl Steinbach
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Yahoo Developer Network
 
Live Webinar: Advanced Strategies for Leveraging Linkedin Like a Pro
Live Webinar: Advanced Strategies for Leveraging Linkedin Like a ProLive Webinar: Advanced Strategies for Leveraging Linkedin Like a Pro
Live Webinar: Advanced Strategies for Leveraging Linkedin Like a ProLinkedIn
 
Jorge Lascas - Workshop linkedin successful strategies - Amsterdam
Jorge Lascas - Workshop linkedin successful strategies - AmsterdamJorge Lascas - Workshop linkedin successful strategies - Amsterdam
Jorge Lascas - Workshop linkedin successful strategies - AmsterdamJorge Lascas
 
Aiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionAiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionDeepak Agarwal
 

Viewers also liked (20)

Talent Pools: Using insights to power your talent acquisition strategy
Talent Pools: Using insights to power your talent acquisition strategyTalent Pools: Using insights to power your talent acquisition strategy
Talent Pools: Using insights to power your talent acquisition strategy
 
LinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data ApplicationLinkedIn Member Segmentation Platform: A Big Data Application
LinkedIn Member Segmentation Platform: A Big Data Application
 
Computational advertising in Social Networks
Computational advertising in Social NetworksComputational advertising in Social Networks
Computational advertising in Social Networks
 
Connecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedInConnecting Talent to Opportunity.. at scale @ LinkedIn
Connecting Talent to Opportunity.. at scale @ LinkedIn
 
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...Tutorial on People Recommendations in Social Networks -  ACM RecSys 2013,Hong...
Tutorial on People Recommendations in Social Networks - ACM RecSys 2013,Hong...
 
Leadership in Uncertain Times - Hudson
Leadership in Uncertain Times - HudsonLeadership in Uncertain Times - Hudson
Leadership in Uncertain Times - Hudson
 
Linked in data to power sales - dreamforce nov 18 2013 - vfinal w. appendix
Linked in   data to power sales - dreamforce nov 18 2013 - vfinal w. appendixLinked in   data to power sales - dreamforce nov 18 2013 - vfinal w. appendix
Linked in data to power sales - dreamforce nov 18 2013 - vfinal w. appendix
 
LinkedIn Presentation Plainfield Library 2016
LinkedIn Presentation Plainfield Library 2016LinkedIn Presentation Plainfield Library 2016
LinkedIn Presentation Plainfield Library 2016
 
By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...
By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...
By the Numbers: Leveraging LinkedIn Data to Become a Strategic Talent Advisor...
 
Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...
Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...
Leveraging Data: LinkedIn Recruiter Jobs and Talent Pool Analysis | Talent Co...
 
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
Strata SG 2015: LinkedIn Self Serve Reporting Platform on Hadoop
 
The latest in LinkedIn talent pool reports | Talent Connect Anaheim
The latest in LinkedIn talent pool reports  | Talent Connect AnaheimThe latest in LinkedIn talent pool reports  | Talent Connect Anaheim
The latest in LinkedIn talent pool reports | Talent Connect Anaheim
 
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemStrata 2016 - Architecting for Change: LinkedIn's new data ecosystem
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystem
 
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
Hadoop Summit 2014: Building a Self-Service Hadoop Platform at LinkedIn with ...
 
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
LinkedIn's Logical Data Access Layer for Hadoop -- Strata London 2016
 
How AlphaGo Works
How AlphaGo WorksHow AlphaGo Works
How AlphaGo Works
 
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
Data Applications and Infrastructure at LinkedIn__HadoopSummit2010
 
Live Webinar: Advanced Strategies for Leveraging Linkedin Like a Pro
Live Webinar: Advanced Strategies for Leveraging Linkedin Like a ProLive Webinar: Advanced Strategies for Leveraging Linkedin Like a Pro
Live Webinar: Advanced Strategies for Leveraging Linkedin Like a Pro
 
Jorge Lascas - Workshop linkedin successful strategies - Amsterdam
Jorge Lascas - Workshop linkedin successful strategies - AmsterdamJorge Lascas - Workshop linkedin successful strategies - Amsterdam
Jorge Lascas - Workshop linkedin successful strategies - Amsterdam
 
Aiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionAiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversion
 

Similar to Hadoop World 2011: LeveragIng Hadoop to Transform Raw Data to Rich Features at LinkedIn - Abhishek Gupta & Adil Aijaz, LinkedIn

Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoopCraig Jordan
 
Realtech assessment services combined slides final
Realtech assessment services combined slides finalRealtech assessment services combined slides final
Realtech assessment services combined slides finalCarly Shank
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleXavier Amatriain
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersRevolution Analytics
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution AnalyticsRevolution Analytics
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowDatabricks
 
An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine LearningRandy Shoup
 
Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Steve Feldman
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discoverymarkgrover
 
Search Solutions 2015: Towards a new model of search relevance testing
Search Solutions 2015:  Towards a new model of search relevance testingSearch Solutions 2015:  Towards a new model of search relevance testing
Search Solutions 2015: Towards a new model of search relevance testingCharlie Hull
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryNeo4j
 
Business Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at ScaleBusiness Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at ScaleSongtao Guo
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....Databricks
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Building Recommendation Platforms with Hadoop
Building Recommendation Platforms with HadoopBuilding Recommendation Platforms with Hadoop
Building Recommendation Platforms with HadoopJayant Shekhar
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Outsourcing your SharePoint Hosting: The Cloud's Fine Print Magnified
Outsourcing your SharePoint Hosting: The Cloud's Fine Print MagnifiedOutsourcing your SharePoint Hosting: The Cloud's Fine Print Magnified
Outsourcing your SharePoint Hosting: The Cloud's Fine Print MagnifiedSherWeb
 
Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...Michael Li
 

Similar to Hadoop World 2011: LeveragIng Hadoop to Transform Raw Data to Rich Features at LinkedIn - Abhishek Gupta & Adil Aijaz, LinkedIn (20)

Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoop
 
Realtech assessment services combined slides final
Realtech assessment services combined slides finalRealtech assessment services combined slides final
Realtech assessment services combined slides final
 
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix ScaleQcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
Qcon SF 2013 - Machine Learning & Recommender Systems @ Netflix Scale
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
12Nov13 Webinar: Big Data Analysis with Teradata and Revolution Analytics
 
Scaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflowScaling Ride-Hailing with Machine Learning on MLflow
Scaling Ride-Hailing with Machine Learning on MLflow
 
An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
 
Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07Sfeldman performance bb_worldemea07
Sfeldman performance bb_worldemea07
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Search Solutions 2015: Towards a new model of search relevance testing
Search Solutions 2015:  Towards a new model of search relevance testingSearch Solutions 2015:  Towards a new model of search relevance testing
Search Solutions 2015: Towards a new model of search relevance testing
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Business Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at ScaleBusiness Applications of Predictive Modeling at Scale
Business Applications of Predictive Modeling at Scale
 
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
How to Productionize Your Machine Learning Models Using Apache Spark MLlib 2....
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Building Recommendation Platforms with Hadoop
Building Recommendation Platforms with HadoopBuilding Recommendation Platforms with Hadoop
Building Recommendation Platforms with Hadoop
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Outsourcing your SharePoint Hosting: The Cloud's Fine Print Magnified
Outsourcing your SharePoint Hosting: The Cloud's Fine Print MagnifiedOutsourcing your SharePoint Hosting: The Cloud's Fine Print Magnified
Outsourcing your SharePoint Hosting: The Cloud's Fine Print Magnified
 
Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...Empower customer success at LinkedIn with advanced analytics and great visual...
Empower customer success at LinkedIn with advanced analytics and great visual...
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Hadoop World 2011: LeveragIng Hadoop to Transform Raw Data to Rich Features at LinkedIn - Abhishek Gupta & Adil Aijaz, LinkedIn

Editor's Notes

  1. - Ever since I studied Machine Learning and Data Mining at Stanford 3 years ago., I have been enamored by the idea that it is now possible to write programs that can sift through TBS of data to recommend useful things.- So here I am with my colleague AdilAijaz, for a talk on some of the lessons we learnt and challenges we faced in building large-scale recommender system.
  2. At LinkedIn we believe in building platform, not verticals. Our talk is divided into 2 parts. In the first part of thistalk, I will talk about our motivation for building the recommendation platform, followed by a discussion on how we do recommendations.No analytics platform is complete without Hadoop. So, in he next part of our talk, Adil will talk about leveraging Hadoop for scaling our products. ‘Think Platform, Leverage Hadoop’ is our core message. Throughout our talk, we will provide examples that highlight how these two ideas have helped us ‘scale innovation’.
  3. With north of 135 million members, we’re making great strides toward our mission of connecting the world’s professionals to make them more productive and successful. For us this not only means helping people to find their dream jobs, but also enabling them to be great at the jobs they’re already in.-With terabytes of data flowing through our systems, generated from member’s profile, their connections and their activity on LinkedIn, we have amassed rich and structured data of one of the most influential, affluent and highly-educated audience on the web. This huge semi-structured data is getting updated in real-time and growing at a tremendous pace, we are all very excited about the data opportunity at LinkedIn
  4. For an average user, there is so much data, there is no way users can leverage all the data on their own. We need to put the right information, to the right user at the right time.With such rich data of members, jobs, groups, news, companies, schools, discussions and events. We do all kinds of recommendations in a relevant and engaging way.
  5. We have products like ‘Job Recommendation’: here using profile data we suggest top top jobs that our member might be interested in. The idea is to let our users be aware of the possibilities out there for them.
  6. ‘Talent Match’: When recruiters post jobs, we in real-time suggest top candidates for the job.
  7. ‘News Recommendation’: Using articles shared per industry, we suggest top news that our users need to keep them updated with the latest happenings.
  8. ‘Companies You May Want to Follow’: Using a combination of content matching and collaborative filtering, we recommend companies a user might be interested in keeping up-to date with.
  9. ‘PYMK’: based on common attributes like connections, schools, companies and some activity based features, we suggest people that you may know outside of LinkedIn and may want to connect with at LinkedIn.
  10. ‘Similar Profiles’: Finally our latest offering which we released a few months ago for recruiters, Similar Profiles. Given a candidate profile, we suggest top Similar candidates for hiring based on overall background, experience and skills.
  11. We have recommendation solutions for everyone, for individuals, recruiters and advertisers In our view, recommendations are ubiquitous and they permeate the whole site.
  12. Are recommendations really important?To put things in perspective, 50% of total job applications and job views by members are a direct result of recommendations. Interestingly, in the past year and half it has risen from 6% to 50%. This kind of contribution is observed across all our recommendation products and is growing by the day.
  13. Let us start with an example of the kind of data we have. For a member, we have positions, education, Summary, Specialty, Experience and skills of the user from the profile itself. Then, from member’s activity we have data about members connections, the groups that the member has joined, the company that the member follows amongst others.
  14. Before we can start leveraging data for recommendation, we first need to clean and canonicalize it.Lets take an example for matching member to jobs, In order to accurately match members to jobs, we need to understand that all the ways of listing these titles like ‘Software Engineer’, ‘Technical Yahoo’, ‘Member Technical Staff’ and ‘SDE’ mean the same entity i.e. ‘Software Engineers’. Solving this problem is itself a research topic broadly referred to as ‘Entity Resolution’.
  15. As another example, How many variations do you think, we have for the company name ‘IBM’?When I joined LinkedIn, I was surprised to find that we had close to 8000+ user entered variations of the same company IBM.We apply machine learnt classifiers for the entity resolution using a host of features for company standardization. In summary, data canonicalization is the key to accurate matching and is one of the most challenging aspects of our recommendation platform.
  16. Now, we will discuss our motivation behind building a common platform by way of 3 key example trade-offs that we’ve encountered. In LinkedIn ecosystem, one trade-off that we encountered is that of real-time Vs time independent recommendations. Lets look at ‘News Recommendation’, which finds relevant news for our users. Relevant news today might be old news tomorrow. Hence, News recommendation has to have a strong real-time component. On the other hand, we have another product called ‘Similar Profiles’. The motivation here is that if a hiring manager, already know the kind of person that he wants to hire. He could be like person in his team already or like one of his connections on LinkedIn, then using that as the source profile, we suggest top Similar Profiles for hiring. Since, ‘people don’t reinvent themselves everyday’. ‘People similar today are most likely similar today’. So, we can potentially do this computation completely offline with a more sophisticated model.These are the 2 extreme cases in terms of ‘freshness’: Most examples fall into an intermediate category. For E.g. new jobs gets posted by the hour and they expire when they get filled up. All jobs posted today don’t expire the same day. Hence, we cache job recommendation for members for some time as it is OK to not recommend the absolute latest jobs instantly to all members.In solving the completely real-time Vs completely offline problem, we could have gone down the route of creating separate solutions optimized for the use-case. In the short run, that would have been a quicker solution. But we went down the platform route because we realized that we would churn out more and more such verticals as LinkedIn grows. Now, as a result of which we have the same code that computes recommendations online as well as offline. Moreover, in the production system caching and an expiry policy allows us to keep recommendations fresh irrespective of how we compute the recommendations. Now, as a result for newer verticals. We can easily get ‘freshness’ of recommendations irrespective of whether we compute recommendations online or offline.
  17. Another interesting trade-off, is choosing between content analysis and collaborative filtering.Historically speaking: ‘job posting has been a post-and-prey’ model. Where job posters post a job and pray that hopefully someone will apply for the job. But we at LinkedIn, believe in ‘post-and-produce’ so we go ahead and produce matches to the job poster in real-time right after the job gets posted. When someone posts a job, the job poster naturally expects the candidates to have a strong match between job and the profile. Hence, this type of recommendation is heavy on content analysis. On the other hand, we have a product called ‘Viewers of the profile also viewed ..’. When a member views more than 1 profile within a single session. We record it as a co-view. Then on aggregating these co-views for every member. We get the data for all profiles that get co-viewed, when someone visits any given profile. This is a classical collaborative filtering based recommendation much like Amaozon’s ‘people who viewed this item also viewed’Most other recommendations are hybrid. For e.g. for ‘SimilarJobs’ jobs that have high content overlap with each other are similar. Interestingly, jobs that get applied to or viewed by the same members are also Similar. So, Similar Jobs is a nice mix of content and collaborative filtering.Again, Because of a platform approach, we can re-use the content matching and the collaborative filtering components to come up with newer verticals without re-inventing the wheel.
  18. Finally, the last key trade-off is of precision vs recall.On our homepage, we suggest jobs that are good fit for our users with the motivation for them to be more aware of the possibilities out there. In some sense, we are pushing recommendations to you, as opposed to you actively looking for them. Even if a single job recommendation looks bad to the user either because of a lower seniority of the job or because the recommendation is for the company that the user is not fond of, our users might feel less than pleased. Here, getting the absolute best 3 jobs even at the cost of aggressively filtering out a lot of jobs is acceptable.On the other hand, We have another recommendation product called ‘Similar Profiles’ for hiring managers who are actively looking for candidates. Here, if one finds a candidate we suggest other candidates like the original one in terms of overall experience, specialty, education background and a host of other features. Since, the hiring manager is actively looking so in this case they are more open to getting a few bad ones as long as they get a lot of good ones too. So in essence, recall is more important here.Again, because of a platform approach and because we can re-use features, filters and code-base across verticals. So, tuning the knob of more precision vs more recall is mostly a matter of figuring out 1. how complicated the matching model should be and 2. how aggressively we want to apply filters for all recommendation verticals. Hence, our core message ‘Think Platform’. Now, we will discuss in some detail how our recommendations work.
  19. Lets see how we do recommendations by taking an example of ‘Similar Profiles’ that we just discussed. Given a member profile, the goal is to find other similar people for hiring. Lets try to find profiles similar to me.Here, we look at host of different features.Such as User provided features like ‘Title, Specialty, Education, experience amongst others’ Complex derived features like ‘Seniority’ and ‘Skills’, computed using machine learnt classifiers. Both these kinds of features help in precision, we also have features like ‘Related Titles’ and ‘Related Companies’ that help increase the recall. Intuitively, one might imagine that we use the following pair of features to compute Similar Profiles. In the next slide, we will discuss a more principled approach to figuring out pair of features to match against. Here in order to compute overall of similarity between me and Adil, we are first computing similarity between our specialties, our skills, our titles and other attribute.
  20. With this we get a ‘Similarity score vector’ for how Similar Adil is to me, Similarly we can get such a vector for other profiles. Now we somehow need to combine the similarity score in the vector to a single number s.t. profiles that have higher similarity score across more dimensions get ranked higher, as similar profiles for me.Moreover, the fact that our skills match might matter more for hiring than whether our education matches. Hence, there should be relative importance of one feature over the others. Once we get the topK recommendations, we also apply application specific filtering with the goal of leveraging domain knowledge For example, it could be the case that for a ‘Data Engineer role’ you as a hiring manager are looking for a candidate like one of your team member but who is local. Whereas, for all you know the ideal ‘Data Engineer most similar to the one you are looking for in terms of skills might be working somewhere in INDIA’ .To ensure our recommendation quality keeps improving as more and more people use our products, we use explicit and implicit user feedback combined with crowd-sourcing, to construct high quality training and test sets for constructing the ‘Importance weight vector’. Moreover, classifier with L1 regularization helps prune out the weakly correlated features. We use this for figuring out which features to match profiles against.We just discussed an example. However, the same concepts apply to all the recommendation verticals.
  21. And now the technologies that drives it all. The core our matching algorithm uses Lucene with our custom query implementation. We use Hadoop to scale our platform. It serves a variety of needs from computing Collaborative filtering features, building Lucene indices offline, doing quality analysis of recommendation and host of other exciting things that Adil will talk about in a bit. Lucene does not provide fast real-time indexing. To keep our indices up-to date, we use a real-time indexing library on top of Lucene called Zoie. We provide facets to our members for drilling down and exploring recommendation results. This is made possible by a Faceting Search library called Bobo. For storing features and for caching recommendation results, we use a key-value store Voldemort. For analyzing tracking and reporting data, we use a distributed messaging system called Kafka.Out of these Bobo, Zoie, Voldemort and Kafka are developed at LinkedIn and are open sourced. In fact, Kafka is an apache incubator project.Historically, we have used R for model training. We have recently started experimenting with Mahout for model training and are excited about it. All the above technologies, combined with great engineers powers LinkedIn’s Recommendation platform.Now Adil will talk about how we leverage Hadoop.
  22. In the second half of our talk we will present case studies on how Hadoop has helped us scale innovation for our Recommendation Platform. We will use the ‘Similar Profiles’ vertical which was discussed earlier as the example for each case study. As a quick reminder, similar profiles recommends profiles that are similar to the one a user is interested in. Some of its biggest customers are hiring managers and recruiters. For each of the case studies, we will lay out the solutions we tried before turning to Hadoop, analyze the pros and cons of the approaches before and after Hadoop, and finally derive some lessons that are applicable to folks working on large scale recommendation systems.
  23. When it comes to recommendations, relevance is the most important consideration. However, with over 120M members, and billions of recommendation computations, the latency of our recommendations becomes equally important. No mater how great our recommendations are, they wont be of utility to our members if we take too long to return recommendations. Among our many products, ‘Similar Profiles’ is a particularly challenging product to speed up. Our plain vanilla solution involved using a large number of features to mine the entire member index for the best matches. The latency of this solution was in the order of seconds. Clearly, no matter how relevant our recommendations were, with that type of latency, our members would not even wait for the results. So, something had to be done. We needed a solution that could pre-filter most of the irrelevant results while maintaining a high precision on the documents that survived the filter. One technique that meets these conditions is minhashing. On a very high level, minhashing involves running each document, in our case member profiles, through k hash functions to to construct a bit vector. One can play with ANDING/Oring of subsets of the bit vector, to get the right balance between recall and precision. As our second pass solution, we minhashed each document and stored the resulting bit array in the member index. At query time, we minhashed the query into a bit array, filtered out documents that did not have the exact same subsets of the bit array, and finally did advanced matching on documents that survived the filtration. This solution brought down the latency well below a second, however, minhashing did not give us the recall we had hoped for. This was a really disappointing result since we had spent significant engineering resources in productionalizing minhashing, yet it was all for nought. So, we went back to the drawing board and started thinking about how we can use Hadoop to solve this problem. The key breakthrough was when we realized that people do not reinvent themselves everyday. The folks I was similar to yesterday, are likely to be the same folks I am similar to today. This meant that we could serve ‘similar profiles’ recommendations from cache. When the cache would expire, we could compute fresh recommendations online and repopulate the cache. This meant that the user would almost always be served from cache. Great: but we still have to populate the cache somehow. This is where Hadoop comes into the picture. By opening an index shard per mapper, we can generate a portion of the recommendations in each mapper, and combine them into a final recommendation set in the reducers. With the distributed computation of Hadoop, we easily generate similar profiles for each member, and then copy over the results to online caches. So the three elements of:Offline batch computations on Hadoop copied to online caches.Aggressive online caching policyOnline computation when the cache expiresHave scaled our similar profiles recommendations while maintaining a high precision of recommendations.So, the key takeaway from this case study of scaling is that if one is facing the problem of:High latency computationHigh qpsAnd not so stringent freshness requirements Then one should leverage Hadoop and caching to scale the recommendation computation.
  24. With our scaling problems solved, we rolled out Similar Profiles to our members. The reception was amazing. However, we felt that we could do even better by going beyond content based features alone. One such feature that we wanted to experiment with was collaborative filtering. More specifically, if a member browses multiple member profiles within a single session, aka coviews, it is quite likely that those member profiles are very similar to each other. How we blend collaborative filtering with existing content based recommendations is the subject of our second case study: “blending multiple recommendation algorithms”.Our basic blending solution is this: While constructing the query for content based similar profiles, we fetch collaborative filtering recommendations and their scores, and attach them to the query. In the scoring of content based recommendations, we can use the collaborative filtering score as a boost. An alternative approach is a bag of models approach with content and collaborative filtering serving as two of the models in the bag.In either solution, we need a way to keep collaborative filtering results fresh. If two member profiles were coviewed yesterday, we should be able to use that knowledge today.We first sketched out a completely online solution. This online solution involved keeping track of the state of each member session, accumulating all the profile views within that session. At the end of each session, we updated the counts of the various coview pairs. As you can appreciate, such a stateful solution can get very complicated very quickly. As an example, we have to worry about machine failures, multi data center coordination just to name two challenges. In essence such a solution could introduce more problems than it solves. So, we scratched this solution even before implementing it.We thought more about this problem and realized two important aspects: 1) Coview counts can be updated in batch mode. 2) We can tolerate delay in updating our collaborative filtering results. These two properties, batch computation and tolerance for delay in impacting the online world, led us to leverage Hadoop to solve this problem. Our production servers can produce tracking events everytime a member profile is viewed. These tracking events are copied over to hdfs where everyday, we use these tracking events to batch compute a fresh set of collaborative filtering recommendations. These recommendations are then copied to online key value stores where we use the blending approaches outlined earlier to blend collaborative filtering and content based recommendations.Compared to the purely online solution, the Hadoop solution is simpler in complexity, less error prone, but introduces a lag between the time two profiles are coviewed and the time that coview has an impact on similar profiles. For us, this solution works great. The other great thing about this solution is that it can be easily extended to blend social or globally popular recommendations in addition to collaborative filtering.The lesson we derive from this case study is that by leveraging Hadoop, we were able to experiment with collaborative filtering in similar profiles without significant investment in an online system to keep collaborative filtering results fresh. Once our proof of concept was successful, we could always go back and see if reducing the lag between a profile coview and its impact on similar profile by building an online system would be useful. If it is, we could invest in a non-Hadoop system. However, by leveraging Hadoop, we were able to defer that decision till the point when we had data to backup our assumptions.
  25. A consistent feedback from hiring managers using Similar Profiles was that while the recommendations were highly relevant, often times the recommended members were not ready to take the next steps in their professional career. Such recommended members would thus respond negatively to a contact from the hiring manager, leading to a bad experience for the hiring manager. This feedback indicated a strong preference from our users that they would like a tradeoff between relevance of recommendations and the responsiveness of those recommendations. One can imagine a similar scenario playing out for a sales person looking for recommendations for potential clients.As our next case study, let’s take a look at how we approached solving this problem for our users. Let’s say we come up with an algorithm that assigns each LinkedIn member a ‘JobSeeker’ score which indicates how open is she to taking the next step in her career. As we said already, this feature would be very useful for ‘Similar Profiles’. However, the utility of this feature would be directly related to how many members have this score: aka coverage. The key challenge we faced was that since ‘Similar Profiles’ was already in production, we had to add this new feature while continuing to serve recommendations. We call this problem “grandfathering”.A naïve solution, could be to assign a ‘jobseeker’ score to a member next time she updates her profile. This approach will have minimal impact on the system serving traffic, however, we will not have all members tagged with such a score for a very long time, which impacts the utility of this feature for ‘Similar Profiles’.So, we scratch the naïve solution and look for a solution that will batch update all members with this score in all data centers while serving traffic.A second pass solution is to run a ‘batch’ feature extraction pipeline in parallel to the production feature extraction pipeline. This batch pipeline will query the db for all members and add a ‘job seeker score’ to every member. This solution ensures that we have an upper bound on the time it takes to grandfather all members with job seeker score. It will work great for small startups whose member base is in a few million range.However, the downside of this solution at LinkedIn scale is:It adds load on the production databases serving live traffic.To avoid the load, we end up throttling the batch solution which in turn makes the batch pipeline run for days or weeks. This slows down rate of batch update.Lastly, the two factors above combine to make grandfathering a ‘dreaded word’. You only end up grandfathering ‘once a quarter’ which is clearly not helpful in innovating faster.So, we clearly cannot use that solution either. However, one good aspect of this solution is the batch update which leads us into a Hadoop based solution.Using Hadoop, we take a snapshot of member profiles in production, move it to hdfs, grandfather members with a ‘job seeker score’ in a matter of hours, and copy this data back online. This way we can grandfather all members with a job seeker score in hours. The biggest advantage of using Hadoop here is that grandfathering is no longer a ‘dreaded word’. We can grandfather when ready instead of grandfather every quarter which speeds up innovation.So, in a nutshell, if one finds oneself slowed down due to constraints of updating features in the online world, consider batch updating the features offline using Hadoop and swapping them out online.
  26. With the first few versions of similar profiles out the door, we began to simultaneously investigate a number of avenues for improvement. Some of us investigated different model families, say logistic regression vs SVM, others investigated new features with the existing model\\. In this case study, we will talk about how we decided which one of these experiments would actually improve the online relevance of similar profiles so we could double down on getting them out to production. We are not concerned with ‘how’ we come up with these new models. For all that matters, we hand tuned a common sense model. The question is how to decide whether or not to move that new model to production.Now, as a base solution, we can always move every model to production. We can A/B test the model with real traffic and see which models sink and which ones float. The simplicity of this approach is very attractive, however, there are some major flaws with it:For each models we have to push code to production. This takes up valuable development resources. There is an upper limit on the number of A/B tests one can run at the same time. This can be due to user experience concerns and/or revenue concerns.Since online tests need to run for days before enough data is accumulated to make a call, this approach slows down rate of progress.Ideally, we would like to try all our ideas offline, evaluate them, and only push to production the best ones. Hadoop proves critical in the evaluation step. Overtime, using implicit and explicit feedback combined with crowdsourcing, we have accumulated a huge gold test set for ‘similar profiles’. We rank the gold set with each model on Hadoop, and use standard ranking metrics to evaluate which one performs best.As you can guess, Hadoop provides a very good sandbox for our ideas. We are able to filter many of the craziest ideas, and double down on only those few that show promise. Plus, it allows us to have relatively large gold sets which gives us strong confidence in our evaluation results.
  27. Now that we have learned a new model for ‘Similar Profiles’ that performs well in our offline evaluation framework, we need to test it online. An industry standard approach to this problem is known as A/B testing or bucket testing. Formally, AB testing involves partitioning real traffic between alternatives and then evaluating which alternative maximizes the desired objective. Typical desired objectives are CTR, revenue, or number of views.The key requirements of AB testing:1. Time to evaluate which bucket to send traffic to should ideally be < 1ms, at-worst a few ms.Let’s discuss how we would do A/B testing for our new model. For simple partitioning requirements, one can use a mod-based scheme. This is very fast and very simple and can satisfy most use cases. However, if one wishes to partition traffic based on profile and member activity criteria for e.g. “Send 10% of members who have greater than 100 connections AND who have logged in the last one week AND who are based in Europe” then doing this online is too expensive. Keep in mind that deciding which bucket to send traffic to should be very fast, ideally less than a millisecond. In worst case scenario a few milliseconds. I am not going to even attempt an online solution for this problem.So, we go straight to Hadoop. For complex criteria like this, we run over our entire member base on Hadoop every couple of hours, assigning them to the appropriate bucket for each test. The results of this computation are pushed online, where the problem of A/B testing reduces to given a member and a test, fetching which bucket to send the traffic to from cache.The take home message here is: If you need complex targeting and A/B testing, leverage Hadoop.
  28. Our last case study involves the last step of a model deployment process: tracking and reporting. These two steps allow us to have an unbiased, data-driven way of saying whether or not a new model is successful in lifting our desired metrics: ctr or revenue or engagement or whatever else one is interested in. Our production servers generate tracking events every time a recommendation is impressed, clicked or rejected by the member. Before Hadoop, we used to have an online reporting tool that would listen to tracking events over a moving window of time, doing in-memory joins of different streams of events and reporting up-to-the minute performance of our models. The clear advantages of this approach were that we could see exactly how the model was performing online at this moment. However, there are a few downsides such asOne cannot look at greater than certain amount of time window in the past..As the number of tracking streams increases, it becomes harder and harder to join them online.To increase the time window, we will have to spend significant engineering resources in architecting a scalable reporting system which would be an overkill. Instead we placed our bet on Hadoop. All tracking events at LinkedIn are stored on HDFS. Add to this data Pig or plain Map-Reduce, we can do arbitrary k-way joins across billions of rows to come up with reports that can look as far in the past as we want to.The advantages of this approach are quite clear. Complex joins are easy to compute and reporting is flexible on time windows. However, we cannot have up-to-the minute reports since we copy over tracking events in batch to hdfs. If we ever need that level of reporting, we can always use our online solution.We can say without any hesitation that Hadoop has now become an integral part of the whole life-cycle of our workflow starting from prototyping a new idea to eventually tracking the impact of that idea.
  29. By thinking about platform and not verticals, we are able to to come with newer verticals at a fast pace. By leveraging Hadoop, we were able to continuously improve the quality and scale the computations.Hence, these 2 ideas helped us ‘scale innocation’ at Linkedin.
  30. To conclude, we want to say that the data opportunity at LinkedIn is HUGE and so come work with us at LinkedIn!