SlideShare a Scribd company logo
©2015 LinkedIn Corporation. All Rights Reserved.
1
Methods for approaching content
relevance at LinkedIn
Ajit Singh
DataEngConf, April 7th 2016
©2015 LinkedIn Corporation. All Rights Reserved.
22
Who am I ?
•  Builds models and infrastructure which
serves those models in production.
•  Background in machine learning
•  Ph.D. Machine Learning, CMU
•  Post-doc, University of Washington.
©2015 LinkedIn Corporation. All Rights Reserved.
33
Who Do I Work With ?
Engineering team with expertise in
machine learning and systems.
©2015 LinkedIn Corporation. All Rights Reserved.
44
Why attend this talk ?
Feed Slideshare
Rich media: slides and video,
with transcripts
Pulse
Content discovery.
Groups
Conversation Threads
©2015 LinkedIn Corporation. All Rights Reserved.
55
Article Recommendation
©2015 LinkedIn Corporation. All Rights Reserved.
6
Common Pattern for Recommendations
Keep it Simple
1.  Feature Extraction
–  What do I know about this content ?
2.  Indexing & Search
–  How do I store and retrieve documents and their features?
–  What candidates should I consider for this request ?
3.  Recommendation
–  What should I show you ?
6
©2015 LinkedIn Corporation. All Rights Reserved.
77
Engineering Stacks
Offline (minutes – days)
•  Hadoop
•  Spark
•  “Big Data” tools
Nearline (seconds – hours)
•  Samza
•  Spark Streaming, Storm
Online (milliseconds)
•  Services
•  DB, Distributed Key-Value Stores
©2015 LinkedIn Corporation. All Rights Reserved.
99
Kinds of Model Features
§  Member Features
–  What we know about the viewer or guest.
–  E.g., Industry, Skills, Languages Spoken
§  Document Features
–  What we know about a candidate item for recommendation.
–  E.g., For articles: vector-space, topics, social gestures.
§  Engagement
–  Aggregations of tracking data
–  E.g., Per-item click-through rates, Dwell time.
–  Think OLAP cubes.
©2015 LinkedIn Corporation. All Rights Reserved.
1010
Document Features
©2015 LinkedIn Corporation. All Rights Reserved.
1111
Document Features
©2015 LinkedIn Corporation. All Rights Reserved.
1212
NLP
§  Featurizers which process text one document at a time.
–  Foundational:
§  Tokenization, Lemmatization, Stop-word removal => Vector-space models
§  Text near-deduplication (w-shingling, SpotSigs)
§  Language detection
–  Classifiers:
§  Explicitly categorize the document into one or more label sets.
§  Fast Prototyping
–  Build a library first
–  Deploy the library to Hadoop first: e.g., via Pig, Scalding.
–  Don’t build a near-real time system till you have validated features.
©2015 LinkedIn Corporation. All Rights Reserved.
1313
Near-Real Time Feature Generation
§  Before starting, ask yourself – do I need near-real time features?
14
Offline
Nearline
Online
Article
DB
1
Article Stream
Article Stream
2
Language
Topics
Entity Extractors
Text Hashes
Language Features
Topic Features
Entity Features
Hash Features
Language Features
Hash Features
Topic Features
Entity Features
Search Index
3
6
5
Other Features
4
Offline
Producer
©2015 LinkedIn Corporation. All Rights Reserved.
15
Pre-computed Recommendations
When you can stick to the offline world.
§  In many cases the recommendation problem is constrained: e.g.,
–  You know which users are likely to visit in a given time period.
–  Documents being considered will not change quickly.
§  Give me the top-k documents by highest normalized click-through rate.
–  Recommendations are pushed (e-mail, push notifications)
§  Just pre-compute recommendations for likely-to-visit users
–  Obvious parallelization via Hadoop.
–  Agile development.
–  Send recommendations to a distributed key-value store to serve.
§  Presupposes having good tracking & data pipelines (data lake).
15
©2015 LinkedIn Corporation. All Rights Reserved.
1616
Data Science / Engineering Contracts
§  How do I get my data into the near real-time flow ?
§  How do I deploy a model for feature X ?
§  What if my model has a large number of parameters ?
–  Language models are notoriously huge.
–  Memory is often a tighter constraint than CPU.
§  What if I want to A/B test different versions of a feature ?
§  What happens if a feature source fails ?
©2015 LinkedIn Corporation. All Rights Reserved.
1818
Search Indices
§  We use an in-house search
system called Galene.
§  Store features as searchable
facets within the index.
©2015 LinkedIn Corporation. All Rights Reserved.
1919
Why ?
§  Flexible candidate selection,
–  Give me all the English-language documents ingested in the last four
days, which also mention the term “Lectra”.
–  Give me all promoted articles tagged with “Grace Hopper 2016”
§  Consolidate state management across search & recommendations.
©2015 LinkedIn Corporation. All Rights Reserved.
2020
Search Verticals
Search Federator
Recommender
…
Pre-computed Recommendations
Articles Slides
Groups Courses
Search Verticals
Search
query
form
ulated
query
lookup
lookup
©2015 LinkedIn Corporation. All Rights Reserved.
2121
Continuum between Search & Recommendations
Search Recommendation
Navigational Broad /
Exploratory
Guided &
Faceted Search
Empty
Search
©2015 LinkedIn Corporation. All Rights Reserved.
2323
Homepage Module
Epsilon-Greedy Explore/Exploit
Target’s Red Ink Runs
out in Canada.
Why we need “Economy
Wide” Airline Seats.
How much work is too
much work?
We can learn from barn
raisers.
#1
#2
#3
#4
0.3241%
0.5923%
0.4864%
0.0231%
24
©2015 LinkedIn Corporation. All Rights Reserved.
2525
Key Ideas
§  Online Algorithm
–  The model continuous updates with feedback from decisions.
–  Infrastructure components:
§  Near real-time counting (OLAP).
§  Efficient scoring of k-candidates per-request.
–  Allows for warm/cold-start models (c.f., Thompson sampling)
§  What algorithms do well vs. what humans do well.
–  Candidate selection by human editors.
–  Ranking via algorithms.
©2015 LinkedIn Corporation. All Rights Reserved.
2626
Algorithm Aversion
“Although people may be willing to trust an
algorithm in the absence of experience with it,
seeing it perform—and almost inevitably err—
will cause them to abandon it in favor of a human
judge. This may occur even when people see the
algorithm outperform the human.”
Dietvorst et al.
J. Exp. Psych Res. 2014
Man + Machine
©2015 LinkedIn Corporation. All Rights Reserved.
2828
Engineering & Data Science
§  The core challenge is that data-driven products are complex
–  Tracking
–  Data Warehousing
–  Offline Infrastructure (Hadoop & Spark)
–  Modelers
–  Nearline & Online Infrastructure
§  Craftsmanship
–  Clear contracts are critical
§  Are you providing recommendations ?
§  Are you providing data sets ?
§  Are you providing modeling infrastructure to other teams ?
©2015 LinkedIn Corporation. All Rights Reserved.
29
Align on Metrics
Know why something matters.
§  True North Metrics:
–  Track the health of a product.
–  Usually affected by many aspects of the product; not just relevance.
–  May not be measurable on short-time spans.
§  e.g., Revenue in a subscription funnel.
§  Signpost Metrics:
–  Leading indicators of health for a true north.
–  Measurable, often via an A/B test.
§  Relevance Metrics:
–  You have an optimization problem, this is what it optimizes.
–  Rare that you can directly optimize your true north metric.
DataEngConf SF16 - Methods for Content Relevance at LinkedIn

More Related Content

What's hot

Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoIasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoCodecamp Romania
 
Correlate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediCorrelate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a Jedi
Trevor Parsons
 
Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...
Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...
Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...
Data Con LA
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
Konstantin Gredeskoul
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
gluent.
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High Gear
Spark Summit
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Be
confluent
 
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
VoltDB
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Turkish Testing Board
 
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudDisrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
Jen Aman
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
Databricks
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
Jamie Grier
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
Robert Dempsey
 
Big Data Testing
Big Data TestingBig Data Testing
Big Data Testing
QA InfoTech
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
Alexis Seigneurin
 
Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#
J On The Beach
 
DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge ScaleDOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
Gene Kim
 
What's The Role Of Machine Learning In Fast Data And Streaming Applications?
What's The Role Of Machine Learning In Fast Data And Streaming Applications?What's The Role Of Machine Learning In Fast Data And Streaming Applications?
What's The Role Of Machine Learning In Fast Data And Streaming Applications?
Lightbend
 
Intro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverIntro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - Denver
Sri Ambati
 

What's hot (20)

Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcaderoIasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
Iasi code camp 20 april 2013 testing big data-anca sfecla - embarcadero
 
Correlate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a JediCorrelate Log Data with Business Metrics Like a Jedi
Correlate Log Data with Business Metrics Like a Jedi
 
Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...
Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...
Big Data Day LA 2016/ Big Data Track - Rapid Analytics @ Netflix LA (Updated ...
 
Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High Gear
 
The Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to BeThe Future of ETL Isn't What It Used to Be
The Future of ETL Isn't What It Used to Be
 
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
Kyle Kingsbury Talks about the Jepsen Test: What VoltDB Learned About Data Ac...
 
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland LeusdenTestistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
Testistanbul 2016 - Keynote: "Performance Testing of Big Data" by Roland Leusden
 
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudDisrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
 
SQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at ComcastSQL Analytics Powering Telemetry Analysis at Comcast
SQL Analytics Powering Telemetry Analysis at Comcast
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
Big Data Testing
Big Data TestingBig Data Testing
Big Data Testing
 
Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
 
Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#Agile experiments in Machine Learning with F#
Agile experiments in Machine Learning with F#
 
DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge ScaleDOES SFO 2016 - Avan Mathur - Planning for Huge Scale
DOES SFO 2016 - Avan Mathur - Planning for Huge Scale
 
What's The Role Of Machine Learning In Fast Data And Streaming Applications?
What's The Role Of Machine Learning In Fast Data And Streaming Applications?What's The Role Of Machine Learning In Fast Data And Streaming Applications?
What's The Role Of Machine Learning In Fast Data And Streaming Applications?
 
Intro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverIntro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - Denver
 

Viewers also liked

DataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesDataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with Ourselves
Hakka Labs
 
DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
Hakka Labs
 
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityDataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
Hakka Labs
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
Hakka Labs
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scale
Hakka Labs
 
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
Hakka Labs
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
Hakka Labs
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
Hakka Labs
 
Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)
Hakka Labs
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
Hakka Labs
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
Hakka Labs
 
DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...
Hakka Labs
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
Hakka Labs
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineering
Hakka Labs
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
Hakka Labs
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
Hakka Labs
 
DataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresDataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data Structures
Hakka Labs
 

Viewers also liked (17)

DataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with OurselvesDataEngConf SF16 - Beginning with Ourselves
DataEngConf SF16 - Beginning with Ourselves
 
DataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series searchDataEngConf SF16 - High cardinality time series search
DataEngConf SF16 - High cardinality time series search
 
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High DeliverabilityDataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
DataEngConf SF16 - Routing Billions of Analytics Events with High Deliverability
 
DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale DataEngConf SF16 - Collecting and Moving Data at Scale
DataEngConf SF16 - Collecting and Moving Data at Scale
 
DataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scaleDataEngConf SF16 - Running simulations at scale
DataEngConf SF16 - Running simulations at scale
 
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
DataEngConf SF16 - Tales from the other side - What a hiring manager wish you...
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
 
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using SparkDataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
DataEngConf SF16 - Entity Resolution in Data Pipelines Using Spark
 
Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)Always Valid Inference (Ramesh Johari, Stanford)
Always Valid Inference (Ramesh Johari, Stanford)
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
 
DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...DataEngConf SF16 - Three lessons learned from building a production machine l...
DataEngConf SF16 - Three lessons learned from building a production machine l...
 
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor DataDataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
DataEngConf SF16 - Deriving Meaning from Wearable Sensor Data
 
DataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineeringDataEngConf SF16 - Bridging the gap between data science and data engineering
DataEngConf SF16 - Bridging the gap between data science and data engineering
 
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
DataEngConf SF16 - Unifying Real Time and Historical Analytics with the Lambd...
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
DataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data StructuresDataEngConf SF16 - Multi-temporal Data Structures
DataEngConf SF16 - Multi-temporal Data Structures
 

Similar to DataEngConf SF16 - Methods for Content Relevance at LinkedIn

The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
Embarcadero Technologies
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
Venkatesh Umaashankar
 
Linked in stream experimentation framework
Linked in stream experimentation frameworkLinked in stream experimentation framework
Linked in stream experimentation framework
Joseph Adler
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
Amazon Web Services
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
Inside Analysis
 
Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...
Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...
Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...
Sendachi
 
Group 1 LinkedIn
Group 1 LinkedInGroup 1 LinkedIn
Group 1 LinkedIn
Lee Schlenker
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
Trivadis
 
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
Daniel Bryant
 
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
NadinaLisbon1
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama Software
Panorama Software
 
Maruti gollapudi cv
Maruti gollapudi cvMaruti gollapudi cv
Maruti gollapudi cv
Maruti Gollapudi
 
Big Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionBig Data for Data Scientists - Info Session
Big Data for Data Scientists - Info Session
WeCloudData
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the trade
Fangda Wang
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
Rob Winters
 
Gdsc IIIT Surat Orientation 2022.pdf
Gdsc IIIT Surat Orientation 2022.pdfGdsc IIIT Surat Orientation 2022.pdf
Gdsc IIIT Surat Orientation 2022.pdf
SparshJhariya2
 
Democratizing AI with Apache Spark
Democratizing AI with Apache SparkDemocratizing AI with Apache Spark
Democratizing AI with Apache Spark
Spark Summit
 
How Best Practices Enable Rapid Implementation of Intelligence Portals
How Best Practices Enable Rapid Implementation of Intelligence PortalsHow Best Practices Enable Rapid Implementation of Intelligence Portals
How Best Practices Enable Rapid Implementation of Intelligence Portals
IntelCollab.com
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Rehgan Avon
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
markgrover
 

Similar to DataEngConf SF16 - Methods for Content Relevance at LinkedIn (20)

The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
 
Building successful data science teams
Building successful data science teamsBuilding successful data science teams
Building successful data science teams
 
Linked in stream experimentation framework
Linked in stream experimentation frameworkLinked in stream experimentation framework
Linked in stream experimentation framework
 
Initiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AIInitiate Edinburgh 2019 - Big Data Meets AI
Initiate Edinburgh 2019 - Big Data Meets AI
 
Presumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of SuccessPresumption of Abundance: Architecting the Future of Success
Presumption of Abundance: Architecting the Future of Success
 
Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...
Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...
Sendachi | 451 | GitHub Webinar: Demystifying Collaboration at Scale: DevOp...
 
Group 1 LinkedIn
Group 1 LinkedInGroup 1 LinkedIn
Group 1 LinkedIn
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
DevoxxUK 2016: "DevOps: Microservices, containers, platforms, tooling... Oh y...
 
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
Salesforce Architect Group, Frederick, United States July 2023 - Generative A...
 
Top Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama SoftwareTop Business Intelligence Trends for 2016 by Panorama Software
Top Business Intelligence Trends for 2016 by Panorama Software
 
Maruti gollapudi cv
Maruti gollapudi cvMaruti gollapudi cv
Maruti gollapudi cv
 
Big Data for Data Scientists - Info Session
Big Data for Data Scientists - Info SessionBig Data for Data Scientists - Info Session
Big Data for Data Scientists - Info Session
 
Data science tools of the trade
Data science tools of the tradeData science tools of the trade
Data science tools of the trade
 
Big Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil GamesBig Data at a Gaming Company: Spil Games
Big Data at a Gaming Company: Spil Games
 
Gdsc IIIT Surat Orientation 2022.pdf
Gdsc IIIT Surat Orientation 2022.pdfGdsc IIIT Surat Orientation 2022.pdf
Gdsc IIIT Surat Orientation 2022.pdf
 
Democratizing AI with Apache Spark
Democratizing AI with Apache SparkDemocratizing AI with Apache Spark
Democratizing AI with Apache Spark
 
How Best Practices Enable Rapid Implementation of Intelligence Portals
How Best Practices Enable Rapid Implementation of Intelligence PortalsHow Best Practices Enable Rapid Implementation of Intelligence Portals
How Best Practices Enable Rapid Implementation of Intelligence Portals
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 

More from Hakka Labs

DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
Hakka Labs
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
Hakka Labs
 
DataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris WigginsDataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris Wiggins
Hakka Labs
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation Engine
Hakka Labs
 
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde NastDataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
Hakka Labs
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
Hakka Labs
 
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
Hakka Labs
 
DataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeedDataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeed
Hakka Labs
 
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
Hakka Labs
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
Hakka Labs
 
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
Hakka Labs
 
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedInDataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
Hakka Labs
 

More from Hakka Labs (12)

DataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL WorkshopDataEngConf SF16 - Spark SQL Workshop
DataEngConf SF16 - Spark SQL Workshop
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
 
DataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris WigginsDataEngConf: Data Science at the New York Times by Chris Wiggins
DataEngConf: Data Science at the New York Times by Chris Wiggins
 
DataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation EngineDataEngConf: Building the Next New York Times Recommendation Engine
DataEngConf: Building the Next New York Times Recommendation Engine
 
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde NastDataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
DataEngConf: Measuring Impact with Data in a Distributed World at Conde Nast
 
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at GoogleDataEngConf: Feature Extraction: Modern Questions and Challenges at Google
DataEngConf: Feature Extraction: Modern Questions and Challenges at Google
 
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
DataEngConf: Talkographics: Using What Viewers Say Online to Measure TV and B...
 
DataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeedDataEngConf: The Science of Virality at BuzzFeed
DataEngConf: The Science of Virality at BuzzFeed
 
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
DataEngConf: Uri Laserson (Data Scientist, Cloudera) Scaling up Genomics with...
 
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big DataDataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
DataEngConf: Parquet at Datadog: Fast, Efficient, Portable Storage for Big Data
 
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
DataEngConf: Apache Kafka at Rocana: a scalable, distributed log for machine ...
 
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedInDataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
DataEngConf: Building Satori, a Hadoop toll for Data Extraction at LinkedIn
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
nkrafacyberclub
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
UiPathCommunity
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
Globus
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..UiPath Community Day Dubai: AI at Work..
UiPath Community Day Dubai: AI at Work..
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Enhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZEnhancing Performance with Globus and the Science DMZ
Enhancing Performance with Globus and the Science DMZ
 

DataEngConf SF16 - Methods for Content Relevance at LinkedIn

  • 1. ©2015 LinkedIn Corporation. All Rights Reserved. 1 Methods for approaching content relevance at LinkedIn Ajit Singh DataEngConf, April 7th 2016
  • 2. ©2015 LinkedIn Corporation. All Rights Reserved. 22 Who am I ? •  Builds models and infrastructure which serves those models in production. •  Background in machine learning •  Ph.D. Machine Learning, CMU •  Post-doc, University of Washington.
  • 3. ©2015 LinkedIn Corporation. All Rights Reserved. 33 Who Do I Work With ? Engineering team with expertise in machine learning and systems.
  • 4. ©2015 LinkedIn Corporation. All Rights Reserved. 44 Why attend this talk ? Feed Slideshare Rich media: slides and video, with transcripts Pulse Content discovery. Groups Conversation Threads
  • 5. ©2015 LinkedIn Corporation. All Rights Reserved. 55 Article Recommendation
  • 6. ©2015 LinkedIn Corporation. All Rights Reserved. 6 Common Pattern for Recommendations Keep it Simple 1.  Feature Extraction –  What do I know about this content ? 2.  Indexing & Search –  How do I store and retrieve documents and their features? –  What candidates should I consider for this request ? 3.  Recommendation –  What should I show you ? 6
  • 7. ©2015 LinkedIn Corporation. All Rights Reserved. 77 Engineering Stacks Offline (minutes – days) •  Hadoop •  Spark •  “Big Data” tools Nearline (seconds – hours) •  Samza •  Spark Streaming, Storm Online (milliseconds) •  Services •  DB, Distributed Key-Value Stores
  • 8.
  • 9. ©2015 LinkedIn Corporation. All Rights Reserved. 99 Kinds of Model Features §  Member Features –  What we know about the viewer or guest. –  E.g., Industry, Skills, Languages Spoken §  Document Features –  What we know about a candidate item for recommendation. –  E.g., For articles: vector-space, topics, social gestures. §  Engagement –  Aggregations of tracking data –  E.g., Per-item click-through rates, Dwell time. –  Think OLAP cubes.
  • 10. ©2015 LinkedIn Corporation. All Rights Reserved. 1010 Document Features
  • 11. ©2015 LinkedIn Corporation. All Rights Reserved. 1111 Document Features
  • 12. ©2015 LinkedIn Corporation. All Rights Reserved. 1212 NLP §  Featurizers which process text one document at a time. –  Foundational: §  Tokenization, Lemmatization, Stop-word removal => Vector-space models §  Text near-deduplication (w-shingling, SpotSigs) §  Language detection –  Classifiers: §  Explicitly categorize the document into one or more label sets. §  Fast Prototyping –  Build a library first –  Deploy the library to Hadoop first: e.g., via Pig, Scalding. –  Don’t build a near-real time system till you have validated features.
  • 13. ©2015 LinkedIn Corporation. All Rights Reserved. 1313 Near-Real Time Feature Generation §  Before starting, ask yourself – do I need near-real time features?
  • 14. 14 Offline Nearline Online Article DB 1 Article Stream Article Stream 2 Language Topics Entity Extractors Text Hashes Language Features Topic Features Entity Features Hash Features Language Features Hash Features Topic Features Entity Features Search Index 3 6 5 Other Features 4 Offline Producer
  • 15. ©2015 LinkedIn Corporation. All Rights Reserved. 15 Pre-computed Recommendations When you can stick to the offline world. §  In many cases the recommendation problem is constrained: e.g., –  You know which users are likely to visit in a given time period. –  Documents being considered will not change quickly. §  Give me the top-k documents by highest normalized click-through rate. –  Recommendations are pushed (e-mail, push notifications) §  Just pre-compute recommendations for likely-to-visit users –  Obvious parallelization via Hadoop. –  Agile development. –  Send recommendations to a distributed key-value store to serve. §  Presupposes having good tracking & data pipelines (data lake). 15
  • 16. ©2015 LinkedIn Corporation. All Rights Reserved. 1616 Data Science / Engineering Contracts §  How do I get my data into the near real-time flow ? §  How do I deploy a model for feature X ? §  What if my model has a large number of parameters ? –  Language models are notoriously huge. –  Memory is often a tighter constraint than CPU. §  What if I want to A/B test different versions of a feature ? §  What happens if a feature source fails ?
  • 17.
  • 18. ©2015 LinkedIn Corporation. All Rights Reserved. 1818 Search Indices §  We use an in-house search system called Galene. §  Store features as searchable facets within the index.
  • 19. ©2015 LinkedIn Corporation. All Rights Reserved. 1919 Why ? §  Flexible candidate selection, –  Give me all the English-language documents ingested in the last four days, which also mention the term “Lectra”. –  Give me all promoted articles tagged with “Grace Hopper 2016” §  Consolidate state management across search & recommendations.
  • 20. ©2015 LinkedIn Corporation. All Rights Reserved. 2020 Search Verticals Search Federator Recommender … Pre-computed Recommendations Articles Slides Groups Courses Search Verticals Search query form ulated query lookup lookup
  • 21. ©2015 LinkedIn Corporation. All Rights Reserved. 2121 Continuum between Search & Recommendations Search Recommendation Navigational Broad / Exploratory Guided & Faceted Search Empty Search
  • 22.
  • 23. ©2015 LinkedIn Corporation. All Rights Reserved. 2323 Homepage Module
  • 24. Epsilon-Greedy Explore/Exploit Target’s Red Ink Runs out in Canada. Why we need “Economy Wide” Airline Seats. How much work is too much work? We can learn from barn raisers. #1 #2 #3 #4 0.3241% 0.5923% 0.4864% 0.0231% 24
  • 25. ©2015 LinkedIn Corporation. All Rights Reserved. 2525 Key Ideas §  Online Algorithm –  The model continuous updates with feedback from decisions. –  Infrastructure components: §  Near real-time counting (OLAP). §  Efficient scoring of k-candidates per-request. –  Allows for warm/cold-start models (c.f., Thompson sampling) §  What algorithms do well vs. what humans do well. –  Candidate selection by human editors. –  Ranking via algorithms.
  • 26. ©2015 LinkedIn Corporation. All Rights Reserved. 2626 Algorithm Aversion “Although people may be willing to trust an algorithm in the absence of experience with it, seeing it perform—and almost inevitably err— will cause them to abandon it in favor of a human judge. This may occur even when people see the algorithm outperform the human.” Dietvorst et al. J. Exp. Psych Res. 2014
  • 28. ©2015 LinkedIn Corporation. All Rights Reserved. 2828 Engineering & Data Science §  The core challenge is that data-driven products are complex –  Tracking –  Data Warehousing –  Offline Infrastructure (Hadoop & Spark) –  Modelers –  Nearline & Online Infrastructure §  Craftsmanship –  Clear contracts are critical §  Are you providing recommendations ? §  Are you providing data sets ? §  Are you providing modeling infrastructure to other teams ?
  • 29. ©2015 LinkedIn Corporation. All Rights Reserved. 29 Align on Metrics Know why something matters. §  True North Metrics: –  Track the health of a product. –  Usually affected by many aspects of the product; not just relevance. –  May not be measurable on short-time spans. §  e.g., Revenue in a subscription funnel. §  Signpost Metrics: –  Leading indicators of health for a true north. –  Measurable, often via an A/B test. §  Relevance Metrics: –  You have an optimization problem, this is what it optimizes. –  Rare that you can directly optimize your true north metric.