SlideShare a Scribd company logo
1 of 27
Download to read offline
Improvement for
StackOverflow.com
Chentao Zhang
Insight Data Engineering SV
Motivations
Motivations
1.Tag right?
2. When will be answered?
~Help users to tag their questions on
stackoverflow.com more properly
DEMO:
www.stackoverflowtags.tech
Data Pipeline
Historical
data(60G)
Streaming data
Input Data
Question:
{
“post_id”:67172,
“post_date”:”6-10-2015-00-01-02”,
“type”:0,
“parent_id”:0
“tiltle”:” Java Exception”,
“body”:”....”,
“tags”:“java;algorithm”,
“user_id”:782,
…
}
Answer:
{
“post_id”:67172,
“post_date”:”6-10-2015-00-01-23”,
“type”:1,
“parent_id”:67172
“tiltle”:” “,
“body”:”You should....”,
“tags”:“”,
“user_id”:1982,
…
}
Query #1
• Prob. of a question labeled with specific tag(such as tag A) will be
answered in 10 mins
= number of questions answered in 10 mins and tagged with A/ total number of questions tagged with A
question_id tags answer_time(sec) posted_at
231 Java 3010 2016_01_02_21_20_01
290 spark 7381 2016_01_02_22_09_01
341 Java 5611 2016_01_10_01_02_05
Query #1
• Prob. of a question labeled with specific tag(such as tag A) will be
answered in 10 mins
= number of questions answered in 10 mins and tagged with A/ total number of questions tagged with A
question_id tags answer_time(sec) posted_at
231 Java 3010 2016_01_02_21_20_01
290 spark 7381 2016_01_02_22_09_01
341 Java 5611 2016_01_10_01_02_05
Query #1
• Prob. of a question labeled with specific tag(such as tag A) will be
answered in 10 mins
= number of questions answered in 10 mins and tagged with A/ total number of questions tagged with A
Shard
Shard Shard
Shard
Request
Query #1
Shard
Shard Shard
Shard
0.5|2000
0.3|2000
0.1|1660
(Prob.|sample size)
0.2|2000
Request
Query #1
Shard
Shard Shard
Shard
0.5|2000
0.3|2000
0.1|1660
0.2|2000
1660*0.1+2000*0.2+2000*0.5+2000*0.3)/
(2000*3+1660)=0.28
Request
Query #1
Query #2
• Recommend a list of tags which are similar to a
specific tag and sorted by their similarity to the tag
spark/tags_graph:
{
“id”:261
”tags”:[”kdc:lat”,
“java”,
“hadoop:java”,
“apache”,
“linux:OS”]
}
*vertice:tags
*weight of vertice: number of people who have answered questions labeled with this tags
*weight of edge: number of people who have answered questions for both tags
* Similarity of A to B = WAB /WA
Query #2
• User entered tagA
Query #2
tag A
tag C tag B
tag D
3
2
5
5 10
3
7
• User entered tagA
• Search all neighbors
of tagA and compute
their similarity to A.
Query #2
tag A
tag C tag B
tag D
3
2
5
5 10
3
7
SC->A=3/5=0.6
SD->A=2/3=0.67
SB>A=5/10=0.5
• User entered tagA
• Search all neighbors
of tagA and compute
their similarity to A.
• Sort B,C,D by their
similarity to A
Query #2
tag A
tag C tag B
tag D
3
2
5
5 10
3
7
SC->A=3/5=0.6
SD->A=2/3=0.67
SB>A=5/10=0.5
• User entered tagA
• Search all neighbors of
tagA and compute their
similarity to A.
• Sort B,C,D by their
similarity to A
• Give result to user
Query #2
tag A
tag C tag B
tag D
3
2
5
5 10
3
7
SC->A=3/5=0.6
SD->A=2/3=0.67
SB>A=5/10=0.5
Challenges and Future Considerations
• Streaming processing to update information
• Process big data
• Scale up the performance of sorting in graph
About Me
• Chentao(Sam) Zhang
• MS in Electrical & Computer
Engineering from University of
Delaware
• Passionated to learn and try
new things
Query #1
tags
Prob. of being answered in
10 mins Avg time(sec)
Java 0.32 1200
spark 0.013 31000
tags
Prob. of being answered in
10 mins Avg time(sec)
Java 0.32 1200
spark 0.013 31000
Query #1
Challenges
~Process big data
~Performance of updating batch of
data
tags
Prob. of being answered in
10 mins Avg time(sec)
Java 0.32 1200
spark 0.013 31000
question_id tags answer_time(sec) posted_at
231 Java 3010 2016_01_02_21_20_01
290 spark 7381 2016_01_02_22_09_01
341 Java 5611 2016_01_10_01_02_05
Query #1
tags
Prob. of being answered in
10 mins Avg time(sec)
Java 0.32 1200
spark 0.013 31000
question_id tags answer_time(sec) posted_at
231 Java 3010 2016_01_02_21_20_01
290 spark 7381 2016_01_02_22_09_01
341 Java 5611 2016_01_10_01_02_05
Query #1
Data Modelling
~Good at searching
~Graph engine
Index: spark
Type:tags_time Type:tags_graph
Shard
Shard Shard
Shard
Query #1
Elasticsearch cluster
spark/tags:
{
”tags”:”kdc:latitude-longitude",
“aws_in_time”:2985,
“aws_no_in_time”:3023,
“num_aws”:29795,
“tol_time":3234324,
“num_no_aws”:796
}
~Efficiency
~Data update
* Response time: the time from each question being posted to getting its first answer
* average response time=tot_time/num_aws
* Prob. of being answered in 10 mins=aws_in_time/(num_aws+num_no_aws)

More Related Content

Similar to Sam zhang demo

MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query PitfallsMongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query PitfallsMongoDB
 
How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...
How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...
How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...MongoDB
 
ITB2019 Easy ElasticSearch with cbElasticSearch - Jon Clausen
ITB2019 Easy ElasticSearch with cbElasticSearch - Jon ClausenITB2019 Easy ElasticSearch with cbElasticSearch - Jon Clausen
ITB2019 Easy ElasticSearch with cbElasticSearch - Jon ClausenOrtus Solutions, Corp
 
Template-based Modular Architecture
Template-based Modular ArchitectureTemplate-based Modular Architecture
Template-based Modular Architecturegenify
 
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooQuerying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooAll Things Open
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsMongoDB
 
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB
 
Relational_Intro_1.ppt
Relational_Intro_1.pptRelational_Intro_1.ppt
Relational_Intro_1.pptIrfanRashid36
 
Tips and Tricks for Avoiding Common Query Pitfalls Christian Kurze
Tips and Tricks for Avoiding Common Query Pitfalls Christian KurzeTips and Tricks for Avoiding Common Query Pitfalls Christian Kurze
Tips and Tricks for Avoiding Common Query Pitfalls Christian KurzeMongoDB
 
MongoDB World 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB World 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB World 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB World 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB
 
Demystifying NoSQL - All Things Open - October 2020
Demystifying NoSQL - All Things Open - October 2020Demystifying NoSQL - All Things Open - October 2020
Demystifying NoSQL - All Things Open - October 2020Matthew Groves
 
PostgreSQL Open SV 2018
PostgreSQL Open SV 2018PostgreSQL Open SV 2018
PostgreSQL Open SV 2018artgillespie
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling databaseMUG Perú
 
Build 2017 - B8002 - Introducing Adaptive Cards
Build 2017 - B8002 - Introducing Adaptive CardsBuild 2017 - B8002 - Introducing Adaptive Cards
Build 2017 - B8002 - Introducing Adaptive CardsWindows Developer
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0Keshav Murthy
 

Similar to Sam zhang demo (20)

Sam zhang demo
Sam zhang demoSam zhang demo
Sam zhang demo
 
Into The Box 2018 cbelasticsearch
Into The Box 2018   cbelasticsearchInto The Box 2018   cbelasticsearch
Into The Box 2018 cbelasticsearch
 
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query PitfallsMongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
MongoDB.local Seattle 2019: Tips & Tricks for Avoiding Common Query Pitfalls
 
How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...
How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...
How to Avoid Common Data Visualization Pitfalls and Being Led Astray By Your ...
 
ITB2019 Easy ElasticSearch with cbElasticSearch - Jon Clausen
ITB2019 Easy ElasticSearch with cbElasticSearch - Jon ClausenITB2019 Easy ElasticSearch with cbElasticSearch - Jon Clausen
ITB2019 Easy ElasticSearch with cbElasticSearch - Jon Clausen
 
Template-based Modular Architecture
Template-based Modular ArchitectureTemplate-based Modular Architecture
Template-based Modular Architecture
 
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooQuerying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
 
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
 
Relational_Intro_1.ppt
Relational_Intro_1.pptRelational_Intro_1.ppt
Relational_Intro_1.ppt
 
Tips and Tricks for Avoiding Common Query Pitfalls Christian Kurze
Tips and Tricks for Avoiding Common Query Pitfalls Christian KurzeTips and Tricks for Avoiding Common Query Pitfalls Christian Kurze
Tips and Tricks for Avoiding Common Query Pitfalls Christian Kurze
 
MongoDB World 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB World 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB World 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB World 2018: Tips and Tricks for Avoiding Common Query Pitfalls
 
Demystifying NoSQL - All Things Open - October 2020
Demystifying NoSQL - All Things Open - October 2020Demystifying NoSQL - All Things Open - October 2020
Demystifying NoSQL - All Things Open - October 2020
 
PostgreSQL Open SV 2018
PostgreSQL Open SV 2018PostgreSQL Open SV 2018
PostgreSQL Open SV 2018
 
Gab document db scaling database
Gab   document db scaling databaseGab   document db scaling database
Gab document db scaling database
 
Build 2017 - B8002 - Introducing Adaptive Cards
Build 2017 - B8002 - Introducing Adaptive CardsBuild 2017 - B8002 - Introducing Adaptive Cards
Build 2017 - B8002 - Introducing Adaptive Cards
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
 
TechKlout
TechKloutTechKlout
TechKlout
 
Why no sql
Why no sqlWhy no sql
Why no sql
 

Recently uploaded

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesPrabhanshu Chaturvedi
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 

Recently uploaded (20)

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 

Sam zhang demo