How to do research at a large IT company
Alexander Borzunov
Who am I?
Alexander Borzunov
• Researcher at Yandex
• NEERC ICPC 2017 prize winner
• Bachelor’s at Ural FU
• Master’s at HSE University + Yandex School of Data Analysis
Plan
• Why do companies need research?
• What do researchers do?
• How to get there?
Why do companies need research?
Product development:
• Developers address user feedback and business needs
• No time to dive deeply into a problem (e.g., to invent a new algorithm)
Research:
• Experts work full-time on problems from a particular area
• Necessary for innovation in the long term
How is it different from research at universities?
Research in companies:
• More funding
• Access to more compute
• Interaction with product teams
Many breakthroughs in modern computer science are made by companies
What do researchers do?
• Follow the latest findings and results (e.g., on Twitter)
• Choose promising research directions
• Collaborate with each other
• Conduct experiments (you need to write code quickly to evaluate many ideas)
• Design rigorous proofs
What do researchers do?
If the method works:
• Write a paper for an (international) conference
• Defend it in a discussion with reviewers
• If accepted:
  • Travel to a conference ✈️
  • Tell the world about it on Twitter, Reddit, Habr, etc. 🌎
• Your results may be adopted by product teams
Yandex Research
• Focus: machine learning and related algorithms
  • Computer vision, image generation
  • Language processing
  • Program synthesis with neural nets (e.g., trained on Codeforces solutions)
  • Systems for distributed training
  • Theory, e.g., continuous optimization
• Publications in top venues such as NeurIPS, ICML, CVPR, and ACL
Collaboration with product teams
• Self-driving and robotics
• Voice assistants
Yandex Research
• Joint labs with paid programs for Master’s/PhD students:
• Collaborations:
How did I get there?
2014 – 2018 Bachelor’s at Ural FU, participated in ICPC
▎ “What’s next?”
▎ “Machine learning – a growing field”
Machine learning on “Cats vs. Dogs”
• 2007: No known method reached 60% accuracy (random guessing gives 50%)
• 2014: Solved with 98% accuracy
• 2019: Neural nets can draw cats and dogs themselves (this cat does not exist)
Machine learning in 2021
• 2019: Neural nets can draw cats and dogs themselves
• 2021: Neural nets draw pictures matching any text description
How did I get there?
2018 – 2020 Master’s at HSE University + Yandex School of Data Analysis
▎ “Self-driving – a product that may change everyday life”
2019 – 2021 Research Engineer at Yandex Self-Driving
▎ “Research – a place where people invent new things”
2021 – Now Yandex Research
What do I do?
• The compute needed to train the latest neural nets grows quickly
• Popular training methods are designed for high-performance clusters
• A cluster for training GPT-3 costs over $250 million
• Such hardware is hard to get if you are at a university or a startup
• Solution: distributed training over the Internet (like BitTorrent)
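Why can training be split across peers at all? A minimal sketch of the data-parallel principle behind it (not the actual system from the talk): each peer computes a gradient on its own equal-sized data shard, and averaging those gradients gives exactly the full-batch gradient. The toy model (1-D linear regression), the data, and the helper names `grad_mse`, `average`, and `train` are hypothetical, chosen for illustration.

```python
# Toy model: 1-D linear regression y ≈ w * x, trained with mean squared error.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def average(values):
    return sum(values) / len(values)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

# Split the data into equal-sized shards, one per (hypothetical) peer.
shards = [(xs[i:i + 2], ys[i:i + 2]) for i in range(0, len(xs), 2)]


def train(shards, w=0.0, lr=0.01, steps=100):
    for _ in range(steps):
        # On a real network, each peer would compute its gradient independently.
        peer_grads = [grad_mse(w, sx, sy) for sx, sy in shards]
        # Peers then average their gradients and apply the same update.
        w -= lr * average(peer_grads)
    return w


full = grad_mse(0.5, xs, ys)
averaged = average([grad_mse(0.5, sx, sy) for sx, sy in shards])
print(full, averaged)  # equal up to floating-point rounding
print(train(shards))   # approaches the least-squares solution sum(x*y) / sum(x*x)
```

Equal shard sizes matter here: with unequal shards, a plain average of per-shard means would need to be weighted by shard size.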
First use case: Language models
• Training one large neural net lets you solve many tasks:
  • Understanding intents, tone, and logical relations in a sentence
  • Answering questions
  • Extracting entities (locations, persons, etc.)
• Once trained, it is easy to use for your business or research
First use case: Language model for Bengali
• A top-6 language by number of native speakers
• No good model yet
• We invited people to train one together!
Together with:
• Result: a competitive model, state-of-the-art on some tasks
Roadblock to scaling: Security
• To train a neural net, you need to average computations (e.g., gradients) performed by peers on different data samples
• A troll or competitor may ruin the model by sending wrong values even once
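A toy illustration of why a plain average is fragile: a single peer can move the mean arbitrarily far, so one bad message is enough to wreck an update. The values and the helper `mean` are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)


honest = [0.9, 1.1, 1.0, 0.95, 1.05]  # made-up gradient values from honest peers
attacker = 1e6                        # a single malicious peer sends a huge value

print(mean(honest))               # close to 1.0
print(mean(honest + [attacker]))  # about 166,667: the attacker dominates

# One SGD step with the poisoned average throws the weight far off.
w, lr = 0.5, 0.01
print(w - lr * mean(honest + [attacker]))
```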
Secure distributed training
Idea #1: Clip outliers among computations (this does not hurt training if done right)
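One well-known way to clip outliers is centered clipping: start from a reference point v, clip every peer's deviation from v to a radius tau, move v by the average clipped deviation, and repeat. The sketch below is scalar for brevity and is not necessarily the exact procedure from the talk; the radius `tau`, the iteration count, and starting from the median are illustrative assumptions.

```python
import statistics


def centered_clip(values, tau=1.0, iters=20):
    """Robust aggregate of scalar values via centered clipping.

    Each value's deviation from the current estimate v is clipped to
    [-tau, tau], and v moves by the average clipped deviation.
    """
    v = statistics.median(values)  # robust starting point (an assumption)
    for _ in range(iters):
        clipped = [max(-tau, min(tau, x - v)) for x in values]
        v += sum(clipped) / len(values)
    return v


honest = [0.9, 1.1, 1.0, 0.95, 1.05]  # made-up values
attacker = 1e6

print(centered_clip(honest))               # about 1.0, same as the plain mean
print(centered_clip(honest + [attacker]))  # about 1.2, instead of ~166,667
```

When all honest values lie within tau of the estimate, nothing is clipped and the result equals the ordinary mean, which is why clipping need not hurt training. The attacker's pull is capped: here it can shift the result by at most tau divided by the number of honest peers (1/5).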
Idea #2:
• Peers broadcast hashes of their computation results.
• Then, the system selects “policemen” to validate the results of some peers.
• If a policeman accuses someone, the hashes reveal who is right.
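A minimal sketch of the commit-then-verify idea, assuming each peer's computation is deterministic so a validator can recompute it exactly. SHA-256 from Python's standard `hashlib` serves as the commitment; the peer names and the helpers `compute_gradient`, `commit`, and `verify` are hypothetical. For simplicity, the policeman here checks every peer, whereas the slide describes checking only some of them; the real protocol also differs in how policemen are chosen and how disputes are settled.

```python
import hashlib
import json


def compute_gradient(w, batch):
    """Deterministic toy computation a peer performs on its assigned batch."""
    return [round(2 * (w * x - y) * x, 6) for x, y in batch]


def commit(result):
    """SHA-256 hash of a result; peers broadcast this commitment."""
    return hashlib.sha256(json.dumps(result).encode()).hexdigest()


def verify(w, batch, claimed_hash):
    """A 'policeman' recomputes the peer's work and checks it against the hash."""
    return commit(compute_gradient(w, batch)) == claimed_hash


w = 0.5
assignments = {  # hypothetical peers and their data batches
    "alice": [(1.0, 2.1), (2.0, 3.9)],
    "mallory": [(3.0, 6.2), (4.0, 8.1)],
}

# Each peer broadcasts a hash of the result it claims to have computed.
broadcast = {
    "alice": commit(compute_gradient(w, assignments["alice"])),  # honest
    "mallory": commit([1e6, 1e6]),                               # tampered result
}

# Policemen re-check peers; anyone whose hash does not match is banned.
banned = {peer for peer, h in broadcast.items()
          if not verify(w, assignments[peer], h)}
print(banned)  # {'mallory'}
```

Because the hash is broadcast before anyone is checked, a peer cannot change its story after being accused.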
Result: We ban offenders and quickly recover training progress
Thank you!
Check out our publications and available positions at research.yandex.com
I am available for a chat or questions at the Yandex area on the 3rd floor terrace until 7 pm 🙂

How to do science in a large IT company (ICPC World Finals 2021, Moscow)
