100% Serverless big data scale production Deep Learning System
hoondong kim
- Big Data Scale Deep Learning Training System (with GPU Docker PaaS on Azure Batch AI)
- Deep Learning Serving Layer (with Auto Scale Out Mode on Web App for Linux Docker)
- BigDL, Keras, TensorFlow, Horovod, TensorflowOnAzure
6. Today's presentation
1. Big Data: 3 elements + 2 goals
2. What is Data Science?
3. Data Science @ LinkedIn
- Data Product: People You May Know
- Data Analytics: Skills
4. Conclusion
40. + People You May Know?
Link prediction problem on the social network graph
41. + People You May Know - HowTo
1. Train a machine learning model using existing user data
Model Training
http://www.vorterix.com/malditosnerds/notas/4918/los-creadores-de-siri-preparan-algo-especial.html
42. + People You May Know - HowTo
2. Generate recommendation data through a Hadoop flow
43. + People You May Know - HowTo
3. Recommend to users
44. + People You May Know - HowTo
4. New data is generated
56. Hypothesis: All the cool companies in Silicon Valley are in the north, and all the boring ones are in the south?
San Francisco
Mountain View
San Jose
Redwood City
58. Real hypothesis: The distribution of companies in Silicon Valley resembles the 7-layer OSI network model.
San Francisco
Mountain View
San Jose
Redwood City
60. San Francisco
San Jose
Redwood City
Mountain View
Application
Presentation
Network & Transport
Data Link & Physical
Imaginary conversation: I'm collecting user log data and I've finished setting up a Hadoop cluster. Now I just want to do "something interesting" with big data.
How can you do "something interesting" with big data?
Disclaimer: This presentation is based on public research and presentations from LinkedIn. However, the opinions presented here are mine and may differ from LinkedIn's official stance.
2:05
Definition of Big Data
Very large sets of data that are produced by people using the internet, and that can only be stored, understood, and used with the help of special tools and methods
Source: Cambridge Dictionary
3 elements of big data
No need to go into detail on each element here (the following slides cover them)
Element 1: very large data set
Element 2: Tools
Exponential
Element 3: Methodology = Data Science
Imaginary conversation: I'm collecting user log data and I've finished setting up a Hadoop cluster. Now I just want to do "something interesting" with big data.
How can you do "something interesting" with big data?
Methodology is missing!
7:08
What is data science?
3 elements of big data
No need to go into detail on each element here (the following slides cover them)
Why is it science?
Hypothesis & Model building
A/B Testing
Mention offline testing too?
A/B Testing: Obama election campaign
A/B Testing: Google "40 shades of blue"
Accept or reject the hypothesis
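The accept-or-reject step of an A/B test can be sketched as a two-proportion z-test. This is a minimal illustration, not the method used in the campaigns mentioned above: the function name, the conversion counts, and the 0.05 threshold are all assumptions made up for the example.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B convert equally."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 200 of 5000 users converted on A, 260 of 5000 on B.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
reject_null = p < 0.05  # assumed significance level
print(f"z={z:.2f}, p={p:.4f}, reject H0: {reject_null}")
```

Strictly speaking the test either rejects the "no difference" hypothesis or fails to reject it; a small p-value is what supports the hypothesis that the new variant behaves differently.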
14:17
313 million LinkedIn users
LinkedIn's Data Products
Big Data Ecosystem: Big Data Product -> User Interaction Data -> Hadoop Cluster -> Key/Value Storage
Open source projects used by the LinkedIn data team
Knowledge of the analytics/modeling layer is kept separate from knowledge of the infrastructure layer
18:17
PYMK: Link prediction on a social network
PYMK: Train the machine learning model using existing connection data
PYMK: Hadoop workflow
PYMK: Serving data to users
Users' reactions become the new input data
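To make "link prediction" concrete, here is a small sketch that scores unconnected pairs by how many connections they share. It deliberately swaps in the common-neighbors heuristic for the trained machine learning model the slides describe, and the toy graph and function name are assumptions made up for illustration.

```python
from itertools import combinations

def common_neighbor_scores(graph):
    """Score each unconnected pair by its number of shared connections.

    graph: dict mapping member -> set of connected members (undirected).
    """
    scores = {}
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # already connected, nothing to predict
        shared = len(graph[u] & graph[v])
        if shared:
            scores[(u, v)] = shared
    return scores

# Hypothetical connection data.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
    "eve": set(),
}
scores = common_neighbor_scores(graph)
# Highest-scoring pairs would be shown first as recommendations.
ranked = sorted(scores.items(), key=lambda kv: -kv[1])
print(ranked)
```

In the loop described above, whether a member accepts or ignores a suggested pair would be logged and fed back as new training data.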
How PYMK has changed since 2008
21:19
Can we use organizational overlap in PYMK?
Using the scientific method
The longer two users were in the same organization, the higher the probability that they know each other
The larger the organization, the lower the probability that its members know each other
Model of organizational overlap
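The two hypotheses above can be captured by a toy score that rises with shared time and falls with organization size. This is only a sketch under stated assumptions: the exponential form, the function name, and the `rate` constant are invented here and are not the model from the talk.

```python
import math

def overlap_score(overlap_years, org_size, rate=50.0):
    """Toy probability-like score that two members know each other.

    Increases with overlap_years and decreases with org_size.
    `rate` is a made-up scaling constant, not a fitted parameter.
    """
    if overlap_years <= 0 or org_size <= 0:
        return 0.0
    return 1.0 - math.exp(-rate * overlap_years / org_size)

# Hypothetical comparisons.
small_long = overlap_score(3, 100)     # 3 years together, 100-person org
small_short = overlap_score(1, 100)    # 1 year together, 100-person org
large_short = overlap_score(1, 10000)  # 1 year together, 10000-person org
print(small_long, small_short, large_short)
```

A score like this would be one input feature among many; whether it actually improves recommendations is what the A/B test is for.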
Organizational overlap experiment: A/B testing
Organizational overlap: Hypothesis accepted
Hypothesis: All the cool companies are in the north of Silicon Valley, while the companies in the south are boring? (joke)
Real hypothesis: The distribution of companies in Silicon Valley resembles the 7-layer OSI network model
Methodology we used to extract skills by region of Silicon Valley
31:00
Imaginary conversation: I'm collecting user log data and I've finished setting up a Hadoop cluster. Now I just want to do "something interesting" with big data.
3 elements of big data
No need to go into detail on each element here (the following slides cover them)
Why is it science?
Action Item 1: Don't forget that setting up the hypothesis must be done by a human
Action Item 2: Review statistics
Action Item 3: Be aware that data products are everywhere
Action Item 4: Trial and error: lots of iteration is the key