Big Data @ Bukalapak
Ibrahim Arief – @ibamarief
VP of Engineering – Bukalapak
SARCCOM & BCA Tech Talk – October 2017
Strictly Confidential
Short Intro – Speaker
• VP of Engineering – Bukalapak (ID)
• 2016 – present
• Engineering Lead – bol.com (NL)
• 2014 – 2016
• Comp Sci PhD dropout – NTNU Gjøvik (NO)
• 2013 – 2015
Short Intro – Bukalapak
• One of the largest e-marketplaces in Southeast Asia
• 15 million users, 1 Trillion IDR/month
• 900+ Total Employees
• 350+ in Product Development Group
• 120+ in Product (PM, UX, UI, DS, QAT)
• 200+ in Engineering (FE, BE, QAE, MOB, AI)
• 30+ in Technology (SRE, SysEng)
• 20+ Product Development Teams
How big is our Big Data?
Billions of data points per day
(censored, sorry)
Since 2014, we…
>1.5 PB of data
• Hundreds of millions of images
• Hundreds of millions of products
• Hundreds of millions of messages
(current size, predicted to triple every year)
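To make "triple every year" concrete, here is a minimal arithmetic sketch. It assumes the 1.5 PB lower bound from the slide as the starting point and a constant 3x yearly factor; the function name and the three-year horizon are illustrative choices, not figures from the talk.

```python
def projected_size_pb(current_pb: float, years: int, factor: float = 3.0) -> float:
    """Compound growth: size after `years` if data grows by `factor` each year."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return current_pb * factor ** years


# Lower bound taken from the slide (">1.5 PB"); everything else is an assumption.
BASELINE_PB = 1.5

for years in range(4):
    print(f"+{years}y: {projected_size_pb(BASELINE_PB, years):.1f} PB")
```

Under these assumptions the cluster would need to hold roughly 40 PB after three years, which is why the storage layer has to be planned for growth rather than for current size.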
How do we handle all that data?
i.e. our Big Data Architecture
Old (2016): data fragmentation, hard to do data crunching for AI
New (2017): data lake & warehouse, robust pipeline for AI & Analytics
• 1 PB Elastic Cluster
• Small 192-core Spark Cluster
What do we use all that data for?
(hint: not just for generating business reports)
Realtime high-level health insight
Fast awareness if releases are unhealthy, enabling rapid reaction & mitigation
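The slides do not say how release health is computed. As one possible sketch, a check could compare the error rate in a window after a release against a baseline window before it. The `Window` type, the 2x ratio, the traffic minimum, and the rate floor below are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request and error counts over a time window (hypothetical shape)."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def release_is_unhealthy(baseline: Window, after: Window,
                         max_ratio: float = 2.0,
                         min_requests: int = 1000,
                         rate_floor: float = 0.001) -> bool:
    """Flag a release whose error rate exceeds `max_ratio` times the baseline."""
    if after.requests < min_requests:
        return False  # too little traffic to judge either way
    # Floor the baseline so a near-zero baseline does not trigger on noise.
    reference = max(baseline.error_rate, rate_floor)
    return after.error_rate > reference * max_ratio


before = Window(requests=100_000, errors=200)  # 0.2% errors
print(release_is_unhealthy(before, Window(requests=5_000, errors=50)))  # 1.0% errors
```

A check like this is only useful if it runs on fresh data, which is where a realtime pipeline matters: the sooner the post-release window fills, the sooner a bad release can be rolled back.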
Old Recommender AI – Similarity-Based Search
Showing boring similar products
New Recommender AI – Crunching 1.2B Monthly Views
• Showing inspirational alternatives
• Data-driven, A/B tested
• Amazing incremental growth
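The talk does not name the algorithm behind the new recommender. One common way to turn view logs into recommendations is item co-occurrence ("viewed together in the same session"), sketched below in plain Python on toy data. This is an assumption for illustration, not Bukalapak's method; at 1.2B monthly views the same counting would run as a distributed job rather than in memory.

```python
from collections import Counter, defaultdict
from itertools import permutations


def co_view_counts(sessions):
    """Count how often each ordered pair of products appears in the same session."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        # set() so repeated views of one product in a session count once
        for a, b in permutations(set(viewed), 2):
            counts[a][b] += 1
    return counts


def recommend(counts, product_id, k=3):
    """Top-k co-viewed products; ties are broken by product id for determinism."""
    neighbours = counts.get(product_id, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:k]]


# Toy sessions with hypothetical product ids.
sessions = [
    ["phone", "case", "charger"],
    ["phone", "case"],
    ["phone", "headset"],
]
print(recommend(co_view_counts(sessions), "phone", k=2))
```

Unlike similarity-based search, co-view counts can surface products that look nothing like the one being viewed, which fits the "inspirational alternatives" framing on the slide.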
Wrap Up
• Big Data @ Bukalapak: >1.5 PB
• Big Data: not just for business reports
• Big Data for realtime health insight can save $$$
• Big Data for AI can and does generate $$$
• We’re hiring! Check out careers.bukalapak.com
