Sharing about how Lazada overcame its challenges with scaling and with building a proper data culture, at the Big Data and Analytics Innovation Summit Singapore 2018
INSEAD Sharing on Lazada Data Science and my JourneyEugene Yan Ziyou
Sharing about how Lazada applies data science to improve customer and seller experience, and my personal journey to my current role in Lazada as Data Science Lead, VP
How Lazada ranks products to improve customer experience and conversionEugene Yan Ziyou
Slides from sharing at Strata + Hadoop Singapore 2016 (http://conferences.oreilly.com/strata/hadoop-big-data-sg/public/schedule/detail/54542)
Ecommerce has enabled retailers to make all of their products available to consumers and consumers to access niche products not found in brick-and-mortar stores. This growth provides consumers with unparalleled choice. Nonetheless, the sheer number of products brings with it the challenge of helping users find relevant products with ease.
Lazada has tens of millions of products on its platform, and this number grows by approximately one million monthly. Lazada’s challenge: How can we help users easily discover good quality products they will like? How can we ensure product selection remains fresh and constantly updated?
One way to do this is through the ranking of products. Via ranking, Lazada helps customers easily find products that will delight them by ensuring these products appear in the first few pages. I’ll share how Lazada ranks products on our website. (Note: Google “how amazon ranks products” for some industry background)
Topics include how we:
* Develop methodology (and tricks) to solve not-so-well-defined problems
* Collect and store user-behavior data from our website and app
* Clean and prepare the data (e.g., handling outliers)
* Discover and create useful features
* Build models to improve customer experience and meet business objectives
* Measure and test outcomes on our website
* Build this end-to-end on our Hadoop infrastructure, with tools including Kafka and Spark
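As a concrete flavour of the data-preparation step above, here is a hedged, stand-alone sketch of one common way to handle outliers in user-behavior counts: winsorizing at a chosen percentile. The variable names and thresholds are illustrative, not Lazada's actual pipeline.

```python
def percentile(values, p):
    """Linear-interpolated percentile of `values` (0 <= p <= 100)."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def winsorize(values, upper_pct=99.0):
    """Cap values above the upper_pct percentile to tame extreme outliers."""
    cap = percentile(values, upper_pct)
    return [min(v, cap) for v in values]

# The last entry looks like bot-inflated traffic; winsorizing caps it.
daily_views = [3, 5, 4, 6, 2, 5, 4, 10_000]
cleaned = winsorize(daily_views, upper_pct=90.0)
```

Clipping rather than dropping keeps the row (and its other features) in the training set while bounding its influence.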
Deep Dive: Spark DataFrames, SQL and Catalyst OptimizerSachin Aggarwal
RDD recap
Spark SQL library
Architecture of Spark SQL
Comparison with Pig and Hive Pipeline
DataFrames
Definition of a DataFrames API
DataFrames Operations
DataFrames features
Data cleansing
Diagram for logical plan container
Plan Optimization & Execution
Catalyst Analyzer
Catalyst Optimizer
Generating Physical Plan
Code Generation
Extensions
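The plan-optimization steps in the outline above can be illustrated with a toy rule-based rewrite, in the spirit of (but much simpler than) Catalyst: a logical plan is a tree of operators, and an optimizer rule pattern-matches a subtree and rewrites it. The classes and rule below are made up for illustration.

```python
class Scan:
    def __init__(self, table):
        self.table = table

class Project:
    def __init__(self, child, cols):
        self.child, self.cols = child, cols

class Filter:
    def __init__(self, child, pred_col):
        self.child, self.pred_col = child, pred_col

def push_down_filter(plan):
    # Rule: Filter(Project(child)) -> Project(Filter(child)),
    # valid when the predicate's column survives the projection.
    if isinstance(plan, Filter) and isinstance(plan.child, Project):
        proj = plan.child
        if plan.pred_col in proj.cols:
            return Project(Filter(proj.child, plan.pred_col), proj.cols)
    return plan

logical = Filter(Project(Scan("events"), ["user_id", "price"]), "price")
optimized = push_down_filter(logical)
# After the rewrite the Filter sits directly on the Scan, so fewer rows
# flow upward before a physical plan is generated.
```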
These slides were presented by Mateusz Dymcyzk at our Sydney AI and deep learning meetup. The authors were Sudalai Rajkumar (SRK), Data Scientist at H2O.ai, and Nikhil Shekhar, ML Engineer at H2O.ai.
Extending Spark SQL API with Easier to Use Array Types Operations with Marek ...Databricks
Big companies typically integrate data from various heterogeneous systems when building a data lake as a single point for accessing data. To achieve this goal, technical teams often deal with data defined by complex schemas and various data formats. Spark SQL Datasets are currently compatible with data formats such as XML, Avro and Parquet, providing primitive and complex data types such as structs and arrays.
Although the Dataset API offers a rich set of functions, general manipulation of arrays and deeply nested data structures is lacking. We will demonstrate this by providing examples of data that are currently very hard to process efficiently in Spark. We designed and developed an extension of the Dataset API to allow developers to work with array and complex-type elements in a more straightforward and consistent way. The extension should help users dealing with complex, structured big data to use Apache Spark as a truly generic processing framework.
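Spark aside, the kind of operation the abstract calls hard (updating an element inside an array of structs) is easy to state in plain Python; the record shape below is made up purely to illustrate what "manipulating nested arrays" means here, and is not the talk's Spark API.

```python
# A record with an array of nested structs, as it might land from Avro/Parquet.
record = {
    "order_id": 1,
    "items": [
        {"sku": "a", "qty": 2, "price": 9.0},
        {"sku": "b", "qty": 1, "price": 5.0},
    ],
}

def apply_discount(rec, pct):
    """Return a copy of the record with every nested item's price discounted."""
    return {
        **rec,
        "items": [{**item, "price": item["price"] * (1 - pct)}
                  for item in rec["items"]],
    }

discounted = apply_discount(record, 0.10)
```

Expressing exactly this per-element update declaratively over a Dataset column is the gap the proposed extension targets.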
Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data PlatformsAnant Corporation
During this lunch, we’ll review open-source reverse ETL tools to uncover how to send data back to SaaS systems.
Sign Up For Our Newsletter: http://eepurl.com/grdMkn
Join Data Engineer’s Lunch Weekly at 12 PM EST Every Monday:
https://www.meetup.com/Data-Wranglers-DC/events/
Cassandra.Link:
https://cassandra.link/
Follow Us and Reach Us At:
Anant:
https://www.anant.us/
Awesome Cassandra:
https://github.com/Anant/awesome-cassandra
Email:
solutions@anant.us
LinkedIn:
https://www.linkedin.com/company/anant/
Twitter:
https://twitter.com/anantcorp
Eventbrite:
https://www.eventbrite.com/o/anant-1072927283
Facebook:
https://www.facebook.com/AnantCorp/
Join The Anant Team:
https://www.careers.anant.us
#data #dataengineering #datagovernance
"Structured Streaming was a new streaming API introduced to Spark over 2 years ago in Spark 2.0, and was announced GA as of Spark 2.2. Databricks customers have processed over a hundred trillion rows in production using Structured Streaming. We received dozens of questions on how to best develop, monitor, test, deploy and upgrade these jobs. In this talk, we aim to share best practices around what has worked and what hasn't across our customer base.
We will tackle questions around how to plan ahead, what kind of code changes are safe for structured streaming jobs, how to architect streaming pipelines which can give you the most flexibility without sacrificing performance by using tools like Databricks Delta, how to best monitor your streaming jobs and alert if your streams are falling behind or are actually failing, as well as how to best test your code."
An in-depth presentation on the WAND top-k retrieval algorithm for efficiently finding the top-k relevant documents for a given query from the inverted index. Compares performance of WAND with naive solutions.
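As a hedged, greatly simplified sketch of the WAND idea (candidate-at-a-time rather than the full pointer-based algorithm): each query term carries an upper bound, its maximum contribution to any document, and a candidate is fully scored only if the sum of upper bounds for its matching terms can beat the current top-k threshold. Data and scores below are made up.

```python
import heapq

def wand_topk(postings, max_score, query, k):
    """postings: term -> {doc_id: score}; max_score: term -> max contribution."""
    heap = []         # min-heap of (score, doc) holding the current top-k
    threshold = 0.0   # score of the k-th best document so far
    candidates = sorted({d for t in query for d in postings.get(t, {})})
    for doc in candidates:
        terms = [t for t in query if doc in postings.get(t, {})]
        upper = sum(max_score[t] for t in terms)
        if len(heap) == k and upper <= threshold:
            continue  # WAND-style skip: this doc cannot enter the top-k
        score = sum(postings[t][doc] for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            threshold = heap[0][0]
    return sorted(heap, reverse=True)

postings = {
    "spark": {1: 2.0, 2: 0.5, 4: 1.5},
    "sql":   {2: 1.0, 3: 0.4, 4: 1.2},
}
max_score = {t: max(d.values()) for t, d in postings.items()}
top = wand_topk(postings, max_score, ["spark", "sql"], k=2)
```

The naive alternative scores every candidate; here document 3 is skipped without scoring because its upper bound (0.4) cannot beat the running threshold.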
An introductory presentation about the current state of personalization in (Web) search for Bibliotekarforbundet's series of 'gå-hjem-møder'. Presented on May 17, 2016 at Aalborg University Copenhagen.
Predictive Analytics enables organisations to forecast future events, analyse risks and opportunities, and automate decision making processes by analysing historic data.
LuceneRDD for (Geospatial) Search and Entity Linkagezouzias
In this talk, I will present the design and implementation of LuceneRDD for Apache Spark. LuceneRDD instantiates an inverted index on each Spark executor and collects / aggregates search results from Spark executors to the Spark driver. The main motivation behind LuceneRDD is to natively extend Spark's capabilities with full-text search, geospatial search and entity linkage without requiring an external dependency of a SolrCloud or Elasticsearch cluster.
As a case study, we will show how LuceneRDD can tackle the entity linkage problem. We will demonstrate both the flexibility and efficiency of LuceneRDD for this problem. First, we will show that LuceneRDD's interface provides a highly flexible approach to entity linkage. This flexibility is due to Lucene's powerful query language, which can combine multiple full-text queries such as term, prefix, fuzzy and phrase queries. Second, we will focus on the efficiency and scalability of LuceneRDD by linking records between two relatively large datasets.
Lastly and time permitting, I will present ShapeLuceneRDD which enhances LuceneRDD with geospatial queries.
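The executor/driver split described above can be sketched in plain Python (this is a hypothetical analogy, not LuceneRDD's actual code): each "executor" holds a small inverted index over its partition of documents and searches locally, and the "driver" merges the partial results into a global top-k.

```python
import heapq
from collections import defaultdict

class PartitionIndex:
    """Stands in for the per-executor Lucene index."""
    def __init__(self, docs):
        # docs: {doc_id: text}; build a term -> doc_ids inverted index
        self.index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                self.index[term].add(doc_id)

    def search(self, term):
        # Local hits as (doc_id, score); a constant score stands in
        # for Lucene's real relevance scoring.
        return [(doc_id, 1.0) for doc_id in self.index.get(term.lower(), ())]

def driver_search(partitions, term, k=3):
    """Collect partial results from every partition, keep the global top-k."""
    merged = [hit for p in partitions for hit in p.search(term)]
    return heapq.nlargest(k, merged, key=lambda h: h[1])

partitions = [
    PartitionIndex({1: "apache spark search"}),
    PartitionIndex({2: "geospatial search", 3: "entity linkage"}),
]
results = driver_search(partitions, "search")
```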
Apache Spark Tutorial | Spark Tutorial for Beginners | Apache Spark Training ...Edureka!
This Edureka Spark Tutorial will help you to understand all the basics of Apache Spark. This Spark tutorial is ideal for both beginners as well as professionals who want to learn or brush up Apache Spark concepts. Below are the topics covered in this tutorial:
1) Big Data Introduction
2) Batch vs Real Time Analytics
3) Why Apache Spark?
4) What is Apache Spark?
5) Using Spark with Hadoop
6) Apache Spark Features
7) Apache Spark Ecosystem
8) Demo: Earthquake Detection Using Apache Spark
Molti p-value nella stessa analisi: necessità e metodi di correzione (Livi...Francesco Cabiddu
Slides from the seventh talk of the event held on 24 May 2013:
"A more conscious Statistics for better decisions.
A day of Methodology and Statistics for the Human Sciences."
Afternoon session: Statistics in Psychology Research.
Università degli studi di Cagliari, Dipartimento di Pedagogia, Psicologia e Filosofia.
TITLE: Many p-values in the same analysis: the need for correction and methods for it.
(L. Finos)
Università di Padova
ABSTRACT:
When analysing a dataset, it is common practice to postulate multiple experimental hypotheses. Answering them requires as many tests, each with an associated p-value. This is the typical case, for example, of two experimental groups compared on more than one scale, or of more than two groups compared pairwise on the same scale. In these cases the concept of Type I error must be extended to the multidimensional setting. The most widely accepted definitions are the FamilyWise Error Rate and the False Discovery Rate. The last three decades have seen a great number of methods flourish for controlling these two Type I error rates (in the multidimensional setting). This seminar will critically present and discuss the concepts above, along with the main methods for multiplicity control, and will briefly touch on future directions.
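A minimal pure-Python sketch of the two error-rate controls named in the abstract: Bonferroni for the FamilyWise Error Rate and Benjamini-Hochberg for the False Discovery Rate. The p-values are made up.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m: controls the FWER."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank i with
    p_(i) <= (i / m) * alpha: controls the FDR (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected

ps = [0.001, 0.008, 0.022, 0.041, 0.60]
fwer = bonferroni(ps)           # conservative: alpha / m = 0.01
fdr = benjamini_hochberg(ps)    # less conservative, rejects one more here
```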
Learning to rank (LTR) for information retrieval (IR) involves the application of machine learning models to rank artifacts, such as items to be recommended, in response to a user's need. LTR models typically employ training data, such as human relevance labels and click data, to discriminatively train towards an IR objective. The focus of this tutorial will be on the fundamentals of neural networks and their applications to learning to rank.
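To make the pairwise flavour of LTR concrete, here is a hedged, hand-rolled sketch (not the tutorial's neural models): a tiny linear scorer trained with a hinge loss so that, for each (relevant, non-relevant) document pair, the relevant one scores higher. Feature names and data are invented.

```python
def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise(pairs, n_features, lr=0.1, epochs=50):
    """pairs: list of (x_pos, x_neg); hinge loss max(0, 1 - (s_pos - s_neg))."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x_pos, x_neg in pairs:
            margin = score(w, x_pos) - score(w, x_neg)
            if margin < 1.0:  # pair mis-ordered or within the margin: update
                for j in range(n_features):
                    w[j] += lr * (x_pos[j] - x_neg[j])
    return w

# Toy features: [click_rate, position_bias] (illustrative names)
pairs = [([3.0, 0.2], [1.0, 0.9]), ([2.5, 0.1], [0.5, 0.8])]
w = train_pairwise(pairs, n_features=2)
```

A neural LTR model replaces the linear `score` with a network and the hinge with a loss like RankNet's, but the pairwise training signal is the same idea.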
A simple presentation about different big data stream-processing systems such as Spark, Samza, and Storm, the differences between their architectures and purposes, and streaming-layer tools such as Kafka and RabbitMQ. The presentation draws on this paper:
https://vsis-www.informatik.uni-hamburg.de/getDoc.php/publications/561/Real-time%20stream%20processing%20for%20Big%20Data.pdf and other useful links.
Feature engineering: the underdog of machine learning. This deck provides an overview of feature-generation methods for text, image, and audio, feature cleaning and transformation methods, how well they work, and why.
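Two of the simplest ideas in that space can be sketched in a few lines (illustrative only, not the deck's exact methods): bag-of-words counts for text, and a log transform to compress a heavy-tailed numeric feature.

```python
import math
from collections import Counter

def bag_of_words(texts):
    """Turn raw texts into count vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    vectors = []
    for t in texts:
        counts = Counter(t.lower().split())
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors

def log1p_feature(values):
    """Compress a heavy-tailed feature (e.g. view counts) with log(1 + x)."""
    return [math.log1p(v) for v in values]

vocab, X = bag_of_words(["cheap phone case", "phone charger"])
views_feature = log1p_feature([3, 40, 50_000])
```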
ANI | Business Agility Day @Gurugram | Are you a responsible Business | Dilje...AgileNetwork
Abstract:
In today's ever-changing environment, every business needs to deliver to its customers and stakeholders. Customers want the best value for money and convenience, and stakeholders want a return on their investment. Unless a business recognises this and is ready to embrace the changes needed, it will not be successful. That demands the utmost ability to respond to changes in the environment; it demands business agility.
Key Takeaways:
1. Why business agility is crucial in today's environment
2. How to be agile as a business
3. The role of technology in this agility
4. Common principles between Business Agility and Software Agility
5. Ingredients of business agility
Business and IT alignment through effective Project & Program Portfolio Manag...Alan Kan
Business and IT alignment through effective Project & Program Portfolio Management.
Presented at IBM Innovate 2011 in Sydney and Melbourne in Australia in July 2011.
Hypothesis-Driven Development & How to Fail-Fast Hacking GrowthPrabhat Gupta
Why startups need to fail-fast & experiment more
Framework for Faster Experimentation
Prioritizing the right, focused experiments
Setting up team, process, culture & KPIs
Tools and a holistic architecture enabling quick implementation by a lean team of frontend & backend developers
The Right Data Warehouse: Automation Now, Business Value ThereafterInside Analysis
The Briefing Room with Dr. Robin Bloor and WhereScape
Live Webcast on April 1, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=7b23b14b532bd7be60a70f6bd5209f03
In the Big Data shuffle, everyone is looking at Hadoop as “the answer” for collecting interesting data from a new set of sources. While Hadoop has given organizations the power to gather more information assets than ever before, the question still looms: which data, regardless of source, structure, volume and all the rest, are significant for affecting business value, and how do we harness them? One effective approach is to bolster the data warehouse environment with a solution capable of integrating all the data sources, including Hadoop, and automating delivery of key information into the right hands.
Register for this episode of The Briefing Room to hear veteran Analyst Robin Bloor as he explains how a rapidly changing information landscape impacts data management. He will be briefed by Mark Budzinski of WhereScape, who will tout his company’s data warehouse automation solutions. Budzinski will discuss how automation can be the cornerstone for closing the gap between those responsible for data management and the people driving business decisions.
Visit InsideAnalysis.com for more information.
The product development cycle for startups - everything from coming up with an idea, to validating it, building it, launching it, and measuring how well the thing you built performed against your hypothesis!
Future directives in ERP, ERP and Internet, critical success and failure factorsVarun Luthra
This ppt explains Future Directives in ERP, ERP and Internet, its critical success and failure factors, Hit 'Like' button if the ppt turns out to be useful for you in any way. Enjoy :)
Continuous auditing and monitoring (“continuous reviews”) has been discussed for decades but, based on recent surveys, implemented only in moderation. It comes down to how deeply data analytics are integrated into our audit processes in the first place, so that they can then become continuous. If a high degree of integration exists, then a good amount of continuous review is probably already happening in the organization.
However, most companies fall into the other camp and have not integrated analytics well enough or considered how to take full advantage of continuous reviews.
This course will explain culturally what audit departments must do to embrace continuous reviews and how that can be integrated with ACL Desktop software techniques. Sample files and scripts will be provided to get you started down the road to continuous reviews.
As regulatory changes sweep the globe, auditors, risk management, and compliance professionals are using more sophisticated tools and methods.
Using a live/video training library approach, we help companies of all sizes use audit and assurance software to improve business intelligence, increase efficiencies, identify fraud, test controls, and deliver bottom-line savings.
AuditNet and Cash Recovery Partners Webinar recording available at auditsoftwarevideos.com and AuditNet.tv (registration required) Recording free to view.
Sample Data Files for All Courses are available for $49
To purchase access to all sample data files, Excel macros and ACL scripts associated with the free training visit AuditSoftwareVideos.
By focusing on organizational enablers and robust software engineering practices, e-commerce companies can shorten the development lifecycle, outmaneuver the competition and remain relevant in the eyes of customers.
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
NEWYORKSYSTRAINING is dedicated to offering quality IT online training and comprehensive IT consulting services with a complete business-service delivery orientation.
[DSC Europe 22] The Making of a Data Organization - Denys HolovatyiDataScienceConferenc1
Data teams often struggle to deliver value. KPIs, data pipelines, or ML-driven predictions aren't inherently useful unless the data team enables the business to use them. Having worked on 37 data projects over the past 5 years, with total client revenue clocking in at about $350B, I started noticing simple success factors and summarized those in the Operating Model Canvas & the Value Delivery Process. With those, I branched out into what I call data organization consulting and help clients build their data teams for success, the kind you see not only on paper but also in your P&L. In this talk, I'll share some insights with you.
Similar to Data Science Challenges and Impact at Lazada (Big Data and Analytics Innovation Summit Singapore 2018)
Recommender Systems: Beyond the user-item matrixEugene Yan Ziyou
Recommendation systems. They're a pretty old topic that started way back in the 1990s.
A meetup on it sounds like it'll be boring... if we only talked about the standard user-item matrix collaborative filtering on big data systems.
Thankfully, for this meetup, we'll be sharing how we can adopt some more recent techniques to recommend products, including social media graphs (and random walks), sequences (and NLP), and PyTorch. The sharing will cover everything from data acquisition and preparation through the implementation of multiple techniques to a comparison of results. Some familiarity with Python and PyTorch would be useful; minimal math required.
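As a hedged toy sketch of the random-walk idea mentioned above (purely illustrative, not the meetup's implementation): on a user-item bipartite graph, short random walks starting from an item tend to land on related items, and visit counts serve as recommendation scores.

```python
import random
from collections import Counter, defaultdict

def build_graph(interactions):
    """interactions: (user, item) pairs -> bipartite adjacency lists."""
    user_items, item_users = defaultdict(list), defaultdict(list)
    for u, i in interactions:
        user_items[u].append(i)
        item_users[i].append(u)
    return user_items, item_users

def random_walk_recs(start_item, user_items, item_users, walks=200, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    visits = Counter()
    for _ in range(walks):
        item = start_item
        for _ in range(2):  # two (item -> user -> item) hops
            user = rng.choice(item_users[item])
            item = rng.choice(user_items[user])
        if item != start_item:
            visits[item] += 1
    return [i for i, _ in visits.most_common()]

interactions = [("u1", "phone"), ("u1", "case"), ("u2", "phone"),
                ("u2", "charger"), ("u3", "case")]
recs = random_walk_recs("phone", *build_graph(interactions))
```

Items co-purchased with "phone" surface first; production systems replace the counts with learned embeddings, but the graph intuition is the same.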
Healthcare expenditure is set to rise over the coming years. Cost will undoubtedly influence patients’ decision-making when it comes to diagnosis and treatment.
For healthcare providers, providing up-front cost estimates improves patient experience, making patients more willing to return (if required) in the future. For patients, having accurate pre-admission estimates allows for informed decisions and adequate preparation, reducing payment challenges after treatment. Ultimately, this case is a first step towards (i) standardization of healthcare cost estimation and (ii) price transparency to build trust between healthcare providers, payers, and patients.
OLX Group Prod Tech 2019 Keynote: Asia's Tech GiantsEugene Yan Ziyou
- Scaling across multiple properties while centralising capabilities
- How to decide what to centralise / decentralise?
- Alibaba & Grab: How do they scale across multiple commerce sites?
- SuperApps in China and Southeast Asia
- Why / why not go the SuperApp approach?
- WeChat & Grab: SuperApps of Asia
- Case Study: Alibaba’s playbook for integrating acquisitions (Lazada and Daraz)
- What were the key tactics and priorities?
- Lessons learnt
Here are the values and culture Lazada Data Science lives by daily to fulfil our mission of using data to serve our buyers, sellers, and Lazadians. If this appeals to you, reach out to me!
Sharing about my data science journey and what I do at LazadaEugene Yan Ziyou
Was invited to share with the SMU Masters of IT in Business students on (i) how I got to my current position as a data scientist and (ii) what I do in my current position.
Includes suggested areas to focus on (e.g., distributed systems and processing) and how to gain more experience (e.g., volunteering). I also go through the problems that we solve at Lazada using machine learning and a high-level architecture of how we do it.
Garuda Robotics x DataScience SG Meetup (Sep 2015)Eugene Yan Ziyou
What exactly goes on in the commercial drone/UAV industry in Singapore and globally? Behind the hype of consumer “selfie” drones lies a vast number of interesting commercial applications, where drones become an enabler for enterprises to gain new aerial perspectives of their facilities and estates, to make intelligent decisions incorporating this additional dimension of data.
In this presentation, we will look at one such drones-at-work application to reveal some of the behind-the-scene processes and technologies employed. Specifically, we will dive into the precision agriculture domain and share some of the computer vision problems we face, and take a look at various potential solutions to these challenges.
DataKind SG sharing on our first DataDive with Humanitarian Organization for Migration Economics (HOME) and Earth Hour.
Know of other non-profits we can help? Reach out to singapore@datakind.org or drop me a note =)
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntEugene Yan Ziyou
Our team achieved 85th position out of 3,514 at the very popular Kaggle Otto Product Classification Challenge. Here's an overview of how we did it, as well as some techniques we learnt from fellow Kagglers during and after the competition.
Here's a summarised version of the slides shared by Nielsen at the DataScience SG meetup on 20 Apr 2015. Thanks to our generous speakers for sharing on their data science endeavours =D
Statistical inference: Statistical Power, ANOVA, and Post Hoc testsEugene Yan Ziyou
This deck was used in the IDA facilitation of the John Hopkins' Data Science Specialization course for Statistical Inference. It covers the topics in week 4 (statistical power, ANOVA, and post hoc tests).
The data and R script for the lab session can be found here: https://github.com/eugeneyan/Statistical-Inference
Statistical inference: Hypothesis Testing and t-tests, by Eugene Yan Ziyou
This deck was used in the IDA facilitation of the Johns Hopkins Data Science Specialization course for Statistical Inference. It covers the topics in week 3 (hypothesis testing and t-tests).
The data and R script for the lab session can be found here: https://github.com/eugeneyan/Statistical-Inference
Statistical inference: Probability and Distribution, by Eugene Yan Ziyou
This deck was used in the IDA facilitation of the Johns Hopkins Data Science Specialization course for Statistical Inference. It covers the topics in week 1 (probability) and week 2 (distribution).
A Study on the Relationship between Education and Income in the US, by Eugene Yan Ziyou
What is the relationship between education and income? Is education truly the great equalizer or do factors such as gender and family income at the age of 16 affect current income?
As part of the Coursera Data Analysis and Statistical Inference course, these questions were examined in R using data from the US General Social Survey.
Diving into Twitter data on consumer electronic brands, by Eugene Yan Ziyou
Which consumer electronic brands get tweeted about most? Which brands draw more positive or negative sentiment? To find out, 15.3 GB of tweets was downloaded from 13 to 25 May using Python and then analysed in R.
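As a toy illustration of the counting step in such an analysis (the actual work used R on 15.3 GB of tweets; the brand list and tweets below are made up for the sketch):

```python
from collections import Counter

BRANDS = {"apple", "samsung", "sony"}  # illustrative brand list


def brand_mentions(tweets):
    """Count how often each brand is mentioned across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        for word in tweet.lower().split():
            if word in BRANDS:
                counts[word] += 1
    return counts


tweets = ["Loving my new Samsung phone", "Apple vs Samsung again"]
print(brand_mentions(tweets).most_common(1))  # [('samsung', 2)]
```

A real pipeline would also normalise punctuation and handle hashtags and multi-word brand names, but the shape of the computation is the same.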
3. Lazada Data
Data App Devs: expose, integrate, platform-ize
Data Scientists: explore, prepare, model
Data Engineers: collect, store, maintain
Start from the bottom up
5. How much business input/overriding?
Trade-off: Manual human input vs. automated algorithms
Necessary to some extent, but harmful if overdone
Technically, manual input and rules are difficult to maintain
6. How much business input/overriding?
Example: Manual override of product ranking on the site
Allows category managers to incorporate their domain knowledge (e.g., new product releases, trending products)
Nonetheless, too much manual overriding reduced key metrics
Conducted A/B tests to find the optimal level of manual overriding
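One simple way to bound manual overrides is to blend them with the algorithmic score under a fixed weight. This is a minimal sketch, not Lazada's actual formula; the function name, the capping rule, and the 0.2 default are illustrative assumptions, and the boost weight is exactly the kind of knob an A/B test would tune.

```python
def blended_score(model_score, manual_boost, boost_weight=0.2):
    """Blend an algorithmic ranking score with a capped manual boost.

    boost_weight limits how far a category manager's override can move
    a product; setting it too high reproduces the "too much manual
    overriding" problem from the slide.
    """
    manual_boost = max(0.0, min(1.0, manual_boost))  # cap the override
    return (1 - boost_weight) * model_score + boost_weight * manual_boost


# A maximal manual boost nudges the score up but cannot dominate it:
# 0.8 * 0.5 + 0.2 * 1.0, i.e. roughly 0.6
score = blended_score(0.5, 1.0)
```

With this shape, an A/B test comparing several values of `boost_weight` directly measures how much business input helps before it starts to hurt.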
7. How fast is “too fast”?
Trade-off: Development speed vs. production stability
You can move faster without building tooling/abstractions, code reviews, automated testing, repaying technical debt, or documentation
But in the long run, these investments save time and effort
FB: “Move fast and break things” -> “Move fast with stable infra”
8. How fast is “too fast”?
[Chart: development effort over the long run, plotting effort against project size. Grinding out features ("Quick POC", "Dev, dev, dev", "Moar features!") takes less effort and is faster at first, but becomes more effort and slower as the project grows. Investing in automation, testing, tooling, and clearing tech debt costs more upfront, yet once the environment is in place it is less effort and faster, trading raw dev speed for production stability.]
9. How fast is “too fast”?
Example: 8-man team, 10 problems, mostly focused on delivery
In the first two years, the team achieved a lot and proved our worth
Nonetheless, as we matured and had to maintain more production code, investing in iteration speed and code quality had high ROI
10. How to set priorities with business?
Trade-off: Short-term vs. long-term
Business understands best what is needed, though it can be overly focused on day-to-day ops and near-term goals
Data science is aware of the latest research and can innovate, but risks being detached from business needs
11. How to set priorities with business?
Example: Timeboxed skunkworks resulting in POCs
Data leadership sponsored some POCs that were hacked together in 2–4 weeks; some eventually made it into production
Nonetheless, the focus is on research and innovation that can be applied to improve the online shopping experience
15. Overall results
Significant manpower cost savings (five figures monthly)
Existing workforce can be diverted to difficult-to-automate tasks
Reduced lead time before reviews go live on the site
19. [Architecture diagram: a Web Tracker (JavaScript), a Mobile Tracker (Adjust), and 3rd-party sources (e.g., ZenDesk, SurveyGizmo) on user devices feed Kafka queues; Spark bulk loaders land product, seller, and transaction data in Hadoop. Data exploration, data preparation, feature engineering, and modelling run on Spark, with category managers applying manual boosting via a Django app. Product rankings then go through local validation and A/B testing, splitting traffic and measuring outcomes.]
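The offline portion of that pipeline can be sketched as a chain of stage functions. This is a toy, pure-Python mock of the flow (the real stack runs on Spark over Hadoop); all function names, the event schema, and the click-count feature are illustrative assumptions.

```python
def prepare(events):
    """Data preparation: drop malformed events (e.g., missing product id)."""
    return [e for e in events if e.get("product_id")]


def featurize(events):
    """Feature engineering: count events per product as a toy feature."""
    counts = {}
    for e in events:
        counts[e["product_id"]] = counts.get(e["product_id"], 0) + 1
    return counts


def rank(features, manual_boosts=None):
    """Modelling + manual boosting: score, apply boosts, sort descending."""
    manual_boosts = manual_boosts or {}
    scored = {p: c + manual_boosts.get(p, 0) for p, c in features.items()}
    return sorted(scored, key=scored.get, reverse=True)


events = [{"product_id": "A"}, {"product_id": "B"},
          {"product_id": "A"}, {"product_id": None}]
ranking = rank(featurize(prepare(events)), manual_boosts={"B": 5})
print(ranking)  # ['B', 'A'] -- the manual boost lifts B above A
```

Each stage maps onto a box in the diagram; in production the same composition runs as Spark jobs, with the final ranking handed off to validation and A/B testing.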
20. Overall results
Better ranking improved conversion (3–8%) and revenue per session (5–20%)
Introducing new products improved new-product engagement (CTR increased 30–80%; add-to-cart increased 20–90%)
Emphasizing product quality had neutral-to-positive outcomes (reduced return rate; increased product net promoter score)
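For context on how such lift figures are read: a relative lift is the treatment metric's change over the control baseline. The numbers below are made-up illustrations, not Lazada's data.

```python
def relative_lift(control, treatment):
    """Relative lift of a treatment metric over control, in percent."""
    return (treatment - control) / control * 100.0


# A hypothetical A/B result: conversion rising from 2.50% to 2.60%
# is a 4% relative lift, within the 3-8% range reported on the slide.
lift = round(relative_lift(0.0250, 0.0260), 1)
```

Note that relative lift on a small base rate can look large while the absolute change stays modest, which is why both conversion and revenue per session are reported.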
21. Key takeaways
There is no single best answer to the challenges raised; it depends on the maturity stage of the team and organization
Data science > coding + machine learning: many other activities contribute greatly to the final impact