Leveraging Hadoop to mine customer insights in a developing market

I was a speaker at the Big Data World Europe conference in London on 18 September 2012 (http://www.terrapinn.com/2012/big-data-world-europe/). The full text of the talk is at http://webkpis.com/2012/11/hadoop-implementation-in-wikimart/.

Topics:
• Incorporating Hadoop technology within your infrastructure to cut costs and increase the scale of your operations
• Understanding how Hadoop can provide insightful data analysis to the end user
• Combining Hadoop with existing enterprise systems to deepen your insight and discover previously hidden trends
• Will Hadoop replace the need for relational data warehousing systems?

Leveraging Hadoop in Wikimart
Roman Zykov
Head of analytics
http://wikimart.ru

London, Big Data World Europe, 20th September 2012

Key problem

To be or not to be…

Hadoop

Introduction

Key tasks for Wikimart

What
• BI tasks
• Web analytics (in-house solution)
• Recommendations on site
• Data services for marketing

Who
• Core analytics team
• Analytics members in other departments
• IT site operations

Problem

Too time-consuming or too expensive?
• Data volume
• Number of data services

MapReduce

[Diagram: standalone processing vs. MapReduce processing of DATA]
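The slide contrasts a standalone process with MapReduce. The sketch below is a rough illustration in plain Python, not Hadoop's Java API. It counts item views on hypothetical data in two ways. The first is a single loop. The second uses independent map calls over input splits, then a shuffle that groups emitted pairs by key, then one reduce per key. In Hadoop the splits would live on different nodes and the framework would perform the shuffle. Here everything runs in one process.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each record is one page view as (session_id, item_id).
views = [
    ("s1", "phone"), ("s1", "case"), ("s2", "phone"),
    ("s2", "charger"), ("s3", "phone"), ("s3", "case"),
]

def standalone_count(records):
    """Standalone: a single process scans all the data."""
    counts = defaultdict(int)
    for _, item in records:
        counts[item] += 1
    return dict(counts)

def map_phase(record):
    """Map: emit (key, 1) for a record, independently of all other records."""
    _, item = record
    yield item, 1

def shuffle(pairs):
    """Shuffle: group emitted values by key (Hadoop does this between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values emitted for one key."""
    return key, sum(values)

def mapreduce_count(records, n_splits=2):
    # Split the input roughly the way HDFS splits a file into blocks.
    splits = [records[i::n_splits] for i in range(n_splits)]
    # Each split could be mapped on a different node; here they run in turn.
    mapped = chain.from_iterable(map_phase(r) for split in splits for r in split)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Both functions return the same counts. The point of the MapReduce form is that every map call and every reduce key is independent, so the work can be spread across nodes.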

Our idea

New platform for “Big Data” tasks only

• Start research on MapReduce software
• First patient: the recommendation engine

Difficulties
- no planned budget -> Hadoop is free
- no experts -> learn it
- no hardware -> virtual cluster

Requirements for Hadoop

• Easily scalable
• Easy deployment
• Easy integration
• Less low-level Java coding
• SQL-like queries

Data flow

[Diagram: DWH and data feeds]

Accomplishments

Recommendations
• Collaborative filtering (item-to-item on browsing history, Pig)
• Similar products (item attributes, Pig)
• Most popular items (browsing history + orders, HiveQL)
• Internal and external search recommendations (HiveQL)
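The slide says item-to-item collaborative filtering was written in Pig, but that script is not shown. The Python below is a hedged sketch of the general technique, not Wikimart's code. It counts how often two items are viewed in the same session and recommends the most frequently co-viewed ones. The session grouping, the function name `item_to_item` and the `top_n` cut-off are assumptions for illustration. In MapReduce terms, emitting pairs per session is the map step and summing counts per pair is the reduce step.

```python
from collections import Counter, defaultdict
from itertools import combinations

def item_to_item(sessions, top_n=3):
    """Recommend, for each item, the items most often viewed in the same session.

    sessions: mapping of session id -> iterable of viewed item ids.
    Returns: item -> list of up to top_n co-viewed items, most frequent first.
    """
    co_counts = defaultdict(Counter)
    for items in sessions.values():
        # Deduplicate so repeated views within one session count once,
        # and sort so pair generation is deterministic.
        for a, b in combinations(sorted(set(items)), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {
        item: [other for other, _ in counter.most_common(top_n)]
        for item, counter in co_counts.items()
    }
```

For example, take the sessions `{"s1": ["phone", "case", "charger"], "s2": ["phone", "case"], "s3": ["phone", "headphones"]}`. Here `item_to_item(sessions)["phone"]` starts with `"case"`, the item co-viewed with the phone twice.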

Some statistics after 1 year
• >10% of revenue
• 3 months to launch
• Tens of gigabytes processed daily in a 2-hour run
• Only 1 crash (the cluster lost power)

Decision: invest in a hardware cluster

End user

Internal high-level languages
• HiveQL
• Pig

Reporting
• Pre-aggregated data for OLAP
• RDBMS as the front end
• OLAP and reporting software should support HiveQL
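The reporting approach on the slide pairs pre-aggregated data for OLAP with an RDBMS as the front end. It can be sketched in two steps. First, a batch job rolls raw rows up to a small summary table. Second, that table is loaded into a relational database that reporting tools already understand. Below, plain Python stands in for the Hive or Pig job and an in-memory SQLite database stands in for the front-end RDBMS. The table `daily_sales`, its columns and the sample rows are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw order lines, as they might sit in HDFS: (date, category, revenue).
raw_orders = [
    ("2012-09-01", "phones", 300.0),
    ("2012-09-01", "phones", 150.0),
    ("2012-09-01", "books", 20.0),
    ("2012-09-02", "books", 35.0),
]

def pre_aggregate(rows):
    """Roll raw rows up to (date, category) totals, as a batch Hive/Pig job might."""
    totals = defaultdict(lambda: [0, 0.0])  # (date, category) -> [order_lines, revenue]
    for date, category, revenue in rows:
        totals[(date, category)][0] += 1
        totals[(date, category)][1] += revenue
    return [(d, c, n, r) for (d, c), (n, r) in sorted(totals.items())]

def load_into_front_end(conn, aggregates):
    """Load the small aggregate table into the RDBMS that reporting tools query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(day TEXT, category TEXT, order_lines INTEGER, revenue REAL, "
        "PRIMARY KEY (day, category))"
    )
    # INSERT OR REPLACE keeps reloads of the same day idempotent.
    conn.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?, ?)", aggregates)
    conn.commit()

conn = sqlite3.connect(":memory:")
load_into_front_end(conn, pre_aggregate(raw_orders))
```

Reporting queries then touch only the small aggregate table. For example, `SELECT category, SUM(revenue) FROM daily_sales GROUP BY category` returns 55.0 for books and 450.0 for phones on this sample.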

Data Integration

• Sqoop
• Parallel data exchange with RDBMSs (MS SQL, MySQL, Oracle, Teradata…)
• Incremental updates
• HDFS, Hive, HBase

• Talend Open Studio
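Sqoop can run an incremental import in append mode, using the `--incremental append`, `--check-column` and `--last-value` options of `sqoop import`. In this mode it fetches only rows whose check column is greater than the last value imported. The sketch below imitates that idea with Python's `sqlite3` module. It is not Sqoop, and the function `incremental_import`, the `orders` table and its rows are hypothetical.

```python
import sqlite3

def incremental_import(conn, table, check_column, last_value):
    """Fetch only rows whose check column exceeds last_value (append-mode idea).

    Returns (new_rows, new_last_value); store new_last_value for the next run.
    Table and column names are interpolated into SQL, so they must be trusted.
    """
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    if not rows:
        return [], last_value
    # Rows are ordered by the check column, so the last row holds the new maximum.
    return rows, rows[-1][columns.index(check_column)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.0)])

batch, last = incremental_import(conn, "orders", "id", 0)       # first run: ids 1 and 2
conn.execute("INSERT INTO orders VALUES (3, 7.5)")
batch2, last2 = incremental_import(conn, "orders", "id", last)  # next run: only id 3
```

The caller keeps the returned last value and passes it to the next run, so repeated runs pick up only new rows. When an incremental import runs from a saved Sqoop job, Sqoop retains this value in the job.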

Hadoop vs RDBMS

• Hadoop will never replace RDBMS:
• Latency
• HiveQL is weaker than SQL
• Suited only to some offline processing tasks:
• Machine learning
• Queries on big tables
• …
• Real time: NoSQL

Hadoop myth

Terabytes?
Petabytes?

Big tasks!

Conclusion

• Hadoop is not rocket science
• Intermediate data can be Big Data
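Pairwise jobs such as item-to-item collaborative filtering show how intermediate data can become Big Data. A session with n distinct items yields n(n-1)/2 unordered item pairs. Intermediate output therefore grows quadratically with session length, even when the input is modest. The arithmetic below uses hypothetical session sizes, not Wikimart figures.

```python
from math import comb

def input_records(session_sizes):
    """One input record per item view."""
    return sum(session_sizes)

def pair_records(session_sizes):
    """Unordered item pairs emitted: n*(n-1)/2 for a session of n distinct items."""
    return sum(comb(n, 2) for n in session_sizes)

# Hypothetical workload: 1,000 sessions of 50 distinct item views each.
sessions = [50] * 1000
print(input_records(sessions))  # 50000
print(pair_records(sessions))   # 1225000
```

Here 50,000 input view records become 1,225,000 intermediate pair records, 24.5 times more.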

Starter kit
• Hadoop management system
• Virtual hardware (cloud, virtual servers, etc.)
• Offline data tasks
• Pig or HiveQL
• Sqoop: import data from existing data sources

Thank you!!!

rzykov@gmail.com
linkedin.com/in/romanzykov
