Demystifying big data


Akash Mishra

Introduction to Big Data.

1. Demystifying Big Data Brown Bag

2. Everything starts small

3. Traditional Approach

4. Simple Process

5. Result

6. What’s next? The unanswered question of a lifetime.

7. An unquenchable thirst for improvement
❏ How to sell more?
❏ How to optimize inventory?
❏ How to engage customers more?
❏ What do my customers like?
❏ How to reduce operating costs?

8. “Torture the data, and it will confess to anything.”
Ronald Coase

9. How to get data? Humans…

10. Ever-Growing Data
❏ Historical data plays an important role.
❏ Data explodes while processing: intermediate results can be much larger than the input.
❏ More data beats better algorithms.
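One way the point that “data explodes while processing” shows up in practice: building “viewed X, also viewed Y” pairs for recommendations. A session with k product views yields k(k-1) ordered pairs, so the intermediate data grows quadratically with session length. The sketch below is illustrative only; the session data and the `co_view_pairs` helper are hypothetical, not taken from the talk.

```python
from itertools import permutations

# Hypothetical clickstream: product IDs viewed in each user session.
sessions = [
    ["p1", "p2", "p3", "p4"],
    ["p2", "p3", "p5"],
    ["p1", "p4", "p5", "p6", "p7"],
]

def co_view_pairs(session):
    """Emit every ordered (viewed, also_viewed) pair in one session: k views -> k*(k-1) pairs."""
    return list(permutations(session, 2))

input_records = sum(len(s) for s in sessions)
intermediate_records = sum(len(co_view_pairs(s)) for s in sessions)
print(f"{input_records} input records -> {intermediate_records} intermediate pairs")
```

Here 12 input records become 38 intermediate pairs (12 + 6 + 20); with realistic session lengths the gap widens quickly.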

11. So what is Big Data? Data that tends to grow beyond what a single machine can process.

12. Getting the Right Tool

13. Data-Parallel Processing
❏ Distribute the data (with replication).
❏ Move computation close to the data.
❏ Process each partition of the data separately.
❏ Aggregate the results.
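The partition / process-separately / aggregate steps above can be sketched on a single machine with a word count over log lines. This is a minimal sketch under stated assumptions: threads stand in for worker nodes, round-robin slicing stands in for distributing data blocks, and replication and moving computation to the data are not modeled at all (frameworks such as Hadoop MapReduce or Spark handle those). The function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split_into_partitions(lines, num_partitions):
    """Distribute lines round-robin across partitions (a stand-in for data blocks on nodes)."""
    return [lines[i::num_partitions] for i in range(num_partitions)]

def process_partition(lines):
    """Work done independently on one partition: count words locally."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def aggregate(partial_results):
    """Merge the per-partition counts into one result."""
    total = Counter()
    for partial in partial_results:
        total.update(partial)
    return total

def word_count(lines, num_partitions=3):
    partitions = split_into_partitions(lines, num_partitions)
    # Each thread plays the role of a worker node processing its own partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(process_partition, partitions))
    return aggregate(partials)

if __name__ == "__main__":
    logs = ["view p1", "view p2", "buy p1", "view p3", "view p1", "buy p2"]
    print(word_count(logs))
```

Because counting is associative, the aggregated result is the same as counting all lines sequentially, which is what makes this job safe to split.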

14. Advantages of the Data-Parallel Model
❏ Not bound by a single machine’s hardware (e.g., memory, CPU).
❏ Scales out: add machines as the data grows.
❏ Cost-effective.
❏ No single point of failure.

15. That’s nice, so problem solved. But the presentation says Hadoop, Spark?

16. Challenges of Data Parallelism
❏ Data partitioning, distribution and accumulation.
❏ Fault tolerance.
❏ Distributed coordination and management.
❏ Abstracting the distributed complexity away from the programmer.

17. Big Data Ecosystem
❏ Distributed data storage system:
  ❏ Data distribution.
  ❏ Data replication.
  ❏ High throughput with no single point of failure.
❏ Distributed data processing system:
  ❏ Distributing code close to the data.
  ❏ Abstracting distributed complexity from the programmer.
  ❏ Fault tolerance and handling of computation failures.
  ❏ Aggregating results.
❏ Distributed coordination and resource management:
  ❏ Resource allocation.
  ❏ Distributed configuration management.

18. Distributed Data Storage System
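The storage-system duties listed under “Big Data Ecosystem” (data distribution and replication, no single point of failure) can be sketched as splitting a file into fixed-size blocks and copying each block to several distinct nodes. This is a toy model, not HDFS’s placement policy: the tiny block size, the rotating placement, and the `store`/`read` helpers are hypothetical; only the replication factor of 3 mirrors HDFS’s default.

```python
def store(data, nodes, block_size=4, replication=3):
    """Split `data` into blocks and copy each block onto `replication` distinct nodes.

    Returns (storage, block_map): storage maps node -> {block_id: bytes};
    block_map[block_id] lists the nodes holding a replica of that block.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    storage = {node: {} for node in nodes}
    block_map = []
    for block_id, start in enumerate(range(0, len(data), block_size)):
        chunk = data[start:start + block_size]
        first = block_id % len(nodes)  # rotate the first replica to spread load
        holders = [nodes[(first + r) % len(nodes)] for r in range(replication)]
        for node in holders:
            storage[node][block_id] = chunk
        block_map.append(holders)
    return storage, block_map

def read(storage, block_map, failed_nodes=()):
    """Reassemble the data, fetching each block from the first live replica."""
    out = bytearray()
    for block_id, holders in enumerate(block_map):
        live = [n for n in holders if n not in failed_nodes]
        if not live:
            raise IOError(f"all replicas of block {block_id} are lost")
        out += storage[live[0]][block_id]
    return bytes(out)
```

With four nodes and three replicas per block, any two nodes can fail and every block is still readable; losing three nodes can make some block unavailable.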

19. Distributed Data Processing System
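Two of the processing-system duties from “Big Data Ecosystem”, running code close to the data and handling computation failures, can be sketched as a scheduler that tries the nodes holding a partition first and re-executes that partition elsewhere if its node dies. Node failures are simulated with a seeded random generator; `run_job`, `NodeFailure`, and the `failure_rate` parameter are hypothetical names for illustration, not a real framework’s API.

```python
import random

class NodeFailure(Exception):
    """Raised when a simulated worker node dies before finishing its task."""

def run_job(partitions, locations, nodes, task, max_attempts=3, failure_rate=0.3, seed=0):
    """Run `task` on every partition and return {partition_id: result}.

    locations[pid] lists the nodes storing partition pid; those are tried first
    (data locality). If a node fails, the same partition is re-executed on the
    next candidate node, up to `max_attempts` attempts.
    """
    rng = random.Random(seed)
    results = {}
    for pid, partition in enumerate(partitions):
        candidates = locations[pid] + [n for n in nodes if n not in locations[pid]]
        for node in candidates[:max_attempts]:
            try:
                if rng.random() < failure_rate:  # simulate the node crashing mid-task
                    raise NodeFailure(node)
                results[pid] = task(partition)
                break
            except NodeFailure:
                continue  # recompute only this partition; finished ones are kept
        else:
            raise RuntimeError(f"partition {pid} failed on every attempted node")
    return results
```

Re-execution is only safe because each task is deterministic and depends solely on its own partition; aggregating is then, for example, `sum(results.values())`.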

20. Distributed Coordination and Resource Management
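Resource allocation, listed under “Big Data Ecosystem”, means deciding which machine runs which piece of work given each machine’s free CPU and memory. The sketch below uses simple first-fit placement; it is an assumption for illustration and not the scheduling algorithm of YARN or any other real resource manager. `Node` and `allocate` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    free_cpus: int
    free_mem_gb: int
    containers: list = field(default_factory=list)

def allocate(nodes, requests):
    """First-fit: place each (request_id, cpus, mem_gb) on the first node with enough room.

    Returns {request_id: node name, or None if no node can fit it right now}.
    """
    placements = {}
    for request_id, cpus, mem_gb in requests:
        for node in nodes:
            if node.free_cpus >= cpus and node.free_mem_gb >= mem_gb:
                node.free_cpus -= cpus
                node.free_mem_gb -= mem_gb
                node.containers.append(request_id)
                placements[request_id] = node.name
                break
        else:
            placements[request_id] = None  # no capacity left: the request must wait
    return placements
```

For example, with nodes of (4 CPUs, 8 GB) and (2 CPUs, 4 GB), three requests of (2 CPUs, 4 GB) fill the cluster and a fourth, smaller request has to wait.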

21. Lambda Architecture
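In a Lambda Architecture, a batch layer periodically recomputes views from the full, append-only master dataset; a speed layer incrementally covers the events that arrived since the last batch run; and the serving layer merges both at query time. The toy below counts product views to show that split; the class and method names are hypothetical, and it assumes a single-threaded process so no events arrive while the batch runs.

```python
from collections import Counter

class LambdaViews:
    """Toy Lambda Architecture for product-view counts."""

    def __init__(self):
        self.master_dataset = []       # immutable, append-only log of all events
        self.batch_view = Counter()    # recomputed from scratch by the batch layer
        self.realtime_view = Counter() # incremental counts since the last batch run

    def ingest(self, product_id):
        # New events go to the master dataset and to the speed layer.
        self.master_dataset.append(product_id)
        self.realtime_view[product_id] += 1

    def run_batch(self):
        # Batch layer: recompute the whole view, then discard the speed-layer
        # state it now covers (safe here only because ingestion is single-threaded).
        self.batch_view = Counter(self.master_dataset)
        self.realtime_view.clear()

    def query(self, product_id):
        # Serving layer: merge batch and real-time views at query time.
        return self.batch_view[product_id] + self.realtime_view[product_id]
```

Queries stay up to date between batch runs, while the batch recomputation corrects any drift in the incremental counts.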

22. How to sell more? Recommendation.

23. Speed Layer
1. Web log
2. Product views
3. Similar products
4. Update user product recommendations
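The speed-layer flow on this slide (web log, product views, similar products, updated user recommendations) can be sketched as a function that consumes log lines and refreshes each user’s list. Assumptions: the log format `"<user> GET /product/<id>"`, the precomputed `SIMILAR_PRODUCTS` table (which a batch job might produce), and all function names are hypothetical.

```python
# Hypothetical similarity table, e.g. precomputed by a batch job.
SIMILAR_PRODUCTS = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p4"],
    "p3": ["p1"],
}

def parse_product_view(log_line):
    """Web log -> product view: extract (user, product) from 'u42 GET /product/p1'."""
    parts = log_line.split()
    if len(parts) != 3:
        return None
    user, method, path = parts
    prefix = "/product/"
    if method == "GET" and path.startswith(prefix):
        return user, path[len(prefix):]
    return None

def update_recommendations(log_lines, recommendations=None, max_items=5):
    """Product view -> similar products -> updated per-user recommendation list."""
    recommendations = {} if recommendations is None else recommendations
    for line in log_lines:
        event = parse_product_view(line)
        if event is None:
            continue  # not a product view (e.g. a static asset request)
        user, product = event
        recs = recommendations.setdefault(user, [])
        # Insert in reverse so the most similar product ends up first.
        for similar in reversed(SIMILAR_PRODUCTS.get(product, [])):
            if similar in recs:
                recs.remove(similar)
            recs.insert(0, similar)
        del recs[max_items:]  # keep only the freshest suggestions
    return recommendations
```

Calling it again with the same dictionary lets it keep updating state as new log lines stream in, which is the incremental behavior the speed layer is for.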

24. How to optimize inventory? Prediction.

25. Batch Layer
1. User data
2. Location cluster per item
3. Location cluster per item data
3. Current warehouse inventory
4. Inventory transfer
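The batch-layer flow on this slide (user data, demand per location cluster per item, current warehouse inventory, inventory transfer) can be sketched as two steps: aggregate orders into demand per (cluster, item), then move surplus stock to clusters with a shortfall. This is a deliberately naive assumption: past demand stands in for the prediction, where a real system would use a forecasting model. The city-to-cluster mapping, data shapes, and function names are hypothetical.

```python
from collections import Counter

def demand_by_cluster(orders, cluster_of_city):
    """User data -> demand per (location cluster, item). orders: (city, item, qty)."""
    demand = Counter()
    for city, item, qty in orders:
        demand[(cluster_of_city[city], item)] += qty
    return demand

def plan_transfers(demand, inventory):
    """Compare demand with warehouse stock per (cluster, item) and propose
    (item, from_cluster, to_cluster, qty) moves from surplus to shortfall."""
    items = {item for _, item in demand} | {item for _, item in inventory}
    clusters = {c for c, _ in demand} | {c for c, _ in inventory}
    transfers = []
    for item in sorted(items):
        surplus = {c: inventory.get((c, item), 0) - demand.get((c, item), 0) for c in clusters}
        # Largest surplus first; cluster name breaks ties deterministically.
        donors = sorted((c for c in surplus if surplus[c] > 0), key=lambda c: (-surplus[c], c))
        for needy in sorted(c for c in surplus if surplus[c] < 0):
            shortfall = -surplus[needy]
            for donor in donors:
                if shortfall == 0:
                    break
                qty = min(shortfall, surplus[donor])
                if qty > 0:
                    transfers.append((item, donor, needy, qty))
                    surplus[donor] -= qty
                    shortfall -= qty
    return transfers
```

If total surplus is smaller than total shortfall, some clusters stay short; the plan only redistributes existing stock.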

26. THANK YOU
Akash Mishra
akashm@thoughtworks.com