Big Data In Action
Ilya Buzytsky
Big Data: not all things to all people
• Different Business Verticals have different uses for Big
Data
• Specifically, Web Analytics lends itself very nicely to
Big Data Solutions: it scales well with the distributed
Big Data approach, for both storage and processing
• Other Business Verticals are not as lucky: their workloads
are harder to parallelize where more complex computations
are required
Big Data for Web Analytics: where to start?
So we got us a “Big Data” system somewhere in the
cloud. Now what?
We can start with asking 3 basic questions:
1. Can this system do what our legacy software stack
cannot do?
2. Even if we have a working legacy solution, can it be
done better with the “Big Data” solution?
3. Can it integrate into what we already have?
We will learn how to address these questions.
Big Data Solutions: Context is everything!
Most businesses have a working BI Solution (or many)
There comes a time when source data grows so much
that it becomes “unwieldy”
The choice is:
1. Scale out the legacy systems’ hardware to try and
accommodate the growth
OR
2. Use a Big Data solution to help with scaling in new
and exciting ways
Big Data for Web Analytics: Common Pitfalls?
• Pitfall 1: Just storing a bunch of information forever
does not make a real or useful Big Data Solution
• Pitfall 2: Big Data IS big. There is no good way to simply
pull a bunch of it onto your computer and “play with it”
(a very common analyst ask)
Solving the Big Data problem: A Structured Approach
Avoiding the pitfalls requires a structured approach to
figuring out how to make Big Data useful
1. Define business requirements
2. Understand the target audience. Who is the
consumer of the findings from your Big Data
solution? A solution targeting top-level executives
often looks different from one targeting “in the
trenches” analysts
3. Understand your data sources
4. Understand your business analytical pain points well
Building a Big Data Pipeline
• A “data pipeline” is a well-defined process that sequentially
transforms often unstructured and otherwise unmanageable data
into structured sets that the Business can use for BI and other
purposes
• Building pipelines lets you establish clearly defined data flows
within parameterized guidelines
• Big Data Pipelines often leverage existing software stacks to
create or augment BI Solutions already in place
• Sometimes pipelines are required as an intermediary mechanism
to pass needed data from one external source to another, for
purposes of improved BI capabilities
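The idea of a pipeline as a sequence of transformations can be sketched in a few lines of Python. This is only an illustration, not the stack used in the deck: the stage names (`parse_lines`, `drop_empty_users`, `normalize_pages`), the helper `build_pipeline`, and the "user,page" line format are all hypothetical.

```python
from functools import reduce


def parse_lines(lines):
    """Turn raw 'user,page' text lines into structured records; skip bad lines."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) == 2:
            yield {"user": parts[0], "page": parts[1]}


def drop_empty_users(records):
    """Discard records that carry no user id."""
    return (r for r in records if r["user"])


def normalize_pages(records):
    """Lower-case page paths so later grouping is consistent."""
    for r in records:
        yield {**r, "page": r["page"].lower()}


def build_pipeline(*stages):
    """Compose stages so the output of each one feeds the next, in order."""
    def run(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


# Hypothetical sample input: two usable lines, one malformed, one without a user.
raw = ["alice,/Home", "garbage line", ",/Pricing", "bob,/Pricing"]
pipeline = build_pipeline(parse_lines, drop_empty_users, normalize_pages)
structured = list(pipeline(raw))
print(structured)
```

Because every stage takes an iterable and returns an iterable, stages can be added, removed, or reordered without touching the others, which is one way to keep the data flow "clearly defined".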
BI Pipelines: Examples
Pipeline Example 1: Unstructured data to the in-house Data
Warehouse
Addresses how we can augment an existing BI solution with a Big
Data capability
Big Data Solutions: BI Refactoring
Simple example of a classic BI solution
[Diagram: External Data (Source Data: SQL Server OLTP, Text Files, Binary Logs) -> ETL (one per source) -> Data Warehouse ("Our huge, all-encompassing Data Warehouse") -> Presentation (Mobile, Excel, Web)]
Big Data Solutions: BI Refactoring
Too much data results in SLAs being missed and too little data
getting to the Data Warehouse layer.
[Diagram: the same classic BI solution (External Data sources -> ETL -> Data Warehouse -> Presentation), marked with a "Scalability Fault Line"]
Big Data Solutions: BI Refactoring
Let's try and refactor: take 1
[Diagram: a "Big Data Storage and Processing" layer is added to the classic solution, and a single ETL step remains. Components: External Data (Source Data: SQL Server OLTP, Text Files, Binary Logs), Big Data Storage and Processing, ETL, Data Warehouse ("Our huge, all-encompassing Data Warehouse"), Presentation (Mobile, Excel, Web)]
BI Pipelines: Example Diagrams (cont.)
Pipeline Example 2: Unstructured data, enriched and cleaned, to the
3rd party Analytics Solution
Sometimes we are just an intermediary, making data better and then
passing it on
Two (sometimes more) large systems: we need to scale data enrichment
when moving large volumes between them quickly and efficiently
Big Data Solutions: 3rd party integration
[Diagram: Source System (Source Data: SQL Server OLTP, Text Files, Binary Logs) -> Big Data Storage and Enrichment (with Enrichment Metadata) -> Feed Out (FTP) and Feed Out (Share) -> Destination System]
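The intermediary role in this example (take source records, enrich them with metadata, write a feed for the destination system) might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the field names, the `campaign_id` join key, and the CSV feed format are hypothetical, and an in-memory buffer stands in for the FTP or share drop.

```python
import csv
import io


def enrich(records, metadata, key="campaign_id"):
    """Attach metadata fields to each record; unmatched records pass through unchanged."""
    for record in records:
        extra = metadata.get(record.get(key), {})
        yield {**record, **extra}


def write_feed(records, fieldnames, out):
    """Write records as CSV to a file-like object and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


# Hypothetical source records and enrichment metadata.
source = [
    {"user": "alice", "campaign_id": "c1"},
    {"user": "bob", "campaign_id": "c9"},  # no metadata exists for c9
]
metadata = {"c1": {"campaign_name": "Spring Sale", "channel": "email"}}

feed = io.StringIO()  # stands in for the file that would be dropped on FTP or a share
rows_written = write_feed(
    enrich(source, metadata),
    ["user", "campaign_id", "campaign_name", "channel"],
    feed,
)
print(feed.getvalue())
```

Both functions stream one record at a time, so nothing requires the full data set to sit in memory; at real volumes the same enrich-then-write shape would run on a distributed engine rather than in one process.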
Case Study 1: Web Logs Processing to identify Market Segmentation
• Raw-level data is giant: 10 Billion entries per month
• It makes no sense to look at every event
• KPIs require data to be aggregated (in a meaningful
way)
-> Enter Distributed Storage and Processing
• Now we can aggregate according to the pre-defined KPI
requirements, using the Big Data engine to slim the
result set down to a size the BI Data Warehouse can
manage
Case Study 1: Web Logs Processing to identify Market Segmentation
[Diagram: Raw Log Storage (Log Files, Text; 10 Bln.) -> Big Data Engine (Filter, Clean, Enrich, Aggregate) -> DW (SQL) (100 Mln.)]
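The Filter, Clean, Enrich, and Aggregate stages from the case study can be sketched as plain Python generators. This is a single-process illustration only, not the engine used in the case study: the tab-separated log layout, the status-code filter, and the URL-prefix-to-segment mapping `SEGMENT_BY_PREFIX` are all assumptions invented for the example.

```python
from collections import Counter

# Hypothetical log layout: "timestamp<TAB>user_id<TAB>url<TAB>http_status"
# Hypothetical mapping from URL prefix to market segment.
SEGMENT_BY_PREFIX = {"/sports": "sports", "/finance": "finance"}


def filter_events(lines):
    """Filter: keep only well-formed lines for successful (status 200) requests."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 4 and fields[3] == "200":
            yield fields


def clean(events):
    """Clean: reduce the timestamp to a day and strip the query string from the URL."""
    for ts, _user, url, _status in events:
        yield ts[:10], url.split("?")[0]


def enrich(events):
    """Enrich: tag each event with a market segment derived from its URL path."""
    for day, path in events:
        segment = next(
            (seg for prefix, seg in SEGMENT_BY_PREFIX.items() if path.startswith(prefix)),
            "other",
        )
        yield day, segment


def aggregate(events):
    """Aggregate: count page views per (day, segment), the KPI-level rows."""
    return Counter(events)


# Hypothetical raw log sample: three usable lines, one failed request, one malformed line.
raw_log = [
    "2024-05-01T10:00:00\tu1\t/sports/nba?ref=home\t200",
    "2024-05-01T10:00:05\tu2\t/sports/nfl\t200",
    "2024-05-01T10:01:00\tu3\t/finance/stocks\t404",
    "2024-05-02T09:00:00\tu1\t/finance/bonds\t200",
    "malformed line",
]

kpi_rows = aggregate(enrich(clean(filter_events(raw_log))))
for (day, segment), views in sorted(kpi_rows.items()):
    print(day, segment, views)
```

Five raw lines collapse into two KPI rows here. The slide's reduction from 10 Bln. to 100 Mln. is the same effect at scale, a factor of 100, and only the aggregated rows need to reach the SQL Data Warehouse.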
Using Specific Big Data Technologies
• Current generation of technologies is usually
complementary
• Emphasis on a specific technology depends on the
specific business requirements: real-time analysis vs.
reporting vs. data mining
• Mix of technologies is good
• Pick your tools wisely
In Summary
• Prioritize your needs
• Identify failing links
• Plan for rapid growth
• Plan for a relatively steep learning curve for your
Engineering team
• In-house IP (co-)ownership is a good thing
• Scalability != unlimited resources
