WORKING WITH BIG
DATA
BY:
GURUABIRAMI.D
M.SC IT
DEPARTMENT OF CS & IT
NADAR SARASWATHI
COLLEGE OF ARTS AND SCIENCE, THENI.
STRATEGY 1 : SAMPLE AND
MODEL
• IN THIS STRATEGY, THE DATA IS DOWNSAMPLED TO A SIZE THAT CAN BE PULLED
IN ITS ENTIRETY, AND A MODEL IS CREATED ON THE SAMPLE. DOWNSAMPLING
TO THOUSANDS, OR EVEN HUNDREDS OF THOUSANDS, OF DATA POINTS CAN
MAKE MODEL RUNTIMES FEASIBLE WHILE ALSO MAINTAINING STATISTICAL VALIDITY.
• IF MAINTAINING CLASS BALANCE IS NECESSARY (OR ONE CLASS
NEEDS TO BE OVER- OR UNDER-SAMPLED), IT’S REASONABLY SIMPLE
TO STRATIFY THE DATA SET DURING SAMPLING.
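A minimal sketch of stratified downsampling, as described in the bullets above. The slides refer to R; Python is used here only as a stand-in, and the function name `stratified_sample`, the toy rows, and the 1% fraction are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw the same fraction from every class so class balance is preserved."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    sample = []
    for members in by_class.values():
        # Keep at least one row per class so rare classes do not vanish.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical data: 9,000 rows of class "a" and 1,000 rows of class "b".
rows = [("a", i) for i in range(9000)] + [("b", i) for i in range(1000)]
sample = stratified_sample(rows, label_of=lambda r: r[0], fraction=0.01)
counts = {label: sum(1 for r in sample if r[0] == label) for label in ("a", "b")}
print(counts)
```

Over- or under-sampling one class would amount to passing a different fraction per class instead of a single shared one.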
ADVANTAGES
• SPEED: RELATIVE TO WORKING ON YOUR ENTIRE DATA SET,
WORKING ON JUST A SAMPLE CAN DRASTICALLY DECREASE RUN
TIMES AND INCREASE ITERATION SPEED.
• PROTOTYPING: EVEN IF YOU’LL EVENTUALLY HAVE TO RUN YOUR
MODEL ON THE ENTIRE DATA SET, THIS CAN BE A GOOD WAY TO
REFINE HYPERPARAMETERS AND DO FEATURE ENGINEERING
FOR YOUR MODEL.
• PACKAGES: SINCE YOU’RE WORKING ON A NORMAL IN-MEMORY
DATA SET, YOU CAN USE ALL YOUR R PACKAGES.
DISADVANTAGES
• SAMPLING: DOWNSAMPLING ISN’T TERRIBLY DIFFICULT, BUT IT DOES NEED TO BE
DONE WITH CARE TO ENSURE THAT THE SAMPLE IS VALID AND THAT YOU’VE
PULLED ENOUGH POINTS FROM THE ORIGINAL DATA SET.
• SCALING: IF YOU’RE USING SAMPLE AND MODEL TO PROTOTYPE SOMETHING
THAT WILL LATER BE RUN ON THE FULL DATA SET, YOU’LL NEED TO HAVE A
STRATEGY (SUCH AS PUSHING COMPUTE TO THE DATA) FOR SCALING YOUR
PROTOTYPE VERSION BACK TO THE FULL DATA SET.
• TOTALS: BUSINESS INTELLIGENCE (BI) TASKS FREQUENTLY ANSWER
QUESTIONS ABOUT TOTALS, LIKE THE COUNT OF ALL SALES IN A MONTH. ONE
OF THE OTHER STRATEGIES IS USUALLY A BETTER FIT IN THIS CASE.
STRATEGY 2 : CHUNK AND PULL
• IN THIS STRATEGY, THE DATA IS CHUNKED INTO SEPARABLE UNITS, AND EACH CHUNK IS
PULLED SEPARATELY AND OPERATED ON SERIALLY, IN PARALLEL, OR AFTER
RECOMBINING. THIS STRATEGY IS CONCEPTUALLY SIMILAR TO THE MAPREDUCE
ALGORITHM.
• DEPENDING ON THE TASK AT HAND, THE CHUNKS MIGHT BE TIME
PERIODS, GEOGRAPHIC UNITS, OR LOGICAL UNITS SUCH AS SEPARATE BUSINESSES,
DEPARTMENTS, PRODUCTS, OR CUSTOMER SEGMENTS.
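The chunk-and-pull flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the presenter's method: Python and an in-memory SQLite database stand in for R and a real database, and the `sales` table, its `region` and `amount` columns, and the helper names are hypothetical.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10.0), ("north", 30.0), ("south", 5.0), ("south", 15.0), ("east", 7.0)],
)

def pull_chunk(conn, region):
    """Pull only the rows that belong to one chunk (here, one region)."""
    cur = conn.execute("SELECT amount FROM sales WHERE region = ?", (region,))
    return [amount for (amount,) in cur]

def process_chunk(amounts):
    """Per-chunk work, analogous to a map step: here, a simple mean."""
    return sum(amounts) / len(amounts)

# The chunk keys come from the data itself; each chunk is pulled and
# processed serially, and the per-chunk results are then recombined.
regions = [r for (r,) in conn.execute("SELECT DISTINCT region FROM sales ORDER BY region")]
mean_by_region = {region: process_chunk(pull_chunk(conn, region)) for region in regions}
conn.close()
print(mean_by_region)
```

Only one region's rows are held in memory at a time; the recombined result is the small dictionary of per-region means.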
ADVANTAGES
• FULL DATA SET: THE ENTIRE DATA SET GETS USED.
• PARALLELIZATION: IF THE CHUNKS ARE RUN
SEPARATELY, THE PROBLEM IS EASY TO TREAT
AS EMBARRASSINGLY PARALLEL AND TO MAKE USE OF
PARALLELIZATION TO SPEED UP RUNTIMES.
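As a rough sketch of the parallelization point, independent chunks can be handed to a worker pool. Everything here is assumed for illustration: Python stands in for R, the month-keyed chunks are made-up values, and a thread pool is used for simplicity (for CPU-bound work in Python a process pool would be the usual swap).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunks keyed by month; in practice each would be pulled
# from the database separately.
chunks = {
    "2024-01": [3, 5, 8],
    "2024-02": [2, 2],
    "2024-03": [10, 1, 4, 5],
}

def summarize(item):
    """Work on one chunk; it touches no shared state, so chunks can run in any order."""
    month, values = item
    return month, sum(values)

with ThreadPoolExecutor(max_workers=3) as pool:
    # map() returns results in input order even if chunks finish out of order.
    totals = dict(pool.map(summarize, chunks.items()))

# Recombine the per-chunk results into one answer.
grand_total = sum(totals.values())
print(totals, grand_total)
```

Because no chunk depends on another, the only coordination needed is the final recombination step.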
DISADVANTAGES
• NEED CHUNKS: YOUR DATA NEEDS TO HAVE SEPARABLE CHUNKS
FOR CHUNK AND PULL TO BE APPROPRIATE.
• PULL ALL DATA: YOU EVENTUALLY HAVE TO PULL IN ALL THE DATA, WHICH
MAY STILL BE VERY TIME- AND MEMORY-INTENSIVE.
• STALE DATA: THE DATA MAY REQUIRE PERIODIC REFRESHES
FROM THE DATABASE TO STAY UP TO DATE, SINCE YOU’RE SAVING
A VERSION ON YOUR LOCAL MACHINE.
STRATEGY 3 : PUSH COMPUTE TO DATA
• IN THIS STRATEGY, THE DATA IS COMPRESSED IN THE DATABASE, AND ONLY THE COMPRESSED
DATA SET IS MOVED OUT OF THE DATABASE INTO R. IT IS OFTEN POSSIBLE TO OBTAIN
SIGNIFICANT SPEEDUPS SIMPLY BY DOING SUMMARIZATION OR FILTERING IN THE DATABASE
BEFORE PULLING THE DATA INTO R.
• SOMETIMES, MORE COMPLEX OPERATIONS ARE ALSO POSSIBLE, INCLUDING COMPUTING
HISTOGRAMS AND RASTER MAPS WITH DBPLOT, BUILDING A MODEL WITH MODELDB, AND
GENERATING PREDICTIONS FROM MACHINE LEARNING MODELS WITH TIDYPREDICT.
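A minimal sketch of summarizing and filtering inside the database before pulling anything out. It does not use the R packages named above; Python with an in-memory SQLite database is an assumed stand-in, and the `flights` table with its `carrier` and `delay` columns is hypothetical.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, delay REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("AA", 10.0), ("AA", 20.0), ("UA", 5.0), ("UA", None), ("DL", 0.0)],
)

# The filtering (WHERE) and summarization (GROUP BY) run in the database,
# so only one small row per carrier is transferred to the client.
summary = conn.execute(
    """
    SELECT carrier, COUNT(*) AS n, AVG(delay) AS mean_delay
    FROM flights
    WHERE delay IS NOT NULL
    GROUP BY carrier
    ORDER BY carrier
    """
).fetchall()
conn.close()
print(summary)
```

Every row contributes to the result, yet the client receives three summary rows rather than the full table.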
ADVANTAGES
• USE THE DATABASE: THIS TAKES ADVANTAGE OF WHAT DATABASES ARE OFTEN
BEST AT: QUICKLY SUMMARIZING AND FILTERING DATA BASED ON A QUERY.
• MORE INFO, LESS TRANSFER: BY COMPRESSING BEFORE PULLING DATA BACK
TO R, THE ENTIRE DATA SET GETS USED, BUT TRANSFER TIMES ARE FAR SHORTER
THAN WHEN MOVING THE ENTIRE DATA SET.
DISADVANTAGES
• DATABASE OPERATIONS: DEPENDING ON WHAT DATABASE YOU’RE USING,
SOME OPERATIONS MIGHT NOT BE SUPPORTED.
• DATABASE SPEED: IN SOME CONTEXTS, THE LIMITING FACTOR FOR DATA
ANALYSIS IS THE SPEED OF THE DATABASE ITSELF, AND SO PUSHING MORE
WORK ONTO THE DATABASE IS THE LAST THING ANALYSTS WANT TO DO.
THANK YOU …
