Data cloud-lab-version-v0012020

mdcdwh

Data & Analytics

Big Data Processing Training, R&D
Powered by Data Cloud Lab
[Big data is a field that treats ways to analyze, systematically extract
information from, or otherwise deal with data sets that are too large
or complex to be dealt with by traditional data-processing
application software. Big data was originally associated with three
key concepts: volume, variety, and velocity.]

Data Set – 1M Data:
1. Healthcare - [Record – 46935]
2. Weather-history - [Record – 4573]
3. World Demography - [Record – 5000]
4. Census Tracts 2010 - [Record – 216]
5. Animal_Services_Intake_Data - [Record – 187594]
6. Average_Daily_Traffic_Counts - [Record – 1280]
7. Acciental_Durg_Related_Death - [Record – 5106]
8. Retails Store - [Record – 182728]
   (customer: 12435, category: 59, Departments: 7, orders: 68883, products: 1345, order_items: 99999)
9. Popular_Baby_Names - [Record – 46935]
10. SAT__College_Board__2010_School_Level_Results - Total Data [Record – 461]
11. Sales_Tax_Rates - [Record – 1911]
12. Restaurants - [Record – 1328]
13. Transportation - [Record – 18878]
   (drivers: 34, truck_event_text_partition: 17076, timesheet: 1768)
14. Acciental_Durg_Related_Death - [Record – 5106]
15. Census Tracts 2010 - [Record – 216]
16. Employees_Salary - [Record – 824]
17. Customer_transactional_spending - [Record – 60000]
18. Customer_Order - [Record – 1000]
19. Employees_Salary - [Record – 824]
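
The component figures listed for the Retails Store and Transportation data sets add up to the record totals stated for those items. A quick check in Python, as a sketch only (the dictionary and variable names are illustrative and do not come from the slides):

```python
# Component counts copied from items 8 and 13 of the data set list.
# The variable names below are illustrative, not part of the original lab.
retail_store = {
    "customer": 12435,
    "category": 59,
    "Departments": 7,
    "orders": 68883,
    "products": 1345,
    "order_items": 99999,
}
transportation = {
    "drivers": 34,
    "truck_event_text_partition": 17076,
    "timesheet": 1768,
}

retail_total = sum(retail_store.values())
transport_total = sum(transportation.values())

print(retail_total)     # 182728, the total stated for item 8
print(transport_total)  # 18878, the total stated for item 13
```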

(Powered by software: Linux, Hadoop Big Data, Hive & Power BI)
Case Study 01: Healthcare [Record – 46935]
Raw Data (Date, Sex, Diseases, Age):
12/10/1950,M,Diabetes,78
12/10/1984,F,PCOS,67
12/11/1940,M,Fever,90
12/12/1950,F,Cold,88
12/13/1960,M,Blood Pressure,76
Result:
Blood Pressure,5215
Cold,5215
Diabetes,5215
Fever,15645
Malaria,5215
PCOS,5215
Swine Flu,5215
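
As a sanity check, the per-disease counts in the result should add up to the 46935 records stated for the Healthcare data set. A minimal sketch in Python (the variable names are illustrative):

```python
# Per-disease counts copied from the Result listing of Case Study 01.
disease_counts = {
    "Blood Pressure": 5215,
    "Cold": 5215,
    "Diabetes": 5215,
    "Fever": 15645,
    "Malaria": 5215,
    "PCOS": 5215,
    "Swine Flu": 5215,
}

total = sum(disease_counts.values())
fever_share = disease_counts["Fever"] / total

print(total)                  # 46935, matching the stated record count
print(round(fever_share, 3))  # 0.333, Fever is one third of all records
```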
Data Visualizations:
Backend data processing with the HiveQL command:
select diseases, count(*) from health group by diseases;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hduser_20200125220715_338a065f-f176-4464-b03e-28fb18dc66f5
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2020-01-25 22:07:18,630 Stage-1 map = 100%, reduce = 100%
Ended Job = job_local171670995_0001
Moving data to local directory /home/hduser/Dataset
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 2336322 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 3.617 seconds
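
For readers without a Hive installation, the same group-and-count logic can be tried on the five sample rows from the case study. This is a rough Python equivalent of the HiveQL query, not the lab's own code; the function name `count_by_disease` is hypothetical, and the third row's date is written as 12/11/1940 on the assumption that it follows the pattern of the neighbouring rows.

```python
import csv
import io
from collections import Counter

# The five sample rows shown under "Raw Data" (Date, Sex, Diseases, Age).
# Assumption: the third date is 12/11/1940.
SAMPLE = """12/10/1950,M,Diabetes,78
12/10/1984,F,PCOS,67
12/11/1940,M,Fever,90
12/12/1950,F,Cold,88
12/13/1960,M,Blood Pressure,76
"""


def count_by_disease(lines):
    """Count rows per value of the third column (Diseases).

    Mirrors: select diseases, count(*) from health group by diseases;
    """
    reader = csv.reader(lines)
    return Counter(row[2] for row in reader if row)


counts = count_by_disease(io.StringIO(SAMPLE))

# Print in the same "disease,count" layout as the Result listing.
for disease, n in sorted(counts.items()):
    print(f"{disease},{n}")
```

On the sample alone each disease appears once; run against the full 46935-row file, the same function should yield the counts shown in the Result listing.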
