SlideShare a Scribd company logo
1 of 39
COLLEGE SCORECARD
ANALYSIS USING HIVE
Presented By:
Abhishek Kumar
Anurag Anand
Aditya Patil
Siva Sai
TABLE OF CONTENT
• What is BigData
• College Scorecard
• What is our Project
• What technology we used
• SDLC
• Hive Queries
• Graphs and Final Results
• Conclusion
• Github and Data Source
• References
BIG DATA IN AROUND THE WORLD
BIG DATA ECO-SYSTEMS
A Applications Of Big Data
Homeland
Security
Smarter
Healthcare
Sales
Telecom
Manufacturing
Traffic Control Analytics
Search
Quality
DATA SET SOURCE -
http://catalog.data.gov/dataset/road-traffic-injuries-2002-2010
THE RAW DATA
COLLEGE SCORECARDS MAKE IT EASIER FOR
STUDENTS TO SEARCH FOR A COLLEGE THAT IS A
GOOD FIT FOR THEM. THEY CAN USE THE COLLEGE
SCORECARD TO FIND OUT
• Popular Colleges among students
• Affordability
• Net Price
• No of enrollments
• State with Most number of University
TECHNOLOGIES WHAT HAVE WE USED
• Microsoft Power BI
• Apache Ambari - Version 2.1.2
• Hortonworks Sandbox with HDP 2.4
• HIVE
• Microsoft Excel
• Putty - Release 0.65
• Google Fusion Table
SYSTEM DEVELOPMENT LIFE CYCLE
SYSTEM DEVELOPMENT LIFE CYCLE
Planning
• Defined Scope
• Requirement
Gathering
• Time
Estimation
Analysis
• Gathered
data from
Data.Gov
Design
• Gathered required
softwares such as
Azure, Power View,
Microsoft Power BI
Impleme
ntation
• Developed
Queries &
Created
Tables
Testing
• Analysis
made on the
created
Tables using
graph and
Map
WHAT IS OUR PROJECT
• Data analysis is done on College Student Data
Cost of college Tution Fee
Admission Rate
Popular Colleges
Popular States
Biggest Universities
• Data analysis is done by using HDFS Cluster, HiveQL
• Analyzed data will be displayed using MS Power BI & Power Query in the form
of Graphs and Maps.
CREATING THE CLUSTER IN SANDBOX
USING PUTTY TO LOGIN TO CLUSTER
AND CHECK HIVE STATUS
PROCESS OF ANALYSIS
Step 1- Data CLEANING by removing unwanted an NULL column.
Step-2- LOADING Data to HDFS
STEP-3- Running HQL Queries
Steo-4-Saving results in CVS files
Step-5- Combining the results into one Excel file.
Step-6- Analyzing data through Power BI & EXCEL.
FLOWCHART OF DATA ANALYSIS
DOWNLOAD DATA
FROM DATA.GOV
Uploaded the txt files
into HDFS Using Ambari
Created tables using
HiveQL
Analysed data using the
query, Microsoft BI and
powerview
Analysis of Bar and line
Graphs.
DATA UPLOAD VIA AMBARI
• We have put data in HDFS using Ambari
CREATING THE TABLES
CREATING A COST TABLE
• CREATE TABLE newcost2011(
• UNITID INT, INSTNM STRING,CITY STRING,CONTROL INT,ADM_RATE
FLOAT,ADM_RATE_ALL FLOAT,TUITFTE FLOAT,TUITIONFEE_IN
FLOAT,TUITIONFEE_OUT FLOAT,COSTT4_A FLOAT, UGDS INT)
• COMMENT 'This is the Student 2011 data'
• ROW FORMAT DELIMITED
• FIELDS TERMINATED BY 't'
• STORED AS TEXTFILE;
• LOAD DATA INPATH '/tmp/newcost2011.txt' OVERWRITE INTO TABLE
newcost2011;
OUTPUT OF QUERY
RESULTS OF THE QUERY
DOWNLOADING RESULTS INTO CSV
FORMAT
HIVE QUERY FOR SORTING DATA
MICROSOFT POWER BI USED FOR
DATA ANALYSIS
COMBINING THE RESULT QUERY
TOGETHER FOR ANALYSIS
GRAPHICAL REPRESENTATION USING POWER BI
GRAPHICAL REPRESENTATION USING POWERVIEW
GRAPHICAL REPRESENTATION USING GOOGLE
FUSION TABLES
GRAPHICAL REPRESENTATION USING POWER BI
MOST COSTLY UNIVERSITY IS IN EAST COST
OVERALL CHEAPEST COLLEGE
BEST ADMIT RATES AMONG
COLLEGES
CONCLUSION
SATE WITH HIGHEST NUMBER OF
UNIVERSITY
MOST POPULAR COLLEGE MAJORS
CONCLUSION
• COSTLIET UNIVERSITY is NEW YORK UNIVERSITY
• Most of costly university are located in east cost i.e New York and nearby area
• BIGGEST UNIVERSITY of Phoenix-Online Campus
• Biggest Major Is business.
• STATE WITH MOST UNIVERSITY WITH 10000 student is California i.e 16
• CUNY College of Staten Island has highest admission rate i.e. its easiet to get
admission here.
• CHEAPEST College is High Point University
LINK
• GITHUB Link: (Code Only)
https://github.com/abhimisedu/CIS520GroupF
• Dataset Link: (Dataset Size – 1580 MB uncompressed)
http://catalog.data.gov/dataset/college-scorecard
REFERENCE
• https://azure.microsoft.com
• www.Data.gov
• http://www.lynda.com/Hadoop-tutorials/
• http://www.tutorialspoint.com/big_data_tutorials.htm
• http://searchstorage.techtarget.com/guides/Big-data-tutorial-Everything-you-need-
to-know
THANK YOU
ANY QUERIES?

More Related Content

What's hot

Performance of fractal tree databases
Performance of fractal tree databasesPerformance of fractal tree databases
Performance of fractal tree databasesLixun Peng
 
Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsBryan Bende
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBjhugg
 
Transpilers Gone Wild: Introducing Hydra
Transpilers Gone Wild: Introducing HydraTranspilers Gone Wild: Introducing Hydra
Transpilers Gone Wild: Introducing HydraJoshua Shinavier
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and FutureDataWorks Summit
 
Introduction to hazelcast
Introduction to hazelcastIntroduction to hazelcast
Introduction to hazelcastEmin Demirci
 
Neo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic trainingNeo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic trainingNeo4j
 
Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...
Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...
Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...confluent
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...
Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...
Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...Mahan Hosseinzadeh
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph DatabaseTobias Lindaaker
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical OverviewSylvester John
 
Mongo DB: Operational Big Data Database
Mongo DB: Operational Big Data DatabaseMongo DB: Operational Big Data Database
Mongo DB: Operational Big Data DatabaseXpand IT
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Julian Hyde
 
Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksDatabricks
 
Zookeeper 활용 nifi clustering
Zookeeper 활용 nifi clusteringZookeeper 활용 nifi clustering
Zookeeper 활용 nifi clusteringNoahKIM36
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Julian Hyde
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Julian Hyde
 

What's hot (19)

Performance of fractal tree databases
Performance of fractal tree databasesPerformance of fractal tree databases
Performance of fractal tree databases
 
Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
 
Transpilers Gone Wild: Introducing Hydra
Transpilers Gone Wild: Introducing HydraTranspilers Gone Wild: Introducing Hydra
Transpilers Gone Wild: Introducing Hydra
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Introduction to hazelcast
Introduction to hazelcastIntroduction to hazelcast
Introduction to hazelcast
 
Neo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic trainingNeo4j GraphDay Seattle- Sept19- neo4j basic training
Neo4j GraphDay Seattle- Sept19- neo4j basic training
 
Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...
Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...
Static Membership: Rebalance Strategy Designed for the Cloud (Boyang Chen,Con...
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...
Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...
Prophet at Scale: Using Prophet at scale to tune and forecast time series at ...
 
Building Applications with a Graph Database
Building Applications with a Graph DatabaseBuilding Applications with a Graph Database
Building Applications with a Graph Database
 
Kafka Technical Overview
Kafka Technical OverviewKafka Technical Overview
Kafka Technical Overview
 
Mongo DB: Operational Big Data Database
Mongo DB: Operational Big Data DatabaseMongo DB: Operational Big Data Database
Mongo DB: Operational Big Data Database
 
Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)Apache Calcite (a tutorial given at BOSS '21)
Apache Calcite (a tutorial given at BOSS '21)
 
Scaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on DatabricksScaling Data Analytics Workloads on Databricks
Scaling Data Analytics Workloads on Databricks
 
Zookeeper 활용 nifi clustering
Zookeeper 활용 nifi clusteringZookeeper 활용 nifi clustering
Zookeeper 활용 nifi clustering
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
Smarter Together - Bringing Relational Algebra, Powered by Apache Calcite, in...
 
Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!Don’t optimize my queries, optimize my data!
Don’t optimize my queries, optimize my data!
 

Viewers also liked

APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna HiremaneIntelAPAC
 
How Salesforce.com uses Hadoop
How Salesforce.com uses HadoopHow Salesforce.com uses Hadoop
How Salesforce.com uses HadoopNarayan Bharadwaj
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
DFD, Decision Table, Decision Chart, Structure Charts
DFD, Decision Table, Decision Chart, Structure ChartsDFD, Decision Table, Decision Chart, Structure Charts
DFD, Decision Table, Decision Chart, Structure ChartsSOuvagya Kumar Jena
 
Student result mamagement
Student result mamagementStudent result mamagement
Student result mamagementMickey
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop TutorialEdureka!
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Viewers also liked (16)

APAC Big Data Strategy RadhaKrishna Hiremane
APAC Big Data  Strategy RadhaKrishna  HiremaneAPAC Big Data  Strategy RadhaKrishna  Hiremane
APAC Big Data Strategy RadhaKrishna Hiremane
 
How Salesforce.com uses Hadoop
How Salesforce.com uses HadoopHow Salesforce.com uses Hadoop
How Salesforce.com uses Hadoop
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
Structure chart
Structure chartStructure chart
Structure chart
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
DFD, Decision Table, Decision Chart, Structure Charts
DFD, Decision Table, Decision Chart, Structure ChartsDFD, Decision Table, Decision Chart, Structure Charts
DFD, Decision Table, Decision Chart, Structure Charts
 
Student result mamagement
Student result mamagementStudent result mamagement
Student result mamagement
 
What is big data?
What is big data?What is big data?
What is big data?
 
Big Data & Hadoop Tutorial
Big Data & Hadoop TutorialBig Data & Hadoop Tutorial
Big Data & Hadoop Tutorial
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Big data and Hadoop
Big data and HadoopBig data and Hadoop
Big data and Hadoop
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Similar to Big Data Project using HIVE - college scorecard

Mobile data collection using odk
Mobile data collection using odkMobile data collection using odk
Mobile data collection using odkKARUMBA GATAMA
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...CIARD Movement
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
Bigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainBigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainKamal A
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityOpen Cyber University of Korea
 
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan PirvuDataScienceConferenc1
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017Jongwook Woo
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceDeepak Chandramouli
 
When Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkWhen Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkJim Kaplan CIA CFE
 
Resume ricky jairath
Resume   ricky jairathResume   ricky jairath
Resume ricky jairathRICKY JAIRATH
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...nimak
 
School performance management analytics in cloud
School performance management analytics in cloudSchool performance management analytics in cloud
School performance management analytics in cloudNitai Partners Inc
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data PipelineJesus Rodriguez
 

Similar to Big Data Project using HIVE - college scorecard (20)

Mobile data collection using odk
Mobile data collection using odkMobile data collection using odk
Mobile data collection using odk
 
odkk.pptx
odkk.pptxodkk.pptx
odkk.pptx
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Bigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domainBigdata Hadoop project payment gateway domain
Bigdata Hadoop project payment gateway domain
 
Proof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics InteroperabilityProof of Concept for Learning Analytics Interoperability
Proof of Concept for Learning Analytics Interoperability
 
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
[DSC DACH 23] The Modern Data Stack - Bogdan Pirvu
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J ConferenceNodes2020 | Graph of enterprise_metadata | NEO4J Conference
Nodes2020 | Graph of enterprise_metadata | NEO4J Conference
 
When Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t WorkWhen Data Visualizations and Data Imports Just Don’t Work
When Data Visualizations and Data Imports Just Don’t Work
 
Business Intelligence in Laymen terms
Business Intelligence in Laymen termsBusiness Intelligence in Laymen terms
Business Intelligence in Laymen terms
 
Resume ricky jairath
Resume   ricky jairathResume   ricky jairath
Resume ricky jairath
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
Cross-Tier Application and Data Partitioning of Web Applications for Hybrid C...
 
School performance management analytics in cloud
School performance management analytics in cloudSchool performance management analytics in cloud
School performance management analytics in cloud
 
Building a Big Data Pipeline
Building a Big Data PipelineBuilding a Big Data Pipeline
Building a Big Data Pipeline
 

Recently uploaded

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 

Recently uploaded (20)

The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 

Big Data Project using HIVE - college scorecard

  • 1. COLLEGE SCORECARD ANALYSIS USING HIVE Presented By: Abhishek Kumar Anurag Anand Aditya Patil Siva Sai
  • 2. TABLE OF CONTENT • What is BigData • College Scorecard • What is our Project • What technology we used • SDLC • Hive Queries • Graphs and Final Results • Conclusion • Github and Data Source • References
  • 3. BIG DATA IN AROUND THE WORLD
  • 5. A Applications Of Big Data Homeland Security Smarter Healthcare Sales Telecom Manufacturing Traffic Control Analytics Search Quality
  • 6. DATA SET SOURCE - http://catalog.data.gov/dataset/road-traffic-injuries-2002-2010
  • 8. COLLEGE SCORECARDS MAKE IT EASIER FOR STUDENTS TO SEARCH FOR A COLLEGE THAT IS A GOOD FIT FOR THEM. THEY CAN USE THE COLLEGE SCORECARD TO FIND OUT • Popular Colleges among students • Affordability • Net Price • No of enrollments • State with Most number of University
  • 9. TECHNOLOGIES WHAT HAVE WE USED • Microsoft Power BI • Apache Ambari - Version 2.1.2 • Hortonworks Sandbox with HDP 2.4 • HIVE • Microsoft Excel • Putty - Release 0.65 • Google Fusion Table
  • 11. SYSTEM DEVELOPMENT LIFE CYCLE Planning • Defined Scope • Requirement Gathering • Time Estimation Analysis • Gathered data from Data.Gov Design • Gathered required softwares such as Azure, Power View, Microsoft Power BI Impleme ntation • Developed Queries & Created Tables Testing • Analysis made on the created Tables using graph and Map
  • 12. WHAT IS OUR PROJECT • Data analysis is done on College Student Data Cost of college Tution Fee Admission Rate Popular Colleges Popular States Biggest Universities • Data analysis is done by using HDFS Cluster, HiveQL • Analyzed data will be displayed using MS Power BI & Power Query in the form of Graphs and Maps.
  • 13. CREATING THE CLUSTER IN SANDBOX
  • 14. USING PUTTY TO LOGIN TO CLUSTER AND CHECK HIVE STATUS
  • 15. PROCESS OF ANALYSIS Step 1- Data CLEANING by removing unwanted an NULL column. Step-2- LOADING Data to HDFS STEP-3- Running HQL Queries Steo-4-Saving results in CVS files Step-5- Combining the results into one Excel file. Step-6- Analyzing data through Power BI & EXCEL.
  • 16. FLOWCHART OF DATA ANALYSIS DOWNLOAD DATA FROM DATA.GOV Uploaded the txt files into HDFS Using Ambari Created tables using HiveQL Analysed data using the query, Microsoft BI and powerview Analysis of Bar and line Graphs.
  • 17. DATA UPLOAD VIA AMBARI • We have put data in HDFS using Ambari
  • 19. CREATING A COST TABLE • CREATE TABLE newcost2011( • UNITID INT, INSTNM STRING,CITY STRING,CONTROL INT,ADM_RATE FLOAT,ADM_RATE_ALL FLOAT,TUITFTE FLOAT,TUITIONFEE_IN FLOAT,TUITIONFEE_OUT FLOAT,COSTT4_A FLOAT, UGDS INT) • COMMENT 'This is the Student 2011 data' • ROW FORMAT DELIMITED • FIELDS TERMINATED BY 't' • STORED AS TEXTFILE; • LOAD DATA INPATH '/tmp/newcost2011.txt' OVERWRITE INTO TABLE newcost2011;
  • 21. RESULTS OF THE QUERY
  • 23. HIVE QUERY FOR SORTING DATA
  • 24. MICROSOFT POWER BI USED FOR DATA ANALYSIS
  • 25. COMBINING THE RESULT QUERY TOGETHER FOR ANALYSIS
  • 28. GRAPHICAL REPRESENTATION USING GOOGLE FUSION TABLES
  • 29.
  • 30. GRAPHICAL REPRESENTATION USING POWER BI MOST COSTLY UNIVERSITY IS IN EAST COST
  • 32. BEST ADMIT RATES AMONG COLLEGES
  • 34. SATE WITH HIGHEST NUMBER OF UNIVERSITY
  • 36. CONCLUSION • COSTLIET UNIVERSITY is NEW YORK UNIVERSITY • Most of costly university are located in east cost i.e New York and nearby area • BIGGEST UNIVERSITY of Phoenix-Online Campus • Biggest Major Is business. • STATE WITH MOST UNIVERSITY WITH 10000 student is California i.e 16 • CUNY College of Staten Island has highest admission rate i.e. its easiet to get admission here. • CHEAPEST College is High Point University
  • 37. LINK • GITHUB Link: (Code Only) https://github.com/abhimisedu/CIS520GroupF • Dataset Link: (Dataset Size – 1580 MB uncompressed) http://catalog.data.gov/dataset/college-scorecard
  • 38. REFERENCE • https://azure.microsoft.com • www.Data.gov • http://www.lynda.com/Hadoop-tutorials/ • http://www.tutorialspoint.com/big_data_tutorials.htm • http://searchstorage.techtarget.com/guides/Big-data-tutorial-Everything-you-need- to-know