Revenue & Employment Analysis of
International Students in USA
Team Members: Priyanka Kale, Apekshit Bhingardive, Aditya Verma
Guide: Dr. Jongwook Woo
24th Annual Student Symposium, CSULA
26th February 2016
What is Big Data?
• Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis.
• It's not the amount of data that's important. It's what we do with the data that matters.
• Machine learning: big data analysis often doesn't ask why and simply detects patterns.
• Digital footprint: big data is often a cost-free byproduct of digital interaction.
Purpose of Analysis
• To develop a system that helps determine the revenue generated by international students.
• To examine the relationship between new international enrollments and institutional income at public colleges, universities and professional organizations in the US.
Purpose of Analysis (continued)
• To understand the effects of increased international student enrollment on net revenue generation in the US
• To find out the income from universities
• To predict the impact of international students on revenue generation
• To predict employment opportunities in the US
• Basic formula for calculating economic Benefit
Analysis is done using:
• Hadoop Distributed File System (HDFS) for analysis of large data sets
• Hadoop environment: Hortonworks Sandbox on Azure
• Python and Hive [PyHive] in an IPython Notebook
• HUE
• Google Fusion Tables
• WEKA framework
Loading data into HDFS:
• The file was uploaded using the Hadoop command-line interface.
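The upload step can be sketched from Python by shelling out to the Hadoop command-line interface. This is a minimal sketch under stated assumptions, not the team's actual commands: the local file name `ipeds_tuition.csv` and the HDFS directory `/user/hue/ipeds` are hypothetical, and the default is a dry run that only prints the commands. `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` are standard Hadoop file system shell commands.

```python
import shlex
import subprocess


def build_hdfs_upload_commands(local_path, hdfs_dir):
    """Return the Hadoop CLI commands that create hdfs_dir and copy a file into it."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],            # create target directory if missing
        ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir],  # copy the file, overwriting if present
    ]


def upload_to_hdfs(local_path, hdfs_dir, dry_run=True):
    """Run the upload commands; with dry_run=True only print them."""
    commands = build_hdfs_upload_commands(local_path, hdfs_dir)
    for cmd in commands:
        if dry_run:
            print(" ".join(shlex.quote(part) for part in cmd))
        else:
            # Requires the `hdfs` executable on PATH (for example, on the sandbox node).
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical file and directory names, for illustration only.
    upload_to_hdfs("ipeds_tuition.csv", "/user/hue/ipeds")
```

Passing `dry_run=False` on a machine with the Hadoop client installed would run the same two commands.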
Hortonworks Sandbox configuration
Number of nodes: 3. Size: Basic A4 with 8 cores and 14 GB memory
Creating tables in HUE from existing data
Connecting Hive through Python
• Using an IPython Notebook to write the Python code
• Embedding HiveQL inside the Python code
Executing the Hive script from Python code:
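A minimal sketch of embedding HiveQL in Python and running it through PyHive could look like the following. The table name `tuition_fees`, its columns `state` and `fees`, and the connection defaults (host `localhost`, port 10000, user `hue`) are assumptions made for illustration; the original slide's script is not reproduced here. The query function accepts any DB-API cursor, so it can be tried without a running cluster.

```python
# HiveQL embedded as a Python string. Table and column names are hypothetical.
REVENUE_BY_STATE_HQL = """
SELECT state, SUM(fees) AS total_fees
FROM tuition_fees
GROUP BY state
ORDER BY total_fees DESC
"""


def connect_to_hive(host="localhost", port=10000, username="hue"):
    """Open a HiveServer2 connection with PyHive.

    The import is done lazily so the rest of this module loads even when
    PyHive is not installed. Host, port and username are assumed defaults.
    """
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=username)


def revenue_by_state(cursor):
    """Run the embedded query on a DB-API cursor and return (state, total_fees) rows."""
    cursor.execute(REVENUE_BY_STATE_HQL)
    return cursor.fetchall()
```

With a running sandbox, `revenue_by_state(connect_to_hive().cursor())` would return one row per state, largest total first. Keeping the query in a function that takes a cursor is a design choice that makes the same code usable from a notebook cell or a script.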
Visualizing data with Graphs
[Bar chart: total earning from fees by state or territory, in billions of US dollars; vertical axis from $0.00 to $25.00; horizontal axis lists states and territories from Alabama through Texas]
Major earning states
Percentage of total income:
• California: 9.55%
• New York: 10.84%
• Pennsylvania: 7.36%
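A percentage-of-total figure like the ones above is each state's fee total divided by the sum over all states. The sketch below shows that arithmetic; the input values are made up for illustration and are not the project's data, and the function names are hypothetical.

```python
def share_of_total(totals):
    """Map each key to its percentage of the overall sum, rounded to 2 decimals."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("totals sum to zero; shares are undefined")
    return {key: round(100.0 * value / grand_total, 2) for key, value in totals.items()}


def top_n(shares, n=3):
    """Return the n largest (key, share) pairs, largest first."""
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)[:n]


# Illustrative values only (not taken from the IPEDS data).
example_totals = {"State A": 50.0, "State B": 30.0, "State C": 15.0, "State D": 5.0}
example_shares = share_of_total(example_totals)
print(top_n(example_shares))
```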
Visualizing Data in Google Fusion Tables
Supervised Learning using Classification:
• The WEKA framework was used to classify the states by their total earnings.
• The UserClassifier algorithm provided by WEKA was used to generate the classification graph.
• The final output of the Hive script executed from Python was processed with this algorithm.
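One way to hand the Hive output to WEKA is to write it as an ARFF file, WEKA's native text format. The sketch below is an assumption about that hand-off, not the team's actual procedure: the attribute names, the three class labels and the cutoff values are hypothetical, and the simple threshold rule stands in for the interactive splits a user would draw in UserClassifier.

```python
def earnings_class(total_fees, low_cutoff, high_cutoff):
    """Label a total as 'low', 'medium' or 'high' using two hypothetical cutoffs."""
    if total_fees >= high_cutoff:
        return "high"
    if total_fees >= low_cutoff:
        return "medium"
    return "low"


def to_arff(rows, low_cutoff, high_cutoff, relation="state_earnings"):
    """Render (state, total_fees) rows as ARFF text with a nominal class attribute."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute state string",
        "@attribute total_fees numeric",
        "@attribute earnings_class {low,medium,high}",
        "",
        "@data",
    ]
    for state, total in rows:
        label = earnings_class(total, low_cutoff, high_cutoff)
        # Quote the state name so values containing spaces stay one field.
        quoted = "'" + state.replace("'", "\\'") + "'"
        lines.append(f"{quoted},{total},{label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative rows and cutoffs only.
    print(to_arff([("NEW YORK", 20.0), ("IDAHO", 0.5)], low_cutoff=1.0, high_cutoff=10.0))
```

Some WEKA classifiers may not accept string attributes, so the state column might need to be removed or converted to a nominal attribute before classification.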
Supervised Learning using Classification (continued)
The class colors divide the states into categories. For instance, New York lies in the orange zone, as it is among the top revenue-generating states.
Value Proposition:
• International student mobility trends: by 2017, the global middle class is projected to increase its spending on educational products and services by nearly 50 percent.
• Institutions can take this growth into consideration!
• The United States as a more welcoming nation!
Predictive Modelling:
Employment Analysis: How?
• Finding data on where international students work after graduation
• Based on the number of students employed in current and past years
• Number of employers hiring international students in every field of graduate study [job positions]
References:
• https://nces.ed.gov/ipeds/datacenter/
• https://github.com/priya708/Project-528
• https://gitlab.com/Addylad/Project528BigData/tree/47b3e6469bff4e9b7cbe0d743cb8ad9520dbb786/DataSource
• https://cwiki.apache.org/confluence/display/Hive/Tutorial
• https://hortonworks.com/tutorials
• http://www.nafsa.org/

Thank You!
