Data management

Student Data Management for Data Gyan
PROJECT PURPOSE:
To consolidate student data collected from various sources, including trusted websites, and update it in the database, highlighting the key attributes that indicate each student's potential.
Source of Data:
Internal student data from the firm, and external data from online sources such as Facebook, LinkedIn, and Naukri.com.
Tools Used:
• Microsoft Excel
• Microsoft SQL Server Database
• Tableau Desktop
PROJECT TITLE

• Acquiring student data from different sources and updating it in the database.
• The overall strategy of the project is to draw insights from the data and to identify the pivotal features that bring out each student's potential.
• All the insights obtained from the analysis will be presented to the management team.

• Data collection was done in two parts:
1. Internal Data:
The management team provided the internal data for the analysis.
2. External Data:
External data was extracted from online sources such as LinkedIn, Facebook, and Naukri. Both the internal and the external data were recorded in Microsoft Excel for further processing.

Internal Data:
We received the internal data from the management team of Data Gyan. Its attributes were Name, Gender, Email, Contact No, Educational background, Age, Courses pursued, Fees, Installment (Y/N), and Preference (Weekend or Weekdays).
LinkedIn:
This external data was gathered through the marketing team. Its attributes were Name, Gender, Email, Contact No, Age, Course pursued, Fees, Educational background, and Experience in Years.
Naukri:
Name, Gender, Email, Contact No, State, Fees, Educational background, Skills, and Experience in Years.
Facebook:
Name, Gender, Email, Contact No, State, Educational background, and Hobbies.
Once the data had been extracted from the internal and external sources, it was assembled in separate Excel sheets. These sheets were then merged into a single spreadsheet called Master data.

A detailed description of the steps followed to collect the data and collate it into the master database is as follows:
• First, a template for the master data sheet was prepared in MS Excel, and the important attributes to be captured in it, and subsequently in the database, were finalized after thorough discussions with the team.
• Attributes such as Name, Email, Contact No, Age, Gender, State, Course pursued, Skills, Pass out year, Pass out month, and Educational background were collected for 500+ students from the internal and external data into the Master data sheet.

Before the data was transferred into the master sheet, it was cleansed and modified to make it uniform.
The attributes whose data were modified are as follows:
LinkedIn: Name, Email, Course, Skills, University, and Experience.
Naukri: Name, Email, Qualifications, Specializations, Fees, Pass out year, Pass out month, Skills, State, University, and Experience in years.
Facebook: Name, Email, State, University, and Hobbies.
Some sources contain only a full name, while others contain separate first and last names. We therefore derived the full name where first and last names were given, and split the full name into first and last names where only the full name was given.

The master data must contain only cleaned data, so we applied methods such as data cleansing, data profiling, and data mining to prepare it.
The Excel functions used for this were CONCATENATION, VLOOKUP, Nested IF, and IF.

CONCATENATION:
Concatenation is the process of merging two or more strings into a single output.
• Concatenation was used to merge ‘First name’ and ‘Last name’ to get the ‘Full Name’.
• Concatenation was also used to merge ‘First name’ and ‘Email (from test table)’ to get the final Email_Id.
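
The name handling described above can be sketched in Python as a rough equivalent of the Excel formulas. This is only an illustration: the helper names `full_name` and `split_name` and the sample names are hypothetical, and the split rule (first word is the first name, the rest is the last name) is an assumption, since the original formula is not shown.

```python
def full_name(first, last):
    """Join first and last name with one space, similar to =CONCATENATE(A2, " ", B2)."""
    parts = [p.strip() for p in (first, last) if p and p.strip()]
    return " ".join(parts)


def split_name(full):
    """Split a full name: first word is the first name, the remainder is the last name."""
    parts = full.strip().split(maxsplit=1)
    if not parts:
        return ("", "")
    first = parts[0]
    last = parts[1] if len(parts) > 1 else ""
    return (first, last)


# Hypothetical sample values, not taken from the project data
joined = full_name("Asha", "Verma")
first, last = split_name("Ravi Kumar Singh")
```

Names with more than two words need a project-specific rule; here everything after the first word is treated as the last name.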

VLOOKUP (Vertical Lookup):
It is a function that makes Excel search for a value in the first column of a table range and return a value from a different column in the same row.
• VLOOKUP was used to fill in the Skills and Course columns.
• VLOOKUP looked up a value in the test table using the selected table array, the column number, and the Boolean value FALSE, which returns only an exact match for that column.
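
The exact-match lookup described above can be sketched as follows. The function `vlookup`, the student IDs, and the contents of `test_table` are hypothetical stand-ins; the real test table is not shown in the slides. Unlike Excel, this sketch compares text case-sensitively and returns `None` where Excel would show #N/A.

```python
def vlookup(value, table, col_index):
    """Return the col_index-th column (1-based) of the first row whose
    first column equals value, like VLOOKUP(value, table, col_index, FALSE)."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return None  # Excel would return #N/A here


# Hypothetical lookup table: (student id, skill, course)
test_table = [
    ("S001", "SQL", "Database"),
    ("S002", "Tableau", "Data analytics"),
    ("S003", "Java", "Programming"),
]

skill = vlookup("S002", test_table, 2)
course = vlookup("S003", test_table, 3)
missing = vlookup("S999", test_table, 2)
```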

Nested IF:
A Nested IF tests multiple conditions by placing one IF function inside another. Excel allows up to 64 nested IF functions.

• Nested IF was used to derive the data for a particular column.
• First, a condition on a related column was given in an IF statement, followed by its true value. A second condition was given in another IF statement, again with its true value. In the final IF statement, both the true and the false values were specified to produce the output.

IF Condition:
• IF is a logical function used for decision making: it tests a condition on the contents of a cell and returns one value if the condition is TRUE and another if it is FALSE.
• Here, a condition was specified in an IF statement together with a true value and a false value to produce the final output.
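
The IF and Nested IF logic described above can be illustrated with a small sketch. The rules below are hypothetical: the slides do not say which columns the formulas were applied to, so the installment label and the experience bands are assumptions chosen only to show the shape of the logic.

```python
def payment_mode(installment_flag):
    """Single IF, similar to =IF(A2="Y", "Installment", "One-time")."""
    return "Installment" if installment_flag == "Y" else "One-time"


def experience_band(years):
    """Nested IF, similar to =IF(A2=0, "Fresher", IF(A2<=2, "Junior", "Experienced")).
    Each earlier condition carries only a true value; the last one has both."""
    if years == 0:
        return "Fresher"
    elif years <= 2:
        return "Junior"
    else:
        return "Experienced"


modes = [payment_mode(flag) for flag in ("Y", "N")]
bands = [experience_band(y) for y in (0, 2, 5)]
```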

After the data has been retrieved and combined from multiple sources (extracted), then cleaned and formatted (transformed), it is loaded into a storage system. In this case we used a SQL Server database.

Procedure for importing data:
Step 1: Expand Databases, right-click Capstone Project (the database) > Tasks > Import Data.
Step 2: Choose Microsoft Excel as the data source, set the file path and Excel version, and click Next.
Step 3: Select the Master data sheet, optionally edit the mappings, and click Next.
Step 4: The data is loaded into SQL Server, and a success message is displayed once it has reached the destination.
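
The project used the SQL Server import wizard, as described in the steps above. For readers who want to try the load step without SQL Server, the following is a minimal sketch that uses Python's built-in SQLite as a stand-in. The function `load_master_data`, the reduced column set, and the two sample rows are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical extract of the master sheet, saved as CSV
SAMPLE_CSV = """Name,Gender,State,Skills
Asha Verma,Female,Bihar,SQL
Ravi Singh,Male,Punjab,BI
"""


def load_master_data(conn, csv_text):
    """Create the Masterdata table if needed and insert every CSV row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Masterdata "
        "(Name TEXT, Gender TEXT, State TEXT, Skills TEXT)"
    )
    conn.executemany(
        "INSERT INTO Masterdata (Name, Gender, State, Skills) "
        "VALUES (:Name, :Gender, :State, :Skills)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
loaded = load_master_data(conn, SAMPLE_CSV)
```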

Skills wise Student count:
Query:
SELECT Skills, COUNT(Skills) AS total_skills_count
FROM Masterdata
GROUP BY Skills
ORDER BY total_skills_count DESC;
OUTPUT:

Age wise Student count:
Query:
SELECT Age, COUNT(Age) AS total_age_count
FROM Masterdata
GROUP BY Age
ORDER BY total_age_count DESC;
OUTPUT:

Fees wise Student count:
Query:
SELECT Fees, COUNT(Fees) AS total_fees_count
FROM Masterdata
GROUP BY Fees
ORDER BY total_fees_count DESC;
OUTPUT:

Preference wise Student count:
Query:
SELECT Preference, COUNT(Preference) AS total_preference_count
FROM Masterdata
GROUP BY Preference
ORDER BY total_preference_count DESC;
OUTPUT:

Installments wise Student count:
Query:
SELECT [Installment_Y/N], COUNT([Installment_Y/N]) AS total_installment_count
FROM Masterdata
GROUP BY [Installment_Y/N]
ORDER BY total_installment_count DESC;
OUTPUT:

Specializations wise Student count:
Query:
SELECT Specializations, COUNT(Specializations) AS total_specializations_count
FROM Masterdata
GROUP BY Specializations
ORDER BY total_specializations_count DESC;
OUTPUT:

Experience wise Student count:
Query:
SELECT Experience_in_years, COUNT(Experience_in_years) AS total_experience_count
FROM Masterdata
GROUP BY Experience_in_years
ORDER BY total_experience_count DESC;
OUTPUT:

Course wise Student count:
Query:
SELECT Course, COUNT(Course) AS total_course_count
FROM Masterdata
GROUP BY Course
ORDER BY total_course_count DESC;
OUTPUT:

University wise Student count:
Query:
SELECT University, COUNT(University) AS total_university_count
FROM Masterdata
GROUP BY University
ORDER BY total_university_count DESC;
OUTPUT:

Qualifications wise Student count:
Query:
SELECT Qualifications, COUNT(Qualifications) AS total_qualifications_count
FROM Masterdata
GROUP BY Qualifications
ORDER BY total_qualifications_count DESC;
OUTPUT:

State wise Student count:
Query:
SELECT State, COUNT(State) AS total_state_count
FROM Masterdata
GROUP BY State
ORDER BY total_state_count DESC;
OUTPUT:

Gender wise Student count:
Query:
SELECT Gender, COUNT(Gender) AS total_gender_count
FROM Masterdata
GROUP BY Gender
ORDER BY total_gender_count DESC;
OUTPUT:

Hobbies wise Student count:
Query:
SELECT Hobbies, COUNT(Hobbies) AS total_hobbies_count
FROM Masterdata
GROUP BY Hobbies
ORDER BY total_hobbies_count DESC;
OUTPUT:

Year wise Student count:
Query:
SELECT Passout_year, COUNT(Passout_year) AS total_passout_year_count
FROM Masterdata
GROUP BY Passout_year
ORDER BY total_passout_year_count DESC;
OUTPUT:

Month wise Student count:
Query:
SELECT Passout_month, COUNT(Passout_month) AS total_passout_month_count
FROM Masterdata
GROUP BY Passout_month
ORDER BY total_passout_month_count DESC;
OUTPUT:
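
All of the count queries above follow one pattern: group by a column, count, and sort descending. As a rough sketch, that pattern can be wrapped in a single helper. SQLite stands in for SQL Server here, and the helper `count_by`, the `ALLOWED_COLUMNS` set, and the three sample rows are hypothetical.

```python
import sqlite3

# Only known column names may be placed into the SQL text
ALLOWED_COLUMNS = {"Skills", "Gender", "State"}


def count_by(conn, column):
    """Return (value, count) pairs for one column, largest count first."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column}")
    sql = (
        f'SELECT "{column}", COUNT("{column}") AS total_count '
        f'FROM Masterdata GROUP BY "{column}" ORDER BY total_count DESC'
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Masterdata (Skills TEXT, Gender TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Masterdata VALUES (?, ?, ?)",
    [
        ("SQL", "Male", "Bihar"),      # hypothetical rows
        ("SQL", "Female", "Bihar"),
        ("BI", "Male", "Punjab"),
    ],
)

skills_counts = count_by(conn, "Skills")
gender_counts = count_by(conn, "Gender")
```

Checking the column name against a fixed set matters because column names cannot be passed as query parameters.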

• In a SQL database it is easier to extract exactly the data we require.
• An organization may hold a large number of master data files, so maintaining them in a database helps.
• An MS Excel worksheet can hold at most 1,048,576 rows (about 10 lakh).
• Hence, when we need to deal with much larger volumes of data, importing it into SQL is useful.
The data was then visualized to draw important insights regarding potential business opportunities and target areas, so that Data Gyan can take important decisions such as:
• Identifying the places or areas where it can set up centers.
• Modifying or expanding the course portfolio.
• Identifying important areas for investment and coming up with appropriate marketing strategies.

• Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Skills wise Student count:
• There are 10 different skills: BI, C, C++, Excel, IT, Java, R, SQL, Tableau, and VBA.
• The text table shows that most students possess one of three skills: BI, SQL, or IT.
• The rest of the students possess the remaining 7 skills: C, C++, Excel, Java, R, Tableau, and VBA.

State wise Course:
• There are 5 courses available at Data Gyan: Data analytics, Business analytics, Software, Programming, and Database.
• The bar graph shows that Programming is the most preferred course among the students in Bihar.
• It also shows that Data analytics and Programming are the least preferred courses among the students of Rajasthan and Punjab respectively.

Installments wise Students:
• The circle map shows that 32 students opted for the installment payment mode and 531 students did not, so we conclude that most students opted for the one-time payment mode.

Qualifications wise Students:
• There are 8 different qualifications: BA, BBA, BCom, BSc, BTech, MBA, MCA, and MSc.
• The highlight table shows that most students hold one of 4 qualifications: BA, BSc, BTech, or MCA.
• Fewer students hold BCom, MBA, or MSc qualifications.
• The smallest number of students hold a BBA.

Year wise Students:
• The students passed out in 3 academic years: 2018, 2019, and 2020.
• The bar graph shows that 57 students passed out in 2018, 224 in 2019, and 282 in 2020.

Month wise Students:
• The students passed out in 3 months: July, August, and October.
• The line graph shows that 57 students passed out in October, 224 in July, and 282 in August.

Specializations wise Students:
• The bubble chart shows 8 specializations: Physics, Software Development, Solid Mechanics, History, Accounting, Chemistry, Finance, and Entrepreneurship.
• Most students hold one of 4 specializations: Software Development, Solid Mechanics, History, or Chemistry.
• Fewer students specialize in Physics, Accounting, or Finance.
• The smallest number of students pursue Entrepreneurship.

Experience wise Students:
• The bar graph shows that students have between 0 and 7 years of experience.
• The largest number of students have 2 years of experience.
• The smallest number of students have 4 years of experience.

Gender wise Students:
• The pie chart shows that, out of the total number of students, 388 are male and 175 are female.
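
The counts reported for the installment, pass out year, and gender charts above should each add up to the same number of students. A small sketch of that cross-check, using only the figures stated in the slides (the dictionary and variable names are illustrative):

```python
# Counts as reported in the charts above
reported = {
    "installments": {"Installment": 32, "One-time": 531},
    "passout_year": {2018: 57, 2019: 224, 2020: 282},
    "gender": {"Male": 388, "Female": 175},
}

# Total students implied by each breakdown
totals = {name: sum(counts.values()) for name, counts in reported.items()}

# True when every breakdown covers the same number of students
consistent = len(set(totals.values())) == 1

# Share of male students as a fraction of the gender total
male_share = reported["gender"]["Male"] / totals["gender"]
```

Each breakdown sums to 563 students, which agrees with the 500+ students mentioned for the master data sheet.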

State wise Students:
• The map shows that students from all over India are interested in pursuing courses at Data Gyan Institute.
• We conclude from the map that the largest number of interested students are from Bihar.
