Internship Report
Ghulam Ishaq Khan Institute of Engineering
Sciences and Technology
Name: Salman Khan
Registration Number: 2012338
Organization: Teradata
Duration: 1 Month (Four Weeks)
Submission Date: 30th November 2015
Faculty of Computer Science and Engineering (Fall 2016)
Acknowledgement:
First, I would like to thank Sir Hassan Waqar and Awais Ijaz, Professional
Services Consultants, for giving me the opportunity to do an internship
within the organization. For me it was a unique experience to be at
Teradata Pakistan and to study the interesting field of data warehousing.
It also helped me regain my interest in databases and to make new plans
for my future career.
I would also like to thank all the people who worked in the office of Teradata
in Lahore. With their patience and openness they created an enjoyable
working environment.
Furthermore, I want to thank all the students with whom I did the fieldwork.
We experienced great things together, and they showed me their final
year projects.
Finally, I would like to thank all the administration staff of the Ghulam Ishaq
Khan Institute of Engineering Sciences and Technology and the faculty
members of the Computer Science department, especially Sir Fawad.
EXECUTIVE SUMMARY:
This report documents my internship program. It provides a brief study of
the operations, functions, and tasks I performed during the internship.
Teradata plays a leading role in providing powerful enterprise big data
analytics and services, including Data Warehousing, Data-Driven
Marketing, BI and CRM.
In preparing this report I have tried my best to present all possible
information about the operations, functions, tasks and the corporate
information of Teradata Pakistan in a brief and comprehensive form.
Letter of Undertaking:
Internship Certificate:
About Teradata:
Introduction:
Teradata Corporation is a publicly held international computer company
that sells analytic data platforms, marketing applications and related
services. Its analytics products are meant to consolidate data from different
sources and make the data available for analysis. Teradata marketing
applications are meant to support marketing teams that use data analytics
to inform and develop programs.
Teradata is an enterprise software company that develops and sells
a relational database management system (RDBMS) with the same name.
Teradata is publicly traded on the New York Stock Exchange (NYSE) under
the stock symbol TDC.
Teradata Products:
The Teradata product is referred to as a "data warehouse system" and
stores and manages data. The data warehouses use a "shared nothing"
architecture, which means that each server node has its own memory and
processing power. Adding more servers and nodes increases the amount
of data that can be stored. The database software sits on top of the servers
and spreads the workload among them. Teradata sells applications and
software to process different types of data. In 2010, Teradata added text
analytics to track unstructured data, such as word processor documents,
and semi-structured data, such as spreadsheets.
Teradata's product can be used for business analysis. Data warehouses
can track company data, such as sales, customer preferences, product
placement, etc.
Teradata Database:
Teradata is a relational database management system (RDBMS) with the
following characteristics:
• Teradata is an open system, running on a UNIX MP-RAS or Windows
server platform.
• Teradata is capable of supporting many concurrent users from various
client platforms.
• Teradata is compatible with industry standards (ANSI compliant).
• Teradata is completely built on a parallel architecture.
Why Teradata?
There are plenty of reasons why customers choose Teradata.
 Teradata supports larger data warehouses than all competitors
combined.
 Teradata Database can scale from 100 gigabytes to over 100
petabytes of data on a single system without losing performance.
This is called scalability.
 Provides a parallel-aware Optimizer that makes query tuning
unnecessary to get a query to run.
 Automatic and even data distribution eliminates complex indexing
schemes or time-consuming reorganizations (see the sketch after
this list).
 Teradata Database can handle the most concurrent users, who are
often running multiple, complex queries.
 Designed and built with parallelism.
 Supports ad-hoc queries using SQL.
 Single point of control for the DBA (Teradata Manager).
 Unconditional parallelism (parallel architecture).
 Teradata provides the lowest total cost of ownership (TCO).
 High availability of data because there is no single point of failure;
fault tolerance is built into the system.
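To illustrate the automatic data distribution mentioned in the list above, here is a
minimal sketch of a Teradata table definition. The database, table, and column
names are hypothetical and used only for illustration; the point is that the declared
PRIMARY INDEX determines how rows are hashed and spread evenly across the
parallel units (AMPs), so no manual partitioning or reorganization scheme is needed.

CREATE TABLE sales_db.customer
(
    customer_id   INTEGER NOT NULL,
    customer_name VARCHAR(50),
    city          VARCHAR(30)
)
PRIMARY INDEX (customer_id);  -- rows are hashed on customer_id and distributed evenly across AMPs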
Teradata Database can be used for:
 Enterprise data warehousing
 Active data warehousing
 Customer relationship management
 Internet and E-Business
 Data marts.
OBJECTIVE OR PURPOSE OF INTERNSHIP:
The two main purposes of the study are as follows.
1: General Purpose / Objective
 To learn how people work in an organization.
 To gain work experience at Teradata, which will help me in the job-
seeking process.
 To learn what skills employers want from an employee.
 To see the practical application of our professional studies.
2: Specific Purpose / Objective
The specific purposes of the study include:
 To learn how employees in a large organization handle a problem.
 To get an internship certificate from Teradata.
 To use their database software and to run queries on it.
 To objectively observe the operations of Teradata in general.
Interview Questions:
 Tell me about yourself.
 What can you do for us that other candidates can't?
 What is parallelism in Teradata?
 Can we load a multiset table using MLOAD?
 What is the use of BI in Teradata?
 What is a snowflake schema in a database?
 What is a star schema?
 Why is normalization necessary?
 Why is de-normalization necessary?
 What are views in a database?
Description of the internship:
This report is a short description of my four-week internship, carried out as
a compulsory component of the BS in Computer Science. The internship was
carried out within the organization Teradata in summer 2015. As I am
interested in databases, the work was concentrated on data warehousing.
At the beginning of the internship I formulated several learning goals, which
I wanted to achieve:
 to understand the functioning and working conditions of a non-
governmental organization;
 to see what it is like to work in a professional environment;
 to see if this kind of work is a possibility for my future career;
 to use my gained skills and knowledge;
 to see what skills and knowledge I still need to work in a professional
environment;
 to learn about organizing a research project (planning, preparation,
permissions, etc.);
 to learn about research methodologies (field methods / methods to
analyze data);
 to get fieldwork experience and collect data in an environment unknown
to me;
 to enhance my communication skills;
 to build a network.
This internship report describes the activities that contributed to achieving
a number of these stated goals.
1st Week:
During the first week I revised basic database concepts and practiced
writing complex queries.
This task was given to me as homework, while in the office I was given a
training session on using a software product called Tableau.
Tableau Software:
Tableau Software is an American computer software company
headquartered in Seattle, Washington. It produces a family of
interactive data visualization products focused on business intelligence.
Products:
Tableau offers five main products: Tableau Desktop, Tableau Server,
Tableau Online, Tableau Reader and Tableau Public. Tableau Public and
Tableau Reader are free to use, while both Tableau Server and Tableau
Desktop come with a 14-day fully functional free trial period, after which the
user must pay for the software. Tableau Desktop comes in both a
Professional and a lower cost Personal edition. Tableau Online is available
with an annual subscription for a single user, and scales to support
thousands of users.
2nd Week:
The picture below shows my assignment no. 1.
The list below was sent to me by Miss Maria; it contains the names of
different companies.
Conclusion:
This was my 2nd week task, which I completed with full dedication and
hard work.
3rd Week:
In the 3rd and 4th weeks I was given the task of creating a data warehouse.
In the 3rd week I created a schema diagram for the normalized data and then
created the tables. After creating the database, the next step was to populate
it with up to 500,000 rows per table in the normalized database.
Here is the schema diagram for the normalized data.
Fact Tables:
A fact table is the central table in a star schema of a data warehouse. A
fact table stores quantitative information for analysis and is often
denormalized.
Dimension Tables:
Contrary to fact tables, dimension tables contain descriptive attributes (or
fields) that are typically textual fields (or discrete numbers that behave like
text). These attributes are designed to serve two critical purposes: query
constraining and/or filtering, and query result set labeling.
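To make the two table types concrete, here is a minimal star schema sketch in SQL.
The table and column names are hypothetical and are not the ones used in my
assignment; the sketch only shows the shape of a central fact table surrounded by
dimension tables.

CREATE TABLE dim_product
(
    product_id   INTEGER PRIMARY KEY,
    product_name VARCHAR(50),    -- descriptive attributes used for filtering and labeling
    category     VARCHAR(30)
);

CREATE TABLE dim_store
(
    store_id   INTEGER PRIMARY KEY,
    store_name VARCHAR(50),
    city       VARCHAR(30)
);

CREATE TABLE fact_sales          -- central fact table: one row per sale
(
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    sale_date  DATE,
    quantity   INTEGER,          -- quantitative measures for analysis
    amount     DECIMAL(10,2)
);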
Code:
The following code generates up to 500,000 records, stores them in a text
file, and the file is then loaded into the database tables.
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <string>
using namespace std;

// Pool of characters used for the random alphanumeric field.
static const char alphanum[] =
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz";
int stringLength = sizeof(alphanum) - 1;

// Return one random character from the alphanumeric pool.
char genRandom()
{
    return alphanum[rand() % stringLength];
}

int main()
{
    srand(time(NULL));

    ofstream myfile;
    myfile.open("Name.txt");

    for (int i = 0; i < 500000; i++) {
        int x = rand() % 8 + 4;                // random name length between 4 and 11
        myfile << i << " | ";                  // first field: row id

        for (int j = 0; j < x; j++) {          // second field: random uppercase name
            int num = rand() % 26;
            char upper = static_cast<char>('A' + num);
            myfile << upper;
        }

        myfile << " | ";
        for (int z = 0; z < 21; z++)           // third field: 21 random alphanumeric characters
            myfile << genRandom();

        myfile << "\n";                        // end of record
    }

    myfile.close();
    return 0;
}
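One possible way to load the generated Name.txt file into a table is with the
Teradata BTEQ utility. This is only a sketch under assumptions: the database
name test_db, the table person, and the column widths are hypothetical, and the
script assumes '|' as the field delimiter (the generator above also writes spaces
around the delimiter, which would need to be trimmed or accounted for).

.IMPORT VARTEXT '|' FILE = Name.txt;
.REPEAT *
USING (id VARCHAR(10), name VARCHAR(15), code VARCHAR(25))
INSERT INTO test_db.person (id, name, code)
VALUES (:id, :name, :code);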
4th Week:
In the last week, the task was to denormalize the above database, build a
warehouse, and check the time difference between queries on the normalized
and the denormalized data.
Schema diagram for denormalized data.
Comparison of Normalized and Denormalized queries:
Normalized Query:
Denormalized Query:
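The two versions were compared by running each query and noting the execution
time. The actual queries are not reproduced here in text form; as an illustration
only, here is a sketch of the kind of queries that were compared, with hypothetical
table and column names that are not the ones from my schema.

-- Normalized: answering the question requires joining several tables
SELECT c.customer_name, SUM(s.amount) AS total_amount
FROM sales s
JOIN orders o    ON s.order_id    = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.customer_name;

-- Denormalized: the same answer comes from a single wide table,
-- trading extra storage for fewer joins
SELECT customer_name, SUM(amount) AS total_amount
FROM sales_denorm
GROUP BY customer_name;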