 
 
 
 
Advanced Data Management 
Technologies 
Project Module 1 – Data Warehouse 
 
 
 
 
 
 
 
 
 
 
Jonas Monkevičius 
Rokas Mačiulaitis 
Evija Urtāne 
 
 
 
 
 
 
 
2015 
   
1. Domain Analysis and Description
1.1. Describe domain, provide motivation
 
The domain of our data warehouse covers all necessary information about the students of the 
Free University of Bozen-Bolzano. The university consists of 5 faculties (Computer Science, Economics and 
Management, Education, Design and Art, Science and Technology). Each faculty has its own study 
programs. For example, the Computer Science faculty currently offers 3 study programs (Bachelor in 
Computer Science and Engineering, Master of Science in Computer Science, PhD in Computer 
Science), and each study program has its own students. The university is trilingual, so some of the 
students are expected to know at least three languages. 
1.2. Business processes
1.2.1. Student career
The “Student career” process covers the activities a student can perform at the university (e.g. enroll, 
study, graduate) and related statistical information about the student, such as which languages the 
student knows, which universities the student attended before and which country the student 
comes from. Information about internships is also available. 
Business questions:
About enrollment
● How many students from non-European countries enrolled in 2012? 
● How many local students (from South Tyrol) enrolled in 2011? 
● What percentage of the students enrolled last year are Italian? 
● How many students enrolled in the Computer Science faculty in 2011? 
About graduation
● How many students from Asia finished a master's degree last year? 
● How many of the students who enrolled in 2010 have graduated? 
● What is the average time to graduation? 
About studying process
● How many students are studying in the Economics Master's programs? 
● How many non-regular students (ERASMUS+, etc.) are studying in the Computer Science 
bachelor? 
● How many incoming/outgoing (mobility) students were there in 2014? 
● How many students terminated (stopped) their studies in 2012? 
About languages
● What percentage of students from abroad have German as the primary language at 
their home university? 
● How many students have an Italian language level higher than B1? 
● What percentage of students from South Tyrol chose English as their primary language at 
the University of Bolzano between 2002 and 2014? 
Dimensions:
Event, internship, student, languages and levels, exact date, study program 
Measures:
Number of students, internship grade  
1.2.2. Grades
The “Grades” process covers the activities related to exams (e.g. a student takes an exam 
and passes or fails it, or a student does not show up for the exam). 
Business questions:
● What is the average grade per student/course/lecturer/study program/faculty? 
● What is the percentage of passed/failed/no-show exams per student/course? 
Dimensions:
Exam, Student, Date, Commission 
Measures:
Passed, Grade 
1.2.3. Card usage
The “Card usage” process covers all the activities people can perform with the card (e.g. 
buying food/coffee/snacks, printing, scanning, copying, opening doors, borrowing things, loading money onto the card). 
Business questions:
● How much money do students spend in the cafeteria per week? 
● Which activity is used most, by students and by teachers? (Printing, buying in the cafeteria/ uniBar/ 
coffee machine/ snack machine, library, door opening, loans for the design faculty) 
● How many paper sheets are used for printing each month? 
● What percentage of students eat in the cafeteria? 
Dimensions:
People, Action, Date 
Measures:
Count, Price per unit with discount, Amount 
 
   
1.3. Bus matrix
The bus matrix shows the relations between business processes and dimensions. Different 
business processes can share the same dimension when they need the same information. 
 
Fact                  Exact Date  Student  Exam  Commission  People  Action  Study program  Event  Internship  Language 
Student career fact   X           X        -     -           -       -       X              X      X           X 
Student grade fact    X           X        X     X           -       -       -              -      -           - 
Card usage fact       X           -        -     -           X       X       -              -      -           - 
 
2. Conceptual Design
 
The data warehouse of the university is organized around 3 facts: student career fact, 
student grade (exam) fact and card usage fact.  
The student career fact provides information about the student, the student's actions at 
the university (enrolled, graduated, started studies, paused studies, continued studies, stopped 
studies), internships and the student's knowledge of languages. 
The exam fact stores information about exams, the students who enrolled for an exam and the 
faculty in which each student studies. It provides exam information such as status, grade, course and 
faculty. 
The card usage fact provides information about the activities performed with the 
card. Each activity carries information about the card owner and activity-specific information, such as the 
building in which the activity happened. 
2.1. Student career fact
 
● Dimensions 
○ Exact Date. This dimension stores the time at which an event happened. 
The ExactDate column stores the value as a date type. For better query performance, 
the date is also split into separate attributes: month, year, semester (Winter, 
Spring), academic year, week of year, day of year, day of month, day of 
week (in words). 
○ Student. Information about the student: student id, name, surname, student 
type (regular, free listener [a student who attends only a few courses 
and does not belong to any study program], student working at the university), 
birth date, gender, nationality, native language, study program. There are 
also optional data fields: email, high school, high school type, home 
location. 
○ Study Program. This dimension stores information about study program. It 
holds data fields: university, faculty, study program, curriculum (learning 
plan code), study type (full time studies, only one course), degree type 
(bachelor, master, PhD), location (location of the faculty). 
○ Event. This dimension holds the types of events available at the 
university. The list of events: enrolled, graduated, knows language, 
started internship, completed internship, started studies, paused studies, 
continued studies, stopped studies, studying, studying in Italian language, 
studying in German language, studying in English language. This 
dimension also stores a description of each event. 
○ Internship. Information about the internship. Grade: the evaluation by the supervisor 
in the organization after the internship is completed. Other fields: company 
(organization name), internship type (winter, summer), university supervisor 
name, university supervisor surname, organization supervisor name, 
organization supervisor surname, internship location. 
○ Languages And Levels. This dimension holds information about which 
languages a student knows and at which level. The data fields of this dimension 
are: language code (3-letter code (ITA, ENG, GER, ...)), language 
name, language description, language level.  
● Measures 
○ Students count, grade of the internship. 
The schema shows that not all hierarchy nodes are mandatory; some of them are
optional (e.g. Zip code, City, Province). The schema also contains shared dimensions:
the dimensions Internship, Study program and Student share location information.
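
The date attributes listed for the Exact Date dimension can all be derived from the calendar date when the dimension is loaded. The following is a minimal sketch, not the project's loading code: the class name `ExactDateRow` is hypothetical, ISO week numbering is assumed, and the semester and academic-year rules (academic year starting on 1 October, winter semester from October to February) are assumptions made only for illustration.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.time.temporal.IsoFields;
import java.util.Locale;

/** One row of a hypothetical Exact Date dimension, derived from a calendar date. */
public class ExactDateRow {
    public final LocalDate exactDate;
    public final int year;
    public final int month;
    public final int dayOfMonth;
    public final int dayOfYear;
    public final int weekOfYear;      // ISO week number (assumption)
    public final String dayOfWeek;    // day of week in words
    public final String semester;     // "Winter" or "Spring"
    public final String academicYear; // e.g. "2014/2015"

    public ExactDateRow(LocalDate date) {
        exactDate = date;
        year = date.getYear();
        month = date.getMonthValue();
        dayOfMonth = date.getDayOfMonth();
        dayOfYear = date.getDayOfYear();
        weekOfYear = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        dayOfWeek = date.getDayOfWeek().getDisplayName(TextStyle.FULL, Locale.ENGLISH);

        // Assumed rule: winter semester runs from October to February.
        boolean winter = month >= 10 || month <= 2;
        semester = winter ? "Winter" : "Spring";

        // Assumed rule: the academic year starts on 1 October.
        int startYear = month >= 10 ? year : year - 1;
        academicYear = startYear + "/" + (startYear + 1);
    }

    public static void main(String[] args) {
        ExactDateRow row = new ExactDateRow(LocalDate.of(2015, 1, 15));
        System.out.println(row.academicYear + " " + row.semester
                + " week " + row.weekOfYear + " " + row.dayOfWeek);
    }
}
```

Precomputing these attributes once per date lets queries group by year, month or week without calling date functions on every fact row.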
2.2. Student grade fact
● Dimensions 
○ Student. Information about the student: student id, name, surname, student 
type (regular, free listener [a student who attends only a few courses and does 
not belong to any study program], student working at the university), birth date, gender, 
nationality, native language, study program. There are also optional data 
fields: email, high school, high school type, home location. 
○ Exam. This dimension stores information about the exam, such as teacher, subject, credit 
points, room and type. 
○ Date. This dimension stores the time at which an event happened. The Date 
column stores the value as a date type. For better query performance, the date is also split into 
separate attributes: month, year, semester (Winter, Spring), academic year, week 
of year, day of year, day of month, day of week (in words). 
○ Commission. This dimension stores information about the people who evaluate the exam. 
● Measures 
○ Passed 
○ Grade 
 
2.3. Card usage fact
● Dimensions 
○ People. This dimension holds information about all people at the university who have a 
card. 
○ Action. This dimension stores information about all activities that can be performed with the 
card. Each activity has a type and a place where it happened. The place is 
split into finer levels: object, place, sector, building. An activity can have a 
product and a price. 
○ Date. This dimension stores the time at which an event happened. The column 
stores the value as a date type. For better query performance, the date is also split into separate 
attributes: month, year, semester (Winter, Spring), academic year, week of year, 
day of year, day of month, day of week (in words). 
● Measures 
○ Count. 
○ Price per unit with discount 
○ Amount 
 
3. Logical Design
3.1. Student career fact
  
 
 
How many students are studying in the Economics Master's programs? 
select count(distinct a.student_id) 
from 
  jre_student_action_fact a, jre_studyprogram s, jre_event e 
where 
  a.study_program_id = s.studyprogramid 
  and a.event_id = e.event_id 
  and e.event_type = 'studying' 
  and s.university = 'Free University of Bozen' 
  and s.faculty = 'Economics and Management' 
  and s.degreetype = 'Master'; 
 
Tables used: StudyProgram, Event, StudentActionFact. 
3.2. Student grade fact
 
 
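What is the average grade per student? 
select s.studentid, s.name, s.surname, avg(g.grade) 
from 
  jre_grade g, jre_student s 
where 
  g.studentid = s.studentid 
-- group by the student id as well, so that students sharing a name are not merged 
group by s.studentid, s.name, s.surname; 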
 
Tables used: Student, Grade. 
 
3.3. Card usage fact
 
The People class selects its data from two different tables (students and other people), so that 
student data is not stored twice. 
 
How many paper sheets are used for printing each month? 
select e.year, e.month, sum(c.count) 
from 
  jre_action a, jre_cardusage c, jre_exactdate e 
where 
  a.actionid = c.actionid 
  and c.dateid = e.dateid 
  and a.actiontype = 'Print' 
group by e.year, e.month; 
 
Tables used: Action, ExactDate, CardUsage. 
 
 
4. Implementation
 
● One query that uses the ROLLUP, CUBE or GROUPING SETS operator 
 
How much money did students spend by card in the cafeteria per year, month and week? 
 
select d.year, d.month, d.weekofyear, sum(c.amount) 
from jre_action a, jre_exactdate d, allpeople p, jre_cardusage c 
where 
  a.actionid = c.ACTIONID 
  and d.dateid = c.dateid 
  and p.id = c.peopleid 
  and a.actiontype = 'Buy' and a.place = 'Mensa' 
  and p.type = 'Student' 
group by rollup(d.year, d.month, d.weekofyear) 
order by d.year, d.weekofyear; 
 
 
● One query that uses the GROUPING_ID and/or GROUP_ID function. 
 
How many students are studying in each language in each faculty?  
 
select 
  decode(grouping_id(e.event_type), 1, 'All languages', e.event_type) 
language, 
  decode(grouping_id(sp.faculty), 1, 'All faculty', sp.faculty) faculty, 
  count(saf.fact_id) students_count, 
  grouping_id(e.event_type, sp.faculty) grouping_id 
from 
  jre_studyprogram sp, 
  jre_event e, 
  jre_student_action_fact saf 
where 
  saf.study_program_id = sp.studyprogramid and 
  saf.event_id = e.event_id and 
  saf.event_id in (20,22,24) 
group by cube ( 
  e.event_type,  
  sp.faculty) 
order by  
  e.event_type,  
  sp.faculty; 
 
 
5. Advanced Querying
 
● Ranking query using NTILE 
 
Count the students studying at the university in 2015 who know a language at level A or B, 
per language and level, and divide the resulting groups into 4 buckets ordered by student count. 
 
select  
  lal.language_name,  
  lal.language_level,  
  count(language_level) as language_count,  
  ntile(4) over (order by count(language_level) )  
from   
  jre_languages_and_levels lal, 
  jre_exactdate ed, 
  jre_student_action_fact saf 
where 
  saf.language_id = lal.language_id and 
  saf.date_id = ed.dateid and 
  saf.event_id in (6) and 
  ed.year = 2015 and  
  lal.language_level in ('A1', 'A2','B1', 'B2') 
group by  
  ed.year,  
  lal.language_name,  
  lal.language_level 
order by 
  language_count desc; 
 
 
 
● RANK or DENSE_RANK functions 
 
  What is the average grade per student in Computer Science faculty? 
 
SELECT  
JRE_student.name, JRE_student.surname, 
ROUND(AVG(JRE_grade.grade),2) AS "AVERAGE GRADE", RANK() OVER( 
ORDER BY AVG(JRE_grade.grade) DESC) AS "RANK"   
FROM  
JRE_student, JRE_grade, JRE_studyProgram  
WHERE  
JRE_student.studentid=JRE_grade.studentid 
AND JRE_student.studyprogramid=JRE_studyprogram.studyprogramid  
AND JRE_studyprogram.faculty='Computer Science'  
GROUP BY  
JRE_student.name, JRE_student.surname;    
 
 
 
What is the average grade per student, and the student's rank within the faculty? 
SELECT  
JRE_studyprogram.faculty, JRE_student.name, JRE_student.surname, 
ROUND(AVG(JRE_grade.grade),2) AS "AVERAGE GRADE", RANK() OVER( 
PARTITION BY JRE_studyprogram.faculty ORDER BY AVG(JRE_grade.grade) 
DESC) AS "RANK IN Faculty"   
FROM  
JRE_student, JRE_grade, JRE_studyprogram  
WHERE  
JRE_student.studentid=JRE_grade.studentid 
AND JRE_student.studyprogramid=JRE_studyprogram.studyprogramid  
GROUP BY  
JRE_student.name, JRE_student.surname, JRE_studyprogram.faculty;  
 
 
 
Which activity is used most often with the card? 
 
select  
  e.month, a.actiontype as activity, count(a.actionid) as usedCount,  
  DENSE_RANK() OVER (ORDER BY count(a.actionid) desc) as rank 
from 
  jre_action a, jre_cardusage c, jre_exactdate e 
where 
  a.actionid = c.ACTIONID 
  and e.dateid = c.dateid 
group by a.actiontype, e.month; 
 
 
 
● Windowing query using the windowing clause 
 
How much money is spent by card in the cafeteria each week, and what is the accumulated 
amount per week within each year? 
 
select  
  d.Year, d.month, d.WEEKOFYEAR, 
  sum(c.amount) weekAmount, 
  sum(sum(c.amount)) OVER (Partition by d.year ORDER BY d.WEEKOFYEAR asc 
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as accumulated 
from  
jre_action a, jre_exactdate d, allpeople p, jre_cardusage c 
where 
  a.actionid = c.ACTIONID 
  and d.dateid = c.dateid 
  and p.id = c.peopleid 
  and a.actiontype = 'Buy' and a.place = 'Mensa' 
group by d.Year, d.month, d.WEEKOFYEAR 
order by d.Year, d.WEEKOFYEAR; 
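
The accumulated column is a running total that restarts for every year (the partition) and grows in week order. As a rough illustration of what the window computes for one partition, here is a minimal plain-Java sketch; the class and method names are hypothetical, and the input is assumed to be the weekly amounts of a single year, already ordered by week.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RunningTotal {

    /**
     * Returns the accumulated amount after each week.
     * The input holds the weekly amounts of ONE year (one partition),
     * already ordered by week of year.
     */
    public static List<Double> accumulate(List<Double> weeklyAmounts) {
        List<Double> accumulated = new ArrayList<Double>();
        double running = 0.0;
        for (double amount : weeklyAmounts) {
            running += amount;        // everything from the first week up to the current one
            accumulated.add(running);
        }
        return accumulated;
    }

    public static void main(String[] args) {
        // Hypothetical weekly amounts for the first three weeks of a year.
        List<Double> weeks = Arrays.asList(10.0, 20.0, 5.0);
        System.out.println(accumulate(weeks));
    }
}
```

Calling `accumulate` once per year mirrors the `Partition by d.year` clause: the running total never carries over from one year to the next.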
 
 
 
● Period-to-period comparison query (a query comparing values across time periods, e.g., 
compare sales for every week of the current year with the sales of the corresponding 
weeks in the past year).  
 
How much money did students spend by card in the cafeteria on each date? (the amount for each 
date is compared with the previous and next dates on which transactions took place) 
 
select  
  d.EXACTDATE,  
  sum(c.amount) currentdate, 
  LAG(SUM (c.amount),1) OVER (Order By d.EXACTDATE) as previousdate, 
  LEAD(SUM (c.amount),1) OVER (Order By d.EXACTDATE) as nextdate 
from jre_action a, jre_exactdate d, allpeople p, jre_cardusage c 
where 
  a.actionid = c.ACTIONID 
  and d.dateid = c.dateid 
  and p.id = c.peopleid 
  and a.actiontype = 'Buy' and a.place = 'Mensa' 
  and p.type = 'Student' 
group by d.EXACTDATE 
order by d.EXACTDATE asc; 
 
 
6. Query performance
 
The three most frequently used queries: 
● How much money did students spend by card in the cafeteria on each date? (the amount for each 
date is compared with the previous and next dates on which transactions took place) 
select  
  d.EXACTDATE,  
  sum(c.amount) currentdate, 
  LAG(SUM (c.amount),1) OVER (Order By d.EXACTDATE) as previousdate, 
  LEAD(SUM (c.amount),1) OVER (Order By d.EXACTDATE) as nextdate 
from jre_action a, jre_exactdate d, allpeople p, jre_cardusage c 
where 
  a.actionid = c.ACTIONID 
  and d.dateid = c.dateid 
  and p.id = c.peopleid 
  and a.actiontype = 'Buy' and a.place = 'Mensa' 
  and p.type = 'Student' 
group by d.EXACTDATE 
order by d.EXACTDATE asc; 
 
● How much money did students spend by card in the cafeteria per year, month and week? 
select d.year, d.month, d.weekofyear, sum(c.amount) 
from jre_action a, jre_exactdate d, allpeople p, jre_cardusage c 
where 
  a.actionid = c.ACTIONID 
  and d.dateid = c.dateid 
  and p.id = c.peopleid 
  and a.actiontype = 'Buy' and a.place = 'Mensa' 
  and p.type = 'Student' 
group by rollup(d.year, d.month, d.weekofyear) 
order by d.year, d.weekofyear; 
 
● Which activity is used most often with the card in each month of the year? 
select  
  e.month,  
a.actiontype as activity,  
count(a.actionid) as usedCount,  
  DENSE_RANK() OVER (ORDER BY count(a.actionid) desc) as rank 
from 
  jre_action a, jre_cardusage c, jre_exactdate e 
where 
  a.actionid = c.ACTIONID 
  and e.dateid = c.dateid 
group by a.actiontype, e.month; 
 
Used dimensions in all three queries: 
1. ExactDate, Action, People G1={e,a,p} 
2. ExactDate, Action, People G1={e,a,p} 
3. ExactDate, Action G2={e,a} 
 
The node relation diagram was created using the lattice framework. 
The candidate nodes for a materialized view, chosen with the greedy algorithm, are colored in gray in the diagram. 
 
To optimize select time, we decided to build the materialized view on G2={e,a}, because 
● it can be used in all three queries; 
● it does not contain a large amount of data (compared to G1={e,a,p}). 
 
Materialized view: 
create materialized view jre_mv_date_action_sum 
as 
select  
e.EXACTDATE, e.year, e.MONTH, e.WEEKOFYEAR, a.ACTIONTYPE, a.PLACE, 
c.PEOPLEID, sum(c.amount) as amount, a.actionid 
from jre_action a, jre_exactdate e, jre_cardusage c 
where 
  a.actionid = c.ACTIONID 
  and e.DATEID = c.DATEID 
group by  
e.EXACTDATE, e.year, e.MONTH, e.WEEKOFYEAR, a.ACTIONTYPE, a.PLACE, 
c.PEOPLEID, a.actionid 
order by e.EXACTDATE, e.year, e.MONTH, e.WEEKOFYEAR, a.ACTIONTYPE, 
a.PLACE; 
 
First query changed to: 
select  
  c.EXACTDATE,  
  sum(c.amount) currentdate, 
  LAG(SUM (c.amount),1) OVER (Order By c.EXACTDATE) as previousdate, 
  LEAD(SUM (c.amount),1) OVER (Order By c.EXACTDATE) as nextdate 
from allpeople p, jre_mv_date_action_sum c 
where 
  p.id = c.peopleid 
  and c.actiontype = 'Buy' and c.place = 'Mensa' 
  and p.type = 'Student' 
group by c.EXACTDATE 
order by c.EXACTDATE asc; 
 
Second query changed to: 
select c.year, c.month, c.weekofyear, sum(c.amount) 
from jre_mv_date_action_sum c, allpeople p 
where 
  p.id = c.peopleid 
  and c.actiontype = 'Buy' and c.place = 'Mensa' 
  and p.type = 'Student' 
group by rollup(c.year, c.month, c.weekofyear) 
order by c.year, c.weekofyear; 
 
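Third query changed to: 
select  
  e.month, 
  e.actiontype as activity,  
  -- note: this counts rows of the materialized view (one per date, action and person), 
  -- which matches the original count only if each such group holds a single card usage 
  count(e.actionid) as usedCount,  
  DENSE_RANK() OVER (ORDER BY count(e.actionid) desc) as rank 
from jre_mv_date_action_sum e 
group by e.actiontype, e.month; 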
 
● Gain from materialized view 
 
Speed tests (Query1, Query2, Query3) were run with the queries chosen above. Each test compares two 
versions of a query that select the same data, one using the materialized view and one not. 
 
Results: 
 
Tests    Time without MV  Time with MV  Time improvement (without/with) 
Query1   0.176            0.109         1.6x 
Query2   0.224            0.027         8.3x 
Query3   0.112            0.093         1.2x 
 
The test results show that all three queries execute faster with the materialized view than without it. 
The absolute differences are small because the data set is small; the relative improvement is shown in the 
column “Time improvement (without/with)”.  
Query2 executes 8.3 times faster with the materialized view, which suggests that the view will be 
very useful on large data. 
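
The “Time improvement” column is the time without the materialized view divided by the time with it. A minimal sketch of that arithmetic, using the measured times from the table (the class and method names are hypothetical):

```java
import java.util.Locale;

public class SpeedUp {

    /** Ratio "time without MV" / "time with MV". */
    public static double ratio(double timeWithoutMv, double timeWithMv) {
        return timeWithoutMv / timeWithMv;
    }

    /** The ratio rounded to one decimal place, formatted like the table column. */
    public static String format(double timeWithoutMv, double timeWithMv) {
        return String.format(Locale.ROOT, "%.1fx", ratio(timeWithoutMv, timeWithMv));
    }

    public static void main(String[] args) {
        System.out.println("Query1 " + format(0.176, 0.109));
        System.out.println("Query2 " + format(0.224, 0.027));
        System.out.println("Query3 " + format(0.112, 0.093));
    }
}
```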
 
● Cost of the materialized view 
Storage space for the saved data. 
 
The materialized view uses data from the dimensions Action and ExactDate and from the fact table 
CardUsage, and stores the already calculated data. 
 
Worst case: the number of rows added to the materialized view every day is 
rows per day = number of all people * number of existing activities. 
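
As a worked example of this worst-case estimate, the sketch below multiplies the two factors. The figures of 4000 card holders and 10 activities are purely hypothetical and are not taken from the project data; the class and method names are also illustrative.

```java
public class MvWorstCase {

    /** Worst case: every person performs every existing activity on the same day. */
    public static long rowsPerDay(long people, long activities) {
        return people * activities;
    }

    public static void main(String[] args) {
        long people = 4000;     // hypothetical number of card holders
        long activities = 10;   // hypothetical number of rows in the Action dimension
        System.out.println("Worst-case rows per day: " + rowsPerDay(people, activities));
    }
}
```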
Advanced Data Management 
Technologies 
Project Module 2 – Map Reduce 
 
Jonas Monkevičius 
Rokas Mačiulaitis 
Evija Urtāne 
 
2015
Tasks
● Task: Instant temporal aggregation 
● Question: What is the average salary at each time instant? 
● Algorithm: 
 
First cycle 
○ Mapper 
■ Input is one record from the data set 
■ Select the values: salary, start time and end time 
■ Go through each time instant between start and end and write an output pair [time 
instant; salary] 
 
○ Reducer 
■ Input key is a time instant and the value is the list of salaries valid at that instant 
■ Go through all salaries, sum them and count how many salaries were 
summed 
■ Calculate the average value: divide the sum by the count 
Second cycle 
○ Mapper 
■ Input is the time instant as key and the average salary as value 
■ Output key is the salary and the value is the time instant 
○ Reducer 
■ Input key is a salary and the input value is the list of all time instants with that salary 
■ The list is sorted and consecutive time instants are grouped into intervals 
The algorithm output is all average salaries in ascending order, each with the 
intervals in which it holds. 
 
Functions (code) 
Cycle 1 
// Cycle 1 mapper: emits one [time instant; salary] pair for every instant of the record's interval. 
public void map(LongWritable key, Text value, Context context) throws 
IOException, InterruptedException { 
  // input line format: id;salary;start;end 
  String[] parts = value.toString().split(";"); 
  IntWritable salary = new IntWritable(Integer.parseInt(parts[1])); 
  int start = Integer.parseInt(parts[2]); 
  int end = Integer.parseInt(parts[3]); 
 
  for (int x = start; x <= end; x++) 
  { 
    context.write(new Text(Integer.toString(x)), salary); 
  } 
} 
 
// Cycle 1 reducer: averages all salaries valid at one time instant. 
public void reduce(Text key, Iterable<IntWritable> values, Context context) 
throws IOException, InterruptedException 
{ 
  long sum = 0;   // long, so that many salaries cannot overflow the sum 
  int count = 0; 
  for (IntWritable value : values) 
  { 
    sum += value.get(); 
    count += 1; 
  } 
  // integer division: the average is truncated (e.g. 1600 / 3 = 533) 
  IntWritable avg = new IntWritable((int) (sum / count)); 
  context.write(key, avg); 
} 
 
Cycle 2 
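// Cycle 2 mapper: swaps key and value of the Cycle 1 output (time instant <TAB> average salary). 
public void map(LongWritable key, Text value, Context context) throws 
IOException, InterruptedException 
{ 
  String[] parts = value.toString().split("\t"); 
  Text salary = new Text(parts[1]); 
  IntWritable time = new IntWritable(Integer.parseInt(parts[0])); 
  context.write(salary, time); 
} 
 
// Cycle 2 reducer: groups the time instants of one average salary into intervals. 
public void reduce(Text key, Iterable<IntWritable> values, Context context) 
throws IOException, InterruptedException 
{ 
  List<Integer> list = new ArrayList<Integer>(); 
  for (IntWritable value : values) 
  { 
    list.add(value.get()); 
  } 
  Collections.sort(list); 
 
  StringBuilder s = new StringBuilder(); 
  boolean first = true;   // explicit flag, so that time instant 0 is handled correctly 
  int start = 0; int end = 0; 
  for (int value : list) 
  { 
    if (first) { 
      start = value; 
      end = value; 
      first = false; 
    } else if (value == end + 1) { 
      end = value;   // the instant extends the current interval 
    } else if (value > end + 1) { 
      // gap found: write the finished interval and open a new one 
      if (start == end) s.append(" ").append(start); 
      else s.append(" [").append(start).append("-").append(end).append("]"); 
      start = value; 
      end = value; 
    } 
  } 
  // write the last open interval 
  if (start == end) s.append(" ").append(start); 
  else s.append(" [").append(start).append("-").append(end).append("]"); 
  Text text = new Text(key.toString() + " " + s); 
  // 999 is a dummy value; the result is carried entirely by the key 
  context.write(text, new IntWritable(999)); 
} 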
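
The logic of the two cycles can also be checked without a Hadoop cluster. The following is a minimal plain-Java sketch, not the project's code: the class and method names are hypothetical, in-memory maps stand in for the shuffle phase, and the average is truncated by integer division as in the Cycle 1 reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TemporalAverage {

    /** Cycle 1: average salary per time instant. Rows have the form id;salary;start;end. */
    public static Map<Integer, Integer> averagePerInstant(List<String> rows) {
        Map<Integer, long[]> sumAndCount = new TreeMap<Integer, long[]>();
        for (String row : rows) {
            String[] parts = row.split(";");
            int salary = Integer.parseInt(parts[1]);
            int start = Integer.parseInt(parts[2]);
            int end = Integer.parseInt(parts[3]);
            for (int t = start; t <= end; t++) {          // "map": one pair per instant
                long[] acc = sumAndCount.get(t);
                if (acc == null) {
                    acc = new long[2];
                    sumAndCount.put(t, acc);
                }
                acc[0] += salary;                         // "reduce": sum and count
                acc[1] += 1;
            }
        }
        Map<Integer, Integer> averages = new TreeMap<Integer, Integer>();
        for (Map.Entry<Integer, long[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), (int) (e.getValue()[0] / e.getValue()[1]));
        }
        return averages;
    }

    /** Cycle 2: for each average salary, group its time instants into intervals. */
    public static Map<Integer, String> intervalsPerSalary(Map<Integer, Integer> averages) {
        Map<Integer, List<Integer>> instants = new TreeMap<Integer, List<Integer>>();
        for (Map.Entry<Integer, Integer> e : averages.entrySet()) {
            List<Integer> list = instants.get(e.getValue());
            if (list == null) {
                list = new ArrayList<Integer>();
                instants.put(e.getValue(), list);
            }
            list.add(e.getKey());
        }
        Map<Integer, String> result = new TreeMap<Integer, String>();
        for (Map.Entry<Integer, List<Integer>> e : instants.entrySet()) {
            List<Integer> list = e.getValue();
            Collections.sort(list);
            StringBuilder sb = new StringBuilder();
            int start = list.get(0);
            int end = start;
            for (int i = 1; i < list.size(); i++) {
                int t = list.get(i);
                if (t == end + 1) {
                    end = t;                              // extends the current interval
                } else {
                    append(sb, start, end);               // gap: close the interval
                    start = t;
                    end = t;
                }
            }
            append(sb, start, end);
            result.put(e.getKey(), sb.toString().trim());
        }
        return result;
    }

    private static void append(StringBuilder sb, int start, int end) {
        if (start == end) sb.append(' ').append(start);
        else sb.append(" [").append(start).append('-').append(end).append(']');
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "0;800;1;14", "1;400;3;6", "2;300;4;7", "0;500;4;5", "0;500;7;8");
        Map<Integer, String> out = intervalsPerSalary(averagePerInstant(rows));
        for (Map.Entry<Integer, String> e : out.entrySet()) {
            System.out.println(e.getKey() + "  " + e.getValue());
        }
    }
}
```

Unlike the Hadoop version, this sketch sorts salaries numerically rather than as text; for the five sample rows both orders coincide.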
 
 
 
Test example 
Input:  
 
0;800;1;14 
1;400;3;6 
2;300;4;7 
0;500;4;5 
0;500;7;8 
 
Output after first map reduce cycle: 
 
1 800 
10 800 
11 800 
12 800 
13 800 
14 800 
2 800 
3 600 
4 500 
5 500 
6 500 
7 533 
8 650 
9 800 
 
Output after second cycle: 
 
500  [4-6] 
533  7 
600  3 
650  8 
800  [1-2] [9-14] 
Speed tests: 
 
Figure 1. Graph of the MapReduce calculations, where the x axis is the data size in kilobytes and the y axis is 
the time in seconds. 
 
Data set     200 KB   400 KB   600 KB   800 KB   1 MB 
Equal data   7.9      9.9      12.4     15.5     19.4 
Random data  213.4    268.8    338.7    426.7    537.7 
Seq data     262.6    335.5    426.1    541.7    687.3 
Worst data   2261.9   2940.5   3822.7   4969.5   6460.3 
 
Table 1. Speed test results for the different data sets. 
 
● The data sets were taken from the course data sets: 
http://www.inf.unibz.it/dis/teaching/ADMT/proj/data/   
All data sets have 4 columns: the first is the person id, the second the salary, the third 
the begin timestamp and the fourth the end timestamp. The data set sizes were 200 KB, 400 KB, 600 KB, 
800 KB and 1 MB. The data set types were: 
● Equal data. In this data set every row has the same ‘timestamp begin’ value and the same 
‘timestamp end’ value: 
0;383;24048;24059 
1;886;24048;24059 
…; ...;   ...     ;   ... 
9;421;24048;24059 
 
● Random data. In this data set the ‘timestamp begin’ and ‘timestamp end’ values are not 
ordered by time, but the difference between begin and end is not big: 
0;383;886;1663 
1;915;593;1728 
…; …; … ; ... 
9;123;67;1202 
 
● Seq data. In this data set the ‘timestamp begin’ and ‘timestamp end’ columns 
are ordered from 0 to the maximum time: 
0;383;0;199 
1;886;200;399 
.;...    ; …..; …. 
0;362;2000;2199 
  
● Worst data. As the name suggests, this data set takes a long time to 
process. Here ‘timestamp begin’ increases and ‘timestamp end’ decreases 
from row to row, and the interval between begin and end is very large: 
0;383;12000;40000000 
1;886;12001;39999999 
.; …. ; ….     ; …     ... 
9;421;12009;39999991 
 
In conclusion, the data set with equal data was processed very quickly, within a few seconds. 
The sequential and random data sets took roughly the same time, far less than the worst data set, 
which took by far the longest to process. 
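
One factor behind these differences is that the Cycle 1 mapper writes one pair for every time instant of a row, so a row with interval [begin, end] produces end - begin + 1 intermediate pairs. The sketch below computes that count for the sample rows shown above; the class and method names are hypothetical, and the count ignores key distribution, sorting and reducer cost, so it is only a rough indication of the workload.

```java
public class EmittedPairs {

    /** Number of [time instant; salary] pairs the Cycle 1 mapper writes for one row. */
    public static long pairsForRow(String row) {
        String[] parts = row.split(";");      // id;salary;begin;end
        long begin = Long.parseLong(parts[2]);
        long end = Long.parseLong(parts[3]);
        return end - begin + 1;
    }

    public static void main(String[] args) {
        String[] samples = {
            "0;383;24048;24059",      // equal data
            "0;383;886;1663",         // random data
            "0;383;0;199",            // seq data
            "0;383;12000;40000000"    // worst data
        };
        for (String row : samples) {
            System.out.println(row + " -> " + pairsForRow(row) + " pairs");
        }
    }
}
```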
 
The following software was used to run the MapReduce jobs: 
● Hadoop framework v 1.2.1 
● Java SDK 1.6 
● Eclipse IDE for debug and test results 
  
Source code is available at:  
https://github.com/jmonkevicius/admt/blob/master/WordCount1.java 
27 

More Related Content

What's hot

Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...
Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...
Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...Abdul Rahman Sherzad
 
Course Presentaion: Mathematical Sciences
Course Presentaion: Mathematical SciencesCourse Presentaion: Mathematical Sciences
Course Presentaion: Mathematical SciencesBrunel University
 
Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...
Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...
Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...IJCSIS Research Publications
 
E 5 development-of_a_data_management_system_for_stud
E 5 development-of_a_data_management_system_for_studE 5 development-of_a_data_management_system_for_stud
E 5 development-of_a_data_management_system_for_studEdress Oryakhail
 
Computer Science and Engineering Brochure
Computer Science and Engineering BrochureComputer Science and Engineering Brochure
Computer Science and Engineering BrochureHarshil Lodhi
 
Hong Kong PHD Fellowship Scheme, 2015/16
Hong Kong PHD Fellowship Scheme, 2015/16Hong Kong PHD Fellowship Scheme, 2015/16
Hong Kong PHD Fellowship Scheme, 2015/162016
 
MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...
MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...
MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...AM Publications
 
Result generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organizationResult generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organizationeSAT Journals
 

What's hot (9)

Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...
Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...
Applicability of Educational Data Mining in Afghanistan: Opportunities and Ch...
 
Ijmet 09 11_004
Ijmet 09 11_004Ijmet 09 11_004
Ijmet 09 11_004
 
Course Presentaion: Mathematical Sciences
Course Presentaion: Mathematical SciencesCourse Presentaion: Mathematical Sciences
Course Presentaion: Mathematical Sciences
 
Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...
Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...
Lecturers' Attitudes towards Integration of E-Learning in Higher Education. C...
 
E 5 development-of_a_data_management_system_for_stud
E 5 development-of_a_data_management_system_for_studE 5 development-of_a_data_management_system_for_stud
E 5 development-of_a_data_management_system_for_stud
 
Computer Science and Engineering Brochure
Computer Science and Engineering BrochureComputer Science and Engineering Brochure
Computer Science and Engineering Brochure
 
Hong Kong PHD Fellowship Scheme, 2015/16
Hong Kong PHD Fellowship Scheme, 2015/16Hong Kong PHD Fellowship Scheme, 2015/16
Hong Kong PHD Fellowship Scheme, 2015/16
 
MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...
MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...
MEASURING UTILIZATION OF E-LEARNING COURSE DISCRETE MATHEMATICS TOWARD MOTIVA...
 
Result generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organizationResult generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organization
 

Viewers also liked

E Business y sus Componentes
E Business y sus ComponentesE Business y sus Componentes
E Business y sus ComponentesNathaly2595
 
Real-Time Operational Reporting for Oracle E-Business Suite with CS*Rapid
Real-Time Operational Reporting for Oracle E-Business Suite with CS*RapidReal-Time Operational Reporting for Oracle E-Business Suite with CS*Rapid
Real-Time Operational Reporting for Oracle E-Business Suite with CS*RapidCraig O'Neill
 
CIRURGIA CESARIANA - Diretrizes de Atenção à Gestante. Ministério da Saúde CO...
CIRURGIA CESARIANA - Diretrizes de Atenção à Gestante. Ministério da Saúde CO...CIRURGIA CESARIANA - Diretrizes de Atenção à Gestante. Ministério da Saúde CO...
CIRURGIA CESARIANA - Diretrizes de Atenção à Gestante. Ministério da Saúde CO...Prof. Marcus Renato de Carvalho
 
A cooperação do Hospital Moinhos de Vento na elaboração de diretrizes clínica...
A cooperação do Hospital Moinhos de Vento na elaboração de diretrizes clínica...A cooperação do Hospital Moinhos de Vento na elaboração de diretrizes clínica...
A cooperação do Hospital Moinhos de Vento na elaboração de diretrizes clínica...CONITEC
 
Medicina personalizada - Carlos Gil
Medicina personalizada - Carlos GilMedicina personalizada - Carlos Gil
Medicina personalizada - Carlos GilOncoguia
 
[OPERAÇÃO AMPULHETA] Reduzindo o prazo de revisão do Rol da ANS
[OPERAÇÃO AMPULHETA] Reduzindo o prazo de revisão do Rol da ANS[OPERAÇÃO AMPULHETA] Reduzindo o prazo de revisão do Rol da ANS
[OPERAÇÃO AMPULHETA] Reduzindo o prazo de revisão do Rol da ANSOncoguia
 
Google Analytics Implementation for Agencies and Companies
Google Analytics Implementation for Agencies and CompaniesGoogle Analytics Implementation for Agencies and Companies
Google Analytics Implementation for Agencies and CompaniesBen Holland
 

ADTMreport