Pig and Hive: A Quick Introduction
Radosław Stankiewicz
Bartłomiej Tartanus
Agenda
Intro -> a few words about Map Reduce -> Pig -> Hive
Introduction
Volume
Variety
A|123|10$
B|555|20$
Y|333|15$
{
"typ": "A",
"id": 123,
"kwota": "10$"
}
Velocity
OLAP
Real Time
Batch
Streaming
Interactive analytics
Value
Problem classification
• A database of Warsaw's streets, data in JSON format,
optimizing waste collection for one of the service providers.
• Events from a transactional database and credit cards,
used to detect fraud more effectively
• A system hunting for good car offers across many
sites - web crawling, data parsing, analysis
of car price trends
• A central repository of contract scans, TBs of data,
several hundred new documents arriving daily
Origins
• too much data
• server failures
• slow relational databases
Architecture
source: Hortonworks
Hadoop Ecosystem
source: Hortonworks
HDFS - Namenode, Datanode
HDFS commands
● User Commands
o dfs
o fsck
● Administration Commands
o datanode
o dfsadmin
o namenode
dfs:
appendToFile cat chgrp chmod chown copyFromLocal copyToLocal count cp du
dus expunge get getfacl getfattr getmerge ls lsr mkdir moveFromLocal
moveToLocal mv put rm rmr setfacl setfattr setrep stat tail test text touchz
hdfs dfs -put localfile1 localfile2 /user/tmp/hadoopdir
hdfs dfs -getmerge /user/hadoop/output/ localfile
YARN Architecture
Map Reduce Framework
[Diagram: input records flow through Map tasks (M), are shuffled and sorted, and are aggregated by Reduce tasks (R)]
Mapper
#!/usr/bin/env python
import sys

# emit "word<TAB>1" for every word read from stdin
for line in sys.stdin:
    words = line.strip().split()
    for word in words:
        print('%s\t%s' % (word, 1))

line = "Ala ma kota"
Ala 1
ma 1
kota 1
Reducer
#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# input arrives sorted by word, so equal keys are adjacent
for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    count = int(count)
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print('%s,%s' % (current_word, current_count))
        current_count = count
        current_word = word

# flush the last group
if current_word == word:
    print('%s,%s' % (current_word, current_count))

ala 1
ala 1
bela 1
dela 1
ala,2
bela,1
dela,1
Running a streaming job
cat input.txt | ./mapper.py | sort | ./reducer.py

bin/yarn jar [..]/hadoop-*streaming*.jar \
  -file mapper.py -mapper ./mapper.py \
  -file reducer.py -reducer ./reducer.py \
  -input /tmp/wordcount/input -output /tmp/wordcount/output
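Once the job finishes, the result can be read straight off HDFS (paths as above; streaming writes one part file per reducer):

hdfs dfs -cat /tmp/wordcount/output/part-*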
Map Reduce in Java
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
1) Mapper
2) Reducer
3) run
public class WordCount extends Configured
implements Tool {
public static class TokenizerMapper{...}
public static class IntSumReducer{...}
public int run(...){...}
}
Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
public static class TokenizerMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
  public void setup(...) {...}
  public void cleanup(...) {...}
  public void run(...) {...}
}

value = "Ala ma kota"
Ala,1
ma,1
kota,1
Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
public static class IntSumReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result);
  }
  public void setup(...) {...}
  public void cleanup(...) {...}
  public void run(...) {...}
}

kota,(1,1,1,1)
kota,4
Main
public int run(String[] args) throws Exception {
  Configuration conf = new Configuration();
  Job job = Job.getInstance(conf, "word count");
  job.setJarByClass(WordCount.class);
  job.setMapperClass(TokenizerMapper.class);
  job.setCombinerClass(IntSumReducer.class);
  job.setReducerClass(IntSumReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String[] args) throws Exception {
  int res = ToolRunner.run(new Configuration(), new WordCount(), args);
  System.exit(res);
}

yarn jar wc.jar WordCount /tmp/wordcount/input /tmp/wordcount/output
Introduction to data processing with Pig
Pig Architecture
Is it worth it?
Top 5 pages visited by users aged 18 to 25
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;
import org.apache.hadoop.mapred.lib.IdentityMapper;
public class MRExample {
public static class LoadPages extends MapReduceBase
implements Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable k, Text val,
OutputCollector<Text, Text> oc,
Reporter reporter) throws IOException {
// Pull the key out
String line = val.toString();
int firstComma = line.indexOf(',');
String key = line.substring(0, firstComma);
String value = line.substring(firstComma + 1);
Text outKey = new Text(key);
// Prepend an index to the value so we know which file
// it came from.
Text outVal = new Text("1" + value);
oc.collect(outKey, outVal);
}
}
public static class LoadAndFilterUsers extends MapReduceBase
implements Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable k, Text val,
OutputCollector<Text, Text> oc,
Reporter reporter) throws IOException {
// Pull the key out
String line = val.toString();
int firstComma = line.indexOf(',');
String value = line.substring(firstComma + 1);
int age = Integer.parseInt(value);
if (age < 18 || age > 25) return;
String key = line.substring(0, firstComma);
Text outKey = new Text(key);
// Prepend an index to the value so we know which file
// it came from.
Text outVal = new Text("2" + value);
oc.collect(outKey, outVal);
}
}
public static class Join extends MapReduceBase
implements Reducer<Text, Text, Text, Text> {
public void reduce(Text key,
Iterator<Text> iter,
OutputCollector<Text, Text> oc,
Reporter reporter) throws IOException {
// For each value, figure out which file it's from and
// store it accordingly.
List<String> first = new ArrayList<String>();
List<String> second = new ArrayList<String>();
while (iter.hasNext()) {
Text t = iter.next();
String value = t.toString();
if (value.charAt(0) == '1')
first.add(value.substring(1));
else second.add(value.substring(1));
reporter.setStatus("OK");
}
// Do the cross product and collect the values
for (String s1 : first) {
for (String s2 : second) {
String outval = key + "," + s1 + "," + s2;
oc.collect(null, new Text(outval));
reporter.setStatus("OK");
}
}
}
}
public static class LoadJoined extends MapReduceBase
implements Mapper<Text, Text, Text, LongWritable> {
public void map(
Text k,
Text val,
OutputCollector<Text, LongWritable> oc,
Reporter reporter) throws IOException {
// Find the url
String line = val.toString();
int firstComma = line.indexOf(',');
int secondComma = line.indexOf(',', firstComma + 1);
String key = line.substring(firstComma, secondComma);
// drop the rest of the record, I don't need it anymore,
// just pass a 1 for the combiner/reducer to sum instead.
Text outKey = new Text(key);
oc.collect(outKey, new LongWritable(1L));
}
}
public static class ReduceUrls extends MapReduceBase
implements Reducer<Text, LongWritable, WritableComparable,
Writable> {
public void reduce(
Text key,
Iterator<LongWritable> iter,
OutputCollector<WritableComparable, Writable> oc,
Reporter reporter) throws IOException {
// Add up all the values we see
long sum = 0;
while (iter.hasNext()) {
sum += iter.next().get();
reporter.setStatus("OK");
}
oc.collect(key, new LongWritable(sum));
}
}
public static class LoadClicks extends MapReduceBase
implements Mapper<WritableComparable, Writable, LongWritable,
Text> {
public void map(
WritableComparable key,
Writable val,
OutputCollector<LongWritable, Text> oc,
Reporter reporter) throws IOException {
oc.collect((LongWritable)val, (Text)key);
}
}
public static class LimitClicks extends MapReduceBase
implements Reducer<LongWritable, Text, LongWritable, Text> {
int count = 0;
public void reduce(
LongWritable key,
Iterator<Text> iter,
OutputCollector<LongWritable, Text> oc,
Reporter reporter) throws IOException {
// Only output the first 100 records
while (count < 100 && iter.hasNext()) {
oc.collect(key, iter.next());
count++;
}
}
}
public static void main(String[] args) throws IOException {
JobConf lp = new JobConf(MRExample.class);
lp.setJobName("Load Pages");
lp.setInputFormat(TextInputFormat.class);
lp.setOutputKeyClass(Text.class);
lp.setOutputValueClass(Text.class);
lp.setMapperClass(LoadPages.class);
FileInputFormat.addInputPath(lp, new
Path("/user/gates/pages"));
FileOutputFormat.setOutputPath(lp,
new Path("/user/gates/tmp/indexed_pages"));
lp.setNumReduceTasks(0);
Job loadPages = new Job(lp);
JobConf lfu = new JobConf(MRExample.class);
lfu.setJobName("Load and Filter Users");
lfu.setInputFormat(TextInputFormat.class);
lfu.setOutputKeyClass(Text.class);
lfu.setOutputValueClass(Text.class);
lfu.setMapperClass(LoadAndFilterUsers.class);
FileInputFormat.addInputPath(lfu, new
Path("/user/gates/users"));
FileOutputFormat.setOutputPath(lfu,
new Path("/user/gates/tmp/filtered_users"));
lfu.setNumReduceTasks(0);
Job loadUsers = new Job(lfu);
JobConf join = new JobConf(MRExample.class);
join.setJobName("Join Users and Pages");
join.setInputFormat(KeyValueTextInputFormat.class);
join.setOutputKeyClass(Text.class);
join.setOutputValueClass(Text.class);
join.setMapperClass(IdentityMapper.class);
join.setReducerClass(Join.class);
FileInputFormat.addInputPath(join, new
Path("/user/gates/tmp/indexed_pages"));
FileInputFormat.addInputPath(join, new
Path("/user/gates/tmp/filtered_users"));
FileOutputFormat.setOutputPath(join, new
Path("/user/gates/tmp/joined"));
join.setNumReduceTasks(50);
Job joinJob = new Job(join);
joinJob.addDependingJob(loadPages);
joinJob.addDependingJob(loadUsers);
JobConf group = new JobConf(MRExample.class);
group.setJobName("Group URLs");
group.setInputFormat(KeyValueTextInputFormat.class);
group.setOutputKeyClass(Text.class);
group.setOutputValueClass(LongWritable.class);
group.setOutputFormat(SequenceFileOutputFormat.class);
group.setMapperClass(LoadJoined.class);
group.setCombinerClass(ReduceUrls.class);
group.setReducerClass(ReduceUrls.class);
FileInputFormat.addInputPath(group, new
Path("/user/gates/tmp/joined"));
FileOutputFormat.setOutputPath(group, new
Path("/user/gates/tmp/grouped"));
group.setNumReduceTasks(50);
Job groupJob = new Job(group);
groupJob.addDependingJob(joinJob);
JobConf top100 = new JobConf(MRExample.class);
top100.setJobName("Top 100 sites");
top100.setInputFormat(SequenceFileInputFormat.class);
top100.setOutputKeyClass(LongWritable.class);
top100.setOutputValueClass(Text.class);
top100.setOutputFormat(SequenceFileOutputFormat.class);
top100.setMapperClass(LoadClicks.class);
top100.setCombinerClass(LimitClicks.class);
top100.setReducerClass(LimitClicks.class);
FileInputFormat.addInputPath(top100, new
Path("/user/gates/tmp/grouped"));
FileOutputFormat.setOutputPath(top100, new
Path("/user/gates/top100sitesforusers18to25"));
top100.setNumReduceTasks(1);
Job limit = new Job(top100);
limit.addDependingJob(groupJob);
JobControl jc = new JobControl("Find top 100 sites for users 18 to 25");
jc.addJob(loadPages);
jc.addJob(loadUsers);
jc.addJob(joinJob);
jc.addJob(groupJob);
jc.addJob(limit);
jc.run();
}
}
Users = load 'users' as (name, age);
Fltrd = filter Users by
age >= 18 and age <= 25;
Pages = load 'pages' as (user, url);
Jnd = join Fltrd by name, Pages by user;
Grpd = group Jnd by url;
Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
Srtd = order Smmd by clicks desc;
Top5 = limit Srtd 5;
store Top5 into 'top5sites';
Pig Architecture
Mode of Operation
Interactive or Batch
Mode of Operation
Local or Distributed
Mode of Operation
Map Reduce or Tez
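As a quick illustration, each mode is picked when launching Pig; a minimal sketch (script name hypothetical; Tez mode needs a Pig version that ships it):

pig                          # interactive Grunt shell, MapReduce mode by default
pig -x local                 # local mode, runs without a cluster
pig -x tez script.pig        # batch: run a script on Tez
pig -x mapreduce script.pig  # batch: run a script as MapReduce jobs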
Data types
int long float double
chararray datetime boolean
bytearray biginteger bigdecimal
Complex types
tuple bag map
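A minimal sketch with all three complex types in one schema (file and field names hypothetical):

A = LOAD 'students.txt' AS (name:chararray,
                            address:tuple(city:chararray, zip:chararray),
                            courses:bag{c:tuple(title:chararray)},
                            grades:map[int]);
-- project a field out of a tuple, look a key up in a map
B = FOREACH A GENERATE name, address.city, grades#'math';
DUMP B;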
Pig Latin basics - case sensitivity
• A = LOAD 'data' USING PigStorage() AS (f1:int, f2:int, f3:int);
B = GROUP A BY f1;
C = FOREACH B GENERATE COUNT ($0);
DUMP C;
• The names A, B, and C (so-called aliases) are case sensitive.
• Case also matters for:
• the field names f1, f2, and f3
• the alias names A, B, C
• the function names PigStorage, COUNT
• The exceptions are keywords: LOAD, USING, AS, GROUP, BY, FOREACH, GENERATE, and DUMP
Keywords
assert, and, any, all, arrange, as, asc, AVG, bag,
BinStorage, by, bytearray, BIGINTEGER, BIGDECIMAL,
cache, CASE, cat, cd, chararray, cogroup, CONCAT,
copyFromLocal, copyToLocal, COUNT, cp, cross,
datetime, %declare, %default, define, dense, desc,
describe, DIFF, distinct, double, du, dump, e, E,
eval, exec, explain, f, F, filter, flatten, float,
foreach, full, generate, group, help, if, illustrate,
import, inner, input, int, into, is, join, kill, l, L,
left, limit, load, long, ls, map, matches, MAX, MIN,
mkdir, mv, not, null, onschema, or, order, outer,
output, parallel, pig, PigDump, PigStorage, pwd, quit,
register, returns, right, rm, rmf, rollup, run,
sample, set, ship, SIZE, split, stderr, stdin, stdout,
store, stream, SUM, TextLoader, TOKENIZE, through,
tuple, union, using, void
First steps
data = LOAD 'input' AS (query:CHARARRAY);
A = LOAD 'data' USING PigStorage('\t') AS (f1:int, f2:int, f3:int);
STORE A INTO '/tmp/result' USING PigStorage(';');
First steps
SAMPLE
DESCRIBE
DUMP
EXPLAIN
ILLUSTRATE
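These are diagnostic operators; a small sketch of what each one does (input file as in the slides that follow):

A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int);
S = SAMPLE A 0.1;  -- keep roughly 10% of the records
DESCRIBE A;        -- print the schema of A
DUMP S;            -- execute the pipeline and print the result
EXPLAIN S;         -- show the logical, physical and MapReduce plans
ILLUSTRATE S;      -- trace example records through every step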
Next steps - operating on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int,
scholarship:float);
B = FILTER A BY age > 20;
Next steps - operating on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int,
scholarship:float);
B = FILTER A BY age > 20;
C = LIMIT B 5;
Next steps - operating on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int,
scholarship:float);
B = FILTER A BY age > 20;
C = LIMIT B 5;
D = FOREACH C GENERATE name, scholarship*semestre AS funds;
Next steps - operating on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int,
scholarship:float);
E = GROUP A BY age;
Next steps - operating on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int,
scholarship:float);
E = GROUP A BY age;
F = FOREACH E GENERATE group AS age, AVG(A.scholarship);
Performance
Tez, projections, filtering, joins
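A sketch of those hints in practice (the 'department' input is hypothetical). Projecting and filtering as early as possible shrinks what gets shuffled, and 'replicated' requests a map-side join with the small relation cached in memory:

A  = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int, scholarship:float);
A2 = FOREACH A GENERATE name, age;     -- project early: drop unused columns
B  = FILTER A2 BY age > 20;            -- filter early: fewer records to shuffle
D  = LOAD 'department' AS (name:chararray, dept:chararray);
J  = JOIN B BY name, D BY name USING 'replicated';  -- D must fit in memory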
Workshop
ssh server address: 52.5.251.228
https://www.notehub.org/veuk1
Introduction to data analysis with Hive
Architecture
Unique features of Hive
SQL queries over flat files, e.g. CSV
Unique features of Hive
Much faster analysis - no need to write Map Reduce
Optimization; parts of the work run in memory instead of MR
Unique features of Hive
Virtually unlimited integrations - MongoDB, Elasticsearch,
HBase
Unique features of Hive
BI and DWH tools integrate with Hive over JDBC
Hive CLI
Interactive mode:
hive
Batch mode:
hive -e 'select foo from bar'
hive -f '/path/to/my/script.q'
hive -f 'hdfs://namenode:port/path/to/my/script.q'
more options: hive --help
Data types
INT, TINYINT, SMALLINT, BIGINT
BOOLEAN
DECIMAL
FLOAT, DOUBLE
STRING
BINARY
TIMESTAMP
ARRAY, MAP, STRUCT, UNION
DATE
CHAR
VARCHAR
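A quick sketch of the complex types in a table definition (table and column names hypothetical):

CREATE TABLE user_profile (
  name    STRING,
  emails  ARRAY<STRING>,
  props   MAP<STRING, STRING>,
  address STRUCT<city:STRING, zip:STRING>
);
SELECT emails[0], props['lang'], address.city FROM user_profile;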
Query syntax
SELECT, INSERT, UPDATE
GROUP BY
UNION
INNER, LEFT, RIGHT, FULL OUTER JOIN
OVER, RANK
(NOT) IN, HAVING
(NOT) EXISTS
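Two sketches against the page_view table defined a few slides later (columns as declared there):

-- aggregation with HAVING
SELECT country, COUNT(*) AS views
FROM page_view
GROUP BY country
HAVING COUNT(*) > 100;

-- windowing: rank each user's views by recency
SELECT userid, page_url,
       RANK() OVER (PARTITION BY userid ORDER BY viewTime DESC) AS recency
FROM page_view;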
Data Definition Language
• CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
• DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
• TRUNCATE TABLE
• ALTER DATABASE/SCHEMA, TABLE, VIEW
• MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
• SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES,
PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
• DESCRIBE DATABASE/SCHEMA, table_name, view_name
Tables
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;
First steps in Hive
CREATE TABLE tablename1 (foo INT, bar STRING) PARTITIONED BY (ds STRING);
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename1;
INSERT INTO TABLE tablename1 PARTITION (ds='2014') select_statement1 FROM from_statement;
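Querying then works like any SQL dialect; filtering on the partition column lets Hive scan only that partition's files (a sketch):

SELECT foo, COUNT(*)
FROM tablename1
WHERE ds = '2014'   -- partition pruning: only this partition is read
GROUP BY foo;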
Other file formats? SerDe
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
CREATE TABLE apachelog (
host STRING, identity STRING, user STRING, time STRING, request STRING, status STRING,
size STRING, referer STRING, agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;
Other file formats? SerDe
CREATE TABLE t (
foo STRING, bar STRING)
STORED AS TEXTFILE; ← or SEQUENCEFILE, ORC, AVRO or PARQUET
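Switching formats means rewriting the data; a sketch converting the text-backed table to ORC with CREATE TABLE AS SELECT (target name hypothetical; note CTAS targets cannot be partitioned):

CREATE TABLE page_view_orc
STORED AS ORC
AS SELECT * FROM page_view;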
Pros, cons, comparison
Hive                      Pig
declarative               procedural
temporary tables          pipelines
we rely on the optimizer  we tune the implementation ourselves
UDF, Transform            UDF, streaming
SQL drivers               data pipeline splits
Stinger
http://hortonworks.com/labs/stinger/
Tips & Tricks
hive.vectorized.execution.enabled=true
ORC
hive.execution.engine=tez
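A sketch of applying these per session (they can also live in hive-site.xml); vectorized execution pays off on ORC-backed tables:

SET hive.execution.engine=tez;
SET hive.vectorized.execution.enabled=true;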
Workshop
source: HikingArtist
https://www.notehub.org/bjnwl
Workshop 2
https://www.notehub.org/daxzs
Now what?
Want to know more?
Our trainings allow for individual work with every
participant
• we work in groups of 4-8 people
• the program can be tailored to the group's
expectations
• we answer each participant's individual
questions
• we have much more time :)
A training tailored to you
Are you an architect or a team leader?
• our comprehensive 5-day course walks through and
exercises the entire Hadoop ecosystem
• our dedicated course for architects discusses
the design of Big Data systems
Are you an analyst?
• a dedicated course lets you practice Pig and Hive
in detail and solve sample analytical problems
A training tailored to you
Are you a developer?
• a 3-day course covers advanced MapReduce
programming in Java and the streaming approach
in detail
Interested in the whole Big Data picture?
• Big Data processing with Apache Spark
• NoSQL databases - Cassandra
• NoSQL databases - MongoDB
Sources
• HikingArtist.com - drawings
• hortonworks.com - HDP architecture
• apache.org - Pig, Hive, Hadoop graphics
More Related Content

What's hot

Google App Engine Developer - Day3
Google App Engine Developer - Day3Google App Engine Developer - Day3
Google App Engine Developer - Day3Simon Su
 
Using Cerberus and PySpark to validate semi-structured datasets
Using Cerberus and PySpark to validate semi-structured datasetsUsing Cerberus and PySpark to validate semi-structured datasets
Using Cerberus and PySpark to validate semi-structured datasetsBartosz Konieczny
 
Writing native bindings to node.js in C++
Writing native bindings to node.js in C++Writing native bindings to node.js in C++
Writing native bindings to node.js in C++nsm.nikhil
 
JJUG CCC 2011 Spring
JJUG CCC 2011 SpringJJUG CCC 2011 Spring
JJUG CCC 2011 SpringKiyotaka Oku
 
All you need to know about the JavaScript event loop
All you need to know about the JavaScript event loopAll you need to know about the JavaScript event loop
All you need to know about the JavaScript event loopSaša Tatar
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for CassandraEdward Capriolo
 
PyCon KR 2019 sprint - RustPython by example
PyCon KR 2019 sprint  - RustPython by examplePyCon KR 2019 sprint  - RustPython by example
PyCon KR 2019 sprint - RustPython by exampleYunWon Jeong
 
Webinar: Building Your First App in Node.js
Webinar: Building Your First App in Node.jsWebinar: Building Your First App in Node.js
Webinar: Building Your First App in Node.jsMongoDB
 
Herding types with Scala macros
Herding types with Scala macrosHerding types with Scala macros
Herding types with Scala macrosMarina Sigaeva
 
Openstack taskflow 簡介
Openstack taskflow 簡介Openstack taskflow 簡介
Openstack taskflow 簡介kao kuo-tung
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak PROIDEA
 
Apache Spark Structured Streaming + Apache Kafka = ♡
Apache Spark Structured Streaming + Apache Kafka = ♡Apache Spark Structured Streaming + Apache Kafka = ♡
Apache Spark Structured Streaming + Apache Kafka = ♡Bartosz Konieczny
 
The Ring programming language version 1.7 book - Part 16 of 196
The Ring programming language version 1.7 book - Part 16 of 196The Ring programming language version 1.7 book - Part 16 of 196
The Ring programming language version 1.7 book - Part 16 of 196Mahmoud Samir Fayed
 
The Ring programming language version 1.6 book - Part 15 of 189
The Ring programming language version 1.6 book - Part 15 of 189The Ring programming language version 1.6 book - Part 15 of 189
The Ring programming language version 1.6 book - Part 15 of 189Mahmoud Samir Fayed
 
Mysql5.1 character set testing
Mysql5.1 character set testingMysql5.1 character set testing
Mysql5.1 character set testingPhilip Zhong
 
Mysql handle socket
Mysql handle socketMysql handle socket
Mysql handle socketPhilip Zhong
 
C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴명신 김
 
Jakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheelJakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheeltcurdt
 

What's hot (20)

Google App Engine Developer - Day3
Google App Engine Developer - Day3Google App Engine Developer - Day3
Google App Engine Developer - Day3
 
Using Cerberus and PySpark to validate semi-structured datasets
Using Cerberus and PySpark to validate semi-structured datasetsUsing Cerberus and PySpark to validate semi-structured datasets
Using Cerberus and PySpark to validate semi-structured datasets
 
Writing native bindings to node.js in C++
Writing native bindings to node.js in C++Writing native bindings to node.js in C++
Writing native bindings to node.js in C++
 
JJUG CCC 2011 Spring
JJUG CCC 2011 SpringJJUG CCC 2011 Spring
JJUG CCC 2011 Spring
 
All you need to know about the JavaScript event loop
All you need to know about the JavaScript event loopAll you need to know about the JavaScript event loop
All you need to know about the JavaScript event loop
 
Intravert Server side processing for Cassandra
Intravert Server side processing for CassandraIntravert Server side processing for Cassandra
Intravert Server side processing for Cassandra
 
PyCon KR 2019 sprint - RustPython by example
PyCon KR 2019 sprint  - RustPython by examplePyCon KR 2019 sprint  - RustPython by example
PyCon KR 2019 sprint - RustPython by example
 
Webinar: Building Your First App in Node.js
Webinar: Building Your First App in Node.jsWebinar: Building Your First App in Node.js
Webinar: Building Your First App in Node.js
 
Herding types with Scala macros
Herding types with Scala macrosHerding types with Scala macros
Herding types with Scala macros
 
Openstack taskflow 簡介
Openstack taskflow 簡介Openstack taskflow 簡介
Openstack taskflow 簡介
 
Typelevel summit
Typelevel summitTypelevel summit
Typelevel summit
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 
Angular2 rxjs
Angular2 rxjsAngular2 rxjs
Angular2 rxjs
 
Apache Spark Structured Streaming + Apache Kafka = ♡
Apache Spark Structured Streaming + Apache Kafka = ♡Apache Spark Structured Streaming + Apache Kafka = ♡
Apache Spark Structured Streaming + Apache Kafka = ♡
 
The Ring programming language version 1.7 book - Part 16 of 196
The Ring programming language version 1.7 book - Part 16 of 196The Ring programming language version 1.7 book - Part 16 of 196
The Ring programming language version 1.7 book - Part 16 of 196
 
The Ring programming language version 1.6 book - Part 15 of 189
The Ring programming language version 1.6 book - Part 15 of 189The Ring programming language version 1.6 book - Part 15 of 189
The Ring programming language version 1.6 book - Part 15 of 189
 
Mysql5.1 character set testing
Mysql5.1 character set testingMysql5.1 character set testing
Mysql5.1 character set testing
 
Mysql handle socket
Mysql handle socketMysql handle socket
Mysql handle socket
 
C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴C#을 이용한 task 병렬화와 비동기 패턴
C#을 이용한 task 병렬화와 비동기 패턴
 
Jakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheelJakarta Commons - Don't re-invent the wheel
Jakarta Commons - Don't re-invent the wheel
 

Viewers also liked

Podstawy AngularJS
Podstawy AngularJSPodstawy AngularJS
Podstawy AngularJSSages
 
Budowa elementów GUI za pomocą biblioteki React - szybki start
Budowa elementów GUI za pomocą biblioteki React - szybki startBudowa elementów GUI za pomocą biblioteki React - szybki start
Budowa elementów GUI za pomocą biblioteki React - szybki startSages
 
Technologia Xamarin i wprowadzenie do Windows IoT core
Technologia Xamarin i wprowadzenie do Windows IoT coreTechnologia Xamarin i wprowadzenie do Windows IoT core
Technologia Xamarin i wprowadzenie do Windows IoT coreSages
 
Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...
Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...
Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...Sages
 
Bezpieczne dane w aplikacjach java
Bezpieczne dane w aplikacjach javaBezpieczne dane w aplikacjach java
Bezpieczne dane w aplikacjach javaSages
 
Wprowadzenie do technologii Puppet
Wprowadzenie do technologii PuppetWprowadzenie do technologii Puppet
Wprowadzenie do technologii PuppetSages
 
Szybkie wprowadzenie do eksploracji danych z pakietem Weka
Szybkie wprowadzenie do eksploracji danych z pakietem WekaSzybkie wprowadzenie do eksploracji danych z pakietem Weka
Szybkie wprowadzenie do eksploracji danych z pakietem WekaSages
 
Jak zacząć przetwarzanie małych i dużych danych tekstowych?
Jak zacząć przetwarzanie małych i dużych danych tekstowych?Jak zacząć przetwarzanie małych i dużych danych tekstowych?
Jak zacząć przetwarzanie małych i dużych danych tekstowych?Sages
 
Architektura aplikacji android
Architektura aplikacji androidArchitektura aplikacji android
Architektura aplikacji androidSages
 
Wprowadzenie do Big Data i Apache Spark
Wprowadzenie do Big Data i Apache SparkWprowadzenie do Big Data i Apache Spark
Wprowadzenie do Big Data i Apache SparkSages
 
Vert.x v3 - high performance polyglot application toolkit
Vert.x v3 - high performance  polyglot application toolkitVert.x v3 - high performance  polyglot application toolkit
Vert.x v3 - high performance polyglot application toolkitSages
 

Viewers also liked (11)

Podstawy AngularJS
Podstawy AngularJSPodstawy AngularJS
Podstawy AngularJS
 
Budowa elementów GUI za pomocą biblioteki React - szybki start
Budowa elementów GUI za pomocą biblioteki React - szybki startBudowa elementów GUI za pomocą biblioteki React - szybki start
Budowa elementów GUI za pomocą biblioteki React - szybki start
 
Technologia Xamarin i wprowadzenie do Windows IoT core
Technologia Xamarin i wprowadzenie do Windows IoT coreTechnologia Xamarin i wprowadzenie do Windows IoT core
Technologia Xamarin i wprowadzenie do Windows IoT core
 
Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...
Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...
Zrób dobrze swojej komórce - programowanie urządzeń mobilnych z wykorzystanie...
 
Bezpieczne dane w aplikacjach java
Bezpieczne dane w aplikacjach javaBezpieczne dane w aplikacjach java
Bezpieczne dane w aplikacjach java
 
Wprowadzenie do technologii Puppet
Wprowadzenie do technologii PuppetWprowadzenie do technologii Puppet
Wprowadzenie do technologii Puppet
 
Szybkie wprowadzenie do eksploracji danych z pakietem Weka
Szybkie wprowadzenie do eksploracji danych z pakietem WekaSzybkie wprowadzenie do eksploracji danych z pakietem Weka
Szybkie wprowadzenie do eksploracji danych z pakietem Weka
 
Jak zacząć przetwarzanie małych i dużych danych tekstowych?
Jak zacząć przetwarzanie małych i dużych danych tekstowych?Jak zacząć przetwarzanie małych i dużych danych tekstowych?
Jak zacząć przetwarzanie małych i dużych danych tekstowych?
 
Architektura aplikacji android
Architektura aplikacji androidArchitektura aplikacji android
Architektura aplikacji android
 
Wprowadzenie do Big Data i Apache Spark
Wprowadzenie do Big Data i Apache SparkWprowadzenie do Big Data i Apache Spark
Wprowadzenie do Big Data i Apache Spark
 
Vert.x v3 - high performance polyglot application toolkit
Vert.x v3 - high performance  polyglot application toolkitVert.x v3 - high performance  polyglot application toolkit
Vert.x v3 - high performance polyglot application toolkit
 

Similar to Codepot - Pig i Hive: szybkie wprowadzenie / Pig and Hive crash course

Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraDeependra Ariyadewa
 
Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...Andrew Yongjoon Kong
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19confluent
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in CassandraJairam Chandar
 
GDG Devfest 2019 - Build go kit microservices at kubernetes with ease
GDG Devfest 2019 - Build go kit microservices at kubernetes with easeGDG Devfest 2019 - Build go kit microservices at kubernetes with ease
GDG Devfest 2019 - Build go kit microservices at kubernetes with easeKAI CHU CHUNG
 
Big-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiBig-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiUnmesh Baile
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourPeter Friese
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
Big Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIs
Big Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIsBig Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIs
Big Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIsMatt Stubbs
 
TDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com GoTDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com Gotdc-globalcode
 
Reactive Programming Patterns with RxSwift
Reactive Programming Patterns with RxSwiftReactive Programming Patterns with RxSwift
Reactive Programming Patterns with RxSwiftFlorent Pillet
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"DataStax Academy
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGMatthew McCullough
 
Scalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInScalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInVitaly Gordon
 
Camel one v3-6
Camel one v3-6Camel one v3-6
Camel one v3-6wxdydx
 

Similar to Codepot - Pig i Hive: szybkie wprowadzenie / Pig and Hive crash course (20)

Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Store and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and CassandraStore and Process Big Data with Hadoop and Cassandra
Store and Process Big Data with Hadoop and Cassandra
 
Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...Stream analysis with kafka native way and considerations about monitoring as ...
Stream analysis with kafka native way and considerations about monitoring as ...
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19Kick your database_to_the_curb_reston_08_27_19
Kick your database_to_the_curb_reston_08_27_19
 
Hadoop Integration in Cassandra
Hadoop Integration in CassandraHadoop Integration in Cassandra
Hadoop Integration in Cassandra
 
GDG Devfest 2019 - Build go kit microservices at kubernetes with ease
GDG Devfest 2019 - Build go kit microservices at kubernetes with easeGDG Devfest 2019 - Build go kit microservices at kubernetes with ease
GDG Devfest 2019 - Build go kit microservices at kubernetes with ease
 
Big-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbaiBig-data-analysis-training-in-mumbai
Big-data-analysis-training-in-mumbai
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Big Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIs
Big Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIsBig Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIs
Big Data LDN 2017: Processing Fast Data With Apache Spark: the Tale of Two APIs
 
TDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com GoTDC2018SP | Trilha Go - Processando analise genetica em background com Go
TDC2018SP | Trilha Go - Processando analise genetica em background com Go
 
Reactive Programming Patterns with RxSwift
Reactive Programming Patterns with RxSwiftReactive Programming Patterns with RxSwift
Reactive Programming Patterns with RxSwift
 
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
NYC* 2013 - "Advanced Data Processing: Beyond Queries and Slices"
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Solr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene EuroconSolr @ Etsy - Apache Lucene Eurocon
Solr @ Etsy - Apache Lucene Eurocon
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUG
 
Scalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedInScalable and Flexible Machine Learning With Scala @ LinkedIn
Scalable and Flexible Machine Learning With Scala @ LinkedIn
 
Camel one v3-6
Camel one v3-6Camel one v3-6
Camel one v3-6
 

More from Sages

Python szybki start
Python   szybki startPython   szybki start
Python szybki startSages
 
Budowanie rozwiązań serverless w chmurze Azure
Budowanie rozwiązań serverless w chmurze AzureBudowanie rozwiązań serverless w chmurze Azure
Budowanie rozwiązań serverless w chmurze AzureSages
 
Docker praktyczne podstawy
Docker  praktyczne podstawyDocker  praktyczne podstawy
Docker praktyczne podstawySages
 
Angular 4 pragmatycznie
Angular 4 pragmatycznieAngular 4 pragmatycznie
Angular 4 pragmatycznieSages
 
Jak działa blockchain?
Jak działa blockchain?Jak działa blockchain?
Jak działa blockchain?Sages
 
Qgis szybki start
Qgis szybki startQgis szybki start
Qgis szybki startSages
 
Architektura SOA - wstęp
Architektura SOA - wstępArchitektura SOA - wstęp
Architektura SOA - wstępSages
 
Wprowadzenie do technologii Big Data
Wprowadzenie do technologii Big DataWprowadzenie do technologii Big Data
Wprowadzenie do technologii Big DataSages
 

More from Sages (8)

Python szybki start
Python   szybki startPython   szybki start
Python szybki start
 
Budowanie rozwiązań serverless w chmurze Azure
Budowanie rozwiązań serverless w chmurze AzureBudowanie rozwiązań serverless w chmurze Azure
Budowanie rozwiązań serverless w chmurze Azure
 
Docker praktyczne podstawy
Docker  praktyczne podstawyDocker  praktyczne podstawy
Docker praktyczne podstawy
 
Angular 4 pragmatycznie
Angular 4 pragmatycznieAngular 4 pragmatycznie
Angular 4 pragmatycznie
 
Jak działa blockchain?
Jak działa blockchain?Jak działa blockchain?
Jak działa blockchain?
 
Qgis szybki start
Qgis szybki startQgis szybki start
Qgis szybki start
 
Architektura SOA - wstęp
Architektura SOA - wstępArchitektura SOA - wstęp
Architektura SOA - wstęp
 
Wprowadzenie do technologii Big Data
Wprowadzenie do technologii Big DataWprowadzenie do technologii Big Data
Wprowadzenie do technologii Big Data
 

Recently uploaded

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 

Recently uploaded (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 

Codepot - Pig i Hive: szybkie wprowadzenie / Pig and Hive crash course

  • 1. Pig i Hive: Szybkie wprowadzenie Radosław Stankiewicz Bartłomiej Tartanus
  • 2.
  • 3. Agenda Wstęp -> 2 słowa o Map Reduce -> Pig -> Hive 3
  • 8.
  • 10. Klasyfikacja problemu • Baza danych ulic Warszawy, Dane w formacie JSON, optymalizacja odbioru śmieci jednego z usługodawców. • Zdarzenia z bazy transakcyjnej i kart kredytowych w celu lepszego wykrywania fraudów • System wyszukujący dobre oferty samochodów z wielu serwisów - web crawling, parsowanie danych, analiza trendów cen samochodów • Centralne repozytorium skanów umów, TB danych, codziennie przybywa kilkaset nowych dokumentów 10
  • 11. Geneza • za dużo danych • pady serwerów • wolne relacyjne bazy danych 11
  • 12. 12
  • 15. 15
  • 17. ● User Commands o dfs o fsck ● Administration Commands o datanode o dfsadmin o namenode dfs: appendToFile cat chgrp chmod chown copyFromLocal copyToLocal count cp du dus expunge get getfacl getfattr getmerge ls lsr mkdir moveFromLocal moveToLocal mv put rm rmr setfacl setfattr setrep stat tail test text touchz hdfs dfs -put localfile1 localfile2 /user/tmp/hadoopdir hdfs dfs -getmerge /user/hadoop/output/ localfile komendy 17
  • 21. Mapper #!/usr/bin/env python import sys for line in sys.stdin: words = line.strip().split() for word in words: print '%st%s' % (word, 1) line = “Ala ma kota” Ala 1 ma 1 kota 1 21
  • 22. Reducer #!/usr/bin/env python import sys current_word = None current_count = 0 word = None for line in sys.stdin: line = line.strip() word, count = line.split('t', 1) count = int(count) if current_word == word: current_count += count else: if current_word: print '%s,%s' % (current_word, current_count) current_count = count current_word = word if current_word == word: print '%s,%s' % (current_word, current_count) ala 1 ala 1 bela 1 dela 1 ala,2 bela,1 dela,1 22
  • 23. Uruchomienie streaming cat input.txt | ./mapper.py | sort | ./reducer.py bin/yarn jar [..]/hadoop-*streaming*.jar 
 -file mapper.py -mapper ./mapper.py -file reducer.py -reducer ./reducer.py 
 -input /tmp/wordcount/input -output /tmp/ wordcount/output 23
  • 24. Map Reduce w Java (input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output) 1) Mapper 2) Reducer 3) run public class WordCount extends Configured implements Tool { public static class TokenizerMapper{...} public static class IntSumReducer{...} public int run(...){...} } 24
  • 25. Mapper<KEYIN,VALUEIN,KEY OUT,VALUEOUT> public static class TokenizerMapper
 extends Mapper<LongWritable, Text, Text, IntWritable>{
 
 private final static IntWritable one = new IntWritable(1);
 private Text word = new Text();
 
 public void map(LongWritable key, Text value, Context context
 ) throws IOException, InterruptedException {
 StringTokenizer itr = new StringTokenizer(value.toString());
 while (itr.hasMoreTokens()) {
 word.set(itr.nextToken());
 context.write(word, one);
 }
 } public void setup(...) {...} public void cleanup(...) {...} public void run(...) {...}
 } value = “Ala ma kota” Ala,1 ma,1 kota,1
  • 26. Reducer<KEYIN,VALUEIN,KEY OUT,VALUEOUT> public static class IntSumReducer
 extends Reducer<Text,IntWritable,Text,IntWritable> {
 private IntWritable result = new IntWritable();
 
 public void reduce(Text key, Iterable<IntWritable> values,
 Context context
 ) throws IOException, InterruptedException {
 int sum = 0;
 for (IntWritable val : values) {
 sum += val.get();
 }
 result.set(sum);
 context.write(key, result);
 } public void setup(...) {...} public void cleanup(...) {...} public void run(...) {...}
 } kota,(1,1,1,1) kota,4
  • 27. Main public int run(String[] args) throws Exception {
 Configuration conf = new Configuration();
 Job job = Job.getInstance(conf, "word count");
 job.setJarByClass(WordCount.class);
 job.setMapperClass(TokenizerMapper.class);
 job.setCombinerClass(IntSumReducer.class);
 job.setReducerClass(IntSumReducer.class);
 job.setOutputKeyClass(Text.class);
 job.setOutputValueClass(IntWritable.class);
 FileInputFormat.addInputPath(job, new Path(args[0]));
 FileOutputFormat.setOutputPath(job, new Path(args[1]));
 System.exit(job.waitForCompletion(true) ? 0 : 1);
 } public static void main(String[] args) throws Exception { int res = ToolRunner.run(new Configuration(), new WordCount(),args); System.exit(res); } yarn jar wc.jar WordCount /tmp/wordcount/input /tmp/wordcount/output
  • 28. Wprowadzenie do przetwarzania danych na przykładzie Pig 28
  • 30. Czy warto? Top 5 stron odwiedzanych przez użytkowników mających 18 lat
  • 31. import java.io.IOException; import java.util.ArrayList; import java.util.Iterator; import java.util.List; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.io.Writable; import org.apache.hadoop.io.WritableComparable; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.FileOutputFormat; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.KeyValueTextInputFormat; import org.apache.hadoop.mapred.Mapper; import org.apache.hadoop.mapred.MapReduceBase; import org.apache.hadoop.mapred.OutputCollector; import org.apache.hadoop.mapred.RecordReader; import org.apache.hadoop.mapred.Reducer; import org.apache.hadoop.mapred.Reporter; import org.apache.hadoop.mapred.SequenceFileInputFormat; import org.apache.hadoop.mapred.SequenceFileOutputFormat; import org.apache.hadoop.mapred.TextInputFormat; import org.apache.hadoop.mapred.jobcontrol.Job; import org.apache.hadoop.mapred.jobcontrol.JobControl; import org.apache.hadoop.mapred.lib.IdentityMapper; public class MRExample { public static class LoadPages extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> { public void map(LongWritable k, Text val, OutputCollector<Text, Text> oc, Reporter reporter) throws IOException { // Pull the key out String line = val.toString(); int firstComma = line.indexOf(','); String key = line.substring(0, firstComma); String value = line.substring(firstComma + 1); Text outKey = new Text(key); // Prepend an index to the value so we know which file // it came from. Text outVal = new Text("1" + value); oc.collect(outKey, outVal); } } public static class LoadAndFilterUsers extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> { public void map(LongWritable k, Text val, OutputCollector<Text, Text> oc, Reporter reporter) throws IOException { // Pull the key out String line = val.toString(); int firstComma = line.indexOf(','); String value = line.substring(firstComma + 1); int age = Integer.parseInt(value); if (age < 18 || age > 25) return; String key = line.substring(0, firstComma); Text outKey = new Text(key); // Prepend an index to the value so we know which file // it came from. Text outVal = new Text("2" + value); oc.collect(outKey, outVal); } } public static class Join extends MapReduceBase implements Reducer<Text, Text, Text, Text> { public void reduce(Text key, Iterator<Text> iter, OutputCollector<Text, Text> oc, Reporter reporter) throws IOException { // For each value, figure out which file it's from and store it // accordingly. List<String> first = new ArrayList<String>(); List<String> second = new ArrayList<String>(); while (iter.hasNext()) { Text t = iter.next(); String value = t.toString(); if (value.charAt(0) == '1') first.add(value.substring(1)); else second.add(value.substring(1)); reporter.setStatus("OK"); } // Do the cross product and collect the values for (String s1 : first) { for (String s2 : second) { String outval = key + "," + s1 + "," + s2; oc.collect(null, new Text(outval)); reporter.setStatus("OK"); } } }
 • 32.
    public static class LoadJoined extends MapReduceBase
            implements Mapper<Text, Text, Text, LongWritable> {
        public void map(Text k, Text val,
                OutputCollector<Text, LongWritable> oc,
                Reporter reporter) throws IOException {
            // Find the url
            String line = val.toString();
            int firstComma = line.indexOf(',');
            // search from firstComma + 1, otherwise indexOf finds the same comma again
            int secondComma = line.indexOf(',', firstComma + 1);
            String key = line.substring(firstComma + 1, secondComma);
            // drop the rest of the record, I don't need it anymore,
            // just pass a 1 for the combiner/reducer to sum instead.
            Text outKey = new Text(key);
            oc.collect(outKey, new LongWritable(1L));
        }
    }

    public static class ReduceUrls extends MapReduceBase
            implements Reducer<Text, LongWritable, WritableComparable, Writable> {
        public void reduce(Text key, Iterator<LongWritable> iter,
                OutputCollector<WritableComparable, Writable> oc,
                Reporter reporter) throws IOException {
            // Add up all the values we see
            long sum = 0;
            while (iter.hasNext()) {
                sum += iter.next().get();
                reporter.setStatus("OK");
            }
            oc.collect(key, new LongWritable(sum));
        }
    }

    public static class LoadClicks extends MapReduceBase
            implements Mapper<WritableComparable, Writable, LongWritable, Text> {
        public void map(WritableComparable key, Writable val,
                OutputCollector<LongWritable, Text> oc,
                Reporter reporter) throws IOException {
            oc.collect((LongWritable)val, (Text)key);
        }
    }

    public static class LimitClicks extends MapReduceBase
            implements Reducer<LongWritable, Text, LongWritable, Text> {
        int count = 0;
        public void reduce(LongWritable key, Iterator<Text> iter,
                OutputCollector<LongWritable, Text> oc,
                Reporter reporter) throws IOException {
            // Only output the first 100 records
            while (count < 100 && iter.hasNext()) {
                oc.collect(key, iter.next());
                count++;
            }
        }
    }
 • 33.
    public static void main(String[] args) throws IOException {
        JobConf lp = new JobConf(MRExample.class);
        lp.setJobName("Load Pages");
        lp.setInputFormat(TextInputFormat.class);
        lp.setOutputKeyClass(Text.class);
        lp.setOutputValueClass(Text.class);
        lp.setMapperClass(LoadPages.class);
        FileInputFormat.addInputPath(lp, new Path("/user/gates/pages"));
        FileOutputFormat.setOutputPath(lp, new Path("/user/gates/tmp/indexed_pages"));
        lp.setNumReduceTasks(0);
        Job loadPages = new Job(lp);

        JobConf lfu = new JobConf(MRExample.class);
        lfu.setJobName("Load and Filter Users");
        lfu.setInputFormat(TextInputFormat.class);
        lfu.setOutputKeyClass(Text.class);
        lfu.setOutputValueClass(Text.class);
        lfu.setMapperClass(LoadAndFilterUsers.class);
        FileInputFormat.addInputPath(lfu, new Path("/user/gates/users"));
        FileOutputFormat.setOutputPath(lfu, new Path("/user/gates/tmp/filtered_users"));
        lfu.setNumReduceTasks(0);
        Job loadUsers = new Job(lfu);

        JobConf join = new JobConf(MRExample.class);
        join.setJobName("Join Users and Pages");
        join.setInputFormat(KeyValueTextInputFormat.class);
        join.setOutputKeyClass(Text.class);
        join.setOutputValueClass(Text.class);
        join.setMapperClass(IdentityMapper.class);
        join.setReducerClass(Join.class);
        FileInputFormat.addInputPath(join, new Path("/user/gates/tmp/indexed_pages"));
        FileInputFormat.addInputPath(join, new Path("/user/gates/tmp/filtered_users"));
        FileOutputFormat.setOutputPath(join, new Path("/user/gates/tmp/joined"));
        join.setNumReduceTasks(50);
        Job joinJob = new Job(join);
        joinJob.addDependingJob(loadPages);
        joinJob.addDependingJob(loadUsers);

        JobConf group = new JobConf(MRExample.class);
        group.setJobName("Group URLs");
        group.setInputFormat(KeyValueTextInputFormat.class);
        group.setOutputKeyClass(Text.class);
        group.setOutputValueClass(LongWritable.class);
        group.setOutputFormat(SequenceFileOutputFormat.class);
        group.setMapperClass(LoadJoined.class);
        group.setCombinerClass(ReduceUrls.class);
        group.setReducerClass(ReduceUrls.class);
        FileInputFormat.addInputPath(group, new Path("/user/gates/tmp/joined"));
        FileOutputFormat.setOutputPath(group, new Path("/user/gates/tmp/grouped"));
        group.setNumReduceTasks(50);
        Job groupJob = new Job(group);
        groupJob.addDependingJob(joinJob);

        JobConf top100 = new JobConf(MRExample.class);
        top100.setJobName("Top 100 sites");
        top100.setInputFormat(SequenceFileInputFormat.class);
        top100.setOutputKeyClass(LongWritable.class);
        top100.setOutputValueClass(Text.class);
        top100.setOutputFormat(SequenceFileOutputFormat.class);
        top100.setMapperClass(LoadClicks.class);
        top100.setCombinerClass(LimitClicks.class);
        top100.setReducerClass(LimitClicks.class);
        FileInputFormat.addInputPath(top100, new Path("/user/gates/tmp/grouped"));
        FileOutputFormat.setOutputPath(top100, new Path("/user/gates/top100sitesforusers18to25"));
        top100.setNumReduceTasks(1);
        Job limit = new Job(top100);
        limit.addDependingJob(groupJob);

        JobControl jc = new JobControl("Find top 100 sites for users 18 to 25");
        jc.addJob(loadPages);
        jc.addJob(loadUsers);
        jc.addJob(joinJob);
        jc.addJob(groupJob);
        jc.addJob(limit);
        jc.run();
    }
}
 • 34.
Users = load 'users' as (name, age);
Fltrd = filter Users by age >= 18 and age <= 25;
Pages = load 'pages' as (user, url);
Jnd = join Fltrd by name, Pages by user;
Grpd = group Jnd by url;
Smmd = foreach Grpd generate group, COUNT(Jnd) as clicks;
Srtd = order Smmd by clicks desc;
Top5 = limit Srtd 5;
store Top5 into 'top5sites';
 • 37. Execution mode: local or distributed
 • 38. Execution engine: MapReduce or Tez
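To make the two slides above concrete, a minimal sketch of launching a script in each mode and on each engine (the script name top5.pig is hypothetical):

# local mode - runs against the local filesystem, handy for testing
pig -x local top5.pig

# distributed execution on MapReduce (the default engine)
pig -x mapreduce top5.pig

# distributed execution on Tez (available since Pig 0.14)
pig -x tez top5.pig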
 • 39. Data types: int, long, float, double, chararray, datetime, boolean, bytearray, biginteger, bigdecimal
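As a sketch only (the file layout and field names below are made up), a single LOAD can declare several of these types in its schema; whether each cast succeeds depends on the input data:

-- hypothetical pipe-delimited input: id|name|date|amount
A = LOAD 'transactions' USING PigStorage('|')
    AS (id:long, name:chararray, created:datetime, amount:bigdecimal);
DESCRIBE A;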
 • 41. Pig Latin basics - case sensitivity
A = LOAD 'data' USING PigStorage() AS (f1:int, f2:int, f3:int);
B = GROUP A BY f1;
C = FOREACH B GENERATE COUNT($0);
DUMP C;
• The names A, B and C (so-called aliases) are case sensitive.
• Case also matters for:
• the field names f1, f2 and f3
• the alias names A, B, C
• the function names PigStorage, COUNT
• The exceptions are the keywords LOAD, USING, AS, GROUP, BY, FOREACH, GENERATE and DUMP, which are case insensitive.
 • 42. Keywords: assert, and, any, all, arrange, as, asc, AVG, bag, BinStorage, by, bytearray, BIGINTEGER, BIGDECIMAL, cache, CASE, cat, cd, chararray, cogroup, CONCAT, copyFromLocal, copyToLocal, COUNT, cp, cross, datetime, %declare, %default, define, dense, desc, describe, DIFF, distinct, double, du, dump, e, E, eval, exec, explain, f, F, filter, flatten, float, foreach, full, generate, group, help, if, illustrate, import, inner, input, int, into, is, join, kill, l, L, left, limit, load, long, ls, map, matches, MAX, MIN, mkdir, mv, not, null, onschema, or, order, outer, output, parallel, pig, PigDump, PigStorage, pwd, quit, register, returns, right, rm, rmf, rollup, run, sample, set, ship, SIZE, split, stderr, stdin, stdout, store, stream, SUM, TextLoader, TOKENIZE, through, tuple, union, using, void
 • 43. First steps
data = LOAD 'input' AS (query:CHARARRAY);
A = LOAD 'data' USING PigStorage('\t') AS (f1:int, f2:int, f3:int);
STORE A INTO '/tmp/result' USING PigStorage(';');
 • 45. Next steps - operations on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int, scholarship:float);
B = FILTER A BY age > 20;
 • 46. Next steps - operations on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int, scholarship:float);
B = FILTER A BY age > 20;
C = LIMIT B 5;
 • 47. Next steps - operations on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int, scholarship:float);
B = FILTER A BY age > 20;
C = LIMIT B 5;
D = FOREACH C GENERATE name, scholarship*semestre AS funds;
 • 48. Next steps - operations on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int, scholarship:float);
E = GROUP A BY age;
 • 49. Next steps - operations on data
A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, semestre:int, scholarship:float);
E = GROUP A BY age;
F = FOREACH E GENERATE group AS age, AVG(A.scholarship);
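For intuition, on a hypothetical student file the final relation F holds one tuple per age group, i.e. (age, average scholarship); the values below are made up:

DUMP F;
-- (20,750.0)
-- (21,980.0)
-- (23,1020.0)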
 • 51. Workshop - ssh server address: 52.5.251.228, https://www.notehub.org/veuk1
 • 52. Introduction to data analysis with Hive
 • 54. Unique Hive features: SQL queries over flat files, e.g. CSV
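A minimal sketch of that idea, assuming a comma-separated students file already sits in HDFS under the hypothetical path /data/students:

-- map an existing CSV directory onto a queryable table (no data is copied)
CREATE EXTERNAL TABLE student (name STRING, age INT, scholarship FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/students';

SELECT name, scholarship FROM student WHERE age > 20;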
 • 55. Unique Hive features: much faster analysis - no need to write MapReduce; the optimizer can execute parts of a query in memory instead of as MR jobs
 • 56. Unique Hive features: broad integration options - MongoDB, Elasticsearch, HBase
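As one concrete case, the HBase storage handler bundled with Hive maps an HBase table onto a Hive table; the table and column names below are hypothetical:

CREATE TABLE hbase_students (key INT, name STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:name")
TBLPROPERTIES ("hbase.table.name" = "students");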
 • 57. Unique Hive features: integration of BI and DWH tools with Hive via JDBC
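A minimal sketch of a JDBC client in Java, assuming a HiveServer2 instance listening on its default port 10000 and hive-jdbc on the classpath (the host name and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // register the HiveServer2 driver (automatic with JDBC 4)
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://namenode:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT foo FROM bar LIMIT 10");
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        rs.close();
        stmt.close();
        con.close();
    }
}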
 • 58. Hive CLI
Interactive mode:
hive
Batch mode:
hive -e 'select foo from bar'
hive -f '/path/to/my/script.q'
hive -f 'hdfs://namenode:port/path/to/my/script.q'
More options: hive --help
 • 59. Data types: INT, TINYINT, SMALLINT, BIGINT, BOOLEAN, DECIMAL, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP, DATE, CHAR, VARCHAR, and the complex types ARRAY, MAP, STRUCT, UNION
 • 60. Query syntax: SELECT, INSERT, UPDATE; GROUP BY; UNION; LEFT, RIGHT, FULL INNER, FULL OUTER JOIN; OVER, RANK; (NOT) IN, HAVING; (NOT) EXISTS
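To ground two of the less common constructs, a sketch that combines GROUP BY with a RANK() ... OVER window, on a hypothetical visits table:

-- count clicks per url, then rank urls by popularity
SELECT url, cnt, RANK() OVER (ORDER BY cnt DESC) AS popularity
FROM (SELECT url, COUNT(*) AS cnt FROM visits GROUP BY url) t;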
 • 61. Data Definition Language
• CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
• DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
• TRUNCATE TABLE
• ALTER DATABASE/SCHEMA, TABLE, VIEW
• MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
• SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
• DESCRIBE DATABASE/SCHEMA, table_name, view_name
 • 62. Tables
CREATE TABLE page_view(viewTime INT, userid BIGINT,
    page_url STRING, referrer_url STRING,
    ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
STORED AS TEXTFILE;
 • 63. First steps in Hive
CREATE TABLE tablename1 (foo INT, bar STRING) PARTITIONED BY (ds STRING);
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename1;
INSERT INTO TABLE tablename1 PARTITION (ds='2014') select_statement1 FROM from_statement;
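A follow-up sketch: once data sits in partitions, a predicate on the partition column lets Hive read only the matching directories instead of scanning the whole table:

-- touches only the ds='2014' partition of tablename1
SELECT foo, bar FROM tablename1 WHERE ds = '2014';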
 • 64. Other file formats? SerDe
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

CREATE TABLE apachelog (
    host STRING, identity STRING, user STRING, time STRING,
    request STRING, status STRING, size STRING,
    referer STRING, agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;
 • 65. Other file formats? SerDe
CREATE TABLE example_table (
    foo STRING,
    bar STRING)
STORED AS TEXTFILE; ← or SEQUENCEFILE, ORC, AVRO or PARQUET
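As a sketch, migrating an existing text table to a columnar format is usually a CREATE plus an INSERT ... SELECT (the table names below are hypothetical):

CREATE TABLE example_orc (foo STRING, bar STRING) STORED AS ORC;
INSERT INTO TABLE example_orc SELECT foo, bar FROM example_table;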
 • 66. Pros, cons, comparison
Hive: declarative | Pig: procedural
Hive: temporary tables | Pig: pipelines
Hive: rely on the optimizer | Pig: more control over the implementation
Hive: UDFs, TRANSFORM | Pig: UDFs, streaming
Hive: SQL drivers | Pig: data pipeline splits
 • 73. Want to know more? Our trainings allow for individual work with every participant
• we work in groups of 4-8 people
• the agenda can be tailored to the group's expectations
• we work through and answer participants' individual questions
• we have much more time :)
 • 74. A training dedicated to you
Are you an architect or a team leader?
• the comprehensive 5-day training covers and exercises the entire Hadoop ecosystem
• the dedicated training for architects is a discussion of designing Big Data systems
Are you an analyst?
• a dedicated training lets you practise Pig and Hive in detail and solve sample analytical problems
 • 75. A training dedicated to you
Are you a developer?
• the 3-day training covers in detail the advanced aspects of MapReduce programming in Java and the streaming approach
Interested in the whole Big Data picture?
• Big Data processing with Apache Spark
• NoSQL databases - Cassandra
• NoSQL databases - MongoDB
 • 76. Sources
• HikingArtist.com - drawings
• hortonworks.com - HDP architecture
• apache.org - Pig, Hive and Hadoop graphics