What Is Big Data? What Is Hadoop? What Is MapReduce? Big Data.

Big Data
Zekeriya Beşiroğlu
http://zekeriyabesiroglu.com
http://bilginc.com
http://twitter.com/zbesiroglu

Zekeriya Besiroglu
• Bilginc IT Academy - Expert Consultant
• 16+ years in IT
• 14+ years with Oracle DB/DWH
• 7+ years with WebLogic
• 3+ years with Big Data
• TROUG
• Speaker

DATA TRENDS
- Facebook has a data warehouse of around 60 PB, and it is constantly growing.
- Twitter messages are 140 bytes each, generating 8 TB of data per day.
- Data is more than doubling every year.
- Almost 80% of data will be unstructured.
- Amazon: 35% of product sales come from product recommendations.

New Types of Data?
• Sentiment: understand how your customers feel about your products and company
• Sensor/Machine: discover patterns in data streaming automatically from sensors and machines
• Unstructured: text, video, pictures
• Server logs: search logs to find patterns
• Geographic: analyze location-based data
• Clickstream: capture and analyze website visitors' data

Big Data
https://www.youtube.com/watch?v=1GU4Imbo6R8

Capacity vs Cost
Year    Capacity (GB)    Cost per GB (USD)
1990    0.10             $4,000
1997    2                $150
2002    80               $3.75
2007    750              $0.35
2012    3,000            $0.05
2015    10,000           $0.02
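
To make the trend in the table concrete, the short Python sketch below computes what storing one terabyte would cost at each year's price per GB. The prices are the ones from the table; the 1 TB target (taken as 1,000 GB) and the function name cost_to_store are assumptions chosen for illustration.

```python
# Cost per GB in USD by year, copied from the Capacity vs Cost table.
COST_PER_GB = {
    1990: 4000.0,
    1997: 150.0,
    2002: 3.75,
    2007: 0.35,
    2012: 0.05,
    2015: 0.02,
}


def cost_to_store(gigabytes: float, year: int) -> float:
    """Return the cost in USD of storing `gigabytes` at the given year's price."""
    return gigabytes * COST_PER_GB[year]


if __name__ == "__main__":
    for year in sorted(COST_PER_GB):
        # 1 TB is taken as 1,000 GB (an illustrative assumption).
        print(f"{year}: ${cost_to_store(1000, year):,.2f} per TB")
```

At the table's prices, the same terabyte goes from millions of dollars in 1990 to a few tens of dollars in 2015, which is why keeping all the data became affordable.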

What is Big Data
• Big Data is when the volume, velocity, and variety of data reach the point where it is too difficult or too expensive for traditional systems to work with.

Traditional Large-Scale Computing System Problems
• Computation has been processor-bound
• Relatively small amounts of data
• Complex processing
• Need for bigger computers
• More memory, more and faster processors

Better Solution
• Distributed systems: multiple machines work on a single job
Problems of Distributed Systems
• Data is stored in a central location
• Data is copied to the processors at runtime

Today
• Total data size: petabytes
• Daily data: terabytes
We need a new solution: HADOOP

HADOOP
• Data is distributed when it is stored
SPARK
• Data is distributed in memory

Hadoop
• Hadoop consists of two components
  - HDFS
  - MapReduce
• Hadoop ecosystem
  - Pig, Hive, HBase, Flume, Oozie, Sqoop, etc.

Traditional ETL
Source Layer: Structured Data
Flow: Source Layer -> ETL/ELT -> DWH -> ETL/ELT -> Data Mart
Hadoop ETL
Source Layer: Structured Data, Unstructured Data
Components: HADOOP, DWH, Data Mart

HDFS
• Hadoop Distributed File System: stores the data
• Data is split into blocks: 64 MB…
• Each block is replicated, e.g. 3 times; the replicas are stored on different nodes
• Based on the Google File System
• Runs on top of a native file system such as ext3, ext4, or xfs
• No random writes allowed; prefers large streaming reads
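
The block and replication numbers above can be turned into a quick back-of-the-envelope calculation. The Python sketch below is only an illustration: the 64 MB block size and the replication factor of 3 come from the slide, while the function names and the round-robin placement are assumptions (real HDFS placement is rack-aware and decided by the NameNode).

```python
import math

BLOCK_SIZE_MB = 64  # block size mentioned on the slide
REPLICATION = 3     # example replication factor from the slide


def block_count(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of blocks needed for a file; the last block may be partly filled."""
    return math.ceil(file_size_mb / block_size_mb)


def raw_storage_mb(file_size_mb: float, replication: int = REPLICATION) -> float:
    """Total disk space used across the cluster once every block is replicated."""
    return file_size_mb * replication


def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION) -> dict:
    """Toy round-robin placement: each block's replicas land on distinct nodes."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    size_mb = 200
    blocks = block_count(size_mb)
    print(blocks, "blocks,", raw_storage_mb(size_mb), "MB of raw storage")
    print(place_replicas(blocks, ["node1", "node2", "node3", "node4"]))
```

Because every block lives on several nodes, losing one machine does not lose the data, at the price of using several times the file's size in raw disk space.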

HDFS
• hadoop fs -ls (user home directory)
• hadoop fs -ls / (root directory)
• hadoop fs -cat /user/zekeriya/deneme.txt
• hadoop fs -mkdir
• hadoop fs -rm -r veri

MapReduce
• Processes data in the Hadoop cluster
• Two stages: MAP and REDUCE

MAPREDUCE
map(String input_key, String input_value)
    // emit a count of 1 for every word in the input line
    foreach word w in input_value:
        emit(w, 1)
reduce(String output_key, Iterator<int> intermediate_vals)
    // sum all the counts collected for one word
    set count = 0
    foreach v in intermediate_vals:
        count += v
    emit(output_key, count)
Sample input:
(1000, 'Galatasaray sampiyon olur')
(2000, 'beşiktas sampiyon olur')
(2200, 'Galatasaray Türkiyedir')

MAPREDUCE
Mapper output
('Galatasaray', 1), ('sampiyon', 1), ('olur', 1), ('beşiktas', 1),
('sampiyon', 1), ('olur', 1), ('Galatasaray', 1), ('Türkiyedir', 1)
Intermediate data sent to the Reducer
('Galatasaray', [1, 1])
('sampiyon', [1, 1])
('olur', [1, 1])
('beşiktas', [1])
('Türkiyedir', [1])
Final output of the Reducer
('Galatasaray', 2)
('sampiyon', 2)
('olur', 2)
('beşiktas', 1)
('Türkiyedir', 1)
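
The word-count walkthrough above can be reproduced on a single machine. The following Python sketch is a minimal simulation of the map, shuffle, and reduce stages, not Hadoop's Java API; the function names map_fn, shuffle, reduce_fn, and word_count are assumptions made for this example. Note that 'olur' occurs in two of the three input lines, so its count is 2.

```python
from collections import defaultdict


def map_fn(input_key, input_value):
    """Map stage: emit (word, 1) for every word in one input line."""
    for word in input_value.split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle stage: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_fn(output_key, intermediate_vals):
    """Reduce stage: sum the counts collected for one word."""
    return (output_key, sum(intermediate_vals))


def word_count(records):
    """Run map, shuffle, and reduce over (key, line) records."""
    mapped = [pair for key, value in records for pair in map_fn(key, value)]
    grouped = shuffle(mapped)
    return dict(reduce_fn(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Same sample input as on the slide.
    records = [
        (1000, "Galatasaray sampiyon olur"),
        (2000, "beşiktas sampiyon olur"),
        (2200, "Galatasaray Türkiyedir"),
    ]
    print(word_count(records))
```

In a real cluster the map calls run in parallel on the nodes that hold the data blocks, and the shuffle moves each key's values to the reducer responsible for that key.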

Hadoop Ecosystem
• HIVE
  - SQL-like query language
  - Users query data in the Hadoop cluster without knowing Java or MapReduce
• PIG
  - Uses a dataflow scripting language
• IMPALA
  - Open source project created by Cloudera
  - Query language very similar to HiveQL; returns results much faster

Hadoop Ecosystem
• FLUME
  - Imports data into HDFS as it is generated
  - Example: log files from a web server
• SQOOP
  - Imports data from tables in an OLTP database into HDFS
  - Populates database tables from files in HDFS
• OOZIE
  - Developers create a workflow of MapReduce jobs

Hadoop Ecosystem
• HBASE
  - The Hadoop database
  - NoSQL datastore
  - Huge data store: GB, TB, PB
  - Query operations: get/put/scan
  - Read/write throughput: millions of queries per second, versus thousands of queries per second for an RDBMS
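
To give a feel for the get/put/scan access pattern, here is a toy in-memory store in Python. It is not the HBase client API: the class name ToyRowStore and its methods are hypothetical, and the sketch only mimics the idea of rows kept sorted by row key, with a scan over a key range whose start is inclusive and whose stop is exclusive.

```python
import bisect


class ToyRowStore:
    """In-memory stand-in for a table with sorted row keys (not the HBase API)."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> stored value

    def put(self, row_key, value):
        """Insert or overwrite the value stored under row_key."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = value

    def get(self, row_key):
        """Return the value for row_key, or None if the row does not exist."""
        return self._rows.get(row_key)

    def scan(self, start_key, stop_key):
        """Yield (key, value) pairs with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            yield key, self._rows[key]


if __name__ == "__main__":
    store = ToyRowStore()
    store.put("user3", {"name": "Can"})
    store.put("user1", {"name": "Ali"})
    store.put("user2", {"name": "Ayse"})
    print(store.get("user1"))
    print(list(store.scan("user1", "user3")))
```

There is no join or ad hoc query here: everything goes through the row key, which is the trade-off that makes this style of store easy to spread across many machines.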

Big Data
• Finance: fraud detection, customer risk analysis
• Retail: product recommendations, purchases and discounts
• Advertising: more effective web ads
• Defense
• Telco
• Healthcare

Analyzing Twitter Data
• https://github.com/cloudera/cdh-twitter-example

Career Path
• Develop with Hadoop
• Hadoop Administration
• Hadoop for Data Scientists & Analysts

Zekeriya Beşiroğlu
http://zekeriyabesiroglu.com
http://twitter.com/zbesiroglu
http://bilginc.com
http://troug.org
Email: zekeriyab@bilginc.com, zekeriyabesiroglu@gmail.com
