Big Data

Big Data: hype or necessity?
Dr. ir. ing. Bart Vandewoestyne
Sizing Servers Lab, Howest, Kortrijk

February 18, ...
An introduction to Big Data and related technologies, targeted towards an audience of local SME employees and managers.

  1. Big Data: hype or necessity?
     Dr. ir. ing. Bart Vandewoestyne
     Sizing Servers Lab, Howest, Kortrijk
     February 18, 2014

  2. Outline
     1. Introduction: Big Data?
     2. Big Data Technology: Hadoop; Pig, Hive; NoSQL
     3. Big Data in my company?
     4. Conclusions

  3. Outline (start of section 1: Introduction)
  4. Exponential growth of data
     [Figure: “Big Data: This is just the beginning”. Data volume in exabytes and percentage of uncertain data, 2010 to 2015, split into enterprise data, sensors & devices, VoIP and social media, with a “You are here” marker. © 2013 International Business Machines Corporation]
  5. Examples
     Facebook hosts ≈ 10 billion photos ≈ 1 petabyte.
     Large Hadron Collider: will produce ≈ 15 petabytes per year.

  6. Examples
     RFID readers
     Vehicle GPS traces
     Smart energy meters

  7. Examples from Flanders
     Companies: Myriade; Be-Mobile / Touring Mobilis; Colruyt.
     Use cases named on the slide: avoid traffic jams; find optimal planning.

  8. Big Data definition
     The definition of Big Data depends on who you ask:
     “Multiple terabytes or petabytes.” (according to some professionals)
     “I don’t know.” (today’s big may be tomorrow’s normal)
     “Relative to its context.”

  9. Quotes on Big Data
     “Big data” is a subjective label attached to situations in which human and technical infrastructures are unable to keep pace with a company’s data needs.
     It’s about recognizing that for some problems other storage solutions are better suited.

  10. The Three V’s
      Volume: the amount of data is big.
      Variety: different kinds of data (structured, semi-structured, unstructured).
      Velocity: speed issues to consider. How fast is the data available for analysis? How fast can we do something with it?
      Other V’s: Veracity, Variability, Validity, Value, ...

  11. Structured data
      Pre-defined schema imposed on the data.
      Highly structured.
      Usually stored in a relational database system.
      Examples: numbers (20, 3.1415, ...), strings ("Hello World"), dates (21/03/1978), ...
      Roughly 20% of all data out there is structured.

  12. Semi-structured data
      Inconsistent structure.
      Cannot be stored in rows and tables in a typical database.
      Information is often self-describing (label/value pairs).
      Examples: XML, SGML, ...; tweets; BibTeX files; sensor feeds; logs; ...

  13. Semi-structured data: examples
      Example
        <?xml version="1.0"?>
        <catalog>
          <book id="bk101">
            <author>Gambardella, Matthew</author>
            <title>XML Developer's Guide</title>
            <genre>Computer</genre>
            <price>44.95</price>
          </book>
        </catalog>
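
The "self-describing" property of the XML example can be made concrete with a short Python sketch. It is not part of the original slides; the function name `books_as_records` is made up for illustration, and the code simply reads each child tag as a label/value pair using the standard library parser.

```python
import xml.etree.ElementTree as ET

# The catalog from the slide, with a plain ASCII apostrophe in the title.
CATALOG_XML = """<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
  </book>
</catalog>"""

def books_as_records(xml_text):
    """Turn each <book> element into a dict of its label/value pairs."""
    root = ET.fromstring(xml_text)
    records = []
    for book in root.findall("book"):
        record = {"id": book.get("id")}
        for field in book:  # every child tag carries its own label
            record[field.tag] = field.text
        records.append(record)
    return records

records = books_as_records(CATALOG_XML)
print(records[0]["title"], records[0]["price"])
```

No schema is declared anywhere: a second book with extra or missing tags would still parse, which is the "inconsistent structure" the slides mention.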
  14. Semi-structured data: examples
      Example
        @article{vandewoestyne2007cools_b,
          author  = {Vandewoestyne, Bart and Cools, Ronald},
          title   = {On obtaining higher order convergence for smooth periodic functions},
          journal = {Journal of Complexity},
          year    = {2008},
          volume  = {24},
          number  = {3},
          pages   = {328--340},
          month   = jun
        }

  15. Unstructured data
      Definition (Unstructured data): lacks structure, or parts of it lack structure.
      Examples: multimedia (videos, photos, audio files, ...); word processing documents; email messages; reports; free-form text; presentations; ...
      Experts estimate that 80 to 90% of the data in any organization is unstructured.

  16. Data Storage and Analysis
      Storage capacity of hard drives has increased massively over the years. Access speeds have not kept up.
      Example (Reading a whole disk)
        Year   Storage capacity   Transfer speed   Time
        1990   1370 MB            4.4 MB/s         ≈ 5 minutes
        2010   1 TB               100 MB/s         > 2.5 hours
      Solution: work in parallel! Using 100 drives (each holding 1/100th of the data), reading 1 TB takes less than 2 minutes.
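
The figures on this slide can be checked with a few lines of arithmetic. The sketch below is an illustration only: the helper name `read_time_seconds` is hypothetical, and it assumes 1 TB = 10^6 MB and perfectly even distribution of the data over the drives.

```python
def read_time_seconds(capacity_mb, speed_mb_per_s, drives=1):
    """Time to read capacity_mb spread evenly over `drives` disks read in parallel."""
    return (capacity_mb / drives) / speed_mb_per_s

t_1990 = read_time_seconds(1370, 4.4)                        # 1990 disk
t_2010 = read_time_seconds(1_000_000, 100)                   # 2010 disk, 1 TB taken as 10^6 MB
t_parallel = read_time_seconds(1_000_000, 100, drives=100)   # same data on 100 drives

print(f"1990: {t_1990 / 60:.1f} min")
print(f"2010: {t_2010 / 3600:.1f} h")
print(f"2010, 100 drives: {t_parallel:.0f} s")
```

Capacity grew by a factor of roughly 730 while transfer speed grew by a factor of roughly 23, which is why the single-disk read time went up instead of down.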
  17. Working in parallel
      Problems
        1. Hardware failure?
        2. Combining data from different disks for analysis?
      Solutions
        1. HDFS: Hadoop Distributed Filesystem
        2. MapReduce: programming model

  18. Outline (start of section 2: Big Data Technology)

  19. Big Data Landscape [figure]

  20. Big Data Landscape [figure]

  21. Hadoop
      Hadoop is VMware, but the other way around.

  22. Hadoop as the opposite of a virtual machine
      VMware
        1. take one physical server
        2. split it up
        3. get many small virtual servers
      Hadoop
        1. take many physical servers
        2. merge them all together
        3. get one big, massive, virtual server

  23. Hadoop: core functionality
      HDFS: self-healing, high-bandwidth, clustered storage.
      MapReduce: distributed, fault-tolerant resource management, coupled with scalable data processing.

  24. HDFS architecture [figure]

  25. MapReduce [figure]

  26. MapReduce [figure]
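
The slides present MapReduce only as figures. As a rough illustration of the programming model, the sketch below runs the conventional word-count example in a single Python process. It is an assumption-laden toy: the function names are invented here, nothing is distributed or fault-tolerant, and it does not use any Hadoop API.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data is big", "data is data"])
print(counts)
```

In a real cluster the map calls would run on the nodes that hold the input blocks, and the shuffle would move data between machines; only the shape of the computation is shown here.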
  27. Hadoop: applications
      Example Hadoop stack [figure] → Hadoop distributions

  28. Example Hadoop distributions [figure]

  29. Hadoop vs RDBMS
      Relational Database Management Systems (RDBMS):
        some queries → msecs
        other queries → hours, days
        use when latency is important
        ACID transactions (banking, ...)
        100% SQL compliance
        Very fast to max speed!
        Unstructured data → BLOB :-(

  30. Hadoop vs RDBMS
      Hadoop:
        some queries → seconds, minutes
        other queries → seconds!!!
        Slower to (higher) max speed...
      Use when:
        throughput is important
        scalability of storage/compute is needed
        data is unstructured or semi-structured
        data processing is complex (NoSQL, Java, C, Python, ...)

  31. Apache Hadoop essentials: technology stack [figure]

  32. Pig
      MapReduce requires programmers, who must think in terms of map and reduce functions and will more than likely use the Java language.
      Pig provides a high-level language (Pig Latin) that can be used by analysts, data scientists, statisticians, etc.

  33. Pig Latin
      Originally from Yahoo! to allow analysts to access data.
      Dataflow language.
      Makes it simpler to write MapReduce programs.
      Abstracts you from specific details → focus on data processing.
      Has User Defined Functions (UDFs).
      Compiles a script into a set of MapReduce jobs.

  34. Pig example
      Input data: a file with user data and a file with website data.
      Your task: find the top 5 most visited pages by users aged 18-25.
      Dataflow: load users → filter by age; load pages; join on name → group on URL → count clicks → order by clicks → take top 5.

  35. In MapReduce
      ... 170 lines of Java MapReduce code ...

  36. In Pig Latin
      Example
        Users = load 'users' as (name, age);
        Fltrd = filter Users by age >= 18 and age <= 25;
        Pages = load 'pages' as (user, url);
        Jnd   = join Fltrd by name, Pages by user;
        Grpd  = group Jnd by url;
        Smmd  = foreach Grpd generate group, COUNT(Jnd) as clicks;
        Srtd  = order Smmd by clicks desc;
        Top5  = limit Srtd 5;
        store Top5 into 'top5sites';
      Only 9 lines of Pig Latin.
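
To make the dataflow of the Pig Latin script easier to follow, here is the same filter, join, group, count, order and limit sequence as a plain Python sketch. The sample data is hypothetical (the slides do not show the contents of either file), and the sketch assumes user names are unique, so the join can be done with a set lookup.

```python
from collections import Counter

# Hypothetical sample data standing in for the 'users' and 'pages' files.
users = [("alice", 22), ("bob", 31), ("carol", 19)]
pages = [("alice", "a.example"), ("carol", "a.example"),
         ("alice", "b.example"), ("bob", "c.example")]

def top_pages(users, pages, n=5, min_age=18, max_age=25):
    # filter Users by age >= 18 and age <= 25
    young = {name for name, age in users if min_age <= age <= max_age}
    # join on name, group by url, count clicks
    clicks = Counter(url for user, url in pages if user in young)
    # order by clicks desc, limit n
    return clicks.most_common(n)

print(top_pages(users, pages))
```

This runs in one process on in-memory lists; the point of Pig is that the equivalent script is compiled into MapReduce jobs that run across a cluster.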
  37. Hive
      Originated at Facebook to analyze log data.
      HiveQL: Hive Query Language, similar to standard SQL.
      Queries are compiled into MapReduce jobs.
      Has a command-line shell, similar to e.g. the MySQL shell.

  38. Hive: example
      Example (Create table to hold weather data)
        CREATE TABLE records (year STRING, temperature INT, quality INT)
        ROW FORMAT DELIMITED
        FIELDS TERMINATED BY '\t';
      Example (Populate Hive with the data)
        LOAD DATA LOCAL INPATH 'input/sample.txt'
        OVERWRITE INTO TABLE records;

  39. Hive: example
      Example (Run query)
        hive> SELECT year, MAX(temperature)
            > FROM records
            > WHERE temperature != 9999
            > AND (quality = 0 OR quality = 1)
            > GROUP BY year;
        1949    111
        1950    22
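
Because HiveQL is close to standard SQL, the query above can be tried out without a Hadoop cluster. The sketch below uses Python's built-in `sqlite3` module as a stand-in; it is not Hive and compiles nothing to MapReduce. The sample rows are hypothetical, chosen so that the result matches the output on the slide, and an `ORDER BY` is added only to make the row order deterministic.

```python
import sqlite3

# Hypothetical rows (year, temperature, quality); the slide does not list the input file.
rows = [("1950", 0, 1), ("1950", 22, 1), ("1950", -11, 1),
        ("1949", 111, 1), ("1949", 78, 1), ("1949", 9999, 1)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (year TEXT, temperature INTEGER, quality INTEGER)")
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

result = conn.execute(
    """SELECT year, MAX(temperature)
       FROM records
       WHERE temperature != 9999 AND (quality = 0 OR quality = 1)
       GROUP BY year
       ORDER BY year"""
).fetchall()
print(result)
conn.close()
```

The value 9999 is filtered out by the query, which suggests it marks a missing reading in the source data.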
  40. NoSQL [figure]

  41. RDBMS: Codd’s 12 rules
      A set of rules designed to define what is required from a database management system in order for it to be considered relational.
        Rule 0: The Foundation rule
        Rule 1: The Information rule
        Rule 2: The guaranteed access rule
        Rule 3: Systematic treatment of null values
        Rule 4: Active online catalog based on the relational model
        ...

  42. ACID
      A set of properties that guarantee that database transactions are processed reliably.
        Atomicity: a transaction is all or nothing.
        Consistency: only transactions with valid data.
        Isolation: simultaneous transactions will not interfere.
        Durability: written transaction data stays there “forever” (even in case of power loss, crashes, errors, ...).
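
Atomicity and consistency are easiest to see in the banking setting the slides mention elsewhere. The following sketch uses Python's built-in `sqlite3` module; the `accounts` table, the names and the amounts are all hypothetical. A `CHECK` constraint forbids negative balances, and the credit is deliberately applied before the debit so that a failed debit must undo the credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would make alice's balance negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

After the failed transfer, bob's balance does not include the 500 that was briefly credited: the whole transaction was rolled back.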
  43. Scaling up
      What if you need to scale up your RDBMS in terms of dataset size or read/write concurrency?
      This usually involves
        breaking Codd’s rules,
        loosening ACID restrictions,
        forgetting conventional DBA wisdom,
        losing most of the desirable properties that made RDBMS so convenient in the first place.
      NoSQL to the rescue!

  44. NoSQL
      ‘Invented’ by Carlo Strozzi in 1998 (for his file-based database).
      “Not only SQL”
      It’s NOT about
        saying that SQL should never be used,
        saying that SQL is dead.

  45. NoSQL databases
      Four emerging NoSQL categories: [figure]

  46. Key-Value stores or ‘the big hash table’
        Keys   Values
        13a1   Nexus 32 GB
        13a2   Nexus 16 GB
        13a3   Nexus 08 GB
      Most basic type of NoSQL database.
      Aggregation of key-value pairs.
      Typically only 4 operations: create(key, value), read(key), update(key, value), delete(key).
      Fast, scalable, less complex.
      Mainly used for systems with simple queries (caches etc.).
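
The four operations listed on the slide map directly onto a hash table. The class below is a minimal in-memory sketch, not the API of any particular key-value product; the class name and the choice to raise `KeyError` on misuse are assumptions made for this illustration.

```python
class KeyValueStore:
    """In-memory key-value store exposing only the four operations on the slide."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value

    def read(self, key):
        return self._data[key]  # raises KeyError if the key is absent

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(f"no such key: {key!r}")
        self._data[key] = value

    def delete(self, key):
        del self._data[key]

store = KeyValueStore()
store.create("13a1", "Nexus 32 GB")
store.create("13a2", "Nexus 16 GB")
store.update("13a2", "Nexus 08 GB")
store.delete("13a1")
print(store.read("13a2"))
```

Note what is missing: there is no way to ask for "all Nexus models above 16 GB" without scanning every value, which is why such stores suit simple lookups and caches.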
  47. Key-Value stores or ‘the big hash table’ [figure]

  48. Column-oriented DBMS
      Example
        Id   LastName   FirstName   Salary
        10   Smith      Joe         40000
        12   Jones      Mary        50000
        11   Johnson    Cathy       44000
        22   Jones      Bob         55000
      Row-based:
        10,Smith,Joe,40000;12,Jones,Mary,50000;11,Johnson,Cathy,44000;22,Jones,Bob,55000
      Column-based:
        10,12,11,22;Smith,Jones,Johnson,Jones;Joe,Mary,Cathy,Bob;40000,50000,44000,55000
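
The two layouts on this slide differ only in which dimension is written out first. The sketch below reproduces both serializations from the same table in Python; the helper name `serialize` is made up for this illustration.

```python
columns = ["Id", "LastName", "FirstName", "Salary"]
rows = [
    (10, "Smith", "Joe", 40000),
    (12, "Jones", "Mary", 50000),
    (11, "Johnson", "Cathy", 44000),
    (22, "Jones", "Bob", 55000),
]

def serialize(records):
    """Join the values of each record with ',' and the records with ';'."""
    return ";".join(",".join(str(v) for v in record) for record in records)

row_based = serialize(rows)
column_based = serialize(zip(*rows))  # zip(*rows) transposes rows into columns

# A column store can answer a single-column aggregate from one contiguous run of values.
salaries = list(zip(*rows))[columns.index("Salary")]
print(row_based)
print(column_based)
print(sum(salaries))
```

Summing the salaries touches one contiguous block in the column layout, whereas the row layout forces a reader to skip over names and ids for every record.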
  49. Column family based databases
      Like column-oriented DBMS, but with a twist:
        Columns and supercolumns ≈ RDBMS table columns
        Family of columns ≈ RDBMS table
        Keyspace ≈ RDBMS database

  50. Column family based databases
      Most complex NoSQL database type.
      Based on Google’s BigTable paper.
      More flexibility than traditional RDBMS: adding (super)columns is always possible.
      Excellent for analysis and mass treatment of data (via MapReduce-type operations).

  51. Document databases
      Data is stored as a collection of documents (JSON, XML, ... but also PDF, Excel, ...).
      Documents → collection of key-value pairs.
      Values can be simple values, arrays, or another document (collection of key-values).
      Schemaless.
      Quite well queryable.

  52. Document databases
      Example (Document 1)
        {
          FirstName: "Bob",
          Address: "5 Oak St.",
          Hobby: "sailing"
        }
      Example (Document 2)
        {
          FirstName: "Jonathan",
          Address: "15 Wanamassa Road",
          Children: [
            {Name: "Michael", Age: 10},
            {Name: "Jennifer", Age: 8},
            {Name: "Samantha", Age: 5},
            {Name: "Elena", Age: 2}
          ]
        }
      Best suited for custom queries like the ones in RDBMS.
      Quite popular for Content Management Systems.
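
The two documents above share no schema: one has a `Hobby`, the other has `Children`. A query over such a collection has to cope with missing fields. The Python sketch below models the documents as dictionaries and runs one hypothetical query; the function name is invented here and no real document database API is used.

```python
documents = [
    {"FirstName": "Bob", "Address": "5 Oak St.", "Hobby": "sailing"},
    {"FirstName": "Jonathan", "Address": "15 Wanamassa Road",
     "Children": [
         {"Name": "Michael", "Age": 10},
         {"Name": "Jennifer", "Age": 8},
         {"Name": "Samantha", "Age": 5},
         {"Name": "Elena", "Age": 2},
     ]},
]

def parents_with_child_younger_than(docs, age):
    """Return FirstName of every document that has a child below `age`."""
    return [
        doc["FirstName"]
        for doc in docs
        # .get() handles documents that have no "Children" field at all
        if any(child["Age"] < age for child in doc.get("Children", []))
    ]

print(parents_with_child_younger_than(documents, 3))
```

In a relational design the children would live in a separate table joined by a foreign key; here they are nested inside the parent document.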
  53. Document databases: examples [figure]

  54. Graph databases
      [Figure: example graph with nodes Julie, Steve, Bob, Jim, Fido, Rock Music, BMW and IBM, connected by edges labelled Sister-in-Law To, Married To, Listens To, Brother Of, Colleague Of, Has Pet, Drives and Works For]
      Based on graph theory.
      Employ nodes (objects) and edges (relations between objects).
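
A graph of nodes and labelled edges can be sketched as a list of (source, relation, target) triples. The edges below reuse node and relation names that appear in the slide's figure, but which node connects to which is an assumption, since the original drawing cannot be recovered from the text. The helper names are also invented for this illustration.

```python
# Hypothetical edges; the endpoints of each relation are assumptions.
edges = [
    ("Bob", "Colleague Of", "Jim"),
    ("Bob", "Has Pet", "Fido"),
    ("Bob", "Drives", "BMW"),
    ("Bob", "Works For", "IBM"),
    ("Jim", "Works For", "IBM"),
    ("Steve", "Listens To", "Rock Music"),
    ("Julie", "Listens To", "Rock Music"),
]

def neighbours(edges, node, relation):
    """Targets reachable from `node` over edges labelled `relation`."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

def sources(edges, relation, target):
    """Nodes that have a `relation` edge pointing at `target`."""
    return [src for src, rel, dst in edges if rel == relation and dst == target]

print(neighbours(edges, "Bob", "Colleague Of"))
print(sources(edges, "Works For", "IBM"))
```

A real graph database indexes these relations so that traversals stay fast on large networks; the linear scans here are only meant to show the data model.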
  55. Graph databases: examples
      Well-suited for problems with network structure:
        mine data from social media
        “customers who bought this also looked at...”
        relations between persons
        ...

  56. Use the right tool for the right job!
      http://db-engines.com/

  57. Outline (start of section 3: Big Data in my company?)

  58. Typical RDBMS scaling story
      1. Initial public launch
         From local workstation → remotely hosted MySQL instance.
      2. Service popularity ↑, too many reads hitting the database
         Add memcached to cache common queries. Reads are now no longer strictly ACID; cached data must expire.
      3. Popularity ↑↑, too many writes hitting the database
         Scale MySQL vertically by buying a beefed-up server (costly): 16 cores, 128 GB of RAM, banks of 15k RPM hard drives.
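
Step 2 says that cached data must expire and that reads stop being strictly ACID. The sketch below shows why, using a tiny read-through cache with a time-to-live. It is a stand-in written for this illustration, not the memcached API; the class name, the fake clock and the fake database lookup are all hypothetical.

```python
import time

class TTLCache:
    """Tiny read-through cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get(self, key, load):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                  # cache hit, possibly stale data
        value = load(key)                    # cache miss: ask the database
        self._entries[key] = (now + self.ttl, value)
        return value

# Usage with a fake clock and a fake database lookup.
fake_now = [0.0]
db_calls = []

def query_db(key):
    db_calls.append(key)
    return f"row for {key}"

cache = TTLCache(ttl=60, clock=lambda: fake_now[0])
cache.get("user:1", query_db)   # miss, hits the database
cache.get("user:1", query_db)   # hit, served from the cache
fake_now[0] = 61.0
cache.get("user:1", query_db)   # expired, hits the database again
print(len(db_calls))
```

Between a write to the database and the expiry of the cached entry, readers can see the old value, which is the loss of strict consistency the slide refers to.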
  59. Typical RDBMS scaling story
      4. New features → query complexity ↑, now too many joins
         Denormalize your data to reduce joins. (That’s not what they taught me in DBA school!)
      5. Rising popularity swamps the server; things are too slow
         Stop doing any server-side computations.

  60. Typical RDBMS scaling story
      6. Some queries are still too slow
         Periodically prematerialize the most complex queries, and try to stop joining in most cases.
      7. Reads are OK, writes are getting slower and slower...
         Drop secondary indexes and triggers (no indexes?).
      If you stay up at night worrying about your database (uptime, scale, or speed), you should seriously consider making a jump from the RDBMS world to HBase.

  61. Two types of companies (personal view)
      ‘Core Big Data’ company
      Core business = big data processing, crunching, analyzing, ...
      Examples: Google, Facebook, ...; smart metering companies; video/image processing companies; biotech companies with sequencing data; ...

  62. Two types of companies (personal view)
      ‘General Big Data’ company
      Some other core business.
      Lots of useful data is available.
      Desirable: business analytics, process optimization, ...
      Examples: supermarkets → customer cards; transport firms → GPS traces; ...

  63. Use-cases of Big Data
      ‘Core Big Data’ company: Big Data crunching, hacking, processing, analyzing, ...
      ‘General Big Data’ company: Business Analytics to improve decision-making, gain operational insights, increase overall performance, track and analyze shopping patterns, ...
      Both: Explore! Discover hidden gems!

  64. Some examples
      Intrusion detection based on server log data
      Real-time security analytics
      Fraud detection
      Customer behavior based sentiment analysis of social media
      Campaign analytics

  65. Some examples
      How to predict wine quality? Skip tasting! Use science! Weather seems the key variable. Correlate historical weather & wine data.
      Reduce fuel cost and improve driver safety by analyzing geolocation data.

  66. Big Data in your company
      Big data is typically a division of the IT department.
      Requires skilled people: sysadmins, software developers, data scientists, visualization experts, ...
      Advice, trend (Andrew McAfee): give geeks a seat at the decision-making table.

  67. Big Data in your company [figure]

  68. IWT TETRA project
      Our current mission: IWT TETRA project
      Submission deadline: March 12, 2014
      Three pillars
        New to Big Data Tech? → Explain, Advise and Help
        Already using Big Data Tech? → Benchmark and Tune
        Got Data? → Analyse and Visualize
      Interested? → Come talk to us!

  69. Outline (start of section 4: Conclusions)

  70. Conclusions
      “Big” can be small too.
      The Big Data landscape is huge.
      RDBMS and SQL are not dead.
      The right tool for the right job!
      Your company can benefit from Big Data technology.
      We can help.
      Be brave in your quest...

  71. Questions?
      johan@sizingservers.be
      bart@sizingservers.be
