Introduction to Big Data

Dr. Putchong Uthayopas
Department of Computer Engineering,
Faculty of Engineering, Kasetsart University
Email: putchong@ku.th

We are living in the world of data

Geophysical Exploration
Medical Imaging
Video Surveillance
Mobile Sensors
Gene Sequencing
Smart Grids
Social Media

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Source: Gartner Inc.

Why Big Data?

• Improve products and services
• Increase customer satisfaction/behavior
• Improve operational efficiency
• Understand emerging market trends

The real value of big data is in the insights it produces when analyzed: discovered patterns, derived meaning, indicators for decisions, and ultimately the ability to respond to the world with greater intelligence.

Know thy self, know thy enemy. A thousand battles, a thousand victories.

http://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/big-data-cloud-technologies-brief.pdf

Source: The Field Guide to Data Science

Big Data vs. Business Intelligence vs. Analytics

• BI software and technology
  – Well-structured data from a warehouse
  – Visual representation of data to gain insight into the data
  – Some predictive capability such as statistical analysis and data mining
• Big Data
  – Focus on analysis of huge and unstructured data sets to gain insight automatically

Properties of Big Data

Big Data: Volume, Velocity, Variety

Volume

• Big data must be huge
  – Beyond the capability of a single computer server to process it
  – Possible to store the data but difficult to process it

Velocity

• Big data accumulates at a very fast speed
  – Stock market data
  – Internet access logs
  – Social media data
    • Twitter, Facebook, IG
• We need to
  – Extract meaning as fast and as much as we can before throwing away the data
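
The idea of extracting meaning before the data is thrown away can be sketched as a one-pass summary over a stream. This is only an illustration: the `RunningStats` class, the `summarize` helper and the price ticks are hypothetical and not part of the original slides.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass summary of a stream; individual readings are not kept."""
    count: int = 0
    mean: float = 0.0
    maximum: float = float("-inf")

    def update(self, value: float) -> None:
        self.count += 1
        # Incremental mean: shift the old mean toward the new value.
        self.mean += (value - self.mean) / self.count
        self.maximum = max(self.maximum, value)


def summarize(stream):
    """Consume the stream once, keeping only the running summary."""
    stats = RunningStats()
    for value in stream:
        stats.update(value)  # the value can be discarded after this call
    return stats


# Hypothetical price ticks, consumed lazily one at a time via a generator.
ticks = (price for price in [10.0, 12.0, 11.0, 15.0])
stats = summarize(ticks)
print(stats.count, stats.mean, stats.maximum)
```

Memory use stays constant no matter how long the stream runs, which is the point when data arrives faster than it can be stored.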

Variety

• Data come in many varieties
  – Traditional databases
  – Documents
  – Web pages
  – Social media data
  – Images
  – Video/Audio
  – Location

Diya Soubra, The 3Vs that define Big Data, 2012
http://www.datasciencecentral.com/forum/topics/the-3vs-that-define-big-data

Considerations for Applying Big Data

http://fredericgonzalo.com/en/2013/07/07/big-data-in-tourism-hospitality-4-key-components/

BRIEF OVERVIEW OF BIG DATA TOOLS

Big Data Ecosystem

Reference: http://dataconomy.com/understanding-big-data-ecosystem/

Big Data Ecosystem: Infrastructure

• Hadoop
  – Technologies designed for storing, processing and analysing data by breaking up and distributing the data into parts and analysing those parts concurrently, rather than tackling one monolithic block of data all in one go.
• NoSQL
  – Stands for Not Only SQL
  – Involved in processing large volumes of multi-structured data. Most NoSQL databases are most adept at handling discrete data stored among multi-structured data.
• Massively Parallel Processing (MPP) Databases
  – MPP databases work by segmenting data across multiple nodes and processing these segments in parallel, and they use SQL.

Reference:

Big Data Ecosystem: Analytics

• Analytics Platforms
  – Integrate and analyse data to uncover new insights, and help companies make better-informed decisions.
• Visualization Platforms
  – Visualizing data: taking the raw data and presenting it in complex, multi-dimensional visual formats to illuminate the information
• Business Intelligence (BI) Platforms
  – Analyze data from multiple sources to deliver services such as business intelligence reports, dashboards and visualizations
• Machine Learning
  – In machine learning, the input is data the algorithm ‘learns from’, and the output depends on the use case. One of the most famous examples is IBM’s supercomputer Watson, which has ‘learned’ to scan vast amounts of information to find specific answers, and can comb through 200 million pages of structured and unstructured data in minutes.

Reference:

How can we store and process massive data?

• Beyond the capability of a single server
• Basic infrastructure
  – Cluster of servers
  – High-speed interconnect
  – High-speed storage cluster
• Incoming data will be spread across the server farm
• Processing is quickly distributed to the farm
• The result is collected and sent back
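
The spread, process and collect steps above can be sketched on one machine, with threads standing in for servers. This is a toy illustration under that assumption; `partition`, `process_chunk` and `run_on_farm` are hypothetical names, and a real cluster would move data over the network rather than between threads.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_workers):
    """Spread incoming records across n_workers chunks (round-robin)."""
    return [records[i::n_workers] for i in range(n_workers)]


def process_chunk(chunk):
    """Work done independently on one 'server': here, a partial sum."""
    return sum(chunk)


def run_on_farm(records, n_workers=4):
    chunks = partition(records, n_workers)
    # Threads stand in for the servers of the farm in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    # Collect the partial results and combine them into one answer.
    return sum(partial_results)


total = run_on_farm(list(range(1, 101)))
print(total)
```

The same three-step shape (partition, process independently, combine) underlies the Hadoop and MPP approaches described in this deck.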

NoSQL (Not Only SQL)

• A NoSQL (often interpreted as Not only SQL) database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
  – Non-relational, distributed, open-source and horizontally scalable
  – Used to handle a huge amount of data
  – The original intention has been modern web-scale databases.

Reference: http://nosql-database.org/

• MongoDB is a general-purpose, open-source database.
• MongoDB features:
  – Document data model with dynamic schemas
  – Full, flexible index support and rich queries
  – Auto-sharding for horizontal scalability
  – Built-in replication for high availability
  – Text search
  – Advanced security
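
A document data model with dynamic schemas can be pictured with plain Python dictionaries. The sketch below does not use MongoDB or its API. The `products` collection, the `find` helper and the modulo-based `shard_for` rule are hypothetical stand-ins chosen only to show the ideas of schema-free documents and key-based sharding.

```python
# Hypothetical "collection": documents need not share the same fields.
products = [
    {"_id": 1, "name": "T-shirt", "price": 250, "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "E-book", "price": 99, "format": "PDF"},
    {"_id": 3, "name": "Mug", "price": 120, "tags": ["kitchen", "gift"]},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


def shard_for(doc_id, n_shards):
    """Pick a shard by key modulo; a simple stand-in for a sharding rule."""
    return doc_id % n_shards


print(find(products, format="PDF"))
print({doc["_id"]: shard_for(doc["_id"], 2) for doc in products})
```

Because each document carries its own fields, adding a new attribute to one product needs no schema change for the others.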

• Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
• The base Apache Hadoop framework is composed of the following modules:
  – Hadoop Common: contains libraries and utilities needed by other Hadoop modules;
  – Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster;
  – Hadoop YARN: a resource-management platform responsible for managing compute resources in clusters and using them for scheduling of users' applications; and
  – Hadoop MapReduce: a programming model for large-scale data processing.
• Hadoop was created by Doug Cutting and Mike Cafarella in 2005. Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant.

Magic behind Hadoop and HDFS

• The problem is divided into two phases
  – Map: apply some action to the data as <key, value> pairs and get some intermediate results
  – Reduce: summarize the intermediate <key, value> results and return them to the main program

Ricky Ho, How Hadoop Map/Reduce works,
http://architects.dzone.com/articles/how-hadoop-mapreduce-works

Example: Word count

• Counting words in an input text file.
  – How many times does the word “love” appear in a novel? ^_^
• In the map phase the sentence is split into words, which form the initial key-value pairs <word, 1>
  – “tring tring the phone rings” becomes <tring,1>, <tring,1>, <the,1>, <phone,1>, <rings,1>
• In the reduce phase the keys are grouped together and the values for identical keys are added.
  – There is only one pair of identical keys, ‘tring’. The values for these keys are added, so the output key-value pairs are
  – <tring,2>, <the,1>, <phone,1>, <rings,1>
• Reduce forms an aggregation phase for keys
  – This gives the number of occurrences of each word in the input.

http://kickstarthadoop.blogspot.com/2011/04/word-count-hadoop-map-reduce-example.html
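
The word count walk-through above can be mimicked in a few lines of plain Python. This is a single-process sketch of the map, group and reduce steps, not Hadoop code. The function names `map_phase`, `shuffle` and `reduce_phase` are chosen here for illustration.

```python
from collections import defaultdict


def map_phase(sentence):
    """Emit an initial <word, 1> pair for every word in the sentence."""
    return [(word, 1) for word in sentence.split()]


def shuffle(pairs):
    """Group values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Add up the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


pairs = map_phase("tring tring the phone rings")
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'tring': 2, 'the': 1, 'phone': 1, 'rings': 1}
```

On a real cluster the map calls run on many machines at once and the grouping step moves pairs across the network, but the logic per key is the same.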

In-memory Database

• An in-memory database is
  – a database management system that primarily relies on main memory for computer data storage.
  – faster than disk-optimized databases, since the internal optimization algorithms are simpler and execute fewer CPU instructions.
  – Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than disk.

Source: http://en.wikipedia.org/wiki/In-memory_database
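
As a small, hedged illustration of the concept, Python's standard `sqlite3` module can open a database that lives only in RAM by connecting to `":memory:"`. SQLite is used here only because it ships with Python; it is not one of the products discussed in these slides, and the table and rows are made up.

```python
import sqlite3

# ":memory:" asks SQLite for a database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE access_log (visitor TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO access_log VALUES (?, ?)",
    [("ann", "/home"), ("bob", "/home"), ("ann", "/cart")],  # made-up rows
)

# Count visits per page; no disk file is read or written.
rows = conn.execute(
    "SELECT page, COUNT(*) FROM access_log GROUP BY page ORDER BY page"
).fetchall()
print(rows)
conn.close()
```

The data disappears when the connection closes, which is the usual trade-off of keeping everything in main memory without a persistence layer.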

What is Spark?

Fast and expressive cluster computing engine compatible with Apache Hadoop

Efficient
• General execution graphs
• In-memory storage
• Up to 10× faster on disk, 100× in memory

Usable
• Rich APIs in Java, Scala, Python
• Interactive shell
• 2-5× less code

Spark at Yahoo

• Two Spark projects: one for personalizing news pages for Web visitors and another for running analytics for advertising. For news personalization, the company uses ML algorithms running on Spark to figure out what individual users are interested in, and also to categorize news stories as they arise to figure out what types of users would be interested in reading them.
  – Wrote a Spark ML algorithm in 120 lines of Scala. (Previously, its ML algorithm for news personalization was written in 15,000 lines of C++.)
  – With just 30 minutes of training on a large, hundred-million-record data set, the Scala ML algorithm was ready for business.
• The second use case shows off the interactive capability of Hive on Spark (Shark).
  – Use existing BI tools to view and query their advertising analytic data collected in Hadoop.

http://www.datanami.com/2014/03/06/apache_spark_3_real-world_use_cases/

Big Data Goes to Cloud

• Data is already on the cloud
  – Virtual organization
  – Cloud-based SaaS service
• Big Data as a Service on the cloud
  – Private cloud
  – Public cloud

Amazon

• Amazon EC2
  – Computation service using VMs
• Amazon DynamoDB
  – Large, scalable NoSQL database
  – Fully distributed shared-nothing architecture
• Amazon Elastic MapReduce (Amazon EMR)
  – Hadoop-based analysis engine
  – Can be used to analyse big data without the need to build the infrastructure

http://aws.amazon.com/big-data/

Google Cloud Platform

• App Engine
  – Mobile and web apps
• Cloud SQL
  – MySQL on the cloud
• Cloud Storage
  – Data storage
• BigQuery
  – Data analysis
• Google Compute Engine
  – Processing of large data

BIG DATA BENEFITS AND USE CASES

Current Trends

• Big data toward real usage
  – From pilot to real usage
• More software solutions
  – Infrastructure
  – Analytics
    • Statistical analysis
    • Social graph analysis
• More unstructured data
  – Facebook, Twitter, text, video, image

[Diagram labels: Analytics; Structured; Unstructured; Big Data]

Google Flu

• A pattern emerges when all the flu-related search queries are added together.
• We compared our query counts with traditional flu surveillance systems and found that many search queries tend to be popular exactly when flu season is happening.
• By counting how often we see these search queries, we can estimate how much flu is circulating in different countries and regions around the world.

http://www.google.org/flutrends/about/how.html

WHAT FACEBOOK KNOWS

http://www.facebook.com/data

Cameron Marlow calls himself Facebook's "in-house sociologist." He and his team can analyze essentially all the information the site gathers.

Study of Human Society

• Facebook, in collaboration with the University of Milan, conducted an experiment that involved
  – the entire social network as of May 2011
  – more than 10 percent of the world's population.
• Analyzing the 69 billion friend connections among those 721 million people showed that
  – four intermediary friends are usually enough to introduce anyone to a random stranger.
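
Counting intermediary friends is a shortest-path question on the friendship graph. The breadth-first search below is a toy sketch of that idea on a five-person graph with made-up names. It is not the method used in the Facebook study, which worked at a vastly larger scale.

```python
from collections import deque


def intermediaries(graph, start, target):
    """Number of friends strictly between start and target on a shortest
    path, or None if the two people are not connected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])  # (person, hops from start)
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                # 'hops' people lie between start and target on this path.
                return hops
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


# Hypothetical friendship graph (adjacency lists).
friends = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
    "eve": [],
}
print(intermediaries(friends, "ann", "dan"))  # 2 (bob and cat)
```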

Why?

• Facebook can improve users' experience
  – make useful predictions about users' behavior
  – make better guesses about which ads you might be more or less open to at any given time
• Right before Valentine's Day this year a blog post from the Data Science Team listed the songs most popular with people who had recently signaled on Facebook that they had entered or left a relationship

How does Facebook handle Big Data?

• Facebook built its data storage system using open-source software called Hadoop.
  – Hadoop spreads the data across many machines inside a data center.
  – Facebook uses Hive, open-source software that acts as a translation service, making it possible to query vast Hadoop data stores using relatively simple code.
• Much of Facebook's data resides in one Hadoop store more than 100 petabytes in size (a petabyte is a million gigabytes), says Sameet Agarwal, a director of engineering at Facebook who works on data infrastructure, and the quantity is growing exponentially. "Over the last few years we have more than doubled in size every year."

eBay

• eBay is using Hadoop technology and the HBase database, which supports real-time analysis of Hadoop data, to build a new search engine for its auction site.
  – 97 million active buyers and sellers
  – Over 200 million items for sale in 50,000 categories
  – The site handles close to 2 billion page views, 250 million search queries and tens of billions of database calls daily.
• The company has 9 petabytes of data stored on Hadoop and Teradata clusters, and the amount is growing quickly.
• 100 eBay engineers are working on the Cassini project. The new engine is expected to respond to user queries with results that are context-based and more accurate than those provided by the current system.

Source: http://www.computerworld.com/article/2550078/data-center/hadoop-is-ready-for-the-enterprise--it-execs-say.html

• JPMorgan Chase still relies heavily on relational database systems for transaction processing.
• Hadoop technology is used for a growing number of purposes, including fraud detection, IT risk management and self service.
  – Over 150 petabytes of data stored online, 30,000 databases and 3.5 billion log-ins to user accounts
• Hadoop's ability to store vast volumes of unstructured data allows the company to collect and store Web logs, transaction data and social media data.
• The data is aggregated into a common platform for use in a range of customer-focused data mining and data analytics tools.

Source: http://www.computerworld.com/article/2550078/data-center/hadoop-is-ready-for-the-enterprise--it-execs-say.html

Premier

• Premier, the U.S. healthcare alliance network: more than 2,700 members (hospitals and health systems), 90,000 non-acute facilities and 400,000 physicians
  – A large database of clinical, financial, patient, and supply chain data
  – Generated comprehensive and comparable clinical outcome measures, resource utilization reports and transaction-level cost data.
• Big data is used to improve the healthcare processes at approximately 330 hospitals, saving an estimated 29,000 lives and reducing healthcare spending by nearly $7 billion

Reference: IBM: Data Driven Healthcare Organizations Use Big Data Analytics for Big Gains; 2013. http://www03.ibm.com/industries/ca/en/healthcare/documents/Data_driven_healthcare_organizations_use_big_data_analytics_for_big_gains.pdf

Some Successes

• The Rizzoli Orthopedic Institute in Bologna, Italy
  – Using advanced analytics to gain a more “granular understanding” of the clinical variations within families whereby individual patients display extreme differences in the severity of their symptoms.
• The insight is reported to have reduced annual hospitalizations by 30% and the number of imaging tests by 60%.

Social Media Analytics

• Social media analytics is the practice of gathering data from blogs and social media websites and analyzing that data to make business decisions. The most common use of social media analytics is to mine customer sentiment in order to support marketing and customer service activities.

What is social media analytics? - Definition from WhatIs.com

Starting a Big Data Initiative

[Diagram labels: Data; Infrastructure; Big Data Tools; Analytics Software; Visualization; Top Down; Bottom Up]

Data Product

• A data product provides actionable information without exposing the decision maker to the underlying data or analytics
  – Movie recommendations
  – Weather forecasts
  – Stock market prediction
  – Operational improvement
  – Health diagnosis
  – Targeted advertising

Source: The Field Guide to Data Science, Booz Allen Hamilton

Bottom-up approach

• What is the data that we have?
• How can we collect and store it?
• What infrastructure and tools are needed to process this big data?
• What analytics methods can be applied?
• What insight can we gain from this data and analysis?

Top-down approach

• What is the business challenge that can create value and impact for the organization?
• What is the data that we need?
• What tools and analytics approach should be used?
• What infrastructure is needed?

Some thoughts

• The bottom-up approach may be good when you do not know how to start
• Pick some easy question and start a pilot
  – Learn infrastructure technology, analytic technology and tools
  – Use data you already have
• The top-down approach, which focuses on business value, is better but challenging
  – It is hard to ask a good question; management needs to identify the need
  – You may have to ask many questions and pick the right one based on
    • Impact and value

Example: What is/is not a big data problem?

• I want to classify the legal documents to make it easy to process these documents
• I want to learn how our customers react to our new T-shirt
• I want to understand how our students use Facebook

Trend: Information Tsunami is coming!

Information Tsunami

• Rapid expansion of smartphone usage, social computing, mobile applications, gaming
• Rapid increases in network bandwidth and coverage
  – Wi-Fi, 4G
• Rapid move toward the Internet of Things (IoT)
  – Sensors everywhere, multimedia information

Trend: Big data infrastructure becomes even more powerful and easy to use

Big Data Infrastructure Goes to Cloud

• Data is already on the cloud
  – Virtual organization
  – Cloud-based SaaS service
• Big Data as a Service on the cloud
  – Private cloud
  – Public cloud
• IBM Bluemix, Amazon AWS (EMR) and many more

[Diagram labels: Big Data; Services; App]

Trend: Big data is moving toward real usage

Trends

• Big data toward real usage
  – From pilot to real usage
• More software solutions
  – Infrastructure
  – Analytics
    • Statistical analysis
    • Social graph analysis with machine learning
• More unstructured data
  – Facebook, Twitter, text, video, image

[Diagram labels: Analytics; Structured; Unstructured; Big Data]

Trend: much smarter data analytics is coming

Big Data Analytics

• A set of advanced technologies designed to work with large volumes of heterogeneous data.
• Explore the data and discover interrelationships and patterns using sophisticated quantitative methods such as
  – machine learning
  – neural networks
  – robotics algorithms
  – computational mathematics
  – artificial intelligence

Deep Learning

• Deep learning is a subcategory of machine learning that uses neural networks to improve things like speech recognition, computer vision, and natural language processing.
  – Unsupervised learning of abstract concepts

Applying Deep Learning

• In 2011, Stanford computer science professor Andrew Ng founded Google’s Google Brain project, which created a neural network trained with deep learning algorithms. It famously proved capable of recognizing high-level concepts, such as cats, after watching just YouTube videos, and without ever having been told what a “cat” is.
• Facebook is using deep learning expertise to help create solutions that will better identify faces and objects in the 350 million photos and videos uploaded to Facebook each day.
• Voice recognition like Google Now and Apple’s Siri is now using deep learning.
  – According to Google researchers, the voice error rate in the new version of Android, after adding insights from deep learning, stands at 25% lower than previous versions of the software.

Source: http://www.fastcolabs.com/3026423/why-google-is-investing-in-deep-learning
http://www.wired.com/2014/08/deep-learning-yann-lecun/

IBM Watson and Cognitive Technology

• Watson is a cognitive technology that processes information more like a human than a computer: by understanding natural language, generating hypotheses based on evidence, and learning as it goes. And learn it does.
• Watson “gets smarter” in three ways:
  – being taught by its users
  – learning from prior interactions
  – being presented with new information.
• This means organizations can more fully understand and use the data that surrounds them, and use that data to make better decisions.

Applying Watson in Healthcare

• WellPoint, Inc. is an Indianapolis-based health benefits company.
  – Approximately 37 million health plan members
  – Processes more than 550 million claims per year.
• Using IBM Watson™ to improve the quality and efficiency of healthcare decisions.
  – WellPoint trained Watson with 25,000 historical cases. Now Watson uses hypothesis generation and evidence-based learning to generate confidence-scored recommendations that help nurses make decisions about utilization management (UM). Natural language processing leverages unstructured data, such as text-based treatment requests.
• Benefits
  – Helps UM nurses make faster UM decisions about treatment requests
  – Could accelerate healthcare preapprovals, which can be critical when treatments are time-sensitive
  – Includes unstructured data in the streamlined decision process

Challenges

• Developing Big Data applications is not simple
  – New algorithms, new software development tools
• Proper policy about data security and ownership
• Lack of data scientists
  – Different from software developers

Have fun with your Big Data Adventure!
