Big Data keeps getting bigger, but before adopting and exploiting it seriously we should also address the security shortcomings it brings. From a forensics and security point of view, we need to understand its vulnerabilities before adopting it blindly.
7. An English professor wrote the words
“A woman without her man is nothing”
on the chalkboard and asked his students to punctuate it correctly.
“A woman, without her man, is nothing.”
“A woman: without her, man is nothing.”
8. A greater scope of Geo Int info
New kinds of Geo data and analysis
Real time Geo information
Data influx from new technologies
Non traditional forms of Geo data
Large volumes of Geo data
The latest buzzword
Social media data
DEFINING BIG SPATIAL DATA
How do we understand it?
9. Spatial data sets exceeding the capacity of current computing systems to manage, process or analyze the data with reasonable effort, due to Volume, Velocity, Variety and Veracity
11. DEFINING BIG SPATIAL DATA
Finding actionable information in massive volumes of both structured and unstructured geo data that is so large and complex that it is difficult to process with traditional database and software techniques.
Volume: data at rest
Velocity: data in motion
Variety: data in many forms
Veracity: data in doubt
12. 90% of the data in the world was created in the last 2 years
2.5 EB of data is created every day
U.S. drone aircraft sent back 24 years' worth of video footage in 2009
Gigabyte (GB) - 1,024 MB
Terabyte (TB) - 1,024 GB
Petabyte (PB) - 1,024 TB
Exabyte (EB) - 1,024 PB
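To put the daily figure in more familiar terms, the unit ladder above can be walked in a few lines. This is only an illustrative sketch: the function name `convert` and the `UNITS` list are made up here, and it assumes the 1,024-based units listed on the slide.

```python
# Units in ascending order; each step up is a factor of 1,024 (as on the slide).
UNITS = ["MB", "GB", "TB", "PB", "EB"]

def convert(value, from_unit, to_unit):
    """Convert between the 1,024-based units listed above."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1024 ** steps

# 2.5 EB per day expressed in gigabytes.
daily_gb = convert(2.5, "EB", "GB")
print(f"{daily_gb:,.0f} GB per day")  # prints "2,684,354,560 GB per day"
```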
13. Growth of geospatial data is outpacing both software and services and is set to become a major contributor to the overall growth of the industry
* Estimated revenue FY 2013
14. 100% security is a myth
No one has said this, but it remains a fact.
Increasing attack surface
20. Distributed programming frameworks
Utilise parallelism in computation and storage to process massive amounts of data
[Figure labels: Input file, Map, Intermediate, Local combining, Shuffle, Reduce, Output file]
Mapper performs computation and outputs key/value pairs
Reducer combines the values belonging to each distinct key and outputs the result
21. MAP REDUCE FRAMEWORK
Splits the input data set into independent chunks, which are processed in a completely parallel manner
Aggregates results from the map phase and performs a summary operation
Schedules and re-runs tasks
Splits the input
Moves map outputs to reduce inputs
Receives the results
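The split, map, shuffle and reduce steps can be sketched in a single Python process. This is a toy illustration, not how Hadoop or any real framework is implemented: the function names (`split_input`, `mapper`, `shuffle`, `reducer`, `run_job`) and the word-count task are assumptions chosen for the example, and the chunks are processed one after another rather than in parallel on separate nodes.

```python
from collections import defaultdict

def split_input(records, chunk_size):
    """Split the input into independent chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]

def mapper(chunk):
    """Emit one (key, value) pair per word found in the chunk."""
    for line in chunk:
        for word in line.lower().split():
            yield word, 1

def shuffle(mapped_pairs):
    """Group values by key, i.e. move map outputs to reduce inputs."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Combine the values belonging to one distinct key."""
    return key, sum(values)

def run_job(records, chunk_size=2):
    chunks = split_input(records, chunk_size)
    # A real cluster would map each chunk on a different node in parallel;
    # here the chunks are simply mapped in sequence.
    mapped = [pair for chunk in chunks for pair in mapper(chunk)]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())

counts = run_job(["road river road", "river lake", "road"])
print(counts)
```

Because each chunk is mapped without reference to any other chunk, the map step is the part that a framework can spread across machines. Only the shuffle needs to move data between them.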
22. So the challenge is not storage but I/O speed
Read 1 TB:
One machine, 4 I/O channels, each channel 100 MB/s: about 45 min
10 machines, 4 I/O channels each, each channel 100 MB/s: about 4.5 min
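The read times on this slide can be checked with a short calculation. The sketch below assumes 1 TB = 1,024 × 1,024 MB and that all channels on all machines read in parallel with no overhead. The function name `read_minutes` is made up for the example. Under those assumptions the exact figures are slightly below the rounded 45 and 4.5 minutes shown on the slide.

```python
MB_PER_TB = 1024 * 1024  # 1 TB = 1,024 GB and 1 GB = 1,024 MB

def read_minutes(terabytes, machines, channels_per_machine=4, mb_per_s_per_channel=100):
    """Idealised time to read the data when every channel works in parallel."""
    total_mb = terabytes * MB_PER_TB
    throughput_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return total_mb / throughput_mb_per_s / 60

print(f"1 machine:   {read_minutes(1, machines=1):.1f} min")   # prints 43.7 min
print(f"10 machines: {read_minutes(1, machines=10):.1f} min")  # prints 4.4 min
```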
29. STORAGE TIERS
- Multi-tiered storage media
- Necessitated by scalable size
- Different categories of data
- Different types of storage
Data storage & transaction logs
30. Lower tiers mean reduced security and looser access controls
Keeping track of data location
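One way to picture the "keeping track of data location" problem is a small catalog that records which tier each data set currently sits on and flags sensitive data that has drifted to a lower tier. Everything in this sketch is hypothetical: the `Placement` record, the tier numbering (1 = best protected, higher = lower tier), the data set names and the `find_risky_placements` helper are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    dataset: str
    tier: int        # assumed numbering: 1 = best protected, higher = lower tier
    sensitive: bool

def find_risky_placements(catalog, lowest_safe_tier=1):
    """Return sensitive data sets stored below the lowest tier assumed safe."""
    return [p.dataset for p in catalog if p.sensitive and p.tier > lowest_safe_tier]

# Hypothetical catalog entries.
catalog = [
    Placement("parcel_owners", tier=1, sensitive=True),
    Placement("vehicle_tracks_2012", tier=3, sensitive=True),
    Placement("public_basemap", tier=3, sensitive=False),
]

print(find_risky_placements(catalog))  # prints ['vehicle_tracks_2012']
```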
40. E.g. how a retailer was able to identify that a teenager was pregnant before her father knew
PRIVACY ISSUES
In the world of big data, privacy invasion is a business model
46. USE KERBEROS FOR NODE AUTHENTICATION
– (BUT WE KNOW IT’S A PAIN TO SET UP)
STRINGENT POLICIES
STANDARD TO INTRA COUNTRY LAWS
EXHAUSTIVE LOGS
SECURE COMMUNICATION