"Big Data" is big business, but what does it really mean? How will big data impact industries and consumers? This slide deck goes through some of the high level details of the market and how it is revolutionizing the world.
2. What Is Big Data Analytics?
● Big Data
– Buzzword
– Two definitions:
● Data sets too large for modern relational databases
● Semi-structured/Unstructured data sets
● Analytics
– The science of measuring and discovering patterns
and trends with data
5. Data, Data, Everywhere...
● In 2004:
– Internet traffic: 1 exabyte (that's 134,217,728 8 GB
flash drives)
– A lot of other media:
● Newspapers/books/magazines
● DVDs
6. Data, Data, Everywhere...
● Today:
– Internet traffic: 1.3 Zettabytes (that's
178,670,639,360 8 GB sticks)
● 110.3 exabytes per month
– Even more media:
● Mobile devices (phones, tablets, MP3 players, etc.)
● The Internet of Things
● Streaming Media
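
The flash-drive comparisons on the two "Data, Data, Everywhere..." slides can be checked with a few lines of arithmetic. This is a minimal sketch that assumes binary units (1 GB = 2^30 bytes, 1 exabyte = 2^60 bytes, 1 zettabyte = 2^70 bytes), since that convention reproduces the 2004 figure exactly. The function name is illustrative only.

```python
# Assumption: binary units, which reproduce the 134,217,728 figure for 2004.
GB = 2**30
EXABYTE = 2**60
ZETTABYTE = 2**70


def drives_needed(total_bytes, drive_bytes=8 * GB):
    """Return how many drives of drive_bytes it takes to hold total_bytes."""
    return total_bytes / drive_bytes


drives_2004 = drives_needed(1 * EXABYTE)
drives_today = drives_needed(1.3 * ZETTABYTE)

print(f"2004:  {drives_2004:,.0f} drives")
print(f"Today: {drives_today:.3e} drives")
print(f"Growth factor: {drives_today / drives_2004:,.1f}x")
```

Under these assumptions the second figure comes out at roughly 1.79 x 10^11 drives, in line with the slide's number to about nine significant digits.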
7. The Internet of Things
● How many of you have...
– Fitness trackers?
– E-readers?
– iPods?
● Tie them to social sites (e.g., Facebook)?
8. The Internet of Things
● You're being tracked!
● So what?
– Marketing
– Medical
– Government
● Building a fuller picture of what's tracked.
13. Data Storage
● Relational Databases
– Structured data
– Can scale to huge volumes of data
● Hadoop
– Semi-structured/unstructured data
– Massively parallel storage and processing
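
To make "massively parallel storage and processing" concrete, here is a toy sketch of the map/reduce pattern that Hadoop popularized. It is not Hadoop's API: threads on one machine stand in for cluster nodes, small lists of lines stand in for distributed file blocks, and all function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial word counts."""
    return left + right


def word_count(lines, chunk_size=2, workers=4):
    """Split the input into chunks, map them in parallel, then reduce."""
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


# Illustrative input only; a real job would read blocks from distributed storage.
sample_lines = [
    "big data is big business",
    "data data everywhere",
    "the internet of things",
]
totals = word_count(sample_lines)
print(totals.most_common(3))
```

The point of the pattern is that each chunk is processed independently, so adding more workers (or machines) lets the same code cover more data.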
17. What Solution to Pick?
● Data Volume and Speed
– Relational Databases Will Cap Out
– “Big Data” Stores Scale (For Now)
● Hadoop
● Spark
● Lucene
– Alternative Modeling Techniques
● Hyper Normalized (6-8NF)
– Inmon's Textual Disambiguation
– Anchor Modeling
– Data Vault
19. Hadoop
● Version 1
– Giant data store
– File distribution
– File parsing tools
– Generic security
● Version 2
– Giant data store
– Replaced foundation work
– Unified security: LDAP/Kerberos support
23. “Big Data” Solutions
● Search the entire data set
● Great performance
● Highly accurate
● Integrates into Analytics tools
– Only some of the tools are able to support Hadoop,
etc.
24. Statistics
● Designed for all sizes of data sets
● Decreases time to results
● As accurate as needed
● Fully supported by analytics tools
● Supported by most “Big Data” tools
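
The "decreases time to results" and "as accurate as needed" points can be illustrated with random sampling: instead of scanning every record, estimate a value from a sample and grow the sample until the margin of error is small enough. The sketch below is one possible illustration, not a method from the deck. It uses synthetic data, a normal-approximation 95% interval, and a hypothetical helper name.

```python
import random
import statistics


def estimate_mean(values, sample_size, seed=0):
    """Estimate the mean from a random sample, with a ~95% margin of error."""
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    margin = 1.96 * std_error  # normal approximation
    return mean, margin


# Synthetic "large" data set (an assumption for illustration only).
data_rng = random.Random(42)
population = [data_rng.gauss(100, 15) for _ in range(200_000)]
full_mean = statistics.fmean(population)

results = {}
for n in (100, 1_000, 10_000):
    results[n] = estimate_mean(population, n)
    mean, margin = results[n]
    print(f"n={n:>6}: mean ~ {mean:.2f} +/- {margin:.2f}")
print(f"full scan: mean = {full_mean:.2f}")
```

Each tenfold increase in sample size shrinks the margin by roughly a factor of three, so the sample size can be chosen to match the accuracy a decision actually requires.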
25. Analytics Tools
● Can access data of most sizes
– Most can handle Hadoop and some NoSQL
databases
● Built for Predictive Modeling
● Starting to handle social/network modeling
26. How to Get Started
● Grab some tools!
– RapidMiner (http://rapidminer.com/)
– R (http://www.r-project.org/)
– Weka (http://www.cs.waikato.ac.nz/ml/weka/)
● Grab some data!
– http://www.kdnuggets.com/datasets/index.html
– http://aws.amazon.com/publicdatasets/
– http://www.reddit.com/r/datasets
27. Prizes/Challenges
● Kaggle - https://www.kaggle.com/
● MIT - http://bigdata.csail.mit.edu/challenge
● Heritage Health Prize -
http://www.heritagehealthprize.com/c/hhp