Big Data Analytics - Introduction


"Big Data" is big business, but what does it really mean? How will big data affect industries and consumers? This slide deck covers some of the high-level details of the market and how it is changing the world.

Published in: Technology

  1. 1. Big Data Analytics
  2. 2. What Is Big Data Analytics? ● Big Data – Buzzword – Two definitions: ● Data sets too large for modern relational databases ● Semi-structured/unstructured data sets ● Analytics – The science of measuring and discovering patterns and trends in data
  3. 3. Source: http://www.socialtalent.co/blog/big-data-whats-the-big-deal
  4. 4. Data, Data, Everywhere... ● In 2004: – Internet traffic: 1 Exabyte (that's 134,217,728 8 GB flash drives) – A lot of other media: ● Newspapers/books/magazines ● DVDs
  5. 5. Data, Data, Everywhere... ● Today: – Internet traffic: 1.3 Zettabytes (that's roughly 178.7 billion 8 GB sticks) ● 110.3 exabytes per month – Even more media: ● Mobile devices (phones/tablets/MP3 players/etc.) ● The Internet of Things ● Streaming Media
  6. 6. The Internet of Things ● How many of you have... – Fitness trackers? – E-readers? – iPods? ● Tie them to social sites (e.g., Facebook)?
  7. 7. The Internet of Things ● You're being tracked! ● So what? – Marketing – Medical – Government ● Building a fuller picture of what's tracked.
  8. 8. Social Network Integration
  9. 9. Six Degrees of Separation Source: http://www.83toinfinity.com
  10. 10. Source: http://www.math.cornell.edu/~numb3rs/blanco/social_net.jpg
  11. 11. Data Storage
  12. 12. Data Storage ● Relational Databases – Structured data – Can scale to huge volumes of data ● Hadoop – Semi-structured/unstructured data – Massively parallel storage and processing
  13. 13. Relational Database Source: http://www.ntu.edu.sg/home/ehchua/programming/sql/images/ManyToOne.png
  14. 14. Unstructured Data Source: http://storagegaga.com/2011/12/
  15. 15. Semi-structured Source: http://www.stylusstudio.com/images/figures/sql_xml_xml_fragment.gif
  16. 16. What Solution to Pick? ● Data Volume and Speed – Relational Databases Will Cap Out – “Big Data” Stores Scale (For Now) ● Hadoop ● Spark ● Lucene – Alternative Modeling Techniques ● Hyper Normalized (6-8NF) – Inmon's Textual Disambiguation – Anchor Modeling – Data Vault
  17. 17. Hadoop ● Version 1 – Giant data store – File distribution – File parsing tools – Generic security ● Version 2 – Giant data store – Replaced foundation work – Unified security (LDAP/Kerberos support)
  18. 18. Tools ● Oozie ● Hive ● NoSQL Databases – HBase – MongoDB
  19. 19. JSON { "employees": [ { "firstName":"John" , "lastName":"Doe" }, { "firstName":"Anna" , "lastName":"Smith" }, { "firstName":"Peter" , "lastName":"Jones" } ] } Source: http://www.w3schools.com/json/json_syntax.asp
  20. 20. How to Analyze? ● Performance ● Timeliness ● Accuracy ● Feedback
  21. 21. “Big Data” Solutions ● Search the entire data set ● Great performance ● Highly accurate ● Integrates into Analytics tools – Only some of the tools are able to support Hadoop, etc.
  22. 22. Statistics ● Designed for all sizes of data sets ● Decreases time to results ● As accurate as needed ● Fully supported by analytics tools ● Supported by most “Big Data” tools
  23. 23. Analytics Tools ● Can access data of most sizes – Most can handle Hadoop and some NoSQL databases ● Built for Predictive Modeling ● Starting to handle social/network modeling
  24. 24. How to Get Started ● Grab some tools! – RapidMiner (http://rapidminer.com/) – R (http://www.r-project.org/) – Weka (http://www.cs.waikato.ac.nz/ml/weka/) ● Grab some data! – http://www.kdnuggets.com/datasets/index.html – http://aws.amazon.com/publicdatasets/ – http://www.reddit.com/r/datasets
  25. 25. Prizes/Challenges ● Kaggle - https://www.kaggle.com/ ● MIT - http://bigdata.csail.mit.edu/challenge ● Heritage Health Prize - http://www.heritagehealthprize.com/c/hhp
  26. 26. Questions? Comments? ● Twitter - @OpenDataAlex ● LinkedIn - alexmeadows ● GitHub - dbaAlex
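
Slide 19 shows a small JSON document as an example of semi-structured data. As a minimal sketch (not part of the original deck), the same document can be read with Python's standard json module; the variable names raw, data, and full_names are illustrative choices.

```python
import json

# The sample document from slide 19, held as a plain string
raw = """
{ "employees": [
    { "firstName": "John",  "lastName": "Doe" },
    { "firstName": "Anna",  "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
] }
"""

# json.loads turns the text into nested Python dicts and lists
data = json.loads(raw)

# Walk the "employees" array and build one display name per record
full_names = [f'{e["firstName"]} {e["lastName"]}' for e in data["employees"]]

print(full_names)  # ['John Doe', 'Anna Smith', 'Peter Jones']
```

Nothing here requires a schema up front: the structure is discovered from the document itself, which is what separates this kind of data from rows in a relational table.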

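The flash-drive comparisons on slides 4 and 5 appear to use binary units (1 exabyte read as 2^60 bytes, 8 GB read as 8 x 2^30 bytes). That reading is an assumption; the slides do not state it. Under that assumption, the arithmetic can be sketched as follows, with drives_needed being a hypothetical helper name.

```python
# Binary (power-of-two) storage units, assumed for this sketch
GIB = 2**30   # gibibyte
EIB = 2**60   # exbibyte
ZIB = 2**70   # zebibyte

DRIVE_BYTES = 8 * GIB  # one "8 GB" flash drive, read as 8 GiB


def drives_needed(total_bytes: int) -> int:
    """Return how many whole 8 GiB drives fit into total_bytes."""
    return total_bytes // DRIVE_BYTES


# Slide 4: 1 exabyte of traffic
drives_2004 = drives_needed(1 * EIB)

# Slide 5: 1.3 zettabytes, kept exact by using integer arithmetic (13/10)
drives_today = drives_needed(13 * ZIB // 10)

print(f"{drives_2004:,}")   # 134,217,728
print(f"{drives_today:,}")  # 178,670,639,513
```

The first result reproduces the slide 4 figure exactly. The second comes to about 178.67 billion drives. With decimal units (10^18 and 10^9 bytes) both counts would be smaller.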
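
Slides 21 and 22 contrast searching the entire data set with a statistical approach that is "as accurate as needed". One way to picture that trade-off is random sampling: estimate a mean from a subset and report a margin of error. The sketch below uses synthetic data and a hypothetical estimate_mean helper; it is an illustration of the idea, not the method used in the deck.

```python
import random
import statistics

random.seed(42)  # fixed seed so repeated runs give the same numbers

# Hypothetical "full" data set: 200,000 synthetic measurements
population = [random.gauss(100.0, 15.0) for _ in range(200_000)]


def estimate_mean(values, sample_size):
    """Estimate the mean from a simple random sample.

    Returns (estimate, margin), where margin is an approximate
    95% margin of error based on the normal approximation.
    """
    sample = random.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return estimate, 1.96 * std_error


# Reference value obtained by scanning every record
full_mean = statistics.fmean(population)

results = {}
for n in (100, 10_000):
    results[n] = estimate_mean(population, n)
    estimate, margin = results[n]
    print(f"n={n:>6}: {estimate:.2f} +/- {margin:.2f} (full scan: {full_mean:.2f})")
```

The margin of error shrinks roughly with the square root of the sample size, so a sample 100 times larger gives about 10 times the precision. That is the sense in which accuracy can be traded for time to results.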