Big Data Analytics - Introduction

"Big Data" is big business, but what does it really mean? How will big data impact industries and consumers? This slide deck goes through some of the high-level details of the market and how it is revolutionizing the world.

Published in Technology

Transcript

  • 1. Big Data Analytics
  • 2. What Is Big Data Analytics? ● Big Data – Buzzword – Two definitions: (1) data sets too large for modern relational databases; (2) semi-structured/unstructured data sets ● Analytics – The science of measuring and discovering patterns and trends in data
  • 3. Source: http://www.socialtalent.co/blog/big-data-whats-the-big-deal
  • 4. Data, Data, Everywhere... ● In 2004: – Internet traffic: 1 Exabyte (that's 134,217,728 8 GB flash drives) – A lot of other media: ● Newspapers/books/magazines ● DVDs
  • 5. Data, Data, Everywhere... ● Today: – Internet traffic: 1.3 Zettabytes (that's roughly 178.7 billion 8 GB flash drives) ● 110.3 exabytes per month – Even more media: ● Mobile devices (phones/tablets/MP3 players/etc.) ● The Internet of Things ● Streaming Media
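
The flash-drive comparisons on slides 4 and 5 can be checked with a few lines of arithmetic. The sketch below assumes binary units (1 exabyte = 2^60 bytes, 1 zettabyte = 2^70 bytes, one 8 GB drive = 2^33 bytes), since that convention reproduces the 2004 figure exactly. The helper name `drives_needed` is made up for illustration. With decimal units or different rounding, the last digits of the results change.

```python
from fractions import Fraction

# Assumed binary unit sizes, in bytes.
EXABYTE = 2**60
ZETTABYTE = 2**70
DRIVE_8GB = 8 * 2**30

def drives_needed(amount, unit_bytes, drive_bytes=DRIVE_8GB):
    """Return how many drives are completely filled by amount * unit_bytes."""
    # Fraction(str(...)) keeps decimal inputs such as "1.3" exact.
    total_bytes = Fraction(str(amount)) * unit_bytes
    return int(total_bytes // drive_bytes)

print(drives_needed(1, EXABYTE))        # 2004 traffic figure
print(drives_needed("1.3", ZETTABYTE))  # later traffic figure
```

Exact rational arithmetic is used here because multiplying 1.3 as a floating-point number can shift the last few digits of a result this large.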
  • 6. The Internet of Things ● How many of you have... – Fitness trackers? – E-readers? – iPods? ● Tie them to social sites (e.g., Facebook)?
  • 7. The Internet of Things ● You're being tracked! ● So what? – Marketing – Medical – Government ● Building a fuller picture of what's tracked.
  • 8. Social Network Integration
  • 9. Six Degrees of Separation Source: http://www.83toinfinity.com
  • 10. Source: http://www.math.cornell.edu/~numb3rs/blanco/social_net.jpg
  • 11. Data Storage
  • 12. Data Storage ● Relational Databases – Structured data – Can scale to huge volumes of data ● Hadoop – Semi-structured/unstructured data – Massively parallel storage and processing
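
The "massively parallel storage and processing" idea can be illustrated with the split, map, and reduce pattern that Hadoop popularized. The following is only a single-machine sketch in plain Python, not the Hadoop API: the function names `map_chunk` and `word_count` and the sample log lines are made up for illustration, and threads stand in for the separate machines a real cluster would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_chunk(lines):
    """Map step: count words within one chunk of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def word_count(lines, workers=4):
    """Split the input into chunks, count each chunk, then merge the results."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Reduce step: merge the per-chunk counters into one result.
    return reduce(lambda a, b: a + b, partials, Counter())

# Hypothetical unstructured input: free-text log lines.
logs = ["error disk full", "warning disk slow", "error network down"]
print(word_count(logs))
```

Each chunk is processed independently, so the map step could run on different machines; only the small per-chunk counters need to be brought together at the end.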
  • 13. Relational Database Source: http://www.ntu.edu.sg/home/ehchua/programming/sql/images/ManyToOne.png
  • 14. Unstructured Data Source: http://storagegaga.com/2011/12/
  • 15. Semi-structured Source: http://www.stylusstudio.com/images/figures/sql_xml_xml_fragment.gif
  • 16. What Solution to Pick? ● Data Volume and Speed – Relational Databases Will Cap Out – “Big Data” Stores Scale (For Now) ● Hadoop ● Spark ● Lucene – Alternative Modeling Techniques ● Hyper Normalized (6-8NF) – Inmon's Textual Disambiguation – Anchor Modeling – Data Vault
  • 17. Hadoop ● Version 1 – Giant data store – File distribution – File parsing tools – Generic security ● Version 2 – Giant data store – Replaced foundation work – Unified security – LDAP/Kerberos support
  • 18. Tools ● Oozie ● Hive ● NoSQL Databases – HBase – MongoDB
  • 19. JSON { "employees": [ { "firstName":"John" , "lastName":"Doe" }, { "firstName":"Anna" , "lastName":"Smith" }, { "firstName":"Peter" , "lastName":"Jones" } ] } Source: http://www.w3schools.com/json/json_syntax.asp
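
As a rough illustration of how a semi-structured document like the one on slide 19 is consumed, the sketch below loads the same employee data with Python's standard `json` module and builds a list of full names. The variable names `raw`, `data`, and `full_names` are chosen here for illustration.

```python
import json

# The employee document from the slide, as a JSON string.
raw = """
{ "employees": [
    { "firstName": "John",  "lastName": "Doe" },
    { "firstName": "Anna",  "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
] }
"""

# json.loads turns JSON objects into dicts and JSON arrays into lists.
data = json.loads(raw)
full_names = [f"{e['firstName']} {e['lastName']}" for e in data["employees"]]
print(full_names)
```

Because records in semi-structured data need not share the same fields, real code would typically use `e.get("lastName")` or similar for fields that may be missing.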
  • 20. How to Analyze? ● Performance ● Timeliness ● Accuracy ● Feedback
  • 21. “Big Data” Solutions ● Search the entire data set ● Great performance ● Highly accurate ● Integrate with analytics tools – Only some of the tools support Hadoop and similar stores
  • 22. Statistics ● Designed for all sizes of data sets ● Decreases time to results ● As accurate as needed ● Fully supported by analytics tools ● Supported by most “Big Data” tools
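
The "as accurate as needed" point can be sketched with simple random sampling: instead of scanning every record, estimate a statistic from a sample and report a margin of error that shrinks as the sample grows. Everything in the example is an assumption made for illustration: the synthetic population, the helper `estimate_mean`, and the normal-approximation 95% margin (1.96 standard errors).

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical "full" data set: 200,000 synthetic measurements.
population = [random.gauss(100.0, 15.0) for _ in range(200_000)]

def estimate_mean(values, sample_size):
    """Estimate the mean from a simple random sample, with a ~95% margin of error."""
    sample = random.sample(values, sample_size)
    mean = statistics.fmean(sample)
    margin = 1.96 * statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, margin

for n in (100, 10_000):
    mean, margin = estimate_mean(population, n)
    print(f"n={n:>6}: mean ~ {mean:.2f} +/- {margin:.2f}")

print(f"full scan: mean = {statistics.fmean(population):.2f}")
```

The margin of error falls with the square root of the sample size, so a hundredfold larger sample buys roughly ten times the precision. That trade-off is what lets a statistical approach shorten time to results.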
  • 23. Analytics Tools ● Can access data of most sizes – Most can handle Hadoop and some NoSQL databases ● Built for Predictive Modeling ● Starting to handle social/network modeling
  • 24. How to Get Started ● Grab some tools! – RapidMiner (http://rapidminer.com/) – R (http://www.r-project.org/) – Weka (http://www.cs.waikato.ac.nz/ml/weka/) ● Grab some data! – http://www.kdnuggets.com/datasets/index.html – http://aws.amazon.com/publicdatasets/ – http://www.reddit.com/r/datasets
  • 25. Prizes/Challenges ● Kaggle - https://www.kaggle.com/ ● MIT - http://bigdata.csail.mit.edu/challenge ● Heritage Health Prize - http://www.heritagehealthprize.com/c/hhp
  • 26. Questions? Comments? ● Twitter - @OpenDataAlex ● LinkedIn - alexmeadows ● GitHub - dbaAlex