Big Data Analytics - Introduction

"Big Data" is big business, but what does it really mean? How will big data impact industries and consumers? This slide deck goes through some of the high level details of the market and how it is revolutionizing the world.

Published in: Technology

Transcript

  • 1. Big Data Analytics
  • 2. What Is Big Data Analytics?
      ● Big Data
          – Buzz word
          – Two definitions:
              ● Data sets too large for modern relational databases
              ● Semi-structured/Unstructured data sets
      ● Analytics
          – The science of measuring and discovering patterns and trends with data
  • 3. Source: http://www.socialtalent.co/blog/big-data-whats-the-big-deal
  • 4. Data, Data, Everywhere...
      ● In 2004:
          – Internet traffic: 1 Exabyte (that's 134,217,728 8 GB flash drives)
          – A lot of other media:
              ● Newspapers/books/magazines
              ● DVDs
  • 5. Data, Data, Everywhere...
      ● Today:
          – Internet traffic: 1.3 Zettabytes (that's roughly 178.67 billion 8 GB sticks)
              ● 110.3 exabytes per month
          – Even more media:
              ● Mobile devices (phones, tablets, MP3 players, etc.)
              ● The Internet of Things
              ● Streaming media
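The flash-drive comparisons on slides 4 and 5 are plain unit arithmetic. The sketch below assumes binary (power-of-two) units, an assumption chosen because it reproduces the 2004 figure exactly; the names `drives_needed` and `DRIVE_BYTES` are illustrative and not from the deck.

```python
# Binary units (assumption): 1 GB = 2**30 bytes, 1 EB = 2**60, 1 ZB = 2**70.
GIB = 2**30
EIB = 2**60
ZIB = 2**70

DRIVE_BYTES = 8 * GIB  # capacity of one 8 GB flash drive


def drives_needed(total_bytes: float) -> float:
    """Return how many 8 GB drives hold total_bytes (may be fractional)."""
    return total_bytes / DRIVE_BYTES


drives_2004 = drives_needed(1 * EIB)     # slide 4: 1 exabyte
drives_today = drives_needed(1.3 * ZIB)  # slide 5: 1.3 zettabytes

print(f"1 EB   -> {drives_2004:,.0f} drives")
print(f"1.3 ZB -> {drives_today:,.0f} drives")
```

With decimal units (10**18 bytes per exabyte) the counts come out somewhat lower, so the result depends on which convention is intended.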
  • 6. The Internet of Things
      ● How many of you have...
          – Fitness trackers?
          – E-readers?
          – iPods?
      ● Tie them to social sites (e.g., Facebook)?
  • 7. The Internet of Things
      ● You're being tracked!
      ● So what?
          – Marketing
          – Medical
          – Government
      ● Building a fuller picture of what's tracked.
  • 8. Social Network Integration
  • 9. Six Degrees of Separation
      Source: http://www.83toinfinity.com
  • 10. Source: http://www.math.cornell.edu/~numb3rs/blanco/social_net.jpg
  • 11. Data Storage
  • 12. Data Storage
      ● Relational Databases
          – Structured data
          – Can scale to huge volumes of data
      ● Hadoop
          – Semi-structured/unstructured data
          – Massively parallel storage and processing
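To make "massively parallel storage and processing" concrete: Hadoop's classic processing model is MapReduce, which the slide does not name. The sketch below imitates its three phases (map, shuffle, reduce) in a single Python process. It is an illustration of the idea only, not Hadoop's API, and the function names and sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input; a real cluster would split files across nodes.
sample_lines = ["big data is big business", "data is everywhere"]
counts = word_count(sample_lines)
print(counts)
```

Because each `map_phase` call depends only on its own line and each `reduce_phase` call only on its own key, both phases can be spread across many machines, which is what lets this style of processing scale.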
  • 13. Relational Database
      Source: http://www.ntu.edu.sg/home/ehchua/programming/sql/images/ManyToOne.png
  • 14. Unstructured Data
      Source: http://storagegaga.com/2011/12/
  • 15. Semi-structured
      Source: http://www.stylusstudio.com/images/figures/sql_xml_xml_fragment.gif
  • 16. What Solution to Pick?
      ● Data Volume and Speed
          – Relational databases will cap out
          – "Big Data" stores scale (for now)
              ● Hadoop
              ● Spark
              ● Lucene
          – Alternative modeling techniques
              ● Hyper Normalized (6-8NF)
                  – Inmon's Textual Disambiguation
                  – Anchor Modeling
                  – Data Vault
  • 17. Hadoop
      ● Version 1
          – Giant data store
          – File distribution
          – File parsing tools
          – Generic security
      ● Version 2
          – Giant data store
          – Replaced foundation work
          – Unified security - LDAP/Kerberos support
  • 18. Tools
      ● Oozie
      ● Hive
      ● NoSQL Databases
          – HBase
          – MongoDB
  • 19. JSON
      {
        "employees": [
          { "firstName": "John", "lastName": "Doe" },
          { "firstName": "Anna", "lastName": "Smith" },
          { "firstName": "Peter", "lastName": "Jones" }
        ]
      }
      Source: http://www.w3schools.com/json/json_syntax.asp
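A semi-structured document like the one on slide 19 can be read with a few lines of code. This is a minimal sketch using Python's standard `json` module. The variable names are illustrative, and the deck itself does not prescribe a language or library.

```python
import json

# The JSON document from slide 19, embedded as a string for illustration.
raw = """
{ "employees": [
    { "firstName": "John",  "lastName": "Doe" },
    { "firstName": "Anna",  "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
] }
"""

# json.loads turns the text into nested dicts and lists.
data = json.loads(raw)

# Walk the structure: a list of employee records under the "employees" key.
full_names = [f"{e['firstName']} {e['lastName']}" for e in data["employees"]]
print(full_names)
```

Unlike a relational table, nothing forces every record to carry the same fields, so real code would typically use `e.get("lastName")` or similar to tolerate missing keys.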
  • 20. How to Analyze?
      ● Performance
      ● Timeliness
      ● Accuracy
      ● Feedback
  • 21. "Big Data" Solutions
      ● Search the entire data set
      ● Great performance
      ● Highly accurate
      ● Integrates into analytics tools
          – Only some of the tools are able to support Hadoop, etc.
  • 22. Statistics
      ● Designed for all sizes of data sets
      ● Decreases time to results
      ● As accurate as needed
      ● Fully supported by analytics tools
      ● Supported by most "Big Data" tools
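One way to read "decreases time to results" and "as accurate as needed" is random sampling: estimate a quantity from a subset and widen the sample until the margin of error is acceptable. The sketch below assumes that interpretation. The population is synthetic data generated purely for illustration (not from the deck), and the 1.96 factor is the usual normal-approximation multiplier for a 95% interval.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic "full data set" (assumption): 200,000 values around 100.
population = [rng.gauss(100.0, 15.0) for _ in range(200_000)]


def estimate_mean(values, sample_size, rng):
    """Estimate the mean from a random sample; return (mean, 95% margin)."""
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, 1.96 * std_err


full_mean = statistics.fmean(population)  # the "search everything" answer

results = []
for n in (100, 1_000, 10_000):
    mean, margin = estimate_mean(population, n, rng)
    results.append((n, mean, margin))
    print(f"n={n:>6}: mean = {mean:.2f} +/- {margin:.2f} (full-data mean {full_mean:.2f})")
```

The margin shrinks roughly with the square root of the sample size, so a sample can be sized to the accuracy a question actually needs instead of scanning the whole data set.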
  • 23. Analytics Tools
      ● Can access data of most sizes
          – Most can handle Hadoop and some NoSQL databases
      ● Built for Predictive Modeling
      ● Starting to handle social/network modeling
  • 24. How to Get Started
      ● Grab some tools!
          – RapidMiner (http://rapidminer.com/)
          – R (http://www.r-project.org/)
          – Weka (http://www.cs.waikato.ac.nz/ml/weka/)
      ● Grab some data!
          – http://www.kdnuggets.com/datasets/index.html
          – http://aws.amazon.com/publicdatasets/
          – http://www.reddit.com/r/datasets
  • 25. Prizes/Challenges
      ● Kaggle - https://www.kaggle.com/
      ● MIT - http://bigdata.csail.mit.edu/challenge
      ● Heritage Health Prize - http://www.heritagehealthprize.com/c/hhp
  • 26. Questions? Comments?
      ● Twitter - @OpenDataAlex
      ● LinkedIn - alexmeadows
      ● GitHub - dbaAlex
