Big Data and Big Analytics - Why, what and how
Webinar Presentation from 2013-08-30
An introduction to what Big Data and Big Analytics can be used for and why they are relevant for you. Includes real-life examples and ideas, and concludes with a look at InfiniDB.

Published in: Technology
Transcript

  • 1. Big Data and Big Analytics – Why, what and how
  • 2. Agenda • Big Data and Big Analytics – What is it? • Big Analytics vs. the Data Warehouse? • Big Analytics examples • Database technologies for Big Analytics • Questions and Answers
  • 3. What is Big Data? • Big Data is data that is not immediately related to my own business • Big Data is largely unstructured • Big Data consists of data from many different sources, such as Facebook, Twitter, web pages, blogs and any other source you can find • Big Data is all about volume and analysis!
  • 4. Because you want to grow your business! • You can get customers from your competitors – The data on these customers is not in your CRM! – Why did they go with someone else rather than with you? Your Data Warehouse has few answers to this! • You can grow the market – Those new customers are, to a large extent, not in your CRM or Data Warehouse either! • You can do both of these!
  • 5. Why do I need all this data? • “My Data Warehouse tells me all I ever want to know, in gruesome detail, about my customers, what more do I need?” • “I get much more data from my CRM system than I do from friggin’ Facebook!” • “Why would I need all those pictures from Facebook and all those Twitter texts, they tell me nuthin’!”
  • 6. What is Big Analytics? • To get insights from Big Data, you need a more powerful kind of analysis: Big Analytics • Big Analytics often cannot rely on simple BTREE indexes • Big Analytics becomes more accurate the more data you have
  • 7. What is Big Analytics useful for? • For getting information on things in the “outside world” – My competitors – My competitors’ customers • For foreseeing trends – What will be “the next big thing” in my business? – What new markets are developing? – What is happening in my current market?
  • 8. Big Data, Analytics and Insights! Big Data Big Analytics Big Insights!
  • 9. Big Analytics use cases • The higher the volume of your business, the more useful Big Data becomes – If you have very few customers, Big Data might be less useful • Retail is a common use case, but there are many more – Finance – Big Data trend analysis – Intelligence – Analysis of new and unknown trends and loosely tied groups – Politics – What is my competition up to?
  • 10. Big Analytics vs. Data Warehouse • Your Data Warehouse is very focused and contains high quality information on low level data: “John Doe bought Chocko Chocolate Chip Cookies for $3.61 on Jan 12 2013” • Big Data provides much more data, but each information item has less detail to it: “Chocko Chocolate Chip Cookies suck!” “An increasing number of people tweet about Chocolate Chip Cookies”
  • 11. Big Analytics vs. Data Warehouse • What Big Analytics lacks in terms of data item correctness can be compensated for by: – Volume: If more than 200,000 tweets agree that our Chocko cookies suck, then we should probably look into it. – Proper analysis: Images can be analyzed for content and for things you didn’t think about: Maybe “Ma Cookies” brand cookies have an edge on us in that their packaging looks more pleasing? Do we see “Ma Cookies” being eaten in unexpected places or at unexpected times?
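
The volume argument on this slide can be made concrete with the standard error of a sample proportion. The sketch below is an illustration and not part of the original deck: it assumes each tweet is an independent yes/no vote on whether the cookies are disliked, and it assumes that 60% of tweets are negative. Both the independence assumption and the 60% figure are hypothetical.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from n independent items."""
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical share of negative tweets; not a figure from the presentation.
p_negative = 0.6

for n in (100, 10_000, 200_000):
    se = standard_error(p_negative, n)
    # A rough 95% interval is p +/- 1.96 * se.
    print(f"n={n:>7}: 60% +/- {1.96 * se * 100:.2f} percentage points")
```

Under these assumptions the uncertainty shrinks with the square root of the number of tweets, so a noisy signal from individual items can still give a stable aggregate picture once the volume is large.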
  • 12. Big Analytics - Linguistic analysis • This is for tweets, blogs, Facebook and similar. Proper linguistic analysis is complex: – Sentiment “Ma Cookies might seem like they suck, but they are actually quite tasty” – Temporal “In January 2011 we wrote that Chocko Cookies used to taste like manure in 2008, but that they have improved since then” – Ranking – Really complex for larger blocks of text
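
To see why the sentiment example on this slide is hard, consider a deliberately naive scorer that only counts words. This is a sketch for illustration, not a method from the presentation; the word lists and the function name `naive_sentiment` are made up for the example.

```python
import re

# Tiny hypothetical word lists, for illustration only.
POSITIVE = {"tasty", "great", "love"}
NEGATIVE = {"suck", "sucks", "manure"}

def naive_sentiment(text: str) -> int:
    """Count positive words minus negative words, ignoring context entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = [
    "Chocko Chocolate Chip Cookies suck!",
    "Ma Cookies might seem like they suck, but they are actually quite tasty",
]
for tweet in tweets:
    print(naive_sentiment(tweet), tweet)
```

The first tweet scores -1, which matches its meaning. The second scores 0 because "suck" and "tasty" cancel out, although a human reader would call it positive. Handling "might seem like ... but actually" correctly takes real linguistic analysis, not word counting.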
  • 13. Other types of Big Analytics • Image analysis is a fast-developing field, where we find new and interesting use cases – What are the most popular colors? – What color are people’s clothes? – How long has that suitcase been standing on the floor at the airport? • Location analysis – Where did this happen? – In what city is that? What country? • Temporal analysis – When did this happen? When was it published?
  • 14. New Visualizations for New Insights • Visualizing data as a report with columns and rows isn’t always effective • With new and diverse types of data, we need new ways of visualizing data – Location on maps – Timelines – Sentiments • Even with traditional Data Warehouse data, new visualizations can provide new insights! • Interactive visualizations
  • 15. Big Analytics and Visualization examples
  • 16. What is Mitt Romney talking about?
  • 17. Map Visualization – Android or iOS Visualizations by MapBox • Smartphone OS metadata in Geography view – iPhone is Red, Android is Green – Based on data from Verizon passed to NSA
  • 18. Big Analytics database issues • Big Analytics is complex! • Big Analytics doesn’t always allow the “analyze-once-find-later” attribute of a classic index! • Big Analytics is compute intensive • Big Analytics needs some programming. Yikes!
  • 19. Map-Reduce to the rescue • Map-Reduce allows distributed processing of large amounts of data – Map – Algorithm to distribute data across nodes – Reduce – Algorithm to aggregate data from the nodes • Hadoop is the best-known and most widely used Map-Reduce framework • The Map and Reduce steps still have to be developed • But we still need some kind of database
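
The Map and Reduce steps that "still must be developed" can be sketched in a few lines. The example below is a single-process word count, not Hadoop code and not taken from the presentation; the function names and sample strings are hypothetical. In the usual formulation, the map step runs on each chunk of data and emits key/value pairs, the framework groups the pairs by key, and the reduce step aggregates each group.

```python
from collections import defaultdict
from functools import reduce

def map_phase(document: str):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate all values for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)

# Each string stands in for a chunk of data held on a different node.
chunks = ["cookies are tasty", "Cookies are everywhere", "tasty cookies"]

mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

In a real cluster the `map_phase` calls would run in parallel on the nodes that hold the data, and only the much smaller grouped results would travel over the network.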
  • 20. So, what we need is an Analytical Database • Support for complex analysis • Support for distributed, parallel processing (Map-Reduce for example) • Support for storing and processing massive amounts of data • Some kind of cool index technology that works with big data, for both reads and writes – Or maybe... A scary idea just came to me…
  • 21. No indexes! Because you don’t need or want them! • What! What’s wrong with good old BTREEs? – They are not well suited to Big Data! – Their usefulness decreases as data grows – Updates slow down significantly as the tree grows! – Skewed data doesn’t work well • SPATIAL? FREETEXT? HASH? BITMAP? – These are either too specialized or lack the functionality we need
  • 22. Calpont InfiniDB Real-time, Consistent Query Performance Linear Scale for Massive Data Removes Limits to Dimensions and Granularity Easy to Deploy and Maintain
  • 23. Tiered Query Execution • User Module – Processes SQL Requests • Performance Module – Executes the Queries • Single Server or MPP
  • 24. Map-Reduce for Powerful Analytics SQL Operations are mapped to Performance Module threads • Parallel/Distributed Data Access • Parallel/Distributed Joins (Inner, Outer) • Parallel/Distributed Sub-queries (From, Where, Select) • Parallel/Distributed Group By, Distinct, and Aggregation • Extensible with Parallel/Distributed User Defined Functions Results are returned to User Module in Reduce Phase Map  Reduce 
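
The idea of mapping a SQL operation onto worker threads and combining the results in a reduce phase can be illustrated with a parallel `GROUP BY`. This is a conceptual sketch only and does not show how InfiniDB is implemented: the thread pool merely stands in for Performance Module threads, and the data, `partial_sum` and `merge` are hypothetical names invented for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (product, amount in cents) rows, split into three partitions.
partitions = [
    [("chocko", 361), ("ma", 250)],
    [("chocko", 361), ("chocko", 722)],
    [("ma", 500)],
]

def partial_sum(rows):
    """'Map' step: one worker computes SUM(amount) GROUP BY product for its partition."""
    totals = Counter()
    for product, amount in rows:
        totals[product] += amount
    return totals

def merge(partials):
    """'Reduce' step: combine the per-partition totals into the final result."""
    result = Counter()
    for totals in partials:
        result.update(totals)  # Counter.update adds counts key by key
    return dict(result)

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(partial_sum, partitions))

print(merge(partials))
```

Because a sum can be computed per partition and then added up, the final merge only has to handle one small dictionary per worker, which is what lets this kind of aggregation scale out.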
  • 25. Calpont InfiniDB • Support for Amazon EC2 – Full EBS support – Prepackaged AMIs for ease of provisioning • Hadoop connector • Multiple parallel load options • Available now!
  • 26. If you think you have all the right answers, you haven’t asked all the right questions • This is true of analytics in general, but particularly true when working with Big Analytics • The more data you have, the more relevant questions you can ask • The more questions you ask, the more you know • The more you know, the more questions you can ask • The wider the range of data you have, the wider the range of questions you can ask
  • 27. Questions? Answers! The question is not “What is the answer?”, the question is “What is the question?”. Henri Poincaré
