Big data, why care
Transcript

  • 1. BigData, Why Care?
  • 2. Speaker
    Daan Gerits
    - BigData Architect
    - DataCrunchers.eu
      § Semantic Analysis, Data Harvesting, ...
      § Hadoop, Azure, BigInsights, ...
      § Storm
    - BigData.be co-organizer
  • 3. BigData
    A lot of technical buzz
    - Hadoop, Storm, Pig, ...
    Seems to be only for the big players
    - Google, Facebook, LinkedIn, Twitter, ...
    So why should ‘we’ care?
    - we = Startups, Small and Medium Enterprises (SSME)
  • 4. What BigData Promises
    Ability to store and process large amounts of data
    - Scalable in hardware and software
    - Scalable in budget
    Which means your budget can grow with your data
    - Start small, with a small cluster
    - The more data you want to manage, the more systems you add
    Lower-cost systems
    - Several low- to medium-end systems
    - Instead of one big, expensive one
  • 5. But What Can You Do with It?
    Analyze your data with higher precision
    Analyze historical facts
    Prevent data loss
    - Infrastructure failure
    - Human errors
    Eliminate data silos
  • 6. High-Precision Analysis
    Traditional technologies
    - Problem:
      § Unable to store all data
    - Solutions:
      § Sharding
      § Aggregating data
    - Problems:
      § Sharding has a high maintenance cost
      § Sharding is complex for users and apps
      § Manual sharding adds a high risk (see the sketch after this slide)
      § Data aggregation causes a loss in data precision
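
    To make the maintenance pain concrete, here is a minimal sketch (not from the slides; the key scheme and shard counts are invented for illustration) of manual hash sharding, showing how adding a single shard re-routes most keys and forces a hand-rolled data migration:

```python
# Hypothetical illustration: why manual sharding is fragile. Keys are
# routed to shards by hashing; adding a shard changes the routing for
# most existing keys, forcing a costly data migration.

NUM_SHARDS = 4

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a record to a shard by hashing its key."""
    return hash(key) % num_shards

# The application (and every query) must know the sharding scheme:
customer_id = "customer-42"
print(f"{customer_id} lives on shard {shard_for(customer_id)}")

# Growing from 4 to 5 shards silently re-routes most keys,
# so the data would have to be migrated by hand:
moved = sum(
    1 for i in range(10_000)
    if shard_for(f"customer-{i}", 4) != shard_for(f"customer-{i}", 5)
)
print(f"{moved} of 10000 keys would move when adding a shard")
```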
  • 7. High-Precision Analysis
    BigData allows us to
    - Store and process large amounts of data
      § So no need to aggregate
    - ‘Forget’ about sharding
      § BigData technologies do this for you
      § Which makes it predictable
      § And transparent
    But
    - You have to configure it correctly
    - You don’t have ad-hoc querying (yet)
  • 8. Analyze Historical Facts
    Data warehouse
    - Built on top of parameters
    What if we forget to add a parameter?
    - Add the parameter
    - Start gathering information for that parameter
    Problem:
    - We will only have information from the moment we add the parameter!
  • 9. Analyze Historical Facts
    Let’s store everything
    Determine the parameters later
    - By humans
    - By machine-learning algorithms
    Analysis will process all data
    What if we forget to add a parameter?
    - Add the parameter
    - Regenerate your reports (see the sketch after this slide)
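
    A minimal sketch of this “store everything, decide later” idea (not from the slides; the events and the late-added browser parameter are invented for illustration):

```python
# Hypothetical illustration: "store first, ask later". We keep every raw
# event instead of pre-aggregated totals, so a parameter invented later
# (here: browser) can still be reported over ALL history.

raw_events = [
    {"ts": "2012-01-03", "page": "/home",    "browser": "firefox"},
    {"ts": "2012-04-19", "page": "/pricing", "browser": "chrome"},
    {"ts": "2012-09-30", "page": "/home",    "browser": "chrome"},
]

def report(events, parameter):
    """Regenerate a count-per-value report for any parameter, old or new."""
    counts = {}
    for event in events:
        value = event.get(parameter, "unknown")
        counts[value] = counts.get(value, 0) + 1
    return counts

# 'browser' was never planned for, yet the report covers every event:
print(report(raw_events, "browser"))   # {'firefox': 1, 'chrome': 2}
```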
  • 10. Analyze Historical Facts
    Conclusion
    - Traditionally: ask first, store later
    - BigData: store first, ask later
  • 11. Prevent Data Loss
    Traditional technologies
    - Machine failure
      § I hope you have a backup from yesterday?
    - Human error
      § Whoops, I deleted those records
      § I hope you have a backup from yesterday?
    - So in the worst case, you lose one day of data
  • 12. Prevent Data Loss
    BigData allows us to
    - Survive machine failure without data loss
    - Survive human error without data loss
    But
    - You need a data model which supports this
      § An incremental model (see the sketch after this slide)
    - You need to restrict operations
      § Only append data; no updates or deletes
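
    A minimal sketch of such an append-only, incremental model (not from the slides; the Event type and key/value state are invented for illustration). Because state is derived by replaying the log, a human error is just another event and the old state stays recoverable:

```python
# Hypothetical illustration: an append-only, incremental data model.
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    kind: str       # "set" or "delete"
    key: str
    value: str = ""

log = []                                  # append-only: no update, no delete

def apply(events):
    """Fold the event log into the current key/value state."""
    state = {}
    for e in events:
        if e.kind == "set":
            state[e.key] = e.value
        elif e.kind == "delete":
            state.pop(e.key, None)
    return state

log.append(Event("set", "customer-42", "Alice"))
log.append(Event("delete", "customer-42"))        # the "whoops" moment

print(apply(log))        # {} - the mistake is visible...
print(apply(log[:-1]))   # {'customer-42': 'Alice'} - ...and reversible
```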
  • 13. Prevent Data Loss
    Conclusion
    - Traditional technologies
      § Require very advanced setups to handle machine failure
      § Allow you to go back to yesterday’s state
    - BigData
      § Requires knowledge of how the failover algorithms work
      § Expects failure most of the time
      § Allows you to go back to the previous state
  • 14. Eliminate Data Silos
    Departments having their own data sources
    - Start to modify that data
    - Start to treat it as their master data
    - Not coupled to the master dataset
    This causes a lot of overhead
    - Silos miss master-data updates
    - Business decisions are based on silo data, not the more accurate master data
    No obvious way out
  • 15. Eliminate Data Silos
    Consolidate the silos
    - Identify the silos
    - Import the data from the silos into one store
    - Reconstruct master data based on silo rules and priorities (see the sketch after this slide)
    [Diagram: Sales, Marketing, and Support silos feeding into a single Master Data store]
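
    A minimal sketch of such a priority-based reconstruction (not from the slides; the silo names, priorities, and records are invented for illustration). When silos disagree on a field, the value from the most trusted silo wins:

```python
# Hypothetical illustration: rebuilding master data from silo copies.
SILO_PRIORITY = {"sales": 1, "support": 2, "marketing": 3}  # 1 = most trusted

silo_records = [
    ("marketing", {"id": 42, "email": "alice@old-domain.example"}),
    ("sales",     {"id": 42, "email": "alice@example.com"}),
    ("support",   {"id": 42, "phone": "+32 2 555 0100"}),
]

def consolidate(records):
    """Merge silo records per id, preferring trusted silos field by field."""
    # Apply the least trusted silos first so trusted silos overwrite them.
    ordered = sorted(records, key=lambda r: SILO_PRIORITY[r[0]], reverse=True)
    master = {}
    for _, rec in ordered:
        master.setdefault(rec["id"], {}).update(rec)
    return master

print(consolidate(silo_records))
# {42: {'id': 42, 'email': 'alice@example.com', 'phone': '+32 2 555 0100'}}
```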
  • 16. Eliminate Data Silos
    Generate read-only data models per application (see the sketch after this slide)
    Data changes are sent to the master data
    - Using a specific API
    - Using database triggers
    [Diagram: read-only models M1 (ERP/CRM DB), M2 (Public API), and M3 (Data Warehouse), all derived from the Master Data]
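
    A minimal sketch of per-application read models (not from the slides; the view functions and the change_email write path are invented for illustration). Each application reads its own projection; writes go through one sanctioned path on the master store, never through the views:

```python
# Hypothetical illustration: read-only projections over master data.
master_data = {
    42: {"name": "Alice", "email": "alice@example.com", "segment": "SME"},
}

def crm_view(master):
    """Read-only projection for the CRM: contact details only."""
    return {cid: {"name": r["name"], "email": r["email"]}
            for cid, r in master.items()}

def warehouse_view(master):
    """Read-only projection for the data warehouse: segmentation only."""
    return {cid: {"segment": r["segment"]} for cid, r in master.items()}

def change_email(master, customer_id, new_email):
    """The single write path; views are simply regenerated afterwards."""
    master[customer_id]["email"] = new_email

change_email(master_data, 42, "alice@new.example")
print(crm_view(master_data))        # sees the update
print(warehouse_view(master_data))  # other fields stay consistent
```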
  • 17. Eliminate Data Silos
    Conclusion
    - You will have to consolidate
    - But you need a structural solution
    - Which can be provided by BigData
    - In a flexible and future-proof way
  • 18. Conclusion
    There is a lot to think about
    But BigData can do a lot of things
    - A lot more than I explained today
    For a reasonable price
    And you are not alone
    - bigdata.be
    - datacrunchers.eu
  • 19. Questions?