Big Data, Why Care
Published in Business
  • 1. BigData, Why Care? (Saturday 20 October 12)
  • 2. Speaker: Daan Gerits
    - BigData Architect at DataCrunchers.eu
      § Semantic Analysis, Data Harvesting, ...
      § Hadoop, Azure, BigInsights, ...
      § Storm
    - BigData.be co-organizer
    Datacrunchers Consultancy Services
  • 3. BigData
    - A lot of technical fuzz: Hadoop, Storm, Pig, ...
    - Seems to be only for the big players: Google, Facebook, LinkedIn, Twitter, ...
    - So why should 'we' care?
      § we = Startups, Small and Medium Enterprises (SSME)
  • 4. What BigData Promises
    - Ability to store and process large amounts of data
      § Scalable in hardware and software
      § Scalable in budget
    - Which means your budget can grow with your data
      § Start small with a small cluster
      § The more data you want to manage, the more systems you add
    - Lower-cost systems
      § Several low- to medium-end systems instead of one big, expensive one
  • 5. But What Can You Do With It?
    - Analyze your data with higher precision
    - Analyze historical facts
    - Prevent data loss
      § Infrastructure failure
      § Human errors
    - Eliminate data silos
  • 6. High Precision Analysis
    Traditional technologies
    - Problem:
      § Unable to store all data
    - Solutions:
      § Sharding
      § Aggregating data
    - New problems:
      § Sharding has a high maintenance cost
      § Sharding is complex for users and apps
      § Manual sharding adds a high risk
      § Data aggregation causes a loss in data precision
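The precision loss from aggregation can be made concrete with a small sketch (the event data here is invented for illustration): once events are rolled up into a single total, questions that need the original grain can no longer be answered from the aggregate alone.

```python
from collections import Counter

# Hypothetical raw page-view events: (user, hour of day)
events = [("alice", 9), ("bob", 9), ("alice", 10), ("alice", 11), ("carol", 11)]

# Pre-aggregating to one daily total saves storage...
daily_total = len(events)

# ...but destroys precision: per-hour and per-user questions
# can only be answered if the raw events were kept.
views_per_hour = Counter(hour for _, hour in events)
distinct_users = len({user for user, _ in events})

print(daily_total)      # 5
print(views_per_hour)   # 2 views at 9h, 1 at 10h, 2 at 11h
print(distinct_users)   # 3
```

If only `daily_total` had been stored, neither `views_per_hour` nor `distinct_users` would be recoverable, which is exactly the precision loss the slide warns about.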
  • 7. High Precision Analysis
    BigData allows us to
    - Store and process large amounts of data
      § So there is no need to aggregate
    - 'Forget' about sharding
      § BigData technologies do this for you
      § Which makes it predictable
      § And transparent
    But
    - You have to configure it correctly
    - You don't have ad-hoc querying (yet)
  • 8. Analyze Historical Facts
    Data warehouse
    - Built on top of parameters
    What if we forget to add a parameter?
    - Add the parameter
    - Start gathering information for that parameter
    Problem:
    - We will only have information from the moment we add the parameter!
  • 9. Analyze Historical Facts
    Let's store everything and determine the parameters later
    - by humans
    - by machine learning algorithms
    Analysis will process all data
    What if we forget to add a parameter?
    - Add the parameter
    - Regenerate your reports
  • 10. Analyze Historical Data
    Conclusion
    - Traditionally: ask first, store later
    - BigData: store first, ask later
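"Store first, ask later" can be sketched in a few lines (the order records are made up): because raw events are kept from day one, a metric defined months later can still be computed over all of history, not just from the day it was added.

```python
# Hypothetical raw order events, stored since day one.
orders = [
    {"day": 1, "customer": "acme", "amount": 120.0},
    {"day": 1, "customer": "initech", "amount": 80.0},
    {"day": 2, "customer": "acme", "amount": 200.0},
]

def revenue_per_customer(events):
    """A 'parameter' we only thought of later: total revenue per customer.
    Since the raw events survive, the report covers the full history."""
    totals = {}
    for e in events:
        totals[e["customer"]] = totals.get(e["customer"], 0.0) + e["amount"]
    return totals

print(revenue_per_customer(orders))  # {'acme': 320.0, 'initech': 80.0}
```

Had only a pre-chosen set of parameters been warehoused, this new report would start empty today; regenerating it from raw events is the whole point of the "store first" approach.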
  • 11. Prevent Data Loss
    Traditional technologies
    - Machine failure
      § I hope you have a backup from yesterday?
    - Human error
      § Whoops, I deleted those records
      § I hope you have a backup from yesterday?
    - So in the worst case, you lose one day of data
  • 12. Prevent Data Loss
    BigData allows us to
    - Survive machine failure without data loss
    - Survive human error without data loss
    But
    - You need a data model which supports this
      § An incremental model
    - You need to restrict operations
      § Only append data; no updates or deletes
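A minimal sketch of such an append-only, incremental model (the record layout is illustrative, not from the talk): nothing is ever updated in place, a "delete" is itself an appended tombstone record, and any earlier state can be reconstructed by replaying a prefix of the log.

```python
# Append-only log: every change, including a "delete", is a new record.
log = []

def append(op, key, value=None):
    log.append({"op": op, "key": key, "value": value})

def state(upto=None):
    """Fold the log (optionally only its first `upto` records)
    into the key/value state as of that point."""
    result = {}
    for record in log[:upto]:
        if record["op"] == "put":
            result[record["key"]] = record["value"]
        elif record["op"] == "del":
            result.pop(record["key"], None)
    return result

append("put", "user:1", "Alice")
append("put", "user:2", "Bob")
append("del", "user:1")          # the human error: "whoops, deleted"

print(state())        # {'user:2': 'Bob'}
print(state(upto=2))  # state before the mistake: both users present
```

Because the mistaken delete is just one more record, recovering from it means replaying the log up to the record before it, rather than restoring yesterday's backup.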
  • 13. Prevent Data Loss
    Conclusion
    - Traditional technologies
      § Require very advanced setups to handle machine failure
      § Allow you to go back to yesterday's state
    - BigData
      § Requires knowledge of how the failover algorithms work
      § Expects failure most of the time
      § Allows you to go back to the previous state
  • 14. Eliminate Data Silos
    Departments having their own data sources
    - start to modify that data
    - start to treat it as their master data
    - are not coupled to the master dataset
    This causes a lot of overhead
    - Silos miss master data updates
    - Business decisions are based on silo data, not the more accurate master data
    No obvious way out
  • 15. Eliminate Data Silos
    Consolidate the silos
    - Identify the silos
    - Import the data from the silos into one store
    - Reconstruct master data based on silo rules and priorities
    (Diagram: Sales, Marketing, and Support silos feeding into one Master Data store)
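The "reconstruct by rules and priorities" step might look like the following sketch (the silo names, fields, and priorities are made up): when several silos hold a record for the same key, fields from higher-priority silos win over lower-priority ones.

```python
# Hypothetical silo exports: each maps a customer id to its record fields.
silos = {
    "sales":     {"c1": {"email": "c1@sales.example", "phone": "111"}},
    "marketing": {"c1": {"email": "c1@mkt.example"},
                  "c2": {"email": "c2@mkt.example"}},
    "support":   {"c1": {"phone": "999"}},
}

# Lower number = higher priority when silos disagree on a field.
priority = {"sales": 0, "support": 1, "marketing": 2}

def reconstruct_master(silos, priority):
    master = {}
    # Apply silos from lowest to highest priority, so that fields from
    # higher-priority silos overwrite lower-priority ones.
    for name in sorted(silos, key=lambda n: priority[n], reverse=True):
        for key, fields in silos[name].items():
            master.setdefault(key, {}).update(fields)
    return master

print(reconstruct_master(silos, priority))
# c1 takes both fields from sales (highest priority);
# c2 exists only in marketing, so it comes from there.
```

Real consolidation rules are usually per-field and per-source, but the shape is the same: one pass over all silos, resolved by an explicit priority order instead of by whichever copy a department happens to trust.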
  • 16. Eliminate Data Silos
    Generate read-only data models per application
    Data changes are sent to the master data
    - using a specific API
    - using database triggers
    (Diagram: Master Data feeding read-only models M1 (ERP/CRM DB), M2 (Public API), and M3 (Data Warehouse))
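One way to read the "read-only model per application" idea, sketched with invented field names: each consuming application gets a projection derived from master data with just the fields it needs, while all writes flow back through the master store rather than into the projections.

```python
# Hypothetical master data store.
master = {
    "c1": {"name": "Acme", "email": "c1@example.com", "segment": "enterprise"},
    "c2": {"name": "Initech", "email": "c2@example.com", "segment": "smb"},
}

def project(master, fields):
    """Derive a read-only per-application view containing only `fields`."""
    return {key: {f: rec[f] for f in fields if f in rec}
            for key, rec in master.items()}

crm_view = project(master, ["name", "email"])    # for the ERP/CRM application
warehouse_view = project(master, ["segment"])    # for the data warehouse

print(crm_view["c1"])        # {'name': 'Acme', 'email': 'c1@example.com'}
print(warehouse_view["c2"])  # {'segment': 'smb'}
```

Since the views are regenerated from master data, they can never drift into new silos; a change accepted by the master store simply shows up in every projection on the next rebuild.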
  • 17. Eliminate Data Silos
    Conclusion
    - You will have to consolidate
    - But you need a structural solution
    - Which can be provided by BigData
    - In a flexible and future-proof way
  • 18. Conclusion
    There is a lot to think about
    But BigData can do a lot of things
    - A lot more than I explained today
    For a reasonable price
    And you are not alone
    - bigdata.be
    - datacrunchers.eu
  • 19. Questions?