Big data, why care

BigData, Why Care?

Saturday 20 October 2012

Speaker
Daan Gerits
- BigData Architect
- DataCrunchers.eu
§ Semantic Analysis, Data Harvesting, ...
§ Hadoop, Azure, BigInsights, ...
§ Storm
BigData.be co-organizer

Datacrunchers Consultancy Services


BigData
A lot of technical fuss
- Hadoop, Storm, Pig, ...
Seems to be only for the big players
- Google, Facebook, LinkedIn, Twitter, ...
So why should ‘we’ care?
- we = Startups, Smaller and Medium Enterprises (SSME)



What BigData Promises
Ability to store and process large amounts of data
- Scalable in hardware and software
- Scalable in budget
Which means your budget can grow with your data
- start small with a small cluster
- the more data you want to manage, the more systems you add
Lower cost systems
- Several low to medium end systems
- instead of 1 big expensive one
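To make "the more data, the more systems" concrete, here is a minimal capacity-planning sketch. It assumes a replication factor of 3 (the HDFS default) and a hypothetical usable capacity per node; the function name `nodes_needed` and all figures are illustrative, not taken from the talk.

```python
import math

def nodes_needed(raw_data_tb: float, usable_tb_per_node: float, replication: int = 3) -> int:
    """Estimate how many commodity nodes are needed to store raw_data_tb terabytes.

    Every block is kept `replication` times, so the cluster must hold
    raw_data_tb * replication terabytes in total.
    """
    if usable_tb_per_node <= 0:
        raise ValueError("usable_tb_per_node must be positive")
    total_tb = raw_data_tb * replication
    # Round up: a partially filled node is still a whole machine; keep at least one.
    return max(1, math.ceil(total_tb / usable_tb_per_node))

# Hypothetical growth path: same node type, just more of them as data grows.
for raw_tb in (2, 10, 50):
    print(f"{raw_tb} TB raw -> {nodes_needed(raw_tb, usable_tb_per_node=8)} node(s)")
```

The point is the linear shape: growing the data means adding more of the same low to medium end machines, not replacing one big machine with a bigger one.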



But what can you do with it?
Analyze your data with higher precision
Analyze historical facts
Prevent Data Loss
- Infrastructure failure
- Human errors
Eliminate data silos



High Precision Analysis
Traditional Technologies
- Problems:
§ Unable to store all data
- Solutions:
§ Sharding
§ Aggregate data
- Problems:
§ Sharding has a high maintenance cost
§ Sharding is complex for users and apps
§ Manual sharding adds significant risk
§ Data Aggregation causes loss in data precision
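Why manual sharding is costly and risky shows up even in the simplest hand-rolled scheme, `shard = key % number_of_shards`. The sketch below is illustrative only (integer keys stand in for record IDs, and the helper names are hypothetical): it counts how many records land on a different shard when a fifth shard is added to four. Every one of those records has to be migrated, and every application that computes shard locations has to switch over at the same moment.

```python
def shard_for(key: int, num_shards: int) -> int:
    """Naive manual sharding: route a record by its key modulo the shard count."""
    return key % num_shards

def keys_that_move(keys, old_shards: int, new_shards: int) -> int:
    """Count records whose shard changes when the number of shards changes."""
    return sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))

record_ids = range(1000)
moved = keys_that_move(record_ids, old_shards=4, new_shards=5)
print(f"{moved} of {len(record_ids)} records must be migrated")  # prints "800 of 1000 records must be migrated"
```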



High Precision Analysis
BigData allows us to
- Store and process large amounts of data
§ So no need to aggregate
- ‘Forget’ about sharding
§ BigData technologies do this for you
§ Makes it predictable
§ And transparent
But
- You have to configure it correctly
- You don’t have ad-hoc querying (yet)
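The precision argument can be illustrated with a small, hypothetical order log. Once it is reduced to a per-day count and sum (a typical aggregate when not all data can be stored), the daily average survives, but a question nobody anticipated, such as the largest single order per day, can only be answered from the raw records. Data and function names are made up for illustration.

```python
# Hypothetical raw order log: (day, amount in euro).
orders = [
    ("2012-10-18", 20.0), ("2012-10-18", 480.0), ("2012-10-18", 25.0),
    ("2012-10-19", 170.0), ("2012-10-19", 180.0), ("2012-10-19", 175.0),
]

# Traditional approach: only a (count, sum) aggregate per day is kept.
aggregate = {}
for day, amount in orders:
    count, total = aggregate.get(day, (0, 0.0))
    aggregate[day] = (count + 1, total + amount)

def average_from_aggregate(day):
    count, total = aggregate[day]
    return total / count

# BigData approach: every raw record is kept, so new questions can still be asked.
def max_order_from_raw(day):
    return max(amount for d, amount in orders if d == day)

for day in sorted(aggregate):
    print(day, aggregate[day], average_from_aggregate(day), max_order_from_raw(day))
# Both days have the same aggregate (3 orders, 525.0 euro), so the aggregate
# alone cannot reveal that one day contains a single 480 euro outlier.
```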



Analyze Historical Facts
Data Warehouse
- Built on top of predefined parameters
What if we forget to add a parameter?
- Add the parameter
- Start gathering information for that parameter
Problem:
- We will only have information from the moment we add the parameter!
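The "forgotten parameter" problem is easiest to see in a warehouse-style loader that keeps only the parameters configured at load time. In this hypothetical sketch, the `channel` parameter is added after the first two events were loaded, so those rows can never report it: the information was discarded at ingestion.

```python
# Hypothetical incoming events; each carries more fields than the warehouse keeps.
events = [
    {"day": "2012-10-18", "amount": 20.0, "channel": "web"},
    {"day": "2012-10-18", "amount": 35.0, "channel": "shop"},
    {"day": "2012-10-19", "amount": 50.0, "channel": "web"},
]

parameters = ["day", "amount"]  # the parameters the warehouse was designed with
warehouse = []

def load(event):
    """Ask first, store later: keep only the currently configured parameters."""
    warehouse.append({p: event.get(p) for p in parameters})

load(events[0])
load(events[1])
parameters.append("channel")  # we notice we forgot a parameter...
load(events[2])               # ...but only rows loaded from now on carry it

for row in warehouse:
    print(row)
```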



Analyze Historical Facts
Let’s store everything
Determine the parameters later
- by humans
- by machine learning algorithms
Analysis will process all data
What if we forget to add a parameter?
- add the parameter
- regenerate your reports
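By contrast, the store-everything approach keeps each raw event untouched and derives parameters only when a report is generated. In this minimal sketch (the `report` function and field names are hypothetical), adding `channel` to a report simply means regenerating it, and the result covers all history.

```python
from collections import defaultdict

# Store first: every raw event is kept exactly as it arrived.
raw_events = [
    {"day": "2012-10-18", "amount": 20.0, "channel": "web"},
    {"day": "2012-10-18", "amount": 35.0, "channel": "shop"},
    {"day": "2012-10-19", "amount": 50.0, "channel": "web"},
]

def report(events, parameters):
    """Ask later: total amount grouped by whichever parameters are chosen now."""
    totals = defaultdict(float)
    for event in events:
        key = tuple(event.get(p) for p in parameters)
        totals[key] += event["amount"]
    return dict(totals)

print(report(raw_events, ["day"]))
# Forgot a parameter? Add it and regenerate: all historical events are included.
print(report(raw_events, ["day", "channel"]))
```

The cost of this design is that every report reprocesses all stored data, which is exactly the workload BigData platforms are built to scale out.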



Analyze Historical Facts
Conclusion
- Traditionally: ask first, store later
- BigData: store first, ask later



Prevent Data Loss
Traditional technologies
- Machine Failure
§ I hope you have a backup from yesterday?
- Human Error
§ Whoops, I deleted those records
§ I hope you have a backup from yesterday?
- So in the worst case, you lose one day of data



Prevent Data Loss
BigData allows us to
- Survive machine failure without data loss
- Survive human error without data loss
But
- You need a data model that supports this
§ Incremental model
- You need to restrict operations
§ Only append data, no updates or deletes
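An incremental, append-only model can be sketched as an event log in which a delete is itself recorded as a new fact. Current state is derived by replaying the log, so an accidental delete can be undone by replaying only the events before the mistake. The event format and the `append`/`replay` helpers are hypothetical, chosen only to illustrate the idea.

```python
# Append-only log: facts are only ever added, never updated or removed.
log = []

def append(op, key, value=None):
    log.append({"seq": len(log), "op": op, "key": key, "value": value})

def replay(events, upto=None):
    """Rebuild current state from the log, optionally stopping before seq `upto`."""
    state = {}
    for e in events:
        if upto is not None and e["seq"] >= upto:
            break
        if e["op"] == "put":
            state[e["key"]] = e["value"]
        elif e["op"] == "delete":  # a delete is a recorded fact, not an erasure
            state.pop(e["key"], None)
    return state

append("put", "customer:1", "Alice")
append("put", "customer:2", "Bob")
append("delete", "customer:2")       # whoops, I deleted that record

print(replay(log))                   # current state, without customer:2
mistake = 2                          # sequence number of the faulty delete
print(replay(log, upto=mistake))     # state just before the mistake
```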



Prevent Data Loss
Conclusion
- Traditional technologies
§ require very advanced setups to handle machine failure
§ allow you to go back to yesterday’s state
- BigData
§ requires knowledge of how the failover algorithms work
§ treats failure as the normal case, not the exception
§ allows you to go back to the previous state
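Surviving machine failure rests on replication: each block is stored on several distinct machines, so losing one machine loses no data. The sketch below is a deliberately simplified round-robin placement with three replicas (the HDFS default replication factor); real systems such as HDFS also take racks into account. Node names, block IDs and helper functions are hypothetical.

```python
from itertools import cycle

def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    start = cycle(range(len(nodes)))
    for block in block_ids:
        first = next(start)
        placement[block] = {nodes[(first + i) % len(nodes)] for i in range(replication)}
    return placement

def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a healthy node."""
    return {b for b, holders in placement.items() if holders - failed}

nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = [f"blk_{i}" for i in range(10)]
placement = place_blocks(blocks, nodes)

# Expect failure: take one machine down and check that nothing is lost.
survivors = readable_blocks(placement, failed={"node3"})
print(len(survivors), "of", len(blocks), "blocks still readable")
```

With three replicas on distinct nodes, any two machines can fail at once without losing a block; knowing where that limit lies is part of understanding the failover behaviour.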



Eliminate Data Silos
Departments with their own data sources
- start to modify that data
- start to treat it as their master data
- not coupled to the master dataset
Causes a lot of overhead
- Silos miss master data updates
- Business decisions based on silo data, not the more accurate master data
No obvious way out



Consolidate the silos
- Identify the silos
- Import the data from the silos into one store
- Reconstruct master data based on silo rules and priorities

[Diagram: the Sales, Marketing and Support silos are consolidated into the Master Data]
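Reconstructing master data from silos can be sketched as a field-by-field merge in which each field is taken from the silo ranked most trusted for it. The silo contents and the priority table below are hypothetical examples of "silo rules and priorities", not a prescribed scheme.

```python
# Hypothetical silo contents: each department keeps its own copy of customer 42.
silos = {
    "sales":     {42: {"name": "ACME NV", "email": "info@acme.example"}},
    "marketing": {42: {"name": "Acme", "email": "news@acme.example", "segment": "SME"}},
    "support":   {42: {"name": "ACME nv", "phone": "+32 3 000 00 00"}},
}

# Hypothetical rules: for each field, silos ranked from most to least trusted.
priorities = {
    "name":    ["sales", "support", "marketing"],
    "email":   ["sales", "marketing"],
    "phone":   ["support", "sales"],
    "segment": ["marketing"],
}

def reconstruct(record_id):
    """Rebuild one master record, taking each field from the highest-ranked silo that has it."""
    master = {"id": record_id}
    for field, ranked_silos in priorities.items():
        for silo in ranked_silos:
            value = silos[silo].get(record_id, {}).get(field)
            if value is not None:
                master[field] = value
                break
    return master

print(reconstruct(42))
```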



Generate read-only data models per application
Data changes are sent to the master data
- using a specific API
- using database triggers

[Diagram: the Master Data generates read-only models M1 (ERP/CRM DB), M2 (Public API) and M3 (DataWarehouse)]
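The flow of changes going to the master data, with every application reading from its own generated, read-only model, might look like this minimal sketch. The view definitions are hypothetical and only loosely named after M1 (ERP/CRM DB), M2 (Public API) and M3 (DataWarehouse); `submit_change` stands in for "a specific API", and `types.MappingProxyType` is used simply to make each generated view read-only in Python.

```python
from types import MappingProxyType

# Master data: an append-only list of change events.
master = []

def submit_change(entity_id, field, value, source):
    """Applications send changes to the master data instead of editing their own model."""
    master.append({"id": entity_id, "field": field, "value": value, "source": source})

def current_entities():
    """Replay all changes in order; the latest value for each field wins."""
    entities = {}
    for change in master:
        entities.setdefault(change["id"], {})[change["field"]] = change["value"]
    return entities

# Hypothetical per-application projections, loosely named after M1, M2 and M3.
VIEWS = {
    "erp_crm":    lambda e: dict(e),
    "public_api": lambda e: {k: v for k, v in e.items() if k == "name"},
    "warehouse":  lambda e: {k: v for k, v in e.items() if k in ("name", "segment")},
}

def generate_views():
    """Regenerate every application's read-only model from the master data."""
    entities = current_entities()
    return {
        app: MappingProxyType({eid: MappingProxyType(project(e)) for eid, e in entities.items()})
        for app, project in VIEWS.items()
    }

submit_change(42, "name", "ACME NV", source="erp_crm")
submit_change(42, "email", "info@acme.example", source="erp_crm")
submit_change(42, "segment", "SME", source="public_api")
views = generate_views()
print(dict(views["warehouse"][42]))
```

Because the views are regenerated rather than edited, an application that tries to write to its own model gets a `TypeError`; its only way to change data is through the master data.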



Conclusion
- You will have to consolidate
- But you need a structural solution
- Which can be provided by BigData
- In a flexible and future-proof way



Conclusion
There is a lot to think about
But BigData can do a lot of things
- A lot more than I explained today
For a reasonable price
And you are not alone
- bigdata.be
- datacrunchers.eu



Questions?

