Hi, my name is Abhijit Lele. I am a Solutions Engineer at Hortonworks. I help our customers understand and achieve their business and technical goals with Hadoop and the Big Data ecosystem in general.
So if we turn our original assumptions on their heads, we might come up with an alternate set of rules that allows for a new way of thinking about large data stores.
What is Big Data?
• Big does not always have to mean petabytes
• Big means too big for traditional systems to handle efficiently
Big Data Facts
• Twitter generates 8 TB of data every day
• eBay data warehouse is 10+ PB
• Facebook data warehouse is 36+ PB
• Yahoo! has 100+ PB of data
• Google scans and indexes 500+ PB of data
Data Types
• Structured
  – Pre-defined schema
  – Example: relational database system
• Semi-structured
  – Irregular or inconsistent structure
  – Cannot easily be stored in the rows and tables of a database
  – Examples: logs, tweets
• Unstructured
  – No identifiable structure
  – Examples: free-form text, reports, customer feedback forms
Copyright Hortonworks 2012
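As a rough illustration of the three data types above, the following Python sketch builds one tiny sample of each. The column names, log records, and feedback sentence are all made up for this example; the point is only that structured data fits a fixed schema, semi-structured records carry their own (varying) field names, and unstructured text needs processing before any structure appears.

```python
import json

# Structured: every record follows one pre-defined schema.
# (SCHEMA and the sample row are hypothetical.)
SCHEMA = ("customer_id", "name", "balance")
row = (42, "Alice", 310.50)
structured = dict(zip(SCHEMA, row))

# Semi-structured: each record describes itself, but fields vary per record,
# so the data does not map cleanly onto fixed rows and columns.
log_lines = [
    '{"ts": "2012-06-01T10:00:00", "level": "INFO", "msg": "login"}',
    '{"ts": "2012-06-01T10:00:05", "level": "ERROR", "msg": "timeout", "retry": 3}',
]
events = [json.loads(line) for line in log_lines]
fields_seen = sorted({key for event in events for key in event})

# Unstructured: free-form text; any structure has to be inferred by processing.
feedback = "The checkout page was slow, but support was very helpful."
word_count = len(feedback.split())

print(structured)
print(fields_seen)
print(word_count)
```

Note that the second log record has a `retry` field the first one lacks, which is exactly what makes a fixed table layout awkward for this kind of data.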
Characteristics of Big Data
• Volume
• Velocity
• Variety
• Value
Problem with Legacy Solution
• Expensive
  – Scale up costs lots of $$
• Rigid
• Stale data
Hadoop Approach
• Process data locally
• Expect hardware failures
• Handle failover elegantly
• Duplicate a small percentage of the data to small groups (versus entire database)
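The ideas of expecting failures and duplicating data to small groups of machines can be sketched in a few lines of Python. This is a toy model, not Hadoop's actual placement policy: the node names, block names, round-robin placement, and the replication factor of 3 are all assumptions made for illustration.

```python
REPLICATION = 3  # assumed replication factor for this sketch


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Copy each block to a small group of `replication` nodes (round-robin).

    Assumes replication <= len(nodes) so the nodes in each group are distinct.
    """
    placement = {}
    n = len(nodes)
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + k) % n] for k in range(replication)]
    return placement


def read_block(block, placement, failed):
    """Return the first live node that holds the block, skipping failed nodes."""
    for node in placement[block]:
        if node not in failed:
            return node
    raise RuntimeError(f"all replicas of {block} are lost")


# Hypothetical five-node cluster holding four blocks.
nodes = ["node1", "node2", "node3", "node4", "node5"]
blocks = ["blk_0", "blk_1", "blk_2", "blk_3"]
placement = place_blocks(blocks, nodes)

# node1 has failed: the read falls over to the next replica in the group.
print(read_block("blk_0", placement, failed={"node1"}))
```

Each block lives on only three of the five nodes, so losing one machine costs a small fraction of the replicas rather than a full copy of the database, and a reader simply moves on to the next node in the group.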
Compare with RDBMS