Understanding Big Data...
...or trying to understand an awesome new paradigm
POST /hello-world
Javier Lafora
(@eLafo)
A small walk through history
Data Growth
“90% of the data in the world today has been
created in the last two years alone.” - IBM
http://www-01.ibm.com/software/in/data/bigdata/
Data Growth
“There were 5 Exabytes of information created
between the dawn of civilization through 2003,
but that much information is now created every 2
days.” - Eric Schmidt, Google CEO, 2010
Traditional approach
One powerful computer for all the data...
Traditional approach
...until it reaches its storage limit...
Traditional approach
...or until it reaches its processing limit
Distribute your data
Parallelize your computation
Distributing your data
Distributed filesystems
A DFS manages files and folders across multiple computers.
It serves the same purpose as a traditional file system, but is designed to provide file storage and controlled access to files over local and wide area networks.
You don't want to worry about where a file is located
You don't want to worry about replicating data
You don't want to worry about managing failures
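To make that abstraction concrete, here is a minimal sketch (not from the slides) using Hadoop's standard org.apache.hadoop.fs.FileSystem API; the NameNode address and file path are placeholders. The client only names a logical path and never says which machines store the blocks or how many replicas exist:

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DfsHelloWorld {
    public static void main(String[] args) throws Exception {
        // Cluster address is a placeholder; normally it comes from core-site.xml.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);

        // We only name a logical path; HDFS decides which DataNodes hold the
        // blocks and keeps the configured number of replicas for us.
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file)) {
            out.write("hello, distributed world\n".getBytes(StandardCharsets.UTF_8));
        }

        // Reading metadata works the same way: no node names, no replica handling.
        System.out.println("File size: " + fs.getFileStatus(file).getLen() + " bytes");
    }
}

If a DataNode holding one of the replicas fails, the client code above does not change; the file system handles re-replication behind the scenes.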
Parallelizing computation
You don't want to worry about breaking computation into pieces
You don't want to worry about scaling your code
Hadoop to the rescue
Distributed file system: HDFS
Parallel computation: MapReduce
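As an illustration of the programming model (a sketch using the standard Hadoop MapReduce Java API, not code from the slides), here is the classic word-count job. You only write a map function and a reduce function; Hadoop splits the input, runs mappers near the data, shuffles the intermediate pairs and reruns failed tasks for you:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: for each input line, emit (word, 1). Many mappers run in parallel,
    // one per input split, on whichever nodes hold that part of the data.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce: all counts for the same word arrive together; sum them up.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths are whatever you pass on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, a job like this is typically launched with something along the lines of "hadoop jar wordcount.jar WordCount /input /output"; the same code runs unchanged on one node or on hundreds.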
Three different problems
Volume
Velocity (streaming / real time)
Variety
Structured data
Semi-structured data
Unstructured data
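As an illustrative aside (not from the slides), the same fact can show up in all three shapes, and a big data platform is expected to handle each of them:

public class DataVariety {
    public static void main(String[] args) {
        // Structured: fixed schema, e.g. a row in a relational table.
        String structured = "INSERT INTO users (id, name, city) VALUES (42, 'Ada', 'Madrid');";

        // Semi-structured: self-describing but flexible, e.g. JSON.
        String semiStructured = "{\"id\": 42, \"name\": \"Ada\", \"city\": \"Madrid\", \"tags\": [\"admin\"]}";

        // Unstructured: free text (or images, audio...) with no schema at all.
        String unstructured = "Ada, from Madrid, signed up yesterday and asked about pricing.";

        System.out.println(structured);
        System.out.println(semiStructured);
        System.out.println(unstructured);
    }
}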
Questions?
aspgems.com
The end
Thanks


Editor's Notes

  • #12: 1 EB = 1024 PB; 1 PB = 1024 TB; 1 TB = 1024 GB