This was originally posted by MapR's CMO Jack Norris in June of 2013 - http://www.mapr.com/blog/how-big-is-big-data
" There’s been a lot in the news lately about the NSA and Verizon call detail records… how much data are they talking about?
Each phone call generates a call detail record. A call detail record contains information about a call, not the call itself: the originating number, the terminating number, the length of the call, and so on. At first glance, it would seem to be a huge endeavor to analyze all the calls in the U.S., one that would require a huge datacenter (or multiple datacenters) just to store the data.
But in reality, the amount of data is relatively small on the spectrum of Big Data projects. There are roughly 300 million people in the U.S., approximately 250 million of whom are adults and teens. If we assume that everyone generates 10 phone calls per day, on average, that comes to 2.5 billion phone calls per day. A typical call detail record is about 200 bytes, and in some cases a single call generates multiple records; think of these call records as metadata. If we assume 10 records per call, that expands to 2KB of data for every phone call. Given these assumptions, the data comes to 5 terabytes per day.
At MapR, we have customers who are analyzing many times this amount of data on a daily basis. How big would the cluster need to be? Well, we have customers with 32TB of data on a single node. If an organization wanted to analyze 30 days of U.S. call detail records, that would be approximately 150TB of data, which fits on just 5 nodes of a MapR Hadoop cluster. The total call record volume for the U.S. wouldn’t come close to creating a busy signal on a MapR cluster."
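The arithmetic is easy to sanity-check. Here is a minimal back-of-envelope sketch in Python, using only the assumptions stated in the post (250 million callers, 10 calls per person per day, 10 records of 200 bytes per call, 32TB of data per node):

```python
import math

# Back-of-envelope check of the post's arithmetic; every figure below is
# an assumption stated in the post, not a measured value.
adults_and_teens = 250_000_000    # U.S. adults and teens
calls_per_day_each = 10           # average calls per person per day
bytes_per_record = 200            # typical call detail record size
records_per_call = 10             # a single call can emit several records

calls_per_day = adults_and_teens * calls_per_day_each   # 2.5 billion
bytes_per_call = bytes_per_record * records_per_call    # 2KB of metadata
TB = 10**12                                             # decimal terabytes
daily_tb = calls_per_day * bytes_per_call / TB          # 5 TB/day

# Cluster sizing: 30 days of records, 32TB stored per node.
days = 30
node_capacity_tb = 32
total_tb = daily_tb * days                              # 150 TB
nodes = math.ceil(total_tb / node_capacity_tb)          # 5 nodes

print(f"{calls_per_day:,} calls/day -> {daily_tb:.0f} TB/day")
print(f"{days} days = {total_tb:.0f} TB -> {nodes} nodes")
```

Note that the post counts decimal terabytes (10^12 bytes); in binary tebibytes the totals shift slightly, but the conclusion is the same: a month of U.S. call records fits on a handful of nodes.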