Big data for beginners. An attempt to show that the saying "Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it..." is totally wrong.
2. Getting started in ‘Big Data’
“Wanted: Ph.D.-level statistician with the
technical skill to use data-visualization software
and a deep understanding of the _____ industry.”
3. Gartner: "high volume, velocity and/or variety information assets that
demand cost-effective, innovative forms of information processing
that enable enhanced insight, decision making, and process
automation."
SAS: "The term big data has been around for decades, and we've
been doing analytics all this time. It's not big, it's just bigger."
Wikipedia: "Big data is the term for a collection of data sets so large
and complex that it becomes difficult to process using on-hand
database management tools or traditional data processing
applications."
9.
Source                     Size                                              Standard laptop HDDs needed
U.S. Library of Congress   235 terabytes                                     320
YouTube                    371 terabytes / day (48 hours of video / minute)  495
Facebook                   100 terabytes / day                               133
Walmart                    2.5 petabytes / hour                              3500
Average hospital in 2015   665 terabytes / year                              908
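The drive counts in the table can be roughly reproduced with a simple division. The sketch below assumes a "standard laptop HDD" holds 750 GB (0.75 terabytes). That capacity is not stated on the slide; it is inferred from the ratios in the table. The names `ASSUMED_DRIVE_TB` and `drives_needed` are made up for illustration.

```python
# Assumed capacity of one "standard laptop HDD", in terabytes (750 GB).
# This value is an assumption; the slide does not state it.
ASSUMED_DRIVE_TB = 0.75


def drives_needed(size_tb, drive_tb=ASSUMED_DRIVE_TB):
    """Return how many drives of drive_tb terabytes hold size_tb terabytes."""
    return size_tb / drive_tb


# Sizes from the table, converted to terabytes (1 petabyte = 1000 terabytes).
sizes_tb = {
    "U.S. Library of Congress": 235,
    "YouTube (per day)": 371,
    "Facebook (per day)": 100,
    "Walmart (per hour)": 2.5 * 1000,
    "Average hospital in 2015 (per year)": 665,
}

for name, size_tb in sizes_tb.items():
    print(f"{name}: about {round(drives_needed(size_tb))} drives")
```

Under this assumption the YouTube and Facebook rows come out at 495 and 133 drives, matching the slide. The other rows land within roughly 5 percent of the slide's counts (for example 313 instead of 320 for the Library of Congress), which suggests the original figures used a slightly different capacity or rounding.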
12. • Visa recently reported that it has greatly improved its ability to detect fraudulent
transactions (estimated at 6 cents out of every $100) by increasing the amount of
data it analyzes and looking at a broader range of attributes for each transaction.
• Citibank has improved the quality of its consumer loan portfolio by hiring IBM's
Watson supercomputer as a "financial advisor." By using information on market
conditions as well as the applicant's life events, interactions on social media and past
decisions, the company is able to get a far better prediction of potential loan defaults
and fraud.
• Walmart applied big data techniques and technologies to allow it to understand how
to better serve its online customers. The retailer generated product and category
popularity scores by mining social media, which it combined with a self-teaching
semantic search capability honed by the clickstream data of 45 million online
shoppers each month.
• Netflix uses data analysis to refine movie recommendations and customer searches,
as well as to identify which movies and TV shows to license or develop.
14. Data is only as good as the intelligence we can
glean from it, and that entails effective data
analytics and a whole lot of computing power to
cope with the exponential increase in volume.