4. "The number of transistors incorporated in a chip will
approximately double every 24 months.“
--Gordon Moore, Intel co-founder
4
5. Excel’s Limit
Feature Maximum limit
Open
workbooks
Limited by available
memory and system
resources
Worksheet size 65,536 rows by 256 columns
Column width 255 characters
Row height 409 points
Feature Maximum limit
Open
workbooks
Limited by available memory
and system resources
Worksheet size 1,048,576 rows by 16,384 columns
Column width 255 characters
Row height 409 points
Excel 2003
Excel 2007
5
57. Humans as Visual Animals
Computers excel at predictive models.
Computers excel at data mining.
Humans perceive and interpret better.
Human vision still plays an important role.
57
58. What Humans Do Well
Identifying visual patterns
Identifying anomalies
Seeing patterns across groups
Interpreting content of images
58
Dan Ariely (professor of behavioral economics at Duke University)
…everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...“
Doug Laney in 2001
three V's of volume, velocity, and variety.
Obviously this is a relative definition.
physical capacity and performance of computers double about every two years.
over a million
Iris data
collected by Edgar Anderson
analyzed by Ronald Fisher
both of whom published their papers in 1936
This data set contains four measurements: The widths and lengths of the petals and sepals for three species of Iris. It's got about 150 cases
hasn't changed in nearly 80 years
fire hose
One researcher has estimated that 80 percent of enterprise data may be unstructured
semi – structured
Xml – Extensible Markup Language
Json - JavaScript Object Notation
In fact, a recent study by Forester Research shows that variety is the biggest factor that's leading companies to big data solutions. In fact, variety was mentioned over four times as often as data volume.
you can't use your standard approach
advertising or marketing
ad - informal an advertisement
retailers lose about $3.5 billion dollars each year to online fraud
insurance fraud, not counting health insurance, is estimated to be more than $40 billion dollars per year
NIH - National Institutes of Health
NASA - National Aeronautics and Space Administration
scholarship and research
Drew Conway in 2010
Exif = Exchangeable image file format
meta data that comes from a picture on an iPhone
Time
GPS altitude, latitude, longitude and position
GPS image direction
Immerse = to put someone or something deep into a liquid so that they are completely covered
Radio-frequency identification (RFID)
Now, it's estimated that by 2020, which is only a few years from now, as many as 30 billion uniquely identifiable devices may be connected to the internet.
They're all in the same physical space
There's usually only one copy
it doesn't do anything about sharing
Infrastructure as a service (IaaS)
Platform as a service (PaaS)
Software as a service (SaaS)
Data as a service (DaaS)
GIGO, for garbage in, garbage out
You may recall the Mars climate orbiter that NASA launched back in 1998 at a cost of nearly $300 million and it crashed when it approached Mars a year later because some of the flight programming data had not been converted to metric units.
Data starts and ends in Hadoop.
Hadoop can handle different formats.