Talking about Big Data generates a lot of questions; however, most of the focus is on the technologies and skills required to collect and store this volume of information as opposed to the insight that companies need to derive from it. What factors should organizations consider in order to ensure that they are capitalizing on their investments with these technologies? How do you break through business silos to enable sharing of data to increase organizational value? Leveraging his cross-industry experience at companies like The Walt Disney Company, Travelers Insurance and Demand Media, Brendan Aldrich will discuss the question of “big value” with industry examples and a particular focus on his current work to deploy a “data democracy” within the City Colleges of Chicago.
Session Discovery Topics:
• Big value - keeping an eye on the forest (assumptions, judgment and bias)
• Data democracy - increasing productivity with data transparency and open access
1. Chicago Big Data Executive Summit
June 12, 2013
Generating Big Value from Big Data
2. Using Data to Derive Value
• Lessons Learned:
– Data size is relative to an organization’s ability to make use of it
– Assumptions and bias can get in the way
– The best insights are actionable
3. R. Brendan Aldrich
Executive Director, Data Warehousing
City Colleges of Chicago
• 18 years in Information Technology
• 13 years running data warehouse, business intelligence and analytics teams for global high-volume data companies such as The Walt Disney Company, Travelers Insurance and Demand Media
• Currently building a data democracy at the City Colleges of Chicago
• TDWI and AERA membership
Speaker Introduction
4. • Colleges:
– Richard J. Daley College
– Kennedy-King College
– Malcolm X College
– Olive-Harvey College
– Harry S Truman College
– Harold Washington College
– Wilbur Wright College
• Satellites:
– Lakeview Learning Center
– Dawson Technical Institute
– West Side Learning Center
– South Chicago Learning Center
– Arturo Velasquez Institute
– Humboldt Park Vocational Education Center
• Culinary
– The French Pastry School
– Washburne Culinary Institute
• Parrot Cage Restaurant
• Sikia Banquet Room
• Broadcast
– WYCC TV (Channel 20)
– WKKC FM 89.9
…as well as five child development centers, the Center for Distance Learning and
the Workforce Institute
The City Colleges of Chicago is the largest community college district in the state of Illinois
and one of the largest in the country with more than 5,800 administrators, staff and faculty
educating over 120,000 students annually at facilities located within the city of Chicago.
5. The Origin of Big Data
John Mashey, chief scientist at Silicon
Graphics until 2000, gave hundreds of
talks to small groups in the mid-to-late
1990s using the term “Big Data” to
describe how the boundaries of
computing keep advancing.
1
6. Gartner Group
2001: Doug Laney first uses
“Volume, Velocity & Variety” to describe
Big Data
2
2012: Gartner updates the definition to:
“Big data are high volume, high velocity
and/or high variety information assets that
require new forms of processing to enable
enhanced decision making, insight
discovery and process automation”
7. Datafication is Driving Big Data
Datafication:
Creating new data that didn’t previously exist in digital form
The more you know about your customer, the better you can differentiate
yourself from your competitors.
8. Disney’s MagicBands
3
• Customer Value:
– Disney’s MagicBands will allow park guests to access the park, sign up for ride
waitlists (FastPass), interact with characters, purchase items, find lost parents, etc.
• Company Value:
– What type of guest are you and how do you route through the park
(rides, concessions, shows, purchases, etc.)
– Route optimization, scheduling, ride balancing
– Know your customer
• Worldwide
– 121.4 million guests (2011)
• Florida
– 17.1 million guests (2011)
9. Getting to Big Value
(or… Don’t Miss the Trees for the Forest)
1. Gathering vs. Understanding
2. Assumptions
3. Bias
10. Barrier #1: Gathering vs. Understanding
“Big Data is not defined by its data management
challenges, but by the organization’s capabilities
in analyzing the data, deriving intelligence from
it, and leveraging it to make forward looking
decisions.”
4
- Isaac Sacolick, VP Technology at McGraw-Hill Construction
13. Value Derived from Human Interaction
“Data and data sets are not objective; they are creations
of human design. We give numbers their voice, draw
inferences from them, and define their meaning
through our interpretations.”
5
- Kate Crawford, Principal Researcher @ Microsoft Research
14. What Does Your Data Weigh?
• Light Data
– Easily quantifiable measures and facts
• Mid-Weight Data
– Interesting data; trends; patterns
• Heavy Data
– Rich, meaningful, verified, and
actionable data
Data classification based on the value being derived from the data
15. Barrier #2: Assumptions
People inherently make assumptions…
which can lead you to find what you expect
as opposed to the marketable anomalies
16. Netflix
• DVD rental and video streaming company with over
– 33 million subscribers (29 million streaming) in 40 countries
• Big Data Stats:
– More than 50 Cassandra clusters with over 750 nodes
– More than 50,000 reads & 100,000 writes per second.
• Claims 75% of its subscribers are influenced by what it
suggests they will like.
6
17. House of Cards
• Netflix’s data indicated that the same
subscribers who loved the original BBC
production of “House of Cards” also loved
movies starring Kevin Spacey or directed
by David Fincher.
7
• Netflix has committed $100 million to
create two 13-episode seasons.
18. Were they Right?
• From a data standpoint, it’s hard to know since
Netflix doesn’t release viewership numbers.
• But how else could we evaluate?
– Facebook likes: 206k
– Twitter: 34,706 Followers
– Mainstream Culture
• Magazine Covers?
• Talk shows?
• What do you hear?
• What could we conclude?
19. Barrier #3: BIAS
“Hidden biases in both the collection and analysis
stages present considerable risks, and are as
important to the big-data equation as the numbers
themselves.”
5
- Kate Crawford, Principal Researcher @ Microsoft Research
20. Classification of Bias [8]
• Cognitive
– Misunderstanding of the
probabilities.
• Selection
– Most available, convenient
and/or cost-effective as
opposed to most relevant.
• Sampling
– Most relevant to a subset that
may not hold true in the
wider population.
• Modeling
– Biased assumptions drive
selection of wrong variables
• Funding
– Assumptions, interpretations,
data and applications skewed
to favor funding party
• Representation
– Larger data sets do not
ensure that the data is
representative.
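The selection, sampling and representation points above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population size, the day/evening split and the satisfaction scores are all hypothetical and are not taken from the talk or from City Colleges of Chicago data. It compares a large convenience sample with a much smaller random sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 students, 30% of whom attend evening classes.
# Assumed (invented) satisfaction scores: day students near 80, evening near 60.
population = (
    [("day", random.gauss(80, 5)) for _ in range(7000)]
    + [("evening", random.gauss(60, 5)) for _ in range(3000)]
)


def mean_score(rows):
    """Average score over a list of (group, score) rows."""
    return statistics.mean(score for _, score in rows)


true_mean = mean_score(population)

# Convenience sample: a large survey collected only from day students.
convenience = [row for row in population if row[0] == "day"][:5000]

# Random sample: ten times smaller, but drawn from the whole population.
representative = random.sample(population, 500)

convenience_error = abs(mean_score(convenience) - true_mean)
representative_error = abs(mean_score(representative) - true_mean)

print(f"population mean:          {true_mean:.1f}")
print(f"convenience sample error: {convenience_error:.1f} (n={len(convenience)})")
print(f"random sample error:      {representative_error:.1f} (n={len(representative)})")
```

Under these assumptions the larger sample lands further from the population mean, because it never sees evening students. The size of the data set does nothing to correct for who was left out.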
21. Accounting for Bias [9]
• Know your Enemy
– Be aware of biases that may affect your analysis. Document them as
part of your results
• Make use of Subject Matter Experts
– Validate your results with domain experts and use them to test your
findings and algorithms
• Continuous Exploration
– Don’t settle for satisfactory! Investigate the anomalies and explore
the data outside of your focus
22. Generating Big Value
• Big Data is quantitative
• Deriving meaningful insights requires people
• Managing assumptions and bias increases value
• Insights identified can be acted upon
• Insights acted upon must be continually reviewed
Anything Else?
23. Rise of the Data Democracy
“Humans are not an important part of utilizing
new data, they are the single most important part of
the process.”
11
- Bryce Maddock, CEO of TaskUs.com
25. Building a Data Democracy:
Enable Everyone with Access
• The right data must be available in all
areas of the organization.
• Access to and use of data will create
positive and lasting change.
• All City Colleges of Chicago employees
will be able to use this platform to
obtain data and/or run reports.
Only part of this challenge is licensing cost! Organizational acceptance, tool
selection, bandwidth, data comprehension and accessible training are
critical!
27. Building a Data Democracy:
One-Size Does Not Fit All
Reports … User-Created Dashboards … and Interactive Analytics for all users
A unified data warehouse and web-based interface for
accessing and interacting with data
28. Building a Data Democracy:
Increase Data Comprehension & Skills
Integrated Data Dictionary and Online Training
By integrating necessary reference and training information directly
into the analytics website, we enable our employees to know with
certainty what their data means and how to use it effectively.
29. Takeaways
• Generating Big Value from Big Data:
– Datafication is driving differentiation in the marketplace
• Collect the data that drives your business
– The value in Big Data is derived from human insight
• How much does your data weigh?
– Be aware of Assumptions and Bias in your approach
• Evaluate what does and doesn’t benefit your analysis
– Enable everyone with the right data to succeed
• Data democracy
31. References
• Infographics
IBM Big Data Hub, Infographic, “Tuning Into Big Data As The Buzz Gets Louder”, 9/26/12, http://www.ibmbigdatahub.com/infographic/tuning-big-data-buzz-gets-louder
Mushroom Networks, Infographic, “Landscape of Big Data”, 2013, http://www.mushroomnetworks.com/infographics/landscape-of-big-data
Graeme Noseworthy, Infographic, “The Flood of Big Data”, 4/24/12, http://analyzingmedia.com/2012/infographic-big-flood-of-big-data-in-digital-marketing/
4 Isaac Sacolick, Blog, “What is Big Data The Real Challenges Beyond Volume, Velocity and Variety”, 12/11/12, http://blogs.starcio.com/2012/12/what-is-big-data-real-challenges-beyond.html
7 Mary McNamara, Los Angeles Times, “Netflix’s ‘House of Cards’ looks, but doesn’t sound, like a hit”, 4/27/13, http://articles.latimes.com/2013/apr/27/entertainment/la-et-st-house-of-cards-netflix-20130427
6 Andrew Leonard, Salon, “How Netflix is turning viewers into puppets”, 2/1/13, http://www.salon.com/2013/02/01/how_netflix_is_turning_viewers_into_puppets/
• Articles
5 Kate Crawford, Blog, Harvard Business Review, “The Hidden Biases in Big Data”, 4/1/13, http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html
8 James Kobielus, IBM Big Data Hub, “Data Scientist: Bias, Backlash and Brutal Self-Criticism”, 5/16/13, http://www.ibmbigdatahub.com/blog/data-scientist-bias-backlash-and-brutal-self-criticism
9 Haowen Chan and Robin Morris, GigaOm, “Careful: Your big data analytics may be polluted by data scientist bias”, 5/4/13, http://gigaom.com/2013/05/04/careful-your-big-data-analytics-may-be-polluted-by-data-scientist-bias/
10 James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh and Angela Hung Byers, McKinsey Global Institute, “Big data: The next frontier for innovation, competition and productivity”, 5/11, http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
3 Jules Polonetsky, LinkedIn Post, “Magic Lessons for Retailers”, 5/31/13, http://www.linkedin.com/today/post/article/20130531031125-258347-magic-lessons-for-retailers
11 Bryce Maddock, Blog, “People and Big Data: Separately Good, Together Great”, 9/26/12, http://www.huffingtonpost.com/bryce-maddock/big-data_b_1908358.html
1 Steve Lohr, The New York Times, “The Origins of ‘Big Data’: An Etymological Detective Story”, http://bits.blogs.nytimes.com/2013/02/01/the-origins-of-big-data-an-etymological-detective-story/
2 Doug Laney, Blog, “Deja VVVu: Others Claiming Gartner’s Construct for Big Data”, 1/14/12, http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/
Editor's Notes
Pygmalion
• Written in 1912 by George Bernard Shaw
• London premiere in 1914 at Sir Herbert Beerbohm Tree’s His Majesty’s Theatre
• Adapted as “My Fair Lady” in 1956
• 120 million people in the U.S. now own smartphones
• 72 hours of video are added to YouTube every minute
• 230 million tweets per day
• 30+ billion pieces of data added to Facebook every month
• 9,000 job search results for data scientists in 2012
Reasons Cited for Education: Lack of data-driven mindset and available data
• Cluster A: Computer and electronic products and information sectors
• Cluster B: Finance and Insurance and Government
• Cluster C: Construction, Arts, Education & Other: Negative productivity growth indicating strong systemic barriers to increasing productivity
• Cluster D: Manufacturing, Transportation, Wholesale and Professional services
• Cluster E: Local services
“A data democracy is not about making ‘all data available for everyone’. It’s about making sure that each person has access to the data, metadata, analytic tools and reports they need to best fulfill their own role and responsibilities.”
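One way to picture the role-based access described in this note is a simple mapping from roles to the data sets each role needs. The sketch below is purely illustrative: the role names, data set names and helper functions are hypothetical and do not describe the platform actually deployed at the City Colleges of Chicago.

```python
# Hypothetical catalog of data sets held in a unified data warehouse.
CATALOG = {"enrollment", "course_schedule", "payroll", "financial_aid"}

# Hypothetical roles, each mapped to the data sets that role needs.
ROLE_DATASETS = {
    "advisor": {"enrollment", "course_schedule", "financial_aid"},
    "faculty": {"enrollment", "course_schedule"},
    "hr_analyst": {"payroll"},
}


def can_access(role, dataset):
    """Return True if the given role is granted the given data set."""
    return dataset in ROLE_DATASETS.get(role, set())


def uncovered_datasets():
    """Return catalog data sets that no role can reach."""
    reachable = set().union(*ROLE_DATASETS.values())
    return CATALOG - reachable


print(can_access("faculty", "enrollment"))  # faculty need enrollment data
print(can_access("faculty", "payroll"))     # payroll is outside the faculty role
print(uncovered_datasets())                 # data sets nobody can use yet
```

The second helper reflects the other half of the idea: data that no role can reach cannot generate value, so gaps in coverage are worth surfacing alongside access checks.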