REBECCA PARSONS
rparsons@thoughtworks.com
http://thoughtworks.com
CTO
The Evolving
Panorama of Data
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese an...
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the sprea...
Changing Nature of Data
Response
How we use data now
3 V’s: velocity, variety and volume (and add value)
Walmart: 1 million transactions per hour
Facebook: 40 billion photos
The Economist: Feb 25th 2010
Data is: Growing
Set up ...
640K ought to be
enough for anybody
Note: Although this is often attributed to Bill Gates, he never said
it.
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
1,482,824
1,287,537
1,080,872
853,698
616,308
356,191
127,942
40,22...




Data is: Distributed
Data is: Distributed
98% of internet access
points in Africa are mobile
30 million networked sensor nodes
growing 30% per ...
Data is: Valuable
$300 billion / year for US health care
60% increase in retail margins
McKinsey Global Institute: Big dat...
Data is: Urgent
Time lag: need input but not in a batch job
Data is: Connected
In unpredictable, imprecise, important, valuable and evolving ways
Changing Nature of Data
Response
How we use data now
"NoSQL"
Document
Graph
Graph
Key-value
Column-family
Graph
Graph
Polyglot
Persistence
Varied representations permit different perspectives on data and
can expose new insights
Data Sources
were will be
text
image
video
connections
and meta data
Analytics
were: roll-ups, trends, variance
will be: pattern recognition, data mining, chasing connections
Not just chasing but d...
Changing Nature of Data
Response
How we use data now
10,000 ft view (literally)
CodeCity by
Richard Wettel
http://www.inf.unisi.ch/phd/wettel/codecity.html
Separate the piles ...
Interesting aspect is finding the relationships
Personalization changes the way we use the web
Way data is combined from disparate, disconnected data sources
was 148,00 < 1 yr ago
School safety
Brazil meeting
http://ushahidi.com/
means testimony. Initially tracked post-2008
election violence in Kenya (45,000+ users for
this)
132 ...
http://libyacrisismap.net/
End sexual harassment in Egypt (site also in Arabic)
July 2011, now 239 datasets
8,500+ downloads in first 3 months
Expenditures but also census and poverty data
NYACLU Stop and Frisk suit
Combining multiple sites for cohesive picture
(screen scraping in this case)
Relating lots of different data, scraped from many different
applications. Dirty data, incomplete data, etc.
http://unglobalpulse.org
Trying to predict disasters... cell phone top up purchases, etc
Can’t talk about data science here in the US without mentioning
Prism
What about
us?
More data more readily available requires better access
protection. Protect internally, from hackers and more accidental
e...
Order-Taker
Syndrome
Must be part of the process
REBECCA PARSONS
rparsons@thoughtworks.com
http://thoughtworks.com
CTO
clip art from http://openclipart.org
Thank you!
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/big-data-analysis

Evolving Panorama of Data


Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/17bqGie.

Rebecca Parsons reviews some of the changes in how data is used and analyzed, including new technology approaches, looking at how data is used to track election violence, movement of people after a natural disaster, and attempts to predict famine and other humanitarian crises before they happen. Filmed at qconnewyork.com.

Dr. Rebecca Parsons is ThoughtWorks' Chief Technology Officer. She has more than 30 years' experience in leading the creation of large-scale distributed and services-based applications, and the integration of disparate systems. Rebecca received a BS in Computer Science and Economics from Bradley University, and both an MS and Ph.D. in Computer Science from Rice University.

Published in: Technology

Evolving Panorama of Data

  1. 1. REBECCA PARSONS rparsons@thoughtworks.com http://thoughtworks.com CTO The Evolving Panorama of Data
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/big-data-analysis
  3. 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  4. 4. Changing Nature of Data Response How we use data now 3 V’s: velocity, variety and volume (and add value)
  5. 5. Walmart: 1 million transactions per hour Facebook: 40 billion photos The Economist: Feb 25th 2010 Data is: Growing Set up next slide by saying that some people think only Google size companies should worry about this.
  6. 6. 640K ought to be enough for anybody Note: Although this is often attributed to Bill Gates, he never said it.
  7. 7. 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 1,482,824 1,287,537 1,080,872 853,698 616,308 356,191 127,942 40,223 8,640 1,990 442 Monthly Contributors to Wikipedia source: Wikipedia Data is: Distributed http://stats.wikimedia.org/EN/TablesWikipediansContributors.htm Contributors defined as people who edited at least 10 times. Data for the month of January for the years in question
  8. 8.     Data is: Distributed
  9. 9. Data is: Distributed 98% of internet access points in Africa are mobile 30 million networked sensor nodes growing 30% per year McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity
  10. 10. Data is: Valuable $300 billion / year for US health care 60% increase in retail margins McKinsey Global Institute: Big data: The next frontier for innovation, competition, and productivity
  11. 11. Data is: Urgent Time lag: need input but not in a batch job
  12. 12. Data is: Connected In unpredictable, imprecise, important, valuable and evolving ways
  13. 13. Changing Nature of Data Response How we use data now
  14. 14. "NoSQL"
  15. 15. Document Graph Graph Key-value Column-family
  16. 16. Graph
  17. 17. Graph Polyglot Persistence Varied representations permit different perspectives on data and can expose new insights
  18. 18. Data Sources were will be text image video connections and meta data
  19. 19. Analytics were: roll-ups, trends, variance. Analytics will be: pattern recognition, data mining, chasing connections. Not just chasing but discovering connections - significant value here. Exploratory
  20. 20. Changing Nature of Data Response How we use data now
  21. 21. 10,000 ft view (literally) CodeCity by Richard Wettel http://www.inf.unisi.ch/phd/wettel/codecity.html Separate the piles into good and bad
  22. 22. Interesting aspect is finding the relationships
  23. 23. Personalization changes the way we use the web
  24. 24. Way data is combined from disparate, disconnected data sources
  25. 25. was 148,00 < 1 yr ago School safety Brazil meeting
  26. 26. http://ushahidi.com/ means testimony. Initially tracked post-2008 election violence in Kenya (45,000+ users for this) 132 countries, 20,000+ instances
  27. 27. http://libyacrisismap.net/
  28. 28. End sexual harassment in Egypt (site also in Arabic)
  29. 29. July 2011, now 239 datasets 8,500+ downloads in first 3 months
  30. 30. Expenditures but also census and poverty data
  31. 31. NYACLU Stop and Frisk suit
  32. 32. Combining multiple sites for cohesive picture (screen scraping in this case)
  33. 33. Relating lots of different data, scraped from many different applications. Dirty data, incomplete data, etc.
  34. 34. http://unglobalpulse.org Trying to predict disasters... cell phone top up purchases, etc
  35. 35. Can’t talk about data science here in the US without mentioning Prism
  36. 36. What about us?
  37. 37. More data more readily available requires better access protection. Protect internally, from hackers and more accidental exposure. Balance needs versus privacy, even given changing expectations around privacy. Also worry about accuracy
  38. 38. Order-Taker Syndrome Must be part of the process
  39. 39. REBECCA PARSONS rparsons@thoughtworks.com http://thoughtworks.com CTO clip art from http://openclipart.org Thank you!
  40. 40. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/big-data-analysis
