Evolving Panorama of Data

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/17bqGie.

Rebecca Parsons reviews some of the changes in how data is used and analyzed, including new technology approaches, looking at how data is used to track election violence, follow the movement of people after a natural disaster, and attempt to predict famine and other humanitarian crises before they happen. Filmed at qconnewyork.com.

Dr. Rebecca Parsons is ThoughtWorks' Chief Technology Officer. She has more than 30 years' experience in leading the creation of large-scale distributed and services based applications, and the integration of disparate systems. Rebecca received a BS in Computer Science and Economics from Bradley University, and both an MS and Ph.D. in Computer Science from Rice University.








Evolving Panorama of Data Presentation Transcript

  • The Evolving Panorama of Data. Rebecca Parsons, CTO, rparsons@thoughtworks.com, http://thoughtworks.com
  • InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/big-data-analysis
  • Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • Changing Nature of Data. Response. How we use data now. The 3 V's: velocity, variety and volume (and add value)
  • Data is: Growing. Walmart: 1 million transactions per hour. Facebook: 40 billion photos. (The Economist, Feb 25th 2010.) Note: set up next slide by saying that some people think only Google-size companies should worry about this.
  • "640K ought to be enough for anybody." Note: although this is often attributed to Bill Gates, he never said it.
  • Data is: Distributed. Monthly Contributors to Wikipedia (data for the month of January for the years in question): 2002: 442; 2003: 1,990; 2004: 8,640; 2005: 40,223; 2006: 127,942; 2007: 356,191; 2008: 616,308; 2009: 853,698; 2010: 1,080,872; 2011: 1,287,537; 2012: 1,482,824. Contributors defined as people who edited at least 10 times. Source: Wikipedia, http://stats.wikimedia.org/EN/TablesWikipediansContributors.htm
  • Data is: Distributed
  • Data is: Distributed. 98% of internet access points in Africa are mobile. 30 million networked sensor nodes, growing 30% per year. Source: McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity
  • Data is: Valuable. $300 billion / year for US health care. 60% increase in retail margins. Source: McKinsey Global Institute, Big data: The next frontier for innovation, competition, and productivity
  • Data is: Urgent Time lag: need Input but not in a batch job
  • Data is: Connected. In unpredictable, imprecise, important, valuable and evolving ways
  • Changing Nature of Data Response How we use data now
  • "NoSQL"
  • Document, Graph, Key-value, Column-family
  • Graph
  • Graph Polyglot Persistence Varied representations permit different perspectives on data and can expose new insights
  • Data Sources (were / will be): text, image, video, connections and metadata
  • Analytics. Were: roll-ups, trends, variance. Will be: pattern recognition, data mining, chasing connections. Not just chasing but discovering connections: significant value here. Exploratory
  • Changing Nature of Data Response How we use data now
  • 10,000 ft view (literally) CodeCity by Richard Wettel http://www.inf.unisi.ch/phd/wettel/codecity.html Separate the piles into good and bad
  • Interesting aspect is finding the relationships
  • Personalization changes the way we use the web
  • Way data is combined from disparate, disconnected data sources
  • was 148,00 < 1 yr ago School safety Brazil meeting
  • Ushahidi (http://ushahidi.com/) means testimony. Initially tracked post-2008 election violence in Kenya (45,000+ users for this). 132 countries, 20,000+ instances
  • http://libyacrisismap.net/
  • End sexual harassment in Egypt (site also in Arabic)
  • July 2011, now 239 datasets 8,500+ downloads in first 3 months
  • Expenditures but also census and poverty data
  • NYACLU Stop and Frisk suit
  • Combining multiple sites for cohesive picture (screen scraping in this case)
  • Relating lots of different data, scraped from many different applications. Dirty data, incomplete data, etc.
  • http://unglobalpulse.org Trying to predict disasters... cell phone top up purchases, etc
  • Can’t talk about data science here in the US without mentioning Prism
  • What about us?
  • More data more readily available requires better access protection. Protect internally, from hackers and more accidental exposure. Balance needs versus privacy, even given changing expectations around privacy. Also worry about accuracy
  • Order-Taker Syndrome Must be part of the process
  • REBECCA PARSONS rparsons@thoughtworks.com http://thoughtworks.com CTO clip art from http://openclipart.org Thank you!
  • Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/big-data-analysis
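
The "Polyglot Persistence" slide above says that varied representations permit different perspectives on data. A minimal Python sketch of that idea follows. It is not from the talk: the sample records, field names, and helper functions are all hypothetical, and plain in-memory structures stand in for real key-value, document, and graph stores.

```python
from collections import defaultdict

# Hypothetical sample records, invented for illustration only.
reports = [
    {"id": "r1", "reporter": "amina", "place": "Nairobi", "tags": ["violence"]},
    {"id": "r2", "reporter": "brian", "place": "Nairobi", "tags": ["looting"]},
    {"id": "r3", "reporter": "amina", "place": "Kisumu", "tags": ["violence"]},
]

# Key-value view: each record is an opaque value looked up by its id.
key_value = {r["id"]: r for r in reports}


def with_tag(docs, tag):
    """Document view: query on a nested field of each record."""
    return [d["id"] for d in docs if tag in d["tags"]]


def build_graph(docs):
    """Graph view: reporters and places are nodes, each report is an edge.

    Assumes reporter names and place names never collide.
    """
    graph = defaultdict(set)
    for d in docs:
        graph[d["reporter"]].add(d["place"])
        graph[d["place"]].add(d["reporter"])
    return graph


def co_located(graph, reporter):
    """Other reporters who share at least one place with `reporter`."""
    others = {o for place in graph[reporter] for o in graph[place]}
    return sorted(others - {reporter})
```

The same three records answer different questions depending on the view: a direct lookup by id, a filter on tags, or a two-hop walk that connects reporters through shared places. The last question is awkward to ask of the key-value view but is a natural fit for the graph view.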