data, big data, open data


Transcript

  • 1. Technological innovation, web and statistics: data, big data, open data. Vincenzo Patruno. Rome, 29 January 2013
  • 2. A world of data
  • 3. Obama’s Election Victory
  • 4. Creating a “single source of truth”. Combining disparate data sources of potential donors, volunteers and voters (email, postal, telephone, mobile and social contacts with historical voting records, polling and fundraising data), they built a single view of individuals that informed their strategies for raising funds, mobilizing volunteers and securing votes. Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
  • 5. Profiling and predicting. Demographics and data collected by fieldwork on the campaign trail were added to the mix, allowing predictive modelling to score people on their likelihood to donate or vote for the Democrats. Channels of communication were optimized, and the type of messaging was tailored to maximize the likelihood of response. Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
  • 6. Turning data into the human touch. The power of localised networks and neighbourhoods. Using centralized data to provide geo-targeted insight, campaign volunteers could base themselves in the areas that mattered most, talking to the voters they had got to know since the start of the 2008 campaign. Deliver their message from within communities. The impact of this saw them receive double the votes they achieved in 2008 in the marginal states. Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
  • 7. Turning data into the human touch. More than two million small donors contributed over 427 million dollars to his campaign. About 55% of the funds raised came from donations under 200 dollars. Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
  • 8. Focus on the swing states. Regular polling of states like Ohio throughout the campaign provided valuable data for the team to process and analyze trends. For example, the analysts could track the impact of the three TV debates on the Democratic vote in real time and were able to identify specific segments to target with campaign material – split by region, demographics and the profile scoring that had been modeled in the new database. One Democrat official commented that they scenario-tested the election 66,000 times every night in order to calculate predicted outcomes for swing states. Campaign resource was then allocated appropriately to persuade undecided voters most likely to pledge their allegiance to Obama. By the time election day came around, the Democrats had a clear idea of how voting in the swing states was looking. Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
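
The nightly scenario testing described on slide 8 is the kind of task often handled with Monte Carlo simulation, although the slide does not say which method the campaign actually used. The following is only a minimal sketch of the idea: the win probabilities, the `SAFE_VOTES` baseline and the choice of three states are invented for illustration and are not campaign data.

```python
import random

# state -> (assumed win probability, electoral votes); probabilities are invented
SWING_STATES = {
    "Ohio": (0.55, 18),
    "Florida": (0.50, 29),
    "Virginia": (0.52, 13),
}
SAFE_VOTES = 237     # hypothetical electoral votes treated as already secured
VOTES_TO_WIN = 270


def simulate_once(rng):
    """Play out one scenario: each swing state is won with its assumed probability."""
    votes = SAFE_VOTES
    for win_prob, electoral_votes in SWING_STATES.values():
        if rng.random() < win_prob:
            votes += electoral_votes
    return votes


def estimate_win_probability(runs=66_000, seed=2012):
    """Share of simulated scenarios that reach the winning threshold."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    wins = sum(simulate_once(rng) >= VOTES_TO_WIN for _ in range(runs))
    return wins / runs


if __name__ == "__main__":
    print(f"Estimated win probability: {estimate_win_probability():.3f}")
```

Re-running the estimate after changing one state's probability shows how sensitive the outcome is to that state, which is one way such simulations can guide where to allocate resources.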
  • 9. Data science involvement in the election wasn’t just restricted to the candidates’ teams. Nate Silver used sabermetrics to accurately predict the outcome of all 50 state votes. Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
  • 10. Big Data – What Is It? Volume. Variety. Velocity. Variability. Complexity. The first three, the “Vs” of Big Data, were originally posited by Gartner’s Doug Laney in a 2001 research report.
  • 11. “It’s difficult to imagine the power that you’re going to have when so many different sorts of data are available” Tim Berners-Lee
  • 12. Data never sleeps
  • 13.
  • 14.
  • 15.
  • 16. Facebook World. Source: http://ipcarrier.blogspot.it/2010/12/facebook-world.html
  • 17. http://youtu.be/xJXOavGwAW8
  • 18. The Data Deluge
  • 19. Mass Opinion Business Intelligence (MOBI) analyzes and classifies comments made online and distills the information into a pre-defined, structured database. MOBI methodology combines online measurement, cloud computing and market research that provides live consumer sentiment data around brands, products and purchase influencing factors using decision-supported information from millions of unsolicited opinions. http://en.wikipedia.org/wiki/WiseWindow
  • 20. Financial Services Industry: Bloomberg and WiseWindow use social media and big data to improve investment returns. http://en.wikipedia.org/wiki/WiseWindow
  • 21. Natural disasters: Twitter was a richer and more up-to-date source of information about the 5.8 magnitude quake in Virginia.
  • 22. Twitter traffic after the Japan earthquake: http://youtu.be/PThAriHjk10
  • 23. Automotive Industry: Big data analysis of social media comments can predict trends in automotive equipment failures.
  • 24. Telecommunications: T-Mobile used big data integrated with its transaction systems and social media to dramatically cut customer defections in one quarter.
  • 25. Energy/Utility Industry: GE is going to use social media reports to track outages faster and better.
  • 26. Advertising Industry: Dachis Group used big data analysis of social media to create a more up-to-date and accurate ranking of the competitive position of engagement at large companies.
  • 27. Marketing: Nestle is using social media listening and analytics to engage at scale in the market using its big data powered central command center.
  • 28. Education Industry: DoSomething.org engaged 200,000 people worldwide on Facebook to combat bullying in schools and analyzed their sentiments.
  • 29. Criminal Justice: Police departments around the United States now use social media analysis extensively to fight crime.
  • 30. Health Care Industry: Using social media and big data to track cholera outbreaks in Haiti faster and more accurately.
  • 31. API: Application Programming Interface
  • 32. API
  • 33. API
  • 34. APIAPI
  • 35. API. Query string: http://apistat.istat.it/?q=gettable&dataset=DCIS_POPORESBIL&dim=82,0,0,0&lang=0&tr=&te=
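
To make the idea of a query string concrete, the URL on slide 35 can be split into its host and its parameters using only the Python standard library. This is a minimal sketch that parses the URL text and does not call the service. The slide does not document what each parameter means, so the code only separates names from values.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from slide 35
URL = ("http://apistat.istat.it/?q=gettable&dataset=DCIS_POPORESBIL"
       "&dim=82,0,0,0&lang=0&tr=&te=")


def describe_request(url):
    """Return the host and a {name: value} dict of the query-string parameters."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as tr= and te=
    parsed = parse_qs(parts.query, keep_blank_values=True)
    params = {name: values[0] for name, values in parsed.items()}
    return parts.netloc, params


host, params = describe_request(URL)

if __name__ == "__main__":
    print(host)
    for name, value in params.items():
        print(f"{name} = {value!r}")
```

Everything after the `?` is the query string: a list of `name=value` pairs joined by `&`, which is how the client tells the API which table and which slice of it to return.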
  • 36. API
  • 37. http://developers.facebook.com/ https://dev.twitter.com/ E.g.: https://stream.twitter.com/1.1/statuses/sample.json
  • 38. http://cs.croakun.com/
  • 39. […]
  • 40. 7% work, 50% pointless babble, 3% politics, 5% TV and radio, 10% spare time activities. Thanx Piet!
  • 41. http://youtu.be/iReY3W9ZkLU
  • 42. Top 5 Myths about Big Data. 1. Big Data is Only About Massive Data Volume. Generally speaking, experts consider petabytes of data volumes as the starting point for Big Data, although this volume indicator is a moving target. Therefore, while volume is important, the next two “Vs” are better individual indicators. Variety refers to the many different data and file types that are important to manage and analyze more thoroughly, but for which traditional relational databases are poorly suited. Some examples of this variety include sound and movie files, images, documents, geo-location data, web logs, and text strings. Velocity is about the rate of change in the data and how quickly it must be used to create real value. Traditional technologies are especially poorly suited to storing and using high-velocity data, so new approaches are needed. The more quickly the data in question is created and aggregated, and the more swiftly it must be used to uncover patterns and problems, the greater the velocity and the more likely it is that you have a Big Data opportunity.
  • 43. Top 5 Myths about Big Data. 2. Big Data Means Hadoop. Hadoop is the Apache open-source software framework for working with Big Data. It was derived from Google technology and put into practice by Yahoo and others. But Big Data is too varied and complex for a one-size-fits-all solution. While Hadoop has surely captured the greatest name recognition, it is just one of three classes of technologies well suited to storing and managing Big Data. The other two classes are NoSQL and Massively Parallel Processing (MPP) data stores. (See myth number five below for more about NoSQL.) Examples of MPP data stores include EMC’s Greenplum, IBM’s Netezza, and HP’s Vertica.
  • 44. Top 5 Myths about Big Data. 3. Big Data Means Unstructured Data. Big Data is probably better termed “multi-structured” as it could include text strings, documents of all types, audio and video files, metadata, web pages, email messages, social media feeds, form data, and so on. The consistent trait of these varied data types is that the data schema isn’t known or defined when the data is captured and stored. Rather, a data model is often applied at the time the data is used.
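
The point of myth 3, that a data model is applied when the data is used and not when it is stored, is often called schema-on-read. Here is a minimal sketch under stated assumptions: the records in `RAW_STORE` and the field names are invented for illustration, and a plain Python list stands in for whatever storage layer would really be used.

```python
import json

# Raw records captured as-is: shapes differ and nothing is validated on write.
# All records are invented for illustration.
RAW_STORE = [
    '{"type": "tweet", "user": "anna", "text": "Traffic jam in Rome"}',
    '{"type": "weblog", "ip": "10.0.0.7", "path": "/index.html"}',
    '{"type": "tweet", "user": "luca", "text": "Sunny day", "lang": "en"}',
]


def read_tweets(raw_records):
    """Apply a 'tweet' data model only at read time, skipping other record types."""
    for line in raw_records:
        record = json.loads(line)
        if record.get("type") != "tweet":
            continue
        yield {
            "user": record["user"],
            "text": record["text"],
            # Fields missing from older records get a default when read
            "lang": record.get("lang", "unknown"),
        }


tweets = list(read_tweets(RAW_STORE))

if __name__ == "__main__":
    for tweet in tweets:
        print(tweet)
```

A different reader could apply a different model, for example one for web logs, to the same stored records without rewriting them.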
  • 45. Top 5 Myths about Big Data. 4. Big Data is for Social Media Feeds and Sentiment Analysis. Simply put, if your organization needs to broadly analyze web traffic, IT system logs, customer sentiment, or any other type of digital shadows being created in record volumes each day, Big Data offers a way to do this. Even though the early pioneers of Big Data have been the largest web-based social media companies (Google, Yahoo, Facebook), it was the volume, variety, and velocity of data generated by their services that required a radically new solution rather than the need to analyze social feeds or gauge audience sentiment.
  • 46. Top 5 Myths about Big Data. 5. NoSQL means No SQL. NoSQL means “not only” SQL because these types of data stores offer domain-specific access and query techniques in addition to SQL or SQL-like interfaces. Technologies in this NoSQL category include key value stores, document-oriented databases, graph databases, big table structures, and caching data stores. The specific native access methods to stored data provide a rich, low-latency approach, typically through a proprietary interface. SQL access has the advantage of familiarity and compatibility with many existing tools, although this usually comes at some expense of latency, driven by the translation of the query into the native “language” of the underlying system. For example, Cassandra, the popular open source key value store offered in commercial form by DataStax, includes not only native APIs for direct access to Cassandra data, but also CQL (its SQL-like interface) as its emerging preferred access mechanism. It’s important to choose the right NoSQL technology to fit both the business problem and data type, and the many categories of NoSQL technologies offer plenty of choice.
  • 47. http://youtu.be/0eUeL3n7fDs
  • 48. http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
  • 49. “Data scientist”
  • 50. http://open.nasa.gov/blog/2012/10/04/what-is-nasa-doing-with-big-data-today/
  • 51. Can Big Data be used to measure economic, social and environmental phenomena?
  • 52. Sample surveys. Administrative archives
  • 53. Price statistics
  • 54. Significance magazine, August 2012: Big Data and City Living – what can it do for us?
  • 55. Big Data Sources: Sensors, Transactional, Administrative, Behavioural, Tracking Devices
  • 56. Web Scraping
  • 57. Web Scraping. Examples: http://www.comune.torino.it/ambiente/aria/qualita_aria/dati_aria/valori_annuali_pm10.shtml https://scraperwiki.com/scrapers/valori_pm10_in_comune_di_torino/
  • 58. Web Scraping. Examples: http://thebiobucket.blogspot.it/2011/10/little-webscraping-exercise.html#more (Milan, 13 December 2012)
  • 59. Web Scraping. Examples: http://www.metoffice.gov.uk/climate/uk/stationdata/armaghdata.txt
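
Scraping a plain-text data file like the one linked on slide 59 mostly comes down to separating header lines from data rows and converting the fields. The sketch below works on an embedded sample and does not download the real file. The column layout, the `---` marker for missing values, the trailing `*` flag and all the numbers in `SAMPLE` are assumptions made up for illustration, so they should be checked against the actual file before reuse.

```python
# Invented sample: the layout and the values are assumptions, not real station data.
SAMPLE = """\
Example station (invented layout and values)
   yyyy  mm   tmax    tmin    rain
              degC    degC      mm
   2000   1    7.5     2.1    60.2
   2000   2    ---     2.8    75.0
   2000   3   10.4     3.9*   41.3
"""


def to_number(field):
    """Convert one text field to float; '---' (assumed missing marker) becomes None."""
    cleaned = field.rstrip("*#")  # drop trailing flag characters (assumed convention)
    return None if cleaned == "---" else float(cleaned)


def parse_station_text(text):
    """Return one dict per data row, skipping title and header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least five fields and start with a four-digit year
        if len(fields) < 5 or not (fields[0].isdigit() and len(fields[0]) == 4):
            continue
        tmax, tmin, rain = (to_number(f) for f in fields[2:5])
        rows.append({
            "year": int(fields[0]),
            "month": int(fields[1]),
            "tmax": tmax,
            "tmin": tmin,
            "rain": rain,
        })
    return rows


rows = parse_station_text(SAMPLE)

if __name__ == "__main__":
    for row in rows:
        print(row)
```

To work on the live file, the text could be fetched first (for example with `urllib.request.urlopen`) and then passed to `parse_station_text`, after adjusting the column names to the real header.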
  • 60. http://elezionistorico.interno.it/
  • 61. Open Data. Open Data rests on the observation that public data has been produced with public money, which belongs to the community. And it is to the community that the data must be returned.
  • 62. Open Data. Data freely accessible to everyone in an open format, without restrictions from copyright, patents or other forms of control that limit its use.
  • 63. Open Government. A model of governance at central and local level based on openness (participation and collaboration) and on transparency towards citizens.
  • 64. The initiatives
  • 65. The initiatives
  • 66. Open Data: Government Data, Corporate Data, Community Data, Open Data
  • 67. Community Data
  • 68. Corporate Data
  • 69. Data catalogs
  • 70. Open Data formats. E.g. http://www.istat.it/it/files/2012/12/Tavole_XLS.zip
  • 71. Data catalogs. Metadata: territory, category, title, source, license, date, description, url
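
The metadata fields listed on slide 71 map naturally onto a small record type. The sketch below is one possible representation, not a catalog standard. The URL is the one from slide 70, and every other value in `entry` is a placeholder invented for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CatalogEntry:
    """One dataset in a data catalog, using the fields named on slide 71."""
    territory: str
    category: str
    title: str
    source: str
    license: str
    date: str
    description: str
    url: str


# Placeholder values, invented for illustration (only the URL comes from slide 70)
entry = CatalogEntry(
    territory="placeholder territory",
    category="placeholder category",
    title="placeholder title",
    source="placeholder source",
    license="placeholder license",
    date="2012-12",
    description="placeholder description",
    url="http://www.istat.it/it/files/2012/12/Tavole_XLS.zip",
)

# Serialise to JSON so the entry can be published or exchanged between catalogs
entry_json = json.dumps(asdict(entry), indent=2)

if __name__ == "__main__":
    print(entry_json)
```

Keeping the metadata machine-readable is what lets a catalog be searched by territory, category or license without opening the dataset itself.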
  • 72. Volume, Sources, Relationships, Context
  • 73. Data Integration: hospital admissions
  • 74. Data Integration: building permits, causes of death, criminal records (Casellario Giudiziario), hospital admissions, municipal resolutions, industries by ATECO code, environmental data, health expenditure, regional provisions, maps, politicians' declarations
  • 75. Data Integration: building permits, causes of death, criminal records (Casellario Giudiziario), hospital admissions, municipal resolutions, industries by ATECO code, environmental data, health expenditure, regional provisions, geographic data, politicians' declarations
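
Integrating datasets like those on slides 73 to 75 depends on a shared key, for example a territorial code present in every source. The following is a minimal sketch of such a join. The municipality codes, the figures and the field names are all invented for illustration.

```python
# Invented figures keyed by a hypothetical municipality code
hospital_admissions = {"001": 1200, "002": 450}
environmental_data = {
    "001": {"pm10": 41.0},
    "002": {"pm10": 28.5},
    "003": {"pm10": 35.2},  # no admissions record for this code
}


def integrate(admissions, environment):
    """Inner join: keep only the codes that appear in both sources."""
    merged = {}
    for code in sorted(admissions.keys() & environment.keys()):
        merged[code] = {"admissions": admissions[code], **environment[code]}
    return merged


integrated = integrate(hospital_admissions, environmental_data)

if __name__ == "__main__":
    for code, record in integrated.items():
        print(code, record)
```

Code `003` drops out because it exists in only one source, which illustrates why consistent identifiers across datasets matter as much as the data itself.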
  • 76. RDF
  • 77. LOD Cloud
  • 78. LOD Cloud
  • 79. Linked Open Data. Semantic Web
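
RDF, the data model behind Linked Open Data, describes everything as subject, predicate, object triples, where subjects and predicates are IRIs and objects are IRIs or literals. Here is a minimal sketch that writes two triples in N-Triples syntax using plain string formatting. The `http://example.org/` identifiers and the dataset title are hypothetical, while the `title` and `publisher` properties come from the Dublin Core terms vocabulary.

```python
EX = "http://example.org/"          # hypothetical namespace for illustration
DCT = "http://purl.org/dc/terms/"   # Dublin Core terms vocabulary


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement: subject, predicate, object, then ' .'"""
    return f"{subject} {predicate} {obj} ."


dataset = iri(EX + "dataset/pm10-torino")
lines = [
    # A literal object: the dataset has a human-readable title
    triple(dataset, iri(DCT + "title"), literal("PM10 annual values")),
    # An IRI object: the link to another resource is what makes the data "linked"
    triple(dataset, iri(DCT + "publisher"), iri(EX + "org/comune-torino")),
]

if __name__ == "__main__":
    print("\n".join(lines))
```

Because the publisher is itself an IRI, other datasets can point to the same resource, and that shared identifier is what connects separate sources into the cloud shown on the LOD Cloud slides.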
  • 80. Thank you for your attention! @vincpatruno vincenzo.patruno@istat.it http://www.vincenzopatruno.org