Big and Open Data 
Challenges for Smartcity 
Dr. Victoria López 
Grupo G-TeC 
www.tecnologiaUCM.es 
Universidad Complutense de Madrid 
August 26th, 2014
55 Exchange Place, NYC
Big and Open data. Challenges for Smartcity 
• What about Big Data? 
• Fighting with Big Data. 
• Big Data. Big Projects. Privacy.
• Open Data. Transparency. Smartcities.
What about Big Data? 
From Data Warehouse to Big Data (large databases)
1970: the relational model is invented
RDBMSs remain the mainstream until the 1990s
One size fits all: the "elephant" vendors, heavily
encoded, even indexing by B-trees.
What about Big Data? 
Big Data 3+1+1 V’s 
Fighting with Big Data 
Fighting with Big Data
Bioinformatics: genome data, DNA, RNA, proteins and,
in general, all biological data need to be monitored
computationally and stored in large databases in
laboratories and research centers around the
world.
The Human Genome Project
Customer point of view 
Looking for flights 
– Not a simple search 
Web Issues: Short path
A joke, but behind our comfortable position there is
some math and programming…
Web issues: Searching & Sorting
• Restrictions:
– Total time
– Total cost
– Date/hour
• How to sort the results?
– http://www.sorting-algorithms.com/
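The restrictions and the sorting question above can be sketched in a few lines of Python. This is only an illustration: the Flight record, its fields, the sample flights and the choice to sort by cost and then by duration are all assumptions, not something stated in the slides.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Flight:
    """Hypothetical flight record used only for this sketch."""
    code: str
    departure: datetime
    total_minutes: int
    total_cost: float


def search_flights(flights, max_minutes, max_cost, earliest, latest):
    """Keep the flights that satisfy every restriction, cheapest first."""
    matches = [
        f for f in flights
        if f.total_minutes <= max_minutes
        and f.total_cost <= max_cost
        and earliest <= f.departure <= latest
    ]
    # Sort by total cost; ties are broken by total time.
    return sorted(matches, key=lambda f: (f.total_cost, f.total_minutes))


# Made-up sample data.
flights = [
    Flight("AA1", datetime(2014, 8, 26, 9, 0), 480, 650.0),
    Flight("BB2", datetime(2014, 8, 26, 13, 30), 540, 520.0),
    Flight("CC3", datetime(2014, 8, 27, 7, 15), 450, 520.0),
    Flight("DD4", datetime(2014, 8, 26, 22, 0), 900, 300.0),
]

result = search_flights(
    flights,
    max_minutes=600,
    max_cost=700.0,
    earliest=datetime(2014, 8, 26, 0, 0),
    latest=datetime(2014, 8, 27, 23, 59),
)
print([f.code for f in result])  # ['CC3', 'BB2', 'AA1']
```

Python's built-in sorted is used here for brevity; any of the algorithms compared at the link above could take its place.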
How many?
Tidy up your room now!
One teenager working = one afternoon at home
How many?
Tidy up all the rooms in New York NOW!
One teenager working alone?
The solution: organization 
Big Data: MapReduce
• Created by Google (2004)
– Parallel programming model
– Simple, smart concept, suitable for many applications
– Big datasets → multiple nodes on multiprocessors
– Sets of nodes: clusters or grids (distributed programming)
– Able to process 20 PB per day
– Based on Map & Reduce, classic methods in functional programming
related to the classic Divide & Conquer
– Comes from numerical analysis (big matrix products).
• Main feature: scalability to many nodes
– Scan of 100 TB on 1 node @ 50 MB/sec = 23 days
– Scan on a cluster of 1000 nodes = 33 minutes
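A minimal single-machine sketch of the Map and Reduce idea, using word counting as the example task. The function names are hypothetical and this is not Google's or Hadoop's API; a real framework would run the map and reduce calls on many nodes. The last helper reproduces the scan-time figures above, assuming decimal units (1 TB = 10^12 bytes) and perfectly linear scaling.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, between the map and reduce steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: fold each key's list of values into a single total."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


def scan_seconds(total_bytes, bytes_per_second_per_node, nodes=1):
    """Ideal scan time when the data is split evenly across the nodes."""
    return total_bytes / (bytes_per_second_per_node * nodes)


counts = word_count(["big data", "open data", "big big projects"])
print(counts)

one_node_days = scan_seconds(100e12, 50e6) / 86400
cluster_minutes = scan_seconds(100e12, 50e6, nodes=1000) / 60
print(round(one_node_days), "days;", round(cluster_minutes), "minutes")
```

Each map call depends only on its own document and each reduce call only on its own key, which is what lets a cluster spread the work over many nodes.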
Big Data: Hadoop, Spark
– Used by Yahoo!, Facebook, Twitter,
Amazon, eBay…
– Can be used in different architectures:
both clusters (in-house) and grids
(cloud computing)
https://hadoop.apache.org/ https://spark.apache.org/
How much data?
Recommender Systems 
Renew your car insurance 
– Semantic Web tools 
– Analysing & storing personal information
Businesses need to be competitive
Harvard Business Review (HBR) blog, "CMOs and CIOs Need to Get Along to Make Big
Data Work"
Big Data & Business 
Big Data for Big projects 
Real Time 
The Obama 2012 campaign used data analytics and the 
experimental method to assemble a winning coalition vote by 
vote. In doing so, it overturned the long dominance of TV 
advertising in U.S. politics and created something new in the 
world: a national campaign run like a local ward election, where 
the interests of individual voters were known and addressed. 
Big Data for Big projects
Real Time
How Brazil vs. Germany played out on Twitter
Geotagged tweets mentioning key terms around the World Cup game,
July 8, 2014
Where is my personal data?
Social
Sensing
The near future: the Internet of Things
Open Data 
“Open data is data that can be freely used, reused and redistributed by anyone –
subject only, at most, to the requirement to attribute and share alike.”
OpenDefinition.org
Availability and Access: the data must be 
available as a whole and at no more than a 
reasonable reproduction cost, preferably by 
downloading over the internet. The data 
must also be available in a convenient and 
modifiable form. 
Reuse and Redistribution: the data must be 
provided under terms that permit reuse and 
redistribution including the intermixing with 
other datasets. The data must be machine-readable. 
Universal Participation: everyone must be 
able to use, reuse and redistribute – there 
should be no discrimination against fields of 
endeavour or against persons or groups. For 
example, ‘non-commercial’ restrictions that 
would prevent ‘commercial’ use, or 
restrictions of use for certain purposes (e.g. 
only in education), are not allowed. 
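The three conditions above can be read as a checklist. The sketch below is only an illustration: the DatasetTerms record and its field names are hypothetical, and a real check would need to read the dataset's actual license text.

```python
from dataclasses import dataclass


@dataclass
class DatasetTerms:
    """Hypothetical description of how a dataset is published."""
    available_as_whole: bool
    modifiable_form: bool
    reuse_and_redistribution: bool
    machine_readable: bool
    use_restrictions: tuple = ()  # e.g. ("non-commercial",)


def unmet_conditions(terms):
    """Return the names of the conditions the dataset fails to meet."""
    unmet = []
    if not (terms.available_as_whole and terms.modifiable_form):
        unmet.append("Availability and Access")
    if not (terms.reuse_and_redistribution and terms.machine_readable):
        unmet.append("Reuse and Redistribution")
    if terms.use_restrictions:
        # Any restriction on who may use the data, or for what, fails here.
        unmet.append("Universal Participation")
    return unmet


# Made-up examples.
open_set = DatasetTerms(True, True, True, True)
restricted_set = DatasetTerms(True, True, True, True, ("non-commercial",))
pdf_only_set = DatasetTerms(True, False, True, False)

print(unmet_conditions(open_set))        # []
print(unmet_conditions(restricted_set))  # ['Universal Participation']
print(unmet_conditions(pdf_only_set))
```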
Open Data 
Why Open Data, by the Open Knowledge Foundation
Recycla.me 
Mariam Saucedo 
Pilar Torralbo 
Daniel Sanz 
Ana Alfaro 
Sergio Ballesteros 
Lidia Sesma 
Héctor Martos 
Álvaro Bustillo 
Arturo Callejo 
Belén Abellanas 
Jaime Ramos 
Ignacio P. de Ziriza 
Victor Torres 
Alberto Segovia 
Miguel Bueno 
Mar Octavio de Toledo
Antonio Sanmartín
Carlos Fernández
RESOURCE MAP
RECYCLA.TE
Madrid – Smart City
• Parks and gardens
• Parking for
– Cars
– Motorbikes
– Bikes
• Recycling points
– Fixed
– Mobile
– Clothes
• Stations
– Bioethanol
– Gas
– Oil
– Electric
• Routes for bikes
– Cycle lanes
– Safe streets
• Residential Priority Areas
The way from data to value 
• Big Data Collection 
– Monitoring 
– Data cleaning and integration 
– Hosted Data Platforms and the Cloud 
• Big Data Storage 
– Modern Data Bases 
– Distributed Computing Platforms 
– NoSQL, NewSQL 
• Big Data Systems 
– Security 
– Multicore scalability 
– Visualization and User Interfaces 
• Big Data Analytics 
– Fast algorithms 
– Data compression 
– Machine learning tools 
– Visualization & Reporting 
The stage list proposed by MIT
to deal with Big Data
Conclusions 
Big Data, Open Data and Smartcity 
• Era of Data Revolution (Alex 'Sandy' Pentland, 
http://www.media.mit.edu/people/sandy) 
• New technologies & development 
• New Business 
• Great opportunities in Smartcity development
www.madrid.org 
Madrid City Hall 
Dr. Victoria López www.tecnologiaUCM.es

Fortune Time Institute: Big Data - Challenges for Smartcity
