Fortune Time Institute: Big Data - Challenges for Smartcity
Big and Open Data
Challenges for Smartcity
Dr. Victoria López
Universidad Complutense de Madrid
Big and Open data. Challenges for Smartcity
• What about Big Data?
• Fighting with Big Data.
• Big Data. Big Projects. Privacy.
• Open Data. Transparency. Smartcities.
What about Big Data?
From Data Warehouse to Big Data (large databases)
1970: relational model invented
RDBMS considered mainstream until the 1990s
One size fits all: the "elephant" vendors, heavily
encoded, even indexing by B-trees.
Fighting with Big Data
Bioinformatics: genome data, DNA, RNA, proteins and,
in general, all biological data have required
computer monitoring and storage in large databases at
several laboratories and research centers along the
The Human Genome Project
Customer point of view
Looking for flights
– Not a simple search
Web issues: Shortest path
It sounds like a joke, but behind our comfortable position there is
some math and programming…
– Total time
– Total cost
• How to sort the results?
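The flight search above can be sketched as a shortest-path problem. The following is a minimal illustration, assuming a small hypothetical flight network (the airports, durations and prices are invented for the example) and using Dijkstra's algorithm, which is one common choice and not necessarily what any real booking site runs. Layover times are ignored.

```python
import heapq

# Hypothetical flight network: origin -> list of (destination, hours, euros).
FLIGHTS = {
    "MAD": [("LHR", 2.5, 120), ("CDG", 2.0, 90), ("JFK", 8.5, 650)],
    "LHR": [("JFK", 8.0, 400)],
    "CDG": [("JFK", 8.5, 380), ("LHR", 1.2, 60)],
    "JFK": [],
}


def best_route(flights, origin, destination, criterion="time"):
    """Dijkstra's algorithm; criterion is 'time' (hours) or 'cost' (euros)."""
    index = {"time": 0, "cost": 1}[criterion]
    # Priority queue of (accumulated total, airport, path taken so far).
    queue = [(0.0, origin, [origin])]
    settled = set()
    while queue:
        total, airport, path = heapq.heappop(queue)
        if airport == destination:
            return total, path
        if airport in settled:
            continue  # a better route to this airport was already expanded
        settled.add(airport)
        for nxt, hours, euros in flights.get(airport, []):
            if nxt not in settled:
                weight = (hours, euros)[index]
                heapq.heappush(queue, (total + weight, nxt, path + [nxt]))
    return None  # destination not reachable


fastest = best_route(FLIGHTS, "MAD", "JFK", "time")
cheapest = best_route(FLIGHTS, "MAD", "JFK", "cost")
```

Choosing the criterion changes the answer: in this toy network the fastest route is the direct flight, while the cheapest one goes through CDG. Sorting a list of candidate routes is then a matter of picking the key, for example `sorted(routes, key=lambda r: r[0])`.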
Web issues: Searching & Sorting
Put your room in order now!
One teenager working = one afternoon at home
Put all the rooms in New York in order NOW!
One teenager working alone?
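One way to picture the answer is to split the job among several workers and then combine their results. The sketch below is only an illustration under that assumption: the function name `sort_in_chunks` is hypothetical, and the shares are sorted one after another here, whereas on a real cluster each share would go to a different node.

```python
import heapq


def sort_in_chunks(items, workers):
    """Split items among `workers`, sort each share, then merge the sorted shares."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Ceiling division, so that at most `workers` shares are created.
    size = max(1, -(-len(items) // workers))
    # Each share is sorted independently (sequentially in this sketch).
    chunks = [sorted(items[i:i + size]) for i in range(0, len(items), size)]
    # heapq.merge lazily combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*chunks))


tidy = sort_in_chunks([5, 3, 8, 1, 9, 2], workers=3)
```

The independent sorting of each share is the part that parallelizes well; the final merge is the step where the partial results meet.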
Big Data: MapReduce
• Created by Google (2004)
– Parallel programming model
– Simple concept, smart, suitable for multiple applications
– Big datasets spread across multiple nodes and multiprocessors
– Sets of nodes: Clusters or Grids (distributed programming)
– Able to process 20 PB per day
– Based on Map & Reduce, classical operations in functional programming
related to the classic Divide & Conquer strategy
– Comes from numerical analysis (big matrix products).
• Main feature: scalability to many nodes
– Scan of 100 TB in 1 node @ 50 MB/sec = 23 days
– Scan in a cluster of 1000 nodes = 33 minutes
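The model and the scan figures above can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps using word counting as the example task, not Google's implementation; all function names are hypothetical. The scan estimate assumes decimal units (1 TB = 1,000,000 MB) and perfect scaling across nodes.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(documents):
    # On a cluster, each document (or block) would be mapped on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


def scan_time_seconds(terabytes, mb_per_second, nodes=1):
    """Idealized scan time, assuming decimal units and perfect scaling."""
    total_mb = terabytes * 1_000_000
    return total_mb / (mb_per_second * nodes)


counts = word_count(["big data", "open data"])
one_node_days = scan_time_seconds(100, 50) / 86_400
cluster_minutes = scan_time_seconds(100, 50, nodes=1000) / 60
```

Under these assumptions a single node needs 2,000,000 seconds (about 23 days) to scan 100 TB, while 1000 nodes need 2000 seconds (about 33 minutes), which matches the figures on the slide.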
Big Data: Hadoop, Spark
– Used by Yahoo!, Facebook, Twitter
– Can be used in different architectures:
both clusters (in-house) and grid
https://hadoop.apache.org/ https://spark.apache.org/
Big Data for Big projects
The Obama 2012 campaign used data analytics and the
experimental method to assemble a winning coalition vote by
vote. In doing so, it overturned the long dominance of TV
advertising in U.S. politics and created something new in the
world: a national campaign run like a local ward election, where
the interests of individual voters were known and addressed.
Big Data for Big projects
How Brazil vs. Germany played out on Twitter
Geotagged tweets mentioning key terms around the World Cup game,
July 8, 2014
“Open data is data that can be freely used,
reused and redistributed by anyone – subject
only, at most, to the requirement to attribute
and share alike.” OpenDefinition.org
Availability and Access: the data must be
available as a whole and at no more than a
reasonable reproduction cost, preferably by
downloading over the internet. The data
must also be available in a convenient and modifiable form.
Reuse and Redistribution: the data must be
provided under terms that permit reuse and
redistribution including the intermixing with
other datasets. The data must be machine-readable.
Universal Participation: everyone must be
able to use, reuse and redistribute – there
should be no discrimination against fields of
endeavour or against persons or groups. For
example, ‘non-commercial’ restrictions that
would prevent ‘commercial’ use, or
restrictions of use for certain purposes (e.g.
only in education), are not allowed.
Ignacio P. de Ziriza
Mar Octavio de
RESOURCE MAP
Madrid – Smart City
• Parks and gardens
• Parkings for
• Recycling Points
• Bike routes
• Cycle lanes
• Safe streets
• Residential Priority Areas
The way from data to value
• Big Data Collection
– Data cleaning and integration
– Hosted Data Platforms and the Cloud
• Big Data Storage
– Modern databases
– Distributed Computing Platforms
– NoSQL, NewSQL
• Big Data Systems
– Multicore scalability
– Visualization and User Interfaces
• Big Data Analytics
– Fast algorithms
– Data compression
– Machine learning tools
– Visualization & Reporting
The stages proposed by MIT
for dealing with Big Data
Big Data, Open Data and Smartcity
• Era of Data Revolution (Alex 'Sandy' Pentland,
• New technologies & development
• New Business
• Great opportunities in Smartcity development
Madrid City Hall
Dr. Victoria López www.tecnologiaUCM.es