Mobile telephony as a source of information for the study of mobility... (Esri España)
In many sectors, data are needed to understand the behaviour patterns of the population: planning and operating transport systems requires precise, reliable and up-to-date information on travel demand; tourists' activity and mobility patterns have profound implications for infrastructure planning, the development of the tourism offer and tourism marketing strategies; understanding customers' spatial behaviour is key to optimising distribution, sales and advertising strategies, choosing the location of a new shop or point of sale, or maximising the return on investment in marketing campaigns. Traditional data sources, based mainly on surveys and administrative records, provide very valuable information but are not free of drawbacks. In general, surveys are expensive and slow to carry out, which limits the sample size and how often the information is updated; they also have other intrinsic limitations, such as incorrect or imprecise answers and dependence on respondents' willingness to answer. In recent years, the widespread use of mobile devices has opened new opportunities to overcome many of these limitations. The ability to collect geolocated data on people's activity, dynamically and at a cost considerably lower than that of traditional methods, opens the door to countless applications. The most obvious are perhaps those related to transport and mobility, but the range is much wider, covering almost any area that requires information on the population's activity and mobility patterns.
The new data sources also pose significant challenges, from the need to develop new analysis methodologies to the protection of privacy.
Video of the talk: https://youtu.be/5PKC5Qm0eHM
ONS Local has been established by the Office for National Statistics (ONS) to support evidence-based decision-making at the local level. We aim to host insightful events that connect our users with exciting developments happening in subnational statistics and analysis at the ONS and across other organisations.
Have you ever wondered which local authorities are similar to each other? This presentation discusses cluster analysis ONS has published to draw insight into which local authorities are performing in a similar way against key policy themes, promoting greater joined up working between local authorities with similar characteristics to address common problems they face. Our analysis also provides local authorities with control groups for investigating the impact of policy interventions.
In this webinar, we will cover the methods used to create our outputs, demonstrate some of our findings in our interactive visualisation tool and present information on our future plans to expand on this work.
This event is open to all, however we anticipate it will be of most interest to anyone working at a local level, or with data on the policy themes of economy, transport connectivity, education, skills, health and wellbeing.
If you have any questions, please contact ons.local@ons.gov.uk
Predictive Analysis of Bike Sharing System Using Machine Learning Algorithms (sushantparte)
Provided business solutions based on the ethical aspects of data collection and shortcomings of business by visualizing data and forecasting the demand using Ensemble Learning Technique (Random Forest) with an RMSE of 89.09%.
Smart Urban Planning Support through Web Data Science on Open and Enterprise ... (Gloria Re Calegari)
Prediction of expensive datasets starting from a set of cheap heterogeneous information sources in smart city scenarios.
Prediction of the population and land use of Milano starting from data about Points Of Interest and phone activity.
An Autonomic Approach to Real-Time Predictive Analytics using Open Data and ... (Wassim Derguech)
Public datasets are becoming more and more available for organizations. Both public and private data can be used to drive innovations and new solutions to various problems. The Internet of Things (IoT) and Open Data are particularly promising in real time predictive data analytics for effective decision support. The main challenge in this context is the dynamic selection of open data and IoT sources to support predictive analytics. This issue is widely discussed in various domains including economics, market analysis, energy usage, etc. Our case study is the prediction of energy usage of a building using open data and IoT. We propose a two-step solution: (1) data management: collection, filtering and warehousing and (2) data analytics: source selection and prediction. This work has been evaluated in real settings using IoT sensors and open weather data.
A Big Data Telco Solution by Dr. Laura Wynter (wkwsci-research)
Presented during the WKWSCI Symposium 2014
21 March 2014
Marina Bay Sands Expo and Convention Centre
Organized by the Wee Kim Wee School of Communication and Information at Nanyang Technological University
Tools and Methods for Big Data Analytics by Dahl Winters (Melinda Thielbar)
Research Triangle Analysts October presentation on Big Data by Dahl Winters (formerly of Research Triangle Institute). Dahl takes her viewers on a whirlwind tour of big data tools such as Hadoop and big data algorithms such as MapReduce, clustering, and deep learning. These slides document the many resources available on the internet, as well as guidelines of when and where to use each.
Big Data Analytics and Open Data: the presentation aims to raise awareness of big data analytics by covering its process and the importance of open data. An overview of two case studies, with accuracy results and an introduction, is presented by Sharjeel Imtiaz.
PhD from University of East London
Applying Machine learning to IOT: End to End Distributed Pipeline... (Carol McDonald)
This discusses the architecture of an end-to-end application that combines streaming data with machine learning to do real-time analysis and visualization of where and when Uber cars are clustered, so as to analyze and visualize the most popular Uber locations.
2. Motivation
• Interested in Machine Learning
• Wide range of Machine Learning applications in use
• Data-driven cities: City slicker - Data are slowly changing the way cities operate (The Economist)
3. Initial ideas
• Predict fires to dispatch ambulances efficiently
• Predict crimes to dispatch police cars efficiently
• Predict energy consumption (gas, electricity, etc.)
• Predict increase of waste using population data
• Predict emission of carbon dioxide
• Predict the rise of rents and house prices using economics and population data
• Map Londoners’ health onto a map of London
• Predict happiness by region
• Predict congestion
10. Initial ideas
• London Datastore has a variety of data
– Mostly statistics
– Not a lot of individual data
• What to learn?
11. The Idea
• Rent Prediction in London by Machine
Learning
– Can retrieve individual rent data from Zoopla
– Rent keeps changing and it is hard to know if the
rent is right for the place
• For landlords, it can be a standard to decide rent
• For tenants, it can be a standard to judge rent
• For Zoopla, it can attract more customers
12. Data Source
• Zoopla (about 45,000 examples)
– Latitude, Longitude, # of bedrooms, # of bathrooms, #
of floors, # of receptions, property type, price
• Walkscore
– Calculates a score for an address based on how walkable it is (close to grocery stores, restaurants, cafes, etc.)
• MapIt
– Converts latitude/longitude to ward and borough codes
13. Data Source
• London Datastore
– Ward profile
• Mean Age, Population density, % Not Born in UK, General Fertility
Rate, Male life expectancy, Female life expectancy, % children in
year 6 who are obese, Rate of All Ambulance Incidents per 1,000
population, Employment rate (16-74), Median House Price,
Number of properties sold, % Households Social Rented, %
Households Private Rented, % dwellings in council tax bands A or
B, % dwellings in council tax bands C, D or E, % dwellings in
council tax bands F, G or H, Claimant Rate of Income Support, %
with no qualifications, % with Level 4 qualifications and above,
Crime rate, Deliberate Fires, Cars per household, Average Public
Transport Accessibility score, Turnout at Mayoral election - 2012
– Borough profile
• Total carbon emissions, Teenage conception rate, Life satisfaction
score, Worthwhileness score, Happiness score, Anxiety score
14. Steps to solve
1. Collect and combine data
2. Preprocess data
3. Try different machine learning algorithms on the collected data
4. Tune the parameters of ML algorithms
5. Evaluate the results and algorithms
15. Step 1: Collect and Combine Data
1. Download listings data using the Zoopla API
2. Get the Walkscore for each listing using the Walkscore API
3. Convert longitude/latitude to ward and borough codes using a self-hosted MapIt
4. Merge the ward and borough profiles downloaded from London Datastore into the listings data
MapIt: UK
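The merge in item 4 can be pictured as a left join of the listings onto the ward and borough profiles. The following is only a minimal sketch under stated assumptions: the records, field names (`ward_code`, `borough_code`, `mean_age`, and so on) and values are hypothetical, and no real Zoopla, Walkscore or MapIt call is made.

```python
# Hypothetical in-memory records; field names and values are illustrative only.
listings = [
    {"listing_id": 1, "bedrooms": 2, "ward_code": "W1", "borough_code": "B1"},
    {"listing_id": 2, "bedrooms": 1, "ward_code": "W2", "borough_code": "B1"},
    {"listing_id": 3, "bedrooms": 3, "ward_code": "W9", "borough_code": "B2"},  # ward profile missing
]
ward_profiles = {
    "W1": {"mean_age": 35.0, "population_density": 9000.0},
    "W2": {"mean_age": 41.5, "population_density": 4200.0},
}
borough_profiles = {
    "B1": {"happiness_score": 7.2},
    "B2": {"happiness_score": 7.4},
}


def merge_profiles(listings, ward_profiles, borough_profiles):
    """Left-join ward and borough profile fields onto each listing.

    Listings whose codes have no profile keep None for those fields,
    so that they can be imputed in a later step.
    """
    ward_fields = sorted({f for p in ward_profiles.values() for f in p})
    borough_fields = sorted({f for p in borough_profiles.values() for f in p})
    merged = []
    for listing in listings:
        row = dict(listing)  # copy so the input is not mutated
        ward = ward_profiles.get(listing["ward_code"], {})
        borough = borough_profiles.get(listing["borough_code"], {})
        for field in ward_fields:
            row["ward_" + field] = ward.get(field)
        for field in borough_fields:
            row["borough_" + field] = borough.get(field)
        merged.append(row)
    return merged


merged = merge_profiles(listings, ward_profiles, borough_profiles)
```

Keeping every listing, even when a profile is missing, leaves the decision about gaps to the preprocessing step rather than silently dropping rows.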
16. Step 2: Preprocess Data
• Scale (bias elimination)
• Encode categorical features
• Impute
– Replace n/a or blank values with the column mean
• Shuffle
• Split into training dataset and test dataset
(cross validation)
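The preprocessing steps listed above can be sketched without any library. This is an illustration under assumptions, not the author's code: the column names and values are made up, "scale" is taken to mean standardising to zero mean and unit variance, the split is a simple hold-out, and imputation runs before scaling because the mean has to be computed on filled-in values.

```python
import random
from statistics import mean, pstdev

# Hypothetical listings; None marks a missing value.
rows = [
    {"bedrooms": 1.0, "walkscore": 80.0, "property_type": "flat", "price": 300.0},
    {"bedrooms": 3.0, "walkscore": None, "property_type": "house", "price": 650.0},
    {"bedrooms": 2.0, "walkscore": 60.0, "property_type": "flat", "price": 420.0},
    {"bedrooms": None, "walkscore": 70.0, "property_type": "studio", "price": 250.0},
]


def preprocess(rows, numeric, categorical, test_fraction=0.25, seed=0):
    rows = [dict(r) for r in rows]  # work on copies

    # Impute: replace missing numeric values with the column mean.
    for col in numeric:
        col_mean = mean(r[col] for r in rows if r[col] is not None)
        for r in rows:
            if r[col] is None:
                r[col] = col_mean

    # Scale: standardise each numeric column to zero mean, unit variance.
    for col in numeric:
        values = [r[col] for r in rows]
        mu, sigma = mean(values), pstdev(values)
        for r in rows:
            r[col] = (r[col] - mu) / sigma if sigma else 0.0

    # Encode: one-hot encode the categorical column.
    categories = sorted({r[categorical] for r in rows})
    for r in rows:
        value = r.pop(categorical)
        for c in categories:
            r[f"{categorical}={c}"] = 1.0 if value == c else 0.0

    # Shuffle, then split into training and test sets.
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


train, test = preprocess(rows, ["bedrooms", "walkscore"], "property_type")
```

For simplicity the means and standard deviations here are computed on the whole dataset; in practice they are usually fitted on the training portion only, so that no information from the test set leaks into training.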
18. Step 4: Tune Parameters of Algs.
• Grid Search
– Exhaustively search the possible combinations of
parameters
– Takes too much time on my computer
• Random Search
– Takes less time
– Result is similar to grid search
Let’s see tuning parameters…
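The difference between the two search strategies can be shown with a small library-free sketch. Everything specific in it is an assumption: the parameter names and candidate values are illustrative, and `toy_cv_error` is a made-up stand-in for a cross-validated MSE. scikit-learn, which the slides list among the tools used, provides `GridSearchCV` and `RandomizedSearchCV` for this purpose, though the slides do not say which calls were used.

```python
import itertools
import random

# Illustrative search space; the slides do not list the parameters tuned.
param_grid = {
    "n_estimators": [10, 50, 100, 200],
    "max_depth": [2, 4, 8, 16],
    "min_samples_leaf": [1, 2, 4],
}


def toy_cv_error(params):
    """Made-up stand-in for a cross-validated MSE (lower is better)."""
    return (
        (params["n_estimators"] - 100) ** 2 / 1e4
        + (params["max_depth"] - 8) ** 2 / 1e2
        + (params["min_samples_leaf"] - 2) ** 2 / 10
    )


def grid_search(grid, score):
    """Evaluate every combination of parameter values."""
    names = list(grid)
    candidates = [
        dict(zip(names, combo)) for combo in itertools.product(*grid.values())
    ]
    return min(candidates, key=score), len(candidates)


def random_search(grid, score, n_iter, seed=0):
    """Evaluate only n_iter randomly sampled combinations."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]
    return min(candidates, key=score), len(candidates)


grid_best, grid_evals = grid_search(param_grid, toy_cv_error)
rand_best, rand_evals = random_search(param_grid, toy_cv_error, n_iter=12)
print(grid_evals, rand_evals)  # 48 12
```

Grid search pays for all 4 x 4 x 3 = 48 model fits, while random search fixes the budget in advance (12 here) and accepts that the best combination may not be sampled.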
25. MSE on Cross Validation and new listings data
[Figure: MSE (y-axis, 0 to 2) for cross-validation folds 1 to 10 and for new data (x-axis), plotted for KNN, GBR, RF and SVR, with standard deviation.]
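The chart reports scores over ten cross-validation folds. As a rough sketch of how such folds can be formed, the hypothetical helper below splits sample indices into k contiguous validation folds; it assumes the data have already been shuffled and is not taken from the slides.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for k contiguous folds.

    Fold sizes differ by at most one when n_samples is not divisible by k.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(25, k=10))
print(len(folds))  # 10
```

Each sample lands in exactly one validation fold, so a model is trained k times and its error on the held-out fold gives one point per fold, as on the x-axis of the chart.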
26. Final Result
Algorithm | Final Result (MSE) | MSE from Step 3 | Fitting Time (42003 examples) | Predicting Time (3582 examples)
Random Forest | 0.108435602 | 0.241214063 | 53.12 sec | 2.03 sec
Gradient Tree Boosting | 0.117256254 | 0.273875445 | 149.18 sec | 0.45 sec
Support Vector Machine | 0.143577993 | 0.429585937 | 3192.02 sec | 4.54 sec
K-Nearest Neighbors | 0.217186025 | 0.306133182 | 3.97 sec | 3.82 sec
28. Compare rents with Zoopla Estimate (1/2)
Zoopla Estimate
Actual Rent
Predicted Rent by Random Forests
£381.5120443 pw = £1653 pcm
(pw x 52 / 12 = pcm)
29. Compare rents with Zoopla Estimate (2/2)
Zoopla Estimate
Actual Rent
Predicted Rent by Random Forests
£1488.237929 pw = £6449 pcm
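The weekly-to-monthly conversion used on these two slides (pw x 52 / 12 = pcm) can be checked with a few lines; the function name is a hypothetical choice for illustration.

```python
def weekly_to_monthly(rent_per_week):
    """Convert rent per week (pw) to rent per calendar month (pcm)."""
    return rent_per_week * 52 / 12


# The two predicted rents shown on the slides.
for pw in (381.5120443, 1488.237929):
    print(f"£{pw:.2f} pw = £{weekly_to_monthly(pw):.0f} pcm")
# £381.51 pw = £1653 pcm
# £1488.24 pw = £6449 pcm
```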
30. Conclusion
• Random Forest works best for this problem
• Data quality influences the prediction results more than the parameters of the machine learning algorithms do
• Could not compare every predicted rent with the Zoopla estimate, but some predictions were closer to the actual rent than the Zoopla estimate
31. Future Work
• Add more room-specific information, such as room size and age
• Make an app that predicts rent from an address, # of bedrooms, # of bathrooms, # of floors and property type
32. Challenges
• Collect Data
– Time consuming
– Hard to find good dataset
• Statistics
– Possible to use machine learning without knowing math/statistics
– Need to know them in order to understand in depth what ML algorithms do or to tune the parameters efficiently
33. What I learned
• Python
• Scikit-learn / Tableau / Google Maps API /
Walkscore API / Coordinate systems (MapIt
API)
• How to apply machine learning algorithms
• Collecting a good dataset is more important than the algorithms
34. References
• Walkscore
– https://www.walkscore.com
• MapIt
– http://mapit.poplus.org
• Google Maps API
– https://developers.google.com/maps/documentation/javascript/
• Scikit-learn
– http://scikit-learn.org/stable/
• London Datastore
– http://data.london.gov.uk
• Tableau
– http://www.tableau.com
36. References
• Data-driven cities: City slicker - Data are slowly changing
the way cities operate (The Economist)
– http://www.economist.com/news/britain/21629533-data-are-slowly-changing-way-cities-operate-city-slicker
• CS7641.TNL.MATLAB. Supervised Learning Workflow and
Algorithms
– http://wiki.omscs.org/confluence/display/CS7641ML/CS7641.TNL.MATLAB.+Supervised+Learning+Workflow+and+Algorithms
• Coursera: Machine Learning by Andrew Ng
– https://www.coursera.org/course/ml
• Questions?
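37. MSE
• The MSE is the average of the squared vertical distances between each data point and the fitted line (representing predictions made by the model).
• The RMSE is the square root of the MSE: roughly the typical distance of a data point from the fitted line, measured along a vertical line, in the same units as the target.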
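To make the relation between MSE and RMSE concrete, here is a small sketch with made-up rents; the numbers are illustrative and not taken from the project's data.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the MSE, in the same units as the target."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical weekly rents and model predictions.
actual = [300.0, 450.0, 380.0, 520.0]
predicted = [310.0, 430.0, 380.0, 550.0]

print(mse(actual, predicted))  # 350.0
print(round(rmse(actual, predicted), 2))  # 18.71
```

Squaring makes large errors count more than small ones, and taking the square root brings the figure back to the units of the target.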