Prediction of expensive datasets starting from a set of cheap heterogeneous information sources in smart city scenarios.
Prediction of the population and land use of Milano starting from data about Points Of Interest and phone activity.
City Data Dating: emerging affinities between diverse urban datasets - Gloria Re Calegari
Cities are complex environments in which digital technologies are more and more pervasive; this digitization of the urban space has led to a rich ecosystem of data producers and data consumers. Moreover, heterogeneous sources differ in terms of data complexity, spatio-temporal resolution and curation/maintenance costs. Do those diverse urban sources reflect the same picture of the city? Do distinct perspectives share some commonalities?
We present our data analytics empirical experiments on a set of urban sources related to the city of Milano; our investigation is aimed at discovering “affinities” between datasets by means of different quantitative and qualitative correlation analyses. We also explore the influence of spatial resolution and data complexity on the dependence strength between heterogeneous urban sources, to pave the way to a meaningful information fusion.
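To illustrate the kind of correlation analysis described above, the following is a minimal sketch, not the authors' method. It computes the Pearson correlation between two gridded sources at a fine resolution and again after aggregating cells into coarser blocks. The grids, the names `poi` and `calls`, and the helper functions are all hypothetical toy values, not the Milano data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def coarsen(grid, factor):
    """Sum a square grid over non-overlapping factor x factor blocks."""
    size = len(grid)
    return [
        [
            sum(grid[r + dr][c + dc] for dr in range(factor) for dc in range(factor))
            for c in range(0, size, factor)
        ]
        for r in range(0, size, factor)
    ]


def flatten(grid):
    """Turn a 2D grid into a flat list of cell values, row by row."""
    return [value for row in grid for value in row]


# Made-up 4x4 grids (assumption): e.g. POI counts and phone activity per cell.
poi = [[1, 0, 3, 2], [0, 2, 1, 4], [5, 1, 0, 0], [2, 3, 1, 1]]
calls = [[0, 2, 2, 4], [1, 1, 3, 3], [4, 3, 1, 0], [3, 2, 0, 2]]

# Correlation at the original resolution and after 2x2 aggregation.
r_fine = pearson(flatten(poi), flatten(calls))
r_coarse = pearson(flatten(coarsen(poi, 2)), flatten(coarsen(calls, 2)))
print(f"fine: {r_fine:.3f}, coarse: {r_coarse:.3f}")
```

Comparing `r_fine` with `r_coarse` shows how the measured dependence between two sources can change with spatial resolution; real analyses would also need to align the sources to a common grid first.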
Presentation by John Östh, Aura Reggiani & Laurie Schintler
Advanced Brainstorm Carrefour (ABC): ‘Smart People in Smart Cities’
Matej Bel University, Banská Bystrica, Slovakia (August, 2016)
Provenance Analytics at AAAI Human Computation Conference 2013 - T Dong Huynh
Trung Dong Huynh presenting the paper entitled "Interpretation of Crowdsourced Activities using Provenance Network Analysis": how analysing provenance graphs can help interpret crowdsourced activities in CollabMap.
Using FME to Manipulate Various Sources of Data for Traffic Collision Spatio-... - Safe Software
It is of interest to explore the spatial correlation between traffic collisions and other types of available data, for example demographic and economic information, land use, licensed premises and weather. To prepare the data for further analysis, the City of Edmonton Office of Traffic Safety used a series of extract, transform and load (ETL) tools provided by FME. This presentation introduces three key ETL processes performed with FME: 1) manipulating roadway GIS data for network-based spatial analysis; 2) building spatial connections between various sources of data; 3) preparing spatial data in the proper format for advanced spatial statistics.
This document presents research on using machine learning and deep learning models to predict stock prices. The researcher collected stock price data for two companies, Tata Steel and Hero Motocorp, at 5-minute intervals over two years. Eight classification and eight regression models were tested on this data to predict opening stock prices. The results showed that deep learning models like LSTM outperformed other regression models, while ANN performed best among classification models on average. Analyzing the individual errors of the best-performing LSTM model is suggested as a potential next step.
An innovative methodology calculates safety scores and risk-based heat maps for construction plans based on equipment trajectories. The methodology:
1. Calculates safety scores for grid squares based on distances from obstacles.
2. Averages safety scores over time for each grid to assess plan risk.
3. Translates construction plans into discrete event simulations to schedule activities and simulate equipment trajectories using motion planning algorithms. These are used to generate heat maps showing risk over time.
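Steps 1 and 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the methodology's actual scoring function. The linear ramp up to a hypothetical `safe_distance`, the function names, and the toy trajectory are all assumptions made for the example.

```python
from math import hypot


def safety_score(cell, obstacles, safe_distance=3.0):
    """Score in [0, 1]: 0 at an obstacle, 1 at or beyond safe_distance.

    The linear ramp is an assumption made for this sketch.
    """
    nearest = min(hypot(cell[0] - ox, cell[1] - oy) for ox, oy in obstacles)
    return min(nearest / safe_distance, 1.0)


def average_scores(width, height, obstacle_frames, safe_distance=3.0):
    """Average each grid cell's safety score over all time steps."""
    steps = len(obstacle_frames)
    return [
        [
            sum(safety_score((x, y), frame, safe_distance) for frame in obstacle_frames)
            / steps
            for x in range(width)
        ]
        for y in range(height)
    ]


# Hypothetical trajectory: one piece of equipment moving right along row 0,
# treated as the only obstacle at each time step.
frames = [[(0, 0)], [(1, 0)], [(2, 0)]]
score_map = average_scores(4, 2, frames)

for row in score_map:
    print(" ".join(f"{score:.2f}" for score in row))
```

Lower averaged scores mark cells that stay close to moving equipment for more of the plan, which is the kind of information a risk heat map would display.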
A talk at the Urban Science workshop at the Puget Sound Regional Council July 20 2014 organized by the Northwest Institute for Advanced Computing, a joint effort between Pacific Northwest National Labs and the University of Washington.
Distributed and heterogeneous data analysis for smart urban planning - Eduardo Oliveira
Over the past decade, ‘smart’ cities have capitalized on new technologies and insights to transform their systems, operations and services. The rationale behind the use of these technologies is that an evidence-based, analytical approach to decision-making will lead to more robust and sustainable outcomes. However, harvesting high-quality data from the dense network of sensors embedded in the urban infrastructure, and combining this data with social network data, poses many challenges. In this paper, we investigate the use of an intelligent middleware – Device Nimbus – to support data capture and analysis techniques to inform urban planning and design. We report results from a ‘Living Campus’ experiment at the University of Melbourne, Australia focused on a public learning space case study. Local perspectives, collected via crowdsourcing, are combined with distributed and heterogeneous environmental sensor data. Our analysis shows that Device Nimbus’ data integration and intelligent modules provide high-quality support for decision-making and planning.
Review of the NYS DEC's Climate Smart Resiliency Planning (CRSP) tool results from the City of Kingston. The CRSP tool is used as a check list for determining gaps in climate preparedness at the beginning of a municipal planning process.
Presented at the 2013 APA + ASLA NY Upstate Chapter Annual Conference
Audience: planners, landscape architects, municipal officials, consultants, decision makers and general public.
The document outlines Modena, Italy's steps to become a smart city through urban planning. It discusses how Modena built a geographic information system (GIS) to collect and organize data on populations, buildings, traffic, schools, green spaces, and utilities. This GIS acts as a sensor to monitor the city. Officials then use the data across departments to inform decisions and ensure services meet demand. Examples shown include using demographic and infrastructure data to plan for schools, transportation, and public services. The document also discusses how other Italian local governments, provinces, and utilities utilize GIS-based solutions to improve service delivery and management.
Polycentric Cities and Sustainable Development - DuncanSmith
Research mapping the density and function of commercial activities in Greater London, then exploring relationships with travel patterns. Part of my PhD research at CASA UCL. Presented at Regional Science UK and Ireland Section 2009.
This document discusses smart cities and urban planning in India. It begins with definitions of traditional city planning and smart city planning. It then discusses the impacts of globalization and economic changes on urbanization and city growth in India. Some key challenges discussed for Indian cities include population growth, urban sprawl, flooding, garbage, air and water pollution. The document examines trends in urbanization for India by 2030 and outlines some urban challenges around areas like transportation, infrastructure, land use, and the environment. It advocates for a shift towards more sustainable urban planning approaches focused on mobility and people rather than just transportation infrastructure expansion.
Harvesting business Value with Data Science - InfoFarm
Slidedeck from our seminar on "Harvesting Business Value with Data Science" (18/03/2015)
Topics covered:
- What is Data Science?
- Data Science: Tools and Techniques
- Data Science examples:
  - Market segmentation
  - Impact analysis
  - Recommendations
  - Water treatment
  - Damage type research
  - Call center aid
  - Personalized client mailing (Essent)
  - What do people write about us
  - Fraud detection: Gotch’All (KU Leuven)
The document discusses different types of city forms including the radiocentric, gridiron, and linear cities. It provides examples like Moscow as a radiocentric city with concentric rings radiating from the Kremlin. Chandigarh and San Francisco are discussed as examples of gridiron cities with orthogonal street grids. Navi Mumbai is presented as a linear city developing along transportation routes. The document also covers models of urban land use including the concentric zone, sector, and multiple nuclei models.
A Data Scientist Exploration in the World of Heterogeneous Open Geospatial Data - Gloria Re Calegari
We present the challenges faced by a Data Scientist in exploring and analyzing heterogeneous Open Geospatial Data. This work is aimed at explaining the initial steps of a data exploration process, specifically aimed at discovering similarities and differences conveyed by diverse sources and resulting from their correlation analysis; we also explore the influence of spatial resolution on the dependence strength between heterogeneous urban sources, to pave the way to a meaningful information fusion.
This presentation summarizes a summer internship project at Idea Cellular Ltd. aimed at improving network availability by reducing downtime at major outage sites. Key findings include: (1) Out of 345 sites studied, 290 outages were closed as of May 10th, with improved availability of 0.7% by May 11th and 0.27% by May 27th. (2) Cost savings of INR 24.09 crore were estimated from reduced outages. (3) Notifications to engineers helped reduce outages by up to 18.68% from April to May. While improvements were observed, the study had limitations such as a small sample size and inability to determine all specific outage causes.
Transport for London - London's Operations Digital Twin - Neo4j
1) London Transport is developing an Operations Digital Twin to provide a real-time simulation of traffic conditions on London's roads.
2) The Digital Twin integrates multiple real-time and historical data sources into a common framework and graph database aligned by road links and time.
3) This allows the Digital Twin to identify traffic incidents and disruptions, help manage traffic, and support planning and analysis across Transport for London.
This document summarizes a research paper on estimating time-evolving origin-destination (O-D) matrices using high-speed GPS data streams. It discusses using online machine learning techniques to build and maintain O-D matrices and histograms over time in order to model variables like travel time. A real-world case study using taxi GPS data from Porto, Portugal is also presented. Experimental results show the proposed time-evolving O-D matrix and multidimensional discretization techniques outperform static grid-based approaches and offline regression models in estimating travel times.
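As a rough illustration of maintaining an O-D matrix incrementally from a stream of trips, here is a minimal sketch. It is not the paper's algorithm: the fixed square grid, the exponential decay of counts, the moving-average travel time, and the class name `DecayingODMatrix` are all assumptions chosen for brevity.

```python
from collections import defaultdict


class DecayingODMatrix:
    """Origin-destination counts and travel times updated one trip at a time."""

    def __init__(self, cell_size, decay=0.9):
        self.cell_size = cell_size
        self.decay = decay
        self.counts = defaultdict(float)  # (origin_cell, dest_cell) -> weighted count
        self.mean_time = {}               # (origin_cell, dest_cell) -> smoothed travel time

    def _cell(self, point):
        """Map a coordinate pair to its grid cell index."""
        x, y = point
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, origin, destination, travel_time):
        """Add one observed trip and return the O-D pair it was assigned to."""
        key = (self._cell(origin), self._cell(destination))
        self.counts[key] += 1.0
        previous = self.mean_time.get(key, travel_time)
        # Exponential moving average, so recent trips weigh more than old ones.
        self.mean_time[key] = self.decay * previous + (1 - self.decay) * travel_time
        return key

    def end_period(self):
        """Age all counts at the end of a time period."""
        for key in self.counts:
            self.counts[key] *= self.decay


# Hypothetical usage with made-up coordinates and travel times in minutes.
od = DecayingODMatrix(cell_size=1.0, decay=0.5)
pair = od.update((0.2, 0.3), (2.7, 1.1), 10.0)
od.end_period()
od.update((0.4, 0.9), (2.1, 1.8), 20.0)
print(pair, od.counts[pair], od.mean_time[pair])
```

Because each update touches only one matrix entry, the structure can follow a stream without storing individual trips; the decay factor controls how quickly old traffic patterns are forgotten.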
Sharing the experience and results of using georeferenced 2010 Census data in Mexico and EO to train algorithms in order to detect urban growth and generate useful information for estimating population for non-census years.
This presentation was given by Prof. K N Subramanya, Principal, RV College of Engineering & CoE IoT during IoTForum's AgriTech Day 2019 on February 9, 2019 at NIANP-ICAR, Bengaluru.
Bruce Thompson on digital disruption and the environment - OCESAdmin
Bruce Thompson talks about land capability mapping in the Victorian context at IPAA Public Sector Week session on digital disruption and the environment, sponsored by the Commissioner for Environmental Sustainability and Nous Group.
- Weather
- Production Targets
- Contingency Plans
[Diagram: SLOPE in-vehicle interface components: Harvesting Head Control Interface, Production Statistics, Machine Parameters, Tree Detection & Recognition, Machine Monitoring, Route Planning, Cable Crane Control]
Risks and Mitigation Actions (Technical Meeting, 2-4/Jul/2014)
Risks:
- Integration with existing systems (MHG, TREE) not seamless
- Mobile/In-Vehicle interfaces not robust enough for field conditions
- User acceptance of new interfaces
Mitigation Actions:
- Early prototyping and testing with end users
- Modular design allowing independent development
This document discusses smart apps and how Pivotal uses data science to build them. It describes three key components of smart apps: data, a smart system that uses data science to understand user behavior, and a user interface. It then provides examples of smart apps Pivotal has developed for logistics and automotive customers, describing how machine learning models were used to predict delivery locations and road conditions. The document emphasizes an API-first approach and using cloud platforms like Cloud Foundry to operationalize models and deliver insights through predictive APIs.
Mobile telephony as a source of information for the study of mobility... - Esri España
Many sectors need data that help explain the behavior patterns of the population: planning and operating transport systems requires accurate, reliable and up-to-date information on travel demand; tourists' activity and mobility patterns have profound implications for infrastructure planning, the development of tourism offerings and tourism marketing strategies; understanding customers' spatial behavior is key to optimizing distribution, sales and advertising strategies, choosing the location of a new shop or point of sale, or maximizing the return on investment in marketing campaigns. Traditional data sources, based mainly on surveys and administrative records, provide very valuable information but are not free of drawbacks. In general, surveys are expensive and slow to carry out, which limits the sample size and how often the information is updated; they also have other intrinsic limitations, such as incorrect and imprecise answers, or dependence on respondents' willingness to answer. In recent years, the widespread use of mobile devices has opened new opportunities to overcome many of these limitations. The ability to collect geolocated data on people's activity, dynamically and at a significantly lower cost than traditional methods, opens the door to countless applications. The most obvious are perhaps those related to transport and mobility, but the range is much wider, covering almost any area that requires information on the population's activity and mobility patterns.
The new data sources also pose significant challenges, from the need to develop new analysis methodologies to the protection of privacy.
Video of the talk: https://youtu.be/5PKC5Qm0eHM
ESTIMATING THE EFFORT OF MOBILE APPLICATION DEVELOPMENT - csandit
The rise of mobile technologies such as smartphones and tablets connected to mobile networks is changing old habits and creating new ways for society to access information and interact with computer systems. Traditional information systems are therefore undergoing a process of adaptation to this new computing context. However, it is important to note that the characteristics of this new context are different. There are new features and, therefore, new possibilities, as well as restrictions that did not exist before. As a result, the systems developed for this environment have different requirements and characteristics than traditional information systems. For this reason, there is a need to reassess the current knowledge about the processes of planning and building systems in this new environment. One area in particular that demands such adaptation is software estimation. Estimation processes, in general, are based on characteristics of the systems, trying to quantify the complexity of implementing them. Hence, the main objective of this paper is to present a proposal for an effort estimation model for mobile applications, and to discuss the applicability of traditional estimation models to developing systems in the context of mobile computing.
BA Summit 2014 Predictive maintenance: plugging the leak with big data - Daniel Westzaan
Predictive maintenance is one of the big data applications with enormous potential. For Vitens, the largest water company in the Netherlands with more than 5.5 million customers, CGI and IBM showed in a proof of value that locating leaks faster and more accurately could potentially save millions.
Vitens' primary task is to ensure that customers have access to top-quality drinking water at all times. With a network of more than 49,000 km of relatively old pipeline, maintaining the network cost-efficiently is a constant challenge. Preventive maintenance is often chosen, so pipes are frequently replaced earlier than strictly necessary. Even so, leaks occur regularly, sometimes causing major damage and threatening security of supply.
Leaks are located manually, which costs a lot of time and money because the search area can often extend to tens of kilometers. Vitens asked CGI and IBM to develop a method for locating leaks using a big data application. In a proof of value, historical data was analyzed, and half of the analyzed leaks could be located to within 2.5 km.
By locating or even predicting leaks faster, Vitens can save directly on staff deployed for leak localization and on call center staffing. It also makes it possible to extend the effective service life of pipelines or, for less critical parts of the network, even to opt for the maximum service life, replacing a pipe only when leaks actually occur.
The document discusses big data use cases and requirements. It provides 51 detailed use cases across various domains that generate many terabytes to petabytes of data. It also describes extracting 437 specific requirements from the use cases and analyzing trends. The next steps involve matching requirements to a reference architecture and prioritizing use cases for implementation.
The document outlines a presentation on multimedia data mining. It discusses three articles: 1) a tool for visually mining multimedia data for social studies, 2) a framework for mining traffic video sequences, and 3) using voice mining to understand customer feedback. It also provides an introduction to multimedia data mining and recommendations.
This document describes how GIS was used to create the Telephone Exchange Information and Planning System (TEIPS) for the Vastrapur telephone exchange in Ahmedabad, India. TEIPS integrated spatial and non-spatial data on the telephone network into a GIS database to help with tasks like cable route planning, fault detection, and monitoring pillar utilization over time. The system allowed technicians to more efficiently plan and maintain the network.
[DSC Europe 23] Mihailo Ilic - Scalable and Interoperable Data Flow Managemen... - DataScienceConferenc1
In recent years, there has been a significant increase in the use of Smart Farming Technologies (SFTs), which are seen as key enablers in farm management for crop monitoring and reduction of chemical use. This presentation will cover a key component for the advancement of such systems – a data infrastructure which offers semantic and syntactic interoperability. Through the utilization of ontologies and smart data models in the agricultural domain, this kind of infrastructure can support actionable digital twins and advance farming capabilities.
big data analytics in mobile cellular network - shubham patil
This document proposes applying big data analytics to improve mobile cellular networks. It presents an architectural framework that collects big data from mobile networks, including signaling data, traffic data, location data, and radio waveforms. The data is analyzed using platforms like Apache Hadoop. Analytics can optimize network operations and enhance the subscriber experience through applications like identifying coverage issues and facilitating location-based services. Open challenges remain in fully leveraging big data to advance cellular networks.
Analysis of Educational Robotics activities using a machine learning approachLorenzo Cesaretti
These slides present the preliminary results through the utilisation of machine learning techniques for the analysis of Educational Robotics activities. An experimentation with 197 secondary school students from Italy was con-ducted, through updating Lego Mindstorms EV3 programming blocks in order to record log files containing the coding sequences designed by the students (within team work), during the resolution of a preliminary Robotics’ exercise. We utilised four machine learning techniques (logistic regression, support vec-tor machine, K-nearest neighbors and random forests) to predict the students’ performance, comparing a supervised approach (using twelve indicators ex-tracted from the log files as input for the algorithms) and a mixed approach (ap-plying a k-means algorithm to calculate the machine learning features). The re-sults have highlighted that SVM with the mixed approach outperformed the other techniques, and that three learning styles were predominantly emerged from the data mining analysis.
RECAP at ETSI Experiential Network Intelligence (ENI) MeetingRECAP Project
This presentation was delivered by Johan Forsman (Tieto), Jörg Domaschka (UULM) and Paolo Casari (IMDEA Networks) at the ETSI Experiential Network Intelligence (ENI) Meeting in Warsaw, Poland, on April 12th, 2019. ETSI Experiential Networked Industry Specification Group (ENI ISG) work on defining a Cognitive Network Management architecture using Artificial Intelligence (AI) techniques and context-aware policies to adjust offered services based on changes in user needs, environmental conditions and business goals. The intention is that the use of Artificial Intelligence techniques in the network management system should solve some of the problems of future network deployment and operations. For more information, see https://www.etsi.org/technologies/experiential-networked-intelligence.
R3 TREES - Integrated Management of Urban Green AreasPaolo Viskanic
R3 GIS is an Italian company that develops green area management software called R3 TREES. The software allows multiple stakeholders to access a central geodatabase of urban green space assets. It facilitates jobs, inspections, and workflows while also providing citizen information through public maps. R3 TREES supports management of various asset types and integrates tools for data entry, quality control, historical records, and more to help municipalities efficiently maintain their urban green areas.
Similar to Smart Urban Planning Support through Web Data Science on Open and Enterprise Data (20)
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataKiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Codeless Generative AI Pipelines
(GenAI with Milvus)
https://ml.dssconf.pl/user.html#!/lecture/DSSML24-041a/rate
Discover the potential of real-time streaming in the context of GenAI as we delve into the intricacies of Apache NiFi and its capabilities. Learn how this tool can significantly simplify the data engineering workflow for GenAI applications, allowing you to focus on the creative aspects rather than the technical complexities. I will guide you through practical examples and use cases, showing the impact of automation on prompt building. From data ingestion to transformation and delivery, witness how Apache NiFi streamlines the entire pipeline, ensuring a smooth and hassle-free experience.
Timothy Spann
https://www.youtube.com/@FLaNK-Stack
https://medium.com/@tspann
https://www.datainmotion.dev/
milvus, unstructured data, vector database, zilliz, cloud, vectors, python, deep learning, generative ai, genai, nifi, kafka, flink, streaming, iot, edge
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Smart Urban Planning Support through Web Data Science on Open and Enterprise Data
1. Smart Urban Planning Support through Web Data Science on Open and Enterprise Data
Gloria Re Calegari and Irene Celino
CEFRIEL – Politecnico di Milano
The 24th International World Wide Web Conference
Florence, Italy
18 – 22 May 2015
Web Data Science meets Smart Cities
19th May 2015
2. Digital information about cities
• Large number of data sources available on the web (Open data):
• Urban planning (land cover, public registers)
• Demographics and statistics about municipality
• User generated information:
• Volunteered geographic information and crowdsourcing information (Open Street Map)
• Location-based social networks (Foursquare check-ins and geolocated information)
• Closed data sources produced and maintained by enterprises:
• Phone activity data
The cost of data management (collection, cleansing, maintenance) varies widely across these diverse data origins.
3. Research goal
Long term goal:
• Can we predict (generate or update) a costly dataset from a set of cheap information sources?
Cheap datasets -> predict or update -> Expensive datasets
4. Our case study
• Data collection
• Available datasets about Milano
• Problem of spatial granularities and pre-processing of the datasets
• Data processing
• Definition of input/output
• Predictive analysis
• Statistical learning
• Machine learning
• Results evaluation
5. Milano datasets
Demographics:
• population density
• Spatial resolution: census area
• Source: Milano open data
Points of interest (POIs):
• Transport, schools, sports facilities, amenity places, shops ...
• Spatial resolution: lat-long points
• Source: Milano open data (official) and Open Street Map (user generated)
6. Milano datasets
Land use cover:
• type of land use according to the CORINE taxonomy (3-level hierarchy, up to 40 types of land use defined)
• CORINE taxonomy: http://swa.cefriel.it/ontologies/corine.html#
• 5 types selected (those that best characterise a metropolitan area such as Milano)
1. Residential
2. Agricultural
3. Commercial/industrial
4. Parks and green areas
5. Sport centres
• Spatial resolution: building level
• Source: Lombardy region open data
7. Milano datasets
Call data records:
• 5 phone activities
• Incoming SMS
• Outgoing SMS
• Incoming calls
• Outgoing calls
• Internet
• Recorded every 10 minutes (144 values a day for each activity) for 2 months (Nov-Dec 2013)
• Summarizing structure: a footprint for each cell (average activity over all the days, distinguishing between weekdays and weekend days)
• Spatial resolution: grid of 3538 square cells of 250m
• Source: Telecom Italia – provided for their Big Data Challenge, http://theodi.fbk.eu/openbigdata/
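The footprint summary described on this slide could be computed along the following lines. This is a minimal sketch, not the authors' code: the (timestamp, value) record layout, the function name and the sample values are assumptions made for illustration.

```python
from datetime import datetime

SLOTS_PER_DAY = 144  # one value every 10 minutes, as stated on the slide


def footprint(records):
    """Average activity per 10-minute slot, split into weekday and weekend.

    `records` is an iterable of (timestamp, value) pairs for ONE grid cell
    and ONE activity type (hypothetical layout). A slot with no record on
    an observed day is treated as zero activity for that day.
    """
    sums = {"weekday": [0.0] * SLOTS_PER_DAY, "weekend": [0.0] * SLOTS_PER_DAY}
    days = {"weekday": set(), "weekend": set()}
    for ts, value in records:
        kind = "weekend" if ts.weekday() >= 5 else "weekday"
        slot = (ts.hour * 60 + ts.minute) // 10
        sums[kind][slot] += value
        days[kind].add(ts.date())
    return {
        kind: [s / len(days[kind]) if days[kind] else 0.0 for s in sums[kind]]
        for kind in sums
    }


# Made-up sample: two weekdays and one Saturday, all at 08:00 (slot 48).
example_records = [
    (datetime(2013, 11, 4, 8, 0), 10.0),  # Monday
    (datetime(2013, 11, 5, 8, 0), 20.0),  # Tuesday
    (datetime(2013, 11, 9, 8, 0), 4.0),   # Saturday
]
fp = footprint(example_records)
print(fp["weekday"][48], fp["weekend"][48])
```

Each cell and activity type then yields two 144-value curves, one for weekdays and one for weekend days.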
8. Pre-processing
Make the spatial resolution uniform so that the datasets are comparable.
Spatial resolution used: grid of 3538 square cells of 250m
Layers overlaid and intersected using QGIS software.
New datasets generated:
• Presence/absence of POIs in each cell
• Weighted sum of population density in each cell
• Percentage shares of each land use over each cell area
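The three derived datasets above can be illustrated with a small sketch. The slides state that the overlay was done in QGIS; the code below is only a simplified stand-in that assumes axis-aligned rectangles in projected metric coordinates, reads "weighted sum" as an area-weighted average, and uses hypothetical function names and made-up geometries.

```python
def area(r):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned rectangles."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def has_poi(cell, pois):
    """Presence/absence of at least one POI (x, y) inside the cell."""
    return any(cell[0] <= x < cell[2] and cell[1] <= y < cell[3] for x, y in pois)


def cell_population_density(cell, census_areas):
    """Area-weighted population density; census_areas is [(rect, density)]."""
    return sum(overlap_area(cell, rect) * d for rect, d in census_areas) / area(cell)


def land_use_shares(cell, parcels):
    """Percentage of the cell area per land-use type; parcels is [(rect, use)]."""
    shares = {}
    for rect, use in parcels:
        shares[use] = shares.get(use, 0.0) + 100.0 * overlap_area(cell, rect) / area(cell)
    return shares


# One 250 m cell with made-up census areas, parcels and POIs.
cell = (0, 0, 250, 250)
census = [((-100, 0, 125, 250), 8000.0), ((125, 0, 400, 250), 2000.0)]
parcels = [((0, 0, 250, 100), "residential"), ((0, 100, 250, 250), "agricultural")]
pois = [(300, 40), (10, 20)]

density = cell_population_density(cell, census)
shares = land_use_shares(cell, parcels)
poi_present = has_poi(cell, pois)
print(density, shares, poi_present)
```

Real census areas and buildings are arbitrary polygons, so a GIS overlay (as the authors used) or a geometry library would replace `overlap_area` in practice.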
9. Selection of input/output variables
INPUT
Telecom data
• means of each phone activity (10 values)
• hour-by-hour means of all the activities (24 values)
POIs
• School
• Transport
• Shop
• Food
• Sport
• ...
-> Predictive models (regression) ->
OUTPUT
Land use density:
• Residential
• Agricultural
• Commercial
• Green area
• Sport facilities
Population density
10. Aims of the experiments
1. Comparing different regression algorithms
1. Statistical Learning approach -> Multiple Linear Regression (MLR)
2. Machine Learning approach -> Random Forest (RF)
2. Evaluating how the number of predictors impacts model performance
1. All the predictors
2. Manual selection of a subset of predictors
3. Automatic selection of predictors by AIC (Akaike information criterion)
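As a reminder of what AIC-based selection compares: for a least-squares model with Gaussian errors, AIC equals n·ln(RSS/n) + 2k up to an additive constant, where RSS is the residual sum of squares, n the number of observations and k the number of estimated parameters. The slides do not say how the search over predictor subsets was run; the sketch below only ranks already-fitted candidates, and its names and numbers are made up.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors (additive constant dropped)."""
    return n * math.log(rss / n) + 2 * k


def select_by_aic(candidates, n):
    """Return the candidate name with the lowest AIC.

    `candidates` maps a model name to (rss, k); hypothetical structure.
    """
    def score(name):
        rss, k = candidates[name]
        return gaussian_aic(rss, n, k)

    return min(candidates, key=score)


# Made-up example: the smaller model fits almost as well with far fewer parameters.
candidates = {"all predictors": (410.0, 35), "subset": (412.0, 12)}
best = select_by_aic(candidates, n=1000)
print(best)
```

The 2k term penalises every additional predictor, so a slightly worse fit with many fewer variables can still win.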
11. Tests performed
5 tests combining the different algorithms and inputs
         All predictors   Manual selection   AIC selection
RF       x                x
MLR      x                x                  x
12. Methodology of the experiments
• Dividing the dataset into training (90%) and test (10%) sets
• Training the model using 10-fold cross-validation to avoid overfitting
• Calculating the adjusted R^2 index to measure the goodness of the model (percentage of variance explained)
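The three steps above can be sketched in plain Python. This is an illustration under assumptions, not the authors' pipeline: the slides do not say how the split was drawn or which software was used, and the function names and the seed are hypothetical. The adjusted R^2 uses the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1), with n observations and p predictors.

```python
import random


def train_test_split(indices, test_fraction=0.1, seed=0):
    """Shuffle the indices and hold out `test_fraction` of them as the test set."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(indices)
    for i in range(k):
        validation = indices[i::k]  # every k-th index, starting at offset i
        held_out = set(validation)
        yield [j for j in indices if j not in held_out], validation


def adjusted_r2(y_true, y_pred, n_predictors):
    """Adjusted R^2: R^2 penalised by the number of predictors."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# 3538 grid cells, as in the Milano grid; the cell indices stand in for rows.
train_idx, test_idx = train_test_split(range(3538))
folds = list(k_folds(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

The test set is touched only once, after cross-validation on the training part has been used to tune and compare the models.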
13. Results
1) Different output results: some variables are predicted better than others
2) Model comparison: RF always equals or outperforms MLR (the data follow a more complex distribution than a linear one)
3) Number of predictors: RF with manual selection is usually better than RF with all predictors, and MLR with AIC selection is better than the other MLR models. The higher the number of variables included in the model, the higher the risk of overfitting (larger difference between the R^2 of the training and test sets)
[Figure panels: MLR – all, MLR – manual selection, MLR – AIC selection, RF – all, RF – manual selection]
Adjusted R^2    RF - all          RF - manual selection
                Train    Test     Train    Test
population      0.668    0.623    0.604    0.591
residential     0.633    0.588    0.623    0.614
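The overfitting argument can be checked directly on the adjusted R^2 values in the table above by computing the gap between training and test scores. The short sketch below uses only the four rows reported on the slide; the dictionary layout is an assumption made for illustration.

```python
# (train, test) adjusted R^2 values copied from the table above.
adj_r2 = {
    ("population", "RF - all"): (0.668, 0.623),
    ("population", "RF - manual selection"): (0.604, 0.591),
    ("residential", "RF - all"): (0.633, 0.588),
    ("residential", "RF - manual selection"): (0.623, 0.614),
}

# A larger train/test gap suggests more overfitting.
gaps = {key: round(train - test, 3) for key, (train, test) in adj_r2.items()}

for (target, model), gap in gaps.items():
    print(f"{target:12s} {model:22s} gap = {gap:.3f}")
```

For both outputs the gap is smaller with the manually selected predictors than with all predictors, even though the test score itself improves only for the residential output.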
14. Predictors importance calculated by RF-all
[Figure: predictor importance plots, annotated "worse results in RF-manual selection" and "better results in RF-manual selection"; further annotations: "7 vars in the top10 out of the manually selected", "2 vars in the top10 out of the manually selected"]
Variable selection is an essential step in optimizing a predictive model
15. Conclusions
• Encouraging results in employing open and enterprise datasets in regression models
• Good results in predicting population, residential and agricultural areas -> explained variability reaching 62%
• There is a relation between land use/population and the diverse, heterogeneous datasets used as predictors (POIs and phone activity)
• Choosing the best predictors is an 'art'. A lot of relevant data is available about cities. A preprocessing phase is essential to select only the most informative and discriminative variables.
16. Future work
• Improvements on input variables: preprocessing predictors to extract more discriminative information from the data (changing the POI data from presence/absence to distance from the closest POI)
• Improvements on output variables: definition of new outputs that are easier to predict experimentally (dense residential, sparse residential, agricultural, industrial/commercial, parks and natural areas). Problems in predicting specific land uses (parks, sport centres) -> other kinds of input data may be required.
• Improvements on predictive algorithms: better results using Support Vector Machine (SVM) -> the urban environment is so complex that it cannot be modelled using linear models
• Reproducibility of our solution in different scenarios: comparable results obtained on other European cities (Barcelona, Muenchen and Brussels) -> the proposed methodology is successful.
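The first item, replacing POI presence/absence with the distance from the closest POI, could look like the sketch below. It is an illustration only: the slides do not specify how distances would be measured, so the haversine great-circle distance is an assumed choice, and the coordinates and function names are made up.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat-long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_to_closest_poi(cell_centre, pois):
    """Distance in metres from a cell centre to the nearest POI of one category."""
    lat, lon = cell_centre
    return min(haversine_m(lat, lon, p_lat, p_lon) for p_lat, p_lon in pois)


# Made-up cell centre and two made-up POIs of the same category.
cell_centre = (45.4642, 9.1900)
schools = [(45.4652, 9.1900), (45.4800, 9.2200)]
nearest = distance_to_closest_poi(cell_centre, schools)
print(round(nearest), "m")
```

Unlike a 0/1 flag, this feature still carries information for cells that contain no POI of the given category.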
17. Thank you! Any questions?
Gloria Re Calegari and Irene Celino
CEFRIEL – Politecnico di Milano