A DATA-DRIVEN APPROACH: URBAN WATER QUALITY PREDICTION THROUGH
UBIQUITOUS DATA
Presented By
P.GURU SAI (MCA II Year)
Reg. No: 22091F0014
Under the esteemed guidance of
MR.V.RAJA SEKHAR MCA, M. TECH
Assistant Professor, Dept. of CSE
DEPARTMENT OF MASTER OF COMPUTER APPLICATIONS
RAJEEV GANDHI MEMORIAL COLLEGE OF ENGINEERING & TECHNOLOGY
(AUTONOMOUS)
NANDYAL-518501, (Estd-1995)
A DATA-DRIVEN APPROACH:
URBAN WATER QUALITY PREDICTION
THROUGH UBIQUITOUS DATA
A Data-Driven Approach: Leveraging ubiquitous (Universal) data from multiple
domains to forecast water quality, addressing challenges through a multi-task
multi-view learning framework.
INDEX
Abstract
Introduction
Existing System
Proposed System
System Design
Modules
Proposed Algorithms
SDLC
UML Diagrams
Software and Hardware Requirements
Implementation
Testing
Future Enhancement
Snap Shots
Conclusion
 ABSTRACT
Urban water quality is of great importance to our daily lives.
Predicting urban water quality helps control water pollution and protect
human health.
This data-driven approach predicts urban water quality using
ubiquitous data.
It leverages a multi-task multi-view learning framework to fuse
heterogeneous data from various domains, capturing both local and global
information to forecast water quality.
The approach addresses the challenges of non-linear variations in water
quality and spatial correlations among different stations.
 INTRODUCTION
Urban water is a vital resource that affects many aspects of human health and
urban life.
People living in major cities are increasingly concerned about urban water
quality, calling for technology that can monitor and predict water quality in
real time throughout the city.
Urban water quality, which serves as “a powerful environmental determinant” and
“a foundation for the prevention and control of waterborne diseases”, refers to the
physical, chemical and biological characteristics of a water body.
Several chemical indices (such as residual chlorine, turbidity and pH) can be
used as effective measures of water quality in current urban water
distribution systems.
 EXISTING SYSTEM
Several studies in environmental science have attempted to analyze
water quality problems.
In general, these data-driven approaches in environmental science
fall into the following three major categories:
Instance-Based Learning (IBL) models
Artificial Neural Network (ANN) models
Support Vector Machine (SVM) models
 DISADVANTAGES
The existing system does not use multi-task and multi-view learning
approaches.
Instance-Based Learning (IBL) models: sensitive to noise, computationally
expensive, lack of interpretability.
Artificial Neural Network (ANN) models: long training time, fixed-size
limitation, lack of interpretability.
Support Vector Machine (SVM) models: difficulty choosing the right kernel,
long training time, difficult to interpret, prone to overfitting, sensitive
to hyperparameters.
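The instance-based learning category can be made concrete with a minimal k-nearest-neighbour regression sketch. It predicts a station's next residual chlorine value by averaging the targets of the k most similar stored instances. The feature layout (turbidity, pH), the sample values and the function name `knn_predict` are hypothetical illustrations, not the method of any particular study. The full scan over stored instances for every query shows why IBL models are described as computationally expensive.

```python
import math

def knn_predict(history, query, k=3):
    """Predict a target by averaging the k nearest stored instances.

    history: list of (feature_vector, target) pairs (hypothetical data).
    query:   feature vector to predict for.
    """
    if not history:
        raise ValueError("history must contain at least one instance")
    # Distance from the query to every stored instance: this full scan at
    # prediction time is what makes instance-based learning expensive.
    scored = sorted(
        (math.dist(features, query), target) for features, target in history
    )
    nearest = scored[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical instances: (turbidity, pH) -> next-hour residual chlorine (mg/L)
history = [
    ((0.5, 7.2), 0.60),
    ((0.6, 7.1), 0.58),
    ((2.0, 6.8), 0.30),
    ((0.4, 7.3), 0.62),
]
print(round(knn_predict(history, (0.5, 7.2), k=3), 2))
```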
 PROPOSED SYSTEM
The proposed system is a data-driven approach that leverages ubiquitous data from
various sources to predict urban water quality. It uses a multi-task multi-view learning framework to
fuse heterogeneous data from different domains, capturing both local and global information to forecast
water quality.
Key Components:
Data Collection:
Ubiquitous Data Sources: Collect data from various domains, including water quality monitoring stations,
hydraulic systems, meteorology, pipeline networks, road networks, and Points of Interest (POIs).
Data Integration: Combine the collected data into a unified dataset.
Multi-Task Multi-View Learning Framework:
Spatio-temporal View Alignment: Combine local spatial and temporal information of each station.
Prediction Alignment Among Stations: Capture spatial correlations among different stations and perform
co-predictions.
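The two alignment ideas can be illustrated with a toy objective. This is a simplified sketch under stated assumptions, not the exact STMTMV formulation: each station is one task with a temporal-view and a spatial-view linear prediction, the fused prediction is their average, and the loss adds squared prediction error, a view-alignment penalty for disagreement between the two views at a station, and a prediction-alignment penalty that pulls the predictions of similar stations together. The names `mtmv_loss`, `lam_view`, `lam_station` and the station-similarity matrix are hypothetical.

```python
def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mtmv_loss(stations, similarity, lam_view=0.1, lam_station=0.1):
    """Toy multi-task multi-view loss (simplified illustration).

    stations: one dict per station (one task per station) with
      'w_t', 'x_t': temporal-view weights and features,
      'w_s', 'x_s': spatial-view weights and features,
      'y': observed water quality value.
    similarity: similarity[i][j] >= 0, how strongly stations i and j
      should agree (hypothetical, e.g. derived from pipe distance).
    """
    preds = []
    loss = 0.0
    for st in stations:
        p_t = dot(st["w_t"], st["x_t"])  # temporal-view prediction
        p_s = dot(st["w_s"], st["x_s"])  # spatial-view prediction
        p = 0.5 * (p_t + p_s)            # fused prediction for this station
        preds.append(p)
        loss += (p - st["y"]) ** 2              # prediction error
        loss += lam_view * (p_t - p_s) ** 2     # spatio-temporal view alignment
    # Prediction alignment among stations: similar stations, similar outputs.
    n = len(stations)
    for i in range(n):
        for j in range(i + 1, n):
            loss += lam_station * similarity[i][j] * (preds[i] - preds[j]) ** 2
    return loss, preds

# Hypothetical two-station example with one feature per view.
stations = [
    {"w_t": [1.0], "x_t": [0.6], "w_s": [1.0], "x_s": [0.4], "y": 0.5},
    {"w_t": [1.0], "x_t": [0.5], "w_s": [1.0], "x_s": [0.5], "y": 0.5},
]
similarity = [[0.0, 1.0], [1.0, 0.0]]
loss, preds = mtmv_loss(stations, similarity)
print(round(loss, 4), preds)
```

In a real model the weights would be learned by minimizing such a loss; here they are fixed only to show how each term contributes.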
 ADVANTAGES
Improved Accuracy: The multi-task multi-view learning framework effectively
integrates data from various domains, leading to more accurate predictions of
urban water quality.
Captures Complex Factors: The system can capture complex factors affecting
water quality, including spatial and temporal factors, and their spatio-temporal
heterogeneity.
Global Information: The framework incorporates global information among
different stations, addressing the challenge of spatial correlations.
Flexibility and Scalability: The data-driven approach can be applied to various
urban water distribution systems, making it a flexible and scalable solution.
Real-Time Monitoring: The system can provide real-time water quality
predictions, enabling timely interventions to control water pollution and protect
human health.
System Design
Actual Problem
Water quality datasets are normally incomplete and noisy. As a result,
the traditional approach of reading and linking records directly from such
datasets tends to fail.
First, data quality and standardization are major issues.
Accurate predictions cannot be made from noisy or incomplete data.
The main problem is that existing work does not use a suitable framework or software
engineering approach to predict water quality accurately.
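A minimal preprocessing sketch for incomplete and noisy sensor series follows. It assumes, hypothetically, that readings outside a caller-supplied plausible range are noise and that gaps can be filled by linear interpolation between the nearest valid neighbours. The function name `clean_series` and the bounds are illustrative, not part of the original system.

```python
def clean_series(values, lower, upper):
    """Clean an hourly sensor series (hypothetical preprocessing sketch).

    - Values outside [lower, upper] are treated as noise and set to None.
    - Missing values (None) are filled by linear interpolation between the
      nearest valid neighbours; leading/trailing gaps copy the nearest value.
    """
    cleaned = [v if v is not None and lower <= v <= upper else None for v in values]
    valid = [i for i, v in enumerate(cleaned) if v is not None]
    if not valid:
        raise ValueError("series has no valid readings")
    result = list(cleaned)
    for i, v in enumerate(cleaned):
        if v is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            result[i] = cleaned[right]
        elif right is None:
            result[i] = cleaned[left]
        else:
            frac = (i - left) / (right - left)
            result[i] = cleaned[left] + frac * (cleaned[right] - cleaned[left])
    return result

# A missing reading and an implausible spike in a turbidity series (NTU).
print(clean_series([0.5, None, 0.7, 99.0, 0.9], lower=0.0, upper=5.0))
```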
 Solution
The data-driven approach uses five algorithms: Naïve Bayes,
K-Nearest Neighbours (KNN), Random Forest, Logistic Regression
and Linear SVM.
Of these, three are mainly used:
Naïve Bayes,
SVM and
Logistic Regression.
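In scikit-learn these three models correspond to `GaussianNB`, `LinearSVC` and `LogisticRegression`. To show how the first one works without any library, here is a minimal Gaussian Naïve Bayes sketch in plain Python. The features (turbidity, residual chlorine), the labels "Good" and "Poor" and the training values are hypothetical; the class is an illustration, not the project's actual code.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes classifier (illustrative sketch)."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        self.priors = {}
        self.stats = {}  # label -> list of (mean, variance) per feature
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(X)
            stats = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                stats.append((mean, var + 1e-9))  # epsilon avoids division by zero
            self.stats[label] = stats
        return self

    def _log_posterior(self, x, label):
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(x, self.stats[label]):
            # Log Gaussian density; features assumed independent given the class.
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        return [max(self.stats, key=lambda label: self._log_posterior(x, label)) for x in X]

# Hypothetical samples: (turbidity NTU, residual chlorine mg/L) -> quality type
X = [(0.3, 0.6), (0.4, 0.7), (0.5, 0.65), (3.0, 0.1), (2.5, 0.15), (3.5, 0.05)]
y = ["Good", "Good", "Good", "Poor", "Poor", "Poor"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([(0.45, 0.62), (2.8, 0.12)]))
```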
Modules

Service Provider
In this module, the Service Provider must log in with a valid user name and password. After a
successful login, the Service Provider can perform operations such as:
Train and Test Data Sets, View Trained and Tested Accuracy in Bar Chart, View Trained and
Tested Accuracy Results, View Predicted Water Quality Type, Find Water Quality Prediction Ratio,
Download Trained Data Sets, View Water Quality Prediction Ratio Results and View All Remote Users.
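The train/test, accuracy and prediction-ratio operations can be sketched as follows. This assumes, hypothetically, that "prediction ratio" means the percentage of predictions falling into each water quality type; the helper names `split_dataset`, `accuracy` and `prediction_ratio` are illustrative and not taken from the project.

```python
import random
from collections import Counter

def split_dataset(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def prediction_ratio(y_pred):
    """Percentage of predictions falling into each water quality type."""
    counts = Counter(y_pred)
    total = len(y_pred)
    return {label: 100.0 * count / total for label, count in counts.items()}

train, test = split_dataset(range(10), test_fraction=0.2)
print(len(train), len(test))
print(accuracy(["Good", "Poor", "Good", "Good"], ["Good", "Poor", "Poor", "Good"]))
print(prediction_ratio(["Good", "Poor", "Good", "Good"]))
```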
 View and Authorize Users
In this module, the admin can view the list of all registered users, including details such as
user name, email and address, and can authorize those users.
Remote User
In this module, there are n users. A user must register before
performing any operations. Once a user registers, their details are stored in the database.
After successful registration, the user logs in with an authorized user name and
password.
The Remote User module provides Register and Login, Predict Water Quality Type
and View Your Profile.
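The register, authorize and login flow can be sketched with Python's built-in `sqlite3` and `hashlib` modules. The table schema, the function names and the choice of PBKDF2 password hashing are assumptions made for this illustration; the original slides only state that user details are stored in a database and that the admin authorizes users.

```python
import hashlib
import hmac
import os
import sqlite3

def create_store():
    """Create an in-memory user table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (username TEXT PRIMARY KEY, email TEXT, address TEXT,"
        " salt BLOB, pw_hash BLOB, authorized INTEGER DEFAULT 0)"
    )
    return conn

def _hash(password, salt):
    # PBKDF2-HMAC-SHA256 so that plain-text passwords are never stored.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def register(conn, username, password, email, address):
    """Remote user registration: store details with a salted password hash."""
    salt = os.urandom(16)
    conn.execute(
        "INSERT INTO users (username, email, address, salt, pw_hash) VALUES (?, ?, ?, ?, ?)",
        (username, email, address, salt, _hash(password, salt)),
    )

def authorize(conn, username):
    """Admin action: mark a registered user as authorized."""
    conn.execute("UPDATE users SET authorized = 1 WHERE username = ?", (username,))

def login(conn, username, password):
    """Return True only for an authorized user with the correct password."""
    row = conn.execute(
        "SELECT salt, pw_hash, authorized FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored, authorized = row
    return bool(authorized) and hmac.compare_digest(stored, _hash(password, salt))
```

Checking the `authorized` flag at login is one way to make the admin's authorization step take effect; a user who has registered but has not been authorized cannot log in.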
 SOFTWARE REQUIREMENTS
Operating System : Windows 10/11
Programming : Python
Front End : Python
HARDWARE REQUIREMENTS
Processor : Intel Core i5
RAM : 4 GB
Hard disk : 512 GB
 IMPLEMENTATION
• PYTHON
• Python is an object-oriented, interpreted, high-level programming language. Its
design emphasizes readability. It has fewer syntactical constructions than
other languages and frequently uses English keywords where other languages use
punctuation.
• Python is Interpreted: Python is processed at runtime by the interpreter. You do
not need to compile your program before executing it. This is similar to PERL and
PHP.
• Python is Interactive: You can sit at a Python prompt and interact with the interpreter
directly to write your programs.
• Python is Object-Oriented: Python supports the object-oriented style of
programming, which encapsulates code within objects.
• Python is a Beginner's Language: Python is an excellent language for those just
starting out in programming, and it supports the development of a wide range of applications,
such as games, simple text processing and web browsers.
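As a small illustration of the object-oriented point above, the class below encapsulates a station's readings together with the behaviour that operates on them. `Station`, its methods and the residual chlorine unit are hypothetical names chosen only for this example.

```python
class Station:
    """Encapsulates one monitoring station's readings and behaviour."""

    def __init__(self, name):
        self.name = name
        self._readings = []  # residual chlorine readings (mg/L), kept internal

    def add_reading(self, value):
        if value < 0:
            raise ValueError("concentration cannot be negative")
        self._readings.append(value)

    def average(self):
        if not self._readings:
            raise ValueError(f"no readings recorded for {self.name}")
        return sum(self._readings) / len(self._readings)

s = Station("S1")
s.add_reading(0.5)
s.add_reading(0.7)
print(s.average())
```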
• Python Features
• Features of Python include:
• Easy to learn: Python has a simple structure, a small number of keywords and a clearly defined
syntax, which allows students to pick up the language quickly.
• Easy to read: Python code is clearly defined and easy to read.
• Easy to maintain: Python's source code is fairly easy to maintain.
• A broad standard library: Most of Python's library is highly portable and cross-platform compatible
on UNIX, Windows and Macintosh.
• Interactive Mode: Python has an interactive mode that allows interactive testing and debugging of code
snippets.
• Portable: Python can run on a wide variety of hardware platforms and has the same interface on all of
them.
• Extendable: You can add low-level modules to the Python interpreter. These modules enable programmers
to add to or customize their tools to be more efficient.
• Databases: Python provides interfaces to all major commercial databases.
• GUI Programming: Python supports GUI applications that can be created and ported to many system calls,
libraries and windowing systems, such as Windows MFC, Macintosh and the X Window system of Unix.
• Scalable: Python provides better structure and support for large programs than shell scripting.
TESTING
• Functional test
• Functional testing provides systematic demonstrations that the functions tested are available as
specified by the technical and business requirements, system documentation and user manuals.
• Functional testing is centered on the following items:
• Valid Input : Identified classes of valid input must be accepted.
• Invalid Input : Identified classes of invalid input must be rejected.
• Functions : Identified functions must be exercised.
• Output : Identified classes of application outputs must be
exercised.
• Systems/Procedures : Interfacing systems or procedures must be invoked.
• Functional tests are organized and prepared around requirements, key
functions or special test cases. In addition, systematic coverage of business process flows,
data fields, predefined processes and successive processes must be considered
for testing. Before functional testing is complete, additional tests are identified and the effectiveness
of the current tests is assessed.
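The valid-input and invalid-input items above can be expressed as functional test cases with Python's `unittest` module. The validator `parse_ph` is hypothetical; it assumes pH input arrives as text and must fall on the usual 0 to 14 scale.

```python
import unittest

def parse_ph(text):
    """Hypothetical input validator: accept a pH value between 0 and 14."""
    value = float(text)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 14.0:
        raise ValueError(f"pH out of range: {value}")
    return value

class ParsePhFunctionalTest(unittest.TestCase):
    def test_valid_inputs_are_accepted(self):
        for text, expected in [("7", 7.0), ("0", 0.0), ("14", 14.0), ("6.5", 6.5)]:
            self.assertEqual(parse_ph(text), expected)

    def test_invalid_inputs_are_rejected(self):
        for text in ["-1", "14.1", "abc", ""]:
            with self.assertRaises(ValueError):
                parse_ph(text)

if __name__ == "__main__":
    # exit=False lets the script continue after the test run.
    unittest.main(argv=["functional-test"], exit=False)
```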
System Test
• System testing ensures that the entire integrated software system meets its requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
White Box Testing
• White box testing is a type of software testing in which the tester knows the program's inner
workings, structure and language, or at least its purpose. It is used to test areas that cannot be
reached from a black-box level.
Black Box Testing
• Black box testing means testing software without any knowledge of the inner workings,
architecture or language of the module being tested. Like most other kinds of tests,
black box tests must be written from a definitive source document, such as a specification
or requirements document. This type of testing treats the software under test as a "black
box": you cannot "see" inside it. The test provides inputs and responds to outputs without
considering how the software works.
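The difference between the two styles can be seen on one small function. `turbidity_status` is hypothetical, and its internal 80% "near limit" threshold is an invented detail. The black-box checks use only the stated behaviour (readings above the limit are flagged); the white-box checks are written with knowledge of the code so that every branch, including the internal threshold and the error path, is executed.

```python
def turbidity_status(ntu, limit):
    """Hypothetical classifier for a turbidity reading against a limit."""
    if ntu < 0:
        raise ValueError("turbidity cannot be negative")
    if ntu > limit:
        return "exceeds limit"
    if ntu > 0.8 * limit:  # internal detail, invisible to a black-box tester
        return "near limit"
    return "normal"

def black_box_checks():
    # Derived only from the specification, without reading the code.
    assert turbidity_status(5.0, limit=1.0) == "exceeds limit"
    assert turbidity_status(0.1, limit=1.0) == "normal"

def white_box_checks():
    # Written from the code so that every branch is executed.
    try:
        turbidity_status(-0.1, limit=1.0)
    except ValueError:
        pass
    else:
        raise AssertionError("negative input should raise ValueError")
    assert turbidity_status(1.5, limit=1.0) == "exceeds limit"
    assert turbidity_status(0.9, limit=1.0) == "near limit"
    assert turbidity_status(0.5, limit=1.0) == "normal"

black_box_checks()
white_box_checks()
```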
SNAP SHOTS
CONCLUSION
• This presentation describes a novel data-driven approach that estimates the water quality at a station by combining multiple sources of urban
data.
• We evaluate the approach using Shenzhen's water quality data and a variety of urban data. The experimental results demonstrate the
effectiveness and efficiency of the approach.
• In particular, the approach outperforms the conventional residual chlorine (RC) decay model [2] and other classical time-series prediction models
(ARMA, Kalman) in terms of RMSE.
• The approach consists of two components, and the effectiveness of each is demonstrated through extensive experiments and
analysis.
• The first component is influential-factor identification, which identifies the factors that
affect urban water quality through extensive experiments and analysis in Sections 3 and 4.
• The second component is a spatio-temporal multi-view multi-task learning (STMTMV) framework that combines multi-view learning and
multi-task learning.
• Experiments show that STMTMV achieves a prediction accuracy of about 85% when forecasting the next 1-4 hours, outperforming the
single-task method (LR) by about 11% and the single-view methods (t-view and s-view) by about 11% and 12%, respectively.
• The code has been released at: https://www.microsoft.com/enus/research/publication/urbanwater-quality-prediction-based-multi-task-multi-view-learning2/
• In future work, we plan to address the water quality inference problem in urban water distribution
systems with a limited number of water quality monitoring stations.
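To show how a baseline is compared in terms of RMSE, here is a sketch that assumes the classical first-order residual chlorine decay form C(t) = C0 · exp(-k·t); reference [2] may define the RC decay model differently. The decay constant and the observed values below are hypothetical, not results from the Shenzhen experiments.

```python
import math

def first_order_decay(c0, k, hours):
    """Residual chlorine after each time t in hours, assuming C = C0 * exp(-k * t)."""
    return [c0 * math.exp(-k * t) for t in hours]

def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Hypothetical hourly observations (mg/L) and a decay-model forecast.
observed = [0.80, 0.74, 0.69, 0.65]
baseline = first_order_decay(0.80, k=0.07, hours=[0, 1, 2, 3])
print(round(rmse(observed, baseline), 3))
```

A lower RMSE for a learned model than for such a baseline, computed on the same held-out hours, is what an "outperforms in terms of RMSE" claim refers to.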
THANK YOU
