Yuwu Chen wastewater treatment

The urban wastewater treatment
Yuwu Chen
Department of Chemical Engineering
12/4/2014

Introduction
 Wastewater treatment is the process of removing contaminants from wastewater.

Introduction
 Water quality indices
 Chemical oxygen demand (COD): the amount of oxygen consumed when a strong
chemical oxidizing agent breaks down the organic material present in a given
water sample under controlled conditions (fixed temperature and reaction time).
 Biological oxygen demand (BOD): the amount of dissolved oxygen needed by
aerobic microorganisms to break down the organic material present in a given
water sample at a certain temperature over a specific time period.
COD and BOD measure the amount of organic compounds in water indirectly, so
the two values should be correlated.
 Suspended solids (SS)
 Volatile suspended solids (SSV)
 Sediments (SED)
 Inorganic elements (N-NH3, P, S, etc.)
 pH
These directly measure a specific contaminant or property of the water.

Data Description
 The dataset consists of daily sensor measurements from an urban wastewater treatment
plant.
 The data were collected by Manel Poch at the Universitat Autonoma de Barcelona, Bellaterra,
Barcelona, Spain.
 The full dataset was donated by Javier Bejar and Ulises Cortes at the Universitat Politecnica
de Catalunya, Barcelona, Spain, and is available at:
http://archive.ics.uci.edu/ml/machine-learning-databases/water-treatment/

Data Description
 Date
 In dd/mm/yy format: 1/1/90 to 30/10/91. Some days in this period are not
included.
 Water volume
 The daily flow volume to the plant in m3: 10005 to 60081
 Water quality indices (28 variables)
 Water quality indices were recorded before and/or after a process step.
 BOD, COD, SS, SSV, SED ...
 Performance (9 variables)
 Performance variables were calculated directly from the water quality indices. They
can be used to evaluate the performance of each process unit. Values range from 0.6% to 100%.
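The slides do not give the formula behind the performance variables. A minimal sketch, assuming each one is a percent removal between the input and output of a unit (the function name `percent_removal` and the numbers are illustrative, not taken from the dataset):

```python
def percent_removal(c_in: float, c_out: float) -> float:
    """Percent of a contaminant removed between a unit's input and output.

    Assumed formula: 100 * (in - out) / in. The slides only say that the
    performance variables are calculated from the water quality indices.
    """
    if c_in <= 0:
        raise ValueError("input concentration must be positive")
    return 100.0 * (c_in - c_out) / c_in


# Illustrative values only (not from the dataset).
bod_in, bod_out = 200.0, 50.0
print(percent_removal(bod_in, bod_out))
```

Under this assumption, a value near 100% means the unit removed almost all of that contaminant on that day, and a value near 0% means it removed almost none.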

Data Management
 Data transformation
The original variable “date” is a character string and is too long to use as a label, so I
transformed it into an integer day index, “day”:
date        day
1/1/1990    1
2/1/1990    2
……
30/10/1991  668
The rows of the data frame were then renamed using the variable “day”.
 Corrected the wrong format in the variable BOD.in3
 Subset data
 In this study, five water quality indices of the influent/effluent were used: pH, COD, BOD,
SS, SED.
 Omitted the missing values in each subset
influent1 >> Pretreatment >> influent2 >> Primary >> influent3 >> Secondary >> effluent
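The date-to-day transformation can be sketched as follows. This is a Python illustration, not the code used for the slides; the names `START` and `day_index` are hypothetical, and it assumes the day index simply counts calendar days from 1/1/1990.

```python
from datetime import date, datetime

START = date(1990, 1, 1)  # first day of the record, mapped to day 1


def day_index(text: str) -> int:
    """Convert a dd/mm/yyyy string into a 1-based day count from START."""
    d = datetime.strptime(text, "%d/%m/%Y").date()
    return (d - START).days + 1


print(day_index("1/1/1990"), day_index("2/1/1990"), day_index("30/10/1991"))
# prints: 1 2 668
```

Because the index counts calendar days, days missing from the record leave gaps in the numbering instead of shifting later days.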

Data Summary
 Paired plot example: influent1 (influent to the pretreatment unit)

Method Description
 Step 1: Principal component analysis (PCA) on each influent/effluent subset
 Visualize the data to see the relationships among the observations and
variables in low dimensions
 Step 2: Clustering days based on the daily performance
 Identify subgroups of similar days based on the daily performance of each
process unit or the whole plant

Step 1: Principal component analysis (PCA)
on influent1 subset
 Principal component loading vector of influent1 (influent to the pretreatment unit)
 Proportion of variance explained (PVE) by each PC and cumulative PVE
[Figure: two panels, proportion of variance explained and cumulative proportion of variance explained, each plotted against principal component (1 to 6) on a 0.0 to 1.0 axis]
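The loading vectors and the proportion of variance explained (PVE) can be computed from a singular value decomposition of the standardized data. A minimal sketch in Python with NumPy, using synthetic data as a stand-in for the influent1 subset (the function name `pca_pve` and the data are assumptions for illustration):

```python
import numpy as np


def pca_pve(X):
    """PCA on standardized columns.

    Returns (loadings, scores, pve, cumulative_pve); column j of
    `loadings` is the loading vector of principal component j + 1.
    """
    X = np.asarray(X, dtype=float)
    # Center each variable and scale it to unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal component loading vectors.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = s**2 / (Z.shape[0] - 1)
    pve = variances / variances.sum()
    return Vt.T, Z @ Vt.T, pve, np.cumsum(pve)


# Synthetic stand-in: 100 "days" x 6 variables, three of them correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(100, 3)),
               rng.normal(size=(100, 3))])

loadings, scores, pve, cum_pve = pca_pve(X)
print(np.round(pve, 3))
print(np.round(cum_pve, 3))
```

Plotting `pve` and `cum_pve` against the component number gives the two curves described on this slide; the first two columns of `scores` and `loadings` are what a biplot displays.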

PCA on influent1 subset
 Biplot for influent1

PCA on the other three influent/effluent subsets
 Biplots for the other three influent/effluent subsets
[Figure: three biplots (PC1 vs PC2) with observations labeled by day number. Panels: Influent2 (pretreatment >> primary), Influent3 (primary >> secondary), Effluent (out of plant). Variable labels include volume, pH, BOD, COD, SS and SED, with the suffixes .in2, .in3 and .out.]

Step 2: Clustering days based on the daily
performance
 What dissimilarity measure should be used to cluster the days?
 If Euclidean distance is used, days on which the process unit or the whole plant
has similar overall performance will be clustered together (this is the
desired behavior).
 If correlation-based distance is used, days with similar “preferences” (e.g.
days with better BOD and COD performance but worse SS and SED
performance) will be clustered together, even if some of these days had
better overall performance than others.
 Scale to unit variance or not?
 The data must be scaled; otherwise the water volume will dominate.
 Hierarchical clustering will be used.
 K-means or K-medoids?
 K-medoids is more robust than K-means in the presence of outliers.
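The difference between the two dissimilarity measures, and the effect of scaling, can be seen on a few made-up rows. This is a sketch with hypothetical values (performance in % for BOD, COD, SS, SED), not data from the plant:

```python
import numpy as np


def euclidean(a, b):
    """Straight-line distance: sensitive to the overall level."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


def correlation_distance(a, b):
    """1 - Pearson correlation: sensitive only to the shape of the profile."""
    return float(1.0 - np.corrcoef(a, b)[0, 1])


def scale_columns(X):
    """Center each column and scale it to unit variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


# Hypothetical daily performance (%) for BOD, COD, SS, SED.
day_a = [90.0, 88.0, 60.0, 62.0]
day_b = [50.0, 48.0, 20.0, 22.0]  # same profile as day_a, 40 points lower
day_c = [61.0, 63.0, 89.0, 87.0]  # same overall level as day_a, opposite profile

print(euclidean(day_a, day_b), euclidean(day_a, day_c))
print(correlation_distance(day_a, day_b), correlation_distance(day_a, day_c))

# Hypothetical rows of (flow volume in m3, pH): unscaled, volume swamps pH.
raw = np.array([[30000.0, 7.6], [45000.0, 7.9], [38000.0, 7.7]])
scaled = scale_columns(raw)
print(scaled.std(axis=0, ddof=1))
```

Euclidean distance puts day_a closer to day_c (similar overall level), while correlation-based distance puts day_a next to day_b (same profile, very different level). After scaling, both columns have unit variance, so neither dominates the distance.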

Hierarchical clustering: Average linkage
[Figure: dendrogram of days, average linkage; vertical axis: Height (0 to 12)]

Hierarchical clustering: Complete linkage
[Figure: dendrogram of days, complete linkage; vertical axis: Height (0 to 15)]

Hierarchical clustering: Single linkage
[Figure: dendrogram of days, single linkage; vertical axis: Height (0 to 10)]
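Average, complete and single linkage differ only in how the distance between two clusters is derived from the pairwise distances between their members. A naive sketch in pure Python, meant to show the rule and not to be efficient (the function name `agglomerate` and the toy points are hypothetical):

```python
from itertools import combinations
from math import dist

# How to combine the pairwise distances between two clusters.
LINKAGES = {
    "single": min,                              # closest pair
    "complete": max,                            # farthest pair
    "average": lambda ds: sum(ds) / len(ds),    # mean over all pairs
}


def agglomerate(points, linkage="average"):
    """Naive agglomerative clustering; returns the merge heights in order."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ds = [dist(points[i], points[j])
                  for i in clusters[a] for j in clusters[b]]
            h = combine(ds)
            if best is None or h < best[0]:
                best = (h, a, b)
        h, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (a < b)
        del clusters[b]
        heights.append(h)
    return heights


# Toy one-dimensional points, chosen so the three rules give different heights.
pts = [(0.0,), (1.0,), (3.0,), (7.0,)]
for name in ("single", "complete", "average"):
    print(name, agglomerate(pts, name))
```

The merge heights are what the vertical "Height" axis of a dendrogram shows. Complete linkage produces the largest heights and single linkage the smallest, which matches the ordering of the axis ranges in the three dendrograms.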

K-medoids clustering
[Figure: clusplot(pam(x = sdata, k = k, diss = diss)); axes: Component 1, Component 2]
These two components explain 80.33% of the point variability.
[Figure: silhouette plot of pam(x = sdata, k = k, diss = diss); horizontal axis: silhouette width s_i]
n = 430, 2 clusters, average silhouette width: 0.37

cluster j   n_j   average s_i
1           149   -0.01
2           281    0.57

[Figure: clusplot(pam(x = globalscale2, k = 3)); axes: Component 1, Component 2]
These two components explain 80.33% of the point variability.
[Figure: second silhouette plot, truncated in the source; n = 430]
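The plots above come from R's pam and clusplot. As a rough Python illustration of the same ideas, the sketch below runs a simple alternating k-medoids (not the full PAM swap search) on a distance matrix and computes silhouette widths. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def k_medoids(D, k, n_iter=100):
    """Simple alternating k-medoids on a distance matrix D."""
    # Deterministic start: most central point, then farthest-first.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            # New medoid: member with the smallest total distance to its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new[c] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def silhouette_widths(D, labels):
    """s(i) = (b - a) / max(a, b) for every observation i."""
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


# Synthetic stand-in: two well-separated groups of 30 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(8, 1, (30, 2))])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

medoids, labels = k_medoids(D, k=2)
s = silhouette_widths(D, labels)
print(medoids, round(float(s.mean()), 2))
```

A cluster whose average silhouette width is near zero or negative, like cluster 1 in the k = 2 result above (-0.01), contains many points that lie about as close to the other cluster as to their own.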

Conclusion
Water quality indices and flow volumes of the influent/effluent
were visualized with PCA to show the relationships
among the observations and variables in low dimensions.
Clustering methods were used to identify subgroups
of similar days.

