Keywords and examples of machine learning
Published in: Technology, Business

  • 1. Machine learning: Keywords + Applications
    1) Applications of machine learning:
       - wind power forecasting (important e.g. for PengHu island!)
       - rainfall estimation
    2) Some keywords (you must know what they mean):
       - black box / white box
       - shrinking horizon
       - objective function
       - “what you get is what you have”
       - model complexity
       - cross-validation
       - generative model
       - quantile, value-at-risk
  • 2. What you will see in these slides
    1) Applications of machine learning:
       - wind power forecasting (important e.g. for PengHu island!)
       - rainfall estimation
    2) Some keywords (you must know what they mean):
       - black box / white box
       - shrinking horizon
       - objective function
       - “what you get is what you have”
       - model complexity
       - cross-validation
       - generative model
       - quantile, value-at-risk
  • 3. I want to produce electricity
  • 4. I want to produce electricity
    I have:
    - water for hydroelectricity
    - a nuclear power plant
    - wind farms
    - gas turbines
  • 5. I want to produce electricity
    I must ensure, for each time step:
    Production of electricity = Demand of electricity
    Demand(t0), Demand(t1), Demand(t2), Demand(t3) are known.
  • 6. I want to produce electricity
    We get four equations:
    Production(t0) = Demand(t0)
    Production(t1) = Demand(t1)
    Production(t2) = Demand(t2)
    Production(t3) = Demand(t3)
    Other equation: Production = hydro production + nuclear production + wind-farm production + gas production
  • 7. I want to produce electricity
    We get four equations:
    H(t0) + W(t0) + N(t0) + G(t0) = Demand(t0)
    H(t1) + W(t1) + N(t1) + G(t1) = Demand(t1)
    H(t2) + W(t2) + N(t2) + G(t2) = Demand(t2)
    H(t3) + W(t3) + N(t3) + G(t3) = Demand(t3)
    The stock level x for hydro depends on production:
    x(1) = x(0) - H(0)
    x(2) = x(1) - H(1)
    x(3) = x(2) - H(2)
    x(4) = x(3) - H(3)
  • 8. Also depends on inflows
    We get four equations:
    H(t0) + W(t0) + N(t0) + G(t0) = Demand(t0)
    H(t1) + W(t1) + N(t1) + G(t1) = Demand(t1)
    H(t2) + W(t2) + N(t2) + G(t2) = Demand(t2)
    H(t3) + W(t3) + N(t3) + G(t3) = Demand(t3)
    Stock level for hydro: x(0); constraint: x(i) >= 0; I(t) = inflow at time t
    x(1) = x(0) + I(0) - H(0)
    x(2) = x(1) + I(1) - H(1)
    x(3) = x(2) + I(2) - H(2)
    x(4) = x(3) + I(3) - H(3)
  • 9. 8 equations (yes, it increases...)
    H(t0) + W(t0) + N(t0) + G(t0) = Demand(t0)
    H(t1) + W(t1) + N(t1) + G(t1) = Demand(t1)
    H(t2) + W(t2) + N(t2) + G(t2) = Demand(t2)
    H(t3) + W(t3) + N(t3) + G(t3) = Demand(t3)
    x(0) + I(0) - H(0) >= 0
    x(0) + I(0) - H(0) + I(1) - H(1) >= 0
    x(0) + I(0) - H(0) + I(1) - H(1) + I(2) - H(2) >= 0
    x(0) + I(0) - H(0) + I(1) - H(1) + I(2) - H(2) + I(3) - H(3) >= 0
  • 10. 8 equations (yes, it increases...)
    Nuclear has constraints as well:
    - N(1) in f(N(0))
    - N(2) in f(N(1))
    - N(3) in f(N(2))
    (very simplified; in fact there are stocks, refills...)
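The constraints of slides 5 to 10 can be checked mechanically for any candidate plan. A minimal Python sketch, assuming a single hydro reservoir and replacing the unspecified f of slide 10 with a hypothetical ramp limit |N(t+1) - N(t)| <= ramp; the function name `is_feasible` and all numbers are illustrative, not from the slides:

```python
def is_feasible(plan, demand, wind, inflows, x0, ramp, tol=1e-9):
    """Check a production plan against the constraints of slides 5-10.

    plan: dict with lists "H", "N", "G" (hydro, nuclear, gas), one value per step.
    wind: W(t) for each step (a forecast, since W(1), W(2), W(3) are unknown).
    ramp: hypothetical stand-in for "N(t+1) in f(N(t))".
    """
    H, N, G = plan["H"], plan["N"], plan["G"]
    # Demand balance: H(t) + W(t) + N(t) + G(t) = Demand(t) at every step.
    for t in range(len(demand)):
        if abs(H[t] + wind[t] + N[t] + G[t] - demand[t]) > tol:
            return False
    # Hydro stock: x(t+1) = x(t) + I(t) - H(t) must stay >= 0.
    x = x0
    for t in range(len(demand)):
        x += inflows[t] - H[t]
        if x < -tol:
            return False
    # Assumed nuclear ramp limit between consecutive steps.
    for t in range(1, len(demand)):
        if abs(N[t] - N[t - 1]) > ramp + tol:
            return False
    return True


demand = [100, 120, 110, 90]
wind = [20, 30, 10, 25]
inflows = [5, 5, 5, 5]
x0 = 40

# Gas-only plan of slide 12: G(t) = Demand(t) - W(t), H = N = 0.
gas_only = {"H": [0] * 4, "N": [0] * 4, "G": [d - w for d, w in zip(demand, wind)]}
# Uses 60 units of hydro at t0 although only x0 + I(0) = 45 are available.
too_much_hydro = {"H": [60, 0, 0, 0], "N": [0] * 4, "G": [20, 90, 100, 65]}

print(is_feasible(gas_only, demand, wind, inflows, x0, ramp=10))        # True
print(is_feasible(too_much_hydro, demand, wind, inflows, x0, ramp=10))  # False
```

Both plans satisfy the four demand equations; the second one fails only on the hydro stock inequality of slide 9.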
  • 11. OK! Summary?
    W(0), W(1), W(2), W(3): wind farm production. It cannot be chosen, and W(1), W(2), W(3) are unknown!
    To be chosen:
    - G(0), G(1), G(2), G(3): gas turbine production
    - H(0), H(1), H(2), H(3): hydroelectric production (can sometimes be negative)
    - N(0), N(1), N(2), N(3): nuclear power
  • 12. OK! Summary?
    To be chosen:
    - G(0), G(1), G(2), G(3): gas turbine production
    - H(0), H(1), H(2), H(3): hydroelectric production (can sometimes be negative)
    - N(0), N(1), N(2), N(3): nuclear power
    Constraints: production plans must satisfy the constraints.
    E.g., if gas turbine production is unlimited, we might decide (with H = N = 0):
    G(0) = Demand(0) - W(0), G(1) = Demand(1) - W(1), G(2) = Demand(2) - W(2), G(3) = Demand(3) - W(3)
    ==> it is a feasible solution
  • 13. OK! Summary?
    To be chosen:
    - G(0), G(1), G(2), G(3): gas production
    - H(0), H(1), H(2), H(3): hydroelectric production (can sometimes be negative)
    - N(0), N(1), N(2), N(3): nuclear power
    Constraints: production plans must satisfy the constraints.
    E.g., if gas production is unlimited, we might decide (with H = N = 0):
    G(0) = Demand(0) - W(0), G(1) = Demand(1) - W(1), G(2) = Demand(2) - W(2), G(3) = Demand(3) - W(3)
    ==> it is a feasible solution
    ==> it is a bad feasible solution
  • 14. OK! Summary?
    To be chosen:
    - G(0), G(1), G(2), G(3): gas production
    - H(0), H(1), H(2), H(3): hydroelectric production (can sometimes be negative)
    - N(0), N(1), N(2), N(3): nuclear power
    Constraints: production plans must satisfy the constraints.
    E.g., if gas production is unlimited, we might decide (with H = N = 0):
    G(0) = Demand(0) - W(0), G(1) = Demand(1) - W(1), G(2) = Demand(2) - W(2), G(3) = Demand(3) - W(3)
    ==> it is a feasible solution
    ==> it is a bad feasible solution
    Objective function: not all solutions are equivalent!
  • 15. OK! Summary?
    Production cost = Hcost * (H0 + H1 + H2 + H3) + Ncost * (N0 + N1 + N2 + N3) + Gcost * (G0 + G1 + G2 + G3) + Wcost * (W0 + W1 + W2 + W3)
    NB: cost does not only mean $. Cost also includes ecological and environmental costs.
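The objective function of slide 15 is what separates a good feasible plan from a bad one. A minimal sketch, with hypothetical unit costs (they could encode environmental costs as well as $) and two plans that both meet a demand of [100, 120, 110, 90] with wind [20, 30, 10, 25], assuming 40 units initially in the reservoir and inflows of 5 per step:

```python
def production_cost(plan, wind, unit_cost):
    """Objective function of slide 15: unit cost x total production, summed over sources.

    unit_cost maps "H", "N", "G", "W" to a (hypothetical) cost per unit of energy.
    """
    totals = {
        "H": sum(plan["H"]),
        "N": sum(plan["N"]),
        "G": sum(plan["G"]),
        "W": sum(wind),
    }
    return sum(unit_cost[source] * totals[source] for source in totals)


wind = [20, 30, 10, 25]
unit_cost = {"H": 1, "N": 2, "G": 5, "W": 0}   # illustrative numbers only

# Gas-only plan of slide 12: feasible, but expensive.
gas_only = {"H": [0] * 4, "N": [0] * 4, "G": [80, 90, 100, 65]}
# Another feasible plan for the same demand, using hydro and nuclear.
mixed = {"H": [10] * 4, "N": [50] * 4, "G": [20, 30, 40, 5]}

print(production_cost(gas_only, wind, unit_cost))  # 1675
print(production_cost(mixed, wind, unit_cost))     # 915
```

Minimizing this function subject to the constraints is the optimization problem; the sketch only evaluates two hand-picked plans.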
  • 16. Quiz!
    So we have:
    x0, x1, x2, x3: states at times t0, t1, t2, t3.
    x0 is given; x1, x2, x3 depend on our decisions.
    Some decisions are chosen at time t0.
    Some decisions are chosen at time t1.
    Some decisions are chosen at time t2.
    Some decisions are chosen at time t3.
    The cost depends on all decisions.
    Is this a supervised learning problem?
    Is this a reinforcement learning problem?
    Is this a boring problem?
  • 17. OK! Summary?
    So we have equations.
    If we know W(1), W(2), W(3), we can evaluate the production cost.
    We want to:
    - solve the equations
    - minimize the production cost
    Problem: we don't know W(1), W(2), W(3).
    How can we know them?
  • 18. OK! Summary?
    We want to know W(1), W(2), W(3).
    Steps:
    (1) Weather simulation: we predict the wind at time steps t1, t2, t3 (as in a classical weather forecast).
    (2) From the wind forecast, predict the power (e.g. with a “black box” model), based on data, e.g. by minimizing the mean-square error.
    Predicting W(1), W(2), W(3):
    - Boring problem?
    - Supervised learning problem?
    - Reinforcement learning problem?
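Step (2) is a regression problem fitted on historical (wind speed, power) pairs. A minimal sketch that fits power ≈ a + b·v³ by minimizing the mean-square error in closed form; the cubic feature is an assumption (the power carried by wind grows with the cube of its speed), and real turbine power curves also saturate at rated power, so a production model would need more than this. The data are synthetic:

```python
def fit_least_squares(xs, ys):
    """Closed-form fit of y ≈ a + b*x minimizing the mean-square error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_wind_to_power(speeds, powers):
    """Assumed model: power = a + b * speed**3, fitted on historical pairs."""
    a, b = fit_least_squares([v ** 3 for v in speeds], powers)
    return lambda v: a + b * v ** 3


def mean_square_error(model, speeds, powers):
    return sum((model(v) - p) ** 2 for v, p in zip(speeds, powers)) / len(speeds)


# Synthetic history (wind speed in m/s, power in arbitrary units).
speeds = [1, 2, 3, 4, 5]
powers = [2.5, 6.0, 15.5, 34.0, 64.5]   # exactly 2 + 0.5 * v**3
model = fit_wind_to_power(speeds, powers)

# Plug in the weather forecast for t1, t2, t3 to predict W(1), W(2), W(3).
wind_forecast = [3, 4, 6]
print([model(v) for v in wind_forecast])   # approximately [15.5, 34.0, 110.0]
```

Swapping `fit_wind_to_power` for any other regressor (a neural network, a random forest) keeps the same “black box” role: data in, power forecast out.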
  • 19. OK! Summary?
    We want to know W(1), W(2), W(3).
    Steps:
    (1) Weather simulation: we predict the wind at time steps t1, t2, t3 (as in a classical weather forecast).
    (2) From the wind forecast, predict the power (e.g. with a “black box” model), based on data, e.g. by minimizing the mean-square error.
    What does “black box” mean?
    Predicting W(1), W(2), W(3):
    - Boring problem?
    - Supervised learning problem?
    - Reinforcement learning problem?
  • 20. Difficulties?
    In many cases, you will see in your life as an engineer that:
    - collecting data and models is a big part of the work
    - solving the problem exactly is impossible
    - what really matters in an application is to find where the current code is not satisfactory, and not to spend time on other aspects
  • 21. Typical questions for this application
    - Many constraints / effects are missing! (For the real application, we need far more constraints...)
  • 22. Typical questions for this application
    - Many constraints / effects are missing! (For the real application, we need far more constraints...)
    - Mean-square error in the supervised learning for W1, W2, W3? But...
  • 23. Typical questions for this application
    - Many constraints / effects are missing! (For the real application, we need far more constraints...)
    - Mean-square error in the supervised learning for W1, W2, W3? But...
    - How many time steps in the future should we consider?
  • 24. Typical questions for this application
    - Many constraints / effects are missing! (For the real application, we need far more constraints...)
    - Mean-square error in the supervised learning for W1, W2, W3? But...
    - How many time steps in the future should we consider?
    - We should penalize cases with W4 small!
  • 25. Typical questions for this application
    - Many constraints / effects are missing! (For the real application, we need far more constraints...)
    - Mean-square error in the supervised learning for W1, W2, W3? But...
    - How many time steps in the future should we consider?
    - We should penalize cases with W4 small!
    - In the long term: should we consider a “climate change” bias?
  • 26. Typical questions for this application
    - Many constraints / effects are missing! (For the real application, we need far more constraints...)
    - Mean-square error in the supervised learning for W1, W2, W3? But...
    - How many time steps in the future should we consider?
    - We should penalize cases with W4 small!
    - In the long term: should we consider a “climate change” bias?
    Some of these points are important, some are negligible, depending on the system under analysis.
  • 27. Another beautiful application
    This is Paris. A beautiful town, with plenty of people (10 million in IDF, i.e. Île-de-France).
  • 28. Another beautiful application
    This is Paris. A beautiful town, with plenty of people (10 million in IDF, i.e. Île-de-France).
    Producing plenty of fecal matter ==> dirty water.
  • 29. Our river in Paris is the “Seine”.
    A French politician said he would soon swim across it. In the end, he never did. For your health, don't do it.
    Nevertheless, we try to keep it as clean as possible.
  • 30. Dirty water should be separated from the Seine. And usually it is.
    Something like this:
    [Diagram: the Seine and the dirty-water network, kept separate]
  • 31. Problem: if heavy rainfall reaches the dirty water, then the dirty water might pollute the Seine.
    [Diagram: Seine, dirty water]
  • 32. No typhoons in France.
    But we can have heavy rain and wind in Paris:
    - 0.96 dm in 24 hours happened in 1987.
    - gusts at 169 km/h in 1999 (very unusual in France).
    Problem: if heavy rainfall reaches the dirty water, then the dirty water might pollute the Seine.
    [Diagram: Seine, dirty water]
    (Yes, in Taiwan it is more impressive: sometimes 16.7 dm in 24 hours, and gusts can reach 250 km/h...)
  • 34. No typhoons in France.
    But we can have heavy rain and wind in Paris:
    - 0.96 dm in 24 hours happened in 1987.
    - gusts at 169 km/h in 1999 (very unusual in France).
    Problem: if heavy rainfall reaches the dirty water, then the dirty water might pollute the Seine.
    [Diagram: Seine, dirty water; dirty water → Seine!]
    (Yes, in Taiwan it is more impressive: sometimes 16.7 dm in 24 hours, and gusts can reach 250 km/h...)
  • 35. Another beautiful application
    Three water networks:
    - dirty water: should go to cleaning stations
    - clean water: can go to the Seine, but can't be drunk
    - drinkable water (in France, tap water is drinkable)
  • 36. Big water network
    [Diagram: a network of dirty-water stocks and clean-water stocks]
  • 37. Water vs dirty water
    Challenge: summer storms.
    Not comparable to a Taiwanese typhoon, but a lot of water.
    The volume of dirty water can become very large, and dirty water can invade the clean water.
    Your mission:
    - Get rid of dirty water
    - Protect clean water
  • 38. Water vs dirty water
    State: level in each stock, status of each valve (open or closed).
    At each time step:
    - rainfalls(i) liters of water reach stock i;
    - you can open or close valves ==> you get a new state.
    Your mission:
    - Get rid of dirty water
    - Protect clean water
  • 39. Water vs dirty water
    Typically, a state looks like:
    (0, 1, 0, 0, 0, 1, 0, 1, 0.42, 0.2, 0.0, 0.8, 0.3)
    (valves first, then stock levels)
    Plenty of rules:
    - if valve 4 opens, then water from stock 1 goes to stock 2 at a rate of 0.02 m³/s
    - if stock[2] > 0.3, then dirty water ==> Seine, at 0.1 m³/s
    ==> Minimize the quantity of dirty water in clean stocks at the end of the storm
  • 40. Water vs dirty water
    Equations:
    Stocks(t+1) = complicatedFunction(Stocks(t), rainfalls(t), valves(t))
    - Stocks(t), Stocks(t+1): D-dimensional vectors (D = number of stocks)
    - rainfalls(t): D-dimensional vector (D = number of stocks)
    - valves(t): V-dimensional vector (V = number of valves)
  • 41. Water vs dirty water
    To be decided: valves(t) for each t (a V-dimensional vector, V = number of valves).
    If there are 240 time steps, we get 240 x V decision variables.
    Criterion = objective function = quantity of dirty water reaching the clean network + quantity of dirty water in the river
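Slides 40 and 41 describe a simulate-then-score structure: a transition function maps (Stocks(t), rainfalls(t), valves(t)) to Stocks(t+1), and the objective adds up the dirty water that escapes along the way. A minimal sketch with a made-up two-stock network standing in for complicatedFunction: stock 0 collects dirty water and is drained by a treatment station, valve 0 diverts dirty water into stock 1 (a retention tank), overflow of stock 0 goes to the Seine, and overflow of the tank reaches the clean network. `toy_step`, `objective` and all capacities and rates are hypothetical:

```python
def toy_step(levels, rain, valves, capacity=1.0, rate=0.3, treatment=0.1):
    """Hypothetical stand-in for complicatedFunction with D = 2 stocks, V = 1 valve.

    Returns (new_levels, pollution), where pollution is the dirty water that
    overflowed during this step (into the Seine or into the clean network).
    """
    main, tank = levels
    main += rain[0]
    tank += rain[1]
    if valves[0]:                      # valve open: divert dirty water to the tank
        moved = min(rate, main)
        main -= moved
        tank += moved
    main = max(0.0, main - treatment)  # treatment station drains stock 0
    pollution = max(0.0, main - capacity) + max(0.0, tank - capacity)
    return [min(main, capacity), min(tank, capacity)], pollution


def objective(x0, rainfalls, valve_schedule):
    """Simulate Stocks(t+1) = toy_step(...) and sum the pollution over all steps."""
    levels, total = list(x0), 0.0
    for rain, valves in zip(rainfalls, valve_schedule):
        levels, pollution = toy_step(levels, rain, valves)
        total += pollution
    return total


T, V = 4, 1                       # slide 41 has T = 240 steps, i.e. 240 x V decisions
rainfalls = [(0.5, 0.0)] * T      # synthetic storm: rain on the dirty stock only
x0 = [0.5, 0.0]

print(objective(x0, rainfalls, [(0,)] * T))  # valves always closed: about 1.1
print(objective(x0, rainfalls, [(1,)] * T))  # valves always open: about 0.2
```

The decision variables are the whole `valve_schedule` (T x V open/closed values); an optimizer searches over it to minimize `objective`.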
  • 42. Shrinking horizon
    Too many time steps!
    At each time step, make a decision using only the next 30 time steps.
    Move this window of 30 time steps.
  • 46. Shrinking horizon: moving window of 30 time steps
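The moving-window scheme of slides 42 to 46 can be sketched as follows: at each time step, optimize the valve settings over the next `window` steps using the rainfall forecast, apply only the first decision, then move the window forward. The two-stock dynamics in `toy_step` are hypothetical, and the brute-force planner enumerates all 2^(window x V) schedules, so it is only usable for tiny windows; with the slides' 30-step window, a real optimizer would have to replace it:

```python
from itertools import product


def toy_step(levels, rain, valves, capacity=1.0, rate=0.3, treatment=0.1):
    """Hypothetical 2-stock dynamics (stand-in for complicatedFunction).

    Stock 0 collects dirty water and is drained by a treatment station; opening
    valve 0 diverts up to `rate` into stock 1 (a retention tank). Overflow of
    either stock counts as pollution. Returns (new_levels, pollution).
    """
    main, tank = levels
    main += rain[0]
    tank += rain[1]
    if valves[0]:
        moved = min(rate, main)
        main -= moved
        tank += moved
    main = max(0.0, main - treatment)
    pollution = max(0.0, main - capacity) + max(0.0, tank - capacity)
    return [min(main, capacity), min(tank, capacity)], pollution


def window_cost(levels, rain_window, schedule):
    """Total pollution when `schedule` is applied over the forecast window."""
    total = 0.0
    for rain, valves in zip(rain_window, schedule):
        levels, pollution = toy_step(levels, rain, valves)
        total += pollution
    return total


def best_schedule(levels, rain_window, n_valves):
    """Brute force over all open/closed schedules: fine only for tiny windows."""
    settings = list(product([0, 1], repeat=n_valves))
    return min(product(settings, repeat=len(rain_window)),
               key=lambda schedule: window_cost(levels, rain_window, schedule))


def moving_window_control(x0, rain_forecast, n_valves=1, window=30):
    """At each step, plan over the next `window` steps and apply only the first decision."""
    levels, applied, total = list(x0), [], 0.0
    for t in range(len(rain_forecast)):
        plan = best_schedule(levels, rain_forecast[t:t + window], n_valves)
        decision = plan[0]
        levels, pollution = toy_step(levels, rain_forecast[t], decision)
        applied.append(decision)
        total += pollution
    return applied, total


applied, total = moving_window_control([0.5, 0.0], [(0.5, 0.0)] * 4, n_valves=1, window=3)
print(applied, total)
```

Re-planning at every step is what makes the scheme robust to forecast updates: in practice, `rain_forecast` would be refreshed before each call to `best_schedule`.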
  • 47. Summary?
    Is this:
    - an optimization problem?
    - a reinforcement learning problem?
    - a supervised machine learning problem?
  • 48. Summary?
    Is this:
    - an optimization problem?
    - a reinforcement learning problem?
    - a supervised machine learning problem?
    Problem: the rainfalls are unknown.
  • 49. How to predict rainfalls?
    In fact, there are two distinct rainfall quantities:
    - R1: a spatial distribution of rainfall (one number per time step per point of the map)
    - R2: a list of underground rainfall arrivals (inflows) per stock (D-dimensional)
    Input data:
    - weather forecast (R1(t) for each t)
    - archives of weather forecasts R1(t)
    - archives of inflows R2(t)
  • 50. If your life depended on it, what would you do?
  • 51. If your life depended on it, what would you do?
    We are at time t.
    We need a forecaster:
    - which takes the available data as input
    - and outputs R2(t') for t' >= t (why not for t' < t?)
  • 52. If your life depended on it, what would you do?
    We are at time t.
    We need a forecaster:
    - which takes the available data as input
    - and outputs R2(t') for t' >= t (why not for t' < t?)
    (R2(t+1), R2(t+2), R2(t+3), ..., R2(t+30)) = ?
  • 53. If your life depended on it, what would you do?
    We are at time t.
    We need a forecaster:
    - which takes the available data as input
    - and outputs R2(t') for t' >= t (why not for t' < t?)
    (R2(t+1), R2(t+2), R2(t+3), ..., R2(t+30)) = f(R1(t))?
  • 54. If your life depended on it, what would you do?
    We are at time t.
    We need a forecaster:
    - which takes the available data as input
    - and outputs R2(t') for t' >= t (why not for t' < t?)
    (R2(t+1), R2(t+2), R2(t+3), ..., R2(t+30)) = f(R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50))
    (because there are delays)
  • 55. If your life depended on it, what would you do?
    We are at time t.
    We need a forecaster:
    - which takes the available data as input
    - and outputs R2(t') for t' >= t (why not for t' < t?)
    (R2(t+1), R2(t+2), R2(t+3), ..., R2(t+30)) = f(R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t))
    (because “what you get is what you have”)
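Turning the archives into training data for f is mostly bookkeeping: each time t gives one input vector (current and lagged R1, plus the observed R2(t)) and one target vector (the next 30 values of R2). A minimal sketch, assuming R1 has already been reduced to one number per time step and R2 covers a single stock; `make_dataset` is a hypothetical name and the archives are synthetic:

```python
def make_dataset(r1, r2, n_lags=50, horizon=30):
    """Build (input, target) pairs for the forecaster of slides 51-55.

    r1[t]: rainfall forecast R1 at time t (assumed reduced to one number).
    r2[t]: observed inflow R2 at time t (assumed to be a single stock).
    Input at time t:  R1(t), R1(t-1), ..., R1(t-n_lags), R2(t)
    Target at time t: R2(t+1), ..., R2(t+horizon)
    """
    inputs, targets = [], []
    for t in range(n_lags, len(r1) - horizon):
        x = [r1[t - k] for k in range(n_lags + 1)] + [r2[t]]
        y = [r2[t + h] for h in range(1, horizon + 1)]
        inputs.append(x)
        targets.append(y)
    return inputs, targets


# Synthetic archives, only to show the shapes.
r1 = list(range(100))
r2 = [10 * v for v in range(100)]
X, Y = make_dataset(r1, r2)
print(len(X), len(X[0]), len(Y[0]))   # 20 52 30
```

Any regression model can then be fitted on (X, Y); note how few examples remain (20 out of 100 time steps) once 50 lags and 30 future steps are required, which is one reason to limit the number of inputs.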
  • 56. If your life depended on it, what would you do?
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t))
    and then aggregation:
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1) + R1(t-2), R1(t-3) + R1(t-4) + R1(t-5) + R1(t-6), ..., R2(t))
    Why?
  • 57. If your life depended on it, what would you do?
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t))
    and then aggregation:
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1) + R1(t-2), R1(t-3) + R1(t-4) + R1(t-5) + R1(t-6), ..., R2(t))
    Because there are fewer parameters.
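The aggregation above sums older lags in wider and wider blocks: R1(t) alone, then two lags, then four. A minimal sketch, assuming the block sizes keep doubling (1, 2, 4, 8, ...) beyond what the slide shows; `aggregate_lags` is a hypothetical name:

```python
def aggregate_lags(lags):
    """Sum lagged values in blocks of size 1, 2, 4, 8, ... (assumed doubling).

    lags[k] = R1(t-k). Returns [R1(t), R1(t-1)+R1(t-2), R1(t-3)+...+R1(t-6), ...];
    the last block is truncated if the lags run out.
    """
    features, start, size = [], 0, 1
    while start < len(lags):
        features.append(sum(lags[start:start + size]))
        start += size
        size *= 2
    return features


print(aggregate_lags([0, 1, 2, 3, 4, 5, 6]))   # [0, 3, 18]
print(aggregate_lags([1] * 51))                # [1, 2, 4, 8, 16, 20]
```

With the 51 lags R1(t), ..., R1(t-50), this leaves 6 features instead of 51. Under the rule of thumb of slide 58 (parameters < data points / 20), a linear f on the 52 raw inputs (51 lags plus R2(t)) would call for more than 1040 training examples, versus more than 140 for the 7 aggregated inputs.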
  • 58. If your life depended on it, what would you do?
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t))
    and then aggregation:
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1) + R1(t-2), R1(t-3) + R1(t-4) + R1(t-5) + R1(t-6), ..., R2(t))
    Because there are fewer parameters.
    Rule of thumb: number of parameters < number of data points / 20 <=== why?
  • 59. If your life depended on it, what would you do?
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t))
    and then aggregation:
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1) + R1(t-2), R1(t-3) + R1(t-4) + R1(t-5) + R1(t-6), ..., R2(t))
    Because there are fewer parameters.
    Rule of thumb: number of parameters < number of data points / 20 <=== why?
    How to choose between all these models?
  • 60. If your life depended on it, what would you do?
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1), R1(t-2), R1(t-3), R1(t-4), ..., R1(t-50), R2(t))
    and then aggregation:
    (R2(t+1), ..., R2(t+30)) = f(R1(t), R1(t-1) + R1(t-2), R1(t-3) + R1(t-4) + R1(t-5) + R1(t-6), ..., R2(t))
    Because there are fewer parameters.
    Rule of thumb: number of parameters < number of data points / 20 <=== why?
    How to choose between all these models? Cross-validation.
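Cross-validation estimates each candidate model's error on data it was not trained on, then keeps the model with the lowest estimate. A minimal k-fold sketch comparing two hypothetical candidates (a constant predictor and a straight line) on synthetic data; the folds are contiguous blocks, a common choice for time series because shuffled folds let neighbouring time steps leak between training and test:

```python
def contiguous_folds(n, k):
    """Split indices 0..n-1 into k contiguous blocks (keeps time order within a fold)."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]


def cross_val_mse(fit, xs, ys, k=5):
    """k-fold cross-validation estimate of the mean-square error of `fit`."""
    n = len(xs)
    squared_errors = []
    for fold in contiguous_folds(n, k):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_errors.extend((model(xs[i]) - ys[i]) ** 2 for i in fold)
    return sum(squared_errors) / n


def fit_constant(xs, ys):
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return lambda x: a + b * x


xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]   # synthetic data
candidates = [("constant", fit_constant), ("line", fit_line)]
scores = {name: cross_val_mse(fit, xs, ys) for name, fit in candidates}
best = min(scores, key=scores.get)
print(scores, best)   # "line" has by far the lower cross-validated error
```

The same loop applies to the forecasters of slides 56 to 60: each choice of lags and aggregation is one candidate `fit`.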
  • 61. Main weakness of this analysis?
    The same as in the previous application.
    We predicted R2(t), R2(t+1), ...
    Then we maximize cleanness based on these forecasts.
    But there are huge uncertainties.
  • 62. Main weakness of this analysis?
    This is often done in the real world. Often, we do not spend time checking that the consequences are minor.
    “Cheap” solutions (they do not take too much time):
    - predicting a quantile (do you know how?) instead of a conditional expectation, and checking on simulations. No change to the optimization algorithm (we are just pessimistic in the forecasts).
    - predicting a conditional expectation + moments (do you know how?), then optimizing on average (slight change in the objective function).
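One standard way to predict a quantile is to minimize the pinball (quantile) loss instead of the squared error: its minimizer is the tau-quantile rather than the mean. A minimal sketch, showing that the best constant under this loss is an empirical quantile, and using it for one simple pessimistic forecast (point forecast plus a high quantile of past forecast errors); `best_constant_quantile` and `pessimistic_forecast` are hypothetical names, and this residual-shift is only one of several ways to do it:

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss: its expected value is minimized by the tau-quantile of y."""
    diff = y - q
    return tau * diff if diff >= 0 else (tau - 1) * diff


def best_constant_quantile(ys, tau):
    """Constant minimizing the average pinball loss (a minimizer is always a data point)."""
    return min(ys, key=lambda q: sum(pinball_loss(y, q, tau) for y in ys))


def pessimistic_forecast(point_forecast, past_errors, tau=0.9):
    """Shift a point forecast by the tau-quantile of past errors (observed - forecast)."""
    return point_forecast + best_constant_quantile(past_errors, tau)


print(best_constant_quantile([1, 2, 3, 100, 5], 0.5))          # 3 (the median)
print(best_constant_quantile(list(range(1, 11)), 0.85))        # 9
print(pessimistic_forecast(100, list(range(1, 11)), tau=0.85))  # 109
```

For rainfall, “pessimistic” means a high quantile (tau close to 1); for wind production in the first application, it would be a low one. The optimizer itself is unchanged: it simply receives the shifted forecast.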
  • 63. What about an exact solution?
    The exact solution is much harder to implement.
    We can use forecasts with moments. Then we get an MDP (Markov decision process). Then, this is reinforcement learning.
    - simple: forecasting + optimizing
    - a bit more complex: pessimistic forecasting + optimizing
    - more complex: forecasting with moments + optimizing on average or optimizing a quantile (“value at risk”)
    - advanced: full reinforcement learning model
  • 64. What about an exact solution?
    The best choice depends on the precision of your model and on the budget you have.
    Some problems involve billions of US $ and have precise models. Then each percent of improvement represents more money than you will earn in your whole professional life, so you can (and must) implement something very advanced.
    Sometimes, models are very imprecise. Then optimizing to 0.001% is meaningless; improving the model is more important.
    - simple: forecasting + optimizing
    - a bit more complex: pessimistic forecasting + optimizing
    - more complex: forecasting with moments + optimizing on average or optimizing a quantile (“value at risk”)
    - advanced: full reinforcement learning model
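“Optimizing on average” and “optimizing a quantile” can pick different decisions from the same scenarios. A minimal sketch, assuming a hypothetical generative model that draws storm scenarios (“normal” or “extreme”) and a hypothetical cost table with a cheap-but-risky and an expensive-but-safe decision; the value at risk here is the empirical alpha-quantile of the cost, one common convention among several:

```python
import math
import random


def value_at_risk(costs, alpha=0.95):
    """Empirical alpha-quantile of the costs (one common convention)."""
    ordered = sorted(costs)
    k = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[k]


def choose_decision(decisions, scenarios, cost, criterion="mean", alpha=0.95):
    """Pick the decision minimizing the average cost ("mean") or, otherwise, the value at risk."""
    def score(decision):
        costs = [cost(decision, s) for s in scenarios]
        if criterion == "mean":
            return sum(costs) / len(costs)
        return value_at_risk(costs, alpha)
    return min(decisions, key=score)


def sample_scenarios(n, p_extreme=0.1, seed=0):
    """Hypothetical generative model: each storm scenario is 'normal' or 'extreme'."""
    rng = random.Random(seed)
    return ["extreme" if rng.random() < p_extreme else "normal" for _ in range(n)]


# Hypothetical costs: "risky" is cheap unless the storm is extreme.
COSTS = {"risky": {"normal": 1, "extreme": 50}, "safe": {"normal": 8, "extreme": 8}}


def toy_cost(decision, scenario):
    return COSTS[decision][scenario]


scenarios = ["normal"] * 9 + ["extreme"]   # 10 % extreme storms
print(choose_decision(["risky", "safe"], scenarios, toy_cost, "mean"))  # risky (5.9 vs 8)
print(choose_decision(["risky", "safe"], scenarios, toy_cost, "var"))   # safe (8 vs 50)

# In practice, the scenarios would be drawn from the generative model:
sampled = sample_scenarios(1000)
print(choose_decision(["risky", "safe"], sampled, toy_cost, "mean"))
```

Which criterion is right is a modelling decision: the average suits costs that repeat many times, while the value at risk protects against rare but unacceptable outcomes such as a polluted river.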
  • 65. What do you think? Did you understand?
    1) Applications of machine learning:
       - wind power forecasting (important e.g. for PengHu island!)
       - rainfall estimation
    2) Some keywords (you must know what they mean):
       - black box / white box
       - shrinking horizon
       - objective function
       - “what you get is what you have”
       - model complexity
       - cross-validation
       - generative model
       - quantile, value-at-risk
    ===> olivier.teytaud@inria.fr
