BIG DATA FOR MANAGEMENT
MBA 748A
Department of Industrial and Management Engineering
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
DETAILED PROJECT PROPOSAL
POWER PLANT INTELLIGENT MAINTENANCE
SYSTEM
BY:
SHIVAM GUPTA -16125039
MAHENDRA KUMAR-16114016
A. INTRODUCTION
Power generation has seen several technological breakthroughs since its advent in 1882, when
Thomas Edison set up the first thermal power plant in England. Since then, power technology has
expanded in both fuel sources and power production methods. Today this complex ecosystem
involves Fuel Production Units (e.g. CIL), Logistics for Fuel, Power Producers (e.g. Adani Power),
Power Equipment Suppliers (e.g. L&T), Grids (e.g. PGCIL), DISCOMs (e.g. Torrent Power), Users,
Financial Institutions (e.g. PFC) and Environmental Protection Institutions (e.g. NGT).
The problem we address here is power plant maintenance. A power plant has a number of critical
equipment items that are necessary for its smooth operation. As the plant ages, the failure rate of
this equipment increases, leading to repeated plant shutdowns. The cause of a problem is sometimes
not recognizable on site and needs the expertise of the OEM (Original Equipment Manufacturer),
whose engineers go through the operational data stored on the power plant servers to identify it.
Service engineers for critical packages such as the STG (steam turbine generator) and boiler
charge roughly Rs 10,000 to Rs 50,000 per day. Since they are informed only when the situation is
grave, it then takes about 4 to 7 days at domestic sites and 30 to 40 days at international sites.
Other factors, such as the availability of support labor and spares, also matter.
Power plant operation involves continuous monitoring of more than 1000 data points, relayed
directly from sensors on critical equipment. These are monitored by the power plant operation
engineer, who adjusts air, fuel, water etc. on the basis of transmitter readings and experience to
obtain the required power output.
However, due to wear and tear in moving parts, some sensor readings go bad, and if the operator
is unable to recognize the reason behind these bad readings, equipment breakdown may follow.
Further, a bad pattern of system readings may keep repeating itself, which is also difficult to
recognize. And even if the pattern is recognized, the resulting damage to design life cannot
be predicted.
The objective of this project proposal is to design a Power Plant Intelligent Maintenance
System (IMS) with the following aims:
1. Enable preventive maintenance so that complete maintenance is avoided.
2. Reduce power plant breakdowns to almost zero.
B. DATA FLOW DIAGRAM
Fig. 1. Data Flow Diagram for Power Plant Intelligent Maintenance System
[Diagram not reproduced; its block labels are listed below.]
PRIMARY INPUTS (from equipment control panels; equipment secondary data is embedded in the
control loops):
1. Equipment Temperature
2. Equipment Pressure
3. Equipment Vibration
4. RPM
5. Equipment Material Flow Rate
6. Motor Current and Voltage
7. Operator Inputs
8. Time
PRIMARY INPUTS (ERP): Levels of Spares/Consumables Inventory
SECONDARY INPUTS (from OEM specification):
1. Operating Temperature Range
2. Operating Pressure Range
3. Operating Vibration Range
4. Operating RPM and Flow
5. Maximum Current & Voltage Readings
6. Weather
SECONDARY INPUTS (from Original Equipment Manufacturer): historic temperature, pressure
readings etc. for equipment; design life period
PROCESS BLOCKS:
- Data acquisition using transmitters/sensors
- Storage of primary inputs equipment-wise on servers
- Pre-processing data and analysis
- Comparison of primary data with secondary data (GOOD / BAD)
- Analysis of BAD points
- Analysis for expected life of parts of equipment
- Analysis for preventive maintenance
- Report generation
- Auto-calibrate inputs using historical GOOD data points such that expected plant life >= design life
- Power plant confirmation for maintenance
- Power plant shutdown for maintenance
DECISION POINTS:
- Is the frequency of BAD points beyond the acceptable level?
- Is expected life < design life of equipment?
- Are spares available?
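The comparison step in the diagram (primary data checked against secondary data, then a check on how often BAD points occur) can be sketched as follows. This is a minimal illustration only: the `OperatingRange` class, the 5% acceptable threshold, the temperature range and the sample readings are all hypothetical and are not taken from any OEM specification.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class OperatingRange:
    """Safe operating range from the OEM specification (secondary data)."""
    low: float
    high: float


def classify(value: float, rng: OperatingRange) -> str:
    """Label one primary reading GOOD if it lies inside the range, else BAD."""
    return "GOOD" if rng.low <= value <= rng.high else "BAD"


def bad_fraction(readings: Sequence[float], rng: OperatingRange) -> float:
    """Fraction of readings that fall outside the operating range."""
    if not readings:
        return 0.0
    bad = sum(1 for v in readings if classify(v, rng) == "BAD")
    return bad / len(readings)


def needs_bad_point_analysis(readings: Sequence[float], rng: OperatingRange,
                             acceptable_bad_fraction: float = 0.05) -> bool:
    """Decision point: is the frequency of BAD points beyond the acceptable level?"""
    return bad_fraction(readings, rng) > acceptable_bad_fraction


# Hypothetical bearing temperature range (deg C) and a short window of readings.
bearing_temp_range = OperatingRange(low=40.0, high=85.0)
readings = [62.0, 70.5, 88.1, 91.3, 73.0, 66.2, 86.4, 71.9, 69.0, 64.4]

print(bad_fraction(readings, bearing_temp_range))
print(needs_bad_point_analysis(readings, bearing_temp_range))
```

In practice the acceptable threshold would be set per equipment and per signal, and the window of readings would come from the stored primary inputs.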
C. WHAT DATA WILL YOU REQUIRE FOR YOUR
APPLICATION?
Data | Type of Data | Style of Data | Source of Data | Volume | Data Req.
-----|--------------|---------------|----------------|--------|----------
Weather | Secondary | Structured | Meteorological Dept./Forecasting Agencies | Large Volume | 6 TB/day
Safe Running Parameters of Equipment (1. Operating Temperature Range; 2. Operating Pressure Range; 3. Operating Vibration Range; 4. Operating RPM and Flow; 5. Maximum Current & Voltage Readings) | Secondary | Structured | OEM Technical Data Sheet | Small Volume | 50 GB
Equipment Readings (1. Equipment Temperature; 2. Equipment Pressure; 3. Equipment Vibration; 4. RPM; 5. Equipment Material Flow Rate; 6. Motor Current and Voltage; 7. Time) | Primary | Structured | Equipment Transmitters/Sensors | Large Volume | 4-5 TB/day
Operator/OEM Maintenance Engineer Calibration Inputs | Primary | Semi-Structured | Power Plant Servers | Large Volume | 1 TB/day
Spares Inventory | Secondary | Structured | ERP Database | Large Volume | 500 GB/day
Historic Data for Equipment | Secondary | Structured | Original Equipment Manufacturer | Medium Volume | 40-50 TB per 6 months
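As a quick check on the table, the daily volumes can be added up to see what total the platform must handle. The sketch below uses the figures from the table. Spreading the six-monthly historic data evenly over 182 days is an assumption made here for illustration, and the one-off 50 GB OEM data sheet is left out of the daily total.

```python
# Daily volumes in TB/day as (low, high) estimates, taken from the table.
DAILY_TB = {
    "weather": (6.0, 6.0),
    "equipment_readings": (4.0, 5.0),
    "operator_calibration_inputs": (1.0, 1.0),
    "spares_inventory": (0.5, 0.5),  # 500 GB/day
}

# Historic OEM data: 40-50 TB per 6 months.
HISTORIC_TB_PER_6_MONTHS = (40.0, 50.0)
DAYS_PER_6_MONTHS = 182  # assumption: spread evenly over the half-year


def daily_total_tb():
    """Return the (low, high) estimate of total data volume in TB/day."""
    low = sum(lo for lo, _ in DAILY_TB.values())
    high = sum(hi for _, hi in DAILY_TB.values())
    low += HISTORIC_TB_PER_6_MONTHS[0] / DAYS_PER_6_MONTHS
    high += HISTORIC_TB_PER_6_MONTHS[1] / DAYS_PER_6_MONTHS
    return low, high


low, high = daily_total_tb()
print(f"Estimated total: {low:.1f} to {high:.1f} TB/day")
```

On these figures the total falls inside the 10TB-15TB/day range used in the budget section.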
D. HOW WILL YOU INGEST THE DATA?
Data ingestion has to take into account the volume, velocity and variety of the data. The data
considered above has the following attributes:
• Volume:
a. Weather data requires large storage space.
b. The safe running parameter ranges (temperature, pressure and other readings) supplied by
the OEM require very little storage, since the data set is fixed.
c. Equipment temperature, pressure, vibration, RPM, material flow rate, motor current and
voltage, operator inputs and time require a large amount of storage space, since they are
relayed continuously at different refresh rates (assuming a refresh rate of 1 per second).
d. Equipment historical data can be taken from the OEM itself or bought from other power
plants using the same equipment. Its volume is comparatively small.
e. Inventory data is collected from the SAP system and is of comparatively small volume.
• Velocity:
a. Weather data is relayed continuously by the Meteorological Department at high refresh rates.
b. The safe running parameter ranges supplied by the OEM are a fixed data set.
c. Equipment temperature, pressure, vibration, RPM, material flow rate, motor current and
voltage, operator inputs and time are relayed continuously from local equipment transmitters,
at very high refresh rates.
d. Equipment historical data from the OEM is a fixed data set that can be bought once every
6 months.
e. Inventory data from the SAP system is also relayed regularly for the equipment, but at only
4-5 refreshes per day.
• Variety:
a. The data is mostly structured: temperature, pressure and similar readings from local
transmitters/sensors, and historical data bought from the OEM or other power plants.
b. The only semi-structured data is the operator input, since operators can give multiple types
of input for power plant operations.
We can use an Apache framework capable of handling high data volumes, such as Apache Kafka,
to ingest both the structured and the semi-structured data.
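A minimal sketch of the ingestion step is given below. It only shows how a sensor reading might be serialized into a message and handed to a transport. The field names, the equipment tag `BFP-1` and the `send` callable are hypothetical. In a real deployment `send` would wrap the producer call of whichever streaming client is chosen, whereas here a plain list stands in for the stream so the example runs without a broker.

```python
import json
from typing import Callable, Dict, Iterable, List


def to_message(reading: Dict) -> bytes:
    """Serialize one sensor reading as a UTF-8 encoded JSON message."""
    required = ("equipment_id", "tag", "value", "unit", "timestamp")
    missing = [key for key in required if key not in reading]
    if missing:
        raise ValueError(f"reading is missing fields: {missing}")
    return json.dumps(reading, sort_keys=True).encode("utf-8")


def ingest(readings: Iterable[Dict], send: Callable[[bytes], None]) -> int:
    """Send every reading through the given transport; return the count sent."""
    count = 0
    for reading in readings:
        send(to_message(reading))
        count += 1
    return count


# Hypothetical readings from one piece of equipment.
sample = [
    {"equipment_id": "BFP-1", "tag": "vibration", "value": 3.2,
     "unit": "mm/s", "timestamp": "2017-01-01T00:00:00Z"},
    {"equipment_id": "BFP-1", "tag": "motor_current", "value": 41.7,
     "unit": "A", "timestamp": "2017-01-01T00:00:01Z"},
]

buffer: List[bytes] = []              # stand-in for the streaming transport
sent = ingest(sample, buffer.append)
print(sent, buffer[0].decode("utf-8"))
```

Keeping the transport behind a single callable lets the same serialization code serve the high-rate sensor stream and the low-rate inventory feed.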
E. WHERE AND HOW WILL YOU STORE IT?
Since the data storage requirement is large, we should use a distributed (multi-node) storage
architecture.
The database would be stored in the cloud, since it requires access from multiple agencies and
cloud storage suits this purpose. We can use Google Cloud Storage.
F. HOW WILL YOU GET IT? WILL YOU NEED LEGAL
PERMISSIONS?
Power plant operational data is already stored on the internal servers of power plants. We need
legal permission from the power producers to store and process this data in the cloud and to
provide them with insight into their operating data.
We also need weather data, which can be bought from the National Data Centre.
Historical data for design life needs to be taken from the OEM, or from the servers of other
power plants that use the same equipment. Since these parties are direct beneficiaries of this
process, we can obtain the data from them and sell them the processed data at a marginal profit.
G. HOW WILL YOU ENSURE QUALITY OF THE
DATA?
The data quality parameters are as follows:
i. Accuracy
Power plant operators use the day-to-day plant data to monitor the health of the system, and the
data provided by the critical equipment is highly accurate. To further increase the reliability of
readings, we can increase the number of measuring devices. We also need consistent units: for
example, pressure readings in ata and in bar are not directly comparable.
ii. Timeliness
The sensors installed on critical equipment in power plants have signal refresh rates varying from
1 per second to 1 per 5 minutes.
iii. Completeness
Data suitable for operations might not be sufficient for deciding the effective life of the
equipment. We need historical operating data from the OEM or other power plants to calculate
life against it. Data points can also be missing when the signals in a data set used for a decision
have different refresh rates.
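Two of the points above, consistent units and signals with different refresh rates, can be illustrated with a short sketch. It assumes that "ata" means the technical atmosphere (absolute), for which 1 ata = 0.980665 bar. The forward-fill strategy, which carries the last known value of a slow signal onto a faster timeline, is just one possible way to align signals and is chosen here for illustration.

```python
from typing import List, Optional, Sequence, Tuple

BAR_PER_ATA = 0.980665  # 1 technical atmosphere = 98 066.5 Pa


def to_bar(value: float, unit: str) -> float:
    """Convert a pressure reading to bar so readings become comparable."""
    unit = unit.lower()
    if unit == "bar":
        return value
    if unit == "ata":
        return value * BAR_PER_ATA
    raise ValueError(f"unsupported pressure unit: {unit}")


def forward_fill(samples: Sequence[Tuple[int, float]],
                 timeline: Sequence[int]) -> List[Optional[float]]:
    """Align a slow signal to a faster timeline.

    samples:  (time_in_seconds, value) pairs sorted by time
    timeline: sorted time stamps at which a value is needed
    Returns the last known value at each time stamp, or None before
    the first sample has arrived.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    i = 0
    for t in timeline:
        while i < len(samples) and samples[i][0] <= t:
            last = samples[i][1]
            i += 1
        filled.append(last)
    return filled


# A signal refreshed once per 5 minutes, aligned to a 100-second timeline.
slow_signal = [(0, 10.0), (300, 12.0)]
timeline = [0, 100, 200, 300, 400]
aligned = forward_fill(slow_signal, timeline)
print(aligned)
print(to_bar(100.0, "ata"))
```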
We will use the Talend Data Quality software package for end-to-end data profiling and
monitoring, which improves the completeness, accuracy and integrity of the data and so gives
more confidence in the decisions based on it.
It has the following features:
1. Advanced data profiling
2. Customizable assessment
3. Graphical charts with drill-down data
4. Fully open source
H. HOW WILL YOU SCALE UP DATA VOLUME, VARIETY
AND VELOCITY?
A scalable application platform not only accommodates rapid growth in traffic and in data volume,
variety and velocity (scaling up) but also adapts to decreases in demand (scaling down).
The data volume will scale up when users of smaller equipment suppliers start integrating
with this application to gain insight into their product usage.
Also, if we need more reliability in determining the age of equipment, we will need much more
data from OEMs.
I. WHAT TECHNOLOGY WILL BE USED FOR YOUR
APPLICATION? JUSTIFY YOUR CHOICE.
The following technologies are used at different stages of the data flow:
1. Data Collection:
Local transmitters and sensors, used for collecting equipment running data
2. Data Ingestion:
Apache Kafka, used for capturing streaming data sets
3. Data Storage:
Google Cloud Storage
4. Data Analytics:
Apache Hadoop, MapReduce
We will use Apache Hadoop for processing. Since we are using both structured and
semi-structured data, we will need an application that converts the semi-structured operator
inputs into structured records before they are processed in Hadoop.
The Apache Hadoop software library is a framework that allows for the distributed processing of
large data sets across clusters of computers using simple programming models. It is designed to
scale up from single servers to thousands of machines, each offering local computation and
storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to
detect and handle failures at the application layer, so delivering a highly-available service on top
of a cluster of computers, each of which may be prone to failures.
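The MapReduce programming model mentioned above can be illustrated in plain Python. The sketch below is not Hadoop code and uses no Hadoop API. It only imitates the map, shuffle and reduce phases on a small in-memory list to find the peak reading per equipment. The equipment names and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, float]  # (equipment_id, temperature reading)


def map_phase(records: Iterable[Record]) -> Iterator[Record]:
    """Map: emit one (key, value) pair per input record."""
    for equipment_id, value in records:
        yield equipment_id, value


def shuffle_phase(pairs: Iterable[Record]) -> Dict[str, List[float]]:
    """Shuffle: group all values that share the same key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[float]]) -> Dict[str, float]:
    """Reduce: collapse each group to a single value (here, the peak)."""
    return {key: max(values) for key, values in groups.items()}


records = [
    ("boiler-1", 510.0),
    ("turbine-1", 535.0),
    ("boiler-1", 522.5),
    ("turbine-1", 529.0),
]

peak_by_equipment = reduce_phase(shuffle_phase(map_phase(records)))
print(peak_by_equipment)
```

On a cluster, the framework would run the map and reduce functions in parallel across machines and handle the grouping itself, which is what allows the same logic to scale to the daily volumes estimated earlier.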
J. WHAT WILL BE THE BUDGET OF YOUR
PROJECT?
The cost of handling data depends on the cost of storage, database maintenance, software,
hardware and data security. The average cost of the project is taken as $1000 per TB of data.
Data which needs to be handled - 10TB-15TB/day
Cost of Project - $15,000/day (15 TB/day * $1000/TB)
Cost for 450 Industrial Power Plants - about $5.47 million/year/user ($15,000/day * 365 days)
Indian Coal Fired PP (Capacity) - 188,967.88 MW
Days which can be saved per year due to avoided forced shutdown - 5 days out of 10 days of forced shutdown
Loss of Revenue - 188,967.88 MW * 1000 kW/MW * (24 h * 5 days) * Rs 1.5 per kWh / (Rs 70 per $)
- about $486 million per year
This takes the Rs 1.5 figure as a rate per kWh and Rs 70 as the exchange rate per US dollar.
Since the revenue lost to forced shutdowns is large, power producers will choose a big data
application for arranging maintenance engineers and for an advisory function to prevent forced
shutdowns.
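The budget arithmetic can be rechecked with a short script. This sketch assumes that the Rs 1.5 figure is a rate per kWh, so the lost energy is counted in kWh (MW * 1000 * hours), and it uses the Rs 70 per US dollar rate that appears in the formula. On those assumptions the revenue loss avoided across the coal-fired fleet comes to roughly $486 million per year.

```python
COST_PER_TB_USD = 1000.0      # average handling cost stated in the proposal
DATA_TB_PER_DAY = 15.0        # upper end of the 10-15 TB/day estimate
CAPACITY_MW = 188_967.88      # Indian coal-fired capacity used in the proposal
DAYS_SAVED_PER_YEAR = 5       # 5 of 10 forced-shutdown days avoided
TARIFF_RS_PER_KWH = 1.5       # assumption: the Rs 1.5 figure is per kWh
RS_PER_USD = 70.0             # exchange rate used in the formula


def annual_cost_per_user_usd() -> float:
    """Yearly data-handling cost for one user at the upper data volume."""
    return COST_PER_TB_USD * DATA_TB_PER_DAY * 365


def avoided_revenue_loss_usd() -> float:
    """Revenue loss avoided per year across the whole coal-fired fleet."""
    energy_kwh = CAPACITY_MW * 1000 * 24 * DAYS_SAVED_PER_YEAR
    return energy_kwh * TARIFF_RS_PER_KWH / RS_PER_USD


print(f"Cost per user:  ${annual_cost_per_user_usd() / 1e6:.3f} million/year")
print(f"Avoided loss:   ${avoided_revenue_loss_usd() / 1e6:.0f} million/year")
```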
L. CONTRIBUTORS
1. SHIVAM GUPTA
R. No.-16125039
MBA, DIME
IIT KANPUR
2. MAHENDRA KUMAR
R. No.-16114016
M.Tech, DIME
IIT KANPUR
SPECIAL THANKS TO
ALOK TRIVEDI
Asst. Manager
Isgec Heavy Engineering Ltd
Noida-20130
