NFIRS DATA ANALYSIS
William Hall
Sourabh Gujar
Preface
• Interning as a Database Analyst (Research)
• Reporting directly to William Hall
• Pursuing a Master's in Information Systems at DePaul University
• Tools used:
  • Oracle SQL Developer
  • Tableau
  • Microsoft Power BI
  • Delimit
IN A NUTSHELL, WHAT DO I GET PAID FOR?
• Analyzing data from the National Fire Incident Reporting System (NFIRS) covering the past 15 years
• Firing queries in Structured Query Language (SQL)
• Generating easy-to-read graphs from the results
• Sitting in front of a computer for hours
Asking the big question:
WHAT IS NFIRS?
• The National Fire Incident Reporting System (NFIRS) is an information system initiated and supported by the U.S. Fire Administration, which developed it as a means of assessing the nature and scope of the fire problem in the U.S.
NFIRS in numbers
• Year started: 1976
• Fire departments participating: 23,000
• Well-documented years: 15 (2000-2014)
• Modules per year: 11
• Incidents per year: more than 2 million
The Bibles
DATA SOURCE
GOT TO CATCH THEM ALL!
× 15 (one download per year), EACH OF APPROXIMATELY 1.60 GB
https://www.usfa.fema.gov/data/statistics/order_download_data.html#tools
FUN FACTS ABOUT DATA SIZE
• Harry Potter
Number of words per book:
The Philosopher's Stone - 76,944
The Chamber of Secrets - 85,141
The Prisoner of Azkaban - 107,253
The Goblet of Fire - 190,637
The Order of the Phoenix - 257,045
The Half-Blood Prince - 168,923
The Deathly Hallows - approximately 198,227
• Each of these books ranges from 2 MB to 3 MB.
• NFIRS
Basic Incident Table: 2 million records
Fire Incident Table: 600,000 records
Incident Address Table: 2 million records
• Each of these files is 150 MB to 300 MB.
• Total size for the years 2000-2014:
• 1.60 GB × 15 years = 24 GB
• 5 million records × 15 years = 75 million records
All the above calculations are based on approximate values.
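The size estimate is simple arithmetic, so a short sketch can reproduce it from the approximate per-table figures on this slide. Counting only the three tables listed gives about 4.6 million records a year and roughly 69 million over 15 years; the slide's 5 million and 75 million are rounded figures, and the other NFIRS modules would add more records still. The dictionary and function names are illustrative only.

```python
# Approximate per-year figures quoted on the slide; other NFIRS modules are not counted.
RECORDS_PER_YEAR = {
    "Basic Incident": 2_000_000,
    "Fire Incident": 600_000,
    "Incident Address": 2_000_000,
}
GB_PER_YEAR = 1.60
YEARS = 15  # 2000-2014


def estimate() -> dict:
    """Scale the per-year figures up to the full 15-year download."""
    per_year = sum(RECORDS_PER_YEAR.values())
    return {
        "records_per_year": per_year,
        "total_records": per_year * YEARS,
        "total_gb": round(GB_PER_YEAR * YEARS, 2),
    }


print(estimate())
# → {'records_per_year': 4600000, 'total_records': 69000000, 'total_gb': 24.0}
```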
THE PROCESS:
GETTING & CONVERTING DATA
• Original format: DBF (Database File)
• Converted format: CSV (Comma-Separated Values)
• Converter used: Delimit
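The conversion itself was done with Delimit. As a hedged illustration of what such a converter does (not how Delimit works internally), the sketch below parses the basic dBASE III .dbf layout: a 32-byte header holding the record count, header length and record length; 32-byte field descriptors terminated by the byte 0x0D; then fixed-width records, each starting with a deletion flag. It writes CSV using only Python's standard library. It assumes Latin-1 text and no memo fields, and `read_dbf`, `dbf_to_csv` and the test helper `build_dbf` are hypothetical names.

```python
import csv
import io
import struct


def read_dbf(data: bytes):
    """Parse dBASE III style .dbf bytes into (field_names, rows) of stripped strings."""
    # Header bytes 4-11: record count (uint32), header length and record length
    # (uint16 each), all little-endian.
    num_records, header_len, record_len = struct.unpack("<IHH", data[4:12])

    fields = []  # (name, width) for each column
    pos = 32     # field descriptors start right after the 32-byte header
    while data[pos] != 0x0D:  # 0x0D terminates the descriptor array
        desc = data[pos:pos + 32]
        name = desc[:11].split(b"\x00")[0].decode("latin-1")
        fields.append((name, desc[16]))  # byte 16 holds the field width
        pos += 32

    rows = []
    for i in range(num_records):
        start = header_len + i * record_len
        record = data[start:start + record_len]
        if record[:1] == b"*":  # records flagged '*' are deleted; skip them
            continue
        values, offset = [], 1  # byte 0 is the deletion flag
        for _, width in fields:
            values.append(record[offset:offset + width].decode("latin-1").strip())
            offset += width
        rows.append(values)
    return [name for name, _ in fields], rows


def dbf_to_csv(data: bytes) -> str:
    """Convert .dbf bytes to CSV text with a header row."""
    names, rows = read_dbf(data)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(names)
    writer.writerows(rows)
    return out.getvalue()


def build_dbf(fields, rows, deleted=()):
    """Hypothetical test helper: build a tiny character-only .dbf in memory."""
    header_len = 32 + 32 * len(fields) + 1
    record_len = 1 + sum(width for _, width in fields)
    out = struct.pack("<B3BIHH20x", 0x03, 116, 1, 1, len(rows), header_len, record_len)
    for name, width in fields:
        out += name.encode("latin-1").ljust(11, b"\x00") + b"C" + bytes(4) + bytes([width, 0]) + bytes(14)
    out += b"\x0d"
    for i, row in enumerate(rows):
        out += b"*" if i in deleted else b" "
        for value, (_, width) in zip(row, fields):
            out += value.encode("latin-1").ljust(width)[:width]
    return out + b"\x1a"  # end-of-file marker


sample = build_dbf([("INC_NO", 5), ("STATE", 2)], [["00001", "IL"], ["00002", "OR"]], deleted={1})
print(dbf_to_csv(sample))
```

Every value comes back as a string, which is all a CSV export needs; type handling (numbers, dates) is left to whatever loads the CSV afterwards.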
DATA CLEANSING
• This step ensured that we had the right set of data at our disposal, with minimal variance and few abnormalities.
• It was mostly performed on variables with a known set of valid values. For instance, there are 50 states, yet the data contains 58 distinct state values, including codes such as 0, 5, 8 or other arbitrary numbers. Anomalies like these need to be filtered out.
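A minimal sketch of this kind of filter, assuming each record is a dict with a `STATE` field meant to hold a two-letter postal code. `VALID_STATES` lists only the 50 states; whether to also keep DC or the territories is a separate decision, so extend the set if they belong in the analysis. The function name `clean_states` is hypothetical, and the anomaly counts it returns are a convenient way to see which bad codes occur and how often.

```python
from collections import Counter

# The 50 U.S. state postal codes; add "DC" or territory codes if they should be kept.
VALID_STATES = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)


def clean_states(records):
    """Return (kept_records, anomaly_counts) based on the STATE field."""
    kept, anomalies = [], Counter()
    for rec in records:
        # Normalize whitespace and case before checking against the valid set.
        state = str(rec.get("STATE", "")).strip().upper()
        if state in VALID_STATES:
            kept.append({**rec, "STATE": state})
        else:
            anomalies[state] += 1
    return kept, anomalies


rows = [{"STATE": "il"}, {"STATE": "0"}, {"STATE": "OR"}, {"STATE": "5"}, {"STATE": "0"}]
kept, anomalies = clean_states(rows)
print(len(kept), dict(anomalies))  # → 2 {'0': 2, '5': 1}
```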
FIRING QUERIES:
• In layman's terms:
The number of fire incidents in the United States over the past 15 years in which the Automatic Extinguishing System (AES) failed.
-- Fire incidents per state where an Automatic Extinguishing System (AES) was
-- present but did not work as intended. This covers one year (FIREINCIDENT2010);
-- the 15-year figure needs the same query against each year's table.
SELECT COUNT(*) AS NUMBER_OF_INCIDENTS, STATE
FROM (
    -- INC_NO is only unique within a fire department (FDID) and date,
    -- so de-duplicate on the combined key before counting.
    SELECT DISTINCT STATE, FDID, INC_DATE, INC_NO
    FROM FIREINCIDENT2010
    WHERE STRUC_TYPE = '1'
      AND BLDG_ABOVE > '0'
      AND STRUC_STAT = '2'
      -- Present or partial system present; IN (...) avoids the AND/OR
      -- precedence trap of writing AES_PRES = '1' OR AES_PRES = '2' here.
      AND AES_PRES IN ('1', '2')
      -- 2 = AES partially worked, 4 = AES failed, 0 = other, U = undetermined
      AND AES_OPER IN ('2', '4', '0', 'U')
)
GROUP BY STATE
ORDER BY STATE;
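In SQL, AND binds more tightly than OR, so a filter written as `... AND STRUC_STAT = '2' AND AES_PRES = '1' OR AES_PRES = '2'` is evaluated as `(... AND AES_PRES = '1') OR AES_PRES = '2'`: every row with a partial system slips past the building filters. The sketch below demonstrates this with Python's built-in `sqlite3` on four hypothetical toy rows that reuse the NFIRS column names; SQLite follows the same AND/OR precedence as Oracle, but this is an illustration, not the original Oracle setup.

```python
import sqlite3

# Hypothetical toy rows using the NFIRS column names:
# (INC_NO, STATE, FDID, STRUC_TYPE, BLDG_ABOVE, STRUC_STAT, AES_PRES, AES_OPER)
ROWS = [
    ("1", "IL", "A", "1", "2", "2", "1", "4"),  # building, AES present, failed
    ("2", "IL", "A", "1", "1", "2", "2", "2"),  # building, partial AES, partially worked
    ("3", "IL", "B", "5", "0", "1", "2", "4"),  # not STRUC_TYPE 1: should be excluded
    ("1", "OR", "C", "1", "3", "2", "1", "1"),  # AES worked: excluded by AES_OPER
]

UNPARENTHESIZED = ("STRUC_TYPE = '1' AND BLDG_ABOVE > '0' AND STRUC_STAT = '2' "
                   "AND AES_PRES = '1' OR AES_PRES = '2'")
WITH_IN_LIST = ("STRUC_TYPE = '1' AND BLDG_ABOVE > '0' AND STRUC_STAT = '2' "
                "AND AES_PRES IN ('1', '2')")


def count_by_state(structure_filter: str):
    """Count matching incidents per state; the filter is a trusted constant."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE FIREINCIDENT (INC_NO, STATE, FDID, STRUC_TYPE, "
                "BLDG_ABOVE, STRUC_STAT, AES_PRES, AES_OPER)")
    con.executemany("INSERT INTO FIREINCIDENT VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)
    # The structure filter is wrapped in parentheses, mirroring an inner query
    # whose rows are then filtered on AES_OPER by an outer query.
    sql = (f"SELECT STATE, COUNT(*) FROM FIREINCIDENT "
           f"WHERE ({structure_filter}) AND AES_OPER IN ('2', '4', '0', 'U') "
           f"GROUP BY STATE ORDER BY STATE")
    result = con.execute(sql).fetchall()
    con.close()
    return result


print(count_by_state(UNPARENTHESIZED))  # → [('IL', 3)]: the non-building row slips in
print(count_by_state(WITH_IN_LIST))     # → [('IL', 2)]
```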
GENERATING GRAPHS
Exporting query results from Oracle SQL Developer into Tableau, an application for generating interactive graphs.
Schedule
Week Topic
1 Completing formalities
2 Getting acquainted with the work
3 Reading through the Bibles
4 Getting your hands dirty with Oracle SQL Developer
5 Getting dirtier
6 Generating graphs
7 Trying new methodologies for graph generation
8 Presentation preparation
9 Getting a solid grasp of the role
10 Appreciation and sad feeling
Experience at Portland Cement Association
• It is an honor to serve an organization that has been in existence for the past 100 years.
• Office culture at its best: weekly intern meetings add a lot of value to the experience.
• Input from various sources, and stories of people who made it big, add a lot of motivation.
• As an international student, this is a great opportunity to delve deeper into office culture.
• Making new connections has never been easier.
• All praise for the free food at breakfast meetings!
NOT MY FAVORITE PART, ALTHOUGH I AM OPEN TO QUESTIONS!


