Applying Data Science to Government Services
Data Science
• Extracting knowledge or insights from data
– in various forms, structured or unstructured
• Utilizes data preparation, statistics, predictive
modeling and Machine Learning
• Applied to various domains
– Discovering new cures, Improving science research
– Optimizing supply chains and delivery routes
– Reducing traffic congestion, Optimizing energy grids
– Forecasting weather, Improving sports performance
– Improving security and reducing spam
– Targeted marketing, personalization, churn prediction
© Harbinger Systems | www.harbinger-systems.com
8-Levels of Analytics (SAS)
Information Strategy (Gartner)
• Enterprise Information Management
– Information is everywhere & growing
– Volume, Variety & Velocity
– Drive innovation in rapid information processing
• Information Strategy
– Harness the power of information assets
– Drive growth, improve efficiency
• Data Analytics – Strategic decision making
– Insights from your large and complex datasets
– Predict future behaviors, trends and outcomes
Machine Learning (ML)
A type of Artificial Intelligence that gives computers the
ability to learn without being explicitly programmed.
– The computer can infer rules inherent in data
– The computer adapts when exposed to new data
• (Tom Mitchell) A computer program is said to learn from
experience E with respect to some task T and some performance
measure P if its performance on T, as measured by P, improves
with experience E
• Automating Automata
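A minimal sketch of the E/T/P definition, under assumptions that are not from the slides: the task T is labeling a number 1 when it is at or above a hidden cutoff, the experience E is a growing list of labeled examples, and the performance P is accuracy on a fixed test set. The cutoff, the example values and the function names are all hypothetical.

```python
# Task T: label a number 1 if it is at or above a hidden cutoff, else 0.
# Experience E: labeled examples (value, label).
# Performance P: accuracy on a fixed test set.

HIDDEN_CUTOFF = 7  # hypothetical rule; the learner never reads this directly


def learn_threshold(examples):
    """Place the threshold midway between the largest 0-labeled
    value and the smallest 1-labeled value seen so far."""
    largest_negative = max(x for x, label in examples if label == 0)
    smallest_positive = min(x for x, label in examples if label == 1)
    return (largest_negative + smallest_positive) / 2


def accuracy(threshold, test_set):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum(
        1 for x, label in test_set if (1 if x >= threshold else 0) == label
    )
    return correct / len(test_set)


# Fixed test set: the integers 0..10 with their true labels.
test_set = [(x, 1 if x >= HIDDEN_CUTOFF else 0) for x in range(11)]

# Made-up experience, revealed to the learner a few examples at a time.
experience = [(0, 0), (10, 1), (2, 0), (6, 0), (8, 1)]

# Measure P after the learner has seen 2, 3 and then all 5 examples.
scores = [
    accuracy(learn_threshold(experience[:n]), test_set) for n in (2, 3, 5)
]
print(scores)  # accuracy rises from 9/11 to 10/11 to 1.0
```

With this particular data the accuracy never drops as examples are added; real learners usually improve on average rather than at every step.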
What’s a Machine Learning Problem?
Emphasis of machine learning is on automatic methods
Devise learning algorithms that do the learning automatically without human intervention
Program by example: we don't care what the machine does, as long as it does it right
Result-oriented rather than process-oriented
How can Machine Learning Add Value?
ML is a data-driven approach
• Rules are learned from the data, so detailed business knowledge isn’t a prerequisite
ML is domain independent
• Same algorithms can be used across domains and in different use cases
ML creates flexible decision systems
• Creates robust systems that can adjust to changing conditions without
human intervention
ML and Big Data
ML thrives with big data!
– Accuracy of algorithms generally improves as the amount of data grows
– Statistical approaches can handle big datasets much better than
traditional paradigms
– Decision making using ML can adapt to transactional data much better
Machine Learning Applications
Fraud Detection: Did the user really do this login/make this purchase?
Product Recommendation: Will the user like this product?
Stock Trading: Will the stock go up or down?
Medical Diagnosis: Given some symptoms, what is the patient suffering from?
How to Categorize the Problem?
Generally, machine learning problems look to:
Identify a Value
Assign data points to a category
Discover similarities between two data points
Flowchart
Start → Define Problem! → Sufficient Data? (if not: Get more!)
Sort into category? → Labeled Data: Classification; otherwise: Clustering
Predict a value? → Regression
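One possible reading of the flowchart, written as a small function. The branch order is an assumption reconstructed from the box labels (sufficient data, sort into category, labeled data, predict a value); the function name, parameter names and return strings are hypothetical.

```python
def categorize_problem(sufficient_data, goal, labeled=False):
    """Map a problem description to a family of ML techniques.

    sufficient_data: True if there is enough data to start with
    goal: "category" to sort items into groups, "value" to predict a number
    labeled: True if the data already carries the correct answers
    """
    if not sufficient_data:
        return "Get more data"
    if goal == "category":
        # Labeled examples allow classification; without labels,
        # the groups have to be discovered by clustering.
        return "Classification" if labeled else "Clustering"
    if goal == "value":
        return "Regression"
    # Neither question was answered, so the problem needs defining first.
    return "Define the problem"


print(categorize_problem(True, "category", labeled=True))   # Classification
print(categorize_problem(True, "category", labeled=False))  # Clustering
print(categorize_problem(True, "value"))                    # Regression
print(categorize_problem(False, "value"))                   # Get more data
```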
Machine Learning Algorithms
What to look for in algorithms:
Flexible across many use cases
Able to handle several input types
Accurate
Resistant to over-fitting/noise/error
Random Forest
Used for classification and regression
Trains many decision trees on random subsets of the data and combines their results into a single estimate
XGBoost
Used for classification and regression
Starts off with a weak learner and adds new learners over successive iterations, each correcting the errors of the ones before
K-Means
Used for clustering
Groups data points into k clusters by assigning each point to the cluster with the nearest mean
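A minimal sketch of k-means as the editor's note for this slide describes it: each observation belongs to the cluster with the nearest mean, and the means are recomputed until they stop moving (Lloyd's algorithm). It uses only the Python standard library. The sample points and the choice of the first k points as starting means are assumptions for illustration; production libraries use more careful initialization.

```python
import math


def k_means(points, k, iterations=100):
    """Partition points (tuples of floats) into k clusters with Lloyd's algorithm."""
    # Naive initialization (an assumption): use the first k points as means.
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each mean moves to the average of its cluster.
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]  # keep the old mean if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no mean moved
            break
        centroids = new_centroids
    return centroids, clusters


# Made-up 2-D data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

On this data the two clusters end up holding the three points near (1, 1) and the three points near (8, 8). No labels are supplied, which is why k-means counts as clustering rather than classification.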
Tools and Technologies
Emphasis on tools that:
Can integrate with existing data architecture
Have a smooth learning curve
Simplify the process of analysis and prediction
Have an active community
Popular Machine Learning Tools
Python
Free, open-source, widely popular
Brings together many important libraries written in Python and C
Has an active community
Disclaimer: Brand names, logos and trademarks used herein remain the property of their respective owners.
Popular Machine Learning Tools
R
Statistical computing language that simplifies complex
statistical operations
Large number of libraries available for extending
functionality (DB connectors, algorithms, visualization)
Open Data and Gov Services
• Open data, tools and resources available
• ~181K datasets
Sample Applications
• City-Data provides detailed profiles of all U.S. cities -
demographics, crime rates, home values, cost of
living, etc.
• Farmers can use Climate Corporation’s services to
plan, manage, and protect crops
• SPOT Crime: free, public-facing crime mapping and
alert website
Conclusion
• Harness the power of your data to deliver
higher value services and remain competitive
– “Data is the currency of the future”
(Michael Cockrill, CIO, State of WA)
• Machine learning provides a powerful
framework for extracting insights
Thank You

Application of Data Science in Government Services – IPMA Forum 2016 Speaker Session


Editor's Notes

  • #2 Applying data science to gain insights, improve efficiency and deliver higher value services. What skillsets, technologies and practices are required to deliver the best value? What you will learn: What do you do with the data? What skillsets do you need in order to use the data? How do you map data analytics to deliver higher value services and gain efficiencies?
  • #4 Retrospective analysis; dashboarding (real-time processing); prediction. Level 8, optimization: How do we do things better? E.g. price optimization, markdown optimization and size optimization.
  • #5 Big data forces you to wrestle with key strategic and operational challenges. How can you find new ways to leverage information sources to drive growth and improve your strategic decision making? You need to know which investments will deliver the most business value and ROI. Are there new expectations for information quality and management? Knowns, known unknowns and unknown unknowns (insights).
  • #6 Tom Mitchell: Professor at Carnegie Mellon University. Automating Automata
  • #8 Adjusts for large amounts of data
  • #11 Product Recommendation
  • #12 Regression analysis: helps one understand how the typical value of the dependent variable (or 'criterion variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed.
  • #14 XGBoost is an optimized distributed gradient boosting system designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework (http://dmlc.cs.washington.edu/xgboost.html). K-Means: k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells (Lloyd's algorithm, also known as Voronoi iteration).
  • #18 https://www.data.gov/impact/ The U.S. Postal Service was one of the early pioneers in implementing machine learning at a large scale: reading postal addresses. Other areas: fishing services, population health management, agriculture, crime mapping, education.
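The regression note (#12) above can be illustrated with the simplest case: one independent variable fitted by ordinary least squares. This is a sketch under assumptions, not the method used in the talk. The function name and the data are made up, and a single-variable fit leaves out the "other variables held fixed" part of the note.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0

# The slope answers the question in the note: how much the typical
# value of y changes when x increases by one unit.
print(slope * 5 + intercept)  # 11.0, the predicted value at x = 5
```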