How to program your way into data science?
1. How to program your way into Data Science?
Eeshan Chatterjee
Data Scientist @ MediaIQ Digital
https://in.linkedin.com/in/eeshanchatterjee
www.github.com/EeshanChatterjee
2. What is Data?
Google Definition:
● Facts and statistics collected together for reference or analysis.
● The quantities, characters, or symbols on which operations are performed by a computer, which may be stored
and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.
● Things known or assumed as facts, making the basis of reasoning or calculation.
Umm... OK. But what is data in the business world?
Let's simplify the entire thing.
If you can observe it, record it, store it and measure it, it's gonna help your business.
This is the data that is important to you.
3. What data does my business generate?
Each and every department, right from the CEO's office to the janitorial division, collects data.
Stored!
People Data
Sales Data
Customer Satisfaction Data
Industrial Production & Wastage Data
Travel Data
Energy Data
5. The Basics
How did we arrive at Data Science?
The Era of Statistical Insight
● Measure KPIs
● Model Key Metrics
● Operations Research
The Era of Business Intelligence
● Dashboards
● Frequent Updates
● Business Analytics
The Era of Data Science
● Cockpits
● Distributed Computation
● Federated Data
● Intelligent Systems
Guess what didn't change: helping the business make better decisions!
6. The Basics
If it's always been the same core job, can a statistician call himself a Data Scientist?
Well... Not exactly. Today the job has diversified, demanding a wider skillset!
Skill areas: Math & Statistics; Business & Domain; Tech & Computer Science; Design Thinking
Roles: Data Design Architect; Data Engineer; Requirement/Business Analyst
7. But... Programming for Everything?
Actually, Yes. Let's look at a popular cheatsheet circulating on the internet.
Infographic courtesy: http://nirvacana.com/thoughts/becoming-a-data-scientist/
Guess what, we can't tick off 15% of this checklist without programming!
8. Programming for Math
● Scripting Language
● Packages
● Data Structures
● Notebooks & Markdown
● Plotting Techniques
● Classes & Functions
● Cross-Language Execution
The Algo Whiz Codebook
● Choose your scripting language. R & Python are the popular choices.
● Use what's out there. Prebuilt packages for almost every technique are freely available for use.
● Interactive plots cut down EDA (exploratory data analysis) time by a huge margin.
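As a rough illustration of "use what's out there", here is a minimal Python sketch that leans only on the standard library's prebuilt `statistics` module instead of hand-written formulas. The sales figures and the `summarize` helper are hypothetical, made up for this example.

```python
import statistics

# Hypothetical daily sales figures, invented purely for illustration.
daily_sales = [120, 135, 128, 150, 142, 138, 160]


def summarize(values):
    """Return a few descriptive statistics using prebuilt library calls."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(daily_sales)
print(summary)
```

The same idea carries over to larger third-party packages: reach for an existing, tested implementation before writing your own.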
9. R or Python?
The holy grail of data science choices! It is indeed difficult to choose between the two.
Their capabilities are pretty much the same. So, which one do I choose?*
Choose R when:
● You are beginning to explore your data
● You are looking for a one-time insight or developing an analysis methodology
● You want to try out a broad spectrum of techniques to find the best ensembles to use
Choose Python when:
● You have a good understanding of the data and techniques you want to use
● You want to deploy your analysis methodology as a persistent large-scale production system
● You want to train deep models on GPUs
* This one is based on my experience and opinion. It has worked for me.
The next person you ask will have a different take on the matter.
10. Programming for Tech
● Data Platforms: Ingestion & Management Services (JAVA)
● Distribution & Scale: Hadoop, Yarn, Scala, JADE... (JAVA)
● Efficient Processing: Low level Subroutines (C++)
● GPGPU & Large Scale ML: CUDA, OpenGL, MPI (C/C++)
The Scale-Out Toolbox
● C++ and JAVA form the backbone of almost every at-scale data system.
● Most NoSQL & NewSQL databases are based on Java.
● Large scale machine learning with millions of data points almost certainly needs GPU-scale processing.
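The core idea behind "Distribution & Scale" can be sketched in a few lines: split the data into chunks, let workers process each chunk independently, then combine the partial results. The Python sketch below is only an illustration under simplifying assumptions; threads in a single process stand in for the nodes of a real cluster, and the names `chunked`, `partial_sum` and `distributed_mean` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def partial_sum(chunk):
    """Work done independently on one chunk (the 'map' step)."""
    return sum(chunk), len(chunk)


def distributed_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by combining per-chunk partial results (the 'reduce' step)."""
    if not values:
        raise ValueError("values must not be empty")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunked(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


data = list(range(1, 10001))
print(distributed_mean(data))
```

Each chunk returns both a sum and a count because a mean cannot be combined from per-chunk means alone when chunks differ in size.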
11. Programming for the Business
Image courtesy: http://exposedata.com/tutorial/canvas/
The Decision-Maker's
Cockpit
● Interactive charts make answering business questions intuitive.
● Real-time updates allow decisions based on the latest information available.
● Bird's eye and drill-down capabilities allow for multiple perspectives without losing context.
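Behind a cockpit, the bird's eye and drill-down views are usually two aggregations over the same records. A minimal Python sketch of that idea follows; the records and the `roll_up` and `drill_down` helpers are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sales records (region, product, revenue), invented for illustration.
records = [
    ("North", "Widget", 100),
    ("North", "Gadget", 150),
    ("South", "Widget", 200),
    ("South", "Gadget", 50),
]


def roll_up(rows):
    """Bird's eye view: total revenue per region."""
    totals = defaultdict(int)
    for region, _product, revenue in rows:
        totals[region] += revenue
    return dict(totals)


def drill_down(rows, region):
    """Drill-down view: revenue per product within one region."""
    totals = defaultdict(int)
    for row_region, product, revenue in rows:
        if row_region == region:
            totals[product] += revenue
    return dict(totals)


print(roll_up(records))
print(drill_down(records, "North"))
```

Because both views read from the same records, the drill-down totals for a region always add up to that region's bird's eye figure, which is what keeps the context intact.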
12. Design Thinking and Programming
Design Thinking lets you break down and analyse the problem and synthesize the best solution
from the multiple solutions possible.
[Diagram: Current State; Complication 1, Roadblock 2, Issue 3; Possible Solutions 1 to 4; Prototype Solutions 1 to 4; At-Scale Solution; Consumption; Desired Future State]
Define | Ideate | Prototype | Iterate | Develop | Deploy