Bigdata -> Data Science -> AI,
and some $$$ in between
DNA’s journey in data science & big data
prologue of prologue
you have to have an idea
THE IDEA
ALL OF THE DATA
WE HAVE
PROFIT
[Diagram: seven scattered “some data” sources and seven separate “some report” outputs]
ONE SOURCE OF
TRUTH
+
CUSTOMER FIRST
+
AUTOMATE ALL
THE THINGS
WTF?
PROFIT
?activities?
who
cares
webdata?
who cares
Agenda
Prologue:
The big thing(s)
The four things of analytics ~ the roadmap on how to do those things
Achievements
What’s inside: AWS good stuff & hype & love
Culture stuff
Upcoming
prologue
The BIG THING(s)
1. Business: it was the omnichannel customer
a. the ever-more-demanding, influential and independent customer
b. rise of need for analytical insight & data
c. demanding that information management and analytics be operational, not
finance-driven
d. stop sub-optimizing the system (customer)
2. Tech: it was cloud, open-source, and data science
a. suddenly - endless scale & processing power
b. reduced time-to-environment from weeks to minutes
c. reduced cost
d. ability to create intelligent data products that reduce time-to-insight and
time-to-action
- hard for humans, hard for machines: data science, machine learning
- hard for humans, easy for machines: data engineering, data pipelines
- easy for humans, hard for machines: AI / NLP
- easy for humans, easy for machines: reporting, basic calculus
System requirements
- Infinite scale
- Process 10’000++ messages per sec
- Automated deploy & tests
- Version control
- Pay-for-use, not for-licence
- Real-time pipeline, disaster recovery, exactly-once guarantees
- Real-time analytics, sub-second latency for everything
- Infinite processing power for data science stuff & large analytical deployments
- Array of libraries to make the data scientist’s life easier
- Modular, I can change any part of it, be it software or hardware
- Secure, EU referendums and Safe Harbour etc.
- Pipeline and persistent storage & data platform can be done from scratch to
production in 6 months
- Can’t really cost anything, since we had to scrape together a small budget. 3 developers max.
OKAY! SOUNDS FAIR.
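One common way to approach the exactly-once requirement above is to accept at-least-once delivery and make the consumer idempotent, so that a redelivered message has no extra effect. The following is only a minimal sketch of that idea, not the pipeline described in these slides. The class name `IdempotentSink`, the message ids and the amounts are all made up for illustration, and a production version would need to store the seen ids and the state change durably and atomically.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Applies each message at most once, keyed by a unique message id."""

    seen: set = field(default_factory=set)
    total: int = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        if message_id in self.seen:
            return False  # duplicate delivery: ignore it
        self.seen.add(message_id)
        self.total += amount
        return True


sink = IdempotentSink()

# Simulated at-least-once delivery: message "m2" arrives twice after a retry.
deliveries = [("m1", 10), ("m2", 5), ("m2", 5), ("m3", 1)]
applied = [sink.handle(message_id, amount) for message_id, amount in deliveries]

print(applied)     # which deliveries changed the state
print(sink.total)  # the duplicate "m2" is counted only once
```

The point of the design is that retries become safe: the producer or the broker can resend freely, and correctness depends only on the consumer remembering which ids it has already applied.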
Business requirements
- Understand the omnichannel customer
- Reduce churn
- Increase cross-sales
- Increase product usage & increase retention
- Increase marketing ROI
- Insight should be real-time
- Actions should be near-real-time and everyone can do them
- Know where to put infrastructure better than before
- Make sense of unstructured data & text & speech & so forth
- Automate 80% of insight / data that was previously done by hand
- Your system shall not cost anything
- But it should deliver competitive advantage
OKAY! SOUNDS FAIR.
WHAT WOULD MACGYVER DO?
WHAT WOULD MACGYVER DO?
WOULD HE:
a) go and buy a licence and servers
and then wait around
b) build the damn thing from what
he happens to find with zero cost
WHAT WOULD MACGYVER DO?
YES!
b) build the damn thing from what he
happens to find with zero cost
Achievements & upcoming
Done (within a year):
Assisted investments & business (1) operations:
xx-xxx mil. / year
Directly optimized / machine learning (2) -handled
operations: x-xx mil. / year
Machine learning* & Data Science introduced
Marketing efforts from weeks to minutes
Automation from 10% to 80%
Conversion on direct channels up from 50 to 300
percent
Amount of automated & personalized channels
from 1 to 5 (all)
One source of truth & self-made
-> we know how it works
Ability to handle all types of data
Upcoming 2017:
Artificial intelligence (AI)*
Chatbots (AI)
“Acquisition” of display advertising
Understanding speech (AI)
Moving from CPU to GPU
DNA.FI fully personalized (w/ new concept)
* Data Science -> Machine Learning -> Artificial Intelligence
what’s inside
code! (surprise)
clojure
python
c++
tensorflow
syntaxnet
spark
scala
sql
postgres
redshift
ec2
R
random forest
s3
jenkins
ansible
cnn / rnn / lstm
jupyter
aerospike
kafka
snowplow
scikit learn
matplotlib
als
k means
mllib
numpy, pandas, scipy
… etc
COLLECT
real-time
batch
omnichannel
COMBINE
digital to brick-and-mortar
digital to everything
context to everything
customer to everything
COMPUTE
recommendations
analysis
reports
segments
predictions
descriptions
next best actions
customer journey
EXECUTE
churn prevention
cross-sales
targeted marketing
customer service efficiency
customer experience improvement
omnichannel optimization
react in real time
product development
CONTROL
continuous deployment
infrastructure as code
Customer interface layer
Channel layer
Delivery layer
Data / Machine
learning layer
Collecting layer
realtime 1.3T, batch ~100 GB
-> to redshift, we load 5’511’649’731 rows
Why redshift?
reporting on top of raw data;
17’072’941 rows joined to 110’773’366 rows
joined to 24’945’364 rows joined to 2’297’076
rows joined to 1’841’262 rows + some
dimensions and result returned in < 10 sec
-> no db-admins, no indexes, no “tuning”
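“Reporting on top of raw data” here means joining the raw tables directly at query time instead of building pre-aggregated report tables. The sketch below shows only the shape of such a query. It uses Python's built-in `sqlite3` as a stand-in because Redshift cannot be run locally, and the table names, columns and rows are invented for illustration, not taken from the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical raw tables; the real schema is not given in the slides.
    CREATE TABLE customers (customer_id INTEGER, segment TEXT);
    CREATE TABLE events (customer_id INTEGER, channel TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'consumer'), (2, 'business');
    INSERT INTO events VALUES (1, 'web'), (1, 'store'), (2, 'web');
    INSERT INTO orders VALUES (1, 20.0), (2, 50.0), (2, 30.0);
    """
)

# Aggregate each raw table per customer first, then join, so that rows are
# not multiplied when a customer has several events and several orders.
rows = conn.execute(
    """
    SELECT c.segment, e.n_events, o.revenue
    FROM customers AS c
    JOIN (SELECT customer_id, COUNT(*) AS n_events
          FROM events GROUP BY customer_id) AS e
      ON e.customer_id = c.customer_id
    JOIN (SELECT customer_id, SUM(amount) AS revenue
          FROM orders GROUP BY customer_id) AS o
      ON o.customer_id = c.customer_id
    ORDER BY c.segment
    """
).fetchall()

for row in rows:
    print(row)
```

On a columnar, massively parallel warehouse the same kind of query runs over the raw rows without hand-built indexes, which is the property the slide is pointing at.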
Class: TV, Liiga
Rank: 0.87, 0.90
What happens in social
media? What is talked
about?
What’s
wrong?
from reporting sales to
reporting potential
(and the ways of going
from potential to sales)
R is still goooood.
And jupyter.
ALS recommendations w/
1.3 T data = good
1 0 1 1 0 1 0 0 1 1 1 0 1
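ALS (alternating least squares) factorizes a customer-by-product matrix into two small factor matrices by fixing one and solving a ridge regression for the other, then swapping. The sketch below is a toy, pure-Python version that treats every cell of a small 0/1 matrix as observed. It is not Spark MLlib's implementation, and the matrix, the rank `k`, the regularization `lam` and all function names are assumptions chosen for illustration.

```python
import random


def solve(a, b):
    """Solve the small square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def update(ratings, fixed, lam):
    """Ridge-regress every row of `ratings` onto the fixed factor matrix."""
    k = len(fixed[0])
    gram = [[sum(f[p] * f[q] for f in fixed) + (lam if p == q else 0.0)
             for q in range(k)] for p in range(k)]
    return [solve(gram, [sum(r * f[p] for r, f in zip(row, fixed))
                         for p in range(k)])
            for row in ratings]


def loss(ratings, users, items, lam):
    """Squared reconstruction error plus L2 penalty on both factor matrices."""
    err = sum((r - sum(u * v for u, v in zip(users[i], items[j]))) ** 2
              for i, row in enumerate(ratings) for j, r in enumerate(row))
    return err + lam * sum(x * x for f in users + items for x in f)


def als(ratings, k=2, lam=0.1, iters=10, seed=0):
    rng = random.Random(seed)
    users = [[rng.random() for _ in range(k)] for _ in ratings]
    items = [[rng.random() for _ in range(k)] for _ in ratings[0]]
    transposed = [list(col) for col in zip(*ratings)]
    history = [loss(ratings, users, items, lam)]
    for _ in range(iters):
        users = update(ratings, items, lam)     # fix items, solve for users
        items = update(transposed, users, lam)  # fix users, solve for items
        history.append(loss(ratings, users, items, lam))
    return users, items, history


# Toy purchase matrix: rows are customers, columns are products (1 = bought).
ratings = [[1, 0, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
users, items, history = als(ratings)

# Predicted affinity of customer 0 for each product.
scores = [sum(u * v for u, v in zip(users[0], item)) for item in items]
print(history[0], history[-1])
print(scores)
```

Each half-step exactly minimizes the regularized loss for one side, so the loss never increases between iterations. Products a customer has not bought but that score high are the candidate recommendations.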
culture stuff
more important than you’d think
http://www.slideshare.net/reed2001/culture-1798664/
MacGyver (remember?, what would MacGyver
do) = The thinker-doer
- Usually development methods split thinkers (project managers, scrum managers,
product owners and the lot) from doers (developers, analysts)
- This is (mostly) shit
- You’d need people leading who also know their stuff
- Saves money, time and nerves
- People communicate better
- Thinker-doers can communicate with business and translate to development
actions, even develop the things themselves
Demos & openness = The secret sauce to
success (and freedom to do more stuff)
- We sit on the “business floor”, right in between basically everyone
- And we almost always have something displayed on a screen
- We make it easy to come and talk to us
- We make demos available to everyone
- We connect
- This makes all the difference
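business - IT alignment (connection vs. rate of change)
- always connected, nothing changes (or we close our eyes that it does): kindergarten - no output but loads of fun
- always connected, everything changes all-the-time: if done right, ultimate success
- forced connection (procedures!), nothing changes: basic IT waterfall project
- forced connection (procedures!), everything changes: basic IT “agile” project
- never connected, nothing changes: cave-people?
- never connected, everything changes: chaos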
Bigdata/
AI
Business
Directors* are doing
their own marketing
automation activities
without any help
*ping Solita, how many directors code...
And now, we have business even writing their
own code! (no, really)
upcoming
1st try: word2vec + naive bayes
2nd try: convolutional neural net
3rd try: LSTM/RNN
4th try: syntaxnet
5th “try”: -> include speech recognition
6th try: spaCy
7th try, part I: latent dirichlet allocation
8th try: ?
Nth try: ?
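To make the first try above a bit more concrete, here is a minimal multinomial naive Bayes text classifier with add-one smoothing. It is a sketch under stated assumptions: it uses plain word counts rather than word2vec features, the training sentences and the class labels `tv` and `mobile` are invented, and nothing here reflects the models actually used.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per class and documents per class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for label, text in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for `text`."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out a class.
            score += math.log((word_counts[label][word] + 1)
                              / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training data, made up for this example.
examples = [
    ("tv", "my tv channel is not working"),
    ("tv", "which tv channels show hockey"),
    ("mobile", "my phone has no signal"),
    ("mobile", "what is the price of the phone"),
]
model = train(examples)

print(predict(model, "tv channels broken"))
print(predict(model, "phone price"))
```

A baseline like this ignores word order and word meaning entirely, which is one plausible reason to move on to neural and syntax-aware approaches in the later tries.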
Now?
in a good place. can’t fully disclose what we’re running though. :)
basically we can understand both speech and written natural language so that the
language can “flow” and it can be in a chat context or in longer formats;
ex:
- hi do you happen to have iPhones on stock?
- yea!
- cool. what’s the price? <- have to link to previous parts of conversation
NB! this is quite simple in English but tear-your-eyes-off-to-scratch-your-brain* -hard
with Finnish. we might be the first ones actually there.
*modified from: Friends, 1995, The One with the Baby on the Bus
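The “link to previous parts of conversation” step in the example can be illustrated with a tiny dialogue state: remember the last product mentioned and use it when a later turn asks about “the price” without naming anything. This is a deliberately naive keyword sketch, not the undisclosed system described above. The `Conversation` class, the `PRODUCTS` catalogue and its prices are hypothetical.

```python
from typing import Optional

# Hypothetical catalogue with made-up prices, for illustration only.
PRODUCTS = {"iphone": 799, "tablet": 399}


class Conversation:
    """Keeps the last mentioned product so later turns can refer back to it."""

    def __init__(self) -> None:
        self.current_product: Optional[str] = None

    def reply(self, utterance: str) -> str:
        text = utterance.lower()
        for product in PRODUCTS:
            if product in text:
                self.current_product = product  # remember for later turns
        if "price" in text:
            if self.current_product is None:
                return "Which product do you mean?"
            price = PRODUCTS[self.current_product]
            return f"The {self.current_product} costs {price}."
        if self.current_product and "stock" in text:
            return f"Yes, we have the {self.current_product} in stock."
        return "How can I help?"


chat = Conversation()
first = chat.reply("hi do you happen to have iPhones on stock?")
second = chat.reply("cool. what's the price?")  # no product named in this turn
print(first)
print(second)
```

Substring matching like this breaks down quickly in a heavily inflected language such as Finnish, where a product name can appear in many surface forms, which hints at why the slide calls the Finnish case so much harder.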
Lessons learned
Understand the BIG THINGS (cloud, open source, omnichannel customer, data science, time-to-x)
Sit where business sits. And sit together. DO STUFF TOGETHER.
Don’t use project managers who can’t code (or who are not really good in the subject domain).
Apply advanced analytics to automate 80% of small decisions made all the time.
Continuous communication beats meetings. Don’t meet.
At least start with AI. Don’t just tweet about that shit.
DNA - Einstein - Data science ja bigdata
