challenges, learnings and opportunities
presented by imron zuhri, adit, and samudra
KUDO codefest 14 May 2016
machine learning
can a machine think?
in 1996, Garry Kasparov was not afraid of a computer, and he won
the next year, he played against a new and improved Deep Blue and lost
this is the move that was so surprising, so un-machine-like,
that he was sure the IBM team had cheated
Rd5
Rd1
a random move, a computer bug
to kasparov, a sign of superior intelligence
big data analytics is the culmination
of the machine way of thinking
we can now immensely
extend our memory and computational power
to help us do that
what is machine learning
some definitions
• a (hypnotized) user’s perspective
a scientific (witchcraft) field that:
researches fundamental principles from data (potions) and
develops magical algorithms (spells to cast)
  ➢ (pascal vincent, 2015)
• field of study that gives computers the ability to learn without
being explicitly programmed
  ➢ arthur samuel (1959)
• formal definition (tom mitchell, 1997):
“A machine is said to be learning IF
it improves:
  • with experience E
  • on specific tasks T
  • as measured by a specific performance measure P”
CURRENT VIEW OF ML FOUNDING DISCIPLINES
three niches for machine learning
data mining: using historical data to improve decisions
• medical records → medical knowledge
software applications that are difficult to program by hand
• autonomous driving
• image classification
user modeling
• automatic recommender systems
source: rong jin, 2013
(some) open problems in machine learning
• one-shot learning
• unsupervised learning
• reinforcement learning
• artificial general intelligence
“most of human and animal learning
is unsupervised learning. If
intelligence was a cake, unsupervised
learning would be the cake,
supervised learning would be the
icing on the cake, and reinforcement
learning would be the cherry on the
cake. We know how to make the icing
and the cherry, but we don't know
how to make the cake.”
yann lecun
challenges in machine learning
• data-related:
  • abundant yet scattered data
  • unstructured, noisy data
  • offline-stored data (duh!)
• resource-related:
  • data storage
    • space constraints
  • computing power
    • training time
  • inve$$$tments
    • initial investments
    • running costs
challenges in machine learning
• methodological issues:
  • result consistency (e.g. accuracy)
  • overfitting
  • computational efficiency of algorithms
• miscellaneous:
  • architectural differences /
    portability issues
  • popularity of non-open-standard, vendor-locked
    compute libraries/apis (rawr!)
recent breakthroughs in machine learning
deepmind atari q learner (2014)
plays 5 kinds of atari 2600 games
states: pixels in atari
actions: left/right move
reward: score
algorithm used:
feedforward “q-learning”
conv-net
for unsupervised map of reward
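the q-learning idea on this slide can be sketched in a few lines. this is a minimal sketch only, assuming a small lookup table in place of the conv-net the slide mentions; the names `q_update`, `epsilon_greedy`, `alpha` and `gamma` are illustrative and are not taken from deepmind's code.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step.

    Moves Q(state, action) toward reward + gamma * max over a' of Q(next_state, a').
    Unseen (state, action) pairs are treated as having value 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the best known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Toy usage with the slide's vocabulary: state = screen, action = move, reward = score.
ACTIONS = ["left", "right"]
q_table = {}
q_update(q_table, state=0, action="right", reward=1.0, next_state=1, actions=ACTIONS)
chosen = epsilon_greedy(q_table, 0, ACTIONS, epsilon=0.0, rng=random.Random(0))
```

in the atari setting the table would be replaced by a network that takes pixels as input, since the number of possible screens is far too large to enumerate.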
recent breakthroughs in machine learning
the translator (2015)
real-time translation of speech
from/into 7 different languages
able to run even on
resource-constrained embedded
hardware (e.g. smartphones)
uses the same engine that is used in
microsoft cortana (creepy!)
Reinforcement Learning: DeepMind AlphaGo
• google deepmind alphago (2016)
• 99.8% winning rate
vs other go programs
• first program to defeat
a human go champion
• algorithm used:
  • deep neural networks
  • monte carlo tree search
  • supervised learning from expert games
  • reinforcement learning vs other alphago instances
supervised learning: random forest
fernández-delgado et al. (2014) compared 179 classifiers on 121 data sets (mostly from the uci repository),
result:
• the top-ranked classifiers are random forest variants
• for kaggle competitions, try gradient boosting machines (gbm), e.g. xgboost.
supervised: deep learning
don’t be fooled: dl research improves
part by part, through a new kind of layer,
a new activation function, a new non-convex
optimization solver, or a deeper
neural net.
from rodrigo benenson
deep learning accuracies ranking
supervised: deep learning
summary:
• relu works better than the sigmoid function as an activation.
• maxout works better when combined with dropconnect as an
activation function.
• a dropout layer helps fight overfitting.
• adagrad and adadelta work better if you don’t want to
tune optimization hyperparameters.
• deeper networks work: highway layers and residual layers.
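two of the points above can be illustrated with a short sketch. it assumes scalar inputs and plain python lists rather than a real dl framework, and the function names are illustrative. it shows why relu keeps a useful gradient where sigmoid saturates, and how (inverted) dropout randomly zeroes units during training.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s); it shrinks toward 0 for large |x|.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of relu: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop and rescale
    the survivors by 1 / (1 - p_drop) so the expected activation is unchanged."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# For a large pre-activation, sigmoid's gradient nearly vanishes; relu's does not.
saturated = sigmoid_grad(5.0)
alive = relu_grad(5.0)
dropped = dropout([1.0] * 1000, p_drop=0.5, rng=random.Random(0))
```

at test time the `training=False` path returns the activations untouched, which is the usual reason for rescaling during training instead.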
unsupervised: t-sne
t-distributed stochastic neighbor embedding
van der maaten and hinton (2008):
mnist data set visualization
• works best for data-viz
• can be used for clustering too
(if you’d bother to tweak the algo)
semi-supervised learning: ladder neural networks
Given only 100 or 1,000 labeled examples, with the rest of the training data (~50,000) unlabeled,
try to predict 10,000 unseen examples.
● It works, even with very little labeled data!
● Now we don’t have to ask interns or PhD students to label
data. :)
A. Rasmus, H. Valpola, M. Honkala, M. Berglund, and T. Raiko (2015)
collaborative filtering: restricted boltzmann machine
rbm for collaborative filtering (hinton, 2008):
• it has been used in netflix and spotify algorithms.
• it works better than svd!
• correlation(svd, rbm) : -1 < c < 1
  • so it can be ensembled with svd
    → to improve the prediction.
some advice for applied machine learning research
(this competition)
• preprocessing: scaling & imputation
• cross-validation: choose best algos
• hyperparameter optimization
• ensembling n-models: dark knowledge
data preprocessing: scaling & imputation
raschka (2014):
scaling improves prediction!
gelman (2006):
impute n/a (missing) values by prediction, then
add noise to the predicted values:
less biased!
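a minimal sketch of both preprocessing steps, using only the python standard library. it assumes a single numeric column with `None` marking missing values, and it simplifies the "prediction" to the column mean (a real setup would use a regression on the other columns); the function names are illustrative.

```python
import random
import statistics


def standardize(column):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0.0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def impute_with_noise(column, rng):
    """Fill missing values (None) with a prediction plus random noise.

    Assumption: the prediction is just the mean of the observed values, and
    the noise is Gaussian with the observed standard deviation.
    """
    observed = [x for x in column if x is not None]
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed)
    return [x if x is not None else rng.gauss(mean, std) for x in column]


scaled = standardize([1.0, 2.0, 3.0])
filled = impute_with_noise([1.0, None, 3.0], random.Random(0))
```

adding noise keeps the imputed column from looking artificially less variable than the observed data, which is the bias the slide refers to.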
cross-validation: how to choose the best algo?
• cross-validation is a must!
  ➢ (tibshirani et al., 2014)
• don’t let your cross-validation
data partitions overlap!
  ➢ (zhang, datarobot)
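a minimal sketch of non-overlapping k-fold partitions, assuming samples are addressed by index; `k_fold_indices` is an illustrative helper, not a library function.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for k disjoint folds.

    Every sample lands in exactly one test fold, so the partitions never overlap.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, k=5))
```

to compare algorithms, each candidate would be trained on `train_idx` and scored on `test_idx` for every split, and the candidate with the best average score would be kept.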
hyperparameter optimization
if you want to search for the best hyperparameters:
do random search.
random search is better than grid search
(bergstra and bengio, 2012)
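a minimal sketch of random search, assuming every hyperparameter is a continuous value in a given range and that a higher objective is better. `toy_objective` is a made-up stand-in for "train a model and return its validation score"; all names here are illustrative.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range and
    keep the combination with the highest objective value."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical score surface that peaks at learning_rate=0.1, l2=1.0.
    return -((params["learning_rate"] - 0.1) ** 2) - (params["l2"] - 1.0) ** 2


search_space = {"learning_rate": (0.0, 1.0), "l2": (0.0, 2.0)}
best_params, best_score = random_search(toy_objective, search_space, n_trials=200)
```

unlike a grid, every trial tries a new value for every hyperparameter, so the few hyperparameters that really matter are explored more densely for the same budget.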
ensembling n-models: dark knowledge
If two models give the same accuracy but their
prediction outputs have low correlation, then we can
improve prediction accuracy by averaging
the models’ predictions.
(Hinton, 2015)
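a small worked example of the averaging idea, using made-up probabilities for two hypothetical binary classifiers that each make one mistake on a different example. the numbers are illustrative only, not results from any real model.

```python
def accuracy(probs, labels, threshold=0.5):
    """Fraction of examples where the thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def average_predictions(*model_probs):
    """Average the predicted probabilities of several models, example by example."""
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.1, 0.4, 0.2, 0.8, 0.3]  # wrong on the third example
model_b = [0.8, 0.2, 0.9, 0.6, 0.7, 0.1]  # wrong on the fourth example
blended = average_predictions(model_a, model_b)

acc_a = accuracy(model_a, labels)
acc_b = accuracy(model_b, labels)
acc_blend = accuracy(blended, labels)
```

both models score 5 out of 6 on their own, but because their errors fall on different examples the averaged prediction gets all 6 right. if the two models made the same mistakes, averaging would not help.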
the landscape of opportunities
Popular Big Data Industry
Financial Services Telco Web/Media Retail Healthcare Government
• Fraud detection
• Compliance reporting
• Portfolio analysis
• Customer statements
• Wire transfer alerts
• Customer acquisition, retention, and profitability
• Subscriber data management
• Fraud analysis
• Social analysis
• Response times
• Traffic analysis
• Product affinity/bundling
• Sentiment Analysis
• Content monetization
• Advertising optimization
• Optimization of user experience/click stream analysis
• Network optimization to support service levels
• Store operation analysis
• Customer loyalty programs
• Collaborative planning and forecasting
• Loss prevention
• Supply chain optimization
• Drug development and launch cost reduction
• Regulatory compliance
• Product quality
• Return on promotional investment
• Lowered risk of new product success
• Security/anti-terror
• Recovery Act public disclosure
• Budgetary control and management
• Educational reporting
• Asset control and assessment
• Environment monitoring
*cisco 2013-2014
currently the biggest prescriptive analytics engine:
contextual advertising
http://www.flashtalking.com/us/targeted-ads/
another one:
marketplace and services recommendation engine
challenges of implementation
and
what we do with machine learning
did you follow waze’s instructions during your first week?
• would you buy a self-driving car that couldn’t drive
itself in 99 percent of the country?
  • or that knew nearly nothing about parking,
  • couldn’t be taken out in snow or heavy rain,
  • and would drive straight over a gaping pothole?
if your answer is yes, then check out the google self-driving car, model year
2014
but
can we trust them enough?
the BIGGEST CHALLENGES in indonesia
DATA SETS
the current analytics technology
humans still do
most of the process
the current challenges of big data analytics?
heterogeneous data sources, systems and formats
time-consuming and complex data preparation process
almost impossible task of integrating various kinds of data
it requires experts to analyze big and complex data
most of the user interactions are not intuitive
“Before performing analytics, data scientists must first
format and prepare the raw data for analytics, often with
more than 80% of the effort.”, said Intel Corp. Research
what would it be like
if we could simplify the whole process?
hence our vision
we believe humans should not be bogged down by tedious matters.
by reimagining analytics, we envision the creation of intelligent
machines
that will free humans to focus on solving the world’s toughest
problems.
intelligent machines that can help us collect massive amounts of data
automatically reads and connects to
any kind of data, including automatic
machine-to-machine connections
[figure labels: structured data, printed invoices, social media conversation]
then help us separate the signals from the noise
automatic data quality assessments,
data cleansing and data filtering
[figure labels: regi, mita, gundam, x-men]
complete the information and connect it all in a meaningful way
automatic data transformation, entity
extraction, contextual profiling
[figure labels: regi, mita, gundam, batman, tom, mediatrac]
and finally help us make sense of the massively connected data
contextual search and
recommendation
intelligent data discovery
[figure labels: regi, mita, gundam, batman, tom, mediatrac, sith]
through a highly intuitive and natural user interface
natural language interface
voice and gesture recognition
how many restaurants sell soto along jalan senopati?
digital
telco
legal
retail
healthcare
agriculture
multi format
structured
unstructured
unclean
missing data
unstandardized
unconnected
difficult to analyze
cleaned and standardized
enriched and validated
connected at granular level
analytics-ready data
automatic data collection
automatic data preparation
automatic data integration
territory management
all of our siloed data will have a totally elevated value
once we connect it all in a meaningful way
are all of our current data connected yet?
Almost…
google is a humongous library index, with a smart
library card search that redirects you to the original
documents
facebook is a giant personal scrapbook of all your
acquaintances that are currently linked by manual
tagging and friends list
source:techglimpse
youtube and instagram are a huge repository of
current knowledge, lifestyle and trends that are still
largely unconnected
now imagine this!
when we can have intelligent machines that can
connect everything, in a meaningful way…
we can start asking questions, on things we never
thought possible to be asked before
Spotify can map songs across social graphs.
Shazam can give us situational data: where someone is listening to a song,
when, how and even (to an extent) why.
YouTube can help us track the growth of a song using search and streams.
Instagram & Vine are becoming hotbeds for music discovery.
what if we could connect all their data together?
or, if you have a radio station, what sort of playlist will appeal to
your target audience, if we know that a sizeable percentage of them
own a hummer?
we can even predict the specific combinations of words, notes and
beats that will increase the chance of putting a song in the
billboard top 40 this upcoming season.
here are some samples of the hidden insights
that we can discover from our own large repository of data,
using our intelligent data integration and data discovery tools
when we integrate historical media articles with geodemographic and point-of-interest
databases, we can create a model that predicts a high probability of fire
incidents down to street level
productivity optimization
lessons learned including how to scale your ML
scalability problems - outline
• large scale machine learning
  • mahout - scalable ml on hadoop
  • jubatus – distributed online real-time ml
  • vowpal wabbit – fast learning at yahoo/ms
  • trident ml and storm pattern: ml on storm, yarn
  • upcoming: samoa (ml on s4, storm)
• issues in scalable distributed ml
  • load balancing
  • auto scaling
  • job scheduling
  • workflow management
• data and model parallelism
  • parameter server framework
  • peer-to-peer framework
scalability problems - outline
• distributed deep learning
  • yahoolda: scalable parallel framework in latent variable models
  • distbelief – distributed deep learning on cluster
  • h2o – distributed deep learning on spark
  • adam at msr – distributed deep learning
  • dl4j – open source for deep learning on hadoop and spark
  • petuum – distributed machine learning
  • singa – distributed deep learning
  • tensorflow: google large scale distributed dl
  • mxnet: heterogeneous distributed deep learning
  • caffe on spark: yahoo
• distributed learning and optimization
  • proximal splitting/auxiliary coordinates;
  • bundle (sub-gradient);
  • shotgun: parallelized cdm (coordinate descent method)
  • asynchronous sgd;
  • hogwild/dogwild;
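the data-parallel, parameter-server idea in this outline can be sketched on a single machine. this is a toy sketch only, assuming a one-parameter linear model and simulated workers in a plain loop; it is the synchronous variant (the asynchronous and hogwild variants drop the wait for all workers), and it does not use any of the frameworks listed above.

```python
def shard(data, n_workers):
    """Split the data set into n_workers disjoint shards (data parallelism)."""
    return [data[i::n_workers] for i in range(n_workers)]


def local_gradient(w, batch):
    """Gradient of the mean squared error of y ~ w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def parameter_server_sgd(data, n_workers=4, lr=0.01, steps=200):
    """Synchronous data-parallel gradient descent.

    Each simulated worker computes a gradient on its own shard; the "server"
    averages the gradients and updates the single shared parameter w.
    """
    w = 0.0
    shards = shard(data, n_workers)
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # would run in parallel
        w -= lr * sum(grads) / len(grads)
    return w


# Toy data generated from y = 2 * x, so the learned weight should approach 2.
toy_data = [(float(x), 2.0 * x) for x in range(1, 9)]
learned_w = parameter_server_sgd(toy_data)
```

in a real cluster the list comprehension over shards is where the network traffic, load balancing and scheduling issues from the outline show up.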
what’s next?
emerging analytics technology for automatic
analytics on large dimensional data
online deep learning
topological data analysis
fuzzy-rough set based data exploration system
granular computing
kernel set and spatiotemporal analysis
applied differential geometry
non axiomatic reasoning system
intelligent rule and knowledge extraction/discovery
multi agent based modeling
weak signal detection and analysis
bayesian networks analysis
genetic programming
self organizing neural networks
and also more humanlike user
interaction and data visualization
technology
eye tracking
glasses-free autostereoscopy
touch sensitive hologram
natural language user interface
tangible user interface
wearable gestural interface
brain-computer interface
sensor network user interface
In the meantime
principles for the development of a complete mind:
study the science of art. study the art of science.
develop your senses — especially learn how to see.
realize that everything connects to everything else.
Leonardo da Vinci

9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxAleenaJamil4
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 

Recently uploaded (20)

Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
detection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptxdetection and classification of knee osteoarthritis.pptx
detection and classification of knee osteoarthritis.pptx
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 

ML insights from KUDO codefest

  • 1. machine learning: challenges, learnings and opportunities. presented by imron zuhri, adit, and samudra. KUDO codefest, 14 May 2016
  • 2. can a machine think?
  • 3. in 1996, Garry Kasparov was not afraid of a computer, and he won. the next year, he played against a new and improved Deep Blue and lost
  • 4. this is the move that was so surprising, so un-machine-like, that he was sure the IBM team had cheated Rd5 Rd1
  • 5. a random move, a computer bug. to kasparov, a sign of superior intelligence. Rd5 Rd1
  • 6. big data analytics is the culmination of the machine way of thinking. we can now immensely extend our memory and computational power to help us do that
  • 7. what is machine learning
  • 8. some definitions: a (hypnotized) user’s perspective: a scientific (witchcraft) field that researches fundamental principles from data (potions) and develops magical algorithms (spells to cast) (pascal vincent, 2015); field of study that gives computers the ability to learn without being explicitly programmed (arthur samuel, 1959); formal definition (tom mitchell, 1998): “a machine is said to be learning if it improves with each experience E, on specific tasks T, with specific performance P”
  • 9. CURRENT VIEW OF ML FOUNDING DISCIPLINES
  • 10. three niches for machine learning: data mining, using historical data to improve decisions (medical records → medical knowledge); software applications that are difficult to program by hand (autonomous driving, image classification); user modeling (automatic recommender systems). source: rong jin, 2013
  • 49. (some) open problems in machine learning: one-shot learning; unsupervised learning; reinforcement learning; artificial general intelligence. “most of human and animal learning is unsupervised learning. if intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. we know how to make the icing and the cherry, but we don't know how to make the cake.” (yann lecun)
  • 50. challenges in machine learning. data-related: abundant yet scattered data; unstructured, noisy data; offline-stored data (duh!). resource-related: data storage space constraints; computing power; training time; inve$$$tments (initial investments, running costs)
  • 51. challenges in machine learning. methodical issues: result consistency (i.e. accuracy); overfitting; algorithm computational efficiency. miscellaneous: architectural differences/portability issues; popularity of non-open standard, vendor-locked compute libraries/apis (rawr!)
  • 52. recent breakthroughs in machine learning: deepmind atari q learner (2014). plays 5 kinds of atari 2600 games. states: pixels in atari; actions: left/right move; reward: score. algorithm used: feedforward “q-learning” conv-net for unsupervised map of reward
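The states/actions/reward framing on slide 52 can be illustrated with a much smaller stand-in. The sketch below is not the DeepMind conv-net: it swaps in plain tabular Q-learning on a hypothetical five-cell corridor (the environment, `train_q_table`, and all parameter values are assumptions made for illustration), keeping only the left/right actions and a score-like reward.

```python
import random

def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    The agent starts in cell 0 and gets reward 1.0 on reaching the last cell.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # epsilon-greedy action selection (ties go to "right")
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + moves[action], 0), n_states - 1)
            reward = 1.0 if nxt == n_states - 1 else 0.0
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = train_q_table()
greedy_policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
```

In the atari setting the table is replaced by a network that maps pixels to action values, but the update rule has the same shape.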
  • 53. recent breakthroughs in machine learning: the translator (2015). real-time translation of speech from/into 7 different languages. able to run even on resource-constrained embedded hardware (e.g. smartphones). uses the same engine that was used in microsoft cortana (creepy!)
  • 54. Reinforcement Learning: DeepMind AlphaGo. google deepmind alphago (2016): 99.8% winning rate vs other go programs; first program to defeat a human professional go champion. algorithm used: deep neural network; monte carlo tree search; supervised learning from expert games; reinforcement learning vs other alphago instances
  • 55. supervised learning: random forest. fernández-delgado et al. (2014) compared 179 classifiers on 121 data sets from the uci repository. result: the top 5 are random forest classifiers. for kaggle competitions, try gbm: xgboost.
  • 56. supervised: deep learning. don’t be fooled: dl research improves part by part, through either a new kind of layer, a new activation function, a new non-convex optimization solver, or a deeper neural net. from rodrigo benenson’s deep learning accuracies ranking
  • 57. supervised: deep learning summary: relu works better than the sigmoid function for activation; maxout works better when applied to dropconnect for activation function; a dropout layer works to fight overfitting; adagrad and adadelta work better if you don’t want to tune optimization hyperparameters; deeper layers work: highway layer and residual layer.
  • 58. unsupervised: t-sne (t-distributed stochastic neighbor embedding). van der maaten and hinton (2008): mnist data set visualization. works best for data-viz; can be used for clustering too (if you’d bother to tweak the algo)
  • 59. semi-supervised learning: ladder neural networks. given 100 or 1,000 labeled examples and the rest unlabeled (~50,000), try to predict 10,000 future examples. it works, even with few labels! now we don’t have to tell interns or PhD students to label data. :) (A Rasmus, H Valpola, M Honkala, M Berglund, and T Raiko, 2015)
  • 60. collaborative filtering: restricted boltzmann machine. rbm for collaborative filtering (hinton, 2008): it has been used in netflix and spotify algos; it works better than svd!; correlation(svd, rbm): -1 < c < 1; can be ensembled with svd to improve the prediction.
  • 61. some advice for applied machine learning research (this competition): preprocessing: scaling & imputation; cross-validation: choose the best algos; hyperparameter optimization; ensembling n models: dark knowledge
  • 62. data preprocessing: scaling & imputation. raschka (2014): scaling improves prediction! gelman (2006): predict the n/a values first, then predict on the imputed data with noise: less biased!
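A minimal sketch of the two preprocessing steps named on slide 62, using only the Python standard library. Mean imputation is used here as a simpler stand-in for the prediction-based imputation the slide attributes to gelman; the function names `impute_mean` and `standardize` are assumptions made for illustration.

```python
from statistics import mean, pstdev

def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    fill = mean(observed)
    return [fill if x is None else x for x in column]

def standardize(column):
    """Scale a column to zero mean and unit (population) standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # a constant column carries no spread to scale
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

raw_feature = [1.0, None, 3.0]
clean_feature = standardize(impute_mean(raw_feature))
```

Both statistics (the fill value and the mean/standard deviation) would normally be computed on the training partition only and then reused on held-out data.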
  • 63. cross-validation: how to choose the best algo? cross-validation is a must! (tibshirani et al., 2014); don’t overlap your cross-validation data partitions! (zhang, data robot)
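The "don't overlap your partitions" advice on slide 63 can be sketched as a k-fold splitter whose test folds are disjoint by construction. The helper name `k_fold_indices` and its parameters are assumptions made for illustration, not something taken from the talk.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Every sample lands in exactly one test fold, so folds never overlap.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # striding over the shuffled indices gives k disjoint folds
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(n_samples=10, k=3))
```

Each candidate algorithm would then be fitted on `train` and scored on `test` for every split, and the mean score used to compare algorithms.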
  • 64. hyperparameter optimization: if you want to search for the best hyperparameters, do random search. random search is better than grid search (bergstra and bengio, 2012)
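A small sketch of random search as recommended on slide 64. The search space, the `toy_loss` objective and the parameter names (`learning_rate`, `dropout`) are hypothetical stand-ins for a real train-and-validate loop.

```python
import random

def random_search(objective, space, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly from its range; keep the lowest loss."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_loss(params):
    # hypothetical validation loss: sensitive to learning_rate, barely to dropout
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["dropout"] - 0.5) ** 2

search_space = {"learning_rate": (0.0, 1.0), "dropout": (0.0, 1.0)}
best_params, best_score = random_search(toy_loss, search_space)
```

Because every trial draws a fresh value for each hyperparameter, 50 trials try 50 distinct learning rates, whereas a grid with the same budget would try only about 7 values per axis. That is the intuition behind the slide's claim when only a few hyperparameters really matter.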
  • 65. ensembling n models: dark knowledge. if two models give the same accuracy but their prediction outputs have low correlation, then we can improve prediction accuracy by averaging the models’ predictions. (Hinton, 2015)
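The averaging argument on slide 65 can be checked on a tiny made-up example. The sketch uses regression error (mean squared error) rather than classification accuracy to keep the arithmetic visible; the two "models" are hand-written prediction lists whose errors are equal in size but uncorrelated, and all names are assumptions made for illustration.

```python
def mse(predictions, truth):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(predictions, truth)) / len(truth)

def average_predictions(*model_outputs):
    """Blend models by averaging their predictions sample by sample."""
    return [sum(values) / len(values) for values in zip(*model_outputs)]

truth = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, -1.0, 1.0, -1.0]   # errors uncorrelated with model_b's
model_b = [1.0, 1.0, -1.0, -1.0]
blend = average_predictions(model_a, model_b)

error_a = mse(model_a, truth)      # 1.0
error_b = mse(model_b, truth)      # 1.0
error_blend = mse(blend, truth)    # 0.5
```

Where the two models err in opposite directions the errors cancel, so the blend halves the error here. If the two prediction lists were identical (correlation 1), averaging would change nothing.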
  • 66. the landscape of opportunities
  • 68. Popular Big Data use cases by industry (Financial Services, Telco, Web/Media, Retail, Healthcare, Government): • Fraud detection • Compliance reporting • Portfolio analysis • Customer statements • Wire transfer alerts • Customer acquisition, retention, and profitability • Subscriber data management • Fraud analysis • Social analysis • Response times • Traffic analysis • Product affinity/bundling • Sentiment analysis • Content monetization • Advertising optimization • Optimization of user experience/click stream analysis • Network optimization to support service levels • Store operation analysis • Customer loyalty programs • Collaborative planning and forecasting • Loss prevention • Supply chain optimization • Drug development and launch cost reduction • Regulatory compliance • Product quality • Return on promotional investment • Lowered risk of new product success • Security/anti-terror • Recovery Act public disclosure • Budgetary control and management • Educational reporting • Asset control and assessment • Environment monitoring (source: cisco 2013-2014)
  • 71. currently the biggest prescriptive analytics engine: contextual advertising http://www.flashtalking.com/us/targeted-ads/
  • 72. another one: marketplace and services recommendation engine
  • 73. challenges of implementation and what we do with machine learning
  • 74. did you follow waze's instructions during your first week?
  • 75. would you buy a self-driving car that couldn’t drive itself in 99 percent of the country? or that knew nearly nothing about parking, couldn’t be taken out in snow or heavy rain, and would drive straight over a gaping pothole? if your answer is yes, then check out the google self-driving car, model year 2014
  • 76. but
  • 77. can we trust them enough?
  • 78. the BIGGEST CHALLENGES in indonesia
  • 80. the current analytics technology: humans still do most of the process
  • 81. the current challenges of big data analytics: heterogeneous data sources, systems and formats; time-consuming and complex data preparation process; the almost impossible task of integrating various kinds of data; it requires experts to analyze big and complex data; most of the user interactions are not intuitive. “Before performing analytics, data scientists must first format and prepare the raw data for analytics, often with more than 80% of the effort.”, said Intel Corp. Research
  • 82. what would it be like if we could simplify the whole process?
  • 83. hence our vision: we believe humans should not be bogged down by tedious matters. by reimagining analytics, we envision the creation of intelligent machines that will free humans to focus on solving the world’s toughest problems.
  • 84. intelligent machines that can help us collect the massive amount of data: automatically reads and connects to any kind of data, including automatic machine-to-machine connections, structured data, printed invoices, social media conversations
  • 86. then help us separate the signals from the noise: automatic data quality assessments, data cleansing and data filtering. regi mita gundam x-men
  • 87. then help us separate the signals from the noise: automatic data quality assessments, data cleansing and data filtering. regi mita gundam
  • 88. complete the information and connect it all in a meaningful way: automatic data transformation, entity extraction, contextual profiling. regi mita gundam
  • 89. complete the information and connect it all in a meaningful way: automatic data transformation, entity extraction, contextual profiling. regi mita gundam batman tom mediatrac
  • 91. and finally help us make sense of the massively connected data: contextual search and recommendation, intelligent data discovery. gundam batman sith
  • 92. and finally help us make sense of the massively connected data: contextual search and recommendation, intelligent data discovery. regi mita gundam batman tom mediatrac gundam batman sith
  • 93. through a highly intuitive and natural user interface: natural language interface, voice and gesture recognition. “how many restaurants sell soto along jalan senopati?”
  • 96. raw data (multi-format, structured, unstructured, unclean, missing data, unstandardized, unconnected, difficult to analyze) → automatic data collection, automatic data preparation, automatic data integration → analytics-ready data (cleaned and standardized, enriched and validated, connected at granular level)
  • 100. all of our siloed data will gain a totally elevated value once we connect it all in a meaningful way
  • 101. is all of our current data connected yet?
  • 103. google is a humongous library index, with a smart library card search that redirects you to the original documents
  • 104. facebook is a giant personal scrapbook of all your acquaintances that are currently linked by manual tagging and friends lists (source: techglimpse)
  • 105. youtube and instagram are huge repositories of current knowledge, lifestyle and trends that are still largely unconnected
  • 107. when we can have intelligent machines that can connect everything in a meaningful way… we can start asking questions about things we never thought possible to ask before
  • 108. spotify can map songs across social graphs. shazam can give us situational data: where someone is listening to a song, when, how and even (to an extent) why. youtube can help us track the growth of a song using search and streams. instagram & vine are becoming hotbeds for music discovery. what if we could connect all their data together?
  • 109. or if you have a radio station, what sort of playlist will appeal to your target audience, if we know that a sizeable percentage of them own a hummer?
  • 110. we can even predict the specific combination of words, notes and beats that will increase a song's chance of reaching the billboard top 40 this upcoming season.
  • 111. here are some samples of hidden insights that we can discover from our own large repository of data, using our intelligent data integration and data discovery tools
  • 112. when we integrate historical media articles with geodemographic and point-of-interest databases, we can create a model that can predict a high probability of fire incidence down to street level
  • 120. lessons learned including how to scale your ML
  • 153. scalability problems - outline. large scale machine learning: mahout (scalable ml on hadoop); jubatus (distributed online real-time ml); vowpal wabbit (fast learning at yahoo/ms); trident ml and storm pattern (ml on storm, yarn); upcoming: samoa (ml on s4, storm). issues in scalable distributed ml: load balancing; auto scaling; job scheduling; workflow management; data and model parallelism; parameter server framework; peer-to-peer framework
  • 154. scalability problems - outline. distributed deep learning: yahoolda (scalable parallel framework in latent variable models); distbelief (distributed deep learning on cluster); h2o (distributed deep learning on spark); adam at msr (distributed deep learning); dl4j (open source for deep learning on hadoop and spark); petuum (distributed machine learning); singa (distributed deep learning); tensorflow (google large scale distributed dl); mxnet (heterogeneous distributed deep learning); caffe on spark (yahoo). distributed learning and optimization: proximal splitting/auxiliary coordinates; bundle (sub-gradient); shotgun: parallelized cdm (coordinate descent method); asynchronous sgd; hogwild/dogwild
  • 159. emerging analytics technology for automatic analytics on large dimensional data: online deep learning; topological data analysis; fuzzy-rough set based data exploration system; granular computing; kernel set and spatiotemporal analysis; applied differential geometry; non axiomatic reasoning system; intelligent rule and knowledge extraction/discovery; multi agent based modeling; weak signal detection and analysis; bayesian networks analysis; genetic programming; self organizing neural networks
  • 160. and also more humanlike user interaction and data visualization technology: eye tracking; glass-free auto stereoscopy; touch sensitive hologram; natural language user interface; tangible user interface; wearable gestural interface; brain-computer interface; sensor network user interface
  • 162. principles for the development of a complete mind: study the science of art. study the art of science. develop your senses, especially learn how to see. realize that everything connects to everything else. (Leonardo da Vinci)