(Technical) Big Data Analytics
for non-technical end-users
Erik Tromp – CEO UnderstandLing
Contents
• (Short) intro
• Rationale
• Tuktu platform
• Deep learning for computational linguistics
• CEMistry – Customer Experience Monitoring on steroids
(Short) Intro
• Big data science experts
• Specialisms
• Computational Linguistics
• Customer Experience Management
• Service: from strategic advice all the way to operational implementation
• Own platform: Tuktu
• Soon: own product: CEMistry
• Training/education on big data science
(Short) Intro
Quantify every touchpoint of a customer with your company
4 major areas
• Text Analytics
• Web Analytics
• Mobile Analytics
• CRM/Backend Analytics
(Short) Intro
• Erik Tromp
• Age: 28
• CEO UnderstandLing
• Graduated in 2011 on sentiment analysis
• Multilingual Sentiment Analysis on Social Media
• Software engineer – Scala
• Machine learning
• Author of the Tuktu platform
Rationale
Big data science lets companies seize new opportunities
Big data science drives business
But it is very much a technical revolution, with business implications
Rationale
Many companies want to utilize the opportunities big data science brings
These companies do not have sufficient capabilities to do so
Nor are there many suppliers that combine technology and analytics with knowledge of their business
Rationale
But these companies often do have their own (business) analysts
Rationale
IDEA
Make big data science accessible to non-technical users
Tuktu
http://www.tuktu.io
https://github.com/UnderstandLingBV/Tuktu
Tuktu – Early Days
• Started off as a personal project to make life easier
• Grew out of a collaboration with Maastricht University
• Idea: save time on coding/engineering, focus on logic and functionality
Instead of writing the same code over and over again, keep it ready-made and configure its building blocks
In a visual and straightforward way!
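The idea of configuring ready-made building blocks instead of rewriting code can be sketched as follows. This is a minimal illustration only, not Tuktu's actual API or configuration format: the block names (`lowercase`, `keep_if_contains`, `add_length`), the JSON layout and the helper functions are all assumptions made for this example.

```python
import json

# Hypothetical building blocks: each takes its config and returns a function
# that transforms one record (a dict). None of these names come from Tuktu.
def lowercase(config):
    field = config["field"]
    def run(record):
        return {**record, field: record[field].lower()}
    return run

def keep_if_contains(config):
    field, word = config["field"], config["word"]
    def run(record):
        # Returning None drops the record from the flow.
        return record if word in record[field] else None
    return run

def add_length(config):
    field, target = config["field"], config["target"]
    def run(record):
        return {**record, target: len(record[field])}
    return run

REGISTRY = {
    "lowercase": lowercase,
    "keep_if_contains": keep_if_contains,
    "add_length": add_length,
}

def build_pipeline(job):
    """Turn a job description (plain data) into a list of configured steps."""
    return [REGISTRY[step["name"]](step["config"]) for step in job["steps"]]

def run_pipeline(steps, records):
    """Push every record through the steps; a step may drop a record."""
    for record in records:
        for step in steps:
            record = step(record)
            if record is None:
                break
        else:
            yield record

# The job itself is configuration, not code; a visual editor could emit this.
JOB_JSON = """
{"steps": [
  {"name": "lowercase", "config": {"field": "text"}},
  {"name": "keep_if_contains", "config": {"field": "text", "word": "tuktu"}},
  {"name": "add_length", "config": {"field": "text", "target": "length"}}
]}
"""

def run_job(job_json, records):
    return list(run_pipeline(build_pipeline(json.loads(job_json)), records))
```

Calling `run_job(JOB_JSON, records)` with a list of `{"text": ...}` dicts applies the three steps in order. Changing the job then means editing the JSON rather than the code, which is the property a drag-and-drop editor relies on.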
Tuktu – Now
Your one-stop shop for everything big data science
Tuktu – Now
• Real-time and batch processing
• Synchronous and asynchronous processing
• REST API
• Drag-and-drop modelling of jobs
• Distributed file system: TDFS
• Key/value store: TuktuDB
• Real-time visualization
• Web analytics support
• Scheduling
• No master/slave architecture
• Local or distributed computing
• Machine learning
• Deep learning
• Cross-platform due to JVM
• Easy installation: just unzip!
Tuktu
DEMO
Deep Learning for Computational Linguistics
IDEA
Learn language models generically
Model every CL problem on top of the generic model
Deep Learning for Computational Linguistics
This way, we can handle almost any task in almost any language
With much less effort
Deep Learning for Computational Linguistics
How?
Deep Learning for Computational Linguistics
There are many linguistic resources available
Sadly, most are for English
In particular: annotated treebanks for deep parsing
Deep Learning for Computational Linguistics
However, we can still use these resources
Deep Learning for Computational Linguistics
1. Co-train word vectors for the target language and English
2. Train parsing models on English
3. Co-finetune the models on the co-trained word vectors
4. Pre-train a (recursive) auto-encoder using the parsing model for the target language
5. Use the recursive auto-encoder for a specific task in the target language
• Topic detection, sentiment analysis, named entity recognition, authorship profiling
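To make steps 4 and 5 more concrete, here is a minimal sketch of how a recursive auto-encoder composes word vectors along a binary parse tree into one sentence vector. It is an illustration under stated assumptions, not the implementation behind the talk: the weights are random and untrained, the word vectors are random stand-ins for co-trained vectors, the tree is written by hand instead of coming from a parser, and all names (`RecursiveAutoEncoder`, `toy_vectors`, `DIM`) are hypothetical.

```python
import math
import random

DIM = 4  # toy embedding size (assumption)

def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class RecursiveAutoEncoder:
    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.w_enc = make_matrix(dim, 2 * dim, rng)  # [left; right] -> parent
        self.w_dec = make_matrix(2 * dim, dim, rng)  # parent -> [left'; right']

    def encode(self, left, right):
        """Compress two child vectors into one parent vector of the same size."""
        return [math.tanh(v) for v in matvec(self.w_enc, left + right)]

    def reconstruction_error(self, left, right):
        """Squared error between the children and their decoded reconstruction.
        Pre-training would minimise this; the training loop is omitted here."""
        parent = self.encode(left, right)
        rebuilt = matvec(self.w_dec, parent)
        return sum((a - b) ** 2 for a, b in zip(left + right, rebuilt))

    def encode_tree(self, tree, vectors):
        """tree is a word (str) or a pair (left_subtree, right_subtree).
        Returns (vector for the whole tree, summed reconstruction error)."""
        if isinstance(tree, str):
            return vectors[tree], 0.0
        left_vec, left_err = self.encode_tree(tree[0], vectors)
        right_vec, right_err = self.encode_tree(tree[1], vectors)
        error = self.reconstruction_error(left_vec, right_vec)
        return self.encode(left_vec, right_vec), left_err + right_err + error

def toy_vectors(words, dim, seed=1):
    """Random stand-ins for the co-trained word vectors of step 1."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in words}

rae = RecursiveAutoEncoder(DIM)
vectors = toy_vectors(["de", "kat", "slaapt"], DIM)
# Hand-written binary tree for the Dutch sentence "de kat slaapt".
sentence_vec, total_error = rae.encode_tree((("de", "kat"), "slaapt"), vectors)
```

The resulting `sentence_vec` has the same size as a single word vector, so a task-specific classifier (for sentiment or topic, say) could be trained on top of it regardless of sentence length.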
Deep Learning for Computational Linguistics
DEMO
Unsupervised parsing in Dutch
CEMistry
[Diagram: four analytics areas feeding a single Customer Profile]
• Web Analytics: Tuktu.js; visitor to customer; events, page views
• Mobile Analytics: "SDK"; visitor to customer; events, app triggers
• Text Analytics: collectors (NLP); user to customer; events, communication
• Backend/CRM: connectors (database); customer; events, transactions
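The diagram suggests that events from every touchpoint end up in one profile per customer. A rough sketch of that merge step is given below. It is an assumption about how such a profile could be assembled, not CEMistry's design: the `Event` fields, the `identity_map` lookup and `build_profiles` are hypothetical names introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source: str    # "web", "mobile", "text" or "backend"
    actor_id: str  # visitor, user or customer id as seen by that source
    kind: str      # e.g. "page_view", "app_trigger", "communication", "transaction"

def build_profiles(events, identity_map):
    """Group events per customer.

    identity_map links (source, actor_id) to a customer id. Events from
    actors that are not yet linked to a customer are returned separately.
    """
    profiles = defaultdict(list)
    unresolved = []
    for event in events:
        customer = identity_map.get((event.source, event.actor_id))
        if customer is None:
            unresolved.append(event)
        else:
            profiles[customer].append(event)
    return dict(profiles), unresolved

# Example: one customer seen on the website and in the backend, plus an
# anonymous app visitor who has not been linked to a customer yet.
events = [
    Event("web", "visitor-17", "page_view"),
    Event("backend", "crm-42", "transaction"),
    Event("mobile", "device-9", "app_trigger"),
]
identity_map = {
    ("web", "visitor-17"): "customer-1",
    ("backend", "crm-42"): "customer-1",
}
profiles, unresolved = build_profiles(events, identity_map)
```

Keeping unresolved events aside rather than discarding them means they can be attached later, once an anonymous visitor is identified as a customer.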
Questions?
Erik Tromp
CEO UnderstandLing
erik@understandling.com
http://www.understandling.com
http://www.tuktu.io
http://www.linkedin.com/in/eriktromp
https://github.com/UnderstandLingBV/Tuktu
Talk to us on Gitter!
https://gitter.im/UnderstandLingBV/Tuktu
