This document discusses big data analytics tools for non-technical users. It introduces Tuktu, a platform that makes big data science accessible through a visual drag-and-drop interface. It also describes using deep learning models trained on linguistic resources to perform natural language tasks across languages with less effort. Finally, it presents CEMistry, a customer experience monitoring product that analyzes text, web, mobile, and backend data to build customer profiles.
2. Contents
• (Short) intro
• Rationale
• Tuktu platform
• Deep learning for computational linguistics
• CEMistry – Customer Experience Monitoring on steroids
3. (Short) Intro
• Big data science experts
• Specialisms
• Computational Linguistics
• Customer Experience Management
• Service: from strategic advice all the way to operational implementation
• Own platform: Tuktu
• Soon: own product: CEMistry
• Training/education on big data science
4. (Short) Intro
Quantify every touchpoint of a customer with your company
4 major areas
• Text Analytics
• Web Analytics
• Mobile Analytics
• CRM/Backend Analytics
5. (Short) Intro
• Erik Tromp
• Age: 28
• CEO UnderstandLing
• Graduated in 2011 on sentiment analysis
• Multilingual Sentiment Analysis on Social Media
• Software engineer – Scala
• Machine learning
• Author of the Tuktu platform
9. Rationale
Big data science makes it possible to seize new opportunities
Big data science drives business
But it is very much a technical revolution with business implications
12. Rationale
Many companies want to utilize the opportunities big data science brings
These companies do not have sufficient capabilities to do so
Nor are there many suppliers that can do tech and analytics and also know these companies' business
19. Tuktu – Early Days
• Started off as a personal project to make life easier
• Grew out of a collaboration with Maastricht University
• Idea: save time on coding/engineering, focus on logic and functionalities
Instead of writing the same code over and over again, keep it available and configure its building blocks
In a visual and straightforward way!
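The idea of configuring existing building blocks instead of rewriting code can be sketched as follows. This is a minimal illustration only, not Tuktu's actual configuration format or API: the block names (`lowercase`, `tokenize`, `count`), the `BLOCKS` registry, and the JSON layout are all assumptions made for the example.

```python
import json

# Hypothetical reusable building blocks. Each factory takes configuration
# parameters and returns a function that transforms one record (a dict).
def lowercase(field):
    def run(record):
        return {**record, field: record[field].lower()}
    return run

def tokenize(field, target):
    def run(record):
        return {**record, target: record[field].split()}
    return run

def count(field, target):
    def run(record):
        return {**record, target: len(record[field])}
    return run

# Illustrative registry: maps a block name used in the config to its factory.
BLOCKS = {"lowercase": lowercase, "tokenize": tokenize, "count": count}

def build_job(config):
    """Turn a declarative config into a callable that processes records."""
    steps = [BLOCKS[step["block"]](**step["params"]) for step in config["steps"]]

    def job(records):
        for record in records:
            for step in steps:
                record = step(record)
            yield record

    return job

# The job is described as data, not code (assumed layout, for illustration).
CONFIG = json.loads("""
{
  "steps": [
    {"block": "lowercase", "params": {"field": "text"}},
    {"block": "tokenize",  "params": {"field": "text", "target": "tokens"}},
    {"block": "count",     "params": {"field": "tokens", "target": "n_tokens"}}
  ]
}
""")

job = build_job(CONFIG)
results = list(job([{"text": "Big Data Science"}]))
```

Because the job is plain data, a visual editor could in principle produce the same configuration by dragging and connecting blocks, which is the spirit of the approach described on this slide.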
21. Tuktu – Now
• Realtime and batch processing
• Synchronous and asynchronous processing
• REST API
• Drag-and-drop modelling of jobs
• Distributed file system: TDFS
• Key/value store: TuktuDB
• Real-time visualization
• Web analytics support
• Scheduling
• No master/slave architecture
• Local or distributed computing
• Machine learning
• Deep learning
• Cross-platform due to JVM
• Easy installation: just unzip!
28. Deep Learning for Computational Linguistics
There are many linguistic resources available
Sadly, most are for English
In particular: annotated treebanks for deep parsing
34. Deep Learning for Computational Linguistics
1. Co-train word vectors for target language and English
2. Train parsing models on English
3. Co-finetune models on co-trained word vectors
4. Pre-train (recursive) auto-encoder using parsing model for target language
5. Use recursive auto-encoder for specific task in target language
• Topic detection, sentiment analysis, named entity recognition, authorship profiling
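Steps 4 and 5 above rely on a recursive auto-encoder that composes word vectors along a parse tree into a single sentence vector. The sketch below shows only that composition and the reconstruction error that pre-training would minimise. It is not the presenter's implementation: the weights are random and untrained, the word vectors are random placeholders rather than co-trained vectors, and the names `RecursiveAutoEncoder`, `DIM`, `VECTORS`, and `TREE` are assumptions made for the example.

```python
import math
import random

DIM = 4  # toy embedding size (assumption; real vectors are much larger)

def make_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def affine_tanh(matrix, bias, vec):
    """Compute tanh(matrix @ vec + bias) with plain Python lists."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vec)) + b)
        for row, b in zip(matrix, bias)
    ]

class RecursiveAutoEncoder:
    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w_enc = make_matrix(dim, 2 * dim, rng)  # two children -> parent
        self.b_enc = [0.0] * dim
        self.w_dec = make_matrix(2 * dim, dim, rng)  # parent -> two children
        self.b_dec = [0.0] * (2 * dim)

    def encode(self, left, right):
        """Compose two child vectors into one parent vector."""
        return affine_tanh(self.w_enc, self.b_enc, left + right)

    def reconstruction_error(self, left, right):
        """Squared error between the children and their reconstruction."""
        parent = self.encode(left, right)
        recon = affine_tanh(self.w_dec, self.b_dec, parent)
        return sum((r - c) ** 2 for r, c in zip(recon, left + right))

    def embed(self, tree, vectors):
        """Recursively embed a binary tree: leaves are words, nodes are pairs."""
        if isinstance(tree, str):
            return vectors[tree]
        left, right = tree
        return self.encode(self.embed(left, vectors), self.embed(right, vectors))

# Placeholder vectors for a Dutch sentence; real ones would come from step 1.
rng = random.Random(1)
VECTORS = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in ("de", "kat", "slaapt")}

# Binary parse tree as produced by a parser (hand-written here): ((de kat) slaapt)
TREE = (("de", "kat"), "slaapt")

rae = RecursiveAutoEncoder(DIM)
sentence_vector = rae.embed(TREE, VECTORS)
error = rae.reconstruction_error(VECTORS["de"], VECTORS["kat"])
```

In a full setup the resulting sentence vector would feed a task-specific classifier, for example for sentiment analysis, and the weights would be learned by minimising the reconstruction error over many parsed sentences.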
35. Deep Learning for Computational Linguistics
DEMO
Unsupervised parsing in Dutch