NLP for videos:
Understanding
customers' feelings in
videos
Author: Albert Lewandowski
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
About me
● Big Data DevOps Engineer - GetInData
● Focused on infrastructure, cloud, Big Data, AI, scalable
web applications
● Certified Google Cloud Architect
● Certified Kubernetes Administrator
Content
● Problem to solve
● Big Data Frameworks or not?
● Cloud Magic
● How to mix technologies?
● Observability
● Lessons learnt
Problem to solve
The Problem
A large volume of videos is hard to monitor, while more and
more young users mention brands on video-based social
media.
● 60% of companies don’t convert leads into revenue
● Viewers retain 95% of a message when they watch it in a
video, compared to 10% when reading it in text
● 54% of consumers want to see more video content from a
brand or business they support
Source: Agility PR, Rick Whittinghton, Hubspot, Insivia
Solution
Cloud platform written in Golang, Python and React, with
Azure Machine Learning services and Apache Spark.
● Scalable
● Artificial Intelligence
● Efficient
~5-6 weeks for the project
● Which tools are the fastest in delivering results?
● What is crucial to meet the requirements for the PoC?
● How can we analyze language?
● What data do we need to create valuable insights?
● Can we provide a scalable platform?
Perception
Business logic
CI/CD
Idempotency
Reprocessing
Explainability
Monitoring
Testing
Serving
Infrastructure
Data Ingestion
Security
Reality
Business logic
CI/CD
Idempotency
Reprocessing
Explainability
Monitoring
Testing
Serving
Infrastructure
Data Ingestion
Security
Big Data Frameworks
or not?
Quick start
1. The Linux command line is enough as the entry point for
the project.
2. A Python script plus managed services.
3. Do not reinvent the wheel.
Big Data tools?
1. Apache Spark can be a flexible tool, especially when we
already know it and want to test at a bigger scale.
2. Writing our own app in Golang can be a wise choice for
simple actions such as gathering data from external
sources.
3. Limitations of the components
a. For example, an external SDK
Cloud Magic
Complex Analysis
Spark seems to be the right tool for complex analysis, but in
this project speed of development was more important than
building a scalable solution.
Processing the Polish language is really tough and requires
much more custom code.
Spark NLP v3 from John Snow Labs is worth checking.
Target output
● Frequency of a phrase (such as a problem with the product).
● Feelings related to it, and whether the problem is only
mentioned or is the main subject of the video.
● Each video is tagged with categories corresponding to:
type of content, feelings, keywords.
● Visualizing changes over time.
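The first and last points above can be sketched in a few lines of plain Python. This is only an illustration, not the project's code: the transcripts, the phrase list and the function name `phrase_frequency_by_month` are made up for the example, and the matching is naive substring counting rather than real NLP.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical transcripts: (publication date, speech-to-text output).
TRANSCRIPTS = [
    (date(2021, 3, 2), "The battery drains fast. Battery life is a real problem."),
    (date(2021, 3, 20), "Great camera, but the battery could be better."),
    (date(2021, 4, 5), "I love the camera and the screen."),
]


def phrase_frequency_by_month(transcripts, phrases):
    """Count case-insensitive occurrences of each phrase per (year, month)."""
    counts = defaultdict(Counter)
    for published, text in transcripts:
        lowered = text.lower()
        period = (published.year, published.month)
        for phrase in phrases:
            # Naive substring count; a real pipeline would tokenize and lemmatize.
            counts[period][phrase] += lowered.count(phrase.lower())
    return counts


freq = phrase_frequency_by_month(TRANSCRIPTS, ["battery", "camera"])
for period in sorted(freq):
    print(period, dict(freq[period]))
```

Grouping by month keeps the output ready for a simple time-series chart; a different bucket (week, quarter) only changes how `period` is built.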
Managed Services
Public cloud provides a wide range of services, but their
quality varies.
Moreover, pricing can be really high.
List of steps
1. Get the required links to videos.
2. Process each video to extract only the audio.
3. Save the audio to storage.
4. Read the audio and process it with Azure Cognitive Services
to receive text.
5. Save the output to Elasticsearch.
6. Process the output with Spark to get emotions and feelings
based on the text.
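Step 2 and the storage steps can be illustrated with a small sketch. The slides do not say which tool extracted the audio; `ffmpeg` is an assumption here, as are the helper names, the file paths and the choice of a SHA-256 hash as the document ID. The sketch only builds the command and the ID; it does not call any cloud service.

```python
import hashlib
from pathlib import Path


def audio_extraction_command(video_path, audio_dir):
    """Build an ffmpeg command that drops the video stream and writes mono 16 kHz WAV."""
    audio_path = Path(audio_dir) / (Path(video_path).stem + ".wav")
    return [
        "ffmpeg",
        "-i", str(video_path),   # input video
        "-vn",                   # no video stream in the output
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ar", "16000",          # 16 kHz sample rate
        "-ac", "1",              # single (mono) channel
        str(audio_path),
    ]


def document_id(video_url):
    """Deterministic ID, so re-running the pipeline overwrites the same document."""
    return hashlib.sha256(video_url.encode("utf-8")).hexdigest()


cmd = audio_extraction_command("videos/review.mp4", "audio")
print(" ".join(cmd))
print(document_id("https://example.com/watch?v=abc"))
```

The command list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed. Using a deterministic ID when saving the output is one simple way to keep reprocessing idempotent.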
Video To Audio To Text
● Azure Cognitive Services
○ It works pretty well with many languages
○ Speech To Text
● Custom implementation
○ It requires a lot of time
○ Required for production use cases
How to mix
technologies?
Microservices - Perfect Match
- We can easily divide the platform into services
- Data ingestion - there can be a large number of small
pieces of data
- Data processing - no need for real time, batch processing
in Spark works well
- A queue is a must
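The split between ingestion and batch processing, with a queue in between, can be sketched with the Python standard library. This is a single-process stand-in under stated assumptions: in the real platform the two sides would be separate services and the queue an external broker, and the names `ingest`, `process_in_batches` and the example URLs are invented for the illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def ingest(links, q):
    """Ingestion side: pushes many small items onto the queue."""
    for link in links:
        q.put(link)
    q.put(SENTINEL)


def process_in_batches(q, batch_size, results):
    """Processing side: drains the queue and handles items in batches."""
    batch = []
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) == batch_size:
            results.append(list(batch))
            batch.clear()
    if batch:  # flush the last, possibly smaller, batch
        results.append(list(batch))


links = [f"https://example.com/video/{i}" for i in range(5)]
q = queue.Queue(maxsize=2)  # bounded: a slow consumer applies back-pressure
batches = []
consumer = threading.Thread(target=process_in_batches, args=(q, 2, batches))
consumer.start()
ingest(links, q)
consumer.join()
print([len(b) for b in batches])
```

The bounded queue is the point of the design: ingestion can produce small items at its own pace, while processing consumes them in batches without either side knowing about the other.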
What about local setup?
● Docker Compose
○ Apps can be quickly containerized
● Cloud services
○ To mock or not to mock them?
● Remote developer instance
○ Ephemeral Kubernetes clusters might also be a good idea for your case
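If the answer to "to mock or not to mock" is "mock", a local run can replace the cloud client with a stand-in. The sketch below assumes a client object with a `transcribe(path)` method; that interface is hypothetical and is not the Azure SDK's API.

```python
from unittest.mock import Mock


def transcribe_all(client, audio_paths):
    """Send each audio file to a speech-to-text client and collect the text."""
    return {path: client.transcribe(path) for path in audio_paths}


# In a local setup the real cloud client is replaced by a mock.
fake_client = Mock()
fake_client.transcribe.side_effect = lambda path: f"transcript of {path}"

result = transcribe_all(fake_client, ["a.wav", "b.wav"])
print(result)
```

Passing the client in as an argument is what makes the swap possible: the same function runs against the managed service in the cloud and against the mock on a laptop.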
Observability
Observability
Monitoring is the process of gathering metrics about the IT
environment and running applications, and observing system
performance.
Observability is a measure of how well the internal states of
a system can be inferred from knowledge of its external outputs
(a definition taken from control theory).
Observability
Example: a data processing job written in Spark that rewrites
data from location A to B.
Gathering its metrics and setting up alerts, or creating a
dashboard with a simple runtime visualization, are quite
simple tasks. However, to achieve observability we should
also collect metrics about the amount of processed data, JVM
statistics and some metrics about the underlying
infrastructure.
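As a rough illustration of "metrics about the amount of processed data", the sketch below uses a plain Python loop as a stand-in for the Spark job and renders the counters in the Prometheus text exposition format. The function names, the `copy_job` prefix and the sample records are assumptions made for the example; JVM and infrastructure metrics are out of scope here.

```python
import time


def copy_records(source, sink):
    """Stand-in for a job that rewrites data from location A to B."""
    metrics = {"records_read": 0, "records_written": 0, "bytes_written": 0}
    started = time.monotonic()
    for record in source:
        metrics["records_read"] += 1
        if record is None:  # skip unusable records, so the read/written gap is visible
            continue
        sink.append(record)
        metrics["records_written"] += 1
        metrics["bytes_written"] += len(record.encode("utf-8"))
    metrics["duration_seconds"] = time.monotonic() - started
    return metrics


def to_prometheus_text(metrics, prefix="copy_job"):
    """Render the metrics as gauge samples in the Prometheus text exposition format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {prefix}_{name} gauge")
        lines.append(f"{prefix}_{name} {value}")
    return "\n".join(lines) + "\n"


sink = []
job_metrics = copy_records(["alpha", None, "beta"], sink)
print(to_prometheus_text(job_metrics))
```

A gap between `records_read` and `records_written` is exactly the kind of signal that a runtime-only dashboard would miss.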
Quick and simple setup
● Prometheus: metrics
● Loki with Promtail: log analytics
What about alerts?
Alerts signify that a human needs to take action immediately
in response to something that is either happening or about to
happen, in order to improve the situation.
What to monitor?
● Errors
● Quality and quantity
● Data scraping
● Self-managed compute resources
● Managed compute resources
● Performance of NLP pipelines
● Logs monitoring
Lessons learnt
Keep It Simple
● For a PoC, go with the simplest possible solution.
● Cloud services are always worth checking.
● Mixing technologies is a good idea if we already have the
know-how within the team.
Corner cases, corner cases
● Remember the corner cases
○ Processing a greater number of events
○ Ability to scale environments up and down
○ Limitations or downtime of any external services
○ Data reprocessing
● CI/CD is always your friend
○ Unit and integration tests are a must-have
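One common way to handle limitations or downtime of an external service is to retry with exponential backoff. The sketch below is a generic illustration, not code from the project: `call_with_retry`, the delay values and the flaky service are all hypothetical, and the `sleep` parameter exists only so that the example (and a unit test) can run without waiting.

```python
import time


def call_with_retry(func, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**n and retry, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:  # broad on purpose for the sketch; narrow it in real code
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


# Hypothetical flaky service: fails twice, then succeeds.
calls = {"count": 0}


def flaky_service():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("service unavailable")
    return "ok"


delays = []
outcome = call_with_retry(flaky_service, sleep=delays.append)
print(outcome, delays)
```

Injecting `sleep` also makes the helper easy to cover with the unit tests mentioned above, since the recorded delays can be asserted directly.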
Q&A
Contact details
albert.lewandowski@getindata.com
LinkedIn:
https://www.linkedin.com/in/albert-lewandowski
Join Us!
● Data Engineer: Spark, Snowflake, Airflow, AWS
● Data Scientist: Python, SQL, Data Science
● MLOps Engineer: MLOps tools, Python, public cloud
● Data Engineer (GCP): GCP, Spark, BigQuery
Thank you for your
attention!

 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 

NLP for videos: Understanding customers' feelings in videos - Albert Lewandowski, GetInData

  • 1. NLP for videos: Understanding customers' feelings in videos Author: Albert Lewandowski
  • 2. About me
      ● Big Data DevOps Engineer - GetInData
      ● Focused on infrastructure, cloud, Big Data, AI, scalable web applications
      ● Certified Google Cloud Architect
      ● Certified Kubernetes Administrator
  • 3. Content
      ● Problem to solve
      ● Big Data Frameworks or not?
      ● Cloud Magic
      ● How to mix technologies?
      ● Observability
      ● Lessons learnt
  • 5. The Problem
      A large volume of video is hard to monitor, while more and more young users mention brands on video-based social media.
      ● 60% of companies don't convert leads into revenue
      ● Viewers retain 95% of a message when they watch it in a video, compared to 10% when reading it in text
      ● 54% of consumers want to see more video content from a brand or business they support
      Source: Agility PR, Rick Whittinghton, Hubspot, Insivia
  • 6. Solution
      Cloud platform written in Golang, Python and React, with Azure Machine Learning services and Apache Spark.
      ● Scalable
      ● Artificial Intelligence
      ● Efficient
  • 7. ~5-6 weeks for the project
      ● Which tools deliver results the fastest?
      ● What is crucial to meet the requirements for the PoC?
      ● How can we analyze language?
      ● What data do we need to create valuable insights?
      ● Can we provide a scalable platform?
  • 8. Perception: Business logic, CI/CD, Idempotency, Reprocessing, Explainability, Monitoring, Testing, Serving, Infrastructure, Data Ingestion, Security
  • 9. Reality: Business logic, CI/CD, Idempotency, Reprocessing, Explainability, Monitoring, Testing, Serving, Infrastructure, Data Ingestion, Security
  • 11. Quick start
      1. The Linux command line is enough as the entrypoint for the project.
      2. Python script and managed services.
      3. Do not reinvent the wheel.
  • 12. Big Data tools?
      1. Apache Spark can be a flexible tool, especially when we already know it and want to test it at a bigger scale.
      2. Writing our own app in Golang can be a wise choice for simple actions like gathering data from external sources.
      3. Limitations of the components
          a. Like an external SDK
  • 15. Complex Analysis
      Spark seems to be the right solution for it, but speed of development was more important than creating a scalable solution. Processing the Polish language is really tough and requires much more code development. Spark NLP v3 from John Snow Labs is worth checking.
  • 16. Target output
      ● Frequency of a phrase (such as a problem with the product).
      ● Feelings related to it, and whether the problem is only mentioned or is the main topic.
      ● Each video is tagged with categories corresponding to: type of content, feelings, keywords.
      ● Visualizing changes depending on the time period.
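The first target output, phrase frequency, can be illustrated with a minimal sketch. It assumes transcripts are already available as plain strings; the function name `phrase_frequency` and the sample phrases are hypothetical, and this is not the Spark-based analysis the deck describes.

```python
import re
from collections import Counter
from typing import Iterable, List


def phrase_frequency(transcripts: Iterable[str], phrases: List[str]) -> Counter:
    """Count case-insensitive, whole-word occurrences of each phrase."""
    counts: Counter = Counter()
    for text in transcripts:
        lowered = text.lower()
        for phrase in phrases:
            # \b keeps "battery" from matching inside a longer word.
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            counts[phrase] += len(re.findall(pattern, lowered))
    return counts


if __name__ == "__main__":
    sample = ["The battery died. Battery life is bad.", "I love the screen"]
    print(phrase_frequency(sample, ["battery", "battery life", "screen"]))
```

Exact matching like this ignores inflection, which matters a lot for Polish; a real pipeline would likely lemmatize the text first.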
  • 17. Managed Services
      Public cloud providers offer a wide range of services, but their quality may differ. Moreover, pricing can be really high.
  • 18. List of steps
      1. Get the required links to videos.
      2. Process each video to extract only the audio.
      3. Save the audio to storage.
      4. Fetch the audio and process it with Azure Cognitive Services to receive text.
      5. Save the output to Elasticsearch.
      6. Process the output with Spark to get emotions and feelings based on the text.
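The steps above can be sketched as a small orchestration skeleton. Everything here is an illustrative assumption: the `Pipeline` class and its field names are hypothetical, the external services (audio extraction, speech-to-text, sentiment analysis) are injected as plain callables rather than real SDK calls, and in-memory dictionaries stand in for blob storage and Elasticsearch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Pipeline:
    extract_audio: Callable[[str], bytes]        # step 2: video URL -> audio bytes
    transcribe: Callable[[bytes], str]           # step 4: audio bytes -> text
    analyze: Callable[[str], Dict[str, float]]   # step 6: text -> feelings scores
    audio_store: Dict[str, bytes] = field(default_factory=dict)  # step 3 stand-in
    index: Dict[str, dict] = field(default_factory=dict)         # step 5 stand-in

    def run(self, video_urls: List[str]) -> None:
        for url in video_urls:                            # step 1: links are given
            self.audio_store[url] = self.extract_audio(url)
            text = self.transcribe(self.audio_store[url])
            self.index[url] = {"text": text}
            self.index[url]["feelings"] = self.analyze(text)


if __name__ == "__main__":
    demo = Pipeline(
        extract_audio=lambda url: url.encode(),           # fake extractor
        transcribe=lambda audio: audio.decode().upper(),  # fake speech-to-text
        analyze=lambda text: {"positive": 1.0},           # fake sentiment model
    )
    demo.run(["video-a", "video-b"])
    print(demo.index)
```

Injecting the services keeps each step replaceable, so a managed service can later be swapped for a custom implementation without touching the orchestration.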
  • 19. Video To Audio To Text
      ● Azure Cognitive Services
          ○ It works pretty well with many languages
          ○ Speech To Text
      ● Custom implementation
          ○ It requires a lot of time
          ○ Required for production use cases
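For the video-to-audio part, one common approach is to call the `ffmpeg` command-line tool. The deck does not say which tool was used, so treat this as an assumption; the helper name `ffmpeg_audio_command` and the output directory are hypothetical. The sketch only builds the argument list (16 kHz, mono, 16-bit PCM WAV, a format widely accepted by speech-to-text services) and leaves execution to the caller.

```python
from pathlib import Path
from typing import List


def ffmpeg_audio_command(video_path: str, audio_dir: str = "audio") -> List[str]:
    """Build an ffmpeg argument list that extracts a mono 16 kHz WAV track."""
    out = Path(audio_dir) / (Path(video_path).stem + ".wav")
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", video_path,        # input video
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ac", "1",              # one channel (mono)
        "-ar", "16000",          # 16 kHz sample rate
        str(out),
    ]


if __name__ == "__main__":
    # To actually run it: subprocess.run(cmd, check=True), with ffmpeg installed.
    print(ffmpeg_audio_command("clips/review.mp4"))
```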
  • 21. Microservices - Perfect Match
      - We can easily divide the platform
      - Data Ingestion: there can be a large number of small pieces of data
      - Data processing: no need for real time, batch processing in Spark works well
      - A queue is a must
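The role of the queue between ingestion and batch processing can be shown with a minimal in-process sketch. It uses Python's standard `queue` module as a stand-in for a real message broker, which is an assumption; the function names are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Optional


def ingest(items: List[int], q: "queue.Queue[Optional[int]]") -> None:
    """Producer: push many small pieces of data, then a sentinel."""
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: no more data


def process_in_batches(
    q: "queue.Queue[Optional[int]]",
    batch_size: int,
    handle_batch: Callable[[List[int]], None],
) -> None:
    """Consumer: group queued items into batches before processing."""
    batch: List[int] = []
    while True:
        item = q.get()
        if item is None:
            break
        batch.append(item)
        if len(batch) == batch_size:
            handle_batch(batch)
            batch = []  # start a fresh list; the handled batch is left untouched
    if batch:
        handle_batch(batch)  # flush the final partial batch


def run_demo(items: List[int], batch_size: int = 3) -> List[List[int]]:
    q: "queue.Queue[Optional[int]]" = queue.Queue()
    batches: List[List[int]] = []
    producer = threading.Thread(target=ingest, args=(items, q))
    producer.start()
    process_in_batches(q, batch_size, batches.append)
    producer.join()
    return batches


if __name__ == "__main__":
    print(run_demo(list(range(7))))
```

The queue decouples the two sides: ingestion can produce items at its own pace, while processing picks them up in batches when it is ready.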
  • 22. What about local setup?
      ● Docker Compose
          ○ Apps can be quickly containerized
      ● Cloud services
          ○ To mock or not to mock them?
      ● Remote developer instance
          ○ Ephemeral Kubernetes clusters might be a good idea also for your case
  • 24. Observability
  • 25. Observability
      Monitoring describes the process of gathering metrics about the IT environment and running applications, and observing system performance.
      Observability is about measuring how well the internal states of a system can be inferred from knowledge of its external outputs (according to control theory).
  • 26. Observability
      Example: a data processing job written in Spark that rewrites data from location A to B. Gathering its metrics and setting up alerts, or creating a dashboard with a simple runtime visualization, are quite simple tasks. However, to achieve observability we should also collect metrics about the amount of processed data, JVM statistics and some metrics about the underlying infrastructure.
  • 27. Quick and simple setup
      ● Prometheus: metrics
      ● Loki with Promtail: log analytics
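As a rough illustration of the kind of job-level metrics mentioned for the A-to-B copy job, the sketch below renders values in the Prometheus text exposition format using only the standard library. The metric names, the `job` label value and the helper `render_metrics` are hypothetical; a real service would more likely use an official Prometheus client library.

```python
from typing import Dict, Tuple, Union

Number = Union[int, float]


def render_metrics(job: str, metrics: Dict[str, Tuple[str, str, Number]]) -> str:
    """Render {name: (type, help, value)} as Prometheus exposition text."""
    lines = []
    for name, (metric_type, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f'{name}{{job="{job}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics("copy_a_to_b", {
        # Amount of processed data (hypothetical metric name).
        "records_processed_total": ("counter", "Records copied from A to B", 1000),
        # One JVM statistic (hypothetical metric name).
        "jvm_heap_used_bytes": ("gauge", "JVM heap currently in use", 52428800),
    }))
```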
  • 28. What about alerts?
      Alerts signify that a human needs to take action immediately in response to something that is either happening or about to happen, in order to improve the situation.
  • 29. What to monitor?
      ● Errors
      ● Quality and quantity
      ● Data scraping
      ● Self-managed Compute Resources
      ● Managed Compute Resources
      ● Performance of NLP pipelines
      ● Logs monitoring
  • 31. Keep It Simple
      ● For a PoC, go with the simplest possible solution.
      ● Cloud services are always worth checking.
      ● Mixing technologies is a good idea if we already have the know-how within the team.
  • 32. Corner cases, corner cases
      ● Remember about corner cases
          ○ Processing a greater number of events
          ○ Possibility to scale environments up and down
          ○ Limitations or downtime of any external services
          ○ Data reprocessing
      ● CI/CD is always your friend
          ○ Unit and integration tests are a must-have
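Two of these corner cases, downtime of external services and data reprocessing, can be sketched briefly. The retry helper below uses exponential backoff, and the upsert helper derives a deterministic document ID so that reprocessing the same video overwrites its record instead of duplicating it. Both are illustrative assumptions with hypothetical names, and a plain dictionary stands in for the real index.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise ValueError("attempts must be at least 1")


def document_id(video_url: str) -> str:
    """Deterministic ID: the same video always maps to the same document."""
    return hashlib.sha256(video_url.encode("utf-8")).hexdigest()


def upsert(index: Dict[str, dict], video_url: str, payload: dict) -> None:
    """Idempotent write: reprocessing replaces the earlier record."""
    index[document_id(video_url)] = payload
```

Passing `sleep` as a parameter lets unit tests run without real delays, which fits the advice to keep unit and integration tests in CI/CD.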
  • 33. Q&A
  • 34. Contact details
      albert.lewandowski@getindata.com
      LinkedIn: https://www.linkedin.com/in/albert-lewandowski
  • 35. Join Us!
      ● Data Engineer: Spark, Snowflake, Airflow, AWS
      ● Data Scientist: Python, SQL, Data Science
      ● MLOps Engineer: MLOps tools, Python, public cloud
      ● Data Engineer (GCP): GCP, Spark, BigQuery
  • 36. Thank you for your attention!