RTL collects data from various sources, such as click data, account data, and campaign data. Its data science team uses this data for tasks like churn modeling, response modeling, and customer segmentation, employing techniques such as text mining, computer vision, and association rule mining. For text mining of movie plots, they create a term document matrix and calculate cosine similarity to find similar movies. They also use pre-trained models like VGG16 and ResNet with Keras for tasks such as content tagging, feature extraction, and measuring image similarity. Survival curves are used to analyze at which point in an episode or series people stop watching.
1. Data science @ RTL Nederland
Longhow Lam
@longhowlam
1st Master Search Advanced Analytics meetup
5 July 2017
2. Agenda
RTL Data science set up
Some data science topics @RTL
Text mining
Computer vision
Association rules
Power BI
3. RTL Data science set up
Source Data
Click data
Heartbeat data
Account data
Location
Metadata
Campaign data
Etc.
Data science team
4 data engineers
4 data scientists
The main businesses at RTL we work for
ETL processes
Find out for yourself:
https://www.rtl.nl/werkenbij/
Use cases
Churn modeling
Response modeling
Customer segmentation
Look-alikes for advertisers
Recommendation engines
4. Content similarity
Which movies on Videoland are close to each other?
Which news articles on RTL Nieuws are close to each other?
For movies, we can look at the movie summaries or video captures
For news articles, we can look at the text of the articles or the corresponding news images
Hence text mining and computer vision
6. Text mining
2000 movie plots / summaries on VideoLand
For each movie plot: count the words / terms
Put the counts in a so-called term document matrix
There are around 50,000 terms in the 2000 movie plots
Usually this matrix is very sparse
[Figure: term document matrix with one row per movie (Film1, Film2, …, Film2000) and one column per term (Aap "monkey", Film, Auto "car", …, Leven "life", …, Zwaar "heavy"); most cells are empty, i.e. zero]
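A minimal sketch of how such a sparse term document matrix could be built in plain Python, assuming simple lower-casing and splitting on letters (the deck does not say which tokenizer or stop-word list was used). The toy plots are hypothetical, not Videoland data, and `tokenize` and `term_document_matrix` are illustrative names.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case a plot and split it into words (letters only, no digits)."""
    return re.findall(r"[^\W\d_]+", text.lower())

def term_document_matrix(plots):
    """Return (rows, vocabulary) for a list of movie plots.

    Each row is a Counter mapping term -> count for one plot, so only the
    non-zero cells of the very sparse matrix are stored.
    """
    rows = [Counter(tokenize(plot)) for plot in plots]
    vocabulary = sorted(set().union(*rows))
    return rows, vocabulary

# Hypothetical toy plots (not actual Videoland data).
plots = [
    "Een spannend leven vol actie",
    "Het leven van een aap",
    "Spannend, spannend en nog eens spannend",
]
rows, vocabulary = term_document_matrix(plots)
print(len(vocabulary), "terms")
print(rows[2]["spannend"])
```

Storing each row as a `Counter` mirrors the sparsity point above: with 2000 plots and roughly 50,000 terms, almost all cells are zero, and a missing term simply reads back as 0.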
7. Similarity: cosine similarity
Between movies (so between rows of the matrix) we can now calculate similarities
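A similarity measure that is often used is cosine similarity
Visually we can see this measure in the following figure.
Suppose we only have two terms:
1. Leven (life)
2. Spannend (exciting)
[Figure: Film 1 and Film 2 drawn as vectors on the axes # Leven and # Spannend; the cosine of the angle between the two vectors is their cosine similarity]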
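Cosine similarity is cos(a, b) = (a · b) / (|a| |b|); for non-negative word counts it lies between 0 (no shared terms) and 1 (same direction). The sketch below computes it on sparse `Counter` rows like the ones in a term document matrix and ranks the closest movies. The counts are made up for the two-term example, and `most_similar` is an illustrative helper, not RTL's code.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """cos(a, b) = a.b / (|a| |b|) for two sparse term-count rows."""
    # Only terms that occur in both rows contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty plot is treated as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(rows, index, top_n=10):
    """(similarity, row index) pairs of the top_n rows closest to rows[index]."""
    scores = [(cosine_similarity(rows[index], row), j)
              for j, row in enumerate(rows) if j != index]
    return sorted(scores, reverse=True)[:top_n]

# Two-term example with made-up counts.
film1 = Counter({"leven": 3, "spannend": 4})
film2 = Counter({"leven": 4, "spannend": 3})
film3 = Counter({"aap": 2})
print(cosine_similarity(film1, film2))
print(most_similar([film1, film2, film3], 0, top_n=1))
```

Because only the angle matters, a long plot and a short plot with the same word proportions get similarity 1, which is why cosine similarity is preferred over raw Euclidean distance on word counts.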
8. VideoLand
To get a feeling for movie similarities, we created a small Shiny app
11. Two approaches used @RTL
Computer Vision API from vendors (Microsoft, Clarifai, Google,…)
+ Ready and easy to use (just send your image to them)
+ Not too expensive ($0.84 / 1000 images)
- No control over what is returned
Tweak things ourselves with Keras/TensorFlow
+ More control over what you are doing
- Takes more effort to set up
- Needs more knowledge
12. RTL Nieuws image: API examples
Feature Name | Value
Description | { "type": 0, "captions": [ { "text": "a group of people sitting on a table", "confidence": 0.4894670976127814 } ] }
Tags | [ { "name": "person", "confidence": 0.996391236782074 }, { "name": "indoor", "confidence": 0.9104063510894775 }, { "name": "people", "confidence": 0.7057779431343079 } ]
Image Format | Jpeg
Image Dimensions | 4096 x 3078
Clip Art Type | 0 Non-clipart
Line Drawing Type | 0 Non-LineDrawing
Black & White Image | False
Is Adult Content | False
Adult Score | 0.042066238820552826
Is Racy Content | False
Racy Score | 0.061784882098436356
Categories | [ { "name": "people_many", "score": 0.9296875 } ]
Faces | [ { "age": 52, "gender": "Male", "faceRectangle": { "width": 298, "height": 298, "left": 433, "top": 1370 } }, { "age": 78, "gender": "Male", "faceRectangle": { "width": 269, "height": 269, "left": 3212, "top": 1410 } }, { "age": 64, "gender": "Male", "faceRectangle": { "width": 241, "height": 241, "left": 2108, "top": 1534 } } ]
13. RTL Nieuws image: API examples
Feature Name | Value
Description | { "type": 0, "captions": [ { "text": "Linda de Mol talking on a cell phone", "confidence": 0.46178352459016536 } ] }
Tags | [ { "name": "person", "confidence": 0.9999904632568359 }, { "name": "outdoor", "confidence": 0.9974232912063599 }, { "name": "woman", "confidence": 0.9967917799949646 }, { "name": "lady", "confidence": 0.7658315896987915 } ]
Image Format | Jpeg
Image Dimensions | 1024 x 421
Clip Art Type | 0 Non-clipart
Line Drawing Type | 0 Non-LineDrawing
Black & White Image | False
Is Adult Content | False
Adult Score | 0.009753250516951084
Is Racy Content | False
Racy Score | 0.014254707843065262
Categories | [ { "name": "people_portrait", "score": 0.96875 } ]
Faces | [ { "age": 28, "gender": "Female", "faceRectangle": { "width": 282, "height": 282, "left": 286, "top": 35 } } ]
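For content tagging, the returned JSON still has to be turned into usable tags. A minimal sketch, assuming you already have the Tags field as a JSON string (how the vendor API is called is not shown in the deck): it keeps only tags above a confidence threshold. The threshold of 0.9 and the name `confident_tags` are hypothetical choices for illustration.

```python
import json

# Tags field copied from the "Linda de Mol" example response above.
TAGS_JSON = """[ { "name": "person", "confidence": 0.9999904632568359 },
  { "name": "outdoor", "confidence": 0.9974232912063599 },
  { "name": "woman", "confidence": 0.9967917799949646 },
  { "name": "lady", "confidence": 0.7658315896987915 } ]"""

def confident_tags(tags_json, threshold=0.9):
    """Return the tag names whose confidence is at least `threshold`."""
    tags = json.loads(tags_json)
    return [tag["name"] for tag in tags if tag["confidence"] >= threshold]

print(confident_tags(TAGS_JSON))
```

This also illustrates the "no control over what is returned" trade-off: you can filter the vendor's tags afterwards, but you cannot change which tags it produces.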
15. Tweak things ourselves with Keras
Keras is a high-level neural networks API running on top of either TensorFlow or Theano, and now also CNTK.
Developed for fast experimentation.
Easier to use than TensorFlow, but you still have lots of options.
There is now also an R interface (of course created by RStudio…)
16. Keras: Simple set-up “Architecture”
TensorFlow installed on a (Linux) machine
Ideally with lots of GPUs
pip install keras
You’re good to go in Python (Jupyter notebooks)
devtools::install_github("rstudio/keras")
You’re good to go in R / RStudio
17. Example in R: Neural network with two hidden layers
[Figure: fully connected network with 784 input nodes (Pixel 1 … Pixel 784), two hidden layers, and 10 output nodes (Label 0 … Label 9)]
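The R code itself is not part of this transcript, so here is a plain-Python sketch of the same architecture, forward pass only, to show what the diagram computes: 784 pixel inputs, two ReLU hidden layers, and a softmax over the 10 labels. The hidden layer sizes (256 and 128) are an assumption, the weights are random and untrained, and in practice Keras handles both the layers and the training.

```python
import math
import random

LAYER_SIZES = [784, 256, 128, 10]  # 28x28 pixels in, 10 labels out; hidden sizes assumed

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b)."""
    return activation([sum(w * x for w, x in zip(row, inputs)) + b
                       for row, b in zip(weights, biases)])

def build_network(sizes, seed=0):
    """Random (untrained) weights and zero biases for each consecutive layer pair."""
    rng = random.Random(seed)
    return [([[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def count_params(sizes):
    """Weights plus biases: n_in * n_out + n_out per dense layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

def predict(network, pixels):
    """Forward pass: ReLU hidden layers, softmax over the 10 labels."""
    h = pixels
    for i, (weights, biases) in enumerate(network):
        activation = softmax if i == len(network) - 1 else relu
        h = dense(h, weights, biases, activation)
    return h

network = build_network(LAYER_SIZES)
probabilities = predict(network, [0.0] * 784)  # a blank image
print(count_params(LAYER_SIZES), len(probabilities))
```

With these assumed sizes the network has 235,146 weights and biases, which puts the 143 million weights of a large pre-trained model like VGG19 into perspective.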
18. Using pre-trained models
Image classifiers have been trained on big GPU machines for weeks, with millions of pictures, on very large networks.
Not many people do that from scratch. Instead, one can use pre-trained networks and start from there.
VGG19 deep learning model: 143 million weights!!!
23. RTL Nieuws image similarity
1024 RTL Nieuws sample pictures. For each image, compute the 25,088 feature values.
For each image, calculate the top 10 closest images, based on cosine similarity.
Little Shiny app
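Where the 25,088 feature values per image come from can be checked with a little arithmetic. This assumes (the slides do not state it) that images are resized to the standard 224 × 224 VGG input and that features are taken from the output of the last convolutional block, before the dense layers. VGG16 and VGG19 both have five max-pooling layers that each halve the width and height, and their last block has 512 filters:

```latex
\frac{224}{2^{5}} = 7, \qquad 7 \times 7 \times 512 = 25\,088
```

Flattening that 7 × 7 × 512 output gives one 25,088-long vector per image, on which cosine similarity can then be computed.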
27. Take five Brad Pitt pictures
Run them through the pre-trained VGG16 and extract feature vectors.
This is a 5 by 25,088 matrix
The Brad Pitt Index
Take other images and run them through VGG16
Calculate the distances to the five Brad Pitt pictures and average:
[Figure: example images, each labelled with its average distance to the five Brad Pitt pictures; values range from about 0.67 to 0.97]
28. [Figure: the same comparison on face crops; values range from about 0.59 to 0.97]
Focusing on only the face!!
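The averaging step of the Brad Pitt Index can be sketched as follows, assuming that "distance" means cosine distance (1 minus cosine similarity); the deck does not name the distance measure. Toy two-dimensional vectors stand in for the real 25,088-long VGG16 feature vectors, and `brad_pitt_index` is an illustrative name.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two dense feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def brad_pitt_index(reference_vectors, candidate):
    """Average distance from a candidate's feature vector to each reference
    vector (the rows of the 5 x 25,088 matrix); lower means more alike."""
    distances = [cosine_distance(ref, candidate) for ref in reference_vectors]
    return sum(distances) / len(distances)

# Toy 2-dimensional "feature vectors" instead of real 25,088-long ones.
references = [[1.0, 0.0], [0.0, 1.0]]
print(brad_pitt_index(references, [1.0, 0.0]))
print(brad_pitt_index(references, [1.0, 1.0]))
```

Averaging over several reference pictures makes the index less sensitive to the pose, lighting, or background of any single photo.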
29. Can you shake hands with your neighbor?
A little Statistical Experiment
30. Can you shake hands with your neighbor?
A little Statistical Experiment
50.1% of people don’t wash their hands after visiting the toilet
34. Association Rules Mining
Market basket analysis
Association rules mining (arm)
Mixture of different methods
Ensemble
ARM is one of several so-called collaborative filtering algorithms
Collaborative filtering is a method of making recommendations about the interests of one user (filtering) by collecting preferences or behavior from many users (collaborating).
Memory-based algorithms
Slope one (slope1)
K nearest neighbors (knn)
Model-based algorithms
Matrix factorization methods
35. Association rule mining
The basics
Identify frequent item sets (or rules) in the customer transaction data:
IF item X THEN item Y
IF items A and B THEN very likely item C
Not all rules are interesting; use ‘support’ and ‘lift’ to judge the importance of a rule
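$$\text{Support}(X,Y) = \frac{\#\,\text{trxs containing } \{X\} \text{ and } \{Y\}}{\text{Total } \#\,\text{trxs}}$$
$$\text{Lift}(X,Y) = \frac{\text{Support}(X,Y)}{\text{Support}(X) \cdot \text{Support}(Y)}$$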
Support & Lift
GTST Nieuwe Tijden 10.8%
Star trek GTST 0.018%
For example, a lift of 2.5 means:
if people have watched movie X, then they are 2.5 times as likely to watch movie Y as viewers in general
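The support and lift formulas translate directly into code. The sketch below uses hypothetical viewing histories (not RTL data) and brute-forces all item pairs, which is fine for a toy example. Real rule miners use smarter candidate generation such as the Apriori algorithm (for example in R's arules package) and also handle rules with more than one item on the left-hand side.

```python
from itertools import combinations

def support(transactions, items):
    """Fraction of transactions (viewing histories) containing all of `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def lift(transactions, x, y):
    """Lift(X, Y) = Support(X, Y) / (Support(X) * Support(Y))."""
    return support(transactions, {x, y}) / (
        support(transactions, {x}) * support(transactions, {y}))

def pair_rules(transactions, min_support=0.1):
    """Brute-force all item pairs; keep those with enough support."""
    items = sorted(set().union(*map(set, transactions)))
    rules = []
    for x, y in combinations(items, 2):
        s = support(transactions, {x, y})
        if s >= min_support:
            rules.append((x, y, s, lift(transactions, x, y)))
    return rules

# Hypothetical viewing histories, one list of titles per user.
histories = [
    ["GTST", "Nieuwe Tijden"],
    ["GTST", "Nieuwe Tijden", "Star Trek"],
    ["GTST"],
    ["Star Trek"],
]
for rule in pair_rules(histories, min_support=0.3):
    print(rule)
```

Lift is symmetric in X and Y, so one pair covers both X → Y and Y → X; a lift above 1 means the two titles are watched together more often than chance, below 1 less often.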
36. Association rules virtual items
User Movie
1 Blacklist
1 Startrek
1 James bond
2 Kill Bill
2 Pulp fiction
3 Stargate
3 Men in Black
An old trick with association rules mining is to add ‘virtual’ items
User Item (including virtual items)
1 Blacklist
1 Startrek
1 James bond
1 Male
1 [25-30) Y
2 Kill Bill
2 Pulp fiction
2 Female
2 [40-45) Y
3 Stargate
3 Men in Black
3 Male
3 [50-55) Y
Rules that now might appear are, for example:
Male, [40-45), Startrek → James Bond
Female, [20-25), Kill Bill → Pulp Fiction
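Adding virtual items is just a join of user attributes onto each user's basket before mining. A minimal sketch using the users and attributes from the table above; the Male and [50-55) rows are assumed to belong to user 3, and `add_virtual_items` is an illustrative name.

```python
def add_virtual_items(baskets, profiles):
    """Append each user's profile attributes to the user's basket as extra items."""
    return {user: list(items) + list(profiles.get(user, []))
            for user, items in baskets.items()}

baskets = {
    1: ["Blacklist", "Startrek", "James bond"],
    2: ["Kill Bill", "Pulp fiction"],
    3: ["Stargate", "Men in Black"],
}
profiles = {
    1: ["Male", "[25-30) Y"],
    2: ["Female", "[40-45) Y"],
    3: ["Male", "[50-55) Y"],
}
augmented = add_virtual_items(baskets, profiles)
print(augmented[2])
```

The augmented baskets can then go into any association rules implementation unchanged; because gender and age band are now ordinary items, they can appear on the left-hand side of a rule.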
39. Survival curve
At which moment in an episode do people stop watching?
Can we compare different episodes and series?
Survival Curves!!
For a specific episode of a specific series:
Take all Videoland streams (starts / stops)
Determine the completion rate of each stream, and rank all streams on completion rate
Calculate the empirical distribution F
Survival: S = 1 - F
Do this for all episodes and series
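The recipe above can be sketched in a few lines, assuming each stream's completion rate is already computed as the position where it stopped divided by the episode length (the deck does not spell this out). F(t) is the share of streams with completion rate at most t, and S(t) = 1 - F(t) is the share still watching beyond t. The rates and the evaluation grid are hypothetical.

```python
from bisect import bisect_right

def survival_curve(completion_rates, grid):
    """Empirical survival S(t) = 1 - F(t) at each point t of `grid`."""
    ranked = sorted(completion_rates)  # rank streams on completion rate
    n = len(ranked)
    # bisect_right counts the streams with completion rate <= t, i.e. n * F(t).
    return [1.0 - bisect_right(ranked, t) / n for t in grid]

# Hypothetical completion rates of four streams of one episode.
rates = [0.1, 0.5, 0.5, 1.0]
grid = [0.0, 0.1, 0.5, 1.0]
print(survival_curve(rates, grid))
```

Evaluating every episode on the same grid of completion rates (0 to 1) is what makes curves of episodes with different lengths, and of different series, directly comparable.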