Meetup Link: https://www.meetup.com/Cognitive-Computing-Enthusiasts/events/250444108/
Recording Link: https://www.youtube.com/watch?v=4uXg1KTXdQc
When developing a machine learning system, the possibilities are limitless. However, with the recent explosion of Big Data and AI, there are more options than ever to sift through: which technologies to select, which model topologies to build, and which infrastructure to use for deployment, to name just a few. We explored these options, and their many roadblocks, while building a faceted refinement system for a video content library of 100K+ videos. Our three primary areas of focus were natural language processing, video frame sampling, and infrastructure deployment.
Breaking Through The Challenges of Scalable Deep Learning for Video Analytics
1. Breaking Through the Challenges of
Scalable Deep Learning for
Video Analytics
Steven Flores, sflores@compthree.com
Luke Hosking, lhosking@compthree.com
2. Use cases
A customer is anyone with a large amount of unannotated video whose content they
want annotated and indexed into a searchable database. For example,
● Media: video library going back decades.
● Research institutions: video from a lecture series.
● Management and HR: conference/meeting recordings.
3. What info do we want from video?
● What and who is in the video?
● What happens in the video?
● What is the video about?
(Example here: https://www.youtube.com/watch?v=X3a-ZX6ObJU)
4. Information from audio
● Topic modeling speech transcripts.
● Sentiment analysis of speech transcripts.
● Hot language and/or loud sounds heat map.
● Keywords (named entities) from transcripts.
(Slide figure: the transcript excerpt "The Federal Reserve is widely expected to increase interest rates again Wednesday..." is classified among example topics: Politics and policy, Sports, Science and Technology.)
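The kind of keyword-driven topic assignment illustrated above can be sketched with a simple keyword-overlap score. This is a toy illustration, not the talk's system; the topic keyword lists are invented:

```python
# Toy topic assignment by keyword overlap.
# The topic keyword lists below are illustrative, not from the talk.
TOPIC_KEYWORDS = {
    "Politics and policy": {"federal", "reserve", "interest", "rates", "senate"},
    "Sports": {"game", "season", "score", "team"},
    "Science and Technology": {"research", "software", "quantum", "ai"},
}

def assign_topic(transcript: str) -> str:
    """Pick the topic whose keyword set overlaps the transcript the most."""
    words = set(transcript.lower().replace(",", " ").replace(".", " ").split())
    scores = {topic: len(words & kw) for topic, kw in TOPIC_KEYWORDS.items()}
    return max(scores, key=scores.get)

print(assign_topic(
    "The Federal Reserve is widely expected to increase interest rates again Wednesday"
))
# → Politics and policy
```

Real systems would use a trained topic model over the transcripts, but the intuition is the same: keywords in the transcript support a topic assignment.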
5. Using keywords to extract info
Within transcripts, keywords such as people, locations, organizations, and
geo-political entities carry much of the latent information we seek from a video.
For example, a video transcript containing the excerpt
...probably confirm the North Korean side in its willingness…
should appear if we search for the term “North Korea.” Also, the presence of
this term, along with other keywords, may support a topic assignment.
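The retrieval behavior described above can be sketched with a normalized substring match (a hypothetical helper for illustration, not the production search index):

```python
def matches_query(transcript: str, query: str) -> bool:
    # Normalize case so the query "North Korea" matches "North Korean"
    # appearing in the transcript.
    return query.lower() in transcript.lower()

excerpt = "...probably confirm the North Korean side in its willingness..."
print(matches_query(excerpt, "North Korea"))  # → True
```

A production system would index extracted keywords rather than scan raw text, but the contract is the same: a video whose transcript mentions "North Korean" should surface for the query "North Korea."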
6. Keyword extraction
Keyword extraction can be a difficult problem. Free extractors come with their
own rigid taxonomies and may not be production quality.
For example, with the Python Natural Language Toolkit (NLTK)...
...probably confirm the North Korean side in its willingness…
(Slide figure: NLTK labels "North Korean" with tags from its fixed taxonomy, e.g. geo-socio-political group / geo-political entity.)
7. Using a human-curated whitelist
We maintain a “whitelist” of extracted keywords. This solves two problems:
● Quality control supervision of proposed keywords.
● Better custom keyword taxonomies are assigned to keywords on the list.
NLTK finds “North Korean” in the text, and we find it in the whitelist with its tag “Ethnicity”:
...probably confirm the North Korean side in its willingness…
But we have two more problems:
● Human supervision is time-consuming (prohibitively so with a large list).
● This doesn’t solve the case of a keyword phrase incorrectly split by NLTK.
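The whitelist lookup can be sketched as a longest-match scan over the transcript; a minimal sketch, where the whitelist contents and taxonomy tags are illustrative placeholders:

```python
# Hypothetical whitelist mapping curated keywords to custom taxonomy tags.
WHITELIST = {
    "north korean": "Ethnicity",
    "north korea": "Geo-political entity",
    "federal reserve": "Organization",
}

def tag_keywords(text, whitelist=WHITELIST, max_ngram=3):
    """Scan `text` for whitelist phrases, preferring the longest match."""
    tokens = text.lower().split()
    found = []
    i = 0
    while i < len(tokens):
        for n in range(max_ngram, 0, -1):          # try longest phrases first
            phrase = " ".join(tokens[i:i + n])
            if phrase in whitelist:
                found.append((phrase, whitelist[phrase]))
                i += n                             # skip past the matched phrase
                break
        else:
            i += 1
    return found

print(tag_keywords("probably confirm the North Korean side in its willingness"))
# prints: [('north korean', 'Ethnicity')]
```

Because the scan prefers the longest match, a curated multi-word phrase wins even when an off-the-shelf extractor would split it.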
8. Building a custom keyword extractor
The article Natural Language Processing (almost) from Scratch (R. Collobert et
al. 2011) introduces the “senna” named entity (keyword) extractor:
● A two-layer fully connected neural network.
● For each word, the input is its surrounding “context” words in the text.
● Input context words are mapped to 50-dim vectors in a word2vec model.
(Figure, slide 9: the sentence “the cat sat on the mat” tagged word-by-word with the IOBES labels — I, O, E, B, S.)
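The senna-style setup — concatenated context-word embeddings fed through a two-layer fully connected network that scores each IOBES tag — can be sketched in NumPy. All weights are random and the sizes (other than the 50-dim embeddings from the paper) are illustrative; this shows the shape of the computation, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, "<pad>": 5}
tags = ["B", "I", "E", "S", "O"]               # IOBES-style tag set

EMB, WIN, HID = 50, 5, 100                     # embedding dim, window size, hidden units
E = rng.normal(size=(len(vocab), EMB))         # word2vec lookup table (random stand-in)
W1 = rng.normal(size=(WIN * EMB, HID)) * 0.01  # layer 1 weights
W2 = rng.normal(size=(HID, len(tags))) * 0.01  # layer 2 weights

def tag_scores(token_ids, pos):
    """Score tags for the word at `pos` from its +/-2 context window."""
    half = WIN // 2
    padded = [vocab["<pad>"]] * half + token_ids + [vocab["<pad>"]] * half
    window = padded[pos:pos + WIN]             # WIN tokens centered on `pos`
    x = E[window].reshape(-1)                  # concatenate context embeddings
    h = np.tanh(x @ W1)                        # hidden layer
    return h @ W2                              # one score per tag

sent = [vocab[w] for w in "the cat sat on the mat".split()]
scores = tag_scores(sent, 1)                   # scores for "cat"
print(scores.shape)                            # prints: (5,)
```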
10. Senna architecture advantages
● Results are often better than NLTK, thus requiring less human supervision.
● Minimal text preprocessing (for example, no chunking) is required.
● Because input is context-based, it may be possible to train a senna
network with automatically generated partially-annotated training data.
● With greater ease of generating training data, we can train keyword
extractors that are tailored to customer needs (taxonomy, jargon, etc.).
11. Sentiment heat maps
Sentiment heat maps indicate areas of potentially high interest in the video.
● Based on word sentiment and heated language.
● This may not be sufficient. We can also incorporate information from the
audio stream, such as loudness, to indicate areas of interest.
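A crude version of such a heat map is a rolling mean of per-word sentiment magnitude over the transcript; a sketch with a hypothetical word-sentiment table (a real system would use a proper lexicon or model, plus the audio loudness signal mentioned above):

```python
# Hypothetical per-word sentiment scores; magnitude marks "heated" language.
WORD_SENTIMENT = {"crisis": -0.8, "great": 0.7, "angry": -0.9, "calm": 0.3}

def heat_map(words, window=3):
    """Rolling mean of absolute sentiment; peaks mark heated regions."""
    scores = [abs(WORD_SENTIMENT.get(w.lower(), 0.0)) for w in words]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]

words = "the markets stayed calm before the angry crisis talk".split()
hm = heat_map(words)
print(hm)  # the peak falls over "angry crisis"
```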
12. Challenges and future work
Keyword extraction:
● Adapting the senna model for in-house custom keyword extractors.
● Improving keyword extraction for “messy” spoken-language transcripts.
● How to quickly create training data for customer-dependent taxonomies?
Topic modeling:
● Supervised for customer-dependent topics?
● Unsupervised if the user wants to discover unknown information?
● How to do good topic modeling for “messy” spoken-language transcripts?
14. Object detection
Performing object detection on frames tells you what objects appear in a video:
We use various pre-trained models from the TensorFlow detection model zoo.
15. Challenges with object detection
Freely-available object detection models based on ResNet and Inception
architectures are production quality. Nonetheless, there are some challenges:
● What objects do we want to detect? Is this customer dependent?
● How do we create enough training data to build custom models quickly?
16. Scene recognition
We train a wide-ResNet model (S. Zagoruyko et al. 2016) to recognize scenes:
We train the network using the Places365 dataset with consolidated scene
categories (for example, not distinguishing stores based on their interiors).
17. Face recognition
A face recognition model requires millions of faces for training and comprises
many steps: face detection, cropping and re-scaling, and classification.
To train such a model from scratch is very time-consuming. However, near
state-of-the-art models are freely available. We are using dlib face recognition.
18. Face embeddings
Rather than simply recognize faces from a small list of people, most face
recognition models are trained to give good face-to-vector embeddings.
The model user then provides a list of images of faces to recognize, the model
maps the faces to vectors, and query faces are identified via k-nn search.
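The embed-then-search identification flow can be sketched with a nearest-neighbor lookup. Random vectors stand in for real embeddings here; the 128-dim size matches dlib's face recognition model, and the 0.6 distance threshold is dlib's commonly cited default:

```python
import numpy as np

rng = np.random.default_rng(1)
names = ["Idina Menzel", "Amy Grant", "Brad Pitt"]
gallery = rng.normal(size=(3, 128))   # stand-in for one embedding per person

def identify(query, gallery, names, threshold=0.6):
    """Return the closest gallery name, or None if no one is close enough."""
    dists = np.linalg.norm(gallery - query, axis=1)
    best = int(np.argmin(dists))
    return names[best] if dists[best] < threshold else None

# A query near a known embedding should match that person.
query = gallery[1] + 0.01 * rng.normal(size=128)
print(identify(query, gallery, names))  # prints: Amy Grant
```

A production system would use an approximate k-nn index rather than a brute-force scan, but the decision rule is the same.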
19. Who should we recognize?
What faces should we recognize? The answer may be customer dependent:
In generic situations, we should recognize people who are “famous enough”
(well-known politicians, celebrities, artists, scientists, thought-leaders, etc.)
What constitutes famous enough? How do we make a list of their names?
Given the list of names, how do we get enough pictures of their faces?
(Slide photos: Steven Flores and Luke Hosking, Engineers at Comp Three, shown as example faces.)
20. Famous enough?
Our criterion for “famous enough” is partly set by our need to get a list of names
of such famous people: famous = has a Wikipedia biography with a birthday.
We can easily pull this list of famous people from the Wikidata API. We record
each person’s name, birthday, occupation(s), and Wikipedia page address.
Brad Pitt is in... Rich Skrenta is out (no birthday on Wikipedia).
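The “famous enough” rule reduces to a simple predicate over the pulled records; a sketch with simplified stand-in records (not the real Wikidata response format):

```python
# Simplified stand-ins for records pulled from the Wikidata API.
people = [
    {"name": "Brad Pitt", "birthday": "1963-12-18", "occupations": ["actor"],
     "wikipedia": "https://en.wikipedia.org/wiki/Brad_Pitt"},
    {"name": "Rich Skrenta", "birthday": None, "occupations": ["programmer"],
     "wikipedia": "https://en.wikipedia.org/wiki/Rich_Skrenta"},
]

def famous_enough(person):
    """famous = has a Wikipedia biography with a birthday."""
    return bool(person["wikipedia"]) and person["birthday"] is not None

gallery_names = [p["name"] for p in people if famous_enough(p)]
print(gallery_names)  # prints: ['Brad Pitt']
```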
21. The gallery problem
Many state-of-the-art facial recognition systems are still not good at picking the
correct face from a large gallery of faces; they generate many false positives.
The rank-1 accuracy decreases as the gallery “distractor” face count increases. (The MegaFace
Benchmark: 1 Million Faces for Recognition at Scale, I. Kemelmacher-Shlizerman et al. 2015)
22. A potential solution...
Given some faces each with a list of candidate names, use other information
(topic modeling, co-occurrence frequency) to find optimal name assignments:
On the left, Idina Menzel is correctly tagged. On the right, Amy Grant is wrongly
tagged “Fanny Cadeo;” her name is the second choice based on the image.
Use the fact that both are musicians to correct the second tag to “Amy Grant.”
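One way to sketch this idea: score every combination of candidate names, rewarding higher-ranked candidates and shared occupations. The candidate lists and scoring weights here are illustrative, not the talk's actual method:

```python
from itertools import product

# Each detected face has a ranked list of (candidate name, occupation) pairs.
CANDIDATES = [
    [("Idina Menzel", "musician"), ("Someone Else", "athlete")],
    [("Fanny Cadeo", "actress"), ("Amy Grant", "musician")],
]

def resolve(candidates):
    """Pick one name per face, boosting combinations with shared occupations."""
    best, best_score = None, -1.0
    for combo in product(*candidates):
        # Earlier-ranked candidates score higher (1, 1/2, 1/3, ...).
        score = sum(1.0 / (faces.index(c) + 1) for faces, c in zip(candidates, combo))
        # Bonus for each occupation shared by more than one assigned name.
        occupations = [occ for _, occ in combo]
        score += sum(occupations.count(o) - 1 for o in set(occupations))
        if score > best_score:
            best, best_score = combo, score
    return [name for name, _ in best]

print(resolve(CANDIDATES))  # prints: ['Idina Menzel', 'Amy Grant']
```

The shared-musician bonus (1.0) outweighs the rank penalty for Amy Grant's second-place image score, so the second tag flips from “Fanny Cadeo” to “Amy Grant”.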
23. Processing time considerations
● Estimated size of a “large” video cache: 40,000 videos
● Number of frames in a typical 30-second video: 750
● Average per-frame processing time (GTX 1080 GPU): about 1 second
→ Estimated time to process the entire video cache: almost one year...
Almost a year to process this hypothetical video cache is far too long!
Solution: only sample video keyframes (frames at shot changes or high-action
moments). These may contain most of the relevant information. For example,
● https://www.youtube.com/watch?v=_7WZ74F3j_I: 2650 frames
● Number of “irregularly spaced” keyframes processed: 10 keyframes
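The arithmetic behind these estimates, using the figures above:

```python
# Back-of-the-envelope check of the processing-time estimates.
videos = 40_000
frames_per_video = 750          # a typical 30-second video
seconds_per_frame = 1.0         # GTX 1080 estimate

total_days = videos * frames_per_video * seconds_per_frame / 86_400
print(round(total_days))        # prints: 347  -- almost one year

# Sampling roughly 10 keyframes per video instead of every frame:
keyframe_days = videos * 10 * seconds_per_frame / 86_400
print(round(keyframe_days, 1))  # prints: 4.6  -- days, not months
```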
24. Challenges and future work
Object detection and scene recognition:
● What do we want to detect? (Customer-dependent?)
● How do we generate enough training data quickly and efficiently?
● What benchmarks do we need to hit for production quality?
Face recognition:
● Who can we / do we want to detect? (Customer-dependent?)
● How can we use other information to improve face-to-name assignments?
● What benchmarks do we need to hit for production quality?
Scalability:
● How can we reduce the wait time for image evaluation?
● What tradeoffs must we make to minimize video processing time?
● What can we trim without compromising performance benchmarks?
26. Digital Ocean Instance
(Architecture diagram: a single Docker host on a Digital Ocean instance running Nginx on
port 80 serving index.html and bundle.js; the Augi real-time components, Augi backend,
Text Annotator, and Image Service on ports 5000, 5001, and 5002; Elasticsearch on port
9200; and a file-system video object store exposed at /videos/.)
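The single-host layout above might be expressed as a docker-compose file; a sketch in which the service and image names are assumptions, not our actual configuration:

```yaml
version: "3"
services:
  nginx:                       # serves index.html and bundle.js on port 80
    image: nginx:alpine
    ports: ["80:80"]
    volumes: ["./static:/usr/share/nginx/html:ro"]
  augi-realtime:
    build: ./augi-realtime
    ports: ["5000:5000"]
  augi-backend:
    build: ./augi-backend
    ports: ["5001:5001"]
  image-service:
    build: ./image-service
    ports: ["5002:5002"]
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.2.4
    ports: ["9200:9200"]
    volumes: ["esdata:/usr/share/elasticsearch/data"]
volumes:
  esdata:
```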
28. Microservices
(Diagram of the Augi preprocessing pipeline: Python code loops over videos, running video
frame sampling, audio extraction, transcript extraction, text annotation, and image
classification; a data-consolidation step then inserts documents into Elasticsearch, with
the videos themselves stored on the file system.)
31. Augi Preprocessing Workflow
Python scripts
● download videos and video metadata (YouTube, proprietary APIs)
● manage the overall process for the list of videos to be enriched
Docker
● Text Annotator
● Image Classifier
Modular architecture
● file system based cache
● orchestration with override flags
32. Challenges
Iterative development over tens to hundreds of thousands of videos
A file-system-based cache of the data produced by each preprocessing step, along with
granular overrides for each preprocessing method, allows for targeted testing and
implementation.
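The cache-with-override pattern can be sketched as follows (the JSON-file-per-step layout and function names are illustrative, not our exact implementation):

```python
import json
import os
import tempfile

# Each preprocessing step writes its result beside the video's other
# artifacts and is skipped on re-runs unless its override flag is set.
def run_step(video_dir, step_name, compute, override=False):
    """Run `compute()` unless a cached result exists and override is False."""
    path = os.path.join(video_dir, f"{step_name}.json")
    if os.path.exists(path) and not override:
        with open(path) as f:
            return json.load(f)       # cache hit: skip the expensive step
    result = compute()
    with open(path, "w") as f:
        json.dump(result, f)          # cache the result for the next run
    return result

video_dir = tempfile.mkdtemp()
first = run_step(video_dir, "transcript", lambda: {"text": "hello"})
cached = run_step(video_dir, "transcript", lambda: {"text": "SHOULD NOT RUN"})
print(cached)  # prints: {'text': 'hello'}
```

Passing `override=True` for a single step re-runs just that method, which is what makes targeted testing over a large video cache practical.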
On-prem challenge: no internet access
We needed the architecture to be usable on-prem for clients that require data
security (confidential/healthcare sectors). The only external services currently
used are Google Cloud Speech and AWS S3; on-prem, disk storage and products like
Nuance Dragon could replace them.