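Network security is a distinctive research field, partly because the "adversary" in a security problem is not text, images, or any other kind of static data, but living people. Black hat hackers spend their days hunting for system and network vulnerabilities and devising ever more sophisticated attacks for whatever gain they can obtain. In network security research we therefore cannot "presume" how attackers will behave. We have to look for traces in real data, and use large volumes of it to discover and address behavior, already under way or about to occur, that may threaten the security and privacy of users' data. In this talk, I will introduce data-driven network security research and use several real research cases to show what kinds of security problems the statistical analysis of real data can help us solve.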
5. Naïve Bayes Algorithm
Transfer learning
Apriori Algorithm
Gaussian distribution
Random Forests
Logistic Regression
(Deep) Neural Networks
Decision Trees
Nearest Neighbour
Support Vector Machine
K Means Algorithm
Linear Regression
Active learning
Domain adaptation
Semi-supervised learning
Reinforcement learning
Unsupervised learning
Supervised learning
9. Cross-disciplinary integration: human-related applications
Emotion
Health Care
Education
Voice Recognition
Symptom diagnosis
Behavior Activity
Image Recognition
Medical
IBM Pathway Genomics
Detection of Diabetic Retinopathy in Retinal Fundus Photographs
Customer behavior
Medical Imaging
Genomic Medicine
10. What do I do, and what am I going to share?
12. Seek a window into the human mind and traits…
…through an engineering approach
S. Narayanan and P. G. Georgiou, "Behavioral signal processing: Deriving human behavioral informatics from speech and language," Proceedings of the IEEE, vol. 101, no. 5, pp. 1203–1233, 2013.
13. Behavioral Signal Processing (BSP)
Compute human behavioral traits and states to support domain experts' decision making
• Help experts do what they already know how to do more efficiently and at scale
• Develop novel behavioral analytics frameworks for possible scientific discovery
From qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
33. COMPUTING BEHAVIORAL TRAITS & STATES FOR DECISION MAKING & ACTION
…aim..
QUANTITATIVE: QUANTITATIVE EVIDENCE DIRECTLY FROM MEASURABLE SIGNALS
EFFICIENCY: HELP DO THINGS THAT EXPERTS KNOW HOW TO DO WELL MORE EFFICIENTLY, CONSISTENTLY & AT SCALE
SUPPLEMENTARY: COMPLEMENT THE GOLD STANDARD METHOD WHEN APPROPRIATE
POSSIBILITY: TOOLS FOR NOVEL ACTIONABLE INSIGHT DISCOVERY
34. BSP enablers . . . (one half of the puzzle)
Text Processing
Voice Activity Detection
Alignment
Transcription
Keyword Spotting
Prosody Modeling
Voice Quality
Diarization
Speaker Identification
Dialog Act Tagging
Face Detection
Expression Recognition
Action Recognition
Language Understanding
Affective Computing
Speaker State and Trait
Joint Speech Visual Processing
Interaction Modeling
Sentiment Analysis
35. Human signal processing
Enabling Technologies (signal processing, machine learning)
• Low-level descriptors
• Acoustic features
• Motion features
• Text features
• Image features
• Speech recognition
• Face recognition
• Action recognition
• Dialog act tagging
• Keyword spotting
• Text processing
• Sentiment analysis
• Affect recognition
• Speaker states and traits
• Visual-speech processing
• Interaction modeling
Domain Experts' Knowledge
• Subjective assessment
• Internal state & construct
• Neuro-developmental disorder
• Evidence-based observational coding
• Intervention efficacy
• Coder variability control
• Development of coding manual
• Self-report measure validity
• Coding mechanism
• Social behavior
• Affective behavior
• Communicative behavior
• Dyadic behavior
40. Computational Methods that Model Human Behavior Signals
• Manifested in Overt and Covert Cues
• Processed and Used by Humans Explicitly or Implicitly
• Facilitate Human Analysis and Decision Making
Outcome of Behavioral Signal Processing
• Behavioral Analytics
QUANTIFYING HUMAN EXPRESSED BEHAVIOR AND HUMAN “FELT SENSE”
DERIVING INTERPRETABLE BEHAVIOR ANALYTICS FROM DATA FOR ACTIONABLE INSIGHTS
56. What is Autism?
A social-communicative neurodevelopmental disorder
• Prevalence: 1 in 68 children (1 in 42 males) diagnosed [CDC2014]
• ASD: a “spectrum” disorder because of its extreme heterogeneity
• Intervention leads to improved outcomes
What is the role of BSP in autism?
57. ROLE OF BSP?
Automatically analyze the social and interactive behavior of clinician and child as they interact during the ADOS diagnostic session
AIM?
• Analysis at scale
• Quantitative evidence from signals
• New findings beyond the current status quo in psychiatry (?)
60. Can we?
Automatically measure spontaneous social (verbal/nonverbal) behavior between clinician and child to predict the child's rating for an atypical amount of reciprocal social communication
From qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
From audio and video, develop indicators of clinician-child social interaction behavior, and use them to analyze and predict the amount of reciprocal social communication
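69. Where, when, how
Where
BIIC: a soundproof isolation room
Originally: an unrestricted classroom at the national education academy (國家教育院)
Settled for: a classroom kept as quiet as possible
When
BIIC: every principal's examination during the training program
Originally: none
Settled for: recording everything that available staffing allows
How
BIIC: headset microphones, multi-track audio, face and body motion, Kinect, all synchronized
Originally: no equipment, and it had to be workable with minimal staff
Settled for: upper-body video with an external microphone for audio
Ensure the current system is not altered too much at the BEGINNING
At-scale ease of application is crucial
Strike a balance between ecological validity and quality control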
84. Autism Diagnostic Observation Schedule [Lord 2001]
• Subject interacts with a psychologist for ~45 minutes
• Current gold standard, research-level observational coding
• Psychologists are trained under a stringent training protocol
• Semi-structured assessment that elicits socio-communicative behavior from children with ASD for diagnosis
• Multiple sub-part events (14), with ratings on a wide range of socio-communicative behaviors (28)
91. Short-Time Energy: a simple voice activity detector
• Speech signal per session
• Energy every frame
– frame = 25 ms
– standard deviation (normalize D.C. offset)
• Threshold
– speech percentage in the wav
• Speech Segments
– Energy > Threshold Energy
Formula, where N is the frame length in samples:
$$E_n = \sum_{m=n-N+1}^{n} x^2(m)$$
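The energy-based detector on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the non-overlapping framing, and the rule of setting the threshold to a fixed fraction of the peak frame energy are hypothetical choices, not the procedure used in the talk (which derives the threshold from the speech percentage in the recording).

```python
import math

def short_time_energy(x, frame_len):
    """Sum of x(m)^2 over each non-overlapping frame of frame_len samples."""
    return [
        sum(s * s for s in x[start:start + frame_len])
        for start in range(0, len(x) - frame_len + 1, frame_len)
    ]

def detect_speech(x, sample_rate, frame_ms=25, threshold_ratio=0.1):
    """Return one boolean per frame: True where energy exceeds the threshold."""
    # Remove the D.C. offset so a constant bias does not count as energy.
    mean = sum(x) / len(x)
    centered = [s - mean for s in x]
    frame_len = int(sample_rate * frame_ms / 1000)
    energy = short_time_energy(centered, frame_len)
    # Assumption: the threshold is a fixed fraction of the peak frame energy.
    threshold = threshold_ratio * max(energy)
    return [e > threshold for e in energy]

# Toy signal: 50 ms of near-silence followed by 50 ms of a 200 Hz tone.
sr = 8000
silence = [0.001] * 400
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
flags = detect_speech(silence + tone, sr)
print(flags)
```

With 25 ms frames at 8 kHz the toy signal splits into four frames, and only the two tone frames are flagged as speech. Real recordings would likely need overlapping frames and some smoothing of the frame decisions.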
98. Clustering (speaker change detection)
1. Generate an i-vector for each ‘segment’
2. Compute the pair-wise similarity between clusters
3. Merge the closest clusters
4. Update the distances of the remaining clusters to the new cluster
5. Iterate steps 2-4 until the stopping criterion is met
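The five steps above describe bottom-up (agglomerative) clustering. A minimal sketch follows, assuming each segment has already been turned into a fixed-length vector (i-vector extraction itself is not shown), that similarity is the cosine between cluster centroids, and that merging stops once the best similarity falls below a hypothetical `stop_similarity` value. All of these are illustrative choices, not details from the talk.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def agglomerative_cluster(segment_vectors, stop_similarity=0.8):
    # Step 1 is assumed done: every segment already has a vector.
    clusters = [[i] for i in range(len(segment_vectors))]
    while len(clusters) > 1:
        # Step 2: pair-wise similarity between cluster centroids.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([segment_vectors[k] for k in clusters[i]])
                cj = centroid([segment_vectors[k] for k in clusters[j]])
                sim = cosine(ci, cj)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        # Step 5: stop when even the closest pair is too dissimilar.
        if best[0] < stop_similarity:
            break
        # Steps 3-4: merge the pair; centroids are recomputed on the next pass.
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Four toy "segment vectors": two point mostly along x, two mostly along y.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
result = agglomerative_cluster(vectors)
print(result)
```

On the toy vectors the segments group into two clusters, one per direction, which is the behavior one would hope for when two speakers alternate.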
126. Praat and openSMILE
Quicker to get started with computing features
Versatile and fast audio feature extractor
Open-source and cross-platform
Abundant speech-related features: signal energy, loudness, Mel-spectra, MFCC, PLP-CC, pitch
Audio I/O
Supports many I/O formats: WEKA, HTK, LibSVM
Results can be visualized directly
Slightly easier to use
There are actually many more . . .
127. Low-level signal descriptors
Encoding/Profile
Image characteristics: texture, shape, keypoint, edge
Histogram of oriented gradients (HOG)
Scale-invariant feature transform (SIFT)
Local binary pattern (LBP)
3D SIFT
HOG3D
More commonly used to describe a single image (photo) frame:
Histogram of oriented gradients (HOG), Local binary pattern (LBP)
137. Term Weighting Method
A simplifying representation based on term counts
Term Frequency
How important (or informative) a word is in a document.
$$tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$$
Inverse Document Frequency
How important (or informative) a word is in the corpus.
$$idf_{t,D} = \log \frac{N}{1 + |\{d \in D : t \in d\}|}$$
Term Frequency–Inverse Document Frequency (TF-IDF): the product $tf_{t,d} \times idf_{t,D}$
Sometimes this alone is already very effective
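The term-weighting formulas on this slide translate almost directly into code. The sketch below assumes whitespace-tokenized documents and uses a made-up four-document toy corpus. The function names are illustrative, and the `1 +` in the denominator follows the slide's smoothed form of the inverse document frequency.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Count of the term divided by the total number of tokens in the document."""
    counts = Counter(doc_tokens)
    return counts[term] / len(doc_tokens)

def idf(term, corpus):
    """log(N / (1 + number of documents containing the term))."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / (1 + containing))

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# Toy corpus (made up for illustration).
corpus = [
    "john likes to watch movies".split(),
    "mary likes movies too".split(),
    "john also likes to watch football games".split(),
    "the weather is nice today".split(),
]
print(tf_idf("likes", corpus[0], corpus))
print(tf_idf("football", corpus[2], corpus))
```

Here "likes" appears in three of the four documents and ends up with zero weight, while "football" appears in only one and gets a positive weight. Note that with this smoothed denominator a term present in every document receives a slightly negative weight, which some implementations avoid by adding a constant.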
138. N-gram
The unit does not have to be a single word . . .
Turn unigram terms into bigram terms at the word tokenization step
For instance,
John also likes to watch football games
[ 'John also' , 'also likes' , 'likes to' , 'to watch' , 'watch football' , 'football games' ]
[ 1 , 1 , 1 , 1 , 1 , 1 ]
These can be extended indefinitely
Text
"We also hope that in this way we can … improve our teachers' teaching"
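The bigram example on this slide can be reproduced with a short helper. This is a minimal sketch that assumes whitespace tokenization. The function name `ngrams` is illustrative and not tied to any particular library.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Join every run of n consecutive tokens into one n-gram term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "John also likes to watch football games".split()
bigrams = ngrams(tokens, 2)
counts = Counter(bigrams)

print(bigrams)
print([counts[b] for b in bigrams])
```

Changing `n` to 3 gives trigrams in the same way, which is the sense in which the representation can be extended further.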
139. Distributed word representation
Represent a word as a vector
CBOW: predicts a word given its context
Skip-gram: predicts the context given a word
The distributed representations encoded in the hidden layer of the neural network serve as the representations of words
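The difference between the two training objectives is easiest to see in the training examples each one consumes. The sketch below only builds those examples and does not train a network. The function names and the window size of 1 are assumptions made for illustration.

```python
def skipgram_pairs(tokens, window=1):
    """(center word, one context word) pairs: predict the context from the word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    """(all context words, target word) pairs: predict the word from its context."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        pairs.append((context, target))
    return pairs

tokens = ["john", "likes", "football"]
print(skipgram_pairs(tokens))
print(cbow_pairs(tokens))
```

In both cases the word vectors are the weights learned while solving the prediction task, so the pairs above are the inputs and targets a small network would be trained on.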
164. ADOS Emotion Part
Multimodal Turn-taking Behavior Coordination Time Series
Automatically generate a time series of multimodal behavior coordination measures across a session . . .
197. Psychologists unconsciously alter their social communicative behavior strategy (cueing behavior?) depending on the ASD child's ability to carry out reciprocal communication during the interaction
200. Prosody descriptors and ASD severity ratings
Descriptors included    Child Prosody    Psych Prosody    Child and Psych Prosody
Spearman’s ρ            0.64***          0.79***          0.67***
The psychologist's acoustics are at least as predictive of the child's ASD severity ratings as the child's own
Similar to earlier findings with the English-language ADOS!
[1] Daniel Bone, Chi-Chun Lee, Matthew P. Black, Marian E. Williams, Pat Levitt, Sungbok Lee, and Shrikanth Narayanan, "The Psychologist as an Interlocutor in Autism Spectrum Disorder Assessment: Insights from a Study of Spontaneous Prosody", Journal of Speech, Language, and Hearing Research, 2014, 57(4), 1162-1177.
Hard to obtain scientific insights without such behavioral analytics for domain experts
NEED MORE VERIFICATION
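Spearman's ρ, the statistic reported on this slide, is the Pearson correlation computed on ranks. A minimal sketch follows. The `predicted` and `rated` values are made up for illustration and are not data from the study, and the helper names are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(a, b):
    return pearson(ranks(a), ranks(b))

# Made-up example: model outputs versus clinician severity ratings.
predicted = [2.1, 3.9, 6.2, 7.5, 12.0]
rated = [1, 4, 3, 8, 9]
rho = spearman(predicted, rated)
print(round(rho, 2))
```

Because only the ordering matters, ρ is insensitive to the scale of the predictions, which makes it a natural choice when the target is an ordinal clinical rating.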
202. Is it Technical? Example Pitfall 1
Controlling for Channel Factors
• Interspeech 2013 Autism Challenge
• Baseline Approach
Black-box (works well)
2-class baseline: 92.8% unweighted average recall (UAR); chance is 50% UAR
• Hypothesis: Model captures channel, not diagnosis
ASD/SLI from 2 clinics, TD from classrooms
• Simple experiment showed channel differences
Matched baseline
• Conclusion: Mitigate (or note) noise sources in data collection.
Daniel Bone, Theodora Chaspari, Kartik Audhkhasi, James Gibson, Andreas Tsiartas, Maarten Van Segbroeck, Ming Li, Sungbok Lee, and Shrikanth Narayanan, "Classifying Language-Related Developmental Disorders from Speech Cues: the Promise and the Potential Confounds", InterSpeech, 2013.
11/11/2014
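UAR, the metric quoted for the baseline, is the mean of the per-class recalls, which is why chance sits at 50% for two classes regardless of class balance. A small sketch with made-up labels (the class names and counts are illustrative only):

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy labels: 8 typically developing (TD), 2 ASD.
y_true = ["TD"] * 8 + ["ASD"] * 2
always_td = ["TD"] * 10

accuracy = sum(t == p for t, p in zip(y_true, always_td)) / len(y_true)
uar = unweighted_average_recall(y_true, always_td)
print(accuracy, uar)
```

A classifier that always answers "TD" reaches 80% accuracy on this toy set but only 50% UAR, so a high UAR cannot be obtained just by exploiting class imbalance. It can still be inflated by a confound such as the recording channel, which is the pitfall this slide describes.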
204. Is it Technical? Example Pitfall 2
Behavior Analysis & Modeling: Cross-validation
They do not perform speaker-separated cross-fold validation!
• Can we detect United States Senators’ party affiliations from speech features (with a black-box approach)?
Performance increases as the number of samples per speaker increases
Conclusion: Always perform speaker-separated cross-validation!
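Speaker-separated cross-validation means that every sample from a given speaker lands on the same side of each train/test split. A minimal sketch follows. The function name, the round-robin assignment of speakers to folds, and the toy speaker IDs are assumptions made for illustration.

```python
def speaker_separated_folds(speaker_ids, n_folds):
    """Yield (train_idx, test_idx) so that no speaker appears on both sides."""
    speakers = sorted(set(speaker_ids))
    for fold in range(n_folds):
        # Round-robin: every n_folds-th speaker is held out in this fold.
        held_out = set(speakers[fold::n_folds])
        test = [i for i, s in enumerate(speaker_ids) if s in held_out]
        train = [i for i, s in enumerate(speaker_ids) if s not in held_out]
        yield train, test

# One entry per sample, naming the speaker who produced it.
speaker_ids = ["A", "A", "B", "B", "B", "C", "D", "D"]
folds = list(speaker_separated_folds(speaker_ids, n_folds=2))
for train, test in folds:
    train_speakers = {speaker_ids[i] for i in train}
    test_speakers = {speaker_ids[i] for i in test}
    assert train_speakers.isdisjoint(test_speakers)
    print(sorted(test_speakers), test)
```

If samples were shuffled without regard to speaker, a model could score well simply by recognizing voices it has already seen, which is consistent with performance rising as the number of samples per speaker grows. scikit-learn's `GroupKFold` offers the same guarantee when that library is available.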
212. Behavioral Signal Processing (BSP)
Compute human behavioral traits and states to support domain experts' decision making
• Help experts do what they already know how to do more efficiently and at scale
• Develop novel behavioral analytics frameworks for possible scientific discovery
From qualitative to quantitative . . .
through verbal and non-verbal behavioral cues . . .
Transformative effort . . .
213. COMPUTING OF, FOR, and BY HUMANS
OF: Human action and behavior data
FOR: Meaningful analysis, timely decision making & intervention (action)
BY: Collaborative integration of human expertise with automated processing
By Professor Shrikanth Narayanan
214. Human signal processing
Enabling Technologies (signal processing, machine learning)
• Low-level descriptors
• Acoustic features
• Motion features
• Text features
• Image features
• Speech recognition
• Face recognition
• Action recognition
• Dialog act tagging
• Keyword spotting
• Text processing
• Sentiment analysis
• Affect recognition
• Speaker states and traits
• Visual-speech processing
• Interaction modeling
Domain Experts' Knowledge
• Subjective assessment
• Internal state & construct
• Neuro-developmental disorder
• Evidence-based observational coding
• Intervention efficacy
• Coder variability control
• Development of coding manual
• Self-report measure validity
• Coding mechanism
• Social behavior
• Affective behavior
• Communicative behavior
• Dyadic behavior
Relatively new: RICH R&D OPPORTUNITIES (CHALLENGES)