Approaches to Mining Large-Scale Heterogeneous Data: Old and New

2015 D-STOP Symposium session by UT Austin's Joydeep Ghosh. Watch the presentation at http://youtu.be/y2kYLM8GdbI?t=19m42s

Get symposium details: http://ctr.utexas.edu/research/d-stop/education/annual-symposium/

Published in: Technology

  1. Approaches to Mining Large-Scale Heterogeneous Data: Old and New
     3/9/2015, Joydeep Ghosh, UT-ECE
     Prof. Joydeep Ghosh
     Schlumberger Centennial Chaired Professor; Fellow, IEEE
     Director, IDEAL (Intelligent Data Exploration and Analysis Lab), University of Texas at Austin
  2. What we do
     • Data-Driven Modeling & Knowledge Discovery: “Big Data Predictive and Prescriptive Analytics”
       – Data types:
         • relational databases, distributed sensors, signals, images, web logs, key-value, ...
         • data (continuous + symbolic) + domain knowledge
       – Tools:
         • data mining/statistics, web mining, machine learning, neural nets, signal/image processing, ...
       – Large-scale system issues
       – Specialty: multi-learner systems
         • Use multiple, complementary approaches for more robust modeling of complex engineering problems
         • Custom models, where “canned solutions” are inadequate
  3. Multi-sensor Fusion (80s, 90s)
     • Blackboards (KBS)
     • Multiple Hypothesis Tracking
     • Basic Tracking (Kalman filters, Gauss-Markov, ...)
     • Detection/Identification
  4. Rationale
     • The usual ones +
     • “Important applications can be found in time-critical situations or in situation with a high decision risk, where human deficiencies are to be compensated for by automatically or interactively working fusion techniques (compensating for decreasing attention in routine situations; focusing the attention on anomalous or rare events; complementing limited memory, reaction, or combination capabilities of human beings)” (Koch, 2010)
  5. Overall Architecture
  6. (Extreme) Design Choices
  7. Combining Multiple Classifiers
     J. Ghosh, S. Beck and L. Deuser, IEEE Jl. of Ocean Engineering, Vol 17, No. 4, October 1992, pp. 351-363.
     [Figure: pre-processed data from the observed phenomenon; feature sets (FFT, Gabor wavelets, ..., Feature Set M); classifiers (MLP, RBF, ..., Classifier N); combiner (ave/median/..)]
  8. Combining Multiple Clusterings (2002)
     • Given a set of provisional partitionings, we want to aggregate them into a single consensus partitioning, even without access to original features.
     • Provides improved accuracy + robustness + knowledge re-use
     [Figure: Clusterer #1, ... (individual cluster labels) combined into consensus labels]
  9. Combining Multiple Trackers (97, 98)
     • Adaptive Kalman Filter Bank
  10. Modern Settings: Networked, Heterogeneous Data
  11. (Collective) Matrix Factorization
  12. Factorization of Heterogeneous Data
     [Figure: matrices W, X, Y, Z relating Patients, Diagnoses, Procedures, Medications, Demographics and Physicians]
  13. High-throughput Phenotyping on Electronic Health Records using Multi-Tensor Factorization ($2.2 Mil grant from NSF)
     [Figure: EHR Site A and EHR Site B; Tensor Construction + Generation; tensor ≈ λ1 Phenotype 1 + … + λR Phenotype R; Refinement; Adaptation; Applications: GWAS, Predictive Models, Cohort Construction]
  14. To Come
     • Internet of Things (IoT)
     • Network of (information) networks
     • …
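
Slide 7 lists "Ave/median/.." as ways to combine the outputs of several classifiers. The following is a minimal sketch of that idea, not the method from the cited 1992 paper: it assumes each classifier emits a vector of class scores, and the function name `combine_scores` and the sample numbers are hypothetical.

```python
from statistics import mean, median


def combine_scores(scores, rule="ave"):
    """Combine per-classifier class-score vectors with an average or median rule.

    scores: list of equal-length lists, one score vector per classifier.
    Returns (index of the winning class, combined score vector).
    """
    reducer = {"ave": mean, "median": median}[rule]
    n_classes = len(scores[0])
    # Reduce across classifiers, one class at a time.
    combined = [reducer([s[c] for s in scores]) for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: combined[c])
    return best, combined


# Hypothetical outputs of three classifiers on a two-class problem.
# The third classifier is an over-confident outlier.
outputs = [
    [0.40, 0.60],
    [0.45, 0.55],
    [1.00, 0.00],
]
label_ave, scores_ave = combine_scores(outputs, "ave")
label_med, scores_med = combine_scores(outputs, "median")
```

In this toy case the average is pulled toward the outlier while the median is not, which is one reason a deck might list both rules side by side.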

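
Slide 8 describes aggregating several provisional partitionings into one consensus partitioning using only the cluster labels. One simple way to illustrate this is a co-association vote: link two objects when a majority of partitionings put them in the same cluster, then read off the connected groups. This is a simplified stand-in, not the algorithm from the 2002 work; the function name `consensus_labels`, the `threshold` parameter, and the sample labelings are assumptions.

```python
def consensus_labels(partitionings, threshold=0.5):
    """Consensus clustering from label vectors only (no original features).

    partitionings: list of equal-length label lists, one per clusterer.
    Two objects are linked if the fraction of partitionings that co-cluster
    them exceeds `threshold`; consensus clusters are the connected groups.
    """
    n = len(partitionings[0])
    r = len(partitionings)
    parent = list(range(n))  # union-find forest over objects

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(p[i] == p[j] for p in partitionings) / r
            if agree > threshold:
                parent[find(i)] = find(j)

    # Relabel groups as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical label vectors from three clusterers over six objects.
# The first two agree up to a renaming of labels; the third disagrees.
partitionings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
]
consensus = consensus_labels(partitionings)
```

Because only pairwise co-membership is compared, the arbitrary label names used by each clusterer do not matter.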