Cognitive systems16


K. Selcuk Candan, Arizona State University, will make this presentation at the Cognitive Systems Institute Speaker Series call on May 19, 2016.


  1. DATA-DRIVEN SENSEMAKING IN AN EVOLVING, NOISY WORLD
     K. Selçuk Candan
     Professor of Computer Science and Engineering
     Director, Center for Assured and Scalable Data Engineering (CASCADE)
     http://cascaderesearch.org
     Supported by
     • NSF; “Data Management for Real-Time Data Driven Epidemic Spread Simulations”
     • NSF; “RAPID - Understanding the Evolution Patterns of the Ebola Outbreak in West-Africa and Supporting Real-Time Decision Making and Hypothesis Testing through Large Scale Simulations”
     • NSF; “E-SDMS: Energy Simulation Data Management System Software”
     • JCI; “I2AV: Integrate, Index, Analyze, and Visualize Energy Data for Data-driven Simulations and Optimizations”
     • NSF; “An Infrastructure to Support Complex Financial Patterns (CFP) based Real-Time Services Delivery and Visual Analytics”
     • NSF I/UCRC planning grant (NSF-IIP1464579) for “Center for Assured and Scalable Data Engineering”
  2. “Sense”making… what does it mean?
     • Etymology:
       • 1st sense: from Latin “sentire”, or “to perceive”
         • any of the faculties, as sight, hearing, smell, taste, or touch, by which humans and animals perceive stimuli originating from outside or inside the body
       • 2nd sense: “to attain awareness or understanding of…”
         • “awareness” implies vigilance in observing or alertness in drawing inferences from what one experiences
         • “understanding” is the power to make experience intelligible by applying concepts and categories
  3. ...did you notice something?
     • There is a gap between the first meaning (feel, measurement) and the second (awareness, understanding),
     • and that gap (or the data infrastructure needed to bridge it) is what my research is about.
     [Figure labels: sensors; knowledge bases; sensing; sensemaking; application; awareness, understanding, control]
  4. We are living in a dynamic world…
     Domains shown: energy, business/enterprise, health-care, entertainment, education, rehabilitation, elderly-care, production, life-sciences, sports, security, defense, transportation, supply-chain, retail, arts, advertisement, child-care, pet-care, personal-data management, robotics, smart-rooms, smart-offices, training, space exploration, sciences
  5. Epidemics…
     • The SARS (Severe Acute Respiratory Syndrome) epidemic is estimated to have started in China in November 2002 and had spread to 29 countries by August 2003.
     • A pandemic similar to the 2009 swine flu is estimated to cost the global economy $360 billion in a mild scenario and up to $4 trillion in an ultra scenario, within the first year of the outbreak.
     • The World Health Organization declared the Ebola epidemic in West Africa a Public Health Emergency of International Concern on August 8th, 2014, with exponential dynamics characterizing the initial growth in the number of new cases in some areas.
     K. Selcuk Candan @ ASU • NSF III#1318788 “Data Management for Real-Time Data Driven Epidemic Spread Simulations”
  6. Epidemics…
     • Not much room for error.
     • Both action and inaction can have high costs in terms of their economic impacts and the human lives affected.
  7. Bad news…
     • Challenge #1: Epidemic data involves
       • 100s of inter-dependent parameters,
       • spanning multiple layers and geo-spatial frames,
       • affected by complex dynamic processes operating at different resolutions.
     • Challenge #2: Given the
       • unpredictability of an epidemic and
       • unpredictability of the actions of various independent agencies,
       decision makers need to generate many thousands of simulations, each with different parameters corresponding to plausible scenarios.
     • Challenge #3: Models and simulations need to be continuously revised based on real-world data as the epidemic and intervention mechanisms evolve.
  8. Building energy sector…
     • The building sector was responsible for nearly half of CO2 emissions in the US in 2009.
     • According to the US Energy Information Administration, buildings consume more energy than any other sector, accounting for 48.7% of overall energy consumption, and building energy consumption is projected to grow faster than that of the industry and transportation sectors.
     Source: U.S. Energy Information Administration. 2008. International Energy Statistics
     K. Selcuk Candan @ ASU • NSF SI^2#1339835 “E-SDMS: Energy Simulation Data Management System Software” • JCI Grant “I2AV: Integrate, Index, Analyze, and Visualize Energy Data for Data-driven Simulations and Optimizations”
  9. Good news…
     • By 2030, 82% of the US building stock is expected to rely on smart and cleaner energy technologies.
     • Building energy management systems (BEMSs) process large volumes of data, including
       • continuously collected heating, ventilation, and air conditioning (HVAC) sensor and actuation data from residential and commercial buildings of all types and sizes,
       • other sensory data, such as occupancy, humidity, lighting levels, and air speed and quality,
       • architectural, mechanical, and building automation system configuration data,
       • local weather and GIS data that provide contextual information, as well as
       • price, consumption, and cost data from electricity (such as smart grid) and gas utilities.
     • Because of the size and complexity of the data and the varying spatial and temporal scales at which the key processes operate, experts lack the means to understand and predict relevant processes.
     Image sources: http://econtrol.me/Smart%20Building.html ; http://customloungeuk.com
  10. Sensemaking in a dynamic world…
      [Figure labels: Sense & Integrate; Simulate & Predict; Validate & Interpret; Act & Adapt]
      (a) Sense & Integrate: take as inputs, and integrate, data and models of the application space and continuously sensed real-time observational data,
      (b) Simulate & Predict: support data-driven simulation and predictive analysis over integrated data sets and models,
      (c) Validate & Interpret: enable validation of observations, models, and simulation/prediction results, and intuitive data and result representation, to provide trustworthy and accurate decision making, and
      (d) Act & Adapt: provide continuous adaptation of models and predictions based on the validated predictions and observations.
  11. Data challenges in a dynamic world
      • 3Vs: (V)olume, (V)elocity, (V)ariety
      • HMLE: (H)igh-dimensional, (M)ulti-modal, Inter-(L)inked, (E)volving
      • ISQ: (I)mprecision, (S)parsity, (Q)uality/Noise
  13. ASU Center for Assured and Scalable Data Engineering (CASCADE-IUCRC)
      Industry/University Collaborative Research Center (I/UCRC)
      * NSF I/UCRC planning grant (NSF-IIP1464579)
  14. “Big Data” Industry Roundtable at ASU
      • Co-organized with IBM
      • On-site or off-site participation:
        • Aerojet
        • Avnet
        • Boeing
        • Facebook
        • Google
        • IBM TJ Watson (Exascale System Software)
        • IBM Smart Analytics
        • IO Data Centers
        • Johnson Controls
        • LinkedIn
        • Lockheed Martin
        • Mayo Clinic
        • NEC Labs
        • Oracle
        • Salt River Project
        • SAP
  15. 2nd Event…
  16. Key knowledge gaps…
      • Six most critical knowledge competency groups, in terms of the value gap (i.e., the difference between the current and desired states of the knowledge area):
        1. temporal and spatial analyses,
        2. summarization, cleaning, visualization, anomaly detection,
        3. real-time processing for streaming data,
           • media analytics
        4. representations and fusion for unstructured/structured data, semantic Web,
           • make unstructured data queryable, prioritize and rank data, correlate and identify the gaps in the data
        5. graph-based models, social networks,
           • entity analytics, (social and other) network analytics
        6. performance and scalability, distributed architectures.
      Source: “Hunting for the Value Gaps in Data Management, Services, and Analytics”, ACM SIGMOD blog; http://wp.sigmod.org/
  17. CASCADE Mission
      • Mission: to support the innovation of data architectures and tools that can match the scale of the data and support timely and assured decision making to generate value.
      [Figure labels: Sense & Integrate; Simulate & Predict; Validate & Interpret; Act & Adapt; Data Management; Data Analysis; Data Assurance]
  18. [Diagram: CASCADE research framework]
      Layers: FUNDAMENTAL KNOWLEDGE; ENABLING TECHNOLOGIES; SYSTEMS
      Topics: modeling, organization, storage/indexing, replication, fusion/integration, ingest, compression, visualization, partitioning, hiding, security, encryption, repudiation, provenance, authentication, trust models, access control, finger printing, tamper detection, summarization/aggregation, sampling, cleaning, normalization, annotation, dimensionality reduction, media analysis, machine learning
      Technology elements:
      • Real-time Data Processing and Analysis
      • Parallel and Distributed Data Processing and Analysis
      • High-dimensional and Multi-modal Data Processing and Analysis
      • Trusted and Privacy-preserving Data Processing and Analysis
      Flow labels: Fundamental Insights; Partners & Stakeholders; System Requirements; Requirements; Product and Outcomes
      TECHNOLOGY BARRIERS:
      • availability,
      • timeliness,
      • cost,
      • consistency,
      • trust,
      • privacy,
      • security,
      • compliance, and
      • accessibility
      FUNDAMENTAL BARRIERS:
      • heterogeneous data and models,
      • transient, mobile, and distributed data,
      • multi-scale, multi-resolution data,
      • data with different quality, precision, privacy, security, and trust levels,
      • varying data volume and characteristics, and
      • high dimensional, complex data
  20. CASCADE team
      Name | Title | Area(s) of specialization as they relate to the proposed concentration
      K. Selcuk Candan | Professor | Scalable data management and media analysis
      Hasan Davulcu | Assoc. Professor | Databases and data extraction
      Gail-Joon Ahn | Professor | Security and privacy in distributed data systems
      Huan Liu | Professor | Data mining and analysis
      Ross Maciejewski | Assistant Professor | Data visualization
      Baoxin Li | Professor | Statistical machine learning, media analysis
      Rao Kambhampati | Professor | Data integration, data cleaning
      Chitta Baral | Professor | Knowledge representation, NLP
      Dijiang Huang | Associate Professor | Data clouds
      Hanghang Tong | Assistant Professor | Graph structured data
      Mohamed Sarwat | Assistant Professor | Data management systems
      Jingrui He | Assistant Professor | Data analysis and sparse learning
      Paulo Shakarian | Assistant Professor | Data and network analysis
      Rong Pan | Assoc. Professor | Data analytics
      Jing Li | Assoc. Professor | Data analytics
      Ron Askin | Professor | Data-driven decision models
      Teresa Wu | Professor | Decision support, health informatics
      Ming Zhao | Associate Professor | Scalable data processing
      Adam Doupe | Assistant Professor | Data security
      Paolo Papotti | Assistant Professor | Data integration and management
  21. CASCADE team: http://cascaderesearch.org
  22. So what about my team’s work?
  23. Common approaches to learning
      • There are several technical approaches:
        • factorization, matrix/tensor decomposition
        • probabilistic (Bayesian/graphical model) learning
        • deep structured learning and neural networks.
  25. Tensor analysis…
      • Tensor decomposition [CP, Tucker] can be used for
        • understanding spectral characteristics of the data and
        • clustering the data based on inter-dependencies.
      • CP decomposition: R clusters and cluster memberships
      [Figure labels: Core Tensor; Factor Matrix (×3)]
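For readers without the figure, the CP model for a three-mode tensor is conventionally written as below. The symbols (the weights λ_r and the columns a_r, b_r, c_r of the three factor matrices) are notation assumed here, not taken from the slides. In this form the "core tensor" is superdiagonal and holds the R weights, and row i of a factor matrix can be read as the membership of element i in each of the R clusters.

```latex
\mathcal{X} \;\approx\; \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r ,
\qquad
x_{ijk} \;\approx\; \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{jr} \, c_{kr}
```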
  26. Tensor representation of data
      • Most media and sensor data are
        • multi-dimensional and
        • multi-relational
      • Temporally evolving data can be represented as
        • Alternative #1: an incrementally growing tensor
        • Alternative #2: a sequence of tensor snapshots
      [Figure: example relation with attributes A, B, C and a tuple (a, b, 2), shown as a tensor with a time axis]
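The two alternatives on this slide can be sketched in NumPy as follows. This is an illustration only: the mode sizes, the random observations, and the helper name `append_slice` are assumptions made for the example, not part of the talk.

```python
import numpy as np

def append_slice(tensor, new_slice):
    """Alternative #1: grow a single tensor along its last (time) mode."""
    return np.concatenate([tensor, new_slice[..., np.newaxis]], axis=-1)

rng = np.random.default_rng(0)
shape = (4, 3)  # hypothetical sizes of the two non-temporal modes

growing = np.empty(shape + (0,))  # Alternative #1: one tensor of shape (4, 3, T)
snapshots = []                    # Alternative #2: one (4, 3) snapshot per time step

for t in range(5):
    observation = rng.random(shape)
    growing = append_slice(growing, observation)
    snapshots.append(observation)

# Both structures hold the same values; they differ in how an analysis
# would access them (one large tensor versus many small ones).
print(growing.shape, len(snapshots))
```

Stacking the snapshots along a new last axis recovers the growing tensor, so the choice between the two is about access pattern and update cost rather than content.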
  28. Tensor analysis…
      • Tensor decomposition [CP, Tucker] can be used for
        • understanding spectral characteristics of the data and
        • clustering the data based on inter-dependencies.
      • Tucker decomposition: r1 × r2 × r3 clusters and cluster memberships
      [Figure labels: Core Tensor; Factor Matrix (×3)]
      Problems:
      • these are very computationally expensive operations,
      • they are also memory intensive,
      • they do not go hand-in-hand with other data manipulation operations (selection, join, union)
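As a concrete reference point, a Tucker decomposition can be computed with a truncated higher-order SVD (HOSVD). The sketch below is a minimal NumPy version under that assumption; the slides do not say which Tucker algorithm is used, and all function names here are hypothetical. It also hints at the cost problem listed above: every mode requires an SVD of a full unfolding of the tensor.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: one factor matrix per mode plus a core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])  # leading left singular vectors
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)  # project onto each subspace
    return core, factors

def reconstruct(core, factors):
    """Multiply the core by every factor matrix to rebuild the tensor."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out

# Synthetic 5 x 4 x 3 tensor with multilinear rank (2, 2, 2).
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 2, 2))
true_factors = [rng.standard_normal((dim, 2)) for dim in (5, 4, 3)]
data = reconstruct(true_core, true_factors)

core, factors = hosvd(data, ranks=(2, 2, 2))
relative_error = np.linalg.norm(data - reconstruct(core, factors)) / np.linalg.norm(data)
print(core.shape, relative_error)
```

Here the core has shape r1 × r2 × r3 and each factor matrix relates the elements of one mode to that mode's clusters.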
  29. Common data characteristics…
      • The key characteristics of real-world data sets include the following:
        • multi-variate
        • multi-modal
          • temporal,
          • spatial,
          • hierarchical,
          • graphical
        • multi-layer
        • multi-resolution
        • inter-dependent
          • observations of interest depend on and impact each other
      [Figure labels: time]
  30. ...and the metadata…
      • Different modes of the tensor can have different types of metadata.
      [Figure labels: time]
  31. ...and the metadata…
      • Different modes of the tensor can have different types of metadata.
      [Figure labels: time; hierarchy (US; AZ, CA; Tempe, PHX, SF)]
  32. ...and the metadata…
      • Different modes of the tensor can have different types of metadata.
      [Figure labels: time; distance matrix; hierarchy]
  33. ...and the metadata…
      • Different modes of the tensor can have different types of metadata.
      [Figure labels: time; distance matrix; hierarchy; graph]
      Differently-Modal Tensors (DMT)
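One way hierarchy metadata on a mode could be put to use is to coarsen that mode before analysis. The sketch below rolls a location mode up from cities to states to country, using the place names from the slide. The readings, the `parent` mapping, and the `rollup_matrix` helper are assumptions made for illustration; this is not the DMT method itself.

```python
import numpy as np

cities = ["Tempe", "PHX", "SF"]
# Hierarchy metadata for the location mode (child -> parent).
parent = {"Tempe": "AZ", "PHX": "AZ", "SF": "CA", "AZ": "US", "CA": "US"}

def rollup_matrix(children, parent):
    """Return (parents, M) with M[p, c] = 1 when children[c] rolls up to parents[p]."""
    parents = sorted({parent[child] for child in children})
    matrix = np.zeros((len(parents), len(children)))
    for col, child in enumerate(children):
        matrix[parents.index(parent[child]), col] = 1.0
    return parents, matrix

# Hypothetical data: one reading per city (mode 0) and time step (mode 1).
readings = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

states, to_state = rollup_matrix(cities, parent)
by_state = to_state @ readings        # coarser resolution along the location mode

countries, to_country = rollup_matrix(states, parent)
by_country = to_country @ by_state    # coarsest resolution

print(states, by_state)
print(countries, by_country)
```

The same idea extends to higher-order tensors by applying the roll-up matrix along the mode that carries the hierarchy.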
  34. ...alternatively… Networks of Time Series (NoTS)
  35. Research challenges…
      Questions:
      • How to best account for the different modalities of the data?
      • Can we leverage metadata to support multi-resolution and incremental tensor analysis operations?
      • Can we implement a memory-hierarchy-supported tensor analysis?
      • Can we co-optimize tensor analysis and other data manipulation operations?
  36. What about other approaches?
      • There are several technical approaches:
        • factorization, matrix/tensor decomposition
        • probabilistic (Bayesian/graphical model) learning
        • deep structured learning and neural networks.
      • Many of the algorithms are based on iterative processes, such as alternating least squares (ALS) or stochastic gradient descent (SGD), which approximate the best solution until a convergence condition is reached.
      Question: Can we develop metadata-supported and multi-scale techniques that can leverage the volume/cost trade-offs provided by storage hierarchies to provide high accuracy at minimum cost?
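To make the "iterate until a convergence condition is reached" pattern concrete, here is a minimal alternating least squares (ALS) loop for a three-mode CP decomposition in NumPy. It is a textbook-style sketch, not the speaker's implementation; the stopping rule (change in relative error below `tol`), the parameter defaults, and all names are assumptions.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def unfold(tensor, mode):
    """Matricize: rows index the chosen mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def cp_als(tensor, rank, max_iter=500, tol=1e-10, seed=0):
    """CP decomposition of a 3-mode tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    norm = np.linalg.norm(tensor)
    prev_error = np.inf
    for _ in range(max_iter):
        for mode in range(3):
            # Fix the other two factor matrices and solve for this one.
            first, second = [factors[m] for m in range(3) if m != mode]
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = (unfold(tensor, mode) @ khatri_rao(first, second)
                             @ np.linalg.pinv(gram))
        approx = np.einsum("ir,jr,kr->ijk", *factors)
        error = np.linalg.norm(tensor - approx) / norm
        if abs(prev_error - error) < tol:  # convergence condition
            break
        prev_error = error
    return factors, error

# Synthetic rank-2 tensor of shape 5 x 4 x 3.
rng = np.random.default_rng(1)
a, b, c = (rng.standard_normal((dim, 2)) for dim in (5, 4, 3))
data = np.einsum("ir,jr,kr->ijk", a, b, c)

factors, error = cp_als(data, rank=2)
print([f.shape for f in factors], error)
```

Each sweep touches the full tensor three times, which is one reason such iterative methods are expensive on large data and why the question above asks about exploiting storage hierarchies.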
  37. Conclusions…
      Making sense of a dynamically evolving world is a really, really challenging task…
      [Figure: CASCADE research framework diagram, as on slide 18]
  38. Contact
      candan@asu.edu
      cascaderesearch.org
      cascade.asu.edu
