The Rensselaer IDEA: Data Exploration

The Rensselaer Institute for Data Exploration and Applications is addressing new modes of data exploration and integration to enhance the work of campus researchers (and beyond). This talk outlines the "data exploration" technologies being explored

Presentation Transcript

  • Data Exploration. Jim Hendler, Director, Rensselaer Institute for Data Exploration and Applications (The Rensselaer IDEA), Rensselaer Polytechnic Institute, USA. http://www.cs.rpi.edu/~hendler
  • Data-driven research areas at RPI
    • Data-driven Medical and Healthcare Applications
    • Predictive Models for Business and Economics
    • “Biome” studies for Built and Natural Environments
    • Question Answering from texts and data
    • Resiliency Models for Population-Scale Problems and cybersecurity domains
    • Semantically-enabled Data Services for Science and Engineering Research
    • Materials genome and nano-manufacturing informatics
    • Platforms for testing Policy and Open Data issues
    • …
  • The Rensselaer IDEA: empowering our researchers
    • Application-specific data tools
    • Data discovery, integration, and interaction technologies
  • The trunk: Shared Data Technologies
    • High Performance Modeling and Simulation: Center for Computational Innovation
    • Cognitive Computing: Watson at Rensselaer IBM Partnership
    • Perceptualization: Experimental Multimedia Performing Arts Center
    • Data Science: Data Science Research Center
  • Roots: Data Exploration
    Geekopedia: “Data exploration helps a data consumer focus an information search on the pertinent aspect of relevant data before true analysis can be achieved. In large data sets, data is not gathered or controlled in a focused manner. Even in smaller data sets, it is also true that data gathered are not in a very rigid and specific technique can result in a disorganized manner and a myriad of subsets each…”
    Data: Discover, Integrate, Validate, Explain
  • Data Exploration Challenges: Discover, Integrate, Validate, Explain. These needs live outside traditional data/info architectures.
  • Discovery needs semantics. How do you find the data you need? “Middle Eastern Terrorists for $800?”
  • Discovery – there’s a lot out there
  • Discovery needs more than keywords. World Bank: Africa; Africover: Agriculture; Kenya: Agricultural; US Data.gov: Crop
  • Integration needs semantics
    • Campus Personnel (Person): RIN 660125137; Address # 1118; Address St Pinehurst; Address zip 12203
    • Second table (unnamed on the slide): Course topic CSCI; Course # 4961; RPI ID 660125137; Name Hendler
    • Campus Classes: CRN 1118; Name Intro to Physics
    • YES: RIN 660125137 = RPI ID 660125137
    • NO!!!!: Address # 1118 = CRN 1118
  • Semantic Web and Linked Data (UK): Royal Mail, County Council, Ordnance Survey. IOGDC Open Data Tutorial
  • Data Mashups: http://logd.tw.rpi.edu
  • Validation needs semantics: easy for us
  • Hard for machines… Head-to-head comparison shows that burglaries in Avon and Somerset (UK) far exceed those in Los Angeles, California
  • Data + everything else you know. Same or different? Do the terms mean the same? Are they collected in the same way? Are they processed differently? …
  • Validation/Explanation need knowledge. Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007). Statistical correlation needs explanation
  • Explanation also needs semantics. Inference Web: McGuinness – various DoD/IC projects
  • Closing the loop: where do the semantics come from? How do we go from the predictive analytics of Big Data to models/explanations that allow new understanding? Data, Prediction, Design, Model
  • 1. Better tools for Analytics, Agents and HPC. Make the tools and algorithms being developed by RPI researchers more “reusable” and multitask (including HPC data-analytic tools)
  • 2. Next-Gen Visualization (at scale). How can multi-modal, multi-user, large-scale sensory (visualization, sonification, haptics) interaction change the way we understand data?
  • 3. Include “agents” in the modeling. Develop technologies that enable researchers to work with “human-based” data at larger scales and in new ways
    • Population-scale computing models for agent-based simulations
  • Approach
    • Platform: research in using supercomputers for discrete modeling (Carothers’ ROSS model)
    • KR Model: Weaver’s restricted rules on graphs
    • Challenge problem: classification algorithms at petaflop scale; “logical” (nonlinear, discontinuous) agents
  • 4. Exploit Cognitive Computing. IDEA will be the hub of Rensselaer’s cognitive computing research
    • e.g., answer questions such as “Why” and “How” integrated with large-scale simulations
  • Watson’s parallel model: distributed (coarse-grained) parallelism. © Making Watson Fast, IBM J Res and Dev, 3/4 2012
  • Cognitive Computing at Scale. DeepQA-type approach best on large clusters; (physical) simulation runs on supercomputers
  • Approach: link these computational models. Surmise (unproven): Cognitive Computing on a fast (large) cluster can query computations run against data generated by simulations (physical or agent-based) on the supercomputer
  • 5. Data services will provide synergy across disciplines
    • Semantics is a key technology for common data services
    • People: Agency Policy Makers, System Scientists, Politicians
    • Decision-level semantic mediation: high-level vocabularies that facilitate policy-level decision-making
    • Integrated Applications (semantic interoperability): Inter-disciplinary Data Visualization Apps, Integration Frameworks & Methodologies, Eco & other system Assessment Apps
    • Application-level semantic mediation: mid-level vocabularies that facilitate the interoperability of system models and data products
    • Software, Tools & Apps (semantic interoperability): Discipline-specific model(s), Data-product Generator, Information/Science Apps
    • Semantic query, hypothesis and inference; query, access and use of data
    • Data-level semantic mediation: lower-level vocabularies applied to each data source for a specific science domain of interest
    • Data Repositories (metadata, schema, data): Federal Repository, Commercial Database, Researcher Private Database, Other Data Sources
    • Discovery, Integration, Validation, Curation, Citation, Archiving …
  • Conclusions
    • The “warehouse” is only a small part of the data ecosystem
    • Database technologies are only part of the story
    • Discovery, Integration, … , validation, explanation are key to solving problems with data
    • Closing the loop means “exploring” our data
    • Humans are still a key player in this
    • The Rensselaer IDEA will explore data-driven applications and tools, but also multimodal visualization, multiscale and agent modeling, cognitive computing, and semantic data platforms
  • Rensselaer Institute for Data Exploration and Applications
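
The "Integration needs Semantics" slide can be illustrated with a minimal Python sketch. It is not code from the talk. The values (660125137, 1118) and the column names RIN, Address #, RPI ID, and CRN come from the slide; the table name `Course Roster` and the `semantic_type` labels are assumptions introduced here for illustration. A join on values alone accepts both matches, while a join that also checks a semantic annotation keeps only the one the slide marks YES.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    semantic_type: str  # hypothetical annotation, not from the slides
    values: frozenset


# Values are taken from the slide. "Course Roster" is a made-up name for
# the slide's unnamed second table; the semantic types are assumptions.
COLUMNS = [
    Column("Campus Personnel", "RIN", "PersonID", frozenset({"660125137"})),
    Column("Campus Personnel", "Address #", "StreetNumber", frozenset({"1118"})),
    Column("Course Roster", "RPI ID", "PersonID", frozenset({"660125137"})),
    Column("Campus Classes", "CRN", "CourseRegistrationNumber", frozenset({"1118"})),
]


def value_matches(columns):
    """Naive integration: link columns of different tables that share a value."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if a.table != b.table and a.values & b.values
    ]


def semantic_matches(columns):
    """Keep a value match only when both columns have the same semantic type."""
    return [
        (a, b)
        for a, b in value_matches(columns)
        if a.semantic_type == b.semantic_type
    ]


def describe(pairs):
    """Render column pairs as readable 'table.column = table.column' strings."""
    return [f"{a.table}.{a.name} = {b.table}.{b.name}" for a, b in pairs]


if __name__ == "__main__":
    print("value-only:", describe(value_matches(COLUMNS)))
    print("with semantics:", describe(semantic_matches(COLUMNS)))
```

The value-only join links Address # to CRN because both hold 1118, which is the slide's "NO!!!!" case. Adding even a coarse type annotation is enough to reject it while keeping the RIN to RPI ID link.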