This document provides information about the SEASR project:
1) SEASR aims to develop software tools and frameworks that enable humanities scholars to mine and analyze unstructured data.
2) SEASR will provide interfaces, workflow engines, data management tools, and other integrated software to empower humanities researchers.
3) Workshops will be held to educate scholars on using the SEASR software environment.
SEASR Overview Workshop, April 2009
1. Pathways to SEASR
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
The SEASR project and its Meandre infrastructure
are sponsored by The Andrew W. Mellon Foundation
2. SEASR
• This project will focus on developing, integrating, deploying, and sustaining a
set of reusable and expandable software components and a supporting
framework, SEASR, that will benefit a broad set of data mining applications for
scholars in humanities
• The key goals established for this effort are a set of software-centric directives:
– Support the development of a state-of-the-art software environment for unstructured
data management and analysis of digital libraries, repositories and archives, as well as
educational platforms that are expected to contribute to many of the humanities
breakthroughs of the 21st century.
– Support the continued development, expansion, and maintenance of an end-to-end
software system (user interfaces, workflow engines, data management, analysis and
visualization tools, collaborative tools, and other software integrated into a complete
environment, SEASR) to bring the full power of data analytics to the scholars.
– Support education and training for use of this software environment for analysis through
workshops to promote its usage among scholars.
+INTERMEDIATE LAYER + EOT
APPLICATIONS
3. Agenda Day 1 Morning
Wednesday April 22, 2009
• 8:30am Registration and Breakfast
• 9:00am SEASR Overview
• 10:00am Break
• 10:30am SEASR Application Examples and Demonstrations
– Zotero and SEASR
– Text Analysis and SEASR
– Audio Analysis and SEASR (NEMA, NESTER)
– Fedora and SEASR
• Noon Lunch
4. Agenda Day 1 Afternoon
• 1:00pm SEASR Architecture, Installation
• 2:00pm SEASR Tools
• 2:30pm Break
• 3:00pm Breakout Session
– Humanities: SEASR Tools with Hands-On Demo
– Developers: SEASR Technical Details
• 4:30pm End of Day
5. Workshop Objective
The objective of the workshop is:
• To explain and demonstrate the utility of SEASR
for digital humanities, and to bring you to a point
where you could deploy, contribute to, and utilize the
SEASR environment.
SEASR + TOOLS + EXEMPLARS + HANDS ON
6. Workshop Goals
The goals of the workshop are:
• LEARN: Provide a detailed understanding of the SEASR framework
• LEARN: Provide a foundation and examples for participant teams to
use SEASR in a study or inquiry
• ADOPT: Share participant generated research plans to utilize SEASR
• INSTALL: Provide detailed instructions on how to install, build
components, integrate existing applications, and maintain the SEASR
environment
• SUPPORT: Develop plans for resolution of issues raised by the user
community in utilization of SEASR
• SUSTAIN: Develop a plan for community driven future development
and dissemination of SEASR
Learn + Adopt + Sustain
8. SEASR: Reach + Relevance + Reuse + Repeatability
SEASR emphasizes flexibility, scalability, and modularity,
and provides a community hub and access to heterogeneous
data and computational systems
– Semantic-driven environment for SOA interoperability
– Encourages sharing and participation for building communities
– Modular construction allows flows to be modified and configured
to encourage reusability within and across domains
– Enables a mashup and integration of tools
– Data-intensive flows can be executed on a simple desktop or a
large cluster(s) without modification
– Computation can be created for distributed execution on servers
where the content lives
– User access controls for managing trust and compliance with the
copyright licenses that content requires
– Relies on standardized Resource Description Framework (RDF) to
define components and flow
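To make the RDF point concrete, here is a minimal sketch of how components and a flow could be written down as subject-predicate-object triples in N-Triples form. The namespace, the property names (`hasComponent`, `feeds`), and the component names are all hypothetical; this is not Meandre's actual RDF vocabulary.

```python
# Hypothetical vocabulary: this namespace and these property names are
# illustrative only and are NOT Meandre's actual RDF schema.
NS = "http://example.org/seasr-sketch#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(local_name):
    """Wrap a local name in the hypothetical namespace as an N-Triples IRI."""
    return f"<{NS}{local_name}>"


def describe_flow(flow_name, components, connections):
    """Return N-Triples lines describing components and how a flow wires them."""
    triples = [(uri(flow_name), f"<{RDF_TYPE}>", uri("Flow"))]
    for name in components:
        triples.append((uri(name), f"<{RDF_TYPE}>", uri("Component")))
        triples.append((uri(flow_name), uri("hasComponent"), uri(name)))
    for source, target in connections:
        triples.append((uri(source), uri("feeds"), uri(target)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_flow(
    "word-count-flow",
    components=["load-text", "tokenize", "count-words"],
    connections=[("load-text", "tokenize"), ("tokenize", "count-words")],
)
print("\n".join(lines))
```

Because the description is plain data, the same flow definition can be stored, shared, or handed to a different execution engine without touching component code.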
9. SEASR Apps, SEASR Plugins, SEASR Web Apps, SEASR Services
Meandre Data-Intensive Flows
SEASR Components, Developer Tools
– Data: Gateway Connections, Data Persistence, Data Transformation
– Analytics: Descriptive Statistics, Predictive Modeling, Discovery, Natural Lang Processing
– Visualization: Graphing, Modeling Vis, Info Vis (small multiples)
Component Repository, Component Discovery
Meandre Infrastructure
Shared Stores, File Systems, Metadata Stores, SOA Gateways
Virtualization Infrastructure
11. Workbench
• Web-based UI
• Components and flows
are retrieved from server
• Additional locations of
components and flows
can be added to server
• Create flow using a
graphical drag and drop
interface
• Change property values
• Execute the flow
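The Workbench steps above (assemble a flow from components, change a property value, execute) can be sketched in a few lines. The `Component` and `Flow` classes below are hypothetical stand-ins written for this illustration and are not the Meandre API; the sketch also assumes a simple linear chain, whereas real flows can be general graphs.

```python
class Component:
    """A processing step with configurable properties (hypothetical, not the Meandre API)."""

    def __init__(self, name, func, **properties):
        self.name = name
        self.func = func
        self.properties = dict(properties)

    def run(self, data):
        return self.func(data, **self.properties)


class Flow:
    """A linear chain of components; real flows can be general graphs."""

    def __init__(self, *components):
        self.components = list(components)

    def set_property(self, component_name, key, value):
        """Change one property value on the named component."""
        for component in self.components:
            if component.name == component_name:
                component.properties[key] = value
                return
        raise KeyError(component_name)

    def execute(self, data):
        """Pass the data through every component in order."""
        for component in self.components:
            data = component.run(data)
        return data


def tokenize(text, lowercase=True):
    words = text.split()
    return [w.lower() for w in words] if lowercase else words


def keep_long_words(words, min_length=4):
    return [w for w in words if len(w) >= min_length]


flow = Flow(
    Component("tokenize", tokenize, lowercase=True),
    Component("filter", keep_long_words, min_length=4),
)
flow.set_property("filter", "min_length", 6)  # "Change property values"
result = flow.execute("Meandre executes data-intensive flows")  # "Execute the flow"
print(result)
```

Keeping properties separate from component logic is what lets a user reconfigure a flow without editing any code.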
13. SEASR @ Work – Zotero
• Plugin to Firefox
• Zotero manages the
collection
• Launch SEASR Analytics
– Citation Analysis uses the
JUNG network importance
algorithms to rank the authors
in the citation network that is
exported as RDF data from
Zotero to SEASR
– Zotero Export to Fedora
through SEASR
– Saves results from SEASR
Analytics to a Collection
• Launch MONK Processing
– MONK DB Ingestion Workflow
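As an illustration of ranking authors in a citation network, the sketch below runs a plain PageRank iteration in pure Python. PageRank is used here only as one familiar example of a network importance measure; the slides do not say which JUNG algorithms the Citation Analysis flow uses, and the author names and citation pairs are made up.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given (citing, cited) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by authors who cite nobody is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
            for n in nodes
        }
        for n in nodes:
            for cited in out_links[n]:
                new_rank[cited] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank


# Made-up citation data: (citing author, cited author).
citations = [
    ("Adams", "Chen"),
    ("Baker", "Chen"),
    ("Chen", "Diaz"),
    ("Adams", "Diaz"),
    ("Baker", "Adams"),
]
scores = pagerank(citations)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

An author scores highly here when cited by authors who are themselves well cited, which is the general idea behind importance measures on citation graphs.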
14. SEASR @ Work – Fedora
Interactive Web
Application
Web Service
15. SEASR @ Work – Entity Mash-up
• Entity Extraction
with OpenNLP
• Locations
viewed on
Google Map
• Dates viewed on
Simile Timeline
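The mash-up idea, sending each kind of extracted entity to a different view, can be sketched as a simple routing step. The `(entity_type, text)` tuples below are a hypothetical stand-in for extractor output, and the type labels and view names are assumptions; no OpenNLP, Google Maps, or Simile call is made.

```python
from collections import defaultdict

# Hypothetical extractor output: (entity_type, text) pairs such as a named
# entity recognizer might produce. The type labels are assumptions.
entities = [
    ("location", "Urbana"),
    ("date", "April 22, 2009"),
    ("person", "Ada Lovelace"),
    ("location", "Chicago"),
    ("location", "Urbana"),
]

# Which (hypothetical) view should receive each entity type.
VIEW_FOR_TYPE = {"location": "map", "date": "timeline"}


def route_entities(pairs):
    """Group unique entity texts by destination view, preserving first-seen order."""
    views = defaultdict(list)
    for entity_type, text in pairs:
        view = VIEW_FOR_TYPE.get(entity_type)
        if view is not None and text not in views[view]:
            views[view].append(text)
    return dict(views)


routed = route_entities(entities)
print(routed)
```

Entity types with no registered view (here, `person`) are simply dropped, so adding a new visualization only means adding one entry to the mapping.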
16. SEASR @ Work – Audio Analysis
• NEMA: Executes a
SEASR flow for each
run
– Loads audio data
– Extracts features for
every 10 sec moving
window of audio
– Loads and applies the
models
– Sends results back to the
WebUI
• NESTER: Annotation of
Audio via Spectral
Analysis
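The "10 sec moving window" step can be sketched as follows. RMS energy is used purely as a placeholder feature, the 5-second hop between windows is an assumption (the slides give only the window length), and the signal is synthetic; none of this reflects the features or models NEMA actually uses.

```python
import math


def window_features(samples, sample_rate, window_sec=10.0, hop_sec=5.0):
    """Return (start_time_sec, rms) for each full window of the signal."""
    window = int(window_sec * sample_rate)
    hop = int(hop_sec * sample_rate)  # assumed hop; not stated in the source
    features = []
    for start in range(0, len(samples) - window + 1, hop):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        features.append((start / sample_rate, rms))
    return features


# Synthetic 30-second signal at a deliberately tiny sample rate:
# a quiet first half followed by a louder second half.
sample_rate = 100
signal = [0.1] * (15 * sample_rate) + [0.5] * (15 * sample_rate)
features = window_features(signal, sample_rate)
for start_time, rms in features:
    print(f"{start_time:5.1f}s  rms={rms:.3f}")
```

Each window yields one feature vector (here a single number), which is what a trained model would then score window by window.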
17. SEASR @ Work – MONK
• Executes flows for each analysis requested
– Predictive modeling using Naïve Bayes
– Predictive modeling using Support Vector Machines (SVM)
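For readers new to Naïve Bayes, here is a minimal multinomial text classifier with add-one smoothing in pure Python. It is a teaching sketch, not MONK's implementation; the class labels and training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect per-class document counts and word counts from (label, text) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for label, text in labeled_docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def classify(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
training = [
    ("sentimental", "tears of tender sorrow and sweet love"),
    ("sentimental", "her heart wept with tender love"),
    ("gothic", "the dark castle hid a terrible secret"),
    ("gothic", "a ghost walked the dark ruined abbey"),
]
model = train_naive_bayes(training)
prediction = classify(model, "a tender heart full of love")
print(prediction)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.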
18. SEASR @ Work – DISCUS
• On-demand usage of analytics while surfing
– While navigating, request analytics to be performed on page
– Text extraction and cleaning
• Summarization and key word extraction
– List the important terms on the page being analyzed
– Provide relevant short summaries
• Visual maps
– Provide a visual representation of the key concepts
– Show the graph of relations between concepts
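The key term and summary steps can be approximated with word frequencies: the most frequent non-stop-words serve as key terms, and the sentence mentioning them most often serves as a short extractive summary. This frequency heuristic is an assumption chosen for brevity; the slides do not describe the method DISCUS uses, and the stop-word list and sample text are illustrative.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on",
              "for", "with", "it", "are", "by"}


def key_terms(text, top_n=3):
    """Return the top_n most frequent non-stop-word terms."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [term for term, _ in Counter(words).most_common(top_n)]


def summarize(text, top_n_terms=3, max_sentences=1):
    """Pick the sentences that mention the most key terms (extractive summary)."""
    terms = set(key_terms(text, top_n_terms))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = sorted(
        sentences,
        key=lambda s: sum(w in terms for w in re.findall(r"[a-z]+", s.lower())),
        reverse=True,
    )
    return scored[:max_sentences]


page_text = (
    "Flows connect components. "
    "Components process text, and flows move text between components. "
    "Scholars read the text."
)
terms = key_terms(page_text, top_n=2)
summary = summarize(page_text)
print(terms)
print(summary)
```

The same term list could also seed a concept graph, for example by linking terms that co-occur in a sentence.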
19. SEASR and UIMA : Emotion Tracking
The goal is a visualization that tracks emotions across a
text document (leveraging flare.prefuse.org)