Poster: JSOE Research Expo 2009
PALMS-CI: A Policy-driven Cyberinfrastructure
for the Exposure Biology Community
Barry Demchak (bdemchak@ucsd.edu) and Ingolf Krüger
California Institute for Telecommunications and Information Technology, San Diego Division
Requirements
Functional (FRs)
• Support research workflow
• Allow multiple investigators & studies
• Support community contribution of
device profiles, calculations, and
visualizers
• Share raw data & processed results
while maintaining provenance
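One way to picture the provenance requirement above is a record that ties each processed result back to the raw inputs it was derived from. The following is a minimal Python sketch under stated assumptions, not the PALMS-CI implementation (which the poster describes as Java-based): the names `content_id`, `ProvenanceRecord`, and `derive` are hypothetical, and identifying data by SHA-256 digest is an illustrative choice.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


def content_id(payload: bytes) -> str:
    """Identify a data item by the SHA-256 digest of its bytes (an assumption)."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record linking a processed result to its raw sources."""
    result_id: str                # digest of the processed result
    source_ids: Tuple[str, ...]   # digests of the raw inputs
    calculation: str              # name of the contributed calculation
    investigator: str             # who ran the calculation


def derive(raw_items: Sequence[bytes],
           calculation: str,
           func: Callable[[Sequence[bytes]], bytes],
           investigator: str) -> Tuple[bytes, ProvenanceRecord]:
    """Run a calculation and return the result together with its provenance."""
    result = func(raw_items)
    record = ProvenanceRecord(
        result_id=content_id(result),
        source_ids=tuple(content_id(item) for item in raw_items),
        calculation=calculation,
        investigator=investigator,
    )
    return result, record
```

Because the record stores digests rather than copies, a shared result could be checked against its raw inputs without re-sending the raw data itself.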
Quality (QRs)
• Dynamic access control
• Confidentiality and privacy
(HIPAA/IRB)
• High availability and reliability
• Scalability (bandwidth/storage/users)
• Auditability
Challenges
Early Identification and modelling
• Stakeholders
• Quality requirements (QRs)
• Crosscutting concerns
Policy Definition and Execution
Agile development process
• Responsive to changing requirements
Future-proof architecture
• Ease of maintainability and evolution
while minimizing risk to operations
Rich Services [3]
Service Oriented Architecture (SOA)
• Based on the composite pattern (i.e.,
system-of-systems), messaging
pattern, routing pattern, role-based
interactions, and choreography
• Crosscutting concerns (including
policy evaluation) as interceptors
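The idea of handling crosscutting concerns as interceptors can be sketched as a router that passes every message through a chain of interceptors before it reaches a service. This is an illustrative Python sketch only, not the Rich Services code base (the poster's implementation is a Java-based Enterprise Service Bus): `Router`, `make_audit`, `make_policy`, and the message fields `from`, `to`, and `body` are all hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Message = Dict[str, str]
# An interceptor returns the (possibly modified) message, or None to block it.
Interceptor = Callable[[Message], Optional[Message]]


class Router:
    """Routes messages to services after running every interceptor in order."""

    def __init__(self) -> None:
        self._interceptors: List[Interceptor] = []
        self._services: Dict[str, Callable[[Message], str]] = {}

    def add_interceptor(self, interceptor: Interceptor) -> None:
        self._interceptors.append(interceptor)

    def register(self, name: str, service: Callable[[Message], str]) -> None:
        self._services[name] = service

    def send(self, message: Message) -> Optional[str]:
        current: Optional[Message] = message
        for intercept in self._interceptors:
            current = intercept(current)
            if current is None:      # a concern (e.g., policy) blocked the message
                return None
        return self._services[current["to"]](current)


def make_audit(log: List[Tuple[str, str]]) -> Interceptor:
    """Auditing concern: record sender and destination of every message."""
    def audit(message: Message) -> Optional[Message]:
        log.append((message["from"], message["to"]))
        return message
    return audit


def make_policy(allowed: Set[Tuple[str, str]]) -> Interceptor:
    """Policy concern: pass only (sender, destination) pairs that are allowed."""
    def policy(message: Message) -> Optional[Message]:
        return message if (message["from"], message["to"]) in allowed else None
    return policy
```

In this sketch the services never see the audit or policy logic, so a concern can be added or replaced on the router without changing any service.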
Agile Development Framework
• End-to-end model-driven approach
• Early & continuous identification and
prioritization of crosscutting concerns
Results
Models
• Use cases, domain models, services
Implementation
• Java-based Enterprise Service Bus
• Standards-based messaging
• Storage virtualization based on OSS
• Inversion of Control creates worker
threads on demand
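As a rough illustration of Inversion of Control with worker threads created on demand, the sketch below lets a dispatcher (the framework side) call registered handlers on a thread pool. It is a Python analogy under stated assumptions, not the poster's Java implementation: `Dispatcher`, its methods, and the topic names are hypothetical. Python's `ThreadPoolExecutor` starts worker threads as tasks are submitted, up to `max_workers`.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict


class Dispatcher:
    """Calls registered handlers on pool threads; callers never manage threads."""

    def __init__(self, max_workers: int = 4) -> None:
        # Worker threads are started as tasks arrive, up to max_workers.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._handlers: Dict[str, Callable[[Any], Any]] = {}

    def register(self, topic: str, handler: Callable[[Any], Any]) -> None:
        """Inversion of Control: the handler is supplied, the dispatcher invokes it."""
        self._handlers[topic] = handler

    def dispatch(self, topic: str, payload: Any) -> Future:
        """Submit the payload to the handler for this topic on a worker thread."""
        return self._pool.submit(self._handlers[topic], payload)

    def shutdown(self) -> None:
        """Wait for outstanding work and release the worker threads."""
        self._pool.shutdown(wait=True)
```

A caller would register a handler once, call `dispatch` for each incoming message, and read the outcome from the returned future.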
Features
• Rapid incorporation of emergent data
sources at low risk to existing users
• Seamless incorporation of novel
intermediary services (e.g., policy)
• Easy integration with new clients & CIs
• Scales easily to high usage while
maintaining high performance
Future improvements
• Policy-driven crosscutting concerns
(e.g., IA & HIPAA, scaling, failure
mitigation, self-configuration)
• Migration to cloud
This material is based upon work supported by the National Institutes of
Health under Grant No. 1U01CA130771-01 (Project PALMS: Kevin Patrick,
PI) and the National Science Foundation under Grant No. CCF-0702791.
PALMS
Physical Activity Location Measurement System, built to understand where activity-related energy expenditure occurs in
humans as a function of time and space. It harvests data from wearable devices on small and large scales, provides a
framework for research and analysis, and has the ultimate goal of discovering methods for engineering better health.
Cyberinfrastructures (CI) [2]
An Internet-based research computing environment that supports data acquisition, data storage, data management,
data integration, data mining, data visualization, and other computing and information processing services. Different
stakeholders produce, consume, manage, and govern a CI, and their requirements must be simultaneously met or else
the integrity of the CI degrades.
Information Assurance (IA) [1]
Science encompassing reliable information delivery to intended parties under appropriate circumstances. Defined by the
National Security Agency (NSA) as information availability, integrity, confidentiality, non-repudiation, and access control.
Demanded by all or most CI stakeholders as a condition of participating in the CI.
[Figure: Functional Requirements (FRs). Labels: Collect data, Store/organize, Analyze, Visualize]
References
1. W. McNight. What is Information Assurance? Crosstalk: The Journal of Defense Software Engineering. July 2002.
2. Revolutionizing Science and Engineering Through Cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on
Cyberinfrastructure. Washington, DC: National Science Foundation, January 2003. http://www.nsf.gov/cise/sci/reports/atkins.pdf
3. M. Arrott, B. Demchak, V. Ermagan, C. Farcas, E. Farcas, I. H. Krüger, and M. Menarini, Rich Services: The Integration Piece of the SOA Puzzle. In
Proceedings of the IEEE International Conference on Web Services (ICWS), Salt Lake City, Utah, USA. IEEE, Jul. 2007, pp. 176-183.
Quality Requirements (QRs)
Controlled Access
Secure
Reliable
Reusable
Manageable
Maintainable
Scalable
Performant
Highly Available
High Data Integrity
Confidential
HIPAA-compliant
Auditable
Robust
[Figure: Rich Service Development Process. Panels: Service Elicitation (Use Case Graph, Roles U1-U5, Concerns C1-C4 and CC1-CC3, Domain Model, Role Domain Model R1-R6); Rich Service Architecture (Rich Services Virtual Network: Services, Router/Interceptor, Messenger/Communicator, S/D, RIS, RAS1-RAS7, CC1-CC5); System Architecture Definition (System of Systems Topology with hosts H1-H9; Infrastructure Mapping H1:RAS1, H2:RAS2, H3:CC1, H4:RAS3, H5:RAS2, H6:RAS5, H7:RAS7, H8:RAS7, H9:RAS6; Optimization; Implementation). Process labels: Logical Architecture Loop, Deployment Loop, Analysis, Identification, Definition, Consolidation, Refinement, Hierarchic composition, Synthesis, Logical Model.]
[Figure: Rich Service Architecture, PALMS Browser]