This document summarizes a talk about the CMS Data Aggregation System (DAS). DAS aggregates metadata from multiple CMS databases to allow users to query across different services. It uses a plug-and-play architecture to integrate new databases in a customizable way while preserving each database's access policies. Benchmark tests showed DAS can aggregate over 500,000 records from two databases into JSON documents within a few seconds by caching results. Future plans include further testing DAS in production and potentially releasing it as open source software.
Data Aggregation System
1. Title slide: a cloud of CMS database names (GenDB, LumiDB, PhEDEx, PSetDB, Data Quality, DBS, SiteDB, RunDB, Overview) surrounding the question "How can I find my data?"
CMS Data Aggregation System
Valentin Kuznetsov, Cornell University
ICCS Workshop, Amsterdam, May 31 - Jun. 2, 2010
2. Talk outline
✤ Introduction
✤ Motivations
✤ What is DAS?
✤ Design, architecture, implementations
✤ Current status & benchmarks
✤ Future plans
3. Introduction
✤ CMS is a general-purpose physics detector built for the LHC
✤ beam collisions every 25 ns, online trigger 300 Hz, event size 1-2 MB
✤ More than 3000 physicists, 183 institutions, 38 countries
✤ CMS uses a distributed computing and data model
✤ 1 Tier-0, 7 Tier-1, O(50) Tier-2, O(50) Tier-3 centers
✤ 2-6 PB/year of real data + an equal amount of simulated data, ~500 GB/year of meta-data
✤ Code: C++/Python; Databases: ORACLE, MySQL, CouchDB, MongoDB ...
4. Motivations ...
✤ A user wants to query different meta-data services without knowing of their existence
✤ A user wants to combine information from different meta-data services
✤ A user has domain knowledge, but needs to query X services, using Y interfaces and dealing with Z data formats, to get the data
(Diagram: each service with its query keys — RunSummary: run, trigger, detector, ...; DataQuality: trigger, ecal, hcal, ...; LumiDB: lumi, luminosity, hltpath; PhEDEx: block, file, block.replica, file.replica, se, node, ...; DBS: block, run, file, site, config, tier, dataset, lumi, parameters, ...; GenDB: MC id, generator, xsection, process, decay, ...; SiteDB: site, admin, site.status, country, node, region, ...; Parameter Set DB: CMSSW parameters; plus generic services A-E with param1, param2, ...)
5. What is DAS?
✤ DAS stands for Data Aggregation System
✤ It is a layer on top of existing data-services
✤ It aggregates data across distributed data-services while preserving their integrity, security policies and data formats
✤ it provides caching for data-services (a side effect)
✤ It represents data in a well-defined format: JSON documents
✤ It allows querying data via free text-based queries
✤ Agnostic to data content
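A free text-based query of the kind DAS accepts (e.g. site=T1_CERN, run=100, the examples given later for the prototype) could be parsed with a sketch like the following; the function name and behaviour are illustrative assumptions, not the actual DAS parser.

```python
# Hypothetical sketch of parsing a free text-based DAS query such as
# "site=T1_CERN, run=100"; the real DAS parser is richer than this.
def parse_query(text):
    conditions = {}
    for clause in text.replace(",", " ").split():
        if "=" in clause:
            key, value = clause.split("=", 1)
            conditions[key] = value     # key-value selection condition
        else:
            conditions[clause] = None   # bare selection key, e.g. "block"
    return conditions

print(parse_query("site=T1_CERN, run=100"))
# → {'site': 'T1_CERN', 'run': '100'}
```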
6. Challenges ...
✤ Combining N data-services is a great idea, but
✤ there is no ready-made IT solution
✤ DAS doesn't hold the data, so it can't have a pre-defined schema
✤ it must support existing APIs, data formats, interfaces and security policies
✤ it must relate and aggregate meta-data
✤ it must be efficient, flexible, scalable and easy to use
✤ Work on a DAS prototype to understand those challenges
7. DAS prototype
✤ Code written in Python, ideal for prototyping
✤ Use existing meta-data from CMS data-services as a test-bed
✤ 8 data-services, 75/250 GB in tables/indexes
✤ Use a document-oriented "schema-less" database: MongoDB
✤ raw cache, merged result cache, mapping and analytics DBs
✤ Support free keyword-based queries, e.g. site=T1_CERN, run=100
✤ Aggregate information using key-value matching
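The key-value matching mentioned in the last bullet can be illustrated with a small sketch (hypothetical function and field names, not the actual DAS code): records fetched from two services are merged whenever they agree on the value of a common key, here "block".

```python
def aggregate(records_a, records_b, key):
    """Merge records from two services that agree on the value of `key`."""
    index = {}
    for rec in records_b:
        index.setdefault(rec[key], []).append(rec)
    for rec in records_a:
        for match in index.get(rec[key], []):
            merged = dict(match)   # start from the service-B record
            merged.update(rec)     # overlay the service-A fields
            yield merged

# Toy records loosely resembling DBS (bookkeeping) and PhEDEx (transfer) output
dbs_records = [{"block": "/A#1", "dataset": "/A", "nevents": 1000}]
phedex_records = [{"block": "/A#1", "site": "T1_CERN", "replicas": 2}]

merged = list(aggregate(dbs_records, phedex_records, "block"))
```

Note that `aggregate` is written as a generator, matching the "Aggregate results (generator)" step of the workflow described later in the deck.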
8. DAS architecture
(Diagram: data-services (runsum, lumidb, phedex, sitedb, dbs) feed the DAS mapping DB, which maps data-service output to DAS records. The DAS core combines a query parser, an aggregator and data-service plugins; it records each query and API call to DAS analytics, stores raw results in the DAS cache and merged results in the DAS merge collection. A DAS robot invokes the same API(params) to update the cache periodically, fetching popular queries/APIs from DAS analytics. A DAS cache server exposes a RESTful interface consumed by the DAS web server and UI.)
9. DAS workflow
✤ Query parser parses the input query
✤ Query the DAS merge collection; on a hit, return the stored results
✤ Otherwise, query the DAS cache collection
✤ On a miss, invoke calls to the data-services (resolved via DAS mapping)
✤ write each query/API call to DAS analytics
✤ Aggregate results (as a generator) and present them via the Web UI
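The cache-then-fetch steps above can be sketched as follows (hypothetical function and collection names; the real DAS stores these collections in MongoDB and its aggregation is far richer than the placeholder here):

```python
def aggregate(raw_records):
    # Placeholder for DAS aggregation: simply de-duplicate raw records
    seen, merged = set(), []
    for rec in raw_records:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            merged.append(rec)
    return merged

def das_lookup(query, merge_cache, raw_cache, fetch_from_services):
    if query in merge_cache:        # 1. merged-result cache hit
        return merge_cache[query]
    if query in raw_cache:          # 2. raw cache hit: aggregate on the fly
        result = aggregate(raw_cache[query])
    else:                           # 3. miss: call the underlying data-services
        raw = fetch_from_services(query)
        raw_cache[query] = raw
        result = aggregate(raw)
    merge_cache[query] = result
    return result

# Usage: the second lookup is served from the merge cache, so the
# data-services are only contacted once.
merge_cache, raw_cache, calls = {}, {}, []
def fetch(q):
    calls.append(q)
    return [{"run": 100}, {"run": 100}]

first = das_lookup("run=100", merge_cache, raw_cache, fetch)
second = das_lookup("run=100", merge_cache, raw_cache, fetch)
```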
10. DAS and data-services
✤ DAS is data-service agnostic
✤ a data-service is identified by its URI and input parameters
✤ Use plug-and-play mechanism:
✤ add new data-service using ASCII map file (URI, parameters, ...)
✤ use generic HTTP access and standard data-parsers (XML, JSON)
✤ Use dedicated plugin:
✤ specific access requirements, custom parsers, etc.
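The plug-and-play idea above could look roughly like this (an illustrative sketch; the class and handler names are assumptions, not DAS code): services with standard output get a generic parsing path, while services with special requirements register a dedicated plugin.

```python
import json

def generic_json_handler(payload):
    """Generic parser for services returning standard JSON."""
    return json.loads(payload)

class ServiceRegistry:
    def __init__(self):
        self._handlers = {}

    def register(self, name, handler=generic_json_handler):
        # Default to the generic handler; services with specific access
        # requirements supply a dedicated plugin instead.
        self._handlers[name] = handler

    def parse(self, name, payload):
        return self._handlers[name](payload)

registry = ServiceRegistry()
registry.register("sitedb")                                # generic JSON path
registry.register("runsum", lambda text: [{"raw": text}])  # dedicated plugin
```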
11. DAS map files
system : google_maps
format : JSON
---
urn : google_geo_maps
url : "http://maps.google.com/maps/geo"
expire : 30
params : { "q" : "required", "output": "json" }
daskeys : [
{"key":"city","map":"city.name","pattern":""},
]
Data Service: URL/api?params
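For illustration, a map file like the one on this slide could drive a generic HTTP request builder along these lines (a sketch under assumed semantics of the `params` field; `build_request` is not a real DAS function):

```python
from urllib.parse import urlencode

# The service map from the slide, represented as a Python dict
service_map = {
    "system": "google_maps",
    "format": "JSON",
    "urn": "google_geo_maps",
    "url": "http://maps.google.com/maps/geo",
    "expire": 30,
    "params": {"q": "required", "output": "json"},
    "daskeys": [{"key": "city", "map": "city.name", "pattern": ""}],
}

def build_request(smap, user_input):
    """Fill required parameters from user input and build the service URL."""
    params = {}
    for name, default in smap["params"].items():
        if default == "required":
            params[name] = user_input[name]  # must be supplied by the user
        else:
            params[name] = default           # fixed value from the map file
    return smap["url"] + "?" + urlencode(params)

url = build_request(service_map, {"q": "Amsterdam"})
```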
12. DAS benchmark
✤ Fetch all blocks from our bookkeeping (DBS) and data transfer (PhEDEx) CMS data-services
✤ parse, remap notations, store to cache, merge matched records (aggregation)
✤ Linux 64-bit, 1 CPU for DAS, 1 CPU for MongoDB, record size ~1 KB
✤ Elapsed time = retrieval time + parsing time + remapping time + cache insertion/indexing time + output creation time

              Format   Records   Time, no cache   Time w/ cache
DBS yield     XML      387K      68s              0.98s
PhEDEx yield  XML      190K      107s             0.98s
Merge step    JSON     577K      63s              0.9s
DAS total     JSON     393K      238s             2.05s

393K DAS records; create ~6K docs/s, read ~7.6K docs/s
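As a side note, the no-cache timings reported on this slide are internally consistent, which a trivial check confirms:

```python
# Consistency check on the no-cache column: the "DAS total" time equals
# the sum of the DBS retrieval, PhEDEx retrieval and merge steps.
dbs_s, phedex_s, merge_s = 68, 107, 63
total_s = dbs_s + phedex_s + merge_s
print(total_s)  # → 238, matching the "DAS total" row
```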
13. Future plans
✤ DAS goes into production this year in CMS:
✤ confirm scalability, transparency and durability with various data-services
✤ work on analytics to organize pre-fetch strategies
✤ Apply to other domain disciplines
✤ Release as open source
14. Summary
✤ The Data Aggregation System is data agnostic and allows querying and aggregating meta-data in a customizable way
✤ The current architecture easily integrates with existing data-services, preserving their access, security policies and development cycles
✤ DAS is designed to work with existing CMS data-services, but can easily go beyond that boundary
✤ The plug-and-play mechanism makes it easy to add new data-services and configure DAS for a specific domain