Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote given at IEEE Meta-Data, Bethesda, MD, April 6, 1999.
Embarcadero ER/Studio helps companies document and enhance existing databases, improve data consistency, effectively communicate models across the enterprise, and model more than just data. With many additional features that Sybase PowerDesigner lacks, ER/Studio brings clarity to complex data models.
Presentation given by Chris Welty (IBM Research) at Knoesis. We have Chris Welty's permission to upload this presentation. Event details are at http://j.mp/Welty-at-Knoesis and the associated video is at https://www.youtube.com/watch?v=grDKpicM5y0
Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose of finding relevant datasets, thus showing that our approach improves reusability of LOD datasets.
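The pipeline described above can be sketched in a few lines. This is a hedged illustration, not the paper's actual system: `TYPE_INDEX` is a toy stand-in for a Freebase-style type lookup, and a simple majority vote over entity types stands in for the full topic-domain identification method.

```python
from collections import Counter

# Toy stand-in for a Freebase-style entity-type lookup (hypothetical data).
TYPE_INDEX = {
    "Aspirin": "medicine", "Ibuprofen": "medicine",
    "Paris": "location", "Berlin": "location",
}

def topic_domains(entity_labels, top_k=1):
    """Guess a dataset's topic domain(s) by majority vote over the
    knowledge-base types of a sample of its entity labels."""
    votes = Counter(TYPE_INDEX[e] for e in entity_labels if e in TYPE_INDEX)
    return [domain for domain, _ in votes.most_common(top_k)]

print(topic_domains(["Aspirin", "Ibuprofen", "Paris"]))  # ['medicine']
```

In practice the lookup would hit a large knowledge source and the vote would be weighted, but the shape of the computation is the same.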
Talk given by Prof. Amit Sheth at the ICMSE-MGI Digital Data Workshop held at the Kno.e.sis Center, November 13-14, 2013.
Workshop page: http://wiki.knoesis.org/index.php/ICMSE-MGI_Digital_Data_Workshop
Amit Sheth, Pramod Anantharam, Krishnaprasad Thirunarayan, "kHealth: Proactive Personalized Actionable Information for Better Healthcare", Workshop on Personal Data Analytics in the Internet of Things at VLDB2014, Hangzhou, China, September 5, 2014.
Accompanying Video: http://youtu.be/pqcbwGYHPuc
Paper: http://www.knoesis.org/library/resource.php?id=2008
Amit Sheth, "Semantic Computing in Real-World: Vertical and Horizontal Applications, within Enterprise and on the Web," Panel Presentation at the International Conference on Semantic Computing (ICSC2011), Palo Alto, CA, September 20, 2011.
A statistical, schema-independent approach to determining equivalent properties between linked datasets. The approach uses the interlinking between datasets and the extensions of properties to establish their equivalence.
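As an illustrative sketch of the extension-based idea (the specific measure here is an assumption, not necessarily the one the work uses): treat each property's extension as a set of (subject, object) pairs, map subjects across datasets via their interlinks, and score candidate equivalence by set overlap, e.g. Jaccard similarity.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def property_equivalence(ext1, ext2, same_as):
    """Compare two property extensions (sets of (subject, object) pairs)
    after mapping dataset-1 subjects to dataset-2 identifiers via
    sameAs-style interlinks."""
    mapped = {(same_as.get(s, s), o) for s, o in ext1}
    return jaccard(mapped, ext2)

# Hypothetical extensions of two birthplace-like properties.
ext_a = {("db:Einstein", "Ulm"), ("db:Curie", "Warsaw")}
ext_b = {("x:Einstein", "Ulm"), ("x:Curie", "Warsaw"), ("x:Bohr", "Copenhagen")}
links = {"db:Einstein": "x:Einstein", "db:Curie": "x:Curie"}

print(property_equivalence(ext_a, ext_b, links))  # 2 shared of 3 distinct pairs
```

A high overlap score suggests the two properties describe the same relation even though their schemas never mention each other.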
Krishnaprasad Thirunarayan, Pramod Anantharam, Cory Henson, and Amit Sheth, 'Trust Networks', In: 5th Indian International Conference on Artificial Intelligence (IICAI-11), December 14-16, 2011 (invited tutorial).
Krishnaprasad Thirunarayan and Amit Sheth: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications, In: Proceedings of AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013.
With the rapid proliferation of mobile phones, social media, and sensors, it is critical to collect the big data so generated and convert it into actionable information that is relevant for decision making. In this session, we explore challenges and approaches for synthesizing relevant background knowledge and inferences that can enable smart healthcare and ultimately benefit the community at large.
Paper: http://www.knoesis.org/library/resource.php?id=1903
Harshal Patni, "Real Time Semantic Analysis of Streaming Sensor Data," MS Thesis Defense, Kno.e.sis Center, Wright State University, Dayton, OH, March 21, 2011.
More at: http://wiki.knoesis.org/index.php/SSW
Thesis Advisor: Prof. Amit Sheth
Cursing is not uncommon during conversations in the physical world: 0.5% to 0.7% of all the words we speak are curse words; for scale, 1% of all words are first-person plural pronouns (e.g., we, us, our). On social media, people can instantly chat with friends without face-to-face interaction, usually in a more public fashion and broadly disseminated through highly connected social networks. Will these distinctive features of social media lead to a change in people's cursing behavior? In this paper, we examine the characteristics of cursing activity on a popular social media platform, Twitter, involving the analysis of about 51 million tweets and about 14 million users. In particular, we explore a set of questions that prior studies have recognized as crucial for understanding cursing in offline communications, including the ubiquity, utility, and contextual dependencies of cursing.
Original paper: http://knoesis.org/library/resource.php?id=1937
Pavan Kapanipathi, Prateek Jain, Chitra Venkataramani, Amit Sheth, User Interests Identification on Twitter Using a Hierarchical Knowledge Base, ESWC 2014, May 2014.
Paper at: http://j.mp/user-ig
More at: http://wiki.knoesis.org/index.php/Hierarchical_Interest_Graph
Invited talk presented by Hemant Purohit (http://knoesis.org/researchers/hemant) at the NCSU workshop on IT for sustainable tourism development. The talk presents the application of technology developed for crisis coordination to more general marketplace coordination via social media, helping suppliers (micro-entrepreneurs) and demanders (tourists).
The recent emergence of the "Linked Data" approach for publishing data represents a major step forward in realizing the original vision of a web that can "understand and satisfy the requests of people and machines to use the web content" - i.e. the Semantic Web. This new approach has resulted in the Linked Open Data (LOD) Cloud, which includes more than 70 large datasets contributed by experts belonging to diverse communities such as geography, entertainment, and life sciences. However, the current interlinks between datasets in the LOD Cloud - as we will illustrate - are too shallow to realize much of the benefits promised. If this limitation is left unaddressed, the LOD Cloud will merely be more data that suffers from the same kinds of problems that plague the Web of Documents, and the vision of the Semantic Web will fall short.
This thesis presents a comprehensive solution to the problem of alignment and relationship identification using a bootstrapping-based approach. By alignment we mean the process of determining correspondences between classes and properties of ontologies. We identify subsumption, equivalence, and part-of relationships between classes, and part-of relationships between instances. Between properties we establish subsumption and equivalence relationships. By bootstrapping we mean utilizing the information contained within the datasets to improve the data within them. The work showcases the use of bootstrapping-based methods to identify and create richer relationships between LOD datasets. The BLOOMS project (http://wiki.knoesis.org/index.php/BLOOMS) and the PLATO project, both built as part of this research, provide evidence of the feasibility and applicability of the solution.
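A toy illustration of the extension-based flavor of such alignment (a simplified sketch; BLOOMS itself aligns ontologies via a Wikipedia-derived category hierarchy, not instance sets): infer equivalence or subsumption between two classes from containment of their instance sets. The names and the 0.9 threshold below are hypothetical.

```python
def align(instances_a, instances_b, threshold=0.9):
    """Classify the relation between two classes from instance overlap:
    equivalence if each largely contains the other, subsumption if
    containment holds in one direction only."""
    if not instances_a or not instances_b:
        return "unknown"
    shared = len(instances_a & instances_b)
    a_in_b = shared / len(instances_a)   # fraction of A's instances also in B
    b_in_a = shared / len(instances_b)   # fraction of B's instances also in A
    if a_in_b >= threshold and b_in_a >= threshold:
        return "equivalent"
    if a_in_b >= threshold:
        return "subsumed-by"   # A is a subclass of B
    if b_in_a >= threshold:
        return "subsumes"      # A is a superclass of B
    return "unrelated"

people = {"alice", "bob", "carol", "dave"}
musicians = {"alice", "bob"}
print(align(musicians, people))  # 'subsumed-by'
```

The bootstrapping angle is that relations discovered this way can enrich the datasets, which in turn exposes more overlap for the next alignment pass.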
Semantic Interoperability & Information Brokering in Global Information Systems (Amit Sheth)
Amit Sheth, "Semantic Interoperability and Information Brokering in Global Information Systems," Keynote talk at IEEE-Metadata Conference, Bethesda, MD, USA, April 6, 1999.
Key coverage:
- Use of ontologies for semantic interoperability (http://knoesis.org/library/resource.php?id=00277)
- InfoHarness (http://knoesis.org/library/resource.php?id=00275) and VisualHarness (http://knoesis.org/library/resource.php?id=00267) demonstrate faceted search
- MREF - putting metadata on HREF, way ahead of its time (see: http://knoesis.org/library/resource.php?id=00294)
- Multi-ontology query processing in the OBSERVER system (http://knoesis.org/library/resource.php?id=00273)
In this presentation we review some of the research problems we address at EPFL in the area of sensor data management. At the infrastructure level we have developed: a middleware to seamlessly integrate, aggregate, and analyze heterogeneous sensor data streams in real time; a wiki-based repository supporting the cooperative management of the metadata associated with sensor deployments; and a cloud-based storage infrastructure. An important problem in managing sensor data is its efficient storage and transmission using compression techniques; to that end we apply model-based compression methods. For analyzing sensor data, we have developed methods to dynamically estimate variability, which can be readily used for outlier detection, and to extract semantic features from GPS sensor data streams. We also investigate techniques for trading off the accuracy of the sensor data obtained against the degree of privacy preservation that can be maintained.
The Sensor Data Management presentation was given by Karl Aberer (Ecole Polytechnique Federale de Lausanne) at the PlanetData project meeting held February 28 - March 4, 2011 in Innsbruck, Austria.
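Model-based compression of the kind mentioned above can be sketched with a piecewise-constant model (a generic illustration, not the EPFL middleware): a segment's model value is kept as long as each new reading stays within an error bound eps, so only (start_index, value) pairs need to be stored or transmitted.

```python
def compress(readings, eps=0.5):
    """Piecewise-constant model compression: emit (start_index, value)
    segments; every reading in a segment is within eps of its model value."""
    segments = []
    for i, x in enumerate(readings):
        if not segments or abs(x - segments[-1][1]) > eps:
            segments.append((i, x))  # start a new segment modeled by x
    return segments

def decompress(segments, n):
    """Reconstruct an approximate stream of length n from the segments."""
    out = []
    for (start, value), nxt in zip(segments, segments[1:] + [(n, None)]):
        out.extend([value] * (nxt[0] - start))
    return out

data = [20.0, 20.2, 20.1, 23.5, 23.4, 23.6]
model = compress(data)                      # two segments instead of six values
approx = decompress(model, len(data))
assert all(abs(a - b) <= 0.5 for a, b in zip(data, approx))
```

Real deployments fit richer models (e.g., linear or regression-based segments), but the storage/accuracy trade-off controlled by eps is the same.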
Semantic Interoperability in Infocosm: Beyond Infrastructural and Data Interoperability (Amit Sheth)
Amit Sheth, Keynote: International Conference on Interoperating Geographic Systems (Interop'97), Santa Barbara, December 3-4, 1997.
Related technical paper: http://knoesis.org/library/resource.php?id=00230
Creating a RAD Authoritative Data Environment (anicewick)
Sharing data in agencies can be a burden: with users placing data in numerous desktop packages, sharing becomes nearly impossible. However, new RAD tools allow quick web applications to be developed that replace Excel, MS Access, and FileMaker data stores with real, controlled, authoritative database integration.
This presentation defines both the problem space, and the proposed solution.
See www.data4USA.com for more information
GridMate - End to end testing is a critical piece to ensure quality and avoid regressions (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced performance (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect personal devices and information.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI with OpenAI's advanced natural language processing capabilities in a test automation solution.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis at the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps means. We closed with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
Generative AI Deep Dive: Advancing from Proof of Concept to Production (Aggregage)
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
IEEE Metadata Conference 1999 Keynote - Amit Sheth
1. Bethesda, Maryland, April 6, 1999
Amit Sheth
Large Scale Distributed Information Systems Lab
University of Georgia
http://lsdis.cs.uga.edu
2. Three perspectives to GlobIS
• Information Integration Perspective: distribution, autonomy, heterogeneity (terminological, semantic, contextual)
• Information Brokering Perspective: data, meta-data, knowledge, information
• "Vision" Perspective: connectivity, computing, data
3. Evolving targets and approaches in integrating data and information (a personal perspective)
Vision: a society for ubiquitous exchange of (tradeable) information in all digital forms of representation; information anywhere, anytime, in any form
• Generation III (1997...): ADEPT, DL-II projects, InfoQuilt
• Generation II (1990s): InfoSleuth, KMed, DL-I projects, VisualHarness, Infoscopes, HERMES, SIMS, InfoHarness, Garlic, TSIMMIS, Harvest, RUFUS, ...
• Generation I (1980s): Mermaid, Multibase, MRDSM, ADDS, DDTS, IISS, Omnibase, ...
4. Generation I
•Data recognized as corporate resource — leverage it!
• Data predominantly in structured databases, different data models,
transitioning from network and hierarchical to relational DBMSs
• Heterogeneity (system, modeling and schematic) as well as need to
support autonomy posed main challenges;
major issues were data access and connectivity
• Information integration through Federated architecture
• Support for corporate IS applications as the primary objective,
update often required, data integrity important
5. Generation I
(heterogeneity in FDBMSs)
1980s, Database System level:
• semantic heterogeneity
• differences in DBMS data models (abstractions, constraints, query languages)
• system-level support (concurrency control, commit, recovery)
1970s, Operating System level:
• file system; naming, file types, operations
• transaction support
• IPC
Hardware/System level:
• instruction set
• data representation/coding
• configuration
(Communication issues cut across all of these levels.)
6. Generation I
(Federated Database Systems: Schema Architecture)
Five-level schema architecture, top to bottom:
• External Schemas (one or more)
• Federated Schema (obtained by schema integration of the export schemas)
• Export Schemas (one per component)
• Component Schemas (expressed in a common/canonical data model)
• Local Schemas (mapped to component schemas by schema translation)
• Component DBSs
Key points:
• dimensions for interoperability and integration: distribution, autonomy, and heterogeneity
• model heterogeneity: handled by translating local schemas into the common/canonical data model
• information sharing while preserving autonomy
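The five-level architecture above can be sketched in miniature. This is a hedged illustration only: the relation names, the rename table, and the dictionary-based "data model" are all invented for the example; real FDBSs translate complete schemas between full data models.

```python
# Minimal sketch of the five-level FDBS schema architecture.
# All names here are hypothetical.

# Local schemas, each in its own (grossly simplified) native model
local_a = {"model": "relational", "EMP": ["eno", "name", "dept"]}
local_b = {"model": "hierarchical", "WORKER": ["id", "fullname"]}

def translate(local):
    """Schema translation: local schema -> component schema in a
    common/canonical data model (here: a flat relational dict)."""
    renames = {"WORKER": "EMP", "id": "eno", "fullname": "name"}
    return {renames.get(k, k): [renames.get(a, a) for a in attrs]
            for k, attrs in local.items() if k != "model"}

def export(component, allowed):
    """Export schema: the subset a component chooses to share,
    preserving its autonomy."""
    return {r: attrs for r, attrs in component.items() if r in allowed}

def federate(*exports):
    """Federated schema: integration of the export schemas."""
    fed = {}
    for ex in exports:
        for r, attrs in ex.items():
            fed.setdefault(r, set()).update(attrs)
    return fed

comp_a, comp_b = translate(local_a), translate(local_b)
fed = federate(export(comp_a, {"EMP"}), export(comp_b, {"EMP"}))
print(fed)  # one integrated EMP relation over both components
```

External schemas (per-user views over `fed`) are omitted for brevity.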
7. Generation I
(characterization of schematic conflicts in multidatabase systems)
Schematic conflicts (Sheth & Kashyap, Kim & Seo):
• Domain Definition Incompatibility: naming conflicts, data representation conflicts, data scaling conflicts, data precision conflicts, default value conflicts, attribute integrity constraint conflicts
• Entity Definition Incompatibility: database identifier conflicts, naming conflicts, schema isomorphism conflicts, missing data items conflicts
• Data Value Incompatibility: known inconsistency, temporal inconsistency, acceptable inconsistency
• Abstraction Level Incompatibility: generalization conflicts, aggregation conflicts
• Schematic Discrepancies: data value attribute conflicts, entity attribute conflicts, data value entity conflicts
BUT these techniques for dealing with schematic heterogeneity do not directly map to dealing with a much larger variety of heterogeneous media
8. Generation II
• Significant improvements in computing and connectivity (standardization
of protocol, public network, Internet/Web); remote data access as given;
• Increasing diversity in data formats, with focus on variety of textual data
and semi-structured documents
• Many more data sources, heterogeneous information sources,
but not necessarily better understanding of data
• Use of data beyond traditional business applications:
mining + warehousing, marketing, e-commerce
• Web search engines for keyword based querying against HTML pages;
attribute-based querying available in a few search systems
• Use of metadata for information access; early work on ontology support,
with distribution applied to metadata in some cases
• Mediator architecture for information management
9. Generation II
(limited types of metadata; extractors, mappers, wrappers)
Sources: newswires (UPI, AP, Nexis, ...), documents, data stores, digital videos, digital images, digital audios, digital maps, global/enterprise Web repositories, ...
EXTRACTORS derive METADATA from these heterogeneous sources.
Example query: "Find Marketing Manager positions in a company that is within 15 miles of San Francisco and whose stock price has been growing at a rate of at least 25% per year over the last three years" (Junglee, SIGMOD Record, Dec. 1997)
10. Generation II
(a metadata classification: the information pyramid)
From bottom to top (move up the pyramid to tackle information overload!!):
• Data (heterogeneous types/media)
• Content Independent Metadata (creation-date, location, type-of-sensor, ...)
• Content Dependent Metadata (size, max colors, rows, columns, ...)
• Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
• Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML/SGML Document Type Definitions, C program structure, ...)
• Domain Specific Metadata (area, population (Census); land-cover, relief (GIS); concept descriptions from ontologies)
• Domain Models, Classifications, Ontologies
• User
Metadata standards: general purpose (Dublin Core, MCF); domain/industry specific (Geographic: FGDC, UDK, ...; Library: MARC, ...)
12. What's next (after comprehensive use of metadata)?
Query processing and information requests
NOW:
• traditional queries based on keywords
• attribute-based queries
• content-based queries
NEXT:
• 'high-level' information requests involving ontology-based, iconic, mixed-media, and media-independent information requests
• user-selected ontology, use of profiles
13. GIS Data Representation – Example
Multiple heterogeneous metadata models use different tag names for the same data in the same GIS domain (Kansas State example):
• FGDC "Theme keywords" vs. UDK "Search terms": digital line graph, hydrography, transportation...
• FGDC "Title" vs. UDK "Topic": Dakota Aquifer
• FGDC "Online linkage" vs. UDK "Address Id": http://gisdasc.kgs.ukans.edu/dasc/
• FGDC "Direct Spatial Reference Method" vs. UDK "Measuring Techniques": Vector
• FGDC "Horizontal Coordinate System Definition" vs. UDK "Co-ordinate System": Universal Transverse Mercator
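One way to cope with such tag-name heterogeneity is an explicit mapping between the two metadata models. The sketch below uses only the tag pairs shown above; the function name and record format are our own illustration, not part of any FGDC/UDK tooling.

```python
# Hedged sketch: bridging tag-name heterogeneity between two GIS
# metadata models with an explicit mapping (tag pairs from the slide).
FGDC_TO_UDK = {
    "Theme keywords": "Search terms",
    "Title": "Topic",
    "Online linkage": "Address Id",
    "Direct Spatial Reference Method": "Measuring Techniques",
    "Horizontal Coordinate System Definition": "Co-ordinate System",
}

def to_udk(fgdc_record):
    """Re-tag an FGDC metadata record with UDK tag names;
    unmapped tags are kept as-is."""
    return {FGDC_TO_UDK.get(tag, tag): value
            for tag, value in fgdc_record.items()}

record = {"Title": "Dakota Aquifer",
          "Direct Spatial Reference Method": "Vector"}
print(to_udk(record))
# {'Topic': 'Dakota Aquifer', 'Measuring Techniques': 'Vector'}
```

Such one-to-one mappings only resolve naming heterogeneity; the semantic-level differences discussed later in the talk need more than a lookup table.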
14. Generation III
• Increasing information overload and broader variety of information
content (video content, audio clips etc) with increasing amount of visual
information, scientific/engineering data
• Continued standardization related to Web for representational and metadata
issues (MCF, RDF, XML)
• Changes in Web architecture; distributed computing (CORBA, Java)
• Users demand simplicity, but complexities continue to rise
• Web is no longer just another information source, but decision support through
"data mining and information discovery, information fusion, information
dissemination, knowledge creation and management"; "information management
complemented by cooperation between the information system and humans"
• Information Brokering Architecture proposed for information management
15. Information Brokering: An Enabler for the Infocosm
INFORMATION CONSUMERS: people, programs, corporations, universities, government
INFORMATION PROVIDERS: newswires, corporations, universities, research labs (information systems, data repositories, ...)
Information brokering mediates between consumer queries/information requests and the information/data overload of the providers:
• arbitration between information consumers and providers for resolving information impedance
• dynamic reinterpretation of information requests for determination of relevant information services and products
• dynamic creation and composition of information products
16. Information Brokering: Three Dimensions
Three dimensions:
• participants: consumers, brokers, providers
• content: data, metadata, vocabulary
• levels: system, syntax, structure, semantics
Objective: reduce the problem of knowing the structure and semantics of data in the huge number of information sources on a global scale to understanding and navigating a significantly smaller number of domain ontologies.
17. What else can Information Brokering do?
WWW:
• a confusing heterogeneity of media and formats (a Tower of Babel)
• information correlation using physical (HREF) links at the extensional data level
• location-dependent browsing of information using physical (HREF) links
• the user has to keep track of information content!!
WWW + Information Brokering:
• domain-specific ontologies as "semantic conceptual views"
• information correlation using concept mappings and links at the intensional concept level
• browsing of information using terminological relationships across ontologies
• a higher level of abstraction, closer to the user's view of information!!
18. Concepts, tools and techniques to support semantics
• context
• semantic proximity
• inter-ontological relations
• media-independent information correlations
• ontologies (esp. domain-specific)
• profiles
• domain-specific metadata
19. Tools to support semantics
• Context, context, context
• Media-independent information correlations
• Multiple ontologies
– Semantic Proximity (relationships between concepts within
and across ontologies) using domain, context,
modeling/abstraction/representation, state
– Characterizing Loss of Information incurred due to
differences in vocabulary
BIG challenge: identifying relationships or similarity between objects of
different media, developed and managed by different persons and systems
20. Heterogeneity... is a Tower of Babel!!
From SEMANTIC HETEROGENEITY to SEMANTIC INTEROPERABILITY via:
• metadata
• ontologies
• contexts
21. The InfoQuilt Project
THE INFOQUILT VISION
Semantic interoperability between systems, sharing knowledge
using multiple ontologies
Logical correlation of information
Media independent information processing
REALIZATION OF THE VISION
fully distributed, adaptable, agent-based system
information/knowledge management supported by collaborative processes
http://lsdis.cs.uga.edu/proj/iq/iq.html
22. InfoQuilt Project: using the Metadata REFerence link (MREF)
MREF complements HREF, creating a "logical web" through media-independent, ontology- and metadata-based correlation. An MREF is a description of the information asset we want to retrieve.
Semantic correlation using MREF:
• an MREF refers to a concept along with constraints, relations, and attributes
• domain ontologies (the IQ_Asset ontology plus extension ontologies) supply the ontological terms and metadata
• a model for logical correlation using ontological terms and metadata
• a framework for representing MREFs in RDF, serialized in XML (one implementation choice)
• keywords and content attributes (color, scene cuts, ...)
http://lsdis.cs.uga.edu/proj/iq/iq.html
23. Domain Specific Correlation – example
Potential locations for a future shopping mall, identified by all regions having a population greater than 5000 and an area greater than 50 sq. ft., with urban land cover and moderate relief:
<A MREF ATTRIBUTES(population > 5000; area > 50; region-type = 'block'; land-cover = 'urban'; relief = 'moderate')>can be viewed here</A>
Domain-specific metadata, with terms chosen from domain-specific ontologies:
• population, area, boundaries, land cover: from structured regional data (Census DB, TIGER/Line DB), accessed via SQL
• relief, boundaries: image features from US Geological Survey data, computed by image-processing routines
=> media-independent relationships between domain-specific metadata: population, area, land cover, relief
=> correlation between image and structured data at a higher, domain-specific level, as opposed to physical "link-chasing" in the WWW
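The correlation behind this MREF can be pictured as a media-independent join over region identifiers. The following sketch is purely illustrative: the data values are invented, and it assumes the image-derived attributes (land cover, relief) have already been extracted by the image-processing routines.

```python
# Hedged sketch of the shopping-mall correlation: structured census
# data and already-extracted image-derived attributes are joined on a
# region id and filtered by the MREF constraints. All data invented.
census = [  # from a structured source (e.g., a census database)
    {"region": "R1", "region_type": "block", "population": 8200, "area": 60},
    {"region": "R2", "region_type": "block", "population": 3100, "area": 75},
]
imagery = {  # attributes derived by image-processing routines
    "R1": {"land_cover": "urban", "relief": "moderate"},
    "R2": {"land_cover": "urban", "relief": "steep"},
}

def candidate_sites():
    """Yield regions satisfying the MREF constraints across both media."""
    for row in census:
        img = imagery.get(row["region"], {})
        if (row["population"] > 5000 and row["area"] > 50
                and row["region_type"] == "block"
                and img.get("land_cover") == "urban"
                and img.get("relief") == "moderate"):
            yield row["region"]

print(list(candidate_sites()))  # ['R1']
```

The point of the slide is precisely that this join happens at the level of domain-specific metadata, not by chasing physical links between documents.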
25. A DL-II approach for Information Brokering
Layered, bottom to top:
• the physical/simulation world: images, data stores, documents, digital media
• discovering collections of heterogeneous information and meta-information resources
• constructing additional meta-information resources, guided by domain-specific and domain-independent ontologies
• constructing appropriate information landscapes (Iscape 1 ... Iscape N)
26. ADEPT Information Landscape Concept Prototype
(a scenario for Digital Earth: learning in the context of the "El Niño" phenomenon)
Request information using keywords, domain-specific attributes, and domain-independent attributes.
Sample Iscape requests:
• How does El Niño affect sea animals? Look for broadcast videos of less than 2 minutes.
• How are some regions affected by El Niño? Look at East/West Pacific regions.
• What disasters have been related to El Niño?
• What storm occurrences are attributed to El Niño?
• Show reports related to El Niño that contain Clinton.
TRY ISCAPE CONCEPT DEMO
27. Putting MREFs to work
• An MREF Builder lets the user construct new MREFs from domain ontologies (the IQ_Asset ontology plus extension ontologies); the results are stored in an MREF repository.
• A Broker Agent answers users (via User Agents) using the MREF repository and user profiles maintained by a Profile Manager.
28. Context: the lynchpin of semantics
Example: "cricket"
"For instance, if you were to use Yahoo! or Infoseek to search the web for pizza, your results would probably be hundreds of matches for the word pizza. Many of these could be pizza parlors around the world. Yet if you run the same search within NeighborNet, it allows you to order pizza to be delivered instead of shipped."
From a press release of FutureOne, Inc., March 24, 1999
http://home.futureone.com/about/pr/021699.asp
29. Constructing c-contexts from ontological terms
A c-context is a collection of contextual coordinates Ci (roles) and values Vi (concepts/concept descriptions):
C-Context = <(C1, V1) (C2, V2) ... (Ck, Vk)>
Example database objects:
AGENCY(RegNo, Name, Affiliation)
DOC(Id, Title, Agency)
"All documents stored in the database have been published by some agency"
=> Cdef(DOC) = <(hasOrganization, AgencyConcept)>
using the ontological terms DocumentConcept and AgencyConcept.
Advantages:
• use of ontologies for an intensional, domain-specific description of data
• representation of extra information: relationships between objects not represented in the database schema
• use of terminological relationships in the ontology
30. Using c-contexts to reason about information in the database
EXAMPLE
Cdef(DOC) = <(hasOrganization, AgencyConcept)>
CQ = <(hasOrganization, {"USGS"})>
glb(Cdef(DOC), CQ) = <(self, DocumentConcept), (hasOrganization, {"USGS"})>
• Reasoning with c-contexts: glb(Cdef(DOC), CQ)
• Ontological inferences: DocumentConcept; (hasOrganization, {"USGS"})
Challenge 1: use of multiple ontologies
Challenge 2: estimating the loss of information
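The glb combination can be approximated in a few lines. This is a deliberately simplified sketch: real c-context reasoning operates over concept descriptions with ontological inference, whereas here a set of instances (such as {"USGS"}) is simply treated as more specific than a named concept.

```python
# Hedged sketch of glb over c-contexts, with a context as a dict from
# contextual coordinates (roles) to values. The "more specific value
# wins" rule below is a stand-in for real terminological reasoning.
def glb(ctx_a, ctx_b):
    out = {}
    for role in set(ctx_a) | set(ctx_b):
        va, vb = ctx_a.get(role), ctx_b.get(role)
        if va is None or vb is None:
            # a coordinate present in only one context is kept as-is
            out[role] = va if vb is None else vb
        else:
            # prefer the extensional (set-of-instances) value, which is
            # more specific than a named concept
            out[role] = va if isinstance(va, set) else vb
    return out

c_def = {"self": "DocumentConcept",
         "hasOrganization": "AgencyConcept"}   # Cdef(DOC)
c_q = {"hasOrganization": {"USGS"}}            # CQ

result = glb(c_def, c_q)
print(result)
# matches the slide: (self, DocumentConcept), (hasOrganization, {'USGS'})
```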
31. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
OBSERVER architecture:
• User node: a query processor accepts the user query, posed against the user's ontology
• Component nodes: each runs an ontology server holding ontologies and mappings to its data repositories, plus a query processor
• IRM node: the Interontology Relationships Manager stores the terminological relationships between ontologies
Eduardo Mena (III'98)
32. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Query construction - Example
“Get title and number of pages of books written by Carl Sagan”
User ontology: WN
[name pages] for
(AND book (FILLS creator “Carl Sagan”))
Target ontology: Stanford-I
Integrated ontology WN-Stanford-I
[title number-of-pages] for
(AND book (FILLS doc-author-name “Carl Sagan”))
Ontology sites: http://www.cogsci.princeton.edu/~wn/w3wn.html
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
Eduardo Mena (III’98)
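The translation from the WN query to the Stanford-I query amounts to substituting each term with its synonym in the target ontology. The sketch below hard-codes the four mappings visible in the example; in OBSERVER such mappings come from the interontology relationships managed by the IRM.

```python
# Hedged sketch of the query rewriting in the example: terms from the
# user ontology (WN) are replaced by their synonyms in the target
# ontology (Stanford-I). Mapping pairs are taken from the slide.
WN_TO_STANFORD = {
    "name": "title",
    "pages": "number-of-pages",
    "creator": "doc-author-name",
    "book": "book",  # shared term, maps to itself
}

def rewrite(projection, role):
    """Rewrite the projection list and the constraint role
    into the target ontology's vocabulary."""
    return [WN_TO_STANFORD[t] for t in projection], WN_TO_STANFORD[role]

proj, role = rewrite(["name", "pages"], "creator")
print(proj, role)  # ['title', 'number-of-pages'] doc-author-name
```

When a term has no exact synonym in the target ontology, OBSERVER must fall back to broader or narrower terms, which is exactly what the loss estimates on the following slides quantify.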
33. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Re-use of knowledge: the target ontology Stanford-I (the Bibliography Data Ontology) defines classes including Biblio-Thing, Document, Book, Edited-Book, Proceedings, Thesis (Doctoral-Thesis, Master-Thesis), Technical-Report, Technical-Manual, Periodical-Publication (Journal, Newspaper, Magazine), Miscellaneous-Publication, Cartographic-Map, Multimedia-Document, Computer-Program, Artwork, Conference, and Agent (Person: Author; Organization: Publisher, University).
(The query construction example is as on slide 32.)
Eduardo Mena (III'98)
34. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Re-use of knowledge: the user ontology WN is a subset of WordNet 1.5, with terms such as Print-Media, Press, Journalism, Publication, Periodical, Newspaper, Magazine, Journals, Series, Book, Trade-Book, TextBook, SongBook, PrayerBook, Reference-Book, CookBook, Encyclopedia, WordBook, Directory, Annual, Instruction-Book, HandBook, GuideBook, Manual, Reference-Manual, Instructions, Brochure, Pictorial, and Bible.
(The query construction example is as on slide 32.)
Eduardo Mena (III'98)
35. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
The WN ontology and the user query: the query construction example of slide 32, viewed against the WN ontology.
Eduardo Mena (III'98)
36. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Estimating the loss of information
To choose the plan with the least loss
To present a level of confidence in the answer
Based on intensional information (terminological difference)
Based on extensional information (precision and recall)
Plans in the example
User Query: (AND book (FILLS doc-author-name "Carl Sagan"))
Plan 1: (AND document (FILLS doc-author-name "Carl Sagan"))
Plan 2: (AND periodical-publication (FILLS doc-author-name "Carl Sagan"))
Plan 3: (AND journal (FILLS doc-author-name "Carl Sagan"))
Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name "Carl Sagan"))
Eduardo Mena (III’98)
37. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Loss of information based on intensional information
User Query: (AND book (FILLS doc-author-name “Carl Sagan”))
Plan 1: (AND document (FILLS doc-author-name "Carl Sagan"))
book := (AND publication (AT-LEAST 1 ISBN))
publication := (AND document (AT-LEAST 1 place-of-publication))
Loss: "Instead of books written by Carl Sagan, OBSERVER is providing all the documents written by Carl Sagan (even if they do not have an ISBN and place of publication)"
Eduardo Mena (III’98)
38. Estimating information loss for multi-ontology based
query processing in the OBSERVER/InfoQuilt system
Example: loss for the plans
Plan 1: (AND document (FILLS doc-author-name "Carl Sagan")) [case 2]: 91.57% < (1-Loss) < 91.75%
Plan 2: (AND periodical-publication (FILLS doc-author-name "Carl Sagan")) [case 3]: 94.03% < (1-Loss) < 100%
Plan 3: (AND journal (FILLS doc-author-name "Carl Sagan")) [case 3]: 98.56% < (1-Loss) < 100%
Plan 4: (AND UNION(book, proceedings, thesis, misc-publication, technical-report) (FILLS doc-author-name "Carl Sagan")) [case 1]: 0% < (1-Loss) < 7.22%
Eduardo Mena (III’98)
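Given such intervals, picking "the plan with the least loss" (slide 36) can be done by ranking the (1-Loss) bounds. Ordering by the guaranteed lower bound, tie-broken by the upper bound, is our assumption here, not necessarily OBSERVER's actual policy; the numbers are taken from the slide.

```python
# Hedged sketch: ranking OBSERVER query plans by their (1-Loss)
# intervals. The comparison rule (lower bound first, then upper bound)
# is an assumption for illustration.
plans = {  # plan -> (lower %, upper %) bounds on (1 - Loss)
    "Plan 1 (document)": (91.57, 91.75),
    "Plan 2 (periodical-publication)": (94.03, 100.0),
    "Plan 3 (journal)": (98.56, 100.0),
    "Plan 4 (union of 5 concepts)": (0.0, 7.22),
}

# Lexicographic comparison on (lower, upper) picks the plan whose
# guaranteed (1-Loss) is highest.
best = max(plans, key=lambda p: plans[p])
ranking = sorted(plans, key=lambda p: plans[p], reverse=True)
print(best)  # Plan 3 (journal)
```

This matches the intuition on the slide: querying the narrower concept journal loses the least information, while the union plan, despite covering many concepts, offers almost no guaranteed coverage.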
39. Summary
Evolution across the generations (data type / heterogeneity addressed / architecture):
• structured databases / system / federated database system
• text / syntax, schematic / federated IS
• semi-structured / structural / mediator, metadata
• visual, scientific/engineering / semantic / information brokering, cooperative IS, knowledge management
The focus moves from data toward information and knowledge.
40. Agenda for research
Interoperation not at systems level, but at informational and
possibly knowledge level
– traditional database and information retrieval solutions
do not suffice
– need to understand context; measures of similarities
Need to increase impetus on semantic-level issues involving
terminological and contextual differences, and possibly perceptual
or cognitive differences in the future
– information systems and humans need to cooperate,
possibly involving coordination and collaborative
processes
41. Related Reading
Books:
Information Brokering for Digital Media, Kashyap and Sheth, Kluwer,
1999 (to appear)
Multimedia Data Management: Using Metadata to Integrate and Apply
Digital Media, Sheth and Klas Eds, McGraw-Hill, 1998
Cooperative Information Systems, Papazoglou and Schlageter Eds.,
Academic Press, 1998
Management of Heterogeneous and Autonomous Database Systems,
Elmagarmid, Rusinkiewicz, Sheth Eds, Morgan Kaufmann, 1998.
Special Issues and Proceedings:
Formal Ontologies in Information Systems, Guarino Ed., IOS Press, 1998
Semantic Interoperability in Global Information Systems, Ouksel and
Sheth, SIGMOD Record, March 1999.
http://lsdis.cs.uga.edu [see publications on Metadata, Semantics, Context, InfoHarness/InfoQuilt]
Acknowledgements: Tarcisio Lima, Vipul Kashyap
amit@cs.uga.edu