The document discusses enabling interactivity between humans and artificial intelligence for subjective information seeking tasks. It proposes a model where the user and AI agent interact in a mixed-initiative system through exploration and feedback. The AI agent guides the user through the data and the user provides feedback through exploration actions. Reinforcement learning can be used to learn an optimal policy for interactions by modeling them as sequential decision making. Features of the data and interactions are used to learn the policy instead of a value function. This enables learning policies for subjective information seeking in open-world environments.
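The summary above describes learning an interaction policy directly from features rather than a value function. As a minimal illustration (not the paper's actual system), the sketch below learns a two-action softmax policy with a REINFORCE-style gradient update on a toy preference-matching task; the environment, features, and feedback signal are all invented for the example, with positive/negative reward standing in for the user's exploration feedback.

```python
import math
import random

random.seed(0)

# Toy environment: the "user" prefers action 1 when feature x > 0.5.
# Feedback is +1 if the agent's suggestion matches the preference, -1 otherwise.
def interact(x, action):
    preferred = 1 if x > 0.5 else 0
    return 1.0 if action == preferred else -1.0

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Linear policy over features [x, 1] for two actions, learned directly
# (no value function), as the summary describes.
w = [[0.0, 0.0], [0.0, 0.0]]
lr = 0.5

for episode in range(2000):
    x = random.random()
    feats = [x, 1.0]
    probs = softmax([sum(wi * f for wi, f in zip(w[a], feats)) for a in (0, 1)])
    action = 0 if random.random() < probs[0] else 1
    reward = interact(x, action)
    # REINFORCE update: grad log pi(k|x) = (1[k=action] - pi(k|x)) * feats
    for k in (0, 1):
        indicator = 1.0 if k == action else 0.0
        for j in range(2):
            w[k][j] += lr * reward * (indicator - probs[k]) * feats[j]

# Greedy evaluation: count matches over a grid of feature values.
hits = sum(
    1 for x in [i / 100 for i in range(100)]
    if interact(x, max((0, 1), key=lambda a, x=x: sum(
        wi * f for wi, f in zip(w[a], [x, 1.0])))) > 0)
print(hits)
```

The point of the sketch is only that the policy parameters are adjusted directly from interaction feedback, which is the shape of the approach the summary attributes to the paper.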
Generating Cultural Personas From Social Data: A Perspective of Middle Eastern Users (Joni Salminen)
CITE: "Salminen, J., Şengün, S., Kwak, H., Jansen, B. J., An, J., Jung, S., Vieweg, S., & Harrell, F. (2017). Generating Cultural Personas from Social Data: A Perspective of Middle Eastern Users. In Proceedings of The Fourth International Symposium on Social Networks Analysis, Management and Security (SNAMS-2017). Prague, Czech Republic, August 21–23."
Download paper: http://jonisalminen.com/wp-content/uploads/2018/08/Generating-Cultural-Personas-From-Social-Data_SNAMS2017.pdf
***
Automatic Persona Generation (APG) is a system and methodology developed at Qatar Computing Research Institute, Hamad Bin Khalifa University.
The goal is to give faces to social and online analytics data. Personas can be generated from YouTube, Facebook, and Google Analytics data.
The system can be found at https://persona.qcri.org
Knowledge Graphs and their central role in big data processing: Past, Present... (Amit Sheth)
Keynote at CODS-COMAD 2020, Hyderabad, India, 06 Jan 2020: https://cods-comad.in/keynotes.html
Abstract : Early use of knowledge graphs, before the start of this century, related to building a knowledge graph manually or semi-automatically and applying them for semantic applications, such as search, browsing, personalization, and advertisement. Taalee/Semagix Semantic Search in 2000 had a KG that covered many domains and supported search with an equivalent of today’s infobox. Along with the growth of big data, machine learning became the preferred technique for searching, analyzing and deriving insights from such data. We observed the complementary nature of bottom-up (machine learning-driven) and top-down (semantic, knowledge graph and planning based) techniques. Recently we have seen growing efforts involving the shallow use of a knowledge graph to improve the semantic and conceptual processing of data. The future promises deeper and congruent incorporation or integration of the knowledge graphs in the learning techniques (which we call knowledge-infused learning), where knowledge graphs combining statistical AI (bottom-up) and symbolic AI learning techniques (top-down) play a critical role in hybrid and integrated intelligent systems. Throughout this talk, we will provide real-world examples, products, and applications where the knowledge graph played a pivotal role.
The recent series of innovations in deep learning have shown enormous potential to impact individuals and society, both positively and negatively. The deep learning models utilizing massive computing power and enormous datasets have significantly outperformed prior historical benchmarks on increasingly difficult, well-defined research tasks across technology domains such as computer vision, natural language processing, signal processing, and human-computer interactions. However, the Black-Box nature of deep learning models and their over-reliance on massive amounts of data condensed into labels and dense representations pose challenges for the system’s interpretability and explainability. Furthermore, deep learning methods have not yet been proven in their ability to effectively utilize relevant domain knowledge and experience critical to human understanding. This aspect is missing in early data-focused approaches and necessitated knowledge-infused learning and other strategies to incorporate computational knowledge. Rapid advances in our ability to create and reuse structured knowledge as knowledge graphs make this task viable. In this talk, we will outline how knowledge, provided as a knowledge graph, is incorporated into the deep learning methods using knowledge-infused learning. We then discuss how this makes a fundamental difference in the interpretability and explainability of current approaches and illustrate it with examples relevant to a few domains.
Highlights and summary of long-running programmatic research on data science; practices, roles, tools, skills, organization models, workflow, outlook, etc. Profiles and persona definition for data scientist model. Landscape of org models for data science and drivers for capability planning. Secondary research materials.
Discovery and the Age of Insight: Walmart EIM Open House 2013 (Joe Lamantia)
Discovery is the most important business capability in the emerging Age of Insight - it's the missing ingredient that makes Big Data a source of value for businesses and people.
The Language of Discovery is an essential tool for providing discovery capability, whether at the scale of designing a single discovery application, determining the value proposition of a new product or service, or managing a strategic portfolio of technology and business initiatives.
This presentation outlines the Age of Insight, and suggests deep structural and historic precedents visible in the Age of Reason, especially in the central parallels between Natural Philosophy and the emerging discipline of Data Science. We then review the language of discovery, and consider widely visible examples of products and services that demonstrate the language.
We review our own usage of the framework as an analytical and generative toolkit for providing discovery capability, and share best practices for employing this perspective across a variety of levels of need.
Don't Handicap AI without Explicit Knowledge (Amit Sheth)
Keynote at IEEE Services 2021: Abstract: https://conferences.computer.org/services/2021/keynotes/sheth.html
Video: https://lnkd.in/d-r3YXaC
Video of the same keynote given at DEXA2021: https://www.youtube.com/watch?v=u-06kK9TysA
September 9, 2021 15:00 - 16:20 UTC
ABSTRACT
Knowledge representation, whether as expert-system rules or using frames and a variety of logics, played a key role in capturing explicit knowledge during the heyday of AI in the past century. Such knowledge, aligned with planning and reasoning, is part of what we refer to as Symbolic AI. The resurgent AI of this century, in the form of Statistical AI, has benefitted from massive data and computing. On some tasks, deep learning methods have even exceeded human performance levels. This gave the false sense that data alone is enough, and that explicit knowledge is not needed. But as we start chasing machine intelligence that is comparable with human intelligence, there is an increasing realization that we cannot do without explicit knowledge. Neuroscience (the role of long-term memory, strong interactions between different specialized brain regions on tasks such as multimodal sensing), cognitive science (bottom brain versus top brain, perception versus cognition), brain-inspired computing, behavioral economics (System 1 versus System 2), and other disciplines point to the need for furthering AI into neuro-symbolic AI (i.e., a hybrid of Statistical AI and Symbolic AI, also referred to as the third wave of AI). As we make this progress, the role of explicit knowledge becomes more evident. I will specifically look at our endeavor to support human-like intelligence, our desire for AI systems to interact with humans naturally, and our need to explain the path and reasons for AI systems' workings. The knowledge needed to support understanding and intelligence is, however, varied and complex. Using the example of progressing from NLP to NLU, I will demonstrate the dimensions of explicit knowledge, which may include linguistic (language syntax), common sense, general (world model), specialized (e.g., geographic), and domain-specific (e.g., mental health) knowledge.
I will also argue that despite this complexity, such knowledge can be scalably created and maintained (even dynamically or continually). Finally, I will describe our work on knowledge-infused learning as an example strategy for fusing statistical and symbolic AI in a variety of ways.
QU Speaker Series - Session 3
https://qusummerschool.splashthat.com
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Topic: Machine Learning and Model Risk (With a focus on Neural Network Models)
All models are wrong, and when they are wrong they create financial or non-financial risks. Understanding, testing, and managing model failures is the key focus of model risk management, particularly model validation.
For machine learning models, particular attention is paid to managing model fairness, explainability, robustness, and change control. In this presentation, I will focus the discussion on machine learning explainability and robustness. Explainability is critical for evaluating the conceptual soundness of models, particularly for applications in highly regulated institutions such as banks. Many explainability tools are available; my focus in this talk is how to develop fundamentally interpretable models.
Neural networks (including deep learning), with the proper architectural choices, can be made highly interpretable. Since models in production are subjected to dynamically changing environments, testing for and choosing models that are robust to such changes is critical, an aspect that has been neglected in AutoML.
Machine learning for factor investing - Tony Guida
https://quspeakerseries5.splashthat.com/
Topic: Machine Learning for Factor Investing: case study on "Trees"
In this presentation, Tony will first introduce the concept of supervised learning. He will then cover the practitioner's angle on constructing non-linear multi-factor signals from stock characteristics, and show the added value of ML-based signals over traditional, stale linear factor blends in equities.
This master class is derived from Guillaume Coqueret and Tony Guida's latest book, "Machine Learning for Factor Investing".
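To make the "Trees" angle concrete, here is a hedged toy sketch (not from the talk or book): the simplest tree ingredient, a single regression stump, fitted to an invented stock characteristic whose payoff is non-linear. A linear factor blend would smear this jump across the whole range, while the stump recovers the threshold.

```python
import random

random.seed(1)

# Toy cross-section: one characteristic (a "value" score in [0, 1]) with a
# non-linear payoff -- expected return is flat below 0.7 and jumps above it.
xs = [random.random() for _ in range(500)]
ys = [(0.05 if x > 0.7 else 0.0) + random.gauss(0, 0.01) for x in xs]

def stump_split(xs, ys):
    """Find the threshold minimizing squared error for a one-feature stump."""
    best = None
    best_sse = float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if sse < best_sse:
            best, best_sse = (t, ml, mr), sse
    return best

t, mean_lo, mean_hi = stump_split(xs, ys)
print(round(t, 2), round(mean_hi - mean_lo, 3))
```

The fitted threshold lands near the true break at 0.7, and the left/right means recover the size of the payoff jump; a full tree or gradient-boosted ensemble simply stacks many such splits across many characteristics.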
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Topic: Generating Synthetic Data with Generative Adversarial Networks: Opportunities and Challenges
Limited data access continues to be a barrier to data-driven product development. In this talk, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge.
We identify key challenges of existing GAN approaches with respect to fidelity (e.g., capturing complex multidimensional correlations, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity).
To address fidelity challenges, we discuss our experiences designing a custom workflow called DoppelGANger and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DoppelGANger achieves up to 43% better fidelity than baseline models.
With respect to privacy, we identify fundamental challenges with both classical notions of privacy as well as recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges.
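As a hedged illustration of the fidelity challenges the abstract describes (this is not DoppelGANger, whose internals are not given here), one of the simplest fidelity checks compares binned marginal distributions of real versus synthetic samples. A mode-collapsed generator, which emits a single value regardless of input, shows up immediately as a large distance.

```python
import random

random.seed(2)

def tv_distance(real, synth, bins=10, lo=0.0, hi=1.0):
    """Total-variation distance between binned marginals of two samples."""
    def hist(data):
        counts = [0] * bins
        for v in data:
            idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
            counts[idx] += 1
        return [c / len(data) for c in counts]
    h_r, h_s = hist(real), hist(synth)
    return 0.5 * sum(abs(a - b) for a, b in zip(h_r, h_s))

real = [random.betavariate(2, 5) for _ in range(5000)]
good = [random.betavariate(2, 5) for _ in range(5000)]  # faithful "generator"
collapsed = [0.25] * 5000                               # mode-collapsed one
print(tv_distance(real, good), tv_distance(real, collapsed))
```

A faithful generator scores near zero (only sampling noise), while the collapsed one scores close to one; real evaluations would also check joint and temporal structure, which marginal checks like this cannot see.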
QU Summer School 2020 Speaker Series - Session 7
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Managing Machine Learning Models in the Financial Industry
Lecture 1: Model Risk Management for AI and Machine Learning
Artificial intelligence and machine learning are part of today’s modeler’s toolbox for building challenger models and new innovative models that address business needs. However, AI presents new and unique challenges for risk management, particularly for assessing, controlling, and managing model risk for models of limited transparency. Another key consideration is the speed at which these models can be developed, validated, and then deployed into productive use in order to remain competitive while adhering to a robust model risk management program. This talk will highlight best practices for integrating AI into model risk practices and showcase examples across the model lifecycle.
This workshop presentation from Enterprise Knowledge team members Joe Hilger, Founder and COO, and Sara Nash, Technical Analyst, was delivered on June 8, 2020 as part of the Data Summit 2020 virtual conference. The 3-hour workshop provided an interdisciplinary group of participants with a definition of what a knowledge graph is, how it is implemented, and how it can be used to increase the value of your organization’s data. This slide deck gives an overview of the KM concepts necessary for implementing knowledge graphs as a foundation for Enterprise Artificial Intelligence (AI). Hilger and Nash also outlined four use cases for knowledge graphs, including recommendation engines and natural language query on structured data.
Finely Chair talk: Every company is an AI company - and why Universities sho... (Amit Sheth)
Video: https://youtu.be/ZS8rGSzb_9I
The context of this talk is this statement from the host institution's provost: "We are trying to mobilize our campus activities around AI." I connect academic initiatives in Interdisciplinary AI with industry needs.
--- Original abstract -----
Every company now is an AI company: Now, Near Future, or Distant Future?
Amit Sheth, AI Institute, University of South Carolina
“Every company now is an AI company. The industrial companies are changing, the supply chain…every single sector, it’s not only tech,” said Steven Pagliuca, CEO of Bain Capital, at the 2019 World Economic Forum. With this statement as the context, I will provide an overview of the AI landscape: what AI capabilities are real, what is being oversold, what is nonexistent, and what is unlikely in our lifetime. I will also provide an anecdote-supported review of a broad variety of current and imminent applications of AI that rely on well-developed and emerging AI capabilities. The objective is to help those considering AI applications start thinking of new business opportunities, new products and services, and new revenue/business models in the context of the rapid penetration of AI technologies everywhere. I will seek to answer: Is AI just hype, or something already happening? If it has not happened in your industry, is it impending? Do the bad impacts of AI outweigh the good?
What does a data scientist actually do? Here at Good Rebels we wanted to outline a profile of this new profession, with the help of various industry leaders from academia, business and institutions. In short, we concluded that the main tasks of a data scientist are to identify data, transform it when incomplete, categorize it, prepare it for analysis, perform the analysis, visualize the results and communicate them.
INCREASING THE INVESTMENT’S OPPORTUNITIES IN KINGDOM OF SAUDI ARABIA BY STUDY...ijcsit
Social networking sites are a significant source of information about user behavior and about what occupies society across all ages, and accordingly they can provide helpful information to specialists and decision-makers. According to official sources, 98.43% of Saudi youth use social networking sites. This study analyzes social media data to provide the information needed to increase investment opportunities within the Kingdom of Saudi Arabia, by examining what people discuss on these sites through their tweets about the labor market and investment. Given the huge volume of data and its randomness, the data are surveyed and collected through keywords, prioritized, and labeled as positive, negative, or mixed. The analysis and conclusions are based on data mining and its techniques of analysis and deduction.
Exploring Research Opportunities in the Digital Era - Togar Simatupang
This presentation focuses on specializing in the field of business sciences, in areas that include entrepreneurship, finance, big data and technology, operations and logistics, and human resources.
Data ethics and machine learning: discrimination, algorithmic bias, and how t... - Data Driven Innovation
Machine learning and data mining algorithms construct predictive models and decision making systems based on big data. Big data are the digital traces of human activities - opinions, preferences, movements, lifestyles, ... - hence they reflect all human biases and prejudices. Therefore, the models learnt from big data may inherit all such biases, leading to discriminatory decisions. In my talk, I discuss many real examples, from crime prediction to credit scoring to image recognition, and how we can tackle the problem of discovering discrimination using the very same approach: data mining.
The Quantified Self movement allows collecting a lot of personal data which can be used to nurture a model of the users. Likewise, when aggregated, these personal data become a picture of the people of a space in a City Model. This model can also be fed by data coming from crowdsensing. The resulting City Model can be used to provide personalized services to citizens, and to increase people's awareness about their behaviour, which can help in promoting collective behavioural change.
Crowdsourcing Approaches for Smart City Open Data Management - Edward Curry
A wide-scale bottom-up approach to the creation and management of open data has been demonstrated by projects like Freebase, Wikipedia, and DBpedia. This talk explores how to involve a wide community of users in the collaborative management of open data activities within a Smart City. The talk discusses how crowdsourcing techniques can be applied within a Smart City context using crowdsourcing and human computation platforms such as Amazon Mechanical Turk, MobileWorks, and CrowdFlower.
What does it_takes_to_be_a_good_data_scientist_2019_aim_simplilearn - Praj H
Over the years, the term ‘data scientist’ has evolved greatly. From describing a person who handles data, to a professional who leverages machine learning — this definition has seen a great deal of change. Now, circa 2019, there are numerous blogs, Reddit pages and Quora threads dedicated to the discussion about “how to become a good data scientist”.
Data science and the art of persuasion - Alex Clapson
The presentation of data science to lay audiences—the last mile—hasn’t evolved as rapidly or as fully as the science’s technical part. It must catch up, and that means rethinking how data science teams are put together, how they’re managed, and who’s involved at every point in the process, from the first data stream to the final chart shown to the board. Until companies can successfully traverse that last mile, data science teams will under-deliver. They will provide, in Willard Brinton’s words, foundations without cathedrals.
Similar to Talk straps: Interactivity between Human and Artificial Intelligence (20)
FROM GRADUATE SCHOOL TO PROFESSIONAL LIFE PREPARING A LONG JOURNEY - Genoveva Vargas-Solar
Long studies, and particularly a PhD, imply being ready to perform an exciting journey that leads to unexpected places. It is a personal experience that demands courage, patience and discipline, but above all it is a human experience that calls for intellectual ambition and humility. As with any important personal adventure, the journey must be thoroughly prepared beforehand according to one’s own expectations, objectives and motivation. Of course, as with any plan, it will be adjusted as events come up and as the individual discovers new opportunities and weaves them into a personal life.
This talk points out, as a travel guide, practical information about things that should be considered during a PhD to prepare one’s own personal strategy for accessing the best opportunities when planning and starting a professional career.
Moving forward data centric sciences weaving AI, Big Data & HPC - Genoveva Vargas-Solar
This novel and multidisciplinary data-centric and scientific movement promises new and not yet imagined applications that rely on massive amounts of evolving data that need to be cleaned, integrated and analysed for modelling purposes. Yet, data management issues are not usually perceived as central. In this keynote I will explore the key challenges and opportunities for data management in this new scientific world, and discuss how a possible data-centric artificial intelligence supported by high performance computing (HPC) can best contribute to these exciting domains. Even if the motto is not academic, the huge sums being devoted to related applications are moving industry and academia to analyse these directions.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf - GetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices (those with the same in-links) helps reduce duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can be easily calculated. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Adjusting OpenMP PageRank : SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM but not in GPU memory, it is possible to take advantage of a shared-memory system with multiple CPUs, each with multiple cores, to accelerate PageRank computation. If the NUMA architecture of the system is properly taken into account with good vertex partitioning, the speedup can be significant. To take steps in this direction, experiments are conducted to implement PageRank in OpenMP using two different approaches, uniform and hybrid. The uniform approach runs all primitives required for PageRank in OpenMP mode (with multiple threads). On the other hand, the hybrid approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake - Walaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Learn SQL from basic queries to Advance queries - manishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Talk straps: Interactivity between Human and Artificial Intelligence
1. Enabling Interactivity between Human and Artificial Intelligence December 14, 2020 @ STRAPS’20
Enabling Interactivity between
Human and Artificial Intelligence
Behrooz Omidvar-Tehrani
@BehroozOmidvar
2nd Workshop on Smart Data Integration and Processing on Service Based Environments
December 14, 2020
2.
• Users, in their different roles, have different information seeking needs.
• Typical paradigms of information seeking are search, recommendation,
and exploration (search by experience).
Information seeking
2
Data scientist Domain expert Information consumer
3.
For full-fledged information seeking, users should be able to interact with the data system and fulfill their tasks.
Main goal in information seeking
3
• Data scientist: explores the banking data to find patterns and trends in loan records.
• Medical expert: explores the effect of air pollution on chronic respiratory diseases in different regions.
• Information consumer: explores Amazon to buy a good point-and-shoot camera.
4.
State of the art in information seeking — are we done?
4
Tableau Visual Analytics tool
5.
• In their paper Subjective Databases, Li et al. claim that “making decisions that involve subjective preferences is labor intensive for the end user.”
Subjective and objective tasks in information seeking
5
User: “I want to receive the list of hotels such that …”
Subjective task: “… their price is reasonable.”
Objective task: “… their price is lower than $300 per night.”
[VLDB’19]
6.
• A challenge in information seeking is that users, who are often cold and face complex data, carry vague tasks.
• The vagueness emanates from a lack of knowledge about the data schema and/or the data distribution.
Subjectivity in information seeking
6
Data scientist explores the banking data to find patterns and trends in loan records.
Medical expert explores the effect of air pollution on chronic respiratory diseases in different regions.
Information consumer explores Amazon to buy a good point-and-shoot camera.
How is an interesting effect defined?
What distinguishes an interesting pattern/trend from a futile one?
How is a good camera specified?
7.
• No scientific method can provide a precise answer for a subjective task. However, a frame of reference is feasible to deliver.
• Fermi estimation is an approach to obtain this frame of reference for Fermi problems, e.g., the number of piano tuners in Chicago, or the number of extraterrestrial civilizations in the Milky Way galaxy (Drake equation).
• Fermi estimates work because, in the absence of a consistent bias, overestimates and underestimates tend to cancel each other out.
How to obviate subjectivity?
7
User: List hotels with a reasonable price!
AI agent: Is the price range $50-$100 per night reasonable?
User: I can afford more!
AI agent: What about $500-$700 per night?
User: Can’t do!
AI agent: Hence, $200-$300?
User: Should be okay.
[Weinstein 2012]
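The narrowing dialogue above can be sketched as a bisection over price anchors. This is a minimal illustration, where `user_accepts` is a hypothetical stand-in for the user's feedback, not part of the talk:

```python
# A minimal sketch (with a hypothetical `user_accepts` feedback function) of
# the narrowing dialogue above: the agent probes price bands and bisects
# between the last "too cheap" and "too expensive" anchors until acceptance.

def narrow_price_range(user_accepts, low=50, high=700, tolerance=50):
    """`user_accepts(lo, hi)` returns -1 ("I can afford more"),
    +1 ("Can't do"), or 0 ("Should be okay")."""
    while high - low > tolerance:
        mid_lo = (low + high) // 2 - tolerance // 2
        mid_hi = mid_lo + tolerance
        answer = user_accepts(mid_lo, mid_hi)
        if answer == 0:          # accepted: a frame of reference is found
            return (mid_lo, mid_hi)
        if answer < 0:           # too cheap: raise the lower anchor
            low = mid_hi
        else:                    # too expensive: lower the upper anchor
            high = mid_lo
    return (low, high)

# A simulated user whose acceptable band is around $200-$300 per night:
feedback = lambda lo, hi: (0 if 200 <= (lo + hi) / 2 <= 300
                           else (-1 if (lo + hi) / 2 < 200 else +1))
print(narrow_price_range(feedback))  # → (175, 225)
```

As in Fermi estimation, each rejected probe halves the uncertainty, so a usable frame of reference emerges after only a few exchanges.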
8.
Data mining, databases, machine learning, and visual analytics contribute to fulfilling clear tasks.
Objective tasks
8
Find the correlation between the loan approval and the investees’ demographics. → ML techniques such as linear regression.
Find groups of investees with success stories. → Pattern mining techniques such as frequent item-set mining.
Compare the survival rate in high and low air-polluted areas. → Statistics and visual analytics, such as Kaplan-Meier charts.
Find a point-and-shoot camera with 4K video resolution and a price lower than $100. → SQL querying.
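The camera task above is fully objective, so plain SQL suffices. A minimal sqlite3 sketch, with a hypothetical `cameras` table and made-up rows:

```python
# A minimal sqlite3 sketch of the objective camera task above, with a
# hypothetical `cameras` table (names and prices are made up):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cameras (name TEXT, video_res TEXT, price REAL)")
conn.executemany("INSERT INTO cameras VALUES (?, ?, ?)", [
    ("CamA", "4K", 89.0),     # matches both constraints
    ("CamB", "1080p", 59.0),  # wrong resolution
    ("CamC", "4K", 129.0),    # too expensive
])
rows = conn.execute(
    "SELECT name FROM cameras WHERE video_res = '4K' AND price < 100"
).fetchall()
print(rows)  # → [('CamA',)]
```

No interaction is needed: every constraint is crisp, so a single declarative query settles the task.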
9.
Typically, an AI pipeline with ML components can easily perform objective tasks.
Status quo for objective tasks
9
Raw data → Data Preparation (ETL) → Black-box Machine Learning → Data Presentation → User
Inductiv
10.
Typical AI pipelines lack personalization, customization, and transparency, because the user is
left out of the loop.
Shortcomings of AI pipelines for subjective tasks
10
Raw data → AI system → User
The pipeline lacks: user context (personalization), user feedback (customization), and explanation (transparency). Focus of today’s talk!
11.
• Inspired by Fermi estimation, incorporating interactivity between human and artificial intelligence in AI pipelines can be a solution for subjective information-seeking tasks.
• Two big questions about interactivity …
How to formalize interactions?
How can the interaction between users and the AI agent happen?
How to learn interactions?
Can these interactions get automated? Can AI predict user decisions? Can AI assist users in these
interactions?
Interactivity for subjective information seeking
11
12.
Formalizing interactions
Part 1
13.
• Moravec’s paradox. Machines and humans frequently have opposite strengths and weaknesses: machines excel at reasoning, humans at sensing and perception.
• An interactive pipeline becomes a mixed-initiative system.
Proposed model
13
Raw data → AI agent (as an ML component in the pipeline) ⇄ User: the agent guides, and the user explores and provides feedback.
[Hans Moravec, Harvard University Press 1988]
14.
• In a mixed-initiative system, there is a conversation between the user and the AI agent.
• The system guides the user in the plethora of data points, and the user provides feedback on
how this guidance should be performed.
• In such a system, both the user and the agent grow together toward converging on a target.
Interactions in mixed-initiative system
14
User ⇄ AI agent, converging on a target: the agent’s guidance moves the user towards clarifying subjective tasks, and the user’s feedback moves the agent towards comprehending user needs.
15.
Assume a program committee (PC) chair wants to build a PC for a conference, formed by
geographically distributed male and female researchers with different seniority and dexterity
levels.
Example: PC formation
15
By-example exploration (IUGA): exploration is expressed as similarity with an example group, via explore-around and explore-within. [Omidvar-Tehrani et al., CIKM’15] [VLDBJ’19]
User (PC chair): build a program committee for WebDB’14 formed by geographically distributed male and female researchers with different seniority and dexterity levels.
Iteration 1: starting from ⟨prolific, high publi., SIGMOD⟩ (29 members, containing L. Popa, A. Doan, M. Benedikt, and S. Amer-Yahia), explore-around returns groups such as ⟨junior, high publi.⟩ (46) and ⟨prolific, high publi., ACM⟩ (46).
Iteration 2: explore-within returns groups such as ⟨productive, temporal databases⟩ (11), ⟨highly senior, VLDB⟩ (119), and ⟨SIGMOD, schema matching, male⟩, the latter containing F. Bonchi, K. Chakrabarti, P. Fraternali, and F. Naumann.
Program committee (not exhaustive): Sihem Amer-Yahia, Denilson Barbosa, Michael Benedikt, Francesco Bonchi, Kaushik Chakrabarti, Lei Chen, Piero Fraternali, Felix Naumann, Paolo Papov, Lucian Popa, Martin Theobald, Fei Wu.
16.
Assume a tourist (who is a cold user) is walking in the area of the Pompidou center in Paris. After
30 minutes of walking, she gets tired and asks the system for “me time” recommendations.
Another example: cold-start POI recommendation
16
[SIGSPATIAL’20]
Step 1. The user asks, “I look for ‘me time’.” The system replies with visitor groups: visitors who have many friends, check in actively, and tend to visit historical landmarks; visitors who post many photos and tend to visit Asian restaurants; visitors who have many friends and tend to visit coffee shops. The user clicks on a yellow POI.
Step 2. The user says, “I’m now hungry.” The system replies with: visitors who have many friends and visit restaurants in the evenings; visitors who visit Modern Art Museums; visitors with many check-ins who visit shopping centers.
17.
• User’s feedback is provided using exploration actions.
• Given a set of data cubes as guidance, the user expresses her interest in what she wants to
receive in the next iteration using an exploration action (feedback).
• The function explore(d,k,e) takes as input a data cube d (the feedback), an integer k, and an exploration action e, and returns k other cubes as guidance, called D.
Feedback
17
Iteration t: feedback d = ⟨prolific, high publi, SIGMOD⟩ with exploration action e = explore-around.
Iteration t+1: D = explore(d,k,e) with k=3 returns ⟨SIGMOD, schema matching, male⟩, ⟨highly senior, VLDB⟩, and ⟨productive, temporal databases⟩.
18.
• Exploration actions are combinations of “interestingness” measures that steer users towards a particular cube set of interest. They capture representativeness and informativeness.
Exploration actions
18
The user provides feedback d at iteration t and receives guidance D at iteration t+1. [TKDE’19] [VLDBJ’18]
Exploration actions and their semantics:
• explore around: constrain Jaccard similarity with d and maximize diversity in D.
• explore within: stay inside d and maximize coverage in D.
• by distribution: constrain Earth Mover’s Distance with the score distributions in d and maximize diversity in D.
• by topic: constrain cosine similarity with the LDA topics in d and maximize diversity in D.
19.
Example of exploration actions
19
Feedback cube d = ⟨prolific, high publi, SIGMOD⟩; candidate cubes c1 = ⟨SIGMOD, schema matching, male⟩, c2 = ⟨highly senior, VLDB⟩, c3 = ⟨productive, temporal databases⟩.
explore around, given a similarity threshold of 0.7:
• Local constraint: Jacc(d,c1) = 0.81, Jacc(d,c2) = 0.93, Jacc(d,c3) = 0.74; a cube c4 with Jacc(d,c4) = 0.65 is pruned.
• Global constraint: diversity({c1,c2,c3}) → max.
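The explore-around selection above, a local Jaccard constraint plus a global diversity objective, can be sketched as follows. The cube contents and the greedy diversity heuristic are illustrative assumptions, not the paper's exact algorithm:

```python
# A sketch of the explore-around action above: keep candidate cubes whose
# Jaccard similarity with the feedback cube d clears the threshold (local
# constraint), then greedily pick up to k mutually diverse cubes (global
# constraint). Cubes are modeled as attribute sets; the greedy heuristic
# is an illustrative choice, not the exact optimization from the talk.

def jaccard(a, b):
    return len(a & b) / len(a | b)

def explore_around(d, candidates, k=3, threshold=0.7):
    # Local constraint: similarity with the feedback cube d.
    pool = [c for c in candidates if jaccard(d, c) >= threshold]
    # Global constraint: greedily maximize pairwise diversity (1 - Jaccard).
    chosen = []
    while pool and len(chosen) < k:
        best = max(pool, key=lambda c: min(
            (1 - jaccard(c, x) for x in chosen), default=1.0))
        chosen.append(best)
        pool.remove(best)
    return chosen

d = {"prolific", "high publi", "SIGMOD"}
candidates = [d | {"male"}, d | {"VLDB"}, d | {"junior"},
              {"prolific", "temporal databases"}]  # last one is pruned
print(len(explore_around(d, candidates)))  # → 3
```

The other actions would swap in a different local measure (Earth Mover's Distance, cosine over LDA topics) or replace diversity with coverage, keeping the same select-then-diversify skeleton.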
20.
Effect of exploration actions: a bar chart of the number of iterations (0-16) needed to reach 50% and 80% of the PC, for WebDB'14, SIGMOD'14, VLDB'14, and CIKM'14.
21.
Learning interactions
Part 2
22.
• Per Robin Hogarth (Economics Professor at UPF Barcelona), there exist two learning environments: kind and wicked.
• Kind (or closed-world) environments need tactics, which are obtained through repetition and practice.
• In chess or tennis, practice can make a grandmaster.
• ML systems are good at imitating kind environments.
• Wicked (or open-world) environments need strategy, which differs from tactics and may even contradict them.
• Even a senior financial expert may get surprised by a share investment situation.
• ML systems do not imitate wicked environments well, due to the lack of logs.
• Information seeking often occurs in a wicked environment.
Learning interactions is challenging
22
Judit Polgár
23.
• Exploration state. The exploration state at iteration t is denoted st = (dt, Dt), where Dt is obtained by applying et-1 to dt-1.
• Exploration session. A sequence of exploration states and actions S = [(s1,e1)…(sn,en)].
• Policy. A state-action mapper π(st) = et.
• Objective. Find the optimal policy π*.
Learning interactions
23
The user, with a sub-optimal policy, asks for the next action; the AI agent, with an optimal policy, answers with the optimal action, steering the session toward the target.
24.
• A good exploration policy collects more rewards along the way.
• Cube utility. Given a cube d and an exploration target T, d’s utility is symbolically shown as d ∩ T, and can be computed by any similarity measure.
• Reward. Given the current state st and current action et, the reward of transitioning to another state st+1, denoted R(st+1|st,et), is equal to the cube utility of dt+1.
• Hence, an optimal policy is one that maximizes the discounted cumulative reward.
What is a good exploration policy?
24
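For a finite session, the discounted cumulative reward that the optimal policy maximizes is G = Σt γ^t · rt over the per-step cube utilities. A two-line sketch:

```python
# The discounted cumulative reward of a finite session, given the per-step
# cube utilities as rewards r_t: G = sum_t gamma^t * r_t.

def discounted_return(rewards, gamma=0.9):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_return([0.0, 0.5, 1.0]))  # 0 + 0.9*0.5 + 0.81*1.0 ≈ 1.26
```

The discount γ < 1 favors policies that reach high-utility cubes in fewer iterations, which matches the goal of short exploration sessions.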
25.
• Offline phase. An RL agent simulating a human analyst is trained to learn an optimal policy. The policy is updated as the agent interacts with the data cubes via exploration actions. Action selection is based on the ε-greedy method (optimal under GLIE).
• Online phase. Once a policy is learned, it is provided to a user who applies it to generate an interpretable exploration session.
General solution
25
Offline phase: the agent explores and updates the policy in a loop.
Online phase: the learned policy recommends an action; the user explores with the recommended action and returns feedback.
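The ε-greedy action selection used in the offline phase can be sketched as below; the action names follow the talk, while the Q-values are hypothetical:

```python
# A sketch of ε-greedy action selection: with probability ε pick a random
# exploration action, otherwise the action with the highest estimated value.
# (Q-values below are hypothetical; under GLIE, ε would decay over
# iterations rather than stay fixed as it does here.)
import random

ACTIONS = ["explore-around", "explore-within", "by-distribution", "by-topic"]

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)            # explore
    return max(ACTIONS, key=q_values.get)     # exploit

# With ε = 0 the choice is purely greedy:
print(epsilon_greedy({"explore-around": 0.2, "explore-within": 0.1,
                      "by-distribution": 0.23, "by-topic": 0.07},
                     epsilon=0.0))  # → by-distribution
```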
26.
• The optimal policy requires that the value function V and the action-value function Q are maximized.
• Following the Bellman equations: V*(st) = maxe Q*(st,e) and Q*(st,et) = R(st+1|st,et) + γ · V*(st+1).
• An optimal policy is obtained when we know V* or Q*.
Obtaining the optimal policy
26
27.
• Typically, RL learns a tabular matrix for Q(s,e).
• Instead, we learn Q(s,e) = θ · f(s,e), where θ is a learned weight vector and f is the state-feature function.
• Features include (but are not limited to) diversity, coverage, size, previously seen targets, distribution, and the previous action.
27
28.
• To learn θ, we apply SGD to minimize the mean squared error between predicted and observed Q-values.
• Updates are done incrementally with SARSA: θ ← θ + α · (rt+1 + γ · Q(st+1,et+1) − Q(st,et)) · f(st,et).
28
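The linear SARSA update can be sketched with plain lists; the feature vectors below are hypothetical toy values:

```python
# A sketch of the linear SARSA update described above: Q(s,e) = θ · f(s,e),
# and θ moves along the TD error δ = r + γ·Q(s',e') − Q(s,e) scaled by the
# feature vector. Feature values here are hypothetical toys.

def q_value(theta, feats):
    # Q(s,e) as a dot product of weights and state-action features.
    return sum(t * f for t, f in zip(theta, feats))

def sarsa_update(theta, feats, reward, next_feats, alpha=0.1, gamma=0.9):
    delta = reward + gamma * q_value(theta, next_feats) - q_value(theta, feats)
    return [t + alpha * delta * f for t, f in zip(theta, feats)]

theta = [0.0, 0.0]
theta = sarsa_update(theta, feats=[1.0, 0.0], reward=1.0, next_feats=[0.0, 1.0])
print(theta)  # → [0.1, 0.0]
```

Only the weights of the features active in the current state-action pair move, which is what makes the learned policy interpretable feature by feature, as in the example session that follows.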
29.
An example exploration session
29
At each iteration, the active state features score the three exploration actions (explore-around, by-distribution, by-topic) through the learned weights, and the action with the highest total is recommended.
Beginning of the session (# discovered PC: 0). Active features: support < 200, no conferences, demographic attributes, recent papers, 0-1 targets, no seniority, reward = 0. Sums: explore-around 0.2, by-distribution 0.23, by-topic 0.07 → by-distribution is recommended.
Next iteration (# discovered PC: 0, then 1). Returned groups: [female, very productive, Europe, SIGIR], [CEUR Workshop Proceedings, very productive, Europe], [female, very productive, SIGIR], [female, Europe], [female, EDBT, very productive], [Asia, very active, male], [Europe, confirmed, ICDE], [UK/Ireland, very productive, male], [ICDM, North America, ICDE], [productive, female, Europe]. Active features: diversity > 0.8, support < 50, conference in label, demographic attributes, recent papers, 0-1 targets, seniority, reward = 0. Sums: explore-around 0.19, by-distribution 0.15, by-topic 0.11 → explore-around is recommended.
Final iteration (# discovered PC: 2). Returned groups: [female, highly senior, Enc. of DB], [VLDB J., senior, IEEE, PVLDB], [Asia, EDBT, confirmed, male], [highly senior, very productive, Europe, Enc. of DB], [GRADES], [female, Europe, confirmed, PVLDB], [female, EDBT, SIGMOD], [female, Europe, Enc. of DB], [female, prolific, ICDE], [Europe, confirmed, ICDE]. Active features: diversity > 0.8, support < 50, conference in label, demographic attributes, recent papers, 2-3 targets, no seniority, reward = 0. Sums: explore-around 0.19, by-distribution 0.06, by-topic 0.26 → by-topic is recommended.
30.
• We examine the target spread in 1, 2, 4, and 10 cubes.
Target spread
30
31.
• We examine the impact of training and test datasets.
• Left is WebDB PC and right is SIGMOD.
Potentials of transfer learning
31
32.
• We examine the impact of features.
• We consider the feature “the number of targets discovered so far”.
• Left is WebDB 2017 PC and right is SIGMOD 2017 PC.
Experiment on decision making
32
33.
• Human-in-the-loop data analytics is a must for today’s ML applications.
• Users and AI systems can cooperate to reach a common goal.
• Reinforcement learning is an appropriate model to capture the dynamics of interactions between users and AI systems.
• Transfer learning (from general models to more specialized domains) shows great potential for future interactive systems.
Conclusion
33
34.
Collaborators
34
Sihem Amer-Yahia (CNRS, University of Grenoble Alpes), Eric Simon (SAP Paris), Alexandre Termier (Inria Rennes), Ria Mae Borromeo (University of the Philippines), Mariia Seleznova (TU Berlin)
35.
Question Time