Slides of CIKM 2023 paper by Muntasir Hoq, Sushanth Reddy Chilla, Melika Ahmadi Ranjbar, Peter Brusilovsky and Bita Akram
https://dl.acm.org/doi/10.1145/3583780.3615047
Data-centric AI and the convergence of data and model engineering: opportunit... by Paolo Missier
A keynote talk given at the IDEAL 2023 conference (Evora, Portugal, Nov 23, 2023).
Abstract.
The past few years have seen the emergence of what the AI community calls "Data-centric AI", namely the recognition that some of the limiting factors in AI performance are in fact in the data used for training the models, as much as in the expressiveness and complexity of the models themselves. One analogy is that of a powerful engine that will only run as fast as the quality of the fuel allows. A plethora of recent literature has begun to explore the connection between data and models in depth, along with startups that offer "data engineering for AI" services. Some concepts are well-known to the data engineering community, including incremental data cleaning, multi-source integration, or data bias control; others are more specific to AI applications, for instance the realisation that some samples in the training space are "easier to learn from" than others. In this "position talk" I will suggest that, from an infrastructure perspective, there is an opportunity to efficiently support patterns of complex pipelines where data and model improvements are entangled in a series of iterations. I will focus in particular on end-to-end tracking of data and model versions, as a way to support MLDev and MLOps engineers as they navigate through a complex decision space.
Automating Software Development Using Artificial Intelligence (AI) by Jeremy Bradbury
In recent years, traditional software development activities have been enhanced through the use of Artificial Intelligence (AI) techniques including genetic algorithms, machine learning and deep learning. The use cases for AI in software development have ranged from developer recommendations to complete automation of software developer activities. To demonstrate the breadth of application, I will present several recent examples of how AI can be leveraged to automate software development. First, I will present an approach to predicting future code changes in GitHub projects using historical data and machine learning. Next, I will present our framework for repairing multi-threaded software bugs using genetic algorithms. I will conclude with a broad discussion of the impact AI is having on software development.
Scikit-Learn is a powerful machine learning library implemented in Python on top of the numeric and scientific computing powerhouses NumPy, SciPy, and Matplotlib, enabling extremely fast analysis of small- to medium-sized data sets. It is open source, commercially usable, and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason, Scikit-Learn is often the first tool in a Data Scientist's toolkit for machine learning on incoming data sets.
The purpose of this one-day course is to serve as an introduction to machine learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product, an actionable model that can be used in larger programs or algorithms, rather than simply as a research or investigation methodology.
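As a minimal sketch of the kind of workflow the course covers (the dataset and parameter choices here are illustrative, not the course's actual materials):

```python
# A minimal Scikit-Learn classification workflow: load a bundled dataset,
# hold out a test split, fit a model, and score it on unseen data.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)        # learn from the training split
y_pred = model.predict(X_test)     # predict labels for unseen samples
acc = accuracy_score(y_test, y_pred)
```

The same fit/predict interface applies across Scikit-Learn's estimators, which is what makes swapping algorithms in a data product straightforward.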
The Computer Programming in C++ book helps readers understand the concepts of C++, the differences between C and C++, and the basic fundamentals of object-oriented programming. This book is ideal for software developers who are looking to develop their careers in the field of programming.
This Book Covers:
Fundamentals of C++ Programming Language
Difference between C and C++
Data Input / Output Processes and Flow Controls
Arrays, Functions, and Pointers
Structures and Unions
Abstraction, Encapsulation, Inheritance, and Polymorphism
Classes and Objects
Constructors and Destructors
Concepts of Binding and Overloading
ISBN: 978-81-7722-830-4
Price: Rs. 399/- w/CD
This talk was presented at Startup Master Class 2017 (http://aaiitkblr.org/smc/) at Christ College, Bangalore. It was hosted by the IIT Kanpur Alumni Association and co-presented by the IIT KGP Alumni Association, IITACB, PanIIT, IIMA, and IIMB alumni.
My co-presenter was Biswa Gourav Singh, and Navin Manaswi was a contributor.
A timeline for neural networks: http://dataconomy.com/2017/04/history-neural-networks/
Manta ray optimized deep contextualized bi-directional long short-term memor... by IJECEIAES
Complex question answering (CQA) is used for human knowledge answering and community question answering. A CQA system is essential for overcoming the complexities present in question answering systems. Existing techniques ignore the structure of queries, resulting in a significant number of noisy queries. Complex queries, distributed knowledge, composite approaches, templates, and ambiguity are the common challenges faced by CQA. To address these issues, this paper presents a new manta ray foraging optimized deep contextualized bidirectional long short-term memory based adaptive galactic swarm optimization (MDCBiLSTMAGSO) for CQA. First, the input question is preprocessed and a similarity assessment is performed to eliminate misclassification. The extracted keywords are then mapped to candidate results to improve answer selection. Next, a new similarity approach named InfoSelectivity is introduced for semantic similarity evaluation based on the closeness among elements. The relevant answers are then classified by the MDCBiLSTM and optimized by a new manta ray foraging optimization (MRFO). Finally, adaptive galactic swarm optimization (AGSO) selects the best output. The proposed scheme is implemented on the Java platform, and the designed approach achieves better results than existing approaches, with an average accuracy of 98.2%.
Memory Efficient Graph Convolutional Network based Distributed Link Prediction by miyurud
Graph Convolutional Networks (GCNs) have found multiple applications in graph-based machine learning. However, training GCNs on large graphs of billions of nodes and edges with rich node attributes consumes a significant amount of time and memory, which makes it impossible to train such GCNs on general-purpose commodity hardware. Such use cases demand high-end servers with accelerators and ample amounts of memory. In this paper we implement a memory-efficient GCN-based link prediction on top of a distributed graph database server called JasmineGraph. Our approach is based on federated training on partitioned graphs with multiple parallel workers. We conduct experiments with three real-world graph datasets: DBLP-V11, Reddit, and Twitter. We demonstrate that our approach produces optimal performance for a given hardware setting. JasmineGraph was able to train a GCN on the largest dataset, DBLP-V11 (>10 GB), in 20 hours and 24 minutes for 5 training rounds and 3 epochs by partitioning it into 16 partitions with 2 workers on a single server, while the conventional training method could not process it at all due to lack of memory. The second largest dataset, Reddit, took 9 hours and 8 minutes to train conventionally, while JasmineGraph took only 3 hours and 11 minutes with 8 partitions and 4 workers on the same hardware, a 3x improvement. In the case of the Twitter dataset, JasmineGraph gave a 5x improvement (10 hours 31 minutes vs. 2 hours 6 minutes; 16 partitions, 16 workers).
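The partition-and-federate idea in the abstract can be sketched in a few lines. This is a hedged illustration, not JasmineGraph's actual code: the per-worker "training" step is a stand-in scalar computation so the sketch stays self-contained, and all function names are invented for this example.

```python
# Sketch of federated training over graph partitions: split the edge list
# across workers, train locally on each partition, then average the
# workers' parameters into a global model.

def partition_edges(edges, k):
    """Assign each edge to one of k partitions by hashing its source node."""
    parts = [[] for _ in range(k)]
    for u, v in edges:
        parts[hash(u) % k].append((u, v))
    return parts

def local_train(part_edges):
    """Stand-in for per-worker GCN training: returns the mean node degree
    within the partition as a single toy 'parameter'."""
    deg = {}
    for u, v in part_edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return sum(deg.values()) / max(len(deg), 1)

def federated_average(params):
    """Combine worker parameters into a global model (simple mean)."""
    return sum(params) / len(params)

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 0)]
parts = partition_edges(edges, k=2)
global_param = federated_average([local_train(p) for p in parts])
```

In a real GCN setting each worker would hold a copy of the model weights and the averaging step would run over weight tensors per training round, but the control flow is the same.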
https://jst.org.in/index.html
Our journal fosters a sense of community among researchers, scholars, and academics. It provides a space for intellectual exchange, allowing scholars to engage with each other's work, provide constructive feedback, and build upon existing knowledge.
Invited talk at the 5th International Workshop on Search-Oriented Conversational AI (SCAI) @EMNLP2020. Here is the recording https://slideslive.com/38940054/response-generation-and-retrieval-for-multimodal-conversational-ai
Machine and Deep Learning Application.
Applying big data learning techniques for a malware classification problem.
Code:
https://gist.github.com/indraneeld/7ffb182fd8eb87d6d463dedc001efad0
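As a hedged sketch of a classification setup of this kind (the actual code is in the gist linked above; the synthetic features below merely stand in for real flow-level features such as those in CIC datasets):

```python
# Toy malware-vs-benign classification: two synthetic "flow feature"
# clusters and a random forest evaluated with 5-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
benign = rng.normal(loc=0.0, scale=1.0, size=(n, 8))    # label 0
malware = rng.normal(loc=1.5, scale=1.0, size=(n, 8))   # label 1
X = np.vstack([benign, malware])
y = np.array([0] * n + [1] * n)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)   # accuracy per fold
mean_acc = scores.mean()
```

Real traffic features are far less separable than this toy data, which is why feature engineering and class balancing dominate the effort in practice.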
Acknowledgments:
Canadian Institute for Cybersecurity (CIC) project in collaboration with Canadian Centre for Cyber Security (CCCS).
Efficient Machine Learning and Machine Learning for Efficiency in Information... by Bhaskar Mitra
Emerging machine learning approaches, including deep learning methods, for information retrieval (IR) have recently demonstrated significant improvements in accuracy of relevance estimation at the cost of increasing model complexity and corresponding rise in computational and environmental costs of training and inference. In web search, these costs are further compounded by the necessity to train on large-scale datasets, consume long documents as inputs, and retrieve relevant documents from web-scale collections within milliseconds in response to high volume query traffic. A typical playbook for developing deep learning models for IR involves largely ignoring efficiency concerns during model development and then later scaling these methods by either finding faster approximations of the same models or employing heuristics to reduce the input space over which these models operate. Domain knowledge about the specific IR task and deeper understanding of system design and data structures in whose context these models are deployed can significantly help with not only model simplification but also to inform data-structure specific machine learning model design. Alternatively, predictive machine learning can also be employed specifically to improve efficiency in large scale IR settings. In this talk, I will cover several case studies for both improving efficiency of machine learning models for IR as well as direct application of machine learning to improve retrieval efficiency, and conclude with a brief discussion on potential future directions for efficiency-sensitive benchmarking of machine learning models for IR.
Program code examples (also known as worked examples) play a crucial role in learning how to program. Instructors use examples extensively to demonstrate the semantics of the programming language being taught and to highlight fundamental coding patterns. Programming textbooks allocate considerable space to presenting and explaining code examples. To make the process of studying code examples more interactive, CS education researchers developed a range of tools to engage students in the study of code examples. These tools include codecasts (codemotion, codecast, elicasts), interactive example explorers (WebEx, PCEX), and tutoring systems (DeepTutor). An important component in all types of worked examples is the code explanations associated with specific code lines or code chunks of an example. The explanations connect examples with general programming knowledge, explaining the role and function of code fragments or their behavior. In textbooks, these explanations are usually presented as comments in the code or as explanations in the margins. The example explorer tools allow students to examine these explanations interactively. Tutoring systems, which engage students in explaining the code, use these model explanations to check student responses and provide scaffolding. In all these cases, to make a worked example re-usable beyond its presentation in a lecture, the explanations have to be authored by instructors or domain experts, i.e., produced and integrated into a specific system. As the experience of the last 10 years has demonstrated, these explanations are hard to obtain. Those already collected are usually “locked” in a specific example-focused system and can't be reused. The purpose of this working group is to support the broader re-use of worked examples augmented with explanations. Our current plan is to develop a standard approach to representing explained examples.
This approach will enable an example created for any of the existing systems to be exported in a standard format and imported into any other example-focused system. We plan to follow the successful experience of the PEML working group focused on re-using programming exercises.
Personalized Learning: Expanding the Social Impact of AI by Peter Brusilovsky
Slides of my keynote talk at the SIAIA '23 workshop held at AAAI 2023:
The use of AI in education can be traced to the early days of AI. While the publicity associated with the most recent wave of AI applications rarely mentions education, it is through the improvement of education that AI could achieve an impressive social impact. In particular, AI's ability to personalize the learning process could make a large difference in a context where knowledge can differ radically from learner to learner. Modern computer and internet technologies can now bring the power of learning, in the form of MOOCs, online textbooks, and Zoom courses, truly worldwide. Yet, without personalization, the potential of these technologies is not fully leveraged. In this talk, I will review several generations of research on personalized learning and discuss tools, technologies, and infrastructures for personalized learning that we are currently exploring.
Action Sequence Mining and Behavior Pattern Analysis for User Modeling by Peter Brusilovsky
Slides of my talk at the 2022 Workshop on Temporal Aspects of User Modeling.
Tracing learner interaction with educational content has recently emerged as a centerpiece of learning analytics. Augmented by various data mining technologies, learner data has been used to predict learner success and failure, prevent dropouts, and inform university officials about student progress. While the majority of existing learning analytics approaches ignore the time aspect of the learning data, recent research has indicated that not just what learners do, but how and in which order they do it, is critical to understanding differences between learners, modeling their behavior, and predicting their performance. In my talk, I will focus on the application of action sequence mining as a tool to extract temporal patterns of learning behavior and recognize cohorts of learners with divergent behavior. I will review three case studies of using sequence mining with learner data, present the obtained results, and discuss their importance for user modeling and personalization.
Tutorial at UMAP 2022:
In recent years, the use of Artificial Intelligence (AI) technologies has expanded to many areas where they directly affect the lives of many people. AI-based approaches advise human decision-makers on who should be released on bail, whether it is a good time to discharge a patient from a hospital, and whether a specific student is at risk of failing a course. Such extensive use of AI in decision making came with a range of potential problems that have been extensively studied over the last few years. Recognition of these problems motivated a rapid rise of research on “human-centered AI”, which attempts to address and minimize the negative effects of using AI technologies. Among the ideas of human-centered AI is user control: engaging users in affecting AI decision making to prevent possible errors and biases. In my talk, I will focus on the application of user control in one popular area of AI application, adaptive information access. Adaptive information access systems, such as personalized search and recommender systems, attempt to model their users to help them find the most relevant information. Yet, user modeling and personalization mechanisms might not always work as expected, resulting in errors, biases, and suboptimal behavior. Combining the decision power of AI with the ability of the user to guide and control it brings together the strong sides of artificial and human intelligence and could lead to better results. This tutorial will provide a systematic review of approaches focused on adding various kinds of user control to adaptive information access systems and discuss lessons learned, prospects, and challenges of this direction of research.
Human-Centered AI in AI-ED - Keynote at AAAI 2022 AI for Education workshop by Peter Brusilovsky
Abstract: In recent years, the use of Artificial Intelligence (AI) technologies has expanded to many areas directly affecting the lives of millions. AI-based approaches advise human decision-makers on who should be released on bail, whether it is a good time to discharge a patient from a hospital, and whether a specific student is at risk of failing a course. Such extensive use of AI in decision making came with a range of potential problems that have been extensively studied over the last few years. Recognition of these problems motivated a rapid rise of research on “human-centered AI”, which attempts to address and minimize the negative effects of using AI technologies. The majority of work on human-centered AI focuses on various types of human-AI collaboration through such technologies as transparency, explainability, and user control. In my talk, I will review how the ideas of human-AI collaboration, transparency, explainability, and user control have been used in educational applications of AI in the past, and will discuss how new ideas in this research area developed outside of AI-Ed could be creatively applied in an educational context.
User Control in AIED (Artificial Intelligence in Education) by Peter Brusilovsky
Slides of my intro to the "Meet the Expert" session at AIED 2021. This is a subset of the slides of a longer presentation on user control in AI, extended with many specific examples from the AIED area.
The Return of Intelligent Textbooks - ITS 2021 keynote talk by Peter Brusilovsky
Early research on hypermedia learning and Web-based education featured a strong stream of work on intelligent and adaptive textbooks, which combined the knowledge modeling ideas from the field of intelligent tutoring with the rich linking offered by hypermedia and the Web. However, over the next ten years, from 2005 to 2015, this area was relatively quiet as the focus of research in e-learning shifted to other topics and other creative ideas for leveraging the power of the Internet. A recent gradual shift of the whole publication industry from printed books to electronic books, followed by a rapid growth of the volume of online books, re-ignited interest in “more intelligent” textbooks. The research on the new generation of intelligent textbooks engaged a larger set of technologies and attracted scholars from a broader range of areas, including machine learning, natural language understanding, social computing, etc. In my talk I will review the past and present of research on intelligent textbooks, from its origins to the diverse modern work, providing examples of the most interesting technologies and research results.
Data-Driven Education 2020: Using Big Educational Data to Improve Teaching an... by Peter Brusilovsky
Modern educational settings, from regular classrooms to MOOCs, produce a rapidly increasing volume of data that captures the individual learning progress of millions of students at different levels of granularity. The presence of this data opens a unique opportunity to re-engineer traditional education and to develop a range of efficient data-driven approaches to support teaching and learning. In my talk, I will present several ways to use big educational data explored in our lab. The focus will be on open social learner modeling and on identifying individual differences through sequential pattern mining, but several other approaches will be mentioned. Open social learner modeling and sequential pattern mining provide two considerably different examples of using educational data. One offers an immediate use of class interaction history to make content access more engaging, while the other shows how big data can be used to uncover important individual differences that could be used to optimize the process for individual learners.
Two Brains are Better than One: User Control in Adaptive Information AccessPeter Brusilovsky
In recent years, the use of Artificial Intelligence (AI) technologies has expanded to many areas where they directly affect people's lives. AI-based approaches advise human decision-makers on who should be released on bail, whether it is a good time to discharge a patient from a hospital, and whether a specific student is at risk of failing a course. Such extensive use of AI in decision making came with a range of potential problems that have been extensively studied over the last few years. Recognition of these problems motivated a rapid rise of research on "human-centered AI", which attempts to address and minimize the negative effects of using AI technologies. Among the ideas of human-centered AI is user control: engaging users in AI decision making to prevent possible errors and biases. In my talk, I will focus on the application of user control in one popular area of AI application, adaptive information access. Adaptive information access systems, such as personalized search and recommender systems, attempt to model their users to help them find the most relevant information. Yet, user modeling and personalization mechanisms might not always work as expected, resulting in errors, biases, and suboptimal behavior. Combining the decision power of AI with the ability of the user to guide and control it brings together the strengths of artificial and human intelligence and could lead to better results. In my talk, I review several projects focused on user control in adaptive information access systems and discuss the benefits and challenges of this approach.
Personalized Online Practice Systems for Learning ProgrammingPeter Brusilovsky
Computer programming is quickly transitioning from being just a key competency in computer and information science majors to being a desired skill for students in a wide range of fields. Yet, it is also one of the most challenging subjects to learn. While learning by doing is a critical component in mastering programming skills, neither the traditional educational process nor standard learning support tools provide sufficient opportunities for programming practice. In this talk, I will present our research on personalized programming practice systems for Java, Python, and SQL, which attempt to bridge this known gap in learning programming. A programming practice system engages students in practicing programming skills beyond a relatively small number of graded assignments and exams. To support learning by doing, an online practice system offers a range of interactive "smart content", such as program animations, worked examples, and various kinds of programming problems with automatic assessment. The main challenges for online practice systems are to motivate students to practice and to guide them to the most appropriate smart content given their course goals and knowledge levels. In this talk, I will review a range of AI technologies, such as student modeling, navigation support, social comparison, and content recommendation, which support efficient programming practice. I will also discuss how personalized practice systems could support the COVID-19-influenced switch to online learning while maintaining the extensive level of feedback expected from an efficient learning process.
UMAP 2019 talk Evaluating Visual Explanations for Similarity-Based Recommenda...Peter Brusilovsky
Tsai, Chun-Hua, and Peter Brusilovsky. 2019. "Evaluating Visual Explanations for Similarity-Based Recommendations: User Perception and Performance." In the 27th ACM Conference on User Modeling, Adaptation and Personalization, UMAP 2019, 22-30. Larnaca, Cyprus: ACM.
Course-Adaptive Content Recommender for Course AuthoringPeter Brusilovsky
Developing online courses is a complex and time-consuming
process that involves organizing a course into a sequence of topics and
allocating the appropriate learning content within each topic. This task
is especially difficult in complex domains like programming, due to the
incremental nature of programming knowledge, where new topics extensively
build upon domain concepts that were introduced in earlier lessons.
In this paper, we propose a course-adaptive content-based recommender
system that assists course authors and instructors in selecting the most
relevant learning material for each course topic. The recommender system
adapts to the deep prerequisite structure of the course as envisioned
by a specific instructor, while unobtrusively deducing that structure from
problem-solving examples that the instructor uses to present course concepts.
We assessed the quality of recommendations and examined several
aspects of the recommendation process by using three datasets collected
from two different courses. While the presented recommender system was
built for the domain of introductory programming, our course-adaptive
recommendation approach could be used in a variety of other domains.
Data-Driven Education: Using Big Educational Data to Improve Teaching and Learning. Keynote slides for 15th International Conference on Web-Based Learning, ICWL 2016, Rome, Italy, October 26–29.
From Expert-Driven to Data-Driven Adaptive LearningPeter Brusilovsky
Keynote slides for the Workshop on Advancing Education with Data at the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, Aug 14, 2017
SANN: Programming Code Representation Using Attention Neural Network with Optimized Subtree Extraction
1. SANN: Programming Code Representation
Using Attention Neural Network with
Optimized Subtree Extraction
Muntasir Hoq, Sushanth Reddy Chilla, Melika Ahmadi Ranjbar,
Peter Brusilovsky, Bita Akram
32nd ACM International Conference on Information and Knowledge Management (CIKM) 2023
2. • Programming code representation enables various intelligent functionalities
• Code classification [White et al. 2016; Bui et al. 2018]
• Bug detection [Elmishali et al. 2019; Shi et al. 2021]
• Code summarization [Jiang et al. 2017; Abdelaziz et al. 2022]
• Automated programming code analysis tools can help in CS education
• Understanding student knowledge (what are the key concepts/skills/competencies)
• Tracking student learning (student modeling, knowledge tracing)
• Identifying misconceptions
• Providing personalized guidance
• This study aims to develop a representation model by
• Capturing task-relevant information dynamically
• Dealing with sparse solution space and larger programs
• Ensuring interpretability
Motivation
3. Student Modeling with
Concept-level Code Representation
1. Extract concepts (unigrams) from problem or student code
• Hosseini, R. and Brusilovsky, P. (2013) JavaParser: A Fine-Grain Concept Indexing Tool for Java Problems.
In: Proceedings of The First Workshop on AI-supported Education for Computer Science (AIEDCS) at the
16th Annual Conference on Artificial Intelligence in Education, Memphis, TN, USA, pp. 60-63.
2. Use student modeling approaches to maintain mastery levels for
concepts as students work with problems (in any order)
• Barria-Pineda, J., Guerra, J., Huang, Y., and Brusilovsky, P. (2017) Concept-Level Knowledge Visualization
for Supporting Self-Regulated Learning. In: Proceedings of Companion of the 22nd International
Conference on Intelligent User Interfaces (IUI '17), Limassol, Cyprus, ACM, pp. 141-144
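The JavaParser tool cited above indexes Java code at the concept level; as a rough illustration of the same idea — treating AST node types as unigram concepts — here is a minimal sketch using Python's `ast` module (the function name and the choice of node-type names as concepts are illustrative assumptions, not JavaParser's actual output):

```python
import ast
from collections import Counter

def extract_concepts(code: str) -> Counter:
    """Count unigram concepts (AST node types) appearing in a program."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(code)))

concepts = extract_concepts("""
total = 0
for i in range(10):
    if i % 2 == 0:
        total += i
""")
# The program exercises loops (For), conditionals (If), and
# augmented assignment (AugAssign), each once.
```

Per-problem concept counts like these are what a student model can aggregate into mastery estimates.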
4. Personalized Learning with
Concept-level Code Representation
3. Use the current state of the student model to recommend the best
problems to work with (and explain why they are the best)
Barria-Pineda, J., Akhuseyinoglu, K.,
Želem-Ćelap, S., Brusilovsky, P.,
Klasnja Milicevic, A., and Ivanovic,
M. (2021) Explainable
Recommendations in a Personalized
Programming Practice System. In:
22nd International Conference on
Artificial Intelligence in Education,
AIED 2021, Springer, pp. 64-76.
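The AIED 2021 system above combines a full open learner model with explainable recommendations; a heavily simplified sketch of the underlying idea — rank problems so that those exercising the student's weakest concepts come first — might look like this (the `recommend` function and its scoring rule are illustrative assumptions, not the paper's actual algorithm):

```python
def recommend(mastery, problems):
    """Rank problems so those exercising the weakest concepts come first:
    a lower average mastery of a problem's concepts means a bigger
    potential learning gain (and a ready-made explanation)."""
    def avg_mastery(concepts):
        return sum(mastery.get(c, 0.0) for c in concepts) / len(concepts)
    return sorted(problems, key=lambda p: avg_mastery(problems[p]))

# Hypothetical mastery levels in [0, 1] and problem-to-concept mappings.
mastery = {"loops": 0.9, "conditionals": 0.4, "arrays": 0.1}
problems = {
    "p1": ["loops"],
    "p2": ["loops", "conditionals"],
    "p3": ["arrays", "conditionals"],
}
ranked = recommend(mastery, problems)  # p3 first: its concepts are weakest
```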
5. Need Better Code Representation
• Concepts are not enough – it is how they are combined in the code
that matters as well
• Huang, Y., Guerra Hollstein, J., Barria Pineda, J., and Brusilovsky, P. (2017) Learner Modeling for
Integration Skills. In: Proceedings of the 25th Conference on User Modeling, Adaptation and
Personalization, Bratislava, Slovakia, pp. 85-93.
• Efficient concept combinations that represent student competencies
could be learned from data
• Akram, B., Azizsoltani, H., Min, W., Wiebe, E., Mott, B., Navied, A., Boyer, K. E., and Lester, J. (2020)
Automated Assessment of Computer Science Competencies from Student Programs with Gaussian
Process Regression. In: A. N. Rafferty, J. Whitehill, V. Cavalli-Sforza and C. Romero (eds.) Proceedings of
13th International Conference on Educational Data Mining, July 10-13, 2020, pp. 555-559.
• Need a semantic-level, structure-based code representation to capture
code patterns
6. Code Representation with Deep-learning
• AST-based:
• code2vec (exploiting context paths using an attention mechanism, Alon et al. 2019), code2seq (merging context-path information using LSTMs, Alon et al. 2019), TBCNN (using a convolutional kernel, Mou et al. 2016), ASTNN (encoding statement trees using GRUs, Zhang et al. 2019), ast2vec (merging subtree information recursively, Paaßen et al. 2021)
• Graph-based:
• Gated graph neural network-based approaches based on different graphs: CFG, DFG, read-write (Li et al. 2016, Zhou et al. 2019, MVG: Long et al. 2022)
• Pre-trained Transformer-based:
• GPT (fine-tuning GPT-2 on program analysis tasks, Lajko et al. 2022), Llama, CodeBERT (code sequence-based NL-PL approach, Feng et al. 2020), GraphCodeBERT (data flow graph-based, Guo et al. 2021), UniXcoder (AST-based, Guo et al. 2022)
7. Research Gaps
• Dynamic splitting of subtrees
• Capture task-relevant syntactic and semantic information from code.
• Preserve long-term dependencies.
• Extracting both node and structural information
• Assist in a variety of tasks.
• Provide deeper insights into the local semantics of ASTs.
• Interpretability
• Give important subtrees and substructures more weight in the vector representation.
• Understand the important code structures responsible for the predicted outputs to
enhance the interpretability of the model.
8. What is Missing?
• Structure information available in larger ASTs might not be fully
captured
• Current models fail to capture syntactic and semantic information
dynamically based on the prediction task – most use static AST
splitting
• Structure is important, but concepts (nodes) are important as well
• Most current models are not interpretable (code2vec & code2seq are,
but they do not address the first gap)
9. Bridging Gaps with SANN
• Optimized sequential subtree extraction
• To effectively capture information by splitting program ASTs into subtrees of task-relevant size while
preserving the sequence of subtrees
• Our model aims to be effective both for small student datasets representing a sparse programming
solution space and for larger programs (with larger ASTs)
• Two-way embedding
• To capture both node-based and subtree-based information
• Attention mechanism for interpretability
• To emphasize the important parts of the code when building the code vector and to interpret
the model's predictions
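The attention mechanism in the last bullet can be sketched as softmax-weighted pooling of subtree embeddings; this toy NumPy version (function name, shapes, and random vectors are illustrative, not SANN's actual architecture) shows how the resulting weights double as per-subtree importance scores:

```python
import numpy as np

def attention_pool(subtree_vecs, context):
    """Pool a sequence of subtree embeddings into one code vector.
    The softmax weights double as per-subtree importance scores."""
    scores = subtree_vecs @ context            # one relevance score per subtree
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over subtrees
    code_vec = weights @ subtree_vecs          # attention-weighted sum
    return code_vec, weights

rng = np.random.default_rng(0)
subtree_vecs = rng.normal(size=(5, 8))  # 5 subtree embeddings of dimension 8
context = rng.normal(size=8)            # learned attention context vector
code_vec, weights = attention_pool(subtree_vecs, context)
```

Inspecting `weights` after training is what makes the predictions interpretable: high-weight subtrees are the parts of the code the model relied on.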
10. Splitting Abstract Syntax Tree (AST)
public void print() {
    System.out.println("Hello World!");
}
Subtrees of depth 2
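A minimal sketch of this splitting step, using Python's `ast` module in place of a Java parser (the function and the tuple encoding of subtrees are illustrative assumptions, not SANN's implementation):

```python
import ast

def split_subtrees(code: str, depth: int = 2):
    """Split a program's AST into a sequence of shallow subtrees:
    each top-level statement becomes a subtree root, truncated at
    `depth` levels, with the statement order preserved."""
    def truncate(node, d):
        label = type(node).__name__
        if d == 0:
            return (label,)
        return (label, tuple(truncate(c, d - 1) for c in ast.iter_child_nodes(node)))
    return [truncate(stmt, depth) for stmt in ast.parse(code).body]

subtrees = split_subtrees("x = 1\nprint(x)")
# Two subtrees, in source order: an Assign followed by an Expr (the call).
```

In SANN the subtree size is not fixed at 2 but optimized per task, which is what the genetic-algorithm step is for.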
22. Interpretability Case Study 2
[Figure: an incorrect student solution for the problem caughtSpeeding, with per-statement attention weights. The incorrect statement (correct version: speed -= 5) receives 92% of the attention; incorrect statements are given roughly 5x higher attention than correct ones.]
23. Conclusion
• The study proposed a novel interpretable model for programming code
representation using Subtree-based Attention Neural Network (SANN) with
optimized subtree extraction using Genetic Algorithm.
• The study demonstrated the effectiveness of the model in analyzing a sparse
solution space and larger ASTs on two tasks: program correctness prediction and
algorithm classification using student programs.
• Competitive performance, interpretability, and effectiveness on small classroom
datasets make SANN an ideal tool for analyzing student programs.
• In the future, the model can be a valuable tool in computer science education by
providing insight into student learning of programming and helping educators
adapt their teaching methods to support their students.
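The genetic-algorithm optimization mentioned above can be illustrated with a toy search over a single integer subtree-depth parameter (the fitness function here is a stand-in for downstream validation accuracy; the operators and hyperparameters are illustrative, not the paper's actual GA configuration):

```python
import random

def genetic_search(fitness, low=1, high=8, pop_size=6, generations=10, seed=0):
    """Toy genetic search over an integer subtree-depth parameter."""
    rng = random.Random(seed)
    pop = [rng.randint(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # selection: keep the fittest half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2              # crossover: average two parent depths
            if rng.random() < 0.3:            # mutation: nudge by one level
                child = min(high, max(low, child + rng.choice([-1, 1])))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Stub fitness peaking at depth 3, imitating a validation-accuracy curve.
best = genetic_search(lambda d: -abs(d - 3))
```

In the real pipeline the fitness evaluation is a model training run, which is exactly why the optimization step dominates training time (see Limitations).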
24. Limitations
• Higher training time due to the optimization step.
• Once trained, the model can be used in offline educational settings and across different programming courses of similar scope, scale, and
course nature.
• Fixed sizes for embedding vectors.
• Vector size might have a relationship with the size of the ASTs.
Future Work
• Develop a multi-task classifier.
• Investigate the dynamic adaptation of vector sizes based on optimized subtree sizes.
• Build a pre-trained SANN model to ensure the highest accuracy and interpretability simultaneously.
• Explore the interpretable model to understand student programs and their learning and mistakes at a more
granular level.
25. Want to Read More about it?
• Guerra Hollstein, J., Barria Pineda, J., Schunn, C., Bull, S., and Brusilovsky, P. (2017) Fine-Grained Open
Learner Models: Complexity Versus Support. In: Proceedings of 25th Conference on User Modeling, Adaptation
and Personalization, Bratislava, Slovakia, ACM, pp. 41-49.
• Barria-Pineda, J., Akhuseyinoglu, K., and Brusilovsky, P. (2023) Adaptive Navigational Support and Explainable
Recommendations in a Personalized Programming Practice System. ACM, 1-9.
• Huang, Y., Brusilovsky, P., Guerra, J., Koedinger, K., and Schunn, C. (2023) Supporting skill integration in an
intelligent tutoring system for code tracing. Journal of Computer Assisted Learning 39 (2), 477-500.
• Akram, B., Azizsoltani, H., Min, W., Wiebe, E., Mott, B., Navied, A., Boyer, K. E., and Lester, J. (2020)
Automated Assessment of Computer Science Competencies from Student Programs with Gaussian Process
Regression. In: A. N. Rafferty, J. Whitehill, V. Cavalli-Sforza and C. Romero (eds.) Proceedings of 13th
International Conference on Educational Data Mining, July 10-13, 2020, pp. 555-559.
• Yoder, S., Hoq, M., Brusilovsky, P., and Akram, B. (2022) Exploring Sequential Code Embeddings for Predicting
Student Success in an Introductory Programming Course. In: Proceedings of 6th Educational Data Mining in
Computer Science Education (CSEDM) Workshop at EDM2022, Durham, UK, July 27, 2022, Zenodo.
• Hoq, M., Brusilovsky, P., and Akram, B. (2023) Analysis of an Explainable Student Performance Prediction
Model in an Introductory Programming Course. In: Proceedings of the 16th International Conference on
Educational Data Mining (EDM 2023), Bengaluru, India., July 11-14, 2023, pp. 79-90.