This document summarizes a paper on using dual learning to train semantic parsers without supervision. The key points are:
1) The paper proposes a dual learning framework in which two agents translate between natural-language queries and logical forms and provide feedback signals to each other, allowing semantic parsers to be trained without supervision.
2) Both directions (query-to-logical-form and logical-form-to-query) use an attention-based encoder-decoder architecture, augmented with techniques such as copying and entity mapping.
3) The dual agents are trained with reinforcement learning, using rewards based on the validity of the translations as judged by a logical-form checker and a language model, without any parallel-data supervision.
4) Experiments on semantic parsing datasets such as ATIS
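The attention-based encoder-decoder mentioned in point 2 is not spelled out in the summary. As a minimal illustration only (plain Python with toy vectors, not the paper's actual model), the sketch below shows the core of dot-product attention: scoring each encoder state against the current decoder state and averaging the encoder states by those scores to form a context vector.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the current
    decoder state, normalize the scores, and return the weighted average of
    the encoder states (the context vector fed to the decoder)."""
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
```

In the paper's setting the same mechanism also feeds a copy distribution over input tokens, but that part is omitted here for brevity.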
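The reward scheme in point 3 can be made concrete with a toy sketch. The snippet below is an illustration, not the paper's implementation: the two agents are replaced by hypothetical rule-based stubs, and the validity reward is reduced to a parenthesis-balance check standing in for the paper's grammar checker and language model. The key idea it shows is that the combined reward needs no parallel data: one term checks that the intermediate logical form is well formed, the other that the dual model can reconstruct the original query.

```python
# Hypothetical stand-ins for the two agents. In the paper these are
# attention-based encoder-decoder networks; here they are rule-based stubs
# so the reward computation can be shown end to end.
def query_to_logical_form(query):
    # primal direction: natural-language query -> logical form
    return "( lambda $0 ( flight $0 ) )" if "flight" in query else "( invalid"

def logical_form_to_query(lf):
    # dual direction: logical form -> natural-language query
    return "show me flights" if "flight" in lf else "unknown"

def is_well_formed(lf):
    # Surface-level validity check (balanced parentheses), standing in for
    # the paper's logical-form checker and language model.
    depth = 0
    for tok in lf.split():
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def dual_reward(query, alpha=0.5):
    """Combine a validity reward on the intermediate logical form with a
    reconstruction reward from the dual direction; no parallel data is
    consulted. alpha balances the two terms."""
    lf = query_to_logical_form(query)
    validity = 1.0 if is_well_formed(lf) else 0.0
    reconstruction = 1.0 if logical_form_to_query(lf) == query else 0.0
    return alpha * validity + (1 - alpha) * reconstruction
```

In training, this scalar reward would drive a policy-gradient (REINFORCE-style) update of both agents; that optimization step is omitted here.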
2. Our Goals
• Understanding the task
• Semantic Parsing & Logical Forms
• Understanding their method
• Dual Learning
• How to fit semantic parsing into dual learning
• Understanding other methods
3. Semantic Parsing
• Semantic Parsing:
• Mapping a natural language query into a logical form;
• Logical form:
• One type of meaning representation understood by
computers, usually executable to obtain answers.
• Common treatment:
• End-to-end Seq2seq models (fully supervised)
4. Example
• Query: “show flight from ci0 to ci1”
• Lambda expression:
• Strictly structured & executable
• A flexible function instance expressed by strings
• Foundation of programming languages » check wiki
• Demand of semantic parsing:
• Valid and complete at surface and semantic levels.
( lambda $0 e
( and
( flight $0 )
( from $0 ci0 )
( to $0 ci1)
) )
5. Task Demand
• Surface:
• A complete tree structure / logical form
• “()” matches
• Semantic:
• Predicate & its arguments (or even their types) match
• ( flight $0 ) (from $0 ci0) (to $0 ci1)
• i.e. semantic parsing is akin to NL-to-AMR/SQL
• A bridge between human and machine languages
• Python is a good interpreter | we are semantic parsers.
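As a concrete illustration of the surface-level "()" matching demand above, a minimal checker can be sketched as follows (a hypothetical helper, not the paper's actual checker):

```python
def balanced(logical_form: str) -> bool:
    """Surface-level validity: every '(' in the logical form has a matching ')'."""
    depth = 0
    for ch in logical_form:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' closed before any '(' opened
                return False
    return depth == 0

# The lambda expression from slide 4 passes the surface check:
lf = "( lambda $0 e ( and ( flight $0 ) ( from $0 ci0 ) ( to $0 ci1 ) ) )"
```

A full checker would also verify the semantic level (predicates and their argument types), which this sketch omits.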
6. A semantic parser needs to …
• (below are in my words)
• Understand the natural language | its meaning.
• Be able to infer its structure.
• Understand the function and its arguments.
• Be able to translate / map.
• btw. MT maps symbols -> symbols.
7. This work
• Dual learning into semantic parsing
• Query to logical form (Q2LF);
• Logical form to query (LF2Q);
• Benefit or trait:
• No supervision
• Reinforcement learning: a validity reward for the generated queries / logical forms
• Achievement:
• SOTA on ATIS | competitive SOTA on OVERNIGHT
[Diagram: Q2LF (primary) and LF2Q (dual) provide signals to each other, bridging the surface (query) and semantic (logical form) levels.]
8. Primary Task
• Attention-based Encoder-Decoder architecture
• (Luong et al., 2015) attention:
• Regular encoder-decoder output:
• Copy mechanism
• Entity Mapping: Uniform Resource Identifier (URI, K&C, 2006)
• e.g. kobe bryant → en.player.kobe_bryant;
• Combined output at step t: P(y_t) = g_t · P_selection(y_t) + (1 − g_t) · P_attention(y_t), mixing the selection distribution and the attention (copy) distribution at t.
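The gated combination above can be sketched in plain Python; `p_select` and `p_attn` stand in for the selection and attention distributions over a shared vocabulary:

```python
def copy_mixture(p_select, p_attn, g_t):
    """Copy mechanism at step t: P(y_t) = g_t * P_select + (1 - g_t) * P_attn.
    Both inputs are probability distributions over the same (extended) vocabulary."""
    return [g_t * ps + (1.0 - g_t) * pa for ps, pa in zip(p_select, p_attn)]

p_select = [0.7, 0.2, 0.1]  # decoder softmax over the vocabulary
p_attn = [0.1, 0.1, 0.8]    # attention weights mapped onto the vocabulary
p = copy_mixture(p_select, p_attn, g_t=0.5)
```

Since both inputs are valid distributions and g_t is in [0, 1], the mixture is itself a valid distribution.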
9. Dual Model
• Reverse Entity Mapping, KB⁻¹(y_t):
• e.g. en.player.kobe_bryant → [the black mamba; …]
• Randomly select one from the list.
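Entity mapping and its reverse can be sketched with two hypothetical lookup tables (the URI and alias list follow the slides' example):

```python
import random

# Hypothetical knowledge-base tables for the slides' example.
KB = {"kobe bryant": "en.player.kobe_bryant"}                           # mention -> URI
KB_inv = {"en.player.kobe_bryant": ["kobe bryant", "the black mamba"]}  # URI -> aliases

def map_entity(mention):
    """Entity Mapping (primary task): natural language mention -> URI."""
    return KB[mention]

def reverse_map_entity(uri, rng=random):
    """Reverse Entity Mapping (dual task): URI -> one randomly chosen alias."""
    return rng.choice(KB_inv[uri])
```

The random choice over aliases gives the dual model varied surface realizations of the same entity.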
10. Dual learning
• Two agents & two loops:
• (Q2LF, LF2Q), (LF2Q, Q2LF)
• Reinforcement learning based on policy gradient (Sutton et al.)
• Data:
• queries 𝒬; logical forms ℒℱ; parallel data 𝒯
• Supervised initialization with 𝒯;
• Two unsupervised loops with 𝒬 ∪ 𝒯 and ℒℱ ∪ 𝒯.
11. Supervisor Guidance
• When 𝒯 is limited, the unsupervised models would degrade.
• Initial training: maximum likelihood estimation (MLE)
• Train Q2LF and LF2Q on 𝒯.
• Other preparation:
• Train LM_q on 𝒬 ∪ 𝒯.
• Logical form checker: grammar_error_indicator(・)
12. Loop starts from a query
1. Sample a query x from 𝒬 ∪ 𝒯;
2. Q2LF generates k logical forms y_1, y_2, ⋯, y_k with beam search;
3. Calculate validity reward R_val^q(y_i);
4. Reconstruct x with LF2Q;
5. Calculate reconstruction reward R_rec^q(x, y_i);
6. Balance rewards: r_i^q = α · R_val^q(y_i) + (1 − α) · R_rec^q(x, y_i);
7. Update both models with policy gradient.
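The seven steps above can be sketched end to end; `q2lf`, `lf2q`, and `grammar_ok` are hypothetical stand-ins for the two beam-search models and the logical form checker, and the reconstruction reward is simplified to exact match (the paper scores reconstruction with a log-likelihood):

```python
def dual_loop_from_query(x, q2lf, lf2q, grammar_ok, alpha=0.5, k=3):
    """One dual-learning step starting from a query x.
    Returns the balanced reward r_i^q for each of the k beam candidates."""
    candidates = q2lf(x, beam=k)                 # step 2: y_1 ... y_k
    rewards = []
    for y in candidates:
        r_val = 1.0 if grammar_ok(y) else 0.0    # step 3: validity reward R_val^q(y_i)
        x_rec = lf2q(y)                          # step 4: reconstruct with LF2Q
        r_rec = 1.0 if x_rec == x else 0.0       # step 5: R_rec^q(x, y_i), simplified
        rewards.append(alpha * r_val + (1 - alpha) * r_rec)  # step 6: balance
    return rewards  # step 7 would feed these into a policy-gradient update
```

The loop starting from a logical form (slide 13) is symmetric, with β in place of α.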
13. Loop starts from a logical form
1. Sample a logical form y from ℒℱ ∪ 𝒯;
2. LF2Q generates k queries x_1, x_2, ⋯, x_k with beam search;
3. Calculate validity reward R_val^lf(x_i);
4. Reconstruct y with Q2LF;
5. Calculate reconstruction reward R_rec^lf(y, x_i);
6. Balance rewards: r_i^lf = β · R_val^lf(x_i) + (1 − β) · R_rec^lf(y, x_i);
7. Update both models with policy gradient.
14. Reward design (1/2)
• grammar_error_indicator(・) is included in the OVERNIGHT dataset.
• Otherwise, construct a grammar_error_indicator(・) based on the ontology of the corresponding dataset.
• 1 when correct, otherwise 0:
• R_val^q(y) = grammar_error_indicator(y)
15. Reward design (2/2)
• The pre-trained language model for queries, LM_q:
• R_val^lf(x) = log LM_q(x) / Length(x)
• That's it!
• Let's see the experiments.
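The length-normalized LM reward above can be sketched directly; `lm_logprob` is a hypothetical stand-in returning log LM_q(x):

```python
def lm_validity_reward(query_tokens, lm_logprob):
    """R_val^lf(x) = log LM_q(x) / Length(x): length-normalized log-likelihood,
    so longer queries are not penalized merely for having more tokens."""
    return lm_logprob(query_tokens) / len(query_tokens)
```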
16. Experiments
• Datasets (supervised; parallel data, two 𝒯's):
• ATIS (1994): Airline Travel Information System
• OVERNIGHT (2015):
• 8 domains: calendar / restaurants / social networks / basketball / blocks / publication / recipes / housing
17. Experiments
• Data augmentation (1/2): synthesizing logical forms ℒℱ (for ATIS)
• Modification based on ontology:
• Replace entity or predicate;
• Check validity;
• Check novelty.
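The modify-check-keep steps above can be sketched as follows; `same_type_entities` is a hypothetical map from an entity to its same-type alternatives in the ontology, and the validity check is omitted:

```python
import random

def augment_by_entity_swap(lf, same_type_entities, seen, rng):
    """Synthesize a new logical form by replacing one entity with another of the
    same type from the ontology, keeping only novel results."""
    tokens = lf.split()
    for i, tok in enumerate(tokens):
        if tok in same_type_entities:
            swapped = tokens[:i] + [rng.choice(same_type_entities[tok])] + tokens[i + 1:]
            candidate = " ".join(swapped)
            if candidate not in seen:  # novelty check
                return candidate
    return None  # nothing novel could be produced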
18. Experiments
• Data augmentation (2/2): synthesizing logical forms ℒℱ (for OVERNIGHT)
• Generation based on grammar:
• Reorder entity instances of one type.
• +500 for each domain.
19. Experiments
• Model settings:
• α and β: 0.5
• Pre-trained word embeddings: GloVe 6B
• LM_q: 100-dim word embeddings, 200-dim hidden
• Training and decoding:
• Beam size: {3, 5}
• Batch size: {10, 20}
• Adam lr: 0.001
• Baselines:
• Pseudo baseline (1/2 loss)
• Back-translation (Sennrich et al., 2016)
• "Varied Queries" (Guo et al., 2018)
20. ATIS
Chen: Results on ATIS are convincing. Simple data augmentation does not help the model. Back-translation fails to handle SQL well. The DUAL mechanism is what dramatically sets their model apart. Both the copy mechanism and Coarse2Fine suggest the sparsity of natural language.
21. OVERNIGHT
OVERNIGHT contains fewer distinct entities and training samples, so copying is not essential.
Chen: I start to wonder whether this is the maximum of Seq2seq capacity.