This document introduces a new dataset of machine learning final exams from MIT, Harvard, and Cornell consisting of 646 questions. It aims to test if machines can learn machine learning by having models answer questions from actual university final exams in the subject. These questions are longer, more complex, and cover a broader range of machine learning topics compared to typical problem sets. Several baseline models are evaluated on the dataset, with the best model found to perform at a human level. The full dataset and code are made publicly available to assess new language models and advance automatic problem-solving abilities in machine learning.
A neural network solves, explains, and generates university math problems by ... | Sarah Pollard
- A neural network called OpenAI Codex, which was pre-trained on text and fine-tuned on code, can automatically solve 80% of university-level math problems from 6 MIT courses and 1 Columbia University course by synthesizing programs through few-shot learning.
- It does this by generating executable programs to solve problems and visualizing solutions. It can also explain solutions and generate new problems indistinguishable from human-written ones.
- This improves the previous state-of-the-art accuracy on a benchmark of advanced math problems from 8.8% to 81.1%, demonstrating that neural networks can now solve university-level math at a human level for the first time.
This course provides undergraduate computer science students with an understanding of computational thinking and approaches for solving large, complex problems. The course aims to develop skills like decomposition, pattern recognition, abstraction, algorithm design, and the ability to communicate and work with others to achieve solutions. Topics covered include computational complexity, the pillars of computational thinking, probability, statistics, optimization problems and approaches, and graph theory. Students will learn through lectures, self-study, group activities, peer assessment and presentations. Resources and labs are available for coursework and assignments, which are assessed through continuous evaluation and a final exam.
Moodle quiz: towards post-paper e-assessment | mathewhillier
Using Moodle quiz for assessments that begin to leverage the affordances of ICTs that go beyond the capabilities of paper equivalents - post paper assessment task design examples.
This document provides an overview of the CIS 115 Introduction to Programming and Logic course. The course introduces computer programming, problem solving, and logic. Topics covered include syntax, data types, program organization, algorithm design, control structures, and object-oriented programming. Upon completing the course, students should be able to write programs using variables, arrays, functions, files and recursion, as well as implement sorting and searching algorithms.
Machine Learning: Foundations Course Number 0368403401 | butest
This machine learning foundations course will consist of 4 homework assignments, both theoretical and programming problems in Matlab. There will be a final exam. Students will work in groups of 2-3 to take notes during classes in LaTeX format. These class notes will contribute 30% to the overall grade. The course will cover basic machine learning concepts like storage and retrieval, learning rules, estimating flexible models, and applications in areas like control, medical diagnosis, and document retrieval.
Liqiang He is seeking an internship applying mathematics. He has a Bachelor's degree in Applied
Mathematics from UCSD and skills in programming languages like Java, operating systems, and software
like MATLAB. He is fluent in Chinese and has taken courses in calculus, linear algebra, statistics, and
computer science. His experience includes using probabilistic models in finance, analyzing life insurance
with survival distributions, solving optimization problems, presenting mathematical proofs, and coding
simple apps.
Machine Learning: Foundations Course Number 0368403401 | butest
This machine learning course will cover theoretical and practical machine learning concepts. It will include 4 homework assignments and programming in Matlab. Lectures will be supplemented by student-submitted class notes in LaTeX. Topics will include learning approaches like storage and retrieval, rule learning, and flexible model estimation, as well as applications in areas like control, medical diagnosis, and web search. A final exam format has not been determined yet.
Classes without Dependencies - UseR 2018 | Sam Clifford
Presented by Sam Clifford at the 2018 UseR conference, Brisbane, Australia. The talk describes the design of SEB113 - Quantitative Methods in Science, a first year statistics/mathematics unit in the Bachelor of Science at Queensland University of Technology. The unit uses RStudio and the tidyverse packages to give students the skills to do meaningful data manipulation and analysis without relying on prior knowledge of advanced mathematics.
The Graduate Aptitude Test in Engineering (GATE) is an exam administered by IISc Bangalore and 7 IITs on behalf of the MHRD. It is used for admission to postgraduate programs in IITs, IISc, and other institutes. Scores are also used for recruitment by PSUs. The 2018 GATE exam will be held on February 3-4 and 10-11 in 8 zones across India. It will consist of 65 multiple choice and numerical answer questions testing general aptitude and engineering subjects. Scores are required for MTech programs, PSU jobs, research fellowships, and other opportunities. Intensive practice of previous papers and focusing on high-weightage topics
Naver learning to rank question answer pairs using hrde-ltc | NAVER Engineering
The automatic question answering (QA) task has long been considered a primary objective of artificial intelligence.
Among the QA sub-systems, we focused on the answer-ranking part. In particular, we investigated a novel neural network architecture with an additional data-clustering module to improve performance in ranking answer candidates that are longer than a single sentence. This work can be used not only for the QA ranking task, but also to evaluate the relevance of the next utterance to a given dialogue generated by a dialogue model.
In this talk, I'll present our research results (NAACL 2018) and their potential use cases (e.g. fake news detection). Finally, I'll conclude by discussing some issues with previous research and introducing recent approaches in academia.
This document is an overview for an online test series program called "We Create Problems" that aims to help learners develop problem-solving skills and prepare for technical hiring tests. The 10-week test series includes tests on topics like logic, data structures, algorithms, system design and cognitive skills. It provides practice with real hiring test questions, analysis of test papers, and interactions with industry professionals to help learners gain experience and clarity on technical problem-solving.
This document provides an overview and curriculum for an online test series to prepare for technical hiring tests. The 10-week test series includes 8 topic-based tests and 2 combined tests covering topics like logic, algorithms, data structures, system design, problem solving and cognitive skills. The goal is to help learners clear a tech hiring test in 3 attempts by gaining experience with questions commonly asked in online hiring drives and learning essential problem-solving skills. Weekly tests are followed by revision sessions with professionals and webinars bring graduates and hiring managers together.
This document provides information on the "Intelligent Systems" module, including its code, level, credit points, location, coordinator, content, aims, learning outcomes, teaching methods, and assessment. The module introduces students to intelligent techniques like fuzzy logic, neural networks, and genetic algorithms through both theoretical and practical lessons. Students will learn to design and implement intelligent systems using MATLAB software. Assessment includes coursework assignments and a final written exam.
This document provides information on the "Intelligent Systems" module, including its code, level, credit points, location, coordinator, content, aims, learning outcomes, teaching methods, and assessment. The module introduces students to intelligent techniques like fuzzy logic, neural networks, and genetic algorithms through both theoretical and practical implementations in Matlab. Students will learn to design and apply these systems to solve problems in computing and engineering.
This document outlines a course on C programming for problem solving. The course is divided into 5 modules that cover topics such as computer hardware and software, an overview of C, input/output operations and conditional branching/loops, arrays and searching/sorting algorithms, user-defined functions and recursion, and structures and pointers. The course aims to teach students how to illustrate algorithms, construct programming solutions in C, identify and correct errors, and modularize problems using functions and structures. Students will be assessed through a 3-hour exam consisting of 5 out of 10 full questions, each with subquestions covering the topics in that module. Suggested textbooks and reference materials are also provided.
This document provides information on the Computer Engineering program at Malayan Colleges Laguna. It outlines the program's mission, vision, educational objectives, student outcomes, and course descriptions. The program aims to provide students with technical skills to become competent engineers. It seeks to develop students' fundamental understanding of computer engineering concepts. Graduates are expected to have abilities such as applying knowledge of math and science to solve problems, designing systems to meet needs, and engaging in lifelong learning. The document also provides details on the Discrete Elements course, including its topics, learning objectives, assessment methods and references.
The joint task force of ACM and IEEE Computer Society released recent guidelines for undergraduate computer science majors late in 2013. Since that time, many computer science departments have reviewed the included recommendations and exemplars from various institutions, and made changes to the programs that they offer. In this panel, we will share the experiences of the panelists from a variety of computer science programs in reviewing and responding to the new curriculum guidelines. The panel hopes to generate additional discussion about new knowledge areas and models for incorporating recommended content into programs at small, liberal arts institutions.
Dynamic Question Answer Generator An Enhanced Approach to Question Generation | ijtsrd
Teachers and educational institutions seek new questions of varying difficulty when setting tests for their students. Students, in turn, want distinct and new questions to practice with, because the same questions recur everywhere. However, writing new questions every time is a tedious task for teachers. To address this, we have built an artificially intelligent system that generates questions and answers for the mathematical topic of quadratic equations. The system uses (i) a randomization technique to generate unique questions each time and (ii) first-order logic and automated deduction to produce the solution to each generated question. The goal was achieved and the system works efficiently. It is robust, reliable, and helpful for teachers, students, and other organizations that need quadratic equation questions, hassle free. Rahul Bhatia | Vishakha Gautam | Yash Kumar | Ankush Garg "Dynamic Question Answer Generator: An Enhanced Approach to Question Generation" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23730.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23730/dynamic-question-answer-generator-an-enhanced-approach-to-question-generation/rahul-bhatia
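The randomization idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: it assumes questions are built from randomly chosen integer roots, and it swaps the paper's first-order logic and automated deduction for the plain quadratic formula. The function names (`generate_quadratic`, `solve_quadratic`, `format_question`) are hypothetical.

```python
import math
import random


def generate_quadratic(rng):
    """Build coefficients of a*x^2 + b*x + c from random integer roots (assumed scheme)."""
    a = rng.choice([1, 2, 3])
    r1 = rng.randint(-9, 9)
    r2 = rng.randint(-9, 9)
    # Expand a*(x - r1)*(x - r2) to obtain the coefficients.
    b = -a * (r1 + r2)
    c = a * r1 * r2
    return a, b, c


def solve_quadratic(a, b, c):
    """Return the distinct real roots in ascending order via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = math.sqrt(disc)
    # A set collapses a repeated root into a single entry.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


def format_question(a, b, c):
    """Render the equation as a plain-text exam question."""
    return f"Solve for x: {a}x^2 + ({b})x + ({c}) = 0"


rng = random.Random(0)  # fixed seed so a run is repeatable
a, b, c = generate_quadratic(rng)
print(format_question(a, b, c))
print("Roots:", solve_quadratic(a, b, c))
```

Building the equation from chosen roots guarantees a non-negative discriminant, so every generated question has real solutions. Varying the seed yields a different question on each run.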
This document outlines the examination scheme for M.Sc. Computer Science program for the 2019-20 academic year. It provides details on eligibility criteria, pass criteria, classification of successful candidates, instructions for paper setters, workload expectations, and instructions for students and examiners. It includes the scheme of examination, listing the papers, duration, marks distribution, etc. for both M.Sc. previous and final year examinations. It also provides syllabus details and reading materials for some of the papers.
1 Saint Leo University GBA 334 Applied Decision.docx | aryan532920
The film The Godfather explores the theme of revenge. When Michael Corleone's father Vito is attacked, Michael seeks revenge by killing the ones responsible. This act of vengeance draws Michael deeper into the family crime business. Throughout the film, Michael takes revenge on anyone who wrongs or betrays his family, solidifying his role as the new head of the crime family. Cinematography in The Godfather features unique shots and scenes that helped introduce new techniques to films.
Talk at 8th Annual Central and Eastern European Software Engineering Conference in Russia
Nov 2, 2012
Moscow, Russia
Note that original (downloadable) .pptx file has embedded videos.
Introduction to Model-Based Machine Learning | Daniel Emaasit
The field of machine learning has seen the development of thousands of learning algorithms. Typically, scientists choose from these algorithms to solve specific problems, and their choices are often limited by their familiarity with the algorithms. In this classical framework of machine learning, scientists must make certain assumptions in order to use an existing algorithm. This is in contrast to the model-based machine learning approach, which seeks to create a bespoke solution tailored to each new problem.
Paper Mate Write Bros Ballpoint Pens, Medium P | Richard Hogue
The passage discusses the positive and negative impacts of social media and mass media on adolescents. It notes that teens spend up to 11 hours per day on social media and are exposed to media via electronics. Social media influences how teens dress, act, talk and what they discuss. While mass media can shape adolescent minds and ideas, it remains unclear whether the overall impact is positive or negative. The passage intends to explore this issue by analyzing social media's impacts through sociological lenses of identity and groupthink.
Writing Phrases Best Essay Writing Service, Essay Writ | Richard Hogue
The document discusses the social and environmental impacts of vertical integration in the banana export trade in Honduras. It explains that in the late 1800s, Honduras was the largest banana exporter to the US. Over time, banana production gradually transitioned from small, individual farms to large monopolies controlled by three major companies. This led to significant social and environmental changes in Honduras as the companies consolidated land and power over banana production and export.
More Related Content
Similar to Automatically Answering And Generating Machine Learning Final Exams
Examples How To Write A Persuasive Essay - Acker | Richard Hogue
1. The study examined how monolingual and bilingual infants develop associative word learning abilities between 12-14 months of age.
2. Infants were tested using a looking-while-listening preferential looking paradigm to assess whether they could learn to associate novel words with novel objects.
3. The results showed that both monolingual and bilingual infants were able to learn novel word-object associations at 14 months of age, but bilingual infants did not demonstrate this ability at 12 months of age like monolingual infants did.
Netflix is expanding its global operations to over 200 countries in the next two years to increase its international revenue. Currently international markets make up 27% of Netflix's revenue but the company expects this to grow to 80% of total revenue. Netflix faces challenges expanding globally due to competition and regulatory issues. Strategic planning will be required to ensure Netflix's global expansion is successful and allows it to maintain its competitive edge in the streaming industry.
Best Tips On How To Write A Term Paper Outline, FormRichard Hogue
The document provides instructions for requesting writing assistance from HelpWriting.net. It outlines a 5-step process: 1) Create an account with a password and email; 2) Complete a 10-minute order form with instructions, sources, and deadline; 3) Review bids from writers and choose one based on qualifications; 4) Review the completed paper and authorize payment if satisfied; 5) Request revisions until fully satisfied, with a refund option for plagiarized work. The document promises original, high-quality content and full satisfaction of needs.
Formal Letter In English For Your Needs - Letter TemplRichard Hogue
The document provides instructions for requesting an assignment writing from HelpWriting.net. It outlines a 5 step process: 1) Create an account, 2) Complete an order form providing instructions and deadline, 3) Review bids from writers and select one, 4) Review the completed paper and authorize payment, 5) Request revisions until satisfied. It emphasizes the site's commitment to original, high-quality content and offering refunds for plagiarized work.
Get Essay Help You Can Get Essays Written For You ByRichard Hogue
The document outlines a 5-step process for getting essay help from HelpWriting.net, which includes creating an account, submitting a request with instructions and sources, choosing a bid from qualified writers, reviewing and authorizing payment for completed work, and requesting revisions if needed. It emphasizes that HelpWriting.net provides original, high-quality content and refunds plagiarized work to ensure customer satisfaction.
The passage describes the House and Ballroom scene, an intentional LGBTQ community founded by Latino and black people. Members formed houses that provided care and protection. Houses competed against each other at balls in categories like dance. Wealth, glamour, and status were important in ballroom culture. The scene offered acceptance and a comfort zone for LGBTQ people escaping unsupportive families.
Pin By Cindy Campbell On GrammarEnglish Language ERichard Hogue
1. The document outlines the steps to request a paper writing service from HelpWriting.net, including creating an account, completing an order form, reviewing writer bids, authorizing payment, and requesting revisions.
2. Writers utilize a bidding system, and customers can choose a writer based on qualifications, history, and feedback. Customers receive the paper, ensure it meets expectations, and authorize final payment.
3. HelpWriting.net promises original, high-quality content and offers refunds for plagiarism. Customers can request multiple revisions to ensure satisfaction.
How To Write Evaluation Paper. Self Evaluation EssRichard Hogue
This document provides steps for requesting and completing an assignment writing request through the HelpWriting.net platform:
1. Create an account with a valid email and password.
2. Complete a 10-minute order form providing instructions, sources, deadline, and attaching a sample of your writing if desired.
3. Review bids from writers and choose one based on qualifications, history, and feedback, then pay a deposit to start the assignment.
4. Review the completed paper and authorize final payment if pleased, or request revisions using the free revision policy.
Pumpkin Writing Page (Print Practice) - Made By TeachRichard Hogue
The document provides instructions for requesting writing assistance from HelpWriting.net. It outlines a 5-step process: 1) Create an account with a password and email. 2) Complete a 10-minute order form providing instructions, sources, and deadline. 3) Review bids from writers and select one based on qualifications. 4) Review the completed paper and authorize payment if satisfied. 5) Request revisions until fully satisfied, with a refund option for plagiarized work. The document promises original, high-quality content meeting all needs.
What Is The Best Way To Write An Essay - HazelNeRichard Hogue
The document discusses equal employment opportunity, affirmative action, and workforce diversity as key concepts in public human resources management in the United States. It notes that human resource managers must pay careful attention to these concepts and their underlying values. The abstract previews that the full document will cover cases, laws, philosophies, and values related to equal employment opportunity, affirmative action, and diversity, as well as their future implications.
The Importance Of Reading Books Free Essay ExampleRichard Hogue
The document provides instructions for requesting and completing an assignment writing request on the HelpWriting.net website. It outlines a 5-step process: 1) Create an account; 2) Complete an order form with instructions and deadline; 3) Review bids from writers and choose one; 4) Review the completed paper and authorize payment; 5) Request revisions until satisfied. The purpose is to help students obtain high-quality original content by writing their assignments.
Narrative Essay Personal Leadership Style EssayRichard Hogue
The document discusses working capital management at the Heavy Engineering Division of Larsen & Toubro Limited, including analyzing accounts receivable, accounts payable, and inventory management. It provides an overview of the business activities of Larsen & Toubro and its Heavy Engineering Division. The report compares the performance of Heavy Engineering Division to other Indian and foreign companies in key working capital elements.
Thesis Introduction Examples Examples - How To Write A TheRichard Hogue
The passage discusses customer privacy issues in the hospitality service industry. It notes that while technology helps businesses operate more efficiently, it also increases the risk of data breaches and privacy concerns. The hospitality industry in particular handles sensitive customer data like payment information. Several hotel chains have reported data breaches in recent years, highlighting the need for stronger privacy protections in this sector given the large amounts of financial data collected and stored.
Literature Review Thesis Statemen. Online assignment writing service.Richard Hogue
The document discusses the differences between bipolar type I and type II disorders. Bipolar type I is characterized by episodes of mania that last at least one week, along with depressive episodes. The research will focus specifically on bipolar type I in children and youth. Diagnostic criteria for bipolar disorders are outlined in the DSM-5.
008 Essay Writing Competitions In India CustRichard Hogue
Resistance art emerged in the mid-1970s in South Africa after the Soweto uprising. It focused on resisting apartheid and celebrating African strength and unity. Resistance art can represent different points of view and elicit emotional responses in viewers that can inspire social change or revolution. While art in the 1800s and 1900s also contained elements of resistance, it became more direct and overt in the mid-1970s, often led by activist artists focusing on political and social issues. Art has the power to engage people in different ways and reflect or drive social change.
A LEVEL SOCIOLOGY 20 MARK GENDER SOCUS. Online assignment writing service.Richard Hogue
Here are the key points required for a search warrant essay:
- A search warrant is required by the Fourth Amendment to the U.S. Constitution to conduct a search of a private property. This protects citizens from unreasonable searches and seizures by the government.
- To obtain a search warrant, the police must have probable cause that evidence of a crime will be found on the private property. They present sworn testimony and evidence to a judge who determines if probable cause exists.
- The search warrant must specify the location to be searched and the items that can be seized as evidence. This prevents general "fishing expeditions" by law enforcement.
- If the police conduct a search without a valid warrant, any evidence found may
Composition Writing Meaning. How To Write A DRichard Hogue
1. The document provides instructions for how to request and complete an assignment writing request on the HelpWriting.net website. It outlines a 5-step process: create an account, submit a request form with instructions and deadline, choose a bid from qualified writers, review and authorize payment for the completed assignment, and request revisions if needed.
2. The site uses a bidding system where writers submit bids to complete assignment requests. Customers can choose a writer based on qualifications, order history, and feedback to start working on the assignment.
3. HelpWriting.net promises original, high-quality content and offers refunds if work is plagiarized. Customers can request multiple revisions to ensure satisfaction.
Get Essay Writing Help At My Assignment Services By Our HighlyRichard Hogue
This document provides instructions for getting essay writing help from the website HelpWriting.net. It outlines a 5-step process: 1) Create an account with an email and password. 2) Complete an order form with instructions, sources, and deadline. 3) Review bids from writers and choose one. 4) Review the paper and authorize payment. 5) Request revisions until satisfied. It emphasizes that original, high-quality content is guaranteed or a full refund will be provided.
Chapter wise All Notes of First year Basic Civil Engineering.pptxDenish Jangid
Chapter wise All Notes of First year Basic Civil Engineering
Syllabus
Chapter-1
Introduction to objective, scope and outcome the subject
Chapter 2
Introduction: Scope and Specialization of Civil Engineering, Role of civil Engineer in Society, Impact of infrastructural development on economy of country.
Chapter 3
Surveying: Object Principles & Types of Surveying; Site Plans, Plans & Maps; Scales & Unit of different Measurements.
Linear Measurements: Instruments used. Linear Measurement by Tape, Ranging out Survey Lines and overcoming Obstructions; Measurements on sloping ground; Tape corrections, conventional symbols. Angular Measurements: Instruments used; Introduction to Compass Surveying, Bearings and Longitude & Latitude of a Line, Introduction to total station.
Levelling: Instrument used Object of levelling, Methods of levelling in brief, and Contour maps.
Chapter 4
Buildings: Selection of site for Buildings, Layout of Building Plan, Types of buildings, Plinth area, carpet area, floor space index, Introduction to building byelaws, concept of sun light & ventilation. Components of Buildings & their functions, Basic concept of R.C.C., Introduction to types of foundation
Chapter 5
Transportation: Introduction to Transportation Engineering; Traffic and Road Safety: Types and Characteristics of Various Modes of Transportation; Various Road Traffic Signs, Causes of Accidents and Road Safety Measures.
Chapter 6
Environmental Engineering: Environmental Pollution, Environmental Acts and Regulations, Functional Concepts of Ecology, Basics of Species, Biodiversity, Ecosystem, Hydrological Cycle; Chemical Cycles: Carbon, Nitrogen & Phosphorus; Energy Flow in Ecosystems.
Water Pollution: Water Quality standards, Introduction to Treatment & Disposal of Waste Water. Reuse and Saving of Water, Rain Water Harvesting. Solid Waste Management: Classification of Solid Waste, Collection, Transportation and Disposal of Solid. Recycling of Solid Waste: Energy Recovery, Sanitary Landfill, On-Site Sanitation. Air & Noise Pollution: Primary and Secondary air pollutants, Harmful effects of Air Pollution, Control of Air Pollution. . Noise Pollution Harmful Effects of noise pollution, control of noise pollution, Global warming & Climate Change, Ozone depletion, Greenhouse effect
Text Books:
1. Palancharmy, Basic Civil Engineering, McGraw Hill publishers.
2. Satheesh Gopi, Basic Civil Engineering, Pearson Publishers.
3. Ketki Rangwala Dalal, Essentials of Civil Engineering, Charotar Publishing House.
4. BCP, Surveying volume 1
Walmart Business+ and Spark Good for Nonprofits.pdfTechSoup
"Learn about all the ways Walmart supports nonprofit organizations.
You will hear from Liz Willett, the Head of Nonprofits, and hear about what Walmart is doing to help nonprofits, including Walmart Business and Spark Good. Walmart Business+ is a new offer for nonprofits that offers discounts and also streamlines nonprofits order and expense tracking, saving time and money.
The webinar may also give some examples on how nonprofits can best leverage Walmart Business+.
The event will cover the following::
Walmart Business + (https://business.walmart.com/plus) is a new shopping experience for nonprofits, schools, and local business customers that connects an exclusive online shopping experience to stores. Benefits include free delivery and shipping, a 'Spend Analytics” feature, special discounts, deals and tax-exempt shopping.
Special TechSoup offer for a free 180 days membership, and up to $150 in discounts on eligible orders.
Spark Good (walmart.com/sparkgood) is a charitable platform that enables nonprofits to receive donations directly from customers and associates.
Answers about how you can do more with Walmart!"
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
This presentation includes basic of PCOS their pathology and treatment and also Ayurveda correlation of PCOS and Ayurvedic line of treatment mentioned in classics.
How to Fix the Import Error in the Odoo 17Celine George
An import error occurs when a program fails to import a module or library, disrupting its execution. In languages like Python, this issue arises when the specified module cannot be found or accessed, hindering the program's functionality. Resolving import errors is crucial for maintaining smooth software operation and uninterrupted development processes.
A review of the growth of the Israel Genealogy Research Association Database Collection for the last 12 months. Our collection is now passed the 3 million mark and still growing. See which archives have contributed the most. See the different types of records we have, and which years have had records added. You can also see what we have for the future.
How to Manage Your Lost Opportunities in Odoo 17 CRMCeline George
Odoo 17 CRM allows us to track why we lose sales opportunities with "Lost Reasons." This helps analyze our sales process and identify areas for improvement. Here's how to configure lost reasons in Odoo 17 CRM
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
हिंदी वर्णमाला पीपीटी, hindi alphabet PPT presentation, hindi varnamala PPT, Hindi Varnamala pdf, हिंदी स्वर, हिंदी व्यंजन, sikhiye hindi varnmala, dr. mulla adam ali, hindi language and literature, hindi alphabet with drawing, hindi alphabet pdf, hindi varnamala for childrens, hindi language, hindi varnamala practice for kids, https://www.drmullaadamali.com
Automatically Answering and Generating Machine Learning Final Exams
Sarah Zhang (EECS, MIT), sazhang@mit.edu
Reece Shuttleworth (EECS, MIT), rshuttle@mit.edu
Zad Chin (CS, Harvard University), zadchin@college.harvard.edu
Pedro Lantigua (EECS, MIT), lantigua@mit.edu
Saisamrit Surbehera (CS, Columbia University), ss6365@columbia.edu
Gregory Hunter (CS, Columbia University), geh2129@columbia.edu
Derek Austin (CS, Columbia University), da2986@columbia.edu
Yann Hicke (CS, Cornell University), ylh8@cornell.edu
Leonard Tang (Mathematics, Harvard University), leonardtang@college.harvard.edu
Sathwik Karnik (EECS, MIT), skarnik@mit.edu
Darnell Granberry (EECS, MIT), darnellg@mit.edu
Iddo Drori (CSAIL and CS; MIT, Columbia University, Boston University), idrori@mit.edu, idrori@cs.columbia.edu, idrori@bu.edu
Abstract
Can a machine learn machine learning? We propose to answer this question
using the same criteria we use to answer a similar question: can a human learn
machine learning? We automatically answer final exams in MIT’s, Harvard’s and
Cornell’s large machine learning courses and generate new questions at a human
level. Recently, program synthesis and few-shot learning solved university-level
problem set questions in mathematics and STEM courses at a human level. In
this work, we solve questions from final exams that differ from problem sets in
several ways: the questions are longer, have multiple parts, are more complicated,
and span a broader set of topics. We provide a new dataset and benchmark of
questions from machine learning final exams and code for automatically answering
these questions and generating new questions. To make our dataset a reproducible
benchmark, we use automatic checkers for multiple choice questions, questions
with numeric answers, and questions with expression answers, and evaluate a large
free language model, Meta’s OPT, and compare the results with OpenAI’s GPT-3,
ChatGPT, and Codex. A student survey comparing the quality, appropriateness, and
difficulty of machine-generated questions with human-written questions shows that
across multiple aspects, machine-generated questions are indistinguishable from
human-generated questions and are suitable for final exams. We perform ablation
studies comparing zero-shot learning with few-shot learning, chain-of-thought
prompting, GPT-3, ChatGPT, and OPT pre-trained on text and Codex fine-tuned on
code on a range of machine learning topics and find that few-shot learning methods
perform best. We make our data and code publicly available for the machine
learning community.
Preprint. Under review. arXiv:2206.05442v5 [cs.LG] 23 Dec 2022
Figure 1: 500 MIT students taking the final exam in Introduction to Machine Learning in Fall 2021
(no student faces or identifying information appears in the image to maintain anonymity). The exam
is three hours long and is taken in an indoor track-and-field stadium accommodating many students.
1 Introduction
Can a machine learn machine learning? This work presents a new dataset of machine learning final
exams with 646 question parts and a benchmark of baselines using transformers and their respective
grade performance, demonstrating that the best baseline performs at a human level. In university-
level STEM courses, students complete assignments (including problem sets and labs) and exams
throughout the course. Recent work has opened the door for a machine to solve course problem sets
[4] using language models and few-shot learning. However, final exams remain challenging, and
this work is the first to present a structured dataset of machine learning finals and a benchmark of
baseline methods for answering them. Final exams differ from problem sets because they serve as a
benchmark of cumulative understanding of material learned over a semester and evaluate the students’
depth and breadth of expertise. Further, questions on final exams are longer, have multiple parts, span
a broader set of topics, and are more complicated and nuanced.
All the above holds for MIT’s and Cornell’s Introduction to Machine Learning classes and Harvard’s
Machine Learning course. These are undergraduate courses with hundreds of students each semester,
making them the largest undergraduate courses offered. Figure 1 illustrates the size of machine
learning courses, showing 500 MIT students taking the final exam in Introduction to Machine
Learning in Fall 2021 (no student faces or identifying information appears in the image to maintain
anonymity). The final exam takes three hours and is performed in an indoor track-and-field stadium
that accommodates many students. Introduction to Machine Learning is a core class in the computer
science program. The prerequisites for the course are Python Programming and Multivariate Calculus,
with Introduction to Algorithms and Linear Algebra recommended. The class typically consists of
weekly exercises, labs, quizzes, homework, a midterm, and a final exam. There were no final exams
at MIT in Spring 2020 and Fall 2020 due to COVID-19.
Introduction to Machine Learning final exams differ from problem sets in several ways, and the
experience of solving each varies. First, finals are long, containing around nine questions with
around seven parts each. Final exam questions are also multifaceted and multi-stepped: different
parts of a single question require applying different concepts and problem-solving skills, and parts
may build upon each other. While weekly problem sets focus on a single topic, finals span topics
from the entire semester. Further, final questions are often story-based problems that may require
mathematical modeling. Due to the time constraint of these exams, finals are also designed to test
core understanding and application of course material over rote calculations. Thus, asking a machine
to answer questions from finals allows for testing whether the model is able to learn a breadth and
depth of topics beyond problem sets.
In this work, we present a new dataset curated from final exams of MIT’s and Cornell’s Introduction
to Machine Learning course and Harvard’s Machine Learning class, totaling 646 question parts
spanning the broad range of topics in machine learning: regression, classifiers, logistic regression,
features, loss functions, neural networks, convolutional neural networks (CNNs), Markov decision
processes (MDPs), recurrent neural networks (RNNs), reinforcement learning, clustering, decision
trees, model selection, ensemble methods, Bayesian networks, hidden Markov models (HMMs), and
optimization. Our dataset covers the finals given at MIT for semesters of Fall 2017, Spring 2018,
Fall 2018, Spring 2019, Fall 2019, Spring 2021, Fall 2021, and Spring 2022, Harvard Spring 2015
and Spring 2021, and Cornell Spring 2017 and Fall 2018. Due to the COVID-19 pandemic, no finals
were given in the MIT course during 2020.
Table 1: MIT Spring 2021, Question 10.a: question part, solution, Codex Zero-Shot answer, program output,
GPT-3 Zero-Shot answer, and GPT-3 Zero-Shot with CoT prompt answer.

Topic: Classifiers

Question Part: Given a set of data $D_{\text{train}} = \{(x^{(i)}, y^{(i)})\}$, a weighted nearest neighbor
regressor has the form
\[ h(x; \theta) = \frac{\sum_{(x^{(i)}, y^{(i)}) \in D_{\text{train}}} f(x, x^{(i)}; \theta)\, y^{(i)}}{\sum_{(x^{(i)}, y^{(i)}) \in D_{\text{train}}} f(x, x^{(i)}; \theta)}. \]
A typical choice for $f$ is
\[ f(x, x'; \theta) = e^{-\theta \|x - x'\|^2}, \]
where $\theta$ is a scalar and $\|x - x'\|^2 = \sum_{j=1}^{d} (x_j - x'_j)^2$. Assume our
training data $D_{\text{train}} = ((1, 1), (2, 2), (3, 6))$. What is $h(10; 0)$? That is,
letting $\theta = 0$, what is our prediction for $x = 10$?

Solution: 3

Codex Zero-Shot Answer:

    import numpy as np

    def f(x, x_i, theta):
        return np.exp(-theta * np.linalg.norm(x - x_i)**2)

    def h(x, theta):
        D = np.array([(1, 1), (2, 2), (3, 6)])
        numerator = 0
        denominator = 0
        for x_i, y_i in D:
            numerator += f(x, x_i, theta) * y_i
            denominator += f(x, x_i, theta)
        return numerator / denominator

    print(h(10, 0))

Program Output: 3.0

GPT-3 Zero-Shot Answer:
\[ h(10; 0) = \frac{f(10, 1, 0)\, y^{(1)} + f(10, 2, 0)\, y^{(2)} + f(10, 3, 0)\, y^{(3)}}{f(10, 1, 0) + f(10, 2, 0) + f(10, 3, 0)} = \frac{1 \cdot 1 + 1 \cdot 2 + 1 \cdot 6}{1 + 1 + 1} = \frac{9}{3} = 3 \]

GPT-3 ZS with CoT Answer:
\[ f(10, (1, 1), 0) = e^{-0 \|10 - (1, 1)\|^2} = e^{-0 \|9\|^2} = e^{-0 \cdot 81} = 1 \]
\[ f(10, (2, 2), 0) = e^{-0 \|10 - (2, 2)\|^2} = e^{-0 \|8\|^2} = e^{-0 \cdot 64} = 1 \]
\[ f(10, (3, 6), 0) = e^{-0 \|10 - (3, 6)\|^2} = e^{-0 \|7\|^2} = e^{-0 \cdot 49} = 1 \]
\[ h(10, 0) = \frac{\sum_{(x^{(i)}, y^{(i)}) \in D_{\text{train}}} f(x, x^{(i)}, \theta)\, y^{(i)}}{\sum_{(x^{(i)}, y^{(i)}) \in D_{\text{train}}} f(x, x^{(i)}, \theta)} = \frac{1 \cdot 1 + 1 \cdot 2 + 1 \cdot 6}{1 + 1 + 1} = \frac{9}{3} = 3 \]

Grading:           Human    Human        Human        Machine
Answers:           Human    Human        Machine      Machine
Questions:         All      Non-Image    Non-Image    Non-Image, Non-Open
MIT Spring 2021    75.84    80.77        62.09        64
MIT Fall 2021      74.38    60.88        58.94        51.33
MIT Spring 2022    69.07    70.82        68.86        73.53
Mean               73.10    70.82        63.29        62.95

Table 2: Human and machine grading of human and machine solved final exams. Mean human and
machine grades on Introduction to Machine Learning final exams by semester. MIT Spring 2021,
MIT Fall 2021, and MIT Spring 2022 final exams were unavailable online when GPT-3 and Codex
were trained, ensuring that our results are not due to overfitting. Non-image grades consider question
parts that do not contain images that are required for solving the question.
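The answer to the question in Table 1 can also be checked by hand. As a short worked step (using only the definitions given in the question, with $n$ introduced here for the number of training points): when $\theta = 0$ every weight equals 1, so the regressor reduces to the mean of the training labels, independent of the query point $x$.

```latex
\[
f(x, x'; 0) = e^{-0 \cdot \|x - x'\|^2} = 1
\quad\Longrightarrow\quad
h(x; 0) = \frac{\sum_{i=1}^{n} 1 \cdot y^{(i)}}{\sum_{i=1}^{n} 1}
        = \frac{1}{n} \sum_{i=1}^{n} y^{(i)}
        = \frac{1 + 2 + 6}{3} = 3 .
\]
```

This agrees with the reference solution and with all three model answers shown in the table.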
We verify that our results are not due to overfitting by including MIT finals from 2021 and 2022 that
are unavailable online. Also, the MIT Spring 2022 final exam was given after GPT-3 and Codex were
last updated, which means that the models were not trained on this data. The final exam questions
have many parts, each posing a new problem, and each question in the dataset corresponds to one
part. The questions in the finals vary in topic and solution type. Most are
open-ended, with some true/false and multiple-choice questions on theory, math, and code
implementations.
We make the dataset publicly available and welcome others to use it to aid in developing and assessing
new language models and methods. Due to the diversity of Intro to ML final questions, our dataset
uniquely assesses advanced problem-solving and reasoning skills in machine learning, math, and
natural language processing. This dataset opens the door to breakthroughs in machine
performance on machine learning final exams. In addition to the dataset, we present a
benchmark using several baseline methods. We apply zero-shot and few-shot learning to GPT-3
and Codex, adding chain-of-thought prompting for GPT-3. We find that few-shot learning methods
perform best. As shown in Table 2, the best-performing methods pass the final exams, and their grades
are comparable with the human grades of MIT students on the same machine learning finals evaluated
by the same human graders. We generate new final exam questions that are indistinguishable from
human-written questions.
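To make the difference between the prompting modes concrete, the following is a minimal sketch of how zero-shot, zero-shot chain-of-thought, and few-shot prompts could be assembled. The `Q:`/`A:` layout, the function names, and the chain-of-thought suffix are assumptions made for illustration; they are not taken from this paper's released code, and the actual prompts may differ.

```python
# Hypothetical prompt templates for illustration only.
# The "Q:/A:" layout and the suffix below are assumptions, not the paper's prompts.
COT_SUFFIX = "Let's think step by step."


def zero_shot_prompt(question):
    """Present the question alone, with no worked examples."""
    return f"Q: {question}\nA:"


def zero_shot_cot_prompt(question):
    """Zero-shot prompt that nudges the model to write out its reasoning."""
    return f"Q: {question}\nA: {COT_SUFFIX}"


def few_shot_prompt(examples, question):
    """Prepend solved (question, answer) pairs before the new question."""
    shots = [f"Q: {q}\nA: {a}" for q, a in examples]
    shots.append(zero_shot_prompt(question))
    return "\n\n".join(shots)


if __name__ == "__main__":
    solved = [("What is the derivative of x^2?", "2x")]
    print(few_shot_prompt(solved, "What is the derivative of x^3?"))
```

In this sketch the few-shot prompt differs from the zero-shot one only by the solved examples placed in front of the new question, which is why no re-training is involved.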
1.1 Related Work
It is often thought that humans are generalists, whereas machines are specialists. However, large
language models based on transformers such as GPT-3 [1], Gopher [9], and PaLM [3], also called
foundation models, are generalist learners. Specifically, in our setting, while humans care about the
number of topics in an exam and therefore find finals more difficult than problem sets, foundation
models effortlessly scale to many topics without re-training. Language models may be pre-trained
on text and fine-tuned on specific datasets such as code, for example OpenAI’s Codex [2], which
allows generating programs from text. There are several ways to improve the mathematical reasoning
ability of language models: (1) using chain-of-thought (CoT) prompting [6, 14], (2) using the top-k
ranking solutions [7] and merging them by voting [13] or least-to-most prompting [15], and (3) using
program synthesis and few-shot learning to generate code that answers questions [4].
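As an illustration of option (2), merging several sampled solutions by voting can be sketched as follows. This is a generic majority vote under stated assumptions, not the procedure of the cited works: the `normalize` helper and its numeric canonicalization are hypothetical choices made here so that answers such as "3" and "3.0" count as the same vote.

```python
from collections import Counter


def normalize(answer):
    """Canonicalize an answer string (hypothetical rule for this sketch)."""
    text = answer.strip().lower()
    try:
        # Treat "3", "3.0", and " 3.00 " as the same numeric answer.
        return repr(float(text))
    except ValueError:
        return text


def majority_vote(candidates):
    """Return the most frequent normalized answer; ties go to the first seen."""
    if not candidates:
        raise ValueError("need at least one candidate answer")
    counts = Counter(normalize(c) for c in candidates)
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    samples = ["3", "3.0", "9", " 3.00 ", "2"]
    print(majority_vote(samples))
```

The normalization step matters more than the vote itself: without it, equivalent answers written differently would split the count.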
Much of the prior work focuses on high school or middle school level material [8]. The first work to
tackle university-level machine learning course problem set questions [12] used a transformer and
GNN architecture and heavily relied on data augmentation. This resulted in overfitting and did not
scale up to other types of questions or courses. Probability and statistics course problem-set questions
have been answered [11] by probabilistic program synthesis at human-level performance. Problem-set
questions from the core university math courses [4] have been automatically solved using few-shot
learning and program synthesis at a human level. Other work considers university-level course
questions across a variety of domains [5] and identifying theorems [10]. Prior work on question
generation includes question–answer pair generation based on a text passage [8] and question text
generation based on other questions [4].

Semester               Questions   Parts
MIT Fall 2017          10          61
MIT Spring 2018        9           42
MIT Fall 2018          10          60
MIT Spring 2019        9           58
MIT Fall 2019          8           61
MIT Spring 2021        13          71
MIT Fall 2021          8           86
MIT Spring 2022        9           59
Harvard Spring 2015    8           12
Harvard Spring 2021    6           32
Cornell Spring 2017    30          48
Cornell Fall 2018      29          56
Mean                   12.42       53.83
Total                  149         646

Topic                     Questions   Parts
Regression                10          62
Classifiers               24          85
Logistic Regression       3           10
Features                  3.5         21
Neural Networks           19.5        87
Loss Functions            4           16
CNNs                      9           65
MDPs                      10          77
RNNs                      7           33
Reinforcement Learning    11          60
Clustering                5           17
Decision Trees            14          48
Model Selection           5           16
Ensemble Methods          9           20
Bayesian Networks         1           6
HMMs                      1           4
Optimization              10          16
Bonus/Name                3           3
Mean                      8.28        35.89
Total                     149         646

Table 3: The number of questions and parts in the final for each semester and topic of Introduction
to Machine Learning. MIT Spring 2020 and Fall 2020 did not have final exams due to COVID-19.
Topics can have half-questions attributed to them if a question has some parts under one topic and the
other parts under another topic.
2 Dataset
We present a new dataset of 646 question parts from a dozen recent final exams of MIT’s and Cornell’s
Introduction to Machine Learning courses and Harvard’s Machine Learning class. The dataset spans
questions on the 17 machine learning topics covered in the courses: (1) regression, (2) classifiers,
(3) logistic regression, (4) features, (5) loss functions, (6) neural networks, (7) convolutional neural
networks (CNNs), (8) Markov decision processes (MDPs), (9) recurrent neural networks (RNNs),
(10) reinforcement learning, (11) clustering, (12) decision trees, (13) model selection, (14) ensemble
methods, (15) Bayesian networks, (16) hidden Markov models (HMMs), and (17) optimization. We
make our data and code publicly available.1
The breakdown of questions, parts, points, and non-image points by semester and topic is
shown in Table 2. Each question in a final exam consists of multiple parts. Questions are written
by providing set-up and context information first, followed by the question parts (which may come
with additional information). Set-up and context information may contain (1) story elements (e.g.,
character names and motivations), (2) relevant definitions and equations, and (3) data points. We
format questions in the dataset by concatenating the question context, any context or solutions from
prior parts of the question required for answering the part, and the part’s context and question. We
split the questions into their corresponding parts. Questions consist of English text, mathematical
notation, and images. Mathematical notation is represented in the dataset by LaTeX and images by
screenshots from PDF files. The types of question answers are diverse. A few are multiple-choice or
true/false questions. Most are open-ended, for which the evaluation requires modeling the problem,
mathematical manipulation, or code writing. Many questions require providing an explanation.
We used twelve final exams from different semesters for data curation. We had access to the LaTeX
version for the three most recent MIT semesters, Spring 2021, Fall 2021, and Spring 2022, and
therefore did not require transcription. For the nine remaining exams, MIT Fall 2017, Spring 2018,
Fall 2018, Spring 2019, Fall 2019, Harvard Spring 2015 and Spring 2021, and Cornell Spring 2017
and Fall 2018, we had access to the PDF versions. In these cases, we used Mathpix (mathpix.com) for
an initial transcription, and curators then manually corrected the transcribed questions and
verified the correctness of each one.
1 Data and code are in the Supplementary Material.
We extract questions and solutions for all parts of all types of questions, including those that rely
on images. We curated nine exams from publicly available PDF files. MIT Spring 2020 and Fall
2020 did not have final exams due to COVID-19. The three MIT exams from 2021 and 2022
were unavailable online; therefore, the models cannot have overfit to their solutions. The aggregate
average grades were available to the students and did not contain any personally identifiable information.
Three questions that were originally on the final exam of MIT Fall 2017 (questions 1, 3, 6)
appeared again in the final exam of MIT Spring 2022.
3 Benchmark
3.1 Baselines
We provide a benchmark by comparing nine baselines for answering the final exam questions: (1)
GPT-3 text-davinci-002 with zero-shot learning, (2) GPT-3 text-davinci-003 with zero-shot learning,
(3) ChatGPT with zero-shot learning, (4) GPT-3 with few-shot learning, (5) GPT-3 with zero-shot
learning and chain-of-thought (CoT) prompting, (6) GPT-3 with few-shot learning and chain-of-thought
(CoT) prompting, (7) Codex with zero-shot learning, (8) Codex with few-shot learning,
and (9) OPT with zero-shot learning.
Table 4 shows the prompt used for each approach. GPT-3 zero-shot uses the question as-is, whereas
GPT-3 zero-shot with CoT uses the suffix “Let’s think step by step.” after the question to encourage
multi-step output. Codex zero-shot uses the prefix “Write a program that answers” before the question
within Python comments denoted by triple quotes """ to encourage Codex to write code. GPT-3
few-shot finds the closest questions in the embedding space, measured by cosine similarity, and uses
them and their corresponding answers before the new question as examples in the prompt. Codex
few-shot finds the closest questions in the embedding space also as measured by cosine similarity
and uses these questions and their corresponding code as examples.
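The prompt formats described above can be sketched as plain string templates. This is a minimal illustration based on the wording in this section and in Table 4; the function names `gpt3_prompt` and `codex_prompt` are hypothetical and are not taken from the released code, and a straight apostrophe is used in the CoT suffix.

```python
# Hypothetical helpers that mirror the prompt layouts listed in Table 4.
COT_SUFFIX = "Let's think step by step."
CODEX_PREFIX = "Write a program that answers the following question: "


def gpt3_prompt(question, examples=(), cot=False):
    """Build a GPT-3 prompt; `examples` holds (question, answer) pairs."""
    if not examples and not cot:
        return question  # zero-shot uses the question as-is
    lines = []
    for ex_question, ex_answer in examples:
        lines.append(f"Q: {ex_question}")
        lines.append(f"A: {ex_answer}")
    lines.append(f"Q: {question}")
    lines.append(f"A: {COT_SUFFIX}" if cot else "A:")
    return "\n".join(lines)


def codex_prompt(question, examples=()):
    """Build a Codex prompt; `examples` holds (question, code) pairs."""
    blocks = []
    for ex_question, ex_code in examples:
        blocks.append(f'"""\n{CODEX_PREFIX}{ex_question}\n"""')
        blocks.append(ex_code)
    blocks.append(f'"""\n{CODEX_PREFIX}{question}\n"""')
    return "\n".join(blocks)
```

Passing an empty `examples` sequence gives the zero-shot variants, and setting `cot=True` appends the chain-of-thought suffix after the final "A:".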
For students, a good study technique is to use previous final exams to review and practice for their
upcoming final. We model this method by few-shot learning using the question–answer pairs (for
GPT-3) or question–code (for Codex) with the closest question embeddings from previous finals. We
implement this by considering all the exam questions, marking each question by its semester and
year, and using only previous semesters’ questions for few-shot learning. The MIT Fall 2017 and
Spring 2022 exams contain three duplicate questions, and we handle these same questions the same
way humans do by allowing few-shot learning in MIT Spring 2022 based on successful Fall 2017
zero-shot answers. It is reasonable to expect that a student who studies all previous exams may
encounter about 8.5% of repeated question points. Since MIT Fall 2017, Harvard Spring 2015, and
Cornell Spring 2017 are the earliest final exams from their respective universities in the dataset,
we do not perform few-shot learning on these.
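The selection of few-shot examples from earlier exams can be sketched as follows. This is an illustrative sketch, not the released implementation: the record fields (`school`, `term`, `embedding`), the encoding of a term as `(year, 0 for Spring or 1 for Fall)`, and the restriction to the same university (inferred from the NA entries for each university's first exam) are all assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def closest_previous_questions(query, pool, k=1):
    """Return the k most similar questions from strictly earlier terms.

    Each record is a dict with hypothetical keys:
      'school'    -- university name (assumed filter)
      'term'      -- (year, 0 for Spring / 1 for Fall), so tuples sort in time order
      'embedding' -- question embedding as a list of floats
    """
    candidates = [
        item for item in pool
        if item["school"] == query["school"] and item["term"] < query["term"]
    ]
    candidates.sort(
        key=lambda item: cosine_similarity(query["embedding"], item["embedding"]),
        reverse=True,
    )
    return candidates[:k]
```

For a question from the earliest exam of a university the candidate list is empty, which corresponds to the NA entries reported for few-shot methods.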
3.1.1 Comparison with Open Language Models
We also evaluated an open-source language model, Meta’s OPT-175B, on our dataset. OPT-175B
is a model with 175 billion parameters. Our dataset consists of final exam questions from
machine learning courses and is suitable for evaluating OPT-175B. Tables 5 and 6 compare the results of
OpenAI GPT-3, OpenAI Codex, and Meta OPT. We evaluated OPT on only 163 question parts, since
OPT was limited to handling questions under 256 characters in length. We implement the inference
for the OPT-175B model using Alpa, a framework designed for training and
inference of large models. For the hardware, we use an 8x A100 PCIe cluster. The model requires
about 560 GB of VRAM in our setup, and each example takes nine minutes for inference.
3.2 Grading
3.2.1 Human Grading
The questions are of different types: multiple-choice, numerical, expressions, and open-text. We
grade answers and aim to keep all factors equal in grading human and machine answers. Human and
machine answers are graded based on the number of points allocated to each question part, giving
Method                     Prompt
GPT-3 Zero-Shot            <question>
GPT-3 Few-Shot             Q: <similar question>
                           A: <similar question’s answer>
                           Q: <question>
                           A:
GPT-3 Zero-Shot with CoT   Q: <question>
                           A: Let’s think step by step.
GPT-3 Few-Shot with CoT    Q: <similar question>
                           A: <similar question’s answer>
                           Q: <question>
                           A: Let’s think step by step.
Codex Zero-Shot            """
                           Write a program that answers the following question: <question>
                           """
Codex Few-Shot             """
                           Write a program that answers the following question: <similar question>
                           """
                           <similar question’s correct code>
                           """
                           Write a program that answers the following question: <question>
                           """
Table 4: Input prompt for each of the baseline methods: GPT-3 Zero-Shot, GPT-3 Few-Shot, GPT-3
Zero-Shot with CoT, GPT-3 Few-Shot with CoT, Codex Zero-Shot, and Codex Few-Shot. Similar
questions (as measured by cosine similarity) are drawn from previous finals only.
full, partial, or no credit for each answer. We approximate partial credit by assigning half-credit.
The course staff, which included graduate TAs and instructors, graded the student final exams. Two of
the graduate TAs and the instructor who graded the student answers also graded the machine
answers. The grading instructions are the same for student answers and machine answers.
3.2.2 Automatic Grading
We label each question’s answer type with one or two of four categories: multiple choice
(MC), numerical, expression, or open. We consider answers multiple choice if the test-taker is
presented with an enumerated list of choices, numerical if the answer is a number, expression if
the answer includes variables or other notation, and open if the answer calls for free-response text.
We categorize questions that have additional questions nested within them by the multiple relevant
categories. Most often, this is the case when a question with one of MC, numerical, or expression, is
followed by a follow-up question asking the student to explain their previous answer. The breakdown
of the questions is: 98 are multiple-choice, 84 numerical, 81 expressions, and 204 are open. The
‘Non-Open Points’ column of Tables 7 and 8 shows the answer type breakdown by number of points.
Table 7 shows the number of question parts that do not rely on images, the number of points that do
not rely on images, and the number of non-open question points in Introduction to Machine Learning
finals for each semester. MIT Spring 2020 and Fall 2020 did not have final exams due to COVID-19.
Table 8 shows the breakdown by topic. Our automatic grading uses string matching and regular
expressions. In the case of multiple-choice results, we check that the output of the code is equal to
the solution. In the case of numerical answers, we look for a matching integer or real number.
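As a rough illustration of grading by string matching and regular expressions, the sketch below checks multiple-choice and numerical answers. It is not the released grader: the function names, the case-insensitive comparison, and the numerical tolerance `tol` are assumptions made for the example.

```python
import re

# Matches integers and real numbers, with an optional sign and exponent.
NUMBER_RE = re.compile(r"[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def grade_multiple_choice(output, solution):
    """Check that the output equals the solution, ignoring case and padding."""
    return output.strip().lower() == solution.strip().lower()


def grade_numerical(output, solution, tol=1e-6):
    """Look for any number in the output that matches the solution value."""
    target = float(solution)
    return any(
        abs(float(match) - target) <= tol
        for match in NUMBER_RE.findall(output)
    )
```

Open-text answers are not covered by checks of this kind, which is consistent with their being graded by the course staff.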
3.3 Performance
Table 5 shows the machine grades by semester and Table 6 shows the machine grades by topic,
excluding question parts that rely on images. We compare the average grade of GPT-3 with zero-shot
(ZS), GPT-3 with ZS and chain-of-thought (CoT) prompting, GPT-3 with few-shot (FS) learning,
GPT-3 with FS and CoT prompting, Codex with ZS, Codex with FS, and OPT with ZS. Fall 2017
is the first semester, so few-shot learning results based on previous semesters are unavailable (NA).
Semester             GPT-3 002 ZS  GPT-3 003 ZS  ChatGPT ZS    GPT-3 FS      GPT-3 ZS CoT  GPT-3 FS CoT  Codex ZS      Codex FS      OPT ZS
MIT Fall 2017        38.21         50.00         48.93         NA            22.86         NA            21.43         NA            NA
MIT Spring 2018      44.35         60.48         50.00         60.48         38.71         70.97         32.26         67.74         33.33
MIT Fall 2018        51.99         62.96         72.50         52.18         61.63         64.17         49.78         54.00         47.54
MIT Spring 2019      43.45         55.65         62.14         54.23         41.07         58.81         15.54         41.55         34.64
MIT Fall 2019        54.92         58.61         58.20         77.05         29.92         58.20         26.23         61.48         NA
MIT Spring 2021      44.33         48.31         51.26         55.81         53.45         60.21         33.62         62.09         33.77
MIT Fall 2021        58.94         61.53         47.24         69.44         50.35         54.90         18.11         42.00         24.44
MIT Spring 2022      42.78         55.29         55.07         68.86         32.03         53.48         51.01         65.46         60.71
Harvard Spring 2015  85.71         64.29         78.57         NA            85.71         NA            50.00         NA            21.43
Harvard Spring 2021  47.73         77.27         72.73         86.36         47.73         81.82         43.18         86.36         45.45
Cornell Spring 2017  78.91         59.90         79.03         NA            80.86         NA            51.30         NA            21.88
Cornell Fall 2018    36.45         57.01         61.21         53.27         44.39         61.21         42.52         56.07         28.97
Table 5: We benchmark different baselines for each semester, excluding question parts that rely on
images. We compare the average grade of GPT-3 (text-davinci-002 and 003) with zero-shot (ZS),
ChatGPT, GPT-3 with few-shot (FS) learning, GPT-3 with ZS and chain-of-thought (CoT) prompting,
GPT-3 with FS and CoT prompting, Codex with ZS, Codex with FS, and OPT with ZS. MIT Fall
2017, Cornell Spring 2017, and Harvard Spring 2015 were the first semester for each university,
so few-shot learning results based on previous semesters are unavailable (NA). MIT Spring 2020
and MIT Fall 2020 did not have final exams due to COVID-19. MIT Spring 2021, MIT Fall 2021,
and MIT Spring 2022 final exams were unavailable online when GPT-3 and Codex were trained,
ensuring that the model is not overfitting. The result of the best-performing method for each semester
is marked in bold.
Topic                GPT-3 002 ZS  GPT-3 003 ZS  ChatGPT ZS    GPT-3 FS      GPT-3 ZS CoT  GPT-3 FS CoT  Codex ZS      Codex FS      OPT ZS
Regression           31.71         56.67         50.56         50.00         25.61         40.85         40.24         50.00         50.00
Classifiers          38.18         47.12         52.81         46.21         26.28         42.35         18.88         53.74         50.00
Logistic Reg.        50.00         60.71         67.86         60.00         77.50         77.50         55.00         70.00         16.67
Features             58.65         71.92         76.15         75.96         53.85         77.31         68.85         81.54         10.00
Loss Functions       NA            NA            NA            NA            NA            NA            NA            NA            NA
Neural Networks      48.34         60.82         67.71         60.23         44.54         68.42         37.82         63.45         27.27
CNNs                 37.50         59.59         62.05         53.58         28.36         47.81         13.38         36.77         23.83
MDPs                 49.19         73.28         47.33         52.01         46.03         54.23         24.38         38.03         28.32
RNNs                 61.46         33.33         45.83         71.88         57.29         66.14         12.50         40.63         39.28
RL                   36.09         65.24         55.59         42.99         36.67         50.11         28.79         45.11         24.28
Clustering           100.00        50.00         90.00         100.00        100.00        100.00        50.00         50.00         63.33
Decision Trees       54.70         74.78         69.08         71.80         32.48         51.28         46.15         54.70         55.00
Model Selection      82.93         71.12         82.10         83.74         72.76         95.12         67.48         69.92         21.95
Ensemble Methods     27.89         41.35         69.23         50.00         22.12         66.35         32.69         50.00         13.46
Bayesian Networks    100.00        0.00          100.00        100.00        100.00        100.00        0.00          0.00          100.00
HMMs                 100.00        100.00        100.00        100.00        50.00         100.00        100.00        100.00        100.00
Optimization         55.00         77.50         87.50         60.00         35.00         55.00         17.50         70.00         20.00
Table 6: We benchmark different baselines for each course topic, excluding question parts that rely on
images. We compare the grade of GPT-3 (text-davinci-002 and 003) with zero-shot (ZS), ChatGPT
with zero-shot (ZS), GPT-3 with few-shot (FS) learning, GPT-3 with zero-shot and chain-of-thought
(CoT) prompting, GPT-3 with FS and CoT, Codex with zero-shot, Codex with few-shot learning,
and OPT with ZS. The question parts on loss functions rely on image information and are therefore
unavailable (marked NA). The result of the best-performing method for each topic is marked in
bold.
Spring 2020 and Fall 2020 did not have final exams due to COVID-19. The Spring 2021, Fall 2021, and
Spring 2022 final exams were unavailable online when GPT-3 and Codex were trained, ensuring that
the models are not overfitting to content they have seen previously. The results indicate that,
among zero-shot methods, ChatGPT tends to perform best, and that few-shot learning methods tend to
perform best overall across semesters and topics, as marked in bold.
3.4 Limitations
Our dataset consists of all question parts and their solutions, including images. However, our baseline
methods do not handle questions that rely on an image containing the information required to solve
the question since GPT-3 and Codex do not handle images. Tables 7 and 8 show the breakdown of the
number of question parts and points of questions that do not rely on image information for answering
the question. On average, 27.55% of the question parts, which make up 30.32% of the points in final
Semester               Non-Image Parts / All   Non-Image Points / All   Non-Open Points / Non-Image
MIT Fall 2017          49 / 61                 70 / 100                 69 / 70
MIT Spring 2018        27 / 42                 62 / 100                 59 / 62
MIT Fall 2018          30 / 60                 62 / 100                 37.5 / 62
MIT Spring 2019        41 / 58                 70 / 100                 48 / 70
MIT Fall 2019          46 / 61                 61 / 100                 50 / 61
MIT Spring 2021        51 / 71                 62 / 100                 43 / 61
MIT Fall 2021          56 / 86                 48 / 100                 43 / 48
MIT Spring 2022        46 / 59                 68 / 100                 53.5 / 68
Harvard Spring 2015    8 / 12                  70 / 90                  35 / 70
Harvard Spring 2021    15 / 32                 22 / 53                  11 / 22
Cornell Spring 2017    48 / 48                 128 / 128                74 / 128
Cornell Fall 2018      51 / 56                 107 / 120                71 / 107
OPT Total              418 / 585               759 / 1091               525 / 759
Total                  468 / 646               830 / 1191               594 / 830
Table 7: The number of question parts that do not rely on images, the number of points that do not
rely on images, and the number of non-open question points, in finals for each semester. MIT Spring
2020 and MIT Fall 2020 did not have final exams due to COVID-19.
Topic                    Non-Image Parts / All   Non-Image Points / All   Non-Open Points / Non-Image
Regression               37 / 62                 45 / 71                  35.5 / 45
Classifiers              72 / 85                 169 / 200                126.5 / 169
Logistic Regression      10 / 10                 14 / 14                  7 / 14
Features                 18 / 21                 26 / 38                  22 / 26
Loss Functions           2 / 16                  3 / 21                   3 / 3
Neural Networks          66 / 87                 112 / 153                100.5 / 112
CNNs                     54 / 65                 61 / 80                  57 / 61
MDPs                     38 / 77                 39 / 121                 32.5 / 39
RNNs                     27 / 33                 48 / 67                  36 / 48
Reinforcement Learning   52 / 60                 87 / 111                 57 / 87
Clustering               5 / 17                  30 / 55                  15 / 30
Decision Trees           33 / 48                 76 / 113                 39 / 76
Model Selection          16 / 16                 41 / 41                  14 / 41
Ensemble Methods         20 / 20                 52 / 52                  31 / 52
Bayesian Networks        1 / 6                   2 / 11                   0 / 2
HMMs                     1 / 4                   1 / 7                    0 / 1
Optimization             13 / 16                 20 / 32                  15 / 20
Bonus/Name               3 / 3                   4 / 4                    3 / 3
Total                    468 / 646               830 / 1191               594 / 830
Table 8: The number of question parts that do not rely on images, number of points that do not rely
on images, and number of non-open question points in the finals for each topic of the course.
exams, are questions that rely on image information. The points attributed to the non-image parts are
tallied, recorded, and used to calculate non-image percentage grades.
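The non-image percentage grade can be illustrated with a short sketch that combines the full, half, and no-credit scheme from the grading section with the exclusion of image-dependent parts. The record layout (`points`, `credit`, `needs_image`) and the function name are hypothetical and chosen only for this example.

```python
# Credit weights: partial credit is approximated as half-credit.
CREDIT = {"full": 1.0, "partial": 0.5, "none": 0.0}


def non_image_grade(parts):
    """Percentage grade over the parts that do not rely on images.

    Each part is a dict with hypothetical keys:
      'points'      -- points allocated to the question part
      'credit'      -- one of 'full', 'partial', 'none'
      'needs_image' -- True if the part relies on image information
    Returns None when no non-image points are available.
    """
    scored = [part for part in parts if not part["needs_image"]]
    total = sum(part["points"] for part in scored)
    if total == 0:
        return None
    earned = sum(part["points"] * CREDIT[part["credit"]] for part in scored)
    return 100.0 * earned / total
```

A topic whose parts all rely on images yields `None` here, analogous to the NA entries in the results tables.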
3.5 Generating New Questions
The creation of new, high-quality questions by course instructors and TAs is often a time-consuming,
high-effort process. These new questions must be varied from past questions while still testing the
same core concepts. We explore the potential of using GPT-3 to write exam content efficiently by
using the dataset of exam questions to generate new questions automatically. We use questions from
our dataset as prompts to create new high-quality questions not present in our dataset. We create a list
of various questions in our curated dataset and use the resulting list to prompt GPT-3 to create a new
question. The supplementary material demonstrates the results of this process for each topic in the
course. The Appendix lists newly generated questions alongside the closest question from our dataset,
as measured by the cosine similarity of the question embeddings. These new questions are
diverse and qualitatively similar to questions on previous MIT final exams. This provides an efficient
way for course TAs and instructors to generate new final questions.
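One plausible way to turn a list of existing questions into a generation prompt is a numbered list that ends with an open item for the model to complete. The exact prompt layout is not specified in this section, so the numbered format and the function name `generation_prompt` below are assumptions for illustration only.

```python
def generation_prompt(seed_questions):
    """Format seed questions as a numbered list ending with an open item.

    The numbered-list layout is an assumed format, not the documented one.
    """
    lines = [
        f"{index}. {question}"
        for index, question in enumerate(seed_questions, start=1)
    ]
    # The trailing number invites the model to write one more question.
    lines.append(f"{len(seed_questions) + 1}.")
    return "\n".join(lines)
```

The completion returned for the open item would then be treated as the candidate new question and compared against the dataset by embedding similarity.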
3.5.1 Student Survey
To evaluate the machine-generated questions, we conducted an anonymous online student survey
comparing them with the human-written questions in terms of quality, appropriateness relative to
the course, and question difficulty. We surveyed 15 students who have taken the Introduction to
Machine Learning course or its equivalent. The survey was optional and included informed consent,
with the following description: “We are conducting a survey to assess the quality and difficulty of
automatically generated questions for an introductory machine learning course final. You will be
presented with a series of questions, either human-written (taken from an actual course final exam) or
machine generated, but you will not be told the source of a given question. For each question, you
will be asked (a) whether you think the question is human-written or machine-generated, (b) whether
the question is appropriate for the given course final, and finally (c) how you would rate the difficulty
of the question. Please carefully read each question and answer to the best of your ability”.
We randomly sampled one generated question and its closest (measured by cosine similarity) original,
human-written question for each of the twelve machine learning topics. Students were asked to read
these 24 questions in the survey, mixed and presented randomly, and then answer three questions for
each: (1) “Is the question human-written or machine-generated?”, (2) “Is the question appropriate or
not appropriate for the specific course final?”, and (3) “What is the question’s difficulty level on a
scale between 1 (easiest) and 5 (hardest)?”. We ask the students to provide ratings and not to solve
the questions. The results of our survey are as follows: Out of the human-written questions, students
identified 56.11% of them correctly as human-written and 43.89% incorrectly as machine-generated.
Of the machine-generated questions, students identified 45% of them correctly as machine-generated
and 55% of them incorrectly as human-written. The difficulty ratings were between 1 (the easiest)
and 5 (the hardest). Students rated machine-generated questions with a difficulty level of 2.55 with a
1.11 standard deviation and rated human-written questions with a difficulty level of 2.85 with a 1.12
standard deviation. Students rated machine-generated questions as appropriate 82.6% of the time and
human-written questions as appropriate 85.0% of the time.
The conclusions we draw from the survey are that (1) survey participants considered human-written
questions about as likely to be human-written as machine-generated, and similarly considered
machine-generated questions about as likely to be machine-generated as human-written, (2) survey
participants considered the machine-generated questions slightly easier than human-written questions,
and (3) survey participants considered machine-generated questions as appropriate as human-written
questions. Based on these results, we conclude that across multiple aspects, the machine-generated
questions are highly similar to human-generated questions and can be adapted to generate questions
for machine learning courses.
3.6 Implementation Details
We use the latest OpenAI GPT-3 and Codex models and do not re-train these very large language
models. We fix all the hyperparameters of the models so that the answers are deterministic and
reproducible. Specifically, we set both top P, which controls diversity, and sampling temperature,
which controls randomness, to 0. The frequency and presence penalties are also set to 0, and we
do not halt on any stop sequences. We allow diversity for generating new questions by setting the
top P and temperature to 0.1. We run Codex with an upper bound of 1024 tokens on the generated
programs. We use the OpenAI text-davinci-002, text-davinci-003, ChatGPT, and code-davinci-002
engines, along with Meta’s OPT, for generating text and programs. For few-shot learning and question
generation, we use the text-similarity-babbage-001 engine to embed the questions and find the closest
questions in the dataset by cosine similarity. The running time for answering or generating each
question part with the OpenAI models is a few seconds.
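The decoding settings described above can be summarized as plain configuration dictionaries. This sketch only records the values stated in this section; the dictionary names are hypothetical, the key names follow the OpenAI completion parameters as we understand them, and no API call is shown.

```python
# Deterministic settings used for answering questions.
ANSWER_SETTINGS = {
    "temperature": 0.0,        # sampling temperature (randomness)
    "top_p": 0.0,              # top P (diversity)
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "stop": None,              # do not halt on any stop sequence
}

# Codex additionally caps the length of generated programs.
CODEX_SETTINGS = {**ANSWER_SETTINGS, "max_tokens": 1024}

# Question generation allows a small amount of diversity.
GENERATION_SETTINGS = {**ANSWER_SETTINGS, "temperature": 0.1, "top_p": 0.1}
```

Keeping the answering settings at zero is what makes the reported answers reproducible, while the generation settings trade a little determinism for variety.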
4 Conclusions
We present a dataset and benchmark for answering and generating university-level final exams in
machine learning. Machine performance and human performance are evaluated by the same graders
and grading instructions, as well as by automatic checkers. A comparison of baselines shows that
few-shot learning methods perform best across semesters and topics. A limitation of our work is that
our benchmark does not consider questions that rely on images for their solution. This work may
help improve students’ learning for final exams, help course staff generate questions for finals,
and support comparing the difficulty of exams across semesters and schools. This work includes final
exams from MIT’s and Cornell’s Introduction to Machine Learning classes and Harvard’s Machine
Learning course. We hope this dataset and benchmark serve the machine learning community and
advance the state-of-the-art in the field.
References
[1] Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal,
Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models
are few-shot learners. In Proceedings of Advances in Neural Information Processing Systems
(NeurIPS), volume 33, pages 1877–1901, 2020.
[2] Mark Chen et al. Evaluating large language models trained on code. arXiv preprint
arXiv:2107.03374, 2021.
[3] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam
Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM:
Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.
[4] Iddo Drori, Sarah Zhang, Reece Shuttleworth, Leonard Tang, Albert Lu, Elizabeth Ke, Kevin
Liu, Linda Chen, Sunny Tran, Newman Cheng, Roman Wang, Nikhil Singh, Taylor L. Patti,
Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, and Gilbert Strang. A neural network
solves, explains, and generates university math problems by program synthesis and few-shot
learning at human level. Proceedings of the National Academy of Sciences, 119(32), 2022.
[5] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and
Jacob Steinhardt. Measuring massive multitask language understanding. In Proceedings of the
International Conference on Learning Representations (ICLR), 2021.
[6] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large
language models are zero-shot reasoners. arXiv preprint arXiv:2205.11916, 2022.
[7] Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond,
Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, et al. Competition-level code
generation with AlphaCode. arXiv preprint arXiv:2203.07814, 2022.
[8] Fanyi Qu, Xin Jia, and Yunfang Wu. Asking questions like educational experts: Automatically
generating question-answer pairs on real-world examination data. In Proceedings of the 2021
Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana,
Dominican Republic, November 2021. Association for Computational Linguistics.
[9] Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song,
John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al. Scaling language
models: Methods, analysis & insights from training Gopher. arXiv preprint arXiv:2112.11446,
2021.
[10] Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid,
Adam Fisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, et al.
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models.
arXiv preprint arXiv:2206.04615, 2022.
[11] Leonard Tang, Elizabeth Ke, Nikhil Singh, Bo Feng, Derek Austin, Nakul Verma, and Iddo
Drori. Solving probability and statistics problems by probabilistic program synthesis at human
level and predicting solvability. In Proceedings of the International Conference on Artificial
Intelligence in Education (AIED), 2022.
[12] Sunny Tran, Pranav Krishna, Ishan Pakuwal, Prabhakar Kafle, Nikhil Singh, Jayson Lynch, and
Iddo Drori. Solving machine learning problems. In Proceedings of the Asian Conference on
Machine Learning (ACML), pages 470–485, 2021.
[13] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, and Denny Zhou. Self-
consistency improves chain of thought reasoning in language models. arXiv preprint
arXiv:2203.11171, 2022.
[14] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny
Zhou. Chain of thought prompting elicits reasoning in large language models. arXiv preprint
arXiv:2201.11903, 2022.
[15] Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale
Schuurmans, Olivier Bousquet, Quoc Le, and Ed Chi. Least-to-most prompting enables
complex reasoning in large language models. arXiv preprint arXiv:2205.10625, 2022.
A Appendix
Table 9: Generating new questions: an example of a new, automatically generated question for each topic and the closest
question in the dataset based on the cosine similarity of the question embeddings.
Topic Question Similarity
Regression Generated Question: “We’re given a data set $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$,
where $x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \mathbb{R}$. Let $X$ be a $d \times n$ matrix in which the
$x^{(i)}$ are the columns and let $Y$ be a $1 \times n$ vector containing the values
of $y^{(i)}$. Using the ordinary least-squares formula, we can compute
$W_{ols} = (XX^T)^{-1} X Y^T$.
Using ridge regression, we can compute
$W_{ridge} = (XX^T + \lambda I)^{-1} X Y^T$.
We decide to try to use these methods
to initialize a single-unit neural network with a linear activation function.
Assume that $XX^T$ is neither singular nor equal to the identity matrix,
and that neither $W_{ols}$ nor $W_{ridge}$ is equal to $(0, 0, \ldots, 0)$. Consider a
neuron initialized with $W_{ridge}$. Provide an objective function $J(W)$ that
depends on the data, such that batch gradient descent to minimize $J$ will
have no effect on the weights, or argue that one does not exist.”
Closest Question: “We’re given a data set $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{n}$, where
$x^{(i)} \in \mathbb{R}^d$ and $y^{(i)} \in \mathbb{R}$. Let $X$ be a $d \times n$ matrix in which the $x^{(i)}$ are
the columns and let $Y$ be a $1 \times n$ vector containing the values of $y^{(i)}$.
Using the analytical regression (ordinary least-squares) formula, we can
compute
$W_{ols} = (XX^T)^{-1} X Y^T$.
Using ridge regression, we can compute
$W_{ridge} = (XX^T + \lambda I)^{-1} X Y^T$.
We decide to try to use these methods to
initialize a single-unit neural network with a linear activation function
and no offset: $h(x; W) = W^T x$. Assume that $XX^T$ is invertible and
not equal to the identity matrix, and that neither $W_{ols}$ nor $W_{ridge}$ is equal
to $(0, 0, \ldots, 0)$. Note also that we are not using an explicit offset/bias
term. Rory has solved many problems from this particular domain
before and the solution has typically been close to $W^* = (1, \ldots, 1)^T$.
Define an objective function $J(W)$ that we could minimize in order to
obtain good estimates for Rory’s next problem, even with very little
data.”
0.945
Classifiers Generated Question: “Consider a binary classification problem with two
classes, +1 and -1. Assume that the training data is linearly separable.
Consider the following two models:
Model 1: $g(x) = \mathrm{sgn}(w^T x)$
Model 2: $g(x) = \mathrm{sgn}(w^T x + b)$
Assume that the training data is linearly separable. Which model is more
likely to overfit?”
Closest Question: “In some cases, we will have a validation set
in addition to training and test sets. Assume the validation set is
approximately the same size as the test set. This validation set is often
used to tune hyperparameters such as λ. Imagine we have trained a
classifier using regularization, with λ chosen based on performance on
the training set. Which will have the highest accuracy the training set,
the validation set or the test set?”
0.782
Logistic Regression Generated Question: “In logistic regression, we use the sigmoid function
to squish the output of our linear function to be between 0 and 1. What
is the output of the sigmoid function when the input is 0?”
Closest Question: “Consider the simplest of all neural networks,
consisting of a single unit with a sigmoid activation function:
$h(x; w) = \sigma(w_0 + w_1 x)$ where $\sigma(z) = (1 + \exp(-z))^{-1}$. Let’s start
with a classifier defined by $w_0 = -1$ and $w_1 = 1$. Which range of input
values $x$ are classified as positive? Which as negative?”
0.810
Features Generated Question: “You are working on a project to predict how
successful a given Kickstarter campaign will be, based on information
about the campaign (such as the amount of money pledged, the number
of backers, the category of the project, the country where the project is
based, and so forth). One of the features you want to use is the project’s
title. What is a good way to encode the project’s title for use in a
machine-learning predictor?”
Closest Question: “Mac O’Larnin is considering selling an app
on Frugal Play. You have a friend with inside info at Frugal, and they’re
able to share data on how previous apps have performed on the store.
Mac decides that he will learn a neural network with no hidden layer
(i.e., consisting only of the output layer). He needs help in figuring out
the precise formulation for machine learning. What is the best way
to encode the app characteristic ’Suitable for people ages (2–4, 5–10,
11–15, 16 and over)’ as a feature for an input to the neural network?
Choose from among the following: multiple unary features (one-hot
encoding), multiple binary features (thermometer encoding), an integer
or real-valued feature. Also give the exact function that maps each input
to its corresponding feature(s).”
0.761
Loss Functions Generated Question: “We have a data set of images, each of which is
either a picture of a dog or a picture of a cat. We want to train a binary
classifier to distinguish between the two classes. Which of the following
is a better choice of loss function for this task:
(a) The 0-1 loss, which gives a loss of 1 if the classifier predicts the
wrong class and a loss of 0 otherwise.
(b) The cross-entropy loss, which is given by -log(p) if the correct class
is predicted and -log(1-p) otherwise, where p is the predicted probability
of the correct class.
(c) The hinge loss, which is given by max(0, 1-s) if the correct class is
predicted and max(0, 1+s) otherwise, where s is the predicted score for
the correct class.
Closest Question: “Consider the following 2D dataset in (x,y) format:
((1,-1), +1), ((1,1), +1), ((1,2.5), +1), ((2,-2), -1), ((2,1), +1), ((2,3), +1),
((5,-1), -1), ((5,-2), -1). Break ties in distance by choosing the point
with smaller x1 coordinate, and if still tied, by smaller x2 coordinate.
Compute the leave-one-out cross validation accuracy of the
3-nearest-neighbor learning algorithm on this dataset.”
0.790
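The three losses in the generated question can be compared on a single example. The sketch below uses the standard textbook definitions, which are an assumption here because the wording in the generated question is loose: labels y are in {+1, −1}, s is a real-valued score for the positive class, and p = σ(s) is the predicted probability of the positive class.

```python
import math


def zero_one_loss(y: int, s: float) -> int:
    """1 if the sign of the score disagrees with the label, else 0."""
    predicted = 1 if s > 0 else -1
    return 0 if predicted == y else 1


def hinge_loss(y: int, s: float) -> float:
    """max(0, 1 - y * s): zero only when the margin is at least 1."""
    return max(0.0, 1.0 - y * s)


def cross_entropy_loss(y: int, s: float) -> float:
    """Negative log-probability of the true label under p = sigmoid(s)."""
    p = 1.0 / (1.0 + math.exp(-s))
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


# The same scores evaluated against a positive label.
for s in (-2.0, 0.5, 2.0):
    print(s, zero_one_loss(1, s), hinge_loss(1, s), cross_entropy_loss(1, s))
```

The 0-1 loss is piecewise constant in s, so it gives no gradient signal, whereas the hinge and cross-entropy losses vary with the score and can be minimized by gradient methods.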
Table 9: Generating new questions: an example of a new, automatically generated question for each topic, and the closest
question in the dataset based on cosine similarity of the question embeddings.
Topic Question Similarity
Neural Networks Generated Question: “Consider a neural network with two hidden layers,
each with two units. The input layer has two units, and the output layer
has one unit. The hidden layers use the sigmoid activation function, and
the output layer uses the linear activation function. The weights from the
input layer to the first hidden layer are w11 = 1, w12 = 1, w21 = 1, and
w22 = 1. The weights from the first hidden layer to the second hidden
layer are w11 = 1, w12 = 1, w21 = 1, and w22 = 1. The weights from
the second hidden layer to the output layer are w11 = 1, w21 = 1. The
bias terms are all zero.
What is the output of the neural network for the input x1 = 1, x2 = 1?
Closest Question: “A neural network is given as Z1 = X ∗ W1,
A1 = f1(Z1), Z2 = W2 ∗ A1, ŷ = f2(Z2). Specifically, the input X
is a 4 × 1 column vector, ŷ is a 1 × 1 scalar. W2 is a 3 × 1 vector. We
also know that Z1 = (W1)^T X and Z2 = (W2)^T A1. What are the
dimensions of Z2?”
0.880
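The forward pass described in the generated question can be traced step by step. This is a minimal sketch, not the dataset's reference solution; it assumes fully connected layers in which weights[i][j] connects input i to unit j, and the helper names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Fully connected layer; weights[i][j] connects input i to unit j."""
    outputs = []
    for j in range(len(biases)):
        z = biases[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        outputs.append(activation(z))
    return outputs


ones_2x2 = [[1.0, 1.0], [1.0, 1.0]]  # all weights equal to 1
ones_2x1 = [[1.0], [1.0]]

x = [1.0, 1.0]
h1 = dense(x, ones_2x2, [0.0, 0.0], sigmoid)      # first hidden layer
h2 = dense(h1, ones_2x2, [0.0, 0.0], sigmoid)     # second hidden layer
y = dense(h2, ones_2x1, [0.0], lambda z: z)[0]    # linear output unit

print(h1, h2, y)
```

Because every weight is 1 and every bias is 0, both units in a layer receive the same pre-activation, so the output is twice the second-layer activation, roughly 1.707.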
CNNs Generated Question: “Suppose we have a 3x3 image and we use a 2x2
filter with stride 1. What are the dimensions of the output image?
Closest Question: “A neural network is given as Z1 = X ∗ W1,
A1 = f1(Z1), Z2 = W2 ∗ A1, ŷ = f2(Z2). There is only one data
point which is: X = [1, 1, 1, 1]^T and y = [1]. If W1 and W2 are both
matrices/vectors of all ones, what is the resulting Loss where the Loss =
(y − ŷ)^2?”
0.895
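The output size asked for in the generated question follows from the standard formula for a convolution, floor((n + 2p − f) / s) + 1 per spatial dimension. The sketch below assumes no padding unless stated; the function name is hypothetical.

```python
def conv_output_size(input_size: int, filter_size: int,
                     stride: int = 1, padding: int = 0) -> int:
    """Output length along one spatial dimension of a convolution."""
    return (input_size + 2 * padding - filter_size) // stride + 1


# 3x3 image, 2x2 filter, stride 1, no padding (assumed).
side = conv_output_size(3, 2, stride=1)
print(f"{side}x{side}")
```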
MDPs Generated Question: “Consider an MDP with four states, called A, B, C,
and D, and with two actions called Move and Stay. The discount factor
γ = 0.9. Here is a reminder of the Q-learning update formula, based on
experience tuple (s, a, r, s′):
Q(s, a) := (1 − α) Q(s, a) + α (r + γ max_{a′} Q(s′, a′))
Let α = 1. Assume we see the following state-action-reward sequence:
A, Move, 0 B, Move, 0 C, Move, 1 A, Move, 0 B, Move, 0 C, Move, 1
With Q-values all starting at 0, we run the Q-learning algorithm on that
state-action sequence. Provide the q-learning value for Q(C, Move).
Closest Question: “Consider an MDP with four states, called
A, B, C, and D, and with two actions called Move and Stay. The
discount factor γ = 0.9. Here is a reminder of the Q-learning update
formula, based on experience tuple (s, a, r, s′):
Q(s, a) := (1 − α) Q(s, a) + α (r + γ max_{a′} Q(s′, a′))
Let α = 1. Assume we see the following state-action-reward sequence:
A, Move, 0 B, Move, 0 C, Move, 1 A, Move, 0 With Q-values all starting
at 0, we run the Q-learning algorithm on that state-action sequence.
Provide the q-learning value for Q(A, move).”
0.988
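The Q-learning update in this row can be replayed on the generated question's sequence. The sketch rests on two assumptions that the flattened text does not confirm: the next state of each step is the state that opens the following step, and the final step is dropped because no next state is listed for it.

```python
from collections import defaultdict

ACTIONS = ("Move", "Stay")


def q_learning(experiences, alpha=1.0, gamma=0.9):
    """Apply the tabular Q-learning update to (s, a, r, s_next) tuples in order."""
    Q = defaultdict(float)  # all Q-values start at 0
    for s, a, r, s_next in experiences:
        best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q


# State, action, reward triples from the generated question.
steps = [("A", "Move", 0), ("B", "Move", 0), ("C", "Move", 1),
         ("A", "Move", 0), ("B", "Move", 0), ("C", "Move", 1)]

# Assumption: the next state is the state of the following step;
# the last step has no listed successor and is left out.
experiences = [(s, a, r, steps[i + 1][0])
               for i, (s, a, r) in enumerate(steps[:-1])]

Q = q_learning(experiences)
print(Q[("A", "Move")], Q[("B", "Move")], Q[("C", "Move")])
```

With α = 1 each update overwrites the old value, so Q(C, Move) becomes 1 on the first pass and the reward then propagates back to Q(B, Move) on the second pass.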
RNNs Generated Question: “Consider the following RNN: s_t =
tanh(w1 x_t + w2 s_{t−1} + b), y_t = w3 s_t + b2. Assume s_0 = 0 and
b2 = 0. What values of w1, w2, w3 and b would generate output
sequence [0, 0, 0, 1, 1, 1, 1] given input sequence [0, 0, 0, 1, 0, 1, 0]?”
Closest Question: “Ronnie makes a simple RNN with state dimension 1
and a step function for f1, so that s_t = step(w1 x_t + w2 s_{t−1} + b), where
step(z) = 1 if z > 0.0 and equals 0 otherwise, and where the output
y_t = s_t.
Assuming s_0 = 1, we want to generate output sequence [0, 0, 0, 1, 1, 1, 1]
given input sequence [0, 0, 0, 1, 0, 1, 0]. Rennie thinks this is not possible
using Ronnie’s architecture. Rennie makes an argument based on the
relationships in the table above. Is Rennie right?”
0.907
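The step-function RNN in the closest question is small enough to simulate directly. The sketch below assumes the step function is 1 for z > 0 and 0 otherwise, and searches a coarse, hypothetical grid of weights; a grid search can illustrate the answer but does not prove it.

```python
from itertools import product


def step(z: float) -> int:
    """Assumed definition: 1 if z > 0, else 0."""
    return 1 if z > 0.0 else 0


def run_rnn(xs, w1, w2, b, s0):
    """Unroll s_t = step(w1 * x_t + w2 * s_{t-1} + b) with output y_t = s_t."""
    s, outputs = s0, []
    for x in xs:
        s = step(w1 * x + w2 * s + b)
        outputs.append(s)
    return outputs


xs = [0, 0, 0, 1, 0, 1, 0]
target = [0, 0, 0, 1, 1, 1, 1]
grid = [i / 2 for i in range(-6, 7)]  # hypothetical grid: -3.0 to 3.0 in steps of 0.5


def search(s0):
    """All grid points (w1, w2, b) that reproduce the target sequence."""
    return [(w1, w2, b) for w1, w2, b in product(grid, repeat=3)
            if run_rnn(xs, w1, w2, b, s0) == target]


print(len(search(1)), len(search(0)))
```

The empty result for s_0 = 1 matches a short argument: the first step needs w2 + b ≤ 0 to output 0 from state 1 with input 0, while the fifth step needs w2 + b > 0 to keep the state at 1 with input 0, and both cannot hold.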
Reinforcement Learning Generated Question: “What is the tabular Q-learning update equation,
based on experience tuple (s, a, r, s′
)?
Closest Question: “b is the bias for classifier network. What
are dimensions of b for multi k-class classification?
0.796
Clustering Generated Question: “Suppose that we have a dataset with n data points,
k clusters, and d features. After running the k-means algorithm, the
within-cluster sum of squared errors (WCSS) is given by:
(1/n) Σ_{i=1}^{n} ‖x_i − µ_{y_i}‖²
where y_i is the cluster label of the ith data point, and µ_{y_i} is the cluster
center associated with the ith data point. The within-cluster sum of
squared errors (WCSS) is a measure of how well the clusters fit the data.
Suppose that we have two datasets, X1 and X2, where X1 has n1 data
points and X2 has n2 data points. We run the k-means algorithm on
both datasets. We find that the WCSS for X1 is smaller than the WCSS
for X2. Does this imply that the clusters for X1 are better than the
clusters for X2? Why or why not?
Closest Question: “Consider the following 2D dataset in (x,y) format:
((1,-1), +1), ((1,1), +1), ((1,2.5), +1), ((2,-2), -1), ((2,1), +1), ((2,3), +1),
((5,-1), -1), ((5,-2), -1). We will construct a tree using a greedy algorithm
that recursively minimizes weighted average entropy. Recall that the
weighted average entropy of a split into subsets A and B is:
(fraction of points in A) · H(R^A_{j,s}) + (fraction of points in B) · H(R^B_{j,s}),
where the entropy H(R_m) of data in a region R_m is given by
H(R_m) = −Σ_k P̂_mk log2 P̂_mk. The P̂_mk is the empirical probability,
which is in this case the fraction of items in region m that are of class
k. Some facts that might be useful to you: H(0) = 0, H(3/5) = 0.97,
H(3/8) = 0.95, H(3/4) = 0.81, H(5/6) = 0.65, H(1) = 0. Draw the
decision tree that would be constructed by our tree algorithm for this
dataset. Clearly label the test in each node, which case (yes or no) each
branch corresponds to, and the prediction that will be made at each leaf.
Assume there is no pruning and that the algorithm runs until each leaf
has only members of a single class.
0.767
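The quantity in the generated clustering question depends on the scale of the data, which a small example makes concrete. The sketch below uses the normalized form from the question, (1/n) Σ ‖x_i − µ_{y_i}‖²; the toy points, labels, and centers are hypothetical.

```python
def wcss(points, labels, centers):
    """Mean squared distance from each point to its assigned cluster center."""
    total = 0.0
    for x, y in zip(points, labels):
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, centers[y]))
    return total / len(points)


# Hypothetical toy data: two tight clusters.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
centers = {0: (0.0, 1.0), 1: (10.0, 1.0)}
base = wcss(points, labels, centers)

# The same data and clustering, with every coordinate multiplied by 10.
scale = 10.0
scaled_points = [tuple(scale * v for v in p) for p in points]
scaled_centers = {k: tuple(scale * v for v in c) for k, c in centers.items()}
scaled = wcss(scaled_points, labels, scaled_centers)

print(base, scaled)
```

The clustering is identical in both cases, yet the second value is 100 times larger, so a smaller WCSS on one dataset does not by itself indicate a better clustering than on another dataset.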
Decision Trees Generated Question: “The Gini score is a measure of how often
a randomly chosen element would be incorrectly labeled if it was
randomly labeled according to the distribution of labels in the subset. It
ranges from 0 to 1, with 0 meaning that there are no mislabeled elements
and 1 meaning that the subset is perfectly mixed. Explain whether or not
it would be a good idea to use the Gini score as a scoring function for
pruning decision trees.
Closest Question: “There are different strategies for pruning decision
trees. We assume that we grow a decision tree until there is one or
a small number of elements in each leaf. Then, we prune by deleting
individual leaves of the tree until the score of the tree starts to get worse.
The question is how to score each possible pruning of the tree. Here is a
definition of the score: The score is the percentage correct of the tree on
a separate validation set. Explain whether or not it would be a good idea
and give a reason why or why not.
0.867
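The Gini score mentioned in the generated question can be computed from label counts. This sketch uses the standard Gini impurity, 1 − Σ_k p_k², which is an assumption about what the generated question intends; under this definition the maximum for K classes is 1 − 1/K, so the value stays below 1.

```python
from collections import Counter


def gini_impurity(labels) -> float:
    """Standard Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


print(gini_impurity(["a", "a", "a"]))   # pure subset
print(gini_impurity([0, 1]))            # evenly mixed, two classes
print(gini_impurity([0, 1, 2, 3]))      # evenly mixed, four classes
```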