Programming by examples (PBE) is a new frontier in AI that enables users to create scripts from input-output examples. PBE can provide a 10-100x productivity increase for developers in some task domains. 99% of computer users are non-programmers, and PBE can enable them to create small scripts that automate repetitive tasks. PBE is revolutionizing data wrangling: data scientists spend up to 80% of their time transforming data into a form suitable for machine learning (ML). PBE enables automation of many data manipulation tasks, such as string transformations (e.g., converting “FirstName LastName” to “LastName, FirstName”), column splitting, field extraction from log files and web pages, and normalization of semi-structured spreadsheets into structured tables. Such PBE capabilities have been released inside multiple Microsoft products, including Excel, PowerShell, OMS, and Azure ML Workbench. The synthesized scripts are quite performant, and Azure ML Workbench even enables their execution on large datasets using the Spark runtime.
Another killer application of PBE is repetitive code transformation, such as formatting or refactoring, given that developers spend up to 40% of their time refactoring code in an application migration scenario. A key technical challenge in PBE is to search for programs in an underlying domain-specific language that are consistent with the user-provided examples. Our real-time search methodology leverages logical reasoning techniques and neural-guided heuristics.
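To make the search problem concrete, here is a minimal sketch of a brute-force search over a toy DSL for the “FirstName LastName” to “LastName, FirstName” task. The DSL, the constant pool, and the names `run` and `synthesize` are assumptions made up for illustration; this is not the PROSE API, and it uses plain enumeration rather than the logical reasoning and neural-guided heuristics mentioned above.

```python
from itertools import product

# Hypothetical toy DSL: a program is a tuple of atoms, where each atom is
#   ("tok", i)   -> the i-th whitespace-separated token of the input, or
#   ("const", s) -> the literal string s.
CONSTANTS = [", ", " ", "-"]  # assumed pool of literal strings


def run(program, text):
    """Evaluate a program on an input string; None if a token is missing."""
    tokens = text.split()
    parts = []
    for kind, arg in program:
        if kind == "tok":
            if arg >= len(tokens):
                return None  # program is undefined on this input
            parts.append(tokens[arg])
        else:
            parts.append(arg)
    return "".join(parts)


def synthesize(examples, max_atoms=3, max_tokens=3):
    """Enumerate every program up to max_atoms that matches all examples."""
    atoms = [("tok", i) for i in range(max_tokens)]
    atoms += [("const", c) for c in CONSTANTS]
    consistent = []
    for size in range(1, max_atoms + 1):
        for program in product(atoms, repeat=size):
            if all(run(program, inp) == out for inp, out in examples):
                consistent.append(program)
    return consistent


examples = [("Ada Lovelace", "Lovelace, Ada")]
programs = synthesize(examples)
print(programs)
print(run(programs[0], "Alan Turing"))
```

Enumeration like this grows exponentially with program size, which is one reason a practical system would need smarter search.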
Another challenge is to resolve the ambiguity in examples, since many programs can satisfy a few examples. Our ML-based ranking techniques often select an intended program from among the many that satisfy the examples. We also leverage active-learning-based user interaction models that facilitate a bot-like conversation with the user. The Microsoft PROSE SDK exposes these generic search and ranking algorithms (for non-commercial use), allowing advanced developers to construct PBE capabilities for new task domains.
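The ambiguity problem can be illustrated with a small sketch. The candidate programs, the hand-written `PRIOR` scores (a stand-in for a learned ranker), and the function names below are all hypothetical; the point is only to show how several programs can agree on one example, how a ranker orders them, and how a distinguishing input can be chosen to ask the user about.

```python
# Hypothetical candidate programs, all consistent with "Ada Lovelace" -> "Ada".
CANDIDATES = {
    "constant 'Ada'": lambda s: "Ada",
    "first 3 characters": lambda s: s[:3],
    "first token": lambda s: s.split()[0],
}

# Hand-written scores standing in for a learned ranking model (assumption).
PRIOR = {"constant 'Ada'": 0.1, "first 3 characters": 0.4, "first token": 0.9}


def consistent(examples):
    """Names of the candidates that reproduce every example."""
    return [name for name, f in CANDIDATES.items()
            if all(f(inp) == out for inp, out in examples)]


def rank(names):
    """Order candidates from most to least plausible under the prior."""
    return sorted(names, key=lambda n: PRIOR[n], reverse=True)


def distinguishing_input(names, test_inputs):
    """First test input on which the surviving programs disagree, if any."""
    for text in test_inputs:
        outputs = {CANDIDATES[n](text) for n in names}
        if len(outputs) > 1:
            return text
    return None


examples = [("Ada Lovelace", "Ada")]
survivors = rank(consistent(examples))
question = distinguishing_input(survivors, ["Bob Kahn", "Grace Hopper"])
print(survivors)   # three programs still fit the single example
print(question)    # an input worth asking the user about
print(consistent(examples + [("Grace Hopper", "Grace")]))
```

One extra example is enough here to rule out everything except the token-based program, which is the kind of interaction an active-learning loop aims for.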
This presentation will educate the audience about this new PBE-based programming paradigm: its applications, its form factors inside different products, and the science behind it.
5. Data Science Class Assignment
“FlashExtract: A Framework for data extraction by examples”
[PLDI 2014] Vu Le, Sumit Gulwani
#Res8SAIS
6. PBE Architecture
[Figure: architecture diagram. Labels: Examples, DSL D, Search Engine, Program set, Program Ranker, Ranked Program set, Test inputs, Disambiguator, More Examples, Intended Program in D]
“Programming by Examples: PL meets ML”
[APLAS 2017] Sumit Gulwani, Prateek Jain
Search
• Logical Deduction: [OOPSLA ’15] FlashMeta: A framework for inductive program synthesis
• Machine Learning: [ICLR ’18] Neural-guided deductive search for real-time program synthesis from examples
Ranking
• Program Features: [CAV ’15] Predicting a correct program in programming by example
• Output Features: [IJCAI ’17] Learning to learn programs from examples: going beyond program structure
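As a loose illustration of the logical deduction idea listed under Search, the sketch below works backwards from the desired output: it splits the output string into pieces and asks which leaf programs could produce each piece. The toy DSL (token references and constants), the function names, and the "fewest constant characters" preference are assumptions for illustration only; the cited frameworks are far more general.

```python
def atoms_for(inp, piece):
    """Leaf programs that produce `piece` on `inp`: token refs or a constant."""
    found = [("tok", i) for i, tok in enumerate(inp.split()) if tok == piece]
    found.append(("const", piece))
    return found


def deduce(inp, out, max_parts=3):
    """All ways to build `out` as a concatenation of at most max_parts atoms.

    The output spec is decomposed top-down: pick a prefix, explain it with a
    leaf program, then recursively explain the remaining suffix.
    """
    if max_parts == 0 or out == "":
        return []
    programs = [(atom,) for atom in atoms_for(inp, out)]
    for cut in range(1, len(out)):
        for head in atoms_for(inp, out[:cut]):
            for rest in deduce(inp, out[cut:], max_parts - 1):
                programs.append((head,) + rest)
    return programs


def const_chars(program):
    """Crude preference: fewer hard-coded characters is assumed to be better."""
    return sum(len(arg) for kind, arg in program if kind == "const")


candidates = deduce("Ada Lovelace", "Lovelace, Ada")
best = min(candidates, key=const_chars)
print(len(candidates), best)
```

Because every candidate is derived from the output itself, all of them are consistent with the example by construction; the ranking step then decides which one to keep.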
7. New Frontiers
Predictive Synthesis
Synthesis of intended programs from just the input.
• Tabular data extraction, Sort, Join
Synthesis of readable/modifiable code
Synthesis in target language of choice.
• Scala, R, PySpark
Code-first experience in existing workflows.
• IDE, Notebook
“Automated Data Extraction using Predictive Program Synthesis”
[AAAI 2017] Mohammad Raza, Sumit Gulwani
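A minimal sketch of what "synthesis from just the input" can look like for tabular data: with no output examples at all, guess the delimiter that splits every line into the same number of columns. The heuristic, the candidate delimiter list, and the function names are assumptions for illustration and are not the method of the cited paper.

```python
def infer_delimiter(lines, candidates=(",", "\t", ";", "|")):
    """Pick the delimiter giving a consistent column count greater than one."""
    best, best_cols = None, 1
    for delim in candidates:
        counts = {len(line.split(delim)) for line in lines}
        if len(counts) == 1:          # every line has the same column count
            cols = counts.pop()
            if cols > best_cols:      # prefer the split with more columns
                best, best_cols = delim, cols
    return best


def extract_table(text):
    """Split text into rows and columns using only the input itself."""
    lines = [line for line in text.splitlines() if line.strip()]
    delim = infer_delimiter(lines)
    if delim is None:
        return [[line] for line in lines]  # fall back to one column per row
    return [line.split(delim) for line in lines]


text = "id;name;city\n1;Ada;London\n2;Alan;Wilmslow\n"
table = extract_table(text)
print(table)
```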
8. Code Transformations by Examples
• Code refactoring consumes up to 40% of developers' time in migration.
– Old version to new version
– On-prem to cloud
– One framework to another
• Custom formatting
• Performance enhancements
• Repetitive bug fixes
– Feedback generation for programming education
“Learning syntactic program transformations from examples”
[ICSE 2017] Reudismam Rolim, Gustavo Soares, et al.
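A minimal sketch of learning a code transformation from a single before/after example, assuming the two snippets have the same syntactic shape and differ only in identifiers. It uses Python's standard `ast` module (and `ast.unparse`, which needs Python 3.9 or later); the function and class names are hypothetical, and this is much simpler than the technique in the cited paper.

```python
import ast


def learn_renames(before, after):
    """Walk two same-shaped ASTs in parallel and collect identifier changes."""
    renames = {}
    pairs = zip(ast.walk(ast.parse(before)), ast.walk(ast.parse(after)))
    for old, new in pairs:
        if isinstance(old, ast.Attribute) and isinstance(new, ast.Attribute):
            if old.attr != new.attr:
                renames[old.attr] = new.attr
        elif isinstance(old, ast.Name) and isinstance(new, ast.Name):
            if old.id != new.id:
                renames[old.id] = new.id
    return renames


class Rename(ast.NodeTransformer):
    """Apply a learned identifier mapping to names and attribute accesses."""

    def __init__(self, renames):
        self.renames = renames

    def visit_Name(self, node):
        node.id = self.renames.get(node.id, node.id)
        return node

    def visit_Attribute(self, node):
        self.generic_visit(node)  # also rewrite the object being accessed
        node.attr = self.renames.get(node.attr, node.attr)
        return node


def apply_renames(source, renames):
    tree = Rename(renames).visit(ast.parse(source))
    return ast.unparse(tree)  # requires Python 3.9+


# One example edit: a deprecated method call replaced by its successor.
rule = learn_renames("logger.warn(msg)", "logger.warning(msg)")
print(rule)
print(apply_renames("log.warn(text)", rule))
```

The learned rule generalizes across receivers and arguments because it is expressed on the syntax tree rather than on the raw text.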
9. Conclusion
Programming by examples is a new frontier in AI.
• 10-100x productivity increase in some domains.
– Data Wrangling: Data scientists spend up to 80% of their time on it.
– Code Refactoring: Developers spend up to 40% of their time on it in migration.
• 99% of end users are non-programmers.
Next-generation AI techniques under the hood
• Logical Reasoning + Machine Learning
The Future: Multi-modal programming with Examples and NL
Questions/Feedback: Contact me at sumitg@microsoft.com
Microsoft PROSE (PROgram Synthesis by Examples) Framework
Available for non-commercial use: https://microsoft.github.io/prose/