Bhautik Mavani is a 3rd year undergraduate student studying computer science and engineering in India. He has experience developing websites and applications using various front-end and back-end technologies. For his proposed GSoC project, he plans to improve natural language parsing capabilities for SymPy Gamma by implementing tokenization, tagging, stemming/lemmatization using spaCy NLP. He will parse input statements into valid SymPy expressions, add symbol keyboards for input, improve plotting, and address other issues like pickling in SymPy Live. He provides a detailed timeline spreading the work over 3 months.
SymPy Live and SymPy Gamma natural language queries
SymPy Live and SymPy Gamma (on Google App Engine)
About me
Contact Information
Name : Bhautik Mavani
University : DA-IICT, India
Email : mavanibhautik@gmail.com
Github : mvnnn
Time-zone : UTC + 5:30
Personal Background
I am a third-year undergraduate student pursuing a B.Tech. in Computer Science and
Engineering at DA-IICT, India. I like to develop scalable websites and web applications.
I am good at mathematics and algorithms, and I have completed one research internship in NLP.
I enjoy working with new technology and am familiar with data analysis and statistics.
Programming details
Platform details
I use Linux Mint 17.3 "Rosa" on my primary work machine and Atom as my primary text
editor, because it is a very promising new editor that provides a lot of new functionality.
Sometimes I also use Sublime Text 3. I am very familiar with Git and GitHub.
Programming experience
I have 3 years of programming experience in web development. I completed a 3-month
internship as a web developer last summer. During the winter vacation I
interned as a game developer for 1 month.
Over the course of my summer internship I developed more than 6 web
applications and two websites using HTML, CSS, JavaScript (ES5 and ES6), AngularJS,
jQuery, BackboneJS, React, Bootstrap, Materialize and Material-UI on the front end, and
Django and NodeJS on the back end, with MongoDB and MySQL as databases. I am proficient in
programming in Python and Java. I am very familiar with Google App Engine.
Contribution to Sympy
I was introduced to SymPy in mid-January 2016. Since then I have been consistently
contributing to and learning from the amazing community. Here is a list of all my merged,
unmerged and rejected pull requests in chronological order.
● (Merged) [PR 10615] : The Euler line property was missing from Triangle, so I added
it.
● (Merged) [PR 10543] : Added _eval_is_finite to zeta.
● (Open) [PR 10611] : Fixed a TypeError raised when the match result is None in manual
integration.
● (Merged) [PR 10581] : I worked with **Christopher Smith** to fix this issue,
which closed [PR 10569]. We added a TypeError for Boolean addition and
multiplication.
● (Merged) [PR 10664] : Fixed a `printing` mistake in `poly`.
● (Open) [PR 10687] : In this PR I worked with **Christopher Smith**; see [PR
10644].
● (Merged) [PR 10461] : An assumptions issue. My PR was merged in [PR 10629].
● (Open) [PR 10624] : Added a `ValueError` in `Euler` for negative numbers.
● (Merged) [PR 10635] : Solved a Boolean `Eq` evaluation problem.
● (Merged) [PR 10574] : Added an AttributeError in vector subs. In this PR I worked
with **Christopher Smith** to fix this issue, which closed [PR 10528].
● (Closed) [PR 10604] : Fixed an assumptions bug in ask, but I made the change in the
wrong section.
● (Open) [PR 10525] : Modified the equalLengths function.
● (Merged) [PR 10659] : Fixed a Boolean rowsList or colsList output issue in
`matrix`.
Contribution to sympy_doc
I like to solve web-related issues. Here is another list of all my merged, unmerged and
rejected pull requests.
● (Open) [PR 19] : Reads the `releases.txt` file from the SymPy
server and adds all versions to `sphinxsidebarwrapper`. It also adds a `popup` menu
listing all versions.
● (Open) [PR 18] : Adds a `latest` docs link to the header of all old versions.
The Project
Overview
The website is the major face of the project. It is the knowledge base where people
learn about the project, find resources, and consider joining the growing community of
developers and contributing back. It helps reach a wider audience, and a well-designed
website attracts more users, which increases the chance of receiving feedback and
patches for the project.
SymPy already has SymPy Gamma, which is somewhat similar to Wolfram Alpha. My aim is
to bring more of Wolfram Alpha's functionality to SymPy Gamma. The most important goals
are to add natural language query parsing to SymPy Gamma and to improve its existing
parsing.
Why this project?
I am familiar with Natural Language Processing, and when I saw this project it got me
hooked. This project will give me a much bigger learning experience than anything
else I have worked on in the past.
I would like to build scalable websites and web applications. I have done one research
internship in NLP under a professor at my college, and I have been working with NLTK
for the last year. I have also completed a Data Analysis and Statistics course on
Coursera. In my view spaCy is a better fit than NLTK because it provides a lot of
functionality compared to other NLP libraries. For this project the choice of NLP
library is not critical, because we do not need to parse large amounts of data; we only
need to parse one-line statements. What matters most is how the NLP is implemented.
What have other people done on this idea?
Currently, SymPy parses ordinary input expressions using regular
expressions. In my opinion that will not scale very well to more complex
equations.
SymPy has parsing methods that recognize a word such as ‘gamma’ and interpret its
usage. This is very helpful, as it lets users type technical words like ‘gamma’ and
‘pi’ naturally instead of looking up their character codes.
sympify() converts unknown variables into Symbols and normalizes the input, e.g.
‘3.00’ into ‘Float(3)’ and ‘x!’ into ‘factorial(x)’. These correction methods are very
useful: if I type "lim(tan(x), x, pi/2)", it is automatically interpreted as
"limit(tan(x), x, pi/2)".
SymPy Gamma also has a login feature. If the user is logged in, it records and
displays the user's search queries.
Implementation details
SymPy Gamma :
1) I will implement natural language query support using spaCy.
How to proceed with NLP:
First, we need word tokenization to split the sentence into words. Then we
use a tagger to identify nouns, verbs and adjectives, and after this step the stop
words are removed. The output then goes through stemming and
lemmatization to reduce inflectional forms, and possibly derivationally related
forms, of a word to a common base form. After this I will employ WordNet.
Once all of the above steps are done, I will apply regular expressions to convert the
result into standard mathematical form.
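The first steps above (tokenization, stop-word removal and reduction to a base form) could be sketched in plain Python as follows. This is only an illustration under stated assumptions: the `STOP_WORDS` and `LEMMAS` tables are hypothetical and hand-written, standing in for what an NLP library such as spaCy would provide, and the tagging and WordNet steps are left out.

```python
import re

# Hypothetical, hand-written tables for this sketch only. A real
# implementation would take stop words and lemmas from an NLP library.
STOP_WORDS = {"the", "of", "a", "an", "as"}
LEMMAS = {"approaches": "approach", "squared": "square", "integrals": "integral"}


def tokenize(sentence):
    """Split a sentence into lowercase word and number tokens."""
    return re.findall(r"[a-z]+|\d+", sentence.lower())


def normalize(sentence):
    """Tokenize, drop stop words, then map each word to its base form."""
    tokens = [token for token in tokenize(sentence) if token not in STOP_WORDS]
    return [LEMMAS.get(token, token) for token in tokens]


print(normalize("Limit x squared with respect to x as x approaches infinity"))
```

Note that "with" and "to" are deliberately not treated as stop words here, because the phrase "with respect to" carries the meaning that the later steps rely on.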
Before applying NLP in SymPy Gamma we need to decide on the logic for how to
proceed. I give two examples below that explain how NLP would work for SymPy Gamma.
I will try to cover all the kinds of input expressions shown in the SymPy Gamma
examples:
I ) Limit x squared with respect to x as x approaches infinity
I will build a dictionary of SymPy function names. First I will apply word
tokenization to the input statement. Then I will scan from the left-hand side to find
the SymPy function name that matches the “Limit” keyword, look for
variables after the function name keyword, and deduce that the adjacent
word x is a variable. After this we need to find the dependencies, apply regular
expressions to them, and finally convert the sentence into limit(x**2, x, oo).
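A minimal sketch of example I, assuming the sentence always follows the fixed pattern "Limit <expression> with respect to <variable> as <variable> approaches <point>". The `POWER_WORDS` and `CONSTANT_WORDS` tables and the `parse_limit` helper are hypothetical names introduced for illustration; the function only builds the call as a string and does not evaluate it.

```python
import re

# Hypothetical lookup tables for this sketch; they are not part of SymPy.
POWER_WORDS = {"squared": "2", "cubed": "3"}
CONSTANT_WORDS = {"infinity": "oo"}

LIMIT_PATTERN = re.compile(
    r"^\s*limit\s+(?P<expr>.+?)\s+with\s+respect\s+to\s+(?P<var>\w+)"
    r"\s+as\s+(?P=var)\s+approaches\s+(?P<point>\S+)\s*$",
    flags=re.IGNORECASE,
)


def parse_limit(sentence):
    """Convert a limit sentence into a SymPy-style call string."""
    match = LIMIT_PATTERN.match(sentence)
    if match is None:
        raise ValueError("not a recognised limit statement: %r" % sentence)
    expr = match.group("expr")
    # Rewrite spoken powers such as 'x squared' as 'x**2'.
    for word, exponent in POWER_WORDS.items():
        expr = re.sub(rf"(\w+)\s+{word}\b", rf"\1**{exponent}", expr,
                      flags=re.IGNORECASE)
    # Replace spoken constants such as 'infinity' with their SymPy names.
    point = match.group("point")
    point = CONSTANT_WORDS.get(point.lower(), point)
    return f"limit({expr}, {match.group('var')}, {point})"


print(parse_limit("Limit x squared with respect to x as x approaches infinity"))
# prints: limit(x**2, x, oo)
```

The resulting string could then be handed to SymPy's own parser. A single fixed pattern like this covers only one phrasing, which is why the proposal combines it with tokenization and a function name dictionary.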
II ) The integral of xy with respect to x and y
First I will apply word tokenization to the input statement and then scan
from the left-hand side to find the SymPy function name that matches the “integral”
keyword.
With this done, I will find the variables “x” and “y” and store both in an array.
After this we need to find the dependencies: in this case the expression may depend
on both variables, on only one of them, or the variables may be constants. I will
apply regular expressions to these and finally convert the sentence into
integrate(x*y, x, y). I plan to follow the SRO rule of English grammar to split the
input sentence, e.g.
|<------- S ------->||<------ R ------>||<-- O -->|
The integral of xy with respect to x and y
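The S/R/O split shown above could be prototyped as below. This is a sketch under assumptions: the helper names (`split_sro`, `implicit_product`, `parse_integral`) are hypothetical, the relation is assumed to be the literal phrase "with respect to", and a run of letters such as "xy" is assumed to mean the product of single-letter variables.

```python
import re


def split_sro(sentence):
    """Split a sentence into (S, R, O) around 'with respect to'."""
    parts = re.split(r"\s+(with\s+respect\s+to)\s+", sentence.strip(),
                     maxsplit=1, flags=re.IGNORECASE)
    if len(parts) != 3:
        raise ValueError("no 'with respect to' relation found")
    return tuple(parts)


def implicit_product(term):
    """Assume a run of letters such as 'xy' means the product x*y."""
    return "*".join(term) if term.isalpha() else term


def parse_integral(sentence):
    """Convert an integral sentence into a SymPy-style call string."""
    subject, _relation, obj = split_sro(sentence)
    match = re.match(r"(?:the\s+)?integral\s+of\s+(.+)$", subject,
                     flags=re.IGNORECASE)
    if match is None:
        raise ValueError("no 'integral' keyword found")
    integrand = implicit_product(match.group(1).strip())
    # Store every variable named in the object part in a list.
    variables = [v for v in re.split(r"\s*,\s*|\s+and\s+", obj) if v]
    return "integrate({})".format(", ".join([integrand] + variables))


print(parse_integral("The integral of xy with respect to x and y"))
# prints: integrate(x*y, x, y)
```

The letter-splitting assumption would wrongly break up names such as "sin", so a real implementation would need to check the function name dictionary first.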
In both examples above I assume that the function's argument appears to the right of
the SymPy function name keyword. I will apply the same logic if the argument appears
to the left of the keyword: I first check the right side of the keyword, and if I do
not find the argument there, I check the left side.
If an input statement contains no SymPy function name keyword, the case is very
complicated. I will also implement functionality to handle simple input statements
without a SymPy function name keyword, but very hard statements are out of scope,
because 3 months is not enough time to write that much NLP.
2) SymPy Gamma can parse some types of input (e.g. ‘Sin[x]^2’ into
‘sin(x)**2’, ‘x + sin x’ into ‘x + sin(x)’), but it cannot parse others
(e.g. for ‘sinx’ the expected output is ‘sin(x)’). If I type "sin(x*(x + 1)", it
automatically guesses that I meant "sin(x*(x + 1))". I will improve this parsing
using regular expressions and a syntax tree. For this I first build a list of
precedence levels and then a list of operators recording whether each one is left-,
right- or non-associative.
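The two inputs mentioned above (‘sinx’ and the unbalanced "sin(x*(x + 1)") could be handled by small pre-processing helpers like the ones below. This is a sketch, not SymPy Gamma's existing code: `KNOWN_FUNCTIONS` is a hypothetical subset of names, and both helper functions are introduced here for illustration. The precedence and associativity tables are not covered.

```python
import re

# Hypothetical subset of the function names the parser would know about.
KNOWN_FUNCTIONS = ("sin", "cos", "tan", "sinh", "cosh", "tanh", "log", "exp")


def add_function_parentheses(text):
    """Rewrite 'sinx' as 'sin(x)' when a known name is glued to one letter."""
    def fix(match):
        word = match.group(0)
        name, arg = word[:-1], word[-1]
        # Leave complete names such as 'cosh' alone; only split 'cos' + 'x'.
        if word not in KNOWN_FUNCTIONS and name in KNOWN_FUNCTIONS:
            return f"{name}({arg})"
        return word
    return re.sub(r"[A-Za-z]+", fix, text)


def balance_parentheses(text):
    """Append the closing parentheses that the input is missing."""
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            if depth == 0:
                raise ValueError("unmatched closing parenthesis")
            depth -= 1
    return text + ")" * depth


print(add_function_parentheses("x + sinx"))   # prints: x + sin(x)
print(balance_parentheses("sin(x*(x + 1)"))   # prints: sin(x*(x + 1))
```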
3) Wolfram Alpha has a symbol keyboard so that users can build expressions easily.
That is very useful, especially on mobile. I will add all the common
symbols (" ∫ " = integrate, " ∑ " = sum, etc.). Users can enter those symbols using the
keyboard and I will parse them into SymPy code. I will do that using MathQuill to
get the LaTeX code corresponding to the user's keyboard input, and then use
latex2sympy to convert the LaTeX code into a SymPy expression. For example, if the
user clicks the “ ∫ ” button, MathQuill generates the output “int”, and
latex2sympy then gives us “integrate” as output.
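The chain described above (button, then LaTeX command, then SymPy name) can be pictured with two lookup tables. This sketch does not call MathQuill or latex2sympy; both tables and the `button_to_sympy_name` helper are hypothetical, and the exact LaTeX strings that MathQuill emits are an assumption. The ∑ button is mapped to `summation`, which is the name of SymPy's summation function.

```python
# Hypothetical tables for illustration only: the real project would get the
# LaTeX from MathQuill and the conversion from latex2sympy.
BUTTON_TO_LATEX = {
    "∫": r"\int",
    "∑": r"\sum",
    "√": r"\sqrt",
    "π": r"\pi",
}

LATEX_TO_SYMPY = {
    r"\int": "integrate",
    r"\sum": "summation",
    r"\sqrt": "sqrt",
    r"\pi": "pi",
}


def button_to_sympy_name(symbol):
    """Follow the chain: keyboard button -> LaTeX command -> SymPy name."""
    latex_command = BUTTON_TO_LATEX[symbol]
    return LATEX_TO_SYMPY[latex_command]


print(button_to_sympy_name("∫"))  # prints: integrate
```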
4) Currently, SymPy Gamma plots static graphs. I will add functionality so that the
user can change variable values, etc., and the plot is redrawn according to the changed
values. I will replace the current plot with an IPython Notebook to display the results
on SymPy Gamma. Using IPython widgets we can change the value of a variable, the
initial constant values, etc. I will also add a download button so that users can
download the plot.
5) I will add a support button like the one themes.getbootstrap has. If the user clicks
this button, we display a small conversation window where users can discuss their
doubts easily. If the user is logged in, we can send replies to their email address.
6) Improve the website design using Bootstrap or Materialize CSS. Website design is
very important; the Wolfram Alpha website design, for instance, is much better than
SymPy Gamma's.
Sympy Live :
7) SymPy Live has bugs with pickling. I would like to use pickle together with pandas,
which really helps with speed when we have big data.
Timeline:
I am quite sure that I will be able to give 45-55 hours a week to this project. The
project is very big and I need around 4 months to complete the entire thing, so I plan
on starting as soon as possible. My college vacation starts on 5th May, so I will start
working on the project from 10th May. Classes at my university start on 15th August,
but that won't be an issue; I will continue working at the same pace.
● 22 Apr - 22 May: Community Bonding Period. Discuss the project further
with mentors and get to know the community.
● 23 May - 29 May:
Goal : Tokenization, Tagger
This week I plan to implement word tokenization and then use a tagger to identify
nouns, verbs and adjectives.
● 30 May - 5 June:
Goal : Remove stop words, stemming and lemmatization
I will remove stop words and then apply stemming and lemmatization to reduce
inflectional forms, and sometimes derivationally related forms, of a word to a
common base form.
● 6 June - 12 June:
Goal : WordNet
WordNet is a lexical database of English words. It supports a lot of
operations on words, such as similarity measures, parent finding, etc.
● 13 June - 19 June:
Goal : Regular expressions
I will apply regular expressions to convert the output into standard mathematical form.
E.g. SymPy expects input of the form limit(1/x, x, oo) to evaluate a limit, so I need
to convert the input into this form.
● 20 June - 26 June : Mid-term evaluations
Goal : Complete the NLP implementation for SymPy Gamma.
This week I will improve my NLP code and add NLP examples to SymPy
Gamma.
● 27 June - 3 July :
Goal : Improve regular expression implementation[2]
Currently, SymPy Gamma evaluates only simple input and fails on more
complicated input, e.g. limit(tan(x), x, pi/2) + integrate(x, x).
● 4 July - 10 July :
Goal : Improve NLP parsing
SymPy Gamma cannot give the right answer for formal power series, etc., so I will
improve this. I will also add support for input statements that contain more than one
SymPy function, e.g. "Limit x squared with respect to x as x
approaches infinity add with integral of x with respect to x". We need to parse this
statement into limit(x^2, x, oo) + integrate(x, x).
● 11 July - 17 July :
Goal : Add symbol keyboard[3]
I will add a LaTeX parser which directly generates a SymPy object, as explained
above.
● 18 July - 24 July :
Goal : Improve plotting[4]
I will replace the plot with an IPython Notebook.
● 25 July - 31 July :
Goal : Improve website design[5], [6]
I will complete the plot replacement and improve the web design.
● 1 August - 7 August :
Goal : Solve pickling issue[7]
I will finish improving the web design and solve the pickling issue.
● 8 August - 14 August:
Goal : Fix my open PRs.
I will try to get all my open PRs merged. If they are not merged during the GSoC
period, I will continue working on them after GSoC.
● 15 August - 23 August: Final week
Clean up the code and document it properly.
If I complete the above functionality before the final evaluation
I would like to add the following functionality to SymPy Gamma:
1. We can install any module in a Docker container. SymPy Gamma currently runs on
GAE, and there is too much dependency on Django and Google App Engine.
We can install a build tool for React such as webpack or gulp on Docker, so all cards
could be distilled into a single ReactJS component. We can run all SymPy
versions on Docker and give users a dropdown menu to let them decide
which version they want to use. We can easily bundle up all the dependencies
into a Docker image of the SymPy Gamma project, which can easily be
shipped to other operating systems.
2. I intend to add autocompletion (for words like Integrate, Limit, etc. and for
brackets). As the user types, the application will show a dropdown
similar to Google search. It will show only SymPy function names (e.g.
Limit, integration, etc.) and provide automatic bracket closing: when the user opens a
bracket in the input, I will show a closing bracket after the input. Here we need
to handle some special cases using regular expressions (e.g. if the user inputs “(lim”,
we cannot complete it as “(lim)”). We should provide a
checkbox so users can choose whether they want to use autocompletion or not.
3. I will add shortcuts for math symbols which are frequently used in SymPy. I will
use the “s” key in combination with another key (e.g. s+i = integration, s+l
= Limit, etc.).
4. I will add a favourites option similar to what Wolfram Alpha has, i.e. users can see
their favourite links in the sidebar.
5. I would like to add customization to SymPy Live: a custom options
menu with a theme option, a tab spacing option, etc.
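The autocompletion idea in item 2 could be prototyped roughly as follows. This is a sketch under assumptions: `FUNCTION_NAMES` is a hypothetical short list, and the rule used for the “(lim” case (do not offer a closing bracket while the last word is still an incomplete function name) is one possible reading of that special case, not a settled design.

```python
import re

# Hypothetical short list; the real list would come from SymPy itself.
FUNCTION_NAMES = ["diff", "integrate", "limit", "series", "solve", "summation"]
PAIRS = {"(": ")", "[": "]", "{": "}"}


def complete(prefix):
    """Return the known function names that start with the typed prefix."""
    prefix = prefix.lower()
    if not prefix:
        return []
    return [name for name in FUNCTION_NAMES if name.startswith(prefix)]


def closing_brackets(text):
    """Return the brackets needed to close everything still open in text."""
    stack = []
    for char in text:
        if char in PAIRS:
            stack.append(PAIRS[char])
        elif stack and char == stack[-1]:
            stack.pop()
    return "".join(reversed(stack))


def suggest_closing(text):
    """Suggest closing brackets unless a function name is still being typed."""
    trailing = re.search(r"[A-Za-z]+$", text)
    if trailing:
        word = trailing.group(0).lower()
        # '(lim' is an unfinished 'limit', so showing '(lim)' would be wrong.
        if any(name != word for name in complete(word)):
            return ""
    return closing_brackets(text)


print(complete("li"))                   # prints: ['limit']
print(suggest_closing("limit(sin(x"))   # prints: ))
print(repr(suggest_closing("(lim")))    # prints: ''
```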
Post GSoC
I will continue contributing to SymPy after GSoC. Though I am not very proficient in
English (I plan on improving that), my ultimate desire is to become a lifelong
contributor to SymPy. To achieve that, I will actively participate in the community and
try to fix the bugs that can be solved with my level of knowledge and expertise.
References
● NTCIR-11/perm-en-MATH: very helpful for parsing natural language queries into
mathematical expressions
● nlp-panini: a book that covers most NLP concepts
● nlp.stanford.edu: very helpful for NLP implementation
● wordnetweb.princeton.edu: very helpful for finding relations between keywords
● Jurafsky and Martin, Speech and Language Processing
● My discussion about the project: https://groups.google.com/forum/#!topic/sympy/4mkRkk18dQs
● Intro to NLP with spaCy
● https://github.com/sympy/sympy/wiki/parsing
● https://groups.google.com/forum/#!topic/sympy/rGQ8L5Z26Y0
● Guide for writing RegExr: http://regexr.com/
● spaCy.io
● MathQuill Documentation
● latex2sympy GitHub
● UI improvements: https://github.com/sympy/sympy_gamma/issues/29
● IPython widgets: https://github.com/ipython/ipywidgets