You have built a successful proof of concept (POC) using Pandas for data manipulation and analysis. How do you take it to production? Learn how to turn your POC into a scalable product with MongoDB, the pitfalls and drawbacks of Pandas, and the benefits of adopting MongoDB in the early stages.
MongoDB World 2019: MongoDB in Data Science: How to Build a Scalable Product Using MongoDB
1. MongoDB in Data Science
How to convert a Pandas proof of concept into a scalable product, and
why MongoDB is the key to success!
2. Who I am
Software Engineer
Compiler Engineer
Compiler Engineer
LLVM contributor
Software Engineer
R/D
Lead ML Engineer
Backend
Infrastructure
Sr. ML Engineer
3. What will we learn?
● Understand existing tools for delivering Data Science projects and when to use them.
● Why MongoDB could be crucial for your product and business
● How to easily productionize a Pandas Proof-of-Concept
● How to use MongoDB while being open to other technologies.
8. What is Pandas?
The most popular Python library for data manipulation and data wrangling in the data
science community.
Source: numpy.org, scipy.org, matplotlib.org, scikit-learn.org, pandas.pydata.org
16. Drawbacks of Pandas
● Has no persistence layer
● Has no database-style primary or secondary indexes
○ As a result, filtering on a column scans every row, so selective queries are inefficient
● Most operations run on a single thread
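As a sketch of how the first two drawbacks can be addressed, the rows of a DataFrame can be written to MongoDB as documents, and the field the product queries most can get a secondary index. The helper names `dataframe_to_documents` and `persist_dataframe` are hypothetical, and `collection` is assumed to be a PyMongo `Collection`; only `DataFrame.to_dict`, `insert_many` and `create_index` are real library calls.

```python
import pandas as pd


def dataframe_to_documents(df: pd.DataFrame) -> list[dict]:
    """Convert each DataFrame row into a dict: one MongoDB document per row."""
    # orient="records" yields [{column: value, ...}, ...], the shape insert_many expects.
    # The DataFrame index is dropped; call reset_index() first if it carries meaning.
    return df.to_dict(orient="records")


def persist_dataframe(df: pd.DataFrame, collection, index_field: str) -> None:
    """Write the rows to a (hypothetical) PyMongo collection and index one field."""
    docs = dataframe_to_documents(df)
    if docs:  # insert_many rejects an empty list
        collection.insert_many(docs)
    # Creating an index that already exists with the same spec is a no-op,
    # so this call is safe to repeat on every run.
    collection.create_index(index_field)
```

Once the data lives in MongoDB, a lookup such as `collection.find_one({"user_id": 42})` can use the index instead of scanning every row.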
41. Batch Job versus Real Time Service
Real Time Service
● Pros: On demand (scales as needed)
● Cons: Harder to develop and maintain
Batch Job
● Pros: Easier to develop and maintain
● Cons: Constantly utilizing resources
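The trade-off above is largely about when the model runs. In this minimal sketch, which assumes a hypothetical `model` function and rows keyed by a hypothetical `user_id` field, the batch job calls the model once per row on every run, while the real-time path calls it once per incoming request.

```python
from typing import Callable, Dict, Iterable

# Hypothetical model interface: any function mapping a feature row to a score.
Model = Callable[[dict], float]


def batch_job(rows: Iterable[dict], model: Model) -> Dict[int, float]:
    """Batch job: score every row on each run, whether or not it is requested."""
    return {row["user_id"]: model(row) for row in rows}


def real_time_predict(row: dict, model: Model) -> float:
    """Real-time service: score one row only when a request for it arrives."""
    return model(row)
```

The batch version is simpler to operate but pays for every row each time it runs; the real-time version only pays for what is requested, at the cost of running as an always-available service.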
42. Benefits of MongoDB
● Schema-less (flexible document model)
● Horizontally scalable (sharding)
● Available as a managed service (PaaS) from many vendors
● Has a huge community
● Easier to hire people with MongoDB experience
43. Summary
● Makes it possible to provide a real-time experience
● Could help save expensive computational resources
● Supports both real-time and batch inference
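One way a single MongoDB collection can back both batch and real-time inference is a read-through prediction store: a batch job upserts precomputed scores, and the real-time path serves a stored score or computes and saves it on a miss, so no prediction is computed twice. This is a sketch under assumptions: `collection` is a PyMongo `Collection`, predictions are keyed by `_id`, and `store_prediction`, `get_prediction` and `compute` are hypothetical names. `update_one(..., upsert=True)` and `find_one` are real PyMongo methods.

```python
from typing import Any, Callable


def store_prediction(collection, user_id: Any, score: float) -> None:
    """Batch path: write (or overwrite) one precomputed prediction."""
    # upsert=True inserts the document if no prediction exists for this _id yet,
    # so rerunning the batch job updates predictions instead of duplicating them.
    collection.update_one({"_id": user_id}, {"$set": {"score": score}}, upsert=True)


def get_prediction(collection, user_id: Any, compute: Callable[[Any], float]) -> float:
    """Real-time path: serve a stored prediction, computing it only on a miss."""
    doc = collection.find_one({"_id": user_id})
    if doc is not None:
        return doc["score"]
    # Cache miss: run the (possibly expensive) model once and store the result.
    score = compute(user_id)
    store_prediction(collection, user_id, score)
    return score
```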