Rsqrd AI: ML Tooling at an AI-first Startup

  1. Emad Elwany - CTO, Lexion. Evolution of ML Infrastructure at an AI-First Startup. Rsqrd AI Meetup - May 2020
  2. Agenda ● Lexion Overview ● Document Understanding Pipeline ● Evolution of ML Infrastructure at Lexion ● Deep Dive - Model Versioning
  3. Lexion: Applying NLP to legal agreements. Creating this simple report could take weeks without automation.
  4. It’s a complex NLP problem ● Messy PDFs make OCR non-trivial ● Long, multi-agreement documents ● Domain specific language ● Complex schemas/ontologies ● Mix of non/semi/fully structured data
  5. Sample: Identify Contract Term Contract term is AUTO RENEW if, e.g.: “will automatically renew for three year terms” “shall continue on a month to month basis until terminated” Contract term is FIXED if, e.g.: “terminate effective April 1, 2007.” “will continue until the 1 year anniversary”
  6. Document Understanding Pipeline. [Diagram, roughly: Input → OCR → Text + Layout → Entities / Classes / Relations → BL → Structured Data → Output. Many, many models!] Key Takeaway: Every node in this graph is a “model” (one of hundreds), and the remainder of this talk applies to each and every one of them.
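To make the takeaway concrete, here is a minimal, hypothetical sketch (not Lexion's actual code) of such a pipeline as a graph of independently versioned model nodes; the stage names and the `Node`/`run_pipeline` helpers are illustrative assumptions.

```python
# Hypothetical sketch: a document-understanding pipeline as a DAG of model nodes.
# Every node wraps one "model" (OCR, entities, classes, relations, ...),
# which is why each node needs its own versioning story.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str                    # e.g. "ocr", "entities", "classes"
    version: str                 # each node is versioned independently
    inputs: List[str]            # names of upstream nodes this node consumes
    fn: Callable[[dict], dict]   # the actual model/featurizer call


def run_pipeline(nodes: List[Node], document: dict) -> Dict[str, dict]:
    """Run nodes in order, feeding each node the outputs of its inputs."""
    outputs: Dict[str, dict] = {"input": document}
    for node in nodes:                        # assumes nodes are topologically sorted
        upstream = {k: outputs[k] for k in node.inputs}
        outputs[node.name] = node.fn(upstream)
    return outputs


# Usage (illustrative stubs): OCR feeds text/layout into entity and class models,
# which feed a node that emits structured data.
pipeline = [
    Node("ocr", "1.3.0", ["input"], lambda u: {"text": "...", "layout": "..."}),
    Node("entities", "2.0.1", ["ocr"], lambda u: {"entities": []}),
    Node("classes", "1.1.0", ["ocr"], lambda u: {"doc_class": "NDA"}),
    Node("structured", "0.9.2", ["entities", "classes"], lambda u: {"record": u}),
]
result = run_pipeline(pipeline, {"pdf_bytes": b"..."})
```

Because every node carries its own version, rolling one model forward or back does not require touching the rest of the graph.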
  7. Initial Goals (Pre-MVP) ● Evaluate technical feasibility: Can we build it? ● Evaluate business viability: Will they find it useful? ● Move very quickly: Can we ship it before we run out of money? Use tools that are easy to ● Understand ● Setup ● Deploy
  8. Steady state Goals (Post-MVP) ● Scale model development ● Scale model deployment ● Keep users happy at all times Use tools that are easy to ● Integrate ● Configure ● Scale
  9. Typical model lifecycle (Data → Training → Packaging → Validation → Deployment → Monitoring), drawing on experience with ML in research, applications, and platforms:
  10. Data. EARLY: ● Finding the data: scrapers/FOIA ● Cleaning the data: scripting + rules ● Annotating the data: simple annotation tools. LATER: ● Managing the data: data stores and caches ● Protecting the data: encryption and access control ● Scaling annotation: weakly supervised / unsupervised methods
  11. Training. EARLY: optimize for speed of results (Jupyter, scripts). Goal: does it work? LATER: optimize for speed of experimentation (frameworks and metrics). Goal: make it the best!
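As one concrete illustration of the "later" stage (frameworks and metrics), here is a minimal sketch of a home-grown experiment logger; `log_run` and the file layout are assumptions for illustration, not a tool the talk names.

```python
# Hypothetical sketch: record every training run's config and metrics so
# experiments can be compared later ("optimize for speed of experimentation").
import json
import time
import uuid
from pathlib import Path


def log_run(params: dict, metrics: dict, run_dir: str = "runs") -> str:
    """Write one immutable JSON record per training run."""
    run_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
    record = {"run_id": run_id, "params": params, "metrics": metrics}
    out = Path(run_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{run_id}.json").write_text(json.dumps(record, indent=2))
    return run_id


# Usage after a training loop (values are illustrative):
log_run(
    params={"model": "term-classifier", "lr": 3e-4, "epochs": 10},
    metrics={"precision": 0.91, "recall": 0.88, "f1": 0.895},
)
```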
  12. Packaging. EARLY: optimize for shipping the models: REST endpoint (online), batch script (offline). LATER: optimize for operationalizing the model: versioning of artefacts, dependency management, cost management. More on this a bit later...
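A hedged sketch of what "versioning of artefacts" and "dependency management" at packaging time can look like; `package_model` and the file layout are illustrative assumptions, not Lexion's packaging format.

```python
# Hypothetical sketch: package a trained model as a self-describing, versioned
# artefact (weights + config + pinned dependencies) ready for online or batch serving.
import json
import subprocess
import sys
from pathlib import Path


def package_model(weights_path: str, config: dict, version: str,
                  out_dir: str = "artifacts") -> Path:
    target = Path(out_dir) / f"model-{version}"
    target.mkdir(parents=True, exist_ok=True)

    # Copy the serialized weights alongside the model config.
    target.joinpath("weights.bin").write_bytes(Path(weights_path).read_bytes())
    target.joinpath("config.json").write_text(json.dumps(config, indent=2))

    # Freeze the exact library versions present at training time (dependency management).
    freeze = subprocess.run([sys.executable, "-m", "pip", "freeze"],
                            capture_output=True, text=True, check=True).stdout
    target.joinpath("requirements.lock").write_text(freeze)
    target.joinpath("VERSION").write_text(version)
    return target
```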
  13. Validate Model. EARLY: ● Does it work well enough? Simple high-level metrics (F1, P, R, etc.). LATER: ● Is it better? ● Why is it better? ● How is it better? Much more rigor: ● Validation sets ● E2E tests ● More detailed metrics
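For the "later" stage, a small sketch of what more detailed metrics can look like, using scikit-learn (an assumed dependency) on the contract-term labels from the earlier sample; the data below is illustrative.

```python
# Hypothetical sketch: go beyond a single F1 number to per-class metrics on a
# frozen validation set, so "is it better / why is it better?" has an answer.
from sklearn.metrics import classification_report, confusion_matrix

# Labels for the contract-term example: AUTO_RENEW vs FIXED (illustrative data).
y_true = ["AUTO_RENEW", "FIXED", "FIXED", "AUTO_RENEW", "FIXED"]
y_pred = ["AUTO_RENEW", "FIXED", "AUTO_RENEW", "AUTO_RENEW", "FIXED"]

# Per-class precision/recall/F1 plus support, instead of one aggregate score.
print(classification_report(y_true, y_pred, zero_division=0))
print(confusion_matrix(y_true, y_pred, labels=["AUTO_RENEW", "FIXED"]))
```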
  14. Deployment. EARLY: optimize for speed of deployment. LATER: optimize for scale of deployment: ● Inference time ● Priority vs. starvation ● Rapid update deployment
  15. Monitor. EARLY: the bare minimum to ensure things are working: ● High-level E2E alert. LATER: invest in monitoring all aspects of the models: ● Detailed KPIs ● Model drift ● User DSAT. Logging, dashboards, alerts.
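One possible shape for a model-drift check feeding those logs, dashboards and alerts; the threshold and the use of mean prediction confidence are illustrative simplifications, not the monitoring the talk describes.

```python
# Hypothetical sketch: a coarse model-drift check comparing the current window's
# prediction confidences against a reference window, alerting on a large shift.
import statistics


def drift_alert(reference_scores, current_scores, threshold: float = 0.1) -> bool:
    """Return True if mean prediction confidence moved more than `threshold`."""
    ref_mean = statistics.fmean(reference_scores)
    cur_mean = statistics.fmean(current_scores)
    drifted = abs(ref_mean - cur_mean) > threshold
    if drifted:
        # In a real system this would page / post to a dashboard instead of printing.
        print(f"ALERT: confidence drifted from {ref_mean:.3f} to {cur_mean:.3f}")
    return drifted


# Usage with illustrative score samples:
drift_alert([0.92, 0.88, 0.95, 0.90], [0.74, 0.70, 0.77, 0.72])
```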
  16. Deep Dive: Model Versioning
  17. Real-life problems ● “We used to predict the right X on this document - when/why did it break?” ○ Usually accompanied by an alert or, even worse, a user complaint. ● “The model we trained 2 months ago was so much better at Y - we can’t seem to get the same performance. How do we roll back?” ○ Usually accompanied by a frustrated product manager / quality engineer. ● “I swear I got better results over the weekend for the same experiment, I don’t know what changed!” ○ Usually accompanied by a confused data scientist. But first: can you reproduce your model results to the 10th decimal place? If not, STOP!
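The closing question is about determinism. Assuming a PyTorch training stack (the deck mentions PyTorch later), a minimal sketch of pinning the usual sources of randomness looks like this; the exact settings vary by PyTorch/CUDA version.

```python
# Hypothetical sketch: pin every source of randomness so two runs of the same
# training code have a chance of matching "to the 10th decimal place".
import os
import random

import numpy as np
import torch


def make_deterministic(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Force deterministic kernels; some ops will raise if no deterministic
    # implementation exists, which is usually what you want here.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.benchmark = False
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required on recent CUDA
```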
  18. Wait… didn’t we solve this problem a long time ago? Source control has been used for decades. How is this different? Versioning ML models shares a lot with code versioning, e.g.: Code (*), Config, Library dependencies. But it also includes a lot more: Topology, Training Data, Training Parameters, Model State (weights, hyperparameters), Hardware. (*) Code is a lot of things in the context of ML models: data prep, libraries, models, featurizers, etc.
  19. What exactly is Versioning for ML models? L1: Production/Staging slots. Allows very short-term rollback/rollforward. L2: Reproducing Inference. Once you have a trained model, this kind of versioning allows you to deterministically reconstruct a model for inference. Allows pinning models for a long time as well as long-term rollback/rollforward. L3: Reproducing Training. You can, at any point in time, re-train a model that yields the exact same model you had previously trained. This is a much stronger kind of versioning; it enables reproducibility as well as dealing with issues such as training data corruption.
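A toy sketch of how L1 and L2 differ in practice: L1 is a mutable pointer per slot, L2 is an immutable, content-addressed store. The in-memory dicts below stand in for whatever registry and blob storage are actually used.

```python
# Hypothetical sketch of L1 vs L2 versioning:
#   L1 - mutable slot pointers (production/staging) for fast rollback/rollforward;
#   L2 - immutable, content-addressed artefacts so any past model can be
#        reconstructed for inference.
import hashlib
from typing import Dict

artifacts: Dict[str, bytes] = {}   # L2 store: content hash -> serialized model
slots: Dict[str, str] = {}         # L1 store: slot name -> content hash


def publish(model_bytes: bytes) -> str:
    digest = hashlib.sha256(model_bytes).hexdigest()
    artifacts.setdefault(digest, model_bytes)   # immutable: never overwritten
    return digest


def promote(slot: str, digest: str) -> None:
    slots[slot] = digest                        # rollback = re-point the slot


version = publish(b"...serialized model...")
promote("staging", version)
promote("production", version)   # roll forward; rolling back is re-pointing to an older digest
```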
  20. Artefacts that need to be versioned, with simple examples (each matters for reproducing inference and/or training): ● Model Hyperparameters: size of layer N ● Featurizer Code: input feature vector size ● Featurizer Data: vocab ● Model Code: NN architecture ● Model Config: remove stop words? ● Model State: model weights ● Library Dependencies: PyTorch version ● Hardware: V100 ● Training Config: early stopping criteria ● Training Data: data + labels
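One way to tie this list together: treat a "model version" as a fingerprint over references to every artefact above, not just the weights. The manifest keys and hash scheme below are illustrative assumptions.

```python
# Hypothetical sketch: a model version is a hash over *all* versioned artefacts.
import hashlib
import json


def fingerprint(manifest: dict) -> str:
    """Stable hash over every versioned artefact reference."""
    canonical = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]


manifest = {
    "model_hyperparameters": {"layer_n_size": 256},
    "featurizer_code": "featurizer==2.1.0",
    "featurizer_data": "vocab@sha256:ab12...",
    "model_code": "term_classifier==3.0.0",
    "model_config": {"remove_stop_words": True},
    "model_state": "weights@sha256:cd34...",
    "library_dependencies": {"torch": "1.13.1"},
    "hardware": "V100",
    "training_config": {"early_stopping_patience": 3},
    "training_data": "dataset@sha256:ef56...",
}
print(fingerprint(manifest))   # changes if *any* artefact changes
```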
  21. Remember this pipeline? [Same diagram: Input → OCR → Text + Layout → Entities / Classes / Relations → BL → Structured Data → Output. Many, many models!] You need to version the aforementioned artefacts for every single node in this graph. That’s a lot of things to version!
  22. Some solutions (that don’t work) ● Let’s snapshot everything in a Docker image and store it forever > How do you hotfix the model? ● Let’s mark a “stable” production model and not deploy any future “staging” versions till they have been tested enough. > How do you make “breaking” changes to the code? ● Let’s always support only “latest” version and never commit a new version until we’re sure it’s good. > How do you iterate quickly?
  23. We evaluated some existing solutions. It’s always better not to reinvent the wheel.
  24. It’s a lot of work to move infrastructure. The question is when, not if. Early-stage startups need to ship and sell their product; it’s hard to justify infrastructure plumbing until the flywheel turns. Instead of a full solution, these investments have paid off: 1. Versioning all model state during packaging 2. Versioning all data artefacts in our data store and making them immutable 3. Versioning all code explicitly by keeping stable interfaces and supporting minor/major version upgrades to model/featurizer code (sketched below) 4. Pinning major versions of stable dependencies. Remember: we are building a whole user-facing application on top of this, so prioritizing when to invest here is critical.
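As a concrete illustration of point 3 (stable interfaces with explicit major/minor versions), here is a hedged sketch; the `Featurizer` protocol and the version check are assumptions for illustration, not Lexion's actual interfaces.

```python
# Hypothetical sketch: keep model/featurizer interfaces stable and carry an
# explicit MAJOR.MINOR version, so callers can pin a major version while
# minor upgrades stay backwards compatible.
from typing import List, Protocol


class Featurizer(Protocol):
    """Stable interface: any version must keep this contract."""
    version: str

    def featurize(self, text: str) -> List[float]: ...


class BagOfWordsFeaturizer:
    version = "2.3.0"   # MAJOR.MINOR.PATCH: breaking changes bump MAJOR

    def __init__(self, vocab: List[str]):
        self.vocab = vocab

    def featurize(self, text: str) -> List[float]:
        tokens = text.lower().split()
        return [float(tokens.count(w)) for w in self.vocab]


def check_compatibility(featurizer: Featurizer, pinned_major: int) -> None:
    major = int(featurizer.version.split(".")[0])
    if major != pinned_major:
        raise RuntimeError(f"Expected featurizer major v{pinned_major}, got {featurizer.version}")


check_compatibility(BagOfWordsFeaturizer(vocab=["renew", "terminate"]), pinned_major=2)
```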
  25. BTW, all this ML is in addition to… ● Permissions ● Email alerts ● SSO ● End-user annotations ● Custom reporting ● Full text search ● Task management ● Custom fields ● Doc schemas ● APIs ● Integrations ● Bulk export ● Dashboards ● Pretty charts ● Bulk ingestion ● Security ● Audit trail … building a complete user-facing application!
  26. A note on ML technical debt ● Identify when the cost of carrying the debt > the cost of addressing it ● Incorporate the cost of ML infrastructure in your business model ● Pick the right kind of technical debt, with a plan to get out of it ● Model versioning is one of the areas you might want to invest in early ● Getting a great model is just the first step of a long journey. You have to build a product customers love!
  27. Questions? Learn more at https://lexion.ai (we’re hiring!)