NeuroDB ENGR 245 2021 Lessons Learned


  1. NeuroDB. Team: Tony Wang, Kun Guo, Andrew Freiman, Daniel Kharitonov. Roles: Picker (Stanford CS Ph.D.); Hacker (Stanford MS&E MS); Hustler (Stanford MBA); Designer (Stanford MS&E Ph.D., MS CS, database/ML). Advisor: Todd Basche. Mentor: Maria Popo. 101 interviews. Where we started: a Tableau-like tool for unstructured data. Where we are now: a cloud-based Pandas dataframe.
  2. NeuroDB
  3. NeuroDB: The business that we thought we were building... CTOs and engineering managers looking to cut deep learning model development time and cost; Organic through open-source adoption; B2B Sales to monetize our API; Need: interact with large unstructured datasets (text, images and video) quickly and without supplementary models; Web dashboard/playground to demo functionalities; Free Python package for developers; Paid API with model hosting; Community Members / Evangelists; ML Engineers and data scientists/analysts looking for ways to work faster; Open-source adopters, developers, data scientists, ML engineers; Engineering software / deployment infrastructure; Developer community who build applets and add functionality; Machine Learning Champions in the Enterprise; Data science and business intelligence teams that need user-friendly tools; Loved and supported by the open-source community; Preferred by enterprise data analytics/science groups as the go-to solution; Product: data query engine that executes state-of-the-art models under the hood; R&D; Community; Evangelism; Model Serving; Problem: long turnaround time to build deep learning models with unstructured data; Paid API; Faster and more accurate served models; Premium Web Dashboard with paid API key; Business analysts interacting with large unstructured datasets. Timeline: Introduction | Week 1 | Week 2 | Week 3 | Week 4 | Week 5 | Week 6 | Week 7 | Week 8 | Week 9 | The Future
  4. NeuroDB: Our emotional roller coaster ride... E - enterprises care more about structured data; “Tableau” with deep learning; C - interest from a subset of consultants; open-source virtual data warehouse, pharma and GIS beachhead; C - limited market size for consultants; C - demoing consumer MVP; E - enterprise unstructured data MVP; Product 1 = Cloud Pandas!; build out open-source offering, community and customer base!; too many products, no market fit; intelligent data lake management, 3 ideas to 1; E - enterprise offering?
  5. NeuroDB (Week 2): Minimum Viable Product. Intuitive web UI: Search | Chat | Data Tables. Wide support of unstructured data: PDF | Text | Images | Twitter. Operations: Semantic search | Categorization | Feature extraction | SQL query.
  6. NeuroDB (Week 4): Experiment 1: Demoing NeuroDB with customers. We talked to 15+ consultants and analysts, and 3-4 expressed interest… but 0 had a need for the tool within the next two weeks, and 0 followed up about trying the product. Negative signal. In hindsight, we should have noticed this WAY earlier. We weren’t listening...
  7. NeuroDB (Week 4): Reflection: What could we have done differently? How many consultants do we need to talk to? How many analysts need to express interest? Some teammates believed (1) the sample size was too small, and (2) the potentially small market size was offset by the potentially high willingness to pay (WTP). We should have (1) defined what product-market fit looked like earlier, and (2) defined our goals as a team.
  8. NeuroDB (Week 2): At the same time, we started exploring the idea of an enterprise product. “The biggest problem for us right now is to get insights in real time.” (Director of data science at a major satellite radio company). Is our deep learning expertise well positioned to solve a real problem facing enterprises?
  9. NeuroDB (Week 4): Experiment 2: Enterprise MVP. We talked to ~10 directors of data science at large enterprises in telecom, airlines, and communications. No real interest! They had very little unstructured data, and what unstructured data they had was very specific and served by vertical-centric solutions. Negative signal.
  10. NeuroDB (Week 4): Experiment 3: Asking about ETL. Insight: instead of transforming unstructured data, why don’t we transform structured data? Result: no clear advantage in a crowded market!
  11. NeuroDB (Week 5): Any problems with existing tools? (1) Data integrity? (2) Messy data? (3) Having to keep rewriting scripts? Not really. And if there is a problem, 10 other people are already working on it. It was also a poor fit for our expertise.
  12. NeuroDB (Week 5): We lacked clear direction, felt like we weren’t making progress, and hit a low point...
  13. NeuroDB: Our emotional roller coaster ride... (repeat of slide 4)
  14. NeuroDB (Week 5): In class, we got BURNED... and we deserved it.
  15. NeuroDB (Week 5): We had some emergency calls: “You need to choose a product! We don’t care which one, but pick one...”
  16. NeuroDB (Week 5): We had some fierce discussions. Our advisor Todd Basche and mentor Maria Popo shared their thoughts… teammates had conflicting opinions... We created an analytical framework, which defused tensions...
  17. NeuroDB (Week 5): We analyzed our options... and MADE A DECISION!
  18. NeuroDB (Week 6): Focus: Intelligent Data Lake Management. Diagram labels: Data Lake; KPI 1; Churn; CEO’s 5pm musings; Materialize; NeuroDB Virtual Data Warehouse; NeuroDB Data Science IDE.
  19. NeuroDB (Week 6): “I want it”, but… “We can’t be your first customer”; “Do you have open-source software that we can try first?”; “This is more of a strategic thing, we need to have a high-level strategic discussion about data, then we will decide what we will do”; “The PoC process will probably take a few months, we have to go through IT security to be cleared, etc.” TL;DR: What we are doing is too ambitious.
  20. NeuroDB (Week 7): We need to start somewhere: open source; functionally different from, or dramatically better than, competitors; small and medium-sized enterprises/startups, or small autonomous teams in large companies. Where should we start? Data catalog? Data linkage? Data analytics? (our pick)
  21. NeuroDB (Week 8): We found a signal in the area of large-scale cloud analytics, where we feel competition is limited... Back to machine learning/deep learning, except we don’t do the machine learning/deep learning! We provide the infrastructure to efficiently run deep learning models and other user-defined functions at large scale, in a cloud-native manner, through an interactive virtual dataframe. Performance target: > 5x over Spark. People have given us concrete things they need to see for a PoC! Why dig for AI gold when we can sell the shovels?
  22. NeuroDB: Here are our customers! VP of AI Software: criteria for a pilot are native support for images and seeing that it works at large scale. VP of Claim Analytics: criteria for a pilot are that it runs on Azure and can integrate with Databricks. Chief Architect of the Enterprise Data Office: wants to see the open-source product.
  23. NeuroDB: We’re starting to build an open-source MVP! Spring 2021: Lean LaunchPad. Summer 2021: analysis of technical feasibility (Can we be 10x better? How long will it take?). Fall 2021: incorporate. 2021-2022: PhD open-source development.
  24. NeuroDB: Here is our business model! VP Data Science/Head of Data Analytics looking to cut cloud cost and dev time; Organic through open-source adoption; B2B Sales to monetize our API; Need: allow users to perform scalable Pandas-like operations natively on the cloud; Free Python package for developers; Cloud Service for enterprise solution; Community Members / Evangelists; ML Engineers and data scientists/analysts looking for ways to work faster and help open-source development; Open-source adopters, developers, data scientists, ML engineers; Engineering software / deployment infrastructure; Developer community who build applets and add functionality; Data Science Champions in the Enterprise; Business analysts who can quickly answer questions using NeuroDB and cheer for it; Loved and supported by the open-source community; Preferred by enterprise data analytics/science groups as the go-to solution; Product: an open-source offering to allow cloud Pandas; R&D; Community; Evangelism; Managed Service; Cloud Cost; Problem: Pandas is useful but not cloud native; Cloud Service: storage of data and deep learning models; Cloud Service: computation of queries, charged by time.
  25. We plan to make NeuroDB a reality. We want to hear from you! tony@neurodb.io
  26. Appendix
  27. NeuroDB Position Statement: For data scientists, NeuroDB is an intelligent data lake query tool that empowers end users to harness data sources directly. Unlike our competitors, we treat the data lake as a first-class object.
  28. NeuroDB Value Proposition: Efficiently interact with large volumes (> 1 TB) of images and other unstructured data types in cloud storage through a local, Pandas-like dataframe interface, without worrying about provisioning clusters or managing compute resources. - Run deep learning pipelines on millions of images interactively - Save on data movement costs from S3 - Save time on managing AWS - Easily deploy batch analytics workflows on streaming data
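The idea of a local, Pandas-like interface over data that stays in cloud storage can be illustrated with a lazy dataframe: operations are recorded and only run when results are requested. This is a minimal, stdlib-only sketch under stated assumptions; `VirtualFrame`, `assign`, `filter`, `collect`, and the `list_images` source are hypothetical names rather than NeuroDB's API, and a real system would dispatch the recorded operations to remote workers instead of running them in-process.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
Op = Callable[[Iterable[Row]], Iterable[Row]]


@dataclass
class VirtualFrame:
    """Hypothetical lazy dataframe: records operations, runs them only on collect()."""
    source: Callable[[], Iterable[Row]]      # e.g. a lister over cloud-storage objects
    ops: List[Op] = field(default_factory=list)

    def _with(self, op: Op) -> "VirtualFrame":
        # Return a new frame so earlier frames stay reusable (immutable-style chaining).
        return VirtualFrame(self.source, self.ops + [op])

    def assign(self, name: str, fn: Callable[[Row], Any]) -> "VirtualFrame":
        # Add a derived column computed per row by a user-defined function.
        return self._with(lambda rows: ({**r, name: fn(r)} for r in rows))

    def filter(self, pred: Callable[[Row], bool]) -> "VirtualFrame":
        # Keep only rows for which the predicate is true.
        return self._with(lambda rows: (r for r in rows if pred(r)))

    def collect(self) -> List[Row]:
        # Only now is the source read and the recorded operations applied.
        rows: Iterable[Row] = self.source()
        for op in self.ops:
            rows = op(rows)
        return list(rows)


source_calls = []


def list_images() -> List[Row]:
    # Hypothetical stand-in for listing objects in a bucket; keys and sizes are made up.
    source_calls.append(1)
    return [{"key": f"img_{i:03d}.png", "size_kb": 100 + 50 * i} for i in range(5)]


frame = (VirtualFrame(list_images)
         .assign("is_large", lambda r: r["size_kb"] > 200)
         .filter(lambda r: r["is_large"]))
# Nothing has been read yet; the source runs when collect() is called.
large_keys = [r["key"] for r in frame.collect()]
```

Building the query returns immediately; the cost is paid once, at `collect()`, which is the point where a distributed backend could plan and parallelize the whole chain.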
  29. NeuroDB Serverless Pandas: High Content Biology. Example dataframe (image thumbnails not recoverable):
        ID | Images  | Cell Line | Treatment
        1  | (image) | HeLa      | Dox
        2  | (image) | HeLa      | Control
        3  | (image) | MCF7      | Dox
        4  | (image) | MCF7      | Control
      Diagram: Pandas DataFrame abstraction in a Jupyter notebook; optimization and dispatch.
  30. NeuroDB Serverless Pandas: GIS. Example dataframe with a virtual schema:
        ID | Coordinate* (virtual schema) | 2005-01-01 | … | 2020-12-31 | Metadata ...
        1  | Zone #, Corners E/N          |            |   |            | Satellite, Sensor, Processing, CRS
        2
        3
        4
      *UTM (WGS84, NAD83, PSD93, etc.)
      Diagram: Pandas DataFrame abstraction in a Jupyter notebook; optimization and dispatch.
  31. NeuroDB: Define workflows with user-defined functions. Diagram: a table with columns 1-6; UDF1 maps columns 1 and 2 to column 5; UDF2 combines column 5 with column 3 into an intermediate result; UDF3 combines the intermediate result with column 4 to produce column 6.
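One way to execute a workflow shaped like the one on this slide is to treat each UDF output as a node in a dependency graph and evaluate the nodes in topological order. This minimal sketch assumes Python 3.9+ (for `graphlib`); the column names `c1` to `c6`, the `run_workflow` helper, and the arithmetic UDFs are hypothetical stand-ins for real models, not NeuroDB's implementation.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Tuple

# Hypothetical step format: output column -> (udf, input columns).
Step = Tuple[Callable[..., Any], Tuple[str, ...]]


def run_workflow(row: Dict[str, Any], steps: Dict[str, Step]) -> Dict[str, Any]:
    """Evaluate UDF steps in dependency order, adding each output to the row."""
    # Each output depends on its input columns; base columns become leaf nodes.
    graph = {out: set(inputs) for out, (_, inputs) in steps.items()}
    result = dict(row)
    # static_order() yields predecessors first and raises CycleError on cycles.
    for node in TopologicalSorter(graph).static_order():
        if node in steps:  # base columns have no step to run
            udf, inputs = steps[node]
            result[node] = udf(*(result[c] for c in inputs))
    return result


# Mirrors the slide: UDF1(1, 2) -> 5, UDF2(5, 3) -> intermediate, UDF3(intermediate, 4) -> 6.
steps: Dict[str, Step] = {
    "c5": (lambda a, b: a + b, ("c1", "c2")),
    "intermediate": (lambda x, y: x * y, ("c5", "c3")),
    "c6": (lambda t, d: t - d, ("intermediate", "c4")),
}
out = run_workflow({"c1": 1, "c2": 2, "c3": 3, "c4": 4}, steps)
```

Because the order is derived from declared inputs, the steps can be listed in any order, and independent branches of the graph are natural candidates for parallel execution.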
  32. NeuroDB: Deploy to streaming data. Diagram: a workflow (columns 1 and 2 → UDF1 → column 5) is stored in a Workflow Registry alongside Workflow 1, Workflow 2, and Workflow 3, and applied to incoming records with columns 1-5.
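Reusing a registered workflow on both a batch and a stream can be sketched with a registry that stores plain functions and applies them eagerly to a list or lazily to a generator. `WorkflowRegistry`, `run_batch`, `run_stream`, and the `incoming()` source are hypothetical names; a production system would consume from a real message queue rather than a Python generator.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Workflow = Callable[[Row], Row]


class WorkflowRegistry:
    """Hypothetical registry: workflows are stored by name and reused unchanged."""

    def __init__(self) -> None:
        self._workflows: Dict[str, Workflow] = {}

    def register(self, name: str, wf: Workflow) -> None:
        self._workflows[name] = wf

    def run_batch(self, name: str, rows: Iterable[Row]) -> List[Row]:
        # Eager: process a finite dataset all at once.
        wf = self._workflows[name]
        return [wf(r) for r in rows]

    def run_stream(self, name: str, rows: Iterable[Row]) -> Iterator[Row]:
        # Lazy: the same function is applied to each record as it arrives.
        wf = self._workflows[name]
        for r in rows:
            yield wf(r)


registry = WorkflowRegistry()
registry.register("workflow_1", lambda r: {**r, "c5": r["c1"] + r["c2"]})

batch_out = registry.run_batch("workflow_1", [{"c1": 1, "c2": 2}, {"c1": 10, "c2": 20}])


def incoming() -> Iterator[Row]:
    # Hypothetical stand-in for a message-queue consumer.
    for i in range(3):
        yield {"c1": i, "c2": i}


stream_out = list(registry.run_stream("workflow_1", incoming()))
```

The workflow function itself never changes between the two modes; only the driver around it does, which is one way to read "without changing a single line of code".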
  33. NeuroDB Key Optimizations: Batch. Under the hood, NeuroDB: - uses serverless computing to allow near-infinite scaling at a fixed compute cost, executing deep learning pipelines on millions of rows in real time; - optimizes batch size and the workflow DAG, and lowers cost by selecting the best hardware (CPU/GPU, core count, instance type, etc.) for the job; - caches data transparently or automatically to speed up future pipelines; - deep learning: accelerates models with quantization or pruning options to take advantage of the latest hardware advancements.
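Transparent caching of pipeline results is often done by keying on both the function and a hash of its input, so that a repeated run over the same data skips the computation. This sketch assumes JSON-serializable inputs and a hypothetical versioned UDF name (`embed_v1`); `ResultCache` and `expensive_embed` are illustrative, not NeuroDB's implementation.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class ResultCache:
    """Hypothetical cache keyed by (UDF name, SHA-256 of its JSON-serialized input)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, udf_name: str, data: Any) -> str:
        # sort_keys makes the hash independent of dict key order.
        payload = json.dumps(data, sort_keys=True).encode()
        return udf_name + ":" + hashlib.sha256(payload).hexdigest()

    def run(self, udf_name: str, udf: Callable[[Any], Any], data: Any) -> Any:
        key = self._key(udf_name, data)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = udf(data)  # compute once, reuse afterwards
        return self._store[key]


calls: List[int] = []


def expensive_embed(batch: List[int]) -> List[int]:
    # Stand-in for a costly model call; records how often it actually runs.
    calls.append(len(batch))
    return [x * 2 for x in batch]


cache = ResultCache()
first = cache.run("embed_v1", expensive_embed, [1, 2, 3])
second = cache.run("embed_v1", expensive_embed, [1, 2, 3])
```

Including a version in the UDF name means that swapping in a new model (say `embed_v2`) produces new keys, so stale results are never served.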
  34. NeuroDB Key Optimizations: Streaming. - Automatic hardware configuration to meet given throughput and latency targets at minimum cost. - Easily deploy workflows developed on batch datasets to streaming data without changing a single line of code. - Convert the workflow into a UI for edge deployment, e.g. in the lab, and run the workflow on new data just by uploading inputs through a web UI.
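Choosing a hardware configuration for streaming can be framed as a constrained minimization: among the configurations that meet both the throughput and the latency targets, pick the cheapest. The configuration names and their cost, throughput, and latency figures below are invented for illustration; in practice they would come from benchmarking, and `Config` and `cheapest_config` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Config:
    name: str
    cost_per_hour: float   # USD/hour (illustrative numbers only)
    throughput: float      # records/second this config can sustain
    p99_latency_ms: float  # tail latency measured in a benchmark


def cheapest_config(configs: Iterable[Config], min_throughput: float,
                    max_latency_ms: float) -> Optional[Config]:
    """Return the lowest-cost config meeting both targets, or None if none does."""
    feasible = [c for c in configs
                if c.throughput >= min_throughput and c.p99_latency_ms <= max_latency_ms]
    return min(feasible, key=lambda c: c.cost_per_hour, default=None)


candidates = [
    Config("cpu-4", 0.20, 150, 80),
    Config("cpu-16", 0.80, 600, 60),
    Config("gpu-1", 1.20, 2000, 15),
    Config("gpu-2", 2.40, 4000, 15),
]
choice = cheapest_config(candidates, min_throughput=500, max_latency_ms=70)
```

With a 500 records/s and 70 ms target, the 16-core CPU option is the cheapest feasible choice; tightening the latency target to 20 ms pushes the selection to a GPU.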