Df14 Building Machine Learning Systems with Apex

  1. Building Machine Learning Systems in Apex Jen Wyher Technical Architect @jenwyher Paul Battisson Technical Architect @pbattisson
  2. Safe Harbor Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
  3. Jennifer Wyher Technical Architect at Mavens Consulting @jenwyher
  4. Paul Battisson Technical Architect at Mavens Consulting @pbattisson Summer ’14 MVP @forcedotcomcast
  5. Mavens Consulting • Preferred Life Sciences implementation partner for salesforce.com and Veeva • 60+ consultants located across North America and Europe • 12 Mavens in attendance at #Dreamforce14, speaking in 7 different technical sessions
  6. Baseline setting - Who has worked on a machine learning system before?
  7. What is Machine Learning? Autonomous vehicles Spam filtering Search engines Data analysis
  8. “Field of study that gives computers the ability to learn without being explicitly programmed” - Arthur Samuel, 1959
  9. Unsupervised System determines classification parameters and groups
  10. Supervised You provide the system with some guidance $200k $120k $100k $180k $110k $???
  11. Why Apex? • Governor limits make it hard to do long-running or big jobs with Apex • Showing the power of the platform
  12. K-Means Clustering • Account targeting • Medical diagnosis aid • Data segmentation “given a group of m different data points derive k clusters of related items”
  13. The Algorithm • Initialize K centroids • Assign each training example to its “nearest” centroid • Reset each centroid as the mean of all examples assigned to it • Repeat until the centroids are fixed
  14. The Algorithm • Initialize K centroids • Assign each training example to its “nearest” centroid • Reset each centroid as the mean of all examples assigned to it • Repeat until the centroids are fixed
  15. How we thought it would work
  16. How it does work
  17. Demo
  18. The Need For Speed • Chained Batches – Batches creating batches • Speedier loops – Remove around 90% of CPU time – See • JSON serialize/deserialize and attachments – Quick and effective way of storing data – Attachments have much larger limit (around 10x the amount of data) • Running totals (stateful batch) – Saves repeated loops • JavaScript Remoting for charting – Loading so many attachments destroys heap size – Use remoting to load attachments for display asynchronously
  19. Future Ideas • Recommendation Engines – Content – Products/services • Neural Networks – Lots of number processing – Chaining will be key • Real-time site recommendations – Think Amazon recommendations
  20. @jenwyher @pbattisson @mavens
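
The four steps on the "The Algorithm" slides can be sketched in a few lines. This is a minimal, language-neutral illustration in Python, not the presenters' Apex implementation: the function names (`k_means`, `squared_distance`, `mean_point`), the use of squared Euclidean distance for "nearest", the random choice of starting centroids, and the `max_iterations` cap are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Step 1: initialize K centroids (assumption: pick k distinct points).
    centroids = rng.sample(points, k)
    assignments = []
    for _ in range(max_iterations):
        # Step 2: assign each example to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 3: reset each centroid to the mean of its assigned examples.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[c])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments
```

Every pass touches every point once per centroid, which is why the iteration count matters so much under per-transaction limits: the work per pass grows with points × k × attributes.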

Editor's Notes

  • Paul - 30 secs
  • Paul
  • Paul
  • Paul
  • JOE
    and if you’re not familiar with Mavens,
    we are the preferred healthcare and life sciences partner for salesforce.com and Veeva
    we have about 60 consultants located across North America and Europe
    and we have 12 Mavens in attendance at #dreamforce14, speaking in 7 different technical sessions
    and we’re feeling a little extra pride right now: earlier today, we won the Salesforce partner innovation award for marketing, so I figured I’d toot our horn a bit
  • Paul - EOS = 2 mins
  • Paul - EOS =3
  • Paul - EOS = 4
  • Paul - EOS = 6
  • Paul - EOS = 8
  • Jen - EOS = 12

    Simple, but still lots of iterations and calculations
    K-Means Clustering Algorithm.

    How does the algorithm work?
    Feed it a large data set (e.g., your sales data).

    The end result of the calculations is a predefined number of clusters (k) that group the provided data set, so that each cluster holds the data points with the most similarities.

    Account Targeting
    Customized marketing plan.
    A cardiologist vs. a neurologist

    Medical Diagnosis Aid
    Could have a tumor based on profile; probability of having certain diseases.

    Data Segmentation
    Cluster into groups to analyze
  • Jen - EOS = 14
  • Jen - EOS = 14

    The diagram shows two dimensions, but imagine that you can do this analysis for 5, 10, even 100 attributes per observation.
  • Jen - EOS = 18

    We were saving too much data
    Our calculations were running long
    And actually, our goal at this time was to be able to analyze on 11 data points.
  • Chained Batches, Winter ‘13.
    Start a Batch Apex job from within another Batch Apex job.

    Purist, and use matrix operations.
  • Jen - Demo - EOD = 24
  • Paul - EOS = 25
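
The "running totals (stateful batch)" idea from slide 18 can be illustrated with a small sketch. It is written in Python rather than Apex and is only an assumption about how such a job might be structured: the names `new_totals`, `process_batch`, and `finish` are hypothetical stand-ins for the state and the execute/finish steps of a batch job that keeps its member variables between batches. Each batch adds its points into per-cluster sums and counts, so the new centroids fall out at the end without looping over the full data set again.

```python
def new_totals(k, dimensions):
    """Empty running totals: one coordinate sum and one count per cluster."""
    return {
        "sums": [[0.0] * dimensions for _ in range(k)],
        "counts": [0] * k,
    }


def process_batch(totals, batch, centroids):
    """Hypothetical per-batch step: fold one batch of points into the totals."""
    for point in batch:
        # Find the nearest centroid by squared Euclidean distance (assumption).
        nearest = min(
            range(len(centroids)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
        )
        for d, value in enumerate(point):
            totals["sums"][nearest][d] += value
        totals["counts"][nearest] += 1


def finish(totals, centroids):
    """Hypothetical final step: turn the running totals into new centroids."""
    return [
        tuple(s / count for s in sums) if count else centroid
        for sums, count, centroid in zip(totals["sums"], totals["counts"], centroids)
    ]
```

In a chained setup, the result of `finish` would seed the next job in the chain as its starting centroids, repeating until they stop changing.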