SFbayACM ACM Data Science Camp 2015 10 24

This is the slide deck for the 7th annual ACM Data Science Camp. It is an unconference, with content generated by the audience. For the primary event site, see http://www.sfbayacm.org/event/silicon-valley-data-science-camp-2015

Published in: Data & Analytics

  1. 1.
     • 8:15 arrive, network, register for tutorial and camp
     • 8:50-10:50 Tutorial: Introduction to R for Machine Learning
     • 11:00 Camp Kickoff
     • Sponsors: ACM SIGKDD, PayPal, UCSC
     • 11:25 Keynote: Spark for Data Science, Big & Small
     • 12:25 Propose Sessions
       Ask for a “show of hands for interest” → Room Size
     • 1:15 Lunch, post Session Matrix
     • 2:00 Session 1 (50 min for session, 10 min break)
     • 5:00 Session 4
     • 6:00 Session Summary
  2. 2.
     ◦ 8:50 – 10:50am by
       – Joseph Rickert (Program Manager, Microsoft)
       – Robert Horton (Data Scientist, Microsoft)
     ◦ Rapid introduction to the R language, in enough depth to build machine learning models
       – randomForest, kernlab, caret
     ◦ Exploratory analysis, visualization, clustering, classification
     ◦ How to find R help and additional resources
     ◦ Big data capabilities of Microsoft’s RRE distribution of R
  3. 3. Morning Tutorial Starts Now
  4. 4. An ACM SF Bay Area Professional Chapter Event
     Saturday, October 24, 2015
     SFbayACM.org/event/silicon-valley-data-science-camp-2015
     WiFi: conference, Password: (none)
     Twitter Tag #DSCAMP
  5. 5. Association for Computing Machinery (ACM)
     ◦ Principal technical, educational, scientific society for computing professionals worldwide
       – Chapter representing SF Bay Area since 1957
     ◦ Membership/volunteer led, local dues only $20/yr
     ◦ Members get discounts with publishers, conferences
     ◦ Produces monthly free meetings
       – 3rd Wed on General Computing topics
       – 4th Mon on Data Science
     ◦ Details at www.SFbayACM.org
       – Suggest, Volunteer, Donate: humphrey@SFBayACM.org
  6. 6.
     • 10 Year Anniversary of Data Science SIG
     • Monday night, November 30 at eBay, San Jose
       ◦ Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years
       ◦ Ronny Kohavi, Distinguished Engineer & General Manager, Analysis & Experimentation, Microsoft
  7. 7.
     • Scala Professional Development Seminar
       ◦ Date: Sat, Nov 7, 8am-5pm
       ◦ Location: PayPal Town Hall (here)
       ◦ Speaker: Cay Horstmann, Computer Science, San Jose State University
       ◦ Author of “Scala for the Impatient”
       ◦ Interactive crash course in this language
       ◦ Bring your laptop (w/ Scala pre-loaded)
       ◦ Presentation / lab format
     Q) What is Scala?
     A) Object Oriented Meets Functional
     http://www.scala-lang.org/
  8. 8.
     • How many have been to an un-conference?
     • Goals and context of the un-conference
       ◦ Informal
       ◦ Share enthusiasm, curiosity, knowledge, questions
       ◦ Participate, make it happen!
       ◦ Share responsibility (e.g. leave session room after 50 min)
       ◦ Encourage session note takers to blog & share at end
       ◦ http://www.campsite.org/list/733
       ◦ Respect others – questions & brainstorms are “safe”
       ◦ Have FUN!
  9. 9.
     ◦ Greg Makowski – DS SIG & Conference Chair
     ◦ Bill Bruns – SF Bay ACM Chair
     ◦ Stephen McInerney – DS SIG
     ◦ Steve Lazarus – web registration
     ◦ Seeking replacement before retirement
     ◦ Greg Weinstein – general
     ◦ Liana Ye – volunteers, food, registration
     ◦ Liz Fraley – ACM Treasurer
     Pictured: Bill, Liana, Greg W, Liz, Steve, Greg M, Stephen
  11. 11.
      • SIGKDD: ACM SIG on Knowledge Discovery and Data Mining
        ◦ Home of data miners, data scientists, and analytics professionals
      • KDD: the premier conference of the field
        ◦ Research Track, Industry/Government Track, Industry Practice Expo, Tutorials, Workshops, Invited Talks, Panels, KDD Cups
  12. 12.
      • Expect 2,000 – 2,500 attendees
      • KDD Cup competition has been going since 2009
  13. 13.
      • General Chairs: Balaji Krishnapuram (IBM), Mohak Shah (Bosch, USA)
      • Program Committee Chairs: Alex Smola (CMU), Charu Aggarwal (IBM)
      • Industry Chairs: Rajeev Rastogi (Amazon), Dou Shen (Baidu)
  14. 14.
      • Shipeng Yu – Associate GC
      • David Hazel, Derek Young – Web Chairs
      • Ron Bekkerman – Social Network Chair
      • Romer Rosales – Proceedings Chair
      • Hanghang Tong, Vishy Vishwanathan – Tutorials Chairs
      • Andrei Broder – Panels Chair
      • Quoc Le, Zhi-Hua Zhou – Workshops Chairs
      • Shou-De Lin – KDD Cup co-chair
      • Gabor Melli, Ankur Teredesai – Media & Publicity Chairs
      • Ying Li – Treasurer
      • Joaquin Quinonero Candela, Olivier Chapelle – Local Arrangements Chairs
      • Sofus Macskassy – Student Travel Awards Chair
  15. 15. 2505 Augustine Drive, Santa Clara, CA 95054 (near Freeway 101 off Great America Parkway)
      http://www.ucsc-extension.edu/
      ◦ UCSC Extension offers professional technology courses for software, hardware, IT and Web professionals. Over 100 courses are available for enrollment each quarter.
      ◦ The certificate program in “Database and Data Analytics” is the fastest growing certificate in UCSC Extension. Courses cover big data, data science and database applications.
      Annual Sponsor
  16. 16. Thank PayPal for use of the location
      ◦ Soren Archibald
      www.KDnuggets.com: a primary hub for data mining
      ◦ Co-marketing sponsor
      ◦ Gregory Piatetsky-Shapiro
  17. 17.
      STRONG FOUNDATION
      • 169 Million Active Customer Accounts
      • $8 Billion Revenue
      • 4 Billion Payment Transactions
      • $235 Billion Total Payment Volume
      STRONG MOMENTUM
      • +19 Million Active Customer Accounts Gained in 2014
      • +17% Total Revenue Growth YoY
      • +24% Payment Transactions Growth YoY
      • +25% Total Payment Volume Growth YoY
  18. 18. © 2014 PayPal Inc. All rights reserved. Confidential and proprietary.
      KEY ENABLER OF OUR BUSINESS / SUPPORTS THE PAYPAL BRAND PROMISE / MAKES PAYPAL UNIQUE
      • Invest in Growth & Innovation
      • Improve Experience & Increase Revenue Simultaneously
      • Lowest Loss Rates
      • Secure
      • Customer Champion
      • Simple
      • Onboard Underserved Merchants
      • New Markets, Multiple Funding Types
      • Enroll Users Easily
      • Ongoing Innovation
  19. 19. Strong Foundation, Strong Front Door
      • 11.5 MILLION PAYMENTS processed daily by PayPal
      • Next-level encryption on every PayPal transaction
      • PayPal never shares financial information with merchants
      • PayPal always verifies a person’s identity for payments
      • 24/7 data analytics combined with human oversight to accurately and quickly spot suspicious activity
      • Constant innovation to advance our machine learning/data mining techniques
      • Seller and buyer protection offered for eligible transactions
      Security & Fraud Services
      • Consistently ranked among the top in consumer trust & security
      [Chart: Consumers Trust PayPal to Help Protect Their Information. % of consumers who trust these companies to protect their financial data and private information such as passwords or birthday. Series: Financial Information, Consumer Privacy. Source: Javelin Strategy & Research: Gang of Five: Apple, Google, Amazon, Facebook, and PayPal-eBay: Threat of the Mobile Wallet Disruptors, 2013.]
      Industry Engagement
      • Founding member of the FIDO alliance
      • PayPal chairs the DMARC initiative to reduce phishing attacks against all Internet users
      • PayPal has been doing tokenization for 15+ years, securely storing customers’ financial information in the cloud.
  20. 20.
      • Joseph Bradley is a Spark Committer working on MLlib at Databricks
      • Ph.D. in Machine Learning from Carnegie Mellon University in 2013
      • Spark allows fast, iterative analysis on laptop & cluster
      • Spark DataFrames allow data manipulation through an API inspired by R & Python pandas
      • ML Pipelines facilitate ML workflows and model tuning
      • SparkR provides an API for R users to work with distributed data
      • Initial PMML support to export models to other tools
  21. 21. Keynote Starts Now
  24. 24. [Floor map]
      • Town Square A: main auditorium, largest sessions, summary session
      • Town Square C: coffee, food, sponsors
      • Bathrooms
      • Entrance: registration, join ACM
      • Courtyard: eat lunch
      • Session rooms: Fireside A, Fireside B, Fireside C, Fireside D, Powwow, Talk Soup
      • Stairs
      www.SFbayACM.org
  26. 26.
      • Write a topic on a sheet of paper
        ◦ Facilitator’s name
      • 60 seconds per suggestion!
        ◦ Ask for people to show hands for interest, count
        ◦ Ask for a time keeper (50 minutes for a session)
        ◦ Ask for a blogger, note taker or person to report
        ◦ http://www.campsite.org/list/733
      • Based on the level of interest, pick a session location and one of the 4 time frames
      • Pick what to attend per session:
        ◦ 2:00 3:00 4:00 5:00
  27. 27. Session Proposals Start Now
  28. 28. Concurrent Sessions 1-3 for the Camp
  29. 29. Concurrent Sessions 4-6 for the Camp