Hack Session: NYTimes Dialect Map Visualization (developed with R Shiny)

Data Science Academy hack session on the NY Times dialect map, built with R, by Vivian S. Zhang; see www.nycdatascience.com for more details. Joint work by the data science team of SupStat Inc., a New York-based data analytics and visualization consulting firm.

Hack Session http://www.nycdatascience.com/slides/supstat_dialectmap/index.html#3
1 of 113 7/23/14, 12:19 PM

Project Background

Drawing a map with the maps package

On to ggplot2!

Introducing ggplot2

Drawing maps with ggplot2

Maps in Shiny

Creating aesthetic maps

Thank you!

Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...

Dr.Costas Sachpazis

Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.

NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...

ssuser7dcef0

Power plants release a large amount of water vapor into the atmosphere through the stack. The flue gas can be a potential source for obtaining much needed cooling water for a power plant. If a power plant could recover and reuse a portion of this moisture, it could reduce its total cooling water intake requirement. One of the most practical way to recover water from flue gas is to use a condensing heat exchanger. The power plant could also recover latent heat due to condensation as well as sensible heat due to lowering the flue gas exit temperature. Additionally, harmful acids released from the stack can be reduced in a condensing heat exchanger by acid condensation. reduced in a condensing heat exchanger by acid condensation. Condensation of vapors in flue gas is a complicated phenomenon since heat and mass transfer of water vapor and various acids simultaneously occur in the presence of noncondensable gases such as nitrogen and oxygen. Design of a condenser depends on the knowledge and understanding of the heat and mass transfer processes. A computer program for numerical simulations of water (H2O) and sulfuric acid (H2SO4) condensation in a flue gas condensing heat exchanger was developed using MATLAB. Governing equations based on mass and energy balances for the system were derived to predict variables such as flue gas exit temperature, cooling water outlet temperature, mole fraction and condensation rates of water and sulfuric acid vapors. The equations were solved using an iterative solution technique with calculations of heat and mass transfer coefficients and physical properties.

Fundamentals of Induction Motor Drives.pptx

manasideore6

basic-wireline-operations-course-mahmoud-f-radwan.pdf

NidhalKahouli2

Student information management system project report ii.pdf

Kamal Acharya

Technical Drawings introduction to drawing of prisms

heavyhaig

6th International Conference on Machine Learning & Applications (CMLA 2024)

ClaraZara1

AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf

SamSarthak3

Hierarchical Digital Twin of a Naval Power System

Kerry Sado

A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.

Recently uploaded (20)

Harnessing WebAssembly for Real-time Stateless Streaming Pipelines

PPT on GRP pipes manufacturing and testing

一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单专业办理

Water billing management system project report.pdf

Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf

KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions

MCQ Soil mechanics questions (Soil shear strength).pdf

Unbalanced Three Phase Systems and circuits.pptx

DESIGN AND ANALYSIS OF A CAR SHOWROOM USING E TABS

Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B

一比一原版(Otago毕业证)奥塔哥大学毕业证成绩单如何办理

Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...

NUMERICAL SIMULATIONS OF HEAT AND MASS TRANSFER IN CONDENSING HEAT EXCHANGERS...

Fundamentals of Induction Motor Drives.pptx

basic-wireline-operations-course-mahmoud-f-radwan.pdf

Student information management system project report ii.pdf

Technical Drawings introduction to drawing of prisms

6th International Conference on Machine Learning & Applications (CMLA 2024)

AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf

Hierarchical Digital Twin of a Naval Power System

3. Project Background

9. Drawing a map with the maps package

17. On to ggplot2!
