Hack session -- citibike sharing viz (using rCharts & slidify)

NYC Data Science Academy, NYC Open Data Meetup, Big Data, Data Science, NYC, Vivian Zhang, SupStat Inc, R programming, R workshop, D3.js, rcharts, slidify, Ramnath Vaidyanathan, citibike sharing, visualization

Technology Education

b.com/ramnathv/bikeshare/zipball/gh-pages)
Bike Sharing http://ramnathv.github.io/bikeshare/slides/#1
1 of 23 6/13/14, 2:03 PM

Get Data

Create Visualization

Wrap in Shiny

This project was completed by Scott Dobbins and Rachel Kogan, who enrolled in the NYC Data Science Academy's 12-Week Data Science Bootcamp. Learn more about the program: http://nycdatascience.com/data-science-bootcamp/ Given that both Wikipedia and comments sections of most websites are freely open to anyone to edit at any time, how has Wikipedia managed to remain such a useful resource while most comments sections are ridden with vandalism, ads, and other counterproductive user behavior? We believe the answer is twofold: 1) Wikipedia has an army of bots that quickly identify and revert vandalism so that the worst edits are usually never seen by people and the site generally maintains itself in a well-kempt state, and 2) Wikipedia has a strong community of administrators and other contributors who routinely clean the site’s flagged contents. Vandalism is relatively easy to flag, though a few clever edits manage to stay on the site for a long time. What about site content problems that are more subjective, like bias? Wikipedia users do routinely flag pages with point-of-view (POV) issues by hand, though with millions of pages and no machine-based approaches, the site can only confidently maintain neutrality on the more well-trafficked pages. Here we propose an approach to some of the more intractable content issues for Wikipedia and other sites using Natural Language Processing (NLP) and machine learning. The sheer quantity of data managed by Wikipedia and similar sites requires distributed computing, so we show here how Apache Spark can scale common algorithms to run on massive data sets.

A Hybrid Recommender with Yelp Challenge Data

Vivian S. Zhang

Developed by Chao Shi, Sam O'Mullane, Sean Kickham, Reza Rad and Andrew Rubino. Watch the project presentation: https://youtu.be/gkKGnnBenyk This project was completed by students from NYC Data Science Academy's 12-Week Bootcamp. Learn more about the bootcamp: http://nycdatascience.com/data-science-bootcamp/ People make decisions on where to eat based on friends’ recommendations. Since they know you, their suggestions matter more than those of strangers. For the capstone project, we built a hybrid Yelp recommendation system that can provide individualized recommendations based on your friends’ reviews on the social network. We built the machine learning models using Spark, and set up a Flask-Kafka-RDS-Databricks pipeline that allows a continuous stream of user requests. During the presentation, we will talk about the development framework and technical implementation of the pipeline. Read their project posts and code: https://blog.nycdatascience.com/student-works/capstone/yelp-recommender-part-1/ https://blog.nycdatascience.com/student-works/yelp-recommender-part-2/

Kaggle Top1% Solution: Predicting Housing Prices in Moscow

Vivian S. Zhang

This project was completed by students who graduated from the NYC Data Science Academy 12-week Data Science Bootcamp. Learn more about the bootcamp: http://nycdatascience.com/data-science-bootcamp/ Watch the project presentation: https://youtu.be/W530d2ZdbJE Ranked #15 out of 3,274 teams on Kaggle. Team members: Brandy Freitas, Chase Edge and Grant Webb. Given 4 years of housing price data in a foreign market, predicting the following year’s prices should be pretty straightforward, right? But what if, in that last year of data, the country’s stock market, the value of its currency and the price of its number 1 export all dropped by nearly 50%? And on top of all that, the country was slapped with economic sanctions by the EU and the US. This was Moscow in 2014 and, as you can see, it was anything but straightforward. We were able to overcome these challenges and, in two weeks of working together, achieved a top 1% ranking on Kaggle. Our success is a product of our in-depth data cleaning, feature engineering and our approach to modeling. With a focus on interpretability and simplicity, we began modeling with linear regression and decision trees, which gave us a better understanding of the data. We then used more complicated models such as random forests and XGBoost, which ultimately resulted in our top submission.

Data mining with caret package

Vivian S. Zhang

Xgboost

Vivian S. Zhang

Tong is a data scientist at SupStat Inc and also a master's student in Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the XGBoost R package, one of the most popular and contest-winning tools on kaggle.com nowadays.
Agenda: Introduction to XGBoost; Real World Application; Model Specification; Parameter Introduction; Advanced Features; Kaggle Winning Solution

Streaming Python on Hadoop

Vivian S. Zhang

Data Science is concerned with the analysis of large amounts of data. When the volume of data is really large, it requires the use of cooperating, distributed machines. The most popular method of doing this is Hadoop, a collection of programs to perform computations on connected machines in a cluster. Hadoop began life as an open-source implementation of MapReduce, an idea first developed and implemented by Google for its own clusters. Though Hadoop's MapReduce is Java-based, and quite complex, this talk focuses on the "streaming" facility, which allows Python programmers to use MapReduce in a clean and simple way. We will present the core ideas of MapReduce and show you how to implement a MapReduce computation using Python streaming. The presentation will also include an overview of the various components of the Hadoop "ecosystem."
NYC Data Science Academy is excited to welcome Sam Kamin, who will be presenting an Introduction to Hadoop for Python Programmers as well as a discussion of MapReduce with Streaming Python. Sam Kamin was a professor in the University of Illinois Computer Science Department. His research was in programming languages, high-performance computing, and educational technology. He taught a wide variety of courses, and served as the Director of Undergraduate Programs. He retired as Emeritus Associate Professor, and worked at Google until taking his current position as VP of Data Engineering at NYC Data Science Academy.
--------------------------------------
Our fall 12-Week Data Science bootcamp starts on Sept 21st, 2015. Apply now to get a spot! If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
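The streaming idea in the abstract above can be illustrated with a word count, the usual introductory MapReduce example. This is only a minimal sketch and is not taken from the talk: the function names (`mapper`, `reducer`, `run_locally`) and the sample text are assumptions made for illustration. With Hadoop streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the framework sorts the mapper output by key before the reducer sees it. The sketch simulates that flow in a single process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; relies on records arriving sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort (shuffle) -> reduce in one process (hypothetical helper)."""
    return list(reducer(sorted(mapper(lines))))


# Illustrative input only; on a cluster each stage would read from standard input.
sample = ["the quick brown fox", "the lazy dog", "The fox"]
for record in run_locally(sample):
    print(record)
```

On a real cluster the two functions would typically live in separate scripts, each looping over `sys.stdin`, and the in-process `sorted` call would be replaced by Hadoop's own shuffle and sort.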

Kaggle Winning Solution Xgboost algorithm -- Let us learn from its author

Vivian S. Zhang

Xgboost

Vivian S. Zhang

Our fall 12-Week Data Science bootcamp starts on Sept 21st, 2015. Apply now to get a spot! If you are hiring Data Scientists, call us at (1)888-752-7585 or reach info@nycdatascience.com to share your openings and set up interviews with our excellent students.
---------------------------------------------------------------
Come join our meet-up and learn how easily you can use R for advanced machine learning. In this meet-up, we will demonstrate how to understand and use XGBoost for Kaggle competitions. Tong is in Canada and will present remotely through Google Hangout.
---------------------------------------------------------------
Speaker Bio: Tong is a data scientist at SupStat Inc and also a master's student in Data Mining. He has been an active R programmer and developer for 5 years. He is the author of the XGBoost R package, one of the most popular and contest-winning tools on kaggle.com nowadays.
Prerequisites (if any): R, calculus
Preparation: A laptop with R installed. Windows users might need to have RTools installed as well.
Agenda: Introduction to XGBoost; Real World Application; Model Specification; Parameter Introduction; Advanced Features; Kaggle Winning Solution
Event arrangement:
6:45pm Doors open. Come early to network, grab a beer and settle in.
7:00-9:00pm XGBoost Demo
Reference: https://github.com/dmlc/xgboost

Nyc open-data-2015-andvanced-sklearn-expanded

Vivian S. Zhang

Scikit-learn is a machine learning library in Python that has become a valuable tool for many data science practitioners. This talk will cover some of the more advanced aspects of scikit-learn, such as building complex machine learning pipelines, model evaluation, parameter search, and out-of-core learning. Apart from metrics for model evaluation, we will cover how to evaluate model complexity, how to tune parameters with grid search and randomized parameter search, and what their trade-offs are. We will also cover out-of-core text feature processing via feature hashing.
---------------------------------------------------------
Andreas is an Assistant Research Scientist at the NYU Center for Data Science, building a group to work on open source software for data science. Previously he worked as a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and maintained it for several years. Material will be posted here: https://github.com/amueller/pydata-nyc-advanced-sklearn Blog: peekaboo-vision.blogspot.com Twitter: https://twitter.com/t3kcit
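The feature hashing mentioned in the abstract above can be sketched in a few lines of standard-library Python. This is an illustration of the general hashing trick only, not scikit-learn's implementation and not code from the talk; the function names, the choice of MD5 as the hash, and the bucket count of 16 are all assumptions. The point is that each token is mapped straight to a column index by a hash function, so no vocabulary has to be built or held in memory, which is what makes the approach usable when the text does not fit in RAM.

```python
import hashlib


def hash_features(tokens, n_features=16):
    """Map tokens to a fixed-length count vector without storing a vocabulary."""
    vector = [0] * n_features
    for token in tokens:
        # A stable hash keeps the token-to-column mapping identical across runs.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_features
        vector[index] += 1
    return vector


def stream_vectors(documents, n_features=16):
    """Yield one hashed vector per document, so only one document is processed at a time."""
    for text in documents:
        yield hash_features(text.lower().split(), n_features)


# Illustrative documents only.
docs = ["the cat sat", "the dog sat down"]
vectors = list(stream_vectors(docs))
print(vectors)
```

Different tokens can land in the same column (a collision), so a small `n_features` trades accuracy for memory; practical settings use far more buckets than this sketch.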

Nycdsa ml conference slides march 2015

Vivian S. Zhang

THE HACK ON JERSEY CITY CONDO PRICES explore trends in public data

Vivian S. Zhang

Max Kuhn's talk on R machine learning

Vivian S. Zhang

Twitter: @NycDataSci Learn with our NYC Data Science Program (weekend courses for working professionals and a 12-week full-time program for those advancing their careers into Data Science). Our next 12-Week Data Science Bootcamp starts in June. (The deadline to apply is May 1st; all decisions will be made by May 15th.)
====================================
Max Kuhn, Director of Nonclinical Statistics at Pfizer and author of Applied Predictive Modeling, will join us and share his experience with data mining in R. Max is a nonclinical statistician who has been applying predictive models in the diagnostic and pharmaceutical industries for over 15 years. He is the author and maintainer of a number of predictive modeling packages, including caret, C50, Cubist and AppliedPredictiveModeling. He blogs about the practice of modeling on his website at http://appliedpredictivemodeling.com/blog
---------------------------------------------------------
You can RSVP for his Feb 18th course at NYC Data Science Academy.
Syllabus: Predictive Modeling using R
Description: This class will get attendees up to speed in predictive modeling using the R programming language. The goal of the course is to understand the general predictive modeling process and how it can be implemented in R. A selection of important models (e.g. tree-based models, support vector machines) will be described in an intuitive manner to illustrate the process of training and evaluating models.
Prerequisites: Attendees should have a working knowledge of basic R data structures (e.g. data frames, factors etc.) and language fundamentals such as functions and subsetting data. Understanding of the content contained in Appendix B sections B1 through B8 of Applied Predictive Modeling (free PDF from publisher [1]) should suffice.
Outline:
- An introduction to predictive modeling
- R and predictive modeling: the good and bad
- Illustrative example
- Measuring performance
- Data splitting and resampling
- Data pre-processing
- Classification trees
- Boosted trees
- Support vector machines
If time allows, the following topics will also be covered:
- Parallel processing
- Comparing models
- Feature selection
- Common pitfalls
Materials: Attendees will be provided with a copy of Applied Predictive Modeling [2] as well as course notes, code and raw data. Participants will be able to reproduce the examples described in the workshop. Attendees should have a computer with a relatively recent version of R installed.
About the Instructor: More about Max's work:
[1] http://rd.springer.com/content/pdf/bbm%3A978-1-4614-6849-3%2F1.pdf
[2] http://appliedpredictivemodeling.com

Winning data science competitions, presented by Owen Zhang

Vivian S. Zhang

Using Machine Learning to aid Journalism at the New York Times

Vivian S. Zhang

This talk was presented to the NYC Open Data Meetup Group on Nov 11, 2014. Speaker: Daeil Kim is currently a data scientist at the Times and is finishing up his Ph.D. at Brown University on work related to developing scalable inference algorithms for Bayesian Nonparametric models. His work at the Times spans a variety of problems related to the company's business interests, audience development, as well as developing tools to aid journalism. Topic: This talk will focus mostly on how machine learning can help with problems that crop up in journalism. We'll begin by talking about using popular supervised learning algorithms such as regularized Logistic Regression to assist a journalist's work in uncovering insights into a story regarding the recall of Takata airbags in cars. Afterwards, we'll think about using topic modeling to deal with large document dumps generated from FOIA (Freedom of Information Act) requests, and about Refinery, a simple web-based tool to ease the implementation of such tasks. Finally, if there is time, we will go over how topic models have been extended to assist in the problem of designing an efficient recommendation engine for text-based content.

Introducing natural language processing (NLP) with R

Vivian S. Zhang

Bayesian models in R

Vivian S. Zhang

Recently uploaded

Pushing the limits of ePRTC: 100ns holdover for 100 days

Adtran

UiPath Test Automation using UiPath Test Suite series, part 5

DianaGray10

GridMate - End to end testing is a critical piece to ensure quality and avoid...

ThomasParaiso2

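Hello everyone, how many characters can this hold? I really don't understand why 40 characters or fewer isn't allowed, but since no character limit is stated, maybe it takes a seriously crazy number of characters? Eh...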

名前です男

A tale of scale & speed: How the US Navy is enabling software delivery from l...

sonjaschweigert1

Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production. We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images

Generative AI Deep Dive: Advancing from Proof of Concept to Production

Aggregage

Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf

Paige Cruz

Monitoring and observability aren’t traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack. While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack. I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:

Essentials of Automations: The Art of Triggers and Actions in FME

Safe Software

In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation. We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios. Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!

PCI PIN Basics Webinar from the Controlcase Team

ControlCase

GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...

Neo4j

Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.

RESUME BUILDER APPLICATION Project for students

KAMESHS29

GraphRAG is All You need? LLM & Knowledge Graph

Guy Korland

Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs. 1. Unifying Large Language Models and Knowledge Graphs: A Roadmap. https://arxiv.org/abs/2306.08302 2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs: https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/

Free Complete Python - A step towards Data Science

RinaMondal9

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf

FIDO Alliance

GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024

Neo4j

By Design, not by Accident - Agile Venture Bolzano 2024

Pierluigi Pugliese

Microsoft - Power Platform_G.Aspiotis.pdf

Uni Systems S.M.S.A.

Video Streaming: Then, Now, and in the Future

Alpen-Adria-Universität

In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.

Removing Uninteresting Bytes in Software Fuzzing

Aftab Hussain

Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating the uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process. In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds. - These are slides of the talk given at the IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.

20240609 QFM020 Irresponsible AI Reading List May 2024

Matthew Sinclair