(Presented by David Smith at useR!2016, June 2016. Recording: https://channel9.msdn.com/Events/useR-international-R-User-conference/useR2016/R-at-Microsoft )
Since the acquisition of Revolution Analytics in April 2015, Microsoft has embarked upon a project to build R technology into many Microsoft products, so that developers and data scientists can use the R language and R packages to analyze data in their data centers and in cloud environments.
In this talk I will give an overview (and a demo or two) of how R has been integrated into various Microsoft products. Microsoft data scientists are also big users of R, and I'll describe a couple of examples of R being used to analyze operational data at Microsoft. I'll also share some of my experiences in working with open source projects at Microsoft, and my thoughts on how Microsoft works with open source communities including the R Project.
An introduction to Microsoft R Services, Microsoft R Open and Microsoft R Server.
This presentation will briefly cover the following:
- Why consider MRO and R Server
- R Server
- MRO
- Microsoft R Services/R Server Platform
- DistributedR
- RevoScaleR/ScaleR
- ConnectR
- DevelopR
- DeployR
- Resources
- References
This session will demonstrate how the all-star line-up featuring R and Storm enables real-time processing on massive data sets; a real home run! The presenters will use actual baseball data and a real-world use case to compose an implementation of the use case as Storm components (spouts, bolts, etc.) and highlight how R can be an effective tool in prototyping a solution. Attendees will leave the session with information that could easily be applied for other use cases such as video game analytics, fraud detection, intrusion detection, and consumer propensity to buy calculations.
The business need for real-time analytics at large scale has focused attention on the use of Apache Storm, but an approach that is sometimes overlooked is the use of Storm and R together. This novel combination of real-time processing with Storm and the practical but powerful statistical analysis offered by R substantially extends the usefulness of Storm as a solution to a variety of business critical problems. By architecting R into the Storm application development process, Storm developers can be much more effective. The aim of this design is not necessarily to deploy faster code but rather to deploy code faster. Just a few lines of R code can be used in place of lengthy Storm code for the purpose of early exploration – you can easily evaluate alternative approaches and quickly make a working prototype.
Predicting Loan Delinquency at One Million Transactions per Second (Revolution Analytics)
Real-time applications of predictive models must be able to generate predictions at the rate at which transactions arrive. Previously, models trained using R had to be converted to other languages like C++ or Java to achieve the required throughput. In this talk, I’ll describe how to use the in-database R processing capabilities of Microsoft R Server to detect fraud in a SQL Server database of loan records at a rate exceeding one million transactions per second. I will also show the process of training the underlying gradient-boosted tree model on a large training set using the out-of-memory algorithms of Microsoft R.
Analysts predict that the Hadoop market will reach $50.2 billion USD by 2020. Applications driving these large expenditures are some of the most important workloads for businesses today, including:
• Analyzing clickstream data, including site-side clicks and web media tags.
• Measuring sentiment by scanning product feedback, blog feeds, social media comments, and Twitter streams.
• Analyzing behavior and risk by capturing vehicle telematics.
• Optimizing product performance and utilization by gathering data from built-in sensors.
• Tracking and analyzing people and material movement with location-aware systems.
• Identifying system performance and intrusion attempts by analyzing server and network logs.
• Enabling automatic document and speech categorization.
• Extracting learning from digitized images, voice, video, and other media types.
Predictive analytics on large data sets provides organizations with a key opportunity to improve a broad variety of business outcomes, and many have embraced Apache Hadoop as the platform of choice.
In the last few years, large businesses have adopted Apache Hadoop as a next-generation data platform, one capable of managing large data assets in a way that is flexible, scalable, and relatively low cost. However, to realize predictive benefits of big data, organizations must be able to develop or hire individuals with the requisite statistics skills, then provide them with a platform for analyzing massive data assets collected in Hadoop “data lakes.”
As users adopted Hadoop, many discovered that performance and complexity limited Hadoop’s use for broad predictive analytics. In response, the Hadoop community has focused on the Apache Spark platform to provide Hadoop with significant performance improvements. With Spark atop Hadoop, users can leverage Hadoop’s big-data management capabilities while achieving new performance levels by running analytics in Apache Spark.
What remains is a challenge—conquering the complexity of Hadoop when developing predictive analytics applications.
In this white paper, we’ll describe how Microsoft R Server helps data scientists, actuaries, risk analysts, quantitative analysts, product planners, and other R users to capture the benefits of Apache Spark on Hadoop by providing a straightforward platform that eliminates much of the complexity of using Spark and Hadoop to conduct analyses on large data assets.
Presented by: Joseph Rickert, Data Scientist Community Manager, Revolution Analytics, Sep 25 2014.
Whenever data scientists are asked what software they use, R always comes up at the top of the list. In one recent survey, only SQL was rated higher than R. In this webinar we will explore what makes R so popular and useful. Starting with the big picture, we describe how R is organized and how to find your way around the R world. Then we will work through some examples highlighting features of R that make it attractive for data science work, including:
Acquiring data
Data manipulation
Exploratory data analysis
Model building
Machine learning
[Presented to the 7th China R Users Conference, Beijing, May 2014.]
Adoption of the R language has grown rapidly in the last few years, and R is ranked as the number-one data science language in several surveys. This accelerating R adoption curve has been driven by the Big Data revolution, and the fact that so many data scientists, having learned R at university, are actively unlocking the secrets hidden in these new, vast data troves.
In more than 6 years of writing for the Revolutions blog, I’ve discovered hundreds of applications of R in business, in government, and in the non-profit sector. Sometimes the use of R is obvious, and sometimes it takes a little bit of detective work to learn how R is operating behind the scenes. In this talk, I’ll begin by presenting some recent statistics on the growth of R. Then I’ll recount some of my favourite applications of R, and show how R is behind some amazing innovations in today’s world.
27 Aug 2013 webinar: High Performance Predictive Analytics in Hadoop and R, presented by Mario E. Inchiosa, PhD, US Data Scientist, and Kathleen Rohrecker, Director of Product Marketing.
There is one consistent message we hear from customers across industries and around the world: "We would like to reduce our reliance on SAS." In this webinar, we review the top reasons customers cite for moving from SAS to R; the benefits of open source analytics; the challenges of switching; and the tools you will need to build your own roadmap. We review the key differences between SAS and R from the user's perspective, and provide you with the tools to move forward.
R is more than just a language. Many of the reasons why R has become such a popular tool for data science come from the ecosystem surrounding the R project. R users benefit from the many resources and packages created by the community, while commercial companies (including Microsoft) provide tools to extend and support R, and services to help people use R.
In this talk, I will give an overview of the R Ecosystem and describe how it has been a critical component of R’s success, and include several examples of Microsoft’s contributions to the ecosystem.
(Presented to EARL London, September 2016)
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ... (Revolution Analytics)
[Presentation by Skylar Lyon at DataWeek 2014, September 17 2014.]
I recently faced the task of scaling out an existing analytics process. The schedule was compressed - it always is in my world. The data was big - 400+ million rows waiting in a database. What did I do? I offered my favorite type of solution - quick and dirty.
At the outset, I wasn't sure how easy it would be. Nor was I certain of realized performance gains. But the concept seemed sound and the exercise fun. Let's move the compute to the data via Revolution R Enterprise for Teradata.
This presentation outlines my approach in leveraging a colleague's R models as I experimented with running R in-database. Would my path lead to significant improvement? Could it be used to productionalize the workflow?
Revolution Analytics was the first company dedicated to the R Project. This presentation from useR! 2014 covers the history of Revolution Analytics since its founding in 2007 and its contributions to the R project and community.
Adoption of the R language has grown rapidly in the last few years, and R is ranked as the number-one data science language in several surveys. This accelerating R adoption curve has been driven by the Big Data revolution, and the fact that so many data scientists, having learned R at university, are actively unlocking the secrets hidden in these new, vast data troves. In more than 6 years of writing for the Revolutions blog, I’ve discovered hundreds of applications of R in business, in government, and in the non-profit sector. Sometimes the use of R is obvious, and sometimes it takes a little bit of detective work to learn how R is operating behind the scenes. In this talk, I'll recount some of my favourite applications of R, and show how R is behind some amazing innovations in today’s world.
Revolution R Enterprise - Portland R User Group, November 2013 (Revolution Analytics)
Presented by David Smith and Michael Helbraun to the Portland R User Group, November 13, 2013
http://www.meetup.com/portland-r-user-group/events/147311372/
New Advances in High Performance Analytics with R: 'Big Data' Decision Trees ... (Revolution Analytics)
Revolution R Enterprise 6.1 includes two important advances in high performance predictive analytics with R: (1) big data decision trees, and (2) the ability to easily extract and perform predictive analytics on data stored in the Hadoop Distributed File System (HDFS).
Classification and regression trees are among the most frequently used algorithms for data analysis and data mining. The implementation provided in Revolution Analytics’ RevoScaleR package is parallelized, scalable, distributable, and designed with big data in mind.
Decision trees and all of the other high-performance predictive analytics functions provided with RevoScaleR (such as linear and logistic regression, generalized linear models, and k-means clustering) can now also be used to analyze data stored in HDFS. After specifying the connection parameters to HDFS, some or all of the data can be directly explored, analyzed, or quickly and efficiently extracted into a native file system.
Microsoft R Server for distributed computing, by กฤษฏิ์ คำตื้อ, Technical Evangelist, Microsoft (Thailand) Limited, at THE FIRST NIDA BUSINESS ANALYTICS AND DATA SCIENCES CONTEST/CONFERENCE, organized by the Faculty of Applied Statistics and DATA SCIENCES THAILAND.
R is free software for data analysis and graphics that is similar to SAS and SPSS. Two million people are part of the R Open Source Community. Its use is growing very rapidly and Revolution Analytics distributes a commercial version of R that adds capabilities that are not available in the Open Source version. This 60-minute webinar is for people who are familiar with SAS or SPSS who want to know how R can strengthen their analytics strategy.
Revolution R Enterprise - 100% R and More Webinar Presentation (Revolution Analytics)
R users already know why the R language is the lingua franca of statisticians today: because it's the most powerful statistical language in the world. Revolution Analytics builds on the power of open source R, and adds performance, productivity and integration features to create Revolution R Enterprise. In this presentation, author and blogger David Smith will introduce the additional capabilities of Revolution R Enterprise.
Hadoop is famously scalable. Cloud Computing is famously scalable. R – the thriving and extensible open source Data Science software – not so much. But what if we seamlessly combined Hadoop, Cloud Computing, and R to create a scalable Data Science platform? Imagine exploring, transforming, modeling, and scoring data at any scale from the comfort of your favorite R environment. Now, imagine calling a simple R function to operationalize your predictive model as a scalable, cloud-based Web Service. Learn how to leverage the magic of Hadoop on-premises or in the cloud to run your R code, thousands of open source R extension packages, and distributed implementations of the most popular machine learning algorithms at scale.
Presented to Chicago R User Group, Jan 29 2015
Good data analysis is reproducible. If someone else can’t independently replicate your results from your data, the consequences can be severe. With R, a major challenge for reproducibility is the ever-changing package ecosystem: it's all too easy to develop an R script using packages, only to find that collaborators download later versions of those packages when they attempt to reproduce your results, and the outcome can be unpredictable!
In this talk I'll introduce the Reproducible R Toolkit, and the "checkpoint" package, included with Revolution R Open, and describe some best practices for writing reliable, reproducible R code with packages.
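The snapshot-date idea behind this kind of tooling can be sketched in a few lines. The example below is written in Python purely for illustration and is not the "checkpoint" package's implementation; the package names, versions, and release dates are hypothetical. It assumes that, for a chosen snapshot date, each package should resolve to the newest version released on or before that date, so that every collaborator who uses the same date gets the same versions.

```python
from datetime import date

# Hypothetical release history: package name -> list of (release date, version).
# These names, dates, and versions are made up for illustration only.
RELEASES = {
    "pkgA": [
        (date(2014, 6, 1), "1.0.0"),
        (date(2015, 1, 10), "1.1.0"),
        (date(2015, 3, 5), "2.0.0"),
    ],
    "pkgB": [
        (date(2014, 11, 20), "0.9.0"),
        (date(2015, 2, 14), "0.9.1"),
    ],
    "pkgC": [
        (date(2015, 4, 1), "0.1.0"),
    ],
}


def resolve_snapshot(releases, snapshot):
    """Pin each package to its newest version released on or before `snapshot`.

    Packages with no release by that date are left out of the result.
    """
    pinned = {}
    for package, history in releases.items():
        eligible = [(released, version) for released, version in history
                    if released <= snapshot]
        if eligible:
            # max() compares tuples by release date first, so this picks
            # the most recent eligible release.
            pinned[package] = max(eligible)[1]
    return pinned


if __name__ == "__main__":
    for package, version in sorted(resolve_snapshot(RELEASES, date(2015, 1, 29)).items()):
        print(package, version)
```

Resolving against the same snapshot date always yields the same set of versions, regardless of what has been released since; that is the property that makes a script's results repeatable.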
Presented by Joseph Rickert at the NYC R Conference, April 25 2015.
Good data analysis is reproducible. If someone else can’t independently replicate your results from your data, the consequences can be severe. With R, a major challenge for reproducibility is the ever-changing package ecosystem: it's all too easy to develop an R script using packages, only to find that collaborators download later versions of those packages when they attempt to reproduce your results, and the outcome can be unpredictable!
In this talk I'll introduce the Reproducible R Toolkit, and the "checkpoint" package, included with Revolution R Open, and describe some best practices for writing reliable, reproducible R code with packages.
Data Science, Statistical Analysis and R... Learn what those mean, how they can help you find answers to your questions and complement the existing toolsets and processes you are currently using to make sense of data. We will explore R and the RStudio development environment, installing and using R packages, basic and essential data structures and data types, plotting graphics, manipulating data frames and how to connect R and SQL Server.
In-Database Analytics Deep Dive with Teradata and Revolution (Revolution Analytics)
Teradata and Revolution Analytics worked together to develop in-database analytical capabilities for Teradata Database. Teradata v14.10 provides a foundation for in-database analytics in Teradata. Revolution Analytics has ported its Revolution R Enterprise (RRE) Version 7.1 to use the in-database capabilities of version 14.10. With RRE inside Teradata, users can run fully parallelized algorithms in each node of the Teradata appliance to achieve performance and data scale heretofore unavailable. We'll get past the market-ecture quickly and dive into a “how it really works” presentation, review implications for system configuration and administration, and then take questions from Teradata users who will be charged with deploying and administering Teradata systems as platforms for big data analytics inside the database engine.
With rising business challenges in the aftermarket service areas, it becomes imperative for manufacturers to gain actionable intelligence across the warranty management life cycle.
Join Revolution Analytics and Tech Mahindra to hear how to reduce the information visibility gap:
• Identify statistically significant business drivers
• Forecast warranty costs and claims
• Improve Customer Satisfaction
Performance and Scale Options for R with Hadoop: A comparison of potential ar... (Revolution Analytics)
R and Hadoop go together. In fact, they go together so well, that the number of options available can be confusing to IT and data science teams seeking solutions under varying performance and operational requirements.
Which configuration is faster for big files? Which is faster for sharing data and servers among groups? Which eliminates data movement? Which is easiest to manage? Which works best with iterative and multistep algorithms? What are the hardware requirements of each alternative?
This webinar is intended to help new users of R with Hadoop select the best architecture for integrating the two, by explaining the benefits of several popular configurations along with their performance potential, workload handling, programming model, and administrative characteristics.
Presenters from Revolution Analytics will describe the options for using Revolution R Open and Revolution R Enterprise with Hadoop, including servers, edge nodes, RHadoop and ScaleR. We’ll then compare the configurations on performance, programming model, administration, data movement, ease of scaling, and how well each handles large individual analyses versus mixed workloads.
Presented by David Smith, R Community Lead (Microsoft), at Monktoberfest October 2016.
The value of open source isn’t just in the software itself. The communities that form around open source software provide just as much value and sometimes even more: in ongoing development, in documentation, in support, in marketing, and as a supply of ready-trained employees. Companies who build on open source tend to focus on the software, but neglect communities at their peril.
In this talk, I share some of my experiences in building community for an open-source software company, Revolution Analytics, and perspectives since the acquisition by Microsoft in 2015.
A relatively short Introduction to R as presented at the Belgian Software Craftmanship meetup group.
The goal of this presentation is to give you an introduction to:
• The style of the language
• Its ecosystem
• How common things like data manipulation and visualization work
• How to use it for machine learning
• Web development and report generation in R
• Integrating R in your system
License:
Introduction To R by Samuel Bosch
To the extent possible under law, the person who associated CC0 with Introduction To R has waived all copyright and related or neighboring rights to Introduction To R.
http://creativecommons.org/publicdomain/zero/1.0/
Data Culture Series - Keynote & Panel - 19th May - London
Jonathan Woodward
Big data. Small data. All data. You have access to an ever-expanding volume of data inside the walls of your business and out across the web. The potential in data is endless – from predicting election results to preventing the spread of epidemics. But how can you use it to your advantage to help move your business forward?
Data is growing exponentially and it’s now possible to mine and unlock insights from data in new and unexpected ways. Empower your business to take advantage of this data by harnessing the rich capabilities of Microsoft SQL Server and the familiarity of Microsoft Office to help organize, analyze, and make sense of your data—no matter the size.
- Study the architecture and design
- Compare Old & New Technology stack
- Analyze evolution of architecture and scalability
- Lessons learned over time
What’s new in Spark 2.0?
Rerngvit Yanggratoke @ Combient AB
Örjan Lundberg @ Combient AB
Machine Learning Stockholm Meetup
27 October, 2016
Schibsted Media Group
Enabling the Real Time Analytical Enterprise
Hortonworks
Combining IOT, Customer Experience and Real-Time Enterprise Data within Hadoop. What if you could derive real-time insights using ALL of your data? Join us for this webinar and learn how companies are combining “new” real-time data sources (i.e. IOT, Social, Web Logs) with continuously updated enterprise data from SAP and other enterprise transactional systems, providing deep and up-to-the-second analytical insights. This presentation will include a demonstration of how this can be achieved quickly, easily and affordably by utilizing a joint solution from Attunity and Hortonworks.
An update of virtual worlds as of May 2008, including a definition of virtual worlds, economic, tax, accounting and legal issues and data visualization and 3d data centers.
Windows Azure PaaS - Webinar Common Sense
Common Sense
Common Sense Webinar: Cloud Technology Trends, highlighting the Microsoft Azure Platform as a Service offering. Presented by Nate Shae-Han, Solution Specialist with Microsoft.
Learn how to speed time to market, drive down IT costs, and enable new markets.
OVH Analytics Data Compute with Apache Spark as a Service, meetup OVH Bordeaux
Mojtaba Imani
90% of the data in the world today has been created in the last two years. The world will be creating 163 zettabytes of data a year by 2025. So how do we want to process this volume of data?
Apache Spark is an open-source, general-purpose distributed cluster computing framework that is trending today. But how do you create a computing cluster quickly and efficiently? Should I do all the network configuration and cluster management myself? What should I do with my cluster if I don't need it anymore? Is my cluster secure?
After covering Apache Spark principles and use cases, you will discover OVH Analytics Data Compute, a fast, secure, and efficient Spark Cluster as a Service that answers all of these questions.
OVH Analytics Data Compute - Apache Spark Cluster as a Service
OVHcloud
Do you need to run Apache Spark computations over a big Apache Spark cluster, but don't have the computers?
Don't have enough time to create a cluster of computers and do all the installation and configuration?
Only need a cluster for a few hours, not forever?
Or do you just want an easy way to try out the power of Apache Spark? Discover OVH Analytics Data Compute!
This Month in Data Science - April Edition
VMware Tanzu
During the month of April, the growing impact of Big Data and data-driven insight on our daily lives became increasingly apparent. While pundits debated the merits of this massive sea change in data collection and analysis, its value and results were borne out this month in intriguing and surprising ways, including revealing things like why UPS trucks never turn left and exploring if there are time travelers living among us.
Presented to eRum (Budapest), May 2018
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe the doAzureParallel package, a backend to the "foreach" package that automates the process of spawning a cluster of virtual machines in the Azure cloud to process iterations in parallel. This will include an example of optimizing hyperparameters for a predictive model using the "caret" package.
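As a rough illustration of the "embarrassingly parallel" pattern this abstract describes, here is a minimal Python sketch of a hyperparameter grid search in which each combination is scored independently by a pool of workers. It is not R, and it does not use the doAzureParallel, foreach, or caret APIs. The `score` function, its toy objective, and the grid values are hypothetical stand-ins for fitting and cross-validating a real model. A thread pool is used only to keep the sketch self-contained; CPU-bound work would normally go to a process pool or a cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def score(params):
    """Hypothetical stand-in for fitting and cross-validating one model."""
    depth, rate = params
    # Toy objective whose best value is at depth=4, rate=0.1.
    return -((depth - 4) ** 2) - 100 * (rate - 0.1) ** 2


def grid_search(depths, rates, workers=4):
    """Score every hyperparameter combination independently, in parallel."""
    grid = list(product(depths, rates))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # No iteration depends on another; map() returns results in input order.
        scores = list(pool.map(score, grid))
    best_score, best_params = max(zip(scores, grid))
    return best_params, best_score


best_params, best_score = grid_search([2, 4, 6], [0.01, 0.1, 0.5])
print(best_params, best_score)
```

Because the iterations share no state, the number of workers changes only how long the search takes, not its result.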
By David Smith. Presented at Microsoft Build (Seattle), May 7 2018.
Your data scientists have created predictive models using open-source tools, proprietary software, or some combination of both, and now you are interested in lifting and shifting those models to the cloud. In this talk, I'll describe how data scientists can transition their existing workflows — while using mostly the same tools and processes — to train and deploy machine learning models based on open source frameworks to Azure. I'll provide guidance on keeping connections to data sources up-to-date, evaluating and monitoring models, and deploying applications that make use of those models.
Presentation delivered by David Smith to NY R Conference https://www.rstats.nyc/, April 2018:
Minecraft is an open-world creativity game, and a hit with kids. To get kids interested in learning to program with R, we created the "miner" package. This package is a collection of simple functions that allow you to connect with a Minecraft instance, manipulate the world within by creating blocks and controlling the player, and to detect events within the world and react accordingly.
The miner package is intended mainly for kids, to inspire them to learn R while playing Minecraft. But the development of the package also provides some useful insights into how to build an R package to interface with a persistent API, and how to instruct others on its use. In this talk I'll describe how to set up your own Minecraft server, and how to use and extend the package. I'll also provide a few examples of the package in action in a live Minecraft session.
While Python is a widely used tool for AI development, in this talk I'll make the case for considering R as a platform for developing models for intelligent applications. First, R provides a first-class experience for working with deep learning frameworks through its keras integration. Equally important, it provides the most comprehensive suite of statistical data analysis tools, which are extremely useful for many intelligent applications such as transfer learning. I'll give a few high-level examples in this talk, and we'll go into further detail in the accompanying interactive code lab.
There are many common workloads in R that are "embarrassingly parallel": group-by analyses, simulations, and cross-validation of models are just a few examples. In this talk I'll describe several techniques available in R to speed up workloads like these, by running multiple iterations simultaneously, in parallel.
Many of these techniques require the use of a cluster of machines running R, and I'll provide examples of using cloud-based services to provision clusters for parallel computations. In particular, I will describe how you can use the sparklyr package to distribute data manipulations using the dplyr syntax, on a cluster of servers provisioned in the Azure cloud.
Presented by David Smith at Data Day Texas in Austin, January 27 2018.
A look at the changing perceptions of R, from the early days of the R project to today. Microsoft sponsor talk, presented by David Smith to the useR!2017 conference in Brussels, July 5 2017.
Presented by David Smith at The Data Science Summit, Chicago, April 20 2017.
The ability to independently reproduce results is a critical issue within the scientific community today, and is equally important for collaboration and compliance in business. In this talk, I'll introduce several features available in R that help you make reproducibility a standard part of your data science workflow. The talk will include tips on working with data and files, combining code and output, and managing R's changing package ecosystem.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Expect practical tips and strategies for successful relationship building that leads to closing the deal.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
I have heard many times that architecture is not important for the front-end. I have also often seen developers implement front-end features by just following the standard rules of a framework, thinking that this is enough to launch the project successfully, and then the project fails. How can this be prevented, and which approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
R at Microsoft (useR! 2016)
5. Historical black box sensor records and maintenance events for many aircraft
Train and compare various models to predict maintenance events
Scoring rules to predict likely maintenance events from sensor data
15. cloud computing: 2011 to 2016, 5x increase
data science: universities filling 300,000 US talent gap
big data: 90% of the data in the world today has been created in the last two years alone
open source: including R, Python, Linux, Hadoop, Spark, …
16. David Smith
R Community Lead
Microsoft
@revodavid
davidsmi@microsoft.com
Revolutions blog
blog.revolutionanalytics.com
17. Thank you!
blog.revolutionanalytics.com/2016/02/xbox_usage_trends_r.html
www.microsoft.com/en-us/stories/88acres
powerbi.microsoft.com/en-us/industries/airline
blog.revolutionanalytics.com/2016/03/sql-server-2016-launch.html
studio.azureml.net
github.com/RevolutionAnalytics/AzureML
blog.revolutionanalytics.com/2016/03/scoring-r-models-with-excel.html
www.visualstudio.com/en-us/features/rtvs-vs.aspx
mran.microsoft.com/download
https://www.microsoft.com/en-us/server-cloud/products/sql-server-2016/
Random Forests
Support for R
R packages: AzureML, checkpoint, doParallel, RHadoop, DeployR Open
Full support for Linux in Azure, and now SQL
Own Linux Distribution
.NET open source