Tom Liptrot developed an R package called predictshine to create interactive Shiny apps for exploring predictive model results. Predictshine allows users to input new data values to predict outcomes from existing regression models. It includes S3 methods for common model types like glm and functions for building dynamic user interfaces. Liptrot recommends creating packages and using version control like Git to share work, which he found helped him learn. He demonstrated predictshine by building an app to predict life satisfaction scores based on a well-being survey.
Text Mining with R for Social Science ResearchRyan Wesslen
Text Mining workshop using R for Project Mosaic, UNC Charlotte's Social Science Research Initiative. The workshop analyzes the Federalist Papers and the Federal Reserve Beige Book with applications in text classification, topic modeling and sentiment analysis.
Text Mining with R for Social Science ResearchRyan Wesslen
Text Mining workshop using R for Project Mosaic, UNC Charlotte's Social Science Research Initiative. The workshop analyzes the Federalist Papers and the Federal Reserve Beige Book with applications in text classification, topic modeling and sentiment analysis.
MeasureCamp 9.
Just to note that the contents of slide 4 are from Mark Edmondson's http://rpubs.com/MarkeD/r-in-digital-analytics-workflow deck from MeasureCamp V. Forgot to add a citation before uploading.
This portfolio describes my data analysis skill using text mining in R to analyse text datasets consisting of numerous medical publications. I need to identify certain keywords from the abstracts from each publication that lead to clinical or non-clinical publications
Twitter Text Mining with Web scraping, R, Shiny and Hadoop - Richard Sheng Richard Sheng
Based on an Analytics Week article of the Top 200 Influencers in Big Data and Analytics, I used R and Hadoop to analyze the Twitter Feeds of these leaders with Text Mining, Web Scraping and Visualization techniques.
A short tutorial on R, basically for a starter who wants to do data mining especially text data mining.
Related codes and data will be found at the following lnik: http://textanalytics.in/wm/R%20tutorial%20(DATA2014).zip
Natural Language Processing in R (rNLP)fridolin.wild
The introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of the Goethe University Frankfurt. The tutorials are available on http://crunch.kmi.open.ac.uk/w/index.php/Tutorials
Presented: Thursday, February 14, 2013
Presenter: Joseph Rickert, Technical Marketing Manager, Revolution Analytics
We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people we have identified three factors that greatly facilitate learning R. For a quick start:
Find a way of orienting yourself in the open source R world
Have a definite application area in mind
Set an initial goal of doing something useful and then build on it
In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:
Provide an orientation to R’s data mining resources
Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data.
Show the simple R commands to accomplish these same tasks without the GUI
Demonstrate how to build on these fundamental skills to gain further competence in R
Move away from using small test data sets and show with the same level of skill one could analyze some fairly large data sets with RevoScaleR
Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.
Kebutuhan Sentiment Analysis
Text Mining untuk Sentiment Analysis
Pengolahan kata Text Mining menggunakan Machine Learning
Studi Kasus Sentiment Analysis
These notes review fitting GLMs to aggregate data. Binomial, Poisson and Negative Binomial models are shown, with a few others. I also cover how to implement Moran Eigenvector filtering in a GLM. All data are for mortality rates for the state of Texas from the CDC Wonder.
MeasureCamp 9.
Just to note that the contents of slide 4 are from Mark Edmondson's http://rpubs.com/MarkeD/r-in-digital-analytics-workflow deck from MeasureCamp V. Forgot to add a citation before uploading.
This portfolio describes my data analysis skill using text mining in R to analyse text datasets consisting of numerous medical publications. I need to identify certain keywords from the abstracts from each publication that lead to clinical or non-clinical publications
Twitter Text Mining with Web scraping, R, Shiny and Hadoop - Richard Sheng Richard Sheng
Based on an Analytics Week article of the Top 200 Influencers in Big Data and Analytics, I used R and Hadoop to analyze the Twitter Feeds of these leaders with Text Mining, Web Scraping and Visualization techniques.
A short tutorial on R, basically for a starter who wants to do data mining especially text data mining.
Related codes and data will be found at the following lnik: http://textanalytics.in/wm/R%20tutorial%20(DATA2014).zip
Natural Language Processing in R (rNLP)fridolin.wild
The introductory slides of a workshop given to the doctoral school at the Institute of Business Informatics of the Goethe University Frankfurt. The tutorials are available on http://crunch.kmi.open.ac.uk/w/index.php/Tutorials
Presented: Thursday, February 14, 2013
Presenter: Joseph Rickert, Technical Marketing Manager, Revolution Analytics
We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people we have identified three factors that greatly facilitate learning R. For a quick start:
Find a way of orienting yourself in the open source R world
Have a definite application area in mind
Set an initial goal of doing something useful and then build on it
In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:
Provide an orientation to R’s data mining resources
Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data.
Show the simple R commands to accomplish these same tasks without the GUI
Demonstrate how to build on these fundamental skills to gain further competence in R
Move away from using small test data sets and show with the same level of skill one could analyze some fairly large data sets with RevoScaleR
Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.
Kebutuhan Sentiment Analysis
Text Mining untuk Sentiment Analysis
Pengolahan kata Text Mining menggunakan Machine Learning
Studi Kasus Sentiment Analysis
These notes review fitting GLMs to aggregate data. Binomial, Poisson and Negative Binomial models are shown, with a few others. I also cover how to implement Moran Eigenvector filtering in a GLM. All data are for mortality rates for the state of Texas from the CDC Wonder.
The data visualization with R:An Example- Visualizing obesity across united s...Dr. Volkan OBAN
The data visualization with R:An Example- Visualizing obesity across united states by using data from wikipedia.
source:http://www.datasciencecentral.com/profiles/blog/show?id=6448529:BlogPost:440732&utm_content=buffer974dd&utm_medium=social&utm_source=linkedin.com&utm_campaign=buffer
What's new in Vaadin 8, how do you upgrade, and what's next?Marcus Hellberg
Presentation slides for the Vaadin 8 release meetup. Covers the new features in Vaadin 8 and 8.1, the migration steps from Vaadin 7 and what's up ahead for Vaadin.
On the development and distribution of R packagesTom Mens
In this presentation at IWSECO-WEA 2015 (Dubrovnik, Croatia, 8 September 2015) we present the ecosystem of software packages for R, one of the most popular environments for statistical computing today. We empirically study how R packages are developed and distributed on different repositories: CRAN, BioConductor, R-Forge and GitHub. We also explore the role and size of each repository, the inter-repository dependencies, and how these repositories grow over time. With this analysis, we provide a deeper insight into the extent and the evolution of the R package ecosystem.
Agile Data Science 2.0 covers the theory and practice of applying agile methods to the practice of applied analytics research called data science. The book takes the stance that data products are the preferred output format for data science teams to effect change in an organization. Accordingly, we show how to "get meta" to enable agility in building applications describing the applied research process itself. Then we show how to use 'big data' tools to iteratively build, deploy and refine analytics applications. Tracking data-product development through the five stages of the "data value pyramid", we show you how to build applications from conception through development through deployment and then through iterative improvement. Application development is a fundamental skill for a data scientist, and by publishing your data science work as a web application, we show you how to effect maximal change within your organization.
Technologies covered include Python, Apache Spark (Spark MLlib, Spark Streaming), Apache Kafka, MongoDB, ElasticSearch and Apache Airflow.
An Interactive Introduction To R (Programming Language For Statistics)Dataspora
This is an interactive introduction to R.
R is an open source language for statistical computing, data analysis, and graphical visualization.
While most commonly used within academia, in fields such as computational biology and applied statistics, it is gaining currency in industry as well – both Facebook and Google use R within their firms.
Kaggle talk series top 0.2% kaggler on amazon employee access challengeVivian S. Zhang
NYC Data Science Academy, NYC Open Data Meetup, Big Data, Data Science, NYC, Vivian Zhang, SupStat Inc,NYC, Machine learning, Kaggle, amazon employee access challenge
Agile Data Science 2.0 (O’Reilly 2017) defines a methodology and a software stack with which to apply the methods. The methodology seeks to deliver data products in short sprints by going meta and putting the focus on the applied research process itself. The stack is but an example of one meeting the requirements that it be utterly scalable and utterly efficient in use by application developers as well as data engineers. It includes everything needed to build a full-blown predictive system: Apache Spark, Apache Kafka, Apache Incubating Airflow, MongoDB, ElasticSearch, Apache Parquet, Python/Flask, JQuery. This talk will cover the full lifecycle of large data application development and will show how to use lessons from agile software engineering to apply data science using this full-stack to build better analytics applications.
This presentation educates you about R - Decision Tree, Examples of use of decision tress with basic syntax, Input Data and out data with chart.
For more topics stay tuned with Learnbay.
As Europe's leading economic powerhouse and the fourth-largest hashtag#economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like hashtag#Russia and hashtag#China, hashtag#Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in hashtag#cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to hashtag#AdvancedPersistentThreats (hashtag#APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
5. Shinyby
Rstudio
A web application framework for R
Turn your analyses into interactive web applications
No HTML, CSS, or JavaScript knowledge required
14. 5 Steps to make an R package
1.install devtools and Rtools
2.devtools::create(‘path’, ‘name’)
3. save R scripts in the /R folder
4.devtools::use_package("shiny")
5.devtools::load_all()
6.devtools::install()
see here http://r-pkgs.had.co.nz/
16. install predictshine from github
install_github('tomliptrot/predictshine')
library(predictshine)
see https://github.com/tomliptrot/predictshine
17. Demo
library(predictshine)
data(well_being)
lm_1 = lm(overall_sat ~ age2 * region + sex + married + age2 * eductaion + ethnicity + health , data =
well_being)
predictshine(lm_1,
page_title = 'Happiness in the UK',
variable_descriptions = c('Age', "Region", 'Sex','Marital status',
"What is the highest level of qualification?",
"Ethnicity White/Other",
"How is your health in general?" ),
main = 'Overall, how satisfied are you with your life nowadays?',
xlab = 'predicted score out of 10',
description = p('Alter variables to get predicted overall life satisfaction (out of 10).
This model is made using data from the 1,000 respondents of the ONS Opinions Survey,
Well-Being Module, April 2011'))
18. Sharing
send r file
LAN using runApp(
shinyapps.io - still working on this
https://tomliptrot.shinyapps.io/international_jobs
/
19. Thanks
predictshine is
my first R package
learnt a lot - thanks
to R user group
please try it out!
https://github.com/tomliptrot/predictshine
tomliptrot@gmail.com