This document provides an example of using Sweave to combine R code and output to generate a self-documenting script. It shows code to scrape cancer data from a website, inspect and convert the data types, and generate simple scatter plots of the number of cancer cases by population. Tables and figures are included in the output to demonstrate the code and results.
Example sweavefunnelplot
1 Example of self-documenting data journalism notes
This is an example of using Sweave to combine code and output from the R statistical programming environment with the LaTeX document processing environment to generate a self-documenting script, in which the actual code used to do stats and generate statistical graphics is displayed alongside the charts it directly produces.
1.1 Getting Started...
The aim is to try to replicate a graphic included by Ben Goldacre in his article DIY statistical analysis: experience the thrill of touching real data (http://www.guardian.co.uk/commentisfree/2011/oct/28/bad-science-diy-data-analysis).
> # The << echo = T >>= identifies an R code region;
> # echo=T means run the code, and print what happens when it's run
> # In the code area, lines beginning with a # are comment lines and are not executed
>
> #First, we need to load in the XML library that contains the scraper function
> library(XML)
> #Now we scrape the table
> srcURL='http://www.guardian.co.uk/commentisfree/2011/oct/28/bad-science-diy-data-analysis'
> cancerdata=data.frame(
+ readHTMLTable( srcURL, which=1, header=c('Area','Rate','Population','Number') ) )
>
> #The @ symbol on its own at the start of a line marks the end of a code block
The format is simple: readHTMLTable(url,which=TABLENUMBER) (TABLENUMBER is used to
extract the N’th table in the page.) The header part labels the columns (the data pulled in from
the HTML table itself contains all sorts of clutter).
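Purely to illustrate the which and header arguments (this snippet is not part of the original walkthrough, and the URL is a placeholder), pulling the second table from some other page and supplying your own column names would look like:

#Illustrative sketch only - example.com is a placeholder, not a real data source
otherURL='http://example.com/page-with-several-tables'
otherdata=readHTMLTable(otherURL, which=2, header=c('Name','Value'))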
We can inspect the data we’ve imported as follows:
> #Look at the whole table (the whole table is quite long,
> # so don't display it; the command is commented out for now.)
> #cancerdata
> #If you are using RStudio, you can inspect the data using the command: View(cancerdata)
> #Look at the column headers
> names(cancerdata)
[1] "Area" "Rate" "Population" "Number"
> #Look at the first few rows
> head(cancerdata)
Area Rate Population Number
1 Shetland Islands 19.15 31332 6
2 Limavady 21.49 32573 7
3 Ballymoney 17.05 35191 6
4 Orkney Islands 29.87 36826 11
5 Larne 27.54 39942 11
6 Magherafelt 15.26 45872 7
> #Look at the last few rows
> tail(cancerdata)
Area Rate Population Number
374 Wiltshire 18.69 727662 136
375 Sheffield 16.9 757396 128
376 Durham 17.29 786582 136
377 Leeds 17.3 959538 166
378 Cornwall 15.44 1062176 164
379 Birmingham 19.78 1268959 251
> #What sort of datatype is in the Number column?
> class(cancerdata$Number)
[1] "factor"
The last line, class(cancerdata$Number), identifies the data as type factor. In order to
do stats and plot graphs, we need the Number, Rate and Population columns to contain actual
numbers. (Factors organise data according to categories; when the table is loaded in, the data is
loaded in as strings of characters; rather than seeing each number as a number, it's identified as
a category.) The following commands convert these columns to a numeric datatype.
> #Convert the numerical columns to a numeric datatype
> cancerdata$Rate =
+ as.numeric(levels(cancerdata$Rate)[as.numeric(cancerdata$Rate)])
> cancerdata$Population =
+ as.numeric(levels(cancerdata$Population)[as.integer(cancerdata$Population)])
> cancerdata$Number =
+ as.numeric(levels(cancerdata$Number)[as.integer(cancerdata$Number)])
> #Just check it worked
> class(cancerdata$Number)
[1] "numeric"
> class(cancerdata$Rate)
[1] "numeric"
> class(cancerdata$Population)
[1] "numeric"
> head(cancerdata)
Area Rate Population Number
1 Shetland Islands 19.15 31332 6
2 Limavady 21.49 32573 7
3 Ballymoney 17.05 35191 6
4 Orkney Islands 29.87 36826 11
5 Larne 27.54 39942 11
6 Magherafelt 15.26 45872 7
We can now plot the data as a simple scatterplot using the plot command (figure 1) or we
can add a title to the graph and tweak the axis labels (figure 2).
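The chunks that produced figures 1 and 2 are not reproduced in this excerpt; a minimal sketch of the sort of calls involved (the title and axis-label text here is assumed rather than taken from the original) would be:

#Figure 1: quick scatterplot of cases against population
plot(cancerdata$Population, cancerdata$Number)
#Figure 2: the same plot with a title and tidier axis labels
plot(cancerdata$Population, cancerdata$Number,
     main="Bowel cancer cases by population",
     xlab="Population", ylab="Number of cases")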
The plot command is great for generating quick charts. If we want a bit more control over
the charts we produce, the ggplot2 library is the way to go. (ggplot2 isn’t part of the standard R
bundle, so you’ll need to install the package yourself if you haven’t already installed it. In RStudio,
find the Packages tab, click Install Packages, search for ggplot2 and then install it, along with its
dependencies...). You can see the sort of chart ggplot creates out of the box in figure 3.
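The chunk behind figure 3 is likewise not shown here, but an out-of-the-box ggplot2 scatterplot of the same data would be along these lines:

#install.packages("ggplot2")   #if ggplot2 is not already installed
library(ggplot2)
#Default ggplot2 scatterplot of cases against population
ggplot(cancerdata, aes(x=Population, y=Number)) + geom_point()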
1.2 Generating the Funnel Plot
Doing a bit of searching for the “funnel plot” chart type used to display the data in Goldacre's
article, I came across a post on Cross Validated, the Stack Overflow/Stack Exchange site dedicated
to statistics-related Q&A: How to draw funnel plot using ggplot2 in R?
(http://stats.stackexchange.com/questions/5195/how-to-draw-funnel-plot-using-ggplot2-in-r/5210#5210)
The meta-analysis answer seemed to produce a similar chart type, so I had a go at cribbing
the code, with confidence limits set at the 95% and 99.9% levels. Note that I needed to do a few
things:
1. work out what values to use where! I did this by looking at the ggplot code to see what
was plotted. p is on the y-axis and is used to represent the death rate. The data
provides this as a rate per 100,000, so we need to divide by 100,000 to make it a rate in the
range 0..1. The x-axis is the population.
2. change the range and width of samples used to create the curves
3. change the y-axis range.
You can see the result in the funnel plot reproduced after the code below.
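(For reference, the curves drawn by the code are the usual binomial funnel limits around the fixed-effect estimate: if $\hat{p}$ is the precision-weighted mean rate and $n$ the population of an area, the limits are
\[ \hat{p} \pm z \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \]
with $z = 1.96$ for the 95% curves and $z = 3.29$ for the 99.9% curves.)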
> #TH: funnel plot code from:
> #stats.stackexchange.com/questions/5195/how-to-draw-funnel-plot-using-ggplot2-in-r/5210#5210
> #TH: Use our cancerdata
> number=cancerdata$Population
> #TH: The rate is given as a 'per 100,000' value, so normalise it
> p=cancerdata$Rate/100000
> p.se <- sqrt((p*(1-p)) / (number))
> df <- data.frame(p, number, p.se, Area=cancerdata$Area)
> ## common effect (fixed effect model)
> p.fem <- weighted.mean(p, 1/p.se^2)
> ## lower and upper limits for 95% and 99.9% CI, based on FEM estimator
> #TH: I'm going to alter the spacing of the samples used to generate the curves
> number.seq <- seq(1000, max(number), 1000)
> number.ll95 <- p.fem - 1.96 * sqrt((p.fem*(1-p.fem)) / (number.seq))
> number.ul95 <- p.fem + 1.96 * sqrt((p.fem*(1-p.fem)) / (number.seq))
> number.ll999 <- p.fem - 3.29 * sqrt((p.fem*(1-p.fem)) / (number.seq))
> number.ul999 <- p.fem + 3.29 * sqrt((p.fem*(1-p.fem)) / (number.seq))
> dfCI <- data.frame(number.ll95, number.ul95, number.ll999, number.ul999, number.seq, p.fem)
> ## draw plot
> #TH: note that we need to tweak the limits of the y-axis
> fp <- ggplot(aes(x = number, y = p), data = df) +
+ geom_point(shape = 1) +
+ geom_line(aes(x = number.seq, y = number.ll95), data = dfCI) +
+ geom_line(aes(x = number.seq, y = number.ul95), data = dfCI) +
+ geom_line(aes(x = number.seq, y = number.ll999, linetype = 2), data = dfCI) +
+ geom_line(aes(x = number.seq, y = number.ul999, linetype = 2), data = dfCI) +
+ geom_hline(aes(yintercept = p.fem), data = dfCI) +
+ xlab("Population") + ylab("Bowel cancer death rate") + theme_bw()
> #Automatically set the maximum y-axis value to be just a bit larger than the max data value
> fp=fp+scale_y_continuous(limits = c(0,1.1*max(p)))
> #Label the outlier point
> fp=fp+geom_text(aes(x = number, y = p,label=Area),size=3,data=subset(df,p>0.0003))
> print(fp)
[Funnel plot: bowel cancer death rate (y-axis, roughly 0 to 0.0003) against population (x-axis, 200,000 to 1,200,000), with 95% and 99.9% limit curves and the Glasgow City point labelled as the outlier.]