Using Data Effectively: Beyond Art and Science (C4Media)
Video and slides synchronized; mp3 and slide download available at https://bit.ly/2Qc0N3W.
Hilary Parker talks about approaches and techniques to collect the most useful data, analyze it in a scientific way, and use it most effectively to drive actions and decisions. This talk provides actionable insights about how to apply the lens of design thinking to help use data more effectively in your everyday job. Filmed at qconsf.com.
Hilary Parker is a Data Scientist at Stitch Fix where she focuses on what sorts of data to collect from clients in order to optimize clothing recommendations, as well as building out prototypes of algorithms or entirely new products based on new data sources. She is a co-founder of the Not So Standard Deviations podcast, a bi-weekly data science podcast that has over half a million downloads.
Traps detection during migration of C and C++ code to 64-bit Windows (PVS-Studio)
The appearance of 64-bit processors on the PC market confronted developers with the task of porting old 32-bit applications to new platforms. After migrating the application code, it is highly probable that the code will work incorrectly. This article reviews questions related to software verification and testing. It also covers the difficulties a developer of a 64-bit Windows application may face and ways of solving them.
What is Rated Ranking Evaluator and how to use it (for both Software Engineers and IT Managers). Talk given during the Chorus Workshops at Plainschwarz Salon.
Adaptation of the technology of the static code analyzer for developing paral... (PVS-Studio)
The article considers the use of static code analyzers in modern parallel program development processes. Having appeared in the 70s-80s as an addition to compilers, static analyzers lost popularity with developers in the 90s, probably because compilers improved the quality of their own error diagnostics. In the 2000s, interest in static code analyzers started to grow again, because new analyzers appeared that could detect quite complex errors in programs. Where the static analyzers of the past could, for example, detect an uninitialized variable, modern static analyzers aim to detect unsafe access to data from several threads. The current trend in static analyzer development is their use for diagnosing errors in parallel programs. The article considers situations where such tools can considerably simplify the creation of parallel software.
Regular use of static code analysis in team development (Andrey Karpov, PVS-Studio)
Static code analysis technologies are used in companies with mature software development processes. However, there can be different levels of using and introducing code analysis tools into a development process: from manually launching an analyzer "from time to time" or when searching for hard-to-find errors, to an everyday automatic launch or a launch whenever new source code is added to the version control system.
The article discusses different levels of using static code analysis technologies in team development and shows how to "move" the process from one level to another. The article refers to the PVS-Studio code analyzer developed by the authors as an example.
The article observes some questions related to testing 64-bit software. Some difficulties which a developer of resource-intensive 64-bit applications may face, and ways to overcome them, are described.
Static analysis as part of the development process in Unreal Engine (PVS-Studio)
Unreal Engine continues to develop as new code is added and previously written code is changed. What is the inevitable consequence of ongoing development in a project? The emergence of new bugs in the code that a programmer wants to identify as early as possible. One of the ways to reduce the number of errors is the use of a static analyzer like PVS-Studio. Moreover, the analyzer is not only evolving, but also constantly learning to look for new error patterns, some of which we will discuss in this article. If you care about code quality, this article is for you.
Similar to Opinionated Analysis Development -- EARL SF Keynote (20)
Using Data Effectively: Beyond Art and Science (Hilary Parker)
Data is the lifeblood of every organization. Whatever our job title, each of us uses data to get our job done -- from observing a running system to improving performance to building a machine-learned model. This talk is about approaches and techniques to collect the most useful data we can, analyze it in a scientific way, and use it most effectively to drive actions and decisions.
Is using data effectively an art or a science? It is both. The “art” helps us decide the “right” way to approach an analysis or an algorithm. The “science” applies statistical rigor to our inferences. But with only the art and the science, we miss something critically important. In this talk, I suggest that, beyond both art and science, the fundamental questions we need to ask of our data should be informed by the field of design and design thinking. Every designer needs to understand their intended user, whether they are designing a physical object, experimenting on a recommendation system, or making a launch decision about a product. That focus on the why -- instead of just the how and the what -- takes us to the next level.
This talk will leave you with actionable insights about how to apply the lens of design thinking to help you use data more effectively in your everyday job.
Adjusting primitives for graph: SHORT REPORT / NOTES (Subhajit Sahu)
Notes on graph algorithms like PageRank. Compressed Sparse Row (CSR) is an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential-execution-based vs. OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs. bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential-execution-based vs. OpenMP-based vector element sum.
2. Performance of memcpy-based vs. in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... (pchutichetpong)
M Capital Group (“MCG”) expects to see demand and supply continue to evolve, facilitated by institutional investment rotating out of offices and into work from home (“WFH”), while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, illustrated by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
29. There are known, common problems when developing an analysis. Making these problems easy to avoid frees up time for the creativity of developing the narrative.
30. You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
You change code but don’t re-execute it, so the results are out of date.
You update your analysis and can’t compare it to previous results.
You can’t point to the code changes that resulted in different analysis results.
A second analyst can’t understand your code.
You can’t re-use logic in different parts of the analysis.
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Your data becomes corrupted, but you don’t notice.
You use a new dataset that renders your statistical methods invalid, but you don’t notice.
You make a mistake in your code.
You use inefficient code.
A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
31. You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
You change code but don’t re-execute it, so the results are out of date.
You update your analysis and can’t compare it to previous results.
You can’t point to the code changes that resulted in different analysis results.
A second analyst can’t understand your code.
You can’t re-use logic in different parts of the analysis.
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Your data becomes corrupted, but you don’t notice.
You use a new dataset that renders your statistical methods invalid, but you don’t notice.
You make a mistake in your code.
You use inefficient code.
A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
Reproducible
Accurate
Collaborative
35. The person did not want to make an error, and acted in a way that she thought wouldn’t create an error. The current system failed the person with good intentions.
49. • Identify the incident, record what happened
• Change the system to make it hard for a person acting in good faith to have the incident occur
• Use outside resources and best practices to inform decisions
• Make trade-offs
54. “An engineer who thinks they’re going to be reprimanded is disincentivized to give the details necessary to get an understanding of the mechanism, pathology, and operation of the failure.” — John Allspaw
https://codeascraft.com/2012/05/22/blameless-postmortems/
55. You re-run the analysis and get different results.
Someone else can’t repeat the analysis.
You can’t re-run the analysis on different data.
An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
You change code but don’t re-execute it, so the results are out of date.
You update your analysis and can’t compare it to previous results.
You can’t point to the code changes that resulted in different analysis results.
A second analyst can’t understand your code.
You can’t re-use logic in different parts of the analysis.
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
Your data becomes corrupted, but you don’t notice.
You use a new dataset that renders your statistical methods invalid, but you don’t notice.
You make a mistake in your code.
You use inefficient code.
A second analyst wants to contribute code to the analysis, but can’t do so.
Two analysts want to combine code but cannot.
You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
57. Executable analysis scripts: You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data.
Defined dependencies: An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated.
Watchers for changed code and data: You change code but don’t re-execute it, so the results are out of date.
Version control (individual): You update your analysis and can’t compare it to previous results. You can’t point to the code changes that resulted in different analysis results.
Code review: A second analyst can’t understand your code.
Modular, tested code: You can’t re-use logic in different parts of the analysis. You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice.
Assertive testing of data, assumptions and results: Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice.
Code review: You make a mistake in your code. You use inefficient code.
Version control (collaborative): A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot.
Issue tracking: You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
58. Executable analysis scripts: You re-run the analysis and get different results. Someone else can’t repeat the analysis. You can’t re-run the analysis on different data.
Defined dependencies: An external library you’re using is updated, and you can’t recreate the results. You change code but don’t re-run downstream code, so the results aren’t consistently updated.
Watchers for changed code and data: You change code but don’t re-execute it, so the results are out of date.
Version control (individual): You update your analysis and can’t compare it to previous results. You can’t point to the code changes that resulted in different analysis results.
Code review: A second analyst can’t understand your code.
Modular, tested code: You can’t re-use logic in different parts of the analysis. You change the logic used in analysis, but only in some places where it’s implemented. Your code is not performing as expected, but you don’t notice.
Assertive testing of data, assumptions and results: Your data becomes corrupted, but you don’t notice. You use a new dataset that renders your statistical methods invalid, but you don’t notice.
Code review: You make a mistake in your code. You use inefficient code.
Version control (collaborative): A second analyst wants to contribute code to the analysis, but can’t do so. Two analysts want to combine code but cannot.
Issue tracking: You aren’t able to track and communicate known next steps in your analysis. Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
“Opinions”
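As a concrete illustration of the “Assertive testing of data, assumptions and results” opinion above, here is a minimal sketch in Python; the check_orders function, the column names, and the orders.csv file are hypothetical, not something prescribed in the talk.

import pandas as pd

def check_orders(df):
    # Fail loudly if the data violates the assumptions the analysis relies on.
    assert not df.empty, "orders table is empty"
    assert df["order_id"].is_unique, "duplicate order ids"
    assert (df["amount"] >= 0).all(), "negative order amounts"
    assert df["order_date"].notna().all(), "missing order dates"
    return df

# Validate immediately after loading, before any analysis runs.
orders = check_orders(pd.read_csv("orders.csv"))

Because checks like these run every time the analysis script runs, corrupted data or a dataset that breaks an assumption fails the run instead of silently producing results.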
62. An external library you’re using is updated, and you can’t recreate the results.
You change code but don’t re-run downstream code, so the results aren’t consistently updated.
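One lightweight way to soften the library-update problem is to record the versions of key packages alongside every set of results, so old results can be traced back to the library versions that produced them. The sketch below is one possible approach, not something from the talk; snapshot_versions, the output path, and the package list are hypothetical.

import importlib.metadata as md
import json

def snapshot_versions(packages, path="package_versions.json"):
    # Record the installed version of each dependency next to the analysis output.
    versions = {pkg: md.version(pkg) for pkg in packages}
    with open(path, "w") as f:
        json.dump(versions, f, indent=2)
    return versions

snapshot_versions(["pandas", "numpy"])

Pinning those versions in a requirements file (or an renv or packrat lockfile in R) goes a step further and lets the whole environment be recreated later.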
66. You can’t re-use logic in different parts of the analysis.
You change the logic used in analysis, but only in some places where it’s implemented.
Your code is not performing as expected, but you don’t notice.
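A minimal sketch of the “modular, tested code” remedy for these problems: the shared logic lives in exactly one function, and a small test catches silent misbehavior when that logic changes. The function normalize_spend and its test are hypothetical names used only for illustration.

def normalize_spend(total_spend, n_days):
    # Shared logic, defined once and imported wherever the analysis needs it.
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    return total_spend / n_days

def test_normalize_spend():
    # Run with pytest; fails loudly if the shared logic stops behaving as expected.
    assert normalize_spend(70.0, 7) == 10.0
    assert normalize_spend(0.0, 3) == 0.0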
68. You update your analysis and can’t compare it to previous results.
You can’t point to the code changes that resulted in different analysis results.
Two analysts want to combine code but cannot.
74. You aren’t able to track and communicate known next steps in your analysis.
Your collaborators can only make requests in email or meetings, and they aren’t incorporated into the project.
75.
76. WHY DID YOU SPEND MOST OF YOUR KEYNOTE TIME ON PROCESS?
79. “HILARY, I NEVER FIGHT WITH ANYONE ELSE EXCEPT FOR YOU” — a friend at his birthday party
80. “A funny thing happens when engineers make mistakes and feel safe when giving details about it: they are not only willing to be held accountable, they are also enthusiastic in helping the rest of the company avoid the same error in the future. They are, after all, the most expert in their own error. They ought to be heavily involved in coming up with remediation items.” –John Allspaw
https://codeascraft.com/2012/05/22/blameless-postmortems/
85. “You should be using GitHub.”
“If this is a project where you want to compare past and future results, you might want to use version control. GitHub is a good option, and RStudio integrates well with it.” –actually helpful coworker
87. “Excel is the worst.”
“Excel does make it hard to create scripts. If you want to redo your analysis frequently, writing a script will take time now but save you lots of time later.” –actually helpful coworker
88. Shift from “I know better than you” to “lots of people have run into this problem before, and here’s software that makes a solution easy”.
90. “Opinionated Software is a software product that believes a certain way of approaching a business process is inherently better and provides software crafted around that approach.” — Stuart Eccles
https://medium.com/@stueccles/the-rise-of-opinionated-software-ca1ba0140d5b#.by7aq2c3c
91. “It’s a strong disagreement with the conventional wisdom that everything should be configurable, that the framework should be impartial and objective. In my mind, that’s the same as saying that everything should be equally hard.” –David Heinemeier Hansson, creator of Ruby on Rails