Meet Patrick Boily, Director at CQADS
by Roy Sarkar, Rogue Wave Software | October 6, 2016
This interview was originally made available on Rogue Wave’s Code Buzz blog, at
blog.klocwork.com/.
It has been edited to fit this document’s format.
We recently spoke with Patrick Boily, manager and senior consultant at the
Centre for Quantitative Analysis and Decision Support (CQADS). The centre
is offering a three-day series of data analysis workshops this month, letting
budding data scientists dive into the world of data science, data mining, and
extracting useful insights.
Can you tell us about CQADS and its objectives?
CQADS opened its doors in 2013. We’re housed under Carleton University’s
Faculty of Science but our consultants have mostly been drawn from the ranks
of profs and students at the School of Mathematics and Statistics.
The main raison d’être for the Centre is that, while the best decisions are those
backed by evidence, the ideal decision-making environment is still rarely met –
scarcity of data often means that we let tradition and instinct take over, while
an excess means that we risk drowning in data.
CQADS has four main objectives:
• Providing consulting services and sharing our expertise while solving
  real-world and academic problems
• Providing funding, training, and experience for graduate students and
  postdoctoral researchers
• Facilitating collaborations through cross-disciplinary research involving
  mathematics and statistics
• Stimulating the dissemination of quantitative research through short
  courses and seminars
Can you describe the workshop series and what it offers participants?
The 2017 Winter Workshop Series combines the first five of the Centre’s
data analysis workshops (there are 17 in total, see the CQADS Workshop
Catalogue); they cover what I consider to be the bare minimum that a data
scientist or a data analyst should have in their grab bag before they can embark
safe and sound on the data analysis boat:
1. Introduction to Analytics – Preparing and Visualizing Data
An introduction to the notions that must be mastered prior to, and after,
embarking on data analysis, along with a discussion of common challenges and
pitfalls.
2. Mining for Information Gold – Data Science Concepts
An introduction to the fundamental notions underlying data science, with
a detailed discussion of three common analytical concepts: classification,
clustering, and association rules.
3. Simple Data Science – Exploring Data With R
A practical look at how to extract insight from data using the concepts of
Mining for Information Gold.
Centre for Quantitative Analysis and Decision Support − Carleton University
4332 Herzberg Laboratories − 1125 Colonel By Drive − Ottawa ON, K1S 5B6
cqads@carleton.ca
4. Getting Technical – Data Science Methods
A continuation of Mining for Information Gold, with a detailed discussion of six
commonly-used methods and concepts: support vector machines, artificial neural
networks, naive Bayes classifiers, hierarchical clustering, density-based clustering,
and spectral clustering.
5. Hands-on Data Discovery – Data Analysis With R
A practical look at how to extract insight from data using the methods of Getting
Technical.
Let’s face it, nobody is going to become a data scientist with only three days’
worth of training. I could take any of the topics covered in any of these workshops
and spend three days on them and I still would only be scratching the surface.
It takes time, it takes practice and study to learn how to deal with data in an
insightful manner.
What the workshops are going to give participants is a good look under the data
science hood, without getting lost in details and minutiae. There will be some
deeper dives into some technical areas, but we will resurface early enough not
to lose sight of the larger context. There will be opportunities to play with data,
to discuss good practices and case studies, to take a representative snapshot
of the data science landscape (with some noted exceptions, see below). And
there’s a reception on the first night, too!
These are not workshops about mathematical formulas or Big Data. Familiarity
with mathematical notation and concepts will help, but mathematical sophistication
is not required. Neither is programming expertise. Big Data we tackle in
future workshops.
Why do you feel this workshop is necessary when there is plenty of existing literature on data science and
analytics?
In my experience with clients over the years, I’ve noticed that there is no shortage of desire to incorporate data
and analytics in the everyday operations of most organizations. That goal affects employees and stakeholders in
different ways:
• Managers are required to understand enough about data to be able to ask the right questions of their analysts,
  in order to provide them with the right data and tools to succeed, and to hire the right experts in the first
  place.
• Analysts are faced with different challenges when organizations become data-friendly (especially if the
  change is sudden). Engineers, economists, sociologists, psychologists, programmers have all worked with
  data at some point in their careers; they’ve heard about support vector machines or neural networks, say.
  Learning about these concepts can allow them to remain part of the organization’s vision, so to speak... but
  they might find that branching out on their own into the world of data science can be daunting.
• Experts, those who design algorithms, those who can sit through a graduate course in pattern recognition
  or deep learning without flinching, what they might need is the ability to speak the language of the analysts
  and the managers.
These are the main services that I think the workshops provide: a visitors’ map for managers, a translation guide
for experts, and a starting point for analysts whose experience lies in other quantitative domains, all rolled into
one.
It’s much easier to find the right literature and online learning tools with a guide, fellow travellers, and a dictionary.
Tell us a little bit about yourself and your role with CQADS.
I’ve come to data science and consulting the long way around. To give you an idea, I only got my first email
address when I started grad school – it just wasn’t that necessary at the time! Imagine living without one now. I
studied pure mathematics and didn’t really play with data until I graduated.
Even though I don’t always understand the appeal of some of the apps flying around, I’m generally glad to see
interest in analytical endeavours grow, and I like to see it done well.
I’ve been the Director and Managing Consultant at CQADS since its opening: over the years, I’ve had the chance
to work on plenty of quantitative projects, providing expertise, and supervising graduate students in operations
research methods, data science and predictive analytics, stochastic and statistical modeling, and simulations. I
also head the training group which is putting together the data analysis workshops. And for the time being, I lead
the workshops (although I am all in favour of incorporating other voices).
Is there a particular achievement that you’re most proud of for CQADS?
Perhaps my proudest achievement at CQADS has been the opportunity to see some of my former protégés
successfully strike out on their own using the quantitative skills and real-life experience they picked up at the
Centre.
It sounds a bit trite, I know, but there’s a world of difference between learning from a book (or from a course, or
even a workshop!) and being able to derive significant insight from real-world data. It’s a nice transformation to
witness and to be part of.
What are your thoughts on where data science and analytics will
be in five years?
I’m really interested in finding out where we are, currently, on the
hype curve: is the sky still the limit, or are we about to run out of
new and interesting ideas?
I like to use the Standard Model of particle physics as an analogy.
Prior to its establishment, we were discovering new particles left,
right, and centre: it was a jungle out there. Now comes the Standard
Model, you see, and it’s a nice garden (compared to what came
before, at least), and... well, that’s it. It stayed that way for years.
Look, it’s a massive achievement. We found the predicted Higgs boson; neutrino oscillations fit nicely within
its framework; all in all an A+ of a model. And yet we know that can’t be all there is to it: we don’t know how to
explain the masses of the elementary constituents, the weak mixing angle, and some of the coupling constants,
for instance. It didn’t naturally and organically lead to a grand unified theory (GUT), or to gigantic conceptual shifts. We’ve become really
good at the Standard Model; paradoxically, its success has become a weakness in the grander scheme of things.
To my mind, that’s where we are with data science and analytics. In five years, are we going to be doing roughly
the same things we’re doing now (except quicker), or are we going to be doing completely different things? Five
years ago, who could have predicted that nearly everybody would own a smart phone? Or that Netflix would have
changed the way we had been watching TV for generations? Perhaps some visionaries did, but I for one didn’t: I
thought the phones were a fad that would go the way of laser discs.
My guess is that we’re about to reach the top of the hype curve. Perhaps that’s the old man in me talking. Yes,
we’ll develop tons of new apps, and some of them will be shiny and popular, but how many of them will be game
changers? How many “smart phones” do we have in us over the next 5 years? How many “Google glasses”?
Either way, I’d like to see us do more about the ethics of data science. To my eye, there definitely is an Old West
mentality on that front: we “do” data science because we can. I’ve seen proposals for applications that are –
well, let’s say that I don’t find them very compatible with the Hippocratic oath. Perhaps our notions of privacy and
common good and inequities will be radically different in 30 years. But in the next 5 years, they’re not likely to
change much. Will our descendants be able to look at us and say that we were on the right side of history? I do
feel that this discussion gets tabled way too often.
Any advice for organizations wishing to include data analytics as part of their decision-making process?
Infrastructure is important: you have to have the right tools. But to get the right results, you have to be able to
analyze the data properly, and to get the right data, you have to be able to ask the right questions. And that takes
the right people.
This is crucial: you need the right data analysts and data scientists and data translators. Your people need to be
able to do more than just press a button on an expensive piece of software – they need to understand what’s
going on in the black box.
In the rush to be competitive, sometimes we choose the data specialist who knows the right software, rather than
the right data specialist who could learn the software. That can prove costly in the long run.
Roy Sarkar figured that the best way to learn something is to try and explain it to someone else. After years of explaining things while standing up, he
decided the better approach was to do it while sitting down. Beside a poster of a famous starship. Learning from projects in defense, mobile, and game
development, Roy figured out one more thing: real code isn’t dead but it could be made better. Read more from Roy at CodeBuzz.