This presentation was provided by Johnny Boursiquot of Skilltype, during the NISO event "Transforming Search: What the Information Community Can and Should Build." The virtual conference was held on August 26, 2020.
Boursiquot "Privacy and The Effective Search Experience"
1. Responsible Intelligence
Johnny Boursiquot
Hello everyone.
I'd like to take you on a brief journey,
one that led me to some profound shifts in the way I look at the use of technology to solve people's problems.
2. As a user, I would like to find relevant and quality
content for my professional development, so that I
can advance my career.
Those of you who have worked in software development or helped to define software product roadmaps will find the structure of this user story quite familiar.
It is a short description of a feature and its benefits, told from the perspective of the type of user it is intended to benefit.
3. As engineers, our view of the world can be quite simple. We see technical problems that need technical solutions. It's our comfort zone. Allow me to demonstrate.
4. As a user, I would like to find relevant and quality
content for my professional development, so that I
can advance my career.
Identity Search Relevance Ratings
A cloud technologist picking this story from the backlog may derive the following needs from it:
- User's identity - we need to know who's using the software
- Content Search - we need to index content and make it searchable. We'll also want to suggest corrections for commonly misspelled words to make the search
experience more useful.
- Content Relevance - we need to understand what's relevant to this specific user so we'll use previous behavior to find out what they like
- Content Quality / Rating - Quality or rating is a moving target so we need to allow for crowdsourced ratings. That means we need to capture all users likes and dislikes
as well.
Armed with these requirements, the desire to offer the best experience possible, and the impetus to not reinvent any wheels, I start shopping.
5. And let me tell you, this is the best part of my job. That’s me, brimming with excitement, clearly.
It has never been easier for organizations, large and small, and even individuals on a tight budget, to leverage AI/ML technologies that less than 5 years ago were
accessible only to large tech companies.
6. Using AI/ML cloud services that cost pennies on the dollar per compute hour, we can
1. Discover insights and relationships in text using NLP (Amazon Comprehend)
2. Provide personalized content recommendations tailored to your profile and consumption habits. (Amazon Personalize)
3. Using advanced OCR, extract key terms from an uploaded resume to automatically map interests to your profile. (Amazon Textract)
And on and on…
7. As a user, I would like to find relevant and quality
content for my professional development, so that I
can advance my career.
So as a technologist tasked with engineering this seemingly benign but useful product feature, a few months ago I would have simply jumped in and solved a technical
problem. Having selected my AI/ML tools of choice, I would have broken the work up into its component parts and worked diligently with my engineering team to
deliver the capability to our users.
That was the plan and it was a simple one.
8. Then I read this book.
Then this report.
And this one.
I’ve heard countless questions and discussions related to privacy and data ownership during our monthly product updates with customers and prospects.
And as I continued to immerse myself in the Information Professional industry, the concerns became clearer with each of these signals.
9. The technologies we build have bias because we have bias.
New technology can perpetuate the status quo.
Transparency and consent can help.
1. The technologies we build have bias because we have bias.
2. The content and opportunities produced or surfaced using technology can often perpetuate the status quo.
3. (Most pointedly for us developers who use these building blocks to build products on top of them...) Transparency and consent need to play a larger role.
10. I have a confession.
I have a confession to make. I've been taking my privacy for granted—even as a 22-year veteran of the tech industry—knowing full well some of the ways user data
privacy and security can be abused.
I’ve left it up to others — whoever that might be — to keep profit-motivated entities honest with regards to my data and online privacy.
Now that I am responsible for the technology direction of one of those profit-motivated entities, I am forced to reconcile the privacy needs of my users with the impact of
the technology I'm using to solve problems for them.
Make no mistake, whether you are in the for-profit or non-profit sectors of this industry, you too will need to reconcile your use of AI/ML technologies with the impact they
have on your employees, your users, your patrons.
12. As a user, I would like to find relevant and quality
content for my professional development, so that I
can advance my career.
Identity Search Relevance Ratings
If you recall, part of the data we need to make this a great search experience and to provide useful recommendations to our user requires that we not only identify our
users but also collect behavioral data from them over time.
13. [Diagram Placeholder]
Now imagine that we have the mechanism in place to capture and store every search, every like, every bookmark, and every page visit, virtually every worthwhile
interaction with the software.
14. [Diagram Placeholder]
And because we know who you are and want to provide you, specifically, with a great user experience, we tag all of that data with your Personally Identifiable Information
(PII).
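As a concrete sketch of what that looks like in practice (the field names and values here are hypothetical, not from the slides), every behavioral event record ends up carrying the user's identifying details directly:

```python
# Anti-pattern sketch (hypothetical schema): each event embeds the user's PII.
# Every component, report, or vendor granted access to the event log sees it all.
event = {
    "user_email": "jane@example.org",    # PII embedded in the record
    "user_name": "Jane Doe",             # PII again
    "action": "search",
    "query": "cloud certifications",
    "timestamp": "2020-08-26T14:03:00Z",
}
```

Multiply that record by every search, like, bookmark, and page visit, and the PII is now woven through your entire data layer.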
15. [Diagram Placeholder]
That is one of the greatest mistakes you can make early on in your architecture. As your system grows over time and new components get added
and are granted access to that data, PII starts to leak into these components, sometimes even out to third-party vendors or partners -- entities your users never agreed
to share their data with. If you're ever curious how costly this approach is to address, ask any company that has had to embark on a GDPR compliance remediation effort.
16. [Diagram Placeholder]
So then, how do we set ourselves up for success here?
Simple (though not easy): Keep PII data at the edge of your platform as much as possible. Here's what that means.
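One way to sketch the idea (the class and function names here are illustrative assumptions, not part of the talk): a small identity service at the edge is the only place PII lives, mapping it to opaque tokens, and everything downstream -- search logs, recommendation data, analytics -- stores only the token:

```python
import uuid

class IdentityEdge:
    """Hypothetical edge service: the ONLY component that holds PII.
    Everything downstream sees an opaque token instead."""
    def __init__(self):
        self._pii_to_token = {}   # email -> opaque token
        self._token_to_pii = {}   # opaque token -> PII record

    def tokenize(self, email, name):
        """Return a stable opaque token for this user, minting one if needed."""
        if email not in self._pii_to_token:
            token = uuid.uuid4().hex
            self._pii_to_token[email] = token
            self._token_to_pii[token] = {"email": email, "name": name}
        return self._pii_to_token[email]

    def forget(self, email):
        """Honor a right-to-be-forgotten request: drop the mapping,
        leaving the downstream event history effectively anonymous."""
        token = self._pii_to_token.pop(email, None)
        if token:
            del self._token_to_pii[token]

class EventStore:
    """Downstream behavioral store: records tokens and actions, never PII."""
    def __init__(self):
        self.events = []

    def record(self, token, action, item):
        self.events.append({"user": token, "action": action, "item": item})

edge = IdentityEdge()
store = EventStore()
token = edge.tokenize("jane@example.org", "Jane Doe")
store.record(token, "search", "cloud certifications")
store.record(token, "like", "article:123")
```

With this split, new components and third parties can be granted access to the event store without ever touching PII, and a GDPR erasure request reduces to deleting one mapping at the edge.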
17. • The Information profession is still very much in the early days of understanding
the impact of AI/ML technology.
• Design new systems with user privacy in mind.
• Lean into the uncomfortable privacy and security issues that surround
user data.
I believe the Information profession is still very much in the early days of understanding the impact that AI/ML technology will have on the people it serves, for better or
worse.
For the technologists in the room, the builders, the product managers, the entrepreneurs, and the visionaries, don't punt on digging into the uncomfortable privacy and
security issues that surround user data. When taking on these new projects, insist on having a plan for how to treat personal information and how to honor users'
requests to be forgotten.
It is our responsibility as those closest to the technology to surface its tradeoffs to our community of decision makers and project sponsors.
And lastly, to the scholars and the data privacy champions, keep doing what you do best to bring awareness to misuse and risks of these technologies.