Achieve AI-Powered API Privacy
Using Open Source
Gianluca Brigandi
CEO : Atricore Inc. / Veridax
gianluca@veridax.com
What is this talk about?
Exploring concrete solutions – not YAPAT (Yet-Another-Privacy-Awareness-Talk)
Introduce a grass-roots approach for application privacy
What can we mix-and-match TODAY to start making progress before regulations hit you
How AI (DNN specifically) can enable new capabilities in terms of hardening our apps privacy-wise
About me: Just a curious guy
Developer, security researcher, entrepreneur and open source contributor
During the past 15 years I’ve architected products at the intersection of privacy, application and container security, Identity & Access Management and AI
Introduced first model-driven security solution back in 2011 (security-as-code but visual) with Fortune 500 and Defense clients
R&D on automating application privacy during the past couple of years
First computer: Commodore 64 !
What is Privacy?
“State or condition of being free from being observed or disturbed by other people.”
Privacy-by-design Principles
(by Prof. Ann Cavoukian)
Proactive not reactive; Preventative not remedial
Privacy as the default setting
Privacy embedded into design
Full functionality – positive-sum, not zero-sum
End-to-end security – full lifecycle protection
Visibility and transparency – keep it open
Respect for user privacy – keep it user-centric
1.2B Personal Records Breached in 2017
Cost breakdown
$140B Direct Cost
+ Class Action Liability
+ Lost Business
+ Regulator Penalties
Why “fixing” Privacy is challenging
Cuts across Infrastructure, Data, Applications and Processes
Has to address what’s inside and outside the perimeter
Must coordinate with laws and regulations (e.g. GDPR, CCPA)
Fairly new discipline: embryonic body of knowledge operating at 10K feet
Lack of accessible tools to enable faster adoption
Little insight on how to leverage security tools and techniques to introduce automation
Requires strong and effective Governance model with CxO support
Agile PbD – The bazaar mindset
(The Cathedral & the Bazaar - Eric Raymond)
Agile adoption with a grass-roots strategy
Leverages current enterprise security practices: threat modeling
Builds on open source security tech that can bring value to the table: static and dynamic code analysis, behavioral analytics
Plays nice with existing DevSecOps processes and toolchain: automate privacy controls
Proactive vs Reactive – Accommodating regulatory demands instead of reacting “out of the blue”
PbD - Cathedral vs. Bazaar
Cathedral | Bazaar
Policy-driven implementation | Engineering-driven
Hierarchical: Owned by compliance, top-down information flow | Graph: Owned by engineering team and compliance. Everyone can contribute
Siloed – disconnected from the security architecture | Built around the existing security architecture and capabilities
Infrastructure and Data First, Applications a second thought | Applications as first-class citizens
Mindset that buying COTS software will translate to solving the problem | Build and Adopt, buy as a last resort
From Idea to Implementation
PrivAPI
Challenges
Missing Dataset
Scale and variety of APIs – No standards!
Manual labeling too laborious and expensive
Consumption-ready PII is not publicly available
Lack of FOSS references for inspiration
High-level Architecture
Synthetic Dataset Generation: Bird’s Eye
[Diagram. Components shown: OpenAPI descriptors; OpenAPI stack; compiled API descriptor; PII types and their regexes; mock PII fields generation; mock REST request generation; unlabeled request; automatic labeling; oversampling; labeled mock API requests.]
Synthetic Dataset Generation: Flow
OpenAPI descriptor gets compiled
PrivAPI takes over request generation
Instead of sending it over the wire, it generates a mock request containing mocked fields based on the specified format (e.g. SSN, Dates)
Labels mock request based on trigger words
Oversamples minority class (i.e. PII requests)
Saves it
Note: It’s just a baseline. Augment it with “real world” data
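The flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not PrivAPI's actual code: the field names, the generators (`fake_ssn`, `fake_date`), the trigger-word list and the `POST /orders` request shape are all hypothetical stand-ins for what the real tool derives from OpenAPI descriptors and PII regexes.

```python
import random

# Hypothetical generators for mock PII values (assumed formats, not PrivAPI's).
def fake_ssn(rng):
    return f"{rng.randint(100, 899):03d}-{rng.randint(1, 99):02d}-{rng.randint(1, 9999):04d}"

def fake_date(rng):
    return f"{rng.randint(1950, 2005):04d}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"

PII_FIELDS = {"ssn": fake_ssn, "birth_date": fake_date}
PLAIN_FIELDS = {
    "sku": lambda rng: f"SKU{rng.randint(1000, 9999)}",
    "quantity": lambda rng: str(rng.randint(1, 20)),
}

# Assumed trigger words; a request mentioning any of them is labeled as PII.
TRIGGER_WORDS = ("ssn", "birth", "passport", "email")

def mock_request(rng, with_pii):
    """Build a mock request string instead of sending a real one over the wire."""
    fields = dict(PLAIN_FIELDS)
    if with_pii:
        name = rng.choice(sorted(PII_FIELDS))
        fields[name] = PII_FIELDS[name]
    body = "&".join(f"{key}={gen(rng)}" for key, gen in sorted(fields.items()))
    return f"POST /orders {body}"

def label(request):
    """Return 1 if the request contains a trigger word, else 0."""
    text = request.lower()
    return int(any(word in text for word in TRIGGER_WORDS))

def oversample(rows, rng):
    """Duplicate random minority-class rows until both classes are the same size."""
    pos = [row for row in rows if row[1] == 1]
    neg = [row for row in rows if row[1] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        return list(rows)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return rows + extra

def build_dataset(n=100, pii_ratio=0.2, seed=0):
    rng = random.Random(seed)
    requests = [mock_request(rng, with_pii=rng.random() < pii_ratio) for _ in range(n)]
    rows = [(req, label(req)) for req in requests]
    return oversample(rows, rng)
```

Calling `build_dataset()` returns a list of `(request, label)` pairs with both classes equally represented. Trigger-word labeling is deliberately naive here, which is why the baseline should be augmented with real-world data.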
Model Training: Bird’s Eye
[Diagram. Components shown: labeled API requests dataset; labeled mock API request; vocabulary creation; vectorize mock request; embeddings; LSTM deep neural network training (Keras + TensorFlow); produces the analytics model.]
Model Training: Flow
A. PrivAPI Dataset generated in the previous step gets loaded
B. Vocabulary is created from it
C. Vector embeddings are calculated for every API request
D. LSTM Deep Neural Network is created by learning from API requests
E. Analytics model is saved
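Steps B and C can be illustrated without any deep learning library. The sketch below builds a token vocabulary and turns each request into a fixed-length list of integer ids, the usual input for an embedding layer placed in front of an LSTM. The tokenizer, the reserved ids and the `max_len` default are assumptions made for the example, not details taken from PrivAPI.

```python
import re
from collections import Counter

PAD, UNK = 0, 1  # reserved ids: padding and out-of-vocabulary tokens

def tokenize(request):
    """Lowercase the request and split on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", request.lower()) if tok]

def build_vocabulary(requests, max_size=10000):
    """Map the most frequent tokens to integer ids, starting after the reserved ids."""
    counts = Counter(tok for req in requests for tok in tokenize(req))
    return {tok: idx for idx, (tok, _) in enumerate(counts.most_common(max_size), start=2)}

def vectorize(request, vocab, max_len=16):
    """Encode a request as exactly max_len ids, truncating or padding as needed."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(request)][:max_len]
    return ids + [PAD] * (max_len - len(ids))
```

For example, `vectorize("GET /users token=abc", vocab, max_len=6)` yields six ids, with unseen tokens mapped to `UNK` and the tail filled with `PAD`. Training the LSTM itself (step D) would happen in Keras on top of these sequences and is not shown here.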
Classifying: Bird’s Eye
[Diagram. Components shown: real world API traffic; API request; vectorization; embeddings; LSTM deep neural network prediction (Keras + TensorFlow), which consumes the analytics model; output: “Is PII” classification.]
Classifying: Flow
A. PrivAPI analytics model generated in the previous step is loaded, along with the vocabulary
B. Analytics model (LSTM) created in the previous step is loaded
C. Target “real” API request is read and vectorized
D. Prediction task is executed for API request
E. Prediction results – whether the submitted API request does or does not contain PII – are presented
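A rough sketch of steps A to E follows. It assumes the vocabulary was saved as JSON and treats the model as any callable that maps an encoded request to a probability. With Keras that callable would wrap the loaded model's `predict`. Here a hypothetical `stand_in_predict` takes its place so the example runs on its own, and the 0.5 threshold is likewise an assumption.

```python
import json
import re

def tokenize(request):
    return [tok for tok in re.split(r"[^a-z0-9]+", request.lower()) if tok]

def vectorize(request, vocab, max_len=16):
    ids = [vocab.get(tok, 1) for tok in tokenize(request)][:max_len]  # 1 = unknown token
    return ids + [0] * (max_len - len(ids))                            # 0 = padding

def classify(request, vocab, predict, threshold=0.5):
    """Vectorize one request, score it, and report whether it looks like PII."""
    score = float(predict(vectorize(request, vocab)))
    return {"request": request, "score": score, "is_pii": score >= threshold}

# A. Load the vocabulary (inlined here; normally read from a file).
vocab = json.loads('{"ssn": 2, "users": 3, "orders": 4}')

# B. Stand-in for the trained model: NOT an LSTM, just a placeholder scorer.
def stand_in_predict(ids):
    return 0.9 if vocab["ssn"] in ids else 0.1

# C-E. Read a request, vectorize it, predict, and present the result.
result = classify("POST /users ssn=123-45-6789", vocab, stand_in_predict)
print(result["is_pii"], result["score"])
```

Keeping `predict` as a parameter means the same `classify` function works unchanged once the placeholder is swapped for a real trained model.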
Going to Prod? Model quality is key
Synthetic dataset is just a baseline – augment with real world examples!
Introduce smarter (domain-specific) labeling through custom ‘fakers’ and Natural Language Processing techniques (e.g. NER)
Get human feedback
Allow the model to continuously improve based on new data (online learning)
Demo Time!
References
http://towardsdatascience.com/detecting-personal-data-within-api-communication-using-deep-learning-9e52a1ff09c6
https://github.com/veridax/privapi

DevSecCon London 2019 - Achieve AI-Powered API Privacy Using Open Source
