Automated Reverse-Engineering of a Cloud API
Stéphanie Challita, PhD
Associate Professor @University of Rennes 1, ESIR & IRISA/Inria, DiverSE team
30/09/2020 Stéphanie Challita @ENS seminar
Brief bio
▪ Degree in Systems and Networks Engineering, 2010 - 2015
▪ Research Master’s degree in Computer Science, 2014 - 2015
▪ PhD in Computer Science, 2015 - 2018
▪ Spirals team (Lille, CRIStAL, Inria)
▪ Postdoctoral Researcher, 2019 - 2020
▪ Kairos team (Sophia Antipolis, I3S, Inria)
▪ Associate Professor, since September 2020
▪ DiverSE team (Rennes, IRISA, Inria)
Research project
Towards the automatic construction of reliable and co-evolving software systems
[Diagram: Infer, Verify, and Co-evolution around models for Cloud, IoT…, and tests]
Scientific challenges
o Inference of Domain-Specific Modeling Languages (DSML) from APIs
• Precision & Genericity
• Learning
o Verification
• Generation of instances of inferred DSMLs
• Generation of oracles
o Co-evolution of APIs and languages
• Identify the impact of API changes on DSML
• Co-evolve the impacted components
Research team
▪ Modeling & Language Engineering
▪ Advanced testing
▪ DevOps for distributed and heterogeneous systems
▪ Variability Engineering
https://www.diverse-team.fr/
Cloud computing
Created by Sam Johnston, downloaded from https://en.wikipedia.org/wiki/Cloud_computing
Multi-cloud computing
Approaches for multi-clouds - Actors
[Diagram: the cloud provider offers cloud services; the cloud developer and the cloud architect use them]
Approaches for multi-clouds
[Diagram: cloud providers (public and private) offer APIs (AWS API, GCP API, DigitalOcean API, OCCI API, …); cloud developers use them through SDKs (AWS SDK, GCP SDK, DigitalOcean SDK, OCCI SDK, …), multi-cloud libraries, and cloud brokers; standards: OCCI, CIMI, …]
Approaches for multi-clouds
[Diagram: modeling stack M0 to M3 with Cloud Model, Cloud Metamodel, and Cloud Meta-metamodel, linked by "conforms to", "represented by", and "defines" relations]
▪ The cloud architect's model supports code generation, static analysis, documentation, and transformation
Approaches for multi-clouds
[Diagram: the same landscape of APIs, SDKs, multi-cloud libraries, cloud brokers, and standards (OCCI, CIMI, …), now with the cloud architect and the cloud modeling languages CloudML, CAMEL, TOSCA, OpenTOSCA, SALOON, and StratusML]
Approaches for multi-clouds
Issue:
Fuzziness of the concepts of the cloud modeling languages
Research question
RQ: Is it possible to automatically extract precise models from cloud APIs and to
synchronize them with the cloud evolution?
- How to provide an accurate description for a cloud API?
- How to correct the existing drawbacks in a cloud API documentation?
- How to analyze a cloud API documentation?
Research topics: API mining, reverse-engineering, NLP
Global vision
[Diagram: a Model-Driven Approach for the Cloud, inferred from cloud APIs (OCCI, GCP, AWS)]
Cloud API Documentation
[Diagram: cloud provider, cloud documentation, and cloud developer/architect, linked by a "conforms to" relation]
▪ Cloud documentation: an agreement with the developer on exactly how the system will operate
▪ Cloud documentation is written in natural language
→ human errors and/or semantic confusion
Vision
▪ Inferring models from cloud APIs
▪ Work of API mining, reverse-engineering
▪ HTML → Model
▪ Model refinement (NLP techniques, graphical output…)
[Diagram: from a Cloud API to a Cloud DSML, with the components Generic parser, Text Analysis Engine, Model Generator, and Model Validator]
Google Cloud Platform (GCP) use case
[Diagram labels: "Is partner with", "Is adopted by"]
List of GCP documentation drawbacks
▪ Informal heterogeneous documentation
▪ Imprecise types
▪ Implicit attribute metadata
▪ Hidden links
▪ Redundancy
▪ Lack of visual support
Imprecise types
Implicit attribute metadata
Available at https://cloud.google.com/compute/docs/reference/latest/networks
Available at https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets
GCP snapshot
▪ GCP engineers could update/correct GCP documentation
▪ Continuously tracking the GCP documentation is costly
▪ Snapshot of GCP API
[Pipeline diagram, step A: GCP documentation → Snapshot → GCP HTML pages]
GCP crawler & GCP model
[Pipeline diagram, steps A to C: GCP documentation → Snapshot → GCP HTML pages → GCP Crawler → GCP Model]
▪ GCP Crawler to extract all GCP resources, their
attributes and actions
▪ GCP Model for a better description of the GCP
resources
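The crawling step can be sketched as follows. This is a minimal illustration, not the GCP Crawler itself: it assumes each snapshot page lists a resource's attributes in an HTML table with three cells per row (name, type, description). The names `Attribute`, `Resource`, `AttributeTableParser`, `crawl_resource`, and the `SNAPSHOT` page are hypothetical.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Attribute:
    name: str
    type: str
    description: str


@dataclass
class Resource:
    name: str
    attributes: list = field(default_factory=list)


class AttributeTableParser(HTMLParser):
    """Collects the <td> texts of every table row that has exactly three cells."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Normalize whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if len(self._row) == 3:  # header rows use <th> and are skipped
                self.rows.append(self._row)
            self._row = None


def crawl_resource(name, html):
    """Builds a Resource from one snapshot page (assumed table layout)."""
    parser = AttributeTableParser()
    parser.feed(html)
    parser.close()
    return Resource(name, [Attribute(*row) for row in parser.rows])


# Hypothetical snapshot page, written for this example only.
SNAPSHOT = """
<table>
  <tr><th>Property</th><th>Type</th><th>Description</th></tr>
  <tr><td><code>name</code></td><td>string</td><td>[Required] Name of the network.</td></tr>
  <tr><td><code>id</code></td><td>unsigned long</td><td>[Output-only] Unique identifier.</td></tr>
</table>
"""

network = crawl_resource("Network", SNAPSHOT)
for attribute in network.attributes:
    print(attribute.name, attribute.type, attribute.description)
```

Working on a local snapshot rather than the live pages keeps the extraction repeatable even if the documentation changes.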
GCP crawler & GCP model
[Diagram: the crawler pipeline placed in the modeling stack M0 to M3, with GCP configuration, GCP model, GCP doc, OCCIware Metamodel, and Ecore Metamodel, linked by "conforms to" and "represented by" relations]
[Pipeline diagram: GCP documentation → Snapshot → GCP HTML pages → GCP Crawler → GCP Model, followed by Model Transformations]
Model Transformations:
▪ Implicit Attribute Metadata Detection
▪ Link Identification
▪ Redundancy Removal
▪ Type Refinement
▪ Model Visualization
Implicit attribute metadata detection
▪ To explicitly store information into additional attributes defined in the ATTRIBUTE
concept of our GCP MODEL
▪ We use Natural Language Processing (NLP) techniques
▪ Word Tagging/Part-of-Speech (PoS)
▪ We declare pre-defined tags for some GCP specific attribute properties:
▪ mutable = true if [Input-Only]
▪ mutable = false if [Output-only]/read only
▪ required = true if [Required]
▪ required = false if [Optional]
▪ default = X if The default value is X
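The tag rules above can be sketched as a small rule table. Note that this sketch swaps the PoS tagging mentioned on the slide for plain regular expressions, which is only an approximation; `TAG_RULES`, `DEFAULT_RULE`, `detect_metadata`, and the sample descriptions are hypothetical.

```python
import re

# (pattern, metadata key, value) triples mirroring the pre-defined tags.
TAG_RULES = [
    (re.compile(r"\[input[- ]only\]", re.I), "mutable", True),
    (re.compile(r"\[output[- ]only\]|read[- ]?only", re.I), "mutable", False),
    (re.compile(r"\[required\]", re.I), "required", True),
    (re.compile(r"\[optional\]", re.I), "required", False),
]

# "The default value is X": capture X up to the next space or punctuation mark.
DEFAULT_RULE = re.compile(r"the default value is\s+([^\s.,;]+)", re.I)


def detect_metadata(description):
    """Returns the metadata made explicit from one attribute description."""
    metadata = {}
    for pattern, key, value in TAG_RULES:
        if pattern.search(description):
            # The first matching rule wins if two rules set the same key.
            metadata.setdefault(key, value)
    match = DEFAULT_RULE.search(description)
    if match:
        metadata["default"] = match.group(1)
    return metadata


print(detect_metadata("[Output-only] Server-defined URL for the resource."))
print(detect_metadata("[Optional] Table lifetime. The default value is 3600."))
```

The extracted values would then be stored in the additional attributes of the ATTRIBUTE concept rather than left in free text.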
Perspectives
▪ More generic crawler to support any API
▪ No longer tied to OCCI, and accept inputs other than HTML pages
▪ Implement GCP studio, a model-based framework that relies on this approach
to design and deploy GCP applications
▪ Automated approach to handle the evolution of GCP
▪ Incrementally detect streaming modifications, by calculating and modifying only the
differences between the initially processed version and the newly modified one
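One way to picture this incremental update is a diff between two versions of the model. This is a minimal sketch of the idea, not an existing implementation: it assumes a model reduced to a dictionary mapping each resource to its attributes and their types, and the names `diff_models`, `apply_changes`, and the sample data are hypothetical.

```python
import copy


def diff_models(old, new):
    """Lists the attribute-level differences between two model versions."""
    changes = []
    for resource in sorted(old.keys() | new.keys()):
        old_attrs = old.get(resource, {})
        new_attrs = new.get(resource, {})
        for attr in sorted(old_attrs.keys() | new_attrs.keys()):
            if attr not in old_attrs:
                changes.append(("added", resource, attr, new_attrs[attr]))
            elif attr not in new_attrs:
                changes.append(("removed", resource, attr, old_attrs[attr]))
            elif old_attrs[attr] != new_attrs[attr]:
                changes.append(("changed", resource, attr, new_attrs[attr]))
    return changes


def apply_changes(model, changes):
    """Updates the model in place, touching only what the diff reports."""
    for kind, resource, attr, value in changes:
        attrs = model.setdefault(resource, {})
        if kind == "removed":
            attrs.pop(attr, None)
            if not attrs:  # simplification: drop resources left without attributes
                del model[resource]
        else:
            attrs[attr] = value
    return model


# Hypothetical model versions, written for this example only.
old = {"Network": {"name": "string", "mtu": "integer"}}
new = {"Network": {"name": "string", "mtu": "long"}, "Dataset": {"id": "string"}}

changes = diff_models(old, new)
print(changes)
updated = apply_changes(copy.deepcopy(old), changes)
```

Only the reported differences are reprocessed, so the cost of an update follows the size of the change rather than the size of the documentation.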
Research internship for M2 students
Research needs you!
Thank you!
Stéphanie Challita, Faiez Zalila, Christophe Gourdin, Philippe Merle. “A Precise Model for Google Cloud Platform”. IEEE International Conference on Cloud Engineering (IC2E). 2018.
https://github.com/occiware/GCP-Model
stephanie.challita@inria.fr
https://stephaniechallita.github.io/