Code Is Not Text!
How graph technologies can help us to
understand our code better
Andreas Dewes (@japh44)
andreas@quantifiedcode.com
21.07.2015
EuroPython 2015 – Bilbao
About
Physicist and Python enthusiast
We are a spin-off of the
University of Munich (LMU):
We develop software for data-driven code
analysis.
How we usually think about code
But code can also look like this...
Our Journey
1. Why graphs are interesting
2. How we can store code in a graph
3. What we can learn from the graph
4. How programmers can profit from this
Graphs explained in 30 seconds
node / vertex
edge
node_type: classdef
name: Foo
label: classdef
data: {...}
node_type: functiondef
name: foo
Old idea, many new solutions: Neo4j, OrientDB, ArangoDB, TitanDB, ... (+SQL, key/value stores)
Graphs in Programming
Used mostly within the
interpreter/compiler.
Use cases
• Code Optimization
• Code Annotation
• Rewriting of Code
• As Intermediate Language
Building the Code Graph
def encode(obj):
    """
    Encode a (possibly nested)
    dictionary containing complex values
    into a form that can be serialized
    using JSON.
    """
    e = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            e[key] = encode(value)
        elif isinstance(value, complex):
            e[key] = {'type': 'complex',
                      'r': value.real,
                      'i': value.imag}
        else:
            # keep all other values as they are
            e[key] = value
    return e
[Figure: AST of encode() drawn as a graph. Node and edge labels: functiondef, body, assign, targets, value, dict, name, for, iterator.]
import ast
tree = ast.parse(" ")
...
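A minimal sketch of how the AST could be turned into nodes and edges, using only the standard `ast` module. The `build_graph` helper and the `(parent, label, child)` edge format are assumptions made for illustration, not the storage format used in the talk.

```python
import ast

def build_graph(source):
    """Return (nodes, edges) for the AST of `source`.

    nodes: list of dicts holding 'node_type' plus scalar fields (name, id, ...)
    edges: list of (parent_index, label, child_index) tuples
    """
    nodes, edges = [], []

    def visit(node):
        index = len(nodes)
        data = {'node_type': type(node).__name__.lower()}
        nodes.append(data)
        for field, value in ast.iter_fields(node):
            if isinstance(value, ast.AST):
                edges.append((index, field, visit(value)))
            elif isinstance(value, list):
                for child in value:
                    if isinstance(child, ast.AST):
                        edges.append((index, field, visit(child)))
            elif isinstance(value, (str, int, float)):
                data[field] = value   # scalar fields become node properties
        return index

    visit(ast.parse(source))
    return nodes, edges

nodes, edges = build_graph("def encode(obj):\n    e = {}\n    return e\n")
print(nodes[1])   # the functiondef node with its name
print(edges[0])   # (parent index, edge label, child index)
```

Edge labels are simply the AST field names (body, targets, value, ...), which matches the labels shown in the diagram.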
Storing the Graph: Merkle Trees
https://en.wikipedia.org/wiki/Merkle_tree
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees
Example: git (also Bitcoin)
tree  /              4a7ef...
tree  /flask         79fe4...
blob  /flask/app.py  7fa2a..
tree  /docs          a77be...
blob  /docs/conf.py  9fa5a..
...
AST Example
[Figure: hashed AST of encode(). Node and edge labels: functiondef {name: 'encode', args: [...]}, body {i: 0}, body {i: 1}, for, iterator, assign, targets, value, dict, name {id: 'e'}. Hashes shown: e4fa76b..., a76fbc41..., c51fa291...]
[Figure: hashed AST of decode(). Node and edge labels: functiondef {name: 'decode', args: [...]}, body {i: 0}, body {i: 1}, assign, targets, value, dict, name {id: 'f'}. Hashes shown: 5afacc..., ba4ffac..., 7faec44..., 74af219...]
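The hashing idea can be sketched as follows: each node's hash is derived from its type, its scalar fields and the hashes of its children, so identical subtrees end up with the same hash and are stored once. The `merkle_hash` helper, the use of SHA-1 and the dictionary `store` are assumptions for illustration only.

```python
import ast
import hashlib

def merkle_hash(node, store):
    """Hash an AST node from its type, scalar fields and child hashes."""
    parts = [type(node).__name__]
    for field, value in ast.iter_fields(node):
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, ast.AST):
                parts.append(field + '>' + merkle_hash(item, store))
            else:
                parts.append(field + '=' + repr(item))
    digest = hashlib.sha1('|'.join(parts).encode('utf-8')).hexdigest()
    store[digest] = type(node).__name__   # identical subtrees share one entry
    return digest

store = {}
h1 = merkle_hash(ast.parse("def encode(obj):\n    e = {}\n    return e\n"), store)
size_after_first = len(store)
h2 = merkle_hash(ast.parse("def decode(obj):\n    e = {}\n    return e\n"), store)

# Only the nodes on the path to the renamed function get new hashes.
print(len(store) - size_after_first)
```

In this sketch, storing the second function adds entries only for the function definition and the enclosing module; every other subtree is already in the store.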
Efficiency of this Approach
What this enables
• Store everything, not just condensed
meta-data (as IDEs do, for example)
• Store multiple projects together, to
reveal connections and similarities
• Store the whole git commit history of a
given project, to see changes across
time.
Modules
Classes
Functions
The Flask project
(30,000 vertices)
Working with Graphs
Querying & Navigation
1. Perform a query over some indexed field(s)
to retrieve an initial set of nodes or edges.
graph.filter({'node_type' : 'functiondef',...})
2. Traverse the resulting graph along its edges.
for child in node.outV('body'):
    if child['node_type'] == ...
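The two steps above can be tried out on a toy in-memory graph. The `Graph` and `Node` classes below are hypothetical stand-ins written for this sketch; only the method names `filter` and `outV` are taken from the slide, and their real signatures may differ.

```python
class Node(dict):
    """A vertex: a dict of properties plus labelled outgoing edges."""
    def __init__(self, **props):
        super().__init__(props)
        self.edges = []                       # list of (label, target) pairs

    def link(self, label, target):
        self.edges.append((label, target))
        return target

    def outV(self, label=None):
        """Targets of outgoing edges, optionally restricted to one label."""
        return [target for edge_label, target in self.edges
                if label is None or edge_label == label]

class Graph:
    def __init__(self):
        self.nodes = []

    def add(self, **props):
        node = Node(**props)
        self.nodes.append(node)
        return node

    def filter(self, query):
        """Nodes whose properties equal every key/value pair in `query`."""
        return [n for n in self.nodes
                if all(n.get(k) == v for k, v in query.items())]

graph = Graph()
func = graph.add(node_type='functiondef', name='encode')
func.link('body', graph.add(node_type='assign'))
func.link('body', graph.add(node_type='for'))

# Step 1: query an indexed field. Step 2: traverse along 'body' edges.
for node in graph.filter({'node_type': 'functiondef'}):
    loops = [child for child in node.outV('body')
             if child['node_type'] == 'for']
    print(node['name'], len(loops))
```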
Examples
Show all symbol names, sorted by usage.
graph.filter({'node_type' : {$in : ['functiondef','...']}})
.groupby('name',as = 'cnt').orderby('-cnt')
index 79
...
foo 7
...
bar 5
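A comparable count can be sketched directly on a Python AST with `collections.Counter`. Counting function and class definitions together with name references is an assumption of this sketch; the query on the slide may count a different set of node types.

```python
import ast
from collections import Counter

def symbol_counts(source):
    """Count how often each symbol name is defined or referenced."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            counts[node.name] += 1
        elif isinstance(node, ast.Name):
            counts[node.id] += 1
    return counts.most_common()   # sorted by usage, most frequent first

source = "def foo(x):\n    return bar(x) + bar(x) + bar(1)\n\nfoo(2)\n"
for name, cnt in symbol_counts(source):
    print(name, cnt)
```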
Examples (contd.)
Show all versions of a given function.
graph.get_by_path('flask.helpers.url_for')
def url_for(endpoint, **values):
    """Generates a URL to the given endpoint with the method provided.
    Variable arguments that are unknown to the target endpoint are appended
    to the generated URL as query arguments. If the value of a query argument
    is ``None``, the whole pair is skipped. In case blueprints are active
    you can shortcut references to the same blueprint by prefixing the
    local endpoint with a dot (``.``).
    This will reference the index function local to the current blueprint::
        url_for('.index')
fa7fca...
3cdaf...
Visualizing Code
Example: Code Complexity
Graph Algorithm for Calculating the
Cyclomatic Complexity (the Python variety)
def walk(node, anchor=None):
    if node['node_type'] == 'functiondef':
        anchor = node
        anchor['cc'] = 1  # there is always one path
    elif node['node_type'] in ('for', 'if', 'ifexp', 'while', ...):
        if anchor is not None:
            anchor['cc'] += 1
    for subnode in node.outV:
        walk(subnode, anchor=anchor)

walk(root)
# aggregate by function path to visualize
The cyclomatic complexity is a quantitative measure of the number of linearly
independent paths through a program's source code. It was developed by
Thomas J. McCabe, Sr. in 1976.
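The same walk can be run against Python's own `ast` module instead of a graph store. This is a sketch under the assumption that only `for`, `while`, `if` and conditional expressions count as branches; a complete implementation would also consider boolean operators, exception handlers and comprehensions.

```python
import ast

# Subset of branching node types, following the slide.
BRANCH_NODES = (ast.For, ast.While, ast.If, ast.IfExp)

def complexity(source):
    """Map each function name to 1 + number of branching nodes inside it."""
    result = {}

    def walk(node, anchor=None):
        if isinstance(node, ast.FunctionDef):
            anchor = node.name
            result[anchor] = 1            # there is always one path
        elif isinstance(node, BRANCH_NODES) and anchor is not None:
            result[anchor] += 1
        for child in ast.iter_child_nodes(node):
            walk(child, anchor)

    walk(ast.parse(source))
    return result

SOURCE = '''
def encode(obj):
    e = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            e[key] = encode(value)
        elif isinstance(value, complex):
            e[key] = {'type': 'complex'}
    return e
'''

print(complexity(SOURCE))   # {'encode': 4}
```

The `elif` is represented as a nested `if` in the AST, so the function has one loop and two conditions, giving a complexity of 4. Functions are keyed by bare name here, so two functions with the same name would overwrite each other.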
Example: Flask
flask.helpers.send_file
(complexity: 22)
flask.helpers.url_for
(complexity: 14)
area: AST weight (lines of code)
height: complexity
color: complexity/weight
https://quantifiedcode.github.io/code-is-beautiful
Exploring Dependencies in a Code Base
Finding Patterns & Problems
Pattern Matching: Text vs. Graphs
Many other standards: XQuery/XPath, Cypher (Neo4j), Gremlin (e.g. TitanDB), ...
node_type: word
content: {$or : [hello, hallo]}
#...
>followed_by:
  node_type: word
  content: {$or : [world, welt]}
Hello, world!
/(hello|hallo),*\s*(world|welt)/i
word(hello)
punctuation(,)
word(world)
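The contrast between the two approaches can be sketched in a few lines of Python. The `tokenize` and `followed_by` helpers are hypothetical and only mimic the pattern above; they are not the matching engine shown in the talk.

```python
import re

TEXT = "Hello, world!"

# Text approach: one regular expression over the raw characters.
text_match = re.search(r"(hello|hallo),*\s*(world|welt)", TEXT, re.IGNORECASE)

# Graph approach: split the text into typed nodes, then match on properties.
def tokenize(text):
    return [{'node_type': 'word' if tok[0].isalnum() else 'punctuation',
             'content': tok.lower()}
            for tok in re.findall(r"\w+|[^\w\s]", text)]

def followed_by(tokens, first, second):
    """True if a word from `first` is followed by a word from `second`."""
    words = [t['content'] for t in tokens if t['node_type'] == 'word']
    return any(a in first and b in second for a, b in zip(words, words[1:]))

tokens = tokenize(TEXT)
graph_match = followed_by(tokens, {'hello', 'hallo'}, {'world', 'welt'})
print(bool(text_match), graph_match)
```

The structured version does not need to know which punctuation or whitespace may appear between the two words, because those are separate nodes that the pattern skips.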
Example: Building a Code Checker
node_type: tryexcept
>handlers:
  $contains:
    node_type: excepthandler
    type: null
    >body:
      node_type: pass
try:
    customer.credit_card.debit(-100)
except:
    pass  # to-do: implement this!
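For comparison, the same check can be written by hand against Python's `ast` module. The `find_silent_excepts` function is a sketch made for this example; it is not the pattern engine from the slide.

```python
import ast

def find_silent_excepts(source):
    """Line numbers of bare `except:` handlers whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            if all(isinstance(stmt, ast.Pass) for stmt in node.body):
                hits.append(node.lineno)
    return hits

code = "try:\n    debit(-100)\nexcept:\n    pass\n"
print(find_silent_excepts(code))   # [3]
```

A handler with `type is None` corresponds to `type: null` in the pattern, i.e. an `except:` clause without an exception class.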
Adding an exception to the rule
node_type: tryexcept
>handlers:
  $contains:
    node_type: excepthandler
    type: null
    >body:
      $not:
        $anywhere:
          node_type: raise
          exclude: #we exclude nested try's
            node_type:
              $or: [tryexcept]
try:
    customer.credit_card.debit(-100)
except:
    logger.error("This can't be good.")
    raise  # let someone else deal with this
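A hand-written version of this refined rule might look as follows, again using the standard `ast` module. The helper names are assumptions for this sketch; the nested `try` exclusion mirrors the `exclude` clause in the pattern.

```python
import ast

def contains_raise(node):
    """True if a `raise` occurs below `node`, ignoring nested try statements."""
    for child in ast.iter_child_nodes(node):
        if isinstance(child, ast.Raise):
            return True
        if isinstance(child, ast.Try):
            continue                      # we exclude nested try's
        if contains_raise(child):
            return True
    return False

def find_swallowing_excepts(source):
    """Line numbers of bare `except:` handlers that never re-raise."""
    return [node.lineno
            for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.ExceptHandler)
            and node.type is None
            and not contains_raise(node)]

good = "try:\n    f()\nexcept:\n    log('x')\n    raise\n"
bad = "try:\n    f()\nexcept:\n    log('x')\n"
print(find_swallowing_excepts(good), find_swallowing_excepts(bad))
```

A `raise` inside a nested `try` does not count for the outer handler, because the inner handler may catch it again.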
Bonus Chapter: Analyzing Changes
Example: Diff from Django Project
Basic Problem: Comparing Trees (edit distance between unordered trees is NP-hard!)
[Figure: two almost identical ASTs side by side. Left: functiondef {name: 'encode', args: [...]} with name {id: 'e'}. Right: functiondef {name: '_encode', args: [...]} with name {id: 'ee'}. Shared labels: body {i: 0}, body {i: 1}, for, iterator, assign, targets, value, dict, name.]
Similar Problem: Chemical Similarity
https://en.wikipedia.org/wiki/Epigallocatechin_gallate
Epigallocatechin gallate
Solution(s):
Jaccard Fingerprints
Bloom Filters
...
Benzene
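A fingerprint comparison can be sketched for code in the same spirit: reduce each AST to a set of structural features and compare the sets with the Jaccard index. Using (parent type, child type) pairs as features is an assumption chosen for brevity; practical tools use richer features.

```python
import ast

def fingerprint(source):
    """Set of (parent type, child type) pairs occurring in the AST."""
    pairs = set()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            pairs.add((type(parent).__name__, type(child).__name__))
    return pairs

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

v1 = fingerprint("def encode(obj):\n    e = {}\n    return e\n")
v2 = fingerprint("def _encode(obj):\n    ee = {}\n    return ee\n")
v3 = fingerprint("for i in range(3):\n    print(i)\n")

print(jaccard(v1, v2))   # renamed copy: identical structure
print(jaccard(v1, v3))   # unrelated snippet: lower similarity
```

Because names are not part of the fingerprint, the renamed function is reported as structurally identical, which is the behaviour wanted for duplicate detection.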
Applications
Detect duplicated code
e.g. "Duplicate code detection using anti-unification", P. Bulychev et al.
(CloneDigger)
Generate semantic diffs
e.g. "Change Distilling: Tree Differencing for Fine-Grained Source Code
Change Extraction", B. Fluri et al.
Detect plagiarism / copyrighted code
e.g. "PDE4Java: Plagiarism Detection Engine For Java Source Code: A
Clustering Approach", A. Jadalla et al.
Example: Semantic Diff
@mock.patch('django.db.migrations.questioner.MigrationQuestioner.ask_not_null_alteration',
            return_value='Some Name')
def test_alter_field_to_not_null_oneoff_default(self, mocked_ask_method):
    """
    #23609 - Tests autodetection of nullable to non-nullable alterations.
    """
    class CustomQuestioner(...)
    # Make state
    before = self.make_project_state([self.author_name_null])
    after = self.make_project_state([self.author_name])
    autodetector = MigrationAutodetector(before, after, CustomQuestioner())
    changes = autodetector._detect_changes()
    self.assertEqual(mocked_ask_method.call_count, 1)
    # Right number/type of migrations?
    self.assertNumberMigrations(changes, 'testapp', 1)
    self.assertOperationTypes(changes, 'testapp', 0, ["AlterField"])
    self.assertOperationAttributes(changes, "testapp", 0, 0, name="name", preserve_default=False)
    self.assertOperationFieldAttributes(changes, "testapp", 0, 0, default="Some Name")
Summary: Text vs. Graphs
Text
+ Easy to write
+ Easy to display
+ Universal format
+ Interoperable
- Not normalized
- Hard to analyze
Graphs
+ Easy to analyze
+ Normalized
+ Easy to transform
- Hard to generate
- Not (yet) interoperable
The Future(?): Use text for small-scale manipulation of code,
graphs for large-scale visualization, analysis and transformation.
Thanks!
Andreas Dewes (@japh44)
andreas@quantifiedcode.com
www.quantifiedcode.com
https://github.com/quantifiedcode
@quantifiedcode

  • 3. How we usually think about code
  • 4. But code can also look like this...
  • 5. Our Journey 1. Why graphs are interesting 2. How we can store code in a graph 3. What we can learn from the graph 4. How programmers can profit from this
  • 6. Graphs explained in 30 seconds node / vertex edge node_type: classdef name: Foo label: classdef data: {...} node_type: functiondef name: foo Old idea, many new solutions: Neo4j, OrientDB, ArangoDB, TitanDB, ... (+SQL, key/value stores)
  • 7. Graphs in Programming Used mostly within the interpreter/compiler. Use cases • Code Optimization • Code Annotation • Rewriting of Code • As Intermediate Language
  • 8. Building the Code Graph def encode(obj): """ Encode a (possibly nested) dictionary containing complex values into a form that can be serialized using JSON. """ e = {} for key,value in obj.items(): if isinstance(value,dict): e[key] = encode(value) elif isinstance(value,complex): e[key] = {'type' : 'complex', 'r' : value.real, 'i' : value.imag} return e dict name name assign functiondef body body targets for body iterator value import ast tree = ast.parse(" ") ...
  • 9. Storing the Graph: Merkle Trees https://en.wikipedia.org/wiki/Merkle_tree https://git-scm.com/book/en/v2/Git-Internals-Git-Objects https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees / 4a7ef... /flask 79fe4... /docs a77be... /docs/conf.py 9fa5a../flask/app.py 7fa2a.. ... ... tree blob Example: git (also Bitcoin)
  • 10. {i : 1} {id : 'e'} {name: 'encode', args : [...]} {i:0} AST Example e4fa76b... a76fbc41... c51fa291... name name assign body body targets for body iterator value dict functiondef {i : 1} {id : 'f'} {i:0} 5afacc... ba4ffac... 7faec44... name assign body body targets value dict functiondef {name: 'decode', args : [...]} 74af219...
  • 11. Efficiency of this Approach
  • 12. What this enables • Store everything, not just condensed meta-data (as IDEs do, for example) • Store multiple projects together, to reveal connections and similarities • Store the whole git commit history of a given project, to see changes across time
  • 15. Querying & Navigation 1. Perform a query over some indexed field(s) to retrieve an initial set of nodes or edges. graph.filter({'node_type' : 'functiondef',...}) 2. Traverse the resulting graph along its edges. for child in node.outV('body'): if child['node_type'] == ...
  • 16. Examples Show all symbol names, sorted by usage. graph.filter({'node_type' : {$in : ['functiondef','...']}}) .groupby('name',as = 'cnt').orderby('-cnt') index 79 ... foo 7 ... bar 5
  • 17. Examples (contd.) Show all versions of a given function. graph.get_by_path('flask.helpers.url_for') def url_for(endpoint, **values): """Generates a URL to the given endpoint with the method provided. Variable arguments that are unknown to the target endpoint are appended to the generated URL as query arguments. If the value of a query argument is ``None``, the whole pair is skipped. In case blueprints are active you can shortcut references to the same blueprint by prefixing the local endpoint with a dot (``.``). This will reference the index function local to the current blueprint:: url_for('.index') [the same excerpt appears four times on the slide, once per stored version] fa7fca... 3cdaf...
  • 19. Example: Code Complexity Graph Algorithm for Calculating the Cyclomatic Complexity (the Python variety) node = root def walk(node,anchor = None): if node['node_type'] == 'functiondef': anchor=node anchor['cc']=1 #there is always one path elif node['node_type'] in ('for','if','ifexp','while',...): if anchor: anchor['cc']+=1 for subnode in node.outV: walk(subnode,anchor = anchor) #aggregate by function path to visualize The cyclomatic complexity is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.
  • 20. Example: Flask flask.helpers.send_file (complexity: 22) flask.helpers.url_for (complexity: 14) area: AST weight (lines of code) height: complexity color: complexity/weight https://quantifiedcode.github.io/code-is-beautiful
  • 22. Finding Patterns & Problems
  • 23. Pattern Matching: Text vs. Graphs Many other standards: XQuery/XPath, Cypher (Neo4j), Gremlin (e.g. TitanDB), ... node_type: word content: {$or : [hello, hallo]} #... >followed_by: node_type: word content: {$or : [world, welt]} Hello, world! /(hello|hallo),*\s*(world|welt)/i word(hello) punctuation(,) word(world)
  • 24. Example: Building a Code Checker node_type: tryexcept >handlers: $contains: node_type: excepthandler type: null >body: node_type: pass try: customer.credit_card.debit(-100) except: pass #to-do: implement this!
  • 25. Adding an exception to the rule node_type: tryexcept >handlers: $contains: node_type: excepthandler type: null >body: $not: $anywhere: node_type: raise exclude: #we exclude nested try's node_type: $or: [tryexcept] try: customer.credit_card.debit(-100) except: logger.error("This can't be good.") raise #let someone else deal with #this
  • 27. Example: Diff from Django Project
  • 28. {i : 1} {id : 'e'} {name: 'encode', args : [...]} {i:0} Basic Problem: Tree Similarity (edit distance between unordered trees is NP-hard!) name name assign body body targets for body iterator value dict functiondef {i : 1} {id : 'ee'} {name: '_encode', args : [...]} {i:0} name name assign body body targets for body iterator value dict functiondef
  • 29. Similar Problem: Chemical Similarity https://en.wikipedia.org/wiki/Epigallocatechin_gallate Epigallocatechin gallate Solution(s): Jaccard Fingerprints Bloom Filters ... Benzene
  • 30. Applications Detect duplicated code e.g. "Duplicate code detection using anti-unification", P. Bulychev et al. (CloneDigger) Generate semantic diffs e.g. "Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction", B. Fluri et al. Detect plagiarism / copyrighted code e.g. "PDE4Java: Plagiarism Detection Engine For Java Source Code: A Clustering Approach", A. Jadalla et al.
  • 31. Example: Semantic Diff @mock.patch('django.db.migrations.questioner.MigrationQuestioner.ask_not_null_alteration', return_value='Some Name') def test_alter_field_to_not_null_oneoff_default(self, mocked_ask_method): """ #23609 - Tests autodetection of nullable to non-nullable alterations. """ class CustomQuestioner(...) # Make state before = self.make_project_state([self.author_name_null]) after = self.make_project_state([self.author_name]) autodetector = MigrationAutodetector(before, after, CustomQuestioner()) changes = autodetector._detect_changes() self.assertEqual(mocked_ask_method.call_count, 1) # Right number/type of migrations? self.assertNumberMigrations(changes, 'testapp', 1) self.assertOperationTypes(changes, 'testapp', 0, ["AlterField"]) self.assertOperationAttributes(changes, "testapp", 0, 0, name="name", preserve_default=False) self.assertOperationFieldAttributes(changes, "testapp", 0, 0, default="Some Name")
  • 32. Summary: Text vs. Graphs Text + Easy to write + Easy to display + Universal format + Interoperable - Not normalized - Hard to analyze Graphs + Easy to analyze + Normalized + Easy to transform - Hard to generate - Not (yet) interoperable The Future(?): Use text for small-scale manipulation of code, graphs for large-scale visualization, analysis and transformation.
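
Two of the examples in the transcript, the cyclomatic complexity walk (slide 19) and the checker for silently swallowed exceptions (slide 24), can be tried without a graph database. The following is a minimal sketch that assumes Python's standard `ast` module as a stand-in for the graph store used in the talk; the function names `complexity_by_function` and `silent_bare_excepts`, the `BRANCH_NODES` tuple, and the sample source are illustrative and not taken from the slides.

```python
import ast

# Node types counted as branch points. The slide lists 'for', 'if', 'ifexp',
# 'while', ...; this tuple is an assumed, non-exhaustive choice.
BRANCH_NODES = (ast.For, ast.While, ast.If, ast.IfExp)


def complexity_by_function(source):
    """Return {function name: complexity estimate} for the given source."""
    result = {}

    def walk(node, anchor=None):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            anchor = node.name
            result[anchor] = 1  # there is always one path
        elif isinstance(node, BRANCH_NODES) and anchor is not None:
            result[anchor] += 1
        for child in ast.iter_child_nodes(node):
            walk(child, anchor)

    walk(ast.parse(source))
    return result


def silent_bare_excepts(source):
    """Return line numbers of bare `except:` handlers whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.ExceptHandler)
                and node.type is None  # bare except, no exception type
                and all(isinstance(stmt, ast.Pass) for stmt in node.body)):
            hits.append(node.lineno)
    return sorted(hits)


SAMPLE = '''
def encode(obj):
    e = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            e[key] = encode(value)
        elif isinstance(value, complex):
            e[key] = {'type': 'complex', 'r': value.real, 'i': value.imag}
    return e

def charge(customer):
    try:
        customer.credit_card.debit(-100)
    except:
        pass
'''

if __name__ == "__main__":
    print(complexity_by_function(SAMPLE))
    print(silent_bare_excepts(SAMPLE))
```

Keying the result by bare function name is a simplification: two functions with the same name in different scopes would overwrite each other, which the talk avoids by aggregating over the full function path.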

Editor's Notes

  1. Advantages of treating code as data, not text: - Semantic diffs (what changed on the semantic level?) - Analysis & refactoring (editing graphs, not pieces of text) - Finding code duplicates - Finding copyrighted code - Visualizing code structure - Storing code efficiently - Separating the content from the text
  2. - Show code as text - Show code as intermediate representation (AST)
  3. First, we transform the code into a so-called "abstract syntax tree". This is a representation that can be easily manipulated programmatically.
  4. First, we transform the code into a so-called "abstract syntax tree". This is a representation that can be easily manipulated programmatically.
  5. We store all syntax trees of the project in a graph database (either on-disk or in-memory) to be able to perform queries on the graph and store it for later analysis. Nodes in modules can be linked, e.g. to point from a function call in a given module to the definition of that function in another module.
  6. First, we transform the code into a so-called "abstract syntax tree". This is a representation that can be easily manipulated programmatically.
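
The notes above describe parsing code into an abstract syntax tree and storing the trees for later analysis, and slide 9 names Merkle trees as the storage idea. A rough sketch of how the two might fit together follows. It assumes SHA-1 digests (as git uses) and a plain dictionary in place of a graph database; `merkle_hash` and `store` are hypothetical names and not the authors' implementation.

```python
import ast
import hashlib


def merkle_hash(node, store):
    """Hash an AST node from its type, its plain fields and its children's hashes.

    `store` maps digest -> (node type, labelled child digests), so identical
    subtrees end up under a single key.
    """
    own_fields = []
    child_hashes = []
    # iter_fields skips position attributes such as lineno and col_offset,
    # so layout and comments do not influence the digest.
    for name, value in ast.iter_fields(node):
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, ast.AST):
                child_hashes.append(name + ":" + merkle_hash(item, store))
            else:
                own_fields.append(name + "=" + repr(item))
    payload = "|".join([type(node).__name__] + own_fields + child_hashes)
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    store[digest] = (type(node).__name__, child_hashes)
    return digest


if __name__ == "__main__":
    store = {}
    tree_a = ast.parse("def encode(obj):\n    e = {}\n    return e\n")
    tree_b = ast.parse("def decode(obj):\n    e = {}\n    return e\n")
    print(merkle_hash(tree_a, store))
    print(merkle_hash(tree_b, store))
    print(len(store), "distinct nodes stored for both functions")
```

Because a node's digest depends only on its content and its children, the two functions get different root digests while their identical bodies are stored once, which is the deduplication effect the Merkle tree slides point at.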