Code is not text! How graph technologies can help us to understand our code better.


Today, we almost exclusively think of code in software projects as a collection of text files. The tools that we use (version control systems, IDEs, code analyzers) also use text as the primary storage format for code. In fact, the belief that “code is text” is so deeply ingrained in our heads that we never question its validity or even notice that there are other ways to look at code.

In my talk I will explain why treating code as text is a very bad idea that actively holds back our understanding and creates a range of problems in large software projects. I will then show how we can overcome (some of) these problems by treating and storing code as data, and more specifically as a graph. I will give specific examples of how we can use this approach to improve our understanding of large code bases, increase code quality and automate certain aspects of software development.

Finally, I will outline my personal vision of the future of programming, which is a future where we no longer primarily interact with code bases using simple text editors. I will also give some ideas on how we might get to that future.

  1. Code Is Not Text! How graph technologies can help us to understand our code better
      Andreas Dewes (@japh44), andreas@quantifiedcode.com
      21.07.2015, EuroPython 2015 – Bilbao
  2. About
      Physicist and Python enthusiast. We are a spin-off of the University of Munich (LMU): we develop software for data-driven code analysis.
  3. How we usually think about code
  4. But code can also look like this...
  5. Our Journey
      1. Why graphs are interesting
      2. How we can store code in a graph
      3. What we can learn from the graph
      4. How programmers can profit from this
  6. Graphs explained in 30 seconds
      [Figure: two nodes (vertices) joined by an edge. One node has node_type: classdef, name: Foo, label: classdef, data: {...}; the other has node_type: functiondef, name: foo.]
      Old idea, many new solutions: Neo4j, OrientDB, ArangoDB, TitanDB, ... (+ SQL, key/value stores)
  7. Graphs in Programming
      Used mostly within the interpreter/compiler. Use cases:
      • Code Optimization
      • Code Annotation
      • Rewriting of Code
      • As Intermediate Language
  8. Building the Code Graph
          def encode(obj):
              """
              Encode a (possibly nested) dictionary containing complex values
              into a form that can be serialized using JSON.
              """
              e = {}
              for key, value in obj.items():
                  if isinstance(value, dict):
                      e[key] = encode(value)
                  elif isinstance(value, complex):
                      e[key] = {'type': 'complex', 'r': value.real, 'i': value.imag}
                  else:
                      e[key] = value  # keep all other values unchanged
              return e

          import ast
          tree = ast.parse(source)  # source: the code of encode, as a string
          # ...
      [Figure: the AST of encode drawn as a graph, with nodes such as functiondef, for, assign, dict and name, and edges labelled body, targets, iterator and value.]
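The slide only hints at the parsing step (`import ast`, `ast.parse`). As a minimal sketch, assuming the graph is simply a dict of nodes plus a list of labelled edges (the talk's own storage format is not shown), Python's standard `ast` module is enough to produce the kind of graph drawn on the slide: one node per AST node, with the AST field name as the edge label (Python 3 calls the loop field `iter`). The names `build_graph` and `SOURCE` are illustrative only.

```python
import ast

# A shortened version of encode from the slide; any source would do.
SOURCE = '''
def encode(obj):
    e = {}
    for key, value in obj.items():
        if isinstance(value, dict):
            e[key] = encode(value)
    return e
'''


def build_graph(tree):
    """Convert a Python AST into a node dict and a labelled edge list.

    nodes: {node_id: {'node_type': ..., plus scalar fields such as name or id}}
    edges: [(parent_id, child_id, label)], where label is the AST field name.
    """
    nodes, edges = {}, []

    def visit(node):
        node_id = len(nodes)
        data = {'node_type': type(node).__name__.lower()}
        nodes[node_id] = data
        for field, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, ast.expr_context):
                    continue  # skip Load/Store/Del markers
                if isinstance(child, ast.AST):
                    edges.append((node_id, visit(child), field))
                elif child is not None and not isinstance(value, list):
                    data[field] = child  # scalar data, e.g. name='encode'
        return node_id

    visit(tree)
    return nodes, edges


nodes, edges = build_graph(ast.parse(SOURCE))
```

Every node except the root gets exactly one incoming edge, so the result is still a tree; sharing subtrees between trees is what the Merkle-tree storage on the next slide adds.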
  9. Storing the Graph: Merkle Trees
      Example: git (also Bitcoin)
      [Figure: a git repository as a Merkle tree. Tree objects / (4a7ef...), /flask (79fe4...) and /docs (a77be...); blob objects /docs/conf.py (9fa5a...) and /flask/app.py (7fa2a...).]
      https://en.wikipedia.org/wiki/Merkle_tree
      https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
      https://en.bitcoin.it/wiki/Protocol_documentation#Merkle_Trees
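To make the Merkle-tree idea concrete for ASTs, here is a sketch that hashes each node from its type, its scalar fields and its children's hashes, and stores it in a dict keyed by that hash, much like git stores tree and blob objects. The hash function (SHA-1, as in git) and the serialization via `repr` are assumptions chosen for brevity; `merkle_store` is a hypothetical helper, not the talk's implementation.

```python
import ast
import hashlib


def merkle_store(node, store):
    """Store an AST bottom-up in a content-addressed dict and return its hash.

    A node's hash covers its type, its scalar fields and the hashes of its
    children, so identical subtrees get the same key and are stored once.
    Line numbers are attributes, not fields, so they do not affect the hash.
    """
    fields = []
    for name, value in ast.iter_fields(node):
        items = value if isinstance(value, list) else [value]
        parts = []
        for item in items:
            if isinstance(item, ast.AST):
                parts.append(merkle_store(item, store))  # child's hash
            else:
                parts.append(repr(item))  # scalar, e.g. 'encode' or None
        fields.append((name, parts))
    record = (type(node).__name__, fields)
    key = hashlib.sha1(repr(record).encode()).hexdigest()
    store[key] = record
    return key


store = {}
src_a = "def encode(obj):\n    e = {}\n    return e\n"
src_b = "def decode(obj):\n    e = {}\n    return e\n"
h_a = merkle_store(ast.parse(src_a), store)
h_b = merkle_store(ast.parse(src_b), store)
```

The two functions differ only in their names, so their roots get different hashes, while the identical argument lists and bodies map to the same keys and are stored only once.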
  10. AST Example
      [Figure: the ASTs of two functions, encode and decode, stored as hashed nodes (hashes such as e4fa76b..., a76fbc41..., c51fa291..., 5afacc..., ba4ffac..., 7faec44..., 74af219...). Node data includes {name: 'encode', args: [...]}, {name: 'decode', args: [...]}, {id: 'e'}, {id: 'f'}, {i: 0} and {i: 1}; edges are labelled body, targets, iterator and value.]
  11. Efficiency of this Approach
  12. What this enables
      • Store everything, not just condensed metadata (as IDEs do, for example)
      • Store multiple projects together, to reveal connections and similarities
      • Store the whole git commit history of a given project, to see changes across time
  13. The Flask project (30,000 vertices)
      [Figure legend: modules, classes, functions]
  14. Working with Graphs
  15. Querying & Navigation
      1. Perform a query over some indexed field(s) to retrieve an initial set of nodes or edges:
          graph.filter({'node_type' : 'functiondef', ...})
      2. Traverse the resulting graph along its edges:
          for child in node.outV('body'):
              if child['node_type'] == ...
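The graph API on the slide (`graph.filter`, `node.outV`) belongs to the speakers' own tooling, whose exact signatures are not shown. The same two steps can be sketched with the standard `ast` module instead: a full scan stands in for the indexed query, and following an AST field stands in for traversing labelled edges. The helpers `filter_nodes`, `children` and `top_level_with` are hypothetical.

```python
import ast


def filter_nodes(tree, node_type):
    """Step 1 (stand-in for an indexed query): all nodes of one type."""
    return [n for n in ast.walk(tree) if type(n).__name__.lower() == node_type]


def children(node, label):
    """Step 2: follow the outgoing edges with a given label (an AST field)."""
    value = getattr(node, label, None)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def top_level_with(tree):
    """Names of functions whose body directly contains a with-statement."""
    found = []
    for func in filter_nodes(tree, 'functiondef'):
        for child in children(func, 'body'):
            if type(child).__name__.lower() == 'with':
                found.append(func.name)
    return found


source = '''
def load(path):
    with open(path) as f:
        return f.read()

def save(path, data):
    if not data:
        return
    with open(path, 'w') as f:
        f.write(data)
'''
tree = ast.parse(source)
```

In a real graph store, step 1 would hit an index on `node_type` instead of walking every node.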
  16. Examples
      Show all symbol names, sorted by usage:
          graph.filter({'node_type' : {$in : ['functiondef','...']}}).groupby('name',as = 'cnt').orderby('-cnt')
      Result:
          index   79
          ...
          foo      7
          ...
          bar      5
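A rough equivalent of the symbol-usage query, using the `ast` module and `collections.Counter` rather than the graph query language from the slide. Which node kinds count as a "symbol" is an assumption here: function and class definitions plus plain name references. `symbol_usage` is a hypothetical name.

```python
import ast
from collections import Counter


def symbol_usage(tree):
    """Return (name, count) pairs, most frequently used first."""
    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            counts[node.name] += 1  # a definition
        elif isinstance(node, ast.Name):
            counts[node.id] += 1  # a reference or assignment target
    return counts.most_common()


source = '''
def foo(x):
    return bar(x) + bar(x)

def bar(y):
    return y
'''
usage = symbol_usage(ast.parse(source))
```

Here `bar` comes first: it is defined once and referenced twice.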
  17. Examples (contd.)
      Show all versions of a given function:
          graph.get_by_path('flask.helpers.url_for')
      [Figure: four stored versions of flask.helpers.url_for side by side, identified by hashes such as fa7fca... and 3cdaf.... Each begins with the same signature and docstring:]
          def url_for(endpoint, **values):
              """Generates a URL to the given endpoint with the method provided.

              Variable arguments that are unknown to the target endpoint are appended
              to the generated URL as query arguments. If the value of a query argument
              is ``None``, the whole pair is skipped. In case blueprints are active
              you can shortcut references to the same blueprint by prefixing the
              local endpoint with a dot (``.``).

              This will reference the index function local to the current blueprint::

                  url_for('.index')
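Since each stored version of a function is identified by a hash, finding all versions amounts to collecting distinct hashes. A sketch, assuming the history is available as a list of source snapshots and hashing `ast.dump` of the function (which ignores comments, whitespace and line numbers). `function_versions` and the toy snapshots are made up for illustration; they are not Flask's actual history.

```python
import ast
import hashlib


def function_versions(snapshots, name):
    """Return {hash: FunctionDef} for the distinct versions of `name`.

    Versions are compared by a hash of the function's AST dump, so
    formatting-only edits do not create a new version.
    """
    versions = {}
    for snapshot in snapshots:
        for node in ast.walk(ast.parse(snapshot)):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                digest = hashlib.sha1(ast.dump(node).encode()).hexdigest()
                versions.setdefault(digest, node)
    return versions


v1 = "def url_for(endpoint, **values):\n    return endpoint\n"
v2 = "def url_for(endpoint, **values):\n    # a comment\n    return   endpoint\n"
v3 = "def url_for(endpoint, **values):\n    return '/' + endpoint\n"
versions = function_versions([v1, v2, v3], 'url_for')
```

`v2` only adds a comment and whitespace, so it collapses into the same version as `v1`; only `v3` is a genuinely new version.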
  18. Visualizing Code
  19. Example: Code Complexity
      Graph algorithm for calculating the cyclomatic complexity (the Python variety):
          def walk(node, anchor=None):
              if node['node_type'] == 'functiondef':
                  anchor = node
                  anchor['cc'] = 1  # there is always one path
              elif node['node_type'] in ('for', 'if', 'ifexp', 'while', ...):
                  if anchor is not None:
                      anchor['cc'] += 1
              for subnode in node.outV:
                  walk(subnode, anchor=anchor)

          walk(root)
          # aggregate by function path to visualize
      The cyclomatic complexity is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.
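The walk on the slide can be made runnable against Python's own AST. This sketch keeps its structure (start at 1 per function, add 1 per branching node) and aggregates by function path. The set of branching node types is an assumption, since the slide elides it; tools differ on whether boolean operators, `except` clauses or comprehensions also count. `cyclomatic_complexity` is a hypothetical name.

```python
import ast

# Branching node types counted here (an assumed subset; the slide elides the rest).
BRANCHES = (ast.If, ast.IfExp, ast.For, ast.While)


def cyclomatic_complexity(tree):
    """Return {function path: complexity}, e.g. {'A.g': 1}."""
    result = {}

    def walk(node, anchor=None, path=()):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            path = path + (node.name,)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            anchor = '.'.join(path)
            result[anchor] = 1  # there is always one path
        elif isinstance(node, BRANCHES) and anchor is not None:
            result[anchor] += 1  # each branch adds one independent path
        for child in ast.iter_child_nodes(node):
            walk(child, anchor, path)

    walk(tree)
    return result


source = '''
def f(x):
    if x:
        return 1
    for i in range(3):
        while i:
            i -= 1
    return 2 if x else 3

class A:
    def g(self):
        return 0
'''
complexity = cyclomatic_complexity(ast.parse(source))
```

Here `f` contains one `if`, one `for`, one `while` and one conditional expression, giving 1 + 4 = 5.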
  20. Example: Flask
      [Figure: visualization of the Flask code base, highlighting flask.helpers.send_file (complexity: 22) and flask.helpers.url_for (complexity: 14). Area: AST weight (≈ lines of code); height: complexity; color: complexity/weight.]
      https://quantifiedcode.github.io/code-is-beautiful
  21. Exploring Dependencies in a Code Base
  22. Finding Patterns & Problems
  23. Pattern Matching: Text vs. Graphs
      Text: Hello, world!
      Regular expression: /(hello|hallo),\s*(world|welt)/i
      As a graph: word(hello) punctuation(,) word(world)
      Graph pattern:
          node_type: word
          content: {$or : [hello, hallo]}
          #...
          >followed_by:
              node_type: word
              content: {$or : [world, welt]}
      Many other standards: XQuery/XPath, Cypher (Neo4j), Gremlin (e.g. TitanDB), ...
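A small sketch of the difference: the regular expression matches characters, while the graph version first turns the text into typed nodes and then matches on node types and contents. Here `>followed_by` is assumed to mean "the next word node, skipping punctuation"; the slide elides part of the pattern (`#...`), so this reading is a guess. `tokenize`, `followed_by` and `REGEX` are hypothetical names.

```python
import re

# Character-level pattern from the slide, case-insensitive like /.../i.
REGEX = re.compile(r"(hello|hallo),\s*(world|welt)", re.IGNORECASE)

# Node types: a run of word characters, or a single punctuation character.
TOKEN = re.compile(r"(?P<word>\w+)|(?P<punctuation>[^\w\s])")


def tokenize(text):
    """Turn text into (node_type, content) nodes, e.g. ('word', 'hello')."""
    return [(m.lastgroup, m.group().lower()) for m in TOKEN.finditer(text)]


def followed_by(nodes, first, second):
    """True if a word in `first` is followed by a word in `second`.

    Punctuation nodes between the two words are skipped (assumed semantics).
    """
    words = [content for node_type, content in nodes if node_type == 'word']
    return any(a in first and b in second for a, b in zip(words, words[1:]))


GREETINGS, WORLDS = {'hello', 'hallo'}, {'world', 'welt'}
```

For example, `followed_by(tokenize("Hallo,\n   Welt"), GREETINGS, WORLDS)` is true regardless of spacing or case, because whitespace and letter case never reach the node level.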
  24. Example: Building a Code Checker
      Pattern:
          node_type: tryexcept
          >handlers:
              $contains:
                  node_type: excepthandler
                  type: null
                  >body:
                      node_type: pass
      Matches:
          try:
              customer.credit_card.debit(-100)
          except:
              pass  # to-do: implement this!
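The checker pattern translates almost one-to-one into a walk over Python's AST. Note that the slide uses Python 2 node names (`tryexcept`); in Python 3 the node is `ast.Try`, and a bare `except:` has its `type` set to `None`. A minimal sketch; `find_swallowed_exceptions` is a hypothetical name.

```python
import ast


def find_swallowed_exceptions(source):
    """Line numbers of bare `except:` handlers whose body is only `pass`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Try):  # node_type: tryexcept
            for handler in node.handlers:  # >handlers: $contains
                if handler.type is None and all(  # type: null
                    isinstance(stmt, ast.Pass) for stmt in handler.body  # >body: pass
                ):
                    hits.append(handler.lineno)
    return hits


BAD = (
    "try:\n"
    "    customer.credit_card.debit(-100)\n"
    "except:\n"
    "    pass  # to-do: implement this!\n"
)
TYPED = (
    "try:\n"
    "    risky()\n"
    "except ValueError:\n"
    "    pass\n"
)
```

The source is only parsed, never executed, so undefined names such as `customer` are fine.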
  25. Adding an exception to the rule
      Pattern:
          node_type: tryexcept
          >handlers:
              $contains:
                  node_type: excepthandler
                  type: null
                  >body:
                      $not:
                          $anywhere:
                              node_type: raise
                              exclude:  # we exclude nested try's
                                  node_type:
                                      $or: [tryexcept]
      Not matched (the handler re-raises):
          try:
              customer.credit_card.debit(-100)
          except:
              logger.error("This can't be good.")
              raise  # let someone else deal with this
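The refined rule flags a bare `except:` unless a `raise` appears somewhere in its body, without looking inside nested `try` statements (the `exclude` clause). A sketch of the same logic with the `ast` module, using Python 3 naming (`ast.Try` rather than `tryexcept`); `contains_raise` and `find_unhandled_bare_excepts` are hypothetical names.

```python
import ast


def contains_raise(statements):
    """True if a `raise` occurs anywhere below `statements`,
    without descending into nested try blocks ($anywhere + exclude)."""
    stack = list(statements)
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Raise):
            return True
        if isinstance(node, ast.Try):
            continue  # exclude nested try's
        stack.extend(ast.iter_child_nodes(node))
    return False


def find_unhandled_bare_excepts(source):
    """Line numbers of bare `except:` handlers that never re-raise."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Try):
            for handler in node.handlers:
                if handler.type is None and not contains_raise(handler.body):
                    hits.append(handler.lineno)
    return hits


RERAISE = (
    "try:\n"
    "    customer.credit_card.debit(-100)\n"
    "except:\n"
    "    logger.error(\"This can't be good.\")\n"
    "    raise\n"
)
NESTED = (
    "try:\n"
    "    x()\n"
    "except:\n"
    "    try:\n"
    "        raise\n"
    "    except Exception:\n"
    "        pass\n"
)
```

In `NESTED`, the only `raise` sits inside an inner `try` whose handler may swallow it, so the outer bare `except:` is still reported.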
  26. Bonus Chapter: Analyzing Changes
  27. Example: Diff from Django Project
  28. Basic Problem: Tree Isomorphism
      Exact isomorphism of rooted trees can be decided in linear time, but the inexact matching a diff needs is hard in general (e.g. the edit distance between unordered labeled trees is NP-complete).
      [Figure: the ASTs of encode (variable e) and _encode (variable ee). Both trees have the same structure (functiondef, assign, for, dict, name nodes; body, targets, iterator, value edges) and differ only in the names 'encode'/'_encode' and 'e'/'ee'.]
  29. Similar Problem: Chemical Similarity
      [Figure: molecular structures of epigallocatechin gallate (https://en.wikipedia.org/wiki/Epigallocatechin_gallate) and benzene.]
      Solution(s): Jaccard similarity, fingerprints, Bloom filters, ...
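Borrowing the chemistry approach, one can reduce each AST to a set of features (a "fingerprint") and compare the sets with the Jaccard index, the size of the intersection divided by the size of the union. The choice of features here, (parent type, field, child type) triples, is an assumption made for brevity; real tools use richer features, and Bloom filters can store such sets compactly. `fingerprint` and `jaccard` are hypothetical names.

```python
import ast


def fingerprint(source):
    """Set of (parent type, field, child type) triples; identifiers are ignored."""
    features = set()
    for parent in ast.walk(ast.parse(source)):
        for field, value in ast.iter_fields(parent):
            for child in value if isinstance(value, list) else [value]:
                if isinstance(child, ast.AST) and not isinstance(child, ast.expr_context):
                    features.add((type(parent).__name__, field, type(child).__name__))
    return features


def jaccard(a, b):
    """|A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


A = "def encode(obj):\n    e = {}\n    for k, v in obj.items():\n        e[k] = v\n    return e\n"
B = "def _encode(obj):\n    ee = {}\n    for k, v in obj.items():\n        ee[k] = v\n    return ee\n"
C = "def add(a, b):\n    return a + b\n"
```

`A` and `B` differ only in names, so their fingerprints are identical (similarity 1.0); `C` shares a few features such as the function and argument structure, so its similarity to `A` lies strictly between 0 and 1.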
  30. Applications
      • Detect duplicated code, e.g. "Duplicate code detection using anti-unification", P. Bulychev et al. (CloneDigger)
      • Generate semantic diffs, e.g. "Change Distilling: Tree Differencing for Fine-Grained Source Code Change Extraction", B. Fluri et al.
      • Detect plagiarism / copyrighted code, e.g. "PDE4Java: Plagiarism Detection Engine For Java Source Code: A Clustering Approach", A. Jadalla et al.
  31. Example: Semantic Diff
          @mock.patch('django.db.migrations.questioner.MigrationQuestioner.ask_not_null_alteration',
                      return_value='Some Name')
          def test_alter_field_to_not_null_oneoff_default(self, mocked_ask_method):
              """
              #23609 - Tests autodetection of nullable to non-nullable alterations.
              """
              class CustomQuestioner(...)
              # Make state
              before = self.make_project_state([self.author_name_null])
              after = self.make_project_state([self.author_name])
              autodetector = MigrationAutodetector(before, after, CustomQuestioner())
              changes = autodetector._detect_changes()
              self.assertEqual(mocked_ask_method.call_count, 1)
              # Right number/type of migrations?
              self.assertNumberMigrations(changes, 'testapp', 1)
              self.assertOperationTypes(changes, 'testapp', 0, ["AlterField"])
              self.assertOperationAttributes(changes, "testapp", 0, 0, name="name", preserve_default=False)
              self.assertOperationFieldAttributes(changes, "testapp", 0, 0, default="Some Name")
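A much cruder relative of the tree-differencing work cited under Applications: compare two versions of a module function by function, using `ast.dump` so that pure formatting changes (whitespace, comments) are not reported. Change Distilling works at the level of individual statements; this sketch only reports which functions were added, removed or changed. `functions`, `semantic_diff` and the toy sources are hypothetical.

```python
import ast


def functions(source):
    """Map qualified function names (e.g. 'Tests.test_a') to AST dumps."""
    result = {}

    def visit(node, prefix):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                name = prefix + child.name
                if not isinstance(child, ast.ClassDef):
                    result[name] = ast.dump(child)  # positions are excluded
                visit(child, name + '.')
            else:
                visit(child, prefix)

    visit(ast.parse(source), '')
    return result


def semantic_diff(old, new):
    before, after = functions(old), functions(new)
    return {
        'added': sorted(after.keys() - before.keys()),
        'removed': sorted(before.keys() - after.keys()),
        'changed': sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }


OLD = (
    "class Tests:\n"
    "    def test_a(self):\n"
    "        self.assertEqual(1, 1)\n"
    "\n"
    "    def test_b(self):\n"
    "        pass\n"
)
NEW = (
    "class Tests:\n"
    "    def test_a(self):\n"
    "        # reformatted\n"
    "        self.assertEqual( 1, 1 )\n"
    "\n"
    "    def test_b(self):\n"
    "        self.assertTrue(True)\n"
    "\n"
    "    def test_c(self):\n"
    "        pass\n"
)
diff = semantic_diff(OLD, NEW)
```

`test_a` only gained a comment and extra spaces, so a line-based diff would flag it but this one does not.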
  32. Summary: Text vs. Graphs
      Text
      + Easy to write
      + Easy to display
      + Universal format
      + Interoperable
      - Not normalized
      - Hard to analyze
      Graphs
      + Easy to analyze
      + Normalized
      + Easy to transform
      - Hard to generate
      - Not (yet) interoperable
      The Future(?): Use text for small-scale manipulation of code, graphs for large-scale visualization, analysis and transformation.
  33. Thanks!
      Andreas Dewes (@japh44), andreas@quantifiedcode.com
      www.quantifiedcode.com
      https://github.com/quantifiedcode
      @quantifiedcode
