This document discusses analyzing networks using Hadoop and Neo4j. It demonstrates transforming raw network data into nodes and edges that can be imported into Neo4j. It then shows how to run graph queries in Cypher to analyze the network data and find meaningful patterns. The document also provides sample nodes, edges, and Cypher queries run on the network data.
발표자: 김준호(Lunit)
발표일: 2018.1.
의료 AI 관련 중, nodule detection 문제에 대해 다뤄보고자 합니다.
의료 AI에서는 어떠한 방식으로 classication을 하고, preprocessing은 어떤식으로 진행되는지 LUNA16이라는 의료 challenge에 이용되는 데이터를 가지고 발표를 진행해보고자 합니다.
이후, 이 데이터를 이용해서 최근 2017 MICCAI (의료 영상학회에서는 높은 수준의 학회)에서 발표된 "curriculum adaptive sampling for extreme data imbalance"를 실제 구현해서 적용해보고 이 때 발생할 수 있는 문제를 어떤식으로 해결할 수 있는지에 대한 tip도 제공할 예정입니다. (Python multi-processing data load, input-pipeline)
위 논문을 선정한 이유는, 단순한 classification이 아닌, nodule이 있는 위치도 정확하게 catch하는 논문 중, performance가 상당히 높기 때문입니다.
In this lecture we analyze graph oriented databases. In particular, we consider TtitanDB as graph database. We analyze how to query using gremlin and how to create edges and vertex.
Finally, we presents how to use rexster to visualize the storeg graph/
발표자: 김준호(Lunit)
발표일: 2018.1.
의료 AI 관련 중, nodule detection 문제에 대해 다뤄보고자 합니다.
의료 AI에서는 어떠한 방식으로 classication을 하고, preprocessing은 어떤식으로 진행되는지 LUNA16이라는 의료 challenge에 이용되는 데이터를 가지고 발표를 진행해보고자 합니다.
이후, 이 데이터를 이용해서 최근 2017 MICCAI (의료 영상학회에서는 높은 수준의 학회)에서 발표된 "curriculum adaptive sampling for extreme data imbalance"를 실제 구현해서 적용해보고 이 때 발생할 수 있는 문제를 어떤식으로 해결할 수 있는지에 대한 tip도 제공할 예정입니다. (Python multi-processing data load, input-pipeline)
위 논문을 선정한 이유는, 단순한 classification이 아닌, nodule이 있는 위치도 정확하게 catch하는 논문 중, performance가 상당히 높기 때문입니다.
In this lecture we analyze graph oriented databases. In particular, we consider TtitanDB as graph database. We analyze how to query using gremlin and how to create edges and vertex.
Finally, we presents how to use rexster to visualize the storeg graph/
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll show you how to set up, configure, and initiate a replica set, and methods for using replication to scale reads. We'll also discuss proper architecture for durability.
Slides from my GeeCON 2014 Prague talk:
"Groovy is a dynamic language that provides different types of metaprogramming techniques. In this talk we’ll mainly see runtime metaprogramming. I’ll explain Groovy Meta-Object-Protocol (MOP), the metaclass, how to intercept method calls, how to deal with method missing and property missing, the use of mixins, traits and categories. All of these topics will be explained with examples in order to understand them.
Also, I’ll talk a little bit about compile-time metaprogramming with AST Transformations. AST Transformations provide a wonderful way of manipulating code at compile time via modifications of the Abstract Syntax Tree. We’ll see a basic but powerful example of what we can do with AST transformations."
The code is available at: https://github.com/lmivan/geecon2014-prague-metaprograming-with-groovy
Abstract:
Many machine learning algorithms can be implemented to run parallel operations on graphics cards. Deeplearning4j is a Java-based machine learning library, which includes implementations of many popular neural-network algorithms. Deeplearning4j uses uses a library called Nd4j to run matrix algebra operations on either CPUs or GPUs with NVIDIA’s CUDA API.
In this talk, I will show how to get a simple machine learning algorithm running on the GPU. I will also cover how to get started with CUDA development: how to get your code to run on the GPU, how to monitor the device, and how to write code to make effective use of parralelization.
Bio: Gary Sieling is a Lead Software Engineer at IQVIA, in Blue Bell, PA, with an interests in database technologies, machine learning, and software engineering practices. He has been involved in curating talks for a company lunch and learn program and the organizing committee for a tech conference. Building on these experiences, he built a search engine called FindLectures.com to help find great talks and speakers.
Video: https://www.youtube.com/watch?v=SfrEThI_m7g
Source code: https://github.com/Alotor/2015-greach-groovy-dsls
Behind each good Groovy library or framework there is a good DSL (Domain Specific Language). And this is not by chance, one of the most exciting features of Groovy is its amazing syntax flexibility and metaprogramming capabilities that allow us do things in a highly expressive manner through DSLs.
In this talk I’ll explain the basics of doing DLS’s with Groovy. What you’ll need to start and what to investigate deeper. Also, we’ll check some of the most well known ones libraries like Spock, Gradle or Grails so you can use their techniques in your own Groovy projects.
Each of the files or classes of a projects source code represents a tree (AST). Looking at dependencies to other classes besides inheritance creates a graph though. Field types and method parameters are also implicit dependencies. Storing this information in a graph database like Neo4j allows for interesting queries and insights. Class-Graph provides that and is available as open-source github project.
Aggregation pipeline has been able to power your analysis of data since version 2.2. In 4.2 we added more power and now you can use it for more powerful queries, updates, and outputting your data to existing collections. Come hear how you can do everything with the pipeline, including single-view, ETL, data roll-ups and materialized views.
MongoDB supports replication for failover and redundancy. In this session we will introduce the basic concepts around replica sets, which provide automated failover and recovery of nodes. We'll show you how to set up, configure, and initiate a replica set, and methods for using replication to scale reads. We'll also discuss proper architecture for durability.
Slides from my GeeCON 2014 Prague talk:
"Groovy is a dynamic language that provides different types of metaprogramming techniques. In this talk we’ll mainly see runtime metaprogramming. I’ll explain Groovy Meta-Object-Protocol (MOP), the metaclass, how to intercept method calls, how to deal with method missing and property missing, the use of mixins, traits and categories. All of these topics will be explained with examples in order to understand them.
Also, I’ll talk a little bit about compile-time metaprogramming with AST Transformations. AST Transformations provide a wonderful way of manipulating code at compile time via modifications of the Abstract Syntax Tree. We’ll see a basic but powerful example of what we can do with AST transformations."
The code is available at: https://github.com/lmivan/geecon2014-prague-metaprograming-with-groovy
Abstract:
Many machine learning algorithms can be implemented to run parallel operations on graphics cards. Deeplearning4j is a Java-based machine learning library, which includes implementations of many popular neural-network algorithms. Deeplearning4j uses uses a library called Nd4j to run matrix algebra operations on either CPUs or GPUs with NVIDIA’s CUDA API.
In this talk, I will show how to get a simple machine learning algorithm running on the GPU. I will also cover how to get started with CUDA development: how to get your code to run on the GPU, how to monitor the device, and how to write code to make effective use of parralelization.
Bio: Gary Sieling is a Lead Software Engineer at IQVIA, in Blue Bell, PA, with an interests in database technologies, machine learning, and software engineering practices. He has been involved in curating talks for a company lunch and learn program and the organizing committee for a tech conference. Building on these experiences, he built a search engine called FindLectures.com to help find great talks and speakers.
Video: https://www.youtube.com/watch?v=SfrEThI_m7g
Source code: https://github.com/Alotor/2015-greach-groovy-dsls
Behind each good Groovy library or framework there is a good DSL (Domain Specific Language). And this is not by chance, one of the most exciting features of Groovy is its amazing syntax flexibility and metaprogramming capabilities that allow us do things in a highly expressive manner through DSLs.
In this talk I’ll explain the basics of doing DLS’s with Groovy. What you’ll need to start and what to investigate deeper. Also, we’ll check some of the most well known ones libraries like Spock, Gradle or Grails so you can use their techniques in your own Groovy projects.
Each of the files or classes of a projects source code represents a tree (AST). Looking at dependencies to other classes besides inheritance creates a graph though. Field types and method parameters are also implicit dependencies. Storing this information in a graph database like Neo4j allows for interesting queries and insights. Class-Graph provides that and is available as open-source github project.
Presented at a guest lecture at the Rijksuniversiteit Groningen as part of the web and cloud computing master course.
I presented a architecture for and working implementation of doing Hadoop based typeahead style search suggestions. There is a companion github repo with the code and config at: https://github.com/friso/rug (there's no documentation, though).
Unidirectional Security, Andrew Ginter of Waterfall Security Digital Bond
This presentation reviews the spectrum of perimeter solutions based on unidirectional technology - solutions that are being deployed to protect the safety and reliability of industrial control systems. Learn why the technology is truly unidirectional based on physics and different ways it can be used in SCADA and DCS.
Many practitioners find parts of the spectrum to be counter-intuitive. Further, some parts of the spectrum are straightforward to deploy, and others require that practitioners take some care to ensure that the results really are as strong as they should be. Technologies and techniques covered include unidirectional gateways, secure bypass, temporary/programmed gateway reversals, opposing gateways, secure remote access, and parallel operations and IT WANs.
Sasha Goldshtein's talk at the SELA Developer Practice (May 2013) that explains the most common vulnerabilities in web applications and demonstrates how to exploit them and how to defend applications against these attacks. Among the topics covered: SQL and OS command injection, XSS, CSRF, insecure session cookies, insecure password storage, and security misconfiguration.
Software development manager performance appraisalmartinjack417
Software development manager job description,Software development manager goals & objectives,Software development manager KPIs & KRAs,Software development manager self appraisal
Python là ngôn ngữ lập trình đơn giản và đang càng này càng trở lên phổ biến. Bài giảng này cung cấp cách tiếp cận đơn giản dễ hiểu với python một cách dễ dàng nhất
Highlights a bunch of different Python tricks and tips - from the stupid to the awesome (and a bit of both).
See how to register a 'str'.decode('hail_mary') codec, call_functions[1, 2, 3] instead of call_functions(1, 2, 3), creating a "Clojure-like" threading syntax by overloading the pipe operator, create useful equality mocks by overloading the equality operator, ditch JSON for pySON and put together a tiny lisp based on Norvig's awesome article.
Brief overview of the Rust system programming language. Provides a concise introduction of its basic features, with an emphasis on its memory safety features (ownership, moves, borrowing) and programming style with generic functions, structures, and traits.
GECon2017_Cpp a monster that no one likes but that will outlast them all _Ya...GECon_Org Team
1. A bit of C++ evolution history.
2. Why C++ became popular and why C++ started to lose its popularity.
3. Why C++ is a Monster? If C ++ is a monster, then why does 4. someone need C++?
5. How to use C++ safely and efficiently?
6. Where and when C++ can be used. If it can...
7. Pressure to C++ from competitors. Real and imaginary.
Similar to Network analysis with Hadoop and Neo4j (20)
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIVladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Climate Impact of Software Testing at Nordic Testing DaysKari Kakkonen
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing discussed on the talk. ICT and testing must carry their part of global responsibility to help with the climat warming. We can minimize the carbon footprint but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be added with sustainability, and then measured continuously. Test environments can be used less, and in smaller scale and on demand. Test techniques can be used in optimizing or minimizing number of tests. Test automation can be used to speed up testing.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
8. TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
raw
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
transform nodes.txt
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
data +
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
edges.txt
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
en
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21
ric
h
?
import enriched-nodes.txt
+
enriched-edges.txt
qu
int er
er y /
ac
t
11. nodes.txt:
1 AS1 LVLT-1 - Level 3 Communications, Inc.
10 AS10 CSNET-EXT-AS - CSNET Coordination and Information Center (CSNET-CIC)
100 AS100 FMC-CTC - FMC Central Engineering Laboratories
1000 AS1000 GONET-ASN-17 - GONET
10000 AS10000 NCM Nagasaki Cable Media Inc.
10001 AS10001 MICSNET Mics Network Corporation
10002 AS10002 ICT IGAUENO CABLE TELEVISION CO.,LTD
10003 AS10003 OCT-NET Ogaki Cable Television Co.,Inc.
10004 AS10004 AS-PHOENIX-J JIN Office Service Inc.
10006 AS10006 SECOMTRUST SECOM Trust Systems Co.,Ltd.
10010 AS10010 TOKAI TOKAI Communications Corporation
10011 AS10011 ADVAN advanscope.inc
10012 AS10012 FUSION Fusion Communications Corp.
edges.txt:
1 21616 3
1 3705 3
1 2 3
2 3 1
3 4 2
3 11488 2
4 5 1
10 10 2
10 13227 2
12 12 1
12. public class SillyImporter {
private static enum ConnectionTypes implements RelationshipType {
FOLLOWS;
}
public static void main(String[] args) {
BatchInserter database = new BatchInserterImpl("/Users/friso/Desktop/graph.db");
BatchInserterIndexProvider provider = new LuceneBatchInserterIndexProvider(database);
BatchInserterIndex index = provider.nodeIndex("allnodes", stringMap(
"type", "fulltext",
IndexManager.PROVIDER, "lucene"
));
long fzkNodeId = database.createNode(map(
new Object[] {"name", "fzk", "tweets", 25L} //node properties
));
index.add(fzkNodeId, map(new Object[] {"name", "fzk"}));
long krisgeusNodeId = database.createNode(map(
new Object[] {"name", "krisgeus", "tweets", 100L} //node properties
));
index.add(krisgeusNodeId, map(new Object[] {"name", "krisgeus"}));
database.createRelationship(
krisgeusNodeId, //from node
fzkNodeId, //to node
ConnectionTypes.FOLLOWS, //relationship type
map(new Object[] {"retweets", 3})); //relationship properties
index.flush();
provider.shutdown();
database.shutdown();
}
}
13. 30M nodes + 250M edges, < 30 minutes
(if your graph fits in memory)
14. Cypher graph query language
start a = node:allnodes(‘name:”fzk”’)
match p = a-[r]-b
return p
All relationships from node with name=fzk
15. Cypher graph query language
start a = node:allnodes(‘name:”fzk”’)
match p = a-[r]-b
where any(x in r.amounts where x > 500)
return p
All relationships from node with name=fzk, where any
element in the array property amounts of the relationship
is greater than 500
16. Cypher graph query language
start a = node:allnodes(‘name:”fzk”’),
b = node:allnodes(‘name:”krisgeus”’)
match p = a-[:FOLLOWS*1..3]-b
return p
All paths with a length between 1 and 3 (inclusive) between
node with name=fzk and node with name=krisgeus with
any relationship type=FOLLOWS
17. Cypher graph query language
start a = node:allnodes(‘name:”fzk”’),
b = node:allnodes(‘name:”krisgeus”’)
match p = a<-[:FOLLOWS]-x-[:FOLLOWS]->b
return x
All people who follow both fzk and krisgeus. Note the
indication of direction in the relationship predicates.