Titan is an open source distributed graph database build on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Graphs are a versatile data model for capturing and analyzing rich relational structures. Graphs are an increasingly popular way to represent data in a wide range of domains such as social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.
This presentation discusses Titan's data model, query language, and novel techniques in edge compression, data layout, and vertex-centric indices which facilitate the representation and processing of Big Graph Data across a Cassandra cluster. We demonstrate Titan's performance on a large scale benchmark evaluation using Twitter data.
Presented at the Cassandra 2012 Summit.
Adding Value through graph analysis using Titan and FaunusMatthias Broecheler
In this presentation we discuss how graph analysis can add value to your data and how to use open source tools like Titan and Faunus to build scalable graph processing systems.
This presentation gives an update on the development status of Titan and Faunus with a preview of what is to come.
The problems we are faced with in the 21st century require efficient analysis of ever more complex systems. This presentation outlines how such problems can be better understood and effectively solved if they are modeled as graphs or networks. We present two tools for to help solve such problems at scale: Titan, which is a real-time distributed graph database based on Apache Cassandra and Hbase and Faunus, which is a batch analytics framework for graphs based on Apache Hadoop. We discuss their current development status as of November 2012 and illustrate an example application for the GitHub coding network.
A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.
An introduction to graph databases and graph computing frameworks in general and overview of the Aurelius graph cluster in particular. Discusses Titan and Faunus and demonstrates how to build a knowledge graph using the cluster.
This presentation was given at Data Day Texas in 2013. http://datadaytexas.com/
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland.
ABSTRACT
There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Myths and Mathemagical Superpowers of Data ScientistsDavid Pittman
Some people think data scientists are mythical beings, like unicorns, or they are some sort of nouveau fad that will quickly fade. Not true, says IBM big data evangelist James Kobielus. In this engaging presentation, with artwork created by Angela Tuminello, Kobielus debunks 10 myths about data scientists and their role in analytics and big data. You might also want to read the full blog by Kobielus that spawned this presentation: "Data Scientists: Myths and Mathemagical Superpowers" - http://ibm.co/PqF7Jn
For more information, visit http://www.ibmbigdatahub.com
Adding Value through graph analysis using Titan and FaunusMatthias Broecheler
In this presentation we discuss how graph analysis can add value to your data and how to use open source tools like Titan and Faunus to build scalable graph processing systems.
This presentation gives an update on the development status of Titan and Faunus with a preview of what is to come.
The problems we are faced with in the 21st century require efficient analysis of ever more complex systems. This presentation outlines how such problems can be better understood and effectively solved if they are modeled as graphs or networks. We present two tools for to help solve such problems at scale: Titan, which is a real-time distributed graph database based on Apache Cassandra and Hbase and Faunus, which is a batch analytics framework for graphs based on Apache Hadoop. We discuss their current development status as of November 2012 and illustrate an example application for the GitHub coding network.
A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.
An introduction to graph databases and graph computing frameworks in general and overview of the Aurelius graph cluster in particular. Discusses Titan and Faunus and demonstrates how to build a knowledge graph using the cluster.
This presentation was given at Data Day Texas in 2013. http://datadaytexas.com/
How To Interview a Data Scientist
Daniel Tunkelang
Presented at the O'Reilly Strata 2013 Conference
Video: https://www.youtube.com/watch?v=gUTuESHKbXI
Interviewing data scientists is hard. The tech press sporadically publishes “best” interview questions that are cringe-worthy.
At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. And, when we test coding and algorithmic problem solving, we do it with real problems that we’ve faced in the course of our day jobs. In general, we try as hard as possible to make the interview process representative of actual work.
In this session, I’ll offer general principles and concrete examples of how to interview data scientists. I’ll also touch on the challenges of sourcing and closing top candidates.
Presentation given by Dr. Diego Kuonen, CStat PStat CSci, on November 20, 2013, at the "IBM Developer Days 2013" in Zurich, Switzerland.
ABSTRACT
There is no question that big data has hit the business, government and scientific sectors. The demand for skills in data science is unprecedented in sectors where value, competitiveness and efficiency are driven by data. However, there is plenty of misleading hype around the terms big data and data science. This presentation gives a professional statistician's view on these terms and illustrates the connection between data science and statistics.
The presentation is also available at http://www.statoo.com/BigDataDataScience/.
Myths and Mathemagical Superpowers of Data ScientistsDavid Pittman
Some people think data scientists are mythical beings, like unicorns, or they are some sort of nouveau fad that will quickly fade. Not true, says IBM big data evangelist James Kobielus. In this engaging presentation, with artwork created by Angela Tuminello, Kobielus debunks 10 myths about data scientists and their role in analytics and big data. You might also want to read the full blog by Kobielus that spawned this presentation: "Data Scientists: Myths and Mathemagical Superpowers" - http://ibm.co/PqF7Jn
For more information, visit http://www.ibmbigdatahub.com
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process looks like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure London 25/01/13
We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people we have identified three factors that greatly facilitate learning R. For a quick start:
- Find a way of orienting yourself in the open source R world
- Have a definite application area in mind
- Set an initial goal of doing something useful and then build on it
In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:
- Provide an orientation to R’s data mining resources
- Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data.
- Show the simple R commands to accomplish these same tasks without the GUI
- Demonstrate how to build on these fundamental skills to gain further competence in R
- Move away from using small test data sets and show with the same level of skill one could analyze some fairly large data sets with RevoScaleR
Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.
Intro to Data Science for Enterprise Big DataPaco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide plus some great references to read for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
Tutorial on Deep learning and ApplicationsNhatHai Phan
In this presentation, I would like to review basis techniques, models, and applications in deep learning. Hope you find the slides are interesting. Further information about my research can be found at "https://sites.google.com/site/ihaiphan/."
NhatHai Phan
CIS Department,
University of Oregon, Eugene, OR
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Introduction to Mahout and Machine LearningVarad Meru
This presentation gives an introduction to Apache Mahout and Machine Learning. It presents some of the important Machine Learning algorithms implemented in Mahout. Machine Learning is a vast subject; this presentation is only a introductory guide to Mahout and does not go into lower-level implementation details.
This talk is about how we applied deep learning techinques to achieve state-of-the-art results in various NLP tasks like sentiment analysis and aspect identification, and how we deployed these models at Flipkart
Workshop with Joe Caserta, President of Caserta Concepts, at Data Summit 2015 in NYC.
Data science, the ability to sift through massive amounts of data to discover hidden patterns and predict future trends and actions, may be considered the "sexiest" job of the 21st century, but it requires an understanding of many elements of data analytics. This workshop introduced basic concepts, such as SQL and NoSQL, MapReduce, Hadoop, data mining, machine learning, and data visualization.
For notes and exercises from this workshop, click here: https://github.com/Caserta-Concepts/ds-workshop.
For more information, visit our website at www.casertaconcepts.com
How to Become a Data Scientist
SF Data Science Meetup, June 30, 2014
Video of this talk is available here: https://www.youtube.com/watch?v=c52IOlnPw08
More information at: http://www.zipfianacademy.com
Zipfian Academy @ Crowdflower
Titan is a scalable graph database that can distribute and query graph data across multiple machines. This presentation provides a general introduction to graph computing and Titan in particular. It also focuses on some recent development for Titan 0.9 and TinkerPop 3.
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process looks like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure London 25/01/13
We at Revolution Analytics are often asked “What is the best way to learn R?” While acknowledging that there may be as many effective learning styles as there are people we have identified three factors that greatly facilitate learning R. For a quick start:
- Find a way of orienting yourself in the open source R world
- Have a definite application area in mind
- Set an initial goal of doing something useful and then build on it
In this webinar, we focus on data mining as the application area and show how anyone with just a basic knowledge of elementary data mining techniques can become immediately productive in R. We will:
- Provide an orientation to R’s data mining resources
- Show how to use the "point and click" open source data mining GUI, rattle, to perform the basic data mining functions of exploring and visualizing data, building classification models on training data sets, and using these models to classify new data.
- Show the simple R commands to accomplish these same tasks without the GUI
- Demonstrate how to build on these fundamental skills to gain further competence in R
- Move away from using small test data sets and show with the same level of skill one could analyze some fairly large data sets with RevoScaleR
Data scientists and analysts using other statistical software as well as students who are new to data mining should come away with a plan for getting started with R.
Intro to Data Science for Enterprise Big DataPaco Nathan
If you need a different format (PDF, PPT) instead of Keynote, please email me: pnathan AT concurrentinc DOT com
An overview of Data Science for Enterprise Big Data. In other words, how to combine structured and unstructured data, leveraging the tools of automation and mathematics, for highly scalable businesses. We discuss management strategy for building Data Science teams, basic requirements of the "science" in Data Science, and typical data access patterns for working with Big Data. We review some great algorithms, tools, and truisms for building a Data Science practice, and provide plus some great references to read for further study.
Presented initially at the Enterprise Big Data meetup at Tata Consultancy Services, Santa Clara, 2012-08-20 http://www.meetup.com/Enterprise-Big-Data/events/77635202/
Tutorial on Deep learning and ApplicationsNhatHai Phan
In this presentation, I would like to review basis techniques, models, and applications in deep learning. Hope you find the slides are interesting. Further information about my research can be found at "https://sites.google.com/site/ihaiphan/."
NhatHai Phan
CIS Department,
University of Oregon, Eugene, OR
Data By The People, For The People
Daniel Tunkelang
Director, Data Science at LinkedIn
Invited Talk at the 21st ACM International Conference on Information and Knowledge Management (CIKM 2012)
LinkedIn has a unique data collection: the 175M+ members who use LinkedIn are also the content those same members access using our information retrieval products. LinkedIn members performed over 4 billion professionally-oriented searches in 2011, most of those to find and discover other people. Every LinkedIn search and recommendation is deeply personalized, reflecting the user's current employment, career history, and professional network. In this talk, I will describe some of the challenges and opportunities that arise from working with this unique corpus. I will discuss work we are doing in the areas of relevance, recommendation, and reputation, as well as the ecosystem we have developed to incent people to provide the high-quality semi-structured profiles that make LinkedIn so useful.
Bio:
Daniel Tunkelang leads the data science team at LinkedIn, which analyzes terabytes of data to produce products and insights that serve LinkedIn's members. Prior to LinkedIn, Daniel led a local search quality team at Google. Daniel was a founding employee of faceted search pioneer Endeca (recently acquired by Oracle), where he spent ten years as Chief Scientist. He has authored fourteen patents, written a textbook on faceted search, created the annual workshop on human-computer interaction and information retrieval (HCIR), and participated in the premier research conferences on information retrieval, knowledge management, databases, and data mining (SIGIR, CIKM, SIGMOD, SIAM Data Mining). Daniel holds a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Introduction to Mahout and Machine LearningVarad Meru
This presentation gives an introduction to Apache Mahout and Machine Learning. It presents some of the important Machine Learning algorithms implemented in Mahout. Machine Learning is a vast subject; this presentation is only a introductory guide to Mahout and does not go into lower-level implementation details.
This talk is about how we applied deep learning techinques to achieve state-of-the-art results in various NLP tasks like sentiment analysis and aspect identification, and how we deployed these models at Flipkart
Workshop with Joe Caserta, President of Caserta Concepts, at Data Summit 2015 in NYC.
Data science, the ability to sift through massive amounts of data to discover hidden patterns and predict future trends and actions, may be considered the "sexiest" job of the 21st century, but it requires an understanding of many elements of data analytics. This workshop introduced basic concepts, such as SQL and NoSQL, MapReduce, Hadoop, data mining, machine learning, and data visualization.
For notes and exercises from this workshop, click here: https://github.com/Caserta-Concepts/ds-workshop.
For more information, visit our website at www.casertaconcepts.com
How to Become a Data Scientist
SF Data Science Meetup, June 30, 2014
Video of this talk is available here: https://www.youtube.com/watch?v=c52IOlnPw08
More information at: http://www.zipfianacademy.com
Zipfian Academy @ Crowdflower
Titan is a scalable graph database that can distribute and query graph data across multiple machines. This presentation provides a general introduction to graph computing and Titan in particular. It also focuses on some recent development for Titan 0.9 and TinkerPop 3.
Problem solving in the 21st century increasingly depends on the analysis of complex systems. Developing new drugs, understanding risk in financial networks, searching for answers in knowledge graphs, personalization and recommendation in social networks all require the analysis of systems composed of interconnected entities that exhibit complex behavior as a whole. Graph computing provides a conceptual model and practical platform for developing such analyses.
This talk presents graph computing as an important component of every developer’s toolbox. We introduce the Aurelius graph cluster which is an open-source stack enabling graph computing at scale by building on distributed systems like Cassandra, HBase, and Hadoop. This stack addresses challenging problems in graph partitioning, graph query language design and graph algorithm development with solutions inspired by physics, biology and neuroscience.
This presentation introduces Titan, Faunus, and scalable graph computing in general. We present a case study of how Pearson builds an education social network on top of Titan, Faunus, and Cassandra to support learning in the 21st century.
Titan is an open source distributed graph database build on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Faunus is an open source global graph processing engine build on top of Hadoop and compatible with Cassandra that can analyze graphs, compute graph statistics, and execute global traversals. Titan and Faunus are components of the Aurelius Graph Cluster which enables scalable graph computation and powers applications in social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksMatthias Broecheler
Users querying massive social networks or RDF databases are often not 100% certain about what they are looking for due to the complexity of the query or heterogeneity of the data. In this paper, we propose “probabilistic subgraph” (PS) queries over a graph/network database, which afford users great flexibility in specifying “approximately” what they are looking for. We formally define the probability that a substitution satisfies a PS-query with respect to a graph database. We then present the PMATCH algorithm to answer such queries and prove its correctness. Our experimental evaluation demonstrates that PMATCH is efficient and scales to massive social networks with over a billion edges.
Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficency. Continuous MRFs are a general purpose probabilistic modeling tool and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksMatthias Broecheler
Slides presenting our work on COSI at the ASONAM conference 2010
Note: The images used in this presentation are copyright by the respective owners as indicated with the picture. Pictures used are either CC or fair use. Please notify the author if you feel that your images are unfairly used in this presentation.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
PHP Frameworks: I want to break free (IPC Berlin 2024)
Titan: Big Graph Data with Cassandra
1. TITAN
BIG GRAPH DATA WITH CASSANDRA
#TITANDB #GRAPHDB #CASSANDRA12
Matthias Broecheler, CTO AURELIUS
August VIII, MMXII THINKAURELIUS.COM
2. Abstract
Titan is an open source distributed graph database build on top of
Cassandra that can power real-time applications with thousands of
concurrent users over graphs with billions of edges. Graphs are a versatile
data model for capturing and analyzing rich relational structures. Graphs
are an increasingly popular way to represent data in a wide range of
domains such as social networking, recommendation engines,
advertisement optimization, knowledge representation, health care,
education, and security.
This presentation discusses Titan's data model, query language, and novel
techniques in edge compression, data layout, and vertex-centric indices
which facilitate the representation and processing of Big Graph Data
across a Cassandra cluster. We demonstrate Titan's performance on a
large scale benchmark evaluation using Twitter data.
3. Titan Graph Database
supports real time local traversals (OLTP)
is highly scalable
in the number of concurrent users
in the size of the graph
is open source under the Apache2 license
builds on top of Apache Cassandra for
distribution and replication
16. name: “Muscle building for beginners”
name: Hercules
type: book
bought
bought
name: Newton
17. name: “Muscle building for beginners”
name: Hercules
type: book
bought
bought
name: “How to deal with Father issues”
name: Newton
type: book
bought
18. name: “Muscle building for beginners”
name: Hercules
type: book
bought
recommend
bought
name: “How to deal with Father issues”
name: Newton
type: book
bought
Traversal
19. name: “Dancing with the Stars”
type: DVD
in-Cart
name: “Muscle building for beginners”
name: Hercules
type: book
bought
bought
name: “How to deal with Father issues”
name: Newton
type: book
bought
viewed
name: “Friends forever bracelet”
type: Accessory
20. name: “Dancing with the Stars”
type: DVD
in-Cart
name: “Muscle building for beginners”
name: Hercules
type: book
bought
friends
bought
name: “How to deal with Father issues”
name: Newton
type: book
bought
viewed
name: “Friends forever bracelet”
type: Accessory
21. name: “Dancing with the Stars”
type: DVD
in-Cart
name: “Muscle building for beginners”
name: Hercules
type: book
bought
time:24
friends
bought
time:22
name: “How to deal with Father issues”
name: Newton
type: book
bought
time:20
viewed
name: “Friends forever bracelet”
type: Accessory
23. name: Neptune
name: Alcmene
type: god
type: god
X
brother
mother
name: Saturn
name: Jupiter
name: Hercules
type: titan
type: god
type: demigod
X
father
father
battled
brother
time:12
name: Pluto
name: Cerberus
type: god
type: monster
pet
Path
Finding
24. name: Neptune
name: Alcmene
type: god
type: god
X
brother
mother
name: Saturn
name: Jupiter
name: Hercules
type: titan
type: god
type: demigod
X
father
father
battled
brother
time:12
name: Pluto
name: Cerberus
type: god
type: monster
pet
Path
Finding
31. Titan Features
numerous concurrent users
real-time traversals (OLTP)
high availability
dynamic scalability
built on Apache Cassandra
32. Titan Ecosystem
Native Blueprints Graph
Server
Implementation
Graph
Algorithms
Gremlin Query Object-Graph
Mapper
Language
Traversal
Language
Rexster Server
Dataflow
Processing
any Titan graph can be exposed Generic
Graph API
as a REST endpoint
37. Titan Storage Model
Adjacency list in one 5
column family
Row key = vertex id
Each property and edge
5
in one column
Denormalized, i.e. stored twice
Direction and label/key as column prefix
Use slice predicate for quick retrieval
38. Connecting Titan
titan$ bin/gremlin.sh!
,,,/!
(o o)!
-----oOOo-(_)-oOOo-----!
gremlin> conf = new BaseConfiguration();!
==>org.apache.commons.configuration.BaseConfiguration@763861e6!
gremlin> conf.setProperty("storage.backend","cassandra");!
gremlin> conf.setProperty("storage.hostname","77.77.77.77");!
gremlin> g = TitanFactory.open(conf);
==>titangraph[cassandra:77.77.77.77]!
gremlin>!
40. Defining Property Keys
Each type has a unique name
gremlin> g.makeType().name(“time”).! The allowed data type
! ! dataType(Long.class).!
! ! functional().! If a key is functional, each vertex can
! ! makePropertyKey();! have at most one property for this key
gremlin> g.makeType().name(“text”).dataType(String.class).!
! ! functional().makePropertyKey();!
gremlin> g.makeType().name(“name”).dataType(String.class).!
! ! indexed().!
! ! unique().!
! ! functional().makePropertyKey();!
41. Defining Property Keys
gremlin> g.makeType().name(“time”).!
! ! dataType(Long.class).!
! ! functional().!
! ! makePropertyKey();!
gremlin> g.makeType().name(“text”).dataType(String.class).!
! ! functional().makePropertyKey();!
gremlin> g.makeType().name(“name”).dataType(String.class).!
! ! indexed().! Creates and maintains an index over property values
! ! unique().!
Ensures that each property value is uniquely
! ! functional().makePropertyKey();! associated with only one vertex by acquiring a lock.
42. Titan Indexing
Vertices can be retrieved by
property key + value
name : Hercules
5
Titan maintains index in a
name : Jupiter
9
separate column family as
graph is updated
Only need to define a
property key as .index()
43. Titan Locking
Locking ensures consistency
when it is needed
name : Hercules
5
Titan uses time stamped
quorum reads and writes on 9
separate CFs for locking
Uses
name :
name :
Jupiter
Hercules
Property uniqueness: .unique()
father
Functional edges: .functional()
Global ID management
x
name :
father
Pluto
46. Defining Edge Labels
gremlin> g.makeType().name(“follows”).!
! ! primaryKey(time).!
! ! makeEdgeLabel();!
gremlin> g.makeType().name(“tweets”).!
! ! primaryKey(time).makeEdgeLabel();!
gremlin> g.makeType().name(“stream).!
! ! primaryKey(time).!
! ! unidirected().!
Store edges of this label only in outgoing direction
! ! makeEdgeLabel();!
47. Vertex-Centric Indices
Sort and index edges per
vertex by primary key
Primary key can be composite
Enables efficient focused
traversals
Only retrieve edges that matter
Uses slice predicate for quick,
index-driven retrieval
59. Twitter Benchmark
1.47 billion followship edges
and 41.7 million users
Loaded into Titan using BatchGraph
Twitter in 2009, crawled by Kwak et. al
4 Transaction Types
Create Account (1%)
Publish tweet (15%)
Read stream (76%)
Recommendation (8%)
Kwak, H., Lee, C., Park, H., Moon, S., “What is
Follow recommended user (30%)
Twitter, a Social Network or a News Media?,”
World Wide Web Conference, 2010.
60. Benchmark Setup
6 cc1.4xl Cassandra nodes
in one placement group
Cassandra 1.10
40 m1.small worker machines
repeatedly running transactions
simulating servers handling user
requests
EC2 cost: $11/hour
61. Benchmark Results
Transaction Type
Number of tx
Mean tx time
Std of tx time
Create account
379,019
115.15 ms
5.88 ms
Publish tweet
7,580,995
18.45 ms
6.34 ms
Read stream
37,936,184
6.29 ms
1.62 ms
Recommendation
3,793,863
67.65 ms
13.89 ms
Total
49,690,061
Runtime
2.3 hours
5,900 tx/sec
62. Peak Load Results
Transaction Type
Number of tx
Mean tx time
Std of tx time
Create account
374,860
172.74 ms
10.52 ms
Publish tweet
7,517,667
70.07 ms
19.43 ms
Read stream
37,618,648
24.40 ms
3.18 ms
Recommendation
3,758,266
229.83 ms
29.08 ms
Total
49,269,441
Runtime
1.3 hours
10,200 tx/sec
63. Benchmark Conclusion
Titan can handle 10s of thousands of concurrent users
with short response 5mes even for complex traversals
on a simulated social networking applica5on based on
real-‐world network data with billions of edges and
millions of users in a standard EC2 deployment.
For more informa5on on the benchmark:
hDp://thinkaurelius.com/2012/08/06/5tan-‐provides-‐real-‐5me-‐big-‐graph-‐data/
64. Future Titan
Titan+Cassandra embedding
sending Gremlin queries into
the cluster
Graph partitioning together
with ByteOrderedPartitioner
data locality = better performance
Let us know what you need!
65. Titan goes OLAP
Map/Reduce
Load & Compress
Analysis results
back into Titan
Stores a massive-scale Batch processing of large Runs global graph algorithms
property graph allowing real- graphs with Hadoop
on large, compressed,
time traversals and updates
in-memory graphs