Presented at Open Camps (Database Camp) in New York City on November 19, 2017. http://www.db.camp/2017/presentations/graph-computing-with-apache-tinkerpop
Exploring Graph Use Cases with JanusGraph (Jason Plurad)
Graph databases are relative newcomers in the NoSQL database landscape. What are some graph model and design considerations when choosing a graph database in your architecture? Let's take a tour of a couple of graph use cases that we've collaborated on recently with our clients to help you better understand how and why a graph database can be integrated to help solve problems found with connected data. Presented at DataWorks Summit San Jose - IBM Meetup on June 18, 2018.
https://www.meetup.com/BigDataDevelopers/events/251307524/
Community-Driven Graphs with JanusGraph (Jason Plurad)
Presented at Open Camps (Database Camp, Search Camp) in New York City on November 19, 2017. http://www.searchcamp.io/2017/presentations/community-driven-graphs-with-janusgraph
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in the latest release of JanusGraph, and then take a look at future directions to explore together with the open community. Presented on October 18, 2017 at the Graph Technologies Meetup in Santa Clara, CA. https://www.meetup.com/_CAIDI/events/243122187/
Presented at the Linked Data Benchmark Council (LDBC) Technical User Group (TUG) Meeting on June 8, 2018. http://www.ldbcouncil.org/blog/11th-tuc-meeting-university-texas-austin
Start Flying with Python & Apache TinkerPop (Jason Plurad)
This document summarizes a presentation about using Python and Apache TinkerPop to work with graph databases. It discusses Gremlin, a graph traversal language, and how Gremlin has been incorporated into Python through Gremlin-Python. It provides an example of building a small web application and APIs to work with an air routes graph stored in a graph database, and deploying the application and database to the cloud.
Airline Reservations and Routing: A Graph Use Case (Jason Plurad)
We've all been there before... you hear the announcement that your flight is canceled. Fellow passengers race to the gate agent to rebook on the next available flight. How do they quickly determine the best route from Berlin to San Francisco? Ultimately the flight route network is best solved as a graph problem. We will discuss our lessons learned from working with a major airline to solve this problem using the JanusGraph database. JanusGraph is an open source graph database designed for massive scale. It is compatible with several pieces of the open source big data stack: Apache TinkerPop (graph computing framework), HBase, Cassandra, and Solr. We will go into depth about our approach to benchmarking graph performance and discuss the utilities we developed. We will share our comparison results for evaluating which storage backend to use with JanusGraph. Whether you are productizing a new database or you are a frustrated traveler, a fast resolution is needed to satisfy everybody involved. Presented at DataWorks Summit Berlin on April 18, 2018.
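To make the "route network as a graph problem" idea concrete, here is a minimal sketch in plain Python rather than JanusGraph or Gremlin. It is not the approach from the talk: the route map is hypothetical illustration data, and "best" is assumed to mean "fewest flights", found with a breadth-first search.

```python
from collections import deque

# Hypothetical route map for illustration only; not real schedules.
ROUTES = {
    "BER": ["FRA", "LHR"],
    "FRA": ["BER", "JFK", "SFO"],
    "LHR": ["BER", "JFK"],
    "JFK": ["FRA", "LHR", "SFO"],
    "SFO": ["FRA", "JFK"],
}


def fewest_hops(routes, origin, destination):
    """Return one path with the fewest flights, or None if unreachable."""
    queue = deque([[origin]])
    visited = {origin}
    while queue:
        path = queue.popleft()
        airport = path[-1]
        if airport == destination:
            return path
        for nxt in routes.get(airport, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def without_airport(routes, closed):
    """Copy the route map with one airport removed (assumed cancellation model)."""
    return {
        airport: [dest for dest in dests if dest != closed]
        for airport, dests in routes.items()
        if airport != closed
    }


if __name__ == "__main__":
    print(fewest_hops(ROUTES, "BER", "SFO"))
    # Rebooking when the usual connection is unavailable:
    print(fewest_hops(without_airport(ROUTES, "FRA"), "BER", "SFO"))
```

A graph database expresses the same traversal as a query over stored vertices and edges, which matters once the network no longer fits comfortably in a single in-memory dictionary.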
One of the first problems a developer encounters when evaluating a graph database is how to construct a graph efficiently. Recognizing this need in 2014, TinkerPop's Stephen Mallette penned a series of blog posts titled "Powers of Ten" which addressed several bulkload techniques for Titan. Since then Titan has gone away, and the open source graph database landscape has evolved significantly. Do the same approaches stand the test of time? In this session, we will take a deep dive into strategies for loading data of various sizes into modern Apache TinkerPop graph systems. We will discuss bulkloading with JanusGraph, the scalable graph database forked from Titan, to better understand how its architecture can be optimized for ingestion. Presented at Data Day Texas on January 27, 2018.
Graph Computing with JanusGraph. Presented at Cleveland Big Data Mega Meetup on September 11, 2017. https://www.meetup.com/Cleveland-Hadoop/events/241553826/
JanusGraph: What's Next, Project Status Update. Presented at Open Source Graph Technologies NYC Meetup on August 24, 2017. https://www.meetup.com/graphs/events/241136321/
JanusGraph: Looking Backward and Reaching Forward (Demai Ni)
JanusGraph: Looking Backward and Reaching Forward - by Jason Plurad (@pluradj):
The JanusGraph project started at the Linux Foundation earlier this year, but it is not the new kid on the block. We'll start with a look at the origins and evolution of this open source graph database through the lens of a few IBM graph use cases. We'll discuss the new features in the latest release of JanusGraph, and then take a look at future directions to explore together with the open community.
This document provides an overview of large scale graph analytics and JanusGraph. It discusses graph databases and their use cases. JanusGraph is presented as an open source graph database that can scale to billions of vertices and edges across multiple storage backends like HBase, Cassandra and Bigtable. It uses the TinkerPop framework and Gremlin query language. JanusGraph supports ACID transactions, external indices, and evolving schemas. Example graph queries are demonstrated using the Gremlin console.
Margriet Groenendijk gave a presentation on data science in the cloud. She discussed her background working with large datasets and using tools like Python, Spark, R, and IBM's cloud services. She then outlined the typical data science workflow of collecting and storing data, exploring and cleaning it, creating predictive models, and presenting results. Finally, she demonstrated an example of analyzing weather and Twitter sentiment data using various IBM cloud tools.
Data Science covers the complete workflow from defining a question, finding the most suitable data source, identifying the right tools and finally presenting the best possible answer in a clear, engaging manner. But it all starts with having access to the data. In these slides I will walk you through some examples of how to collect, store and access data in the Cloud with the use of different APIs.
This document introduces GraphQL, comparing it to REST. It discusses GraphQL concepts like queries, mutations, and subscriptions. It provides examples of GraphQL queries. It also demonstrates how to implement GraphQL in Node.js and lists GraphQL libraries and third-party services. Live demos are linked to show GraphQL usage.
Learn about the core functions and architecture of Zentral. Zentral is an open source hub to process event streams from osquery and other sources into the ElasticStack. Besides support for distinct osquery features like file carving, Zentral provides numerous integrations for inventory acquisition and alerting.
In the next five years, 15 to 40 billion additional connected devices are expected to hit the market. How can we handle such volumes and velocity of data?
Introduction to Dynamo storage systems, Riak, Cassandra, time series databases and edge analytics.
FUTURESTACK13: Software analytics with Project Rubicon from Alex Kroman Engin... (New Relic)
The document discusses Project Rubicon, a software analytics tool from New Relic. It summarizes Rubicon's ability to capture raw event data from applications, allowing users to ask complex questions. It then demonstrates how to write NRQL queries to analyze metrics like page views and custom events over time. NRQL makes it easy to aggregate large amounts of data through functions, time windows, time series, and facets. The document also provides an overview of Rubicon's architecture and how it handles billions of events through techniques like using memory efficiently and building for failure.
This document discusses Apache Spark, a framework for large-scale data processing. It encourages attendees to learn Spark through free online resources and IBM's partnership with universities. It also presents a toy example analyzing Dutch police and sales data to see if item advertisements predict crime in neighborhoods. The document promotes IBM's goal of educating 1 million data scientists on Spark and announces an upcoming Spark hackathon in the Netherlands.
Python and H2O with Cliff Click at PyData Dallas 2015 (Sri Ambati)
This document discusses H2O.ai, an open source in-memory machine learning platform. It can perform distributed machine learning on large datasets using algorithms like generalized linear modeling, gradient boosted machines, random forests, and deep learning. The platform provides APIs and interfaces for R, Python, Scala, Spark, and other languages. It can handle big data from sources like HDFS, S3, and NFS without sampling. The document includes an overview of H2O's architecture and demonstrates its use on a bike sharing dataset with over 10 million rows.
This document discusses Apache Spark, a framework for large-scale data processing. It summarizes that Spark is fast, has a nice library, and may be easy to use quickly. It provides an example of combining police reports and advertisements to see if thieves prefer certain neighborhoods or items. The document encourages attending an upcoming IBM Spark hackathon and learning Spark through free online resources and IBM's goal of educating one million data scientists on Spark.
Presentation by Fraugster data scientist Oxana Goriuc on her work implementing graph databases for fraud solutions, given at the Women in Machine Learning & Data Science (WiMLDS) meetup in Berlin, hosted by Babbel.
This document discusses a universal platform for data science on public and private clouds. The platform would provide computational components like R packages and Python modules, computational resources like clusters and clouds, and computational GUIs. It would support computational scripts in languages like R and Python. The platform aims to provide a federation of public and private cloud infrastructures along with features like real-time collaboration and reproducible data science workflows. The architecture would include remote processes and engines to enable collaboration across spreadsheets, graphics, and dashboards.
The document discusses the German National Library's use of metadata provenance to track whether subject headings were assigned by a cataloger or automated process. It describes using the PROV ontology and qualified relations design pattern to represent provenance in RDF data. Examples are given showing how automated assignments are represented, including the activity, plan, timestamps and software agent. Data dumps containing provenance information are available and feedback is welcomed.
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google (Data Con LA)
Abstract: Come learn about Google BigQuery and its underlying architecture. Felipe will go over the evolution of BigQuery and explain some of the underlying principles of BigQuery and Dremel. Felipe will also go over some of the latest use cases and will demo a use case of Google BigQuery.
Bio:
Felipe Hoffa moved from Chile to San Francisco to join Google as a Software Engineer. Since 2013 he's been a Developer Advocate on big data - to inspire developers around the world to leverage the Google Cloud Platform tools to analyze and understand their data in ways they could never before. You can find him in several YouTube videos, blog posts, and conferences around the world.
Follow Felipe at https://twitter.com/felipehoffa.
Big Data - part 5/7 of "7 modern trends that every IT Pro should know about" (Ibrahim Muhammadi)
Presented by Ibrahim Muhammadi. Founder - AppWorx.cc
Big Data is revolutionizing how businesses make decisions now. More and more decisions and strategies are now based on data.
The document discusses the development of a web and mobile app called "Tech Comm on a Map" that maps events and resources related to technical communication. Key points:
- The app allows users to contribute tech comm events and resources through a web form or Android app, which are stored in a Google Sheet and displayed on an interactive map using the Google Maps API.
- The web app was created using HTML, CSS, JavaScript and jQuery. Data is retrieved from Google Sheets using Apps Script.
- An Android version was also developed using Java and the Google Maps Android API to make the map accessible on mobile.
- The project is open source on GitHub and the developer discusses lessons learned around community collaboration and
"Enabling Googley microservices with gRPC" Riga DevDays 2018 edition (Alex Borysov)
The document describes a presentation about gRPC (Google's Remote Procedure Call framework) given by Alex Borysov. Some key points:
- gRPC is an open source, high performance RPC framework that uses HTTP/2 for transport. It was developed at Google and is now part of the Cloud Native Computing Foundation.
- It provides language-independent client and server APIs that can be used to define and consume services. Over 10 programming languages are supported.
- Compared to alternatives like JSON/HTTP, gRPC provides much higher throughput and lower latency. It has been shown to provide 3x throughput and 11x better CPU efficiency than JSON/HTTP for Google Cloud Pub/Sub.
- The presentation
5th in the AskTOM Office Hours series on graph database technologies. https://devgym.oracle.com/pls/apex/dg/office_hours/3084
PGQL: A Query Language for Graphs
Learn how to query graphs using PGQL, an expressive and intuitive graph query language that's a lot like SQL. With PGQL, it's easy to get going writing graph analysis queries to the database in a very short time. Albert and Oskar show what you can do with PGQL, and how to write and execute PGQL code.
State of the Art Web Mapping with Open Source (OSCON Byrum)
This document discusses the importance of open source tools and data for web mapping. It begins by providing background on TileMill and Mapbox, which provide open source tools for making maps. It then discusses key concepts in web mapping like geospatial data formats, tile rendering, and minimal code examples. Modern approaches to web mapping involve preprocessing data, using tile renderers and caches, and gradually rendering more client-side. Upcoming improvements may optimize tiled formats and storage. TileMill is demonstrated as an open source tool for making maps. The talk concludes by emphasizing other open mapping tools like CartoDB and Stamen that build on these concepts.
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. The core of Neptune is a purpose-built, high-performance graph database engine. This engine is optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports the popular graph query languages Apache TinkerPop Gremlin, the W3C’s SPARQL, and Neo4j's openCypher, enabling you to build queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security. Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones. Neptune provides data security features, with support for encryption at rest and in transit. Neptune is fully managed, so you no longer need to worry about database management tasks like hardware provisioning, software patching, setup, configuration, or backups.
SparkR - Play Spark Using R (20160909 HadoopCon) (wqchen)
1. Introduction to SparkR
2. Demo
Starting to use SparkR
DataFrames: dplyr style, SQL style
RDD vs. DataFrames
SparkR on MLlib: GLM, K-means
3. Use Cases
Median: approxQuantile()
ID Match: dplyr style, SQL style, SparkR function
SparkR + Shiny
4. The Future of SparkR
SETCON'18 - Ilya labacheuski - GraphQL adventures (Nadzeya Pus)
GraphQL adventures. A boot-camp introduction to building a GraphQL proxy server with TypeScript. Experience migrating legacy API services to GraphQL and the difficulties that arise along the way.
Monitoring Spark Applications
Tzach Zohar @ Kenshoo, March/2016
The document discusses monitoring Spark applications. It covers using the Spark UI to monitor jobs, stages, and tasks; using the Spark REST API to access monitoring data programmatically; configuring Spark metric sinks such as Graphite to export internal Spark metrics; and creating applicative metrics to track your own application's behavior. The key points are that monitoring matters for failures, performance, correctness, and understanding data; that Spark provides built-in tools but applicative metrics are also useful; and that Graphite is well suited to analyzing metric trends over time.
Automobile Route Matching with Dynamic Time Warping Using PySpark with Cather... (Databricks)
According to data compiled by the National Highway Traffic Safety Administration, in 2016, an average of ~100 people were killed in automobile accidents every day in the United States. Agero, a market leader in software-enabled driver assistance services, has responded to this growing problem with a breakthrough consumer app that provides near real-time driver behavior analysis and actionable insights to its users on how to become safer drivers.
As part of this effort, we have developed a methodology to identify the most frequent routes that each driver travels by applying Dynamic Time Warping time-series analysis techniques to spatial data. In this talk, we will give a high-level overview of the methodology, and discuss the performance improvement achieved by transitioning the software from stand-alone Python into PySpark + Databricks.
Discussion points will include how to determine the best way to (re)design Python functions to run in Spark, the development and use of user-defined functions in PySpark, how to integrate Spark data frames and functions into Python code, and how to use PySpark to perform ETL from AWS on very large datasets.
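For readers unfamiliar with Dynamic Time Warping, the following is a minimal stand-alone Python sketch of the classic dynamic-programming formulation applied to sequences of 2-D points. It is not the speakers' implementation: the function name, the sample coordinates in the test, and the use of plain Euclidean distance between points (instead of a geographic distance) are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of (x, y) points.

    Uses Euclidean point distance as the local cost (an assumption; route
    data would more likely use a geographic distance).
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same point
                cost[i][j - 1],      # b advances, a stays on the same point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because the alignment may repeat points, two trips along the same road sampled at different rates can still score a distance of zero, which is what makes the measure attractive for matching recurring routes. The double loop is quadratic in trip length, which is one reason distributing the pairwise comparisons (as the talk does with PySpark) can pay off.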
The document discusses machine learning techniques including classification, clustering, and collaborative filtering. It provides examples of algorithms used for each technique, such as Naive Bayes, k-means clustering, and alternating least squares for collaborative filtering. The document then focuses on using Spark for machine learning, describing MLlib and how it can be used to build classification and regression models on Spark, including examples predicting flight delays using decision trees. Key steps discussed are feature extraction, splitting data into training and test sets, training a model, and evaluating performance on test data.
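The split, train, and evaluate steps listed above can be sketched without Spark. The example below is a plain-Python stand-in, not MLlib: it fits a one-level decision tree (a "stump") on a single feature. The dataset, the feature (departure hour), the labeling rule, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import random


def split(rows, test_fraction=0.25, seed=42):
    """Shuffle rows deterministically and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def train_stump(train):
    """Pick the threshold that classifies the most training rows correctly.

    A row (x, label) is predicted 'delayed' when x >= threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({x for x, _ in train}):
        correct = sum((x >= threshold) == label for x, label in train)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, rows):
    """Fraction of rows whose label matches the stump's prediction."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


# Hypothetical data: (departure_hour, was_delayed). The rule "late
# departures are delayed" is invented purely for this illustration.
FLIGHTS = [(hour, hour >= 17) for hour in range(6, 23)]

if __name__ == "__main__":
    train, test = split(FLIGHTS)
    threshold = train_stump(train)
    print("threshold:", threshold)
    print("train accuracy:", accuracy(threshold, train))
    print("test accuracy:", accuracy(threshold, test))
```

Evaluating on the held-out rows rather than the training rows is the point of the split: the learned threshold can fit the training data perfectly and still misclassify unseen departures near the boundary.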
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
Uber developed an new Spark ingestion system, Marmaray, for data ingestion from various sources. It’s designed to ingest billions of Kafka messages every 30 minutes. The amount of data handled by the pipeline is of the order hundreds of TBs. Omar details how to tackle such scale and insights into the optimizations techniques. Some key highlights are how to understand bottlenecks in Spark applications, to cache or not to cache your Spark DAG to avoid rereading your input data, how to effectively use accumulators to avoid unnecessary Spark actions, how to inspect your heap and nonheap memory usage across hundreds of executors, how you can change the layout of data to save long-term storage cost, how to effectively use serializers and compression to save network and disk traffic, and how to reduce amortize the cost of your application by multiplexing your jobs, different techniques for reducing memory footprint, runtime, and on-disk usage. CGI was able to significantly (~10%–40%) reduce memory footprint, runtime, and disk usage.
Speaker: Omkar Joshi (Uber)
Omkar Joshi is a senior software engineer on Uber’s Hadoop platform team, where he’s architecting Marmaray. Previously, he led object store and NFS solutions at Hedvig and was an initial contributor to Hadoop’s YARN scheduler.
Free Code Friday - Machine Learning with Apache SparkMapR Technologies
In this Free Code Friday webinar, you’ll get an overview of machine learning with Apache Spark’s MLlib, and you’ll also learn how MLlib decision trees can be used to predict flight delays.
A short introduction to reproducible research, reproducibility with R, Docker, and all together for reproducible research using R and Docker containers. Includes demos of Rocker and containerit.
Document Conversion & Retrieve and Rank 一問一答Hisashi Komine
This document provides information about using Watson services like Document Conversion and Retrieve and Rank through their APIs. It includes examples of calling the APIs to index documents, search a Solr collection, create a ranker, and perform a ranked search.
JAX-RS and CDI Bike the (Reactive) BridgeJosé Paumard
This session explains how JAX-RS and CDI became reactive capable in Java EE 8. We put some new features of JAX-RS 2.1 and CDI 2.0 into perspective and show some reactive patterns to improve your application. Add Java 8 CompletionStage to the mix and this API trio becomes your best bet to easily go reactive without leaving the Java EE train.
Similar to Graph Computing with Apache TinkerPop (20)
Community-Driven Graphs with JanusGraphJason Plurad
Graphs are well-suited for many use cases to express and process complex relationships among entities in enterprise and social contexts. Fueled by the growing interest in graphs, there are various graph databases and processing systems that dot the graph landscape. JanusGraph is a community-driven project that continues the legacy of Titan, a pioneer of open source graph databases. JanusGraph is a scalable graph database optimized for large-scale transactional and analytical graph processing. In the session, we will introduce JanusGraph, which features full integration with the Apache TinkerPop graph stack. We will discuss JanusGraph's optimized storage model that relies on HBase for fast graph traversal and processing. Presented with Jing Chen (Jerry) He at HBaseCon West 2017, June 12, 2017.
Graph Processing with Apache TinkerPop and GremlinJason Plurad
Presented at the NVIDIA GPU-Accelerated Graph Ecosystem Roundtable. "Come share and learn more about how NVIDIA is accelerating the graph ecosystem and collaborating with the community on joint development opportunities. Join us to get the latest update on nvGraph, cuSTINGER, Gunrock, and query languages. Don't miss out on a great opportunity to provide feedback and take an active part in shaping the future of GPU-accelerated graph analytics." GPU Technology Conference, May 8, 2017, San Jose, California.
IBM's journey in open source graphs and how its Open by Design approach has benefited open communities and IBM offerings. Presented at JanusGraph NYC Meetup, March 1, 2017. https://www.meetup.com/graphs/events/237100744/
Enabling Multimodel Graphs with Apache TinkerPopJason Plurad
Graphs are everywhere, but in a modern data stack, they are not the only tool in the toolbox. With Apache TinkerPop, adding graph capability on top of your existing data platform is not as daunting as it sounds. We will do a deep dive on writing Traversal Strategies to optimize performance of the underlying graph database. We will investigate how various TinkerPop systems offer unique possibilities in a multimodel approach to graph processing. We will discuss how using Gremlin frees you from vendor lock-in and enables you to swap out your graph database as your requirements evolve. Presented at Graph Day Texas, January 14, 2017. http://graphday.com/graph-day-at-data-day-texas/#plurad
Graph Processing with Titan and ScyllaJason Plurad
This document discusses graph processing with Titan and Scylla. It provides an overview of graph computing and common graph domains. It describes Apache TinkerPop and the property graph model. It then discusses the graph landscape, including graph databases for OLTP vs graph processors for OLAP. It introduces Titan as an open source graph database and describes its key features and architecture. Finally, it discusses using Scylla as a drop-in replacement for Cassandra as the storage backend for Titan, highlighting Scylla's performance benefits for OLTP and potential for future integration.
Graph Processing with Apache TinkerPopJason Plurad
This document discusses Apache TinkerPop, a graph computing framework. It provides an overview of TinkerPop and the graph landscape, describes common graph domains and the Gremlin property graph model. It also demonstrates hands-on examples with Titan and Spark/Giraph and discusses using graphs to analyze dependency management and the NPM registry. The document emphasizes that TinkerPop allows seamless use of OLTP and OLAP graphs via Gremlin and supports graph-based thinking for multi-model data.
11. Gremlin Language Variants (GLV)
§ Elevate Gremlin to a top-level citizen in the programming language of choice
§ A GLV can be built in any modern language that supports
– Function composition
– Function nesting
§ Java and Groovy (native)
§ Python is the first non-JVM GLV
§ Others are coming soon (JavaScript, C#, Go)
§ SPARQL-Gremlin and SQL-Gremlin
http://tinkerpop.apache.org/docs/current/tutorials/gremlin-language-variants
@pluradj #OpenCamps2017
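As a rough illustration of why function composition and function nesting are enough to host Gremlin, the sketch below records chained method calls and renders them back as Gremlin text. The `Traversal`, `P`, `neq` and `render` names are hypothetical helpers written for this example; this is a toy and not the gremlin_python API.

```python
class P:
    """Toy predicate holder (hypothetical helper for this sketch)."""

    def __init__(self, op, value):
        self.op = op
        self.value = value

    def __str__(self):
        return f"{self.op}({render(self.value)})"


def neq(value):
    # Nesting: a predicate is an ordinary function call inside a step.
    return P("neq", value)


def render(arg):
    # Strings are quoted; nested traversals, predicates and numbers use str().
    return repr(arg) if isinstance(arg, str) else str(arg)


class Traversal:
    """Toy step recorder; NOT the gremlin_python API."""

    def __init__(self, source="g", steps=()):
        self.source = source
        self.steps = tuple(steps)

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)

        # Composition: every unknown attribute becomes a step that returns
        # a new Traversal, so calls can be chained. A trailing underscore
        # lets callers avoid Python keywords (for example in_).
        def step(*args):
            return Traversal(self.source, self.steps + ((name.rstrip("_"), args),))

        return step

    def __str__(self):
        calls = [
            f"{name}({', '.join(render(a) for a in args)})"
            for name, args in self.steps
        ]
        return ".".join([self.source, *calls])


g = Traversal("g")    # traversal source
anon = Traversal("__")  # anonymous traversal used for nesting

chained = g.V().has("airport", "code", "RDU").out("route").has("country", neq("US")).values("city")
nested = g.V().where(anon.out("route").has("code", "RDU"))

print(chained)
print(nested)
```

Because each step returns a new object, the host language needs nothing beyond method chaining and nested calls to express a traversal.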
12. Graph Databases, Gremlin and TinkerPop – A Tutorial
Kelvin Lawrence @gfxman
https://github.com/krlawrence/graph
13. Graph Model: Air Routes
[Figure: air routes graph model. Vertex labels: airport (code, city, desc, elev, lat, lon), country (code, desc), continent (code, desc). Edge label: route (dist).]
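The property graph model on this slide can be sketched as plain data structures. The snippet below is a minimal sketch, assuming the property keys attach to labels the way they do in the air-routes data set (airport: code, city, desc, elev, lat, lon; country and continent: code, desc; route: dist). The class names, the `SCHEMA` dictionary and the sample values are illustrative only.

```python
from dataclasses import dataclass, field

# Allowed property keys per label (assumed mapping, see the note above).
SCHEMA = {
    "airport": {"code", "city", "desc", "elev", "lat", "lon"},
    "country": {"code", "desc"},
    "continent": {"code", "desc"},
    "route": {"dist"},
}


@dataclass
class Element:
    label: str
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject property keys that the label does not declare.
        unknown = set(self.properties) - SCHEMA[self.label]
        if unknown:
            raise ValueError(f"unknown properties for {self.label}: {sorted(unknown)}")


@dataclass
class Vertex(Element):
    pass


@dataclass
class Edge(Element):
    out_v: Vertex = None  # tail vertex (where the route starts)
    in_v: Vertex = None   # head vertex (where the route ends)


rdu = Vertex("airport", {"code": "RDU", "city": "Raleigh"})
yyz = Vertex("airport", {"code": "YYZ", "city": "Toronto"})
# The dist value is a placeholder, not real data.
flight = Edge("route", {"dist": 0}, out_v=rdu, in_v=yyz)

print(flight.out_v.properties["code"], "->", flight.in_v.properties["code"])
```

Vertices and edges both carry a label and a key/value property map; only edges add the two endpoints.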
17. What international flights depart from Raleigh?
> g.V().has('airport', 'code', 'RDU').
    out('route').
    has('country', neq('US')).
    values('city').
    toList()
==> Toronto
==> Paris
==> London
==> Cancun
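To make the steps of the traversal above concrete, here is a plain-Python walk over a tiny hand-built graph that mirrors each Gremlin step. It is only a sketch: the city names come from the slide's output, while the airport codes, country codes and the extra domestic route are illustrative assumptions, and no graph database is involved.

```python
# Toy vertex data keyed by airport code (illustrative values).
airports = {
    "RDU": {"city": "Raleigh", "country": "US"},
    "ATL": {"city": "Atlanta", "country": "US"},
    "YYZ": {"city": "Toronto", "country": "CA"},
    "CDG": {"city": "Paris", "country": "FR"},
    "LHR": {"city": "London", "country": "UK"},
    "CUN": {"city": "Cancun", "country": "MX"},
}

# Outgoing 'route' edges per airport code.
routes = {"RDU": ["ATL", "YYZ", "CDG", "LHR", "CUN"]}


def international_cities(origin, home_country="US"):
    """Return destination cities outside home_country reachable in one hop."""
    cities = []
    for dest in routes.get(origin, []):          # out('route')
        airport = airports[dest]
        if airport["country"] != home_country:   # has('country', neq('US'))
            cities.append(airport["city"])       # values('city')
    return cities                                # toList()


print(international_cities("RDU"))
# → ['Toronto', 'Paris', 'London', 'Cancun']
```

The domestic destination is dropped by the country filter, which is the role `neq('US')` plays in the Gremlin version.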
19. Graph Code Patterns
IBM Cognitive OpenTech & Performance
https://github.com/IBM/janusgraph-utils
§ A 360° view of how Apache TinkerPop and JanusGraph solve a specific problem
– Includes contextual overviews, architecture diagrams, process flows, demos, blog posts, and source code
§ Twitter-like application in JanusGraph
– Data generator
– Schema loader
– CSV importer
– Graph model and Gremlin queries
§ Contributions welcome!