Apache Fluo is an open source implementation for Apache Accumulo of Google's Percolator (the system that populates Google's search index). Fluo makes it possible to continuously join new data into large existing data sets with low latency, without reprocessing all data. This talk walks through the use case of building a derived graph from multiple graphs using Fluo. For example, multiple social network graphs (e.g., Twitter, GitHub, Google+) could be combined into a single derived graph. Machine learning could link ids and inject them into Fluo. Fluo could then continually join the linked ids with social network data to create virtual nodes, compute features, and export updates to a query system. Such a system would enable searches for virtual nodes with features that only exist in the combined view.
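The pattern the abstract describes can be sketched in miniature: an id link arriving from a machine-learning step triggers a re-join of per-network data into a virtual node. This is a toy Python simulation of the idea only; it does not use the Fluo API, and all names (`network_data`, `observer`, the id values) are invented for illustration.

```python
# Toy sketch of the derived-graph idea: id links arriving from machine
# learning trigger re-joins of per-network data into "virtual nodes".
# Plain Python simulation of the pattern; not the Fluo API.

# Per-network social graph data, keyed by (network, network-local id).
network_data = {
    ("twitter", "t1"): {"followers": 120},
    ("github", "g9"): {"repos": 14},
}

virtual_nodes = {}  # derived graph: virtual node id -> merged features

def observer(link):
    """React to a new id link by (re)computing one virtual node."""
    vnode_id, members = link
    merged = {}
    for member in members:
        merged.update(network_data.get(member, {}))
    virtual_nodes[vnode_id] = merged  # would be exported to a query system

# A machine-learning step decides t1 and g9 are the same person.
observer(("person:42", [("twitter", "t1"), ("github", "g9")]))
```

The resulting virtual node carries features from both source graphs, which is exactly what makes the combined view searchable.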
— Speaker —
Keith Turner
Software Engineer, Peterson Technologies
Keith Turner has been working with big data since 2004. Keith started working on Accumulo in 2008 and Fluo in 2013. Keith has an MS in Computer Science from Purdue and a BS in Computer Science from the University of Louisiana at Lafayette.
— More Information —
For more information see http://www.accumulosummit.com/
This module integrates Airtable with Odoo and performs export, import, and sync operations to transfer data bi-directionally between Odoo and Airtable.
This document provides technical details about PostgreSQL WAL (Write Ahead Log) buffers. It describes the structure and purpose of WAL segments, WAL records, and their components. It also explains how the WAL is used to safely recover transactions after a server crash by replaying the log.
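The core write-ahead-log guarantee described above can be shown with a minimal sketch: every change is appended to the log before the data itself is updated, so recovery after a crash just replays the log. This illustrates the principle only and has nothing to do with PostgreSQL's actual WAL record format.

```python
# Minimal write-ahead-log sketch: log each change before applying it,
# so a crash can be recovered by replaying the log in order.
# Illustrates the principle only, not PostgreSQL's on-disk format.

wal = []      # the durable, append-only log
pages = {}    # volatile data pages

def update(key, value):
    wal.append(("set", key, value))  # 1. log the change first
    pages[key] = value               # 2. then apply it

update("a", 1)
update("a", 2)

pages.clear()                        # simulate a crash losing memory state

for op, key, value in wal:           # recovery: replay WAL records in order
    if op == "set":
        pages[key] = value
```

Because the log is written first, no acknowledged change can be lost, and replaying to the end restores the latest state.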
The document discusses a generic programming toolkit called PADS/ML that can be used to parse, analyze, and transform semi-structured or "ad hoc" data from various domains. It describes how PADS/ML uses generated type representations and typecase analysis to write functions that can operate on any data format described by a PADS/ML type. Case studies of PADX and Harmony are presented, which use PADS/ML to build tools for querying and synchronizing different data formats.
This document contains slides from a presentation on how Firebird transactions work. The presentation covers the basics of transactions, how record versions are created in Firebird's multi-version architecture, and how the transaction inventory pages and isolation levels allow Firebird to maintain transaction consistency and isolation. Key points include that each record can have multiple versions associated with different transactions, snapshots isolate transactions by copying the transaction inventory, and read committed transactions see the global transaction state while snapshots see only up to their start time.
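The visibility rule summarized above, in which a snapshot sees only versions committed before it started, can be sketched in a few lines. This is a loose simplification in Python in the spirit of Firebird's model, not Firebird code; the transaction ids and helper name are invented.

```python
# Sketch of multi-version records and snapshot visibility, loosely in the
# spirit of the Firebird model described above (simplified, illustrative).

# Each record is a list of versions, newest first: (creating_txn_id, value).
record = [(30, "v3"), (20, "v2"), (10, "v1")]

committed = {10, 20}   # transaction inventory: which txns have committed
snapshot_txn = 25      # a snapshot transaction started between txn 20 and 30

def visible_version(versions, snapshot_id, committed_txns):
    """Return the newest version committed before the snapshot started."""
    for txn_id, value in versions:
        if txn_id < snapshot_id and txn_id in committed_txns:
            return value
    return None
```

Here the snapshot sees "v2": version 30 is too new, and only committed transactions count, which is why copying the transaction inventory at start time isolates a snapshot.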
M|18 Understanding the Architecture of MariaDB ColumnStore (MariaDB plc)
The document provides an overview of MariaDB ColumnStore, including its history, components, disk storage architecture, and the processes for writing and querying data. It was presented by Andrew Hutchings, the lead software engineer for MariaDB ColumnStore, who has previous experience with MySQL, HP, and other companies. The presentation covers the technical use cases for ColumnStore, its differences from row-oriented databases, and optimizations for ColumnStore.
This project is based on a datapath architecture consisting of a shift register, a MAC unit, a 16-bit ALU, and a tri-state buffer. The whole architecture is implemented in VHDL and simulated using ModelSim.
The document discusses various issues that can arise in translation memory (TM) data and how they are typically handled by normalization processes or technology providers like PROMT. Some examples of issues addressed include excessive internal tags, irrelevant data, mismatches between source and target languages, inconsistent formatting, and machine code or keywords that are not fully translated. The document also provides examples of how PROMT specifically handles some of these issues, such as keeping internal tags but not to an excessive level, and leaving irrelevant data untouched while letting it be handled by other downstream processes.
Kafka 102: Streams and Tables All the Way Down | Kafka Summit San Francisco 2019 (Michael Noll)
Talk URL: https://kafka-summit.org/sessions/kafka-102-streams-tables-way/
Video recording: https://www.confluent.io/kafka-summit-san-francisco-2019/kafka-102-streams-and-tables-all-the-way-down
Abstract: Streams and Tables are the foundation of event streaming with Kafka, and they power nearly every conceivable use case, from payment processing to change data capture, from streaming ETL to real-time alerting for connected cars, and even the lowly WordCount example. Tables are something that most of us are familiar with from the world of databases, whereas Streams are a rather new concept. Trying to leverage Kafka without understanding tables and streams is like building a rocket ship without understanding the laws of physics: a mission bound to fail. In this session for developers, operators, and architects alike we take a deep dive into these two fundamental primitives of Kafka’s data model. We discuss how streams and tables, including global tables, relate to each other and to topics, partitioning, compaction, and serialization (Kafka’s storage layer), and how they interplay to process data, react to data changes, and manage state in an elastic, scalable, fault-tolerant manner (Kafka’s compute layer). Developers will better understand how to use streams and tables to build event-driven applications with Kafka Streams and KSQL, and we answer questions such as “How can I query my tables?” and “What is data co-partitioning, and how does it affect my join?”. Operators will better understand how these applications run in production, with questions such as “How do I scale my application?” and “When my application crashes, how will it recover its state?”. At a higher level, we will explore how Kafka uses streams and tables to turn the database inside-out and put it back together.
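The stream-table duality at the heart of this abstract fits in a few lines: a table is the latest-value aggregation of a changelog stream, and the table's updates form a stream again. This is plain Python standing in for Kafka Streams/KSQL concepts, using the WordCount example the abstract mentions; none of it is Kafka API code.

```python
# The stream-table duality in miniature: a table is a running aggregation
# of an event stream, and each table update is itself a changelog event.
# Plain Python illustration of the Kafka Streams concept, not Kafka code.

from collections import Counter

stream = ["kafka", "streams", "kafka", "tables", "kafka"]  # event stream

table = Counter()            # the "table": current count per word
changelog = []               # the table's updates form a stream again

for word in stream:
    table[word] += 1
    changelog.append((word, table[word]))  # emit each table update

# Replaying the changelog rebuilds the table -- the basis of how a crashed
# application recovers its state from a compacted changelog topic.
rebuilt = {}
for word, count in changelog:
    rebuilt[word] = count
```

The rebuild step is why compaction works: keeping only the latest event per key still reconstructs the table exactly.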
This document provides a summary of progress and accomplishments from Sprint 19 of the ManageIQ project. Key points include:
- 181 pull requests were merged addressing bugs, enhancements, technical debt, refactoring, and tests.
- Providers work focused on OpenStack infrastructure host events and Kubernetes inventory collection.
- The REST API work included improvements to tag collection/management and foundations like virtual attributes.
- UI work updated the login screen, navigation, and advanced search to use Bootstrap/Patternfly. Orchestration insight and the schedule editor were also improved.
- Other work involved service dialogs, Foreman integration, event handling improvements, and appliance/fleecing fixes and tests.
This document discusses Delta Change Data Feed (CDF), which allows capturing changes made to Delta tables. It describes how CDF works by storing change events like inserts, updates and deletes. It also outlines how CDF can be used to improve ETL pipelines, unify batch and streaming workflows, and meet regulatory needs. The document provides examples of enabling CDF, querying change data and storing the change events. It concludes by offering a demo of CDF in Jupyter notebooks.
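Consuming a change data feed boils down to applying typed change events in order, which is what lets CDF unify batch and streaming sinks. The sketch below uses an invented event shape (the `op` values loosely echo Delta's insert/update/delete change types but are not its exact schema), and plain Python in place of Spark.

```python
# Sketch of consuming a change data feed: each event carries an operation
# type, and a downstream target stays in sync by applying events in order.
# Event shape is illustrative, not Delta's exact CDF schema.

changes = [
    {"op": "insert", "id": 1, "value": "a"},
    {"op": "insert", "id": 2, "value": "b"},
    {"op": "update_postimage", "id": 1, "value": "a2"},
    {"op": "delete", "id": 2},
]

target = {}  # downstream table kept in sync from the feed

for event in changes:
    if event["op"] in ("insert", "update_postimage"):
        target[event["id"]] = event["value"]
    elif event["op"] == "delete":
        target.pop(event["id"], None)
```

Because only changed rows flow downstream, an ETL pipeline built this way avoids rescanning the whole source table.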
Accumulo Summit 2014: Accismus -- Percolating with Accumulo (Accumulo Summit)
Accismus is a system that implements incremental processing on big data using Accumulo, following Google's Percolator paper. It adds visibility into the Percolator model through observers that are triggered by modifications to user-defined columns. Transactions in Accismus provide fault-tolerant processing through a two-phase commit protocol. Examples demonstrated include a banking application and phrase counting on documents.
Sprint 19 of ManageIQ focused on several areas of development:
- Providers work included bug fixes and new features for OpenStack and Kubernetes integration.
- The REST API saw many improvements including tag management and foundational work. Virtual attribute and ID/Href separation were completed.
- UI updates transitioned screens to Bootstrap/Patternfly including the login screen, navigation, and advanced search. Orchestration insight and the schedule editor were converted to AngularJS.
- Other work involved service dialogs, Foreman integration, event handling improvements, appliance fixes, fleecing automation, and updates to the manageiq.org website.
The document discusses the Android Binder architecture and provides debugging information. It includes a binder transaction log showing pending async transactions between processes. It also shows binder driver statistics like number of threads and nodes. Key binder driver structs are defined for processes, threads, nodes, references, transactions and buffers. The relationship between these structs during an async transaction is illustrated. Finally, it describes how binder transactions are handled and linked within the driver.
This document discusses registers, which are sequential logic circuits that can store multiple bits of data. Registers are built from multiple flip-flops connected in parallel and are used to store data in processors and other digital circuits. The document explains basic register operation, including parallel loading of data and shifting of data. It also discusses different types of shift registers and applications of registers such as serial data transfer.
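The shifting behavior described above can be modeled directly: bits enter one at a time on each clock edge and can be read out in parallel. This is a behavioral Python sketch of a serial-in, parallel-out shift register, with the class and method names invented for illustration.

```python
# A 4-bit serial-in, parallel-out shift register sketched in Python:
# bits enter one per clock edge and the whole word can be read at once,
# mirroring the serial-to-parallel conversion registers are used for.

class ShiftRegister:
    def __init__(self, width=4):
        self.bits = [0] * width

    def clock(self, serial_in):
        """On each clock edge, shift contents and load the new input bit."""
        self.bits = [serial_in] + self.bits[:-1]

    def parallel_out(self):
        return list(self.bits)

sr = ShiftRegister()
for bit in [1, 0, 1, 1]:   # serial input stream, one bit per clock
    sr.clock(bit)
```

After four clocks the register holds all four input bits, available in parallel in a single read.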
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred... (Hsien-Hsin Sean Lee, Ph.D.)
This document discusses branch prediction in computer architecture. It begins by explaining what information is predicted for branches - the direction and target. It then categorizes different types of branches and discusses the costs of branch misprediction. Various branch prediction techniques are presented, starting with simple 1-bit and 2-bit predictors, and progressing to more advanced correlating and global history predictors. The goal of branch prediction is to reduce penalties from mispredicted branches by speculatively executing the predicted path.
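The 2-bit predictor mentioned above is small enough to sketch completely: a saturating counter needs two consecutive mispredictions to flip a stable prediction, which keeps a single anomalous iteration (such as a loop exit) from disturbing it. A minimal Python model:

```python
# A 2-bit saturating-counter branch predictor, the classic scheme above:
# two mispredictions are needed to flip a stable prediction.

class TwoBitPredictor:
    def __init__(self):
        self.state = 0  # 0-1 predict not-taken, 2-3 predict taken

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturate at the ends instead of wrapping around.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
outcomes = [True, True, False, True]   # loop-like branch history
predictions = []
for taken in outcomes:
    predictions.append(p.predict())
    p.update(taken)
```

Tracing it: the predictor warms up over the first two taken branches, predicts taken on the third, and the single not-taken outcome drags it back to weakly not-taken.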
Flink Forward SF 2017: Shaoxuan Wang_Xiaowei Jiang - Blinks Improvements to F... (Flink Forward)
This document summarizes recent improvements to Flink SQL and Table API by Blink, Alibaba's distribution of Flink. Key improvements include support for stream-stream joins, user-defined functions, table functions and aggregate functions, retractable streams, and over/group aggregates. Blink aims to make Flink work well at large scale for Alibaba's search and recommendation systems. Many of the improvements will be included in upcoming Flink releases.
Accurate and Reliable What-If Analysis of Business Processes: Is it Achievable? (Marlon Dumas)
This document discusses using event logs to generate business process simulation models. It describes traditional discrete event simulation approaches that discover simulation models from event logs recorded by information systems. Deep learning techniques are also discussed that can generate traces without an explicit process model. The document suggests that combining discrete event simulation and deep learning may produce more accurate simulations, but challenges remain around validating such hybrid approaches and testing them in previously unseen scenarios. More research is needed before these data-driven simulation methods can reliably predict the effects of interventions.
This is the speech Max Liu gave at Percona Live Open Source Database Conference 2016.
Max Liu: Co-founder and CEO, a hacker with a free soul
The slides covered the following topics:
- Why another database?
- What kind of database do we want to build?
- How to design such a database, including the principles, the architecture, and design decisions?
- How to develop such a database, including the architecture and the core technologies for TiKV and TiDB?
- How to test the database to ensure the quality and stability?
The document discusses registers, which are sequential circuits that can store multiple bits of data using multiple flip-flops. Registers are useful for storing data temporarily in processors and building larger sequential circuits. The document describes basic registers, shift registers that can shift data in or out, and how registers are used to convert between serial and parallel data transmission. Registers are faster than memory but also more limited in storage, so processors use hierarchies of caches and memory in addition to registers.
The document discusses the stack and subroutines in assembly language programming. Some key points:
- The stack pointer (SP or A7) points to the top of the stack, which grows downward in memory. Values are pushed onto the stack using predecrement mode and popped using postincrement mode.
- The stack is used for temporary variable storage, passing parameters to subroutines, and storing the return address when making subroutine calls with JSR.
- To call a subroutine, parameters are pushed to the stack, JSR jumps to the subroutine, and RTS pops the return address back to PC. Transparent subroutines save and restore any modified registers.
- Parameters
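The JSR/RTS mechanism in the points above can be modeled with a Python stack: JSR saves the return address and jumps, RTS pops it back into the program counter. Registers and memory are simplified to Python variables here; this is illustrative, not 68k assembly, and the addresses are made up.

```python
# Toy model of the call mechanism described above: JSR pushes the return
# address and jumps; RTS pops it back into the program counter.
# Simplified illustration, not real 68k semantics.

stack = []          # grows by append/pop; real hardware grows downward
pc = 100            # program counter

def jsr(target):
    """Jump to subroutine: save the return address on the stack."""
    global pc
    stack.append(pc + 1)   # return address: instruction after the JSR
    pc = target

def rts():
    """Return from subroutine: pop the return address into the PC."""
    global pc
    pc = stack.pop()

jsr(500)            # call a subroutine at address 500
rts()               # pc is now back at the instruction after the call
```

Passing parameters on the stack works the same way: push arguments before the JSR, and have the subroutine address them relative to the stack pointer.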
A fast-paced introduction to TensorFlow 2, covering important new features (such as generators and the @tf.function decorator) and TF 1.x functionality that has been removed from TF 2 (yes, tf.Session() has retired).
Concise code samples are presented to illustrate how to use new features of TensorFlow 2. You'll also get a quick introduction to lazy operators (if you know FRP this will be super easy), along with a code comparison between TF 1.x/iterators with tf.data.Dataset and TF 2/generators with tf.data.Dataset.
Finally, we'll look at some tf.keras code samples that are based on TensorFlow 2. Although familiarity with TF 1.x is helpful, newcomers with an avid interest in learning about TensorFlow 2 can benefit from this session.
The document discusses hiding and revealing secrets in PDF documents. It provides an overview of the PDF file format, including the structure of objects, streams, filters and parsing. Examples are given to demonstrate how text and images can be encoded in streams and embedded within a PDF. The goal is to learn the internals of PDF so content can be hidden or revealed through the use of encoding.
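The object/stream structure the summary refers to can be shown concretely: a PDF stream object is a dictionary with a /Length entry followed by raw bytes between the `stream` and `endstream` keywords. The fragment below hand-builds one such object for study; the object number is arbitrary and this is not a complete, valid PDF file.

```python
# Minimal illustration of PDF's object/stream structure: a dictionary with
# a /Length entry, then raw bytes between 'stream' and 'endstream'.
# Hand-built fragment for study, not a complete PDF.

content = b"BT /F1 12 Tf (hidden text) Tj ET"  # a tiny content stream

stream_object = (
    b"4 0 obj\n"
    b"<< /Length " + str(len(content)).encode() + b" >>\n"
    b"stream\n" + content + b"\nendstream\nendobj\n"
)
```

Because the bytes between the keywords can be run through filters (e.g. compression or custom encodings), this is exactly where content can be hidden from casual inspection.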
The document discusses the role and design of loaders. A loader is system software that loads an object program into memory, making it ready for execution. It copies the program from secondary storage to main memory. Loaders can be absolute or relocatable. Absolute loaders load programs at fixed addresses while relocatable loaders can load programs anywhere in memory.
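The absolute/relocatable distinction above comes down to one step: a relocating loader adds the chosen load address to every word the object program flags as an address. The object-code and relocation-record format below is invented for illustration.

```python
# Sketch of relocation: add the load address to every word flagged as an
# address in the object program's relocation records.
# The object-code format here is invented for illustration.

object_code = [10, 4, 20, 7]       # program words; some hold addresses
relocation_records = [0, 2]        # indices of words that hold addresses

def load(code, reloc, base):
    """Copy the program into 'memory', relocating flagged address words."""
    memory = list(code)
    for index in reloc:
        memory[index] += base      # adjust the address by the load address
    return memory

loaded = load(object_code, relocation_records, base=1000)
```

An absolute loader is the degenerate case: the base is fixed at assembly time, so no relocation records are needed and the program must load at that one address.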
Transactions in Firebird work by assigning each record multiple versions, with each version associated with a transaction. This allows concurrent transactions to operate on stable views of the database. The transaction inventory pages (TIP) track the state of transactions and four markers - next, oldest active, oldest snapshot, oldest interesting - help determine which record versions are visible to each transaction. Keeping transactions short helps avoid problems related to these markers being out of date and prevents unnecessary record versions from building up over time.
This document discusses the datapath and control circuitry for a simple processor. It begins with background on instruction execution procedures and CPU overview. It then covers the datapath design, starting with R-format instructions involving register reads and ALU operations, then extending to I-format instructions adding memory access. The control circuitry derivation from the instruction register is explained. Finally, the full datapath and control is shown for R-type, load, branch, and jump instructions. Performance issues are noted, with pipelining suggested as an improvement.
The document discusses register transfer language and microoperations. It describes how register transfer language is used to define the internal organization of digital computers by specifying registers, microoperation sequences, and control. Microoperations are elementary operations like shift, count, clear, and load that are performed during one clock cycle on information stored in registers. Common microoperations include register transfer, arithmetic, logic, and shift operations.
Interview Methods - Marital and Family Therapy and Counselling - Psychology S... (PsychoTech Services)
A proprietary approach developed by bringing together the best of learning theories from psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, enabling you to learn better, faster!
More Related Content
Similar to Using Apache Fluo to Create a Derived Graph
Did you know that drowning is a leading cause of unintentional death among young children? According to recent data, children aged 1-4 years are at the highest risk. Let's raise awareness and take steps to prevent these tragic incidents. Supervision, barriers around pools, and learning CPR can make a difference. Stay safe this summer!
Discover the cutting-edge telemetry solution implemented for Alan Wake 2 by Remedy Entertainment in collaboration with AWS. This comprehensive presentation dives into our objectives, detailing how we utilized advanced analytics to drive gameplay improvements and player engagement.
Key highlights include:
Primary Goals: Implementing gameplay and technical telemetry to capture detailed player behavior and game performance data, fostering data-driven decision-making.
Tech Stack: Leveraging AWS services such as EKS for hosting, WAF for security, Karpenter for instance optimization, S3 for data storage, and OpenTelemetry Collector for data collection. EventBridge and Lambda were used for data compression, while Glue ETL and Athena facilitated data transformation and preparation.
Data Utilization: Transforming raw data into actionable insights with technologies like Glue ETL (PySpark scripts), Glue Crawler, and Athena, culminating in detailed visualizations with Tableau.
Achievements: Successfully managing 700 million to 1 billion events per month at a cost-effective rate, with significant savings compared to commercial solutions. This approach has enabled simplified scaling and substantial improvements in game design, reducing player churn through targeted adjustments.
Community Engagement: Enhanced ability to engage with player communities by leveraging precise data insights, despite having a small community management team.
This presentation is an invaluable resource for professionals in game development, data analytics, and cloud computing, offering insights into how telemetry and analytics can revolutionize player experience and game performance optimization.
06-18-2024-Princeton Meetup-Introduction to MilvusTimothy Spann
06-18-2024-Princeton Meetup-Introduction to Milvus
tim.spann@zilliz.com
https://www.linkedin.com/in/timothyspann/
https://x.com/paasdev
https://github.com/tspannhw
https://github.com/milvus-io/milvus
Get Milvused!
https://milvus.io/
Read my Newsletter every week!
https://github.com/tspannhw/FLiPStackWeekly/blob/main/142-17June2024.md
For more cool Unstructured Data, AI and Vector Database videos check out the Milvus vector database videos here
https://www.youtube.com/@MilvusVectorDatabase/videos
Unstructured Data Meetups -
https://www.meetup.com/unstructured-data-meetup-new-york/
https://lu.ma/calendar/manage/cal-VNT79trvj0jS8S7
https://www.meetup.com/pro/unstructureddata/
https://zilliz.com/community/unstructured-data-meetup
https://zilliz.com/event
Twitter/X: https://x.com/milvusio https://x.com/paasdev
LinkedIn: https://www.linkedin.com/company/zilliz/ https://www.linkedin.com/in/timothyspann/
GitHub: https://github.com/milvus-io/milvus https://github.com/tspannhw
Invitation to join Discord: https://discord.com/invite/FjCMmaJng6
Blogs: https://milvusio.medium.com/ https://www.opensourcevectordb.cloud/ https://medium.com/@tspann
Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Marlon Dumas
This webinar discusses the limitations of traditional approaches for business process simulation based on had-crafted model with restrictive assumptions. It shows how process mining techniques can be assembled together to discover high-fidelity digital twins of end-to-end processes from event data.
2. Percolator : Google’s Use Case
● Terabytes of new data coming in each day
● To build index: join terabytes of new data with petabytes of existing data.
● Joining new data with existing data via Map Reduce took multiple days.
● Using Percolator, index update time dropped from days to minutes.
3. Fluo Features
● Layer on top of Accumulo
● Snapshot Isolation : only see committed data
● Cross Row/Node Transactions
○ Read/write data from multiple nodes
○ Fail if two transactions modify same cell : collision
○ Correct in case of faults on multiple nodes
● Observers
○ User code, executes a transaction
○ Triggered by persistent notifications.
○ Observers can trigger other observers
○ Runs in parallel on many nodes
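The observer model in the bullets above can be sketched as a toy event loop. This is an illustrative sketch only, not the Fluo API: names like `notify` and `run_observers` are hypothetical, and real Fluo observers execute as transactions running in parallel across many worker nodes.

```python
# Toy model of Fluo's observer/notification loop (illustrative only).
from collections import deque

table = {}           # (column, row) -> value
pending = deque()    # persistent notifications: (observed_column, row)
observers = {}       # observed_column -> function(row)

def notify(column, row):
    pending.append((column, row))

def run_observers():
    # Each notification triggers one observer; an observer may set
    # further notifications, triggering other observers in turn.
    while pending:
        column, row = pending.popleft()
        observers[column](row)

# Hypothetical example: an edge observer updates a degree count,
# then notifies an export observer.
def on_edge(row):
    table[("degree", row)] = table.get(("degree", row), 0) + 1
    notify("export", row)

def on_export(row):
    table[("exported", row)] = table[("degree", row)]

observers["edge"] = on_edge
observers["export"] = on_export

notify("edge", "nodeA")
run_observers()
```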
12. Add attributes in derived graph
[Diagram: input graphs Twitter (T1-T4), Github (G1-G4), and Facebook (F1-F3, F5) map into the derived graph (A1-A4, F5). Attributes such as Location: 4 Privet Dr and Timezone: GMT carry over from input nodes to the derived nodes.]
13. Putting it all together
[Diagram: raw graph data and changes flow into the Fluo derived graph application, along with alias and attribute analytics from an analytic system; the application exports updates to a query system.]
14. Distribution of data on cluster
[Diagram: the three input graphs (e.g. Twitter, Github, and Facebook data), the derived graph, the aliases, and the attributes are each spread across Servers 1 through 6.]
15. Using Map Reduce to create derived graph
● Three to Four Joins/Map Reduce jobs
● Analysis/indexing of derived graph requires additional jobs
● When input data changes, must reprocess all data
16. Derived edges Map Reduce job #1
Input
Aliases:
A1 : F1, T1
A2 : T2
A3 : F3, T3
Edges:
T1 -> T3
T3 -> T1
T1 -> T2
F1 -> F3
Output
Derived Edge    Original Edge
A1 -> T3        T1 -> T3
A3 -> T1        T3 -> T1
A1 -> T2        T1 -> T2
A1 -> F3        F1 -> F3
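To make the join concrete, here is the first job's logic in plain Python, using the alias and edge data from this slide. This is an illustrative sketch of what the Map Reduce job computes, not actual MapReduce code.

```python
# Job #1 sketch: join each raw edge's SOURCE node with its alias,
# keeping the original edge alongside the partially derived edge.
aliases = {"F1": "A1", "T1": "A1", "T2": "A2", "F3": "A3", "T3": "A3"}
edges = [("T1", "T3"), ("T3", "T1"), ("T1", "T2"), ("F1", "F3")]

job1 = [((aliases[src], dst), (src, dst)) for src, dst in edges]
# job1 now pairs each half-derived edge (e.g. A1 -> T3) with the
# original edge it came from (e.g. T1 -> T3), as in the slide's output.
```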
17. Derived edges Map Reduce job #2
Input
Aliases:
A1 : F1, T1
A2 : T2
A3 : F3, T3
Derived Edge    Original Edge (output of job #1)
A1 -> T3        T1 -> T3
A3 -> T1        T3 -> T1
A1 -> T2        T1 -> T2
A1 -> F3        F1 -> F3
Output
Derived Edge    Original Edge
A1 -> A3        T1 -> T3
A3 -> A1        T3 -> T1
A1 -> A2        T1 -> T2
A1 -> A3        F1 -> F3
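The second job's logic, again as an illustrative Python sketch rather than actual MapReduce code: join the remaining raw endpoint, the destination, with its alias, starting from job #1's output as shown on this slide.

```python
# Job #2 sketch: replace each derived edge's DESTINATION with its alias.
aliases = {"F1": "A1", "T1": "A1", "T2": "A2", "F3": "A3", "T3": "A3"}
job1_out = [(("A1", "T3"), ("T1", "T3")),
            (("A3", "T1"), ("T3", "T1")),
            (("A1", "T2"), ("T1", "T2")),
            (("A1", "F3"), ("F1", "F3"))]

job2 = [((a_src, aliases[dst]), orig) for (a_src, dst), orig in job1_out]
# Both endpoints are now aliases, e.g. A1 -> A3 derived from T1 -> T3.
```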
18. Unique edges Map Reduce job (optional)
Input
Derived Edge    Original Edge
A1 -> A3        T1 -> T3
A3 -> A1        T3 -> T1
A1 -> A2        T1 -> T2
A1 -> A3        F1 -> F3
Output
Derived Edge    Original Edges
A1 -> A3        {T1->T3, F1->F3}
A1 -> A2        {T1->T2}
A3 -> A1        {T3->T1}
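The optional dedup step boils down to a group-by over the derived edge. A quick illustrative sketch using this slide's input:

```python
# Unique-edges sketch: group job #2's output by derived edge so each
# derived edge lists every original edge behind it.
from collections import defaultdict

job2_out = [(("A1", "A3"), ("T1", "T3")),
            (("A3", "A1"), ("T3", "T1")),
            (("A1", "A2"), ("T1", "T2")),
            (("A1", "A3"), ("F1", "F3"))]

unique = defaultdict(list)
for derived, original in job2_out:
    unique[derived].append(original)
# unique[("A1", "A3")] now holds both T1 -> T3 and F1 -> F3.
```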
21. Using Fluo to create derived graph
● Inputs
○ Raw edges
○ Raw node attributes
○ Aliases
● Supports adding and removing
○ Does not require reprocessing all data
● Outputs changes to derived graph
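The incremental behavior described above can be mocked up in a few lines (hypothetical Python, not the Fluo API): a single new raw edge is joined against the aliases, and only the resulting delta to the derived graph is produced, with no full reprocessing. This mirrors the new-edge walkthrough on the following slides.

```python
# Toy incremental update: one new edge -> one small change set.
aliases = {"T1": "A1", "T3": "A3", "T5": "A5"}
derived = {}   # (alias_src, alias_dst) -> original edge

def add_edge(src, dst):
    a_src, a_dst = aliases[src], aliases[dst]
    derived[(a_src, a_dst)] = (src, dst)   # forward derived edge
    derived[(a_dst, a_src)] = (dst, src)   # reverse derived edge
    # Only the changes are returned, e.g. for export to a query system.
    return [("+", (a_src, a_dst)), ("+", (a_dst, a_src))]

changes = add_edge("T1", "T5")   # the new-edge transaction on later slides
```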
23. Fluo Data (stored in Accumulo table)
Twitter data
T1 alias A1
T3 alias A3
T5 alias A5
New Edge Transaction : T1->T5
Derived Graph
Github data
G1 alias A1
G2 alias A2
G7 alias A7
24. Fluo Data
Twitter data
T1 alias A1
T3 alias A3
T5 alias A5
New Edge Transaction : T1->T5
● Read Aliases
Derived Graph
Github data
G1 alias A1
G2 alias A2
G7 alias A7
Legend
Data Written | Data Read | Notification
25. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
New Edge Transaction : T1->T5
● Write Edges
Derived Graph
A1 -> A5 T1:T5 new
A5 <- A1 T5:T1 new
Github data
G1 alias A1
G2 alias A2
G7 alias A7
26. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
New Edge Transaction : T1->T5
● Notify nodes
Derived Graph
A1 -> A5 T1:T5 new
A5 <- A1 T5:T1 new
Github data
G1 alias A1
G2 alias A2
G7 alias A7
27. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
New Edge Transaction : T1->T5
● Commit
Derived Graph
A1 -> A5 T1:T5 new
A5 <- A1 T5:T1 new
Github data
G1 alias A1
G2 alias A2
G7 alias A7
29. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
Derived Node Transaction : A1
● Read changed edges
Derived Graph
A1 -> A5 T1:T5 new
A5 <- A1 T5:T1 new
Github data
G1 alias A1
G2 alias A2
G7 alias A7
Export Queue
30. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
Derived Node Transaction : A1
● Mark edge processed
● Queue for export
Derived Graph
A1 -> A5 T1:T5
A5 <- A1 T5:T1 new
Github data
G1 alias A1
G2 alias A2
G7 alias A7
Export Queue
+ A1->A5 Followers:0 Following:1
31. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
Derived Node Transaction : A1
● Commit
Derived Graph
A1 -> A5 T1:T5
A5 <- A1 T5:T1 new
Github data
G1 alias A1
G2 alias A2
G7 alias A7
Export Queue
+ A1->A5 Followers:0 Following:1
32. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
Derived Graph
A1 -> A5 T1:T5
A5 <- A1 T5:T1
Github data
G1 alias A1
G2 alias A2
G7 alias A7
Export Queue
+ A1->A5 Followers:0 Following:1
+ A5<-A1 Followers:1 Following:0
34. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A7
T5 <- T1 A5:A1
Derived Graph
A1 -> A5 T1:T5
A1 -> A7 G1:G7
A5 <- A1 T5:T1
A7 <- A1 G7:G1
Github data
G1 alias A1
G1 -> G7 A1:A7
G2 alias A2
G7 alias A7
G7 <- G1 A7:A1
Alias Change Transaction : T5
35. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A7
T5 <- T1 A5:A1
Derived Graph
A1 -> A5 T1:T5
A1 -> A7 G1:G7
A5 <- A1 T5:T1
A7 <- A1 G7:G1
Github data
G1 alias A1
G1 -> G7 A1:A7
G2 alias A2
G7 alias A7
G7 <- G1 A7:A1
Alias Change Transaction : T5
● Read edges and alias
36. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A7
T3 alias A3
T5 alias A7
T5 <- T1 A7:A1
Derived Graph
A1 -> A5 T1:T5 deleted
A1 -> A7 G1:G7
A1 -> A7 T1:T5 new
A5 <- A1 T5:T1 deleted
A7 <- A1 G7:G1
A7 <- A1 T5:T1 new
Github data
G1 alias A1
G1 -> G7 A1:A7
G2 alias A2
G7 alias A7
G7 <- G1 A7:A1
Alias Change Transaction : T5
● Delete edges
● Insert edges
37. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A7
T3 alias A3
T5 alias A7
T5 <- T1 A7:A1
Derived Graph
A1 -> A5 T1:T5 deleted
A1 -> A7 G1:G7
A1 -> A7 T1:T5 new
A5 <- A1 T5:T1 deleted
A7 <- A1 G7:G1
A7 <- A1 T5:T1 new
Github data
G1 alias A1
G1 -> G7 A1:A7
G2 alias A2
G7 alias A7
G7 <- G1 A7:A1
Alias Change Transaction : T5
● Set notifications
38. Fluo Data
Twitter data
T1 alias A1
T1 -> T5 A1:A7
T3 alias A3
T5 alias A7
T5 <- T1 A7:A1
Derived Graph
A1 -> A5 T1:T5 deleted
A1 -> A7 G1:G7
A1 -> A7 T1:T5 new
A5 <- A1 T5:T1 deleted
A7 <- A1 G7:G1
A7 <- A1 T5:T1 new
Github data
G1 alias A1
G1 -> G7 A1:A7
G2 alias A2
G7 alias A7
G7 <- G1 A7:A1
Alias Change Transaction : T5
● Commit
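Under stated assumptions, the alias-change transaction just walked through can be sketched like this (toy Python; `change_alias` and the in-memory maps are hypothetical stand-ins for Fluo table reads and writes). When T5's alias moves from A5 to A7, derived edges built from T5 are deleted and re-inserted under the new alias.

```python
# Toy version of the alias-change transaction (illustrative only).
aliases = {"T1": "A1", "T5": "A5"}
derived = {("A1", "A5"): ("T1", "T5"),   # forward edge from T1 -> T5
           ("A5", "A1"): ("T5", "T1")}   # reverse edge

def change_alias(node, new_alias):
    old = aliases[node]
    aliases[node] = new_alias
    deletes, inserts = [], []
    for (a_src, a_dst), (src, dst) in list(derived.items()):
        if old in (a_src, a_dst) and node in (src, dst):
            del derived[(a_src, a_dst)]          # delete stale derived edge
            deletes.append((a_src, a_dst))
            new_edge = (aliases[src], aliases[dst])
            derived[new_edge] = (src, dst)       # insert under new alias
            inserts.append(new_edge)
    return deletes, inserts

deletes, inserts = change_alias("T5", "A7")
```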
39. Concurrent Alias Changes
● Aliases for T1 and T5 both change.
● Starts two transactions.
● Collision : one fails, one succeeds.
Twitter data (time 0)
T1 alias A1
T1 -> T5 A1:A5
T3 alias A3
T5 alias A5
T5 <- T1 A5:A1
Twitter data (time 1)
T1 alias A9
T1 -> T5 A1:A5
T3 alias A3
T5 alias A7
T5 <- T1 A5:A1
Twitter data (time 2)
T1 alias A9
T1 -> T5 A1:A7
T1 -> T5 A9:A5
T3 alias A3
T5 alias A7
T5 <- T1 A5:A9
T5 <- T1 A7:A1
[Diagram: Transaction 1 changes and Transaction 2 changes highlight which rows each transaction wrote.]
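A minimal sketch of the collision check behind "one fails, one succeeds" (toy Python, not Percolator's or Fluo's actual commit protocol): each cell carries a version, and a commit fails if the version changed after the transaction read it.

```python
# Toy optimistic-concurrency commit: version check per cell.
cells = {"T1:alias": ("A1", 0)}   # cell -> (value, version)

def commit(cell, read_version, new_value):
    value, version = cells[cell]
    if version != read_version:
        return False              # collision: another transaction won
    cells[cell] = (new_value, version + 1)
    return True

_, v = cells["T1:alias"]          # both transactions read version 0
tx1 = commit("T1:alias", v, "A9") # first commit succeeds
tx2 = commit("T1:alias", v, "A8") # second sees a newer version and fails
```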
41. Mixer prototype
● Supports add/remove of edges, aliases, and attributes.
● Exports changes to external query table.
○ Uses invert on export.
● Can look up nodes in external query table
● Available soon on github
● Easy to run using MiniFluo and MiniAccumulo
○ git clone
○ ./mixer.sh mini &> mini.log &
○ ./mixer.sh shell fluo.properties
43. Derived graph in Fluo
[Diagram: virtual node bob groups tw:bob99, g+:bobE, and gh:bob799. Other nodes: tw:alice95 (loc=TX), gh:alice++ (tz=CST), g+:joe8, gh:jojo, gh:jeb, fb:joe9, gh:eAdam.]
Bob in query table updated by Fluo
bob -> g+:joe8 followers=1,following=0,rawEdges=1
bob -> gh:jojo followers=1,following=1,rawEdges=1
bob -> tw:alice95 followers=1,following=0,loc=TX,rawEdges=1
44. Derived graph in Fluo
[Diagram: a new virtual node alice now groups tw:alice95 (loc=TX) and gh:alice++ (tz=CST); bob still groups tw:bob99, g+:bobE, and gh:bob799. Other nodes: g+:joe8, gh:jojo, gh:jeb, fb:joe9, gh:eAdam.]
Bob in query table updated by Fluo
bob -> alice followers=1,following=0,loc=TX,tz=CST,rawEdges=1
bob -> g+:joe8 followers=1,following=0,rawEdges=1
bob -> gh:jojo followers=1,following=1,rawEdges=1
bob -> tw:alice95 followers=1,following=0,loc=TX,rawEdges=1
45. Derived graph in Fluo
[Diagram: a new virtual node joe now groups the joe accounts (g+:joe8, gh:jojo, fb:joe9); alice groups tw:alice95 (loc=TX) and gh:alice++ (tz=CST); bob groups tw:bob99, g+:bobE, and gh:bob799. Other nodes: gh:jeb, gh:eAdam.]
Bob in query table updated by Fluo
bob -> alice followers=1,following=0,loc=TX,tz=CST,rawEdges=1
bob -> joe followers=1,following=2,rawEdges=2
bob -> g+:joe8 followers=1,following=0,rawEdges=1
bob -> gh:jojo followers=1,following=1,rawEdges=1
bob -> tw:alice95 followers=1,following=0,loc=TX,rawEdges=1
46. Getting started with Fluo
● Fluo Tour
● Documentation on website
● Mailing list and IRC