Accismus is a system that implements incremental processing on big data using Accumulo, following the design of Google's Percolator paper. It adds visibility into the Percolator model through observers that are triggered by modifications to user-defined columns. Transactions in Accismus provide fault-tolerant processing through a two-phase commit protocol. Examples demonstrated include a banking application and phrase counting on documents.
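The observer model described above can be sketched in plain Java. This is a minimal illustration of the idea, not Accismus's actual API: writes to watched columns trigger user code, which may itself write more columns, forming an incremental processing chain.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of a Percolator-style observer model: writes to
// watched columns trigger registered observers, which may themselves
// write further columns, chaining incremental computation.
public class ObserverSketch {
    interface Observer { void process(String row, String column, String value, Table table); }

    static class Table {
        final Map<String, String> cells = new HashMap<>();             // "row:column" -> value
        final Map<String, List<Observer>> observers = new HashMap<>(); // column -> observers

        void observe(String column, Observer o) {
            observers.computeIfAbsent(column, c -> new ArrayList<>()).add(o);
        }

        void write(String row, String column, String value) {
            cells.put(row + ":" + column, value);
            for (Observer o : observers.getOrDefault(column, List.of())) {
                o.process(row, column, value, this); // trigger downstream processing
            }
        }
    }

    public static void main(String[] args) {
        Table t = new Table();
        // Observer on "doc:text" derives an uppercase copy into "doc:upper".
        t.observe("doc:text", (row, col, val, table) ->
                table.write(row, "doc:upper", val.toUpperCase()));
        t.write("r1", "doc:text", "hello");
        System.out.println(t.cells.get("r1:doc:upper")); // prints HELLO
    }
}
```

A real system would make each observer invocation a fault-tolerant transaction; the sketch only shows the trigger-on-column-change shape.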
Flink Forward SF 2017: Stephan Ewen - Experiences running Flink at Very Large...
This talk shares experiences from deploying and tuning Flink stream processing applications at very large scale. We share lessons learned from users, contributors, and our own experiments about running demanding streaming jobs at scale. The talk will explain what aspects currently render a job particularly demanding, show how to configure and tune a large-scale Flink job, and outline what the Flink community is working on to make the out-of-the-box experience as smooth as possible. We will, for example, dive into:
* analyzing and tuning checkpointing
* selecting and configuring state backends
* understanding common bottlenecks
* understanding and configuring network parameters
Slides from JEEConf 2018 talk "Virtual Machine for Regular Expressions". It describes how and why to implement a custom regular expression engine for matching arbitrary sequences.
Presentation from SCALE 17x (https://www.socallinuxexpo.org/scale/17x/presentations/fast-http-string-processing-algorithms):
There are binary optimizations in HTTP/2, so the protocol becomes less about string processing. However, strings, sometimes quite large ones like URIs or Cookies, still exist in HTTP. A typical program working with HTTP must perform various string operations, e.g. tokenization, string matching, and searching for a pattern. Classic computer science describes many string processing algorithms, but HTTP strings are special, and specialized algorithms can improve string processing performance several times over.
This talk describes:
* how an HTTP flood may make your HTTP parser the bottleneck
* x86-64 issues with branch mispredictions, caching, and unaligned memory access
* C compiler optimizations for multi-branch statements and autovectorization
* switch-driven finite state machines (FSM) versus direct jumps (e.g. Ragel)
* what makes HTTP strings special and why LIBC functions aren't good
* strspn()- and strcasecmp()-like algorithms for HTTP strings using SSE and AVX
* efficient custom filtering to prevent injection attacks using AVX
* the cost of FPU context switch and how the Linux kernel works with SIMD
* all the topics are illustrated with microbenchmarks
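The strspn()-like algorithms in the list above build on a precomputed lookup table of allowed bytes instead of per-character branch chains; the SSE/AVX versions classify 16 or 32 bytes per step, but the core idea can be shown with a portable scalar sketch. The token alphabet below is an assumption based on RFC 7230's tchar grammar:

```java
// Scalar illustration of a strspn()-like scanner for HTTP token
// characters: a 256-entry lookup table replaces branchy per-character
// tests; SIMD variants apply the same byte classification 16-32
// bytes at a time.
public class TokenScan {
    static final boolean[] TOKEN = new boolean[256];
    static {
        String extra = "!#$%&'*+-.^_`|~"; // tchar punctuation per RFC 7230
        for (char c = '0'; c <= '9'; c++) TOKEN[c] = true;
        for (char c = 'a'; c <= 'z'; c++) TOKEN[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) TOKEN[c] = true;
        for (int i = 0; i < extra.length(); i++) TOKEN[extra.charAt(i)] = true;
    }

    // Length of the leading run of token characters, like strspn().
    static int tokenSpan(byte[] buf) {
        int i = 0;
        while (i < buf.length && TOKEN[buf[i] & 0xFF]) i++;
        return i;
    }

    public static void main(String[] args) {
        byte[] header = "Content-Type: text/html".getBytes();
        System.out.println(tokenSpan(header)); // prints 12 ("Content-Type")
    }
}
```

The lookup table is what makes the loop branch-predictable; vectorized versions replace the per-byte table probe with parallel range and set comparisons.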
Full text search in PostgreSQL is a flexible and powerful facility for searching a collection of documents using natural language queries. We will discuss several new improvements to FTS in the PostgreSQL 9.6 release, such as phrase search, better dictionary support, and tsvector editing functions. We will also present new features currently in development: RUM index support, which accelerates some important kinds of full text queries; a new and better ranking function for relevance search; loading dictionaries into shared memory; and support for searching multilingual content.
"TCP Input Text" implements the Google SOAP Search API and Bing API v2 to extract TCP ports and Fully Qualified Domain Names (FQDNs) from search results into a .csv file, along with individual shell scripts for Maltego, nmap, and nc (netcat), to provide assurance that a TCP service is still listening given the time that has passed since the last crawl performed by GoogleBot and BingBot.
Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...
Speaker: Aaron Cordova
Most users of Accumulo start developing applications on a single machine and want to scale up to four orders of magnitude more machines without having to rewrite them. In this talk we describe techniques for designing applications for scale, planning a large-scale cluster, tuning the cluster for high-speed ingest, dealing with a large amount of data over time, and unique features of Accumulo for taking advantage of up to ten thousand nodes in a single instance. We also include the largest public metrics gathered on Accumulo clusters to date and discuss overcoming practical limits to scaling in the future.
Accumulo Summit 2014: A Tour of Internal Accumulo Testing
Speaker: Bill Havanki
Accumulo includes a remarkable breadth of testing frameworks, which helps to ensure its correctness, performance, robustness, and protection of your vital data. This presentation takes you on a tour from Accumulo's basic unit testing up through performance and scalability testing exercised on running clusters. Learn the extent to which Accumulo is put through its paces before it is released, and get ideas for how you can similarly enhance testing of your own code.
Labels in Accumulo provide great power and flexibility. However, nearly everyone makes the same set of mistakes when first applying labels to their data. In this talk, we will follow two data architects as they first come to the labeling system in Accumulo, and see how they work their way out of the pitfalls they create for themselves. Along the way, they'll learn about Accumulo's pluggable security architecture surrounding the core functionality of the labeling system.
Accumulo Summit 2014: Past and Future Threats: Encryption and Security in Acc...
Speaker: Michael Allen
The early Accumulo developers made security a core part of Accumulo's codebase. As the open source community around Accumulo continues to thrive, this talk examines the current state of Accumulo's security features. The talk will detail some exciting developments in the upcoming 1.6 release, which include enhancements around encryption at rest and in motion. We will also take a broader look at new use cases suggesting a wider set of threats, and how current and future work addresses those threats.
Accumulo Summit 2014: Open Source Graph Analysis and Visualization powered by...
Lumify is a relatively new open source platform for big data analysis and visualization, designed to help organizations derive actionable insights from the large volumes of diverse data flowing through their enterprise. Utilizing popular big data tools like Hadoop, Accumulo, and Storm, it ingests and integrates many kinds of data, from unstructured text documents and structured datasets, to images and video. Several open source analytic tools (including Tika, OpenNLP, CLAVIN, OpenCV, and ElasticSearch) are used to enrich the data, increase its discoverability, and automatically uncover hidden connections. All information is stored in a secure graph database implemented on top of Accumulo to support cell-level security of all data and metadata elements. A modern, browser-based user interface enables analysts to explore and manipulate their data, discovering subtle relationships and drawing critical new insights. In addition to full-text search, geospatial mapping, and multimedia processing, Lumify features a powerful graph visualization supporting sophisticated link analysis and complex knowledge representation.
Accumulo Summit 2015: Alternatives to Apache Accumulo's Java API [API]
Talk Abstract
A common tradeoff made by fault-tolerant, distributed systems is the ease of user interaction with the system. Implementing correct distributed operations in the face of failures often takes priority over reducing the level of effort required to use the system. Because of this, applying a problem in a specific domain to the system can require significant planning and effort by the user. Apache Accumulo, and its sorted, Key-Value data model, is subject to this same problem: it is often difficult to use Accumulo to quickly ascertain real-life answers about some concrete problem.
This problem, not unique to Accumulo itself, has spurred the growth of numerous projects to fill these kinds of gaps in usability, in addition to multiple language bindings provided by applications. Outside of the Java API, Accumulo client support varies from programming languages, like Python or Ruby, to standalone projects that provide their own query language, such as Apache Pig and Apache Hive. This talk will cover the state of client support outside of Accumulo’s Java API with an emphasis on the pros, cons, and best practices of each alternative.
Speaker
Josh Elser
Member of Technical Staff, Hortonworks
Josh is a member of the engineering staff at Hortonworks. He is a strong advocate for open source software and is an Apache Accumulo committer and PMC member. He is also a committer and PMC member of Apache Slider (incubating) and regularly contributes to other Apache projects in the Apache Hadoop ecosystem. He holds a Bachelor's degree in Computer Science from Rensselaer Polytechnic Institute.
Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]
Talk Abstract
Having the ability to diagnose and understand what is happening in distributed systems is essential. Tracing is one mechanism that enables analysis of operations in distributed systems by dividing each operation into a tree of measurable sub-tasks. HDFS, Accumulo, and HBase are now converging on a single tracing system utilizing HTrace, an open source tracing instrumentation library that recently became a new Apache Incubator project. This talk will cover tracing fundamentals, the instrumentation that has been added to HDFS to support tracing, and changes that have been made in Accumulo's tracing. It will also cover options for collecting and visualizing traces, as well as the current status of the HTrace podling.
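The "tree of measurable sub-tasks" model in the abstract can be sketched in a few lines of plain Java. This is an illustration of the tracing concept only, not HTrace's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the tracing model: each operation is a timed
// span, and child spans form a tree of measurable sub-tasks.
public class SpanSketch {
    static class Span {
        final String name;
        final List<Span> children = new ArrayList<>();
        long startNanos, endNanos;

        Span(String name) { this.name = name; this.startNanos = System.nanoTime(); }
        Span child(String childName) { Span c = new Span(childName); children.add(c); return c; }
        void close() { endNanos = System.nanoTime(); }

        // Render the tree, one line per sub-task with its duration.
        void print(String indent) {
            System.out.println(indent + name + " (" + (endNanos - startNanos) / 1000 + " us)");
            for (Span c : children) c.print(indent + "  ");
        }
    }

    public static void main(String[] args) {
        Span scan = new Span("scan");                       // top-level operation
        Span lookup = scan.child("metadata-lookup");        // sub-task
        lookup.close();
        Span read = scan.child("hdfs-read");                // sub-task
        read.close();
        scan.close();
        scan.print("");
    }
}
```

A real tracing library additionally propagates span identifiers across RPC boundaries so the tree can be reassembled from spans recorded on different machines.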
Speaker
Billie Rinaldi
Sr. Member of Technical Staff, Hortonworks
Billie Rinaldi is a Senior Member of Technical Staff at Hortonworks, Inc., currently prototyping new features related to application monitoring and deployment in the Apache Hadoop ecosystem. Prior to August 2012, Billie engaged in big data science and research at the National Security Agency. Since 2008, she has been providing technical leadership regarding the software that is now Apache Accumulo. Billie is the VP of Apache Accumulo, the Accumulo Project Management Committee Chair, and a member of the Apache Software Foundation. She holds a Ph.D. in applied mathematics from Rensselaer Polytechnic Institute.
Accumulo Summit 2015: Accumulo 2.0: A New Client API [API]
Talk Abstract
An overview of the new client API planned for the release of Accumulo 2.0. Problems with the old API are described, along with lessons learned. The benefits of the new API are explained, and code snippets are provided to demonstrate the overall design and contrast it with the old API, in order to assist users interested in transitioning.
Speaker
Christopher Tubbs
Computer Systems Researcher, National Security Agency
Christopher is a researcher with the National Security Agency. He holds a Bachelor's degree in both Physics and Computer Science from Eastern Michigan University. He is an open source enthusiast and advocate for open source development, as well as data privacy and security. He has been contributing to the Accumulo project since 2009, prior to its release to the Apache Software Foundation in 2011. He is currently a committer and PMC member on the project, and an ASF member, as well as the Accumulo package maintainer for the Fedora project.
Accumulo Summit 2015: Preventing Bugs: How PHEMI Put Accumulo To Work in the ...
Talk Abstract
This talk will describe how PHEMI leveraged several key features of Apache Accumulo to satisfy an unconventional use case: allowing farmers to protect, monitor and manage their orchards.
Speaker
Russ Weeks
Software Architect, PHEMI
Russ Weeks is a Software Architect at PHEMI. Prior to joining PHEMI Health Systems, Russ worked in the network management groups at Ericsson and Cray Supercomputers, where he discovered a passion for distributed data structures and algorithms.
PHEMI, Inc. is a Vancouver, BC-based startup focused on the storage, retention and governance of structured and unstructured data.
Accumulo Summit 2015: Verifiable Responses to Accumulo Queries [Security]
Talk Abstract
Accumulo requires its users to trust each Accumulo installation with their data—a malicious server or user could easily compromise critical data or learn secrets they are not authorized to access. One particular threat is a malicious Accumulo server tampering with query results by returning forged, modified, or incomplete results to a user. We have implemented a lightweight client-side cryptographic tool to protect Accumulo users from this kind of threat.
Our solution is able to handle a spectrum of different threats. At one end of the spectrum, we use end-to-end signatures to guarantee data integrity: Accumulo clients can sign the data they write to Accumulo and verify that the Accumulo instance did not modify it. At the other end of the spectrum, we store metadata about all the entries written to Accumulo, allowing querying clients to guarantee not just the integrity of the elements contained in the query, but that nothing was omitted from the query itself. As an intermediate solution, we propose an extension to the signature scheme that would speed up the signing and verification of entries with symmetric key cryptography, as well as allowing periodic auditing of the database.
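The end-to-end signature idea at the first end of that spectrum can be sketched with the JDK's java.security classes. This is an illustrative sketch, not the talk's implementation; key management and the Accumulo integration are omitted, and the key/value names are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;

// Sketch of client-side end-to-end signatures: the client signs each
// key/value entry before writing, and verifies on read, so a tampered
// value (or a value returned under the wrong key) fails verification.
public class SignedEntry {
    static byte[] sign(KeyPair kp, String key, String value) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(kp.getPrivate());
        // Bind the signature to both key and value so neither can be
        // swapped independently by a malicious server.
        s.update((key + "\0" + value).getBytes(StandardCharsets.UTF_8));
        return s.sign();
    }

    static boolean verify(KeyPair kp, String key, String value, byte[] sig) throws Exception {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(kp.getPublic());
        s.update((key + "\0" + value).getBytes(StandardCharsets.UTF_8));
        return s.verify(sig);
    }

    public static void main(String[] args) throws Exception {
        KeyPair kp = KeyPairGenerator.getInstance("RSA").generateKeyPair();
        byte[] sig = sign(kp, "row1:cf:cq", "balance=100");
        System.out.println(verify(kp, "row1:cf:cq", "balance=100", sig)); // true
        System.out.println(verify(kp, "row1:cf:cq", "balance=999", sig)); // false: tampered
    }
}
```

Note that per-entry signatures detect modification and forgery but not omission; detecting incomplete results requires the additional metadata described at the other end of the spectrum.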
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Speaker
Cassandra Sparks
Associate Technical Staff, Lincoln Laboratory, MIT
Cassandra Sparks is a researcher at MIT Lincoln Laboratory. She graduated from Indiana University in 2014 with an MS in computer science, focusing on programming languages and formal methods. Lately, she has been working on cryptographic enforcement of data integrity in Accumulo.
Accumulo Summit 2015: Using Fluo to incrementally process data in Accumulo [API]
Talk Abstract
Fluo provides a framework to incrementally process large datasets stored in Accumulo. Using Fluo, developers can write applications that maintain a large scale computation using a series of small transactional updates. When compared to batch processing frameworks, Fluo enables lower latency, continuous analysis of data by sacrificing throughput. This talk will provide an overview of the Fluo project by touching on its design, use cases, and API. The talk will show how developers can write Fluo applications to solve problems in a new way. It will highlight the benefits of using Fluo as well as cover the trade offs and potential problems developers may face when writing Fluo applications. The talk will end with a discussion of the current status and future direction of the Fluo project.
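The "series of small transactional updates" pattern above can be sketched with an optimistic read-modify-write loop. This is an illustration of the update model only, not Fluo's actual API; the map stands in for a table of cells:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch of incremental maintenance via small optimistic updates:
// each update reads a cell, computes the new value, and commits only
// if the cell was not changed concurrently, retrying on conflict.
public class IncrementalUpdate {
    static void incrementCount(ConcurrentMap<String, Long> table, String key, long delta) {
        while (true) {
            Long old = table.get(key);
            if (old == null) {
                if (table.putIfAbsent(key, delta) == null) return; // committed
            } else {
                if (table.replace(key, old, old + delta)) return;  // committed
            }
            // Another update won the race; re-read and retry.
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Long> counts = new ConcurrentHashMap<>();
        incrementCount(counts, "phrase:big data", 1);
        incrementCount(counts, "phrase:big data", 2);
        System.out.println(counts.get("phrase:big data")); // prints 3
    }
}
```

This is the trade-off the abstract names: each update is cheap and low-latency, but conflict retries cap throughput compared with batch recomputation.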
Speaker
Michael Walch
Software Engineer, Peterson Technologies
Mike is a software engineer and a committer on the Fluo project. He has a background in distributed systems and data science. He holds a Master's in Computer Science from Johns Hopkins University and a B.S. in Electrical & Computer Engineering from Carnegie Mellon University.
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]
Talk Abstract
Bulk ingest enables Accumulo to import externally-prepared data into existing tables. Unlike ingest via batch writers, much of the work of organizing data can be left to external processing frameworks such as MapReduce and scaled independently of the Accumulo cluster itself. This reduces the work required of the tablet servers to support ingest, freeing resources to support other operations.
Under the hood, bulk ingest involves a number of moving parts and must account for a variety of failure scenarios. This talk covers the components of the bulk ingest process in depth and describes past, current, and future implementations of this capability. Attendees will leave this session with an understanding of bulk ingest that will enable troubleshooting, capacity estimation, and performance management.
Speaker
Eric Newton
Senior Software Developer, SWComplete
Eric Newton has been a programmer for over 30 years, and has worked on Accumulo since 2009. He has been an open-source contributor and consumer since 1988. Through the years, his distributed communications systems work has included Air Traffic Control, Systems Monitoring and Databases. Eric has started 3 of his own companies and helped several other businesses start.
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Talk Abstract
Accumulo has a solid theoretical foundation, endowing it with huge scalability, high reliability, and the makings of class-leading performance for NoSQL operations. Several publications show Accumulo achieving multi-petabyte scalability and outperforming other databases in its class by orders of magnitude. However, there are challenges arising in practice that slow down that performance and introduce bottlenecks.
The root of Accumulo's distributed scale and performance while maintaining consistency is a multi-level amplification. ZooKeeper bootstraps the consistency with a highly durable quorum. The Accumulo root table uses buffering and caching to boost that performance for sorted key/value operations. With the metadata tablets and data tables, Accumulo continues to boost performance, dividing and conquering a highly scalable key/value space to leverage the resources of a large cluster. The challenge arises when metadata operations at the core of Accumulo bottleneck performance for the entire cluster.
In this talk we will describe the Accumulo metadata operations model in detail. With a couple of prototypical application scenarios, we will show a few areas that are current bottlenecks or that we can expect to be bottlenecks in the near future. We will also propose modifications to the current model and outline projects that the community can take on to keep Accumulo in the lead for performance and scalability.
Speaker
Adam Fuchs
Chief Technology Officer, Sqrrl
As the Chief Technology Officer and co-founder of Sqrrl, Adam Fuchs is responsible for ensuring that Sqrrl is leading the world in Big Data Infrastructure technology. Previously at the National Security Agency, Adam was an innovator and technical director for several database projects, handling some of the world’s largest and most diverse data sets. He is a co-founder of the Apache Accumulo project. Adam has a BS in Computer Science from the University of Washington and has completed extensive graduate-level course work at the University of Maryland.
Accumulo Summit 2015: Extending Accumulo to Support ABAC using XACML [Security]
Talk Abstract
Cell-based Access Control (CBAC) in Accumulo is a powerful and flexible feature, but it has drawbacks for addressing complex access control requirements. Security architects are unable to include data types, range operators, exceptions, or environment variables in policies for dynamic access control evaluations. It is possible to satisfy complex AC requirements by implementing the AC mechanism at the application layer, but this approach has its own drawbacks as well. Developing another layer of AC creates overhead for both system design and performance.
In this talk, we present our mechanism for extending Accumulo's Security Labels to include attributes and XACML. This allows significantly increased access control policy expressivity, improved policy administration, and the opportunity to implement access control models such as Attribute-Based Access Control (ABAC) and Risk-Adaptable Access Control (RAdAC) in Accumulo. We will also discuss combining Accumulo's AC approach with ours to increase Accumulo's capabilities even further. Introducing different types of attributes makes it possible to achieve both finer-grained and coarser-grained control over data according to access control requirements. For instance, environment attributes can be used to limit access to a cell to a specific client location, whereas system-specific information such as namespace/table/column can be used to simplify (or complicate) the policies.
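The difference from a fixed label expression can be seen in a generic ABAC sketch. This is not the talk's XACML mechanism; the attribute names ("role", "location", the "health" tag) are hypothetical, chosen to show an environment attribute participating in the decision:

```java
import java.util.Map;
import java.util.Set;

// Generic sketch of an attribute-based access decision: a policy over
// subject, environment, and resource attributes, evaluated at query
// time, instead of a static boolean label expression on the cell.
public class AbacSketch {
    // Example policy: cells tagged "health" are readable only by an
    // analyst connecting from an approved location.
    static boolean permitRead(Map<String, String> subject,
                              Map<String, String> environment,
                              Set<String> cellTags) {
        if (!cellTags.contains("health")) return true; // untagged data: no extra restriction
        return "analyst".equals(subject.get("role"))
                && "HQ".equals(environment.get("location"));
    }

    public static void main(String[] args) {
        Map<String, String> subject = Map.of("role", "analyst");
        System.out.println(permitRead(subject, Map.of("location", "HQ"), Set.of("health")));   // true
        System.out.println(permitRead(subject, Map.of("location", "cafe"), Set.of("health"))); // false
    }
}
```

The environment attribute is exactly what a static cell label cannot express: the same cell and the same user yield different decisions in different contexts.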
Speaker
Gurcan Gercek
Senior Software Developer, Devera Logic
Gurcan Gercek is a Senior Software Developer at Deveralogic and a PhD Computer Science candidate researching access control in big data environments at Dalhousie University in Halifax, Nova Scotia. Gurcan is also the Lead Developer of the open source MalwareZ project at the Honeypot Project, a leading security research organization based in Ann Arbor, Michigan. Gurcan is a BSc and MSc graduate of Computer Engineering from the Izmir Institute of Technology, in Turkey, and trained in network security at the European Commission's Science Service Joint Research Centre in Ispra, Italy.
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Talk Abstract
Aggregation has long been a use case of Accumulo Iterators. Iterators' ability to reduce data during compaction and scanning can greatly simplify an aggregation system built on Accumulo. This talk will first review how Accumulo's Iterators/Combiners work in the context of aggregating values. I'll then step back and look at the abstraction of aggregation functions as commutative operations and the several benefits that result by making this abstraction. We will see how it becomes no harder to introduce powerful operations such as cardinality estimation and approximate top-k than it is to sum integers. I will show how to integrate these ideas into Accumulo with an example schema and Iterator. Finally, a practical aggregation use case will be discussed to highlight the concepts from the talk.
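The abstraction the abstract describes, aggregation as a commutative operation applied to all values for a key, can be sketched in a few lines. This is an illustration in the spirit of Accumulo's Combiner, not its actual class:

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.BinaryOperator;

// Sketch of the Combiner abstraction: any commutative, associative
// operation can reduce the values seen for one key, whether during a
// scan or a compaction. Swapping the operation changes the aggregate
// without changing the machinery.
public class CombinerSketch {
    // Reduce all values for a single key with the given operation.
    static long reduce(Iterator<Long> valuesForKey, BinaryOperator<Long> op, long identity) {
        long acc = identity;
        while (valuesForKey.hasNext()) acc = op.apply(acc, valuesForKey.next());
        return acc;
    }

    public static void main(String[] args) {
        List<Long> partials = List.of(3L, 5L, 4L);
        // Sum and max use the same machinery; only the operation differs.
        System.out.println(reduce(partials.iterator(), Long::sum, 0L));             // prints 12
        System.out.println(reduce(partials.iterator(), Long::max, Long.MIN_VALUE)); // prints 5
    }
}
```

Because the operation is commutative and associative, partial reductions produced by earlier compactions can themselves be fed back in as values, which is what lets sketch-based aggregates like cardinality estimators slot in as easily as integer sums.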
Speakers
Gadalia O'Bryan
Senior Solutions Architect, Koverse
Gadalia O'Bryan is a Sr. Solutions Architect at Koverse, where she leads customer projects and contributes to key feature and algorithm design, such as Koverse's Aggregation Framework. Prior to Koverse, Gadalia was a mathematician for the National Security Agency. She has an M.A. in mathematics from UCLA and has been working with Accumulo for the past 6 years.
Bill Slacum
Software Engineer, Koverse
Bill is an Accumulo committer and PMC member who has been working on large scale query and analytic frameworks since 2010. He holds BS's in computer science and financial economics from UMBC. Having never used his passport to leave the United States, he is currently a national man of mystery.
Speaker
Russ Weeks
Software Architect, PHEMI
Russ Weeks is a Software Architect at PHEMI. Prior to joining PHEMI Health Systems, Russ worked in the network management groups at Ericsson and Cray Supercomputers, where he discovered a passion for distributed data structures and algorithms.
PHEMI, Inc. is a Vancouver, BC-based startup focused on the storage, retention and governance of structured and unstructured data.
Accumulo Summit 2015: Verifiable Responses to Accumulo Queries [Security]Accumulo Summit
Talk Abstract
Accumulo requires its users to trust each Accumulo installation with their data—a malicious server or user could easily compromise critical data or learn secrets they are not authorized to access. One particular threat is a malicious Accumulo server tampering with query results by returning forged, modified, or incomplete results to a user. We have implemented a lightweight client-side cryptographic tool to protect Accumulo users from this kind of threat.
Our solution is able to handle a spectrum of different threats. At one end of the spectrum, we use end-to-end signatures to guarantee data integrity: Accumulo clients can sign the data they write to Accumulo and verify that the Accumulo instance did not modify it. At the other end of the spectrum, we store metadata about all the entries written to Accumulo, allowing querying clients to guarantee not just the integrity of the elements contained in the query, but that nothing was omitted from the query itself. As an intermediate solution, we propose an extension to the signature scheme that would speed up the signing and verification of entries with symmetric key cryptography, as well as allowing periodic auditing of the database.
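The end-to-end signature idea from the abstract can be illustrated with a small sketch using only the JDK's crypto APIs. This is not the speakers' actual tool; the class and method names here are hypothetical, and a real implementation would sign the serialized Accumulo key and value rather than an ad-hoc string.

```java
import java.nio.charset.StandardCharsets;
import java.security.*;

// Minimal sketch of client-side signing for key/value entries: the writing
// client signs each entry, and the reading client verifies the signature, so
// a server-side modification is detected even though the server is untrusted.
public class SignedEntryDemo {
    static byte[] sign(PrivateKey key, byte[] data) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(key);
        s.update(data);
        return s.sign();
    }

    static boolean verify(PublicKey key, byte[] data, byte[] sig) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(key);
        s.update(data);
        return s.verify(sig);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair pair = gen.generateKeyPair();

        // The writer signs (a stand-in for) row, family, qualifier, and value.
        byte[] entry = "row1:meta:name=alice".getBytes(StandardCharsets.UTF_8);
        byte[] sig = sign(pair.getPrivate(), entry);

        // On read, verification succeeds for the original and fails if the
        // server returned a modified value.
        byte[] tampered = "row1:meta:name=mallory".getBytes(StandardCharsets.UTF_8);
        System.out.println(verify(pair.getPublic(), entry, sig));
        System.out.println(verify(pair.getPublic(), tampered, sig));
    }
}
```

Note that signatures alone guarantee integrity of individual entries; detecting omitted entries requires the metadata scheme described above.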
This work is sponsored by the Department of Defense under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
Speaker
Cassandra Sparks
Associate Technical Staff, Lincoln Laboratory, MIT
Cassandra Sparks is a researcher at MIT Lincoln Laboratory. She graduated from Indiana University in 2014 with an MS in computer science, focusing on programming languages and formal methods. Lately, she has been working on cryptographic enforcement of data integrity in Accumulo.
Accumulo Summit 2015: Using Fluo to incrementally process data in Accumulo [API]Accumulo Summit
Talk Abstract
Fluo provides a framework to incrementally process large datasets stored in Accumulo. Using Fluo, developers can write applications that maintain a large scale computation using a series of small transactional updates. When compared to batch processing frameworks, Fluo enables lower latency, continuous analysis of data by sacrificing throughput. This talk will provide an overview of the Fluo project by touching on its design, use cases, and API. The talk will show how developers can write Fluo applications to solve problems in a new way. It will highlight the benefits of using Fluo as well as cover the trade offs and potential problems developers may face when writing Fluo applications. The talk will end with a discussion of the current status and future direction of the Fluo project.
Speaker
Michael Walch
Software Engineer, Peterson Technologies
Mike is a software engineer and committer on the Fluo project. He has a background in distributed systems and data science. He holds a Masters in Computer Science from Johns Hopkins University and and B.S in Electrical & Computer Engineering from Carnegie Mellon University.
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]Accumulo Summit
Talk Abstract
Bulk ingest enables Accumulo to import externally-prepared data into existing tables. Unlike ingest via batch writers, much of the work of organizing data can be left to external processing frameworks such as MapReduce and scaled independently of the Accumulo cluster itself. This reduces the work required of the tablet servers to support ingest, freeing resources to support other operations.
Under the hood, bulk ingest involves a number of moving parts and must account for a variety of failure scenarios. This talk covers the components of the bulk ingest process in-depth and describes past, current and future implementations of this capability. Attendees will leave this session with an understanding of bulk ingest that will enable troubleshooting, capacity estimation and performance management.
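The "organize the data externally" step can be sketched in plain Java. Real bulk ingest writes Accumulo's RFile format with an Accumulo-provided writer; this conceptual stand-in just pre-sorts key/value pairs and writes them to a file, which is the work (normally done by a MapReduce reduce phase) that the tablet servers are spared at import time.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;

// Conceptual sketch only: sort key/value pairs off-cluster before import,
// since Accumulo stores keys in sorted order and pre-sorted input files let
// tablet servers register the data without re-sorting it.
public class SortedFilePrep {
    public static void main(String[] args) throws IOException {
        List<String[]> entries = new ArrayList<>(List.of(
                new String[]{"row3", "v3"},
                new String[]{"row1", "v1"},
                new String[]{"row2", "v2"}));

        // The external framework does the sort (here, trivially in-process).
        entries.sort(Comparator.comparing(e -> e[0]));

        // Write the sorted pairs to a file an importer would then register.
        Path out = Files.createTempFile("bulk", ".txt");
        Files.writeString(out, entries.stream()
                .map(e -> e[0] + "\t" + e[1]).collect(Collectors.joining("\n")));

        System.out.println(entries.stream().map(e -> e[0])
                .collect(Collectors.joining(",")));
    }
}
```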
Speaker
Eric Newton
Senior Software Developer, SWComplete
Eric Newton has been a programmer for over 30 years, and has worked on Accumulo since 2009. He has been an open-source contributor and consumer since 1988. Through the years, his distributed communications systems work has included Air Traffic Control, Systems Monitoring and Databases. Eric has started 3 of his own companies and helped several other businesses start.
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit
Talk Abstract
Accumulo has a solid theoretical foundation, endowing it with huge scalability, high reliability, and the makings of class-leading performance for NoSQL operations. Several publications show Accumulo achieving multi-petabyte scalability and outperforming other databases in its class by orders of magnitude. However, there are challenges arising in practice that slow down that performance and introduce bottlenecks.
The root of Accumulo's distributed scale and performance while maintaining consistency is a multi-level amplification. ZooKeeper bootstraps consistency with a highly durable quorum. The Accumulo root table uses buffering and caching to boost that performance for sorted key/value operations. With the metadata tablets and data tables, Accumulo continues to boost performance and divides and conquers a highly scalable key/value space to leverage the resources of a large cluster. The challenge arises when metadata operations at the core of Accumulo bottleneck performance for the entire cluster.
In this talk we will describe the Accumulo metadata operations model in detail. With a couple of prototypical application scenarios, we will show a few areas that are current bottlenecks or that we can expect to be bottlenecks in the near future. We will also propose modifications to the current model and outline projects that the community can take on to keep Accumulo in the lead for performance and scalability.
Speaker
Adam Fuchs
Chief Technology Officer, Sqrrl
As the Chief Technology Officer and co-founder of Sqrrl, Adam Fuchs is responsible for ensuring that Sqrrl is leading the world in Big Data Infrastructure technology. Previously at the National Security Agency, Adam was an innovator and technical director for several database projects, handling some of the world’s largest and most diverse data sets. He is a co-founder of the Apache Accumulo project. Adam has a BS in Computer Science from the University of Washington and has completed extensive graduate-level course work at the University of Maryland.
Accumulo Summit 2015: Extending Accumulo to Support ABAC using XACML [Security]Accumulo Summit
Talk Abstract
Cell-based Access Control (CBAC) in Accumulo is a powerful and flexible feature, but it has drawbacks for addressing complex access control requirements. Security architects are unable to include data types, range operators, exceptions, or environment variables in policies for dynamic access control evaluations. It is possible to meet complex access control requirements by implementing the mechanism at the application layer, but this approach has its own drawbacks: developing another layer of access control creates overhead in both system design and performance.
In this talk, we present our mechanism for extending Accumulo’s Security Labels to include attributes and XACML. This allows significantly increased access-control policy expressivity, improved policy administration, and the opportunity to implement access control models such as Attribute-Based Access Control (ABAC) and Risk-Adaptable Access Control (RAdAC) in Accumulo. We will also discuss combining Accumulo's approach with ours to increase Accumulo's capabilities even further. Introducing different types of attributes can achieve both finer-grained and coarser-grained control over data according to access control requirements. For instance, environment attributes can limit access to a cell to a specific client location, whereas system-specific information such as namespace/table/column can be used to simplify (or complicate) the policies.
Speaker
Gurcan Gercek
Senior Software Developer, Devera Logic
Gurcan Gercek is a Senior Software Developer at Devera Logic and a PhD candidate in Computer Science researching access control in big data environments at Dalhousie University in Halifax, Nova Scotia. Gurcan is also the Lead Developer of the open source MalwareZ project at the Honeynet Project, a leading security research organization based in Ann Arbor, Michigan. Gurcan holds BSc and MSc degrees in Computer Engineering from the Izmir Institute of Technology in Turkey, and trained in network security at the European Commission's Science Service Joint Research Centre in Ispra, Italy.
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit
Talk Abstract
Aggregation has long been a use case of Accumulo Iterators. Iterators' ability to reduce data during compaction and scanning can greatly simplify an aggregation system built on Accumulo. This talk will first review how Accumulo's Iterators/Combiners work in the context of aggregating values. I'll then step back and look at the abstraction of aggregation functions as commutative operations and the several benefits that result by making this abstraction. We will see how it becomes no harder to introduce powerful operations such as cardinality estimation and approximate top-k than it is to sum integers. I will show how to integrate these ideas into Accumulo with an example schema and Iterator. Finally, a practical aggregation use case will be discussed to highlight the concepts from the talk.
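The abstraction the talk describes can be sketched in a few lines: model every aggregation function as a commutative, associative merge over partial results. The names below are illustrative, not Koverse's actual framework, and the plain-Java merge stands in for what an Accumulo Combiner would do during scans and compactions.

```java
import java.util.function.BinaryOperator;

// Sketch of aggregation functions as commutative, associative merges. With
// this shape, a sum, a max, or a sketch-based cardinality estimator all plug
// into the same combine step, which is why adding approximate top-k is "no
// harder" than summing integers once the abstraction is in place.
public class CommutativeAgg {
    // Because merge is commutative and associative, partial aggregates can be
    // combined in any order -- exactly what iterators need, since values may
    // arrive in different groupings across compactions and scans.
    static <T> T aggregate(BinaryOperator<T> merge, T identity, Iterable<T> values) {
        T acc = identity;
        for (T v : values) acc = merge.apply(acc, v);
        return acc;
    }

    public static void main(String[] args) {
        var vals = java.util.List.of(3L, 1L, 4L, 1L, 5L);
        System.out.println(aggregate(Long::sum, 0L, vals));
        System.out.println(aggregate(Long::max, Long.MIN_VALUE, vals));
    }
}
```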
Speakers
Gadalia O'Bryan
Senior Solutions Architect, Koverse
Gadalia O'Bryan is a Sr. Solutions Architect at Koverse, where she leads customer projects and contributes to key feature and algorithm design, such as Koverse's Aggregation Framework. Prior to Koverse, Gadalia was a mathematician for the National Security Agency. She has an M.A. in mathematics from UCLA and has been working with Accumulo for the past 6 years.
Bill Slacum
Software Engineer, Koverse
Bill is an Accumulo committer and PMC member who has been working on large scale query and analytic frameworks since 2010. He holds BS's in computer science and financial economics from UMBC. Having never used his passport to leave the United States, he is currently a national man of mystery.
Accumulo Summit 2015: Ambari and Accumulo: HDP 2.3 Upcoming Features [Sponsored]Accumulo Summit
Talk Abstract
The upcoming Hortonworks Data Platform (HDP) 2.3 includes significant additions to Accumulo, within the project itself and in its interactions with the larger Hadoop ecosystem. This session will cover high-level changes that improve usability, management and security of Accumulo. Administrators of Accumulo now have the ability to deploy, manage and dynamically configure Accumulo clusters using Apache Ambari. As a part of Ambari integration, the metrics system in Accumulo has been updated to use the standard “Hadoop Metrics2” metrics subsystem which provides native Ganglia and Graphite support as well as supporting the new Ambari Metrics System. On the security front, Accumulo was also improved to support client authentication via Kerberos, while earlier versions of Accumulo only supported Kerberos authentication for server processes. With these changes, Accumulo clients can authenticate solely using their Kerberos identity across the entire Hadoop cluster without the need to manage passwords.
Speakers
Billie Rinaldi Senior Member of Technical Staff, Hortonworks
Billie Rinaldi is a Senior Member of Technical Staff at Hortonworks, Inc., currently prototyping new features related to application monitoring and deployment in the Apache Hadoop ecosystem. Prior to August 2012, Billie engaged in big data science and research at the National Security Agency. Since 2008, she has been providing technical leadership regarding the software that is now Apache Accumulo. Billie is the VP of Apache Accumulo, the Accumulo Project Management Committee Chair, and a member of the Apache Software Foundation. She holds a Ph.D. in applied mathematics from Rensselaer Polytechnic Institute.
Josh Elser Member of Technical Staff, Hortonworks
Josh is a member of the engineering staff at Hortonworks. He is a strong advocate for open source software and is an Apache Accumulo committer and PMC member. He is also a committer and PMC member of Apache Slider (incubating) and regularly contributes to other Apache projects in the Apache Hadoop ecosystem. He holds a Bachelor's degree in Computer Science from Rensselaer Polytechnic Institute.
Accumulo Summit 2015: Rya: Optimizations to Support Real Time Graph Queries o...Accumulo Summit
Talk Abstract
The Resource Description Framework (RDF) is a standard model for expressing graph data for the World Wide Web. Developed by the W3C, RDF and related technologies such as OWL and SKOS provide a rich vocabulary for exchanging graph data in a machine understandable manner. As the size of available data continues to grow, there has been an increased desire for methods of storing very large RDF graphs within big data architectures. Rya is a government open source scalable RDF triple store built on top of Apache Accumulo. Originally developed by the Laboratory for Telecommunication Sciences and US Naval Academy, Rya is currently being used by a number of government agencies for storing, inferencing, and querying large amounts of RDF data.
As Rya’s user base has grown, there has been a stronger requirement for near real time query responsiveness over massive RDF graphs. In this talk, we detail several query optimization strategies the Rya team has pursued to better satisfy this requirement. We describe recent work allowing for the use of additional indices to eliminate large common joins within complex SPARQL queries. Additionally, we explain a number of statistics based optimizations to improve query planning. Specifically, we detail extensions to existing methods of estimating the selectivity of individual statement patterns (cardinality) and the selectivity of joining two statement patterns (join selectivity) to better fit a “big data” paradigm and utilize Accumulo. Finally, we share preliminary performance evaluation results for the optimizations that have been pursued.
Speaker
Caleb Meier
Engineer/Algorithm Developer, Parsons Corporation
Dr. Caleb Meier received a PhD from the University of California San Diego (UCSD) in Mathematics in 2012. For the past two years, he was a postdoctoral fellow at UCSD's Math department specializing in non-linear elliptic systems of partial differential equations. He received his undergraduate degree in Mathematics from Yale University in 2006. Dr. Meier is currently working as an engineer at Parsons Corporation, specializing in query optimization algorithms for large scale RDF graphs. He is an expert in semantic technologies, Accumulo, the Hadoop Ecosystem, and is actually more fun to be around than his bio suggests.
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit
Talk Abstract
As with all open-source databases, Accumulo developers must often balance building exciting new features against hacking on performance and stability. As the core features solidify and expand, we see many opportunities to improve performance. An effective methodology for performance improvement is scientific in nature and follows a well-defined modeling and simulation approach, matching theory to experimentation in an iterative fashion.
Ingest performance is one of the most differentiating characteristics of Accumulo. However, there is much room for improvement for typical ingest-heavy applications. Accumulo supports two mechanisms to bring data in: streaming ingest and bulk ingest. In bulk ingest, the goal is to maximize throughput without constraining latency. Bulk ingest involves creating a set of files that conform to Accumulo's internal RFile format and then registering those files with Accumulo. MapReduce provides a framework for generating, sorting, and storing key/value pairs, which form the primary elements of preparing RFiles for bulk ingest. MapReduce has been used many times over the years to break sorting records, such as Terasort. We can expect it to be a reasonable choice for maximizing bulk ingest throughput. However, the theory often proves challenging to implement, as there are many performance pitfalls along the way.
In this talk, we dive deep into optimizing MapReduce for Accumulo bulk ingest. We share detailed theoretical and empirical performance models, we discuss techniques for profiling performance, and we suggest reusable techniques for squeezing the maximum performance out of enterprise-grade Accumulo bulk ingest.
Speaker
Chris McCubbin
Director of Data Science, Sqrrl
Chris is the Director of Data Science for Sqrrl. He has extensive experience with the Hadoop ecosystem and applying scientific computation algorithms to real-world datasets. Previously, Chris developed Big Data analysis tools for the Intelligence Community and applied artificial intelligence techniques to unmanned vehicle systems. He holds a MS in Computer Science and BS in Computer Science and Mathematics from the University of Maryland.
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit
Talk Abstract
In this talk we will walk through how Apache Kafka and Apache Accumulo can be used together to orchestrate a decoupled, real-time, distributed and reactive request/response system at massive scale. Multiple data pipelines can perform complex operations for each message in parallel at high volumes with low latencies. The final result is returned in line with the initiating call. The architectural gains are immense: the requesting system receives a response without the need for direct integration with the data pipeline(s) that messages must pass through. By utilizing Apache Kafka and Apache Accumulo, these gains are sustained at scale and allow complex operations on different messages to be applied to each response in real time.
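The core of this pattern is correlating asynchronous replies with their requests. The sketch below uses in-memory BlockingQueues standing in for Kafka topics (a real system would use Kafka producers/consumers, with Accumulo behind the pipeline workers); the class and field names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.*;

// Sketch of a decoupled request/response flow: the requester publishes a
// message with a correlation ID to a "request topic", a pipeline worker
// consumes it and publishes to a "response topic", and the requester matches
// the reply by ID -- never calling the pipeline directly.
public class RequestReplyDemo {
    record Msg(String correlationId, String payload) {}

    public static void main(String[] args) throws Exception {
        BlockingQueue<Msg> requests = new LinkedBlockingQueue<>();  // stand-in for request topic
        BlockingQueue<Msg> responses = new LinkedBlockingQueue<>(); // stand-in for response topic
        Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

        // Pipeline worker: consume a request, process it, emit a response.
        Thread worker = new Thread(() -> {
            try {
                Msg req = requests.take();
                responses.put(new Msg(req.correlationId(), "processed:" + req.payload()));
            } catch (InterruptedException ignored) {}
        });
        worker.start();

        // Requester: publish with a correlation ID, then await the matching reply.
        String id = "req-1";
        pending.put(id, new CompletableFuture<>());
        requests.put(new Msg(id, "lookup-user-42"));

        Msg resp = responses.take();
        pending.get(resp.correlationId()).complete(resp.payload());
        System.out.println(pending.get(id).get(5, TimeUnit.SECONDS));
        worker.join();
    }
}
```

The pending-futures map is what lets many in-flight requests share one response stream: each reply is routed to its waiting caller by correlation ID alone.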
Speaker
Joe Stein
Principal Consultant, Big Data Open Source Security, LLC
Joe Stein is an Apache Kafka committer and PMC member. Joe is the Founder and Principal Architect of Big Data Open Source Security LLC, a professional services and product solutions company. Joe has been a developer, architect and technologist professionally for 15 years, having built back-end systems that supported over one hundred million unique devices a day and processed trillions of events. He blogs and hosts a podcast about Hadoop and related systems at All Things Hadoop and tweets @allthingshadoop.
Introduce Brainf*ck, a minimalist Turing-complete programming language. Then, try to implement the following from scratch: an interpreter, a compiler (x86_64 and ARM), and a JIT compiler.
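The first of those three artifacts fits in a short sketch. This minimal interpreter handles six of the eight Brainf*ck commands (input `,` and nested-loop edge cases beyond bracket matching are left out for brevity); the compilers and JIT from the slides are out of scope here.

```java
// A minimal Brainf*ck interpreter: a byte tape, a data pointer, and direct
// bracket-matching for loops. Real implementations precompute a jump table;
// scanning for the matching bracket on every loop boundary is the simplest
// correct approach.
public class BfInterp {
    static String run(String prog) {
        byte[] tape = new byte[30000];
        StringBuilder out = new StringBuilder();
        int dp = 0;
        for (int pc = 0; pc < prog.length(); pc++) {
            switch (prog.charAt(pc)) {
                case '>' -> dp++;
                case '<' -> dp--;
                case '+' -> tape[dp]++;
                case '-' -> tape[dp]--;
                case '.' -> out.append((char) (tape[dp] & 0xFF));
                case '[' -> { // jump past the matching ] when the cell is zero
                    if (tape[dp] == 0) {
                        int depth = 1;
                        while (depth > 0) {
                            pc++;
                            char c = prog.charAt(pc);
                            if (c == '[') depth++; else if (c == ']') depth--;
                        }
                    }
                }
                case ']' -> { // jump back to the matching [ when the cell is nonzero
                    if (tape[dp] != 0) {
                        int depth = 1;
                        while (depth > 0) {
                            pc--;
                            char c = prog.charAt(pc);
                            if (c == ']') depth++; else if (c == '[') depth--;
                        }
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds 65 in cell 1 (8 iterations of +8, then +1) and prints it.
        System.out.println(run("++++++++[>++++++++<-]>+."));
    }
}
```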
My study notes on the Google Percolator paper, available on the Google Research website (USENIX2010). The paper talks about an incremental processing system for large-scale data.
As more and more organizations and individual users turn to Apache Flink for their streaming workloads, there is a bigger demand for additional functionality out-of-the-box. On one hand, there is demand for more low-level APIs that allow for more control, while on the other, users ask for more high-level additions that make the common cases easier to express. This talk will present the new concepts added to the DataStream API in Flink 1.2 and the upcoming Flink 1.3 release that try to consolidate the aforementioned goals. We will talk, among other things, about the ProcessFunction, a new low-level stream processing primitive that gives the user full control over how each event is processed and can register and react to timers; changes in the windowing logic that allow for more flexible windowing strategies; side outputs; and new features concerning the Flink connectors.
5-LEC- 5.pptxTransport Layer. Transport Layer ProtocolsZahouAmel1
Atmosphere Conference 2015: Need for Async: In pursuit of scalable internet-s...PROIDEA
Speaker: Konrad Malawski
Language: English
It's the year 2015, so unless you've been living under a rock for the last decade, you probably have heard about servers and platforms needing to go asynchronous in order to scale. But really, how deep did you dive into the reasons as to why this need arises? This talk aims to explain the various reasons and techniques that can be (and often are) used in developing high performance web applications - from the kernel depths to the high-level abstractions that all contribute to such designs.
We'll start with the lowest level of them all - the network transports we all use and how they impact latency in our systems.
Then we will move on to operating systems' socket selector implementation details and the now legendary C10K problem, to see how implementations were forced to change in order to survive the ever-rising number of concurrent connections. Next we'll dive into processor and thread utilisation effects and how parallel programming - using either message-passing or stream processing style libraries fits into the grand picture of pursuing the most stable and lowest latency characteristics we could dream of.
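The selector model at the heart of the C10K discussion can be shown in a few lines of Java NIO: one thread multiplexes readiness events for many non-blocking channels instead of parking a blocked thread per connection.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.*;

// Tiny illustration of the selector model: register non-blocking channels
// with a Selector and let a single thread be told which are ready, rather
// than blocking one thread per socket.
public class SelectorSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(0)); // bind to any free port
        server.configureBlocking(false);       // required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);

        // A real event loop would call selector.select() and dispatch on the
        // ready keys; here we just confirm the registration is in place.
        System.out.println(selector.keys().size());
        server.close();
        selector.close();
    }
}
```

Under the hood, Selector maps onto the OS facilities the talk covers (epoll on Linux, kqueue on BSD), which is exactly where the C10K scaling gains come from.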
Visit our website: http://atmosphere-conference.com/
The program will read the file like this, java homework6Bank sma.pdfivylinvaydak64229
The program will read the file like this,
> java homework6/Bank small.txt 4
acct:0 bal:999 trans:1
acct:1 bal:1001 trans:1
acct:2 bal:999 trans:1
acct:3 bal:1001 trans:1
acct:4 bal:999 trans:1
acct:5 bal:1001 trans:1
acct:6 bal:999 trans:1
acct:7 bal:1001 trans:1
acct:8 bal:999 trans:1
acct:9 bal:1001 trans:1
acct:10 bal:999 trans:1
acct:11 bal:1001 trans:1
acct:12 bal:999 trans:1
acct:13 bal:1001 trans:1
acct:14 bal:999 trans:1
acct:15 bal:1001 trans:1
acct:16 bal:999 trans:1
acct:17 bal:1001 trans:1
acct:18 bal:999 trans:1
acct:19 bal:1001 trans:1
Each text file looks something like:
1 2 1
3 4 1
5 6 1
7 8 1
9 10 1
11 12 1
File Format: Each line in the external file represents a single transaction, and contains three
numbers: the id of the account from which the money is being transferred, the id of the account
to which the money is going, and the amount of money. For example the line:
17 6 104
indicates that $104 is being transferred from Account #17 to Account #6.
The test data provided includes transfers with the same from and to account numbers, so make
sure your program will work correctly for these transfers. For example:
5 5 40
Count these as two transactions for the account (one transaction taking money from the account
and one putting money into the account).
My goal is to pass each transaction into the queue, the queue will hold the transaction, the
worker will take the transaction, complete the deposit/withdraw, and update the balance of the
account accordingly. I am required to use BlockingQueue. My problem is that the program is not
running correctly. I need to fix the Bank class, how I start up the Bank in the main thread, and also
work on the Worker class.
More info:
Details
I recommend a design with four classes—Bank, Account, Transaction, and Worker. Both the
Account and Transaction classes are quite simple.
Account needs to store an id number, the current balance for the account, and the number of
transactions that have occurred on the account. Remember that multiple worker threads may be
accessing an account simultaneously and you must ensure that they cannot corrupt its data. You
may also want to override the toString method to handle printing of account information.
Transaction is a simple class that stores information on each transaction (see below for more
information about each transaction). If you’re careful you can treat the Transaction as
immutable. This means that you do not have to worry about multiple threads accessing it.
Remember an immutable object’s values never change, therefore its values are not subject to
corruption in a concurrent environment.
The Bank class maintains a list of accounts and the BlockingQueue used to communicate
between the main thread and the worker threads. The Bank is also responsible for starting up the
worker threads, reading transactions from the file, and printing out all the account values when
everything is done. Note: make sure you start up all the worker threads before reading the
tr.
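The recommended four-class design can be sketched as a minimal working program. This is not the graded solution: transactions are parsed from an in-memory list instead of the file, the thread count and account count are fixed, and a per-worker "poison pill" replaces whatever shutdown protocol the assignment expects.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the Bank/Account/Transaction/Worker design: the main thread feeds
// Transaction objects through a BlockingQueue to worker threads that update
// synchronized Accounts, then joins the workers and prints final balances.
public class BankSketch {
    record Transaction(int from, int to, int amount) {}
    static final Transaction POISON = new Transaction(-1, -1, 0); // shutdown marker

    static class Account {
        final int id;
        private int balance = 1000;
        private int trans = 0;
        Account(int id) { this.id = id; }
        // synchronized: multiple workers may touch the same account, and a
        // self-transfer (from == to) counts as two transactions, per the spec.
        synchronized void adjust(int delta) { balance += delta; trans++; }
        public synchronized String toString() {
            return "acct:" + id + " bal:" + balance + " trans:" + trans;
        }
    }

    public static void main(String[] args) throws Exception {
        int nWorkers = 2;
        Account[] accounts = new Account[4];
        for (int i = 0; i < accounts.length; i++) accounts[i] = new Account(i);

        BlockingQueue<Transaction> queue = new ArrayBlockingQueue<>(16);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < nWorkers; i++) {
            Thread w = new Thread(() -> {
                try {
                    for (Transaction t = queue.take(); t != POISON; t = queue.take()) {
                        accounts[t.from()].adjust(-t.amount()); // withdraw
                        accounts[t.to()].adjust(t.amount());    // deposit
                    }
                } catch (InterruptedException ignored) {}
            });
            w.start();
            workers.add(w);
        }

        // In the assignment these "from to amount" lines come from the file.
        for (String line : List.of("0 1 1", "2 3 1")) {
            String[] p = line.split(" ");
            queue.put(new Transaction(Integer.parseInt(p[0]),
                    Integer.parseInt(p[1]), Integer.parseInt(p[2])));
        }
        for (int i = 0; i < nWorkers; i++) queue.put(POISON); // one pill per worker
        for (Thread w : workers) w.join();

        for (Account a : accounts) System.out.println(a);
    }
}
```

Starting all workers before reading any transactions, and joining them before printing, addresses the two ordering problems called out in the question.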
We're going to talk whether it's worth to migrate from ASP.NET Core 2.2 to 3.0 in terms of performance. In addition, we're raising topics about high-performance features like pipelines, span, and memory which are used in ASP.NET 3.0 by Microsoft to speed up the processing of requests.
A universal asynchronous receiver-transmitter is a computer hardware device for asynchronous serial communication in which the data format and transmission speeds are configurable.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
2. Accismus
A form of irony where one pretends indifference
and refuses something while actually wanting it.
3. Google's Problem
● Use M/R to process ~10^15 bytes
● ~10^12 bytes of new data arrive
● Use M/R to process 10^15 + 10^12 bytes
● High latency before new data is available for query
4. Solution
● Percolator : incremental processing for big data
– Layer on top of BigTable
– Offers fault-tolerant, cross-row transactions
● Lazy recovery
– Offers snapshot isolation
● Only read committed data
– Uses BigTable data model, except timestamp
● Accismus adds visibility
– Has its own API
5. Observers
● User-defined function that executes a transaction
● Triggered when a user-defined column is modified (called a notification in the paper)
● Guarantees only one transaction will execute per notification
6. Initialize bank
tx1.begin()
if(tx1.get('bob','balance') == null)
tx1.set('bob','balance',100)
if(tx1.get('joe','balance') == null)
tx1.set('joe','balance',100)
if(tx1.get('sue','balance') == null)
tx1.set('sue','balance',100)
tx1.commit()
What could possibly go wrong?
7. Two threads transferring
Thread 2 on node B
tx3.begin()
b3 = tx3.get('joe','balance')
b4 = tx3.get('sue','balance')
tx3.set('joe','balance',b3 + 5)
tx3.set('sue','balance',b4 - 5)
tx3.commit()
Thread 1 on node A
tx2.begin()
b1 = tx2.get('joe','balance')
b2 = tx2.get('bob','balance')
tx2.set('joe','balance',b1 + 7)
tx2.set('bob','balance',b2 - 7)
tx2.commit()
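A minimal in-memory sketch (Python; hypothetical, not the Accismus API) of what snapshot isolation does with the two transfers above: both transactions read 'joe' from their start-timestamp snapshot, and whichever commits second must abort because 'joe' changed after its snapshot was taken.

```python
class Store:
    """Multi-versioned key/value store with a logical clock (a stand-in
    for BigTable timestamps plus the timestamp oracle)."""
    def __init__(self):
        self.data = {}          # key -> list of (commit_ts, value), append order
        self.ts = 0

    def next_ts(self):
        self.ts += 1
        return self.ts

    def read(self, key, snap_ts):
        # newest value committed at or before the snapshot timestamp
        versions = [v for t, v in self.data.get(key, []) if t <= snap_ts]
        return versions[-1] if versions else None

class Tx:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.next_ts()   # snapshot taken at start
        self.writes = {}

    def get(self, key):
        return self.store.read(key, self.start_ts)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # write-write conflict: abort if any written key gained a
        # version after our snapshot was taken
        for key in self.writes:
            if any(t > self.start_ts for t, _ in self.store.data.get(key, [])):
                return False
        commit_ts = self.store.next_ts()
        for key, value in self.writes.items():
            self.store.data.setdefault(key, []).append((commit_ts, value))
        return True

store = Store()
init = Tx(store)
for name in ('bob', 'joe', 'sue'):
    init.set(name, 100)
assert init.commit()

tx2, tx3 = Tx(store), Tx(store)           # two concurrent transfers
tx2.set('joe', tx2.get('joe') + 7); tx2.set('bob', tx2.get('bob') - 7)
tx3.set('joe', tx3.get('joe') + 5); tx3.set('sue', tx3.get('sue') - 5)
assert tx2.commit() is True               # first commit wins
assert tx3.commit() is False              # second conflicts on 'joe' and aborts
```

The aborted transfer is simply retried against a fresh snapshot, which is why neither update is lost.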
8. Accismus stochastic bank test
● Bank account per row
● Initialize N bank accounts with 1000
● Run random transfer threads
● Complete scan always sums to N*1000
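The invariant the test checks can be sketched in Python (a simplified, single-threaded stand-in, not the Accismus test itself): transfers move money between accounts but never create or destroy it, so a complete scan always sums to N * 1000.

```python
import random

N = 50
accounts = {f'account{i:03d}': 1000 for i in range(N)}  # N accounts, 1000 each

for _ in range(10_000):                     # random transfer "transactions"
    src, dst = random.sample(list(accounts), 2)
    amount = random.randint(1, 25)
    accounts[src] -= amount                 # negative balances are allowed
    accounts[dst] += amount

# complete-scan invariant: any broken or lost transaction would show up here
assert sum(accounts.values()) == N * 1000
```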
9. Phrasecount example
● Have documents + source URI
● Dedupe documents based on SHA1
● Count number of unique documents each
phrase occurs in
● Can do this with two map reduce jobs
● https://github.com/keith-turner/phrasecount
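The two-map-reduce-job version of this computation can be sketched directly in Python (a hypothetical batch equivalent, not the observer-based implementation in the repo): dedupe documents on the SHA1 of their content, then count, for each four-word phrase, the number of unique documents containing it.

```python
import hashlib
from collections import Counter

docs = {   # URI -> content, mirroring the example on the later slides
    'http://foo.com/a': 'my dog is very nice',
    'http://foo.net/a': 'my dog is very nice',    # duplicate content
    'http://foo.com/c': 'his dog is very nice',
}

# job 1: dedupe documents on the SHA1 of their content
unique = {hashlib.sha1(text.encode()).hexdigest(): text
          for text in docs.values()}

# job 2: count, per 4-word phrase, the number of unique documents
phrase_doc_counts = Counter()
for text in unique.values():
    words = text.split()
    phrases = {' '.join(words[i:i + 4]) for i in range(len(words) - 3)}
    phrase_doc_counts.update(phrases)     # a set, so each doc counts at most once

assert len(unique) == 2
assert phrase_doc_counts['dog is very nice'] == 2
assert phrase_doc_counts['my dog is very'] == 1
```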
14. Observer transaction 1
document:1e111475
his dog is very nice
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
my dog is very : 1
dog is very nice : 1
15. Observer transaction 2
document:1e111475
his dog is very nice
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
my dog is very : 1 his dog is very : 1
dog is very nice : 2
16. Load transaction 4
document:1e111475
his dog is very nice
document:b4bf617e
my dog is very nice
http://foo.com/a http://foo.net/a http://foo.com/c
my dog is very : 1 his dog is very : 1
dog is very nice : 2
19. Querying phrase counts
● Query Accismus directly
– Lazy recovery may significantly delay queries
– High load may delay queries
● Export transaction writes to Accumulo table
– WARNING : leaving the sane world of transactions
– Faults during export
– Concurrently exporting same item
– Out-of-order arrival of exported data
20. Export transaction strategy
● Only export committed data (Intent log)
– Don't export something a transaction is going to commit
● Idempotent
– Export transaction can fail
– Expect repeated execution (possibly concurrent)
● Use committed sequence # to order data
– Thread could read export data, pause, then export old data
– Use seq # as timestamp in Accumulo export table
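The strategy above can be sketched as follows (hypothetical helper names, not the real Accismus exporter): because each export carries the committed sequence number, repeating an export is a no-op and a stale export from a paused thread can never overwrite newer data.

```python
export_table = {}   # stand-in for the Accumulo export table: key -> (seq, value)

def export_item(key, seq, value):
    # idempotent: repeating an export is a no-op, and a stale export
    # (lower seq, e.g. from a thread that paused mid-export) never
    # overwrites newer data
    current = export_table.get(key)
    if current is None or seq > current[0]:
        export_table[key] = (seq, value)

export_item('dog is very nice', seq=2, value=2)
export_item('dog is very nice', seq=1, value=1)   # out-of-order arrival, ignored
export_item('dog is very nice', seq=2, value=2)   # repeated execution, no-op
assert export_table['dog is very nice'] == (2, 2)
```

In the real export table the seq # plays the role of the Accumulo timestamp, so newer versions naturally shadow older ones.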
22. Phrasecount problems
● No handling for high cardinality phrases
– Weak notifications mentioned in paper
– Multi-row tree another possibility
● Possible memory exhaustion
– Percolator uses many threads to get high throughput
– Example loads entire document into memory. Many threads X large documents == dead worker.
23. Weak notifications(Queue)
String pr = 'phrase:'+phrase;
int current = tx1.get(pr,'stat:docCount')
if(isHighVolume(phrase)){
  tx1.set(pr,'stat:docCount'+rand,delta)
  tx1.weakNotify(pr); //trigger observer to collapse rand columns
}else
  tx1.set(pr,'stat:docCount',delta + current)
24. Multi-row tree for high cardinality
phrase:<phrase>
phrase_0:<phrase>
phrase_1:<phrase>
phrase_00:<phrase>
phrase_01:<phrase>
phrase_10:<phrase>
phrase_11:<phrase>
● Incoming updates hit leaves
● Observers percolate counts to the root
● Export from root
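The multi-row tree can be sketched as follows (row names taken from the slide; the code is a hypothetical illustration, not the real schema): updates land on randomly chosen leaf rows, and a percolation pass folds partial counts up to the root, so no single row becomes a write hotspot.

```python
import random
from collections import defaultdict

rows = defaultdict(int)   # row name -> count stored in that row
LEAVES = ['phrase_00', 'phrase_01', 'phrase_10', 'phrase_11']

# child -> parent links for the tree on the slide
PARENT = {'phrase_00': 'phrase_0', 'phrase_01': 'phrase_0',
          'phrase_10': 'phrase_1', 'phrase_11': 'phrase_1',
          'phrase_0': 'phrase', 'phrase_1': 'phrase'}

def update(delta):
    # incoming updates hit a random leaf, spreading the write load
    rows[random.choice(LEAVES)] += delta

def percolate():
    # observer pass: fold each child's count into its parent,
    # children before parents so everything reaches the root
    for child in ['phrase_00', 'phrase_01', 'phrase_10', 'phrase_11',
                  'phrase_0', 'phrase_1']:
        rows[PARENT[child]] += rows.pop(child, 0)

for _ in range(100):      # 100 documents containing the phrase
    update(1)
percolate()
assert rows['phrase'] == 100   # the root row holds the full count, ready to export
```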
25. Timestamp Oracle
● Lightweight centralized service that issues timestamps
– Allocates batches of timestamps from ZooKeeper
– Gives batches of timestamps to nodes executing transactions
26. Timestamp oracle
● Gives a logical global ordering to events
– Transactions get a timestamp at start. Only read data committed before it.
– Transactions get a timestamp when committing.
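A batch-allocating oracle can be sketched like this (hypothetical, not the Accismus implementation); `persisted_max` stands in for the value durably stored in ZooKeeper, so after a crash the oracle restarts past every timestamp it may already have issued.

```python
class TimestampOracle:
    BATCH = 1000

    def __init__(self):
        self.persisted_max = 0   # stand-in for the value stored in ZooKeeper
        self.next_ts = 0
        self.limit = 0

    def _reserve_batch(self):
        # durable write: after a crash, restart from persisted_max so
        # no timestamp is ever issued twice
        self.next_ts = self.persisted_max
        self.persisted_max += self.BATCH
        self.limit = self.persisted_max

    def get_timestamp(self):
        if self.next_ts >= self.limit:
            self._reserve_batch()
        ts = self.next_ts
        self.next_ts += 1
        return ts

oracle = TimestampOracle()
first = oracle.get_timestamp()
stamps = [oracle.get_timestamp() for _ in range(1500)]
assert first == 0
assert stamps == sorted(stamps)          # strictly increasing
assert oracle.persisted_max == 2000      # only two trips to "ZooKeeper" for 1501 stamps
```

Handing out whole batches is what keeps the oracle lightweight: the coordination service sees one write per BATCH timestamps rather than one per transaction.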
27. Percolator Implementation
● Two-phase commit using conditional mutations
– Write lock+data to primary row/column
– Write lock+data to all other row/columns
– Commit primary row/column if still locked
– Commit all other row/columns
● Lock fails if value changed between start and commit timestamp
● All row/columns in transaction point to primary
● In case of failure, primary is authority
● No centralized locking
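The commit protocol above can be sketched in memory (hypothetical structures, not real Accumulo conditional mutations): phase 1 conditionally locks each cell, failing if the cell is locked or changed after the start timestamp; phase 2 commits the primary first, and every other lock points at the primary as the authority.

```python
# cells: (row, column) -> {'data': value, 'lock': primary key or None, 'write_ts': ts}
cells = {}

def try_lock(key, primary, start_ts):
    cell = cells.setdefault(key, {'data': None, 'lock': None, 'write_ts': -1})
    # conditional mutation: fail if already locked, or written after our snapshot
    if cell['lock'] is not None or cell['write_ts'] > start_ts:
        return False
    cell['lock'] = primary
    return True

def commit(writes, start_ts, commit_ts):
    primary = next(iter(writes))              # first cell acts as the primary
    # phase 1: lock the primary, then every other cell
    for key in writes:
        if not try_lock(key, primary, start_ts):
            for k in writes:                  # roll back any locks we took
                cell = cells.get(k)
                if cell is not None and cell['lock'] == primary:
                    cell['lock'] = None
            return False
    # phase 2: write data and release locks; the primary (first key in
    # the dict) commits first and is the authority if we crash partway
    for key, value in writes.items():
        cells[key].update(data=value, lock=None, write_ts=commit_ts)
    return True

ok = commit({('bob', 'balance'): 93, ('joe', 'balance'): 107},
            start_ts=3, commit_ts=6)
assert ok
assert cells[('joe', 'balance')] == {'data': 107, 'lock': None, 'write_ts': 6}
```

Because every non-primary lock names the primary, a recovering reader can inspect the primary cell alone to decide whether to roll the transaction back or forward.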
28. Handling failures
● Transaction dies in phase 1
– Written some locks+data
– Must roll back
● Transaction dies in phase 2
– All locks+data written
– Roll forward and write data pointers
29. Transfer transaction
Row  Column   Percolator Type  Time  Value
bob  balance  write            1     0
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  data             0     100
Percolator appends column type to qualifier. Accismus uses high 4 bits of timestamp.
30. Lock primary
Row  Column   Percolator Type  Time  Value
bob  balance  write            1     0
bob  balance  lock             3     bob:balance
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  data             0     100
31. Lock other
Row  Column   Percolator Type  Time  Value
bob  balance  write            1     0
bob  balance  lock             3     bob:balance
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  lock             3     bob:balance
joe  balance  data             3     107
joe  balance  data             0     100
32. Commit primary
Row  Column   Percolator Type  Time  Value
bob  balance  write            6     3
bob  balance  write            1     0
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            1     0
joe  balance  lock             3     bob:balance
joe  balance  data             3     107
joe  balance  data             0     100
What happens if a tx with start time 7 reads joe and bob?
Commit timestamp is obtained after all locks are written. Why?
33. Commit other
Row  Column   Percolator Type  Time  Value
bob  balance  write            6     3
bob  balance  write            1     0
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            6     3
joe  balance  write            1     0
joe  balance  data             3     107
joe  balance  data             0     100
34. Garbage collection
● Not mentioned in paper
● Use compaction iterator
● Currently keep X versions. Could determine oldest active scan start timestamp.
● Must keep data about success/failure of primary column
– Added extra column type to indicate when primary can be collected. Never collected in failure case.
35. After GC Iterator
Row  Column   Percolator Type  Time  Value
bob  balance  write            6     3:TRUNC
bob  balance  write            1     0
bob  balance  data             3     93
bob  balance  data             0     100
joe  balance  write            6     3:TRUNC
joe  balance  write            1     0
joe  balance  data             3     107
joe  balance  data             0     100
A transaction with a read time of 5 would see a StaleScanException
36. Snapshot iterator
● Used to read data
● Analyzes percolator metadata on tserver
● Returns committed data <= start OR open locks
● Detects scan past point of GC
– Client code throws StaleScanException
37. Accismus API
● Minimal byte buffer based API
– Currently byte sequence, plan to move to byte buffer (could be your first patch :)
– Remove all external dependencies, like Accumulo Range
● Wrap minimal API w/ convenience API that handles nulls, encoding, and types well
//automatically encode strings and int into bytes using supplied encoder
tx.mutate().row("doc:"+hash).fam("doc").qual("refCount").set(5);
//no need to check if value is null and then parse as int
int rc = tx.get().row("doc:"+hash).fam("doc").qual("refCount").toInteger(0);
38. TODO
● Test at scale
● Create a cluster test suite
● Weak notifications
● Use YARN to run
● Improve batching of reads and writes
● Initialization via M/R; Accismus file output format
● Column read-ahead based on past read patterns
● Improve GC
● Improve finding notifications