A presentation about Yahoo! S4 and Apache S4. I gave this presentation for the Cloud Computing course of Dr. Payberah at AUT in fall 2014.
The lecturer's references are the Yahoo! S4 paper and the Apache S4 website.
Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasa... - Data Con LA
Twitter generates billions and billions of events per day. Analyzing these events in real time presents a massive challenge. Twitter designed and deployed a new streaming system called Heron. Heron has been in production nearly 2 years and is widely used by several teams for diverse use cases. This talk looks at Twitter's operating experiences and challenges of running Heron at scale and the approaches taken to solve those challenges.
This talk was held at the 10th meeting on February 3rd 2014 by Sean Owen.
Having collected Big Data, organizations are now keen on data science and “Big Learning”. Much of the focus has been on data science as exploratory analytics: offline, in the lab. However, building from that a production-ready large-scale operational analytics system remains a difficult and ad-hoc endeavor, especially when real-time answers are required. Design patterns for effective implementations are emerging, which take advantage of relaxed assumptions, adopt a new tiered "lambda" architecture, and pick the right scale-friendly algorithms to succeed. Drawing on experience from customer problems and the open source Oryx project at Cloudera, this session will provide examples of operational analytics projects in the field, and present a reference architecture and algorithm design choices for a successful implementation.
Streamlio and IoT analytics with Apache Pulsar - Streamlio
To keep up with fast-moving IoT data, you need technology that can collect, process and store data with performance and scalability. This presentation from Data Day Texas looks at the technology requirements and how Apache Pulsar can help to meet them.
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t... - Brian O'Neill
This presentation covers our use of Storm and the connectors we've built. It also proposes a design for integrating Storm with real-time web services by embedding parts of topologies directly into the web services layer.
Why apache Flink is the 4G of Big Data Analytics Frameworks - Slim Baltagi
Apache Flink is a community-driven open source and memory-centric Big Data analytics framework. It provides the only hybrid (Real-Time Streaming + Batch) open source distributed data processing engine supporting many use cases.
Flink uses a mixture of Scala and Java internally, has very good Scala APIs and some of its libraries are basically pure Scala (FlinkML and Table).
At its core, it is a streaming dataflow execution engine and it also provides several APIs for batch processing (DataSet API), real-time streaming (DataStream API) and relational queries (Table API) and also domain-specific libraries for machine learning (FlinkML) and graph processing (Gelly).
In this talk, you will learn in more detail about:
What Apache Flink is, how it fits into the Big Data ecosystem, and why it is the 4G (4th Generation) of Big Data Analytics frameworks.
How Apache Flink integrates with Apache Hadoop and other open source tools for data input and output as well as deployment.
Why Apache Flink is an alternative to Apache Hadoop MapReduce, Apache Storm and Apache Spark, and what the benchmarking results are between Apache Flink and those other Big Data analytics frameworks.
Twitter's Real Time Stack - Processing Billions of Events Using Distributed L... - Karthik Ramasamy
Twitter generates billions and billions of events per day. Analyzing these events in real time presents a massive challenge. In order to meet this challenge, Twitter designed an end-to-end real-time stack consisting of DistributedLog, the distributed and replicated messaging system, and Heron, the streaming system for real-time computation. DistributedLog is a replicated log service that is built on top of Apache BookKeeper, providing infinite, ordered, append-only streams that can be used for building robust real-time systems. It is the foundation of Twitter’s publish-subscribe system. Twitter Heron is the next-generation streaming system built from the ground up to address our scalability and reliability needs. Both systems have been in production for nearly two years and are widely used at Twitter in a range of diverse applications such as search ingestion pipeline, ad analytics, image classification and more. These slides will describe Heron and DistributedLog in detail, covering a few use cases in depth and sharing the operating experiences and challenges of running large-scale real-time systems.
In this tutorial we present the results of recent research on the cloud enablement of data streaming systems. We illustrate, based on both industrial and academic prototypes, newly emerging use cases and research trends. Specifically, we focus on novel approaches for (1) fault tolerance and (2) scalability in large-scale distributed streaming systems. In general, new fault tolerance mechanisms strive to be more robust and at the same time introduce less overhead. Novel load balancing approaches focus on elastic scaling over hundreds of instances based on the data and query workload. Finally, we present open challenges for the next generation of cloud-based data stream processing engines.
Self Regulating Streaming - Data Platforms Conference 2018 - Streamlio
Streamlio's Karthik Ramasamy takes a look at how the Apache Heron streaming platform uses built-in intelligence to automatically regulate data flow and ensure resiliency.
Storm – Streaming Data Analytics at Scale - StampedeCon 2014 - StampedeCon
At StampedeCon 2014, Scott Shaw (Hortonworks) and Kit Menke (Enterprise Holdings) presented "Storm – Streaming Data Analytics at Scale".
Storm’s primary purpose is to provide real-time analytics against fast-moving data before it’s stored. The use cases range from fraud detection and machine learning to ETL.
Storm has been clocked at over 1 million tuples processed per second per node. It’s fast, scalable, and language agnostic. This session provides an architecture overview as well as a real-world discussion of its use and implementation at Enterprise Holdings.
Streaming data presents new challenges for statistics and machine learning on extremely large data sets. Tools such as Apache Storm, a stream processing framework, can power a range of data analytics but lack advanced statistical capabilities. These slides are from the ApacheCon talk, which discussed developing streaming algorithms with the flexibility of both Storm and R, a statistical programming language.
In the talk I discussed why and how to use Storm and R to develop streaming algorithms; in particular I focused on:
• Streaming algorithms
• Online machine learning algorithms
• Use cases showing how to process hundreds of millions of events a day in (near) real time
See: https://apacheconna2015.sched.org/event/09f5a1cc372860b008bce09e15a034c4#.VUf7wxOUd5o
Real Time Processing Using Twitter Heron by Karthik Ramasamy - Data Con LA
Abstract: Today's enterprises are not only producing data in high volume but also at high velocity. With velocity comes the need to process the data in real time. To meet these real-time needs, we developed and deployed Heron, the next-generation streaming engine at Twitter. Heron processes billions and billions of events per day at Twitter and has been in production for nearly 3 years. Heron provides unparalleled performance at large scale and has been successfully meeting Twitter's strict performance requirements for various streaming and IoT applications. Heron is an open source project with several major contributors from various institutions. As part of the project, we identified and implemented several optimizations that improved throughput by an additional 5x and further reduced latency by 50-60%. In this talk, we will describe Heron in detail, including how detailed profiling indicated performance bottlenecks such as multiple serializations/deserializations and immutable data structures. After mitigating these costs, we were able to show much higher throughput and latencies as low as 12ms.
This paper describes the use of Storm at Twitter. Storm is a real-time, fault-tolerant and distributed stream data processing system. Storm is currently being used to run various critical computations in Twitter at scale, and in real time. This paper describes the architecture of Storm and its methods for distributed scale-out and fault tolerance. This paper also describes how queries (a.k.a. topologies) are executed in Storm, and presents some operational stories based on running Storm at Twitter. We also present results from an empirical evaluation demonstrating the resilience of Storm in dealing with machine failures. Storm is under active development at Twitter and we also present some potential directions for future work.
Bobby Evans and Tom Graves, the engineering leads for Spark and Storm development at Yahoo, will talk about how these technologies are used on Yahoo's grids and the reasons to use one or the other.
Bobby Evans is the low latency data processing architect at Yahoo. He is a PMC member on many Apache projects including Storm, Hadoop, Spark, and Tez. His team is responsible for delivering Storm as a service to all of Yahoo and maintaining Spark on Yarn for Yahoo (although Tom really does most of that work).
Tom Graves is a Senior Software Engineer on the Platform team at Yahoo. He is an Apache PMC member on Hadoop, Spark, and Tez. His team is responsible for delivering and maintaining Spark on Yarn for Yahoo.
With tens of thousands of Java servers running in production in the enterprise, Java has become a language of choice for building production systems. If our machines are to exhibit acceptable performance, they require regular tuning. This talk takes a detailed look at techniques for tuning a Java server.
Functional Comparison and Performance Evaluation of Streaming Frameworks - Huafeng Wang
A report covering the functional comparison and performance evaluation of Apache Flink, Apache Spark Streaming, Apache Storm and Apache Gearpump (incubating).
Tank Battle - A simple game powered by JMonkey engine - Farzad Nozarian
A simple two-player Java-based game powered by the JMonkey game engine, which implements many standard object-oriented design patterns such as Singleton, Composite, and Strategy.
A tutorial presentation based on hadoop.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as the Teaching Assistant for the Cloud Computing course of Dr. Amir H. Payberah in the spring semester of 2015.
A tutorial presentation based on github.com/amplab/shark documentation.
I gave this presentation at Amirkabir University of Technology as the Teaching Assistant for the Cloud Computing course of Dr. Amir H. Payberah in the spring semester of 2015.
A tutorial presentation based on storm.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as the Teaching Assistant for the Cloud Computing course of Dr. Amir H. Payberah in the spring semester of 2015.
A tutorial presentation based on hbase.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as the Teaching Assistant for the Cloud Computing course of Dr. Amir H. Payberah in the spring semester of 2015.
A tutorial presentation based on hadoop.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as the Teaching Assistant for the Cloud Computing course of Dr. Amir H. Payberah in the spring semester of 2015.
Big Data Processing in Cloud Computing Environments - Farzad Nozarian
This is my seminar presentation, adapted from a paper with the same name (Big Data Processing in Cloud Computing Environments). It covers various issues of Big Data, from its definitions and applications to processing it in cloud computing environments. It also addresses Big Data technologies and focuses on MapReduce and Hadoop.
In addition to running databases in Amazon EC2, AWS customers can choose among a variety of managed database services. These services save effort, save time, and unlock new capabilities and economies. In this session, we make it easy to understand how they differ, what they have in common, and how to choose one or more. We explain the fundamentals of Amazon DynamoDB, a fully managed NoSQL database service; Amazon RDS, a relational database service in the cloud; Amazon ElastiCache, a fast, in-memory caching service in the cloud; and Amazon Redshift, a fully managed, petabyte-scale data-warehouse solution that can be surprisingly economical. We’ll cover how each service might help support your application, how much each service costs, and how to get started.
Speaker:
Shaun Pearce, AWS Solutions Architect
A tutorial presentation based on spark.apache.org documentation.
I gave this presentation at Amirkabir University of Technology as the Teaching Assistant for the Cloud Computing course of Dr. Amir H. Payberah in the spring semester of 2015.
To deliver a compelling and memorable presentation, you need to rehearse your presentation material. Learn the best ways to practice for a professional presentation by reviewing this Mini-Guide to Presentation Practice by Ethos3. If you want to be trained by presentation professionals, or if you need your presentation designed by award-winning designers, contact Ethos3 at: http://www.ethos3.com
An impactful approach to the Seven Deadly Sins you and your Brand should avoid on Social Media! From a humoristic approach to a modern-life analogy for Social Media and including everything in between, this deck is a compelling resource that will provide you with more than a few take-aways for your Brand!
How People Really Hold and Touch (their Phones) - Steven Hoober
For the newest version of this presentation, always go to: 4ourth.com/tppt
For the latest video version, see: 4ourth.com/tvid
Presented at ConveyUX in Seattle, 7 Feb 2014
We are finally starting to think about how touchscreen devices really work, to design properly sized targets, to treat touch as different from mouse selection, and to create common gesture libraries.
But despite this we still forget the user. Fingers and thumbs take up space, and cover the screen. Corners of screens have different accuracy than the center. It's time to re-evaluate what we think we know.
Steven reviews his ongoing research into how people actually interact with mobile devices, presents some new ideas on how we can design to avoid errors and take advantage of this new knowledge, and leaves you with 10 (relatively) simple steps to improve your touchscreen designs tomorrow.
You are dumb at the internet. You don't know what will go viral. We don't either. But we are slighter less dumber. So here's a bunch of stuff we learned that will help you be less dumb too.
This morning I presented the “Managing VMware vSphere 4 with The Virtualization EcoShell” session for an audience of 200+ people at the Dutch VMUG event in Nieuwegein. The total number of attendees is over 600!!! Here’s a copy of my slide deck.
Flink Forward San Francisco 2018: David Reniz & Dahyr Vergara - "Real-time m... - Flink Forward
“Customer experience is the next big battle ground for telcos,” Amit Akhelikar, Global Director of Lynx Analytics, recently proclaimed at TM Forum Live! Asia in Singapore. But how do you fight this battle? A common approach has been to keep some well-known network quality indicators “under control”, such as dropped calls, radio access congestion, and availability; but this has proven insufficient to keep customers happy, just as a siege weapon is not enough to conquer a city. What if it were possible to know how customers perceive services, at least the most demanded ones, like web browsing or video streaming? That would be like having a squad of archers ready for battle. And even with that, how do you extract value from it and take action in no time, giving our skilled archers the right targets? Meet CANVAS (Customer And Network Visualization and AnalyticS), one of the first LATAM implementations of a Flink-based stream processing use case for a telco. It successfully combines leading and innovative technologies like Apache Hadoop, YARN, Kafka, Nifi, Druid and advanced visualizations with Flink core features like non-trivial stateful stream processing (joins, windows and aggregations on event time) and CEP capabilities for alarm generation, delivering a next-generation tool for SOC (Service Operation Center) teams.
How can the concepts of event-driven architectures be linked with the concepts of service-oriented architectures, and what is the added value of such a combination?
What do events mean in the context of Business Process Management (BPM) and Business Activity Monitoring (BAM), and how can such architectures/solutions be enhanced with the concepts of Complex Event Processing?
Building Event Driven Architectures with Kafka and Cloud Events (Dan Rosanova... - confluent
Apache Kafka is changing the way we build scalable and highly available software systems. By providing a simplified path to eventual consistency and event sourcing, Kafka gives us the platform to make these patterns a reality for a much broader segment of applications and customers than was possible in the past. Cloud Events is an interoperable specification for eventing that is part of the CNCF. This session will combine open source and open standards to show you how you can build highly reliable applications that scale linearly, provide interoperability, and are easily extensible, leveraging both push and pull semantics. Concrete real-world examples will show how Kafka makes event sourcing more approachable and how streams and events complement each other, including the difference between business events and technical events.
Data Analysis with Apache Flink (Hadoop Summit, 2015) - Aljoscha Krettek
Apache Flink is an open source project that offers both batch and stream processing on top of a common runtime and exposes a common API.
This talk shows how you can easily analyse data with Apache Flink. We present a new relational API and also the new Apache Flink Machine Learning library.
Spark Based Distributed Deep Learning Framework For Big Data Applications - Humoyun Ahmedov
Deep Learning architectures, such as deep neural networks, are currently among the hottest emerging areas of data science, especially in Big Data. Deep Learning could be effectively exploited to address some major issues of Big Data, such as fast information retrieval, data classification, and semantic indexing. In this work, we designed and implemented a framework to train deep neural networks using Spark, a fast and general data flow engine for large-scale data processing, which can utilize cluster computing to train large-scale deep networks. Training Deep Learning models requires extensive data and computation. Our proposed framework can reduce training time by distributing model replicas, trained via stochastic gradient descent, among cluster nodes for data residing on HDFS.
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply - confluent
Presenters: Rachel Pedreschi, Senior Director, Solutions Engineering, Imply.io + Josh Treichel, Partner Solutions Architect, Confluent
Analytic pipelines running purely on batch processing systems can suffer from hours of data lag, resulting in accuracy issues with analysis and overall decision-making. Join us for a demo to learn how easy it is to integrate your Apache Kafka® streams in Apache Druid (incubating) to provide real-time insights into the data.
In this online talk, you’ll hear about ingesting your Kafka streams into Imply’s scalable analytic engine and gaining real-time insights via a modern user interface.
Register now to learn about:
-The benefits of combining a real-time streaming platform with a comprehensive analytics stack
-Building an analytics pipeline by integrating Confluent Platform and Imply
-How KSQL, streaming SQL for Kafka, can easily transform and filter streams of data in real time
-Querying and visualizing streaming data in Imply
-Practical ways to implement Confluent Platform and Imply to address common use cases such as analyzing network flows, collecting and monitoring IoT data and visualizing clickstream data
Confluent Platform, developed by the creators of Kafka, enables the ingest and processing of massive amounts of real-time event data. Imply, the complete analytics stack built on Druid, can ingest, store, query and visualize streaming data from Confluent Platform, enabling end-to-end real-time analytics. Together, Confluent and Imply can provide low latency data delivery, data transform, and data querying capabilities to power a range of use cases.
▪ Developed a recursive-descent parser to generate an intermediate representation for subsequent optimizations in Java
▪ Implemented common subexpression elimination and copy propagation on control flow graph
▪ Deployed a code generator for the source language that yields optimized native programs
Cloud Native London 2019 Faas composition using Kafka and cloud-events - Neil Avery
Serverless functions or FaaS are all the rage.
By leveraging well established event-driven microservice design principles and applying them to serverless functions you can build a homogenous ecosystem to run FaaS applications. Kafka’s natural ability to store and replay events means serverless functions can not only be replayed, but they can also be used to choreograph call chains or driven using orchestration. Kafka also means you can democratize and organize FaaS environments in a way that scales across the enterprise. Underpinning this mantra is the use of Cloud Events by the CNCF serverless working group (of which Confluent is an active member).
Simplified Data Processing On Large ClusterHarsh Kevadia
A computer cluster consists of a set of loosely connected or tightly connected computers that work together so that in many respects they can be viewed as a single system. They are connected through fast local area network and are deployed to improve performance over that of single computer. We know that on the web large amount of data are being stored, processed and retrieved in a few milliseconds. Doing so with help of single computer machine is very difficult task. And so we require cluster of machines which can perform this task.
Although using cluster for processing data is not enough, we need to develop a technique that can perform this task easily and efficiently. MapReduce programming model is used for this type of processing. In this model Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key.
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
2. /* Who we are! */
Farzad Nozarian
fnozarian@aut.ac.ir
Big Data Processing And Mining
Mazaher Bazari
mbazari@aut.ac.ir
Mobile Cloud Computing
3. What is S4
Simple Scalable Streaming System
Inspired by the MapReduce model!
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform for processing continuous unbounded streams of data
6. “cost-per-click” billing model
Render the most relevant ads in an optimal position on the page
Processing thousands of queries per sec.
Include user preferences from context:
Recent user activity
Geographic location
Prior queries
Prior clicks
7. Reinvent the Wheel!
Extending the open source Hadoop platform to support computation of unbounded streams
But, Hadoop isn’t suitable!
The Hadoop platform was highly optimized for batch processing
MapReduce systems typically operate on static data by scheduling batch jobs.
8. Real world systems!
[Figure: a data stream divided into fixed-size segments (Partition #1, #2, #3, …, #N)]
Latency is proportional to:
Length of the segment
Overhead of segmenting the stream and initiating the processing jobs
9. Design goals
Simple API
Scale using commodity hardware
Minimize latency by using local memory in each processing node
11. S4 Model
Avoiding the use of shared memory across the cluster
Distributed operation on commodity hardware
Actors model
G. Agha, Actors: A Model of Concurrent Computation in Distributed Systems.
12. S4 model (cont.)
Computation is performed by Processing Elements (PEs)
Messages are transmitted between them in the form of data events
The state of each PE is inaccessible to other PEs
Event emission and consumption is the only mode of interaction between PEs
The framework provides the capability to route events to appropriate PEs and to create new instances of PEs
13. Design Assumptions
Lossy failover is acceptable!
Nodes will not be added to or removed from a running cluster!
14. Design: Example
What is the task?
The task is to continuously produce a sorted list of the top K most frequent words across all documents with minimal latency
15. EV Quote: KEY Null, VALUE Quote=“I …”
A keyless event (EV) arrives at PE1 with quote: “I meant what I said and I said what I meant.”, Dr. Seuss
QuoteSplitterPE (PE1) counts unique words in Quote and emits events for each word.
EV WordEvent: KEY word="said", VALUE count=2
EV WordEvent: KEY word="i", VALUE count=4
WordCountPE (PE2-4) keeps total counts for each word across all quotes. Emits an event any time a count is updated.
EV UpdatedCountEv: KEY sortID=2, VALUE word="said" count=9
EV UpdatedCountEv: KEY sortID=9, VALUE word="i" count=35
EV PartialTopKEv: KEY topk=1234, VALUE words={w:cnt}
MergePE (PE8) combines partial TopK lists and outputs final TopK list.
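The word-count flow above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the S4 API: the class names mirror the slide, but the `process` methods, the `run` helper and the tuple layout of events are assumptions made for the example, and the partial sort and merge stages are collapsed into one final sort.

```python
from collections import Counter
import re


class QuoteSplitterPE:
    """Keyless PE: counts the words in one quote and emits one WordEvent per unique word."""

    def process(self, quote):
        words = re.findall(r"[a-z']+", quote.lower())
        # Each emitted event is a tuple (event_type, key, value).
        return [("WordEvent", word, count) for word, count in Counter(words).items()]


class WordCountPE:
    """Keyed PE: one instance per word, holding that word's running total."""

    def __init__(self, word):
        self.word = word
        self.total = 0  # private state, never read directly by other PEs

    def process(self, count):
        self.total += count
        return ("UpdatedCountEv", self.word, self.total)


def run(quotes, k):
    """Feed quotes through the PEs and return the top-k (word, total) pairs."""
    splitter = QuoteSplitterPE()
    counters = {}  # word -> WordCountPE, instantiated on first use
    latest = {}    # most recent total reported for each word
    for quote in quotes:
        for _, word, count in splitter.process(quote):
            if word not in counters:
                counters[word] = WordCountPE(word)
            _, w, total = counters[word].process(count)
            latest[w] = total
    # Stand-in for the sort and merge stages: one global sort, ties broken by word.
    return sorted(latest.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

Calling `run(["I meant what I said and I said what I meant."], 2)` yields the two most frequent words of the Dr. Seuss quote. In a real cluster the `counters` dictionary would be spread over several nodes by key.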
17. Design: Processing Elements (cont.)
Keyless PEs
Example: EV Quote, KEY Null, VALUE Quote=“I …”
No keyed attribute or value
Consume all events of the type with which they are associated
Typically used at the input layer of an S4 cluster where events are assigned a key
Standard PEs
Count
Aggregate
Join
18. Design: Processing Node
Processing Nodes (PNs) are the logical hosts to PEs.
They are responsible for:
listening to events
executing operations on the incoming events
dispatching events with the assistance of the communication layer
emitting output events
19. Design: Processing Node (cont.)
[Figure: Processing Node architecture]
Processing Element Container: PE1, PE2, …, PEn
Event Listener, Dispatcher, Emitter
Communication Layer: Routing, Load Balancing, Failover Management, Transport Protocols
ZooKeeper
25. Streaming Click-Through Rate Computation
CTR = (number of clicks) / (number of impressions)
Two types of events:
Serve event
Click event
A serve event occurs when a search result page is returned to the user
26. Streaming Click-Through Rate Computation (cont.)
A serve event contains:
serveID
query
user
ads
…
A click event contains:
Click information
serveID
Use a set of heuristic rules to eliminate suspicious serves and clicks
27. Event Flow of CTR Computation
EV RawServe: KEY Null, VALUE _Serve_Data
EV RawClick: KEY Null, VALUE _Click_Data
PE1
EV Serve: KEY Serve=123, VALUE Serve Data
EV Click: KEY Serve=123, VALUE Click Data
PE2
EV JoinedServe: KEY usr=Peter, VALUE JoinedData
EV JoinedClick: KEY usr=Peter, VALUE JoinedData
PE3
EV FilteredServe: KEY g-ad=Ipod-78, VALUE JoinedData
EV FilteredClick: KEY g-ad=Ipod-78, VALUE JoinedData
PE4
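The join-then-count part of this flow can be illustrated with a short Python sketch. It is a simplified single-process stand-in, not S4 code: the function name `ctr_by_ad`, the dictionary layout of events and the field names `type`, `serve_id` and `ad` are assumptions for the example, each serve is assumed to carry a single ad, and the heuristic filtering of suspicious events is left out.

```python
from collections import defaultdict


def ctr_by_ad(events):
    """Compute clicks / serves per ad from a stream of serve and click events.

    Each event is a dict with 'type' ('serve' or 'click') and 'serve_id';
    serve events also carry 'ad'. These field names are hypothetical.
    """
    ad_of_serve = {}            # join state, keyed on serve_id
    serves = defaultdict(int)   # ad -> number of impressions
    clicks = defaultdict(int)   # ad -> number of clicks
    for ev in events:
        if ev["type"] == "serve":
            ad_of_serve[ev["serve_id"]] = ev["ad"]
            serves[ev["ad"]] += 1
        elif ev["type"] == "click":
            # Join the click with its serve; drop clicks that match no serve.
            ad = ad_of_serve.get(ev["serve_id"])
            if ad is not None:
                clicks[ad] += 1
    return {ad: clicks[ad] / n for ad, n in serves.items()}
```

With two serves of the same ad and one matching click, the function reports a CTR of 0.5 for that ad. A click whose `serve_id` was never seen is ignored, which is one simple way to treat an unjoinable event.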
33. Apache S4: Commands
s4 <command> <options>
Command Purpose
newApp Create a new application
zkServer Start a ZooKeeper server
newCluster Define an S4 cluster
s4r Package an application
deploy Deploy/configure an application
node Start an S4 node
status Get information about S4 infrastructure
thousands of queries per second, which may include several ads per page.
To process user feedback, we developed S4, a low latency, scalable stream processing engine.
The main requirement for research is to have a high degree of flexibility to deploy algorithms to the field very quickly.
The main requirements for a production environment are scalability and high availability
Small segments reduce latency but add overhead and make inter-segment dependencies more complex to manage.
Large segments, on the other hand, increase latency.
The optimal segment size depends on the application.
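The trade-off can be made concrete with a toy model. It assumes, purely for illustration, that an event may wait up to one full segment before its batch job starts and that every job pays a fixed start-up overhead; the function name and both parameters are hypothetical.

```python
def batch_tradeoff(segment_seconds, job_overhead_seconds):
    """Return (worst-case latency, share of that time spent on overhead).

    Toy model: an event arriving at the start of a segment waits for the
    whole segment, then for the fixed cost of launching the batch job.
    """
    worst_case_latency = segment_seconds + job_overhead_seconds
    overhead_share = job_overhead_seconds / worst_case_latency
    return worst_case_latency, overhead_share
```

Under this model a short segment keeps latency low but makes the fixed overhead a large share of the total, while a long segment amortizes the overhead at the cost of latency.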
Minimize latency by using local memory in each processing node and avoiding disk I/O bottlenecks.
Decentralized architecture greatly simplifies deployment and maintenance.
Use a pluggable architecture to keep the design as generic and customizable as possible.
Upon a server failure, processes are automatically moved to a standby server. The state of the processes, which is stored in local memory, is lost during the handoff. The state is regenerated using the input streams.
QuoteSplitterPE is a keyless PE object that processes all Quote events.
For each unique word in a document, the QuoteSplitterPE object will assign a count and emit a new event of type WordEvent, keyed on word.
If a WordCountPE object for the word already exists, that object is called and its counter is incremented; otherwise a new WordCountPE object is instantiated.
S4 routes each event to PNs based on a hash function of the values of all known keyed attributes in that event.
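Hash-based routing of this kind can be sketched as follows. The function `route`, the choice of SHA-256 and the modulo step are assumptions for illustration, not S4's actual hash function; the point is only that equal keyed attributes always map to the same node.

```python
import hashlib


def route(keyed_attributes, num_nodes):
    """Map an event's keyed attributes (a dict) to a node index in [0, num_nodes)."""
    # Sort the attributes so the result does not depend on insertion order.
    canonical = "|".join(f"{k}={v}" for k, v in sorted(keyed_attributes.items()))
    # A stable digest is used because Python's built-in hash() of strings
    # changes between processes, which would break routing across nodes.
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes
```

Every event keyed on `word="said"` lands on the same node, so the single WordCountPE instance for that word sees all of its updates.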
Communication Layer provides:
Cluster management
Automatic failover to standby nodes
Maps physical nodes to logical nodes
It uses a pluggable architecture to select network protocol
Events may be sent with or without a guarantee
It uses ZooKeeper to help coordinate between nodes