S4 is a distributed stream computing platform that allows programmers to easily implement applications for processing continuous unbounded streams of data in real-time. It uses an actor-based programming model and is designed to be fault-tolerant, scalable, and pluggable. S4 was originally developed at Yahoo! Labs to enable personalized search ads by modeling users' click behaviors in real-time from streams of user activity data. It aims to maximize revenue and user experience by controlling ad ranking, pricing, filtering, and placement based on personalized models of users' intent.
20111104 s4 overview
1. Apache S4: A Distributed Stream
Computing Platform
Presented at Stanford Infolab – Nov 4, 2011
http://incubator.apache.org/projects/s4 (migrating from http://s4.io)
S4 Committers: {fpj, kishoreg, leoneu, mmorel,
robbins}@apache.org
Presented by Leo Neumeyer (@leoneu)
2. About Me
Born in Buenos Aires, Argentina, studied EE.
School/Work in Canada (Signal Processing, Speech Coding).
SRI Int'l (Menlo Park) Speech Lab, DARPA benchmarks, lab
founded speech recognition spin-off Nuance Comm Inc.
Mindstech: Startup to teach spoken English in Asia using web
audio/video (before 2-way media was widely available).
Yahoo! Labs: Search advertising (optimization, auctions).
Quantbench: mission is to create a marketplace for data
scientists, data providers, and investment funds.
3. S4 Project History
Started as a research project at Yahoo! Labs in August 2008
out of the need to personalize search ads in real-time.
Open sourced in September 2009.
Moved to Apache Incubator in October 2011.
4. Motivation
Given multiple event streams, extract information using data driven models in real time, with low latency, at scale.
Applications:
Online Parameter Optimization
Personalized Search
Twitter Trends
Predict Market Prices
Spam Filtering
Automatic Trading
Network Intrusion Detection
Sensor Networks
It's Fun!
5. S4 Architecture
[Figure labels: Node, App Server, App, PE Prototype, PE Instance, Stream]
Unlimited number of nodes. Each node has one process.
There is one server process per node. The server loads/unloads apps.
Apps encapsulate units of work. They can consume and produce event streams.
An app is a graph composed of PE prototypes and streams that produce, consume, and transmit msgs.
PE instances are clones of the prototype. They are associated with a unique key and contain the state.
S4 is a general-purpose, real-time, distributed, decentralized, robust, scalable,
event driven, pluggable platform that allows programmers to easily implement
applications for processing continuous unbounded streams of data.
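The relationship between a PE prototype and its keyed instances can be sketched as follows. This is a minimal illustration in Python, not the S4 API (S4 itself is written in Java); the names `CounterPE`, `Stream`, `put`, and `key_field` are hypothetical and chosen only for this example.

```python
import copy


class CounterPE:
    """Hypothetical processing element that counts events for one key."""

    def __init__(self):
        self.count = 0  # state held by each instance

    def on_event(self, event):
        self.count += 1


class Stream:
    """Hypothetical stream that routes each event to the PE instance for its key."""

    def __init__(self, prototype, key_field):
        self.prototype = prototype
        self.key_field = key_field
        self.instances = {}  # key -> PE instance

    def put(self, event):
        key = event[self.key_field]
        if key not in self.instances:
            # Instances are clones of the prototype, created on first use of a key.
            self.instances[key] = copy.deepcopy(self.prototype)
        self.instances[key].on_event(event)


stream = Stream(CounterPE(), key_field="user")
for user in ["ann", "bob", "ann"]:
    stream.put({"user": user})

counts = {key: pe.count for key, pe in stream.instances.items()}
print(counts)
```

Each distinct key gets its own clone, so state never has to be shared between keys; that is what allows instances to be spread across nodes.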
6. Latency vs. Accuracy
Zero Errors
Latency: Unconstrained
Why? Reproducible results
Use: Debug; Train Models
Real-Time
Latency: Constrained
Why? Limited control over inbound data rate and computing complexity
Use: Process unstructured data; Tolerance to small errors; Graceful recovery from inbound data streams
7. Design
Actors programming model.
Probabilistic thinking in both algorithms and systems.
Run on commodity hardware.
All in-memory, no disk bottlenecks.
Pluggable (Protocols, applications, serialization, etc.)
Object oriented design → POJOs
Static typing, no string literals, minimize type casting.
Science friendly → constant change, ease of use.
8. Programming Model
Example: estimate click-
through rate in a web
application after applying a
filter to remove bot traffic.
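The example named on this slide could look roughly like the sketch below, written in plain Python rather than against the S4 API. The event fields (`ad`, `type`, `user_agent`), the `is_bot` rule, and the `CtrPE` class are all assumptions made for illustration; the slide does not say how bot traffic is detected.

```python
def is_bot(event):
    # Placeholder rule (assumption): a real filter would use richer signals.
    return "bot" in event["user_agent"].lower()


class CtrPE:
    """Hypothetical PE holding click and impression counts for one ad."""

    def __init__(self):
        self.impressions = 0
        self.clicks = 0

    def on_event(self, event):
        if event["type"] == "impression":
            self.impressions += 1
        elif event["type"] == "click":
            self.clicks += 1

    def ctr(self):
        # Click-through rate: clicks divided by impressions.
        return self.clicks / self.impressions if self.impressions else 0.0


events = [
    {"ad": "a1", "type": "impression", "user_agent": "Firefox"},
    {"ad": "a1", "type": "click", "user_agent": "Firefox"},
    {"ad": "a1", "type": "impression", "user_agent": "Chrome"},
    {"ad": "a1", "type": "impression", "user_agent": "ExampleBot/1.0"},
    {"ad": "a1", "type": "click", "user_agent": "ExampleBot/1.0"},
    {"ad": "a2", "type": "impression", "user_agent": "Safari"},
]

pes = {}  # ad id -> CtrPE, one instance per key
for event in events:
    if is_bot(event):
        continue  # drop bot traffic before it reaches the counters
    pes.setdefault(event["ad"], CtrPE()).on_event(event)

rates = {ad: pe.ctr() for ad, pe in pes.items()}
print(rates)
```

The filter runs before the keyed counters, so bot impressions and clicks never affect the estimate.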
10. Research Areas: Systems
Checkpointing strategies
Replication strategies
Dynamic load balancing
Adaptive load management
Query languages
11. Fault Tolerance
Problem: High Availability
Approaches: Warm/hot failover; Cold failover
S4: Warm failover; Standby nodes + Apache Zookeeper
Problem: State Loss (Crashes, system updates)
Approaches: Lossy checkpointing; Lossless checkpoint.
S4: Lossy checkpointing
Problem: Low Latency
Approaches: Decouple stream processing from checkpointing
S4: Asynchronous writes; Uncoordinated checkpointing
Approach: checkpoints are count or time based, pluggable backend to
support any data store, lazy PE restore, tuning is application dependent.
Research by M. Morel, F. Junqueira, Yahoo! Research Europe, 2011.
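A count-based, lossy checkpoint with asynchronous writes and lazy restore might be sketched as below. This is an illustration only, not the S4 implementation: `DictStore`, `CheckpointingCounter`, the `every` parameter, and the writer thread are assumptions introduced for the example, and an in-memory dict stands in for the pluggable backend.

```python
import queue
import threading


class DictStore:
    """Stand-in backend (assumption); any store with save/load could be plugged in."""

    def __init__(self):
        self.data = {}

    def save(self, key, state):
        self.data[key] = state

    def load(self, key):
        return self.data.get(key)


class CheckpointingCounter:
    """Hypothetical PE that snapshots its count every `every` events."""

    def __init__(self, key, store, writes, every=3):
        self.key = key
        self.store = store
        self.writes = writes
        self.every = every
        self.count = None  # not restored until the first event arrives

    def on_event(self):
        if self.count is None:
            # Lazy restore: read the checkpoint only when it is first needed.
            self.count = self.store.load(self.key) or 0
        self.count += 1
        if self.count % self.every == 0:
            # Queue the snapshot so event processing does not wait on the store.
            self.writes.put((self.key, self.count))


def writer(store, writes):
    # Background thread that drains snapshots into the store.
    while True:
        item = writes.get()
        if item is None:
            writes.task_done()
            break
        store.save(*item)
        writes.task_done()


store = DictStore()
writes = queue.Queue()
thread = threading.Thread(target=writer, args=(store, writes), daemon=True)
thread.start()

pe = CheckpointingCounter("ann", store, writes)
for _ in range(7):
    pe.on_event()
writes.join()  # wait for pending writes (needed only to make the demo deterministic)

# Simulated crash: a fresh instance resumes from the last checkpoint (6),
# so the seventh event is lost. That loss is what "lossy" means here.
recovered = CheckpointingCounter("ann", store, writes)
recovered.on_event()

writes.put(None)  # stop the writer thread
thread.join()
print(store.data, recovered.count)
```

Each PE checkpoints on its own schedule without coordinating with others, which keeps latency low at the cost of losing the events that arrived after the last snapshot.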
13. Research Areas: Algorithms
Self-adaptive models: adaptive language models using small
amounts of data.
Personalization: learn from user feedback (clicks, location,
behavior) to deliver relevant information in RT.
Trend detection: find personal Twitter trends relevant to you.
Intrusion detection: summarize high level state of the network
and detect unusual patterns.
Sensor networks: large amounts of audio/video and other
sources require processing, recognition, detection, and
tracking. Detect events across sensors.
14. Personalized Search Ads
Goal is to maximize:
Revenue
Click yield
User experience
By controlling:
Ranking
Pricing
Filtering
Placement
S. Schroedl, A. Kesari, and L. Neumeyer, “Personalized ad placement in web search,” in ADKDD ’10: Proceedings of the 4th Annual
International Workshop on Data Mining and Audience Intelligence for Online Advertising, 2010.
15. Personalized Search Ads
Model ad click intent using recent user activity.
More likely to click → show more North ads.
Example 1
First query is digital slr camera
Next query is canon slr
More likely than average to click another ad
Example 2
Repeated query without previous clicks
Less likely to click another ad
16. Personalized Search Ads
Modeling user session
Typical features:
Number of searches/clicks by user past 24 hrs
User COPC: Ratio of observed clicks to predicted clicks
Identical query searched before / clicked before
Time (seconds) since last search/click
Similarity measures: current vs. previous queries
Modeling technique: stochastic gradient-descent boosted
trees (GDBT)
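Some of the session features listed above can be computed from a user's recent events, as in this sketch. The event layout, the function name `session_features`, and the choice to compute COPC over the same 24-hour window are assumptions for illustration; the slide does not specify them.

```python
def session_features(events, now, predicted_clicks):
    """Compute a few session features from events with 'time' (seconds) and 'type'."""
    day_ago = now - 24 * 3600
    recent = [e for e in events if e["time"] >= day_ago]
    searches = [e for e in recent if e["type"] == "search"]
    clicks = [e for e in recent if e["type"] == "click"]
    last_search = max((e["time"] for e in searches), default=None)
    return {
        "searches_24h": len(searches),
        "clicks_24h": len(clicks),
        # COPC: ratio of observed clicks to predicted clicks.
        "copc": len(clicks) / predicted_clicks if predicted_clicks else None,
        "seconds_since_last_search": None if last_search is None else now - last_search,
    }


events = [
    {"time": 1000, "type": "search"},   # older than 24 hours, ignored
    {"time": 99000, "type": "search"},
    {"time": 99100, "type": "click"},
    {"time": 99900, "type": "search"},
]
features = session_features(events, now=100000, predicted_clicks=2.0)
print(features)
```

A COPC below 1 means the user clicked less often than the non-personalized model predicted.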
17. Personalized Search Ads
Target: P[CLICK|ad,query,user]
Approximation: P[CLICK|ad,query] * ucp[user,session]
P[CLICK|ad,query]: non-personalized long-term model, computed using Hadoop.
ucp[user,session]: User Click Propensity (UCP) for the user session, computed using S4.
18. Personalized Search Ads
Results:
We can reduce the average number of ads (ad footprint) by
7% without decreasing click yield and revenue.
- OR -
For a given ad footprint we can increase click yield by
~2%.