Organizations are undergoing a major shift to cloud computing as they develop scale-out infrastructure that can respond to the pace of business change faster than ever before. Opscode Chef® is an open-source systems integration framework built specifically for automating the cloud by making it easy to deploy and scale servers and applications throughout your infrastructure. Join us for this session, an introduction to Chef including:
An Overview of Chef
The Chef Architecture
Cookbook Components
System Integration
Live demo launching a Java stack on Amazon EC2, Rackspace, Ubuntu, and CentOS
[Presented as part of the Open Source Build a Cloud program on 2/29/2012 - http://cloudstack.org/about-cloudstack/cloudstack-events.html?categoryid=6]
Getting physical with Web Bluetooth in the browser (Hackference) - Dan Jenkins
The document discusses getting physical with web Bluetooth in the browser. It describes how the Physical Web and Web Bluetooth APIs allow browsers to detect and interact with Bluetooth Low Energy devices in the real world. The Physical Web allows browsers to detect URLs broadcast by nearby beacons, while Web Bluetooth provides JavaScript APIs to discover, connect to, and read from/write to BLE devices directly from the browser. Examples are given of reading heart rate measurements from a fitness tracker using Web Bluetooth. Requirements and browser support are also outlined.
The document discusses techniques for improving iOS application build performance and reducing executable size in Xamarin applications. It recommends measuring build times, optimizing for the iOS simulator by avoiding rebuilds and file copying, and optimizing for iOS devices by partially linking assemblies, using the LinkerSafe attribute, and leveraging SmartLink and automatic bindings optimizations. Building configurations and deployment tradeoffs are also covered.
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition) - Wesley Beary
The document discusses how to use the Fog library to interact with cloud services. Fog allows interacting with multiple cloud providers like AWS, Rackspace, etc in a portable way. It provides models, collections, and methods to manage resources like servers, storage, DNS etc. in an abstracted way across providers. The document demonstrates how to boot a server, install SSH keys, run commands via SSH, and ping a target using the Fog and Ruby APIs in just a few lines of code.
The document discusses prototyping applications in the cloud using Platform as a Service providers like Heroku and AppFog. It provides examples of deploying simple applications built with PHP, Django, and Node.js to these services. The document outlines the basic steps to set up, deploy, and manage apps in the cloud, including initializing Git repositories, configuring requirements and dependencies, and using provider-specific command line tools. Deploying to the cloud allows for rapid prototyping, experimentation, and hosting web applications without needing to manage server infrastructure.
Eddystone Beacons - Physical Web - Giving a URL to All Objects - Jeff Prestes
More mobile technologies are empowering people and machines to become more autonomous. Just as people do, machines need ways to identify themselves to other parties in a connected environment. This raises the question: why not give a URL to objects? With Eddystone, a new Google specification for beacon data, this is possible, and it works with both Android and iOS based devices.
With it you can implement what physical-web.org stands for.
The document discusses using a Raspberry Pi, Beacons, and Node.js to broadcast URLs from physical objects. It provides code examples for setting up a Raspberry Pi with Node.js, installing necessary libraries, and programming the device to broadcast URLs using the Eddystone beacon format over Bluetooth Low Energy. The code parses beacon signals to extract URLs and notifies the user of nearby physical web objects.
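The Eddystone-URL frame the talk broadcasts is easy to build by hand. Below is a minimal Python sketch of the encoding; the byte values follow the public Eddystone-URL specification, but `encode_eddystone_url` is an illustrative helper, not code from the talk:

```python
# Eddystone-URL scheme prefixes and text expansions, in spec order.
SCHEMES = ["http://www.", "https://www.", "http://", "https://"]
EXPANSIONS = [".com/", ".org/", ".edu/", ".net/", ".info/", ".biz/", ".gov/",
              ".com", ".org", ".edu", ".net", ".info", ".biz", ".gov"]

def encode_eddystone_url(url, tx_power=-21):
    """Return the Eddystone-URL service-data frame for `url` as bytes."""
    for scheme_code, scheme in enumerate(SCHEMES):
        if url.startswith(scheme):
            rest = url[len(scheme):]
            break
    else:
        raise ValueError("URL must start with a known scheme prefix")
    body = bytearray()
    while rest:
        for exp_code, exp in enumerate(EXPANSIONS):
            if rest.startswith(exp):
                body.append(exp_code)   # one byte replaces a whole substring
                rest = rest[len(exp):]
                break
        else:
            body.append(ord(rest[0]))   # ordinary character, copied as-is
            rest = rest[1:]
    if len(body) > 17:
        raise ValueError("encoded URL is longer than 17 bytes")
    # 0x10 = URL frame type, then calibrated TX power (signed byte),
    # then the scheme code and the compressed URL.
    return bytes([0x10, tx_power & 0xFF, scheme_code]) + bytes(body)
```

On the Raspberry Pi, a frame like this is what libraries such as `eddystone-beacon` hand to the BLE advertising layer.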
The Sling Tracer tool, along with a brand new Chrome Developer plugin, helps introspect the Sling request processing to understand which code paths are executed, what gets written to the repository, which queries are fired, and lots of other details about request execution, in real time.
Slides from adaptTo conference https://adapt.to/2016/en/schedule/hey-sling--what-are-you-doing--sling-tracer-to-the-rescue.html
Google Back To Front: From Gears to App Engine and Beyond - dion
I had the privilege of giving a Yahoo! Tech Talk at their HQ in Sunnyvale. I spoke on Gears, App Engine, and other technologies such as the Ajax Libraries API and Doctype.
The document provides best practices for handling performance issues in an Odoo deployment. It recommends gathering deployment information, such as hardware specs, number of machines, and integration with web services. It also suggests monitoring tools to analyze system performance and important log details like CPU time, memory limits, and request processing times. The document further discusses optimizing PostgreSQL settings, using tools like pg_activity, pg_stat_statements, and pgbadger to analyze database queries and performance. It emphasizes reproducing issues, profiling code with tools like the Odoo profiler, and fixing problems in an iterative process.
Presentation from Velocity NYC 2014 on setting up private WebPagetest instances
Video: https://www.youtube.com/playlist?list=PLWa0Ky8nXQTaFXpT_YNvLElTEpHUyaZi4
CoreOS in anger: firing up WordPress across a 3-machine CoreOS cluster - Shaun Domingo
In this talk at the Sydney CoreOS meetup, I took the audience through:
a) Installation of CoreOS using VirtualBox and Vagrant
b) Items to consider when containerising your platform
c) Deploying WordPress across a CoreOS cluster.
Cross Domain Web Mashups with jQuery and Google App Engine - Andy McKay
This document discusses cross-domain mashups using jQuery and Google App Engine. It describes common techniques for dealing with the same-origin policy, including proxies, JSONP, and building sample applications that mashup Twitter data, geotagged tweets, and maps. Examples include parsing RSS feeds from Twitter into JSONP and displaying tweets on a map based on their geotagged locations. The document concludes by noting issues with trust, failures, and limitations for enterprise use.
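The JSONP technique mentioned above works by having the server wrap its JSON payload in a caller-supplied callback name, so a plain `<script>` tag can load it across origins. A hedged sketch of the server side in Python (the function and callback names are illustrative, not from the talk):

```python
import json
import re

def make_jsonp(data, callback):
    """Wrap `data` as a JSONP response for the given callback name."""
    # Only allow plain identifier callbacks, to avoid script injection
    # through the callback query parameter.
    if not re.fullmatch(r"[A-Za-z_$][A-Za-z0-9_$]*", callback):
        raise ValueError("invalid callback name")
    # The client requests e.g. /tweets?callback=handleTweets and receives
    # executable JavaScript that calls its own function with the data.
    return "%s(%s);" % (callback, json.dumps(data))
```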
My presentation about Test Driven Development of HTTP clients using WebMock, presented at RubyManor 2.
WebMock is a library for stubbing and setting expectations on HTTP requests in Ruby.
Progressive Web Apps are one of the hottest things to come to the web platform in years, but how much of it is just hot air? When can you actually start shipping these things? Decades ago! In a hands-on presentation, I'll show how PWAs are truly meant to be progressive - building on an evolution of web technologies nearly as old as the web itself, and still let you ship one of the most performant and cutting edge web apps around.
Testing HTTP calls with WebMock and VCR - Kerry Buckley
WebMock and VCR are tools for stubbing and recording HTTP requests in tests. WebMock gives fine-grained control over stubbing, but each response must be described by hand, while VCR records real requests once and replays them from cassettes, with no server needed afterwards. The document provides information on setting up and using both tools to stub or record HTTP requests in different testing frameworks for reliable and isolated tests.
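WebMock itself is a Ruby library, but the stubbing idea is language-neutral: tests register canned responses per request, and anything unstubbed fails loudly. A minimal Python sketch of that idea (the class and method names are illustrative, not WebMock's API):

```python
class HTTPStubRegistry:
    """Register canned (status, body) responses keyed by method and URL."""

    def __init__(self):
        self._stubs = {}

    def stub_request(self, method, url, body="", status=200):
        self._stubs[(method.upper(), url)] = (status, body)

    def request(self, method, url):
        # The client under test calls this instead of hitting the network;
        # an unstubbed request fails the test immediately, which is what
        # makes the suite deterministic and isolated.
        try:
            return self._stubs[(method.upper(), url)]
        except KeyError:
            raise AssertionError("unstubbed request: %s %s" % (method, url))
```

A cassette-based tool like VCR is essentially the same registry, populated automatically from one recorded real run.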
"As an asynchronous event driven JavaScript runtime, Node is designed to build scalable network applications" così si presenta Node.js, piattaforma tecnologica che - grazie alla sua immediatezza e produttività - ha conquistato dapprima startup e piccole aziende, fino a ritagliarsi uno spazio importante in realtà come IBM, LinkedIn, Netflix e Yahoo. La stessa Microsoft ha riconosciuto le potenzialità della piattaforma, tanto da integrare Node.js in Visual Studio Code e nelle ultime release di Visual Studio, oltre a basarci alcuni dei propri servizi di Azure come "Mobile Services" e "Functions".
In questa sessione vedremo come implementare con Node.js alcuni scenari applicativi comuni nell’ambito dello sviluppo web, analizzando quando la sua adozione può portarci vantaggi nel nostro lavoro quotidiano. In conclusione, faremo una breve panoramica architetturale, descrivendo alcuni scenari di cooperazione tra .NET e Node.js nello stesso sistema.
Codice e demo: https://github.com/rucka/CommunityDays2016
Systems Bioinformatics Workshop Keynote - Deepak Singh
This document discusses how data science platforms can be built on cloud computing infrastructure like Amazon Web Services (AWS). It highlights how AWS provides scalable, on-demand computing and storage resources that allow data and compute needs to scale rapidly. Example applications and customer case studies are presented to show how various organizations are using AWS for large-scale data analysis, including genomics, computational fluid dynamics, and more. The document argues that distributed, programmable cloud infrastructure can support new types of data-driven science by providing massive, rapidly scaling resources.
Technologies covered: Node.js | ORDS | Spatial
Have you ever wanted to track the location of a physical object and trigger an action based on the proximity of that object? The problem is, how to do this without spending a small fortune?
In this session, Blaine walks through using Raspberry Pi to track a small Bluetooth LE beacon. After a short explanation of BLE beacons, he talks about configuring the Pi and adding the software.
Beacon tracking generates lots of data for analysis which needs a home, i.e., a database. Using REST APIs, Blaine safely stores the data in a cloud database and demonstrates using spatial queries to find the location of the beacon.
Attendees will come away from this session with the tools to build their own beacon scanning system that won't break the bank.
An introduction to cgroups and cgroupspy - vpetersson
This document provides an overview of cgroups and cgroup tools. It begins with an introduction to cgroups, explaining what they are, what can be done with them, terminology, and common resource classes. It then covers using cgroups for CPU, memory, block I/O, and other resources. Finally, it summarizes common cgroup tools like the filesystem interface, libcgroup library, cgroupspy Python library, and integration with systemd and Docker.
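The filesystem interface mentioned above boils down to writing plain text files under the cgroup hierarchy (on a real system, under `/sys/fs/cgroup`). A hedged Python sketch using cgroup v2 file names; the `root` path is parameterised so the example runs without real cgroup privileges, and the helper names are illustrative:

```python
import os

def make_cpu_limited_cgroup(root, name, quota_us, period_us=100_000):
    """Create a cgroup directory and cap its CPU via the cpu.max file."""
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    # cgroup v2 cpu.max format: "<quota> <period>" in microseconds,
    # e.g. "50000 100000" means at most half a CPU.
    with open(os.path.join(path, "cpu.max"), "w") as f:
        f.write("%d %d" % (quota_us, period_us))
    return path

def add_process(cgroup_path, pid):
    # Writing a PID into cgroup.procs moves that process into the cgroup.
    with open(os.path.join(cgroup_path, "cgroup.procs"), "a") as f:
        f.write(str(pid))
```

Libraries like libcgroup and cgroupspy, and managers like systemd and Docker, are ultimately conveniences over exactly these file writes.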
How we use Varnish at Opera Software, from the beginning (2009) to now.
Presentation held at the 5th Varnish Users Group meeting (VUG5) in Paris on March 22nd, 2012.
This session was presented at MacTech 2014 in Los Angeles, California. Session description follows:
Git and GitHub have changed the way we can collaborate with others on code-based projects, but it can be intimidating at first. How does this all work? We will cover the basics of Git and how to escape some of its pitfalls, and we will review some of the tools and processes available to those wanting to start or contribute to an open-source project, which isn't Git-specific. Writing code is only part of it!
The document describes the author's experience deploying and configuring Varnish caching at Opera over many years. Some key points discussed include:
- Initial deployment in 2009 caching static assets for My Opera, which grew to serve 15% of requests
- Troubleshooting issues like session mixing and unauthorized access
- Implementing caching for dynamic pages like the front page while respecting cookies and languages
- Decentralizing caching to multiple data centers for lower latency globally
- Generating and caching thumbnails on-the-fly to handle frequent design changes
- Developing a more generic "shields-up" configuration to cache unpopular content securely
- Ongoing work caching APIs and content on other
RUM isn’t just for page level metrics anymore. Thanks to modern browser updates and new techniques we can collect real user data at the object level, finding slow page components and keeping third parties honest.
In this talk we will show you how to use Resource Timing, User Timing, and other browser tricks to time the most important components in your page. We’ll also share recipes for several of the web’s most popular third parties. This will give you a head start on measuring object level performance on your own site.
The document describes Divolte Collector, a tool for collecting clickstream data from web servers and streaming it to Apache Hadoop and Kafka in a structured format. It parses web server log files and tags pages with JavaScript to collect data on user behavior. The data is mapped to Avro schemas for interoperability and enriched with information like geolocation before being sent to event transports. This allows for real-time analytics on user behavior as well as batch processing and training of machine learning models.
The document describes Divolte Collector, a tool for parsing and collecting structured event data from HTTP server logs and tags in real-time. It discusses options for accessing log/tag data, including parsing logs in Hadoop, streaming logs, and instrumenting pages with tags. It then outlines Divolte Collector's tag-based approach, how it structures and maps data, and how the tool can be configured to output events to Kafka or HDFS.
Defeating Cross-Site Scripting with Content Security Policy (updated) - Francois Marier
How a new HTTP response header can help increase the depth of your web application defenses.
Also includes a few slides on HTTP Strict Transport Security, a header which helps protect HTTPS sites from sslstrip attacks.
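The header in question is `Content-Security-Policy`, a semicolon-separated list of directives. A small Python sketch of assembling one; the directive names come from the CSP specification, while the builder itself is illustrative:

```python
def build_csp(policy):
    """Serialise a {directive: [sources]} dict into a CSP header value."""
    return "; ".join("%s %s" % (directive, " ".join(sources))
                     for directive, sources in policy.items())

# Example policy: same-origin by default, scripts also from one CDN.
header = build_csp({
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://cdn.example.com"],
    "img-src": ["*"],
})
# Sent to the browser as: Content-Security-Policy: <header>
```

The HSTS header mentioned alongside it is a similar one-liner, e.g. `Strict-Transport-Security: max-age=31536000; includeSubDomains`.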
The document provides an introduction to web application security and the Damn Vulnerable Web Application (DVWA). It discusses common web vulnerabilities like cross-site scripting (XSS), SQL injection, and information leakage. It demonstrates how to find and exploit these vulnerabilities in DVWA, including stealing cookies, extracting database information, and creating a backdoor PHP shell. The document is intended to educate users about web security risks and show how hackers can compromise applications.
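The SQL injection risk described above comes down to mixing untrusted data into query text. A minimal sketch with Python's sqlite3 module, contrasting the vulnerable and safe forms (the table and values are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

attacker_input = "nobody' OR '1'='1"

# Vulnerable: the quote in the input rewrites the query's logic,
# so the WHERE clause matches every row.
unsafe = conn.execute(
    "SELECT secret FROM users WHERE name = '%s'" % attacker_input).fetchall()

# Safe: a parameterised query sends the input as data, never as SQL.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (attacker_input,)).fetchall()
```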
HTML5 introduces many new features for web pages and applications, including semantic HTML tags, media elements, canvas drawing, geolocation, offline storage, and forms validation. The HTML5 specification from the W3C is over 900 pages and introduces these new features to enhance the capabilities of web technologies going forward.
This document discusses responsive image techniques for adaptive web design. It begins by explaining browser sniffing versus feature testing, and recommends using feature testing to determine browser width, screen resolution, and bandwidth instead of browser sniffing. It then covers techniques like using background-size to control image sizes, SVG for smaller file sizes, and font-based solutions. The document also discusses server-side techniques like .htaccess rewrite rules and client-side techniques like picture and HiSRC. It advocates for a mobile-first approach using CSS media queries and a single pixel GIF for responsive images.
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014 - Amazon Web Services
Log data contains some of the most valuable raw information you can gather and analyze about your infrastructure and applications. Amid the mess of confusing lines of seemingly random text can be hints about performance, security, flaws in code, user access patterns, and other operational data. Without the proper tools, finding insights in these logs can be like searching for a hay-colored needle in a haystack. In this session you learn what practices and patterns you can easily implement that can help you better understand your log files. You see how you can customize web logs to add more information to them, how to digest logs from around your infrastructure, and how to analyze your log files in near real time.
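Much of that digestion starts with parsing individual lines of the common/combined web log format into structured fields. A hedged Python sketch (the regex covers the standard fields; real-world logs may need adjustments):

```python
import re

# Common Log Format: ip ident user [time] "method path proto" status size
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

entry = parse_line(
    '203.0.113.9 - - [10/Oct/2014:13:55:36 -0400] '
    '"GET /index.html HTTP/1.1" 200 2326')
```

Once lines become dicts like this, aggregating by status code, path, or client IP is straightforward, whether in a script or a near-real-time pipeline.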
This document discusses optimizing Meetup's performance by reducing page load times. It recommends reducing JavaScript, image, DOM, and CSS weight. Specific techniques include externalizing and concatenating JavaScript, lazy loading images and scripts, minimizing DOM elements, writing efficient CSS selectors, and profiling code to optimize loops and DOM manipulation. Reducing page weight through these techniques can improve the user experience by speeding up load times and reducing drops in member activity.
HTML5 is a language for structuring and presenting content for the World Wide Web. It is the fifth revision of the HTML standard (created in 1990 and standardized as HTML4 as of 1997) and as of February 2012 is still under development. Its core aims have been to improve the language with support for the latest multimedia while keeping it easily readable by humans and consistently understood by computers and devices (web browsers, parsers, etc.). It improves interoperability and reduces development costs by making precise rules on how to handle all HTML elements and how to recover from errors.
Stream processing in Mercari - Devsumi 2015 autumn LT - Masahiro Nagano
This document discusses Mercari's use of stream processing to monitor logs and metrics. It describes how Mercari previously used scripts to parse logs periodically, which was inefficient. Mercari now uses Norikra, an open source stream processing tool, to ingest logs and metrics in real-time and perform analytics using SQL queries. Norikra provides benefits over their previous approach like no need to restart processes and the ability for any engineer to write SQL queries. The results are then sent to monitoring tools like Mackerel for alerting and graphing.
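The windowed SQL queries Norikra runs can be approximated in a few lines to build intuition. This is an illustrative sliding-window counter, not Norikra's engine: it counts events per key (say, an HTTP status code) over the last `window` seconds:

```python
from collections import deque

class SlidingWindowCounter:
    """Count events per key over a trailing time window of `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, key), in arrival order

    def add(self, ts, key):
        self.events.append((ts, key))

    def count(self, now, key):
        # Evict events that have fallen out of the window, then count matches.
        while self.events and self.events[0][0] < now - self.window:
            self.events.popleft()
        return sum(1 for _, k in self.events if k == key)
```

A stream engine effectively keeps many such windows alive at once and lets you express the aggregation declaratively in SQL, which is what makes it accessible to any engineer.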
This document discusses various techniques for responsive images in web design, including browser sniffing versus feature testing, image sizes for different screen resolutions and bandwidths, and different implementation methods like .htaccess files, the <picture> element, and JavaScript libraries. It covers topics like using the browser width to determine layouts, screen resolution detection, and bandwidth testing. Workarounds discussed include using background images, SVGs, icon fonts, and compressed JPEGs. The document advocates a mobile-first approach and using CSS media queries to adapt designs based on screen size.
In this, my talk for Webinale in Berlin, June 1st 2011, I give an overview of HTML5 history and main features, relating it all back to how possible it is to develop with these new features today. Thanks to Patrick Lauke for allowing me to steal a lot of his slides ;-)
1. HTML5 provides new semantic elements like <header>, <footer>, and <nav> that allow for more structured markup. It also extends existing APIs and adds new APIs for multimedia, forms, and building web applications.
2. HTML5 introduces multimedia elements <video> and <audio> that allow embedded video and audio without plugins. It also includes the <canvas> element for scriptable drawing.
3. HTML5 includes new APIs for building powerful web applications, including geolocation, offline application caching, local storage, and databases. However, browser support is still evolving so these should be used carefully with feature detection.
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMixelliando dias
This document discusses Apache Camel, an open source framework for integration patterns and enterprise integration. It provides examples of how to use Camel to implement common integration patterns like message filtering, routing, and transformation using XML configuration or Java code. It also explains how to use Camel with other technologies like ActiveMQ, Spring, and ServiceMix.
Capybara is a tool for automated user interaction testing of web applications. It allows automating browser interactions like clicking links, filling forms, and making requests. It works with several test frameworks and drivers to test against different environments. It provides a domain-specific language for describing tests in a readable way and has features for navigation, interaction, querying, finding elements, debugging, and configuration. Some examples of using it include testing a font generation application and ensuring loading states display correctly. Potential issues include slowness with some drivers and handling of dialog boxes and new windows.
Web Page Foundations
Overview
This lab walks you through creating and deploying a simple web page. The web page you create in this
lab will have no functionality yet. It just contains many of the html elements you will see on most web
pages today. We will turn this web page into a working web application next week. A text editor will be
used to create the web page. You are welcome to use an html editor or Integrated Development
Environment (IDE) to help you generate the web pages if you like. Please be sure you have read the
“Creating Web Pages” competencies prior to completing this Lab. The online textbook has many html
code examples that will help you become comfortable with the most popular html tags.
Learning Outcomes:
At the completion of the lab you should be able to:
1. Create a web page comprised of formatted text, images, lists, tables, hyperlinks and forms.
2. Review and analyze Apache Web server logs, noting http access, http methods, and http error
codes
Lab Submission Requirements:
After completing this lab, you will submit a Word (or PDF) document that meets all of the requirements in
the description at the end of this document. In addition, several html and image files along with the
Apache2 access.log file will be submitted. You can submit all files in a zip file.
Virtual Machine Account Information
Your Virtual Machine has been preconfigured with all of the software you will need for this class. The
default username and password are:
Username : umucsdev
Password: umuc$d8v
Part 1 – Create a Web page
We will use the gedit text editor to create the web page. The web page will resemble a company home
page with an introduction, some formatted text, links to other web pages, images and a form designed
to gather customer information.
1. Assuming you have already launched and logged into your SDEV32Bit Virtual Machine (VM)
from the Oracle VirtualBox, click on the gedit icon found on the left side of the screen of your
VM.
2. After clicking the gedit icon, the text editor window will appear.
3. To create a new document just begin typing or copying and pasting the html code from the
examples. We will create the web page in several steps adding a few paragraphs and sections at
time. Viewing the web page between each step will help minimize errors in the html code. To
add the first section of the html web page copy and paste the following html code into the gedit
editor:
<!DOCTYPE html>
<!-- CNShome.html -->
<!-- Jan 22, XXXX -->
<html>
<head>
<title>Computer Security Home Page </title>
</head>
<body>
<h1>Welcome to Computer Security Consultants! </h1>
<p>
</body>
</html>
Save the file in the /var/www/html/week2 folder in a file named CNShome.html. Note, you may need to
create a folder named week2. Recall that /var/www/html is the location of the Apache2 web server's html
files. Creating ...
The document provides information on analyzing web application attacks from server logs. It begins with statistics on common targets and attacks. It then explains how to read information from server access logs, including the client IP, request details, and user agent. Tools for log analysis like Splunk and ELK are listed. The document concludes with recommendations for defending websites, such as securing coding practices, using a web application firewall, and conducting penetration testing.
This document describes an integration framework and its components. It includes:
- FUSE ESB as the integration bus based on JBI and OSGi standards.
- ActiveMQ as the message broker based on JMS.
- CXF for creating or consuming web services.
- Camel as the mediation router for creating integration patterns with a simple Java or XML DSL.
- Details on configuring ActiveMQ and Camel within an OSGi container.
- Code examples of using Camel routes and processors to integrate and transform messages between endpoints.
Similar to Prototyping online ML with Divolte Collector (20)
This document describes how latent Dirichlet allocation (LDA) was used to model endorsement data from Booking.com's Destination Finder project in order to optimize user engagement. LDA was applied to model endorsements from over 10 million real user endorsements as mixtures of latent topics. Some key topics discovered included shopping, museums, and culture/temples. Mapping destinations and users to the topic mixtures allowed for personalized recommendations. While LDA worked well, there were challenges with sparse, ambiguous, and competing endorsements that required further analysis and optimization of the model.
This talk provides an overview of Apache Spark, a tool for distributed computing. It describes Resilient Distributed Datasets (RDDs) as Spark's core data structure, which are immutable and distributed collections of records across a cluster. RDDs support transformations and actions, where transformations create new RDDs and actions return data or materialize an RDD. The talk highlights Spark's language bindings for Java, Scala, and Python and its ability to work interactively from a shell. It also notes Spark's compatibility with Hadoop and ability to deploy on YARN and read from HDFS.
This document describes building a real-time search suggestions system using open source tools like Elasticsearch, Hadoop, Redis, Flume, and Node.js. Logs are collected using Flume and processed through two MapReduce jobs - the first counts search terms by time bucket, and the second calculates scores and ranks terms. Results are stored in Redis with timeouts. Improvements discussed include using request diffs to update only changed results and adding more signals like click data to improve suggestions.
This document discusses analyzing networks using Hadoop and Neo4j. It demonstrates transforming raw network data into nodes and edges that can be imported into Neo4j. It then shows how to run graph queries in Cypher to analyze the network data and find meaningful patterns. The document also provides sample nodes, edges, and Cypher queries run on the network data.
NoSQL War Stories preso: Hadoop and Neo4j for networksfvanvollenhoven
This document contains a BGP routing table dump. It includes information about the origin AS, prefix, path, and neighbor AS for a route advertised from IP address 195.66.224.97. The path includes AS1299, AS6461, AS9318, and AS38091. The origin is marked as EGP (Exterior Gateway Protocol).
This document summarizes four NoSQL databases: MongoDB, Neo4j, HBase, and Riak. It provides examples of how each database stores and queries data. The document concludes with an exercise that divides attendees into groups to work with each database using a provided Twitter data set.
Millions of internet packets are sent each day to connect devices and route traffic on the global network. The internet relies on protocols like BGP to exchange routing information between nodes. Hadoop and HDFS provide a scalable way to store and process large amounts of unstructured data across clusters of machines. Users can launch Hadoop clusters in AWS using tools like Whirr to run analytics jobs without managing hardware.
This document provides an overview of Hadoop and MapReduce. It discusses how Hadoop uses HDFS for distributed storage and replication of data blocks across commodity servers. It also explains how MapReduce allows for massively parallel processing of large datasets by splitting jobs into mappers and reducers. Mappers process data blocks in parallel and generate intermediate key-value pairs, which are then sorted and grouped by the reducers to produce the final results.
This document discusses using Hadoop and HBase to analyze internet data. It describes collecting Border Gateway Protocol (BGP) updates from multiple data collection points and storing the raw data in Hadoop Distributed File System (HDFS). MapReduce jobs are used to create derived datasets from the raw data and insert them into HBase tables. The data is queried using an online query system. Tuning aspects like memory settings, disk configuration, and garbage collection are also covered.
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
What do a Lego brick and the XZ backdoor have in common?Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to dive into a story of interoperability, standards, and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several events, migrations, and training activities related to LibreOffice. Previously she worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not pursuing her passion for computers and for Geeko, she cultivates her curiosity about astronomy (hence her nickname, deneb_alpha).
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with Open AI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
1. GoDataDriven
PROUDLY PART OF THE XEBIA GROUP
@fzk
frisovanvollenhoven@godatadriven.com
Online Machine Learning
with Divolte Collector
Friso van Vollenhoven
CTO
2. How do we use our data?
•Ad hoc
•Batch
•Streaming
9. Tagging
•Not a new idea (Google Analytics, Omniture, etc.)
•Less garbage traffic, because a browser is required to evaluate the tag
•Event logging is asynchronous
•Easier to do inflight processing (apply a schema, add enrichments, etc.)
•Allows for custom events (other than page view)
10. Also…
•Manage session through cookies on the client side
•Incoming data is already sessionised
•Extract additional information from clients
•Screen resolution
•Viewport size
•Timezone
12. Javascript based tag
<body>
<!--
Your page content here.
-->
<!--
Include Divolte Collector
just before the closing
body tag
-->
<script src="//example.com/divolte.js"
defer async>
</script>
</body>
18. Useful performance
Requests per second: 14010.80 [#/sec] (mean)
Time per request: 0.571 [ms] (mean)
Time per request: 0.071 [ms] (mean, across all concurrent requests)
Transfer rate: 4516.55 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 0 0 0.2 0 3
Waiting: 0 0 0.2 0 3
Total: 0 1 0.2 1 3
Percentage of the requests served within a certain time (ms)
50% 1
66% 1
75% 1
80% 1
90% 1
95% 1
98% 1
99% 1
100% 3 (longest request)
19. Custom events
divolte.signal('addToBasket', {
productId: 309125,
count: 1
})
In the page (Javascript)
map eventParameter('productId') onto 'basketProductId'
map eventParameter('count') onto 'basketNumProducts'
In the mapping (Groovy)
27. Approach
1. Pick n images randomly
2. Optimise displayed image using bandit optimisation
3. After X iterations:
•Pick n / 2 new images randomly
•Select n / 2 images from existing set using learned
distribution
•Construct new set of images using half of existing
set and newly selected random images
4. Goto 2
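The refresh cycle above can be sketched in a few lines of Python. This is a simplified illustration, not the deck's code (the real implementation appears later in the slides); the helper names and the catalog contents are made up, and a uniform random draw stands in for selecting survivors from the learned distribution:

```python
import random

def refresh(current, catalog, n):
    """One iteration of the set-refresh step: keep half of the
    current images (here chosen uniformly at random, as a stand-in
    for sampling from the learned distribution) and top the set up
    with fresh images not yet in play."""
    keep = random.sample(current, n // 2)
    # Candidates are catalog images not already in the current set.
    fresh = [img for img in catalog if img not in current]
    return keep + random.sample(fresh, n - n // 2)

# Hypothetical catalog and starting set.
catalog = ["img_%d" % i for i in range(20)]
current = catalog[:6]
new_set = refresh(current, catalog, 6)
```

After each call, half of the serving set consists of survivors and half of newcomers, so the bandit keeps exploring new images while exploiting known good ones.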
28. Bayesian Bandits
•For each image, keep track of:
•Number of impressions
•Number of clicks
•When serving an image:
•Draw a random number from a Beta
distribution with parameters alpha = # of clicks,
beta = # of impressions, for each image
•Show image where sample value is largest
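The serving rule on this slide is Thompson sampling, and it can be sketched with the standard library's `random.betavariate`. This is a minimal illustration, not the deck's code; the image names and counter values are made up, and the Beta parameters follow the slide (alpha = clicks, beta = impressions):

```python
import random

def pick_image(stats):
    """Thompson sampling as described on the slide: draw one Beta
    sample per image (alpha = clicks, beta = impressions) and show
    the image whose sample value is largest."""
    best_img, best_sample = None, -1.0
    for img, s in stats.items():
        sample = random.betavariate(s["clicks"], s["impressions"])
        if sample > best_sample:
            best_img, best_sample = img, sample
    return best_img

# Hypothetical counters for three images.
stats = {
    "img_a": {"clicks": 5, "impressions": 100},
    "img_b": {"clicks": 20, "impressions": 100},
    "img_c": {"clicks": 1, "impressions": 10},
}
winner = pick_image(stats)
```

Because each call draws fresh random samples, images with few impressions still win occasionally (exploration), while images with a strong click record win most of the time (exploitation).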
30. Prototype UI
class HomepageHandler(ShopHandler):
@coroutine
def get(self):
# Hard-coded ID for a pretty flower.
# Later this ID will be decided by the bandit optimization.
winner = '15442023790'
# Grab the item details from our catalog service.
top_item = yield self._get_json('catalog/item/%s' % winner)
# Render the homepage
self.render(
'index.html',
top_item=top_item)
31. Prototype UI
<div class="col-md-6">
<h4>Top pick:</h4>
<p>
<!-- Link to the product page with a source identifier for tracking -->
<a href="/product/{{ top_item['id'] }}/#/?source=top_pick">
<img class="img-responsive img-rounded" src="{{ top_item['variants']['Medium']['img_source'] }}">
<!-- Signal that we served an impression of this image -->
<script>divolte.signal('impression', { source: 'top_pick', productId: '{{ top_item['id'] }}'})</script>
</a>
</p>
<p>
Photo by {{ top_item['owner']['real_name'] or top_item['owner']['user_name']}}
</p>
</div>
32. Data collection in Divolte Collector
{
"name": "source",
"type": ["null", "string"],
"default": null
}
def locationUri = parse location() to uri
when eventType().equalTo('pageView') apply {
def fragmentUri = parse locationUri.rawFragment() to uri
map fragmentUri.query().value('source') onto 'source'
}
when eventType().equalTo('impression') apply {
map eventParameters().value('productId') onto 'productId'
map eventParameters().value('source') onto 'source'
}
34. Consuming Kafka in Python
import io
import avro.io
import avro.schema
from kafka import KafkaConsumer

def start_consumer(args):
# Load the Avro schema used for serialization.
schema = avro.schema.Parse(open(args.schema).read())
# Create a Kafka consumer and Avro reader. Note that
# it is trivially possible to create a multi process
# consumer.
consumer = KafkaConsumer(args.topic,
client_id=args.client,
group_id=args.group,
metadata_broker_list=args.brokers)
reader = avro.io.DatumReader(schema)
# Consume messages.
for message in consumer:
handle_event(message, reader)
35. Consuming Kafka in Python
def handle_event(message, reader):
# Decode Avro bytes into a Python dictionary.
message_bytes = io.BytesIO(message.value)
decoder = avro.io.BinaryDecoder(message_bytes)
event = reader.read(decoder)
# Event logic.
if 'top_pick' == event['source'] and 'pageView' == event['eventType']:
# Register a click.
redis_client.hincrby(
ITEM_HASH_KEY,
CLICK_KEY_PREFIX + ascii_bytes(event['productId']),
1)
elif 'top_pick' == event['source'] and 'impression' == event['eventType']:
# Register an impression and increment experiment count.
p = redis_client.pipeline()
p.incr(EXPERIMENT_COUNT_KEY)
p.hincrby(
ITEM_HASH_KEY,
IMPRESSION_KEY_PREFIX + ascii_bytes(event['productId']),
1)
experiment_count, ignored = p.execute()
if experiment_count == REFRESH_INTERVAL:
refresh_items()
36. def refresh_items():
# Fetch current model state. We convert everything to str.
current_item_dict = redis_client.hgetall(ITEM_HASH_KEY)
current_items = numpy.unique([k[2:] for k in current_item_dict.keys()])
# Fetch random items from ElasticSearch. Note we fetch more than we need,
# but we filter out items already present in the current set and truncate
# the list to the desired size afterwards.
random_items = [
ascii_bytes(item)
for item in random_item_set(NUM_ITEMS + NUM_ITEMS - len(current_items) // 2)
if not item in current_items][:NUM_ITEMS - len(current_items) // 2]
# Draw random samples.
samples = [
numpy.random.beta(
int(current_item_dict[CLICK_KEY_PREFIX + item]),
int(current_item_dict[IMPRESSION_KEY_PREFIX + item]))
for item in current_items]
# Select top half by sample values. current_items is conveniently
# a Numpy array here.
survivors = current_items[numpy.argsort(samples)[len(current_items) // 2:]]
# New item set is survivors plus the random ones.
new_items = numpy.concatenate([survivors, random_items])
# Update model state to reflect new item set. This operation is atomic
# in Redis.
p = redis_client.pipeline(transaction=True)
p.set(EXPERIMENT_COUNT_KEY, 1)
p.delete(ITEM_HASH_KEY)
for item in new_items:
p.hincrby(ITEM_HASH_KEY, CLICK_KEY_PREFIX + item, 1)
p.hincrby(ITEM_HASH_KEY, IMPRESSION_KEY_PREFIX + item, 1)
p.execute()
37. Serving a recommendation
class BanditHandler(web.RequestHandler):
redis_client = None
def initialize(self, redis_client):
self.redis_client = redis_client
@gen.coroutine
def get(self):
# Fetch model state.
item_dict = yield gen.Task(self.redis_client.hgetall, ITEM_HASH_KEY)
items = numpy.unique([k[2:] for k in item_dict.keys()])
# Draw random samples.
samples = [
numpy.random.beta(
int(item_dict[CLICK_KEY_PREFIX + item]),
int(item_dict[IMPRESSION_KEY_PREFIX + item]))
for item in items]
# Select item with largest sample value.
winner = items[numpy.argmax(samples)]
self.write(winner)