Live screencast on my tech blog (in French):
http://www.mathieu-elie.net/screencast-video-d3-js-sinatra-elasticsearch-capucine/
other tech slides at my blog: http://www.mathieu-elie.net
This document discusses various tools for data visualization, including D3.js, WebGL, the ELK stack, R, Processing, Open Refine, and 3D printing. It provides examples of visualizations created with each tool and suggests when each tool may be best to use. D3.js is described as a low-level library that provides full control but requires more work, while tools like the ELK stack allow for quickly visualizing system and business data. R is presented as useful for exploring and analyzing large datasets, and Open Refine is recommended for cleaning and preparing CSV files for export.
The document discusses using Elasticsearch, D3.js, Angular.js, and Google Refine to create a full stack data visualization of open data from Bordeaux, France. It focuses on data from the CAPC contemporary art museum, importing the data into Elasticsearch for scalable search and then using D3.js, Angular.js, and Yeoman to build the front-end visualization with JavaScript. The goal is to make the data more accessible and understandable through interactive visualization.
This document introduces Scrapy, an open source and collaborative framework for extracting data from websites. It discusses what Scrapy is used for, its advantages over alternatives like Beautiful Soup, and provides steps to install Scrapy and create a sample scraping project. The sample project scrapes review data from The Verge website, including the title, number of comments, and author for the first 5 review pages. The document concludes by explaining how to run the spider and store the extracted data in a file.
The document discusses using the Scrapy framework in Python for web scraping. It begins with an introduction to web scraping and why Python is useful for it. It then provides an overview of Scrapy, including what problems it solves and how to get started. Specific examples are shown for scraping sushi images from Flickr using Scrapy spiders, items, pipelines, and settings. The spider constructs URLs for each image from Flickr API data and yields requests to download the images via the pipeline into an images folder.
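The URL-construction step described above can be sketched in a few lines. This is a hypothetical helper, not the talk's actual code; the field names mirror a typical Flickr API photo record, and the static-image URL scheme is Flickr's documented server/id/secret pattern.

```python
# Hypothetical sketch of the URL-construction step a spider like this performs.
# Field names mirror a typical Flickr API photo record; in a real Scrapy
# project, each URL would be yielded in an item field (e.g. image_urls)
# for the ImagesPipeline to download into the images folder.

def flickr_image_url(photo, size="b"):
    """Build a static-image URL from a Flickr API photo dict."""
    return (
        f"https://live.staticflickr.com/{photo['server']}/"
        f"{photo['id']}_{photo['secret']}_{size}.jpg"
    )

photo = {"server": "65535", "id": "1234567890", "secret": "abc123"}
print(flickr_image_url(photo))
```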
Elasticsearch, Logstash, Kibana. Cool search, analytics, data mining and more... (Oleksiy Panchenko)
In the age of information and big data, the ability to quickly and easily find a needle in a haystack is extremely important. Elasticsearch is a distributed and scalable search engine that provides rich and flexible search capabilities. Social networks (Facebook, LinkedIn), media services (Netflix, SoundCloud), Q&A sites (StackOverflow, Quora, StackExchange) and even GitHub all find data for you using Elasticsearch. In conjunction with Logstash and Kibana, Elasticsearch becomes a powerful log engine that lets you process, store, analyze, search through and visualize your logs.
Video: https://www.youtube.com/watch?v=GL7xC5kpb-c
Scripts for the Demo: https://github.com/opanchenko/morning-at-lohika-ELK
We went over what Big Data is and its value. This talk covers the details of Elasticsearch, a Big Data solution. Elasticsearch is a distributed NoSQL search engine built on top of Apache Lucene.
We'll cover:
• Elasticsearch basics
• Setting up a development environment
• Loading data
• Searching data using REST
• Searching data using NEST, the .NET interface
• Understanding Scores
Finally, I show a use-case for data mining using Elasticsearch.
You'll walk away from this armed with the knowledge to add Elasticsearch to your data analysis toolkit and your applications.
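The "searching data using REST" step above boils down to posting a JSON query body to a search endpoint. A minimal sketch, with illustrative index and field names:

```python
import json

# Minimal match-query body of the kind posted to /<index>/_search.
# Index and field names here are illustrative, not from the talk.
def match_query(field, text, size=10):
    return {"query": {"match": {field: text}}, "size": size}

body = match_query("title", "elasticsearch basics")
print(json.dumps(body))
# Sent with e.g.:
#   curl -XPOST 'localhost:9200/talks/_search' \
#        -H 'Content-Type: application/json' -d '<body>'
```

Each hit in the response carries a `_score`, which is what the "Understanding Scores" item above is about.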
ElasticSearch - index server used as a document database (Robert Lujo)
Presentation held on 5.10.2014 on http://2014.webcampzg.org/talks/.
Although ElasticSearch's (ES) primary purpose is to serve as an index/search server, its feature set overlaps with that of a common NoSQL database; more precisely, a document database.
Why could this be interesting, and how can it be used effectively?
Talk overview:
- ES - history, background, philosophy, feature-set overview, with a focus on indexing/search features
- a short walkthrough of how to get started - installation, indexing and search/retrieval
- a database should provide the following functions: store, search, retrieve -> differences between relational, document and search databases
- it is not unusual to additionally use ES as a document database (store and retrieve)
- a use case will be presented where ES serves as the single database in the system (benefits and drawbacks)
- what happens if a relational database is introduced into the previously demonstrated system (benefits and drawbacks)
ES is a nice, genuinely ready-to-use example that can change one's perspective on how some types of software systems are developed.
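The store/search/retrieve functions the talk names map directly onto ES REST endpoints. A small sketch of that mapping (the `/<index>/_doc/<id>` path is the modern scheme; older ES versions used an explicit type segment):

```python
# Map the talk's three database functions (store, search, retrieve)
# onto Elasticsearch REST method + path pairs. Index names are illustrative.

def es_endpoint(op, index, doc_id=None):
    ops = {
        "store":    ("PUT",  f"/{index}/_doc/{doc_id}"),   # index a document
        "retrieve": ("GET",  f"/{index}/_doc/{doc_id}"),   # fetch by id
        "search":   ("POST", f"/{index}/_search"),         # full-text query
    }
    return ops[op]

print(es_endpoint("store", "articles", 42))
```

Using ES as the single database means only the first two verbs are added to an otherwise search-oriented workload.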
This document discusses building a social analytics tool using MongoDB from a developer's perspective. It covers using MongoDB for its schema-less data and ability to handle fast read-write operations. Key topics include using aggregation queries to gain insights from data by chaining queries together and filtering/manipulating results at each stage. JavaScript capabilities in MongoDB allow applying business logic directly to data. Examples demonstrate removing garbage data and stopwords. Indexes, current progress, and tips/tricks learned around cloning collections and removing vs dropping are also covered, with a demo planned.
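The stopword-removal step and the chained-aggregation idea described above can be sketched as follows. The stage names are MongoDB's aggregation operators; the field names and stopword list are illustrative, not from the talk:

```python
# Stopword removal (business logic applied to the data) plus the shape of a
# chained aggregation pipeline: each stage filters or reshapes the output of
# the previous one. Field names are illustrative.

STOPWORDS = {"the", "a", "an", "and", "of", "to"}

def strip_stopwords(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]

pipeline = [
    {"$match": {"lang": "en"}},                          # drop garbage data first
    {"$group": {"_id": "$word", "count": {"$sum": 1}}},  # tally each word
    {"$sort": {"count": -1}},                            # most frequent first
]
print(strip_stopwords("the rise of the social web"))
```

In MongoDB the pipeline would be run with `collection.aggregate(pipeline)`; the stopword filter could run in application code or as server-side JavaScript, as the talk suggests.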
ELK is a stack consisting of the open source tools Elasticsearch, Logstash, and Kibana. Elasticsearch provides a distributed, multitenant-capable full-text search engine. Logstash is used to collect, process, and forward events and log messages. Kibana provides visualization capabilities on top of Elasticsearch. The document discusses how each tool in the ELK stack works and can be configured using inputs, filters, and outputs in Logstash or through the Elasticsearch REST API. It also provides examples of using ELK for log collection, processing, and visualization.
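The input/filter/output structure mentioned above is literally how a Logstash pipeline is configured. A minimal sketch (paths and hosts are illustrative):

```conf
# Minimal Logstash pipeline: tail application logs, parse each line with a
# grok pattern, and ship the structured events to Elasticsearch.
input {
  file { path => "/var/log/app/*.log" }
}
filter {
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```

Kibana then builds its visualizations on top of the resulting Elasticsearch indices.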
This document provides summaries of NoSQL databases MongoDB, ElasticSearch, and Couchbase. It discusses their key features and uses cases. MongoDB is a document-oriented database that stores data in JSON-like documents. ElasticSearch is a search engine and stores data in JSON documents for real-time search and analytics capabilities. Couchbase is a key-value store that provides high-performance access to data through caching and supports high concurrency.
Foursquare uses Luigi to manage their complex data workflows. Luigi allows them to define tasks with dependencies in Python code rather than XML, making the workflows easier to write, test, visualize, and reuse components of. It also avoids wasted time from Cron jobs waiting and helps ensure tasks are only run once through its centralized scheduler. This provides a more robust replacement for both Cron jobs and Oozie workflows at Foursquare.
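The two properties the summary credits to Luigi can be illustrated in a few lines. This is not Luigi's API, just a toy sketch of the scheduling idea: dependencies declared in code, and each task running at most once however many downstream tasks require it.

```python
# Toy illustration of Luigi-style scheduling (NOT Luigi's actual API):
# tasks declare dependencies as data, and the runner guarantees each task
# executes at most once, in dependency order.

def run(task, deps, done, log):
    for dep in deps.get(task, []):
        run(dep, deps, done, log)   # satisfy dependencies first
    if task not in done:            # run-once guarantee
        done.add(task)
        log.append(task)

deps = {"report": ["clean", "aggregate"], "aggregate": ["clean"]}
done, log = set(), []
run("report", deps, done, log)
print(log)  # "clean" runs once even though two tasks depend on it
```

Real Luigi adds persistence (completed targets on disk), a central scheduler, and failure handling on top of this basic idea.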
Data-Driven Development Era and Its Technologies (Satoshi Tagomori)
This document discusses data-driven development and the technologies used in the data analytics process. It covers topics like data collection, storage, processing, and visualization. The document advocates using managed cloud services for data and analytics to focus on data instead of managing infrastructure. Choosing technologies should be based on the type of data and problems to solve, not the other way around. Services like Google BigQuery, Amazon Redshift, and Treasure Data are recommended for their ease of use.
Lightning talk: elasticsearch at Cogenta (Yann Cluchey)
This document discusses how Elasticsearch is used by Cogenta to power their real-time retail intelligence platform. It tracks hundreds of eCommerce sites daily, organizing large amounts of data into a high-quality market view. Elasticsearch allows Cogenta to scale their processing and analytics capabilities, provide high availability, and power various use cases like logging, internal analytics, and reporting.
Interactive learning analytics dashboards with ELK (Elasticsearch, Logstash, Kibana) (Andrii Vozniuk)
My workshop at the Learning Analytics Summer Institute (LASI) 2016: http://lasi16.snola.es/#!/schedule/113
Educational data continues to grow in volume, velocity and variety. Making sense of educational data in such conditions requires the deployment and use of appropriate scalable, real-time processing tools supporting a flexible data schema. Elasticsearch is one of the popular open-source tools meeting these requirements. Initially envisioned as a search engine capable of operating at scale and in real time, Elasticsearch is used by organisations such as Wikimedia and GitHub, which deal with big data on a daily basis. In addition, Elasticsearch is increasingly used as an analytics platform thanks to its scalable architecture and expressive query language. Until recently, the exploitation of Elasticsearch for (learning) analytics purposes by practitioners was hindered by a high entrance barrier due to the complexity and specifics of the query language. This is currently changing with the ongoing development of Kibana, an open-source tool that allows users to conduct analysis and build visualisations of Elasticsearch data through a graphical user interface. Kibana does not require the user to dive into the technical details of the queries (although it is still possible) and hence makes big educational data visualisations accessible to regular users. The additional value of Kibana comes into play whenever several visualisations are combined on a single dashboard, enabling the use of multiple coordinated views for interactive explorative analysis. Elasticsearch and Kibana, together with Logstash, form an analytics stack often referred to as ELK. Logstash supports data acquisition from multiple sources (including Twitter, RSS, event logs) thanks to its rich set of available connectors. Custom connectors can be developed for case-specific sources.
In addition to the values mentioned above, ELK enables building analytics infrastructures decoupled from the learning platform, i.e., it allows the learning environment (with the analytics functionalities) and the data storage to be hosted separately without affecting the end-user experience.
The document discusses two Ruby gems, Ashikawa::Core and Ashikawa::AR, that provide an interface to the ArangoDB database. Ashikawa::Core provides a low-level driver that abstracts ArangoDB's REST interface, while Ashikawa::AR implements an Active Record pattern for integrating ArangoDB with Rails applications. The document also briefly mentions plans to develop a DataMapper interface (Ashikawa::DataMapper) to support various data sources including ArangoDB.
Description
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
Abstract
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
This workshop will provide an overview of Scrapy, starting from the fundamentals and working through each new topic with hands-on examples.
Participants will come away with a good understanding of Scrapy, the principles behind its design, and how to apply the best practices encouraged by Scrapy to any scraping task.
Goals:
Set up a python environment.
Learn basic concepts of the Scrapy framework.
- 6Wunderkinder is a Berlin-based startup with 18 employees that created the task management app Wunderlist which has over 600k registered users
- They initially used a LAMP stack for Wunderlist but added Redis for caching and storing statistics since user growth impacted performance
- Redis allows them to store daily increment counts and sets of active user IDs to track metrics like daily syncs and active users
- 6Wunderkinder also uses Redis extensively in their new productivity platform Wunderkit for caching, statistics, logging, and asynchronous job queues
Cool bonsai cool - an introduction to ElasticSearch (Clinton Gormley)
An introduction by Clinton Gormley to the search engine Elasticsearch. It discusses how Elasticsearch works by tokenizing text, creating an inverted index, and using relevance scoring. It also summarizes how to install and use Elasticsearch for indexing, retrieving, and searching documents.
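The tokenize/index/score mechanics that talk describes can be shown with a toy implementation. This is a deliberately naive sketch: real Elasticsearch delegates to Lucene's analyzers and TF-IDF/BM25 scoring, not one point per matching term.

```python
from collections import defaultdict

# Toy version of the mechanics described in the talk: tokenize documents,
# build an inverted index (term -> set of doc ids), and rank matches by a
# naive term-overlap score. Real Elasticsearch uses Lucene analyzers and
# TF-IDF/BM25, not this.

def tokenize(text):
    return text.lower().split()

def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    scores = defaultdict(int)
    for token in tokenize(query):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1          # one point per matching term
    return sorted(scores, key=scores.get, reverse=True)

docs = {1: "cool bonsai cool", 2: "bonsai trees", 3: "search engines"}
index = build_index(docs)
print(search(index, "cool bonsai"))
```

The inverted index is why lookups stay fast: a query touches only the posting lists for its terms, never every document.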
Presentation on MongoDB and Node.js. We describe how to do basic CRUD operations (insert, remove, update, find) and how to aggregate using Node.js. We also discuss a bit of Meteor, the MEAN stack, and other ODMs and projects involving JavaScript and MongoDB.
Code decoupling from Symfony (and other frameworks) - PHP Conference Brasil ... (Miguel Gallardo)
Frameworks are very helpful for solving common problems when developing an application. But what happens when we have to move to another framework? In this talk I show how my company tries to stay independent of any framework by decoupling our business logic from Symfony.
(CMP310) Data Processing Pipelines Using Containers & Spot Instances (Amazon Web Services)
It's difficult to find off-the-shelf, open-source solutions for creating lean, simple, and language-agnostic data-processing pipelines for machine learning (ML). This session shows you how to use Amazon S3, Docker, Amazon EC2, Auto Scaling, and a number of open source libraries as cornerstones to build one. We also share our experience creating elastically scalable and robust ML infrastructure leveraging the Spot instance market.
This document discusses a command line tool called hotdog for interacting with DataDog. It summarizes that hotdog allows users to search for hosts on DataDog using tag expressions and instance IDs. It works by parsing expressions, retrieving host tag mappings from the DataDog API, building an index of host-tag relations, evaluating the expression against the index, and outputting results. The presenter then discusses how Treasure Data uses DataDog for monitoring and is hiring.
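The evaluation step the summary describes (match hosts whose tag sets satisfy an expression) can be sketched with a simple AND-only grammar. The hotdog tool's actual expression syntax is not shown here; this is only an illustration of the idea:

```python
# Illustrative sketch of tag-expression evaluation (NOT hotdog's actual
# grammar): an expression is a space-separated list of required tags, and a
# host matches when its tag set contains all of them.

def matching_hosts(host_tags, expression):
    wanted = set(expression.split())
    return sorted(h for h, tags in host_tags.items() if wanted <= set(tags))

host_tags = {
    "web-1": ["role:web", "env:prod"],
    "web-2": ["role:web", "env:staging"],
    "db-1":  ["role:db", "env:prod"],
}
print(matching_hosts(host_tags, "role:web env:prod"))
```

In the real tool, `host_tags` would come from the DataDog API's host-tag mappings rather than a literal dict.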
Log analysis using Logstash, ElasticSearch and Kibana (Avinash Ramineni)
This document provides an overview of Logstash, Elasticsearch, and Kibana for log analysis. It discusses how logging is used for troubleshooting, security, and monitoring. It then introduces Logstash as an open-source log collection and parsing tool. Elasticsearch is described as a search and analytics engine that indexes log data from Logstash. Kibana provides a web interface for visualizing and searching logs stored in Elasticsearch. The document concludes with discussing demo, installation, scaling, and deployment considerations for these log analysis tools.
Elasticsearch is a distributed, RESTful search and analytics engine that allows for fast searching, filtering, and analysis of large volumes of data. It is document-based and stores structured and unstructured data in JSON documents within configurable indices. Documents can be queried using a simple query string syntax or more complex queries using the domain-specific query language. Elasticsearch also supports analytics through aggregations that can perform metrics and bucketing operations on document fields.
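The aggregations mentioned above combine bucketing (e.g. a terms aggregation) with metrics (e.g. an average) in one request body. A sketch of that shape, with illustrative field names:

```python
# Shape of a terms (bucketing) aggregation with a nested avg (metric)
# aggregation, as described above. Field names are illustrative.

def terms_agg(field, metric_field):
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "by_" + field: {
                "terms": {"field": field},
                "aggs": {
                    "avg_" + metric_field: {"avg": {"field": metric_field}}
                },
            }
        },
    }

body = terms_agg("status", "response_time")
```

Posted to `/<index>/_search`, this would return one bucket per distinct `status` value, each with an average `response_time`.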
Talk I did on log aggregation with the ELK stack at Leeds DevOps. It covers how we process over 800,000 logs per hour at LateRooms, and the cultural changes this has helped drive.
These slides present the following Twitter pipeline built with the ELK stack (Elasticsearch, Logstash, Kibana): https://github.com/melvynator/ELK_twitter They show how to integrate machine learning into your Twitter pipeline.
Lightweight app with d3.js, sinatra, elasticsearch and capucine (Yann Armand)
Presentation given at the Apéro Ruby Bordelais of 6 March 2012,
by Mathieu Elie.
twitter: @mathieuel
http://www.mathieu-elie.net
Apéro Ruby Bordeaux
==============
twitter: @rubybdx
http://rubybdx.org
The document discusses tools for creating HTML and jQuery applications, including development, admin and data tools. It covers using jQuery for easy multiplatform access with standard HTML visuals and interactions. It provides examples of basic functions like data passing with PHP and AJAX, generating views, and DOM access. Useful jQuery functions like addClass and ajax are demonstrated. Popular JavaScript libraries that could be used are also listed, such as jQuery UI, DataTables, Expanding Text Areas, Typeahead, and a color picker library.
This document discusses building a social analytics tool using MongoDB from a developer's perspective. It covers using MongoDB for its schema-less data and ability to handle fast read-write operations. Key topics include using aggregation queries to gain insights from data by chaining queries together and filtering/manipulating results at each stage. JavaScript capabilities in MongoDB allow applying business logic directly to data. Examples demonstrate removing garbage data and stopwords. Indexes, current progress, and tips/tricks learned around cloning collections and removing vs dropping are also covered, with a demo planned.
ELK is a stack consisting of the open source tools Elasticsearch, Logstash, and Kibana. Elasticsearch provides a distributed, multitenant-capable full-text search engine. Logstash is used to collect, process, and forward events and log messages. Kibana provides visualization capabilities on top of Elasticsearch. The document discusses how each tool in the ELK stack works and can be configured using inputs, filters, and outputs in Logstash or through the Elasticsearch REST API. It also provides examples of using ELK for log collection, processing, and visualization.
This document provides summaries of NoSQL databases MongoDB, ElasticSearch, and Couchbase. It discusses their key features and uses cases. MongoDB is a document-oriented database that stores data in JSON-like documents. ElasticSearch is a search engine and stores data in JSON documents for real-time search and analytics capabilities. Couchbase is a key-value store that provides high-performance access to data through caching and supports high concurrency.
Foursquare uses Luigi to manage their complex data workflows. Luigi allows them to define tasks with dependencies in Python code rather than XML, making the workflows easier to write, test, visualize, and reuse components of. It also avoids wasted time from Cron jobs waiting and helps ensure tasks are only run once through its centralized scheduler. This provides a more robust replacement for both Cron jobs and Oozie workflows at Foursquare.
Data-Driven Development Era and Its TechnologiesSATOSHI TAGOMORI
This document discusses data-driven development and the technologies used in the data analytics process. It covers topics like data collection, storage, processing, and visualization. The document advocates using managed cloud services for data and analytics to focus on data instead of managing infrastructure. Choosing technologies should be based on the type of data and problems to solve, not the other way around. Services like Google BigQuery, Amazon Redshift, and Treasure Data are recommended for their ease of use.
Lightning talk: elasticsearch at CogentaYann Cluchey
This document discusses how Elasticsearch is used by Cogenta to power their real-time retail intelligence platform. It tracks hundreds of eCommerce sites daily, organizing large amounts of data into a high-quality market view. Elasticsearch allows Cogenta to scale their processing and analytics capabilities, provide high availability, and power various use cases like logging, internal analytics, and reporting.
Interactive learning analytics dashboards with ELK (Elasticsearch Logstash Ki...Andrii Vozniuk
My workshop at the Learning Analytics Summer Institute (LASI) 2016: http://lasi16.snola.es/#!/schedule/113
Educational data continues to grow in volume, velocity and variety. Making sense of the educational data in such conditions requires deployment and usage of appropriate scalable, real-time processing tools supporting a flexible data schema. Elasticsearch is one of the popular open-source tools meeting the enlisted requirements. Initially envisioned as a search engine capable of operating at scale and in real time, Elasticsearch is used by organisations such as Wikimedia and Github, which deal with big data on daily basis. In addition, Elasticsearch is used increasingly often as analytics platform thanks to its scalable architecture and expressive query language. Until recently, the exploitation of Elasticsearch for (learning) analytical purposes by practitioners was hindered by a high entrance barrier due to the complexity of the query language and the query specificities. This is currently changing with the ongoing development of Kibana, an open-source tool that allows to conduct analysis and build visualisations of Elasticsearch data through a graphical user interface. Kibana does not require the user to dive into technical details of the queries (although it is still possible) and hence makes big educational data visualisations accessible to regular users. The additional value of Kibana comes in play whenever several visualisations are combined on a single dashboard, enabling to use multiple coordinated views for an interactive explorative analysis. Both Elasticsearch and Kibana, together with Logstash are part of an analytics stack often referred to as ELK. Logstash supports data acquisition from multiple sources (including twitter, RSS, event logs) thanks to its rich set of available connectors. Custom connectors can be developed for case-specific sources. 
In addition to the mentioned values, ELK enables building analytics infrastructures decoupled from the learning platform, i.e., it allows to host separately the learning environment (with the analytics functionalities) and the data storage without affecting the end-user experience.
The document discusses two Ruby gems, Ashikawa::Core and Ashikawa::AR, that provide an interface to the ArangoDB database. Ashikawa::Core provides a low-level driver that abstracts ArangoDB's REST interface, while Ashikawa::AR implements an Active Record pattern for integrating ArangoDB with Rails applications. The document also briefly mentions plans to develop a DataMapper interface (Ashikawa::DataMapper) to support various data sources including ArangoDB.
Description
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
Abstract
If you want to get data from the web, and there are no APIs available, then you need to use web scraping! Scrapy is the most effective and popular choice for web scraping and is used in many areas such as data science, journalism, business intelligence, web development, etc.
This workshop will provide an overview of Scrapy, starting from the fundamentals and working through each new topic with hands-on examples.
Participants will come away with a good understanding of Scrapy, the principles behind its design, and how to apply the best practices encouraged by Scrapy to any scraping task.
Goals:
Set up a python environment.
Learn basic concepts of the Scrapy framework.
- 6Wunderkinder is a Berlin-based startup with 18 employees that created the task management app Wunderlist which has over 600k registered users
- They initially used a LAMP stack for Wunderlist but added Redis for caching and storing statistics since user growth impacted performance
- Redis allows them to store daily increment counts and sets of active user IDs to track metrics like daily syncs and active users
- 6Wunderkinder also uses Redis extensively in their new productivity platform Wunderkit for caching, statistics, logging, and asynchronous job queues
Cool bonsai cool - an introduction to ElasticSearchclintongormley
An introduction to Clinton Gormley and the search engine Elasticsearch. It discusses how Elasticsearch works by tokenizing text, creating an inverted index, and using relevance scoring. It also summarizes how to install and use Elasticsearch for indexing, retrieving, and searching documents.
Presentation on MongoDB and Node.JS. We describe how to do basic CRUD operations (insert, remove, update, find) how to aggregate using node.js. We also discuss a bit of Meteor, MEAN Stack and other ODMs and projects on Javascript and MongoDB
Code decoupling from Symfony (and others frameworks) - PHP Conference Brasil ...Miguel Gallardo
Frameworks are very helpful to solve common problems when developing an application. But what happens when we have to move to another framework? In this talk I will show how my company tries to keep independent of any framework, decoupling our business logic from symfony.
(CMP310) Data Processing Pipelines Using Containers & Spot InstancesAmazon Web Services
Lightweight app with d3.js, sinatra, elasticsearch and capucine
Presented at the Apéro Ruby Bordelais on March 6, 2012
by Mathieu Elie.
twitter: @mathieuel
http://www.mathieu-elie.net
Apéro Ruby Bordeaux
==============
twitter: @rubybdx
http://rubybdx.org
4. d3.js
• data([4, 8, 15, 16, 23, 42])
• array of document elements (<p>)
• data[i] <=> elements[i] -> each element of the data array is bound to an element of the document
5. d3.js
• #repeat: each element of the data array is bound to an element of the document
• for each item of my data array:
• .enter().append("p").text(function(d) { return "I'm number " + d + "!"; });
• I build a <p> element and set its text from the data value
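Slides 4-5 can be sketched as one data join. This is a minimal illustration, not code from the deck: it assumes d3 is loaded globally in a browser (e.g. via a `<script>` tag), so the DOM part is guarded and only runs there.

```javascript
// Label for each bound datum (kept pure so it works without a DOM)
function label(d) {
  return "I'm number " + d + "!";
}

// In a browser with d3 loaded, the join pairs data[i] with elements[i];
// .enter() holds the data items that have no <p> element yet.
if (typeof d3 !== "undefined") {
  d3.select("body").selectAll("p")
    .data([4, 8, 15, 16, 23, 42])
    .enter().append("p")   // build a <p> for each unmatched datum
    .text(label);          // set its text from the data value
}
```

With an empty page, all six data values fall into the enter selection, so six `<p>` elements are appended.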
6. d3.js
• circle.exit().remove();
• when a data element is removed, I just remove the element at the same index from the DOM
• if a data value is updated? transition + svg!
• rect.transition().duration(1000).attr("x", function(d, i) { return x(i) - .5; });
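The update and exit cases above can be sketched together. Again a hedged example, not the deck's code: the d3 calls are guarded so they only run in a browser, and the position computation is factored out as a plain function.

```javascript
// New x position for bar i; `scale` stands in for a d3 scale here,
// passed as a plain function so the arithmetic is testable anywhere.
function barX(scale, i) {
  return scale(i) - .5;
}

// In the browser (d3 v2-era API, matching this 2012 deck):
if (typeof d3 !== "undefined") {
  var x = d3.scale.linear().domain([0, 10]).range([0, 500]);

  // update: animate each rect to its new position over one second
  d3.selectAll("rect").transition().duration(1000)
    .attr("x", function(d, i) { return barX(x, i); });

  // exit: elements whose datum is gone are removed from the DOM
  d3.selectAll("circle").data([1, 2, 3]).exit().remove();
}
```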
9. sinatra
• quick web application without persistence and so on (but you can add it too ;))
• data viz with static js / css / html
• proxy to an API (here we have a proxy to the Elasticsearch API)
• make calls to the Facebook / OAuth API and get a token for debugging...
• etc.
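The proxy idea on this slide can be sketched in a few lines of Sinatra. This is an illustration only: the index name (`capc`), the `/search` route, and the port are assumptions, not taken from the deck, and the app definition is skipped gracefully when the sinatra gem is absent.

```ruby
# Builds the Elasticsearch search URL (pure, works without sinatra)
def es_search_url(index, query)
  "http://localhost:9200/#{index}/_search?q=#{query}"
end

begin
  require 'sinatra/base'
  require 'net/http'

  class VizProxy < Sinatra::Base
    set :public_folder, 'public'  # static js / css / html for the viz

    # /search?q=... is forwarded to the Elasticsearch REST API
    get '/search' do
      content_type :json
      Net::HTTP.get(URI(es_search_url('capc', params[:q])))
    end
  end
  # start with: VizProxy.run!
rescue LoadError
  # sinatra not installed; only the URL helper above is defined
end
```

The browser only ever talks to Sinatra, which avoids exposing the Elasticsearch node directly.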
11. elasticsearch
• put JSON on the index
• search the index
• response times are incredibly fast
• super easy clustering (sharding + replication)
• queried via a REST / JSON API
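The two operations above map directly onto the REST / JSON API. A hedged sketch: the index and type names (`capc`, `artwork`) are made up for illustration, and a local Elasticsearch on port 9200 is assumed (the `|| true` lets the commands degrade quietly when it is offline).

```shell
ES=http://localhost:9200

# put json on the index
curl -s -XPUT "$ES/capc/artwork/1" \
     -d '{"title": "Untitled", "artist": "Unknown", "year": 1984}' || true

# search the index through the REST / JSON api
curl -s "$ES/capc/_search?q=title:untitled" || true
```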
13. the app
• d3.js: the JavaScript side contains the main part of the app, the visualization and the querying (via ajax)
• sinatra: really handy for proxying the REST API over http
• elasticsearch: accessed via http
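The wiring above can be sketched from the browser's side: d3 fetches JSON from the Sinatra proxy and feeds the hits into the visualization. The `/search` route and the field names are assumptions for illustration; only the response unwrapping follows the standard Elasticsearch shape (`hits.hits[]._source`).

```javascript
// Pulls the documents out of an Elasticsearch response body (pure)
function extractHits(response) {
  return response.hits.hits.map(function (h) { return h._source; });
}

// In the browser, d3 (v2-era callback style) queries the Sinatra proxy:
if (typeof d3 !== "undefined") {
  d3.json("/search?q=art", function (response) {
    d3.select("body").selectAll("p")
      .data(extractHits(response))
      .enter().append("p")
      .text(function (d) { return d.title; });
  });
}
```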