The document discusses using Elasticsearch, D3.js, Angular.js, and Google Refine to create a full stack data visualization of open data from Bordeaux, France. It focuses on data from the CAPC contemporary art museum, importing the data into Elasticsearch for scalable search and then using D3.js, Angular.js, and Yeoman to build the front-end visualization with JavaScript. The goal is to make the data more accessible and understandable through interactive visualization.
This document discusses various tools for data visualization, including D3.js, WebGL, the ELK stack, R, Processing, Open Refine, and 3D printing. It provides examples of visualizations created with each tool and suggests when each tool may be best to use. D3.js is described as a low-level library that provides full control but requires more work, while tools like the ELK stack allow for quickly visualizing system and business data. R is presented as useful for exploring and analyzing large datasets, and Open Refine is recommended for cleaning and preparing CSV files for export.
Data visualisation: d3.js + sinatra + elasticsearch - Mathieu Elie
Live screencast on my tech blog (in French):
http://www.mathieu-elie.net/screencast-video-d3-js-sinatra-elasticsearch-capucine/
Other tech slides on my blog: http://www.mathieu-elie.net
Roddy Lindsay discusses how Facebook generates large amounts of user data daily and the challenges of analyzing this data at scale. Facebook initially used Oracle and Hadoop to analyze data but developed its own SQL-like query language called Hive to allow business analysts to access data. Hive distributed queries across large Hadoop clusters, enabling decentralized access. This allowed text analytics like sentiment analysis and associations mapping. Lindsay believes such analytics could help individuals understand their own happiness patterns from personal data.
The document discusses NoSQL databases and CouchDB. It provides an overview of NoSQL, the different types of NoSQL databases, and when each type would be used. It then focuses on CouchDB, explaining its features like document centric modeling, replication, and fail fast architecture. Examples are given of how to interact with CouchDB using its HTTP API and tools like Resty.
ElasticSearch - index server used as a document database - Robert Lujo
Presentation held on 5.10.2014 at http://2014.webcampzg.org/talks/.
Although ElasticSearch's (ES) primary purpose is to serve as an index/search server, its feature set overlaps with that of a common NoSQL database - better said, a document database.
Why this could be interesting and how this could be used effectively?
Talk overview:
- ES - history, background, philosophy, featureset overview, focus on indexing/search features
- short presentation on how to get started - installation, indexing and search/retrieving
- A database should provide the following functions: store, search, retrieve -> differences between relational, document and search databases
- it is not unusual to use ES additionally as a document database (store and retrieve)
- a use-case will be presented where ES can be used as the single database in the system (benefits and drawbacks)
- what happens if a relational database is introduced into the previously demonstrated system (benefits and drawbacks)
ES is a nice, genuinely ready-to-use example that can change your perspective on developing certain types of software systems.
Cool bonsai cool - an introduction to ElasticSearch - Clinton Gormley
An introduction by Clinton Gormley to the search engine Elasticsearch. It discusses how Elasticsearch works by tokenizing text, creating an inverted index, and using relevance scoring. It also summarizes how to install and use Elasticsearch for indexing, retrieving, and searching documents.
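The mechanics mentioned above — tokenizing text, building an inverted index, and scoring by relevance — can be sketched in a few lines of Python. This is an illustrative toy, not Elasticsearch's actual implementation; the sample documents are made up:

```python
from collections import defaultdict
import math
import re

def tokenize(text):
    # Lowercase and split on non-word characters, like a simple analyzer
    return [t for t in re.split(r"\W+", text.lower()) if t]

docs = {
    1: "Cool bonsai cool",
    2: "An introduction to Elasticsearch",
    3: "Elasticsearch creates an inverted index",
}

# Build the inverted index: term -> set of doc ids containing that term
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in tokenize(text):
        index[term].add(doc_id)

def search(query):
    # Score each matching doc by summing an IDF weight per matched term:
    # rarer terms contribute more to relevance
    n = len(docs)
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, set())
        if postings:
            idf = math.log(n / len(postings)) + 1.0
            for doc_id in postings:
                scores[doc_id] += idf
    return sorted(scores, key=scores.get, reverse=True)
```

Real engines add term frequency, field-length normalization, and positional data on top of this basic shape.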
This document summarizes a presentation about Apache CouchDB. Some key points:
- CouchDB is a scalable, distributed key-value database that uses peer-to-peer replication. It has an append-only file structure and is designed to handle crashes well.
- Data is stored in JSON documents with dynamic schemas. Views are built using JavaScript map-reduce functions.
- The API is RESTful HTTP and works natively with the web. Data can be queried and rendered directly in the browser using JavaScript.
- CouchDB embraces web technologies and can scale from smartphones to server clusters. It is open source with an open development philosophy.
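The map-reduce views mentioned above can be sketched in Python to show the data flow. Real CouchDB views are JavaScript functions stored in design documents; the documents and field names here are invented for illustration:

```python
# Toy illustration of a CouchDB-style map/reduce view.
docs = [
    {"_id": "a", "type": "album", "artist": "Miles", "tracks": 8},
    {"_id": "b", "type": "album", "artist": "Miles", "tracks": 10},
    {"_id": "c", "type": "single", "artist": "Trane", "tracks": 1},
]

def map_fn(doc):
    # Emit (key, value) pairs, like a CouchDB map function's emit()
    if doc["type"] == "album":
        yield doc["artist"], doc["tracks"]

def reduce_fn(values):
    # A '_sum'-style reduce over the emitted values
    return sum(values)

# Build the view: run map over every document, then group-reduce by key
rows = {}
for doc in docs:
    for key, value in map_fn(doc):
        rows.setdefault(key, []).append(value)
view = {key: reduce_fn(vals) for key, vals in rows.items()}
```

CouchDB computes such views incrementally and persists them, so only changed documents are re-mapped.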
This document discusses best practices for migrating from Drupal 6 to Drupal 7. It introduces Webbie the Zombie, who outlines the main topics of auditing a site, cleaning it up before migrating, and practicing the migration. The document provides tips for auditing modules, users, content types, URLs, and files. It recommends cleaning by removing redundant, outdated, and trivial components. It emphasizes practicing the migration without doing it live, having frequent backups, and planning a content freeze for the live site. The presenter asks for comments, questions, or zombie jokes.
Big Data consists of several issues: data collecting, storage, computing, analysis and visualization. Python is a popular scripting language with good code readability and thus is suitable for fast development. In this slides, the author shares how to solve Big Data issues using Python open source tools.
Code decoupling from Symfony (and other frameworks) - PHP Conference Brasil ... - Miguel Gallardo
Frameworks are very helpful for solving common problems when developing an application. But what happens when we have to move to another framework? In this talk I will show how my company tries to stay independent of any framework, decoupling our business logic from Symfony.
The document introduces Bixo, an open source web mining toolkit built on Hadoop, Cascading, and Tika. It provides an example of how Bixo was used to analyze Apache Hadoop mailing list archives to determine the most helpful contributors. The workflow involved collecting email data, parsing it, analyzing messages to score contributors, and producing a ranked list. Bixo allows building custom workflows through a pipe model to extract structured data from unstructured web sources for business intelligence, competitive analysis, and other applications.
Spiders, Chatbots, and the Future of Metadata: A look inside the BNC BiblioSh... - BookNet Canada
This document discusses the future of metadata and explores opportunities with new technologies like chatbots, virtual reality, and automated metadata extraction. It notes that digital technologies have disrupted industries like photography and that digital experiences will be important in 2017. Various metadata fields and standards are listed. Challenges and opportunities for the BiblioShare platform are mentioned, including a lack of certain book records and the potential for more sample chapters and automated metadata.
Web History 101, or How the Future is Unwritten - BookNet Canada
In 1989 computer scientist Tim Berners-Lee wrote “Information Management: A Proposal” to persuade CERN management that a global hypertext system was in their interests. That proposal gradually grew into what we now call the World Wide Web. This originating document contains not only the bits that would later become the Web, but also features for a future we’ve yet to realize. In this talk, we’ll take a look at some of those highlights and focus them on the world of publishing, proposing solutions to problems we’re still attempting to solve and fostering ideas for further daydreaming.
ELK is a stack consisting of the open source tools Elasticsearch, Logstash, and Kibana. Elasticsearch provides a distributed, multitenant-capable full-text search engine. Logstash is used to collect, process, and forward events and log messages. Kibana provides visualization capabilities on top of Elasticsearch. The document discusses how each tool in the ELK stack works and can be configured using inputs, filters, and outputs in Logstash or through the Elasticsearch REST API. It also provides examples of using ELK for log collection, processing, and visualization.
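The inputs, filters, and outputs mentioned above are declared in a Logstash pipeline configuration. A minimal sketch, with a hypothetical log path and a local Elasticsearch instance assumed:

```
input {
  file { path => "/var/log/app/*.log" }   # hypothetical application log path
}
filter {
  # Parse Apache-style access log lines into structured fields
  grok { match => { "message" => "%{COMBINEDAPACHELOG}" } }
  # Use the parsed timestamp as the event's timestamp
  date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
}
```

Kibana then builds its visualizations on the indices Logstash writes to Elasticsearch.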
Elasticsearch is a distributed, RESTful search and analytics engine that allows for fast searching, filtering, and analysis of large volumes of data. It is document-based and stores structured and unstructured data in JSON documents within configurable indices. Documents can be queried using a simple query string syntax or more complex queries using the domain-specific query language. Elasticsearch also supports analytics through aggregations that can perform metrics and bucketing operations on document fields.
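The query DSL and aggregations described above are expressed as a JSON request body. A hedged sketch of the general shape — the index and field names ("articles", "title", "status", "author") are made up for illustration:

```python
import json

# A bool query combining a full-text match with an exact-value filter,
# plus a terms aggregation bucketing results by author.
body = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "elasticsearch"}}],
            "filter": [{"term": {"status": "published"}}],
        }
    },
    "aggs": {
        "by_author": {"terms": {"field": "author"}}
    },
}

# This body would be POSTed to an endpoint like /articles/_search
payload = json.dumps(body)
```

The same search could be written as a query string (`title:elasticsearch AND status:published`), but the DSL form composes better as queries grow.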
The document discusses the percolator feature in Elasticsearch. It begins by explaining what a percolator is and how it works at a high level. It then provides more technical details on how to index queries, perform percolation searches, and the benefits of the redesigned percolator. Key points covered include how the percolator works in distributed environments, examples of how percolator can be used, and new features like filtering, sorting, scoring, and highlighting.
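Percolation reverses the usual flow: queries are stored, and each incoming document is matched against them. A toy sketch of that data flow (Elasticsearch indexes the queries and matches far more efficiently; the query ids and predicates here are invented):

```python
# Registered "queries", keyed by id -- stand-ins for indexed percolator queries
stored_queries = {
    "alert-elasticsearch": lambda doc: "elasticsearch" in doc["body"].lower(),
    "alert-couchdb": lambda doc: "couchdb" in doc["body"].lower(),
}

def percolate(doc):
    # Return the ids of all registered queries that match this document
    return sorted(qid for qid, match in stored_queries.items() if match(doc))
```

This is the pattern behind alerting and saved-search notification features: a new document arrives, and percolation tells you which subscriptions it satisfies.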
We went over what Big Data is and its value. This talk covers the details of Elasticsearch, a Big Data solution: a NoSQL-backed search engine using an HDFS-based filesystem.
We'll cover:
• Elasticsearch basics
• Setting up a development environment
• Loading data
• Searching data using REST
• Searching data using NEST, the .NET interface
• Understanding Scores
Finally, I show a use-case for data mining using Elasticsearch.
You'll walk away from this armed with the knowledge to add Elasticsearch to your data analysis toolkit and your applications.
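On the "loading data" point above: bulk indexing uses a newline-delimited JSON body of alternating action and source lines. A sketch of building one in Python — the index name "products" and the documents are made up:

```python
import json

docs = [
    {"id": 1, "name": "widget"},
    {"id": 2, "name": "gadget"},
]

# The _bulk format: an action/metadata line, then the document source line,
# repeated for each document, newline-delimited.
lines = []
for doc in docs:
    lines.append(json.dumps({"index": {"_index": "products", "_id": doc["id"]}}))
    lines.append(json.dumps(doc))
bulk_body = "\n".join(lines) + "\n"   # bulk bodies must end with a newline
```

This body would be POSTed to the `_bulk` endpoint with `Content-Type: application/x-ndjson`; batching this way is far faster than indexing documents one request at a time.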
Global introduction to Elasticsearch presented at a BigData meetup.
Use cases, getting started, REST CRUD API, Mapping, Search API, Query DSL with queries and filters, Analyzers, Analytics with facets and aggregations, Percolator, High Availability, Clients & Integrations, ...
Amazing Speed: Elasticsearch for the .NET Developer - Adrian Carr, Codestock 2015
The document summarizes a presentation about using Elasticsearch to improve search performance for applications with large amounts of data. It describes how the presenter used Elasticsearch at a previous job to speed up searches of a growing product catalog, then demonstrates how to install and use Elasticsearch with .NET applications using the Nest client library. Issues that may arise when integrating Elasticsearch into existing applications are also discussed, such as differences from relational databases and potential rework of user interfaces.
This document summarizes web scraping and introduces the Scrapy framework. It defines web scraping as extracting information from websites when APIs are not available or data needs periodic extraction. The speaker then discusses experiments with scraping in Python using libraries like BeautifulSoup and lxml. Scrapy is introduced as a fast, high-level scraping framework that allows defining spiders to extract needed data from websites and run scraping jobs. Key benefits of Scrapy like simplicity, speed, extensibility and documentation are highlighted.
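The extraction step at the heart of scraping can be shown with nothing but the standard library; real projects would reach for Scrapy spiders or BeautifulSoup/lxml selectors. The HTML snippet below is invented for illustration:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag encountered while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

html = '<ul><li><a href="/talks">Talks</a></li><li><a href="/blog">Blog</a></li></ul>'
parser = LinkExtractor()
parser.feed(html)
```

A Scrapy spider wraps this same idea — fetch, parse, extract — in a framework that also handles scheduling, retries, throttling, and pipelines for the extracted items.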
ArangoDB is an open source multi-model NoSQL database that can be used as a document store, key-value store, and graph database. It provides a query language called AQL that is similar to SQL. Documents and data can be easily extended and manipulated using JavaScript. ArangoDB is highly performant, space efficient, and can scale horizontally. It has been in development since 2011 with the goal of providing a full-featured database while avoiding the downsides of other NoSQL solutions.
Talk given for the #phpbenelux user group, March 27th in Gent (BE), with the goal of convincing developers who are used to building php/mysql apps to broaden their horizon when adding search to their site. Be sure to also have a look at the notes for the slides; they explain some of the screenshots, etc.
An accompanying blog post about this subject can be found at http://www.jurriaanpersyn.com/archives/2013/11/18/introduction-to-elasticsearch/
More info at http://www.mathieu-elie.net/eventmachine-introduction-pres-rubybdx-screencast-fr
Ruby EventMachine is a really good option for building scalable real-time servers and more...
This document summarizes the technology stack and use of websockets at oneplaylist.fm. The key aspects are:
- The stack includes Ruby on Rails, Redis, EventMachine, HAProxy, Resque, MongoDB, CoffeeScript, and Elasticsearch.
- HAProxy is used for TCP load balancing and handles HTTP as well, distributing traffic across multiple Rails app servers, Elasticsearch instances, and the EventMachine websocket server.
- Websockets are handled via a TCP connection to the EventMachine server through a separate subdomain, keeping HTTP requests on the main app domain.
- Redis is used for centralized communication and state management via Pub/Sub, with tokens mapping users to channels and event data pushed to connected clients.
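The token-to-channel Pub/Sub pattern in the last bullet can be sketched in-memory; Redis provides the same semantics across processes and servers. The token and channel names below are invented:

```python
from collections import defaultdict

# token -> channel mapping, e.g. established at websocket handshake time
user_channels = {"token-abc": "playlist:42"}

subscribers = defaultdict(list)  # channel -> subscribed tokens
inboxes = defaultdict(list)      # token -> delivered events

def subscribe(user_token):
    # Join the channel associated with this user's token
    channel = user_channels[user_token]
    subscribers[channel].append(user_token)

def publish(channel, event):
    # Deliver the event to every subscriber of the channel
    for token in subscribers[channel]:
        inboxes[token].append(event)

subscribe("token-abc")
publish("playlist:42", {"type": "track_added", "track": "song.mp3"})
```

With Redis, `subscribe`/`publish` become `SUBSCRIBE`/`PUBLISH` commands, so the Rails app servers and the EventMachine websocket server can exchange events without talking to each other directly.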
The molecules of a gas exert constant pressure on the walls of their container due to their continuous motion. The pressure of a gas is defined as the force applied per unit area. According to the kinetic theory of gases, the pressure and temperature of a gas are directly related to the average speed and kinetic energy of its molecules.
Sourabh Vohra is a network security analyst with over 3 years of experience in networking, team management, and customer relationship management in the telecom domain. He is currently working with TNS Telecom UAE under DU Telecom. He has expertise in Cisco, Juniper, Huawei firewalls, IPS devices, DDoS solutions and more. He is seeking a networking role where he can utilize his strong technical, troubleshooting, and customer service skills.
The document summarizes the GE Healthcare Discovery CT750 HD computed tomography (CT) scanner. It highlights the scanner's capabilities including:
- Using Adaptive Statistical Iterative Reconstruction (ASiR) to reduce radiation dose by up to 50% while maintaining image quality.
- Offering the highest cardiac spatial resolution in the industry at 18.2 lp/cm for accurate coronary artery imaging.
- Enabling low dose cardiac imaging below 1 mSv using techniques like SnapShot Pulse and ASiR.
- Its proprietary Gemstone Spectral Imaging which uses dual energy to provide material decomposition and virtual non-contrast imaging, aiding in lesion characterization and reducing metal artifacts.
The document discusses Brazil's failure to legally recognize same-sex parenthood, which it argues is unconstitutional. It contends that legal parent-child bonds must be recognized between children and adolescents and their two same-sex parents, grounded in the best interests of the child and in affection rather than biology.
Indigenous forests are being degraded due to a lack of alternative livelihoods, forcing people to fell trees for low-return charcoal production. Complex regulations and a lack of coordination among government ministries have made conservation difficult. The document presents a conceptual framework for analyzing the charcoal value chain within a landscape context, including production, transport, use and the various actors and factors involved at each stage. It aims to develop a landscape approach for sustainable charcoal management through multi-stakeholder coordination, improved policies, regeneration of woodlands and alternative livelihoods.
Water pollution caused by toxic substances - shenaemhe14
Toxic substances can pollute water through various means such as discharge of industrial and commercial wastewater and waste, release of contaminants in surface runoff from urban and agricultural areas, waste disposal and groundwater leaching, and eutrophication. Other sources of water pollution include agriculture, land clearing, and littering. Some ways to help prevent water pollution are to conserve water, be careful about what is poured down drains, use environmentally friendly household products, limit overuse of pesticides and fertilizers, plant gardens to absorb runoff, and properly dispose of litter.
In this project we study the main trends in the discipline of marketing throughout its history, treated both from a theoretical-didactic perspective and from the more practical side of the discipline. We begin with a conceptual study of the term, so that the text can be followed by any reader, whether or not they are a marketing professional. The research then takes on a more anthropological character as it deepens, contextualizing each stage in the development of the discipline and thereby aiding comprehension. To ease reading, the sections have also been divided into chapters, each covering the trends of the discipline in a different period, so that readers can focus on whichever stage interests them and revisit it later; this also supports my work as a coolhunter in this project. Once we reach the present era, we go deeper into the trends currently setting the course of marketing, also analyzing the sectors that directly influence them. We do not, of course, intend this to be an exhaustive market study: the aim of the project is to understand the development of the discipline through the main trends in its history. This seemed necessary given the literal reading circulating in the media about the imminent death of "traditional" marketing, which many seem to treat as the one and only practice of marketing, going so far as to flatly assert that "marketing is dead."
Well, if you read on, you will be able to verify with documented facts that marketing is not dead. Rather, it has today reached a stratospheric paradigm, becoming a tool for social collaboration, capable even of making this world a better place.
This document provides an overview of Hadoop and big data concepts. It discusses Hadoop core components like HDFS, YARN, MapReduce and how they work. It also covers related technologies like Hive, Pig, Sqoop and Flume. The document discusses common Hadoop configurations, deployment modes, use cases and best practices. It aims to help developers get started with Hadoop and build big data solutions.
Hadoop is a distributed processing framework. It includes components like HDFS for storage, YARN for resource management, and MapReduce for distributed computations. HDFS stores large files across clusters with replication for reliability. YARN separates resource management from job scheduling and supports multiple programming models. MapReduce uses map and reduce functions to process large datasets in parallel. Tez and Hive provide higher-level abstractions over MapReduce. Zookeeper enables coordination between distributed services. Kafka is a distributed messaging system.
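The map and reduce flow described above can be sketched in miniature. This is a hedged, single-process Python sketch of the programming model only, not Hadoop's actual API: a map function emits key/value pairs, the framework shuffles pairs by key, and a reduce function folds each group.

```python
from collections import defaultdict

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in the input split.
    for word in doc.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each group of counts into a total.
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big clusters", "data pipelines"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"], counts["data"])   # 2 2
```

In real Hadoop the map and reduce tasks run on different nodes and the shuffle moves data over the network; the data flow, however, is exactly this.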
A digital signature allows one to verify the identity of the sender of a message and that the message content has not been altered. It involves a key generation algorithm that produces a private key and public key pair. The signing algorithm uses the private key to generate a signature for a message. The signature verification algorithm uses the public key to verify the signature and authenticity of the message. Digital signatures provide security as long as the private key remains confidential to the owner.
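The three algorithms named above (key generation, signing, verification) can be illustrated with textbook RSA. This is a deliberately toy, insecure sketch: the tiny primes and the `toy_digest` helper are hypothetical stand-ins chosen for readability, and real systems use vetted cryptographic libraries.

```python
def toy_digest(message, n):
    # Stand-in for a cryptographic hash, reduced mod n (NOT collision resistant).
    value = 0
    for byte in message.encode():
        value = (value * 256 + byte) % n
    return value

def keygen():
    # Toy parameters (INSECURE): real keys use primes of 1024+ bits.
    p, q, e = 61, 53, 17
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # private exponent: inverse of e
    return (n, e), (n, d)               # (public key, private key)

def sign(private_key, message):
    n, d = private_key
    return pow(toy_digest(message, n), d, n)   # signature = digest^d mod n

def verify(public_key, message, signature):
    n, e = public_key
    return pow(signature, e, n) == toy_digest(message, n)

pub, priv = keygen()
sig = sign(priv, "pay Bob 1")
print(verify(pub, "pay Bob 1", sig))   # True
print(verify(pub, "pay Bob 2", sig))   # False: message content was altered
```

Note how verification needs only the public key, while forging a signature would require the private exponent: exactly the property the summary describes.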
This document provides an overview of HBase, including its architecture and how it compares to relational databases and HDFS. Some key points:
- HBase is a non-relational, distributed, column-oriented database that runs on top of Hadoop. It uses a master-slave architecture with an HMaster and multiple HRegionServers.
- Unlike relational databases, HBase is schema-less, column-oriented, and designed for denormalized data in wide, sparsely populated tables.
- Compared to HDFS, HBase provides low-latency random reads/writes instead of batch processing. Data is accessed via APIs instead of MapReduce.
- HBase uses LSM trees for its write path: writes are buffered in an in-memory store and flushed to immutable sorted files on disk, which are periodically merged by compactions.
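The LSM (log-structured merge) write path can be sketched in a few lines. This is a hedged toy, with a hypothetical `TinyLSM` class; it shows only the core idea (memory buffer, immutable flushed segments, reads checking newest data first), not HBase's actual implementation.

```python
class TinyLSM:
    """Toy LSM sketch: writes hit an in-memory table, which is
    flushed to an immutable sorted 'segment' when it fills up."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.segments = []              # flushed immutable segments, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: persist a sorted, immutable segment (an HFile, in HBase terms).
            self.segments.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable first, then segments newest-to-oldest,
        # so fresh writes shadow older flushed values.
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            if key in segment:
                return segment[key]
        return None

db = TinyLSM()
db.put("row1", "a")
db.put("row2", "b")   # second put triggers a flush
db.put("row1", "A")   # newer value shadows the flushed one
print(db.get("row1"), db.get("row2"))   # A b
```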
Elasticsearch is a distributed, open source search and analytics engine. It allows storing and searching of documents of any schema in real-time. Documents are organized into indices which can contain multiple types of documents. Indices are partitioned into shards and replicas to allow horizontal scaling and high availability. The document consists of a JSON object which is indexed and can be queried using a RESTful API.
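The partitioning into shards works because document placement is deterministic: the document's routing value (by default its id) is hashed and taken modulo the number of primary shards. A rough sketch of the principle; Elasticsearch's real hash function differs (this md5 stand-in is an assumption for illustration), which is also why the primary shard count cannot change after index creation.

```python
import hashlib

def shard_for(doc_id, num_primary_shards=5):
    # Deterministic routing: the same id always lands on the same shard.
    # md5 here is a stand-in for Elasticsearch's internal routing hash.
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_primary_shards

placement = {doc_id: shard_for(doc_id) for doc_id in ["1", "2", "3"]}
print(placement)
```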
This document outlines an agenda for a Hadoop workshop covering Cisco's use of Hadoop. The agenda includes introductions, presentations on Hadoop concepts and Cisco's Hadoop architecture, and two hands-on exercises configuring Hadoop and using Hive and Impala for analytics. Key topics to be covered are Hadoop and big data concepts, Cisco's Webex Hadoop architecture using Cisco UCS, and how Hadoop addresses the challenges of large volumes of structured and unstructured data across global data centers.
The document describes shafts and their elements. It explains that shafts are rotating or stationary elements that transmit power, on which devices such as turbomachines, crankshafts and pumps are mounted. Shafts come in different shapes, sizes and vertical or horizontal orientations, and can transmit power ranging from a fraction of a watt to billions of watts. It also covers the different types of rigid and flexible couplings between shafts.
This document discusses the 2016 Top 100 Mid-Sized Companies Survey in Kenya. It provides an introduction to the survey and some key findings. Specifically, it notes that the average revenue growth for companies on the list was 70% and that manufacturing companies made up the largest proportion of firms in the survey. It also recognizes the companies that were ranked in the top 20 positions and highlights some of the services provided by the number 11 ranked company, Polucon Services.
Presented on 10/11/12 at the Boston Elasticsearch meetup held at the Microsoft New England Research & Development Center. This talk gave a very high-level overview of Elasticsearch to newcomers and explained why ES is a good fit for Traackr's use case.
The document describes a presentation about rapidly prototyping with Solr. It will demonstrate ingesting documents into Solr, adjusting Solr's schema, and showcasing data in a flexible search UI. The presentation will cover faceting, highlighting, spellchecking, and debugging. Time will also be spent outlining next steps to develop and take the search application to production.
Elasticsearch – much more than search! [JavaZone 2013] (foundsearch)
Search engines can solve far more problems than a search box suggests. You may have a search problem without being aware of it.
Elasticsearch, an open source search engine built on Lucene, is getting more and more attention - not only because it is excellent at solving typical search problems, but also because it can be used for analytics and "big data" challenges.
The talk gives an overview of what search engines are good at, related problems you will run into, how Elasticsearch can help, and how it fits into your technology stack.
It is not a tutorial, but at a fairly brisk pace and with examples of realistic complexity it gives an overview of what is possible.
We round off with how Elasticsearch can be classified among the multitude of "NoSQL" databases.
This document summarizes a presentation about rapid prototyping with Solr. It discusses getting documents indexed into Solr quickly, adjusting Solr's schema to better match needs, and showcasing data in a flexible search UI. It outlines how to leverage faceting, highlighting, spellchecking and debugging in rapid prototyping. Finally, it discusses next steps in developing a search application and taking it to production.
1.) A graph database called Neo4j was created in the 1990s by three guys who had a problem related to language translation. They realized that graphs could model the relationships between concepts. Neo4j became popular for modeling social networks, recommendation systems, and other applications that involve interconnected data.
2.) Neo4j started as an idea, progressed to a prototype, and is now used in production by many companies to model complex relationships in domains like social media and knowledge graphs. It allows modeling data as nodes connected by relationships, and uses the Cypher query language.
3.) Some popular applications built using Neo4j include the digital paper app 53 and customer mapping tools that model how
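The property-graph model behind Neo4j (nodes connected by typed relationships) can be sketched in plain Python. A hedged illustration only: this hypothetical mini graph and `recommend` helper mimic the spirit of a one-hop Cypher recommendation query, not Neo4j's API.

```python
# A tiny property graph: labeled nodes and typed, directed relationships.
nodes = {"alice": {"label": "Person"}, "bob": {"label": "Person"},
         "neo4j": {"label": "Product"}, "cypher": {"label": "Product"}}
rels = [("alice", "KNOWS", "bob"), ("alice", "LIKES", "neo4j"),
        ("bob", "LIKES", "neo4j"), ("bob", "LIKES", "cypher")]

def neighbors(node, rel_type):
    # Roughly: MATCH (n)-[:REL]->(m) RETURN m
    return [dst for src, rel, dst in rels if src == node and rel == rel_type]

def recommend(person):
    # "Things my friends like that I don't yet": a one-hop graph query.
    liked = set(neighbors(person, "LIKES"))
    suggestions = set()
    for friend in neighbors(person, "KNOWS"):
        suggestions |= set(neighbors(friend, "LIKES")) - liked
    return suggestions

print(recommend("alice"))   # {'cypher'}
```

Interconnected-data queries like this are exactly where a graph database shines: the traversal follows relationships directly instead of joining tables.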
Approach to find critical vulnerabilities (Ashish Kunwar)
The document discusses an approach to finding critical vulnerabilities through reconnaissance techniques like port scanning, content discovery, and searching for unprotected assets. It provides 4 examples of vulnerabilities found, including taking over an unauthenticated Elastic search, leaking Kibana credentials, exploiting SSRF to achieve remote code execution via Ghostscript, and cracking an IKE hash to access a vulnerable VPN. The presentation aims to demonstrate methods for vulnerability research and responsible disclosure of issues found.
ElasticSearch is a flexible and powerful open source, distributed real-time search and analytics engine for the cloud. It is JSON-oriented, uses a RESTful API, and has a schema-free design. Logstash is a tool for collecting, parsing, and storing logs and events in ElasticSearch for later use and analysis. It has many input, filter, and output plugins to collect data from various sources, parse it, and send it to destinations like ElasticSearch. Kibana works with ElasticSearch to visualize and explore stored logs and data.
Mastering ElasticSearch with Ruby and Tire (Luca Bonmassar)
A tutorial on what is ElasticSearch and how to use it effectively in a real project.
The talk discusses how to integrate a search experience into an existing application, showing all the steps from downloading & configuring Elasticsearch to building the UI and wiring up the search logic (in a Rails application).
The talk was presented at RubyConf 2013.
The document discusses ElasticSearch, an open source search engine and database. It describes how ElasticSearch allows data to flow from various sources into an index using Rivers. It also explains key ElasticSearch concepts like shards, replicas, and index aliases that improve scalability and performance. The document provides examples of ElasticSearch REST API calls for indexing, searching, and retrieving documents.
Sprockets is an easy solution to managing large JavaScript codebases by letting you structure it, bundle it with related assets, and consolidate it as one single file, with pre-baked command-line tooling, CGI front and Rails plugin. It's a framework-agnostic open-source solution that makes for great serving performance while helping you structure and manage your codebase better.
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze big volumes of data quickly and in near real time.
Null Bachaav - May 07 Attack Monitoring workshop (Prajal Kulkarni)
This document provides an overview and instructions for setting up the ELK stack (Elasticsearch, Logstash, Kibana) for attack monitoring. It discusses the components, architecture, and configuration of ELK. It also covers installing and configuring Filebeat for centralized logging, using Kibana dashboards for visualization, and integrating osquery for internal alerting and attack monitoring.
This document provides a summary of Elasticsearch by Tom Chen. It discusses that Elasticsearch is a powerful open source search and analytics engine that is distributed, scalable and real-time. It can be used for storing, searching and analyzing large volumes of data. The document then highlights some of Elasticsearch's key features, including its powerful search capabilities using Lucene queries, and aggregations that allow faceted searches and results. Code examples are provided to demonstrate indexing data and running searches and aggregations. Finally, the document mentions a code example on GitHub that uses Elasticsearch to build a search function for a WordPress site.
Terrastore - A document database for developers (Sergio Bossa)
Sergio Bossa is a software architect and engineer who has worked on online gambling and casino software. He is an open source enthusiast who has contributed to projects like Spring, Terracotta, and Terrastore. Terrastore is a document database for developers that is document-based, consistent, distributed, scalable, and written in Java using Terracotta. It allows for easy installation, no complex configuration, and simple basic operations like putting and getting documents from buckets. It also supports features like range queries, predicate queries, server-side updates, and easy scalability. Terrastore is best suited for data hot spots, computational data, complex or variable data, and throw-away data.
Delivered at Velocity Europe in Barcelona, this talk introduces "ops" people to the idea of user centered design, touching on several techniques long used in the design world, and talks about how those ideas might be applied to software and processes that we use every day.
Why and How Powershell will rule the Command Line - Barcamp LA 4 (Ilya Haykinson)
PowerShell is a command shell for Windows in which commands are objects that interact through pipes. It provides a fully-fledged programming language where commands manipulate objects and share a common naming convention. PowerShell holds that commands should do one thing well and interact through a consistent environment, addressing the fragile text parsing between traditional command line programs.
This document provides an overview of Elasticsearch including:
- Elasticsearch is a distributed, real-time search and analytics engine. It allows storing, searching, and analyzing big volumes of data in near real-time.
- Documents are stored in indexes which can be queried using a RESTful API or with query languages like the Query DSL.
- CRUD operations allow indexing, retrieving, updating, and deleting documents. More operations can be performed efficiently using the bulk API.
- Documents are analyzed and indexed to support full-text search queries and structured queries against specific fields. Mappings and analyzers define how text is processed for searching.
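The bulk API mentioned above takes a newline-delimited JSON body: one action line, then the document source, per operation, with a trailing newline. A hedged sketch of building such a body (the `bulk_body` helper is hypothetical; the `_type` field in the action line matches the pre-1.x API era this overview describes):

```python
import json

def bulk_body(index, doc_type, docs):
    # Build the NDJSON payload for POST /_bulk:
    # an action/metadata line followed by the source for each document.
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"   # the body must end with a newline

body = bulk_body("workshop", "site", {
    "1": {"url": "http://www.elasticsearch.org"},
    "2": {"url": "http://www.mathieu-elie.net"},
})
print(body)
```

Batching many operations into one request like this is what makes bulk indexing far more efficient than one HTTP call per document.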
Elasticsearch is recommended to create an archive to search ACM/BPM case and process data that is up to 7 years old. Elasticsearch allows storing and searching large volumes of data quickly and in near real-time. It was tested by uploading over 40,000 documents from a use case involving tweets. This allowed full-text search of case data and searching within office documents. While Elasticsearch is schema-less and easy to evolve with Oracle releases, its limitations regarding transactions and an overview of case history would need to be considered.
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
Trusted Execution Environment for Decentralized Process Mining (LucaBarbaro3)
Presentation of the paper "Trusted Execution Environment for Decentralized Process Mining" given during the CAiSE 2024 Conference in Cyprus on June 7, 2024.
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin... (Tatiana Kojar)
Skybuffer AI, built on the robust SAP Business Technology Platform (SAP BTP), is the latest and most advanced version of our AI development, reaffirming our commitment to delivering top-tier AI solutions. Skybuffer AI harnesses all the innovative capabilities of the SAP BTP in the AI domain, from Conversational AI to cutting-edge Generative AI and Retrieval-Augmented Generation (RAG). It also helps SAP customers safeguard their investments into SAP Conversational AI and ensure a seamless, one-click transition to SAP Business AI.
With Skybuffer AI, various AI models can be integrated into a single communication channel such as Microsoft Teams. This integration empowers business users with insights drawn from SAP backend systems, enterprise documents, and the expansive knowledge of Generative AI. And the best part of it is that it is all managed through our intuitive no-code Action Server interface, requiring no extensive coding knowledge and making the advanced AI accessible to more users.
HCL Notes and Domino license cost reduction in the world of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also approaches that can lead to unnecessary spending, for example when a person document is used instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics are covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
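The closed-addressing property that lets deletes free slots instantly (no tombstones, unlike open addressing) is easy to see in a minimal chained hashtable. A hedged, single-threaded Python sketch of the data-structure idea only; DLHT itself is lock-free, cache-line-aware and prefetched, none of which this toy attempts.

```python
class ChainedHashTable:
    """Minimal closed-addressing hashtable: each bucket holds a chain
    of (key, value) pairs, so a delete removes its entry outright."""
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        chain = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # overwrite existing key
                return
        chain.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # The slot is freed immediately; no tombstone is left behind.
        i = self._index(key)
        self.buckets[i] = [(k, v) for k, v in self.buckets[i] if k != key]

table = ChainedHashTable()
table.put("a", 1)
table.put("b", 2)
table.delete("a")
print(table.get("a"), table.get("b"))   # None 2
```

In open addressing, by contrast, deleting "a" would have to leave a marker so later probes for colliding keys still terminate correctly, which is one of the blocking cases the talk calls out.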
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
5th LF Energy Power Grid Model Meet-up Slides (DanBrown980551)
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
TrustArc Webinar - 2024 Global Privacy Survey (TrustArc)
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
A Comprehensive Guide to DeFi Development Services in 2024 (Intelisync)
DeFi represents a paradigm shift in the financial industry. Instead of relying on traditional, centralized institutions like banks, DeFi leverages blockchain technology to create a decentralized network of financial services. This means that financial transactions can occur directly between parties, without intermediaries, using smart contracts on platforms like Ethereum.
In 2024, we are witnessing an explosion of new DeFi projects and protocols, each pushing the boundaries of what’s possible in finance.
In summary, DeFi in 2024 is not just a trend; it’s a revolution that democratizes finance, enhances security and transparency, and fosters continuous innovation. As we proceed through this presentation, we'll explore the various components and services of DeFi in detail, shedding light on how they are transforming the financial landscape.
At Intelisync, we specialize in providing comprehensive DeFi development services tailored to meet the unique needs of our clients. From smart contract development to dApp creation and security audits, we ensure that your DeFi project is built with innovation, security, and scalability in mind. Trust Intelisync to guide you through the intricate landscape of decentralized finance and unlock the full potential of blockchain technology.
Ready to take your DeFi project to the next level? Partner with Intelisync for expert DeFi development services today!
Main news related to the CCS TSI 2023 (2023/1695) (Jakub Marek)
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Monitoring and Managing Anomaly Detection on OpenShift.pdf (Tosin Akinosho)
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
FREE A4 Cyber Security Awareness Posters - Social Engineering part 3 (Data Hops)
Free A4 downloadable and printable cyber security and social engineering safety training posters. Promote security awareness in the home or workplace. Lock them out. From training providers at datahops.com.
Best 20 SEO Techniques To Improve Website Visibility In SERP (Pixlogix Infotech)
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
2. speaker : @mathieuel
• freelance & founder @oneplaylist
• full stack skills
• see what i’ve done on http://www.mathieuelie.net
mardi 17 décembre 13
3. goal
• go through the first steps
• get over the first frustration
• give you the power needed to learn by yourself
4. install
• be sure you have a Java runtime
• apt-get install openjdk-6-jre-headless -y
• consider the Oracle JVM
5. unzip and run !
## Get the latest stable archive
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.zip
## Extract the archive
unzip elasticsearch-0.90.7.zip
cd elasticsearch-0.90.7
## run !
# This runs elasticsearch in the foreground.
./bin/elasticsearch -f
6. it's alive !
[2013-12-13 15:45:25,187][INFO ][node            ] [Bridge, George Washington] version[0.90.7], pid[37998], build[36897d0/2013-11-13T12:06:54Z]
[2013-12-13 15:45:25,189][INFO ][node            ] [Bridge, George Washington] initializing ...
[2013-12-13 15:45:25,202][INFO ][plugins         ] [Bridge, George Washington] loaded [], sites []
[2013-12-13 15:45:28,342][INFO ][node            ] [Bridge, George Washington] initialized
[2013-12-13 15:45:28,342][INFO ][node            ] [Bridge, George Washington] starting ...
[2013-12-13 15:45:28,491][INFO ][transport       ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.12:9300]}
[2013-12-13 15:45:31,545][INFO ][cluster.service ] [Bridge, George Washington] new_master [Bridge, George Washington][pKCdh1b_TP2TlurO1gm4_g][inet[/192.168.1.12:9300]], reason: zen-disco-join (elected_as_master)
[2013-12-13 15:45:31,577][INFO ][discovery       ] [Bridge, George Washington] elasticsearch/pKCdh1b_TP2TlurO1gm4_g
[2013-12-13 15:45:31,595][INFO ][http            ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.12:9200]}
[2013-12-13 15:45:31,596][INFO ][node            ] [Bridge, George Washington] started
[2013-12-13 15:45:31,629][INFO ][gateway         ] [Bridge, George Washington] recovered [0] indices into cluster_state
7. ping es on port 9200
curl http://127.0.0.1:9200
{
"ok" : true,
"status" : 200,
"name" : "Gideon, Gregory",
"version" : {
"number" : "0.90.6",
"build_hash" : "e2a24efdde0cb7cc1b2071ffbbd1fd874a6d8d6b",
"build_timestamp" : "2013-11-04T13:44:16Z",
"build_snapshot" : false,
"lucene_version" : "4.5.1"
},
"tagline" : "You Know, for Search"
}%
8. Store a Document
curl -XPUT http://localhost:9200/workshop/site/1 -d '
{
"url": "http://www.elasticsearch.org",
"title": "Open Source Distributed Real Time Search & Analytics",
"description": "Elasticsearch is a powerful open source search and
analytics engine that makes data easy to explore.",
"tags": ["Open Source", "elasticsearch", "Distributed"]
}'
{"ok":true,"_index":"workshop","_type":"site","_id":"1","_version":1}%
9. retrieve the document
curl -XGET http://localhost:9200/workshop/site/1
{"_index":"workshop","_type":"site","_id":"1","_version":2,"exists":true,
"_source" :
{
"url": "http://www.elasticsearch.org",
"title": "Open Source Distributed Real Time Search & Analytics",
"description": "Elasticsearch is a powerful open source search and
analytics engine that makes data easy to explore.",
"tags": ["Open Source", "elasticsearch", "Distributed"]
}}%
10. add more documents
curl -XPUT http://localhost:9200/workshop/site/2 -d '
{
  "url": "http://www.mathieu-elie.net",
  "title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization",
  "description": "Freelance Consultant in Bordeaux, System & Software Architect. Love dataviz, redis, elasticsearch, architecture scalability recipes and playing with data.",
  "tags": ["elasticsearch", "Data Visualization"]
}'
curl -XPUT http://localhost:9200/workshop/site/3 -d '
{
  "url": "http://www.giroll.org",
  "title": "Collectif Giroll - Gironde Logiciels Libres",
  "description": "Giroll, collectif basé à Bordeaux, réunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 à 20h30 et organisation d'\''Install Party Linux tous les six",
  "tags": ["Open Source", "Collectif"]
}'
12. curl 'http://localhost:9200/workshop/_search?pretty=true'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "url": "http://www.elasticsearch.org",
        "title": "Open Source Distributed Real Time Search & Analytics",
        "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.",
        "tags": ["Open Source", "elasticsearch", "Distributed"]
      }
    }, {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "url": "http://www.giroll.org",
        "title": "Collectif Giroll - Gironde Logiciels Libres",
        "description": "Giroll, collectif basé à Bordeaux, réunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 à 20h30 et organisation d'Install Party Linux tous les six",
13. OK great, but now I
want to search for
text!
14. step 1: pass the query as a
request body
curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d
'{
"query" : {
"match_all" : { }
}
}'
15. It returns all documents
because we used the match_all query
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-match-all-query.html
16. The match_all query is part of the query DSL
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html
17. So let's use the
query_string query DSL
curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
"query" : {
"query_string" : {
"query" : "elasticsearch"
}
}
}'
18. The result is a bit
verbose; let's fetch only the
title and tags fields
curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
"fields" : ["title", "tags"],
"query" : {
"query_string" : {
"query" : "elasticsearch"
}
}
}'
20. Let's go for facets on tags!
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html
do you see the wall ??? ;)
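As a sketch of what the facets page describes (using the 0.90-era facets API that this deck is based on), a terms facet over the tags field could be requested with a body like:

```json
{
  "query": { "match_all": {} },
  "facets": {
    "tags": {
      "terms": { "field": "tags" }
    }
  }
}
```

This is exactly where the wall appears: with the default string analysis the facet buckets come back as the analyzed tokens, not the original tag values.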
23. • Hey! Look at "Open Source":
it has been lowercased
and split into multiple tokens!
• this is done by the default mapping and
analyzer
25. • tags is of type string and gets the default
analyzer
• http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
• An analyzer of type standard is built using
the Standard Tokenizer with the Standard
Token Filter, Lower Case Token Filter, and
Stop Token Filter.
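The effect of the standard tokenizer plus the lowercase filter can be roughly imitated with plain shell tools (an illustration only, not the Lucene implementation; the stop filter is omitted here):

```shell
# Approximate the standard analyzer on a tag value:
# split on runs of non-alphanumerics (tokenizer), then lowercase (filter).
echo "Open Source" | tr -cs 'A-Za-z0-9' '\n' | tr 'A-Z' 'a-z'
# open
# source
```

This mirrors why a tag like "Open Source" ends up as the two tokens open and source in the index.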
27. • What about the keyword analyzer?
• http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html
29. curl 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d '
{
"site" : {
"properties" : {
"url" : {"type" : "string"},
"title" : {"type" : "string"},
"description" : {"type" : "string"},
"tags" : {"type" : "string", "analyzer": "keyword" }
}
}
}
'
{
"error" : "MergeMappingException[Merge failed with failures {[mapper
[tags] has different index_analyzer]}]",
"status" : 400
}
oops! we need to drop something...
30. curl -XDELETE 'http://localhost:9200/workshop/'
{"ok":true,"acknowledged":true}%
# the index must exist before we can put a mapping
curl -XPUT 'http://localhost:9200/workshop/'
{"ok":true,"acknowledged":true}%
curl 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d '
{
"site" : {
"properties" : {
"url" : {"type" : "string"},
"title" : {"type" : "string"},
"description" : {"type" : "string"},
"tags" : {"type" : "string", "analyzer": "keyword" }
}
}
}
'
{"ok":true,"acknowledged":true}%
# test the analysis on the field
curl -XGET 'localhost:9200/workshop/_analyze?
pretty=true&field=site.tags' -d 'Open Source'
{
"tokens" : [ {
"token" : "Open Source",
"start_offset" : 0,
"end_offset" : 11,
"type" : "word",
"position" : 1
} ]
}
# congrats !
# let's push the data again
curl -XPUT http://localhost:9200/workshop/site/1 -d '
{
"url": "http://www.elasticsearch.org",
"title": "Open Source Distributed Real Time Search & Analytics",
"description": "Elasticsearch is a powerful open source search and
analytics engine that makes data easy to explore.",
"tags": ["Open Source", "elasticsearch", "Distributed"]
}'
curl -XPUT http://localhost:9200/workshop/site/2 -d '
{
"url": "http://www.mathieu-elie.net",
"title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data
Visualization",
"description": "Freelance Consultant in Bordeaux, System & Software
Architect. Love dataviz, redis, elasticsearch, architecture scalability
recipes and playing with data.",
"tags": ["elasticsearch", "Data Visualization"]
}'
curl -XPUT http://localhost:9200/workshop/site/3 -d '
{
"url": "http://www.giroll.org",
"title": "Collectif Giroll - Gironde Logiciels Libres",
"description": "Giroll, collectif basÎ È Bordeaux, rÎunis autour
des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 √
mardi 17 décembre 13
35. If we want only the docs with the "Open Source" tag,
we use filters
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filters.html
and the term filter
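Following that page, a filtered query with a term filter might look like this (a sketch in the 0.90-era filtered-query syntax):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "term": { "tags": "Open Source" }
      }
    }
  }
}
```

Because tags now uses the keyword analyzer, the term filter can match the exact value "Open Source".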
45. RTFM WAY
• common mistake: the code examples in the docs
don't always show the whole query
• so you should put the snippet from the doc
back into the full DSL hierarchy
• think in terms of that hierarchy and everything
becomes much clearer
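For example, a reference page may only show the inner object, such as `"query_string" : { "query" : "elasticsearch" }`; in an actual request body it has to sit in its place in the hierarchy:

```json
{
  "query": {
    "query_string": { "query": "elasticsearch" }
  }
}
```

The outer "query" wrapper comes from the hierarchy, not from the snippet itself.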
46. the end for me...
the beginning for you...
47. questions and more
• twitter @mathieuel
• contact me via my freelance website
• http://www.mathieu-elie.net
• thanks to Giroll for hosting this workshop!