This document discusses how Elasticsearch is used by Cogenta to power their real-time retail intelligence platform. It tracks hundreds of eCommerce sites daily, organizing large amounts of data into a high-quality market view. Elasticsearch allows Cogenta to scale their processing and analytics capabilities, provide high availability, and power various use cases like logging, internal analytics, and reporting.
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with Elasticsearch - Yann Cluchey
My talk from GOTO Aarhus, 30th September 2014. Cogenta is a retail intelligence company which tracks ecommerce web sites around the world to provide competitive monitoring and analysis services to retailers. Using its proprietary crawler technology, Lucene and SQL Server, a stream of 20 million raw product data entries is captured and processed each day. This case study looks at how Cogenta uses Elasticsearch to break the shackles imposed by the RDBMS (and a limited budget) to make the data available in real time to its customers.
Cogenta uses SQL as its canonical store & for complex reporting, and Elasticsearch for real-time processing & to drive its SaaS web applications. Elasticsearch is easy to use, delivers the powerful features of Lucene and enables the data & platform cost to scale linearly. But… synchronising your existing data in two places presents some interesting challenges such as aggregation and concurrency control. This talk will take a detailed look at how Cogenta overcame those challenges, with a perpetually changing and asynchronously updated dataset.
http://gotocon.com/aarhus-2014/presentation/Cogenta%20-%20Making%20Enterprise%20Data%20Available%20in%20Real%20Time%20with%20Elasticsearch
This presentation summarizes how we use Elasticsearch for analytics at Wingify for our product Visual Website Optimizer (http://vwo.com). This presentation was prepared for my poster session at The Fifth Elephant (https://funnel.hasgeek.com/fifthel2014/1143-using-elasticsearch-for-analytics).
ELK Stack (Elasticsearch, Logstash, Kibana) as a Log-Management solution for the Microsoft developer presented at the .net Usergroup in Munich in June 2015.
Logging is one of those things that everyone complains about, but doesn't dedicate time to. Of course, the first rule of logging is "do it". Without that, you have no visibility into system activities when investigations are required. But the end goal is much, much more than this. Almost all applications require security audit logs for compliance; application logs for visibility across all cloud properties; and application tracing for tracking usage patterns and business intelligence. The latter is the magic sauce that helps businesses learn about their customers, or in some cases the data is FOR the customer. Without a strategy this can get very messy, fast. In this session Michele will discuss design patterns for a sound logging and audit strategy; considerations for security and compliance; the benefits of a NoSQL approach; and more.
How to get the best of both: MongoDB is great for low-latency, quick access to recent data; Treasure Data is great for an infinitely growing store of historical data. In the latter case, one need not worry about scaling.
We went over what Big Data is and its value. This talk will cover the details of Elasticsearch, a Big Data solution. Elasticsearch is a distributed NoSQL search engine built on Apache Lucene.
We'll cover:
• Elasticsearch basics
• Setting up a development environment
• Loading data
• Searching data using REST
• Searching data using NEST, the .NET interface
• Understanding Scores
Finally, I show a use-case for data mining using Elasticsearch.
You'll walk away from this armed with the knowledge to add Elasticsearch to your data analysis toolkit and your applications.
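The "searching data using REST" step above can be sketched in a few lines. This is an illustrative example, not from the talk: the index name `products`, the field names, and the local cluster URL are all assumptions. The helper only builds the request body, which is what carries the query.

```python
import json

def build_search_request(field, text, size=10):
    """Build an Elasticsearch full-text search request body.

    A `match` query analyzes the text and ranks each hit by Lucene's
    relevance score (the "Understanding Scores" topic above).
    """
    return {
        "query": {"match": {field: text}},
        "size": size,
    }

# Hypothetical example: search a "products" index for laptops.
body = build_search_request("title", "gaming laptop")
print(json.dumps(body))
# In practice this body would be POSTed to a URL such as
# http://localhost:9200/products/_search
```

The NEST client builds the same JSON through a strongly typed query DSL, so the body shape carries over between the REST and .NET approaches.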
Interactive learning analytics dashboards with ELK (Elasticsearch, Logstash, Kibana) - Andrii Vozniuk
My workshop at the Learning Analytics Summer Institute (LASI) 2016: http://lasi16.snola.es/#!/schedule/113
Educational data continues to grow in volume, velocity and variety. Making sense of educational data in such conditions requires the deployment and usage of appropriate scalable, real-time processing tools supporting a flexible data schema. Elasticsearch is one of the popular open-source tools meeting these requirements. Initially envisioned as a search engine capable of operating at scale and in real time, Elasticsearch is used by organisations such as Wikimedia and GitHub, which deal with big data on a daily basis. In addition, Elasticsearch is used increasingly often as an analytics platform thanks to its scalable architecture and expressive query language. Until recently, the exploitation of Elasticsearch for (learning) analytics purposes by practitioners was hindered by a high entry barrier due to the complexity of the query language and the query specificities. This is currently changing with the ongoing development of Kibana, an open-source tool that allows users to conduct analysis and build visualisations of Elasticsearch data through a graphical user interface. Kibana does not require the user to dive into the technical details of the queries (although it is still possible) and hence makes big educational data visualisations accessible to regular users. The additional value of Kibana comes into play whenever several visualisations are combined on a single dashboard, enabling multiple coordinated views for interactive explorative analysis. Elasticsearch and Kibana, together with Logstash, form an analytics stack often referred to as ELK. Logstash supports data acquisition from multiple sources (including Twitter, RSS, and event logs) thanks to its rich set of available connectors. Custom connectors can be developed for case-specific sources.
In addition, ELK enables building analytics infrastructures decoupled from the learning platform, i.e., it allows the learning environment and the data storage (with the analytics functionalities) to be hosted separately without affecting the end-user experience.
Deep Dive on Elasticsearch Meetup event on 23rd May '15 at www.meetup.com/abctalks
Agenda:
1) Introduction to NoSQL
2) What is Elasticsearch and why is it needed
3) Elasticsearch architecture
4) Installation of Elasticsearch
5) Hands-on session on Elasticsearch
Elasticsearch Meetup Special - Yann Cluchey, Cogenta (Internet World)
The Marketplace - June 18th, 17:00-18:00
Learn from Cogenta about how they are using Elasticsearch to deliver real-time data insights to their business.
By 2020, 50% of all new software will process machine-generated data of some sort (Gartner). Historically, machine data use cases have required non-SQL data stores like Splunk, Elasticsearch, or InfluxDB.
Today, new SQL DB architectures rival the non-SQL solutions in ease of use, scalability, cost, and performance. Please join this webinar for a detailed comparison of machine data management approaches.
Achieve new levels of performance for Magento e-commerce sites - Clustrix
If you run a Magento store that is impacted negatively by catalog updates or indexing/reindexing, listen in. Avoid catalog updates impacting your checkouts, site downtime and long page view load time with the ClustrixDB for Magento Bundle. Created exclusively for high-volume/complex-catalog retailers, this replacement backend is a proven upgrade for Magento sites.
Let us show you how this all works. Recently, at this year’s Magento Imagine, we ran a LIVE demo of the ClustrixDB for Magento Bundle. Stats from the demo:
System ran for 50 hours, reindexing up to every 12 minutes
2.6 million orders processed (14.6 per second)
147 million page views (816 per second)
Average Response time of 267ms
0% error rate, 100% checkout uptime
In this webinar, we’ll also cover:
How you can enable catalog updates to process in the background without affecting the normal operations of your site with the Clustrix Shadow (re)Indexer
A Magento-approved alternative database to MySQL that scales performance up/down as you add/subtract commodity nodes to the cluster. ClustrixDB has no read slaves, replication lag or sharding, and flexes up and down to deliver exactly the right amount of performance and cost every month of the year.
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
The Agile Data Warehouse Webinar – Next Generation BI - RightScale
RightScale Webinar: February 4, 2010 – Watch this webinar and find out about the new approach to BI that takes advantage of the cloud as a disruptive technology combined with next-generation software applications. Watch the video at http://vimeo.com/rightscale/the-agile-data-warehouse-webinar-next-generation-bi.
An introduction to the speakers and to what BlazeClan, an AWS Advanced Consulting Partner, does and how it has evolved. Varoon, our Solution Architect specializing in Amazon Redshift, talks about the key differentiators of Amazon Redshift. Learn why and exactly how Redshift can optimize your time and effort and reduce costs to 1/10th of a traditional warehouse solution. A demo of Amazon Redshift in action, processing 2 billion records in a matter of seconds! A case study of one of our products, Cloudlytics, and how it extensively uses Amazon Redshift.
We had conducted a webinar on Amazon Redshift, you can also view the Video of the Webinar along with the Q & A at the end of the Slideshare.
Amazon Redshift, Customer Acquisition Cost & Advertising ROI presented with A… - Amazon Web Services
In today's world, consumer habits change fast and marketing decisions need to be made within seconds, not days. Delivering engaging advertising experiences requires real time, high performing architectures that provide digital advertisers the ability to measure and improve the performance of their campaigns and tie them more closely to corporate goals. The insights gleaned from the massive amounts of data collected can then be used to dynamically adjust media spend and creative execution for optimal performance. The AWS Cloud enables you to deliver marketing content and advertisements with the levels of availability, performance, and personalization that your customers expect. Plus, AWS lowers your costs. Join us to learn about how big data and low latency / high performing architectures are changing the game for digital advertising.
Pragmatic CQRS with existing applications and databases (Digital Xchange, May …) - Lucas Jellema
Put very simply: CQRS (Command Query Responsibility Segregation) is the notion that it may be wise to separate the database that processes data manipulations from the engines that handle queries. When data retrieval requires special formats, scale, availability, TCO, location, search options and response times, it is worth considering introducing additional databases to handle those specific needs. Many organizations have a data warehouse implemented in a separate database - so the idea is not completely new. The CQRS pattern takes this existing concept to new levels, by complementing the core OLTP database with other data stores, such as Elasticsearch, MongoDB, Apache Cassandra and Neo4j, and synchronizing them in near real time. This session discusses use cases for CQRS - the why - and pragmatic considerations - the how. Important challenges and decisions include how to detect changes and extract data, and how to transport, convert and apply the changes in a reliable, timely manner that ensures the right level of consistency. Several demos are shown to clarify some concepts and technologies. The target audience for this session consists of server-side developers and application architects.
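To make the "apply the changes in a reliable, timely manner" step concrete, here is a minimal sketch (illustrative, not from the session) of applying asynchronous change events to a query-side store. It assumes each record carries a monotonically increasing version number, so an out-of-order or replayed event can never overwrite newer state:

```python
def apply_change(read_store, event):
    """Apply a change event to the query-side store unless it is stale.

    `read_store` is a dict standing in for the query-side database;
    `event` is a dict with "id", "version" and "data" keys.
    """
    doc_id, version = event["id"], event["version"]
    current = read_store.get(doc_id)
    if current is not None and current["version"] >= version:
        return False  # stale event: a newer version was already applied
    read_store[doc_id] = {"version": version, "data": event["data"]}
    return True

read_store = {}
apply_change(read_store, {"id": 1, "version": 2, "data": "price=99"})
apply_change(read_store, {"id": 1, "version": 1, "data": "price=120"})  # ignored
print(read_store[1]["data"])  # prints "price=99"
```

The same version-comparison idea is what Elasticsearch's external versioning exposes for exactly this synchronization scenario; the in-memory dict here just keeps the sketch self-contained.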
Learn what you need to consider when moving from the world of relational databases to a NoSQL document store.
Hear from Developer Advocate Glynn Bird as he explains the key differences between relational databases and JSON document stores like Cloudant, as well as how to dodge the pitfalls of migrating from a relational database to NoSQL.
AWS re:Invent 2016 | GAM301 | How EA Leveraged Amazon Redshift and AWS Partner… - Amazon Web Services
In November 2015, Capital Games launched a mobile game accompanying a major feature film release. The back end of the game is hosted in AWS and uses big data services like Amazon Kinesis, Amazon EC2, Amazon S3, Amazon Redshift, and AWS Data Pipeline. Capital Games will describe some of their challenges in their initial setup and usage of Amazon Redshift and Amazon EMR. They will then go over their engagement with AWS Partner 47lining and talk about specific best practices regarding solution architecture, data transformation pipelines, and system maintenance using AWS big data services. Attendees of this session should expect a candid view of the process of implementing a big data solution, from problem statement identification to visualizing data, with an in-depth look at the technical challenges and hurdles along the way.
Data & Analytics - Session 2 - Introducing Amazon Redshift - Amazon Web Services
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. This presentation will give an introduction to the service and its pricing before diving into how it delivers fast query performance on data sets ranging from hundreds of gigabytes to a petabyte or more.
Steffen Krause, Technical Evangelist, AWS
Padraic Mulligan, Architect and Lead Developer; Mike McCarthy, CTO, Skillspage
C* Summit 2013: Cassandra at eBay Scale by Feng Qu and Anurag Jambhekar - DataStax Academy
We have seen rapid adoption of C* at eBay in the past two years. We have made tremendous efforts to integrate C* into existing database platforms, including Oracle, MySQL, Postgres, MongoDB, XMP, etc. We also scale C* to meet business requirements and have encountered technical challenges you only see at eBay scale: 100TB of data on hundreds of nodes. We will share our experience of deployment automation, managing, monitoring, and reporting for both Apache Cassandra and DataStax Enterprise.
2. • Real-time retail intelligence
• Tracking 100s of eCommerce sites
• Organise into high quality market view
• Competitive intelligence to retailers and manufacturers (SaaS)
3. How it Works
• Tracking millions of products, around the world, daily
• Distributed Processing Pipeline
• 3K/s peak
(Diagram: Agents capture raw data (HTML, etc.) and feed it through a pool of Processors into a cleaned and validated Master Product DB.)
4. Challenges
• Batch indexing
• SQL boxes can't get bigger
• Don't ask for that
(Diagram: the Master DB feeds a Snapshot DB via batch sync indexing; the resulting index is copied out to a load-balanced Web Server Pool (WebUK1-WebUK3) in the DMZ.)
5. Elasticsearch in the Stack
• Batch indexing
(Diagram: the same batch pipeline as before: Master DB to Snapshot DB via batch sync indexing, with the index copied out to the load-balanced Web Server Pool (WebUK1-WebUK3) in the DMZ.)
6. Elasticsearch in the Stack
• Real-time indexing
• Simultaneous writes to SQL + ES
(Diagram: API calls write to the Master DB and, via real-time indexing, to an Elasticsearch cluster (Elastic1 … Elasticn) that serves the load-balanced Web Server Pool (WebUK1 … WebUKn) in the DMZ.)
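A minimal sketch of the dual-write pattern on this slide, with in-memory dictionaries standing in for SQL Server and the Elasticsearch cluster. All names are illustrative, and a real deployment also needs the concurrency controls the talk abstract mentions:

```python
class DualWriter:
    """Write each document to the canonical store and the search index."""

    def __init__(self):
        self.sql = {}  # stand-in for the SQL Server master DB
        self.es = {}   # stand-in for the Elasticsearch cluster

    def save(self, doc_id, doc):
        # 1. Write to the canonical store first: if this fails, nothing
        #    reaches the index and the two stores stay consistent.
        self.sql[doc_id] = doc
        # 2. Then index into Elasticsearch so web queries see the
        #    change in (near) real time. In production this would be an
        #    index request against the cluster rather than a dict write.
        self.es[doc_id] = doc

writer = DualWriter()
writer.save(42, {"name": "Widget", "price": 9.99})
print(writer.sql[42] == writer.es[42])  # prints True
```

Ordering the writes canonical-store-first is one common choice; it trades a brief window where the index lags the master for the guarantee that the index never contains data the master rejected.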
7. Benefits
• Scalability, in both directions
• Flexibility
• High availability
• Cheaper to run
• Powerful features
8. Use Cases
• Logging (Graylog2)
• Internal Analytics
• Text Processing & IR
• Driving Web Apps (filtering, aggregations, scripting)
• Reporting
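As an illustration of the "filtering, aggregations" use case above (hypothetical index and field names, not taken from the deck), here is an Elasticsearch request body that filters products by retailer and aggregates the average price per brand:

```python
import json

def price_by_brand(retailer):
    """Build an aggregation-only request body: filter by retailer,
    bucket hits by brand, and compute the average price per bucket."""
    return {
        "size": 0,  # aggregation only; skip returning individual hits
        "query": {"term": {"retailer": retailer}},
        "aggs": {
            "by_brand": {
                "terms": {"field": "brand"},
                "aggs": {"avg_price": {"avg": {"field": "price"}}},
            }
        },
    }

# The body would be sent to the index's _search endpoint.
print(json.dumps(price_by_brand("acme-store"), indent=2))
```

With `size` set to 0 the cluster returns only the aggregation buckets, which is what makes this style of query cheap enough to drive dashboards and reporting directly.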
9. THANKS
Read more: elasticsearch.org/case-study/cogenta/
Get in touch: @YannCluchey