Talk given on NLP at the Elasticsearch meetup in Berlin in February 2017. Discusses word embeddings for product classification, generation of product descriptions and chat bots.
Applying NLP to product comparison at visual meta
1. Applying NLP to Product Comparison at Visual Meta
Ross Turner
Elasticsearch Meetup Berlin 22/02/17
2. Overview
Applying NLP to Product Comparison
1. Product Comparison on the Visual Meta Platform
2. Using NLP to Maintain a Product Catalogue
3. Making Product Discovery Conversational
3. About Me
Previously…
• Researcher in Natural Language Generation (NLG)
• Software Engineer on Local Search
• Co-founder and Principal Engineer at an NLG Start Up
Currently…
• Engineering Head at Visual Meta
5. Product Comparison at Visual Meta
‘All shops, one site’
• Online marketing platform with shopping portals in 12 different countries
• 3 brands: Ladenzeile, ShopAlike, UmSóLugar
• 100,000,000+ items
• 6,000+ partner shops
6. Faceted Search at Visual Meta
Discover fashion, furniture and more…
• 800,000 platform visits per day
• 80 filter types across 21 categories
• Currently porting filter search to Elasticsearch
7. Maintaining a Product Catalogue at Visual Meta
Product feeds are continuously synced from partner shops:
• Feed items must be categorised in order to be discoverable on the platform
We want to:
• Identify all variants of a product
• Compare offers across shops
• Make it easy for our users to browse through millions of products
Model             Colour       Memory
Apple iPhone 6s   Space Grey   32GB
Apple iPhone 6s   Space Grey   128GB
Apple iPhone 6s   Gold         32GB
Apple iPhone 6s   Gold         128GB
Apple iPhone 6s   Rose Gold    32GB
Apple iPhone 6s   Rose Gold    128GB
Apple iPhone 6s   Silver       32GB
Apple iPhone 6s   Silver       128GB
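The variant table above is the Cartesian product of a model's colours and memory sizes. A minimal sketch of that enumeration follows; the function name `enumerate_variants` is hypothetical, and the attribute values are copied from the table.

```python
from itertools import product

MODEL = "Apple iPhone 6s"
COLOURS = ["Space Grey", "Gold", "Rose Gold", "Silver"]
MEMORY = ["32GB", "128GB"]


def enumerate_variants(model, colours, memory_sizes):
    """Return every (model, colour, memory) combination, colours varying slowest."""
    return [(model, colour, size) for colour, size in product(colours, memory_sizes)]


variants = enumerate_variants(MODEL, COLOURS, MEMORY)
for variant in variants:
    print(" | ".join(variant))
```

Each tuple is one product variant that feed items then have to be matched against.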
9. String Matching
Index item names and descriptions, query product variant tag names against the index
Lucene query:
• +(Name:apple Description:apple) +(Name:iphone Description:iphone) +(Name:6s Description:6s) +(Name:16gb Description:16gb) +(Name:space Description:space) +(Name:grey)
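A query of this shape could be assembled from a variant's tag names roughly as follows. This is a sketch, not the code used at Visual Meta: `build_variant_query` is a hypothetical helper, it only produces the query string, and it does not escape Lucene special characters. It also emits both fields for every token, whereas the last clause on the slide lists only `Name`.

```python
def build_variant_query(tags, fields=("Name", "Description")):
    """Build a Lucene query string in which every tag token is required (+)
    and may match in any of the given fields."""
    clauses = []
    for tag in tags:
        for token in tag.lower().split():
            alternatives = " ".join(f"{field}:{token}" for field in fields)
            clauses.append(f"+({alternatives})")
    return " ".join(clauses)


query = build_variant_query(["Apple", "iPhone 6s", "16GB", "Space Grey"])
print(query)
```

Requiring every token is what makes this approach brittle: an item that omits the colour or writes the memory size as two tokens no longer matches.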
Test by manually assigning items to a random sample of products
Recall   Precision   F-score
0.59     0.64        0.61
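The reported F-score is consistent with the balanced F1 measure, the harmonic mean of precision and recall. A small check, assuming F1 is the measure meant on the slide:

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


print(round(f_score(precision=0.64, recall=0.59), 2))
```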
10. Error Analysis
Naming for the same product is not consistent across feeds:
1. abc.com: “Apple iPhone 6 (Space Grey, 64GB)”
2. efg.com: “Apple iPhone 6 64 GB Space Grey”
3. xyz.com: “Apple iPhone 6”
Naming for the same product is not consistent within the same feed:
1. “Apple Iphone 6 - 64GB”
2. “Apple Iphone 6 64GB Space Grey”
3. “Kamakshi Apple iPhone 6 (Latest Model) - 64 GB - Space Gray - Smartphone”
Wrongly categorised products in the feed:
• “Cover for Apple Iphone 6 - 64GB”
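Some of these inconsistencies can be reduced by normalising names before comparison. The sketch below is an illustration rather than the approach taken in the talk: the `normalise` and `mentions` helpers and their rules (lowercasing, dropping punctuation, joining "64 GB", mapping "gray" to "grey") are assumptions.

```python
import re


def normalise(name):
    """Reduce an item name to a set of comparable tokens."""
    text = name.lower()
    text = re.sub(r"[^\w\s]", " ", text)           # drop brackets, commas, dashes
    text = re.sub(r"(\d+)\s*gb\b", r"\1gb", text)  # "64 gb" -> "64gb"
    text = text.replace("gray", "grey")            # unify spelling variants
    return set(text.split())


def mentions(item_name, product_name):
    """True if every token of the product name occurs in the item name."""
    return normalise(product_name) <= normalise(item_name)


a = normalise("Apple iPhone 6 (Space Grey, 64GB)")
b = normalise("Apple iPhone 6 64 GB Space Grey")
print(a == b)
print(mentions("Cover for Apple Iphone 6 - 64GB", "Apple iPhone 6 64GB"))
```

Normalisation makes the first two feed names identical, but the cover still matches the phone, which is why string matching alone is not enough and items also need to be classified.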
14. Language Models
Drawbacks of bag of words / n-grams:
• Words are equally distant
• Vectors are sparse
Word embeddings capture semantics:
• Vectors are continuous
• Similar words are close in vector space
1. Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781
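The first drawback can be made concrete with one-hot vectors, the representation underlying bag of words. In the sketch below the four-word vocabulary is an arbitrary example; with one dimension per word, every pair of distinct words ends up the same distance apart, so the vectors carry no notion of similarity.

```python
import math


def one_hot(word, vocabulary):
    """Sparse representation: a single 1.0 at the word's position."""
    return [1.0 if word == entry else 0.0 for entry in vocabulary]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


vocabulary = ["samsung", "galaxy", "s5", "cover"]  # example vocabulary (assumption)
vectors = {word: one_hot(word, vocabulary) for word in vocabulary}

distances = {
    (a, b): euclidean(vectors[a], vectors[b])
    for a in vocabulary
    for b in vocabulary
    if a < b
}
for pair, distance in distances.items():
    print(pair, round(distance, 3))
```

"galaxy" is exactly as far from "samsung" as it is from "cover", and the vector length grows with the vocabulary.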
15. Word2Vec for Mobile Phone Items
Mobile phone item corpus:
• 7,890 feed items
• 863k tokens, 41.5k unique
Closest words to “Galaxy”:
Rank   Word      Cosine Distance
1      Samsung   0.51
2      S2        0.48
3      S5        0.46
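A ranking like this comes from comparing the target word's vector with every other vector in the model. The sketch below uses made-up three-dimensional vectors rather than the trained model from the talk, and computes cosine similarity, where a higher score means a closer word; reading the slide's scores that way is an assumption.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def closest_words(target, embeddings, top_n=3):
    """Rank all other words by cosine similarity to the target word."""
    scores = [
        (word, cosine_similarity(embeddings[target], vector))
        for word, vector in embeddings.items()
        if word != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy vectors for illustration only; real embeddings have far more dimensions.
TOY_EMBEDDINGS = {
    "galaxy": [0.9, 0.4, 0.1],
    "samsung": [0.8, 0.5, 0.2],
    "s5": [0.7, 0.2, 0.4],
    "cover": [0.1, 0.2, 0.9],
}

for word, score in closest_words("galaxy", TOY_EMBEDDINGS):
    print(word, round(score, 2))
```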
18. Two Descriptions of a Samsung TV
Samsung UE40H6400AK. Display diagonal: 101.6 cm (40"), HD type: Full HD, Display resolution: 1920 x 1080 pixels. Tuner type: Analog & Digital, Digital signal format system: DVB-C, DVB-T. RMS rated power: 20 W. Consumer Electronics Control (CEC): Anynet+. Picture processing technology: Samsung Wide Color Enhancer
The Samsung UE40H6400 has a 101.6cm screen size and a resolution of 1920 x 1080 pixels. It is a Full HD TV, has an Analog & Digital tuner and comes with Anynet+.
19. Generating Product Descriptions
• Choosing what to say
• Deciding how to say it
3. E Reiter (2007). An Architecture for Data-to-Text Systems. In Proceedings of ENLG-2007, pages 97-104
20. Two Descriptions of a Samsung Smartphone
Samsung SM-G920F, Galaxy. Display diagonal: 12.9 cm (5.1"), Display resolution: 2560 x 1440 pixels, Display type: SAMOLED. Processor frequency: 2.1 GHz, Coprocessor frequency: 1.5 GHz. Internal storage capacity: 32 GB, Internal RAM: 3072 MB. Main camera resolution (numeric): 16 MP, Video recording modes: 1080p, 2160p, Maximum frame rate: 30 fps. SIM card capability: Single SIM, SIM card type: NanoSIM, 2G standards: GSM
The Samsung GALAXY S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
21. Building Messages from a Product Catalogue
The Samsung Galaxy S6 has a 12.9 cm display with 2560 x 1440 pixel resolution. It has a 2.1 GHz processor, a 16 megapixel camera and 3072MB of internal RAM with 32GB of internal storage capacity.
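The two stages of data-to-text generation, choosing what to say and deciding how to say it, can be sketched with a content-selection step followed by template-based realisation. Everything here is a simplified illustration: the attribute keys, the `SELECTED` list and the templates are assumptions, and the values are taken from the specification sheet on the earlier slide.

```python
SPEC = {
    "display_diagonal": "12.9 cm",
    "display_resolution": "2560 x 1440",
    "processor_frequency": "2.1 GHz",
    "camera_resolution": "16 megapixel",
    "ram": "3072 MB",
    "storage": "32 GB",
    "sim_card_type": "NanoSIM",
    "video_modes": "1080p, 2160p",
}

# Content selection: the attributes judged worth mentioning (assumption).
SELECTED = ["display_diagonal", "display_resolution", "processor_frequency",
            "camera_resolution", "ram", "storage"]


def select_content(spec, wanted):
    """Choosing what to say: keep only the selected attributes."""
    return {key: spec[key] for key in wanted if key in spec}


def join_phrases(parts):
    if len(parts) == 1:
        return parts[0]
    return ", ".join(parts[:-1]) + " and " + parts[-1]


def realise(name, messages):
    """Deciding how to say it: fill fixed sentence templates."""
    sentences = []
    if "display_diagonal" in messages and "display_resolution" in messages:
        sentences.append(
            f"The {name} has a {messages['display_diagonal']} display "
            f"with {messages['display_resolution']} pixel resolution."
        )
    parts = []
    if "processor_frequency" in messages:
        parts.append(f"a {messages['processor_frequency']} processor")
    if "camera_resolution" in messages:
        parts.append(f"a {messages['camera_resolution']} camera")
    if "ram" in messages:
        parts.append(f"{messages['ram']} of internal RAM")
    if parts:
        sentence = "It has " + join_phrases(parts)
        if "storage" in messages:
            sentence += f" with {messages['storage']} of internal storage capacity"
        sentences.append(sentence + ".")
    return " ".join(sentences)


description = realise("Samsung Galaxy S6", select_content(SPEC, SELECTED))
print(description)
```

Attributes that are not selected, such as the SIM card type, never reach the text, which is what keeps the generated description shorter than the raw specification.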
23. Entity Recognition for Voice Search
Input - “I’d like some red adidas trainers”
Output:
• <brands, [adidas]>
• <categories, [trainers]>
• <colours, [red]>
4. http://visual-meta.com/tech-corner/hi-lara-building-a-conversational-agent-for-visual-metas-first-hackathon.html
24. Using the Product Catalogue to Parse Queries
Lucene index is built from labels to tag tree tokens
1. Word shingles are extracted from the input query
2. Each shingle is queried against the index (top down, greedy)
Labeled tokens are used to:
1. Query the product index
2. Keep track of the dialogue state
• “I’d like some red adidas trainers”
• “I’d like some red adidas”
• “like some red adidas trainers”
• “I’d like some red”
• “like some red adidas”
• “some red adidas trainers”
• ...
• “red”
• “adidas”
• “trainers”
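The shingle lookup can be sketched as follows. A plain dictionary stands in for the Lucene index, and its entries, including the multi-word colour, are hypothetical. Shingles are generated longest first, matching the order of the list above, and a token that is already covered by a match is not labelled again, which is one reading of "top down, greedy".

```python
# Stand-in for the Lucene index: phrase -> tag label (entries are assumptions).
TAG_INDEX = {
    "adidas": "brands",
    "trainers": "categories",
    "red": "colours",
    "rose gold": "colours",
}


def shingles(tokens):
    """Yield (start, end) spans of word shingles, longest first, left to right."""
    count = len(tokens)
    for size in range(count, 0, -1):
        for start in range(count - size + 1):
            yield start, start + size


def parse(query, index):
    """Label the query greedily: longer shingles win over the tokens inside them."""
    tokens = query.lower().split()
    covered = [False] * len(tokens)
    labels = {}
    for start, end in shingles(tokens):
        if any(covered[start:end]):
            continue
        phrase = " ".join(tokens[start:end])
        label = index.get(phrase)
        if label is not None:
            labels.setdefault(label, []).append(phrase)
            for position in range(start, end):
                covered[position] = True
    return labels


print(parse("I'd like some red adidas trainers", TAG_INDEX))
```

Because longer shingles are tried first, a phrase such as "rose gold" is labelled as a single colour instead of being split into separate tokens.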
25. Putting It all Together: Answering Queries
How big is the Samsung Galaxy S6’s screen?
The Samsung Galaxy S6 has a 12.9 cm display
How much RAM does it have?
It has 3072MB of RAM
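The follow-up question works because the agent remembers which product is under discussion. A minimal sketch of that dialogue state, assuming the query parser has already produced labels: the label names `products` and `attributes`, the templates and the tiny catalogue are all hypothetical, with values taken from the earlier specification slide.

```python
CATALOGUE = {
    "samsung galaxy s6": {"display": "12.9 cm", "ram": "3072MB"},
}

TEMPLATES = {
    "display": "{subject} has a {value} display",
    "ram": "{subject} has {value} of RAM",
}


def answer(labels, state):
    """Answer an attribute question, resolving "it" from the dialogue state."""
    named = "products" in labels
    if named:
        state["product"] = labels["products"][0]  # remember the product for later turns
    product = state.get("product")
    if product is None:
        return "Which product do you mean?"
    attribute = labels["attributes"][0]
    subject = "The " + product.title() if named else "It"
    return TEMPLATES[attribute].format(subject=subject, value=CATALOGUE[product][attribute])


state = {}
first = answer({"products": ["samsung galaxy s6"], "attributes": ["display"]}, state)
second = answer({"attributes": ["ram"]}, state)
print(first)
print(second)
```

The second question names no product, so the answer is generated for the product stored in `state` and uses a pronoun as its subject.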
27. Takeaways
1. Word embeddings, even when trained on limited data, can:
a. provide significant improvement over bag of words models for text classification; and
b. reduce the amount of manually curated data required for the task
2. Product catalogues provide a rich information source for conversational apps
3. NLG can be utilised for product feed enhancement as well as conversation