Finding Products on the Internet Using Neural Networks

Datafiniti crawls the entire Internet to find product data. We use a machine learning technique called neural networks to automatically identify product listings and various heuristic techniques to extract product data.

Finding Products on the Internet
using Neural Networks
http://www.datafiniti.net

● Goals
○ Collect vast amounts of data through web crawling
○ Normalize and deduplicate data
○ Make it searchable and meaningful

Challenges
● 48 billion pages on the Internet
○ Crawled 6 billion+ pages in the past year
● Mostly unstructured data
● Limitations of customized crawls
○ Non-scalable
○ Less robust

Solution: Intelligent Classifiers
● Advantages
○ Generic code: Scalability
○ More robust
● Challenges
○ More difficult to parse data of interest

Our Approach
● Minimize dependency on HTML
● Supervised learning for page classification
○ Neural networks
● Heuristic algorithms for data parsing

Neural Network
[Diagram: page features feed the input layer, which connects to a hidden layer and then to an output layer with three activation values: AV(Product), AV(Product Category), AV(Other)]
● AV: activation value, between 0 and 1
● Classification_Type = type with max. AV

Page Features
[Annotated product page: buy widget, price, image, number of clickable images with price, shipping info]

Page Features
[Annotated product page: weight, product code, keywords, number of words on page]

Training
[Diagram: dataset, then feature vector, then normalization, then neural network with input layer parameter set (P) and hidden layer parameter set (Q)]
● Trained offline
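The training slide lists a normalization step but does not say which method is used. As a minimal sketch, assuming min-max scaling to the range 0 to 1 with bounds fitted on the training set, the step could look like the following. The function names `fit_min_max` and `normalize` and the sample feature values are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def fit_min_max(dataset: List[Vector]) -> Tuple[Vector, Vector]:
    """Return the per-feature minimum and maximum over the training set."""
    columns = list(zip(*dataset))
    return [min(c) for c in columns], [max(c) for c in columns]


def normalize(features: Vector, lo: Vector, hi: Vector) -> Vector:
    """Scale each feature to [0, 1]; constant features map to 0.0."""
    out = []
    for value, low, high in zip(features, lo, hi):
        span = high - low
        scaled = (value - low) / span if span else 0.0
        # Clamp so pages with out-of-range values still land in [0, 1].
        out.append(min(1.0, max(0.0, scaled)))
    return out


# Hypothetical raw vectors: [has_price, clickable_images_with_price, word_count]
training = [[1, 0, 200], [1, 24, 1200], [0, 0, 700]]
lo, hi = fit_min_max(training)
normalized = normalize([1, 12, 700], lo, hi)
```

Reusing the bounds fitted offline at deployment time keeps live pages on the same scale as the training data.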

Deployment
[Diagram: web page, then feature vector, then normalized feature vector (x), then neural network with input layer parameter set (P) and hidden layer parameter set (Q), producing AV(Prod), AV(ProdCat), AV(Other)]
● Output of hidden layer: $L_1 = \mathrm{sigmoid}(P^T x)$
● Final output: $L_2 = \mathrm{sigmoid}(Q^T L_1)$
● $L_2 = \{ AV(Prod), AV(ProdCat), AV(Other) \}$
● $\mathrm{sigmoid}(s) = 1 / (1 + e^{-s})$
● Page_Type = type with the maximum of $\{ AV(Prod), AV(ProdCat), AV(Other) \}$
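The deployment formulas can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the production code: the parameter values in `P` and `Q` are hand-picked and hypothetical, the network sizes are arbitrary, and no bias terms are included because the slide shows none.

```python
import math
from typing import List, Tuple

LABELS = ["Product", "Product Category", "Other"]


def sigmoid(s: float) -> float:
    """sigmoid(s) = 1 / (1 + e^-s)."""
    return 1.0 / (1.0 + math.exp(-s))


def layer(weights: List[List[float]], inputs: List[float]) -> List[float]:
    """Compute sigmoid(W^T inputs); weights[i][j] links input i to unit j."""
    n_units = len(weights[0])
    return [
        sigmoid(sum(weights[i][j] * inputs[i] for i in range(len(inputs))))
        for j in range(n_units)
    ]


def classify(P: List[List[float]], Q: List[List[float]],
             x: List[float]) -> Tuple[str, List[float]]:
    """Return the page type with the largest activation value, plus all AVs."""
    l1 = layer(P, x)   # output of hidden layer: L1 = sigmoid(P^T x)
    l2 = layer(Q, l1)  # final output: L2 = sigmoid(Q^T L1)
    best = max(range(len(l2)), key=l2.__getitem__)
    return LABELS[best], l2


# Hypothetical parameters: 2 input features, 2 hidden units, 3 outputs.
P = [[4.0, -4.0], [4.0, -4.0]]
Q = [[3.0, -1.0, -3.0], [-3.0, -1.0, 3.0]]
page_type, activations = classify(P, Q, [1.0, 1.0])
```

With these made-up weights, a page whose two normalized features are both 1.0 is labeled "Product" because AV(Product) is the largest of the three outputs.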

Evaluation
● Notation
○ True Positive (TP)
○ False Positive (FP)
○ False Negative (FN)
● Precision (P): TP / (TP + FP)
● Recall (R): TP / (TP + FN)
● F-score: 2PR / (P + R)
● Known Dataset
○ Precision = 1.0
○ Recall = 0.985
○ F-score = 0.9925
● Live System/Unknown Data
○ Precision = 0.854
○ Recall = Difficult to calculate
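The three metrics can be computed directly from the counts. The sketch below uses hypothetical counts (`tp`, `fp`, `fn`) chosen only so that precision and recall match the known-dataset figures; the slides report ratios, not counts. With those inputs the F-score formula yields roughly 0.992.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of pages labeled positive that really are positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of truly positive pages that were labeled positive."""
    return tp / (tp + fn)


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r)


# Hypothetical counts that give precision 1.0 and recall 0.985.
tp, fp, fn = 197, 0, 3
p = precision(tp, fp)
r = recall(tp, fn)
f = f_score(p, r)
```

Recall needs the number of false negatives, which on a live crawl means knowing every product page that was missed; that is presumably why it is listed as difficult to calculate for unknown data.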

Fields to Collect
● Product Name
● Product Price
● Product Code
○ UPC, EAN, ISBN, ASIN
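The slides do not say whether extracted product codes are validated. As one possible sanity check, UPC-A and EAN-13 codes carry a standard modulo-10 check digit, which can be verified as sketched below. The function names are hypothetical, and ASINs are not covered because they have no public check-digit scheme.

```python
def gtin_check_digit(body: str) -> int:
    """Check digit for a UPC-A (11-digit body) or EAN-13 (12-digit body)."""
    total = 0
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_upc_or_ean(code: str) -> bool:
    """True if code is 12 or 13 ASCII digits with a correct check digit."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (12, 13):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])
```

A candidate string that fails this check is more likely a phone number, order number, or other digit run than a real product code.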

Getting Product Name (Product Page)
● Potential names
● <title>Pebble Smart Watch for Select Apple and Android Devices 301RD - Best Buy</title>
● Match found
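One way to read this slide is that candidate names found on the page are compared against the `<title>` text and a candidate that appears there is accepted. The sketch below assumes exactly that, and additionally assumes the longest matching candidate wins; how candidates are gathered is not described in the slides, so the `candidates` list and all function names are hypothetical.

```python
import re
from typing import List, Optional

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def _norm(text: str) -> str:
    """Collapse whitespace and lowercase for a forgiving comparison."""
    return " ".join(text.split()).lower()


def page_title(html: str) -> str:
    """Return the text inside the first <title> element, or an empty string."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else ""


def pick_product_name(candidates: List[str], html: str) -> Optional[str]:
    """Return the longest candidate that occurs in the page title, if any."""
    title = _norm(page_title(html))
    matches = [c for c in candidates if _norm(c) and _norm(c) in title]
    return max(matches, key=len) if matches else None


html = ("<title>Pebble Smart Watch for Select Apple and Android Devices 301RD -\n"
        "Best Buy</title>")
# Hypothetical candidates, e.g. text taken from prominent headings.
candidates = [
    "Best Buy",
    "Pebble Smart Watch for Select Apple and Android Devices",
    "Customer Reviews",
]
name = pick_product_name(candidates, html)
```

Preferring the longest match keeps short strings such as the store name from beating the full product name.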

Getting Product Price (Product Page)
● Price values with text: discard
● Old price: discard
● Current price: accept
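The three rules on this slide can be sketched as a simple filter. Two assumptions are made here that the slides do not confirm: "price values with text" is taken to mean a price embedded in a longer string, and an old price is taken to be one rendered with strikethrough markup. The `PriceCandidate` type, its `struck_through` flag, and the dollar-only pattern are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches a string that is nothing but a dollar price, e.g. "$1,299.00".
PRICE_ONLY = re.compile(r"\$\s?(\d{1,3}(?:,\d{3})*|\d+)(\.\d{2})?")


@dataclass
class PriceCandidate:
    text: str                     # visible text of the page element
    struck_through: bool = False  # assumed signal for an old price


def current_price(candidates: List[PriceCandidate]) -> Optional[float]:
    """Return the first candidate that is a bare, non-struck price."""
    for candidate in candidates:
        match = PRICE_ONLY.fullmatch(candidate.text.strip())
        if match is None:
            continue  # price value mixed with other text: discard
        if candidate.struck_through:
            continue  # old price: discard
        digits = match.group(0).lstrip("$").strip().replace(",", "")
        return float(digits)  # current price: accept
    return None


candidates = [
    PriceCandidate("Save $20.00 today"),
    PriceCandidate("$169.99", struck_through=True),
    PriceCandidate("$149.99"),
]
price = current_price(candidates)
```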

Future Work
● Improve classification accuracy
● Increase/improve collection of data fields

Questions?
https://www.datafiniti.net
http://blog.datafiniti.net
@datafiniti

Page Features
● Price
● Image(s)
● # clickable images adjacent to price values
● "Add to cart", "Buy" widget
● # words in page text
● Keywords
○ Product detail, specifications, features, size, color, weight, shipping, availability, SKU, UPC, ISBN, ASIN
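A few of these features can be computed from page text alone, as in the sketch below. It is a rough illustration, not the extractor used in the talk: the regular expressions, the feature names, and the sample text are hypothetical, and the image-based features are left out because they need the rendered page or its DOM.

```python
import re
from typing import Dict

# Keyword list taken from the slide, lowercased.
KEYWORDS = [
    "product detail", "specifications", "features", "size", "color", "weight",
    "shipping", "availability", "sku", "upc", "isbn", "asin",
]
PRICE_RE = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?")
BUY_RE = re.compile(r"\b(add to cart|buy)\b", re.IGNORECASE)
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def extract_features(page_text: str) -> Dict[str, float]:
    """Build a small text-only feature dictionary for one page."""
    lowered = page_text.lower()
    keyword_hits = sum(
        1 for k in KEYWORDS
        if re.search(r"\b" + re.escape(k) + r"\b", lowered)
    )
    return {
        "num_prices": float(len(PRICE_RE.findall(page_text))),
        "has_buy_widget": 1.0 if BUY_RE.search(page_text) else 0.0,
        "num_words": float(len(WORD_RE.findall(lowered))),
        "num_keywords": float(keyword_hits),
    }


# Hypothetical page text.
sample = ("Pebble Smart Watch $149.99 Add to Cart Free shipping "
          "Specifications: weight 1.3 oz, color red, UPC listed")
features = extract_features(sample)
```

The resulting values would still need to be normalized before being fed to the network's input layer.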

Classification Intuition
[Annotated product page: product images, related products, price, widget to buy, shipping info]


