This document summarizes a talk on web crawling and text analytics techniques for extracting useful information from unstructured data sources. It discusses tools for web crawling like Import.io and webscraper.io. It also covers text analytics approaches like part-of-speech tagging, named entity recognition, and sentiment analysis. A case study is presented on how these methods were used to analyze customer reviews for a restaurant and provide insights. The talk concludes with predictions for the data science industry and advice for starting a data science company.
IIML Talk_23012016
1. A talk delivered at
IIM Lucknow
24/01/2016
EXTRACT. LOAD. VISUALIZE
Becoming a data ninja
4. • INTRODUCTION
• WEB CRAWLING – THE DESIGN
• DIY – WEB CRAWLING
• TEXT ANALYTICS – THE DESIGN
• CASE STUDY
• DIY – TEXT ANALYTICS
• SOME UNSOLICITED ADVICE? – BUILDING YOUR FIRM
WE WILL TALK ABOUT…
5. Who are we?
We are a data science company, founded in 2009, with a special interest in making the
world an intelligent place to live in.
We identify data and bring it to light, making it visible, cohesive, comparable and easy to
understand so that it really does support YOU in making the right decisions.
Who am I?
I am a Practice Lead at JSM for Natural Language Processing & Machine Learning. I have
architected text analytics solutions for several industries, including finance,
healthcare, food & beverages and hospitality.
6. AREAS WE WORK ON
PHARMA
Sales Pitch Analysis
RETAIL
Predictive + IoT
FINANCE
Competitive
Intelligence
F&B
Customer Insights
MR
Scoping and Product
Evaluation
SaaS
NLP, ML, Text
8. TOOLS WE PLAY WITH
OPEN SOURCE
Inexpensive
DATABASES
Fast & Scalable
INSIGHTS
Python, R
TECHNIQUES
Latest yet tested
VISUALIZATIONS
D3, GCharts, Tableau
MANAGEMENT
Basecamp
12. • HTML pages are like textbooks – content, titles, subtitles, paragraphs and so on
• JavaScript adds interactivity to the HTML pages
HTML PAGES
13. A crawler is a program that visits Web sites and reads their pages and other information in order to create
entries for a search engine index.
The major search engines on the Web all have such a program, which is also known as a "spider" or a "bot."
OVERVIEW– WEB CRAWLING
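The crawler described on this slide can be illustrated with a minimal Python sketch. It is not the tooling shown in the talk: the `crawl` function, the `LinkParser` class and the in-memory `PAGES` site are hypothetical names chosen for illustration, and the page fetcher is passed in so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl. `fetch` maps a URL to its HTML, or None if missing."""
    seen = {start_url}
    queue = deque([start_url])
    index = {}  # url -> outgoing links, a stand-in for a search index entry
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        # Resolve relative links against the page they were found on.
        links = [urljoin(url, href) for href in parser.links]
        index[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical in-memory "site" (assumption, for illustration only).
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

index = crawl("http://example.test/", PAGES.get)
```

Replacing `PAGES.get` with a function that downloads real pages would turn this into a working spider. The `seen` set is what stops the crawler from revisiting pages that link to each other.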
17. • BEST of the lot
• Gives great flexibility – just click and extract
• Most of the sites are compatible
• Easy CSV/Google Docs Export
• Provides APIs for regular data updates
• Low training time
INTRODUCTION
18. USE CASES
You want to monitor feedback
http://www.consumercomplaints.in/snapdeal-com-b100038
22. • Works where Import.io fails
• A bit buggy, but does a good job of providing flexible choices for data extraction
• Most of the sites are compatible
• Easy CSV Export
• NO APIs for regular data updates
• Moderate learning curve
INTRODUCTION
24. • Don’t scrape too fast or you will get banned
• Respect robots.txt
• Extract only what you need
• Don’t overload their servers
• Don’t take data that’s not yours – only data in the public domain
PRECAUTIONS
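The first two precautions can be sketched with Python's standard `urllib.robotparser` module. This is a minimal illustration, not the setup used in the talk: the `ROBOTS_TXT` content, the `demo-bot` agent name and the helper functions are assumptions, and the rules are parsed from a string so the example runs offline.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (assumption); a real crawler would
# download this file from the target site before crawling it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="demo-bot"):
    """True if robots.txt permits this agent to fetch the URL."""
    return robots.can_fetch(agent, url)


def polite_delay(agent="demo-bot", default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay, else a default."""
    delay = robots.crawl_delay(agent)
    return float(delay) if delay is not None else default


def fetch_all(urls, fetch, sleep=time.sleep):
    """Fetch only permitted URLs, pausing after each request."""
    results = {}
    for url in urls:
        if not allowed(url):
            continue  # respect robots.txt
        results[url] = fetch(url)
        sleep(polite_delay())  # don't scrape too fast
    return results
```

Passing `sleep` as a parameter keeps the pause explicit and makes the function easy to test without actually waiting.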
26. As a brand owner with significant investments in social media, the
usual questions you might have in mind…
• Is the brand exuding the attributes I intended?
• Is my internet presence helping me?
• Can I measure my ROI for the money I spent?
• What are the measurable metrics for effective social media
management?
• When can I exploit emerging trends for my brand?
• How can I understand my customers better?
WHAT WOULD YOU WANT?
27. WHAT WOULD YOU WANT TO TARGET?
TRANSACTIONAL CONVERSATIONS
Users talk about current events, share cat videos and engage in trivial gossip.
INFORMATIONAL CONVERSATIONS
Users engage with the brand to air appreciation or complaints.
29. For a brand, say X, there would be thousands of conversations on blogs,
websites and social media revolving around X.
Several hundred blog posts are published that contain the keyword “X”. This
company needs to know:
- How many of these posts are relevant and actually expressing opinions
about X?
- How many relevant posts are negative, and how many are positive?
- What particular aspects and features of X are being praised or criticized?
- For all of the above, what is the trend for the past few time periods?
- How is my brand perceived across demographics?
- How is my brand faring against my competitors?
QUESTIONS TO ANSWER
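The first two questions (which posts are relevant, and how many are positive or negative) can be illustrated with a toy Python sketch. The talk does not say which method was used here, so this swaps in a simple word-list approach. The `POSITIVE` and `NEGATIVE` lexicons, the sample posts and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicons (assumption); a production system would use
# a trained model or a far larger lexicon.
POSITIVE = {"great", "love", "excellent", "friendly"}
NEGATIVE = {"slow", "rude", "cold", "terrible"}


def is_relevant(post, brand):
    """A post is relevant if it mentions the brand as a whole word."""
    pattern = r"\b" + re.escape(brand) + r"\b"
    return re.search(pattern, post, re.IGNORECASE) is not None


def polarity(post):
    """Label a post by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", post.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(posts, brand):
    """Count relevant posts by polarity."""
    return Counter(polarity(p) for p in posts if is_relevant(p, brand))


posts = [
    "Love the new menu at X, the staff were friendly.",
    "Service at X was slow and the soup was cold.",
    "X opened a new outlet downtown.",
    "Nothing to do with the brand at all.",
]
summary = summarize(posts, "X")
```

Word counting of this kind misses negation and sarcasm, which is one reason the remaining questions (aspects, trends, demographics) call for the fuller pipeline covered later in the talk.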
30. BRIEF TERMINOLOGY
MACHINE LEARNING
You build an algorithm, the machine learns patterns, the machine predicts; rinse and repeat.
TEXT ANALYTICS
Analyze unstructured text, assign structure to it, and load it into a BI tool or program to visualize.
32. PROBLEM STATEMENT
The client had thousands of customer reviews which they wanted to analyse to
understand customer feedback and identify improvement opportunities.
The broad questions we focused on:
• What did they say about the restaurant?
• Keywords & topics of discussion across the comments
• What elements of the restaurant would they want improved? – service, staff behaviour, ambience etc.
• When did the customer visit the store? How is the client’s traffic distributed over time?
• Ticket sizes across multiple customer dimensions – age, gender, ratings, location, time of visit etc.
• Overall customer sentiments & views about UCH
PRIMARY FOCUS AREAS SECONDARY FOCUS AREAS
34. APPROACH
• Extract data and validate
• Corpus from social media
• Tokenise and remove stop words
• Initiate ML models, NER, parsers & topic algorithms
• Initiate detection rules for topics, keywords, gender, sentiment and multi-word concept detection
• Final output
• Part-of-Speech (POS) tagger
PRE-PROCESSING, PARSING & ANALYSIS, OUTPUT
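The "tokenise and remove stop words" step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline from the talk: the `STOP_WORDS` list is a tiny hypothetical sample, the tokeniser is a simple regular expression, and the review texts are invented.

```python
import re
from collections import Counter

# Tiny hypothetical stop-word list (assumption); NLP libraries such as
# NLTK or spaCy provide fuller lists.
STOP_WORDS = {"the", "a", "an", "was", "and", "but", "is", "were", "of", "to", "it"}


def tokenise(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def top_keywords(reviews, n=3):
    """Most frequent remaining tokens across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(remove_stop_words(tokenise(review)))
    return counts.most_common(n)


reviews = [
    "The service was slow but the food was great.",
    "Great ambience and great service.",
]
tokens = remove_stop_words(tokenise(reviews[0]))
keywords = top_keywords(reviews, n=2)
```

The cleaned tokens are what later stages, such as POS tagging, NER and topic detection, would typically consume.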
35.
Author       | Value             | Type    | Sentiment
Duncan Riley | Samsung Galaxy S6 | Entity  | Positive
Duncan Riley | Apple             | Entity  | Negative
Duncan Riley | LoopPay           | Entity  | Neutral
Duncan Riley | mobile payments   | Keyword | Positive
Duncan Riley | point of sales    | Keyword | Positive
Structured data extracted from free-flowing text is easy to use with existing reporting and business intelligence software.
Insights from the final reports can now be used for decision-making by the PR firm and their client.
Actual blog post parsed through our SmartText Engine
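To show why structured rows are easy for reporting tools to consume, here is a minimal Python sketch that writes the rows from this slide to CSV, a format most BI software can import. The SmartText Engine's actual output format is not described in the talk. The `to_csv` helper and the choice of CSV are assumptions.

```python
import csv
import io

FIELDS = ["Author", "Value", "Type", "Sentiment"]

# Rows copied from the slide.
rows = [
    {"Author": "Duncan Riley", "Value": "Samsung Galaxy S6", "Type": "Entity", "Sentiment": "Positive"},
    {"Author": "Duncan Riley", "Value": "Apple", "Type": "Entity", "Sentiment": "Negative"},
    {"Author": "Duncan Riley", "Value": "LoopPay", "Type": "Entity", "Sentiment": "Neutral"},
    {"Author": "Duncan Riley", "Value": "mobile payments", "Type": "Keyword", "Sentiment": "Positive"},
    {"Author": "Duncan Riley", "Value": "point of sales", "Type": "Keyword", "Sentiment": "Positive"},
]


def to_csv(records):
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows)

# Once the text is structured, questions become simple filters.
positive_keywords = [
    r["Value"] for r in rows
    if r["Type"] == "Keyword" and r["Sentiment"] == "Positive"
]
```

The same filter pattern extends to the questions on the earlier slides, for example counting negative entity mentions per author or per time period.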
56. • Purse strings WILL be tightened
• Fewer ‘unicorn-level’ valuations
• No investment without revenue or clients
• MVP with traction – a must have
• Acquire or be acquired – or be an acquisition target
• Needs > Wants – solve a problem, but scope it out
• Bootstrapped startups will win, not the valuation-hungry
PREDICTIONS
57. Don't sell what you do - disguise it
Ex. Create marketing strategy, not analytics/Tableau
Nugget 1
58. A product that promotes laziness or transfers laziness from
seller to buyer will sell
Nugget 2
59. Is it something that Google/FB provides or can do? Even if it's
partial, danger is real.
Nugget 3
60. The product should be IP-able, scalable and monetizable -
at least 2 of the 3 must be met
Nugget 4