BigData Use-cases
- Prepared by
- Vishal Shukla
- Pranav Shukla
- Krishna Meet
Brevitaz Overview
● Founded in 2014
● Small team of technocrats delivering Big Data Solutions
● Global client-base in Europe and Asia-pacific region
● Expertise
○ Full-text search
○ Real-time analytics
○ Log analytics
○ BigData analytics
○ IoT based solutions
○ Machine learning
○ BigData warehousing
● Technologies
○ Spark, Hadoop, Kafka, Flume, Storm
○ Elasticsearch, Logstash, Kibana
○ MongoDB, Cassandra, HBase, Titan
○ Impala, Spark SQL, HAWQ
○ Java & Spring stack, Typesafe stack (Scala, Akka, Spray, Slick)
○ AngularJS
Agenda
➔ Big Data & Analytics
➔ Full-text search
➔ Log analytics
➔ Big Data Analytics
➔ Real-time Analytics
➔ IoT Analytics
➔ Machine Learning on BigData
➔ Big Data Warehousing
➔ Big Data is for Everyone
Analytics
Data is growing!
Full-text Search
Spot the right data quickly
“It’s all about being able to spot the Right Information at the Right Time”
◎ Relevance search in near real-time
○ Find results matching “iphone”. Please don’t show me
iPhone chargers on the first page.
◎ Fuzzy search and search suggestions
○ Find results matching “iphne”
◎ Faceted search
○ Filters on Amazon after searching for a keyword
◎ Complex search with multiple criteria
○ Find me products matching “iphone” within the price range
30,000 INR to 50,000 INR and color “Space grey”
◎ Geo-spatial search
○ Find restaurants within a 10 km radius of my current
location. And yes, I want to see the closer ones on top.
Full-text search - What is it?
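Fuzzy search is what lets “iphne” still find “iphone”. The usual measure behind it is edit (Levenshtein) distance: the number of single-character insertions, deletions or substitutions needed to turn one string into another. The Python sketch below assumes a small, hypothetical in-memory vocabulary; search engines such as Elasticsearch compute fuzzy matches with far more efficient index structures, so this only illustrates the idea.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) character
        prev = curr
    return prev[-1]


def suggest(query: str, vocabulary: list, max_edits: int = 2) -> list:
    """Return vocabulary terms within max_edits of the query, closest first."""
    scored = [(levenshtein(query.lower(), term.lower()), term) for term in vocabulary]
    return [term for dist, term in sorted(scored) if dist <= max_edits]


# Hypothetical product vocabulary.
vocabulary = ["iphone", "iphone charger", "ipad", "phone case"]
print(suggest("iphne", vocabulary))  # ['iphone']
```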
What are you talking about? I am here for
BigData!
Elasticsearch does all of this at massive
volume, variety and velocity
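As an illustration only, the complex and geo-spatial searches listed earlier could be written as Elasticsearch Query DSL request bodies. This Python sketch just builds the JSON; the field names (`name`, `price`, `color`, `location`) and the coordinates are hypothetical, `color` is assumed to be mapped as a `keyword` field and `location` as a `geo_point`, and sending the bodies to a `_search` endpoint is left to whichever client you use.

```python
import json


def product_search(text: str, min_price: int, max_price: int, color: str) -> dict:
    """Full-text match on 'name', filtered by price range and exact color (hypothetical fields)."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause; "fuzziness": "AUTO" also tolerates typos such as "iphne".
                "must": [{"match": {"name": {"query": text, "fuzziness": "AUTO"}}}],
                # Non-scoring filters: price range and exact color.
                "filter": [
                    {"range": {"price": {"gte": min_price, "lte": max_price}}},
                    {"term": {"color": color}},
                ],
            }
        }
    }


def nearby_restaurants(lat: float, lon: float, radius_km: int) -> dict:
    """Documents within radius_km of (lat, lon), closest first (hypothetical 'location' field)."""
    origin = {"lat": lat, "lon": lon}
    return {
        "query": {
            "bool": {
                "filter": [{"geo_distance": {"distance": f"{radius_km}km", "location": origin}}]
            }
        },
        "sort": [{"_geo_distance": {"location": origin, "order": "asc", "unit": "km"}}],
    }


print(json.dumps(product_search("iphone", 30000, 50000, "Space grey"), indent=2))
print(json.dumps(nearby_restaurants(23.0225, 72.5714, 10), indent=2))
```

Putting the price and color conditions under `filter` rather than `must` keeps them out of relevance scoring, so only the text match affects ranking.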
◎ Crawl third-party websites
◎ Aggregate and classify the data
◎ Develop custom applications on top of the classified
data
Use-case - Information Aggregator
◎ Google’s “Did you mean?”
◎ Search suggestions as you type
◎ Text analytics
◎ McGraw-Hill - Transform textbooks into digital
learning resources
◎ SoundCloud - Let users quickly find music that interests
them
Other use-cases
Log Analytics
Collect, analyze and improve
“Transform your dumb logs into actionable insights”
● Use machine-generated logs to get operational
insights
● Logs from sensors, application servers, web servers or any
IoT device
To interactively answer questions like...
◎ How many users signed up this week?
◎ How are users using your website / mobile app?
◎ How successful is our advertising campaign?
◎ Why is the database slow?
◎ Which website categories is my team
spending the most time on?
◎ Which employees are most likely to resign next?
Log Analytics - What is it?
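To make a question like “How many users signed up this week?” concrete, here is a minimal sketch assuming a hypothetical plain-text log format, `<ISO-8601 timestamp> <EVENT> user=<id>`, and hypothetical log lines. In practice a tool such as Logstash would parse and ship the logs and Kibana would chart the result; this only shows the parse, filter and count steps.

```python
from datetime import datetime, timedelta

# Hypothetical log format: "<ISO-8601 timestamp> <EVENT> user=<id>"
LOG_LINES = [
    "2016-03-07T09:15:00 SIGNUP user=alice",
    "2016-03-07T09:20:00 LOGIN user=alice",
    "2016-03-08T11:02:00 SIGNUP user=bob",
    "2016-02-28T17:45:00 SIGNUP user=carol",
]


def parse(line):
    """Split one log line into (timestamp, event, user id)."""
    timestamp, event, user_field = line.split()
    return datetime.fromisoformat(timestamp), event, user_field.split("=", 1)[1]


def signups_since(lines, start):
    """Count distinct users with a SIGNUP event at or after `start`."""
    users = set()
    for line in lines:
        ts, event, user = parse(line)
        if event == "SIGNUP" and ts >= start:
            users.add(user)
    return len(users)


now = datetime(2016, 3, 10)
print(signups_since(LOG_LINES, now - timedelta(days=7)))  # 2
```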
What’s the big deal?
Use-case - Network Logs Analysis
◎ High velocity
◎ High volume
◎ Collect, analyze and improve
◎ Analyze clickstream data to provide
personalized offers and user experience
◎ Interactive drill-down analysis
◎ Compliance reporting through interactive
dashboards
◎ Real-time alerts on invalid login attempts
◎ Detect outages
◎ Multi-channel funnel reporting for your
advertising campaigns to find out which
channels contribute most to conversions
Other use-cases
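Real-time alerts on invalid login attempts are typically driven by a sliding-window counter per source. A minimal sketch, assuming failed-login events arrive in timestamp order; the threshold and window length are hypothetical tuning parameters, not recommended values.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Alert when one source IP fails to log in too often within a sliding time window."""

    def __init__(self, max_failures: int = 5, window_seconds: int = 60):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip: str, timestamp: float) -> bool:
        """Record a failed login; return True if this IP should trigger an alert."""
        window = self.failures[ip]
        window.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) >= self.max_failures


monitor = FailedLoginMonitor(max_failures=3, window_seconds=60)
for t in (0, 10, 20):
    alert = monitor.record_failure("10.0.0.7", t)
print(alert)  # True: 3 failures within 60 seconds
```

Keeping only a short deque per IP means memory stays bounded by the window size, which matters when the log stream is high-velocity.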
BigData Analytics
Make your data speak
“Combine all sources of data to uncover hidden patterns and unknown relations in your data”
● Take your transactional data from various
sources
● Take operational and user-behaviour log data
● Collect social data
● Combine the data collected from all these sources
To interactively answer questions like...
◎ How have sales increased or decreased over the
years?
◎ How many unique customers were acquired this
year?
◎ Which products are trending disproportionately
this year?
Big Data Analytics - What is it?
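The first two questions reduce to grouping the combined transaction records by year. A minimal in-memory sketch over hypothetical `(year, customer_id, amount)` records; at Big Data scale the same aggregations would typically run in an engine such as Spark SQL or Impala. “Acquired” is taken here to mean the year of a customer's first purchase, which is an assumption.

```python
from collections import defaultdict

# Hypothetical combined transaction records: (year, customer_id, amount)
TRANSACTIONS = [
    (2014, "c1", 120.0), (2014, "c2", 80.0),
    (2015, "c1", 200.0), (2015, "c3", 150.0),
    (2016, "c2", 90.0), (2016, "c4", 60.0), (2016, "c5", 40.0),
]


def sales_by_year(transactions):
    """Total sales per year."""
    totals = defaultdict(float)
    for year, _, amount in transactions:
        totals[year] += amount
    return dict(sorted(totals.items()))


def year_over_year_change(totals):
    """Percentage change in sales versus the previous year."""
    years = sorted(totals)
    return {y: 100.0 * (totals[y] - totals[p]) / totals[p] for p, y in zip(years, years[1:])}


def new_customers_by_year(transactions):
    """Count customers whose first purchase falls in each year."""
    first_seen = {}
    for year, customer, _ in sorted(transactions):
        first_seen.setdefault(customer, year)
    counts = defaultdict(int)
    for year in first_seen.values():
        counts[year] += 1
    return dict(sorted(counts.items()))


print(year_over_year_change(sales_by_year(TRANSACTIONS)))
print(new_customers_by_year(TRANSACTIONS))  # {2014: 2, 2015: 1, 2016: 2}
```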
Use-case - Supply chain management
◎ RFID labels can indicate which product is where
at what time
◎ Get more accurate business insights
◎ Theft detection
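Each RFID read can be treated as an event `(time, tag, reader location)`. In this minimal sketch over hypothetical reads, the last-known location of each tag answers “which product is where”. A deliberately simple, hypothetical theft rule flags tags that reach an exit reader without passing a checkout reader. Real deployments would also have to cope with missed and duplicate reads.

```python
# Hypothetical RFID read events: (timestamp, tag_id, reader_location)
READS = [
    (1, "tag-A", "warehouse"),
    (2, "tag-B", "warehouse"),
    (5, "tag-A", "shelf-3"),
    (7, "tag-A", "checkout"),
    (8, "tag-A", "exit"),
    (9, "tag-B", "exit"),
]


def last_known_locations(reads):
    """Map each tag to the location of its most recent read."""
    latest = {}
    for _, tag, location in sorted(reads):
        latest[tag] = location
    return latest


def possible_thefts(reads):
    """Tags seen at an exit without ever passing a checkout reader (hypothetical rule)."""
    checked_out, flagged = set(), []
    for _, tag, location in sorted(reads):
        if location == "checkout":
            checked_out.add(tag)
        elif location == "exit" and tag not in checked_out:
            flagged.append(tag)
    return flagged


print(last_known_locations(READS))  # {'tag-A': 'exit', 'tag-B': 'exit'}
print(possible_thefts(READS))       # ['tag-B']
```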
◎ Social media sentiment analysis to get end-user
feedback on launched products
◎ Identify market trends
◎ Predict employee attrition
◎ Customer churn analysis
◎ Influencer analysis
◎ Lead generation
◎ Proactive issue monitoring
◎ For insurance companies, identify potential
customers by combining birth, marriage and
health data
Other use-cases
Real-time Analytics
Analyse instantaneously as you collect data
“Lag of seconds can make a fraudster and you”
● Ingest streaming data, possibly at high velocity
● Analyse and react immediately
To solve problems like...
◎ Identify changing trends in real-time
◎ Detect fraud
◎ Analyse policy violations and react immediately
◎ Reduce downtime
◎ Make better and quicker business decisions
Real-time Analytics - What is it?
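“Ingest streaming data and react immediately” means deciding on each event as it arrives, with only a little state per key. A minimal fraud-detection sketch, assuming transactions arrive one at a time: it keeps an exponential moving average of each card's spend and flags an amount far above it. The smoothing factor, threshold and warm-up count are hypothetical; a production system would run logic like this on a stream processor such as Spark Streaming or Storm.

```python
class SpendAnomalyDetector:
    """Flag card transactions far above that card's recent typical spend (O(1) state per card)."""

    def __init__(self, alpha: float = 0.3, threshold: float = 3.0, warmup: int = 3):
        self.alpha = alpha          # weight of the newest amount in the moving average
        self.threshold = threshold  # flag if amount > threshold * moving average
        self.warmup = warmup        # normal transactions to observe before flagging
        self.state = {}             # card_id -> (moving_average, count)

    def process(self, card_id: str, amount: float) -> bool:
        avg, count = self.state.get(card_id, (amount, 0))
        suspicious = count >= self.warmup and amount > self.threshold * avg
        if not suspicious:
            # Learn only from normal-looking transactions so one fraud does not inflate the baseline.
            avg = self.alpha * amount + (1 - self.alpha) * avg
            count += 1
        self.state[card_id] = (avg, count)
        return suspicious


detector = SpendAnomalyDetector()
stream = [("card-1", 40.0), ("card-1", 55.0), ("card-1", 45.0), ("card-1", 50.0), ("card-1", 900.0)]
flags = [detector.process(card, amount) for card, amount in stream]
print(flags)  # [False, False, False, False, True]
```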
Use-case - Enrich Customer Experience
◎ Get real-time feeds about customer location or
products being browsed
◎ Combine with historical user behaviours
◎ Roll out offers in real-time
◎ Hospitality industry
○ Bad weather reduces travel, which then
reduces overnight lodging
○ Combine weather data with flight
cancellations to identify stranded travellers
○ Offer coupons for nearby hotels
Other use-cases
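The hospitality example joins two feeds and then ranks by distance. This minimal sketch uses entirely hypothetical data. It finds travellers booked on cancelled flights, then ranks hotels by great-circle (haversine) distance from the airport so coupons go to the closest ones. The airport and hotel coordinates and the 5 km radius are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical feeds.
cancelled_flights = {"XY101"}  # e.g. derived from weather and flight-status data
bookings = [("alice", "XY101"), ("bob", "XY202"), ("carol", "XY101")]
airport = (19.0896, 72.8656)
hotels = {"Hotel North": (19.10, 72.87), "Hotel Far": (19.40, 73.20), "Hotel Near": (19.09, 72.86)}


def stranded_travellers(bookings, cancelled):
    """Travellers whose booked flight has been cancelled."""
    return [name for name, flight in bookings if flight in cancelled]


def nearby_hotels(origin, hotels, radius_km):
    """Hotels within radius_km of origin, closest first."""
    distances = {name: haversine_km(*origin, *pos) for name, pos in hotels.items()}
    return sorted((n for n, d in distances.items() if d <= radius_km), key=distances.get)


print(stranded_travellers(bookings, cancelled_flights))  # ['alice', 'carol']
print(nearby_hotels(airport, hotels, radius_km=5))       # ['Hotel Near', 'Hotel North']
```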
◎ Fraud detection
◎ Predict customer needs and enrich their experience based on
location and lifestyle
◎ Real-time process visibility across an enterprise
◎ Suggest optimal routes based on current traffic
data
◎ Get player performance metrics in real-time to
substitute players at the right time
Other use-cases
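“Suggest optimal routes based on current traffic data” is a shortest-path search whose edge weights are live travel times. The route is recomputed whenever the traffic feed changes. The sketch below uses Dijkstra's algorithm with Python's `heapq` on a small hypothetical road graph.

```python
import heapq


def fastest_route(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbour: current travel time in minutes}."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        time, node = heapq.heappop(queue)
        if node == target:
            break
        if time > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, minutes in graph.get(node, {}).items():
            candidate = time + minutes
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = node
                heapq.heappush(queue, (candidate, neighbour))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Hypothetical road network with travel times from a live traffic feed.
traffic = {
    "A": {"B": 10, "C": 4},
    "B": {"D": 5},
    "C": {"B": 3, "D": 15},
    "D": {},
}
print(fastest_route(traffic, "A", "D"))  # (['A', 'C', 'B', 'D'], 12.0)

# Congestion update: the C -> B road slows down, so the suggested route changes.
traffic["C"]["B"] = 20
print(fastest_route(traffic, "A", "D"))  # (['A', 'B', 'D'], 15.0)
```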
IoT Analytics
Let machines communicate
● Use sensors to detect low-level data
● Report the captured data to a server
● Analyse it and respond to the user
To provide smart alerts and suggestions like...
◎ Schedule maintenance of machines
◎ Your pulse rate is disproportionately increasing
◎ Medicines manufactured in a batch are not
complying with standards
IoT-based Smart Solutions - What is it?
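An alert such as “Your pulse rate is disproportionately increasing” can be as simple as comparing the latest sensor readings against a baseline. In this minimal sketch, the baseline is the mean of the earliest readings. The window sizes and the 25% threshold are hypothetical and are not medical guidance.

```python
from statistics import mean


def pulse_alert(readings, baseline_size=5, recent_size=3, max_increase=0.25):
    """True if the mean of the latest readings exceeds the baseline mean by more than max_increase."""
    if len(readings) < baseline_size + recent_size:
        return False  # not enough data yet
    baseline = mean(readings[:baseline_size])
    recent = mean(readings[-recent_size:])
    return (recent - baseline) / baseline > max_increase


resting = [72, 70, 74, 71, 73, 72, 75, 74]
rising = [72, 70, 74, 71, 73, 88, 95, 101]
print(pulse_alert(resting), pulse_alert(rising))  # False True
```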
Use-case
◎ Performance measurement & maintenance
schedule
DIAGRAM
◎ In agriculture, sensors can detect crop health
along with geo data, so alerts can be sent to
farmers about where they need to focus
◎ In retail, smart shelves can detect and send
alerts on when to replenish
◎ A smart home can analyze the patterns of each
family member and optimize energy usage
Other use-cases
Machine Learning on BigData
Make the machines learn from data
What is machine learning?
◎ Machine learning is not explicitly programming a machine to
do things
◎ Machine learning is making the machine learn and
adapt based on observed data
Where is machine learning used?
● Identify similarities between products or users
● Predict values from past data
● Classify items into categories, for example whether an email is spam
or not spam
in order to ...
◎ Predict expected outcome
◎ Categorize large amounts of data
◎ Optimize algorithms or paths
◎ Find similarities
◎ Improve quality of predictions continuously
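For the spam / not-spam classification example, a common starting point is a naive Bayes classifier. This is a minimal multinomial naive Bayes with add-one (Laplace) smoothing, in plain Python, trained on a hypothetical four-message dataset. Libraries such as Spark MLlib provide scalable implementations; this sketch only shows how the probabilities combine.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, messages, labels):
        self.classes = set(labels)
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(messages, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # log P(class) + sum of log P(word | class); smoothing keeps unseen words from zeroing it.
            score = math.log(self.class_counts[c] / total_docs)
            for w in words:
                score += math.log((self.word_counts[c][w] + 1) / (total_words + len(self.vocabulary)))
            scores[c] = score
        return max(scores, key=scores.get)


# Hypothetical training data.
train = ["win cash prize now", "cheap prize offer win", "meeting agenda for monday", "project report attached"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayesSpamFilter().fit(train, labels)
print(model.predict("win a cash prize"))        # spam
print(model.predict("monday project meeting"))  # ham
```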
“Recommending the right products makes the difference between selling and not selling a product”
Use-case - Recommending Products
◎ Compare thousands of
users/products with each other
to find similar “clusters”
◎ Content-based filtering -
Recommend similar products
to what customer has already
bought
◎ Collaborative filtering -
Find customers similar to the
current customer and
recommend what they
have bought
◎ Apply clustering algorithms
from machine learning on
Big Data
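The idea of finding customers similar to the current one and recommending what they bought is usually called user-based collaborative filtering. A minimal sketch using Jaccard similarity between hypothetical purchase sets; at scale, libraries such as Spark MLlib offer matrix-factorisation approaches (ALS) instead, and clustering customers first is another option.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two purchase sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases: dict, customer: str, k: int = 2, top_n: int = 3) -> list:
    """Recommend items bought by the k most similar customers but not yet by `customer`."""
    mine = purchases[customer]
    neighbours = sorted(
        ((jaccard(mine, items), other) for other, items in purchases.items() if other != customer),
        reverse=True,
    )[:k]
    scores = {}
    for similarity, other in neighbours:
        for item in purchases[other] - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical purchase history.
purchases = {
    "asha": {"phone", "case", "charger"},
    "ravi": {"phone", "case", "headphones"},
    "meera": {"phone", "charger", "power bank"},
    "john": {"novel", "lamp"},
}
print(recommend(purchases, "asha"))  # ['headphones', 'power bank']
```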
Use-case - Optimise team combination in Sports
◎ Choose the best-performing team on a limited
budget
◎ It was first applied in baseball; now many
professional sports use these techniques
◎ Choose a team of players who could
win at least enough games to make it to the
play-offs
◎ Use data analysis techniques to find undervalued
players
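Choosing the best team within a budget can be framed as a 0/1 knapsack problem. Each player has a cost and an estimated contribution (for example, projected wins), and the goal is to maximise total contribution without exceeding the budget. A minimal dynamic-programming sketch with hypothetical players, integer costs in $M and made-up projections; real analyses use much richer statistics and roster constraints (positions, squad size) that are ignored here.

```python
def pick_team(players, budget):
    """0/1 knapsack: maximise total projected value without exceeding budget (integer costs)."""
    # best[b] = (value, chosen player names) achievable with total cost <= b
    best = [(0.0, [])] * (budget + 1)
    for name, cost, value in players:
        # Iterate budgets downwards so each player is used at most once.
        for b in range(budget, cost - 1, -1):
            candidate = best[b - cost][0] + value
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - cost][1] + [name])
    return best[budget]


# Hypothetical players: (name, cost in $M, projected wins contributed)
players = [
    ("Star hitter", 15, 6.0),
    ("Catcher", 3, 3.5),
    ("On-base specialist", 4, 4.0),
    ("Veteran pitcher", 10, 5.0),
    ("Young pitcher", 5, 4.5),
]
value, team = pick_team(players, budget=20)
print(value, team)  # 13.5 ['On-base specialist', 'Veteran pitcher', 'Young pitcher']
```

In this toy data the expensive star is left out because cheaper players deliver more projected wins per dollar, which is the essence of finding undervalued players.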
Use-case - Sports
What did they achieve?
◎ An average of 90 wins per
season on less than $30M
◎ The same number of wins as
another team on 1/3 of
its budget
◎ 20 more wins than
another team with a similar
budget
Other use-cases
◎ Fraud detection in banking and other sectors
◎ Fine grained customer segmentation for targeted
products
◎ Predict the next product failure and send a
replacement part in advance
◎ Predict the best candidates
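Fine-grained customer segmentation is commonly done with clustering, for example k-means. This is a minimal plain-Python version of Lloyd's algorithm on hypothetical customers described by two features. It initialises with the first k points to stay reproducible; real use would scale the features and use a better initialisation (such as k-means++), for instance through Spark MLlib.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            [sum(coords) / len(cluster) for coords in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (annual spend in thousand INR, store visits per month)
customers = [(5, 1), (60, 8), (6, 2), (4, 1), (58, 9), (62, 7)]
centroids, clusters = kmeans(customers, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(c, 1) for c in centroid], members)
```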
Big Data Warehousing
Catch all that you can, so you can analyze it later
Why modernize Data Warehouse with Big Data?
A traditional Enterprise Data Warehouse (EDW)
◎ Can store only structured data
◎ Has extremely expensive license costs per TB of storage
◎ Is capacity-constrained by ETL and query workloads
Big Data will help to...
◎ Store unstructured and semi-structured data
◎ Combine your structured data with other sources
◎ Run interactive SQL queries on big data
◎ Offload ETL workload from your EDW
◎ Offload less frequently used data from your EDW
◎ Save licensing costs
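Combining structured warehouse data with semi-structured sources and querying it with SQL would, on a real platform, happen in an engine such as Spark SQL or Impala over data in HDFS or cloud storage. This sketch uses Python's built-in `sqlite3` purely as a small stand-in engine. The tables, columns and JSON clickstream format are hypothetical.

```python
import json
import sqlite3

# Structured data, as it might come from the EDW (hypothetical schema).
orders = [(1, "c1", 499.0), (2, "c2", 120.0), (3, "c1", 80.0)]

# Semi-structured clickstream events, e.g. JSON lines from web-server logs (hypothetical format).
click_log = [
    '{"customer": "c1", "page": "/phones", "duration_ms": 5400}',
    '{"customer": "c1", "page": "/cases", "duration_ms": 800}',
    '{"customer": "c2", "page": "/cases"}',
    '{"customer": "c3", "page": "/phones", "duration_ms": 1200}',
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.execute("CREATE TABLE clicks (customer TEXT, page TEXT, duration_ms INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
# Flatten the JSON events into rows; missing fields become NULL.
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?, ?)",
    [(e["customer"], e["page"], e.get("duration_ms")) for e in map(json.loads, click_log)],
)

# Combine both sources: page views and revenue per visitor, including visitors who never ordered.
query = """
SELECT c.customer,
       COUNT(*) AS page_views,
       COALESCE((SELECT SUM(amount) FROM orders o WHERE o.customer = c.customer), 0) AS revenue
FROM clicks c
GROUP BY c.customer
ORDER BY revenue DESC
"""
for row in conn.execute(query):
    print(row)
```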
Use-case - Modernizing Data Warehouse
◎ Low-cost storage for years of data
◎ Data lake for structured, unstructured and
semi-structured data
◎ Interactive queries on historical data
◎ Online archival with reporting
○ Make years of data available
◎ ETL offloading
○ Spark jobs to reduce ETL job time from hours
to minutes
◎ Batch reports offloading
○ Reduce load on your warehouse by
offloading batch reports
◎ Big Data Discovery
○ Proactively find patterns guided by the
system
Other use-cases
But we are just a startup!
“Start small. Then scale.”
Next steps
Try, evaluate and adopt in a risk-free manner
◎ Identify sources of your unused data
○ Server logs
○ Social streams
◎ Collect and store in the cloud to minimize initial
investment
◎ Many cloud options like Amazon EC2,
Databricks, Altiscale...
◎ Use open-source analytics engines like
Elasticsearch and Kibana, which are free to use.
◎ Experience the success
◎ Automate using sensors or IoT devices to add
more sources of useful data
Start small and then scale
◎ https://aws.amazon.com/public-data-sets/
◎ https://data.gov.in/
◎ https://open-data.europa.eu/en/data/
◎ https://www.data.gov/
◎ https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public
Some open datasets to play with
Woo-ha! I am feeling empowered!
Thanks!
Any questions?
Contact Us
@pranavshukla81
http://in.linkedin.com/in/pranavshukla81
pranav.shukla@brevitaz.com
@vishal1shukla2
https://in.linkedin.com/in/vishalshu
vishal.shukla@brevitaz.com