Real-time big data analytics
based on a product recommendations case study
IT Business Solutions B2B Conference
October 2015
© deep.bi
We started as an ad network
The challenge was to recommend
the best product (out of millions)
to the right person in a given moment
(thousands of users within a second)
5 billion ad views delivered in 24 months
To put that scale in context:
if we served 1 ad per second, it would take
160 years
to serve 5 billion ads
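For reference, the arithmetic behind that figure (using 365-day years):
\[
\frac{5\times10^{9}\ \text{ads}}{60\cdot 60\cdot 24\cdot 365\ \text{ads/year}}\approx 158.5\ \text{years}\approx 160\ \text{years}.
\]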
So we needed a solution
SQL databases did not work
Popular NoSQL databases did not work
Standard data warehouse approaches (pre-aggregations, creating schemas) did not work
Rethinking all the problems posed by
huge data streams flowing to us every second,
we built a complete solution
based on open-source technologies
and fresh, smart ideas from our engineering team
It is called deep.bi
and now we make it available to other companies
DEEP.BI = BIG DATA FAST DATA SOLUTION
high velocity
high volume
deep.bi lets high-growth companies
solve fast data problems by providing
scalable, flexible and real-time
data collection, enrichment and analytics
deep.bi – complete data processing flow
collect -> enrich -> analyze
•  Collect: unstructured, raw data from many sources (page views, IoT events, IP, URL, cookie, transactions, call detail records, etc.)
•  Enrich: data enrichment, transformation and integration
•  Analyze: find patterns, build models, predict behavior
How to predict the best offer
based on online data – case study.
Collect website, campaign and CRM data
•  Website: Google Analytics
•  Campaigns: Agency reports
•  Apps: Dedicated monitoring tools
•  Other systems: Call center, IVR, emails
Data is stored in silos. Reporting tools provide aggregated
reports that are impossible to integrate around a single customer.
Instead of integrating the current reporting tools, we need to
gather all the individual events that our customers generate.
Collecting raw web data is not enough
2015-05-15T00:26:41.328Z,3,D,[ip_hidden],i1xszg0f-19hqrje,"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36","[url_hidden]",7279848891,@906,"https://www.google.pl/",vuser-history-allegro-1-hc20150509.1,"122_100003_Park@700:html_620x100_single_banner:See offer"
IP, URL, cookie, user-agent, timestamp
Enrich raw web and mobile data
50+ pieces of information from one interaction:
•  Purchase intent
•  Device
•  Time
•  Location
•  ISP
•  Online context
•  Weather*
•  Demographics
* Coming soon
We can learn quite a few things from the user's IP (see the sketch below)
Example use:
•  international travellers
•  townspeople
•  people in mountains
•  rainy day
•  Country
•  Region
•  City
•  ZIP Code
•  Population
•  Latitude & Longitude
•  Time zone
•  IDD prefix to call the city from
another country
•  Phone area code
•  Mobile Country Code (MCC)
•  Mobile Network Code (MNC)
•  Elevation
•  Weather at the moment of event
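For illustration, most of these attributes can be derived from an IP address with an off-the-shelf geo database. A minimal sketch, assuming a local MaxMind GeoLite2 City file (the deck does not say which provider deep.bi actually uses):

```python
import geoip2.database  # pip install geoip2; requires a GeoLite2-City.mmdb file

def geo_enrich(ip):
    """Return a dict of geo attributes for one IP (illustrative, not deep.bi's actual pipeline)."""
    with geoip2.database.Reader("GeoLite2-City.mmdb") as reader:
        r = reader.city(ip)
        return {
            "country":   r.country.iso_code,
            "region":    r.subdivisions.most_specific.name,
            "city":      r.city.name,
            "zip":       r.postal.code,
            "latitude":  r.location.latitude,
            "longitude": r.location.longitude,
            "timezone":  r.location.time_zone,
        }

print(geo_enrich("128.101.101.101"))
```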
The ISP tells us more than we might expect
Example use:
•  competitors' users -> acquisition
•  our users -> retention / up-selling / cross-selling
•  people from a particular company or company type
•  ISP name or Organization name
•  Organization type:
•  Commercial
•  Organization
•  Government
•  Military
•  University/College/School
•  Library
•  Content Delivery Network
•  Fixed Line ISP
•  Mobile ISP
•  Data Center/Web Hosting/Transit
•  Search Engine Spider
•  Reserved
•  Mobile brand
•  Net speed
Detailed information about the user's device (see the parsing sketch below)
Example use:
•  smartphone users
•  Apple users
•  Samsung Galaxy users
•  Google browser users
•  Device Type
•  Device Brand
•  Device Model
•  Device Operating System
•  Operating System Producer
•  Browser
•  Browser Producer
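For illustration, these device attributes are typically parsed out of the user-agent string. A minimal sketch using the open-source user-agents package (an assumption; the deck does not name the parser deep.bi uses):

```python
from user_agents import parse  # pip install ua-parser user-agents

ua_string = ("Mozilla/5.0 (Linux; Android 4.2.2; GT-S7580 Build/JDQ39) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36")
ua = parse(ua_string)

print(ua.device.brand, ua.device.model)    # device brand and model, e.g. Samsung GT-S7580
print(ua.os.family, ua.os.version_string)  # operating system, e.g. Android 4.2.2
print(ua.browser.family)                   # browser, e.g. Chrome Mobile
print(ua.is_mobile, ua.is_tablet, ua.is_pc)
```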
Besides user features, track user behavior too.
Deeper understanding of people’s behavior:
•  RFM segmentation (Recency, Frequency, Monetary; scoring sketch below)
•  Shopping cart analysis
•  Purchase sequence analysis
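A minimal RFM scoring sketch in pandas, on a made-up transactions table (customer_id, order_date and amount are assumed column names; real scoring would use quartiles over far more data):

```python
import pandas as pd

tx = pd.DataFrame({  # hypothetical purchase log
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime(["2015-09-01", "2015-09-20", "2015-07-05",
                                  "2015-08-11", "2015-09-28", "2015-10-01"]),
    "amount": [120.0, 35.0, 560.0, 20.0, 45.0, 80.0],
})
now = pd.Timestamp("2015-10-02")

rfm = tx.groupby("customer_id").agg(
    recency=("order_date", lambda d: (now - d.max()).days),  # days since last purchase
    frequency=("order_date", "count"),                        # number of purchases
    monetary=("amount", "sum"),                               # total spend
)

# 2 bins only because the toy sample is tiny; quartiles (4 bins) in practice.
# Lower recency is better, higher frequency/monetary are better.
rfm["R"] = pd.qcut(rfm["recency"], 2, labels=[2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 2, labels=[1, 2]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 2, labels=[1, 2]).astype(int)
print(rfm)
```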
User behavior and characteristics
help predict the next best action/offer
What product should we recommend?
How could this purchase path end?
So, how to build tailored recommendations?
Pick an algorithm that is suitable for the problem
Product [ feature_1, feature_2, …, feature_N]
User [ feature_1, feature_2, …, feature_N]
User [ product_1, product_2, …, product_N]
  Simple rules: if a user has certain features, serve
this group of products
  Manual segment creation: analysts find
segments of users and match them with
product segments
  Simple feature matching: take the user's weighted
feature vector and match it against product feature
vectors (see the sketch below)
Manually managed rules
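A minimal sketch of that feature-matching idea, with made-up feature vectors: cosine similarity between the user's weighted vector and each product's feature vector.

```python
import numpy as np

def recommend(user_vec, product_matrix, product_ids, k=3):
    """Rank products by cosine similarity to the user's weighted feature vector."""
    user_vec = np.asarray(user_vec, dtype=float)
    norms = np.linalg.norm(product_matrix, axis=1) * np.linalg.norm(user_vec) + 1e-12
    sims = product_matrix @ user_vec / norms
    top = np.argsort(-sims)[:k]
    return [(product_ids[i], round(float(sims[i]), 3)) for i in top]

# toy data: 4 products x 3 hypothetical features (smartphone, premium, outdoor)
products = np.array([[1.0, 0.2, 0.0],
                     [1.0, 0.9, 0.1],
                     [0.0, 0.1, 1.0],
                     [0.3, 0.8, 0.6]])
user = [0.9, 0.7, 0.1]  # weighted interest in the same features
print(recommend(user, products, ["p1", "p2", "p3", "p4"]))
```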
  Find segments automatically (e.g. k-means; sketch below)
  Product-feature-based recommendations
  User-feature-based recommendations
  Combined product- and user-based
recommendations (collaborative filtering, deep
learning)
Machine learning-supported recommendations
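For the automatic-segmentation step, a sketch with scikit-learn k-means on hypothetical user feature vectors (not necessarily deep.bi's exact setup):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# hypothetical users x features matrix (e.g. recency, frequency, monetary, mobile share)
X = rng.rand(1000, 4)

kmeans = KMeans(n_clusters=8, random_state=0, n_init=10).fit(X)
segments = kmeans.labels_            # segment id per user
centroids = kmeans.cluster_centers_  # average feature profile per segment

# each segment can then be matched with a group of products
print(np.bincount(segments))
```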
Recommendations: the long-tail phenomenon
[Chart: product popularity vs. products; the most interesting recommendations sit in the long tail]
Technology behind deep.bi
  Complex data model required for query optimization
 split dimensions into several tables based on the reports to be made
 cherry-pick in advance which dimensions can be pre-aggregated, based on
cardinality
 indexing every dimension column is a must
  Impossible to add high-cardinality dimensions
 no way to analyze per user (millions of them)
 no way to even add all of user-agent, url, geo-info, ...
Problems with SQL and NoSQL databases
  Complex data loading process
 needs to pre-aggregate in memory
 non-trivial reliability issues
 hard to parallelize
  There is always latency
 pre-aggregation happens in the loading job's memory
Problems with SQL and NoSQL databases
deep.bi – real-time big data architecture
•  Sources: event sources* and customer databases produce the raw data stream
•  Real-time data ingestion: Kafka
•  Data transformation & enrichment: Node.js, Spark Streaming (produces the transformed data stream)
•  High performance, multi-purpose storage:
   •  Real-time OLAP store: Druid
   •  Operational store: Cassandra
   •  Raw data store: Hadoop, Parquet, Spark
•  Consumers: web analytics dashboard, customer analytics dashboard, deep.bi API, ETL
* e.g. mobile apps, websites, marketing campaigns, IoT (beacons, wearables)
Data Collection APIs
•  Web Data Collection API (HTML or JS): trackers in the end-user browser pass event data with a <DEEP tracker> snippet
•  Mobile Data Collection API (HTML, JS or Native SDK): trackers pass event data the same way
•  Events are sent to the Ingestion API inside the client's DEEP Data Space (DEEP: data enrichment, storage & analytics)
Events are represented with full flexibility of JSON
{
  "data": {
    "event_type": "CLICK",
    "ad_request_event": {
      "ctx": {
        "event_time": "2015-07-10T06:15:50.819Z",
        "ip_address": "XX.XX.XX.XX",
        "geo_info": {
          "country": "US", "region": "California", "city": "San Francisco",
          "timezone": "PST", "isp": "XXX",
          "population": 849774
        },
        "page": {
          "raw_url": "XXX",
          "standardized_domain": "XXX"
        },
        "page_info": {
          "page_raw_url": "XXX",
          "product_categories": [
            { "id": 20585 },
            { "id": 100126 }
          ]
        },
        "cookie": "ibx8axlw-17j287o",
        "user_agent": "Mozilla/5.0 (Linux; Android 4.2.2; GT-S7580 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36"
      }
    }
  }
}
  Publish-subscribe service
  The nervous system of enterprise data
  decouple producers from consumers
  reliably buffers data
  send now, process later
  Scalable, distributed, replicated log system
  Pause components, restart processing
  Powered by:
  web giants like LinkedIn, Twitter, Netflix, Uber, Spotify or Pinterest
  >10M messages/second
Apache Kafka
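A minimal producer sketch with the kafka-python client; the broker address and topic name are assumptions for illustration:

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                       # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"), # events serialized as JSON
)

event = {"event_type": "CLICK", "cookie": "ibx8axlw-17j287o",
         "event_time": "2015-07-10T06:15:50.819Z"}
producer.send("raw-events", value=event)  # "raw-events" is a hypothetical topic name
producer.flush()                          # send now; consumers process later at their own pace
```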
  Scalable, fault-tolerant stream processing system
  With simple programming model & rich API & integrations
  Powered by:
  Yahoo, Netflix, eBay
  NASA, Intel, Cisco
  It is our fundamental technology for streaming applications:
 sessionize events (sketch below)
 detect fraud
 attribute purchases to clicks or views
 load & read external stores like Druid, Hadoop, Cassandra
Apache Spark Streaming
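A sketch of the per-user counting that sessionization starts from, using the Spark 1.x-era DStream API this deck refers to (topic and broker names are assumed):

```python
import json
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # Spark 1.x / 2.x DStream API

sc = SparkContext(appName="event-stream")
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["raw-events"], {"metadata.broker.list": "localhost:9092"})  # assumed topic/broker

events = stream.map(lambda kv: json.loads(kv[1]))
# count events per cookie over a sliding 10-minute window: a crude first step toward sessionization
per_user = (events.map(lambda e: (e["cookie"], 1))
                  .reduceByKeyAndWindow(lambda a, b: a + b, None, 600, 5))
per_user.pprint()

ssc.start()
ssc.awaitTermination()
```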
  Open Source Streaming Data Store for Interactive Analytics at Scale
  denormalized data
  no more snowflake or star-schema!
  Build real-time dashboards, analytic applications, exploratory tools on it.
  It’s FAST!
  aggregate, drill-down, slice-n-dice in sub-seconds
  advanced column-store with compression
  sophisticated approximate algorithms
  It’s SCALABLE
  horizontally scalable - just add more machines
  replicated, highly-available
  Over 100 PB of data, millions of events/second
Druid – Real-time OLAP Store
  Ingest historical & real-time data
  data available for exploration in milliseconds
  can store years of data in very optimized storage
  Powered by
  eBay, Netflix, PayPal, Yahoo
  Cisco
  It is our core data store of all events, historical and real-time data
Druid – Real-time OLAP Store
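For illustration, a Druid native query posted to the broker over HTTP; the datasource, dimension and metric names are assumptions:

```python
import requests  # pip install requests

query = {
    "queryType": "topN",
    "dataSource": "events",            # hypothetical datasource name
    "dimension": "device_type",
    "metric": "event_count",
    "threshold": 5,
    "granularity": "all",
    "intervals": ["2015-10-01/2015-10-02"],
    "aggregations": [{"type": "longSum", "name": "event_count", "fieldName": "count"}],
}

# default broker endpoint for Druid native queries
resp = requests.post("http://localhost:8082/druid/v2/", json=query)
print(resp.json())
```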
  Apache Spark for batch-processing: fast and general engine for
large-scale data processing
  Replaces MapReduce, being up to 10x-100x faster!
  Number 1 open-source project in the big data space (contributors, commits)
  In-memory processing (if possible)
  Spark SQL for SQL processing
  Apache Parquet - an optimized storage format
  columnar – read only columns you need
  compressed – specialized compression for data type + generic compression
  2x-4x compression: 600 GB of data -> 150 GB
  Hadoop can be optimized by two orders of magnitude: from hours
to seconds!
Hadoop Optimized
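A batch-side sketch in the Spark 1.x style described above; the HDFS path and column names are assumptions:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="batch-reports")
sqlContext = SQLContext(sc)

# columnar Parquet files: only the referenced columns are read from disk
events = sqlContext.read.parquet("hdfs:///deep/events/2015/")  # hypothetical path
events.registerTempTable("events")

top_devices = sqlContext.sql("""
    SELECT device_type, COUNT(*) AS events
    FROM events
    GROUP BY device_type
    ORDER BY events DESC
""")
top_devices.show()
```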
Thank you!
Share your thoughts, challenges
or case studies with us.
Or drop us a line: hello@deep.bi
Backup slides
Let's assume we want to find users who:
  Were interested in smartphones
  Use a Samsung product
  Live in cities with a population over 1M people
  Are women
  Were traveling abroad
  Came from our display campaign
So we have a combination of k = 6 dimensions out of n = 50.
Using the combination formula, we get…
Complexity of multidimensional queries
… a number of possible combinations:
15,890,700
similar to Lotto (6 from 49).
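For reference, the binomial coefficients behind these figures:
\[
\binom{n}{k}=\frac{n!}{k!\,(n-k)!},\qquad
\binom{50}{6}=15{,}890{,}700,\qquad
\binom{49}{6}=13{,}983{,}816.
\]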
Thank you!
Share your thoughts, challenges
or case studies with us.
Or drop us a line: hello@deep.bi

Editor's Notes

  • #4 5000000000/60/60/24/30/12
  • #8 Sources: http://www.infoworld.com/article/2608040/big-data/fast-data--the-next-step-after-big-data.html
  • #18 RFM segmentation (Recency, Frequency, Monetary): assess customers' revenue potential; analyze migration between segments. Shopping cart analysis: understand which baskets customers build and which product categories most often sell together. Purchase sequence analysis: understand how customer behavior unfolds over time, which sequences precede a purchase and which precede a drop-off.
  • #19 Uplift-style models: will buy after the recommendation = target group; would buy without the recommendation = wasted spend; will not buy after the recommendation = lost customer.
  • #20 Uplift-style models: will buy after the recommendation = target group; would buy without the recommendation = wasted spend; will not buy after the recommendation = lost customer.
  • #25 Source: http://saasaddict.walkme.com/saas-2015-new-shifts-will-see/ 1.Companies Will Be Investing More in Personal Consumer Research Currently a lot of consumer research is performed in a very static manner, through surveys and analysis of raw data. What more companies will be investing in is in personalization and customization in their services. They will also focus on getting to know their customers more personally, usually through social media, through the use of Big Data (see more on that below) and through direct engagement (via email and social media). Details like purchasing motivations, lifestyle, and desires are all important. Relevant marketing strategies seek to improve customer satisfaction and motivate customers to value your brand as more than just a service. 2. Cloud Data Services Will Overtake Traditional Means of Storage According to Forrester research, Microsoft will be generating more revenue from its cloud services compared to its traditional on-premise application. Traditional services are limited by their on-premise storage space, while cloud data services are much more open. This will allow for businesses to look into contracting cloud services for meaningful growth while it is still relatively inexpensive. One challenge to watch out for is that cloud data breaches are a legitimate issue. Expect companies to invest heavily in shoring up their securities to avoid breaches. 3. More SaaS Apps Will Specialize in Specific Industries Industries like healthcare, manufacturing, and retail will be developing more apps in their specific fields. One of the challenges to this new approach is that it burdens the customer with a deeper, more complex experience to acclimate to. However, a benefit to specialized SaaS is that companies will have a built-in userbase which gives them a head start when developing features. It also benefits enterprise customers. The reason that this trend is important is because consumers are demanding more apps that are relevant to specific needs. Generalized apps avoid getting too complex in any one area which can alienate consumers by not providing solutions they desire. 4. New Alternatives to Multitenancy Will Develop Allowing multiple customers to share a single application instance is useful for managing data on cloud services. While the traditional sense allowed for multiple users to be plugged in, and had individual views, alternatives that allow for more personalized experiences are being developed. For example, Salesforce.com is offering a new 'Superpod' service for enterprises. This allows companies to have their own dedicated infrastructure inside their data centers, rather than connect to a single server-side instance. These new hybrid services gives enterprises more options leading into the future, allows for more innovation in developing delivery systems, and thus frees up the bottleneck in the cloud service market. It also gives consumers options as well. 5. A Bigger Emphasis on Big Data Analytics According to IDC reports, there is a trend leading towards a greater use of data-as-a-service (DaaS) with spending reaching $215 billion in 2015. DaaS will leverage cloud to deliver their services. They also predict that more companies will be using big data analytics as a part of their commercial and open data sets. Cloud storage offers more flexibility for enterprise access and overall capacity. 
Since the relative cost of cloud storage per unit is decreasing, more companies are becoming interested in big data analysis, which makes it a perfect opportunity to begin implementing open data set technologies.
  • #26 Source: http://saasaddict.walkme.com/saas-2015-new-shifts-will-see/ (same notes as #25)
  • #31 Sources: https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
  • #32 Sources: https://speakerdeck.com/metamx/druid-plus-r https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani