3. Customer retention
Recommendations while the customer is choosing a product
Cross-selling during the purchasing process
Real-time personal discounts
Personalized search results
4. Customer returns
Predict follow-up sales and make offers to the customer
Personalized discounts and hot-sale e-mails
Work with abandoned baskets
After-sale support by e-mail or phone
6. Customer behavior prediction
Predict customer preferences
Predict unknown data about the customer
Predict future purchases
Predict lost customers
Predict anything for which correlations can be found
8. Convert visitors into HAPPY buyers
1. Analyze the visitor
2. Predict the visitor's needs
3. Determine the visitor's behavior pattern
4. Inject the most effective additional points of influence, personalized for the visitor, into the sales flow
5. Suggest exactly what the visitor wants
6. Make the visitor a happy buyer
7. Thank them for the purchase and suggest more
10. Magento 2.0: Data Flow
[Architecture diagram: the Magento 2.0 instance communicates with visitors and generates events from customer activities and internal data changes. Events flow over the event bus to event consumers, which persist them to the datastore (ES) and create sub-events. The Machine Learning standalone service has access to the data required for its prediction models; consumers ask it over SOAP for prediction results, the storefront makes real-time calls to the ML service API to obtain predicted data, and the ML service calls back into the Magento API to apply ML decisions. Hadoop/Spark ML handles batch and real-time long-term history analysis and heavy reporting.]
11. Data sources
1. Product catalog, inventory
2. Page visit logs
3. Purchases, abandoned baskets
4. Ratings, reviews
5. External data sources such as Twitter, Amazon, public datasets, etc.
6. Time series with
• history of product price changes
• customer activity log
12. Events flow
1. A common event bus, using RabbitMQ for small customers and Apache Kafka for large ones
2. It is a highly horizontally scalable solution
3. All data inside events should reach the persistent datastore according to the consumers' rules
4. After that, consumers may trigger sub-events for the ML algorithms that depend on the changed data
5. If an ML algorithm needs to call an API method in Magento (for example, to add a customer to a new segment), it publishes an event for the appropriate consumer
6. At each step we have the opportunity to integrate any external system into the process flow through the event bus
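The persist-then-trigger pattern above can be sketched in-process. This is a toy stand-in for RabbitMQ/Kafka, not Magento's actual event API; the topic names and payload fields are illustrative assumptions.

```python
# Minimal in-process sketch of the event flow: a consumer first persists
# the event data, then triggers a sub-event for the ML algorithms.
from collections import defaultdict

class EventBus:
    """Toy stand-in for RabbitMQ/Kafka: topic -> list of consumer callbacks."""
    def __init__(self):
        self.consumers = defaultdict(list)

    def subscribe(self, topic, consumer):
        self.consumers[topic].append(consumer)

    def publish(self, topic, payload):
        for consumer in self.consumers[topic]:
            consumer(topic, payload)

datastore = []   # stand-in for the persistent datastore
ml_inbox = []    # sub-events routed to ML algorithms

bus = EventBus()

def persist_consumer(topic, payload):
    # Step 3: all event data goes to the persistent datastore first.
    datastore.append((topic, payload))
    # Step 4: then a sub-event is triggered for the ML algorithms.
    bus.publish("ml." + topic, payload)

bus.subscribe("cart.abandoned", persist_consumer)
bus.subscribe("ml.cart.abandoned", lambda t, p: ml_inbox.append(p))

bus.publish("cart.abandoned", {"customer_id": 42, "sku": "ABC-123"})
```

External systems plug in the same way: they simply subscribe to the topics they care about, without the publisher knowing about them.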
13. Persistent datastore
1. The datastore should have three levels
I. An in-memory datastore to cache operational data for real-time queries
II. An operational datastore to persist all the data needed by the machine learning algorithms
III. An analytical datastore for all historical data, used for heavy reporting and deep ML analysis
2. Due to the probabilistic nature of the ML algorithms, the datastore architecture can sacrifice consistency (the C in the CAP theorem) and guarantee availability and partition tolerance
3. As a starting point, I propose using Redis (or VoltDB, Aerospike, Tarantool), ElasticSearch (or Solr, MySQL, HBase), and Hadoop
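The three-level read path can be sketched as follows. Plain dicts stand in for Redis, ElasticSearch, and Hadoop; the key format is a made-up example.

```python
# Sketch of the tiered read path: check the in-memory cache first, fall
# back to the operational store, then to the analytical store, and warm
# the cache on a miss.

class TieredDatastore:
    def __init__(self):
        self.cache = {}        # level I:   in-memory (e.g. Redis)
        self.operational = {}  # level II:  operational (e.g. ElasticSearch)
        self.analytical = {}   # level III: analytical (e.g. Hadoop)

    def get(self, key):
        for tier in (self.cache, self.operational, self.analytical):
            if key in tier:
                self.cache[key] = tier[key]  # warm the cache for next time
                return tier[key]
        return None

store = TieredDatastore()
store.analytical["customer:42:ltv"] = 1250.0
value = store.get("customer:42:ltv")       # served from the analytical tier
warmed = "customer:42:ltv" in store.cache  # and now cached in memory
```

Because the tiers are only eventually consistent with each other, a read may return slightly stale data, which is exactly the consistency trade-off point 2 accepts.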
14. Machine learning service
1. Will be implemented as a standalone service
2. Binary/SOAP/REST protocols over an HTTP/TCP transport layer
3. Direct read-only access to all data sources
4. ACL checks should be implemented on the clients
5. Horizontally scalable shared-nothing architecture
6. Calculated models will be synced using a binary protocol, without a master node (ZooKeeper)
7. Each node has its own memory pool to store internal datasets for calculations
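A minimal sketch of the service's REST face, using only the standard library. The `/predict/<customer_id>` route, the JSON field names, and the placeholder model are all assumptions for illustration; the real service would also expose binary and SOAP transports.

```python
# Toy REST endpoint for the standalone ML service, stdlib only.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(customer_id):
    # Placeholder model: a real node would compute this from the internal
    # datasets held in its own memory pool.
    return {"customer_id": customer_id, "churn_risk": 0.12}

class PredictionHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expected path: /predict/<customer_id>  (illustrative route)
        _, _route, customer_id = self.path.split("/")
        body = json.dumps(predict(int(customer_id))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), PredictionHandler)  # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: a consumer makes a real-time call to obtain predicted data.
port = server.server_address[1]
with urllib.request.urlopen(f"http://127.0.0.1:{port}/predict/42") as resp:
    result = json.loads(resp.read())
server.shutdown()
```

Since the architecture is shared-nothing, any node can answer this request; a load balancer in front of the nodes would be the natural next step.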
15. Hadoop + Spark
Should be implemented only for extremely large stores
Hadoop is a very slow datastore
But Hadoop and Spark together allow us to run machine learning algorithms in a near-real-time, distributed manner
Using the event bus we can write all the data to Hadoop and run ML tasks on unlimited volumes of data:
Reporting
Batch clustering
Searching for patterns and outliers
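To make the "outliers" task concrete, here is the core idea in plain stdlib Python as a single-machine stand-in: flag values far from the mean in units of standard deviation. Spark would run the same computation distributed over the full history in Hadoop; the order totals and the two-sigma threshold are illustrative assumptions.

```python
# Single-machine sketch of outlier search over order totals.
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    # Flag values more than `threshold` standard deviations from the mean.
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > threshold * s]

order_totals = [52, 48, 55, 50, 49, 51, 47, 53, 50, 900]  # one anomalous order
outliers = find_outliers(order_totals)
```

The same scan over billions of rows is exactly what the Hadoop + Spark layer is for.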
17. Realtime recommendations
Using all historical data about the user's activity and internal data sources, we can predict customer needs
User activities:
Page view log with the duration of each page view
Visitor returns
Registrations
Ratings and reviews
Purchases
Abandoned shopping carts
Internal data sources:
Product prices with their change history
Discounts
Customer segments
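One simple recommendation approach over this data is item-to-item co-occurrence: suggest products that are most often bought together with the visitor's purchases. The order history below is made up, and a real model would also weight page views, ratings, and recency.

```python
# Minimal item-to-item recommender from purchase co-occurrence.
from collections import Counter
from itertools import combinations

orders = [                       # illustrative purchase history
    {"tent", "sleeping-bag", "stove"},
    {"tent", "sleeping-bag"},
    {"tent", "lantern"},
    {"stove", "lantern"},
]

def recommend(purchased, orders, top_n=2):
    # Count how often each pair of products appears in the same order.
    co_counts = Counter()
    for order in orders:
        for a, b in combinations(order, 2):
            co_counts[(a, b)] += 1
            co_counts[(b, a)] += 1
    # Score candidates by co-occurrence with what the visitor already bought.
    scores = Counter()
    for item in purchased:
        for (a, b), n in co_counts.items():
            if a == item and b not in purchased:
                scores[b] += n
    return [item for item, _ in scores.most_common(top_n)]

recs = recommend({"tent"}, orders)  # "sleeping-bag" co-occurs most with "tent"
```

This batch-style counting fits the Hadoop/Spark layer, while serving the precomputed scores fits the in-memory tier of the datastore.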
18. Personal discounts
Create behavior patterns and detect cases where the merchant should give a customer a personal discount on a particular product
Discounts will be shown to the customer in real time while browsing the catalog
If the customer did not purchase the discounted product, the algorithm should take this into account in further work with this customer
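One such behavior pattern could be sketched as a rule: a visitor who viewed a product repeatedly without buying gets a personal discount, and an ignored offer suppresses further offers for that product. The view threshold and field names are assumptions for illustration, not the deck's actual rules.

```python
# Rule-of-thumb sketch of one personal-discount trigger.

VIEW_THRESHOLD = 3  # assumed: repeated views signal interest

def should_offer_discount(views, purchases, ignored_offers, sku):
    # Skip products already bought, and remember ignored past offers
    # (the feedback loop the slide describes).
    if sku in purchases or sku in ignored_offers:
        return False
    return views.get(sku, 0) >= VIEW_THRESHOLD

views = {"SKU-1": 4, "SKU-2": 1}
offer_hot = should_offer_discount(views, set(), set(), "SKU-1")        # enough views
offer_cold = should_offer_discount(views, set(), set(), "SKU-2")       # too few views
offer_ignored = should_offer_discount(views, set(), {"SKU-1"}, "SKU-1")  # offer was ignored before
```

In production such a check would run against the in-memory tier so the discount can be injected while the page renders.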
19. Personalized product catalog
Besides product recommendations, ML algorithms can determine a customer's preferences and generate the product catalog page accordingly
The product list may naturally include starred products from the predicted list
Another option is to sort the list by "best choice"
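The "best choice" sorting can be sketched as ordering the catalog page by a per-customer predicted preference score, so products from the predicted list naturally rise to the top. The scores below are illustrative model outputs, not real predictions.

```python
# Sketch of "best choice" catalog sorting by predicted preference score.
catalog = ["tent", "stove", "lantern", "sleeping-bag"]
predicted_score = {"sleeping-bag": 0.9, "lantern": 0.4}  # per-customer predictions

def best_choice_sort(catalog, scores):
    # Unscored products keep their original catalog order behind scored ones
    # (sorted() is stable).
    return sorted(catalog, key=lambda sku: scores.get(sku, 0.0), reverse=True)

page = best_choice_sort(catalog, predicted_score)
```

Starred products are then simply the entries whose predicted score exceeds some display threshold.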