A Practical Guide to Enterprise Machine
Learning Platforms
By Tellago Research
Contents
Overview
Key Characteristics of Enterprise Machine Learning Solutions
Cloud vs. On-Premise Machine Learning Platforms
Enterprise Cloud Machine Learning Platforms
    Azure Machine Learning
    AWS Machine Learning
    IBM Watson Developer Cloud
    Databricks
On-Premise Enterprise Machine Learning Platforms
    Revolution Analytics
    Dato
    Spark MLlib and Spark R
    PredictionIO
    Scikit-learn
Summary
Overview
Machine learning is becoming one of the most important aspects of modern
enterprise applications. Recent years have seen an explosion of innovation in
machine learning platforms, taking the field from a domain constrained to a few data
scientists to a mainstream developer audience. As a result, companies are now in a
position to build comprehensive machine learning applications that were
out of reach just 2-3 years ago.
The explosion in machine learning technologies doesn’t come without a price for
enterprises. As with any other rapidly emerging technology trend, machine learning has
experienced rapid growth in the number of new platforms and startups that
provide relevant machine learning capabilities for enterprises. As a result, many
enterprises struggle to navigate the new ecosystem of machine learning
technologies and platforms.
This paper provides an analysis of some of the most relevant technologies in the
machine learning space along with the experience that Tellago’s data science practice
team has gained implementing machine learning solutions in the real world. The analysis
presented in this paper is based solely on practical experience and not on theoretical
exercises.
Key Characteristics of Enterprise Machine Learning Solutions
Integration with Mainstream Data Stores
The integration with diverse data stores is a key element for the mainstream
adoption of machine learning platforms. Databases, SaaS platforms, ERPs and CRMs
are just some of the data sources that can be relevant in machine learning
scenarios. The ability to seamlessly integrate with different line-of-business systems
drastically simplifies the adoption of machine learning platforms in enterprise
environments.
Integration with R and Python
R and Python have been the main platforms used in machine learning and data
science applications. Consequently, there are many widely adopted machine
learning frameworks implemented in R and Python. The interoperability with R and
Python libraries allows machine learning platforms to take advantage of well-
established data science practices and techniques implemented in those
frameworks. In that sense, enterprises can benefit from machine learning platforms
that can natively leverage R and Python libraries.
Simple Infrastructure
Scaling machine learning infrastructures can be a complex endeavor. Even worse,
the complexities around the configuration of machine learning infrastructures
sometimes become a friction point for the early adoption of machine learning
platforms. To avoid those challenges, enterprises should look for machine learning
platforms that are relatively simple to set up and don’t require massive
investments in infrastructure. This will allow organizations to focus on the
evaluation of core machine learning capabilities instead of the infrastructure behind
them.
Programmatic Interfaces
Executing and evaluating machine learning models is often seen as an activity
exclusively performed by humans. However, incorporating machine learning models
into business applications is incredibly relevant in the enterprise. To achieve that,
machine learning platforms should support the programmatic execution of models
via APIs or mainstream enterprise programming platforms such as .NET or Java.
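As a minimal sketch of what programmatic execution can look like, the Python snippet below prepares an HTTP scoring request for a model exposed as a web API. The endpoint URL, the API key, the bearer-token header and the JSON payload shape are all hypothetical, chosen only for illustration; each platform defines its own request contract.

```python
import json
import urllib.request


def build_scoring_request(endpoint, api_key, features):
    """Prepare a JSON POST request for a (hypothetical) model-scoring API."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Hypothetical endpoint, key and feature names, used only for illustration.
request = build_scoring_request(
    "https://ml.example.com/models/churn/score",
    "demo-key",
    {"tenure_months": 14, "monthly_spend": 42.5},
)

# Sending is left commented out because the endpoint does not exist:
# with urllib.request.urlopen(request) as response:
#     prediction = json.loads(response.read())
```

Keeping request construction separate from the network call makes the integration easy to test from a business application without reaching the model service.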
Monitoring and Management Tools
Monitoring and managing the execution of machine learning models is essential
to the adoption of this type of platform in enterprise
environments. From the monitoring perspective, machine learning platforms should
provide both analytics about the results of executed models and operational
metrics related to the execution of those models. Additionally, organizations should
favor machine learning platforms that provide a simple but robust management
experience.
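To make the two kinds of monitoring data concrete, the sketch below wraps a prediction function and records both operational metrics (call count, errors, latency) and simple result analytics (how often each prediction occurs). The class name, the metric names and the toy model are hypothetical and do not correspond to any platform discussed in this paper.

```python
import time


class MonitoredModel:
    """Wrap a predict function and record simple monitoring data."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self.call_count = 0          # operational: number of executions
        self.error_count = 0         # operational: failed executions
        self.total_seconds = 0.0     # operational: cumulative latency
        self.prediction_counts = {}  # analytics: frequency of each result

    def predict(self, features):
        start = time.perf_counter()
        self.call_count += 1
        try:
            result = self._predict_fn(features)
        except Exception:
            self.error_count += 1
            raise
        finally:
            self.total_seconds += time.perf_counter() - start
        self.prediction_counts[result] = self.prediction_counts.get(result, 0) + 1
        return result

    def average_latency(self):
        return self.total_seconds / self.call_count if self.call_count else 0.0


# Hypothetical stand-in for a trained model: flags large transactions.
def toy_model(features):
    return "review" if features["amount"] > 1000 else "approve"


model = MonitoredModel(toy_model)
for amount in (50, 2500, 300):
    model.predict({"amount": amount})
```

A real platform would typically export these counters to a dashboard or alerting system instead of keeping them in memory.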
Extensibility
Until a few years ago, machine learning platforms were notoriously closed systems.
That factor limited the mainstream adoption of these platforms in enterprise
environments, as many machine learning solutions need a level of
customization that requires extending the core platform. In that sense,
organizations should carefully evaluate the extensibility models of machine learning
platforms and analyze how those can help to optimize the platform for their specific
scenarios.
Cloud vs. On-Premise Machine Learning Platforms
A simple way to make sense of the crowded machine learning platform market is to
make a distinction between cloud and on-premise platforms. For many
organizations, the nature of the underlying infrastructure (cloud vs. on-premise) is
a determining factor in terms of which machine learning platforms to evaluate.
Deciding between an on-premise and a cloud platform is a dilemma
for most organizations, but it’s even more relevant when it comes to data-centric
platforms. While cloud machine learning platforms abstract the complexity
of the underlying machine learning infrastructure and are rapidly driving innovation
in the space, they lack the levels of control and extensibility that you can achieve with
on-premise machine learning stacks.
The next sections of this document provide an analysis of some of the most
relevant cloud and on-premise platforms in the machine learning space.
Enterprise Cloud Machine Learning Platforms
Machine learning platforms are rapidly emerging as one of the most important
components of platform as a service (PaaS) technologies. While the first iteration of
cloud big data technologies focused on providing a seamless experience for hosting
and provisioning a Hadoop-based infrastructure, the leading platforms in the space are
rapidly adding higher-value data intelligence capabilities, including machine learning. This
movement has been led by vendors like Microsoft, Amazon and IBM, which have
added sophisticated machine learning capabilities to their existing PaaS offerings.
Additionally, there is a large number of startups trying to provide specialized
machine learning cloud services that simplify the experience for organizations trying
to apply machine learning models to specific business scenarios. When analyzing
the cloud machine learning platform space, organizations should consider Azure,
AWS, IBM and Databricks as some of the leaders in the space.
Azure Machine Learning
• Overview: Azure machine learning is a fully managed service included in the
Azure platform that allows the implementation of predictive analytics
solutions using machine learning. The service provides interfaces for building,
deploying and managing machine learning models, and it is tightly integrated
with other Azure services. Currently, Azure machine learning is included as
part of the Cortana Analytics suite.
• Key Capabilities: Azure machine learning includes some of the following
capabilities:
o Machine Learning Studio: Microsoft Azure Machine Learning Studio
is a collaborative, drag-and-drop tool you can use to build, test, and
deploy predictive analytics solutions on your data. Machine Learning
Studio publishes models as web services that can easily be consumed
by custom apps or BI tools such as Excel.
o API Generation: Azure machine learning provides the infrastructure
to expose machine learning models as APIs that can be
programmatically accessed by client applications. These APIs can also
be integrated with Azure API Management to enable more
sophisticated management and monitoring features.
o R and Python Extensibility: Azure machine learning allows
developers to incorporate custom R and Python scripts into models.
This extensibility mechanism allows developers to implement machine
learning applications that combine the capabilities of Azure with many
of the popular R and Python machine learning frameworks in the
market.
• Challenges: Azure machine learning is still relatively limited in terms of the
integration with on-premise data stores, which are predominant in the
enterprise. Additionally, we feel Azure machine learning can benefit from more
complete extensibility mechanisms beyond the ones provided by R and
Python scripts.
AWS Machine Learning
• Overview: Amazon Machine Learning is a native AWS service that makes it
easy for developers of all skill levels to use machine learning technologies.
Amazon Machine Learning provides visualization tools and wizards that guide
developers through the process of creating machine learning (ML) models
without having to learn complex ML algorithms and technology. Amazon
Machine Learning makes it easy to obtain predictions for your application
using simple APIs, without having to implement custom prediction generation
code or manage any infrastructure.
• Key Capabilities: Amazon Machine Learning enables some of the following key
capabilities:
o Model Creation: AWS APIs and wizards make it easy for any
developer to create and fine-tune ML models from data stored
in different data stores and query these models for predictions. The
service’s built-in data processors, scalable ML algorithms, interactive
data and model visualization tools, and quality alerts help you build
and refine your models quickly.
o Prediction Services: AWS machine learning provides the
mechanisms to quickly and reliably generate predictions for your
applications based on previously created machine learning models. The
prediction services can be elastically scaled using AWS infrastructure.
o Data Transformation DSL: AWS machine learning includes a
domain-specific language (DSL) that allows developers to model
transformations on the data processed by machine learning models.
Data transformations implemented using this DSL can be published
as “recipes” and reused across other transformation processes.
• Challenges: The experience of getting started with AWS machine learning is
relatively complex compared to its competitors in the space. We believe the
AWS machine learning service can benefit from incorporating more visual
tools that facilitate the authoring of machine learning models. Another
challenging factor in AWS machine learning applications remains the
communication with on-premise data stores.
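To give a feel for the data transformation recipes mentioned under Key Capabilities, the sketch below assembles a small recipe as a Python dictionary and serializes it to JSON. The three-section layout (groups, assignments, outputs) and the transformation names follow our understanding of the Amazon Machine Learning recipe format and should be verified against the service documentation; the column names "age" and "review_text" are hypothetical.

```python
import json

# Column names ("age", "review_text") are hypothetical and stand in for
# variables defined in a data source schema.
recipe = {
    "groups": {
        # A named group of variables that share the same transformation.
        "FREE_TEXT": "group(review_text)",
    },
    "assignments": {
        # An intermediate variable: bucket a numeric column into 10 quantile bins.
        "binned_age": "quantile_bin(age, 10)",
    },
    "outputs": [
        # Variables and transformed variables made available to the model.
        "binned_age",
        "lowercase(nopunct(FREE_TEXT))",
        "ALL_CATEGORICAL",
    ],
}

recipe_json = json.dumps(recipe, indent=2)
print(recipe_json)
```

Because a recipe is plain JSON, it can be stored in source control and reused across models that share the same input schema.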
IBM Watson Developer Cloud
• Overview: IBM Watson developer cloud is a series of cognitive data services
included as part of the IBM Bluemix platform. The Watson developer cloud
includes services such as vision analysis, text analytics, text-to-speech
transformation and concept expansion, among a dozen others, that enable
developers to incorporate deep learning and cognitive data capabilities within
their applications.
• Key Capabilities: The Watson developer cloud includes some of the
following capabilities:
o Text Analytics: Watson developer cloud provides a large number of
text analytics related services including relationship extraction,
concept insights, sentiment analysis etc. These services can be easily
integrated with other machine learning or business applications.
o Vision Analytics: Watson developer cloud provides a group of
innovative services that abstract key image analysis capabilities such
as face recognition, object detection, image link extraction etc. These
services can complement image libraries required in line of business
applications and solutions.
o Integration with Bluemix Services: Watson developer cloud is
included as part of IBM Bluemix and, consequently, is tightly
integrated with other Bluemix platform services. As a result,
developers can implement robust applications that leverage
cognitive data services.
• Challenges: Watson developer cloud is a collection of APIs that enable
cognitive data capabilities. As a result, Watson developer cloud is typically
used as a complement to machine learning applications and can’t be
considered a complete machine learning solution.
Databricks
• Overview: Databricks is an integrated cloud platform that enables the
implementation and operation of Apache Spark applications. As part of its
current capabilities, Databricks provides strong support for Spark MLlib and
Spark R.
• Key Capabilities:
o Model Performance: Databricks provides a highly scalable
architecture that powers the performance of Spark MLlib models. This
capability allows developers to focus on writing Spark MLlib solutions
without worrying about the underlying infrastructure.
o Support for R: In addition to Spark MLlib, Databricks provides support
for Spark R. This capability allows developers to write
sophisticated applications that combine traditional machine learning
and R models to achieve optimal results.
o On-premise Support: One of the biggest advantages of Databricks is
that it is completely based on Apache Spark. That model allows
developers to write machine learning applications that can seamlessly
work in both on-premise and cloud topologies.
• Challenges: The current feature set of Spark MLlib and Spark R is relatively
limited compared to some of its cloud competitors. Additionally, Databricks is
a Spark-exclusive cloud, which means that it doesn’t include complementary
platform services comparable to the ones provided by PaaS solutions like Azure,
AWS or Bluemix.
On-Premise Enterprise Machine Learning Platforms
Similar to the cloud space, the on-premise machine learning space is experiencing
an explosion in the number of technologies and platforms that enable the
implementation of enterprise-ready machine learning solutions. Unlike the
cloud space, new on-premise machine learning technologies tend to be
built on popular open source data science ecosystems such as R and Python instead
of proprietary stacks. As a result, many of the leading machine learning
platforms are also delivered as open source distributions. The following sections of
this paper evaluate some of the key on-premise machine learning stacks such as
Revolution Analytics, Dato, Spark, PredictionIO and Scikit-learn.
Revolution Analytics
• Overview: Revolution R Enterprise provides the infrastructure for
implementing enterprise-ready analytics applications based on R. Supporting
a variety of big data statistics, predictive modeling and machine learning
capabilities, Revolution R Enterprise is also 100% R. Revolution R Enterprise
supports a variety of analytical capabilities including exploratory data
analysis, model building and model deployment.
• Key Capabilities: Revolution R provides some of the following key
capabilities:
o Scalable R: Revolution R Enterprise scales and accelerates R, running
R scripts in a high-performance, parallel architecture that supports
systems from workstations to clusters and grids including Hadoop and
enterprise data warehouses.
o Enterprise-Ready R Capabilities: Revolution R expands R with
enterprise-ready capabilities such as logging, instrumentation,
security and monitoring, among other features that are essential to
operationalize R solutions in the enterprise.
o Integration with Mainstream Analytic Tools: Revolution R
provides integration with many of the most popular analytics tools in
the enterprise such as Tableau, Excel or QlikView. Additionally,
Revolution R integrates with traditional reporting platforms such
as Cognos and Business Objects.
• Challenges: Revolution R is optimized for authoring applications in the R
language. Sometimes, this model proves limiting for the implementation of
complete enterprise applications. Additionally, the applications implemented
with Revolution R can be complex to integrate into other enterprise solutions.
Dato
• Overview: Dato enables the rapid development, simple deployment, and
robust management of real-time services and applications that use machine
learning. Dato leverages the advancements in Python machine learning
libraries to enable the implementation of highly sophisticated, enterprise-ready
machine learning solutions. The Dato platform includes three key
products: GraphLab Create, Dato Distributed and Dato Predictive Services.
• Key Capabilities:
o Model Creation: Dato’s GraphLab Create is an extensible machine
learning framework that enables developers and data scientists to
easily build and deploy intelligent applications and services at scale. It
includes distributed data structures and rich libraries for data
transformation and manipulation as well as scalable task-oriented
machine learning toolkits for creating, evaluating, and improving
machine learning models.
o Scalable Execution: The Dato platform includes Dato Distributed,
which is a server product that allows distributed execution of machine
learning jobs on a cluster of machines. Jobs can include distributed
training of machine learning models, parallel model scoring &
predictions, distributed hyperparameter tuning, model ensembling,
and evaluation tasks. This capability abstracts the complexities of
scaling machine learning models in enterprise environments.
o API Access: Dato Predictive Services enables the execution of Dato
machine learning models as high performance APIs. This capability
allows developers to easily incorporate machine learning models into
new applications without having to use any proprietary libraries.
• Challenges: As with any new product, enterprises adopting Dato face the
challenge of embracing a product without a large community of developers
and system implementers. However, the communities around Dato are
rapidly growing. Additionally, Dato is completely Python-centric, which makes
it challenging to adopt for organizations without that in-house expertise.
Spark MLlib and Spark R
• Overview: Apache Spark includes two main libraries for machine learning
applications: Spark MLlib and Spark R. MLlib is Spark’s scalable machine
learning library consisting of common learning algorithms and utilities,
including classification, regression, clustering, collaborative filtering,
dimensionality reduction, as well as underlying optimization primitives. Spark
R is an R package that provides a light-weight frontend to use Apache Spark
from R. Spark R provides a distributed data frame implementation that
supports operations like selection, filtering, aggregation etc. (similar to R
data frames, dplyr) but on large datasets.
• Key Capabilities: Spark provides the following key capabilities for machine
learning applications:
o Scalability: Because Spark MLlib and Spark R are built on the Spark
platform, they enjoy the scalability and performance benefits of the
Spark architecture. In that sense, Spark machine learning models can
run across large topologies with hundreds of nodes and recover from
unexpected errors.
o Support for R: The addition of Spark R offers developers the
option of combining R and machine learning models as part of
the same applications. More importantly, both Spark R and Spark MLlib
are provisioned, scaled and managed using the same underlying
infrastructure.
o Developer and System Integrator Community: Apache Spark is
enjoying a rapidly growing community of developers and system
integrators. As a result, organizations can enjoy strong support for
machine learning applications built on Apache Spark and Spark R.
• Challenges: Setting up the infrastructure required to run Spark MLlib and Spark R
applications at an enterprise scale can be a very complex endeavor.
Additionally, the tools to fully operationalize Spark MLlib and Spark R
applications are still limited compared to other platforms in the space.
PredictionIO
• Overview: PredictionIO is an open-source machine learning server for
developers and data scientists to build and deploy predictive applications in a
fraction of the time. The PredictionIO template gallery offers a wide range of
predictive engine templates that developers can download and customize
easily. PredictionIO is built on top of Apache Spark and expands it
with enterprise-ready capabilities such as event-based activations, API
generation and monitoring tools.
• Key Capabilities:
o Template Based Authoring: PredictionIO provides a model for
authoring simple machine learning applications based on templates.
These templates abstract some of the underlying complexity of a
machine learning model and can be extended and customized for
specific scenarios.
o Event Based Activation: PredictionIO includes an event server
component that enables the asynchronous activation of machine
learning engines. This architecture provides a scalable model to
execute machine learning applications across diverse topologies.
o Monitoring and Management Tools: PredictionIO extends Apache
Spark with sophisticated management and monitoring tools that
facilitate the operational readiness of machine learning applications.
• Challenges: Although incredibly easy to use for simple machine learning
scenarios, PredictionIO can prove limiting in the implementation of more
complex models. Additionally, PredictionIO still hasn’t been able to build
large developer and system integrator communities or streamline its
implementation in enterprise environments.
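The event-based model described under Key Capabilities can be illustrated with a short sketch that builds the kind of JSON event an application would send to the event server. The field names (event, entityType, entityId, properties, eventTime) reflect our understanding of the PredictionIO event format and should be checked against the project documentation; the helper function, the event name and the entity identifiers are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_event(event, entity_type, entity_id, properties=None, event_time=None):
    """Build an event record to be posted to an event server as JSON."""
    when = event_time or datetime.now(timezone.utc)
    record = {
        "event": event,
        "entityType": entity_type,
        "entityId": entity_id,
        "eventTime": when.isoformat(),
    }
    if properties:
        record["properties"] = properties
    return record


# Hypothetical purchase event for a hypothetical user id.
payload = build_event(
    "purchase",
    "user",
    "u-1001",
    properties={"amount": 19.99},
    event_time=datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc),
)
body = json.dumps(payload)
```

Because the application only records events, the engine can consume them asynchronously, which is what allows collection and model execution to scale independently.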
Scikit-learn
• Overview: Scikit-learn is a framework that provides a range of supervised and
unsupervised learning algorithms via a consistent interface in Python. It is
licensed under a permissive simplified BSD license and is distributed with
many Linux distributions, encouraging academic and commercial use.
• Key Capabilities:
o Rich Machine Learning Algorithm Library: Scikit-learn provides
what can be considered one of the richest collections of machine learning
algorithms of any framework in the space. The framework also
builds on popular libraries like NumPy and SciPy
for numerical and scientific computing.
o Simple Programming Model: Despite its large feature set,
Scikit-learn provides a very simple programming model that allows developers
without strong expertise in machine learning to implement highly
sophisticated data science applications.
o Rich Data Visualizations: Scikit-learn provides a strong set of data
visualization capabilities that can be combined with machine
learning models to rapidly evaluate their effectiveness.
• Challenges: Scikit-learn is a programming framework and not a machine
learning platform. In that sense, Scikit-learn does not provide the scalability
models or the monitoring and management tools typically included in
machine learning platforms. As a result, enterprises should look to leverage
the rich capabilities of Scikit-learn in conjunction with other machine learning
platforms to implement enterprise-ready data science solutions.
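The consistent interface mentioned in the overview follows a fit/predict convention: a model object learns from training data with one call and produces predictions with another. The sketch below is not Scikit-learn code. It is a hypothetical, dependency-free toy classifier, written only to show the shape of that convention.

```python
class ToyCentroidClassifier:
    """Toy classifier following a fit/predict convention (not Scikit-learn code)."""

    def fit(self, X, y):
        # Accumulate per-label feature sums, then average them into centroids.
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {
            label: [total / counts[label] for total in acc]
            for label, acc in sums.items()
        }
        return self  # returning self allows chained calls

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        # Assign each row the label of its nearest centroid.
        return [
            min(
                self.centroids_,
                key=lambda label: squared_distance(row, self.centroids_[label]),
            )
            for row in X
        ]


# Hypothetical two-feature training data with two classes.
X_train = [[1.0, 1.0], [1.5, 2.0], [8.0, 9.0], [9.0, 8.5]]
y_train = ["low", "low", "high", "high"]

model = ToyCentroidClassifier().fit(X_train, y_train)
predictions = model.predict([[1.2, 1.4], [8.5, 9.1]])
```

Because every model exposes the same two calls, swapping one algorithm for another usually means changing a single line, which is a large part of why the programming model feels simple.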
Summary
Machine learning is becoming one of the most relevant aspects of data intelligence
solutions in the enterprise. Enterprises evaluating machine learning platforms
should consider both cloud and on-premise options. Cloud enterprise machine
learning platforms excel at abstracting the underlying infrastructure needed to run
and scale machine learning models. On-premise enterprise machine learning
platforms offer rich extensibility models and typically rely on open source
distribution channels.
Platforms like Azure, AWS and IBM are leading the charge in the cloud enterprise
machine learning space. Vendors like Databricks are also bringing a lot of innovation to the
space. In the on-premise arena, companies like Dato and PredictionIO as well as
popular open source frameworks like Apache Spark or Scikit-learn are some of the
robust options for enterprises building data science solutions. This paper included
an analysis of some of the key machine learning platforms including their strengths
and weaknesses based on our experience in real world implementations.

K-MUG Azure Machine Learning
 
For linked in part 2 no template
For linked in part 2  no templateFor linked in part 2  no template
For linked in part 2 no template
 
Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...
Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...
Insider's introduction to microsoft azure machine learning: 201411 Seattle Bu...
 
eBook-DataSciencePlatform
eBook-DataSciencePlatformeBook-DataSciencePlatform
eBook-DataSciencePlatform
 
Week 12: Cloud AI- DSA 441 Cloud Computing
Week 12: Cloud AI- DSA 441 Cloud ComputingWeek 12: Cloud AI- DSA 441 Cloud Computing
Week 12: Cloud AI- DSA 441 Cloud Computing
 
Deep architectural competency for deploying azure solutions
Deep architectural competency for deploying azure solutionsDeep architectural competency for deploying azure solutions
Deep architectural competency for deploying azure solutions
 
DataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestrationDataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestration
 
Mann assignment
Mann assignmentMann assignment
Mann assignment
 
Session 4 & 5
Session 4 & 5Session 4 & 5
Session 4 & 5
 
Power Platform Training
Power Platform TrainingPower Platform Training
Power Platform Training
 
12 Pro Predictive Analysis Tools to Look Out for in 2024.pdf
12 Pro Predictive Analysis Tools to Look Out for in 2024.pdf12 Pro Predictive Analysis Tools to Look Out for in 2024.pdf
12 Pro Predictive Analysis Tools to Look Out for in 2024.pdf
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Aws overview
Aws overviewAws overview
Aws overview
 
Aws overview
Aws overviewAws overview
Aws overview
 
AWS overview
AWS overviewAWS overview
AWS overview
 
Your practical reference guide to build an stream analytics solution
Your practical reference guide to build an stream analytics solutionYour practical reference guide to build an stream analytics solution
Your practical reference guide to build an stream analytics solution
 
Azure and the Cloud White Paper - Ethos
Azure and the Cloud White Paper - EthosAzure and the Cloud White Paper - Ethos
Azure and the Cloud White Paper - Ethos
 
The Total Economic Impact™ Of Microsoft Azure AI: Cost Savings and Business B...
The Total Economic Impact™ Of Microsoft Azure AI: Cost Savings and Business B...The Total Economic Impact™ Of Microsoft Azure AI: Cost Savings and Business B...
The Total Economic Impact™ Of Microsoft Azure AI: Cost Savings and Business B...
 
Technovision
TechnovisionTechnovision
Technovision
 

More from Jesus Rodriguez

More from Jesus Rodriguez (20)

The Emergence of DeFi Micro-Primitives
The Emergence of DeFi Micro-PrimitivesThe Emergence of DeFi Micro-Primitives
The Emergence of DeFi Micro-Primitives
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptx
 
DeFi Opportunities and Challenges in the Current Crypto Market
DeFi Opportunities and Challenges in the Current Crypto MarketDeFi Opportunities and Challenges in the Current Crypto Market
DeFi Opportunities and Challenges in the Current Crypto Market
 
MEV Deep Dive .pptx
MEV Deep Dive .pptxMEV Deep Dive .pptx
MEV Deep Dive .pptx
 
Quant in Crypto Land
Quant in Crypto LandQuant in Crypto Land
Quant in Crypto Land
 
The Polygon Blockchain by the Numbers
The Polygon Blockchain by the NumbersThe Polygon Blockchain by the Numbers
The Polygon Blockchain by the Numbers
 
Social Analytics for Cryptocurrencies
Social Analytics for Cryptocurrencies Social Analytics for Cryptocurrencies
Social Analytics for Cryptocurrencies
 
DeFi Quant Yield-Generating Strategies
DeFi Quant Yield-Generating StrategiesDeFi Quant Yield-Generating Strategies
DeFi Quant Yield-Generating Strategies
 
High Frequency Trading and DeFi
High Frequency Trading and DeFiHigh Frequency Trading and DeFi
High Frequency Trading and DeFi
 
Simple DeFi Analytics Any Crypto-Investor Should Know About
Simple DeFi Analytics Any Crypto-Investor Should Know About Simple DeFi Analytics Any Crypto-Investor Should Know About
Simple DeFi Analytics Any Crypto-Investor Should Know About
 
15 Minutes of DeFi Analytics
15 Minutes of DeFi Analytics15 Minutes of DeFi Analytics
15 Minutes of DeFi Analytics
 
DeFi Trading Strategies: Opportunities and Challenges
DeFi Trading Strategies: Opportunities and ChallengesDeFi Trading Strategies: Opportunities and Challenges
DeFi Trading Strategies: Opportunities and Challenges
 
Practical Crypto Asset Predictions rev
Practical Crypto Asset Predictions revPractical Crypto Asset Predictions rev
Practical Crypto Asset Predictions rev
 
Better Technical Analysis with Blockchain Indicators
Better Technical Analysis with Blockchain IndicatorsBetter Technical Analysis with Blockchain Indicators
Better Technical Analysis with Blockchain Indicators
 
Price Predictions for Cryptocurrencies
Price Predictions for CryptocurrenciesPrice Predictions for Cryptocurrencies
Price Predictions for Cryptocurrencies
 
Fascinating Metrics and Analytics About Cryptocurrencies
Fascinating Metrics and Analytics About CryptocurrenciesFascinating Metrics and Analytics About Cryptocurrencies
Fascinating Metrics and Analytics About Cryptocurrencies
 
Price PRedictions for Crypto-Assets Using Deep Learning
Price PRedictions for Crypto-Assets Using Deep LearningPrice PRedictions for Crypto-Assets Using Deep Learning
Price PRedictions for Crypto-Assets Using Deep Learning
 
Demystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data ScienceDemystifying Centralized Crypto Exchanges using Data Science
Demystifying Centralized Crypto Exchanges using Data Science
 
Crypto assets are a data science heaven rev
Crypto assets are a data science heaven revCrypto assets are a data science heaven rev
Crypto assets are a data science heaven rev
 
Implementing Machine Learning in the Real World
Implementing Machine Learning in the Real WorldImplementing Machine Learning in the Real World
Implementing Machine Learning in the Real World
 

Recently uploaded

%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 

Recently uploaded (20)

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 

Machine learning in the enterprise

  • 1. A Practical Guide to Enterprise Machine Learning Platforms By Tellago Research
  • 2. Contents
       Overview (p. 3)
       Key Characteristics of Enterprise Machine Learning Solutions (p. 3)
       Cloud vs. On-Premise Machine Learning Platforms (p. 5)
       Enterprise Cloud Machine Learning Platforms (p. 5)
         Azure Machine Learning (p. 6)
         AWS Machine Learning (p. 7)
         IBM Watson Developer Cloud (p. 8)
         Databricks (p. 9)
       On-Premise Enterprise Machine Learning Platforms (p. 9)
         Revolution Analytics (p. 10)
         Dato (p. 11)
         Spark MLlib and SparkR (p. 12)
         PredictionIO (p. 13)
         Scikit-Learn (p. 13)
       Summary (p. 14)
  • 3. Overview
       Machine learning is becoming one of the most important aspects of modern enterprise applications. Recent years have seen an explosion of innovation in machine learning platforms, taking machine learning from a domain restricted to a few data scientists to a mainstream developer audience. As a result, companies are now in a position to build comprehensive machine learning applications that were impossible just 2-3 years ago.
       The explosion in machine learning technologies doesn't come without a price for enterprises. As with any rapidly emerging technology trend, machine learning has experienced rapid growth in the number of new platforms and startups that provide relevant machine learning capabilities for enterprises. As a result, many enterprises struggle to navigate the new ecosystem of machine learning technologies and platforms.
       This paper provides an analysis of some of the most relevant technologies in the machine learning space, along with the experience that Tellago's data science practice team has gained implementing machine learning solutions in the real world. The analysis in this paper is based solely on practical experience and not on theoretical exercises.
       Key Characteristics of Enterprise Machine Learning Solutions
       Integration with Mainstream Data Stores
       Integration with diverse data stores is a key element for the mainstream adoption of machine learning platforms. Databases, SaaS platforms, ERPs and CRMs are just some of the data sources that can be relevant in machine learning scenarios. The ability to integrate seamlessly with different line-of-business systems drastically simplifies the adoption of machine learning platforms in enterprise environments.
  • 4. Integration with R and Python
       R and Python have been the main platforms used in machine learning and data science applications. Consequently, there are many widely adopted machine learning frameworks implemented in R and Python. Interoperability with R and Python libraries allows machine learning platforms to take advantage of the well-established data science practices and techniques implemented in those frameworks. In that sense, enterprises can benefit from machine learning platforms that can natively leverage R and Python libraries.
       Simple Infrastructure
       Scaling machine learning infrastructure can be a complex endeavor. Even worse, the complexity of configuring that infrastructure sometimes becomes a friction point for the early adoption of machine learning platforms. To avoid those challenges, enterprises should look for machine learning platforms that are relatively simple to set up and don't require massive investments in infrastructure. This allows organizations to focus on evaluating core machine learning capabilities instead of the infrastructure behind them.
       Programmatic Interfaces
       Executing and evaluating machine learning models is often seen as an activity performed exclusively by humans. However, incorporating machine learning models into business applications is highly relevant in the enterprise. To achieve that, machine learning platforms should support the programmatic execution of models via APIs or mainstream enterprise programming platforms such as .NET or Java.
       Monitoring and Management Tools
       Monitoring and managing the execution of machine learning models is essential to the adoption of this type of platform in enterprise environments. From the monitoring perspective, machine learning platforms should provide both analytics about the results of executed models and operational metrics related to the execution of those models. Additionally, organizations should
  • 5. favor machine learning platforms that provide a simple but robust management experience.
       Extensibility
       Until a few years ago, machine learning platforms were notoriously closed systems. That factor limited the mainstream adoption of these platforms in enterprise environments, as many machine learning solutions require complex levels of customization that involve extending the core platform. In that sense, organizations should carefully evaluate the extensibility models of machine learning platforms and analyze how those can help to optimize the platform for their specific scenarios.
       Cloud vs. On-Premise Machine Learning Platforms
       A simple way to make sense of the crowded machine learning platform market is to distinguish between cloud and on-premise platforms. For many organizations, the nature of the underlying infrastructure (cloud vs. on-premise) is a determining factor in which machine learning platforms to evaluate.
       Deciding between an on-premise and a cloud platform is a dilemma for most organizations, but it is even more relevant when it comes to data-centric platforms. While cloud machine learning platforms abstract the complexity of the underlying machine learning infrastructure and are rapidly driving innovation in the space, they lack the levels of control and extensibility that you can achieve with on-premise machine learning stacks.
       The next sections of this document provide an analysis of some of the most relevant cloud and on-premise platforms in the machine learning space.
       Enterprise Cloud Machine Learning Platforms
       Machine learning platforms are rapidly emerging as one of the most important components of platform as a service (PaaS) technologies. While the first iteration of cloud big data technologies focused on providing a seamless experience for hosting and provisioning a Hadoop-based infrastructure, the leading platforms in the space are rapidly adding value through data intelligence capabilities, including machine learning. This
  • 6. movement has been led by vendors like Microsoft, Amazon and IBM, which have added sophisticated machine learning capabilities to their existing PaaS offerings. Additionally, a large number of startups are trying to provide specialized machine learning cloud services that simplify the experience for organizations applying machine learning models to specific business scenarios. When analyzing the cloud machine learning platform space, organizations should consider Azure, AWS, IBM and Databricks as some of the leaders in the space.
       Azure Machine Learning
       - Overview: Azure Machine Learning is a fully managed service included in the Azure platform that allows the implementation of predictive analytics solutions using machine learning. The service provides interfaces for building, deploying and managing machine learning models and is tightly integrated with other Azure services. Currently, Azure Machine Learning is included as part of the Cortana Analytics Suite.
       - Key Capabilities: Azure Machine Learning includes some of the following capabilities:
         - Machine Learning Studio: Microsoft Azure Machine Learning Studio is a collaborative, drag-and-drop tool you can use to build, test, and deploy predictive analytics solutions on your data. Machine Learning Studio publishes models as web services that can easily be consumed by custom apps or BI tools such as Excel.
         - API Generation: Azure Machine Learning provides the infrastructure to expose machine learning models as APIs that can be programmatically accessed by client applications. These APIs can also be integrated with Azure API Management to enable more sophisticated management and monitoring features.
         - R and Python Extensibility: Azure Machine Learning allows developers to incorporate custom R and Python scripts into models. This extensibility mechanism allows developers to implement machine learning applications that combine the capabilities of Azure with many of the popular R and Python machine learning frameworks in the market.
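The API Generation capability above means that a published model is called like any other HTTP service. The following is a minimal sketch of how a client might build such a call using only the Python standard library. The endpoint URL, the API key, the `Inputs`/`ColumnNames`/`Values` payload layout and the bearer-token header are assumptions made for illustration; the actual contract is whatever the published service documents.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, api_key, columns, rows):
    """Build (but do not send) a POST request for a hypothetical scoring endpoint.

    The payload layout below is an assumption for illustration only.
    """
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": list(columns),
                "Values": [list(row) for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )


def score(request, timeout=10):
    """Send the request and decode the JSON response returned by the service."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would pass the result of `build_scoring_request(...)` to `score(...)`. Keeping request construction separate from the network call makes the client easy to unit test without a live service.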
  • 7. - Challenges: Azure Machine Learning is still relatively limited in terms of integration with on-premise data stores, which are predominant in the enterprise. Additionally, we feel Azure Machine Learning can benefit from more complete extensibility mechanisms beyond the ones provided by R and Python scripts.
       AWS Machine Learning
       - Overview: Amazon Machine Learning is a native AWS service that makes it easy for developers of all skill levels to use machine learning technologies. Amazon Machine Learning provides visualization tools and wizards that guide developers through the process of creating machine learning (ML) models without having to learn complex ML algorithms and technology. Amazon Machine Learning makes it easy to obtain predictions for your application using simple APIs, without having to implement custom prediction generation code or manage any infrastructure.
       - Key Capabilities: Amazon Machine Learning enables some of the following key capabilities:
         - Model Creation: AWS APIs and wizards make it easy for any developer to create and fine-tune ML models from data stored in different data stores and to query these models for predictions. The service's built-in data processors, scalable ML algorithms, interactive data and model visualization tools, and quality alerts help you build and refine your models quickly.
         - Prediction Services: Amazon Machine Learning provides the mechanisms to quickly and reliably generate predictions for your applications based on previously created machine learning models. The prediction services can be elastically scaled using AWS infrastructure.
         - Data Transformation DSL: Amazon Machine Learning includes a domain-specific language (DSL) that allows developers to model transformations on the data processed by machine learning models. Data transformations implemented using this DSL can be published as "recipes" and reused across other transformation processes.
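The idea of a reusable transformation "recipe" can be sketched in a few lines of Python. This is not Amazon's recipe syntax: the operation names, the recipe layout and the helper functions below are hypothetical, chosen only to show how a declarative list of transformations can be applied to any dataset with matching columns.

```python
from bisect import bisect_left


def lowercase(values):
    """Normalize text values to lower case."""
    return [v.lower() for v in values]


def quantile_bin(values, n_bins):
    """Replace each number with the index of its rank-based bin (0 .. n_bins - 1)."""
    ordered = sorted(values)
    return [
        min(n_bins - 1, bisect_left(ordered, v) * n_bins // len(ordered))
        for v in values
    ]


# Registry of the operations a recipe may refer to (hypothetical names).
TRANSFORMS = {"lowercase": lowercase, "quantile_bin": quantile_bin}


def apply_recipe(recipe, rows):
    """Apply each recipe step to its column, returning new rows."""
    out = [dict(row) for row in rows]  # copy so the input rows stay untouched
    for step in recipe:
        transform = TRANSFORMS[step["op"]]
        column = step["column"]
        new_values = transform([row[column] for row in out], *step.get("args", []))
        for row, value in zip(out, new_values):
            row[column] = value
    return out


# A recipe is plain data, so it can be stored, shared and reused.
RECIPE = [
    {"op": "lowercase", "column": "title"},
    {"op": "quantile_bin", "column": "age", "args": [2]},
]
```

Because the recipe is data rather than code, the same `RECIPE` can be applied to training data and to the records later sent for prediction, which keeps both on the same preprocessing path.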
  • 8. - Challenges: The experience of getting started with Amazon Machine Learning is relatively complex compared to its competitors in the space. We believe the service can benefit from incorporating more visual tools that facilitate the authoring of machine learning models. Another challenging factor in Amazon Machine Learning applications remains the communication with on-premise data stores.
       IBM Watson Developer Cloud
       - Overview: IBM Watson Developer Cloud is a series of cognitive data services included as part of the IBM Bluemix platform. The Watson Developer Cloud includes services such as vision analysis, text analytics, text-to-speech transformation and concept expansion, among a dozen others, that enable developers to incorporate deep learning and cognitive data capabilities within their applications.
       - Key Capabilities: The Watson Developer Cloud includes some of the following capabilities:
         - Text Analytics: Watson Developer Cloud provides a large number of text analytics services, including relationship extraction, concept insights and sentiment analysis. These services can be easily integrated with other machine learning or business applications.
         - Vision Analytics: Watson Developer Cloud provides a group of innovative services that abstract key image analysis capabilities such as face recognition, object detection and image link extraction. These services can complement image libraries required in line-of-business applications and solutions.
         - Integration with Bluemix Services: Watson Developer Cloud is included as part of IBM Bluemix and, consequently, is tightly integrated with other Bluemix platform services. As a result, developers can implement robust applications that leverage cognitive data services.
       - Challenges: Watson Developer Cloud is a collection of APIs that enable cognitive data capabilities. As a result, Watson Developer Cloud is typically
  • 9. used as a complement to machine learning applications and can't be considered a complete machine learning solution.
       Databricks
       - Overview: Databricks is a cloud-integrated platform that enables the implementation and operation of Apache Spark applications. As part of its current capabilities, Databricks provides strong support for Spark MLlib and SparkR.
       - Key Capabilities:
         - Model Performance: Databricks provides a highly scalable architecture that powers the performance of Spark MLlib models. This capability allows developers to focus on writing Spark MLlib solutions without worrying about the underlying infrastructure.
         - Support for R: In addition to Spark MLlib, Databricks provides support for SparkR. This capability allows developers to write sophisticated applications that combine traditional machine learning and R models.
         - On-premise Support: One of the biggest advantages of Databricks is that it is completely based on Apache Spark. That model allows developers to write machine learning applications that can work seamlessly in both on-premise and cloud topologies.
       - Challenges: The current feature set of Spark MLlib and SparkR is relatively limited compared to that of some cloud competitors. Additionally, Databricks is a Spark-exclusive cloud, which means that it doesn't include complementary platform services comparable to the ones provided by PaaS solutions like Azure, AWS or Bluemix.
       On-Premise Enterprise Machine Learning Platforms
       Similar to the cloud space, the on-premise machine learning space is experiencing an explosion in the number of technologies and platforms that enable the implementation of enterprise-ready machine learning solutions. Unlike in the cloud space, new on-premise machine learning technologies seem to be actively built on popular open source data science frameworks such as R and Python instead
  • 10. of building proprietary stacks. As a result, many of the leading machine learning platforms are also delivered as open source distributions. The following sections of this paper evaluate some of the key on-premise machine learning stacks: Revolution Analytics, Dato, Spark, PredictionIO and Scikit-learn. Revolution Analytics  Overview: Revolution R Enterprise provides the infrastructure for implementing enterprise-ready analytics applications based on R. Supporting a variety of big data statistics, predictive modeling and machine learning capabilities, Revolution R Enterprise is also 100% R. Revolution R Enterprise supports a variety of analytical capabilities including exploratory data analysis, model building and model deployment.  Key Capabilities: Revolution R provides some of the following key capabilities: o Scalable R: Revolution R Enterprise scales and accelerates R, running R scripts in a high-performance, parallel architecture that supports systems from workstations to clusters and grids, including Hadoop and enterprise data warehouses. o Enterprise-Ready R Capabilities: Revolution R expands R with enterprise-ready capabilities such as logging, instrumentation, security and monitoring, among other features that are essential to operationalize R solutions in the enterprise. o Integration with Mainstream Analytic Tools: Revolution R provides integration with many of the most popular analytics tools in the enterprise such as Tableau, Excel or QlikView. Additionally, Revolution R integrates with traditional reporting platforms such as Cognos and Business Objects.  Challenges: Revolution R is optimized for authoring applications in the R language. This model can prove limiting for the implementation of complete enterprise applications. Additionally, the applications implemented with Revolution R can be complex to integrate into other enterprise solutions.
  • 11. Dato  Overview: Dato enables the rapid development, simple deployment, and robust management of real-time services and applications that use machine learning. Dato leverages the advancements in Python machine learning libraries to enable the implementation of highly sophisticated, enterprise-ready machine learning solutions. The Dato platform includes three key products: GraphLab Create, Dato Distributed and Dato Predictive Services.  Key Capabilities: o Model Creation: Dato’s GraphLab Create is an extensible machine learning framework that enables developers and data scientists to easily build and deploy intelligent applications and services at scale. It includes distributed data structures and rich libraries for data transformation and manipulation as well as scalable task-oriented machine learning toolkits for creating, evaluating, and improving machine learning models. o Scalable Execution: The Dato platform includes Dato Distributed, a server product that allows distributed execution of machine learning jobs on a cluster of machines. Jobs can include distributed training of machine learning models, parallel model scoring and predictions, distributed hyperparameter tuning, model ensembling, and evaluation tasks. This capability abstracts the complexities of scaling machine learning models in enterprise environments. o API Access: Dato Predictive Services enables the execution of Dato machine learning models as high performance APIs. This capability allows developers to easily incorporate machine learning models into new applications without having to use any proprietary libraries.  Challenges: As with any new product, enterprises adopting Dato face the challenge of embracing a product without a large community of developers and system implementers. However, the communities around Dato are rapidly growing. Additionally, Dato is completely Python-centric, which makes it challenging to adopt for organizations without that in-house expertise.
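The distributed hyperparameter tuning mentioned under Scalable Execution can be illustrated, in a very reduced form, with the Python standard library alone. The sketch below is not Dato's API: the function names (`fit_ridge_slope`, `validation_error`, `tune`), the toy data and the candidate grid are all assumptions made for illustration, and a thread pool on a single machine stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data (hypothetical): y is roughly 2 * x.
TRAIN = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
VALID = [(5.0, 10.1), (6.0, 11.8)]


def fit_ridge_slope(data, penalty):
    """Closed-form slope of a no-intercept ridge regression."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + penalty)


def validation_error(penalty):
    """Train with one candidate penalty and score it on held-out data."""
    slope = fit_ridge_slope(TRAIN, penalty)
    mse = sum((y - slope * x) ** 2 for x, y in VALID) / len(VALID)
    return penalty, mse


def tune(candidates, workers=4):
    """Evaluate every candidate concurrently; return the best (penalty, mse)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(validation_error, candidates))
    return min(results, key=lambda item: item[1])


GRID = [0.0, 0.1, 1.0, 10.0, 100.0]
best_penalty, best_mse = tune(GRID)
print(best_penalty, best_mse)
```

Each candidate is trained and scored independently of the others, which is what makes this kind of job easy to spread across workers; a platform product would add scheduling, data distribution and failure handling on top of the same idea.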
  • 12. Spark MLlib and Spark R  Overview: Apache Spark includes two main libraries for machine learning applications: Spark MLlib and Spark R. MLlib is Spark’s scalable machine learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering and dimensionality reduction, as well as underlying optimization primitives. Spark R is an R package that provides a light-weight frontend to use Apache Spark from R. Spark R provides a distributed data frame implementation that supports operations like selection, filtering and aggregation (similar to R data frames and dplyr) but on large datasets.  Key Capabilities: Spark provides the following key capabilities for machine learning applications: o Scalability: Because Spark MLlib and Spark R are built on the Spark platform, they enjoy the scalability and performance benefits of the Spark architecture. In that sense, Spark machine learning models can run across large topologies with hundreds of nodes and recover from unexpected errors. o Support for R: The addition of Spark R offers developers the distinctive option of combining R and machine learning models as part of the same applications. More importantly, both Spark R and Spark MLlib are provisioned, scaled and managed using the same underlying infrastructure. o Developer and System Integrator Community: Apache Spark is enjoying a rapidly growing community of developers and system integrators. As a result, organizations can count on strong support for machine learning applications built on Apache Spark and Spark R.  Challenges: Building the infrastructure required to run Spark MLlib and Spark R applications at enterprise scale can be a very complex endeavor. Additionally, the tools to fully operationalize Spark MLlib and Spark R applications are still limited compared to other platforms in the space.
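The distributed data frame operations described in the overview (filtering and aggregation over large datasets) follow a split, apply and combine pattern: each partition is processed independently and the partial results are then merged. The sketch below imitates that pattern in plain Python. It does not use Spark or Spark R, and the record layout, partition count and function names are assumptions chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records: (region, amount).
ROWS = [("east", 10.0), ("west", 4.0), ("east", 7.5),
        ("north", 1.0), ("west", 6.0), ("east", 2.5)]


def partition(rows, n):
    """Split rows into n partitions round-robin, as a cluster would shard them."""
    return [rows[i::n] for i in range(n)]


def partial_sum(rows, min_amount):
    """Per-partition work: filter rows, then aggregate amounts by region."""
    totals = defaultdict(float)
    for region, amount in rows:
        if amount >= min_amount:        # filtering
            totals[region] += amount    # aggregation
    return dict(totals)


def merge(partials):
    """Combine step: add up the per-partition totals."""
    combined = defaultdict(float)
    for part in partials:
        for region, total in part.items():
            combined[region] += total
    return dict(combined)


def total_by_region(rows, min_amount=2.0, partitions=3):
    """Filter and aggregate the rows partition by partition, then merge."""
    parts = partition(rows, partitions)
    return merge(partial_sum(p, min_amount) for p in parts)


print(total_by_region(ROWS))
```

Because the per-partition step never needs data from another partition, the result does not depend on how many partitions are used, which is the property that lets an engine spread the same work over many nodes.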
  • 13. PredictionIO  Overview: PredictionIO is an open-source machine learning server for developers and data scientists to build and deploy predictive applications in a fraction of the time. The PredictionIO template gallery offers a wide range of predictive engine templates that developers can download and easily customize. PredictionIO is built on top of Apache Spark and expands it with enterprise-ready capabilities such as event-based activations, API generation and monitoring tools.  Key Capabilities: o Template Based Authoring: PredictionIO provides a model for authoring simple machine learning applications based on templates. These templates abstract some of the underlying complexity of a machine learning model and can be extended and customized for specific scenarios. o Event Based Activation: PredictionIO includes an event server component that enables the asynchronous activation of machine learning engines. This architecture provides a scalable model to execute machine learning applications across diverse topologies. o Monitoring and Management Tools: PredictionIO extends Apache Spark with sophisticated management and monitoring tools that facilitate the operational readiness of machine learning applications.  Challenges: Although incredibly easy to use for simple machine learning scenarios, PredictionIO can prove limiting for the implementation of more complex models. Additionally, PredictionIO still hasn’t been able to build large developer and system integrator communities or streamline its implementation in enterprise environments. Scikit-learn  Overview: Scikit-learn is a framework that provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive simplified BSD license and is distributed with many Linux distributions, encouraging academic and commercial use.
  • 14.  Key Capabilities: o Rich Machine Learning Algorithm Library: Scikit-learn provides what can be considered the richest collection of machine learning algorithms of any framework in the space. The framework also combines features from popular frameworks like NumPy, SciPy or SymPy to provide sophisticated capabilities in areas such as symbolic mathematics or scientific computing. o Simple Programming Model: Despite its large feature set, Scikit-learn provides a very simple programming model that allows developers without strong expertise in machine learning to implement highly sophisticated data science applications. o Rich Data Visualizations: Scikit-learn provides a strong set of data visualization capabilities that can be combined with the machine learning model to rapidly evaluate the effectiveness of the models.  Challenges: Scikit-learn is a programming framework and not a machine learning platform. In that sense, Scikit-learn does not provide the scalability models or the monitoring and management tools typically included in machine learning platforms. As a result, enterprises should look to leverage the rich capabilities of Scikit-learn in conjunction with other machine learning platforms to implement enterprise-ready data science solutions. Summary Machine learning is becoming one of the most relevant aspects of data intelligence solutions in the enterprise. Enterprises evaluating machine learning platforms should consider both cloud and on-premise options. Cloud enterprise machine learning platforms excel at abstracting the underlying infrastructure needed to run and scale machine learning models. On-premise enterprise machine learning platforms offer rich extensibility models and typically rely on open source distribution channels. Platforms like Azure, AWS and IBM are leading the charge in the cloud enterprise machine learning space. Vendors like Databricks are also bringing a lot of innovation to the space.
In the on-premise arena, companies like Dato and PredictionIO as well as
  • 15. popular open source frameworks like Apache Spark or Scikit-learn are some of the robust options for enterprises building data science solutions. This paper presented an analysis of some of the key machine learning platforms, including their strengths and weaknesses, based on our experience in real-world implementations.