High-Performance Data Mining Platforms: Things to Consider
The computing environment is critical to your success in big data and data mining projects. It
comprises four important resources: network, disk, central processing units (CPUs), and memory.
Time-to-solution goals, expected data volumes, and budget will direct your decisions about
computing resources. The appropriate computing platform for data analysis depends on many
dimensions, primarily the volume of data (initial, working-set, and output volumes), the pattern of
access to the data, and the analysis algorithm. These vary with the phase of the data analysis.
Here are some key questions to consider when you are trying to build or purchase a high-performance
data mining platform for your organization.
• What is the size of your data now?
• What is the anticipated growth rate of your data in the next few years?
• Is the data you are storing mostly structured data or unstructured data?
• What data movement will be required to complete your data mining projects?
• What percentage of your data mining projects can be solved on a single machine?
• What other changes are required to improve data processing power? Do you need to buy
additional multi-core machines or servers? How much could you save by processing the data in the
cloud?
• How can you leverage the existing IT infrastructure to run analytics applications faster and more smoothly?
• How many person-hours are required to build new big data and advanced analytics
capabilities in the current environment?
• How many existing applications and data sources must be integrated to build
efficient big data processing capabilities?
• Do you need to redesign organizational metadata to create a single, consistent version of the data?
• Is your data mining software a good match for the computing environment you are designing?
• What are your users' biggest complaints about the current system?
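The first two questions (current size and anticipated growth) can be turned into a rough capacity check. The following is a minimal sketch, assuming compound annual growth and a single capacity figure such as the memory or disk of one machine; the function names and the example numbers are hypothetical and not taken from any particular tool.

```python
from typing import Optional


def projected_size_gb(current_gb: float, annual_growth: float, years: int) -> float:
    """Projected data size after `years`, assuming compound growth.

    `annual_growth` is a fraction: 0.5 means the data grows 50% per year.
    """
    return current_gb * (1.0 + annual_growth) ** years


def years_until_exceeds(current_gb: float, annual_growth: float,
                        capacity_gb: float, horizon: int = 10) -> Optional[int]:
    """First year within `horizon` in which the projection exceeds `capacity_gb`.

    Returns None if the capacity is not exceeded within the horizon.
    """
    for year in range(horizon + 1):
        if projected_size_gb(current_gb, annual_growth, year) > capacity_gb:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical example: 200 GB today, 50% yearly growth, 512 GB of capacity.
    print(years_until_exceeds(200.0, 0.5, 512.0))
```

A result of a year or two suggests that a single-machine design will be outgrown quickly, which feeds directly into the question of how many projects can be solved on one machine.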
The purchasing decision for the data mining computing platform will likely be made by the IT
organization, but IT must first understand the problems being solved now and the problems that will
need to be solved in the near future. Weighing platform trade-offs against the needs of the
organization, and sharing that analysis transparently, will lead to the best outcome for the entire
organization and for individual stakeholders.
V 1.0- Author: Ashish Jain Date: 13-Apr-2015
The figure below compares several big data technologies and highlights the comparative strengths
and weaknesses of the different types of systems.
MPP databases: massively parallel processing databases, also called enterprise data warehouses
(EDWs)
IMDBs: in-memory databases (SAP HANA, Oracle Exalytics)
Hadoop: based on a distributed file system
NoSQL databases: Cassandra, HBase, MongoDB, Couchbase, etc.
You can learn more about in-memory databases at the following URLs:
http://www.mcobject.com/in_memory_database
http://www.opensourceforu.com/2012/01/importance-of-in-memory-databases/
http://searchsap.techtarget.com/feature/Faceoff-SAP-HANA-a-full-in-memory-database-unlike-Oracle-Exalytics
You can learn more about NoSQL databases at the following URLs:
https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models
http://nosql-database.org/
                                          In-Memory   MPP         Big Data                NoSQL
                                          Database    Database    Appliance   Hadoop      Database
Consistent                                ●           ●           ●           ▲           ▲
Available                                 ●           ●           ●           ▲           ▲
Fault tolerant                            ●           ●           ▲           ●           ●
Suitable for real-time transactions       ●           ●           ●           ♦           ♦
Suitable for analytics                    ▲           ▲           ●           ●           ♦
Suitable for extremely large data size    ♦           ▲           ▲           ●           ●
Suitable for unstructured data            ♦           ♦           ▲           ●           ●

● Meets widely held expectations
▲ Potentially meets widely held expectations
♦ Fails to meet widely held expectations
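The comparison can also be expressed as a small lookup table for shortlisting platforms against a set of requirements. This is only a sketch: it assumes the figure's columns are, in order, In-Memory Database, MPP Database, Big Data Appliance, Hadoop, and NoSQL Database, and the criterion keys and the `candidates` function are hypothetical names introduced here for illustration.

```python
# Assumed column order of the figure (an assumption, see lead-in).
PLATFORMS = [
    "In-Memory Database",
    "MPP Database",
    "Big Data Appliance",
    "Hadoop",
    "NoSQL Database",
]

# One character per platform: M = meets, P = potentially meets, F = fails
# to meet widely held expectations. Keys are hypothetical short names.
RATINGS = {
    "consistent":     "MMMPP",
    "available":      "MMMPP",
    "fault_tolerant": "MMPMM",
    "realtime_tx":    "MMMFF",
    "analytics":      "PPMMF",
    "large_data":     "FPPMM",
    "unstructured":   "FFPMM",
}


def candidates(required, allow_potential=False):
    """Return platforms whose rating is acceptable for every required criterion."""
    accepted = {"M", "P"} if allow_potential else {"M"}
    return [
        name
        for index, name in enumerate(PLATFORMS)
        if all(RATINGS[criterion][index] in accepted for criterion in required)
    ]


if __name__ == "__main__":
    print(candidates(["analytics", "large_data"]))
    print(candidates(["analytics", "large_data"], allow_potential=True))
```

A shortlist like this is only a starting point; the ratings reflect general expectations, so they should be weighed against the checklist questions for your own workload.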