BIG DATA INFRASTRUCTURE AND
ANALYTICS SOLUTION
Erdenebayar Erdenebileg, Oyun-Erdene Namsrai
School of Information Technology, National University of Mongolia
erdenebayar.erdenebileg@gmail.com, oyunerdene@num.edu.mn
Overview

• Introduction
• Methods
• Proposed methods
• Experimental results
• Related work
• Discussion
• Future work

School of Information Technology, National University of Mongolia

FITAT/ISPM 2013
Introduction
• BIG DATA comes from structured
and unstructured information (web
data, market purchases, credit card
transactions …)
• BIG DATA: 10% is structured data, but
90% is unstructured data

• Nowadays, almost every organization
in Mongolia is facing BIG DATA
problems.
• They need to analyze their valuable
information and make predictions from it

Why?
How?

Why?
Why are we facing the BIG DATA problem?
Big Data: 3V’s
We are facing the big data problem
for reasons of Volume, Variety, and
Velocity:
• Transactional data is growing day
by day
• Different types of data must be stored
• Data needs to be processed fast

[Figure: the 3V’s of Big Data.
Data Velocity (fast analyzing requirement): Batch, Periodic, Near Real Time, Real Time.
Data Variety (many types of data): Table, Database, Web, Social, Photo, Audio, Video, Mobile, Unstructured.
Data Volume (large amount of data): MB, GB, TB, PB.]
How?
How to solve the BIG DATA problem?
How to solve the problem?

The full solution is:
1. To construct the BIG DATA
infrastructure
2. To find and develop data
transmission tools
3. To implement warehousing and
mining tools and techniques
4. To provide BI and analytics tools

[Figure: solution layers: data sources (structured, semi-structured, unstructured); data transmission tools; BIG DATA infrastructure; warehousing and mining tools and techniques; BI and analytics tools.]

Methods and Comparison?
RDBMS versus NoSQL database?
RDBMS-based infrastructure
From my experiments:
• Optimization adds cost
(licenses and servers), although
the license cost does not apply
to open source RDBMSs
• RDBMS does not perform well with
more than gigabytes of data
• It is not suited to storing
unstructured data (video, audio,
etc.)

HADOOP-based infrastructure
From the experience of the biggest
companies (Facebook, Yahoo,
Twitter …), the main advantages
are:
• Distributed File System
paradigm
• Powerful parallel computing
framework (MapReduce)
• It can store any type of
data: structured,
semi-structured, and unstructured
data
• It is open source and integrates
easily with Hadoop-related
products

Brief introduction: HDFS Architecture
[Figure: HDFS architecture: a NameNode and a BackupNode; balancing, replication, failover; four DataNodes.]
Each DataNode stores its data on local disks
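
As a rough illustration of the replication idea in the architecture above, the following toy sketch splits a file into fixed-size blocks and assigns each block to several DataNodes in round-robin order. It is a simplification and not the actual HDFS placement policy; the block size, replication factor, node names, and the functions `place_blocks` and `surviving_blocks` are all assumptions made for this example.

```python
from itertools import cycle


def place_blocks(file_size, block_size, datanodes, replication=2):
    """Toy block placement: returns {block_index: [datanode, ...]}.

    Assumption: plain round-robin placement, which is NOT the real
    HDFS policy (HDFS also considers racks and free space).
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    n_blocks = -(-file_size // block_size)  # ceiling division
    ring = cycle(datanodes)
    placement = {}
    for block in range(n_blocks):
        # Consecutive nodes from the ring are distinct because
        # replication <= len(datanodes).
        placement[block] = [next(ring) for _ in range(replication)]
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks that still have at least one replica after one node fails."""
    return [block for block, nodes in placement.items()
            if any(node != failed_node for node in nodes)]


# Hypothetical four-node cluster, matching the four DataNodes in the figure.
nodes = ["datanode1", "datanode2", "datanode3", "datanode4"]
layout = place_blocks(file_size=300, block_size=64, datanodes=nodes, replication=2)
```

With two replicas per block, `surviving_blocks(layout, "datanode1")` still lists every block, which is the point of replication: a single DataNode failure does not lose data.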
Brief introduction: MapReduce framework
1. We have a big data set (shown in green)
2. The data is split across different servers
3. Each server aggregates and calculates on its part of the data
4. The consolidated result is returned to the client

[Figure: a Job Tracker and four Task Tracker servers; the example data is labeled by year (2010, 2011, 2012, 2013).]
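
The four steps above can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code: real MapReduce runs the per-partition work on separate Task Tracker machines. The record layout `(year, amount)` and the function names are assumptions made for the example.

```python
from collections import defaultdict


def split_by_year(records):
    """Step 2: partition records so each 'server' gets one year's data."""
    partitions = defaultdict(list)
    for year, amount in records:
        partitions[year].append(amount)
    return partitions


def aggregate(amounts):
    """Step 3: per-server aggregation (here, a sum and a count)."""
    return {"total": sum(amounts), "count": len(amounts)}


def run_job(records):
    """Steps 1-4: split, aggregate each partition, consolidate the results."""
    partitions = split_by_year(records)
    return {year: aggregate(amounts)
            for year, amounts in sorted(partitions.items())}


# Step 1: the (hypothetical) input data set.
records = [(2010, 5), (2011, 7), (2010, 3), (2013, 1), (2012, 4), (2011, 2)]
result = run_job(records)
```

Because each partition is aggregated independently, the per-year work could run on different machines and only the small summaries would need to travel back to the client.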

Proposed method & solution
Hadoop and open source technologies
Proposed method selection (Hadoop stack)
The proposed method was selected for the following reasons:
• Data should be stored in a distributed system
• Aggregation and calculation should follow a parallel computing
paradigm
• The data is both structured and unstructured: mobile
call detail records (CDR)
• Data size is about 20TB
• The method should use open source technologies

Full Infrastructure (3 main methods)
[Figure: full infrastructure, in three parts.
Client machine (Jasper Business Intelligence): client software (reporting tool); JasperReports Server with Hive connector and HBase connector.
Clustered Big Data infrastructure and data processing: Machine 1 (Slave Hadoop) and Machine 2 (Master Hadoop), on physical machines (resources).
Data sender and data resources: sensor data (phone, web log, camera, etc.); structured, semi-structured, and unstructured data going into the Big Data infrastructure.]

Method 1: Clustered Big Data Infrastructure and Data Processing
• The first task is configuring the BIG DATA infrastructure with analytics products
• This configuration is a cluster of TWO physical machines

Method 2: Data transmission
• Data resources consist of RDBMS data
and unstructured data (CDR files,
video …)
• If the data is structured and stored
in, for example, relational databases,
we need Sqoop for bulk data
transfer
• If the data is unstructured, such
as video and files, we need to develop
a custom application using an
HDFS client (SSH)

• Manual data transfer
• Automatic data transfer
(custom application)
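
The choice between the two transfer paths can be sketched as a small routing function. This is only an illustration under stated assumptions: it builds command lines without running them, it uses `hadoop fs -put` as a stand-in for the custom HDFS client application, and the dictionary keys, JDBC URL, table name, and paths are hypothetical.

```python
def transfer_command(source):
    """Return the command line (as a list) for moving one source into HDFS.

    `source` is a dict with hypothetical keys: 'kind' and 'hdfs_dir', plus
    'jdbc_url' and 'table' for structured sources, or 'local_path' for
    unstructured ones.
    """
    target = source["hdfs_dir"]
    if source["kind"] == "structured":
        # Bulk import from a relational database with Sqoop.
        return ["sqoop", "import",
                "--connect", source["jdbc_url"],
                "--table", source["table"],
                "--target-dir", target]
    if source["kind"] == "unstructured":
        # Copy a local file (CDR file, video, ...) with the HDFS shell client.
        return ["hadoop", "fs", "-put", source["local_path"], target]
    raise ValueError(f"unknown source kind: {source['kind']}")


# Hypothetical sources, for illustration only.
db_source = {"kind": "structured", "hdfs_dir": "/data/customers",
             "jdbc_url": "jdbc:mysql://dbhost/crm", "table": "customers"}
file_source = {"kind": "unstructured", "hdfs_dir": "/data/cdr",
               "local_path": "/tmp/cdr_sample.csv"}

db_cmd = transfer_command(db_source)
file_cmd = transfer_command(file_source)
```

An automatic transfer application could hand such a list to `subprocess.run` on a schedule; the manual path corresponds to typing the same command by hand.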

Method 3: Analytics solution over the BIG DATA
This is the main method; it tries to address
the following concepts:
• Reporting (What happened?)
• Analysis (Why did it happen?)
• Monitoring (What is happening now?)
• Prediction (What will happen?)

Business Intelligence: almost every organization is doing this now.
Predictive Analytics: organizations are focusing on this now.

[Figure: the four concepts plotted on two axes, complexity and business value.]
Method 3: Analytics solution over the BIG DATA
• This describes how to report, analyze, monitor, and predict over the
BIG DATA infrastructure
[Figure: analytics data flow.
Hadoop Distributed File System (resources): sensor data, Hive tables, HBase tables, mined data.
Hive warehouse: Hive table creation; data summarization and analysis (reporting, analyzing, monitoring) with Hive Query Language (HQL); direct access to HDFS; outputs are aggregated data and ad-hoc queries.
HBase table management: HBase table creation (reporting, analyzing, monitoring); direct access to HDFS; HBase queries through a Thrift server.
Mahout machine learning and data mining: mining (prediction) over sensor data.
End user (analytic tool).]
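
The summarization step in the Hive warehouse amounts to a GROUP BY aggregation. The sketch below shows that kind of query using Python's built-in `sqlite3` module as a local stand-in for Hive; it does not connect to Hive, and the table `cdr` and its columns are hypothetical names chosen for the example.

```python
import sqlite3

# In-memory database standing in for a warehouse table (assumption:
# a call detail record table with caller, date, and duration columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cdr (caller TEXT, call_date TEXT, duration_sec INTEGER)"
)
conn.executemany(
    "INSERT INTO cdr VALUES (?, ?, ?)",
    [("A", "2013-01-01", 60), ("A", "2013-01-01", 120),
     ("B", "2013-01-01", 30), ("A", "2013-01-02", 45)],
)

# Daily summary: number of calls and total call time per day.
summary = conn.execute(
    "SELECT call_date, COUNT(*) AS calls, SUM(duration_sec) AS total_sec "
    "FROM cdr GROUP BY call_date ORDER BY call_date"
).fetchall()
conn.close()
```

A reporting tool would read a small aggregated table like `summary` rather than scanning the raw records on every request.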

Experimental results
Testing, Monitoring, Working
Experimental results
The experimental work focused on the following main jobs:
1. Install and configure the BIG DATA infrastructure (a cluster of 2
physical machines)
2. Import sample unstructured data into HDFS (the Big Data
infrastructure) using SSH
3. Run sample HiveQL queries, HBase queries, and Mahout jobs over
the MapReduce framework

Running and monitoring HDFS and MapReduce framework
Sample results: HDFS and MapReduce

Master Machine:
DataNode, JobTracker,
NameNode, SNN,
TaskTracker are running

Slave Machine:
DataNode, TaskTracker
are running

Running and working Hive warehouse
Sample results: Hive warehouse and HiveQL

Running and working HBase table management
Sample results: HBase table management and RESTful web service

Future work and Conclusion
Continue data mining research
Future work

Continue my research on BIG DATA
and analytics solutions:
1. Validate the proposed infrastructure with real-world data
(mobile call logs, camera sensors)
2. Keep researching new technologies to support our
architecture
3. Predict and analyze real data over the infrastructure
(market basket analysis, recommendation, etc.)
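
To make the market basket idea concrete, here is a minimal plain-Python sketch that counts how often pairs of items are bought together. It does not use Mahout, and the item names and the function `pair_support` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return {(item_a, item_b): fraction of baskets containing both}."""
    counts = Counter()
    for basket in baskets:
        # Sort and deduplicate so each pair has one canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical purchase baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk"],
]
support = pair_support(baskets)
```

Pairs with high support, such as bread and milk here, are candidates for recommendations; on real data the counting step is the part that would be distributed over the cluster.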

Conclusion
1. This is a full analytics solution for analyzing big data
over the Hadoop Distributed File System:
- Reporting (What happened?) (Hive)
- Analysis (Why did it happen?) (Hive, HBase)
- Monitoring (What is happening now?) (Hive)
- Prediction (What will happen?) (Mahout)

Thank you
Questions?

More Related Content

What's hot

IBM-Why Big Data?
IBM-Why Big Data?IBM-Why Big Data?
IBM-Why Big Data?Kun Le
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarDatameer
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics ArchitectureArvind Sathi
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practiceVivek Murugesan
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter AnalyticsAdrian Turcu
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsAlan Quayle
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonCapgemini
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Denodo
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forumbigdatawf
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...Capgemini
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop SampleAlan Quayle
 
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...Denodo
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big DataDataWorks Summit
 
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Impetus Technologies
 
1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...IBM
 
RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategySustainableEnergyAut
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersDataWorks Summit
 

What's hot (20)

IBM-Why Big Data?
IBM-Why Big Data?IBM-Why Big Data?
IBM-Why Big Data?
 
How to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics WebinarHow to Avoid Pitfalls in Big Data Analytics Webinar
How to Avoid Pitfalls in Big Data Analytics Webinar
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Overview of analytics and big data in practice
Overview of analytics and big data in practiceOverview of analytics and big data in practice
Overview of analytics and big data in practice
 
IBM Smarter Analytics
IBM Smarter AnalyticsIBM Smarter Analytics
IBM Smarter Analytics
 
Telco Big Data 2012 Highlights
Telco Big Data 2012 HighlightsTelco Big Data 2012 Highlights
Telco Big Data 2012 Highlights
 
Traditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A ComparisonTraditional BI vs. Business Data Lake – A Comparison
Traditional BI vs. Business Data Lake – A Comparison
 
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
Accelerate Digital Transformation with Data Virtualization in Banking, Financ...
 
Big Data World Forum
Big Data World ForumBig Data World Forum
Big Data World Forum
 
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
EMC World 2014 Breakout: Move to the Business Data Lake – Not as Hard as It S...
 
Ibm big data
Ibm big dataIbm big data
Ibm big data
 
Telco Big Data Workshop Sample
Telco Big Data Workshop SampleTelco Big Data Workshop Sample
Telco Big Data Workshop Sample
 
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
Extended Data Warehouse - A New Data Architecture for Modern BI with Claudia ...
 
Deutsche Telekom on Big Data
Deutsche Telekom on Big DataDeutsche Telekom on Big Data
Deutsche Telekom on Big Data
 
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
Big Data Use Cases for Different Verticals and Adoption Patterns - Impetus We...
 
1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...1524 how ibm's big data solution can help you gain insight into your data cen...
1524 how ibm's big data solution can help you gain insight into your data cen...
 
BD&A Day
BD&A Day BD&A Day
BD&A Day
 
Haven 2 0
Haven 2 0 Haven 2 0
Haven 2 0
 
RFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data StrategyRFT for Business Intelligence and Data Strategy
RFT for Business Intelligence and Data Strategy
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 

Similar to Big Data Infrastructure and Analytics Solution on FITAT2013

IRJET- Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop FrameworkIRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET- Youtube Data Sensitivity and Analysis using Hadoop FrameworkIRJET Journal
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Aravindharamanan S
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about DataBigDataExpo
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)Piet J.H. Daas
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Jeffrey Sica
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohenTaldor Group
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsJ Singh
 
Real time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationReal time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationLeMeniz Infotech
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformIRJET Journal
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesRukshan Batuwita
 
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Love Arora
 
Big data presentation (2014)
Big data presentation (2014)Big data presentation (2014)
Big data presentation (2014)Xavier Constant
 
[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big Data[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big DataSnapLogic
 
Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Mr.Sameer Kumar Das
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewIJERA Editor
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )fmarukanda
 
2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdfjimjones227147
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...IRJET Journal
 

Similar to Big Data Infrastructure and Analytics Solution on FITAT2013 (20)

IRJET- Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop FrameworkIRJET-  	  Youtube Data Sensitivity and Analysis using Hadoop Framework
IRJET- Youtube Data Sensitivity and Analysis using Hadoop Framework
 
Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1Eecs6893 big dataanalytics-lecture1
Eecs6893 big dataanalytics-lecture1
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about Data
 
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)ESSnet Big Data WP8 Methodology (+ Quality, +IT)
ESSnet Big Data WP8 Methodology (+ Quality, +IT)
 
Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)Data Ingestion At Scale (CNECCS 2017)
Data Ingestion At Scale (CNECCS 2017)
 
5. big data vs it stki - pini cohen
5. big data vs  it    stki - pini cohen5. big data vs  it    stki - pini cohen
5. big data vs it stki - pini cohen
 
Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and Tradeoffs
 
Real time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing applicationReal time big data analytical architecture for remote sensing application
Real time big data analytical architecture for remote sensing application
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Big Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our LivesBig Data and Data Science: The Technologies Shaping Our Lives
Big Data and Data Science: The Technologies Shaping Our Lives
 
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
Big Data Handling Technologies ICCCS 2014_Love Arora _GNDU
 
Big data presentation (2014)
Big data presentation (2014)Big data presentation (2014)
Big data presentation (2014)
 
[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big Data[Infographic] Uniting Internet of Things and Big Data
[Infographic] Uniting Internet of Things and Big Data
 
Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53Sameer Kumar Das International Conference Paper 53
Sameer Kumar Das International Conference Paper 53
 
Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )f
 
2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf2019-09-05Federated Learning.pdf
2019-09-05Federated Learning.pdf
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
 

Recently uploaded

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Recently uploaded (20)

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...

Big Data Infrastructure and Analytics Solution on FITAT2013

  • 1. BIG DATA INFRASTRUCTURE AND ANALYTICS SOLUTION Erdenebayar Erdenebileg, Oyun-Erdene Namsrai School of Information Technology, National University of Mongolia erdenebayar.erdenebileg@gmail.com, oyunerdene@num.edu.mn
  • 2. Overview: Introduction; Methods; Proposed methods; Experimental results; Related work; Discussion; Future work. School of Information Technology, National University of Mongolia, FITAT/ISPM 2013
  • 3. Introduction • Big data comes from structured and unstructured sources (web data, market purchases, credit card transactions, …) • Of big data, 10% is structured and 90% is unstructured • Today almost every organization in Mongolia faces big data problems • They need to analyze their valuable information and make predictions from it. Why? How?
  • 4. Why? Why are we facing the BIG DATA problem?
  • 5. Big Data: the 3 V's. We face the big data problem for reasons of Volume, Variety, and Velocity: • Transactional data is growing day by day • Different types of data must be stored • Data needs to be processed fast. [Figure: the 3 V's. Data Velocity (fast analysis requirement): batch, periodic, near real time, real time. Data Variety (many types of data): table, database, web, social, photo, audio, video, mobile, unstructured. Data Volume (large amount of data): MB, GB, TB, PB.]
  • 6. How? How to solve the BIG DATA problem?
  • 7. How to solve the problem? The full solution is: 1. Construct the BIG DATA infrastructure 2. Find and develop data transmission tools 3. Implement warehousing and mining tools and techniques 4. Provide BI and analytic tools. [Figure: diagram of the four steps above, built on data sources (structured, semi-structured, unstructured).]
  • 8. Methods and comparison: RDBMS versus NoSQL database?
  • 9. RDBMS-based infrastructure. From my experience: • Optimization raises costs (licenses and servers), although license costs do not apply to open source RDBMSs • An RDBMS does not cope well with data beyond the gigabyte range • It is not suited to storing unstructured data (video, audio, etc.)
  • 10. Hadoop-based infrastructure. From the experience of the biggest companies (Facebook, Yahoo, Twitter, …), the main advantages are: • Distributed file system paradigm • Powerful parallel computing framework (MapReduce) • It can store any type of data: structured, semi-structured, or unstructured • It is open source and integrates easily with Hadoop-related products
  • 11. Brief introduction: HDFS architecture. [Figure: a NameNode and a BackupNode above four DataNodes, labeled "Balancing, Replication, Failover"; each DataNode stores data on its local disks.]
  • 12. Brief introduction: MapReduce framework. 1. We have a big data set (shown in green in the figure, covering 2010 to 2013) 2. The data is split across different servers 3. Each server aggregates and calculates its share of the data 4. The consolidated result is returned to the client. [Figure: a Job Tracker and four Task Tracker servers.]
  • 13. Proposed method and solution: Hadoop and open source technologies
  • 14. Proposed method selection (Hadoop stack). The proposed method was selected for the following reasons: • Data should be stored in a distributed system • Aggregation and calculation should be done in a parallel computing paradigm • The data is both structured and unstructured: mobile call detail records • The data size is about 20 TB • The method should use open source technologies
  • 15. Full infrastructure (3 main methods). [Figure: a client machine (Jasper Business Intelligence) running client software (reporting tool) and JasperReports Server; a Hive connector and an HBase connector; Machine 1 (slave Hadoop) and Machine 2 (master Hadoop), forming the clustered big data infrastructure and data processing layer on physical machines (resources); a data sender that feeds the big data infrastructure from data resources: sensor data (phone, web log, camera, etc.), structured data, semi-structured data, and unstructured data.]
  • 16. Method 1: Clustered big data infrastructure and data processing • The first task is configuring the BIG DATA infrastructure with analytic products • This configuration is clustered across TWO physical machines
  • 17. Method 2: Data transmission • Data resources consist of RDBMS data and unstructured data (CDR files, video, …) • If structured data is stored in relational databases, we need Sqoop for bulk data transfer • If unstructured data is stored as video and other files, we need a custom application built on the HDFS client (SSH) • Manual data transfer • Automatic data transfer (custom application)
  • 18. Method 3: Analytics solution over the BIG DATA. This is the main method, and it tries to address the following concepts. [Figure: complexity versus business value. Business Intelligence, which almost every organization is doing now: Reporting (What happened?), Analysis (Why did it happen?), Monitoring (What is happening now?). Predictive Analytics, which organizations are focusing on now: Prediction (What will happen?).]
  • 19. Method 3: Analytics solution over the BIG DATA • This describes how to report, analyze, monitor, and predict over the BIG DATA infrastructure. [Figure: sensor data in the Hadoop Distributed File System (resources) reaches the end user (analytic tool) through three components. Hive warehouse: Hive table creation, data summarization and analysis, Hive Query Language (HQL); used for reporting, analyzing, and monitoring through aggregated data and ad-hoc queries. HBase table management: HBase table creation, HBase query, Thrift server; used for reporting, analyzing, and monitoring. Mahout machine learning and data mining: mined data; used for prediction. Components have direct access to HDFS.]
  • 21. Experimental results. The experimental work focused on the following main jobs: 1. Install and configure the BIG DATA infrastructure (a cluster of 2 physical machines) 2. Import sample unstructured data into HDFS (the big data infrastructure) using SSH 3. Run sample HiveQL queries, HBase queries, and Mahout jobs over the MapReduce framework
  • 22. Running and monitoring HDFS and the MapReduce framework. Sample results: HDFS and MapReduce. Master machine: DataNode, JobTracker, NameNode, SNN (SecondaryNameNode), and TaskTracker are running. Slave machine: DataNode and TaskTracker are running
  • 23. Running the Hive warehouse. Sample results: Hive warehouse and HiveQL
  • 24. Running HBase table management. Sample results: HBase table management and RESTful web service
  • 25. Future work and conclusion: continuing the data mining research
  • 26. Future work. I will continue my research on BIG DATA and analytic solutions: 1. Validate the proposed infrastructure with real-world data (mobile call logs, camera sensors) 2. Keep researching new technologies to support our architecture 3. Predict and analyze real data over the infrastructure (market basket analysis, recommendation, etc.)
  • 27. Conclusion 1. This is a full analytics solution for analyzing big data over the Hadoop Distributed File System: - Reporting (What happened?) (Hive) - Analysis (Why did it happen?) (Hive, HBase) - Monitoring (What is happening now?) (Hive) - Prediction (What will happen?) (Mahout)
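
The four MapReduce steps on slide 12 can be illustrated with a small single-process sketch. This is not Hadoop code and not the authors' job: the record layout (year, call duration in seconds) and every function name below are assumptions made for illustration, and the round-robin split stands in for the way a real cluster distributes input blocks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (year, call_duration_seconds).
Record = Tuple[int, int]


def split_records(records: List[Record], servers: int) -> List[List[Record]]:
    """Step 2: spread the input across the servers (simple round-robin)."""
    return [records[i::servers] for i in range(servers)]


def map_task(chunk: Iterable[Record]) -> Iterator[Record]:
    """Step 3a: each server emits (key, value) pairs; the key is the year."""
    for year, seconds in chunk:
        yield year, seconds


def shuffle(pairs: Iterable[Record]) -> Dict[int, List[int]]:
    """Step 3b: group every emitted value under its key."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for year, seconds in pairs:
        grouped[year].append(seconds)
    return dict(grouped)


def reduce_task(grouped: Dict[int, List[int]]) -> Dict[int, int]:
    """Step 3c: aggregate the values of each key (here, a sum)."""
    return {year: sum(values) for year, values in grouped.items()}


def run_job(records: List[Record], servers: int = 4) -> Dict[int, int]:
    """Steps 1 to 4: take the data set, split it, process it, consolidate."""
    chunks = split_records(records, servers)
    pairs = [pair for chunk in chunks for pair in map_task(chunk)]
    return reduce_task(shuffle(pairs))
```

Calling `run_job` on a list of such tuples returns one total per year, whatever number of servers is chosen; on a real cluster the map and reduce tasks would run in parallel on the Task Tracker machines.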

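The routing rule on slide 17 (Sqoop for relational sources, a custom HDFS client application for files) could look roughly like the sketch below. The `Source` class, the `transfer_command` function, and the example locations are hypothetical. The `sqoop import` options (`--connect`, `--table`, `--target-dir`) and `hadoop fs -put` are standard commands, but `hadoop fs -put` is only a stand-in here for the custom SSH-based application the slide mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Source:
    kind: str      # "rdbms" or "file" (hypothetical labels)
    location: str  # JDBC URL for a database, local path for a file
    name: str      # table name or file name


def transfer_command(source: Source, hdfs_dir: str) -> List[str]:
    """Build the command that would copy one source into HDFS."""
    target = f"{hdfs_dir}/{source.name}"
    if source.kind == "rdbms":
        # Structured data: bulk import with Sqoop.
        return ["sqoop", "import",
                "--connect", source.location,
                "--table", source.name,
                "--target-dir", target]
    if source.kind == "file":
        # Unstructured data: copy the file with the HDFS shell client.
        return ["hadoop", "fs", "-put", source.location, target]
    raise ValueError(f"unknown source kind: {source.kind!r}")
```

The function only builds the argument list; an automatic transfer tool would pass it to something like `subprocess.run` on a machine that has the Hadoop and Sqoop clients installed.
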
Editor's Notes

  1. Good afternoon, dear professors, teachers, and students. My name is Erdenebayar, and I am a master's student at the School of Information Technology, National University of Mongolia. I greatly appreciate the chance to introduce our research work; it is one of the important moments of my life. Today I will introduce my research on Big Data infrastructure and analytics solutions.
  2. These are the main topics.
  3. First of all, I'll explain why I'm researching big data and analytics. In Mongolia ….. Nowadays ….. I work on the Data Management team at a software development company and have discussed this with our biggest customers (government and business companies).
  4. Currently we face the big data problem for reasons of Volume, Variety, and Velocity. The first is Volume: transactional data is growing day by day (MB, GB, TB, PB, ZB). The second is Variety: it is mainly about data types, since many different devices store different types of data. The last is Velocity: every business needs to analyze and process data very fast to plan its future business.
  5. Specifically, we can address the Big Data problem and business needs in the following way. This picture shows a conceptual solution for that.
  6. In this topic, I will describe some methods and compare the different methodologies. We can store big data on an RDBMS or a NoSQL database.
  7. Hadoop consists of two main products: the Hadoop Distributed File System and the MapReduce data processing framework. I will briefly introduce these two products.
  8. I would like to thank my professor, Oyun-Erdene. She always coaches and teaches me in every case.
  9. Thank you for your attention. If you have any questions, I would be happy to answer them.