DATA ANALYTIC TRAINING
TRAINER
DR. IR. JOHN SIHOTANG, MM
MANAGEMENT TRAINEE, SUCOFINDO
JAKARTA, 24-26 APRIL 2019
jhotank@yahoo.com
0811-231-509
DATA ANALYTICS CONCEPT &
IMPLEMENTATION
TOPICS
1. BIG DATA OVERVIEW
2. DATA FOR BUSINESS ANALYSIS
3. DATA ANALYTICS CONCEPT
4. DATA ANALYTICS TOOLS
BIG DATA OVERVIEW
THEME OF THIS COURSE
Large-Scale Data Management
Big Data Analytics
Data Science and Analytics
• How to manage very large amounts of data and extract value and
knowledge from them
INTRODUCTION TO BIG DATA
What is Big Data?
What makes data, “Big” Data?
BIG DATA DEFINITION
• No single standard definition…
“Big Data” is data whose scale, diversity, and
complexity require new architecture, techniques,
algorithms, and analytics to manage it and extract
value and hidden knowledge from it…
CHARACTERISTICS OF BIG DATA:
1-SCALE (VOLUME)
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes (ZB) to 35 ZB
• Data volume is increasing exponentially
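The 44x figure can be checked from the two volumes quoted above. The short sketch below does that arithmetic and also derives an implied yearly growth rate; that yearly rate is an inference from the two endpoints, assuming steady compound growth, not a number from the slide.

```python
# Figures quoted on the slide: 0.8 ZB in 2009 growing to 35 ZB in 2020.
start_zb = 0.8
end_zb = 35.0
years = 2020 - 2009

# Overall growth factor: how many times larger the 2020 volume is.
growth_factor = end_zb / start_zb

# Implied compound annual growth rate (assumes steady compounding between
# the two endpoints; this is an inference, not a figure from the slide).
annual_growth = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.2f}x")
print(f"Implied annual growth: {annual_growth:.1%}")
```

The growth factor comes out just under 44, which matches the rounded "44x" on the slide.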
Exponential increase in
collected/generated data
CHARACTERISTICS OF BIG DATA:
2-COMPLEXITY (VARIETY)
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media data,
multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be generating/collecting
many types of data
To extract knowledge → all these types of data need to be linked together
CHARACTERISTICS OF BIG DATA:
3-SPEED (VELOCITY)
• Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions → missed opportunities
• Examples
– E-Promotions: Based on your current location, your purchase history, and what you like → send promotions right now for the store next to you
– Healthcare monitoring: sensors monitoring your activities and body → any abnormal measurements require immediate reaction
BIG DATA: 3V’S
SOME MAKE IT 4V’S
HARNESSING BIG DATA
• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
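A rough way to see the difference between the first two styles is to run both against the same small table. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table name, regions, and amounts are made up for illustration, and RTAP is not shown because it needs a streaming setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: small writes, each recording one business transaction.
with conn:
    conn.executemany(
        "INSERT INTO sales (region, amount) VALUES (?, ?)",
        [("Jakarta", 120.0), ("Bandung", 80.0), ("Jakarta", 200.0)],
    )

# OLAP-style work: a read-only aggregate that scans many rows at once.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(totals)
```

In practice the two workloads usually live in separate systems (a DBMS and a data warehouse, as the slide notes); a single SQLite file is used here only to keep the example self-contained.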
WHO’S GENERATING BIG DATA
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data
• But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion
THE MODEL HAS CHANGED…
• The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
WHAT’S DRIVING BIG DATA
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time in nature
VALUE OF BIG DATA ANALYTICS
• Big data is more real-time in nature than
traditional DW applications
• Traditional DW architectures (e.g.
Exadata, Teradata) are not well-suited
for big data apps
• Shared nothing, massively parallel
processing, scale out architectures are
well-suited for big data apps
CHALLENGES IN HANDLING BIG DATA
• The Bottleneck is in technology
– New architecture, algorithms, techniques are needed
• Also in technical skills
– Experts in using the new technology and dealing with big data
DATA FOR BUSINESS ANALYTICS
BASIC DEFINITION
• Data: Data is a set of values of qualitative or quantitative variables. It is information in raw or unorganized form. It may be a fact, figure, characters, symbols etc.
• Information: Meaningful or organised data is information
TYPES OF DATA
• Structured data: data that has already been managed, processed, and manipulated in an RDBMS (Relational Database Management System). For example, tabular data entered through a registration form on a web service.
• Unstructured data: raw data freshly obtained from various kinds of activity that has not yet been fitted into a database format. For example, a video file captured by a camera.
• Semi-structured data: data that has some structure, for example tags, but is not yet fully structured within a database system. For example, data with uniform tags whose contents differ depending on the characteristics of whoever filled them in.
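The three categories can be pictured with tiny Python values. This is only an illustrative sketch: the user records, tag names, and byte values are invented, and real systems would hold these in a database, a document store, and a file store respectively.

```python
import json

# Structured: fixed columns, every row has the same fields (like an RDBMS table).
structured_rows = [
    ("u001", "Ani", "ani@example.com"),
    ("u002", "Budi", "budi@example.com"),
]

# Semi-structured: records carry tags (keys), but the contents vary per record.
semi_structured = json.loads(
    '[{"id": "u001", "tags": ["new"]},'
    ' {"id": "u002", "tags": ["vip", "returning"], "note": "prefers email"}]'
)

# Unstructured: raw bytes with no schema (placeholder bytes standing in for a video file).
unstructured = bytes([0x00, 0x00, 0x00, 0x18, 0x66, 0x74, 0x79, 0x70])

# Every structured row has the same width; semi-structured records need not share keys.
same_width = all(len(row) == 3 for row in structured_rows)
key_sets = [set(record) for record in semi_structured]
print(same_width, key_sets)
```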
TYPE OF DATA
Data types are an important concept because statistical methods can only be used with certain data types. Continuous data must be analyzed differently from categorical data; otherwise the analysis will be wrong. Knowing the types of data you are dealing with therefore enables you to choose the correct method of analysis.
DATA FOR BUSINESS ANALYTICS
Figure 1.2: Classifying Data Elements in a Purchasing Database
• Copyright © 2013 Pearson Education, Inc. publishing as Prentice Hall
• Nominal or categorical data comprises categories that cannot be rank ordered; each category is just different.
• The categories cannot be placed in any order, and no judgement can be made about the relative size of, or distance between, one category and another.
– Categories bear no quantitative relationship to one another
– Examples:
  – customer’s location (America, Europe, Asia)
  – employee classification (manager, supervisor, associate)
• What does this mean? No mathematical operations can be performed on the categories relative to each other.
• Therefore, nominal data reflect qualitative differences rather than quantitative ones.
Categorical (Nominal) data
Nominal data
Nominal values represent discrete units and are used to label variables that have no quantitative value. Just think of them as “labels”. Note that nominal data has no order, so changing the order of its values does not change the meaning. Two examples of nominal features:
What is your gender? (please tick)
Male
Female
Did you enjoy the film? (please tick)
Yes
No
• Systems for measuring nominal data must ensure that each category is mutually exclusive, and the system of measurement needs to be exhaustive.
• Variables that have only two responses, i.e. Yes or No, are known as dichotomies.
Nominal data
When you are dealing with nominal data, you summarize it through:
1. Frequencies: the frequency is the rate at which something occurs over a period of time or within a dataset.
2. Proportions: divide the frequency by the total number of events (e.g., how often something happened divided by how often it could have happened).
3. Percentages: the proportion multiplied by 100.
4. Visualization methods: to visualize nominal data you can use a pie chart or a bar chart.
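The first three summaries can be computed in a few lines. A minimal sketch, assuming a made-up list of answers to the "Did you enjoy the film?" question:

```python
from collections import Counter

# Hypothetical answers to "Did you enjoy the film?"
responses = ["Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No"]

frequencies = Counter(responses)          # how often each label occurs
total = len(responses)
proportions = {label: count / total for label, count in frequencies.items()}
percentages = {label: 100 * p for label, p in proportions.items()}

# The mode is the only "average" that makes sense for nominal labels.
mode = frequencies.most_common(1)[0][0]

print(frequencies, proportions, percentages, mode)
```

The same `frequencies` mapping is what a bar chart or pie chart of this variable would plot.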
• Ordinal data comprises categories that can be rank ordered.
• As with nominal data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
– No fixed units of measurement
– Examples:
  – college football rankings
  – survey responses (poor, average, good, very good, excellent)
• What does this mean? You can make statistical judgements and perform limited maths.
Ordinal data
Example:
Ordinal data
How satisfied are you with the level of service you have
received? (please tick)
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied
Ordinal data
• When you are dealing with ordinal data, you can use the same methods as with nominal data, but you also have access to some additional tools.
• You can therefore summarize ordinal data with frequencies, proportions, and percentages, and visualize it with pie and bar charts.
• Additionally, you can use percentiles, the median, the mode, and the interquartile range to summarize your data.
• In data science, you can use label encoding to transform ordinal data into a numeric feature.
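Label encoding and the extra ordinal summaries can be sketched with the standard library. The responses below are invented, and the category order is taken from the satisfaction question shown earlier; `median_low` is used so that the median is always one of the actual categories.

```python
import statistics

# Ordered categories from the satisfaction survey, lowest to highest.
levels = [
    "Very dissatisfied",
    "Somewhat dissatisfied",
    "Neutral",
    "Somewhat satisfied",
    "Very satisfied",
]

# Label encoding: map each category to its rank.
encoding = {label: rank for rank, label in enumerate(levels)}

# Hypothetical survey responses.
responses = [
    "Neutral", "Very satisfied", "Somewhat satisfied", "Somewhat satisfied",
    "Very dissatisfied", "Neutral", "Somewhat satisfied",
]
codes = [encoding[r] for r in responses]

median_label = levels[statistics.median_low(codes)]
mode_label = levels[statistics.mode(codes)]

# Quartiles of the encoded ranks, and the interquartile range between them.
q1, _, q3 = statistics.quantiles(codes, n=4, method="inclusive")
iqr = q3 - q1

print(median_label, mode_label, iqr)
```

The encoded ranks only preserve order; the gap between rank 1 and rank 2 is not assumed to equal the gap between rank 3 and rank 4, so a mean of the codes should be read with care.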
• Both interval and ratio data are examples of scale data.
• Scale data:
– data is in numeric format ($50, $100, $150)
– data that can be measured on a continuous scale
– the distance between each can be observed and as a result measured
– the data can be placed in rank order.
Interval and ratio data
– Ordinal data but with constant differences between observations
– Ratios are not meaningful
– Examples:
  – Time: moves along a continuous measure of seconds, minutes and so on, and has no true zero point.
  – Temperature (in Celsius or Fahrenheit): moves along a continuous measure of degrees and has no true zero.
  – SAT scores
Interval data
• Ratio data is measured on a continuous scale and has a natural zero point.
– Ratios are meaningful
– Examples:
  – monthly sales
  – delivery times
  – weight
  – height
  – age
Ratio data
• When you are dealing with continuous data, you have the widest choice of methods to describe it. You can summarize your data using percentiles, median, interquartile range, mean, mode, standard deviation, and range.
• Visualization methods: to visualize continuous data, you can use a histogram or a box plot. With a histogram, you can check the central tendency, variability, modality, and kurtosis of a distribution. A histogram does not flag outliers explicitly, which is why box plots are also used.
Continuous Data
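The summaries listed for continuous data are all available in Python's statistics module. The sketch below uses made-up delivery times with one deliberately extreme value, and flags outliers with the 1.5 × IQR rule that box plots conventionally use; the data and the choice of the "inclusive" quartile method are assumptions for illustration.

```python
import statistics

# Hypothetical delivery times in days (ratio data); 30 is a deliberate outlier.
delivery_days = [2, 3, 3, 4, 4, 4, 5, 5, 6, 30]

mean = statistics.mean(delivery_days)
median = statistics.median(delivery_days)
stdev = statistics.stdev(delivery_days)
value_range = max(delivery_days) - min(delivery_days)

# Quartiles and interquartile range.
q1, _, q3 = statistics.quantiles(delivery_days, n=4, method="inclusive")
iqr = q3 - q1

# Box-plot convention: points more than 1.5 * IQR beyond the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in delivery_days if x < lower_fence or x > upper_fence]

print(mean, median, round(stdev, 2), value_range, iqr, outliers)
```

Note how the single outlier pulls the mean well above the median, which is one reason to report both.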
• You discovered the different data types that are used throughout statistics.
• You learned the difference between discrete and continuous data, and what the nominal, ordinal, interval, and ratio measurement scales are.
• Furthermore, you now know which statistical measures you can use with which data type, and which visualization methods are appropriate.
• You also learned which methods can transform categorical variables into numeric variables.
• This enables you to carry out a large part of an exploratory analysis on a given dataset.
Summary
DATA ANALYTICS CONCEPT
BASIC DEFINITION
• Analytics: Analytics is the discovery, interpretation, and communication of meaningful patterns or summaries in data.
• Data Analytics (DA) is the process of examining data sets in order to draw conclusions about the information they contain.
• Analytics is not a tool or technology; rather, it is a way of thinking and acting on data.
WHAT IS ANALYTICS?
Data on its own is useless unless you can make sense of it!
WHAT IS ANALYTICS?
The scientific process of transforming data into insight for making better
decisions, offering new opportunities for a competitive advantage
www.imarticus.org
The Case for Business Analytics
• The Business environment today is
more complex than ever before.
• Businesses are expected to be diligently
responsive to the increasing demands of
customers, various stakeholders and
even regulators.
• Organizations have been turning to the
use of analytics.
• More than 83% of Global CIOs surveyed
by IBM in 2010 singled out Business
Intelligence and Analytics as one of their
visionary plans for enhancing
competitiveness.
In most cases the primary objective of an
organization that seeks to turn to analytics
is:
• Revenue/Profit growth
• Optimize expenditure
SOLUTION
BUSINESS NEED
GOAL
WHAT IS DATA ANALYTICS?
• Data Analytics:
– “is a process of inspecting, cleansing, transforming, and modeling data
with the goal of discovering useful information, suggesting conclusions,
and supporting decision-making”. - Wikipedia
– "leverage data in a particular functional process (or application) to
enable context-specific insight that is actionable.“ – Gartner
– “is using our current data sets to extract useful information to
support advanced decision making” - ATC
• Data Visualizations (i.e. Data Viz):
– Visual context of data, Dashboards
– Often single page, real-time user interface, graphical presentation of
your data
“Without data you’re just another person with an opinion.” ~W. Edwards Deming
DATA ANALYTICS PILLARS AND
FOUNDATION
DATA ANALYTICS ARCHITECTURE
[Figure: data analytics architecture. Process labels: Data Collection, Integration, Data Layer, Preparation, Cleaning, Processing, Analysis, Visualization, Interpretation, Translation. Value chain: Data → Information & Knowledge → Wisdom → Action. Supporting elements: Business, Information Technology, Partnership and Stewardship; Data, Rules, Tools.]
“The greatest value of a picture is when it forces us to notice what we never expected to see.” ~John Tukey
TYPES OF ANALYTICS
1. Descriptive Analytics: this analysis looks at historical data as well as current data. It is generally used to answer questions such as “What happened with ABC?”, “What happens if XYZ?”, and so on.
2. Diagnostic Analytics: this analysis is used to infer why events occurred, based on the related data. It is used to answer questions such as “Why did ABC happen when XYZ?”, “What is wrong with strategy DEF?”, and so on.
3. Predictive Analytics: this analysis tries to infer trends and future events from the available historical data. This model tends to be more complex than the previous two types, because it requires deeper modeling and analysis.
4. Prescriptive Analytics: this analysis is used to optimize processes, structures, and systems using the information produced by Predictive Analytics. In essence, it tells the business what needs to be done to anticipate upcoming events.
Copyright (except where referenced) 2014-2016
Stephen H. Kaisler, Frank Armour, Alberto Espinosa and William H. Money
DESCRIPTIVE ANALYTICS
• Process:
– Identify the attributes, then assess/evaluate the attributes
– Estimate the magnitude to correlate the relative contribution of each attribute to the final
solution
– Accumulate more instances of data from the data sources
– If possible, perform the steps of evaluation, classification and categorization quickly
– Yield a measure of adaptability within the OODA loop
• At some threshold, crossover into diagnostic and predictive analytics
http://v1shal.com/content/25-cartoons-give-current-big-data-hype-perspective/
DIAGNOSTIC ANALYTICS
• Process:
– Begin with descriptive analytics
– Extract patterns from large data quantities via data mining
– Correlate data types for explanation of near-term behavior – past and present
– Estimate linear/non-linear behavior not easily identifiable through other
approaches.
• Example: by classifying past insurance claims, estimate the number of
future claims to flag for investigation with a high probability of being
fraudulent.
PREDICTIVE ANALYTICS
• Process:
– Begin with descriptive AND diagnostic analytics
– Choose the right data based on domain knowledge and relationships among
variables
– Choose the right techniques to yield insight into possible outcomes
– Determine the likelihood of possible outcomes given initial boundary conditions
– Remember! Data driven analytics is non-linear; do NOT treat like an engineering
project
PRESCRIPTIVE ANALYTICS
• Process:
– Begin w/ predictive analytics
– Determine what should occur and how to make it so
– Determine the mitigating factors that lead to desirable/undesirable outcomes
– “What-if” analysis w/ local or global optimization
– Ex: Find the best set of prices and advertising frequency to maximize revenue
– Ex: And, the right set of business moves to make to achieve that goal
“Make it so”
DECISIVE ANALYTICS
• Process:
– Given a set of decision alternatives, choose the one course of action to take from possibly many
– But, it may not be the optimal one.
– Visualize alternatives, whole or partial subset
– Perform exploratory analysis: what-if and why
  – How do I get to there from here?
  – How did I get here from there?
DATA ANALYTIC CAPABILITIES
“The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.”
~Stephen Hawking
CURRENT STATE – RELIABILITY DATA
[Figure: looking back (What happened?) versus looking forward (What will happen?); labels: Information, Manual, Data]
DESCRIPTIVE MODEL
• Descriptive analytics is the data analytics process of obtaining a general overview of the data that has been collected.
• An example of descriptive analytics is Google Analytics. In Google Analytics we only see simple information such as the number of visitors per unit of time, which pages are visited most often, and similar data.
• For simple analytics like this, sums and averages without machine learning are more than sufficient.
• Descriptive analytics does not show a prediction of which page a visitor will visit next, or why a visitor visited a given page.
• This type of data analytics is the most commonly encountered. Although it is only simple data without machine-learning processing, such data is very much needed, especially for benchmarking to understand the effect of the changes we make.
Descriptive Analytics
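As a rough illustration of "sums and averages without machine learning", the sketch below counts visits per day and per page from a tiny, made-up page-view log. It is not how Google Analytics is implemented; it only shows the kind of summary such a report contains.

```python
from collections import Counter

# Hypothetical page-view log: (day, page) pairs.
page_views = [
    ("2019-04-24", "/home"), ("2019-04-24", "/pricing"), ("2019-04-24", "/home"),
    ("2019-04-25", "/home"), ("2019-04-25", "/contact"),
    ("2019-04-26", "/home"), ("2019-04-26", "/pricing"),
    ("2019-04-26", "/home"), ("2019-04-26", "/home"),
]

# Visits per unit of time and visits per page are plain counts.
views_per_day = Counter(day for day, _ in page_views)
views_per_page = Counter(page for _, page in page_views)

# A simple average and the most visited page complete the overview.
average_daily_views = sum(views_per_day.values()) / len(views_per_day)
most_visited = views_per_page.most_common(1)[0][0]

print(views_per_day, average_daily_views, most_visited)
```

Comparing `average_daily_views` before and after a site change is the benchmarking use mentioned above.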
• Descriptive models quantify relationships in data in a way that is often used to classify customers or prospects into groups.
• Descriptive models do not rank-order customers by their likelihood of taking a particular action the way predictive models do.
• Instead, descriptive models can be used, for example, to categorize customers by their product preferences and life stage.
• Descriptive modeling tools can be used to develop further models that can simulate large numbers of individual agents and make predictions.
Descriptive Models
Model:
• An abstraction or representation of a real system, idea, or object
• Captures the most important features
• Can be a written or verbal description, a visual display, a mathematical formula, or a spreadsheet representation
DECISION MODELS
Figure 1.3
• A decision model is a model used to understand, analyze, or facilitate decision making.
• Types of model input
- data
- uncontrollable variables
- decision variables (controllable)
Descriptive Decision Models
• Simply tell “what is” and describe relationships
• Do not tell managers what to do
An Influence Diagram for Total Cost
Descriptive Analytics
What has occurred?
Descriptive analytics, such as data visualization, is important in helping users interpret the output from predictive and prescriptive analytics.
• Descriptive analytics, such as reporting/OLAP,
dashboards, and data visualization, have been
widely used for some time.
• They are the core of traditional BI.
A Break-even Decision Model
TC(manufacturing) = $50,000 + $125*Q
TC(outsourcing) = $175*Q
Breakeven Point:
Set TC(manufacturing)
= TC(outsourcing)
Solve for Q = 1000 units
Figure 1.7
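The break-even point follows from setting the two cost equations equal: 50,000 + 125·Q = 175·Q, so Q = 50,000 / (175 − 125). A small sketch of that calculation, using the figures from the slide (the function and constant names are made up for illustration):

```python
FIXED_COST = 50_000            # fixed cost of in-house manufacturing ($)
UNIT_COST_MANUFACTURING = 125  # variable cost per unit when manufacturing ($)
UNIT_COST_OUTSOURCING = 175    # cost per unit when outsourcing ($)


def total_cost_manufacturing(q):
    return FIXED_COST + UNIT_COST_MANUFACTURING * q


def total_cost_outsourcing(q):
    return UNIT_COST_OUTSOURCING * q


# Setting the two costs equal and solving for Q.
break_even_q = FIXED_COST / (UNIT_COST_OUTSOURCING - UNIT_COST_MANUFACTURING)


def cheaper_option(q):
    """Return which option has the lower total cost at volume q."""
    if q == break_even_q:
        return "either"
    if total_cost_manufacturing(q) < total_cost_outsourcing(q):
        return "manufacture"
    return "outsource"


print(break_even_q, cheaper_option(500), cheaper_option(2_000))
```

Below 1,000 units outsourcing is cheaper because the fixed cost is not yet recovered; above it, manufacturing wins.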
PREDICTIVE MODEL
• Predictive Decision Models often incorporate uncertainty to help managers analyze risk.
• Aim to predict what will happen in the future.
• Uncertainty is imperfect knowledge of what will happen in the future.
• Risk is associated with the consequences of what actually happens.
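One common way to bring uncertainty into a decision model is simulation. The sketch below reuses the cost figures from the break-even model and assumes, purely for illustration, that the required volume is uniformly distributed between 500 and 1,500 units; the distribution, sample size, and seed are not from the source.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable


def saving_from_manufacturing(q):
    """Outsourcing cost minus manufacturing cost at volume q."""
    return 175 * q - (50_000 + 125 * q)


# Uncertainty: we do not know the volume, so we sample plausible values.
volumes = [random.uniform(500, 1_500) for _ in range(10_000)]
savings = [saving_from_manufacturing(q) for q in volumes]

# Risk: the consequence of the uncertain outcome, here the chance that
# choosing to manufacture turns out more expensive than outsourcing.
expected_saving = sum(savings) / len(savings)
prob_manufacturing_loses = sum(s < 0 for s in savings) / len(savings)

print(f"Expected saving: {expected_saving:,.0f}")
print(f"P(manufacturing costs more): {prob_manufacturing_loses:.2f}")
```

With the volume centred on the break-even point, the expected saving is close to zero and the decision is roughly a coin flip, which is exactly the kind of risk a point estimate hides.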
Predictive Analytics
What will occur?
• Marketing is the target for many predictive analytics
applications.
• Descriptive analytics, such as data visualization, is important
in helping users interpret the output from predictive and
prescriptive analytics.
• Algorithms for predictive analytics, such as regression analysis,
machine learning, and neural networks, have also been around
for some time.
• Prescriptive analytics are often referred to as advanced analytics.
A Linear Demand Prediction Model
As price increases, demand falls.
Figure 1.8
A Nonlinear Demand Prediction Model
Assumes price elasticity (constant ratio of %
change in demand to % change in price)
Figure 1.9
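The two demand models can be written as small functions. The functional forms below (D = a − b·P and D = c·P^e) are the usual textbook forms for a linear and a constant-elasticity model; the parameter values are hypothetical and are not taken from Figures 1.8 or 1.9.

```python
import math


def linear_demand(price, intercept=20_000.0, slope=10.0):
    """D = a - b * P: each unit of price removes a fixed amount of demand."""
    return intercept - slope * price


def constant_elasticity_demand(price, scale=20_000.0, elasticity=-0.5):
    """D = c * P ** e: the % change in demand per % change in price is constant."""
    return scale * price ** elasticity


def elasticity_between(demand_fn, p1, p2):
    """Ratio of % change in demand to % change in price, measured in logs."""
    return (math.log(demand_fn(p2)) - math.log(demand_fn(p1))) / (
        math.log(p2) - math.log(p1)
    )


# Constant-elasticity model: the same ratio at any price level.
print(elasticity_between(constant_elasticity_demand, 100, 110))
print(elasticity_between(constant_elasticity_demand, 1_000, 1_100))

# Linear model: the ratio changes as price rises.
print(elasticity_between(linear_demand, 100, 110))
print(elasticity_between(linear_demand, 1_000, 1_100))
```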
PRESCRIPTIVE
ANALYTICS
Prescriptive Decision Models help decision makers identify the best
solution.
• Optimization: finding values of decision variables that minimize (or maximize) something such as cost (or profit).
• Objective function: the equation for the quantity of interest that is to be minimized (or maximized).
• Constraints: limitations or restrictions.
• Optimal solution: values of the decision variables at the minimum (or maximum) point.
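These four terms map directly onto a tiny pricing problem. In the sketch below the decision variable is the price, the objective function is revenue under a hypothetical linear demand curve, and the constraint is an allowed price range; a simple grid search over whole-number prices stands in for a real optimization solver.

```python
def revenue(price, intercept=20_000, slope=10):
    """Objective function: revenue = price * demand, with demand = a - b * price.

    The demand parameters are hypothetical.
    """
    return price * (intercept - slope * price)


def best_price(min_price, max_price):
    """Search the feasible whole-number prices for the one with maximum revenue."""
    feasible = range(min_price, max_price + 1)  # constraint: min <= price <= max
    return max(feasible, key=revenue)


unconstrained = best_price(0, 2_000)  # wide range: the true optimum is reachable
capped = best_price(0, 800)           # tighter constraint: price may not exceed 800

print(unconstrained, revenue(unconstrained))
print(capped, revenue(capped))
```

When the constraint excludes the unconstrained optimum, the optimal solution sits on the boundary of the feasible range, which is typical of constrained problems.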
Prescriptive Analytics
What should occur?
• For example, the use of mathematical programming for revenue management is
common for organizations that have “perishable” goods (e.g., rental cars, hotel
rooms, airline seats).
• Harrah’s has been using revenue management for hotel room pricing for some
time.
• Prescriptive analytics are often referred to as advanced analytics.
• Regression analysis, machine learning, and neural networks
• Often for the allocation of scarce resources
Know Your Tools &
Why Learn About
Them?
DATA ANALYTICS TOOLS
Most Popular Programming Languages of 2017
Source: https://spectrum.ieee.org/
TOOLS COVERED IN PROGRAM
The program is developed keeping in mind the needs of an evolving Analytics industry that
requires individuals to be “job-ready” from Day 1.
Why SAS?
The largest independent
vendor in the business
intelligence market
The De facto industry
standard for Clinical Data
Analysis
#1 Market Leader in Analytics
Used in 60,000+
companies in
over 135
countries
“Analytics powerhouse”
INTEGRATED PLATFORM FOR END TO END SOLUTIONS:
SAS provides an integrated set of software products and services
and integrated technologies for information management,
advanced analytics and reporting.
BUSINESS SOLUTIONS ACROSS DOMAINS AND INDUSTRIES:
Unmatched domain specific industry focused analytics solutions
The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013
Why R?
R is the #1 Google Search for Advanced Analytics software
Google Trends, April 2016
Highest Paid IT
Skill
Linkedin Skills and
O'Reilly Survey, 2016
Most-used data
science language
after SQL
O’Reilly Survey,
Jan 2014
75% of data professionals use R
Rexer Survey, Oct 2015
Second best programming language for data science
O'Reilly Survey, 2016
Supports close to
10,000 free
packages
CRAN Figure as on
December 2016
R is #13 of all Programming Languages
Redmonk Language Ratings, June 2015
Demand for R language skills is on the rise.
Companies Already Onboard R
Facebook
Google
Twitter
McKinsey
ANZ Bank
BCG
Uber
Lloyds of London
& Many More…
What is Hadoop?
Hadoop is Transforming Businesses Across Industries
“The growing use of Apache Hadoop, increasing data warehouse volume sizes and the
accumulation of legacy systems in organizations are fostering structured data growth. These
factors are leading enterprises to understand how to reuse, repurpose and gain critical insight
from this data.” Gartner
BIG DATA STORING AND FASTER PROCESSING
Hadoop is an open source software framework created in 2005
that keeps and processes big data in a distributed manner on
large collection of hardware.
1 in 4 organizations use Hadoop to manage their data today (up from 1 out of 10 in 2012)
BUSINESS SOLUTIONS ACROSS DOMAINS AND INDUSTRIES:
Low cost solution with a high fault tolerance to access and create
value from data.
Hadoop – Big Data is a comprehensive class room training program that enables you to analyse data and
create useful information for careers in Data Analytics.
WHY HADOOP?
Top 5 Reasons Organizations are using Hadoop
Scalability
Computing Power
Low Cost
Storage
Flexibility
Data
Protection
Top 5 Industries using Hadoop:
• Computer Manufacturing
• Business Services
• Finance
• Retail & Wholesale
• Education & Government
Enterprises using Hadoop
HADOOP & BIG-DATA JOBS IN INDIA
WHY PYTHON?
Cost of Ownership
Python is an open source software that is free to download.
Versatility
Multi-purpose language that can be used to build an entire application
Python is a powerful, flexible, open-source language that is easy to learn, easy to use, and has
powerful libraries for data manipulation and analysis
What are the reasons for its sudden popularity?
A Data Scientists’ Dream
Python is particularly useful in data analytics because
it has a rich library for reading and writing data,
running calculations on the information and creating
graphical representations of data sets.
We can write MapReduce programs in Python using Pydoop. This is where Python scores over R: while R uses in-memory processing, Python with Pydoop on Hadoop can process petabytes of data.
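The MapReduce programming model itself can be shown without a cluster. The sketch below is plain, single-process Python and does not use the Pydoop API; it only illustrates the map, shuffle, and reduce steps that a framework such as Hadoop would distribute across machines. The function names and input lines are made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data needs big tools", "data tools"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(word_counts)
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across many machines, which is what lets the model scale beyond one computer's memory.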
Python offers extensive analytics capabilities for Text & Predictive Analytics.
IDLE and the Spyder IDE are widely used for data mining.
Big Data Analytics made possible by Pydoop and SciPy
In industry, the data science
trend shows increasing
popularity of Python. A
Python-based application
stack can more easily
integrate a data scientist who
writes Python code, since that
eliminates a key hurdle in
productionizing a data
scientist's work.
Integration
Big data
compatibility
Python has become one of the big go-to languages for big data processing due to its wide
selection of libraries
WHY PYTHON?
Official
language of
Google
Among top
in-demand data
science skills
KDNuggets, Dec 2014
46% of job ads mention Python (after SQL)
KDNuggets, Dec 2014
Ranked #1 of all programming languages
Codeeval rankings, Feb 2015
2nd most
popular data
science language
KDNuggets 2013
Companies Already Onboard Python
Google
Yahoo
Quora
Nokia
ABN AMRO Bank
IBM
National Weather Service
& Many More…
WHAT IS DATA VISUALIZATION?
Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people
have depended on visual representations such as charts and maps to understand information more
easily and quickly.
WHY TABLEAU FOR DATA VISUALIZATION?
Cost of
Ownership
Tableau is a competitively priced
software that is available for a trial
download.
Versatility
Multi-purpose package that can be
used to build an entire application
Tableau is a powerful, flexible Data Visualization tool that is easy to learn, easy to use, and has
powerful libraries for data visualization and presentation.
Big data
compatibility
Tableau has become one of the big go-to software programs for Data visualization due to the
wide variety of tools it provides and compatibility with Big Data platforms such as Hadoop.
WHY TABLEAU FOR DATA VISUALIZATION?
A BUSINESS ANALYSTS’ DREAM
Tableau is easy to learn, use, and significantly faster
than existing solutions. One can easily see patterns,
identify trends and discover visual insights in
seconds. No wizards, no scripts.
Tableau facilitates live, up-to-date data analysis that
taps into the power of the firm’s data warehouse.
Extract data into Tableau’s data engine and take
advantage of breakthrough in-memory architecture.
Tableau offers Powerful
visualization capabilities,
without a single line of code.
Experiment with trend
analyses, regressions,
correlations.
Scalable, secure and
Reliable Cloud and Mobile
Connectivity.
Tableau integrates
exceptionally well with R and
Hadoop, making it a powerful
visualization tool for analytics
and big data use cases.
Developers creating web
applications can integrate fully
interactive Tableau content
into their applications via the
JavaScript API.
INTEGRATION
www.imarticus.org 83
Pelatihan Data Analitik

More Related Content

What's hot

Cobit 5 untuk manajemen teknologi informasi dan proses bisnis
Cobit 5 untuk manajemen teknologi informasi dan proses bisnisCobit 5 untuk manajemen teknologi informasi dan proses bisnis
Cobit 5 untuk manajemen teknologi informasi dan proses bisnis
Agreindra Helmiawan
 
Bab 9 MENCAPAI KEUNGGULAN OPERASIONAL DAN KEDEKATAN DENGAN PELANGGAN:APLIKASI...
Bab 9 MENCAPAI KEUNGGULAN OPERASIONAL DAN KEDEKATAN DENGAN PELANGGAN:APLIKASI...Bab 9 MENCAPAI KEUNGGULAN OPERASIONAL DAN KEDEKATAN DENGAN PELANGGAN:APLIKASI...
Bab 9 MENCAPAI KEUNGGULAN OPERASIONAL DAN KEDEKATAN DENGAN PELANGGAN:APLIKASI...
Kasi Irawati
 

What's hot (20)

Peran sistem informasi manajemen pada perusahaan
Peran sistem informasi manajemen pada perusahaanPeran sistem informasi manajemen pada perusahaan
Peran sistem informasi manajemen pada perusahaan
 
Bab 10 Penilaian Prestasi Kerja
Bab 10 Penilaian Prestasi KerjaBab 10 Penilaian Prestasi Kerja
Bab 10 Penilaian Prestasi Kerja
 
Membangun budaya organisasi melalui Digital Mindset




Pelatihan Data Analitik

  • 1. DATA ANALYTIC TRAINING TRAINER: DR. IR. JOHN SIHOTANG, MM MANAGEMENT TRAINEE SUCOFINDO JAKARTA, 24-26 APRIL 2019 jhotank@yahoo.com 0811-231-509
  • 3. DATA ANALYTICS CONCEPT & IMPLEMENTATION MANAGEMENT TRAINEE SUCOFINDO JAKARTA, 24-26 APRIL 2019
  • 4. TOPICS 1. BIG DATA OVERVIEW 2. DATA FOR BUSINESS ANALYSIS 3. DATA ANALYTICS CONCEPT 4. DATA ANALYTICS TOOLS
  • 6. THEME OF THIS COURSE Large-Scale Data Management; Big Data Analytics; Data Science and Analytics • How to manage very large amounts of data and extract value and knowledge from them
  • 7. INTRODUCTION TO BIG DATA What is Big Data? What makes data “Big” Data?
  • 8. BIG DATA DEFINITION • No single standard definition… “Big Data” is data whose scale, diversity, and complexity require new architecture, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it…
  • 9. CHARACTERISTICS OF BIG DATA: 1-SCALE (VOLUME) • Data Volume – 44x increase from 2009 to 2020 – From 0.8 zettabytes to 35 zettabytes • Data volume is increasing exponentially (exponential increase in collected/generated data)
  • 10. CHARACTERISTICS OF BIG DATA: 2-COMPLEXITY (VARIETY) • Various formats, types, and structures • Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc. • Static data vs. streaming data • A single application can be generating/collecting many types of data • To extract knowledge → all these types of data need to be linked together
  • 11. CHARACTERISTICS OF BIG DATA: 3-SPEED (VELOCITY) • Data is being generated fast and needs to be processed fast • Online Data Analytics • Late decisions → missed opportunities • Examples – E-Promotions: based on your current location, your purchase history, and what you like → send promotions right now for the store next to you – Healthcare monitoring: sensors monitoring your activities and body → any abnormal measurements require immediate reaction
  • 13. SOME MAKE IT 4V’S
  • 14. HARNESSING BIG DATA • OLTP: Online Transaction Processing (DBMSs) • OLAP: Online Analytical Processing (Data Warehousing) • RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
  • 15. WHO’S GENERATING BIG DATA • Social media and networks (all of us are generating data) • Scientific instruments (collecting all sorts of data) • Mobile devices (tracking all objects all the time) • Sensor technology and networks (measuring all kinds of data) • Progress and innovation are no longer hindered by the ability to collect data • But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion
  • 16. THE MODEL HAS CHANGED… • The model of generating/consuming data has changed • Old model: few companies are generating data, all others are consuming data • New model: all of us are generating data, and all of us are consuming data
  • 17. WHAT’S DRIVING BIG DATA - Ad-hoc querying and reporting - Data mining techniques - Structured data, typical sources - Small to mid-size datasets - Optimizations and predictive analytics - Complex statistical analysis - All types of data, and many sources - Very large datasets - More of a real-time
  • 18. VALUE OF BIG DATA ANALYTICS • Big data is more real-time in nature than traditional DW applications • Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps • Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
  • 19. CHALLENGES IN HANDLING BIG DATA • The bottleneck is in technology – New architecture, algorithms, techniques are needed • Also in technical skills – Experts in using the new technology and dealing with big data
  • 21. BASIC DEFINITION • Data: Data is a set of values of qualitative or quantitative variables. It is information in raw or unorganized form. It may be a fact, figure, characters, symbols, etc. • Information: Meaningful or organised data is information.
  • 22. TYPES OF DATA • Structured data: data that has already been managed, processed and manipulated in an RDBMS (Relational Database Management System). For example, table data entered through a registration form on a web service. • Unstructured data: raw data newly obtained from various kinds of activity and not yet fitted into a database format. For example, a video file captured by a camera. • Semi-structured data: data that has some structure, for example tags, but is not yet fully structured within a database system. For example, data with uniform tags whose contents differ depending on the characteristics of whoever filled them in.
  • 23. TYPE OF DATA Data types are an important concept because statistical methods can only be used with certain data types. You have to analyze continuous data differently from categorical data; otherwise the analysis will be wrong. Knowing the types of data you are dealing with therefore enables you to choose the correct method of analysis.
  • 24. DATA FOR BUSINESS ANALYTICS (continued) Figure 1.2: Classifying Data Elements in a Purchasing Database • Copyright © 2013 Pearson Education, Inc. publishing as Prentice Hall
  • 25. Categorical (Nominal) data • Nominal or categorical data consists of categories that cannot be rank ordered; each category is simply different. • The categories cannot be placed in any order, and no judgement can be made about the relative size of, or distance between, one category and another. – Categories bear no quantitative relationship to one another – Examples: customer’s location (America, Europe, Asia); employee classification (manager, supervisor, associate) • What does this mean? No arithmetic operations can meaningfully be performed on the categories relative to each other. • Therefore, nominal data reflect qualitative differences rather than quantitative ones.
  • 26. Examples: Nominal data Nominal values represent discrete units and are used to label variables that have no quantitative value. Just think of them as “labels”. Note that nominal data has no order, so changing the order of its values does not change the meaning. Two examples of nominal features: What is your gender? (please tick) Male / Female. Did you enjoy the film? (please tick) Yes / No.
  • 27. Nominal data • Systems for measuring nominal data must ensure that the categories are mutually exclusive and that the system of measurement is exhaustive. • Variables that have only two possible responses, e.g. Yes or No, are known as dichotomies.
  • 28. Nominal data When you are dealing with nominal data, you summarize information through: 1. Frequencies: the frequency is the rate at which something occurs over a period of time or within a dataset. 2. Proportions: calculate the proportion by dividing the frequency by the total number of events (e.g. how often something happened divided by how often it could happen). 3. Percentages. 4. Visualization methods: to visualize nominal data you can use a pie chart or a bar chart.
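The frequency, proportion and percentage summaries listed for nominal data can be sketched in a few lines of Python. This is only an illustration: the `locations` list is made-up sample data that borrows the customer-location categories from the slides, and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical nominal observations (customer locations); illustrative only.
locations = ["Asia", "Europe", "Asia", "America", "Asia", "Europe"]

# Frequency: how often each category occurs in the dataset.
frequencies = Counter(locations)
total = sum(frequencies.values())

# Proportion: frequency divided by the total number of observations.
proportions = {category: count / total for category, count in frequencies.items()}

# Percentage: proportion expressed out of 100.
percentages = {category: 100 * p for category, p in proportions.items()}

for category in sorted(frequencies):
    print(category, frequencies[category],
          round(proportions[category], 3), round(percentages[category], 1))
```

The same counts could feed a bar chart or pie chart; no mean or median is computed because the categories have no order.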
  • 29. Ordinal data • Ordinal data consists of categories that can be rank ordered. • As with nominal data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other. – No fixed units of measurement – Examples: college football rankings; survey responses (poor, average, good, very good, excellent) • What does this mean? You can make statistical judgements and perform limited maths.
  • 30. Example: Ordinal data How satisfied are you with the level of service you have received? (please tick) Very satisfied / Somewhat satisfied / Neutral / Somewhat dissatisfied / Very dissatisfied
  • 31. Ordinal data • When you are dealing with ordinal data, you can use the same methods as with nominal data, but you also have access to some additional tools. • You can therefore summarize ordinal data with frequencies, proportions and percentages, and visualize it with pie and bar charts. • Additionally, you can use percentiles, the median, the mode and the interquartile range to summarize your data. • In data science, you can use label encoding to transform ordinal data into a numeric feature.
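A minimal sketch of label encoding for ordinal data, assuming the five-point survey scale mentioned earlier in the slides. The `responses` list is made-up sample data, and the helper names (`scale`, `label_of`) are hypothetical.

```python
import statistics

# Ordered categories from lowest to highest (survey scale named in the slides).
scale = ["poor", "average", "good", "very good", "excellent"]

# Label encoding: map each category to its rank in the ordered scale.
label_of = {category: rank for rank, category in enumerate(scale)}

# Hypothetical survey responses; illustrative only.
responses = ["good", "poor", "excellent", "good", "average", "very good", "good"]

encoded = [label_of[response] for response in responses]

# The median is meaningful for ordinal data because ranks can be ordered.
# median_low always returns an actual data value, so it maps back to a category.
median_rank = statistics.median_low(encoded)
median_category = scale[median_rank]

# The mode works directly on the category labels.
mode_category = statistics.mode(responses)

print(encoded, median_category, mode_category)
```

The encoded ranks preserve order only; the gaps between them carry no meaning, so averaging them would assume equal spacing that ordinal data does not guarantee.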
  • 32. Interval and ratio data • Both interval and ratio data are examples of scale data. • Scale data: – data is in numeric format ($50, $100, $150) – data can be measured on a continuous scale – the distance between values can be observed and therefore measured – the data can be placed in rank order.
  • 33. Interval data – Ordinal data but with constant differences between observations – Ratios are not meaningful – Examples: • Time: moves along a continuous measure of seconds, minutes and so on, and has no zero point of time. • Temperature: moves along a continuous measure of degrees and has no true zero. • SAT scores
  • 34. Ratio data • Ratio data is measured on a continuous scale and does have a natural zero point. – Ratios are meaningful – Examples: monthly sales, delivery times, weight, height, age
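The difference between interval and ratio scales can be illustrated with a small numeric check. The sketch below assumes temperature measured in degrees Celsius as the interval example (the Kelvin conversion and the kilogram-to-pound factor are standard constants, not taken from the slides) and weight as the ratio example; all values are made up.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (a scale with a true zero)."""
    return celsius + 273.15

# Interval scale: the zero of the Celsius scale is arbitrary.
warm_c, cool_c = 20.0, 10.0
naive_ratio = warm_c / cool_c  # 2.0, yet 20 degrees C is not "twice as hot"
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Ratio scale: weight has a natural zero, so ratios survive a change of unit.
KG_TO_LB = 2.20462
heavy_kg, light_kg = 10.0, 5.0
ratio_kg = heavy_kg / light_kg
ratio_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)

print(naive_ratio, round(kelvin_ratio, 4))
print(ratio_kg, round(ratio_lb, 4))
```

The Celsius ratio changes when the zero point moves, while the weight ratio is the same in kilograms and pounds, which is what "ratios are meaningful" refers to.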
  • 35. Continuous Data • When you are dealing with continuous data, you can use the widest range of methods to describe your data. You can summarize it using percentiles, median, interquartile range, mean, mode, standard deviation, and range. • Visualization methods: to visualize continuous data, you can use a histogram or a box plot. With a histogram, you can check the central tendency, variability, modality, and kurtosis of a distribution. Note that a histogram does not make outliers easy to identify, which is why box plots are also used.
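The summary measures named for continuous data can be computed with Python's standard `statistics` module. This is a sketch on made-up delivery times; the 1.5 × IQR fences are the usual box-plot convention for flagging outliers, which is an assumption here because the slides do not specify a rule.

```python
import statistics

# Hypothetical delivery times in days; illustrative only.
delivery_days = [2.0, 3.5, 3.0, 4.0, 2.5, 3.0, 12.0, 3.5]

mean = statistics.mean(delivery_days)
median = statistics.median(delivery_days)
stdev = statistics.stdev(delivery_days)           # sample standard deviation
value_range = max(delivery_days) - min(delivery_days)

# Quartiles (requires Python 3.8+); the interquartile range is Q3 - Q1.
q1, q2, q3 = statistics.quantiles(delivery_days, n=4)
iqr = q3 - q1

# Box-plot convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in delivery_days if x < lower_fence or x > upper_fence]

print(mean, median, round(stdev, 2), value_range, iqr, outliers)
```

In this sample the single long delivery pulls the mean well above the median, which is the kind of effect a box plot makes visible.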
  • 36. Summary • You discovered the different data types used throughout statistics. • You learned the difference between discrete and continuous data, and what the nominal, ordinal, interval and ratio measurement scales are. • You now know which statistical measures you can use with which data type, and which visualization methods are appropriate. • You also learned which methods can transform categorical variables into numeric variables. • This enables you to carry out a large part of an exploratory analysis on a given dataset.
  • 38. BASIC DEFINITION • Analytics: Analytics is the discovery, interpretation, and communication of meaningful patterns or summaries in data. • Data Analytics (DA) is the process of examining data sets in order to draw conclusions about the information they contain. • Analytics is not a tool or technology; rather, it is a way of thinking and acting on data.
  • 39. WHAT IS ANALYTICS? Data on its own is useless unless you can make sense of it! Analytics is the scientific process of transforming data into insight for making better decisions, offering new opportunities for a competitive advantage. www.imarticus.org
  • 40. The Case for Business Analytics • The business environment today is more complex than ever before. • Businesses are expected to be diligently responsive to the increasing demands of customers, various stakeholders and even regulators. • Organizations have been turning to the use of analytics. • More than 83% of global CIOs surveyed by IBM in 2010 singled out Business Intelligence and Analytics as one of their visionary plans for enhancing competitiveness. In most cases the primary objective of an organization that turns to analytics is: • Revenue/profit growth • Optimized expenditure (Diagram labels: Business need, Solution, Goal)
  • 41. WHAT IS DATA ANALYTICS? • Data Analytics: – “is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making” (Wikipedia) – “leverage data in a particular functional process (or application) to enable context-specific insight that is actionable” (Gartner) – “is using our current data sets to extract useful information to support advanced decision making” (ATC) • Data Visualizations (i.e. Data Viz): – Visual context of data, dashboards – Often a single-page, real-time user interface with a graphical presentation of your data “Without data you’re just another person with an opinion.” ~W. Edwards Deming
  • 42. DATA ANALYTICS PILLARS AND FOUNDATION
  • 43. DATA ANALYTICS ARCHITECTURE (Diagram labels: Data Collection, Integration, Data Layer, Preparation, Processing, Cleaning, Translation, Interpretation, Analysis, Visualization; Data, Information & Knowledge, Wisdom, Action; Data, Rules, Tools; Business and Information Technology Partnership and Stewardship) “The greatest value of a picture is when it forces us to notice what we never expected to see.” ~John Tukey
  • 44. TYPES OF ANALYTICS 1. Descriptive Analytics: this analysis draws on historical data as well as current data. It is generally used to answer questions such as “What happened with ABC?”, “What happens if XYZ?”, and so on. 2. Diagnostic Analytics: this analysis is used to draw conclusions about events based on related data reports. It is used to answer questions such as “Why did ABC happen when XYZ?”, “What went wrong with strategy DEF?”, and so on. 3. Predictive Analytics: this analysis tries to infer trends and future events from the available historical data. It tends to be more complex than the two previous types because it requires deeper modeling and analysis. 4. Prescriptive Analytics: this analysis is used to optimize processes, structures and systems using the information produced by Predictive Analytics. In essence, it tells the business what needs to be done to anticipate upcoming events.
  • 45. DESCRIPTIVE ANALYTICS • Process: – Identify the attributes, then assess/evaluate the attributes – Estimate the magnitude to correlate the relative contribution of each attribute to the final solution – Accumulate more instances of data from the data sources – If possible, perform the steps of evaluation, classification and categorization quickly – Yield a measure of adaptability within the OODA loop • At some threshold, cross over into diagnostic and predictive analytics. Image source: http://v1shal.com/content/25-cartoons-give-current-big-data-hype-perspective/ Copyright (except where referenced) 2014-2016 Stephen H. Kaisler, Frank Armour, Alberto Espinosa and William H. Money
  • 46. DIAGNOSTIC ANALYTICS • Process: – Begin with descriptive analytics – Extract patterns from large data quantities via data mining – Correlate data types for explanation of near-term behavior, past and present – Estimate linear/non-linear behavior not easily identifiable through other approaches. • Example: by classifying past insurance claims, estimate the number of future claims to flag for investigation with a high probability of being fraudulent.
  • 47. PREDICTIVE ANALYTICS • Process: – Begin with descriptive AND diagnostic analytics – Choose the right data based on domain knowledge and relationships among variables – Choose the right techniques to yield insight into possible outcomes – Determine the likelihood of possible outcomes given initial boundary conditions – Remember! Data-driven analytics is non-linear; do NOT treat it like an engineering project
  • 48. PRESCRIPTIVE ANALYTICS • Process: – Begin with predictive analytics – Determine what should occur and how to make it so – Determine the mitigating factors that lead to desirable/undesirable outcomes – “What-if” analysis with local or global optimization – Example: find the best set of prices and advertising frequency to maximize revenue – Example: and the right set of business moves to make to achieve that goal “Make it so”
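The “what-if” example of searching for the best price and advertising frequency can be sketched as a small grid search. Everything in the model below is hypothetical: the demand function, its coefficients, the cost per advertisement and the candidate ranges are invented purely for illustration. The objective is revenue net of advertising cost, an assumption made so that advertising has a trade-off.

```python
from itertools import product

def demand(price, ads_per_week):
    """Hypothetical demand: falls with price, rises with ads (diminishing returns)."""
    return max(0.0, 500 - 4.0 * price + 30.0 * ads_per_week ** 0.5)

def net_revenue(price, ads_per_week, cost_per_ad=400.0):
    """Revenue from sales minus advertising spend (cost_per_ad is an assumption)."""
    return price * demand(price, ads_per_week) - cost_per_ad * ads_per_week

# Candidate decisions to evaluate (the decision variables of the model).
prices = range(20, 101, 5)
ad_frequencies = range(0, 11)

# What-if analysis: evaluate every combination and keep the best one.
best_price, best_ads = max(
    product(prices, ad_frequencies),
    key=lambda decision: net_revenue(*decision),
)

print(best_price, best_ads, round(net_revenue(best_price, best_ads), 2))
```

An exhaustive grid is a global search over a small set of candidates; with more decision variables, a local or numerical optimizer would typically replace it.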
  • 49. DECISIVE ANALYTICS • Process: – Given a set of decision alternatives, choose one course of action from possibly many – But it may not be the optimal one – Visualize alternatives, as a whole or a partial subset – Perform exploratory analysis (what-if and why): How do I get to there from here? How did I get here from there?
  • 50. DATA ANALYTIC CAPABILITIES “The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge.” ~Stephen Hawking
  • 51. CURRENT STATE – RELIABILITY DATA (Diagram labels: Looking back: What happened? Looking forward: What will happen? Information, Manual, Data)
  • 52. CURRENT STATE – RELIABILITY DATA (Looking back) (Looking forward)
  • 54. Descriptive Analytics • Descriptive analytics is the data analytics process of obtaining a general overview of the data that has been collected. • An example of descriptive analytics is Google Analytics. In Google Analytics we only see simple information such as the number of visitors per unit of time, which pages are visited most often, and similar data. • For this kind of analytics, simple operations such as sums and averages, without machine learning, are more than enough. • Descriptive analytics does not show predictions of which page a visitor will visit next, or why a visitor visits a particular page. • This type of data analytics is the most commonly encountered. Although it is only simple data without machine-learning processing, it is very much needed, especially for benchmarking to find out the effect of the changes we make.
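The kind of descriptive summary described here (visitors per unit of time, most-visited pages, simple sums and averages) can be sketched with plain counting. The page-view log below is made-up data, and the structure of each record is an assumption rather than the format of any real analytics product.

```python
from collections import Counter
from statistics import mean

# Hypothetical page-view log as (date, page) pairs; illustrative only.
page_views = [
    ("2019-04-24", "/home"), ("2019-04-24", "/pricing"), ("2019-04-24", "/home"),
    ("2019-04-25", "/home"), ("2019-04-25", "/blog"),
    ("2019-04-26", "/home"), ("2019-04-26", "/pricing"),
    ("2019-04-26", "/blog"), ("2019-04-26", "/home"),
]

# Views per unit of time (here: per day).
views_per_day = Counter(date for date, _ in page_views)

# Views per page, to find the most-visited pages.
views_per_page = Counter(page for _, page in page_views)

# Simple sums and averages, no machine learning involved.
total_views = sum(views_per_day.values())
average_daily_views = mean(list(views_per_day.values()))
top_page, top_page_views = views_per_page.most_common(1)[0]

print(total_views, average_daily_views, top_page, top_page_views)
```

Re-running the same summary before and after a change gives the benchmarking comparison the slide mentions.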
  • 55. Descriptive Models • Descriptive models quantify relationships in data in a way that is often used to classify customers or prospects into groups. • Unlike predictive models, descriptive models do not rank-order customers by their likelihood of taking a particular action. • Instead, descriptive models can be used, for example, to categorize customers by their product preferences and life stage. • Descriptive modeling tools can be used to develop further models that simulate large numbers of individual agents and make predictions.
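As a rough sketch of categorizing customers by product preference and life stage, the snippet below groups a handful of records without ranking or scoring them. The customer records, the age cut-offs in `life_stage`, and the field names are all assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical customer records invented for illustration.
customers = [
    {"name": "Ani", "age": 24, "favorite_product": "mobile plan"},
    {"name": "Budi", "age": 41, "favorite_product": "home loan"},
    {"name": "Citra", "age": 27, "favorite_product": "mobile plan"},
    {"name": "Dewi", "age": 63, "favorite_product": "pension fund"},
    {"name": "Eko", "age": 45, "favorite_product": "home loan"},
]


def life_stage(age):
    """Map an age to a coarse life-stage label (cut-offs are assumed)."""
    if age < 30:
        return "young adult"
    if age < 55:
        return "mid-career"
    return "senior"


def segment(records):
    """Group customer names by (life stage, preferred product)."""
    groups = defaultdict(list)
    for record in records:
        key = (life_stage(record["age"]), record["favorite_product"])
        groups[key].append(record["name"])
    return dict(groups)


segments = segment(customers)
for key, names in segments.items():
    print(key, names)
```

The output is a set of groups, not a ranking: no customer is scored on how likely they are to act, which is what separates this from a predictive model.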
  • 56. DECISION MODELS Model: • An abstraction or representation of a real system, idea, or object • Captures the most important features • Can be a written or verbal description, a visual display, a mathematical formula, or a spreadsheet representation Copyright © 2013 Pearson Education, Inc. publishing as Prentice Hall
  • 57. DECISION MODELS Figure 1.3
  • 58. DECISION MODELS • A decision model is a model used to understand, analyze, or facilitate decision making. • Types of model input: data, uncontrollable variables, decision variables (controllable)
  • 59. DECISION MODELS Descriptive Decision Models • Simply tell “what is” and describe relationships • Do not tell managers what to do [Figure: An Influence Diagram for Total Cost]
  • 60. Descriptive Analytics What has occurred? • Descriptive analytics, such as data visualization, is important in helping users interpret the output from predictive and prescriptive analytics. • Descriptive analytics, such as reporting/OLAP, dashboards, and data visualization, have been widely used for some time. • They are the core of traditional BI.
  • 61. DECISION MODELS A Break-even Decision Model • TC(manufacturing) = $50,000 + $125*Q • TC(outsourcing) = $175*Q • Breakeven Point: Set TC(manufacturing) = TC(outsourcing), so $50,000 + $125*Q = $175*Q and $50,000 = $50*Q • Solve for Q = 1000 units (Figure 1.7)
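The break-even calculation can be checked with a short Python sketch. The cost figures come from the slide; the function names and the `cheaper_option` helper are illustrative additions.

```python
def total_cost_manufacturing(q, fixed=50_000, unit=125):
    """TC(manufacturing) = fixed cost + unit cost * Q."""
    return fixed + unit * q


def total_cost_outsourcing(q, unit=175):
    """TC(outsourcing) = unit cost * Q."""
    return unit * q


def break_even_quantity(fixed=50_000, unit_in_house=125, unit_outsourced=175):
    """Quantity at which both options cost the same."""
    if unit_outsourced <= unit_in_house:
        raise ValueError("no break-even point: outsourcing is never dearer per unit")
    return fixed / (unit_outsourced - unit_in_house)


def cheaper_option(q):
    """Name the lower-cost option for a given quantity."""
    if total_cost_manufacturing(q) < total_cost_outsourcing(q):
        return "manufacture"
    if total_cost_manufacturing(q) > total_cost_outsourcing(q):
        return "outsource"
    return "either"


q_star = break_even_quantity()
print(q_star)                 # break-even quantity in units
print(cheaper_option(500))    # below break-even
print(cheaper_option(1_500))  # above break-even
```

Below the break-even quantity the fixed cost dominates and outsourcing is cheaper; above it, the lower unit cost of in-house manufacturing wins.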
  • 64. DECISION MODELS • Predictive Decision Models often incorporate uncertainty to help managers analyze risk. • Aim to predict what will happen in the future. • Uncertainty is imperfect knowledge of what will happen in the future. • Risk is associated with the consequences of what actually happens.
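One common way to bring uncertainty into a decision model is simulation. The sketch below reuses the manufacturing and outsourcing cost figures from the break-even slide and assumes, purely for illustration, that demand is normally distributed; the mean, standard deviation, and function name are not from the source.

```python
import random


def probability_in_house_cheaper(mean_demand=1_000, sd_demand=200,
                                 n_trials=10_000, seed=42):
    """Estimate how often in-house manufacturing beats outsourcing
    when demand Q is uncertain (assumed normal, truncated at zero)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0
    for _ in range(n_trials):
        q = max(0.0, rng.gauss(mean_demand, sd_demand))
        cost_in_house = 50_000 + 125 * q
        cost_outsourced = 175 * q
        if cost_in_house < cost_outsourced:
            wins += 1
    return wins / n_trials


# Uncertainty: we do not know Q in advance.
# Risk: the chance of having picked the more expensive option.
p_at_break_even = probability_in_house_cheaper(mean_demand=1_000)
p_high_demand = probability_in_house_cheaper(mean_demand=1_400)
print(p_at_break_even, p_high_demand)
```

When expected demand sits right at the break-even quantity, the choice is close to a coin flip; as expected demand rises, committing to the fixed cost becomes much less risky.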
  • 65. Predictive Analytics What will occur? • Marketing is the target for many predictive analytics applications. • Descriptive analytics, such as data visualization, is important in helping users interpret the output from predictive and prescriptive analytics. • Algorithms for predictive analytics, such as regression analysis, machine learning, and neural networks, have also been around for some time. • Prescriptive analytics are often referred to as advanced analytics.
  • 66. DECISION MODELS A Linear Demand Prediction Model: As price increases, demand falls. (Figure 1.8)
  • 67. DECISION MODELS A Nonlinear Demand Prediction Model: Assumes price elasticity (constant ratio of % change in demand to % change in price) (Figure 1.9)
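The two demand models can be compared numerically. The functional forms below, D = a - b*P for the linear model and D = c*P^(-d) for the constant-elasticity model, are standard textbook shapes; the coefficients are hypothetical because the figures did not survive in this transcript.

```python
def linear_demand(price, a=20_000.0, b=10.0):
    """Linear model: demand falls by b units for each unit of price."""
    return a - b * price


def constant_elasticity_demand(price, c=20_000.0, d=1.5):
    """Nonlinear model: D = c * P**(-d), with elasticity -d at every price."""
    return c * price ** (-d)


def elasticity(demand_fn, price, pct=0.01):
    """Ratio of % change in demand to % change in price
    for a small (pct) price increase."""
    before = demand_fn(price)
    after = demand_fn(price * (1 + pct))
    return ((after - before) / before) / pct


for p in (100, 1_000):
    print(p,
          round(elasticity(linear_demand, p), 3),
          round(elasticity(constant_elasticity_demand, p), 3))
```

Under the linear model the ratio changes with the price level, while under the nonlinear model it stays the same at every price, which is what the slide means by a constant ratio.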
  • 69. DECISION MODELS Prescriptive Decision Models help decision makers identify the best solution. • Optimization - finding values of decision variables that minimize (or maximize) something such as cost (or profit). • Objective function - the equation for the quantity to be minimized (or maximized). • Constraints - limitations or restrictions. • Optimal solution - values of the decision variables at the minimum (or maximum) point.
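The four terms above map onto a very small optimization sketch. It assumes a hypothetical linear demand curve (D = 20,000 - 10*P) and a simple exhaustive search over whole-number prices; real prescriptive models usually rely on mathematical programming solvers instead.

```python
def revenue(price, a=20_000, b=10):
    """Objective function: revenue = price * demand, demand = a - b * price."""
    return price * (a - b * price)


def best_price(p_min, p_max, step=1):
    """Optimization by exhaustive search.

    Decision variable: price.
    Constraint: p_min <= price <= p_max.
    Returns the feasible price with the highest revenue (optimal solution).
    """
    candidates = range(p_min, p_max + 1, step)
    return max(candidates, key=revenue)


unconstrained = best_price(0, 2_000)
capped = best_price(0, 800)  # e.g. a price ceiling of 800

print(unconstrained, revenue(unconstrained))
print(capped, revenue(capped))
```

With the price ceiling in place the optimum moves to the boundary of the feasible region, which shows how a constraint can change the recommended decision.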
  • 70. Prescriptive Analytics What should occur? • For example, the use of mathematical programming for revenue management is common for organizations that have “perishable” goods (e.g., rental cars, hotel rooms, airline seats). • Harrah’s has been using revenue management for hotel room pricing for some time. • Prescriptive analytics are often referred to as advanced analytics. • Regression analysis, machine learning, and neural networks • Often for the allocation of scarce resources
  • 71. Know Your Tools & Why Learn About Them? DATA ANALYTICS TOOLS
  • 72. Most Popular Programming Languages of 2017 Source: https://spectrum.ieee.org/
  • 73. TOOLS COVERED IN PROGRAM The program is developed keeping in mind the needs of an evolving Analytics industry that requires individuals to be “job-ready” from Day 1. www.imarticus.org 73
  • 74. Why SAS? • #1 Market Leader in Analytics • The largest independent vendor in the business intelligence market • The de facto industry standard for Clinical Data Analysis • Used in 60,000+ companies in over 135 countries • “Analytics powerhouse” • INTEGRATED PLATFORM FOR END TO END SOLUTIONS: SAS provides an integrated set of software products and services and integrated technologies for information management, advanced analytics and reporting. • BUSINESS SOLUTIONS ACROSS DOMAINS AND INDUSTRIES: Unmatched domain specific industry focused analytics solutions • Source: The Forrester Wave™: Big Data Predictive Analytics Solutions, Q1 2013
  • 75. Why R? Demand for R language skills is on the rise. • R is the #1 Google Search for Advanced Analytics software (Google Trends, April 2016) • Highest Paid IT Skill (Linkedin Skills and O'Reilly Survey, 2016) • Most-used data science language after SQL (O’Reilly Survey, Jan 2014) • 75% of data professionals use R (Rexer Survey, Oct 2015) • Second best programming language for data science (O'Reilly Survey, 2016) • Supports close to 10,000 free packages (CRAN figure as of December 2016) • R is #13 of all Programming Languages (Redmonk Language Ratings, June 2015) • Companies Already Onboard R: Facebook, Google, Twitter, McKinsey, ANZ Bank, BCG, Uber, Lloyds of London & Many More…
  • 76. What is Hadoop? Hadoop is Transforming Businesses Across Industries • “The growing use of Apache Hadoop, increasing data warehouse volume sizes and the accumulation of legacy systems in organizations are fostering structured data growth. These factors are leading enterprises to understand how to reuse, repurpose and gain critical insight from this data.” Gartner • BIG DATA STORING AND FASTER PROCESSING: Hadoop is an open source software framework created in 2005 that stores and processes big data in a distributed manner on a large collection of hardware. • 1 in 4 organizations use Hadoop to manage their data today (up from 1 out of 10 in 2012) • BUSINESS SOLUTIONS ACROSS DOMAINS AND INDUSTRIES: Low cost solution with a high fault tolerance to access and create value from data.
  • 77. WHY HADOOP? Hadoop – Big Data is a comprehensive classroom training program that enables you to analyse data and create useful information for careers in Data Analytics. • Top 5 Reasons Organizations are using Hadoop: Scalability, Computing Power, Low Cost, Storage Flexibility, Data Protection • Top 5 Industries using Hadoop: Computer Manufacturing, Business Services, Finance, Retail & Wholesale, Education & Government [Figure: Enterprises using Hadoop]
  • 78. HADOOP & BIG-DATA JOBS IN INDIA www.imarticus.org 78
  • 79. WHY PYTHON? Python is a powerful, flexible, open-source language that is easy to learn, easy to use, and has powerful libraries for data manipulation and analysis. What are the reasons for its sudden popularity? • Cost of Ownership: Python is open-source software that is free to download. • Versatility: Multi-purpose language that can be used to build an entire application. • A Data Scientist's Dream: Python is particularly useful in data analytics because it has rich libraries for reading and writing data, running calculations on the information and creating graphical representations of data sets. Python offers extensive analytics capabilities for Text & Predictive Analytics. The IDLE and Spyder IDEs are widely used for data mining. • Big Data Analytics made possible by Pydoop and SciPy: We can write map reduce programs in Python using Pydoop. Here is where Python scores over R. While R uses in-memory processing, Python using Pydoop can process petabytes of data. • Integration: In industry, the data science trend shows increasing popularity of Python. A Python-based application stack can more easily integrate a data scientist who writes Python code, since that eliminates a key hurdle in productionizing a data scientist's work. • Big data compatibility: Python has become one of the big go-to languages for big data processing due to its wide selection of libraries.
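To make the map reduce idea concrete, here is a word count written in plain Python. It runs in a single process and deliberately does not use the Pydoop API or a Hadoop cluster; the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative labels for the three stages that a framework would distribute across machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


counts = word_count(["big data big value", "data analytics"])
print(counts)
```

Because each mapper call touches only one line and each reducer call touches only one key, the same logic can be spread over many machines, which is what lets the pattern scale beyond the memory of one computer.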
  • 80. WHY PYTHON? • Official language of Google • Among top in-demand data science skills (KDNuggets, Dec 2014) • 46% of job ads mention Python (after SQL) (KDNuggets, Dec 2014) • Ranked #1 of all programming languages (Codeeval rankings, Feb 2015) • 2nd most popular data science language (KDNuggets 2013) • Companies Already Onboard Python: Google, Yahoo, Quora, Nokia, ABN AMRO Bank, IBM, National Weather Service & Many More…
  • 81. WHAT IS DATA VISUALIZATION? Data visualization is the presentation of data in a pictorial or graphical format. For centuries, people have depended on visual representations such as charts and maps to understand information more easily and quickly. 81
  • 82. WHY TABLEAU FOR DATA VISUALIZATION? Tableau is a powerful, flexible Data Visualization tool that is easy to learn, easy to use, and has powerful libraries for data visualization and presentation. • Cost of Ownership: Tableau is competitively priced software that is available for a trial download. • Versatility: Multi-purpose package that can be used to build an entire application. • Big data compatibility: Tableau has become one of the big go-to software programs for data visualization due to the wide variety of tools it provides and its compatibility with Big Data platforms such as Hadoop.
  • 83. WHY TABLEAU FOR DATA VISUALIZATION? • A BUSINESS ANALYST'S DREAM: Tableau is easy to learn and use, and significantly faster than existing solutions. One can easily see patterns, identify trends and discover visual insights in seconds. No wizards, no scripts. Tableau facilitates live, up-to-date data analysis that taps into the power of the firm’s data warehouse. Extract data into Tableau’s data engine and take advantage of breakthrough in-memory architecture. Tableau offers powerful visualization capabilities, without a single line of code. Experiment with trend analyses, regressions, correlations. Scalable, secure and reliable cloud and mobile connectivity. • INTEGRATION: Tableau integrates exceptionally well with R and Hadoop, making it a powerful visualization tool for analytics and big data use cases. Developers creating web applications can integrate fully interactive Tableau content into their applications via the JavaScript API.