Why big data
What is big data
When big data is big data
Big data information system layers
Hadoop ecosystem
What is machine learning
Why machine learning with big data
2. Giza At A Glance
• We are a system integrator
• 43 years in the market
• Work in 25 countries
• 4 regions of operation
• Enterprise Business Solutions
• SCADA
• Transmission & Distribution
• Transportation Infrastructure
• Field Solutions
• Smart Buildings
3. Contents
• Introduction
• When Data is “Big”
• Big Data Information System Layers
Data Platform
Data Science & Advanced Analytics
Information Presentation
Actionable Insights
• Machine Intelligence
5. The Digital Universe
• 2014 EMC & IDC Digital Universe report
• A study to analyze and forecast the amount of data produced annually
• It is the universe of digital data
• Like the physical universe, it:
  - Expands fast
  - Includes stars
  - Includes dark matter
  - Is about everything
6. Digital Universe Expands Fast
• Digital data doubles every two years
• Expected to reach 44 ZB by 2020 (44 trillion GB)
  - 1 ZB = 10^3 EB = 10^6 PB = 10^9 TB
• Every second, ~205,000 new GB
• During this presentation, ~550 million new GB
• Less than 25% of recorded data is tagged
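The growth figures above can be sanity-checked with a short calculation. A minimal sketch, assuming the report's 2013 baseline of 4.4 ZB (an assumption here) and doubling every two years:

```python
# Sketch: projecting digital-universe size under a "doubles every two years"
# growth model. The 4.4 ZB / 2013 baseline is an assumption for illustration.
def projected_size_zb(base_zb: float, base_year: int, year: int) -> float:
    """Size in zettabytes, doubling every two years from the baseline."""
    return base_zb * 2 ** ((year - base_year) / 2)

print(projected_size_zb(4.4, 2013, 2020))  # ~49.8 ZB, in line with the ~44 ZB forecast
```

The simple doubling model slightly overshoots the report's 44 ZB figure, which is expected for such a rough extrapolation.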
7. Telecommunication Revolution
• Smart phones full of sensors
• Smart phone cameras
• High-speed networks
• Mobile penetration
• Multiple devices per customer
• Huge amounts of data transferred
• Communication control data
8. Social Networks
• YouTube statistics:
  - 1,300,000,000 users
  - 300 hours of video uploaded per minute
  - 30 million visitors per day
9. Internet of Things: Smart Cities
• Metering
• Smart homes
• Smart buildings
• Smart parking
• Street lighting
• Traffic monitoring
• And others
10. Internet of Things: Smart Farming
• Weather measuring
• Air sensors
• Water sensors
• Water leakage sensors
• Soil monitoring
• Irrigation monitoring and control
• Harvesting machines tracking and monitoring
• Farm animals tracking and monitoring
• And others
11. Internet of Things: Industrial
• Aircraft sensors gather ~1 TB per flight
• Jet engines produce ~25 MB per flight hour per engine
• Think about power plants, oil plants, water plants, etc.
13. Definitions
• Gartner, the origin of the "3Vs" of Big Data, defines Big Data as: high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
• IDC defines Big Data technologies as: a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data by enabling high-velocity capture, discovery, and/or analysis.
15. Data Variety
• Structured, semi-structured, and unstructured data
• Semi-structured:
  - Log files
  - Manually edited Excel files
  - Others
• Unstructured:
  - Chat conversations
  - Emails
  - Images & videos
  - Others
• Most of this data already belongs to organizations, but it sits there unused; that is why Gartner calls it "dark data"
16. Data Velocity
• The speed at which data is:
  - Created
  - Stored
  - Analyzed
• In Big Data systems, data is created in real time or near real time
17. Data Volume
• 90% of all data ever created was created in the past 2 years
• The estimated amount of data doubles every two years
• The era of a trillion sensors is upon us
21. Hadoop Distributed File System (HDFS)
• Open source project
• Java-based file system
• Scales up to 200 PB
• Up to 4,500 servers in a single cluster
• Close to a billion files and blocks
• Concurrent access through YARN
22. Map-Reduce Algorithm
• A framework for processing problems in parallel
• Uses multiple computing cluster nodes
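The programming model above can be sketched in plain Python with word count, the canonical map-reduce example. The map, shuffle, and reduce steps below are a single-machine simulation of what Hadoop would distribute across cluster nodes:

```python
# Sketch: the map-reduce pattern in plain Python (word count).
# On Hadoop, the map and reduce functions would run in parallel on
# different nodes; here we simulate the phases sequentially.
from collections import defaultdict

def map_phase(document: str):
    """Emit (word, 1) pairs for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts emitted for one word."""
    return key, sum(values)

docs = ["big data is big", "data velocity and data volume"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'velocity': 1, 'and': 1, 'volume': 1}
```

The same map and reduce functions, unchanged, are what the framework fans out over cluster nodes; the shuffle is where the framework does the heavy lifting.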
23. Apache HBase
• Open source project
• Non-relational database
• Column-oriented key-value data store
• Part of the Hadoop project
• Can serve as input & output of map-reduce jobs in Hadoop
• Data access through a Java API
24. Apache Phoenix
• Open source
• Part of the Apache Hadoop project
• Based on Apache HBase
• Provides JDBC and ODBC drivers for HBase
25. Hadoop Distributions
• Top Known:-
- Cloudera
- MapR
- Hortonworks
- IBM
- Pivotal HD
- Intel distribution
• Cloud based:-
- Azure HDInsight
- Amazon Elastic MapReduce
27. Massively Parallel Processing (MPP) Data Warehouse Architecture
• Shared-nothing architecture, no single point of failure
• Scales horizontally by adding nodes
• Breaks large queries across nodes for parallel processing
• Higher data ingestion rates through parallelized data movement
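A minimal sketch of the query-splitting idea above: each node computes a partial aggregate over its own data shard (shared-nothing), and a coordinator combines the partials. The shard contents are made up, and a thread pool stands in for the cluster nodes:

```python
# Sketch: how an MPP warehouse parallelizes an aggregate query.
# Each "node" sums only its own shard; the coordinator merges partials.
from concurrent.futures import ThreadPoolExecutor

shards = [[3, 5, 8], [1, 9], [4, 4, 6]]  # illustrative per-node data

def node_partial_sum(shard):
    """Runs on each node independently over local data."""
    return sum(shard)

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(node_partial_sum, shards))  # [16, 10, 14]

total = sum(partials)  # coordinator combines the partial results
print(total)  # 40
```

The same split-then-merge shape works for any decomposable aggregate (SUM, COUNT, MIN, MAX); AVG needs both a partial sum and a partial count per node.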
28. MPP Database Examples
• Teradata
• Netezza
• Vertica
• Greenplum
• Microsoft PDW (Parallel Data Warehouse)
• DB2 UDB with the Database Partitioning Feature (DPF)
31. Types of Data Analytics
• Descriptive
• Diagnostic
• Predictive
• Prescriptive
32. Descriptive Analytics
• What happened?
  - Which KPIs?
  - Which time frame?
  - Which filters?
  - What chart type?
  - How to remove noise?
33. Diagnostic Analytics
• Why did it happen?
  - Why is this KPI low?
  - What are the factors behind the KPI?
  - Which factors to use for comparison?
  - How to compare while changing a single factor and holding the others fixed?
37. Data Mining
• Data mining is the computing process of discovering patterns in large data sets.
• Cross-Industry Standard Process for Data Mining (CRISP-DM):-
  - Business understanding
  - Data understanding
  - Data preparation
  - Modeling
  - Evaluation
  - Deployment
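As a toy illustration of the pattern discovery described above, the sketch below counts frequent item pairs across transactions, the core idea behind market-basket analysis. The transactions and support threshold are made up for illustration:

```python
# Sketch: frequent-pair mining over a handful of transactions.
# A pair is "frequent" if it appears in at least min_support transactions.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]

pair_counts = Counter()
for t in transactions:
    for pair in combinations(sorted(t), 2):  # canonical order for each pair
        pair_counts[pair] += 1

min_support = 2
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)  # all three pairs appear twice
```

Real algorithms such as Apriori or FP-Growth do the same counting, but prune the candidate space so it scales to millions of transactions.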
42. Reporting / Dashboards
• Reporting
  - Richly formatted and interactive reports
  - Reports with or without parameters
  - Scheduling capabilities
• Dashboards
  - Publishing web-based / mobile reports
  - Interactive display for KPI comparisons against targets
  - Integration with operational applications and/or event processing engines
43. Alerts
• Alerts on business intelligence and analytics content via:
  - Emails
  - SMS
  - Customized receivers (e.g., a custom web service)
44. Geospatial and Location Intelligence
• Combining geographical and location-related data from sources including:-
  - Aerial maps
  - GISs
  - Consumer demographics
• Displaying relationships by overlaying data on interactive maps
45. Mobile Information Presentation
• Develop and deliver content to mobile devices
• Publishing mode and/or interactive mode
• Takes advantage of mobile devices' native capabilities, e.g.:-
  - Touch screens
  - Camera
  - Location awareness
  - Natural-language query
47. Linking Insights to Actions
• Forrester reports that 74% of firms want to be "data driven"
• But only 29% successfully connect analytics to action
• Actionable insights are the missing link
48. Attributes of Actionable Insights
• Aligned with your business goals
• Insight results have context
• Relevance: insights delivered to the right person, at the right time, in the right setting
• Insights are specific
• Novel insights have an advantage over familiar ones
• Clarity of the insight
51. Why Machine Learning for Big Data Analytics
• Dark data makes up more than 90% of the digital universe
• That is far too much data, in too many formats and from too many sources, to handle in a conventional way
• Analysis of unstructured data like images, videos, and sound files is usually done with machine learning algorithms
• More data yields better training results
52. Artificial Neural Networks (ANN)
• Computing systems inspired by biological neural networks
• Based on a collection of artificial "neurons" connected by "synaptic connections"
• Synaptic connections have weights that control transmitted signal strength
• Neurons may have thresholds that control aggregated signal transmission
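The weighted-sum-and-threshold behavior described above can be sketched as a single artificial neuron. The weights and threshold below are illustrative, chosen so the neuron computes a logical AND of two binary inputs:

```python
# Sketch: one artificial neuron with weighted synaptic inputs and a
# step-threshold activation, as described on the slide.
def neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of inputs exceeds the threshold."""
    aggregated = sum(x * w for x, w in zip(inputs, weights))
    return 1 if aggregated > threshold else 0

# With these weights, the neuron only fires when both inputs are 1 (AND):
weights, threshold = [0.6, 0.6], 1.0
print(neuron([1, 1], weights, threshold))  # 1
print(neuron([1, 0], weights, threshold))  # 0
```

Training a network amounts to adjusting these weights (and thresholds) so the neurons produce the desired outputs.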
53. Deep Neural Networks (DNN)
• An ANN with multiple hidden layers between the input and output layers
• The extra layers enable composition of features from lower layers
• The applied technology for tagging huge amounts of dark data: images, videos, speech, music, etc.
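A minimal sketch of a forward pass through such a layered network, with one hidden layer between input and output. The weights are illustrative, not trained:

```python
# Sketch: forward pass through a tiny network. Each fully connected
# layer computes a weighted sum per neuron, then a sigmoid activation;
# stacking layers lets higher layers compose features from lower ones.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weight_rows):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_rows]

x = [0.5, -1.0]                                # input layer (2 features)
hidden = layer(x, [[1.0, 0.5], [-0.5, 1.0]])   # hidden layer: 2 neurons
output = layer(hidden, [[1.0, -1.0]])          # output layer: 1 neuron
print(output)
```

A deep network is this same pattern repeated over many hidden layers, with the weights learned by backpropagation rather than written by hand.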
54. Graphics Processing Units (GPU)
• Rapidly create images in frame buffers for output to a display device
• General-Purpose GPU (GPGPU): a stream processor or vector processor running compute kernels
• Suitable for deep neural network training
• Throughput several orders of magnitude higher than a CPU for such workloads
• GPU clusters
• Cloud-based GPUs (IaaS)
55. Combining HDFS with GPUs
• Conventional large-scale distributed deep learning on Hadoop clusters