WWW.TIC.OM
INNOVATIVE & LEADING EDGE
IT SOLUTIONS
It is a capital mistake to theorize before one has data. Insensibly one begins to twist
facts to suit theories, instead of theories to suit facts.
Sir Arthur Conan Doyle
Big Data and Storage
Smart Talks
Amjid Ali
Head of Business - TIC
Agenda
Big Data and Storage
• Introduction
• Data Generations / Timeline
• Why big data? – Users vs Devices and IoTs
• Practical Benefits
• Big data defined.
• Landscape
• Storage
• Next Generation Storage
HUMAN FACE OF BIG DATA
● Every object on earth will be generating data.
● Digital format of information
● Quick search through tons of information.
● We are exposed to a vast ocean of data.
● What we buy, where we go, what we say and what we do is all being recorded forever.
INTRODUCTION
● Buzzword since 2012
● Data, small data, big data.
● Big data exceeds the processing capacity of conventional database systems.
● Not all data is being analyzed.
INTRODUCTION - BIG DATA
• Data is “data”; what makes it “big”?
• It cannot be analyzed using traditional computing techniques.
• Storage
• Processing
• Visualization
INTRODUCTION - BIG DATA
• Relevant to more and more organizations.
• New fields of application.
• Large volumes, generated automatically and continuously.
• Various data sources
• Limitations in analysis
• Complexity and speed limitations
TIMELINE
BIG DATA
BIG DATA TIMELINE
“Information explosion” (a term first used in 1941, according to the Oxford English Dictionary).
2030: to store all the data generated, a data center six times the size of Greater London will be required.
BIG DATA TIMELINE (1880 to 1999)
Milestones shown: Overload Census; Punch Cards; Accounting Machine; Library; Rate of Transmission; Storage Capacity; Predict Big Data; Visualization
Years shown: 1880, 1890, 1910, 1927, 1944, 1967, 1971, 1990, 1996, 1999
BIG DATA TIMELINE (2000 to 2015)
Milestones shown: Everyone Produces Data; 3V; Hadoop and MapReduce; Social Media and Web 2.0; Big Data Projects; 5 Exabytes (Till Now vs Two Years); Big Data Buzzword; Data Scientists; Genome Decoding; Google Largest Big Data
Years shown: 2000, 2002, 2005, 2006, 2009, 2010, 2012, 2013, 2014, 2015
BIG DATA TIMELINE
IoT and Big Data Revolution (2016, 2017)
• Year of the big data revolution
• Big data becomes fast and approachable
• Artificial Intelligence and Augmented Intelligence: 34% annual growth
• Big data roles (scientists, engineers and analysts) are the most in-demand jobs
• Computers with 100 times better performance
• GPU and HPC
• Hadoop, Hive, Presto, Impala and Spark
• Hadoop and enterprise standards
• In-memory computing: in-memory data grids (IMDGs)
• IoT will grow further
• Machine learning and operational intelligence
• Many big data ideas
• Business Intelligence
• Cloud: big data as a service
• Spark
• Convergence of IoT, cloud and big data creates new opportunities for self-service analytics
• DNA storage
Why big data?
Some Facts
GLOBAL INTERNET POPULATION
[Chart: global internet population, in billions, for 2012, 2014, 2015, 2016 and 2017]
WORLD INTERNET USAGE AND POPULATION STATISTICS
MARCH 4, 2017 - Update
World Regions               Population      Population   Internet Users   Penetration     Growth      Users
                            (2017 Est.)     % of World   (31 Dec 2016)    Rate (% Pop.)   2000-2017   % of Table
Asia                        4,148,177,672   55.2 %       1,856,212,654    44.7 %          1,523.9%    50.2 %
Europe                      822,710,362     10.9 %       630,708,269      76.7 %          500.1%      17.1 %
Latin America / Caribbean   647,604,645     8.6 %        384,766,521      59.4 %          2,029.4%    10.4 %
Africa                      1,246,504,865   16.6 %       335,453,374      26.9 %          7,330.7%    9.1 %
North America               363,224,006     4.8 %        320,067,193      88.1 %          196.1%      8.7 %
Middle East                 250,327,574     3.3 %        141,489,765      56.5 %          4,207.4%    3.8 %
Oceania / Australia         40,479,846      0.6 %        27,540,654       68.0 %          261.4%      0.7 %
WORLD TOTAL                 7,519,028,970   100.0 %      3,696,238,430    49.2 %          923.9%      100.0 %
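The derived columns of the usage table can be re-computed from the two raw columns. The sketch below is only an illustration: it copies the population and internet-user figures from the table, and the function names are hypothetical. It recomputes the penetration rate (users as a share of population) and each region's share of all users.

```python
# (population, internet users) per region, copied from the table above.
regions = {
    "Asia": (4_148_177_672, 1_856_212_654),
    "Europe": (822_710_362, 630_708_269),
    "Latin America / Caribbean": (647_604_645, 384_766_521),
    "Africa": (1_246_504_865, 335_453_374),
    "North America": (363_224_006, 320_067_193),
    "Middle East": (250_327_574, 141_489_765),
    "Oceania / Australia": (40_479_846, 27_540_654),
}


def penetration_rate(population, users):
    """Internet users as a percentage of the population, to one decimal."""
    return round(100 * users / population, 1)


def share_of_users(users, total_users):
    """A region's users as a percentage of all users, to one decimal."""
    return round(100 * users / total_users, 1)


world_population = sum(pop for pop, _ in regions.values())
world_users = sum(users for _, users in regions.values())

for name, (pop, users) in regions.items():
    print(f"{name}: penetration {penetration_rate(pop, users)} %, "
          f"share of users {share_of_users(users, world_users)} %")
print(f"World: penetration {penetration_rate(world_population, world_users)} %")
```

Summing the regional rows reproduces the WORLD TOTAL row, which is a quick consistency check on the table.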
2017
Big Data Facts
CURRENT
Big Data Facts
Internet Minute
• 701,389 logins on Facebook
• 69,444 hours watched on Netflix
• 150 million emails sent
• 1,389 Uber rides
• 527,760 photos shared on Snapchat
• 51,000 app downloads on Apple’s App Store
• $203,596 in sales on Amazon.com
• 120+ new LinkedIn accounts
• 347,222 tweets on Twitter
• 28,194 new posts to Instagram
• 38,052 hours of music listened to on Spotify
• 1.04 million Vine loops
• 2.4 million search queries on Google
• 972,222 Tinder swipes
• 2.78 million video views on YouTube
• 20.8 million messages on WhatsApp
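To put the per-minute figures in perspective, they can be scaled up to a full day. This is a minimal sketch that assumes the rates stay constant over 24 hours (a simplification); the three figures are taken from the list above, and the names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute figures taken from the "Internet Minute" list.
per_minute = {
    "tweets on Twitter": 347_222,
    "search queries on Google": 2_400_000,
    "emails sent": 150_000_000,
}


def per_day(count_per_minute):
    """Scale a per-minute count to a day, assuming a constant rate."""
    return count_per_minute * MINUTES_PER_DAY


for activity, count in per_minute.items():
    print(f"{activity}: {per_day(count):,} per day")
```

On these assumptions the tweet rate works out to roughly half a billion tweets a day.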
Human and Devices
IoTs (Sensors and controls)
WHAT GENERATES THE DATA
EarthScope
• World's largest science project
• 67 terabytes of data.
LHC
CERN’s Large Hadron Collider (LHC) generates 15 PB of data a year.
Image: Maximilien Brice, © CERN
Why big?
A whopping 90% of the data that currently exists was created in just the last two years.
3.7 billion people and 25 billion sensors and devices are connected.
BIG DATA
The 3 Vs of data (Volume, Variety and Velocity) create challenges.
5 exabytes of data are created every 2 days.
By 2020, the big data and analytics market will reach $202 billion.
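The "5 exabytes every 2 days" figure is easier to grasp as a sustained rate. The sketch below assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes) and a constant rate; both are assumptions made for illustration, not statements from the slide.

```python
EXABYTE = 10**18   # decimal units, assumed for this sketch
TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60

data_bytes = 5 * EXABYTE          # volume quoted on the slide
period_seconds = 2 * SECONDS_PER_DAY

# Average creation rate in terabytes per second.
tb_per_second = data_bytes / period_seconds / TERABYTE

# The same rate expressed as exabytes per 365-day year.
eb_per_year = 5 * 365 / 2

print(f"about {tb_per_second:.1f} TB every second")
print(f"about {eb_per_year} EB per year")
```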
PRACTICAL BENEFITS
BIG DATA IMPLEMENTATIONS
PRACTICAL BENEFITS
BIG DATA
1. Opening a dialogue with consumers
2. Redeveloping your products
3. Performing risk analysis
4. Keeping your data safe
5. Creating new revenue streams
6. Customizing your website in real time
7. Reducing maintenance costs
8. Offering tailored healthcare
9. Offering enterprise-wide insights
10. Making our cities smarter
IN THE COMPANY
Why, for whom and for what?
PRODUCTION FACTOR
In addition to capital, commodities and labor, data is the fourth production factor of the digital economy.
DATA STRUCTURE
Even the most unstructured data in a business can be structured for analysis.
RANGE OPTIMIZATION
In particular, areas such as development, sales, production, organization and management are well suited to Big Data.
IN THE COMPANY
Enabler
• Relevant to more and more organizations.
• New fields of application.
• Large volumes, generated automatically and continuously.
• Various data sources
• Limitations in analysis
• Complexity and speed limitations
IN THE COMPANY
ECONOMIC FACTORS
TRANSPARENCY
Transparency helps all those involved to access information at the same time. The value chain can thereby be maximized.
FORECAST
Big Data offers the opportunity for real-time performance monitoring and extensive simulations.
CUSTOMER FOCUS
Services can be cut to size through detailed customer segmentation.
ANALYSIS
Through real-time analysis, automated decisions are possible. Alternatively, a decision basis for management can be created.
INNOVATION
Big Data promotes the opportunity to operate real-time performance monitoring and extensive simulations.
IN THE COMPANY
DATA SOURCES
• TEAM COLLABORATION
• MOBILE DATA OF TABLETS AND SMARTPHONES
• COMMUNICATION DATA
• CLOUD APPLICATIONS
• AUTOMATED MACHINES
• SOCIAL MEDIA
• E-COMMERCE
• AUDIO/VIDEO DATA
IN THE COMPANY
DATA ANALYTICS
INFOGRAPHICS
Big Data Facts
Salesforce Research
IN THE COMPANY
DATA ANALYZED
Big data benefits various sectors
● Clickstream analysis, buying patterns
● Sentiment analysis
● Fraud detection; forensic analysis
● Machine-learning-based investment strategies
● Healthcare research
● Prediction and prevention of equipment failure
● Predicting epidemics using searches
● Finding correlations between different trends
● Personalization / predictive analytics
● GPS monitoring and tracking
● Risk analysis and management
● Identifying patterns in sensor data to predict issues
● And many more…
HEALTH CARE
HEALTH CARE VS BIG DATA – PERSONAL
BIG DATA
GENOME SEQUENCING COST
BIG DATA DEFINED
BIG DATA DEFINED
100s of TB – x PB
Uses Hadoop
Three Vs
Too big for OLTP
Uses distributed/parallel processing
Big Data Defined
Big data refers to large data volumes in the range of many terabytes and more (multiple
petabytes is absolutely realistic), comprising various data types (structured, unstructured,
semi-structured and poly-structured data) from versatile data sources which are often
physically distributed. Quite often, data is generated at high velocity and needs to be
processed and analysed in real time. Sometimes data expires at the same high velocity as it
is generated. From a content perspective, data can even be ambiguous, which makes its
interpretation quite challenging.
Big Data Defined
“Big data are high volume, high velocity, and high variety information assets that
require new forms of processing to enable enhanced decision making, insight
discovery and process optimization” (Gartner 2012)
3 Vs of big Data
● Data which is “big” in these 3 dimensions
○ Volume : Lots of data is being collected; 90% of the data in the world was collected in the last two years.
○ Velocity : Data is being generated quickly and we need to deal with it.
○ Variety : Structured, Unstructured,
Image Source : GITS
3 Vs of big Data
There is a 4th V of data
4th V: Veracity
● The trustworthiness of the captured data, in terms of accuracy.
● Uncertain or imprecise data
● Inherent discrepancies in all the data collected
Other Characteristics
Many definitions. Often defined in terms of 3, 4, 5, 7, 9 or 10 Vs
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Variability – inconsistencies in the data and the inconstant speed at which big data is loaded into the database.
6. Validity – similar to veracity, but about how correct the data is for its intended use.
7. Vulnerability – security concerns and hacking attempts
8. Volatility – how long does the data need to be kept?
9. Visualization – how challenging the data is to visualize, and ways to represent the information.
10. Value – business value from the data
Big data redefined
Big data is high volume, high velocity, and/or high variety information assets that require new forms of
processing to enable enhanced decision making, insight discovery and process optimization.
- Doug Laney, Gartner Analyst, Chief Data Officer research & advisory team (Data & Analytics Strategy, Infonomics, Big Data, Info Innovation)
Big Data
Big Data - Value
Technology and
Architecture
BIG DATA ENABLER
Economics, Community
• Commodity hardware compatibility
• Reduction in storage cost
• Open source ecosystem
• The web economy
Architecture
BIG DATA
BIG DATA STEPS INVOLVED
Steps: Collect Data, Store Data, Process Data, Analyze Data, Serve Data
Components: Data Sources, Tools, Storage Solutions, Result (end-user application)
Big Data and Platform requirements
● Capture – distributed database, append-only logs, queues
● Store – horizontally scalable system, data laid out according to usage patterns
● Search – optimized for searching
● Process – MapReduce, queues, Spark jobs
● Analyze – MapReduce, Spark, Hive, Pig
● Visualize – charts and graphs on Hive
● Integrate – with existing systems and databases
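The capture, store, process and analyze requirements can be illustrated with a toy, single-machine pipeline. This is only a sketch under stated assumptions: the event fields, the partition count and every function name are hypothetical, and a Python list and dictionary stand in for the distributed log and storage nodes that a real platform would use.

```python
import hashlib
from collections import Counter, defaultdict

NUM_PARTITIONS = 3  # hypothetical number of storage nodes


def capture(log, event):
    """Capture: append the event to an append-only log (no in-place updates)."""
    log.append(event)


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Store: pick a partition from a stable hash of the key, so data
    spreads across nodes and capacity grows by adding nodes."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def store(log, num_partitions=NUM_PARTITIONS):
    """Distribute logged events over the partitions by user."""
    partitions = defaultdict(list)
    for event in log:
        partitions[partition_for(event["user"], num_partitions)].append(event)
    return partitions


def process(partitions):
    """Process: count actions per partition; each partition is independent,
    so this step could run in parallel on separate nodes."""
    return [Counter(e["action"] for e in events) for events in partitions.values()]


def analyze(partial_counts):
    """Analyze: merge the per-partition results into one summary."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


log = []
for user, action in [("ali", "view"), ("sara", "buy"), ("ali", "buy"), ("omar", "view")]:
    capture(log, {"user": user, "action": action})

summary = analyze(process(store(log)))
print(dict(summary))
```

Hashing on the user keeps all of one user's events in the same partition, which is one simple way to lay data out according to a usage pattern.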
Architecture – Source Oracle
Data Lake
HADOOP
Hadoop
● Open-source Apache project
● Distributed, fault-tolerant data storage and batch processing
● Provides linear scalability on commodity hardware
● Flexible, scalable and free.
Technology and Architecture
HDFS
● Unix-like file system
● Splits large files into blocks
● Distributes and replicates the blocks across various nodes
● One master NameNode and many DataNodes
● NameNode : holds the namespace, which maps each block to its locations.
● DataNode : stores blocks on local disk; sends heartbeats and block reports; handles replication
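The block splitting and replication described for HDFS can be sketched in a few lines. This is a simplified illustration, not HDFS code: the round-robin placement, the tiny block size and the function names are assumptions made for the example (real HDFS uses much larger blocks and a rack-aware placement policy).

```python
def split_into_blocks(data: bytes, block_size: int) -> list:
    """Split a file's bytes into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, datanodes: list, replication: int = 3) -> dict:
    """Build a NameNode-style map: block index -> DataNodes holding a copy.

    Round-robin placement is an assumption made for this sketch.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(block_map: dict, failed_nodes: set) -> list:
    """Blocks that still have at least one replica on a healthy node."""
    return [b for b, nodes in block_map.items() if set(nodes) - failed_nodes]


data = b"x" * 1000                      # a 1000-byte stand-in for a large file
blocks = split_into_blocks(data, 256)   # toy block size for the example
block_map = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])

print(block_map)
print(readable_blocks(block_map, failed_nodes={"dn1", "dn2"}))
```

With three replicas per block, every block in this example stays readable even when two of the four DataNodes fail, which is the fault tolerance the replication bullet refers to.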
MapReduce
• Map step : splits the data and pre-processes it
• Reduce step : aggregates the results
• Most typical of Hadoop, but employed by others to various extents.
• Introduced by Google
• Google has since moved away from it and has no plans to continue using it.
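The map and reduce steps can be shown with the classic word-count example. This is a single-process sketch in plain Python rather than Hadoop code; the function names are illustrative, and the explicit shuffle step (grouping values by key between map and reduce) is something a framework would do across machines.

```python
from collections import defaultdict


def map_step(line):
    """Map: turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between steps."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def map_reduce(lines):
    mapped = [pair for line in lines for pair in map_step(line)]
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


counts = map_reduce(["big data is big", "data is data"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many machines.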
Cloudera
• Commercial Hadoop distribution
• Enterprise solution
• Data security
• Moving away from MapReduce.
Spark
• 2016 was a great year for Spark.
• Apache Spark 2.0 released in 2016
• Cluster-computing framework
• Open source
• Part of the Hadoop open source community
• Apache top-level project.
• Runs on top of the Hadoop file system
• Not tied to the MapReduce paradigm
• MapReduce is strictly disk-based
• Spark can be up to 100 times faster than Hadoop MapReduce
• In-memory cluster computing
• Scala, Java and Python
• Doesn't have its own distributed filesystem, but can use HDFS.
Databricks
• Commercial Tool of
• Production
• Exploration
• Security
• Spark in the cloud
Hive
• Apache Hive™ data warehouse software
• Reading, writing and managing large datasets
• Distributed storage.
• Facebook
R
PLATFORM
BIG DATA STORAGE
Hardware Specs   2010        2017              Improvement
Storage          100 MB/s    1000 MB/s (SSD)   10x
Network          1 Gbps      10 Gbps           10x
CPU              3 GHz       3 GHz             none
• The removal of virtualization layers.
• Acceleration technologies, such as GPUs and NVMe
• Optimal placement of storage and compute.
• High-capacity, nonblocking networking.
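The effect of the 10x storage and network improvements is easiest to see as the time needed to move a fixed amount of data. The sketch below assumes a 1 TB dataset and decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), and ignores protocol overhead; these are illustrative assumptions, not figures from the slide.

```python
TB = 10**12  # bytes, decimal units assumed
MB = 10**6   # bytes
GBIT = 10**9  # bits


def scan_seconds(dataset_bytes, throughput_bytes_per_s):
    """Time to read a dataset sequentially at a given storage throughput."""
    return dataset_bytes / throughput_bytes_per_s


def transfer_seconds(dataset_bytes, link_bits_per_s):
    """Time to send a dataset over a network link, ignoring overhead."""
    return dataset_bytes * 8 / link_bits_per_s


scan_2010 = scan_seconds(1 * TB, 100 * MB)
scan_2017 = scan_seconds(1 * TB, 1000 * MB)
net_2010 = transfer_seconds(1 * TB, 1 * GBIT)
net_2017 = transfer_seconds(1 * TB, 10 * GBIT)

print(f"Scan 1 TB from storage: {scan_2010 / 3600:.1f} h (2010) vs {scan_2017 / 60:.1f} min (2017)")
print(f"Send 1 TB over network: {net_2010 / 3600:.1f} h (2010) vs {net_2017 / 60:.1f} min (2017)")
```

Since CPU clock speed is unchanged between the two columns, faster storage and networking shift the bottleneck toward compute, which is where the acceleration technologies in the bullets come in.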
BIG DATA PLATFORMS
Infrastructure with all tools
Store and Query
Many hardware vendors
Storage in the cloud
Fully engineered, enterprise-grade big data solution.
Modern Data Architecture (MDA)
EMC Business Data Lake.
EMC
Microsoft R Server
OceanStor 9000
Biological Computing and Storage
BIG DATA: Nature has a Solution
Personal Data Storage + Cloud: 2001 vs 2017
What about big data? x 90,000 (2030)
Big data storage
Modern archiving technology cannot keep up with the growing tsunami of bits. But nature may
hold an answer to that problem already.
DNA Storage
All the world’s data can fit on a DNA hard drive the size of a teaspoon.
DNA Storage
A bioengineer and a geneticist at Harvard’s Wyss Institute have successfully stored 5.5 petabits
of data (around 700 terabytes) in a single gram of DNA, smashing the previous DNA data density
record by a thousand times.
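The "around 700 terabytes" figure follows from a unit conversion. The sketch below assumes decimal prefixes (1 petabit = 10^15 bits, 1 terabyte = 10^12 bytes); the constant and variable names are illustrative.

```python
PETABIT = 10**15   # bits, decimal prefix assumed
TERABYTE = 10**12  # bytes
BITS_PER_BYTE = 8

stored_bits = 5.5 * PETABIT
stored_terabytes = stored_bits / BITS_PER_BYTE / TERABYTE

print(f"5.5 petabits is {stored_terabytes} TB, i.e. roughly 700 TB per gram of DNA")
```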
DNA Storage
Hard Drives: 3 TB x 233 hard drives; 151 kg; 10-year lifetime
DNA Storage: world’s data in a teaspoon-size drive; 1 gram
011001101010001 ATGCTCGAAGCT
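The idea of turning bits into DNA bases can be sketched with a simple two-bits-per-base code. The mapping below (00 to A, 01 to C, 10 to G, 11 to T) is a hypothetical choice made for illustration; it is not the scheme behind the strand shown on the slide, and practical DNA storage schemes add redundancy and avoid error-prone sequences.

```python
# Hypothetical 2-bit-per-base mapping, chosen only for this example.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Encode bytes as a strand of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Decode a strand of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)
print(decode(strand))
```

Each byte becomes four bases, so the round trip `decode(encode(data))` returns the original data.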
Basic building blocks
DNA
Cell
nucleus
chromosome
genes
DNA Structure
DNA: Self-referential
Data Science
BIG DATA
Data Science
● Then and Now
● Information becomes the driving force.
● Complexity
● Processes
Skill set
Lack of Talent
• Data scientists
• Sophisticated teams of developers
• Analysts
• Education resources
By 2018, the USA alone will face a shortage of 140,000 to 190,000 data scientists as well as 1.5 million data managers.
Big data – big team
Amjid Ali
Head of Business
The Integrated Connection LLC
Headquarters: Office No. Z-215, 2nd Floor, KOM4
Knowledge Oasis Muscat
Sultanate of Oman
amjid@tic.om
@ticllc
@tic_oman
+theintegratedconnection
+968 24166290
