SlideShare a Scribd company logo
1 of 25
Big Data
Big Data
• What is Big Data?
• Analog starage vs digital.
• The FOUR V’s of Big Data.
• Who’s Generating Big Data
• The importance of Big Data.
• Optimalization
• HDFC
Definition
Big datais the term for a collection
of data sets so large and complex
that it becomes difficult to
process using on-hand database
management tools or traditional
data processing applications. The
challenges include capture,
curation, storage, search,
sharing, transfer, analysis, and
visualization.
The FOUR V’s of Big Data
From traffic patterns and music downloads to web
history and medical records, data is recorded,
stored, and analyzed to enable that technology
and services that the world relies on every day.
But what exactly is big data be used?
According to IBM scientists big data can be break
into four dimensions: Volume, Velocity, Variety
and Veracity.
The FOUR V’s of Big Data
The FOUR V’s of Big Data
Volume. Many factors contribute to the increase in
data volume. Transaction-based data stored
through the years. Unstructured data streaming
in from social media. Increasing amounts of
sensor and machine-to-machine data being
collected. In the past, excessive data volume was
a storage issue. But with decreasing storage
costs, other issues emerge, including how to
determine relevance within large data volumes
and how to use analytics to create value from
relevant data.
The FOUR V’s of Big Data
The FOUR V’s of Big Data
Variety. Data today comes in all types of formats.
Structured, numeric data in traditional databases.
Information created from line-of-business
applications. Unstructured text documents,
email, video, audio, stock ticker data and
financial transactions. Managing, merging and
governing different varieties of data is something
many organizations still grapple with.
The FOUR V’s of Big Data
The FOUR V’s of Big Data
Velocity. Data is streaming in at unprecedented
speed and must be dealt with in a timely manner.
RFID tags, sensors and smart metering are driving
the need to deal with torrents of data in near-
real time. Reacting quickly enough to deal with
data velocity is a challenge for most
organizations.
The FOUR V’s of Big Data
The FOUR V’s of Big Data
Veracity - Big Data Veracity refers to the biases,
noise and abnormality in data. Is the data that is
being stored, and mined meaningful to the
problem being analyzed. Inderpal feel veracity in
data analysis is the biggest challenge when
compares to things like volume and velocity. In
scoping out your big data strategy you need to
have your team and partners work to help keep
your data clean and processes to keep ‘dirty data’
from accumulating in your systems.
Who’s Generating Big Data
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
• The progress and innovation is no longer hindered by the ability to collect data
• But, by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion
15
The importance of Big Data
The real issue is not that you are acquiring large
amounts of data. It's what you do with the data that
counts. The hopeful vision is that organizations will
be able to take data from any source, harness
relevant data and analyze it to find answers that
enable:
• Cost reductions
• Time reductions
• New product development and optimized offerings
• Smarter business decision making
The importance of Big Data
For instance, by combining big data and high-powered analytics, it is possible
to:
• Determine root causes of failures, issues and defects in near-real time,
potentially saving billions of dollars annually.
• Optimize routes for many thousands of package delivery vehicles while
they are on the road.
• Analyze millions of SKUs to determine prices that maximize profit and
clear inventory.
• Generate retail coupons at the point of sale based on the customer's
current and past purchases.
• Send tailored recommendations to mobile devices while customers are in
the right area to take advantage of offers.
• Recalculate entire risk portfolios in minutes.
• Quickly identify customers who matter the most.
• Use clickstream analysis and data mining to detect fraudulent behavior
HDFS / Hadoop
Data in a HDFS cluster is broken down into
smaller pieces (called blocks) and
distributed throughout the cluster. In this
way, the map and reduce functions can
be executed on smaller subsets of your
larger data sets, and this provides the
scalability that is needed for big data
processing. The goal of Hadoop is to use
commonly available servers in a very
large cluster, where each server has a set
of inexpensive internal disk drives.
PROS OF HDFS
• Scalable – New nodes can be added as needed,
and added without needing to change data
formats, how data is loaded, how jobs are
written, or the applications on top.
• Cost effective – Hadoop brings massively parallel
computing to commodity servers. The result is a
sizeable decrease in the cost per terabyte of
storage, which in turn makes it affordable to
model all your data.
• Flexible – Hadoop is schema-less, and can absorb
any type of data, structured or not, from any
Sources
• McKinsey Global Institute
• Cisco
• Gartner
• EMC, SAS
• IBM
• MEPTEC
Thank you for your
attention.
Authors: Tomasz Wis
Krzysztof Rudnicki

More Related Content

What's hot

Business intelligence architectures.pdf
Business intelligence architectures.pdfBusiness intelligence architectures.pdf
Business intelligence architectures.pdfAnand572211
 
Big data Seminar/Presentation
Big data Seminar/PresentationBig data Seminar/Presentation
Big data Seminar/PresentationKirtimaan Chhabra
 
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Hritika Raj
 
Introduction of big data and analytics
Introduction of big data and analyticsIntroduction of big data and analytics
Introduction of big data and analyticsSanjeev Solanki
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantStuart Miniman
 
JPJ1417 Data Mining With Big Data
JPJ1417   Data Mining With Big DataJPJ1417   Data Mining With Big Data
JPJ1417 Data Mining With Big Datachennaijp
 
Big Data for Beginners
Big Data for BeginnersBig Data for Beginners
Big Data for BeginnersMichael Perez
 
Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Anna Kuhn
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big datakk1718
 

What's hot (16)

Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Business intelligence architectures.pdf
Business intelligence architectures.pdfBusiness intelligence architectures.pdf
Business intelligence architectures.pdf
 
Big data Seminar/Presentation
Big data Seminar/PresentationBig data Seminar/Presentation
Big data Seminar/Presentation
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
Big data PPT prepared by Hritika Raj (Shivalik college of engg.)
 
Introduction of big data and analytics
Introduction of big data and analyticsIntroduction of big data and analytics
Introduction of big data and analytics
 
Big data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You WantBig data? No. Big Decisions are What You Want
Big data? No. Big Decisions are What You Want
 
JPJ1417 Data Mining With Big Data
JPJ1417   Data Mining With Big DataJPJ1417   Data Mining With Big Data
JPJ1417 Data Mining With Big Data
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big Data for Beginners
Big Data for BeginnersBig Data for Beginners
Big Data for Beginners
 
Overview of Big data(ppt)
Overview of Big data(ppt)Overview of Big data(ppt)
Overview of Big data(ppt)
 
Introduction to BigData
Introduction to BigData Introduction to BigData
Introduction to BigData
 
Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?Big Data vs. Small Data...what's the difference?
Big Data vs. Small Data...what's the difference?
 
Big data
Big dataBig data
Big data
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 

Viewers also liked

Viewers also liked (20)

Text categorization as a graph
Text categorization as a graphText categorization as a graph
Text categorization as a graph
 
Introduction to security_and_crypto
Introduction to security_and_cryptoIntroduction to security_and_crypto
Introduction to security_and_crypto
 
Prolog programming
Prolog programmingProlog programming
Prolog programming
 
Abstraction file
Abstraction fileAbstraction file
Abstraction file
 
Information retrieval
Information retrievalInformation retrieval
Information retrieval
 
Exception
ExceptionException
Exception
 
Xml schema
Xml schemaXml schema
Xml schema
 
Database concepts
Database conceptsDatabase concepts
Database concepts
 
Reflection
ReflectionReflection
Reflection
 
Exception handling
Exception handlingException handling
Exception handling
 
Crypto theory practice
Crypto theory practiceCrypto theory practice
Crypto theory practice
 
Aristokrat corporate
Aristokrat corporateAristokrat corporate
Aristokrat corporate
 
Hardware managed cache
Hardware managed cacheHardware managed cache
Hardware managed cache
 
Hash mac algorithms
Hash mac algorithmsHash mac algorithms
Hash mac algorithms
 
Text classification
Text classificationText classification
Text classification
 
Linked list
Linked listLinked list
Linked list
 
Game theory
Game theoryGame theory
Game theory
 
Optimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessorsOptimizing shared caches in chip multiprocessors
Optimizing shared caches in chip multiprocessors
 
Prolog resume
Prolog resumeProlog resume
Prolog resume
 
Overview prolog
Overview prologOverview prolog
Overview prolog
 

Similar to Big data

Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - IntroductionTomy Rhymond
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptalmaraniabwmalk
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigDataValarmathi V
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptxAlbert Alex
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data miningPolash Halder
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big dataRaul Chong
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusersBob Hardaway
 
Guide to big data analytics
Guide to big data analyticsGuide to big data analytics
Guide to big data analyticsGahya Pandian
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notesMohit Saini
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big dataDigimark
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptxUnit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptxYashiBatra1
 

Similar to Big data (20)

Bigdata (1) converted
Bigdata (1) convertedBigdata (1) converted
Bigdata (1) converted
 
All About Big Data
All About Big Data All About Big Data
All About Big Data
 
Unit 1
Unit 1Unit 1
Unit 1
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Understanding big data
Understanding big dataUnderstanding big data
Understanding big data
 
Big data with Hadoop - Introduction
Big data with Hadoop - IntroductionBig data with Hadoop - Introduction
Big data with Hadoop - Introduction
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
An Overview of BigData
An Overview of BigDataAn Overview of BigData
An Overview of BigData
 
SKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSISSKILLWISE-BIGDATA ANALYSIS
SKILLWISE-BIGDATA ANALYSIS
 
bigdataintro.pptx
bigdataintro.pptxbigdataintro.pptx
bigdataintro.pptx
 
Big data
Big dataBig data
Big data
 
Data Mining With Big Data
Data Mining With Big DataData Mining With Big Data
Data Mining With Big Data
 
Big data and data mining
Big data and data miningBig data and data mining
Big data and data mining
 
02 a holistic approach to big data
02 a holistic approach to big data02 a holistic approach to big data
02 a holistic approach to big data
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 
Guide to big data analytics
Guide to big data analyticsGuide to big data analytics
Guide to big data analytics
 
Big data lecture notes
Big data lecture notesBig data lecture notes
Big data lecture notes
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big data
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptxUnit-1 -2-3- BDA PIET 6 AIDS.pptx
Unit-1 -2-3- BDA PIET 6 AIDS.pptx
 

More from Young Alista

Google appenginejava.ppt
Google appenginejava.pptGoogle appenginejava.ppt
Google appenginejava.pptYoung Alista
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architecturesYoung Alista
 
Serialization/deserialization
Serialization/deserializationSerialization/deserialization
Serialization/deserializationYoung Alista
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data miningYoung Alista
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data miningYoung Alista
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discoveryYoung Alista
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherenceYoung Alista
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching worksYoung Alista
 
Abstract data types
Abstract data typesAbstract data types
Abstract data typesYoung Alista
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with javaYoung Alista
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithmsYoung Alista
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and pythonYoung Alista
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysisYoung Alista
 
Programming for engineers in python
Programming for engineers in pythonProgramming for engineers in python
Programming for engineers in pythonYoung Alista
 

More from Young Alista (20)

Google appenginejava.ppt
Google appenginejava.pptGoogle appenginejava.ppt
Google appenginejava.ppt
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
Serialization/deserialization
Serialization/deserializationSerialization/deserialization
Serialization/deserialization
 
Big picture of data mining
Big picture of data miningBig picture of data mining
Big picture of data mining
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Data mining and knowledge discovery
Data mining and knowledge discoveryData mining and knowledge discovery
Data mining and knowledge discovery
 
Directory based cache coherence
Directory based cache coherenceDirectory based cache coherence
Directory based cache coherence
 
Cache recap
Cache recapCache recap
Cache recap
 
How analysis services caching works
How analysis services caching worksHow analysis services caching works
How analysis services caching works
 
Object model
Object modelObject model
Object model
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Concurrency with java
Concurrency with javaConcurrency with java
Concurrency with java
 
Data structures and algorithms
Data structures and algorithmsData structures and algorithms
Data structures and algorithms
 
Abstract class
Abstract classAbstract class
Abstract class
 
Inheritance
InheritanceInheritance
Inheritance
 
Cobol, lisp, and python
Cobol, lisp, and pythonCobol, lisp, and python
Cobol, lisp, and python
 
Object oriented analysis
Object oriented analysisObject oriented analysis
Object oriented analysis
 
Programming for engineers in python
Programming for engineers in pythonProgramming for engineers in python
Programming for engineers in python
 
Api crash
Api crashApi crash
Api crash
 
Learning python
Learning pythonLearning python
Learning python
 

Recently uploaded

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Recently uploaded (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Big data

  • 2.
  • 3. Big Data • What is Big Data? • Analog starage vs digital. • The FOUR V’s of Big Data. • Who’s Generating Big Data • The importance of Big Data. • Optimalization • HDFC
  • 4. Definition Big datais the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
  • 5.
  • 6. The FOUR V’s of Big Data From traffic patterns and music downloads to web history and medical records, data is recorded, stored, and analyzed to enable that technology and services that the world relies on every day. But what exactly is big data be used? According to IBM scientists big data can be break into four dimensions: Volume, Velocity, Variety and Veracity.
  • 7. The FOUR V’s of Big Data
  • 8. The FOUR V’s of Big Data Volume. Many factors contribute to the increase in data volume. Transaction-based data stored through the years. Unstructured data streaming in from social media. Increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
  • 9. The FOUR V’s of Big Data
  • 10. The FOUR V’s of Big Data Variety. Data today comes in all types of formats. Structured, numeric data in traditional databases. Information created from line-of-business applications. Unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.
  • 11. The FOUR V’s of Big Data
  • 12. The FOUR V’s of Big Data Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near- real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
  • 13. The FOUR V’s of Big Data
  • 14. The FOUR V’s of Big Data Veracity - Big Data Veracity refers to the biases, noise and abnormality in data. Is the data that is being stored, and mined meaningful to the problem being analyzed. Inderpal feel veracity in data analysis is the biggest challenge when compares to things like volume and velocity. In scoping out your big data strategy you need to have your team and partners work to help keep your data clean and processes to keep ‘dirty data’ from accumulating in your systems.
  • 15. Who’s Generating Big Data Social media and networks (all of us are generating data) Scientific instruments (collecting all sorts of data) Mobile devices (tracking all objects all the time) Sensor technology and networks (measuring all kinds of data) • The progress and innovation is no longer hindered by the ability to collect data • But, by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion 15
  • 16. The importance of Big Data The real issue is not that you are acquiring large amounts of data. It's what you do with the data that counts. The hopeful vision is that organizations will be able to take data from any source, harness relevant data and analyze it to find answers that enable: • Cost reductions • Time reductions • New product development and optimized offerings • Smarter business decision making
  • 17.
  • 18. The importance of Big Data For instance, by combining big data and high-powered analytics, it is possible to: • Determine root causes of failures, issues and defects in near-real time, potentially saving billions of dollars annually. • Optimize routes for many thousands of package delivery vehicles while they are on the road. • Analyze millions of SKUs to determine prices that maximize profit and clear inventory. • Generate retail coupons at the point of sale based on the customer's current and past purchases. • Send tailored recommendations to mobile devices while customers are in the right area to take advantage of offers. • Recalculate entire risk portfolios in minutes. • Quickly identify customers who matter the most. • Use clickstream analysis and data mining to detect fraudulent behavior
  • 19. HDFS / Hadoop Data in a HDFS cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, and this provides the scalability that is needed for big data processing. The goal of Hadoop is to use commonly available servers in a very large cluster, where each server has a set of inexpensive internal disk drives.
  • 20. PROS OF HDFS • Scalable – New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top. • Cost effective – Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data. • Flexible – Hadoop is schema-less, and can absorb any type of data, structured or not, from any
  • 21.
  • 22.
  • 23.
  • 24. Sources • McKinsey Global Institute • Cisco • Gartner • EMC, SAS • IBM • MEPTEC
  • 25. Thank you for your attention. Authors: Tomasz Wis Krzysztof Rudnicki