Big Data
Big Data
• What is Big Data?
• Analog storage vs digital.
• The FOUR V’s of Big Data.
• Who’s Generating Big Data
• The importance of Big Data.
• Optimization
• HDFS
Definition
Big data is the term for a collection
of data sets so large and complex
that it becomes difficult to
process using on-hand database
management tools or traditional
data processing applications. The
challenges include capture,
curation, storage, search,
sharing, transfer, analysis, and
visualization.
The FOUR V’s of Big Data
From traffic patterns and music downloads to web
history and medical records, data is recorded,
stored, and analyzed to enable the technology
and services that the world relies on every day.
But what exactly is big data, and how can it be used?
According to IBM scientists, big data can be broken down
into four dimensions: Volume, Velocity, Variety
and Veracity.
The FOUR V’s of Big Data
Volume. Many factors contribute to the increase in
data volume: transaction-based data stored
through the years, unstructured data streaming
in from social media, and increasing amounts of
sensor and machine-to-machine data being
collected. In the past, excessive data volume was
a storage issue. But with decreasing storage
costs, other issues emerge, including how to
determine relevance within large data volumes
and how to use analytics to create value from
relevant data.
The FOUR V’s of Big Data
Variety. Data today comes in all types of formats:
structured, numeric data in traditional databases;
information created from line-of-business
applications; and unstructured text documents,
email, video, audio, stock ticker data and
financial transactions. Managing, merging and
governing different varieties of data is something
many organizations still grapple with.
The FOUR V’s of Big Data
Velocity. Data is streaming in at unprecedented
speed and must be dealt with in a timely manner.
RFID tags, sensors and smart metering are driving
the need to deal with torrents of data in near-real
time. Reacting quickly enough to deal with
data velocity is a challenge for most
organizations.
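One way to picture near-real-time handling is to update a summary as each reading arrives instead of storing everything and analyzing it later. The sketch below keeps a running average over the most recent readings; the class name `SlidingWindowAverage`, the window size, and the sample values are illustrative assumptions, not something taken from the slides.

```python
from collections import deque


class SlidingWindowAverage:
    """Keep a running average over the most recent `size` readings."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def add(self, value):
        """Add one reading and return the average of the current window."""
        if len(self.window) == self.window.maxlen:
            # The oldest reading is about to be evicted by append().
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Hypothetical smart-meter readings arriving one at a time.
meter = SlidingWindowAverage(size=3)
averages = [meter.add(v) for v in [10.0, 20.0, 30.0, 40.0]]
```

Each call does a constant amount of work, so the cost per reading does not grow with the length of the stream.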
The FOUR V’s of Big Data
Veracity. Big data veracity refers to the biases,
noise and abnormality in data. Is the data that is
being stored and mined meaningful to the
problem being analyzed? Inderpal feels that veracity in
data analysis is the biggest challenge when
compared to things like volume and velocity. In
scoping out your big data strategy, you need to
have your team and partners work to keep
your data clean, with processes that keep ‘dirty data’
from accumulating in your systems.
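A process for keeping dirty data out can be as simple as a validation step applied before records are stored. The following is a minimal sketch under assumed conditions: the record fields (`sensor_id`, `timestamp`, `temp_c`), the plausible temperature range, and the function names are all hypothetical and chosen only for illustration.

```python
def is_clean(record):
    """Return True if a reading has a sensor id and a plausible temperature."""
    temp = record.get("temp_c")
    return (
        bool(record.get("sensor_id"))
        and isinstance(temp, (int, float))
        and -50.0 <= temp <= 60.0  # assumed plausible range
    )


def clean_records(records):
    """Drop invalid readings and repeated (sensor_id, timestamp) pairs."""
    seen = set()
    kept = []
    for record in records:
        key = (record.get("sensor_id"), record.get("timestamp"))
        if is_clean(record) and key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


readings = [
    {"sensor_id": "a1", "timestamp": 1, "temp_c": 21.5},
    {"sensor_id": "a1", "timestamp": 1, "temp_c": 21.5},   # duplicate
    {"sensor_id": "", "timestamp": 2, "temp_c": 20.0},     # missing id
    {"sensor_id": "b2", "timestamp": 3, "temp_c": 999.0},  # implausible value
    {"sensor_id": "b2", "timestamp": 4, "temp_c": 19.0},
]
cleaned = clean_records(readings)
```

Real rules depend on the problem being analyzed; the point is only that checks run before the data accumulates, not after.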
Who’s Generating Big Data
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Mobile devices
(tracking all objects all the time)
Sensor technology and networks
(measuring all kinds of data)
• Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion.
The importance of Big Data
The real issue is not that you are acquiring large
amounts of data. It's what you do with the data that
counts. The hopeful vision is that organizations will
be able to take data from any source, harness
relevant data and analyze it to find answers that
enable:
• Cost reductions
• Time reductions
• New product development and optimized offerings
• Smarter business decision making
The importance of Big Data
For instance, by combining big data and high-powered analytics, it is possible
to:
• Determine root causes of failures, issues and defects in near-real time,
potentially saving billions of dollars annually.
• Optimize routes for many thousands of package delivery vehicles while
they are on the road.
• Analyze millions of SKUs to determine prices that maximize profit and
clear inventory.
• Generate retail coupons at the point of sale based on the customer's
current and past purchases.
• Send tailored recommendations to mobile devices while customers are in
the right area to take advantage of offers.
• Recalculate entire risk portfolios in minutes.
• Quickly identify customers who matter the most.
• Use clickstream analysis and data mining to detect fraudulent behavior.
HDFS / Hadoop
Data in an HDFS cluster is broken down into
smaller pieces (called blocks) and
distributed throughout the cluster. In this
way, the map and reduce functions can
be executed on smaller subsets of your
larger data sets, and this provides the
scalability that is needed for big data
processing. The goal of Hadoop is to use
commonly available servers in a very
large cluster, where each server has a set
of inexpensive internal disk drives.
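The idea can be sketched in plain Python, without Hadoop itself: the input is cut into blocks, a map function runs on each block independently, and a reduce step merges the partial results. The function names (`split_into_blocks`, `map_block`, `reduce_counts`, `word_count`) and the tiny block size are illustrative assumptions and not part of any Hadoop API; real HDFS blocks are far larger and live on different machines.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Cut a list of records into consecutive blocks of at most block_size items."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count words within one block, independently of other blocks."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(records, block_size=2):
    blocks = split_into_blocks(records, block_size)
    # In a cluster, each map_block call could run on the node holding that block.
    partials = [map_block(block) for block in blocks]
    return reduce(reduce_counts, partials, Counter())


lines = ["big data is big", "data is stored in blocks", "blocks are distributed"]
totals = word_count(lines)
```

Because each map call sees only its own block, adding more blocks (and more servers to hold them) does not change the code, which is the scalability property the slide describes.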
PROS OF HDFS
• Scalable – New nodes can be added as needed,
without needing to change data
formats, how data is loaded, how jobs are
written, or the applications on top.
• Cost effective – Hadoop brings massively parallel
computing to commodity servers. The result is a
sizeable decrease in the cost per terabyte of
storage, which in turn makes it affordable to
model all your data.
• Flexible – Hadoop is schema-less, and can absorb
any type of data, structured or not, from any
Sources
• McKinsey Global Institute
• Cisco
• Gartner
• EMC, SAS
• IBM
• MEPTEC
Thank you for your
attention.
Authors: Tomasz Wis
Krzysztof Rudnicki
