xGem BigData

Big Data
Agenda
 What is Big Data?
 Big Data Technologies
 What is Hadoop?
 Big Data Components
 Hadoop Distributions
 Hortonworks Data Platform
 Log Analytics
What is Big Data?
Ernst and Young offers the following definition:
Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools and machines. It
requires new, innovative, and scalable technology to collect, host and analytically process the vast amount of data
gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity
management and enhanced shareholder value.
The research firm Gartner defines Big Data as follows:
Big Data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective,
innovative forms of information processing that enable enhanced insight, decision making and process
automation.
The 5 V's of Big Data
1. Velocity
Velocity is the idea that data is being generated extremely fast, a process that never stops.
Attributes include near-real-time or real-time streaming and local and cloud-based
technologies that can process information very quickly.
2. Volume
Volume is the scale of the data, or the increase in the amount of data stored.
3. Variety
Variety is the diversity of the data. There is structured data, which fits neatly into the
rows and columns of relational databases, and unstructured data, which is not organized in a
predefined way, for example tweets, blog posts, pictures, numbers, and even video data.
4. Veracity
Veracity is conformity to facts and accuracy. Is the information real, or is it false?
5. Value
Value isn't just profit. It may be medical or social benefits; customer, employee, or
personal satisfaction; or crime prevention. The main reason people invest time in
understanding Big Data is to derive value from it.
Big Data Technologies
What is Apache Hadoop?
• Hadoop is an open-source software framework used to store and process huge amounts of data.
• Developed and maintained by the Apache Software Foundation
• Transforms commodity hardware into a service that:
  • Stores petabytes of data reliably (HDFS)
  • Allows huge distributed computations (MapReduce)
• Key attributes:
  • Redundant and reliable: doesn't stop or lose data even if hardware fails
  • Easy to program
  • Extremely powerful: allows the development of big data algorithms & tools
  • Batch-processing centric
  • Runs on commodity hardware (computers & network)
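To make the MapReduce bullet concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data", "Big clusters"])` counts "big" twice because the map step lowercases each word before emitting it.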
Who built Hadoop?
Who uses Hadoop?
[Figure: timeline covering 2006, 2007, 2008, 2009, and 2010. Source: The Datagraph Blog]
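How HDFS Works
Namenode (Namespace layer)
• Persistent namespace metadata & journal
• Namespace state: hierarchical namespace (File Name → Block IDs)
• Block map (Block ID → Block Locations)
Datanodes (Block Storage layer)
• Store the blocks themselves (Block ID → Data) on JBOD disks
• Send heartbeats & block reports to the namenode
• Horizontally scale IO and storage
[Figure: four datanodes with JBOD storage, holding blocks (b1, b5, b3), (b2, b3, b1), (b3, b5, b2), and (b1, b5, b2)]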
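The two lookups in the diagram (File Name → Block IDs and Block ID → Block Locations) can be sketched as two dictionaries. This is an illustrative toy, not the HDFS implementation: the class name `NameNode`, its methods, the datanode names `dn1` to `dn4`, and the file path are all assumptions; only the block placement is taken from the figure.

```python
class NameNode:
    """Toy model of the namenode's in-memory metadata."""

    def __init__(self):
        self.namespace = {}  # file name -> ordered list of block IDs
        self.block_map = {}  # block ID -> set of datanodes holding a replica

    def add_file(self, name, block_ids):
        """Record which blocks make up a file."""
        self.namespace[name] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """Replace everything known about one datanode with its latest report."""
        for holders in self.block_map.values():
            holders.discard(datanode)
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)

    def locate(self, name):
        """Return (block ID, sorted datanodes) for each block of a file."""
        return [
            (block_id, sorted(self.block_map.get(block_id, ())))
            for block_id in self.namespace[name]
        ]


# Block placement copied from the figure; node names are hypothetical.
nn = NameNode()
nn.block_report("dn1", ["b1", "b5", "b3"])
nn.block_report("dn2", ["b2", "b3", "b1"])
nn.block_report("dn3", ["b3", "b5", "b2"])
nn.block_report("dn4", ["b1", "b5", "b2"])
nn.add_file("/logs/example", ["b1", "b2"])
```

A client asking for `/logs/example` would get back the block list together with the datanodes that hold each block, and would then read the data from those datanodes directly.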
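HDFS Data Reliability
• Datanodes periodically check block checksums.
• When a block replica is bad or lost:
  1. The namenode tells a datanode that holds a good replica to replicate the block.
  2. That datanode copies the block to another datanode.
  3. The receiving datanode reports blockReceived to the namenode.
[Figure: namenode (namespace state, block map) and four datanodes with JBOD storage, holding blocks (b1, b5, b3), (b2, b3, b4), (b3, b5, b2), and (b1, b5, b2)]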
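The checksum check and the replicate step can be sketched as follows. This is a simplified illustration under stated assumptions, not HDFS code: `verify_block` uses `zlib.crc32` as a stand-in checksum, `plan_replication` and the node names are hypothetical, and the target of three replicas is assumed here as a typical setting.

```python
import zlib


def verify_block(data, expected_crc):
    """Return True if the block's bytes still match the stored checksum."""
    return zlib.crc32(data) == expected_crc


def plan_replication(block_map, live_nodes, target=3):
    """List (block, source, destination) copies needed to reach `target` replicas.

    block_map maps block ID -> set of datanodes believed to hold a good replica.
    Blocks with no surviving replica are skipped because there is nothing to copy.
    """
    plans = []
    for block, holders in sorted(block_map.items()):
        good = sorted(node for node in holders if node in live_nodes)
        missing = target - len(good)
        if not good or missing <= 0:
            continue
        candidates = sorted(node for node in live_nodes if node not in holders)
        for destination in candidates[:missing]:
            plans.append((block, good[0], destination))
    return plans
```

Each tuple in the plan corresponds to one pass through steps 1 to 3: the source is told to replicate, it copies the block to the destination, and the destination would then report blockReceived.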
What is the Hadoop framework?
Hadoop framework Components
Hadoop Distributions
Agenda
Hortonworks Solutions
Log Analytics Systems Today
[Figure: network device logs feeding a log analytics platform]
• Not all data can be captured
• Not all captured data is valuable
• Transport all data
Expand Storage Options of Log Data
[Figure: network device logs, HDF, HDP, and the log analytics platform]
1. Integrate and enrich logs across data centers and security zones
2. Content-based routing based on dynamic evaluation of content, attributes, priority
3. Cost-effectively expand collection and grow the timescale of logs collected
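As a rough illustration of content-based routing, the sketch below evaluates each log record against an ordered list of rules and sends it to the first matching destination. It does not use any HDF or HDP API: the `route` function, the rule list, the record fields (`severity`, `source`), and the destination names are all hypothetical.

```python
def route(record, rules, default="archive"):
    """Return the destination of the first rule whose predicate matches the record."""
    for predicate, destination in rules:
        if predicate(record):
            return destination
    return default


# Rules are checked in order, so earlier entries take priority.
RULES = [
    (lambda r: r.get("severity") == "critical", "realtime_alerts"),
    (lambda r: "firewall" in r.get("source", ""), "security_analytics"),
]
```

With these rules, a critical firewall record goes to `realtime_alerts`, an informational firewall record goes to `security_analytics`, and everything else falls through to the cheaper `archive` destination, which is one way to avoid transporting all data to the analytics platform.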
Thanks!