Arun Kumar
MSc(Computer Science),
Don Bosco College Yelagirihills.
10/3/2018 Don Bosco College, Yelagiri hills.
Outline
 Big Data : An Introduction
 Big Data Analytics
 Big Data Analytics : Applications and Business
prosperity
 Big Data Technology
 Big Data : Issues and Challenges
 Conclusion
Big Data: An Introduction
Introduction
• Data
    • Facts and pieces of information collected together for reference or analysis
    • Information processed or stored by computers and other electronic devices
    • Text, image, audio, video, etc.
Introduction
• Big data is similar to ordinary data, but it does not behave the same way.
• The term ‘big data’ applies to information that cannot be processed or handled using traditional processes or tools.
Units of data size:
    1 byte = 8 bits
    1 kilobyte = 1024 bytes
    1 megabyte = 1024 kilobytes
    1 gigabyte = 1024 megabytes
    1 terabyte = 1024 gigabytes
    1 petabyte = 1024 terabytes
    1 exabyte = 1024 petabytes
    1 zettabyte = 1024 exabytes
    1 yottabyte = 1024 zettabytes
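The unit ladder above can be turned into a small conversion helper. This is a minimal sketch that assumes the 1024-based convention used on the slide; the function name `humanize` and the list `UNITS` are illustrative and not part of the original deck.

```python
# Unit names in ascending order; each step up is a factor of 1024,
# following the convention used on the slide.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(humanize(1024))          # one kilobyte
print(humanize(5 * 1024**4))   # five terabytes
```

Storage vendors often use powers of 1000 instead of 1024, so the same byte count can be reported with slightly different numbers depending on the convention.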
Definition
• There is no single standard definition.
• “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” - Gartner
• “Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” - Teradata Magazine article, 2011
Introduction
• Characteristics of Big Data
[Figure: Big Data characterized by Volume, Velocity, and Variety]
Introduction
• Characteristics of Big Data
    • Volume: huge size of data at rest (terabytes to petabytes).
    • Velocity: data in motion (streaming data).
    • Variety: many kinds of data (image, audio, text, video, etc.).
Introduction
• Characteristics of Big Data
    • Researchers now include more V’s:
        • Veracity
        • Value
        • Variability
        • ...
        • Victory
[Figure slides: Volume, Variety, Velocity]
Sources of Big Data
• What is big data?
    • Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone.
    • Data comes from everywhere:
        • sensors used to gather climate information
        • posts to social media sites
        • digital pictures and videos
        • purchase transaction records
        • cell phone GPS signals, etc.
    • This data is big data.
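To get a feel for the quoted daily volume, the figure can be converted into larger units and into a per-second rate. The sketch below is plain arithmetic on the number from the slide; the variable names are illustrative.

```python
# 2.5 quintillion bytes per day, the figure quoted on the slide.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60

# Decimal convention: 1 exabyte = 10**18 bytes.
exabytes_decimal = BYTES_PER_DAY / 10**18

# Binary convention (1024-based, as in the unit ladder): 1 exabyte = 1024**6 bytes.
exabytes_binary = BYTES_PER_DAY / 1024**6

# Average rate in decimal terabytes per second.
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_decimal:.2f} EB/day (decimal)")
print(f"{exabytes_binary:.2f} EB/day (binary)")
print(f"{terabytes_per_second:.1f} TB/s on average")
```

In decimal units the daily volume is exactly 2.5 exabytes, which averages out to roughly 29 terabytes every second.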
Sources of Big Data
[Figure: Web & e-commerce, bank/credit card transactional, mobile, social, video & preference, machine & sensor, and retail POS data together become big data]
Who is generating big data?
• The model of generating and consuming data has changed.
    • Old model: a few companies generate data; everyone else consumes it.
    • New model: all of us generate data, and all of us consume it.
What does Big Data look like?
[Figure: what we know or see, compared with what is actually there]
Area of Applications
• Health care / Biotech
• E-governance
• Social networks / Social media
• Weather forecasting
• Education data
• Banking / Insurance / Finance
• Retail industries
• CRM / Customer analytics
• Airlines, etc.
Big Data Analytics
Definition
• Big data analytics is the process of examining enormous amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information.
• Example: searches in “friends” networks at social-networking sites involve graphs with hundreds of millions of nodes and many billions of edges.
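The "friends" network example can be illustrated with a breadth-first search over an adjacency list. This is a toy sketch: the five-person network and the function name `within_hops` are made up for illustration, and a real social graph with billions of edges would need a distributed graph engine rather than a single in-memory dictionary.

```python
from collections import deque


def within_hops(graph: dict[str, set[str]], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search: map each person reachable within max_hops to their distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return distance


# Hypothetical toy network: each key lists that person's direct friends.
friends = {
    "asha": {"bala", "chen"},
    "bala": {"asha", "dev"},
    "chen": {"asha"},
    "dev": {"bala", "esha"},
    "esha": {"dev"},
}

# Friends and friends-of-friends of "asha" (at most 2 hops away).
reach = within_hops(friends, "asha", 2)
print(reach)
```

Here "esha" is three hops from "asha", so she is not included in the result.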
Why Is Big Data Analytics Feasible?
• Increased storage capacities
• Next generation products
• Cost reduction
• Faster and better decision making
• Communication networking
• Improved services or products
• Distributed processing technologies
Stages in Big Data Analytics
[Figure: stages in big data analytics]
Available Analytic Methods
• Traditional Data Processing systems
• Information Processing using statistical tools
• Knowledge Engineering and Intelligent Systems
• Business Analytics using Data mining
• Business Intelligence
• Genetic Algorithms
• Machine learning algorithms
• Exploratory data analysis, etc.
Types of Big Data Analytics
• Descriptive: what has happened?
• Predictive: what will happen?
• Prescriptive: what should happen?
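The three types of analytics can be contrasted on one tiny data set. The sketch below uses made-up weekly sales figures and a hypothetical stock level; the least-squares line and the reorder rule are simple stand-ins chosen for illustration, not methods named in the deck.

```python
from statistics import mean

# Hypothetical weekly unit sales, made up for illustration.
sales = [100, 110, 120, 130]

# Descriptive: what has happened? Summarize the past.
average_sales = mean(sales)


# Predictive: what will happen? Fit a least-squares line through (week, sales).
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(sales)
forecast = slope * len(sales) + intercept  # estimate for the next week

# Prescriptive: what should happen? Turn the forecast into an action.
STOCK_ON_HAND = 125  # hypothetical current inventory
reorder_quantity = max(0, round(forecast - STOCK_ON_HAND))

print(average_sales, forecast, reorder_quantity)
```

With these numbers the past average is 115 units, the forecast for next week is 140 units, and the suggested action is to reorder 15 units.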
The Cycle of Big Data Management
Capture → Organize → Integrate → Analyze → Act (then back to Capture)
Activities in Analytics
• Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making.
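The four activities (inspecting, cleaning, transforming, modeling) can be walked through on a handful of records. This is a minimal sketch: the readings are invented, the helper `is_number` is hypothetical, and the "model" is just a mean used as a baseline.

```python
from statistics import mean

# Hypothetical raw sensor readings (degrees Celsius, stored as text); values are made up.
raw = ["21.5", "22.0", "", "n/a", "23.5", "22.0"]


def is_number(text: str) -> bool:
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Inspecting: count how many records are unusable.
bad_records = sum(1 for r in raw if not is_number(r))

# Cleaning: drop the unusable records and parse the rest.
clean = [float(r) for r in raw if is_number(r)]

# Transforming: convert Celsius to Fahrenheit.
fahrenheit = [c * 9 / 5 + 32 for c in clean]

# Modeling: the simplest possible model, a mean used as a baseline estimate.
baseline = mean(fahrenheit)

print(bad_records, clean, round(baseline, 2))
```

At big data scale each step would run on a distributed engine, but the order of the activities stays the same.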
Why Are New Analytical Methods Needed?
• Big in size (Volume)
• Unstructured data (Variety)
• Streaming data must be analyzed (high Velocity)
• Data is distributed
• Parallel analytics is needed
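One way to picture parallel analytics on distributed data is to summarize each partition independently and then merge the small summaries. The sketch below assumes the data is already split into partitions; `partial_stats` and `parallel_mean` are illustrative names, and a thread pool stands in for what would be separate machines in a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(partition):
    """Summarize one partition as (count, total) so summaries can be merged later."""
    return len(partition), sum(partition)


def parallel_mean(partitions, workers=4):
    """Compute the overall mean from per-partition summaries."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(partial_stats, partitions))
    count = sum(c for c, _ in summaries)
    total = sum(t for _, t in summaries)
    return total / count


# Hypothetical partitions, as if the data lived on three different nodes.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(parallel_mean(partitions))
```

Only the (count, total) pairs travel between workers, never the raw records, which is why this style of computation scales to data that does not fit on one machine.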
Big Data Technology
Key Technologies for Big data
• DFS (Distributed File System):
    • Large files are split into parts
    • File parts are moved into a cluster
    • Fault tolerance through replication across nodes, with rack awareness
• MapReduce: move algorithms close to the data by structuring them for parallel execution so that each task works on a part of the data. The power of simplicity!
• NoSQL: a NoSQL (often interpreted as “Not Only SQL”) database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
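The MapReduce idea, where each task works on one part of the data, can be sketched as a word count in plain Python. This is a single-process illustration of the map, shuffle, and reduce steps, not Hadoop code; the function names and the three input chunks are assumptions made for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str):
    """Map: emit a (word, 1) pair for every word in one chunk of the input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input, already split into parts as a distributed file system would do.
chunks = ["big data is big", "data in motion", "big value"]

mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

In a real cluster each call to `map_phase` would run on the node that stores its chunk, which is what "moving the algorithm close to the data" means.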
Key Technologies for Big data
Three key technologies that can help to handle big data:
• Information management for big data: manage data as a strategic, core asset, with ongoing process control
• High-performance analytics for big data: gain rapid insights from big data and the ability to solve increasingly complex problems
• Flexible deployment options for big data: choose between on-premises or hosted, software-as-a-service (SaaS) approaches
Technologies for Big data
• Fast processors and Massively Parallel Processing (MPP)
• Distributed File System
• Apache Hadoop
• Data-intensive computing strategies
• Low-cost storage, in-memory processing
Technologies for Big data
• Hadoop distributions
    • Hortonworks
• Cloud operating systems
    • Cloud Foundry: originally by VMware
    • OpenStack: worldwide participation and well-known companies
• Storage
    • Fusion-io: not open source, but very supportive of open source projects; flash-aware applications
Development Platforms and Tools
• Python: general-purpose programming language widely used for data analysis
• Mahout: machine learning library
• R: language widely used for statistics and data mining
• Storm: stream processing, open-sourced by Twitter
• Giraph: graph processing, used at Facebook
Databases
• NoSQL databases
    • MongoDB
    • Cassandra
    • HBase (Hadoop)
• SQL databases
    • MySQL: belongs to Oracle
    • PostgreSQL: object-relational database
    • TokuDB: improves RDBMS performance
Visualization tools
 Maps
 Charts (pie, bar, plot, etc)
 Graphs
Big Data: Issues & Challenges
Challenges
The bottleneck is...
• In technology
    • New architectures, algorithms, and techniques are needed
• Also in technical skills
    • Lack of experts in using the new technology
[Figure: Data sources and Big Data Analytics]
Challenges: Internet of Things related
• The amount of data that must be sorted, improved, integrated, analyzed and managed is huge.
• Sensor devices constantly chatter updates about moisture, light and movement.
• A real-time stream data analytics platform that can handle big data is needed, along with a scalable infrastructure to support it.
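Stream analytics means updating a result as each reading arrives instead of waiting for a complete data set. A minimal sketch of that idea is a moving average over the most recent readings; the moisture values are invented and `moving_average` is an illustrative name, not a function from any streaming platform.

```python
from collections import deque
from typing import Iterable, Iterator


def moving_average(readings: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` readings as each new one arrives."""
    recent = deque(maxlen=window)
    running_total = 0.0
    for value in readings:
        if len(recent) == window:
            running_total -= recent[0]  # the oldest reading is about to be evicted
        recent.append(value)
        running_total += value
        yield running_total / len(recent)


# Hypothetical moisture readings from one sensor, in arrival order.
moisture = [10.0, 20.0, 30.0, 40.0]
averages = list(moving_average(moisture, window=2))
print(averages)
```

The generator keeps only `window` readings in memory, so it works on a stream that never ends, which is the property a real-time platform needs at much larger scale.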
Challenges: Cloud computing related
• Traditional WAN-based transport methods cannot move terabytes of data at the speed dictated by businesses.
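A quick calculation shows why moving terabytes over a WAN is slow. The sketch below assumes decimal units (1 TB = 10**12 bytes), a link that is fully available to the transfer, and two example link speeds chosen for illustration; `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(size_terabytes: float, link_megabits_per_s: float,
                   efficiency: float = 1.0) -> float:
    """Hours needed to move the data, assuming a steady link at the given efficiency."""
    size_bits = size_terabytes * 10**12 * 8
    effective_bits_per_s = link_megabits_per_s * 10**6 * efficiency
    return size_bits / effective_bits_per_s / 3600


# Moving 10 TB over two hypothetical links.
hours_100mbps = transfer_hours(10, 100)
hours_1gbps = transfer_hours(10, 1000)

print(f"100 Mbit/s: {hours_100mbps:.1f} hours ({hours_100mbps / 24:.1f} days)")
print(f"1 Gbit/s:   {hours_1gbps:.1f} hours")
```

Even under these ideal assumptions, 10 TB takes more than nine days at 100 Mbit/s and almost a full day at 1 Gbit/s; real links with protocol overhead and shared traffic would be slower.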
Classified Issues & Challenges
 Storage
 Management
 Processing
 Visualization
Challenges: Storage related
• There are clearly not enough hard disks and storage devices.
• Distributed storage is still not enough; manufacturers cannot make storage devices fast enough.
• Write speed to devices is limited; bigger data paths and data buses are needed.
Challenges: Management related
 Data Collection
 Organize the varieties of data
 Need of distributed environments
 Need of new analytical methodology
Challenges: Processing related
• Integrating data using filters
• Deciding what data to process and how
• Designing an effective data processing system
• Latency and bandwidth
• Streaming data processing
Challenges: Big data visualization
 Meeting the need for speed
 Understanding the data
 Addressing data quality
 Displaying meaningful results
Conclusion
For Researchers
• Research institutes and companies are inviting more data scientists for research and development.
• Research and development opportunities exist in fields such as:
    • Telecom industry
    • Retail industry
    • Social networks
    • Healthcare industry, and so on
For Students
• Develop deep analytical skills to land analyst positions
• Acquire basic knowledge of optimization techniques, data mining, machine learning algorithms, etc.
• Keep an eye on evolving technologies
Thank you

Big Data analytics

  • 1. Arun Kumar, MSc (Computer Science), Don Bosco College, Yelagiri Hills. 10/3/2018
  • 2. Outline: Big Data: An Introduction; Big Data Analytics; Big Data Analytics: Applications and Business Prosperity; Big Data Technology; Big Data: Issues and Challenges; Conclusion.
  • 3. Big Data: An Introduction
  • 4. Introduction. Data: facts and pieces of information collected together for reference or analysis; information processed or stored by computers and other electronic devices; text, image, audio, video, etc.
  • 5. Introduction. Big data is similar to ordinary data, but it does not behave the same way. The term ‘big data’ applies to information that cannot be processed or handled using traditional processes or tools. Units of data: 1 byte = 8 bits; 1 kilobyte = 1024 bytes; 1 megabyte = 1024 kilobytes; 1 gigabyte = 1024 megabytes; 1 terabyte = 1024 gigabytes; 1 petabyte = 1024 terabytes; 1 exabyte = 1024 petabytes; 1 zettabyte = 1024 exabytes; 1 yottabyte = 1024 zettabytes.
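The unit ladder on this slide can be made concrete with a short Python sketch. The function names `human_readable` and `bits_in` and the sample values are illustrative assumptions, not part of the slides; the code uses the slide's factor of 1024 between adjacent units.

```python
# Illustrative sketch: express a byte count using the 1024-based ladder from the slide.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def human_readable(num_bytes: int) -> str:
    """Scale num_bytes to the largest unit that keeps the value below 1024."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the last unit available.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


def bits_in(num_bytes: int) -> int:
    """One byte is 8 bits, as on the slide."""
    return num_bytes * 8


print(human_readable(1024 ** 4))   # four steps up the ladder from bytes
print(human_readable(1536))        # one and a half kilobytes
print(bits_in(1))
```

Storage vendors often use powers of 1000 instead of 1024, so figures quoted elsewhere may differ slightly from what this ladder gives.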
  • 6. Definition. There is no single standard definition. Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making (Gartner). “Big data exceeds the reach of commonly used hardware environments and software tools to capture, manage, and process it within a tolerable elapsed time for its user population.” (Teradata Magazine article, 2011)
  • 7. Introduction. Characteristics of Big Data: Velocity, Variety, Volume.
  • 8. Introduction. Characteristics of Big Data. Volume: huge size of data (terabytes to petabytes) at rest. Velocity: data in motion (streaming data). Variety: varieties of data (image, audio, text, video, etc.).
  • 9. Introduction. Characteristics of Big Data. Researchers now include more V’s: Veracity, Value, Variability, ..., Victory.
  • 10. Volume
  • 11. Variety
  • 12. Velocity
  • 13. Sources of Big Data. What is big data? Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. Data comes from everywhere: sensors used to gather climate information; posts to social media sites; digital pictures and videos; purchase transaction records; cell phone GPS signals, etc. This data is big data.
  • 14. Sources of Big Data: Web & Ecommerce; Bank/Credit Card Transactional; Mobile; Social; Video & Preference; Machine & Sensor; Retail POS. All of this becomes big data.
  • 15. Who is generating big data? The model of generating and consuming data has changed. Old model: a few companies generate data, and all others consume it. New model: all of us generate data, and all of us consume it.
  • 16.
  • 17. What does Big Data look like? What we know or see versus what’s actually there.
  • 18. Areas of Application: Health care / Biotech; E-Governance; Social Networks / Social Media; Weather Forecasting; Education data.
  • 19. Areas of Application: Banking / Insurance / Finance; Retail industries; CRM / Customer Analytics; Airways, etc.
  • 20. Big Data Analytics
  • 21. Definition. Big data analytics is the process of examining enormous amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information. Example: searches in “friends” networks at social-networking sites involve graphs with hundreds of millions of nodes and many billions of edges.
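As a toy-scale illustration of a search in a “friends” graph, the sketch below runs a breadth-first search over an adjacency dictionary to count the friendship hops between two people. The names, the tiny sample graph, and the function `degrees_of_separation` are all hypothetical; a real social network with billions of edges would not fit in a single in-memory dictionary and would need distributed graph processing.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Breadth-first search: friendship hops from start to target, or None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, ()):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


# Hypothetical sample data: nodes are people, edges are friendships.
friends = {
    "asha": ["ben", "chen"],
    "ben": ["asha", "dev"],
    "chen": ["asha"],
    "dev": ["ben", "ela"],
    "ela": ["dev"],
    "zed": [],
}

print(degrees_of_separation(friends, "asha", "ela"))
print(degrees_of_separation(friends, "asha", "zed"))
```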
  • 22. Why Is Big Data Analytics Feasible? Increased storage capacities; next-generation products; cost reduction; faster and better decision making; communication networking; improved services or products; distributed processing technologies.
  • 23. Stages in Big Data Analytics
  • 24. Available Analytic Methods: traditional data processing systems; information processing using statistical tools; knowledge engineering and intelligence systems; business analytics using data mining; business intelligence; genetic algorithms; machine learning algorithms; exploratory data analysis, etc.
  • 25. Types of Big Data Analytics. Descriptive: what has happened? Predictive: what will happen? Prescriptive: what should happen?
  • 26. The Cycle of Big Data Management: Capture, Organize, Integrate, Analyze, Act.
  • 27. Analysis of data is a process of inspecting, cleaning, transforming, and modeling data, with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Activities in analytics: inspecting; cleaning; transforming; modeling.
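The four activities named on this slide can be sketched end to end on a tiny dataset. Everything in the example is an assumption made for illustration: the field names `ad_spend` and `sales`, the sample rows, and the choice of a least-squares line as the "model". It uses only the Python standard library.

```python
from statistics import mean

# Hypothetical raw records, with one missing value to give the cleaning step work to do.
raw = [
    {"ad_spend": "10", "sales": "25"},
    {"ad_spend": "20", "sales": "45"},
    {"ad_spend": "", "sales": "30"},
    {"ad_spend": "30", "sales": "65"},
]


def inspect(rows):
    """Inspecting: count rows that have at least one empty field."""
    return sum(1 for r in rows if any(v == "" for v in r.values()))


def clean(rows):
    """Cleaning: drop rows with empty fields."""
    return [r for r in rows if all(v != "" for v in r.values())]


def transform(rows):
    """Transforming: convert string fields to (x, y) float pairs."""
    return [(float(r["ad_spend"]), float(r["sales"])) for r in rows]


def model(pairs):
    """Modeling: fit y = slope * x + intercept by ordinary least squares."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in pairs)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


print("rows with missing values:", inspect(raw))
slope, intercept = model(transform(clean(raw)))
print("fitted line:", slope, intercept)
```

Dropping incomplete rows is only one possible cleaning policy; imputing the missing value is a common alternative.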
  • 28. Why are new analytical methods needed? Data is big in size (Volume); data is unstructured (Variety); streaming data must be analyzed (high Velocity); data is distributed; parallel analytics is needed.
  • 29. Big Data Technology
  • 30. Key Technologies for Big Data. DFS (Distributed File System): large files are split into parts; the parts are moved into a cluster; replication across nodes, done in a rack-aware way, provides fault tolerance. MapReduce: move algorithms close to the data by structuring them for parallel execution, so that each task works on a part of the data. The power of simplicity! NoSQL: a NoSQL (often interpreted as Not Only SQL) database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases.
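The split-then-process idea behind DFS and MapReduce can be sketched as a word count in plain Python. This is a single-process illustration under stated assumptions, not Hadoop's API: the function names, the `part_size` parameter, and the sample lines are hypothetical, and the map calls run sequentially here, whereas on a cluster each one would run on the node that holds its part.

```python
from collections import defaultdict


def split_into_parts(lines, part_size):
    """DFS-style step: split a large input into fixed-size parts."""
    return [lines[i:i + part_size] for i in range(0, len(lines), part_size)]


def map_phase(part):
    """Map: emit a (word, 1) pair for every word in one part."""
    return [(word.lower(), 1) for line in part for word in line.split()]


def shuffle(mapped_parts):
    """Shuffle: group the emitted values by key across all parts."""
    groups = defaultdict(list)
    for pairs in mapped_parts:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, part_size=2):
    parts = split_into_parts(lines, part_size)
    mapped = [map_phase(p) for p in parts]  # each call is independent of the others
    return reduce_phase(shuffle(mapped))


lines = ["big data is big", "data in motion", "data at rest"]
print(word_count(lines))
```

Because each map call sees only its own part and shares no state, the work can be spread across machines without changing the result, which is the property the slide describes as moving algorithms close to the data.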
  • 31. Key Technologies for Big Data. Three key technologies can help to handle big data. Information management for big data: manage data as a strategic, core asset, with ongoing process control. High-performance analytics for big data: gain rapid insights from big data and the ability to solve increasingly complex problems. Flexible deployment options for big data: choose between on-premises and hosted, software-as-a-service (SaaS) approaches.
  • 32. Technologies for Big Data: fast processors and Massively Parallel Processing (MPP); distributed file systems; Apache Hadoop; data-intensive computing strategies; low-cost storage; in-memory processing.
  • 33. Technologies for Big Data. Hadoop distributions: Hortonworks. Cloud operating systems: Cloud Foundry (by VMware); OpenStack (worldwide participation and well-known companies). Storage: fusion-io (not open source, but very supportive of open source projects; flash-aware applications).
  • 34. Development Platforms and Tools. Python: awesome programming language. Mahout: machine learning library. R: best among data mining tools. Storm: stream processing, by Twitter. Giraph: graph processing, used by Facebook.
  • 35. Databases. NoSQL databases: MongoDB; Cassandra; HBase (Hadoop). SQL databases: MySQL (belongs to Oracle); PostgreSQL (object-relational database); TokuDB (improves RDBMS performance).
  • 36. Visualization Tools: maps; charts (pie, bar, plot, etc.); graphs.
  • 37. Big Data: Issues & Challenges
  • 38. Challenges. The bottleneck is in technology: new architectures, algorithms, and techniques are needed. It is also in technical skills: there is a lack of experts in using the new technology.
  • 39. Data sources; Big Data Analytics
  • 40. Challenges: Internet of Things related. The amount of data that needs to be sorted, improved, integrated, analyzed and managed is huge. Sensor devices constantly chatter updates about moisture, light, and movement. Handling this requires a real-time stream data analytics platform that can handle big data, and a scalable infrastructure to support it.
  • 41. Challenges: Cloud computing related. Traditional WAN-based transport methods cannot move terabytes of data at the speed dictated by businesses.
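A back-of-the-envelope calculation shows why moving terabytes over a WAN is slow. The sketch below is illustrative only: the link speeds and data sizes are hypothetical, the `efficiency` parameter is an assumed fudge factor for protocol overhead, and it uses decimal networking units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) rather than the 1024-based ladder from slide 5.

```python
def transfer_hours(data_terabytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move the data over a link of the given speed.

    Assumes 1 TB = 10**12 bytes and 1 Mbps = 10**6 bits per second.
    `efficiency` (0 < efficiency <= 1) scales the usable share of the link.
    """
    bits = data_terabytes * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * efficiency)
    return seconds / 3600


# Hypothetical scenarios: 10 TB over a 100 Mbps link, and over a 1 Gbps link.
print(round(transfer_hours(10, 100), 1), "hours at 100 Mbps")
print(round(transfer_hours(10, 1000), 1), "hours at 1 Gbps")
```

Even on an ideal, fully utilized 100 Mbps link, 10 TB takes more than nine days, which illustrates the gap between WAN transport and the speed the slide says businesses expect.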
  • 42. Classified Issues & Challenges: Storage; Management; Processing; Visualization.
  • 43. Challenges: Storage related. There are clearly not enough hard disks/devices. Distributed storage is still not enough; manufacturers cannot make enough storage devices in time. Speed in writing to devices; bigger data paths/data buses.
  • 44. Challenges: Management related. Data collection; organizing the varieties of data; need for distributed environments; need for new analytical methodology.
  • 45. Challenges: Processing related. Integrating data using filters; “what” data and “how”?; effective data processing system design; latency and bandwidth; streaming data processing.
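Streaming data processing differs from batch processing in that each reading is handled as it arrives and only a bounded amount of state is kept. A minimal sketch, assuming a hypothetical stream of sensor readings and a fixed-size sliding window (the class name `SlidingAverage` is illustrative, not from the slides):

```python
from collections import deque


class SlidingAverage:
    """Running mean over the most recent `size` readings of a stream."""

    def __init__(self, size: int):
        if size < 1:
            raise ValueError("size must be at least 1")
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value: float) -> float:
        """Consume one reading and return the mean of the current window."""
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # oldest value is about to be evicted
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Hypothetical readings arriving one at a time.
avg = SlidingAverage(size=3)
for reading in [1, 2, 3, 4]:
    print(avg.update(reading))
```

Each update takes constant time and memory regardless of how long the stream runs, which is what makes this style workable when the data never stops arriving.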
  • 46. Challenges: Big data visualization. Meeting the need for speed; understanding the data; addressing data quality; displaying meaningful results.
  • 47. Conclusion
  • 48. For Researchers. Research institutes and companies invite more data scientists for research and development. There are R&D research opportunities in fields such as the telecom industry, the retail industry, social networks, the healthcare industry, and so on.
  • 49. For Students. Develop deep analytical skills to grab analyst positions; acquire basic knowledge of optimization techniques, data mining, machine learning algorithms, etc.; keep an eye on evolving technologies.
  • 50. Thank you