Big Data and Analytics
Name of the Staff : M.FLORENCE DAYANA
Head, Dept. of CA
Bon Secours College for Women
Thanjavur.
• Big data analytics (BDA) is a new approach in information
management which provides a set of capabilities for
revealing additional value from big data (BD).
• It is defined as “the process of examining large amounts of
data, from a variety of data sources and in different formats,
to deliver insights that can enable decisions in real or near
real time”.
• BDA is a different concept from those of Data Warehouse
(DW) or Business Intelligence (BI) systems.
Introduction
• The complexity of BD systems required the development of
a specialized architecture.
• Nowadays, the most commonly used BD architecture is
Hadoop.
• It has redefined data management because it processes large
amounts of data in a timely manner and at a low cost.
Introduction
• Big data challenges include capturing data, data storage, data
analysis, search, sharing, transfer, visualization, querying,
updating, information privacy and data source.
• Big data was originally associated with three key concepts:
volume, variety, and velocity.
Challenges with Big Data
Challenges
1) Capture
2) Storage
3) Duration
4) Search
5) Analysis
6) Transfer
7) Visualization
8) Privacy violations
Dealing with data growth
Data today is growing at an exponential rate.
Most of the data that we have today has been generated
in the last 2-3 years.
Generating insights in a timely manner
Infrastructure for big data as far as cost-efficiency,
elasticity, and easy upgrading/downgrading is concerned.
Recruiting and retaining big data talent
Another challenge is to decide on the period of
retention of big data. Just how long should one retain this
data? A tricky question indeed, as some data is useful for
making long-term decisions.
Integrating disparate data sources
There is a dearth of skilled professionals who
possess a high level of proficiency in data science that is
vital in implementing big data solutions.
Validating data
Data changes are highly dynamic, and therefore
there is a need to ingest data as quickly as possible.
Visualization
Data visualization is becoming popular as a separate
discipline, but there is a considerable shortage of business
visualization experts.
• Introduction
The information technology sector terms the exponentially growing amounts of
data generated in today's interconnected world 'Big Data.'
Big data comes in many forms, from meteorological and astronomical
calculations and mappings to social media networks and photo-sharing
networks.
Retailers, government agencies, healthcare providers and insurers,
financial institutions and other organizations collect large amounts of data on
every transaction, doctor's office visit or purchase to improve the functions or
processes in which they are involved.
How Big Data Impacts IT
• The Effect of Big Data on Information Technology
Employment
New data and document control systems, software, and
infrastructure to move, process and store this information are being
developed as we speak as older systems are becoming obsolete.
Indeed, the amount of data we are generating is growing at an
exponential rate. Some of the resulting effects include:
• Employment boom for specialists and IT professionals
• Shortage of IT workers in the US with specific skills to handle large
pools of data
• A developed need for employer-sponsored training programs
• Call for the government to issue visas to foreign workers in the US
• More data reliant companies in the marketplace as technology evolves
• New specialty job positions emerging in the healthcare IT sector
• Special higher education programs being developed to meet future
demand in Healthcare Informatics
• While the visa issuance debate rages on, IT and Healthcare IT
recruitment companies like Talascend are helping customers find
best-fit talent and candidates find best-fit IT jobs in
retail, financial, healthcare, software, insurance, manufacturing and
other technology markets to handle these effects.
• Volume: The amount of data collected from various sources,
including e-business transactions (PayPal, Paytm, Airtel Money,
etc.), social media (Facebook, Twitter, WhatsApp), sensors
(weather monitoring, space sensors) and machine-to-machine
data (networking, IoT), generated by millions of users around the world.
Hadoop provides useful tools for studying such massive data.
• Velocity: The massive data must be handled at unprecedented speed
and within time constraints. Devices such as smart sensors and
metering devices are connected in parallel and deliver data in
real time.
3 V’s of Big Data
• Variety: Data comes in many formats, but mainly as
structured data (numeric data in traditional databases) and
unstructured data (like stock ticker data, email, financial
transactions, audio, video and text documents).
• Variability: Data flows can be inconsistent, arriving at high velocity and
in many varieties. They must be processed without losing
information while managing peak loads; for example, social
media traffic increases in the morning and evening.
3 V’s of Big Data
• Complexity: Data coming from a variety of sources is
difficult to link, cleanse, match and transfer.
3 V’s of Big Data
Digital data
Structured data
Semi-structured data
Unstructured data
Types of Digital Data
Structured Data
• This is data which is in an organized form and can be easily
used by a computer program.
• Relationships exist between entities of data, such as classes and
their objects.
• When data conforms to a pre-defined schema/structure, we say
it is structured data.
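As a rough illustration of "conforming to a pre-defined schema", the sketch below checks a record against a fixed set of typed fields. The schema, field names and sample values are hypothetical and chosen only for the example.

```python
# Hypothetical pre-defined schema: field name -> expected Python type.
SCHEMA = {"id": int, "name": str, "salary": float}

def conforms(record, schema=SCHEMA):
    """True if the record has exactly the schema's fields with the expected types."""
    return (record.keys() == schema.keys()
            and all(isinstance(record[field], kind) for field, kind in schema.items()))

good = {"id": 1, "name": "Asha", "salary": 100.0}
missing_field = {"id": 1, "name": "Asha"}
wrong_type = {"id": "1", "name": "Asha", "salary": 100.0}
```

Only the first record would count as structured data under this schema; the other two would be rejected by a system that enforces it.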
Sources of Structured Data
Structured data
Databases such as Oracle, DB2, Teradata, MySQL, etc.
Spreadsheets
OLTP systems
Semi Structured Data
• Semi-structured data is also referred to as having a self-describing
structure.
I. It does not conform to the data models that one
typically associates with relational databases or any
other form of data tables.
II. It uses tags to segregate semantic elements.
Sources of Semi Structured Data
Semi-structured data
XML
Other markup languages
JSON
Characteristics of Semi Structured Data
Semi-structured data
Inconsistent structure
Self-describing
Other schema information
Data objects may have
different attributes
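A small sketch may help make "self-describing" and "different attributes" concrete. It parses two JSON records that carry their own field names and do not share the same attributes. The records and field names are made up for illustration.

```python
import json

# Two hypothetical JSON records: same kind of object, different attributes.
raw_records = [
    '{"name": "Asha", "email": "asha@example.com"}',
    '{"name": "Ravi", "phone": "555-0100", "tags": ["student"]}',
]

records = [json.loads(line) for line in raw_records]

# Each record carries its own field names (self-describing), so the
# full set of attributes is discovered by inspecting the data itself.
all_fields = sorted({key for record in records for key in record})

def to_row(record, fields):
    """Flatten one record to a fixed column order, using None for missing fields."""
    return [record.get(field) for field in fields]

rows = [to_row(record, all_fields) for record in records]
```

Forcing such records into a table leaves gaps (the `None` values), which is one reason semi-structured data does not fit a relational model neatly.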
Unstructured data
• Unstructured data does not conform to any
pre-defined data model.
Dealing with Unstructured Data
Dealing with unstructured data
Data mining
Natural Language Processing (NLP)
Text analysis
Noisy text analysis
Apache Cassandra is an open source, distributed and decentralized
storage system for managing very large amounts of structured data.
It provides highly available service with no single point of failure.
It is scalable, fault-tolerant, and consistent.
It is a column-oriented database.
Its distribution design is based on Amazon’s Dynamo and its data model on
Google’s Bigtable.
Cassandra
• Cassandra implements a Dynamo-style replication model
with no single point of failure, but adds a more powerful
“column family” data model.
Cassandra is being used by some of the biggest
companies such as Facebook, Twitter, Cisco, Rackspace,
eBay, Netflix, and more.
The following are some of the features of Cassandra:
Elastic scalability − Cassandra is highly scalable; it allows you to add more hardware to
accommodate more customers and more data as per requirement.
Always-on architecture − Cassandra has no single point of failure and it is continuously
available for business-critical applications that cannot afford a failure.
Fast linear-scale performance − Cassandra is linearly scalable, i.e., it increases your
throughput as you increase the number of nodes in the cluster. Therefore it maintains a
quick response time.
Features of Cassandra
Flexible data storage − Cassandra accommodates all possible data formats including:
structured, semi-structured, and unstructured. It can dynamically accommodate changes
to your data structures according to your need.
Easy data distribution − Cassandra provides the flexibility to distribute data where you
need by replicating data across multiple datacenters.
Transaction support − Cassandra provides atomicity, isolation, and durability for writes
within a single partition, with tunable consistency; it does not offer full relational-style
ACID transactions.
Fast writes − Cassandra was designed to run on cheap commodity hardware. It performs
blazingly fast writes and can store hundreds of terabytes of data, without sacrificing the
read efficiency.
Cassandra has a peer-to-peer distributed system across its nodes, and data is distributed
among all the nodes in a cluster.
All the nodes in a cluster play the same role. Each node is independent and at the same
time interconnected to other nodes.
Each node in a cluster can accept read and write requests, regardless of where the data is
actually located in the cluster.
When a node goes down, read/write requests can be served from other nodes in the
network.
Cassandra Architecture
Data Replication in Cassandra
In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of
data.
If it is detected that some of the nodes responded with an out-of-date value, Cassandra
will return the most recent value to the client.
After returning the most recent value, Cassandra performs a read repair in the
background to update the stale values.
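The read-repair idea can be sketched in a few lines. This is a simplified illustration, not Cassandra's implementation: replicas are modelled as plain dictionaries, the `Versioned` type and `read_with_repair` function are hypothetical names, and the repair runs inline rather than in the background.

```python
from dataclasses import dataclass

@dataclass
class Versioned:
    value: str
    timestamp: int  # larger means more recent

def read_with_repair(replicas, key):
    """Return the newest value for key and overwrite stale replica copies.

    `replicas` is a list of dicts, one per replica node (a stand-in for
    real nodes, not Cassandra's API).
    """
    responses = [replica[key] for replica in replicas if key in replica]
    if not responses:
        return None
    newest = max(responses, key=lambda v: v.timestamp)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.timestamp < newest.timestamp:
            replica[key] = newest  # the "repair" step
    return newest.value
```

The client always sees the value with the latest timestamp, and any replica that answered with an older copy is brought up to date as a side effect of the read.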
Components of Cassandra
The key components of Cassandra are as follows
1. Node
2. Datacenter
3. Cluster
4. Commit log
5. Mem-table
6. SSTable
7. Bloom filter
Node − It is the place where data is stored.
Datacenter − It is a collection of related nodes.
Cluster − A cluster is a component that contains one or more datacenters.
Commit log − The commit log is a crash-recovery mechanism in Cassandra. Every write
operation is written to the commit log.
Mem-table − A mem-table is a memory-resident data structure. After the commit log, the
data will be written to the mem-table. Sometimes, for a single column family, there will
be multiple mem-tables.
SSTable − It is a disk file to which the data is flushed from the mem-table when its
contents reach a threshold value.
Bloom filter − These are nothing but quick, nondeterministic algorithms for testing
whether an element is a member of a set. It is a special kind of cache. Bloom filters are
accessed after every query.
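To show what "testing whether an element is a member of a set" looks like in practice, here is a minimal Bloom filter sketch. The class name, bit-array size and the use of salted SHA-256 are illustrative assumptions, not the structure Cassandra actually uses.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))
```

A `False` answer lets a reader skip a file entirely, while a `True` answer can occasionally be a false positive, so the file still has to be checked.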
Users can access Cassandra through its nodes using Cassandra Query Language (CQL).
CQL treats the database (Keyspace) as a container of tables.
Write Operations
Every write activity of nodes is captured by the commit logs written in the nodes.
Captured data are stored in the mem-table. Whenever the mem-table is full, data will be
written into the SSTable data file. All writes are automatically partitioned and replicated
throughout the cluster.
Read Operations
During read operations, Cassandra gets values from the mem-table and checks the bloom
filter to find the appropriate SSTable that holds the required data.
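The write and read paths described above can be mimicked with a toy single-node model. This is only a sketch under simplifying assumptions: `ToyNode` and `memtable_limit` are hypothetical names, SSTables are plain dictionaries instead of disk files, and a dictionary lookup stands in for the Bloom filter check.

```python
class ToyNode:
    """Toy model of one node's write and read path (not Cassandra's real code)."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []      # append-only record of every write
        self.memtable = {}        # in-memory data
        self.sstables = []        # immutable "files" flushed from the memtable
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log for crash recovery
        self.memtable[key] = value             # 2. store in memory
        if len(self.memtable) >= self.memtable_limit:
            self.sstables.append(dict(self.memtable))  # 3. flush when full
            self.memtable.clear()

    def read(self, key):
        if key in self.memtable:               # newest data first
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            # A real node would consult a Bloom filter before touching
            # the file; a dict lookup stands in for both steps here.
            if key in sstable:
                return sstable[key]
        return None
```

Writing `a`, then `b`, then `a` again with a limit of two leaves one flushed SSTable and the latest `a` in the memtable, so a read of `a` returns the newest value.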
Cassandra Query Language
Cassandra - Data Model
The data model of Cassandra is significantly different from an RDBMS.
Cluster
The Cassandra database is distributed over several machines that operate together. The
outermost container is known as the Cluster.
For failure handling, every node contains a replica, and in case of a failure, the replica
takes charge.
Cassandra arranges the nodes in a cluster in a ring format, and assigns data to them.
Keyspace
Keyspace is the outermost container for data in Cassandra. The basic attributes of a
Keyspace in Cassandra are
1. Replication factor
2. Replica placement strategy
3. Column families
Replication factor − It is the number of machines in the cluster that will receive copies of
the same data.
Replica placement strategy − It is nothing but the strategy to place replicas in
the ring.
Strategies
1. Simple strategy (rack-unaware strategy),
2. Old network topology strategy (rack-aware strategy) and
3. Network topology strategy (datacenter-shared strategy).
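A simplified sketch of placing replicas on a ring, in the spirit of the simple strategy: hash the key to a position, then walk clockwise and take the next nodes until the replication factor is met. The function names, the MD5-based token and the node names are assumptions made for illustration; they do not reproduce Cassandra's partitioner.

```python
import hashlib
from bisect import bisect_right

def token(name, ring_size=2**32):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % ring_size

def replicas_for(key, nodes, replication_factor):
    """Pick `replication_factor` distinct nodes, walking clockwise from the key."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, replication_factor=3)
```

With a replication factor of 3 and four nodes, every key ends up on three distinct nodes, and the same key always maps to the same three.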
Column families − Keyspace is a container for a list of one or more column families. A
column family, in turn, is a container of a collection of rows. Each row contains ordered
columns. Column families represent the structure of your data. Each keyspace has at least
one and often many column families.
Syntax
The syntax of creating a Keyspace is as follows −
CREATE KEYSPACE keyspace_name WITH replication = {'class': 'SimpleStrategy',
'replication_factor' : 3};
Schematic view of a Keyspace.
Column Family
A column family is a container for an ordered collection of rows. Each row, in turn, is an
ordered collection of columns.
A Cassandra column family has the following attributes −
keys_cached − It represents the number of locations to keep cached per
SSTable.
rows_cached − It represents the number of rows whose entire contents will be
cached in memory.
preload_row_cache − It specifies whether you want to pre-populate the row
cache.
Example
Column
A column is the basic data structure of Cassandra with three values, namely key or
column name, value, and a time stamp. Given below is the structure of a column.
Super Column
A super column is a special column; therefore, it is also a key-value pair. But a super
column stores a map of sub-columns.
Generally column families are stored on disk in individual files.
Structure of a super column
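One way to picture these structures is as nested maps. The sketch below models a column as a (name, value, timestamp) triple, a column family as a map from row key to columns, and a super column as a name with a map of sub-columns. All names and sample values are hypothetical.

```python
from dataclasses import dataclass
import time

@dataclass
class Column:
    name: str
    value: str
    timestamp: float

def make_column(name, value):
    """Build a column stamped with the current time."""
    return Column(name, value, time.time())

# Column family: row key -> {column name -> Column}
users = {
    "row-1": {
        "name": make_column("name", "Asha"),
        "city": make_column("city", "Thanjavur"),
    },
}

# Super column: a name mapped to a map of sub-columns.
address_super_column = {
    "name": "address",
    "sub_columns": {
        "street": make_column("street", "1 Main Road"),
        "area": make_column("area", "Central"),
    },
}

def ordered_column_names(row):
    """Columns within a row are kept in sorted order by name."""
    return sorted(row)
```

The timestamp on each column is what allows the most recent write to win when replicas disagree.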
Tools - SQOOP
• When Big Data storage systems and analyzers such as MapReduce,
Hive, HBase, Cassandra, Pig, etc. of the Hadoop ecosystem
came into the picture,
they required a tool to interact with relational database
servers for importing and exporting the Big Data residing
in them.
• Sqoop occupies a place in the Hadoop ecosystem to
provide feasible interaction between relational database
servers and Hadoop’s HDFS.
Introduction
SQOOP - DEFINITION
• Sqoop: “SQL to Hadoop and Hadoop to SQL”.
• Tool to transfer data between Hadoop and relational databases such as
Teradata, MySQL, PostgreSQL, Oracle, Netezza.
• It is provided by the Apache Software Foundation.
ARCHITECTURE OF SQOOP
WORKING OF SQOOP
SQOOP IMPORT
• The import tool imports individual tables from RDBMS to
HDFS.
• Each row in a table is treated as a record in HDFS.
• All records are stored as text data in text files or as binary
data in Avro and Sequence files.
SQOOP EXPORT
• The export tool exports a set of files from HDFS back to an
RDBMS.
• The files given as input to Sqoop contain records, which are
called rows in the table.
• Those are read and parsed into a set of records and delimited
with a user-specified delimiter.
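The record handling in both directions can be pictured with a short sketch: on import each table row becomes one delimited text record, and on export each record is parsed back into fields using the chosen delimiter. This is a plain-Python illustration with hypothetical function names and sample rows; it does not call Sqoop and ignores escaping of delimiters inside values.

```python
def rows_to_text(rows, delimiter=","):
    """Import direction: one table row becomes one delimited text record."""
    return [delimiter.join(str(field) for field in row) for row in rows]

def text_to_rows(lines, delimiter=","):
    """Export direction: parse each record back into fields using the delimiter."""
    return [line.split(delimiter) for line in lines]

table = [(1, "Asha", "CA"), (2, "Ravi", "IT")]
records = rows_to_text(table, delimiter="|")
parsed = text_to_rows(records, delimiter="|")
```

Note that the parsed fields come back as strings, so a real export has to map them onto the column types of the target table.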
FEATURES OF SQOOP
o Full Load.
o Incremental Load.
o Parallel import/export.
o Import results of SQL query.
o Compression.
o Connectors for all major RDBMS Databases.
o Kerberos Security Integration.
ADVANTAGES OF SQOOP
• Allows the transfer of data with a variety of structured data
stores like Postgres, Oracle, Teradata, and so on.
• Sqoop can execute the data transfer in parallel, so
execution can be quick and more cost effective.
• Helps to integrate with sequential data from the
mainframe.
DISADVANTAGES OF SQOOP
• It uses a JDBC connection to connect with RDBMS-based
data stores, and this can be inefficient and less
performant.
• For performing analysis, it executes various map-reduce
jobs and, at times, this can be time consuming when
there are a lot of joins if the data is in a denormalized
fashion.
Introduction
• Hive is a data warehouse infrastructure tool to process
structured data in Hadoop. It resides on top of Hadoop to
summarize Big Data, and makes querying and analyzing easy.
• Initially Hive was developed by Facebook, later the Apache
Software Foundation took it up and developed it further as an
open source under the name Apache Hive. It is used by different
companies. For example, Amazon uses it in Amazon Elastic
Map Reduce.
• Hive is not a relational database, a design for Online
Transaction Processing (OLTP), or a language for real-time queries
and row-level updates.
HIVE
Features of Hive
• It stores the schema in a database and the processed data in HDFS.
• It is designed for OLAP. It provides an SQL-type language for
querying called HiveQL or HQL.
• It is familiar, fast, scalable, and extensible.
Architecture of Hive
Working of Hive
• The following diagram depicts the workflow between Hive and
Hadoop.
• A social network is a structure between actors, mostly individuals or
organizations.
• It indicates the ways in which they are connected through various
social familiarities, ranging from casual acquaintance to close
familial bonds.
Social Network
Society as a Graph
• People are represented as nodes.
• Relationships are represented as edges: relationships may be
acquaintanceship, friendship, co-authorship, etc.
• Allows analysis using tools of mathematical graph theory.
Social Network Analysis
Social network analysis (SNA) is the mapping and measuring of
relationships and flows between people, groups, organizations,
computers or other information/knowledge processing entities.
Connections
Size
Number of nodes.
Density
Number of ties that are present divided by the number of ties that
could be present.
Out-degree
Sum of connections from an actor to others.
In-degree
Sum of connections to an actor from others.
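These connection measures are easy to compute on a small directed network. The sketch below uses a made-up list of ties between three actors and assumes self-ties are not allowed, so n actors can have at most n × (n − 1) directed ties.

```python
# Directed ties as (source, target) pairs; the names are made up.
ties = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"), ("cat", "ann")]

actors = sorted({actor for tie in ties for actor in tie})
size = len(actors)

# In a directed network without self-ties, n * (n - 1) ties are possible.
possible_ties = size * (size - 1)
density = len(set(ties)) / possible_ties

out_degree = {a: sum(1 for s, _ in ties if s == a) for a in actors}
in_degree = {a: sum(1 for _, t in ties if t == a) for a in actors}
```

Here four of the six possible ties are present, so the density is 4/6, and `ann` sends two ties while receiving one.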
Distance
Walk
A sequence of actors and relations that begins and ends
with actors.
Geodesic distance
The number of relations in the shortest possible walk from
one actor to another.
Maximum flow
The number of different actors in the neighbourhood of a
source that lead to pathways to a target.
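Geodesic distance can be found with a breadth-first search, which explores actors in order of increasing number of ties from the source. The sketch below assumes the network is stored as an adjacency list; the function name and the sample graph are hypothetical.

```python
from collections import deque

def geodesic_distance(graph, source, target):
    """Number of ties on the shortest walk from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        actor, dist = queue.popleft()
        for neighbour in graph.get(actor, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

graph = {
    "ann": ["bob"],
    "bob": ["ann", "cat"],
    "cat": ["bob", "dan"],
    "dan": ["cat"],
    "eve": [],
}
```

In this graph `ann` reaches `dan` in three ties, while `eve` has no ties and is unreachable, so the function returns `None` for her.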
Some measures of power and prestige
Degree
Sum of connections from or to an actor.
Closeness centrality
Distance of one actor to all others in the network.
Betweenness centrality
Number that represents how frequently an actor lies on
the geodesic paths between other actors.
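As a rough sketch of the last two measures, the code below computes closeness as the number of reachable actors divided by the total geodesic distance to them, and betweenness as the share of geodesic paths between other pairs that pass through an actor. It assumes an undirected network stored as an adjacency list; the function names and the four-actor chain are illustrative, and the brute-force approach suits only small graphs.

```python
from collections import deque
from itertools import permutations

def bfs(graph, source):
    """Return (distances, shortest-path counts) from source to reachable actors."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for neighbour in graph[actor]:
            if neighbour not in dist:
                dist[neighbour] = dist[actor] + 1
                paths[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[actor] + 1:
                paths[neighbour] += paths[actor]
    return dist, paths

def closeness(graph, actor):
    dist, _ = bfs(graph, actor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def betweenness(graph, actor):
    info = {a: bfs(graph, a) for a in graph}
    score = 0.0
    for s, t in permutations(graph, 2):
        if actor in (s, t) or t not in info[s][0]:
            continue
        d_st, n_st = info[s][0][t], info[s][1][t]
        d_sv = info[s][0].get(actor)
        d_vt = info[actor][0].get(t)
        if d_sv is not None and d_vt is not None and d_sv + d_vt == d_st:
            score += info[s][1][actor] * info[actor][1][t] / n_st
    return score / 2  # each unordered pair was visited in both directions

chain = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob", "dan"], "dan": ["cat"]}
```

In the chain ann, bob, cat, dan, the two middle actors sit on every path between the ends, so they score highest on both measures.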
Social network analysis : what for?
To control information flow
To improve/stimulate communication
To improve network resilience
To trust
Network Model Example
Centrality : strategic positions
Social Network Model
Tie strength
Social Network Formations
Community identification and marketing:
1. seasonal workers
2. SMEs
3. students
4. school children
Customer lifestyle analysis:
Analysis based on identifying critical life stage events
using social network changes
1. going to university
2. moving
3. changing job
4. starting a relationship - moving as a couple
5. imputing demographics
BIG DATA & IOT
• Big data is more into collecting and accumulating huge data for analysis
afterward, whereas IoT is about simultaneously collecting and
processing data to make real-time decisions.
• The internet of things, or IoT, is a system of interrelated computing devices,
mechanical and digital machines, objects, animals or people that are
provided with unique identifiers (UIDs) and the ability to transfer data over a
network without requiring human-to-human or human-to-computer
interaction.
How Big Data Powers the Internet of Things
• The Internet of Things (IoT) may sound like a futuristic term, but it’s
already here and increasingly woven into our everyday lives. The concept is
simpler than you may think: If you have a smart TV, fridge, doorbell, or any
other connected device, that’s part of the IoT.
Example 1: The region’s most popular theme park has released its own app.
It does more than just provide a map, schedule, and menu items (though
those are important); it also uses GPS pings to identify app users in line, thus
being able to display predicted wait times for rides based on density, even
being able to reserve a spot or trigger attractions based on proximity.
The Connection Between Big Data and IoT
• A company’s devices are installed to use sensors for collecting and transmitting data.
• That big data (sometimes petabytes of data) is then collected, often in a repository
called a data lake. Both structured data from prepared data sources (user profiles,
transactional information, etc.) and unstructured data from other sources (social media
archives, emails and call center notes, security camera images, licensed data, etc.) reside in
the data lake.
• Reports, charts, and other outputs are generated, sometimes by AI-driven analytics
platforms such as Oracle Analytics.
• User devices provide further metrics through settings, preferences, scheduling, metadata,
and other tangible transmissions, feeding back into the data lake for even heavier volumes
of big data.
What is the Internet of Things
• According to the Global Standards Initiative on the Internet of Things
(IoT-GSI), The Internet of Things is defined as the ‘infrastructure of
the information society’. Well, simply put, it is the interconnection
and the internetworking of devices, vehicles and various other
embedded components which are collectively used to gather data and
also analyze them in real time.
How Does IoT help
• IoT can help you manage your home in a more effective way. It helps you to
keep a check on your home from a remote location.
• IoT can help in better environment monitoring by analyzing the air and the
water quality.
• IoT can help media companies to understand the behaviour of their audience
better and develop more effective content targeted towards a specific niche.
IoT Enablers
• RFID: uses radio waves to electronically track the tags attached to
each physical object.
• Sensors: devices that are able to detect changes in an environment (ex:
motion detectors).
• Nanotechnology: as the name suggests, these are extremely small devices
with dimensions usually less than a hundred nanometers.
• Smart networks: (ex: mesh topology).
Applications and domains
• Application Domains:
IoT is currently found in four different popular domains:
• 1) Manufacturing/Industrial business - 40.2%
• 2) Healthcare - 30.3%
• 3) Security - 7.7%
• 4) Retail - 8.3%
Modern Applications for IoT
• Smart Grids
• Smart cities
• Smart homes
• Healthcare
• Earthquake detection
• Radiation detection/hazardous gas detection
• Smartphone detection
• Water flow monitoring
Big data platforms and IOT
• Context-Aware Infrastructures for the Internet of Things
• A Study on Opportunistic Data Dissemination Support for the Internet of
Things
• Future Trends and Research Directions in Big Data Platforms for the Internet
of Things
How does IoT contribute to big data
• IoT connects things to the internet using sensors, and the
resulting data is stored and used for analysis and monitoring.
• Cloud computing helps to store and access the data without requiring
large investments in systems and software.
• So the combination of both technologies can save both time and money.
IoT and Big data are working together
• There are many examples of big data and IoT working well together to offer
analysis and insight. One such example is represented by shipping organizations.
They have been utilizing big data analytics and sensor data to improve efficiency,
save money and lower their environmental impact. They utilize sensors on their
delivery vehicles in order to monitor engine health, number of stops, mileage,
miles per gallon, and speed.
• IoT and big data are creating waves in big agriculture. In this area, field-connected
systems monitor moisture levels and transmit this data to
farmers over a wireless connection. This data will enable farmers to find out
when crops are reaching the optimum moisture levels.
Big Data Technologies
• Data storage
• Data mining
• Data analytics
• Data visualisation
Big Data Management Technologies
Now let us deal with the technologies
falling under each of these categories with
their facts and capabilities, along with the
companies which are using them.
Data Storage
• Hadoop framework was
designed to store and process
data in a distributed data
processing environment with
commodity hardware with a
simple programming model.
• It can store and analyse the
data present in different
machines with high speeds and
low costs.
Data Mining
• Presto is an open source
distributed SQL query engine
for running interactive analytic
queries against data sources
of all sizes ranging from
gigabytes to petabytes.
• Presto allows querying data in
Hive, Cassandra, relational
databases and proprietary
data stores.
Data Analytics
• Apache Kafka is a distributed
streaming platform. A
streaming platform has three
key capabilities that are as
follows:
. Publish and subscribe to streams of records
. Store streams of records durably
. Process streams of records as they occur
Data Visualisation
• Tableau is a powerful and
fast-growing data
visualisation tool used in the
business intelligence
industry.
• Data analysis is very fast
with Tableau, and the
visualisations created are in
the form of dashboards and
worksheets.
Thank You.
More Related Content

What's hot

big data Big Things
big data Big Thingsbig data Big Things
big data Big Thingspateelhs
 
Identifying and analyzing the transient and permanent barriers for big data
Identifying and analyzing the transient and permanent barriers for big dataIdentifying and analyzing the transient and permanent barriers for big data
Identifying and analyzing the transient and permanent barriers for big datasarfraznawaz
 
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & ChallengesRupen Momaya
 
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWUSING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWNellore Harilakshmi
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its ChallengesKathirvel Ayyaswamy
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
IRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth EnhancementIRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth EnhancementIRJET Journal
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesIJRESJOURNAL
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3varshakumar21
 
Issues, challenges, and solutions
Issues, challenges, and solutionsIssues, challenges, and solutions
Issues, challenges, and solutionscsandit
 
Data Warehouse: A Primer
Data Warehouse: A PrimerData Warehouse: A Primer
Data Warehouse: A PrimerIJRTEMJOURNAL
 
Introduction to-data-mining chapter 1
Introduction to-data-mining  chapter 1Introduction to-data-mining  chapter 1
Introduction to-data-mining chapter 1Mahmoud Alfarra
 

What's hot (20)

big data Big Things
big data Big Thingsbig data Big Things
big data Big Things
 
Identifying and analyzing the transient and permanent barriers for big data
Identifying and analyzing the transient and permanent barriers for big dataIdentifying and analyzing the transient and permanent barriers for big data
Identifying and analyzing the transient and permanent barriers for big data
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
 
Big Data - Insights & Challenges
Big Data - Insights & ChallengesBig Data - Insights & Challenges
Big Data - Insights & Challenges
 
Datamining
DataminingDatamining
Datamining
 
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEWUSING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
USING BIGDATA WITH ACADEMIC LIBRARY SERVICES: A VIEW
 
Data mining
Data miningData mining
Data mining
 
Research issues in the big data and its Challenges
Research issues in the big data and its ChallengesResearch issues in the big data and its Challenges
Research issues in the big data and its Challenges
 
Data Science
Data ScienceData Science
Data Science
 
Big data
Big dataBig data
Big data
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
IRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth EnhancementIRJET- Big Data Management and Growth Enhancement
IRJET- Big Data Management and Growth Enhancement
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Data Management
Data ManagementData Management
Data Management
 
Moving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and PerspectivesMoving Toward Big Data: Challenges, Trends and Perspectives
Moving Toward Big Data: Challenges, Trends and Perspectives
 
Big data and oracle
Big data and oracleBig data and oracle
Big data and oracle
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
 
Issues, challenges, and solutions
Issues, challenges, and solutionsIssues, challenges, and solutions
Issues, challenges, and solutions
 
Data Warehouse: A Primer
Data Warehouse: A PrimerData Warehouse: A Primer
Data Warehouse: A Primer
 
Introduction to-data-mining chapter 1
Introduction to-data-mining  chapter 1Introduction to-data-mining  chapter 1
Introduction to-data-mining chapter 1
 

Similar to Big Data Analytics and Its Impact

Similar to Big Data Analytics and Its Impact (20)

Big-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdfBig-Data-Analytics.8592259.powerpoint.pdf
Big-Data-Analytics.8592259.powerpoint.pdf
 
All About Big Data
All About Big Data All About Big Data
All About Big Data
 
Big data
Big dataBig data
Big data
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
big_data.ppt
big_data.pptbig_data.ppt
big_data.ppt
 
1 UNIT-DSP.pptx
1 UNIT-DSP.pptx1 UNIT-DSP.pptx
1 UNIT-DSP.pptx
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdfACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
ACCOUNTING-IT-APP-MIdterm Topic-Bigdata.pdf
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1Big Data Analytics Materials, Chapter: 1
Big Data Analytics Materials, Chapter: 1
 
Research paper on big data and hadoop
Research paper on big data and hadoopResearch paper on big data and hadoop
Research paper on big data and hadoop
 
Thilga
ThilgaThilga
Thilga
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Big data
Big dataBig data
Big data
 
Data mining with big data
Data mining with big dataData mining with big data
Data mining with big data
 
Big Data for Library Services (2017)
Big Data for Library Services (2017)Big Data for Library Services (2017)
Big Data for Library Services (2017)
 





Big Data Analytics and Its Impact

  • 1. Big Data and Analytics Name of the Staff : M.FLORENCE DAYANA Head, Dept. of CA Bon Secours College for Women Thanjavur.
  • 3. Introduction • Big data analytics (BDA) is a new approach in information management that provides a set of capabilities for revealing additional value from big data (BD). • It is defined as “the process of examining large amounts of data, from a variety of data sources and in different formats, to deliver insights that can enable decisions in real or near real time”. • BDA is a different concept from Data Warehouse (DW) and Business Intelligence (BI) systems.
  • 4. Introduction • The complexity of BD systems required the development of a specialized architecture. • Nowadays, the most commonly used BD architecture is Hadoop. • It has redefined data management because it processes large amounts of data in a timely manner and at low cost.
  • 6. Challenges with Big Data • Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. • Big data was originally associated with three key concepts: volume, variety, and velocity.
  • 7. Challenges 1) Capture 2) Storage 3) Duration 4) Search 5) Analysis 6) Transfer 7) Visualization 8) Privacy violations
  • 8. Dealing with data growth: Data today is growing at an exponential rate. Most of the data that we have today has been generated in the last 2-3 years. Generating insights in a timely manner: Infrastructure for big data, as far as cost-efficiency, elasticity, and easy upgrading/downgrading are concerned.
  • 9. Recruiting and retaining big data talent: The other challenge is to decide on the period of retention of big data. Just how long should one retain this data? A tricky question indeed, as some data is useful for making long-term decisions.
  • 10. Integrating disparate data sources: There is a dearth of skilled professionals who possess the high level of proficiency in data science that is vital in implementing big data solutions. Validating data: Data changes are highly dynamic, and therefore there is a need to ingest data as quickly as possible.
  • 11. Visualization: Data visualization is becoming popular as a separate discipline. There is a considerable shortage of business visualization experts.
  • 12. How Big Data Impacts IT • Introduction: The information technology sector terms the exponentially growing amounts of data generated in today's interconnected world 'Big Data.' Big data comes in many forms, from meteorological and astronomical calculations and mappings to social media networks and photo-sharing networks. Retailers, government agencies, healthcare providers and insurers, financial institutions and other organizations collect large amounts of data on every transaction, doctor's office visit or purchase to improve the functions or processes in which they are involved.
  • 13. • The Effect of Big Data on Information Technology Employment New data and document control systems, software, and infrastructure to move, process and store this information are being developed as we speak as older systems are becoming obsolete. Indeed, the amount of data we are generating is growing at an exponential rate. Some of the resulting effects include: • Employment boom for specialists and IT professionals • Shortage of IT workers in US with specific skills to handle large pools of data • A developed need for employer-sponsored training programs • Call for the government to issue visas to foreign workers in US
  • 14. • More data reliant companies in the marketplace as technology evolves • New specialty job positions emerging in the healthcare IT sector • Special higher education programs being developed to meet future demand in Healthcare Informatics • While the visa issuance debate rages on, IT and Healthcare IT recruitment companies like Talascend are helping customers find the best-fit talent for customers and best-fit IT jobs for candidates in retail, financial, healthcare, software, insurance, manufacturing and other technology markets to handle these effects.
  • 21. 3 V’s of Big Data • Volume: The amount of data collected from various sources, including e-business transactions (PayPal, Paytm, Airtel Money, etc.), social media (Facebook, Twitter, WhatsApp), sensors (weather monitoring, space sensors) and machine-to-machine data (networking, IoT), generated by millions of users around the world. Hadoop provides useful tools for studying such massive data. • Velocity: The massive stored data must be processed at unprecedented speed and within time constraints. In addition, devices should be connected in parallel with smart sensors and metering devices in real-time processing to keep the data transparent.
  • 22. 3 V’s of Big Data • Variety: Data comes in many formats, but mainly as structured data (numeric data in traditional databases) and unstructured data (such as stock ticker data, email, financial transactions, audio, video and text documents). • Variability: The data set is inconsistent at high velocity and in its variety, and it must be processed without losing information while handling peak loads; for example, social media data demand increases in the morning and evening.
  • 23. 3 V’s of Big Data • Complexity: Data coming from a variety of sources is difficult to link, cleanse, match and transfer.
  • 24. Types of Digital Data • Digital data: structured data, semi-structured data, unstructured data
  • 25. Structured Data • This is data that is in an organized form and can be easily used by a computer program. • Relationships exist between entities of data, such as classes and their objects. • When data conforms to a pre-defined schema/structure, we say it is structured data.
  • 26. Sources of Structured Data • Databases such as Oracle, DB2, Teradata, MySQL, etc. • Spreadsheets • OLTP systems
  • 27. Semi Structured Data • Semi-structured data is also referred to as having a self-describing structure. I. It does not conform to the data models that one typically associates with relational databases or any other form of data tables. II. It uses tags to segregate semantic elements.
  • 28. Sources of Semi Structured Data • XML • Other markup languages • JSON
  • 29. Characteristics of Semi Structured Data • Inconsistent structure • Self-describing • Other schema information • Data objects may have different attributes
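The characteristics on the slide above can be seen in a small JSON example. This is only an illustrative sketch: the two records and the helper `attribute_sets` are made up for this note and are not part of the original slides. Both records are self-describing (every value carries its own label), yet they do not share the same attributes.

```python
import json

# Two hypothetical records: both are valid JSON, but each has its own attributes.
raw = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "phone": "555-0100", "tags": ["student"]}
]
"""

records = json.loads(raw)


def attribute_sets(items):
    """Return the sorted attribute names found in each record."""
    return [sorted(item.keys()) for item in items]


for attrs in attribute_sets(records):
    print(attrs)
```

A relational table would force both rows into one fixed set of columns; here the structure travels with each record.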
  • 30. Unstructured data • Unstructured data does not conform to any pre-defined data model.
  • 31. Dealing with Unstructured Data • Data mining • Natural Language Processing (NLP) • Text analysis • Noisy text analysis
  • 43. Cassandra • Apache Cassandra is an open source, distributed and decentralized storage system for managing very large amounts of structured data. • It provides highly available service with no single point of failure. • It is scalable and fault-tolerant, and offers tunable consistency. • It is a column-oriented (wide-column) database. • Its distribution design is based on Amazon’s Dynamo and its data model on Google’s Bigtable.
  • 44. • Cassandra implements a Dynamo-style replication model with no single point of failure, but adds a more powerful “column family” data model. • Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay, Netflix, and more.
  • 45. Features of Cassandra • The following are some of the features of Cassandra: • Elastic scalability: Cassandra is highly scalable; it allows you to add more hardware to accommodate more customers and more data as required. • Always-on architecture: Cassandra has no single point of failure and is continuously available for business-critical applications that cannot afford a failure. • Fast linear-scale performance: Cassandra is linearly scalable, i.e., throughput increases as you increase the number of nodes in the cluster. It therefore maintains a quick response time.
  • 46. • Flexible data storage: Cassandra accommodates all possible data formats, including structured, semi-structured, and unstructured. It can dynamically accommodate changes to your data structures according to your needs. • Easy data distribution: Cassandra provides the flexibility to distribute data where you need it by replicating data across multiple data centers. • Transaction support: Cassandra offers atomicity, isolation, and durability for writes to a single partition, together with tunable consistency; it does not provide full ACID transactions in the relational sense. • Fast writes: Cassandra was designed to run on cheap commodity hardware. It performs blazingly fast writes and can store hundreds of terabytes of data without sacrificing read efficiency.
  • 47. Cassandra Architecture • Cassandra has a peer-to-peer distributed system across its nodes, and data is distributed among all the nodes in a cluster. • All the nodes in a cluster play the same role. Each node is independent and at the same time interconnected to other nodes. • Each node in a cluster can accept read and write requests, regardless of where the data is actually located in the cluster. • When a node goes down, read/write requests can be served from other nodes in the network.
  • 48. Data Replication in Cassandra • In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of data. • If it is detected that some of the nodes responded with an out-of-date value, Cassandra will return the most recent value to the client. • After returning the most recent value, Cassandra performs a read repair in the background to update the stale values.
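The read-repair idea on the slide above can be sketched in a few lines. This is a toy model, not Cassandra's implementation: replicas are plain dictionaries, `Cell` and `read_with_repair` are hypothetical names introduced here, and the repair happens synchronously rather than in the background.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    timestamp: int  # write time; a larger number means a newer write


def read_with_repair(replicas, key):
    """Return the newest value for key and overwrite stale replicas with it.

    Assumes at least one replica holds the key.
    """
    responses = [replica[key] for replica in replicas if key in replica]
    newest = max(responses, key=lambda cell: cell.timestamp)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.timestamp < newest.timestamp:
            replica[key] = newest  # the "repair" step for a stale or missing copy
    return newest.value


# Three toy replicas: one stale, one current, one missing the key entirely.
replicas = [{"k": Cell("old", 1)}, {"k": Cell("new", 2)}, {}]
print(read_with_repair(replicas, "k"))
```

After the call, all three replicas hold the value with the highest timestamp, which is the outcome the slide describes.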
  • 50. Components of Cassandra • The key components of Cassandra are as follows: 1. Node 2. Data center 3. Cluster 4. Commit log 5. Mem-table 6. SSTable 7. Bloom filter • Node: It is the place where data is stored. • Data center: It is a collection of related nodes.
  • 51. • Cluster: A cluster is a component that contains one or more data centers. • Commit log: The commit log is a crash-recovery mechanism in Cassandra. Every write operation is written to the commit log. • Mem-table: A mem-table is a memory-resident data structure. After the commit log, the data is written to the mem-table. Sometimes, for a single column family, there will be multiple mem-tables. • SSTable: It is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value. • Bloom filter: Bloom filters are quick, probabilistic data structures for testing whether an element is a member of a set. They act as a special kind of cache and are consulted on reads to skip SSTables that cannot contain the requested data.
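To make the Bloom filter entry above concrete, here is a minimal sketch of the data structure. The class name, the bit-array size, and the use of SHA-256 to derive positions are assumptions chosen for this illustration; Cassandra's own filter is implemented differently.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers "definitely absent" or "possibly present"."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means the item was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("row-1")
print(bf.might_contain("row-1"))
```

The useful property is the one-sided error: a negative answer is always correct, so a read can safely skip any SSTable whose filter says the key is absent.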
  • 52. Cassandra Query Language • Users can access Cassandra through its nodes using Cassandra Query Language (CQL). CQL treats the database (keyspace) as a container of tables. • Write operations: Every write to a node is first recorded in that node's commit log. The data is then stored in the mem-table. Whenever the mem-table is full, its data is written to an SSTable data file. All writes are automatically partitioned and replicated throughout the cluster. • Read operations: During read operations, Cassandra gets values from the mem-table and checks the Bloom filter to find the appropriate SSTable that holds the required data.
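The write and read paths described above can be sketched with a toy single-node model. Everything here is a simplification made for illustration: `Node`, `memtable_limit`, and the in-memory lists standing in for on-disk files are hypothetical, and the Bloom filter check is omitted.

```python
class Node:
    """Toy model of one node's write path: commit log, mem-table, SSTables."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory data, newest writes
        self.sstables = []     # flushed tables, oldest first (stand-in for disk files)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. record the write for crash recovery
        self.memtable[key] = value              # 2. store it in the mem-table
        if len(self.memtable) >= self.memtable_limit:
            # 3. mem-table is full: flush it as a sorted table and start a new one
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key):
        if key in self.memtable:                # newest data first
            return self.memtable[key]
        for table in reversed(self.sstables):   # then SSTables, newest to oldest
            if key in table:
                return table[key]
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # triggers a flush to the first SSTable
node.write("a", 3)   # newer value stays in the mem-table
print(node.read("a"), node.read("b"))
```

Reading newest-first is what lets the later write to `a` shadow the older value that was already flushed.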
  • 53. Cassandra - Data Model • The data model of Cassandra is significantly different from that of an RDBMS. • Cluster: A Cassandra database is distributed over several machines that operate together. The outermost container is known as the cluster. For failure handling, every node contains a replica, and in case of a failure, the replica takes charge. Cassandra arranges the nodes in a cluster in a ring format and assigns data to them.
  • 54. Keyspace • A keyspace is the outermost container for data in Cassandra. The basic attributes of a keyspace in Cassandra are: 1. Replication factor 2. Replica placement strategy 3. Column families • Replication factor: It is the number of machines in the cluster that will receive copies of the same data.
  • 55. • Replica placement strategy: It is the strategy for placing replicas in the ring. The strategies are: 1. Simple strategy (rack-unaware strategy), 2. Old network topology strategy (rack-aware strategy), and 3. Network topology strategy (datacenter-shard strategy). • Column families: A keyspace is a container for a list of one or more column families. A column family, in turn, is a container for a collection of rows. Each row contains ordered columns. Column families represent the structure of your data. Each keyspace has at least one and often many column families.
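As a rough illustration of how a replication factor and a simple placement strategy interact on a ring, the sketch below places each key on the first node at or after its token and then on the next nodes clockwise. The function names and the MD5-based token are assumptions for this example; real partitioners, virtual nodes, and rack or data center awareness are not modelled.

```python
import hashlib
from bisect import bisect_left


def token(name):
    """Map a node name or key to a position on the ring (hypothetical hash choice)."""
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:8], "big")


def replicas_for(key, nodes, replication_factor):
    """Pick the owning node for key, then the next nodes clockwise on the ring."""
    ring = sorted(nodes, key=token)
    tokens = [token(node) for node in ring]
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-1", "node-2", "node-3", "node-4"]
print(replicas_for("user:42", nodes, replication_factor=3))
```

With a replication factor of 3 on a four-node ring, every key lands on three distinct nodes, so any single node can fail without losing the data.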
  • 56. Syntax • The syntax for creating a keyspace is as follows: CREATE KEYSPACE <keyspace_name> WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}; • Schematic view of a keyspace.
  • 57. Column Family • A column family is a container for an ordered collection of rows. Each row, in turn, is an ordered collection of columns. • A Cassandra column family has the following attributes: • keys_cached: the number of locations to keep cached per SSTable. • rows_cached: the number of rows whose entire contents will be cached in memory. • preload_row_cache: whether you want to pre-populate the row cache.
  • 59. Column • A column is the basic data structure of Cassandra, with three values: a key or column name, a value, and a timestamp. Given below is the structure of a column. • Super Column: A super column is a special column and is therefore also a key-value pair, but it stores a map of sub-columns. Generally, column families are stored on disk in individual files.
  • 60. Structure of a supercolumn
  • 62. Introduction • When Big Data storage and analysis tools of the Hadoop ecosystem, such as MapReduce, Hive, HBase, Cassandra, and Pig, came into the picture, they required a tool to interact with relational database servers for importing and exporting the Big Data residing in them. • Sqoop occupies a place in the Hadoop ecosystem to provide feasible interaction between relational database servers and Hadoop’s HDFS.
  • 63. SQOOP - DEFINITION • Sqoop: “SQL to Hadoop and Hadoop to SQL”. • A tool to transfer data between Hadoop and relational databases such as Teradata, MySQL, PostgreSQL, Oracle, and Netezza. • It is provided by the Apache Software Foundation.
  • 66. SQOOP IMPORT • The import tool imports individual tables from an RDBMS to HDFS. • Each row in a table is treated as a record in HDFS. • All records are stored as text data in text files or as binary data in Avro and Sequence files.
  • 67. SQOOP EXPORT • The export tool exports a set of files from HDFS back to an RDBMS. • The files given as input to Sqoop contain records, which become rows in the table. • These files are read and parsed into a set of records, delimited with a user-specified delimiter.
  • 68. FEATURES OF SQOOP • Full load • Incremental load • Parallel import/export • Import results of SQL query • Compression • Connectors for all major RDBMS databases • Kerberos security integration
  • 69. ADVANTAGES OF SQOOP • Allows the transfer of data with a variety of structured data stores such as Postgres, Oracle, Teradata, and so on. • Sqoop can execute the data transfer in parallel, so execution can be quick and more cost-effective. • Helps to integrate with sequential data from the mainframe.
  • 70. DISADVANTAGES OF SQOOP • It uses a JDBC connection to connect with RDBMS-based data stores, and this can be inefficient and less performant. • For performing analysis, it executes various MapReduce jobs and, at times, this can be time-consuming when there are a lot of joins if the data is in a denormalized fashion.
  • 71. HIVE • Introduction • Hive is a data warehouse infrastructure tool to process structured data in Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and analyzing easy. • Hive was initially developed by Facebook; later the Apache Software Foundation took it up and developed it further as open source under the name Apache Hive. It is used by different companies. For example, Amazon uses it in Amazon Elastic MapReduce. • Hive is not: a relational database; a design for Online Transaction Processing (OLTP); a language for real-time queries and row-level updates.
  • 72. Features of Hive • It stores the schema in a database and the processed data in HDFS. • It is designed for OLAP. • It provides an SQL-type language for querying called HiveQL or HQL. • It is familiar, fast, scalable, and extensible.
  • 74. Working of Hive • The following diagram depicts the workflow between Hive and Hadoop.
  • 75. Social Network • A social network is a structure between actors, mostly individuals or organizations. • It indicates the ways in which they are connected through various social familiarities, ranging from casual acquaintance to close familial bonds.
  • 76. Society as a Graph • People are represented as nodes. • Relationships are represented as edges: relationships may be acquaintanceship, friendship, co-authorship, etc. • This allows analysis using the tools of mathematical graph theory.
  • 77. Social Network Analysis • Social network analysis (SNA) is the mapping and measuring of relationships and flows between people, groups, organizations, computers or other information/knowledge processing entities.
  • 78. Connections • Size: number of nodes. • Density: number of ties that are present divided by the number of ties that could be present. • Out-degree: sum of connections from an actor to others. • In-degree: sum of connections to an actor from others.
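The four connection measures above can be computed directly from an edge list. The following is a minimal sketch on a made-up directed network; the actor names and edges are hypothetical, and density is taken over ordered pairs, which assumes ties are directed.

```python
# Hypothetical directed network: an edge (a, b) means "a has a tie to b".
edges = [
    ("Ann", "Bob"),
    ("Ann", "Cara"),
    ("Bob", "Cara"),
    ("Cara", "Ann"),
    ("Dev", "Ann"),
]

nodes = sorted({actor for edge in edges for actor in edge})

# Size: number of nodes.
size = len(nodes)

# Density: ties present / ties that could be present.
# In a directed network every ordered pair of distinct actors is a possible tie.
possible_ties = size * (size - 1)
density = len(edges) / possible_ties

# Out-degree counts ties leaving an actor; in-degree counts ties arriving.
out_degree = {n: 0 for n in nodes}
in_degree = {n: 0 for n in nodes}
for source, target in edges:
    out_degree[source] += 1
    in_degree[target] += 1

print(size, round(density, 3))
print(out_degree)
print(in_degree)
```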
  • 79. Distance • Walk: a sequence of actors and relations that begins and ends with actors. • Geodesic distance: the number of relations in the shortest possible walk from one actor to another. • Maximum flow: the number of different actors in the neighbourhood of a source that lead to pathways to a target.
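Geodesic distance can be found with a breadth-first search, which explores actors in order of increasing number of relations from the source. This is a minimal sketch on a hypothetical undirected network; the function name `geodesic_distance` and the sample actors are assumptions made for illustration.

```python
from collections import deque


def geodesic_distance(adjacency, source, target):
    """Return the number of relations on a shortest walk, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        actor, dist = queue.popleft()
        for neighbour in adjacency.get(actor, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical undirected network stored as adjacency lists.
network = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Cara"],
    "Cara": ["Ann", "Bob", "Dev"],
    "Dev": ["Cara"],
    "Eve": [],  # isolated actor
}

print(geodesic_distance(network, "Ann", "Dev"))
print(geodesic_distance(network, "Ann", "Eve"))
```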
  • 80. Some measures of power and prestige • Degree: sum of connections from or to an actor. • Closeness centrality: distance of one actor to all others in the network. • Betweenness centrality: number that represents how frequently an actor is between other actors' geodesic paths.
  • 81. Social network analysis: what for? • To control information flow • To improve/stimulate communication • To improve network resilience • To trust
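The three measures above can be sketched for a small undirected, connected network. Two choices here are assumptions rather than something the slides specify: closeness is normalised as (n - 1) divided by the sum of geodesic distances, and betweenness counts each unordered pair of actors once. The helper names and the sample network are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs(adjacency, source):
    """Return (distance, number of shortest paths) from source to each reachable actor."""
    dist = {source: 0}
    paths = {source: 1}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        for neighbour in adjacency[actor]:
            if neighbour not in dist:
                dist[neighbour] = dist[actor] + 1
                paths[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[actor] + 1:
                # Every shortest path to `actor` extends to `neighbour`.
                paths[neighbour] += paths[actor]
    return dist, paths


def centralities(adjacency):
    info = {actor: bfs(adjacency, actor) for actor in adjacency}
    degree = {actor: len(adjacency[actor]) for actor in adjacency}

    closeness = {}
    for actor in adjacency:
        dist, _ = info[actor]
        total = sum(dist.values())
        closeness[actor] = (len(dist) - 1) / total if total else 0.0

    betweenness = {actor: 0.0 for actor in adjacency}
    for s, t in combinations(adjacency, 2):
        dist_s, paths_s = info[s]
        dist_t, paths_t = info[t]
        if t not in dist_s:
            continue  # no path between s and t
        for v in adjacency:
            if v in (s, t) or v not in dist_s or v not in dist_t:
                continue
            # v lies on a geodesic path exactly when the distances add up.
            if dist_s[v] + dist_t[v] == dist_s[t]:
                betweenness[v] += paths_s[v] * paths_t[v] / paths_s[t]
    return degree, closeness, betweenness


# Hypothetical chain of four actors: Ann - Bob - Cara - Dev.
network = {
    "Ann": ["Bob"],
    "Bob": ["Ann", "Cara"],
    "Cara": ["Bob", "Dev"],
    "Dev": ["Cara"],
}

degree, closeness, betweenness = centralities(network)
print(degree)
print(closeness)
print(betweenness)
```

In this chain the two middle actors score highest on all three measures, which matches the intuition that they sit between the others.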
  • 88. Community identification and marketing: 1. seasonal workers 2. SMEs 3. students 4. school children. Customer lifestyle analysis: analysis based on identifying critical life stage events using social network changes: 1. going to university 2. moving 3. changing job 4. starting a relationship or moving as a couple 5. imputing demographics
  • 89. BIG DATA & IOT • Big data is more about collecting and accumulating huge volumes of data for later analysis, whereas IoT is about simultaneously collecting and processing data to make real-time decisions. • The internet of things, or IoT, is a system of interrelated computing devices, mechanical and digital machines, objects, animals or people that are provided with unique identifiers (UIDs) and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction.
  • 90. How Big Data Powers the Internet of Things • The Internet of Things (IoT) may sound like a futuristic term, but it’s already here and increasingly woven into our everyday lives. The concept is simpler than you may think: if you have a smart TV, fridge, doorbell, or any other connected device, that’s part of the IoT. Example 1: The region’s most popular theme park has released its own app. It does more than just provide a map, schedule, and menu items (though those are important); it also uses GPS pings to identify app users in line, so it can display predicted wait times for rides based on density, and can even reserve a spot or trigger attractions based on proximity.
  • 91. The Connection Between Big Data and IoT • A company installs devices that use sensors to collect and transmit data. • That big data (sometimes petabytes of data) is then collected, often in a repository called a data lake. Both structured data from prepared data sources (user profiles, transactional information, etc.) and unstructured data from other sources (social media archives, emails and call center notes, security camera images, licensed data, etc.) reside in the data lake. • Reports, charts, and other outputs are generated, sometimes by AI-driven analytics platforms such as Oracle Analytics. • User devices provide further metrics through settings, preferences, scheduling, metadata, and other tangible transmissions, feeding back into the data lake for even heavier volumes of big data.
  • 92. What is the Internet of Things • According to the Global Standards Initiative on the Internet of Things (IoT-GSI), the Internet of Things is defined as the ‘infrastructure of the information society’. Simply put, it is the interconnection and internetworking of devices, vehicles and various other embedded components which are collectively used to gather data and analyze it in real time.
  • 93. How Does IoT help • IoT can help you manage your home in a more effective way. It helps you to keep a check on your home from a remote location. • IoT can help in better environment monitoring by analyzing the air and the water quality. • IoT can help media companies to understand the behaviour of their audience better and develop more effective content targeted towards a specific niche.
  • 96. IoT Enablers • RFID: uses radio waves to electronically track the tags attached to each physical object. • Sensors: devices that are able to detect changes in an environment (e.g. motion detectors). • Nanotechnology: as the name suggests, these are extremely small devices with dimensions usually less than a hundred nanometers. • Smart networks (e.g. mesh topology).
  • 97. Applications and domains • Application Domains: IoT is currently found in four different popular domains: • 1) Manufacturing/Industrial business - 40.2% • 2) Healthcare - 30.3% • 3) Security - 7.7% • 4) Retail - 8.3%
  • 98. Modern Applications for IoT • Smart Grids • Smart cities • Smart homes • Healthcare • Earthquake detection • Radiation detection/hazardous gas detection • Smartphone detection • Water flow monitoring
  • 100. Big data platforms and IOT • Context-Aware Infrastructures for the Internet of Things • A Study on Opportunistic Data Dissemination Support for the Internet of Things • Future Trends and Research Directions in Big Data Platforms for the Internet of Things
  • 101. How does IoT contribute to big data • IoT connects things to the internet using sensors, and the data they produce is used for analysis and monitoring and is also stored. • Cloud computing helps to store and access the data without a large investment in systems and software. • The combination of both technologies can therefore save both time and money.
  • 102. IoT and Big data are working together • There are many examples of big data and IoT working well together to offer analysis and insight. One such example is shipping organizations. They have been utilizing big data analytics and sensor data to improve efficiency, save money and lower their environmental impact. They use sensors on their delivery vehicles to monitor engine health, number of stops, mileage, miles per gallon, and speed. • IoT and big data are also making waves in large-scale agriculture. Here, connected systems in the field monitor moisture levels and transmit this data to farmers over a wireless connection. This data enables farmers to find out when crops are reaching the optimum moisture levels.
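The agriculture example above amounts to grouping sensor readings by field and comparing them with a target band. The sketch below illustrates that step only; the field names, the readings, and the 30 to 45 percent "optimum" band are all hypothetical values chosen for the example.

```python
from statistics import mean

# Hypothetical optimum soil-moisture band, in percent.
OPTIMUM_RANGE = (30.0, 45.0)


def summarize_moisture(readings):
    """Average the readings per field and label each field against the band."""
    by_field = {}
    for field, value in readings:
        by_field.setdefault(field, []).append(value)

    low, high = OPTIMUM_RANGE
    report = {}
    for field, values in by_field.items():
        average = mean(values)
        if average < low:
            status = "too dry"
        elif average > high:
            status = "too wet"
        else:
            status = "optimum"
        report[field] = (round(average, 1), status)
    return report


# Made-up readings as (field, moisture percent) pairs sent by field sensors.
readings = [("north", 28.0), ("north", 26.0), ("south", 38.0), ("south", 40.0)]
print(summarize_moisture(readings))
```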
  • 103. Big Data Technologies • Data storage • Data mining • Data analytics • Data visualisation
  • 104. Big Data Management Technologies • Now let us deal with the technologies falling under each of these categories, with their facts and capabilities, along with the companies which are using them.
  • 105. Data Storage • The Hadoop framework was designed to store and process data in a distributed data processing environment on commodity hardware using a simple programming model. • It can store and analyse the data present in different machines at high speed and low cost.
  • 106. Data Mining • Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. • Presto allows querying data in Hive, Cassandra, relational databases and proprietary data stores.
  • 107. Data Analytics • Apache Kafka is a distributed streaming platform. A streaming platform has three key capabilities: • publishing and subscribing to streams of records • storing streams of records durably • processing streams of records as they occur
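The publish/subscribe idea behind a streaming platform can be sketched with an append-only log that each subscriber reads at its own pace. This is a toy, in-memory illustration and not the Kafka client API: the `Topic` and `Subscriber` classes and their methods are hypothetical names invented for this example.

```python
class Topic:
    """Hypothetical in-memory topic: an append-only log of records."""

    def __init__(self):
        self._log = []

    def publish(self, record):
        """Append a record and return its offset in the log."""
        self._log.append(record)
        return len(self._log) - 1

    def read_from(self, offset):
        """Return all records stored at or after the given offset."""
        return self._log[offset:]


class Subscriber:
    """Hypothetical subscriber that remembers how far it has read."""

    def __init__(self, topic):
        self._topic = topic
        self._offset = 0

    def poll(self):
        """Return records published since the previous poll."""
        records = self._topic.read_from(self._offset)
        self._offset += len(records)
        return records


clicks = Topic()
dashboard = Subscriber(clicks)
archiver = Subscriber(clicks)

clicks.publish({"user": "u1", "page": "/home"})
clicks.publish({"user": "u2", "page": "/cart"})

print(dashboard.poll())  # both records
clicks.publish({"user": "u1", "page": "/pay"})
print(dashboard.poll())  # only the newest record
print(archiver.poll())   # all three records, read independently
```

Because records stay in the log after being read, a subscriber that joins late or falls behind can still read everything from its own offset, which is the design choice that distinguishes a log from a simple queue.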
  • 108. Data Visualisation • Tableau is a powerful and fast-growing data visualisation tool used in the business intelligence industry. • Data analysis is very fast with Tableau, and the visualisations created are in the form of dashboards and worksheets.