SlideShare a Scribd company logo
1 of 17
Download to read offline
BIG DATA ANALYTICS
•What is Data?
Collection of Numbers, Alphabets and Special Character.
•What is Information?
It is a Processed data like data that has been processed is called information.
E.g. 05111977 is data
05/11/1997 DOB of any person
•What is Big Data?
Big Data refers to complex and large data sets that have to be processed and
analyzed to uncover valuable information that can benefit businesses and
organizations.
However, there are certain basic tenets of Big Data that will make it even
simpler to answer what is Big Data:
• It refers to a massive amount of data that keeps on growing exponentially
with time.
• It is so voluminous that it cannot be processed or analyzed using
conventional data processing techniques.
• It includes data mining, data storage, data analysis, data sharing, and data
visualization.
• The term is an all-comprehensive one including data, data frameworks,
along with the tools and techniques used to process and analyze the data.
The History of Big Data
Although the concept of big data itself is relatively new, the origins of
large data sets go back to the 1960s and '70s when the world of data was
just getting started with the first data centers and the development of the
relational database.
Around 2005, people began to realize just how much data users generated
through Facebook, YouTube, and other online services then an
open-source framework created specifically to store and analyze big data
sets such as Hadoop and more recently, Spark e they make big data easier
to work with and cheaper to store.
Users are still generating huge amounts of data—but it’s not just humans
who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices
are connected to the internet, gathering data on customer usage patterns
and product performance.
Difference Between Small Data and Big Data:
Small Data Big Data
Mostly Structured Mostly Unstructured
Data Store in MB,GB,TB Store in PB, EB..
Increases Gradually Increase Exonentially
Store data locally present,
centralized
Store data Globally present,
Distributed
S/w-SQL, Oracale Hadoop, Spark, Big Query
Single Node Multinode Cluster
Why is Big Data Important?
The importance of big data does not revolve around how much data a
company has but how a company utilizes the collected data. Every
company uses data in its own way; the more efficiently a company uses its
data, the more potential it has to grow. The company can take data from any
source and analyze it to find answers which will enable:
• Cost Saving : Some tools of Big Data like Hadoop and Cloud-Based
Analytics can bring cost advantages to business when large amounts of data
are to be stored and these tools also help in identifying more efficient ways
of doing business.
•Time Reduction: The high speed of tools like Hadoop and in-memory
analytics can easily identify new sources of data which helps businesses
analyzing data immediately and make quick decisions based on the
learning.
• Understand the market conditions:
By analyzing big data you can get a better understanding of current market
conditions. For example, by analyzing customers’ purchasing behaviors, a
company can find out the products that are sold the most and produce products
according to this trend.
•Control online reputation:
Big data tools can do sentiment analysis. Therefore, you can get feedback about
who is saying what about your company. If you want to monitor and improve the
online presence of your business, then, big data tools can help in all this.
•Using Big Data Analytics to Boost Customer Acquisition:
The customer is the most important asset any business depends on. There is no
single business that can claim success without first having to establish a solid
customer base. The use of big data allows businesses to observe various customer
related patterns and trends.
•Big Data Analytics As a Driver of Innovations and Product Development
Another huge advantage of big data is the ability to help companies innovate and
redevelop their products.
Characteristics of Big Data:
3 ‘V’s of Big Data – Variety, Velocity, and Volume are 3 defining properties or
dimensions of Big Data.
•Variety :
variety of Big Data refers to structured, unstructured, and semi-structured data
that is gathered from multiple sources. While in the past, data could only be
collected from spreadsheets and databases, today data comes in an array of forms
such as emails, PDFs, photos, videos, audios, SM posts, and so much more.
Variety is one of the important characteristics of big data.
•Velocity :
Velocity essentially refers to the speed at which data is being created in real-time.
Initially Companies analyzed data using a Batch Process.ones take a chunk of
data and submit a job to the server and waits for the delivery of results, this
scheme works when the incoming data rate is slower than the batch processing
rate but with the new sources of data such as social media, mobile applications the
batch process breaks down.
Volume:
Volume is one of the characteristics of big data. We already know that Big Data
indicates huge ‘volumes’ of data that is being generated on a daily basis from
various sources like social media platforms, business processes, machines,
networks, human interactions, etc. Such a large amount of data is stored in data
warehouses. Thus comes to the end of characteristics of big data.
Types of Big Data:
Big data is classified in three ways:
•Structured Data.
•Unstructured Data.
•Semi-Structured Data.
Structured data:
Structured is one of the types of big data and By structured data, we mean data
that can be processed, stored, and retrieved in a fixed format.
Structured data follows schemas like A payroll database will lay out employee
identification information, pay rate, hours worked, how compensation is
delivered, etc.
Unstructured :
Unstructured data refers to the data that lacks any specific form or structure
whatsoever. This makes it very difficult and time-consuming to process and
analyze unstructured data. Email is an example of unstructured data. Structured
and unstructured are two important types of big data.
Semi structured:
It is the third type of big data. Semi-structured data pertains to the data containing
both the formats mentioned above, that is, structured and unstructured data.
E.g.
you take a picture of your cat from your phone. It automatically logs the time the
picture was taken, the GPS data at the time of the capture and your device ID.
If you’re using any kind of web service for storage, like icloud, your account info
becomes attached to the file.
If you send an email, the time sent, email addresses to and from, the IP address
from the device sent from, and other pieces of information are linked to the actual
content of the email.
In both scenarios, the actual content (i.e. the pixels that compose the photo and
the characters that make up the email) is not structured, but there are components
that allow the data to be grouped based on certain characteristics.
Unstructured Data and Big Data
Unstructured data is the opposite of structured data. Structured data generally
resides in a relational database, and as a result, it is sometimes called "relational
data.“
Structured data has been or can be placed in this fields . Unstructured data is not
relational and doesn't fit into these sorts of pre-defined data models.
The term "big data" is closely associated with unstructured data. Big data refers
to extremely large datasets that are dificult to analyze with traditional tools.
A Convergence of Key Trends
Companies have always kept large amounts of information. But until recently,
they stored most of that information on tape.
While it ’s true that the amount of data in the world keeps growing, the real
change has been in the ways that we access that data and use it to create value.
Today, we have technologies like Hadoop, for example, that make it functionally
practical to access a tremendous amount of data, and then extract value from it.
The availability of lower-cost hardware makes it easier and more feasible to
retrieve and process information, quickly and at lower costs than ever before.
So it ’s the convergence of several trends—more data and less expensive, faster
hardware—that ’s driving this transformation. Today, we’ve got raw speed at an
affordable price. That cost/benefit has really been a game changer for us.
Next is the ability to do that real-time analysis on very complex sets of data and
models.
A Perfect example:
You go into a store to buy a pair of pants. You take the pants up to the cash
register and the clerk asks you if you would like to save 10 percent off your
purchase by signing up for the store ’s credit card.
99.9 percent of the time, you ’re going to say “no.” But now let ’s imagine if the
store could automatically look at all of my past purchases and see what other
items I bought when I came in to buy a pair of pants—and then offer me 50
percent off a similar purchase?
Now that would be relevant to me. The store isn’t offering me another credit
card—it ’s offering me something that I probably want, at an attractive price.
What is Big Data Analytics?
Big Data analytics is a process used to extract meaningful insights, such as hidden
patterns, unknown correlations, market trends, and customer preferences.
Why is big data analytics important?
In today’s world, Big Data analytics is fueling everything we do online—in every
industry.
Take the music streaming platform Spotify for example.
The company has nearly 96 million users that generate a tremendous amount of
data every day.
Through this information, the cloud-based platform automatically generates
suggested songs—through a smart recommendation engine—based on likes,
shares, search history, and more.
What enables this is the techniques, tools, and frameworks that are a result of Big
Data analytics.
Benefits & Advantages of Big Data Analytics
1. Risk Management :
2. Product Development and Innovations
3. Quicker and Better Decision Making Within Organizations
4. Improve Customer Experience
The Lifecycle Phases of Big Data Analytics
Now, let’s review how Big Data analytics works:
Stage 1-Business case evaluation - The Big Data analytics lifecycle begins with
a business case, which defines the reason and goal behind the analysis.
Stage 2 - Identification of data - Here, a broad variety of data sources are
identified.
Stage 3 - Data filtering - All of the identified data from the previous stage is
filtered here to remove corrupt data.
Stage 4 - Data extraction - Data that is not compatible with the tool is extracted
and then transformed into a compatible form.
Stage 5 - Data aggregation - In this stage, data with the same fields across
different datasets are integrated.
Stage 6 - Data analysis - Data is evaluated using analytical and statistical tools to
discover useful information.
Stage 7 - Visualization of data - With tools like Tableau, Power BI, and QlikView,
Big Data analysts can produce graphic visualizations of the analysis.
Stage 8 - Final analysis result - This is the last step of the Big Data analytics
lifecycle, where the final results of the analysis are made available to business
stakeholders who will take action.

More Related Content

Similar to Bda assignment can also be used for BDA notes and concept understanding.

Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.saranya270513
 
big data.pptx
big data.pptxbig data.pptx
big data.pptxVirajSaud
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfPridesys IT Ltd.
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data scienceJohnson Ubah
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfPridesys IT Ltd.
 
Big Data - Everything you need to know
Big Data - Everything you need to knowBig Data - Everything you need to know
Big Data - Everything you need to knowV2Soft
 
Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Kavika Roy
 
Unit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfUnit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfRanjeet Bhalshankar
 
A Primer for a layman about Big Data, Business Analytics and Cloud
A Primer for a layman  about Big Data, Business Analytics and CloudA Primer for a layman  about Big Data, Business Analytics and Cloud
A Primer for a layman about Big Data, Business Analytics and CloudRajagopalan V
 
Data foundation for analytics excellence
Data foundation for analytics excellenceData foundation for analytics excellence
Data foundation for analytics excellenceMudit Mangal
 
INTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOPINTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOPDr Geetha Mohan
 
Big data for small businesses
Big data for small businessesBig data for small businesses
Big data for small businessesTabor Consulting
 
Small data vs. Big data : back to the basics
Small data vs. Big data : back to the basicsSmall data vs. Big data : back to the basics
Small data vs. Big data : back to the basicsAhmed Banafa
 

Similar to Bda assignment can also be used for BDA notes and concept understanding. (20)

Big_Data.pptx
Big_Data.pptxBig_Data.pptx
Big_Data.pptx
 
Introduction to big data – convergences.
Introduction to big data – convergences.Introduction to big data – convergences.
Introduction to big data – convergences.
 
new.pptx
new.pptxnew.pptx
new.pptx
 
big data.pptx
big data.pptxbig data.pptx
big data.pptx
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
introduction to data science
introduction to data scienceintroduction to data science
introduction to data science
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
Big Data - Everything you need to know
Big Data - Everything you need to knowBig Data - Everything you need to know
Big Data - Everything you need to know
 
Bigdata Hadoop introduction
Bigdata Hadoop introductionBigdata Hadoop introduction
Bigdata Hadoop introduction
 
Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!Converting Big Data To Smart Data | The Step-By-Step Guide!
Converting Big Data To Smart Data | The Step-By-Step Guide!
 
Big data assignment
Big data assignmentBig data assignment
Big data assignment
 
Unit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdfUnit No2 Introduction to big data.pdf
Unit No2 Introduction to big data.pdf
 
big-data.pdf
big-data.pdfbig-data.pdf
big-data.pdf
 
A Primer for a layman about Big Data, Business Analytics and Cloud
A Primer for a layman  about Big Data, Business Analytics and CloudA Primer for a layman  about Big Data, Business Analytics and Cloud
A Primer for a layman about Big Data, Business Analytics and Cloud
 
Data foundation for analytics excellence
Data foundation for analytics excellenceData foundation for analytics excellence
Data foundation for analytics excellence
 
INTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOPINTRODUCTION TO BIG DATA AND HADOOP
INTRODUCTION TO BIG DATA AND HADOOP
 
Big data for small businesses
Big data for small businessesBig data for small businesses
Big data for small businesses
 
Small data vs. Big data : back to the basics
Small data vs. Big data : back to the basicsSmall data vs. Big data : back to the basics
Small data vs. Big data : back to the basics
 
Big data vs datawarehousing
Big data vs datawarehousingBig data vs datawarehousing
Big data vs datawarehousing
 
Big data vs datawarehousing
Big data vs datawarehousingBig data vs datawarehousing
Big data vs datawarehousing
 

Recently uploaded

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 

Recently uploaded (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 

Bda assignment can also be used for BDA notes and concept understanding.

  • 1. BIG DATA ANALYTICS •What is Data? Collection of Numbers, Alphabets and Special Character. •What is Information? It is a Processed data like data that has been processed is called information. E.g. 05111977 is data 05/11/1997 DOB of any person
  • 2. •What is Big Data? Big Data refers to complex and large data sets that have to be processed and analyzed to uncover valuable information that can benefit businesses and organizations. However, there are certain basic tenets of Big Data that will make it even simpler to answer what is Big Data: • It refers to a massive amount of data that keeps on growing exponentially with time. • It is so voluminous that it cannot be processed or analyzed using conventional data processing techniques. • It includes data mining, data storage, data analysis, data sharing, and data visualization. • The term is an all-comprehensive one including data, data frameworks, along with the tools and techniques used to process and analyze the data.
  • 3. The History of Big Data Although the concept of big data itself is relatively new, the origins of large data sets go back to the 1960s and '70s when the world of data was just getting started with the first data centers and the development of the relational database. Around 2005, people began to realize just how much data users generated through Facebook, YouTube, and other online services then an open-source framework created specifically to store and analyze big data sets such as Hadoop and more recently, Spark e they make big data easier to work with and cheaper to store. Users are still generating huge amounts of data—but it’s not just humans who are doing it. With the advent of the Internet of Things (IoT), more objects and devices are connected to the internet, gathering data on customer usage patterns and product performance.
  • 4. Difference Between Small Data and Big Data: Small Data Big Data Mostly Structured Mostly Unstructured Data Store in MB,GB,TB Store in PB, EB.. Increases Gradually Increase Exonentially Store data locally present, centralized Store data Globally present, Distributed S/w-SQL, Oracale Hadoop, Spark, Big Query Single Node Multinode Cluster
  • 5. Why is Big Data Important? The importance of big data does not revolve around how much data a company has but how a company utilizes the collected data. Every company uses data in its own way; the more efficiently a company uses its data, the more potential it has to grow. The company can take data from any source and analyze it to find answers which will enable: • Cost Saving : Some tools of Big Data like Hadoop and Cloud-Based Analytics can bring cost advantages to business when large amounts of data are to be stored and these tools also help in identifying more efficient ways of doing business. •Time Reduction: The high speed of tools like Hadoop and in-memory analytics can easily identify new sources of data which helps businesses analyzing data immediately and make quick decisions based on the learning.
  • 6. • Understand the market conditions: By analyzing big data you can get a better understanding of current market conditions. For example, by analyzing customers’ purchasing behaviors, a company can find out the products that are sold the most and produce products according to this trend. •Control online reputation: Big data tools can do sentiment analysis. Therefore, you can get feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, then, big data tools can help in all this. •Using Big Data Analytics to Boost Customer Acquisition: The customer is the most important asset any business depends on. There is no single business that can claim success without first having to establish a solid customer base. The use of big data allows businesses to observe various customer related patterns and trends. •Big Data Analytics As a Driver of Innovations and Product Development Another huge advantage of big data is the ability to help companies innovate and redevelop their products.
  • 7. Characteristics of Big Data: 3 ‘V’s of Big Data – Variety, Velocity, and Volume are 3 defining properties or dimensions of Big Data. •Variety : variety of Big Data refers to structured, unstructured, and semi-structured data that is gathered from multiple sources. While in the past, data could only be collected from spreadsheets and databases, today data comes in an array of forms such as emails, PDFs, photos, videos, audios, SM posts, and so much more. Variety is one of the important characteristics of big data. •Velocity : Velocity essentially refers to the speed at which data is being created in real-time. Initially Companies analyzed data using a Batch Process.ones take a chunk of data and submit a job to the server and waits for the delivery of results, this scheme works when the incoming data rate is slower than the batch processing rate but with the new sources of data such as social media, mobile applications the batch process breaks down.
  • 8. Volume: Volume is one of the characteristics of big data. We already know that Big Data indicates huge ‘volumes’ of data that is being generated on a daily basis from various sources like social media platforms, business processes, machines, networks, human interactions, etc. Such a large amount of data is stored in data warehouses. Thus comes to the end of characteristics of big data. Types of Big Data: Big data is classified in three ways: •Structured Data. •Unstructured Data. •Semi-Structured Data.
  • 9.
  • 10. Structured data: Structured is one of the types of big data and By structured data, we mean data that can be processed, stored, and retrieved in a fixed format. Structured data follows schemas like A payroll database will lay out employee identification information, pay rate, hours worked, how compensation is delivered, etc. Unstructured : Unstructured data refers to the data that lacks any specific form or structure whatsoever. This makes it very difficult and time-consuming to process and analyze unstructured data. Email is an example of unstructured data. Structured and unstructured are two important types of big data.
  • 11. Semi structured: It is the third type of big data. Semi-structured data pertains to the data containing both the formats mentioned above, that is, structured and unstructured data. E.g. you take a picture of your cat from your phone. It automatically logs the time the picture was taken, the GPS data at the time of the capture and your device ID. If you’re using any kind of web service for storage, like icloud, your account info becomes attached to the file. If you send an email, the time sent, email addresses to and from, the IP address from the device sent from, and other pieces of information are linked to the actual content of the email. In both scenarios, the actual content (i.e. the pixels that compose the photo and the characters that make up the email) is not structured, but there are components that allow the data to be grouped based on certain characteristics.
  • 12. Unstructured Data and Big Data Unstructured data is the opposite of structured data. Structured data generally resides in a relational database, and as a result, it is sometimes called "relational data.“ Structured data has been or can be placed in this fields . Unstructured data is not relational and doesn't fit into these sorts of pre-defined data models. The term "big data" is closely associated with unstructured data. Big data refers to extremely large datasets that are dificult to analyze with traditional tools.
  • 13. A Convergence of Key Trends Companies have always kept large amounts of information. But until recently, they stored most of that information on tape. While it ’s true that the amount of data in the world keeps growing, the real change has been in the ways that we access that data and use it to create value. Today, we have technologies like Hadoop, for example, that make it functionally practical to access a tremendous amount of data, and then extract value from it. The availability of lower-cost hardware makes it easier and more feasible to retrieve and process information, quickly and at lower costs than ever before. So it ’s the convergence of several trends—more data and less expensive, faster hardware—that ’s driving this transformation. Today, we’ve got raw speed at an affordable price. That cost/benefit has really been a game changer for us.
  • 14. Next is the ability to do that real-time analysis on very complex sets of data and models. A Perfect example: You go into a store to buy a pair of pants. You take the pants up to the cash register and the clerk asks you if you would like to save 10 percent off your purchase by signing up for the store ’s credit card. 99.9 percent of the time, you ’re going to say “no.” But now let ’s imagine if the store could automatically look at all of my past purchases and see what other items I bought when I came in to buy a pair of pants—and then offer me 50 percent off a similar purchase? Now that would be relevant to me. The store isn’t offering me another credit card—it ’s offering me something that I probably want, at an attractive price.
  • 15. What is Big Data Analytics? Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences. Why is big data analytics important? In today’s world, Big Data analytics is fueling everything we do online—in every industry. Take the music streaming platform Spotify for example. The company has nearly 96 million users that generate a tremendous amount of data every day. Through this information, the cloud-based platform automatically generates suggested songs—through a smart recommendation engine—based on likes, shares, search history, and more. What enables this is the techniques, tools, and frameworks that are a result of Big Data analytics.
  • 16. Benefits & Advantages of Big Data Analytics 1. Risk Management : 2. Product Development and Innovations 3. Quicker and Better Decision Making Within Organizations 4. Improve Customer Experience The Lifecycle Phases of Big Data Analytics Now, let’s review how Big Data analytics works: Stage 1-Business case evaluation - The Big Data analytics lifecycle begins with a business case, which defines the reason and goal behind the analysis. Stage 2 - Identification of data - Here, a broad variety of data sources are identified. Stage 3 - Data filtering - All of the identified data from the previous stage is filtered here to remove corrupt data.
  • 17. Stage 4 - Data extraction - Data that is not compatible with the tool is extracted and then transformed into a compatible form. Stage 5 - Data aggregation - In this stage, data with the same fields across different datasets are integrated. Stage 6 - Data analysis - Data is evaluated using analytical and statistical tools to discover useful information. Stage 7 - Visualization of data - With tools like Tableau, Power BI, and QlikView, Big Data analysts can produce graphic visualizations of the analysis. Stage 8 - Final analysis result - This is the last step of the Big Data analytics lifecycle, where the final results of the analysis are made available to business stakeholders who will take action.