DBA Basics: Getting Started with Performance Tuning.pdf
Bda assignment can also be used for BDA notes and concept understanding.
1. BIG DATA ANALYTICS
•What is Data?
Collection of Numbers, Alphabets and Special Character.
•What is Information?
It is a Processed data like data that has been processed is called information.
E.g. 05111977 is data
05/11/1997 DOB of any person
2. •What is Big Data?
Big Data refers to complex and large data sets that have to be processed and
analyzed to uncover valuable information that can benefit businesses and
organizations.
However, there are certain basic tenets of Big Data that will make it even
simpler to answer what is Big Data:
• It refers to a massive amount of data that keeps on growing exponentially
with time.
• It is so voluminous that it cannot be processed or analyzed using
conventional data processing techniques.
• It includes data mining, data storage, data analysis, data sharing, and data
visualization.
• The term is an all-comprehensive one including data, data frameworks,
along with the tools and techniques used to process and analyze the data.
3. The History of Big Data
Although the concept of big data itself is relatively new, the origins of
large data sets go back to the 1960s and '70s when the world of data was
just getting started with the first data centers and the development of the
relational database.
Around 2005, people began to realize just how much data users generated
through Facebook, YouTube, and other online services then an
open-source framework created specifically to store and analyze big data
sets such as Hadoop and more recently, Spark e they make big data easier
to work with and cheaper to store.
Users are still generating huge amounts of data—but it’s not just humans
who are doing it.
With the advent of the Internet of Things (IoT), more objects and devices
are connected to the internet, gathering data on customer usage patterns
and product performance.
4. Difference Between Small Data and Big Data:
Small Data Big Data
Mostly Structured Mostly Unstructured
Data Store in MB,GB,TB Store in PB, EB..
Increases Gradually Increase Exonentially
Store data locally present,
centralized
Store data Globally present,
Distributed
S/w-SQL, Oracale Hadoop, Spark, Big Query
Single Node Multinode Cluster
5. Why is Big Data Important?
The importance of big data does not revolve around how much data a
company has but how a company utilizes the collected data. Every
company uses data in its own way; the more efficiently a company uses its
data, the more potential it has to grow. The company can take data from any
source and analyze it to find answers which will enable:
• Cost Saving : Some tools of Big Data like Hadoop and Cloud-Based
Analytics can bring cost advantages to business when large amounts of data
are to be stored and these tools also help in identifying more efficient ways
of doing business.
•Time Reduction: The high speed of tools like Hadoop and in-memory
analytics can easily identify new sources of data which helps businesses
analyzing data immediately and make quick decisions based on the
learning.
6. • Understand the market conditions:
By analyzing big data you can get a better understanding of current market
conditions. For example, by analyzing customers’ purchasing behaviors, a
company can find out the products that are sold the most and produce products
according to this trend.
•Control online reputation:
Big data tools can do sentiment analysis. Therefore, you can get feedback about
who is saying what about your company. If you want to monitor and improve the
online presence of your business, then, big data tools can help in all this.
•Using Big Data Analytics to Boost Customer Acquisition:
The customer is the most important asset any business depends on. There is no
single business that can claim success without first having to establish a solid
customer base. The use of big data allows businesses to observe various customer
related patterns and trends.
•Big Data Analytics As a Driver of Innovations and Product Development
Another huge advantage of big data is the ability to help companies innovate and
redevelop their products.
7. Characteristics of Big Data:
3 ‘V’s of Big Data – Variety, Velocity, and Volume are 3 defining properties or
dimensions of Big Data.
•Variety :
variety of Big Data refers to structured, unstructured, and semi-structured data
that is gathered from multiple sources. While in the past, data could only be
collected from spreadsheets and databases, today data comes in an array of forms
such as emails, PDFs, photos, videos, audios, SM posts, and so much more.
Variety is one of the important characteristics of big data.
•Velocity :
Velocity essentially refers to the speed at which data is being created in real-time.
Initially Companies analyzed data using a Batch Process.ones take a chunk of
data and submit a job to the server and waits for the delivery of results, this
scheme works when the incoming data rate is slower than the batch processing
rate but with the new sources of data such as social media, mobile applications the
batch process breaks down.
8. Volume:
Volume is one of the characteristics of big data. We already know that Big Data
indicates huge ‘volumes’ of data that is being generated on a daily basis from
various sources like social media platforms, business processes, machines,
networks, human interactions, etc. Such a large amount of data is stored in data
warehouses. Thus comes to the end of characteristics of big data.
Types of Big Data:
Big data is classified in three ways:
•Structured Data.
•Unstructured Data.
•Semi-Structured Data.
9.
10. Structured data:
Structured is one of the types of big data and By structured data, we mean data
that can be processed, stored, and retrieved in a fixed format.
Structured data follows schemas like A payroll database will lay out employee
identification information, pay rate, hours worked, how compensation is
delivered, etc.
Unstructured :
Unstructured data refers to the data that lacks any specific form or structure
whatsoever. This makes it very difficult and time-consuming to process and
analyze unstructured data. Email is an example of unstructured data. Structured
and unstructured are two important types of big data.
11. Semi structured:
It is the third type of big data. Semi-structured data pertains to the data containing
both the formats mentioned above, that is, structured and unstructured data.
E.g.
you take a picture of your cat from your phone. It automatically logs the time the
picture was taken, the GPS data at the time of the capture and your device ID.
If you’re using any kind of web service for storage, like icloud, your account info
becomes attached to the file.
If you send an email, the time sent, email addresses to and from, the IP address
from the device sent from, and other pieces of information are linked to the actual
content of the email.
In both scenarios, the actual content (i.e. the pixels that compose the photo and
the characters that make up the email) is not structured, but there are components
that allow the data to be grouped based on certain characteristics.
12. Unstructured Data and Big Data
Unstructured data is the opposite of structured data. Structured data generally
resides in a relational database, and as a result, it is sometimes called "relational
data.“
Structured data has been or can be placed in this fields . Unstructured data is not
relational and doesn't fit into these sorts of pre-defined data models.
The term "big data" is closely associated with unstructured data. Big data refers
to extremely large datasets that are dificult to analyze with traditional tools.
13. A Convergence of Key Trends
Companies have always kept large amounts of information. But until recently,
they stored most of that information on tape.
While it ’s true that the amount of data in the world keeps growing, the real
change has been in the ways that we access that data and use it to create value.
Today, we have technologies like Hadoop, for example, that make it functionally
practical to access a tremendous amount of data, and then extract value from it.
The availability of lower-cost hardware makes it easier and more feasible to
retrieve and process information, quickly and at lower costs than ever before.
So it ’s the convergence of several trends—more data and less expensive, faster
hardware—that ’s driving this transformation. Today, we’ve got raw speed at an
affordable price. That cost/benefit has really been a game changer for us.
14. Next is the ability to do that real-time analysis on very complex sets of data and
models.
A Perfect example:
You go into a store to buy a pair of pants. You take the pants up to the cash
register and the clerk asks you if you would like to save 10 percent off your
purchase by signing up for the store ’s credit card.
99.9 percent of the time, you ’re going to say “no.” But now let ’s imagine if the
store could automatically look at all of my past purchases and see what other
items I bought when I came in to buy a pair of pants—and then offer me 50
percent off a similar purchase?
Now that would be relevant to me. The store isn’t offering me another credit
card—it ’s offering me something that I probably want, at an attractive price.
15. What is Big Data Analytics?
Big Data analytics is a process used to extract meaningful insights, such as hidden
patterns, unknown correlations, market trends, and customer preferences.
Why is big data analytics important?
In today’s world, Big Data analytics is fueling everything we do online—in every
industry.
Take the music streaming platform Spotify for example.
The company has nearly 96 million users that generate a tremendous amount of
data every day.
Through this information, the cloud-based platform automatically generates
suggested songs—through a smart recommendation engine—based on likes,
shares, search history, and more.
What enables this is the techniques, tools, and frameworks that are a result of Big
Data analytics.
16. Benefits & Advantages of Big Data Analytics
1. Risk Management :
2. Product Development and Innovations
3. Quicker and Better Decision Making Within Organizations
4. Improve Customer Experience
The Lifecycle Phases of Big Data Analytics
Now, let’s review how Big Data analytics works:
Stage 1-Business case evaluation - The Big Data analytics lifecycle begins with
a business case, which defines the reason and goal behind the analysis.
Stage 2 - Identification of data - Here, a broad variety of data sources are
identified.
Stage 3 - Data filtering - All of the identified data from the previous stage is
filtered here to remove corrupt data.
17. Stage 4 - Data extraction - Data that is not compatible with the tool is extracted
and then transformed into a compatible form.
Stage 5 - Data aggregation - In this stage, data with the same fields across
different datasets are integrated.
Stage 6 - Data analysis - Data is evaluated using analytical and statistical tools to
discover useful information.
Stage 7 - Visualization of data - With tools like Tableau, Power BI, and QlikView,
Big Data analysts can produce graphic visualizations of the analysis.
Stage 8 - Final analysis result - This is the last step of the Big Data analytics
lifecycle, where the final results of the analysis are made available to business
stakeholders who will take action.