BIG DATA ANALYTICS
USING R
Analytics is the combination of mathematical, statistical, and heuristic techniques to glean useful insights from data and to implement actions derived from those insights.
Big Data Analytics services: We offer Big Data Analytics services to help you see further progress and business prospects. To gain insight into marketing trends and stay one step ahead of your business rivals, we rely on the following tools:
Data mining: We make your data meaningful so that it can be used to predict future outcomes.
Statistics: We use statistics to measure the quality of data, quantify uncertainties, and extract only accurate data.
Data modeling: We structure data so that it fits the needs of the application.
Machine learning: We use machine learning to gather, integrate, and process huge volumes of data.
Database management: Our services also include database management, which lets you collect, track, and store streams of data, build data warehouses, and make data processing efficient. You can also receive support and maintenance for your database software if needed.
Big data visualization: Big data visualization promotes a better understanding of the data as a whole by breaking it into pieces with the help of colors, graphs, symbols, and so on.
Business Intelligence: Use Business Intelligence services to receive an assessment and summary of the current situation in terms of market trends, financial reporting, budget planning, customer analysis, and more.
3. TABLE OF CONTENTS:
• WHAT IS BIG DATA ANALYTICS?
• DATA SOURCES OF BIG DATA
• WHY DO WE NEED BIG DATA ANALYTICS?
• STAGES OF BIG DATA ANALYTICS
• TYPES OF BIG DATA ANALYTICS
• TOOLS USED IN BIG DATA ANALYTICS
• DOMAINS USING BIG DATA ANALYTICS
• HISTORY OF R
• ABOUT R LANGUAGE
• FEATURES OF R
• REASONS TO LEARN R
• APPLICATIONS OF R PROGRAMMING
• INSTALLATION OF R
• COMPANIES USING R
• R VS PYTHON
• SKILLS FOR DATA ANALYST
4. WHAT IS BIG DATA ANALYTICS?
• Big data analytics is the often complex process of
examining big data to uncover information, such as
hidden patterns, correlations, market trends and
customer preferences, that can help organizations
make informed business decisions.
• Big data analytics helps businesses to get insights
from today's huge data resources.
• Social media, cloud applications, and machine
sensor data are just some examples of these data resources.
6. WHY DO WE NEED BIG DATA ANALYTICS?
• Making Organizations Smarter and More Efficient
• Optimizing Business Operations by Analyzing Customer Behavior
• Cost Reduction
• New Generation Products
• Detecting Risks and Checking for Fraud
19. History of R
• R was created by Ross Ihaka and Robert Gentleman
at the University of Auckland, New Zealand, in 1993.
• The name R comes from the first letter of both
developers' first names.
• The R language was closely modeled on the S
Language for Statistical Computing conceived by
John Chambers, Rick Becker, Trevor Hastie, Allan
Wilks and others at Bell Labs in the mid-1970s.
• In 1995, statistician Martin Mächler convinced Ihaka
and Gentleman to make R free and open-source
software under the GNU General Public License.
20. About R language
• R is an interpreted computer programming language.
• R is a popular choice for data analysis, statistical
computing and graphical representation.
• R is a programming language and software
environment for statistical computing and graphics.
• The R programming language comprises packages
and environments making analytics easier.
• R can be downloaded and installed from the CRAN
website. CRAN stands for Comprehensive R Archive
Network.
21. Features of R
• Open source: R is an open source programming language. It is
completely free for anybody to use.
• Variety of packages: There are more than 15,000 packages for R
in online repositories such as CRAN and GitHub.
• Powerful graphics: R has strong graphical capabilities. Its
packages can produce a wide range of graph types.
• Cross-platform support: R runs on a wide variety of operating
systems, including Windows, macOS, and UNIX platforms.
• No need for a compiler: R is an interpreted language, so code
runs directly without a separate compilation step.
• Perform Fast Calculation: Through R, you can perform a wide
variety of complex operations on arrays, data frames, vectors
and other data objects of varying sizes.
22. Reasons to Learn R
• Open-source and Free Tool
• Strong Graphical Capabilities
• Highly Active Community
• A Wide Selection of Packages
• Comprehensive Environment
• Can Perform Complex Statistical Calculations
• Running Code Without a Compiler
• Interacting with Databases
• Cross-platform Support
• 2 million job openings for R programmers
23. Applications of R Programming
• R is used in finance and banking sectors for detecting fraud, reducing customer
churn rate and for making future decisions.
• R is also used in bioinformatics to analyze strands of genetic sequences, for
drug discovery, and in computational neuroscience.
• R is used in social media analysis to discover potential customers in online
advertising. Companies also use social media information to analyze customer
sentiments for making their products better.
• E-commerce companies use R to analyze the purchases made by their
customers as well as their feedback.
• Manufacturing companies use R to analyze customer feedback. They also use it
to predict future demand to adjust their manufacturing speeds and maximize
profits.
24. Companies Using R
Some of the companies that
are using R programming
are as follows:
• Facebook
• Google
• Ford
• Twitter
• ANZ
• Microsoft
30. R vs Python
R | Python
First appeared in 1993 | First appeared in 1991
Has more functions and packages for statistical analysis | Has fewer functions and packages for statistical analysis
Interpreted language | Interpreted language
Statistical computing and graphics programming language | General-purpose language
Harder to learn and understand | Easier to learn and understand
Mostly used for data analysis | Used for general programming tasks such as software design
31. Skills for Data Analyst
• MACHINE LEARNING
• MS OFFICE
• SQL
• PRESENTATION SKILLS
• CRITICAL THINKING
• R or PYTHON
• DATA VISUALIZATION
Editor's Notes
Enterprise resource planning (ERP) refers to a type of software that organizations use to manage day-to-day business activities such as accounting, procurement, project management, risk management and compliance, and supply chain operations.
Let me tell you about one such organization, the New York Police Department (NYPD). The NYPD uses big data analytics to anticipate and identify crimes before they occur. It analyzes historical arrest patterns and maps them against events such as federal holidays, paydays, traffic flows, and rainfall. These data patterns let the department analyze incoming information immediately and identify likely crime locations, to which it then deploys officers. By reaching these locations before crimes are committed, the department aims to prevent them from occurring.
Many organizations analyze customer behavior to improve customer satisfaction and, in turn, grow their customer base. The best example of this is Amazon, one of the most widely used e-commerce websites, with a customer base of about 300 million. Amazon uses click-stream data and historical purchase data to show each customer customized results on customized web pages. Analyzing every visitor's clicks helps it understand site-navigation behavior: the paths users took to buy a product, the paths that led them to leave the site, and more. All this information helps Amazon improve its user experience and, in turn, its sales and marketing.
Descriptive Analytics: It uses data aggregation and data mining to provide insight into the past and answer: “What has happened?” Descriptive analytics does exactly what the name implies: it “describes”, or summarizes, raw data and makes it interpretable by humans.
Diagnostic Analytics: It is used to determine why something happened in the past. It is characterized by techniques such as drill-down, data discovery, data mining and correlations. Diagnostic analytics takes a deeper look at data to understand the root causes of events.
Predictive Analytics: It uses statistical models and forecasting techniques to understand the future and answer: “What could happen?” Predictive analytics provides companies with actionable insights based on data. It provides estimates of the likelihood of a future outcome.
Prescriptive Analytics: It uses optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?” It allows users to “prescribe” a number of different possible actions and guides them towards a solution. In a nutshell, prescriptive analytics is all about providing advice.
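The four types above can be sketched on a toy data set. The following is a minimal illustration in Python, not a production workflow: the ad spend and sales figures, the candidate budgets, and the "predicted sales minus spend" objective are all hypothetical, chosen only to show one question per analytics type.

```python
# Hypothetical toy data: monthly ad spend and sales (both in thousands).
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [110, 118, 135, 150, 160, 185]

n = len(sales)
mean_spend = sum(ad_spend) / n
mean_sales = sum(sales) / n

# Descriptive: "What has happened?" Summarize the raw data.
print(f"Average monthly sales: {mean_sales:.1f}")

# Diagnostic: "Why did it happen?" Pearson correlation between spend and sales.
cov = sum((x - mean_spend) * (y - mean_sales) for x, y in zip(ad_spend, sales))
var_x = sum((x - mean_spend) ** 2 for x in ad_spend)
var_y = sum((y - mean_sales) ** 2 for y in sales)
correlation = cov / (var_x * var_y) ** 0.5
print(f"Correlation between ad spend and sales: {correlation:.3f}")

# Predictive: "What could happen?" Fit a least-squares line and extrapolate.
slope = cov / var_x
intercept = mean_sales - slope * mean_spend

def predict_sales(spend):
    """Estimate sales for a given ad spend using the fitted line."""
    return intercept + slope * spend

# Prescriptive: "What should we do?" Pick the candidate budget that
# maximizes predicted sales minus spend (an assumed, simplified objective).
candidate_budgets = [15, 20, 30]
best_budget = max(candidate_budgets, key=lambda b: predict_sales(b) - b)
print(f"Recommended budget: {best_budget}")
```

A real prescriptive system would use proper optimization or simulation rather than comparing three hand-picked options, and a high correlation alone does not establish a root cause.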
R and Python are the top programming languages used in the data analytics field. R is an open-source tool used for statistics and analytics, whereas Python is a high-level, interpreted language with an easy syntax and dynamic semantics.
QlikView is a self-service business intelligence, data visualization, and data analytics tool. Named a leader in the Gartner Magic Quadrant 2020 for Analytics and BI Platforms, it aims to accelerate business value through data by providing features such as data integration, data literacy, and data analytics.
Power BI is a Microsoft product used for business analytics. Named a leader for the 13th consecutive year in the Gartner 2020 Magic Quadrant, it provides interactive visualizations with self-service business intelligence capabilities, so end users can create dashboards and reports themselves without depending on anyone else.
Apache Spark is one of the most successful projects of the Apache Software Foundation. It is an open-source cluster computing framework used for real-time processing. One of the most active Apache projects, it has a large open-source community and provides an interface for programming clusters with implicit data parallelism and fault tolerance.
Tableau is a market-leading business intelligence tool used to analyze and visualize data in an easy format. Named a leader in the Gartner Magic Quadrant 2020 for the eighth consecutive year, Tableau lets you work on live data sets and spend more time on data analysis than on data wrangling.
Healthcare: Healthcare uses data analytics to reduce costs, predict epidemics, avoid preventable diseases and improve the quality of life in general. One of the most widespread applications of big data in healthcare is Electronic Health Records (EHRs). Most of the healthcare industry has come to recognize the importance of big data analysis in recent years.
Telecom: Telecom companies are among the most significant contributors to big data. The industry uses analytics to improve the quality of service and route traffic more effectively. By analyzing call data records in real time, these companies can identify fraudulent behavior and act on it immediately. The marketing division can modify its campaigns to better target its customers and use the insights gained to develop new products and services.
Insurance: These companies use Big Data analytics for risk assessment, fraud detection, marketing, customer insights, customer experience and more.
Government: Governments use data analytics to estimate trade within the country. For example, Central Sales Tax invoices have been used to analyze the extent to which states trade with each other.
Finance: Banks and financial services firms use analytics to differentiate fraudulent interactions from legitimate business transactions. The analytics systems suggest immediate actions, such as blocking irregular transactions, which stops fraud before it occurs and improves profitability.
Automobile: Rolls-Royce has embraced big data analysis by fitting hundreds of sensors into its engines and propulsion systems, which record every tiny detail about their operation. Changes in the data are reported in real time to engineers, who decide the best course of action, such as scheduling maintenance or dispatching engineering teams.
Education: This is one field where big data analytics is being absorbed slowly and gradually. Using big-data-powered technology as a learning tool instead of traditional lecture methods enhances students' learning and helps teachers track their performance better.
Retail: Retailers, both e-commerce and in-store, widely use big data analytics to optimize their business. Examples include Amazon and Walmart.
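The Finance note above mentions flagging irregular transactions. One simple way to illustrate the idea is a robust outlier check based on the median absolute deviation (MAD). This is only a sketch: the function name `flag_irregular`, the transaction amounts, and the threshold of 5.0 are assumptions made for illustration, and real fraud systems use far richer features and models.

```python
from statistics import median

def flag_irregular(amounts, threshold=5.0):
    """Return indices of amounts that lie far from the median.

    Distance is measured in units of the median absolute deviation (MAD).
    The default threshold is an arbitrary, illustrative choice.
    """
    center = median(amounts)
    mad = median([abs(a - center) for a in amounts])
    if mad == 0:
        # All amounts are (nearly) identical, so nothing stands out.
        return []
    return [i for i, a in enumerate(amounts) if abs(a - center) / mad > threshold]

# Hypothetical transaction history for one customer.
history = [42.0, 39.5, 45.0, 41.2, 40.8, 43.1, 980.0, 44.0]
flagged = flag_irregular(history)
print("Indices of irregular transactions:", flagged)
```

The median and MAD are used instead of the mean and standard deviation because a single very large transaction would otherwise distort the baseline it is being compared against.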
RStudio is an integrated development environment for R, a programming language for statistical computing and graphics. It is available in two formats: RStudio Desktop is a regular desktop application while RStudio Server runs on a remote server and allows accessing RStudio using a web browser.
R is a programming language and free software environment for statistical computing and graphics, supported by the R Core Team and the R Foundation for Statistical Computing. It compiles and runs on a wide variety of UNIX platforms, Windows, and macOS. To download R, choose your preferred CRAN mirror.
Python is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically-typed and garbage-collected.
KEY DIFFERENCES: R is mainly used for statistical analysis while Python provides a more general approach to data science. The primary objective of R is Data analysis and Statistics whereas the primary objective of Python is Deployment and Production.
In the end, the choice between R and Python depends on:
• The objective of your project: statistical analysis or deployment
• The amount of time you can invest
• The tool most used in your company or industry
Data visualization is the practice of translating information into a visual context, such as a map or graph, to make data easier for the human brain to understand and draw insights from. The main goal of data visualization is to make it easier to identify patterns, trends and outliers in large data sets. Knowing how to visualize data is essential for understanding the insights it holds and acting on them.
Machine learning (ML) is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.
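To make "use historical data as input to predict new output values" concrete, here is a minimal sketch of one of the simplest learning methods, a 1-nearest-neighbor classifier, in Python. The feature values, the labels, and the function name `predict_nearest` are hypothetical and chosen only for illustration.

```python
def predict_nearest(history, query):
    """Return the label of the historical example closest to `query`.

    `history` is a list of (features, label) pairs. Closeness is measured
    with squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(history, key=lambda example: sq_dist(example[0], query))
    return label

# Hypothetical historical data: (feature vector, observed outcome).
history = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

# Predict outcomes for inputs that do not appear in the historical data.
print(predict_nearest(history, (2.0, 1.5)))
print(predict_nearest(history, (8.5, 9.0)))
```

No prediction rule is written by hand here; the prediction comes entirely from the stored examples, which is the sense in which the program is not "explicitly programmed".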
Microsoft Office, or simply Office, is a family of client software, server software, and services developed by Microsoft. It was first announced by Bill Gates on August 1, 1988, at COMDEX in Las Vegas.
Structured Query Language (SQL) is a standardized programming language that is used to manage relational databases and perform various operations on the data in them.
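As a small illustration of managing a relational database with SQL, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and the rows are hypothetical.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a table and load a few hypothetical rows.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0)],
)

# Aggregate: total amount per customer, largest total first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

The same `CREATE TABLE`, `INSERT`, and `SELECT ... GROUP BY` statements carry over to other relational databases with little or no change, which is what makes SQL a core skill for data analysts.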
Presentation skills are the abilities one needs in order to deliver compelling, engaging, informative, transformative, educational, enlightening, and/or instructive presentations. Central to effective presentation skills are public speaking, tone of voice, body language, creativity, and delivery.
Critical thinking is the analysis of available facts, evidence, observations, and arguments to form a judgement. The subject is complex; several different definitions exist, which generally include the rational, skeptical, and unbiased analysis or evaluation of factual evidence.