Openware for Data Analysis
What is Openware?
• Openware refers to tools that are freely available to the public for use, modification, and distribution. This means that the underlying code of the software can be accessed, studied, modified, and shared by anyone.
• The concept of open source promotes
collaboration, transparency, and community-
driven development. It encourages
developers from around the world to
contribute improvements, fix bugs, and
create new features, leading to rapid
innovation and often higher-quality software.
• Open source software is software developed and maintained via open collaboration, and
made available, typically at no cost, for anyone to use, examine, alter and redistribute
however they like. This contrasts with proprietary or closed source software applications—
e.g. Microsoft Word, Adobe Illustrator—which are sold to end users by the creator or
copyright holder, and cannot be edited, enhanced or redistributed except as specified by the
copyright holder.
• The term open source also refers more generally to a community-based approach to
creating any intellectual property (such as software) via open collaboration, inclusiveness,
transparency, and frequent public updates.
• Open-source tools can be found in various domains, including operating systems (like Linux), web browsers (such as Mozilla Firefox), office suites (like LibreOffice), programming languages (like Python), and many other applications and tools. Open-source software is governed by licenses that determine how it can be used, shared, and modified, while ensuring that the software remains open and accessible to all.
History of open-source software
• Until the mid-1970s, computer code was seen as implicit to the
operation of the computer hardware, and not unique intellectual
property subject to copyright protection. Organizations programmed
their own software, and code sharing was a common practice.
• The Commission on New Technological Uses of Copyrighted Works (CONTU) was established in 1974 and concluded that software code was a category of creative work suitable for copyright protection. This fueled the growth of independent software publishing as an industry, with proprietary source code as the primary source of revenue.
• A rebellion of sorts against the restrictions and limitations of proprietary software began in 1983. Programmer Richard Stallman chafed at the notion that users could not customize proprietary software however they saw fit to accomplish their work. Stallman felt that software should be "free as in free speech, not as in free beer," and championed the notion of software that was freely available for customization.
• Stallman founded the Free Software Foundation and would go on to drive the
development of an open-source alternative to the AT&T-owned Unix operating
system, among other applications. He also innovated the first copyleft software
license, the GNU General Public License (GPL), which required anyone who
enhanced his source code to likewise publish their edited version freely to all.
• Because many felt that Stallman's term "free software" inaptly emphasized "free of cost" as the main value of the software, the term "open source" was adopted in 1998.
Why do users and companies choose open source?
• Reasons for choosing open-source software can vary significantly from person to
person and organization to organization.
• In many cases, end users are completely unaware of the open-source programs on
their computers or mobile devices. It is also common for end users to download a
free application like the Mozilla Firefox browser, or an Android app. These users
simply want the software’s functionality, with no intention to rewrite or even look
at the source code.
• A company, on the other hand, might choose open-source software over a
proprietary alternative for its low (or no) cost, the flexibility to customize the
source code, or the existence of a large community supporting the application.
Professional or amateur programmers might volunteer their development and
testing skills to an open-source project, often to enhance their reputation and
connect to others in the field.
Data Analysis
• Data Analysis is the process of systematically applying
statistical and/or logical techniques to describe and
illustrate, condense and recap, and evaluate data.
• While data analysis in qualitative research can include
statistical procedures, many times analysis becomes an
ongoing iterative process where data is continuously
collected and analyzed almost simultaneously. Indeed,
researchers generally analyze for patterns in
observations through the entire data collection phase.
The form of the analysis is determined by the specific qualitative approach taken (field study, ethnography, content analysis, oral history, biography, unobtrusive research) and the form of the data (field notes, documents, audiotape, videotape).
Data Analysis Process
Importance of Data Analysis
• Data analytics helps businesses understand the target market faster, increase sales, reduce costs, increase revenue, and allows for better problem-solving. Data analysis is important for several reasons, as it plays a critical role in many aspects of modern businesses and organizations. Key reasons include:
• Informed decision-making
• Identifying opportunities and challenges
• Improving efficiency and productivity
• Customer understanding and personalization
• Performance tracking and evaluation
• Predictive analytics
• Data-driven innovation
• Fraud detection and security
• Regulatory compliance
Data Analysis Methods
Descriptive Statistics
• Descriptive analysis involves summarizing and describing the main features of a dataset, such as mean, median, mode, standard deviation, range, and percentiles. It provides a basic understanding of the data's distribution and characteristics.
Inferential Statistics
• Inferential statistics are used to make inferences and draw conclusions about a larger population based on a sample of data. It includes techniques like hypothesis testing, confidence intervals, and regression analysis.
Data Visualization
• Data visualization is the graphical representation of data to help analysts and stakeholders understand patterns, trends, and insights. Common visualization techniques include bar charts, line graphs, scatter plots, heat maps, and pie charts.
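Both kinds of statistics can be sketched in a few lines of Python using only the standard library. The sample values below are made up for illustration; the confidence interval uses the normal approximation with the 1.96 critical value.

```python
import statistics
from math import sqrt

data = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.4, 13.1, 12.6, 13.9]

# Descriptive statistics: summarize the sample itself
mean = statistics.mean(data)
median = statistics.median(data)
stdev = statistics.stdev(data)               # sample standard deviation
quartiles = statistics.quantiles(data, n=4)  # 25th/50th/75th percentiles

# Inferential statistics: approximate 95% confidence interval for the
# population mean (normal approximation, z = 1.96)
se = stdev / sqrt(len(data))
ci = (mean - 1.96 * se, mean + 1.96 * se)

print(round(mean, 2), round(median, 2))
print([round(q, 2) for q in quartiles])
print(tuple(round(x, 2) for x in ci))
```

The interval quantifies the uncertainty that pure description leaves out: with a different sample, the mean would move, but it should usually stay inside an interval built this way.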
Exploratory Data Analysis (EDA)
• EDA involves analyzing and visualizing data to discover patterns,
relationships, and potential outliers. It helps in gaining insights into
the data before formal statistical testing.
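A common first EDA step is flagging potential outliers. A minimal sketch using Tukey's 1.5 × IQR rule on invented values (standard library only):

```python
import statistics

values = [10, 12, 11, 13, 12, 95, 11, 10, 13, 12, 11, 14]

# Quick profile: quartiles and interquartile range
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Tukey's rule: points beyond 1.5 * IQR from the quartiles are flagged
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < lo or v > hi]

print(min(values), max(values), outliers)
```

Here the value 95 stands out immediately, before any formal test is run.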
Predictive Modeling
• Predictive modeling uses algorithms and statistical techniques to
build models that can make predictions about future outcomes based
on historical data. Machine learning algorithms, such as decision
trees, logistic regression, and neural networks, are commonly used
for predictive modeling.
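As a minimal sketch of the idea, the following fits a logistic regression to a tiny invented dataset by batch gradient descent, then predicts on new inputs. Real projects would use a library such as scikit-learn; this is only to show the mechanics.

```python
import math

# Toy training set: one feature, binary label (1 when x is "large")
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0,   0,   0,   0,   1,   1,   1,   1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Fit weight w and bias b by batch gradient descent on the log-loss
w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(w * x + b) - y   # derivative of log-loss wrt the logit
        grad_w += err * x
        grad_b += err
    w -= lr * grad_w / len(xs)
    b -= lr * grad_b / len(xs)

predict = lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0
print(predict(1.0), predict(4.0))
```

The model learns a decision boundary between the two groups from historical examples, which is the essence of predictive modeling.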
Time Series Analysis
• Time series analysis is used to analyze data collected over time, such
as stock prices, temperature readings, or sales data. It involves
identifying trends and seasonality and forecasting future values.
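A simple smoothing-and-forecasting step can be sketched with a moving average over an invented monthly series (production forecasting would use richer models such as ARIMA):

```python
# Invented monthly sales with an upward trend
series = [100, 104, 108, 111, 115, 118, 123, 126, 130, 133]

def moving_average(data, window):
    # Average of each sliding window of the given width
    return [sum(data[i:i + window]) / window
            for i in range(len(data) - window + 1)]

smoothed = moving_average(series, 3)

# Naive one-step forecast: carry the last smoothed level forward
forecast = smoothed[-1]
print(round(forecast, 1))
```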
Factor Analysis and Principal Component Analysis (PCA)
• These techniques are used to reduce the dimensionality of data and identify underlying factors or components that explain the variance in the data.
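For two dimensions, PCA reduces to eigendecomposing a 2x2 covariance matrix, which has a closed form. A stdlib-only sketch on invented points lying near the line y = x (real work would use a library such as scikit-learn's PCA):

```python
import math

# Invented points lying roughly along the line y = x
pts = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
n = len(pts)
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n

# Sample covariance matrix entries
sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)

# Eigenvalues of the 2x2 covariance matrix, in closed form
tr, det = sxx + syy, sxx * syy - sxy ** 2
root = math.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + root) / 2, (tr - root) / 2

# Share of total variance captured by the first principal component
explained = lam1 / (lam1 + lam2)
print(round(explained, 3))
```

Because the points nearly fall on one line, a single component explains almost all the variance, which is exactly when dimensionality reduction pays off.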
Text Mining and Natural Language Processing (NLP)
• Text mining and NLP techniques are used to analyze and extract information from
unstructured text data, such as social media posts, customer reviews, or survey
responses.
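The most basic text-mining step, term-frequency counting with stopword removal, can be sketched over a few invented reviews:

```python
import re
from collections import Counter

reviews = [
    "Great product, fast delivery and great support.",
    "Delivery was slow but the product is great.",
    "Support was helpful; product works as described.",
]

# A tiny illustrative stopword list; real NLP pipelines use larger ones
stopwords = {"the", "and", "but", "is", "was", "as", "a"}

# Tokenize, lowercase, drop stopwords, then count term frequencies
tokens = []
for text in reviews:
    tokens += [w for w in re.findall(r"[a-z]+", text.lower())
               if w not in stopwords]

print(Counter(tokens).most_common(3))
```

Even this crude count surfaces the dominant topics ("great", "product") across the reviews.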
Qualitative Data Analysis
• Qualitative data analysis involves interpreting non-numeric data, such as text,
images, audio, or video. Techniques like content analysis, thematic analysis, and
grounded theory are used to analyze qualitative data.
Quantitative Data Analysis
• Quantitative analysis focuses on analyzing numerical data to discover
relationships, trends, and patterns. This analysis often involves statistical methods.
Data Mining
• Data mining involves discovering patterns, relationships, or insights from large
datasets using various algorithms and techniques.
Regression Analysis
• Regression analysis is used to model the relationship between a dependent
variable and one or more independent variables. It helps understand how changes
in one variable impact the other(s).
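For simple linear regression, the least-squares slope and intercept have a closed form, sketched here on invented data:

```python
# Least-squares fit of y = a + b*x (closed form, invented data)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Slope: covariance of x and y over variance of x
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
a = my - b * mx  # intercept: line passes through the point of means

print(round(a, 2), round(b, 2))
```

The fitted slope (about 2) is the answer to "how much does y change when x increases by one unit", which is the question regression analysis exists to answer.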
Cluster Analysis
• Cluster analysis is used to group similar data points together based on certain
features or characteristics. It helps in identifying patterns and segmenting data into
meaningful clusters.
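The standard clustering algorithm, k-means, alternates between assigning points to the nearest centroid and recomputing centroids. A minimal pure-Python sketch on two invented, well-separated segments (libraries like scikit-learn provide production versions):

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two points
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(cluster):
    # Component-wise mean of a cluster of points
    return tuple(sum(v) / len(cluster) for v in zip(*cluster))

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        # Update step: move each centroid to its cluster's mean
        centroids = [centroid(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two invented, well-separated "customer segments"
pts = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, clusters = kmeans(pts, 2)
print(sorted(centroids))
```

The algorithm recovers one centroid per segment, which is the "meaningful clusters" outcome described above.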
Openware Tools for Data Analysis: Business Perspective
Data analysis helps businesses:
• Gain insights from data
• Optimize processes
• Improve decision making
• Create value for customers
• Improve data security and privacy
• Understand their customers better
• Improve sales
• Improve customer targeting
• Reduce costs
• Create better problem-solving strategies
Openware tools are more adaptable and flexible than proprietary software. Openware tools are:
• Affordable
• Transparent
• Suited to long-term use
• Helpful for developing skills
• More secure
• Capable of high-quality results
• Able to optimize performance
• Cost-reducing
• Customizable
• Able to improve decision-making and business results
Open-Source Data Analytics Tools
01. APACHE SPARK
02. KNIME
03. RAPIDMINER
04. HADOOP
05. PENTAHO
06. GRAFANA
07. BIPP
08. CASSANDRA
09. TABLEAU
10. HPCC
1. APACHE SPARK
• Apache Spark is a lightning-fast, open-source data-processing engine for machine learning
and AI applications, backed by the largest open-source community in big data.
• Apache Spark (Spark) is an open source data-processing engine for large data sets. It is
designed to deliver the computational speed, scalability, and programmability required for
Big Data—specifically for streaming data, graph data, machine learning, and artificial
intelligence (AI) applications.
• Spark's analytics engine can process data 10 to 100 times faster than MapReduce-based alternatives. It scales by distributing processing work across large clusters of computers, with built-in parallelism and fault tolerance. It also includes APIs for programming languages popular among data analysts and data scientists, including Scala, Java, Python, and R.
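Spark itself is used through its Scala, Java, Python (PySpark), or R APIs. As a stand-in that needs no Spark installation, its lazy-evaluation model can be sketched with plain Python generators: transformations only describe a pipeline, and nothing runs until a terminal "action" pulls data through.

```python
# Spark-style pipeline sketch in plain Python (no Spark involved):
# generator expressions are lazy, like Spark transformations.
data = range(1, 11)

mapped = (x * x for x in data)                 # like rdd.map(lambda x: x * x)
filtered = (x for x in mapped if x % 2 == 0)   # like rdd.filter(...)

# "Action": triggers evaluation of the whole chain at once
total = sum(filtered)
print(total)
```

Deferring execution this way is what lets Spark plan and parallelize a whole chain of transformations before touching any data.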
Features of Apache Spark
• Fault tolerance
• Dynamic in nature
• Lazy evaluation
• Real-time stream processing
• Speed
• Reusability
• Advanced analytics
• In-memory computing
• Support for multiple languages
• Integration with Hadoop
• Cost efficiency
2. KNIME
• KNIME (Konstanz Information Miner) is an open-source data integration and analytics platform. It was developed in 2004 by software engineers at the University of Konstanz in Germany. Although first created for the pharmaceutical industry, KNIME's strength in accruing data from numerous sources into a single system has driven its application in other areas, including customer analysis, business intelligence, and machine learning.
• Its main draw (besides being free) is its usability. A drag-and-drop graphical user interface
(GUI) makes it ideal for visual programming. This means users don’t need a lot of
technical expertise to create data workflows. While it claims to support the full range of
data analytics tasks, in reality, its strength lies in data mining. Though it offers in-depth
statistical analysis too, users will benefit from some knowledge of Python and R. Being
open-source, KNIME is very flexible and customizable to an organization’s needs—
without heavy costs. This makes it popular with smaller businesses that have limited budgets.
Features of KNIME
• Scalability through sophisticated data handling
• Simple extensibility via a well-defined API for plugin extensions
• Intuitive user interface
• Import/export of workflows
• Parallel execution on multi-core systems
• Command line version for "headless" batch executions
3. RAPIDMINER
• RapidMiner uses a client/server model with the server offered either on-premises or in
public or private cloud infrastructures. Rapidminer is a comprehensive data science
platform with visual workflow design and full automation.
• RapidMiner provides data mining and machine learning procedures including data
loading and transformation (ETL), data preprocessing and visualization, predictive
analytics and statistical modeling, evaluation, and deployment. RapidMiner is written in
the Java programming language. RapidMiner provides a GUI to design and execute
analytical workflows. Those workflows are called “Processes” in RapidMiner, and they
consist of multiple “Operators”. Each operator performs a single task within the process,
and the output of each operator forms the input of the next one. Alternatively, the engine
can be called from other programs or used as an API. Individual functions can be called
from the command line. RapidMiner provides learning schemes, models and algorithms
and can be extended using R and Python scripts.
Features of RapidMiner
• Environment for data analysis and machine learning processes
• Drag-and-drop interface to design the analysis process
• Compatibility with various data sources such as Oracle, MySQL, and SPSS
• Uses XML to describe the operator trees modelling knowledge discovery processes
• Includes many learning algorithms from WEKA
• Specialised for business solutions that include predictive analysis and statistical computing
Advantages of RapidMiner
• Offers numerous procedures, especially for attribute selection and outlier detection
• Full facilities for model evaluation using cross-validation and independent validation sets
• Integrates many algorithms from related tools
• Enormous flexibility
4. HADOOP
• Hadoop is an open-source framework for storing and processing large amounts of data. It's based
on the MapReduce programming model, which allows for the parallel processing of large
datasets. Hadoop is written in Java and is used for batch/offline processing.
• Hadoop uses distributed storage and parallel processing to handle big data and analytics jobs. It
breaks workloads down into smaller workloads that can be run at the same time. Hadoop allows
clustering multiple computers to analyze massive datasets in parallel more quickly.
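The MapReduce model Hadoop is built on can be sketched in plain Python (no Hadoop involved) as three stages: map emits key/value pairs, the shuffle groups them by key, and reduce aggregates each group. Hadoop runs these stages across a cluster; here everything runs locally for illustration.

```python
from collections import defaultdict

documents = [
    "big data needs big storage",
    "hadoop stores big data",
]

# Map: emit (word, 1) pairs from each input record
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group values by key (the framework does this between phases)
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)

# Reduce: aggregate the values for each key
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts["big"], counts["data"])
```

Because each map call and each reduce call is independent, the framework can run them on different machines, which is exactly how Hadoop parallelizes massive datasets.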
Hadoop has three core components:
• HDFS (Hadoop Distributed File System): the storage unit of Hadoop
• YARN: the resource management and job scheduling layer
• MapReduce: the parallel processing engine
Key Features of Hadoop
• Cost-effectiveness
• High-level scalability
• Fault tolerance
• High availability of data
• Faster data processing
• Data locality
• Ability to process all types of data
• Machine learning capabilities
• Integration with other tools
• Security
• Community support
5. PENTAHO
• Pentaho is business intelligence software that provides data integration, OLAP
services, reporting, information dashboards, data mining and extract, transform,
load capabilities. Its headquarters are in Orlando, Florida. Pentaho was acquired
by Hitachi Data Systems in 2015 and in 2017 became part of Hitachi Vantara.
• Pentaho is a Business Intelligence tool which provides a wide range of business intelligence solutions to its customers. It is capable of reporting, data analysis, data integration, data mining, etc. Pentaho also offers a comprehensive set of BI features which allow businesses to improve performance and efficiency.
Features of Pentaho
• ETL capabilities for business intelligence needs
• Understanding Pentaho Report Designer
• Product Expertise
• Offers Side-by-side sub reports
• Unlocking new capabilities
• Professional Support
• Query and Reporting
• Offers Enhanced Functionality
• Full runtime metadata support from data sources
6. GRAFANA
• Grafana is an open and composable observability and data visualization platform. It can visualize metrics, logs, and traces from multiple sources such as Prometheus, Loki, Elasticsearch, InfluxDB, Postgres, and many more.
• Grafana is commonly paired with monitoring systems such as Prometheus, which supply the dimensional data model, flexible query language, and efficient time series database whose contents Grafana then visualizes and alerts on.
• It is a multi-platform open source analytics and interactive
visualization web application. It provides charts, graphs, and alerts for
the web when connected to supported data sources.
Grafana is an open-source data visualization and monitoring tool. Some of its features include:
• Panels: The basic building block for visualization in Grafana.
Panels can contain graphs, tables, heatmaps, and more.
• Plugins: Grafana integrates with many popular data sources.
• Graph annotations: Allows you to mark graphs to enhance
your dataset's correlation.
• Dashboards: Present data in formats like charts, tables,
histograms, heat maps, and world maps.
• Alerts: Allows you to create, manage, and silence alerts within
one UI.
• Authentication: Supports different authentication methods,
such as LDAP and OAuth.
• Logs: Allows you to tail logs in real time, update logs after a
certain time, and view logs for a particular date.
• Reporting: Allows you to automatically generate PDFs from
any of your dashboards.
7. BIPP
• Bipp is a cloud-based business intelligence (BI) platform that helps organizations use data to make faster and better decisions. Built for data analysts, it simplifies SQL queries and allows users to explore billions of records in real time.
• Users connect a data source and build reusable data models with Bipp's Data Modeling Layer, or explore data with the Visual SQL Data Explorer and create charts and dashboards in minutes.
Some features of Bipp include:
• Data Modeling Layer: Allows users to build reusable
data models.
• Visual SQL Data Explorer: Allows users to explore
data and create charts and dashboards.
• Git: Records changes and manages file versions.
• Interactive dashboards: Can act like data
applications.
• Custom visualizations: Can meet unique needs.
• Real-time performance monitoring: Allows users to
monitor and measure performance.
• Dynamic window/analytic functions
• Views from legacy SQL
• Views using structured SQL
8. CASSANDRA
• Cassandra was created at Facebook by Avinash Lakshman and Prashant Malik to power the inbox search feature, and was open-sourced in 2008. It is used by big companies like Apple, which has reportedly managed over 100 petabytes of data across hundreds of thousands of server instances.
• Cassandra is a NoSQL distributed database that manages large amounts of data across
multiple servers. It's open-source, lightweight, and non-relational. Cassandra is known for
its ability to distribute petabytes of data with high reliability and performance.
• Cassandra offers a flexible schema, supports easy replication, and has a simple API. It is also eventually consistent (with tunable consistency levels) and can handle huge amounts of data.
• Cassandra might not be the right database for many-to-many mappings or joins between
tables. It doesn't support a relational schema with foreign keys and join tables.
9. Tableau
• Tableau is a popular data visualization tool that is used by businesses
of all sizes to quickly and easily analyze data. It allows users to create
dashboards and visualizations that can be used to share insights with
stakeholders. Tableau is also used by data scientists to explore data
with limitless visual analytics.
• Tableau is a powerful data visualization tool that helps businesses derive valuable insights from their data. It allows users to create interactive dashboards. Note that, unlike the other tools in this list, Tableau is proprietary rather than open-source software, although a free edition, Tableau Public, is available.
Features of Tableau
• Tableau Dashboard
• Collaboration and Sharing
• Live and In-memory Data
• Data Sources in Tableau
• Advanced Visualizations
• Mobile View
• Revision History
• Licensing Views
• Subscribe others
10.HPCC
• HPCC Systems, or High-Performance Computing Cluster, is an open source, data-intensive
computing platform for big data processing and analytics. It was developed by LexisNexis Risk
Solutions.
• The HPCC platform incorporates a software architecture implemented on commodity
computing clusters to provide high-performance, data-parallel processing for applications
utilizing big data. The HPCC platform includes system configurations to support both parallel
batch data processing (Thor) and high-performance online query applications using indexed
data files (Roxie). The HPCC platform also includes a data-centric declarative programming
language for parallel data processing called ECL.
• The HPCC system architecture includes two distinct cluster processing environments, Thor and Roxie, each of which can be optimized independently for its parallel data processing purpose.
HPCC Systems features
• Data management and analytics: data profiling, data cleansing, snapshot data updates, and a scheduling component
• Query and search engine: the Roxie cluster contains a powerful query and built-in search engine
• Lightweight core architecture: better performance, near real-time results, and full-spectrum operational scale
• Integrated development environment: the ECL IDE is a Windows desktop application for developers to facilitate ECL code development
• Optimizer: ensures that submitted ECL code is executed at the maximum possible speed for the underlying hardware
• Fast performance
• Easy to deploy and use
• Scales from small to Big Data
• Rich API for data preparation, integration, quality checking, duplicate checking, etc.
• Parallelized machine learning algorithms for distributed data
THANK YOU
B2 Creative Industry Response Evaluation.docx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 

data analytics.pptx

  • 1. Openware for Data Analysis
  • 2. What is Openware?
• Openware refers to tools that are freely available to the public for use, modification, and distribution. This means the underlying code of the software can be accessed, studied, modified, and shared by anyone.
• The concept of open source promotes collaboration, transparency, and community-driven development. It encourages developers from around the world to contribute improvements, fix bugs, and create new features, leading to rapid innovation and often higher-quality software.
  • 3.
• Open source software is software developed and maintained via open collaboration, and made available, typically at no cost, for anyone to use, examine, alter, and redistribute however they like. This contrasts with proprietary or closed source software applications (e.g. Microsoft Word, Adobe Illustrator), which are sold to end users by the creator or copyright holder, and cannot be edited, enhanced, or redistributed except as specified by the copyright holder.
• The term open source also refers more generally to a community-based approach to creating any intellectual property (such as software) via open collaboration, inclusiveness, transparency, and frequent public updates.
• Open source tools can be found in various domains, including operating systems (like Linux), web browsers (such as Mozilla Firefox), office suites (like LibreOffice), programming languages (like Python), and many other applications and tools. Open source software is governed by licenses that determine how it can be used, shared, and modified while ensuring that the software remains open and accessible to all.
  • 4. History of open-source software
• Until the mid-1970s, computer code was seen as implicit to the operation of the computer hardware, and not unique intellectual property subject to copyright protection. Organizations programmed their own software, and code sharing was a common practice.
• The Commission on New Technological Uses of Copyrighted Works (CONTU) was established in 1974 and concluded that software code was a category of creative work suitable for copyright protection. This fueled the growth of independent software publishing as an industry, with proprietary source code as the primary source of revenue.
  • 5.
• A rebellion of sorts against the restrictions and limitations of proprietary software began in 1983. Programmer Richard Stallman chafed at the notion that users could not customize proprietary software however they saw fit to accomplish their work. Stallman felt that software should be "free as in free speech, not as in free beer," and championed the notion of software that was freely available for customization.
• Stallman founded the Free Software Foundation and would go on to drive the development of an open-source alternative to the AT&T-owned Unix operating system, among other applications. He also created the first copyleft software license, the GNU General Public License (GPL), which required anyone who enhanced his source code to likewise publish their edited version freely to all.
• Because many felt that Stallman's term "free software" inaptly emphasized "free of cost" as the main value of the software, the term "open source" was adopted in 1998.
  • 6. Why do users and companies choose open source?
• Reasons for choosing open-source software can vary significantly from person to person and organization to organization.
• In many cases, end users are completely unaware of the open-source programs on their computers or mobile devices. It is also common for end users to download a free application like the Mozilla Firefox browser, or an Android app. These users simply want the software's functionality, with no intention to rewrite or even look at the source code.
• A company, on the other hand, might choose open-source software over a proprietary alternative for its low (or no) cost, the flexibility to customize the source code, or the existence of a large community supporting the application. Professional or amateur programmers might volunteer their development and testing skills to an open-source project, often to enhance their reputation and connect with others in the field.
  • 7. Data Analysis
• Data analysis is the process of systematically applying statistical and/or logical techniques to describe and illustrate, condense and recap, and evaluate data.
• While data analysis in qualitative research can include statistical procedures, the analysis often becomes an ongoing iterative process in which data is collected and analyzed almost simultaneously. Indeed, researchers generally analyze for patterns in observations throughout the entire data collection phase. The form of the analysis is determined by the specific qualitative approach taken (field study, ethnography, content analysis, oral history, biography, unobtrusive research) and the form of the data (field notes, documents, audiotape, videotape).
  • 9. Importance of Data Analysis
• Data analytics helps businesses understand the target market faster, increase sales, reduce costs, increase revenue, and solve problems more effectively. Data analysis plays a critical role in many aspects of modern businesses and organizations. The importance of data analysis includes:
• Informed decision-making
• Identifying opportunities and challenges
• Improving efficiency and productivity
• Customer understanding and personalization
• Performance tracking and evaluation
• Predictive analytics
• Data-driven innovation
• Fraud detection and security
• Regulatory compliance
  • 11. Data Analysis Methods
Descriptive Statistics
• Descriptive analysis involves summarizing and describing the main features of a dataset, such as mean, median, mode, standard deviation, range, and percentiles. It provides a basic understanding of the data’s distribution and characteristics.
Inferential Statistics
• Inferential statistics are used to make inferences and draw conclusions about a larger population based on a sample of data. Techniques include hypothesis testing, confidence intervals, and regression analysis.
Data Visualization
• Data visualization is the graphical representation of data to help analysts and stakeholders understand patterns, trends, and insights. Common visualization techniques include bar charts, line graphs, scatter plots, heat maps, and pie charts.
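The descriptive measures named above can be sketched with nothing more than Python's standard library; the sales figures here are illustrative values, not data from the slides:

```python
import statistics

# Hypothetical sample: monthly sales figures (illustrative only)
sales = [12, 15, 11, 15, 20, 18, 15, 9]

mean = statistics.mean(sales)          # central tendency
median = statistics.median(sales)      # middle value of the sorted data
mode = statistics.mode(sales)          # most frequent value
stdev = statistics.stdev(sales)        # sample standard deviation
value_range = max(sales) - min(sales)  # spread between extremes

print(mean, median, mode, round(stdev, 2), value_range)
```

In practice the same measures would be computed with a library such as pandas or NumPy, but the arithmetic is identical.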
  • 12. Exploratory Data Analysis (EDA)
• EDA involves analyzing and visualizing data to discover patterns, relationships, and potential outliers. It helps in gaining insights into the data before formal statistical testing.
Predictive Modeling
• Predictive modeling uses algorithms and statistical techniques to build models that can make predictions about future outcomes based on historical data. Machine learning algorithms, such as decision trees, logistic regression, and neural networks, are commonly used for predictive modeling.
Time Series Analysis
• Time series analysis is used to analyze data collected over time, such as stock prices, temperature readings, or sales data. It involves identifying trends and seasonality and forecasting future values.
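As a minimal sketch of the time series smoothing and forecasting idea above, here is a trailing moving average in plain Python; the quarterly sales values are hypothetical, and real forecasting would use dedicated models (ARIMA, exponential smoothing, etc.):

```python
def moving_average(series, window):
    """Smooth a time series with a simple trailing moving average."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Hypothetical quarterly sales (illustrative values)
sales = [10, 12, 14, 13, 15, 17, 16, 18]
smoothed = moving_average(sales, window=3)

# A naive one-step forecast: carry the last smoothed value forward
forecast = smoothed[-1]
```

The smoothing removes quarter-to-quarter noise so the underlying upward trend is easier to see, which is the first step most time series methods share.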
  • 13. Factor Analysis and Principal Component Analysis (PCA)
• These techniques are used to reduce the dimensionality of data and identify underlying factors or components that explain the variance in the data.
Text Mining and Natural Language Processing (NLP)
• Text mining and NLP techniques are used to analyze and extract information from unstructured text data, such as social media posts, customer reviews, or survey responses.
Qualitative Data Analysis
• Qualitative data analysis involves interpreting non-numeric data, such as text, images, audio, or video. Techniques like content analysis, thematic analysis, and grounded theory are used to analyze qualitative data.
Quantitative Data Analysis
• Quantitative analysis focuses on analyzing numerical data to discover relationships, trends, and patterns. This analysis often involves statistical methods.
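A tiny text mining sketch, under the assumption that "extracting information" here means frequent-term extraction from reviews; the review text and stopword list are invented for illustration:

```python
import re
from collections import Counter

def top_terms(text, k=3, stopwords=frozenset({"the", "a", "is", "and", "of"})):
    """Tokenize free text and return the k most frequent non-stopword terms."""
    tokens = re.findall(r"[a-z']+", text.lower())   # lowercase word tokens
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(k)

reviews = "The battery is great and the battery lasts. Screen is great."
print(top_terms(reviews, k=2))
```

Real NLP pipelines (e.g. spaCy or NLTK) add stemming, named-entity recognition, and sentiment scoring on top of this same tokenize-and-count foundation.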
  • 14. Data Mining
• Data mining involves discovering patterns, relationships, or insights from large datasets using various algorithms and techniques.
Regression Analysis
• Regression analysis is used to model the relationship between a dependent variable and one or more independent variables. It helps understand how changes in one variable impact the other(s).
Cluster Analysis
• Cluster analysis is used to group similar data points together based on certain features or characteristics. It helps in identifying patterns and segmenting data into meaningful clusters.
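The simplest case of the regression analysis described above, one dependent and one independent variable, can be solved in closed form with ordinary least squares. A sketch in plain Python, with invented ad-spend/sales numbers chosen to lie exactly on a line:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = a + b*x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # intercept passes through the means
    return a, b

# Hypothetical data: ad spend (x) vs. sales (y), illustrative only
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

The slope b answers the slide's question directly: how much the dependent variable changes for a one-unit change in the independent variable.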
  • 15. Openware tools for Data Analysis: Business Perspective
Data analysis helps businesses:
• Gain insights from data
• Optimize processes
• Improve decision making
• Create value for customers
• Improve data security and privacy
• Understand their customers better
• Improve sales
• Improve customer targeting
• Reduce costs
• Create better problem-solving strategies
  • 16. Openware tools are more adaptable and flexible than proprietary software. Openware is:
• Affordable
• Transparent
• Suited to long-term use
• Helpful for developing skills
• More secure
• Customizable
Openware delivers:
• High-quality results
• Optimized performance
• Reduced costs
• Improved decision-making and business results
  • 17. Open-Source Data Analytics Tools 01. APACHE SPARK 02. KNIME 03. RAPIDMINER 04. HADOOP 05. PENTAHO
  • 18. 06. GRAFANA 07. BIPP 08. CASSANDRA 09. TABLEAU 10. HPCC
  • 19. 1. APACHE SPARK
• Apache Spark is a lightning-fast, open-source data-processing engine for machine learning and AI applications, backed by the largest open-source community in big data.
• Apache Spark (Spark) is an open source data-processing engine for large data sets. It is designed to deliver the computational speed, scalability, and programmability required for big data, specifically for streaming data, graph data, machine learning, and artificial intelligence (AI) applications.
• Spark's analytics engine processes data 10 to 100 times faster than alternatives. It scales by distributing processing work across large clusters of computers, with built-in parallelism and fault tolerance. It also includes APIs for programming languages that are popular among data analysts and data scientists, including Scala, Java, Python, and R.
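Spark's core idea, splitting a job into partitions processed in parallel and then combining the partial results, can be illustrated very loosely with Python's standard library. This is a toy single-machine stand-in, not Spark's API; a real Spark job would express the same thing through an RDD or DataFrame on a cluster:

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Work done by one 'executor' on its partition of the data."""
    return sum(x * x for x in chunk)

data = list(range(1, 1001))

# Split the dataset into 4 partitions, as a cluster scheduler would
partitions = [data[i::4] for i in range(4)]

# Run the partitions concurrently, then combine the partial results
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, partitions))

# Same answer as the sequential computation
assert total == sum(x * x for x in data)
```

Spark adds what this sketch lacks: fault tolerance (recomputing lost partitions), lazy evaluation, and scheduling across many machines rather than threads in one process.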
  • 21. Features of Apache Spark
• Fault tolerance
• Dynamic in nature
• Lazy evaluation
• Real-time stream processing
• Speed
• Reusability
• Advanced analytics
• In-memory computing
• Support for multiple languages
• Integration with Hadoop
• Cost efficiency
  • 22. 2. KNIME
• KNIME (Konstanz Information Miner) is an open-source data analytics and integration platform. It was developed in 2004 by software engineers at the University of Konstanz in Germany. Although first created for the pharmaceutical industry, KNIME's strength in accruing data from numerous sources into a single system has driven its application in other areas. These include customer analysis, business intelligence, and machine learning.
• Its main draw (besides being free) is its usability. A drag-and-drop graphical user interface (GUI) makes it ideal for visual programming, meaning users don't need a lot of technical expertise to create data workflows. While it claims to support the full range of data analytics tasks, in reality its strength lies in data mining. Though it offers in-depth statistical analysis too, users will benefit from some knowledge of Python and R. Being open-source, KNIME is very flexible and customizable to an organization's needs, without heavy costs. This makes it popular with smaller businesses that have limited budgets.
  • 23. Features of KNIME
• Scalability through sophisticated data handling
• High, simple extensibility via a well-defined API for plugin extensions
• Intuitive user interface
• Import/export of workflows
• Parallel execution on multi-core systems
• Command-line version for "headless" batch executions
  • 24. 3. RAPIDMINER
• RapidMiner uses a client/server model, with the server offered either on-premises or in public or private cloud infrastructures. RapidMiner is a comprehensive data science platform with visual workflow design and full automation.
• RapidMiner provides data mining and machine learning procedures including data loading and transformation (ETL), data preprocessing and visualization, predictive analytics and statistical modeling, evaluation, and deployment. RapidMiner is written in the Java programming language. It provides a GUI to design and execute analytical workflows. Those workflows are called "Processes" in RapidMiner, and they consist of multiple "Operators". Each operator performs a single task within the process, and the output of each operator forms the input of the next one. Alternatively, the engine can be called from other programs or used as an API, and individual functions can be called from the command line. RapidMiner provides learning schemes, models, and algorithms, and can be extended using R and Python scripts.
  • 25. Features of RapidMiner
• Environment for data analysis and machine learning processes
• Drag-and-drop interface to design the analysis process
• Compatibility with various databases like Oracle, MySQL, SPSS, etc.
• Uses XML to describe the operator trees that model the knowledge discovery process
• Includes many learning algorithms from WEKA
• Specialized for business solutions that include predictive analytics and statistical computing
Advantages of RapidMiner
• Offers numerous procedures, especially in the areas of attribute selection and outlier detection
• Full facilities for model evaluation using cross-validation and independent validation sets
• Integrates many algorithms from related tools
• Enormous flexibility
  • 26. 4. HADOOP
• Hadoop is an open-source framework for storing and processing large amounts of data. It's based on the MapReduce programming model, which allows for the parallel processing of large datasets. Hadoop is written in Java and is used for batch/offline processing.
• Hadoop uses distributed storage and parallel processing to handle big data and analytics jobs. It breaks workloads down into smaller workloads that can be run at the same time, clustering multiple computers to analyze massive datasets in parallel more quickly. Hadoop's core components are:
• Hadoop HDFS: The distributed storage unit of Hadoop
• Hadoop YARN: The resource management and job scheduling layer
• Hadoop MapReduce: The parallel processing engine
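The MapReduce model mentioned above can be sketched in plain Python: map emits key/value pairs from each input record, a shuffle groups them by key, and reduce aggregates each group. This is a toy single-machine analogue of the idea, not Hadoop's Java API:

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) for every word in one input record
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's list of values
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data big plans", "big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
# counts maps each word to its total occurrence count
```

In Hadoop, the map and reduce functions run on different machines, with HDFS supplying the input splits and the shuffle happening over the network; the program structure is the same.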
  • 28. Key Features of Hadoop
• Cost-effectiveness
• High-level scalability
• Fault tolerance
• High availability of data
• Faster data processing
• Data locality
• Ability to process all types of data
• Machine learning capabilities
• Integration with other tools
• Security
• Community support
  • 29. 5. PENTAHO
• Pentaho is business intelligence software that provides data integration, OLAP services, reporting, information dashboards, data mining, and extract, transform, load (ETL) capabilities. Its headquarters are in Orlando, Florida. Pentaho was acquired by Hitachi Data Systems in 2015 and in 2017 became part of Hitachi Vantara.
• Pentaho is a business intelligence tool which provides a wide range of business intelligence solutions to customers. It is capable of reporting, data analysis, data integration, data mining, etc. Pentaho also offers a comprehensive set of BI features which allow businesses to improve performance and efficiency.
  • 31. Features of Pentaho
• ETL capabilities for business intelligence needs
• Pentaho Report Designer
• Product expertise
• Side-by-side sub-reports
• Unlocking new capabilities
• Professional support
• Query and reporting
• Enhanced functionality
• Full runtime metadata support from data sources
  • 32. 6. GRAFANA
• Grafana is an open and composable observability and data visualization platform. It visualizes metrics, logs, and traces from multiple sources like Prometheus, Loki, Elasticsearch, InfluxDB, Postgres, and many more.
• It pairs with monitoring backends such as Prometheus, which supply a dimensional data model, a flexible query language, and an efficient time series database, and layers a modern alerting approach on top.
• It is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.
  • 33. Grafana is an open-source data visualization and monitoring tool. Some of its features include:
• Panels: The basic building block for visualization in Grafana. Panels can contain graphs, tables, heatmaps, and more.
• Plugins: Grafana integrates with many popular data sources.
• Graph annotations: Allow you to mark graphs to enhance your dataset's correlation.
• Dashboards: Present data in formats like charts, tables, histograms, heat maps, and world maps.
• Alerts: Allow you to create, manage, and silence alerts within one UI.
• Authentication: Supports different authentication methods, such as LDAP and OAuth.
• Logs: Allows you to tail logs in real time, update logs after a certain time, and view logs for a particular date.
• Reporting: Allows you to automatically generate PDFs from any of your dashboards.
  • 34. 7. BIPP
• bipp is a business intelligence (BI) platform that helps organizations use data to make faster and better decisions. It is a cloud-based platform that allows users to explore billions of records in real time. It's built for data analysts and simplifies SQL queries.
• bipp is a modern, cloud business intelligence platform that lets you explore billions of records in real time. Simply connect your data source and build reusable data models with bipp's Data Modeling Layer, or explore your data with the Visual SQL Data Explorer and create charts and dashboards in minutes.
  • 35. bipp is a cloud-based business intelligence platform that allows users to explore billions of records in real time. Some features of bipp include:
• Data Modeling Layer: Allows users to build reusable data models.
• Visual SQL Data Explorer: Allows users to explore data and create charts and dashboards.
• Git integration: Records changes and manages file versions.
• Interactive dashboards: Can act like data applications.
• Custom visualizations: Can meet unique needs.
• Real-time performance monitoring: Allows users to monitor and measure performance.
• Dynamic window/analytic functions
• Views from legacy SQL
• Views using structured SQL
  • 36. 8. CASSANDRA
• Cassandra was created at Facebook by Avinash Lakshman and Prashant Malik to power the inbox search feature, and was released as open source in 2008. It's used by big companies like Apple, which manages 100 petabytes of data across hundreds of thousands of server instances.
• Cassandra is a NoSQL distributed database that manages large amounts of data across multiple servers. It's open-source, lightweight, and non-relational, and is known for its ability to distribute petabytes of data with high reliability and performance.
• Cassandra is schema-flexible, supports easy replication, and has a simple API. It's also eventually consistent and can handle huge amounts of data.
• Cassandra might not be the right database for many-to-many mappings or joins between tables: it doesn't support a relational schema with foreign keys and join tables.
  • 39. 9. Tableau
• Tableau is a popular data visualization tool used by businesses of all sizes to quickly and easily analyze data. It allows users to create dashboards and visualizations that can be used to share insights with stakeholders. Tableau is also used by data scientists to explore data with limitless visual analytics.
• Tableau is a powerful data visualization tool that helps businesses derive valuable insights from their data, and it allows users to create interactive dashboards. Strictly speaking, Tableau is proprietary software; its free tier, Tableau Public, is often grouped with open tools, but its source code is not open.
  • 41. Features of Tableau
• Tableau Dashboard
• Collaboration and sharing
• Live and in-memory data
• Data sources in Tableau
• Advanced visualizations
• Mobile view
• Revision history
• Licensing views
• Subscribing others
  • 42. 10. HPCC
• HPCC Systems, or High-Performance Computing Cluster, is an open source, data-intensive computing platform for big data processing and analytics. It was developed by LexisNexis Risk Solutions.
• The HPCC platform incorporates a software architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data. The platform includes system configurations to support both parallel batch data processing (Thor) and high-performance online query applications using indexed data files (Roxie). It also includes a data-centric declarative programming language for parallel data processing called ECL.
• The HPCC system architecture includes two distinct cluster processing environments, Thor and Roxie, each of which can be optimized independently for its parallel data processing purpose.
  • 43. HPCC Systems features
• Data management and analytics: Data profiling, data cleansing, snapshot data updates, and a scheduling component
• Query and search engine: The Roxie cluster contains a powerful query engine and built-in search engine
• Lightweight core architecture: Better performance, near real-time results, and full-spectrum operational scale
• Integrated development environment: The ECL IDE is a Windows desktop application for developers to facilitate ECL code development
• Optimizer: Ensures that submitted ECL code is executed at the maximum possible speed for the underlying hardware
• Fast performance; easy to deploy and use; scales from small to big data
• Rich API for data preparation, integration, quality checking, duplicate checking, etc.
• Parallelized machine learning algorithms for distributed data