The document analyzes a superstore dataset containing sales data from 2015 to 2018 to understand shopping patterns and identify profitable products and regions. Visualizations show that the Western region had the most orders and that the most common order quantity was a single item. Technology was the most profitable category, at 36% of sales. A treemap showed that phones had the highest sales while furniture ran at a loss. Sales and profit were somewhat correlated, while profit and discount were negatively correlated. Word clouds indicated that products related to Xerox, binders, chairs, and Avery were ordered most frequently.
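The correlation findings above can be reproduced on any sales table with a few lines of pandas. This is a minimal sketch on invented rows; the Sales, Profit, and Discount column names are assumptions, not taken from the actual superstore file.

```python
import pandas as pd

# Hypothetical superstore-style rows (illustrative values, not real data).
df = pd.DataFrame({
    "Sales":    [120.0, 80.0, 300.0, 45.0, 210.0, 95.0],
    "Profit":   [30.0, 12.0, 90.0, -5.0, 55.0, 10.0],
    "Discount": [0.0, 0.2, 0.0, 0.4, 0.1, 0.3],
})

# Pearson correlations: sales vs. profit, and profit vs. discount.
sales_profit = df["Sales"].corr(df["Profit"])
profit_discount = df["Profit"].corr(df["Discount"])
print(round(sales_profit, 2), round(profit_discount, 2))
```

On this toy data the same pattern appears: sales and profit correlate positively, while profit and discount correlate negatively.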
Business analytics is the practice of iterative statistical analysis of a company's data to support data-driven decision making. It has evolved from early uses of basic graphs and spreadsheets to track sales trends and predict outcomes, to modern applications that gain insights from large volumes of historical data using descriptive analytics and predict customer behavior using predictive analytics to inform real-time decisions. Common business analytics tools include SPSS for statistical analysis and Microsoft Excel for calculations, graphs, and pivot tables.
This document provides an overview of Tableau, a business intelligence software for data visualization and analytics. It outlines the 7 key steps to get insights from data quickly using Tableau: 1) connect to a data source, 2) manage the data, 3) create visualizations, 4) edit visualizations, 5) create additional visualizations, 6) build interactive dashboards, and 7) share visualizations. Tableau offers an easy and fast way to transform data into interactive visuals that help users identify patterns and trends to inform business decisions.
The document discusses the history and evolution of information systems over six periods from the 1950s to present:
1) 1950s: Transaction processing systems for electronic data processing
2) 1960s-1970s: Emergence of management information systems to provide reports for managers
3) 1970s-1980s: Development of personal computers and decision support systems for interactive analysis
4) 1980s-1990s: Creation of executive information systems and growth of the internet
5) 1990s-2000s: Applications of artificial intelligence like expert systems and knowledge management systems
6) 2000s-present: Rise of e-business, e-commerce, mobile technologies, big data, and cloud computing.
A 45-minute talk given at the LondonR March 2014 Meetup.
The presentation describes how one might go about an insights-driven data science project with the R language and packages, using an open source dataset.
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic... - Edureka!
Data Analytics for R Course: https://www.edureka.co/r-for-analytics
This Edureka Tutorial on Data Analytics for Beginners will help you learn the various parameters you need to consider while performing data analysis.
The following are the topics covered in this session:
Introduction To Data Analytics
Statistics
Data Cleaning and Manipulation
Data Visualization
Machine Learning
Roles, Responsibilities and Salary of Data Analyst
Need of R
Hands-On
Statistics for Data Science: https://youtu.be/oT87O0VQRi8
The document provides an introduction to data analytics, including defining key terms like data, information, and analytics. It outlines the learning outcomes which are the basic definition of data analytics concepts, different variable types, types of analytics, and the analytics life cycle. The analytics life cycle is described in detail and involves problem identification, hypothesis formulation, data collection, data exploration, model building, and model validation/evaluation. Different variable types like numerical, categorical, and ordinal variables are also defined.
Download at http://DavidHubbard.net/powerpoint - This Introduction to Business Intelligence gives an overview of how Business Intelligence fits into business strategy in general. It does not cover the specific technologies of Business Intelligence; it is meant to explain Business Intelligence to those not already familiar with it.
Business Intelligence made easy! This is the first part of a two-part presentation I prepared for one of our customers to help them understand what Business Intelligence is and what can it do...
Big Data Analysis in Supply Chain Management - Kushal Shah
Big data means larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software simply cannot manage them. But these massive volumes of data can be used to address business problems that were previously impossible to tackle.
The supply chain industry needs this kind of data to survive in all situations.
Best Practices for Killer Data Visualization - Qualtrics
There’s something special about simple, powerful visualizations that tell a story. In fact, 65% of people are visual learners.
Join Qualtrics and Sasha Pasulka from Tableau as we illuminate the world of data visualization and give you clear takeaways to help you tell a better story with data. Getting executive buy-in or that seat at the table may come down to who can visualize data in a way that excites and enlightens the audience.
Data Analytics PowerPoint Presentation Slides - SlideTeam
This document discusses different sources of big data including media, cloud, web, internet of things, databases, social networks, activity-generated data, and legacy documents. It provides brief descriptions of each source, highlighting how they generate valuable insights. Media such as images, videos and social media provide consumer preference data. Cloud storage accommodates structured and unstructured data to provide real-time insights. The web and internet of things generate machine-generated data from various devices. Databases integrate traditional and modern data sources. Social networks and reviews provide user profile and influencer data. Activity logs also contribute to big data. Legacy documents remain an untapped resource.
This document provides an overview of the Power BI learning journey. It outlines the basic, intermediate, and advanced levels which include understanding Power Query, Power Pivot, DAX, Power View, and building reports in Power BI Desktop and the Power BI web/mobile apps. The three main stages are discover (with Power Query), analyze (with Power Pivot and DAX), and visualize (with Power View, Power Map, and Power BI tools). Understanding functions like CALCULATE, relationships, and measures is important for effective data modeling and dashboard creation in Power BI. Upcoming features and resources for continued learning are also mentioned.
Business analytics uses data, statistical analysis, and other quantitative techniques to help understand and optimize business performance. It is becoming a major tool used by many large corporations. There are various tools and techniques for business analytics, including online analytical processing (OLAP), data visualization, data mining, predictive analysis, and geographic information systems (GIS). Real-time business intelligence and automated decision support are also increasingly important for analytics.
This document discusses online analytical processing (OLAP) and related concepts. It defines data mining, data warehousing, OLTP, and OLAP. It explains that a data warehouse integrates data from multiple sources and stores historical data for analysis. OLAP allows users to easily extract and view data from different perspectives. The document also discusses OLAP cube operations like slicing, dicing, drilling, and pivoting. It describes different OLAP architectures like MOLAP, ROLAP, and HOLAP and data warehouse schemas and architecture.
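The cube operations described above (slicing, roll-up, drill-down) can be sketched with a pandas pivot table. This is an illustrative toy fact table, not any specific warehouse schema; the region, category, year, and sales columns are assumptions.

```python
import pandas as pd

# Toy fact table: each row is one sale (dimension columns plus a measure).
facts = pd.DataFrame({
    "region":   ["West", "West", "East", "East", "West", "East"],
    "category": ["Tech", "Office", "Tech", "Office", "Tech", "Tech"],
    "year":     [2017, 2017, 2017, 2018, 2018, 2018],
    "sales":    [100, 50, 80, 40, 120, 90],
})

# Build a small "cube": sales aggregated over region x category.
cube = facts.pivot_table(index="region", columns="category",
                         values="sales", aggfunc="sum", fill_value=0)

# Slice: fix one dimension value (category == "Tech").
tech_slice = cube["Tech"]

# Roll-up: aggregate away the category dimension, keeping region totals.
rollup = facts.groupby("region")["sales"].sum()

# Drill-down: add the year dimension for finer granularity.
drilldown = facts.groupby(["region", "year"])["sales"].sum()
print(rollup.to_dict())
```

A real MOLAP/ROLAP engine precomputes and stores these aggregates; the pandas version only mimics the query semantics in memory.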
This is the first presentation of a two-part webinar on Blue Ocean Strategy.
The presentation introduces red ocean and blue ocean companies and explains how blue ocean strategy is a simultaneous pursuit of low cost and value.
It provides a quick introduction, with contemporary examples, to the strategy canvas, the six paths framework, the four actions framework, the buyer utility map, the three tiers of noncustomers, and PMS maps.
It also applies these frameworks in descriptive case studies of companies such as NetJets, Indochino.com, Zynga, and Khan Academy.
The presentation aims to explain the value of blue ocean strategy thinking to a general audience and does not intend any distortion of the facts and frameworks of the original authors, Chan Kim and Renée Mauborgne.
Data visualizations make huge amounts of data more accessible and understandable. Data visualization, or "data viz," is becoming increasingly important as the amount of data generated grows and big data tools help create meaning from all of that data.
This SlideShare presentation takes you through more details around data visualization and includes examples of some great data visualization pieces.
The document discusses business intelligence and data warehousing. It describes the evolution of business intelligence from manual data retrieval and report preparation to modern integrated systems that provide analytics, dashboards, reporting, and key performance indicators. The Performa BI Suite is presented as a user-friendly business intelligence software that offers advanced visualization tools, multidimensional analysis, and integrated analytics, dashboards, and reporting in a single platform. Testimonials from users praise Performa for its ease of use, stability, and ability to meet reporting and analysis needs.
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms - Hariteja Bodepudi
This document summarizes a research paper that uses unsupervised machine learning algorithms to detect credit card fraud. It describes how credit card fraud has increased with the rise of online shopping and payments. Unsupervised algorithms are well-suited for this task since labeled fraud data can be difficult to obtain. The paper tests Isolation Forest, Local Outlier Factor, and One Class SVM on a credit card transaction dataset to find anomalies (fraudulent transactions). Isolation Forest achieved the highest accuracy at 99.74%, slightly outperforming Local Outlier Factor, while One Class SVM had much lower accuracy. The paper concludes unsupervised algorithms are effective for anomaly detection tasks like credit card fraud detection.
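As a hedged sketch of the unsupervised approach the paper describes, scikit-learn's IsolationForest can flag outlying transactions. The data here is synthetic (two made-up features with five injected outliers), not the credit card dataset the paper actually used, and the contamination rate is an assumption.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic "transactions": 200 normal points plus 5 obvious outliers.
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = np.array([[8.0, 8.0], [9.0, -8.0], [-8.0, 9.0],
                     [10.0, 0.0], [0.0, -10.0]])
X = np.vstack([normal, outliers])

# contamination is the assumed fraction of anomalies in the data.
model = IsolationForest(n_estimators=100, contamination=0.025, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

n_flagged = int((labels == -1).sum())
print(n_flagged)
```

Because the method needs no fraud labels, it fits the paper's premise that labeled fraud data is hard to obtain; the injected extreme points end up flagged as anomalies.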
The document provides an overview of data, information, knowledge, and data mining. It defines data as facts/observations/measurements, information as processed data that is useful (e.g. for decision making), and knowledge as patterns in data/information with a high degree of certainty. Data mining is described as the process of extracting useful but non-obvious information from large databases through an interactive and iterative process. Common business applications and technologies involved in data mining are also discussed.
Visualisation & Storytelling in Data Science & Analytics - Felipe Rego
The document provides an overview of data visualization and storytelling in data science and analytics. It discusses key concepts like what data visualization is, compelling reasons to visualize data like Anscombe's Quartet, visualization in the context of analytics workflows, components of effective storytelling, considerations for presentation, guidelines for data storytelling, and examples of interesting data visualizations. Throughout the document, the author emphasizes best practices like keeping visualizations clear, addressing the intended audience, and avoiding bias.
Data wrangling involves transforming raw data into a usable format through processes like merging data sources, identifying and removing gaps/errors, and structuring data. The main steps of data wrangling are discovery, structuring, cleaning, enriching, validating, and publishing. Data wrangling is important because it ensures data is reliable before analysis, improving insights and reducing risks from faulty data. It typically requires significant time and resources but yields major benefits like improved data usability, integration, and analytics. Common tools for data wrangling include Excel, OpenRefine, Tabula, Google DataPrep, and Data Wrangler.
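The structuring, cleaning, enriching, and validating steps listed above look roughly like this in pandas. This is a minimal sketch; the column names and cleaning rules are invented for illustration.

```python
import numpy as np
import pandas as pd

# Raw, messy input: mixed-case names, a duplicate row, missing values.
raw = pd.DataFrame({
    "name":  ["Alice", "alice", "Bob", "Bob", None],
    "price": ["10.5", "10.5", None, "7.0", "3.2"],
})

# Structuring: enforce types (price arrives as strings).
raw["price"] = pd.to_numeric(raw["price"], errors="coerce")

# Cleaning: normalize case, drop exact duplicates and rows missing a name.
raw["name"] = raw["name"].str.title()
clean = raw.drop_duplicates().dropna(subset=["name"])

# Enriching: derive a new column from existing data.
clean = clean.assign(price_band=np.where(clean["price"] >= 10, "high", "low"))

# Validating: assert the invariants the analysis depends on.
assert clean["name"].notna().all()
print(len(clean))
```

Dedicated tools like OpenRefine or Data Wrangler automate much of this, but the sequence of steps is the same.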
OLAP (online analytical processing) allows users to easily extract and view data from different perspectives. The term was coined by Edgar F. Codd in 1993, and OLAP uses multidimensional data structures called cubes to store and analyze data. OLAP utilizes a multidimensional (MOLAP), relational (ROLAP), or hybrid (HOLAP) approach to store cube data in databases and provide interactive analysis of data.
The client provided KPMG with 3 datasets for analysis: customer demographic data, customer address data, and transactions data from the past 3 months. The transactions data contains 20,000 rows with 26 columns, including customer, product, and transaction information. Some columns have missing values. There are no duplicate rows. Dates in one column are invalid as they all occur on the same day. The new customer list contains 1,000 rows of customer profile data. The data quality assessment found issues with missing values, invalid dates, and potential for further cleaning and analysis.
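A data-quality pass like the one described can be sketched in pandas. The table below is a toy stand-in with invented column names, not KPMG's actual schema; it reproduces the same three issue types (missing values, duplicates, an all-identical date column).

```python
import pandas as pd

# Toy transactions table with the kinds of issues the assessment found.
tx = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 4],
    "amount":      [100.0, None, 250.0, 80.0, 80.0],
    "date":        ["2017-01-01"] * 5,  # suspicious: every date identical
})

missing_per_column = tx.isna().sum()       # missing-value counts per column
n_duplicates = int(tx.duplicated().sum())  # fully duplicated rows
all_same_date = tx["date"].nunique() == 1  # flags the invalid-date issue

print(missing_per_column["amount"], n_duplicates, all_same_date)
```

Running checks like these before any modeling is exactly the "data quality assessment" stage the summary refers to.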
D365 Finance & Operations - Data & Analytics (see newer release of this docum...) - Gina Pabalan
This very comprehensive white paper provides a detailed and clear overview of Microsoft's D365 Finance & Operations solutions to support Data & Analytics.
There is a newer version of this available - search SlideShare for the new version of this deck.
Business Intelligence Presentation 1 (15th March '16) - Muhammad Fahad
Business intelligence (BI) involves methods, processes, technologies, and tools to convert data into useful information that helps organizations make better plans and decisions. It has evolved from executive information systems and decision support systems in the 1980s to include data warehousing, dashboards, analytics, and big data capabilities today. BI provides benefits like improved management and operations, better adjustments to trends, and the ability to predict the future. It has applications across private and public sector organizations. The BI process involves requirements analysis, data modeling, ETL, analytics, and presentation. Key components are the data warehouse, OLAP, data mining, and visualization tools like reports, dashboards, and scorecards. The global BI market is expected to grow significantly
D365 F&O - Data and Analytics White Paper - Gina Pabalan
This very comprehensive white paper provides a detailed and clear overview of Microsoft's D365 Finance & Operations solutions to support Data & Analytics.
Customer Clustering for Retailer Marketing - Jonathan Sedar
This was a 90-minute talk given to the Dublin R user group in November 2013. It describes how one might go about a data analysis project using the R language and packages, using an open source dataset.
This document provides an overview of online analytical processing (OLAP). It defines OLAP as a process for analyzing multidimensional data to help decision makers. OLAP uses data warehouses to store historical data in a structured format. It allows for analytical queries and operations like aggregation, roll-up, drill-down and slicing and dicing of data. SQL extensions and OLAP functions further aid analysis. OLAP systems can be MOLAP, ROLAP or HOLAP based on their architecture and data storage methods. Commercial OLAP systems include IBM, Oracle and Microsoft products.
Analyst View of Data Virtualization: Conversations with Boulder Business Inte... - Denodo
In this presentation, executives from Denodo preview the new Denodo Platform 6.0 release that delivers Dynamic Query Optimizer, cloud offering on Amazon Web Services, and self-service data discovery and search. Over 30 analysts, led by Claudia Imhoff, provide input on strategic direction and benefits of Denodo 6.0 to the data virtualization and the broader data integration market.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/DR6r3m.
The random forest model generated 182 decision trees from the training data to classify whether users will continue their session or not, with an out-of-bag error rate of 34.17%. Important features were identified using the Gini index. The random forest model was able to successfully build a rule-based classification model with over 70% accuracy on the test data to identify if a user will continue or leave a session based on their behavior metrics.
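A scikit-learn sketch of the approach described above: a random forest with out-of-bag scoring and Gini-based feature importances. The session-behavior features and labels below are synthetic assumptions; only the 182-tree count is taken from the summary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
# Synthetic behavior metrics: 500 sessions, 4 features;
# label 1 = user continues the session, 0 = user leaves.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# oob_score=True reports accuracy on the out-of-bag samples,
# the same error estimate the summary quotes.
forest = RandomForestClassifier(n_estimators=182, oob_score=True,
                                random_state=0)
forest.fit(X, y)

oob_error = 1.0 - forest.oob_score_
# Gini-based importances, analogous to the Gini index ranking above.
importances = forest.feature_importances_
print(round(oob_error, 3))
```

Here the first two features carry the signal by construction, so they dominate the importance ranking, mirroring how the original model identified its important features.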
This document provides an overview of Hyperion and Essbase. It discusses how raw data is transformed into information through data warehousing processes like extracting, transforming, and loading data. It then explains what an OLTP system is and how Essbase provides multi-dimensional analysis capabilities. Key features of Essbase like dimensions, facts, aggregation, and its architecture are summarized. Finally, the document outlines the typical lifecycle of building and maintaining an Essbase database application.
The document discusses OLAP cubes and data warehousing. It defines OLAP as online analytical processing used to analyze aggregated data in data warehouses. Key concepts covered include star schemas, dimensions and facts, cube operations like roll-up and drill-down, and different OLAP architectures like MOLAP and ROLAP that use multidimensional or relational storage respectively.
This is a 200-level run-through of the Microsoft Azure big data analytics cloud platform, based on the Cortana Intelligence Suite offerings.
Denodo 6.0: Self Service Search, Discovery & Governance using a Universal Se... - Denodo
Presentation slides taken from Fast Data Strategy Roadshow San Francisco Bay Area.
For more Denodo 6.0 demos, please follow this link: https://goo.gl/XkxJjX
Watch full webinar here: https://buff.ly/2mHGaLA
Having started out as the most agile and real-time enterprise data fabric, data virtualization is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
• What data virtualization really is
• How it differs from other enterprise data integration technologies
• Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations
Real-time data visualization using business intelligence techniques and mak... - MD Owes Quruny Shubho
Real-time data visualization using business intelligence techniques to make faster decisions on sales data.
Business intelligence is a way of gaining business advantage from data. That data can be user information, stock information, sales reports, or any source related to the business. From large amounts of data, business intelligence mines information and converts it into knowledge, which feeds the decision support system. BI is an effective way to make data-driven decisions; it visualizes data, giving a visual view that can be easily understood.
June 10, 2010 BDPA Charlotte Program Meeting Presentation.
Presenter:
Markus Beamer, BDPA Charlotte President Elect
Topic:
Intelligent Data Strategies - Intro to Data Marts and Data Warehouses
This document provides an overview of different database technologies for managing large amounts of data, including row-based databases, columnar databases, and NoSQL databases. It discusses how traditional row-based databases struggle with analytics on large, dynamic datasets due to performance issues. Columnar databases help address this by storing data by column rather than row, reducing the amount of data retrieved for queries. NoSQL databases provide non-relational alternatives. The document aims to help readers understand which technology is best suited to their specific data challenges and needs.
The document provides details about Kevin Bengtson's SQL portfolio, including several database projects and T-SQL queries projects with examples. It also outlines SQL server administrative tasks performed and an SSIS/SSRS project involving creating a MiniAdventureWorks database. The final section describes a BlockFlix database designed for a video rental store.
Database Development, Replication, Security, Maintenance Report (nyin27)
The document discusses various database administration tasks including:
1. Creating stored procedures, functions, views and indexes
2. Configuring security using roles, permissions and encryption
3. Implementing database maintenance including backups, jobs, partitioning and monitoring
4. Setting up reports and notifications
Data Warehousing and Business Intelligence is one of the hottest skills today, and is the cornerstone for reporting, data science, and analytics. This course teaches the fundamentals with examples plus a project to fully illustrate the concepts.
This project utilizes the Amazon Rekognition API, a deep-learning-based service for analyzing images and videos that returns results with confidence levels.
The result is then emailed to the subscribed email address.
The whole process has been made possible using multiple AWS services.
Drug Review Analysis Using Elasticsearch and Kibana (Monika Mishra)
Web-based reviews can be viewed as an orthogonal source of information for consumers, physicians, and drug manufacturers to assess the performance of a drug. This project studies the drug review using Elasticsearch and Kibana.
An Empirical Study on Customer Consumption, Loyalty and Retention on a B2C E-... (Monika Mishra)
Research on a small e-commerce clothing company called Natalie’s. We studied the factors on which the yearly amount spent by consumers depends. We also developed a clustering model for customer segmentation, and various regression models to predict the yearly amount spent.
Re-admit Historical using SAS Visual Analytics (Monika Mishra)
- Hospital readmissions are costly and result in $15-20 billion in expenses annually in the US. Preventing avoidable readmissions can improve patient quality of life and reduce healthcare costs.
- The study analyzed a dataset of over 142,000 hospital visits across 10 states from 2011-2012. It found that Florida had the highest number of visits and charges. The heart department had the highest operation count.
- Reducing preventable readmissions requires improving care coordination, patient education, and post-discharge support to ensure patients understand their treatment plan and who to contact if issues arise. The CMS Hospital Readmission Reduction Program financially penalizes hospitals with excess readmissions for certain conditions like heart failure to incentivize lower readmission rates.
Diabetic Encounter Analysis using SAS Studio (Monika Mishra)
This document analyzes diabetic patient encounter data from 130 US hospitals from 1999-2008. Various data visualizations and statistical tests were performed on the dataset. A bar chart shows Caucasians had the most diabetic encounters, followed by African Americans. A box plot reveals the average number of diagnoses was around 7.6. An analysis of age groups found those from 70-80 had the highest inpatient encounters. Internal medicine saw the most patients. Females took diabetic medications slightly more than males. Caucasians accounted for the most inpatient and outpatient visits.
LA Energy and Water Efficiency Statistics using Tableau (Monika Mishra)
This document provides an overview and analysis of an open dataset from the City of Los Angeles on existing building energy and water efficiency. The dataset includes energy and water usage benchmark data for over 6,000 buildings in LA. Various visualizations and statistics are presented analyzing trends in water usage, energy usage, building construction over time, compliance rates, and more to understand patterns and opportunities for improved efficiency. A dashboard combines several visualizations for easy comparison of key metrics.
Predicting Amazon Rating Using Spark ML and Azure ML (Monika Mishra)
The document describes using Spark ML and Azure ML to predict ratings on Amazon products. It uses various recommendation models like Matchbox Recommender, Collaborative Filtering, and Decision Forest/Boosted Decision Tree regression. Text analytics with Logistic Regression is also used to predict sentiment from reviews. Based on RMSE, some Azure ML models performed better than Spark ML models for recommendation and rating prediction. The document discusses the datasets, algorithms, results and challenges faced in modeling.
Big data analysis of the Amazon Product review using Hadoop and Hive on the Oracle Big Data Cloud platform. The visualization tools used are Tableau, Power BI and Microsoft Power Map
Learn SQL from basic queries to advanced queries (manishkhaire30)
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
State of Artificial Intelligence Report 2023 (kuntobimo2016)
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W... (Social Samosa)
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Natural Language Processing (NLP), RAG and its applications.pptx (fkyes25)
In the realm of Natural Language Processing (NLP), knowledge-intensive tasks such as question answering, fact verification, and open-domain dialogue generation require the integration of vast and up-to-date information. Traditional neural models, though powerful, struggle with encoding all necessary knowledge within their parameters, leading to limitations in generalization and scalability. The paper "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" introduces RAG (Retrieval-Augmented Generation), a novel framework that synergizes retrieval mechanisms with generative models, enhancing performance by dynamically incorporating external knowledge during inference.
The Building Blocks of QuestDB, a Time Series Database (javier ramirez)
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
CIS-5270 BUSINESS INTELLIGENCE
Table of Contents
1. Introduction and Goal .................... 3
2. Data Set ................................. 4-5
   1. Data Set URL .......................... 4
   2. About the dataset ..................... 4
   3. Dataset details ....................... 4
   4. Column details ........................ 4-5
3. Data Cleaning ............................ 6-11
   1. Renaming column ....................... 6-7
   2. Removing unwanted column .............. 8-9
   3. Duplicating and splitting column ...... 10-11
4. Analysis & Visualizations ................ 12-23
   1. Bar Chart ............................. 12-13
   2. Histogram ............................. 14-15
   3. Pie Chart ............................. 16-17
   4. Tree Map .............................. 18-19
   5. Correlation Matrix .................... 20-21
   6. Word Cloud ............................ 22-23
5. Statistical Summary & Functions .......... 24-30
   1. Statistical Summary ................... 24-25
   2. User Defined Functions ................ 26-30
6. Code Summary ............................. 31-35
INTRODUCTION AND GOAL
1. Introduction:
The superstore industry comprises companies that operate large spaces to store and supply large amounts of goods. It consists of extensive stores that sell a typical product line of grocery items and merchandise, such as food, pharmaceuticals, apparel, games and toys, hobby items, furniture and appliances. Analyzing this industry is of great importance, as it gives insights into the sales and profits of various products. Our analysis is based on a superstore dataset for the US, where products were ordered between 2015 and 2018.
2. Goal: To find out various superstore statistics, such as:
- The region that accounts for the greatest number of orders
- The frequency distribution of quantity ordered
- Percentage sales by category
- The most profitable category and sub-category
- The category and sub-category that incurred losses
- The product type that was ordered most often
- Yearly sales for various states
With this analysis, the superstore can identify various aspects of the shopping pattern and take measures if required.
DATA SET
1. Data Set URL:
https://data.world/stanke/sample-superstore-2018
2. About the dataset:
The dataset provides information about the sales and profit of a US superstore from 2015 to 2018.
3. Dataset details:
Size 2.4 MB
Number of columns 21
Number of rows 9994
Original file format XLS
4. Column details:
The dataset contains the following columns-
Column Name Column Detail
Row ID Unique row ID
Order ID Unique Order ID
Order Date Ordered Date of the Order
Ship Date Shipping Date of the Order
Ship Mode Shipping mode of the order
Customer ID Unique ID of Customers
Customer Name Customer’s name
Segment Product Segment
Country US
City City of product ordered
State State of product ordered
Postal Code Postal code for the order
Region Region of product ordered
Product ID Unique Product id
Category Product category
Sub-Category Product sub-category
Product Name Name of the product
Sales Sales contribution of the order
Quantity Quantity ordered
Discount Discount provided on order
Profit Profit for the order
DATA CLEANING
1. Renaming Column
Goal: The column name “CT” was not meaningful. The aim is to rename the column to “City”.
Before
After
Code Used
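The rename itself (as listed in the Code Summary) is a single base-R line:

```r
# rename the column "CT" to "City"
colnames(superstore)[colnames(superstore) == "CT"] <- "City"
```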
2. Removing unwanted Column
Goal: The column named “Country” needs to be removed, as it contains only one value,
“United States”.
Before
After
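Code Used (from the Code Summary): the column is dropped with subset():

```r
# drop the Country column, which holds a single constant value
superstore <- subset(superstore, select = -c(Country))
```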
3. Duplicating the column and Splitting it into 3 columns
Goal: To duplicate the column “Order.Date” to “order” and then split “order” into month,
day and year
Before
After
After duplicating / After splitting the order column
(In the Before screenshot, there is no column after Profit.)
Code Used
# duplicate Order.Date into a new column "order"
superstore$order <- superstore$Order.Date
library(tidyr)
# split "order" into month, day and year on "/"
superstore <- separate(superstore, order, c("month","day","year"), sep="/")
Full Screenshot
ANALYSIS & VISUALIZATIONS
1. What is the total number of orders by region?
Plot Type - Bar Chart
Function Used – barplot, table
Analysis
The above bar chart displays the total number of orders by region. The Western region
has the maximum order count (greater than 3,000), followed by the Eastern region with a
count close to 3,000, and then the Central region with around 2,300. The fewest orders
were placed in the Southern region (around 1,500).
Code Used
> countsR <- table(superstore$Region)
> barplot(countsR, main="Total Orders by Region",
+ xlab="Region", col="lightblue")
Full Screenshot
2. What is the frequency distribution of quantity ordered?
Plot Type - Histogram
Function Used – hist
Analysis
The above histogram shows the frequency distribution of the quantity ordered. The most
frequently ordered quantity is 1, with a frequency greater than 3,000, followed by a
quantity of 2 with a frequency close to 2,500. Generally speaking, the frequency
decreases as the quantity ordered increases. A quantity of 14 has the lowest frequency.
Code Used
> hist(superstore$Quantity,
+      main="Frequency Distribution of Quantity Ordered",
+      xlab="Quantity Ordered", ylab="Frequency", col="lightpink")
Full Screenshot
3. What is the percentage sales by category?
Plot Type – Pie Chart
Function Used – pie, group_by, summarize, round, paste
Analysis
The above pie chart shows the percentage sales by category. There are three categories:
Technology, Furniture and Office Supplies. The “Technology” category contributed the
most to sales, at 36%. It is followed by “Furniture” at 32%, and “Office Supplies”
contributed the least, at 31% (the rounded percentages sum to 99%).
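Code Used (from the Code Summary): the sales are grouped by category with dplyr, and each slice is labeled with its rounded percentage:

```r
> install.packages("dplyr")
> library("dplyr")
> library(magrittr)
> # total sales per category
> gd <- superstore %>% group_by(Category) %>% summarize(Sales=sum(Sales))
> # build labels such as "Technology 36 %"
> pct <- round(gd$Sales/sum(gd$Sales)*100)
> lbls <- paste(gd$Category, pct)
> lbls <- paste(lbls, "%", sep=" ")
> colors = c('lightskyblue','plum2','peachpuff')
> pie(gd$Sales, labels = lbls, main="Percentage Sales By Category", col=colors)
```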
4. Which sub-category incurred losses? Which is the most profitable sub-category?
How are the overall sales across the various categories and sub-categories?
Plot Type – Tree Map
Function Used – list, treemap
Analysis
The above Tree Map provides information about the sales and profit of the various
product categories and sub-categories. Cell size is determined by sales, and the color
gradient describes profit. It can be concluded from the map that the sub-category
“Phones” under “Technology” has the highest sales, the “Furniture” category incurred
losses, and the most profitable sub-category is “Copiers”.
Code Used
> install.packages("treemap")
> library(treemap)
> treemap(data, index = c("Category","Sub.Category"), vSize = "Sales",
+   vColor = "Profit", type = "value", palette = "RdYlGn",
+   range = c(-20000, 60000), mapping = c(-20000, 10000, 60000),
+   title = "Sales Treemap For categories", fontsize.labels = c(15, 10),
+   align.labels = list(c("centre","centre"), c("left","top")))
Full Screenshot
5. What is the correlation between Sales, Quantity, Discount and Profit?
Plot Type – Correlation Matrix
Function Used – corrplot, cor
Analysis
This is a correlation matrix chart that shows the correlations among Sales, Quantity,
Discount and Profit. The color gradient from red to blue describes the strength and
direction of each correlation, red being negative and blue being positive. It can be
seen that “Sales” and “Profit” are somewhat positively correlated, “Profit” and
“Quantity” are very weakly correlated, and “Profit” and “Discount” are negatively
correlated.
Code Used
> install.packages("corrplot")
> mydata <- superstore[, c(18,19,20,21)]
> View(mydata)
> library(corrplot)
> mydata.cor = cor(mydata)
> mydata.cor
> corrplot(mydata.cor)
Full Screenshot
6. Which product types have been ordered the most times?
Plot Type – Word Cloud
Function Used – wordcloud
Analysis
Word clouds (also known as text clouds or tag clouds) work in a simple way: the more a
specific word appears in a source of textual data (such as a speech, blog post, or
database), the bigger and bolder it appears in the word cloud. In our case we want to
know what kinds of products have been ordered frequently. Looking at the above word
cloud, it is clear that products related to “Xerox” have been ordered the most. Products
related to binders, chairs and Avery have also been ordered many times.
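The word cloud code does not appear in the Code Summary. A minimal sketch of how it could be produced, assuming the tm and wordcloud packages and the Product.Name column (both assumptions, not shown in the original), is:

```r
# assumed approach: build a term-frequency table from product names,
# then plot a word cloud where frequent words appear bigger and bolder
library(tm)
library(wordcloud)
library(RColorBrewer)

corpus <- Corpus(VectorSource(superstore$Product.Name))  # assumed column
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
tdm <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

wordcloud(names(freq), freq, max.words = 100,
          colors = brewer.pal(8, "Dark2"))
```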
STATISTICAL SUMMARY & FUNCTIONS
1. Statistical Summary
Question - Provide a statistical summary of the Sales.
Answer – Given below is the statistical summary of the Sales:
Statistics | Value | Meaning
Min. (Minimum) | 0.444 | The lowest value of the sales present in the table.
1st Qu. (First Quartile) | 17.280 | The first quartile (Q1) is the middle number between the smallest value and the median; it splits off the lowest 25% of the data from the highest 75%.
Median | 54.490 | The middle number in the sequence when the values are ordered by rank.
Mean | 229.858 | The average of the Sales: the sum of all Sales values divided by the total number of values.
3rd Qu. (Third Quartile) | 209.940 | The third quartile (Q3) is the middle number between the median and the highest value; it splits off the highest 25% of the data from the lowest 75%.
Max. (Maximum) | 22638.480 | The highest value of the sales present in the table.
Code Used for Execution
> setwd("~/Desktop/BI")
> superstore<-read.csv("superstore.csv")
> View(superstore)
> summary(superstore$Sales)
Result
Full Screenshot
2. User Defined Function
Question – What is the total sales for each year for a particular user-provided state?
Answer – As a solution, we created a user-defined function that takes a state name as an
input parameter and displays the total sales by year for that state by plotting a line
graph.
The state name provided by the user is validated against the states present in the
superstore table. If it is not present, an error message is shown; if it is present, a
line chart is plotted to display the result.
Full Screenshot
Function Code
# Function returns total sales by year for the entered state
statesales<-function(inputstate)
{
# importing libraries
library(tidyr)
library(dplyr)
library(ggplot2)
print(paste("The State provided by the user is: ", inputstate))
# retrieving distinct state name from the table
state_name<-distinct(superstore, State)
# checking if the state provided is correct or not
isvalid<- any(state_name == inputstate)
# if the state name provided is valid, a graph will be plotted
if (isvalid==TRUE)
{
selected<-select(superstore, State, Sales, year)
filtered<-filter(selected,State==inputstate)
aggregated<-aggregate(filtered$Sales,by=list(filtered$year),sum)
print(aggregated)
# plotting line chart
ggplot(data=aggregated, aes(x=Group.1, y=x, group=1)) +
  geom_line(color="red") + geom_point(color="blue") +
  xlab("Year") + ylab("Total Sales") +
  ggtitle("Total Sales by year")
}
else
{ print('Enter correct state name') }
}
Execution Script
> setwd("~/Desktop/BI")
> source("sales.R")
> statesales("LA")
[1] "The State provided by the user is: LA"
[1] "Enter correct state name"
> statesales("California")
[1] "The State provided by the user is: California"
Group.1 x
1 15 91303.53
2 16 88443.84
3 17 131551.91
4 18 146388.34
CODE SUMMARY
1. Data Cleaning Codes
a. Renaming Column
colnames(superstore)[colnames(superstore)=="CT"] <- "City"
b. Removing unwanted Column
superstore = subset(superstore, select = -c(Country) )
c. Duplicating the column and splitting into 3 columns
superstore$order<-superstore$Order.Date
library(tidyr)
superstore<-separate(superstore,order,c("month","day","year"),sep="/")
2. Visualization Codes
a. Bar Chart
> countsR <- table(superstore$Region)
> barplot(countsR, main="Total Orders by Region",
+ xlab="Region", col="lightblue")
b. Histogram
> hist(superstore$Quantity,
+      main="Frequency Distribution of Quantity Ordered",
+      xlab="Quantity Ordered", ylab="Frequency", col="lightpink")
c. Pie Chart
> install.packages("dplyr")
> library("dplyr")
> library(magrittr)
> gd <- superstore %>% group_by(Category) %>% summarize(Sales=sum(Sales))
> pct<-round(gd$Sales/sum(gd$Sales)*100)
> lbls<-paste(gd$Category,pct)
> lbls<-paste(lbls, "%", sep= " ")
> colors = c('lightskyblue','plum2','peachpuff')
> pie(gd$Sales, labels = lbls,main="Percentage Sales By Category",col=colors)
4. User Defined Function Code
# Function returns total sales by year for the entered state
statesales<-function(inputstate)
{
# importing libraries
library(tidyr)
library(dplyr)
library(ggplot2)
print(paste("The State provided by the user is: ", inputstate))
# retrieving distinct state name from the table
state_name<-distinct(superstore, State)
# checking if the state provided is correct or not
isvalid<- any(state_name == inputstate)
# if the state name provided is valid, a graph will be plotted
if (isvalid==TRUE)
{
selected<-select(superstore, State, Sales, year)
filtered<-filter(selected,State==inputstate)
aggregated<-aggregate(filtered$Sales,by=list(filtered$year),sum)
print(aggregated)
# plotting line chart
ggplot(data=aggregated, aes(x=Group.1, y=x, group=1)) +
  geom_line(color="red") + geom_point(color="blue") +
  xlab("Year") + ylab("Total Sales") +
  ggtitle("Total Sales by year")
}
else
{ print('Enter correct state name') }
}