Project for System Analysis and Design (IS-6410).
By performing customer segmentation, the new analytics system can achieve three objectives:
1. Distinguish loyal customers from one-time visitors and perform heat-map analysis of their browsing patterns.
2. Understand customer demographics and focus on the most profitable segments.
3. Empower the Marketing department to make better strategic decisions about online ads and campaigns.
This document discusses using predictive analytics and segmentation analysis on a telecom customer dataset. It performed three types of segmentation - demographic, customer status, and customer usage. For each segmentation, it identified 4 clusters and described the characteristics of customers in each cluster. It then performed a cross-cluster analysis to find associations between the different segmentations that could provide business insights. For example, it found that valuable young adults tended to be cosmopolitan users who make frequent international calls. The document also discusses the benefits of predictive analytics, including gaining competitive advantage and improving operations. It provides an insurance case study as an example and maps how predictive analytics helps insurers achieve outcomes such as growing their business and strengthening fraud detection.
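The cluster-based segmentation described above can be sketched with a small k-means routine. This is a minimal, single-file illustration, not the document's actual method; the customer features below (age, monthly call minutes) are hypothetical, chosen only to show the idea of grouping customers into segments.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Very small k-means: assign each point to its nearest centroid,
    then recompute centroids, repeating for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Index of the centroid with the smallest squared distance to p.
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster (keep old if empty).
        centroids = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer features: (age, monthly_minutes) -- illustrative only.
customers = [(22, 300), (25, 320), (24, 310), (60, 40), (65, 35), (58, 50)]
centroids, clusters = kmeans(customers, k=2)
```

A real segmentation of the kind described (4 clusters per dimension) would run the same loop with `k=4` over the actual demographic, status, or usage attributes.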
This document provides an introduction to big data analytics and data science, covering topics such as the growth of data, what big data is, the emergence of big data tools, traditional and new data management architectures including data lakes, and big data analytics. It also discusses roles in data science including data scientists and data visualization.
The document discusses various data reduction strategies including attribute subset selection, numerosity reduction, and dimensionality reduction. Attribute subset selection aims to select a minimal set of important attributes. Numerosity reduction techniques like regression, log-linear models, histograms, clustering, and sampling can reduce data volume by finding alternative representations like model parameters or cluster centroids. Dimensionality reduction techniques include discrete wavelet transformation and principal component analysis, which transform high-dimensional data into a lower-dimensional representation.
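The dimensionality-reduction technique mentioned above, principal component analysis, can be illustrated in miniature. This sketch projects 2-D points onto their first principal component using the closed-form eigendecomposition of a 2x2 covariance matrix; it is an illustration of the idea, not a general-purpose PCA.

```python
import math

def pca_1d(points):
    """Project 2-D points onto their first principal component,
    reducing two correlated attributes to a single derived attribute."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Covariance matrix entries.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]] (closed form for 2x2).
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # Corresponding eigenvector, normalised (axis-aligned when sxy == 0).
    if sxy:
        vx, vy = lam - syy, sxy
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    return [(x - mx) * vx + (y - my) * vy for x, y in points]

data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
scores = pca_1d(data)  # one derived value per point instead of two attributes
```

For high-dimensional data, the same idea generalises: keep only the top few eigenvectors of the covariance matrix and represent each record by its projections onto them.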
The key components of a data warehouse are the source data component, data staging component, data storage component, information delivery component, meta-data component, and management and control component. The source data component includes production data, internal data, archived data, and external data. The data staging component involves extracting, transforming through processes like handling synonyms and homonyms, and loading the data. The information delivery component provides access and reports to different user types from novice to senior executives.
Data Mining: Concepts and Techniques (3rd ed.) - Chapter 3: Preprocessing - Salah Amean
The chapter contains:
Data Preprocessing: An Overview,
Data Quality,
Major Tasks in Data Preprocessing,
Data Cleaning,
Data Integration,
Data Reduction,
Data Transformation and Data Discretization,
Summary.
Data Mining, KDD Process, Data mining functionalities, Characterization,
Discrimination,
Association,
Classification,
Prediction,
Clustering,
Outlier analysis, Data Cleaning as a Process
The document discusses data warehouses and their advantages. It describes the different views of a data warehouse including the top-down view, data source view, data warehouse view, and business query view. It also discusses approaches to building a data warehouse, including top-down and bottom-up, and steps involved including planning, requirements, design, integration, and deployment. Finally, it discusses technologies used to populate and refresh data warehouses like extraction, cleaning, transformation, load, and refresh tools.
This document discusses different architectures for big data systems, including traditional, streaming, lambda, kappa, and unified architectures. The traditional architecture focuses on batch processing stored data using Hadoop. Streaming architectures enable low-latency analysis of real-time data streams. Lambda architecture combines batch and streaming for flexibility. Kappa architecture avoids duplicating processing logic. Finally, a unified architecture trains models on batch data and applies them to real-time streams. Choosing the right architecture depends on use cases and available components.
This document introduces data science, big data, and data analytics. It discusses the roles of data scientists, big data professionals, and data analysts. Data scientists use machine learning and AI to find patterns in data from multiple sources to make predictions. Big data professionals build large-scale data processing systems and use big data tools. Data analysts acquire, analyze, and process data to find insights and create reports. The document also provides examples of how Netflix uses data analytics, data science, and big data professionals to optimize content caching, quality, and create personalized streaming experiences based on quality of experience and user behavior analysis.
Web mining uses data mining techniques to extract information from web documents and services. It involves web content mining of page content and search results, web structure mining of hyperlink structures, and web usage mining of server logs to find user access patterns. Data mining techniques like classification, clustering, and association rule mining can be applied to web data to discover useful patterns and information.
Clustering for Stream and Parallelism (DATA ANALYTICS) - Dheeraj Pachauri
The document summarizes information about a group project involving data stream clustering. It lists the group members and then discusses key concepts related to data stream clustering like requirements for algorithms, common algorithm types and steps, prototypes and windows. It also touches on outliers and applications of clustering.
Information technology has led us into an era in which producing, sharing, and using information is part of everyday life, often without our being aware of it: it is now almost impossible not to leave a digital trail of many of our daily actions, for example through digital content such as photos, videos, and blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). In addition, with the "internet of things" we see a growing number of devices such as watches, bracelets, and thermostats that can connect to the network and therefore generate large data streams. This explosion of data explains the emergence of the term Big Data: data produced in large quantities, at remarkable speed, and in different formats, which requires processing technologies and resources far beyond conventional data management and storage systems. It is immediately clear that (1) storage models based on the relational model and (2) processing systems based on stored procedures and grid computation are not applicable in these contexts. Regarding point 1, RDBMSs, widely used for a great variety of applications, run into problems when the amount of data grows beyond certain limits. Scalability and implementation cost are only part of the disadvantages: very often, when managing big data, variability, that is, the lack of a fixed structure, is also a significant problem. This has driven the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable."
These databases are distributed, open source, horizontally scalable, free of a predetermined schema (key-value, column-oriented, document-based, and graph-based), easily replicable, relax the ACID guarantees, and can handle large amounts of data. They are typically integrated with processing tools based on the MapReduce paradigm proposed by Google. MapReduce, together with the open-source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2). The relational model taught in basic database design courses has many limitations compared to the demands posed by new applications, which use NoSQL databases to store data and MapReduce to process large amounts of it.
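The MapReduce paradigm mentioned above can be sketched with the classic word-count example. This is a single-process illustration of the map, shuffle, and reduce phases, not a distributed implementation; in Hadoop each phase would run across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all counts emitted for one word."""
    return key, sum(values)

docs = ["big data big value", "big data tools"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts["big"] == 3, counts["data"] == 2
```

The appeal of the model is that map and reduce are independent per key, so the framework can parallelise both phases freely across a cluster.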
Course Website http://pbdmng.datatoknowledge.it/
Contact me for further information and to download the material.
DI&A Slides: Descriptive, Prescriptive, and Predictive Analytics - DATAVERSITY
Data analysis can be divided into descriptive, prescriptive and predictive analytics. Descriptive analytics aims to help uncover valuable insight from the data being analyzed. Prescriptive analytics suggests conclusions or actions that may be taken based on the analysis. Predictive analytics focuses on the application of statistical models to help forecast the behavior of people and markets.
This webinar will compare and contrast these different data analysis activities and cover:
- Statistical Analysis – forming a hypothesis, identifying appropriate sources and proving / disproving the hypothesis
- Descriptive Data Analytics – finding patterns
- Predictive Analytics – creating models of behavior
- Prescriptive Analytics – acting on insight
- How the analytic environment differs for each
Online analytical processing (OLAP) allows users to easily extract and analyze data from different perspectives. It originated in the 1970s and was formalized in 1993, with OLAP cubes organizing numeric facts by dimensions to enable fast analysis. OLAP provides operations like roll-up, drill-down, slice, and dice to analyze aggregated data across multiple systems. It offers advantages over relational databases for consistent reporting and analysis.
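The OLAP operations named above can be illustrated on a toy fact table. The sketch below implements roll-up (aggregating away dimensions) and slice (fixing one dimension to a value) over plain Python dictionaries; the dimensions and sales figures are hypothetical.

```python
from collections import defaultdict

# A toy fact table: (region, product, quarter) -> sales. Names are illustrative.
facts = {
    ("East", "Phone", "Q1"): 100, ("East", "Phone", "Q2"): 120,
    ("East", "Laptop", "Q1"): 80,  ("West", "Phone", "Q1"): 90,
    ("West", "Laptop", "Q1"): 70,  ("West", "Laptop", "Q2"): 60,
}

def roll_up(facts, keep):
    """Roll-up: aggregate sales upward by dropping dimensions not in `keep`."""
    out = defaultdict(int)
    for dims, sales in facts.items():
        out[tuple(d for i, d in enumerate(dims) if i in keep)] += sales
    return dict(out)

def slice_cube(facts, axis, value):
    """Slice: keep only the cells where one dimension equals a fixed value."""
    return {dims: s for dims, s in facts.items() if dims[axis] == value}

by_region = roll_up(facts, keep={0})            # totals per region
q1_only = slice_cube(facts, axis=2, value="Q1")  # the Q1 sub-cube
```

Drill-down is the inverse of roll-up (returning to finer-grained cells), and dice generalises slice to a condition on several dimensions at once.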
The document provides an introduction to data analytics, including defining key terms like data, information, and analytics. It outlines the learning outcomes which are the basic definition of data analytics concepts, different variable types, types of analytics, and the analytics life cycle. The analytics life cycle is described in detail and involves problem identification, hypothesis formulation, data collection, data exploration, model building, and model validation/evaluation. Different variable types like numerical, categorical, and ordinal variables are also defined.
This presentation gives the idea about Data Preprocessing in the field of Data Mining. Images, examples and other things are adopted from "Data Mining Concepts and Techniques by Jiawei Han, Micheline Kamber and Jian Pei "
This document outlines a presentation on web mining. It begins with an introduction comparing data mining and web mining, noting that web mining extracts information from the world wide web. It then discusses the reasons for and types of web mining, including web content, structure, and usage mining. The document also covers the architecture and applications of web mining, challenges, and provides recommendations.
The document discusses frequent itemset mining methods. It describes the Apriori algorithm which uses a candidate generation-and-test approach involving joining and pruning steps. It also describes the FP-Growth method which mines frequent itemsets without candidate generation by building a frequent-pattern tree. The advantages of each method are provided, such as Apriori being easily parallelized but requiring multiple database scans.
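The Apriori candidate generation-and-test loop described above can be sketched in a few lines. This miniature version joins frequent (k-1)-itemsets into k-itemset candidates and prunes by support; it omits the subset-based pruning and other optimisations of the full algorithm, and the basket data is invented for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Tiny Apriori sketch: generate candidate itemsets level by level
    (join step) and keep only those meeting min_support (prune step)."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions containing every item in the itemset.
        return sum(itemset <= t for t in transactions)

    level = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent, k = {}, 1
    while level:
        level = [c for c in level if support(c) >= min_support]
        frequent.update({c: support(c) for c in level})
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        level = list({a | b for a, b in combinations(level, 2) if len(a | b) == k})
    return frequent

baskets = [{"milk", "bread"}, {"milk", "bread", "eggs"},
           {"bread", "eggs"}, {"milk", "eggs"}]
freq = apriori(baskets, min_support=2)
```

Note the multiple passes over `transactions` inside `support`: that repeated scanning is exactly the cost that FP-Growth avoids by compressing the data into a frequent-pattern tree.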
This document discusses big data mining. It defines big data as large volumes of structured and unstructured data that are difficult to process using traditional methods due to their size. It describes the characteristics of big data including volume, variety, velocity, variability, and complexity. It also discusses challenges of big data such as data location, volume, hardware resources, and privacy. Popular tools for big data mining include Hadoop, Apache S4, Storm, Apache Mahout, and MOA. Hadoop is an open source software framework that allows distributed processing of large datasets across clusters of computers. Common algorithms for big data mining operate at the model and knowledge levels to discover patterns and correlations across distributed data sources.
This document provides an overview of application trends in data mining. It discusses how data mining is used for financial data analysis, customer analysis in retail and telecommunications, biological data analysis, scientific research, intrusion detection, and more. It also outlines statistical and visualization techniques used in data mining as well as privacy and security considerations. The document concludes by encouraging the reader to explore additional self-help tutorials on data mining tools and techniques.
This document provides an overview of big data in various industries. It begins by defining big data and explaining the three V's of big data - volume, variety, and velocity. It then discusses examples of big data in digital marketing, financial services, and healthcare. For digital marketing, it discusses database marketers as pioneers of big data and how big data is transforming digital marketing. For financial services, it discusses how big data is used for fraud detection and credit risk management. It also provides details on algorithmic trading and how it crunches complex interrelated big data. Overall, the document outlines how big data is being leveraged across industries to improve operations, increase revenues, and achieve competitive advantages.
The document is a chapter from a textbook on data mining written by Akannsha A. Totewar, a professor at YCCE in Nagpur, India. It provides an introduction to data mining, including definitions of data mining, the motivation and evolution of the field, common data mining tasks, and major issues in data mining such as methodology, performance, and privacy.
It is an introduction to Data Analytics, its applications in different domains, the stages of Analytics project and the different phases of Data Analytics life cycle.
I deeply acknowledge the sources from which I could consolidate the material.
Data mining is an important part of business intelligence and refers to discovering interesting patterns from large amounts of data. It involves applying techniques from multiple disciplines like statistics, machine learning, and information science to large datasets. While organizations collect vast amounts of data, data mining is needed to extract useful knowledge and insights from it. Some common techniques of data mining include classification, clustering, association analysis, and outlier detection. Data mining tools can help organizations apply these techniques to gain intelligence from their data warehouses.
The document discusses frequent pattern mining and the Apriori algorithm. It introduces frequent patterns as frequently occurring sets of items in transaction data. The Apriori algorithm is described as a seminal method for mining frequent itemsets via multiple passes over the data, generating candidate itemsets and pruning those that are not frequent. Challenges with Apriori include multiple database scans and large number of candidate sets generated.
Why BI?
Performance management
Identify trends
Cash flow trend
Fine-tune operations
Sales pipeline analysis
Future projections
Business forecasting
Decision Making Tools
Convert data into information
How to Think?
What happened?
What is happening?
Why did it happen?
What will happen?
What do I want to happen?
Business intelligence (BI) systems allow companies to gather, store, access, and analyze corporate data to aid in decision-making. These systems illustrate intelligence in areas like customer profiling, market research, and product profitability. A hotel franchise uses BI to compile statistics on metrics like occupancy and room rates to analyze performance and competitive position. Banks also use BI to determine their most profitable customers and which customers to target for new products.
Customer analytics. Turn big data into big valueJosep Arroyo
BIRT Analytics is a customer analytics solution that allows companies to gain valuable insights from big data. It integrates data from multiple sources, analyzes large volumes of data, and provides clear and granular customer information. Tools allow users to explore data, identify patterns, profile customers, and forecast trends. Advanced analytics help optimize marketing, identify cross-sell opportunities, and understand customer behavior. The solution aims to help companies understand customer needs and adapt strategies based on real customer data.
This document discusses using Microsoft Excel 2013 and Microsoft Access to create an offers bank decision support system (DSS). It proposes a 4 phase approach: 1) Create a database and star schema using Access, 2) Fill the database with data by defining dimensions and measures and retrieving data in Excel, 3) Create a dashboard in Excel, 4) Analyze past trends and predict future trends using data mining. The document also provides background on business intelligence solutions and reviews literature on using BI to turn raw data into meaningful business insights.
Data Architecture Process in a BI environmentSasha Citino
The document discusses the role of data architects in a business intelligence (BI) environment. It begins with an introduction to the author and their experience. It then provides an overview of what BI is and how it relates to data warehousing. The main roles and responsibilities of data architects are then outlined, including dimensional modeling, integration design, and defining architecture standards. Finally, it describes the typical steps in the data architecture process, from requirements gathering to data profiling, conceptual modeling, and physical design.
1. The document discusses Business Intelligence and analytics using Oracle BI Foundation Suite. It provides an overview of the different components, capabilities, and features of Oracle BI including the BI Server, presentation layer, data warehousing, ETL processes, and end users.
2. It describes the different modules of Oracle BI including dashboards, KPIs, reports, predictive analysis, and graphical OLAP. It also discusses the hardware and software components needed for a complete Oracle BI solution.
3. Screenshots are provided showing how to create a database connection in Oracle BI, indicating how users can access and work with data through the presentation layer.
This document provides an agenda and overview for a data warehousing training session. The agenda covers topics such as data warehouse introductions, reviewing relational database management systems and SQL commands, and includes a case study discussion with Q&A. Background information is also provided on the project manager leading the training.
Business intelligence environments involve collecting data from various sources, transforming and organizing it using tools like ETL, and storing it in data warehouses or marts. This data is then analyzed using OLAP and reporting tools to provide useful information for business decisions. Setting up an effective BI environment requires understanding business requirements, defining processes, determining data needs, integrating data sources, and selecting appropriate tools and techniques. Careful planning and skilled people are needed to ensure the BI environment supports organizational goals.
The document provides an overview of data warehousing and data mining. It discusses what a data warehouse is, how it is structured, and how it can help organizations make better decisions by integrating data from multiple sources and facilitating online analytical processing (OLAP). It also covers key components of a data warehousing architecture like the data manager, data acquisition, metadata repository, and middleware that connect the data warehouse to operational databases and analytical tools.
The document provides an overview of data warehousing, decision support, online analytical processing (OLAP), and data mining. It discusses what data warehousing is, how it can help organizations make better decisions by integrating data from various sources and making it available for analysis. It also describes OLAP as a way to transform warehouse data into meaningful information for interactive analysis, and lists some common OLAP operations like roll-up, drill-down, slice and dice, and pivot. Finally, it gives a brief introduction to data mining as the process of extracting patterns and relationships from data.
[Notes] Customer 360 Analytics with LEO CDPTrieu Nguyen
Part 1: Why should every business need to deploy a CDP ?
1. Big data is the reality of business today
2. What are technologies to manage customer data ?
3. The rise of first-party data and new technologies for Digital Marketing
4. How to apply USPA mindset to build your CDP for data-driven business
Part 2: How to use LEO CDP for your business
1. Core functions of LEO CDP for marketers and IT managers
2. Data Unification for Customer 360 Analytics
3. Data Segmentation
4. Customer Personalization
5. Customer Data Activation
Part 3: Case study in O2O Retail and Ecommerce
1. How to build customer journey map for ecommerce and retail
2. How to do customer analytics to find ideal customer profiles
The ideal customer profile in a B2B context
The ideal customer profile in a B2C context
3. Manage product catalog for customer personalization
4. Monitoring Data of Customer Experience (CX Analytics)
CX Data Flow
CX Rating plugin is embedded in the website, to collect feedback data
An overview of CX Report
A CX Report in a customer profile
5. Monitoring data with real-time event tracking reports
Event Data Flow
Summary Event Data Report
Event Data Report in a Customer Profile
Part 4: How to setup an instance of LEO CDP for free
1. Technical architecture
2. Server infrastructure
3. Setup middlewares: Nginx, ArangoDB, Redis, Java and Python
Network requirements
Software requirements for new server
ArangoDB
Nginx Proxy
SSL for Nginx Server
Java 8 JVM
Redis
Install Notes for Linux Server
Clone binary code for new server
Set DNS hosts for LEO CDP workers
4. Setup data for testing and system verification
Part 5: Summary all key ideas
Data science is being applied to solve a wide variety of problems across many industries. It uses techniques from many fields like statistics, machine learning, and data mining to analyze large amounts of data and extract useful insights. While technical skills are important for data scientists, soft skills like communication, collaboration, and problem solving are also critical for effectively applying data science and ensuring business value. Many organizations are now using data science for applications like customer segmentation, predictive modeling, marketing attribution, and performance management.
The document discusses Magento Business Intelligence and how it helps merchants overcome common data and analytics challenges. It provides an overview of MBI's platform capabilities like data connection, consolidation, transformation, warehousing, analysis and visualization. It also outlines the Essentials and Pro tiers, included features, pricing and examples of how MBI has helped companies like Truly Experiences and Guideboat improve marketing ROI and identify qualified leads through data-driven insights.
MarketView Marketing Database Platform | Data Services, Inc.Data Services, Inc.
Data Services' MarketView Data Management & Analytics Platform provides direct & data-driven marketers with a 360 degree view of their US & int'l marketing databases with advanced tools for database segmentation, customer/data analytics, data visualization, business intelligence, campaign management, cross-product analysis, marketing channel affinity reporting as well as a seamless connection to Data Services DSIemail Broadcasting Platform as well as integration with 3rd party platforms for CRM, ESP, eCommerce, Marketing Automation and more, all in a seamless online platform requiring no software or application to download.
Business intelligence- Components, Tools, Need and Applicationsraj
As part of the research project for the course Technical Foundations of Information Systems at the University of Illinois, our team worked on the topic, Business Intelligence. The presentation focuses on what is Business Intelligence, its various components, latest tools, the need of BI as well as applications of this technology. This project deals with the latest development of BI technologies (hardware or software) and includes comprehensive literature survey from Journals, and the Internet.
Analytics & Data Strategy 101 by Deko DimeskiDeko Dimeski
- Understand why each company needs solid analytics and data strategy & capabilities
- Typical data problems each company experiences, regardless of the scale
- Core competences and roles
- Analytics products and artefacts
- Analytics Usecases
Business intelligence (BI) refers to technologies and applications used to analyze data and present information to help corporate executives, managers, and other business users make informed decisions. Examples of how BI is used include hotels analyzing occupancy rates and revenue, banks determining profitable customers, and telecom companies providing targeted data access. BI provides insights into customer behavior, market trends, internal operations, and more to support strategic decision-making. The future of BI includes greater use of real-time analysis to provide up-to-date insights for time-sensitive business decisions.
The document proposes a finger gesture-based rating system using computer vision and cloud computing. The system would allow customers to provide ratings for products and services by holding up a corresponding number of fingers, from 1 to 5. Computer vision techniques would recognize the gesture and record the rating in a collective database in the cloud. This universal database could then provide aggregated rating data across multiple companies for improved analytics. The system aims to provide a more efficient and engaging way for customers to submit feedback compared to traditional rating methods.
This document is about Data Warehouse Tools such as:
OLAP (On – line Analytical Processing)
OLTP (On – Line Transaction Processing)
Business Intelligence
Driving Force
Data Mart
Meta Data
1. IS - 6410 - System Analysis and Design
Group Project 2
Divya Bhatia
Poojya Reddy
Aditya Ekawade
Siddharth Suresh
Aditya Kannan
2. IS6410- Analysis & Design Customer Segmentation Report
Team Organisation Report
Team Member | Skill Set | IT Interest Areas
Aditya Ekawade | Web technologies (HTML, JavaScript, React, PHP, Java), UI, SEO | Web Development, Digital Marketing
Siddharth Suresh | IT Security, R, Statistics, Data Visualization | Data Analytics, Business Intelligence
Divya Bhatia | Software Automation, R, Data Visualization, Statistics | Data Science
Poojya Reddy | Scripting, DevOps, Build Engineering, Business Analysis (Technical + Functional) | DevOps Development, Digital Marketing and Analytics
Aditya Kannan | Java, MySQL, Hadoop Ecosystem, Power BI | Data Engineering, Data Warehousing, Consulting
Scrum Role | Team Member
Scrum Master | Aditya Kannan
Product Owner | Aditya Ekawade, Poojya Reddy
Developers | Divya Bhatia, Siddharth Suresh
Table of Contents

Project Selection And Requirements Analysis Report
Executive Summary
Detailed Requirements
High Level Scope Definition
Use Case Diagram
Use case narratives
Project Plan
Work Breakdown Structure
GANTT Chart
CoCoMo Estimation
Burndown Chart
Sprint Planning
Analysis Document
Logical Entity Relation Diagram
Data Flow Diagram
DFD Level 0
Activity Diagram
CRUD Matrix
Buy vs Build Analysis
Design and Prototype Document
Architecture/Platform Choices
Data Storage Platform
Data Processing Platform
Physical Entity Relation Diagram
Physical Data Flow Diagram
Mock-ups
References
Project Selection And Requirements Analysis Report
Executive Summary
Since our inception five years ago, our company, Trendzzz4u.com, has lived by one motto: a flawless vision for what is upcoming in fashion. We strive to exceed customer expectations at every step of the shopping journey on our website, and this loyalty has taken us from a small-scale, part-time online retailer to a mid-tier e-commerce retailer.
Our website currently offers 15,000+ products in clothing and accessories for men and women. With a planned business expansion to 40,000+ products through strategic partnerships with suppliers over the next two years, scalability in managing our website data is the biggest challenge we will face.
Our in-house analytics department currently runs an in-house data warehouse that consumes data from our inventory and CRM systems. Using this warehouse, our product managers obtain actionable insights and make decisions based on weekly reports. The current size of the warehouse is 2 TB. With the targeted increase in the product catalog, we expect data growth of close to 10 TB per year. If we continue with our current data warehouse approach, integration with the supplier source systems will be a problem, and working on them independently will create many data silos.

We would also restrict ourselves to working only on lagging data, since it is difficult to apply modern statistical analysis such as association rule mining and classification directly on the data warehouse. This would prevent us from tracking user buying and browsing patterns, working with unstructured data, and performing customer segmentation on that data. With the changing dynamics in analytics, we need to shift our existing data warehouse to highly scalable cloud storage such as Amazon S3 and build a data lake for analysis. ETL processing should be replaced with modern MapReduce algorithms or agile, in-memory, open-source data processing frameworks such as Apache Spark or Kafka. Separating storage and compute is essential with such a huge influx of data.
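The shift from ETL to MapReduce-style processing described above can be illustrated with a toy example. Below is a minimal sketch of the map/shuffle/reduce pattern in plain Python; the event records are invented for illustration, and in the proposed system this logic would run on a framework such as Apache Spark over data in Amazon S3 rather than in a single process:

```python
from collections import defaultdict

# Toy clickstream records; in the proposed system these would be
# read from the S3 data lake rather than hard-coded.
events = [
    {"user": "u1", "page": "/shoes"},
    {"user": "u2", "page": "/shoes"},
    {"user": "u1", "page": "/bags"},
    {"user": "u3", "page": "/shoes"},
]

# Map step: emit a (page, 1) pair for every event.
mapped = [(e["page"], 1) for e in events]

# Shuffle/reduce step: group by key and sum the counts per page.
page_views = defaultdict(int)
for page, count in mapped:
    page_views[page] += count

print(dict(page_views))  # {'/shoes': 3, '/bags': 1}
```

On Spark, the same computation would be expressed with `map` and `reduceByKey` over a distributed dataset, which is what allows storage (S3) and compute to scale independently.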
By performing customer segmentation, the following three objectives can be achieved with the implementation of this new analytics system:
1. Track the difference between loyal customers and visitors, and perform heat-map analysis of their browsing patterns.
2. Understand customer demographics and focus on highly profitable segments.
3. Empower our Marketing department to make better strategic decisions on online ads and campaigns.
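As a sketch of objective 1, loyal customers can be separated from visitors with simple rules once the relevant metrics are computed per customer. The field names and thresholds below are illustrative assumptions, not final requirements; the real cutoffs would come out of the segmentation analysis:

```python
def segment_customer(orders_last_year: int, sessions_last_month: int) -> str:
    """Assign a coarse segment label to one customer record.

    Thresholds here are placeholders for illustration only.
    """
    if orders_last_year >= 5:
        return "loyal"
    if orders_last_year >= 1:
        return "occasional"
    if sessions_last_month > 0:
        return "visitor"  # browses but has not bought
    return "inactive"

print(segment_customer(orders_last_year=8, sessions_last_month=2))  # loyal
print(segment_customer(orders_last_year=0, sessions_last_month=4))  # visitor
```

In practice, this rule-based pass would complement the statistical segmentation (e.g. clustering) performed by the analysts.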
End users for our new system would be:
1. Marketing Department users
2. Product Managers
3. Data Analysts
High Level Scope Definition
User Stories and Acceptance Criteria

User story: As an Analyst, I want to load data from the database so that I can analyse it.
Acceptance criteria: Data is available in the database. The analyst has the correct credentials and access level for the database.

User story: As an Analyst, I want to analyse the data so that I can segregate it into different customer segments.
Acceptance criteria: Data is loaded from the database.

User story: As an Analyst, I want to clean the data so that the data is made consistent.
Acceptance criteria: Data is loaded from the database. The data may be structured or unstructured, and either can be cleaned.

User story: As an Analyst, I want to segment the data so that the marketing team can use these segments to lay out different marketing strategies.
Acceptance criteria: The data has enough segments and variety to be broken down. Marketing strategies are created based on the segments identified.

User story: As a Marketing Team member, I want to pull reports based on segments so that I can lay out different marketing strategies.
Acceptance criteria: Data is available per segment for reports to be created. Identified segments can be mapped to different strategies.

User story: As a Marketing Team member, I want to identify different customer segments so that each segment can be handled with a different promotional strategy.
Acceptance criteria: The data has different segments and variety. Identified segments can be mapped to different promotional strategies.
User story: As a Marketing Team member, I want to track campaigns so that I will know which ones have reached their goal.
Acceptance criteria: Data is available for the customers who have interacted with the various campaigns.

User story: As a Marketing Team member, I want to send various promotions to customers so that more customers are acquired.
Acceptance criteria: The marketing team has access to send promotions.

User story: As a Customer, I want to receive promotions so that I can avail of them.
Acceptance criteria: The customer has internet access to receive the various forms of promotions.

User story: As a Customer, I want to interact with the campaigns so that I can accept the promotion.
Acceptance criteria: The customer receives promotions.
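The "clean the data" story above can be sketched as follows. The specific rules shown (trimming whitespace, lower-casing the key, dropping empty and duplicate records) are assumptions about what cleaning will involve, chosen only to illustrate the flow:

```python
def clean_records(records):
    """Normalise raw customer records and drop unusable ones."""
    seen = set()
    cleaned = []
    for rec in records:
        email = rec.get("email", "").strip().lower()
        if not email or email in seen:
            continue  # skip records with no usable key, and exact duplicates
        seen.add(email)
        cleaned.append({"email": email, "name": rec.get("name", "").strip()})
    return cleaned

raw = [
    {"email": " A@x.com ", "name": "Ann "},
    {"email": "a@x.com", "name": "Ann"},   # duplicate after normalisation
    {"email": "", "name": "no key"},       # unusable record
]
print(clean_records(raw))  # [{'email': 'a@x.com', 'name': 'Ann'}]
```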
Use Case Diagram
Use case narratives
Narrative - 1
Use case name: Analyze Data
Last revised: March 13, 2017, by Poojya Reddy and Aditya Kannan
Description (purpose): This use case describes how data is analyzed.
Actors: Analyst
Pre-condition: Data is loaded from the database.
Post-condition: Cleaned data along with customer segments.
Other business rules: none.
Basic success flow:
1. Analyst has access to the data loaded from the database.
2. As part of the data analysis, the analyst first cleans the data.
3. After data cleaning, customer segments are created which can be used to identify different customers.
Variations in success flows:
2. Data loaded from the database is already clean.
3. Data is insufficient to create segments, there are too few data points, or one particular segment dominates the dataset.
Alternate paths (extensions/exceptions):
1. a1 Data is not loaded correctly from the database.
   a2 Analyst cannot access the data.
   b1 Analyst does not have the correct access level to view the data.
   b2 Analyst cannot access the data.
2. a1 Data cleaning fails due to inconsistent data, junk values, too few data points, etc.
3. a1 There are too few data points to create customer segments, or the data set is of only one particular type.
   a2 Use case terminates and needs to be restarted.
Related use cases: Clean Data, Customer Segmentation
Narrative - 2
Use case name: Load Data
Last revised: March 13, 2017, by Divya Bhatia and Siddharth Suresh
Description (purpose): This use case describes how the data required for analysis can be loaded from the database.
Actors: Analyst, AWS system
Pre-condition: An existing database and valid credentials for the analyst.
Post-condition: Data is loaded from the database.
Other business rules: none.
Basic success flow:
1. Analyst logs into the database with valid credentials.
2. Database validates the user's credentials and access type, and allows the analyst to log in.
3. Analyst can view the data and load it into memory (via various data source systems such as CRM, operational systems, and external data providers) to work on it.
Variations in success flows:
1. Credentials can be of various types, such as Administrator, User, or Team access.
3. Connect the database to external sources.
Alternate paths (extensions/exceptions):
1. a1 Credentials entered are incorrect, which does not allow the analyst to log in.
   a2 Loading the database fails.
   a3 Analyst is redirected to the login page.
2. a1 Credentials have a different access level than required, which does not allow the analyst to log in.
   a2 Loading the database fails.
   a3 Analyst is redirected to the login page.
3. a1 Loading the database fails.
   a2 Use case terminates and needs to be restarted.
Related use cases: (none listed)
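A toy walk-through of the load-data flow above, using an in-memory SQLite database in place of the real warehouse and a hard-coded access table in place of the real credential system (both are stand-ins for illustration, not the planned implementation):

```python
import sqlite3

# Stand-in for the real access-control system.
ACCESS_LEVELS = {"analyst1": "read"}

def load_data(user: str):
    """Validate access, then load the customers table into memory."""
    if ACCESS_LEVELS.get(user) != "read":
        # Alternate paths 1 and 2: bad or insufficient credentials.
        raise PermissionError(f"{user} lacks read access")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Delhi"), (2, "Pune")])
    rows = conn.execute("SELECT id, city FROM customers ORDER BY id").fetchall()
    conn.close()
    return rows

print(load_data("analyst1"))  # [(1, 'Delhi'), (2, 'Pune')]
```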
Narrative - 3
Use case name: Identify Segments
Last revised: March 13, 2017, by Aditya Kannan and Aditya Ekawade
Description (purpose): This use case describes how segments can be identified from a marketing perspective.
Actors: Marketing team
Pre-condition: Marketing team has access to reports created by the analyst.
Post-condition: Customer segments identified by the marketing team.
Other business rules: none.
Basic success flow:
1. Marketing team has access to reports created by the analyst.
2. Segments are identified based on the reports created by the analyst.
Variations in success flows:
1. Reports have insufficient data.
2. Data is insufficient to create segments, there are too few data points, or one particular segment dominates the dataset.
Alternate paths (extensions/exceptions):
1. a1 Marketing team does not have access to reports created by the analyst.
   a2 Marketing team cannot access the reports.
2. a1 There are too few data points to create customer segments, or the data set is of only one particular type.
   a2 Use case terminates and needs to be restarted.
Related use cases: (none listed)
Narrative - 4
Use case name: Pull Reports
Last revised: March 13, 2017, by Divya Bhatia and Siddharth Suresh
Description (purpose): This use case describes how the marketing team can pull reports created by the analyst.
Actors: Marketing team, AWS system
Pre-condition: Marketing team has access to reports created by the analyst.
Post-condition: Reports can be viewed by the marketing team.
Other business rules: none.
Basic success flow:
1. Marketing team has access to reports created by the analyst.
2. Marketing team can view and make edits to the reports.
3. Data for the reports is pulled from the AWS system.
Variations in success flows:
1. Reports have no data.
Alternate paths (extensions/exceptions):
1. a1 Marketing team does not have access to reports created by the analyst.
   a2 Marketing team cannot access the reports.
2. a1 Marketing team cannot make edits or use filters on the reports.
   a2 Use case terminates and needs to be restarted.
3. a1 AWS system is down and data cannot be pulled.
   a2 Use case terminates and needs to be restarted.
Related use cases: (none listed)
Narrative -5
Use case name (should describe the goal- active
verb)
Interacts with campaign
Last revised March 13, 2017 by Divya Bhatia
March 13, 2017 by Poojya Reddy
Description (purpose) This use case describes the
interaction of customer with a
campaign.
Actors (that could invoke use case) Customer
Pre-condition Customer received a promotion from
the marketing team.
Post-condition Customer interacted with the
promotion.
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team sends promotions to the customer.
2. Customer responds to the promotion.
3. The marketing team tracks the customer's interaction with the promotion and compares it
against the campaign goals set by the team.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. Customer does not respond to the promotion.
2. Marketing team sends multiple promotions to the same customer.
Alternate paths (Extensions/ Exceptions)
2. a1 Customer does not interact with the promotions sent.
   a2 Use case terminates.
3. a1 No interaction by the user results in no data generation, hence the marketing team
cannot track the campaign.
List Related use case names Track Campaigns
Narrative -6
Use case name (should describe the goal- active
verb)
Send Promotions
Last revised March 13, 2017 by Siddharth Suresh
March 13, 2017 by Aditya Ekawade
Description (purpose) This use case describes the types of
promotions the marketing team sends.
Actors (that could invoke use case) Marketing team
Pre-condition Marketing team has access to send
promotions.
Post-condition Marketing team sends promotions.
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team sends various forms of promotions, such as emails, loyalty programs,
coupons, social media ads, and paid ads.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. Team sends only emails or loyalty program promotion to the customer.
2. Team sends coupons and media ads to the user based on interactions with the
campaigns.
Alternate paths (Extensions/ Exceptions)
1. a1 Marketing team is unable to gather any data about customers and no promotions
are sent.
a2. Use case terminates.
List Related use case names Email marketing
Loyalty program
Send Coupon
Social Media
Display/Paid Ads
Narrative -7
Use case name (should describe the goal- active
verb)
Track Campaigns
Last revised March 13, 2017 by Poojya Reddy
March 13, 2017 by Aditya Ekawade
Description (purpose) This use case describes how
marketing team can track campaigns.
Actors (that could invoke use case) Marketing team
Pre-condition NA
Post-condition Marketing team could successfully
track campaigns
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team tracks each campaign that users interact with.
2. Tracked campaigns are compared against the goals set for the campaign.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. No user interacts with the campaign.
Alternate paths (Extensions/ Exceptions)
1. a1 There is no data to track and compare with the expected goals as no user interacts
with the campaign.
   a2. Use case terminates.
2. a1 There are no expected goals for comparison.
List Related use case names Interacts with campaigns
Goals completed
Narrative -8
Use case name (should describe the goal-
active verb)
Send Coupons
Last revised March 13, 2017 by Divya Bhatia
March 13, 2017 by Aditya Kannan
Description (purpose) This use case describes the interaction of
the marketing team and the customer with a coupon.
Actors (that could invoke use case) Marketing team,Customer
Pre-condition Marketing team has access to send
promotions; customer can receive
promotions.
Post-condition Marketing team sends promotions via
coupons.
Other business rules (if any)
Basic success flow (number lines, say what info passes between actor and system from
trigger to end)
1. Marketing team sends promotions via coupons.
2. Customer responds to the promotional coupon, either by using it or by requesting updates
about it.
3. The marketing team tracks the customer's interaction with the coupon and compares it
against the campaign goals set by the team.
Variations in success flows (list variations in the main flow that also lead to successful
accomplishment of use case goals)
1. Customer does not respond to the promotional coupon.
2. Marketing team sends multiple promotions to the same customer.
Alternate paths (Extensions/ Exceptions)
1. a1 Marketing team does not send any promotions.
a2. Use case terminates.
2. a1 Customer does not interact with the promotions sent.
a2 Use case terminates.
List Related use case names Send Promotions
Project Plan
Work Breakdown Structure
A WBS is a hierarchical and incremental decomposition of the project into phases, deliverables,
and work packages. It is a tree structure showing the subdivision of effort required to achieve
an objective, for example a program, project, or contract. [2] In a project or contract, the WBS is
developed by starting with the end objective and successively subdividing it into components
that are manageable in terms of size, duration, and responsibility (e.g., systems, subsystems,
components, tasks, subtasks, and work packages), which together include all steps necessary
to achieve the objective.
The diagram below shows the WBS of the entire customer segmentation project. The project is
divided into 5 modules:
1. Customer Survey
2. Create E-Commerce Website
3. Set Hadoop Environment
4. Data Engineering
5. Analyze Data & Reporting
Customer Survey: The main focus of this module is to prepare, send, and analyze
questionnaires for potential customers. The questionnaires are designed to capture the
demographics of respondents and the types of devices they use. The purpose of this phase is to
use this data to estimate the success rate of reaching potential customers with
targeted promotions.
Create E-Commerce Website: This module includes searching for and acquiring an
e-commerce website template that is readily available in the market, deciding between cloud
and web hosting (web hosting was chosen for our project), purchasing a web domain, installing
the e-commerce template on the server, getting the website up and running, and finally
generating the website logs.
Set Hadoop Environment: The operations during this phase include creating login credentials
in AWS, purchasing the EMR and S3 services, installing the necessary software on EC2, and
finally testing the Hadoop clusters.
Data Engineering: The Data Engineering phase is responsible for ingesting the log data
from the web server into the EMR node clusters, converting the unstructured data
into structured data using MapReduce, and storing the structured data in a
relational database.
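The map step of this conversion can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the web server emits Apache combined-log-format lines, and the field names and sample line below are hypothetical.

```python
import re

# Assumed log layout: Apache combined log format (an assumption, not confirmed
# by the project); the sample line below is fabricated for illustration.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def map_log_line(line):
    """Map step: turn one unstructured log line into a structured record."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed lines are dropped by the mapper
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    return record

sample = ('203.0.113.7 - - [13/Mar/2017:10:12:01 -0600] '
          '"GET /products/shoes HTTP/1.1" 200 5120')
print(map_log_line(sample))
```

In an actual EMR job the same parsing logic would run inside the mapper, with the reducer aggregating the structured records before they are written to the relational database.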
Analyze Data & Reporting: This is the final phase of the project, which helps the marketing team
create targeted promotions. The data is loaded from the relational database for the analysts to
perform data analysis and identify the various customer segments. The identified customer
segments are provided to the marketing team in the form of reports. The marketing team then
performs its own analysis and comes up with campaign strategies and targeted promotions.
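As an illustration of the segmentation step, the sketch below clusters customers on a single behavioral feature with a minimal one-dimensional k-means. The feature (monthly spend) and its values are hypothetical; a production analysis would use richer features and a library implementation.

```python
def kmeans_1d(values, k, iterations=20):
    """Minimal 1-D k-means (assumes k >= 2): returns centroids and clusters."""
    values = sorted(values)
    # Seed centroids with evenly spaced points from the sorted data.
    centroids = [values[i * (len(values) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spend per customer: a low-spend and a high-spend segment.
spend = [12, 15, 18, 200, 210, 220, 14, 205]
centroids, clusters = kmeans_1d(spend, k=2)
print(centroids)  # one centroid per identified segment
```

Each cluster then becomes a customer segment that the analyst can describe in a report for the marketing team.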
GANTT Chart
A GANTT chart is a good way to keep track of the various activities undertaken during the
project. However, we are restricting our chart to the planning phase only, which is the entire
endeavor of the class project.
CoCoMo Estimation
Based on the definitions of each of the development modes, we have classified our
project as a semi-detached project: a software project that is intermediate in
both size and complexity. Our team consists of individuals with mixed experience levels,
and our project deals with a mix of rigid and less-than-rigid requirements.
The equations for the Effort (E) and Development Time (D) in this model are:
E = 3.0 * (KLOC)^1.12    D = 2.5 * (E)^0.35
Simple Average Complex
Inputs: Member Login 3 6
Member registration 3
Outputs: Send Promotions 4 4
Inquiries: Pull reports 3 37
Analyze Data 10
Identify Segments 8
Track Campaigns 8
Interacts with campaigns 8
Files: Reports 8 8
Interfaces: Application server to database 10 20
User to application server 10
Total 75
Calculating the Adjusted Function Points -
The adjusted function point count, denoted FP, is given by the formula:
FP = total UFP * (0.65 + (0.01 * Total complexity adjustment value)), or
FP = total UFP * (Complexity adjustment factor)
The total complexity adjustment value is computed from responses to questions called
complexity weighting factors, shown in the table below:
Table: Adjusted Function Points
Number  Complexity Weighting Factor               Value
1       Backup and recovery                       2
2       Data communications                       2
3       Distributed processing                    2
4       Performance critical                      5
5       Existing operating environment            4
6       Online Data Entry                         3
7       Input transaction over multiple screens   1
8       Master files updated online               3
9       Information domain values complex         5
10      Internal processing complex               4
11      Code designed for reuse                   5
12      Software Deployment                       4
13      Application designed for change           4
Total complexity adjustment value                 44
Calculating the Source Lines of Code (SLOC) -
· Total Unadjusted Function Points (UFP) = 75
· Product Complexity Adjustment (PC) = 0.65 + (0.01 * 44) = 1.09
· Total Adjusted Function Points (FP) = UFP * PC = 75 * 1.09 = 81.75
· Language Factor (LF) for the programming languages used, assumed as = 25
· Source Lines of Code (SLOC) = FP * LF = 81.75 * 25 = 2043.75
Estimating the Effort and Development Time -
The programmer productivity and the development time are as follows:
· KDSI = 2.044 KLOC
· Effort = 3 * (2.044)^1.12 = 6.68 person-months
· Development Time = 2.5 * (6.68)^0.35 = 4.86 months
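The full estimation chain can be checked with a short script. It applies the standard adjustment formula VAF = 0.65 + 0.01 × (total complexity adjustment value), which for a value of 44 works out to 1.09, together with the semi-detached COCOMO coefficients quoted above.

```python
def cocomo_semidetached(ufp, tca, language_factor):
    """COCOMO effort/schedule estimate from function points (semi-detached mode)."""
    vaf = 0.65 + 0.01 * tca        # value adjustment factor
    fp = ufp * vaf                 # adjusted function points
    sloc = fp * language_factor    # estimated source lines of code
    kloc = sloc / 1000.0
    effort = 3.0 * kloc ** 1.12    # effort in person-months
    duration = 2.5 * effort ** 0.35  # development time in months
    return fp, sloc, effort, duration

fp, sloc, effort, duration = cocomo_semidetached(ufp=75, tca=44, language_factor=25)
print(f"FP={fp:.2f} SLOC={sloc:.0f} Effort={effort:.2f} pm Time={duration:.2f} months")
```

The language factor of 25 is the project's own assumption for the mix of languages used; swapping in a different factor only rescales the SLOC and effort figures.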
Burndown Chart
After understanding the scope of the project, we estimated the deliverables of the class project
to be equivalent to 90 hours of work, with 2 hours of work to be completed daily, thereby
completing the project in 45 days.
The burndown chart below shows the rate of work completed from inception to completion.
CRUD Matrix:

Entities \ Processes   Load Data   Perform Data Analysis   Build Customer Segmentation Dashboard   Build Strategy System
Data Lake              R           R                       R                                       R
Reports                R           CRUD                    CRUD                                    RU
Campaign Log file      R           R                       R                                       CRUD
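The matrix can also be expressed as a small lookup structure, which makes the access rules easy to enforce in code. The process and entity names come from the matrix above; the helper function itself is an illustrative sketch, not part of the designed system.

```python
# CRUD matrix from the design: entity -> process -> allowed operations.
CRUD_MATRIX = {
    "Data Lake": {
        "Load Data": "R", "Perform Data Analysis": "R",
        "Build Customer Segmentation Dashboard": "R", "Build Strategy System": "R",
    },
    "Reports": {
        "Load Data": "R", "Perform Data Analysis": "CRUD",
        "Build Customer Segmentation Dashboard": "CRUD", "Build Strategy System": "RU",
    },
    "Campaign Log file": {
        "Load Data": "R", "Perform Data Analysis": "R",
        "Build Customer Segmentation Dashboard": "R", "Build Strategy System": "CRUD",
    },
}

def may(process, entity, operation):
    """True if the process may apply the C/R/U/D operation to the entity."""
    return operation in CRUD_MATRIX[entity][process]

print(may("Perform Data Analysis", "Reports", "U"))  # analysts may update reports
print(may("Load Data", "Reports", "D"))              # the loader may not delete them
```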
Buy vs Build Analysis
For our project, we need 4 machines, each with a minimum of 8 GB RAM, running to process our
website logs. An in-house cluster setup would increase maintenance costs, and for big data
processing scalability is the biggest worry, as we never know the size of the incoming data. So
after careful analysis and meetings with the current IT systems team and stakeholders, we have
decided to go ahead with the buy option.
Amazon Web Services (AWS) offers EMR (Elastic MapReduce), an on-cloud Hadoop framework
for processing vast amounts of data in a cost-effective and fast way. EMR provides an option
to scale nodes and clusters dynamically. AWS also offers 99.99% uptime, and any cluster can
be spun up in under 2 minutes. We calculated the estimated cost of using the EC2 and EMR
services with the AWS calculator; the cost is around $60 per month. Below is the snapshot from
the AWS calculator.
Further, if we need to separate computing and storage as we progress with big data, we can opt
for Amazon S3 for cloud storage and create a data pipeline between S3 and Amazon EMR.
The cost of using S3, as per the AWS calculator, is $266 for storing 10 TB of data.
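The S3 figure can be sanity-checked with simple arithmetic. The per-GB rate below is an assumption based on 2017-era S3 standard-storage pricing; actual AWS rates vary by region and date.

```python
# Assumed rate: S3 standard storage, ~$0.026 per GB-month (2017-era, region-dependent).
RATE_PER_GB_MONTH = 0.026

def s3_monthly_cost(terabytes, rate=RATE_PER_GB_MONTH):
    """Estimated monthly S3 storage cost for the given data volume."""
    return terabytes * 1024 * rate  # 1 TB = 1024 GB

print(f"${s3_monthly_cost(10):.2f}")  # roughly $266 for 10 TB, matching the calculator
```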
Design and Prototype Document
Architecture/ Platform Choices
1. The above diagram depicts our ‘to-be’ system for applying customer segmentation.
2. The process would start by generating the logs from our website (trendzzz4u.com).
The logs would consist of clickstream data and browsing data.
3. Using the logs generated, the data would be ingested into the AWS cloud for data
processing.
4. AWS would be the infrastructure platform for deploying, processing, and applying
analytics on the log data.
5. Unstructured data would be converted to a structured format for data analysis.
Data Storage Platform:
1. Amazon S3.
2. Amazon EFS
3. MySQL DB Instance
Data Processing Platform:
Amazon EMR: A comprehensive Hadoop package provided by Amazon, consisting of
Hive, Sqoop, Flume, MapReduce, and HBase. This is the main processing engine for our
application; the business logic would reside here.
Mock-ups
The diagrams below depict the mock-up screens of the dashboards for the Analyst and the
Marketer. The diagrams cover the following use cases:
● Analyzing data and creating reports by the Analyst.
● Pulling the reports, sending promotions, and tracking the campaigns by the Marketer.
These UI mock-ups were designed using Adobe Experience Design (XD), focusing on the
principles of utility and usability.
The dashboards will be created in such a way that the Analysts and Marketers can spend more
time doing what they do best and less time learning these interfaces.
Mockup screen for data analysis.
References:
1. For a general understanding of all concepts - Dr. Ramachandran, Vandana, all the lecture
slides.
2. For all references regarding services offered by Amazon -
https://aws.amazon.com/ , accessed February 10, February 17, March 13, March 14, and March 15, 2017.
3. To understand the writing style in the executive summary - Faulkner, Jennifer, published
September 17, 2015, https://www.proposify.biz/blog/executive-summary , accessed March 18,
2017.
4. To estimate CoCoMo -
http://people.cs.ksu.edu/~padmaja/Project/CostEstimate.htm , accessed March 19, 2017.
5. For use case narratives and high-level scope definition -
Dr. Ramachandran, Vandana, s3_IS6410-Requirements.pptx, 23rd January 2017.
Tools used:
6. For all diagrams (use case, ERDs, DFDs, software architecture, WBS) -
https://www.lucidchart.com/documents#docs?folder_id=home&browser=icon&sort=saved-desc
7. For creating UI mockups -
the design for the header on the Analyst's dashboard is based on Power BI
( https://powerbi.microsoft.com/en-us/ ), and the software used was Adobe XD.