Resume-Vivek Mohan (BI & Analytics Enterprise Architect) - Looking for an opportunity.
Digital transformation leader with an insightful career of 16.5 years. Over the years, I have built up expertise in Business Analysis, Enterprise Architecture, Enterprise Business Intelligence, Data Visualization, Data Warehouse Management, Big Data Analytics, Advanced Analytics, Machine Learning, Artificial Intelligence, IoT Analytics, Big Data Processing, Data Cataloguing and Journey Analytics. During my corporate career, I have been associated with reputed industry leaders such as British Telecom, Accenture, ZS Associates, Infosys, Goldman Sachs, Tesco and Wipro Technologies.
I possess a verifiable track record of heading multi-million-dollar, complex software/application development projects and organizations' Strategy & Architecture groups, setting up CoEs (Centers of Excellence), designing BI solutions, project/program management (SDLC), vendor management, capability building, coaching and mentoring team members, and stakeholder management. I specialize in driving innovation, capitalizing on an analytical, 'big picture' perspective to exceed objectives and deliver results. I am known for developing sustainable growth and expansion plans, forming strategic partnerships and maintaining strong relationships.
With strong clarity of purpose and an objective, analytical approach, I have built an exceptionally consistent and committed track record of excelling in every assignment undertaken. I am driven by strong personal values of honesty, integrity and sincerity, and I can perform in demanding work environments with a strong work ethic.
Active speaker at various events. Currently heading the COE - BI & Analytics, Enterprise Architecture & Data Exploitation - Data & Analytics in BT.
Technical proficiency –
SAP BO, MicroStrategy, OBIEE, ODV, QlikView, Xcelsius, SAS, SPSS, RoamBI, Big Data, Hadoop, Tableau, Spotfire, Pentaho, SAP Lumira, Informatica, Alteryx, ZoomData, IBM Watson, ThoughtSpot, AtScale, Trifacta, Alation, Dataiku, RapidMiner, H2O, Datameer, Clickfox, Adobe Analytics, Zaloni, Alpine Data Labs, Kyvos Insights & Stratifyd.
Big Data Analytics for Banking, a Point of View – Pietro Leo
This document discusses how big data and analytics can transform the banking industry. It notes that digital transformation, enabled by big data and analytics, is creating pressures on banks from new digital native customers, large amounts of new data, new channels like mobile, and new competitors. It argues that to succeed in this new environment, banks need to build a 360-degree integrated customer view using big data, and ensure analytics are part of closed-loop business processes to create value. New applications and platforms like IBM Watson Analytics aim to make analytics more accessible and valuable to more users.
Using the Students Performance in Exams dataset, we try to understand what affects exam scores. The data is limited, but it lends itself to good visualizations for spotting relationships. We first explore the data and then apply the Naive Bayes classification technique for evaluation purposes.
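As a rough illustration of that workflow, here is a hedged sketch that binarizes the math score and fits a Naive Bayes classifier with scikit-learn; the file name and column names ("gender", "test preparation course", and so on) are assumptions based on the public Kaggle dataset, not taken from the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("StudentsPerformance.csv")   # assumed file name (Kaggle dataset)

# Target: did the student reach a passing math score (threshold chosen for illustration)?
y = (df["math score"] >= 60).astype(int)

# One-hot encode the categorical predictors
features = ["gender", "race/ethnicity", "parental level of education",
            "lunch", "test preparation course"]
X = pd.get_dummies(df[features])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = BernoulliNB().fit(X_train, y_train)
print("Accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))
```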
DAX and Power BI Training - 001 Overview – Will Harvey
Course & Power BI Overview: This is the first session in a course that primarily focuses on DAX and PowerPivot, but also teaches the surrounding tools such as Power Query, Power BI Desktop and PowerBI.com.
Fraud is a crime and civil violation involving intentional deception for personal gain, costing the global healthcare system an estimated $415 billion annually. Data analytics can help detect fraud by identifying anomalies such as duplicate claims, outliers in age- or gender-specific procedures, and physicians billing outside their specialty. Singapore suspended two dental clinics for repeatedly breaching rules through non-matching claims and procedures that were not performed. Data analytics provides a defense against fraud through monitoring for irregularities.
Business intelligence and data warehouses – Dhani Ahmad
This chapter discusses business intelligence and data warehouses. It covers how operational data differs from decision support data, the components of a data warehouse including facts, dimensions and star schemas, and how online analytical processing (OLAP) and SQL extensions support analysis of multidimensional decision support data. The chapter also discusses data mining, requirements for decision support databases, and considerations for implementing a successful data warehouse project.
This document summarizes the history of big data from 1944 to 2013. It outlines key milestones such as the first use of the term "big data" in 1997, the growth of internet traffic in the late 1990s, Doug Laney coining the three V's of big data in 2001, and the focus of big data professionals shifting from IT to business functions that utilize data in 2013. The document serves to illustrate how data storage and analysis have evolved over time due to technological advances and changing needs.
Data analytics refers to the broad field of using data and tools to make business decisions, while data analysis is a subset that refers to specific actions within the analytics process. Data analysis involves collecting, manipulating, and examining past data to gain insights, while data analytics takes the analyzed data and works with it in a meaningful way to inform business decisions and identify new opportunities. Both are important, with data analysis providing understanding of what happened in the past and data analytics enabling predictions about what will happen in the future.
This document discusses connecting Oracle Analytics Cloud (OAC) Essbase data to Microsoft Power BI. It provides an overview of Power BI and OAC, describes various methods for connecting the two including using a REST API and exporting data to Excel or CSV files, and demonstrates some visualization capabilities in Power BI including trends over time. Key lessons learned are that data can be accessed across tools through various connections, analytics concepts are often similar between tools, and while partnerships exist between Microsoft and Oracle, integration between specific products like Power BI and OAC is still limited.
This document summarizes a presentation on data discovery. It discusses key concepts in data discovery including data source joining, ontologies and taxonomies, rules of data discovery, single source of truth, and data visualization. It emphasizes the importance of not discarding original data and keeping track of the data transformation process to maintain data provenance and lineage. Overall, the presentation aims to illuminate how to understand and work with data through concepts in data discovery.
Search: Probabilistic Information Retrieval – Vipul Munot
Probabilistic Information Retrieval uses probability rankings to effectively retrieve documents. It assumes binary relevance and independence between documents. The Binary Independence Model represents documents and queries as term vectors and estimates probabilities of relevance using term frequencies. Documents are ranked by their odds of relevance based on query term matches. In practice, probability estimates use collection frequencies. Extensions allow dependencies between terms and non-binary representations.
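To make the ranking concrete, here is a minimal, self-contained sketch of the Binary Independence Model's retrieval status value; the toy documents and the standard smoothing of collection frequencies are illustrative choices, not taken from the slides.

```python
import math
from collections import Counter

docs = {
    "d1": {"big", "data", "analytics", "banking"},
    "d2": {"power", "bi", "dax", "model"},
    "d3": {"data", "mining", "association", "rules"},
    "d4": {"hadoop", "cluster", "storage"},
    "d5": {"healthcare", "readmission", "model"},
}
query = {"data", "analytics"}
N = len(docs)

# Document frequency of each term across the collection
df = Counter(t for terms in docs.values() for t in terms)

def rsv(doc_terms):
    """Retrieval status value: sum of log odds-ratio weights over matching query terms."""
    score = 0.0
    for t in query & doc_terms:
        u = (df[t] + 0.5) / (N + 1)   # P(term | non-relevant), smoothed from collection frequency
        p = 0.5                       # P(term | relevant) with no relevance information
        score += math.log((p * (1 - u)) / (u * (1 - p)))
    return score

for d, terms in sorted(docs.items(), key=lambda kv: rsv(kv[1]), reverse=True):
    print(d, round(rsv(terms), 3))
```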
This document discusses big data and use cases. It begins by reviewing the history and evolution of big data and advanced analytics. It then explains how technologies like Hadoop, stream processing, and in-memory computing support big data solutions. The document presents two use cases - analyzing credit risk by examining customer transaction data to improve credit offers, and detecting fraud by analyzing financial transactions for unusual patterns that could indicate suspicious activity. It describes how these use cases leverage technologies like Oracle R Connector for Hadoop to run analytics and machine learning algorithms on large datasets.
Power BI can be used either through Power BI Desktop or Power BI Embedded. Power BI Desktop is a free desktop application that allows connecting to various data sources and creating visual analytics. Power BI Embedded allows integrating Power BI visualizations into web and mobile applications. Reports in Power BI combine visuals and filters to analyze data, while dashboards combine multiple reports. Filters and slicers allow filtering the data in visuals. Authentication is handled through Azure Active Directory, while access is controlled using various token types.
This document discusses how big data can be used in the healthcare sector to improve outcomes and reduce costs. It begins by defining big data and describing how large corporations have been using big data for years. It then draws a parallel between how big data helped answer what advertising worked for companies like Google, and how big data can help determine which medical treatments are effective. The document outlines some key characteristics of big data in healthcare, such as different types of data silos and the 4 Vs of big data. It also discusses drivers for adoption of big data in healthcare and provides examples of how big data can enable quality improvement and cost cutting. Challenges to adoption are outlined, as well as some leading big data companies in healthcare.
Data pre-processing is a data mining technique that involves transforming raw data into an understandable format. Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.
This presentation covers data cleaning and pre-processing.
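A minimal pandas sketch of the cleaning steps described above; the sample records and column names are illustrative, not from the presentation.

```python
import numpy as np
import pandas as pd

# Tiny messy sample standing in for a raw extract
raw = pd.DataFrame({
    "patient_id": [101, 101, 102, None, 104],
    "age": ["34", "34", "220", "41", None],
    "gender": [" Male", "male", "FEMALE", "F ", "female"],
    "visit_date": ["2023-01-05", "2023-01-05", "2023-01-09", "not recorded", "2023-02-11"],
})

df = raw.drop_duplicates()                                   # remove exact duplicate records
df = df.dropna(subset=["patient_id"])                        # drop rows missing the key
df["age"] = pd.to_numeric(df["age"], errors="coerce")        # coerce bad entries to NaN
df.loc[~df["age"].between(0, 120), "age"] = np.nan           # flag implausible values
df["age"] = df["age"].fillna(df["age"].median())             # impute remaining missing ages
df["gender"] = df["gender"].str.strip().str.lower().str[0]   # normalize inconsistent text
df["visit_date"] = pd.to_datetime(df["visit_date"], errors="coerce")

print(df)
```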
Columnar databases store data by column rather than by row. This allows for faster analytical queries on large datasets by minimizing the movement of read/write heads across disk drives. Columnar databases are well-suited for data warehousing and business intelligence tasks that require aggregating large amounts of data, while row-oriented databases are better for transactional applications that involve frequent updates to small subsets of data.
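A toy illustration of the row-versus-column layouts described above; real columnar engines add compression and vectorized scans on top of this, so the snippet only shows why an aggregate touches less data in a columnar layout.

```python
# Row store: one record per row, all attributes kept together
row_store = [
    {"order_id": 1, "customer": "A", "amount": 120.0},
    {"order_id": 2, "customer": "B", "amount": 80.0},
    {"order_id": 3, "customer": "A", "amount": 42.5},
]

# Column store: the same data laid out as one contiguous array per attribute
column_store = {
    "order_id": [1, 2, 3],
    "customer": ["A", "B", "A"],
    "amount":   [120.0, 80.0, 42.5],
}

# Analytical query: total revenue.
# Row store: every whole record is visited just to read one field.
print(sum(rec["amount"] for rec in row_store))
# Column store: only the "amount" column is scanned.
print(sum(column_store["amount"]))
```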
Use Case of Migration from MicroStrategy to Power BI – GreenM
At Data Monsters, Ruslan Zolotukhin, Power BI Trainer & Consultant, described use cases from migrating reports out of MicroStrategy into Power BI, talked through the migration strategy, and covered the MicroStrategy features that scare every BI developer.
The document summarizes the founding and goals of Practo Ray, a healthcare technology startup founded in 2008. The startup was founded by two engineering students with the goal of revolutionizing the healthcare industry by integrating it with information technology. Specifically, their goals were to provide easy software solutions for doctors' practices, help patients make informed choices about doctors, and provide an advertising platform for medical practitioners. To achieve this, Practo Ray developed patient management software using a Software as a Service business model, with features like online appointment booking, a patient database, doctors' schedules, patient medical histories, and a billing system.
By leveraging Big Data, the healthcare industry has an incredible potential to improve lives. This session will give examples of how data volume, velocity and variety are transforming the “art” of a doctor into the science of care. It will describe how the use of machine learning and massive amounts of data will drive the new consumer-driven healthcare movement.
Chapter - 5 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber – error007
The document discusses Chapter 5 from the book "Data Mining: Concepts and Techniques" which covers frequent pattern mining, association rule mining, and correlation analysis. It provides an overview of basic concepts such as frequent patterns and association rules. It also describes efficient algorithms for mining frequent itemsets such as Apriori and FP-growth, and discusses challenges and improvements to frequent pattern mining.
This document discusses the importance of data modeling in Power BI and provides an agenda for a workshop on data modeling techniques. The agenda includes explaining why data modeling is important, relationships, cross filtering, many-to-many relationships, role playing dimensions, and using a calendar table. The workshop aims to explore the features of data modeling in Power BI and enable users to navigate data without having to rewrite queries each time.
Disclaimer:
The images, company, product and service names used in this presentation are for illustration purposes only. All trademarks and registered trademarks are the property of their respective owners.
Data and images were collected from various sources on the Internet.
The intention was to present the big picture of Big Data & Hadoop.
This document outlines a course on knowledge acquisition in decision making, including the course objectives of introducing data mining techniques and enhancing skills in applying tools like SAS Enterprise Miner and WEKA to solve problems. The course content is described, covering topics like the knowledge discovery process, predictive and descriptive modeling, and a project presentation. Evaluation includes assignments, case studies, and a final exam.
Fraud detection is a popular application of machine learning, but it is not as obvious or as common as it seems. I'll describe how QuantUp implemented it for the WARTA insurance company (a subsidiary of Talanx International AG).
The models developed reduced losses by between 10% and 30%. The project was not a simple one because of the complex claims-handling process and the very rich dataset involved. The tools applied were R (modeling) and DataWalk (data preparation). You will learn what is important in developing such solutions in general, what was difficult in this particular project, and how to overcome similar difficulties in comparable projects.
The document discusses machine learning techniques for big data, including:
1) Various machine learning models like decision trees, linear models, neural networks and their assumptions.
2) Applications of machine learning like predictive modeling, clustering, personalization and optimization.
3) Key aspects of building machine learning systems like feature selection, model selection, evaluation and continuous adaptation.
Data Mining: Concepts and Techniques — Chapter 2 – Salah Amean
The presentation contains the following:
- Data Objects and Attribute Types.
- Basic Statistical Descriptions of Data.
- Data Visualization.
- Measuring Data Similarity and Dissimilarity.
- Summary.
Association analysis is used to uncover relationships between data items by identifying frequent patterns and association rules. The Apriori algorithm is a two-step process used for association rule mining: 1) find frequent itemsets that satisfy a minimum support threshold, and 2) generate strong association rules from the frequent itemsets that meet minimum support and confidence thresholds. Practical issues like level of data aggregation and appropriate support/confidence levels must be considered.
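A compact from-scratch sketch of those two steps follows; the toy transactions and thresholds are illustrative, and production work would typically use a library implementation (for example mlxtend or Spark MLlib) rather than this simplified candidate generation.

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support, min_confidence = 0.6, 0.8
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

# Step 1: level-wise search for frequent itemsets
items = sorted({i for t in transactions for i in t})
frequent, level = {}, [frozenset([i]) for i in items]
while level:
    level = [s for s in level if support(s) >= min_support]
    frequent.update({s: support(s) for s in level})
    # simplified candidate generation: join frequent k-itemsets into (k+1)-itemsets
    level = list({a | b for a in level for b in level if len(a | b) == len(a) + 1})

# Step 2: strong rules generated from each frequent itemset
for itemset in (s for s in frequent if len(s) > 1):
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            confidence = frequent[itemset] / support(antecedent)
            if confidence >= min_confidence:
                print(set(antecedent), "->", set(itemset - antecedent),
                      f"support={frequent[itemset]:.2f}, confidence={confidence:.2f}")
```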
Predicting Hospital Readmission Using TreeNet – Salford Systems
This document discusses using a TreeNet model to predict hospital readmissions using data from electronic medical records (EMRs). It provides an overview of the dataset used, which includes over 1,000 clinical and administrative variables for 1,612 heart failure patients. The document describes how TreeNet was used to create a predictive model from the EMR data, including feature selection and parameter tuning. It also discusses evaluating the model, assessing variable importance, and exploring model results to gain clinical insights. The goal is to develop an accurate model that can help reduce the risk of avoidable hospital readmissions.
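TreeNet is Salford Systems' stochastic gradient boosting implementation; the sketch below approximates the same idea with scikit-learn's gradient boosting and its variable-importance scores, using synthetic data with invented column names rather than the study's actual EMR fields.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the EMR extract (all variables and effect sizes are invented)
rng = np.random.default_rng(1)
n = 1600
emr = pd.DataFrame({
    "age": rng.integers(40, 95, n),
    "ejection_fraction": rng.normal(40, 10, n),
    "bnp": rng.lognormal(6, 1, n),
    "creatinine": rng.normal(1.3, 0.5, n),
    "length_of_stay": rng.integers(1, 15, n),
})
logit = -4 + 0.02 * emr["age"] - 0.03 * emr["ejection_fraction"] + 0.1 * emr["length_of_stay"]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # simulated 30-day readmission flag

X_tr, X_te, y_tr, y_te = train_test_split(emr, y, test_size=0.3, random_state=1, stratify=y)

# Stochastic gradient boosting: subsample < 1.0 mirrors TreeNet's random row sampling
gbm = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05, max_depth=3,
                                 subsample=0.8, random_state=1).fit(X_tr, y_tr)

print("AUC:", round(roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]), 3))
# Rough analogue of the variable-importance assessment mentioned above
print(pd.Series(gbm.feature_importances_, index=emr.columns).sort_values(ascending=False))
```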
Classifying Readmissions of Diabetic Patient Encounters – Mayur Srinivasan
Readmission rates in hospitals are a key indicator of the quality of patient care and a clear indication of the total cost or inconvenience related to treatment. Patients with serious medical conditions such as diabetes mellitus are key drivers of readmission rates owing to the complexity of their illness. Therefore, being able to predict from certain features whether or not a patient will need readmission can help doctors and hospitals provide better care initially and avoid financial penalties under Obamacare's readmission policy.
Applied Multivariable Modeling in Public Health: Use of CART and Logistic Reg... – Salford Systems
This document discusses using logistic regression and classification and regression tree (CART) analysis to develop models for predicting undiagnosed HIV infection. The models were developed using data from over 10,000 patients. Logistic regression was used to create the Denver HIV Risk Score based on demographics, sexual behaviors, and other risk factors. CART analysis was then used to develop a decision tree to classify patients into risk groups. Both methods showed good ability to predict undiagnosed HIV. Future work includes external validation of the decision tree and comparing screening approaches.
This document provides a summary of a project to analyze factors related to readmission of diabetes patients using a dataset from 130 US hospitals. The team cleaned the data by removing attributes with high percentages of missing values, irrelevant attributes, and instances of deceased patients. They applied the SMOTE technique to address data imbalance, oversampling the minority readmission class by 200%. Three classifiers - J48 decision tree, Naive Bayes, and Bayes Net - were selected for experiments to predict patient readmission.
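As a hedged sketch of the oversampling step described above: Weka-style "200%" SMOTE adds two synthetic samples per minority sample, i.e. it triples the minority class. The snippet reproduces that with imbalanced-learn on synthetic data; the features and labels are generated, not the hospital dataset.

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Toy imbalanced data standing in for the readmission dataset
X, y = make_classification(n_samples=5000, n_features=10, weights=[0.9, 0.1], random_state=7)
print("Before:", Counter(y))

counts = Counter(y)
minority = min(counts, key=counts.get)
# Request 3x the original minority count, i.e. +200% synthetic samples
X_res, y_res = SMOTE(sampling_strategy={minority: counts[minority] * 3},
                     random_state=7).fit_resample(X, y)
print("After: ", Counter(y_res))
```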
This document summarizes an analysis of customer reviews on Yelp for 212 Starbucks stores in Manhattan. Key findings include that the average rating was 2.8 stars, with some stores receiving mainly 1-2 star reviews. The analysis identified factors correlated with lower ratings, such as comment length and store location. It also found specific Manhattan neighborhoods tended to have higher complaints about issues like product, service, wait times and environment. Recommendations focused on hiring more staff, improving service and addressing top complaints on a neighborhood level.
Adopt clustering and sentiment analysis to find out which topics Jeb Bush cared about most, based on his email contents from 1999-2006, and compare them with voters' opinions from Twitter.
Use data mining techniques (Decision Tree, Bootstrap Forest, Boosted Tree, Neural Network, Nominal Logistic Regression) to predict the winning probability for each team. Accuracy in 2015 (through 3/26) is 73%.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help boost feelings of calmness, happiness and focus.
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE – Chaoyi WU
This document describes a study that uses multiple linear regression to model the rate of uninsured population in counties in Georgia. The study finds that the uninsured rate is closely related to demographic factors like age distribution, income levels, employment rates, gender distribution, and citizenship status. Specifically, counties with larger populations aged 18-24, higher median incomes, lower poverty rates, stronger job markets, and more native-born residents tended to have lower uninsured rates. The researchers used principal component analysis to address correlations between employment-related variables before selecting variables and building the regression model.
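A hedged sketch of that approach, collapsing correlated employment-related predictors into one principal component before fitting a linear regression. The data is synthetic and the variable names are assumptions, not the study's actual fields.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Synthetic county-level data
rng = np.random.default_rng(0)
n = 159                                    # Georgia has 159 counties
jobs = rng.normal(0, 1, n)                 # latent "job market strength"
df = pd.DataFrame({
    "unemployment_rate": 6 - 1.5 * jobs + rng.normal(0, 0.5, n),
    "labor_force_participation": 60 + 4 * jobs + rng.normal(0, 1, n),
    "job_growth": 1 + 0.8 * jobs + rng.normal(0, 0.3, n),
    "pct_age_18_24": rng.normal(10, 2, n),
    "median_income": rng.normal(45, 8, n),
    "pct_native_born": rng.normal(92, 4, n),
})
df["uninsured_rate"] = (25 - 0.15 * df["median_income"] - 1.2 * jobs
                        - 0.1 * df["pct_native_born"] + 0.3 * df["pct_age_18_24"]
                        + rng.normal(0, 1, n))

# Collapse the correlated employment variables into a single principal component
emp = ["unemployment_rate", "labor_force_participation", "job_growth"]
df["employment_pc1"] = PCA(n_components=1).fit_transform(
    StandardScaler().fit_transform(df[emp])).ravel()

X = df[["pct_age_18_24", "median_income", "pct_native_born", "employment_pc1"]]
model = LinearRegression().fit(X, df["uninsured_rate"])
print(dict(zip(X.columns, model.coef_.round(3))))
print("R^2:", round(model.score(X, df["uninsured_rate"]), 3))
```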
Churn Modeling-For-Mobile-Telecommunications – Salford Systems
This document summarizes a study on predicting customer churn for a major mobile provider. TreeNet models were used to predict the probability of customers churning (switching providers) within a 30-60 day period. TreeNet models significantly outperformed other methods, increasing accuracy and the proportion of high-risk customers identified. Applying the most accurate TreeNet models could translate to millions in additional annual revenue by helping the provider preemptively retain more customers.
This document discusses using PROC FREQ and PROC LOGISTIC in SAS to analyze a binary outcome variable using contingency tables and logistic regression. It analyzes data on smoking status and abnormal breathing tests. PROC FREQ generates contingency tables and finds that the odds of an abnormal test are about 2.8 times higher for current smokers compared to never smokers. PROC LOGISTIC fits a logistic regression model that estimates the increase in log odds of an abnormal test for current smokers is 1.01 compared to never smokers.
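For readers without SAS, here is a hedged Python analogue of that PROC FREQ / PROC LOGISTIC workflow using statsmodels; the counts in the 2x2 table are illustrative, not the document's data, so the printed odds ratio will differ from the 2.8 quoted above.

```python
import pandas as pd
import statsmodels.api as sm

# 2x2 table: rows = smoking status, columns = breathing test result (illustrative counts)
table = pd.DataFrame({"abnormal": [60, 25], "normal": [140, 175]},
                     index=["current smoker", "never smoker"])

# Odds ratio, as PROC FREQ would report
odds_smoker = table.loc["current smoker", "abnormal"] / table.loc["current smoker", "normal"]
odds_never = table.loc["never smoker", "abnormal"] / table.loc["never smoker", "normal"]
print("Odds ratio:", round(odds_smoker / odds_never, 2))

# Logistic regression on the expanded individual-level data, as PROC LOGISTIC would fit
long = table.reset_index().melt(id_vars="index", var_name="result", value_name="n")
rows = long.loc[long.index.repeat(long["n"])]
y = (rows["result"] == "abnormal").astype(int)
X = sm.add_constant((rows["index"] == "current smoker").astype(int).rename("smoker"))
fit = sm.Logit(y, X).fit(disp=0)
# The coefficient equals the log of the odds ratio above
print("Log-odds increase for smokers:", round(fit.params["smoker"], 2))
```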
5 Vital Tips to Help Reduce Readmissions in Hospitals – Juran Global
This document provides tips for reducing hospital readmissions. It discusses the importance of accurately analyzing the root causes of readmission problems using valid data and proven methods like Six Sigma. The first tip is that there are no "magic bullets" and organizations need to properly diagnose issues before implementing solutions. Additional tips include engaging a multi-functional team to address readmissions and ensuring leadership support. Data analysis methods like process mapping and cause-and-effect diagrams are recommended to identify factors contributing to excess readmissions. Overall the document emphasizes the need for rigorous problem analysis over quick fixes in order to successfully reduce readmission rates.
This document discusses using data analytics to analyze patient readmission rates using temporal features and predictive modeling. It describes using a clinical data warehouse containing inpatient encounter data to identify variables associated with 30-day readmissions. Key variables analyzed include comorbidities, procedures, labs, medications, and temporal patterns over multiple encounters. Models are developed for specific subpopulations to identify variables predictive of readmissions.
This competition is challenging because attendees are asked to use Microsoft Azure to analyze any open data. The judges evaluate each team's performance based on visualization and modeling. Our team analyzed the factors that cause multiple car crashes and the correlation between vehicle crashes and repair shops.
Improve Your Regression with CART and RandomForests – Salford Systems
Why You Should Watch: Learn the fundamentals of tree-based machine learning algorithms and how to easily fine tune and improve your Random Forest regression models.
Abstract: In this webinar we'll introduce you to two tree-based machine learning algorithms, CART® decision trees and RandomForests®. We will discuss the advantages of tree based techniques including their ability to automatically handle variable selection, variable interactions, nonlinear relationships, outliers, and missing values. We'll explore the CART algorithm, bootstrap sampling, and the Random Forest algorithm (all with animations) and compare their predictive performance using a real world dataset.
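The webinar itself uses Salford Systems' CART® and RandomForests® products; the sketch below approximates the single-tree versus bagged-forest comparison with scikit-learn, using a bundled toy regression dataset in place of the webinar's real-world data.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

# Bundled toy regression data stands in for the webinar's dataset
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

cart = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=300, oob_score=True,   # bootstrap sampling is the default
                           random_state=0, n_jobs=-1).fit(X_tr, y_tr)

print("Single regression tree R^2:", round(cart.score(X_te, y_te), 3))
print("Random Forest R^2:        ", round(rf.score(X_te, y_te), 3))
print("Out-of-bag R^2 estimate:  ", round(rf.oob_score_, 3))
```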
Learn how to do a conjoint analysis project in 1 hr – QuestionPro
Survey Analytics provides conjoint analysis software to help companies evaluate new products and variations of existing products. The software allows users to define product attributes and levels, conduct surveys to assess consumer preferences, and analyze results. Key features include an intuitive interface for setting up studies, previewing concepts, and reviewing utility values and relative importance of attributes. The software also includes a market segmentation simulator that allows predicting how changes to products may impact market share. Conjoint analysis provides a cost-effective way to test concepts without full product development and can help optimize offerings to meet consumer demands.
Predictive Modeling in Insurance in the context of (possibly) big data – Arthur Charpentier
This document discusses predictive modeling in insurance in the context of big data. It begins with an introduction to the speaker and outlines some key concepts in actuarial science from both American and European perspectives. It then provides examples of common actuarial problems involving ratemaking, pricing, and claims reserving. The document reviews the history of actuarial models and discusses issues around statistical learning, machine learning, and their relationship to statistics. It also covers model evaluation and various loss functions used in modeling.
The document discusses decision trees and random forest algorithms. It begins with an outline and defines the problem as determining target attribute values for new examples given a training data set. It then explains key requirements like discrete classes and sufficient data. The document goes on to describe the principles of decision trees, including entropy and information gain as criteria for splitting nodes. Random forests are introduced as consisting of multiple decision trees to help reduce variance. The summary concludes by noting out-of-bag error rate can estimate classification error as trees are added.
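A minimal sketch of the entropy and information-gain splitting criterion mentioned above; the label counts are a textbook-style toy example, not from the document.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, split_groups):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_groups)
    return entropy(labels) - remainder

parent = ["yes"] * 9 + ["no"] * 5                                  # labels at the parent node
left, right = ["yes"] * 6 + ["no"] * 1, ["yes"] * 3 + ["no"] * 4   # a candidate binary split

print("Parent entropy:  ", round(entropy(parent), 3))              # ~0.940
print("Information gain:", round(information_gain(parent, [left, right]), 3))
```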
The study explores major factors that contribute to hospital readmissions via various analysis algorithms, including decision tree, neural network and Bayesian network.
Are you interested in learning how to prevent hospital readmissions for your diabetic population? It is a popular belief that measuring blood glucose for your diabetic population is the most predictive variable in determining a hospital readmission for a diabetic. However, many providers of care simply do not perform the test on known diabetic patients. This study looks at an advanced analytic method that works within the current healthcare provider's workflow to identify the likelihood of a future 30-day unplanned readmission before hospital discharge.
How Allina Health Uses Analytics to Transform Care - HAS Session 16 – Health Catalyst
Allina Health is a large health system in Minnesota serving over 3 million patients annually. Facing rising healthcare costs and an unsustainable model, Allina transitioned to focus on higher value care through advanced analytics. Allina developed predictive models and dashboards to identify at-risk patients, coordinate care more effectively, and drive improvements in outcomes like reducing readmissions. By leveraging enterprise data and strategic use of analytics, Allina has achieved leading quality scores and cost reductions while better supporting providers to improve population health.
Using the Bigtown Simulation Model to Predict the Impact of Enhanced Seven Day Services on Hospital Performance and Patient Outcomes
Poster from the 'Delivering NHS services, seven days a week' event held in Birmingham on 16 November 2013
More information about this event can be found at
http://www.nhsiq.nhs.uk/news-events/events/nhs-services-seven-days-a-week.aspx
The Imperative of Linking Clinical and Financial Data to Improve Outcomes - H... – Health Catalyst
Quality and cost improvements require the intelligent use of financial and clinical data coupled with education for the multi-disciplinary teams who are driving process improvements. Once a data warehouse is established, healthcare organizations need to set up multi-disciplinary clinical, financial, and IT specialist teams to make the best use of the data. Sometimes, financial involvement is minimized or even excluded for a number of reasons, which can turn out to be counterproductive. However, including financial measurements and participation up front can help enhance the recognized value and sustainability of quality improvement or waste reduction efforts. In this session you will learn keys to success and real-life examples of linking clinical, financial and patient satisfaction data via multi-disciplinary teams that produce impressive results.
IV Congresso Internacional CBA2017
Emerging Technologies and the Quality of Care
David W. Bates, MD, MSc, Chief, Division of General Internal Medicine, Brigham and Women’s Hospital, Past President, ISQua
Advanced Laboratory Analytics — A Disruptive Solution for Health Systems – Viewics
Advanced laboratory analytics can provide a disruptive solution for health systems facing challenges under value-based care models. Laboratory data is well-suited for advanced analytics due to its timeliness, structured format, ubiquity across settings and providers, and predictive potential. Laboratory-based predictive algorithms and clinical decision support tools can help optimize outcomes like readmissions, adverse events, costs, and disease management. By leveraging laboratory data and analytics, health systems can better manage patient populations, make personalized medical decisions, and support value-based care goals of improving quality while reducing costs.
The document recommends continuing to use the Post-Discharge Questionnaire (PDQ) on Unit P7 to assess patient satisfaction and gather data in real-time. Results from the PDQ administered in June 2011 to 101 discharged patients on Unit P7 showed high patient satisfaction scores above the 80th percentile for most discharge measures. Scores were below the 80th percentile for information provided in the discharge packet. The PDQ provided a larger sample size than other surveys and captured feedback that can be used to improve the discharge process.
Every hospital and health care system is significantly impacted by readmission policies mandated by new regulations.
And every facility must implement strategies to reduce the number of costly and unnecessary readmissions.
During this presentation you will discover how to decrease your readmission rates and take advantage of incentives, rather than suffer penalties that can significantly impact your bottom line.
1) The document discusses sample size calculation for medical audits. It provides the key ingredients needed which include target performance, population size, confidence level, and precision.
2) It gives an example where the target population is 300 hypertensive patients, target performance is 70%, precision is 5%, and confidence level is 95%. Using a sample size calculator, it is determined that a minimum sample size of 71 is needed.
3) It emphasizes that the sample size should be inflated by 10-20% to account for missing data. The aim of data analysis is to assess if the current practice meets the agreed standards by interpreting the collected data through descriptive statistics and simple analytical methods.
This document discusses measurement for quality improvement in healthcare. It defines measurement as the systematic collection of quantifiable data about processes and outcomes over time or at a single point in time. The purpose of measurement is to identify ways to improve, track performance improvements, and focus efforts on the right areas. Measurement should involve employees and measure effectiveness, efficiency, and support for strategic initiatives. Examples of potential measures for male and female wards are provided, including outcomes, processes, balancing measures. Cause and effect diagrams and building a cascading system of measures from the hospital board level down to individual caregivers and patients is also discussed.
This document discusses measurement for quality improvement in healthcare. It defines measurement as the systematic collection of quantifiable data over time or at a single point in time about processes and outcomes. The purpose of measurement is to identify areas for improvement, track performance changes, and focus efforts on strategic priorities. The document recommends measuring effectiveness, efficiency, and factors that support strategic goals using simple metrics developed with employee input. Examples provided include measures for length of stay, patient satisfaction, and infection rates. Cause-and-effect diagrams and a framework for cascading measures across different system levels are also presented.
Methods for Observational Comparative Effectiveness Research on Healthcare De... – Marion Sills
Research Objective: The SAFTINet project was funded by the AHRQ to build a distributed network of existing clinical and claims data that would support comparative effectiveness research (CER), with a focus on underserved populations and healthcare delivery system (HDS) characteristics. Observational research methods are appropriate, but require detailed protocols with a priori hypotheses and analytic plans. SAFTINet research specifically concerns the effects of a discrete set of HDS features (those often included in Patient-Centered Medical Home (PCMH) models) on health outcomes for primary care patients with asthma, hypertension, and hypercholesterolemia. Our objective is to present a description of this study’s measurement challenges, and to specify a priori hypotheses, analytic strategies, and plans for addressing bias and confounding for our asthma cohorts.
Study Design: An observational, longitudinal cohort study of primary care patients with asthma, with both secondary use of existing clinical and claims data and primary data collection for HDS features and patient- reported outcomes.
Population Studied: Our sample consists of 59 primary care practices in 5 healthcare organizations in Colorado, Utah and Tennessee; all practices serve underserved populations. These practices care for about 275,000 patients per year, of whom an estimated 22,000 have a diagnosis of asthma.
Principal Findings: We will present the processes used to define and measure the HDS features, covariates and asthma outcomes, along with planned analysis. Challenges include valid measurement of a multi-faceted HDS “exposure” variable, the inability to identify exposure onset, and the non-dichotomous nature of HDS characteristics. To measure HDS characteristics, we created a practice-level survey assessing 9 PCMH domains, including care coordination, specialty care and mental health integration, and patient-centeredness, as well as asthma-specific HDS characteristics (e.g., the use of asthma registries). Asthma outcomes included (1) those available as a result of routine electronic documentation of clinical care and claims administration (utilization indicative of an exacerbation), and (2) patient reported outcomes tools (Asthma Control Test). We used directed acyclic graphs to identify potential confounders of the relationship between HDS characteristics and asthma control, as well as other potential biases. The analytic plan is based on linear mixed effects models. Perspectives of the CER team, the technology team and the community engagement group were considered in the operationalization of all variables.
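As a hedged sketch of the linear mixed-effects analysis plan mentioned above, the snippet below fits a random-intercept model with patients nested in practices using statsmodels; the data is simulated and variable names such as act_score and pcmh_score are illustrative, not SAFTINet's actual measures.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated patients nested within 59 primary care practices
rng = np.random.default_rng(3)
n_practices, patients_per = 59, 40
practice = np.repeat(np.arange(n_practices), patients_per)
practice_effect = rng.normal(0, 1.5, n_practices)[practice]        # random practice intercepts
pcmh_score = np.repeat(rng.uniform(0, 10, n_practices), patients_per)  # practice-level exposure
age = rng.integers(18, 80, practice.size)
act_score = (15 + 0.4 * pcmh_score - 0.02 * age
             + practice_effect + rng.normal(0, 2, practice.size))   # Asthma Control Test outcome

df = pd.DataFrame({"act_score": act_score, "pcmh_score": pcmh_score,
                   "age": age, "practice_id": practice})

# Fixed effects for PCMH exposure and age; random intercept per practice
result = smf.mixedlm("act_score ~ pcmh_score + age", data=df, groups=df["practice_id"]).fit()
print(result.summary())
```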
Conclusions: The design of rigorous observational CER should recognize the need for an intense planning phase. In accordance with good practice guidance for observational studies, an important component of the planning phase is to disseminate and obtain feedback on the research design in advance of its conduct.
Making Healthcare Waste Reduction and Patient Safety Actionable - HAS Session 6 – Health Catalyst
Multiple studies have estimated that at least 30% of US healthcare expenditures are wasteful. But how do you identify and reduce that waste? In this session, we will share with you a three-part framework for understanding, measuring and addressing waste reduction. In particular, we will highlight the importance of patient safety and injury prevention, framing the importance of shifting from a system of incident reporting (which creates a culture of blame and guilt) to a system in which patient injury is regarded as a process failure rather than a person failure. To make that transition, health systems will need to 1) define process flows and metrics for each major type of patient injury; and 2) create a learning environment in which team members are engaged in process redesign to prevent process failure and injury. A leading health system in patient safety and quality will also share their best practices in how they have created a culture of patient safety and quality.
The document analyzes the potential impact of smoothing inpatient occupancy over the entire week at children's hospitals. It finds that smoothing could reduce the number of hospitals and patients exposed to peak mid-week occupancy above 95%. Specifically, smoothing reduced the weekly maximum occupancy by an average of 6.6 percentage points and would require rescheduling 2.6% of admissions on average each week to achieve. While smoothing shows potential to improve patient flow and care quality, limitations include not modeling all hospital responses and using midnight census rather than peak daily occupancy.
Decision Support System to Evaluate Patient Readmission Risk – Avishek Choudhury
This document presents a decision support system for predicting patient readmission risk. It discusses the importance of reducing patient readmissions due to factors like healthcare costs and patient health. The author develops a predictive model using a dataset of 68,000 patient instances and 14 attributes. Feature selection identifies the top 5 important parameters. Various machine learning methods are tested, with support vector machines achieving the best accuracy of 97% on preprocessed data. A genetic algorithm and greedy ensemble further improve the prediction accuracy to 98.5% using gradient boosting. The author concludes the model can help identify at-risk patients to minimize preventable readmissions.
PCMH: Part 4 – Learn How to Start or Improve Your Quality Improvement Program – Julie Champagne
We wrap up our PCMH series with a deep dive into Standard 5-Care Coordination and Care Transitions and Standard 6- Performance Measurement and Quality Improvement. How are you handling referrals and transitions of care today? Do you need to make changes to optimize the process? We’ll review care coordination elements and factors as well as the performance improvement standards, elements, and associated factors in this webinar to complete your practice’s PCMH transformation!
Understand what healthcare analytics is.
Identify the 5-stage Analytics Program Lifecycle (APL).
Understand how data analytics can be used in healthcare.
Check it on Experfy: https://www.experfy.com/training/courses/introduction-to-healthcare-analytics.
Similar to Data mining for diabetes readmission
The Ipsos - AI - Monitor 2024 Report.pdf – Social Samosa
According to Ipsos AI Monitor's 2024 report, 65% of Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
Analysis insight about a Flyball dog competition team's performance – roli9797
Insights from my analysis of a Flyball dog competition team's performance over the last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data – Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
State of Artificial Intelligence Report 2023 – kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report a compilation of the most interesting things we’ve seen, with the goal of triggering an informed conversation about the state of AI and its implications for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Global Situational Awareness of A.I. and where it's headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present related topics such as vector databases, LLMs, and managing data at scale. The intended audience includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup and is sponsored by Zilliz, maintainers of Milvus.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
3. Abstract
Background: Alarmingly high risk of readmissions in the US
Goal: discover factors contributing to hospital readmissions
Methods: C5.0 Decision Tree, QUEST, Neural Network and Bayesian Network
Results: emergency readmissions occur most frequently
Conclusions: effective prediction of readmissions enables hospitals to identify and target patients at the highest risk
4. Introduction
Topic: Diabetes Readmission Prediction
What Is Readmission Rate
A hospitalization that occurs within 30 days after a discharge
Why Is Readmission Important
Reduce cost of care and medical disputes
Improve patients’ safety and health
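To make the 30-day definition above concrete, here is a minimal sketch that derives a readmission flag from raw encounter records; the patient_id, admit_date and discharge_date columns are hypothetical illustrations (the study dataset described on the next slide already ships with a pre-computed readmitted field).
```python
import pandas as pd

def flag_30day_readmissions(encounters: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical columns: patient_id, admit_date, discharge_date (datetime64)
    df = encounters.sort_values(["patient_id", "admit_date"]).copy()
    # Date of the same patient's next admission, if any
    df["next_admit"] = df.groupby("patient_id")["admit_date"].shift(-1)
    gap_days = (df["next_admit"] - df["discharge_date"]).dt.days
    # Readmission = a new hospitalization within 30 days of this discharge
    df["readmitted_30d"] = (gap_days >= 0) & (gap_days <= 30)
    return df
```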
5. Data Description
Data source: The data is from the Center for Clinical and Translational Research, Virginia Commonwealth University.
The dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes 101,766 instances and 55 features representing patient and hospital outcomes.
Link: http://www.cioslab.vcu.edu/index.html
Main attributes:
Admission type: patient's admission type (emergency, urgent, elective, newborn, not available, etc.)
Time in hospital: integer number of days between admission and discharge
Number of lab procedures: number of lab tests performed during the encounter
Number of outpatient visits: number of outpatient visits of the patient in the year preceding the encounter
Number of emergency visits: number of emergency visits of the patient in the year preceding the encounter
Number of inpatient visits: number of inpatient visits of the patient in the year preceding the encounter
Number of diagnoses: number of diagnoses entered to the system
A1c test result: the range of the result, or if the test was not taken
Diabetes medications: whether there was any diabetic medication prescribed or not
Readmitted: days to inpatient readmission
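As a rough illustration, the main attributes above can be pulled out with a few lines of pandas. The file name and column names below follow the public UCI "Diabetes 130-US hospitals" release of this dataset and are an assumption, not something stated on the slide.
```python
import pandas as pd

COLUMNS = [
    "admission_type_id",    # emergency, urgent, elective, newborn, ...
    "time_in_hospital",     # days between admission and discharge
    "num_lab_procedures",   # lab tests performed during the encounter
    "number_outpatient",    # outpatient visits in the preceding year
    "number_emergency",     # emergency visits in the preceding year
    "number_inpatient",     # inpatient visits in the preceding year
    "number_diagnoses",     # diagnoses entered to the system
    "A1Cresult",            # A1c result range, or whether the test was taken
    "diabetesMed",          # any diabetic medication prescribed
    "readmitted",           # <30, >30 or NO
]

data = pd.read_csv("diabetic_data.csv", na_values="?")  # 101,766 rows, 55 features
main = data[COLUMNS]
print(main.shape)
print(main["readmitted"].value_counts())
```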
6. Methodology
(Process flow diagram covering: Problem Statement, Business Outlook, Data Pre-mining, Data Processing, Variable Selection, Model Building, Perform Analysis, Evaluation, Accuracy Comparison, Conclusion)
7. Problem Statement
Identify the major factors that contribute to hospital readmissions
Measure the influence of every attribute
Compare accuracy of each model
8. Data Pre-mining
(Chart: Patient Distribution, As-Is vs. Now: Readmission < 30 days 11% vs. 49%; Non-within 30 days 89% vs. 51%)
Re-categorize the Readmission group from 3 groups (<30 days, >30 days and No) into 2 groups (Readmission < 30 days and Non-within 30 days)
Review all variables' distributions to ensure data quality
Balance the Readmission groups to a 1:1 sample ratio (see the sketch after this slide)
Review each variable's relationship with Readmission
Review correlations between the numeric variables
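A minimal sketch of the re-categorization and 1:1 balancing described above, assuming the UCI column naming where readmitted takes the values <30, >30 and NO; the random seed and the down-sampling approach are illustrative choices, not taken from the deck.
```python
import pandas as pd

data = pd.read_csv("diabetic_data.csv", na_values="?")

# Collapse the 3-level target (<30, >30, NO) into 2 groups
data["readmit_30d"] = (data["readmitted"] == "<30").astype(int)

# Down-sample the majority group to reach a 1:1 ratio (illustrative choice)
minority = data[data["readmit_30d"] == 1]
majority = data[data["readmit_30d"] == 0].sample(n=len(minority), random_state=42)
balanced = pd.concat([minority, majority]).sample(frac=1, random_state=42)  # shuffle

print(balanced["readmit_30d"].value_counts())  # roughly 1:1
```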
9. Variable's relationship with Readmission
Remove irrelevant variables (e.g. patient ID, payer code)
Delete records with unknown values
Eliminate variables dominated by a single majority value (see the sketch after this slide)
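A possible pandas implementation of these clean-up rules; the identifier column names and the 95% majority-value cut-off are assumptions made for illustration.
```python
import pandas as pd

data = pd.read_csv("diabetic_data.csv", na_values="?")

# Remove irrelevant identifier-like variables mentioned on the slide
data = data.drop(columns=["encounter_id", "patient_nbr", "payer_code"], errors="ignore")

# Eliminate variables where one value (or missingness) dominates; the 95%
# cut-off is an assumption, not taken from the slides
majority_share = data.apply(
    lambda col: col.value_counts(normalize=True, dropna=False).iloc[0])
data = data.drop(columns=majority_share[majority_share > 0.95].index)

# Delete the remaining records with unknown values
data = data.dropna()
print(f"{data.shape[1]} variables kept, {len(data)} records kept")
```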
10. Variable's relationship with Readmission
Found that some variables behave differently across the readmission groups:
Diagnoses Number
Admission Type ID
Inpatient Number
11. Review numeric variables' correlation
No meaningful correlation found among the numeric variables
Variables can be treated as independent (see the sketch after this slide)
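One way to run this correlation check, assuming the UCI numeric column names; the 0.5 threshold used to call a pair "strongly correlated" is an arbitrary illustrative choice.
```python
import pandas as pd

data = pd.read_csv("diabetic_data.csv", na_values="?")
numeric_cols = ["time_in_hospital", "num_lab_procedures", "number_outpatient",
                "number_emergency", "number_inpatient", "number_diagnoses"]

corr = data[numeric_cols].corr()   # Pearson correlation matrix
print(corr.round(2))

# Count pairs above an (arbitrary) 0.5 threshold, excluding the diagonal
strong = (corr.abs() > 0.5).values.sum() - len(numeric_cols)
print("strongly correlated pairs:", strong // 2)
```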
12. Perform Analysis
Input variables: 14; Output variable: Readmission group
Total sample size: 23,154
Partition: 70% on Training; 30% on Testing
Build models from (1) Decision Tree Analysis: C5.0 & QUEST, (2) Neural Network methodology and (3) Bayes' analysis (a partition and model-building sketch follows this slide)
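A sketch of the partition and model-building step. The original C5.0, QUEST, neural network and Bayesian network models are not all available in scikit-learn, so the classifiers below are rough stand-ins used only to show the workflow, not to reproduce the deck's results.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# `balanced` is the 1:1 DataFrame from the pre-mining sketch above
# (in practice, restrict it to the 14 selected input variables first)
X = pd.get_dummies(balanced.drop(columns=["readmitted", "readmit_30d"]))
y = balanced["readmit_30d"]

# 70% training / 30% testing partition
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42, stratify=y)

models = {
    "decision_tree (CART stand-in for C5.0/QUEST)": DecisionTreeClassifier(random_state=42),
    "neural_net": MLPClassifier(max_iter=500, random_state=42),
    "naive_bayes (stand-in for the Bayesian network)": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```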
15. Neural Network
(Model charts: training data vs. testing data)
Model Performance: 66% accuracy in the testing dataset
Predictor Importance (top 5; an importance sketch follows this slide):
Number of inpatient visits
Admission type
Admission source
Discharge disposition
Number of emergency visits
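The deck does not say how predictor importance was computed; as one hedged possibility, scikit-learn's permutation importance can produce a comparable top-5 ranking for the neural-network stand-in trained in the previous sketch.
```python
import pandas as pd
from sklearn.inspection import permutation_importance

# `models`, `X_test` and `y_test` come from the model-building sketch above
result = permutation_importance(models["neural_net"], X_test, y_test,
                                n_repeats=10, random_state=42)
importance = pd.Series(result.importances_mean, index=X_test.columns)
print(importance.sort_values(ascending=False).head(5))  # top-5 predictors
```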
16. Why Use Bayesian Network
• Have probability for reference
• Allow variables’ dependency
• Well suited for categorical variables
18. Bayesian Network
• Overall Model:
Training accuracy: 70.67%
Testing accuracy: 69.51%
• Model on “Yes” results:
Training accuracy: 71.35%
Testing accuracy: 69.54%
19. Decision Tree: C5.0
Model Performance: 65% accuracy in the testing dataset
Predictor Importance (top 5):
Admission type
Admission source
Discharge disposition
Number of inpatient visits
Number of lab procedures
21. Decision Tree: QUEST
What's QUEST: Quick, Unbiased and Efficient Statistical Tree
Advantages:
Easily handles categorical predictor variables with many categories
Uses imputation to deal with missing values
Provides linear splits using Fisher's LDA method
How to improve accuracy (see the sketch after this slide):
Boosting: randomly re-sample the dataset N times -> create N trees -> combine by weighted average
Bagging: combine by aggregated average
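QUEST itself is not available in common Python libraries, so the sketch below illustrates the boosting and bagging ideas with scikit-learn's AdaBoost and Bagging wrappers around CART trees; treat it as an analogy to the slide's approach, not a reproduction of it. It reuses the 70/30 partition from the earlier sketch.
```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Shallow CART trees stand in for QUEST here
base = DecisionTreeClassifier(max_depth=5, random_state=42)

# `estimator=` is the keyword in scikit-learn >= 1.2 (older versions: base_estimator)
boosted = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=42)
bagged = BaggingClassifier(estimator=base, n_estimators=50, random_state=42)

# X_train / y_train / X_test / y_test come from the 70/30 partition sketch above
for name, model in [("boosting", boosted), ("bagging", bagged)]:
    model.fit(X_train, y_train)
    print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```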
22. Decision Tree: QUEST
Adopted the boosting method in QUEST
Model Performance: 72% accuracy in the testing dataset
Variable Importance (top 5): Admission Type, Admission Source, Number of inpatient visits, Discharge Disposition, Number of diagnoses
(Model charts: training data vs. testing data)
23. Model Evaluation
Top Important Variables (top 5 per model):
QUEST: Admission Type, Admission Source, Inpatient #, Discharge Disposition, Diagnoses #
C5.0: Admission Type, Admission Source, Discharge Disposition, Inpatient #, Lab Procedures #
Neural Net: Inpatient #, Admission Type, Admission Source, Discharge Disposition, Emergency #
Bayesian: Discharge Disposition, Admission Source, Inpatient #, Lab Procedures #, Admission Type
QUEST has the best accuracy
• Top 4 variables: Admission Type, Admission Source, number of inpatient visits and Discharge Disposition.
24. NUMBER OF INPATIENT VISITS
Evaluation of top 4 important attributes
(Number of inpatient visits of the patient in the year preceding the encounter)
The graph shows that a higher number of inpatient visits potentially increases the risk of readmission: patients who stayed in a hospital more than 15 times in a year were readmitted within 30 days in 100% of cases. Inpatient treatment facilities should provide additional and more specialized medical care to these patients to reduce their readmission rate (a rate-by-visits sketch follows this slide).
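A small sketch of how this rate-by-visits view can be reproduced from the raw data, assuming the UCI column names number_inpatient and readmitted.
```python
import pandas as pd

data = pd.read_csv("diabetic_data.csv", na_values="?")
data["readmit_30d"] = (data["readmitted"] == "<30").astype(int)

# Readmission rate and group size for each count of prior inpatient visits
rate_by_visits = data.groupby("number_inpatient")["readmit_30d"].agg(["mean", "size"])
print(rate_by_visits)  # the rate climbs sharply as prior inpatient visits increase
```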
25. ADMISSION TYPE
(1. Emergency, 2. Urgent, 3. Elective, 4. Newborn, 5. Not Available, 6. NULL, 7. Trauma Center, 8. Not Mapped)
ADMISSION SOURCE
(physician referral, emergency room, transfer from a hospital, etc.)
DISCHARGE DISPOSITION
(discharged to home, expired, hospice / home, etc.)
27. Conclusion
• Data pre-mining is of utmost importance in improving model accuracy (1% -> 72%);
• The readmission groups are related to admission source, admission type, discharge disposition and number of inpatient visits;
• The readmission groups do not depend solely on any single variable, but on the interactions of related variables;
• Instead of tracking all 55 attributes, hospitals are advised to focus on the patient's number of inpatient visits, admission source, admission type and discharge disposition;
• Hospitals are advised to consider not only inpatient treatment but also continuing care after discharge.
Editor's Notes
From the perspective of number_inpatient, we can see that when number_inpatient is less than 3, the readmission rate is relatively low. By contrast, when it is greater than or equal to 3, the readmission rate becomes very high, and it even reaches 100 percent when it is greater than or equal to 14. So hospitals should pay particular attention to patients with a high number of inpatient visits in the year preceding the encounter in order to reduce their readmission rate, thus realizing their business goal.