Exploratory Data Analysis Bank Fraud Case Study - Lumbini Sardare
The purpose is to optimize the lead scoring mechanism based on leads' fit, demographics, behavior, buying tendency, etc., by implementing explicit and implicit lead scoring models with a lead point system.
This case study aims to identify patterns that indicate whether a client has difficulty paying their installments. These patterns may be used to take actions such as denying the loan, reducing the loan amount, or lending to risky applicants at a higher interest rate, ensuring that consumers capable of repaying the loan are not rejected. Identifying such applicants using EDA is the aim of this case study.
Credit EDA Case Study
Case Study 1 (Job Data)
Below is the structure of the table with the definition of each column that you must work on:
Table-1: job_data
job_id: unique identifier of jobs
actor_id: unique identifier of actor
event: decision/skip/transfer
language: language of the content
time_spent: time spent to review the job in seconds
org: organization of the actor
ds: date in yyyy/mm/dd format. It is stored as text; queries are run in Presto, so no date function is needed.
Use the dataset attached in the Dataset section below the project images, and then answer the questions that follow.
Number of jobs reviewed: the number of jobs reviewed over time.
Your task: Calculate the number of jobs reviewed per hour per day for November 2020.
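A possible Presto SQL sketch for this task. Since ds is stored as text in yyyy/mm/dd format, a plain string comparison selects November 2020; dividing each day's count by 24 gives a per-hour figure:

```sql
-- Jobs reviewed per day in November 2020, expressed per hour.
-- ds is text in yyyy/mm/dd form, so a lexicographic BETWEEN works.
SELECT
    ds,
    COUNT(job_id)                  AS jobs_reviewed,
    ROUND(COUNT(job_id) / 24.0, 2) AS jobs_reviewed_per_hour
FROM job_data
WHERE ds BETWEEN '2020/11/01' AND '2020/11/30'
GROUP BY ds
ORDER BY ds;
```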
Throughput: the number of events happening per second.
Your task: Let's say the above metric is called throughput. Calculate the 7-day rolling average of throughput. For throughput, do you prefer the daily metric or the 7-day rolling average, and why?
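One hedged Presto SQL sketch, taking daily throughput as events divided by total review time for that day (an assumption; other definitions are possible) and averaging it over a 7-day window:

```sql
-- Daily throughput (events per second of review time), then its
-- 7-day rolling average over the current day plus the six preceding days.
WITH daily AS (
    SELECT
        ds,
        CAST(COUNT(event) AS double) / SUM(time_spent) AS throughput
    FROM job_data
    GROUP BY ds
)
SELECT
    ds,
    throughput,
    AVG(throughput) OVER (
        ORDER BY ds
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS throughput_7d_avg
FROM daily
ORDER BY ds;
```

For trend monitoring, the 7-day rolling average is usually preferred because it smooths day-of-week effects and one-off spikes; the daily metric is better for spotting sudden incidents.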
Percentage share of each language: the share of each language across the content.
Your task: Calculate the percentage share of each language over the last 30 days.
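A sketch of the percentage-share query in Presto SQL. The 30-day window is expressed as a string bound on ds; the exact cutoff dates are an assumption that depends on the dataset's last day:

```sql
-- Share of each language over a trailing 30-day window (assumed dates).
SELECT
    language,
    COUNT(*) AS jobs,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_share
FROM job_data
WHERE ds BETWEEN '2020/11/01' AND '2020/11/30'
GROUP BY language
ORDER BY pct_share DESC;
```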
Duplicate rows: rows that contain identical values.
Your task: Suppose you see duplicate rows in the data. How would you display the duplicates from the table?
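One standard way to surface duplicates is to group on every column and keep the groups with more than one row, as in this Presto SQL sketch:

```sql
-- Rows that appear more than once, with their repeat count.
SELECT
    job_id, actor_id, event, language, time_spent, org, ds,
    COUNT(*) AS occurrences
FROM job_data
GROUP BY job_id, actor_id, event, language, time_spent, org, ds
HAVING COUNT(*) > 1;
```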
Case Study 2 (Investigating metric spike)
The structure of the tables, with the definition of each column you must work on, is shown in the project image.
Table-1: users
This table includes one row per user, with descriptive information about that user’s account.
Table-2: events
This table includes one row per event, where an event is an action that a user has taken. These include login events, messaging events, search events, events logged as users progress through a signup funnel, and events related to received emails.
Table-3: email_events
This table contains events specific to the sending of emails. It is similar in structure to the events table above.
Use the dataset attached in the Dataset section below the project images, and then answer the questions that follow.
User Engagement: a measure of how active a user is, and of whether the user finds value in the product/service.
Your task: Calculate the weekly user engagement.
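A minimal sketch, assuming the events table has user_id and occurred_at columns (names not given in this brief), counting distinct active users per week:

```sql
-- Weekly user engagement as distinct active users per week.
-- user_id and occurred_at are assumed column names.
SELECT
    date_trunc('week', occurred_at) AS week_start,
    COUNT(DISTINCT user_id)         AS weekly_active_users
FROM events
GROUP BY 1
ORDER BY 1;
```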
User Growth: the growth in the number of users of a product over time.
Your task: Calculate the user growth for the product.
Weekly Retention: users retained week by week after signing up for a product.
Your task: Calculate the weekly retention of users by sign-up cohort.
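A cohort-retention sketch under assumed column names (id and created_at on users, user_id and occurred_at on events): each user is bucketed by sign-up week, and each active week is counted relative to that week:

```sql
-- Weekly retention by sign-up cohort; column names are assumptions.
WITH signups AS (
    SELECT id AS user_id,
           date_trunc('week', created_at) AS signup_week
    FROM users
),
activity AS (
    SELECT DISTINCT user_id,
           date_trunc('week', occurred_at) AS active_week
    FROM events
)
SELECT
    s.signup_week,
    date_diff('week', s.signup_week, a.active_week) AS weeks_since_signup,
    COUNT(DISTINCT s.user_id)                       AS retained_users
FROM signups s
JOIN activity a ON a.user_id = s.user_id
GROUP BY 1, 2
ORDER BY 1, 2;
```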
Weekly Engagement: the same activity measure, broken down week by week.
Your task: Calculate the weekly engagement per device.
Email Engagement: how users engage with the email service.
Your task: Calculate the email engagement metrics.
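One way to frame email engagement is open and click-through rates. This sketch assumes email_events carries an action column with send/open/click labels; the actual label values are an assumption:

```sql
-- Email funnel rates; the action label values are assumed.
SELECT
    100.0 * COUNT_IF(action = 'email_open')
          / COUNT_IF(action = 'email_sent')  AS open_rate_pct,
    100.0 * COUNT_IF(action = 'email_click')
          / COUNT_IF(action = 'email_open')  AS click_through_rate_pct
FROM email_events;
```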
Project Description
Give a brief description of your project: what it is about, how you will approach it, and what you aim to find out through it.
Approach
Write a short paragraph about your approach towards the project and how you have executed it.
Tech-Stack Used
Mention the software and versions used in the project (e.g., Jupyter Notebook) and the purpose of each.
Insights
Jot down the insights and knowledge you gained while making the project, and state what you infer from them. Keep it brief and to the point. For example, if you produced a graph, explain what you understand from it and what you can derive or change based on it.
Result
Mention what you achieved while making the project and how you think it has helped you.
Drive Link
Save your file as a .pdf and upload it to your Google Drive. Mention the shareable link (with visibility set to public) in the pdf you upload. Do not upload your project directly.
Exploratory Data Analysis Example - Credit Risk Analysis (Second Attempt) - Prabhash Gokarn
An attempt to analyze bank loan data and find patterns that are predictors of loan defaults. This will ensure that future loan decisions are made more logically and reduce possible defaults. The analysis was done using Python.
In other words, the company wants to understand the driving factors (or driver variables) behind loan default, i.e. the variables which are strong indicators of default. The company can utilize this knowledge for its portfolio and risk assessment.
What is Predictive Analytics?
Predictive analytics is a branch of advanced analytics that uses techniques such as data mining, predictive modelling, statistics, machine learning, and artificial intelligence to analyse current data and predict future outcomes.
To know more: https://goo.gl/zAcnCR
LOAN DEFAULT PREDICTION – A CASE STUDY
Content Covered in this video:
Business Problem & Benefits
The Risk - LOAN DEFAULT PREDICTION
Data Analysis Process
Data Processing
Predictive Analysis Process
Tools & Technology
User analysis is the process by which we track how users engage and interact with our digital product (software or mobile application) in an attempt to derive business insights for marketing, product & development teams.
These insights are then used by teams across the business to launch a new marketing campaign, decide on features to build for an app, track the success of the app by measuring user engagement and improve the experience altogether while helping the business grow.
You are working with the product team of Instagram and the product manager has asked you to provide insights on the questions asked by the management team.
1. Find the 5 oldest users of Instagram from the database provided.
2. Find the users who have never posted a single photo on Instagram.
3. Identify the winner of the contest and provide their details to the team.
4. Identify and suggest the top 5 most commonly used hashtags on the platform.
5. What day of the week do most users register on? Provide insights on when to schedule an ad campaign.
6. Calculate how many times the average user posts on Instagram, i.e. the total number of photos divided by the total number of users.
7. Provide data on users (bots) who have liked every single photo on the site (since no normal user would be able to do this).
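As an illustration of question 7, a sketch under an assumed schema (users, photos, and likes tables with id, username, and user_id columns, which are typical for an Instagram-clone database but not given here):

```sql
-- Users who have liked every single photo (likely bots).
-- Table and column names are assumed from a typical schema.
SELECT u.id, u.username, COUNT(*) AS total_likes
FROM users u
JOIN likes l ON l.user_id = u.id
GROUP BY u.id, u.username
HAVING COUNT(*) = (SELECT COUNT(*) FROM photos);
```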
The capstone project is a Machine Learning application that builds a model for a well-known bank in New Jersey.
It analyzes the bank's clients who took loans, based on various parameters.
Machine Learning Project - Default Credit Card Clients - Vatsal N Shah
- The model built here uses all available factors to predict which customers will be defaulters and which non-defaulters next month.
- The goal is to find whether clients will be able to pay their next month's credit amount.
- Identify potential customers for the bank who can settle their credit balance.
- Determine whether customers can make their credit card payments on time.
- Default is the failure to pay interest or principal on a loan or credit card payment.
The importance of this type of research in the telecom market lies in helping companies make more profit.
Predicting churn is widely recognized as one of the most important ways for telecom companies to protect revenue.
Hence, this research aimed to build a system that predicts customer churn in a telecom company.
These prediction models need to achieve high AUC values. To train and test the model, the sample data is split into 70% for training and 30% for testing.
Predicting Credit Card Defaults using Machine Learning Algorithms - Sagar Tupkar
This is a project that I worked on as a Capstone for my Masters in Business Analytics program at the University of Cincinnati. In this project, I performed an end-to-end data mining exercise including data cleaning, distribution analysis, exploratory data analysis, and model building to identify and predict credit card defaults using customers' data on past payments and general profile. In the process of building machine learning models, I fit and compared the performance of multiple models and algorithms, including Logistic Regression, PCA, classification trees, AdaBoost, ANN, and LDA.
Magnetic Separation of Metallics from Ferrochrome Slag - Prabhash Gokarn
At a Ferroalloy Plant producing High Carbon Ferro Chrome, the slag co-produced is granulated. The separation between slag and metal is not perfect, and the granulated slag contains ~1% to 3% of entrapped ferrochrome. Apart from being a loss of valuable Ferro Chrome, local miscreants climb the unstable slag heaps to manually recover and steal the carried-over Ferro Chrome granules, which is both a security and a safety risk. We have successfully implemented a magnetic separation method for the recovery of metallics from the slag.
60 Not Out - Sixty Successful Years of Continuous Ferro Alloy Making at Joda - Prabhash Gokarn
On 20th April 2018, Tata Steel’s Ferro Alloy Plant at Joda turned sixty. It is India’s oldest continuously operating ferroalloy plant, and one of the oldest continuously operating ferroalloy plants in the world. The ferroalloy sector globally, and especially in India, is notoriously short-lived for reasons detailed in the paper, which also elaborates the reasons for the longevity of FAP Joda.
On 20th April 2018, Tata Steel’s Ferro Alloy Plant at Joda turned sixty. It is India’s oldest continuously operating ferroalloy plant, and one of the oldest continuously operating ferroalloy plants in the world. The plant was set up as a wholly owned subsidiary (Joda Ferro Alloy Pvt Ltd). It was the first assignment of M/s M N Dastur and was completed eight months ahead of schedule. This is a booklet published by M/s M N Dastur on the occasion.
SEWAGE AND ITS TREATMENT - Experience from Setting up Sewage Treatment Plants - Prabhash Gokarn
Growing population has resulted in a steep increase in demand for freshwater coupled with increased contamination from untreated wastewater. Along with steps taken to clean our polluted rivers and streams, laws for disposal of wastewater are becoming stricter, resulting in an urgent need for setting up facilities for treatment of sewage. There are several treatment options, each with its own set of advantages and disadvantages. Drawing from our experience in setting up and running sewage treatment plants across various locations involving multiple technologies, this paper discusses the major technologies for sewage treatment.
Sewage and its Treatment - Experience from Setting up STPs - Prabhash Gokarn
Growing population has resulted in a steep increase in demand for fresh water coupled with increased contamination from untreated waste water. Along with steps taken to clean our polluted rivers and streams, laws for disposal of waste water are becoming stricter, resulting in an urgent need for setting up facilities for treatment of sewage. There are several treatment options, each with its own set of advantages and disadvantages. Drawing from our experience in setting up and running sewage treatment plants across various locations involving multiple technologies, this paper describes most of the popular technologies adopted for sewage treatment and the possible reasons for their selection.
Pre-fabricated Buildings in Mining - An Environment-friendly Alternative - Prabhash Gokarn
Pre-fabricated buildings (or simply, pre-fabs), are buildings that are manufactured off-site in advance, usually in standard sections that can be easily shipped, and are assembled at site. There are many advantages of Pre-Fabricated buildings which make it especially suited to Mining Locations. With the improvement in the materials used in making pre-fab buildings, the rising cost of labour, safety & quality concerns, and environmental concerns of construction waste disposal; pre-fab buildings are poised to increase in popularity. This article discusses the experience in making a pre-fabricated office building at a mining location.
Safety Challenges in the Construction of a Large Water Recovery Plant - Prabhash Gokarn
The Ferro Alloy Plant at Joda was commissioned in 1958 and has been in continuous operation since. It currently produces 50,000 MTPA of HC Ferro Manganese in two Submerged Arc Furnaces.
Gas Cleaning Plant (GCP) slurry generated in wet venturi scrubbers is collected in slurry pits inside the plant for drying and subsequent disposal. Because of space constraints, and in order to recycle the water used in the wet venturi scrubbers, Tata Steel is upgrading its GCP slurry handling process by installing a GCP slurry dewatering plant.
Construction of large structures within an operating plant, without affecting operations, is always a challenge. The job is even more challenging since the plant is 58 years old, with many unmarked structures, pipes, and cables lying underneath.
This paper discusses how new and unexpected challenges were tackled during the construction of the Slurry Dewatering Plant without compromising safe working.
Constructing One of India's Largest Single-location Effluent Treatment Plants - Prabhash Gokarn
Tata Steel operates one of the largest chromite mines in India at the Sukinda Valley in Odisha. The chrome ore produced is subsequently converted to Ferro Chrome and sold to customers across the world, making Tata Steel one of the top ten chrome players in the world. A large quantity of water, generated during mining and from rainfall, needs to be handled during mining operations. Chrome ore mainly contains trivalent chromium oxide and a very small fraction of hexavalent dichromate. Water coming in contact with the ore preferentially leaches out soluble hexavalent chromium from the ore body; as a result, water from the mine contains 0.2 to 4 mg/l of hexavalent chromium against a safe limit of 0.05 mg/l for human consumption, requiring all water to be treated before its release from the mines. Tata Steel has therefore set up an Effluent Treatment Plant at Sukinda with a capacity of ~108 million litres/day, the largest in the region, and possibly one of the largest single-location ETPs in India. This paper discusses how the challenges faced during construction of this Effluent Treatment Plant were successfully tackled.
Brazil's Mining Tragedy: Lessons for the Mining Industry - Prabhash Gokarn
The Brazilian mining tragedy was an eye-opener for the mining fraternity to introspect on the existing tailing management processes, identify gaps, complete hazard identification and risk assessments, and modify or develop safe operating procedures and emergency preparedness plans in line with the guidelines issued by Statutory Authorities from time to time. This is necessary to avert the occurrence of similar incidents in the future.
Presentation at the 9th WORLD AQUA CONGRESS on 26th-27th Nov 2015 - Prabhash Gokarn
Tata Steel operates chromite mines at the Sukinda Valley in Odisha, producing chrome ore which is subsequently converted to Ferro Chrome and sold to customers across the world. A large quantity of water, pumped out from the mining pit and from rainfall, needs to be handled during mining operations. Chrome ore mainly contains trivalent chromic oxide and a very small fraction of hexavalent dichromate. Water coming in contact with the ore preferentially leaches out soluble hexavalent chromium from the ore body; as a result, water from the mine contains 0.2 to 4 mg/l of hexavalent chromium against a safe limit of 0.005 mg/l for human consumption, requiring all water to be treated before its release from the mines. Tata Steel is therefore setting up an additional state-of-the-art effluent treatment plant at Sukinda with a capacity of 108 million litres/day, one of the largest in the region, to be completed by September 2015. This paper discusses how the technology for the Effluent Treatment Plant was chosen among various alternatives, how the capacity of the plant was decided, the challenges faced during its construction, and how these were successfully tackled. The paper also describes how, because the outlet water is of better quality than water from the local water body, it will be used as the input to the Water Treatment Plant, reducing water consumption and lowering operating cost.
Improving Water Quality by Constructing an Effluent Treatment Plant - Prabhash Gokarn
Tata Steel operates chromite mines at the Sukinda Valley in Odisha, producing chrome ore which is subsequently converted to Ferro Chrome and sold to customers across the world. A large quantity of water, pumped out from the mining pit and from rainfall, needs to be handled during mining operations. Chrome ore mainly contains trivalent chromic oxide and a very small fraction of hexavalent dichromate. Water coming in contact with the ore preferentially leaches out soluble hexavalent chromium from the ore body; as a result, water from the mine contains 0.2 to 4 mg/l of hexavalent chromium against a safe limit of 0.005 mg/l for human consumption, requiring all water to be treated before its release from the mines. Tata Steel is therefore setting up an additional state-of-the-art effluent treatment plant at Sukinda with a capacity of 108 million litres/day, one of the largest in the region, to be completed by September 2015. This paper discusses how the technology for the Effluent Treatment Plant was chosen among various alternatives, how the capacity of the plant was decided, the challenges faced during its construction, and how these were successfully tackled. The paper also describes how, because the outlet water is of better quality than water from the local water body, it will be used as the input to the Water Treatment Plant, reducing water consumption and lowering operating cost.
Project Management Challenges in an Effluent Treatment Plant Construction - Prabhash Gokarn
Tata Steel operates India's largest chromite mines at the Sukinda Valley in Odisha, producing chrome ore which is subsequently converted to Ferro Chrome and sold to customers across the world. A large quantity of water, generated during mining and from rainfall, needs to be handled during mining operations. Chrome ore mainly contains trivalent chromic oxide and a very small fraction of hexavalent dichromate. Water coming in contact with the ore preferentially leaches out soluble hexavalent chromium from the ore body; as a result, water from the mine contains 0.2 to 4 mg/l of hexavalent chromium against a safe limit of 0.005 mg/l for human consumption, requiring all water to be treated before its release from the mines. Tata Steel is therefore setting up an effluent treatment plant at Sukinda with a capacity of 108 million litres/day, perhaps one of the largest in the region, to be complete by end June 2015.
Sustainable Development in the Mining Industry - presentation at QCFI BhilaiPRABHASH GOKARN
Mining is the primary method of extraction of minerals needed by man.
The main constraint to sustainability in mining stems from the increasing pollution generated by the extraction process and the large consumption of resources (mostly energy and water) needed in refining of the minerals.
Mining operations are associated with a range of environmental and social impacts, as well as the non-renewable nature of many mined resources. Thus the sustainability of this industry and the efficient use of its resources for development remain crucial.
There is a growing realization globally of the importance of strong and effective legal and regulatory frameworks, policies and practices for the mining sector that deliver economic and social benefits.
It is possible for mining to contribute to sustainable development through:
• Enhancing the benefits while mitigating the negative impacts, both while mining is taking place and afterwards (through scientific mine closure)
• Improving stakeholder participation in the management of the resources including local and indigenous communities
• Addressing the environmental, economic, health and social impacts and benefits of mining throughout its life cycle, including workers' health and safety.
There must be a balance between the contrasting claims of sustainability for the future and the current economic benefit from mining. We must:
• Stay within the capacity of ecosystems to absorb change.
• Provide an adequate standard of living for those in the area of influence.
• Create wealth for development of society & provide for the development of advanced technology.
• Develop systems of governance which promote and sustain these goals.
Brand Identity of Alloys by Innovative Packaging - Design HonourPRABHASH GOKARN
Reinforcing brand identity and product superiority, and facilitating ease of use of ferroalloys from Tata Steel, through innovative product packaging whose components include:
a) Brand Name that enhances Product Recall
b) Stylized Logotype to indicate product name integrated with a graphical representation of molten alloy to communicate product usage and generate instant buyer involvement.
c) Bagging solution incorporating security, traceability and “easy to use” features
d) Bright, bold & visible text ensuring availability of all needed information at the place of use
e) Innovative, neat, white HDPE bag with brightly coloured “happy” markings, emotionally connecting with the customer in the usually drab & grey industrial environment
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables ranks to be calculated in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph contain no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph whose vertices were split by component. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
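The levelwise strategy described in the abstract can be sketched as follows — a hypothetical, non-distributed Python illustration, not the report's actual CPU/GPU code: decompose the graph into strongly connected components with Tarjan's algorithm, then converge PageRank one component at a time in topological order, treating contributions from already-ranked components as constants. As the abstract requires, it assumes the graph has no dead ends.

```python
from collections import defaultdict

def tarjan_scc(adj):
    """Return strongly connected components; Tarjan emits them in
    reverse topological order of the condensation."""
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]
    def dfs(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if w not in index:
                dfs(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)
    for v in list(adj):
        if v not in index:
            dfs(v)
    return sccs

def levelwise_pagerank(adj, d=0.85, tol=1e-10):
    """PageRank converged one SCC at a time, in topological order.
    Precondition: no dead ends (every vertex has an out-edge)."""
    n = len(adj)
    outdeg = {v: len(adj[v]) for v in adj}
    in_adj = defaultdict(list)
    for u in adj:
        for v in adj[u]:
            in_adj[v].append(u)
    rank = {v: 1.0 / n for v in adj}
    for comp in reversed(tarjan_scc(adj)):   # process sources first
        comp_set = set(comp)
        # Contributions from already-converged components are constant.
        ext = {v: sum(rank[u] / outdeg[u]
                      for u in in_adj[v] if u not in comp_set)
               for v in comp}
        while True:
            new = {v: (1 - d) / n + d * (ext[v] + sum(
                       rank[u] / outdeg[u]
                       for u in in_adj[v] if u in comp_set))
                   for v in comp}
            delta = max(abs(new[v] - rank[v]) for v in comp)
            rank.update(new)
            if delta < tol:
                break
    return rank
```

On a 3-cycle every vertex converges to 1/3, matching monolithic PageRank; the distributed variant discussed in the report would additionally spread the components within each level across workers.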
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfGetInData
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source) Copilot?
How can we build one?
Architecture and evaluation
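As a rough illustration of the RAG pattern the talk builds on — a sketch only, with hypothetical names; the actual copilot uses learned embeddings, a real vector store, and an LLM — documents are embedded, the most similar ones to the query are retrieved, and they are prepended to the prompt as context:

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A production system
    # would use a learned embedding model and a vector database.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    # Return the k documents most similar to the query
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    # Augment the prompt with retrieved context before calling an LLM
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The same retrieve-then-augment loop underlies SQL copilots: the "documents" become table schemas and query examples, so the model answers grounded in the company's actual data assets.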
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Analysis insight about a Flyball dog competition team's performanceroli9797
Insights from my analysis of a Flyball dog competition team's performance over the past year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms like PageRank typically run over Compressed Sparse Row (CSR), an adjacency-list based graph representation. The notes below benchmark the basic vector primitives (map and reduce) that such algorithms rely on:
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
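The float-vs-bfloat16 storage experiment can be illustrated with a small sketch (hypothetical, not the report's code; real bfloat16 conversion typically rounds to nearest even, while this truncates for simplicity). bfloat16 keeps only the top 16 bits of a float32, so a naive reduce stalls once the accumulator's spacing between representable values exceeds each addend:

```python
import struct

def to_bfloat16(x):
    # Simulate bfloat16: keep the top 16 bits of the IEEE-754 float32
    # encoding (sign, 8-bit exponent, 7-bit mantissa), zero the rest.
    (bits,) = struct.unpack('>I', struct.pack('>f', x))
    return struct.unpack('>f', struct.pack('>I', bits & 0xFFFF0000))[0]

def sum_with_storage(values, store):
    # Reduce where the accumulator is forced through a storage type
    # after every addition, as a low-precision buffer would be.
    acc = store(0.0)
    for v in values:
        acc = store(acc + store(v))
    return acc
```

Summing 0.1 a thousand times gives roughly 100.0 with full-precision storage, but the bfloat16 accumulator stops growing far below that once its unit in the last place exceeds 0.1, which is why the storage type matters for large reductions.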
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found