Alight
Technical Report
Introduction
Data discovery starts from some existing understanding of a phenomenon (i.e. smoking) and
from data on factors that we think contribute to it. Because cause-and-effect (temporal)
relationships are difficult to establish for a complex phenomenon such as smoking, we use
mathematical means to discover which of these factors actually affect its occurrence.
Consider the relationship between income and education. It is widely believed that higher
education leads to higher income; however, individuals cannot easily translate this trend
into their personal lives. For example, someone from a low-income family who wishes to
improve his financial future may know the importance of education, yet be unable to act on
this information. If, however, researchers also examine other factors related to education,
such as knowledge of available funding sources or having friends who have attended higher
education, then that person can take active steps, either by seeking more information about
funding sources or by connecting with the right peers.
Smoking research currently resembles the income-and-education situation described above.
We aim to change this paradigm into a personalized one in which the findings can directly
inform individual smokers.
Data Collection
Our data discovery, data mining, and predictive analytics will draw on two main data sources.
First, date, time, and location data will be generated each time the user lights a
cigarette. Second, descriptive categorical data such as age, sex, income status, and place of
residence will be collected when the user creates a profile on our online interface. In
addition, we have the option of creating supplementary surveys for users to fill out on our
website whenever researchers have specific questions to ask, for example about the user’s
smoking reduction goals.
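As an illustration only, the records described above could be represented as follows. The class names, field names, and the helper cigarettes_per_day are hypothetical and not part of Alight’s actual design; the helper shows how per-event records could roll up into the daily cigarette counts that a count model needs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LightingEvent:
    """Hypothetical record generated each time the user lights a cigarette."""
    user_id: str
    timestamp: datetime  # date and time of lighting
    latitude: float      # GPS location of the event
    longitude: float


@dataclass
class UserProfile:
    """Hypothetical descriptive data collected when the user creates a profile."""
    user_id: str
    age: int
    sex: str
    income_status: str
    residence: str
    # Optional survey responses, e.g. {"reduction_goal": "halve daily count"}
    survey_answers: dict[str, str] = field(default_factory=dict)


def cigarettes_per_day(events: list[LightingEvent], user_id: str) -> dict:
    """Count one user's lighting events per calendar day."""
    counts: dict = {}
    for event in events:
        if event.user_id == user_id:
            day = event.timestamp.date()
            counts[day] = counts.get(day, 0) + 1
    return counts
```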
Statistical Methods
Data discovery
One of the major tools used in statistical studies is regression analysis. Regression
produces a mathematical formula that describes the relationship between different factors.
The dependent variable is the outcome we are trying to describe with the formula, while the
independent variables are the factors we think affect that outcome. Here, the dependent
variable will be individual smoking incidence, and the independent variables will be time,
GPS location, demographics, and any other data that can be obtained from surveys. The
regression will take the form:
π‘†π‘šπ‘œπ‘˜π‘’(𝑦𝑒𝑠 π‘›π‘œ) = πΏπ‘œπ‘π‘Žπ‘‘π‘–π‘œπ‘› + π‘‡π‘–π‘šπ‘’ π‘œπ‘“ π‘‘π‘Žπ‘¦ + π·π‘Žπ‘‘π‘’ + π·π‘’π‘šπ‘œπ‘”π‘Ÿπ‘Žπ‘β„Žπ‘–π‘π‘ ( π‘Žπ‘”π‘’, 𝑠𝑒π‘₯, 𝑒𝑑𝑐) + π‘’π‘Ÿπ‘Ÿπ‘œπ‘Ÿβ„
More specifically, we use probit and Poisson regression models, which estimate the
probability of an event such as smoking. A probit model gives the probability that a user
lights a cigarette (a yes/no outcome), while a Poisson regression models the number of
cigarettes a user smokes in one day, giving the probability of each possible daily count.
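A minimal sketch of how fitted probit and Poisson models turn factors into probabilities. The coefficient values and factor names (near_convenience_store, morning, weekend) are invented for illustration; in practice the coefficients would be estimated from the collected data by maximum likelihood, for example with the Probit and Poisson classes in the statsmodels library.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_index(intercept: float, coefs: dict, factors: dict) -> float:
    """intercept + sum(coefficient * factor value); missing factors count as 0."""
    return intercept + sum(beta * factors.get(name, 0.0) for name, beta in coefs.items())


def probit_probability(intercept: float, coefs: dict, factors: dict) -> float:
    """Probit: P(user lights a cigarette) = Phi(linear index)."""
    return normal_cdf(linear_index(intercept, coefs, factors))


def poisson_rate(intercept: float, coefs: dict, factors: dict) -> float:
    """Poisson regression (log link): expected cigarettes per day = exp(linear index)."""
    return math.exp(linear_index(intercept, coefs, factors))


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of smoking exactly k cigarettes in a day, given the expected rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Invented coefficients, for illustration only.
probit_coefs = {"near_convenience_store": 0.8, "morning": 0.5}
p_smoke = probit_probability(-1.0, probit_coefs, {"near_convenience_store": 1, "morning": 1})

poisson_coefs = {"weekend": 0.2}
daily_rate = poisson_rate(math.log(10.0), poisson_coefs, {"weekend": 0})
p_exactly_8 = poisson_pmf(8, daily_rate)
```

With these invented coefficients, being near a convenience store in the morning gives a probit index of 0.3 and a smoking probability of about 62%, while the Poisson part expects about 10 cigarettes on a weekday and assigns a probability to each possible count.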
To check whether an estimated relationship is real, we test whether the result could have
been obtained purely by chance (i.e. a false positive). Imagine that every time a person
lights a cigarette, he has a cup of coffee: coffee may trigger him to smoke, or it may be
pure chance that he happened to be drinking coffee when he smoked. To discern whether coffee
is the culprit, we calculate the statistical significance of the association, expressed as
a p-value: the probability of observing an association at least this strong if coffee and
smoking were in fact unrelated.
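One simple way to compute such a p-value for the coffee example is an exact one-sided binomial test. The sketch assumes we know a baseline rate at which the person drinks coffee at any given moment (the 40% used here is invented) and asks how surprising it would be to see coffee at 15 of 20 smoking episodes if coffee were unrelated to smoking. It is an illustration, not necessarily the test Alight would use; regression software reports a p-value for each fitted coefficient directly.

```python
from math import comb


def binomial_p_value(hits: int, trials: int, baseline_rate: float) -> float:
    """One-sided exact binomial test.

    Returns P(X >= hits) for X ~ Binomial(trials, baseline_rate): the chance of
    seeing coffee at `hits` or more of `trials` smoking episodes if coffee
    occurred at its baseline rate, independently of smoking.
    """
    return sum(
        comb(trials, k) * baseline_rate**k * (1 - baseline_rate) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# Hypothetical data: coffee at 15 of 20 smoking episodes, baseline coffee rate 40%.
p_value = binomial_p_value(15, 20, 0.40)
```

Here p_value comes out at roughly 0.0016, so chance alone is an unlikely explanation at the conventional 0.05 threshold. Coffee at 8 of 20 episodes, which is what the baseline rate predicts, would give a p-value above 0.5.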
Data mining
Smoking is driven in part by subtle and often inconspicuous influences, such as seeing
another person smoke or passing by a convenience store. Data mining techniques allow us to
discover hidden relationships among seemingly unrelated factors that might affect smoking.
We believe that two overlooked aspects of smoking are where you are and what is around you,
which means that the GPS information we collect can yield additional insight. We will
compare the user’s location data with publicly available geospatial data, such as the
locations of businesses (e.g. coffee shops, convenience stores), weather conditions, traffic
conditions, or other smokers in the vicinity.
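A minimal sketch of the location comparison: for one lighting event, count the nearby points of interest in each category. It assumes the geospatial data has already been reduced to (category, latitude, longitude) tuples and uses a 200 m radius; both the format and the radius are our assumptions. Distances use the haversine great-circle formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_counts(event_lat: float, event_lon: float, pois, radius_m: float = 200.0) -> dict:
    """Count points of interest per category within radius_m of a lighting event.

    pois: iterable of (category, latitude, longitude) tuples (hypothetical format).
    """
    counts: dict = {}
    for category, lat, lon in pois:
        if haversine_m(event_lat, event_lon, lat, lon) <= radius_m:
            counts[category] = counts.get(category, 0) + 1
    return counts


# Hypothetical points of interest around one event.
pois = [
    ("convenience_store", 43.6505, -79.3800),  # about 56 m away
    ("convenience_store", 43.6500, -79.3810),  # about 80 m away
    ("coffee_shop", 43.6600, -79.3800),        # about 1.1 km away
]
event_context = nearby_counts(43.6500, -79.3800, pois)
```

The resulting counts (here two convenience stores and no coffee shop within 200 m) are the kind of derived factors that can later be added to the regression.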
Two families of data mining algorithms suit this kind of analysis: clustering and association
rule learning. Neither requires the researcher to pre-define a set of rules, as we would with
regression analysis.
A clustering algorithm measures the distances between data points and automatically creates
rules that assign each point to a “cluster” (i.e. a class). For example, given a large data
pool, we can build “smoking clusters” without the manual classification errors that often
occur with large data pools. An association rule learning algorithm creates rules that
estimate the probability of an event occurring given the co-occurrence of a fixed basket of
other events. In this case, we can measure how many concurrent events (e.g. the number of
nearby convenience stores or adjacent smokers) it takes for someone to smoke.
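The two techniques above can each be sketched in a few lines. The first function is plain k-means (Lloyd's algorithm), one common clustering method among many, run on 2-D points such as projected event coordinates; it seeds the centroids with the first k points to stay deterministic, whereas practical implementations use smarter seeding. The second function scores a single candidate rule by its support and confidence, the standard association-rule measures; a full association rule learner such as Apriori would search over many candidate baskets. The item names are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's k-means on 2-D points; returns (cluster label per point, centroids)."""
    centroids = list(points[:k])  # simple deterministic seeding: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


def rule_support_confidence(baskets, antecedent, consequent):
    """Score the rule `antecedent -> consequent` over a list of event baskets (sets).

    support    = share of baskets containing both sides
    confidence = share of baskets containing the antecedent that also contain the consequent
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for basket in baskets if antecedent <= basket)
    n_both = sum(1 for basket in baskets if (antecedent | consequent) <= basket)
    support = n_both / len(baskets)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical baskets of concurrent events around moments in a user's day.
baskets = [
    {"convenience_store", "other_smoker", "smoked"},
    {"convenience_store", "smoked"},
    {"convenience_store"},
    {"other_smoker", "smoked"},
]
support, confidence = rule_support_confidence(baskets, {"convenience_store"}, {"smoked"})
```

For these baskets the rule "convenience store nearby implies smoking" has support 0.5 and confidence 2/3: the user smoked in two of the three situations with a convenience store nearby.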
Factors unearthed by data mining are then fed back into the regression analysis as
additional explanatory variables, increasing the value derived from existing data. The
resulting model should describe smoking more accurately, so that researchers and
policy-makers can better understand and modify smoking behaviour.
Prediction and Forecasting
The better a statistical model becomes, the more accurately it describes the relationship
between the outcome and its associated factors. However, a model that describes past data
well does not necessarily predict new data well, so its predictive power has to be assessed
separately. A standard approach is to randomly split the original data into training,
validation, and testing groups. The model is built using only the training group, and the
validation group is used to tune and refine the model. Finally, the model’s predictions are
compared with the testing group, and predictive performance is measured by how closely those
predictions match the observed values in the testing group.
For instance, suppose we have 100 data points on smokers’ locations. We use 60 data points to
build the model and 10 to validate it, and we then use the model to predict the remaining 30
data points. These 30 predictions are compared with the unused (testing) data group. If 25 of
the 30 predictions exactly match the corresponding observed values, the model’s predictive
accuracy is 25/30, or about 83%.
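The split and the accuracy calculation in this example can be sketched as follows. The 60/10/30 proportions come from the example; the shuffling seed and the exact-match accuracy function are our assumptions. Exact matching suits categorical predictions (for example, which place a smoking event occurs at); for continuous coordinates, an error measure such as mean distance would be more appropriate.

```python
import random


def train_val_test_split(records, train_frac=0.6, val_frac=0.1, seed=0):
    """Randomly split records into training, validation, and testing groups."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def exact_match_accuracy(predicted, observed):
    """Share of predictions that exactly match the corresponding observed values."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equally sized, non-empty sequences")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


train, validation, test = train_val_test_split(range(100))
# With 25 of 30 test predictions correct, accuracy = 25/30, about 0.83.
accuracy = exact_match_accuracy([1] * 25 + [0] * 5, [1] * 30)
```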
Future Possibilities
The goal of smoking research should be tied to the health outcomes of the general
population, and smoking data should not be isolated from the wealth of other health-related
data. Currently, it is very difficult for researchers from different fields to pool their
data. Part of this difficulty arises from the lack of a unique identifier linking
observations across datasets. For example, if researcher A collects 100 observations of
smoking data and researcher B collects 100 observations of blood pressure data, neither
researcher can tell which observations came from the same individual. Hence, the researchers
cannot conduct cross-data-source research.
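A minimal sketch of why a shared identifier matters: once both datasets carry the same participant ID, pooling them reduces to a simple join. The field names (participant_id, cigarettes_per_day, systolic_bp) and the example values are hypothetical.

```python
def link_by_participant(smoking_records, bp_records, key="participant_id"):
    """Inner-join two datasets on a shared unique identifier.

    Only participants present in both datasets appear in the result;
    each linked row combines the fields from both sources.
    """
    bp_by_id = {row[key]: row for row in bp_records}
    linked = []
    for row in smoking_records:
        match = bp_by_id.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


smoking = [
    {"participant_id": "A1", "cigarettes_per_day": 12},
    {"participant_id": "B2", "cigarettes_per_day": 3},
]
blood_pressure = [
    {"participant_id": "A1", "systolic_bp": 135},
    {"participant_id": "C3", "systolic_bp": 118},
]
linked = link_by_participant(smoking, blood_pressure)
```

Only participant A1 appears in both datasets, so linked contains a single combined row. Without a shared identifier there is no reliable way to perform this join.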
Our solution to the data-sharing problem is to adopt Apple’s newly announced ResearchKit.
Apple has created a common platform that gives medical researchers easy access (with
fingerprint ID approval) to medical information attributed to unique identifiers (iPhone
users).
Version 2.0 of Alight will pair with mobile phones via Bluetooth and will have a dedicated
app on the Apple ResearchKit platform. This means researchers at CAMH will have access to
other health-related data without having to conduct additional primary research.
