IQSS Presentation to Program in Health Policy
Research Technology Consulting: Simo Goshev, Alex Storer, Steve Worthington, Ista Zahn. support@help.hmdc.harvard.edu, http://rtc.iq.harvard.edu
Consulting Goals: Data analysis support and programming services. Research project planning and guidance selecting appropriate technology for research projects. Facilitating appropriate organization, storage and sharing of data. Training on the use of both established software packages and emerging tools.
Scope: Free! Support the entire social science community. Consults measured in hours rather than weeks or months. Currently doing outreach to departments, student groups and centers. Drop-ins on Fridays at 1pm in the training lab; appointments, help tickets and casual chats in K306.
Who We Are
Simo Goshev: BA – Sofia, Bulgaria, Applied Econometrics; MS – McMaster University, Statistics; PhD – McMaster University, Economics. Analysis: Econometrics, Applied Microeconometrics, Panel Data, Applied statistics. Tools: Mainly Stata, some R.
Help with econometrics: What model is most suitable for my data on hospital IT innovation? I am looking at HIV in children. Can you help me design an overlapping generations model? Why are the confidence intervals of my spline of health care spending so wide/narrow? Could the interaction between an exogenous and endogenous variable be exogenous? I am looking for a way to compare survival between two cancer management programs. Can you help me?
Help with computation/estimation: I am trying to estimate a model but for some reason the routine fails. Could you have a look at my script? I am working with a large dataset and my machine is giving up on me. Do you have any suggestions? Which routine is best for…?
Replication study in health economics: Graduate student. Make sense of a study and Stata code. [Figure: two panels, vertical axis from .2 to 1, horizontal axis from 65 to 80]
Predictors of hospital IT adoption: Graduate student, School of Public Health. Understand what factors facilitate/hinder adoption of IT in US hospitals. Data: Sample of hospitals clustered within states; count of ITs adopted by a hospital in 3 consecutive years. Modeling strategy: Three-level mixed effects model.
Alex Storer: BS, BA – UC Berkeley, Electrical Engineering & Computer Science, Cognitive Science; PhD – Boston University, Cognitive & Neural Systems. Analysis: Machine Learning, Signal Processing, Surface Based Techniques, Simulation, Optimization. Tools: Matlab, R, Python; Emacs, LaTeX, Linux.
Text Analysis: Large corpus. Topic Models; Sentiment; Prevalence of certain terms.
Text Analysis: Twitter: #obamacare. Positive/Negative Opinions?
Text Analysis: Distinct Content Groupings. Congress Speeches.
Text Analysis: NY Times Archive. Term: "Medicare"
Text Analysis: Topic Models; Sentiment; Prevalence of certain terms. What models are appropriate to perform our analysis? What software is appropriate?
Text Analysis: Large corpus. Where do we obtain this corpus? How do we pre-process it so we can analyze it?
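As a rough illustration of the pre-processing question and of "prevalence of certain terms", here is a minimal Python sketch. The three-sentence corpus, the `tokenize` rule (lowercase, alphabetic words only) and the function names are assumptions made up for this example, not the pipeline used in the consultations.

```python
import re
from collections import Counter

# Hypothetical three-document corpus, invented for illustration only.
corpus = [
    "Medicare spending rose. Medicare enrollment also rose!",
    "The bill changes Medicaid, not Medicare.",
    "Hospitals adopt new IT systems.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_prevalence(docs, term):
    """Return (share of documents containing the term, total occurrences)."""
    term = term.lower()
    counts = [Counter(tokenize(doc))[term] for doc in docs]
    docs_with_term = sum(1 for c in counts if c > 0)
    return docs_with_term / len(docs), sum(counts)


share, total = term_prevalence(corpus, "Medicare")
print(f"share of documents: {share:.2f}, total occurrences: {total}")
```

A real corpus would likely need more careful tokenization (stemming, stop words, hyphenated terms) before counts like these are meaningful.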
Federal Procurement Database
Federal Procurement Database: Only first 500 hits, only a few columns. All of the data, but…
Federal Procurement Database: Download atom feeds. Parse XML tree structure. Python! Search for union of entries. Output as CSV. For 20 GB of data, there is no way to download by hand…
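The steps on this slide (parse Atom XML, take the union of entries, write CSV) might look roughly like the sketch below. It uses only the Python standard library. The two inline feed pages, the choice of `id`, `title` and `updated` as columns, and reading "union" as "merge pages and drop repeated entry ids" are all assumptions for illustration; the actual feeds and fields used in the project are not shown in the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # standard Atom XML namespace

# Two hypothetical feed pages, invented for illustration; a real run would
# read each downloaded feed file instead.
PAGE_1 = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>a1</id><title>Contract A</title><updated>2012-01-01T00:00:00Z</updated></entry>
  <entry><id>b2</id><title>Contract B</title><updated>2012-01-02T00:00:00Z</updated></entry>
</feed>"""
PAGE_2 = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><id>b2</id><title>Contract B</title><updated>2012-01-02T00:00:00Z</updated></entry>
  <entry><id>c3</id><title>Contract C</title><updated>2012-01-03T00:00:00Z</updated></entry>
</feed>"""

FIELDS = ["id", "title", "updated"]  # assumed columns of interest


def parse_entries(xml_text):
    """Walk the XML tree and yield one dict per Atom <entry>."""
    root = ET.fromstring(xml_text)
    for entry in root.iter(ATOM + "entry"):
        yield {f: entry.findtext(ATOM + f, default="") for f in FIELDS}


def union_of_entries(pages):
    """Merge entries from all pages, keeping the first copy of each id."""
    seen = {}
    for page in pages:
        for row in parse_entries(page):
            seen.setdefault(row["id"], row)
    return list(seen.values())


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = union_of_entries([PAGE_1, PAGE_2])
csv_text = to_csv(rows)
print(csv_text)
```

For tens of gigabytes of feeds, one would probably stream each file and write rows as they are parsed rather than holding everything in memory as this sketch does.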
Steve Worthington: BA / MS – Durham, UK, Anthropology & Archeology; PhD – NYU, Biological Anthropology. Analysis: Linear models (OLS, GLS, PLS, etc.), Resampling (permutation, bootstrap), Ordination (PCA, LDA, CVA, etc.). Tools: Mainly R, some SAS, SPSS.
Cleaning / reshaping data: Department of Economics. Daily Lat/Long data on rainfall in India (1951 – 2007). 171 files, 3 types (2 ascii text, 1 binary). One file for each year (containing 365 daily matrices). Parse messy data into a long-format Stata data frame. [Figure: June 21st 2007]
Cleaning / reshaping data: No common delimiter (spaces and tabs). Use regexp to parse each datum. Use template to place each datum into correct row/column. [Figure: Template]
Cleaning / reshaping data: Long format data frame in Stata. Rainfall for each day and lat/long.
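The regexp-and-template approach from these slides can be sketched in Python as follows. The tiny 2x3 grid, its rainfall values, the latitude/longitude "template" lists and the function name `parse_day` are all hypothetical; the real files held a full daily matrix per day, and the slides do not say which language the original parser was written in.

```python
import re

# Hypothetical 2x3 daily grid with mixed spaces and tabs, invented for illustration.
RAW_DAY = "0.0 \t1.5   2.25\n3.0\t\t0.0 \t 4.75\n"

# Hypothetical template: one latitude per grid row, one longitude per column.
LATS = [10.5, 11.5]
LONS = [70.5, 71.5, 72.5]

# Match each number directly instead of splitting on an unreliable delimiter.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_day(raw, date, lats, lons):
    """Turn one daily matrix into long-format rows: date, lat, lon, rain."""
    lines = [ln for ln in raw.splitlines() if ln.strip()]
    if len(lines) != len(lats):
        raise ValueError("row count does not match the template")
    rows = []
    for lat, line in zip(lats, lines):
        values = [float(tok) for tok in NUMBER.findall(line)]
        if len(values) != len(lons):
            raise ValueError("column count does not match the template")
        for lon, rain in zip(lons, values):
            rows.append({"date": date, "lat": lat, "lon": lon, "rain": rain})
    return rows


long_rows = parse_day(RAW_DAY, "2007-06-21", LATS, LONS)
for row in long_rows:
    print(row)
```

Checking each row and column count against the template is a cheap way to catch a file whose layout differs from the expected grid before the rows reach Stata.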
Rainfall / CEO movie
Geospatial Analysis in R: Spatial prediction: interpolation of data points. Spatial autocorrelation analysis. Drug resistant TB, Moldova.
Ista Zahn: BS – University of Oregon, Psychology; PhD (ABD) – University of Rochester, Social Psychology. Analysis: Regression, Mixed Models, Scale Development. Tools: R, Stata, SAS, SPSS; Emacs, LaTeX, Linux.
Workshops (schedule at http://rtc.iq.harvard.edu)
IQSS Services: The Institute for Quantitative Social Science at Harvard University