1. A Taste of R Programming
Kyle Akepanidtaworn, LSESU Data Science Society
2. About Me
• Founder and President of LSESU Data Science Society (2016)
• A General Course Student at LSE studying Econ & Stats
• Former Big Data Intern at IMC Institute, Thailand
• Former Teaching Assistant at Wesleyan University
• Former Quantitative Consultant for the Connection in the course “Intro to Statistical Consulting” at Wesleyan
• Programming & Stats Packages: R, Python, SPSS, SAS, Stata
• Business Intelligence Tools: Tableau, QlikView
• LinkedIn: https://uk.linkedin.com/in/korkridakepan
3. 7 Quick Facts about R
• R is the highest-paid IT skill (Dice.com survey, January 2014)
• R is the most-used data science language after SQL (O'Reilly survey, January 2014)
• R is used by 70% of data miners (Rexer survey, October 2013)
• R is #15 of all programming languages (RedMonk language rankings, January 2014)
• R is growing faster than any other data science language (KDNuggets survey, August 2013)
• R is the #1 Google search for advanced analytics software (Google Trends, March 2014)
• R has more than 2 million users worldwide (Oracle estimate, February 2012)
• Source: http://blog.revolutionanalytics.com/2014/04/seven-quick-facts-about-r.html
4. What is R?
• Developed by Ross Ihaka and Robert Gentleman (statisticians)
• First appeared in August 1993
• Some capabilities of R include:
Software development
Data analysis and visualization
Polling, surveys of data miners
Shiny application development
Writing project reports
Creating HTML presentations
5. Data Science War: R vs. Python
Source: Which #superheroe are you?(#batman Vs. #Superman) == (#R Vs. #Python)?
8. Why Learn R?
Outstanding Graphs
Big Community!
Friendly to New Users and Non-programmers
Extremely Comprehensive
Flexible & Fun!
Open-source Language
Cross-Platform Compatibility
Advanced Statistical Language
9. Companies Using R
• Facebook - For behavior analysis related to status updates and profile pictures.
• Google - For advertising effectiveness and economic forecasting.
• Twitter - For data visualization and semantic clustering.
• Microsoft - Acquired Revolution Analytics, the company behind Revolution R, and uses R for a variety of purposes.
• Uber - For statistical analysis.
• Airbnb - To scale data science.
• IBM - Joined the R Consortium.
• ANZ - For credit risk modeling.
• HP
• Ford
• Novartis
• Roche
• New York Times - For data visualization.
• McKinsey
• BCG
• Bain
10. Installation Guide
1. Go to https://cran.r-project.org/ (the Comprehensive R Archive Network).
2. Choose the download for your platform (Windows, macOS, or Linux).
3. Follow the installation instructions; nothing tricky here.
4. Download RStudio, an integrated development environment (IDE) that gives R a friendlier interface: https://www.rstudio.com/
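Once R is installed, typing a few lines at the console is a quick way to confirm that everything works. It also previews the basic arithmetic and objects covered in Chapter 2. The sketch below is illustrative, and the object names `x`, `y` and `z` are our own choices:

```r
# Print the installed R version to confirm the installation worked
print(R.version.string)

# The console works as a calculator
2 + 3 * 4        # operator precedence: 14
10 / 4           # division returns a double: 2.5
10 %/% 4         # integer division: 2
10 %% 4          # remainder (modulo): 2
2^10             # exponentiation: 1024

# Store results in objects with the assignment operator <-
x <- sqrt(16)
y <- x * 2.5
print(y)         # 10

# Complex numbers are built in
z <- complex(real = 3, imaginary = 4)
Mod(z)                 # modulus of 3+4i: 5
sqrt(as.complex(-1))   # 0+1i
```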
12. Remark
I encourage everyone to follow the latest developments in R via the R-Bloggers, CRAN, and RStudio websites. A large community of developers is constantly building tools that make analysis easier for R users.
13. A Transition from Microsoft Excel to R?
• Peter Flom, an independent statistical consultant for researchers in the behavioral, social, and medical sciences, has a compelling answer to the question of why Excel is such an undervalued tool for data analysis:
• Excel isn’t undervalued as a tool for statistical analysis. If anything, it’s overvalued as such a tool.
• Most competent analysts do not use Excel, not because it’s too easy, but because other analytical tools have more statistical capabilities.
• The default graphs in Excel are awful; other visualization tools outperform Microsoft Excel.
• Learning statistics in Excel can give a misleading idea of what data analysis involves. Doing good statistics requires rigorous training.
• Excel cannot handle big data: a worksheet holds at most 1,048,576 rows. If you are dealing with more than about a million rows, you need to turn to R or Python.
• It makes checking the assumptions behind an analysis harder than other programs do.
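To make the point about assumptions concrete, here is a minimal R sketch. It fits a simple linear regression of fuel economy on car weight using the built-in `mtcars` dataset, then runs two standard checks: a normality test on the residuals and the four default diagnostic plots. The dataset and model are our own illustrative choices, not taken from the workshop file:

```r
# Built-in dataset: 32 cars; mpg = miles per gallon, wt = weight (1000 lbs)
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)                  # coefficients, standard errors, R-squared

# Assumption check 1: are the residuals plausibly normal?
res <- residuals(fit)
sw <- shapiro.test(res)       # Shapiro-Wilk normality test
print(sw)

# Assumption check 2: standard diagnostic plots
# (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage)
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))
```

Each check is a single function call, so it is easy to rerun after every change to the model.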
15. Teaching Outline
Chapter 1: R in Point-and-Click
• R Commander
• Menus in R Commander
• R vs. Stata vs. IBM SPSS
• Why is Coding Critical?
Chapter 2: Basics of R Programming
• Basic and Complex Numerical Operations
• R Basic Data Types
Numeric
Integer
Complex
Logical
Character
Chapter 2: Basics of R (Cont’d)
• Matrix
• Vector
• List
• Data Frame
• For-Loop
• Writing your own functions
Chapter 3: R for Data Science
• Using External Data
• Exploratory Data Analysis (EDA)
• Predictive Modelling
Linear Regression
Classification
Clustering
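The basic data types and data structures from Chapter 2 can be previewed in a few lines of R. This is a sketch, and the object names and values are illustrative choices rather than the workshop's own examples:

```r
# The five basic (atomic) data types
num  <- 3.14      # numeric (double)
int  <- 42L       # integer: the L suffix makes it an integer
cplx <- 2 + 3i    # complex
lgl  <- TRUE      # logical
chr  <- "LSE"     # character

sapply(list(num, int, cplx, lgl, chr), class)
# [1] "numeric"   "integer"   "complex"   "logical"   "character"

# Data structures
v  <- c(1, 2, 3, 4, 5, 6)                       # vector: one type, one dimension
m  <- matrix(v, nrow = 2)                       # matrix: one type, two dimensions (filled by column)
l  <- list(name = "R", year = 1993, open_source = TRUE)  # list: elements of any type
df <- data.frame(                               # data frame: equal-length columns, types may differ
  package = c("R", "Stata", "SPSS"),
  free    = c(TRUE, FALSE, FALSE)
)

dim(m)    # 2 3
l$year    # 1993
str(df)   # 3 obs. of 2 variables
```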
17. R Commander
• Enables analysts to access a selection of commonly used R commands through point-and-click menus.
• Serves the important role of helping users implement R commands and develop their knowledge and expertise in using the command line.
• Has a number of plugins available that provide direct access to R packages.
22. Comparing the statistical capabilities of software packages
• A statistical consultant known only as "Stanford PhD" has put together a table comparing the statistical capabilities of the software packages R, Matlab, SAS, Stata, and SPSS.
23. Comparing the statistical capabilities of software packages
• For each of 57 methods (including techniques like "ridge regression", "survival analysis", and "optimization"), the author rates each software package as "Yes" (fully supported), "Limited", or "Experimental".
• R and Matlab offer broader statistical capabilities than SAS, Stata, and SPSS.
• Python, to the best of my knowledge, is not rich in statistical testing functions, so it lies somewhere between R and SAS.
R 57
Matlab 57
SAS 42
Stata 29
SPSS 20
24. Should economists learn programming?
Of course! Adapting Keynes’s portrait of the master-economist: “The master-economist must possess a rare combination of gifts .... He must be mathematician, historian, statesman, programmer, philosopher -- in some degree. He must understand symbols, write code, and speak in words. He must contemplate the particular, in terms of the general, and touch abstract and concrete in the same flight of thought. He must study the present in the light of the past for the purposes of the future. He must be able to speak a common language with a computer scientist, a physicist and a sociologist. No part of man’s nature or his institutions must be entirely outside his regard. He must be purposeful and disinterested in a simultaneous mood, as aloof and incorruptible as an artist, yet sometimes as near to earth as a politician.”
--- Alex Teytelboym, Research Fellow in Economics at INET, University of Oxford
25. Why Coding? (I)
• As a social science student at LSE, managing, analyzing, and playing with data (charts, curves, trends, etc.) is an important part of your work.
• Without programming skills, your work becomes more limited.
• Are you always relying upon manual calculations?
• Are you hand-collecting data when you could write code to retrieve it easily?
• Are you working with big data? Do you think Excel will solve all your data problems?
• With code, you can multiply the amount of work or calculations you can perform by a huge factor: read millions of rows of data, look for patterns or relationships, and compare oil prices to Reddit traffic, or the birth rate to the average interest earned by investors in Wyoming. Whatever you can think of, you can explore in a matter of minutes or hours, so unleash your imagination.
26. Why Coding? (II)
• Much experimental work requires you to pull, clean, and manipulate large sets of available and incoming data to run experiments on the economic question you’re testing.
• If you can write code to do these tasks quickly and efficiently, you can iterate quickly through many more of the hypotheses you might want to test.
• The cutting edge of economic research uses novel datasets and combines both theory and empirics.
• Everyone in this room has different expectations of what they want to be able to do with data. In social science, analytics is very important, while programming in C++, Java, and similar languages is hardly necessary.
27. “Everybody in this country should learn to program a computer… because it teaches you how to think.”
- Steve Jobs, Co-Founder of Apple Inc. and its CEO from 1997 to 2011
28. R Quintessence
“The most disastrous thing that you can ever learn is your first programming language.” - Alan Kay
35. What are “Loops”?
• “Looping”, “cycling”, “iterating” or just replicating instructions is an old practice that
originated well before the invention of computers. It is nothing more than automating a
multi-step process by organizing sequences of actions or ‘batch’ processes and by
grouping the parts that need to be repeated.
• All modern programming languages provide special constructs that allow for the repetition
of instructions or blocks of instructions.
• Broadly speaking, there are two types of these special constructs or loops in modern
programming languages. Some loops execute for a prescribed number of times, as
controlled by a counter or an index, incremented at each iteration cycle. These are part of
the for loop family.
• On the other hand, some loops are based on the onset and verification of a logical
condition. These belong to the while and repeat families: a while loop tests its condition
at the start of each iteration, whereas a repeat loop has no built-in test and is ended by an
explicit break, typically checked at the end of the loop body.
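A minimal sketch of the three loop families in R, using a toy running sum of 1 to 5 (the variable names are illustrative only). All three loops produce the same total; only the way the repetition is controlled differs.

```r
# for loop: runs a prescribed number of times, driven by an index
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while loop: the condition is checked at the start of each iteration
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat loop: no built-in condition; an explicit break ends it,
# here tested at the end of each pass
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) break
}

print(c(total_for, total_while, total_repeat))  # 15 15 15
```

In everyday R code, a vectorised call such as `sum(1:5)` often replaces an explicit loop like these.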
37. Functions (I)
• Functions are used to logically break our code into simpler parts that are easier to
maintain and understand.
• It's pretty straightforward to create your own function in R programming.
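As a sketch of how little is needed, the pattern is `name <- function(arguments) { body }`; the function below (a hypothetical temperature converter, chosen only for illustration) returns its last evaluated expression automatically.

```r
# A user-defined function: name <- function(arguments) { body }
# The value of the last expression in the body is returned.
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

celsius_to_fahrenheit(100)        # 212
celsius_to_fahrenheit(c(0, 37))   # 32.0 98.6
```

Because R arithmetic is vectorised, the same function works on a single number or a whole vector without any loop.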
38. Functions (II)
• Conceptually, a function takes some input x, performs a computation on it, and returns a
new output.
• Some commonly known functions are mean, median, square root, and summation.
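The built-in functions named above follow the same input-to-output idea; here they are applied to a small example vector `x` (the values are arbitrary):

```r
x <- c(4, 9, 16, 25)

mean(x)    # 13.5
median(x)  # 12.5
sqrt(x)    # 2 3 4 5
sum(x)     # 54
```

Note that `mean()`, `median()`, and `sum()` reduce the vector to one number, while `sqrt()` returns one output per input element.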
41. Functions (V)
• Writing a function should come to mind whenever you want to reuse your own chunk of
code and automate it easily.
• Creating your own functions requires some imagination and efficient coding skills.
• Please revisit the workshop file for examples in action!
• Peace of mind: a vast community of R developers around the world collaborates in
providing useful R packages, which saves us a lot of time and effort.
• As you progress in data analysis with R, finding the right packages may provide a
shortcut for your research project.
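As a sketch of automating a repeated chunk of code, the hypothetical helper `summarise_column()` below wraps a three-statistic summary once and then reuses it across several columns of the built-in `mtcars` data set with `sapply()`:

```r
# Hypothetical helper: the summary we would otherwise copy for each column
summarise_column <- function(values) {
  c(mean = mean(values), sd = sd(values), n = length(values))
}

# Apply the helper to several mtcars columns at once;
# sapply() simplifies the results into a matrix (one column per variable)
summaries <- sapply(mtcars[, c("mpg", "hp", "wt")], summarise_column)
print(round(summaries, 2))
```

Changing the summary later means editing one function rather than every copied block.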
44. Packages are collections of R functions, data, and
compiled code in a well-defined format. The directory
where packages are stored is called the library.
- Quick-R
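A short sketch of inspecting the library from R itself: `.libPaths()` lists the library directories, `find.package()` locates an installed package within them, and `library()` attaches a package. The base package `stats` is used because it ships with every R installation; the `install.packages()` line is commented out since it needs internet access.

```r
# Directories that together form the package library on this machine
lib_dirs <- .libPaths()
print(lib_dirs)

# Locate where an installed package lives inside the library
stats_dir <- find.package("stats")
print(stats_dir)

# Attach a package so its functions can be called by name
library(stats)

# Installing a package from CRAN (commented out: needs internet access)
# install.packages("ggplot2")
```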
45. R for Data Science
“Without data, you’re just another person with an opinion.”
46. How to Import the Data
Importing your data into R – R Tutorials by R-Bloggers
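A minimal, self-contained sketch of importing a CSV file with base R's `read.csv()`. To avoid depending on an external file, it first writes a tiny made-up data frame to a temporary file; with real data you would pass your own file path instead. Packages such as readr and readxl offer alternative readers (for example, for Excel files).

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(country = c("A", "B", "C"), gdp = c(1.5, 2.25, 3)),
  csv_path,
  row.names = FALSE
)

# Read it back with base R's read.csv() and inspect the structure
imported <- read.csv(csv_path, stringsAsFactors = FALSE)
str(imported)
```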
47. “The simple graph has brought more information to the
data analyst’s mind than any other device.”
- John Tukey
48. R for Data Science: Data Visualization
• R has several systems for making graphs, but ggplot2 is one of the most elegant and
most versatile. ggplot2 implements the grammar of graphics, a coherent system for
describing and building graphs. With ggplot2, you can do more faster by learning one
system and applying it in many places.
• If you’d like to learn more about the theoretical underpinnings of ggplot2 before you
start, I’d recommend reading “The Layered Grammar of Graphics”,
http://vita.had.co.nz/papers/layered-grammar.pdf.
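A small sketch of the grammar of graphics in practice, assuming the ggplot2 package is installed: a data set, aesthetic mappings (`aes()`), and geometric layers added with `+`. The built-in `mtcars` data is used for illustration.

```r
library(ggplot2)

# Grammar of graphics: data + aesthetic mappings + layers
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                            # layer 1: one point per car
  geom_smooth(method = "lm", se = FALSE) +  # layer 2: linear trend line
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)
```

Swapping `geom_point()` for another geom, or mapping a third variable to `colour` inside `aes()`, changes the plot without changing the overall structure, which is the "one system, many places" idea.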
66. RStudio Tips and Tricks
These are not exactly coding tricks, but rather keyboard shortcuts that make your life easier.
• The up arrow on your keyboard lets you scroll back through your past commands.
• The Tab key on your keyboard helps you (particularly in RStudio) by offering ways to
finish your code.
• When working within a .R or .Rmd file, you can put your cursor on a line and press Ctrl +
Enter to execute that code in the Console. (On a Mac, Command + Enter.)
• If you get stuck with some syntax (usually mismatched parentheses or quotes), the R
Console prompt will change from > at the beginning of the line (which means it is waiting
for a new command) to + (which means it is waiting for you to finish a command). To get
out, press the Escape key.
67. Tearable Panes
Tearable panes are anything but terrible. This feature allows users to tear off data view
panes and source panes, facilitating the use of multiple screens.
68. Command History
In the console, you can scroll through the command history by pressing Ctrl/Cmd + ↑.
The command history is filtered as you type code into the console.
69. History Pane
The history pane shows a searchable list of commands that have been run. Commands can
be written to the source pane or the console. No more copy and paste from the console to a
script!
70. Rename in Scope
This feature makes it easy to rename all instances of a variable. The tool is context-aware;
changing ‘m’ to ‘m1’ won’t change ‘mtcars’ to ‘m1tcars’.
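To see why context awareness matters, here is a rough analogy using plain string replacement in R (this is not how RStudio implements the feature, which works from the parsed code). A naive replacement of `m` damages `mean`, `mtcars`, and `mpg`; a whole-word regular expression renames only the standalone variable, although unlike RStudio it would still rename an `m` inside a string or in an unrelated scope.

```r
code_line <- "m <- mean(mtcars$mpg)"

# Naive find-and-replace: every letter "m" changes, breaking other names
naive <- gsub("m", "m1", code_line, fixed = TRUE)
print(naive)   # "m1 <- m1ean(m1tcars$m1pg)"

# Whole-word match (\b = word boundary): only the variable m is renamed
scoped <- gsub("\\bm\\b", "m1", code_line, perl = TRUE)
print(scoped)  # "m1 <- mean(mtcars$mpg)"
```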
71. Gallery and Satellite View in Notebooks
In R Notebooks, a code chunk that produces multiple plots displays them as a gallery. The
plots can be viewed by toggling between thumbnails, and the gallery can be expanded into a
new satellite window for closer inspection.
72. Code Outline
Save time scrolling with the code outline. This feature works for R Notebooks and traditional
R scripts. In R Notebooks, sections are delimited by the R Markdown headers; in R scripts,
sections are delimited by section comments (try Code -> Insert Section).
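A sketch of what section comments look like in an ordinary R script. RStudio treats a comment line ending in four or more trailing dashes (or `=` / `#` signs) as a section header, so the three headers below would appear as entries in the code outline. The section names and the analysis steps are placeholders.

```r
# Load data ----------------------------------------------------------
data_raw <- mtcars

# Clean data ---------------------------------------------------------
# Keep only cars with more than 100 horsepower
data_clean <- subset(data_raw, hp > 100)

# Summarise ----------------------------------------------------------
mean_mpg <- mean(data_clean$mpg)
print(mean_mpg)
```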
73. Code Snippets
Code snippets are a shortcut to insert common boilerplate code. For instance, type fun and
then press Tab to insert the skeleton code for a function definition. Then press Tab again to
move between the placeholders and fill in the necessary components. In addition to a rich set
of defaults, custom code snippets can also be created.
74. File Navigation
Many people know of RStudio’s rich set of tab-completion options for functions and function
arguments. Tab completion can also help find files and remove the hassle of writing out long
paths. Press Tab between two double quotes ("") to browse files.
75. Jump To Function Definition
Want to dig into the innards of a function? With the cursor on a function press F2 to jump to
the function definition, even for functions in a package.
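Outside RStudio, or in any plain R console, base R offers a similar look at a function's innards: printing a function object (its name without parentheses) shows its R source, and `args()` and `body()` pull out the argument list and the code. Here `stats::sd` is used as an example of a package function.

```r
# Printing a function object shows its R source
print(stats::sd)

# Pieces of the definition can also be extracted separately
print(args(stats::sd))   # the argument list
print(body(stats::sd))   # the code that runs when sd() is called
```

Functions implemented in C, such as the primitive `sum`, print only a stub like `function (..., na.rm = FALSE)  .Primitive("sum")`, so their source cannot be browsed this way.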
77. Further Resources
R Language
Advanced R: http://adv-r.had.co.nz/
Bioconductor: https://www.bioconductor.org/
CRANberries: http://dirk.eddelbuettel.com/cranberries/
MRAN: https://mran.revolutionanalytics.com/
rOpenSci: https://ropensci.org/
R Project: https://www.r-project.org/
The R Journal: https://journal.r-project.org/
R Community
R Consortium: https://www.r-consortium.org/
R Weekly: https://rweekly.org/
Data Sciences
Apache Hadoop: http://hadoop.apache.org/
KDnuggets: http://www.kdnuggets.com/
R for Data Science: http://r4ds.had.co.nz/
sparklyr: http://spark.rstudio.com/
SparkR: https://spark.apache.org/docs/latest/sparkr.html
Tessera: http://tessera.io/
78. Further Resources
Blogs
RStudio Blog: https://blog.rstudio.org/
BLOGR: https://drsimonj.svbtle.com/
Mad (Data) Scientist: https://matloff.wordpress.com/
R Bloggers: https://www.r-bloggers.com/
R Consortium Blog: https://www.r-consortium.org/news/blog
Revolutions Blog: http://blog.revolutionanalytics.com
rOpenSci Blog: https://ropensci.org/blog/
Simply Statistics: http://simplystatistics.org/
Statistical Modeling, Causal Inference, and Social Science: http://andrewgelman.com/
StatsBlogs: http://www.statsblogs.com/
Win-Vector Blog: http://www.win-vector.com/blog/
Statistics
Journal of Statistical Software: https://www.jstatsoft.org/index
Forecasting: principles and practice: https://www.otexts.org/fpp
From Algorithms to Z-Scores: http://heather.cs.ucdavis.edu/~matloff/132/PLN/ProbStatBookW16ECS132.pdf
Statistical Foundations of Machine Learning: https://www.otexts.org/book/sfml
The Elements of Statistical Learning: http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
80. Udemy R Courses
• Another company is Udemy. While it does not offer video plus interactive sessions like DataCamp, it does offer extensive video lessons covering other topics in using R and learning statistics.
• The Comprehensive Programming in R Course (25 Hours of video)
• Graphs in R (ggplot2, plotrix, base R) – Data Visualization with R Programming Language (5 Hours of video)
• Linear Mixed-Effects Models with R (11 Hours of video)
• Multivariate Data Visualization with R (7 Hours of video)
• Applied Multivariate Analysis with R (13 Hours of video)
• More Data Mining with R (11 Hours of video)
• Text Mining, Scraping and Sentiment Analysis with R (4 Hours of video)
• R Programming for Simulation and Monte Carlo Methods (12 Hours of video)
• Programming Statistical Applications in R (12 Hours of video)
• Comprehensive Linear Modeling with R (15 Hours of video)
• Bayesian Computational Analyses with R (12 Hours of video)
• Time Series Analysis and Forecasting in R (3 Hours of video)