INTERNSHIP REPORT
At Decision Stats Consultancy
18th June 2014 - Present
Chandan Kumar Routray
Second Year Undergraduate Student
IIT Kharagpur, West Bengal
Table of Contents
I. Summary
II. An Overview: Things that I learned
III. Blog Posts: During the Internship
IV. Appendix (Day wise Work)
Day 1
Day 2
Day 3
Day 4-5
Day 6-7
Day 8-9
Day 9-11
Day 12-13
Day 14-18
Day 19-20
Day 20-26
Day 40+
Summary
I have completed almost a month of this internship with Decision Stats
Consultancy. It has been a roller coaster ride from the very beginning,
and the past four weeks have been the most productive period of my life
in terms of learning. The internship has helped me become a more
disciplined and professional person and acquire a wide set of skills such
as analytics, web development and technical blog writing. Mr. Ajay Ohri,
my guide for this internship, holds a daily update call over Skype at
around 9 pm, in which he reviews my daily assignments, gives me useful
tips on how to manage my work and make it more presentable, and then sets
the assignment for the next day. After each call I have a new target to
achieve within a given period of time, and I plan my next day around it.
A typical assignment consists of learning something new, such as coding
or understanding a package in R or Python, performing an exercise on it,
writing an informative blog post on what I learned that day, and writing a
blog post on the day's experience. In the course of the internship I
learned a lot of things: coding in Python, R and JavaScript; using
packages like Shiny, rpy2 and ggvis in R and Python; how to write a
database query using MySQL; how to protect a website from SQL injection;
making a website using Bootstrap; automated extraction of data from the
web and from networks; working in the cloud; and many other things.
Earlier I used to run away from writing, but now I have become a blogger
and I am enjoying it too. I have also learned how to write an informative
blog post: readers have started asking questions on my posts, and a few of
them have been re-blogged by other bloggers. So far I have written 14 tech
blog posts about the various things I have learned, and I have tried to
make all of them reader friendly.
I have been very excited about this internship from the very beginning,
and Mr. Ajay Ohri has now offered to let me continue it for some more
time, for which I am very grateful to him.
An Overview: Things that I learned
Programming
• Python (www.codecademy.com)
• JavaScript
• R (Swirl package and www.datacamp.com)
Web Development
• Bootstrap (www.jetstrap.com)
• SQL and SQL Injection
• JavaScript (D3.js)
• Hosting via Dropbox
Writing
• Technical Blog Writing (www.python4analytics.wordpress.com)
• Report Making
Software
• Virtualization Software: Oracle VirtualBox, VMware Player
• Database Management: MySQL Workbench
Analytics
• Data Extraction: Wireshark, iMacros
• Analysing Data: RStudio, Python (Pandas, rpy2)
• Result Presentation: RStudio (Shiny, ggvis, slidify), d3.js
Working on Cloud
• AWS EC2: Starting an instance, accessing the instance, installing RStudio Server & IPython on it etc.
Big Data
• Apache Hadoop: On Hortonworks Sandbox
• Hue: Hive, Pig, HCatalog
Other
• Git
• Infographics: Infogr.am
Blog Posts: During the Internship
Topic: Link
Python: https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
Installing IPython: https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
Introduction to R: https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
Pandas Library: https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
D3.js (JavaScript): https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
Datacamp vs. Swirl: https://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
Shiny: https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
Git: https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
SAS: https://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
iMacros: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
SQL: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
EC2: https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
Infogr.am: https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
Wireshark: https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
Bootstrap: https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/
Apache Hadoop: https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop/
Appendix (Day wise Work)
Day 1
Task Given
1) Create a blog on http://blogger.com and http://wordpress.com
2) Start an account on Codecademy and send a screenshot of the initial page. You will
be learning Python
3) Download and install R from www.r-project.org
4) Write a blog post on your experience on Day 1 of the internship
Work Update
1) Codecademy account started. Completed 23% of the beginner's course on the very
first day. My progress can be checked at www.codecademy.com/imeckr
2) R downloaded and installed on my system.
3) Created a blog on Blogger.com and posted my first blog post on it:
http://analyticsinternship.blogspot.com/2014/06/day-1.html
Reference used
None
Remarks
1) Edit the Day 1 blog properly and write more information-oriented blog posts
2) The URL of a web page should be the answer to a question (SEO)
3) How to select a good theme for your blog
Day 2
Task Given
1) Create a new blog on Wordpress.com with a catchier name, the same content and
better editing; its title URL should be the answer to a question on Google Search.
Please send me screenshots. What should be the reason for choosing an appropriate
theme?
2) Tags should be used, and then you should share it on your Facebook, LinkedIn,
Twitter and Google Plus profiles. Please send me screenshots of this.
3) Please earn at least 5 badges in Python on Codecademy for tomorrow's
submission
4) Read this page please: http://pandas.pydata.org/. Download and install Pandas
5) Download and install IPython: http://ipython.org/
6) Blog on Day 2 (besides your existing edited and refined Day 1 blog)
Work Update
1) Created Wordpress blog https://python4analytics.wordpress.com/
Link to Day 1 blog: https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
Link to Day 2 blog: https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
2) Earned 6 badges on Day 2 on Codecademy:
http://www.codecademy.com/imeckr
3) IPython and Pandas downloaded and installed on my system
4) Blog shared on my Facebook, Google+ and LinkedIn accounts.
References used
www.pandas.pydata.org, www.ipython.org. Also the respective documentation.
Remarks
1) Maintain two different blogs: one on Blogger.com for work experience and one on
Wordpress.com for tech blogging on things I learn daily
Screenshots
Day 3
Task Given
1) Create accounts on Topcoder, Kaggle and GitHub. Write a one-paragraph summary of
what these websites are and what advantages you can get from an account on them
2) Install the swirl package in R (use Google to find out how). Do one exercise. Show
a screenshot
3) Go to Datacamp.com and create an account. Do one exercise and show a
screenshot.
4) Get 4 badges in Python and 2 badges in JavaScript on Codecademy
5) Blog on this. Show screenshots of the analytics of each blog and answer this question:
which metrics should I track for my blog if I want to make it better?
Work Update
1) Codecademy status: Python 50% completed, JavaScript 21%, with 23 badges, 187
points and a 4-day streak.
2) Blogged about R on Wordpress:
https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
3) Swirl package installed on my system. Did a few exercises.
4) Created an account on Datacamp. Did a few exercises there too.
5) Creating an account on sites like Topcoder, Kaggle and GitHub helps a user in many
ways. As these sites already have a lot of registered users from across the world, they
act as an online community of coders, designers, analysts, innovators etc. where users
can discuss their problems and ideas among themselves. They also help users see where
exactly they stand now and how they can develop their talent in their respective
fields. Users can also take up various courses and projects, and even compete with
other users.
6) Keeping track of the following will make one's blog better:
(i) Referrers: Where are my visitors being redirected from, and where should I share
my blog more often?
(ii) Region of visitors: Which region do most of my visitors belong to?
(iii) Tags and categories: Shows which topics are trending on search engines.
References used
www.swirlstats.com . Also its documentation.
Screenshots
Day 4-5
Task Given
1) CODING - Get to 60% in Python and 40% in JavaScript on Codecademy
2) STATISTICAL PYTHON - Go to http://pandas.pydata.org/pandas-docs/stable/10min.html#min
and blog on the experience
3) PRESENTATION OF RESULTS - Go to http://d3js.org/. Read it and blog on it. (Part 2 is
the shiny package in R from http://shiny.rstudio.com/tutorial/, Part 3 will be the
http://slidify.org/ package in R)
4) CODING - Do one module in Swirl. Write a tech blog on what you have learnt
5) CODING - Go to Datacamp.com. Do one exercise and show screenshots.
Work Update
1) Completed 60% in Python and 40% in JavaScript
2) Blog on Statistical Python:
https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
Blog on D3.js:
https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
3) One module completed in Swirl
4) Completed one exercise on Datacamp.com
5) Blog on experience:
Day 3: http://analyticsinternship.blogspot.in/2014/06/day-3.html
Day 4-5: http://analyticsinternship.blogspot.in/2014/06/day-4-5.html
References used
www.d3js.org. Also its documentation.
Screenshots
Day 6-7
Task Given
1) Do one more module in Swirl
2) Do one exercise in Data Camp
3) Write a technical blog post on how the two are different, including the pluses and
minuses of both (Swirl vs. Datacamp)
4) Read about using JS within R here:
http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
5) Complete 4 badges each in Python and JS
Work Update
1) Codecademy status: Python 70% and JavaScript 50%
2) One module completed in Swirl.
3) One exercise completed on Datacamp
4) Read the article at the link that you had given
5) Blogged on Swirl vs. Datacamp:
http://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
References used
http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
Screenshots
Day 8-9
Task Given
1) Make a demo app on Shiny. How is the population of India and China changing over
time? How is the per capita GDP changing over time? Google for datasets. Send me an
initial draft.
2) Install and load SAS University Edition:
http://www.sas.com/en_us/software/university-edition.html
3) Complete the exercises at https://try.github.io/
4) 3 tech blog posts on Shiny, Git and SAS
Work Update
1) Made a demo app on Shiny which can show one plot at a time. Made a data frame,
which I have used in the app.
2) Completed the Git exercise.
3) Blog on Git: https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
Blog on Shiny: https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
References used
www.ggvis.rstudio.com, shiny.rstudio.com. Also their documentation.
Screenshots
Shiny App
Day 9-11
Task Given
1) Use http://shiny.rstudio.com/gallery/ for troubleshooting your Shiny app
2) Use the ggvis package somehow in your app (http://ggvis.rstudio.com/) and
also use d3.js (hint: read this http://www.xavierdupre.fr/blog/2013-11-30_nojs.html)
3) Create one small demo showing data flow and calls between Python and R
using rpy2. For example, load some JSON data using Python and then call an R
package.
4) Complete all pending blog posts
5) Make a small demo website using https://jetstrap.com/
6) Create an infographic for the same dataset that you are using in the Shiny app
using http://infogr.am/
7) Try to download and install this. It will help check for VMware and also start
off big data efforts: http://hortonworks.com/products/hortonworks-sandbox/
Work Update
1) Built the Shiny app. Used ggvis but not D3.js so far
2) Used rpy2 in Python to import a built-in dataset from R and plot a graph of it.
3) Tech blog post on SAS:
http://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
4) Made an infographic
5) Demo website: I tried to re-create my blog: http://jetstrap.io/share/cfcd9bc36a
References used
www.shiny.rstudio.com, www.xavierdupre.fr/blog/2013-11-30_nojs.html
Screenshots
Shiny App
Day 12-13
Task Given
1) Create Demo Website - Read this and try to create a website for Decision Stats
Consulting. Take content from the image in the post and the
http://decisionstats.com/about-decisionstats/ page.
2) For tomorrow, read about Bootstrap (http://getbootstrap.com/) and blog on it
3) Install MySQL on your system (full installation). Learn SQL. Create a table of all
players of all teams remaining in the round of 16. It should have player name, player
surname, football club, position he plays, and one more additional column at your
discretion. Then answer the following questions programmatically using SQL queries:
Which World Cup team is now the tallest? Which is the oldest? Which is the shortest?
Which is the youngest? Which striker is the fattest/youngest?
4) Python - Make it to 90% by Wednesday
5) R - Finish swirl (all modules) by Wednesday
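A minimal sketch of how the table-and-query part of task 3 above could be approached. It uses Python's built-in sqlite3 module instead of MySQL so that it runs without a database server. The table layout, the height_cm and age columns, the helper names and every sample row are hypothetical placeholders, not real squad data.

```python
import sqlite3

# Hypothetical placeholder rows (not real players):
# (first_name, surname, team, club, position, height_cm, age)
PLAYERS = [
    ("Player", "A", "Team X", "Club 1", "striker", 180, 24),
    ("Player", "B", "Team X", "Club 2", "defender", 190, 30),
    ("Player", "C", "Team Y", "Club 3", "striker", 175, 21),
    ("Player", "D", "Team Y", "Club 4", "goalkeeper", 185, 35),
]


def build_db():
    """Create an in-memory database holding the sample players table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE players (
               first_name TEXT, surname TEXT, team TEXT, club TEXT,
               position TEXT, height_cm INTEGER, age INTEGER)"""
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?, ?, ?, ?)", PLAYERS)
    return conn


def team_ranked_by(conn, column, descending=True):
    """Return (team, average) for the team with the highest or lowest average."""
    # Column names cannot be bound as parameters, so only known names are allowed.
    if column not in ("height_cm", "age"):
        raise ValueError("unsupported column: " + column)
    order = "DESC" if descending else "ASC"
    query = (
        f"SELECT team, AVG({column}) AS avg_value FROM players "
        f"GROUP BY team ORDER BY avg_value {order} LIMIT 1"
    )
    return conn.execute(query).fetchone()


def youngest_striker(conn):
    """Return (first_name, surname, age) of the youngest striker."""
    query = (
        "SELECT first_name, surname, age FROM players "
        "WHERE position = 'striker' ORDER BY age ASC LIMIT 1"
    )
    return conn.execute(query).fetchone()


if __name__ == "__main__":
    db = build_db()
    print("Tallest team:", team_ranked_by(db, "height_cm"))
    print("Oldest team:", team_ranked_by(db, "age"))
    print("Youngest team:", team_ranked_by(db, "age", descending=False))
    print("Youngest striker:", youngest_striker(db))
```

The same SELECT ... GROUP BY ... ORDER BY pattern should carry over to MySQL with little change; "tallest team" is interpreted here as the team with the highest average height, which is only one possible reading of the question.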
Work Update
1) Completed 90% of the Python course on Codecademy.
2) Learned about Bootstrap and revised HTML & CSS
3) Made an "About" page for Decision Stats by editing existing templates and adding
some new elements (hosted via dropbox.com:
http://imeckrdemo.kissr.com/)
4) Blogged on Bootstrap
(https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/)
5) Completed all modules of R programming in Swirl
6) Installed MySQL on my system. Read about MySQL and am currently learning it; I will
do the assignment after clearing my doubts with you.
References used
www.getbootstrap.com and its documentation
Screenshots
Made this demo website
Day 14-18
Task Given
1) Read about SQL and SQL injection at
http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ and try the demos at
http://sqlzoo.net/hack/
2) What was the problem with the SAS installation? Blog on this AFTER you have
successfully installed it and shown screenshots
3) Compile everything you have learnt in a 1-page essay, with an appendix of the
day-wise submissions that you did.
4) Edit all the blogs
Work Update
1) Learned web scraping using iMacros; still facing some problems in extracting data
from some sites. Also wrote a blog post on it:
https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
2) Created a basic table in a database using MySQL Workbench and practiced some basic
queries on it.
3) Learned about SQL injection from the resources you provided. Also blogged on it:
http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
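To illustrate the SQL injection topic in item 3 above, here is a small sketch that contrasts a query built by string formatting with a parameterized one. It uses Python's built-in sqlite3 module rather than MySQL, and the users table, the credentials and the function names are all hypothetical, chosen only for this illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_unsafe(conn, name, password):
    """Vulnerable: user input is pasted straight into the SQL text."""
    query = (
        "SELECT COUNT(*) FROM users "
        "WHERE name = '%s' AND password = '%s'" % (name, password)
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, name, password):
    """Safer: values are bound as parameters and never parsed as SQL."""
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    db = make_db()
    attack = "' OR '1'='1"  # classic input that closes the quote and adds OR
    print("unsafe login with attack string:", login_unsafe(db, "alice", attack))
    print("safe login with attack string:", login_safe(db, "alice", attack))
```

In the unsafe version the attack string turns the condition into `password = '' OR '1'='1'`, which matches every row, so the login succeeds without the real password. With bound parameters the whole string is compared as a literal password and the login fails.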
References used
http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ http://sqlzoo.net/hack/
Screenshots
Day 19-20
Task Given
1) Read these papers: http://www.slideshare.net/ajayohri/using-r-for-cyber-security-part-1
and http://www.sis.pitt.edu/jjoshi/courses/IS2621/Spring2014/Lab3.pdf
2) Use Wireshark and/or SiLK to capture some dummy data from a network (Wi-Fi or
wherever)
3) Use paper 1 to import the data in R and visualize it
4) Additionally, download and install Wireshark and use the instructions from
http://www.ict.kth.se/courses/II2202/II2202-quantitative-chip-R-20110918.pdf to help you
with the analysis
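The task above asks for the analysis in R; as a rough illustration of the same idea, the sketch below summarizes a packet capture in Python using only the standard library. It assumes the capture was exported from Wireshark as CSV with the default "Protocol" and "Length" columns. The sample rows are made-up placeholders (using documentation-reserved IP addresses), and the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample in the shape of a Wireshark CSV export (placeholder values).
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.1","192.0.2.2","TCP","66","placeholder"
"2","0.000120","192.0.2.2","192.0.2.1","TCP","54","placeholder"
"3","0.004500","192.0.2.1","198.51.100.7","DNS","74","placeholder"
'''


def summarize(csv_file):
    """Count packets and total bytes per protocol from an open CSV file."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(csv_file):
        packets[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return packets, total_bytes


def text_bar_chart(counts):
    """Render counts as a simple text bar chart, largest first."""
    lines = []
    for protocol, count in counts.most_common():
        lines.append(f"{protocol:<6} {'#' * count} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    packet_counts, byte_counts = summarize(io.StringIO(SAMPLE_CSV))
    print(text_bar_chart(packet_counts))
    print(dict(byte_counts))
```

For a real capture, `io.StringIO(SAMPLE_CSV)` would be replaced by a file opened with `open(path, newline="")`, where `path` points to the exported CSV.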
Work Update
1) Learned web scraping using iMacros; still facing some problems in extracting data
from some sites. Also wrote a blog post on it:
https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
2) Created a basic table in a database using MySQL Workbench and practiced some basic
queries on it.
3) Learned about SQL injection from the resources you provided. Also blogged on it:
http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
References used
http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ http://sqlzoo.net/hack/
Screenshots
Day 20-26
Task Given
1) Complete Python on Codecademy.
2) Set up RStudio Server on AWS and blog on it
3) Giving user rights to you, choosing the appropriate user rights.
4) Set up IPython on AWS
Work Update
1) Completed Python on Codecademy
2) Created an AWS account and set up RStudio Server on it.
3) Blogged on it:
https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
References used
http://www.s-anand.net/blog/ssh-tunneling-through-web-filters/
http://www.r-bloggers.com/instructions-for-installing-using-r-on-amazon-ec2/
Screenshots
Day 40+
Task Given
1) Read these:
http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
http://www.slideshare.net/ajayohri/big-data-big-analytics
2) Explore Hadoop
3) Complete the tutorials on Hortonworks Sandbox
Work Update
1) Read the two papers you provided:
http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
http://www.slideshare.net/ajayohri/big-data-big-analytics
2) Explored Hadoop: what HDFS, MapReduce, Pig, Hive etc. are
3) Resolved the problem that I was having with Pig and the Sandbox
4) Completed the first two tutorials on Hortonworks Sandbox and learned the following:
basic commands in Pig (Grunt shell); downloaded sample data and performed basic
Hive queries on it
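To make the MapReduce idea from item 2 above concrete, here is a minimal single-machine sketch of the classic word-count example in plain Python. It only imitates the map, shuffle and reduce phases in memory and does not use Hadoop itself; the function names and the sample input are chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["pig hive pig", "Hive hdfs"]))
```

On a real cluster the map and reduce functions run in parallel on different nodes over data stored in HDFS, and the framework performs the shuffle step between them.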
Blog Written
https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop/
Screenshots
Thank You

Decisionstats.com Data Science Virtual Internship

  • 1. INTERNSHIP REPORT At Decision Stats Consultancy 18th June 2014 - Present Chandan Kumar Routray Second Year Undergraduate Student IIT Kharagpur, West Bengal
  • 2. Table of Contents
    I. Summary
    II. An Overview: Things That I Learned
    III. Blog Posts: During the Internship
    IV. Appendix (Day wise Work): Day 1, Day 2, Day 3, Day 4-5, Day 6-7, Day 8-9, Day 9-11, Day 12-13, Day 14-18, Day 19-20, Day 20-26, Day 40+
  • 3. Summary
    I have completed almost a month of this internship with Decision Stats Consultancy. It has been a roller coaster ride from the very beginning, and the past four weeks were the most productive period of my life in terms of learning. The internship has helped me become a more disciplined and professional person and acquire a wide set of skills such as analytics, web development and technical blog writing.
    My guide for this internship, Mr. Ajay Ohri, holds a daily update call over Skype at around 9 pm. In it he reviews my daily assignments, gives me useful tips on how to manage my work and make it more presentable, and then sets the assignment for the next day. After every call I have a new target to achieve in a given period of time, and I plan my next day around it. A typical assignment consists of learning something new, such as coding or understanding a package in R or Python, performing an exercise on it, writing an informative blog post on what I learned that day, and writing a life-experience blog post.
    In the course of the internship I learned many things: coding in Python, R and JavaScript; using packages such as Shiny, rpy2 and ggvis in R and Python; writing database queries using MySQL; protecting a website from SQL injection; making my own website using Bootstrap; automated extraction of data from the web and from a network; working in the cloud; and many other things.
    Earlier I used to run away from writing, but now I have become a blogger and I enjoy it. I have also learned how to write an informative blog post: readers have started asking questions about my posts, and a few of them have been re-blogged by other bloggers. So far I have written 14 tech blog posts about the things I have learned, and I made all of them reader friendly.
    I have been very excited about this internship from the very beginning, and Mr. Ajay Ohri has now offered to let me continue it for some more time, for which I am very grateful to him.
  • 4. An Overview: Things That I Learned
    Programming
      - Python (www.codecademy.com)
      - R (Swirl package and www.datacamp.com)
      - SQL and SQL injection
    Web Development
      - JavaScript
      - Bootstrap (www.jetstrap.com)
      - JavaScript (D3.js)
      - Hosting via Dropbox
    Writing
      - Technical blog writing: www.python4analytics.wordpress.com
      - Report making
    Software
      - Virtualization software: Oracle VirtualBox, VMware Player
      - Database management: MySQL Workbench
    Analytics
      - Data extraction: Wireshark, iMacros
      - Analysing data: RStudio, Python (Pandas, rpy2)
      - Result presentation: RStudio (Shiny, ggvis, slidify), d3.js
    Working on Cloud
      - AWS EC2: starting an instance, accessing the instance, installing RStudio Server and IPython on it, etc.
    Big Data
      - Apache Hadoop: on Hortonworks Sandbox
      - Hue: Hive, Pig, HCatalog
    Other
      - Git
      - Infographics: Infogr.am
  • 5. Blog Posts: During the Internship
    Topic: Link
    Python: https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
    Installing IPython: https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
    Introduction to R: https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
    Pandas Library: https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
    D3.js (JavaScript): https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
    Datacamp vs. Swirl: https://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
    Shiny: https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
    Git: https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
    SAS: https://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
    iMacros: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
    SQL: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
    EC2: https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
    Infogr.am: https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
    Wireshark: https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
    Bootstrap: https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/
    Apache Hadoop: https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop/
  • 6. Appendix (Day wise Work)
    Day 1
    Task Given
    1) Create a blog on http://blogger.com and http://wordpress.com
    2) Start an account on Codecademy and send a screenshot of the initial page. You will be learning Python.
    3) Download and install R from www.r-project.org
    4) Write a blog post on your experience on Day 1 of the internship
    Work Update
    1) Codecademy account started. Completed 23% of the beginner's course on the very first day. My progress can be checked at www.codecademy.com/imeckr
    2) R downloaded and installed on my system.
    3) Created a blog on Blogger.com and posted my first blog post on it: http://analyticsinternship.blogspot.com/2014/06/day-1.html
    References used
    None
    Remarks
    1) Edit the Day 1 blog post properly and write more information-oriented posts
    2) The URL of a web page should be the answer to a question (SEO)
    3) How to select a good theme for your blog
  • 7. Day 2
    Task Given
    1) Create a new blog on Wordpress.com with a catchier name, the same content and better editing. Its title URL should be the answer to a question on Google Search. Please send me screenshots. What should be the reason for choosing an appropriate theme?
    2) Tags should be used, and then you should share the post on your Facebook, LinkedIn, Twitter and Google Plus profiles. Please send me screenshots of this.
    3) Please earn at least 5 badges in Python on Codecademy for tomorrow's submission
    4) Read this page: http://pandas.pydata.org/. Download and install Pandas.
    5) Download and install IPython: http://ipython.org/
    6) Blog on Day 2 (besides your existing edited and refined Day 1 blog)
    Work Update
    1) Created a Wordpress blog: https://python4analytics.wordpress.com/
       Link to Day 1 blog: https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
       Link to Day 2 blog: https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
    2) Earned 6 badges on Day 2 on Codecademy: http://www.codecademy.com/imeckr
    3) IPython and Pandas downloaded and installed on my system
    4) Blog shared on my Facebook, Google+ and LinkedIn accounts
    References used
    www.pandas.pydata.org and www.ipython.org, along with their documentation.
    Remarks
    1) Maintain two different blogs: one on Blogger.com for work experience and one on Wordpress.com for tech blogging on the things I learn daily
    [Screenshots]
  • 8. Day 3
    Task Given
    1) Create accounts on Topcoder, Kaggle and GitHub. Write a one-paragraph summary of what these websites are and what advantages an account on them gives you.
    2) Install the swirl package in R (use Google to find out how). Do one exercise. Show a screenshot.
    3) Go to Datacamp.com and create an account. Do one exercise and show a screenshot.
    4) Get 4 badges in Python and 2 badges in JavaScript on Codecademy
    5) Blog on this. Show screenshots of the analytics of each blog and answer this question: which metrics should I track for my blog if I want to make it better?
    Work Update
    1) Codecademy status: Python 50% completed, JavaScript 21%, with 23 badges, 187 points and a 4-day streak.
    2) Blogged about R on Wordpress: https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
    3) Swirl package installed on my system. Did a few exercises.
    4) Created an account on Datacamp. Did a few exercises there too.
    5) Creating an account on sites like Topcoder, Kaggle and GitHub helps a user in many ways. These sites already have many registered users from across the world, so they act as online communities of coders, designers, analysts and innovators where users can discuss their problems and ideas among themselves. They also help users see where exactly they stand now and how they can develop their talent in their respective fields. A user can also take up various courses and projects, and even compete with other users.
    6) Keeping track of the following will make one's blog better:
       (i) Referrers: where are my visitors being redirected from, and where should I share my blog more often?
       (ii) Region of visitors: which region do most of my visitors belong to?
       (iii) Tags and categories: these show which topics are trending on search engines.
    References used
    www.swirlstats.com and its documentation.
    [Screenshots]
  • 9. Day 4-5
    Task Given
    1) CODING: Get to 60% in Python and 40% in JavaScript on Codecademy
    2) STATISTICAL PYTHON: Go to http://pandas.pydata.org/pandas-docs/stable/10min.html#min and blog on the experience
    3) PRESENTATION OF RESULTS: Go to http://d3js.org/. Read it and blog on it. (Part 2 is the shiny package in R from http://shiny.rstudio.com/tutorial/, Part 3 will be the http://slidify.org/ package in R)
    4) CODING: Do one module in Swirl. Write a tech blog on what you have learnt.
    5) CODING: Go to Datacamp.com. Do one exercise and show screenshots.
    Work Update
    1) Completed 60% in Python and 40% in JavaScript
    2) Blog on Statistical Python: https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
       Blog on D3.js: https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
    3) One module completed in Swirl
    4) Completed one exercise on Datacamp.com
    5) Blog on experience:
       Day 3: http://analyticsinternship.blogspot.in/2014/06/day-3.html
       Day 4-5: http://analyticsinternship.blogspot.in/2014/06/day-4-5.html
    References used
    www.d3js.org and its documentation.
    [Screenshots]
  • 10. Day 6-7
    Task Given
    1) Do one more module in Swirl
    2) Do one exercise on Datacamp
    3) Write a technical blog post on how the two are different, including the pluses and minuses of both (Swirl vs. Datacamp)
    4) Read about using JS within R here: http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
    5) Complete 4 badges each in Python and JS
    Work Update
    1) Codecademy status: Python 70% and JavaScript 50%
    2) One module completed in the Swirl package
    3) One exercise completed on Datacamp
    4) Read the article at the link you had given
    5) Blogged on Swirl vs. Datacamp: http://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
    References used
    http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
    [Screenshots]
  • 11. Day 8-9
    Task Given
    1) Make a demo app in Shiny. How is the population of India and China changing over time? How is the per capita GDP changing over time? Google for datasets. Send me an initial draft.
    2) Install and load SAS University Edition: http://www.sas.com/en_us/software/university-edition.html
    3) Complete the exercises at https://try.github.io/
    4) 3 tech blog posts on Shiny, Git and SAS
    Work Update
    1) Made a demo app in Shiny which can show one plot at a time. Made a data frame, which I have used in the app.
    2) Completed the Git exercise.
    3) Blog on Git: https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
       Blog on Shiny: https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
    References used
    www.ggvis.rstudio.com and shiny.rstudio.com, along with their documentation.
    [Screenshots: Shiny App]
  • 12. Day 9-11
    Task Given
    1) Use http://shiny.rstudio.com/gallery/ for troubleshooting your Shiny app
    2) Use the ggvis package somehow in your app (http://ggvis.rstudio.com/) and also use d3.js (hint: read this http://www.xavierdupre.fr/blog/2013-11-30_nojs.html)
    3) Create one small demo showing data flow and calls between Python and R using rpy2. For example, load some JSON data using Python and then call an R package.
    4) Complete all pending blog posts
    5) Make a small demo website using https://jetstrap.com/
    6) Create an infographic for the same dataset that you are using in your Shiny app, using http://infogr.am/
    7) Try to download and install this. It will help check for VMware and also start off the big data efforts: http://hortonworks.com/products/hortonworks-sandbox/
    Work Update
    1) Built the Shiny app. Used ggvis, but not D3.js so far.
    2) Used rpy2 in Python to import a built-in dataset from R and plot a graph of it.
    3) Tech blog post on SAS: http://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
    4) Made an infographic
    5) Demo website: I tried to re-create my blog: http://jetstrap.io/share/cfcd9bc36a
    References used
    www.shiny.rstudio.com, www.xavierdupre.fr/blog/2013-11-30_nojs.html
    [Screenshots: Shiny App]
  • 13. Day 12-13
    Task Given
    1) Create Demo Website: Read this and try to create a website for Decision Stats Consulting. Take content from the image in the post and from the http://decisionstats.com/about-decisionstats/ page.
    2) For tomorrow, read about Bootstrap (http://getbootstrap.com/) and blog on it
    3) Install MySQL on your system (full installation). Learn SQL. Create a table of all players of the teams remaining in the round of 16. It should have player name, player surname, football club, the position he plays, and one more additional column at your discretion. Then answer the following questions programmatically using SQL queries: Which World Cup team is now the tallest? Which is the oldest? Which is the shortest? Which is the youngest? Which striker is the fattest/youngest?
    4) Python: Make it to 90% by Wednesday
    5) R: Finish swirl (all modules) by Wednesday
    Work Update
    1) Completed 90% of the Python course on Codecademy
    2) Learned about Bootstrap and revised HTML and CSS
    3) Made an "About" page for Decision Stats by editing existing templates and adding some new elements (hosted using dropbox.com: http://imeckrdemo.kissr.com/)
    4) Blogged on Bootstrap: https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/
    5) Completed all modules of R programming in Swirl
    6) Installed MySQL on my system. I have read about MySQL and am currently learning it; I will do the assignment after clearing my doubts with you.
    References used
    www.getbootstrap.com and its documentation
    [Screenshots]
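    The SQL task above (tallest, oldest and youngest team, youngest striker) can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table layout, the height_cm and age columns, and every sample row are made-up assumptions, not data from the internship.

```python
import sqlite3

# Hypothetical rows: (name, surname, team, club, position, height_cm, age).
PLAYERS = [
    ("Alex", "Stone", "Team A", "Club X", "Striker", 190, 24),
    ("Ben", "Hill", "Team A", "Club Y", "Defender", 180, 30),
    ("Carl", "Wood", "Team B", "Club Z", "Striker", 175, 21),
    ("Dan", "Marsh", "Team B", "Club X", "Goalkeeper", 181, 35),
]


def build_db():
    """Create an in-memory database holding the sample players table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE players ("
        "name TEXT, surname TEXT, team TEXT, club TEXT, "
        "position TEXT, height_cm INTEGER, age INTEGER)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?, ?, ?, ?)", PLAYERS)
    return conn


def extreme_team(conn, column, order):
    """Return (team, average) for the team with the highest or lowest average."""
    # Only fixed, known identifiers may be interpolated into the SQL text.
    if column not in {"height_cm", "age"} or order not in {"ASC", "DESC"}:
        raise ValueError("unsupported column or sort order")
    return conn.execute(
        f"SELECT team, AVG({column}) AS avg_value FROM players "
        f"GROUP BY team ORDER BY avg_value {order} LIMIT 1"
    ).fetchone()


def youngest_striker(conn):
    """Return (name, surname) of the youngest player whose position is Striker."""
    return conn.execute(
        "SELECT name, surname FROM players "
        "WHERE position = 'Striker' ORDER BY age ASC LIMIT 1"
    ).fetchone()
```

    With a real table, extreme_team(conn, "height_cm", "DESC") would answer "which team is the tallest" by average height, and switching to "ASC" would give the shortest; whether an average is the right measure is a design choice the task leaves open.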
  • 15. Day 14-18
    Task Given
    1) Read about SQL and SQL injection at http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ and try to do the demos at http://sqlzoo.net/hack/
    2) What was the problem with the SAS installation? Blog on this AFTER you have successfully installed it and shown screenshots.
    3) Compile everything you have learnt in a 1-page essay, with an appendix of the day-wise submissions that you did
    4) Edit all the blogs
    Work Update
    1) Learned web scraping using iMacros; still facing some problems in extracting data from some sites. Also wrote a blog post on it: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
    2) Created a basic table in a database using MySQL Workbench and practiced some basic queries on it
    3) Learned about SQL injection from the resources provided by you. Also blogged on it: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
    References used
    http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/
    http://sqlzoo.net/hack/
    [Screenshots]
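    To illustrate the SQL injection idea studied above, here is a minimal sketch in Python using the built-in sqlite3 module (an assumption; the internship used MySQL and the sqlzoo demos). The users table, the account and both login functions are hypothetical. It contrasts a query built by pasting input into the SQL text with one that uses placeholders.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one hypothetical user account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_vulnerable(conn, name, password):
    # UNSAFE: user input becomes part of the SQL text itself,
    # so a quote character in the input can rewrite the query.
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, name, password):
    # Placeholders pass the input as data, so quotes cannot change the query.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0
```

    Passing the password ' OR '1'='1 to login_vulnerable turns the WHERE clause into a condition that is true for every row, so the check passes without the real password, while login_safe treats the same string as an ordinary (wrong) password.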
  • 16. Day 19-20
    Task Given
    1) Read these papers: http://www.slideshare.net/ajayohri/using-r-for-cyber-security-part-1 and http://www.sis.pitt.edu/jjoshi/courses/IS2621/Spring2014/Lab3.pdf
    2) Use Wireshark and/or SiLK to capture some dummy data from a network (Wi-Fi or wherever)
    3) Use paper 1 to import the data into R and visualize it
    4) Additionally, download and install Wireshark and use the instructions from http://www.ict.kth.se/courses/II2202/II2202-quantitative-chip-R-20110918.pdf to help you with the analysis
    Work Update
    1) Learned web scraping using iMacros; still facing some problems in extracting data from some sites. Also wrote a blog post on it: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
    2) Created a basic table in a database using MySQL Workbench and practiced some basic queries on it
    3) Learned about SQL injection from the resources provided by you. Also blogged on it: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
    References used
    http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/
    http://sqlzoo.net/hack/
    [Screenshots]
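    The report does not show the capture analysis itself, and the task asked for it in R. As a rough Python sketch of the same idea, the snippet below summarizes a packet list exported from Wireshark as CSV. The sample rows are invented, and the "Protocol" and "Length" column names are an assumption about how the export is laid out; adjust them to match the actual file.

```python
import csv
import io
from collections import Counter

# Hypothetical capture in the shape of a CSV packet export (assumed columns).
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000","10.0.0.2","10.0.0.1","DNS","74","Standard query"
"2","0.020","10.0.0.1","10.0.0.2","DNS","90","Standard query response"
"3","0.050","10.0.0.2","192.0.2.10","TCP","66","SYN"
'''


def summarize(csv_text):
    """Count packets and total bytes per protocol in an exported capture."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        packets[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return packets, total_bytes


if __name__ == "__main__":
    packets, total_bytes = summarize(SAMPLE_CSV)
    for protocol, count in packets.most_common():
        print(protocol, count, total_bytes[protocol])
```

    The per-protocol counts are the kind of table that could then be plotted as a bar chart, whether in R as the task suggests or in any other tool.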
  • 17. Day 20-26
    Task Given
    1) Complete Python on Codecademy
    2) Set up RStudio Server on AWS and blog on it
    3) Giving user rights to you, choosing the appropriate user rights
    4) Set up IPython on AWS
    Work Update
    1) Completed Python on Codecademy
    2) Created an AWS account and set up RStudio Server on it
    3) Blogged on it: https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
    References used
    http://www.s-anand.net/blog/ssh-tunneling-through-web-filters/
    http://www.r-bloggers.com/instructions-for-installing-using-r-on-amazon-ec2/
    [Screenshots]
  • 18. Day 40+
    Task Given
    1) Read these:
       http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
       http://www.slideshare.net/ajayohri/big-data-big-analytics
    2) Explore Hadoop
    3) Complete the tutorials on Hortonworks Sandbox
    Work Update
    1) Read the two papers provided by you:
       http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
       http://www.slideshare.net/ajayohri/big-data-big-analytics
    2) Explored Hadoop: what HDFS, MapReduce, Pig, Hive, etc. are
    3) Resolved the problem that I was having with Pig and the Sandbox
    4) Completed the first two tutorials on Hortonworks Sandbox and learned the following: basic commands in Pig (Grunt shell); downloaded sample data and performed basic Hive queries on it
    Blog Written
    https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop/
    [Screenshots]