Decisionstats.com Data Science Virtual Internship

An internship report by Decisionstats.com intern and IIT student Chandan R.

Published in: Engineering

  1. 1. INTERNSHIP REPORT
        At Decision Stats Consultancy
        18th June 2014 - Present
        Chandan Kumar Routray
        Second Year Undergraduate Student
        IIT Kharagpur, West Bengal
  2. 2. Table of Contents
        I. Summary
        II. An Overview: Things That I Learned
        III. Blog Posts: During the Internship
        IV. Appendix (Day wise Work)
              Day 1
              Day 2
              Day 3
              Day 4-5
              Day 6-7
              Day 8-9
              Day 9-11
              Day 12-13
              Day 14-18
              Day 19-20
              Day 20-26
              Day 40+
  3. 3. Summary
        I have completed almost a month of this internship with Decision Stats Consultancy. It has been a roller-coaster ride from the very beginning, and the past four weeks were the most productive period of my life in terms of learning. The internship has helped me become a more disciplined and professional person and acquire a wide set of skills, including analytics, web development and technical blog writing.

        Mr. Ajay Ohri, my guide for this internship, schedules a daily update call over Skype at around 9 pm. In it he reviews my daily assignments, gives me useful tips on managing my work and making it more presentable, and then sets the assignment for the next day. After each call I have a new target to achieve in a given period of time, and I plan my next day around it. A typical assignment consists of learning something new, such as coding or understanding a package in R or Python, and performing an exercise on it; writing an informative blog post on what I learned that day; and writing a life-experience blog post.

        In the course of the internship I learned many things: coding in Python, R and JavaScript; using packages such as Shiny, rpy2 and ggvis in R and Python; writing database queries using MySQL; protecting a website against SQL injection; making a website using Bootstrap; automated extraction of data from the web and from a network; working in the cloud; and many other things.

        Earlier I used to run away from writing, but now I have become a blogger and I enjoy it. I have learned how to write an informative blog post; readers have started asking questions on my posts, and a few of my posts have been re-blogged by other bloggers. So far I have written 14 tech blog posts about the things I have learned, and I have tried to make all of them reader friendly.

        I have been very excited about this internship from the very beginning. Mr. Ajay Ohri has now offered to let me continue it for some more time, for which I am very grateful to him.
  4. 4. An Overview: Things That I Learned
        Programming
          • Python (www.codecademy.com)
          • JavaScript
          • R (Swirl package and www.datacamp.com)
        Web Development
          • Bootstrap (www.jetstrap.com)
          • SQL and SQL Injection
          • JavaScript (D3.js)
          • Hosting via Dropbox
        Writing
          • Technical Blog Writing (www.python4analytics.wordpress.com)
          • Report Making
        Software
          • Virtualization Software: Oracle VirtualBox, VMware Player
          • Database Management: MySQL Workbench
        Analytics
          • Data Extraction: Wireshark, iMacros
          • Analysing Data: RStudio, Python (Pandas, rpy2)
          • Result Presentation: RStudio (Shiny, ggvis, slidify), D3.js
        Working on Cloud
          • AWS EC2: Starting an instance, accessing the instance, installing RStudio Server and IPython on it, etc.
        Big Data
          • Apache Hadoop: On Hortonworks Sandbox
          • Hue: Hive, Pig, HCatalog
        Other
          • Git
          • Infographics: Infogr.am
  5. 5. Blog Posts: During the Internship
        Topic                 Link
        Python                https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
        Installing IPython    https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
        Introduction to R     https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
        Pandas Library        https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
        D3.js (JavaScript)    https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
        Datacamp vs. Swirl    https://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
        Shiny                 https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
        Git                   https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
        SAS                   https://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
        iMacros               https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
        SQL                   http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
        EC2                   https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
        Infogr.am             https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
        Wireshark             https://python4analytics.wordpress.com/2014/07/25/infogram-infographics-made-easy/
        Bootstrap             https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/
        Apache Hadoop         https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop
  6. 6. Appendix (Day wise Work)
        Day 1
        Task Given
          1) Create a blog on http://blogger.com and http://wordpress.com
          2) Start an account on Codecademy and send a screenshot of the initial page. You will be learning Python
          3) Download and install R from www.r-project.org
          4) Write a blog post on your experience on Day 1 of the internship
        Work Update
          1) Codecademy account started. Completed 23% of the beginner's course on the very first day. My progress can be checked at www.codecademy.com/imeckr
          2) R downloaded and installed on my system.
          3) Created a blog on Blogger.com and posted my first blog post on it: http://analyticsinternship.blogspot.com/2014/06/day-1.html
        References used
          None
        Remarks
          1) Edit the Day 1 blog post properly; write more information-oriented posts
          2) The URL of a web page should be the answer to a question (SEO)
          3) How to select a good theme for your blog
  7. 7. Day 2
        Task Given
          1) Create a new blog on Wordpress.com with a catchier name, the same content and better editing. Its title URL should be the answer to a question on Google Search. Please send me screenshots. What should be the reason for choosing an appropriate theme?
          2) Use tags, then share the post on your Facebook, LinkedIn, Twitter and Google Plus profiles. Please send me screenshots of this.
          3) Please earn at least 5 badges in Python on Codecademy for tomorrow's submission
          4) Please read this page: http://pandas.pydata.org/. Download and install Pandas
          5) Download and install IPython: http://ipython.org/
          6) Blog on Day 2 (besides your existing edited and refined Day 1 blog)
        Work Update
          1) Created Wordpress blog https://python4analytics.wordpress.com/
             Link to Day 1 blog: https://python4analytics.wordpress.com/2014/06/18/python-for-analytics-intro/
             Link to Day 2 blog: https://python4analytics.wordpress.com/2014/06/19/installing-ipython-on-anaconda/
          2) Earned 6 badges on Day 2 on Codecademy: http://www.codecademy.com/imeckr
          3) IPython and Pandas downloaded and installed on my system
          4) Blog shared on my Facebook, Google+ and LinkedIn accounts.
        References used
          www.pandas.pydata.org, www.ipython.org and their respective documentation.
        Remarks
          1) Maintain two different blogs: one on Blogger.com for work experience and one on Wordpress.com for tech blogging on the things I learn daily
        Screenshots
  8. 8. Day 3
        Task Given
          1) Create accounts on Topcoder, Kaggle and GitHub. Write a one-paragraph summary of what these websites are and what advantages an account on them can give you
          2) Install the swirl package in R (use Google to find out how). Do one exercise. Show a screenshot
          3) Go to Datacamp.com and create an account. Do one exercise and show a screenshot.
          4) Get 4 badges in Python and 2 badges in JavaScript on Codecademy
          5) Blog on this. Show screenshots of the analytics of each blog and answer this question: which metrics should I track for my blog if I want to make it better?
        Work Update
          1) Codecademy status: Python 50% completed, JavaScript 21%, with 23 badges, 187 points and a 4-day streak.
          2) Blogged about R on Wordpress: https://python4analytics.wordpress.com/2014/06/20/introduction-to-r-language-installing-swirl/
          3) Swirl package installed on my system. Did a few exercises.
          4) Created an account on Datacamp. Did a few exercises there too.
          5) Creating an account on sites like Topcoder, Kaggle and GitHub helps a user in many ways. Because these sites already have many registered users from across the world, they act as online communities of coders, designers, analysts, innovators and others, where users can discuss their problems and ideas among themselves. They also help users see where they currently stand and how they can develop their talent in their respective fields. A user can also take up various courses and projects, and even compete with other users.
          6) Keeping track of the following will make one's blog better:
             (i) Referrers: Where are my visitors being redirected from, and where should I share my blog more often?
             (ii) Region of visitors: Which region do most of my visitors belong to?
             (iii) Tags and Categories: Shows which topics are trending on search engines.
        References used
          www.swirlstats.com and its documentation.
        Screenshots
  9. 9. Day 4-5
        Task Given
          1) CODING - Get to 60% in Python and 40% in JavaScript on Codecademy
          2) STATISTICAL PYTHON - Go to http://pandas.pydata.org/pandas-docs/stable/10min.html#min and blog on the experience
          3) PRESENTATION OF RESULTS - Go to http://d3js.org/. Read it and blog on it. (Part 2 is the shiny package in R from http://shiny.rstudio.com/tutorial/; Part 3 will be the slidify package in R from http://slidify.org/)
          4) CODING - Do one module in Swirl. Write a tech blog on what you have learnt
          5) CODING - Go to Datacamp.com. Do one exercise and show screenshots.
        Work Update
          1) Completed 60% in Python and 40% in JavaScript
          2) Blog on Statistical Python: https://python4analytics.wordpress.com/2014/06/22/statistical-python-pandas-library/
             Blog on D3.js: https://python4analytics.wordpress.com/2014/06/23/presenting-the-results-working-with-d3-js-a-javascript-library/
          3) One module completed in Swirl
          4) Completed one exercise on Datacamp.com
          5) Blog on experience:
             Day 3: http://analyticsinternship.blogspot.in/2014/06/day-3.html
             Day 4-5: http://analyticsinternship.blogspot.in/2014/06/day-4-5.html
        References used
          www.d3js.org and its documentation.
        Screenshots
  10. 10. Day 6-7
        Task Given
          1) Do one more module in Swirl
          2) Do one exercise in Datacamp
          3) Write a technical blog post on how the two are different, including the pluses and minuses of both (Swirl vs. Datacamp)
          4) Read about using JS within R here: http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
          5) Complete 4 badges each in Python and JS
        Work Update
          1) Codecademy status: Python 70% and JavaScript 50%
          2) One module completed in the Swirl package.
          3) One exercise completed on Datacamp
          4) Read the article at the link you had given
          5) Blogged on Swirl vs. Datacamp: http://python4analytics.wordpress.com/2014/06/24/learning-r-datacamp-com-vs-swirl-package/
        References used
          http://timelyportfolio.blogspot.in/2013/04/d3-r-with-rcharts-and-slidify.html
        Screenshots
  11. 11. Day 8-9
        Task Given
          1) Make a demo app on Shiny. How is the population of India and China changing over time? How is the per capita GDP changing over time? Google for datasets. Send me an initial draft.
          2) Install and load SAS University Edition: http://www.sas.com/en_us/software/university-edition.html
          3) Complete the exercises at https://try.github.io/
          4) Write 3 tech blog posts, on Shiny, Git and SAS
        Work Update
          1) Made a demo app on Shiny which can show one plot at a time. Built a data frame, which I used in the app.
          2) Completed the Git exercise.
          3) Blog on Git: https://python4analytics.wordpress.com/2014/06/26/using-git-for-projects/
             Blog on Shiny: https://python4analytics.wordpress.com/2014/06/26/shiny-rstudio-web-application-framework-for-r/
        References used
          www.ggvis.rstudio.com, shiny.rstudio.com and their documentation.
        Screenshots: Shiny App
  12. 12. Day 9-11
        Task Given
          1) Use http://shiny.rstudio.com/gallery/ for troubleshooting your Shiny app
          2) Use the ggvis package somehow in your app (http://ggvis.rstudio.com/) and also use d3.js (hint: read this http://www.xavierdupre.fr/blog/2013-11-30_nojs.html)
          3) Create one small demo showing data flow and calls between Python and R using rpy2. For example, load some JSON data using Python and then call an R package.
          4) Complete all pending blog posts
          5) Make a small demo website using https://jetstrap.com/
          6) Create an infographic with http://infogr.am/ for the same dataset that you are using in the Shiny app
          7) Try to download and install this. It will help check for VMware and also start off big data efforts: http://hortonworks.com/products/hortonworks-sandbox/
        Work Update
          1) Built the Shiny app. Used ggvis, but not D3.js so far
          2) Used rpy2 in Python to import a built-in dataset from R and plotted a graph of it.
          3) Tech blog post on SAS: http://python4analytics.wordpress.com/2014/06/30/intro-to-sas-and-installation/
          4) Made an infographic
          5) Demo website: I tried to re-create my blog http://jetstrap.io/share/cfcd9bc36a
        References used
          www.shiny.rstudio.com, www.xavierdupre.fr/blog/2013-11-30_nojs.html
        Screenshots: Shiny App
  13. 13. Day 12-13
        Task Given
          1) Create Demo Website - Read this and try to create a website for Decision Stats Consulting. Take content from the image in the post and from the http://decisionstats.com/about-decisionstats/ page.
          2) For tomorrow, read about Bootstrap (http://getbootstrap.com/) and blog on it
          3) Install MySQL on your system (full installation). Learn SQL. Create a table of all players of the teams remaining in the round of 16. It should have player name, player surname, football club, the position he plays, and one additional column at your discretion. Then answer the following questions programmatically using SQL queries: Which World Cup team is now the tallest? Which is the oldest? Which is the shortest? Which is the youngest? Which striker is the fattest/youngest?
          4) Python - Make it to 90% by Wednesday
          5) R - Finish swirl (all modules) by Wednesday
        Work Update
          1) Completed 90% of the Python course on Codecademy.
          2) Learned about Bootstrap and revised HTML and CSS
          3) Made an "About" page for Decision Stats by editing existing templates and adding some new elements (hosted using dropbox.com: http://imeckrdemo.kissr.com/)
          4) Blogged on Bootstrap (https://python4analytics.wordpress.com/2014/07/01/make-responsive-website-with-bootstrap/)
          5) Completed all modules of R programming in Swirl
          6) Installed MySQL on my system. Read about MySQL and am currently learning it; I will do the assignment after clearing my doubts with you.
        References used
          www.getbootstrap.com and its documentation
        Screenshots
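The World Cup table task above can be sketched as follows. This is only an illustration, not the assignment as it was actually done: it uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server, the rows are made-up placeholders rather than real squad data, and the `team`, `height_cm` and `age` columns and the helper names `build_db`, `extreme_team` and `youngest_striker` are assumptions added so that the "tallest" and "youngest" questions can be answered.

```python
import sqlite3

# Hypothetical placeholder rows (not real players):
# name, surname, team, club, position, height_cm, age
PLAYERS = [
    ("A", "One", "Team X", "Club 1", "Striker", 185, 24),
    ("B", "Two", "Team X", "Club 2", "Defender", 191, 30),
    ("C", "Three", "Team Y", "Club 3", "Striker", 178, 21),
    ("D", "Four", "Team Y", "Club 4", "Goalkeeper", 180, 31),
]


def build_db():
    """Create an in-memory database holding the players table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE players ("
        "name TEXT, surname TEXT, team TEXT, club TEXT, position TEXT, "
        "height_cm INTEGER, age INTEGER)"
    )
    conn.executemany("INSERT INTO players VALUES (?, ?, ?, ?, ?, ?, ?)", PLAYERS)
    return conn


def extreme_team(conn, column, order):
    """Return (team, average) for the team with the highest or lowest average."""
    # column and order are checked against fixed values, never taken from users.
    if column not in ("height_cm", "age") or order not in ("ASC", "DESC"):
        raise ValueError("unsupported column or order")
    query = (
        f"SELECT team, AVG({column}) AS avg_value FROM players "
        f"GROUP BY team ORDER BY avg_value {order} LIMIT 1"
    )
    return conn.execute(query).fetchone()


def youngest_striker(conn):
    """Return (name, surname, age) of the youngest player listed as a striker."""
    query = (
        "SELECT name, surname, age FROM players "
        "WHERE position = 'Striker' ORDER BY age ASC LIMIT 1"
    )
    return conn.execute(query).fetchone()


if __name__ == "__main__":
    db = build_db()
    print("Tallest team:", extreme_team(db, "height_cm", "DESC"))
    print("Shortest team:", extreme_team(db, "height_cm", "ASC"))
    print("Oldest team:", extreme_team(db, "age", "DESC"))
    print("Youngest team:", extreme_team(db, "age", "ASC"))
    print("Youngest striker:", youngest_striker(db))
```

The same `GROUP BY` and `ORDER BY ... LIMIT 1` queries should carry over to MySQL with little or no change; only the connection code would differ.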
  14. 14. Made This Demo Website
  15. 15. Day 14-18
        Task Given
          1) Read about SQL and SQL injection at http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/ and try the demos at http://sqlzoo.net/hack/
          2) What was the problem with the SAS installation? Blog on this AFTER you have successfully installed it and shown screenshots
          3) Compile everything you have learnt in a 1-page essay, with an appendix of the day-wise submissions that you did.
          4) Edit all the blogs
        Work Update
          1) Learned web scraping using iMacros; still facing some problems in extracting data from some sites. Also wrote a blog post on it: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
          2) Created a basic table in a database using MySQL Workbench and practiced some basic queries on it.
          3) Learned about SQL injection from the resources you provided. Also blogged on it: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
        References used
          http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/
          http://sqlzoo.net/hack/
        Screenshots
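To make the SQL injection topic above concrete, here is a minimal sketch of how an injection works and how a parameterized query prevents it. It is an assumption-laden illustration, not material from the linked tutorials: it uses Python's built-in sqlite3 module rather than MySQL, and the `users` table, the sample credentials and the function names `make_db`, `login_unsafe` and `login_safe` are all hypothetical.

```python
import sqlite3

# Classic injection payload: closes the quoted string and adds an always-true test.
ATTACK = "' OR '1'='1"


def make_db():
    """Create an in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_unsafe(conn, username, password):
    """Vulnerable: user input is pasted directly into the SQL text."""
    query = (
        "SELECT COUNT(*) FROM users WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, username, password):
    """Safer: placeholders make the driver treat input as data, not as SQL."""
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0


if __name__ == "__main__":
    db = make_db()
    print("Unsafe login with payload:", login_unsafe(db, "alice", ATTACK))
    print("Safe login with payload:", login_safe(db, "alice", ATTACK))
```

With the payload as the password, the unsafe query ends in `OR '1'='1'`, which is true for every row, so the check passes without the real password. The parameterized version compares the payload as a literal string and rejects it.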
  16. 16. Day 19-20
        Task Given
          1) Read these papers: http://www.slideshare.net/ajayohri/using-r-for-cyber-security-part-1 and http://www.sis.pitt.edu/jjoshi/courses/IS2621/Spring2014/Lab3.pdf
          2) Use Wireshark and/or Silk to capture some dummy data from a network (wifi or wherever)
          3) Use paper 1 to import the data into R and visualize it
          4) Additionally, download and install Wireshark and use the instructions from http://www.ict.kth.se/courses/II2202/II2202-quantitative-chip-R-20110918.pdf to help you with the analysis
        Work Update
          1) Learned web scraping using iMacros; still facing some problems in extracting data from some sites. Also wrote a blog post on it: https://python4analytics.wordpress.com/2014/07/07/web-scrapingdata-extraction-from-web-using-imacros/
          2) Created a basic table in a database using MySQL Workbench and practiced some basic queries on it.
          3) Learned about SQL injection from the resources you provided. Also blogged on it: http://python4analytics.wordpress.com/2014/07/08/sql-and-sql-injection-a-web-attack-technique/
        References used
          http://decisionstats.com/2013/03/26/how-to-learn-sql-injection/
          http://sqlzoo.net/hack/
        Screenshots
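The capture-and-analyse task above asks for R; the sketch below shows the same idea in Python using only the standard library, as a rough illustration rather than the method from the referenced papers. It assumes the capture was exported from Wireshark as CSV with the default packet-list columns (No., Time, Source, Destination, Protocol, Length, Info). The sample rows are invented, use documentation-only IP addresses, and the names `SAMPLE_CSV`, `summarize` and `text_bar_chart` are hypothetical.

```python
import csv
import io
from collections import Counter

# Invented sample in the shape of a Wireshark CSV export (assumed default columns).
SAMPLE_CSV = '''"No.","Time","Source","Destination","Protocol","Length","Info"
"1","0.000000","192.0.2.5","192.0.2.1","DNS","74","Standard query"
"2","0.021000","192.0.2.1","192.0.2.5","DNS","90","Standard query response"
"3","0.030000","192.0.2.5","198.51.100.7","TCP","66","SYN"
"4","0.080000","198.51.100.7","192.0.2.5","TCP","66","SYN, ACK"
"5","0.081000","192.0.2.5","198.51.100.7","HTTP","420","GET / HTTP/1.1"
'''


def summarize(csv_text):
    """Count packets and total bytes per protocol."""
    packets = Counter()
    total_bytes = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        packets[row["Protocol"]] += 1
        total_bytes[row["Protocol"]] += int(row["Length"])
    return packets, total_bytes


def text_bar_chart(counts):
    """Render one '#' bar per protocol, most frequent first."""
    return "\n".join(
        f"{protocol:<5} {'#' * count} ({count})"
        for protocol, count in counts.most_common()
    )


if __name__ == "__main__":
    packet_counts, byte_counts = summarize(SAMPLE_CSV)
    print(text_bar_chart(packet_counts))
    print("Bytes per protocol:", dict(byte_counts))
```

For a real capture, the exported file's contents would be passed to `summarize` instead of `SAMPLE_CSV`, and the text chart could be replaced by a proper plot.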
  17. 17. Day 20-26
        Task Given
          1) Complete Python on Codecademy.
          2) Set up RStudio Server on AWS and blog on it
          3) Giving user rights to you, choosing the appropriate user rights.
          4) Set up IPython on AWS
        Work Update
          1) Completed Python on Codecademy
          2) Created an AWS account and set up RStudio Server on it.
          3) Blogged on it: https://python4analytics.wordpress.com/2014/07/18/setting-up-rstudio-server-on-aws-ec2-instance/
        References used
          http://www.s-anand.net/blog/ssh-tunneling-through-web-filters/
          http://www.r-bloggers.com/instructions-for-installing-using-r-on-amazon-ec2/
        Screenshots
  18. 18. Day 40+
        Task Given
          1) Read these:
             http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
             http://www.slideshare.net/ajayohri/big-data-big-analytics
          2) Explore Hadoop
          3) Complete the tutorials on Hortonworks Sandbox
        Work Update
          1) Read the two papers you provided:
             http://www.slideshare.net/ajayohri/decision-making-in-the-era-of-cloud-computing-and-big-data
             http://www.slideshare.net/ajayohri/big-data-big-analytics
          2) Explored Hadoop: what HDFS, MapReduce, Pig, Hive etc. are
          3) Resolved the problem I was having with Pig and the Sandbox
          4) Completed the first two tutorials on Hortonworks Sandbox and learned the following: basic commands in Pig (Grunt shell); downloaded a sample dataset and performed basic Hive queries on it
        Blog Written
          https://python4analytics.wordpress.com/2014/08/06/installing-hortonworks-sandbox-hadoop/
        Screenshots
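As a small aid to the MapReduce idea explored above, the following sketch walks through the three conceptual phases (map, shuffle, reduce) on a word-count problem in plain Python. It is a single-process illustration of the programming model only; it does not use Hadoop, HDFS or any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

In a real cluster the map and reduce steps run in parallel on different machines and the framework performs the shuffle; here all three simply run one after another.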
  19. 19. Thank You
