Data visualization in Python/Django


This is a slide presentation for a talk given at Aalto University.

Published in: Technology

  1. Data Visualization in Python/Django. By KENNETH EMEKA ODOH
  2. Table of Contents: Introduction, Motivation, Method, Appendices, Conclusion, References
  3. Introduction
      • My background
      • Requirements (Python, Django, Matplotlib, Ajax) and other third-party libraries.
      • What this talk is not about (we are not trying to re-implement Google Analytics).
      • Source code is available at ( _Talk ).
      "Everything should be made as simple as
  4. Motivation: There is a need to represent business analytics data in graphical form. This is because a picture speaks more than a thousand words.
  5. Where do we find data?
  6. Sources of Data
      • CSV
      • Databases
  7. Data Processing
      • Identify the data source.
      • Preprocess the data (removing nulls, wide characters), e.g. with Google Refine.
      • Actual data processing.
      • Present the clean data in a descriptive format, i.e. data visualization. See Appendix 1.
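The preprocessing step on this slide (removing nulls before the actual processing) can be sketched with the standard `csv` module. This is a minimal sketch under stated assumptions, not the talk's own code: the column names `month` and `radiation`, the sample values, the helper `clean_rows`, and the policy of replacing nulls with 0.0 are all hypothetical.

```python
import csv
import io

# Hypothetical sample data; "NA" and empty cells stand in for nulls.
RAW_CSV = """month,radiation
1,10.5
2,NA
3,
4,7.25
"""


def clean_rows(text, null_markers=("NA", "")):
    """Parse CSV text and replace null-like cells with 0.0 (an assumed policy)."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = []
    for row in reader:
        clean = {}
        for key, value in row.items():
            # DictReader yields None for missing trailing cells; treat as empty.
            value = (value or "").strip()
            clean[key] = 0.0 if value in null_markers else float(value)
        cleaned.append(clean)
    return cleaned


rows = clean_rows(RAW_CSV)
```

Replacing nulls with zero keeps every row plottable, but it also biases averages; dropping the row instead is an equally reasonable choice, depending on the chart.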
  8. Visual Representation of Data
      • Charts / diagram format
      • Text format
          - Tables
          - Log files
  9. Categorization of Data
      • Real-time (see Appendix 2)
      • Batch-based (see Appendix 2)
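The difference between the two categories can be illustrated with one summary statistic computed both ways. This is a minimal sketch, assuming a hypothetical list of response times; the `RunningMean` class is illustrative and does not come from the talk's projects.

```python
from statistics import mean


class RunningMean:
    """Real-time style: update the summary as each value arrives."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        self.count += 1
        # Incremental mean: earlier values do not need to be stored.
        self.value += (x - self.value) / self.count
        return self.value


# Hypothetical response times in seconds.
response_times = [0.2, 0.4, 0.9, 0.5]

# Batch style: wait until all data is collected, then summarize once.
batch_mean = mean(response_times)

# Real-time style: a summary is available after every new value.
running = RunningMean()
running_means = [running.update(x) for x in response_times]
```

Both approaches end at the same mean. The real-time version simply makes an up-to-date figure available at every step, which is what a continuously refreshed chart needs.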
  10. Rules of Data Collection
      • Keep data in the most easily processable form, e.g. database, CSV.
      • Keep collected data with a timestamp.
      • Gather data that is relevant to the business needs.
      • Remove old data.
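The timestamp rule can be sketched as a small helper that never writes a row without one. This is a minimal sketch under stated assumptions: the helper `record_event`, the column layout, and the example URL are hypothetical, and an in-memory buffer stands in for a real CSV file.

```python
import csv
import io
from datetime import datetime, timezone


def record_event(writer, url, status, when=None):
    """Write one measurement row, always prefixed with an ISO 8601 UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    writer.writerow([when.isoformat(), url, status])


# An in-memory buffer stands in for a CSV file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["timestamp", "url", "status"])

# A fixed timestamp (for reproducibility) and one taken at call time.
record_event(writer, "http://example.com/", 200,
             when=datetime(2013, 1, 15, 12, 30, tzinfo=timezone.utc))
record_event(writer, "http://example.com/", 200)

# Read the rows back to confirm every record carries its timestamp.
stored_rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

Storing timestamps in UTC and ISO 8601 keeps them sortable as plain strings and makes later grouping by week or month straightforward.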
  11. Where is the data visualization done?
      • Server (see Appendices 2 - 6)
      • Client. Examples of JavaScript libraries: DS.js, gRaphael.js
  12. Factors to Consider for Choice of Visualization
      Where do we perform the visualization processing? Is it on the server or the client? It depends on:
      • Security
      • Scalability
  13. Tools Needed for Data Analysis
      • csvkit
      • networkx
      • pySAL
  14. Appendices: Let the code begin
  15. Appendix 1

          ## This describes a scatter plot of solar radiation against the month.
          ## It aims to describe the steps of data gathering.
          ## The CSV file comes from the data science hackathon website.
          ## The source code is available in a folder named "plotCode".
          import csv
          from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
          from matplotlib.figure import Figure

          def prepareList(month_most_common_list):
              """Prepare the input for processing by removing all unnecessary values.
              Replace "NA" with 0."""
              output_list = []
              for x in month_most_common_list:
                  if x != "NA":
                      output_list.append(x)
                  else:
                      output_list.append(0)
              return output_list
  16. Appendix 1 (contd.)

          def plotSolarRadiationAgainstMonth(filename):
              # The slide opened the file in mode "rb" (Python 2); newline="" is the Python 3 equivalent for csv.
              with open(filename, newline="") as csvfile:
                  trainRowReader = csv.reader(csvfile, delimiter=",")
                  month_most_common_list = []
                  Solar_radiation_64_list = []
                  for row in trainRowReader:
                      month_most_common = row[3]
                      Solar_radiation_64 = row[6]
                      month_most_common_list.append(month_most_common)
                      Solar_radiation_64_list.append(Solar_radiation_64)
              # Convert all elements in the list to float, skipping the first element
              # because it is a description of the field.
              month_most_common_list = [float(i) for i in prepareList(month_most_common_list)[1:]]
              Solar_radiation_64_list = [float(i) for i in prepareList(Solar_radiation_64_list)[1:]]
              fig = Figure()
              ax = fig.add_subplot(111)
              title = "Scatter Diagram of solar radiation against month of the year"
              ax.set_xlabel("Most common month")
              ax.set_ylabel("Solar Radiation")
              fig.suptitle(title, fontsize=14)
              try:
                  # It is possible to make other kinds of plots, e.g. bar charts, pie charts, histograms.
                  ax.scatter(month_most_common_list, Solar_radiation_64_list)
              except ValueError:
                  pass
              canvas = FigureCanvas(fig)
              canvas.print_figure("solarRadMonth.png", dpi=500)

          if __name__ == "__main__":
              plotSolarRadiationAgainstMonth("TrainingData.csv")
  17. Appendix 2
      From the project in the folder named WebMonitor:

          class LoadEvent:
              …
              def fillMonitorModel(self):
                  for monObj in self.monitorObjList:
                      mObj = Monitor(url=monObj[2], httpStatus=monObj[0],
                                     responseTime=monObj[1], contentStatus=monObj[5])

      See the following examples in the project. This shows how the analytic tables are loaded with real-time data.
  18. Appendix 3

          from django.contrib.admin.views.decorators import staff_member_required
          from django.http import HttpResponse
          from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
          from matplotlib.figure import Figure
          from YAAS.stats.models import RegisteredUser, OnlineUser, StatBid

          # Scatter diagram of number of bids made against number of online users
          # Weekly report
          @staff_member_required
          def weeklyScatterOnlinUsrBid(request, week_no):
              page_title = "Weekly Scatter Diagram based on Online user versus Bid"
              weekno = week_no
              fig = Figure()
              ax = fig.add_subplot(111)
              # `stat` comes from the YAAS project; its import is not shown on the slide.
              year = stat.getYear()
              onlUserObj = OnlineUser.objects.filter(week=weekno).filter(year=year)
              bidObj = StatBid.objects.filter(week=weekno).filter(year=year)
              onlUserlist = list(onlUserObj.values_list("no_of_online_user", flat=True))
              bidlist = list(bidObj.values_list("no_of_bids", flat=True))
              title = "Scatter Diagram of number of online User against number of bids (week {0}){1}".format(weekno, year)
              ax.set_xlabel("Number of online Users")
              ax.set_ylabel("Number of Bids")
              fig.suptitle(title, fontsize=14)
              try:
                  ax.scatter(onlUserlist, bidlist)
              except ValueError:
                  # e.g. the two lists differ in length; return an empty chart
                  pass
              canvas = FigureCanvas(fig)
              # Write the PNG straight into the HTTP response.
              response = HttpResponse(content_type="image/png")
              canvas.print_png(response)
              return response

      More info can be found in YAAS/graph/ (the folder named "graph").
  19. Appendix 4

          # Example of how old database rows may be deleted to recover some space.
          # From the folder named "YAAS".
          # Check, minute=30, day_of_week=0))    (truncated on the slide)
          import datetime

          def deleteOldItemsandBids():
              # Cutoff date 120 days in the past. The slide compared end_date against a bare
              # negative timedelta, which is not a date.
              hunderedandtwentydays = datetime.datetime.now() - datetime.timedelta(days=120)
              myItem = Item.objects.filter(end_date__lte=hunderedandtwentydays).delete()
              myBid = Bid.objects.filter(end_date__lte=hunderedandtwentydays).delete()

          # populate the registereduser and onlineuser model at regular intervals
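The cutoff logic behind the `end_date__lte` filter in Appendix 4 can be illustrated without a database. This is a minimal sketch, assuming plain dictionaries in place of the `Item` and `Bid` models; the helper `split_old_records` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta


def split_old_records(records, now, max_age_days=120):
    """Return (kept, expired); a record expires when end_date is at or before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    kept = [r for r in records if r["end_date"] > cutoff]
    expired = [r for r in records if r["end_date"] <= cutoff]
    return kept, expired


# A fixed "now" keeps the example reproducible.
now = datetime(2013, 6, 1)
records = [
    {"id": 1, "end_date": datetime(2013, 1, 1)},       # 151 days old
    {"id": 2, "end_date": datetime(2013, 5, 20)},      # 12 days old
    {"id": 3, "end_date": now - timedelta(days=120)},  # exactly at the cutoff
]
kept, expired = split_old_records(records, now)
```

Passing `now` in as a parameter, rather than reading the clock inside the function, makes the retention rule easy to test before it is wired to a scheduled job that deletes real rows.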
  20. Appendix 5: Check the project in YAAS/stats/ for more information on statistical processing.
  21. Appendix 6
      # How to refresh the views in Django to keep the charts updated. See the WebMonitor project.

          {% extends "base.html" %}
          {% block site_wrapper %}
          <div id="messages">Updating tables ...</div>
          <script>
              function refresh() {
                  $.ajax({
                      url: "/monitor/",
                      success: function(data) {
                          $("#messages").html(data);
                      }
                  });
              }
              $(function(){
                  refresh();
                  // Schedule the timer once, outside refresh(), so timers do not pile up on every call.
                  setInterval(refresh, 100000);
              });
          </script>
          {% endblock %}
  22. 22. References Python documentation ( ) Django documentation ( ) Stack overflow ( ) Celery documentation ( email logo ( http:// ) blog logo ( http:// )
  23. Thanks for listening. Follow me using any of @kenluck2001 / 1