Data visualization in python/Django

This is a slide presentation from a talk given at Aalto University.

Published in: Technology

Transcript

  • 1. Data Visualization in Python/Django By KENNETH EMEKA ODOH
  • 2. Table of Contents: Introduction, Motivation, Method, Appendices, Conclusion, References
  • 3. Introduction: My background. Requirements (Python, Django, Matplotlib, Ajax) and other third-party libraries. What this talk is not about (we are not trying to re-implement Google Analytics). Source code is available at (https://github.com/kenluck2001/PyCon2012_Talk). "Everything should be made as simple as
  • 4. Motivation: There is a need to represent business analytics data in graphical form, because a picture speaks more than a thousand words. Source: en.wikipedia.org
  • 5. Where do we find data? Source: en.wikipedia.org
  • 6. Sources of Data: CSV, databases
  • 7. Data Processing: Identify the data source. Preprocess the data (removing nulls, wide characters), e.g. with Google Refine. Perform the actual data processing. Present the clean data in a descriptive format, i.e. data visualization. See Appendix 1.
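The steps on slide 7 can be sketched with the standard library alone. This is only an illustration, not the talk's code (that is in Appendix 1): the sample data, the column names, and the helper names `clean_value` and `summarize` are assumptions made up for the example.

```python
import csv
import io
from statistics import mean

# Hypothetical sample standing in for a real data source (identify the source).
RAW_CSV = """month,solar_radiation
1,10.5
2,NA
3,14.0
"""


def clean_value(value, default=0.0):
    """Preprocessing: turn a raw cell into a float, replacing missing markers."""
    value = value.strip()
    if value in ("", "NA"):
        return default
    return float(value)


def summarize(text):
    """Processing and presentation: reduce the cleaned column to a description."""
    reader = csv.DictReader(io.StringIO(text))
    readings = [clean_value(row["solar_radiation"]) for row in reader]
    return {"count": len(readings), "mean": mean(readings), "max": max(readings)}


summary = summarize(RAW_CSV)
print(summary)
```

Replacing "NA" with 0 mirrors what Appendix 1 does; note that it pulls the mean down, so dropping such rows may be the better choice for some data sets.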
  • 8. Visual Representation of Data: Charts / diagram format. Text format: tables, log files. Source: devk2.wordpress.com Source: elementsdatabase.com
  • 9. Categorization of Data: Real-time (see Appendix 2). Batch-based (see Appendix 2).
  • 10. Rules of Data Collection: Keep data in the most easily processed form, e.g. database, CSV. Keep collected data timestamped. Gather data that is relevant to the business needs. Remove old data.
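Two of these rules, timestamping and removing old data, might look like the sketch below when the store is a plain CSV. The helper names `append_record` and `prune_old` are hypothetical, and the 120-day window is borrowed from Appendix 4 purely as an example value.

```python
import csv
import io
from datetime import datetime, timedelta


def append_record(buffer, value, now):
    """Write one measurement together with an ISO 8601 timestamp."""
    csv.writer(buffer).writerow([now.isoformat(), value])


def prune_old(rows, now, max_age_days=120):
    """Keep only rows whose timestamp is within max_age_days of now."""
    cutoff = now - timedelta(days=max_age_days)
    return [row for row in rows if datetime.fromisoformat(row[0]) >= cutoff]


now = datetime(2012, 10, 1, 12, 0, 0)
store = io.StringIO()  # stands in for a CSV file on disk
append_record(store, 42, now - timedelta(days=200))  # stale record
append_record(store, 57, now - timedelta(days=3))    # recent record

rows = list(csv.reader(io.StringIO(store.getvalue())))
kept = prune_old(rows, now)
print(kept)
```

Passing `now` in explicitly, rather than calling `datetime.today()` inside the helpers, keeps the pruning logic easy to test.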
  • 11. Where is the data visualization done? Server: see Appendices 2-6. Client: examples of JavaScript libraries are D3.js (http://d3js.org/) and gRaphael.js (http://g.raphaeljs.com/).
  • 12. Factors to Consider for Choice of Visualization: Where do we perform the visualization processing? Is it server or client? It depends on: security, scalability.
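One way to picture the trade-off: when rendering happens on the client, the server only has to aggregate and serialize the data, and a JavaScript library such as D3.js draws it. Below is a minimal sketch of that server-side half using only the standard library. The `chart_payload` helper and the JSON field names are hypothetical and are not taken from the talk's projects.

```python
import json
from collections import Counter


def chart_payload(status_codes):
    """Aggregate raw HTTP status codes into a small JSON document
    that a client-side charting library could consume."""
    counts = Counter(status_codes)
    series = [{"status": code, "count": n} for code, n in sorted(counts.items())]
    return json.dumps({"series": series})


payload = chart_payload([200, 200, 404, 500, 200])
print(payload)
```

Only aggregated counts leave the server here, and the drawing cost moves to the browser. Rendering the image on the server, as in Appendix 3, keeps the raw data private but costs CPU time on every request.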
  • 13. Tools Needed for Data Analysis: csvkit (http://csvkit.readthedocs.org/en/latest/), networkx (http://networkx.lanl.gov/), pySAL (http://code.google.com/p/pysal/)
  • 14. Appendices: Let the code begin. Source: caseinsights.com
  • 15. Appendix 1
      # This describes a scatter plot of solar radiation against the month.
      # It aims to describe the steps of data gathering.
      # The CSV file is from the data science hackathon website.
      # The source code is available in a folder named "plotCode".
      import csv
      from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
      from matplotlib.figure import Figure

      def prepareList(month_most_common_list):
          """Prepare the input for processing by removing all unnecessary values. Replace "NA" with 0."""
          output_list = []
          for x in month_most_common_list:
              if x != "NA":
                  output_list.append(x)
              else:
                  output_list.append(0)
          return output_list
  • 16. Appendix 1 contd.
      def plotSolarRadiationAgainstMonth(filename):
          month_most_common_list = []
          Solar_radiation_64_list = []
          # Python 3: open in text mode with newline="" (the slide used mode "rb", which is Python 2 only)
          with open(filename, newline="") as csvfile:
              trainRowReader = csv.reader(csvfile, delimiter=",")
              for row in trainRowReader:
                  month_most_common = row[3]
                  Solar_radiation_64 = row[6]
                  month_most_common_list.append(month_most_common)
                  Solar_radiation_64_list.append(Solar_radiation_64)
          # convert all elements in the list to float while skipping the first element,
          # for the 1st element is a description of the field.
          month_most_common_list = [float(i) for i in prepareList(month_most_common_list)[1:]]
          Solar_radiation_64_list = [float(i) for i in prepareList(Solar_radiation_64_list)[1:]]
          fig = Figure()
          ax = fig.add_subplot(111)
          title = "Scatter Diagram of solar radiation against month of the year"
          ax.set_xlabel("Most common month")
          ax.set_ylabel("Solar Radiation")
          fig.suptitle(title, fontsize=14)
          try:
              ax.scatter(month_most_common_list, Solar_radiation_64_list)
              # it is possible to make other kinds of plots, e.g. bar charts, pie charts, histograms
          except ValueError:
              pass
          canvas = FigureCanvas(fig)
          canvas.print_figure("solarRadMonth.png", dpi=500)

      if __name__ == "__main__":
          plotSolarRadiationAgainstMonth("TrainingData.csv")
  • 17. Appendix 2
      # From the project in the folder named WebMonitor
      class LoadEvent:
          …
          def fillMonitorModel(self):
              for monObj in self.monitorObjList:
                  mObj = Monitor(url=monObj[2], httpStatus=monObj[0],
                                 responseTime=monObj[1], contentStatus=monObj[5])
                  mObj.save()

      # also see the following examples in the project named YAAS: tasks.py
      This shows how the analytic tables are loaded with real-time data.
  • 18. Appendix 3
      from django.contrib.admin.views.decorators import staff_member_required
      from django.http import HttpResponse
      from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
      from matplotlib.figure import Figure
      from YAAS.stats.models import RegisteredUser, OnlineUser, StatBid

      # scatter diagram of number of bids made against number of online users
      # weekly report
      @staff_member_required
      def weeklyScatterOnlinUsrBid(request, week_no):
          page_title = "Weekly Scatter Diagram based on Online user versus Bid"
          weekno = week_no
          fig = Figure()
          ax = fig.add_subplot(111)
          # `stat` is a helper from the YAAS project; its import is not shown on the slide
          year = stat.getYear()
          onlUserObj = OnlineUser.objects.filter(week=weekno).filter(year=year)
          bidObj = StatBid.objects.filter(week=weekno).filter(year=year)
          onlUserlist = list(onlUserObj.values_list("no_of_online_user", flat=True))
          bidlist = list(bidObj.values_list("no_of_bids", flat=True))
          title = "Scatter Diagram of number of online User against number of bids (week {0}){1}".format(weekno, year)
          ax.set_xlabel("Number of online Users")
          ax.set_ylabel("Number of Bids")
          fig.suptitle(title, fontsize=14)
          try:
              ax.scatter(onlUserlist, bidlist)
          except ValueError:
              pass
          canvas = FigureCanvas(fig)
          # write the PNG straight into the HTTP response
          response = HttpResponse(content_type="image/png")
          canvas.print_png(response)
          return response

      More info can be found in YAAS/graph/ (the folder named "graph").
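  • 19. Appendix 4
      # Example of how database records may be deleted to recover some space.
      # From the folder named "YAAS". Check task.py
      # periodic_task and crontab come from Celery; Item and Bid are YAAS models
      # (their imports are not shown on the slide).
      from datetime import datetime, timedelta

      @periodic_task(run_every=crontab(hour=1, minute=30, day_of_week=0))
      def deleteOldItemsandBids():
          # timedelta is imported directly; the slide mixed datetime.today() with datetime.timedelta
          hunderedandtwentydays = datetime.today() - timedelta(days=120)
          myItem = Item.objects.filter(end_date__lte=hunderedandtwentydays).delete()
          myBid = Bid.objects.filter(end_date__lte=hunderedandtwentydays).delete()

      # populate the registereduser and onlineuser model at regular intervals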
  • 20. Appendix 5: Check the project in YAAS/stats/ for more information on statistical processing.
  • 21. Appendix 6
      How to refresh the views in Django to keep the charts updated. See the WebMonitor project.
      {% extends "base.html" %}
      {% block site_wrapper %}
      <div id="messages">Updating tables ...</div>
      <script>
        function refresh() {
          $.ajax({
            url: "/monitor/",
            success: function(data) {
              $("#messages").html(data);
            }
          });
        }
        $(function(){
          refresh();
          // Schedule once here: calling setInterval inside refresh() would start a new timer on every run.
          setInterval(refresh, 100000);
        });
      </script>
      {% endblock %}
  • 22. References: Python documentation (http://www.python.org/), Django documentation (https://www.djangoproject.com/), Stack Overflow (http://stackoverflow.com/), Celery documentation (http://ask.github.com/celery/). Pictures: email logo (http://ambrosedesigns.co.uk), blog logo (http://sociolatte.com)
  • 23. Thanks for listening. Follow me using any of: @kenluck2001, kenluck2001@yahoo.com, http://kenluck2001.tumblr.com/, https://github.com/kenluck2001