Data visualization by Kenneth Odoh

Data Visualization in Python/Django, presentation from PyCon Finland 2012

Published in: Technology

  1. 1. Data Visualization in Python/Django, by Kenneth Emeka Odoh
  2. 2. Table of Contents: Introduction, Motivation, Method, Appendices, Conclusion, References
  3. 3. Introduction My background Requirements( Python, Django, Matplotlib, ajax ) and other third-party libraries. What this talk is about ( we will be restricted to python, matplotlib and django ). What this talk is not about ( we are not trying to re-implement Google analytics ). Source codes are available at ( https://github.com/kenluck2001/PyCon2012_T alk ).
  4. 4. Motivation: Business analytics data needs to be represented in graphical form, because a picture is worth a thousand words. Source: en.wikipedia.org
  5. 5. Where do we find data? Source: en.wikipedia.org
  6. 6. Sources of Data: • CSV • Databases
  7. 7. Steps for data gathering: • Identify the data source. • Preprocess the data (remove nulls and wide characters), e.g. with Google Refine. • Process the data (perform some statistical analysis). • Present the clean data in a descriptive format, i.e. data visualization. See Appendix 1.
  8. 8. Visual Representation of data: • Charts / diagrams • Text • Tables • Log files. Sources: devk2.wordpress.com, elementsdatabase.com
  9. 9. Categorization of data: • Real-time: charts are generated in real time, which can include a mechanism for refreshing the site to get the latest chart. See Appendix 2. • Batch-based: charts are created from a CSV file (example in my blog). See Appendix 2.
  10. 10. Rules of Data Collection: • Keep data in the most easily processed form, e.g. a database or CSV. • Keep a timestamp with the collected data (the time the data was collected or processed) so that it can be filtered. • Gather data that is relevant to the business needs. • When the data grows too large, prune stale or old data that is no longer needed.
  11. 11. Where is the data visualization done? • Server: see Appendices 2 to 6. • Client: examples of JavaScript libraries are D3.js ( http://d3js.org/ ) and gRaphael.js ( http://g.raphaeljs.com/ ).
  12. 12. Factors to Consider for Choice of Visualization: Where do we perform the visualization processing, on the server or on the client? It depends on: • Security • Scalability
  13. 13. Tools needed for data analysis: • csvkit ( http://csvkit.readthedocs.org/en/latest/ ) • networkx (graphs) ( http://networkx.lanl.gov/ ) • pySAL (spatial analysis) ( http://code.google.com/p/pysal/ )
  14. 14. Appendices: Let the code begin
  15. 15. Appendix 1: This describes a scatter plot of solar radiation against the month. It aims to illustrate the steps of data gathering. The CSV file is from the data science hackathon website. The source code is available in a folder named “plotCode”.

          import csv
          from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
          from matplotlib.figure import Figure

          def prepareList(month_most_common_list):
              """Prepare the input for processing by removing all unnecessary values ("NA" entries are dropped)."""
              output_list = []
              for x in month_most_common_list:
                  if x != "NA":
                      output_list.append(x)
              return output_list
  16. 16. Appendix 1 contd.

          def plotSolarRadiationAgainstMonth(filename):
              trainRowReader = csv.reader(open(filename, 'rb'), delimiter=',')
              month_most_common_list = []
              Solar_radiation_64_list = []
              for row in trainRowReader:
                  month_most_common = row[3]
                  Solar_radiation_64 = row[6]
                  month_most_common_list.append(month_most_common)
                  Solar_radiation_64_list.append(Solar_radiation_64)
              # Convert all elements in the list to float while skipping the first element,
              # because the first element is a description of the field.
              month_most_common_list = [float(i) for i in prepareList(month_most_common_list)[1:]]
              Solar_radiation_64_list = [float(i) for i in prepareList(Solar_radiation_64_list)[1:]]
              fig = Figure()
              ax = fig.add_subplot(111)
              title = 'Scatter Diagram of solar radiation against month of the year'
              ax.set_xlabel('Most common month')
              ax.set_ylabel('Solar Radiation')
              fig.suptitle(title, fontsize=14)
              try:
                  ...
  17. 17. Appendix 2: From the project in the folder named WebMonitor.

          class LoadEvent:
              def fillMonitorModel(self):
                  for monObj in self.monitorObjList:
                      mObj = Monitor(url=monObj[2], httpStatus=monObj[0],
                                     responseTime=monObj[1], contentStatus=monObj[5])
  18. 18. Appendix 3

          from django.contrib.admin.views.decorators import staff_member_required
          from django.http import HttpResponse
          from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
          from matplotlib.figure import Figure
          from YAAS.stats.models import RegisteredUser, OnlineUser, StatBid

          # Scatter diagram of number of bids made against number of online users
          # (weekly report).
          @staff_member_required
          def weeklyScatterOnlinUsrBid(request, week_no):
              page_title = 'Weekly Scatter Diagram based on Online user versus Bid'
              weekno = week_no
              fig = Figure()
              ax = fig.add_subplot(111)
              year = stat.getYear()
              onlUserObj = OnlineUser.objects.filter(week=weekno).filter(year=year)
              bidObj = StatBid.objects.filter(week=weekno).filter(year=year)
              onlUserlist = list(onlUserObj.values_list('no_of_online_user', flat=True))
              bidlist = list(bidObj.values_list('no_of_bids', flat=True))
              title = 'Scatter Diagram of number of online User against number of bids (week {0}){1}'.format(weekno, year)
              ax.set_xlabel('Number of online Users')
              ax.set_ylabel('Number of Bids')
              fig.suptitle(title, fontsize=14)
              try:
                  ax.scatter(onlUserlist, bidlist)
              except ValueError:
                  pass
  19. 19. Appendix 4: Example of how old database records may be deleted to recover some space. From the folder named “YAAS”. Check task.py.

          import datetime

          @periodic_task(run_every=crontab(hour=1, minute=30, day_of_week=0))
          def deleteOldItemsandBids():
              # Cutoff date: 120 days before today.
              hunderedandtwentydays = datetime.datetime.today() - datetime.timedelta(days=120)
              myItem = Item.objects.filter(end_date__lte=hunderedandtwentydays).delete()
              myBid = Bid.objects.filter(end_date__lte=hunderedandtwentydays).delete()

          # Populate the registereduser and onlineuser models at regular intervals.
  20. 20. Appendix 5: Check the project in YAAS/stats/ for more information on statistical processing.
  21. 21. Appendix 6: How to refresh the views in Django to keep the charts updated. See the WebMonitor project.

          {% extends "base.html" %}
          {% block site_wrapper %}
          <div id="messages">Updating tables ...</div>
          <script>
          function refresh() {
              $.ajax({
                  url: "/monitor/",
                  success: function(data) {
                      $('#messages').html(data);
                  }
              });
              setInterval("refresh()", 100000);
  22. 22. References Python documentation ( http://www.python.org/ ) Django documentation ( https://www.djangoproject.com/ ) Stack overflow ( http://stackoverflow.com/ ) Celery documentation (http://ask.github.com/celery/)Pictures email logo ( http:// ambrosedesigns.co.uk ) blog logo ( http:// sociolatte.com )
  23. 23. Thanks for listening. Follow me using any of: @kenluck2001, kenluck2001@yahoo.com, http://kenluck2001.tumblr.com/, https://github.com/kenluck2001
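
As a supplement to slide 7 (steps for data gathering), here is a minimal sketch of the preprocessing and statistical-analysis steps using only the Python standard library. The column names (`month`, `solar_radiation`), the sample rows, and the helper names `clean_rows` and `summarize` are hypothetical and are not taken from the talk's repository. Unlike `prepareList` in Appendix 1, this sketch skips a whole row when either value is missing, so the x and y series stay aligned.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a real CSV file.
SAMPLE_CSV = """month,solar_radiation
1,10.5
2,NA
3,14.0
NA,9.0
4,16.5
"""


def clean_rows(csv_text, x_field, y_field, missing="NA"):
    """Return (x, y) float pairs, skipping rows where either value is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for row in reader:
        x_raw, y_raw = row[x_field], row[y_field]
        # Skip rows with the missing-value marker or an empty cell.
        if missing in (x_raw, y_raw) or not x_raw or not y_raw:
            continue
        pairs.append((float(x_raw), float(y_raw)))
    return pairs


def summarize(values):
    """Compute a few descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


pairs = clean_rows(SAMPLE_CSV, "month", "solar_radiation")
radiation = [y for _, y in pairs]
summary = summarize(radiation)
print(pairs)
print(summary)
```

The cleaned pairs could then be passed to a plotting call such as `ax.scatter`, as in the appendices.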
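
As a supplement to slide 10 (rules of data collection) and Appendix 4, here is a sketch of storing a timestamp with each record and pruning stale records. It uses an in-memory SQLite database from the standard library rather than the Django ORM and Celery task shown in the talk. The table name `bids`, its columns, the fixed reference date, and the helper names are assumptions made for illustration only.

```python
import sqlite3
from datetime import datetime, timedelta


def create_store():
    """Create an in-memory table whose rows carry a collection timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bids (id INTEGER PRIMARY KEY, amount REAL, collected_at TEXT)"
    )
    return conn


def add_bid(conn, amount, collected_at):
    # ISO 8601 strings of the same precision sort chronologically,
    # so they can be compared as plain text in SQL.
    conn.execute(
        "INSERT INTO bids (amount, collected_at) VALUES (?, ?)",
        (amount, collected_at.isoformat()),
    )


def prune_older_than(conn, days, now):
    """Delete rows collected `days` or more before `now`; return the number deleted."""
    cutoff = now - timedelta(days=days)
    cur = conn.execute(
        "DELETE FROM bids WHERE collected_at <= ?", (cutoff.isoformat(),)
    )
    conn.commit()
    return cur.rowcount


# Arbitrary fixed reference time so the example is reproducible.
now = datetime(2013, 1, 1, 12, 0, 0)
conn = create_store()
add_bid(conn, 10.0, now - timedelta(days=200))
add_bid(conn, 20.0, now - timedelta(days=121))
add_bid(conn, 30.0, now - timedelta(days=5))

deleted = prune_older_than(conn, 120, now)
remaining = [row[0] for row in conn.execute("SELECT amount FROM bids ORDER BY id")]
print(deleted, remaining)
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the pruning rule easy to test; a scheduled job would pass the current time.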
