Mining Lectures
 Marcel Caraciolo - @marcelcaraciolo




                                       1
Who am I?                     Marcel Pinheiro Caraciolo

 Brazilian, lover of crabs

 Director of R&D at the Brazilian startup Orygens
 M.Sc. candidate in Data Mining and Recommender Systems
 Current moderator of the local Python User Group in Pernambuco

           Interested in machine learning,
      recommender systems and mobile computing

  Blogging about machine learning with Python since 2008
              http://aimotion.blogspot.com

 Young apprentice in Python programming since 2008.



                                                                  2
How did I start this analysis?




          24 hours ago...

                               3
Question

          How were the topics
distributed across the SciPy Conference
           General Sessions?




                                          4
Scraping the SciPy Conference




    Small web crawler for extracting the
            approved lectures
         urllib2, re, BeautifulSoup...
                                           5
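
A minimal sketch of the extraction step, under loud assumptions: the real conference page markup is not shown in these slides, so the `<div class="talk">` layout, the sample titles, and the names `TalkParser` and `extract_talks` are all hypothetical. It also swaps the tools named on the slide (`urllib2`, BeautifulSoup) for Python 3's standard-library `html.parser` applied to an inline string, so it runs without a network connection.

```python
from html.parser import HTMLParser

# Hypothetical markup; the real schedule page is not reproduced in the slides.
SAMPLE_HTML = """
<div class="talk"><h3>Talk A</h3><span class="author">Author One</span></div>
<div class="talk"><h3>Talk B</h3><span class="author">Author Two</span></div>
"""


class TalkParser(HTMLParser):
    """Collect title/author pairs from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.talks = []       # finished {"title": ..., "author": ...} dicts
        self._field = None    # field that the next text node belongs to
        self._current = {}    # talk currently being assembled

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "talk":
            self._current = {}
        elif tag == "h3":
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "author":
            self._field = "author"

    def handle_data(self, data):
        # Keep only text that appears inside a field we are tracking.
        if self._field and data.strip():
            self._current[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("h3", "span"):
            self._field = None
        elif tag == "div" and self._current:
            self.talks.append(self._current)
            self._current = {}


def extract_talks(html):
    """Return a list of talk dicts parsed from an HTML string."""
    parser = TalkParser()
    parser.feed(html)
    return parser.talks


talks = extract_talks(SAMPLE_HTML)
```

In a real crawl the HTML string would come from an HTTP request to the conference site rather than from a constant.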
Summary


41        Lectures


820   minutes in total




                       6
It means...



≈   4100 tweets posted.



                           7
Or watch...



    Star Wars Trilogy


         2x

                        8
Or finish the Super Mario game...




          82 x!
                               9
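
The comparisons on the last few slides all divide the same 820-minute total by some unit. The sketch below backs out the per-unit figures those slides imply. Only the totals (41 lectures, 820 minutes, 4100 tweets, 2x, 82x) come from the slides; the per-unit values are derived here and are not stated by the author.

```python
# Totals taken from the slides.
LECTURES = 41
TOTAL_MINUTES = 820
TWEETS = 4100

# Derived values (implied by the slides, not stated on them).
minutes_per_lecture = TOTAL_MINUTES / LECTURES   # average talk length
tweets_per_minute = TWEETS / TOTAL_MINUTES       # implied tweeting rate
trilogy_minutes = TOTAL_MINUTES / 2              # one trilogy viewing
mario_run_minutes = TOTAL_MINUTES / 82           # one Super Mario run

print(minutes_per_lecture, tweets_per_minute, trilogy_minutes, mario_run_minutes)
```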
Now in our language...
Or open Eclipse...




         2x!


                             10
Most popular Authors

  Dharhas Pothina - 3


  Wes McKinney - 2


   All the others - 1



                        11
Playing with the text...




The most frequent words at the conference
                 nltk, re
                                            12
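
A minimal sketch of the word-frequency step, assuming the input is a list of talk titles or abstracts. The slide names `nltk` and `re`; to stay dependency-free this sketch uses only `re` and `collections.Counter`, with a tiny hand-written stopword set standing in for a real stopword corpus. The sample titles and the function name `top_words` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; a real run would use a fuller list.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def top_words(texts, n=3):
    """Return the n most common non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical titles, not taken from the conference program.
titles = [
    "Data analysis with Python",
    "Parallel Python on the GPU",
    "Visualization of data in Python",
]
top = top_words(titles, n=2)
```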
But let’s take a deeper look.
 Clustering algorithm: K-Means

 Visualization tool: Ubigraph




                                           13
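
A minimal sketch of K-Means over bag-of-words vectors, written in pure Python so it runs without extra libraries. The slides do not say which implementation, features, or initialization were used, so everything here is an assumption: the sample documents are hypothetical, the vectors are raw word counts, and the centroids are seeded with a deterministic farthest-first rule rather than random starts.

```python
from collections import Counter


def vectorize(docs):
    """Turn each document into a word-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    """Farthest-first seeding: start at the first vector, then repeatedly
    add the vector farthest from the centroids chosen so far."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: sq_dist(v, centroids[i])) for v in vectors
        ]
        if new_labels == labels:   # assignments stopped changing
            break
        labels = new_labels
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:            # leave an empty cluster's centroid as is
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical mini-corpus of keyword strings, one per lecture.
docs = [
    "matplotlib plotting ipython",
    "ipython matplotlib notebook",
    "gpu parallel performance",
    "parallel performance cluster gpu",
]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
```

With this toy input the two plotting documents land in one cluster and the two parallelism documents in the other; real abstracts would need stopword removal and probably TF-IDF weighting before clustering.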
Distribution of the Lectures

 Basic frameworks
       matplotlib, ipython

 Building frameworks
       performance, models, web services

 Parallelism
       performance, gpu, statistical

 Visualization
       data analysis, statistical

 NumPy
       toolkits using NumPy

                                                                                 14
To sum up...
Mining English text is so
     much easier!
        Submit your work too!

 Spread scientific Python throughout the
             community

I hope to be back at SciPy next year!



                                          15
https://github.com/marcelcaraciolo/clustering_scipy

        Mining Lectures
           Marcel Caraciolo - @marcelcaraciolo



                                                      16
