A Virtual Infrastructure for Data intensive Analysis (VIDIA)


The presentation gives an overview of the establishment of a collaborative virtual community, focusing initially on data-intensive computing education in the social sciences.

Published in: Education

  1. 1. COTE Fellow Chat
  2. 2. Community of Practice • 1. Learn More: http://commons.suny.edu/cote/ • 2. Join: http://commons.suny.edu/cote/join-community-of-practice/ • 3. Submit a Proposal: http://bit.ly/COTEproposal
  3. 3. Jim Greenberg, Director, Teaching Learning Technology Center (TLTC), SUNY Oneonta • Open SUNY Fellow Role: Innovator and/or Researcher • Topic: A Virtual Infrastructure for Data intensive Analysis (VIDIA) • Theme: Research & Innovation • COTE NOTE: http://bit.ly/cotenotevidia
  4. 4. Providing Undergraduates with a Virtual Infrastructure for Data Intensive Analysis • Jeanette Sperhac and Steven M. Gallo, SUNY Buffalo • Brian Lowe and Jim Greenberg, SUNY Oneonta
  5. 5. The VIDIA Team: Gregory Fulkerson, Ph.D., Assistant Professor of Sociology; James Greenberg, Director, TLTC; Brett Heindl, Ph.D., Assistant Professor of Political Science; Achim Koeddermann, Ph.D., Associate Professor of Philosophy and Env. Sciences; Brian M. Lowe, Ph.D., Associate Professor of Sociology; Diana Moseman, Instructional Designer/Programmer, TLTC; Harry Pence, Ph.D., Distinguished Professor of Chemistry; Tim Ploss, Instructional Designer; Bill Wilkerson, Ph.D., Associate Professor of Political Science; Steven M. Gallo, Lead Software Engineer, CCR, University at Buffalo; Jeanette Sperhac, Scientific Programmer, CCR, University at Buffalo
  6. 6. Adopting social media analysis at Oneonta ▪ Social Sciences approached Oneonta IT to build an analysis environment ▪ The needed resources did not exist in house ▪ IITG connected Oneonta with CCR
  7. 7. Case Study: Society and Animals ▪ 200 level Sociology course; social science majors without formal programming training ▪ Comparative/historical, social scientific, journalistic ▪ Goal: students gather, organize and interpret mined social media
  8. 8. Project Goals ▪ Achieving critical thinking through engaging texts ▪ Deploying ideas from texts in new directions ▪ Applying theoretical perspectives and concepts ▪ Achieving student engagement through data-driven research
  9. 9. Collaboration Goals ▪ Create a social sciences big data discovery environment ▪ Support social science teaching and research ▪ Leverage High Performance Computing (HPC) resources ▪ Support coursework at Oneonta, Spring 2014
  10. 10. Introducing VIDIA: Virtual Infrastructure for Data Intensive Analysis
  11. 11. VIDIA • Deployed using Purdue's HUBzero platform: ▪ Provide workflow tools for data analysis ▪ Offer access to computing resources ▪ Curate large datasets of social scientific interest
  12. 12. Data Mining Workflow Tools ▪ Graphical User Interface ▪ Powerful, easy to use ▪ Open source, extensible
  13. 13. Dataset Access • Curate Big Data for social science: ▪ Social data: Twitter feeds, etc. ▪ Partnerships with social dataset providers ▪ Enable students to capture their own data
  14. 14. HUBzero Platform • Open source platform offers: ▪ Access via web browser ▪ Computation, collaboration, software tool development ▪ Simplified access to remote HPC resources ▪ Upload and sharing of course materials ▪ And more...
  15. 15. Teaching on HUBzero ▪ Unified platform for coursework ▪ Easy on IT staff: obviates software installs on individual student workstations ▪ Access anytime, anywhere ▪ Resources can be selectively secured ▪ Students may access resources after course conclusion
  16. 16. User Dashboard
  17. 17. Collaborative Features • Any registered user can manage and control access to their own: ▪ Groups: assemble users with common interests ▪ Projects: assemble resources for a common goal ▪ Tools: development, deployment, simulations
  18. 18. Groups • HUBzero groups can: ▪ Control access to resources ▪ Share and distribute content ▪ Allow users with common interests to associate • Any registered user may create a group
  19. 19. Resources
  20. 20. Deployed Tool • Orange Data Mining Tool
  21. 21. Computing Environment: User's Workstation (web browser); HUBzero server; Data storage; Cluster resources
  22. 22. VIDIA Hardware • HUBzero and webserver: Dell PowerEdge R720xd ▪ 2x 6-core Intel Xeon E5-2630 (2.30 GHz, 15 MB cache) ▪ 48 TB raw (~36 TB usable) SATA disk space ▪ 128 GB memory (16 x 8 GB, 1333 MHz DIMMs) • Analysis: 4x Dell PowerEdge R520 ▪ 6-core Intel Xeon E5-2430 (2.20 GHz, 15 MB cache) ▪ 4.8 TB raw (~4 TB usable) SAS disk space ▪ 96 GB memory (6 x 16 GB, 1600 MHz DIMMs)
  23. 23. VIDIA: Spring 2014 ▪ Supported three SUNY Oneonta courses ▪ Deployed three data analysis tools ▪ 76 student users registered (themselves!) ▪ Assigned student tasks: k-Means Clustering, Word Co-Occurrences ▪ Enabled 25+ simultaneous tool sessions
  24. 24. RapidMiner Sessions on VIDIA

      | Month | Tool Users | Tool Sessions Run | Tool Walltime | Tool CPU Time |
      |---|---|---|---|---|
      | April 2014 | 77 | 568 | 41.7 days | 21.7 hours |
      | May 2014 (as of 8 May) | 80 | 849 | 61.0 days | 23.7 hours |
  25. 25. Challenges ▪ User training: learning the platform and tools ▪ Technical performance details ▪ HUBzero updates ▪ Browser compatibility ▪ Dataset acquisition
  26. 26. What's next? ▪ SUNY Oneonta coursework, Fall 2014 ▪ Deploy additional data mining tools ▪ Integrate HUBzero collaboration features ▪ Roll out to other SUNY comprehensive colleges (discussion underway with SUNY Brockport)
  27. 27. Thank You! Join the SUNY Learning Commons http://commons.suny.edu for access to the COTE Community group to continue the conversation! • View a Recording of today’s Fellow Chat: http://bit.ly/COTEfellowchatRECORDING • View the COTE NOTE: http://bit.ly/cotenotevidia • Become an Open SUNY Fellow: http://bit.ly/joinCOTE • Submit a Proposal: http://bit.ly/COTEproposal
  28. 28. Next Fellow Chat • Open SUNY Fellow: Rhianna Rogers, Assistant Professor, SUNY Empire State College • Open SUNY Fellow Role: Innovator or Researcher • Topic: Fostering Creativity in Learning: How to Effectively Incorporate OERs into Assignments • Date: Thursday, August 7 & 14, 2014, 12:00 PM • Register: http://www.cvent.com/d/t4qdfw
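
Slide 23 names two assigned student tasks, k-Means Clustering and Word Co-Occurrences. The deck does not show how they were carried out; students worked in graphical data mining tools, not in code. For readers unfamiliar with the two techniques, the following is a minimal plain-Python sketch of what each one computes. The function names `cooccurrences` and `kmeans`, the sample posts, and the sample points are all invented for illustration and are not taken from the course or from any VIDIA tool.

```python
from collections import Counter
from itertools import combinations
import random


def cooccurrences(docs):
    """Count how often each unordered word pair appears in the same document."""
    counts = Counter()
    for doc in docs:
        # Unique lowercase words, sorted so each pair has one canonical order.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's algorithm on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels


# Hypothetical sample posts (assumption: whitespace-separated, no punctuation).
posts = [
    "adopt shelter dogs",
    "shelter cats need homes",
    "adopt shelter cats",
]
pairs = cooccurrences(posts)
print(pairs[("cats", "shelter")])  # 2: the pair appears in two posts

# Hypothetical 2-D points forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, labels = kmeans(points, k=2)
print(sorted(centroids))
```

In practice, text would first be tokenized and turned into numeric features (for example word counts) before clustering; the sketch skips that step and clusters ready-made points to keep the two ideas separate.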