Big Data

Big Data at the Barcelona Supercomputing Center

Usage Rights

© All Rights Reserved

    Big Data Presentation Transcript

    • Activity in Big Data (26/03/2012)
    • Previous work in Big Data
    • Scenario: Application placement and scheduling: MapReduce. Data management: Key-value storage. Target applications: Data Analytics, Bioinformatics.
    • Big Data papers
      High-level performance goals and Big Data
        • Resource-aware Adaptive Scheduling for MapReduce Clusters. J. Polo, C. Castillo, D. Carrera, Y. Becerra, I. Whalley, M. Steinder, J. Torres, E. Ayguadé. In the ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011).
        • Performance-Driven Task Co-Scheduling for MapReduce Environments. J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, I. Whalley. In the 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010).
      Hybrid hardware and Big Data
        • Speeding Up Distributed MapReduce Applications Using Hardware Accelerators. Y. Becerra, V. Beltran, D. Carrera, M. González, J. Torres, E. Ayguadé. In the 38th International Conference on Parallel Processing (ICPP 2009).
        • Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters. J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres, E. Ayguadé. In the 39th International Conference on Parallel Processing (ICPP 2010).
      Big Data and energy
        • Towards Energy-Efficient Management of MapReduce Workloads. J. Polo, Y. Becerra, D. Carrera, V. Beltran, J. Torres, E. Ayguadé. In the First International Conference on Energy-Efficient Computing and Networking (e-Energy 2010).
        • GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks. Í. Goiri, K. Le, T. D. Nguyen, J. Guitart, J. Torres, R. Bianchini. In the European Conference on Computer Systems (EuroSys 2012).
    • Ongoing research in Big Data
    • New challenges in Big Data: our vision
      [Figure: execution time vs. data volume, from GBs to PBs. Conventional storage systems; large data sets too big for conventional storage/tools; new, growing requirements for real-time decisions.]
    • New challenges in Big Data: our approach
      [Figure: execution time vs. data volume, from GBs to PBs. Regions: conventional storage systems; MapReduce & NoSQL; in-memory.]
    • Ongoing research projects (technology: goal; use case; collaborators involved)
      MapReduce & NoSQL
        • Hadoop & Cassandra: snapshot isolation (support for online data generation); use case: Data Analytics; collaborators: IBM.
        • Hadoop & Cassandra: high-level performance goals and automatic query configuration; use case: Data Analytics and Bioinformatics (support for drug discovery); collaborators: Life Science Dept. (BSC).
        • Cassandra: automatic configuration and data organization to meet high-level performance goals; use case: Bioinformatics (support for drug discovery); collaborators: Life Science Dept. (BSC).
      In-Memory
        • PIMD: Bioinformatics workflows (index construction, alignment, sorting, data processing); use case: Bioinformatics (genomic sequencing); collaborators: IBM and Life Science Dept. (BSC).
    • Next planned research in Big Data
    • New challenges in Big Data: our approach
      [Figure: execution time vs. data volume, from GBs to PBs. Regions: conventional storage systems; MapReduce & NoSQL; in-memory; in-memory RDBMS; application storage hierarchy management.]
    • Our Big Data resource management picture
      Applications: Data Analytics, Genomic Sequencing, Drug Discovery, Air Quality Forecasting, Business
      Intelligent resource management:
        • Highly scalable data management: automatic data organization and configuration (NoSQL/in-memory/hierarchy management)
        • Application placement and scheduling: multi-job performance goals, resource awareness, hybrid hardware
        • Performance goals to meet, such as consistency, availability, partition tolerance, energy consumption, response time, …
      Infrastructure: in-memory DB, NoSQL, SQL; heterogeneous compute nodes; storage hierarchy (a mix of mechanical + flash + SCM)
    • Collaboration with other BSC departments
      Heterogeneous application flows (domain-specific, differentiated resource requirements)
      Legacy code (MPI), programming models, eDSL
      Data-centric resource manager
      Custom data management, in-memory key/value, NoSQL, persistent objects, file systems
      Storage (a mix of mechanical + flash + SCM) and compute nodes
    • Autonomic & eBusiness Group
    • Group goal: to research autonomic and intelligent resource management for today's business applications. The objective is to create new middleware-level components that provide holistic solutions for some of the new IT challenges in the industry.
      [Diagram: autonomic and intelligent resource management at the center, surrounded by cloud computing, sustainable computing, Big Data, business analytics, and high performance computing.]
    • Current main interrelated areas: BLO-driven management, high-performance architectures for Big Data, exploiting heterogeneous hardware, embedded domain-specific languages for HPC, online predictors, massively distributed data stores, service-aware VM management, workload management, and energy-aware management, all around middleware that provides a holistic management solution.
    • Group members: www.bsc.es/eBusiness
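The Scenario slide pairs MapReduce application placement and scheduling with key-value data management. As a reminder of the programming model those schedulers manage, here is a minimal single-process sketch of MapReduce (map, shuffle, reduce) for word counting. It is a generic illustration, not the group's framework and not Hadoop's API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every whitespace-separated word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Hypothetical driver: in a cluster, each call below would run as distributed tasks.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data at BSC", "Big Data and MapReduce"]))
```

Running it prints `{'big': 2, 'data': 2, 'at': 1, 'bsc': 1, 'and': 1, 'mapreduce': 1}`. In a real cluster the shuffle moves intermediate data across the network, which is why where map and reduce tasks are placed, and when they are scheduled, matters for performance.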
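One ongoing project targets snapshot isolation for Hadoop & Cassandra in support of online data generation. The slides do not describe the mechanism, so the following is only a generic sketch of snapshot isolation over a toy multi-version key-value store, assuming a single process, a logical commit clock, and first-committer-wins conflict detection. `MVStore`, `Txn`, and `WriteConflict` are hypothetical names, not Cassandra or Hadoop APIs.

```python
from typing import Any, Dict, List, Optional, Tuple


class WriteConflict(Exception):
    """Raised when another transaction committed the same key after our snapshot."""


class MVStore:
    """Toy multi-version key-value store (hypothetical, single process)."""

    def __init__(self) -> None:
        # key -> list of (commit_ts, value), appended in increasing commit_ts order
        self._versions: Dict[str, List[Tuple[int, Any]]] = {}
        self._clock = 0  # logical timestamp of the latest commit

    def begin(self) -> "Txn":
        # A transaction's snapshot is everything committed up to now.
        return Txn(self, start_ts=self._clock)

    def read_at(self, key: str, ts: int) -> Optional[Any]:
        # Return the newest version committed at or before ts.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def _latest_commit_ts(self, key: str) -> int:
        versions = self._versions.get(key)
        return versions[-1][0] if versions else 0

    def commit(self, txn: "Txn") -> int:
        # First-committer-wins: abort if a written key changed since the snapshot.
        for key in txn.writes:
            if self._latest_commit_ts(key) > txn.start_ts:
                raise WriteConflict(key)
        self._clock += 1
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock


class Txn:
    def __init__(self, store: MVStore, start_ts: int) -> None:
        self.store = store
        self.start_ts = start_ts
        self.writes: Dict[str, Any] = {}  # buffered until commit

    def get(self, key: str) -> Optional[Any]:
        if key in self.writes:  # a transaction sees its own writes
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def put(self, key: str, value: Any) -> None:
        self.writes[key] = value

    def commit(self) -> int:
        return self.store.commit(self)


if __name__ == "__main__":
    store = MVStore()
    setup = store.begin()
    setup.put("x", 1)
    setup.commit()

    reader = store.begin()
    w1, w2 = store.begin(), store.begin()
    w1.put("x", 2)
    w1.commit()
    print(reader.get("x"))  # 1: the reader keeps its snapshot
    w2.put("x", 3)
    try:
        w2.commit()
    except WriteConflict as exc:
        print("aborted:", exc)
```

The write-conflict check prevents lost updates, but because only write-write conflicts are detected, snapshot isolation still permits write skew; a production design would have to decide whether that is acceptable for its workload.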
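Several slides refer to scheduling MapReduce jobs against high-level performance goals, for example multi-job performance goals in application placement and scheduling. A simplified way to illustrate that idea, and explicitly not the algorithm from the cited papers, is to estimate how many concurrent task slots each job needs to finish its pending tasks before its completion-time goal, assuming uniform task durations that run in waves, and then grant slots tightest goal first. The `Job` fields, `slots_needed`, and `allocate` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    pending_tasks: int        # tasks not yet finished
    mean_task_seconds: float  # observed mean task duration (assumed uniform)
    seconds_to_goal: float    # time left until the completion-time goal


def slots_needed(job: Job) -> int:
    """Minimum concurrent slots so the pending tasks finish before the goal."""
    if job.pending_tasks == 0:
        return 0
    # Number of full task "waves" that fit before the goal.
    waves = math.floor(job.seconds_to_goal / job.mean_task_seconds)
    if waves < 1:
        # The goal can no longer be met; request full parallelism.
        return job.pending_tasks
    return math.ceil(job.pending_tasks / waves)


def allocate(jobs: List[Job], total_slots: int) -> Dict[str, int]:
    """Grant estimated needs, tightest goal first, until slots run out."""
    allocation: Dict[str, int] = {}
    free = total_slots
    for job in sorted(jobs, key=lambda j: j.seconds_to_goal):
        grant = min(slots_needed(job), free)
        allocation[job.name] = grant
        free -= grant
    return allocation


if __name__ == "__main__":
    jobs = [Job("A", 100, 30.0, 300.0), Job("B", 40, 60.0, 600.0)]
    print(allocate(jobs, total_slots=12))  # {'A': 10, 'B': 2}
```

In this example job B receives only 2 of the 4 slots it needs, so a real scheduler would still have to choose between preempting, reprioritizing, or accepting a missed goal; slots left over after every need is met are simply not assigned in this sketch.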