
Big Data



Big Data at the Barcelona Supercomputing Center

Published in: Technology, Business


  1. Activity in Big Data (26/03/2012)
  2. Previous work in Big Data
  3. Scenario
     • Application placement and scheduling: MapReduce
     • Data management: Key-Value storage
     • Target applications: Data Analytics, Bioinformatics
  4. Big Data Papers
     High level performance goals and Big Data:
     • Resource-aware Adaptive Scheduling for MapReduce Clusters. J. Polo, C. Castillo, D. Carrera, Y. Becerra, I. Whalley, M. Steinder, J. Torres, E. Ayguadé. In the ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011).
     • Performance-Driven Task Co-Scheduling for MapReduce Environments. J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, I. Whalley. In the 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010).
     Hybrid Hardware and Big Data:
     • Speeding Up Distributed MapReduce Applications Using Hardware Accelerators. Y. Becerra, V. Beltran, D. Carrera, M. González, J. Torres, E. Ayguadé. In the 38th International Conference on Parallel Processing (ICPP 2009).
     • Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters. J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres, E. Ayguadé. In the 39th International Conference on Parallel Processing (ICPP 2010).
     Big Data and Energy:
     • Towards Energy-Efficient Management of MapReduce Workloads. J. Polo, Y. Becerra, D. Carrera, V. Beltran, J. Torres, E. Ayguadé. In the First International Conference on Energy-Efficient Computing and Networking (e-Energy 2010).
     • GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks. Í. Goiri, K. Le, T. D. Nguyen, J. Guitart, J. Torres, R. Bianchini. In the European Conference on Computer Systems (EuroSys 2012).
  5. Ongoing research in Big Data
  6. New challenges in Big Data: our vision
     [Figure: execution time versus data volume (GBs to PBs). Labels: conventional storage systems; large data sets, too big for conventional storage/tools; new, growing requirements for real-time decisions.]
  7. New challenges in Big Data: our approach
     [Figure: execution time versus data volume (GBs to PBs). Labels: conventional storage systems; MapReduce & NoSQL; in-memory.]
  8. Ongoing research projects
     MapReduce & NoSQL:
     • Technology involved: Hadoop & Cassandra. Goal: snapshot isolation (support to online data generation). Use case: Data Analytics. Collaborators: IBM.
     • Technology involved: Hadoop & Cassandra. Goal: high level performance goal and automatic query configuration. Use case: Data Analytics and Bioinformatics (support to drug discovery). Collaborators: Life Science Dept. (BSC).
     • Technology involved: Cassandra. Goal: automatic configuration and data organization to meet high level performance goals. Use case: Bioinformatics (support to drug discovery). Collaborators: Life Science Dept. (BSC).
     In-Memory:
     • Technology involved: In-Memory PIMD. Goal: Bioinformatics workflows (index construction, alignment, sorting, data processing). Use case: Bioinformatics (genomic sequencing). Collaborators: IBM and Life Science Dept. (BSC).
  9. Next planned research in Big Data
  10. New challenges in Big Data: our approach
     [Figure: execution time versus data volume (GBs to PBs). Labels: conventional storage systems; MapReduce & NoSQL; in-memory; in-memory RDBMS; application; storage hierarchy management.]
  11. Our Big Data resource management picture
     • Applications: Data Analytics, Genomic Sequencing, Drug Discovery, Air Quality Forecasting, Business
     • Intelligent Resource Management:
       • Highly scalable data management: automatic data organization and configuration (NoSQL/In-Memory/Hierarchy management)
       • Application placement and scheduling: multi-job performance goals, resource awareness, hybrid hardware
       • To meet performance goals such as: Consistency, Availability, Partitioning Tolerance, Energy Consumption, Response Time, …
     • Infrastructure: In-Memory DB, NoSQL, SQL; Heterogeneous Compute Nodes; Storage Hierarchy: Mix of Mechanical + Flash + SCM
  12. Collaboration with other BSC departments
     • Heterogeneous Application Flows (Domain Specific, Differentiated Resource Requirements): Legacy Code (MPI), Prog. Models, eDSL
     • Data-centric Resource Manager
     • Data layer: Custom Data Mgmt., In-mem Key/Val, NoSQL, Persistent Objects, FileSystems
     • Storage: Mix of Mechanical + Flash + SCM
     • Compute nodes
  13. Autonomic & eBusiness Group
  14. Group Goal
     To research autonomic and intelligent resource management for today's business applications. The objective is to create new components at middleware level that provide holistic solutions for some of the new IT challenges in the industry.
     [Figure: Autonomic and Intelligent Resource Management at the center, surrounded by Cloud Computing, Sustainable Computing, Big Data, High Performance Computing, and Business Analytics.]
  15. Current main interrelated areas
     • BLO-driven Management
     • High performance architectures for Big Data
     • Exploiting Heterogeneous Hardware
     • Embedded Domain Specific Languages for HPC
     • Online predictors
     • Massively Distributed Data Stores
     • Service-aware VM Management
     • Workload Management
     • Energy-aware Management
     • Middleware that provides a holistic solution
  16. Group members