Accelerating Data Management - Dave Fellinger - RDAP12


Accelerating Data Management
Dave Fellinger, DataDirect Networks
Presentation at Research Data Access & Preservation Summit
22 March 2012

  1. 1. 03.20.2012
     Accelerating Data Management
     Research Data Access and Preservation Summit
     March 22, 2012
     Dave Fellinger
     Chief Scientist, Office of Strategy & Technology
     ©2012 DataDirect Networks. All Rights Reserved.
  2. 2. Data Management
     Data management can consist of:
     ► Access management
        • Client access and maintenance
        • User permissions
        • Policies including data security
        • Policies including data continuance
     ► Data manipulation utilizing microservices
        • Data checking
        • Implementing processes such as reduction and filtering
        • Data registration and metadata extraction
        • Data migration
  3. 3. Sources of Service Latency
     ► Hardware Chain
        • Disk drive servo operation
        • Multiple SCSI layers
        • Multiple bus transitions
        • Memory bandwidth limitations
        • Network service latencies
     ► Software Chain
        • Memory copies
        • Kernel operations
        • Layers of consecutive operations including the service of V-nodes, I-nodes and FAT
        • Serial data transport processes
  4. 4. What is 'Embedded Processing'? And why?
     ► Do data-intensive processing as 'close' to the storage as possible.
        • Bring computing to the data instead of bringing data to computing
     ► HADOOP is an example of this approach.
     ► Why Embedded Processing?
     ► Moving data is a lot of work
     ► A lot of infrastructure needed
     [Diagram: Client sends a request to storage (red ball); Storage responds with data (blue ball)]
     But what we really want is:
     ► So how do we do that?
  5. 5. Storage with Virtual Machines
     [Diagram labels: 8 x IB QDR/10GbE Host Ports (No Fibre Channel); Interface Virtualization; four Virtual Machines; System memory; RAID Processors; High-Speed Cache; Cache Link; Internal SAS Switching]
  6. 6. Repurposing Interface Processors
     ► In the block based SFA10K platform, the IF processors are responsible for mapping Virtual Disks to LUNs on FC or IB
     ► In the SFA10KE platform the IF processors are running VMs
     ► The OS running on those VMs uses a driver to access the RAID processors directly
     ► RAID processors place data (or use data) directly in the VM's memory
     ► One hop from disk to VM's memory
     ► Now the storage is no longer a block device
     ► It is a storage appliance with processing capabilities
  7. 7. Example configuration
     ► Now we can put iRODS inside the RAID controllers
     ► The iCAT processor has lots of memory and SSDs for DB storage
     ► Either use all VMs for iRODS or add a parallel filesystem such as GPFS for fast scratch
     ► The filesystem uses SAS for frequently used files and SATA for the rest
     ► The following example is a mix of iRODS with GPFS
        • This gives iRODS the fastest access to the storage because it doesn't have to go onto the network to access a fileserver. It lives inside the fileserver.
        • The same filesystem is also visible from an external compute cluster via GPFS running on the remaining VMs
     ► This is only one controller; the 4 VMs on the other controller need some work too
        • They see the same storage and can access it at the same speed.
  8. 8. Example configuration
     [Diagram labels: 8x 10GbE Host Ports; Interface Virtualization; System memory; RAID Processors; High-Speed Cache; Cache Link; Internal SAS Switching]
     Virtual Machines:
        • Virtual Machine 1: Linux, iCAT, SFA Driver, 16 GB memory allocated
        • Virtual Machine 2: Linux, GPFS, SFA Driver, 8 GB memory allocated
        • Virtual Machine 3: Linux, GPFS, SFA Driver, 8 GB memory allocated
        • Virtual Machine 4: Linux, GPFS, SFA Driver, 8 GB memory allocated
     RAID sets:
        • RAID sets with 2TB SSD
        • RAID sets with 300TB SATA
        • RAID sets with 30TB SAS
  9. 9. Running Micro Services as a VM
     ► Since iRODS runs inside the controller we can now run iRODS MicroServices right on top of the storage.
     ► The storage has become an iRODS appliance 'speaking' iRODS natively.
     ► iRODS can execute “in-band” operations, registering data and extracting metadata during ingest.
     ► We could create 'hot' directories that kick off processing depending on the type of incoming data.
  10. 10. Conclusion
     ► The elimination of software or hardware layers increases reliability and decreases latency.
     ► Automated, policy based services can be run within a storage environment.
     ► Policy execution can include data migration, data checking, or data manipulation by calling or scheduling microservices.
     ► Data intensive operations executed by a server can cause network traffic and SCSI bus transaction latency.
     ► Moving these operations to the storage is efficient and easily managed.
  11. 11. Questions?
     DataDirect Networks, Information in Motion, Silicon Storage Appliance, S2A, Storage Fusion Architecture, SFA, Storage Fusion Fabric, Web Object Scaler, WOS, EXAScaler, GRIDScaler, xSTREAMScaler, NAS Scaler, ReAct, ObjectAssure, In-Storage Processing and SATAssure are all trademarks of DataDirect Networks. Any unauthorized use is prohibited.
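
Slide 9 suggests 'hot' directories that kick off processing depending on the type of incoming data. The following is a minimal Python sketch of that dispatch idea only. It does not use iRODS or any DataDirect API: the handler functions, the extension-to-handler table and the name process_hot_directory are hypothetical stand-ins for microservices, and a real deployment would express this as storage-side policy rather than a polling function.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, List, Set, Tuple


# Hypothetical stand-ins for microservices (assumptions, not iRODS calls).
def extract_metadata(path: Path) -> str:
    """Pretend metadata extraction: report the file name and size in bytes."""
    return f"metadata:{path.name}:{path.stat().st_size}"


def check_data(path: Path) -> str:
    """Pretend data checking: compute a SHA-256 digest of the file contents."""
    return "sha256:" + hashlib.sha256(path.read_bytes()).hexdigest()


# Assumed mapping from incoming file type (by extension) to processing step.
HANDLERS: Dict[str, Callable[[Path], str]] = {
    ".csv": extract_metadata,
    ".bin": check_data,
}


def process_hot_directory(hot_dir: Path, seen: Set[str]) -> List[Tuple[str, str]]:
    """Run the matching handler once for each new file in hot_dir.

    `seen` records file names already visited so that repeated scans
    do not process the same file twice.
    """
    results: List[Tuple[str, str]] = []
    for path in sorted(hot_dir.iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        seen.add(path.name)
        handler = HANDLERS.get(path.suffix.lower())
        if handler is not None:  # unknown file types are skipped
            results.append((path.name, handler(path)))
    return results
```

Calling process_hot_directory repeatedly with the same `seen` set approximates a watcher: each scan handles only files that arrived since the previous one. Running such a loop inside the storage controller, as the slides propose, would avoid moving the file contents across the network before processing.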