Aspera bt-big-data-cloud

Published in: Technology

  • Due to many uncontrolled parameters such as …, it is difficult to compare the performance of different protocols. For the same protocol, its performance can vary … To exclude these uncontrolled factors, we repeat the same test many times over a relatively long period and then compare the mean, worst-case and best-case results. We also manually verified the performance on the iPhone …
  • Aspera products used: Connect Server, SDK
  • Fred Hutch: 3,000 employees, with researchers doing a variety of research. Raw data comes off mass spectrometers: it lands on the HDD of the instrument PC and is copied to file servers (Sun). LabKey Server proteomics pipeline: researchers log into the web app, and the web server mounts the Sun server. Researchers from other institutions need to validate results by comparing raw data; customers can log into the proteomics pipeline to compare results. Researchers are all over the world: Broad, Harvard, Berkeley, and other countries. Data sets range from a few hundred MB to 25 GB, mainly in the 1 to 5 GB range. About 10 percent are compared remotely; for some labs it is all remote, while other labs don't collaborate remotely. A data set consists of files of 500 MB to 1 GB each, × 2 to 25 files. The pipeline converts raw files to mzXML, so a 500 MB file turns into a 1.5 GB file, which is used only for conversion to results. Next-gen DNA research: in-house pipeline with collaborators outside the Hutch, using Illumina pipeline software customized with scripts; jobs are submitted to the cluster and results are sent out. For example, when working on publishing a paper, they would share pretty much everything: all results (not necessarily all data). Storage: Sun server mounting 3PAR; HPC cluster with nodes mounting storage; directories mounted from servers running Aspera (e.g. Faspex). Clients: a diversity of Linux, Windows and Mac machines. Aspera Developers Network: Fget, fsend. Other use case: long-term collaboration between researchers, one-to-one or one-to-many, e.g. collaborating on papers. Proteomics and genomics are coming in as well.
  • First bottleneck's solution: transfer bulk data over the WAN using Aspera fasp, which overcomes TCP limitations under network latency and packet loss. Aspera solutions yield 100x performance improvements.
  • fasp technology and software suite for predictable, high-speed file-based transfer: unique in the world, patented transport technology providing unparalleled speed, efficiency, concurrency and bandwidth control; fully integrated, cross-platform software suite for interoperable file transfer (any size, any distance, any network bandwidth); secure standard for the industry; integrated global management, tracking and reporting; extends to all cloud-based storage; extends to all major mobile platforms. Key business areas enabled with the latest Aspera software: high-speed content delivery, synchronization and distribution (including cloud!); ad hoc content ingest, delivery and collaboration; integrated file workflow automation and orchestration; mobile platform file delivery.
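The first note describes repeating each transfer test many times and comparing the mean, worst and best results. A minimal Python sketch of that summary step follows; the throughput samples are hypothetical placeholders, not measurements from this presentation.

```python
from statistics import mean


def summarize_runs(throughputs_mbps):
    """Summarize repeated throughput measurements (in Mbps) for one protocol."""
    if not throughputs_mbps:
        raise ValueError("need at least one measurement")
    return {
        "runs": len(throughputs_mbps),
        "mean": mean(throughputs_mbps),
        "worst": min(throughputs_mbps),  # lowest throughput is the worst case
        "best": max(throughputs_mbps),   # highest throughput is the best case
    }


if __name__ == "__main__":
    # Hypothetical samples for one protocol on one network path.
    samples = [4.2, 5.1, 3.8, 4.9, 4.5]
    print(summarize_runs(samples))
```

Comparing protocols then means comparing these summaries side by side, rather than single runs that may have hit an unusually good or bad moment on the network.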

    1. Enabling the Big Data Cloud for HPC and Collaboration with High-Speed Data Transport
    2. PRESENTER AND AGENDA
       Presenter: Daniel Kumi, Director, New Market Development, daniel@asperasoft.com
       Agenda:
       • Who and Why Aspera?
       • WAN Transport
       • Wireless Transport
       • Customer Use Cases
       • Cloud and Big Data: Transfer Challenges for HPC and Collaboration
       • Aspera On Demand
       • BT-Aspera Discussion
    3. ASPERA’S MISSION: Creating next-generation transport technologies that move the world’s digital assets at maximum speed, regardless of file size, transfer distance and network conditions.
    4. Aspera: moving the world’s digital assets at maximum speed
       • 50% YOY growth in revenue and employees
       • Over 10,000 licenses sold, and over 1,500 customers worldwide
       • Expanded to Asia-Pacific and Latin America through direct and channel sales
       • Patents issued or pending in 32 countries
       • Continuing to innovate: fasp3™, fasp-MC™, mobile transport, cloud enablement
    5. Aspera Ecosystem of Partners
    6. Life Sciences
    7. BIG DATA TRANSFER CHALLENGE
    8. What Happened to My Bandwidth?
       Seattle to Paris WAN: 1000 Mbps, 170 ms RTT, 0.001% packet loss rate
       • WAN throughput is 1000 Mbps
       • Max TCP throughput is ~29 Mbps
       • Where’s my ~970 Mbps?
       At 29 Mbps, a 50 GB transfer will take ~4 hrs and a 1 TB transfer will take 3.3 days.
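Slide 8's ~29 Mbps ceiling can be sanity-checked with the well-known Mathis et al. approximation for steady-state TCP throughput, rate ≈ (MSS / RTT) × (C / √p). This model is not cited in the presentation; the sketch below assumes a 1460-byte MSS and C = √(3/2) ≈ 1.22, and for the slide's path it gives about 26.6 Mbps, the same order as the slide's figure.

```python
import math


def mathis_tcp_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate steady-state throughput ceiling (bits/s) of one TCP Reno flow.

    Mathis et al. model: rate ~= (MSS / RTT) * (C / sqrt(p)).
    """
    if not (0 < loss_rate < 1):
        raise ValueError("loss_rate must be a probability between 0 and 1")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Seattle-Paris path from the slide: 170 ms RTT, 0.001% packet loss.
    # The 1460-byte MSS (standard Ethernet) is an assumption.
    rate = mathis_tcp_throughput_bps(1460, 0.170, 0.001 / 100)
    print(f"~{rate / 1e6:.1f} Mbps per TCP flow")
```

Because the rate falls with 1/RTT and 1/√p, halving the RTT doubles the ceiling while cutting loss by 4x only doubles it; the link's 1000 Mbps capacity does not appear in the formula at all.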
    9. BIG DATA AND WAN TRANSFER WITH TCP
       TCP WAS DESIGNED IN THE EARLY ’80S
       • When data was small and bandwidth was limited
       • Fantastic for reliable data delivery
       • Not fast enough for big data
       TCP IS THE ENGINE THAT DRIVES
       • FTP, HTTP & HTTPS
       • rsync, SCP & DICOM
       • CIFS & NFS
       TCP DOES NOT LIKE NETWORK LATENCY (RTT)
       • Geographic distance increases latency
       • Network congestion increases latency
       TCP DOES NOT LIKE PACKET LOSS
       Loss is caused by:
       • Congestion
       • Differences in network capacity
       • Wireless and satellite communications
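Slide 9's point that TCP does not like latency follows from the bandwidth-delay product: a sender must keep roughly bandwidth × RTT bytes unacknowledged in flight to fill a link. The sketch below is not from the presentation; it uses the slides' 1000 Mbps, 170 ms example and assumes a window of 65,535 bytes, the largest TCP window without the window-scale option.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on one flow's throughput when at most window_bytes are in flight."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    rtt = 0.170  # Seattle-Paris round-trip time from the slides
    print(f"Needed in flight at 1 Gbps: {bdp_bytes(1e9, rtt) / 1e6:.2f} MB")
    # 65,535 bytes: largest window without the TCP window-scale option.
    print(f"Ceiling with a 65,535-byte window: {window_limited_bps(65535, rtt) / 1e6:.2f} Mbps")
```

About 21 MB must be in flight to fill the link, while an unscaled window caps a single flow near 3 Mbps on the same path, regardless of link capacity.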
    10. So if TCP doesn’t work, what’s the answer? The Aspera Solution
    11. Same WAN Scenario with Aspera
        Seattle to Paris WAN: 1000 Mbps, 170 ms RTT, 0.001% packet loss rate
        • WAN is 1000 Mbps
        • Max TCP throughput ~29 Mbps
        • Max Aspera throughput ~995 Mbps (gain of x34)
        • ROI measured in the $$ cost of not using 971 Mbps
        At 29 Mbps (TCP): a 50 GB transfer will take ~4 hrs; a 1 TB transfer will take 3.3 days
        At 995 Mbps (Aspera): a 50 GB transfer will take ~7 mins; a 1 TB transfer will take 2.4 hrs
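The transfer times on slides 8 and 11 follow from time = size × 8 / throughput. A short sketch, assuming decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a perfectly sustained rate: it gives about 3.8 h and 3.2 days at 29 Mbps, and about 6.7 min and 2.2 h at 995 Mbps, close to the slides' rounded figures (the slides may use slightly different unit conventions).

```python
GB = 10**9   # decimal gigabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def transfer_time_s(size_bytes, throughput_bps):
    """Seconds needed to move size_bytes at a sustained throughput in bits/s."""
    return size_bytes * 8 / throughput_bps


def fmt(seconds):
    """Render a duration in the largest convenient unit."""
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} h"
    return f"{seconds / 86400:.1f} days"


if __name__ == "__main__":
    for label, bps in [("TCP ~29 Mbps", 29e6), ("Aspera ~995 Mbps", 995e6)]:
        for size_label, size in [("50 GB", 50 * GB), ("1 TB", TB)]:
            print(f"{label:>17} {size_label:>6}: {fmt(transfer_time_s(size, bps))}")
    print(f"speedup: x{995e6 / 29e6:.0f}")
```

The x34 gain is simply 995 / 29 ≈ 34.3; it holds for any file size because both times scale linearly with size.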
    12. FASP™: HIGH-PERFORMANCE DATA TRANSPORT
        MAXIMUM LINE-RATE WAN TRANSFER SPEED
        • Transfer performance scales with bandwidth, independent of transfer distance and resilient to packet loss
        • Optimal end-to-end throughput efficiency
        CONGESTION AVOIDANCE AND POLICY CONTROL
        • Automatic, full utilization of available bandwidth
        • On-the-fly prioritization and bandwidth allocation
        UNCOMPROMISING SECURITY AND RELIABILITY
        • Secure user/endpoint authentication
        • AES-128 cryptography in transit and at rest
        SCALABLE MANAGEMENT, MONITORING AND CONTROL
        • Real-time progress, performance and bandwidth utilization
        • Detailed transfer history, logging and manifest
        ENTERPRISE-CLASS FILE DELIVERY
        • Transfers up to thousands of times faster than FTP/HTTP(S)
        • Precise and predictable transfer times
        • Extreme scalability (concurrency and throughput)
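Slide 12 lists on-the-fly prioritization and bandwidth allocation without saying how shares are computed. One common way to split a fixed target rate among concurrent transfers is weighted max-min fairness, sketched below in Python. This is an illustrative, swapped-in technique, not Aspera's fasp rate-control algorithm; the flow names, weights and caps are hypothetical.

```python
def weighted_max_min(capacity, flows):
    """Split `capacity` among flows by weight, honoring optional per-flow caps.

    flows maps a name to (weight, cap); cap may be None for "no limit".
    Capped flows that need less than their share release the surplus,
    which is redistributed to the remaining flows by weight.
    """
    if any(weight <= 0 for weight, _ in flows.values()):
        raise ValueError("weights must be positive")
    alloc = {name: 0.0 for name in flows}
    active = set(flows)
    remaining = float(capacity)
    while active and remaining > 1e-12:
        share = remaining / sum(flows[n][0] for n in active)  # rate per unit weight
        capped = {n for n in active
                  if flows[n][1] is not None and flows[n][0] * share >= flows[n][1]}
        if not capped:
            # No flow hits its cap: hand out the rest proportionally and stop.
            for n in active:
                alloc[n] = flows[n][0] * share
            break
        for n in capped:
            alloc[n] = flows[n][1]
            remaining -= flows[n][1]
        active -= capped
    return alloc


if __name__ == "__main__":
    # Hypothetical transfers sharing a 100 Mbps target rate.
    demo = {"urgent": (2, None), "normal": (1, None), "background": (1, 10)}
    print(weighted_max_min(100, demo))
```

Here the capped background transfer takes only its 10 Mbps, and the freed capacity goes to the other two in a 2:1 ratio; re-running the function whenever a weight changes gives the on-the-fly behavior the slide describes.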
    13. FASP vs TCP PERFORMANCE: fasp Bandwidth ROI
        FTP: limited by distance and packet loss, not bandwidth
        Size       Across US     US–EU         US–Asia       Satellite
        1 GB       1–2 hrs       2–4 hrs       4–20 hrs      8–20 hrs
        10 GB      15–20 hrs     20–40 hrs     Impractical   Impractical
        100 GB     Impractical   Impractical   Impractical   Impractical
        Aspera fasp™: scales linearly with bandwidth; distance and packet loss independent
        Size       2 Mbps     10 Mbps    45 Mbps    100 Mbps   200 Mbps   1 Gbps
        1 GB       70 min     14 min     3.2 min    1.4 min    42 sec     8.4 sec
        10 GB      11.7 hrs   140 min    32 min     14 min     7 min      1.4 min
        100 GB     –          23.3 hrs   5.3 hrs    2.3 hrs    1.2 hrs    14 min
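Slide 13's fasp rows assume throughput equals the link rate, so time grows linearly with size and shrinks linearly with bandwidth. The sketch below regenerates that grid; it assumes binary gigabytes (1 GB = 2^30 bytes), which lands within a few percent of the slide's cells (for example 71.6 min versus "70 min" for 1 GB at 2 Mbps). The slide's exact unit convention is not stated.

```python
GIB = 2**30  # assumption: the slide's "GB" is a binary gigabyte

RATES_MBPS = [2, 10, 45, 100, 200, 1000]
SIZES_GB = [1, 10, 100]


def ideal_time_s(size_gb, rate_mbps):
    """Transfer time if throughput equals the link rate (no RTT or loss penalty)."""
    return size_gb * GIB * 8 / (rate_mbps * 1e6)


def scaling_table():
    """Rows keyed by size; each row lists times in seconds, one per rate."""
    return {size: [ideal_time_s(size, rate) for rate in RATES_MBPS] for size in SIZES_GB}


if __name__ == "__main__":
    print("size  " + "".join(f"{r:>7} Mbps" for r in RATES_MBPS))
    for size, row in scaling_table().items():
        # Print every cell in minutes for easy comparison with the slide.
        print(f"{size:>3} GB" + "".join(f"{t / 60:>12.1f}" for t in row) + "  (min)")
```

The FTP half of the slide cannot be generated this way, because TCP-based throughput depends on RTT and loss rather than on the link rate.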
    14. 6 Gbps Scalable WAN Throughput
        ~6 Gbps big-data throughput, x3000 improvement vs. TCP
        • Latency independent
        • Loss independent
        • 1 TB of data moved in 20 min
        • 2 days with TCP over LAN conditions
        Scale to ~10 Gbps with IQ Accelerator
    15. High-Speed Mobile Data Transfer with fasp-AIR™
        fasp-AIR SDK: maximum data transfer speed and predictability for mobile devices
        • Embeddable software library allows app developers to integrate superior transport capabilities into their own applications, such as faster and more predictable downloads/uploads
        • Available for Android and iOS on the Aspera Developer Network
        • Designed for wireless networks with high latency and high packet loss
        • Integrated transfer queuing, pause, resume and progress reporting
        • Achieves significant performance improvements for upload and download speeds over 3G, 4G and 802.11g/n
    16. fasp-AIR Benchmarks on Verizon 4G
        In some cases (highlighted in orange), speeds will vary greatly, depending on available bandwidth and the underlying condition of the wireless network.
    17. CUSTOMER USE CASES: NCBI/NIH, HUTCHINSON
    18. Large-scale Global Collaboration: 1000 Genomes
        Petabytes of data transferred monthly
        • Files range in size from KBs to many GBs
        Repository contents
        • 2,500 genomes from 27 populations
        • Several types of variations: SNPs, small insertions and deletions, structural variants, and copy number variants
        Data available on the web at 4 locations
        • 1000genomes.org, AWS, NCBI, and EBI websites
        Technology
        • Web sites use Aspera Connect Server and the Aspera Developers’ Network and SDK
        • Researchers across all locations use the Aspera Connect client for upload/download (freely distributable with server license)
    19. Researcher-to-Researcher Collaboration
        faspex in use by a world-renowned cancer research center in Seattle, WA
        Use case: genomic research results sharing
        • Research made available to collaborators
        • Research published globally
        Workflow
        • Illumina > Storage > Researcher > Aspera
        • Publish one-to-many
        Collaboration options
        • Person-to-person, one-to-many (faspex server)
        • Publish-subscribe (faspex or Connect server)
    20. CLOUD COMPUTING & BIG DATA
    21. CLOUD COMPUTING: WHY IS IT SO COMPELLING?
        THE POTENTIAL OF INFINITE COMPUTING RESOURCES, ON DEMAND
        • Eliminates the need to plan ahead
        • Allows companies to meet demand
        • Without the lead-time bottleneck
        THE ELIMINATION OF AN UP-FRONT COMMITMENT
        • Reduce capital outlay and investment risk
        • Start small and increase hardware resources to match need
        • Auto-scale to meet demand
        PAY-FOR-USE RESOURCE MODEL
        • CPUs by the hour
        • Storage by the day
        • Bandwidth by the GB
    22. SO? WHAT CAN I DO WITH IT?
        DATA PROCESSING & CONTENT CREATION
        • Compute intensive: 10s, 100s, 1000s of CPU cores
        • Transcoding, rendering, encoding, watermarking
        • Big-data analytics & HPC
        • Near-line for editing, creative apps and processing
        STORAGE FOR ARCHIVE & D/R
        • B2B / B2C data workflow
        • Offsite storage for disaster recovery and business continuity
        • OTT, play out, release, project & event-specific marketing
        DATA & CONTENT DISTRIBUTION
        • Collaborative data exchange
        • CDN and global delivery
    23. GETTING IN AND OUT OF THE CLOUD: KNOWING WHEN TO CHOOSE THE RIGHT TOOL
    24. CHALLENGES OF STORING BIG FILES IN THE CLOUD
        BEWARE THE OBJECT STORE
        • Not like traditional NAS or SAN
        • Bigger, better, but possibly much more complex
        • a.k.a. Google File System, Amazon S3, Hadoop Distributed File System
        • Simple read/write of data “blobs”, indexed by a key
        • Multiple replicas are distributed across storage for durability and optimized for access
        • Should work well for storing large numbers of files
        UNDERSTAND CHUNKS, BLOCKS AND BLOBS
        • You need to deal with chunks, blocks and blobs
        • “Chunk” sizes are small (64 MB/128 MB)
        • Large media files must be “chunked” (a 1 TB file means transporting and reassembling 10,000+ chunks!)
        • Multi-chunk APIs impede workflow and are complex
        • Data I/O uses the standard HTTP(S) protocol
        • VERY SLOW at distance
        • A single HTTP stream is slow even locally (<100 Mbps)
        BIG-DATA SERVICES WILL NEED A HIGH-SPEED BRIDGE TO THE CLOUD
        • Large files moved at full bandwidth capacity with global access
        • Overcome the WAN and storage bottleneck
        • Support files of any size or quantity
        • Transparent to the end user/data owner (GUI, command line, API, browser, etc.)
        • No hardware to support B2B, B2C, C2B workflow
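Slide 24's chunk arithmetic and the split/reassemble burden can be illustrated with a small in-memory sketch. It models an object store as a Python dict of key-addressed blobs; the key scheme (`name/part-000000`), the file name and the use of decimal megabytes are hypothetical choices for illustration, not the actual layout of S3 or HDFS. With 64 MB chunks a 1 TB file needs 15,625 chunks, and with 128 MB chunks 7,813, the order of magnitude the slide cites.

```python
import hashlib

MB = 10**6  # decimal megabyte (assumption)


def chunk_count(file_size, chunk_size):
    """Number of fixed-size chunks needed to hold file_size bytes (ceiling division)."""
    return -(-file_size // chunk_size)


def split_to_blobs(name, data, chunk_size):
    """Store each chunk under its own key; zero-padding keeps keys sortable in order."""
    return {
        f"{name}/part-{i:06d}": data[offset:offset + chunk_size]
        for i, offset in enumerate(range(0, len(data), chunk_size))
    }


def reassemble(blobs, name):
    """Fetch this object's chunks in key order and join them back together."""
    prefix = f"{name}/part-"
    return b"".join(blobs[k] for k in sorted(k for k in blobs if k.startswith(prefix)))


if __name__ == "__main__":
    print(chunk_count(10**12, 64 * MB), chunk_count(10**12, 128 * MB))
    data = bytes(range(256)) * 1000  # 256,000-byte stand-in for a media file
    blobs = split_to_blobs("movie.mxf", data, 100_000)
    restored = reassemble(blobs, "movie.mxf")
    # Verify integrity end to end by comparing digests.
    print(len(blobs), hashlib.sha256(restored).digest() == hashlib.sha256(data).digest())
```

Every chunk is a separate object to upload, track and retry, which is why the slide calls multi-chunk APIs complex.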
    25. FIRST MAJOR BOTTLENECK: WAN TRANSFER
    26. SECOND MAJOR BOTTLENECK: LOCAL HTTP I/O
        1st bottleneck: WAN; 2nd bottleneck: data center
    27. S3 & BIG DATA: UNDERSTAND THE CONSTRAINTS
    28. S3 & BIG DATA: MEET ASPERA’S DIRECT-TO-S3
        Clients: Cargo downloader, point-to-point, mobile apps, Connect plug-in
    29. OVERCOMING BOTH BOTTLENECKS
        #1: TRANSFER DATA TO EC2 OVER WAN (effective throughput)
        • HTTP transfer over WAN (single stream): <10 Mbps
        • Typical internet conditions: 50–250 ms latency and 0.1–3% packet loss
        • 15 parallel HTTP streams: <10 to 100 Mbps
        • Aspera fasp transfer over WAN to EC2: up to 1 Gbps (per EC2 Extra Large instance)
        #2: TRANSFER DATA FROM EC2 TO S3 (effective throughput)
        • Standard single-stream HTTP: 10 to 100 Mbps
        • Aspera S3 proxy with parallel I/O HTTP streams: up to 1 Gbps (per EC2 Extra Large instance)
        ASPERA + AWS: ~10 TB transferred per 24 hours per EC2 instance
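Slide 29's "~10 TB per 24 hours per EC2 instance" is consistent with a sustained 1 Gbps: 10^9 bits/s × 86,400 s ÷ 8 = 1.08 × 10^13 bytes, about 10.8 TB. The sketch below does that conversion and adds a simple, hypothetical model of parallel HTTP streams in which aggregate throughput is the per-stream rate times the stream count, capped at a link or instance limit; real parallel streams also contend with each other, so treat the result as an upper bound. The 5 Mbps per-stream figure is an assumption, not a number from the slides.

```python
def tb_per_day(throughput_bps):
    """Decimal terabytes moved in 24 hours at a sustained rate in bits/s."""
    return throughput_bps * 86_400 / 8 / 1e12


def parallel_streams_bps(streams, per_stream_bps, ceiling_bps):
    """Idealized aggregate of independent streams, never above the ceiling."""
    return min(streams * per_stream_bps, ceiling_bps)


if __name__ == "__main__":
    print(f"1 Gbps sustained: {tb_per_day(1e9):.1f} TB/day")
    # Hypothetical: 15 HTTP streams at 5 Mbps each over a WAN path.
    agg = parallel_streams_bps(15, 5e6, 1e9)
    print(f"15 streams: {agg / 1e6:.0f} Mbps -> {tb_per_day(agg):.2f} TB/day")
```

Under these assumptions the 15-stream HTTP case moves well under 1 TB per day, which is the gap the slide attributes to the two bottlenecks.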
    30. ASPERA DIRECT-TO-S3: LINE-RATE ACCESS TO THE CLOUD
        UNRIVALED ASPERA PERFORMANCE
        • Built on Aspera fasp™ technology for maximum transfer speed
        • Regardless of file size, transfer distance and network conditions
        • Precise bandwidth control ensures the available bandwidth is utilized to achieve maximum transfer speeds, while being fair to other business-critical network traffic
        SEAMLESS INTEGRATION WITH S3
        • Integrated with S3 multi-part HTTP for maximum “last foot” performance
        • Simple configuration of S3 credentials, for both shared and dedicated docroot
        • Transfers directly into S3 are seamless and transparent to the user
        ENTERPRISE-GRADE SECURITY AND RELIABILITY
        • Secure authentication with encryption in transit and at rest (AES-128, FIPS 140-2, HIPAA compliant)
        • Packet-level data integrity verification
        • Automatic resume of partial or failed transfers
        • Full support for AWS S3 server-side encryption at rest
        INTEROPERATES WITH ALL ASPERA HOST OPTIONS
        • Any platform (Windows, Linux, Mac, UNIX, iOS, Android)
        • Any Aspera client (CLI, Desktop, Point-to-Point, Mobile, Web, Embedded)
        • Any Aspera server (Enterprise, Connect, faspex)
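Slide 30 lists automatic resume of partial or failed transfers and data integrity verification. The general idea can be sketched for local files in Python: restart from the byte offset already present at the destination, then compare SHA-256 digests of source and destination. This is not how fasp implements resume (the slides do not describe that), it verifies the whole file rather than each packet, and it assumes any bytes already at the destination are a correct prefix of the source.

```python
import hashlib
import os
import tempfile


def sha256_of(path, block_size=1 << 20):
    """Hex SHA-256 digest of a file, read in blocks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def resume_copy(src, dst, block_size=1 << 20):
    """Copy only the bytes of src that dst is missing, then verify the result.

    Returns the number of bytes copied in this call.
    """
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    total = os.path.getsize(src)
    if done > total:
        raise ValueError("destination is larger than source; cannot resume")
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)  # skip what a previous, interrupted attempt already wrote
        for block in iter(lambda: fin.read(block_size), b""):
            fout.write(block)
    if sha256_of(src) != sha256_of(dst):
        raise OSError("checksum mismatch: destination prefix was corrupt")
    return total - done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src.bin"), os.path.join(d, "dst.bin")
        payload = os.urandom(3_000_000)
        with open(src, "wb") as f:
            f.write(payload)
        with open(dst, "wb") as f:
            f.write(payload[:1_000_000])  # simulate an interrupted transfer
        print("resumed bytes:", resume_copy(src, dst))
```

Calling it again after a complete copy transfers nothing and simply re-verifies the digests.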
    31. ASPERA FOR AWS: DIRECT-TO-S3
        1. Upload using a typical multi-part HTTP client (scale-out HTTP)
        2. fasp high-speed upload, direct-to-S3 multipart
        (Diagram: Aspera client and Aspera Transfer Server, Herndon, VA, connected by fasp; HTTP multipart client, Dallas, TX)
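Slide 31 contrasts fasp with a typical multi-part HTTP client. A client using S3 multipart upload must pick a part size that respects S3's documented limits: at most 10,000 parts per upload, and parts between 5 MiB and 5 GiB (the last part may be smaller). The planner below is a hypothetical helper, not part of any AWS SDK, and its 8 MiB default part size is an illustrative choice. For a 1 TB (10^12-byte) object it grows the part size to 96 MiB, giving 9,935 parts.

```python
MIB = 1024**2
GIB = 1024**3
MIN_PART = 5 * MIB   # S3 minimum size for every part except the last
MAX_PART = 5 * GIB   # S3 maximum size of a single part
MAX_PARTS = 10_000   # S3 maximum number of parts in one multipart upload


def ceil_div(a, b):
    return -(-a // b)


def plan_parts(object_size, preferred_part=8 * MIB):
    """Return (part_size, part_count) for a multipart upload within S3 limits."""
    part = max(preferred_part, MIN_PART)
    if ceil_div(object_size, part) > MAX_PARTS:
        # Too many parts: use the smallest whole-MiB size that fits in MAX_PARTS.
        part = ceil_div(ceil_div(object_size, MAX_PARTS), MIB) * MIB
    if part > MAX_PART:
        raise ValueError("object too large for one multipart upload")
    return part, max(1, ceil_div(object_size, part))


if __name__ == "__main__":
    size = 10**12  # 1 TB object, as in the chunking example on slide 24
    part, count = plan_parts(size)
    print(f"{part // MIB} MiB parts x {count} parts")
```

Each planned part is then uploaded as a separate HTTP request, so a 1 TB object still means roughly ten thousand requests to issue, track and retry.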
    32. HYBRID CLOUD DEPLOYMENT (PUBLIC/PRIVATE)
        • Shares app transparently communicates with Aspera server nodes in the cloud and in the enterprise
        • User browses content across authorized shares
        • High-speed data transfers with the datacenter
        • High-speed data transfers with Direct-to-S3
        (Diagram: client, NY, NY; Shares; DMZ node; node, Herndon, VA; datacenter, Emeryville, CA; links use fasp)
    33. ASPERA SOFTWARE ON DEMAND
        • Aspera Server: universal file transfer server; supports desktop, web, mobile & embedded
        • Aspera faspex: global person-to-person file ingest & distribution
        • Aspera Shares: global person-to-person file transfer & exchange
        • Aspera Console: global transfer monitoring, reporting & control
        KEY FEATURES
        • On-demand high-performance data transport to and from remote infrastructures
        • Unlimited scale-out of transfer capacity with additional AMIs
        • Support for all Aspera Server software and use cases
        • Additional client options: Mobile, Outlook Plug-in & Cargo (Aspera faspex)
        • Flexible storage options: local, EBS, AWS S3
        • Seamlessly interoperates with on-premise Aspera deployments
        • Integrated management and monitoring
        APPLICATIONS AND USE CASES
        • High Performance Computing on demand
        • Content aggregation, transformation and distribution
        • Time-boxed event or project-based collaboration, ad-hoc distribution or content ingest
    34. Aspera software product & technology portfolio
        DISTRIBUTE: Complete portfolio of servers and endpoint clients for high-speed digital content delivery and distribution.
        • Enterprise and Connect Server: universal file transfer server and web-based interface and directory listing
        • Client and Point-to-point: uni- and bi-directional transfer clients
        • Connect: web browser plug-in for high-speed uploads and downloads
        • Mobile: high-speed transfer for mobile devices
        • Sync: highly scalable, multidirectional file replication and synchronization
        • Cargo: automated client downloads
        COLLABORATE: Global person-to-person and project-based exchange and collaboration of files and directories, of any size, over any distance, over any network.
        • faspex Server: secure digital delivery and collaborative file transfers with remote users and partners; integrated e-mail notifications for delivery and successful download; comprehensive administration, user management & access control
        • faspex Multi-Server / HA: automated bi-directional relays between sites and multiple servers; 3-tier architecture with support for clustering and high availability
        AUTOMATE: Web-based application and SDK for creating and managing automated workflows, from simple file forwarding to complex process orchestration.
        • Orchestrator: intuitive graphical workflow designer; file processing decision tree and flow; rich and flexible plug-in architecture for third-party process integration; comprehensive library of plug-ins for transcoding, virus checking, quality checking, archive and notifications; high-volume processing; detailed dashboard, workflow, and step-level progress reporting; open development framework for designing and integrating processing and automation pipelines
        TRANSPORT: Our unique, patented transport technologies provide unparalleled speed, efficiency, concurrency and bandwidth control over any size, distance, and network.
        • fasp™: patented, file-based bulk data transport
        • fasp3™: next-gen protocol for any bulk data
        • Aspera On-Demand S3|Direct: high-speed transfer direct to cloud storage (S3)
        • fasp-AIR™: uploads and downloads over 3G, LTE and Wi-Fi networks
        • fasp-MC™: high-speed delivery over multicast
        • Console (transport management): centralized web-based management, monitoring, and reporting
    35. Aspera fasp™ software environment
    36. ASPERA DEVELOPER NETWORK
        A complete set of SDKs provides developers with guides, reference information, and sample code to assist them with integrating Aspera technology into their own applications. Aspera fasp™ technology can be used in desktop, network-based, and web applications in place of FTP, HTTP, or custom TCP-based copy protocols.
        ASPERA TRANSFER APIs
        • Aspera Web Services: a SOAP-based web service API that allows initiation, monitoring and controlling of fasp-based file transfers
        • Aspera Web: JavaScript API exposed by the Aspera Connect client; it allows integration of fasp-based file transfers into web applications
        • fasp Manager: a class library that allows initiation, monitoring and controlling of fasp-based file transfers
        • Aspera Multicast SDK: a Java class library that allows initiation and management of IP multicast-based data transmissions using Aspera fasp-MC™
        ASPERA MOBILE APIs
        • Android SDK: the Aspera Android SDK provides a Java API to transfer files using fasp-AIR™
        • iPhone SDK: the Aspera iPhone SDK provides an Objective-C API to transfer files using fasp-AIR
        ASPERA APPLICATION APIs
        • Connect 2.8 Developer Preview 2: introducing the new Connect 2.8 developer preview! Integrate the functionality of Aspera Connect 2.8, a fasp-based file transfer client, into your own web applications, while customizing it to your unique brand
        • faspex™ Web API: the Aspera faspex Web API provides a set of services that enables users to create and receive digital deliveries via a Web interface, while taking advantage of fasp high-speed transfer technology
        OTHER INFORMATION
        • Supporting Tools and Libraries: supporting tools and libraries let you perform other common tasks surrounding file transfers
        • General Reference: reference on error codes, log file locations, configuration files and more
    37. Aspera software product & technology portfolio
        DISTRIBUTE (APIs): Complete portfolio of servers and clients for high-speed data delivery and distribution.
        • Enterprise and Connect Server: universal file transfer server and web-based interface and directory listing
        • Client and Point-to-point: uni- and bi-directional transfer clients
        • Connect: web browser plug-in
        • Mobile: high-speed transfer for mobile devices
        • Sync: highly scalable, multidirectional file replication and synchronization
        • Cargo: automated package downloads
        COLLABORATE (APIs): Global person-to-person and project-based exchange and collaboration of files and directories.
        • faspex™ Server: secure digital delivery and collaborative file transfers with remote users and partners; web, email and mobile client options; comprehensive administration, user management & access control
        • faspex™ Multi-Server / HA: automated bi-directional relays between sites; 3-tier architecture with support for clustering and HA
        AUTOMATE (APIs): Web-based application and SDK for creating and managing automated file-based workflows.
        • Orchestrator: intuitive graphical workflow designer; file processing decision tree and flow; rich and flexible plug-in architecture for third-party process integration; comprehensive library of plug-ins for transcoding, A/V, QC, archive and notifications; high-volume processing; detailed dashboard, workflow, and step-level progress reporting; open development framework for designing and integrating automation pipelines
        TRANSPORT (APIs): Our unique, patented transport technologies provide unparalleled speed, efficiency, concurrency and bandwidth control over any size, distance, and network.
        • fasp™: patented, file-based bulk data transport
        • fasp3™: next-gen protocol for any bulk data
        • Aspera On-Demand S3|Direct: high-speed transfer direct to cloud storage (S3)
        • fasp-AIR™: uploads and downloads over 3G, LTE and Wi-Fi networks
        • fasp-MC™: high-speed delivery over multicast
        • Console (transport management): centralized web-based management, monitoring, and reporting
    38. BT-ASPERA DISCUSSION
    39. THANK YOU!
        Daniel Kumi, Director, New Market Development, daniel@asperasoft.com
