October 2013 "Beyond the Genome" presentation slides. Talk is mostly focused on issues around IaaS cloud usage for "Bio-IT" and life science informatics & scientific computing.
PDF slides available directly: please email chris@bioteam.net for a copy.
2014 BioIT World - Trends from the trenches - Annual presentation - Chris Dagdigian
Talk slides from the annual "trends from the trenches" address at BioITWorld Expo. 2014 Edition.
### Email chris@bioteam.net if you'd like a PDF copy of this deck ###
This is a very short slide deck I did for a 10-minute slot on a http://pistoiaalliance.org/ webinar. The slides do not fully cover what I intend to talk about so if the webinar is recorded and available afterwards I'll update this description with the recording URL.
PDF copy of the slides available upon request ("chris@bioteam.net")
Talk slides from my annual address at the Bio-IT World Expo & Conference where I cover trends, best practices and emerging pain points for life science focused HPC, scientific computing and "research IT"
Email "chris@bioteam.net" if you want a PDF copy of these slides. I've disabled the raw powerpoint download option on slideshare.
This was a 30-minute talk intended as one of the opening/overview presentations before a full-day deep dive into ScienceDMZ design patterns and architectures.
Direct downloads are not enabled. Contact me directly (chris@bioteam.net) if you for some odd reason want a copy of this slide deck!
This is a custom "Bio IT trends/problems" deck that I did for a general but highly technical audience at the 2014 Internet2 Technology Exchange conference.
Download of the raw PPT is disabled; contact me at chris@bioteam.net if a direct copy or PDF of the presentation would be useful.
BioIT World 2016 - HPC Trends from the Trenches - Chris Dagdigian
As presented at BioIT World 2016. In one of the more popular presentations of the Expo, Chris delivers a candid assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. He’ll cover what has changed (or not) in the past year around infrastructure, storage, computing, and networks. This presentation will help you understand the IT needed to build and support data-intensive science.
Video link from the presentation: biote.am/bs
[Note: email chris@bioteam.net if you would like a PDF copy of this presentation]
This is a massive slide deck I used as the starting point for a 1.5-hour talk at the 2012 www.nerlscd.org conference. A mixture of old and (some) new slides from my usual stuff.
Taming Big Science Data Growth with Converged Infrastructure - The BioTeam Inc.
2014 BioIT World Expo presentation
"Many of the largest NGS sites have identified I/O bottlenecks as their number one concern in growing their infrastructure to support current and projected data growth rates. In this talk, Aaron D. Gardner, Senior Scientific Consultant, BioTeam, Inc., will share real-world strategies and implementation details for building converged storage infrastructure to support the performance, scalability and collaborative requirements of today's NGS workflows."
For a copy of this presentation please email: chris@bioteam.net
Mapping Life Science Informatics to the Cloud - Chris Dagdigian
Infrastructure cloud platforms such as those offered by Amazon Web Services are not designed and built with scientific research as the primary use case. These presentation slides cover the current state of mapping life science research and HPC techniques onto “the cloud” and how to work around the common engineering, orchestration and data movement problems.
[Note: I've replaced the 2011 version of this talk deck with a slightly updated version as delivered at the AIRI Petabyte Challenge Meeting]
BioITWorld 2013 presentation - Best practices for building multi-tenant HPC clusters for Pharma/BioTech
Essentially a mini case study of a recent deployment of a multi-petabyte, 1000+ CPU-core Linux cluster in the Boston area.
Please email me at: chris@bioteam.net if you would like the actual PDF file itself.
Disruptive Innovation: how do you use these theories to manage your IT? - Mark Madsen
The term disruptive innovation was popularized by Harvard professor Clayton Christensen in his 1997 book “The Innovator’s Dilemma.” Nearly 20 years later, “Disrupt!” is a popular leadership mantra that is more frequently uttered than experienced. You can't productize it. You can't always control it – at least what effects it has in practice. You aren't necessarily going to like every product of innovation. So are you sure you want it? If so, how do you promote a culture in which innovation can flower – and, potentially, thrive? Because that's probably the best that you can do.
Perhaps there's a better framing for innovation than just "disruption." This session is an overview of commoditization and innovation theories, followed by basic things you can do to apply that theory to your daily job architecting, choosing and managing a data environment in your company.
Cloud Sobriety for Life Science IT Leadership (2018 Edition) - Chris Dagdigian
Candid/blunt AWS advice for research IT and life science IT leadership. Hard lessons learned from many years of AWS consulting. Contact dag@bioteam.net if you want a PDF copy of this presentation
BI isn't big data and big data isn't BI (updated) - Mark Madsen
Big data is hyped, but isn't hype. There are definite technical, process and business differences in the big data market when compared to BI and data warehousing, but they are often poorly understood or explained. BI isn't big data, and big data isn't BI. By distilling the technical and process realities of big data systems and projects we can separate fact from fiction. This session examines the underlying assumptions and abstractions we use in the BI and DW world, the abstractions that evolved in the big data world, and how they are different. Armed with this knowledge, you will be better able to make design and architecture decisions. The session is sometimes conceptual, sometimes detailed technical explorations of data, processing and technology, but promises to be entertaining regardless of the level.
Yes, it’s about the data normally called “big”, but it’s not Hadoop for the database crowd, despite the prominent role Hadoop plays. The session will be technical, but in a technology preview/overview fashion. I won’t be teaching you to write MapReduce jobs or anything of the sort.
The first part will be an overview of the types, formats and structures of data that aren’t normally in the data warehouse realm. The second part will cover some of the basic technology components, vendors and architecture.
The goal is to provide an overview of the extent of data available and some of the nuances or challenges in processing it, coupled with some examples of tools or vendors that may be a starting point if you are building in a particular area.
Annual address covering trends, emerging requirements, pain points and infrastructure issues in the "Bio-IT" aka life science informatics and HPC realm; Email me if you want a PDF of this talk - chris@bioteam.net
Bio-IT Trends From The Trenches (digital edition) - Chris Dagdigian
Note: Contact me directly dag@bioteam.net if you would like a PDF download of these slides
This is Chris Dagdigian’s 10th year delivering his no-holds-barred, candid state-of-the-industry address at BioIT World, and we are not going to let a pandemic stop him.
Instead of his typical talk, five distinguished panelists will join Chris for a spirited discussion on Current Events and Scientific Computing and the impacts of the COVID-19 Pandemic:
Tiny slide deck from a 5-minute lightning talk covering a recent project involving live replication of 2 petabytes of scientific data.
Please leave feedback if you'd like to see this as a long-form technical blog article or conference talk, thanks!
Data lakes, data exhaust, web scale, data is the new oil. Vendors are throwing new terms and analogies at us to convince us to buy their products as the market around data technologies grows. We change data persistence and transaction layers because "databases don't scale" or because data is "unstructured". If data had no structure then it wouldn't be data, it would be noise. Schema on read, schema on write, schemaless databases; they imply structure underlying the data. All data has schema, but that word may not mean what you think it means.
This presentation will describe concepts of data storage and retrieval from technology prehistory (i.e. before the 1980s) and examine the design principles behind both old and new technology for managing data because sometimes post-relational is actually pre-relational. It is important to separate what is identical to things that were tried in the past from new twists on old topics that deliver new capabilities.
Directly related to these topics are performance, scalability and the realities of what organizations do with data over time. All of these topics should guide architecture decisions to avoid the trap of creating technical debts that must be paid later, after systems are in place and change is difficult.
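The schema-on-read vs. schema-on-write distinction the abstract above alludes to can be sketched in a few lines. This is a hypothetical illustration, not material from the talk: a relational table enforces structure when data is written, while raw log records only acquire structure when a query imposes it.

```python
import json
import sqlite3

# Schema-on-write: structure is enforced when data is stored.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user TEXT NOT NULL, action TEXT NOT NULL)")
db.execute("INSERT INTO events VALUES (?, ?)", ("alice", "login"))
# A row missing a required field is rejected at write time:
# db.execute("INSERT INTO events (user) VALUES (?)", ("bob",))  # IntegrityError

# Schema-on-read: raw records are stored as-is; structure (and defaults for
# missing fields) is imposed only when the data is read.
raw_log = ['{"user": "alice", "action": "login"}', '{"user": "bob"}']
events = [json.loads(line) for line in raw_log]
actions = [e.get("action", "unknown") for e in events]  # schema applied here
print(actions)  # ['login', 'unknown']
```

Either way the data has structure; the sketch only changes *when* that structure is checked, which is the point the abstract makes about "schemaless" systems.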
Briefing Room: An alternative for streaming data collection - Mark Madsen
Knowing what’s happening in your enterprise right now can mark the difference between success and failure. The key is to have a rich view of activity, such that analysts and others can explore in a fully multidimensional fashion. Benefiting from such a detailed perspective can help professionals identify the exact nature of problems or opportunities, thus enabling precise actions that make a difference quickly.
Register for this episode of The Briefing Room to hear veteran Analyst Mark Madsen of Third Nature explain how a nexus of innovations for analyzing network traffic can help companies stay on top of their game. He’ll be briefed by Erik Giesa of ExtraHop, who will showcase his company’s stream analytics technology for wire data, which provides real-time, multidimensional views of network traffic. He’ll share success stories of how ExtraHop has solved otherwise intractable problems and enabled a new level of root-cause analysis.
Everything Has Changed Except Us: Modernizing the Data Warehouse - Mark Madsen
Keynote, Munich, June 2016
The way we make decisions has changed. The data we use has changed. The techniques we can apply to data and decisions have changed. Yet what we build and how we build it has barely changed in 20 years.
The definition of madness is doing more of what you already do and expecting different results. The threat to the data warehouse is not from new technology that will replace the data warehouse. It is from destabilization caused by new technology as it changes the architecture, and from failure to adapt to those changes.
The technology that we use is problematic because it constrains and sometimes prevents necessary activities. We don’t need more technology and bigger machines. We need different technology that does different things. More product features from the same vendors won’t solve the problem.
The data we want to use is challenging. We can’t model and clean and maintain it fast enough. We don’t need more data modeling to solve this problem. We need less modeling and more metadata.
And lastly, a change in scale has occurred. It isn’t a simple problem of “big”. The problem with current workloads has been solved, despite the performance problems that many people still have today. Scale has many dimensions – important among them are the number of discrete sources and structures, the rate of change of individual structures, the rate of change in data use, the variety of uses and the concurrency of those uses.
In short, we need new architecture that is not focused on creating stability in data, but one that is adaptable to continuous and rapidly changing uses of data.
(PDF available upon request). This is an updated version of the 2012 BioITWorld Boston talk that I gave 6 weeks later at Bio IT World Asia in June 2012. Some slide content was updated and revised, and I also deleted a number of slides in an attempt to shorten the talk since I'm known to speak fast. There was legit concern I'd be unintelligible to non-native English speakers!
First Nonfiction Reading is a brand-new three-level reading series for young, emergent-level readers that helps students transition from phonics to reading. Each realistic fiction passage is based on a school subject and helps bridge fiction and nonfiction topics.
Makers Go To College - Your Digital Future 2016 - Martin Hamilton
Young digital makers will need a new kind of college - some thoughts from me, presented at the City of Liverpool College Your Digital Future event in June 2016.
Find the complete schedule and dates for the BITSAT entrance exam of BITS Pilani, along with information regarding the application form and other details. The BITS Pilani entrance is a tough exam.
http://www.entrancezone.com/engineering/bitsat-2017-important-dates/
Just because you can doesn't mean that you should - ThingMonk 2016 - Boris Adryan
Big data! Fast data! Real-time analytics! These are buzzwords commonly associated with platform offerings around IoT.
Although the law of large numbers always applies, just because you can deploy more sensors doesn't automatically mean that you should. After all, they cost money and bandwidth, and can be a pain to maintain. Using the example of the Westminster Parking Trial, I'd like to show how analytics on preliminary survey data could have reduced the number of deployed sensors significantly.
A similar logic applies to fast and real-time analytics. While these are advertised as killer features, many people new to IoT and analytics are not even aware that they might get away with batch processing. Using the example of flying a drone, I'd like to discuss for which use cases I'd apply edge processing (on the drone), stream or micro-batch analytics (when data arrives at the platform), or work on batched data (stored in a database).
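The stream-vs-batch trade-off mentioned above can be illustrated with a minimal sketch (hypothetical, not from the talk): a streaming aggregate updates with each sensor reading in constant memory, while a batch job stores everything and computes once later.

```python
# Streaming: update an aggregate as each reading arrives (constant memory,
# a result is available after every sample).
def streaming_mean(readings):
    count, total = 0, 0.0
    for r in readings:          # one pass, no stored history
        count += 1
        total += r
        yield total / count

# Batch: persist everything first, process afterwards (simpler, higher latency).
def batch_mean(readings):
    stored = list(readings)     # the "database" in this sketch
    return sum(stored) / len(stored)

samples = [2.0, 4.0, 6.0]
print(list(streaming_mean(samples)))  # [2.0, 3.0, 4.0]
print(batch_mean(samples))            # 4.0
```

If only the final mean matters and latency is tolerable, the batch path is the simpler choice, which is exactly the "you might get away with batch processing" point.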
Making the Most of In-Memory: More than Speed - Inside Analysis
The Briefing Room with Robin Bloor and Kognitio
Live Webcast Oct. 1, 2013
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=7539482&rKey=bc304aa8dac7b781
Everyone’s talking about in-memory these days, and the term has become synonymous with speed. But pinning data into memory is just the beginning, and it’s about more than speed. In-memory solutions need a tailored architecture, one that can take full advantage of RAM processing from every aspect, and this requires an approach that considers memory and CPU from the ground up.
Register for this episode of The Briefing Room to hear from veteran Analyst Robin Bloor as he explains how memory is on the fast track to supersede disk, at least with respect to advanced analytics. He’ll be briefed by Kognitio CTO Roger Gaskell, who has pioneered the in-memory analytical platform since its inception in 1989. He will also discuss how this type of solution changes the landscape for the modern data architecture and its impact on advanced analytical capabilities.
Visit InsideAnalysis.com for more information
CIW Lab with CohesiveFT: Get started in public cloud - Part 1 Cloud & Virtual... - Ryan Koop
CohesiveFT: Get started with public cloud
It's time to explore the public cloud. Get familiar with Amazon's AWS EC2 compute and S3 storage. Demo and guides will prep you to do big things with hosting for your websites and apps!
Part 1 Cloud & Virtualization: Welcome! We'll run through the basics of public vs. private cloud, the cloud marketplace, and why we picked AWS to demonstrate
Hosted by: Ryan Koop, Director of Marketing
CIW Lab with CohesiveFT: Get started in public cloud - Part 2 Hands On - Cohesive Networks
CohesiveFT: Get started with public cloud
It's time to explore the public cloud. Get familiar with Amazon's AWS EC2 compute and S3 storage. Demo and guides will prep you to do big things with hosting for your websites and apps!
Part 2 Hands On: After covering the basics of cloud and virtualization, we'll dive into AWS terminology and getting set up, then we'll all find an image and launch our own AWS instance. Additional information includes VPC vs. VNS3 features, real cloud use cases, and further reading.
Hosted by: Ryan Koop, Director of Product Marketing
AZUG.BE - Azure User Group Belgium - First public meeting - Maarten Balliauw
- What is AZUG? Who is who?
- An overview of the Azure platform
- .NET Services
- Enterprise reasons to adopt the cloud
- Getting started with Azure
- Open discussion
CloudCamp Chicago - November 2013: Fighting Cloud FUD - CloudCamp Chicago
Slides from the November CloudCamp Chicago. This time, we fought off cloud FUD ("fear, uncertainty, and doubt").
Lightning talks included in these slides:
- "Tech in Illinois" - Fred Hoch, Chairman, Illinois Technology Association @fredhoch
- "A retrospective of the Cloud, then and now" - Michael Segel, Segel & Associates; Founder of CHUG @chihadoopusers
- "Scientific Clouds: Hard Numbers vs. FUD" - Steve Timm, Lead FermiCloud Project, FermiLabs @StevenCTimm
- "Enterprise Adoption - The Chasm is Crossed" - Sashi Desikan, Global Executive, Pega Cloud at PegaSystems @PegaSashi
- "hybrid cloud governance" - Mike Bresett, Account CTO, Unisys @bresett
- "How we fought and are fighting Cloud FUD" - Paul Inboriboon, Director of Tech Infrastructure,Alzheimer's
Association @inboriboon
- "Can you make your cloud rain at the press of a button?" Robert Clarke, Account Executive, Crissie Insurance Group @RobertKClarke
CIW Lab with CoheisveFT: Get started in public cloud - Part 1 Cloud & Virtual...Cohesive Networks
CohesiveFT: Get started with public cloud
It's time to explore the public cloud. Get familiar with Amazon's AWS EC2 compute and S3 storage. Demo and guides will prep you to do big things with hosting for your websites and apps!
Part 1 Cloud & Virtualization: Welcome! We'll run through the basics of public vs. private cloud, the cloud marketplace, and why we picked AWS to demonstrate
Hosted by: Margaret Walker, Marketing Specialist
Slides for talk by Prof Christopher Millard on "Cloud computing: identifying and managing legal risks" at Google's Oxford Internet Institute Learned Lunches, Brussel, February 2011
Internet of Things (IoT) - in the cloud or rather on-premises?Guido Schmutz
You want to implement a Big Data or Internet of Things (IoT) solution and like to know if it should be implemented in the cloud or on-premises. You are interested in the cloud offerings of vendors and what benefits they provide and if a similar solution would not be possible on-premises.
This presentation deals with this and other questions. Starting from a vendor-independent reference architecture and corresponding design patterns, different cloud solutions from various vendors are compared and rated. Additionally, it will be shown how such solution could be implemented on-premises and how a hybrid IoT solution could look like.
Mehr und schneller ist nicht automatisch besser - data2day, 06.10.16Boris Adryan
Das Gesetz der großen Zahlen gilt immer: Die statistische Sicherheit nimmt mit der Anzahl der Datenpunkte immer zu, sofern die Datennahme fair erfolgt. Leider kostet das Sammeln der Daten oftmals Geld, und so ist man vor allem im Bereich der Sensorik (Stichwort: Internet der Dinge) gezwungen, sinnvolle Kompromisse einzugehen. In diesem Vortrag fasse ich die Erkenntnisse eines Projekts zusammen, in dem die Datenanalytik zeigte, dass man zukünftig nur 60% der ausgebrachten Sensoren wirklich braucht. Auch muss es nicht immer Echtzeit-Analyse sein: Mit einer auf den Business-Case abgestimmten Datenstrategie lassen sich unnötige Ausgaben vermeiden.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfPeter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
PHP Frameworks: I want to break free (IPC Berlin 2024)
Bio-IT & Cloud Sobriety: 2013 Beyond The Genome Meeting
1. Bio-IT & Cloud Sobriety
Beyond the Genome, San Francisco 2013
Thursday, October 3, 13
2. Agenda
1. Intro & Terminology: Getting our buzzwords straight
2. The ‘Meta’ Issue: What is driving all of this?
3. Drivers For Cloud Adoption In Bio-IT
4. What The Cloud Salespeople Will Not Tell You
5. Private Clouds & Practical Advice
6. The Road Ahead
3.
I’m Chris.
I’m an infrastructure geek.
I work for the BioTeam.
Twitter: @chris_dag
4. Who, What, Why ...
BioTeam
‣ Independent consulting shop
‣ Staffed by scientists forced to learn IT, SW & HPC to get our own research done
‣ 10+ years bridging the “gap” between science, IT & high performance computing
‣ Our wide-ranging work is what gets us invited to speak at events like this ...
5. Seriously.
Listen to me at your own risk
‣ Clever people find multiple solutions to common issues
‣ I’m fairly blunt, burnt-out and cynical in my advanced age
‣ A significant portion of my work has been done in demanding production Biotech & Pharma environments
‣ Filter my words accordingly
7.
Defining Terms
‣ The term ‘cloud computing’ is almost meaning-free today – too many marketers have fuzzed and co-opted the term
‣ Before serious discussion can occur it is essential that all parties are operating from similar baseline presumptions
8. Gartner
Defining Terms
‣ Gartner: “Cloud computing is a style of computing where scalable and elastic IT-enabled capabilities are delivered as a service to external customers using Internet technologies.”
9.
My preferred definition
‣ Jinesh Varia on Amazon Web Services: “… a highly reliable and scalable infrastructure for deploying web-scale solutions, with minimal support and administration costs, and more flexibility than you’ve come to expect from your own infrastructure, either on-premise or at a datacenter facility.”
10. I’m an infrastructure geek, which do you think I prefer?
Cloud Subtypes
‣ Software as a Service (SaaS)
‣ Platform as a Service (PaaS)
‣ Infrastructure as a Service (IaaS)
11.
This is an IaaS cloud talk
‣ We need flexible scientific computing and informatics capability “on the cloud”
‣ Service and Platform clouds are not a good fit for the flexible/general use case
‣ IaaS clouds provide “building blocks” that allow us to build the informatics environments we require
29. Non-Trivial HPC on the cloud
‣ 16 of AWS’s biggest servers + 22 GPU nodes ... at a cost of $30/hour via the Spot Market
30. Why this work was ‘easy’ on Amazon AWS ...
Difficult on any other cloud
‣ Let’s discuss why this simulation workload would be much, much harder to do on some other cloud platform ...
31. Why this work was ‘easy’ on Amazon AWS ...
Nightmare on any other cloud
‣ Brand ‘X’ Cloud offers: (1) Virtual Servers, (2) Block Storage, (3) Object Storage, (4) ... and maybe some other stuff if I’m lucky
‣ Amazon offers: EC2, S3, EBS, RDS, SNS, SQS, SWS, GPUs, SSDs, CloudFormation, VPC, ENIs, SecurityGroups, 10GbE, DirectConnect, Reserved Instances, ImportExport, Spot Market ... and ~30 other products and service features, with more added monthly
32. Easy on AWS; much harder elsewhere
One very specific example
‣ The widely used FLEXlm license server uses NIC MAC addresses when generating license keys
‣ Different MAC? Science stops. Screwed.
‣ VPC ENIs decouple the MAC address from the underlying instance: the ENI (and its MAC) can be kept and moved between instances. Badass.
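The node-locking failure mode is easy to sketch. This is not FLEXlm's actual key algorithm (the hashing scheme and function names are purely illustrative), but it shows why a key minted against one MAC-derived hostid stops validating the moment the MAC changes, and therefore why an ENI that keeps a stable MAC across instance replacements matters:

```python
import hashlib
import uuid

def mac_hostid() -> str:
    """Primary NIC MAC as a 12-digit hex string, the kind of value
    node-locked license managers use as a machine identity."""
    return f"{uuid.getnode():012x}"

def make_license_key(feature: str, hostid: str) -> str:
    """Hypothetical stand-in for a vendor key generator: the key is
    derived from the hostid, so it is bound to that MAC."""
    return hashlib.sha256(f"{feature}:{hostid}".encode()).hexdigest()[:16]

def key_is_valid(feature: str, key: str, hostid: str) -> bool:
    """A key only validates on the machine whose MAC it was minted for."""
    return key == make_license_key(feature, hostid)
```

Replace the instance and get a new MAC, and `key_is_valid` returns False; pin the MAC with a persistent ENI and the same key keeps working.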
33. Section 2: The ‘Meta’ Issue (What is driving all of this?)
35.
Big Picture / Meta Issue
‣ HUGE revolution in the rate at which lab platforms are being redesigned, improved & refreshed
• Example: a CCD sensor upgrade on that confocal microscopy rig just doubled storage requirements
• Example: the 2D ultrasound imager is now a 3D imager
• Example: an Illumina HiSeq upgrade just doubled the rate at which you can acquire genomes. Massive downstream increase in storage, compute & data movement needs
‣ For the above examples, do you think IT was informed in advance?
36. Science progressing way faster than IT can refresh/change
The Central Problem Is ...
‣ Instrumentation & protocols are changing FAR FASTER than we can refresh our Research-IT & Scientific Computing infrastructure
• Bench science is changing month-to-month ...
• ... while our IT infrastructure only gets refreshed every 2-7 years
‣ We have to design systems TODAY that can support unknown research requirements & workflows over many years (gulp ...)
37. The Central Problem Is ...
‣ The easy period is over
‣ 5 years ago we could toss inexpensive storage and servers at the problem; even in a nearby closet or under a lab bench if necessary
‣ That does not work any more; real solutions required
38. And a related problem ...
‣ It has never been easier to acquire vast amounts of data cheaply and easily
‣ Growth rate of data creation/ingest exceeds the rate at which the storage industry is improving disk capacity
‣ Not just a storage lifecycle problem. This data *moves* and often needs to be shared among multiple entities and providers
• ... ideally without punching holes in your firewall or consuming all available internet bandwidth
39. If we get it wrong ...
‣ Lost opportunity
‣ Missing capability
‣ Beaten by the competition
‣ Frustrated & very vocal scientific staff
‣ Problems in recruiting, retention,
publication & product development
40. Section 3: Drivers For Cloud Adoption In Bio-IT
42. Mainstream in life science for quite some time ...
Public IaaS Clouds
‣ Public infrastructure clouds offer an excellent “pressure release valve” when rapidly changing scientific requirements can’t be satisfied by on-premise infrastructure
‣ Economics can’t be ignored
‣ Popular meeting ground for data swapping and collaboration
‣ ‘Scriptable Datacenters’ enabling entirely new capabilities
‣ Money people like converting CapEx to OpEx
43. The ‘neutral’ meeting ground ...
Cloud Hubs & Portals
‣ Many types of entities need to meet, collaborate and exchange life science data
‣ Data sharing hubs and portals becoming popular on public IaaS clouds like AWS
‣ Why?
• Far easier than punching holes in your firewall and issuing VPN credentials to outsiders
44. Compelling economics
Cloud Data Repositories
‣ IaaS clouds becoming the ‘center of gravity’ for some large scale scientific data hosting
‣ Why?
• Compelling pricing
• No need to own & operate mirror sites
• AWS has some very interesting ‘downloader pays’ models that seem to be a good fit for grant-funded science with mandated multi-year data accessibility requirements
www.1000genomes.org
45. My $.02
Amazon vs. Everyone Else
‣ AWS is the clear leader for Bio-IT IaaS cloud use
‣ Why?
• By far the largest number of IaaS building blocks
• Rate of innovation puts AWS years ahead of the competition
‣ Exceptions
• For specific high-value pipelines & workstreams, Google & Microsoft are valid alternatives
46. Section 4: What The Cloud Salespeople Will Not Tell You
47. What the salesfolk won’t tell you ...
‣ There is no one-size-fits-all research design pattern ...
‣ You are not going to toss everything and replace it with “Big Data”
‣ Very few of us have a single pipeline or workflow that we can devote endless engineering effort to
‣ We are not going to toss out hundreds of legacy codes and rewrite everything for GPUs or MapReduce
‣ For research HPC it’s all about the building blocks { and how we can effectively use/deploy them }
48.
What the salesfolk won’t tell you
‣ Your organization actually needs THREE tested cloud design patterns:
‣ (1) To handle ‘legacy’ scientific apps & workflows
‣ (2) The special stuff that is worth re-architecting
‣ (3) Hadoop & big data analytics
49. Legacy HPC on the Cloud
Design Pattern #1 - Legacy
‣ There are many hundreds of existing algorithms and applications in the life science informatics space
‣ We’ll be running/using these codes for years to come
‣ Many can’t or will never be refactored or rewritten
‣ I call this the “legacy” design pattern
51. StarCluster
Design Pattern #1 - Legacy
‣ MIT StarCluster
• http://web.mit.edu/star/cluster/
‣ Infinite Awesomeness. Worth a talk by itself.
‣ This is your baseline
‣ Extend as needed
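A minimal sketch of what "baseline" means in practice: a `~/.starcluster/config` fragment defining a small cluster template (credentials, AMI ID, key name, and sizes below are placeholders you must fill in), after which `starcluster start mycluster` launches the cluster and `starcluster sshmaster mycluster` logs you into the head node.

```ini
[global]
DEFAULT_TEMPLATE = smallcluster

[aws info]
AWS_ACCESS_KEY_ID = <your-access-key>
AWS_SECRET_ACCESS_KEY = <your-secret-key>

[key mykey]
KEY_LOCATION = ~/.ssh/mykey.rsa

[cluster smallcluster]
; a 4-node cluster: 1 master + 3 compute nodes, scheduler preconfigured
KEYNAME = mykey
CLUSTER_SIZE = 4
NODE_IMAGE_ID = ami-xxxxxxxx
NODE_INSTANCE_TYPE = c1.xlarge
```

When you are done, `starcluster terminate mycluster` tears everything down so the meter stops running.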
52.
Design Pattern #2 - “Cloudy”
‣ Some of our research workflows are important enough to be rewritten for “the cloud” and the advantages that a truly elastic & API-driven infrastructure can deliver
‣ This is where you have the most freedom
‣ Many published best practices you can borrow
‣ Warning: cloud vendor lock-in potential is strongest here
53.
Design Pattern #3 - Hadoop/BigData
‣ Hadoop and “big data” need to be on your radar
‣ Be careful though; you’ll need a gas mask to avoid the smog of marketing and vapid hype
‣ The utility is real and this does represent one “future path” for analysis of large data sets
54.
Design Pattern #3 - Hadoop/BigData
‣ It’s gonna be a MapReduce world, get used to it
‣ Little need to roll your own Hadoop in 2013
‣ ISV & commercial ecosystem already healthy
‣ Multiple providers today; both onsite & cloud-based
‣ Often a slam-dunk cloud use case
55. What you need to know
Design Pattern #3 - Hadoop/BigData
‣ “Hadoop” and “Big Data” are now general terms
‣ You need to drill down to find out what people actually mean
‣ We are still in the period where senior leadership may demand “Hadoop” or “BigData” capability without any actual business or scientific need
56. What you need to know
Hadoop & “Big Data”
‣ In broad terms you can break “Big Data” down into two very basic use cases:
1. Compute: Hadoop can be used as a very powerful platform for the analysis of very large data sets. The Google search term here is “map reduce”
2. Data Stores: Hadoop is driving the development of very sophisticated “NoSQL”, non-relational databases and data query engines. The Google search terms include “nosql”, “couchdb”, “hive”, “pig”, “mongodb”, etc.
‣ Your job is to figure out which type applies for the groups requesting “Hadoop” or “BigData” capability
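The "Compute" use case is easiest to grasp from the map-reduce model itself. Here is a toy, single-process Python sketch of the canonical word-count example; Hadoop's contribution is running these same two phases, fault-tolerantly, across thousands of nodes:

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    for word in record.split():
        yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reducer: sum the emitted counts for each distinct key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def word_count(records):
    """Run both phases over an iterable of text records."""
    return reduce_phase(chain.from_iterable(map_phase(r) for r in records))
```

For example, `word_count(["GATTACA gattaca", "gattaca"])` yields `{"gattaca": 3}`. Because mappers are independent per record and reducers are independent per key, both phases parallelize trivially, which is the whole point of the model.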
57. What you need to know
Hadoop & “Big Data”
‣ Hadoop adoption is being driven by a small group of academics writing and releasing open source life science Hadoop applications
‣ Your people will want to run these codes
‣ In some academic environments you may find people wanting to develop on this platform
58. Section 5: Private Clouds & Practical Advice
60.
Private Clouds: Only 60% BS in ’13
‣ I’m known as a private cloud cynic
‣ The hype::usefulness ratio is still extreme
‣ For vendors it’s still a play to get you to toss everything in your datacenter and ‘start fresh’
‣ However ...
61.
Private Clouds: Make sense for ...
‣ If you are a globe-spanning enterprise with tens of thousands of employees or “customers”
‣ If you want to leverage hardcore DevOps for serious infrastructure automation and configuration management
‣ If you want to use Private Cloud to drive fresh new tech like object storage and software defined networking (SDN) into your environment
62.
Private Clouds: However ...
‣ My $.02 is that the two primary science-facing benefits from Cloud are:
1. Browsable catalogs of available server images
2. Self-service (scientists can select & provision systems)
‣ And guess what? You can do that TODAY on most enterprise virtualization stacks WITHOUT jumping on the private cloud bandwagon
‣ My advice:
• Think hard about what you hope to gain from private clouds and do some extra due diligence to see if you can gain those capabilities in a simpler and cheaper way
64. Design Patterns
Practical Advice
‣ Remember the three design patterns on the cloud:
• Legacy HPC systems (replicate traditional clusters in the cloud)
• Hadoop
• Cloudy (when you rewrite something to fully leverage cloud capability)
65. Policies and Procedures
Practical Advice
‣ Cloud technology bits are easy. Cloud Process and Policy discussions take forever
‣ Start these conversations sooner rather than later!
66. Core services that take time and advance planning
Practical Advice
‣ A few key cloud services take time and advance planning to deploy properly:
‣ VPNs & subnet schemes
‣ Identity Management & Access Control
‣ Data Movement
68.
Physical Ingest: Just Plain Nasty
‣ Easy to talk about in theory
‣ Seems “easy” to scientists and even IT at first glance
‣ Really, really nasty in practice
• Incredibly time consuming
• Significant operational burden
• Easy to do badly / lose data
69. And a huge need for fast(er) research networks!
Huge Need For Network Ingest
1. Public data repositories have petabytes of useful data
2. Collaborators still need to swap data in serious ways
3. Amazon becoming an important repo of public and private sources
4. Many vendors now “deliver” to the cloud
76. Network vs. Physical
Cloud Data Movement
‣ With a 1GbE internet connection ...
‣ ... and using Aspera software ...
‣ We sustained 700 Mb/sec for more than 7 hours freighting genomes into Amazon Web Services
‣ This is fast enough for many use cases, including genome sequencing core facilities
‣ Chris Dwan’s webinar on this topic: http://biote.am/7e
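Back-of-envelope transfer math is worth doing before committing to network ingest, and units matter: megabits and megabytes per second differ by 8x. A small helper (illustrative numbers; `efficiency` is a fudge factor for protocol overhead on lossy WAN links, which is exactly what tools like Aspera exist to claw back):

```python
def transfer_hours(data_gb: float, rate_mbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `data_gb` gigabytes over a link sustaining
    `rate_mbps` megabits/second (decimal units: 1 GB = 8000 megabits)."""
    megabits = data_gb * 8000
    return megabits / (rate_mbps * efficiency) / 3600

# e.g. a 1 TB dataset at a sustained 700 Mb/sec:
# transfer_hours(1000, 700) -> roughly 3.2 hours
```

Run the same numbers at the 20-30% of nominal bandwidth that untuned TCP often delivers over long-haul links and the answer shifts from hours to days, which is when teams start mailing disks.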
77. Network vs. Physical
Cloud Data Movement
‣ Results like this mean we now favor network-based data movement over physical media movement
‣ Large-scale physical data movement carries a high operational burden and consumes non-trivial staff time & resources
78. There are three ways to do network data movement ...
Cloud Data Movement
1. Buy software from Aspera and be done with it
2. Attend the annual SuperComputing conference & see which student group wins the bandwidth challenge contest; use their code
3. Get GridFTP from the Globus folks
79. Section 6: The Road Ahead
81. Some final thoughts
Future Trends & Patterns
‣ Compute continues to become easier
‣ Data movement (physical & network) gets harder
‣ The cloud decision may be made by where your data actually resides
‣ Cost of storage will be dwarfed by the “cost of managing stored data”
‣ We can see end-of-life for our current IT architecture and design patterns; new patterns will start to appear over the next 2-5 years
82. Very blurry lines in 2013 for all of these roles
Scientist/SysAdmin/Programmer
‣ Cloud is forcing these issues ...
‣ Far more control is going into the hands of the research end user
‣ IT support roles will radically change; no longer owners or gatekeepers
‣ IT will handle policies, procedures, reference patterns, security & best practices
‣ Researchers will control the “what”, “when” and “how big”