This document discusses hybrid transactional/analytical processing (HTAP) with Apache Spark and in-memory data grids. It describes how combining an in-memory data grid for low-latency transactions with Spark enables real-time analytics over both historical and streaming data at scale. The approach integrates Spark and the data grid through connectors to provide a unified API, push down predicates from Spark to the grid for efficient processing, and leverage data locality. This hybrid model supports various data types and provides a scale-out, unified data store to meet the needs of Internet of Things and omni-channel applications.
From Data to Services at the Speed of BusinessAli Hodroj
From Data to Services at the Speed of Business: Applying cloud-native paradigm to combine fast data analytics with microservices architecture for hybrid workloads.
Real-time Microservices and In-Memory Data GridsAli Hodroj
How in-memory data grids enable a real-time microservices architecture while diminishing the accidental complexity of persistence, orchestration, and fragmentation of scale.
Pentaho Big Data Analytics with Vertica and HadoopMark Kromer
Overview of the Pentaho Big Data Analytics Suite from the Pentaho + Vertica presentation at Big Data Techcon 2014 in Boston for the session called "The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho"
From Data to Services at the Speed of BusinessAli Hodroj
From Data to Services at the Speed of Business: Applying cloud-native paradigm to combine fast data analytics with microservices architecture for hybrid workloads.
Real-time Microservices and In-Memory Data GridsAli Hodroj
How in-memory data grids enable a real-time microservices architecture while diminishing the accidental complexity of persistence, orchestration, and fragmentation of scale.
Pentaho Big Data Analytics with Vertica and HadoopMark Kromer
Overview of the Pentaho Big Data Analytics Suite from the Pentaho + Vertica presentation at Big Data Techcon 2014 in Boston for the session called "The Ultimate Selfie | Picture Yourself with the Fastest Analytics on Hadoop with HP Vertica and Pentaho"
Counting Unique Users in Real-Time: Here's a Challenge for You!DataWorks Summit
Finding the number of unique users out of 10 billion events per day is challenging. At this session, we're going to describe how re-architecting our data infrastructure, relying on Druid and ThetaSketch, enables our customers to obtain these insights in real-time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. Specifically, we provide them with the ability to see the number of unique users who meet a given criterion.
Historically, we have used Elasticsearch to answer these types of questions, however, we have encountered major scaling and stability issues.
In this presentation we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking and productionizing a new technology, Druid, with ThetaSketch, to overcome the limitations we were facing.
We will also provide guidelines and best practices with regards to Druid.
Topics include :
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
At CenterPoint Energy, both structured and unstructured data are continuing to grow at a rapid pace. This growth presents many opportunities to deliver business value and many challenges to control costs. To maximize the value of this data while controlling costs, CenterPoint Energy created a data lake using SAP HANA and Hadoop. During this presentation, CenterPoint will discuss their journey of moving smart meter data to Hadoop, how Hadoop is allowing CenterPoint to derive value from big data and their future use case road map.
Cortana Analytics Workshop: Operationalizing Your End-to-End Analytics SolutionMSAdvAnalytics
Wee Hyong Tok. With Azure Data Factory (ADF), existing data movement and analytics processing services can be composed into data pipelines that are highly available and managed in the cloud. In this demo-driven session, you learn by example how to build, operationalize, and manage scalable analytics pipelines. Go to https://channel9.msdn.com/ to find the recording of this session.
In this slidedeck, Infochimps Director of Product, Tim Gasper, discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days; sometimes in just hours. Tim unlocks how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with impeccable speed.
Optimizing industrial operations using the big data ecosystemDataWorks Summit
GE Digital is undertaking a journey to optimize the reliability, availability, and efficiency of assets in the industrial sector and converge IT and OT. To do so, GE Digital is building cloud-based products that enable customers to analyze the asset data, detect anomalies, and provide recommendations for operating plants efficiently while increasing productivity. In a energy sector such as oil and gas, power, or renewables, a single plant comprises multiple complex assets, such as steam turbines, gas turbines, and compressors, to generate power. Each system contains various sensors to detect the operating conditions of the assets, generating large volumes of variety of data. A highly scalable distributed environment is required to analyze such a large volume of data and provide operating insights in near real time.
In this session I will share the challenges encountered when analyzing the large volumes of data, in-stream data analysis and how we standardized the industrial data based on data frames, and performance tuning.
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksDatabricks
Many companies need to analyze large datasets that include location information. To be able to derive business insights from these datasets you need a solution that provides geospatial analysis functionalities and can scale to manage large volumes of information. The combination of CARTO and Databricks allows you to solve this kind of large scale geospatial analytics problems. CARTO provides a location intelligence platform to discover and predict key insights through location data. In this session we will see how we can integrate CARTO and Databricks and how we can take advantage of this combination to solve specific problems for industries such as logistics, telecommunications or financial services.
Counting Unique Users in Real-Time: Here's a Challenge for You!DataWorks Summit
Finding the number of unique users out of 10 billion events per day is challenging. At this session, we're going to describe how re-architecting our data infrastructure, relying on Druid and ThetaSketch, enables our customers to obtain these insights in real-time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. Specifically, we provide them with the ability to see the number of unique users who meet a given criterion.
Historically, we have used Elasticsearch to answer these types of questions, however, we have encountered major scaling and stability issues.
In this presentation we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking and productionizing a new technology, Druid, with ThetaSketch, to overcome the limitations we were facing.
We will also provide guidelines and best practices with regards to Druid.
Topics include :
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
At CenterPoint Energy, both structured and unstructured data are continuing to grow at a rapid pace. This growth presents many opportunities to deliver business value and many challenges to control costs. To maximize the value of this data while controlling costs, CenterPoint Energy created a data lake using SAP HANA and Hadoop. During this presentation, CenterPoint will discuss their journey of moving smart meter data to Hadoop, how Hadoop is allowing CenterPoint to derive value from big data and their future use case road map.
Cortana Analytics Workshop: Operationalizing Your End-to-End Analytics SolutionMSAdvAnalytics
Wee Hyong Tok. With Azure Data Factory (ADF), existing data movement and analytics processing services can be composed into data pipelines that are highly available and managed in the cloud. In this demo-driven session, you learn by example how to build, operationalize, and manage scalable analytics pipelines. Go to https://channel9.msdn.com/ to find the recording of this session.
In this slidedeck, Infochimps Director of Product, Tim Gasper, discusses how Infochimps tackles business problems for customers by deploying a comprehensive Big Data infrastructure in days; sometimes in just hours. Tim unlocks how Infochimps is now taking that same aggressive approach to deliver faster time to value by helping customers develop analytic applications with impeccable speed.
Optimizing industrial operations using the big data ecosystemDataWorks Summit
GE Digital is undertaking a journey to optimize the reliability, availability, and efficiency of assets in the industrial sector and converge IT and OT. To do so, GE Digital is building cloud-based products that enable customers to analyze the asset data, detect anomalies, and provide recommendations for operating plants efficiently while increasing productivity. In a energy sector such as oil and gas, power, or renewables, a single plant comprises multiple complex assets, such as steam turbines, gas turbines, and compressors, to generate power. Each system contains various sensors to detect the operating conditions of the assets, generating large volumes of variety of data. A highly scalable distributed environment is required to analyze such a large volume of data and provide operating insights in near real time.
In this session I will share the challenges encountered when analyzing the large volumes of data, in-stream data analysis and how we standardized the industrial data based on data frames, and performance tuning.
Unlocking Geospatial Analytics Use Cases with CARTO and DatabricksDatabricks
Many companies need to analyze large datasets that include location information. To be able to derive business insights from these datasets you need a solution that provides geospatial analysis functionalities and can scale to manage large volumes of information. The combination of CARTO and Databricks allows you to solve this kind of large scale geospatial analytics problems. CARTO provides a location intelligence platform to discover and predict key insights through location data. In this session we will see how we can integrate CARTO and Databricks and how we can take advantage of this combination to solve specific problems for industries such as logistics, telecommunications or financial services.
"Applications, programming languages, and libraries that leverage sophisticated network hardware capabilities have a natural advantage when used in today’s and tomorrow’s high-performance and data center computer environments. Modern RDMA based network interconnects provides incredibly rich functionality (RDMA, Atomics, OS-bypass, etc.) that enable low-latency and high-bandwidth communication services. The functionality is supported by a variety of interconnect technologies such as InfiniBand, RoCE, iWARP, Intel OPA, Cray’s Aries/Gemini, and others. OFA organization and LinuxRDMA community have been playing a predominant role in the enablement efficient and vendor agnostic software stack for those interconnects. Over the last decade, the community has developed variety user/kernel level protocols and libraries that enable a variety of applications over RDMA including MPI, SHMEM, NFS over RDMA, IPoIB, and many others."
"With the emerging availability server platforms based on ARM CPU architecture, it is important to understand ARM integrates with RDMA hardware and software eco-system. In this talk, we will overview ARM architecture and system software stack. We will discuss how ARM CPU interacts with network devices and accelerators. In addition, we will share our experience in enabling RDMA software stack (OFED/MOFED Verbs) and one-sided communication libraries (Open UCX, OpenSHMEM/SHMEM) on ARM and share preliminary evaluation results."
Watch the video presentation: http://wp.me/p3RLHQ-gyO
Learn more: https://www.openfabrics.org/index.php/abstracts-agenda.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Exascale Computing Project - Driving a HUGE Change in a Changing Worldinside-BigData.com
In this video from the OpenFabrics Workshop in Austin, Al Geist from ORNL presents: Exascale Computing Project - Driving a HUGE Change in a Changing World.
"In this keynote, Mr. Geist will discuss the need for future Department of Energy supercomputers to solve emerging data science and machine learning problems in addition to running traditional modeling and simulation applications. In August 2016, the Exascale Computing Project (ECP) was approved to support a huge lift in the trajectory of U.S. High Performance Computing (HPC). The ECP goals are intended to enable the delivery of capable exascale computers in 2022 and one early exascale system in 2021, which will foster a rich exascale ecosystem and work toward ensuring continued U.S. leadership in HPC. He will also share how the ECP plans to achieve these goals and the potential positive impacts for OFA."
Learn more: https://exascaleproject.org/
and
https://www.openfabrics.org/index.php/abstracts-agenda.html
Sign up for our insideHPC Newsletter: https://www.openfabrics.org/index.php/abstracts-agenda.html
Cloudifying High Availability: The Case for Elastic Disaster RecoveryAli Hodroj
Elastic DR: a solution architecture that aims to optimally balance cost and recovery time via three core principles that are germane the cloud world:
On-Demand: The disaster recovery cloud can be provisioned on any availability zone, region, or public/private cloud through Cloudify's cloud-agnostic bootstrapping mechanism.
Elastic: The ability to automatically provision resources in the recovery cloud in case of disaster while eliminating the need for idle resources in normal scenarios, thereby fully profiting from the pay-per-use pricing model of clouds.
Flexible RTO/RPO: The architecture can be easily extended from a warm DR to a hot DR pattern through enabling/disabling application recipes. This allows us to exploit economies of scale that the cloud provides by matching the number of recipes/tiers to provision (in the recovery cloud) against the recovery time/point objective for our disaster recovery strategy
Accelerating Hadoop, Spark, and Memcached with HPC Technologiesinside-BigData.com
DK Panda from Ohio State University presented this deck at the OpenFabrics Workshop.
"Modern HPC clusters are having many advanced features, such as multi-/many-core architectures, highperformance RDMA-enabled interconnects, SSD-based storage devices, burst-buffers and parallel file systems. However, current generation Big Data processing middleware (such as Hadoop, Spark, and Memcached) have not fully exploited the benefits of the advanced features on modern HPC clusters. This talk will present RDMA-based designs using OpenFabrics Verbs and heterogeneous storage architectures to accelerate multiple components of Hadoop (HDFS, MapReduce, RPC, and HBase), Spark and Memcached. An overview of the associated RDMA-enabled software libraries (being designed and publicly distributed as a part of the HiBD project for Apache Hadoop (integrated and plug-ins for Apache, HDP, and Cloudera distributions), Apache Spark and Memcached will be presented. The talk will also address the need for designing benchmarks using a multi-layered and systematic approach, which can be used to evaluate the performance of these Big Data processing middleware."
Watch the video presentation: http://wp.me/p3RLHQ-gzg
Learn more: http://hibd.cse.ohio-state.edu/
and
https://www.openfabrics.org/index.php/abstracts-agenda.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Machine learning is overhyped nowadays. There is a strong belief that this area is exclusively for data scientists with a deep mathematical background that leverage Python (scikit-learn, Theano, Tensorflow, etc.) or R ecosystem and use specific tools like Matlab, Octave or similar. Of course, there is a big grain of truth in this statement, but we, Java engineers, also can take the best of machine learning universe from an applied perspective by using our native language and familiar frameworks like Apache Spark. During this introductory presentation, you will get acquainted with the simplest machine learning tasks and algorithms, like regression, classification, clustering, widen your outlook and use Apache Spark MLlib to distinguish pop music from heavy metal and simply have fun.
Source code: https://github.com/tmatyashovsky/spark-ml-samples
Design by Yarko Filevych: http://filevych.com/
Susan Coulter from LANL presented this deck at the OpenFabrics Workshop.
"The OpenFabrics Alliance (OFA) is an open source-based organization that develops, tests, licenses, supports and distributes OpenFabrics Software (OFS). The Alliance’s mission is to develop and promote software that enables maximum application efficiency by delivering wire-speed messaging, ultra-low latencies and maximum bandwidth directly to applications with minimal CPU overhead.
Founded in June 2004 as the OpenIB Alliance, the Alliance was originally focused on developing a vendor-independent, Linux-based InfiniBand software stack. In 2005, the Alliance committed itself to supporting Windows, a move that would make the Alliance’s software stack truly cross-platform. In 2006, the organization again expanded its charter to include support for iWARP and in 2010 it added support for RoCE (RDMA over Converged Ethernet), both for delivering high-performance RDMA and kernel bypass solutions over Ethernet. In 2014 the Alliance expanded again with the creation of the OpenFabrics Interfaces working group to investigate and incorporate support for other high performance networks."
Watch the video presentation: http://wp.me/p3RLHQ-gzo
Learn more: https://www.openfabrics.org/index.php/abstracts-agenda.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Video of the presentation can be seen here: https://www.youtube.com/watch?v=uxuLRiNoDio
The Data Source API in Spark is a convenient feature that enables developers to write libraries to connect to data stored in various sources with Spark. Equipped with the Data Source API, users can load/save data from/to different data formats and systems with minimal setup and configuration. In this talk, we introduce the Data Source API and the unified load/save functions built on top of it. Then, we show examples to demonstrate how to build a data source library.
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)inside-BigData.com
In this video from the Open Compute Summit, Siamak Tavallaei from Microsoft presents an overview of the Microsoft Project Olympus AI Accelerator Chassis, also known as the HGX-1.
Watch the presentation video: http://wp.me/p3RLHQ-guX
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
3 Things to Learn:
-How data is driving digital transformation to help businesses innovate rapidly
-How Choice Hotels (one of largest hoteliers) is using Cloudera Enterprise to gain meaningful insights that drive their business
-How Choice Hotels has transformed business through innovative use of Apache Hadoop, Cloudera Enterprise, and deployment in the cloud — from developing customer experiences to meeting IT compliance requirements
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
Thinking about your sales team's goals for 2017? Drift's VP of Sales shares 3 things you can do to improve conversion rates and drive more revenue.
Read the full story on the Drift blog here: http://blog.drift.com/sales-team-tips
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI and advanced analytics it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization - a paradigm shift in the approach that organizations take towards accessing, integrating, and provisioning data required to meet business goals.
As data analytics and data-driven intelligence takes centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with proper security and governance structure in place has become mission-critical.
Attend this session to learn:
- Learn how you can meet cloud and data science challenges with data virtualization.
- Why data virtualization is increasingly finding enterprise-wide adoption
- Discover how customers are reducing costs and improving ROI with data virtualization
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Red Hat's own Sr. Cloud Storage Solutions Architect Narendra Narang took the podium at Red Hat Storage Day New York 1/19/16 to highlight emerging use cases for Red Hat's software-defined-storage products.
this is part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to setup the demo are included here as a PDF for viewers.
Strata Singapore 2017 business use case section
"Big Telco Real-Time Network Analytics"
https://conferences.oreilly.com/strata/strata-sg/public/schedule/detail/62797
Operational Analytics Using Spark and NoSQL Data StoresDATAVERSITY
NoSQL data stores have emerged for scalable capture and real-time analysis of data. Apache Spark and Hadoop provide additional scalable analytics processing. This session looks at these technologies and how they can be used to support operational analytics to improve operational effectiveness. It also looks at an example of how operational analytics can be implemented in NoSQL environments using the Basho Data Platform with Apache Spark:
•The emergence of NoSQL, Hadoop and Apache Spark
•NoSQL Use Cases
•The need for operational analytics
•Types of operational analysis
•Key requirements for operational analytics
•Operational analytics using the Basho Data Platform with Apache Spark.
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
London Spark Meetup 2014-11-11 @Skimlinks
http://www.meetup.com/Spark-London/events/217362972/
To paraphrase the immortal crooner Don Ho: "Tiny Batches, in the wine, make me happy, make me feel fine." http://youtu.be/mlCiDEXuxxA
Apache Spark provides support for streaming use cases, such as real-time analytics on log files, by leveraging a model called discretized streams (D-Streams). These "micro batch" computations operated on small time intervals, generally from 500 milliseconds up. One major innovation of Spark Streaming is that it leverages a unified engine. In other words, the same business logic can be used across multiple uses cases: streaming, but also interactive, iterative, machine learning, etc.
This talk will compare case studies for production deployments of Spark Streaming, emerging design patterns for integration with popular complementary OSS frameworks, plus some of the more advanced features such as approximation algorithms, and take a look at what's ahead — including the new Python support for Spark Streaming that will be in the upcoming 1.2 release.
Also, let's chat a bit about the new Databricks + O'Reilly developer certification for Apache Spark…
Continuous Intelligence - Intersecting Event-Based Business Logic and MLParis Carbone
Modern data-driven business infrastructure is not as effective as it should be when it comes to critical decision making. Eng-to-End Data pipelines are composed out of fundamentally diverse pieces of tech, each focusing on a specific frontend (e.g., DataFrames, Tensors, Streams) and running in total isolation, thus, being highly unoptimised and complex to integrate with event-based business logic. Our research group has been looking into ways we can use advanced systems theory to compile, optimise and execute distributed functions in unison across the whole spectrum of data-driven programming, leading to a unified way to combine analytics and services all the way down to hardware execution and make continuous intelligence a reality.
Key Takeaways
Introducing the concept of Continuous Intelligence and why we are not there yet.
Pinpointing weaknesses in the current way we structure data-driven pipelines today
Explaining the potential of an Intermediate Representation (IR) and Shared Hardware Execution support to solve the problem.
Presenting our vision on how this new tech can be used to radically change the way we declare and distill knowledge from data in a fast-changing world.
Similar to Spark DC Interactive Meetup: HTAP with Spark and In-Memory Data Grids (20)
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
top nidhi software solution freedownloadvrstrong314
This presentation emphasizes the importance of data security and legal compliance for Nidhi companies in India. It highlights how online Nidhi software solutions, like Vector Nidhi Software, offer advanced features tailored to these needs. Key aspects include encryption, access controls, and audit trails to ensure data security. The software complies with regulatory guidelines from the MCA and RBI and adheres to Nidhi Rules, 2014. With customizable, user-friendly interfaces and real-time features, these Nidhi software solutions enhance efficiency, support growth, and provide exceptional member services. The presentation concludes with contact information for further inquiries.
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
Keep tabs on your field staff effortlessly with Informap Technology Centre LLC. Real-time tracking, task assignment, and smart features for efficient management. Request a live demo today!
For more details, visit us : https://informapuae.com/field-staff-tracking/
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
6. 6
Sample Customer Use Cases
Internet of Things Omni-Channel Operational
Intelligence
Operational
Analytics
Predictive
Analytics
Fraud Detection, Supply
chain optimization
Personalization,
Recommendation
Edge
Analytics
Operational Intelligence,
Predictive Maintenance,
Spatial Analytics
7. 7
In-Memory Computing
(not a new thing)
Rapid decline in RAM prices lead to advanced data processing
innovations
drives
• Transactional (2001-present)
– In-Memory Databases
– In-Memory Data Grids
• Analytics (2012-present)
– In-Memory Data Processing
Frameworks (Spark)
– In-Memory File Systems (Tachyon)
9. 99
Data Grid is a cluster of
machines that work
together to create a
resilient shared data
fabric for low-latency
data access and extreme
transaction processing
In-Memory Data Grid:
Online Transaction Processing at Low-Latency and High Throughput
http://xap.github.io
10. 10
In-Memory Data Grid 101
Feeder
Virtual Machine Virtual MachineVirtual Machine
Partitioned Data
13. 13
In-Memory Data Grid 101: Typical Deployment
HTML
HTTP/S
HW LB
REST
HTTP/
S
REST
HTTP/S
LB
Agen
t
GSA
HTTPD
Load
Balanc
er
LB
Agen
t
GSA
HTTPD
Load
Balanc
er
Mirror
Service
GSA
DB
Private or Public Cloud
Processing Processing Processing
Processing Processing
Processi
ng
Processing Processing Processing
Processing Processing Processing
Primary Set 1 Primary Set 2 Primary Set 3
Primary Set 4 Primary Set 5 Primary Set 6
Backup Set 6Backup Set 5Backup Set 4
Backup Set 1 Backup Set 2 Backup Set 3
GSA GSA GSA
GSA GSA GSA
Async
)
14. 14
Host Cisco UCS Server
CPU Intel 16core 2.9GHz
Concurrent Threads 2
Throughput 200, 400, 800 ops/sec
16. 16
Hybrid Transactional/Analytics Processing at Scale
Provide closed-loop analytics pipeline. Data,
insight, to action at sub-second latency
IoT and Omni-channel require the
convergence of many different data
types
Blend of both real-time and historical
data
Requirements
1
Bi-directional integration between
transactional and analytical data stores
Ability to support POJO, JSON,
GeoSpatial, and Unstructured types
through a unified API
Unified and scale-out real-time
and historical data store
Challenges
2
3
24. 24
• Get Partitions: An array of partitions
that a dataset is divided to
• Compute: A compute function to do a
computation on partitions
• Get Preferred Location: Optional
preferred locations, i.e. hosts for a
partition where the data will be loaded
• IMDG Distributed Query to get partitions
and their hosts
• Iterator over portion of data
• Hosts from Distributed Query
Build a connector: Spark to IMDG
26. 26
Aggregation in
Spark
Filtering and
columns pruning
in Data Grid
SELECT SUM(amount)
FROM order
WHERE city = ‘NY’ AND year > 2012
Spark SQL architecture:
• Pushing down predicates to Data Grid
• Leveraging indexes
• Transparent to user
• Enabling support for other languages -
Python/R
Implementing DataSource API
Pattern #2: Pushdown Predicates (Grid-side processing)
27. 27
node 1
Spark master
Grid
master
node 2
Spark worker
Grid
Partition
node 3
Spark worker
Grid
Partition
Lightweight
workers,
small JVMs
Large JVMs,
Fast
indexing
NoSQL Storage
Pattern #3: Decouple Data Processing from Data Storage
29. 29
Ability to support POJO, JSON, GeoSpatial, and
Unstructured types through a unified API
2
IoT and Omni-channel require the convergence of many
different data types