Survivorship bias applied to information. Cognition, how we learn, sensation and perception, experience. Human sight and visual perception, visual memory. Gestalt principles. Machine perception.
Presentation by
Primary Information Services
www.primaryinfo.com
mailto:primaryinfo@gmail.com
Download PDF Version at
https://www.slideshare.net/thorapadi/presentations
See You tube Channel
https://www.youtube.com/user/ch600091/videos?view_as=subscriber
Survivorship bias applied to information. Cognition, how we learn, sensation and perception, experience. Human sight and visual perception, visual memory. Gestalt principles. Machine perception.
Presentation by
Primary Information Services
www.primaryinfo.com
mailto:primaryinfo@gmail.com
Download PDF Version at
https://www.slideshare.net/thorapadi/presentations
See You tube Channel
https://www.youtube.com/user/ch600091/videos?view_as=subscriber
Максим Хромцов "Yandex MapKit для Android OS в примерах"Yandex
15 октября 2011, Я.Субботник в Алматы
Максим Хромцов Yandex "MapKit для Android OS в примерах"
О докладе:
Разработка приложений на основе Yandex MapKit для Android OS. Внедрение Yandex Map в приложения для увеличения функциональности. В докладе рассматривается описание основных модулей, на основе примеров объясняется принцип работы и их взаимодействие. Также рассказывается, с какими трудностями мы столкнулись во время разработки и как мы их решили.
Presentation on the topic Screenless Display , it is a type of display in which no screen is used.
It has 3 sub-topic visual image, retinal direct display and synaptic interface
The Blue Brain, a Swiss national brain initiative, aims to create a digital reconstruction of the brain by reverse-engineering mammalian brain circuitry. The mission of the project, founded in May 2005 by the Brain and Mind Institute of the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, is to use biologically-detailed digital reconstructions and simulations of the mammalian brain (brain simulation) to identify the fundamental principles of brain structure and function in health and disease.
It is said that within 30 years we will be able to scan ourselves into computers.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2016-member-meeting-uofw
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Professor Jeff Bilmes of the University of Washington delivers the presentation "Image and Video Summarization" at the December 2016 Embedded Vision Alliance Member Meeting. Bilmes provides an overview of the state of the art in image and video summarization.
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
Alluxio Webinar
September 22, 2020
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, running analytics such as Hive, Spark, Presto and machine learning are experiencing sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
Максим Хромцов "Yandex MapKit для Android OS в примерах"Yandex
15 октября 2011, Я.Субботник в Алматы
Максим Хромцов Yandex "MapKit для Android OS в примерах"
О докладе:
Разработка приложений на основе Yandex MapKit для Android OS. Внедрение Yandex Map в приложения для увеличения функциональности. В докладе рассматривается описание основных модулей, на основе примеров объясняется принцип работы и их взаимодействие. Также рассказывается, с какими трудностями мы столкнулись во время разработки и как мы их решили.
Presentation on the topic Screenless Display , it is a type of display in which no screen is used.
It has 3 sub-topic visual image, retinal direct display and synaptic interface
The Blue Brain, a Swiss national brain initiative, aims to create a digital reconstruction of the brain by reverse-engineering mammalian brain circuitry. The mission of the project, founded in May 2005 by the Brain and Mind Institute of the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, is to use biologically-detailed digital reconstructions and simulations of the mammalian brain (brain simulation) to identify the fundamental principles of brain structure and function in health and disease.
It is said that within 30 years we will be able to scan ourselves into computers.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/dec-2016-member-meeting-uofw
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Professor Jeff Bilmes of the University of Washington delivers the presentation "Image and Video Summarization" at the December 2016 Embedded Vision Alliance Member Meeting. Bilmes provides an overview of the state of the art in image and video summarization.
NVIDIA’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI — the next era of computing — with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, NVIDIA is increasingly known as “the AI computing company.”
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
Alluxio Webinar
September 22, 2020
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, running analytics such as Hive, Spark, Presto and machine learning are experiencing sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
Alluxio Community Office Hour
February 23, 2021
For more Alluxio events: https://www.alluxio.io/events/
Speaker(s):
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, running analytics such as Hive, Spark, Presto and machine learning are experiencing sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
Data Orchestration for the Hybrid Cloud EraAlluxio, Inc.
Alluxio Community Office Hour
October 20, 2020
For more Alluxio events: https://www.alluxio.io/events/
Speaker(s):
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, running analytics such as Hive, Spark, Presto and machine learning are experiencing sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
Slides: Accelerating Queries on Cloud Data LakesDATAVERSITY
Using “zero-copy” hybrid bursting on remote data to solve data lake analytics capacity and performance problems.
Data scientists want answers on demand. But in today’s enterprise architectures, the reality is that most data remains on-prem, despite the promise of cloud-based analytics. Moving all that data to the cloud has typically not been possible for many reasons including cost, latency, and technical difficulty. So, what if there was a technology that would connect these on-prem environments to any major cloud platform, enabling high-powered computing without the need to move massive amounts of data?
Join us for this webinar where Alex Ma of Alluxio, an open-source data orchestration platform, will discuss how a data orchestration approach offers a solution for connecting traditional on-prem data centers and cloud data lakes with other clouds and data centers. With Alluxio’s “zero-copy” burst solution, companies can bridge remote data centers and data lakes with computing frameworks in other locations, enabling them to offload, compute, and leverage the flexibility, scalability, and power of the cloud for their remote data.
Accelerate Analytics and ML in the Hybrid Cloud EraAlluxio, Inc.
Alluxio Webinar
April 6, 2021
For more Alluxio events: https://www.alluxio.io/events/
Speakers:
Alex Ma, Alluxio
Peter Behrakis, Alluxio
Many companies we talk to have on premises data lakes and use the cloud(s) to burst compute. Many are now establishing new object data lakes as well. As a result, running analytics such as Hive, Spark, Presto and machine learning are experiencing sluggish response times with data and compute in multiple locations. We also know there is an immense and growing data management burden to support these workflows.
In this talk, we will walk through what Alluxio’s Data Orchestration for the hybrid cloud era is and how it solves the performance and data management challenges we see.
In this tech talk, we'll go over:
- What is Alluxio Data Orchestration?
- How does it work?
- Alluxio customer results
From limited Hadoop compute capacity to increased data scientist efficiencyAlluxio, Inc.
Alluxio Tech Talk
Oct 17, 2019
Speaker:
Alex Ma, Alluxio
Want to leverage your existing investments in Hadoop with your data on-premise and still benefit from the elasticity of the cloud?
Like other Hadoop users, you most likely experience very large and busy Hadoop clusters, particularly when it comes to compute capacity. Bursting HDFS data to the cloud can bring challenges – network latency impacts performance, copying data via DistCP means maintaining duplicate data, and you may have to make application changes to accomodate the use of S3.
“Zero-copy” hybrid bursting with Alluxio keeps your data on-prem and syncs data to compute in the cloud so you can expand compute capacity, particularly for ephemeral Spark jobs.
Achieving Separation of Compute and Storage in a Cloud WorldAlluxio, Inc.
Alluxio Tech Talk
Feb 12, 2019
Speaker:
Dipti Borkar, Alluxio
The rise of compute intensive workloads and the adoption of the cloud has driven organizations to adopt a decoupled architecture for modern workloads – one in which compute scales independently from storage. While this enables scaling elasticity, it introduces new problems – how do you co-locate data with compute, how do you unify data across multiple remote clouds, how do you keep storage and I/O service costs down and many more.
Enter Alluxio, a virtual unified file system, which sits between compute and storage that allows you to realize the benefits of a hybrid cloud architecture with the same performance and lower costs.
In this webinar, we will discuss:
- Why leading enterprises are adopting hybrid cloud architectures with compute and storage disaggregated
- The new challenges that this new paradigm introduces
- An introduction to Alluxio and the unified data solution it provides for hybrid environments
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
Alluxio Tech Talk
January 21, 2020
Speakers:
Matt Fuller, Starburst
Dipti Borkar, Alluxio
With the advent of the public clouds and data increasingly siloed across many locations -- on premises and in the public cloud -- enterprises are looking for more flexibility and higher performance approaches to analyze their structured data.
Join us for this tech talk where we’ll introduce the Starburst Presto, Alluxio, and cloud object store stack for building a highly-concurrent and low-latency analytics platform. This stack provides a strong solution to run fast SQL across multiple storage systems including HDFS, S3, and others in public cloud, hybrid cloud, and multi-cloud environments. You’ll learn more about:
- The architecture of Presto, an open source distributed SQL engine
- How the Presto + Alluxio stack queries data from cloud object storage like S3 for faster and more cost-effective analytics
- Achieving data locality and cross-job caching with Alluxio regardless of where data is persisted
Alluxio 2.0 Deep Dive – Simplifying data access for cloud workloadsAlluxio, Inc.
Alluxio Tech Talk
Aug 7, 2019
Speaker:
Dipti Borkar, Alluxio
Alluxio 2.0 is the most ambitious platform upgrade since the inception of Alluxio with greatly expanded capabilities to empower users to run analytics and AI workloads on private, public or hybrid cloud infrastructures leveraging valuable data wherever it might be stored.
This release, now available for download, includes many advancements that will allow users to push the limits of their data-workloads in the cloud.
In this tech talk, we will introduce the key new features and enhancements such as:
- Support for hyper-scale data workloads with tiered metadata storage, distributed cluster services, and adaptive replication for increased data locality
- Machine learning and deep learning workloads on any storage with the improved POSIX API
- Better storage abstraction with support for HDFS clusters across different versions & active sync with Hadoop
Gestione gerarchica dei dati con SUSE Enterprise Storage e HPE DMFSUSE Italy
In questa sessione HPE e SUSE illustrano con casi reali come HPE Data Management Framework e SUSE Enterprise Storage permettano di risolvere i problemi di gestione della crescita esponenziale dei dati realizzando un’architettura software-defined flessibile, scalabile ed economica. (Alberto Galli, HPE Italia e SUSE)
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyAlluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Sandipan Chakraborty, Director of Engineering (Rakuten)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use work load management.
AI/ML Infra Meetup | ML explainability in MichelangeloAlluxio, Inc.
AI/ML Infra Meetup
May. 23, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Eric Wang (Software Engineer, @Uber)
Uber has numerous deep learning models, most of which are highly complex with many layers and a vast number of features. Understanding how these models work is challenging and demands significant resources to experiment with various training algorithms and feature sets. With ML explainability, the ML team aims to bring transparency to these models, helping to clarify their predictions and behavior. This transparency also assists the operations and legal teams in explaining the reasons behind specific prediction outcomes.
In this talk, Eric Wang will discuss the methods Uber used for explaining deep learning models and how we integrated these methods into the Uber AI Michelangelo ecosystem to support offline explaining.
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAlluxio, Inc.
AI/ML Infra Meetup
May. 23, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Junchen Jiang (Assistant Professor of Computer Science, @University of Chicago)
Prefill in LLM inference is known to be resource-intensive, especially for long LLM inputs. While better scheduling can mitigate prefill’s impact, it would be fundamentally better to avoid (most of) prefill. This talk introduces our preliminary effort towards drastically minimizing prefill delay for LLM inputs that naturally reuse text chunks, such as in retrieval-augmented generation. While keeping the KV cache of all text chunks in memory is difficult, we show that it is possible to store them on cheaper yet slower storage. By improving the loading process of the reused KV caches, we can still significantly speed up prefill delay while maintaining the same generation quality.
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAlluxio, Inc.
AI/ML Infra Meetup
May. 23, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Triston Cao (Senior Deep Learning Software Engineering Manager, @NVIDIA)
From Caffe to MXNet, to PyTorch, and more, Xiande Cao, Senior Deep Learning Software Engineer Manager, will share his perspective on the evolution of deep learning frameworks.
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...Alluxio, Inc.
AI/ML Infra Meetup
May. 23, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Lu Qiu (Data & AI Platform Tech Lead, @Alluxio)
- Siyuan Sheng (Senior Software Engineer, @Alluxio)
Speed and efficiency are two requirements for the underlying infrastructure for machine learning model development. Data access can bottleneck end-to-end machine learning pipelines as training data volume grows and when large model files are more commonly used for serving. For instance, data loading can constitute nearly 80% of the total model training time, resulting in less than 30% GPU utilization. Also, loading large model files for deployment to production can be slow because of slow network or storage read operations. These challenges are prevalent when using popular frameworks like PyTorch, Ray, or HuggingFace, paired with cloud object storage solutions like S3 or GCS, or downloading models from the HuggingFace model hub.
In this presentation, Lu and Siyuan will offer comprehensive insights into improving speed and GPU utilization for model training and serving. You will learn:
- The data loading challenges hindering GPU utilization
- The reference architecture for running PyTorch and Ray jobs while reading data from S3, with benchmark results of training ResNet50 and BERT
- Real-world examples of boosting model performance and GPU utilization through optimized data access
Alluxio Monthly Webinar | Simplify Data Access for AI in Multi-CloudAlluxio, Inc.
Alluxio Monthly Webinar
May. 14, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- ChanChan Mao (Developer Advocate, Alluxio)
- Bin Fan (VP of Technology, Alluxio)
Running AI/ML workloads in different clouds present unique challenges. The key to a manageable multi-cloud architecture is the ability to seamlessly access data across environments with high performance and low cost.
This webinar is designed for data platform engineers, data infra engineers, data engineers, and ML engineers who work with multiple data sources in hybrid or multi-cloud environments. Chanchan and Bin will guide the audience through using Alluxio to greatly simplify data access and make model training and serving more efficient in these environments.
You will learn:
- How to access data in multi-region, hybrid, and multi-cloud like accessing a local file system
- How to run PyTorch to read datasets and write checkpoints to remote storage with Alluxio as the distributed data access layer
- Real-world examples and insights from tech giants like Uber, AliPay and more
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
Alluxio Monthly Webinar
Apr. 23, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- ChanChan Mao (Developer Advocate, Alluxio)
- Shawn Sun (Tech Lead of Cloud Native, Alluxio)
Cloud-native model training jobs require fast data access to achieve shorter training cycles. Accessing data can be challenging when your datasets are distributed across different regions and clouds. Additionally, as GPUs remain scarce and expensive resources, it becomes more common to set up remote training clusters from where data resides. This multi-region/cloud scenario introduces the challenges of losing data locality, resulting in operational overhead, latency and expensive cloud costs.
In the third webinar of the multi-cloud webinar series, Chanchan and Shawn dive deep into:
- The data locality challenges in the multi-region/cloud ML pipeline
- Using a cloud-native distributed caching system to overcome these challenges
- The architecture and integration of PyTorch/Ray+Alluxio+S3 using POSIX or RESTful APIs
- Live demo with ResNet and BERT benchmark results showing performance gains and cost savings analysis
Optimizing Data Access for Analytics And AI with AlluxioAlluxio, Inc.
Alluxio x Tobiko - ETL Happy Hour
April 16, 2024
For more Alluxio events: https://alluxio.io/events/
Speaker:
Lucy Ge (Staff Software Engineer @ Alluxio)
In this presentation, Lucy Ge will discuss the data access challenges in the data pipeline and how to optimize the speed and costs of analytics and AI workloads.
Speed Up Presto at Uber with Alluxio CachingAlluxio, Inc.
Alluxio x Tobiko - ETL Happy Hour
April 16, 2024
For more Alluxio events: https://alluxio.io/events/
Speaker:
Chen Liang (Staff Software Engineer @ Uber)
In this presentation, Chen Liang will share the design and implementation of the Alluxio-Presto local cache to reduce query latency.
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
Alluxio x Tobiko - ETL Happy Hour
April 16, 2024
For more Alluxio events: https://alluxio.io/events/
Speaker:
Toby Mao (CTO @ Tobiko Data)
Writing efficient and correct incremental pipelines is challenging. Data practitioners who take on this challenge are viewed as performing an "advanced" function, which discourages broader teams from adopting incremental loads. In this lightning talk, CTO of Tobiko Data, Toby Mao, will demystify incremental loading data at scale.
Big Data Bellevue Meetup | Enhancing Python Data Loading in the Cloud for AI/MLAlluxio, Inc.
Big Data Bellevue Meetup
March 21, 2024
For more Alluxio events: https://alluxio.io/events/
Speakers:
Bin Fan (VP of Open Source, Alluxio)
In this presentation, Bin Fan (VP of Open Source @ Alluxio) will address a critical challenge of optimizing data loading for distributed Python applications within AI/ML workloads in the cloud, focusing on popular frameworks like Ray and Hugging Face. Integration of Alluxio’s distributed caching for Python applications is accomplished using the fsspec interface, thus greatly improving data access speeds. This is particularly useful in machine learning workflows, where repeated data reloading across slow, unstable or congested networks can severely affect GPU efficiency and escalate operational costs.
Attendees can look forward to practical, hands-on demonstrations showcasing the tangible benefits of Alluxio’s caching mechanism across various real-world scenarios. These demos will highlight the enhancements in data efficiency and overall performance of data-intensive Python applications. This presentation is tailored for developers and data scientists eager to optimize their AI/ML workloads. Discover strategies to accelerate your data processing tasks, making them not only faster but also more cost-efficient.
Alluxio Monthly Webinar | Why a Multi-Cloud Strategy Matters for Your AI Plat...Alluxio, Inc.
Alluxio Monthly Webinar
Feb. 27, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Tarik Bennett (Senior Solutions Engineer, Alluxio)
As GenAI and AI continue to transform businesses, scaling these workloads requires optimized underlying infrastructure. A multi-cloud architecture allows organizations to leverage different cloud services to meet diverse workload demands while maximizing efficiency, reducing costs, and avoiding vendor lock-in. However, achieving a multi-cloud vision can be challenging.
In this webinar, Tarik will share how an agonistic data layer, like Alluxio, allows you to embrace the separation of storage from compute and simplify the adoption of multi-cloud for AI.
- Learn why leveraging multiple cloud providers is critical for balancing performance, scalability, and cost of your AI platform
- Discover how an agnostic data layer like Alluxio provides seamless data access in multi-cloud that bridges storage and compute without data replication
- Gain insights into real-world examples and best practices for deploying AI across on-prem, hybrid, and multi-cloud environments
Alluxio Monthly Webinar | Five Disruptive Trends that Every Data & AI Leader...Alluxio, Inc.
Alluxio Monthly Webinar
Jan. 30, 2024
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Kevin Petrie (VP of Research, Eckerson Group)
- Omid Razavi (SVP of Customer Success, Alluxio)
2024 is gearing up to be an impactful year for AI and analytics. Join us on January 30, as Kevin Petrie (VP of Research at Eckerson Group) and Omid Razavi (SVP of Customer Success at Alluxio) share key trends that data and AI leaders should know. This event will efficiently guide you with market data and expert insights to drive successful business outcomes.
- Assess current and future trends in data and AI with industry experts
- Discover valuable insights and practical recommendations
- Learn best practices to make your enterprise data more accessible for both analytics and AI applications
Data Infra Meetup | FIFO Queues are All You Need for Cache EvictionAlluxio, Inc.
Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Juncheng Yang(Ph.D Candidate, @CMU)
As a cache eviction algorithm, FIFO has a lot of attractive properties, such as simplicity, speed, scalability, and flash-friendliness. The most prominent criticism of FIFO is its low efficiency (high miss ratio). In this talk, I will describe a simple, scalable FIFO-based algorithm with three static queues (S3-FIFO). Evaluated on 6594 cache traces from 14 datasets, we show that S3- FIFO has lower miss ratios than state-of-the-art algorithms across traces. Moreover, S3-FIFO’s efficiency is robust — it has the lowest mean miss ratio on 10 of the 14 datasets. FIFO queues enable S3-FIFO to achieve good scalability with 6× higher throughput compared to optimized LRU at 16 threads. Our insight is that most objects in skewed workloads will only be accessed once in a short window, so it is critical to evict them early (also called quick demotion). The key of S3-FIFO is a small FIFO queue that filters out most objects from entering the main cache, which provides a guaranteed demotion speed and high demotion precision.
Data Infra Meetup | Accelerate Your Trino/Presto Queries - Gain the Alluxio EdgeAlluxio, Inc.
Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Jingwen Ouyang (Product Manager, @Alluxio)
In this session, Jingwen presents an overview of using Alluxio Edge caching to accelerate Trino or Presto queries. She offers practical best practices for using distributed caching with compute engines. In addition, this session also features insights from real-world examples.
Data Infra Meetup | Accelerate Distributed PyTorch/Ray Workloads in the CloudAlluxio, Inc.
Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Siyuan Sheng (Senior Software Engineer, @Alluxio)
- Chunxu Tang (Research Scientist, @Alluxio)
In this session, cloud optimization specialists Chunxu and Siyuan break down the challenges and present a fresh architecture designed to optimize I/O across the data pipeline, ensuring GPUs function at peak performance. The integrated solution of PyTorch/Ray + Alluxio + S3 offers a promising way forward, and the speakers delve deep into its practical applications. Attendees will not only gain theoretical insights but will also be treated to hands-on instructions and demonstrations of deploying this cutting-edge architecture in Kubernetes, specifically tailored for Tensorflow/PyTorch/Ray workloads in the public cloud.
Data Infra Meetup | ByteDance's Native Parquet ReaderAlluxio, Inc.
Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Shengxuan Liu (Software Engineer, @ByteDance)
Shengxuan Liu from ByteDance presents the new ByteDance’s native Parquet Reader. The talk covers the architecture and key features of the Reader, and how the new Reader is able to facilitate data processing efficiency.
Data Infra Meetup | Uber's Data Storage EvolutionAlluxio, Inc.
Data Infra Meetup
Jan. 25, 2024
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Jing Zhao (Principal Engineer, @Uber)
Uber builds one of the biggest data lakes in the industry, which stores exabytes of data. In this talk, we will introduce the evolution of our data storage architecture, and delve into multiple key initiatives during the past several years.
Specifically, we will introduce:
- Our on-prem HDFS cluster scalability challenges and how we solved them
- Our efficiency optimizations that significantly reduced the storage overhead and unit cost without compromising reliability and performance
- The challenges we are facing during the ongoing Cloud migration and our solutions
Alluxio Monthly Webinar | Why NFS/NAS on Object Storage May Not Solve Your AI...Alluxio, Inc.
Alluxio Monthly Webinar
Nov. 15, 2023
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Tarik Bennett (Senior Solutions Engineer)
- Beinan Wang (Senior Staff Engineer & Architect)
Many companies are working with development architectures for AI platforms but have concerns about efficiency at scale as data volumes increase. They use centralized cloud data lakes, like S3, to store training data for AI platforms. However, GPU shortages add more complications. Storage and compute can be separate, or even remote, making data loading slow and expensive:
1) Optimizing a developmental setup can include manual copies, which are slow and error-prone
2) Directly transferring data across regions or from cloud to on-premises can incur expensive egress fees
This webinar covers solutions to improve data loading for model training. You will learn:
- The data loading challenges with distributed infrastructure
- Typical solutions, including NFS/NAS on object storage, and why they are not the best options
- Common architectures that can improve data loading and cost efficiency
- Using Alluxio to accelerate model training and reduce costs
AI Infra Day | Accelerate Your Model Training and Serving with Distributed Ca...Alluxio, Inc.
AI Infra Day
Oct. 25, 2023
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Adit Madan (Director of Product Management, @Alluxio)
In this session, Adit Madan, Director of Product Management at Alluxio, presents an overview of using distributed caching to accelerate model training and serving. He explores the requirements of data access patterns in the ML pipeline and offers practical best practices for using distributed caching in the cloud. This session features insights from real-world examples, such as AliPay, Zhihu, and more.
AI Infra Day | The AI Infra in the Generative AI EraAlluxio, Inc.
AI Infra Day
Oct. 25, 2023
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Bin Fan (Cheif Architect, VP of Open Source, @Alluxio)
As the AI landscape rapidly evolves, the advancements in generative AI technologies, such as ChatGPT, are driving a need for a robust AI infra stack. This opening keynote will explore the key trends of the AI infra stack in the generative AI era.
Navigating the Metaverse: A Journey into Virtual Evolution"Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms."
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Enhancing Project Management Efficiency_ Leveraging AI Tools like ChatGPT.pdfJay Das
With the advent of artificial intelligence or AI tools, project management processes are undergoing a transformative shift. By using tools like ChatGPT, and Bard organizations can empower their leaders and managers to plan, execute, and monitor projects more effectively.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
In software engineering, the right architecture is essential for robust, scalable platforms. Wix has undergone a pivotal shift from event sourcing to a CRUD-based model for its microservices. This talk will chart the course of this pivotal journey.
Event sourcing, which records state changes as immutable events, provided robust auditing and "time travel" debugging for Wix Stores' microservices. Despite its benefits, the complexity it introduced in state management slowed development. Wix responded by adopting a simpler, unified CRUD model. This talk will explore the challenges of event sourcing and the advantages of Wix's new "CRUD on steroids" approach, which streamlines API integration and domain event management while preserving data integrity and system resilience.
Participants will gain valuable insights into Wix's strategies for ensuring atomicity in database updates and event production, as well as caching, materialization, and performance optimization techniques within a distributed system.
Join us to discover how Wix has mastered the art of balancing simplicity and extensibility, and learn how the re-adoption of the modest CRUD has turbocharged their development velocity, resilience, and scalability in a high-growth environment.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
Enabling big data & AI workloads on the object store at DBS
1. Enabling big data & AI workloads on the
object store at DBS
Vitaliy Baklikov (DBS Bank)
Dipti Borkar (Alluxio)
Strata Data New York
2019
2. About DBS
• Headquartered in Singapore
• Largest bank in South East Asia
• Present in 18 markets globally,
including 6 priority markets
• Singapore, Hong Kong, China,
India, Indonesia and Taiwan
• We have a very cool digiBank app
• And lots lots lots of data systems
3. DBS – Top 20 Business Transformations of the Last Decade
4. AWS EnginesOnprem Engines
HDFS
Object Store
Evolution of Data Platforms at DBS
Generation 1
• Boxed data
• Monolithic/Closed Systems
• Proprietary HW/SW
• Data for Targeted Use Cases
Generation 2
• Big Data Explosion
• Hadoop Data Lakes
• Commodity HW and Hadoop
Ecosystem
• Compute tied to Storage
Generation 3
• Data Democratization
• Cloud Native platform… Hybrid! Multi!
• Open Source Engines
• Burst compute in the cloud with data
on-prem for compliance
• AI/ML Centric
Teradata
Informatica
SAS
HadoopTeradata
Informatica
SAS
Teradata
Informatica
SAS
Hadoop
5. Challenges
1. Data Lake built on local Object Store
– Expensive rename operation
– Object listing is slow
– Variable performance
– Data locality is gone
2. Multiple Data Silos
3. Limited on-premise compute capacity
– Legacy ITIL processes for Infra provisioning
– No dynamic scale out/in
6. Alluxio at DBS
Mount HDFS from other
platforms into common
Alluxio cluster
Unified
Namespace
Object store
Analytics
Hybrid
cloud bursting
Caching layer for hot
data to speed up Presto
and Spark jobs
Extend Alluxio cluster into
AWS VPC
Run EMR for model training
and bring the results back to
on-prem
7. Advanced Analytics on Object Storage
The Use Case
• Cash-in-Transit use case
• Forecast cash replenishment schedule for each ATM
• Produce a delivery graph by 4am each morning
The Challenges
• Strict fixed SLA
• Need to load all history data for the forecasting model… daily!
8. Alluxio at DBS
Mount HDFS from other
platforms into common
Alluxio cluster
Unified
Namespace
Object store
Analytics
Hybrid
cloud bursting
Caching layer for hot
data to speed up Presto
and Spark jobs
Extend Alluxio cluster into
AWS VPC
Run EMR for model training
and bring the results back to
on-prem
9. Burst processing into the Cloud
The Use Case
– Call Center project
– Millions of calls annually
– Why do our customers call us?
– What do they do before picking up the phone?
• Reconstruct customer journey
• Predict the reason for the call
The Challenges
– Transcript quality
– Need lots of compute
• >30TB of clickstream, transaction, customer, and product data
• >20TB of audio files
– Need dynamic compute for training and analysis
– Data needs to reside on-prem for compliance
13. Data Orchestration for the Cloud
Java File API HDFS Interface S3 Interface REST APIPOSIX Interface
HDFS Driver Swift Driver S3 Driver NFS Driver
Enable innovation with any frameworks
running on data stored anywhere
Data Analyst
Data Engineer
Storage Ops
Data Scientist
Lines of Business
14. Data Orchestration for the Cloud
Java File API HDFS Interface S3 Interface REST APIPOSIX Interface
HDFS Driver Swift Driver S3 Driver NFS Driver
Enable innovation with any frameworks
running on data stored anywhere
15. Problem: HDFS cluster is compute-
bound & complex to maintain
AWS Public Cloud IaaS
Spark Presto Hive TensorFlow
Alluxio Data Orchestration and Control Service
On Premises
Connectivity
Datacenter
Spark Presto Hive
Tensor
Flow
Alluxio Data Orchestration and Control Service
Barrier 1: Prohibitive network latency
and bandwidth limits
• Makes hybrid analytics unfeasible
Barrier 2: Copying data to cloud
• Difficult to maintain copies
• Data security and governance
• Costs of another silo
Step 1: Hybrid Cloud for Burst Compute Capacity
• Orchestrates compute access to on-prem data
• Working set of data, not FULL set of data
• Local performance
• Scales elastically
• On-Prem Cluster Offload (both Compute & I/O)
Step 2: Online Migration of Data Per Policy
• Flexible timing to migrate, with less dependencies
• Instead of hard switch over, migrate at own pace
• Moves the data per policy – e.g. last 7 days
“Zero-copy” bursting to the cloud
16. Using Alluxio with AWS EMR
Presto Hive
Instances
Metadata &
Data cache
Presto Hive
Metadata &
Data cache
HDFS HDFSEMRF
S
EMRF
S
Compute-driven
Continuous sync
Compute-driven
Continuous sync
17. Spark Presto Hive TensorFlow
RAM
Framework
Read file /trades/us
Bucket Trades Bucket Customers
Data requests
Feature Highlight: Data Caching for faster compute
Read file /trades/us again Read file /trades/top
Read file /trades/top
Variable latency
with throttling
Read file /trades/us again Read file /trades/top
Read file /trades/top
Read file /trades/us again Read file /trades/top
Read file /trades/top
Read file /trades/us again Read file /trades/top
Read file /trades/top
Read file /trades/us again
18. Spark Presto Hive TensorFlow
RAM
SSD
Disk
Framework
Bucket Trades Bucket Customers
Data requests
Feature Highlight - Intelligent Tiering for resource efficiency
Read file /customers/145
Out of memory
Variable latency
with throttling
Data moved to another tier
19. Spark Presto Hive TensorFlow
RAM
SSD
Disk
Framework
New Trades
Policy Defined Move data > 90 days old to
Feature Highlight – Policy-driven Data Management
S3 Standard
Policy interval : Every day
Policy applied everyday
HDFS