AdFin is a company that provides analytics tools that bring transparency to programmatic advertising markets. They developed PetaBucket, a distributed, relational OLAP database that can query a petabyte-scale dataset in seconds. AdFin uses CephFS for scalable storage across petabyte datasets and many nodes, and contributed code that adds local caching support to the Ceph kernel client, improving performance for a workload that queries recent time-series data most frequently.
Who is AdFin?
What special sauce did we build … very large OLAP DB.
Goals:
Have you take a look at CephFS … might be one of the few people talking about it.
Realize that it’s possible for your organization to develop some expertise in-house … and contribute.
Name implies a combination of Advertising + Finance Markets. Two hometown industries (Madison Ave. and Wall St.).
Using tools and knowledge pioneered by the financial industry.
Most media (by volume) is bought and sold programmatically, à la HFT.
It’s an opaque marketplace.
Bloomberg … Information Platform; S&P … Indices; Markit … aggregating market data (CDS).
I am going to keep butchering these analogies.
Pictures of some of the tools we’ve built.
Real-time analysis of your own data and market data.
Run a query, get a result… lots of variables.
Forecasting
The advertising market is larger than the financial market… in terms of volume of transactions.
Each impression is worth a tiny fraction of a penny.
When I looked at the number of transactions for an exchange like the NASDAQ… it’s like 50 million, NYSE 100 million.
A lot of duct tape, but also a lot of efficiency.
This number is not getting smaller. All advertising is going to be digitally bought and sold and that day is coming.
Distributed, relational database for running real-time analytics queries on very large time-series data. KDB on many, many nodes.
Some fun things: it’s a relational model, but not SQL. 90% of queries are sums or group-bys.
Data is sharded into partitions by time and spread across many nodes (see the sketch after these notes).
We get pretty amazing single-node performance: 100s of millions of rows a second per partition.
There’s been a lot of research into this stuff. Based on research into compression, indexing, and query execution, all from the last 3 to 4 years.
For large datasets our goal is to answer in under 10 seconds for really large queries. In reality, most things we do answer in under 1 second.
Why? Because the dataset is huge.
Also, we’re a bit crazy.
Before, we were storing it all on local disks.
Couple problems:
Redundancy?
Can’t grow computation without growing storage, and vice versa.
Looked into Ceph:
Scalable storage, just throw more machines at it… don’t worry about topology too much.
We could separate storage from computation.
No SPOF, redundancy everywhere.
Pretty good speed for DFS.
We can leverage the kernel: the kernel client versus doing it directly in userspace. Page cache, etc. … a common theme.
“Beta company, okay using a beta product.” We can get under the hood.
The early start was a bit rough. There were lots of bugs. We found lots of bugs.
The community was great, especially Yan.
Yan fixed our last bug around the end of 2013… haven’t had a single problem since.
We’re not storing multi-PB yet, but we’ve processed multi-PB and haven’t had a problem.
We lost some performance as a result of this. Network latency, overhead, Ceph overhead.
We can also go even cheaper without Ceph nodes / network.
Our access pattern: write once, read many (mostly true).
The most recent data is used most often (working set larger than RAM, smaller than the full DFS).
The Linux kernel people have really put hundreds of man-years into scalability.
I don’t want to discourage anybody … we did something not smart, picked the hardest problem.
It required us to know a lot of things about Ceph, kernel, concurrency.
I would pick something simpler next time.
There are bugs in the other parts of the kernel?
So one of the reasons we wanted to do this work in the kernel was concurrency, so our benefit was also our PITA.
We got it upstream into the Ceph kernel code base around 3.13.
A bunch of bug fixes from external folks. We’ve exposed issues in the FSCache code.
We’ve fixed a bunch of concurrency bugs that only happen in the error path of FSCache under VMA pressure. A lot of filesystems benefit.
We’re really happy with performance… we’ve made a good bet on the kernel.
We’re able to really drive fscache up to the speed of the disks we have.
So despite the initial learning curve … we want to contribute work.
Where we can leverage our knowledge … performance.
We’ve built a lot of things in our system for improving latency. Learned what to do and what not to do, and where to apply lockless algorithms.
The readv2 syscall … helps all applications that do both IO- and CPU-bound work.
Thanks for listening to me.
Hopefully it was a good story of what we’re up to… how we’re leveraging Ceph.
Hopefully motivating you to help and contribute.
It’s nice to have a vendor you can call up and yell at when things aren’t working, but it’s even better to be able to guide the tool to do what you want.
The Ceph community is great, there’s so many people contributing to so many different projects.