This document provides an update on the Ceph community. Key points include Ceph Days being held in Chicago and other locations to focus on the community, metrics showing growth in code, trackers, and forums, and work continuing on CephFS, object classes, and testing. Governance discussions are also ongoing.
RBD, the RADOS Block Device in Ceph, gives you virtually unlimited scalability (without downtime), high performance, intelligent balancing and self-healing capabilities that traditional SANs can't provide. Ceph achieves this higher throughput through a unique system of placing objects across multiple nodes, and adaptive load balancing that replicates frequently accessed objects over more nodes. This talk will give a brief overview of the Ceph architecture, current integration with Apache CloudStack, and recent advancements with Xen and blktap2.
There have been heaping piles of buzz surrounding Ceph and OpenStack lately. Similar amounts of work have been going into the integration between Ceph and OpenStack in recent versions. We'll take a look at how this work is making all the awesomeness of Ceph available to users in a simple, intuitive, and powerful way. The world of Havana and beyond is certainly no different, and promises to continue the trend of both functionality and buzz-worthiness.
This talk, given at the OpenStack meetup in Boston (Aug 14, 2013), gives a brief introduction to Ceph for the uninitiated and takes a look at what's coming down the road. The short term of Havana has plenty to keep fans of both platforms happy and busy, but there are plenty more interesting problems that we can tackle. In addition to the concrete short-term work, we'll take a look at how less-oft-used pieces of the Ceph platform can help augment your OpenStack setup, some general blue-sky thinking, and what the community can do to get involved.
2. ONE YEAR SINCE ACQUISITION
Focused on being non-disruptive
RHEL / Fedora / CentOS coverage
Maintaining Ubuntu / SUSE coverage
Co-Existing with Gluster
Bi-directional learning is fun!
3. CEPH DAYS
You’re here!
Turning up the heat
100% Community-focused
Upcoming
Shanghai
Tokyo
Melbourne
6. USER COMMITTEE
Started after Dumpling
Current chair: Wido den Hollander
Periodic meetings to discuss community matters
Most recently:
Release cadence
Contributor credits
Global mirror network
First steps towards broader governance
7. Google Summer of Code
Ceph project’s second year
Ending soon
6 slots, used 4
Last year
Wireshark
Reliability Model
Work published on Ceph wiki
Outreachy too
Get involved!
8. CentOS Storage SIG
Creating a collection of packages
http://wiki.centos.org/SpecialInterestGroup/Storage/Proposal
Easier to deploy CentOS as a storage node
Early days
Packages for Ceph & Gluster
Stalled -- get involved!
12. librados
Many more deployments/apps using direct librados
Native library for accessing RADOS
librados.so shared library
C, C++, Python, Erlang, Haskell, PHP, Java (JNA)
Direct data path to storage nodes
Speaks native Ceph protocol with cluster
Exposes
Mutable objects
Rich per-object API and data model
Hides
Data distribution, migration, replication, failures
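The split described on this slide (the application sees named, mutable objects; the library hides placement, replication and failures) can be sketched with the Python binding. This is a minimal sketch rather than code from the deck: the config path, the pool name `example-pool`, the object name and the `origin` attribute are assumptions, and `roundtrip` is a hypothetical helper. Running `main()` needs the `rados` module that ships with Ceph and a reachable cluster.

```python
def roundtrip(ioctx, name, payload):
    """Store an object, attach a per-object attribute, and read it back.

    `ioctx` is an I/O context bound to one pool. Nothing here names a
    storage node: data placement is left to the library.
    """
    ioctx.write_full(name, payload)           # create or overwrite the whole object
    ioctx.set_xattr(name, "origin", b"demo")  # hypothetical attribute key and value
    return ioctx.read(name)                   # read the object data back


def main(conffile="/etc/ceph/ceph.conf", pool="example-pool"):
    # Both defaults are assumptions; adjust them for a real cluster.
    import rados  # Ceph's Python binding, imported lazily so roundtrip() has no hard dependency

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            print(roundtrip(ioctx, "greeting", b"hello from librados"))
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

Calling `main()` on a host with a valid `ceph.conf` and keyring should print the payload that was just written. Keeping the object logic in `roundtrip` makes it easy to exercise against a stand-in context without a cluster.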
13. CephFS
Lots of hard work!
Dogfooding
Code
src/mds: 366 commits, 19,417 lines added/removed
src/client: 131 commits, 4,289 lines
src/tools/cephfs: 41 commits, 4,179 lines
ceph-qa-suite: 4,842 added lines of FS-related Python
Tracker
108 FS tickets resolved since Firefly (of which 97 were created since Firefly)
83 tickets currently open for FS, of which 35 were created since Firefly
31 feature tickets resolved
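The tracker figures above also imply how the FS backlog has moved since Firefly. A small worked calculation, assuming every FS ticket is either resolved or still open (the slide does not say how rejected or duplicate tickets are counted), so the derived values are estimates and not numbers from the deck:

```python
# Figures quoted on the slide.
resolved_total = 108  # FS tickets resolved since Firefly
resolved_new = 97     # ...of which were created since Firefly
open_total = 83       # FS tickets currently open
open_new = 35         # ...of which were created since Firefly

# Derived values (assumption: a ticket is either resolved or open).
created_since_firefly = resolved_new + open_new    # 132
resolved_old = resolved_total - resolved_new       # 11 pre-Firefly tickets resolved
open_old = open_total - open_new                   # 48 pre-Firefly tickets still open
backlog_at_firefly = resolved_old + open_old       # 59 tickets open at the Firefly release
backlog_growth = open_total - backlog_at_firefly   # 24 more open tickets than at Firefly
new_ticket_resolution_rate = resolved_new / created_since_firefly

print(f"created since Firefly: {created_since_firefly}")
print(f"backlog at Firefly: {backlog_at_firefly}, now: {open_total} ({backlog_growth:+d})")
print(f"share of new tickets already resolved: {new_ticket_resolution_rate:.0%}")
```

Under that assumption, roughly three quarters of the tickets filed since Firefly have already been resolved, while the open backlog has grown by about two dozen.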
15. DEPLOYMENT / ORCHESTRATION
ceph-deploy is a good place to start
Puppet (Ruby, largest mindshare -- devops)
Chef (git, Ruby -- devops)
Ansible (Python, fast -- sysadmins)
Salt (Python, YAML, scalable -- sysadmins)
Juju (any language, YAML -- simple)
Crowbar (started for Dell OpenStack, now FOSS)
16. TESTING
Ceph core lab (Sepia)
Old Dell machines
Old hardware
New Intel donated cluster
Ceph community lab (coming soon-ish)
Seamicro boxes that fell over
Shopping for something new
OpenStack integration
http://dachary.org/?p=3828