My DevCon 2019 talk discusses how to make it easier to integrate Alfresco with other systems using an event-based approach. Two real world examples are discussed and demonstrated. The first is about reporting against Alfresco metadata. The second is about enriching metadata by running content through a Natural Language Processing (NLP) model. Both solutions work by listening to generic events generated by Alfresco and placed on an Apache Kafka queue. For the reporting example, the Spring Boot consumer subscribes to Kafka events, then fetches metadata via CMIS and indexes that into Elasticsearch. For the NLP example, a separate Spring Boot consumer subscribes to the same events, but in this case, fetches the content, extracts text using Apache Tika, runs the text through Apache OpenNLP, then writes back extracted entities to Alfresco via CMIS. These are relatively simple examples, but illustrate how a de-coupled, asynchronous, event-based approach can make integrating Alfresco with other systems easier.
In this session, we will look first at the rich metadata that documents in your repository have, how to control the mapping of this on to your content model, and some of the interesting things this can deliver. We'll then move on to the content transformation and rendition services, and see how you can easily and powerfully generate a wide range of media from the content you already have.
Alfresco DevCon 2019 (Edinburgh)
"Transforming the Transformers" for Alfresco Content Services (ACS) 6.1 & beyond
https://community.alfresco.com/community/ecm/blog/2019/02/07/alfresco-transform-service-new-with-acs-61
Alfresco provides various content transformation options across the Digital Business Platform (DBP). In this talk, we will explore the new independently-scalable Alfresco Transform Service. This enables a new option for transforms to be asynchronously off-loaded by Alfresco Content Services (ACS).
https://devcon.alfresco.com/speaker/jan-vonka/
Alfresco DevCon 2019 Performance Tools of the TradeLuis Colorado
Discover tips and tools that will help you to keep your Alfresco environment in shape. Most of the best tools are free or Open Source, and this presentation will guide you through the steps to improve the performance of your system.
Moving Gigantic Files Into and Out of the Alfresco RepositoryJeff Potts
This talk is a technical case study showing show Metaversant solved a problem for one of their clients, Noble Research Institute. Researchers at Noble deal with very large files which are often difficult to move into and out of the Alfresco repository.
In this session, we will look first at the rich metadata that documents in your repository have, how to control the mapping of this on to your content model, and some of the interesting things this can deliver. We'll then move on to the content transformation and rendition services, and see how you can easily and powerfully generate a wide range of media from the content you already have.
Alfresco DevCon 2019 (Edinburgh)
"Transforming the Transformers" for Alfresco Content Services (ACS) 6.1 & beyond
https://community.alfresco.com/community/ecm/blog/2019/02/07/alfresco-transform-service-new-with-acs-61
Alfresco provides various content transformation options across the Digital Business Platform (DBP). In this talk, we will explore the new independently-scalable Alfresco Transform Service. This enables a new option for transforms to be asynchronously off-loaded by Alfresco Content Services (ACS).
https://devcon.alfresco.com/speaker/jan-vonka/
Alfresco DevCon 2019 Performance Tools of the TradeLuis Colorado
Discover tips and tools that will help you to keep your Alfresco environment in shape. Most of the best tools are free or Open Source, and this presentation will guide you through the steps to improve the performance of your system.
Moving Gigantic Files Into and Out of the Alfresco RepositoryJeff Potts
This talk is a technical case study showing show Metaversant solved a problem for one of their clients, Noble Research Institute. Researchers at Noble deal with very large files which are often difficult to move into and out of the Alfresco repository.
Introducing Apache Kafka - a visual overview. Presented at the Canberra Big Data Meetup 7 February 2019. We build a Kafka "postal service" to explain the main Kafka concepts, and explain how consumers receive different messages depending on whether there's a key or not.
Hyperspace is a recently open-sourced (https://github.com/microsoft/hyperspace) indexing sub-system from Microsoft. The key idea behind Hyperspace is simple: Users specify the indexes they want to build. Hyperspace builds these indexes using Apache Spark, and maintains metadata in its write-ahead log that is stored in the data lake. At runtime, Hyperspace automatically selects the best index to use for a given query without requiring users to rewrite their queries. Since Hyperspace was introduced, one of the most popular asks from the Spark community was indexing support for Delta Lake. In this talk, we present our experiences in designing and implementing Hyperspace support for Delta Lake and how it can be used for accelerating queries over Delta tables. We will cover the necessary foundations behind Delta Lake’s transaction log design and how Hyperspace enables indexing support that seamlessly works with the former’s time travel queries.
ksqlDB is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Stream and provides capabilities for consuming messages from Kafka, analysing these messages in near-realtime with a SQL like language and produce results again to a Kafka topic. By that, no single line of Java code has to be written and you can reuse your SQL knowhow. This lowers the bar for starting with stream processing significantly.
ksqlDB offers powerful capabilities of stream processing, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for most part. This will be done in a live demo on a fictitious IoT sample.
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Flink Forward San Francisco 2022.
At Stripe we have created a complete end to end exactly-once processing pipeline to process financial data at scale, by combining the exactly-once power from Flink, Kafka, and Pinot together. The pipeline provides exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillion level rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by
Xiang Zhang & Pratyush Sharma & Xiaoman Dong
다양한 하둡에코 소프트웨어 성능을 검증하려는 목적으로 성능 테스트 환경을 구성해보았습니다. ELK, JMeter를 활용해 구성했고 Kafka에 적용해 보았습니다.
프로젝트에서 요구되는 성능요건을 고려해 다양한 옵션을 조정해 시뮬레이션 해볼수 있습니다.
처음 적용한 뒤 2년 정도가 지났지만, kafka 만이 아니다 다른 Hadoop eco 및 Custom Solution에도 유용하게 활용 가능하겠습니다.
Video: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x ; This talk for SCaLE11x covers system performance analysis methodologies and the Linux tools to support them, so that you can get the most out of your systems and solve performance issues quickly. This includes a wide variety of tools, including basics like top(1), advanced tools like perf, and new tools like the DTrace for Linux prototypes.
Flink currently features different APIs for bounded/batch (DataSet) and streaming (DataStream) programs. And while the DataStream API can handle batch use cases, it is much less efficient in that compared to the DataSet API. The Table API was built as a unified API on top of both, to cover batch and streaming with the same API, and under the hood delegate to either DataSet or DataStream.
In this talk, we present the latest on the Flink community's efforts to rework the APIs and the stack for better unified batch & streaming experience. We will discuss:
- The future roles and interplay of DataSet, DataStream, and Table API
- The new Flink stack and the abstractions on which these APIs will build
- The new unified batch/streaming sources
- How batch and streaming optimizations differ in the runtime, and what the future interplay of batch and streaming execution could look like
Dennis Wittekind, Confluent, Senior Customer Success Engineer
Perhaps you have heard of Kafka Connect and think it would be a great fit in your application's architecture, but you like to know how things work before you propose them to your team? Perhaps you know enough Connect to be dangerous, but you haven't had the time to really understand all the moving pieces? This meetup talk is for you! We'll briefly introduce Connect to the uninitiated, and then jump in to underlying concepts and considerations you should make when running Connect in production! We'll even run a live demo! What could go wrong!?
https://www.meetup.com/Saint-Louis-Kafka-meetup-group/events/272687113/
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
Apache Flink is a distributed stream processing framework that allows users to process and analyze data in real-time. At LinkedIn, we developed a fully managed stream processing platform on Flink running on K8s to power hundreds of stream processing pipelines in production. This platform is the backbone for other infra systems like Search, Espresso (internal document store) and feature management etc. We provide a rich authoring and testing environment which allows users to create, test, and deploy their streaming jobs in a self-serve fashion within minutes. Users can focus on their business logic, leaving the Flink platform to take care of management aspects such as split deployment, resource provisioning, auto-scaling, job monitoring, alerting, failure recovery and much more. In this talk, we will introduce the overall platform architecture, highlight the unique value propositions that it brings to stream processing at LinkedIn and share the experiences and lessons we have learned.
Alfresco Content Modelling and Policy BehavioursJ V
Alfresco DevCon 2010 (Paris and New York)
This session starts by giving an overview of components of an Alfresco content model. We then examine the various forms of call-backs and hook-points available to the developer and give some examples of how these can be used to enforce custom business logic and model consistency.
Introducing Apache Kafka - a visual overview. Presented at the Canberra Big Data Meetup 7 February 2019. We build a Kafka "postal service" to explain the main Kafka concepts, and explain how consumers receive different messages depending on whether there's a key or not.
Hyperspace is a recently open-sourced (https://github.com/microsoft/hyperspace) indexing sub-system from Microsoft. The key idea behind Hyperspace is simple: Users specify the indexes they want to build. Hyperspace builds these indexes using Apache Spark, and maintains metadata in its write-ahead log that is stored in the data lake. At runtime, Hyperspace automatically selects the best index to use for a given query without requiring users to rewrite their queries. Since Hyperspace was introduced, one of the most popular asks from the Spark community was indexing support for Delta Lake. In this talk, we present our experiences in designing and implementing Hyperspace support for Delta Lake and how it can be used for accelerating queries over Delta tables. We will cover the necessary foundations behind Delta Lake’s transaction log design and how Hyperspace enables indexing support that seamlessly works with the former’s time travel queries.
ksqlDB is a stream processing SQL engine, which allows stream processing on top of Apache Kafka. ksqlDB is based on Kafka Stream and provides capabilities for consuming messages from Kafka, analysing these messages in near-realtime with a SQL like language and produce results again to a Kafka topic. By that, no single line of Java code has to be written and you can reuse your SQL knowhow. This lowers the bar for starting with stream processing significantly.
ksqlDB offers powerful capabilities of stream processing, such as joins, aggregations, time windows and support for event time. In this talk I will present how KSQL integrates with the Kafka ecosystem and demonstrate how easy it is to implement a solution using ksqlDB for most part. This will be done in a live demo on a fictitious IoT sample.
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
Flink Forward San Francisco 2022.
At Stripe we have created a complete end to end exactly-once processing pipeline to process financial data at scale, by combining the exactly-once power from Flink, Kafka, and Pinot together. The pipeline provides exactly-once guarantee, end-to-end latency within a minute, deduplication against hundreds of billions of keys, and sub-second query latency against the whole dataset with trillion level rows. In this session we will discuss the technical challenges of designing, optimizing, and operating the whole pipeline, including Flink, Kafka, and Pinot. We will also share our lessons learned and the benefits gained from exactly-once processing.
by
Xiang Zhang & Pratyush Sharma & Xiaoman Dong
다양한 하둡에코 소프트웨어 성능을 검증하려는 목적으로 성능 테스트 환경을 구성해보았습니다. ELK, JMeter를 활용해 구성했고 Kafka에 적용해 보았습니다.
프로젝트에서 요구되는 성능요건을 고려해 다양한 옵션을 조정해 시뮬레이션 해볼수 있습니다.
처음 적용한 뒤 2년 정도가 지났지만, kafka 만이 아니다 다른 Hadoop eco 및 Custom Solution에도 유용하게 활용 가능하겠습니다.
Video: http://joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x ; This talk for SCaLE11x covers system performance analysis methodologies and the Linux tools to support them, so that you can get the most out of your systems and solve performance issues quickly. This includes a wide variety of tools, including basics like top(1), advanced tools like perf, and new tools like the DTrace for Linux prototypes.
Flink currently features different APIs for bounded/batch (DataSet) and streaming (DataStream) programs. And while the DataStream API can handle batch use cases, it is much less efficient in that compared to the DataSet API. The Table API was built as a unified API on top of both, to cover batch and streaming with the same API, and under the hood delegate to either DataSet or DataStream.
In this talk, we present the latest on the Flink community's efforts to rework the APIs and the stack for better unified batch & streaming experience. We will discuss:
- The future roles and interplay of DataSet, DataStream, and Table API
- The new Flink stack and the abstractions on which these APIs will build
- The new unified batch/streaming sources
- How batch and streaming optimizations differ in the runtime, and what the future interplay of batch and streaming execution could look like
Dennis Wittekind, Confluent, Senior Customer Success Engineer
Perhaps you have heard of Kafka Connect and think it would be a great fit in your application's architecture, but you like to know how things work before you propose them to your team? Perhaps you know enough Connect to be dangerous, but you haven't had the time to really understand all the moving pieces? This meetup talk is for you! We'll briefly introduce Connect to the uninitiated, and then jump in to underlying concepts and considerations you should make when running Connect in production! We'll even run a live demo! What could go wrong!?
https://www.meetup.com/Saint-Louis-Kafka-meetup-group/events/272687113/
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
Apache Flink is a distributed stream processing framework that allows users to process and analyze data in real-time. At LinkedIn, we developed a fully managed stream processing platform on Flink running on K8s to power hundreds of stream processing pipelines in production. This platform is the backbone for other infra systems like Search, Espresso (internal document store) and feature management etc. We provide a rich authoring and testing environment which allows users to create, test, and deploy their streaming jobs in a self-serve fashion within minutes. Users can focus on their business logic, leaving the Flink platform to take care of management aspects such as split deployment, resource provisioning, auto-scaling, job monitoring, alerting, failure recovery and much more. In this talk, we will introduce the overall platform architecture, highlight the unique value propositions that it brings to stream processing at LinkedIn and share the experiences and lessons we have learned.
Alfresco Content Modelling and Policy BehavioursJ V
Alfresco DevCon 2010 (Paris and New York)
This session starts by giving an overview of components of an Alfresco content model. We then examine the various forms of call-backs and hook-points available to the developer and give some examples of how these can be used to enforce custom business logic and model consistency.
Integrating Apache Kafka and Elastic Using the Connect Frameworkconfluent
As a streaming platform, Apache Kafka provides low-latency, high-throughput, fault-tolerant publish and subscribe pipelines and excels at processing streams of real-time events. Kafka provides reliable, millisecond delivery for connecting downstream systems with real-time data.
In this talk, we will show how easy it is to leverage Kafka and the Elasticsearch connector to keep your indices populated with the latest data from the rest of your enterprise, as it changes.
Structured-Streaming-as-a-Service with Kafka, YARN, and Tooling with Jim DowlingDatabricks
Since mid-2016, Spark-as-a-Service has been available to researchers in Sweden from the Rise SICS ICE Data Center at www.hops.site. In this session, Dowling will discuss the challenges in building multi-tenant Spark structured streaming applications on YARN that are metered and easy-to-debug. The platform, called Hopsworks, is in an entirely UI-driven environment built with only open-source software. Learn how they use the ELK stack (Elasticsearch, Logstash and Kibana) for logging and debugging running Spark streaming applications; how they use Grafana and InfluxDB for monitoring Spark streaming applications; and, finally, how Apache Zeppelin can provide interactive visualizations and charts to end-users.
This session will also show how Spark applications are run within a ‘project’ on a YARN cluster with the novel property that Spark applications are metered and charged to projects. Projects are securely isolated from each other and include support for project-specific Kafka topics. That is, Kafka topics are protected from access by users that are not members of the project. In addition, hear about the experiences of their users (over 150 users as of early 2017): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and the novel solutions for helping researchers debug and optimize Spark applications.hear about the experiences of their users (over 150 users as of early 2017): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and the novel solutions for helping researchers debug and optimize Spark applications.afka topics are protected from access by users that are not members of the project. We will also discuss the experiences of our users (over 150 users as of early 2017): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications.
Analitica de datos en tiempo real con Apache Flink y Apache BEAMjavier ramirez
Trabajar en tiempo real con datos que se mueven muy rápido no es trivial, sobre todo con volúmenes de datos elevados. Apache Flink y Apache BEAM están específicamente diseñadas para ese caso de uso. En esta charla te contaré los retos de la analítica en tiempo real, cuál es la arquitectura de Apache Flink, qué es Apace BEAM, y cómo usan estas herramientas empresas para hacer desde procesos triviales hasta gestionar billones de eventos al día con latencias de milisegundos. Por supuesto, haremos una demo :)
Deploying Apache Flume to enable low-latency analyticsDataWorks Summit
The driving question behind redesigns of countless data collection architectures has often been, ?how can we make the data available to our analytical systems faster?? Increasingly, the go-to solution for this data collection problem is Apache Flume. In this talk, architectures and techniques for designing a low-latency Flume-based data collection and delivery system to enable Hadoop-based analytics are explored. Techniques for getting the data into Flume, getting the data onto HDFS and HBase, and making the data available as quickly as possible are discussed. Best practices for scaling up collection, addressing de-duplication, and utilizing a combination streaming/batch model are described in the context of Flume and Hadoop ecosystem components.
No Docker? No Problem: Automating installation and config with AnsibleJeff Potts
In this talk I show how to bring stability and repeatability to your Alfresco installation by automating install and config management with Ansible.
This talk was originally given at Alfresco DevCon 2020 (virtual edition).
Designing Event-Driven Applications with Apache NiFi, Apache Flink, Apache Spark
DevNexus 2022 Atlanta
https://devnexus.com/presentations/7150/
This talk is a quick overview of the How, What and WHY of Apache Pulsar, Apache Flink and Apache NiFi. I will show you how to design event-driven applications that scale the cloud native way.
This talk was done live in person at DevNexus across from the booth in room 311
Tim Spann
Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
FlinkForward Asia 2019 - Evolving Keystone to an Open Collaborative Real Time...Zhenzhong Xu
Netflix is obsessed with customer joy, we relentlessly focus on product experience and high-quality content. In recent years, we have been making heavy investments in the tech-driven studio and content production. As a result, a lot of unique challenges arise in the real-time data infrastructure space. For example, in a microservices architecture, domain entities are spread in different applications and persistence storages, this made low latency consistent operational reporting and entity searching especially challenging.
In this talk, we’ll talk about some interesting use cases, the various challenges lay in the fundamentals of distributed systems, and how did we solve them. We will also discuss the learnings, things we could’ve done differently, and the new vision towards an open self-serving Data Mesh platform that empowers our partners and users to build flexible real-time data pipelines.
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Flink Forward
http://flink-forward.org/kb_sessions/multi-tenant-flink-as-a-service-on-yarn/
Since June 2016, Flink-as-a-service has been available to researchers and companies in Sweden from the Swedish ICT SICS Data Center at www.hops.site using the HopsWorks platform. Flink applications can be either deployed as jobs (batch or streaming) or written and run directly from Apache Zeppelin on YARN. Flink applications are run within a project on a YARN cluster with the novel property that Flink applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics that are protected from access by users that are not members of the project. Hopsworks is entirely UI-driven, is open-source, and Flink applications that include Kafka topics can be created in a few mouse clicks. In this talk we will discuss the challenges in building a metered version of Flink-as-a-Service for YARN, experiences with Flink-on-YARN, and some of the possibilities that Hopsworks opens up for building secure, multi-ten
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
Presentation @ Oracle Code Berlin.
Independent of the source of data, the integration of event streams into an Enterprise Architecture gets more and more important in the world of sensors, social media streams and Internet of Things. Events have to be accepted quickly and reliably, they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. How can we make sure that all these events are accepted and forwarded in an efficient and reliable way? This is where Apache Kafaka comes into play, a distirbuted, highly-scalable messaging broker, build for exchanging huge amounts of messages between a source and a target. This session will start with an introduction of Apache and presents the role of Apache Kafka in a modern data / information architecture and the advantages it brings to the table.
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
A presentation cum workshop on Real time Analytics with Apache Kafka and Apache Spark. Apache Kafka is a distributed publish-subscribe messaging while other side Spark Streaming brings Spark's language-integrated API to stream processing, allows to write streaming applications very quickly and easily. It supports both Java and Scala. In this workshop we are going to explore Apache Kafka, Zookeeper and Spark with a Web click streaming example using Spark Streaming. A clickstream is the recording of the parts of the screen a computer user clicks on while web browsing.
Lesfurest.com invited me to talk about the KAPPA Architecture style during a BBL.
Kappa architecture is a style for real-time processing of large volumes of data, combining stream processing, storage, and serving layers into a single pipeline. It's different from the Lambda architecture, uses separate batch and stream processing pipelines.
Kafka Connect & Kafka Streams/KSQL - the ecosystem around KafkaGuido Schmutz
After a quick overview and introduction of Apache Kafka, this session cover two components which extend the core of Apache Kafka: Kafka Connect and Kafka Streams/KSQL.
Kafka Connects role is to access data from the out-side-world and make it available inside Kafka by publishing it into a Kafka topic. On the other hand, Kafka Connect is also responsible to transport information from inside Kafka to the outside world, which could be a database or a file system. There are many existing connectors for different source and target systems available out-of-the-box, either provided by the community or by Confluent or other vendors. You simply configure these connectors and off you go.
Kafka Streams is a light-weight component which extends Kafka with stream processing functionality. By that, Kafka can now not only reliably and scalable transport events and messages through the Kafka broker but also analyse and process these event in real-time. Interestingly Kafka Streams does not provide its own cluster infrastructure and it is also not meant to run on a Kafka cluster. The idea is to run Kafka Streams where it makes sense, which can be inside a “normal” Java application, inside a Web container or on a more modern containerized (cloud) infrastructure, such as Mesos, Kubernetes or Docker. Kafka Streams has a lot of interesting features, such as reliable state handling, queryable state and much more. KSQL is a streaming engine for Apache Kafka, providing a simple and completely interactive SQL interface for processing data in Kafka.
Paolo Castagna is a Senior Sales Engineer at Confluent. His background is on 'big data' and he has, first hand, saw the shift happening in the industry from batch to stream processing and from big data to fast data. His talk will introduce Kafka Streams and explain why Apache Kafka is a great option and simplification for stream processing.
An overview of extension points in Kubernetes. Extend Kubernetes using API Aggregation, Custom Resource Definitions and your own Controllers. Kubernetes Meetup Frankfurt, March 25th 2019 at Meshcloud GmbH
Similar to Moving From Actions & Behaviors to Microservices (20)
Flexible Permissions Management with ACL TemplatesJeff Potts
This is was presented as an ignite-style lightning talk at DevCon 2018 in Lisbon. It discusses an open source add-on called ACL Templates which can be used to separate ACL settings from code.
I gave this talk in April 2016 at BeeCon, the Alfresco Community conference. It discusses what would happen if Alfresco Software, Inc., the commercial open source company behind Alfresco were to cease to exist. Would Alfresco as an open source project survive? The talk is light on bullet/text so you may prefer to find a recording to get the full context.
Connecting Content Management Apps with CMISJeff Potts
Discusses ECM interoperability achieved when content repositories like Nuxeo implement a specification called Content Management Interoperability Services (CMIS).
Several years ago, James Dixon, CTO at Pentaho, published a paper called, “The Bees and the Trees: The Beekeeper Model of Commercial Open Source Software”. I have found that this metaphor is hugely helpful in explaining commercial open source to people. So in this talk I introduce James' model, then I use it as the context to discuss the state of the Alfresco community.
A recording of this session lives here: http://www.youtube.com/watch?v=_NSsz-sjbzg
Content Management Interoperability Services (CMIS) is the preferred API for writing code against Alfresco. This presentation explains how to get started using CMIS and covers some of the gotchas and limitations you should be aware of before you commit to CMIS for your project. This presentation was originally delivered at Alfresco Summit 2013 in Barcelona and Boston.
Apache Chemistry in Action: Using CMIS and your favorite language to unlock c...Jeff Potts
This presentation shows how the CMIS specification and Apache Chemistry can be used to create content-centric applications that work with any CMIS-compliant repository.
Alfresco: The Story of How Open Source Disrupted the ECM MarketJeff Potts
The early 90's saw the rise of powerful, inexpensive team collaboration software on one hand and huge document management systems on the other. Open source and cloud have brought us full circle. Today's businesses can implement extremely powerful productivity enhancing solutions quickly and easily. Alfresco capitalized on this trend. It used open source to get to the market quickly. It delivered functionality on par with legacy ECM as open source. Today, however, it is not just an open source alternative to things like Documentum and SharePoint, it is a visionary in the ECM market. This presentation tells that story, putting into context the things happening in ECM, collaboration, open source, and cloud from the 1990's to present day.
This presentation shows how the CMIS specification and Apache Chemistry can be used to create content-centric applications that work with any CMIS-compliant repository.
Building Content-Rich Java Apps in the Cloud with the Alfresco APIJeff Potts
This presentation, originally delivered at JavaOne on October 2, 2012, talks about why you should use Alfresco instead of rolling your own content repository and discusses the new public Alfresco API for writing content apps that persist content to Alfresco in the Cloud.
A brief introduction to the CMIS spec and some tips and tricks for developers new to CMIS. Demos showed how to install and use cmislib, the Python API for CMIS, and OpenCMIS, the Java API. Both projects are part of Apache Chemistry. Originally given as part of an Alfresco webinar. Recording: http://blogs.alfresco.com/wp/webcasts/2012/01/getting-started-with-cmis-2/
In the past, developers have chosen to develop their own content-centric apps from scratch or by leveraging low level libraries. A content repository like Alfresco can save time and cost. Even if you don't choose Alfresco, you should still consider leveraging a standard API like CMIS as much as possible.
Presented at the Alfresco South Africa User Group on September 14. Covers a high-level look at the state of the ECM industry including a "Top 5 Takeaways" from AIIM's State of the Industry report.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Globus
The Earth System Grid Federation (ESGF) is a global network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. Many of these petabyte-scale data archives are located in proximity to large high-performance computing (HPC) or cloud computing resources, but the primary workflow for data users consists of transferring data, and applying computations on a different system. As a part of the ESGF 2.0 US project (funded by the United States Department of Energy Office of Science), we developed pre-defined data workflows, which can be run on-demand, capable of applying many data reduction and data analysis to the large ESGF data archives, transferring only the resultant analysis (ex. visualizations, smaller data files). In this talk, we will showcase a few of these workflows, highlighting how Globus Flows can be used for petabyte-scale climate analysis.
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
First Steps with Globus Compute Multi-User EndpointsGlobus
In this presentation we will share our experiences around getting started with the Globus Compute multi-user endpoint. Working with the Pharmacology group at the University of Auckland, we have previously written an application using Globus Compute that can offload computationally expensive steps in the researcher's workflows, which they wish to manage from their familiar Windows environments, onto the NeSI (New Zealand eScience Infrastructure) cluster. Some of the challenges we have encountered were that each researcher had to set up and manage their own single-user globus compute endpoint and that the workloads had varying resource requirements (CPUs, memory and wall time) between different runs. We hope that the multi-user endpoint will help to address these challenges and share an update on our progress here.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxrickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus
As part of the DOE Integrated Research Infrastructure (IRI) program, NERSC at Lawrence Berkeley National Lab and ALCF at Argonne National Lab are working closely with General Atomics on accelerating the computing requirements of the DIII-D experiment. As part of the work the team is investigating ways to speedup the time to solution for many different parts of the DIII-D workflow including how they run jobs on HPC systems. One of these routes is looking at Globus Compute as a way to replace the current method for managing tasks and we describe a brief proof of concept showing how Globus Compute could help to schedule jobs and be a tool to connect compute at different facilities.
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteGoogle
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-pilot-review/
AI Pilot Review: Key Features
✅Deploy AI expert bots in Any Niche With Just A Click
✅With one keyword, generate complete funnels, websites, landing pages, and more.
✅More than 85 AI features are included in the AI pilot.
✅No setup or configuration; use your voice (like Siri) to do whatever you want.
✅You Can Use AI Pilot To Create your version of AI Pilot And Charge People For It…
✅ZERO Manual Work With AI Pilot. Never write, Design, Or Code Again.
✅ZERO Limits On Features Or Usages
✅Use Our AI-powered Traffic To Get Hundreds Of Customers
✅No Complicated Setup: Get Up And Running In 2 Minutes
✅99.99% Up-Time Guaranteed
✅30 Days Money-Back Guarantee
✅ZERO Upfront Cost
See My Other Reviews Article:
(1) TubeTrivia AI Review: https://sumonreview.com/tubetrivia-ai-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Moving From Actions & Behaviors to Microservices
1. Moving from Actions & Behaviors to Microservices
Jeff Potts, Metaversant
@jeffpotts01
2. How do we make it easier to
integrate Alfresco with other
systems?
3. Learn. Connect. Collaborate.
“We want to be able to report against metadata in real-time.”
“When this custom property changes we need to notify this other system.”
“We want to improve how Alfresco transforms Word documents into HTML.”
“When content changes we want to run it through an NLP model.”
“Our company has an enterprise search solution that needs to index Alfresco content.”
“We want to replicate content between multiple Alfresco servers.”
Recurring customer requirements
4. Learn. Connect. Collaborate.
Traditional approaches run in-process
• Custom Alfresco Actions
– Java, deployed to Alfresco WAR
– Triggered by rule on a folder, a UI action, or by a schedule
• Custom Alfresco Behaviors
– Java, deployed to Alfresco WAR
– Bound to a policy on a class of nodes (e.g., specific type or aspect)
• Custom Web Scripts
– Java or JavaScript, deployed to Alfresco WAR
– Triggered by a REST call
• All of these run in Alfresco’s process
5. Learn. Connect. Collaborate.
Tradeoffs of the traditional approach
• Advantages
– Full access to the Alfresco API
– Runs as the authenticated user or as the system user
– Code is managed with the content model and other customizations
• Disadvantages
– Performance risk
– Requires server restart to deploy
– Requires an Alfresco developer familiar with Alfresco API
• Java & JavaScript are the only practical language options
– Long-running tasks may block user interface
– Scales as Alfresco scales
7. Learn. Connect. Collaborate.
Event-based integration approach
• Alfresco can be extended to generate generic events when something
happens to a node
• Interested systems
– Listen for Alfresco events
– Filter out what they don’t care about
– Fetch additional data from Alfresco and perform custom logic as needed
• Additional systems can be added without touching Alfresco
• Systems can use different frameworks & languages
• Independently scalable
• Can use Alfresco Kafka as a starting point
8. Learn. Connect. Collaborate.
Apache Kafka
Alfresco
Microservice
Event
Event
Microservice
Event
Microservice
Event
Move logic out of Alfresco into microservices
alfresco-
kafka
Kafka
Client JAR
18. Learn. Connect. Collaborate.
Example: Alfresco reporting
• Customer: “We want to be able to report against metadata in real-time.”
• Solution:
– Spring Boot microservice consumes Alfresco Kafka events
– When a node changes that is interesting, it fetches the metadata using CMIS
– Indexes metadata into Elasticsearch
– Kibana dashboard used to visualize data
• Demo: https://youtu.be/jGZVfP5L8yU
19. Learn. Connect. Collaborate.
Indexer Service
• Small Spring Boot app
• Runs in a servlet container
• Listens for Alfresco Kafka events
• Fetches the Alfresco Node as
JSON
• Indexes the Node JSON into
Elasticsearch
• Deletes objects from
Elasticsearch when DELETE
events occur
Apache Kafka
Alfresco
Elasticsearch Cluster
alf-es-indexer
alfresco-
kafka
Kafka
Client JAR
Event
Event
CMIS GET
Node JSON
Node JSON
24. Learn. Connect. Collaborate.
Example: Natural Language Processing
• Customer: “I want to be able to enrich Alfresco metadata by extracting
people, places, and names from content using an NLP model”
• Solution:
– Spring Boot microservice consumes Alfresco Kafka events
– When a node with a “marker” aspect changes, the microservice fetches the
content
– Fingerprints are used to avoid repeatedly processing the same content
– Text is extracted using Apache Tika
– Extracted text is run through Apache OpenNLP to extract people and places
– People and places are written to Alfresco content metadata via CMIS
• Demo: https://youtu.be/H-2TgoUijzY
25. Learn. Connect. Collaborate.
NLP Enricher Service
• Small Spring Boot app
• Runs in a servlet container
• Listens for Alfresco Kafka events
• Fetches Alfresco content
• Extracts people, places, and orgs
• Writes metadata back to Alfresco Apache Kafka
Alfresco
alf-nlp-enricher
alfresco-
kafka
Kafka
Client JAR
Event
Event
CMIS GET
Node JSON
CMIS POST
31. Learn. Connect. Collaborate.
Apache Kafka
Alfresco
Microservice
Event
Event
Microservice
Event
Microservice
Event
Move logic out of Alfresco into microservices
alfresco-
kafka
Kafka
Client JAR
32. Learn. Connect. Collaborate.
Other potential uses
• Full-text search indexing into standalone search engine
• Synchronizing content with other servers
• Improved HTML transformations
• Notification/subscription service
• Chat integration
33. Learn. Connect. Collaborate.
Event-based approach disadvantages
• More code/complexity than traditional approach
• User feedback/notification is not straightforward
• Potentially increases the number of “containers” in the IT shop
34. Learn. Connect. Collaborate.
Event-based approach advantages
• In-line with Alfresco’s stated architectural direction
• Reduces the amount of code running in Alfresco’s process
– Reduces the number of deployments required to support integrations
– Off-loads long-running and/or process-intensive integrations from Alfresco
– Scales independently of Alfresco
• Integrations are more loosely-coupled from Alfresco
– Requires less Alfresco knowledge
– Frees up architectural choices for integrations (not just Java)
• Integration apps are relatively easy to containerize
• Can work alongside traditional approach
35. Learn. Connect. Collaborate.
Demo Dependency Versions
• Alfresco 5.2.g CE & 5.2.3 Enterprise with
– Metaversant Alfresco Kafka open source add-on 0.0.2
• Apache Kafka 2.12-0.10.2.1
• Elasticsearch 6.3.2
• Kibana 6.3.2
• Custom Spring Boot applications
– Spring Boot 1.5.8
– Elasticsearch High-level Rest Client 6.3.2
– Tika 1.18
– OpenNLP 1.8.4
– Apache Chemistry 1.0.0
37. Learn. Connect. Collaborate.
See Also
• Apache ManifoldCF
– http://manifoldcf.apache.org/
– Crawler that indexes from repositories like Alfresco into Solr & Elasticsearch
• Apache Stanbol
– http://stanbol.apache.org/
– Semantic engine that can do metadata enhancement and other things
• Apache Camel
– http://camel.apache.org/
– Enterprise integration platform
38. Learn. Connect. Collaborate.
• Consulting firm focused on solving business problems with open source
Content Management, Workflow, & Search technology
• Founded in 2010
• Clients all over the world in a variety of industries, including:
– Airlines & Aerospace
– Manufacturing
– Construction
– Financial Services
– Higher Education
– Life Sciences
– Professional Services
https://www.metaversant.com
39. Moving from Actions &
Behaviors to Microservices
Jeff Potts, Metaversant
@jeffpotts01
Editor's Notes
…without writing one-off integrations that must be deployed to the Alfresco server
…without adding unnecessary performance burden on Alfresco
…without requiring other teams to learn Alfresco internals
Triggers can be action-driven or change-driven or both.
These requirements often met with actions and behaviors.
Event is kept minimal to avoid disclosing sensitive information and to keep the solution as generic as possible.
Alfresco Insight Engine is an interesting alternative, but it requires the latest Alfresco version.
For a production implementation, the past hashes should probably be persisted somewhere instead of being kept in memory