The document discusses content extraction with Apache Tika. It introduces Tika and describes how it can be used to extract full text and metadata from various file formats. It also discusses using Tika with Solr and Lucene, including feeding parsed content directly into a Lucene index and using the ExtractingRequestHandler with Solr. Special considerations for large documents and link extraction are also covered.
DNS hijacking using cloud providers – No verification neededFrans Rosén
This is my talk from OWASP Appsec EU and also Security Fest 2017.
A few years ago, Frans and his team posted an article on Detectify Labs regarding domain hijacking using services like AWS, Heroku and GitHub. These issues still remains and are still affecting a lot of companies. Jonathan Claudius from Mozilla even calls “Subdomain takeover” “the new XSS”. Since then, many tools have popped up to spot these sorts of vulnerabilities. Frans will go through both the currently disclosed and the non-disclosed ways to take control over domains and will share the specific techniques involved.
PSConfEU - Offensive Active Directory (With PowerShell!)Will Schroeder
This talk covers PowerShell for offensive Active Directory operations with PowerView. It was given on April 21, 2016 at the PowerShell Conference EU 2016.
This presentation was given at BSides Austin '15, and is an expanded version of the "I hunt sys admins" Shmoocon firetalk. It covers various ways to hunt for users in Windows domains, including using PowerView.
A story of the passive aggressive sysadmin of AEMFrans Rosén
# By Frans Rosén
Adobe Experience Manager is an enterprise CMS with a troubled history. It was created with the angle of high customization factor, enabling consulting firms to deploy it all over the world for huge customers.
Then came security.
Frans will go through some terrible default configuration mistakes, Adobe’s love for bad Flash and how a sysadmin accidentialy exposed an international multi billion dollar company using only sad thoughts.
# About speaker
Frans Rosén is a tech entrepreneur, bug bounty hunter and a Security Advisor at Detectify, a security service for developers. He’s a frequent blogger at Detectify Labs and a top ranked participant of bug bounty programs, receiving some of the highest bounty payouts ever on HackerOne.
Frans was recently featured as #2 on Hackread’s list of 10 Famous Bug Bounty Hunters of All Time and the results of his security research has been covered in numerous international publications such as Observer, BBC, Ars Technica, Wired and Mashable.
DNS hijacking using cloud providers – No verification neededFrans Rosén
This is my talk from OWASP Appsec EU and also Security Fest 2017.
A few years ago, Frans and his team posted an article on Detectify Labs regarding domain hijacking using services like AWS, Heroku and GitHub. These issues still remains and are still affecting a lot of companies. Jonathan Claudius from Mozilla even calls “Subdomain takeover” “the new XSS”. Since then, many tools have popped up to spot these sorts of vulnerabilities. Frans will go through both the currently disclosed and the non-disclosed ways to take control over domains and will share the specific techniques involved.
PSConfEU - Offensive Active Directory (With PowerShell!)Will Schroeder
This talk covers PowerShell for offensive Active Directory operations with PowerView. It was given on April 21, 2016 at the PowerShell Conference EU 2016.
This presentation was given at BSides Austin '15, and is an expanded version of the "I hunt sys admins" Shmoocon firetalk. It covers various ways to hunt for users in Windows domains, including using PowerView.
A story of the passive aggressive sysadmin of AEMFrans Rosén
# By Frans Rosén
Adobe Experience Manager is an enterprise CMS with a troubled history. It was created with the angle of high customization factor, enabling consulting firms to deploy it all over the world for huge customers.
Then came security.
Frans will go through some terrible default configuration mistakes, Adobe’s love for bad Flash and how a sysadmin accidentialy exposed an international multi billion dollar company using only sad thoughts.
# About speaker
Frans Rosén is a tech entrepreneur, bug bounty hunter and a Security Advisor at Detectify, a security service for developers. He’s a frequent blogger at Detectify Labs and a top ranked participant of bug bounty programs, receiving some of the highest bounty payouts ever on HackerOne.
Frans was recently featured as #2 on Hackread’s list of 10 Famous Bug Bounty Hunters of All Time and the results of his security research has been covered in numerous international publications such as Observer, BBC, Ars Technica, Wired and Mashable.
Here Be Dragons: The Unexplored Land of Active Directory ACLsAndy Robbins
Presented by Andy Robbins, Rohan Vazarkar, and Will Schroeder at DerbyCon 7.0: Legacy, in Louisville, Kentucky, 2017.
See the video recording of the presentation here: https://www.youtube.com/watch?v=mfaFuXEiLF4
October OpenNTF Webinar - What we like about Domino/Notes 12, recommended new...Howard Greenberg
In this webinar OpenNTF members will discuss the Domino/Notes 12 features they like and suggest for everyone to check out!
The topics and speakers will be:
Time-based One-time Authentication (TOTP) - Roberto Boccadoro
TOTP allows multi-factor authentication. When users login to a Domino web server they have to provide a time-based one-time use password in addition to their usual name/password. This is done using a third party application like Google Authenticator, Authy or Duo Mobile on their mobile devices/computers.
Domino OSGI Tasklet Service (DOTS) - Serdar Basegmez
Create Domino server tasks using Java OSGI plugins. These can be scheduled and can interface with the server console using TELL commands.
One Touch Setup for Domino - Roberto Boccadoro
In previous versions of HCL Domino, setting up a Domino server involved multiple steps. Starting with Domino 12, you can use one-touch Domino setup to set up a server in just a single step.
Oracle 21c: New Features and Enhancements of Data Pump & TTSChristian Gohmann
At the end of the year 2020, Oracle released 21c on its Cloud infrastructure. The on-premises version will follow later this year. As with every new Oracle version, the Data Pump utility gets new features or enhancements for existing features.
This presentation gives an overview of the enhancements of Data Pump and Transportable Tablespaces. The following list is an excerpt of the points I will talk about
- Simultaneous use of EXCLUDE and INCLUDE
- Parallelized import of metadata during a TTS import operation
- Checksum support for dump files
- Direct access to Oracle Cloud Object Store for exports and imports
Thick Client Penetration Testing
You will learn how to do pentesting of Thick client applications on a local and network level, You will also learn how to analyze the internal communication between web services & API.
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
In this session, we will look first at the rich metadata that documents in your repository have, how to control the mapping of this on to your content model, and some of the interesting things this can deliver. We’ll then move on to the content transformation and rendition services, and see how you can easily and powerfully generate a wide range of media from the content you already have. Finally, we’ll look at how to extend these services to support additional formats.
This along with the binaries to be found at my github profiles @ github.com/shahenshah99 is used to present and conduct a hands-on session on securely deploying containers in docker at the time of production
Here Be Dragons: The Unexplored Land of Active Directory ACLsAndy Robbins
Presented by Andy Robbins, Rohan Vazarkar, and Will Schroeder at DerbyCon 7.0: Legacy, in Louisville, Kentucky, 2017.
See the video recording of the presentation here: https://www.youtube.com/watch?v=mfaFuXEiLF4
October OpenNTF Webinar - What we like about Domino/Notes 12, recommended new...Howard Greenberg
In this webinar OpenNTF members will discuss the Domino/Notes 12 features they like and suggest for everyone to check out!
The topics and speakers will be:
Time-based One-time Authentication (TOTP) - Roberto Boccadoro
TOTP allows multi-factor authentication. When users login to a Domino web server they have to provide a time-based one-time use password in addition to their usual name/password. This is done using a third party application like Google Authenticator, Authy or Duo Mobile on their mobile devices/computers.
Domino OSGI Tasklet Service (DOTS) - Serdar Basegmez
Create Domino server tasks using Java OSGI plugins. These can be scheduled and can interface with the server console using TELL commands.
One Touch Setup for Domino - Roberto Boccadoro
In previous versions of HCL Domino, setting up a Domino server involved multiple steps. Starting with Domino 12, you can use one-touch Domino setup to set up a server in just a single step.
Oracle 21c: New Features and Enhancements of Data Pump & TTSChristian Gohmann
At the end of the year 2020, Oracle released 21c on its Cloud infrastructure. The on-premises version will follow later this year. As with every new Oracle version, the Data Pump utility gets new features or enhancements for existing features.
This presentation gives an overview of the enhancements of Data Pump and Transportable Tablespaces. The following list is an excerpt of the points I will talk about
- Simultaneous use of EXCLUDE and INCLUDE
- Parallelized import of metadata during a TTS import operation
- Checksum support for dump files
- Direct access to Oracle Cloud Object Store for exports and imports
Thick Client Penetration Testing
You will learn how to do pentesting of Thick client applications on a local and network level, You will also learn how to analyze the internal communication between web services & API.
EXPLAIN ANALYZE is a new query profiling tool first released in MySQL 8.0.18. This presentation covers how this new feature works, both on the surface and on the inside, and how you can use it to better understand your queries, to improve them and make them go faster.
This presentation is for everyone who has ever had to understand why a query is executed slower than anticipated, and for everyone who wants to learn more about query plans and query execution in MySQL.
In this session, we will look first at the rich metadata that documents in your repository have, how to control the mapping of this on to your content model, and some of the interesting things this can deliver. We’ll then move on to the content transformation and rendition services, and see how you can easily and powerfully generate a wide range of media from the content you already have. Finally, we’ll look at how to extend these services to support additional formats.
This along with the binaries to be found at my github profiles @ github.com/shahenshah99 is used to present and conduct a hands-on session on securely deploying containers in docker at the time of production
Mythical Mysfits: Monolith to Microservice with Docker and AWS Fargate (CON21...Amazon Web Services
Help our Mythical Mysfits find their forever homes! Our Mythical stack is aging and needs to be revamped ASAP. Join this workshop to get hands-on experience with Docker as you containerize the Mythical monolithic application, start breaking it apart into microservices, and deploy it using AWS Fargate. This is a foundational workshop on containers. No Docker experience required. Basic AWS experience recommended. For more advanced workshops in this series, consider CON321 and CON322.
Since the release of 17.05, Docker has introduced Multi-Stage Build for Docker Images for anyone who has struggled to optimize Dockerfiles while keeping them easy to read and maintain. This builder pattern will help anyone who would just like to have the runtime, configuration & application and doesn’t want to have compilers, debuggers, code, build, test logs etc.
Using Fedora Commons To Create A Persistent ArchivePhil Cryer
With the increasing amount of digital data and demand for open access to view and reuse such data continually increasing, the adoption of open source digital repository software is critical for long term storage and management of digital objects. By utilizing the open source Fedora Commons software, the Missouri Botanical Garden has created a stable, persistent archive for Tropicos digital objects, including specimen images, plant photos, and other digital media. Metadata, organized in standard Dublin Core extracted from Tropicos, are stored alongside the digital objects providing search and sharing of data via open standards such as REST and OAI, opening the door for mash-ups and alternative uses. The presentation will cover initial discovery, required hardware and software, and an overview of our experience implementing Fedora Commons. Lessons learned, pros and cons, and other options will also be covered.
Presentation given at AutoMacon on September 16, 2015 in Portland, OR. This talk covered how to build CoreOS components on Debian using Ansible and a brief demonstration of running an ELKStack application on the new cluster.
Construindo Data Lakes - Visão Prática com Hadoop e BigDataMarco Garcia
Minha apresentação sobre construção de data lakes para bigdata usando hadoop como plataforma de dados. Conheça mais sobre nossos trabalhos de consultoria e treinamento em Hadoop Hortonworks, BigData, Data Warehousing e Business Intelligence
Working Group on Distributed Authoring and Versioning on the World Wide Web
Goal: To enable distributed web authoring tools to be broadly interoperable.
Presentation uploaded by Murali Krishna Nookella
Hadoop and object stores: Can we do it better?gvernik
Strata Data Conference, London, May 2017
Trent Gray-Donald and Gil Vernik explain the challenges of current Hadoop and Apache Spark integration with object stores and discuss Stocator, an open source (Apache License 2.0) object store connector for Hadoop and Apache Spark specifically designed to optimize their performance with object stores. Trent and Gil describe how Stocator works and share real-life examples and benchmarks that demonstrate how it can greatly improve performance and reduce the quantity of resources used.
Presentation of the early prototype of "FAIR Profiles" - an example of the proposed DCAT Profile, proposed by the DCAT working group (but AFAIK never implemented). This prototype emerged from the activity of the "Skunkworks" group, from the Data FAIRport project.
We are excited to continue our work on BeanStalk with the introduction of a range of great new features. If you are a Python shop you'll learn how BeanStalk now supports Python containers and the Django and Flask frameworks. Hear about BeanStalk integration with RDS and how custom configuration of containers is possible through simple configuration files.
Similar to Content extraction with apache tika (20)
AEM6 comes with a fresh new repository backend designed for improved performance and scalability. This session introduces the new repository architecture and describes the key differences and improvements for developers and operations teams. Topics covered include content migration, backwards compatibility, key deployment scenarios and configuration options, and custom search indexes.
Apache development with GitHub and Travis CIJukka Zitting
Much of the recent innovation in development tooling has happened around Git-based cloud services like GitHub and Travis CI. While these services are not part of the official Apache infrastructure, it's still possible to use them to complement the tooling available to Apache projects. Based on experience from Apache Jackrabbit, this presentation shows how to leverage such external services while staying true to Apache principles and policies.
Oak, the architecture of Apache Jackrabbit 3Jukka Zitting
Apache Jackrabbit is just about to reach the 3.0 milestone based on a new architecture called Oak. Based on concepts like eventual consistency and multi-version concurrency control, and borrowing ideas from distributed version control systems and cloud-scale databases, the Oak architecture is a major leap ahead for Jackrabbit. This presentation describes the Oak architecture and shows what it means for the scalability and performance of modern content applications. Changes to existing Jackrabbit functionality are described and the migration process is explained.
/path/to/content - the Apache Jackrabbit content repositoryJukka Zitting
Looking for a database where user profiles and image galleries are equally at home? That comes with built-in full text search, fine-grained access control, flexible schemas, versioning and many more advanced features? Take a look at Apache Jackrabbit, the Java-based content repository that combines the best parts of file systems and databases. This introductory presentation covers Apache Jackrabbit and its hierarchical content model, and shows how it can be used as a powerful foundation of modern content-based applications.
Apache Tika aims to make it easier to extract metadata and structured text content from all kinds of files. Tika is a subproject of Apache Lucene, and leverages libraries like Apache POI and Apache PDFBox to provide a powerful yet simple interface for parsing dozens of document formats. This makes Tika an ideal companion for Apache Lucene, or for any search engine that needs to be able to index metadata and content from many different types of files. This presentation introduces Apache Tika and shows how it's being used in projects like Apache Solr and Apache Jackrabbit. You will learn how to integrate Tika with your application and how to configure and extend Tika to best suit your needs.