SlideShare a Scribd company logo

BP-3 Taking Your Bulk Content Ingestions to the Next Level

Learn about the Alfresco Bulk Filesystem Import Tool, a community developed extension to Alfresco that provides a high performance bulk import feature. Discover how different tuning parameters affect import performance, and learn how to determine the optimum configuration for your Alfresco environment.

1 of 40
Download to read offline
Peter Monks!
Director of Technology, Strategic Alliances!
Agenda!
 1.  Introduction to the Bulk Filesystem Import Tool!
 2.  Demo!
 3.  Performance analysis!
   1.  Methodology!
   2.  Results!
   3.  Conclusions!
 4.  Roadmap for the Bulk Filesystem Import Tool!
 5.  Q&A!
 6.  Appendices!
Introduction to the

ulk ile ystem mport ool !
      (   for short)!
Introduction to the BFSIT!
        =    ulk ile ystem mport ool
 •  Primary use case: one-off content migration / ingestion!
 •  Provides high-performance import of content!

 •  A community maintained extension to Alfresco!
   •    Hosted on Google Code [1]!
   •    LGPL licensed!
   •    Widely adopted!
Introduction to the BFSIT!
Why not use…
  •    Web UIs?!
  •    ACP Files?!
  •    CIFS, FTP, NFS, WebDAV, IMAP?!
  •    CMIS?!


All of the above suffer from one or more of:
  •  Content sent over network!
  •  External (out of process) orchestration!
  •  Content requires pre-/post-processing (e.g. ACP)!
  •  Chatty (e.g. CIFS, NFS)!
  •  Overly general (e.g. CMIS)!
Introduction to the BFSIT!
Solution
 •     Import content from the Alfresco server!
 •     Load folders & content as they appear on disk!
 •     Content is imported in batches!
 •     The “unit of work” is the directory!
 •  Each directory is imported in at least one batch!
      •  More if lots of content!
 •  Batches within a directory are processed serially!
Ad

Recommended

Postgresql 9.0 HA at LOADAYS 2012
Postgresql 9.0 HA at LOADAYS 2012Postgresql 9.0 HA at LOADAYS 2012
Postgresql 9.0 HA at LOADAYS 2012Julien Pivotto
 
Jenkins User Meetup - eXo usages of Jenkins
Jenkins User Meetup - eXo usages of JenkinsJenkins User Meetup - eXo usages of Jenkins
Jenkins User Meetup - eXo usages of JenkinsArnaud Héritier
 
Steve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres OpenSteve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres OpenPostgresOpen
 
Chaione Ember.js Training
Chaione Ember.js TrainingChaione Ember.js Training
Chaione Ember.js Trainingaortbals
 
Powerpoint: Video and Audio
Powerpoint: Video and AudioPowerpoint: Video and Audio
Powerpoint: Video and Audiosamharlow
 
BP-9 Share Customization Best Practices
BP-9 Share Customization Best PracticesBP-9 Share Customization Best Practices
BP-9 Share Customization Best PracticesAlfresco Software
 
BP-7 Share Customization Best Practices
BP-7 Share Customization Best PracticesBP-7 Share Customization Best Practices
BP-7 Share Customization Best PracticesAlfresco Software
 

More Related Content

Similar to BP-3 Taking Your Bulk Content Ingestions to the Next Level

Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Pythoninfodox
 
BP-8 Global Federation and Search
BP-8 Global Federation and SearchBP-8 Global Federation and Search
BP-8 Global Federation and SearchAlfresco Software
 
Ratpack for Real
Ratpack for RealRatpack for Real
Ratpack for RealTomAkehurst
 
Source version control using subversion
Source version control using subversionSource version control using subversion
Source version control using subversionMangesh Bhujbal
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions Alfresco Software
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data SafeTony Tam
 
Netty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and BuffersNetty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and BuffersRick Hightower
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)MongoDB
 
Florian Koch - Monitoring CoreOS with Zabbix
Florian Koch - Monitoring CoreOS with ZabbixFlorian Koch - Monitoring CoreOS with Zabbix
Florian Koch - Monitoring CoreOS with ZabbixZabbix
 
2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...
2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...
2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...PHP Conference Argentina
 
Git vs Subversion: ¿Cuando elegir uno u otro?
Git vs Subversion: ¿Cuando elegir uno u otro?Git vs Subversion: ¿Cuando elegir uno u otro?
Git vs Subversion: ¿Cuando elegir uno u otro?Paradigma Digital
 
Project Basecamp: News From Camp 4
Project Basecamp: News From Camp 4Project Basecamp: News From Camp 4
Project Basecamp: News From Camp 4Digital Bond
 
Docking postgres
Docking postgresDocking postgres
Docking postgresrycamor
 
OSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy HawkinsOSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy HawkinsNETWAYS
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment StrategyMongoDB
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications OpenEBS
 
Introduction to DRBD
Introduction to DRBDIntroduction to DRBD
Introduction to DRBDdawnlua
 
CoconutKit
CoconutKitCoconutKit
CoconutKitdefagos
 

Similar to BP-3 Taking Your Bulk Content Ingestions to the Next Level (20)

Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Python
 
BP-8 Global Federation and Search
BP-8 Global Federation and SearchBP-8 Global Federation and Search
BP-8 Global Federation and Search
 
Ratpack for Real
Ratpack for RealRatpack for Real
Ratpack for Real
 
Source version control using subversion
Source version control using subversionSource version control using subversion
Source version control using subversion
 
Scale your Alfresco Solutions
Scale your Alfresco Solutions Scale your Alfresco Solutions
Scale your Alfresco Solutions
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data Safe
 
Netty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and BuffersNetty Notes Part 2 - Transports and Buffers
Netty Notes Part 2 - Transports and Buffers
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Florian Koch - Monitoring CoreOS with Zabbix
Florian Koch - Monitoring CoreOS with ZabbixFlorian Koch - Monitoring CoreOS with Zabbix
Florian Koch - Monitoring CoreOS with Zabbix
 
2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...
2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...
2013 - Igor Sysoev - NGINx: origen, evolución y futuro - PHP Conference Argen...
 
Git vs Subversion: ¿Cuando elegir uno u otro?
Git vs Subversion: ¿Cuando elegir uno u otro?Git vs Subversion: ¿Cuando elegir uno u otro?
Git vs Subversion: ¿Cuando elegir uno u otro?
 
Extlect03
Extlect03Extlect03
Extlect03
 
Project Basecamp: News From Camp 4
Project Basecamp: News From Camp 4Project Basecamp: News From Camp 4
Project Basecamp: News From Camp 4
 
Docking postgres
Docking postgresDocking postgres
Docking postgres
 
OSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy HawkinsOSDC 2013 | Introduction into Chef by Andy Hawkins
OSDC 2013 | Introduction into Chef by Andy Hawkins
 
Hot sec10 slide-suzaki
Hot sec10 slide-suzakiHot sec10 slide-suzaki
Hot sec10 slide-suzaki
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications Latest (storage IO) patterns for cloud-native applications
Latest (storage IO) patterns for cloud-native applications
 
Introduction to DRBD
Introduction to DRBDIntroduction to DRBD
Introduction to DRBD
 
CoconutKit
CoconutKitCoconutKit
CoconutKit
 

More from Alfresco Software

Alfresco Day Benelux Inholland studentendossier
Alfresco Day Benelux Inholland studentendossierAlfresco Day Benelux Inholland studentendossier
Alfresco Day Benelux Inholland studentendossierAlfresco Software
 
Alfresco Day Benelux Hogeschool Inholland Records Management application
Alfresco Day Benelux Hogeschool Inholland Records Management applicationAlfresco Day Benelux Hogeschool Inholland Records Management application
Alfresco Day Benelux Hogeschool Inholland Records Management applicationAlfresco Software
 
Alfresco Day BeNelux: Customer Success Showcase - Saxion Hogescholen
Alfresco Day BeNelux: Customer Success Showcase - Saxion HogescholenAlfresco Day BeNelux: Customer Success Showcase - Saxion Hogescholen
Alfresco Day BeNelux: Customer Success Showcase - Saxion HogescholenAlfresco Software
 
Alfresco Day BeNelux: Customer Success Showcase - Gemeente Amsterdam
Alfresco Day BeNelux: Customer Success Showcase - Gemeente AmsterdamAlfresco Day BeNelux: Customer Success Showcase - Gemeente Amsterdam
Alfresco Day BeNelux: Customer Success Showcase - Gemeente AmsterdamAlfresco Software
 
Alfresco Day BeNelux: The success of Alfresco
Alfresco Day BeNelux: The success of AlfrescoAlfresco Day BeNelux: The success of Alfresco
Alfresco Day BeNelux: The success of AlfrescoAlfresco Software
 
Alfresco Day BeNelux: Customer Success Showcase - Credendo Group
Alfresco Day BeNelux: Customer Success Showcase - Credendo GroupAlfresco Day BeNelux: Customer Success Showcase - Credendo Group
Alfresco Day BeNelux: Customer Success Showcase - Credendo GroupAlfresco Software
 
Alfresco Day BeNelux: Digital Transformation - It's All About Flow
Alfresco Day BeNelux: Digital Transformation - It's All About FlowAlfresco Day BeNelux: Digital Transformation - It's All About Flow
Alfresco Day BeNelux: Digital Transformation - It's All About FlowAlfresco Software
 
Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...
Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...
Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...Alfresco Software
 
Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...
Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...
Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...Alfresco Software
 
Alfresco Day Vienna 2016: Alfrescos neue Rest API
Alfresco Day Vienna 2016: Alfrescos neue Rest APIAlfresco Day Vienna 2016: Alfrescos neue Rest API
Alfresco Day Vienna 2016: Alfrescos neue Rest APIAlfresco Software
 
Alfresco Day Vienna 2016: Support Tools für die Admin-Konsole
Alfresco Day Vienna 2016: Support Tools für die Admin-KonsoleAlfresco Day Vienna 2016: Support Tools für die Admin-Konsole
Alfresco Day Vienna 2016: Support Tools für die Admin-KonsoleAlfresco Software
 
Alfresco Day Vienna 2016: Entwickeln mit Alfresco
Alfresco Day Vienna 2016: Entwickeln mit AlfrescoAlfresco Day Vienna 2016: Entwickeln mit Alfresco
Alfresco Day Vienna 2016: Entwickeln mit AlfrescoAlfresco Software
 
Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...
Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...
Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...Alfresco Software
 
Alfresco Day Vienna 2016: Partner Lightning Talk: Westernacher
Alfresco Day Vienna 2016: Partner Lightning Talk: WesternacherAlfresco Day Vienna 2016: Partner Lightning Talk: Westernacher
Alfresco Day Vienna 2016: Partner Lightning Talk: WesternacherAlfresco Software
 
Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...
Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...
Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...Alfresco Software
 
Alfresco Day Vienna 2016: Partner Lightning Talk - it-novum
Alfresco Day Vienna 2016: Partner Lightning Talk - it-novumAlfresco Day Vienna 2016: Partner Lightning Talk - it-novum
Alfresco Day Vienna 2016: Partner Lightning Talk - it-novumAlfresco Software
 
Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...
Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...
Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...Alfresco Software
 
Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...
Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...
Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...Alfresco Software
 
Alfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - Safran
Alfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - SafranAlfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - Safran
Alfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - SafranAlfresco Software
 
Alfresco Day Warsaw 2016: Advancing the Flow of Digital Business
Alfresco Day Warsaw 2016: Advancing the Flow of Digital BusinessAlfresco Day Warsaw 2016: Advancing the Flow of Digital Business
Alfresco Day Warsaw 2016: Advancing the Flow of Digital BusinessAlfresco Software
 

More from Alfresco Software (20)

Alfresco Day Benelux Inholland studentendossier
Alfresco Day Benelux Inholland studentendossierAlfresco Day Benelux Inholland studentendossier
Alfresco Day Benelux Inholland studentendossier
 
Alfresco Day Benelux Hogeschool Inholland Records Management application
Alfresco Day Benelux Hogeschool Inholland Records Management applicationAlfresco Day Benelux Hogeschool Inholland Records Management application
Alfresco Day Benelux Hogeschool Inholland Records Management application
 
Alfresco Day BeNelux: Customer Success Showcase - Saxion Hogescholen
Alfresco Day BeNelux: Customer Success Showcase - Saxion HogescholenAlfresco Day BeNelux: Customer Success Showcase - Saxion Hogescholen
Alfresco Day BeNelux: Customer Success Showcase - Saxion Hogescholen
 
Alfresco Day BeNelux: Customer Success Showcase - Gemeente Amsterdam
Alfresco Day BeNelux: Customer Success Showcase - Gemeente AmsterdamAlfresco Day BeNelux: Customer Success Showcase - Gemeente Amsterdam
Alfresco Day BeNelux: Customer Success Showcase - Gemeente Amsterdam
 
Alfresco Day BeNelux: The success of Alfresco
Alfresco Day BeNelux: The success of AlfrescoAlfresco Day BeNelux: The success of Alfresco
Alfresco Day BeNelux: The success of Alfresco
 
Alfresco Day BeNelux: Customer Success Showcase - Credendo Group
Alfresco Day BeNelux: Customer Success Showcase - Credendo GroupAlfresco Day BeNelux: Customer Success Showcase - Credendo Group
Alfresco Day BeNelux: Customer Success Showcase - Credendo Group
 
Alfresco Day BeNelux: Digital Transformation - It's All About Flow
Alfresco Day BeNelux: Digital Transformation - It's All About FlowAlfresco Day BeNelux: Digital Transformation - It's All About Flow
Alfresco Day BeNelux: Digital Transformation - It's All About Flow
 
Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...
Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...
Alfresco Day Vienna 2016: Activiti – ein Katalysator für die DMS-Strategie be...
 
Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...
Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...
Alfresco Day Vienna 2016: Elektronische Geschäftsprozesse auf Basis von Alfre...
 
Alfresco Day Vienna 2016: Alfrescos neue Rest API
Alfresco Day Vienna 2016: Alfrescos neue Rest APIAlfresco Day Vienna 2016: Alfrescos neue Rest API
Alfresco Day Vienna 2016: Alfrescos neue Rest API
 
Alfresco Day Vienna 2016: Support Tools für die Admin-Konsole
Alfresco Day Vienna 2016: Support Tools für die Admin-KonsoleAlfresco Day Vienna 2016: Support Tools für die Admin-Konsole
Alfresco Day Vienna 2016: Support Tools für die Admin-Konsole
 
Alfresco Day Vienna 2016: Entwickeln mit Alfresco
Alfresco Day Vienna 2016: Entwickeln mit AlfrescoAlfresco Day Vienna 2016: Entwickeln mit Alfresco
Alfresco Day Vienna 2016: Entwickeln mit Alfresco
 
Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...
Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...
Alfresco Day Vienna 2016: Activiti goes enterprise: Die Evolution der BPM Sui...
 
Alfresco Day Vienna 2016: Partner Lightning Talk: Westernacher
Alfresco Day Vienna 2016: Partner Lightning Talk: WesternacherAlfresco Day Vienna 2016: Partner Lightning Talk: Westernacher
Alfresco Day Vienna 2016: Partner Lightning Talk: Westernacher
 
Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...
Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...
Alfresco Day Vienna 2016: Bringing Content & Process together with the App De...
 
Alfresco Day Vienna 2016: Partner Lightning Talk - it-novum
Alfresco Day Vienna 2016: Partner Lightning Talk - it-novumAlfresco Day Vienna 2016: Partner Lightning Talk - it-novum
Alfresco Day Vienna 2016: Partner Lightning Talk - it-novum
 
Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...
Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...
Alfresco Day Vienna 2016: How to Achieve Digital Flow in the Enterprise - Joh...
 
Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...
Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...
Alfresco Day Warsaw 2016 - Czy możliwe jest spełnienie wszystkich regulacji p...
 
Alfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - Safran
Alfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - SafranAlfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - Safran
Alfresco Day Warsaw 2016: Identyfikacja i podpiselektroniczny - Safran
 
Alfresco Day Warsaw 2016: Advancing the Flow of Digital Business
Alfresco Day Warsaw 2016: Advancing the Flow of Digital BusinessAlfresco Day Warsaw 2016: Advancing the Flow of Digital Business
Alfresco Day Warsaw 2016: Advancing the Flow of Digital Business
 

Recently uploaded

Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceSusan Ibach
 
AI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvementAI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvementMimmo Squillace
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxInfosec
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Product School
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVARobert McDermott
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringMassimo Talia
 
Imaging and Design for the Online Environment Part 1.pptx
Imaging and Design for the Online Environment Part 1.pptxImaging and Design for the Online Environment Part 1.pptx
Imaging and Design for the Online Environment Part 1.pptxPower Point
 
Apex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxApex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxmohayyudin7826
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor FesenkoFwdays
 
10 things that helped me advance my career - PHP UK Conference 2024
10 things that helped me advance my career - PHP UK Conference 202410 things that helped me advance my career - PHP UK Conference 2024
10 things that helped me advance my career - PHP UK Conference 2024Thijs Feryn
 
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions..."How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...Fwdays
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys VasylievFwdays
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...DianaGray10
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro KozhevinFwdays
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?MENGSAYLOEM1
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Umar Saif
 
My sample product research idea for you!
My sample product research idea for you!My sample product research idea for you!
My sample product research idea for you!KivenRaySarsaba
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfkatalinjordans1
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, GoogleISPMAIndia
 

Recently uploaded (20)

Confoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data scienceConfoo 2024 Gettings started with OpenAI and data science
Confoo 2024 Gettings started with OpenAI and data science
 
AI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvementAI Act & Standardization: UNINFO involvement
AI Act & Standardization: UNINFO involvement
 
How AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptxHow AI and ChatGPT are changing cybersecurity forever.pptx
How AI and ChatGPT are changing cybersecurity forever.pptx
 
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
Cultivating Entrepreneurial Mindset in Product Management: Strategies for Suc...
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVA
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
Imaging and Design for the Online Environment Part 1.pptx
Imaging and Design for the Online Environment Part 1.pptxImaging and Design for the Online Environment Part 1.pptx
Imaging and Design for the Online Environment Part 1.pptx
 
Apex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptxApex Replay Debugger and Salesforce Platform Events.pptx
Apex Replay Debugger and Salesforce Platform Events.pptx
 
"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko"Platform Engineering with Development Containers", Igor Fesenko
"Platform Engineering with Development Containers", Igor Fesenko
 
10 things that helped me advance my career - PHP UK Conference 2024
10 things that helped me advance my career - PHP UK Conference 202410 things that helped me advance my career - PHP UK Conference 2024
10 things that helped me advance my career - PHP UK Conference 2024
 
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions..."How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
"How we created an SRE team in Temabit as a part of FOZZY Group in conditions...
 
"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev"AIRe - AI Reliability Engineering", Denys Vasyliev
"AIRe - AI Reliability Engineering", Denys Vasyliev
 
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
Automation Ops Series: Session 1 - Introduction and setup DevOps for UiPath p...
 
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
"DevOps Practisting Platform on EKS with Karpenter autoscaling", Dmytro Kozhevin
 
Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?Are Human-generated Demonstrations Necessary for In-context Learning?
Are Human-generated Demonstrations Necessary for In-context Learning?
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
Progress Report: Ministry of IT under Dr. Umar Saif Aug 23-Feb'24
 
My sample product research idea for you!
My sample product research idea for you!My sample product research idea for you!
My sample product research idea for you!
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdf
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
 

BP-3 Taking Your Bulk Content Ingestions to the Next Level

  • 1. Peter Monks! Director of Technology, Strategic Alliances!
  • 2. Agenda! 1.  Introduction to the Bulk Filesystem Import Tool! 2.  Demo! 3.  Performance analysis! 1.  Methodology! 2.  Results! 3.  Conclusions! 4.  Roadmap for the Bulk Filesystem Import Tool! 5.  Q&A! 6.  Appendices!
  • 3. Introduction to the
 ulk ile ystem mport ool ! ( for short)!
  • 4. Introduction to the BFSIT! = ulk ile ystem mport ool •  Primary use case: one-off content migration / ingestion! •  Provides high-performance import of content! •  A community maintained extension to Alfresco! •  Hosted on Google Code [1]! •  LGPL licensed! •  Widely adopted!
  • 5. Introduction to the BFSIT! Why not use… •  Web UIs?! •  ACP Files?! •  CIFS, FTP, NFS, WebDAV, IMAP?! •  CMIS?! All of the above suffer from one or more of: •  Content sent over network! •  External (out of process) orchestration! •  Content requires pre-/post-processing (e.g. ACP)! •  Chatty (e.g. CIFS, NFS)! •  Overly general (e.g. CMIS)!
  • 6. Introduction to the BFSIT! Solution •  Import content from the Alfresco server! •  Load folders & content as they appear on disk! •  Content is imported in batches! •  The “unit of work” is the directory! •  Each directory is imported in at least one batch! •  More if lots of content! •  Batches within a directory are processed serially!
  • 7. Introduction to the BFSIT! Usage: 1.  Initiated via a simple repo Web Script:! (can also be initiated via wget, curl, et al)! 2.  Import runs in background! 3.  Detailed status is displayed while in progress! •  Weʼll see that during the demo!
  • 8. Introduction to the BFSIT! Process details: •  Place source directory on job queue & immediately return! •  Worker thread pulls a single directory off job queue, and:! 1.  Lists the contents of the directory! 2.  Groups entries into “importable items”! 3.  Filters importable items, based on admin-defined filtering rules! 4.  Subdivides list of importable items into batches! 5.  Imports batches, one at a time (serially)! 6.  Places all subdirectories onto the job queue!
  • 9. Introduction to the BFSIT! Process details: •  Place source directory on job queue & immediately return! •  Worker thread pulls a single directory off job queue, and:! 1.  Lists the contents of the directory! 2.  Groups entries into “importable items”! I/O   3.  Filters importable items, based on admin-defined filtering rules! Bound   Phases   4.  Subdivides list of importable items into batches! 5.  Imports batches, one at a time (serially)! CPU   6.  Places all subdirectories onto the job queue! Bound   Phases  
  • 10. Demo!
  • 12. Performance Analysis:
 Methodology!
  • 13. Goals and Test Plan! Goals: •  Benchmark total time taken for bulk imports, using combination of:! •  Machine environments! •  Source content sets! •  Alfresco repository configurations! •  Bulk import tool configurations! Test Plan: •  Parallel testing in 2 environments! •  Two runs per test per machine:! 1.  Import into fresh (empty) repository! 2.  Delete target folder then re-import (without restarting Alfresco)! •  Record average of duration of each run! •  Modify only one configuration parameter at a time, resetting earlier modifications in between!
  • 14. Environments! Environment 1 Environment 2 •  2009 model MacBook Pro! •  2006 model Thinkpad T60! •  2.8Ghz dual-core CPU! •  2.33Ghz dual-core CPU! •  4GB RAM! •  3GB RAM! •  Solid State Drive (Toshiba OEM) ! •  Dual hard drives (Seagate, Hitachi)! •  64bit Mac OSX Lion 10.7.1! •  First used for source directory! •  MySQL 5.1! •  Second used for Alfresco repository! •  Apple JDK 1.6.0_26! •  64bit Ubuntu Natty Narwhal 11.04! •  MySQL 5.1! •  OpenJDK 1.6.0_22! NOTE 1: Neither of these environments are “production grade”! NOTE 2: These environments are not directly comparable!
  • 15. Content Sets! Name   #  Folders   #  Files   Total  Size   Notes   Typical   38   4,640   1.44GB   Extreme  File  Size   1   9   4.41GB   Extreme  File  Volume   4   11,100   521.7KB   Extreme  Directory   1,021   0   0B   100  levels  of  nes8ng   Structure  
  • 17. Baseline! Notes: •  Repository tuned as per Day Zero Config Guide! •  BFSIT has default configuration! Observations: •  Environment 2 is significantly slower at creating cm:folder nodes! •  Theory: creating cm:folder nodes is “seeky” (more on this later)!
  • 18. Disable User Quotas! Observations: •  Quota calculation performance proportional to number of cm:content nodes! •  Quota calculation performance not affected by content size!
  • 19. Disable In-txn Indexing! Notes: •  This configuration is not compatible with Share 3.x!! Observations: •  Transactional indexing slows Alfresco down a lot, particularly in environment 2! •  Theory: indexing is highly “seeky”!
  • 20. Disable Indexing Entirely! Notes: •  This configuration is not compatible with Share 3.x!! •  This configuration functionally cripples Alfresco!! Observations: •  Some contention between ingestions & indexing (even async)! •  Theory: SOLR integration in 4.x should provide similar performance!
  • 21. Optimal Repository Configuration! Optimal repository configuration, without functionally crippling Alfresco, is: •  Disable user quotas:! system.usages.enabled=false •  Disable in-transaction indexing:! index.tracking.disableInTransactionIndexing=true alfresco.cluster.name=dummyCluster •  Indexing still occurs, just not synchronously in-transaction! •  Incompatible with Share 3.x, but can be disabled temporarily during import, then re-enabled post-import!
  • 22. Optimal Repository Configuration – Results! Notes: •  This configuration is not compatible with Share 3.x!! Observations: •  Slower environment (2) benefits more than the faster environment (1)! •  Configuration canʼt speed up import of large files! •  Requires faster storage devices (e.g. RAID 10)!
  • 25. Worker Thread Pool Sizes! Notes: •  Baseline is optimal repository configuration! •  Only the “Typical” content set was used for testing! Observations: •  Multi-threading is mostly irrelevant! •  Not surprising, given ingestion is I/O bound! •  Steady improvement in environment 1! •  Theory: concurrent I/O support in SSD!
  • 26. Batch Weights! Observations: •  Larger batches = better performance! …HOWEVER…! •  UI responsiveness got worse! •  A classic trade-off! •  Ultimately, performance similar to baseline (batch weight = 100)!
  • 27. Optimal BFSIT Configuration! Optimal BFSIT configuration: •  High thread count (mostly irrelevant):! alfresco-bulk-filesystem-import.threadpool.size.core=48 alfresco-bulk-filesystem-import.threadpool.size.max=48 •  More importantly, high batch weight:! alfresco-bulk-filesystem-import.batch.weight=1000 •  Impacts UI responsiveness! •  Could reduce if needed, at little cost!
  • 28. Optimal BFSIT Configuration - Results! Observations: •  Modest improvement over baseline! •  Implies default BFSIT configuration is close to optimal!
  • 30. Rethinking the Problem! What if the BFSIT didn’t have to stream content into the repository at all? What if the source content was already in the contentstore and only had to be “linked” into the repository?
  • 31. In-place Import! Notes: •  Baseline is optimal repository configuration! •  Optimal repository & BFSIT configuration! Observations: •  Improvement across the board! •  Best improvement is extreme file size case – 375X faster!!
  • 33. Performance Analysis:
 Conclusions !
  • 34. Conclusions! Results: •  Minimum improvement of 6%! •  Average improvement of 60%! •  Maximum improvement of 99.7%! •  In absolute terms, saw performance of up to:! •  16GB / sec! •  120 nodes / sec! Recall this wasn’t on production hardware!!
  • 35. Conclusions! Developers: •  Macro-optimization will always outperform micro-optimization!! •  Multi-threading is not a magic bullet! Itʼs only helpful if a given operation is CPU bound and can be parallelised.! Administrators: •  Use the Day Zero Configuration Guide for every install you do!! •  Donʼt assume superficially similar environments will perform similarly! •  For bulk ingestions Alfresco is (mostly) I/O bound!
  • 37. BFSIT Roadmap! Official roadmap is on the Google Code project’s wiki [2]. BFSIT v1.1 – Performance: •  Issue #91: Optimization of directory analysis phase [complete].! •  Issue #8: Multi-threaded imports [complete].! •  Issue #86: In-place imports [complete].! •  Issue #77: graphical display of throughput.! •  Issue #17: Test various different dimensions to see how they affect performance [complete – this talk!]! BFSIT v1.2 – Alfresco 4.0, Usability & Performance: •  Issue #92: Test on Alfresco 4.0! •  Issue #26: Integrate into Share's administration console! •  Issue #94: Investigate use of Alfresco's BatchProcessor framework for the multi-threaded importer! •  Issue #96: Measure performance of alternative batching strategies! •  Issue #79: Reimplement the bulk filesystem import as a subsystem! •  Issue #62: Add support for cm:content properties! BFSIT v1.3+: •  You tell me – Iʼm always keen to hear feedback!! •  The issues list [3] and mailing list [4] are great ways to start getting involved in the project!
  • 38. References! [1] http://code.google.com/p/alfresco-bulk-filesystem-import/ [2] http://code.google.com/p/alfresco-bulk-filesystem-import/wiki/Roadmap [3] http://code.google.com/p/alfresco-bulk-filesystem-import/issues/list [4] http://groups.google.com/group/alfresco-bulk-filesystem-import Also: •  http://blogs.alfresco.com/wp/pmonks/2009/10/22/bulk-import-from-a-filesystem/! •  Sessions:! •  BP-1 – Performance Tuning! •  BP-6 – Repository Customization Best Practices! •  BP-9 – Share Customization Best Practices!
  • 40. Appendix A – “Typical” Content Set Distributions!