This document discusses scaling natural language processing (NLP) tasks by distributing work across multiple processors and machines. It describes running UIMA pipelines on a local cluster managed by Sun Grid Engine (SGE) to parallelize processing of independent documents. The local cluster, called Colfax, has 6 machines with 48 CPU cores and 96GB RAM that can be utilized through SGE job scripts to split work into arrays processed in parallel.
New Ways to Find Latency in Linux Using Tracing - ScyllaDB
Ftrace is the official tracer of the Linux kernel. It originated from the real-time patch (now known as PREEMPT_RT), as developing an operating system for real-time use requires deep insight into and transparency of the happenings of the kernel. Not only was tracing useful for debugging, it was critical for finding areas in the kernel that were causing unbounded latency. It's no wonder the ftrace infrastructure has a lot of tooling for seeking out latency. Ftrace was introduced into mainline Linux in 2008, and several talks have been given on how to utilize its tracing features. But a lot has happened in the past few years that makes the tooling for finding latency much simpler. Other talks at P99 will discuss the new ftrace tracers "osnoise" and "timerlat", but this talk will focus more on the new flexible and dynamic aspects of ftrace that facilitate finding latency issues specific to your needs. Some of this work may still be in a proof-of-concept stage, but this talk will give you the advantage of knowing what tools will be available to you in the coming year.
Odoo Online platform: architecture and challenges - Odoo
A short introduction to the technical architecture of the Odoo Online platform, including the advanced integrated features (instant DNS, email gateways, etc.), and the technical aspect of the SLA.
By Olivier Dony - Lead Developer & Community Manager, OpenERP
All you need to know about the JavaScript event loop - Saša Tatar
Learn the difference between a JavaScript engine and a JavaScript runtime, what the JavaScript event loop is, and why we should care.
At the end, the presentation goes through a couple of examples and implementations of throttle and debounce utility functions.
Overview of MuleSoft's Quartz Connector with sample scenarios; worked examples are also provided in the slides.
Quartz inbound and Quartz outbound endpoints are explained.
Rafał Ostrowski: Node.js is known for its performance when handling many I/O operations. But what happens when we need to handle tasks that make our CPU fans spin up? Is our API then doomed to hiccups? This presentation covers Worker Threads, as well as clusters and processes, which make it easier to build scalable applications.
This is a story about how we struggled to meet strict latency requirements in a service implemented with Scala and Netty, and how we eventually managed to do so.
The most common latency contributors are in-process locking, thread scheduling, I/O, algorithmic inefficiencies and, of course, the garbage collector.
I will share our experience of dealing with these causes, and explain what you can do to prevent them from affecting production.
Today we will take a look at an OCP4 UPI installation on KVM.
Basically, I used the official doc from Red Hat, especially the bare-metal part. So although I use KVM, it is almost the same as bare metal.
To use the UPI method, we need to set up a lot of components, such as DNS, network, load balancer, matchbox and so on. You can configure them all manually, but in order to explain this topic properly, I've developed Ansible and Terraform scripts. In this video, I will explain the prerequisites and how to configure everything, either manually or through the automation.
Oracle Database Appliance: RAC in a Box, Some Strings Attached - Fuad Arshad
Oracle Database Appliance is an engineered system that is geared towards small and medium businesses but has all the bells and whistles for enterprise deployments. This talk will focus on the Oracle Database Appliance both as a consolidation platform and as a development platform for rapidly deploying databases. We will talk about the deployment process, the patching process, and the best practices built into the Oracle Database Appliance. We will also talk about the DBA's role in managing the ODA, as well as security considerations.
With the rapid growth of the production and storage of large scale data sets it is important to investigate methods to drive the cost of storage systems down. We are currently in the midst of an information explosion, and large scale storage centers are increasingly used to help store generated data. There are several methods to bring the cost of large scale storage centers down, and we investigate a technique that focuses on transitioning storage disks into lower power states. This talk introduces a model of disk systems that leverages disk access patterns to produce energy saving opportunities for parallel disk systems. We also focus on the implementation of an energy-efficient storage cluster, where a couple of energy-saving techniques are incorporated. Our modeling and simulation results indicate that large data sizes and knowledge about the disk access pattern are valuable for storage system energy saving techniques. Storage servers that support media-streaming applications are one key area that would benefit from our strategies.
Troubleshooting Complex Performance Issues - Oracle SEG$ contention - Tanel Poder
From Tanel Poder's Troubleshooting Complex Performance Issues series - an example of Oracle SEG$ internal segment contention due to some direct path insert activity.
Lecture 7 of Introduction to Quantum Chemical Simulation, a graduate course taught at MIT in Fall 2014 by Heather Kulik. The course covers wavefunction theory, density functional theory, force fields, molecular dynamics, and sampling.
Working in Web Operations means dealing with production systems that in most cases need to be operational 24x7x365.
To reach 99.99999% uptime, you must fail as little as possible.
This talk will go through a few real-world incidents and failures experienced by our small WebOps team, and outline what we are learning (the hard way), and how we’re trying to improve.
What could possibly go wrong? :-)
Oracle Database performance tuning using oratop - Sandesh Rao
Oratop is a text-based user interface tool for monitoring basic database operations in real-time. This presentation will go into depth on how to use the tool and some example scenarios. It can be used for both RAC and single-instance databases and in combination with top to get a more holistic view of system performance and identify any bottlenecks.
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and GPU instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ... - StampedeCon
Learn how to model beyond traditional direct access in Apache Cassandra. Utilizing the DataStax platform to harness the power of Spark and Solr to perform search, analytics, and complex operations in place on your Cassandra data!
EuroBSDcon 2017: System Performance Analysis Methodologies - Brendan Gregg
Keynote by Brendan Gregg. "Traditional performance monitoring makes do with vendor-supplied metrics, often involving interpretation and inference, and with numerous blind spots. Much in the field of systems performance is still living in the past: documentation, procedures, and analysis GUIs built upon the same old metrics. Modern BSD has advanced tracers and PMC tools, providing virtually endless metrics to aid performance analysis. It's time we really used them, but the problem becomes which metrics to use, and how to navigate them quickly to locate the root cause of problems.
There's a new way to approach performance analysis that can guide you through the metrics. Instead of starting with traditional metrics and figuring out their use, you start with the questions you want answered then look for metrics to answer them. Methodologies can provide these questions, as well as a starting point for analysis and guidance for locating the root cause. They also pose questions that the existing metrics may not yet answer, which may be critical in solving the toughest problems. System methodologies include the USE method, workload characterization, drill-down analysis, off-CPU analysis, chain graphs, and more.
This talk will discuss various system performance issues, and the methodologies, tools, and processes used to solve them. Many methodologies will be discussed, from the production proven to the cutting edge, along with recommendations for their implementation on BSD systems. In general, you will learn to think differently about analyzing your systems, and make better use of the modern tools that BSD provides."
UKOUG version of a presentation trying to establish the sensible limits of parallelism on a couple of hardware configurations. Detailed white paper is at http://oracledoug.com/px_slaves.pdf
SRV402 Deep Dive on Amazon EC2 Instances, Featuring Performance Optimization ... - Amazon Web Services
Amazon EC2 provides a broad selection of instance types to accommodate a diverse mix of workloads. In this session, we provide an overview of the Amazon EC2 instance platform, key platform features, and the concept of instance generations. We dive into the current generation design choices of the different instance families, including the General Purpose, Compute Optimized, Storage Optimized, Memory Optimized, and Accelerated Computing (GPU and FPGA) instance families. We also detail best practices and share performance tips for getting the most out of your Amazon EC2 instances.
Testing Persistent Storage Performance in Kubernetes with Sherlock - ScyllaDB
Understanding your Kubernetes storage capabilities is important in order to run a proper cluster in production. In this session I will demonstrate how to use Sherlock, an open source platform written to test persistent NVMe/TCP storage in Kubernetes, either via synthetic workloads or via a variety of databases, all easily done and summarized to give you an estimate of the IOPS, latency, and throughput your storage can provide to the Kubernetes cluster.
2. The Scaling Problem
“Does the solution scale?” asks whether larger versions of the problem (often more data) can be dealt with by a given piece of software. “Scaling” is a loose collection of techniques to improve or implement a solution’s scalability. The choice of techniques depends on the critical resource (CPU, memory, or I/O) and on how easily the task is broken into pieces. This talk focuses on scaling as it applies to UIMA NLP processing (notwithstanding OpenDMAPv2). It is a work in progress.
3. Scaling NLP
Processing one file is independent of processing another: text in, annotations out.
- Multi-threaded: more than one thread of execution in one process. Pipelines share memory and can step on each other. Example: Stanford crashes because of concurrency issues ("was not an issue in 2001").
  <casProcessors casPoolSize="4" processingUnitThreadCount="2">
- Multi-process: separate JVMs, each with a single thread. Memory is not shared, so no crushed toes. The overhead of repeated JVMs and pipelines does cost, but it works.
  <casProcessors casPoolSize="3" processingUnitThreadCount="1">
- Many machines: more memory, more cores. Independence means the pipelines won't miss being on the same machine. Independent machines (a cluster) are cheaper than integrated ones (Enki).
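In the CPE descriptor, those two knobs sit on the <casProcessors> element. A minimal fragment for the multi-process case might look like the sketch below; the enclosed processor definitions and the rest of the descriptor are omitted:

```xml
<!-- One processing thread per JVM, three CASes in the pool:
     the "multi-process" configuration from the slide above -->
<casProcessors casPoolSize="3" processingUnitThreadCount="1">
  <!-- <casProcessor> entries for the analysis engines and
       CAS consumers of the pipeline go here -->
</casProcessors>
```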
4. Hardware
- Local cluster (Colfax): a rack of machines with software (SGE) to integrate them.
- Integrated CPUs (Enki): much like a rack, but the motherboards are tied together and can share memory. Gigabit Ethernet delivers on the order of 300 Mb/sec; a motherboard runs up to 4.8 GB/sec.
- Virtual cluster: virtualization software allows a single machine to appear as many; offers flexibility and security.
- Cloud: a virtual cluster on the net, e.g. Amazon EC2.
5. Hardware: CCP's Colfax Cluster
- Runs Linux (Fedora/Red Hat)
- 6 machines (amc-colfax, amc-colfaxnd[1-5])
- 2 CPUs (Intel) per machine, 4 cores each: 48 cores total
- Intel motherboards
- 16 GB memory each, 96 GB total
- 5 TB shared (over NFS) disk array, RAID5
- Named after the assembler: Colfax International
6. (Sun|Oracle) Grid Engine (SGE)
- Manages a queue of jobs, optimizing resource utilization
- Starts individual processes for a job
- Often used with the Message Passing Interface (MPI) for processes that cooperate
- Used here to start "array jobs": each job processes a portion of a large array of work to be done
7. SGE Job
An SGE job is a script and a command line.
- The command line specifies resources for scheduling: memory, among others.
- The script is run once for each process started. It is not pure shell, but more or less a shell script (next slide).
- The job is assigned an ID number.
8. More/less a shell script?
Put these lines at the top for SGE:
- #$ -N stanford_out (standard out goes to a file with this prefix)
- #$ -S /bin/bash (the shell to use; no “she-bang” #!/bin/sh)
- #$ -cwd (runs from the current directory)
- #$ -j y (merge stdout and stderr into one file)
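Put together, the four directives form the header of a complete, if trivial, job script. The sketch below is illustrative (the echoed message is invented); it also runs standalone, falling back to a default when the SGE task variable is unset:

```shell
#!/bin/bash
# Minimal SGE job script sketch. Lines starting with "#$" are read by
# SGE at submission time; the shell itself treats them as comments.
#$ -N stanford_out    # stdout goes to stanford_out.o<job-ID>.<task-ID>
#$ -S /bin/bash       # the shell SGE should use to run the script
#$ -cwd               # run in the directory qsub was invoked from
#$ -j y               # merge stderr into the stdout file

# SGE_TASK_ID is set only when submitted with -t; default for a dry run.
msg="task ${SGE_TASK_ID:-1} on host $(hostname)"
echo "$msg"
```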
9. Submit a Job: qsub
qsub -t 1-200000:20000 sge_stanford_out.sh
- -t gives the index range: do array items from 1 to 200 thousand, in steps of 20 thousand, i.e. 10 processes.
- The work itself is done by the sge_stanford_out.sh script.
How does the script know what files to process?
- $SGE_TASK_ID (first file number to run)
- $SGE_TASK_STEPSIZE
A task will get values of 0, 19999, 20000, for example.
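Inside the script, each array task can turn those two environment variables into its own sub-range of documents. A sketch, with invented variable names FIRST/LAST and fallbacks matching the qsub example so it also runs outside SGE:

```shell
#!/bin/bash
# Compute this task's document range from the SGE array variables.
# Fallbacks match "qsub -t 1-200000:20000": start 1, step 20000.
: "${SGE_TASK_ID:=1}"
: "${SGE_TASK_STEPSIZE:=20000}"

FIRST=$SGE_TASK_ID                                # first document index
LAST=$(( SGE_TASK_ID + SGE_TASK_STEPSIZE - 1 ))   # last document index
echo "this task processes documents $FIRST-$LAST"
```

With ten tasks, SGE hands out SGE_TASK_ID values 1, 20001, 40001, and so on, so the ranges tile the whole collection without overlap.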
10. sge_stanford_out.sh
- Will evolve into a generic UIMA job-submission script.
- The script modifies a template CPE file, creating a CPE for each process.
- The CPE specifies the starting document number and the number of documents to process.
http://wikis.sun.com/display/gridengine62u2/How+to+Submit+an+Array+Job+From+the+Command+Line
[roederc@amc-colfax sge_scripts]$ qsub -t 1-50:3 sge_stanford_out.sh
Your job-array 130.1-50:3 ("stanford_out") has been submitted
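One simple way to stamp out a per-process CPE from a template is a sed substitution. The sketch below is hypothetical: the placeholder tokens @START@/@COUNT@, the parameter names, and the template file name are invented for illustration, and a stand-in template is written inline so the example is self-contained:

```shell
#!/bin/bash
# Generate a per-task CPE descriptor from a template (sketch).
START=${SGE_TASK_ID:-1}
COUNT=${SGE_TASK_STEPSIZE:-20000}

# Stand-in template; the real one would be a full CPE XML descriptor.
cat > template_cpe.xml <<'EOF'
<nameValuePair><name>startDocument</name><value>@START@</value></nameValuePair>
<nameValuePair><name>numToProcess</name><value>@COUNT@</value></nameValuePair>
EOF

out="temp_cpe_${START}.xml"   # matches the temp_cpe_<n>.xml naming on slide 15
sed -e "s/@START@/$START/" -e "s/@COUNT@/$COUNT/" template_cpe.xml > "$out"

# The task would then hand the generated descriptor to UIMA, e.g.:
#   java org.apache.uima.examples.cpe.SimpleRunCPE "$out"
cat "$out"
```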
11. qstat
[roederc@amc-colfax sge_scripts]$ qstat
job-ID  prior    name        user     state  submit/start at      queue                           slots  ja-task-ID
--------------------------------------------------------------------------------------------------------------------
130     0.00000  stanford_o  roederc  qw     11/02/2010 12:39:01                                  1      1-49:3
[roederc@amc-colfax sge_scripts]$ qmon
[roederc@amc-colfax sge_scripts]$ qstat
job-ID  prior    name        user     state  submit/start at      queue                           slots  ja-task-ID
--------------------------------------------------------------------------------------------------------------------
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd4.ucdenver.p  1      4
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd2.ucdenver.p  1      7
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd5.ucdenver.p  1      10
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd3.ucdenver.p  1      13
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd1.ucdenver.p  1      16
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd5.ucdenver.p  1      19
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd2.ucdenver.p  1      22
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd4.ucdenver.p  1      25
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfax.ucdenver.pvt   1      28
130     0.55500  stanford_o  roederc  r      11/02/2010 12:39:10  all.q@amc-colfaxnd3.ucdenver.p  1      31
13. Failures?
Q: What if a job fails? (A: it stops.)
- This is an open problem. For now, that process dies, leaving work unprocessed.
- We need to cull the unprocessed files and try again. The usual cause is not enough memory.
- Future: a DB-driven collection reader with a CAS consumer that reports completion.
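Culling the unprocessed files can be as simple as diffing the input directory against the output directory. A sketch, with invented in/ and out/ directory names and dummy files standing in for the document collection:

```shell
#!/bin/bash
# Find input documents that have no corresponding output (sketch).
mkdir -p in out
touch in/doc1.txt in/doc2.txt in/doc3.txt
touch out/doc1.txt out/doc3.txt     # pretend the task died before doc2

missing=""
for f in in/*; do
  b=$(basename "$f")
  [ -e "out/$b" ] || missing="$missing $b"
done
missing="${missing# }"              # trim the leading space

echo "needs rerun: $missing"        # these would feed a new qsub -t run
```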
14. Example 1: Distribute a Simple Script on the Cluster: test_sge.sh
- qsub test_sge.sh runs it once.
- qsub -t 1-5:1 test_sge.sh runs it five times.
- qsub -t 100-500:100 test_sge.sh also runs it five times, with index starts spaced by 100.
15. Example 2: Run UIMA on the Cluster
- sge_stanford_out.sh calls a script with a template CPE and an index range: run_cpe_cluster_stanford_out.sh.
- That script modifies the CPE template, creating a CPE for each sub-range, sets up the environment, and calls SimpleRunCPE (Java). Note the temp_cpe_<n>.xml files in ../desc/cpe.
- Start a number of terminals and run "top" in each to watch CPU and memory usage.
16. Hadoop
- Inspired by Lisp's map/reduce. Map: apply a function to each element of a collection. Reduce: combine the results into one.
- Known for optimizing by moving processing rather than data.
- Similar code is used by Google; Hadoop is open source and used by Yahoo and Amazon.
- Its specialized interfaces make it better suited to greenfield development.
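The map/reduce pattern has a classic shell analogue, the word count: the "map" step emits one key per line, sort is the shuffle that groups equal keys, and uniq -c is the reduce. A sketch with an invented sample input:

```shell
#!/bin/bash
# Word count as map/reduce in shell (sketch).
printf 'to be or not to be\n' > words.txt      # sample input (illustrative)

# map: one word per line; shuffle: sort groups equal keys; reduce: count
counts=$(tr ' ' '\n' < words.txt | sort | uniq -c)
echo "$counts"
```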
17. What about “The Cloud”?
- Amazon's Elastic Compute Cloud (EC2) is a cluster on the internet that can be rented by the hour.
- Very dynamic: set up nodes when you start using them, and expect them to disappear when you stop.
- You must have machine configuration management sussed: you have to re-install everything.
- Use S3 for long-term storage.
- Pricing starts at $0.10/hour.