PHP Site Optimization

Switching a PHP site to a commonly used alternative stack (Nginx, APC, HTML caching, Memcache and a CDN) can yield a 10-15% performance improvement. Database response accounts for about half of a page request, so it is a key target: simplify and optimize queries, tune the schema, and scale out with replication, sharding and clustering. Network latency shows up as slow page responses combined with low CPU usage, and can be reduced with persistent connections, IP addresses in place of hostnames, and fewer network hops. I/O contention and high CPU utilization should also be addressed.

PHP site optimization

Demonstrating performance improvement and optimization
of a Zend Framework website.

Areas of improvement
Application Framework
Database
Network Latency
IO Contention
CPU Utilization
Network Connectivity
Others

Performance or scalability? Both?
HipHop (Facebook's implementation of PHP) is high-performing
but not scalable

Workload distribution of a page request
10% web server
40% PHP processing
50% database response

A 10-15% improvement is easily achievable using a
commonly used alternative stack, i.e. Nginx +
APC + HTML caching + Memcache + CDN

Database performance
Query simplification
Query optimization
Schema tuning

Replication
Sharding
Clustering

Network Latency
How to identify?
Page response is slow while CPU usage is low
Reasons?
DNS reverse lookups, TCP handshakes, a high
number of hops
Tools?
tcpdump, ping, traceroute ...

DNS Reverse Lookups
Use IP addresses instead of hostnames wherever possible

TCP Handshakes
Use persistent connections to remote or local-network
services wherever possible
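
As a minimal sketch of the two previous slides, the snippet below builds the arguments for a persistent PDO connection that addresses the server by IP. It assumes a MySQL backend reached through PDO; the helper name persistentPdoArgs and the credential values are placeholders, not part of the original deck.

```php
<?php
// Hypothetical helper: build PDO constructor arguments for a persistent
// connection. The host is an IP address, so connecting needs no DNS lookup.
function persistentPdoArgs(string $ip, string $dbname, string $user, string $password): array
{
    $dsn = sprintf('mysql:host=%s;dbname=%s', $ip, $dbname);
    $options = [
        PDO::ATTR_PERSISTENT => true,                   // reuse the connection across requests
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION, // fail loudly on errors
    ];
    return [$dsn, $user, $password, $options];
}

// Placeholder values for illustration only.
[$dsn, $user, $password, $options] = persistentPdoArgs('127.0.0.1', 'ex_db', 'user1', 'pwd');

// Connecting needs a reachable MySQL server, so it is left commented out:
// $pdo = new PDO($dsn, $user, $password, $options);
```

With PDO::ATTR_PERSISTENT set, PHP keeps the connection open after the script ends and hands it to a later request with the same credentials, which avoids a fresh TCP handshake each time.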

Number of Hops
Try to put servers on the same switch or in the same
LAN.
Avoid trade-offs between the physical layer and the
network layer

ORM options
Active Record (made popular by Ruby on Rails)
Data Mapper
Collection
Doctrine
Data access methods used by Yii, CodeIgniter (CI),
Symfony ...

Active Record (AR)
It is an object-relational mapping and
object-persistence pattern
It binds a business object to a relational record
(row)
An AR class maps to a db table (or view)
An AR instance maps to a db table record
AR instance properties map to db table record
fields
Instance methods act on a specific record
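
The mapping described above can be sketched in a few lines. This is an illustrative example only, not the implementation of any particular framework: the User class, the users table and the in-memory SQLite database are all assumptions chosen so the sketch runs on its own.

```php
<?php
// Minimal Active Record sketch: the class maps to the "users" table,
// each instance maps to one row, and properties map to columns.
class User
{
    public ?int $id = null;
    public string $name = '';

    private static PDO $db;

    public static function setConnection(PDO $db): void
    {
        self::$db = $db;
    }

    // Class-level finder: returns an instance for the matching row, or null.
    public static function find(int $id): ?User
    {
        $stmt = self::$db->prepare('SELECT id, name FROM users WHERE id = ?');
        $stmt->execute([$id]);
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            return null;
        }
        $user = new User();
        $user->id = (int) $row['id'];
        $user->name = $row['name'];
        return $user;
    }

    // Instance method: acts on this specific record (insert or update).
    public function save(): void
    {
        if ($this->id === null) {
            $stmt = self::$db->prepare('INSERT INTO users (name) VALUES (?)');
            $stmt->execute([$this->name]);
            $this->id = (int) self::$db->lastInsertId();
        } else {
            $stmt = self::$db->prepare('UPDATE users SET name = ? WHERE id = ?');
            $stmt->execute([$this->name, $this->id]);
        }
    }
}

// Demo against an in-memory SQLite database (assumes the pdo_sqlite extension).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
User::setConnection($db);

$user = new User();
$user->name = 'alice';
$user->save();          // INSERT: the row does not exist yet

$user->name = 'Alice';
$user->save();          // UPDATE: the instance already has an id

$loaded = User::find($user->id);
```

Each save() issues its own statement, which is convenient but can multiply queries in loops; that cost is worth checking with a profiler.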

Data Patterns in Zend Framework
Zend_Controller_Front: Singleton, Front Controller
Zend_Db_Table: Table Data Gateway
Zend_Log: Factory Method, Adapter

Profiler with Zend_Db
// Enable the profiler when creating the adapter
$db = Zend_Db::factory('Pdo_Mysql', array(
    'host'     => '127.0.0.1',
    'username' => 'user1',
    'password' => 'pwd',
    'dbname'   => 'ex_db',
    'profiler' => true,
));
// After executing queries through $db:
$prof = $db->getProfiler();
$prof->getTotalElapsedSecs();  // total time spent in queries, in seconds
$prof->getTotalNumQueries();   // number of queries profiled

Preg
stripos() is 2 times faster than the preg_* equivalent
ctype_alnum() is 5 times faster than the preg_* equivalent
Casting, "if ($var == (int) $var)", is 5 times faster
than preg_match("/^\d*$/", $var)
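
A rough way to check these ratios on your own setup is sketched below. The function names and the timeIt helper are hypothetical, and the speed-ups on the slide are the author's figures; actual numbers depend on the PHP version and the input. The cast comparison is left out because loose string-to-integer comparison behaves differently across PHP versions and is not strictly equivalent to the regex.

```php
<?php
// Case-insensitive substring check: regex vs. plain string function.
function containsRegex(string $haystack, string $needle): bool
{
    return preg_match('/' . preg_quote($needle, '/') . '/i', $haystack) === 1;
}

function containsStripos(string $haystack, string $needle): bool
{
    return stripos($haystack, $needle) !== false;
}

// Alphanumeric check: regex vs. ctype.
// The D modifier stops "$" from matching before a trailing newline.
function isAlnumRegex(string $value): bool
{
    return preg_match('/^[A-Za-z0-9]+$/D', $value) === 1;
}

function isAlnumCtype(string $value): bool
{
    return ctype_alnum($value);
}

// Hypothetical helper: wall-clock time for running a callable repeatedly.
function timeIt(callable $fn, int $iterations = 100000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$text = 'Demonstrating a Zend Framework website';

$regexSecs   = timeIt(function () use ($text) { containsRegex($text, 'zend'); });
$striposSecs = timeIt(function () use ($text) { containsStripos($text, 'zend'); });

printf("preg_match: %.4fs, stripos: %.4fs\n", $regexSecs, $striposSecs);
```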

Magic methods
__get, __set, __call
Used by SOAP, data tables, Java objects
Use sparingly and avoid too much recursion
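
To illustrate why magic methods carry a cost, the sketch below reads the same value through __get and through a declared property. MagicRow and PlainRow are hypothetical classes written for this comparison; the timings it prints are only indicative and vary by PHP version.

```php
<?php
// Magic access: every read of an undefined property goes through __get().
class MagicRow
{
    private array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function __get(string $name)
    {
        return $this->data[$name] ?? null;
    }
}

// Explicit access: a declared property is read directly, with no method call.
class PlainRow
{
    public string $title;

    public function __construct(string $title)
    {
        $this->title = $title;
    }
}

$magic = new MagicRow(['title' => 'Hello']);
$plain = new PlainRow('Hello');

$iterations = 100000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $value = $magic->title;   // dispatched to __get()
}
$magicSecs = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $value = $plain->title;   // direct property read
}
$plainSecs = microtime(true) - $start;

printf("__get: %.4fs, declared property: %.4fs\n", $magicSecs, $plainSecs);
```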

Code Acceleration
APC, Zend Optimizer
Increase performance by 3-4 times

Queue
Queueing offloads long-running tasks to a
queuing system
Job queue: Gearman
Message queue: ActiveMQ, AWS Simple Queue Service

What not to do
Do not reach for caching first; it should be the last
step. (Focus on performance optimization first)
Avoid LIKE queries. (Try Solr, Zend_Search_Lucene)

What to do
Minimize require_once. Use an autoloader
Horizontal architecture is better than vertical,
e.g. wide inheritance is better than deep
inheritance
Lazy loading / load on demand
Use diagnostic tools:
Zend Studio profiler, code tracing
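
The autoloader and load-on-demand advice can be sketched as follows. This is a minimal example, assuming Zend Framework 1 style class names where underscores map to directories; classToPath, registerAutoloader and the Demo_Greeter class are hypothetical names used only for the demonstration.

```php
<?php
// Map a class name with underscores (e.g. Zend_Db_Table) to a file path
// under a base directory (e.g. <base>/Zend/Db/Table.php).
function classToPath(string $baseDir, string $class): string
{
    return rtrim($baseDir, '/') . '/' . str_replace('_', '/', $class) . '.php';
}

// Register an autoloader so a class file is loaded only on first use,
// replacing scattered require_once calls.
function registerAutoloader(string $baseDir): void
{
    spl_autoload_register(function (string $class) use ($baseDir): void {
        $path = classToPath($baseDir, $class);
        if (is_file($path)) {
            require $path;
        }
    });
}

// Demo: write a hypothetical class file to a temp directory, then use it.
$baseDir = sys_get_temp_dir() . '/autoload_demo_' . uniqid();
mkdir($baseDir . '/Demo', 0777, true);
file_put_contents(
    $baseDir . '/Demo/Greeter.php',
    "<?php\nclass Demo_Greeter { public function hello() { return 'hello'; } }\n"
);

registerAutoloader($baseDir);

// Passing false skips autoloading, so this only reports the current state.
$loadedBefore = class_exists('Demo_Greeter', false);

$greeter = new Demo_Greeter();   // first use triggers the autoloader
$message = $greeter->hello();
```

Because the file is read only when the class is first referenced, requests that never touch a class never pay for loading it.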
