HKIX Upgrade to 100Gbps-Based Two-Tier Architecture (Michael Zhang)
HKIX upgraded to a two-tier 100Gbps architecture to support continued traffic growth and new connections. This involved migrating to a new facility with more ports and space for 100GbE interfaces. The upgrade helps ensure HKIX can reliably serve as critical internet infrastructure in Hong Kong and helps keep intra-Asia traffic within the region.
Hadoop Hardware @Twitter: Size Does Matter (Michael Zhang)
This document discusses Twitter's experience scaling their Hadoop clusters and evaluating different hardware configurations. It describes how Twitter developed the "Twitter Hadoop Server" (THS) to optimize for different workloads like backups, processing, and cold storage. The THS uses single-socket servers with fewer disks, optimized for cost and density. Testing showed the THS outperformed baseline dual-socket servers on processing benchmarks while providing higher storage density and lower cost per node. The document concludes that for large clusters, specialized hardware can improve performance and efficiency compared to a one-size-fits-all approach.
CUDA 6.0 provides performance improvements and new features for several CUDA libraries and tools. Key updates include up to 2x faster kernel launches, new cuFFT and cuBLAS features for multi-GPU support, up to 700 GFLOPS performance from cuFFT, over 3 TFLOPS from cuBLAS, and 5x faster cuSPARSE performance compared to MKL. New features also improve the performance of cuRAND, NPP, and Thrust.
This document discusses MediaV's implementation of reliable advertising systems using Docker and Mesos. It describes how the system handles over 10 billion impressions with huge computing needs through containerization. It addresses common problems encountered like debugging, network performance, storage, service discovery, scheduling, and data loading. Frameworks like Marathon and Chronos are used on Mesos for orchestration and batch jobs. Health checks, port resources, and Dockerfile reviews are important practices.
Hadoop Con 2015: Hadoop Enables Enterprise Data Lake (James Chen)
The growth of the mobile Internet, social media, and smart devices has triggered an information explosion, producing massive volumes of unstructured and semi-structured data. This data comes in many formats and is generated at high speed, posing unprecedented challenges for enterprise information architecture. Facing diverse data structures and diverse analysis tools, what architecture should we adopt to integrate them, effectively manage the data lifecycle, and extract value from the data? Within this larger architecture, the Hadoop ecosystem will undoubtedly play the role of the foundational data platform, realizing the enterprise Data Lake.
Fastsocket is software that improves the scalability and performance of socket-based applications on multicore systems. It addresses kernel inefficiencies, such as synchronization overhead, that can consume over 90% of CPU cycles. Fastsocket introduces techniques like receive flow delivery, local listen/established tables, and a Fastsocket-aware VFS to partition resources and process connections locally on each CPU core. In production at SINA, Fastsocket improved HTTP load-balancing throughput by 45% on a 16-core system. Future work aims to optimize performance further through techniques like improved interrupt handling and system call batching.
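The per-core connection partitioning that Fastsocket implements inside the kernel can be approximated from user space with the SO_REUSEPORT socket option (a related but distinct Linux facility, not Fastsocket itself). A minimal sketch, assuming a Linux host:

```python
import socket

def make_reuseport_listeners(n, host="127.0.0.1"):
    """Create n listening sockets that share one port via SO_REUSEPORT.

    The kernel then distributes incoming connections among them (one
    socket per worker/core), echoing Fastsocket's per-core local listen
    tables. Linux-only sketch; error handling omitted.
    """
    socks = []
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    first.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    first.bind((host, 0))          # let the OS pick a free port
    first.listen(16)
    port = first.getsockname()[1]
    socks.append(first)
    for _ in range(n - 1):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        s.bind((host, port))       # same port, separate accept queue
        s.listen(16)
        socks.append(s)
    return socks, port
```

Each worker would then accept() only on its own socket, avoiding the shared accept-queue contention that motivates Fastsocket.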
Spark SQL is a module for structured data processing on Spark. It integrates relational processing with Spark's functional programming API and allows SQL queries to be executed over data sources via the Spark execution engine. Spark SQL includes components like a SQL parser, a Catalyst optimizer, and Spark execution engines for queries. It supports HiveQL queries, SQL queries, and APIs in Scala, Java, and Python.
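The core idea, running declarative SQL over structured data and then continuing to work with the result in the host language, can be shown in miniature using Python's stdlib sqlite3 as a stand-in (this is not Spark SQL itself, only an illustration of the relational/programmatic mix):

```python
import sqlite3

# Stand-in for a structured data source; schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT, kind TEXT, n INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [("a", "click", 3), ("b", "click", 1), ("a", "view", 7)])

# Declarative side: an aggregating SQL query.
rows = conn.execute(
    "SELECT user, SUM(n) AS total FROM events GROUP BY user ORDER BY user"
).fetchall()

# Programmatic side: the relational result flows into ordinary code.
totals = {user: total for user, total in rows}  # → {"a": 10, "b": 1}
```

In Spark SQL the same round trip goes through the Catalyst optimizer and the Spark execution engine instead of a local database.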
The document discusses network integration considerations for Hadoop data centers. It addresses traffic types, job patterns, network attributes, architecture, availability, capacity, flexibility, management and visibility. It provides examples of buffer usage on switches and recommendations for dual 1GbE or 10GbE NIC configuration for Hadoop servers.
QCon Shanghai 2013 - Ben Lavender - Long-Distance Relationships with Robots (Michael Zhang)
The document discusses long distance relationships with robots and maintaining open communication. It describes how the author started their relationship with a robot named Hubot by installing it on GitHub and customizing it with scripts. The author emphasizes values like passion, dedication, responsibility and communication to make the relationship work despite distances of thousands of kilometers. It provides tips for deploying robots, monitoring their operations, and using them to help with work communications and status updates.
QCon Shanghai 2013 - Jains Krums - Real-Time Delivery Architecture (Michael Zhang)
The document describes Twitter's real-time delivery architecture. It aims to evolve beyond being just a web stack by isolating responsibilities into components like routing, presentation, logic, storage and retrieval. The architecture uses various services and technologies like the Write API, Ingester, Blender, Hadoop, Redis, and Earlybird to ingest tweets in real-time, cache timelines and search indexes, and deliver tweets to users via pull and push methods. The goal is to improve site speed, reliability, and developer innovation speed.
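The push side of such a design (fan-out on write into per-user timeline caches, the role the Redis layer plays above) can be sketched with a hypothetical in-memory model; the follower graph and cache bound here are invented for illustration:

```python
from collections import defaultdict, deque

# Hypothetical follower graph; in production this lives in a social-graph
# service, and timelines live in Redis rather than process memory.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}
timelines = defaultdict(lambda: deque(maxlen=800))  # bounded per-user cache

def publish(author, tweet_id):
    """Fan-out on write: push the new tweet onto each follower's cache."""
    for f in followers.get(author, []):
        timelines[f].appendleft((author, tweet_id))

publish("alice", 1)
publish("bob", 2)
# carol follows both authors, so her cached timeline now holds both
# tweets, newest first; readers just pull the precomputed list.
```

The pull path then only has to read a cached list, which is what makes timeline reads fast at the cost of write amplification.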
QCon Shanghai 2013 - 黄舒泉 - Intel IT OpenStack Practice (Michael Zhang)
This document summarizes an Intel IT presentation on their OpenStack practice. It discusses Intel's contributions to OpenStack projects, their converged OpenStack and IT platform, and their solutions for continuous delivery, deployment, Tempest testing automation, and improving the speed of OpenStack snapshots.
QCon Shanghai 2013 - 罗婷 - Performance Methodology (Michael Zhang)
This document summarizes a presentation on performance methodology at Salesforce given at QCon Beijing 2014. It discusses:
- The importance of performance for user experience, decreasing costs, and serving more customers.
- How Salesforce's dedicated performance team is organized by areas like UI, mobile, platforms, and infrastructure.
- Key performance metrics include response time, throughput, CPU/memory utilization, and database metrics.
- Performance is tested proactively via feature and regression tests, and passively via production analysis. Automated testing uses tools like JMeter and internal frameworks. Profilers like Yourkit and HeapAudit help identify causes.
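Two of the metrics above, response time and throughput, are straightforward to measure. A minimal sketch (not Salesforce's internal framework, which is not public):

```python
import time

def measure(fn, iterations=1000):
    """Time a callable and report average response time and throughput,
    the two headline metrics from the talk. A toy harness: real test
    suites also control warm-up, concurrency, and percentiles."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return {
        "avg_response_ms": elapsed / iterations * 1000.0,
        "throughput_per_s": iterations / elapsed,
    }

stats = measure(lambda: sum(range(100)))
```

Tools like JMeter automate the same measurement against live endpoints instead of in-process callables.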
This document provides an overview of Hive and its performance capabilities. It discusses Hive's SQL interface for querying large datasets stored in Hadoop, its architecture which compiles SQL queries into MapReduce jobs, and its support for SQL semantics and datatypes. The document also covers techniques for optimizing Hive performance, including data abstractions like partitions, buckets and skews. It describes different join strategies in Hive like shuffle joins, broadcast joins and sort-merge bucket joins and how they are implemented in MapReduce. The overall presentation aims to explain how Hive provides scalable SQL processing for big data.
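Of the join strategies listed, the broadcast (map-side) join is the easiest to illustrate: the small table is built into an in-memory hash table that ships to every mapper, and the large table streams past it with no shuffle. A pure-Python sketch of the idea (not Hive's implementation; table contents are invented):

```python
def broadcast_join(small, big, key):
    """Hash-join two lists of dict rows on `key`, building the hash
    table from the small ("broadcast") side and probing with the big
    side, as a map-side join does."""
    lookup = {}
    for row in small:
        lookup.setdefault(row[key], []).append(row)
    out = []
    for row in big:                      # streamed, never materialized
        for match in lookup.get(row[key], []):
            out.append({**match, **row})
    return out

users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bo"}]
clicks = [{"id": 1, "url": "/a"}, {"id": 1, "url": "/b"}, {"id": 3, "url": "/c"}]
joined = broadcast_join(users, clicks, "id")
```

A shuffle join, by contrast, would repartition both tables by key across reducers, which is why Hive prefers the broadcast strategy when one side fits in memory.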
This document provides an overview of the DRAM module market and discusses various module configurations for different applications. It describes the transition from DDR1 to DDR2 and upcoming shift to DDR3. Key markets discussed include personal computers, servers, networking equipment, and peripherals. For servers, it notes ongoing debate around using fully buffered DIMMs (FB-DIMMs) versus registered DIMMs (RDIMMs). New module formats like mini-RDIMMs and 72b SO-RDIMMs are presented as solutions for networking routers and other embedded applications.
Percona Live: Linux Filesystems and MySQL (Michael Zhang)
The document discusses Linux filesystems and MySQL tuning. It begins with an overview of basic Linux IO concepts like directory structure, LVM, RAID, SSDs, and filesystem concepts. It then covers filesystem choices and best practices for MySQL tuning, benchmarks, and AWS EC2 deployments. The goal is to provide optimization strategies for storing MySQL data and logs on Linux filesystems.
Velocity China 2012: Life on Edge - How to Use ESI to Accomplish Tasks (Michael Zhang)
The document discusses Edge Side Includes (ESI), which allows a web page to be dynamically assembled from components hosted on different servers to improve performance. It provides examples of how ESI can be used for content assembly, automatic fallback of modules, and timely launch of new features. Key benefits of ESI include increased caching, better availability, and higher throughput. Future potential enhancements to ESI discussed include manipulating requests and responses for includes as well as more advanced assembly techniques.
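The content-assembly step, resolving include tags into fragment content with an empty fallback when a fragment is unavailable, can be sketched as follows (a hypothetical toy assembler; real ESI processing happens on edge servers, and the fragment map here is invented):

```python
import re

# Invented fragment store; an edge server would fetch these over HTTP.
FRAGMENTS = {"/header": "<h1>Site</h1>", "/nav": "<ul>...</ul>"}

def assemble(template, fragments=FRAGMENTS):
    """Replace <esi:include src="..."/> tags with fragment content.
    Missing fragments degrade to an empty string, mimicking the
    automatic-fallback behavior described above."""
    def sub(match):
        return fragments.get(match.group(1), "")
    return re.sub(r'<esi:include src="([^"]+)"\s*/>', sub, template)

page = assemble('<body><esi:include src="/header"/><p>content</p></body>')
```

Because each fragment caches independently at the edge, the shell and slow-changing modules stay cached even when one component's origin is down.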
This document discusses ways to improve TCP performance for web applications. It proposes TCP Fast Open to eliminate the TCP handshake round trip, increasing the initial congestion window to 10 segments (IW10) to speed bulk transfers, and Tail Loss Probe to recover quickly from losses. It also discusses the need for new mobile congestion control given fluctuating wireless rates. The author reports implementation status for Fast Open and IW10, and plans to open-source a Tail Loss Probe and a mobile congestion control protocol in 2013 after further research. The overall goal is to optimize TCP for the low-latency, short transfers common on the modern web.
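The IW10 benefit is easy to quantify: a larger initial congestion window lets more data leave in the first round trip, before any ACK returns. A back-of-the-envelope sketch (assuming a typical 1460-byte MSS):

```python
MSS = 1460  # typical TCP payload per segment over Ethernet, bytes (assumed)

def first_rtt_bytes(initial_window_segments, mss=MSS):
    """Bytes a sender may transmit in the first round trip, given the
    initial congestion window in segments."""
    return initial_window_segments * mss

iw3 = first_rtt_bytes(3)    # older default: 3 segments ≈ 4.4 KB
iw10 = first_rtt_bytes(10)  # IW10: 10 segments ≈ 14.6 KB
```

Many small web responses fit within ~14.6 KB, so with IW10 they complete in a single round trip instead of waiting for the congestion window to grow.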
The document discusses using YSlow to measure website performance. It provides information on using YSlow from the command line and integrating it with continuous integration workflows. Key points include:
- YSlow can analyze HAR files from the command line to generate performance metrics and scores. It has options to customize the output format, information displayed, and ruleset used.
- YSlow results can be logged to a URL for monitoring. Integrating YSlow with PhantomJS allows running performance tests from scripts.
- Continuous integration of YSlow analyses allows catching performance regressions early. Combining it with real user monitoring and WebPageTest gives a comprehensive performance testing suite.
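The kind of HAR post-processing described above can be sketched with stdlib JSON handling (field names follow the HAR 1.2 format; the sample entries are invented, and this is not YSlow's own code):

```python
import json

# Invented two-entry HAR document, structured per the HAR 1.2 format.
har_text = json.dumps({
    "log": {"entries": [
        {"request": {"url": "http://example.com/"},
         "response": {"bodySize": 5120}},
        {"request": {"url": "http://example.com/app.js"},
         "response": {"bodySize": 20480}},
    ]}
})

def summarize_har(text):
    """Count requests and sum response body sizes from a HAR document,
    the raw inputs behind page-weight style performance scores."""
    entries = json.loads(text)["log"]["entries"]
    total = sum(max(e["response"]["bodySize"], 0) for e in entries)
    return {"requests": len(entries), "total_body_bytes": total}

summary = summarize_har(har_text)
```

A CI job could fail the build when such totals regress beyond a threshold, which is the early-warning loop the slides advocate.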