The document summarizes a presentation about scaling Pinterest's architecture. It discusses how Pinterest uses sharding across multiple MySQL databases to scale its infrastructure. Key points include how Pinterest generates IDs across shards, loads data, handles caching, and has transitioned its database structure over time to support increased load.
A Morning with MongoDB Barcelona: MongoDB Basic ConceptsMongoDB
The document discusses MongoDB concepts including:
- Replication allows for data availability across nodes through multiple copies spread across data centers and automatic failover.
- Sharding provides horizontal scalability by distributing data across nodes in a transparent way and automatically redistributing as needed.
- MongoDB offers both eventual and immediate consistency, with the latter avoiding conflicts by limiting updates to a single master node.
- Durability options include fire and forget writes, getting the last error, journal syncing, and replica set acknowledgment.
This document provides an overview of MongoDB and how it compares to traditional RDBMS systems. It discusses key concepts like how MongoDB uses JSON-like documents with flexible schemas rather than rigid tables. It also covers CRUD operations, indexing, replication, and how data is stored and queries in MongoDB compared to SQL.
Morning with MongoDB Paris 2012 - MongoDB Basic ConceptsMongoDB
The document discusses key concepts of MongoDB including its document-oriented data model, replication for high availability and backups, horizontal scalability through sharding of data across multiple nodes, and flexibility in schema design. It also covers consistency and durability options in MongoDB as well as the roadmap for upcoming features like authentication and improved indexing.
Facilitando a Programação concorrente com o Fork/JoinMario Amaral
The document discusses parallelizing tasks using fork/join in Java. It describes how fork/join works by dividing tasks into smaller subtasks (fork), running the subtasks in parallel, and then joining the results. An example implementation of a JsonTask class is shown that extends RecursiveTask and overrides the compute method to either directly process small lists or divide larger lists into sublists and fork new tasks to process them in parallel.
This document provides an overview of MongoDB, Java, and Spring Data. It discusses how MongoDB is a document-oriented NoSQL database that uses JSON-like documents with dynamic schemas. It describes how the Java driver can be used to interact with MongoDB to perform CRUD operations. It also explains how Spring Data provides an abstraction layer over the Java driver and allows for object mapping and repository-based queries to MongoDB.
This document discusses JNDI (Java Naming and Directory Interface) and how it provides a common way to access distributed and local resources for enterprise applications. It covers the basics of JNDI, including getting an initial context, looking up objects in the JNDI tree, adding and removing objects, and using JNDI for local data access and singletons. It also discusses how JNDI can be used with CORBA objects and for narrowing object references.
ENIB 2015-2016 - CAI Web - S01E01- MongoDB and NoSQLHoracio Gonzalez
This document provides an introduction and overview of MongoDB and NoSQL databases. It discusses key MongoDB concepts like document databases, collections and documents. It also covers how to install and run MongoDB, insert and query data, and use common operations. Some examples show how to create indexes, use JavaScript and regex queries. Exercises at the end propose practicing installing MongoDB, creating a collection and querying data.
A Morning with MongoDB Barcelona: MongoDB Basic ConceptsMongoDB
The document discusses MongoDB concepts including:
- Replication allows for data availability across nodes through multiple copies spread across data centers and automatic failover.
- Sharding provides horizontal scalability by distributing data across nodes in a transparent way and automatically redistributing as needed.
- MongoDB offers both eventual and immediate consistency, with the latter avoiding conflicts by limiting updates to a single master node.
- Durability options include fire and forget writes, getting the last error, journal syncing, and replica set acknowledgment.
This document provides an overview of MongoDB and how it compares to traditional RDBMS systems. It discusses key concepts like how MongoDB uses JSON-like documents with flexible schemas rather than rigid tables. It also covers CRUD operations, indexing, replication, and how data is stored and queries in MongoDB compared to SQL.
Morning with MongoDB Paris 2012 - MongoDB Basic ConceptsMongoDB
The document discusses key concepts of MongoDB including its document-oriented data model, replication for high availability and backups, horizontal scalability through sharding of data across multiple nodes, and flexibility in schema design. It also covers consistency and durability options in MongoDB as well as the roadmap for upcoming features like authentication and improved indexing.
Facilitando a Programação concorrente com o Fork/JoinMario Amaral
The document discusses parallelizing tasks using fork/join in Java. It describes how fork/join works by dividing tasks into smaller subtasks (fork), running the subtasks in parallel, and then joining the results. An example implementation of a JsonTask class is shown that extends RecursiveTask and overrides the compute method to either directly process small lists or divide larger lists into sublists and fork new tasks to process them in parallel.
This document provides an overview of MongoDB, Java, and Spring Data. It discusses how MongoDB is a document-oriented NoSQL database that uses JSON-like documents with dynamic schemas. It describes how the Java driver can be used to interact with MongoDB to perform CRUD operations. It also explains how Spring Data provides an abstraction layer over the Java driver and allows for object mapping and repository-based queries to MongoDB.
This document discusses JNDI (Java Naming and Directory Interface) and how it provides a common way to access distributed and local resources for enterprise applications. It covers the basics of JNDI, including getting an initial context, looking up objects in the JNDI tree, adding and removing objects, and using JNDI for local data access and singletons. It also discusses how JNDI can be used with CORBA objects and for narrowing object references.
ENIB 2015-2016 - CAI Web - S01E01- MongoDB and NoSQLHoracio Gonzalez
This document provides an introduction and overview of MongoDB and NoSQL databases. It discusses key MongoDB concepts like document databases, collections and documents. It also covers how to install and run MongoDB, insert and query data, and use common operations. Some examples show how to create indexes, use JavaScript and regex queries. Exercises at the end propose practicing installing MongoDB, creating a collection and querying data.
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQLHoracio Gonzalez
This document provides an introduction and overview of MongoDB and how to use it. Some key points:
- MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling. It uses documents (similar to JSON objects) rather than tables and rows.
- Documents are stored in collections without a predefined schema. Fields can be added, modified or deleted at any time.
- Common operations include inserting, querying, updating, and removing documents from collections. Queries can use filters, projections, sorting, skips, limits, and regular expressions.
- MongoDB is flexible compared to relational databases as schemas are not rigidly defined. It is suitable for high performance applications that need
Couchbase Korea User Group 2nd Meetup #2won min jang
The document discusses developing with JSON documents in Couchbase. It covers working with JSON documents, schemas and relationships between documents. Key points include:
- JSON documents map closely to objects and entities, allowing CRUD operations with a lightweight schema.
- Documents can be linked together through relationships, and data can be embedded or fetched from multiple documents.
- Couchbase is a document database that stores data as JSON documents with dynamic schemas driven by application code rather than rigid database schemas.
- Documents can be linked together through relationships between their unique IDs, allowing data to be distributed across multiple documents for performance and scalability.
The document discusses the problems with using plain SQL queries, including that they are not type-safe and errors cannot be caught until runtime. It introduces QueryDSL as an alternative that allows type-safe queries in Java. QueryDSL converts query objects to SQL and can catch errors during compilation rather than runtime. The document suggests using QueryDSL with Spring Data JPA to take advantage of these benefits when querying data.
This document provides an overview of using Hibernate, an open source Java object-relational mapping tool. It discusses typical problems with JDBC code, how Hibernate addresses these issues, and provides sample code for mapping Java objects to database tables and performing queries and transactions with Hibernate. The document also outlines steps for a basic Hibernate tutorial, including database design, mapping classes, configuration, and data access objects.
This document discusses naming and directory services, including JNDI. JNDI provides a way for Java clients to access naming and directory services to lookup and locate remote services and components. A naming service contains objects and references mapped to names, while a directory service associates attributes with bindings. Popular directory services include LDAP, NDS, NIS+, RMI Registry, CORBA Naming Service, and DNS. JNDI configuration can be difficult but is simplified when using an application server which automatically starts required services.
After a short introduction to the Java driver for MongoDB, we'll have a look at the more abtract persistence frameworks like Morphia, Spring Data, Jongo and Hibernate OGM.
Morphia: Simplifying Persistence for Java and MongoDBJeff Yemin
The document describes Morphia, an object document mapper for Java that simplifies working with MongoDB. It outlines Morphia's key features like model mapping with annotations, query and update APIs, and support for references and embedded objects. Examples are provided of defining entity classes with annotations, performing queries, updates, and indexing. The document recommends Morphia for working with MongoDB on the JVM and provides resources for learning more.
The document discusses OrientDB, a multi-model NoSQL database that supports document, key-value, and graph structures. It highlights several of OrientDB's features, including its support for relationships without joins, complex types, ACID transactions, and its RESTful HTTP interface. The document also briefly describes OrientDB's indexing, security, multi-master replication, and use of a graph database model.
This document discusses CouchDB and CouchDBKit, a Python framework for CouchDB. It introduces CouchDB as a document-oriented, RESTful database with map-reduce capabilities. CouchDBKit provides a simple Python client for CouchDB that supports schemas, Django integration, and syncing design documents from files. The document demonstrates how to perform basic operations with the CouchDB and CouchDBKit clients.
MongoDB + Java - Everything you need to know Norberto Leite
Learn everything you need to know to get started building a MongoDB-based app in Java. We'll explore the relationship between MongoDB and various languages on the Java Virtual Machine such as Java, Scala, and Clojure. From there, we'll examine the popular frameworks and integration points between MongoDB and the JVM including Spring Data and object-document mappers like Morphia.
This document provides a UML class diagram showing the relationships between classes in the StoreKit framework for making in-app purchases in iOS 6. The diagram shows that SKProductsRequest and SKProductsResponse are used to request product information from the App Store. SKPayment and SKPaymentTransaction represent purchase transactions, and SKPaymentQueue manages transactions and can restore purchases. SKStoreProductViewController displays product information to users.
The document provides information about a talk on Java persistence frameworks for MongoDB given at MongoDB Berlin 2013. It discusses MongoDB Java Driver, Spring Data MongoDB, Morphia, and Hibernate OGM as frameworks for connecting Java applications to MongoDB. The talk covers connecting to MongoDB from Java, mapping objects to documents, and repository support features of the frameworks.
This is a talk I presented at University Limerick to give people an introduction into CouchDB.
What is it? How does it generally work? Introducing new concepts, etc.
This document summarizes a lecture about data in iPhone applications. The lecture covers several topics: storing and loading local data using property lists, the iPhone's file system, archiving objects, SQLite databases, and accessing remote data via JSON and push notifications. It provides examples of writing arrays and dictionaries to property list files, explains the iPhone's application home directory structure, and gives an overview of archiving objects and using the SQLite database API.
Just a few simple slides for a presentation at Kraków's Scala User Group. The rest of the session is live coding where we reimplement a tiny subset of Rogue, Foursquares MongoDB DSL.
MongoDB is the coolest NoSQL DB around, partially because it's simple-by-design philosophy. Without transactions or joins, all the bad vibe of SQL is gone.
In this presentation I demonstrate how to get started with MongoDB in the cloud using Mon
This document provides an overview of Morphia, a Java object mapping library for MongoDB. It discusses Morphia's advantages over using raw MongoDB drivers, including type safety and the ability to work with POJOs rather than generic maps. Key features covered include annotation-based mapping of entities to collections, lifecycle callbacks, queries, updates, and support for relationships and object graphs. The document aims to demonstrate how Morphia simplifies common data access patterns while retaining performance.
The document summarizes a lecture on iPhone application development that covered custom classes, object lifecycles, and properties. Specifically, it discussed creating custom classes with properties and methods, allocating and initializing objects, and the two-step process of object creation involving allocation and initialization. It also mentioned implementing init methods and calling superclass methods.
This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability.
I gave this talk at TheEdge conference.
Cisco is a global leader in networking with $40 billion in annual revenue. They produce over 1,000 videos per year for marketing and education on their website, YouTube channel, and other social media. Video is becoming an increasingly important content format, as viewership of online video continues to grow dramatically. Cisco finds that videos can help facilitate the customer purchasing process when used strategically at different stages from awareness to purchase.
OM Plus Report Manager software from Plus Technologies can help you reduce the productivity and monetary costs associated with paper-based reports. This presentation examines the features and advantages (including case studies) of OM Plus RM.
ENIB 2015 2016 - CAI Web S02E03 - Forge JS 2/4 - MongoDB and NoSQLHoracio Gonzalez
This document provides an introduction and overview of MongoDB and how to use it. Some key points:
- MongoDB is an open-source document database that provides high performance, high availability, and automatic scaling. It uses documents (similar to JSON objects) rather than tables and rows.
- Documents are stored in collections without a predefined schema. Fields can be added, modified or deleted at any time.
- Common operations include inserting, querying, updating, and removing documents from collections. Queries can use filters, projections, sorting, skips, limits, and regular expressions.
- MongoDB is flexible compared to relational databases as schemas are not rigidly defined. It is suitable for high performance applications that need
Couchbase Korea User Group 2nd Meetup #2won min jang
The document discusses developing with JSON documents in Couchbase. It covers working with JSON documents, schemas and relationships between documents. Key points include:
- JSON documents map closely to objects and entities, allowing CRUD operations with a lightweight schema.
- Documents can be linked together through relationships, and data can be embedded or fetched from multiple documents.
- Couchbase is a document database that stores data as JSON documents with dynamic schemas driven by application code rather than rigid database schemas.
- Documents can be linked together through relationships between their unique IDs, allowing data to be distributed across multiple documents for performance and scalability.
The document discusses the problems with using plain SQL queries, including that they are not type-safe and errors cannot be caught until runtime. It introduces QueryDSL as an alternative that allows type-safe queries in Java. QueryDSL converts query objects to SQL and can catch errors during compilation rather than runtime. The document suggests using QueryDSL with Spring Data JPA to take advantage of these benefits when querying data.
This document provides an overview of using Hibernate, an open source Java object-relational mapping tool. It discusses typical problems with JDBC code, how Hibernate addresses these issues, and provides sample code for mapping Java objects to database tables and performing queries and transactions with Hibernate. The document also outlines steps for a basic Hibernate tutorial, including database design, mapping classes, configuration, and data access objects.
This document discusses naming and directory services, including JNDI. JNDI provides a way for Java clients to access naming and directory services to lookup and locate remote services and components. A naming service contains objects and references mapped to names, while a directory service associates attributes with bindings. Popular directory services include LDAP, NDS, NIS+, RMI Registry, CORBA Naming Service, and DNS. JNDI configuration can be difficult but is simplified when using an application server which automatically starts required services.
After a short introduction to the Java driver for MongoDB, we'll have a look at the more abtract persistence frameworks like Morphia, Spring Data, Jongo and Hibernate OGM.
Morphia: Simplifying Persistence for Java and MongoDBJeff Yemin
The document describes Morphia, an object document mapper for Java that simplifies working with MongoDB. It outlines Morphia's key features like model mapping with annotations, query and update APIs, and support for references and embedded objects. Examples are provided of defining entity classes with annotations, performing queries, updates, and indexing. The document recommends Morphia for working with MongoDB on the JVM and provides resources for learning more.
The document discusses OrientDB, a multi-model NoSQL database that supports document, key-value, and graph structures. It highlights several of OrientDB's features, including its support for relationships without joins, complex types, ACID transactions, and its RESTful HTTP interface. The document also briefly describes OrientDB's indexing, security, multi-master replication, and use of a graph database model.
This document discusses CouchDB and CouchDBKit, a Python framework for CouchDB. It introduces CouchDB as a document-oriented, RESTful database with map-reduce capabilities. CouchDBKit provides a simple Python client for CouchDB that supports schemas, Django integration, and syncing design documents from files. The document demonstrates how to perform basic operations with the CouchDB and CouchDBKit clients.
MongoDB + Java - Everything you need to know Norberto Leite
Learn everything you need to know to get started building a MongoDB-based app in Java. We'll explore the relationship between MongoDB and various languages on the Java Virtual Machine such as Java, Scala, and Clojure. From there, we'll examine the popular frameworks and integration points between MongoDB and the JVM including Spring Data and object-document mappers like Morphia.
This document provides a UML class diagram showing the relationships between classes in the StoreKit framework for making in-app purchases in iOS 6. The diagram shows that SKProductsRequest and SKProductsResponse are used to request product information from the App Store. SKPayment and SKPaymentTransaction represent purchase transactions, and SKPaymentQueue manages transactions and can restore purchases. SKStoreProductViewController displays product information to users.
The document provides information about a talk on Java persistence frameworks for MongoDB given at MongoDB Berlin 2013. It discusses MongoDB Java Driver, Spring Data MongoDB, Morphia, and Hibernate OGM as frameworks for connecting Java applications to MongoDB. The talk covers connecting to MongoDB from Java, mapping objects to documents, and repository support features of the frameworks.
This is a talk I presented at University Limerick to give people an introduction into CouchDB.
What is it? How does it generally work? Introducing new concepts, etc.
This document summarizes a lecture about data in iPhone applications. The lecture covers several topics: storing and loading local data using property lists, the iPhone's file system, archiving objects, SQLite databases, and accessing remote data via JSON and push notifications. It provides examples of writing arrays and dictionaries to property list files, explains the iPhone's application home directory structure, and gives an overview of archiving objects and using the SQLite database API.
Just a few simple slides for a presentation at Kraków's Scala User Group. The rest of the session is live coding where we reimplement a tiny subset of Rogue, Foursquares MongoDB DSL.
MongoDB is the coolest NoSQL DB around, partially because it's simple-by-design philosophy. Without transactions or joins, all the bad vibe of SQL is gone.
In this presentation I demonstrate how to get started with MongoDB in the cloud using Mon
This document provides an overview of Morphia, a Java object mapping library for MongoDB. It discusses Morphia's advantages over using raw MongoDB drivers, including type safety and the ability to work with POJOs rather than generic maps. Key features covered include annotation-based mapping of entities to collections, lifecycle callbacks, queries, updates, and support for relationships and object graphs. The document aims to demonstrate how Morphia simplifies common data access patterns while retaining performance.
The document summarizes a lecture on iPhone application development that covered custom classes, object lifecycles, and properties. Specifically, it discussed creating custom classes with properties and methods, allocating and initializing objects, and the two-step process of object creation involving allocation and initialization. It also mentioned implementing init methods and calling superclass methods.
This is an introductory presentation to Cassandra, the database of choice for high availability and insane scalability.
I gave this talk at TheEdge conference.
Cisco is a global leader in networking with $40 billion in annual revenue. They produce over 1,000 videos per year for marketing and education on their website, YouTube channel, and other social media. Video is becoming an increasingly important content format, as viewership of online video continues to grow dramatically. Cisco finds that videos can help facilitate the customer purchasing process when used strategically at different stages from awareness to purchase.
OM Plus Report Manager software from Plus Technologies can help you reduce the productivity and monetary costs associated with paper-based reports. This presentation examines the features and advantages (including case studies) of OM Plus RM.
This document provides guidance for an art project involving observational drawing and printmaking. It instructs students to choose complex objects to draw from observation using a viewfinder. It encourages experimenting with techniques like distortion and references well-known artists. It outlines developing initial sketches into a final printmaking or 2D art idea through half page thumbnail sketches. The document demonstrates example student work and provides structure for a final printmaking piece or 2D art work.
Sarah and Trevor recently got married. The document is congratulating Sarah and Trevor on their marriage in an enthusiastic manner, using multiple exclamation points to convey excitement. In just 3 words, it acknowledges and celebrates Sarah and Trevor's union.
Peritaje en la construcción de proyectos inmobiliarios 1Magaly Caba
El documento presenta un resumen de los pasos involucrados en realizar un peritaje. Explica que un peritaje implica el examen y estudio de un problema para luego entregar un informe con conclusiones y recomendaciones. Detalla los diferentes temas que pueden abarcar un peritaje como vicios de construcción, incumplimiento de contratos o proyectos de rehabilitación. Además, describe las secciones que suele contener un informe pericial como introducción, objetivos, metodología, conclusiones y anexos.
This document provides an overview of facts about Florida and reasons to visit the state. It lists Florida's nickname, bird, flower, and tree. It also discusses Florida's capital, population, year of statehood, and major ethnic groups. The document outlines Florida's key industries like tourism and agriculture, mentioning crops like citrus, sugarcane, and tomatoes. Finally, it briefly summarizes Florida's history from its earliest Native American inhabitants to becoming a US state in 1845 after being a territory and under colonial rule by Spain and Britain.
The document discusses the band Extra Nena. Extra Nena is a Mexican rock band formed in Mexico City in 1984. They are considered pioneers of alternative rock music in Latin America. Their fusion of rock, punk and new wave styles was influential in the region.
This document provides information about why someone should visit Arkansas. It includes fun facts about Arkansas' nickname, bird, flower, and tree. It highlights some landmarks to visit such as Blanchard Springs Caverns, Crater of Diamonds State Park, and Ozark Folk Center. Additionally, it notes the geography, capital, population, statehood, ethnicities, industries, crops, and a brief history of Arkansas and its Native American population. A proposed itinerary is given spending one day at each of Blanchard Springs Caverns, Crater of Diamonds State Park, and Ozark Folk Center.
The British Board of Film Classification (BBFC) is responsible for classifying films, videos, DVDs, and some video games in the UK. It provides guidelines for what types of content are appropriate at different rating levels. For example, a 'U' rating is suitable for children aged 4 and over and should not include horror, threat, violence or illegal drugs. A '15' rating means no one under 15 can see the film in a cinema; it can contain strong language, violence, and threat but should not dwell on pain or injury details. The filmmaker's piece was rated '15' due to scenes of violence, threat, and imitable dangerous behavior over two minutes including a death and threatened hostage.
The document discusses several men's fashion magazines and their online and print designs. GQ is recognized as a leading men's fashion magazine that maintains consistent branding across its print and website formats. Men's Health focuses on fashion, health, and sports tips and also keeps its print and online designs synced. MFM is an online-only magazine but is designed to mimic page flipping of a print magazine. The New York Times Men's Vogue are print-only publications that provide examples of standard magazine layouts.
This document provides information about visiting the state of Kentucky. It includes sections about state fun facts, geography, basic facts, industries and crops, and a suggested 3-day itinerary for things to do in Kentucky, which includes visiting landmarks, experiencing the culture and history, and enjoying local agriculture.
Mississippi is known as the Father of Waters, with a mockingbird as its state bird and magnolia as its state flower. It has forests and trees along the Mississippi River, and was the 13th state, with Jackson as its capital, known for industries like rice, cotton and soybeans, and activities include hunting, boating, and visiting the capital.
Provide high level technical support, including identifying and resolving problems on Cisco supported products. This included external routing and internal/intranet routing. Work with the data center planning groups, assisting with network capacity and high availability requirements. Review all changes to network configuration for technical accuracy and impact and provide Multi-Protocol Network problem resolutions.
The document introduces C# as the first component-oriented language in the C/C++ family. It discusses key features of C# including everything being an object, robust and durable software, and preservation of investment from other languages. It provides examples of basic C# syntax and programming structures.
The document describes Pinterest's scaling efforts from 2010 to 2012. It started with a single server on Rackspace and grew to use Amazon Web Services with over 100 web servers and database shards on MySQL, Redis, Memcache and other technologies. Key lessons included keeping systems simple initially and that clustering is difficult due to a single point of failure in the cluster management. Pinterest transitioned to manual sharding of MySQL databases to improve scalability while avoiding the complexity of clustering.
Pinterest arch summit august 2012 - scaling pinterestdrewz lin
This document summarizes a presentation on scaling Pinterest. It discusses how Pinterest scaled from using a single small web server and database in 2010 to using Amazon EC2, S3, multiple databases, caches and task queues with over 100 servers and 25 engineers by 2012. It also covers lessons learned around keeping infrastructure simple initially and the challenges of clustering versus sharding databases.
Castle is an open-source project that provides an alternative to the lower layers of the storage stack -- RAID and POSIX filesystems -- for big data workloads, and distributed data stores such as Apache Cassandra.
This presentation from Berlin Buzzwords 2012 provides a high-level overview of Castle and how it is used with Cassandra to improve performance and predictability.
High-Performance Storage Services with HailDB and Javasunnygleason
This document summarizes an approach to providing high-performance storage services using Java and HailDB. It discusses using the optimized "guts" of MySQL without needing to go through JDBC and SQL. It presents HailDB as a storage engine alternative to NoSQL options like Voldemort. It describes integrating HailDB with Java using JNA, building a REST API on top called St8, and examples of nifty applications like graph stores and counters. It concludes with discussing future work like improving packaging, online backup, and exploring JNI bindings.
MongoDB: Optimising for Performance, Scale & AnalyticsServer Density
MongoDB is easy to download and run locally but requires some thought and further understanding when deploying to production. At scale, schema design, indexes and query patterns really matter. So does data structure on disk, sharding, replication and data centre awareness. This talk will examine these factors in the context of analytics, and more generally, to help you optimise MongoDB for any scale.
Presented at MongoDB Days London 2013 by David Mytton.
Morning with MongoDB Paris 2012 - Accueil et IntroductionsMongoDB
The document outlines the agenda for a MongoDB conference in Paris. It introduces the presenters from 10gen and partner companies. The agenda includes introductions to NoSQL and MongoDB, technical overviews, customer and partner presentations, and a Q&A session. It also discusses how MongoDB provides an operational 'Big Data' solution and can help address the challenges of increasing data complexity, volume, and types that traditional RDBMS struggle with.
This document discusses Brisk, a peer-to-peer Hadoop distribution built on Apache Cassandra. It provides an overview of key Hadoop concepts like MapReduce and HDFS. It also describes how Brisk uses Cassandra to address limitations of Hadoop by allowing one-kind-of-node and removing single points of failure. The document highlights Brisk features like Hive integration for SQL access and a custom snitch for routing. Demo URLs and getting started instructions are provided.
NoSQL refers to non-relational databases that were developed to handle large volumes of data across many servers. NoSQL databases such as Redis, Memcached, MongoDB and CouchDB do not use SQL for queries and are more scalable and flexible than traditional relational databases. They often sacrifice consistency, complex queries, and transaction support in favor of availability and flexibility.
These are the slides from my talk at Hulu in March 2015 discussing Apache Spark & Cassandra. I cover the evolution of data from a single machine to RDBMS (MySQL is the primary example) to big data systems.
On the Spark side, I covered batch jobs, streaming, Apache Kafka, an introduction to machine learning, clustering, logistic regression and recommendations systems (collaborative filtering).
The talk was recorded and is available on youtube: https://www.youtube.com/watch?v=_gFgU3phogQ
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
Massively scalable, always on, and ridiculously fast. Apache Cassandra is the database chosen by Apple, Netflix, and 30 of the Fortune 100 to power their critical infrastructure. How do we analyze petabytes of data, whether it be massive batching or as it’s ingested via streaming with Apache Kafka? Enter Apache Spark. Challenging MapReduce head on, Apache Spark offers powerful constructs that make it possible to slice and dice your data, whether it be through machine learning, graph queries, as well as transformations familiar to people with functional programming backgrounds such as map, filter, and reduce. Step away ready to rock with the most powerful distributed database, scalable messaging, and analytics platform on the planet.
Watch the video here
https://www.youtube.com/watch?v=X-FKmKc9hkI
The document provides an overview of MongoDB, describing it as an open-source, high-performance, schema-free, document-oriented database. It then outlines some basic terms used in MongoDB like document, BSON, collection, and GridFS. The remainder of the document discusses technical aspects of MongoDB like administration, drivers, replication, sharding, and features such as queries, indexes, map reduce, and upserts. It concludes by reviewing several companies that use MongoDB successfully in production applications.
This document provides an introduction to NoSQL databases. It discusses that NoSQL refers to non-relational databases that are not based on SQL and are focused on scalability. Some common types of NoSQL databases include column stores, key-value stores, document stores, graph databases, and XML databases. NoSQL databases are designed to handle large volumes of data across many servers and provide high availability with no single point of failure. Common uses of NoSQL databases include distributed systems like social networks where data is highly distributed and needs to be replicated across servers.
The document provides an overview of the Microsoft database stack, including the various SQL Server products that make up the stack. It discusses some of the hard problems databases help solve, such as query plan generation and ensuring data consistency. It also covers file layouts and I/O patterns for different SQL Server file types.
Evolution of the DBA to Data Platform Administrator/SpecialistTony Rogerson
DBA's used to be Relational Database centric for instance managing Microsoft SQL Server or Oracle, in this changing world of polyglot database environments their role has expanded not just into new platforms other than SQL but also new legal governance, modelling techniques, architecture etc. They need to have a base knowledge of Kimball, Inmon, Data Vault, what CAP theorem is, LAMBDA, Big Data, Data Science etc.
Big Data Analytics: Finding diamonds in the rough with AzureChristos Charmatzis
In this session it will presented main workflows and technologies of getting value from Big Data stored in our Enterprise using Azure.
- When we have a Big Data problem
- Finding the best solution for our Big Data
- Working inside the Data Team
- Extract the true value of our data.
Ensuring High Availability for Real-time Analytics featuring Boxed Ice / Serv...MongoDB
This will cover what to consider for high write throughput performance from hardware configuration through to the use of replica sets, multi-data centre deployments, monitoring and sharding to ensure your database is fast and stays online.
Spring one2gx2010 spring-nonrelational_dataRoger Xia
This document provides a summary of a talk on using Spring with NoSQL databases. The talk discusses the benefits and drawbacks of NoSQL databases, and how the Spring Data project simplifies development of NoSQL applications. It then provides background on the two speakers, Chris Richardson and Mark Pollack. The agenda outlines explaining why NoSQL, overviewing some NoSQL databases, discussing Spring NoSQL projects, and having demos and code examples.
Brisk - Truly peer-to-peer hadoop.
Brisk is an open-source Hadoop & Hive distribution that uses Apache Cassandra for its core services and storage. Brisk makes it possible to run Hadoop MapReduce on top of CassandraFS, an HDFS-compatible storage layer. By replacing HDFS with CassandraFS, users leverage MapReduce jobs on Cassandra’s peer-to-peer, fault-tolerant and scalable architecture.
With CassandraFS all nodes are peers. Data files can be loaded through any node in the cluster and any node can serve as the JobTracker for MapReduce jobs. Hive MetaStore is stored & accessed as just another column family (table) on the distributed data store. Brisk makes Hadoop truly peer-to-peer.
We demonstrate visualisation & monitoring of Brisk using OpsCenter. The operational simplicity of cassandra’s multi-datacenter & multi-region aware replication makes Brisk well-suited for a rich set of Applications and usecases. And by being able to store and isolate hdfs & online data within the same data cluster, Brisk makes analytics possible without ETL!
LA Scalability Talk, Mahalo
May 31.2011
- MongoDB allows for automatic sharding of data across multiple servers to improve write performance. However, scaling write performance is challenging due to the way B-tree indexes handle random inserts.
- To improve write performance, one can partition data by time or use a hash shard key. However, these have limitations as the data grows large. The best approach is to use a low-cardinality hash prefix combined with a sequential part for the shard key.
- Proper choice of shard key is crucial for scaling MongoDB's write performance as data size increases. Linear scalability is difficult to achieve and alternative databases may be better if extremely high write throughput is required.
Scaling Big Data Mining Infrastructure Twitter ExperienceDataWorks Summit
The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. In this talk, we’ll discuss the evolution of our infrastructure and the development of capabilities for data mining on “big data”. We’ll share our experiences as a case study, but make recommendations for best practices and point out opportunities for future work.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfflufftailshop
When it comes to unit testing in the .NET ecosystem, developers have a wide range of options available. Among the most popular choices are NUnit, XUnit, and MSTest. These unit testing frameworks provide essential tools and features to help ensure the quality and reliability of code. However, understanding the differences between these frameworks is crucial for selecting the most suitable one for your projects.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
4. · Amazon EC2 + S3 + CloudFront
· 2 NGinX, 16 Web Engines + 2 API Engines
Page Views / Day
· 5 Functionally Sharded MySQL DB + 9 read slaves
· 4 Cassandra Nodes
· 15 Membase Nodes (3 separate clusters)
· 8 Memcache Nodes
· 10 Redis Nodes
Mar 2010·
Mar 2010
3 Task Routers + 4 Task Processors
Jan 2011
Jan 2011
Jan 2012
Jan 2012
May 2012
· 4 Elastic Search Nodes
· 3 Mongo Clusters
· 3 Engineers
Scaling Pinterest
12年8月10⽇日星期五
5. Clustering
· Data distributed automatically
· Data can move
· Rebalances to distribute capacity
· Nodes communicate with each othe
Sharding
Scaling Pinterest
12年8月10⽇日星期五
6. Clustering
· Data distributed manually
· Data does not move
· Split data to distribute load
· Nodes are not aware of each other
Sharding
Scaling Pinterest
12年8月10⽇日星期五
7. Our Transition
1 DB + Foreign Keys + Joins
1 DB + Denormalized +
Cache + Read slaves +
1 DB
Cache
Several functionally sharded DBs + Read slaves +
Cache ID sharded DBs + Backup slaves +
Cache
Scaling Pinterest
12年8月10⽇日星期五
10. High Availability
db00001 db00513 db03072 db03584
db00002 db00514 db03073 db03585
....... ....... ....... .......
db00512 db01024 db03583 db04096
Multi Master replication for hot swapping
All writes/reads go to only 1 master
Scaling Pinterest
12年8月10⽇日星期五
11. Configuration
· Initially 8 masters, each with 512 DBs
· Now at 80 masters, some 32 DBs and some 64
DBs
· Over 10 TB
· All tables are InnoDB
Scaling Pinterest
12年8月10⽇日星期五
12. Configuration
· Ubuntu 11.10 on AWS managed by Puppet
· m1.xlarge using 4 stripe-raided ephemeral
drives
· cc2.8xlarge using 4 stripe-raided ephemeral
drives
· Soon: hi1.4xlarge (SSD)
Scaling Pinterest
12年8月10⽇日星期五
13. ID Structure
64 bits
Shard ID Type Local ID
· A lookup data structure has physical server to
shard ID range (cached by each app server
process)
· Shard ID denotes which shard
· Type denotes object type (e.g., pins)
· Local ID denotes position in table
Scaling Pinterest
12年8月10⽇日星期五
15. ID Generation
· New users are randomly distributed across
shards
· Boards, pins, etc. try to be collocated with user
· Local ID’s are assigned by auto-increment
mysql_query(“USE pbdata%d” % shard_id)
local_id = mysql_query(“INSERT INTO users (body) VALUES (%s)”, json)
full_id = shard_id << 46 | USER_TYPE << 10 | local_id
· Enough ID space for 65536 shards, but only
first 4096 opened initially. Can expand
horizontally.
Scaling Pinterest
12年8月10⽇日星期五
16. · Object tables (e.g., pin, board, user, comment)
Objects and Mappings
· Local ID MySQL blob (JSON / Serialized
thrift)
· Mapping tables (e.g., user has boards, pin has
likes)
· Full ID Full ID (+ sequence ID /
timestamp)
· Naming schema is noun_verb_noun
· Queries are PK or index lookups (no joins)
· Data DOES NOT MOVE
· All tables exist on all shards
Scaling Pinterest
12年8月10⽇日星期五
· No schema changes required (index = new
17. Loading a Page
· Rendering user profile
SELECT body FROM users WHERE id=<local_user_id>
SELECT board_id FROM user_has_boards WHERE user_id=<user_id>
SELECT body FROM boards WHERE id IN (<board_ids>)
SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>
SELECT body FROM pins WHERE id IN (pin_ids)
· Most of these calls will be a cache hit
· Omitting offset/limits and mapping sequence
id sort
Scaling Pinterest
12年8月10⽇日星期五
18. Increased load on DB?
sharddb001a
db00001
db00002
db00003
{“sharddb001a”: ( 1, 4), db00004
“sharddb002b”: ( 5, 8),
“sharddb003a”: ( 9, 12),
...
“sharddb008b”: ( 29, 32)}
sharddb001b
To increase capacity, a server is replicated
and the new replica becomes responsible for
some DBs
Scaling Pinterest
12年8月10⽇日星期五
19. Increased load on DB?
sharddb001a
db00001
db00002
db00003
{“sharddb001a”: ( 1, 4), db00004 sharddb009a
“sharddb002b”: ( 5, 8),
“sharddb003a”: ( 9, 12), db00001
... db00002
“sharddb008b”: ( 29, 32)} db00003
db00004
sharddb001b
sharddb009b
To increase capacity, a server is replicated
and the new replica becomes responsible for
some DBs
Scaling Pinterest
12年8月10⽇日星期五
20. Increased load on DB?
sharddb001a
db00001
db00002
db00003
{“sharddb001a”: ( 1, 2), db00004 sharddb009a
“sharddb002b”: ( 5, 8),
“sharddb003a”: ( 9, 12), db00001
... db00002
“sharddb008b”: ( 29, 32), db00003
“sharddb009a”: ( 3, 4)} db00004
sharddb001b
sharddb009b
To increase capacity, a server is replicated
and the new replica becomes responsible for
some DBs
Scaling Pinterest
12年8月10⽇日星期五
21. Increased load on DB?
sharddb001a
db00001
db00002
db00003
{“sharddb001a”: ( 1, 2), db00004 sharddb009a
“sharddb002b”: ( 5, 8),
“sharddb003a”: ( 9, 12), db00001
... db00002
“sharddb008b”: ( 29, 32), db00003
“sharddb009a”: ( 3, 4)} db00004
sharddb001b
sharddb009b
To increase capacity, a server is replicated
and the new replica becomes responsible for
some DBs
Scaling Pinterest
12年8月10⽇日星期五
22. Increased load on DB?
sharddb001a
db00001
db00002
{“sharddb001a”: ( 1, 2), sharddb009a
“sharddb002b”: ( 5, 8),
“sharddb003a”: ( 9, 12),
...
“sharddb008b”: ( 29, 32), db00003
“sharddb009a”: ( 3, 4)} db00004
sharddb001b
sharddb009b
To increase capacity, a server is replicated
and the new replica becomes responsible for
some DBs
Scaling Pinterest
12年8月10⽇日星期五
23. Scripting
· Must get old data into your shiny new shard
· 500M pins, 1.6B follower rows, etc
· Build a scripting farm with Pyres + Redis
· Spawn more workers and complete the task
faster
· Can push millions of jobs onto the queue
Scaling Pinterest
12年8月10⽇日星期五
25. · ID generated by Generation
ID shard + type + auto-
increment
· Pinterest, Facebook
· Pro: ID generation scales with size of shard
· Pro: No routing services which can fail
· Pro: No holes in ID space
· Pro: Type verification built-in
· Con: No global ordering (requires sequence
ID)
· Con: Can’t move parts of databases around
Scaling Pinterest
12年8月10⽇日星期五
26. ID Generation
· ID contains timestamp + shard
· Instagram
· Pro: Global ordering
· Pro: No routing services which can fail
· Con: No type verification
· Con: Can’t move parts of databases around
· Con: Not much space for shard ID’s
Scaling Pinterest
12年8月10⽇日星期五
27. ID Generation
· ID generated by service
· Twitter, Etsy
· Requires: ID routing service
· Pro: Possible global ordering
· Pro: Can move individual data elements
· Pro: Can perform type-checking
· Con: ID routing service can fail
· Con: ID routing service can be unwieldy and
complicated
Scaling Pinterest
12年8月10⽇日星期五
29. Hotspots
· Our board_followedby_users tables were
unbalanced
· Rearchitected to have
board_followedby_user_shard map boards to
shards containing subsets of the
board_followedby_users data.
· Pro: Data evenly distributed
· Con: May have to hit multiple shards
Scaling Pinterest
12年8月10⽇日星期五
30. No Joins
· Home is user_follows_board JOIN board_has_pins
· Each user has a home feed redis list
· New pin ID placed in each following user’s list
for user_id in board_followedby_users:
redisconn.lpush(“home:%s” % user_id, new_pin_id)
Scaling Pinterest
12年8月10⽇日星期五
31. No Automatic Failover
· Some fraction of users see timeouts if a DB fails
· Manually change settings and re-deploy
· Needs to be automated, especially with 128+
DBs
· Moving to automation with Zookeeper
Scaling Pinterest
12年8月10⽇日星期五
33. Caching
· Redis lists for persistently caching homefeeds
· lpush homefeed:82363 7233494
· lrange homefeed:82363 0 49
· Memcache for caching lists
· MessagePack to serialize data
· Append to add new data
· Never delete from middle
· Memcache for caching objects
Scaling Pinterest
12年8月10⽇日星期五
34. Caching
· Caches separated by pools
· Example: pin objects, board_has_pins, etc
· Better stats
· Easier to scale and manage
Scaling Pinterest
12年8月10⽇日星期五
35. Caching with decorators
@mc_objects(USER_MC_CONN,
format = “user:%d”,
version=1,
serialization=simplejson
expire_time=0)
def user_get_many(self, user_ids):
cursor = get_mysql_conn().cursor
cursor.execute(“SELECT * FROM users WHERE id in (%s)”,
user_ids)
return cursor.fetchall()
Scaling Pinterest
12年8月10⽇日星期五
36. Caching with decorators
@paged_list(BOARD_MC_CONN,
format = “b:p:%d”,
version=1,
expire_time=24*60*60)
def get_board_pins(self, board_id, offset, limit):
cursor = get_mysql_conn().cursor
cursor.execute(“SELECT pin_id FROM bp WHERE id =%s OFFSET
%d LIMIT %d”, board_id, offset, limit)
return cursor.fetchall()
Scaling Pinterest
12年8月10⽇日星期五
37. We are Hiring!
jobs@pinterest.com
Scaling Pinterest
12年8月10⽇日星期五