Learn about Espresso, Databus, and Voldemort. LinkedIn Data Infrastructure Slides (Version 2). This talk was given in NYC on June 20, 2012
You can download the slides as PPT in order to see the transitions here :
http://bit.ly/LfH6Ru
Modern Java web applications with Spring Boot and ThymeleafLAY Leangsros
If you’re using Java in an enterprise environment, you’ve most likely been using Spring Framework with JSP which does the job pretty well.But I will provide the sampling of how Spring Boot helps you accelerate and facilitate application development better. I will show a templating technology, Thymleaf which can be used much more modern features;
This session is about Django, which is a web framework build in python. It has several features like admin interface and ORM. The architecture of Django has Model, View, and template and it's ORM saves the pain of writing database queries.
A closer look at the MySQL and PostgreSQL compatible relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. We’ll explore how Aurora uses the AWS cloud to provide high reliability, high durability, and high throughput.
I did this presentation for one of my java user groups at work.
Basically, this is a mashed up version of various presentations, slides and images that I gathered over the internet.
I've quoted the sources in the end. Feel free to reuse it as you like.
Modern Java web applications with Spring Boot and ThymeleafLAY Leangsros
If you’re using Java in an enterprise environment, you’ve most likely been using Spring Framework with JSP which does the job pretty well.But I will provide the sampling of how Spring Boot helps you accelerate and facilitate application development better. I will show a templating technology, Thymleaf which can be used much more modern features;
This session is about Django, which is a web framework build in python. It has several features like admin interface and ORM. The architecture of Django has Model, View, and template and it's ORM saves the pain of writing database queries.
A closer look at the MySQL and PostgreSQL compatible relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. We’ll explore how Aurora uses the AWS cloud to provide high reliability, high durability, and high throughput.
I did this presentation for one of my java user groups at work.
Basically, this is a mashed up version of various presentations, slides and images that I gathered over the internet.
I've quoted the sources in the end. Feel free to reuse it as you like.
This is the slides I used when I shared my humble insight on Django to the students in University of Taipei in 2016. Please feel free to correct me if there is anything wrong.
Most learning materials for web app pentesting focus on “old school” apps. Maybe they have a little jQuery sprinkled in, but most of the heavy-lifting happens server-side. With the dawn of frontend frameworks like AngularJS, Vue, and React and Single-Page Applications, the way web apps are developed is changing, and pentesters need to keep up. This talk runs through common security issues with and approaches to testing these new apps.
This presentation discuss the evolution of Access Control Model and also introducing ABAC or Attributes Based Access Control as a new approach for authorization.
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인Amazon Web Services Korea
발표영상 다시보기: https://kr-resources.awscloud.com/data-databases-and-analytics/aurora-mysql-backtrack%EC%9D%84-%EC%9D%B4%EC%9A%A9%ED%95%9C-%EB%B9%A0%EB%A5%B8-%EB%B3%B5%EA%B5%AC-%EB%B0%A9%EB%B2%95-%EC%A7%84%EA%B5%90%EC%84%A0-aws-database-modernization-day-%EC%98%A8%EB%9D%BC%EC%9D%B8-2
Aurora MySQL은 기존 MySQL의 운영에 추가한 많은 기능들을 제공해 드리고 있습니다. 이 중 복구에 관련된 기능인 Aurora MySQL PITR과 Backtrack에 대한 소개를 드리고자 합니다. 두 기능을 통해 운영 중 일어날 수 있는 rollback 상황에서, 어떠한 방식으로 복구를 할 수 있는지 실습해보실 수 있습니다.
This talk was given by Jun Rao (Staff Software Engineer at LinkedIn) and Sam Shah (Senior Engineering Manager at LinkedIn) at the Analytics@Webscale Technical Conference (June 2013).
This is the slides I used when I shared my humble insight on Django to the students in University of Taipei in 2016. Please feel free to correct me if there is anything wrong.
Most learning materials for web app pentesting focus on “old school” apps. Maybe they have a little jQuery sprinkled in, but most of the heavy-lifting happens server-side. With the dawn of frontend frameworks like AngularJS, Vue, and React and Single-Page Applications, the way web apps are developed is changing, and pentesters need to keep up. This talk runs through common security issues with and approaches to testing these new apps.
This presentation discuss the evolution of Access Control Model and also introducing ABAC or Attributes Based Access Control as a new approach for authorization.
Aurora MySQL Backtrack을 이용한 빠른 복구 방법 - 진교선 :: AWS Database Modernization Day 온라인Amazon Web Services Korea
발표영상 다시보기: https://kr-resources.awscloud.com/data-databases-and-analytics/aurora-mysql-backtrack%EC%9D%84-%EC%9D%B4%EC%9A%A9%ED%95%9C-%EB%B9%A0%EB%A5%B8-%EB%B3%B5%EA%B5%AC-%EB%B0%A9%EB%B2%95-%EC%A7%84%EA%B5%90%EC%84%A0-aws-database-modernization-day-%EC%98%A8%EB%9D%BC%EC%9D%B8-2
Aurora MySQL은 기존 MySQL의 운영에 추가한 많은 기능들을 제공해 드리고 있습니다. 이 중 복구에 관련된 기능인 Aurora MySQL PITR과 Backtrack에 대한 소개를 드리고자 합니다. 두 기능을 통해 운영 중 일어날 수 있는 rollback 상황에서, 어떠한 방식으로 복구를 할 수 있는지 실습해보실 수 있습니다.
This talk was given by Jun Rao (Staff Software Engineer at LinkedIn) and Sam Shah (Senior Engineering Manager at LinkedIn) at the Analytics@Webscale Technical Conference (June 2013).
Victors & Spoils at DIS: Rethinking the Agency ModelDigiday
Digital technology has made people more efficient than ever before. As a result, the industry needs to shift from a closed to an open business model in a big way. Victors & Spoils' John Winsorwill explain the guiding principles needed for agencies to thrive in an open and abundant world.
Presenter: John Winsor, chief innovation officer, Havas and CEO, Victors & Spoils @jtwinsor
Bigger Faster Easier: LinkedIn Hadoop Summit 2015Shirshanka Das
We discuss LinkedIn's big data ecosystem and its evolution through the years. We introduce three open source projects, Gobblin for ingestion, Cubert for computation and Pinot for fast OLAP serving. We also showcase our in-house data discovery and lineage portal WhereHows.
How do you build a general purpose data collection tool that can be integrated into 100+ sites, handle 15,000 1K writes/second, and have nearly zero downtime? How do you do that and have an architecture virtually unchanged for 3+ years? Start with using Mongo. Michael Makunas and Ramesh Nuthalapati from Viacom Media Networks, home of MTV, Comedy Central, Nickelodeon, and dozens of other brands, detail the architecture of the voting, polling, and data collection system that powers everything from voting for the MTV Video Music Awards, to sliming celebrities at the Nickelodeon Kids Choice Awards, to contests to meet Miley Cyrus.
This is the first episode of ... ESPRESSO SHOTS OF BUSINESS WISDOM. In this series, I plan to share some easy-to-follow business advice.
For more business marketing advice, peruse my blog: brandautopsy.typepad.com. Or, visit my website to learn more about me ... www.brandautopsy.com.
اشكر لطفك وذوقك وسعة صدرك
ارجو الانضمام الى حسابي الجديد
الحساب بالعربي
فهيدة تيشوري
شكرا لكم استاذ – ة-هذا حساب جديد لان ادارة الفيسبوك عاقبت الحساب القديم ومنعت المشاركة في المجموعات الوطنية لاننا نحب سورية ونؤيد الرئيس الاسد
كل عام وانتم بالف خير
اشكركم اشكر اعجابكم اعتز بصداقتكم ارجو لكم السعادة ارجو ان نبقى على تواصل – ارجو الانضمام الى مجموعة سورية الغد المتجددة
عبد الرحمن تيشوري – شهادة عليا بالادارة – شهادة عليا بالاقتصاد
طرطوس – سورية – ALRAHMANABD@GMAIL.COM
Nespresso coffee expertise enables anyone, anywhere to make a perfect espresso coffee.
At Nespresso we want to make it possible for you to make the same full-bodied espresso offered by skilled baristas. Your business can benefit from years of Nespresso expertise in premium Grand Crus coffees, innovative machines and excellent customer support.
Our rigorous selection process and careful blending, roasting, and grinding methods create exceptional Grand Crus coffees that are enjoyed all over the world. The exclusive capsule delivery system helps you to control costs and preserves the roast and ground coffee until you’re ready to serve.
In a business environment, Nespresso contributes to the performance of your company by offering the best quality coffee to your employees, but also a competitive and easy to manage coffee solution to you.
We believe that our Grand Crus stand apart from other coffees and fit naturally with the world of fine dining. Today, Nespresso is the coffee of choice in over 400 of the world finest restaurants.
Strata 2016 - Architecting for Change: LinkedIn's new data ecosystemShirshanka Das
Shirshanka Das and Yael Garten describe how LinkedIn redesigned its data analytics ecosystem in the face of a significant product rewrite, covering the infrastructure changes that enable LinkedIn to roll out future product innovations with minimal downstream impact. Shirshanka and Yael explore the motivations and the building blocks for this reimagined data analytics ecosystem, the technical details of LinkedIn’s new client-side tracking infrastructure, its unified reporting platform, and its data virtualization layer on top of Hadoop and share lessons learned from data producers and consumers that are participating in this governance model. Along the way, they offer some anecdotal evidence during the rollout that validated some of their decisions and are also shaping the future roadmap of these efforts.
خليك بروفيشنال | حدث وجود | انوار رسالة الشرقيةAhmed Faris
وجود هو حدث جماهيري من تنظيم | انوار رسالة فروع الشرقية كلها
الإنسان إتخلق لغاية ... وربنا منّ عليه بمواهب وميزات ويمكن حاجات الناس شايفينها عيوب .. لكن الحاجات دي كلها مجمعة بتحققله الوجود بتخلقله بصمة خاصة بيه وتأثير قوي في مجتمعه ... لكن عليه يستغلها صح ويشتغل على نفسه ويمشي في طريق واضح محدده لنفسه
وأكيد علشان بحقق ده بيحتاج توجيه من حد ناجح في نفس المجال أللي هو إختاره لنفسه لانه مناسب ليه وقادر يوصل معاه للشغف أو حب الحاجه ألي بيعملها والإبداع فيها
ومن هنا جت فكرة الإيفينت
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
الإيفينت #5شخصيات اثبتوا نجاحهم في 5 مجالات مختلفه هما ◕‿◕
1- ◄ريادة الأعمال | م.رضا الشريف
2- ◄محو الأمية الإلكترونية | م. مصطفى رسلان
3- ◄الفرص على لينكدان | م.احمد فارس
4- ◄TOT | د.محمد رأفت
5-◄التطوع واخلاقياته | م.محمد الشافعي
▽▽▽▽▽▽▽▽▽▽▽▽▽▽▽▽▽
كل سبيكر شارك 30 دقيقة
اتكلم فيهم عن تجربته الناجحة في مجالة أ، و يوصل للحضور إزاي يقدروا ينجحوا في نفس المجال ..بمجهودهم
يعني بإختصار حاول يحفزهم بتجربته الشخصية ويعرفهم ازاي يقدروا ينجحوا سوء فى ريادة الأعمال السوشال ميديا وغيرها فينورلهم الدنيا علشان يقدروا هما كمان ينوروها بوجودهم ونجاحهم وتأثيرهم الإيجابي :)
الحدث كان بتاريخ 22\8\2015 في مول المعلمين بالزقازيق
Document Classification using DMX in SQL Server Analysis ServicesMark Tabladillo
Presentation for SQL Saturday Raleigh NC, Septmber 18, 2010
Overview of using DMX (Data Mining Extensions) in Excel, SSMS (SQL Server Management Studio), BIDS (Business Intelligence Development Studio), and PowerShell
In this presentation we will discuss the basic of internet, the basic parts and it’s relationship with e-business.
To know more about Welingkar School’s Distance Learning Program and courses offered, visit:
http://www.welingkaronline.org/distance-learning/online-mba.html
Cloud computing has become a reality in most enterprises. And as cloud technology is more widely adopted, integration between cloud offerings and traditional on premise solutions becomes more relevant. While enterprise application integration was already a non-trivial exercise within the enterprise, the cloud provides additional challenges in terms of connectivity, performance, and security. This preentation on cloud enterprise integration will explore architectural styles for successful integrations and show how NTT DATA has productised the integration of Salesfore.com CRM
Internet enables in linking the headquarters of an organization with customers. Suppliers,business partners and remote offices. Various categories of internet network include Backbone Network, Metropolitan Area Network, Business Area Network and Campus Area Network. Building traffic on the internet is one of the most daunting challenges. It can be tackled by using the following techniques:
• Registration of search engines
• Content Partnerships
• Banner Advertising
• E-mail advertising
• Press Relations
• Web-based events
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM CloudTorsten Steinbach
Cloud is a sharing economy that reduces your spending. But does this also apply to data and analytics? Doesn't this require you to provision dedicated data warehouse systems to run analytics SQL queries on terabytes of data? With IBM Cloud, the answer is no. By using serverless analytics via IBM Cloud SQL Query, you can analyze your data directly where it sits, be it in IBM Cloud Object Storage or in your NoSQL databases. Due to the serverless nature of SQL Query, you only pay for your queries depending on the data volume that they process. There are no standing costs. You do not need to provision and wait for a data warehouse. But you can still run SQLs on terabytes of data.
Building & Operating High-Fidelity Data Streams - QCon Plus 2021Sid Anand
The world we live in today is fed by data. From self-driving cars and route planning to fraud prevention, to content and network recommendations, to ranking and bidding, our world not only consumes low-latency data streams, it adapts to changing conditions modeled by that data.
While software engineering has settled on best practices for developing and managing both stateless service architectures and database systems, the ecosystem of data infrastructure still presents a greenfield opportunity. To thrive, this field borrows from several disciplines : distributed systems, database systems, operating systems, control systems, and software engineering to name a few.
Of particular interest to me is the sub field of data streams, specifically regarding how to build high-fidelity nearline data streams as a service within a lean team. To build such systems, human operations is a non-starter. All aspects of operating streaming data pipelines must be automated. Come to this talk to learn how to build such a system soup-to-nuts.
Cloud Native Data Pipelines (in Eng & Japanese) - QCon TokyoSid Anand
Slides from "Cloud Native Data Pipelines" talk given @ QCon Tokyo 2016. The slides are in both English and Japanese. Thanks to Kiro Harada (https://jp.linkedin.com/in/haradakiro) for the translation.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
A tale of scale & speed: How the US Navy is enabling software delivery from l...sonjaschweigert1
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionAggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
2. What Do I Mean by V2?
V2 == version of this talk, not version of our architecture.
*
Version 1 of this talk
• Presented at QCon London (March 2012)
• http://www.infoq.com/presentations/Data-Infrastructure-LinkedIn
Version 2 – i.e. this talk
• Contains some overlapping content
• Introduces Espresso, our new NoSQL database
For more information on what LinkedIn Engineering is doing, feel free to
follow @LinkedInEng
@r39132 2
3. About Me
Current Life…
*
LinkedIn
Site Ops
Web (Software) Engineering
Search, Network, and Analytics (SNA)
Distributed Data Systems (DDS)
Me (working on analytics projects)
In Recent History…
Netflix, Cloud Database Architect (4.5 years)
eBay, Web Development, Research Labs, & Search Engine (4 years)
@r39132 3
5. The world’s largest professional network
Over 60% of members are outside of the United States
82%
*
161M+ *
90 Fortune 100 Companies
use LinkedIn to hire
>2M
*
Company Pages
55
17
*
32
Languages
~4.2B
17
8
2 4
Professional searches in 2011
2004 2005 2006 2007 2008 2009 2010 *as of March 31, 2012
LinkedIn Members (Millions)
@r39132 5
7. LinkedIn : Architecture
Overview
• Our site runs primarily on Java, with some use of Scala for specific
infrastructure
• What runs on Scala?
• Network Graph Service
• Kafka
• Most of our services run on Apache Traffic Server + Jetty
@r39132 7
8. LinkedIn : Architecture
A web page requests
information A and B
Presentation Tier A thin layer focused on
building the UI. It assembles
the page by making parallel
requests to the BT
A A B B C C Business Tier Encapsulates business
logic. Can call other BT
clusters and its own DAT
cluster.
A A B B C C Data Access Tier Encapsulates DAL logic
Data Infrastructure
Oracle Oracle Oracle Oracle Concerned with the
persistent storage of and
easy access to data
Memcached
@r39132 8
9. LinkedIn : Architecture
A web page requests
information A and B
Presentation Tier A thin layer focused on
building the UI. It assembles
the page by making parallel
requests to the BT
A A B B C C Business Tier Encapsulates business
logic. Can call other BT
clusters and its own DAT
cluster.
A A B B C C Data Access Tier Encapsulates DAL logic
Data Infrastructure
Oracle Oracle Oracle Oracle Concerned with the
persistent storage of and
easy access to data
Memcached
@r39132 9
12. Oracle : Overview
Oracle
• Until recently, all user-provided data was stored in Oracle – our source of truth
• Espresso is ramping up
• About 50 Schemas running on tens of physical instances
• With our user base and traffic growing at an accelerating pace, how do we scale
Oracle for user-provided data?
Scaling Reads
• Oracle Slaves
• Memcached
• Voldemort – for key-value lookups
***
Scaling Writes
• Move to more expensive hardware or replace Oracle with something better
@r39132 12
14. Voldemort : Overview
• A distributed, persistent key-value store influenced by the Amazon Dynamo paper
• Key Features of Dynamo
Highly Scalable, Available, and Performant
Achieves this via Tunable Consistency
• Strong consistency comes with a cost – i.e. lower availability and higher response times
• The user can tune this to his/her needs
Provides several self-healing mechanisms when data does become inconsistent
• Read Repair
Repairs value for a key when the key is looked up/read
• Hinted Handoff
Buffers value for a key that wasn’t successfully written, then writes it later
• Anti-Entropy Repair
Scans the entire data set on a node and fixes it
Provides means to detect node failure and a means to recover from node failure
• Failure Detection
• Bootstrapping New Nodes
@r39132 14
15. Voldemort : Overview
API Layered, Pluggable Architecture
VectorClock<V> get (K key)
Client
put (K key, VectorClock<V> value)
applyUpdate(UpdateAction action, int retries) Client API
Conflict Resolution
Voldemort-specific Features Serialization
Implements a layered, pluggable Repair Mechanism
architecture
Failure Detector
Routing
Each layer implements a common interface
(c.f. API). This allows us to replace or
remove implementations at any layer Server
Repair Mechanism
• Pluggable data storage layer Failure Detector Admin
BDB JE, Custom RO storage,
Routing
etc…
Storage Engine
• Pluggable routing supports
Single or Multi-datacenter routing
@r39132 15
16. Voldemort : Overview
Layered, Pluggable Architecture
Client
Voldemort-specific Features
Client API
Conflict Resolution
• Supports Fat client or Fat Server Serialization
Repair Mechanism
• Repair Mechanism + Failure Failure Detector
Detector + Routing can run on Routing
server or client
Server
• LinkedIn currently runs Fat Client, but we Repair Mechanism
would like to move this to a Fat Server Failure Detector Admin
Model Routing
Storage Engine
@r39132 16
18. Voldemort : Usage Patterns @ LinkedIn
2 Usage-Patterns
Read-Write Store
– A key-value alternative to Oracle
– Uses BDB JE for the storage engine
– 50% of Voldemort Stores (aka Tables) are RW
Read-only Store
– Uses a custom Read-only format
– 50% of Voldemort Stores (aka Tables) are RO
Let’s look at the RO Store
@r39132 18
19. Voldemort : RO Store Usage at LinkedIn
People You May Know
Viewers of this profile also viewed
Related Searches
Events you may be interested in
LinkedIn Skills
Jobs you may be
interested in
@r39132 19
20. Voldemort : Usage Patterns @ LinkedIn
RO Store Usage Pattern
1. We schedule a Hadoop job to build a table (called “store” in Voldemort-speak)
2. Azkaban, our Hadoop scheduler, detects Hadoop job completion and tells Voldemort to fetch the new data
and index files
3. Voldemort fetches the data and index files in parallel from HDFS. Once fetch completes, swap indexes!
4. Voldemort serves fast key-value look-ups on the site
– e.g. For key=“Sid Anand”, get all the people that “Sid Anand” may know!
– e.g. For key=“Sid Anand”, get all the jobs that “Sid Anand” may be interested in!
@r39132 20
21. How Do The Voldemort RO
Stores Perform?
@r39132 21
25. Databus : Use-Cases @ LinkedIn
Updates
Standard Search Graph Read
ization Index Index Replicas
Oracle
Data Change Events
A user updates his profile with skills and position history. He also accepts a connection
• The write is made to an Oracle Master and Databus replicates:
• the profile change to the Standardization service
E.G. the many (actually 40) forms of IBM are canonicalized for search-friendliness and
recommendation-friendliness
• the profile change to the Search Index service
Recruiters can find you immediately by new keywords
• the connection change to the Graph Index service
The user can now start receiving feed updates from his new connections immediately
@r39132 25
27. Databus : Architecture
Databus consists of 2 components
Capture Relay • Relay Service
DB Changes
Event Win • “Relays” DB changes to
subscribers as they happen in
On-line Oracle
Changes
• Uses a sharded, in-memory,
circular buffer
Bootstrap • Very fast, but has limited amount of
buffered data!
DB
• Bootstrap Service
• Serves historical data to
subscribers who have fallen behind
on the latest data from the “relay”
• Not as fast as the relay, but has
large amount of data buffered on
disk
@r39132 27
28. Databus : Architecture
Client
Capture Relay Consumer 1
Client Lib
On-line
Databus
DB Changes Changes
Event Win Consumer n
On-line
ated
Changes solid
Con eT
Sinc
Delta Client
Bootstrap
Consumer 1
Client Lib
Consistent
Databus
Snapshot at U Consumer n
DB
@r39132 28
30. LinkedIn Data Infrastructure Technologies
Espresso: Indexed Timeline-Consistent Distributed
Data Store
@r39132 30
31. Why Do We Need Yet Another
NoSQL DB?
@r39132 31
32. Espresso: Overview
What is Oracle Missing?
• (Infinite) Incremental Scalability
• Adding 50% more resources gives us 50% more scalability (e.g. storage and serving
capacity, etc…). In effect, we do not want to see diminishing returns
• Always Available
• Users of the system do not perceive any service interruptions
• Adding capacity or doing any other maintenance does not incur downtime
• Node failures do not cause downtime
• Operational Flexibility and Cost Control
• No recurring license fees
• Runs on commodity hardware
• Simplified Development Model
• E.g. Adding a column without running DDL ALTER TABLE commands
@r39132 32
33. Espresso: Overview
What Features Should We Retain from Oracle?
• Limited Transactions : Constrained Strong Consistency
• We need consistency within an entity (more on this later)
• Ability to Query by Fields other than the Primary Key
• a.k.a. non-PK columns in RDBMS
• Must Feed a Change Data Capture system
• i.e. can act as a Databus Source
• Recall that a large ecosystem of analytic services are fed by the Oracle-Databus pipe. We
need to continue to feed that ecosystem
Guiding Principles when replacing a system (e.g. Oracle)
• Strive for Usage Parity, not Feature Parity
• In other words, first look at how you use Oracle and look for those features in candidate
systems. Do not look for general feature parity between systems.
• Don’t shoot for a full replacement
• Buy yourself headroom by migrating the top K use-cases by load off Oracle. Leave the
others.
@r39132 33
35. Espresso: API Example
Consider the User-to-User Communication Case at LinkedIn
• Users can view
• Inbox, sent folder, archived folder, etc….
• On social networks such as LinkedIn and Facebook, user-to-user messaging consumes substantial database
resources in terms of storage and CPU (e.g. activity)
• At LinkedIn 40% of Host DB CPU estimated to serve U2U Comm
• Footnote : Facebook migrated their messaging use-case to HBase
• We’re moving this off Oracle to Espresso
@r39132 35
36. Espresso: API Example
Database and Tables
• Imagine that you have a Mailbox Database containing tables needed for the U2U Communications case
• The Mailbox Database contains the following 3 tables:
• Message_Metadata – captures subject text
• Primary Key = MemberId & MsgId
• Message_Details – captures full message content
• Primary Key = MemberId & MsgId
• Mailbox_Aggregates – captures counts of read/unread & total
• Primary Key = MemberId
Example Read Request
Espresso supports REST semantics.
To get unread and total email count for “bob”, issue
a request of the form:
• GET /<database_name>/<table_name>/
<resource_id>
• GET /Mailbox/Mailbox_Aggregates/bob
@r39132 36
37. Espresso: API Example
Collection Resources vs. Singleton Resources
A resource identified by a resource_id may be either a singleton or a collection
Examples (Read)
For singletons, the URI refers to an individual resource.
– GET /<database_name>/<table_name>/<resource_id>
– E.g.: Mailbox_Aggregates table
To get unread and total email count for “bob”, issue a request of the form:
– GET /Mailbox/Mailbox_Aggregates/bob
For collections, a secondary path element defines individual resources within the collection
• GET /<database_name>/<table_name>/<resource_id>/<subresource_id>
• E.g.: Message_Metadata & Message_Details tables
• To get all of “bob’s” mail metadata, specify the URI up to the <resource_id>
• GET /Mailbox/Message_Metadata/bob
• To display one of “bob’s” messages in its entirety, specify the URI up to the
<subresource_id>
• GET /Mailbox/Message_Details/bob/4
@r39132 37
39. Espresso: Architecture
• Components
• Request Routing Tier
• Stateless
• Accepts HTTP request
• Consults Cluster Manager to
determine master for partition
• Forwards request to appropriate
storage node
• Storage Tier
• Data Store (e.g. MySQL)
• Data is semi-structured (e.g. Avro)
• Local Secondary Index (Lucene)
• Cluster Manager
• Responsible for data set
partitioning
• Notifies Routing Tier of partition
locations for a data set
• Monitors health and executes
repairs on an unhealthy cluster
• Relay Tier
• Replicates changes in commit
order to subscribers (uses
Databus)
@r39132 39
41. Espresso: Next Steps
Key Road Map Items
• Internal Rollout for more use-cases within LinkedIn
• Development
• Active-Active across data centers
• Global Search Indexes : “<resource_id>?query=…” for singleton resources as well
• More Fault Tolerance measures in the Cluster Manager (Helix)
• Open Source Helix (Cluster Manager) in Summer of 2012
• Look out of an article in High Scalability
• Open Source Espresso in the beginning of 2013
@r39132 41
42. Acknowledgments
Presentation & Content
Tom Quiggle (Espresso) @TomQuiggle
Kapil Surlaker (Espresso) @kapilsurlaker
Shirshanka Das (Espresso) @shirshanka
Chavdar Botev (Databus) @cbotev
Roshan Sumbaly (Voldemort) @rsumbaly
Neha Narkhede (Kafka) @nehanarkhede
A Very Talented Development Team
Aditya Auradkar, Chavdar Botev, Antony Curtis, Vinoth Chandar, Shirshanka Das, Dave
DeMaagd, Alex Feinberg, Phanindra Ganti, Mihir Gandhi, Lei Gao, Bhaskar Ghosh, Kishore
Gopalakrishna, Brendan Harris, Todd Hendricks, Swaroop Jagadish, Joel Koshy, Brian
Kozumplik, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha
Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian,
Oliver Seeliger, Rupa Shanbag, Adam Silberstein, Boris Shkolnik, Chinmay Soman, Subbu
Subramaniam, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Cuong Tran, Balaji
Varadarajan, Jemiah Westerman, Zhongjie Wu, Zach White, Yang Ye, Mammad Zadeh,
David Zhang, and Jason Zhang
@r39132 42