LinkedIn uses DataBus to replicate data changes from Oracle in real time. DataBus consists of relay and bootstrap services that capture change events from Oracle and distribute them to downstream systems such as search indexes, graph engines, and read replicas, keeping them continuously up to date. This lets users see profile updates or new connections in search results and feeds immediately.
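The relay/bootstrap pattern described above can be sketched as a toy simulation. This is not LinkedIn's actual DataBus API; the class names, the SCN checkpoint scheme, and the payloads are all invented for illustration:

```python
# Toy sketch of change-data-capture: a relay buffers change events from the
# primary store in commit order, and each downstream consumer (search index,
# read replica, ...) replays the log from its own checkpoint, so derived
# views converge on the source of truth.

class Relay:
    """Buffers change events from the primary store in commit order."""
    def __init__(self):
        self.log = []          # list of (scn, key, value)

    def capture(self, scn, key, value):
        self.log.append((scn, key, value))

    def events_since(self, checkpoint):
        return [e for e in self.log if e[0] > checkpoint]

class Consumer:
    """A downstream view that replays the log from its last checkpoint."""
    def __init__(self, relay):
        self.relay = relay
        self.checkpoint = 0    # bootstrap case: start from the beginning
        self.view = {}

    def poll(self):
        for scn, key, value in self.relay.events_since(self.checkpoint):
            self.view[key] = value
            self.checkpoint = scn

relay = Relay()
relay.capture(1, "member:42", {"headline": "Engineer"})
relay.capture(2, "member:42", {"headline": "Staff Engineer"})

search_index = Consumer(relay)
search_index.poll()
print(search_index.view["member:42"]["headline"])  # Staff Engineer
```

A second consumer with its own checkpoint could be attached to the same relay, which is how one change stream fans out to many derived systems.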
Meraki vs. Viptela: Which Cisco SD-WAN Solution Is Right for You? (Insight)
More organizations are making the shift to Software-Defined Wide Area Networks (SD-WANs), but it can be difficult to decide which solutions are best suited to your business. In this infographic, explore two of the most well-established and widely deployed enterprise-grade Cisco SD-WAN solutions: Meraki MX and Viptela.
Integrating Apache Kafka and Elastic Using the Connect Framework (Confluent)
As a streaming platform, Apache Kafka provides low-latency, high-throughput, fault-tolerant publish and subscribe pipelines and excels at processing streams of real-time events. Kafka provides reliable, millisecond delivery for connecting downstream systems with real-time data.
In this talk, we will show how easy it is to leverage Kafka and the Elasticsearch connector to keep your indices populated with the latest data from the rest of your enterprise, as it changes.
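As a sketch of what the talk describes, a sink connector can be registered through the Kafka Connect REST API. The connector class and the `/connectors` endpoint below are the standard Confluent Elasticsearch sink ones, but the host names, topic, and connector name are placeholders:

```shell
# Register a hypothetical Elasticsearch sink; adjust hosts/topic for your setup.
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "es-sink",
    "config": {
      "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
      "topics": "profile-updates",
      "connection.url": "http://localhost:9200",
      "key.ignore": "true",
      "schema.ignore": "true"
    }
  }'
```

Once registered, the connector continuously drains the topic into an index of the same name, which is the "keep your indices populated as data changes" behavior the abstract promises.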
Combine Spring Data Neo4j and Spring Boot to quickly… (Neo4j)
Speakers: Michael Hunger (Neo Technology) and Josh Long (Pivotal)
Spring Data Neo4j 3.0 is here and it supports Neo4j 2.0. Neo4j is a tiny graph database with a big punch. Graph databases are eminently suited to asking interesting questions and doing analysis. Want to load the Facebook friend graph? Build a recommendation engine? Neo4j's just the ticket. Join Spring Data Neo4j lead Michael Hunger (@mesirii) and Spring Developer Advocate Josh Long (@starbuxman) for a look at how to build smart, graph-driven applications with Spring Data Neo4j and Spring Boot.
Microsoft Azure is an ever-expanding set of cloud services to help your organization meet your business challenges. It’s the freedom to build, manage, and deploy applications on a massive, global network using your favorite tools and frameworks.
Productive
Reduce time to market, by delivering features faster with over 100 end-to-end services.
Hybrid
Develop and deploy where you want, with the only consistent hybrid cloud on the market. Extend Azure on-premises with Azure Stack.
Intelligent
Create intelligent apps using powerful data and artificial intelligence services.
Trusted
Join the startups, governments, and 90 percent of Fortune 500 businesses that run on the Microsoft Cloud today.
Installation of Grafana on Linux and connecting it to a Prometheus data source; installation of Prometheus, node_exporter, and a Tomcat exporter; installation and configuration of Alertmanager. A detailed, step-by-step walkthrough of the installation and how it works.
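A minimal configuration sketch for the Prometheus side of the setup described above, assuming the default ports (9090 for Prometheus, 9100 for node_exporter, 9093 for Alertmanager); adjust the targets for your own hosts:

```yaml
# prometheus.yml: scrape Prometheus itself plus a node_exporter,
# and point alert delivery at a local Alertmanager.
global:
  scrape_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
```

Grafana then only needs Prometheus (`http://localhost:9090`) added as a data source to chart what these jobs scrape.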
In this presentation, we will assess the on-premises environment, determine which workloads and databases are ready to make the move, and see what you can do to improve their Azure readiness while reducing downtime during the migration. Planning and assessment play a critical role in moving to the cloud. We will look at a wide range of resources and tools for completing an assessment with ease while identifying workload dependencies, with practical tips and tricks focused on sizing and costs. Finally, we'll assess the SQL instances and identify their readiness for Azure as well.
Amazon Virtual Private Cloud (VPC): Networking Fundamentals and Connectivity… (Amazon Web Services)
In this session, we will walk through the fundamentals of Amazon Virtual Private Cloud (VPC). We will discuss core VPC concepts including picking your IP space, subnetting, routing, security, NAT and VPC Endpoints.
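The "picking your IP space" and subnetting steps can be illustrated with Python's standard `ipaddress` module. The CIDR block and Availability Zone names below are illustrative choices, not AWS recommendations:

```python
# Carve a VPC CIDR into equal subnets, one per Availability Zone.
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
azs = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]

# /16 -> four /18s: prefixlen_diff=2 lengthens the prefix by two bits,
# yielding 2**2 = 4 subnets.
subnets = list(vpc.subnets(prefixlen_diff=2))
plan = dict(zip(azs, subnets))

for az, net in plan.items():
    print(az, net, f"{net.num_addresses} addresses")
```

In a real VPC you would further split each AZ's range into public and private subnets and attach route tables; the arithmetic stays the same.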
Microsoft Azure Platform-as-a-Service (PaaS) (Chris Dufour)
Azure is Microsoft’s cloud computing platform made up of a growing collection of integrated services: compute, storage, data, networking and apps.
Azure is the only major cloud platform ranked by Gartner as an industry leader for both Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS). This powerful combination of managed and unmanaged services lets you build, deploy and manage applications in any way you like for unmatched productivity.
In this talk we will take a look at Microsoft’s cloud strategy and see how you can leverage PaaS in your environment.
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm (NATS)
NATS is a high-performance messaging system optimized for simplicity, reliability and low latency, and it can serve as a lightweight solution for the internal communication of your distributed system. In this talk, we will cover its core feature set as well as how to develop and assemble NATS-based microservices using the latest Docker tooling such as Compose and Swarm mode.
You can learn more about NATS at http://www.nats.io
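NATS routes messages by hierarchical, dot-separated subjects, where `*` matches exactly one token and `>` matches all remaining tokens. A minimal Python sketch of that matching rule (a toy implementation for illustration, not the NATS client library):

```python
def subject_matches(pattern, subject):
    """NATS-style subject matching: '*' matches exactly one token,
    '>' matches one or more trailing tokens (and must be last)."""
    p, s = pattern.split("."), subject.split(".")
    for i, tok in enumerate(p):
        if tok == ">":
            return i < len(s)   # '>' must consume at least one token
        if i >= len(s):
            return False
        if tok != "*" and tok != s[i]:
            return False
    return len(p) == len(s)

print(subject_matches("orders.*.created", "orders.eu.created"))  # True
print(subject_matches("orders.>", "orders.eu.created.v2"))       # True
print(subject_matches("orders.*", "orders.eu.created"))          # False
```

This subject algebra is what lets loosely coupled services subscribe to broad or narrow slices of the message flow without knowing about each other.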
CookieRun: Kingdom — Sharing a Large-Scale Infrastructure and Server Operations Case Study [Devsisters, Level 200]. Presenter: Chanho Yong, R&D Engineer, Devsisters… (Amazon Web Services Korea)
As the number of CookieRun: Kingdom players surged, we found ourselves operating a larger infrastructure environment than we had ever experienced, and we ran into a variety of problems along the way. This session first explains, from an architecture perspective, how to operate stateful game servers on AWS, and then covers, from a scalability perspective, the challenges we had to solve to handle millions of users.
AWS-powered services for analytics can handle the scale, agility, and flexibility required to combine different types of data and analytics approaches that will allow you to transform your data into a valuable corporate asset. In this session, AWS will provide an overview of the different AWS services available for your data analytics needs. You can combine these blocks to build data flows that will extend your organization’s agility, ability to derive more insights and value from its data, and capability to adopt more sophisticated analytics tools and processes as your needs evolve. In the second part of the session, Paddy Power Betfair’s Data team will discuss the adoption and large scale operation of a broad range of AWS services that make up PPB’s scalable, mixed workload, multi-brand data platform. The data capabilities developed by PPB and powered by AWS were implemented to enable low-latency, high-volume and near real-time advanced analytics use cases, in the highly regulated and fast-paced betting industry. This was only possible through a focus on automation, innovation and continuous improvement.
Building Streaming Data Pipelines with Google Cloud Dataflow and Confluent Cloud (Hosted by Confluent)
We will demonstrate how easy it is to use Confluent Cloud as the data source of your Beam pipelines. You will learn how to process the information that comes from Confluent Cloud in real time, make transformations on such information and feed it back to your Kafka topics and other parts of your architecture.
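The transform-and-feed-back flow described above can be sketched with plain Python generators. The Kafka/Confluent Cloud I/O and the Beam pipeline wiring are omitted; the field names and the exchange rate are invented for illustration:

```python
# Mirror the shape of the pipeline: parse each incoming record, enrich it,
# and emit results destined for another topic.
import json

def parse(records):
    """Deserialize raw JSON strings into event dicts."""
    for raw in records:
        yield json.loads(raw)

def enrich(events):
    """Add a derived field to each event (illustrative fixed rate)."""
    for e in events:
        e["amount_usd"] = round(e["amount_eur"] * 1.08, 2)
        yield e

raw_records = ['{"user": "a", "amount_eur": 10.0}',
               '{"user": "b", "amount_eur": 5.5}']

out = list(enrich(parse(raw_records)))
print(out[0]["amount_usd"])  # 10.8
```

In an actual Beam pipeline these stages would become `beam.Map`/`beam.ParDo` transforms between a Kafka source and sink, but the per-record logic is the same.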
MongoDB Atlas makes it easy to set up, operate, and scale your MongoDB deployments in the cloud. From high availability to scalability, security to disaster recovery - we've got you covered.
Automated: With MongoDB Atlas, you no longer need to worry about operational tasks such as provisioning, configuration, patching, upgrades, backups, and failure recovery. MongoDB Atlas provides the functionality and reliability you need, at the click of a button.
Flexible: Only MongoDB Atlas combines the critical capabilities of relational databases with the innovations of NoSQL. Radically simplify development and operations by delivering a diverse range of capabilities in a single, managed database platform.
Secure: MongoDB Atlas provides multiple levels of security for your database. These include robust access control, network isolation using Amazon VPC, IP whitelists, encryption of data in-flight using TLS/SSL, and optional encryption of the underlying filesystem.
Scalable: MongoDB Atlas grows with you, all with the click of a button. You can scale up across a range of instance sizes, and scale-out with automatic sharding. And you can do it with zero application downtime.
Highly Available: MongoDB Atlas is designed to offer exceptional uptime. Recovery from instance failures is transparent and fully automated. A minimum of three copies of your data are replicated across availability zones and continuously backed up.
High Performance: MongoDB Atlas provides high throughput and low latency for the most demanding workloads. Consistent, predictable performance eliminates the need for separate caching tiers, and delivers a far better price-performance ratio compared to traditional database software.
Slides from QConSF Nov 19th, 2011 focusing this time on describing the globally distributed and scaled industrial strength Java Platform as a Service that Netflix has built and run on top of AWS and Cassandra. Parts of that platform are being released as open source - Curator, Priam and Astyanax.
OpenStack Rally: Benchmark as a Service. OpenStack Meetup India. Ananth/Rahul (Rahul Krishna Upadhyaya)
Slide deck used for the presentation at the OpenStack India Meetup on 1 March 2014 at NetApp, Bangalore. The slides cover the installation and use of Rally and its scope for benchmarking and measuring performance, with a little on how to install Cisco OpenStack as an all-in-one setup.
LinkedIn Data Infrastructure Slides (Version 2) (Sid Anand)
Learn about Espresso, Databus, and Voldemort. LinkedIn Data Infrastructure Slides (Version 2). This talk was given in NYC on June 20, 2012
You can download the slides as PPT in order to see the transitions here :
http://bit.ly/LfH6Ru
Document Classification using DMX in SQL Server Analysis Services (Mark Tabladillo)
Presentation for SQL Saturday Raleigh, NC, September 18, 2010
Overview of using DMX (Data Mining Extensions) in Excel, SSMS (SQL Server Management Studio), BIDS (Business Intelligence Development Studio), and PowerShell
IBM THINK 2019 - A Sharing Economy for Analytics: SQL Query in IBM Cloud (Torsten Steinbach)
Cloud is a sharing economy that reduces your spending. But does this also apply to data and analytics? Doesn't this require you to provision dedicated data warehouse systems to run analytics SQL queries on terabytes of data? With IBM Cloud, the answer is no. By using serverless analytics via IBM Cloud SQL Query, you can analyze your data directly where it sits, be it in IBM Cloud Object Storage or in your NoSQL databases. Due to the serverless nature of SQL Query, you only pay for your queries depending on the data volume that they process. There are no standing costs. You do not need to provision and wait for a data warehouse. But you can still run SQLs on terabytes of data.
The Perfect Storm: The Impact of Analytics, Big Data and Analytics (Inside Analysis)
The Briefing Room with Barry Devlin and NuoDB
Live Webcast on Oct. 23, 2012
Three major factors in enterprise computing are combining to rewrite how data is stored, accessed and managed: 1) the demand of analytics that now spreads across hundreds, even thousands of users; 2) the pervasiveness of Big Data in all its forms and sizes; and 3) the rise of the commodity data center, aka Cloud computing. The convergence of these forces calls for a new data foundation, one that can handle the scalability and workload issues that face today's information managers.
Check out this episode of The Briefing Room to learn from veteran Analyst Barry Devlin, one of the very first architects of data warehousing, who will explain how today's information architectures require a radically different approach. He'll be briefed by Barry Morris, Founder and CEO of NuoDB, who will tout his company's product, described as a peer-to-peer messaging system that acts as a database. It behaves just like a traditional relational database, but was designed with a completely distributed and scalable architecture.
http://www.insideanalysis.com
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT. Technology includes SQL Server 2012 SP1, Office 2013, Windows 8 Professional.
What if all members of your software development team from Project Managers, Business Analysts, Testing and documentation members could create and modify web applications and web services? With traditional SQL solutions this was difficult because of the need to convert web pages to objects, objects to tables as well as the reverse functions. But now with native XML databases and drag-and-drop forms builders, data can flow from the XML model of a web form to the database and back again without translation. This radically simpler process combined with standardized query languages makes it easier for non-programmers to build and maintain their own applications and web services.
Transitioning a Full Enterprise to Cloud in 10 Months - Cloud Expo (sjdeluca)
We embarked on deploying a new enterprise infrastructure as part of a divestiture. Enterprise systems were stood up to execute the business, leveraging Cloud technologies to the fullest extent possible. These artifacts discuss the landscape and lessons learned from this experience.
Similar to LinkedIn Data Infrastructure (QCon London 2012):
Building & Operating High-Fidelity Data Streams - QCon Plus 2021 (Sid Anand)
The world we live in today is fed by data. From self-driving cars and route planning to fraud prevention, to content and network recommendations, to ranking and bidding, our world not only consumes low-latency data streams, it adapts to changing conditions modeled by that data.
While software engineering has settled on best practices for developing and managing both stateless service architectures and database systems, the ecosystem of data infrastructure still presents a greenfield opportunity. To thrive, this field borrows from several disciplines: distributed systems, database systems, operating systems, control systems, and software engineering, to name a few.
Of particular interest to me is the subfield of data streams, specifically how to build high-fidelity nearline data streams as a service within a lean team. To build such systems, manual operations is a non-starter: all aspects of operating streaming data pipelines must be automated. Come to this talk to learn how to build such a system soup-to-nuts.
Cloud Native Data Pipelines (in Eng & Japanese) - QCon Tokyo (Sid Anand)
Slides from "Cloud Native Data Pipelines" talk given @ QCon Tokyo 2016. The slides are in both English and Japanese. Thanks to Kiro Harada (https://jp.linkedin.com/in/haradakiro) for the translation.
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimizing testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
This keynote covers the key trends across hardware, cloud and open source, exploring how these areas are likely to mature and develop over the short and long term, and then considering how organisations can position themselves to adapt and thrive.
Kubernetes & AI - Beauty and the Beast!? (KCD Istanbul 2024) (Tobias Schneck)
As AI pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premises strategy we may need to apply it to our own infrastructure and make it work from an enterprise perspective. I will give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I already have working for real.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deployment Firewall and DBOM (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Generating a custom Ruby SDK for your web service or Rails API using Smithy (g2nightmarescribd)
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova… (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes work: it requires vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
2. About Me
Current Life…
• LinkedIn
– Web / Software Engineering
– Search, Network, and Analytics (SNA)
– Distributed Data Systems (DDS)
In a Previous Life…
• Netflix: Cloud Database Architect
• eBay: Web Development, Research Lab, & Search Engine
And Many Years Prior…
• Studying Distributed Systems at Cornell University
@r39132 2
3. Our mission
Connect the world’s professionals to make
them more productive and successful
4. The world’s largest professional network
• 150M+ members *
• Over 60% of members are now international
• 75% of Fortune 100 companies use LinkedIn to hire **
• >2M Company Pages ***
• 16 languages
• ~4.2B professional searches in 2011 ***
[Chart: LinkedIn Members (Millions), 2004–2010: 2, 4, 8, 17, 32, 55, 90]
* as of February 9, 2012   ** as of September 30, 2011   *** as of December 31, 2011
5. Other Company Facts
• Headquartered in Mountain View, Calif., with offices around the world!
• As of December 31, 2011, LinkedIn has 2,116 full-time employees located around the world
• Currently around 650 people work in Engineering
  • 400 in Web/Software Engineering
    • Plan to add another 200 in 2012 ***
  • 250 in Operations
*** as of December 31, 2011
6. Agenda
Company Overview
Architecture
– Data Infrastructure Overview
– Technology Spotlight
Oracle
Voldemort
DataBus
Kafka
Q & A
7. LinkedIn : Architecture
Overview
• Our site runs primarily on Java, with some use of Scala for specific
infrastructure
• What runs on Scala?
• Network Graph Service
• Kafka
• Most of our services run on Apache + Jetty
8. LinkedIn : Architecture
A web page requests information A and B:
• Presentation Tier: a thin layer focused on building the UI; it assembles the page by making parallel requests to BST services
• Business Service Tier (BST): encapsulates business logic; can call other BST clusters and its own DST cluster
• Data Service Tier (DST): encapsulates DAL logic; each service is concerned with one Oracle schema
• Data Infrastructure: concerned with the persistent storage of, and easy access to, data (Oracle in master/slave and master/master configurations, Memcached, Voldemort)
10. LinkedIn : Data Infrastructure Technologies
• Database Technologies
  • Oracle
  • Voldemort
  • Espresso
• Data Replication Technologies
  • Kafka
  • DataBus
• Search Technologies
  • Zoie – real-time search and indexing with Lucene
  • Bobo – faceted search library for Lucene
  • SenseiDB – fast, real-time, faceted, KV and full-text search engine, and more
11. LinkedIn : Data Infrastructure Technologies
This talk will focus on a few of the key technologies below!
• Database Technologies
  • Oracle
  • Voldemort
  • Espresso – a new K-V store under development
• Data Replication Technologies
  • Kafka
  • DataBus
• Search Technologies
  • Zoie – real-time search and indexing with Lucene
  • Bobo – faceted search library for Lucene
  • SenseiDB – fast, real-time, faceted, KV and full-text search engine, and more
13. Oracle : Overview
• All user-provided data is stored in Oracle – our current source of truth
• About 50 schemas running on tens of physical instances
• With our user base and traffic growing at an accelerating pace, how do we scale Oracle for user-provided data?
Scaling Reads
• Oracle slaves (c.f. DSC)
• Memcached
• Voldemort – for key-value lookups
Scaling Writes
• Move to more expensive hardware or replace Oracle with something else
14. Oracle : Overview – Data Service Context
Scaling Oracle reads using DSC:
• DSC uses a token (e.g. a cookie) to ensure that a reader always sees his or her own writes immediately
  – If I update my own status, it is okay if you don’t see the change for a few minutes, but I have to see it immediately
15. Oracle : Overview – How DSC Works
• When a user writes data to the master, the DSC token (for that data domain) is updated with a timestamp
• When the user reads data, we first attempt to read from a replica (a.k.a. slave) database
• If the data in the replica is older than the timestamp in the DSC token, we read from the master instead
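The read path above can be sketched as follows. This is a toy model, not LinkedIn's implementation: the token store, the database dictionaries, and the function names are all hypothetical stand-ins.

```python
import time

# Toy stand-ins for the DSC token store and the databases (all hypothetical).
dsc_tokens = {}          # (user, data domain) -> timestamp of the user's last write
master = {}              # the Oracle master
replica = {}             # an Oracle slave, which may lag behind the master
replica_fresh_as_of = 0  # timestamp up to which the replica is known to be current

def write(user, domain, key, value):
    """On a write, update the master and stamp the user's DSC token."""
    master[key] = value
    dsc_tokens[(user, domain)] = time.time()

def read(user, domain, key):
    """Serve from the replica unless it predates the user's own last write."""
    last_write = dsc_tokens.get((user, domain), 0)
    if replica_fresh_as_of >= last_write:
        return replica.get(key)  # replica is fresh enough for this reader
    return master.get(key)       # otherwise fall back to the master
```

Other readers, who have no recent writes in that data domain, are served (possibly stale) replica data; the writer always sees his own update.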
17. Voldemort : Overview
• A distributed, persistent key-value store influenced by the AWS Dynamo paper
• Key features of Dynamo:
  • Highly scalable, available, and performant
    – Achieves this via tunable consistency: for higher consistency, the user accepts lower availability, scalability, and performance, and vice-versa
  • Provides several self-healing mechanisms for when data does become inconsistent:
    – Read Repair: repairs the value for a key when the key is looked up/read
    – Hinted Handoff: buffers the value for a key that wasn’t successfully written, then writes it later
    – Anti-Entropy Repair: scans the entire data set on a node and fixes it
  • Provides a means to detect node failure and a means to recover from it:
    – Failure Detection
    – Bootstrapping New Nodes
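Tunable consistency can be made concrete with the usual Dynamo quorum arithmetic: with N replicas, W write acknowledgements, and R read responses, choosing R + W > N forces every read quorum to overlap the most recent write quorum. A minimal sketch, using integer versions instead of Voldemort's vector clocks; this is not the real implementation.

```python
class QuorumStore:
    """Toy Dynamo-style store: N replicas, W write acks, R read responses."""

    def __init__(self, n=3, r=2, w=2):
        assert r + w > n, "R + W must exceed N so read and write quorums overlap"
        self.n, self.r, self.w = n, r, w
        self.replicas = [{} for _ in range(n)]  # each replica: key -> (version, value)

    def put(self, key, value, version):
        # A write succeeds once W replicas acknowledge (here: simply the first W).
        for rep in self.replicas[: self.w]:
            rep[key] = (version, value)

    def get(self, key):
        # Read R replicas and return the highest-versioned value seen.
        # Because R + W > N, at least one of them holds the latest write.
        responses = [rep.get(key) for rep in self.replicas[-self.r :]]
        responses = [x for x in responses if x is not None]
        return max(responses)[1] if responses else None
```

Lowering R or W trades consistency away for latency and availability, which is the "tunable" knob the slide describes.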
18. Voldemort : Overview
API
  VectorClock<V> get (K key)
  put (K key, VectorClock<V> value)
  applyUpdate(UpdateAction action, int retries)
Layered, Pluggable Architecture
• Client-side layers: Client API, Conflict Resolution, Serialization, Repair Mechanism, Failure Detector, Routing
• Server-side layers: Repair Mechanism, Failure Detector, Routing, Storage Engine (plus Admin)
• Each layer implements a common interface (c.f. API); this allows us to replace or remove implementations at any layer
• Pluggable data storage layer: BDB JE, custom RO storage, etc.
• Pluggable routing supports single- or multi-datacenter routing
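The `VectorClock<V>` values in the API above are what make the conflict-resolution layer possible: each write carries a clock, and two clocks that don't descend from one another signal concurrent writes that must be reconciled. A minimal illustrative sketch, not Voldemort's actual class:

```python
class VectorClock:
    """Toy vector clock: one counter per node that has coordinated a write."""

    def __init__(self, versions=None):
        self.versions = dict(versions or {})  # node id -> event counter

    def increment(self, node):
        # Return a new clock with this node's counter advanced by one.
        clock = VectorClock(self.versions)
        clock.versions[node] = clock.versions.get(node, 0) + 1
        return clock

    def descends(self, other):
        # True if this clock has seen every event the other clock has seen.
        return all(self.versions.get(n, 0) >= c for n, c in other.versions.items())

    def concurrent(self, other):
        # Neither clock descends from the other: a write-write conflict
        # that the conflict-resolution layer must reconcile (e.g. on read).
        return not self.descends(other) and not other.descends(self)
```

On a read, Voldemort can return multiple concurrent versions and let the client's conflict-resolution layer pick or merge a winner.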
19. Voldemort : Overview
Layered, Pluggable Architecture – Voldemort-specific features
• Supports a fat client or a fat server: the Repair Mechanism, Failure Detector, and Routing layers can run on either the client or the server
• LinkedIn currently runs the fat client, but we would like to move to a fat-server model
21. Voldemort : Usage Patterns @ LinkedIn
Two usage patterns:
• Read-Write Store
  – Uses BDB JE for the storage engine
  – 50% of Voldemort stores (a.k.a. tables) are RW
• Read-Only Store
  – Uses a custom read-only format
  – 50% of Voldemort stores (a.k.a. tables) are RO
Let’s look at the RO store.
22. Voldemort : RO Store Usage at LinkedIn
• People You May Know
• Viewers of this profile also viewed
• Related Searches
• Events you may be interested in
• LinkedIn Skills
• Jobs you may be interested in
23. Voldemort : Usage Patterns @ LinkedIn
RO Store Usage Pattern
1. Use Hadoop to build a model
2. Voldemort loads the output of Hadoop
3. Voldemort serves fast key-value look-ups on the site
– e.g. For key=“Sid Anand”, get all the people that “Sid Anand” may know!
– e.g. For key=“Sid Anand”, get all the jobs that “Sid Anand” may be interested in!
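The three steps can be sketched end-to-end. Everything here is illustrative: the model data is made up, and a real RO store is an on-disk, index-backed format built by Hadoop and swapped in atomically, not a Python dict.

```python
# Step 1: a stand-in for the Hadoop job that builds the model offline.
def build_model():
    return {
        "Sid Anand": ["Jay Kreps", "Neha Narkhede"],  # made-up recommendations
        "Jay Kreps": ["Sid Anand"],
    }

# Steps 2 and 3: bulk-load the job's output, then serve fast key-value look-ups.
class ReadOnlyStore:
    def __init__(self, model_output):
        self._data = dict(model_output)  # loaded once; never mutated online

    def get(self, key):
        return self._data.get(key, [])

    def swap(self, new_model_output):
        # New model builds replace the data set wholesale rather than
        # updating it in place -- the essence of the RO store pattern.
        self._data = dict(new_model_output)
```

For key "Sid Anand", `get` returns the precomputed "people Sid Anand may know" list; a nightly Hadoop run would call `swap` with fresh output.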
24. How Do The Voldemort RO Stores Perform?
28. DataBus : Use-Cases @ LinkedIn
[Diagram: Oracle emits data change events that DataBus delivers to the Standardization, Search Index, Graph Index, and Read Replica services]
A user updates his profile with skills and position history. He also accepts a connection.
• The write is made to an Oracle master and DataBus replicates:
  • the profile change to the Standardization service
    – e.g. the many forms of “IBM” are canonicalized for search-friendliness and recommendation-friendliness
  • the profile change to the Search Index service
    – recruiters can find you immediately by new keywords
  • the connection change to the Graph Index service
    – the user can now start receiving feed updates from his new connections immediately
29. DataBus : Architecture
DataBus consists of 2 services:
• Relay Service
  – Captures DB changes into an event window
  – Sharded; maintains an in-memory buffer per shard
  – Each shard polls Oracle and then deserializes transactions into Avro
• Bootstrap Service
  – Picks up on-line changes as they appear in the Relay
  – Supports 2 types of operations from clients:
    – If a client falls behind and needs records older than what the relay has, Bootstrap can send consolidated deltas
    – If a new client comes on-line, Bootstrap can send a consistent snapshot
30. DataBus : Architecture
[Diagram: the DB’s captured changes flow into the Relay’s event window; consumers receive on-line changes through the Databus client library; consumers that fall behind receive a consolidated delta since time T, or a consistent snapshot at time U, from the Bootstrap service]
Guarantees:
• Transactional semantics
• In-commit-order delivery
• At-least-once delivery
• Durability (by data source)
• High availability and reliability
• Low latency
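The relay/bootstrap split can be sketched from a consumer's point of view: the relay only buffers a window of recent events, so a consumer whose last applied sequence number (SCN) has aged out of that window must first catch up from the bootstrap service. All class and method names here are illustrative, not the DataBus API.

```python
class Relay:
    """Buffers only a window of recent (scn, event) pairs, in commit order."""

    def __init__(self, window=100):
        self.events = []
        self.window = window

    def append(self, scn, event):
        self.events.append((scn, event))
        self.events = self.events[-self.window :]  # old events age out

    def oldest_scn(self):
        return self.events[0][0] if self.events else None

    def events_since(self, scn):
        return [(s, e) for s, e in self.events if s > scn]

class Bootstrap:
    """Serves consolidated deltas for arbitrarily old SCNs."""

    def __init__(self):
        self.log = []

    def append(self, scn, event):
        self.log.append((scn, event))

    def delta_since(self, scn):
        return [(s, e) for s, e in self.log if s > scn]

def consume(consumer_scn, relay, bootstrap):
    """Pull everything after consumer_scn, catching up via bootstrap if needed."""
    events = []
    oldest = relay.oldest_scn()
    if oldest is not None and consumer_scn < oldest - 1:
        # Fell behind the relay's window: fetch a consolidated delta first.
        events += bootstrap.delta_since(consumer_scn)
        consumer_scn = events[-1][0] if events else consumer_scn
    events += relay.events_since(consumer_scn)
    return events
```

A brand-new consumer (SCN 0) would take the same bootstrap path, receiving a consistent snapshot instead of a delta in the real system.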
31. DataBus : Architecture – Bootstrap
Generate consistent snapshots and consolidated deltas during continuous updates with long-running queries.
[Diagram: the Relay’s on-line changes feed the Bootstrap server’s Log Writer, which appends recent events to Log Storage; a Log Applier replays those events into Snapshot Storage; clients read consolidated deltas from Log Storage and consistent snapshots from Snapshot Storage via the Databus client library]
33. Kafka : Usage at LinkedIn
Whereas DataBus is used for database change capture and replication, Kafka is used for application-level data streams.
Examples:
• End-user Action Tracking (a.k.a. Web Tracking) of
• Emails opened
• Pages seen
• Links followed
• Executing Searches
• Operational Metrics
• Network & System metrics such as
• TCP metrics (connection resets, message resends, etc…)
• System metrics (iops, CPU, load average, etc…)
34. Kafka : Overview
[Diagram: the WebTier pushes events into the Broker Tier (Kafka; topics 1..N; sequential writes; ~100 MB/sec in), and consumers pull events through client-library iterators (sendfile; ~200 MB/sec out); Zookeeper manages the (topic, partition) → offset state and consumer ownership]
• Features: Pub/Sub; batch send/receive; system decoupling
• Guarantees: at-least-once delivery; very high throughput; low latency; durability
• Scale: billions of events; TBs per day; inter-colo: few seconds; typical retention: weeks; horizontally scalable
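The broker model above can be sketched in a few lines: producers append to per-topic logs, consumers pull by (topic, offset), and the broker keeps no per-consumer state; each consumer tracks its own position (which the real system keeps in Zookeeper). A toy model, not Kafka's implementation:

```python
class Broker:
    """Toy broker: each topic is an append-only log; offset = list index."""

    def __init__(self):
        self.topics = {}  # topic -> list of messages

    def publish(self, topic, message):
        self.topics.setdefault(topic, []).append(message)

    def fetch(self, topic, offset, max_messages=100):
        # The broker serves any offset still retained; it does not track
        # who has consumed what.
        log = self.topics.get(topic, [])
        return log[offset : offset + max_messages]

class Consumer:
    """Pull-based consumer that owns its own offset."""

    def __init__(self, broker, topic):
        self.broker, self.topic, self.offset = broker, topic, 0

    def poll(self):
        batch = self.broker.fetch(self.topic, self.offset)
        self.offset += len(batch)  # advance only after a successful fetch
        return batch
```

Because the consumer advances its own offset, a crash before the offset is persisted simply replays the batch, which is where the at-least-once guarantee comes from.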
35. Kafka : Overview
Key Design Choices
• When reading from a file and sending to a network socket, we typically incur 4 buffer copies and 2 OS system calls
  – Kafka leverages a sendfile API to eliminate 2 of the buffer copies and 1 of the system calls
• No double-buffering of messages – we rely on the OS page cache and do not store a copy of the message in the JVM
  – Less pressure on memory and GC
  – If the Kafka process is restarted on a machine, recently accessed messages are still in the page cache, so we get the benefit of a warm start
• Kafka doesn’t keep track of which messages have yet to be consumed – i.e. no bookkeeping overhead
  – Instead, messages have a time-based SLA expiration: after 7 days, messages are deleted
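The zero-copy idea can be demonstrated with Python's `os.sendfile`, which asks the kernel to move bytes directly from the source file's page cache to the destination descriptor, skipping user-space buffers. This sketch assumes Linux, where a regular file is allowed as the destination; Kafka itself sends from the log file to a network socket.

```python
import os

def copy_zero_copy(src_path, dst_path):
    """Copy a file without the payload ever entering user-space buffers."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        sent = 0
        while sent < size:
            # sendfile returns the number of bytes transferred by this call
            sent += os.sendfile(dst.fileno(), src.fileno(), sent, size - sent)
```

A plain `read()`/`write()` loop would copy each chunk kernel → user buffer → kernel; `sendfile` collapses that to a single in-kernel transfer, which is the saving the slide describes.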
39. Kafka : Performance : Resilience as Messages Pile Up
[Chart: throughput (msg/s, 0–200,000) vs. unconsumed data (10–1039 GB), for 1 topic with a broker flush interval of 10K]
40. Acknowledgments
Presentation & Content
• Chavdar Botev (DataBus) @cbotev
• Roshan Sumbaly (Voldemort) @rsumbaly
• Neha Narkhede (Kafka) @nehanarkhede
Development Team
Aditya Auradkar, Chavdar Botev, Shirshanka Das, Dave DeMaagd, Alex Feinberg, Phanindra Ganti, Lei Gao, Bhaskar Ghosh, Kishore Gopalakrishna, Brendan Harris, Joel Koshy, Kevin Krawez, Jay Kreps, Shi Lu, Sunil Nagaraj, Neha Narkhede, Sasha Pachev, Igor Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian, Oliver Seeliger, Adam Silberstein, Boris Skolnick, Chinmay Soman, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, Balaji Varadarajan, Jemiah Westerman, Zach White, David Zhang, and Jason Zhang