Data Applications and Infrastructure at LinkedIn (Hadoop Summit 2010)


Hadoop Summit 2010 - application track
Data Applications and Infrastructure at LinkedIn
Jay Kreps, LinkedIn


Speaker notes
  • Why LinkedIn cares about derived data, and why it is hard.
  • Talk about what you can do.
  • If you get bad results, I claim you are in an unsuccessful test! Still a small percentage of the quadrillion possible relationships (pairwise is hard).
  • What we learned.
  • Azkaban is a workflow scheduler. What is a workflow?
  • Samurai rule: logic lives in the jobs, not in the job descriptor. Jobs are independent. Remaining work: visualization and polish.

    1. Data Applications and Infrastructure at LinkedIn
       Jay Kreps, LinkedIn
    2. Plan
       • `whoami`
       • Data products
       • Data infrastructure
    3. Data-centric engineering at LinkedIn
       • LinkedIn's Search, Network & Analytics team
       • Domain: derived data
       • Products
         • Search
         • People You May Know
         • Social graph services
         • Job matching
         • Collaborative filtering
       • Infrastructure
    4. People You May Know
    5. Other products
    6. People You May Know
       • 120 billion relationships scored... every day
       • 82 Hadoop jobs (not counting ETL)
       • Around 16 TB of intermediate data
       • Machine learning model to predict the probability of a connection
       • Bloom filters for approximate filtering joins (10x performance improvement; see the sketch below)
       • About 5 test algorithms per week
       • 2 engineers
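
       The deck does not show the Bloom-filter join itself, so here is a minimal sketch of the general technique using Hadoop's built-in org.apache.hadoop.util.bloom.BloomFilter: a previous job builds a filter over the smaller side's join keys, and each mapper drops large-side records that cannot possibly match before the shuffle. Class names, the HDFS path, and record layout are illustrative, not LinkedIn's actual code.

       import java.io.IOException;
       import java.nio.charset.StandardCharsets;
       import org.apache.hadoop.fs.FSDataInputStream;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;
       import org.apache.hadoop.io.Text;
       import org.apache.hadoop.mapreduce.Mapper;
       import org.apache.hadoop.util.bloom.BloomFilter;
       import org.apache.hadoop.util.bloom.Key;

       public class BloomJoinMapper extends Mapper<Object, Text, Text, Text> {
         private final BloomFilter filter = new BloomFilter();

         @Override
         protected void setup(Context context) throws IOException {
           // Hypothetical path; in practice the serialized filter would be
           // shipped to each mapper via the distributed cache.
           Path path = new Path("/tmp/pymk/candidate-keys.bloom");
           FileSystem fs = FileSystem.get(context.getConfiguration());
           try (FSDataInputStream in = fs.open(path)) {
             filter.readFields(in);
           }
         }

         @Override
         protected void map(Object offset, Text line, Context context)
             throws IOException, InterruptedException {
           String[] fields = line.toString().split("\t", 2);
           if (fields.length < 2) {
             return; // skip malformed records
           }
           // False positives are weeded out by the exact join in the reducer;
           // true negatives (the vast majority) never leave the mapper.
           if (filter.membershipTest(new Key(fields[0].getBytes(StandardCharsets.UTF_8)))) {
             context.write(new Text(fields[0]), new Text(fields[1]));
           }
         }
       }

       Because non-matching records are filtered map-side, far less data crosses the network to the reducers, which is where the claimed 10x improvement would come from.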
    7. Relevance Products
       • You must fly entirely by the instruments
       • Scale and relevance are very closely linked
         • More is often better
         • Iteration time is essential
       • UI matters, really
       • We threw out custom non-Hadoop code that was faster
       • Opportunity to work directly on the business
    8. Infrastructure as an Ecosystem
       • An isolated infrastructure team is usually a bad solution
         • Too isolated from the problems
       • The data product team has crushing problems
         • This area is extremely immature
       • People should want to use it
       • Treat it like a product
         • Either make money off it or give it away
         • Open source is a great solution
         • Custom software should be the best
    9. Open Source
       • Zoie – real-time search indexing
       • Bobo – faceted search
       • Decomposer – very large matrix decomposition routines (now in Mahout)
       • Norbert – partition-aware cluster management & RPC
       • Voldemort – key/value storage
       • Kamikaze – compression package
       • Sensei – distributed search
       • Azkaban – Hadoop workflow
    10. Azkaban: workflow = cron + make
    11. Azkaban: workflow : Hadoop :: web framework : web app
    12. Azkaban
    13. Azkaban Examples
       • Example job source (screenshot)
       • Example workflow UI (screenshot)
    14. Workflow
    15. Azkaban
       • 82 jobs running every day just for PYMK
         • ... need to run in the right order
         • ... need to restart from failure
         • ... need to enforce dependencies
       • GUI is important for operations
       • Alerting, resource locking, config management, etc.
       • Deployable zip files of code represent a job flow
       • Everyone works independently, releases/deploys independently
       • Simple text files for config, but you can use the GUI in a pinch (a sketch of a job file follows this slide)
       • Aggregate logs and run times
       • Restart from the point of failure
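
       The deck does not show one of those "simple text files", but classic Azkaban job files are flat key=value properties, one file per job, with upstream jobs named in a dependencies line. A hypothetical PYMK-style job might look like this (the file name, jar, class, and job names are invented for illustration):

       # score-candidates.job
       type=command
       command=hadoop jar pymk.jar com.example.ScoreCandidates
       dependencies=extract-features,build-bloom-filter

       Azkaban assembles the per-job dependency declarations into a DAG, which is exactly the "cron + make" analogy: cron supplies the scheduling, make supplies the dependency-ordered execution and restart-from-failure.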
    16. Data Deployment
       How do you get your multi-billion-edge probabilistic relationship graph to the live website to serve queries?
    17. Voldemort
       • LinkedIn had many prior passes at this problem, all bad
         • MySQL
         • Oracle
         • Etc.
       • Fully distributed, partitioned, decentralized key-value storage
       • Supports pluggable storage engines
       • Online/offline cycle
       • Is this a good fit? (a client-side sketch follows)
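
       For context on the serving side, reads go through Voldemort's standard client API; this is a minimal sketch using that published API, with a hypothetical bootstrap URL, store name, and key:

       import voldemort.client.ClientConfig;
       import voldemort.client.SocketStoreClientFactory;
       import voldemort.client.StoreClient;
       import voldemort.client.StoreClientFactory;
       import voldemort.versioning.Versioned;

       public class PymkLookup {
         public static void main(String[] args) {
           // Bootstrap URL and store name ("pymk") are hypothetical.
           StoreClientFactory factory = new SocketStoreClientFactory(
               new ClientConfig().setBootstrapUrls("tcp://voldemort.example.com:6666"));
           StoreClient<String, String> client = factory.getStoreClient("pymk");

           // The client handles partitioning, routing, and failover; reads hit
           // whichever store version is currently swapped in as live.
           Versioned<String> suggestions = client.get("member-12345");
           System.out.println(suggestions == null ? "no suggestions" : suggestions.getValue());
         }
       }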
    18. Voldemort Data Deployment
    19. Voldemort Data Deployment
       • Building a multi-TB lookup structure is really, really hard; it is inherently a batch operation
       • Solution: build the structure in Hadoop
       • Tradeoff: build time vs. lookup time
         • Minimal perfect hashing requires only 2.5 bits per key, but is slow to build
         • Sorted indexes are a fast, simple alternative (sketched below)
       • The build is a no-op map/reduce (just sorting)
       • The data load will saturate the network even for a small cluster
       • Voldemort gives you
         • failover
         • load balancing
         • monitoring
         • remote access
         • partitioning
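
       To make the tradeoff concrete, here is a toy sketch (not Voldemort's actual read-only store format) of lookups against a sorted index of fixed-size (key hash, data-file offset) entries. The "build" is nothing more than sorting these entries, which a map/reduce shuffle provides for free, and each lookup is an O(log n) binary search:

       import java.nio.ByteBuffer;

       public class SortedIndex {
         private static final int KEY_BYTES = 16;               // e.g. md5 of the key
         private static final int ENTRY_BYTES = KEY_BYTES + 8;  // hash + data-file offset

         // Entries sorted by key hash. Not thread-safe as written; a real
         // implementation would use positional reads.
         private final ByteBuffer index;

         public SortedIndex(ByteBuffer index) { this.index = index; }

         /** Binary search: returns the data-file offset for keyHash, or -1 if absent. */
         public long lookup(byte[] keyHash) {
           int lo = 0;
           int hi = index.capacity() / ENTRY_BYTES - 1;
           byte[] candidate = new byte[KEY_BYTES];
           while (lo <= hi) {
             int mid = (lo + hi) >>> 1;
             index.position(mid * ENTRY_BYTES);
             index.get(candidate);
             int cmp = compareUnsigned(candidate, keyHash);
             if (cmp == 0) {
               return index.getLong();  // the offset is stored right after the hash
             } else if (cmp < 0) {
               lo = mid + 1;
             } else {
               hi = mid - 1;
             }
           }
           return -1L;
         }

         private static int compareUnsigned(byte[] a, byte[] b) {
           for (int i = 0; i < a.length; i++) {
             int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
             if (diff != 0) return diff;
           }
           return 0;
         }
       }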
    20. Voldemort Data Deployment
       • If data takes 24 hours to generate, it may take 24 hours to fix
         • Need a faster rollback strategy
       • Cold disk space is cheap
         • Store the live copy
         • Store the copy currently being updated
         • Store N backup copies
         • "Atomic" swap (sketched below)
       • Cache needs to start warm
       • I/O and network throttling to limit the impact of deployment
       • Our production latency is < 3 ms from the client side
       • A 900 GB store takes about 90 minutes to build on a 45-node dev cluster
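
       The deck does not show how the swap works; a minimal sketch of one common way to get "atomic" semantics with versioned directories is a symlink rename, shown below. The layout and names are illustrative, not Voldemort's actual mechanism:

       import java.io.IOException;
       import java.nio.file.Files;
       import java.nio.file.Path;
       import java.nio.file.StandardCopyOption;

       public class StoreSwapper {
         /** Point "current" at version-&lt;n&gt;; readers see old or new data, never a mix. */
         public static void swap(Path storeRoot, long newVersion) throws IOException {
           Path target = storeRoot.resolve("version-" + newVersion);
           Path current = storeRoot.resolve("current");
           Path tmp = storeRoot.resolve("current.tmp");

           Files.deleteIfExists(tmp);
           Files.createSymbolicLink(tmp, target);
           // rename(2) over the old link is atomic on POSIX filesystems;
           // rollback is just re-pointing at an older version directory.
           Files.move(tmp, current, StandardCopyOption.ATOMIC_MOVE);
         }
       }

       Keeping N older version directories on cold disk is what makes the 24-hour regeneration problem survivable: rollback is a second swap, not a rebuild.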
    21. Questions?