SlideShare a Scribd company logo
k8s : h4pk8s : h4p
From docker to kubernetes:
Running Apache Hadoop in a cloud native way
Marton Elek
RAFT library for Java
https://flokkr.github.io
RAFT library for Java
FROM frolvlad/alpine-oraclejdk8
ADD hadoop-3.2.0.tar.gz /opt
WORKDIR /opt/hadoop
DockerfileDockerfile
<configuration>
<property>
<name>dfs.namenode.rpc-address</name>
<value>namenode:9000</value>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>org.apache.hadoop.ozone.HddsDatanodeService</value>
</property>
<property>
<name>rpc.metrics.percentiles.intervals</name>
<value>60,300</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/namenode</value>
</property>
<property>
<name>rpc.metrics.quantile.enable</name>
<value>true</value>
</property>
</configuration>
version: "3"
services:
service1:
image: apache/imagename
hostname: namenode
ports:
- 9870:9870
environment:
CONFIGURATION1: value
DFS_DIR: /dfs
THREAD_NUMBER: 1
How to handle configuration?How to handle configuration?
Create a simple launcher script to
Create config file from environment variables
Start the application
version: "3"
services:
namenode:
image: flokkr/hadoop
hostname: namenode
command: ["hdfs","namenode"]
ports:
- 9870:9870
environment:
ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"
CORE-SITE.XML_fs.defaultFS: "hdfs://namenode:9000"
HDFS-SITE.XML_dfs.namenode.rpc-address: "namenode:9000"
HDFS-SITE.XML_dfs.replication: "1"
datanode:
image: flokkr/hadoop
command: ["hdfs", "datanode"]
environment:
CORE-SITE.XML_fs.defaultFS: "hdfs://namenode:9000"
HDFS-SITE.XML_dfs.namenode.rpc-address: "namenode:9000"
HDFS-SITE.XML_dfs.replication: "1"
LOG4J.PROPERTIES_log4j.rootLogger: "INFO, stdout"
LOG4J.PROPERTIES_log4j.appender.stdout: "org.apache.log4j.ConsoleAppender"
LOG4J.PROPERTIES_log4j.appender.stdout.layout: "org.apache.log4j.PatternLayout"
LOG4J.PROPERTIES_log4j.appender.stdout.layout.ConversionPattern: "%d{yyyy-MM-dd HH:mm:ss} %-5p
ENV (script)
n/a
n/a
Container is the unit of packaging.
Launcher script:
the source of the power in containers
FROM frolvlad/alpine-oraclejdk8
ADD hadoop-3.2.0.tar.gz /opt
WORKDIR /opt/hadoop
CMD ["/opt/launcher.sh"]
DockerfileDockerfile
FROM frolvlad/alpine-oraclejdk8
ADD hadoop-3.2.0.tar.gz /opt
WORKDIR /opt/hadoop
CMD ["/opt/launcher.sh"]
DockerfileDockerfile
launcher.sh hdfs namenode
Final command executed by docker:
#!/usr/bin/env bash
if [ -n "$SLEEP_SECONDS" ]; then
echo "Sleeping for $SLEEP_SECONDS seconds"
sleep $SLEEP_SECONDS
fi
if [ -n "$ENSURE_NAMENODE_DIR" ]; then
if [ ! -d "$ENSURE_NAMENODE_DIR" ]; then
/opt/hadoop/bin/hdfs namenode -format -force $CLUSTERID_OPTS
fi
fi
"$@"
Launcher scriptLauncher script
Launcher scriptLauncher script
Create config files from ENV
Launcher scriptLauncher script
Create config files from ENV
Wait for the dependency (TCP check)
Launcher scriptLauncher script
Create config files from ENV
Wait for the dependency (TCP check)
Download additional optional component
Launcher scriptLauncher script
Create config files from ENV
Wait for the dependency (TCP check)
Download additional optional component
Prepare HDFS (format namenode, ...)
Launcher scriptLauncher script
Create config files from ENV
Wait for the dependency (TCP check)
Download additional optional component
Prepare HDFS (format namenode, ...)
Retrieve kerberos/SSL secrets
Launcher scriptLauncher script
Create config files from ENV
Wait for the dependency (TCP check)
Download additional optional component
Prepare HDFS (format namenode, ...)
Retrieve kerberos/SSL secrets
Enable prometheus monitoring (Java agent)
Launcher scriptLauncher script
Create config files from ENV
Wait for the dependency (TCP check)
Download additional optional component
Prepare HDFS (format namenode, ...)
Retrieve kerberos/SSL secrets
Enable prometheus monitoring (Java agent)
Show network traffic (Instrumentation with Java agent)
Hashicorp stackHashicorp stack
"do it yourself""do it yourself"
Docker network
Host network
docker host
host dns
dns
yes
host
<configuration>
<property>
<name>dfs.namenode.rpc-address</name>
<value>namenode:9000</value>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>org.apache.hadoop.ozone.HddsDatanodeService</value>
</property>
<property>
<name>rpc.metrics.percentiles.intervals</name>
<value>60,300</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/namenode</value>
</property>
<property>
<name>rpc.metrics.quantile.enable</name>
<value>true</value>
</property>
</configuration>
DIYDIY
Consul
Yes (script)
Restart
host netw
Nomad
.nomad
redeploy
no
yes
host netw
yes
consul
yes
host
KubernetesKubernetes
"out of the box""out of the box"
+Network!
+Storage!!
(volume, secrets,
configs)
DaemonSet
ReplicaSet
Service Pod
StatefulSet
DaemonSet
Benefits ofBenefits of
Hadoop + k8s?Hadoop + k8s?
Benefits ofBenefits of
Hadoop + k8s?Hadoop + k8s?
 
 
Ecosystem
 
Flexibility
Example:Example:
Monitor Hadoop with PrometheusMonitor Hadoop with Prometheus
Pod
Sidecar pattern
Pod
Sidecar for monitoring over jmx
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: ozone-hdfs-namenode
spec:
serviceName: ozone2-hdfs-namenode
replicas: 1
template:
metadata:
labels:
app: ozone
spec:
containers:
- name: hdfs-namenode
image: flokkr/ozone:2.1.0
args: ["hdfs","namenode"]
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: ozone-hdfs-namenode
spec:
serviceName: ozone2-hdfs-namenode
replicas: 1
template:
metadata:
labels:
app: ozone
spec:
containers:
- name: hdfs-namenode
image: flokkr/ozone:2.1.0
args: ["hdfs","namenode"]
Flexibility
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: ozone-hdfs-namenode
spec:
serviceName: ozone2-hdfs-namenode
replicas: 1
template:
metadata:
labels:
app: ozone
spec:
shareProcessNamespace: true
containers:
- name: hdfs-namenode
image: flokkr/ozone:2.1.0
args: ["hdfs","namenode"]
- name: jmxpromo
image: flokkr/jmxpromo-sidecar
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: ozone-hdfs-namenode
spec:
serviceName: ozone2-hdfs-namenode
replicas: 1
template:
metadata:
labels:
app: ozone
spec:
shareProcessNamespace: true
containers:
- name: hdfs-namenode
image: flokkr/ozone:2.1.0
args: ["hdfs","namenode"]
- name: jmxpromo
image: flokkr/jmxpromo-sidecar
Flexibility
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: ozone-hdfs-namenode
spec:
serviceName: ozone2-hdfs-namenode
replicas: 1
template:
metadata:
labels:
app: ozone
spec:
containers:
- name: hdfs-namenode
image: flokkr/ozone:2.1.0
args: ["hdfs","namenode"]
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: ozone-hdfs-namenode
spec:
serviceName: ozone2-hdfs-namenode
replicas: 1
template:
metadata:
labels:
app: ozone
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "28942"
spec:
containers:
- name: hdfs-namenode
image: flokkr/ozone:2.1.0
args: ["hdfs","namenode"]
 
Ecosystem
configmap
helm
n/a
CNI
k8s
helm, yaml
yes
namespaces
yes
CNI
statefulset
DNS
no
service/ingress
Is HadoopIs Hadoop
cloud native?cloud native?
Is HadoopIs Hadoop
cloud native?cloud native?
Yes, with small modifications
Is HadoopIs Hadoop
cloud native?cloud native?
Yes, with small modifications
DNS/IP handling. Shoud work
without DNS
with changing DNS
Is HadoopIs Hadoop
cloud native?cloud native?
Yes, with small modifications
DNS/IP handling. Shoud work
without DNS
with changing DNS
More flexible configuration loading
Is HadoopIs Hadoop
cloud native?cloud native?
Yes, with small modifications
DNS/IP handling. Shoud work
without DNS
with changing DNS
More flexible configuration loading
Reverse proxy friendly UI
SummarySummary
SummarySummary
Don't buy without checking
the label
SummarySummary
Don't buy without checking
the label
Containerization can help a
lot to manage our Bigdata
clusters
SummarySummary
Don't buy without checking
the label
Hadoop is first class citizen
of cloud-native/containerized
environments*
Containerization can help a
lot to manage our Bigdata
clusters
Q&AQ&A
 
 
Márton Elek @anzix
https://flokkr.github.io (bigdata + containers project)
https://github.com/flokkr (source)
elek@apache.org
Image creditsImage credits
Yan Pritzker (CC)
Carrie Cizauskas (CC)

More Related Content

Similar to From docker to kubernetes: running Apache Hadoop in a cloud native way

Puppet
PuppetPuppet
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
Padma shree. T
 
fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloud
Wesley Beary
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
Edureka!
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
Rahul Jain
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
Alejandro Fernandez
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
Chester Chen
 
Infrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using TerraformInfrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using Terraform
Adin Ermie
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
Wesley Beary
 
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps dayAprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
Plain Concepts
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
Uday Vakalapudi
 
Apache hadoop 2_installation
Apache hadoop 2_installationApache hadoop 2_installation
Apache hadoop 2_installation
sushantbit04
 
Cutting through the fog of cloud
Cutting through the fog of cloudCutting through the fog of cloud
Cutting through the fog of cloud
Kyle Rames
 
Perform Like a frAg Star
Perform Like a frAg StarPerform Like a frAg Star
Perform Like a frAg Star
renaebair
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Guido Schmutz
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
confluent
 
Creating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesCreating Reusable Puppet Profiles
Creating Reusable Puppet Profiles
Bram Vogelaar
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Edureka!
 
YARN Services
YARN ServicesYARN Services
YARN Services
Steve Loughran
 
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine YardHow I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
SV Ruby on Rails Meetup
 

Similar to From docker to kubernetes: running Apache Hadoop in a cloud native way (20)

Puppet
PuppetPuppet
Puppet
 
ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON ACADGILD:: HADOOP LESSON
ACADGILD:: HADOOP LESSON
 
fog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloudfog or: How I Learned to Stop Worrying and Love the Cloud
fog or: How I Learned to Stop Worrying and Love the Cloud
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Infrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using TerraformInfrastructure-as-Code (IaC) using Terraform
Infrastructure-as-Code (IaC) using Terraform
 
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
fog or: How I Learned to Stop Worrying and Love the Cloud (OpenStack Edition)
 
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps dayAprovisionamiento multi-proveedor con Terraform  - Plain Concepts DevOps day
Aprovisionamiento multi-proveedor con Terraform - Plain Concepts DevOps day
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Apache hadoop 2_installation
Apache hadoop 2_installationApache hadoop 2_installation
Apache hadoop 2_installation
 
Cutting through the fog of cloud
Cutting through the fog of cloudCutting through the fog of cloud
Cutting through the fog of cloud
 
Perform Like a frAg Star
Perform Like a frAg StarPerform Like a frAg Star
Perform Like a frAg Star
 
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache KafkaSolutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
Solutions for bi-directional Integration between Oracle RDMBS & Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafk...
 
Creating Reusable Puppet Profiles
Creating Reusable Puppet ProfilesCreating Reusable Puppet Profiles
Creating Reusable Puppet Profiles
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
YARN Services
YARN ServicesYARN Services
YARN Services
 
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine YardHow I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
How I Learned to Stop Worrying and Love the Cloud - Wesley Beary, Engine Yard
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
christinelarrosa
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
UiPathCommunity
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 

Recently uploaded (20)

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Christine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptxChristine's Product Research Presentation.pptx
Christine's Product Research Presentation.pptx
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 

From docker to kubernetes: running Apache Hadoop in a cloud native way